FNet using the Fourier Transform to mix tokens

SeanOConnor · May 23, 2021, 8:33am

There is a paper on FNet in which the Fourier transform is used to mix tokens, relying on the property that a change in one input element alters all the output elements.
https://youtu.be/JJR3pBl78zw
The fast Walsh Hadamard transform (WHT) can achieve the same thing, which is not too surprising since the WHT is essentially the FFT with the multiplies removed!
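
To make the mixing property concrete, here is a minimal sketch in plain Python (not taken from the FNet paper or the video). The function name `fwht` and the sample values are my own assumptions. It uses the same butterfly layout as a radix-2 FFT, but with only additions and subtractions, and then checks how the output reacts when one input element is changed.

```python
def fwht(values):
    """Unnormalized fast Walsh Hadamard transform in natural order.

    Same butterfly structure as a radix-2 FFT, but every factor is +1 or -1,
    so only additions and subtractions are needed.
    """
    out = list(values)
    n = len(out)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    return out


tokens = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]  # made-up input
mixed = fwht(tokens)

# Perturb a single input element and transform again.
perturbed = list(tokens)
perturbed[3] += 1.0
mixed_perturbed = fwht(perturbed)

# Every output element moves by +1 or -1 times the perturbation.
deltas = [b - a for a, b in zip(mixed, mixed_perturbed)]
print(deltas)
```

Applying `fwht` twice returns the input scaled by the length, so no information is lost in the mixing.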

SeanOConnor · May 23, 2021, 8:42am

An ultra short example of using the fast Walsh Hadamard transform to convert uniformly distributed random numbers into approximately Normally distributed ones:
https://editor.p5js.org/congchuatocmaydangyeu7/sketches/RJSJna98l
There are some slight technical points about that for high precision applications.
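
For readers who cannot open the sketch, here is a rough Python equivalent of the idea. I am assuming the mechanism is the central limit theorem, since each output of the transform is a sum of all the inputs with +1 or -1 signs. The seed, the vector size and the name `fwht_normalized` are arbitrary choices of mine and are not taken from the linked sketch.

```python
import math
import random
import statistics


def fwht_normalized(values):
    """Walsh Hadamard transform scaled by 1/sqrt(n), which preserves vector length."""
    out = list(values)
    n = len(out)
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                a, b = out[i], out[i + h]
                out[i], out[i + h] = a + b, a - b
        h *= 2
    scale = 1.0 / math.sqrt(n)
    return [v * scale for v in out]


rng = random.Random(0)  # fixed seed, only for repeatability
n = 1024                # must be a power of two; size is an arbitrary choice

# Uniform samples on [-1, 1]: mean 0, variance 1/3.
uniform = [rng.uniform(-1.0, 1.0) for _ in range(n)]
gaussian_like = fwht_normalized(uniform)

mean = statistics.fmean(gaussian_like)
variance = statistics.pvariance(gaussian_like)
sd = math.sqrt(variance)

# For a Normal distribution about 68% of samples lie within one standard
# deviation of the mean; for the uniform input it is only about 58%.
within_one_sd = sum(1 for v in gaussian_like if abs(v - mean) <= sd) / n

print("mean              :", mean)
print("variance          :", variance)
print("fraction within 1 sd:", within_one_sd)
```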

SeanOConnor · May 29, 2021, 8:00am

If you are a connectionist, you should use fast transforms to do the connecting for you. They are very efficient at it.
The main problem is that fast transforms compute a spectrum, which is a very biased way of connecting things. It does not connect everything in a fair way. However, a little bit of cheap preprocessing can solve that.

Just for amusement.
The underlying sequency patterns of the Walsh Hadamard transform in natural order (the sequency is the number of transitions from +1 to -1 and from -1 to +1):
https://editor.p5js.org/congchuatocmaydangyeu7/sketches/SlK3cuD-W
A Walsh Hadamard transform then just measures how much of each pattern is embedded in, say, an input image. That information is complete and invertible. Conceptually it is a lot simpler than a Fourier transform.
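
A small Python sketch of the same idea, for anyone who prefers text output to the p5.js drawing. The helper names `hadamard` and `sequency` and the sample signal are made up for illustration. It prints the 8 natural-order patterns with their sequency, measures how much of each pattern is in a signal with a dot product, and then inverts that.

```python
def hadamard(n):
    """Natural-order (Sylvester) Hadamard matrix with entries +1 / -1; n a power of two."""
    rows = [[1]]
    while len(rows) < n:
        rows = [r + r for r in rows] + [r + [-v for v in r] for r in rows]
    return rows


def sequency(pattern):
    """Number of sign changes along one pattern."""
    return sum(1 for a, b in zip(pattern, pattern[1:]) if a != b)


n = 8
patterns = hadamard(n)
sequencies = [sequency(row) for row in patterns]
for row, seq in zip(patterns, sequencies):
    print("".join("+" if v > 0 else "-" for v in row), "sequency", seq)

# "How much of each pattern is embedded" is a dot product with that pattern.
signal = [3, 1, 4, 1, 5, 9, 2, 6]  # made-up data
coeffs = [sum(p * s for p, s in zip(row, signal)) for row in patterns]

# The patterns are mutually orthogonal, so repeating the same step and
# dividing by n recovers the input exactly.
restored = [sum(p * c for p, c in zip(row, coeffs)) // n for row in patterns]
print(restored == signal)
```

In natural order the sequencies come out shuffled rather than ascending, but each value from 0 to n-1 appears exactly once.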

I will say this again about ReLU.
ReLU is a switch. f(x)=x is connect, f(x)=0 is disconnect. A light switch in your house is binary on/off, yet it connects and disconnects a continuously variable AC voltage signal. In your house you decide the switch state (on or off). In a ReLU the switch state is decided by a predicate (off when x<0).
Then a ReLU neural network is a switched composition of weighted sums. The outputs of weighted sums are disconnected from the inputs to other weighted sums, or remain connected.
For a particular input vector all the switch states become known during feedforward. The network then is a particular (switched) composition of weighted sums connecting back to the input vector.
Since that is an entirely linear system (due to the switch states being known) it can be simplified, down to a single matrix mapping the input vector to the output vector (square when the input and output have the same size).
That viewpoint is so unexpected to some people that it can actually provoke rather poor behavior.
To me it is a way of getting a handle on the mathematics of a ReLU neural network. However, I mainly use it to understand the behavior of an unconventional system.
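
Here is a minimal sketch of the switched-composition view in plain Python. The layer sizes, the random weights and the helper names are all hypothetical, and I am assuming pure weighted sums with no bias terms. It runs a small ReLU network, records the switch states for one input, folds weights and switch states into a single matrix, and checks that the matrix reproduces the network output for that input.

```python
import random


def matvec(m, v):
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def matmul(a, b):
    cols = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols] for row in a]


def forward(layers, x):
    """ReLU after every layer except the last; also returns the switch states."""
    switches = []
    for w in layers[:-1]:
        pre = matvec(w, x)
        state = [1.0 if p > 0 else 0.0 for p in pre]  # on / off per neuron
        switches.append(state)
        x = [s * p for s, p in zip(state, pre)]       # same values as ReLU(pre)
    return matvec(layers[-1], x), switches


def collapse(layers, switches):
    """Fold the weights and the known switch states into one matrix."""
    effective = layers[0]
    for w, state in zip(layers[1:], switches):
        # Switching a neuron off zeroes its row of the matrix built so far.
        gated = [[s * v for v in row] for s, row in zip(state, effective)]
        effective = matmul(w, gated)
    return effective


rng = random.Random(1)
sizes = [3, 4, 4, 2]  # hypothetical layer widths
layers = [[[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
          for n_in, n_out in zip(sizes, sizes[1:])]

x = [0.5, -1.2, 2.0]
y, switches = forward(layers, x)
matrix = collapse(layers, switches)  # 2 rows, 3 columns for these sizes
y_linear = matvec(matrix, x)
print(y)
print(y_linear)
```

The collapsed matrix is only valid for inputs that produce the same switch states. A different input can flip some switches and so give a different matrix.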

SeanOConnor · May 29, 2021, 10:11am

I will clarify even more.
When you graph out ReLU there is a 45 degree line when the input x is greater than zero.
If you graph out the behavior of an electrical switch in the on position, zero volts in gives 0 volts out, 1 volt in gives 1 volt out, 2 volts in gives 2 volts out. That gives a 45 degree line. Obviously if the electrical switch is off you get zero volts out.

I believe the brain meltdown I have caused in some senior researchers is due to their belief that turning on a switch should result in a Heaviside step function type response:
https://en.wikipedia.org/wiki/Heaviside_step_function
That would only be the case if there were a fixed non-zero voltage on the input to an electrical switch.
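
The distinction can be written out in a few lines of Python. This is only a sketch, and the function names are mine. The switch state is a step function of x, but the switch passes the signal x itself, so the output is x when on and 0 when off. A step-shaped output only appears when the switched signal is held at a fixed voltage.

```python
def heaviside(x):
    """Switch state: 1.0 (on) for x > 0, otherwise 0.0 (off)."""
    return 1.0 if x > 0 else 0.0


def relu(x):
    return max(0.0, x)


def switch(state, signal):
    """An ideal switch: passes the signal when on, outputs zero when off."""
    return state * signal


inputs = [-2.0, -0.5, 0.0, 0.5, 1.0, 2.0]

# ReLU: the switch state depends on x, and the switched signal is x itself.
relu_out = [relu(x) for x in inputs]
switched_signal = [switch(heaviside(x), x) for x in inputs]

# Step response: same switch state, but a fixed 1 volt on the switch input.
switched_fixed = [switch(heaviside(x), 1.0) for x in inputs]

print(relu_out)
print(switched_signal)
print(switched_fixed)
```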

The issue for me is that this makes it very difficult to explain an unconventional neural network of my own devising to people.
