Fixed Filter Bank Neural Networks

SeanOConnor · July 31, 2019, 10:39pm

Adjust the nonlinear functions rather than the filter bank:
https://github.com/S6Regen/Fixed-Filter-Bank-Neural-Networks

You can use the fast Walsh Hadamard transform (WHT), the FFT or other transforms for the fixed filter bank. The fast WHT does not intrinsically require multiply circuits and could be very power efficient on specialized hardware.
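A minimal sketch of the fast WHT in Python, showing that the butterflies need only additions and subtractions. It assumes the unnormalized transform in natural (Sylvester) order and a power-of-two length. The name fwht is illustrative and not taken from the linked repository. The result is checked against the explicit Hadamard matrix, and the operation counter confirms the n*log2(n) add/subtract cost.

```python
def fwht(x):
    """In-place fast Walsh-Hadamard transform (unnormalized, natural order).

    Uses only additions and subtractions. Returns the number of
    add/subtract operations performed.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    ops = 0
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b  # one add, one subtract
                ops += 2
        h *= 2
    return ops


def hadamard_entry(i, j):
    # Sylvester-ordered Hadamard matrix: H[i][j] = (-1)^popcount(i & j)
    return -1 if bin(i & j).count("1") % 2 else 1


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
direct = [sum(hadamard_entry(i, j) * x[j] for j in range(8)) for i in range(8)]

y = list(x)
ops = fwht(y)
print(y)    # [36.0, -4.0, -8.0, 0.0, -16.0, 0.0, 0.0, 0.0]
print(ops)  # 24 = 8 * log2(8)
```

On specialized hardware the same butterfly structure maps onto adder/subtractor circuits, which is the power-efficiency point made above.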

SeanOConnor · August 28, 2019, 4:15am

I wrote this as a comment on Twitter (i.e. it will be deleted there in a day or two):

"A weighted sum takes as input a vector with a certain length, and there is an angle between that input vector and the weight vector. Together with the length of the weight vector, these fully determine the response.
The problem is when you have n weighted sums (operating on a common input vector) which are not orthogonal to each other, the outputs are highly entangled. There are linear correlations between the outputs. They are poorly separated. And you are paying a large computational price of n squared fused multiply accumulates or equivalent.

If the weight vectors are orthogonal to each other, then they can basically be replaced by far more efficient orthogonal transforms such as the Walsh Hadamard transform, at a cost of n*log2(n) add/subtract operations and n multiplies per layer, covering both the transform and the parameterized nonlinear functions.

An alternative way to break the linear correlations between the outputs of the weighted sums is to calculate a different vector-to-vector random projection, followed by a nonlinearity, as the input to each weighted sum.

Anyway, that is quite a strong indication that the weight parameters in conventional neural networks are being used inefficiently."
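A small sketch of the correlation point, under stated assumptions: the input is white (identity covariance), and Gaussian random weights stand in for a generic non-orthogonal layer. Under that assumption the covariance of the n outputs is W W^T, so the outputs are uncorrelated exactly when the weight vectors are mutually orthogonal. The rows of a Sylvester Hadamard matrix (what the WHT computes) satisfy this; random rows do not. The last two lines just evaluate the n^2 versus n*log2(n) operation counts for n = 8.

```python
import random


def gram(rows):
    # Output covariance for white (identity-covariance) input: C = W W^T
    return [[sum(a * b for a, b in zip(r, s)) for s in rows] for r in rows]


def max_off_diagonal(c):
    n = len(c)
    return max(abs(c[i][j]) for i in range(n) for j in range(n) if i != j)


def hadamard(n):
    # Sylvester construction: rows are mutually orthogonal +/-1 vectors
    return [[-1 if bin(i & j).count("1") % 2 else 1 for j in range(n)] for i in range(n)]


n = 8
rng = random.Random(0)
random_w = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]

print(max_off_diagonal(gram(hadamard(n))))  # 0: outputs uncorrelated
print(max_off_diagonal(gram(random_w)))     # nonzero: outputs linearly correlated

dense_macs = n * n                     # multiply-accumulates for n weighted sums
wht_addsubs = n * (n.bit_length() - 1)  # add/subtract ops for one WHT of length n
print(dense_macs, wht_addsubs)
```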

SeanOConnor · September 15, 2019, 12:37pm

A conventional artificial neural network with ReLU activation functions effects a linear projection from a particular input to the output, and the same projection holds for all inputs within a particular local domain of that input. I.e., the particular linear mapping persists until some ReLU in the network switches state.
Exactly the same circumstances pertain in the fixed filter bank neural network except it is more efficiently organized.
It is perfectly possible you could find an even more efficient algorithm, for example by replacing the Walsh Hadamard transform with the hStep() function in the out-of-place Walsh Hadamard transform:

I’ll look into it even if no one else will !!!
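The post does not show hStep() itself. As a rough illustration only, here is one common way to write an out-of-place WHT as a repeated pass: each pass pairs adjacent elements, writes the sums to the first half of a new array and the differences to the second half, and log2(n) identical passes give the transform in natural order. The names h_step and wht_out_of_place are hypothetical, and whether this matches the author's hStep() is an assumption.

```python
def h_step(src):
    """One out-of-place butterfly pass (hypothetical stand-in for hStep()).

    Sums of adjacent pairs go to the first half of the output, differences
    to the second half. The access pattern is the same on every pass.
    """
    half = len(src) // 2
    dst = [0.0] * len(src)
    for i in range(half):
        a, b = src[2 * i], src[2 * i + 1]
        dst[i] = a + b
        dst[i + half] = a - b
    return dst


def wht_out_of_place(x):
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    for _ in range(n.bit_length() - 1):  # log2(n) identical passes
        x = h_step(x)
    return x


print(wht_out_of_place([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]))
# [36.0, -4.0, -8.0, 0.0, -16.0, 0.0, 0.0, 0.0]
```

Because every pass has the same geometry, a single fixed step could in principle be reused or even trained on its own, which seems to be the direction the post is hinting at.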

SeanOConnor · September 19, 2019, 3:38pm

I presume and sincerely hope it has been noted before that a ReLU neural network is a system of switched linear projections. Then you are left with the extraordinary observation that each output element of such a neural network is just a weighted sum of the inputs. And that weighted sum can serve as a local approximation of how the output should change for minor changes of the input. Of course non-locally the weights change to reflect the switch in linear projection.

Please tell me I am wrong or the matter is well known and noted somewhere!!

Otherwise one may feel a certain despair about the haphazard meanderings of very well paid neural network researchers.

SeanOConnor · September 19, 2019, 9:55pm

A useful metric then would be the angle between the input to the weighted sum (derived from the linear projection in effect) and the weight vector itself.
If the angle were zero, then the central limit theorem would be fully in play and there would be strong noise cancellation.
If the angle were closer to 90 degrees, the output of the weighted sum would be very sensitive to noise in the inputs and, even in the best case, likely to produce only an approximation to an exact optimal value. That is because, approaching 90 degrees, the length of the weight vector has to be very large to get any kind of output.
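A worked sketch of that last step, assuming independent, equal-variance noise on each input. Since w.x = |w||x|cos(theta), hitting a target output y needs |w| = y / (|x| cos(theta)). Independent input noise of standard deviation sigma produces output noise of standard deviation sigma*|w|, so the noise amplification grows as 1/cos(theta) as the angle approaches 90 degrees. The function names are hypothetical.

```python
import math


def required_weight_norm(target, input_norm, angle_deg):
    # w . x = |w| |x| cos(theta)  =>  |w| = target / (|x| cos(theta))
    if not 0 <= angle_deg < 90:
        raise ValueError("angle must be in [0, 90) degrees for a positive output")
    return target / (input_norm * math.cos(math.radians(angle_deg)))


def output_noise_std(weight_norm, input_noise_std):
    # Independent noise of std sigma on each input gives w . noise with std sigma * |w|
    return input_noise_std * weight_norm


for angle in (0, 45, 60, 80, 89):
    w_norm = required_weight_norm(target=1.0, input_norm=1.0, angle_deg=angle)
    print(angle, round(w_norm, 3), round(output_noise_std(w_norm, 0.01), 4))
```

At 0 degrees the weight norm is 1 and the output noise equals the per-input noise; at 60 degrees both double, and near 90 degrees they blow up.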
I guess there are a lot of other math insights you can extract should you wish to.

SeanOConnor · September 25, 2019, 10:46pm

Some people don’t get how ReLU is a switch. I had to explain a little around the place:
“When a switch is on, 1 volt in gives 1 volt out and n volts in gives n volts out.
When it is off you get zero volts out.
You can understand ReLU as an on off switch with its own decision making policy.
The weighted sum of a number of weighted sums is still a linear system.
For a particular input with a ReLU neural network the switches (ReLUs) are all in definite states, on or off. The result is a particular arrangement of weighted sums of weighted sums of…
There is a particular linear projection from a particular input to the output and for inputs in the neighborhood of the input that do not result in any switch changing state.
A ReLU neural network then is a system of switched linear projections.
Since switching happens at zero there are no sudden discontinuities in the output as the input gradually changes.
For a particular input and a particular output neuron the output is a weighted sum of weighted sums of… of the input. This can be converted to a single weighted sum.
You can examine that single weighted sum to see what the network is looking at in the input, or you can calculate metrics such as the angle between the input vector and the weight vector of the single weighted sum.”
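A minimal sketch of collapsing a ReLU network into a single weighted sum, assuming a small bias-free multilayer perceptron with random weights (the layer sizes are hypothetical; with bias terms the collapsed form would be a weighted sum plus a constant). The forward pass records each ReLU's on/off state. Walking back from one output row, masking by those switch states and multiplying by each weight matrix gives the effective weight vector, which reproduces the output exactly and equals the local gradient as long as no switch flips. The angle metric from the post is computed at the end.

```python
import math
import random


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def norm(v):
    return math.sqrt(sum(e * e for e in v))


def forward(layers, x):
    """Bias-free MLP: ReLU after every layer except the last (linear output).
    Returns the outputs and the on/off state of every ReLU switch."""
    switches = []
    h = x
    for i, w in enumerate(layers):
        pre = matvec(w, h)
        if i < len(layers) - 1:
            on = [p > 0.0 for p in pre]
            switches.append(on)
            h = [p if s else 0.0 for p, s in zip(pre, on)]
        else:
            h = pre
    return h, switches


def collapse(layers, switches, k):
    """Single weighted sum of the input computed by output neuron k,
    valid for every input that produces the same switch states."""
    row = list(layers[-1][k])
    # Walk back through the hidden layers: mask by switch states, then multiply by W.
    for w, on in zip(reversed(layers[:-1]), reversed(switches)):
        masked = [r if s else 0.0 for r, s in zip(row, on)]
        row = [sum(masked[i] * w[i][j] for i in range(len(w))) for j in range(len(w[0]))]
    return row


rng = random.Random(1)
sizes = [6, 10, 10, 3]  # hypothetical: input, two ReLU layers, output
layers = [[[rng.gauss(0.0, 1.0) / math.sqrt(m) for _ in range(m)] for _ in range(n)]
          for m, n in zip(sizes[:-1], sizes[1:])]
x = [rng.gauss(0.0, 1.0) for _ in range(sizes[0])]

y, switches = forward(layers, x)
w_eff = collapse(layers, switches, k=0)
y0_as_weighted_sum = sum(a * b for a, b in zip(w_eff, x))
print(y[0], y0_as_weighted_sum)  # equal up to rounding

# Angle between the input vector and the collapsed weight vector
if norm(w_eff) > 0.0:
    cos_theta = y0_as_weighted_sum / (norm(w_eff) * norm(x))
    print("angle (degrees):", math.degrees(math.acos(max(-1.0, min(1.0, cos_theta)))))
```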

Bitking · September 25, 2019, 10:54pm

How do I apply this to HTM networks?

SeanOConnor · September 26, 2019, 1:12pm

That’s a difficult question because ReLU combines hard switching with linear behavior in such a way that you avoid discontinuous abrupt changes in output for gradual changes in input.
Could you do something similar with firing rate as a linearly variable quantity and the trigger point of a binary neuron acting as a sort of ReLU?
Below a certain input firing rate the neuron produces zero output; above it, the neuron itself starts firing at a linearly corresponding rate.
Then I suppose you could have switched linear projection behavior in biological brains. With HTM I’m not too sure.

Bitking · September 26, 2019, 8:35pm

HTM is based on fixed time steps and is not rate coded; I do not see how what you offered is useful in an HTM context.

Let me explain why I see it that way: HTM seems to convey information by spatial coding in a temporal stream, in a sparse fashion.

The overlap calculation for similarity is a simple tally; it is more a measure of set membership, or at least a statistical measure, than a directed vector.

This does not come together in such a way that there is ever any sense of an eigenvector as is commonly used in calculating distance in point neuron models. I feel that it would be a significant theoretical breakthrough if you were able to do that.

This is why I was asking if you saw some way that this was relevant to HTM and I was just missing it. Can you add anything here?

SeanOConnor · September 27, 2019, 6:20am

Well really I don’t know. A tally is a sum anyway. It also would be a bit demanding for biological neurons to implement switched linear projections. Whatever biology implements, it has to be very robust.
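To make "a tally is a sum" concrete, here is a small sketch assuming an SDR is represented as a list of active bit indices (a representation chosen for illustration, not taken from any HTM library). The overlap, counted as the number of shared active bits, is the same number as the dot product of the two 0/1 vectors, i.e. a weighted sum with binary weights.

```python
def overlap(active_a, active_b):
    # HTM-style overlap: number of active bits the two SDRs share (a tally)
    return len(set(active_a) & set(active_b))


def to_binary(active, size):
    v = [0] * size
    for i in active:
        v[i] = 1
    return v


size = 64
a = [3, 9, 17, 21, 40, 55]
b = [9, 12, 21, 33, 55, 60]
dot = sum(p * q for p, q in zip(to_binary(a, size), to_binary(b, size)))
print(overlap(a, b), dot)  # 3 3
```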

Anyway, some discussion of these matters went on here: https://github.com/max-andr/relu_networks_overconfident/issues/2

To which I would add or did add:
"Only 1 non-linearity per n weights in conventional artificial neural networks!!!

In a conventional artificial neural network there are n weighted sums operating on a single common vector of non-linear terms (the output of the previous layer).

One obvious problem is that it takes n^2 operations to process the n weighted sums, i.e. a lot.

A less obvious problem is there are n weight parameters for each non-linear term.
The greater the number of non-linear terms, the greater the ability to separate inputs into different decision regions; the non-linear terms also reduce correlations.
In fact, if the weight vectors in each weighted sum in a layer in a conventional neural network are not orthogonal there will be correlations between the outputs of the weighted sums.

Using varied random projections of a common vector followed by application of non-linear functions it is possible to give each weight parameter its own non-linear term. There is one weight parameter per non-linear term.
"
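A minimal sketch of the varied-random-projection idea, under assumptions not stated in the post: each output's projection is a random sign flip followed by an orthonormal WHT, and tanh is an arbitrary stand-in nonlinearity. Every output sees its own projected copy of the shared input, so each of the n^2 weights multiplies its own non-linear term, whereas a dense layer has n^2 weights sharing only n non-linear terms. The class name is hypothetical.

```python
import math
import random


def fwht(v):
    """Unnormalized fast Walsh-Hadamard transform (returns a new list)."""
    v = list(v)
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    return v


class VariedProjectionLayer:
    """Hypothetical layer: output i = w_i . tanh(R_i x), with R_i = WHT * diag(signs_i) / sqrt(n)."""

    def __init__(self, n, seed=0):
        rng = random.Random(seed)
        self.n = n
        self.signs = [[rng.choice((-1.0, 1.0)) for _ in range(n)] for _ in range(n)]
        self.weights = [[rng.gauss(0.0, 1.0) / math.sqrt(n) for _ in range(n)] for _ in range(n)]

    def __call__(self, x):
        scale = 1.0 / math.sqrt(self.n)  # makes each projection length-preserving
        out = []
        for signs, w in zip(self.signs, self.weights):
            projected = fwht([s * xi for s, xi in zip(signs, x)])
            z = [math.tanh(scale * p) for p in projected]  # n non-linear terms per output
            out.append(sum(wi * zi for wi, zi in zip(w, z)))
        return out


n = 8
layer = VariedProjectionLayer(n)
x = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -0.5]
y = layer(x)

weights = n * n
nonlinear_terms = n * n  # one tanh per (output, projected component) pair
print(len(y), weights / nonlinear_terms, "weights per non-linear term")
print("dense layer:", (n * n) / n, "weights per non-linear term")
```

The price is n transforms per layer instead of one, so this trades compute for weight efficiency.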

That got me thinking about the weight efficiency of ReLU versus a switched slope at zero (f(x) = a*x for x >= 0, f(x) = b*x for x < 0) in fixed filter bank neural networks or similar.
I think ReLU would win out on efficiency because you get more independence per weight. However, you lose the possibility of free ResNet-like transport of information to where it is needed in the network, which the switched slope allows.
While designing itself, the system can decide to pass information straight through (by setting a = b = 1).
With ReLU the axe must sometimes fall, and information fails to get through about half the time. ReLU would also likely work mischief with the behavior of the filter bank.
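A sketch of the pass-through argument, assuming a fixed filter bank layer made of an orthonormal WHT (scaled by 1/sqrt(n), so it is its own inverse) followed by a per-element switched-slope function with its own a and b per element. That parameterization is an assumption and may differ from the linked repository. With a = b = 1 every layer is a pure rotation, so two layers hand the input through unchanged; with ReLU (a = 1, b = 0) the negative transform outputs are zeroed and that information is gone.

```python
import math


def fwht_normalized(v):
    """Orthonormal WHT: add/subtract butterflies scaled by 1/sqrt(n); self-inverse."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                v[j], v[j + h] = v[j] + v[j + h], v[j] - v[j + h]
        h *= 2
    s = 1.0 / math.sqrt(n)
    return [s * e for e in v]


def switched_slope(x, a, b):
    # f(x) = a*x for x >= 0, b*x for x < 0; ReLU is the special case a=1, b=0
    return a * x if x >= 0 else b * x


def layer(v, a, b):
    """Fixed filter bank layer: WHT, then a per-element switched-slope function."""
    return [switched_slope(e, ai, bi) for e, ai, bi in zip(fwht_normalized(v), a, b)]


x = [0.5, -1.0, 0.25, 2.0, -0.75, 1.5, 0.0, -0.5]
n = len(x)
ones = [1.0] * n
zeros = [0.0] * n

# a = b = 1: two orthonormal WHT layers undo each other, so the input passes straight through.
passed = layer(layer(x, ones, ones), ones, ones)
print(max(abs(p - q) for p, q in zip(passed, x)))  # rounding-level difference

# ReLU (a = 1, b = 0): negative transform outputs are switched off.
relu_out = layer(x, ones, zeros)
print(sum(1 for e in relu_out if e == 0.0), "of", n, "outputs switched off")
```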

Related topics

Topic	Category	Replies	Views	Activity
Fast Transform neural networks trained by evolution and BP	Lounge	10	1336	July 12, 2020
FNet using the Fourier Transform to mix tokens	Machine Learning	3	935	May 29, 2021
Compression/Weight Sharing WHT Neural Network Idea	Lounge	0	382	January 21, 2019
Walsh Hadamard Transform on KDNuggets	Lounge	1	377	July 30, 2021
Double weighting for neural networks	Lounge	0	462	November 9, 2017