Fast Transform neural networks trained by evolution and BP

Fast Transform (aka. Fixed Filter Bank) neural networks trained by evolution and by backpropagation.

Evolution: https://s6regen.github.io/Fast-Transform-Neural-Network-Evolution/

Backpropagation: https://s6regen.github.io/Fast-Transform-Neural-Network-Backpropagation/

A fast transform (the Walsh Hadamard Transform) is used as a fully connected set of fixed weights which, if used with a conventional neural network activation function, would give you a real but completely non-adjustable neural network layer.
Something must bend. What you can do is use individually adjustable parametric activation functions.
With switching-based parametric activation functions the resulting fast transform neural network is very similar to a ReLU-based network, though faster to compute and using fewer parameters per layer.
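To make that concrete, here is a minimal sketch of one such layer in numpy: a fixed Walsh-Hadamard transform does the mixing, and a per-element two-slope switch supplies the only adjustable parameters. The function names and the particular two-slope form are my own illustration, not taken from the linked demos.

```python
import numpy as np

def wht(x):
    """Fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    x = x.copy()
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x / np.sqrt(n)  # orthonormal scaling, so the fixed "weights" are orthogonal

def switch_act(x, pos_scale, neg_scale):
    """Single-pole double-throw switch: each element picks one of two
    adjustable scales depending on the sign of its input."""
    return np.where(x >= 0, pos_scale * x, neg_scale * x)

def ft_layer(x, pos_scale, neg_scale):
    """Fixed orthogonal mixing (the WHT) followed by the only adjustable
    parameters in the layer: two scales per element."""
    return switch_act(wht(x), pos_scale, neg_scale)

n = 8
rng = np.random.default_rng(0)
x = rng.standard_normal(n)
pos, neg = rng.standard_normal(n), rng.standard_normal(n)
print(ft_layer(x, pos, neg))  # width-n output from only 2*n adjustable parameters
```

A width-n layer like this carries 2n adjustable parameters rather than the n² weights of a dense layer, which is where the parameter saving comes from.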


What is the commonality between so-called fast transform neural networks and conventional ReLU neural networks? A ReLU can be viewed as a single-pole, single-throw switch; the activation function in the fast transform net can be viewed as a single-pole, double-throw switch. In either case they are connecting and disconnecting different (or potentially different) dot products. In one case the dot products (weighted sums) are internally fully adjustable; in the other, only the magnitudes of a set of orthogonal dot products are adjustable. It might seem obvious which one to choose, but I wouldn’t be so hasty, because of the ‘orthogonal’ part and how variously scaled versions of one set of orthogonal dot products will interact when processed by further orthogonal dot products.
In any event, for a particular input, the state of each switch becomes known.
Each neuron in the net then receives some particular composition: a dot product of switched (connected or disconnected) dot products of switched dot products…
However, a dot product of a number of dot products can be condensed right back down to a single simple dot product of the input vector.
In particular, the value of each output neuron can be viewed as that of a single dot product of the input vector, and the entire output vector as the result of a matrix of such dot products.
For both types of neural network, a particular input causes the switches in the system to be thrown decidedly one way or the other, inducing a particular matrix and hence a matrix-multiply mapping of the input vector to the output vector.
Then what they have in common is some kind of dancing matrix operation, like Proteus changing shape on the beach.
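A small check of that collapse with a toy, bias-free ReLU net (my own example, not either of the linked demos): once the input has fixed the state of every switch, the whole network reduces to one induced matrix, and multiplying the input by that matrix reproduces the network output exactly.

```python
import numpy as np

# Toy two-layer ReLU net with no biases, purely to illustrate the point.
rng = np.random.default_rng(1)
W1 = rng.standard_normal((6, 4))
W2 = rng.standard_normal((3, 6))
x = rng.standard_normal(4)

h = W1 @ x
gates = (h > 0).astype(float)    # the switch state each ReLU takes for this input
y = W2 @ (gates * h)             # ordinary forward pass

M = W2 @ np.diag(gates) @ W1     # the matrix this switch pattern induces
print(np.allclose(M @ x, y))     # True: each output is one condensed dot product of x
```

Feed in an input different enough to flip some switches and the induced matrix changes with it; that is the dancing matrix.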

What if most of the synapses in the brain were fixed and random? Then the whole thing would be a vast 3D fast transform random projection. Only a small percentage of synapses/neurons would need to act as activation functions within that matrix of fast transform random projections. That would make learning more efficient, in the sense that far fewer parameters would need to be adjusted to get a particular wanted behavior or response.
I’m not saying that is how it is, I’m just putting it forward as an idea.

It does intuitively sound like the expressiveness of having large numbers of adjustable parameters in a conventional artificial neural network would make a fast transform network more or less useless. However, if you frame the issue in terms of the number of parameters per switch, that is, the number of parameters needed to make a switching decision, then things appear in a decidedly different light. It is the switching decisions which actually fit changes in the curve, and the more of them you get per parameter the better, all other things being equal.
I don’t see any problem with the biological brain constructing quite efficient random projections; it should actually be able to do that far better than a digital computer. With a neuronal fan-in/fan-out of more than 1,000, two layers of neurons would be able to randomly spread out a change in one input over 1,000,000 neurons. That could feed, say, 1 million parameterized switches, leading into the next random projection. Compare that with a conventional artificial neural network layer of 1 million neurons, each of which would need 1 million parameters.
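Rough bookkeeping behind that comparison (back-of-envelope figures only, not measurements of any particular implementation):

```python
width = 1_000_000

# Conventional dense ReLU layer: every neuron connects back to the full
# previous layer, and each neuron contributes exactly one switch (its ReLU).
dense_params = width * width           # 1e12 weights
dense_switches = width
print(dense_params // dense_switches)  # 1,000,000 parameters per switching decision

# Fast transform / random projection layer: the mixing is fixed, and only
# the parametric switches are adjusted, say 2 parameters each.
ft_params = 2 * width
ft_switches = width
print(ft_params // ft_switches)        # 2 parameters per switching decision
```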


So if you had a conventional ReLU network of width 1 million, then 1 million weight parameters are needed per neuron for it to fully connect back to the previous layer.
And for those 1 million parameters you only get one bend in the fitting curve, basically. Of course you can go into more details because there are multiple layers, but it seems you don’t get much for so many parameters.
I think I highlighted this video on the ‘breakpoints’ of a ReLU neural network before: https://youtu.be/QEWe-aRBUAs
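As a small numerical illustration of those breakpoints (a toy 1-D network of my own, not the one from the video): a ReLU net’s output is piecewise linear, every bend corresponds to some ReLU changing state, so you can count the available bends by watching the switch pattern change as the input sweeps an interval.

```python
import numpy as np

rng = np.random.default_rng(2)
hidden = 16
W1, b1 = rng.standard_normal((hidden, 1)), rng.standard_normal(hidden)
w2 = rng.standard_normal(hidden)

xs = np.linspace(-3, 3, 10_000).reshape(-1, 1)
pre = xs @ W1.T + b1              # pre-activations, shape (10000, hidden)
ys = np.maximum(pre, 0) @ w2      # the piecewise-linear fitting curve
patterns = pre > 0                # on/off state of every ReLU switch

# Each change in the switch pattern along x marks one potential bend in the curve.
bends = np.any(patterns[1:] != patterns[:-1], axis=1).sum()
params = W1.size + b1.size + w2.size
print(f"{params} parameters, {bends} bends available on this interval")
```

Here the input is one-dimensional, so each hidden neuron costs only a few parameters; in a wide layer fully connected to a wide previous layer, each of those switching decisions costs a full row of weights instead.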

There is this paper from 2014 when compressive sensing was a thing:

https://arxiv.org/abs/1411.5383

They suggest it is possible for the biological brain to compute random projections.
I don’t think it is necessary to have strictly separate parametric switching layers to create a fast transform neural network out of that. Parametric switching could be interspersed within the system that computes the random projections, yielding efficient dot product decision making and recombination.
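One way to picture that interleaving (purely a sketch under my own naming, using the standard trick of a fixed random sign flip followed by a Walsh-Hadamard transform as a cheap stand-in for a random projection):

```python
import numpy as np

def wht(x):
    """Recursive fast Walsh-Hadamard transform; len(x) must be a power of 2."""
    n = len(x)
    if n == 1:
        return x
    a, b = wht(x[:n // 2]), wht(x[n // 2:])
    return np.concatenate([a + b, a - b]) / np.sqrt(2)

def random_projection(x, signs):
    """Fixed random sign flip then the fast transform: an O(n log n)
    stand-in for a random orthogonal mixing."""
    return wht(signs * x)

def switch_act(x, pos_scale, neg_scale):
    """The same two-slope parametric switch as in the earlier sketch."""
    return np.where(x >= 0, pos_scale * x, neg_scale * x)

n = 16
rng = np.random.default_rng(3)
signs = [rng.choice([-1.0, 1.0], size=n) for _ in range(3)]
pos = [rng.standard_normal(n) for _ in range(2)]
neg = [rng.standard_normal(n) for _ in range(2)]

# Switching interspersed between the fixed random mixings rather than
# living in its own dense layer.
x = rng.standard_normal(n)
x = random_projection(x, signs[0])
x = switch_act(x, pos[0], neg[0])
x = random_projection(x, signs[1])
x = switch_act(x, pos[1], neg[1])
x = random_projection(x, signs[2])
print(x)
```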

Is there any biological justification at all for the predominant artificial neural network circuit arrangement? Or was it simply plucked out of the air by someone in the 1950s?


I am not aware of any reason to think the brain does random projections.

If I understand this correctly, it copies inputs and attempts to recall them when presented with similar inputs. If it is unable to recall the prior input, it learns the delta between recall and sensation.

As much as it annoys some people:


I agree there is no reason to think the biological brain uses random projections as part of its main form of processing. Maybe evolution missed an opportunity or found something better. There is some wiring in insect brains where random projections appear to be used for olfactory discrimination.

In terms of artificial neural networks, somewhere along the line there was a jump from the single-layer perceptron to the multilayer perceptron.
An assumption being that no great inefficiency is caused by stacking perceptron layers, and indeed what else would you do?


Thinking about the kinds of “easy” development/construction methods: duplicating layers and maps are the kinds of things that the DNA decoding/expression system uses to make neural systems. Evolution is very opportunistic. Functional stacking was sure to be tried, and clearly nature found it useful enough to keep doing it in a big way.
