Polynomial Regression As an Alternative to Neural Nets.
I don’t know if “alternative” is the right word, since they conjecture an equivalence between polynomial regression and deep neural networks in this paper:
There are lots of insights and plenty to think about.
You could partially interpret their system as a form of extreme learning machine.
Anyway the more viewpoints on deep neural networks the better. There are quite a few now.
It also suggests (to me at least) that giving each layer in a deep neural network some weight connections back to the input is a good idea. Likewise, a linear readout layer that takes in the activations of every layer in the network should allow the network to “design” its own depth complexity.
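A minimal sketch of that idea (the shapes, the tanh nonlinearity, and the random weights are my own toy choices, just to make it concrete): each hidden layer gets extra connections from the raw input, and one linear readout sees every layer’s activations, so it can lean on shallow or deep features as it pleases.

```python
import numpy as np

rng = np.random.default_rng(0)

def layer(h, x, W_h, W_x):
    # Each hidden layer sees the previous layer AND the raw input x
    # (the "weight connections back to the input" idea).
    return np.tanh(h @ W_h + x @ W_x)

d_in, d_h, depth = 8, 16, 3
x = rng.normal(size=(5, d_in))          # batch of 5 inputs
Ws = [(rng.normal(size=(d_h, d_h)) * 0.3,
       rng.normal(size=(d_in, d_h)) * 0.3) for _ in range(depth)]

h = np.zeros((5, d_h))
activations = []
for W_h, W_x in Ws:
    h = layer(h, x, W_h, W_x)
    activations.append(h)

# Linear readout over ALL layers at once: the readout can weight
# shallow layers heavily (acting like a shallow net) or deep ones,
# effectively choosing its own depth.
features = np.concatenate(activations, axis=1)   # shape (5, depth * d_h)
W_out = rng.normal(size=(depth * d_h, 2)) * 0.1
y = features @ W_out
print(y.shape)  # (5, 2)
```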
There was, however, another paper recently suggesting that you can simply go ahead and use a conventional feed-forward net with thousands of layers and that is all fine!
Ultimately though these are very simple systems and someone will get a proper handle on them in the next few years.
In this case you end up closer to what the nodes in the subcortical structures are doing.
In many ways I like to think of these as closer to a Boltzmann/Hopfield network than cortex.
I think I see what is going on in that paper. Even though no random projection is used, the large number of polynomial terms results in extreme learning machine / associative memory type behavior. That seems to happen inadvertently quite a lot these days when people discover alternatives to back-propagation (BP). What they are actually doing is creating associative memory systems with a large excess of parameters, which gives repetition-code error correction; and if they use multiple layers (as in direct feedback alignment), they get a reduction in crosstalk between memories, allowing nearby examples to be more sharply separated.
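For reference, an extreme learning machine in its plainest form is just a fixed random hidden layer plus a linear readout solved in closed form. A minimal sketch (the toy task and sizes are my own choices):

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy data: learn y = sin(x) from samples on [-3, 3].
X = np.linspace(-3, 3, 200).reshape(-1, 1)
y = np.sin(X).ravel()

# Extreme learning machine: a FIXED random projection (never trained),
# followed by a linear readout solved by least squares.
n_hidden = 100
W = rng.normal(size=(1, n_hidden))
b = rng.normal(size=n_hidden)
H = np.tanh(X @ W + b)                         # random features

beta, *_ = np.linalg.lstsq(H, y, rcond=None)   # linear readout, closed form
pred = H @ beta
print(np.max(np.abs(pred - y)))                # fit error should be small
```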
This is causing a lot of confusion.
The simplest mode of operation in a neural network is the associative memory (AM) mode. With only a single layer there can be a major problem setting the decision boundary between two nearby inputs, resulting in crosstalk: minor disturbances cause misclassification because the decision boundaries are not in optimal places. Adding more layers in AM mode can help space out the decision boundaries more correctly. This mode requires a massive number of weight parameters.
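The crosstalk problem is easy to see in a one-layer linear associative memory with outer-product (Hebbian) storage. A toy demonstration (the dimensions and patterns are arbitrary choices of mine): recall stays clean when the two stored inputs are nearly orthogonal, but degrades badly when they are nearby.

```python
import numpy as np

rng = np.random.default_rng(2)
d = 200

def cos(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a  = rng.choice([-1.0, 1.0], size=d)   # stored input
ya = rng.choice([-1.0, 1.0], size=d)   # its target output
yb = rng.choice([-1.0, 1.0], size=d)   # target of a second memory

# Case 1: the second stored input is random, hence nearly orthogonal to a.
b_far = rng.choice([-1.0, 1.0], size=d)
W = np.outer(ya, a) + np.outer(yb, b_far)   # outer-product (Hebbian) storage
clean = cos(W @ a, ya)                      # close to 1: clean recall

# Case 2: the second stored input is NEARBY, differing in just 10 of 200 entries.
b_near = a.copy()
b_near[:10] *= -1
W = np.outer(ya, a) + np.outer(yb, b_near)
recall = W @ a   # = ya*(a.a) + yb*(b_near.a) -- the second term is crosstalk
print(clean, cos(recall, ya), cos(recall, yb))
```

The wrong memory’s target bleeds heavily into the recall in the second case, which is exactly the crosstalk between nearby inputs described above.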
If you reduce the number of parameters and use the weak optimizer back-propagation (BP), the system can start getting a bit smarter about where to place decision boundaries. There still have to be enough parameters that only saddle points exist in the loss surface, with few if any actual local minima for BP to get stuck in during training. Even so, after training it seems you can prune out many redundant weight parameters.
Training with evolutionary algorithms, you can maybe get an even smarter network, with internal logic and linkage of concepts.
Anyway, there are a lot of papers emerging in which neural networks are viewed as systems of differential equations. Such systems are well known to have attractor states and bifurcations, corresponding to decision regions and decision boundaries.
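A toy illustration of that correspondence (my own example, not from any of those papers): the one-dimensional system dx/dt = x − x³ has two attractors at ±1 separated by an unstable fixed point at 0, so which basin an initial condition falls into acts as a two-region classifier, with x = 0 as the decision boundary.

```python
# dx/dt = x - x**3 has attractor states at x = -1 and x = +1,
# with an unstable fixed point (the "decision boundary") at x = 0.
def settle(x0, dt=0.01, steps=2000):
    # Simple Euler integration until the state settles into a basin.
    x = x0
    for _ in range(steps):
        x += dt * (x - x**3)
    return x

print(settle(0.2))    # converges close to +1
print(settle(-0.2))   # converges close to -1
```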
Interesting read. As Judea Pearl interprets it, “All the impressive achievements of deep learning amount to just curve fitting.”
Anyway, very impressive curve fitting. I would like a way to smoothly integrate memory with deep neural networks. Also, if you have ever experimented with evolutionary algorithms, you know they can be very impressive too. You could say evolutionary algorithms were the first form of AI that actually worked, not deep nets.
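For anyone who hasn’t experimented with them, a minimal (1+1) evolution strategy — mutate the weight vector, keep the mutant only if it scores better — is just a few lines. The tiny network and the fitting task here are my own toy choices:

```python
import numpy as np

rng = np.random.default_rng(5)

# Toy task: fit y = x^2 with a one-hidden-layer tanh net (8 units),
# trained by pure mutation + selection instead of back-propagation.
X = np.linspace(-2, 2, 50)
y = X**2

def loss(w):
    W1, b1, W2 = w[:8], w[8:16], w[16:24]
    h = np.tanh(np.outer(X, W1) + b1)   # (50, 8) hidden activations
    return np.mean((h @ W2 - y) ** 2)

w = rng.normal(size=24) * 0.5
init = best = loss(w)
for _ in range(3000):
    cand = w + rng.normal(size=24) * 0.1   # mutate
    l = loss(cand)
    if l < best:                           # select: keep only improvements
        w, best = cand, l

print(init, best)   # loss drops substantially from its starting value
```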
You can train neural nets with evolution, but if, say, the training set is 1 million images, that is probably too much computationally. Training 1000 different small networks on, say, 1000 images each is certainly doable. You can then regard each small network as an expert feature detector, and from your 1000 network feature detectors train a massive depth-1 readout layer over the entire 1 million images. That is similar in spirit to transfer learning.
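A rough sketch of that ensemble-plus-readout scheme (the “experts” here are untrained random-feature nets standing in for evolved ones, and all sizes and the toy target are my own choices):

```python
import numpy as np

rng = np.random.default_rng(3)

# Stand-in data: in the scheme above each expert would be evolved on its
# own subset of images; here the experts are small fixed random nets.
X = rng.normal(size=(500, 10))
y = np.sin(X[:, 0]) * np.cos(X[:, 1])         # toy target

n_experts, width = 40, 8
experts = [(rng.normal(size=(10, width)), rng.normal(size=width))
           for _ in range(n_experts)]

def features(X):
    # Concatenate every expert's output into one big feature vector.
    return np.concatenate([np.tanh(X @ W + b) for W, b in experts], axis=1)

F = features(X)                               # (500, n_experts * width)
w, *_ = np.linalg.lstsq(F, y, rcond=None)     # single depth-1 readout
mse = np.mean((F @ w - y) ** 2)
print(mse, np.mean(y ** 2))
```

The single least-squares readout over all experts plays the role of the “massive depth 1 readout layer” trained on the full data set.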
Somewhat off topic, there is a very nice lecture on recurrent neural networks here:
Evolutionary algorithms are a good thing to have experimented with in some way, or at least to understand. But for myself and Numenta, we have to start from biology, where the ultimate goal is to model what biological brains are actually made of: cells that nibble off synapses, battle invading cells, and pass RNA and DNA on into the reproductive system.
Modeling in further genetic detail makes it possible to do away with a “fitness function” and other problems, such as agents that, when their timer runs out, no longer exist as food for others that may have been hunting them down because they needed something to eat.
The question here has to be: how do we most easily model (among other things) autonomous mobile virtual microglia to do the pruning?
That was interesting. You can prune away most of the weights in a deep neural network after you have trained it, but they must be there during training to avoid getting trapped in local minima.
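Magnitude pruning is the usual mechanical recipe for this: rank the trained weights by absolute value and zero out the smallest ones. A sketch (random weights stand in for trained ones, and the 90% pruning fraction is my own arbitrary choice):

```python
import numpy as np

rng = np.random.default_rng(4)

# Magnitude pruning: after training, zero out the smallest-magnitude weights.
W = rng.normal(size=(64, 64))       # stand-in for a trained weight matrix
prune_frac = 0.9
threshold = np.quantile(np.abs(W), prune_frac)
mask = np.abs(W) >= threshold       # keep only the largest 10% of weights
W_pruned = W * mask

kept = mask.mean()
print(kept)   # roughly 0.1: about 90% of the weights removed
```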