Adding a readout layer to whatever kind of feature detector you have seems to be a common plan these days:
Since evolving a deep neural network directly on 1 million images (the current industrial scale with BP) may require too much hardware, perhaps you could evolve 1000 small nets on 1000 images each and combine them with a readout layer or associative memory layer.
This could go a long way towards the common problem of “it works but we have no idea what it is doing!”
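To make the "many small nets plus a readout" idea concrete, here is a minimal sketch under some loud assumptions: the data is synthetic, each small net is a single tanh layer, "evolution" is a plain (1+1) hill climber, and the readout is ridge regression over the concatenated hidden features. All names (`evolve_small_net`, `fit_readout`, `build_ensemble`) and parameter values are made up for illustration; none of this is taken from any particular codebase.

```python
import numpy as np

def forward(w, x):
    """Hidden features of one small net; w has shape (dim, hidden)."""
    return np.tanh(x @ w)

def shard_loss(w, v, x, y):
    """Mean squared error of one small net with output weights v."""
    return float(np.mean((forward(w, x) @ v - y) ** 2))

def evolve_small_net(rng, x, y, hidden=8, steps=200, sigma=0.1):
    """(1+1) hill climbing: keep a mutation only if the loss does not increase."""
    w = rng.normal(scale=0.5, size=(x.shape[1], hidden))
    v = rng.normal(scale=0.5, size=hidden)
    best = shard_loss(w, v, x, y)
    for _ in range(steps):
        w2 = w + sigma * rng.normal(size=w.shape)
        v2 = v + sigma * rng.normal(size=v.shape)
        loss = shard_loss(w2, v2, x, y)
        if loss <= best:
            w, v, best = w2, v2, loss
    return w, best

def fit_readout(features, y, ridge=1e-2):
    """Ridge-regression readout over the concatenated features of all nets."""
    a = features.T @ features + ridge * np.eye(features.shape[1])
    return np.linalg.solve(a, features.T @ y)

def build_ensemble(seed=0, n_nets=10, shard_size=100, dim=5):
    rng = np.random.default_rng(seed)
    # Synthetic stand-in for the image data: labels are +1 or -1.
    x = rng.normal(size=(n_nets * shard_size, dim))
    y = np.sign(x[:, 0] * x[:, 1])
    nets = []
    for i in range(n_nets):
        # Each small net only ever sees its own shard of the data.
        s = slice(i * shard_size, (i + 1) * shard_size)
        w, _ = evolve_small_net(rng, x[s], y[s])
        nets.append(w)
    # The readout is the only part fitted on the full data set.
    features = np.hstack([forward(w, x) for w in nets])
    readout = fit_readout(features, y)
    mse = float(np.mean((features @ readout - y) ** 2))
    return nets, readout, mse
```

Calling `build_ensemble()` returns the evolved hidden-layer weights, the readout vector, and the training error of the combined model. The expensive search stays small and parallel, and only the cheap linear solve touches everything at once.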
As far as HTM goes, the readout should come from the bursting neurons. I have no problem with using a “traditional” deep network as a readout of the HTM anomaly detection to drive a report of WHAT is anomalous.
For this scenario I envision snapping a copy of the HTM response state and the applied stimulus to form the pairing of a particular SDR response with its stimulus.
Each snapshot of the HTM response to a “surprising” stimulus would provide the training fingerprint for that stimulus. The collection of these pairings would make up the deep net training set.
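A rough sketch of that snapshot bookkeeping might look like the following. It does not use any real HTM library: `SnapshotStore`, `observe`, the anomaly threshold, and the choice of the SDR as the network input with the stimulus as the target are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
import numpy as np

@dataclass
class SnapshotStore:
    """Collects (SDR response, stimulus) pairs for surprising inputs only."""
    threshold: float = 0.5  # hypothetical anomaly-score cutoff
    stimuli: list = field(default_factory=list)
    responses: list = field(default_factory=list)

    def observe(self, stimulus, active_cells, n_cells, anomaly_score):
        """Record a snapshot if the stimulus was surprising; return True if kept."""
        if anomaly_score < self.threshold:
            return False
        # Store the response as a dense binary vector over all cells.
        sdr = np.zeros(n_cells, dtype=np.uint8)
        sdr[list(active_cells)] = 1
        self.stimuli.append(np.asarray(stimulus, dtype=float))
        self.responses.append(sdr)
        return True

    def training_set(self):
        """Return (inputs, targets) as (SDR responses, stimuli); needs >= 1 snapshot."""
        return np.stack(self.responses), np.stack(self.stimuli)
```

In use, something like `store.observe(stimulus, bursting_cell_indices, n_cells, score)` would be called once per time step, and `store.training_set()` would hand the accumulated pairings to whatever deep net does the readout.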
During normal monitoring, the “column winning” cells would signal the recognized state. The bursting cells should be related in some useful way to the prior learned states.
Using a SOM as part of the deep net may be useful to group these responses.
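For anyone who wants to try the SOM grouping, a bare-bones self-organizing map fits in a few lines of NumPy. This is only a sketch: the grid size, the linear decay schedules, and the function names are arbitrary choices, and the input is assumed to be SDRs stored as 0/1 vectors.

```python
import numpy as np

def train_som(data, grid=(4, 4), epochs=20, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small 2-D SOM; returns weights of shape (rows * cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of every unit, used for the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in data[rng.permutation(len(data))]:
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # learning rate decays linearly
            sigma = sigma0 * (1.0 - frac) + 1e-3     # neighbourhood shrinks over time
            bmu = np.argmin(np.sum((weights - x) ** 2, axis=1))
            d2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-d2 / (2.0 * sigma ** 2))
            # Pull the best-matching unit and its grid neighbours toward x.
            weights += lr * h[:, None] * (x - weights)
            step += 1
    return weights

def best_matching_unit(weights, x):
    """Index of the SOM unit whose weight vector is closest to x."""
    return int(np.argmin(np.sum((weights - x) ** 2, axis=1)))
```

After training, responses that land on the same or neighbouring units can be treated as one group, which gives the deep net a coarser and hopefully more stable label to learn.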
It was known empirically in the 1990s that neural networks with sufficient parameters for the problem at hand don’t actually have local minima that weak optimizers like backpropagation (BP) could get trapped in. Rather, they have many saddle points, which only slow the learning process. A few years ago a number of papers proved this.
Given that information, can an evolutionary algorithm do better than BP? If you reduce the number of parameters so that local minima start appearing in large numbers, even the best evolutionary algorithm will fail. It may be, then, that if the problem is solvable using neural networks, BP is sufficient. That is kind of an open question.
The networks I am experimenting with at the moment have a sparsity-inducing activation function and don’t seem to need a bias term. Therefore they are reasoning purely with sparse patterns. I am inclined to think that evolution is better poised than BP to work out smart ways of using that mode of reasoning, but it is not certain.
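To illustrate what a bias-free layer with a sparsity-inducing activation can look like, here is a small sketch. The post does not say which activation is used, so a top-k (k-winners-take-all) function is swapped in purely as a stand-in; `top_k_activation` and `sparse_layer` are hypothetical names.

```python
import numpy as np

def top_k_activation(z, k):
    """Keep the k largest-magnitude entries in each row and zero the rest."""
    z = np.atleast_2d(z)
    # Column indices of the k largest |z| values per row (order not needed).
    idx = np.argpartition(np.abs(z), -k, axis=1)[:, -k:]
    out = np.zeros_like(z)
    rows = np.arange(z.shape[0])[:, None]
    out[rows, idx] = z[rows, idx]
    return out

def sparse_layer(w, x, k):
    """Bias-free layer: a linear map followed by the sparsifying activation."""
    return top_k_activation(x @ w, k)
```

With no bias term, an all-zero input always maps to an all-zero output, so whatever the layer computes has to be carried by which units stay active and how strongly.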
I will try to add a memory-heavy output layer to an evolved network and have a cake that is more icing than anything else. We have the technology, we have 6 million dollars, we can build it. Well no, not really, maybe $6.
Github code: https://github.com/S6Regen/Thunderbird