I don’t have a problem in the world assuming that the cortex is an extreme learning machine.
It may come as a surprise to you, but HTM does many of the memory functions you call out as advanced - right now - without waiting for some future technology.
You have to put it in the correct relationship with the sub-cortical structures; these lead the cortex around like a tame puppy. That relationship points the cortex at the right attention targets for scanning and scene digestion to occur. The emotional weighting of the scanning drives decisions - arguably another learned behavior.
Hi @Bitking, thank you for your response. My background is in computer science, so I didn’t quite understand the third paragraph of your reply. Could you please elaborate on “extreme learning machine”? Am I correct to assume that you somehow agree that the brain is memorizing?
One of the reasons I ask is that I am quite curious about why AI/ML practitioners always equate learning with generalization. I know generalization is convincing because it looks more like artificial intelligence, and there is math that backs it up, at least for now. But what if, biologically/computationally, the cortex is really just a preferential and volatile data memory - like an ant’s trail, perhaps - and these memories are simply permutations of neuron connections? There must be a proof (one I’m likely ignorant of) that the cortex is trying to generalize when it learns, because all of today’s ML algorithms (e.g. gradient descent) tend to focus on that mindset rather than on memorization, which would probably involve more information theory (e.g. encoding, decoding, compression). Surprisingly, though, HTM already has many of these techniques.
Hmmm. Sort of a mathy background?
Let’s try this: HTM learns on a single presentation with unsupervised live learning. It can learn a lot. As neural networks go, that’s outstanding performance. The basic SDR theory is well documented in Numenta’s papers. What I got from it is that dendrites are capable of encoding a ridiculously large number of features - and connections between these features.
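To put a number on “ridiculously large”: with the SDR parameters Numenta’s papers typically cite (2048 bits, 40 active), the count of distinct codes is astronomical. A quick back-of-envelope check:

```python
import math

# Distinct SDRs with n total bits and w active bits: n choose w.
# n = 2048, w = 40 are the parameters Numenta's papers typically use.
n, w = 2048, 40
capacity = math.comb(n, w)

print(f"distinct SDRs: about 10^{math.floor(math.log10(capacity))}")
```

That works out to roughly 10^84 distinct codes, which is why two unrelated SDRs essentially never collide.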
So as the world streams in it is continuously parsed based on prior learning. You build up an internal model to match the external perception. If they don’t match you are “surprised” and learn this new pattern. Delta coding! Everything you learn is in terms of what you have learned before.
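That surprise-then-learn loop can be sketched in a few lines - patterns as frozensets of active bits and an arbitrary overlap threshold are my simplifications here, not the actual HTM temporal memory:

```python
def process(stream, learned, match_threshold=0.5):
    """Match each incoming pattern against prior learning; store surprises."""
    surprises = []
    for pattern in stream:
        # Best fractional overlap with anything learned so far.
        best = max((len(pattern & known) / len(pattern) for known in learned),
                   default=0.0)
        if best < match_threshold:        # prediction failed -> "surprised"
            surprises.append(pattern)     # ...so learn the new pattern
            learned.add(frozenset(pattern))
    return surprises

learned = {frozenset({1, 2, 3, 4})}
stream = [frozenset({1, 2, 3, 5}),    # 3/4 overlap: expected, not learned
          frozenset({7, 8, 9, 10})]   # zero overlap: surprise, learned
new = process(stream, learned)
print(len(new))  # prints 1
```

Only the pattern that failed to match prior learning gets stored - everything new is coded relative to what was already known.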
So when you are seeking novelty, you are really trying to experience orthogonal experiences in each of your sensory streams.
This forming an internal model thing is something that should be based on observed function.
It’s not like a film strip but it encodes what a critter might need to know about what it’s seen before.
I don’t know how to respond to the claim that generalization is the end product to be desired. I see this as part of the descriptive functions. Forming an internal model is just another description of learning. Which we do well.
Yeah, Krotov is combining convolution with dense associative memory to get better generalization. I think for many problems, if you have suitable pre-processing to give, say, elements of rotation, scaling, and translation invariance, plus massive amounts of memory (as the human brain has), that is probably as effective as current deep neural networks - especially if you use error-correcting associative memory. https://researcher.watson.ibm.com/researcher/view.php?person=ibm-krotov
I would rather hand the general problem over to evolution to solve, but I am having trouble integrating a controller deep neural network with associative memory. I don’t want to preordain too much structure, but it seems I will have to provide more than I want. Also, I’m intending to shift more toward HTML 5 software and rely less on unpaid work.
Especially as I don’t have up-to-date hardware to write specialized code for, which makes things less interesting as I’m not pushing the envelope.
Adding this here as it is quite related to my original question. I’ve always been curious about the possibility that the brain is memorizing, or massively overfitting. On the mainstream side, the ML world is crazy about generalization, which I believe is more of an emergent result than a concrete algorithm.
The more layers I use, the more constrained the output is to some underlying truth or reality manifold. I.e. the less junk and artifacts I see in the output for inputs that were not in the training set.
This is extremely apparent in the fixed filter bank neural networks I am experimenting with at the moment.
With single-layer networks (or call it associative memory if you like), pure noise in will give junk out. If the single-layer network has high capacity and was trained with a few examples, there is an error-correction effect, which really amounts to a form of vector dictionary lookup. You get some jumbled combination of images from the training set but little Gaussian noise. Nearer to full capacity, you get the same thing, but with lots of Gaussian noise out for pure noise in.
With deep networks and pure noise in you get far more coherent outputs that are not simply random combinations of dictionary look-ups.
With the fixed filter bank neural networks you really can put pure noise in and get highly coherent outputs. As the name implies, there is a lot of filtering going on to allow that. It is also related to boosting, where a neuron is able to look at all the (weaker) neurons in the prior layer and combine their outputs into a stronger decision or more meaningful result.
So, in short: with a single layer and massive overcapacity you get vector dictionary lookup, where the dictionary is simply the training set. Not quite a hash table, because there is some interpolation and error correction going on, but not far from it.
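That dictionary-lookup-with-error-correction behavior is easy to demonstrate. A minimal sketch, with nearest-neighbor recall over stored binary vectors standing in for a high-capacity single-layer associative memory:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Training set" of stored binary patterns: the dictionary.
dictionary = rng.integers(0, 2, size=(5, 64))

def recall(query):
    """Single-layer associative recall = nearest stored vector (Hamming)."""
    dists = (dictionary != query).sum(axis=1)
    return dictionary[dists.argmin()]

# Corrupt a stored pattern with a few bit flips; recall error-corrects it.
noisy = dictionary[2].copy()
noisy[:5] ^= 1                                 # flip 5 of 64 bits
print((recall(noisy) == dictionary[2]).all())  # True: cleaned up
```

Well below capacity, the nearest stored vector dominates and the noise is stripped off - exactly the vector-dictionary-lookup picture, with a little error correction on top.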
To add: I have experimented with SP->Softmax for MNIST classification using the htm community example code. By the time the model reached ~95% accuracy, the number of active column combinations was almost equal to the number of unique inputs. It seems it had “memorized” these inputs.
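For anyone who wants to reproduce the flavor of that observation without htm.core installed, here is a hypothetical stand-in: a fixed random projection with top-k winner-take-all playing the role of the SP. The function `sp_like`, the sizes, and k are all my inventions, not the real Spatial Pooler:

```python
import numpy as np

rng = np.random.default_rng(42)

def sp_like(x, proj, k=20):
    """Stand-in for a Spatial Pooler: fixed random projection, top-k columns.
    (Hypothetical sketch -- not the htm.core SP.)"""
    scores = proj @ x
    return frozenset(np.argsort(scores)[-k:].tolist())

inputs = [rng.integers(0, 2, 256) for _ in range(100)]
proj = rng.random((512, 256))          # 512 "columns"

patterns = {sp_like(x, proj) for x in inputs}
print(len(patterns), "activation patterns for", len(inputs), "inputs")
```

Each distinct input lands on its own column combination - one stored pattern per input, which is the memorization signature described above.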
This is probably going to get me in trouble because I don’t have the math to defend the assertion but here goes:
HTM in a full implementation is a layered system (the H of HTM) and, as such, uses micro-parsing to distribute semantic meaning between layers of the hierarchy. Yes, the patterns/sequences are memorized. As the patterns are recalled, there does not have to be a strict 1-to-1 mapping and recall between the layers. You should get islands of semantic meaning in each layer, in terms of that level of parsing. Given this organization, the selection of “the best recall match” between layers should produce generalization, composed of the best match in each layer of recall; sort of a Frankenstein patchwork of best matches.
I think that this is why there is both level skipping and such a rich diversity of connection paths between maps.
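A toy illustration of that per-layer best-match idea - the layer contents and the overlap score below are made up for the sketch, not taken from any HTM implementation:

```python
def best_match(observed, memorized):
    """Pick the stored pattern with the largest feature overlap."""
    return max(memorized, key=lambda m: len(observed & m))

# Hypothetical per-layer memories: sets of features at that parsing level.
layer_memories = [
    [{"edge", "corner"}, {"edge", "curve"}],   # low-level parses
    [{"wheel", "round"}, {"wing", "flat"}],    # mid-level parts
    [{"car"}, {"plane"}],                      # object level
]

# A novel input parsed per layer; no layer has seen exactly this before.
observation = [{"edge", "curve", "shadow"}, {"wheel", "rust"}, {"car"}]

recalled = [best_match(o, mem) for o, mem in zip(observation, layer_memories)]
print(recalled)  # the "Frankenstein patchwork" of per-layer best matches
```

No single memorized item matches the whole input, yet stitching together each layer’s independent best match yields a coherent overall recollection - which is where the generalization would come from.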
Ah, I see what you mean. In relation to this, I believe that the recall match is performed on different (probably many) instances of these memorization states. I find it almost impossible for this recall match to result in generalization given only hierarchy. My 2 cents (because I have no neuroscience evidence) is that these memorized patterns (I call each one a state of the model, such as the SP) exist at the same time, and given some consensus algorithm they produce a more accurate result, which we call “generalization”.
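That consensus idea can be sketched as an ensemble of deliberately overfitted lookups - each “model” below is pure memorization (1-nearest-neighbor over a random subset of the data), yet the majority vote behaves sensibly on unseen inputs. All names and parameters here are mine, just to illustrate the intuition:

```python
from collections import Counter
import random

random.seed(1)

def memorizer(subset):
    """An overfitted model: 1-nearest-neighbor lookup over its subset."""
    def predict(x):
        nearest = min(subset, key=lambda p: abs(p[0] - x))
        return nearest[1]
    return predict

# Toy data: label is 0 below 5.0, else 1.
data = [(x / 10, 0 if x < 50 else 1) for x in range(100)]

# Many coexisting memorized "states", each seeing a different subset.
models = [memorizer(random.sample(data, 15)) for _ in range(25)]

def consensus(x):
    votes = Counter(m(x) for m in models)
    return votes.most_common(1)[0][0]

print(consensus(3.14), consensus(7.5))
```

No individual memorizer generalizes by design, but the vote across many of them recovers the underlying decision boundary - generalization as an emergent result of a consensus over overfitted states.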
In HTM many SDRs can map to a given column. This allows the column to respond as part of more than one global pattern. If you preserve topology, instead of the fully connected maps that Numenta usually uses, related inputs should produce activity in related areas. Think of how this will work in the semantic distribution of related patterns - you will build hot-spots of meaning in a single map and (how to say this?) planes of related meanings running “vertically” and “diagonally” through the maps.
As a side note, read this in case you are wondering why we don’t use topological networks yet. Topology is super important, but it takes massively parallel systems taking advantage of sparse computations. To do anything interesting with topology, we need better hardware. It will come.
In theory/intuition, yes. But this remains to be seen and tested, because classical computation may not be enough to simulate it. The way I see topology, if it is incorporated in HTM in relation to memorization, is that it forces memorization into more coherent and smaller areas of the input. ML people would castigate me for saying this, but I believe the brain overfits, builds a solution space of these overfitted models, and the recall or matching algorithm is itself an overfitted model. Everything is executable data, as opposed to classical computation, where there is knowledge (data) and function (executable).
Like activating in parallel the memorized fragments of “a nice oak under which I took a nap after my last walk in the forest” & “a palm tree on the beach during my previous holidays” & “a painting of a tree I saw at an exhibition” to generalize the concept of a tree ? Generalization by union-ing sets? But activations would get denser & denser with time. Where is the limit?
Perhaps, but that would end up:
A) being distributed over a network of connections - there is much to suggest that memories are redistributed over the interconnected maps during dream cycles; and
B) relying on SDRs, which are capable of holding an astonishingly large number of discrete values, with unions of common features.
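On the “where is the limit?” question: unions do saturate. A toy experiment - with toy parameters, much smaller than real HTM sizes - showing false matches climbing as more SDRs are union-ed together:

```python
import random

random.seed(0)
N, W = 1024, 20                       # toy SDR size and active-bit count

def sdr():
    return frozenset(random.sample(range(N), W))

def matches(x, union, theta=0.9):
    """Call it a match if at least theta of x's bits fall inside the union."""
    return len(x & union) >= theta * W

results = {}
for size in (5, 20, 80):
    union = frozenset().union(*(sdr() for _ in range(size)))
    # Probe with 2000 unrelated random SDRs and count accidental matches.
    fp = sum(matches(sdr(), union) for _ in range(2000))
    results[size] = fp
    print(f"{size:3d} SDRs union-ed: density {len(union)/N:.2f}, "
          f"{fp}/2000 false matches")
```

At small union sizes false matches are essentially zero; once the union covers most of the bit space, unrelated SDRs start matching by accident. That rising density is the practical limit on union-ing.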