A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex
This is a little early but it is available now so I wanted you to get it from us, not Twitter.
Yay! I've been looking forward to reading this since seeing Jeff Hawkins speak at Johns Hopkins APL last month. Thanks for the link!
w00t!
Nice, some weekend reading
Lots of discussion on Reddit ML about this. I haven't read any of it yet. I just can't bring myself to go there.
I see Reddit as more of a howling mob. I think you are wise to hang back.
I got some stuff for Twitter, but I can't post here at work. I do get the tweet notifications.
One or two do have a good answer that I would like to offer.
The rest - meh.
I'm finding a much nicer conversation in general taking place here:
https://news.ycombinator.com/item?id=18214707
Some folks just want to insult for the sake of it. The anonymity of the internet probably helps that, and I've arrived at the point where I just move on when I encounter it. Another common thing: "I don't get what you're saying, so you must be stupid."
It's a good example of how we humans project our understanding onto others, and an issue we're likely to face when creating any AGI in the future as well.
I particularly like the way this theory explains the where and what pathways.
I have now read the paper and the supporting complementary article, and I have watched HTM School ep. 15. To solidify my understanding, I will describe a setup and then ask questions relating to category inclusion and category separation.
I have been playing around with the MNIST data set since it's come to be some kind of gold standard. In it, there are 10 categories with 6,000 instances per category in the training data; thus there exist 6,000 instances of the category containing, for example, the digit "3". My experiment has been guided by the previous paper on object detection, and I use a similar metric to see how well the network detects the categories.
In my experiment, I've chosen to simulate, in some sense, what a patch of the retina (minus all the extra computation that occurs there) would report back when seeing a sequence of four slightly overlapping parts of every training image. I do this since I've seen claims relating to saccades, and I assume it maps well to sensing different parts of an object with a finger.
I train a spatial pooler on the patches and feed its output into a temporal pooler. The temporal pooler also receives apical input from two layers that I informally call "location" and "category" layers. If I were to map this onto actual cortical layers, I'd say I have proximal input from the SP feeding layer 4. Layer 4, in turn, gets apical information from layer 6 (location) and layer 2/3 (category). Layer 2/3 is fed the state from layer 4 during training (as described in the earlier paper) and is trained on its own state to strengthen the category patterns and make inference stronger during testing. Locations and categories are randomly generated SDRs: four for the locations and ten for the categories.
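In rough pseudo-Python, the front end of this setup looks something like the sketch below. The patch size, SDR width, and sparsity are made up for illustration, not my actual values:

```python
import random

# Illustrative sketch only: a 28x28 MNIST image is sampled as four slightly
# overlapping 16x16 corner patches (one per "saccade"), and the location and
# category contexts are fixed random SDRs. Parameters are hypothetical.

def extract_patches(image, patch=16):
    """Return four overlapping corner patches of a square image (list of rows)."""
    n = len(image)
    offsets = [0, n - patch]  # with n=28, patch=16 the patches overlap by 4 pixels
    patches = []
    for r in offsets:
        for c in offsets:
            patches.append([row[c:c + patch] for row in image[r:r + patch]])
    return patches

def random_sdr(size=1024, active=20, seed=None):
    """A random sparse binary pattern, given as sorted indices of active bits."""
    rng = random.Random(seed)
    return sorted(rng.sample(range(size), active))

image = [[0] * 28 for _ in range(28)]                        # dummy blank digit
patches = extract_patches(image)                             # four saccade views
locations = [random_sdr(seed=i) for i in range(4)]           # one SDR per position
categories = [random_sdr(seed=100 + d) for d in range(10)]   # one SDR per digit
```

Each patch would then go through the SP, while the matching location and category SDRs provide the apical context described above.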
This leads into my first question:
Has the function of layer 2/3 been revised and thus removed from the theory of object detection? This paper makes no mention of it playing an active part, but I have a hard time intuiting how it would work without it.
My second question:
On what level are the patterns for object detection unique? As stated earlier, the cortical column will have been trained on 6,000 instances of every category, but the value of being able to recognize a specific instance is very small compared to being able to pick the right category. If the sensor (in my very simplified example above) is moved in the same way for every instance during training and testing, the displacements will be identical and add little to no value. The four locations will repeat, but they add much more information together with the stable pattern in the category layer, which only changes when an instance from a different category is to be learnt.
During my own experiments with the setup described above, I use a column with 100 mini-columns of 32 neurons each, on the raw MNIST data with no pre-formatting. If picking the right category at least once during the four test exposures counts as a hit, I reach something like 45%. If I require the answer to stay correct for all remaining exposures, I end up around 35%.
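For clarity, the two scoring rules I mean are roughly the following (illustrative code with made-up predictions, not my actual implementation):

```python
# Each test digit yields one predicted category per exposure. The "loose"
# rule counts a hit if the target appears at least once; the "strict" rule
# additionally requires every later exposure to agree with it.

def hit_at_least_once(predictions, target):
    """True if any exposure predicted the target category."""
    return target in predictions

def hit_and_stays(predictions, target):
    """True if some exposure predicted the target and all later ones agree."""
    for i, p in enumerate(predictions):
        if p == target:
            return all(q == target for q in predictions[i:])
    return False

def accuracy(all_predictions, targets, rule):
    """Fraction of test digits counted as hits under the given rule."""
    hits = sum(rule(p, t) for p, t in zip(all_predictions, targets))
    return hits / len(targets)

preds = [[3, 3, 3, 3], [1, 3, 2, 1]]   # per-exposure guesses for two test digits
targets = [3, 2]
loose = accuracy(preds, targets, hit_at_least_once)   # both digits hit: 1.0
strict = accuracy(preds, targets, hit_and_stays)      # second digit drifts: 0.5
```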
By temporal pooler, do you mean temporal memory, which recognizes places in sequences? The temporal pooler was from before the focus on objects and locations. It was meant to recognize whole sequences by pooling the sequence of inputs from the temporal memory.
It probably shouldn't get confused about what the object is after it recognizes it. Does your layer 2/3 narrow down possible objects with each new input?
I'm not sure this is good with only one cortical column. You seem to be describing voting, where each cortical column tells the other columns what possibilities it sees so they can together narrow down the possible objects.
I don't think so, because it is mentioned briefly in the locations paper. I recall that it is the output layer, which narrows down possible objects, so I don't know the answer if L2/3 is for something else.
I don't think the object detection system generalizes well right now, so it pretty much recognizes the instance of the object rather than the category. There will probably be ways to generalize better once more progress has been made, like similar representations for similar locations. I don't really know if it does anything like that yet though. The current goal doesn't seem to be accurate categorization of different instances of the same thing, let alone novel instances.
Behavior is fairly random. I don't think the displacements are just movements of the sensor. They might be the movements between features, or something closer to the object, like the path-integrated difference in the locations of each feature pair.
The locations won't repeat if the instances of the number "3" are different. They are locations of the sensor, but of the sensor while it is touching the object. Otherwise there's no feature, so the location of the sensor has no influence. This is probably easier to think about in terms of touch.
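As a toy illustration of what I mean by the path-integrated difference between feature locations (plain Cartesian vectors are my own simplification; the theory's displacement cells operate on grid cell modules, not coordinates):

```python
# Toy sketch: if each feature has a location vector in the object's reference
# frame, the displacement between two features is their difference, and it
# stays the same no matter where the object sits in the room.

def displacement(loc_a, loc_b):
    """Vector from location a to location b."""
    return tuple(b - a for a, b in zip(loc_a, loc_b))

# Two features on an object...
f1, f2 = (2.0, 5.0), (6.0, 1.0)

# ...and the same object shifted elsewhere.
shift = (10.0, -3.0)
g1 = tuple(x + s for x, s in zip(f1, shift))
g2 = tuple(x + s for x, s in zip(f2, shift))

assert displacement(f1, f2) == displacement(g1, g2)  # object-relative, not room-relative
```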
"3" might be too complicated to recognize in big chunks. Try reducing the size of the sensory patch so it only sees a simple curvy line at each moment. "3" isn't too complicated for us to recognize with a single sensory patch, I think, but HTM is incomplete and we have huge visual cortices with a bazillion specializations. Our vision is really good. Instead of people, I would think about it in terms of a rat, which has blurry vision. With blurry vision, it can't just look at the whole object and see what it is. It needs to move around to a bunch of different locations to figure out what it is. I imagine that's true even if it can see the whole object at once, because moving provides more information about the shape.
Thank you for taking the time to digest my wall of text. I had a hard time deciding on how much information was needed for the intended context.
I'm referring to the algorithm as described in HTM School, with some of my own additions relating to inhibition and dendrite activity. I guess I'm a bit stuck in the nomenclature as it looked two or more years ago.
It would of course be best if the category stays stable, but I'm not surprised that data of this sort can give unstable results. And yes, the training of the 2/3 layer is intended to give the same result as in the previous paper on object detection: the first exposure activates all plausible object patterns, and for every new exposure the state in layer 4 is biased by the state in layer 2/3. Layer 2/3 then uses this as a bias to further narrow down its own possibilities, since the neurons in that layer have been trained to be activated by specific object patterns in layer 2/3.
My implementation follows my understanding of the previous paper on object detection. Maybe I've misunderstood how layer 2/3 is supposed to strengthen the internal connections between neurons in the same pattern. Either way, this training of neurons in layer 2/3 seems to help with the narrowing down of possible patterns with few exposures, and my column performs worse if I remove this functionality.
Perhaps this is a result of me using a column with an order of magnitude fewer mini-columns than are typically used in Numenta research and in Nupic. My reasoning regarding the number of mini-columns is that if I can get 100 mini-columns to perform well enough, having an order of magnitude more of them should result in dramatic improvements. I haven't decided on what counts as enough, but my gut feeling is that if I can reach a stable 50% with 100 mini-columns, stepping up to the common 2048 mini-columns would make sense.
Further, getting a small network performing well enough to solve simple problems offers more opportunities when it comes to running the algorithm on very limited hardware.
Ah, ok. Then I'll assume I've overlooked something. I'll spend some more time with the paper.
This sounds a bit unlikely. If we look at the popular coffee mug example, many different coffee mugs will, taking subsampling and SDR attributes into account, appear very similar. For example, you cannot sense colour with your fingertip, so the same model of mug in a different colour will appear identical even though the two are, in one sense, very different.
So, feeling a lip on the edge, a cylindrical form with a bottom and an open top, together with a handle starting somewhere close to the edge and terminating somewhere close to the bottom, should make category detection very possible. It would, of course, be possible to get into more detail with a finer sensor, but I claim that moving from the category "mugs" to "mugs with texture on the outside" is a very small step.
I'd say that my results show that the ability to detect categories, even if not intended, seems to work on at least some level with a combination of location, a sequence of sensory inputs, and category biasing.
But, just to be clear, a network that has trained on mugs will of course not do well if you show it a cat or something from some other very different domain.
To me, this sounds like a description of what I've done. The sensory patch is small and projects to a small number of mini-columns. Sub-sampling removes even more of the information needed to properly separate a "1" from an "8" or a "4" from a "9". Thus I let the sensor be exposed to overlapping patches (smaller than the training image) that offer separation of location for similar features, plus topological information that connects the features.
Going a bit further down the grid cell encoding rabbit hole: if the highest level of representation is at the autobiographical memory level, and cognitively there is some sort of representation of objects and relationships between them, there should be some sort of basic operations on them.
This is exactly one of the issues I have been thinking about for many years. (My oldest notes on this go back over a decade.)
One of the possibilities that keeps bubbling up as a strong candidate for internal representation is the tuple. It uses our internal spatial representation system to arrange our objects as (object)(relation)(object).
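As a toy sketch of what I mean (the names and facts below are made up, and a symbolic store is only a stand-in for whatever the spatial system actually does):

```python
# Speculative illustration: knowledge held as (object, relation, object)
# tuples, queried by partial pattern matching.

facts = [
    ("mug", "on", "desk"),
    ("desk", "in", "office"),
    ("handle", "part-of", "mug"),
]

def query(facts, obj=None, rel=None, other=None):
    """Return all tuples matching the given (possibly partial) pattern."""
    return [f for f in facts
            if (obj is None or f[0] == obj)
            and (rel is None or f[1] == rel)
            and (other is None or f[2] == other)]

on_relations = query(facts, rel="on")        # what is on what
about_mug = query(facts, other="mug")        # everything pointing at the mug
```

The basic operations then become things like lookup, composition (chaining tuples through shared objects), and substitution.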
BTW: It's nice that the rest of the world is starting to converge on the internal spatial representation that has seemed most likely to me over the years!
With this long, self-congratulatory introduction out of the way: tonight I bumped into a very interesting paper that explores some of these same concepts:
https://www.ncbi.nlm.nih.gov/m/pubmed/30146305/
If you are the sort of person that uses a certain SH web page to view your papers, you will need this DOI:
doi: 10.1016/j.neuron.2018.07.047
Just posting an idea. This is highly likely not biologically possible.
We could make a grid cell encoder out of a capsule network. The routing mechanism works like a displacement layer. We could track where the values end up after routing and use that as the displacement.
Please read my post on Hex-grid cells. This is much simpler and more biologically plausible than using capsules.
This theory is so illuminating and beautiful, I really enjoy thinking about it and speculating further ideas. Many thanks to Numenta for sharing all these in an open and accessible way.
I have a question about the "what" and "where" pathways. Let's say I instruct another person/agent to manipulate an object, and I already know the agent's body space and behaviours well. So the task is to specify the movement in the agent's body space that achieves the desired location/state in the object's space. Could it be that during this task, the "where" region performs location computations on the agent's body space? If so, what is the extent of the spaces that the "where" region can compute locations on?
I will connect; this convergence is what will lead to the singularity… "We shall Ionize!"
Based on your idea that there are "cortical grid cells" in L6 and "displacement cells" in L5, do you have any testable predictions to make about L5/L6 neurons?
@mrcslws @jhawkins
The paper makes several specific and novel proposals regarding the neocortex, which means there are many ways the theory can be tested (both to falsify or support it). In the posters we presented at the Society for Neuroscience conference this week, we listed several testable hypotheses. Here is the poster about the new "frameworks" paper. It lists several testable predictions on the right side.
In practice it can be difficult to actually test these predictions. What is necessary, and what we do, is to sit down with experimentalists and carefully understand what their lab is capable of measuring and how that intersects the theory. It can take hours or even days just to design a potential experiment. For example, it isn't known how capable rats are at distinguishing different objects via whisking (the active sense of moving whiskers). We predict that whisking should work on the same principles as vision and touch in humans, but we can't ask the rat what it knows. We can't even be certain that whisking in the rat hasn't evolved alternate strategies for operation. There have been recent advances in fMRI related to detecting grid cells in human neocortex; we list some of these in the same poster. fMRI might turn out to be a more fruitful experimental paradigm for testing the theory, but it is limited in spatial and temporal resolution.
Bottom line is the theory makes many surprising predictions that should be testable, but it may take time to figure out how to actually test them.