Why Does the Neocortex Have Layers and Columns, A Theory of Learning the 3D Structure of the World



I watched the first 12 minutes of the talk. The range of timescales he is showing is derived from a simulation of a graph of cortical areas. The connectivity graph is based on empirical data, but the simulation is wildly artificial. He is modeling cortical areas with a couple of equations that I didn’t understand, but they are so simple that I am certain they have little to do with what real cortical areas do. He puts in a pulse at the first region and watches what happens.

I am with you; I don’t see how it relates to experimental data. It is very abstract.


This just-published paper in Nature Neuroscience seems to support @jhawkins’ concepts in V1: https://www.nature.com/articles/s41593-018-0135-z


I hit a paywall, but found a good article that, in my opinion, does support Jeff:


Yes. That relates to the same paper.
This link may give access to the paper.


I think the findings (and please let me know if I’m wrong) indicate that the retinal view spatially mapped onto the surface of V1 is superimposed with another map, in order to compare it against what a “higher-level” view predicted and to highlight important fine detail.

There seems to be a hierarchy of what are more like maps in motion, where towards the top (the hippocampal end) the word “dog” temporally recalls tails wagging and big wet tongues licking our face, while also recalling barking and falling backwards from one knocking us over, etc.



I’ve read this paper about how columns learn the structure of the world and there is a thing that I don’t understand:

The output layer’s activation is fixed while it receives different feature/location signals. But the layer can also represent a union of possible objects and, over time, narrow down to the right object, so the activation of the output layer does change. How can both of these be true at the same time?

Thank you for your help!


Hi @n_neumann,

During learning, the output layer’s representation is chosen randomly for each object, and then it is kept fixed while learning that object. So that particular set of output cells learns all the feature/location signals for that object; the specific pattern of cells represents that object.

During inference, a union of cells in the output layer initially become active. This union represents all objects that are consistent with the current input feature/location signal. For each subsequent sensation, the set of active cells is narrowed down to be the intersection of cells that are consistent with the new feature/location signal and the previous union.

Eventually it should converge onto a unique representation that will stay stable while you continue to sense the same object.
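A toy sketch of this union/intersection narrowing (the data structures and names here are illustrative, not Numenta's implementation; each learned object is just a fixed set of output cell indices):

```python
# Each object is a fixed, randomly chosen set of output cells.
# A sensation activates the union of the representations of all
# objects consistent with that feature/location pair; subsequent
# sensations intersect away the inconsistent objects.

def cells_for_sensation(objects, learned_features, sensation):
    """Union of object representations consistent with one sensation."""
    consistent = set()
    for name, cells in objects.items():
        if sensation in learned_features[name]:
            consistent |= cells
    return consistent

def infer(objects, learned_features, sensations):
    """Intersect successive unions until the representation stabilizes."""
    active = None
    for s in sensations:
        union = cells_for_sensation(objects, learned_features, s)
        active = union if active is None else active & union
    return active

# Two objects that share one feature/location pair:
objects = {"cup": {1, 2, 3}, "can": {3, 4, 5}}
features = {"cup": {("rim", "top"), ("handle", "side")},
            "can": {("rim", "top"), ("tab", "top")}}

print(infer(objects, features, [("rim", "top")]))                       # ambiguous union
print(infer(objects, features, [("rim", "top"), ("handle", "side")]))   # narrows to the cup cells
```

After the first (shared) sensation the activation is the union of both objects; the second sensation is unique to the cup, so the intersection collapses to the cup's representation and stays stable.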


So after training, we have to manually switch the system from ‘learning mode’ to ‘inference mode’, right?
If yes, is there a way to make the algorithm learn and infer at the same time?


I would assume that the error signal (novelty detection) can be an output in continuous operation and learning can happen anytime something “new” is detected.


I’m thinking that the activation of the output layer could indicate whether a feature/location signal has been learned or not, because a learned feature/location pattern makes the cells in the output layer “more” active than a new pattern would. So maybe if the feed-forward activation is high enough, we can use ‘infer mode’; otherwise, we fix the activation of the output layer to learn the new object.


Yes, the algorithm as described in the paper requires separate learning and inference modes for the output layer.

These are promising ideas to explore. Some of the corner cases are difficult, such as a new object which shares many feature/location pairs with a bunch of other already learned objects.


To be fair, even young children do this. After my not-quite-2-year-old niece learned “kitty/cat”, for a while anything close (such as a puppy) was also a kitty. In humans, the way we get around this is some parental/classmate/teacher/sibling feedback mechanism. What if these corner cases prompted an inquiry from the system when the confidence level was low or the classification was wrong?


I was thinking about that very subject quite recently.
One way to have that teacher effect taken into account, for our online-learning models, is:
(I was taking the example of learning your ABC in a chat with @Bitking)

Any thoughts/criticism appreciated ^^

Note: Of course, the above is oriented towards training a feedforward path, for the purpose of evoking ‘A’ in the higher area when we experience the sight of an ‘A’. But with almost the same scheme, we could also wire the apical tufts of the visual pathway in reverse, to feed back from the higher-area ‘A’, in the hope that part of the ‘object recognition’ function finally percolates (compresses?) down to the early visual areas themselves, as Numenta has perhaps been proposing recently (if I got that right).

[Edit] Although… maybe what I have in mind is not really anything new… since

Reading this more carefully… well, in essence, I’m only replacing ‘randomly’ by ‘preset’, here…
However, those inputs/outputs would be split across two hierarchically distinct areas in the proposal above, whereas I believe the HTM ‘output layer’ is a bunch of nerve cells within the same area.

Hierarchy & Invariance

Alright, now it is clear. Thank you!

Edit: I made some simulations and the following idea could successfully detect and learn new objects, but it had some bad effects on the inference part, so it’s not the right solution.

Just guessing, but it may not be that difficult:

Let’s assume that a learned feature/location pattern (P1) causes the output layer to represent an already learned object (this is in ‘infer mode’ since the feed-forward activation was strong enough).
Let’s call the output layer’s activation A1.
After P1, a new, unseen pattern (P2) arrives. Because it isn’t a learned pattern, the feed-forward activation will be less “strong”, so the output layer switches to ‘learn mode’ and chooses another set of active cells to represent the new object (A2).
Now, as described in the paper, the output layer’s active neurons adapt their connections to the last time step’s activations:
So the A2 neurons will be more likely to become active if the A1 neurons are active.

At the next epoch we feed P1 to the system again. If we follow the output layer’s activation rule then only the A1 cells will be active (because A2 neurons don’t have feed-forward activations):
However, if we assume that the feature/location signal is a “strong” signal, then the A1 cells could be active enough to make the A2 cells active as well:
After the activation of the output layer is determined, active cells in the output layer grow connections to the input layer’s active cells. So A2 neurons will also learn the P1 pattern…

The only change that I made in the output layer’s activation rule is that if the layer is in ‘infer mode’ (so if the feed-forward signal is strong enough) then cells in the layer don’t need feed-forward input to activate, it’s enough if the basal signal is strong enough.
I don’t know if it can happen in the brain (probably not), but to me it sounds logical to say that if a neuron gets a strong activation signal then it can activate other cells in the same layer.

BTW, when I say “strong” feed-forward signal, I mean that a cell’s overlap is above a threshold. If there are, let’s say, 10 active cells in the input layer, and during initialization we set 50% of the permanences to be connected, then we could say that a neuron has a “strong” signal if its overlap is >= 9.
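A quick sanity check on that threshold (my arithmetic, not from the paper): with 10 active input cells and each synapse independently connected with probability 0.5 at initialization, a random untrained cell's overlap is Binomial(10, 0.5), so crossing 9 by chance is rare.

```python
# Probability that a randomly initialized cell reaches overlap >= k
# purely by chance, with n active inputs and connection probability p.
from math import comb

def prob_overlap_at_least(k, n=10, p=0.5):
    return sum(comb(n, i) * p**i * (1 - p)**(n - i) for i in range(k, n + 1))

print(round(prob_overlap_at_least(9), 4))  # prints 0.0107
```

So roughly 1 in 100 untrained cells would spuriously look "strong" at this threshold, while a cell trained on the pattern should sit at or near overlap 10.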

So if the signal is a strong signal, we would switch to ‘infer mode’, and during inference cells can become active without a feed-forward signal.

However if the feed-forward signal is not strong enough then we would select the ‘learn mode’ (=fix the output layer’s activation and learn the new object).
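A toy sketch of the basal-propagation step described above (my illustration of the proposed change, not the paper's rule): in ‘infer mode’, cells with enough basal input from the currently active set may fire even without feed-forward input, so the A2 cells join A1 and can then associate with P1.

```python
# Start from the feed-forward winners, then repeatedly add cells whose
# basal input from the active set meets the threshold, until stable.

def activate(feedforward_active, basal_connections, basal_threshold):
    """basal_connections: {cell: set of presynaptic cells it listens to}."""
    active = set(feedforward_active)
    changed = True
    while changed:
        changed = False
        for cell, presyn in basal_connections.items():
            if cell not in active and len(presyn & active) >= basal_threshold:
                active.add(cell)
                changed = True
    return active

# A2 cells (7, 8) grew basal connections to A1 cells (1, 2, 3) earlier:
A1 = {1, 2, 3}
basal = {7: {1, 2}, 8: {2, 3}, 9: {40, 41}}  # cell 9 has unrelated basal input

print(activate(A1, basal, basal_threshold=2))  # A1 plus the A2 cells; 9 stays off
```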


I share some of @Paul_Lamb’s and @n_neumann’s questions.


Resetting the learning process between objects means that the system doesn’t learn the concept of the object itself, but it fills observations into an existing concept. The “teacher” knows the concept of the object and shows this object to the system for it to explore; then, it will move on to the next “topic”.

The question is, therefore, how do we conceptualize? How do we decide on what constitutes separate objects, on boundaries? I believe that conceptualization should also be part of the unsupervised learning process, in the same sense that clustering algorithms decide to separate clusters.

Maybe this observation is insightful. We know that grid cells reorient and rescale their representations when they enter a new environment, keeping to an “allocentric” representation. Therefore they change reference frame, moving from that of the 1st room to that of the 2nd. This is a discrete decision. How do rats learn to make this decision, and therefore build a concept of a room?

Could it have to do, as in sequence learning, with surprise?

Fixed activation of output layer

The paper states an activation rule for the output layer (eq. 5). Does this hold only during inference?
If so, how is the activation kept fixed during learning?
Again externally, with the supervisory signal of the concept of the object?


Maybe @subutai, @jhawkins or @ycui could help me understand the above? Thank you!



I can answer only your last three questions:

During inference you use this equation at each time-step to compute the output layer’s activation.

During learning you use this equation only at the first time-step. That activation is then kept fixed as long as you don’t get a reset signal. BTW, at the first time-step, eq. 5’s output and eq. 4’s output are the same.
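A schematic of that timing (illustrative only: `activation_rule` stands in for eq. 5, which I'm not reproducing here, and `learn_step` for the synapse updates):

```python
# Learning: apply the activation rule once, at the first time-step,
# then keep that activation fixed until a reset.
def run_learning(sensations, activation_rule, learn_step):
    fixed = activation_rule(sensations[0], previous=None)  # first step only
    for s in sensations:
        learn_step(fixed, s)  # the output activation never changes here
    return fixed

# Inference: apply the activation rule at every time-step, so the
# activation can narrow down over successive sensations.
def run_inference(sensations, activation_rule):
    active = None
    for s in sensations:
        active = activation_rule(s, previous=active)
    return active
```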

Yes, but this reset signal doesn’t have to come from a ‘teacher’. Quoting from the paper:

“During training, we reset the output layer when switching to a new object. In the brain, there are several ways the equivalent of a reset could occur, including a sufficiently long period of time with no sensation. When a new object is learned we select the object representation based on best match via random initial connectivity.”

I hope this helps!