Why Does the Neocortex Have Layers and Columns, A Theory of Learning the 3D Structure of the World



Fresh off the press from bioRxiv…

Why Does the Neocortex Have Layers and Columns, A Theory of Learning the 3D Structure of the World, by @jhawkins, @subutai, and @ycui.


Thanks Jeff et al. for the great paper. I’m looking forward to further work on how the crucial allocentric location signal is calculated within cortical circuitry. As a roboticist, this is a familiar problem. The argument has been made that localization (of the body relative to the world, and objects relative to the body) is the only real problem in robotics, and I tend to agree. It is unfortunately, but not surprisingly, also arguably the hardest problem.

On the topic of grid cells: my PhD advisor, Michael Milford, has done a lot of work on simultaneous localization and mapping (SLAM) for robots using a computational model of the rat hippocampus called RatSLAM [1], which is based on attractor networks of grid cells. The researchers at Numenta may be aware of the work, but it has been applied to challenging real-world localization and navigation problems, so it may be a potential source of ideas that are “battle-hardened” in a way that some other models are not.

[1] Milford, Michael, and Gordon Wyeth. “Persistent navigation and mapping using a biologically inspired SLAM system.” The International Journal of Robotics Research 29.9 (2010): 1131-1153.


Still reading. You might want to clarify the difference between the second and third sentences. It reads like they are the same. That is, you should clarify “receiving direct feedforward input” and “driven by input layers”. They seem the same.

Cellular layers vary in the connections they make, but a few general rules have been observed. Cells in layers that receive direct feedforward input do not send their axons outside the local region and they do not form long distance horizontal connections within their own layer. Cells in layers that are driven by input layers form long range connections within their layer, and also send an axonal branch outside of the region, constituting an output of the region.


Good suggestion.

“receiving direct feedforward input” means that the input didn’t come from another cell in the same region; that would be “indirect”. The input to the layer came from a source outside the region, making it a “direct” input. For a primary sensory region such as V1, this would be a signal from the retina to the thalamic relay to the input layer in the cortex. “Feedforward” means we are discussing signals that are moving away from sensory organs.

“driven by input layers” implies two things. One is the input to this layer is indirect, it comes from the input layer in the same region. Two, it is a “driver” input. A “driver” input is one that can make a cell fire, as opposed to a “modulator” input which changes some aspect of activity but on its own does not make a cell fire. These terms are used in the neuroscience community. In HTM theory, modulatory inputs depolarize the cell, act as predictions, and cause the cell to fire a little sooner than it would otherwise.
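To make the driver/modulator distinction concrete, here is a minimal sketch. It is a toy, not Numenta's implementation: the `Cell` class, the threshold values, and the `active_cells` helper are all hypothetical, and the "fires a little sooner" effect is modelled the way HTM sequence memory usually describes it, with depolarized cells winning and silencing their neighbors.

```python
from dataclasses import dataclass

# Hypothetical thresholds in arbitrary units, chosen only for illustration.
FIRE_THRESHOLD = 1.0      # driver input needed to make a cell fire
PREDICT_THRESHOLD = 0.5   # modulator input needed to depolarize a cell


@dataclass
class Cell:
    name: str
    driver_input: float     # e.g. input arriving from the region's input layer
    modulator_input: float  # e.g. lateral or feedback input; depolarizes only


def is_predicted(cell: Cell) -> bool:
    """A modulator input depolarizes the cell; it acts as a prediction."""
    return cell.modulator_input >= PREDICT_THRESHOLD


def active_cells(cells: list[Cell]) -> list[Cell]:
    """Return the cells that fire, given that they share an inhibitory pool.

    Only driver input can make a cell fire. Among driven cells, the
    depolarized (predicted) ones fire slightly sooner and inhibit the rest.
    With no prediction, every driven cell fires.
    """
    driven = [c for c in cells if c.driver_input >= FIRE_THRESHOLD]
    predicted = [c for c in driven if is_predicted(c)]
    return predicted if predicted else driven


column = [
    Cell("a", driver_input=1.2, modulator_input=0.0),
    Cell("b", driver_input=1.2, modulator_input=0.8),  # driven and predicted
    Cell("c", driver_input=0.0, modulator_input=5.0),  # modulated, never driven
]
winners = [c.name for c in active_cells(column)]
```

Note that cell `c` never fires no matter how strong its modulator input is, which is the whole point of the distinction.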


@jhawkins many thanks for the interesting paper and for explaining your sensorimotor concept.

In which project at htmresearch do you test and validate your concept? I try to learn by doing …


I agree. I figured out the meaning after a while, but it might not be so easy for someone unfamiliar with the material.

I also like the separation and definitions of “driver inputs” and “modulator inputs”. That makes things easy to understand.

I think the main takeaway from this paper is that you infer the existence of a location mechanism and generate some testable hypotheses for the neuroscience community. Thank you for the paper. It helps a lot with some of the questions I had.


The code is not prepared for public consumption, but I can point you in the right direction for now: https://github.com/numenta/htmresearch/blob/master/htmresearch/frameworks/layers/l2_l4_network_creation.py. Not sure you’ll get anything working, but I will provide some guidance as I create video content for SMI.


@rhyolight thanks, Matt


Hi Jake,
Thank you for the link to this paper. I was not aware of this work, and it was very helpful. Did you work on this problem too, or just your advisor?

I am currently studying the entorhinal cortex (grid cells, head direction cells). I want to understand exactly what they do and how they do it from a theoretical and cellular perspective. I am then mapping that functionality to what is required in the cortex, and then mapping the mechanisms to the specific layers and connections in a cortical column. The Milford/Wyeth paper describes how roboticists are solving the same problem as the entorhinal cortex but using slightly different mechanisms and terminology. For example, the “pose” cells in the paper are the same as cells in the entorhinal cortex that combine grid cell plus head direction cell responses.

One of the key issues I am working on is how to balance between knowing where you are by sensory observation and knowing where you are by dead reckoning (called path integration in grid cell literature). The paper you referenced was specifically about this problem. Reading it was like going from two languages on a Rosetta stone, to three languages. I made progress by triangulation between the three approaches (robotics, entorhinal cortex, cortical columns). Thanks again.


BTW, I should have mentioned that Marcus Lewis (Numenta) has also been focused on this problem for a few months. He is running simulations to understand how ambiguity is resolved.

There is a key difference between what the neocortex does and what the entorhinal cortex does. The entorhinal cortex is solving one problem: it needs to know/represent where one animal is and what direction that animal is facing relative to its environment. The neocortex is solving the same problem but for hundreds of sensory patches at the same time. When you grab an object with your fingers, each fingertip is like a rat in a room. Instead of one rat we have five rats, each moving independently and each facing its own direction. The fingertip rats have a very limited sensory input (analogous to looking through a straw), so on their own they will have a hard time figuring out where they are on an object (aka location in a room). However, if they collaborate, then together they can solve the problem.

In our just-posted manuscript we show how long range connections allow columns to resolve ambiguity. The cortex will use this mechanism so that columns (rats looking through straws) can together figure out where they are. I don’t think there is anything analogous to the multiple column problem in the entorhinal cortex. (I hope this analogy made sense!)
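As a rough illustration of the "rats looking through straws" idea (not the network from the manuscript), the sketch below treats each column as holding the set of objects consistent with what it alone senses, and models the long range connections as a plain set intersection. The object library, the feature names, and the `candidates` and `vote` helpers are all made up for the example; the real model works with SDRs of features at locations rather than Python sets.

```python
# Hypothetical object library: the features each known object can present.
OBJECTS = {
    "mug": {"curved surface", "handle", "rim"},
    "bowl": {"curved surface", "rim"},
    "pen": {"curved surface", "clip", "tip"},
}


def candidates(feature: str) -> set[str]:
    """Objects that one column considers possible given its single input."""
    return {name for name, feats in OBJECTS.items() if feature in feats}


def vote(sensed_features: list[str]) -> tuple[list[set[str]], set[str]]:
    """Each column proposes candidates; the consensus is their intersection.

    The intersection stands in for lateral connections that let columns
    narrow down each other's guesses.
    """
    per_column = [candidates(f) for f in sensed_features]
    if not per_column:
        return [], set()
    return per_column, set.intersection(*per_column)


# Three fingertips touching different parts of the same object.
per_column, consensus = vote(["curved surface", "rim", "handle"])
```

Here the first fingertip alone cannot tell a mug from a bowl or a pen, the second narrows it to mug or bowl, and only the combination settles on a single object.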


For anyone interested in the question, “How do neurons calculate the allocentric location?”, I described our recent progress in last week’s HTM Hackers’ Hangout. I start talking at 6:34, and I start discussing location at 8:45.

We’ve got more work to do, but I think we’ve nailed down many parts of an elegant solution. (I describe most of these parts in this video.)


In terms of HTM, the introduction of a location signal in all parts of neocortex seems to me a radical move, extending the theory quite heftily. What does this imply for cognition in areas higher up in the cortical hierarchy? The suggestion that our mind’s eye perceives concepts as abstract sensory patterns seems intuitive, but what to make of the location signal? Is it the Cartesian ego? Or to be practical about it: what is the purpose, the evolutionary advantage, of a notion of location with abstract concepts?


@rhyolight @cmaver

Does anyone have a link to the video that’s longer than 2:20? The one posted on Numenta’s twitter site (the pinned paper announcement), cuts off after 2:20… :frowning:


Christy is on vacation. She will probably fix it tomorrow.


@rhyolight Thanks bud!


You can see the original movie published with the paper here: http://www.biorxiv.org/content/early/2017/07/12/162263.figures-only


We have been wondering about this too for almost a year. If our hypothesis is correct, then the evidence that an allocentric location is created in all cortical regions is pretty strong. But what purpose does it serve in high-level thought? Locations are represented by SDRs and they are dimensionless. So we don’t need to think of these dimensions as x, y, and z. I suspect the answer to this question will involve motor transforms. In sensory regions, motor signals change the orientation and state of objects (this is something we are currently trying to understand). In higher regions, the same mechanisms might allow the transformation of ideas in an abstract space. For example, when I work on problems, I feel as if the problem I am working on has structure, almost like a physical object, but not quite. I speculate that when we manipulate ideas they have their own space and “motor” capabilities and rely on these to find a model that fits the data. This probably sounds vague, because it is, but I think we can figure this out.


No problem, I’m glad it was of value. I work on the problem of robot localization, mapping, and navigation, but I haven’t been using Michael’s hippocampal model. He did that work for his PhD quite a while ago. I’m currently looking at the problem through the lens of deep learning, and it looks like there may be some analogies to biology to be found and exploited there.

This is the essence of robot localization, and if the agent is exploring the environment for the first time, then it’s the essence of the SLAM (localization+mapping) problem as well. There is a huge amount of work in this area. It mostly comes down to path integration (from motor odometry, inertial cues, and visual tracking) while placing observed features in an internal representation of the world and refining their location and identity as you move (and tracking them to estimate your own motion). Another crucial component is a location recognition system to perform “loop closure”, where you identify when you’ve returned to a known location and “snap” the locations in your map together, correcting the path integration error that you accumulated along the way.
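A toy version of those two ingredients, path integration followed by a loop closure, might look like the sketch below. It is only meant to show the shape of the idea: the `integrate` and `close_loop` helpers are hypothetical, the odometry bias is invented, and spreading the correction linearly along the path is a crude stand-in for the pose-graph optimization that real SLAM systems use.

```python
import math

Point = tuple[float, float]


def integrate(steps: list[tuple[float, float]], start: Point = (0.0, 0.0)) -> list[Point]:
    """Dead reckoning: accumulate (distance, heading) steps into positions."""
    x, y = start
    path = [(x, y)]
    for distance, heading in steps:
        x += distance * math.cos(heading)
        y += distance * math.sin(heading)
        path.append((x, y))
    return path


def close_loop(path: list[Point], known_position: Point) -> list[Point]:
    """Snap the final pose onto a recognized location.

    The accumulated error is distributed linearly over the path, so early
    poses move little and the last pose lands exactly on the known position.
    """
    n = len(path) - 1
    if n == 0:
        return [known_position]
    err_x = known_position[0] - path[-1][0]
    err_y = known_position[1] - path[-1][1]
    return [(x + err_x * i / n, y + err_y * i / n) for i, (x, y) in enumerate(path)]


# Drive a unit square, but with a made-up heading bias that grows each leg.
steps = [(1.0, k * math.pi / 2 + 0.05 * k) for k in range(4)]
estimated = integrate(steps)
drift = math.hypot(*estimated[-1])           # error accumulated by dead reckoning
corrected = close_loop(estimated, (0.0, 0.0))  # we recognize the start location
```

Without the loop closure the estimated path fails to return to the origin even though the robot actually did; recognizing the start location is what lets the accumulated error be removed.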

Here’s a video of a state-of-the-art visual SLAM system doing visual tracking against feature locations, plus several loop closures: https://www.youtube.com/watch?v=8DISRmsO2YQ

I take a different view on this. If you assume that the state estimates in the EC (grid cells, HD cells, conjunctive grid cells) and the hippocampus (place cells, path cells, time cells, etc) are distributed representations, then even though each population of specialized cells may be representing just a single quantity (some aspect of the location of the animal in space), the individual cells will likely be representing different features of that variable, different sensitivity to sensory and dead reckoning context, and will have to converge together on a distributed representation that makes sense. And although you could think of the animal’s 6-degree-of-freedom pose as one variable, it may be more appropriate to think of it as several independent quantities that are estimated jointly, similar to the locations and orientations of multiple sensors at once. So in my view, there may be substantial analogies between the tasks of the neocortex and the EC in this case.

Thanks again for the paper, and very much looking forward to future work (and thanks Marcus for the video).


I like the paper. Nevertheless, I have some “naïve” doubts about the allocentric problem. My main doubt is whether it is the cortex’s job to solve this problem. In some ways this is an input issue: you have to transform the sensory information into some form of invariant.

Although you briefly cite the auditory cortex in the paper, there is no “deeper” explanation of what allocentric means in that context. Presumably, the task would be to cancel internal noises, integrate the head position, etc., to transform the auditory nerve signal into an invariant. I think it is suspected that the DCN (dorsal cochlear nucleus), the part of the cochlear nucleus located in the brainstem, might be doing that [1][2]. The DCN is very similar to the cerebellum (Purkinje + parallel fibers + fusiform cells). The DCN receives inputs from the cochlea, the vestibular system, the cortex, etc., to actually produce the input to the auditory cortex (note that the DCN is not well understood in humans).

My understanding is that the cerebellum is doing something similar (integrating body position, precortex-motor commands, other sensory information, …). The cerebellum affects not only body coordination but also many higher cognitive functions [3].

Perhaps the entorhinal cortex is too far away from the “input ports” to be an effective solution to this. The brainstem nuclei + cerebellum might be playing a key role in this problem.

[1] D. Oertel and E. D. Young, “What’s a cerebellar circuit doing in the auditory system?,” Trends Neurosci., vol. 27, no. 2, pp. 104–110, 2004.

[2] S. Singla, C. Dempsey, A. G. Enikolopov, R. Warren, and N. B. Sawtell, “A cerebellum-like circuit in the auditory system cancels responses to self-generated sounds,” Nat. Neurosci., vol. 20, pp. 943–950, 2017.

[3] M. Ito, “Control of mental activities by internal models in the cerebellum,” Nat. Rev. Neurosci., vol. 9, no. 4, pp. 304–313, 2008.


I sympathize with the naïve doubts in that I recognize a redundancy in functionality. I wonder, therefore, if it would be instructive to consider the location signal, at least to the extent that it originates from outside the column, as a modulating navigation mechanism. It would make sense that behavior signals should communicate directly with cortical regions if possible, and not merely indirectly by way of meddling with the (possibly fantasized) world. In low regions this would generate models of concrete objects perturbed by the motor control of extremities; in higher regions it would manifest itself in, for instance, the composing of words in a text or the solving of a mathematical equation. How in line is all of this with what the neuroscience says?

I appreciate the point that SDRs are dimensionless and the potential significance this bears. I think, as a layman mind you, this should be emphasized in the article.