A Theory of How Columns in the Neocortex Enable Learning the Structure of the World

In terms of HTM, the introduction of a location signal in all parts of the neocortex seems to me a radical move, extending the theory quite substantially. What does this imply for cognition in areas higher up in the cortical hierarchy? The suggestion that our mind’s eye perceives concepts as abstract sensory patterns seems intuitive, but what to make of the location signal? Is it the Cartesian ego? Or, to be practical about it: what is the purpose, the evolutionary advantage, of a notion of location with abstract concepts?

@rhyolight @cmaver

Does anyone have a link to the video that’s longer than 2:20? The one posted on Numenta’s Twitter page (the pinned paper announcement) cuts off after 2:20… :frowning:

Christy is on vacation. She will probably fix it tomorrow.


@rhyolight Thanks bud!


You can see the original movie published with the paper here: http://www.biorxiv.org/content/early/2017/07/12/162263.figures-only


We have been wondering about this too for almost a year. If our hypothesis is correct, then the evidence that an allocentric location is created in all cortical regions is pretty strong. But what purpose does it serve in high-level thought? Locations are represented by SDRs and they are dimensionless. So we don’t need to think of these dimensions as x, y, and z. I suspect the answer to this question will involve motor transforms. In sensory regions, motor signals change the orientation and state of objects (this is something we are currently trying to understand). In higher regions, the same mechanisms might allow the transformation of ideas in an abstract space. For example, when I work on problems, I feel as if the problem I am working on has structure, almost like a physical object, but not quite. I speculate that when we manipulate ideas they have their own space and “motor” capabilities and rely on these to find a model that fits the data. This probably sounds vague, because it is, but I think we can figure this out.
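The "dimensionless" point can be made concrete with a toy sketch (the SDR sizes 2048/40 are illustrative choices, not numbers from the paper): a location is just a sparse set of active bits, and similarity between two locations is bit overlap. No x, y, or z appears anywhere.

```python
import random

random.seed(42)

N = 2048  # SDR width (illustrative)
W = 40    # number of active bits (illustrative)

def random_location_sdr():
    """A location is just a sparse set of active bits -- no coordinates."""
    return frozenset(random.sample(range(N), W))

def overlap(a, b):
    """Similarity between two locations is bit overlap, not distance."""
    return len(a & b)

loc_a = random_location_sdr()
loc_b = random_location_sdr()

# Identical locations match fully; unrelated random locations overlap in
# almost no bits, so they are unambiguous even without any coordinate system.
print(overlap(loc_a, loc_a))  # 40
print(overlap(loc_a, loc_b))  # near 0 for a random pair
```

The interesting property is that such locations support recognition ("am I at a known location?") without supporting arithmetic like "3 cm to the left", which is consistent with treating them as abstract rather than spatial.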


No problem, I’m glad it was of value. I work on the problem of robot localization, mapping, and navigation, but I haven’t been using Michael’s hippocampal model. He did that work for his PhD quite a while ago. I’m currently looking at the problem through the lens of deep learning, and it looks like there may be some analogies to biology to be found and exploited there.

This is the essence of robot localization, and if the agent is exploring the environment for the first time, then it’s the essence of the SLAM (localization+mapping) problem as well. There is a huge amount of work in this area. It mostly comes down to path integration (from motor odometry, inertial cues, and visual tracking) while placing observed features in an internal representation of the world and refining their location and identity as you move (and tracking them to estimate your own motion). Another crucial component is a location recognition system to perform “loop closure”, where you identify when you’ve returned to a known location and “snap” the locations in your map together, correcting the path integration error that you accumulated along the way.
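A minimal sketch of the drift problem that loop closure fixes (the noise level and step counts are arbitrary illustrative values): pure path integration accumulates error, and only recognizing a previously visited location tells you how much.

```python
import random

random.seed(0)

def integrate_path(moves, noise=0.05):
    """Dead reckoning: sum noisy motion estimates; error accumulates."""
    x = y = 0.0
    for dx, dy in moves:
        x += dx + random.gauss(0, noise)
        y += dy + random.gauss(0, noise)
    return x, y

# Walk a square loop: the true end position is the start, (0, 0).
loop = [(1, 0)] * 10 + [(0, 1)] * 10 + [(-1, 0)] * 10 + [(0, -1)] * 10
est_x, est_y = integrate_path(loop)
drift = (est_x ** 2 + est_y ** 2) ** 0.5

# Recognizing the start ("loop closure") tells us the whole drift is error,
# so the end point can be snapped back and the correction spread along the path.
print(round(drift, 3))  # nonzero accumulated odometry error
```

Real SLAM systems distribute that correction over the whole trajectory (e.g. via pose-graph optimization) rather than just snapping the endpoint, but the sketch shows why some form of place recognition is indispensable.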

Here’s a video of a state-of-the-art visual SLAM system doing visual tracking against feature locations, plus several loop closures: https://www.youtube.com/watch?v=8DISRmsO2YQ

I take a different view on this. If you assume that the state estimates in the EC (grid cells, HD cells, conjunctive grid cells) and the hippocampus (place cells, path cells, time cells, etc) are distributed representations, then even though each population of specialized cells may be representing just a single quantity (some aspect of the location of the animal in space), the individual cells will likely be representing different features of that variable, different sensitivity to sensory and dead reckoning context, and will have to converge together on a distributed representation that makes sense. And although you could think of the animal’s 6-degree-of-freedom pose as one variable, it may be more appropriate to think of it as several independent quantities that are estimated jointly, similar to the locations and orientations of multiple sensors at once. So in my view, there may be substantial analogies between the tasks of the neocortex and the EC in this case.

Thanks again for the paper, and very much looking forward to future work (and thanks Marcus for the video).


I like the paper. Nevertheless, I have some “naïve” doubts about the allocentric problem. My main doubt is whether it is the cortex’s duty to solve this problem. In some ways this is an input issue: you have to transform the sensory information into some form of invariant.

Although you briefly cite the auditory cortex in the paper, there is no “deeper” explanation of what allocentric means in that context. Presumably, the task would be to cancel internal noises, integrate the head position, etc., to transform the auditory nerve signal into an invariant. It is suspected that the DCN (dorsal cochlear nucleus), the part of the cochlear nucleus located in the brainstem, might be doing this [1][2]. The DCN is very similar to the cerebellum (Purkinje cells + parallel fibers + fusiform cells). It receives inputs from the cochlea, the vestibular system, the cortex, etc., to produce the input to the auditory cortex (note that the DCN is not well understood in humans).

My understanding is that the cerebellum is doing something similar (integrating body position, pre-cortical motor commands, other sensory information, …). The cerebellum affects not only body coordination but many higher cognitive functions [3].

Perhaps the entorhinal cortex is too far away from the “input ports” to be an effective solution to this problem. The brainstem nuclei + cerebellum might be playing a key role here.

[1] D. Oertel and E. D. Young, “What’s a cerebellar circuit doing in the auditory system?,” Trends Neurosci., vol. 27, no. 2, pp. 104–110, 2004.

[2] S. Singla, C. Dempsey, A. G. Enikolopov, R. Warren, and N. B. Sawtell, “A cerebellum-like circuit in the auditory system cancels responses to self-generated sounds,” Nat. Neurosci., vol. 20, pp. 943–950, 2017.

[3] M. Ito, “Control of mental activities by internal models in the cerebellum,” Nat. Rev. Neurosci., vol. 9, no. 4, pp. 304–313, 2008.


I sympathize with the naïve doubts in that I recognize a redundancy in functionality. I wonder therefore if it would be instructive to consider the location signal, at least to the extent it originates from outside the column, as a modulating navigation mechanism. It would make sense that behavior signals should communicate directly with cortical regions if possible and not merely indirectly by way of meddling with the (possibly fantasized) world. In low regions this would generate models of concrete objects perturbed by the motor control of extremities; in higher regions it would manifest itself in for instance the composing of words in a text or the solving of a mathematical equation. How in line is all of this with what the neuroscience says?

I appreciate the point that SDRs are dimensionless and the potential significance this bears. I think, as a layman mind you, this should be emphasized in the article.


The reason we concluded that this is happening in each cortical column is that allocentric location has to be calculated for each local patch of sensory cortex. In V1 and S1 there need to be hundreds, probably thousands of separate allocentric locations. To calculate allocentric location you need local sensory input as well as motor input and a fair amount of intercolumn communication. There are no other locations in the brain that have the required amount of neural tissue and have the necessary anatomical connections. The cerebellum would be the best candidate but it just doesn’t have the connections needed (plus you can live an almost normal life without a cerebellum). That doesn’t mean the cerebellum doesn’t do something like this, just that the cortex isn’t relying on it.

Each cortical column has a large amount of neural machinery which is completely unexplained. We are working our way through how layers 5 and 6 (which are actually multiple layers each) perform the allocentric calculation. So far the anatomy and physiology is matching well.

What role allocentric location plays in audition (and higher thought) is an interesting question. I have some ideas on this for a future discussion.


Perhaps the cerebellum is “not critical” because the cortex is flexible enough to perform cerebellar functions if it fails? In any case the DCN seems to be a fundamental piece in understanding language. For example, the degree of success of auditory brainstem implants (ABIs) in children is really low [1] (ABIs are done when cochlear implants are not possible due to lack of, or damage to, the auditory nerve or cochlea).

The dendritic tree connectivity of Purkinje cells is really massive. It seems to be some sort of bus (each input bit can be combined with a large number of state signals, such as the cortex prediction, body position, etc.). Pyramidal cells seem to be more like a point-to-point network. For this kind of problem (all-to-all), it is more efficient to use a bus. For example, many supercomputers (such as the BlueGene) have a point-to-point network for regular communication and a bus-like network (a tree) for broadcast communications.
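The bus-versus-point-to-point analogy can be made concrete by counting sequential message steps for a broadcast (a toy cost model, not a claim about actual BlueGene figures):

```python
import math

def p2p_broadcast_steps(n):
    """One sender over point-to-point links: n separate sequential sends."""
    return n

def tree_broadcast_steps(n):
    """Tree/bus-style broadcast: the set of informed nodes doubles each round."""
    return math.ceil(math.log2(n)) if n > 1 else 0

# Broadcasting one state signal to 1024 targets:
print(p2p_broadcast_steps(1024))   # 1024 steps
print(tree_broadcast_steps(1024))  # 10 steps
```

This is why a dedicated broadcast structure pays off precisely when the same signal (body state, context) must reach everyone, while point-to-point links remain better for selective, addressed communication.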

Certainly language (ASR) is really interesting but hard. Just mapping different speakers’ phonemes onto the same set of columns is a tough problem :slight_smile:

[1] E. P. Wilkinson et al., “Initial Results of a Safety and Feasibility Study of Auditory Brainstem Implantation in Congenitally Deaf Children,” Otol. Neurotol., vol. 38, no. December, pp. 121–220, 2016.


I suspect the answer to this question will involve motor transforms. In sensory regions, motor signals change the orientation and state of objects (this is something we are currently trying to understand). In higher regions, the same mechanisms might allow the transformation of ideas in an abstract space. For example, when I work on problems, I feel as if the problem I am working on has structure, almost like a physical object, but not quite. I speculate that when we manipulate ideas they have their own space and “motor” capabilities and rely on these to find a model that fits the data. This probably sounds vague, because it is, but I think we can figure this out.

I actually have an experiment in mind that would exercise this idea. I need to write it up. It blurs the line between motor commands and conceptual understanding.


Hi David @cogmission,

The full video is on the Numenta YouTube channel: https://www.youtube.com/watch?v=fhnMUc36opI

As @lscheinkman mentioned, it’s also available as supplementary material supporting the paper on the bioRxiv entry: http://www.biorxiv.org/content/early/2017/07/12/162263, if you’re interested in downloading it. But if you just want to view it, the YouTube link has the full 4.5 minutes.

Thanks for the note!



I noticed in your recent papers that the hierarchical aspects of the CLA have been left aside. Are there particular reasons for this in your thinking, more than just putting the bigger picture on hold for now? I get the impression that hierarchy is considered less important now than in earlier presentations of HTM theory. From reading On Intelligence, for instance, I would have concluded that the locally predicted features pertaining to allocentric location would be strengthened and fully dealt with by higher-level context feedback.


Our view of hierarchy has changed, somewhat. We now realize that individual regions are far more powerful than we previously thought. It doesn’t make sense to focus on hierarchy until we completely understand what regions do. For example, there are several different sources of both feedforward and feedback signals in the hierarchy. These originate and terminate in different cellular layers. We need to understand what type of information these layers represent to understand what we gain from a hierarchy.

We are actually getting close to understanding all/most of this, in fact knowing that some layers project up or down the hierarchy helps us determine what those signals are, but we are not ready to focus on hierarchy yet.


BTW @jhawkins have you seen this yet: https://physics.aps.org/synopsis-for/10.1103/PhysRevLett.119.038101?
The arXiv link is https://arxiv.org/abs/1706.06914


About whether or not L6a and L5 form another instance of the input/output layer model:

I think L5 is similar to L2/3a, but it solves a related problem. L5 generates behavior by predictively firing. The way I see it, L2/3a does something similar to predictive firing, because it represents the features it might sense on the object if the sensors were to touch the correct location. So both output layers sort of predictively fire, although it’s not the same as activating predicted cells in temporal memory.

The thick-tufted/pyramidal tract cells in L5 don’t seem to represent entire external objects. Before a saccade, cells stop responding to the features in the receptive field and start responding to the features expected to be in the receptive field after the saccade (Shin and Sommer, 2012). Only ~10/50 cells in that study did both, stopping their response to the current receptive field and starting to respond to the post-saccade receptive field (most did one or the other). Still, it doesn’t make sense to keep responding to the current feature to generate a saccade to another feature.

That doesn’t mean L5 should be an input layer. It’s useful to narrow down possibilities using lateral connections between columns for any sort of prediction. The ability to narrow down possibilities is also useful for action selection. By narrowing down the represented set of features, it can narrow down what it wants to perceive as a result of moving sensors or changing the environment. There is biological evidence that this happens.

Maimon and Assad, 2006 suggest that a reverberant circuit involving the cortex and the basal ganglia causes ramping pre-movement activity, which triggers a burst of activity once it passes a threshold to cause movement. In the period leading up to a saccade, some neurons in the superior colliculus, mediodorsal thalamus, and the frontal eye field (at least pyramidal tract cells) increase their firing rates until just before the saccade, reaching firing rates above 100 Hz (Sommer and Wurtz, 2004). This is important because bursting above 100 Hz might play roles in pyramidal tract cell plasticity, lateral communication, and corticothalamic signaling, and bursting is controlled by complex disinhibitory and winner-takes-all circuits.

In addition to the ramping activity of cells which will trigger the movement, cells which do not contribute to the movement probably gradually decrease their firing rates (see Maimon and Assad, 2006), especially since pyramidal tract cells stop responding to the current visual input as part of presaccadic predictive remapping and the current visual input is not the saccade target. Since the basal ganglia also exhibits ramping activity before a movement (Lee and Assad, 2003), it could selectively gate a feedback loop from FEF to SC to MD to FEF to narrow down options based on reward learning. It could also control timing based on the amount of positive feedback (and thus rate of ramping) it allows.
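The ramp-to-threshold idea from Maimon and Assad can be sketched as a reverberant accumulator; the gain, baseline, and threshold values here are arbitrary illustrative numbers, and "gain" stands in loosely for how strongly the basal ganglia gate the positive-feedback loop:

```python
def time_to_burst(gain, baseline=10.0, threshold=100.0, dt=0.001):
    """Firing rate grows through positive feedback until it crosses
    threshold, at which point a burst triggers the movement."""
    rate, t = baseline, 0.0
    while rate < threshold:
        rate += gain * rate * dt  # reverberant cortico-basal ganglia loop
        t += dt
    return t

# Stronger gating of the feedback loop (higher gain) => earlier movement,
# so the basal ganglia could control movement timing via the ramp rate.
print(time_to_burst(gain=2.0) > time_to_burst(gain=4.0))  # True
```

The sketch also shows why selectively gating the loop implements action selection: only the cell population whose feedback is allowed to reverberate ever reaches the burst threshold.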

Another difference L5 has with L2/3a is that it should only learn to represent possibilities that the brain can cause by behavior. Otherwise, it might try to cause predicted external events. Thick-tufted cells in L5 respond well to sensory input without behavior (Oberlaender et al., 2011). However, slender-tufted cells seem to track behavior because they increase their firing rates during behavior on average, and those in barrel cortex are modulated by the phase of the whiskers (Oberlaender et al., 2011). They also receive strong input from L4 (Schubert et al., 2006) and they might be highly sensitive to the combination of self-motion and sensory input, although I’m not sure which cells this study recorded (Turner and DeLong, 2000). It seems that slender-tufted L5 cells copy L4, except they do not respond to sensory input alone, so they might track the sequence of behavior and behavioral results.

Slender-tufted cells project to thick-tufted cells, but not really vice-versa (Naka and Adesnik, 2016). Perhaps slender-tufted cells act as another input layer to thick-tufted cells. Regardless of their projection to thick-tufted cell basal dendrites, they also project to the apical tuft of thick-tufted L5 cells. Oberlaender et al., 2011 propose that this projection allows thick-tufted cells to perform coincidence detection between behavior and sensory input, since thick-tufted cells respond to temporally associated proximal and tuft input by bursting. Since pyramidal tract cells probably burst to generate movement, this is a possible mechanism to ensure that they do not burst to cause an external event.

They still might predictively fire in anticipation of an external event, which might allow them to model objects at the same time as they generate behavior. This could explain why only ~10/50 cells both stopped responding to the current input and started responding to the post-saccade input. Alternatively, since bursting might be required for LTP everywhere on the cell (Kampa et al., 2006; Ramaswamy and Markram, 2015) and cells might continue their burst for a bit after the end of the movement, thick-tufted L5 cells might learn not to predict things which behavior cannot cause.

There are other possibilities for preventing bursting unless an event was caused by behavior, such as input from motor thalamus (or e.g. POm) or inhibition of bursting by martinotti cells during quiet periods. However, those both have possible problems.

I’m not sure how the thick-tufted cells or subcortical structures convert predictions into behavior to cause those predictions. If bursting depends on slender-tufted cell input to the apical tuft, then pyramidal tract cells might learn to only ramp activity and burst if they target the correct subcortical cells to trigger the result.

I’m going to quote some parts of the paper.

Recent experimental studies found that the axons of L6 CT neurons densely ramified within layer 5a in both visual and somatosensory cortices of the mouse, and activation of these neurons generated large excitatory postsynaptic potentials (EPSPs) in pyramidal neurons in layer 5a (Kim et al., 2014) [. . .] There are three types of pyramidal neurons in L5 (Kim et al., 2015). Here we are referring to only one of them, the larger neurons with thick apical trunks that send an axon branch to relay cells in the thalamus (Ramaswamy and Markram, 2015).

In barrel cortex, thick-tufted cells are mostly confined to L5b, whereas slender-tufted cells are preferentially but not exclusively in L5a (Naka and Adesnik, 2016; Groh et al., 2009). Slender-tufted cells do not project to the thalamus, only to the striatum (Oberlaender et al., 2011). I think L6 probably targets another input layer, the slender-tufted cells (which receive thalamic input from POm, part of which is primary sensory thalamus for self-movement), equivalent to L6 targeting L4. So maybe both input/output layer pairs are equivalent, but they share an input from L6.

Although slender-tufted cells communicate between columns, they only do so weakly (Oberlaender et al., 2011), so the lateral connectivity is probably for predictive depolarization like in temporal memory. Also, even though they receive thalamic input from multiple whiskers via POm, they have narrow receptive fields (Bureau et al., 2006). They also communicate in both directions with L4 (Schubert et al., 2006), so they might serve similar functions.

It’s also possible that L6 really is the input layer to the L5 output layer, but slender-tufted cells are an intermediate step. This extra step could reflect the requirement to only predict results of behavior, whereas L2/3a can predict both externally- and behaviorally-caused input.

However, there is also empirical evidence that our model does not map cleanly to L6a and L5. For example, Constantinople and Bruno (2013) have shown that a sensory stimulus will often cause L5 cells to fire simultaneously with or even slightly before L6 cells, which is inconsistent with the model.

As far as I know, somatosensory cortex (or at least barrel cortex) is the only region where thalamus prominently drives firing in layer 5 thick-tufted cells. Those cells only receive input from narrow receptive fields of the whiskers via VPM, yet they have wide receptive fields (Manns et al., 2004). Oberlaender et al., 2011 suggest that their wide receptive fields result from strong lateral connectivity. Even though thalamus can drive their activity before L6, it probably doesn’t drive most of their responses. Perhaps a little thalamic input to the output layer helps represent the possibilities indicated by the initial input, or perhaps it helps prevent directly sensed features from getting ruled out because they aren’t considered consistent with the other features.

Division of labor in frontal eye field neurons during presaccadic remapping of visual receptive fields (Shin and Sommer, 2012)
Parietal Area 5 and the Initiation of Self-Timed Movements versus Simple Reactions (Maimon and Assad, 2006)
The time course of perisaccadic receptive field shifts in the lateral intraparietal area of the monkey (Kusunoki and Goldberg, 2002)
Putaminal Activity for Simple Reactions or Self-Timed Movements (Lee and Assad, 2003)
What the brain stem tells the frontal cortex. I. Oculomotor signals sent from superior colliculus to frontal eye field via mediodorsal thalamus (Sommer and Wurtz, 2004)
Three-dimensional axon morphologies of individual layer 5 neurons indicate cell type-specific intracortical pathways for whisker motion and touch (Oberlaender et al., 2011)
Morphology, electrophysiology and functional input connectivity of pyramidal neurons characterizes a genuine layer Va in the primary somatosensory cortex (Schubert et al., 2006)
Corticostriatal Activity in Primary Motor Cortex of the Macaque (Turner and DeLong, 2000)
Inhibitory Circuits in Cortical Layer 5 (Naka and Adesnik, 2016)
Requirement of dendritic calcium spikes for induction of spike-timing dependent synaptic plasticity (Kampa et al., 2006)
Anatomy and physiology of the thick-tufted layer 5 pyramidal neuron (Ramaswamy and Markram, 2015)
Sub- and suprathreshold receptive field properties of pyramidal neurones in layers 5A and 5B of rat somatosensory barrel cortex (Manns et al., 2004)
Cell-Type Specific Properties of Pyramidal Neurons in Neocortex Underlying a Layout that Is Modifiable Depending on the Cortical Area (Groh et al., 2009)
Interdigitated paralemniscal and lemniscal pathways in the mouse barrel cortex (Bureau et al., 2006)


TL;DR: What if location signals also have the purpose of defining the next move to make to gather input data? In that case, they would be useful for abstract objects too, in order to guide the choice of mental manipulations to apply.

I always considered the brain a machine to act, rather than a machine to perceive. After all, natural selection selected us based on our actions and reactions, not on our perceptions (perceptions only mattered evolutionarily insofar as they allowed us to perform better actions). Following this philosophy, for each part of the brain I always ask myself, “How does it help us to perform better actions?” (if it didn’t, it would have been erased by evolution).

For the areas which recognize concrete objects, I believe that the location signal does not only have the utility of allowing the recognition of 3D objects via the mechanism described in the paper this thread is about, but also that of helping the recognition process by suggesting actions that would modify perception in order to get the sensory data needed to discriminate between similar objects. For example, let’s say that I close my eyes and touch a cup with my finger. As per the model described in the paper, my finger will recognize a feature/location pair, and the column relative to my finger will generate predictions about other feature/location pairs possessed by objects that include the first pair. Now, my finger will have to touch the object in another location in order to generate another feature/location pair to help discriminate between objects. How does my brain decide how to move the finger? My guess is that it will try to move it in such a way as to reach one of the locations belonging to one of the predicted feature/location pairs. In other words, the location signal also serves the purpose of helping choose the manipulation needed to proceed in the inference task performed by the column.
Proceeding by analogy, I would suggest that for abstract concepts the location signal serves the purpose of facilitating the choice of the next manipulation to apply in order to complete whatever task the column is performing (I explain the concept of manipulations in more detail here: https://medium.com/thought-models/mental-manipulations-3b7797343aa2).
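The cup example above can be sketched as a toy candidate-elimination loop. The object names, locations, and features below are entirely hypothetical, and this is only an illustration of the idea of choosing the next touch to discriminate between candidates, not Numenta's actual algorithm:

```python
# Each object is modeled as a set of (location, feature) pairs.
OBJECTS = {
    "cup":    {("rim", "curved-edge"), ("side", "smooth"), ("handle", "loop")},
    "bowl":   {("rim", "curved-edge"), ("side", "smooth"), ("base", "flat")},
    "bottle": {("rim", "curved-edge"), ("neck", "narrow"), ("side", "smooth")},
}

def candidates(observed):
    """Objects consistent with every (location, feature) pair sensed so far."""
    return {name for name, pairs in OBJECTS.items() if observed <= pairs}

def next_touch(observed):
    """Pick the predicted pair whose presence best splits the candidates,
    i.e. the most informative location to move the finger to next."""
    cands = candidates(observed)
    predicted = set().union(*(OBJECTS[c] for c in cands)) - observed
    return min(predicted,
               key=lambda p: abs(sum(p in OBJECTS[c] for c in cands)
                                 - len(cands) / 2))

# One touch on the rim leaves all three objects possible...
touched = {("rim", "curved-edge")}
print(sorted(candidates(touched)))  # ['bottle', 'bowl', 'cup']
# ...so the predicted locations themselves suggest where to probe next.
loc, feat = next_touch(touched)
```

Touching a shared location (like the smooth side) would eliminate nothing, so the predicted-but-unshared locations are exactly the informative moves, which is the sense in which the location signal guides behavior.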


Quick question: since all the minicolumns consume the same feedforward input, is there a preference for one particular type of dendrite, such as the proximal basal dendrites, so that these feedforward inputs connect only to the proximal synapses of the neurons in the minicolumn?


The feedforward inputs to cells in a minicolumn (such as in layer 4) make large synapses that are close to the cell body (proximal). These synapses are estimated to be less than 10% of the total number of synapses.
