Locations in the Neocortex: A Theory of Sensorimotor Object Recognition Using Cortical Grid Cells

Marcus Lewis, Scott Purdy, Subutai Ahmad, Jeff Hawkins

The neocortex is capable of modeling complex objects through sensorimotor interaction but the neural mechanisms are poorly understood. Grid cells in the entorhinal cortex represent the location of an animal in its environment, and this location is updated through movement and path integration. In this paper, we propose that grid-like cells in the neocortex represent the location of sensors on an object. We describe a two-layer neural network model that uses cortical grid cells and path integration to robustly learn and recognize objects through movement. Grid cells exhibit regular tiling over environments and are organized into modules, each with its own scale and orientation. A single module encodes position within the spatial scale of the module but is ambiguous over larger spaces. A set of modules can uniquely encode many large spaces. In our model, a layer of cells consisting of several grid-like modules represents a location in the reference frame of a specific object. Another layer of cells which processes sensory input receives this location input as context and uses it to encode the sensory input in the object’s reference frame. Sensory input causes the network to invoke previously learned locations that are consistent with the input, and motor input causes the network to update those locations. Simulations show that the model can learn hundreds of objects even when object features alone are insufficient for disambiguation. We discuss the relationship of the model to cortical circuitry and suggest that the reciprocal connections between layers 4 and 6 fit the requirements of the model. We propose that the subgranular layers of cortical columns employ grid cell like mechanisms to represent object specific locations that are updated through movement.
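The abstract's point that a single grid module is ambiguous while several modules together uniquely encode a large space can be sketched as a toy residue code. This is my own 1D illustration with made-up integer scales; the paper's modules are 2D and continuous:

```python
# Toy sketch (not the paper's model): several grid modules, each
# ambiguous on its own, jointly encode a large range of locations.
# The scales [3, 4, 5] are hypothetical values chosen for illustration.

scales = [3, 4, 5]  # each module encodes position modulo its scale

def encode(position):
    """Return the tuple of per-module phases for an integer position."""
    return tuple(position % s for s in scales)

# A single module repeats every `scale` steps, so it is ambiguous...
assert encode(1)[0] == encode(4)[0]  # module 0 cannot tell 1 from 4

# ...but the combined code is unique over lcm(3, 4, 5) = 60 positions.
codes = {encode(p) for p in range(60)}
print(len(codes))  # 60 distinct codes -> unique over the full range
```

The number of uniquely encodable positions grows multiplicatively with the number of modules, which is why a small set of modules can cover many large spaces.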


What a great surprise this morning!



Thanks for this Rhyolight et al.

Great paper. I really enjoyed going through it. The code will be next. :slight_smile:

If you are interested, I think that I found one minor error so far: On page 6, column 2, paragraph 2, line 2, “Eq. (7)” should read “Eq. (6)”


I bumped this topic because we are discussing grid cell modules at the next HLC call and I am going to try and present a little about this paper.

A question I have:

At each point in time a neuron is either active or inactive. In the sensory layer, each neuron’s output is a function of its two inputs

Is this referring to the two inputs of sensory input and location input? Or is this referring to the basal dendrite segment and the proximal dendrite segment?

Has the way in which the system deals with object orientation been published?


It's the distal and proximal dendritic segments.
A single neuron cannot integrate sensor and location; the whole macrocolumn does.

Your second question: I guess not.

What exactly do you mean?
How a macrocolumn figures out whether something is on its head or not?


@jensmorawe have you read the paper?

Multiple dendritic segments enable each neuron to robustly recognize independent sparse patterns, and thus be associated with multiple location or sensory contexts

Don’t make it more complicated than it is.
“neuron’s output is a function of its two inputs”
What are the two inputs to a neuron?
In HTM, it’s the proximal and distal dendrites.
End of story; there is no more to it.
Of course we know how the whole HTM algorithm works, so of course a single neuron needs to activate in a couple of different contexts, and those contexts get distinguished by different distal segments.
But a single neuron never represents a location on its own. The sparse activation does, of which the neuron is a part.

Read the paper…

IIRC it is referring to both.
The sensory inputs connect to the proximal dendrites of the input-layer neurons and are modeled as a spatial pooler.
The location inputs connect to the distal dendrites of the input-layer neurons and are modeled as a temporal memory.
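A rough sketch of how those two input types combine, in the usual HTM fashion where proximal input drives activation and distal input only predicts it. This is my own simplification, not Numenta's code:

```python
# Minimal sketch (my own simplification, not the paper's implementation)
# of a neuron whose output is a function of its two inputs: proximal
# sensory input drives activation; distal location input predicts it.

def neuron_state(proximal_active, distal_match):
    """Return the neuron's state given its two input types.

    proximal_active: feedforward sensory input exceeds threshold
    distal_match:    a distal segment recognizes the location context
    """
    if proximal_active and distal_match:
        return "active-predicted"   # fires first, inhibits neighbors
    if proximal_active:
        return "active-unexpected"  # bursts with its minicolumn
    if distal_match:
        return "predicted"          # depolarized, but no output spike
    return "inactive"

print(neuron_state(True, True))   # active-predicted
```

So neither input alone makes a neuron "represent" a location; the distal side only biases which neurons win when the proximal input arrives.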


No, but there is much speculation…

My hypothesis is that orientation is a special type of location information, and that the algorithms dealing with object location should also work with object orientation.


It feels like there should be a normalization of the orientation. If the orientation becomes another dimension, like the location, then the number of sequences to learn expands enormously.

Consider a sense like touch and a thin stick as an object. Is there any need for orientation to detect the features? The finger is either running along the length or at the end. If there are basic features like “edge” and “tip” then it is not the object orientation that allows for the detection of the features but the features that allow for the object orientation detection. That orientation is relative to the finger, which would in turn be relative to the arm etc to get a position relative to the body.

This leads to a strange idea - what if the neocortex allows for dynamic grids? So a few features allow for orienting a grid then the grid can normalize the sensory input.


A clarification: these are not sequences of sensory features, but rather sensory features @ locations.

A counterpoint: yes if you consider orientation then the number of things to learn expands enormously, but is that a problem? In theory, these models have a large capacity to learn many things. The capacity of an SDR to represent things is exponential with respect to its size. It seems to me that if a model can cope with locations, then locations & orientations is only marginally more difficult.
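The "exponential with respect to its size" claim is just combinatorics: the number of distinct SDRs with w active bits out of n is n choose w. A quick back-of-envelope with hypothetical numbers (not taken from the paper):

```python
# Back-of-envelope (my own numbers, not the paper's) for the claim that
# SDR representational capacity grows exponentially with SDR size.
from math import comb

n, w = 1024, 20  # hypothetical SDR size and number of active bits
print(comb(n, w))  # distinct SDRs with exactly w of n bits active (~1e41)
```

Even modest SDR sizes give astronomically many distinct codes, which is the basis for the "capacity is not the bottleneck" intuition.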


On the clarification, the system is working on sequences of features@location: “Objects are disambiguated over time through successive sensations and movements”

Is capacity an issue? It seems to be one of the main concerns in the paper and the model does not scale as well as one might hope: “This model’s performance depends on its ability to unambiguously represent multiple locations simultaneously. A grid cell code’s union capacity does not scale exponentially with number of modules”
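The union-capacity concern quoted above can be made concrete: when a grid-module code holds a union of candidate locations, every combination of one active phase per module is consistent with that union, so false positives appear quickly. A toy illustration (my own, not the paper's simulation):

```python
# Toy sketch of why a union of locations in a grid-module code admits
# false positives. Three modules; each location is one phase per module.
import itertools

# Three stored candidate locations
locations = [(0, 1, 2), (1, 2, 0), (2, 0, 1)]

# The union code keeps, per module, the set of all active phases
union = [set(loc[m] for loc in locations) for m in range(3)]  # 3 x {0,1,2}

# Any combination of one phase per module matches the union, so 27
# locations are consistent even though only 3 were stored.
consistent = list(itertools.product(*union))
print(len(consistent))  # 27
```

The number of spurious matches grows as the product of the per-module set sizes, which is why union capacity does not scale exponentially with the number of modules the way single-location capacity does.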

I would be suspicious of there being huge amounts of redundant information unless that is serving to provide robustness. Which is why some sort of normalization feels more reasonable - but I have no idea if there is neuroscience supporting that.