Should grid cell or displacement cell modules include minicolumns?

Sorry for the delay in jumping into this interesting conversation. I would like to respond to Paul’s original query and not try to address the subsequent thoughtful comments. Part of the confusion might be due to the discrepancy between the current state of the “theory” and the current state of our network “simulations”. You correctly point out that the “location layer” simulation in our recent manuscript doesn’t rely on mini-columns whereas I talked about mini-columns in the podcast I did with Matt. When we do simulations we are almost always implementing a subset of what we think the brain is actually doing. Either we don’t know enough yet to implement a more complete network and/or we pick a subset to help us better understand the results of the simulation. As long as the simulation illustrates an important point and helps us better understand the ultimate solution, then it is worth doing the simulation.

In this case the general principle of L4 and L6 interacting via unions to resolve ambiguity of location is an important idea. The simulation and network don’t include mini-columns, orientation, learning of the grid cell modules, etc. Even though we know it is not complete, we hope others find it useful. We did. BTW, a very recent paper from David Tank’s lab suggests yet another way grid cells could represent unique locations, and unions of locations. I managed to squeeze in a last-minute reference to Tank’s paper in our “Frameworks” paper that was posted last week.
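The union idea can be illustrated with plain Python sets standing in for SDRs. This is a toy sketch, not the network from the manuscript. The object models, feature names, and function names are all hypothetical, locations are integer grid coordinates, and each set element is one (object, location) hypothesis.

```python
# Hypothetical toy object models: each maps a 2-D location on the object
# to the feature a sensor would detect there. Names and data are illustrative.
OBJECTS = {
    "mug":     {(0, 0): "rim", (1, 0): "rim", (0, 1): "handle"},
    "bowl":    {(0, 0): "rim", (1, 0): "rim", (1, 1): "base"},
    "stapler": {(0, 0): "hinge", (1, 0): "top", (1, 1): "base"},
}


def candidates_for(feature):
    """Union of every (object, location) pair where `feature` was learned."""
    return {(obj, loc) for obj, model in OBJECTS.items()
            for loc, feat in model.items() if feat == feature}


def shift(hypotheses, move):
    """Path-integrate every hypothesis by the same movement vector."""
    dx, dy = move
    return {(obj, (x + dx, y + dy)) for obj, (x, y) in hypotheses}


def infer(sensations):
    """sensations: list of (movement, feature) pairs, where movement is the
    sensor displacement since the previous touch (ignored for the first)."""
    hypotheses = None
    for move, feature in sensations:
        observed = candidates_for(feature)
        if hypotheses is None:
            hypotheses = observed  # first touch: a broad union of locations
        else:
            # Keep only hypotheses whose predicted location matches the sensation.
            hypotheses = shift(hypotheses, move) & observed
    return hypotheses
```

After two touches of a rim one unit apart, the union still holds two hypotheses: the mug and the bowl, each at (1, 0). A third touch that senses "base" after moving (0, 1) narrows the union to the bowl alone.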

Now a bit about mini-columns.
The brain needs a way to represent similar inputs differently in different contexts. For example, a melody is composed of a series of intervals. The intervals, and even sequences of intervals, repeat, and yet the brain doesn’t lose track of where it is in a melody. It must have an internal state representing “this interval at this location”. Similarly, the same muscle contractions occur in different behavioral sequences, which are just like melodies. Representing something differently in different contexts is a basic need of brains. Our mini-column hypothesis addresses this functional need in an elegant way and matches numerous experimental observations.

Something like mini-columns is needed in the representation of location. As explained in the frameworks paper, objects have a location space. What occupies a particular location in that space depends on the state of the object. If my finger is at some location in the space of a stapler, what the finger feels depends on the state of the stapler: is it open or closed? Similarly, what icon appears in the corner of my smartphone display depends on the state of the smartphone. Cortical grid cells represent location, therefore we need a method of representing the same location in different contexts. Mini-columns are a logical candidate to do this.

As also mentioned in the frameworks paper, the cortex needs to learn sequences of displacement cells, therefore we suspect mini-columns are used here too. (BTW, I now think that L5 displacement cells might be the only place where pure sequence memory exists. Displacement cells are ideal for representing musical intervals, which are pitch invariant, and therefore this might be where melodies and other sequences are learned.)

We are currently trying to unite a whole slew of things that we know macro-columns must be doing. I am working on the idea that mini-columns span across layers, providing a mechanism for tying the different layers together. For example, in V1, iso-orientation slabs are created in L4. Mini-columns with these receptive fields intersect L6 grid cells (as in Tank’s paper), forming a unique representation of location based on the context of sensory input.

I hope that helps.


Thanks, Jeff, makes perfect sense. I especially like the idea of minicolumns for context in displacement cells. The main thing I’m trying to work out is this: if object state is represented in the displacement cells, it must somehow feed back to the sensory layer (since you often can only predict what you will sense if you know the state of the object). There are a number of different approaches that I am exploring (it should be interesting to see how far off the mark I am once the research advances further and more of the system is understood).

One approach is to have a feedback signal from the displacement cell layer to the grid cell layer, using minicolumns in the grid cell layer to relay state back to the sensory layer. Another approach is to have both the grid cell layer and displacement cell layer providing feedback to the sensory layer independently, so that object state is relayed to the sensory layer from the displacement cell layer directly.


Sounds like you are working on the same types of problems as we are. If we make progress on these issues we will be sure to share and compare notes. In case this helps, here are some ideas I am (we are) working on… just ideas, nothing set yet.

  • An object is composed of a set of sub-objects. These are represented by a set of displacement vectors.
  • L2/3 projects to L5 and back again. I like the idea that L2/3 is a stable representation of an object which invokes a union of displacement vectors in L5. So L2/3 is an object and L5 is the actual definition of the object, meaning a set of sub-objects at relative positions to each other.
  • By invoking an object representation in L2/3 you invoke a union of sub-objects in L5.
  • Similarly, when you observe a novel object, you serially attend to different parts, building a union of displacement vectors. This union will invoke activity in L2/3 of any objects that had similar parts at similar relationships.
  • Your conscious perception is of an object in a particular state. I perceive an open stapler or a closed stapler, not just “stapler”. This suggests that L2/3 is actually a stable representation of the object in a particular state.
  • This brings up a problem. If L2/3 is an object in a particular state then how do I know the open and closed stapler are really the same object? I can think of one possible answer.
  • Perhaps L2 represents the generic object and L3 represents the object in a particular state. L2 pools over the states in L3. We perceive L3 (which makes sense because L3 is passed to higher regions). L2 and L3 can be used for column to column voting.

L2 = base object, best for classifying, pools over possible object states in L3
L3 = object in particular state, what we perceive, invokes the correct union of displacements in L5 for the current state of the object, pools over L4
L5 = union of displacements appropriate to the current object state

Behaviors could be learned as a sequence in L5 or perhaps in L3.
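The proposed layer roles can be written down as a small data structure. This is only a sketch, and it rests on several assumptions. Python sets stand in for SDR unions. The objects, parts, and displacement values are invented, and the names `L3_STATES`, `L2_OBJECTS`, `l5_union`, and `infer` are hypothetical. Nothing here models how the layers would learn.

```python
from collections import defaultdict

# Hypothetical toy data. Each key is an object in a particular state (the
# proposed L3 representation); each value is the union of displacements
# between its sub-objects (the proposed L5 content), written as
# (from_part, to_part, (dx, dy)).
L3_STATES = {
    ("stapler", "open"):   {("base", "top", (0, 3)), ("top", "hinge", (-2, -1))},
    ("stapler", "closed"): {("base", "top", (0, 1)), ("top", "hinge", (-2, 0))},
    ("mug", "upright"):    {("base", "rim", (0, 4)), ("rim", "handle", (2, -2))},
}

# L2 pools over the states in L3: one generic object per group of states.
L2_OBJECTS = defaultdict(set)
for obj, state in L3_STATES:
    L2_OBJECTS[obj].add((obj, state))


def l5_union(l3_state):
    """Invoking an object-in-state in L3 invokes its union of displacements in L5."""
    return L3_STATES[l3_state]


def infer(observed):
    """`observed` is the union of displacements built by attending to parts
    serially. Return the L3 states that contain all of it, and the L2 objects
    that pool over those states."""
    states = {s for s, disps in L3_STATES.items() if observed <= disps}
    objects = {obj for obj, pooled in L2_OBJECTS.items() if pooled & states}
    return states, objects
```

Observing only the base-to-top displacement of the open stapler identifies both its state (L3) and, through pooling, the generic object (L2). The closed stapler maps to the same L2 entry.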


Hello, I am still trying to get a clear understanding of grid cells. What I know is that grid cells work like minicolumns in the neocortex. If that is right, is there any difference between grid cells and minicolumns? If not, why not use minicolumns instead?

We don’t make any statements about how grid cells are related to minicolumns because we don’t really understand that relationship yet.

I would only add that they are performing two different functions, which I think addresses the intent of your question:

Grid cells are a mechanism for encoding coordinates in physical space, in a way which supports path integration (i.e. a given location has the same encoding, regardless of the path taken to arrive there). For example, if I take 1 step forward and 2 steps back, that is equivalent to taking one step back, which is also equivalent to turning 180 degrees and taking 1 step forward.
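The path independence described above can be checked in a few lines of Python. This sketch makes two simplifying assumptions. First, movement is summed as 2-D vectors. Second, each hypothetical grid module is reduced to "location modulo its period" on a square lattice rather than a hexagonal one, and the periods are arbitrary.

```python
import math


def path_integrate(steps, heading_deg=0.0):
    """Sum movement vectors. Each step is (turn_degrees, distance): turn, then
    move `distance` along the new heading (negative distance = step back)."""
    heading = math.radians(heading_deg)
    x = y = 0.0
    for turn, dist in steps:
        heading += math.radians(turn)
        x += dist * math.cos(heading)
        y += dist * math.sin(heading)
    # Round away floating-point noise such as sin(pi) ~ 1e-16.
    return round(x, 9), round(y, 9)


def grid_code(location, periods=(3.0, 5.0, 7.0)):
    """Toy grid-cell code: each module reports the location modulo its own
    period (a square lattice, simplified from the hexagonal one)."""
    x, y = location
    return tuple((round(x % p, 9), round(y % p, 9)) for p in periods)


forward_then_back = [(0, 1), (0, -2)]  # 1 step forward, 2 steps back
one_back = [(0, -1)]                   # 1 step back
turn_and_step = [(180, 1)]             # turn 180 degrees, 1 step forward
```

All three paths end at (-1.0, 0.0), so they produce the same grid code, which is what path integration requires.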

Minicolumns are a mechanism for depicting the same value in different contexts. For example, in a sequence like “A B A C A D A B A”, each “A” can activate the same set of minicolumns, but different cells within those minicolumns are used for each of the five different contexts in this example.
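This toy illustrates the idea of using the same columns but different cells for different contexts. It assumes a hypothetical fixed mapping from symbols to minicolumns and deliberately skips learning. The cell index is chosen by a simple occurrence count, which only stands in for the learned context of a real Temporal Memory.

```python
COLUMNS_PER_INPUT = 4
CELLS_PER_COLUMN = 8

# Hypothetical fixed mapping: each symbol always activates the same minicolumns.
COLUMN_MAP = {sym: range(i * COLUMNS_PER_INPUT, (i + 1) * COLUMNS_PER_INPUT)
              for i, sym in enumerate("ABCD")}


def encode(sequence):
    """Return the set of active (column, cell) pairs for each element.

    Columns depend only on the symbol. The cell used inside those columns
    depends on context; as a simplification, context here is just how many
    times the symbol has already occurred. A real Temporal Memory instead
    learns this choice through distal dendritic segments."""
    seen = {}
    out = []
    for sym in sequence:
        cell = seen.get(sym, 0) % CELLS_PER_COLUMN
        seen[sym] = seen.get(sym, 0) + 1
        out.append(frozenset((col, cell) for col in COLUMN_MAP[sym]))
    return out


seq = "A B A C A D A B A".split()
codes = encode(seq)
a_codes = [code for sym, code in zip(seq, codes) if sym == "A"]
```

All five "A" codes share columns 0 to 3 but use different cells. Each occurrence is therefore distinguishable while still signaling "A".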


I can’t help but think, for now, that the potential use case of grid cells will (intuitively) be similar to a combined SP and TM, at least in terms of capabilities. I think location signals can be expressed as filtered inputs via SP receptive fields, and path integration may be achieved via the TM’s sequence-learning capability. I think the big difference will be the voting part of a grid-cell-based model and the fact that it can use multiple models simultaneously, whereas the current SP implementation is used passively. I really liked the question, as I thought about this too while reading the paper.


This seems to describe SP and TM replacing grid cells, not the other way around. The only difference here is that location is the encoded input, and it doesn’t assign any functional relevance to physical minicolumns (cells just need to have similar receptive fields and be capable of inhibiting each other, but don’t need to be stacked vertically).

I would argue that the input source to such a SP +TM system would probably be grid cells. So, ultimately not replacing them, just processing their activity.

On second thought, I’m not sure this would learn true path integration, though. The TM algorithm is meant to differentiate the same input (location, in this example) in different contexts. So one step forward and two steps back would end up represented by a different set of cells than one step back: same input, but two different contexts. This would serve the function of learning behaviors, but not path integration IMO.
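The objection can be made concrete in one dimension. `context_dependent_code` is a hypothetical stand-in for a TM-style representation that treats the whole move history as context. It is not the TM algorithm itself.

```python
def path_integrated_location(moves):
    """1-D path integration: only the net displacement matters."""
    return sum(moves)


def context_dependent_code(moves):
    """Hypothetical TM-like code for the same endpoint: the full move history
    acts as context, so the endpoint is tagged with the path taken."""
    return (path_integrated_location(moves), tuple(moves))


one_forward_two_back = [1, -2]
one_back = [-1]
```

Both paths reach location -1, but their context-dependent codes differ. That property helps with learning behaviors and works against path integration.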


This is getting weird. Moser grid cells are just cells that are active when a particular input is present. They are a “symptom” of the large scale coding of information representation in the cortical map under question.

The question is not about the individual cell but what kind of high level representation results in a pattern that has a node that is activated by spatial positioning. The local details of SP & TP should be considered as a component of this higher level coding.

To be clear: there is no “grid cell.” There is coding that makes certain cells active in response to certain inputs.


I was mainly talking from an algorithm perspective. Agree that the “grid cell” location representation had to come about as a result of processing sensory input.


Intuitively, GC and SP share the same goal: to generate an SDR, or loosely, a sparse encoding of the input. If we set aside the concept of location in GC, which IMO is acceptable since it needs to be agnostic to the type of information anyway, they become even more similar.
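For comparison, here is a stripped-down sketch of the SP side of that analogy: input bits go in, and a fixed number of winning columns come out. It assumes random potential pools and global k-winners-take-all inhibition only. The real Spatial Pooler also has permanences, learning, boosting, and optional local inhibition. The function names here are hypothetical.

```python
import random


def make_columns(num_inputs=64, num_columns=128, potential_pct=0.5, seed=42):
    """Give each column a random potential pool of input bits. Simplified:
    no permanences, learning, or boosting."""
    rng = random.Random(seed)
    return [frozenset(i for i in range(num_inputs) if rng.random() < potential_pct)
            for _ in range(num_columns)]


def sparsify(columns, active_inputs, k=5):
    """Global inhibition: the k columns with the largest overlap win
    (ties broken by the lower column index)."""
    ranked = sorted(range(len(columns)),
                    key=lambda c: (-len(columns[c] & active_inputs), c))
    return frozenset(ranked[:k])


columns = make_columns()
sdr = sparsify(columns, frozenset(range(10)))
```

The output is always exactly `k` active columns out of 128, which is the "sparse input encoding" both mechanisms are being compared on.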

Correct me if I’m wrong, but the advantage of a grid-cell-based model is that it can use multiple modules at the same time, so both a local (bits) and a global (modules) consensus are possible, whereas the SP can only use a single region and its consensus is local only. I always wished that the SP could be processed/utilized in a stateful way, so that these states could vote with each other and potentially reach a global consensus. However, this intuition may be irrelevant to biology.
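One way to picture module-level consensus is the residue view of grid modules, sketched here in one dimension with hypothetical integer periods. Each module alone is ambiguous, but intersecting the candidates from all modules pins down a unique position. This is a simplification for illustration, not Numenta's grid-cell model.

```python
from math import prod

PERIODS = (3, 5, 7)  # hypothetical module periods, pairwise coprime


def encode(x):
    """Each module reports only the phase of x within its own period."""
    return tuple(x % p for p in PERIODS)


def candidates(phase, period, span):
    """All positions in [0, span) consistent with one module's phase."""
    return {x for x in range(span) if x % period == phase}


def decode(code):
    """Global consensus: intersect the candidate sets from every module."""
    span = prod(PERIODS)  # 105 distinct positions before the code repeats
    result = set(range(span))
    for phase, period in zip(code, PERIODS):
        result &= candidates(phase, period, span)
    return result
```

For x = 52, the first module alone leaves 35 of the 105 positions possible, while all three modules together leave only 52.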

My personal belief on this (differs from Numenta) is that the SP algorithm is not happening in the brain. Instead the functions of sparsification, topology, and voting happen via L2/3 forming hex grids (not the same thing as “grid cells”). This function would be a combination of SP and TP (not TM).


I think I remember one of your posts about your idea.

My ideas/observations are best treated as purely algorithmic. I’m not a biologist/neuroscientist, but I like to study computational machines and determine their equivalence before even implementing them. In my experience, algorithms that try to solve the same problem are more likely to be in the same family of algorithms. I believe that the grid-cell-based model is a generalization of an SP.


Maybe I don’t understand what you are saying, but the SP does this when local inhibition is enabled and minicolumn competitions happen in local neighborhoods. This competition is not voting, but the global consensus is reached through local competitions.
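The following is a minimal one-dimensional sketch of local inhibition. It assumes precomputed overlap scores and a simple neighborhood rule. It is simplified from, and not identical to, the Spatial Pooler's local inhibition, which uses an inhibition radius and a target local density.

```python
def local_inhibition(overlaps, radius=2, winners_per_neighborhood=1):
    """1-D local inhibition. A column with nonzero overlap becomes active if
    fewer than `winners_per_neighborhood` neighbors within `radius` beat it
    (higher overlap, or equal overlap and a lower index)."""
    n = len(overlaps)
    active = []
    for i, ov in enumerate(overlaps):
        if ov == 0:
            continue
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        stronger = sum(1 for j in range(lo, hi)
                       if j != i and (overlaps[j], -j) > (ov, -i))
        if stronger < winners_per_neighborhood:
            active.append(i)
    return active
```

Consider overlaps [9, 8, 0, 0, 0, 2]. Global top-2 inhibition would pick columns 0 and 1. Local inhibition picks 0 and 5 instead: column 1 loses to its stronger neighbor, while column 5 wins its quiet neighborhood.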

Just to be sure I understand, you are talking about the physical manifestation of hexagonal patterns across cortex because of dendritic topology or axonal lateral projections?


If the hex-grid resonates after firing, it would act as temporal pooling in addition to spatial pooling / sparsification as described in my hex-grid post. For this to work correctly the model may have to include habituation.

BTW: it’s axonal lateral projections.


Yes, this is based on some of Bitking’s previous posts. The hex-grid post describes the spatial pooling and topology aspects in detail.

Additionally, if we assume the activity in this layer is more temporally stable (as is the “object layer” described by TBT), that implies it is also performing temporal pooling due to the temporal differential between it and the less stable layers it connects with.


Forgive my inaccurate use of terms. The local inhibition you have mentioned here and its result is what I mean about local consensus. The global consensus I’m talking about is a bit different, at least in my intuition and perspective.

When the SP does online learning, it can also be intuited that it is replicating instances of itself (states). Intuitively, each of these states has a preference for a given input. If we treat this preference as a spectrum from 0 to 1, then for a particular input A a state can sit anywhere on that spectrum (0, 0.1, 0.5, 0.99, etc.). The higher the value, the better.

Now the big question, at least for me, is: which of these states are the “healthiest”? By healthiest I mean that the state has learned something significant. These states can be mutually exclusive, overlapping, or a mix of both, so we cannot easily ignore any of them. Also, unlike backpropagation, where we can safely discard previous parameter configurations (states) because the model keeps improving by adjusting for errors, the SP cannot guarantee that State 1000 is healthier than State 99.

Another way of imagining these states is as the vertices of a graph, where each vertex is a partial solution. If there were a way to form consensus over these solutions, I strongly believe the SP could improve its capabilities. Today, to my understanding, we only use the latest state/vertex to test things (e.g. classify, cluster, or encode). I believe that a subset of the previous states is as important as the latest state. If utilized, I believe these states would be analogous to the multiple parallel modules in a grid-cell model.
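The voting-over-saved-states idea can be sketched as a simple ensemble. This is a hypothetical illustration of the proposal, not an existing SP feature. Each snapshot is just a list of per-column pools of connected input bits. Every snapshot picks its own k winners, and the consensus keeps columns that reach a quorum of votes.

```python
from collections import Counter


def winners(pools, active_inputs, k):
    """k-winners-take-all by overlap for one saved SP state
    (ties broken by the lower column index)."""
    ranked = sorted(range(len(pools)),
                    key=lambda c: (-len(pools[c] & active_inputs), c))
    return set(ranked[:k])


def consensus(snapshots, active_inputs, k=2, quorum=2):
    """Columns that at least `quorum` saved states agree should be active."""
    votes = Counter()
    for pools in snapshots:
        votes.update(winners(pools, active_inputs, k))
    return {col for col, n in votes.items() if n >= quorum}


# Hypothetical snapshots of the same 4-column SP taken at different points in
# training; each entry is one column's pool of connected input bits.
snapshots = [
    [{0, 1, 2}, {0, 1}, {5}, {6}],
    [{0, 1}, {7}, {0, 1, 2}, {6}],
    [{0}, {0, 1, 2}, {8}, {1, 2}],
]
```

For the input {0, 1, 2}, the three snapshots disagree individually, picking {0, 1}, {0, 2}, and {1, 3}. Columns 0 and 1 each collect two votes and form the consensus.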

You really need to read my hex-grid post, with its cooperation between local states in voting on a global state.
I am pretty sure this is what you are asking.

I have started reading it. To be honest, I easily get lost imagining neuro/biology structures. But thanks for mentioning it; I will certainly try to understand it and contemplate it at some point.
