Should grid cell or displacement cell modules include minicolumns?

Current implementations of the “location layer” work as a collection of grid cell modules where one or more bumps of activity are essentially “pushed around” as the location being represented changes. When the object and location being sensed are unambiguous, this means one active cell per module.

However, this seems to contradict (or at least not fully implement) other observations about the system related to encoding context by use of minicolumns, and I am having some trouble reconciling the ideas in my mind. I thought I would start a thread to get some insights from other HTM theorists.

One example where minicolumns are implied to be part of a location layer is in part 1 of the podcast (around 26:00). Jeff talked at some length about applying to locations what has been learned from TM (using minicolumns to encode context). A couple of relevant quotes from that discussion:

So, the point is, this idea that you represent or you have something, like – we started by talking about the sensory input, and I want to represent the sensory input in different contexts. I now have a location on an object, but I want to represent that location under different contexts. Because if I’m going to predict what’s going to be at that location, I need to know the context, or the state, of the cell phone or the state of the stapler.

So this basic idea … this is happening everywhere, that you represent something, like a sensory input or like a location … and I want to be able to represent it under many different contexts. And so that’s where the role of minicolumns comes into play.

The idea of minicolumns being part of the location system also seems to be implied in the Frameworks paper. A couple of relevant quotes:

As the stapler top rotates upward, the displacement of the stapler top to bottom changes. Thus, the rotation of the stapler top is represented by a sequence of displacement vectors. By learning this sequence, the system will have learned this behavior of the object.

Opening and closing the stapler are different behaviors yet they are composed of the same displacement elements, just in reverse order. These are sometimes referred to as “high-order” sequences. Previously we described a neural mechanism for learning high-order sequences in a layer of neurons (Hawkins & Ahmad, 2016).

This is, of course, a reference to minicolumns, which are how HTM learns high-order sequences.

Note that I am not necessarily implying the existence or functional relevance of cells arranged into physical minicolumns (Jeff has made this point before as well), but rather the underlying function (which is easier to visualize as arrangements of minicolumns).

In my current (perhaps limited) understanding, the “minicolumn effect” should be a basic, inherent property of a neural network where cells have a proximal receptive field close to the cell body, and several are positioned near enough together that they share similar enough receptive fields and can compete to inhibit each other.

In other words, should the bumps of activity in the grid cell modules (GCMs) actually represent more than one cell sharing a common receptive field, thus allowing a TM-like algorithm to be applied to represent context?


In my Hex-Grids post, I show how any mini-column can be a nexus of a hex-grid pattern, and that the hex grid can be of a range of sizes and orientations.

This is completely compatible with current HTM, assuming that the hex grid is used as the “macro-column” sparsing operation (as a different implementation of the spatial pooler). This replaces step 5 as described in the BAMI section below.

Spatial Pooling algorithm steps

  1. Start with an input consisting of a fixed number of bits. These bits might represent sensory data or they might come from another region elsewhere in the HTM system.
  2. Initialize the HTM region by assigning a fixed number of columns to the region receiving this input. Each column has an associated dendritic segment, serving as the connection to the input space. Each dendrite segment has a set of potential synapses representing a (random) subset of the input bits. Each potential synapse has a permanence value. These values are randomly initialized around the permanence threshold. Based on their permanence values, some of the potential synapses will already be connected; their permanences are greater than the threshold value.
  3. For any given input, determine how many connected synapses on each column are connected to active (ON) input bits. These are active synapses.
  4. The number of active synapses is multiplied by a “boosting” factor, which is dynamically determined by how often a column is active relative to its neighbors.
  5. A small percentage of columns within the inhibition radius with the highest activations (after boosting) become active, and disable the other columns within the radius. The inhibition radius is itself dynamically determined by the spread of input bits. There is now a sparse set of active columns.
  6. The region now follows the Spatial Pooling (Hebbian-style) learning rule: For each of the active columns, we adjust the permanence values of all the potential synapses. The permanence values of synapses aligned with active input bits are increased. The permanence values of synapses aligned with inactive input bits are decreased. The changes made to permanence values may change some synapses from being connected to unconnected, and vice-versa.
  7. For subsequent inputs, we repeat from step 3.
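The steps above can be sketched in plain Python. This is a minimal, hypothetical illustration (the function names `init_columns` and `sp_step` and all parameter values are assumptions, not NuPIC's API). It replaces the inhibition radius in step 5 with global inhibition, and it leaves boosting neutral instead of updating it from column activity as step 4 describes.

```python
import random

def init_columns(n_inputs, n_columns, potential_frac=0.5, perm_threshold=0.5, seed=0):
    """Step 2: give each column a random potential pool, with permanences
    initialized in a small band around the connection threshold."""
    rng = random.Random(seed)
    columns = []
    for _ in range(n_columns):
        pool = rng.sample(range(n_inputs), int(potential_frac * n_inputs))
        columns.append({i: perm_threshold + rng.uniform(-0.1, 0.1) for i in pool})
    return columns

def sp_step(input_bits, columns, boosts, perm_threshold=0.5, sparsity=0.02,
            perm_inc=0.05, perm_dec=0.02, learn=True):
    """Steps 3-6 for one input; returns the sorted indices of active columns."""
    # Step 3: count connected synapses aligned with ON input bits.
    overlaps = [sum(1 for i, p in col.items() if p >= perm_threshold and input_bits[i])
                for col in columns]
    # Step 4: apply each column's boost factor.
    boosted = [o * b for o, b in zip(overlaps, boosts)]
    # Step 5, simplified to global inhibition: the top few columns win.
    k = max(1, round(sparsity * len(columns)))
    ranked = sorted(range(len(columns)), key=lambda c: boosted[c], reverse=True)
    active = sorted(c for c in ranked[:k] if boosted[c] > 0)
    # Step 6: Hebbian permanence update on the winners' potential synapses.
    if learn:
        for c in active:
            for i, p in columns[c].items():
                delta = perm_inc if input_bits[i] else -perm_dec
                columns[c][i] = min(1.0, max(0.0, p + delta))
    return active

columns = init_columns(n_inputs=100, n_columns=200)
boosts = [1.0] * len(columns)            # boosting left neutral in this sketch
x = [1 if i % 10 == 0 else 0 for i in range(100)]
first = sp_step(x, columns, boosts)
second = sp_step(x, columns, boosts)     # same input after one learning step
print(first, second)
```

Because learning only strengthens the winners' synapses to ON bits (and leaves other columns untouched), presenting the same input again selects the same columns.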

The hex grid pattern activation can be a small patch or a large extent of the map that it resides in.

So yes, mini-columns are very much a supporting element of the upgrade to HTM theory that now describes the possible content of the “macro-column” activation. Or, to my way of thinking, the hex grid.


I haven’t yet taken the time to understand the algorithms used in the locations paper, so I might be wrong about things.

The focus of that paper seems to be on the power of narrowing down possible locations, so the neuron-level details like minicolumns are left open.

I think the main reason to use minicolumns for representing locations is that they offer massive union capacity, provided the minicolumn states are put in some sort of context. In the temporal memory, predictions would have low union capacity if they were predictions of bare minicolumn states. But since each prediction is of minicolumn states in sequence context, there is exponentially more capacity for unions of predicted states. So, for massive union capacity, put the representation in the context of something.
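A rough back-of-the-envelope calculation illustrates the capacity argument. The sketch assumes 2048 minicolumns, 40 active per pattern, and 32 cells per minicolumn (typical HTM sizes, used here only as assumptions), and it treats bits as independent, which real SDR unions are not. It estimates the chance that a random pattern is falsely matched by a union of m stored patterns, once when patterns are bare minicolumn states and once when each pattern uses one cell per active minicolumn (the state in some context). It does not model how those contexts are actually chosen.

```python
def union_false_match(n_bits, w, m):
    """Approximate chance that a random w-of-n pattern lies entirely inside
    a union of m random w-of-n patterns (bits treated as independent)."""
    density = 1.0 - (1.0 - w / n_bits) ** m   # fraction of bits ON in the union
    return density ** w                       # all w bits of the probe fall inside

N_COLUMNS, W, CELLS_PER_COLUMN = 2048, 40, 32   # assumed, typical HTM sizes

for m in (10, 50, 200):
    fp_columns = union_false_match(N_COLUMNS, W, m)
    fp_cells = union_false_match(N_COLUMNS * CELLS_PER_COLUMN, W, m)
    print(f"m={m:4d}  minicolumn states: {fp_columns:.2e}  cells in context: {fp_cells:.2e}")
```

With these numbers, the union of minicolumn states saturates after a few hundred patterns, while the union of context-specific cells stays far sparser.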

In the paper, unions of possible locations are represented as lists of sets of bumps. In biology, if you were to take the union of activity bumps (so each grid cell module has multiple bumps), it would be ambiguous since each location is represented as a combo of grid fields. Maybe that’s wrong. I assume the capacity for unions is very low since each attribute (grid field) is very ambiguous and exact combinations are important for grid cell location representations.

You could put the locations in context of the objects with which they are each consistent*. That would massively increase capacity for unions, and it’s probably useful for other reasons since allocentric locations are object-specific. It could also help distinguish locations if you used lateral context, converting a grid field representation into something closer to the combination of those grid fields. Maybe those contexts arrive on apical and basal distal dendrites.

* There might be a reason to put the location in some other context, because using minicolumns just for context or just for unions is bad. Some of the ways we currently divide the problem (objects, sequences, location, feature), or some aspects of those, are probably combined into something elegant which also does other things.

Locations, sequences, and objects all involve representations of possibilities (for sequences, minicolumn bursting represents multiple possible sequence contexts, and when two cells are predicted in a minicolumn they both activate to represent two possible causes of the sequence item.) I think the idea of representing possibilities and narrowing them down is the most powerful idea HTM has. So I think there might be a more general/elegant thing which combines those things, at least in part of how they are handled.

They all involve narrowing down possibilities, and possibilities for all three sequence/location/object are inter-dependent so using separate cells for all three things requires signalling between them to tell each other when possibilities have been narrowed down by one of them.

Also, features are currently represented as proximal input in the context of location, with no unions involved, which misses out on that potential processing power. It’s a very sparse representation, so it requires a lot of cells, which go to waste if they aren’t used for unions. That is very unlikely in biology anyway, since, as far as I know, all cortical cells, even in L4, have lateral connections.

So that’s four things, each of which could be combined with any of the other three in any aspect of its processing. Evolution would tend to take advantage of minicolumn union or context capacity once it evolved either one, and some of those four things don’t need unions or context, so it is likely that some of them are combined into the same set of cells for a more general, elegant, and powerful mechanism. Whatever it is, it probably capitalizes on the power of narrowing down unions of SDRs. I have no clue what it is; I’m just saying we should look for it. More relevant to this topic, that means grid cell modules should include minicolumns.

This would probably be done using minicolumns, rather than just making each cell selective for an arbitrary combination of location (proximal input) and object (distal input), because minicolumns are more efficient when there are far more objects than grid fields. That means you could use activity bumps organized two-dimensionally, like in the spatial pooler, so that the activity bumps could be minicolumn states.


I’m currently experimenting with grid-minicolumns. My hypothesis is that the same principles L4 & L2/3 use are also used by L5 & L6. L6 in my experiment is a TM on top of grid cells, and L5 (should) detect motions.

L6 seems more movement-related than L5 in terms of processing motor copy signals.

I need to research it more, but it seems pretty extreme, like it’s the motor copy signal receiver layer. It is also involved in sensing moving things, but I don’t know much about that, and that might not be part of its role.

L6 corticothalamic (CT) cells are strongly motion direction-selective. Or at least they’re more direction-selective than the V1 average when the stimulus is limited to a sine-wave grating of bars moving perpendicular to their orientation. That might just be one subtype. L6 corticocortical (CC) cells are not very selective for movement direction, but some do respond very strongly to input from M1 (presumably motor copy signals). In terms of number of synapses, M1 is the largest input to L6 in barrel cortex. L6 CT cells responded to M1 the most weakly of several tested cell types across multiple layers.

But in visual cortex, L6 corticocortical cells receive the vast majority of their inputs from within the same region, whereas L6 CT cells receive a lot of input from other regions, including a head rotation signal, which also modulates L6 CC cells, probably indirectly. Some types of L6 CT cells are completely unresponsive to sensory stimulation, so they are probably involved in processing behavior.

There is a circuit (a specialization) which suggests that slender-tufted layer 5 cells have a big role in processing self-movement. Basically, they receive sensory input gated by behavior.


Part of my hypothesis is that if you feed a TM location information, it should remember sequences of locations. In other words, L6 should represent the current location in the context of the previous locations. It makes sense to me that L6 uses efferent motor commands to update both the location and the context under which it got there. In my hypothesis, L6 is where path integration happens.

Layer 5 recognizes sets of L6 cells which commonly activate near in time to each other, and assigns a static, stable set of L5 cells to represent them. L5 cells then respond to movement via the L6 grids.


I think the most interesting aspect of my hypothesis that L6 grid-minicolumns drive stable L5 motion cells is this: if you force (inject) an L6 location to activate, those L6 minicolumns will burst, which represents a union of all movements through that location. If this additional L6 activity is combined with the true location info in L6, then layer 5 should output a motion which passes through both the true current location and the injected location. Layer 5 then drives the animal to move towards the injected location, assuming such a motion exists and has been learned.
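A toy version of this hypothesis can be sketched in Python. It is only an illustration under simplifying assumptions (the class `LocationSequenceMemory` and its methods are hypothetical, not Numenta code): each location owns a minicolumn, a cell stands for that location in the context of the previous cell, and forcing an unpredicted location makes its whole minicolumn burst. The bursting cells then predict the next step of every learned path through that location, i.e. the union of movements described above.

```python
from collections import defaultdict

class LocationSequenceMemory:
    """Toy TM-like memory: one minicolumn per location; each cell in it stands
    for that location in the context of how it was reached."""

    def __init__(self, cells_per_column=8):
        self.cells_per_column = cells_per_column
        self.next_free = defaultdict(int)    # location -> cells allocated so far
        self.successors = defaultdict(set)   # cell -> cells it predicts next

    def _new_cell(self, location):
        idx = self.next_free[location]
        if idx >= self.cells_per_column:
            raise RuntimeError(f"minicolumn for {location!r} is out of cells")
        self.next_free[location] += 1
        return (location, idx)

    def learn_path(self, path):
        prev = None
        for loc in path:
            if prev is None:
                # First step has no context: reuse the column's first cell if any.
                cell = (loc, 0) if self.next_free[loc] > 0 else self._new_cell(loc)
            else:
                known = [c for c in self.successors[prev] if c[0] == loc]
                cell = known[0] if known else self._new_cell(loc)
                self.successors[prev].add(cell)
            prev = cell

    def burst(self, location):
        """Unpredicted (injected) input: every learned cell in the column fires."""
        return {(location, i) for i in range(self.next_free[location])}

    def predicted_locations(self, active_cells):
        return {nxt[0] for c in active_cells for nxt in self.successors[c]}

mem = LocationSequenceMemory()
mem.learn_path(["A", "X", "B"])   # three learned movements through X
mem.learn_path(["C", "X", "D"])
mem.learn_path(["E", "X", "F"])
print(sorted(mem.predicted_locations(mem.burst("X"))))  # ['B', 'D', 'F']
```

Activating only the context-specific cell, for example `("X", 1)` when the path came from C, predicts just D; the burst predicts all three continuations.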


One of the things that JH (Jeff Hawkins) is looking for in his new take on what is going on with grids is displacement. Location is a key element, but he is looking to combine location with displacement for object recognition.

I have to point out that motion is displacement over a period of time.

The alpha rhythm of about 10 Hz gives a fixed time window: a motion (change) in some parts of the brain could well be just a displacement in other parts of the brain.

You may be onto something.


So each minicolumn is a location, and instead of going straight from movements to location, it goes from movements to sequence of locations? That’s an interesting approach. Maybe it could help with the orientation problem too since the way it gets to a location is basically the orientation, although maybe only if it codes that path in allocentric terms.

That seems useful for causing it to move to possible locations, to test if something is actually there.

Minicolumns which can burst probably can’t be corticothalamic L6 cells since they activate interneurons a lot and are involved in gain control through the whole cortical depth. They also don’t activate other cells in L6 much so aren’t suited for this.

Since L6 CT cells receive head rotation signals but weren’t activated much by motor cortex in that study, they are probably for dealing with orientation. Maybe you could link that to attention or surprise, because movement direction (like the direction you’re moving when you contact something, so basically orientation) is predictive of things. Maybe, through inhibitory cells, they trim away a bunch of candidates to leave the right orientation. Also, V1 L6 CT cells have axon arbors in the thalamus which are retinotopically either parallel or perpendicular to their preferred bar orientation, and one L6 CT type in barrel cortex seems to project to the thalamic representation of rows of whiskers (horizontal to the ground) while the other seems to project to that for arcs of whiskers (vertical). With so many line-like axon arbors, they’re probably concerned with orientation as well as behavior.

That pretty much leaves L6 corticocortical cells for grid minicolumns, which makes sense because location info should be shared with other parts of the cortex. They also have axons into other cortical columns, more or less unlike corticothalamic cells in layer 6. Those could be used for voting on the location like in the object output layer or temporal memory. Even though location isn’t sensory, it still needs to deal with multiple locations because it can touch multiple features at once (if it operates in terms of feature locations) and has multiple sensory patches (if it operates in terms of the location of the sensory patch). Maybe it’s temporal memory in the same column and voting between columns. L6 corticocortical cells are still closely tied to the sensory input since they have stronger sensory responses than L6 CT cells, but that makes sense for anchoring based on sensory input.

L6 corticocortical cells have elevated initial firing rates. They’re a lot like thalamic relay cells in that way because, I’m guessing, if they stay depolarized they won’t be able to fire rapidly again, so they need a break from depolarization (even subthreshold) to fire like that again. Corticothalamic cells have an elevated initial firing rate but to a lesser extent than CC cells and their first spike produces a very weak response in other cells.

That enhanced initial response could be useful for grid cell minicolumns. When the sensor reaches a location, the L6 corticocortical cells could respond, and then they could stop responding as long as it stays in that location because if it’s still at the same location, it isn’t reaching that location. That would make them tied to behavior because movement causes it to reach a location.

Another possible use of this characteristic is that if these corticocortical cells are depolarized ahead of time (i.e. predictively), they won’t fire as much when the proximal input activates them. That could just silence them, or it could provide sequence context if only the most strongly predictively depolarized cells fire. That could be the brain’s implementation of minicolumn bursting versus not bursting when the minicolumn was predicted.

How does that initially elevated firing link to the thalamus, where something similar happens? A popular theory is that the thalamus sends a stronger signal when something is surprising. If something is surprising, it is at a new location on an object because it’s a completely new object. The same idea of surprise also applies to the L6 corticocortical cells.

Sorry about the rambling.


There’s some evidence that L6 corticothalamic cells are closely tied to oscillations around that frequency. They project to the thalamus and to the thalamic reticular nucleus (TRN), which can almost certainly produce spindle oscillations independently. When you activate a bunch of them with optogenetics, a slow oscillation follows after the laser is turned off.

L6 CT outputs facilitate (their synapses show short-term facilitation), which could be a mechanism for basically taking the antiderivative of motion to get displacement, since the signal increases the longer the sensor moves and the further it goes.

So that’s two ways for L6 CT cells to be involved in determining displacement. They also seem concerned with changes in orientation (they receive a strong head rotation signal). They could path-integrate rotation into an angle the same way. So they have a direction and a distance to work with to get displacement.

Or at least the cells downstream get that information.
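The integration itself can be written out explicitly. The sketch below is a plain numerical illustration, not a model of facilitating synapses: the function name `path_integrate` is hypothetical, it uses simple Euler integration, and it assumes a 0.1 s sample period only to echo the roughly 10 Hz alpha window mentioned earlier in the thread. Turning rate integrates into heading, and speed along that heading integrates into displacement.

```python
import math

def path_integrate(steps, dt=0.1):
    """Integrate (speed, turn_rate) samples into displacement and heading.

    steps: iterable of (speed in units/s, turn_rate in rad/s) pairs.
    dt: sample period in seconds (0.1 s is an assumption for illustration).
    Returns (dx, dy, heading).
    """
    x = y = heading = 0.0
    for speed, turn_rate in steps:
        heading += turn_rate * dt             # rotation -> angle
        x += speed * math.cos(heading) * dt   # motion -> displacement
        y += speed * math.sin(heading) * dt
    return x, y, heading

steps = ([(1.0, 0.0)] * 10              # 1 s straight ahead at 1 unit/s
         + [(0.0, math.pi / 2)] * 10    # 1 s turning left at 90 deg/s
         + [(1.0, 0.0)] * 10)           # 1 s straight ahead again
dx, dy, heading = path_integrate(steps)
print(round(dx, 6), round(dy, 6), round(math.degrees(heading), 6))
```

The result is a displacement of about one unit along each axis with a final heading of 90 degrees: direction and distance combined into a single displacement.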

...thinking out loud, not too relevant...

They send some signals to excitatory cells in the same layer, but mostly to other layers. Their signals to interneurons, at least in the same layer, depress, except for some rarer subtypes (probably Martinotti cells, for example), so that’s not really suited for this. Maybe this downstream path-integrated displacement is sent to layer 5 slender-tufted cells (L5a), but L6 CT cells don’t seem to send many signals to L5 in primates (I still need to research that more, though). So the only targets left are thalamic relay cells and L4 (the anterior thalamic nuclei, which I think handle the head direction signals in the hippocampus/EC system, lack connections with the TRN). L4 seems more likely.

So I guess this would have to be the output of the location system, sent to L4 as location context for the sensory inputs. Or maybe studies just don’t detect connections from L6 CT cells to other L6 cells because they are initially weak.


Sorry for the delay in jumping into this interesting conversation. I would like to respond to Paul’s original query and not try to address the subsequent thoughtful comments. Part of the confusion might be due to the discrepancy between the current state of the “theory” and the current state of our network “simulations”. You correctly point out that the “location layer” simulation in our recent manuscript doesn’t rely on mini-columns whereas I talked about mini-columns in the podcast I did with Matt. When we do simulations we are almost always implementing a subset of what we think the brain is actually doing. Either we don’t know enough yet to implement a more complete network and/or we pick a subset to help us better understand the results of the simulation. As long as the simulation illustrates an important point and helps us better understand the ultimate solution, then it is worth doing the simulation.

In this case the general principle of L4 and L6 interacting via unions to resolve ambiguity of location is an important idea. The simulation and network don’t include mini-columns, orientation, learning of the grid cell modules, etc. Even though we know it is not complete, we hope others find it useful. We did. BTW, a very recent paper from David Tank’s lab suggests yet another way grid cells could represent unique locations, and unions of locations. I managed to squeeze in a last-minute reference to Tank’s paper in our “Frameworks” paper that was posted last week.

Now a bit about mini-columns.
The brain needs a way to represent similar inputs differently in different contexts. For example, a melody is composed of a series of intervals. The intervals, and even sequences of intervals, repeat, and yet the brain doesn’t lose track of where it is in a melody. It must have an internal state representing “this interval at this location”. Similarly, the same muscle contractions occur in different behavioral sequences, which are just like melodies. Representing something differently in different contexts is a basic need of brains. Our mini-column hypothesis addresses this functional need in an elegant way and matches numerous experimental observations.

Something like mini-columns is needed in the representation of location. As explained in the frameworks paper, objects have a location space. What occupies a particular location in that space depends on the state of the object. If my finger is at some location in the space of a stapler, what the finger feels depends on the state of the stapler: is it open or closed? Similarly, what icon appears in the corner of my smart phone display depends on the state of the smart phone. Cortical grid cells represent location, therefore we need a method for representing the same location in different contexts. Mini-columns are a logical candidate to do this.

As also mentioned in the frameworks paper, the cortex needs to learn sequences of displacement cells, therefore we suspect mini-columns are used here too. (BTW, I now think that L5 displacement cells might be the only place where pure sequence memory exists. Displacement cells are ideal for representing musical intervals, that is pitch invariance, and therefore this might be where melodies and other sequences are learned.)

We are currently trying to unite a whole slew of things that we know macro-columns must be doing. I am working on the idea that mini-columns span across layers providing a mechanism for tying the different layers together. For example, in V1, iso-orientation slabs are created in L4. Mini-columns with these receptive fields intersect L6 grid cells (as in Tank’s paper) forming a unique representation of location based on the context of sensory input.

I hope that helps.


Thanks, Jeff, that makes perfect sense. I especially like the idea of minicolumns for context in displacement cells. The main thing I’m trying to work out is that if object state is represented in the displacement cells, it must somehow feed back to the sensory layer (since you often can only predict what you will sense if you know the state of the object). There are a number of different approaches that I am exploring (it should be interesting to see how far off the mark I am once the research advances further and more of the system is understood).

One approach is to have a feedback signal from the displacement cell layer to the grid cell layer, using minicolumns in the grid cell layer to relay state back to the sensory layer. Another approach is to have both the grid cell layer and displacement cell layer providing feedback to the sensory layer independently, so that object state is relayed to the sensory layer from the displacement cell layer directly.


Sounds like you are working on the same types of problems as we are. If we make progress on these issues we will be sure to share and compare notes. In case this helps, here are some ideas I/we are working on… just ideas, nothing set yet.

  • An object is composed of a set of sub-objects. These are represented by a set of displacement vectors.
  • L2/3 projects to L5 and back again. I like the idea that L2/3 is a stable representation of an object which invokes a union of displacement vectors in L5. So L2/3 is an object and L5 is the actual definition of the object, meaning a set of sub-objects at relative positions to each other.
  • By invoking an object representation in L2/3 you invoke a union of sub-objects in L5
  • Similarly, when you observe a novel object, you serially attend to different parts, building a union of displacement vectors. This union will invoke activity in L2/3 of any objects that had similar parts at similar relationships.
  • Your conscious perception is of an object in a particular state. I perceive an open stapler or a closed stapler, not just “stapler”. This suggests that L2/3 is actually a stable representation of the object in a particular state.
  • This brings up a problem. If L2/3 is an object in a particular state then how do I know the open and closed stapler are really the same object? I can think of one possible answer.
  • Perhaps L2 represents the generic object and L3 represents the object in a particular state. L2 pools over the states in L3. We perceive L3 (which makes sense because L3 is passed to higher regions). L2 and L3 can be used for column to column voting.

L2 = base object, best for classifying, pools over possible object states in L3
L3 = object in particular state, what we perceive, invokes the correct union of displacements in L5 for the current state of the object, pools over L4
L5 = union of displacements appropriate to the current object state

Behaviors could be learned as a sequence in L5 or perhaps in L3.
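The proposed division of labor can be sketched as plain data structures. Everything here is a hypothetical toy: the object names, sub-object names, and offsets are invented, and set containment stands in for whatever neural matching actually occurs. Each object state (L3) maps to a set of displacements (its L5 union), L2 pools over all states of an object, and observing a displacement narrows the candidate states and objects.

```python
from typing import Dict, FrozenSet, Set, Tuple

# (sub-object, sub-object, relative offset); invented for illustration.
Displacement = Tuple[str, str, Tuple[int, int]]

OBJECT_STATES: Dict[str, Dict[str, FrozenSet[Displacement]]] = {
    "stapler": {
        "closed": frozenset({("top", "base", (0, 1)), ("hinge", "top", (-3, 0))}),
        "open": frozenset({("top", "base", (1, 3)), ("hinge", "top", (-2, 2))}),
    },
    "hole_punch": {
        "default": frozenset({("lever", "base", (0, 1)), ("hinge", "lever", (-3, 0))}),
    },
}

def l5_union(obj: str, state: str) -> FrozenSet[Displacement]:
    """L3 (object in a particular state) invokes its union of displacements in L5."""
    return OBJECT_STATES[obj][state]

def l2_pool(obj: str) -> Set[Displacement]:
    """L2 (base object) pools over the displacements of every state."""
    return set().union(*OBJECT_STATES[obj].values())

def infer(observed: Set[Displacement]):
    """Return (L3 candidates, L2 candidates) consistent with the observations."""
    l3 = {(o, s) for o, states in OBJECT_STATES.items()
          for s, disps in states.items() if observed <= disps}
    l2 = {o for o, _ in l3}
    return l3, l2

print(infer({("top", "base", (1, 3))}))  # ({('stapler', 'open')}, {'stapler'})
```

In this toy, the open and closed states are distinct L3 candidates, but both resolve to the same L2 object, which is one way to read the question of how we know they are the same stapler.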


Hello, I am still trying to get a clear understanding of grid cells. What I know is that grid cells work like minicolumns in the neocortex. If so, is there any difference between grid cells and minicolumns? If not, why not use minicolumns instead?

We don’t make any statements about how grid cells are related to minicolumns because we don’t really understand yet.


I would only add that they are performing two different functions, which I think addresses the intent of your question:

Grid cells are a mechanism for encoding coordinates in physical space, in a way which supports path integration (i.e. a given location has the same encoding, regardless of the path taken to arrive there). For example, if I take 1 step forward and 2 steps back, that is equivalent to taking one step back, which is also equivalent to turning 180 degrees and taking 1 step forward.
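The path-independence property can be illustrated with a simplified grid code. This sketch assumes square rather than hexagonal lattices, integer step sizes, and three hypothetical module periods of 3, 4, and 5 steps; each module encodes position modulo its period. All three movement sequences from the example end at the same position and therefore produce the same code.

```python
# Hypothetical module periods in steps; real modules have non-integer scales
# and hexagonal, not square, lattices.
MODULE_PERIODS = (3, 4, 5)
HEADINGS = {0: (1, 0), 90: (0, 1), 180: (-1, 0), 270: (0, -1)}

def integrate(actions, heading=0):
    """actions: list of (turn_degrees, signed_steps). Returns the final (x, y)."""
    x = y = 0
    for turn, steps in actions:
        heading = (heading + turn) % 360
        dx, dy = HEADINGS[heading]
        x += dx * steps
        y += dy * steps
    return x, y

def grid_code(x, y):
    """One phase pair per module: position modulo that module's period."""
    return tuple((x % p, y % p) for p in MODULE_PERIODS)

forward_then_back = [(0, 1), (0, -2)]   # 1 step forward, 2 steps back
straight_back = [(0, -1)]               # 1 step back
turn_and_step = [(180, 1)]              # turn 180 degrees, 1 step forward

codes = {grid_code(*integrate(p)) for p in (forward_then_back, straight_back, turn_and_step)}
print(codes)  # {((2, 0), (3, 0), (4, 0))}
```

Because 3, 4, and 5 are pairwise coprime, the combined code is unique over 60 consecutive positions along each axis, which is one simple way to see why several modules together can represent many locations.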

Minicolumns are a mechanism for depicting the same value in different contexts. For example, in a sequence like “A B A C A D A B A”, each “A” can activate the same set of minicolumns, but different cells within those minicolumns are used for each of the five different contexts in this example.
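A toy version of this idea, loosely modeled on how the temporal memory picks cells, is sketched below. It is a simplification under stated assumptions, not the TM algorithm itself: each symbol owns one minicolumn, the cell used inside it is determined by the cell that was active on the previous step, and a new cell is allocated the first time a context is seen. The function name `assign_context_cells` is hypothetical.

```python
from collections import defaultdict

def assign_context_cells(sequence):
    """Toy TM-style assignment: each symbol owns a minicolumn, and which cell
    in it is used depends on the cell active on the previous step."""
    cell_for = {}              # (symbol, previous cell) -> cell
    used = defaultdict(int)    # symbol -> number of cells allocated so far
    prev, cells = None, []
    for sym in sequence:
        key = (sym, prev)
        if key not in cell_for:
            cell_for[key] = (sym, used[sym])   # new context: allocate a new cell
            used[sym] += 1
        prev = cell_for[key]
        cells.append(prev)
    return cells

cells = assign_context_cells("A B A C A D A B A".split())
a_cells = [c for c in cells if c[0] == "A"]
print(a_cells)  # [('A', 0), ('A', 1), ('A', 2), ('A', 3), ('A', 4)]
```

The second B also gets a new cell, because the A before it was a different cell; that is why the final A differs from the second A even though both follow a B.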


I can’t help but think for now that the potential use case (intuitively) of grid cells will be similar to a combined SP and TM at least in terms of their capabilities. Location signals I think can be expressed as filtered inputs due to SP receptive fields and path integration may be achieved via a TM’s sequence learning capability. I think the big difference will be the voting part of a grid-cell-based model and that it can make use of multiple models simultaneously, whereas the current SP implementation is passively utilized. I really liked the question as I’ve thought of this while reading the paper.


This seems to describe SP and TM replacing grid cells, not the other way around. The only difference here is that location is the encoded input, and it doesn’t assign any functional relevance to physical minicolumns (cells just need to have similar receptive fields and be capable of inhibiting each other, but don’t need to be stacked vertically).

I would argue that the input source to such a SP +TM system would probably be grid cells. So, ultimately not replacing them, just processing their activity.

On second thought, I’m not sure this would learn true path integration, though. The TM algorithm is meant to differentiate the same input (location in this example) in different contexts. So one step forward and two steps back would end up represented by a different set of cells than one step back: same input, but two different contexts. This would serve the function of learning behaviors, but not path integration, IMO.
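The distinction can be made concrete with a small sketch. Both encoders are hypothetical stand-ins: `grid_like_code` keeps only the final position modulo a few assumed module periods, while `context_code` keeps the whole sequence of visited positions as a crude proxy for TM-style context. The two paths from the example share a grid-like code but have different context codes.

```python
def positions(path):
    """path: signed 1-D steps (+1 forward, -1 back). Returns visited positions."""
    pos, visited = 0, [0]
    for step in path:
        pos += step
        visited.append(pos)
    return visited

def grid_like_code(path, periods=(3, 4, 5)):
    """Path-independent: depends only on where the path ends."""
    end = positions(path)[-1]
    return tuple(end % p for p in periods)

def context_code(path):
    """TM-like proxy: the final location in the context of every prior location."""
    return tuple(positions(path))

forward_then_back = [+1, -1, -1]   # one step forward, two steps back
one_back = [-1]                    # one step back

print(grid_like_code(forward_then_back), grid_like_code(one_back))  # (2, 3, 4) (2, 3, 4)
print(context_code(forward_then_back), context_code(one_back))      # (0, 1, 0, -1) (0, -1)
```

The context-dependent representation is what you would want for learning behaviors, while the grid-like one is what path integration requires.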


This is getting weird. Moser grid cells are just cells that are active when a particular input is present. They are a “symptom” of the large scale coding of information representation in the cortical map under question.

The question is not about the individual cell but what kind of high level representation results in a pattern that has a node that is activated by spatial positioning. The local details of SP & TP should be considered as a component of this higher level coding.

To be clear: there is no “grid cell.” There is coding that makes certain cells active in response to certain inputs.


I was mainly talking from an algorithm perspective. I agree that the “grid cell” location representation had to come about as a result of processing sensory input.