HTM Mini-Columns into Hexagonal Grids!

I have been thinking about Calvin tiles for a very long time: how they might work, and the neural substrates they depend on. I was reviewing some old notes on the Braddick & Sleigh book “Physical and Biological Processing of Images” this evening and found this note.

Title page of the notes with a date in the margin:

And this comment in my notes:

Sorry about the sloppy handwriting - this was me thinking aloud as I read the text.

While those are possible connections, they are not the correct distance away.
The lateral connections are usually a fixed distance apart.
You should like them - they are natural triangles.

More details on the geometry here:


I’m quite late to the party, but the lateral connections are mGluR, right? (Efference copies of L6 descending axons [1].) Those “nasty” synapses are really “interesting”: they integrate (sum) over time, are quite persistent, and have only modulatory effects on proximal dendrites. My understanding is that the drivers (iGluR) are 10x more powerful.

[1] C. C. Lee and S. M. Sherman, “Modulator property of the intrinsic cortical projection from layer 6 to layer 4,” Front. Syst. Neurosci., vol. 3, pp. 1–5, 2009.

How about layer 2/3?
These are the ones that most concern me for grid-forming behavior.

Also, do you know how these projection axons interact with the surrounding inhibitory interneurons?
This question is addressed to both layers 2/3 and 6.


Those synapses are on L4 pyramidal cell proximal dendrites, i.e. they modulate the cortical column “input”. I guess they are what tips the balance toward the winners of the inhibitory competition. The axons are really dense; according to this review [1] they are a big contributor to the column input (at least for V1). Since the shape of the L6 axons is hexagon-to-hexagon, I thought those were your “lateral” inputs. Sorry about that.

[1] T. Binzegger, R. J. Douglas, and K. A. C. Martin, “A Quantitative Map of the Circuit of Cat Primary Visual Cortex,” J. Neurosci., vol. 24, no. 39, pp. 8441–8453, 2004.


@Bitking Because of conversation elsewhere, I reinvestigated these ideas again, and I again brought it up with Jeff. My first questions were on the L2/3 axonal projections:

These axons have one major bifurcation, where one path goes out of the cortex, so we are only talking about the path that stays in the cortex. The axons do not split up into lots of paths inside the cortex as shown on the right below.

The picture on the left looks right (except for missing the path out of cortex). But I can’t find evidence for the picture on the right. The axon doesn’t split up into a cloud to create this circle of influence. This cortical-cortical axon will create a cluster of synapses about 0.5mm away from the soma, then continue onward in one direction to do it again later. It’s more like a subway line than a hub.


Why are we restricting the path to local operations?
The L2/3 distant reciprocal connections are running in parallel with the local competition. This is part of the H of HTM.

BTW: where did you get the idea that there is a single bifurcation?
I have my research papers to draw on, but even a casual Google search (pyramidal axonal projections topology; shown as images) shows that this is not so:


The original Calvin book was my starting place. In the text, he acknowledges that he was interpreting the work of a neurobiologist and that this is not his direct specialty. I have invested the time to work through a more detailed version that is more biologically plausible. That said - he offered enough of a starting point that I could see what he was trying to convey.

I have invested a considerable amount of text on how the fuzzy cloud of axonal projections around a given cell establishes enough variation to support learning a range of scaling/rotation/translation possibilities; this is a feature and not a bug.

As far as linear vs. triangles: for the self-reinforcement/inhibition mechanism, the shortest path back to the cell is a triangle. Those three cells, and the inhibitory basket cells between them, form a self-reinforcing group. At each node of this triangle there are many cells that could theoretically respond; what is special about these three is that they are all seeing some part of a pattern that was learned at some point in the past, and they are the best possible match at this time. They vote to prove this. Even if each and every one did not actually see “this” pattern, they are generalizing that this is the best match right now. The joy of a distributed pattern is that this little triangle unit is voting at the same time as tens or hundreds of other units, and bad matches get shouted down if the general population thinks that a small subsection is wrong. This is actually very similar to the TBT model.

Last note - keep in mind that the basic HTM mechanism assumes a unit signal per unit time. The hex-grid voting mechanism is actually implemented with a phase/rate mechanism. We already acknowledge that phase is critical to the temporal voting mechanism. Phase + rate is important to the hex-grid formation.
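The triangle-growth idea above can be sketched in a few lines of Python. This is only a geometric toy under the fixed-spacing assumption (no dendrites, no inhibition, no phase/rate signalling): start from one seed cell and let every active cell recruit partners one lattice spacing away in the six 60-degree directions. The function name and parameters are mine, purely for illustration.

```python
import math

def grow_hex_grid(seed, spacing, steps):
    """Toy sketch: grow a triangular (hex) lattice of active cells from a
    single seed. Each active cell recruits partners one lattice spacing
    away in the six 60-degree directions; any three mutually adjacent
    cells form the self-reinforcing triangle unit described above."""
    directions = [(math.cos(math.radians(60 * k)),
                   math.sin(math.radians(60 * k))) for k in range(6)]
    active = {seed}
    for _ in range(steps):
        recruited = set()
        for (x, y) in active:
            for dx, dy in directions:
                # round so points reached by different paths coincide
                recruited.add((round(x + spacing * dx, 6),
                               round(y + spacing * dy, 6)))
        active |= recruited
    return active

grid = grow_hex_grid((0.0, 0.0), spacing=0.5, steps=2)
# two recruitment steps yield a 19-node hexagonal patch in which every
# cell's nearest neighbour sits one lattice spacing (0.5) away
```

The point of the toy is only that fixed-length reciprocal links plus triangle closure are geometrically sufficient to tile a patch hexagonally; everything else (learning, inhibition, phase) is left out.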


I’m not making any comment on the boundaries of cortical columns here.

Yes, they are all running in parallel, but this is not the hierarchy. How are lateral connections between L2/3 neurons part of the hierarchy?

I got it from Jeff, but now I am questioning it in light of Levitt et al., 1993.

He’s OOTO the rest of the week, so this may need to wait until Monday.


Here is a good example showing 3 clusters of horizontal long-distance axonal connections of a L2/3 pyramidal cell:


I note that there are 3 clusters. Coincidence, or a consequence of Bitking’s hex-grid theory?

One further question for Bitking: you talk about 0.5mm long-distance connections, but I more often see a distance of several mm in the literature. Is it because those articles focus on V1, which has different characteristics? (Gabor filter vs. hex-grid?)


Papers like the ones referenced here?


My perspective on this is that long-distance lateral connections (besides being a voting mechanism) result in representations which associate a wider collection of features than just the input contributed by a single column in isolation. Thus, this representation is more abstract, and is the basis of the hierarchy.

I believe there are also other mechanisms (such as TP) happening in this layer, whose purpose is also to assist in forming abstractions (this is the object layer in TBT, after all). But even this mechanism alone should intuitively be forming abstractions.

Considering this increase in level of abstraction, I believe the transition between input and output layers within the same region is actually the logical boundary between hierarchical levels (not the transition from the output layer of one region to the input layer of another region).


Maybe I am attaching too much importance to a detail, but I am still struggling with the length of the long-distance lateral connections of L2/3 pyramidal cells:

  • What is the “standard” length?
  • Is there a “standard” length at all?

For the first question, I haven’t found the 0.5mm distance in the papers mentioned in the other thread. I think that this number came directly from Calvin’s book, where he gives more details but without clearly citing the source:

“That 0.5mm mentioned earlier is really as small as 0.4mm (in primary visual cortex of monkeys) or as large as 0.85mm (in sensorimotor cortex)” - The Cerebral Code, Calvin

But the paper you cited in the first post of this thread talks about several millimeters (I have seen this in other papers as well):

“As is the case for other regions of the macaque monkey neocortex, pyramidal neurons in the supragranular layers of the dorsolateral prefrontal cortex (PFC) furnish intrinsic axon collaterals that travel for substantial distances, up to several millimeters, tangential to the pial surface (Levitt et al., 1993).”

I think that it matters, because that would mean that the long-distance lateral connection between two minicolumns would be way bigger than the receptive field size. In other words, there would be considerable missed space: not all inputs could participate in a given grid pattern. Right?

About the existence of a “standard” length, Calvin did not give his source (or is it hidden in the long exhaustive list at the end?). Have you seen a graph of the distribution of these long-distance lateral connections?

Maybe the grid-forming patterns are stable enough to cope with lateral connections of different distances. I would like to build some simulations to see how they behave.
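Before any real simulation, here is a back-of-the-envelope check of the “missed space” worry. Both figures are assumptions chosen for illustration: 0.5 mm node spacing (Calvin’s number) and a minicolumn pitch of roughly 0.03 mm, a commonly quoted order of magnitude.

```python
import math

# Assumed figures (illustration only): Calvin's 0.5 mm node spacing and
# a minicolumn pitch of ~0.03 mm (30 micrometres).
GRID_SPACING_MM = 0.5
MINICOLUMN_PITCH_MM = 0.03

def triangular_site_density(spacing_mm):
    """Sites per mm^2 on a triangular lattice with the given spacing."""
    return 2.0 / (math.sqrt(3) * spacing_mm ** 2)

grid_nodes_per_mm2 = triangular_site_density(GRID_SPACING_MM)
minicolumns_per_mm2 = triangular_site_density(MINICOLUMN_PITCH_MM)

# the sqrt(3) factors cancel, leaving (spacing ratio)^2
minicolumns_per_node = minicolumns_per_mm2 / grid_nodes_per_mm2  # ~278
```

On this toy arithmetic a single grid recruits only about 1 in 278 minicolumns. So the “missed” minicolumns are not necessarily wasted: if the interleaved-grid idea discussed above holds, they are free to join other grids of different phase or orientation.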


I don’t know the definitive resource or answer to your questions.
I have been through all of the references in the Calvin book and he does defend the lengths he references. These are lengths associated with L2/3.

The references in the “V1 post” give a pretty good breakdown, and the lengths do have a statistical mean that clusters around this standard length. I see the range of input lengths as useful, allowing learning to form grids of a wide variety of spacings, orientations, and phases.

I have also seen the references that throw in much longer lengths; these seem to go with the lower levels and are not part of the hex-grid forming mechanism. I have often wondered what they do but I don’t have any real theoretical foundation to explain what is going on there. I am open to ideas - no matter how wild.

BTW: my current wild idea (with very little foundation to support it) is that this forms larger grids that discipline the smaller grids, to handle communication between scales of representation; a sort of lateral hierarchy. I don’t have a shred of evidence to support this, but in long reflections when I am out walking at night this is the only thing I could come up with.


I believe one fundamental characteristic of Calvin-like grid formation is the part played by the effective reach of inhibition around currently activated spots.
If you assume each neuron in L2/3 is on the verge of spiking at each (otherwise mediated?) gamma tick, then the resulting grid is the tightest packing of marbles on a plane, a.k.a. a hex grid, where the minimum spacing depends mostly on the inhibition radius. In that sense, “longer” axonal ranges (I have repeatedly come across figures in the range of 3 mm in diameter for the lateral axonal arbors of L2/3) do not impact the tightness of the packing per se, only the range at which one cell can directly recruit and attract others.
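That packing claim is easy to probe with a throwaway sketch (no claim of biological fidelity; the random visiting order is a hypothetical stand-in for whichever cells happen to be closest to threshold at a given gamma tick):

```python
import math
import random

def inhibition_limited_packing(n_candidates, inhib_radius_mm, seed=0):
    """Greedy sketch: candidate cells scattered on a 2x2 mm sheet are
    visited in random order; a cell activates only if no already-active
    cell lies within the inhibition radius."""
    rng = random.Random(seed)
    cells = [(rng.uniform(0, 2), rng.uniform(0, 2))
             for _ in range(n_candidates)]
    active = []
    for c in cells:
        if all(math.dist(c, a) >= inhib_radius_mm for a in active):
            active.append(c)
    return active

active = inhibition_limited_packing(5000, 0.5)
spacing = [min(math.dist(a, b) for b in active if b is not a)
           for a in active]
# every nearest-neighbour distance is >= 0.5 by construction, and the
# mean sits only slightly above it: a jammed, roughly hexagonal packing
```

The spacing of the survivors is pinned just above the inhibition radius, regardless of how far the excitatory arbors reach, which matches the intuition that arbor length sets recruitment range rather than grid tightness.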


Another point, after reading Bitking’s introductory post again:

For it to really work (implementing voting + sparsity (SP) + economy, remaining Calvin-like, and explaining the reported 0.5mm spacing given the 1.5mm radius of axonal arbors), I’d go with the hybrid model (surround inhibition + reverberation), with a twist:

  • Strong inhibition becomes dominant only when there is actually some uncertainty to resolve. This ensures that even when uncertain (voting), or (maybe?) learning something entirely new, the activation pattern will eventually converge to some 0.5mm-spaced grid.
  • When the input is straightforward (or straightforward given context), only the obvious spots come to be activated in the first place. So the overall signal isn’t very loud and doesn’t incur much inhibitory effort… and thus, in cruise mode, grid forming (spreading) can work by Calvin resonance alone, following the synapses formed by the already-known grid, which is now being replayed.

Is this an intuition you have, or do you see similarities with mathematical or geometrical transformations?

Also, I have heard @jhawkins talk more than once about ideas on how different phases play roles in the learning and operation of neuron clusters, but never in great detail. Does anyone know more about this? If I understand it correctly, this is still on topic.


If the output is hex-grids of different spacing, phasing, and size, as we see in the large-scale emergent properties shown by Moser grids, then the underlying mechanism must have the same structure.

If you think about this from a purely geometric view, that means the elements that make up the grid will have to have different strides to make up the different-sized spans inside the hex-grid, and any mini-column must be able to be a grid hub to support the phase differences.

If you start from that premise, and note that the branching lateral projections have a range of lengths, you can see the overlap in the hardware vs the theory requirements.

Does this make sense to you?


It does, certainly. But I don’t understand how a cluster of projecting neurons can produce a particular and stable shape, and how this can be transformed.

There are probably a number of steps I don’t yet ‘get’. Unfortunately these things take frustratingly long with me. I was hoping to get a clue out of this.


Um, I thought that this was self-evident, but I guess I should add that the “other” end of that axonal projection is a dendrite, able to learn this connection. We learn both the feature space and who our neighbors are at the same time.

Since L2/3 fires on pattern match of apical AND proximal connections (unlike temporal cells) both are learned at the same time, using the same mechanism.

I don’t know how to prove this, but I think that this separation of grid topology and pattern matching is part of a mechanism of generalizing: each of the trio may be matching some pattern, and each could have learned that pattern in very similar, but not identical, prior sessions. At this time each is offering that it knows this little bit, and through the voting nature of the hex-grid, a grid grows to match an input that is “almost” the same as things we have seen before.

Another hex-grid property that seems self-evident but perhaps needs to be mentioned: a “hot spot” of recognition could form initially. With repeated exposure, the strong recognition in this center could induce cells on the edge of the pattern, which see many related but changing presentations (like the hair around a face, for example), to learn the related features and respond when any of them appear in that spot, increasing the size of the strongly responding patch over time.


I’m glad this is being discussed again, given how interesting a theory/approach it could turn out to be.

One misconception I originally had, and am hoping to have clarified (Bitking, gimery, …), was what an actual hex grid is, in the terms outlined by Calvin, and how it should be represented. Originally, through Bitking’s excellent posts and interpretation of the book, I simply thought of the hex-grid activity as being represented by a single set of triangular arrays (brought about through this lateral excitation) that grow into the much larger hex-grid structure and keep growing in size, while still, through local inhibition, squelching out all other neurons/columns, giving sparsity.

However, I have come to believe that Calvin only uses this hex grid as a more abstract template (think of looking down on the activity through a hex-shaped straw), where actually multiple triangular arrays exist, each tied to a single feature (colour, shape, edge, texture, etc.). This makes the task of inhibition a bit trickier, but it does make Calvin’s view of what he thought was a possible representation of the Hebbian cell assembly more appealing, as it could possibly allow for the ‘binding’ of the features as they move up the hierarchy.

Anyway, I was just curious about anyone’s thoughts on what they feel this hex grid is, as I don’t think it necessarily has to be seen as this strict hex-grid structure of activity alone.
