HTM Mini-Columns into Hexagonal Grids!

I believe one fundamental characteristic of Calvin-like grid formation is the part played by the effective range of inhibitory diffusion around currently activated spots.
If you assume each neuron in L2/3 is on the verge of spiking at each (otherwise mediated?) gamma-tick, then the resulting grid is the tightest packing of marbles on a plane, a.k.a. a hex-grid, where the minimum spacing depends mostly on the inhibition radius. In that sense, “longer” axonal ranges (I have repeatedly come across figures around 3 mm in diameter for the lateral axonal arbors of L2/3) do not impact the tightness of the packing per se, only the range at which one cell can directly recruit and attract others.
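A quick sketch of that claim in plain NumPy (the greedy selection and all parameters are my own toy assumptions, not anything from Calvin): scatter candidate cells, activate them in order of excitability, and suppress everything within the inhibition radius. The survivors settle into a hex-like packing whose spacing tracks the inhibition radius, no matter how far each axon could reach:

```python
import numpy as np

def winners_under_inhibition(positions, excitability, inhibition_radius):
    """Greedy winner-take-all: activate cells in order of excitability,
    suppressing any cell within inhibition_radius of a winner.
    The surviving set approximates a tightest packing of discs,
    i.e. a hex-like grid with spacing ~ inhibition_radius."""
    order = np.argsort(-excitability)              # most excitable first
    winners = []
    for i in order:
        if all(np.linalg.norm(positions[i] - positions[w]) >= inhibition_radius
               for w in winners):
            winners.append(i)
    return positions[np.array(winners)]

# Toy run: 2000 cells on a 3 mm x 3 mm patch, 0.5 mm inhibition radius
rng = np.random.default_rng(0)
pts = rng.uniform(0.0, 3.0, size=(2000, 2))
active = winners_under_inhibition(pts, rng.random(2000), 0.5)

# Mean nearest-neighbour distance of the active set hovers just above
# 0.5 mm, independent of axonal reach.
d = np.linalg.norm(active[:, None] - active[None, :], axis=-1)
np.fill_diagonal(d, np.inf)
print(len(active), round(d.min(axis=1).mean(), 3))
```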

7 Likes

Another point, after reading Bitking’s introductory post again:

For it to really work, implementing voting + sparsity (SP) + economy, staying Calvin-like while explaining the reported 0.5 mm spacing against the 1.5 mm radius of axonal arbors, I’d go with the hybrid model (surround inhibition + reverberation), with a twist:

  • strong inhibition becomes dominant only when there is actually some uncertainty to resolve. This ensures that even when uncertain (voting), or (maybe?) learning something entirely new, the activation pattern will eventually converge to some 0.5 mm-spaced grid.
  • when the input is straightforward (or straightforward given context), only the obvious spots get activated in the first place. So the overall signal isn’t very loud and doesn’t incur much inhibitory effort… and thus, in cruise mode, the grid forming (spreading) can work by Calvin resonance alone, following the synapses formed by the already-known grid, which is now replayed. (A toy sketch of this gating follows the list.)
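Here is that sketch (the entropy gate and all numbers are my own assumptions): scale the inhibitory gain by how ambiguous the current activation is, so a confident input coasts on resonance while an ambiguous one triggers the hard grid competition:

```python
import numpy as np

def inhibitory_gain(activations, base_gain=0.2, max_gain=1.0):
    """Hypothetical uncertainty gate: when activation is spread over many
    candidates (high normalized entropy), inhibition dominates and forces
    convergence to a 0.5 mm grid; when a few obvious spots dominate (low
    entropy), inhibition stays weak and the known grid spreads by
    resonance alone ('cruise mode')."""
    p = activations / activations.sum()
    entropy = -(p * np.log(p + 1e-12)).sum()
    uncertainty = entropy / np.log(len(p))   # 0 = certain, 1 = uniform
    return base_gain + (max_gain - base_gain) * uncertainty

print(inhibitory_gain(np.array([9.0, 0.1, 0.1, 0.1])))  # low gain: cruise mode
print(inhibitory_gain(np.array([1.0, 1.0, 1.0, 1.0])))  # max gain: resolve it
```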
6 Likes

Is this an intuition you have, or do you see similarities with mathematical or geometrical transformations?

Also, I heard @jhawkins talk more than once about ideas on how different phases play roles in the learning and operation of neuron clusters, but never in great detail. Does anyone know more about this? If I understand this correctly, it is still on topic.

1 Like

If the output is hex-grids with different spacings, phasings, and sizes, as we see in the large-scale emergent properties shown by Moser grids, the underlying mechanism must have the same structure.

If you think about this from a purely geometric view, that means the elements that make up that grid will have to have different strides to produce the different size spans inside the hex-grid, and any mini-column must be able to act as a grid hub to support the phase differences. (A small construction illustrating this follows.)
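Here is that construction (mine, not from the post): it generates triangular-lattice hubs for a given stride, orientation, and phase; different grid modules would reuse the same mini-column substrate with different parameters, much as Moser grid modules differ in scale and phase:

```python
import numpy as np

def grid_hubs(stride, angle, phase, extent=5.0):
    """Triangular (hex) lattice of grid-hub positions.
    stride : hub spacing (e.g. mm); angle : lattice orientation (rad);
    phase  : (dx, dy) offset; extent : half-width of the covered patch."""
    # Lattice basis vectors at 60 degrees, rotated by `angle`
    a = stride * np.array([np.cos(angle), np.sin(angle)])
    b = stride * np.array([np.cos(angle + np.pi / 3),
                           np.sin(angle + np.pi / 3)])
    n = int(extent / stride) + 2
    pts = np.array([i * a + j * b + phase
                    for i in range(-n, n + 1) for j in range(-n, n + 1)])
    return pts[np.all(np.abs(pts) <= extent, axis=1)]

# Two hypothetical modules on the same sheet: same mechanism,
# different stride, orientation, and phase.
coarse = grid_hubs(stride=0.5, angle=0.0, phase=(0.0, 0.0))
fine   = grid_hubs(stride=0.3, angle=0.2, phase=(0.1, 0.05))
print(len(coarse), len(fine))
```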

If you start from that premise, and note that the branching lateral projections have a range of lengths, you can see the overlap in the hardware vs the theory requirements.

Does this make sense to you?

3 Likes

It does, certainly. But I don’t understand how a cluster of projecting neurons can produce a particular and stable shape, and how this can be transformed.

There are probably a number of steps I don’t yet ‘get’. Unfortunately these things take frustratingly long with me. I was hoping to get a clue out of this.

2 Likes

Um, I thought that this was self-evident, but I guess I should add that the “other” end of that axonal projection is a dendrite, able to learn this connection. We learn both the feature space and who our neighbors are at the same time.

Since L2/3 fires on a pattern match of apical AND proximal connections (unlike temporal cells), both are learned at the same time, using the same mechanism.

I don’t know how to prove this, but I think that this separation of grid topology and pattern matching is part of a mechanism of generalizing: each of the trio (the three cells of a triangular grid element) may be matching some pattern, and each could have learned this pattern as part of very similar, but not identical, prior sessions. At this point each is offering that it knows this little bit, and through the voting nature of the hex-grid, a grid grows to match this input that is “almost” the same as things we have seen before.

Another hex-grid property that seems self-evident but perhaps needs to be mentioned: a “hot spot” of recognition could form initially. With repeated exposure, the strong recognition in this center could induce cells on the edge of this pattern, which are seeing many related but changing presentations (like the hair around a face, for example), to learn the related features and respond when any of the related features are in that spot, increasing the size of the strongly responding patch over time.

2 Likes

I’m glad this is being discussed again, given how interesting a theory / approach it could turn out to be.

One misconception I originally had and am hoping to have clarified (Bitking, gimery,…) was what an actual Hex Grid is in the terms outlined by Calvin, and how it should be represented. Originally, through Bitking’s excellent posts and interpretation of the book, I simply thought of the hex-grid activity as being represented as a singular set of triangular arrays (brought about through this lateral excitation) that grow into the much larger hex-grid structure and grow further in size, but still, through local inhibition, squelch out all other neurons / columns, giving sparsity.

However, I have come to believe that when it came to this hex grid, Calvin only uses it as a more abstract template (think of looking through a hex-shaped straw down on the activity), where actually multiple triangular arrays exist, each tied to a single feature (colour, shape, edge, texture, etc.). This makes the task of inhibition a bit more tricky, but it does make Calvin’s view of what he thought was a possible representation of the Hebb cell assembly more appealing, as this could possibly allow for the ‘binding’ of the features as they move up the hierarchy.

Anyway, I was just curious about anyone’s thoughts on what they feel this hex grid is, as I don’t think it necessarily has to be seen as this strict hex-grid structure of activity alone.

1 Like

an actual Hex Grid in the terms outlined by Calvin and how it should be represented. Now originally through Bitking’s excellent posts and interpretation of the book I simply thought of the hex grid activity as being represented as a singular set of triangular arrays (brought about through this lateral excitation) that grow into the much larger hex grid structure and grow further in size, but still through local inhibition squelch out all other neurons / columns giving sparsity.

Thanks for the kind words. So far it sounds like we are on the same page.

(think like looking through a hex shaped straw down on activity) where actually multiple triangular arrays exist, each tied to a single feature (colour, shape, edge, texture etc). This makes the task of inhibition a bit more tricky, but it does make Calvin’s view on what he thought was a possible representation of the Hebb cell assembly more appealing as this could possibly allow for the ‘binding’ of the features as they move up the hierarchy.

I like to think of each grid hub as a nexus of feature recognition. I suppose it’s a matter of viewpoint.

What you may not have considered is that the hex-grid may be the output of a map; but what is the input? Sure, in the very early stages there is a one-to-one correspondence to the sensory field. By jiggering the ratio between the lateral axonal projections and the inhibitory interneurons you get a Gabor filter, highly useful at this level of representation.
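For the curious, here is roughly what that “jiggering” looks like numerically; a difference-of-Gaussians stand-in (my own toy parameters) for narrow lateral excitation minus a broader inhibitory surround. A true Gabor filter adds an orientation term, but the excitation/inhibition ratio is the same knob:

```python
import numpy as np

def lateral_kernel(radius_exc, radius_inh, gain_inh, size=41, scale=0.1):
    """Difference-of-Gaussians ('Mexican hat') lateral interaction:
    narrow excitation from lateral axons minus a broader surround from
    inhibitory interneurons. Shifting the ratio reshapes the spatial
    passband of the map."""
    ax = (np.arange(size) - size // 2) * scale   # assumed mm per pixel
    x, y = np.meshgrid(ax, ax)
    r2 = x**2 + y**2
    exc = np.exp(-r2 / (2 * radius_exc**2))
    inh = np.exp(-r2 / (2 * radius_inh**2))
    return exc - gain_inh * inh

# Hypothetical numbers: 0.25 mm excitation, 0.75 mm inhibitory surround
k = lateral_kernel(radius_exc=0.25, radius_inh=0.75, gain_inh=0.4)
print(k.shape, round(k.max(), 3), round(k.min(), 3))
```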

What happens as you move up to other maps? If it were just a bucket brigade, having multiple stages would be mostly a waste of energy. In biology, almost every stage has multiple input fiber bundles bringing different aspects of the senses together to be compared and contrasted, to find any relationship between these various aspects of the sensation. As Jeff Hawkins has pointed out, at every stage more fibers tend to come out than go in. Add that to the level skipping, and you can see that some of the partially processed signals are being compared to the raw version to see if anything else can be teased out of the stream. (Comparing the raw signal to its first and second differentials is a great edge-finding technique; image-processing software does this all the time.)
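That parenthetical is easy to demonstrate; a minimal 1-D version (NumPy only), comparing a signal against its first and second differences to localize an edge:

```python
import numpy as np

# A step edge between two smooth ramps
signal = np.concatenate([np.linspace(0, 1, 50), np.linspace(2, 3, 50)])

first  = np.diff(signal, n=1)   # first differential: peaks AT the edge
second = np.diff(signal, n=2)   # second differential: sign flip brackets it

print("edge near index:", np.argmax(np.abs(first)))                # ~49
print("2nd-diff extremes:", np.argmax(second), np.argmin(second))  # 48, 49
```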

As you move up the stream, what is presented to the association regions is a rich mix of everything we could suck out of the stream in micro-features, to be sampled to create stable hex-grids that stand for that object in the WHAT stream, and spatial information in the WHERE stream.

The objects at that level are more complicated because that level is holding multiple objects represented over time (multiple saccades, multiple touches, and trains of sounds), but that is more than I want to dive into in this post.

3 Likes

In the first few minutes of our latest Hackers’ Hangout I talked about the omnidirectional axonal projections referenced in the hex-grid hypothesis and related them to how we understand TM to be working, based on the biology of V1 (segments and tufts). @Bitking was also in the chat confirming ideas with me (thanks!).

4 Likes

Just watched the Hackers’ Hangout:

Speaking of the hex grid, can it just be seen as overlapping nodes in a network, that is, networks layered on top of each other? What is the benefit of seeing it as a hex grid specifically?

This hex-grid thing is a single mechanism that delivers:

  • Spatial pooling with ~3% sparsity enforcement
  • Temporal pooling via reverberation
  • A natural lateral voting mechanism à la TBT.
  • This lateral voting extends seamlessly over the range of the grid formation.
  • Self-organizing both in learning the grid connections AND adding new features to the recognition.
  • Self-organizing in adding new members to the periphery of the hex-grid formation.

All in one neat biologically plausible package.

7 Likes

You talked a bit about the scale of things in the hackers’ hangout. One confusing thing is that “long-range” connections aren’t always very long, at least in rodents. They’re just axons which travel out of the cortical region to another region or elsewhere. I don’t have a good sense of scale in primates, but in rodents a millimeter is big. Mouse and rat brains are ~15 to 25 mm long. The dendritic arbors are something like 0.3 mm in diameter (roughly the size of a cortical column), and regions are a few millimeters across.


I think figure 8b in the article above shows the scale of regions well. Ignore the axons; they’re from the thalamus. The grey blob-shaped things are the parts of cortical columns in layer 4.


I think this one shows the scale of dendritic and axonal arbors well in its drawings of neurons. Figs. 8-11 probably have more typical sizes of axon arbors. The axons of cells in the other figures are more restricted to a cortical column, ~0.2 to 0.3 mm in the region this article is about.

https://physoc.onlinelibrary.wiley.com/doi/pdf/10.1113/jphysiol.2011.219576?frame=sidebar
Figure 2 shows a thick tufted cell in L5. They have pretty large basal dendritic arbors. In that figure, the dendrites have yellow dots and the axon does not.


Drawings of axons and dendrites in other layers.

In some of these figures you can sort of see the axon branch which descends out of the cortex. Not all cells have that.

In those figures, the axon arbors don’t seem to form any hex grid-like pattern, but maybe that doesn’t matter. I think hex grids would have to be pretty messy, but maybe a little bias towards some sort of preferred hex grid-ish pattern is enough.

Maybe layers 5 and 6 don’t form hex grids and L2/3 is different, in that it has more regularly spaced densities of axon length. I haven’t seen that in L2/3 but I haven’t seen as many examples of those cells.

You’d also have to consider the dendrites and how they’re positioned, unless you’re talking about dendritic segments small enough that they basically receive input from axons located around a single point. The hex grid might be too messy if you are talking about proximal dendrites, since those occupy ~0.1 mm in diameter (and summate all their inputs together, unlike the more spatially confined summation in distal dendrites).

3 Likes

My guess is this reduces cross talk between networks. They’re all hex-like grids rather than overlapping spaghetti.

2 Likes

They wouldn’t show such a pattern in a Calvin scheme.
Specific hex grids (dynamic activation patterns) would represent something, right? We all expect each cortical map to be able to (statically, from all its connection schemes) encode more than one thing, don’t we? Far, far more. So the connectivity pattern around one given cell is compatible with all of the grids that cell is involved in. More like your red ring in the figure below.

And yes, L2/3 is distinct from other layers in that regard, since most of its local axonal projections are to itself. This is where you’d expect such resonance to kick in.

1 Like

I will add that the cells learn the grid pattern the same way they learn input patterns.
The topology offers input on the distal dendrites of the cell at the same time as input patterns are arriving on the apical dendrites.
The learning that happens when the cell fires an action potential can happen at all dendrite synapses at the same time.
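As a sketch, that is just one Hebbian step applied across every dendritic zone at once when the cell fires (the names and rates below are placeholders, not measured values):

```python
import numpy as np

def on_action_potential(apical_w, apical_active, distal_w, distal_active,
                        inc=0.02, dec=0.005):
    """When the cell fires, active synapses strengthen and inactive ones
    decay, identically for the apical (feature) inputs and the distal
    (grid-topology) inputs: one mechanism, two learned structures."""
    for w, active in ((apical_w, apical_active), (distal_w, distal_active)):
        w += np.where(active, inc, -dec)
        np.clip(w, 0.0, 1.0, out=w)

# Toy call: 32 apical and 32 distal synapses, random activity this tick
rng = np.random.default_rng(1)
ap_w, di_w = rng.random(32), rng.random(32)
on_action_potential(ap_w, rng.random(32) < 0.2, di_w, rng.random(32) < 0.2)
```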

3 Likes

OK, biological justification for a proposed learning rule:


We have a stream of spikes coming at us from senses or a lower level.

  • We try to fire but we are suppressed in the grid competition: no firing spikes to learn from, so no learning at all.
  • We are the winner of the grid competition, so we are not suppressed; our firing in response to the input triggers spike-timing learning as we freely respond to the input.
  • Stimulation from successful grid formation increases the firing rate. We now fire even faster in response to the incoming spike train.
  • At some point we are firing at the same rate and in phase with the inputs, perhaps even faster with the grid drive; and if the hex-grid rate exceeds the rate of the input there can be negative learning. A very local form of negative feedback: cool your jets, hotshot! (A one-line arithmetic reading follows the list.)
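Reading that last bullet as arithmetic (my framing, the post does not spell it out): let the sign of the weight change follow the rate mismatch, so learning turns negative once grid drive pushes the cell faster than its input:

```python
def rate_matched_delta(input_rate, cell_rate, rate=0.01):
    """Hypothetical rate-summary of the spike-timing rule: positive
    learning while the cell lags its input, zero at phase lock, and
    negative ('cool your jets') under grid overdrive."""
    return rate * (input_rate - cell_rate)

print(rate_matched_delta(40.0, 25.0))  # winner still catching up: +0.15
print(rate_matched_delta(40.0, 40.0))  # phase-locked: 0.0
print(rate_matched_delta(40.0, 55.0))  # grid overdrive: -0.15
```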

Note that even though we have described L5 input bounced through thalamus relays as axonal projections, lateral axonal projections, and apical projections, we assumed that those apical projections are from lower levels. In fact, we know that L2/3 has reciprocal projections with related maps in the hierarchy. This implies that as we go into grid resonance this will provide a significant spike train for the related area in the next map to become activated, and it will respond by projecting a similar spike train back to our general vicinity. Since we know that map-2-map fiber bundles maintain topology this should work to cement the bond between the hex pattern in this map and whatever related pattern is forming in the next map. This is what I was getting at when I mentioned hierarchy in the main hex-grid post.

Since we are not excited about embracing a full-on spike-based system (at least I am not), we will use an activation value to stand for the spike rate. Likewise, the synapse values could be scalars. 8-bit values should be more than sufficient to capture real neuron behavior. (Actually, 4-bit values should be sufficient!)

So a simplified learning rule that can be used to write code:

Note: I envision this pooling algorithm running at gamma rate so 4 rounds of this competition for every alpha rate cycle.

Tally the activation inputs and drive the lateral axonal outputs that activate the local inhibitory field. If you are running map-2-map connections, update these at the same time. Tally the resulting cell inputs, including inhibition. Repeat 3×.

On the final round …
If suppressed to silence, learn nothing.
If our activation is above some threshold, also do nothing, as we clearly don’t need to learn anything else.
Otherwise, strengthen all active inputs.
This will boost any cell that wins the local competition, whether part of a grid or not. The outputs from these winners will learn to hook up grid connections later, when they get strong enough.

Variation #1: tally inputs and, if not suppressed, apply learning based on this formula:
learning = max(0, max_learning - activation tally). A slightly trained cell will learn fast, an iffy cell will get a boost, and a well-trained cell learns nothing.
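A direct transcription of the rule above, including Variation #1, into runnable code (one hypothetical cell patch; the thresholds, rates, and random wiring are placeholders, and floats stand in for the 4-8 bit values suggested above):

```python
import numpy as np

def gamma_cycle(weights, inputs, lateral, rounds=4, suppress_below=0.05,
                satisfied_above=0.9, learn_rate=0.1, max_learn=0.5,
                variation_1=False):
    """One alpha cycle = `rounds` gamma ticks of grid competition.
    weights : (cells, inputs) synapse strengths (scalar spike-rate stand-in)
    inputs  : (inputs,) activation values standing in for spike trains
    lateral : (cells, cells) +excitation / -inhibition between cells"""
    activation = np.zeros(weights.shape[0])
    for _ in range(rounds):                       # tally, inhibit, repeat
        feedforward = weights @ inputs
        activation = np.maximum(feedforward + lateral @ activation, 0.0)
    for c in range(weights.shape[0]):             # learn on the final round
        if activation[c] <= suppress_below:
            continue                              # suppressed: learn nothing
        if variation_1:
            rate = max(0.0, max_learn - activation[c])  # iffy cells learn most
        elif activation[c] >= satisfied_above:
            continue                              # confident: nothing to learn
        else:
            rate = learn_rate
        weights[c] += rate * (inputs > 0)         # strengthen active inputs
    np.clip(weights, 0.0, 1.0, out=weights)
    return activation

# Toy patch: 16 cells, 64 inputs, random lateral wiring
rng = np.random.default_rng(2)
w = rng.random((16, 64)) * 0.1
lat = rng.normal(0.0, 0.05, (16, 16))
act = gamma_cycle(w, (rng.random(64) < 0.1).astype(float), lat)
```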

4 Likes

I forget that people here may not be familiar with neural wiring. An important detail for both L2/3 and L5 is that both layers have chandelier inhibitory cells reading the axon hillocks, and there is a local competition: for the temporal winner in L5, and to be a grid hub in L2/3. In both cases, from a mini-column point of view, there can be only one active cell in that layer.
If L5 has no winner with a strong input, it bursts. In L2/3, a cell either enters the neighborhood competition or gets suppressed by stronger mini-columns; I don’t see bursting as something that L2/3 does.

1 Like

This is a bit beyond my imagination. How did this hexagon shape suddenly come out? @Bitking

Hexagons are not the goal, but a side effect.
Start with the most efficient packing of circles in a plane; the hexagon is the polygon that best fits this with short line segments. The reach of the dendrites and lateral projection axons describes this natural circle around the cell.
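For reference, the standard geometry behind this (textbook results, not from the post):

```latex
% Densest packing of equal circles in the plane (hexagonal lattice):
\eta_{\mathrm{hex}} = \frac{\pi}{2\sqrt{3}} \approx 0.9069
% An active cell suppressing neighbours within radius R packs like a
% disc of radius R/2, so the spacing and coordination come out as:
d_{\min} = R, \qquad 6 \text{ equidistant neighbours at } 60^{\circ}
```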

See beehive honeycomb for another example.

7 Likes