I have been having long free-wheeling chats with @Paul_Lamb and @gmirey. We have been working through basic neurology, griddy things, implementation details, and possible applications. I have walked through a very long chat stream and pulled out some of the more interesting exchanges and comments. Anyone trying to understand the hex-grid concept and how it fits in with the cortex may find something of interest in this wall of text. Or not.
Background: picking out some key features of the original hex-grid post that started this thread:
The lateral axonal projections originate from the same layer area, rise at about the same angle, and target about the same layer area. This defines a circular band around the cells in a mini-column. There is natural variety in all three of these parameters, and this is a feature, not a bug, as it allows different scale/angle patterns to be learned. As to the phasing variation: Numenta spends its time with primary sensory areas. These are anchored to the incoming sensory fibers, and I agree that the positions are fixed. There is absolutely nothing in the structure that restricts a hex-grid node from being centered on any mini-column.
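As a rough illustration of the geometry, the sketch below generates idealized hex-grid node positions. It assumes that projection length sets the node spacing, that the lattice can take any rotation, and that the phase is free (a node can be centered on any mini-column). The function `hex_grid_nodes` is hypothetical and ignores the natural variation in the projections that the post describes.

```python
import math

def hex_grid_nodes(spacing, angle_deg, phase, extent):
    """Return node coordinates of an ideal hex grid inside [0, extent)^2.

    spacing   -- distance between neighbouring nodes (assumed to be set by
                 how far the lateral axonal projections reach)
    angle_deg -- rotation of the lattice in degrees
    phase     -- (x, y) offset: the mini-column a node is centred on
    extent    -- side length of the square patch of cortex
    """
    a = math.radians(angle_deg)
    # Two basis vectors 60 degrees apart generate a hexagonal lattice.
    b1 = (spacing * math.cos(a), spacing * math.sin(a))
    b2 = (spacing * math.cos(a + math.pi / 3), spacing * math.sin(a + math.pi / 3))
    n = int(2 * extent / spacing) + 2  # enough lattice steps to cover the patch
    nodes = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x = round(phase[0] + i * b1[0] + j * b2[0], 6)
            y = round(phase[1] + i * b1[1] + j * b2[1], 6)
            if 0 <= x < extent and 0 <= y < extent:
                nodes.append((x, y))
    return sorted(nodes)

# Same patch of cortex, two different scale/angle/phase choices, two different grids.
coarse = hex_grid_nodes(10, 0, (0, 0), 50)
fine = hex_grid_nodes(7, 20, (3, 4), 50)
```

Nothing in the function ties a node to a particular mini-column: shifting `phase` moves the whole grid, which matches the point that a node can be centered anywhere.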
For a given pair of active nodes that will win the sparsity competition, as the axonal projections rise through the inhibitory layer they will both be signaling less excited cells to go to sleep. The cells that are recognizing parts of a larger pattern will have both the recognition of whatever pattern they are seeing AND the excitement from other cells that are also seeing whatever part they are sensing. Contrast this with cells that are just seeing some bit of pattern they have learned: without the lateral connections their level of excitement will be much lower, ensuring that they will lose the sparsity competition.
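A toy version of that competition shows how lateral support can beat a slightly better isolated match. It assumes excitation is simply the local match score plus lateral support, and that the k most excited cells win (a k-winners-take-all stand-in for the inhibitory layer). The name `sparsity_competition` and the numbers are illustrative only.

```python
def sparsity_competition(local_match, lateral_support, k):
    """Return the sorted indices of the k most excited cells.

    local_match     -- how well each cell recognises its own bit of pattern
    lateral_support -- excitation arriving over lateral axonal projections
                       from cells that see other parts of the same pattern
    k               -- number of winners allowed (the sparsity target)
    """
    excitation = [m + s for m, s in zip(local_match, lateral_support)]
    ranked = sorted(range(len(excitation)), key=lambda i: excitation[i], reverse=True)
    return sorted(ranked[:k])

# Cells 0 and 3 see parts of a larger pattern and support each other laterally;
# cell 1 has the best isolated match but no lateral support.
local = [0.6, 0.8, 0.5, 0.6]
lateral = [0.5, 0.0, 0.0, 0.5]
winners = sparsity_competition(local, lateral, k=2)
```

With lateral support the winners are cells 0 and 3; with the lateral term zeroed out, cell 1 would win instead.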
I am less sure of how L5 enters into this, but I speculate that its output lends even more excitement to the sparsity competition, ensuring that the pattern association paired with the prediction gets an extra push to form the grid that best matches the prediction.
This was the transition feature that drew me to the work of Numenta in the first place. I could see how a pattern would form a Calvin tile, but what would preserve that tile when the pattern completely changes, as in a saccade? The prediction cell is the missing element.
Summary: a hex-grid node activation formula =
(THRESHOLD 2(
(each reciprocal connection)
PLUS (a local recognized pattern)
PLUS (a recognized prediction in TM-L5)
))
I surmise that a “hole” in the grid is driven into activity by a strong surrounding hex-grid pattern.
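One way to read the summary formula, offered as an assumption rather than the author's definition: count each active reciprocal connection, a local pattern match, and a TM/L5 prediction as one unit of drive each, and fire when the total reaches 2. Under that reading a "hole" with no local input still fires when enough surrounding grid nodes are active. The axial hex coordinates and helper names are hypothetical.

```python
# Axial-coordinate offsets to the six neighbouring nodes of a hex grid.
HEX_NEIGHBOURS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]
THRESHOLD = 2  # "THRESHOLD 2", read here as: fire when summed drive >= 2

def active_neighbours(node, active_nodes):
    """Count reciprocal lateral connections arriving from active grid nodes."""
    q, r = node
    return sum((q + dq, r + dr) in active_nodes for dq, dr in HEX_NEIGHBOURS)

def node_fires(node, active_nodes, local_match, predicted):
    """Reciprocal connections PLUS local pattern PLUS prediction, thresholded."""
    drive = active_neighbours(node, active_nodes) + int(local_match) + int(predicted)
    return drive >= THRESHOLD

# A ring of six active nodes around an empty centre: the classic "hole".
ring = set(HEX_NEIGHBOURS)
```

The centre of `ring` fires with no local match and no prediction, while an isolated node needs both a local match and a prediction to reach threshold.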
The L5 prediction cell is exactly what Numenta describes. The L5 bursting/learning is about the same as Numenta suggests, but with the enhancement of thalamus gating and reinforcement I outlined earlier.
I would add a habituation factor so a hex-grid has a little persistence, but without the local match or prediction match it goes away fairly quickly. I am not sure how long this period should be, but I suspect that if I mine the biology literature there is already good research that would provide guidance. I see this as an accumulator that increments on each update cycle and is subtracted from the activation potential sum.
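A minimal sketch of that accumulator, assuming a firing threshold of 2 and a hypothetical increment of 0.5 per cycle (the source leaves the time constant open): on every update, the node's summed drive minus the accumulator is compared to the threshold. The source does not say how the accumulator recovers once a grid goes quiet, so no recovery rule is modeled here.

```python
def simulate_persistence(drive, threshold=2.0, step=0.5):
    """Return, for each update cycle, whether the node fires.

    drive     -- summed excitatory drive per cycle
    threshold -- firing threshold
    step      -- habituation increment per update cycle (hypothetical value)
    """
    habituation = 0.0
    fired = []
    for d in drive:
        fired.append(d - habituation >= threshold)
        habituation += step  # the accumulator increments on every update cycle
    return fired

# A resonating grid (reciprocal drive of 3) with no local or prediction match
grid_only = simulate_persistence([3] * 6)
# The same grid with a local pattern match adding one more unit of drive
with_match = simulate_persistence([4] * 6)
```

`grid_only` fires for three cycles and `with_match` for five, so the extra input stretches the grid's lifetime.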
The interaction with the local inhibitory field works the same way. Each local inhibitory cell feeds a negative value into the cells within its topology; I could see it being some scaled/thresholded output of the activation it is receiving from lateral axonal connections.
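Following that description, here is a sketch on a 1-D strip of cells (a simplification of the 2-D sheet): each inhibitory cell passes a scaled and thresholded copy of its lateral input as a negative contribution to every cell within its reach. The function name and the `gain`, `floor`, and `radius` values are hypothetical.

```python
def inhibition(lateral_input, radius, gain=0.5, floor=1.0):
    """Inhibition each cell receives from local inhibitory cells on a 1-D strip.

    lateral_input -- activation each inhibitory cell receives via lateral axons
    radius        -- how many neighbouring positions each inhibitory cell reaches
    gain, floor   -- scale and threshold applied to that input (hypothetical)
    """
    n = len(lateral_input)
    # Each inhibitory cell outputs a scaled, thresholded copy of its input.
    out = [gain * max(0.0, x - floor) for x in lateral_input]
    inhib = [0.0] * n
    for i, o in enumerate(out):
        for j in range(max(0, i - radius), min(n, i + radius + 1)):
            inhib[j] -= o  # negative contribution to every cell within reach
    return inhib
```

The result is meant to be added to each cell's activation sum, just like the habituation term.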
Of course this is all predicated on the structure being a 2D sheet like cortex. I have no clue how you might use this with a different topology.
Q: If L2/3 is representing the subject of what is being experienced, then the representations that it forms must be more stable (changing less frequently) than the representations that are forming in L5 where temporal memory is occurring.
A: OK, if you start from this then I can see why this is confusing.
At the column level there are only three points in time: the prediction from the last step, now and the next predicted time step.
Again, some of the issue comes from trying to put everything in a single column. It’s peristaltic: the output from this layer in this map is the input to that layer in that map. Once you realize that the H of HTM needs to use some of the layers to do communication and object representation, things start to make more sense.
Remember, the rising axon in the cortex is available to signal to all layers from the prior maps as required. After the primary sensory map, all following maps receive projections from two or more maps and process one stream against another. The name “stream” implies that it is just map to map, but that is the wrong way to look at it: it is the combination of maps that is being processed.
See the images towards the bottom of this post here:
Q: Do you see any maps with one input and one output?
A: Yes, there are a couple and I think that those are doing some special function not related to the general cortical column computation. The rest are taking multiple inputs and processing those to some output.
Q: I guess the core of my question is where does temporal pooling occur? How do collections of inputs over time become one representation. If all layers are operating on the same time scale, it doesn’t matter how many times you mix and match and combine inputs between different maps – you would only ever be pooling spatial information.
A: Grid resonance: looking at the activation formula again (below), you can see that a fully formed grid with no additional sensed input should exceed the firing threshold. I would add a habituation factor so a hex-grid has a little persistence, but without the local match or prediction match it goes away fairly quickly. I am not sure how long this period should be, but I suspect that if I mine the biology literature there is already good research that would provide guidance. I see habituation as an accumulator that increments on each update cycle and is subtracted from the potential activation sum. Continued input extends this lifetime.
Summary: a hex-grid node activation formula = (THRESHOLD 2((each reciprocal connection) PLUS (a local recognized pattern) PLUS (a recognized prediction)))
I surmise that a “hole” in the grid is driven into activity by a strong surrounding hex-grid pattern.
Temporal pooling is an emergent behavior.
Comment: This is what I mean by temporal pooling, and also what I have been assuming. This persistence means the representation of the grid changes less frequently than its constituent input representations. The temporal differential results in TP.
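To make the temporal differential concrete, the sketch below replaces the habituation arithmetic with a simple countdown (an assumption made for brevity): supporting input (re)establishes its grid and refreshes a lifetime, while unsupported input lets the grid decay. The grid output then changes less often than the input stream. All names and the `lifetime` value are hypothetical.

```python
def pooled_output(inputs, supports, lifetime=3):
    """Return the grid representation active after each input.

    inputs   -- sequence of input patterns (e.g. successive fixations)
    supports -- dict mapping an input pattern to the grid it supports
    lifetime -- cycles a grid lasts without support (hypothetical value)
    """
    current, remaining, out = None, 0, []
    for x in inputs:
        grid = supports.get(x)
        if grid is not None:
            # Supporting input (re)establishes its grid and refreshes its lifetime.
            current, remaining = grid, lifetime
        elif remaining > 0:
            # No support: the grid resonates on, but decays.
            remaining -= 1
            if remaining == 0:
                current = None
        out.append(current)
    return out

def changes(seq):
    """Number of times consecutive items differ."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)

# A, B, C are parts of one object; X, Y are parts of another.
supports = {"A": "G1", "B": "G1", "C": "G1", "X": "G2", "Y": "G2"}
inputs = ["A", "B", "C", "X", "Y"]
out = pooled_output(inputs, supports)
```

The input changes on every step, but the output changes only once, when the second object's grid takes over.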
Comment: The temporal pooling bit explains why you have to stop thinking about something for a bit to let the pattern dissipate when you know it is not the answer you are looking for. That grid pattern residue lurks a while until all the connections go back to baseline excitation.
Comment: For a resonating grid without any new supporting input the suppressed cells are receiving strong input and they act to suppress the current hex-grid and establish a new one quickly.
Q: This leads to two potential gaps in my understanding, however. Before I try to explain them further, let me begin by asking: in your theory, is L5 forming a temporal memory of activity from L2/3, or is it forming a temporal memory of a constituent part of the activity of L2/3? (i.e. is it forming a temporal memory of the input space or of the output space?)
A: I see them as a team:
one says, "OH, I know this pattern!"
the other says, "pffft, I know what is going to happen next!"
The driver for the local L2/3 SP is the combination of both activations; there are several potential candidates for how L5 drives L2/3.
Comment: In my mind I see the stream of V1 saccades accumulating in the association regions, stabilizing on a representation that means whatever you are scanning with your eyes. Every part is the bit that that part of the retina is seeing at that point of the image, in the sequence in which it is being scanned; each is a sequence of micro-features (WHAT). As the object is closer or further away, you still see essentially the same features, just at slightly different locations (WHERE). This gives some measure of object distance or size invariance.
This is the thing that artists play with by making objects much larger or smaller than you would normally expect.
The entire collection of extracted features accumulates in the association region in the SP + TP representation and the grid formed is the code for the object.
Comment: I will add this:
Yes, L5/TM sees A-> B-> C and learns the transitions
Is there any particular reason that L2/3/SP would not also learn A, B, & C as static patterns? It has an apical feed to sense the rising axons. The proximal dendrites could be dedicated to the SP function of lateral axonal projections. I don’t know that anyone has mapped out these connections, or at least I have not seen a paper on it yet.
Combine both and you have high confidence: we know this pattern and we know that we are inside a sequence, so our confidence is much higher because both layers agree that this is something we know.
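A toy model of this "both layers agree" idea, assuming L2/3 simply stores A, B, C as static patterns and L5 stores the A->B->C transitions, with confidence counted as the number of layers that recognize the current input. The class and method names are hypothetical; real SP/TM learning is far richer than set membership.

```python
from collections import defaultdict

class TwoLayerRecognizer:
    """Hypothetical pairing of a static-pattern memory and a transition memory."""

    def __init__(self):
        self.static_patterns = set()         # L2/3/SP: "I know this pattern"
        self.transitions = defaultdict(set)  # L5/TM: "I know what comes next"

    def learn(self, sequence):
        self.static_patterns.update(sequence)
        for prev, nxt in zip(sequence, sequence[1:]):
            self.transitions[prev].add(nxt)

    def confidence(self, previous, current):
        """0 = unknown, 1 = one layer recognizes it, 2 = both layers agree."""
        known = current in self.static_patterns
        predicted = current in self.transitions.get(previous, set())
        return int(known) + int(predicted)

model = TwoLayerRecognizer()
model.learn(["A", "B", "C"])
```

B after A scores 2 (known pattern inside a known sequence), while A after C scores only 1: the pattern is familiar but that transition was never learned.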
Comment: Sure, we see the lateral connections here in THIS map. This other map over there? All it sees is some sparsely spaced activation spikes on the fiber bundle projecting to its L1 field. It will ALSO have at least one other fiber bundle projecting its own sparsely spaced pattern. It may end up forming its own hex-grid, but it most likely will not be a copy of this grid. It will have its own phase/spacing/rotation pattern.
Q: By TP algorithm, you mean increasing the temporal stability of an area’s output, all the while being able to react to change (online) when needed?
A: Correct. For some time it has been clear to me that distributed semantics can be distilled from an input stream in two ways – spatially (relating two or more bits which are physically distant from each other in a single input of the stream) and temporally (relating two or more bits which are active in separate inputs of the stream).
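The distinction can be illustrated by counting relationships in a stream of sparse inputs, each given as a set of active bit indices. Spatial pairs are bits active together in one input; temporal pairs are bits active in one input followed by bits in the next. Restricting "separate inputs" to adjacent ones is a simplifying assumption, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def spatial_pairs(stream):
    """Count pairs of bits that are active together within a single input."""
    counts = Counter()
    for active in stream:
        counts.update(combinations(sorted(active), 2))
    return counts

def temporal_pairs(stream):
    """Count (earlier bit, later bit) pairs across consecutive inputs."""
    counts = Counter()
    for prev, nxt in zip(stream, stream[1:]):
        counts.update((a, b) for a in sorted(prev) for b in sorted(nxt))
    return counts

stream = [{1, 2}, {3}, {1, 2}]
sp = spatial_pairs(stream)
tp = temporal_pairs(stream)
```

Bits 1 and 2 are related only spatially, while bit 3 is related to them only temporally. A spatial pooler can only ever see the first kind of relationship.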
In order to bring together these relationships in a hierarchy, it is necessary to “pool” them in some way. Classic HTM has addressed half of the problem via the Spatial Pooler algorithm. My quest for the last three years has been to address the other half of the problem by developing a Temporal Pooler algorithm. I have developed a few TP algorithms, but so far all have had flaws and fallen short of the concept that I have in my mind for how TP should work.
Performing both SP and TP in the same algorithm is a new epiphany for me, so I feel like I am at least on the right track now.
Q: So… you propose that, without a TM-like prediction, all cells in that L2/3 minicolumn still fire kinda together, by virtue of the inhibitory fields pushing the possible “activation spot” towards that whole minicolumn?
A: Yes, I am saying that L2/3 recognizes here-and-now patterns all by itself IN ADDITION to the TM model that is proposed by JH.
The firing law for L2/3 is a summation of:
L5 (Apical pattern prediction matching proximal inputs)
and
L2/3 (apical pattern recognition)
and
grid forming input (proximal sensing of lateral projections)
Weighting for each is to be determined; I suspect that this may be a dynamic thing, including habituation.
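As a placeholder for that firing law, the sketch sums the three inputs with adjustable weights. The source leaves the weights undetermined, so the defaults of 1.0 are arbitrary. Letting habituation scale down only the grid-forming term is one hypothetical way to make the weighting dynamic; the class and function names are also hypothetical.

```python
from dataclasses import dataclass

@dataclass
class FiringWeights:
    # Placeholder weights: the source leaves these "to be determined".
    l5_prediction: float = 1.0
    apical_recognition: float = 1.0
    lateral_grid: float = 1.0

def l23_drive(l5_pred, apical, lateral, w, habituation=0.0):
    """Weighted sum of the three L2/3 inputs.

    habituation (0..1) scales down the grid-forming lateral term, a
    hypothetical way of making the weighting dynamic.
    """
    grid_term = w.lateral_grid * max(0.0, 1.0 - habituation) * lateral
    return w.l5_prediction * l5_pred + w.apical_recognition * apical + grid_term

weights = FiringWeights()
```

As habituation grows, the lateral grid input counts for less, so the grid has to be refreshed by prediction or recognition to keep winning.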
Q: The hypothesis here, which I hoped you would confirm (or disprove) before I could focus on anything else (and L5 etc.), was more along the lines of: are whole L2/3 multi-cell assemblies in the same column potentially all “active”, in your view? (That is, before an HTM-like prediction kicks in to activate only one predictive cell in that minicolumn.)
A: As I see it, each cell in a mini-column is a prima donna in each level. They act collectively both up and down the column, and within a layer. Firing on any one hits the chandelier cells that shut down the rest of the similar cells in that layer. Both L2/3 and L5 cells have these attached to the axon hillock.
Q: Interesting idea. I haven’t thought along those lines at all; I shall try and see what could come out of it. However, I have the feeling that a TM in L2/3 would jeopardize the “Temporal Pooling” functionality (as a stabilizer over several ticks) that (I believe) Paul and I hoped to cram into that L2/3 layer.
A: We are diverging in understanding: TM is still firmly in L5, and L2/3 is firmly involved with SP and sparsity. If my words suggested otherwise, I am at fault here.
Q: Do you see the parts of the thalamus currently under searchlight attention as tonic, and thus the quiescent parts as having their relay cells in bursting mode, or the other way around?
A: I see bursting as the signal of novelty, which, from a functional point of view, is worthy of further attention. Following this line, tonic is the signal that we recognize this input and no special attention is required.
The amount of novelty is related to its influence in drawing attention to itself. If there is not a lot going on then large scale attention could be drawn to this input. On the other hand, people report that in a pitched fight they did not notice that they were shot until after the battle.
Going the other way - meditation seeks to remove all inputs so you are receptive to the smallest internal state forming.
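Here is a toy reading of this answer, with both functions hypothetical. A relay signals "burst" when some of its input is not recognized and "tonic" otherwise. The share of attention a novel input draws falls as other activity rises. The ratio used in `attention_share` is an assumption chosen only to capture the battlefield and meditation examples qualitatively.

```python
def relay_mode(input_bits, recognized_bits):
    """'burst' when some input is novel (not recognized), otherwise 'tonic'."""
    novel = set(input_bits) - set(recognized_bits)
    return "burst" if novel else "tonic"

def attention_share(novelty, other_activity):
    """Share of attention drawn by a novel input (hypothetical ratio)."""
    total = novelty + other_activity
    return novelty / total if total > 0 else 0.0

quiet_room = attention_share(1.0, 0.0)     # little going on: novelty dominates
pitched_fight = attention_share(1.0, 99.0)  # lots going on: novelty is swamped
meditation = attention_share(0.01, 0.0)     # inputs removed: tiny states stand out
```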
Q: Why would you want L5 to process exactly in sync, minicolumn-wise, with L2/3?
A: This was tripping me up as well. If L2/3 activity is more stable than L5, then it cannot be driving the minicolumns for L5 below it. Instead, it makes more sense for it to be driving (or at least contributing to) the active minicolumns in the next map this one is outputting to.
Q: This bit is actually the hardest problem that I see that HAS to be resolved to offer a complete theory.
How to integrate the layers! Still wishing for that nice direct L5 → L2/3 signaling path …
Q of Q: Would you be so kind as to explain the issue here to me? And which phenomenon are you trying to match, exactly?
A: Let’s start at the very beginning: stock HTM.
This centers on the L5 structure learning some pattern on the apical dendrites and a separate pattern on the body dendrites. I refuse to get caught up in trying to differentiate “proximal” from other body dendrites.
If we see a pattern that we know on the apical dendrites, we are primed to fire faster than our neighbors when we recognize some pattern on our proximal dendrites, and we use our connected chandelier cell to stifle those neighbors.
Then it’s time to go out into the neighborhood and battle it out with other columns to see who is the most sincere about recognizing the local input: spatial pooling. Now we fire off the competition using basket cells; in the end, there can be only one! (in this neighborhood)
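The two competitions described here can be sketched as nested winner-take-all steps. Inside a minicolumn, apical priming gives a cell a head start, and the cell that wins silences the others (the chandelier step). Across the neighborhood, the strongest minicolumn wins (the basket-cell step). This follows the description in this post rather than stock HTM, where an unpredicted minicolumn bursts; `head_start` and the function names are hypothetical.

```python
def minicolumn_winner(proximal, apical_predicted, head_start=0.5):
    """Chandelier step: the most excited cell wins and silences the rest.

    proximal         -- proximal match score for each cell in the minicolumn
    apical_predicted -- True for cells whose apical pattern is recognized
    head_start       -- extra excitation from apical priming (hypothetical)
    """
    scores = [p + (head_start if a else 0.0) for p, a in zip(proximal, apical_predicted)]
    best = max(range(len(scores)), key=lambda i: scores[i])
    return best, scores[best]

def neighborhood_winner(minicolumns, head_start=0.5):
    """Basket-cell step: one (minicolumn, cell) winner per neighborhood."""
    results = [minicolumn_winner(p, a, head_start) for p, a in minicolumns]
    col = max(range(len(results)), key=lambda i: results[i][1])
    return col, results[col][0]

# Minicolumn 0 has a primed cell; minicolumn 1 has a better raw proximal match.
neighborhood = [([0.6, 0.6], [False, True]), ([0.9, 0.2], [False, False])]
```

With priming, cell 1 of minicolumn 0 wins the neighborhood; without it, minicolumn 1 would win on its raw match.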
I propose to enhance this spatial pooling with hex-grid cells. You know the basics of the hex-grid, but the question stands: if L5 wins, how does that winning get up to L2/3 to start the hex-grid formation?
Sure, the hex-grid is great for recognizing patterns, but it can’t predict. That is the superpower of L5.
Q: Okay. So, merging HTM & Calvin, right? Simply L2/3 sensing something, anything, no matter what or where, will attract its response to some griddy pattern, right? Thus I don’t see a specific match to be drawn between tiny activation spots in L5 and tiny activation spots in L2/3, requiring that they share the same minicolumn. Do you?
A: Yes. The brain invests a huge amount of resources in this predictive thing in L5 and in the griddy thing in L2/3.
It is entirely possible that ALL the L5 thing does is sense surprise and kick that down to the thalamus, to be returned to the L2/3 in thalamus-> cortex projections to L1. I was sort of hoping that there was a more direct connection.
The POM layer does shoot right up to L1.
Q: Ah ? Lateral voting would mostly be L2/3 in my view
A: Yes that is what I am talking about. L2/3 votes on a higher level abstraction, meaning many neighboring maps agree on what is the thing that is being experienced. The representation of that thing then needs to bias a set of lower level components of that higher level thing, in another layer which is doing TM (in this context, that being L5). Then a second signal (can’t be the same one, because it is a different level of abstraction) needs to predict the next lower level thing. Temporal unfolding can then execute as both biasing signals coincide during each component of the sequence.
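Here is a sketch of temporal unfolding under two assumptions of ours. The L2/3 vote is represented as the set of components belonging to the recognized object, and L5's TM as a map from each element to the elements it predicts next. A component activates only when both biases coincide, which also resolves branches that TM alone could not. The names are hypothetical.

```python
def unfold(object_components, transitions, start):
    """Replay a sequence in which each step needs two coinciding biases.

    object_components -- components biased by the higher-level object (L2/3 vote)
    transitions       -- element -> set of elements predicted next (L5 TM)
    start             -- first component of the sequence
    """
    sequence, current = [start], start
    while True:
        # Only a component that is BOTH part of the object AND predicted next activates.
        candidates = transitions.get(current, set()) & object_components
        if len(candidates) != 1 or candidates <= set(sequence):
            break  # no unique coinciding candidate, or we would loop
        current = next(iter(candidates))
        sequence.append(current)
    return sequence

# TM alone is ambiguous after A (B or Q); the object context picks the branch.
transitions = {"A": {"B", "Q"}, "B": {"C"}, "C": {"D"}}
```

With the object {A, B, C} the replay is A, B, C; with the object {A, Q} the same transitions unfold as A, Q.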
Comment: An interesting point is that signals in L1 would in fact be a mixture of activity from L2/3 at a higher level of abstraction as well as activity before it passes through L2/3 (i.e. at a lower level of abstraction). In theory, this could enable two (or more) separate signals to be broadcast to apical dendrites reaching up from the lower layers. So, yeah… maybe we could do away with distal dendrites and do everything on the apicals…
Adding a pointer to the post above that describes the training algorithm for L2/3 grid-forming cells: