What do you mean? The TM doesn’t activate minicolumns in HTM theory.
Yup, there may have to be some very slight revisions.
The mini-columns are getting multiple inputs on the apical dendrites, as all layers with apical presence are doing their own thing.
TM is trying to match the last state against the present state; L2/3 is trying to match the local pattern AND the inputs from lateral axonal projections AND the output from TM.
If the TM layer is really sure of itself it gets a strong voice in this voting.
All this is being integrated in L2/3 and sent back out through the lateral axonal projections to fight it out using inhibitory interneurons, both chandelier and basket.
After the 4 gamma rounds the winners stand proud as a hex-grid (if they are able to form one) to signal the winner of this alpha state, which is what is projected to the attached maps via outgoing L2/3 inter-map connections AND whatever projections (bursting or tonic) come out of L5 going down to the thalamus.
L5 does its local activation wave back-prop training thing; L2/3 does the training thing outlined in post 98 above.
Then it all starts again for the next alpha cycle.
I may be interpreting this incorrectly, but in my mind, L2/3 is equivalent to the “object layer” in TBT. Please note that I am using the term “object” very loosely here (it could range anywhere from a full complex object as current HTM theory suggests to a unique perspective of a complex object, to some collection of primitives).
In any case, the activity in this layer is more stable than the activity in whichever layer is performing TM below it. In other words, the representation for an object is less granular than the representation for the current component of that object, and it cannot be what is activating the minicolumns representing that current component. It can, however, be itself a feature of an object represented by the next map.
Thus, I believe this must work by L2/3 performing the pooling and sparsification actions for the next map (not this map). I would visualize the basic relationship between two maps something like this:
Where L2/3 is performing an algorithm equivalent to SP and TP for the next map.
Adding a note to this & repeating a point I mentioned in the live-stream today - this hex-grid model is only valid in layers after the primary sensory cortex.
This will be all about processing map-in -> map-out streams.
@Bitking @Paul_Lamb So do you guys think that Spatial Pooling is happening in sensory regions? Does this idea of hex grids invalidate any of our previous papers? Or could it all be happening together?
I may differ from @Bitking, but personally I am a strong believer in the theory that the cortical circuit is universal across the cortex, and any differences are primarily driven by where the input is coming from. As such, my initial impression is that L2/3 should be running the same algorithm wherever it happens to be found (including in sensory regions). I am not as familiar with the neuroscience, though, so probably not the best person to answer this.
Yes, there are differences in morphology of the neural zoo all over the brain.
In fact, the original Brodmann cortical maps were based on visible differences in Cytoarchitecture.
It may(?) be that Numenta papers are specific to the sensory cortex and have to be extended to apply to other regions. I have done as much as I can to point to what I think is the way forward.
I talked to @Bitking today on my stream:
I think you are correct. An example is [1]. TL;DR: a child missing V1 and part of V2 (due to a condition in the first two weeks of life) is able to “see” using MT (the middle temporal area). So the algorithm seems to be “common” (and potentially the parameters are also the same in the prime state).
[1] I. C. Mundinano et al., “More than blindsight: Case report of a child with extraordinary visual capacity following perinatal bilateral occipital lobe injury,” Neuropsychologia, vol. 128, no. November 2017, pp. 178–186, 2019.
Well, MT is still in posterior cortex, which is a broadly defined sensory cortex. Things may be different in frontal (motor) cortex.
Coming from same standpoint as Paul, even though a lot of my views have been shaken…
Still wishing to play devil’s advocate for a minute: even if we’re right in thinking parts of biological cortex can be substituted by others, and it were proven that all individual cells follow the same per-layer rules, that’s a long way from having proven they can be functionally abstracted the same way across all areas. Different kinds of sensors may have qualitatively different modalities at such a level. And by the same token, areas really high up, or motor areas, may be distinct also.
In case of V1, I’m raising concerns about specific rhythms I’ve read about recently. Rhythms are especially likely to have an impact on the formation of a griddy-resonance-thing, or lack thereof.
I see a lot of this rhythm arising from the thalamus interacting with the layers.
Yes, I have read that cortex in a dish starts resonating all by itself, but I think that the thalamus is a master conductor that coordinates how different areas process “the contents of consciousness” so that spike timing can work to transfer the information as learning between areas.
As far as areas taking over for each other - when I see connectivity maps between different areas almost everything is connected to a huge number of other areas. It is well documented that if you nip off a monkey finger that the cortical areas that used to serve that finger eventually morph to represent other skin areas.
I will go further and state that this is the basis for Synesthesia.
Good analogy. Neocortex probably doesn’t drive itself. It resonates and informs itself, but at the will of the thalamus. I hesitate to use the word “will”.
I think you are right :). I see thalamus, to a large extent, as a battle ground or market place between cortical areas.
That’s interesting. But these overlapping connections should normally be latent / inhibited, so synesthesia is still a disorder.
Hi, I watched the recording. A quick summary of what I am working on currently – attempting to implement an algorithm based on input from @Bitking. This first pass will be an over-simplified Frankenstein hybrid, but hopefully will spark further iterations.
At a high level, what I am building is a layer of cells which operates an algorithm that processes distributed semantics in two ways – spatially (relating two or more bits which are physically distant from each other in a single input of the stream) and temporally (relating two or more bits which are active in separate inputs of the stream).
One input to this layer is the output of a TM layer. Another input is lateral connections from neighboring maps. These inputs will start a cycle of four grid-forming competitions (these four competitions run within a single tick from the TM layer’s perspective).
Within each competition, the contenders generate fields of inhibition around themselves. If one of these contender cells is within the sweet spot distance of other contenders, it gets boosted to help counteract any other closer cells which are trying to inhibit it.
At the end of the competitions, the layer should have settled onto a particular sparse grid pattern. Learning rules are applied, and this sparse activation is the output of this map. It becomes the input for the next map(s), and activates the minicolumns for the TM layer in the next map.
I have been having long free-wheeling chats with @Paul_Lamb and @gmirey. We have been working through basic neurology, griddy things, implementation details, and possible applications. I have walked through a very long chat stream and pulled out some of the more interesting exchanges and comments. Anyone trying to understand the hex-grid concept and how it fits in with the cortex may find something of interest in this wall of text. Or not.
Background, picking out some key features of the original hex-grid post that started this thread:
The lateral axonal projections originate from the same layer area, rise at about the same angle, and target about the same layer area. This defines a circular band around the cells in a mini-column. There is natural variety in all three of these parameters and this is a feature, not a bug, as this allows different scale/angle patterns to be learned. As to the phasing variation: Numenta spends its time with primary sensory areas. These are anchored to the incoming sensory fibers and I agree that the positions are fixed. There is absolutely nothing in the structure that restricts a hex-grid node from being centered on any mini-column.
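The three free parameters mentioned here (spacing, angle, phase) can be pictured with a small Python sketch that lists a node and its six nearest hex-grid neighbors. The function name `hex_grid_nodes` and its parameters are hypothetical, and an idealized, perfectly regular lattice is assumed; real projections would be noisier.

```python
import math


def hex_grid_nodes(spacing, rotation_deg, phase, rings=1):
    """Return (x, y) nodes of an ideal hexagonal lattice centered on `phase`.

    spacing      -- distance between neighboring nodes (the "scale")
    rotation_deg -- orientation of the lattice (the "angle")
    phase        -- (x, y) of the central node; any mini-column may serve
    rings        -- how many rings of neighbors to include
    """
    theta = math.radians(rotation_deg)
    # Two basis vectors of equal length, 60 degrees apart.
    a = (spacing * math.cos(theta), spacing * math.sin(theta))
    b = (spacing * math.cos(theta + math.pi / 3),
         spacing * math.sin(theta + math.pi / 3))
    nodes = []
    for i in range(-rings, rings + 1):
        for j in range(-rings, rings + 1):
            if abs(i + j) <= rings:  # keep a hexagonal (not rhombic) patch
                nodes.append((phase[0] + i * a[0] + j * b[0],
                              phase[1] + i * a[1] + j * b[1]))
    return nodes


nodes = hex_grid_nodes(spacing=3.0, rotation_deg=15.0, phase=(5.0, 5.0))
```

With `rings=1` this yields the central node plus six neighbors, all at distance `spacing` from the center. Changing `phase` slides the whole grid without changing its shape.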
For a given pair of active nodes that will win the sparsity competition: as the axonal projections rise through the inhibitory layer they will both be signaling less excited cells to go to sleep. The cells that are recognizing parts of a larger pattern will have the recognition of whatever pattern they are seeing AND the excitement from other cells that are also seeing whatever part they are sensing. Contrast this with cells that are just seeing some bit of pattern they have learned. Without the lateral connections their level of excitement will be much less, ensuring that they will lose the sparsity competition.
I am less sure of how L5 enters into this but I speculate that this output lends even more excitement to the sparsity competition, ensuring that the pattern association paired with the prediction gets an extra push to form the grid that best matches the prediction.
This was the transition feature that drew me to the work of Numenta in the first place. I could see how a pattern would form a Calvin tile, but what would preserve that tile when the pattern completely changes, as in a saccade? The prediction cell is the missing element.
Summary: a hex-grid node activation formula = (THRESHOLD 2((each reciprocal connection) PLUS (a local recognized pattern) PLUS (a recognized prediction in TM-L5)))
I surmise that a “hole” in the grid is driven into activity by a strong surrounding hex-grid pattern.
The L5 prediction cell is exactly what Numenta describes. The L5 bursting/ learning is about the same as Numenta suggests, but with the enhancement of thalamus gating and reinforcement I outlined earlier.
I would add a habituate factor so a hex-grid has a little persistence but without the local match or prediction match it goes away fairly quickly. I am not sure how long this period should be but I suspect that if I mine the biology literature there is already good research that would provide guidance. I see this as an accumulator that increments on each update cycle and is subtracted from the activation potential sum.
This is the same way that the interaction with the local inhibitory field is used. Each local inhibitory cell feeds a minus value into the cells within its topology - I could see it being some scaled/thresholded output of the activation it is receiving from lateral axonal connections.
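As a rough illustration of the activation formula with the habituation accumulator and the inhibitory minus value, here is a minimal Python sketch. The class `HexGridNode`, the helper `lifetime`, the unit weight given to every term, the habituation step of 1.0, and the full reset when the node falls silent are all assumptions made for the example; the only number taken from the formula is the threshold of 2.

```python
from dataclasses import dataclass


@dataclass
class HexGridNode:
    threshold: float = 2.0         # the "THRESHOLD 2" of the summary formula
    habituation_step: float = 1.0  # assumed increment per active update cycle
    habituation: float = 0.0       # accumulator subtracted from the sum

    def update(self, reciprocal_inputs, local_match, prediction_match,
               inhibition=0.0):
        """Return True if the node fires on this update cycle."""
        total = reciprocal_inputs + int(local_match) + int(prediction_match)
        total -= self.habituation + inhibition
        fired = total >= self.threshold
        if fired:
            self.habituation += self.habituation_step
        else:
            self.habituation = 0.0  # assumption: a silent node recovers fully
        return fired


def lifetime(node, **inputs):
    """Count consecutive cycles the node keeps firing under constant inputs."""
    cycles = 0
    while node.update(**inputs):
        cycles += 1
    return cycles


# A node held up only by six active grid neighbors, with no sensed input.
resonating = lifetime(HexGridNode(), reciprocal_inputs=6,
                      local_match=False, prediction_match=False)
# The same node while the local pattern and the prediction keep matching.
supported = lifetime(HexGridNode(), reciprocal_inputs=6,
                     local_match=True, prediction_match=True)
```

Under these assumed numbers the unsupported grid node persists for a few cycles and then drops out, while continued local and prediction matches extend its lifetime. A cell with only a local match and no lateral support never reaches the threshold.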
Of course this is all predicated on the structure being a 2D sheet like cortex. I have no clue how you might use this with a different topology.
Q: If L2/3 is representing the subject of what is being experienced, then the representations that it forms must be more stable (changing less frequently) than the representations that are forming in L5 where temporal memory is occurring.
A: Ok - if you start from this then I can see why this is confusing.
At the column level there are only three points in time: the prediction from the last step, now and the next predicted time step.
Again, some of the issue comes from trying to put everything in a single column - it’s peristaltic, where the output from this layer in this map is the input to that layer in that map. Once you realize that the H of HTM needs to use some of the layers to do communication and object representation, things start to make more sense.
Remember, the rising axon in the cortex is available to signal to all layers from the prior maps as required. After the primary sensory map, all following maps are receiving projections from two or more maps to process one stream against another. The name “stream” implies that it is just map to map, but that is the wrong way to look at it; it is the combination of maps that is being processed.
See the images towards the bottom of this post here:
Q: Do you see any maps with one input and one output?
A: Yes, there are a couple and I think that those are doing some special function not related to the general cortical column computation. The rest are taking multiple inputs and processing those to some output.
Q: I guess the core of my question is where does temporal pooling occur? How do collections of inputs over time become one representation. If all layers are operating on the same time scale, it doesn’t matter how many times you mix and match and combine inputs between different maps – you would only ever be pooling spatial information.
A: Grid resonance; Looking at the activation formula again (below) - you can see that a fully formed grid with no additional sensed input should exceed the firing potential. I would add a habituate factor so a hex-grid has a little persistence but without the local match or prediction match it goes away fairly quickly. I am not sure how long this period should be but I suspect that if I mine the biology literature there is already good research that would provide guidance. I see habituation as an accumulator that increments on each update cycle and is subtracted from the potential activation sum. Continued input extends this lifetime.
Summary: a hex-grid node activation formula = (THRESHOLD 2((each reciprocal connection) PLUS (a local recognized pattern) PLUS (a recognized prediction)))
I surmise that a “hole” in the grid is driven into activity by a strong surrounding hex-grid pattern.
Temporal pooling is an emergent behavior.
Comment: This is what I mean by temporal pooling, and also what I have been assuming. This persistence means the representation of the grid is changing less frequently than its constituent input representations. The temporal differential results in TP.
Comment: The temporal pooling bit explains why you have to stop thinking about something for a bit to let the pattern dissipate when you know it is not the answer you are looking for. That grid pattern residue lurks a while until all the connections go back to baseline excitation.
Comment: For a resonating grid without any new supporting input the suppressed cells are receiving strong input and they act to suppress the current hex-grid and establish a new one quickly.
Q: This leads to two potential gaps in my understanding, however. Before I try to explain them further, however, let me begin by asking, in your theory is L5 forming a temporal memory of activity from L2/3, or is it forming a temporal memory of a constituent part of the activity of L2/3? (i.e is it forming a temporal memory of the input space or of the output space?)
A: I see them as a team -
one says - OH - I know this pattern!,
the other says - pffft -I know what is going to happen next!
The driver for the local L2/3 SP is the combination of both activations; there are several potential candidates for how L5 drives L2/3.
Comment: In my mind I see the stream of V1 saccades accumulating in the association regions to stabilize on a representation that means whatever you are scanning with your eyes - every part being the bit that that part of the retina is seeing at that point of the image, in the sequence that it is being scanned, each being a sequence of micro-features (WHAT). As the object is closer or further you still see essentially the same features, just at slightly different locations (WHERE). This gives some measure of object distance or size invariance.
This is the thing that artists play with by making objects much larger or smaller than you would normally expect.
The entire collection of extracted features accumulates in the association region in the SP + TP representation and the grid formed is the code for the object.
Comment: I will add this:
Yes, L5/TM sees A-> B-> C and learns the transitions
Is there any particular reason that L2/3/SP would not also learn A, B, & C as static patterns? It has an apical feed to sense the rising axons. The proximal dendrites could be dedicated to the SP function of lateral axonal projections. I don’t know that anyone has mapped out these connections - or at least I have not seen a paper on it yet.
Combine both and you have high confidence that we know this pattern and we know that we are inside a sequence so our confidence is much higher as both layers agree that this is something we know.
Comment: Sure, we see the lateral connections here in THIS map. This other map over there? All it sees is some sparsely spaced activation spikes on the fiber bundle projecting to its L1 field. It will ALSO have at least one other bundle of fibers also projecting some sparsely spaced pattern on its bundle of fibers. It may end up forming its own hex-grid but it most likely will not be a copy of this grid. It will have its own phase/spacing/rotation pattern.
Q: By TP algorithm you mean increasing the temporal stability of an area’s output, all the while being able to react to change (online) when needed ?
A: Correct. For some time it has been clear to me that distributed semantics can be distilled from an input stream in two ways – spatially (relating two or more bits which are physically distant from each other in a single input of the stream) and temporally (relating two or more bits which are active in separate inputs of the stream).
In order to bring together these relationships in a hierarchy, it is necessary to “pool” them in some way. Classic HTM has addressed half of the problem via the Spatial Pooler algorithm. My quest for the last three years has been to address the other half of the problem by developing a Temporal Pooler algorithm. I have developed a few TP algorithms, but so far all have had flaws and fallen short of the concept that I have in my mind for how TP should work.
Performing both SP and TP in the same algorithm is a new epiphany for me, so feeling like I am at least on the right track now.
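The two kinds of relationship described above can be shown on a toy stream. This is only a sketch: `spatial_pairs` and `temporal_pairs` are hypothetical helper names, each input is modeled as a set of active bit indices, and "separate inputs" is narrowed to consecutive inputs for simplicity.

```python
from itertools import combinations, product


def spatial_pairs(stream):
    """Pairs of bits that are active together within a single input."""
    pairs = set()
    for active in stream:
        pairs.update(combinations(sorted(active), 2))
    return pairs


def temporal_pairs(stream):
    """Pairs (earlier_bit, later_bit) active in consecutive inputs."""
    pairs = set()
    for previous, current in zip(stream, stream[1:]):
        pairs.update(product(sorted(previous), sorted(current)))
    return pairs


# Toy stream: bits 1 and 40 are physically distant but co-active,
# and they are followed in time by bits 7 and 90.
stream = [{1, 40}, {7, 90}, {1, 40}]
within_input = spatial_pairs(stream)
across_inputs = temporal_pairs(stream)
```

A spatial pooler, in this picture, learns from `within_input`; a temporal pooler would have to learn from something like `across_inputs`.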
Q: So… you propose that, without a TM-like prediction, all cells in that L2/3 minicolumn still fire kinda together, by virtue of the inhibitory fields pushing the possible “activation spot” towards that whole minicolumn ?
A: Yes, I am saying that L2/3 recognizes here-and-now patterns all by itself IN ADDITION to the TM model that is proposed by JH.
The firing law for L2/3 is a summation of:
L5 (Apical pattern prediction matching proximal inputs)
and
L2/3 (apical pattern recognition)
and
grid forming input (proximal sensing of lateral projections)
Weighting for each to be determined; I suspect that this may be a dynamic thing including habituation.
Q: The hypothesis here, I hoped you would confirm (or disprove) before I could focus on anything else (and L5 etc), was more along the line of : are whole L2/3 multi-cell assemblies in same column potentially all-“active”, in your view ? (that is, before an HTM-like prediction kicks-in to only activate one predictive cell in that minicolumn).
A: As I see it, each cell in a mini-column is a prima donna in each level. They act collectively both up and down the column, and within a layer. Firing on any one hits the chandelier cells that shut down the rest of the similar cells in that layer. Both L2/3 and L5 have these things attached to the axon hillock.
Q: Interesting idea. I haven’t thought about those lines at all, I shall try and see what could come out of it. However, I have the feeling that a TM in L2/3 would jeopardize the “Temporal Pooling” functionality (as a stabilizer over several ticks) that (I believe) Paul and I hoped to cram into that L2/3 layer.
A: We are diverging in understanding, TM is still firmly in L5, and L2/3 is firmly involved with SP and sparsity. If you have any other understanding of my words I am at fault here.
Q: Do you see parts of thalamus currently having searchlight attention as tonic, and thus the quiescent parts having their relay cells in bursting mode, or the other way around ?
A: I see bursting as the bit that signals novelty, which from a functional point of view is worthy of further attention. So, following this line, tonic is the signal that we recognize this input and no special attention is required.
The amount of novelty is related to its influence in drawing attention to itself. If there is not a lot going on then large scale attention could be drawn to this input. On the other hand, people report that in a pitched fight they did not notice that they were shot until after the battle.
Going the other way - meditation seeks to remove all inputs so you are receptive to the smallest internal state forming.
Q: Why would you want L5 to process exactly in sync, minicolumn-wise, with L2/3 ?
A: This was tripping me up as well. If L2/3 activity is more stable than L5, then it cannot be driving the minicolumns for L5 below it. Instead, it makes more sense for it to be driving (or at least contributing to) the active minicolumns in the next map this one is outputting to.
Q: This bit is actually the hardest problem that I see that HAS to be resolved to offer a complete theory.
How to integrate the layers! Still wishing for that nice direct L5 → L2/3 signaling path …
Q of Q: Would you be so kind as to explain to me the issue here ? And which phenomenon you are trying to match, exactly.
A: Let’s start at the very beginning: stock HTM.
This centers on the L5 structure learning some pattern on the apical dendrites and a separate pattern on the body dendrites. I refuse to get caught up in trying to differentiate “proximal” from other body dendrites.
If we see a pattern that we know on the apical dendrites we are primed to fire faster than our neighbors if we recognize some pattern on our proximal dendrites and use our connected chandelier cell to stifle our neighbors.
Then it’s time to go out on the neighborhood and battle it out with other columns to see who is the most sincere about recognizing the local input - spatial pooling. Now we fire off competition using basket cells - in the end - there can be only one! (in this neighborhood)
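The two competitions just described can be sketched in a few lines of Python. This is a simplified reading, not Numenta's implementation: `winning_columns` and `active_cells` are hypothetical names, columns sit on a 1D line instead of a 2D sheet, the neighborhood size and cell count are assumed, and the column competition is run first with the within-column step applied to the winners.

```python
def winning_columns(overlaps, neighborhood=2):
    """Columns whose overlap is the best within their local neighborhood.

    "There can be only one" per neighborhood: ties go to the earlier column.
    """
    winners = []
    for i, score in enumerate(overlaps):
        lo = max(0, i - neighborhood)
        window = overlaps[lo:i + neighborhood + 1]
        if score > 0 and window.index(max(window)) == i - lo:
            winners.append(i)
    return winners


def active_cells(column, predicted, cells_per_column=4):
    """Cells that fire in a winning column.

    Primed (predicted) cells fire first and stifle their neighbors;
    with no primed cell the whole column fires (bursting).
    """
    primed = predicted.get(column, set())
    if primed:
        return set(primed)
    return set(range(cells_per_column))


overlaps = [3, 7, 2, 0, 5, 6, 1]  # how well each column matches its input
predicted = {1: {2}}              # column 1 has one cell primed
active = {c: active_cells(c, predicted) for c in winning_columns(overlaps)}
```

In this toy run column 1 wins its neighborhood and fires only its primed cell, while column 5 wins without a prediction and bursts.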
I propose to enhance this spatial pooling with hex-grid cells. You know the basics of the hex-grid but the question stands - if the L5 wins how does that winning get up to the L2/3 to start the hex-grid formation?
Sure - the hex-grid is great for recognizing patterns but it can’t predict. That is the super power of L5.
Q: Okay. So merging HTM & Calvin, right. Simply L2/3 sensing something, anything, no matter what or where, will attract its response to some griddy pattern, right ? Thus I don’t see a specific match to be drawn between tiny activation spots in L5 and tiny activation spots in L2/3, requiring that they’d share same minicolumn. Do you ?
A: Yes, The brain invests a huge amount of resources to get this predictive thing in L5, and the griddy thing in L2/3.
It is entirely possible that ALL the L5 thing does is sense surprise and kick that down to the thalamus, to be returned to the L2/3 in thalamus-> cortex projections to L1. I was sort of hoping that there was a more direct connection.
The POM layer does shoot right up to L1.
Q: Ah ? Lateral voting would mostly be L2/3 in my view
A: Yes that is what I am talking about. L2/3 votes on a higher level abstraction, meaning many neighboring maps agree on what is the thing that is being experienced. The representation of that thing then needs to bias a set of lower level components of that higher level thing, in another layer which is doing TM (in this context, that being L5). Then a second signal (can’t be the same one, because it is a different level of abstraction) needs to predict the next lower level thing. Temporal unfolding can then execute as both biasing signals coincide during each component of the sequence.
Comment: An interesting point is that signals in L1 would in fact be a mixture of activity from L2/3 at a higher level of abstraction as well as activity before it passes through L2/3 (i.e. at a lower level of abstraction). In theory, this could enable two (or more) separate signals to be broadcast to apical dendrites reaching up from the lower layers. So, yeah… maybe could do away with distal dendrites and do everything on the apicals…
Adding a pointer to the post above that describes the training algorithm for L2/3 grid-forming cells:
That is because MT is repurposing the pulvinar nuclei to “forward” inputs (instead of the LGN). MT is quite high up in the hierarchy of the visual cortex. In any case, let’s “solve” sensory cortex before jumping to the FC.
Thanks for this very inspiring discussion.
It is sometimes hard to follow, but always rewarding in the end when you are progressively understanding some of the different pieces.
From what I have read in this discussion:
1/ Gamma (25-140 Hz) oscillations are mainly associated with L2/3, and alpha (8-12 Hz) oscillations with L5
2/ L2/3 representations are more stable than L5 representations
Don’t you see a contradiction between those 2 points ?
Or maybe the L2/3 representation of an object/concept corresponds to a sequence of dynamic hex-grid patterns changing at every gamma tick ? But I don’t think that was your idea.
From my perspective, I think of the object layer as a battleground over the higher level concept being sensed. A series of short battles end in winners which squelch everyone else, and it is the winners which are more stable than the representations of the TM layer. Of course this implies that the higher level concept is something which has been learned. If it is something completely unrecognized, then the activity would of course be unstable until new relationships were learned.
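For what it is worth, the timing can be checked with simple arithmetic using the frequency ranges quoted in the question. The specific picks of 40 Hz and 10 Hz are assumptions chosen from inside those ranges, and `gamma_rounds_per_alpha` is just an illustrative helper.

```python
def gamma_rounds_per_alpha(gamma_hz, alpha_hz):
    """Number of gamma cycles that fit inside one alpha cycle."""
    return gamma_hz / alpha_hz


typical = gamma_rounds_per_alpha(40.0, 10.0)  # assumed mid-range values
fewest = gamma_rounds_per_alpha(25.0, 12.0)   # slowest gamma, fastest alpha
most = gamma_rounds_per_alpha(140.0, 8.0)     # fastest gamma, slowest alpha
```

With the assumed 40 Hz and 10 Hz, four gamma rounds fit in one alpha cycle, which lines up with the "4 gamma rounds" mentioned earlier in the thread. On that reading the gamma ticks are the short battles, and the winner only needs to be stable at the alpha rate.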