I see a lot of this rhythm arising from the thalamus interacting with the layers.
Yes, I have read that cortex in a dish starts resonating all by itself, but I think that the thalamus is a master conductor that coordinates how different areas process “the contents of consciousness” so that spike timing can transfer information, as learning, between areas.
As far as areas taking over for each other: when I look at connectivity maps between different areas, almost everything is connected to a huge number of other areas. It is well documented that if you nip off a monkey’s finger, the cortical areas that used to serve that finger eventually morph to represent other skin areas.
I will go further and state that this is the basis for synesthesia.
Hi, I watched the recording. A quick summary of what I am working on currently – attempting to implement an algorithm based on input from @Bitking. This first pass will be an over-simplified Frankenstein hybrid, but hopefully will spark further iterations.
At a high level, what I am building is a layer of cells which operates an algorithm that processes distributed semantics in two ways – spatially (relating two or more bits which are physically distant from each other in a single input of the stream) and temporally (relating two or more bits which are active in separate inputs of the stream).
One input to this layer is the output of a TM layer. Another input is lateral connections from neighboring maps. These inputs will start a cycle of four grid-forming competitions (these four competitions run within a single tick from the TM layer’s perspective).
Within each competition, the contenders generate fields of inhibition around themselves. If one of these contender cells is within the sweet spot distance of other contenders, it gets boosted to help counteract any other closer cells which are trying to inhibit it.
At the end of the competitions, the layer should have settled onto a particular sparse grid pattern. Learning rules are applied, and this sparse activation is the output of this map. It becomes the input for the next map(s), and activates the minicolumns for the TM layer in the next map.
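A rough sketch of the competition cycle described above, in Python with NumPy. Everything here is an assumption for illustration: the function name `grid_competition`, the inhibition radius, the sweet-spot distance and tolerance, the boost size, and the rule that a contender survives a round unless a rival inside its inhibitory field is strictly more excited. It is a toy, not the implementation being built.

```python
import numpy as np


def grid_competition(drive, inhibit_radius=2.0, sweet_spot=4.0, tolerance=0.5,
                     boost=0.5, rounds=4):
    """Run `rounds` grid-forming competitions on a 2D sheet of cells (hypothetical).

    drive: 2D array of excitation per cell (e.g. TM output plus lateral input).
    Returns a boolean array marking the cells still active after the last round.
    """
    h, w = drive.shape
    ys, xs = np.mgrid[0:h, 0:w]
    coords = np.stack([ys.ravel(), xs.ravel()], axis=1).astype(float)
    # Pairwise distances between all cells (O(N^2), fine for a toy sheet).
    dist = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    near = (dist > 0) & (dist < inhibit_radius)     # inhibitory field
    sweet = np.abs(dist - sweet_spot) <= tolerance  # sweet-spot ring

    score = drive.ravel().astype(float)
    active = score > 0
    for _ in range(rounds):
        # Boost each cell by the number of active contenders at the sweet-spot distance.
        effective = score + boost * sweet[:, active].sum(axis=1)
        winners = np.zeros_like(active)
        for i in np.flatnonzero(active):
            rivals = near[i] & active
            # A contender survives unless a nearby rival is strictly more excited.
            if not np.any(effective[rivals] > effective[i]):
                winners[i] = True
        active = winners
    return active.reshape(h, w)


drive = np.zeros((1, 9))
drive[0, 0], drive[0, 1], drive[0, 4] = 1.0, 1.2, 1.0
print(np.flatnonzero(grid_competition(drive)))            # boosted result
print(np.flatnonzero(grid_competition(drive, boost=0.0)))  # plain winner-take-all
```

In this example the cell at position 1 is the most excited, but the cells at 0 and 4 sit at the sweet-spot distance from each other, so their mutual boost lets position 0 beat its closer neighbour. With `boost=0.0` the plain winner-take-all result keeps position 1 instead.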
I have been having long free-wheeling chats with @Paul_Lamb and @gmirey. We have been working through basic neurology, griddy things, implementation details, and possible applications. I have walked through a very long chat stream and pulled out some of the more interesting exchanges and comments. Anyone trying to understand the hex-grid concept and how it fits in with the cortex may find something of interest in this wall of text. Or not.
Background: some key features of the original hex-grid post that started this thread:
The lateral axonal projections originate from the same layer area, rise at about the same angle, and target about the same layer area. This defines a circular band around the cells in a mini-column. There is natural variety in all three of these parameters, and this is a feature, not a bug, as it allows different scale/angle patterns to be learned. As to the phasing variation: Numenta spends its time with primary sensory areas. These are anchored to the incoming sensory fibers, and I agree that there the positions are fixed. There is absolutely nothing in the structure that restricts a hex-grid node from being centered on any mini-column.
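To make the geometry concrete, here is a small Python sketch that lays out hex-grid node positions from three hypothetical parameters (spacing, rotation angle, phase offset) and checks whether a target lies in the circular band reached by a cell's lateral axons. The function names and the band radii are assumptions; varying the three parameters gives the different scale/angle/phase grids mentioned above.

```python
import math


def hex_grid_nodes(spacing, angle_deg, phase, width, height):
    """Return hex-grid node positions (x, y) inside a width x height sheet.

    spacing: distance between neighbouring nodes (set by the lateral axon reach).
    angle_deg: rotation of the lattice.
    phase: (dx, dy) offset, so a node can be centred on any mini-column.
    """
    a = math.radians(angle_deg)
    # Two basis vectors 60 degrees apart define a triangular (hex-grid) lattice.
    b1 = (spacing * math.cos(a), spacing * math.sin(a))
    b2 = (spacing * math.cos(a + math.pi / 3), spacing * math.sin(a + math.pi / 3))
    n = int(2 * max(width, height) / spacing) + 2  # enough lattice steps to cover the sheet
    nodes = []
    for i in range(-n, n + 1):
        for j in range(-n, n + 1):
            x = phase[0] + i * b1[0] + j * b2[0]
            y = phase[1] + i * b1[1] + j * b2[1]
            if 0 <= x < width and 0 <= y < height:
                nodes.append((round(x, 6), round(y, 6)))
    return nodes


def in_lateral_band(src, dst, r_min, r_max):
    """True if dst lies in the circular band targeted by src's lateral axons."""
    return r_min <= math.dist(src, dst) <= r_max


print(hex_grid_nodes(10, 0, (0, 0), 25, 25))
print(in_lateral_band((0, 0), (10, 0), 9, 11))
```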
For a given pair of active nodes that will win the sparsity competition, as the axonal projections rise through the inhibitory layer they will both be signaling lesser-excited cells to go to sleep. The cells that are recognizing parts of a larger pattern will have the recognition of whatever pattern they are seeing AND the excitement from other cells that are also seeing whatever part they are sensing. Contrast this with cells that are just seeing some bit of pattern they have learned. Without the lateral connections their level of excitement will be much lower, ensuring that they will lose the sparsity competition.
I am less sure of how L5 enters into this, but I speculate that this output lends even more excitement to the sparsity competition, ensuring that the pattern association paired with the prediction gets an extra push to form the grid that best matches the prediction.
This was the transition feature that drew me to the work of Numenta in the first place. I could see how a pattern would form a Calvin tile, but what would preserve that tile when the pattern completely changes, as in a saccade? The prediction cell is the missing element.
Summary: a hex-grid node activation formula = THRESHOLD 2((each reciprocal connection) PLUS (a local recognized pattern) PLUS (a recognized prediction in TM-L5))
I surmise that a “hole” in the grid is driven into activity by a strong surrounding hex-grid pattern.
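One literal reading of the formula, sketched in Python. It assumes that THRESHOLD 2 means the node fires when the summed terms reach 2, that each active reciprocal partner contributes 1, and that the local-pattern and prediction terms are 0 or 1. These weights are guesses, not part of the original proposal. Under that reading a silent “hole” surrounded by six active grid partners fires on reciprocal input alone, while a node with only a local match does not.

```python
def node_potential(active_neighbours, local_pattern, prediction, w_reciprocal=1.0):
    """Sum of the three terms in THRESHOLD 2(reciprocal PLUS local PLUS prediction).

    active_neighbours: how many of the node's (up to six) reciprocal hex-grid
        partners are currently active.
    local_pattern: True if the node recognises its local input pattern.
    prediction: True if the node's column holds a recognised TM-L5 prediction.
    w_reciprocal: hypothetical weight per reciprocal connection.
    """
    return w_reciprocal * active_neighbours + float(local_pattern) + float(prediction)


def node_fires(active_neighbours, local_pattern, prediction, threshold=2.0):
    """Fire when the summed potential reaches the (assumed) threshold of 2."""
    return node_potential(active_neighbours, local_pattern, prediction) >= threshold


print(node_fires(6, False, False))  # a "hole" inside a fully formed grid
print(node_fires(0, True, False))   # an isolated local match
```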
The L5 prediction cell is exactly what Numenta describes. The L5 bursting/learning is about the same as Numenta suggests, but with the enhancement of thalamus gating and reinforcement I outlined earlier.
I would add a habituation factor so a hex-grid has a little persistence, but without the local match or prediction match it goes away fairly quickly. I am not sure how long this period should be, but I suspect that if I mine the biology literature there is already good research that would provide guidance. I see this as an accumulator that increments on each update cycle and is subtracted from the activation potential sum.
This is the same way that the interaction with the local inhibitory field is used. Each local inhibitory cell feeds a minus value into the cells within its topology; I could see it being some scaled/thresholded output of the activation it is receiving from lateral axonal connections.
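A minimal sketch of the habituation accumulator and the inhibitory minus term, assuming a firing threshold of 2, a fixed habituation increment of 0.5 per active cycle, and a simple scale-above-floor rule for the inhibitory output (all hypothetical numbers and names). It reports how many consecutive cycles a node stays active when its excitation is held constant.

```python
def inhibition_from_lateral(lateral_activation, scale=0.25, floor=1.0):
    """Hypothetical scaled/thresholded inhibitory output driven by lateral input."""
    return scale * max(0.0, lateral_activation - floor)


def lifetime(excitation, inhibition=0.0, threshold=2.0, hab_rate=0.5, max_cycles=100):
    """Count consecutive cycles a freshly activated node stays above threshold.

    excitation: summed reciprocal + local-pattern + prediction input (held constant).
    inhibition: summed minus values from the local inhibitory cells.
    The habituation accumulator grows by hab_rate on every active cycle and is
    subtracted from the activation potential, so the node eventually drops out.
    """
    habituation = 0.0
    cycles = 0
    while cycles < max_cycles:
        potential = excitation - inhibition - habituation
        if potential < threshold:
            break
        cycles += 1
        habituation += hab_rate
    return cycles


print(lifetime(3.0))                                  # grid support only
print(lifetime(4.0))                                  # plus a local match
print(lifetime(3.0, inhibition_from_lateral(3.0)))    # with local inhibition
```

With these numbers, a node driven only by a fully formed grid (excitation 3) persists for three cycles, an extra unit of local input stretches that to five, and added inhibition shortens it to two. That is the “little persistence” behaviour described above, under made-up constants.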
Of course this is all predicated on the structure being a 2D sheet like cortex. I have no clue how you might use this with a different topology.
Q: If L2/3 is representing the subject of what is being experienced, then the representations that it forms must be more stable (changing less frequently) than the representations that are forming in L5 where temporal memory is occurring. A: OK - if you start from this, then I can see why this is confusing.
At the column level there are only three points in time: the prediction from the last step, now, and the next predicted time step. Again, some of the issue comes from trying to put everything in a single column. It’s peristaltic: the output from this layer in this map is the input to that layer in that map. Once you realize that the H of HTM needs to use some of the layers to do communication and object representation, things start to make more sense.
Remember, the rising axon in the cortex is available to signal to all layers from the prior maps as required. After the primary sensory map, all following maps receive projections from two or more maps to process one stream against another. The stream’s name implies that it is just map to map, but that is the wrong way to look at it; it is the combination of maps that is being processed.
See the images towards the bottom of this post here:
Q: Do you see any maps with one input and one output? A: Yes, there are a couple and I think that those are doing some special function not related to the general cortical column computation. The rest are taking multiple inputs and processing those to some output.
Q: I guess the core of my question is where does temporal pooling occur? How do collections of inputs over time become one representation? If all layers are operating on the same time scale, it doesn’t matter how many times you mix and match and combine inputs between different maps – you would only ever be pooling spatial information. A: Grid resonance. Looking at the activation formula again (below), you can see that a fully formed grid with no additional sensed input should exceed the firing potential. I would add a habituation factor so a hex-grid has a little persistence, but without the local match or prediction match it goes away fairly quickly. I am not sure how long this period should be, but I suspect that if I mine the biology literature there is already good research that would provide guidance. I see habituation as an accumulator that increments on each update cycle and is subtracted from the potential activation sum. Continued input extends this lifetime.
Summary: a hex-grid node activation formula = (THRESHOLD 2((each reciprocal connections) PLUS (a local recognized pattern) PLUS (a recognized prediction)))
I surmise that a “hole” in the grid is driven into activity by a strong surrounding hex-grid pattern.
Temporal pooling is an emergent behavior.
Comment: This is what I mean by temporal pooling, and also what I have been assuming. This persistence means the representation of the grid changes less frequently than its constituent input representations. The temporal differential results in TP.
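A toy illustration of the stability claim: the pooled output changes less often than the inputs feeding it. This is not the grid mechanism itself. As a stand-in assumption, a learned object is just a set of input features, and the output holds on to the current object for as long as each new input is still one of its features. The object names, features, and function names are made up.

```python
def pool(stream, objects):
    """Hypothetical temporal pooler: hold the current object while inputs still fit it."""
    current = None
    out = []
    for x in stream:
        if current is None or x not in objects[current]:
            # Switch to the first learned object containing this input (None if unknown).
            current = next((name for name, feats in objects.items() if x in feats), None)
        out.append(current)
    return out


def changes(seq):
    """Number of times consecutive elements of a sequence differ."""
    return sum(1 for a, b in zip(seq, seq[1:]) if a != b)


objects = {"cup": {"rim", "handle", "base"}, "pen": {"tip", "clip", "cap"}}
stream = ["rim", "handle", "base", "rim", "tip", "clip", "cap"]
pooled = pool(stream, objects)
print(pooled, changes(stream), changes(pooled))
```

The input changes on every step, while the pooled output changes once, when the scan moves from the cup to the pen.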
Comment: The temporal pooling bit explains why you have to stop thinking about something for a bit to let the pattern dissipate when you know it is not the answer you are looking for. That grid pattern residue lurks a while until all the connections go back to baseline excitation.
Comment: For a resonating grid without any new supporting input the suppressed cells are receiving strong input and they act to suppress the current hex-grid and establish a new one quickly.
Q: This leads to two potential gaps in my understanding, however. Before I try to explain them further, let me begin by asking: in your theory, is L5 forming a temporal memory of activity from L2/3, or is it forming a temporal memory of a constituent part of the activity of L2/3? (i.e., is it forming a temporal memory of the input space or of the output space?) A: I see them as a team -
one says - OH - I know this pattern!
the other says - pffft - I know what is going to happen next!
The driver for the local L2/3 SP is the combination of both activations; there are several potential candidates for how L5 drives L2/3.
Comment: In my mind I see the stream of V1 saccades accumulating in the association regions to stabilize on a representation that means whatever you are scanning with your eyes. Every part is the bit that that part of the retina is seeing at that point of the image, in the sequence in which it is being scanned; each is a sequence of micro-features (WHAT). As the object is closer or further away, you still see essentially the same features, just at slightly different locations (WHERE). This gives some measure of object distance or size invariance. This is the thing that artists play with by making objects much larger or smaller than you would normally expect. The entire collection of extracted features accumulates in the association region in the SP + TP representation, and the grid formed is the code for the object.
Comment: I will add this:
Yes, L5/TM sees A-> B-> C and learns the transitions
Is there any particular reason that L2/3/SP would not also learn A, B, & C as static patterns? It has an apical feed to sense the rising axons. The proximal dendrites could be dedicated to the SP function of lateral axonal projections. I don’t know that anyone has mapped out these connections, or at least I have not seen a paper on it yet.
Combine both and you have high confidence that we know this pattern and we know that we are inside a sequence so our confidence is much higher as both layers agree that this is something we know.
Comment: Sure, we see the lateral connections here in THIS map. This other map over there? All it sees is some sparsely spaced activation spikes on the fiber bundle projecting to its L1 field. It will ALSO have at least one other bundle of fibers also projecting some sparsely spaced pattern on its bundle of fibers. It may end up forming its own hex-grid but it most likely will not be a copy of this grid. It will have its own phase/spacing/rotation pattern.
Q: By TP algorithm you mean increasing the temporal stability of an area’s output, all the while being able to react to change (online) when needed? A: Correct. For some time it has been clear to me that distributed semantics can be distilled from an input stream in two ways – spatially (relating two or more bits which are physically distant from each other in a single input of the stream) and temporally (relating two or more bits which are active in separate inputs of the stream).
In order to bring together these relationships in a hierarchy, it is necessary to “pool” them in some way. Classic HTM has addressed half of the problem via the Spatial Pooler algorithm. My quest for the last three years has been to address the other half of the problem by developing a Temporal Pooler algorithm. I have developed a few TP algorithms, but so far all have had flaws and fallen short of the concept that I have in my mind for how TP should work.
Performing both SP and TP in the same algorithm is a new epiphany for me, so I feel like I am at least on the right track now.
Q: So… you propose that, without a TM-like prediction, all cells in that L2/3 minicolumn still fire kinda together, by virtue of the inhibitory fields pushing the possible “activation spot” towards that whole minicolumn? A: Yes, I am saying that L2/3 recognizes here-and-now patterns all by itself IN ADDITION to the TM model that is proposed by JH.
The firing law for L2/3 is a summation of:
L5 (Apical pattern prediction matching proximal inputs)
L2/3 (apical pattern recognition)
grid forming input (proximal sensing of lateral projections)
Weighting for each is to be determined; I suspect that this may be a dynamic thing, including habituation.
Q: The hypothesis here, which I hoped you would confirm (or disprove) before I could focus on anything else (and L5 etc.), was more along the lines of: are whole L2/3 multi-cell assemblies in the same column potentially all “active”, in your view? (That is, before an HTM-like prediction kicks in to only activate one predictive cell in that minicolumn.) A: As I see it, each cell in a mini-column is a prima donna at each level. They act collectively both up and down the column, and within a layer. Firing on any one hits the chandelier cells that shut down the rest of the similar cells in that layer. Both L2/3 and L5 have these things attached to the axon hillock.
Q: Interesting idea. I haven’t thought along those lines at all; I shall try and see what could come out of it. However, I have the feeling that a TM in L2/3 would jeopardize the “Temporal Pooling” functionality (as a stabilizer over several ticks) that (I believe) Paul and I hoped to cram into that L2/3 layer. A: We are diverging in understanding: TM is still firmly in L5, and L2/3 is firmly involved with SP and sparsity. If you have any other understanding of my words, I am at fault here.
Q: Do you see parts of the thalamus currently having searchlight attention as tonic, and thus the quiescent parts having their relay cells in bursting mode, or the other way around? A: I see bursting as the bit that signals novelty, which is, from a functional point of view, worthy of further attention. So, following this line, tonic is the signal that we recognize this input and no special attention is required.
The amount of novelty is related to its influence in drawing attention to itself. If there is not a lot going on then large scale attention could be drawn to this input. On the other hand, people report that in a pitched fight they did not notice that they were shot until after the battle.
Going the other way - meditation seeks to remove all inputs so you are receptive to the smallest internal state forming.
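A toy sketch of the relay-mode and attention idea, assuming novelty can be summarized as one number per input and that attention is shared in proportion to novelty. The function names, the burst threshold, and the example numbers are invented for illustration.

```python
def relay_modes(novelty, burst_threshold=0.5):
    """Hypothetical mapping: relay cells burst for novel input, stay tonic for recognised input."""
    return {name: ("burst" if n >= burst_threshold else "tonic") for name, n in novelty.items()}


def attention_share(novelty):
    """Each input's share of attention, proportional to its novelty relative to all inputs."""
    total = sum(novelty.values())
    return {name: (n / total if total else 0.0) for name, n in novelty.items()}


quiet = {"itch": 0.6, "room": 0.1}
battle = {"itch": 0.6, "gunfire": 5.0, "shouting": 4.0}
print(relay_modes(quiet))
print(attention_share(quiet)["itch"], attention_share(battle)["itch"])
```

The same mildly novel input takes most of the attention when little else is going on, and almost none during a loud, busy scene, which is one way to read the battlefield and meditation examples.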
Q: Why would you want L5 to process exactly in sync, minicolumn-wise, with L2/3 ? A: This was tripping me up as well. If L2/3 activity is more stable than L5, then it cannot be driving the minicolumns for L5 below it. Instead, it makes more sense for it to be driving (or at least contributing to) the active minicolumns in the next map this one is outputting to.
Q: This bit is actually the hardest problem that I see that HAS to be resolved to offer a complete theory.
How to integrate the layers! Still wishing for that nice direct L5 -> L2/3 signaling path… Q of Q: Would you be so kind as to explain the issue here to me? And which phenomenon are you trying to match, exactly? A: Let’s start at the very beginning: stock HTM.
This centers on the L5 structure learning some pattern on the apical dendrites and a separate pattern on the body dendrites. I refuse to get caught up in trying to differentiate “proximal” from other body dendrites.
If we see a pattern that we know on the apical dendrites, we are primed to fire faster than our neighbors if we recognize some pattern on our proximal dendrites, and we use our connected chandelier cell to stifle our neighbors.
Then it’s time to go out into the neighborhood and battle it out with other columns to see who is the most sincere about recognizing the local input: spatial pooling. Now we fire off competition using basket cells, and in the end there can be only one! (in this neighborhood)
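The two-stage competition above could be sketched like this, assuming one overlap score per minicolumn (proximal match), one apical-prediction flag per cell, and a 1D neighbourhood radius. The function names, the tie-breaking rule, the “first primed cell fires first” shortcut, and the bursting fallback (all cells active when none is predicted, as in stock HTM) are simplifications chosen for illustration.

```python
def local_winners(overlaps, radius):
    """Basket-cell style competition between minicolumns.

    A minicolumn wins if no column within `radius` has a higher overlap;
    ties are broken in favour of the lower index.
    """
    winners = []
    for i, s in enumerate(overlaps):
        if s <= 0:
            continue
        lo, hi = max(0, i - radius), min(len(overlaps), i + radius + 1)
        rivals = [(overlaps[j], -j) for j in range(lo, hi)]
        if max(rivals) == (s, -i):
            winners.append(i)
    return winners


def active_cells(predicted):
    """Chandelier-style competition inside one winning minicolumn.

    A primed (apically predicted) cell fires first and silences the rest;
    with no primed cell, every cell fires (bursting).
    """
    primed = [c for c, p in enumerate(predicted) if p]
    return primed[:1] if primed else list(range(len(predicted)))


def compete(overlaps, predictions, radius):
    """Map each winning minicolumn to its active cells."""
    return {col: active_cells(predictions[col]) for col in local_winners(overlaps, radius)}


predictions = [[False] * 4 for _ in range(6)]
predictions[1][1] = True
print(compete([3, 5, 2, 0, 4, 1], predictions, radius=1))
```

Here columns 1 and 4 win their neighbourhoods; column 1 has a primed cell, so only that cell fires, while column 4 has none and bursts.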
I propose to enhance this spatial pooling with hex-grid cells. You know the basics of the hex-grid but the question stands - if the L5 wins how does that winning get up to the L2/3 to start the hex-grid formation?
Sure, the hex-grid is great for recognizing patterns, but it can’t predict. That is the super power of L5.
Q: Okay. So merging HTM & Calvin, right? Simply L2/3 sensing something, anything, no matter what or where, will attract its response to some griddy pattern, right? Thus I don’t see a specific match to be drawn between tiny activation spots in L5 and tiny activation spots in L2/3, requiring that they share the same minicolumn. Do you? A: Yes. The brain invests a huge amount of resources to get this predictive thing in L5, and the griddy thing in L2/3.
It is entirely possible that ALL the L5 thing does is sense surprise and kick that down to the thalamus, to be returned to the L2/3 in thalamus-> cortex projections to L1. I was sort of hoping that there was a more direct connection.
The POm thalamic nucleus does project right up to L1.
Q: Ah? Lateral voting would mostly be L2/3 in my view. A: Yes, that is what I am talking about. L2/3 votes on a higher level abstraction, meaning many neighboring maps agree on what the thing being experienced is. The representation of that thing then needs to bias a set of lower level components of that higher level thing, in another layer which is doing TM (in this context, L5). Then a second signal (it can’t be the same one, because it is a different level of abstraction) needs to predict the next lower level thing. Temporal unfolding can then execute as both biasing signals coincide during each component of the sequence.
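A hedged sketch of the voting and biasing steps, in the spirit of TBT-style voting: each map offers the set of objects consistent with its own input, the object supported by the most maps wins, and L5 cells fire only where the top-down bias (the winning object's components) coincides with the sequence prediction. The data layout, object names, and function names are hypothetical.

```python
from collections import Counter


def vote(candidates_per_map):
    """Return the object(s) supported by the most maps."""
    counts = Counter(obj for cands in candidates_per_map for obj in set(cands))
    if not counts:
        return set()
    best = max(counts.values())
    return {obj for obj, n in counts.items() if n == best}


def unfold(consensus, components, predicted_next):
    """Lower-level components where the top-down bias meets the sequence prediction."""
    biased = set().union(*(components[obj] for obj in consensus)) if consensus else set()
    return biased & set(predicted_next)


maps = [{"cup", "bowl"}, {"cup", "can"}, {"cup", "bowl"}]
components = {"cup": {"rim", "handle", "base"}, "bowl": {"rim", "base"}, "can": {"rim", "label"}}
winner = vote(maps)
print(winner, unfold(winner, components, ["handle", "tip"]))
```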
Comment: An interesting point is that signals in L1 would in fact be a mixture of activity from L2/3 at a higher level of abstraction as well as activity before it passes through L2/3 (i.e. at a lower level of abstraction). In theory, this could enable two (or more) separate signals to be broadcast to apical dendrites reaching up from the lower layers. So, yeah… maybe could do away with distal dendrites and do everything on the apicals…
Adding a pointer to the post above that describes the training algorithm for L2/3 grid-forming cells:
Thanks for this very inspiring discussion.
It is sometimes hard to follow, but always rewarding in the end when you are progressively understanding some of the different pieces.
From what I have read in this discussion:
1/ Gamma (25-140 Hz) oscillations are mainly associated with L2/3, and alpha (8-12 Hz) oscillations with L5
2/ L2/3 representations are more stable than L5 representations
Don’t you see a contradiction between those 2 points ?
Or maybe the L2/3 representation of an object/concept corresponds to a sequence of dynamic hex-grid patterns changing at every gamma tick? But I don’t think that was your idea.
From my perspective, I think of the object layer as a battleground over the higher level concept being sensed. A series of short battles end in winners which squelch everyone else, and it is the winners which are more stable than the representations of the TM layer. Of course this implies that the higher level concept is something which has been learned. If it is something completely unrecognized, then the activity would of course be unstable until new relationships were learned.
I would add that even though the competition at the L2/3 level runs at the gamma rate the output of the competition is every 4 cycles, or the alpha rate. The operations on the object layer battleground take some time to work out so that the results are available for the slower-going alpha rate components.
I imagine the big picture, but how could an action potential be selectively propagated through the horizontal axon terminals of a pyramidal neuron, but not through its map2map projection axon terminals at the same time ?
That only sounds possible if there are some Chandelier-like inhibitory neurons that project not directly onto the axon hillock or initial segment of pyramidal cells, but a little further away. Doesn’t it?
The map-2-map projections are only one of the rising axons in the L1 jungle. If you think of it that way, it is more of a hint than an imperative command. You also have to remember that in the biology the output from the neuron is a spike train, not a hard 1/0, so these hints get stronger as a given L2/3 cell is more influenced by its inputs.
If you look at the MRIs of brains there are usually large areas that are not doing very much.
The output from this idle state would be less than an active hex-grid pattern; not enough to confuse the L5 layer.
I see this as the background balance of inhibitory and excitatory cells maintaining some low level of homeostasis. It takes some coordinated recognition to organize the hex-grid and suppress the rest of the cell population to make the grid stand out even more.
This is just idle conjecture, but I suspect that the total amount of L2/3 activation is about the same when a hex-grid is formed as during the idle hum.
I will conjecture further that the level of column activity is the highest when the L5 bursts to signal novelty, and much less when an active recognition is happening.
I think I have now a better understanding of the L2/3 dynamics.
I will rephrase and expand the idea with my words.
A map is a continuous subregion of the cortex which can form stable hex-grid patterns on its L2/3 layer.
Each stable hex-grid pattern corresponds to the representation of an object/concept at the map level.
The human neocortex is composed of thousands (tens of thousands?) of those maps, so an object/concept is represented in parallel in numerous maps at the same time (see TBT)
In a given map, 3 “external” forces are at play to form a stable hex-grid pattern in L2/3:
Inputs from other maps (from L2/3 pyramidal neurons of maps either higher or lower in the hierarchy)
Sensory inputs (from L4 stellate cells)
Temporal predictions inside the given map (from L5?)
Those “external” forces apply strongly to the L2/3 layer every 4 cycles (alpha rate processing).
In between, the internal L2/3 dynamics has 3 more cycles to converge towards a stable hex-grid via a competition at each cycle (gamma rate processing).
It is as if the “external” forces put the L2/3 map in its initial position before the relaxation of the system.
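The timing described above can be written as a simple schedule, assuming exactly four gamma ticks per alpha cycle: external input on the first tick, internal competition on the remaining three, and the settled grid emitted on the last tick. The function names are hypothetical. The rate conversion is plain arithmetic: with a fixed 4:1 ratio, an alpha rhythm of 8-12 Hz corresponds to gamma at 32-48 Hz, the lower part of the 25-140 Hz range quoted earlier.

```python
def alpha_schedule(n_gamma_ticks, gamma_per_alpha=4):
    """Label each gamma tick as (tick, 'external' | 'internal', emits_output)."""
    events = []
    for t in range(n_gamma_ticks):
        phase = t % gamma_per_alpha
        kind = "external" if phase == 0 else "internal"  # external forces set the start state
        emit = phase == gamma_per_alpha - 1              # settled grid broadcast to other maps
        events.append((t, kind, emit))
    return events


def alpha_rate(gamma_hz, gamma_per_alpha=4):
    """Alpha frequency implied by a fixed number of gamma cycles per alpha cycle."""
    return gamma_hz / gamma_per_alpha


for event in alpha_schedule(8):
    print(event)
print(alpha_rate(40))
```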
At the cellular level, the strong map-2-map input corresponds to a bursting mode happening at each downside of alpha oscillation.
This would be in line with experimental results which show that “gamma amplitude and burst duration are inversely related to alpha amplitude”
Layer-Specific Entrainment of Gamma-Band Neural Activity by the Alpha Rhythm in Monkey Visual Cortex
Eelke Spaak, Mathilde Bonnefond, Alexander Maier, David A. Leopold, and Ole Jensen, 2012 https://www.ncbi.nlm.nih.gov/pubmed/23159599
Concerning the sensory inputs, they could theoretically arrive in L2/3 at any time, but they will have a stronger impact if they are phase-locked to alpha oscillations, because they will have more time to influence the hex-grid formation before it is broadcast to the other maps. That’s why the brain has optimized this timing to do active sensing (for example, by synchronizing saccades with the alpha cycle).
I haven’t thought much about the link with temporal predictions yet. I understand that this is the next piece of the puzzle to get closer to a coherent theory with SP/TP/TM (and that the thalamus will be involved).
A map is a continuous subregion of the cortex which can form stable hex-grid patterns on its L2/3 layer.
Each stable hex-grid pattern corresponds to the representation of an object/concept at the map level.
The human neocortex is composed of about 150 maps, each composed of about a million mini-columns, so objects/concepts are represented as stable grid patterns composed of activated mini-columns in parallel in numerous maps at the same time (see TBT)
For a more in depth view of maps I offer this post of mine:
These are excellent references, and I can see that you grasp the concepts in the detail necessary to start to see how it is embodied in the biology.
I am hoping that as we start to see functional models expressed in code, more light will be shed on the large-scale structure of the functioning of the brain. Many disparate models and papers should have a basic framework to tie them all together, much the same way that tectonic plate theory greatly enhanced our understanding of geology.
The interaction with the thalamus and L6 is also very important and I suggest that an excellent starting place is this paper. I found it to be a hard read but well worth the effort: