Does that location layer (L5?) represent a reference frame locally, or does it contain references to locations in a separate “where” pathway?
Please expand your idea of an SDR to include the temporal direction. A representation is a spatial/temporal thing.
Your eye only sees a tiny fraction of the space around you. If you hold your arm out, the visual hot-spot (fovea) is about the size of your thumbnail. The lower-resolution rest of your visual field provides context and visual target selection in the scanning process.
As you scan your eyes around, you collect features that are separated by some angle: the amount you had to move your eyes to see the next feature. Part of the feedback in the counter-flowing direction is “how big” the displacement was to collect the next feature.
Numenta (Jeff) has been very clear that displacement/motion is a key part of generating the temporal part of the recognition puzzle. There is a state “before” and “after” that is joined as part of the recognition task. For many objects this collection of features takes many movements.
The displacement can also happen in parallel. Grasping an object can involve the displacement that is embodied in the inverse kinematics of the sensed shape of your hand and body: the multiple contact patches are combined with the hand shape.
Going back a little, I think there’s no need for anything fancy to make modules agree.
Learning lateral excitatory connections should do it naturally; that’s what “voting” does. The trick is that SDRs are very noise tolerant, so they don’t need to be identical to “agree”.
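To make the noise-tolerance point concrete, here is a minimal sketch, not Numenta's voting code. It treats an SDR as a set of active bit indices and says two SDRs "agree" when they share enough bits. The sizes (1000 bits, 20 active) and the match threshold of 12 are assumptions picked for illustration.

```python
import random

def random_sdr(n, w, rng):
    """Return a random SDR as a set of w active bit indices out of n."""
    return set(rng.sample(range(n), w))

def add_noise(sdr, n, flips, rng):
    """Swap `flips` active bits for randomly chosen inactive ones."""
    dropped = set(rng.sample(sorted(sdr), flips))
    inactive = [i for i in range(n) if i not in sdr]
    added = set(rng.sample(inactive, flips))
    return (sdr - dropped) | added

def agrees(a, b, threshold):
    """Two SDRs 'agree' when they share at least `threshold` active bits."""
    return len(a & b) >= threshold

rng = random.Random(0)
N, W, THETA = 1000, 20, 12  # hypothetical sizes and match threshold

column_a = random_sdr(N, W, rng)
column_b = add_noise(column_a, N, flips=5, rng=rng)  # a quarter of the bits differ
unrelated = random_sdr(N, W, rng)

print(len(column_a & column_b), agrees(column_a, column_b, THETA))
print(len(column_a & unrelated), agrees(column_a, unrelated, THETA))
```

The noisy copy still clears the threshold, while an unrelated random SDR shares almost no bits, which is why approximate lateral agreement can be enough.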
Can I understand “a displacement” as a discrete step in “sensory movement”? That is, along the “temporal direction”, in the “before” moment a group of (grid cells + feature cells) spike, and then in the “after” moment, a different group of (grid cells + feature cells) spike?
Then a series of such patterns occurs in quick succession, and certain neurons can be designated (learned somehow) to represent the object: each fires when enough (grid cells + feature cells) consistent with the object’s knowledge (a set of location+feature pairs) have fired together, indicating that the object is being sensed. Do such neurons collectively form the object’s SDR?
I feel I’m still struggling with the definition of SDR. I’m stuck imagining 20 neurons (the on-bits) out of some 1000 spiking as one SDR instance, while all the possible distinct selections of 20 among the 1000 together form the SDR space.
Can’t there be some better term addressing the temporal aspect?
“all form the SDR space”
Rephrased as “all form a tiny corner in the overall SDR space”
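For a rough sense of scale, the "20 out of 1000" picture can be counted directly. This is only a back-of-the-envelope sketch: the match threshold `theta` is an assumption, and the counting treats SDRs as uniformly random, which learned representations are not.

```python
from math import comb

n, w = 1000, 20  # population size and number of on-bits, as in the post above

# Number of distinct ways to choose 20 active cells out of 1000.
total_patterns = comb(n, w)

def false_match_probability(n, w, theta):
    """Chance that a random SDR shares at least `theta` bits with a fixed one."""
    matches = sum(comb(w, b) * comb(n - w, w - b) for b in range(theta, w + 1))
    return matches / total_patterns

print(f"distinct SDRs: {total_patterns:.3e}")
print(f"P(random overlap >= 12): {false_match_probability(n, w, 12):.3e}")
```

The space is enormous and accidental matches are vanishingly rare, so any set of patterns a layer actually uses really is a tiny corner of it.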
The key part of SDR is that a small cell population is scattered through the vast collection of nerve cells that are active with a particular sensation. This concept of sparsity applies at all levels from the dendrite to entire maps to collections of maps. It has a different appearance depending on how you look at it, from the dendrite to the lateral voting, to the serial collection of active cells through time.
The latest few papers from Numenta have been promoting the value of sparsity in a variety of connectionist models. Sparsity seems to be applicable and beneficial in a large number of models.
The next part, distributed, focuses on how this very small population can hold a representation across the vast number of cells in the population. Adding the temporal element forces you to consider not only how the cells talk to each other with mechanisms such as voting and topology, but also how this sparse population of cells talks to each other through time.
This gets even more interesting when you consider the passing of the baton from one sparse population of cells to another over time. Thinking.
I have often considered how these mechanisms form high-dimensional manifolds that evolve through time. Like any high-dimensional model, it is so unlike anything that I can see or touch that I struggle to understand its shape and properties. I keep trying. The only saving grace is that it is implemented/projected on a two-dimensional sheet with a large number of connections.
Yeah, reverse-engineering the brain is so fascinating, but at times also daunting … at least to me. Though I have faith that I can make some useful software even with a shallow understanding of it.
So far I can only think of (mini-column enabled sequence prediction) as fitting this description; I can’t imagine other cases. Some teasers please!
Numenta idealizes pyramidal cells as working a certain way, and replaces the action of the inhibitory inter-neurons with a simple k-winners function. While this is certainly good enough to demonstrate the basic HTM model, it ignores the many “other” cell types that populate the cortex.
A few minutes reviewing the literature on cell types in the cortex should show you that there is a zoo of different cell types with very different morphology and neurotransmitter profiles.
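As a minimal sketch of that simplification (not Numenta's actual code), inhibition can be approximated by letting only the k most strongly driven cells stay active, a top-k or k-winners-take-all selection. The overlap scores and k below are made-up illustrative values, and ties are broken by lower index, which is an arbitrary choice.

```python
def k_winners(overlaps, k):
    """Return the indices of the k cells with the highest input overlap.

    Stands in for local inhibition: every other cell is silenced.
    Ties are broken in favour of the lower index (an arbitrary choice).
    """
    ranked = sorted(range(len(overlaps)), key=lambda i: (-overlaps[i], i))
    return sorted(ranked[:k])

overlaps = [3, 9, 1, 7, 7, 0, 5, 8]  # hypothetical feed-forward overlap per cell
active = k_winners(overlaps, k=3)
print(active)
```

Everything the real inhibitory inter-neurons do beyond "pick the strongest few" is exactly what this one-liner leaves out.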
https://www.nature.com/articles/s41586-020-2907-3
and
https://www.researchgate.net/figure/Historical-milestones-in-cortical-cell-type-classification-Morphological_fig1_335713782
One of the ignored mechanisms is metabotropic signalling, which has actions measured in seconds or even minutes. This is certainly a good candidate for communications through time.
Another is the interplay between the feed-forward and feedback streams. In particular, the way that the “loop of consciousness” drives action/command states.
The feedback stream can provide context to filter input to match expectation. It can also act as a command pathway to interrogate memory contents. Let’s walk through an example to see how this works.
It does not take much imagination to see the sub-cortex driving the forebrain with a command for some action and, through action planning/motor planning/motor drives, making the body move.
Speaking is a motor program and the production of words involves a cooperative effort between Broca’s and Wernicke’s areas to combine syntax and content. For the content part the forebrain is interacting with the object store and stored connections between phonemes, groups of phonemes, and semantic meaning.
So far, so good - let’s focus on this operation - a “higher level” semantic content closer to the injection of the subcortex command has projections to the areas around the top of the hierarchy and is able to ping areas in the hierarchy that roughly correspond to concepts. I call this projection “pinning” and I think it is usually focused on one map. Consider the surrounding maps; they have an interconnection with the map that is pinned. These other maps have connections with each other and vote to find the lowest mismatch between themselves and the pinned map. This is an active recall process. Note that the final outcome state of this process sits in the hierarchy stream that goes to the temporal lobe to register as experience and, through that, back to the subcortex for evaluation.
Naturally, the temporal lobe breaks it down to a format that the subcortex expects - the dumb boss needs the clever cortex to explain things to it in terms it can understand. My wild guess is the dumb boss works at the level of affordances such as “thingy” and “whatchamacallit” and “go there”, “get or put it,” and “run away.” Likewise, places are likely not much better than “home,” “work,” “place where good stuff is,” and other very basic needs-based tagging.
This can drive the subcortex to accept this as the thing it was looking for or focus on some aspect of one of the surrounding maps to pin for further processing, and the process starts again.
This process continues as the vocal utterances are formed. It is a very short step from this to skip the speaking part and go directly to thinking.
Is that teaser enough?
A displacement is a relative location, I think. For example, on a keyboard, the Q key has a relative location to the G key. It’s the same displacement if you flip the keyboard upside down.
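One way to read the keyboard example is that the displacement lives in the object's own frame, so moving or flipping the object changes the world coordinates but not the displacement. Here is a small sketch of that reading; the key coordinates and poses are invented for illustration, and explicit trigonometry is used only to show the invariance, not to suggest the cortex computes it this way.

```python
import math

# Hypothetical key centres in the keyboard's own frame (arbitrary units).
KEYS = {"Q": (0.0, 2.0), "G": (4.5, 1.0)}

def rotate(p, angle):
    """Rotate a 2D point about the origin by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * p[0] - s * p[1], s * p[0] + c * p[1])

def to_world(p, origin, angle):
    """Place a keyboard-frame point in the world, given the keyboard's pose."""
    x, y = rotate(p, angle)
    return (x + origin[0], y + origin[1])

def displacements(origin, angle):
    """Return the Q-to-G displacement in world and in keyboard coordinates."""
    q = to_world(KEYS["Q"], origin, angle)
    g = to_world(KEYS["G"], origin, angle)
    world = (g[0] - q[0], g[1] - q[1])
    return world, rotate(world, -angle)

upright_world, upright_obj = displacements((0.0, 0.0), 0.0)
flipped_world, flipped_obj = displacements((10.0, 5.0), math.pi)  # moved and flipped

print(upright_world, flipped_world)  # differ in world coordinates
print(upright_obj, flipped_obj)      # agree in the keyboard's frame
```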
The object SDR could be the union of feature SDRs (if you include info about the location of the features). One problem would be distinguishing between similar objects, I think. It’s hard to distinguish between two similar SDRs.
Here’s how I recall the object layer working in the original paper. Each feature (in context of location) is associated with an SDR in the object layer. When it touches the first feature, it sets the object layer to the associated SDR. Then, each time it touches another feature, it takes the bitwise AND of the object layer’s SDR with the feature’s associated SDR. Each feature is essentially removing objects from a list of possibilities (a union of SDRs, one SDR per possible object), leaving only the objects which have that feature.
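The narrowing-down described above can be sketched with plain sets. This follows the recollection in this post rather than the paper's actual implementation; the objects, the features, and the sizes are all made up for illustration.

```python
import random

rng = random.Random(1)
N, W = 1000, 20  # hypothetical object-layer size and on-bits per object

# Hypothetical objects, each a set of (feature, location) pairs.
OBJECTS = {
    "mug": {("rim", "top"), ("handle", "side"), ("flat", "bottom")},
    "bowl": {("rim", "top"), ("flat", "bottom")},
    "can": {("flat", "top"), ("flat", "bottom")},
}
object_sdr = {name: frozenset(rng.sample(range(N), W)) for name in OBJECTS}

def associated_sdr(feature_at_location):
    """Union of the SDRs of every object that has this feature at this location."""
    bits = set()
    for name, features in OBJECTS.items():
        if feature_at_location in features:
            bits |= object_sdr[name]
    return bits

def sense(sensations):
    """Set the layer from the first sensation, then AND in each later one."""
    layer = None
    for sensation in sensations:
        sdr = associated_sdr(sensation)
        layer = sdr if layer is None else layer & sdr
    return layer

def candidates(layer):
    """Objects whose full SDR is still present in the layer."""
    return sorted(name for name, sdr in object_sdr.items() if sdr <= layer)

touches = [("flat", "bottom"), ("rim", "top"), ("handle", "side")]
for i in range(1, len(touches) + 1):
    print(touches[i - 1], candidates(sense(touches[:i])))
```

Each touch can only remove candidates, which also shows the weakness mentioned above: two objects with the same features at the same locations can never be separated this way.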
It’s the location relative to the object. It’s learned, not the result of spatial computations. They’ve talked about it being L6 before, but I think it’s not set in stone.
It surely is! Thanks, and I’m taking my time digesting it.
Thanks!
I started diving into neuroscience only through Numenta’s model, so as far as my (limited) knowledge goes, this “removing of possibilities” happens by predicted cells inhibiting other cells in the same mini-column from firing. I don’t know; are there other good models/theories that are as inspiring?
I’m not sure whether there are minicolumns in the object layer. It might just use the top k cells. I could be wrong though.