Based on higher-level diagrams of a couple of the layers involved in TM and SMI:
What strikes me as unusual is that L2/3a and L3b appear to be internally connected in exactly the same way (forming distal connections with other cells within the same layer), with the only difference being where they receive proximal input from. And yet the former performs representation pooling, while the latter performs sequence memory. What are the structural differences between these two layers which allow them to perform these two different functions?
I don't know much about L2/3, but if I recall correctly, the thalamic input to L2/3 has a gradient in density, so for example there's stronger input at the bottom of L3b and weaker input at the border with L3a. That would suggest L3b is basically doing the same thing as L2/3a, just with a gradient of SMness and PoolerNess.
I'm not sure though; maybe there is a sharp border.
If pooling works by removing possible objects, then maybe you just take the SM and inactivate cells which don't belong to objects consistent with all of the features sensed thus far. You might not need constantly firing cells to represent the object (at least not before regions with working memory). If you aren't sensing a feature on the object, you aren't sensing the object, so those cells maybe shouldn't fire.
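To make that concrete, here's a minimal Python sketch of the "eliminate candidates as features arrive" idea. It's just a toy set-intersection model I'm assuming for illustration; the object library and function names are made up, not anything from Numenta's code.

```python
# Toy sketch: pooling by eliminating candidate objects (hypothetical names/data).
def prune_candidates(candidates, sensed_feature):
    """Keep only the objects consistent with every feature sensed so far."""
    return {name: feats for name, feats in candidates.items()
            if sensed_feature in feats}

# Hypothetical object library: object name -> set of features it contains.
candidates = {
    "mug":    {"rim", "handle", "flat_bottom"},
    "bowl":   {"rim", "flat_bottom"},
    "bottle": {"rim", "cap", "flat_bottom"},
}

for feature in ["rim", "handle"]:
    candidates = prune_candidates(candidates, feature)
    print(feature, "->", sorted(candidates))  # narrows to just "mug" after "handle"
```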
I tend to think there must be a combination of sequence memory and object pooling (or a third option which does both and something else). Auditory objects are sequential features, but location in space sometimes matters too, as it does for physical objects, since sources of sound aren't usually single points in space (the sound of a nearby tree in the wind, for example). There's also probably a lot of information about feature identity in <100 ms time step sequences for other senses, and about object identity if you're sensing multiple parts of the object at once. Every sense is kind of like hearing in that way. Since sequences are tied into objects, maybe separate layers aren't dedicated to one or the other, but rather there is a gradient of which they are responsible for. For example, this gradient could be in the thresholds for activating on sequence context versus object context.
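For what it's worth, here's a rough Python sketch of what I mean by a threshold gradient. The numbers and the linear interpolation are entirely made up; it's just to show how one parameter (depth) could smoothly trade off sequence context against object context.

```python
# Hypothetical gradient: thresholds for sequence vs. object context vary with depth.
def cell_active(seq_support, obj_support, depth):
    """depth in [0, 1]: 0 = upper (more pooler-like), 1 = lower (more SM-like)."""
    seq_threshold = 1.0 - 0.5 * depth   # deeper cells need less sequence context
    obj_threshold = 0.5 + 0.5 * depth   # deeper cells need more object context
    return seq_support >= seq_threshold or obj_support >= obj_threshold

# The same amount of context activates a deep cell via sequence support,
# but not a shallow cell, which would want more object support instead.
print(cell_active(seq_support=0.7, obj_support=0.2, depth=1.0))  # True
print(cell_active(seq_support=0.7, obj_support=0.2, depth=0.0))  # False
```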
That question mixes enquiry about actual connections with how HTM functionally models them, and to a point it mixes intrinsic structure with learned structure. I'm not sure it can find a simple answer when stated that way.
I'm not as fluent in the "functional" abstractions into poolers and stuff as you are. What I know of these concerns from the biological side, however, is that:
- Layer 4 spiny stellates start their lives as regular PCs looking like L2/3 ones, and arguably decide by themselves to shrink their apical tuft and instead wire everything to whatever axons they have available afferent to L4, when those are actually active (e.g. carrying sensory input, when the sensory pathway isn't impaired).
- The position of afferent axons to an area depends on the area. As far as V1 is concerned, not everything from LGN will end up "cleanly" for L4 to sample... part of it will get wired somewhat above it. In fact, many semantically distinct information pathways can be seen as ending up at slightly distinct depth positions, all the way up to L2/3.
- Possibly because of the above, the labeling of the layers itself may be subject to controversy. Layer 3b in particular can be seen as a "4a" by some, or the other way around. I guess it can't be clear cut, and depends on actual inputs.
- I believe the local axons, to the contrary, form an "intrinsic" structure. So, PCs which are around classical L4 positions will output signals to some given depths in the macrocolumn, while PCs which are more towards L2/3 will have other axonal blueprints. Mixing the two, you'll end up with stuff between L2 and L4 which is, at best, "functionally somewhere in between" (and at worst, "functionally distinct" in an emergent way). Note that IN concerns are also likely structural, and likely an important part of the "functional" question.
I hope this answer opens more possibilities in your view, rather than obscuring things more... But these are complicated matters.
Yes, I definitely wasn't looking for a simple answer, but hoping this line of inquiry will uncover some insights into what biological evidence Numenta leaned on when choosing the functional uses for these two layers in HTM theory (for a non-neuroscientist, they seem like an odd choice based on their similarities in the high-level diagrams that are used to explain the theory).
This conclusion would click in my mind if HTM theory were modeling L3b for sequence memory and L2/3a for the SMI input layer (the gradient would dictate how much of the context is temporal versus spatial). But that isn't how Numenta is modeling the theory. In the model, the temporal vs spatial context difference is between L3b and L4. L2/3a is modeled in HTM theory as the SMI output layer (forming composite object representations from lower-level objects), which in my mind seems like a very different function from combining input with context as modeled in L3b and L4.
Anyway, I was just hoping for some insight into the biological evidence that prompted the selection of these two layers for the functions they have been assigned in current HTM theory. On the other hand, not being a neuroscientist myself, I may not understand the reasoning anyway (I certainly wasn't able to pick it up from reading the refs that @gmirey posted).
Hey, don't make me look like one! ^^' Just trying to make sense of brains, as we all do.
I'm certainly not up to date with which HTM functions are supposed to model which layers...
I'd have guessed, for my part, that both SP and TM were (necessarily) on the same layer, and that it was L4.
Sorry, my response wasn't well explained. By "PoolerNess" I meant how much a layer resembles the layer which represents objects. By gradient, I meant a gradient between sequence memory and representing objects.
I wasn't trying to directly answer your question, just to complicate the picture a bit and suggest that maybe the internal connections reflect sequence memory and object representation not being entirely separate, and that the current binary thinking about which layers do which of those may be wrong.
My guess is that the thalamic input to L3b makes those cells more tied to the current sensory input, and perhaps also more inhibited based on thalamic input, since the nearby inhibitory cells in the same layer as those pyramidal cells may also receive thalamic input. Meanwhile, L2/3a cells have a larger fraction of their inputs coming from each other or from other cortical cells than from the thalamus, suggesting a function which involves firing not directly tied to the sensory input. For example, when the sensor moves off the object for a bit while going between features, the cells representing the object still fire, whereas sequence memory cells do not fire without sensory input.
Hmm... so essentially the idea could be that flooding an area with more direct sensory input might cause the population of active cells in that area to change more rapidly, resulting in more granular representations. Less of such input would result in the population of active cells changing less frequently, essentially "smearing" the representations into something less granular.
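Right. If it helps, here's a throwaway Python simulation of that intuition (my own toy assumption, not an HTM implementation): the population driven strongly by sensory input replaces most of its active cells every step, while the weakly driven one mostly keeps its previous active set, i.e. the "smeared" representation.

```python
import random

def step(active, sensory_cells, drive_fraction):
    """Replace a fraction of the active set with cells driven by the new input."""
    n_keep = int(len(active) * (1 - drive_fraction))
    kept = set(random.sample(sorted(active), n_keep))
    driven = set(random.sample(sensory_cells, len(active) - n_keep))
    return kept | driven

random.seed(0)
cells = list(range(200))
active_fast = set(random.sample(cells, 20))   # strongly driven ("L3b-like")
active_slow = set(active_fast)                # weakly driven ("L2/3a-like")

for _ in range(5):
    sensory_cells = random.sample(cells, 40)          # fresh sensory input each step
    new_fast = step(active_fast, sensory_cells, drive_fraction=0.8)
    new_slow = step(active_slow, sensory_cells, drive_fraction=0.1)
    # Overlap with the previous step's active set: small for the strongly
    # driven population, large for the weakly driven one.
    print(len(active_fast & new_fast), "vs", len(active_slow & new_slow))
    active_fast, active_slow = new_fast, new_slow
```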