What are the internal differences between L2/3a and L3b?

Based on higher-level diagrams of a couple of the layers involved in TM and SMI:


What strikes me as unusual is that L2/3a and L3b appear to be internally connected in exactly the same way (forming distal connections with other cells within the same layer), with the only difference being where they receive proximal input from. And yet the former performs representation pooling, while the latter performs sequence memory. What structural differences between these two layers allow them to perform these two different functions?

I don’t know much about L2/3, but if I recall correctly, the thalamic input to L2/3 has a gradient in density, so for example there’s stronger input to lowest L3b and weaker at the border with L3a. That would suggest L3b is basically doing the same things as L2/3a, just with a gradient of SMness and PoolerNess.

I’m not sure though, maybe there is a sharp border.

If pooling works by removing possible objects, then maybe you just take the SM and inactivate cells which don’t match the objects with all of the features thus far. You might not need constantly firing cells to represent the object (at least not before regions with working memory). If you aren’t sensing a feature on the object, you aren’t sensing the object, so those cells maybe shouldn’t fire.
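To make the "pooling by removing possible objects" idea concrete, here is a minimal sketch. It is only a toy set-intersection model, not Numenta's pooling algorithm; the object names, feature names, and the `narrow_candidates` helper are all hypothetical and chosen purely for illustration.

```python
# Toy model (assumption): each object is just the set of features it contains.
OBJECTS = {
    "mug": {"handle", "rim", "curved_side"},
    "bowl": {"rim", "curved_side"},
    "box": {"corner", "flat_side"},
}


def narrow_candidates(sensed_features, objects=OBJECTS):
    """Return the surviving candidate objects after each sensed feature.

    Every object starts as a candidate. Each new feature inactivates the
    candidates that do not contain it, so nothing has to keep firing on
    its own to hold the object representation.
    """
    candidates = set(objects)
    history = []
    for feature in sensed_features:
        candidates = {name for name in candidates if feature in objects[name]}
        history.append(set(candidates))
    return history


if __name__ == "__main__":
    for step, remaining in enumerate(narrow_candidates(["rim", "handle"]), 1):
        print(step, sorted(remaining))
```

In this sketch, sensing "rim" leaves the mug and the bowl as candidates, and sensing "handle" afterwards leaves only the mug.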

I tend to think there must be a combination of sequence memory and object pooling (or a third option which does both and something else). Auditory objects are sequences of features, but they sometimes also have a location in space, as physical objects do, since sources of sounds aren’t usually single points in space (the sound of a nearby tree in the wind, for example). There’s also probably a lot of information about feature identity in <100 ms time step sequences for other senses, and about object identity if you’re sensing multiple parts of the object at once. Every sense is kind of like hearing in that way. Since sequences are tied in to objects, maybe separate layers aren’t dedicated to one or the other, but rather there is a gradient in how much each is responsible for. For example, this gradient could be in the thresholds for activating on sequence context versus object context.

That question mixes an enquiry about actual connections with how HTM functionally models them. And, to a point, it mixes intrinsic structure with learned structure. I’m not sure it can find a simple answer when stated that way.

I’m not as fluent in the “functional” abstractions into poolers and stuff as you are. What I know of these concerns from the biological side, however, is that:

  • layer 4 spiny stellates start their lives as regular pyramidal cells (PCs) looking like L2/3 ones, and arguably decide by themselves to shrink their apical tuft and wire everything instead to the afferent axons available to them in L4, when those are actually active (e.g. carrying sensory input, when the sensory pathway isn’t impaired).
  • position of afferent axons to an area depends on the area. As far as V1 is concerned, not everything from LGN will end up ‘cleanly’ for L4 to sample… part of it will get wired somewhat above it. In fact many semantically distinct information pathways can be seen as ending up at slightly distinct depth positions, all the way up to L2/3.
  • possibly because of the above, the labeling of layers itself may be subject to controversy. Layer 3b in particular can be seen as a ‘4a’ by some, or the other way around. I guess it can’t be clear cut, and depends on actual inputs.
  • I believe the local axons, on the contrary, form an ‘intrinsic’ structure. So, PCs which are around classical L4 positions will output signals to some given depths in the macrocolumn, while PCs which are more towards L2/3 will have other axonal blueprints. Mixing the two, you’ll end up with stuff between L2 and L4 which is, at best, “functionally somewhere in between” (and at worst, “functionally distinct” in an emergent way). Note that interneuron (IN) concerns are also likely structural, and likely an important part of the “functional” question.

I hope this answer opens more possibilities in your view, rather than obscuring things more… But these are complicated matters.


Some refs. maybe explaining why I think that way.

Common origin of PC and Spiny Stellate, (or what I’d call ‘learned structural’)

Positions of afferent axons in V1 (efferent from LGN)

Debates over labeling of layers


Yes, I definitely wasn’t looking for a simple answer, but I’m hoping this line of inquiry will uncover some insights into what biological evidence Numenta leaned on when choosing the functional uses for these two layers in HTM theory (for a non-neuroscientist, they seem like an odd choice based on their similarities in the high-level diagrams that are used to explain the theory).

This conclusion would click in my mind if HTM theory were modeling L3b for sequence memory and L2/3a for the SMI input layer (the gradient would dictate how much of the context is temporal versus spatial). But that isn’t how Numenta is modeling the theory. In the model, the temporal vs spatial context difference is between L3b and L4. L2/3a is modeled in HTM theory as the SMI output layer (forming composite object representations from lower-level objects), which in my mind seems a very different function from that of combining input with context as modeled in L3b and L4.

Anyway, I was just hoping for some insight into the biological evidence that prompted the selection of these two layers for the functions they have been assigned in current HTM theory. On the other hand, not being a neuroscientist myself I may not understand the reasoning anyway (I certainly wasn’t able to pick it up from reading the refs that @gmirey posted :laughing:)

Hey, don’t make me look like one! ^^’ Just trying to make sense of brains as we all do :wink:
I’m certainly not up to date with which HTM functions are supposed to model which layers…
For my part, I’d have guessed that both SP and TM were (necessarily) on the same layer. And that it was L4.

Sorry, my response wasn’t well explained. By “PoolerNess” I meant how much it resembles the layer which represents objects. By gradient, I meant between sequence content and representing objects.

I wasn’t trying to directly answer your question, just complicate the picture a bit and suggest that maybe the internal connections just reflect sequence memory and representation of objects being not entirely separate and maybe the current binary thinking about which layers do which of those is wrong.

My guess is that the thalamic input to L3b makes those cells more tied to the current sensory input, and perhaps also more inhibited based on thalamic input, since the nearby inhibitory cells in the same layer as those pyramidal cells may also receive thalamic input. Meanwhile, L2/3a cells have a larger fraction of their inputs either from each other or from other cortical cells than from the thalamus, suggesting a function which involves firing not directly tied to the sensory input. For example, when the sensor moves off the object for a bit while going between features, the cells representing the object still fire, whereas sequence memory cells do not fire without sensory input.

Hmm… so essentially the idea could be that flooding an area with more direct sensory input might cause the population of active cells in that area to change more rapidly, resulting in more granular representations. Less of such input would cause the population of active cells to change less frequently, essentially “smearing” the representations into something less granular.
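A rough way to see the smearing effect is a toy leaky integrator with a single winning cell. This is only a sketch under loud assumptions: the `winner_changes` function, the `input_weight` parameter (standing in for how strongly direct sensory input drives the layer), and the winner-take-all rule are all made up for illustration and are not part of HTM theory or any Numenta code.

```python
def winner_changes(inputs, n_cells, input_weight):
    """Count how often the winning cell changes over an input sequence.

    Each step, every cell's activity decays by (1 - input_weight) and the
    cell driven by the current input gains input_weight. The most active
    cell is the "representation" (ties go to the lowest index).
    """
    activity = [0.0] * n_cells
    previous_winner = None
    changes = 0
    for cell in inputs:
        activity = [(1.0 - input_weight) * a for a in activity]
        activity[cell] += input_weight
        winner = max(range(n_cells), key=lambda i: activity[i])
        if previous_winner is not None and winner != previous_winner:
            changes += 1
        previous_winner = winner
    return changes


# Mostly one input (cell 0), with brief excursions to cells 1 and 2.
SEQUENCE = [0, 0, 0, 1, 0, 0, 2, 0, 0, 0]

if __name__ == "__main__":
    print("strong input:", winner_changes(SEQUENCE, 3, 1.0))
    print("weak input:  ", winner_changes(SEQUENCE, 3, 0.2))
```

With `input_weight=1.0` the winner follows every input change (4 changes for this sequence), while with `input_weight=0.2` the winner stays on cell 0 throughout (0 changes), which is the less granular, "smeared" representation described above.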

I hadn’t thought of that smearing effect, but it makes sense to me.