Two Types of Hierarchies?

I hate to beat a dead horse but I will take another whack at this one.
The “negative attitude” about hierarchy is a direct attack on this classic Christmas tree of concentration as one ascends the hierarchy:

A slightly modified version that still has most of the problems is multiple streams with sensor fusion somewhere at the top:

This hides the Christmas tree in each of the streams but implies that there is some sort of orderly progression from input to output, and that there is some high-level version that can be combined.

What I see after years of tracing out data paths is that everything is connected to everything and the connections are broad copies of each map to the next:

(Not the actual map, look at post #2 above for an example)

The WHAT and WHERE streams do eventually converge at the HC/EC complex, but everything in between is essentially a haystack of connections. Note that the processing at almost every map in the cortex is the convergence of two or more maps. I see that everything between the sensory cortex and the association regions is some sort of compare/contrast operation. After all these years I still have not been able to put my finger on an exact description of what the maps are doing with these inputs other than saying it is extracting features.


I guess that’s addressed to me. I never proposed a “classical” hierarchy; it ignores temporal accumulation and the discontinuities that result from pruning the search tree. I was talking about incremental generalization and composition, which has to be driven by search expansion in both space and time.

BTW, I think what and where converge in other places too, specifically in the inferior parietal cortex, and indirectly in dlPFC.


The classical description of hierarchy seems more useful if you are interested in things like anatomy and medicine, since it’s an approximation of connectivity. I think a lot of us, at least myself, get pretty annoyed by how neuroscience works because neuroscientists are more concerned with healthcare than AI.

I think the less detailed sensory input to non-primary regions is often like a different submodality. If your sensors describe a bunch of different things about the object, some of those things won’t be as spatially fine-grained as others. That can make it hard to tell whether a region is primary or higher order - does it receive direct sensory input because of level skipping, or because it processes a different aspect of the sense?

Some sensory submodalities arguably skip the first level of the hierarchy*. I think the targeted cortical region is still higher order if it doesn’t receive input from pathways directed up the hierarchy. At least, it plays the same role as a higher order region in the thousand brains theory. Maybe it’s better to talk about scale than hierarchy.

*E.g. a couple whisker pathways (VPM head and tail) only target whisker S1 in the septal columns and dysgranular zone (maybe specifically L4, I don’t remember), which are higher order if you define hierarchy by the CTC pathway and take the results from a few papers for granted. The thalamocortical axons spill over into the barrels but those might target distal dendrites of cells outside the barrels. As another example, some types of koniocellular cells in LGN might not target L3/4 of V1. I might be remembering wrong.


Let’s talk about both! Looking at a very large item that takes up your whole field of view would depend heavily on higher levels of the hierarchy, which receive a larger FOV from the sensory space. V1 is looking at a very small FOV (like through a straw).

Imagine an elephant standing a few paces in front of you. V1 might not have any columns that can identify the object, even with very high detail in small areas. But higher levels get larger-scale inputs and can match objects they have seen at that scale. In this case, voting from the high levels of the hierarchy can inform the lower levels about what the object is. But you can still recognize elephants in V1; they just have to be on the horizon, basically very small. Then V1 can inform your higher visual levels what that hazy grey spot is.


But, in vision, at the sub-cortical tap level, the coarser sampling gives more weight to the wider field and less to the expanded fovea.
The pre-attention that precedes scanning in the FEF and V1 provides a larger spatial extent for scene analysis.

The sub-cortex responds by driving the eye to extract a stream of detailed features without as much concern for how far apart they are. The ballistic distance of the scan fixations informs the scaling of the object.


The voting process is bi-directional, right? Each region basically creates a list of possible objects and they compare them to narrow down those lists further. If the connections between the regions are the same in both directions, that doesn’t seem like a hierarchy.
I recall displacements involving a pathway up the hierarchy. I don’t see the connection between that and direct sensory input to higher order regions (the idea of level skipping). I’m not saying that hierarchy isn’t part of HTM, just that it seems like each sensory pathway targets regions based on scale rather than always targeting the first several levels of the hierarchy. There could be regions at the first level for a broad aspect of the sensory input, on equal footing with a region for more precise aspects. I don’t see why it would have to only send the broader input to regions higher in the hierarchy.


Yes and no. As far as I have seen, the connections are never one-to-one; it is always two or more regions projecting into the target.


I’d be careful of drawing conclusions from this, although it’s probably right for the most part since connections are usually imprecise. There are ~10 layers/sublayers, so 100 possible layer-to-layer pairings, and each can run in either direction or both, so that’s 300 ways for two regions to connect, plus maybe two trans-thalamic pathways (m and c), so 360. Even if only a small subset of those connections is allowed depending on the relationship between the two regions, maybe 10, most regions would connect to most others. The Christmas tree might be hiding somewhere in there, not that I think so.
Another reason to be cautious is that published results can be wrong, which contributes to the sense of disorganization I get when I research something for a while.
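The layer-count arithmetic above can be sketched as a quick back-of-envelope. This takes the post's numbers at face value; the breakdown of the extra 60 trans-thalamic combinations is my own assumption, not something the post states:

```python
# Back-of-envelope count of possible wiring patterns between two cortical
# regions, using the numbers from the post above (illustrative only).
layers = 10                       # ~10 layers/sublayers per region
layer_pairs = layers * layers     # 100 possible layer-to-layer pairings
directed = layer_pairs * 3        # each pairing: one way, the other, or both
# One possible reading of the jump from 300 to 360: two trans-thalamic
# pathways (m and c), each adding ~10 layer targets with 3 direction options.
# This breakdown is an assumption.
trans_thalamic = 2 * layers * 3   # 60
total = directed + trans_thalamic
print(directed, total)            # 300 360
```

With that many candidate wiring patterns, even a restrictive subset still lets most regions connect to most others, which is the point being made.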


But the regions which connect usually connect both ways, right?

Yes, so the learning is both local and distributed.


Almost. Evolution has a very strong test for the success or failure of aggregate judgment: you reproduce or not.


That test is extremely coarse and dirty. We can do a lot better with sustained introspection.
Unfortunately, evolution had no use for sustaining it long enough to develop AGI. For this task, we are all ADHD.

That’s true over millions of years, but maybe not tens or hundreds of millions. The cortex is similar for all of its functions since it reuses the same basic 6ish layers. Once evolution stumbles upon something it can re-use, that species gets a boost to its ability to evolve so it or its successor species will survive better.

Right, but I was talking about AGI, not HGI. I meant the psychological aspect of the work: the human mind has to produce some short-term tangible results to maintain interest in the subject. We have a version of GI, but our motivation and attention span evolved for hunting and gathering. I think that attention span must be intensely cultivated for the brain to understand itself through introspection. Kind of what philosophers are supposed to do, but they are truly, utterly horrible at it.


I did not know what to write about this, but I did ramble on about hierarchy and scale for about 10 minutes.


That was excellent, it seemed to clear things up for me quite a bit! I think the word you used, “context,” was perfect; then you changed it to “reference frame.” Why do you prefer that term over context? As I consider the “scale” hierarchy, I think of larger-scale data, as it travels down the hierarchy to V1 for example, as context information. Broad contextual information: “here’s how you should view whatever you’re looking at.”

I think the insight I was trying to express or explore was this idea; let me approach it from a different angle:

Forget about the brain for a minute. Just think of a network of nodes; just the concept of a network in general. Give those nodes random connections, but give each node about the same number of connections (a connection being: who the node listens to). Now pick a node at random. That node will have a larger number of connections to “nearby” nodes than “further away” ones. You may ask, ‘what metric for distance are you using here?’ The distance metric is ‘how many nodes are between me and this node, other than our direct connection?’ Some nodes will be very far away by that metric, most will not. Let me draw that out:

Look at A. B, C, and D are all really close to A because they’re all really close to each other; they’re all nearly interconnected. But E is not close to any of them. If A lost its connection to E (in this diagram we can consider each link to be bidirectional), it would have to wait for the information to propagate up the left side of the image to D before A could hear about it.

Why is this important? Because we can consider A’s connection to E to be a connection to a higher-level node, a connection to a larger-scale node. And we can consider A, B, C, and D together to comprise a smaller-scale node.

Nodes that are highly interconnected, or close, are of course going to be talking about nearly the same patterns, whatever their communication ends up being. They’ll have a similar protocol; further-away nodes won’t, because they’re far away. But there are still some patterns in common, just fewer of them.

Realizing this same pattern holds for every node, it seems the scale hierarchy is nearly a function of the number of connections each node has. C is connected to some node that is far from their little clique too, and so are B and D, presumably. In this way each member of the group has feelers out in different directions into the larger hierarchy. They can take the information they get from their distant connections and translate it for their inner circle, since each specializes in the protocol and patterns shared in whatever remote corner of the network it happens to be connected to.

So I wondered if a generalized AI network is one in which each node is always trying to connect to the furthest away nodes from its localized group. But anyway I think that’s the extent of my thoughts on the matter… except for this: Since it’s all so elementary I’m sure there’s some metric in mathematics or graph theory that describes this perfectly well, I’m just not educated enough to know about it.
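The distance metric described above is just shortest-path length in hops, which breadth-first search computes directly. Here is a small sketch on a toy version of the diagram (node names F and G are hypothetical intermediates standing in for “the left side of the image”), showing how removing the single long-range link A–E makes E far away from the clique:

```python
from collections import deque

def hops(graph, src, dst):
    """Shortest path length in hops via breadth-first search."""
    seen, frontier = {src}, deque([(src, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if node == dst:
            return dist
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, dist + 1))
    return None  # unreachable

def build(edge_list):
    """Build an undirected adjacency map from (u, v) pairs."""
    g = {}
    for u, v in edge_list:
        g.setdefault(u, set()).add(v)
        g.setdefault(v, set()).add(u)
    return g

# Tight clique A-B-C-D, a distant node E reached through hypothetical
# intermediates F and G, plus one long-range shortcut A-E.
edges = [("A", "B"), ("A", "C"), ("A", "D"), ("B", "C"), ("B", "D"),
         ("C", "D"), ("D", "G"), ("G", "F"), ("F", "E"), ("A", "E")]

g = build(edges)
print(hops(g, "A", "E"))   # 1  (shortcut in place)

g_cut = build([e for e in edges if e != ("A", "E")])
print(hops(g_cut, "A", "E"))   # 4  (A -> D -> G -> F -> E)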

Anyway, I really enjoyed your thoughts on the topic @rhyolight


Every object has its own reference frame. The input to V1 intersects with all the object reference frames we’ve ever built. When it is ambiguous, like when you are standing too close to an elephant, you can’t perfectly ID the reference frame. But when V3’s input intersects your object library, it identifies an elephant. This object vote is communicated to V1 through lateral input so V1 can use it to filter out bad guesses and lock onto the same object. Once the object is synced across the regions, all cortical columns can contribute to learning the object within their individual reference frames through inspection.
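A minimal sketch of that voting step, with hypothetical region names and candidate sets (this is not Numenta’s implementation, just the set-intersection idea): each region holds the set of objects consistent with its own input, and lateral votes narrow every region’s set to the common intersection:

```python
# Hypothetical voting sketch: each region keeps candidates consistent with
# its own input; lateral votes reduce all regions to the shared intersection.

def vote(candidates_per_region):
    """Intersect every region's candidate set with the others' votes."""
    consensus = set.intersection(*candidates_per_region.values())
    return {region: consensus for region in candidates_per_region}

regions = {
    # V1, too close, sees only gray wrinkled texture -> many candidates
    "V1": {"elephant", "rhino", "gray wall", "tree bark"},
    # V3's wider receptive field narrows things down
    "V3": {"elephant", "rhino"},
    # another region is already fairly sure
    "MT": {"elephant"},
}

print(vote(regions))  # every region locks onto {'elephant'}
```

Note that intersection is symmetric and order-independent, which echoes the earlier observation in this thread that voting alone does not impose a hierarchy between the regions involved.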

Your diagrams remind me of something I’ve heard called the “small-world hypothesis.” It means that any two nodes in a very large network are actually connected by a fairly short path through the network graph. I think this is true for social networks as well as neurons, and even more true when you add a physical analog topology to the mix.


I’m trying to understand the difference between V1 and V3, or one region and another higher in the hierarchy. I’m looking for some sort of asymmetry, or is the hierarchy not reflected by connectivity? Is voting different from V1 to V3 than V3 to V1?

I think that thinking about scale in terms of hierarchy is an oversimplification. Two scales are like two different senses. For example, MT is specialized for motion in the visual world. That sort of requires its receptive fields to have a certain range of sizes. That’s true of a lot of sensory submodalities, even different types of cortical columns in the same region. Since voting integrates multiple scales of RFs and multiple senses, I don’t see why scale of receptive fields is different from other aspects of sensory input.

Some convoluted reasoning which I spent 8 months on suggests different types of cortical columns in the same region can exist at different levels of the hierarchy. That could mean scale = level in hierarchy even though scale depends on the submodality, multiple of which can exist in the same region. The reasoning would also mean a given level of the hierarchy doesn’t always include all layers of the cortex, which would be weird in its own way.

Convoluted Reasoning

I think the first level of a hierarchy in the whisker region of S1 lacks L2 and L6b.

In the barrel cortex, there are barrel columns and between them is the septal domain. In some layers or sublayers, such as L6b and L2, barrel structures round off and disappear before extending through the whole sublayer. It seems like the septal domain is higher in the hierarchy (defined by L5 -> thalamus -> L4/6) than the barrel columns. Either that, or any of a few papers are wrong and projections from L6a to the thalamus don’t follow an apparent rule.


First of all, I love the diagram, it totally captures the basic idea. I showed it to Jeff and he liked it too (although he did not like that I was using V3 in my elephant example but that is my fault).

It is a nice idea to think of V1, V2, and V4 as different sensors. V1 is like a high acuity narrowly-focused eye, V2 is like a lower acuity wider angle eye. V1 and V2 can vote via long range L3 connections.

The analogy doesn’t explain the feedforward hierarchical connections between regions, such as V1 to V2. These are the V1 L5 to thalamus to V2 projections. We don’t have a good understanding of this yet. In the frameworks paper we suggested L5 could represent displacements between two objects, so L5 would be passing up a composition of observed objects. That might be true, but we don’t have a definitive answer yet. – Jeff



I agree that the CTC pathway contradicts thinking of each scale as a different sense. I don’t mean it just as an analogy though. At least in the whisker system, I think it’s more literally true. Of course each scale isn’t a whole different sense like vision and hearing, but different types of receptor cells in the same sensory organ have differently sized receptive fields. If you want to call different types of receptor cells different submodalities, each submodality exists on its own scale.

That’s not to say that all submodalities are always on different scales, just sometimes. To be clear, I’m referring to sizes of receptive fields, not scales of objects.

I’ll just outline the evidence for now. I can write something much longer and more factually accurate but I think the nitpicks are too distracting to get the point across first.

There are several sensory nuclei in the whisker system, each with their own sensory response properties. They project to the thalamus. The sensory nuclei which project to the three higher order parts of the thalamus (see [1]) have larger receptive fields. Sensory input alone can drive higher order thalamic cells to fire, and since sensory input arrives before signals from L5, it’s more important for producing cortical responses during the window of opportunity*. Those parts of the thalamus are higher order but their direct sensory inputs are more important, so scale of sensory input can determine the appropriate level in the hierarchy.

*Though there is a mechanism involving zona incerta which, depending on motor cortex activity, gates the initial response in at least one of the three higher order parts of the thalamus [3].

Here is possible additional evidence that level in the hierarchy is tied to submodalities, rather than every submodality’s hierarchy starting at primary cortex. Some of this is relied on in what I wrote above; namely, I relied on the septal part of barrel cortex being higher order.

I couldn’t quite make sense of L6 CT feedback connectivity until I read one study [2]. Part of barrel cortex (barrel columns) projects from L5 to two evidently higher order parts of VPM which project to the other type of compartment in barrel cortex (septa, in the right layers for the feedforward CTC pathway). That means the barrel cortex, part of primary somatosensory cortex, actually has a higher order component in which a bunch of cortical columns of primary cortex are embedded. That solved what seemed like an exception to the rule that CT feedback from upper L6 only travels down the hierarchy.

That’s not directly related to each scale being a different sense, but it’s weird. I don’t know whether the septal and barrel domains act like two separate regions, but just maybe they share some things, like some sublayers where the two domains are less distinguishable. Maybe this organization exists in other primary regions, since it involves some pretty thin, difficult-to-notice subnuclei. For example, a higher order component might exist in V1. The zona incerta mainly targets higher order thalamic nuclei. A couple of supposed exceptions in that study were the higher order parts of VPM, which is generally considered primary since most of it is. There was also an exception in mouse LGN. Maybe that part of LGN receives input from zona incerta because it is actually higher order.

Septal and barrel columns are two types of cortical columns. There are lots of columnar specializations in other regions, like blob and interblob columns in V1. Those two types of columns have different sensory response properties, presumably including RF size. The same region can handle multiple scales if it uses different types of cortical columns for each scale. It can use the same types of columns to handle different sensory properties, perhaps contrast and orientation, because those properties describe somewhat different scales of the visual space.

[1] Convergence of Cortical and Sensory Driver Inputs on Single Thalamocortical Cells

[2] Distribution of Large Terminal Inputs From the Primary and Secondary Somatosensory Cortices to the Dorsal Thalamus in the Rodent

[3] Motor Cortex Gates Vibrissal Responses in a Thalamocortical Projection Pathway