Evolution of the neocortex

I think it is likely there are different algorithms at different scales: individual neurons, local collections of neurons, regions, inter-region connections, the whole brain, and so on. The highly recurrent patterns of connectivity make all of this extremely complicated. If this is the case, then there are potentially algorithms running at the scale of the neocortex (and the entire brain) that have been evolving for over 65 million years. As the brain specialized, it would optimize at all of these scales, i.e. it is not a compositional system but a complex system.

That there are general principles replicated throughout the neocortex is not, in itself, a claim of much interest. What is of value is defining the exact details. Jeff has made the idea of cortical columns running an identical algorithm throughout the neocortex the centerpiece of TBT.

Like any scientific theory, it cannot be proven correct; it can only be proven wrong. Science advances by demonstrating that the most compelling theories are wrong. Given the details above about Brodmann area 4, I think Jeff might need to revise the hypothesis with regard to claims about the uniformity of the ENTIRE neocortex. His theory may still apply to large parts, but then he needs to define which parts.

2 Likes

Well, it depends how we define "algorithm", since more complex algorithms can be seen as compositions of simpler ones.

The brain has algorithms for LTP, for LTD, for synaptic competition, for arealization (the creation and organization of brain areas), among others. There are also various algorithms implemented in the hippocampus for memory storage, and in the basal ganglia for reinforcement learning and potentially other things.

Now, something I've heard but am not sure has been confirmed is that some fMRI data suggest around a few dozen peaks of activity in different parts of the brain during the performance of a single task. Given that each hemisphere is said to have around 200 areas, that suggests an interesting possibility: only a fraction of higher areas might be active at any one time, which, if it turned out to be true, would require some sort of inter-area competition algorithm.

To me it highlights the importance of the inhibitory activity of neurons in the brain. Right now, I feel most folks are focused on activation functions without giving nearly enough thought to the effect of suppressing activations. Sure, there's dropout in neural networks, but in most cases that's random and isn't driven by any meaningful algorithm.
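To make the contrast concrete, here is a minimal sketch (my own illustration, not from any HTM codebase) of random dropout versus a k-winners-take-all rule, where suppression is driven by competition among activations rather than by chance:

```python
import numpy as np

rng = np.random.default_rng(0)
activations = rng.normal(size=10)  # toy activation vector

# Random dropout: zero out units independently of how active they are.
dropout_mask = rng.random(10) > 0.5
dropped = activations * dropout_mask

# k-winners-take-all: only the k most active units survive -- a crude
# stand-in for competition mediated by inhibitory interneurons.
k = 3
winners = np.argsort(activations)[-k:]  # indices of the top-k activations
kwta = np.zeros_like(activations)
kwta[winners] = activations[winners]
```

The point of the sketch is that the kWTA result is a deterministic function of the activity pattern, while dropout discards information at random.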

3 Likes

A similar issue arises with glial cells: there is some neuroscience (e.g. from Steve Potter) showing they are involved in information processing.

Here is an image showing how distributed the regions associated with detecting face objects are: https://science.sciencemag.org/content/sci/320/5881/1355/F1.large.jpg There is some compelling neuroscience about face recognition that seems to fit with the idea of metric reference frames but does not fit with the idea of TBT and cortical columns, e.g. from 2017: https://www.researchgate.net/publication/317305790_The_Code_for_Facial_Identity_in_the_Primate_Brain "By formatting faces as points in a high-dimensional linear space, we discovered that each face cell's firing rate is proportional to the projection of an incoming face stimulus onto a single axis in this space, allowing a face cell ensemble to encode the location of any face in the space… Our work suggests that other objects could be encoded by analogous metric coordinate systems."
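The axis-projection code the abstract describes can be sketched in a few lines. The dimensions, cell counts, and the use of least squares for decoding are my illustrative assumptions, not details from the paper:

```python
import numpy as np

rng = np.random.default_rng(1)
dim, n_cells = 5, 8  # dimensionality of "face space", number of face cells

# Hypothetical: each face cell has a preferred axis in face space.
axes = rng.normal(size=(n_cells, dim))

# A face is a point in face space.
face = rng.normal(size=dim)

# Each cell's firing rate is proportional to the projection of the
# incoming face onto that cell's axis (here, just a dot product).
rates = axes @ face

# Linear decoding: recover the face from the population rates.
decoded, *_ = np.linalg.lstsq(axes, rates, rcond=None)
```

With more cells than dimensions, the ensemble over-determines the face, so a simple linear read-out recovers it exactly in this noiseless toy case.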

2 Likes

My take is that it is not really one or the other – there is a little bit of both a repeated algorithm and specialized tweaking on top of it (for example, neurons in the areas associated with language are more myelinated, allowing them to conduct impulses at faster speeds). I’ve seen several videos where Jeff has acknowledged the same thing, while noting that the similarities are still more striking than the differences. In any case, hopefully understanding the basics of the CC circuit will give us an opportunity to explore how tweaking various parameters and configurations causes it to behave differently. That sort of tuning will likely be necessary when building more complex intelligent systems.

3 Likes

I discussed the argument that "the similarities are still more striking than the differences" a little earlier in this thread. What is being compared? Any region of the brain has more striking similarities with any other region of the brain than with anything that is not a brain. That does not argue against the idea of the neocortex being different from the hippocampus. If you are going to make comparisons, then they have to be within the neocortex, in which case the differences are massive, e.g. a layer disappears. If there were not major differences, Brodmann would not have been able to identify distinct areas.

1 Like

Right, I think the argument though is that the differences between regions in the neocortex are notably less than the differences between other specialized components of the brain. It isn’t that there aren’t any differences. If the hypothesis were that there are literally no differences, then it has already been falsified by published observations.

I don’t think that is the hypothesis, though. It’s more that there is a common cortical algorithm that is repeated. If so, then the real test would be how much those observed differences affect the function of the circuit. If they create drastic differences, then the hypothesis is falsified. If they provide more of a tuning function, then I would argue that supports the hypothesis. It may be difficult to make this sort of observation until we have a better understanding of the circuit (so we better know what to look for in the published literature).

That said, I am not a neuroscientist myself, so others’ perspective on this thread holds a lot more weight than mine :wink: I am sure if there really are a variety of different cortical circuits (rather than just one that is repeated and tuned), that fact will be discovered as research progresses.

1 Like

I would expect that the differences in regions have much to do with where the area is in the hierarchy. If the feedforward and feedback functions are localized in different layers then being at the start or end of the chain makes one or another layer redundant in some areas. Likewise, emphasizing different parts of the algorithm makes sense to me at different parts of the processing chain.

As I presented in the learning circle and @markNZed so nicely summarized, it may make sense to think of the basic processing kernel as having parameters that could be optimized for different positions in the processing chain.

For example, in V1 there is no "prior" region, so only the feedback is relevant, and the optimization is for Gabor filters.

Being the terminus of both short and long fiber bundles would be another factor that calls for specialization in those regions.

In the hub regions, there is a need for symbol communications between the hubs so the ratios between lateral axonal projections and inhibitory interneurons discipline the TBT behavior to form Calvin tiles. The tile patterns that form are likely to be the equivalent of pattern labels.

The EC/HC, being the terminus of the hierarchy and the phylogenetically oldest regions, are specialized to communicate both with each other and with the "newer" neocortex. The simple EC layer structure is likely to have been the starting point for the newer cortex that branched off from there. The requirement that the EC be able to interface with the HC could explain why the older layer structure is conserved and optimized to contain at least four separate scales of grid formation.

None of this invalidates the base assertion that a common structure/algorithm is the basis for the CC structure clearly visible in all areas of the cortex.

3 Likes

The neocortex is super complicated, so variation might appear more significant than it actually is. There’s a huge list of things common to all regions. I can’t even start to list them, especially their details, because that would take a book. They’re major things like cell types and connectivity.

Neuroscience has mainly studied apes and rats/mice, and also cats and others. Apes, rodents, and cats diverged over 70 million years ago, so if there weren’t a common cortical system, you’d expect fewer similarities. Likewise, cortical regions for different senses etc. diverged a long time ago within each species, whenever the cortex first evolved regions for those things. That’s not exactly the same as different species diverging, but evolution doesn’t have to keep each region similar.

6 Likes

Let’s just start with one. What is the most similar structure across the entire neocortex at the scale of cortical columns?

I’m not sure that is even a fair question because I’m unaware of any structural evidence for cortical columns.

What is the most compelling empirical evidence for a common cortical algorithm across the entire neocortex?

Perhaps that is unfair too because the same experiments across the neocortex don’t exist.

How about something like a cortical column in V1 detecting complex objects, e.g. a coffee cup or letters of the alphabet? Surely this evidence must exist by now.

Did you look at the face recognition paper I cited above? Does that align with TBT? Shouldn’t each cortical column be able to model entire faces?

It’s not all the same, as you know. There can be extra sublayers, cell types, connections, etc. It’s just a list of shared characteristics, structural and other things, mostly at the scale of a cortical column.

There isn’t a single primary reason. The same experiments or observations are often repeated in multiple regions, although I wish there were a stronger focus on comparisons.

I don’t think it’s 100% proven that there’s a common cortical algorithm, but to me it’s like 90% at this point, and like 99% that the idea of a common cortical algorithm is at least on the right track. I think the burden of proof is on the argument that there isn’t generic cortex at this point.

I’m not convinced generic cortex is enough for general intelligence, just general perception and a foundation for general intelligence.

Vernon Mountcastle describes them in “The columnar organization of the neocortex”:
https://academic.oup.com/brain/article/120/4/701/372118

He describes them in visual, somatosensory, and motor cortex, and several association regions, saying columns appear unchanged in association cortex. They’re also in auditory cortex: https://www.sciencedirect.com/science/article/pii/S2211124719304620

Even if cortical columns don’t exist in every region, I’m not sure TBT requires them. Columns might just be a widely applicable optimization.

Another reason columns are probably the rule not the exception: interdigitated columns of different types imply the cortex evolved from something with columns and often evolves new regions first by evolving interdigitated columns.

Cortical regions often have interdigitated cortical columns for different roles, e.g. different sensory submodalities. This makes sense as a way to minimize the length of the fibers connecting neurons, because neurons with similar response properties tend to be more strongly connected. It brings together neurons that serve a particular role for a particular patch of the sensor, while also placing them near neurons for a similar patch of the sensor that serve a different role. If the sheet had a continuous topography, this would place arbitrary boundaries between neurons with highly overlapping receptive fields, leading to the evolved trend of receptive-field jumps between columns. Overall, the need to position cells with similar receptive fields as well as a similar role (e.g. sensory submodality) leads to evolving interdigitated cortical columns for different roles, each containing cells with clustered receptive fields.
That’s not just true in neocortex. It also makes sense in other sheets. In entorhinal cortex, cytochrome oxidase stains imply similar discrete interdigitated columns of multiple types. That means the neocortex probably evolved from something which already had cortical columns.
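The wiring-length argument can be made concrete with a toy 1D model (the numbers and the connection rule are my illustrative assumptions, not values from the literature). Neurons are indexed by (patch, role); each neuron connects to the other role at the same patch and to the same role at the adjacent patch, and wiring cost is the summed distance over connections:

```python
# Toy 1D wiring-cost sketch: compare a segregated layout (all of role 0,
# then all of role 1) against an interdigitated layout (roles alternate
# patch by patch). All numbers are illustrative.
patches, roles = 4, 2

edges = []
for p in range(patches):
    edges.append(((p, 0), (p, 1)))              # same patch, different role
    if p + 1 < patches:
        for r in range(roles):
            edges.append(((p, r), (p + 1, r)))  # adjacent patch, same role

def wiring_cost(position):
    """Total wire length given a function mapping (patch, role) -> position."""
    return sum(abs(position(a) - position(b)) for a, b in edges)

def segregated(n):
    return n[1] * patches + n[0]   # role blocks laid out side by side

def interdigitated(n):
    return n[0] * roles + n[1]     # roles interleaved patch by patch

cost_seg = wiring_cost(segregated)
cost_int = wiring_cost(interdigitated)
```

Under these assumptions the interdigitated layout is strictly cheaper, matching the intuition that interleaving keeps same-patch, different-role neurons close without pushing same-role neighbors far apart.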

The cortex doesn’t always interdigitate columns for different roles. Physically separate regions of the sheet often map the sensor, e.g. V1 and V2. Cortical columns provide a mechanism to evolve new regions.
An existing region can evolve interdigitated columns, and then separate those columns out into a new region. One functional difference between types of columns and separate regions is that regions are often at different levels of the corticothalamocortical hierarchy, but the levels of this hierarchy are generally described in terms of whole regions rather than their types of columns. The barrel cortex might be an example of this separation from interdigitation to a new region in progress. The interdigitated septal columns are likely at a different level of the corticothalamocortical hierarchy and are similar to the adjacent dysgranular zone (sometimes even considered part of it), suggesting the septal columns are part of that region.

If the idea of object composition is correct, you’d expect V1 to look for simpler objects, even just edges.

I read the abstract just now. In what way do you think it doesn’t align with TBT? A big thing missing in HTM theory right now is egocentric representation, meaning reference frames relative to the body rather than the object, which are necessary for behavior like grasping an object. One idea I heard is that the egocentric location info can get added back in. That makes sense because there are some egocentric coordinate systems which are completely different from location on the sensor (see Hemineglect - Scholarpedia).
It probably doesn’t matter here, but facial recognition definitely has specializations (Prosopagnosia - Wikipedia).

In some part of the cortex, yes. That doesn’t necessarily mean individual cells respond to specific faces. It might not even mean the SDR represents a face, if the representation is encoded by neurons not firing (like how neurons not firing, leaving just one or a few firing in each minicolumn, represent sequence context in the temporal memory).
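The temporal-memory point can be illustrated with a toy minicolumn (cell counts and context names are made up for illustration, not taken from any HTM implementation). The column as a whole codes the current input; which single cell within it fires codes the sequence context:

```python
# Toy minicolumn with 4 cells. The column being active codes the input "B";
# which one cell fires (and which stay silent) codes the preceding context,
# as in HTM temporal memory.
N_CELLS = 4
contexts = {"A->B": 0, "C->B": 2}  # context -> index of the predicted cell

def column_state(context):
    """Firing pattern of the minicolumn for a given sequence context."""
    cells = [0] * N_CELLS
    cells[contexts[context]] = 1  # the predicted cell fires; the rest are inhibited
    return cells
```

Both contexts activate the same column, yet the two firing patterns differ, so the information is carried largely by the cells that do not fire.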

1 Like

The primary functions of the cortical algorithm, besides the central concept of movement-oriented reference frames, as I see it (this may deviate from the official theory) are:

  1. Feature extraction. A lot of information comes in from the sensors, and the algorithm needs to split out various features from it. I expect the outputs from a cortical column to radiate out in multiple directions, sending subsets of the original semantic information.

  2. Feature binding. As the semantic information radiates out of each cortical column, it interacts with information from other cortical columns, and as signals meet they either interfere with or support each other. Where they support each other, the algorithm learns correlations.

  3. Precision through voting. Each cortical column would be tuned to modeling the objects it encounters most frequently. An individual column is akin to an individual grid cell module: it is imprecise and has only part of the information. Just as a bunch of GCMs vote together to depict a precise location, a bunch of CCs vote together to depict a precise object.

With this view, individual CCs are modeling complete objects, but those models are imprecise and need to work together to depict something with precision. To me, this would explain why, for example, you would see activity in association areas for complex objects.
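The voting idea in point 3 can be sketched with a toy analogue of grid-cell-module ambiguity (the periods and locations are illustrative, not biological values). Each module narrows the answer only to a candidate set, and the vote is the intersection of those sets:

```python
# Toy sketch: each "module" (or column) knows the answer only modulo its
# period, like a grid cell module that is ambiguous at its own scale.
true_location = 17
periods = [3, 5, 7]  # each module is ambiguous at a different scale

candidates = [
    {x for x in range(105) if x % p == true_location % p}
    for p in periods
]

# Each module on its own is highly ambiguous...
ambiguity = [len(c) for c in candidates]

# ...but the "vote" (intersection of the imprecise estimates) is precise.
vote = set.intersection(*candidates)
```

Because the periods are pairwise coprime, the combined range (3 × 5 × 7 = 105) has a unique location consistent with all three modules, even though each module alone rules out very little.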

This may deviate from the official theory (I haven’t seen networks of CCs being explicitly compared to networks of GCMs), but I don’t see it as contradictory to the basic concept of TBT. Individual CCs would still be generating complete models of things, and be networked together with thousands of other CCs to bring lots of fuzzy, imprecise models into precision. And it would imply a common, repeated algorithm.

1 Like

The link is not working.

That is a better argument. I was under the impression he had done the tests in areas associated with touch.

I’m not sure about that. My impression is that TBT implies objects like letters are recognized lower down in the visual hierarchy than is typically claimed. But that is just an impression.

From the abstract (I’ve not read the paper either): "Using this code, we could precisely decode faces from neural population responses and predict neural firing rates to faces." Considering the image I linked to, you can see that the scale is much larger (regions, not cortical columns of around 1 mm²).

It would be good if you could provide the reference to the Mountcastle paper, thanks.

My impression is that Jeff is looking for this at every minicolumn not per cortical column.

I would rather keep this thread on TBT and avoid everyone’s favorite interpretation. It would be good to have a compelling case for Jeff’s claims.

1 Like

I meant an individual CC is analogous to an individual GCM. In TBT, there would be many GCMs per CC, I didn’t mean to deviate from that prediction.

Fair enough. But do keep in mind that interpretations within the framework outlined in TBT might be necessary to counter some of the specific arguments against it that you are leveling. Agreed though that it would be better to get those explanations from Jeff and the research team rather than from one of the “cogs in the machine” :wink:

Anyway, to remain on topic, I won’t discuss the aforementioned interpretation here. If others choose to comment on it I can always split it off into a separate thread.

2 Likes

It should be fixed.

I’m still not understanding. Multiple columns can represent the same object. They often have to vote together to figure out what something is, and then each represents it.

1 Like

I have wondered about this point myself.

A column (or collection of columns voting) fires when an object is within its sensory field. We know from basic HTM theory that a given column can learn many things, perhaps thousands.

With the 1K brain model, every column doing its own thing, and, as far as I can tell, no column’s objects being labeled, what does it mean to "recognize" an object at the column level?

How is that column (or voting group) firing for a face different from firing for a tiger?

I am at a loss to see how that unlabeled recognition drives behavior selection; what exactly does that “recognition” mean as far as selecting a useful behavior?

With hierarchy and object recognition at the association regions (and in my way of thinking, forming a Calvin tile coding) I can see a coherent story that leads to behavior selection.

3 Likes

I see it this way: a particular column alone might not have enough information to distinguish a human face from a tiger face, but other columns that are also voting will (for example, some might receive input from "old brain" elements involved in the innate instinctual fear of predators, others might be good at recognizing furry animals, etc.). Their activity together builds up the semantics of a tiger versus those of a human.

I do also agree that there has to be some type of "labeling", in the sense that there must be some form of emotional flavoring as part of the models (I’ve seen examples of patients indicating that we are not able to make choices without emotional context).

This is definitely one part of the theory that I have been waiting impatiently for (though I get that they can’t boil the ocean and have to start somewhere). At the risk of getting off topic again, I believe voting probably works not only for the recognition part, but also for the behavior part.

A consensus could be reached not only about what is being experienced, but also about what to do about it (keeping in mind that the "old brain" is also in the loop here – we know that it is doing action selection and driving the neocortex). A movement vote would probably be in the context of a given CC’s learned reference frames for a given "object". For physical movements (versus virtual "thinking" movements), something would need to translate a collection of these votes from multiple CCs into body movements, though. An associative memory of some type would probably fit – it would have to be learned (since the reference frames themselves are learned), not hard-coded.

2 Likes

I think it might be showing population encoding at a much larger scale than cortical columns.

From the paper: "Columns only vary from 300 to 600 um in diameter". I am confused by this. I think Jeff refers to columns that are about 1 mm in diameter. How do you read that part (start of p. 702)?

2 Likes