Evolution of the neocortex

Jeff’s recent book motivated me to better understand the evolution of the neocortex. I came across the series Progress in Brain Research, which includes the chapter The origin and evolution of neocortex: From early mammals to modern humans, published in 2019 by Jon H. Kaas, a Professor of Psychology who began as an assistant professor in neurophysiology and has “studied the brain for some 40 years”. I will present a few quotes and raise some concerns regarding the TBT hypothesis.

“Comparative studies of the subdivisions of neocortex of mammals indicate the number of cortical areas varies greatly, but a small number of cortical areas, about 20, are consistently found across members of the three major branches of mammalian evolution”

“Early mammals were small and largely nocturnal.” This means vision was not as highly developed as other senses while dinosaurs were still around, up until about 65 million years ago.

“it appears that the overall pattern of the variation in neuron packing densities was more even across cortex in early primates and not that much different from other mammals, but increased as anthropoid primates emerged, and increased further with the evolution of apes and humans. This variation is important because it is part of the specializations of cortical areas for different functions.”

“Motor cortex lacks an obvious layer 4 of small neurons, and is specialized for summing information by having large pyramidal neurons with large dendritic arbors. In contrast the small layer 4 neurons of granular prefrontal cortex are ideal for preserving information. We see the starts of these specializations in the neocortex of small strepsirrhine primates, and these beginnings of neuron specializations are greatly enhanced in monkeys, apes and humans.”

“chimps and humans shared the common ancestors 6–8 million years ago”

“Language is a unique human accomplishment that is completely dependent on new features of the human brain. First, the neural mechanisms that mediate language are highly lateralized to the left cerebral hemisphere. The major advantage of such an arrangement is that it avoids the need for massive connections between the two hemispheres, which would be costly in conduction time, energy, and bulk”

“Language appears to depend on sub-networks that were derived from cortical networks for object recognition and action that emerged in early primates, the so-called ventral and dorsal streams of processing for vision that have been joined by auditory and somatosensory components.”

Jeff has claimed that the rapid expansion of the neocortex justifies the idea of a common cortical algorithm that is reproduced. From TBT p.26 “the major expansion of the modern human neocortex relative to our hominid ancestors occurred rapidly in evolutionary time, just a few million years. This is probably not enough time for multiple new complex capabilities to be discovered by evolution, but it is plenty of time for evolution to make more copies of the same thing.”

Millions of years seems like a long time for evolution - it is enough to get from our common ancestor with chimps to modern humans. The specialization of the language area is a concrete example of just how fast evolution can change brain structure when driven by functional adaptation.

There are over 200 specialized regions in the human neocortex and at least 20 of those have been evolving for over 65 million years. If there was a common algorithm that leads to increasing intelligence by replication, then it is hard to see why there is a rapid increase in brain size of the human evolutionary line in only the last few million years. This seems to be a strong case for there being different cortical “algorithms”. Large parts of our neocortex are still using similar structures to mammals from 65 million years ago. I think the concept of “functional shift” makes a lot of sense in fitting the puzzle together. For example, a functional shift in visual processing gave rise to new functions of language. Support for this would be structural differences in the different regions - which are observed.

TBT brings other arguments to bear for a common algorithm:

“If I showed you two silicon chips with nearly identical circuit designs, it would be safe to assume that they performed nearly identical functions.” This surprised me coming from an engineer - nearly identical circuits can implement radically different functions in digital design. The combinational logic that makes up a logic circuit looks very similar everywhere unless you examine each individual connection under a very powerful microscope. We could draw an analogy with the similarity of connectivity in the neocortex. But the algorithm implemented by the combinational logic in a USB interface is nothing like the function of an interrupt controller. This is an example where the smallest details absolutely matter. At the level of individual neurons there are not even two identical neurons (at least transistors are similar in digital circuits).
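
To make that concrete, here is a toy sketch (my example, not from TBT): the very same two NAND gates compute a stateless AND function when wired feed-forward, but become a memory element (an SR latch) when cross-coupled - identical parts, radically different function, purely from connectivity.

```python
def nand(a: int, b: int) -> int:
    return 0 if (a and b) else 1

# Wiring 1: feed-forward -> a stateless AND gate.
def and_gate(a: int, b: int) -> int:
    n = nand(a, b)
    return nand(n, n)

# Wiring 2: the same two NANDs cross-coupled -> an SR latch, a memory
# element whose output depends on history, not just on current inputs.
class SRLatch:
    def __init__(self):
        self.q, self.q_bar = 0, 1

    def step(self, set_bar: int, reset_bar: int) -> int:
        for _ in range(4):  # let the cross-coupled feedback settle
            self.q = nand(set_bar, self.q_bar)
            self.q_bar = nand(reset_bar, self.q)
        return self.q

latch = SRLatch()
latch.step(0, 1)              # pulse "set"
assert latch.step(1, 1) == 1  # inputs released, yet the 1 is remembered
assert and_gate(1, 1) == 1    # same gates, feed-forward: no memory at all
```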

Another argument from TBT is “the function of neocortical regions is not set in stone. For example, in people with congenital blindness, the visual areas of the neocortex do not get useful information from the eyes. These areas may then assume new roles related to hearing or touch.”

This is an interesting point. Given that vision uses about 30% of the neocortex, and that we grow up in an environment intensely geared toward achieving academic success, we should then literally see that a majority of geniuses are blind. It is not that being blind precludes genius - we know that blind people have enhanced touch and hearing. This argues for a neocortex that has a particular algorithm for sensing and something very different going on when it comes to other human traits like language ability.

The final argument from TBT supporting the common algorithm: “Finally, there is the argument of extreme flexibility. Humans can do many things for which there was no evolutionary pressure. For example, our brains did not evolve to program computers or make ice cream—both are recent inventions.”

Many other animals can also demonstrate extreme flexibility. It is more impressive to see a dolphin learn to jump through hoops than to see a human write with a keyboard instead of a pen. The general abilities of humans are very impressive to humans - I suspect dolphins think we are quite stupid and overly specialized, given our inability to understand the language dolphins use. So much of the brain is highly specialized - for example, the roughly 30% typically used for vision - yet we can’t just close our eyes and suddenly be 30% better at general human skills like math. That tells the opposite story: there are parts of the neocortex that do have more general capabilities and parts that are more specialized. Again this seems to argue against the common algorithm.

I would really like to believe the idea of a common algorithm. It would seem to put general AI just around the corner. But I need more compelling arguments before abandoning other theories of intelligence. Perhaps my main concern is that the reverse-engineering approach assumes the brain is a machine and that, like human-made machines, it is composed of simple parts. But the reason we build machines that way is that we are too stupid to engineer complex systems the way evolution does. In this regard it will be interesting to see where Google gets by using AI to design digital circuits and AI algorithms; my suspicion is that the resulting circuits and algorithms will be impossible for humans to understand and will radically outperform anything a human designer can come up with using compositional thinking.

7 Likes

I think that part of the answer lies in what scope you think the common algorithm covers.
If you think of the transistor as a common algorithm then it can be applied over a very large range.
If you restrict the transistor to some higher-level combination of transistors, say a logic gate, then it has a much smaller scope of application.

1 Like

In line with TBT, the common algorithm is at the scale of a cortical column, i.e. it is a complicated circuit.

1 Like

Do you think there is anything happening outside of the functions listed in this post?

2 Likes

There are theories about why the cortex expanded so quickly, like the extra calories unlocked by tools, and persistence hunting (running prey to the point of heat stroke), I think.

With things as complex as a processor or cortical region, it’d be a huge coincidence if they shared half their characteristics but did totally different things using totally different mechanisms.

4 Likes

“If there was a common algorithm that leads to increasing intelligence by replication, then it is hard to see why there is a rapid increase in brain size of the human evolutionary line in only the last few million years. This seems to be a strong case for there being different cortical ‘algorithms’.”

The brain is an extreme energy hog: it makes up about 2% of the body but consumes about 20% of its energy, and during childhood it is said to consume up to ~50% of the body’s energy, iirc. For the brain to increase in size, there needs to be ample justification for this cost, in an environment with limited resources and frequent famines.

Not only that, but there are limits to the usefulness of intelligence in the wild without the accumulation of external knowledge. For example, humans were hunter-gatherers with basically no accomplishments for hundreds of thousands of years, iirc.

It was only the advent of hunting, and some say potentially of cooking thanks to fire, that enabled the explosive growth of the brain.

Even so, not only does the brain take ridiculous amounts of energy, a larger brain also complicates birth and increases the risk of death for pregnant women, especially without technological assistance. On top of that, the added development time for the brain slows generation time by requiring a longer lifespan, essentially slowing down the rate of evolution. A species with a generation time of 3 years evolves far faster than one with a generation time of 12+ years.
Also note that during the later period of brain expansion, ancestral lifespans were likely already significantly long, similar to chimps or bonobos, which reach sexual maturity at around 9+ years. That means generation time, and hence the rate of evolution, was significantly slower than for species like many rodents with far shorter generation times.
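
Putting rough numbers on that (illustrative figures of mine): over the ~6 million years since the chimp-human split, a 3-year generation time allows about four times as many rounds of selection as a 12-year one.

```python
# Illustrative generation-count arithmetic (assumed figures):
span_years = 6_000_000        # roughly the time since the chimp-human split
print(span_years / 3)         # ~2,000,000 generations for a fast breeder
print(span_years / 12)        # ~500,000 generations for a long-lived ape
```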

Well, maybe I missed it, but it seems you did not recall or read the earlier book, On Intelligence. In it the example is given of how the wiring from sensory organs to cortex was surgically rerouted in some mammals during the early plasticity window, and how the cortex was able to handle and process sensory information from an entirely different sensory organ than it normally receives.

MIT researcher finds that part of brain used for hearing can learn to ‘see’ (MIT News, Massachusetts Institute of Technology, Apr 19, 2000): “The animal’s auditory cortex successfully interpreted input from its eyes. … the animals did see when visual input reached their auditory cortex.”

The surprising result is that the ferrets develop fully functioning visual pathways in the auditory portions of their brains. In other words, they see the world with brain tissue that was only thought capable of hearing sounds. https://www.nytimes.com/2000/04/25/science/rewired-ferrets-overturn-theories-of-brain-growth.html

There is likely some minor specialization, but as can be seen similar functionality can occur in different parts of the brain.

Another thing is that one of the strongest predictors of the intelligence of an animal across species is the neuron count in their cortex. Higher neuron count in cortex produces greater intelligence across the board.

Practically no species has a higher functional neuron count in the cortex than humans. Elephants have a far lower neuron count in their cortex; most of their neurons are in the cerebellum. Even whales: most whales, being cetaceans, sleep with half their brain at any moment, alternating sides, so functionally even the largest whales have significantly less working cortex at a time than humans. The only exception, I believe, is sperm whales, which I’ve heard can have their entire brain awake at once; but without opposable thumbs, and underwater, technological progress is pretty much impossible for them, so it would be very difficult to tell if they have comparable or greater intelligence than humans.

Also, another thing is that the genome only holds about 50MB of compressed information, iirc, and the design of the brain is likely less than 25MB of that. Insects and simpler organisms can have more genetically specified circuitry. But as the number of connections increases, the wiring must become less specialized and more general, because there isn’t enough information in the genome to specify it.
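
As a rough sanity check on that budget (assumed figures of mine, not the poster’s): even a generous estimate leaves the genome orders of magnitude short of specifying cortical wiring connection by connection.

```python
# Back-of-the-envelope, all figures assumed:
genome_bp = 3.1e9                    # human genome length in base pairs
raw_mb = genome_bp * 2 / 8 / 1e6     # 2 bits per base -> ~775 MB raw
budget_mb = raw_mb * 0.05            # ~5% functional -> ~39 MB budget

synapses = 1.5e14                    # rough human synapse count
wiring_tb = synapses / 8 / 1e12      # even 1 bit per synapse -> ~19 TB

print(f"genome budget ~{budget_mb:.0f} MB vs ~{wiring_tb:.0f} TB to specify wiring explicitly")
```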

Not only that, but another example is that when visual data is transmitted as touch through the tongue, the brain can make sense of it. The brain also needs to be able to take on new functionality at a fast pace. Then there are things like the impressive recoveries that can occur when significant brain damage happens during childhood: the brain is able to reorganize and recover almost normal functionality even after extensive and varied forms of damage. Another example is the mutation of a novel receptor, say a new photoreceptor or smell receptor; such a mutation would confer no advantage if the brain weren’t able to adapt to information from new receptors, which is what allows things like more complex color vision when such mutations occur.

Another line of evidence comes from AI research: it is being observed that simple algorithms with minor variations can handle even vastly different types of data (text, sound, images) and produce impressive results, iirc. An example is the transformer, which has been used in both text and image processing with nice results; or look at GANs (generative adversarial networks) and their multiple successful applications: https://machinelearningmastery.com/impressive-applications-of-generative-adversarial-networks/ . This also suggests that such algorithms are likely to exist, on top of the suggestive evidence from nature.

6 Likes

thanks, learned a lot of stuff

1 Like

I’ve heard of the connection with cooked food providing more calories. It makes sense that changes are connected with changes in context, like more calories. I am not sure how long it takes for functional specialization to emerge through evolution, but I’m guessing humans have many functional shifts, e.g. vocal range. The rapid increase in size of the neocortex does not seem that rapid in evolutionary time; recent changes like our shrinking jaw size are perhaps more remarkable. Complex systems evolve as complex systems, not like computers where you drop in extra memory or processors. For example, supporting the slow development of human children probably requires a complex social system, which requires language, which requires…

The example of the digital circuit has almost identical mechanisms - there is much less variation among transistors than among neurons. The point is that you still get totally different algorithms across digital designs. If you only consider the mix of transistor types and their density, the circuit looks the same everywhere; but if you consider connectivity and activity, the different parts of the IC appear radically different. Now consider the brain: there is much more variation in connectivity and neuron types, and the activity is so complex that nobody has a clear circuit diagram. It almost supports the opposite conclusion - there are visible differences in the structure of different regions of the neocortex, and the brain has been under evolutionary pressure for long periods to support specializations like using opposable thumbs and language skills. It would surprise me if there were not more specialization at the algorithm level, because evolution is much smarter than IC designers.

1 Like

These kinds of constraints would push more toward specialization than toward replication of something that is not energy efficient. They also point to diverse external factors supporting the growth, which would encourage specialization for those particular niche needs.

There are obviously mechanisms throughout the neocortex that are shared, e.g. there are pyramidal neurons and inhibitory neurons everywhere. There are also shared dynamics like Hebbian learning. The point I was raising is that, if the reason the neocortex got so big was that it was using the same algorithm, then it would have happened earlier. All the points you raise above point to other reasons that were driving human evolution. The book could have made a stronger case as to why, over millions of years, the most complex system in existence could not specialize.

We can say that all areas of the neocortex look alike if we compare them with anything other than neocortex, and that is true. But this argument does not support the common minicolumn algorithm. Support for a common minicolumn algorithm requires there to be identical circuits across the neocortex, and it seems to me that there is evidence for differences across the neocortex. Differences like granularity are well known and go against the idea of a homogeneous structure.

I have read On Intelligence but it was maybe 15 years ago. Here I was focused on TBT because I recently read that.

This argument is also made in TBT - have you read it? In my original post you can read a quote from TBT: “For example, in people with congenital blindness, the visual areas of the neocortex do not get useful information from the eyes. These areas may then assume new roles related to hearing or touch.”

How do you know it is minor specialization? What is the definition of minor in the most complex system known? Certainly sensory processing seems to do something similar everywhere, and the case for that is strong given the adaptability. If TBT were only proposing an explanation of sensory processing it would be compelling; it seems more of a stretch to extend it to other functions such as planning or language.

This is an interesting example, possibly more in favor of a scalable global algorithm than of a composition of local computations. TBT does not address this concern; if there is a scalable algorithm at the scale of the entire neocortex (and/or regions), then this would also fit the data of rapid expansion once environmental constraints like caloric intake shift.

2 Likes

An example that might be interesting regarding diversity of structure in the neocortex is Brodmann area 4 (Brodmann area 4 - Wikipedia):

- the cortex is unusually thick;
- the layers are not distinct;
- the cells are relatively sparsely distributed;
- giant pyramidal (Betz) cells are present in the internal pyramidal layer (V);
- lack of an internal granular layer (IV), such that the boundary between the external pyramidal layer (III) and the internal pyramidal layer (V) is indistinct;
- lack of a distinct external granular layer (II);
- a gradual transition from the multiform layer (VI) to the subcortical white matter.

1 Like

That is a different example, and a different thing: blind people do get some different functionality from visual cortex, but in the ferret study vision was rerouted surgically, allowing the animals to see through a different area of cortex. I’m not sure if the ferret example is also in the Thousand Brains book; I read it a while ago, and I read a lot of stuff, so it might be a detail I forgot if it is there.

The point is there is at most 50MB of data, and more likely only 25MB, in the design of the cortex; very likely there is not enough data to do many specializations.

Well, I agree with part of TBT, but not all. I think part of the reasoning behind it comes from taking conscious perception at face value: the brain sometimes gives the conscious sensation of having predicted something when in fact it has not predicted it, but has generated the conscious sensation after the fact. This is what is called postdiction.

There are a few experiments showing that the conscious sensation must emerge after events, with the brain putting together a plausible conscious explanation for the sequence of events.

The color phi phenomenon is the clearest example of this happening.

edit: In color phi, a color change is perceived mid-movement of an object, but that color isn’t shown in the experiment until the very end. Thus at least part of the perception of movement occurred after the experiment ended, yet it gave the conscious sensation of having occurred in real time before the experiment concluded.

I believe there is a common algorithm, but it may be simpler than what is postulated in TBT. If the brain were indeed successfully predicting so many things, then yes, something like TBT might be needed. But if the brain is merely giving the sensation of having predicted after getting all the facts and analyzing them, then much less is needed. It is predicting a lot, but not as much as it gives the sensation of predicting.

There are some specializations; for example, I’ve heard that going up the hierarchy the neuron count in minicolumns drops and the number of connections per neuron increases, iirc. Area 4 is one of the most strongly output-focused areas, so it has likely shed things it does not need and expanded things that help it with its function.

Another example of general-purpose hardware is the CPU: CPUs can run any algorithm, yet they do it with a basic and sometimes very limited set of operations. The human brain must at some level have very general functionality, because programmers are generally able to get a feel for the inner workings of practically any arbitrary algorithm. Not only can they get that feel, they can also discover a near infinity of arbitrary novel algorithms.

Also, another thing to recall is the 100-step rule, which some people have cited, even Jeff, though I’m not sure he originated it. The brain cannot run deep algorithms: whatever it is doing must execute in just a few steps, given the large latency of neurotransmission through axons and at the synapses.

What is it doing? I think Jeff has suggested previously that it is a sort of memory system.

In computing there is something called memoization, where you store the results of expensive function calls and, when the same inputs come up again, return the stored result.
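
A minimal sketch in Python (a standard illustration, not from the book or this thread):

```python
from functools import lru_cache

@lru_cache(maxsize=None)  # memoize: cache results keyed by the arguments
def fib(n: int) -> int:
    # Without the cache this recursion is exponential; with it, each value
    # is computed once and every repeated call is a cheap lookup.
    return n if n < 2 else fib(n - 1) + fib(n - 2)

fib(300)  # fast: previously computed results are recalled, not recomputed
```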

Likewise, I think the cortex is in some way just memorizing the most common spatiotemporal patterns, across both the sensory organs and the motor output, creating a sensory-motor memory. The simpler patterns are stored at the lower levels, and more complex patterns at higher levels of the hierarchy. There are many types of complex patterns but most are not stored: you can easily recognize different faces but have lots of trouble telling apart random patches of dirt or gravel.

2 Likes

This paper is widely quoted as the original source of the 100-step rule, although it never states it as such.
See page 206: “The critical resource that is most obvious is time. Neurons whose basic computational speed is a few milliseconds must be made to account for complex behaviors which are carried out in a few hundred milliseconds (Posner, 1978). This means that entire complex behaviors are carried out in less than a hundred time steps.”

https://onlinelibrary.wiley.com/doi/pdf/10.1207/s15516709cog0603_1
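
The arithmetic behind the rule is simple (illustrative numbers, mine):

```python
behavior_ms = 500       # a complex behavior takes a few hundred milliseconds
neuron_step_ms = 5      # a neuron's basic compute step takes a few milliseconds
print(behavior_ms // neuron_step_ms)  # -> 100 serial steps at most
```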

4 Likes

I don’t see the importance of the difference. In one case the system naturally reroutes hearing/touch; in your example, vision is surgically rerouted. The surgery seems impressive as far as surgery goes, but I’m not sure it shows anything new. If they had routed vision to an area that is not used for sensing, that would be very impressive, but your description implies they did not do that.

This does not argue against specialization. The entire organism is encoded with a huge array of diverse innate abilities. Consider all the specialization in the rest of the brain.

I don’t think Jeff is unaware that we react to stimuli. The claim is that the same mechanism can explain prediction and postdiction. For example, remembering the past would still require prediction in TBT.

Nobody is making the argument that the brain does not have general ability. TBT is claiming this means the same cortical column algorithm must be used everywhere in the neocortex.

That there are common information processing behaviors across the entire brain is true e.g. it uses neurons. The claim of TBT is far more specific and regards cortical columns.

I would like to see the neuroscientific proof of cortical columns even existing across the entire neocortex. I can buy the idea of minicolumns; I guess that is how biological processes build sheets of material. It would be interesting to know whether a similar minicolumn structure is observed in layered tissue outside of the brain.

We know that the visual areas show activity in response to touch and sound, but we do not know to what extent that offers any benefit to blind people, afaik.

The ferrets got actual real vision in response to the rerouting.

This isn’t reacting to a stimulus; this is the conscious sensation being misleading. The conscious sensation suggests it is in the moment, in real time. But color phi shows that, at least in some instances, the conscious sensation occurs after events have happened. I suspect it could even be that the brain is creating the conscious sensation of the present after the fact all the time, giving the sensation that it is predicting the present when in reality it is only acting with foresight of what came after the simulated conscious present.

Everything you see and hear is a simulation within your head. If the brain is creating consciousness after the fact, not in the actual present, this suggests to me that the brain might even be operating unconsciously at a high level during the real present, while creating a delayed conscious summary of what it did: a ‘fake’ present.

Jeff claims we do (in TBT); I don’t see a reason in your response to doubt him.

I think you might be confusing your own surprise about this with other people’s understanding of experience. The idea that the now you experience is not the “wall clock time” present instant (that it seems to be) is well known and widely documented. I am certain Jeff is aware of it. You can read more about it in this paper (as an example) https://www.frontiersin.org/articles/10.3389/fnint.2011.00066/full

A simple experiment demonstrating this is to watch a tennis game while moving further away from the players. At a distance of about 50 meters you will suddenly notice that the sound of the ball being hit is delayed. That is the limit of the “time window” in which present-moment experience is assembled.
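
The arithmetic (my figures): sound travels at roughly 343 m/s, so at 50 m the sound lags the sight of the hit by about 150 ms, which is apparently where the brain stops fusing the two into a single present-moment event.

```python
speed_of_sound = 343.0         # m/s in air at room temperature
distance_m = 50.0              # distance from the players
delay_ms = distance_m / speed_of_sound * 1000
print(f"{delay_ms:.0f} ms")    # ~146 ms lag between sight and sound
```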

1 Like

Jeff is most likely aware, but when he comments on conscious sensation, it seemed to suggest to me that the brain was successfully predicting a lot of stuff, perhaps with a slight delay in conscious perception, and was able to act because most predictions were successful.

Here we have another phenomenon: the constructed present includes information from what would otherwise be the future.

And this can lead to interesting consequences. For example, if a person is asked to push a button when the dot moves after changing to the second color, they probably will, because the dot is falsely perceived to change color midway through its movement.

We’d have to hear more from Jeff to see how this correlates with the prediction view of consciousness.

From what I gather it is a minor additional benefit, nothing as dramatic as actually replacing an entire sensory system, as in the ferret experiment with vision.

Was scrounging the web for the article, which I read a while back, that correlates intelligence with cortical neuron number. Couldn’t find the article; it seems to have been scrubbed due to censorship, as it has potentially politically incorrect implications.

I wanted to provide a reference. I couldn’t find the exact article, but the following article makes reference to a similar one posted a few years back:

Scott Alexander, in a March 25, 2019 post on Slate Star Codex titled “Neurons and Intelligence: A Birdbrained Perspective”, argues that “the number of cortical neurons may be one of the most important biological substrates of intelligence.” Elephants have big brains, but the neurons in their brains are also very big. So, it seems, it is the number of cortical neurons, not the size of the brain, that matters more.

Comparing species, we find that elephants have about 7,000 neurons per mg of brain tissue. Humans have about 25,000. Birds have up to 200,000. A small crow has roughly the same number of neurons as a large monkey. Counting cortical neurons reveals that humans have 16 billion and elephants only 5.6 billion. Alexander concludes, “A list of animals by cortical neuron count really beautifully matches our intuitive perceptions of which animals are more intelligent, whether we’re talking about primates or birds or whatever. All else being equal, people with larger brain volumes tend to be smarter than people with smaller brains, suggesting that the neuron number/intelligence relationship holds true for us too.”

But every measure, even counting cortical neurons (which seems the best), produces some questionable result with respect to one species or another. For cortical neuron counters, the problem is pilot whales, which have about 37 billion cortical neurons - twice as many as humans. So are pilot whales really twice as smart as humans and we just don’t know it?
Comparing Species: Autonomy, Intelligence, and Morality (famous-trials.com)

As I mentioned previously, I’m not sure about pilot whales, but they also probably sleep with half their brain at any time, suggesting only half their cortical neurons are in use at any moment. We’d have to measure their intelligence to see if it is comparable to humans.

It seems to me that if there is specialisation by area (e.g. vision vs language vs reasoning), then it should happen mainly at the lower levels of processing and be less pronounced at higher levels, mainly because at higher levels the brain is handling concepts and invariant representations regardless of the input that underlies them. The extreme case of this would be the pre-processing that occurs in vision before the signal even reaches the brain along the optic nerve. The other big clue (imo) is the way the thalamus feeds input into and receives output from each cortical area. It seems really hard to argue that the thalamus is doing different things for different areas if it occupies a constant position in that loop. So I think there’s a lot to be said for the one-algorithm approach.

You could look into the mental effects of living with only one brain hemisphere. I read it’s surprising how little effect it has, but maybe that’s only when it is removed in infancy (it has happened, for brain tumors etc.).

2 Likes

I doubt dolphins have only one hemisphere awake all the time, since then only one eye would be open: Unihemispheric slow-wave sleep - Wikipedia