Evolution of the neocortex

I already acknowledged there’s a lot more going on. The connections between regions with the same characteristics as the sensory input, those ones only go up the hierarchy. If you don’t think that fact is useful for understanding cortical circuitry, that’s fine, but it’s a fact as far as I know.

That seems like a stretch. HTM is meant to be consistent with biology but not simulate all facts about it, just what’s needed to create AI. I don’t see a problem with biomimicry being done like that.

I did not imply that HTM should emulate all properties of the brain. Correlations like sparsity provide critical insights. If significant aspects of the biology are going to be ignored then it needs to be done with great caution. The distribution of spiking is one area I would like to hear more justifications for ignoring.

Now you really have me worried :wink: From the BAMI Temporal Memory Algorithm Steps “As mentioned in the terminology section above, HTM cells can be in one of three states. If a cell is active due to feed-forward input we just use the term ‘active’. If the cell is active due to lateral connections to other nearby cells we say it is in the ‘predictive state’ (Fig. 3).” I read this to mean that the predictive states are output (i.e. the cell is active). Can you please clarify this?

That sentence is worded badly. They have reused the word “active” which is also the name of a status. What this should say is that if a cell is receiving activity due to lateral connections, it is in the predictive state.


Well in the traditional view, I think it is simply a result of the input to the higher area. It is possible cortex is doing more complex stuff, but if we take the hypothesis it is doing simpler stuff, the mechanism behind the recognition of more complex stuff can be speculated upon.

For example, in a lower area called V1, some have hypothesized that the responses of simple cells are what one would expect from pooling the center-surround signals coming from the retina. And the responses of complex cells in V1 seemed explainable by pooling the simple cells’ responses.

There was a good book by David Hubel that used to be available for free online at the Harvard vision website, but it no longer seems to be available. It covered some of the basic research done in this area.

I believe it is this book

The basic idea would be that, if we assume a universal algorithm, something similar happens in higher areas with the input from lower areas: just as in V1, where input from earlier stages appears to be grouped or pooled together, you simply pool or group the edge detection signals to create a more complex response in the higher areas. Doing this through multiple levels would then yield responses to ever more complex objects the higher up you went.
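
To make the pooling idea concrete, here is a minimal sketch. It assumes max-pooling over fixed, non-overlapping groups, which is my simplification and not something the traditional view specifies; the function names `pool` and `hierarchy` are made up for illustration.

```python
def pool(responses, group_size):
    """Max-pool a flat list of responses into non-overlapping groups.

    Assumption: a higher-level unit responds if any unit in its group responds.
    """
    return [max(responses[i:i + group_size])
            for i in range(0, len(responses), group_size)]


def hierarchy(responses, group_size, levels):
    """Apply the same pooling step repeatedly, one pass per hierarchy level."""
    layers = [responses]
    for _ in range(levels):
        responses = pool(responses, group_size)
        layers.append(responses)
    return layers


# Eight low-level "edge" responses, pooled in pairs over three levels.
layers = hierarchy([0, 1, 0, 0, 0, 0, 1, 0], group_size=2, levels=3)
for level, layer in enumerate(layers):
    print(level, layer)
```

Each unit at a higher level covers a wider stretch of the original input, which is the sense in which responses get more position-tolerant and more complex as you go up.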

That said a lot of the minicolumn is said to not be active at the same time, at least when trying to probe with simple stimuli, so it could potentially be doing other things.

There was a paper that said the first visual areas tend to connect to 1 or 2 higher areas, but some cells could connect with up to 7. There was also a neuroscience book, whose name I forget, that claimed each level of the hierarchy from all the sensory areas had connections to the corresponding level of the motor cortex hierarchy, and that motor cortex also projected back to the sensory areas at each level of the hierarchy. I don’t know if there’s any truth to such claims.

Well in the book by Hubel, it said that complex cells respond to an edge at various positions across their receptive field. Some complex cells were also selective to motion of the edge within these positions in a particular direction. There were also cells called hypercomplex cells, iirc, that were also called end-stopped, and responded less if the edge got too long.

edit: Googling found this description which is part of what was used in Hubel’s book

The output of a cell is active/inactive and the state of a cell is active/inactive/predictive. The mapping of state to output is the issue.

From BAMI: “The cells activated by connections within the layer constitute a prediction of what is likely to happen next.” The description of Figure 3 in BAMI also implies that predicted states are visible to the next layer.

The pseudo code section states “The resulting set of active cells is the representation of the input in the context of prior input.” This and the pseudo code suggest to me that you are right. The predictions are not available as outputs.
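
A minimal sketch of the state-to-output mapping as I now read it. It assumes only cells in the active state are passed on; `CellState`, `cell_output` and `layer_output` are names I made up, not BAMI or Numenta code.

```python
from enum import Enum


class CellState(Enum):
    INACTIVE = "inactive"
    PREDICTIVE = "predictive"
    ACTIVE = "active"


def cell_output(state):
    """Map a three-valued state to a two-valued output.

    Assumption: predictive cells are depolarized but emit nothing.
    """
    return state is CellState.ACTIVE


def layer_output(states):
    """Indices of the cells visible to the next layer."""
    return [i for i, s in enumerate(states) if cell_output(s)]


states = [CellState.ACTIVE, CellState.PREDICTIVE,
          CellState.INACTIVE, CellState.ACTIVE]
print(layer_output(states))
```

Under this reading the predictive cell at index 1 never shows up in the output, which is exactly the part I find disappointing.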

This is very disappointing! We have a prediction but it does not allow the rest of the system to use the prediction.

I’ll quote from On Intelligence “Prediction means that the neurons involved in sensing your door become active in advance of them actually receiving sensory input.”

So this is not just a minor change. We have gone from a predictive framework in On Intelligence to a reactive framework in TBT. Disaster!

Perhaps I now understand why they don’t work on behavior. From a predictive coding perspective you generate behavior from predictions - it is the output of predictions from higher levels that ripple “down” toward muscle control etc and allow things like controlling fine motor movement.

I guess I am glad you have ruined my day :wink:

I’ve brought this up with Jeff briefly on another thread. The way he suggested it might work (and the route that I am exploring in my own tests) is that when a motor prediction receives both apical and distal input, it activates. I have been referring to this algorithm as Temporal Unfolding.
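
As a very rough sketch of that activation rule: this assumes cells are just integer ids and that "receives input" is all-or-nothing. The function name `unfold` is hypothetical and this is not my actual test code.

```python
def unfold(distal_predicted, apical_predicted):
    """Return the motor cells that activate.

    Assumption: a cell activates only when it is depolarized by both
    distal and apical input at the same time.
    """
    return set(distal_predicted) & set(apical_predicted)


# Cells 2 and 3 receive both kinds of input, so only they activate.
print(sorted(unfold({1, 2, 3}, {2, 3, 4})))
```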

I’m planning to post another thread about this with my current work, but there are also some older threads on here where we have talked about it, such as here. Incidentally, I also did a project with my son a while back called Reverse Simon (or as he called it, “Simon Press Button HTM Edition!”) where we explored a (far simpler) version of unfolding predictions into actions.

You may come across references to a project on the forum here we called “Dad’s song”, where we (myself, Bitking, gmirey, sebjwallace, and a few others) were exploring this. That particular project was a bit overly ambitious at the time, but I do feel like I have progressed quite a bit since then, so hopefully it will be achievable in the future. It is still a good goal to work toward IMO.


To add to what Paul Lamb said, there’s a third neuron state besides predicted (depolarized but not enough to fire) and firing a spike. It’s called a burst, not to be confused with minicolumn bursting in the temporal memory. When a neuron bursts, it rapidly fires two or more spikes.

Bursting starts with a normal single spike caused by proximal input. If bursting weren’t a thing, the cell would just fire that one spike. If the cell also receives apical input, it can fire one or more additional spikes rapidly, bursting.

Many pyramidal cells can burst. Bursts are mostly studied in layer 5 thick tufted cells, where they’re especially prominent. Other cells burst less and usually just with two spikes, whereas L5 TT cells often burst with more than two spikes. Those are the cells which send motor commands from cortex to subcortical things and the spinal cord.
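
Here is a toy sketch of that logic, treating inputs as booleans. That is a big simplification of real dendritic integration, and `spike_count` and `burst_spikes` are illustrative names with an assumed default of two spikes.

```python
def spike_count(proximal, apical, burst_spikes=2):
    """Number of spikes fired for a given combination of inputs.

    Assumptions: proximal input is required for any spike at all, and
    apical input only upgrades an existing spike into a burst.
    """
    if not proximal:
        return 0  # apical input alone does not make the cell fire
    return burst_spikes if apical else 1


for proximal in (False, True):
    for apical in (False, True):
        print(proximal, apical, spike_count(proximal, apical))
```

The row with apical input but no proximal input gives zero spikes, which is why bursting doesn't count as predictive firing.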

Bursting requires proximal input, so it’s not predictive firing, so it probably doesn’t cause behavior.

It could still cause behavior directly responding to sensory stimuli. Bursting is also involved in synaptic plasticity, so it could be involved in learning from results of behavior generated by other cells and other parts of the brain.

There’s evidence it’s related to behavior, coordinate transformations, and/or perceptual detection (as opposed to a sensory stimulus which goes unnoticed). I’ll try to figure that out and maybe make a summary of some papers.


From the latest revision of Principles of Neural Science. The structural differences are interesting. I’ve heard of rewiring one area of the sensory processing to another (e.g. vision to auditory). It would be interesting to know if there have been successful or failed experiments to rewire from sensory cortex to association cortex (for example).


From Principles of Neural Design (2015):

“We suggest that sensory cortex is structured similarly across modalities because it performs similar computations”

“cortical areas beyond V2 should invest in circuits that rapidly identify what matters for survival and reproduction”

“Again the engineer’s rule applies: specialize. Each need requires a particular computation”

An interesting reference is http://dx.doi.org/10.1016/j.cortex.2008.04.004 from 2008 on disconnection syndromes of basal ganglia, thalamus, and cerebrocerebellar systems. A more recent paper that references the 2008 paper is http://dx.doi.org/10.1016/j.neubiorev.2021.01.014, “Flexible and specific contributions of thalamic subdivisions to human cognition”. From the abstract: “A set of brain regions was flexibly involved with thalamus in several cognitive domains. Thalamic subdivisions showed ample cognitive heterogeneity. Our proposed model represents thalamic involvement in cognition as an “ensemble” of functional subdivisions with common cell properties embedded in separate cortical circuits rather than a homogeneous functional unit.”


I think even if it was possible in principle to rewire association cortex with sensory signals, there’d be some issues. First, if arealization (the formation of the distinct areas with the wiring between areas) is not a self-organizing phenomenon but a more predetermined, fixed one, you will have issues, as higher areas tend to be smaller than sensory areas (and there’s the issue of the connections between areas). But I agree it’d be interesting to know if the brain can rewire and resize areas if sensory input is fed into an association area.

The second issue is that, while I’ve heard that neural density (the number of neurons per mm of cortex) and neural connectivity tend to be more uniform across rodent brains, primate brains appear to have some distinct specializations. Neural density varies across the cortex of primates, and so too does the number of connections per neuron in a given area. Early sensory areas in primates have higher neuron density than in rodents (but with fewer connections per neuron), but iirc higher areas in primates have a lower density of neurons, and these fewer neurons are bigger and have a larger number of connections. IIRC, neural density varies something like 5x between areas of cortex, and I suspect the number of connections per neuron likely varies to a similar but perhaps lower degree.

There is also a variation in the thickness of cortex at different areas even in the same animal.


This presentation provides some strong arguments against Mountcastle’s generalisation of the macrocolumnar structure. I think it would be valuable for anyone interested in the TBT to at least be aware of this work: Paul Cisek on 'phylogenetic refinement': using evolutionary thinking to study behavior - YouTube. The link starts at around 1h16m into the video, where Yohan John starts his presentation following the presentation by Paul Cisek.


I always try to be careful to define “cortical uniformity” in a way that isn’t trivially disproved by the existence of agranular cortex. I say that “cortical uniformity” is the statement that the whole neocortex is running the same learning-and-inference algorithm, but (1) there are different “hyperparameters” on the algorithm that vary across the cortex (and with age), and (2) there’s an innate “gross wiring diagram” of region-to-region connections (between different parts of the cortex and each other and with other parts of the brain), loosely analogous to a neural network architecture, which then get edited at a fine scale by within-lifetime learning.

I feel like that talk made me more apt to believe that version of cortical uniformity, not less, because of his emphasis on continuity, e.g. between granular and agranular cortex. Like, if I have a computer chip, and one part of it is a random access memory, and another part of it is a CPU core, I absolutely would not expect there to be a smooth and continuous gradation between those two things, and everywhere along that gradation the chip is doing useful things. That just wouldn’t make any sense, they’re different algorithms, you can’t interpolate between them.

However, those kinds of continuous gradations are a perfect fit for “hyperparameter variation”. And, I mean, it makes sense to me that the size of layer 4 should vary continuously with how complex and high-dimensional the space of feedforward inputs to that part of cortex is, and then it also makes sense to me that motor output cortex gets either no feedforward input or like a really simple 1D signal, so it has little or no layer 4. And it makes sense that V1 gets a quite complicated space of feedforward inputs from LGN, so it has a big layer 4.

(I’m only halfway through the talk and responding to what he said near the start. Maybe he makes other arguments later on?)


Hi Steve,

It’s unclear to me if you are presenting your understanding of TBT or whether you are proposing another theory. The idea of a repeating macrocolumn that consists of repeating minicolumns seems central to Numenta’s work. The basic idea is that the macrocolumn repeats, and several arguments are presented to justify this in Jeff’s book. The presentation I linked to is proposing something quite different.

The idea of hyperparameters is very abstract. It seems like it could be used to describe any neural network. Are you claiming there is a single minicolumn algorithm that is parameterized? If yes, is it something like the current HTM algorithm?

Can the hypothesis of hyperparameters be disproved or will it adapt to whatever knowledge we gain?


I do suspect predictions, or at least some types of predictions, must result in an active state. My take is that the brain works in a way akin to what are called GANs: I believe nearby neurons in the same area and neurons from higher areas send predictive signals, with precise timing, to a neuron in a column, and these signals compete at the synapse level with the synapses carrying incoming sensory signals. If the signals predict successfully, their timing will coincide with the incoming sensory signal, and both the predictive synapses (discriminator or training or fake input) and the sensory synapses (real input) will strengthen. But if there’s a mismatch in timing, i.e. in prediction, the synapses could very well weaken, as the predictive input will be less likely to result in activation of the neuron, especially if the sensory input was covered in noise or only a partial match.

That way, over time, by a survival-of-the-fittest mechanism of synaptic competition, the neurons that can accurately predict or fake the signal to a neuron will become more strongly wired to it, and those that do not predict or fail to accurately fake real signals will become disconnected. Over time, the connection of neurons that can accurately predict activity in different contexts results in an increased ability to overcome large quantities of noise in the signal, as a pattern can be detected via the completions from predictions.
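
A toy version of the competition I have in mind. It assumes a fixed coincidence window and a fixed step size with weights clipped to [0, 1]; all names and numbers here are made up for illustration and not taken from any model in the literature.

```python
def update_weight(weight, pred_time, sensory_time, window=5.0, rate=0.1):
    """Strengthen a predictive synapse on a timing match, weaken it otherwise.

    Assumptions: times are in arbitrary units, a "match" means arriving
    within `window` of the sensory signal, and weights stay in [0, 1].
    """
    coincident = abs(pred_time - sensory_time) <= window
    delta = rate if coincident else -rate
    return min(1.0, max(0.0, weight + delta))


good, bad = 0.5, 0.5
for _ in range(10):
    good = update_weight(good, pred_time=100.0, sensory_time=102.0)  # in window
    bad = update_weight(bad, pred_time=100.0, sensory_time=140.0)    # mismatch
print(good, bad)
```

After enough trials the accurate predictor saturates at full strength and the inaccurate one drops to zero, i.e. effectively disconnects.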

Thanks! Jeff wrote recently “common cortical algorithm doesn’t mean there are no variations … the issue is how much is common in all cortical regions, and how much is different. The evidence suggests that there is a huge amount of commonality.” My impression is that this is what Jeff has always believed, and if the TBT book suggests otherwise I would assume it’s just poor choice of words.

My impression of TBT was that they were imagining something like current HTM plus some missing ingredients (involving grid cells) that they’re still working out the details of. Someone can correct me if I’m wrong.

Yes the thing I’m talking about is “single minicolumn algorithm that is parameterized”.

Current HTM definitely has loads of adjustable parameters (what I’m calling “hyperparameters”), like how many coincident firings you need until you form a synapse, and the number of neurons in a pooling layer, etc. etc. (or something like that, I forget the details of HTM).

I guess there’s bound to be some gray area between “hyperparameter variation” and “totally different algorithms”, but there are also things that are clearly one or the other. Like merge-sort vs a database query compiler are definitely “totally different algorithms”; and a ConvNet with learning rate 1e-3 vs a ConvNet with learning rate 1e-4 is definitely “hyperparameter variation”. Again there’s probably a gray area between those but I can’t think of any examples off the top of my head. :slight_smile:
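
To illustrate what I mean by “hyperparameter variation”, here is a small sketch where two “regions” run the identical function and differ only in one setting. The parameter name `coincidences_to_form_synapse` is hypothetical and not an actual HTM parameter.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hyperparams:
    # Hypothetical setting: coincident firings needed before a synapse forms.
    coincidences_to_form_synapse: int


def synapses_formed(coincidence_counts, params):
    """Same rule in every region; only `params` differs between regions."""
    return [count >= params.coincidences_to_form_synapse
            for count in coincidence_counts]


counts = [1, 3, 5]
eager_region = Hyperparams(coincidences_to_form_synapse=2)
cautious_region = Hyperparams(coincidences_to_form_synapse=4)
print(synapses_formed(counts, eager_region))
print(synapses_formed(counts, cautious_region))
```

The two regions give different results on the same input, but nobody would call them different algorithms, since you can slide the threshold smoothly from one to the other.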


I agree that “single minicolumn algorithm that is parameterized” probably fits with the TBT story. The idea of areas with more layers being more recent is contrary to the idea of a common structure (minicolumn/macrocolumn) being replicated. So I don’t think Yohan’s work should be taken as supportive of Jeff’s interpretation - he does mention Jeff’s work as an example of a different approach.


At 1:23:32, Mac Shine from the University of Sydney asks an interesting question:

“Are there, in those missing or weak layers 4 in agranular and dysgranular types, no direct projections from the thalamus?” (paraphrasing), to which presenter Yohan J John replies that there are a few, and they are matrix projections to the bottom of layer 3 and top of layer 5.

(Side question: What are matrix projections vs core projections?)

Couldn’t this be considered as an invisible layer 4 where the functionality (input from thalamus) is less pronounced but still present?

As a metaphor, in a corner shop the shop owner receives the deliveries while he/she works the counter and does the accounting after hours. There is no big procurement nor accounting department like in the Walmart organisation, but the store still needs to balance its books and buy goods.

From a brief search, this distinguishes

  • core projections that are “topographically precise and have readily identifiable physiological properties”
  • matrix projections “project to superficial layers of the cerebral cortex over relatively wide areas, unconstrained by architectonic boundaries. They generally receive subcortical inputs that lack the topographic order and physiological precision of the principal sensory pathways.”

Stolen from Viewpoint: the core and matrix of thalamic organization - PubMed


For more information regarding the thalamus, I recommend the communications of Prof. Sherman: