Max Bennett: An Attempt to Model The Neocortical Microcircuit in Sensory Neocortex - Sept 9, 2020

This week, we invited Max Bennett to discuss his recently published model of cortical columns, sequences with precise time scales, and working memory. His work builds on and extends our past work in several interesting directions. Max explains his unusual background, and then discusses the key elements of his paper.
Link to his paper:

This is a long article, and I think it would benefit greatly from focusing on one or a few aspects of the brain instead of proposing a grand theory.

I think it is a mistake to focus on delayed-input tasks. These tasks are challenging, and performing them involves many brain regions and many different mechanisms, which makes them poorly suited to studying an individual phenomenon in isolation.

This article proposes a hypothesis for how the neocortex learns viewpoint invariant representations (“unique sequence codes”). A projection from the hippocampus to cortical layer 5 will hold a set of layer 5 cells active throughout the duration of a sequence, and Hebbian plasticity will cause those active cells to associate with all of the elements in the sequence. There are several issues with this hypothesis:

  1. This does not actually solve the problem of viewpoint invariance, rather it moves the challenge from the cortex to the hippocampus. How does the hippocampus know when to hold the cortical activity constant versus allowing it to change?
  2. Sequence learning (viewpoint invariance) is a critical function of the cortex, and you’re proposing that it happens far away in a smaller brain region. This seems like an information bottleneck. There are many cortical regions, all operating independently and in parallel, but there is only one hippocampus.
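For concreteness, here is a minimal Python sketch of the mechanism I am objecting to, as I read it: a hippocampally driven set of layer 5 cells is held active while Hebbian plasticity binds them to every element of the sequence. All sizes, the learning rate, and the update rule are my own illustrative choices, not taken from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)

N_L5 = 200       # layer 5 cells in one column
N_INPUT = 500    # feedforward input cells
LEARN_RATE = 0.1

# A hippocampal projection holds a fixed sparse set of L5 cells active
episode_code = rng.choice(N_L5, size=10, replace=False)

# Hebbian weights from input cells onto L5 cells
W = np.zeros((N_L5, N_INPUT))

def present_sequence(elements):
    """Hold the episode cells active and Hebbian-associate them with
    every element (sparse input pattern) of the sequence."""
    for element in elements:
        for cell in episode_code:
            W[cell, element] += LEARN_RATE

# Three arbitrary sequence elements, each a sparse set of input bits
seq = [rng.choice(N_INPUT, size=20, replace=False) for _ in range(3)]
present_sequence(seq)

# After learning, any single element reactivates the same "sequence code"
overlap = W[:, seq[1]].sum(axis=1)
recalled = np.argsort(overlap)[-10:]
print(set(recalled) == set(episode_code))  # → True
```

Note that nothing in this sketch answers my question above: the choice of when `episode_code` stays fixed versus when it changes is made outside the column.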

I have a competing hypothesis for how the neocortex learns viewpoint invariant representations. I have described it in this video: Video Lecture of Kropff & Treves, 2008


I think that this is a major problem for the proposal, too.

Nevertheless, the hippocampus (HC) plays a major role in the learning process. You might be unable to build a practical system without it. In fact, I think that the BG/thalamus, the HC, and other subcortical structures are required to “digest” any complex sensory flow.

You can’t have a “practical” hierarchy without them. Perhaps this model is not the right approach, but you can’t ignore them. It’s an all-or-nothing system.


I don’t know if Max is watching the forum, but I would be more than happy to work with him on developing simulations to test out his ideas.

I’ve done a variety of simple code sketches to teach myself the basic SP/TM algorithms and to visualize the evolution of the resulting networks. I feel like I’m ready to start working on a more detailed C++ implementation, but I think I’ve been waiting for the right motivation and/or collaborator to come along. It sounds like we both have about the same amount of free time to spend on the project, so maybe it would be a good collaboration.
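For a sense of where I’d start, here is the kind of sketch I mean: just the core overlap-and-inhibit step of the Spatial Pooler in plain NumPy. The parameters are placeholders, and the real SP also has boosting, permanence learning, and topology.

```python
import numpy as np

rng = np.random.default_rng(42)

N_INPUT = 1000   # input bits
N_COLS = 256     # spatial pooler columns
K_ACTIVE = 10    # winners per step (~4% sparsity)

# Each column samples a random potential pool of the input space
potential = rng.random((N_COLS, N_INPUT)) < 0.05
permanence = rng.random((N_COLS, N_INPUT)) * potential

def spatial_pool(active_bits, threshold=0.5):
    """Overlap = number of connected synapses on active input bits;
    global inhibition keeps only the top-K columns."""
    connected = permanence >= threshold
    overlap = connected[:, active_bits].sum(axis=1)
    return np.argsort(overlap)[-K_ACTIVE:]

x = rng.choice(N_INPUT, size=40, replace=False)  # one sparse input
active = spatial_pool(x)
print(len(active))  # → 10
```

The C++ version would mostly be this plus learning rules and careful data structures for the sparse matrices.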

Max: feel free to reach out if you’re interested.


@dmac I’m confused by how you are drawing an equivalence between the process of “viewpoint invariance” and “sequence learning” - I don’t see these as the same. The idea of generating “unique sequence codes” wasn’t intended to explain viewpoint invariance, only how a column learns from a stream of sensory input and predicts what the next input is likely to be, without accounting for variations in viewpoint, orientation, scale, etc.

The proposal is that the hippocampus has to perform nothing more than replaying a single “episode code” to enable the neocortex to maintain, replay, and learn any arbitrary sequence. Hence sequence learning does not occur in the hippocampus, but the hippocampus is required for the neocortex to learn sequences (consistent with lesion studies). So I’m curious where you see an information bottleneck?

There are two possible ways (in the context of this model) that sequences/representations can be reset (what you are referring to as holding cortical activity constant versus allowing it to change). The first is triggered by a failed prediction, or “surprise”: multiareal matrix cells in the thalamus reset representations in the cortical columns that generated the failed predictions. The connectivity of the thalamus is consistent with the idea that matrix neurons fire only in the presence of failed predictions (surprise), and recording studies are also consistent with this. The second is that the hippocampus can learn transitions in these episode codes - playing out sequences of episode/place codes in CA1 is a well-documented phenomenon - so if the hippocampus learns to shift episode codes given cues from its cortical input, then representations can shift in the neocortex. Note that this is not the same as saying the hippocampus performs all sequence learning.
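To caricature the first (surprise-driven) reset pathway in code - the token names and the learned-transition table below are purely illustrative, not part of the model’s machinery:

```python
def run_column(stream, transitions):
    """Hold a representation steady; on a failed prediction ("surprise"),
    a matrix-cell-style reset swaps in a new representation."""
    rep, resets, predicted = "rep-0", 0, None
    for token in stream:
        if predicted is not None and token != predicted:
            resets += 1
            rep = f"rep-{resets}"           # thalamic reset: new representation
        predicted = transitions.get(token)  # what the column expects next
    return rep, resets

# Learned transitions A->B->C; the stream violates them once (X instead of C)
transitions = {"A": "B", "B": "C"}
rep, resets = run_column(["A", "B", "X", "A", "B", "C"], transitions)
print(resets)  # → 1
```

The point is only that the reset signal is computed locally from prediction failure; the hippocampus is not consulted.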

None of this solves viewpoint invariance, which I agree is an essential part of processing in the neocortex. I briefly suggested a few ways this may happen, but it wasn’t a core focus.


Sorry for the delay @CollinsEM! I just realized this forum even exists haha. Yeah would be happy to chat! Shoot me an email at :slight_smile:


Welcome to the forum, @Max_Bennett.

I was very impressed by your presentation. Fascinating ideas. I was also happy to see you in other Numenta research meetings. I hope you’ll keep working together for a long and successful cooperation.


Thanks @Falco! Appreciate it :slight_smile: I am very excited and humbled to be included.


Hello Max Bennett,

I took another look at your article and I think I understand it better now. I don’t think I really understood the parts about the hippocampus when I first read it.

These “episode codes” are a very interesting way to think about the hippocampus and what purpose it serves! It seems like the episode code represents the entire state of the world of an animal, as a small identifier.

  • It uniquely identifies the current time, location, and emotional state.
  • It does not contain the specific information about the current state; that information is distributed throughout the neocortex. The episode codes can be used to access the associated information in the neocortex.
  • The hippocampus, which generates the episode codes, is similar to the cortex and can probably manipulate episode codes in many of the same ways that the neocortex manipulates information.
  • You’re right, there is no information bottleneck.
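A toy way to picture the pointer idea (all sizes and the one-shot Hebbian store are my own illustrative choices): a small sparse episode code retrieves a much larger distributed pattern from a hetero-associative weight matrix.

```python
import numpy as np

rng = np.random.default_rng(7)

CODE_BITS = 64    # size of a hippocampal "episode code"
CTX_BITS = 1024   # size of the distributed "neocortical" content
W = np.zeros((CTX_BITS, CODE_BITS))  # hetero-associative store

def sparse_code(n, k):
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = 1.0
    return v

def store(code, content):
    global W
    W += np.outer(content, code)     # one-shot Hebbian binding

def recall(code, k_content=40):
    act = W @ code                   # the small code cues the large pattern
    out = np.zeros(CTX_BITS)
    out[np.argsort(act)[-k_content:]] = 1.0
    return out

codes = [sparse_code(CODE_BITS, 8) for _ in range(3)]
contents = [sparse_code(CTX_BITS, 40) for _ in range(3)]
for c, m in zip(codes, contents):
    store(c, m)

hit = recall(codes[0])
print((hit * contents[0]).sum())     # → 40.0 (the full pattern comes back)
```

The code itself carries almost no information; it only needs to be unique enough to index the content, which is why I no longer see a bottleneck.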

@dmac Yes exactly! This idea also aligns nicely with the general hypothesis that the hippocampus operates as a “pointer” to memories in the neocortex - but takes it a step further to explain how the pointer mapping is learned and how it can be used for working memory/sequence learning.

Viewpoint invariance is still a huge missing piece though and a very important problem to solve - there are massive commercial/practical applications of solving it. State-of-the-art neural networks still substantially underperform humans in this regard. I’m excited to watch the lecture you linked above :slight_smile:


I just came across the “hippocampal memory indexing theory” which I think proposes a similar thing, but I haven’t yet read through the whole article.

Teyler and Rudy, 2007


So HC memory indexing theory was first proposed in 1986, then shown in 2007 to have ‘aged very well’. Now it resurfaces in 2020 here.

Where is it today? Still a theory? Aging gracefully?


Had to watch the video a couple of times, nice, like it.

The interesting perspective on this is the inherent human assumption that the input represents something we associate with directly, i.e. we like to feed in data as we see it from an external perspective, which I think inherently destroys the ability of the model to work properly even when it’s otherwise so close to biology.

When considering a delta-only approach, the case of a repeating note becomes interesting, as you may need to change the representation of what the input sense is, and this is where our external representation and understanding breaks down.

From all that I have seen, the cortex never sees anything directly in the form we “easily” comprehend; vision, for example, is abstracted into higher-level pattern deltas. Yes, the first “seen” frame is very rich, but after that we only comprehend changes.

Vision and the other senses are based on change detection rather than absolute levels, yet we seem to always feed models absolute values as time-stable representations held in perpetuity.

If the model were given delta input only, it would be quite interesting to see how it performed. The whole point of delta detection, to me, just looks like an additional form of sparsity, which we seem to have re-implemented back into a denser form within the models to cope with our external representations?

The delta sparsity also means, to me, that the initial representation (“relatively” dense initial sensory input) is far larger than the subsequent abstract delta micro-patterns (“relatively” sparse deltas) the models may see, whereas the current approach seems to assume a fixed association. Maybe this is just an artifact that appears when you start adding thousands of macrocolumns into the mix, and I’m jumping the gun.
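A quick sketch of the sparsity argument, with made-up sizes: XOR-ing consecutive binary frames keeps only the changed bits, so a dense first frame is followed by much sparser deltas.

```python
import numpy as np

rng = np.random.default_rng(3)

N = 1024
frame = rng.random(N) < 0.30      # a dense-ish "first seen" frame

def drift(f, n_flip=15):
    """A small inter-frame change: flip a few bits."""
    g = f.copy()
    idx = rng.choice(N, size=n_flip, replace=False)
    g[idx] = ~g[idx]
    return g

frames = [frame]
for _ in range(5):
    frames.append(drift(frames[-1]))

# Delta encoding: XOR of consecutive frames keeps only what changed
deltas = [a ^ b for a, b in zip(frames, frames[1:])]

print(frames[0].sum())               # a few hundred active bits
print(max(d.sum() for d in deltas))  # → 15: every delta is far sparser
```

This is the asymmetry I mean: the first frame and the steady-state deltas differ in density by an order of magnitude or more, while the model treats every timestep the same way.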

The particular thought here was about tone-pattern recognition (we remember the changes more than the absolute pitches), but the curious case is where motor plausibility enters the process. If a tune started at a very low pitch and progressed up the scale to a high but not impossible pitch, we could still deal with it as long as the starting note is not too high. It gets interesting when the starting note is at a very high pitch, so that the last note is obviously and purposefully blatantly well outside our motor and perceptual capabilities (a sense-checking loop?), and we then have trouble dealing with it without further abstracting the conceptual representation of the sound.

Just some thoughts to stir the pot…


From what I gather, HTM doesn’t compute deltas at the lowest level, per pixel or per scalar parameter. That could, and I think should, be done through lateral inhibition, something like the contrast computation in the retina. But HTM only uses lateral inhibition for spatial pooling, which produces a far coarser type of sparsity?
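Something like this, for a 1-D signal - the kernel is an arbitrary center-minus-surround choice, just to show that uniform regions cancel and only changes survive:

```python
import numpy as np

# A step edge in a 1-D scalar signal
signal = np.array([0.2] * 8 + [0.9] * 8)

# Center-minus-surround kernel: each unit is excited by its own input
# and inhibited by its neighbors
kernel = np.array([-0.5, 1.0, -0.5])
contrast = np.convolve(signal, kernel, mode="same")

print(np.round(contrast, 2))
# Flat regions cancel to ~0; strong responses appear only at the edge
# (and at the array boundaries, where the surround is missing)
```

That per-value contrast signal is much finer-grained than the column-level winner-take-all the Spatial Pooler applies.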