1000 Brains Theory Q&A

This topic was pulled out of another (Crazy quilting in the cortex). At this point I’m trying to focus this discussion on the 1000 Brains Theory (TBT). The post below is a sort of introduction to cortical columns, and the TBT says they are all performing object recognition at all levels of the hierarchy.

I have reached the end of this investigation. A few things in my initial notes and preliminary findings were incorrect.

  • The structure and topology of major sensory field projections can change with learning
  • The patterns formed by “minor input property projections” and “low-level input patterns” are not so different; both are the expression / extraction of some feature at a location

I thought initially that striping patterns found in cortex were expressing some property of how cortical RFs responded to certain input patterns in sensory input. I was hoping that understanding these patterns would tell me something about cortical columns.

I now think it is more likely these artifacts are expressions of the structure of sensory input to the cortical sheet, not the lowest level responses to the input within the cortical sheet. I still like to think about all these response properties as “echoes” of sensory input into a vacant cortex.

These patterns seem to be a way biology has found to spread topological information across a substrate without dominating local spaces. For example it allows both eyes to project their topology across the same area of striate cortex, allowing it to have a more complete local picture. In somatic cortex, it allows different types of nerve input on the skin to express themselves across the somatic cortex substrate.

Another conclusion is that V1 and S1 are probably not as dissimilar as I thought. The cortical columns shown in Demonstration of Discrete Place-Defined Columns in the Cat SI are the cortical columns we are looking for.

I also think you can see some evidence of potential columnar structures in V1 as shown by H&W:

This looks to be the best evidence for columnar structures in visual cortex, although it doesn’t say anything about receptive field qualities of the columns. While V1 certainly has a nice continuous global projection of the FOV across striate cortex, hints of these columns are found in these echoes of orientation coming from the retina (I believe the retina might be doing some very low-level shape recognition before cortex is involved).

It is still unclear whether the RFs of these columns are nicely grouped for all neurons in the column as clearly seen in somatic cortex, or if each cell’s RF is more independent and not overlapping its neighbors in the cortical column.

@bitking, with regards to level-skipping and your “well-shuffled” argument for these striping patterns, I’m no longer sure they apply. If these striping patterns are purely artifacts of sensory input, as I believe they are, they would not be playing a role in hierarchical understanding. As to the sensor fusion problem this idea theoretically solves, I don’t see how you need it at all if each column is doing object representation independently of other levels in the hierarchy (1000 brains). If you allow that each cortical column in the cortex is doing object recognition, you solve the sensor fusion problem.


How does this solve the fusion problem and the related segmentation problem?

Just to give you something to work with: you are walking down a tree lined path and hear several birds and insects - all mixed together.
You are scanning your eyes around, laying successive overlapping snapshots of the path and the trees around it on the same portions of V1. How are the thousand brains resolving the objects into trees and path and fusing some of that into a bird that is perched on one of the trees?
How is all this being combined into episodic experience in the very distant temporal lobe as fused objects?
If you can put this together it will help me fill in the parts of the TBT I am having trouble with.

I can see how to do this with hierarchical decomposing and binding at the association areas. I don’t see how to do this with TBT.

Following in the footsteps of Mountcastle, we assume that the same basic algorithm is being applied everywhere in the cortex. In basic HTM theory you have a collection of features that are sampled to form some opinion about what is being sensed. We can see this in play in the orientation columns in V1. Other features, such as disparity, are being teased out locally. The packing of these features into the same space is a necessary byproduct imposed by the sensor feeding that chunk of cortex.

So - what is the local algorithm that has been painstakingly teased out of S1/V1/A1? There seem to be several things all happening at the same time:

  • Feature aggregation (to columns)
  • Feature extraction (edges/orientation)
  • Coincidence detection (disparity in vision/audition)
  • Pattern matching to learned patterns (prediction/learning combined)
  • Building up to sequence learning and generation
  • Anti-coincidence detection (surprise!)
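Two of these operations - pattern matching against learned patterns, and anti-coincidence detection - can be sketched as a toy. The patterns, overlap measure, and threshold below are my own illustrative stand-ins, not HTM’s actual SDR machinery:

```python
# Toy sketch: match a binary input pattern against learned patterns by bit
# overlap, and flag "surprise" (anti-coincidence) when nothing matches well.
# Patterns, threshold, and overlap measure are illustrative stand-ins only.

LEARNED = [(1, 0, 1), (0, 1, 1)]

def overlap(a, b):
    # count positions where both patterns have an active bit
    return sum(x == y == 1 for x, y in zip(a, b))

def match(pattern, learned=LEARNED, threshold=2):
    best = max(learned, key=lambda p: overlap(p, pattern))
    return best if overlap(best, pattern) >= threshold else None  # None = surprise

print(match((1, 0, 1)))  # (1, 0, 1) -> recognized
print(match((0, 0, 1)))  # None -> surprise
```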

Is this sufficient as we move from map to map?

Combinations of extracted features act as second-order discrimination (and higher orders as you move up the hierarchy) to learn more about the sensed perception. The connections between maps are fixed, so a feature has permanence. Place coding is implicit in this system; a feature or combination of features with some fixed spatial relationship is built into the system. That is sufficient to support the concept that hierarchy is part of the analysis chain for feature extraction.

How universal is this cortical algorithm? There are visible differences from one area of the cortex to another. In other places in this forum I have pointed to papers that looked at the lateral connections, particularly in layer 2/3. From a functional point of view, the distance of these connections and their interactions with inhibitory cells can change the function from “analytical” (Gabor filters) in the early stages to “hex-grid coding” in the later stages. I don’t think it is unreasonable to assume that there are similar functional tunings in other parts of the cortex. In the areas tuned to hex-grids, the focus is likely to be the collection, juxtaposition, and communication of these fixed features.

The communication of place-coded features and hierarchical analysis is consistent with much of the literature that I am familiar with. Lesion studies support a localization of function. How to reconcile that with a universal algorithm? This makes sense if you allow that function follows connections - something hooked to sensors codes at the level sensed; as you move up the chain the features are more abstract.

Coming at this from the other way - why would a critter be analyzing the world in the first place? Why learn patterns and sequences? I would have to assume that the critter learns motor sequences and selects the ones that best match the detected patterns. Extracting more features would allow more nuanced discrimination.

At some point this discrimination of features has to be collected into a form that allows this discrimination to be performed - the features have to be collected in some useful way to decide if that is a lion or a housecat.

Localization in the cortex seems to place this function in the association regions. Hex-grid coding has been observed at the hub level and this is about where I would expect the fully dissected feature stream to be collected back together to form this hex-grid coding.

I have a collection of papers that support every bit of this proposed stream in great detail and from multiple sources.

I cannot make the connection that this important fixed-feature stream is replaced by local units that will learn what an object is - from the point of view of that chunk of cortex.

If you are looking at this from a point of view that this is how to describe local feature learning then we just have a terminology issue.

If you assume that each column knows all about every possible sensed object, then I have no narrative to explain how they might work together to select actions. I have tried again and again, and I really don’t see a way for the TBT proposal to explain what is known about the localization of functions and the connections between them.


Something that jogged my brain was when I realized that if every cortical column has grid cell modules representing the space of the universe, each cortical column’s grid codes for space would be unique and non-comparable, therefore one column’s sensory-location representation is not only different based upon sensory input but also using an entirely different location space altogether.
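A toy way to see this (my own illustration; the module parameters and encoding scheme are invented for the sketch): give two columns independent grid-cell-like modules and encode the same physical location in each - the resulting codes come out incomparable.

```python
# Toy sketch: each "column" gets grid-cell-like modules with private spacing
# and phase. Encoding the same physical location x in two columns yields
# codes that cannot be compared across columns. All parameters are invented
# for illustration.
import random

def make_column(n_modules=3, seed=None):
    rng = random.Random(seed)
    # each module: (spacing, phase offset), drawn independently per column
    return [(rng.uniform(2.0, 6.0), rng.uniform(0.0, 1.0)) for _ in range(n_modules)]

def encode(column, x):
    # location code = phase of x within each module
    return tuple(round((x / spacing + phase) % 1.0, 3) for spacing, phase in column)

col_a, col_b = make_column(seed=1), make_column(seed=2)
print(encode(col_a, 10.0))  # one column's private code for location 10
print(encode(col_b, 10.0))  # a different, incomparable code for the same location
```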

So there might be columns in auditory cortex emitting “birdsong” at some fuzzy direction (not sure auditory cortex is very good at locating) and columns in V1 emitting “bird” in a localized part of the FOV. The associative areas have the burden of putting these ideas together using their unique representations of the universe, learning that they are associated over time.

Thinking about object recognition and representation (the what pathway), it really confused me that columns had unique location spaces, because I thought it would be super useful to be able to compare objects between columns using locations and sensory features. But these location spaces are essentially private to each column, used to compare against previous input and come up with an object representation.

Each column outputs the best unique object representation it can come up with. Sometimes these outputs represent distinctly identified objects, and sometimes (when the object is ambiguous) they contain a bag of features instead.

Neighboring columns receive each other’s output and learn over time which objects shared by their neighbors are associated with their own object representations. This helps them decide which object to resolve to, learning to trust their neighbors over time.

I think of one column at one moment being able to represent only one feature on one object. I’m not sure how this merges with yours below, but let’s see. :slight_smile:

Yes, some columns will get some features of an object while others will get totally different features of the same object. They must somehow communicate laterally in TBT to vote on what the object is.

One could argue that these low-level features are already extracted by sensory organs. There is still argument about what the “lowest-level feature” extracted by cortex could be. I think the orientation lines seen in V1 are already processed by the retina and presented to the cortex. I like to think of the lowest-level object sensed as a line. But that’s just me.

I feel like HTM brings a lot to the table here.


Please look at “filling in” and illusory contours.
This is illustrated by the figure on the cover of David Marr’s book “Vision.”
A good Google search for “filling in vision” will return a vast amount of work done in this area.
This is also what happens at the “blind spot” in your visual field; check out the illusions that come up with this search. Much effort has been expended measuring and describing this phenomenon.

This has been localized to both cortical and subcortical structures and is intimately tied to the visual scanning process and the interaction between different frequencies (spatial scales?) of visual filtering.

This is also the basis for my thought in fitting this to the “traditional” view of the cortical hierarchy; it’s what brought me here.


I certainly agree that feature extraction is happening, I’m not arguing that.

We all used to think that.

Sure it’s implicit, but it’s not shared between columns. Each column has its own representation of space. The associative levels have their own representation of space created by their raw sensory input (level skipping), and combine it with what they get from lower levels.

There is a visual hierarchy, and there is a somatic hierarchy. They also merge into a larger more complete hierarchy that combines them. These associative areas that combine these sensory modalities should operate in basically the same way as the associative areas of each sensory hierarchy. It works together because all columns involved are modeling space independently. Each version of space doesn’t have to be communicated up and down the hierarchy.

When you say “fixed feature stream” I’m not sure what you mean. Are you referring to a modeled location in space where a feature exists, as represented at some core level by grid codes? If so, I’ll just argue again that there is no global location space and each column has its own grid space.

I am talking about local feature learning, but I’m not sure we have a terminology issue.

I don’t. For example, cortical columns receiving input from your feet will bring nothing to the table when you try to recall a cigarette (unless you developed a habit of smoking with your toes). But your olfactory columns, somatic columns (if you’ve held and smoked one before) and your visual columns will all contribute to that representation. There will also be experiential columns in associative areas that will respond to the idea of cigarettes with half-baked episodic memories of cigarettes if you let your mind wander.

We’re not saying there is no hierarchy, we’re saying there is lots of connectivity that’s outside of hierarchy. This lateral connectivity is used for localized sharing of object representations between neighboring columns, and also occurs in associative areas, and potentially across sensory boundaries. We know this type of thing can happen with synesthetic brains. The hierarchy still works on top of all that. But each column is doing object representation (or at least tries to).

I am totally onboard with lateral connections and voting - it is the heart of my post RE columns to Hex-grids.
I posit that the qualities of the grid formed (Spacing/phasing/angle) are the outcome of the voting on pattern matching between columns.

You cover a lot of topics in this post. I will try to answer several of them. Let me start with what we mean by the “sensor fusion problem.” We humans have a unified perception of the world. This is derived from a large number of inputs from different sensory modalities. When the inputs arrive in the neocortex they are expanded. The million fibers from the retina become tens of millions of fibers leaving V1. V2 is even bigger. The concept of an “image” is immediately lost, as the representations in V1 and V2 are highly distorted and pieces are missing. The inputs to the brain are also constantly changing, yet we are not aware of most of these changes. The sensor fusion problem is how do these disparate, distorted, and changing features get fused into a single stable perception.

If we think about touch the problem seems harder. There is no “image” on your skin, just patches of sensations that are sensing different parts of objects over time, yet you still perceive a stable and complete object as you touch it. How are somatic inputs fused into a single and stable percept?

Sensor fusion also has to work across modalities. Often, as in your example, we perceive an object based on partial input from one sense and partial input from another sense, neither of which might be sufficient.

The standard belief is that sensory regions such as V1 and V2 are extracting features and that somehow these features are combined in higher regions of the cortex to create a single and stable perception. You say you can see how this happens in association areas, but no one has any idea how real neurons in the brain do this. You might argue that artificial neural networks work this way, but that is a red herring. ANNs require over one hundred hierarchical levels, assume each level is uniform (convolution) and have no concept of integration and stability over time. I am not aware of any ANN solution for touch.

The thousand brains theory (TBT) proposes a distributed sensor fusion solution. Instead of there being one model of each object in the world (presumably in some region high up the hierarchy that combines all sensors), there are many models. Each model is based on whatever inputs it is receiving. The “fusion” occurs via long-range connections in the neocortex. The vast majority of observed long-range connections are not hierarchical and can’t be explained in the hierarchical feature extraction paradigm. In the TBT, models are in essence “voting” on object identity. Each model may have uncertainty, but together they reach the correct inference. For example, say your visual models can’t tell if the animal they are seeing in the bushes is a cat or a dog, and your auditory models can’t tell if what they are hearing is a cat or a bird. Through the long-range connections they will all quickly settle on “cat” as the object. We showed simulations, the model, and code in our 2017 columns paper.
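The cat/dog/bird example can be sketched as a toy consensus loop (my own simplification of the voting idea, not the neural network model from the 2017 columns paper): each column keeps a set of candidate objects and repeatedly narrows it to candidates that some other column also supports.

```python
# Toy sketch of lateral voting: each column holds a set of candidate objects
# consistent with its own input, and repeatedly narrows that set to candidates
# supported by at least one other column. A simplification of the voting idea,
# not the neural model from the 2017 columns paper.

def vote(candidates):
    """candidates: list of sets, one per column; returns the converged sets."""
    changed = True
    while changed:
        changed = False
        for i, c in enumerate(candidates):
            others = set().union(*(candidates[:i] + candidates[i + 1:]))
            narrowed = c & others or c  # keep c if narrowing would empty it
            if narrowed != c:
                candidates[i] = narrowed
                changed = True
    return candidates

print(vote([{"cat", "dog"}, {"cat", "bird"}]))  # [{'cat'}, {'cat'}]
```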

This is an ongoing debate. There are numerous papers on it. I was at a conference a month ago where this was a constantly discussed topic for 2.5 days. I would say that the vast majority of neuroscientists feel that there is a common algorithm throughout the neocortex and that the variations that are observed are just that, variations on a theme. There is a huge amount of empirical data supporting this idea.

The TBT does not propose each column learns models of every possible object. We explicitly state this in the frameworks paper. What we say is that there are many models of each object. For any object there might be multiple models based on visual input, multiple models based on somatic input, and multiple models based on auditory input. Each model is different because it is based on different inputs. But if different columns in the neocortex are modeling the same object, then long range connections between them make sense.

The TBT also does not propose there is no hierarchy in the neocortex. We state this in the frameworks paper too. The big difference is that in the old way of thinking, features are passed up the hierarchy. The TBT states that complete objects are passed up the hierarchy if possible.


Thank you for the thoughtful answer.

I see that we agree on several points and perhaps some of the quibbles are due to how we frame concepts.
On the vision issue, almost nobody seems to pay any attention to the fact that, as the eye darts around, wildly different images are being fed into V1. I do, and have been thinking about how the resulting palimpsest must be parsed to make any sense of it. I certainly do NOT think of some pretty image forming as the data is parsed.

Your parting line begs for some further detailing - how is the object passed up the hierarchy?

I have been thinking about this exact thing for a long time and suggest a line for you to consider.

This may be totally alien to how you are thinking about what happens in the association regions but I see this as a stew of extracted features with as many interpretations of the sensory stream extracted as the maps are able to provide, mixed together with the rough local topology preserved.

I see that the next step is my proposed columns to Hex-grids mechanism extracting the underlying pattern with a voting mechanism. This ends up looking like the same grid pattern that has been observed in these areas.

Thinking about this, the visual palimpsest is a juxtaposition of parts of each fixation overlaid on the next as a sequence - a neural 20-questions game that results in image identification. HTM is all about sequences, and each local area is doing its own sequence matching with the tiny patch it can see. Of all the possible images that could be recognized, parallel recognition and voting with neighboring columns would converge on some learned pattern. With the voting mechanism this results in a stable high-level pattern that represents the object being recognized - a representation distributed across the cortical extent.

Since this patch of recognition is accompanied by the thalamus running in tonic mode, the patch of tonic-mode thalamus can signal to the rest of the subcortical structures that whatever pattern has formed on the cortical sheet is a learned/recognized object.

I assert that the association regions form a stable hex-grid pattern that stands for the recognized object with the spatially aligned tonic mode thalamus signalling a “data available here” flag. The phase/scaling/rotation of the hex-grid are the unique signature of the object in this part of the cortex.

There is much more to this but this is the central idea of what I see happening in the association regions.


The rigor and passion with which you all and Jeff are pursuing your goals is awesome – and resonates deeply. To “bookend” things:

Over the past decade, I’d seen up-close several laudable – but premature – efforts to plumb the brain and cognition, from Blue Brain to neuromorphic computing. Simply self-reflecting on the capacity and configurability of the brain, it was non-expertly intuitive that these, in their initial versions, wouldn’t get us there. But, folks had to start someplace.

Conversely, upon seeing Graybiel’s detailed brain anatomy, a progenitor to a data-centric “CPU + GPUs” struck me – but sat less than half-baked for years. For clarity, I’m not a core subject-matter expert. If I can be of any use, it’d be as an “outlier case study”.

With that:

My analogue to Jeff’s perception of a cup are the three images perceived when viewing stereoscopic video using only divergence and parallax. Once the fused image is formed, tilting the head accentuates the two 2D images – which are clearly cognitive, vs retinal – on either side. Further, I can “focus” my visual attention on details of any of the three images, noting what happens in the fused image when detail from one of the 2D images is blocked.

While (my) thinking is principally visual, aural semantic cognition is what delineates humans. I’d long simply assumed that the cochlea was a set of bandpass filters - but it may be far more nuanced. With frequency as one differentiator, coherence length and adaptive lowest-layer matched-filtering may be others, providing a set of partly statistically-separable inputs. Further, I had wondered: as “differential aural parallax” underlies echolocation, could “reinforcing aural parallax” sharpen speech recognition? My simplistic non-expert reason for thinking this: since the incoming raw data rate for sound is so much less than that for images, repurposing image processing for aural processing would allow for much longer sampling periods and more hidden layers. Multi-tap delay lines, which blossomed technically with digital signal processing capabilities, allow for far finer-grained processing than causal signal processors.
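To make the multi-tap delay line concrete: it is essentially an FIR filter, whose output is a weighted sum of delayed copies of the input. A minimal sketch (the tap delays and weights below are arbitrary illustrations, not a model of the cochlea):

```python
# Toy multi-tap delay line: the output at each sample is a weighted sum of
# delayed copies of the input (i.e., an FIR filter). Tap delays and weights
# here are arbitrary illustrations.

def delay_line(signal, taps):
    """taps: list of (delay_in_samples, weight) pairs."""
    out = [0.0] * len(signal)
    for n in range(len(signal)):
        for delay, weight in taps:
            if n - delay >= 0:
                out[n] += weight * signal[n - delay]
    return out

# an impulse through a direct path plus a half-strength echo two samples later
print(delay_line([1.0, 0.0, 0.0, 0.0], [(0, 1.0), (2, 0.5)]))  # [1.0, 0.0, 0.5, 0.0]
```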

But let me circle back. This made me wonder: where is the imaging counterpart to the cochlea? If it’s embedded in the retina, spitballing what would likely be information-theoretically efficient lowest-layer image neuroprocessing, and speculating that there are likely analogues between what goes on in the retina and the cortex, may be of some use.

Again, my deepest respect for you all.
