Is Thousand Brains Theory wrong?

I’ve been reading through your various materials and your Thousand Brains Theory, but so far it seems to be inconsistent with neurophysiological observations. Your model of sensorimotor learning and control in particular seems way off. Perhaps I’m misunderstanding something, so feel free to correct me.

  1. You claim that every cortical column builds its own complete model of the world. So how do you explain the fact that the Fusiform Face Area only processes faces? If you damage that area, you get prosopagnosia and are unable to recognise faces. The same goes for object recognition and place recognition. It’s absolutely not true that every cortical column learns something about “cups” and “teapots”, as you often frame it.

  2. You claim that every cortical column sees some small fragment of the input. The “low level columns” see small and fine patches, whereas “higher level columns” see large and coarse patches. You are opposing the hierarchical view that subsequent levels recognise increasingly complex features. But actually the simple cells in V1 only respond to spots of light, whereas complex cells respond to lines. Subsequent regions of the ventral visual pathway respond to increasingly complex shapes, and finally gnostic cells in the temporal lobe recognise very specific people, places and categories of objects. There clearly is a hierarchy, and the subsequent regions clearly do work with features.

  3. If you were indeed correct that every cortical column builds its own model of the world, then how is it possible that I can see something with my left eye and then recognise it with my right one without needing to re-learn it? Or, even better, I can see the texture of some object and then recognise it with my hand. If the cortical columns worked with primitive features (instead of building allocentric representations of objects, as you frame it), then everything becomes much easier. The hippocampus could then use conjunctions of all the features at all the different locations. In fact, there are already hundreds of models of the hippocampus that do just that.

  4. If I understand correctly, you claim that the Thousand Brains Theory is a kind of universal theory of the neocortex. Which part of the neocortex exactly are you talking about? The temporal and parietal lobes do very different things. For example, how does your theory account for the split between the dorsal and ventral visual pathways? There are gain-modulated neurons in the temporal lobe, which perform translation from retinotopic to egocentric coordinates and from egocentric to allocentric. How does your model of the neocortex account for this? You are claiming that every cortical column has access to features and also to locations (“A Theory of How Columns in the Neocortex Enable Learning the Structure of the World”, Frontiers in Neural Circuits). This idea looks really far-fetched. It is the role of the entorhinal cortex to generate representations of locations, and then the hippocampus binds features and locations together. I find it rather hard to believe that the Thousand Brains Theory explains anything more than maybe a small patch of entorhinal cortex. And even then, I’ve seen much better and more believable theories explaining that part of the brain as well.


1. The Thousand Brains Theory (TBT) does not say every column learns something about cups or any other thing. We made this clear in our papers and in my book. The TBT says there are many models of cups, in multiple sensory regions. But the number of columns that model cups is small compared to the number of columns overall. I have a section in my recent book on this.

1b. The FFA is an interesting case. We believe, and the evidence suggests, that all columns, including FFA columns, build models using the same basic mechanisms. So is the FFA different? Our best guess is that the models in FFA columns are pre-wired for modeling faces. If you think about a model as a graph of features at relative poses, then FFA models come with part of the graph pre-built.
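To make the “graph with part pre-built” idea concrete, here is a minimal Python sketch. It is only an illustration, not code from the TBT papers: the names `ObjectModel` and `prewired_face_model` and the feature labels are all hypothetical, and a pose is simplified to a 3D location in the object’s own reference frame, with no orientation.

```python
from dataclasses import dataclass, field
from typing import Dict, Tuple

# Assumption: a "pose" is reduced to a location in the object's own frame.
Location = Tuple[int, int, int]


@dataclass
class ObjectModel:
    """Toy object model: features stored at object-relative locations."""
    name: str
    features: Dict[Location, str] = field(default_factory=dict)

    def learn(self, location: Location, feature: str) -> None:
        """Add (or overwrite) the feature sensed at this location."""
        self.features[location] = feature

    def match_fraction(self, observations: Dict[Location, str]) -> float:
        """Fraction of observed (location, feature) pairs the model agrees with."""
        if not observations:
            return 0.0
        hits = sum(
            1 for loc, feat in observations.items()
            if self.features.get(loc) == feat
        )
        return hits / len(observations)


def prewired_face_model() -> ObjectModel:
    """Hypothetical innate scaffold: part of the graph exists before learning."""
    return ObjectModel(
        name="face",
        features={(-1, 1, 0): "eye", (1, 1, 0): "eye", (0, -1, 0): "mouth"},
    )


# A generic column starts empty; a face-area column starts from the scaffold.
blank = ObjectModel(name="unknown")
face = prewired_face_model()
face.learn((0, 0, 1), "nose")  # experience fills in the rest of the graph

seen = {(-1, 1, 0): "eye", (0, -1, 0): "mouth"}
print(blank.match_fraction(seen), face.match_fraction(seen))
```

In this sketch both columns use the same `learn` and `match_fraction` mechanism; the only difference is the starting content of the graph.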

2. The TBT is also hierarchical. The difference is that each level in the hierarchy combines features into 3D structure before sending it to the next level. The point we make about regions such as V2 is that V2 is well known to also get direct retinal input, so we need to explain that.

3. You are espousing the conventional view of a final model of something existing at one place. People who have thought about this reach the conclusion that this is not possible. The binding problem is one manifestation of this. Voting between columns is the key to understanding how different parts of the cortex communicate with each other. The lateral connections from L3 to L3, L2 to L2, and L5 to L5 are far more numerous than the hierarchical connections, and many of them travel long distances across modalities and across hemispheres. These connections let the disparate columns vote.
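As a rough illustration of what voting between columns could look like, here is a small Python sketch. It is a deliberate simplification and not the mechanism from the papers: each column is reduced to the set of objects consistent with its own input, and the hypothetical function `vote` just keeps the candidates supported by the most columns.

```python
from collections import Counter
from typing import Iterable, Set


def vote(column_candidates: Iterable[Set[str]]) -> Set[str]:
    """Return the candidate objects supported by the largest number of columns."""
    tally: Counter = Counter()
    for candidates in column_candidates:
        # Each column casts at most one vote per object it considers possible.
        tally.update(set(candidates))
    if not tally:
        return set()
    top = max(tally.values())
    return {obj for obj, count in tally.items() if count == top}


# Three columns (say, one per eye and one on a fingertip), each ambiguous alone.
columns = [
    {"cup", "bowl"},        # left-eye column
    {"cup", "can"},         # right-eye column
    {"cup", "bowl", "can"}, # touch column
]
print(vote(columns))  # {'cup'}
```

No single column holds the final answer here; the consensus exists only across columns, which is the point of the voting argument.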

4. Please read our papers carefully. We discussed the role of the ventral and dorsal pathways. I go into it at length in my book and don’t want to repeat it here. The TBT predicts that location cells, aka grid cells, will be present in every cortical column, not just the EC. Researchers have now discovered grid cells in prefrontal cortex and now also in primary visual and primary somatosensory cortex. We stand by our prediction that they will be found everywhere.

The basic idea of the TBT is that the cortex uses the same mechanisms as the entorhinal cortex and hippocampus. Please read our papers and my book carefully before incorrectly stating what the theory says and doesn’t say.


My conclusion is that the “Thousand Brains Theory” is just a neural manifestation of Dennett’s “Multiple Drafts Model” from philosophy. They both support and complement each other. Frankly, I would like to see more comparisons between TBT/HTM and Pan troglodytes brains, since chimps don’t have language mucking everything up. Chimps are also more accessible for experimentation.

Neuroscience studies subjective phenomena under microscopes.

Hierarchical feature recognition assumes the brain recognizes the image on the retina. TBT is based on neuroscience evidence to the contrary. It helps clarify what the brain even does in the first place.

Access to self-movement info isn’t an issue. Coordinate transformations may well be all over the place too. They can be hard to identify.

Even whisker S1 does a coordinate transformation or two. One maps a space onto the cortical sheet in L2/3/5. A possible second one in L5tt might represent the sensor’s location in a similar space upon touch, which is more in line with TBT.

Maybe the FFA just develops a face-space reference frame. The brain needs to develop many different reference frames. For example, maybe the subcortex makes the FFA attend to faces.

Thank you for your reply!

Your last remark

“We stand by our prediction that they will be found everywhere”

seemed very risky to me. The greatest problem with it is that absence of evidence is not evidence of absence. Hence, it seems like the TBT is empirically irrefutable. The fact that there are some cells that code positions in prefrontal cortex is not very surprising, because those areas are always active during planning (including navigation). I deeply doubt that any grid cells would emerge elsewhere. Have you perhaps made any other predictions that could be experimentally tested and used to refute the TBT? If you can’t design such an experiment, then that’s a rather poor theory, isn’t it?

That’s true, but he also mentioned grid cells in V1 and S1. Those are pretty surprising to me.

Take those seriously. Locations are very much a thing in sensory cortex. I dunno whether you read what I wrote earlier, so here are sources.
[1] Surround Integration Organizes a Spatial Map during Active Sensation
[2] Independent representations of self-motion and object location in barrel cortex output