How does HTM recognize thousands of objects of the real world?

As we all know, the brain can recognize thousands of real-world objects. I just read the latest papers about grid cells and the “Thousand Brains Theory of Intelligence”. As I understand it, a cortical column in HTM theory can only learn hundreds of objects, and adding more columns only reduces the number of sensations needed for recognition; it does not let the system represent more objects. The Thousand Brains Theory also says that the same model of an object is stored in thousands of cortical columns. That would mean a “brain” built from HTM cortical columns can only learn a few hundred object representations, which is far too few. So how is this problem solved?

My take on the theory is that each of the cortical columns generalizes on nuanced versions of the objects it most frequently encounters – different granularities, different modalities, different perspectives, different aspects flavoring these objects. Taken together they can describe a much larger range of higher-level objects than any one cortical column by itself could describe. I also believe nobody is throwing out classical hierarchical structures, just that there is also massive parallelism happening in the system.


I think you are looking at this wrong.

A given letter may only be one of a few hundred glyphs. Combine these into groupings and you can represent a much larger set of objects.

If you think of the columns as letters in this example, the higher levels would correspond to word and sentence groupings.
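To put a rough number on the letter analogy, here is a minimal sketch that counts how many ordered groupings a small alphabet allows. The function name `count_groupings`, the 26-glyph alphabet, and the maximum length of 5 are my own illustrative assumptions; nothing here comes from HTM code.

```python
def count_groupings(num_glyphs: int, max_length: int) -> int:
    """Count ordered sequences of 1..max_length glyphs from an alphabet.

    Illustrative only: real words are a tiny subset of these sequences,
    but the point is how fast the space grows with grouping.
    """
    return sum(num_glyphs ** length for length in range(1, max_length + 1))


if __name__ == "__main__":
    # Assumed numbers: 26 glyphs, groupings of up to 5 glyphs.
    print(count_groupings(26, 5))
```

Even a few dozen distinct low-level items give millions of possible groupings, which is the sense in which a small per-unit vocabulary does not cap what the whole system can represent.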

"The primary benefit of multiple columns is to dramatically reduce the number of sensations needed to recognize objects."
-- Hawkins, Jeff, S. Ahmad, and Y. Cui. “A Theory of How Columns in the Neocortex Enable Learning the Structure of the World.” Frontiers in Neural Circuits 11 (2017).

So, as you illustrate, features and objects are represented hierarchically: some columns represent low-level features, and other columns represent high-level features whose inputs are derived from the “low-level” columns. This doesn’t seem to match the HTM column model. As the figure below shows, each column has only location and sensory input, and the connections between columns are long-range connections within the “output layer”, i.e. there are no hierarchical connections between columns. So how can low-level and high-level features be represented?

Figure 1. Cortical column model

“We propose that cortical columns are more powerful than currently believed. Every cortical column learns models of complete objects.”
“Just because each region learns complete models of objects does not preclude hierarchical flow. The main idea is that the neocortex has hundreds, likely thousands, of models of each object in the world. The integration of observed features does not just occur at the top of the hierarchy, it occurs in every column at all levels of the hierarchy. We call this ‘The Thousand Brains Theory of Intelligence.’”
-- Jeff Hawkins et al. “A Framework for Intelligence and Cortical Function Based on Grid Cells in the Neocortex” (2018).

In the Thousand Brains Theory of Intelligence, each column holds a model of the entire world, which the current HTM cortical column model does not support. This is what I doubt.

I had the same general issues. JH addressed some of them here:

I see that Numenta is describing a transformation of feature space to an object space as part of lower-level processing. Not complete objects - but the part of the object that can be sensed locally.

The difference is very subtle but it emphasizes a different way of thinking about what the columns are doing with the top-down information stream.

For me personally - it’s still features but they are framed and parsed by the top-down flow.

Or I still have no clue - it could go either way!

Firstly, I can’t speak for Numenta, but I do not take the quotes you have highlighted to imply that they believe there is no classical hierarchy present in the cortex. On the contrary, there are many examples from recent talks where Jeff and others have called out specifically that they are not saying classical hierarchies do not exist in the brain, but that many of the connections observed in the cortex do not fit neatly into this classical model. These anomalous connections make sense when taken from the perspective of a mechanism for voting.

Second, you are assuming that all three of the columns in your example are representing exactly the same semantics of the object. In reality, this would never be the case. You might be able to argue similarities between different fingers, for example, but this same system is modelling objects from vision, audition, taste, emotional state, needs. All of these models are voting on what is being sensed. A simple “cup” can become something far more complex when combined with being tired, the sense of security from the warmth, the social aspects of chatting with colleagues over coffee, etc.


This resolves exactly what was puzzling me. Thank you very much!

I have another question: how does the neocortex represent hierarchical features? As Browne said:

Does this mean that some columns’ inputs are derived from other columns’ outputs?

I suppose you could start here:

Awesome. To me, this is a really exciting development in the theory. One thought experiment I did recently was to imagine a box containing a cup, which has a hole in the side that you can reach your hand into to feel the object without seeing it. Imagine there is also a straw in the top of the box that you can look through, which is only big enough to see what color the object is, but not its shape.

Now imagine you are looking through the straw and reaching your hand into the box to sense the object. You recognize the object as a red cup.

But your fingers have no concept of “red”, so their model is very different, and they are only able to vote on the semantics about the cup’s shape, temperature, hardness, etc. Your eye looking through the straw was only able to vote on the semantics about its color. By themselves, they could only represent the concepts of “something red” and “a cup”. Together, they can represent something more complex – “a red cup”.

Now think about if you are performing this exercise at the office where you have knowledge about a particular cup that is red. Other cortical columns which are modeling the semantics of the office are also voting on what is being sensed. Now together, they can all represent an even more complex object – “my red cup from the office”.
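Here is a toy sketch of that thought experiment, treating "voting" as nothing more than intersecting the candidate sets proposed by each group of columns. This is a deliberate simplification and an assumption on my part: the object list, the attribute names, and the helpers `candidates` and `vote` are all made up for illustration and are not Numenta's implementation.

```python
# Hypothetical objects the system already knows, with made-up attributes.
KNOWN_OBJECTS = {
    "my red cup from the office": {"shape": "cup", "color": "red", "place": "office"},
    "red cup at home": {"shape": "cup", "color": "red", "place": "home"},
    "blue cup from the office": {"shape": "cup", "color": "blue", "place": "office"},
    "red ball from the office": {"shape": "ball", "color": "red", "place": "office"},
}


def candidates(feature: str, value: str) -> set:
    """Objects consistent with what one group of columns can sense."""
    return {name for name, attrs in KNOWN_OBJECTS.items() if attrs[feature] == value}


def vote(candidate_sets: list) -> set:
    """Keep only the objects that every voter considers possible."""
    return set.intersection(*candidate_sets)


if __name__ == "__main__":
    touch = candidates("shape", "cup")       # fingers in the box
    vision = candidates("color", "red")      # eye looking through the straw
    context = candidates("place", "office")  # columns modeling the office

    print(vote([touch, vision]))             # still ambiguous: two red cups
    print(vote([touch, vision, context]))    # narrowed to a single object
```

No single voter can name the object on its own. Each one only rules out what is inconsistent with its own input, and the combined result is more specific than any individual model.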


This is a common misconception - we need to be much clearer about this in our text. Each column has complete models of objects. However, each column does not have a model of every object in the world. Also, every column does not store the same set of models.

It’s useful to think through an example. A human brain might contain 100,000 cortical columns. Each column might represent, say, 500 objects. With these numbers the total capacity upper bound is 50 million objects if every column were to represent a completely distinct set of objects. However since there is significant sharing of objects between columns, the actual number stored will be smaller. This exact number depends on how much sharing is going on but the system can easily store tens of thousands to hundreds of thousands of objects.
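The arithmetic above can be sketched in a few lines. The column count and objects per column are the numbers from this post; the `columns_per_object` values (how many columns share a model of the same object) are purely assumed for illustration, as is the simplification that sharing is uniform.

```python
def capacity_upper_bound(num_columns: int, objects_per_column: int) -> int:
    """Total capacity if every column stored a completely distinct set of objects."""
    return num_columns * objects_per_column


def distinct_objects(num_columns: int, objects_per_column: int,
                     columns_per_object: int) -> int:
    """Distinct objects stored if each object is modeled by columns_per_object columns.

    Assumes uniform sharing, which is a simplification.
    """
    return capacity_upper_bound(num_columns, objects_per_column) // columns_per_object


if __name__ == "__main__":
    columns, per_column = 100_000, 500
    print(capacity_upper_bound(columns, per_column))        # 50000000
    # Assumed sharing levels, not figures from the papers:
    for shared_by in (100, 1_000):
        print(shared_by, distinct_objects(columns, per_column, shared_by))
```

With each object modeled by 1,000 columns the system still holds 50,000 distinct objects, and with 100 columns per object it holds 500,000, which is consistent with the "tens of thousands to hundreds of thousands" range.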

Essentially the theory says the brain contains a massively distributed representation of objects in the world using many different modalities. It is not a monolithic object recognition system where objects are only recognized “at the top”. @Paul_Lamb phrased this really well too in an earlier reply.


My understanding now is that the model of a single real-world object is stored in thousands of columns in our brains, and that the same object is modeled through different perceptual processes such as vision, audition, etc.

@Paul_Lamb and @Bitking have made the concepts of “level” and “hierarchy” clear, as below:

But in our neocortex, connections like those in the picture above do not exist explicitly; there are just columns with the same circuit structure. When we recognize a cat, there are many levels of features: 1) edge and texture features, 2) basic geometric shapes, 3) feet, body, head, eyes, etc., 4) cat. So how does our cortex represent these features hierarchically?

You are thinking of objects as hierarchies of other objects, which they are. But that does not mean you must have a hierarchy to understand them. You need only composition.

A cat has features like fur, whiskers, a tail, etc. But these things are also other objects themselves, and we have representations for them alone. You can think of fur right now, and you immediately know how it feels on your face, your hands, how it might smell, what color it might be. It is an object that can be represented by each cortical column that has had some feedforward sensory experience with fur.

Or you can think of objects as being represented as a map of features, each of which could be another object, which contains another map of features, etc. In programming, you don’t need hierarchy to process this structure, you simply need composition.
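As a rough programming illustration of that idea, here is a minimal sketch in which every object lives in one flat store and its feature map simply names other objects in the same store. The `ObjectModel` class, the location labels, and the cat parts are hypothetical and chosen only to show composition without dedicated hierarchy levels.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectModel:
    name: str
    # Maps a location on this object to the name of the object found there.
    features: dict = field(default_factory=dict)


# One flat store: "cat" and "fur" are peers, neither sits on a special level.
MODELS = {
    "fur": ObjectModel("fur"),
    "whiskers": ObjectModel("whiskers"),
    "tail": ObjectModel("tail", {"surface": "fur"}),
    "head": ObjectModel("head", {"surface": "fur", "snout": "whiskers"}),
    "cat": ObjectModel("cat", {"front": "head", "back": "tail", "body": "fur"}),
}


def expand(name: str, models: dict, depth: int = 0) -> list:
    """List an object and, recursively, the objects it is composed of."""
    lines = ["  " * depth + name]
    for part in models[name].features.values():
        lines.extend(expand(part, models, depth + 1))
    return lines


if __name__ == "__main__":
    print("\n".join(expand("cat", MODELS)))
```

The nesting only appears when you follow the references. The store itself is flat, and "fur" can be looked up and used on its own just as easily as "cat".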

got it. thanks, matt

This seems related to maps of objects in higher parts of the hierarchy. If I’m correct that those exist, mentioning those maps might help explain this.

@Bitking maybe that can make hex grids and other map-related things you’ve talked about compatible with HTM theory.

Funny that you should ask that: