Column models: Why can my toe recognize a coffee cup?

It might not be local among neighboring neurons. For example, a sound’s pattern of frequencies could span the whole map. I have no idea how to even represent a continuous shape.

Continuous shape is basically a graph, so GNNs might be doing that already.

But the brain reacts and perceives in small fractions of a second. Only a discrete number of neurons can produce a discrete number of action potentials within such a time, each individual neuron producing just a few action potentials. Yet from these discrete signals, the areas involved in language, as well as the hands, can be used to describe a continuous shape and translate it to a continuous drawing or sculpture.

edit: Reading again you mention that continuous shape is basically a graph. Is there mathematics connecting continuity to discrete graphs?

There’s a whole research field dedicated to studying the relationships between discrete and continuous geometry representations.

See for example: Discrete Differential Geometry

Keenan Crane provides a very good introduction to the field (video intro to his online course).


Ok, but how can this connectivity clustering be done by neurons, without long delays?

I’ve not completed my research into DDG yet, but there was this one thing that jumped out at me early on. When introducing topological data analysis, Dr. Crane showed a demo where he discusses the concept of persistent homology. He starts with a cloud of points that we can obviously see forms three discrete letters. He then asks the question, How do we get a computer to see the letters? Or to at least detect them as discrete features as opposed to just a random sampling of points.

He does this by growing small neighborhoods (topological balls) around each point and then connecting points with overlapping neighborhoods into simplices. By examining the time history of the creation and destruction of these simplices (i.e. their persistence), you can distinguish between local features vs. more global features. In another example, he discusses recognizing a finger on a hand by examining the persistence signature of a neighborhood expanding out from a point on the finger.
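To make the neighborhood-growing idea concrete, here is a minimal sketch of the simplest case only: tracking connected components (0-dimensional persistence) as balls grow around each point. It is not Dr. Crane's demo code; the function names `zero_dim_persistence` and `components_at` and the sample points are made up for illustration, and higher-dimensional simplices (loops, voids) are left out entirely.

```python
import math
from itertools import combinations


def zero_dim_persistence(points):
    """Radii at which connected components merge as balls grow around points.

    Every point starts as its own component (born at radius 0). Two balls of
    radius r overlap once r reaches half the distance between their centers,
    so processing pairs in order of distance replays the growth over time.
    """
    n = len(points)
    parent = list(range(n))  # union-find forest over point indices

    def find(i):
        # Follow parent links to the component's root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    edges = sorted(
        (math.dist(points[i], points[j]), i, j)
        for i, j in combinations(range(n), 2)
    )
    deaths = []
    for dist, i, j in edges:
        root_i, root_j = find(i), find(j)
        if root_i != root_j:
            parent[root_i] = root_j   # two components merge: one of them "dies"
            deaths.append(dist / 2)   # radius at which the two balls touch
    return deaths


def components_at(points, radius):
    """Number of components still alive when every ball has the given radius."""
    deaths = zero_dim_persistence(points)
    return len(points) - sum(1 for d in deaths if d <= radius)


if __name__ == "__main__":
    # Hypothetical point cloud: three tight clumps that are far apart.
    cloud = [
        (0.0, 0.0), (0.5, 0.0), (0.0, 0.5),
        (10.0, 0.0), (10.5, 0.0), (10.0, 0.5),
        (0.0, 10.0), (0.5, 10.0), (0.0, 10.5),
    ]
    print(zero_dim_persistence(cloud))
    print(components_at(cloud, 1.0))
```

Short-lived components (small death radii) correspond to local sampling noise, while the components that survive over a wide range of radii are the "letters" in this toy reading of the demo.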

As the demos were running I couldn’t help but think about the center-on, center-off signals detected by the retina and transmitted to the brain via the retinal ganglion cells. I could imagine similar patterns forming in V1 due to overlapping of their little discrete neighborhoods. If the local perturbations are ignored while retaining the locations where there are abrupt changes in features (low-pass filter), then this could represent a more abstract version of a convolutional kernel.

The process of growing the neighborhood around the point could be the result of local inhibition and/or voting across columns. Another possibility might be to utilize the temporal signatures of each column while saccading (essentially growing a 1D neighborhood along the direction of the saccade). I can certainly imagine either scenario being plausible for the examples discussed above.


Wow, so interesting. We have this ability to automatically split “perception” into “things” and failed to understand how it works. This could be an essential part.

I’m going to try to expand on this. It’s interesting how we can take a discrete label, e.g. cup, and convert it into something continuous.

I usually equate object recognition to labelling objects. Sure, I can’t see a generic cup, but the whole point is to create awareness of the object’s identity, right?
Well, it’s actually possible to see the generic object, not just be aware. I see generic words like they’re painted on my retina. By default, I don’t see the shapes of the letters. I couldn’t guess what this font’s “q” looks like. That’s so different from what it’s like to see a cup.
I don’t see a generic cup, but I also don’t see 2d images from the retinas. It’s a 3d object. I can see a 2d image, but that takes mental effort. For example, I had to look at a cup for a while to nail down the 2d ellipse I was looking at.

So maybe seeing shapes is just as important as labelling objects. It’s not trivial to determine continuous shapes in reference frames besides the sensor. We can even convert continuous shapes between senses. For example, I can crumple up a paper towel without seeing it and imagine the visual object. Also, I know it’s the same object regardless of its crumpled shape, which suggests some sort of intuitive understanding of shape distinct from label.

You can recognize a coffee cup with your elbow, or your nose for that matter. All of your somatosensory behavior maps into a 3D internal representation of the world that you began to construct as a baby. When you scan an object with your elbow/toe/whatever it maps into that representation of your world and links to the object as it exists in memory.


I distinctly remember reading of a psychology (or maybe psychophysics) experiment from many years ago where the subject was asked to identify unseen objects using only a rod to touch the object. The test subjects were able to identify most of the objects with surprising accuracy.

This ability to recognize objects through many different senses or even without direct sensory contact speaks to the object recognition task being 1) heavily distributed over many different modalities, 2) strongly coupled to movement, and 3) closely related to our ability to form mental models of space and spatial relationships.


Yes, but TBT says that models of objects are stored in cortical columns (which get input from small sensory patches). The cortical columns for my toe do not have a model of a coffee cup. At a minimum I believe that this means there also must be cup models at higher levels of my cortex.


I think this is quite well understood in Grossberg’s work; Neurala has an implementation based on that.

That is quite interesting, but I would say that holding a rod and then probing with it constitutes tool use and we extend our proprioceptive senses with tools. What you might find more provocative is the notion of astereognosis.

…an individual with astereognosis is unable to identify what is placed in their hand based on cues such as texture, size, spatial properties, and temperature. (Wiki)

This suggests that there are neural correlates (cerebro-structural) for tactile recognition, making it a function of macro architecture rather than micro (HTM).

Rats and mice kind of do that with whiskers, so it might not be because of tool use. Although, they have a map of whiskers and all that.

Even more interesting, I had not thought of whiskered animals. My point on tool use is that humans can extend their senses via a tool, whereas mice (et al.) have it built in.

It’s not just humans. Crows and many primates use tools and need a mental model of the ‘other end’ of a tool. Nest and bower builders place twigs and other objects, which must require spatial sense. Animals that fly, run, pounce or throw things all require spatial sense. Just grooming takes spatial sense: where do I scratch that itch?

I distrust introspection; it can never be objective. Figure it out in animals first, I say.

Sorry to join this conversation so late. I have a comment on bhayes84’s original question. This is a deep problem that has bothered me for many years, even before the TBT. We asked how you can sign your name with your toe. More recently, we asked how can I learn the shape of a coffee cup with one finger and then recognize the coffee cup with another finger, or even with a finger on the opposite hand. The problem also spans modalities. E.g. I can learn five new objects with touch and then distinguish them visually, even though I have never seen them.

The answer occurred to me just a few weeks ago. I described the answer last week in one of our research meetings. We record these and post them on YouTube. I don’t know if it is up there yet. We usually get them up within a few days. You will want to watch this video.

The TBT relies on “voting” between columns. The voting is implemented by long range connections across the cortex. We have described the voting as a way for columns to reach agreement on what object they are sensing. However, there are multiple long range connections. L2, L3, and L5 all have neurons that connect to their equivalent layers across the cortex. I believe they are voting and broadcasting not just what object they are sensing, but also the orientation and location of the object/sensed feature relative to the body. These long range connections are what you are consciously aware of. In this way, when one part of the cortex is actually perceiving something, then it is available to everyone else. When you touch something in a black box, you can visualize it at the same time. Similarly, when you see something, such as the handle of a coffee cup, you can imagine what it will feel like and where it is relative to your hand. This broadcast information (feature/object, location and orientation relative to body) also is what we remember in episodic memory. This also explains how language areas of the brain can verbalize what we are sensing or thinking, and how incoming language can paint pictures in our head. I have come to believe that these long range connections are the main method of communication in the cortex.
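As a rough illustration of the voting idea, here is a toy sketch in which each column broadcasts a set of hypotheses and the consensus is whatever has the most support. This is not Numenta's implementation: representing a hypothesis as an (object, body-relative location) tuple, tallying votes with a simple counter, and the `vote` function itself are all assumptions made for the example.

```python
from collections import Counter


def vote(column_hypotheses):
    """Return the hypotheses supported by the largest number of columns.

    column_hypotheses: one collection per column, each holding hypotheses
    that column considers consistent with its own input. A hypothesis here
    is a hypothetical (object, location relative to body) tuple.
    """
    tally = Counter()
    for hypotheses in column_hypotheses:
        tally.update(set(hypotheses))  # each column votes once per hypothesis
    if not tally:
        return set()
    best = max(tally.values())
    return {h for h, count in tally.items() if count == best}


if __name__ == "__main__":
    # Made-up inputs: a finger column, a toe column and a visual column,
    # each ambiguous on its own.
    finger = {("cup", "near right hand"), ("mug", "near right hand")}
    toe = {("cup", "near right hand"), ("ball", "near left foot")}
    eye = {("cup", "near right hand")}
    print(vote([finger, toe, eye]))
```

In this toy version a column with no cup model of its own (the toe, say) could still end up with the consensus simply by receiving the broadcast, which is the point being made above about information becoming available to every other part of the cortex.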


Do you have any insight on the localization of functions noted in lesion studies and this long-distance voting mechanism you mention here?

Here’s the link to the research meeting from last week: Jeff Hawkins on Object Modeling in the Thousand Brains Theory (Part Two) - September 9, 2021 - YouTube

That was great. Do you have links to the two papers discussed (just lazy), thanks!