Column models: Why can my toe recognize a coffee cup?

In TBT, assume each cortical column linked to a fingertip learns a model of a coffee cup.
But even if I’m blindfolded, I can still recognize a coffee cup on the floor by exploring it with my toe (even though I’ve never touched a cup with my toe).
Does this conflict with TBT?


It seems like it does, but it is still worthwhile to explore the model. There is a lot going on in the brain across many systems, and it is challenging to tease apart. Even if there are exceptions, it is good to investigate what a model can do.

Interesting thought experiment, and I kind of agree that recognition cannot simply be a direct mapping from sensor to internal representation of an object.

I dare say there are a good number of ways in which you can interpret an object without ever having experienced it through that specific patch of sensors / receptive field. I could use my chin to tell a cup from a bowl, for example. Equally, I could use sound and other forces to guide my perception of the object.

So if sensor data is coming in and being mapped to an internal representation of the object, how do we take any source of input and map it correctly… I have no idea lol.


I have a thought that maybe, if a column recognizes an object, it inhibits the sensory input from spreading to other columns and to higher layers; instead, it sends a signal representing the recognized object.


We compare our sensations to the things the object could be. We feel heat capacity, temperature, and texture, and these can eliminate many possibilities. The context of being at home may also rule out many candidates and allow only a subset. When you move your toe, you can map the shape and size, which is consistent with only a subset. It could be that the sensory input needs to pass through two layers of cortical columns so that any sense can be used. Or the thalamus can map any sense. Or, by clocking, there are two passes? Too many possibilities; we need biological constraints to narrow them down. Can’t we just simulate and see??

Great observation. I think the processes are different.
First, in order to confirm your assumption, one has to make sure they could not know what object is on the floor by any means other than touching it with the toe.
Second, a hand would almost immediately have “known” what object it is touching and signaled to the conscious level: “hey, a cup is here!” The foot, however, would take a much slower, conscious investigation to reach the same conclusion. The object is inferred not directly by the toe touching it but indirectly from the spatial representation built using the toe’s sensory input.

A similar experiment is when someone “draws” letters on your back by tracing them with a finger and you try to guess what letter it was just from the sensations on your back.

Only after some training will the back sensations build a more direct, faster representation.


Whether in TBT or any other theory, I believe there is a fundamental difference between two types of “recognition”:

There is “instinctive” or “reflex / muscle memory” style recognition, which involves trained long-term memory only, where not much conscious reasoning is involved: you see a cup, and instantaneously you recognize it’s a cup.

There is “mental researching”, or “puzzle solving”, type recognition, which heavily involves short-term memory AND long-term memory (hence working memory), where the brain explores different mental hypotheses (building and eliminating various mental models) about what the object is while touching/sensing it… Your toe touches something… it feels hard with a smooth surface, it is light and movable… it could be a cup or a soap box (or whatever is known to be around in the current environment)… if it is a cup, there should be a handle… wait, I do touch a part that feels like a handle… it must be a cup then…

The second type involves a lot more neurons, circuitry, and conscious reasoning, I would guess.

[I am basically agreeing with cezar_t’s point of view… “coming to the conclusion of something” can take many different mental processes; we probably should not lump them all together and label them as one uniform “recognition” process.]
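That “puzzle solving” walkthrough (hard, smooth, light… handle… must be a cup) could be sketched as a toy candidate-elimination loop. The object and feature names below are made up for illustration; nothing here comes from TBT itself:

```python
# Toy sketch of "puzzle solving" recognition: each new sensation
# eliminates candidate objects whose known features contradict it.

KNOWN_OBJECTS = {
    "cup":      {"hard", "smooth", "light", "has_handle"},
    "soap_box": {"hard", "smooth", "light"},
    "towel":    {"soft", "light"},
}

def recognize(sensations):
    """Return the candidates still consistent with every sensation so far."""
    candidates = set(KNOWN_OBJECTS)
    for feature in sensations:
        # keep only objects consistent with everything felt so far
        candidates = {o for o in candidates if feature in KNOWN_OBJECTS[o]}
    return candidates

# "hard, smooth, light" is still ambiguous...
print(sorted(recognize(["hard", "smooth", "light"])))  # ['cup', 'soap_box']
# ...until the handle settles it
print(sorted(recognize(["hard", "smooth", "light", "has_handle"])))  # ['cup']
```

Each step shrinks the hypothesis set, which is roughly what the “building and eliminating various mental models” description amounts to.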


If I had to guess, I’d say that your toes are not directly recognizing the object. Rather, they are allowing you to build a model of the space around the mug. This spatial representation (or reference frame, if you like) is then probably recognized at a higher level. If the same object is encountered in the same way thereafter, some of that recognition would get pushed down into the lower levels.


I’d argue there is one type of recognition, but it’s a multi-stage process (maybe it’s just a semantic difference with what you’re saying): the first time we see something, or feel it with a toe, we have to concentrate to recognise/learn it. After enough attempts, the brain shuffles it down into subconscious recognition, where it happens automatically, as you say. The same process occurs with other activities (learning to walk, driving, and speaking a foreign language all start with deliberate attention and end up being automatic). But I think these processes all start with a conscious stage and progress to a subconscious one.

What’s really interesting (imo) is how that transition occurs. Does the thalamus play a role in conscious activity and then step out of the way as the various cortical columns involved build direct links to each other? It seems to me that the thalamus must play a central role given its position collecting outputs from the entire neocortex. But it has limited bandwidth, so direct links between columns could allow subconscious processing to bypass the bottleneck. There are some great YouTube lectures by Murray Sherman on the role of the thalamus, if anyone’s interested.


Agree there is a fast “single pass” memory and a much slower search memory. I would love to see a model where both work (and can be simulated) and that matches the neural structure.

That’s completely explainable through TBT: your toe starts building up the model from scratch, but at each frame it compares the integrated model to an index of known models through its link to the PFC (prefrontal cortex). The PFC holds a complete index of your concepts, and the moment the model built by your toe matches any previously formed model, it is associated with that category in the PFC. If the PFC can’t make a decision, it forms a new category. In the meantime, your toe’s macrocolumns have done their job.
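That index-matching idea (a partial model repeatedly compared to stored concepts, with a new category formed when nothing matches) could be sketched like this. The PFC link is the poster’s conjecture, and the feature vectors, the cosine-similarity measure, and the threshold below are all illustrative assumptions:

```python
import math

# Toy sketch of index-matching recognition: a partially built model
# is compared against an index of known models; if the best match
# falls below a similarity threshold, a new category is created.

INDEX = {
    "cup":   [1.0, 1.0, 0.2],   # hypothetical features, e.g. curvature, rim, flatness
    "plate": [0.1, 0.9, 1.0],
}

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) *
                  math.sqrt(sum(x * x for x in b)))

def categorize(partial_model, threshold=0.9):
    """Match the partial model to the index, or form a new category."""
    best, score = max(((name, cosine(partial_model, v))
                       for name, v in INDEX.items()), key=lambda t: t[1])
    if score >= threshold:
        return best                         # matched an existing category
    new_name = f"unknown_{len(INDEX)}"
    INDEX[new_name] = list(partial_model)   # "can't decide": new category
    return new_name

print(categorize([0.9, 1.0, 0.3]))   # close enough to "cup"
```

In a running system the toe would call `categorize` at every “frame”, so the match can fire as soon as the accumulated model crosses the threshold.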


So TBT really believes that it would take my toe longer to recognize a cup than my fingertip?


stop drinking coffee with your legs :coffee: :foot: :face_with_monocle:


Maybe this is the same as perceiving an arbitrary shape. It’s not like a novel object is just a question mark and then, over time, you start being able to perceive it. If you can recognize any shape without learning it, the toe’s cortical columns don’t need to learn the coffee cup.

TBT is mostly about the “what” pathway. If it gets damaged, you can’t label objects, but you can still reach out and grab them. To do that, I think you need to perceive the exact, spatially continuous shape of an object. So maybe it’s something to do with the “where” pathway.

Maybe TBT could recognize arbitrary objects through object composition or something. I think there must be some sort of non-bucket-ified perception going on, since it’s so easy for us to form internal models of 3D objects, and non-rigid objects aren’t any harder to figure out.


I agree with JJC and many other replies here that there has to be more than one type of recognition.
It seems hard to prove or disprove Hawkins’s theory that a single column (after training) is doing the entire recognition task.
Thanks for all of the replies.

This is a question that @jhawkins should have been asked in one of the myriad interviews. Maybe he will answer here!

I think a possible explanation is that the models the columns recognize are simpler and compositional in nature, like the alphabet. I think something to this effect was alluded to in some of Jeff’s commentary. It is possible there are models that the feet learned during early childhood to which the coffee cup maps, and this in turn compositionally allows recognition of it.
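A minimal sketch of that compositional idea, under the assumption of hypothetical learned primitives: the foot never learned “coffee cup”, but if it learned the parts, the whole can still be recognized as a composition of them. All names here are invented for illustration:

```python
# Toy sketch of compositional recognition: objects are defined as
# compositions of simpler learned primitives (the "alphabet"), so an
# object never learned by this sensor patch is still recognizable
# if all of its parts are.

PRIMITIVES = {"curve", "rim", "loop", "flat", "edge"}  # learned in childhood

COMPOSITIONS = {
    "coffee_cup": {"curve", "rim", "loop"},   # body, opening, handle
    "book":       {"flat", "edge"},
}

def recognize_composed(sensed_primitives):
    """Objects all of whose parts appear among the sensed primitives."""
    sensed = set(sensed_primitives) & PRIMITIVES
    return [name for name, parts in COMPOSITIONS.items()
            if parts <= sensed]

print(recognize_composed({"curve", "rim", "loop"}))   # ['coffee_cup']
```

The key property is that `COMPOSITIONS` never has to be learned per body part; only the primitive “alphabet” does.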


I agree, and that should involve computing lateral gradients and composing them into contours / fill-ins, which I think is only possible by treating output from interneurons as driving input to adjacent neurons. I’m pretty sure that happens in the retina, but I don’t know of anything like that in TBT / HTM?

My guess is it would be learned and generally applicable. The spatial system might not be the same as the cortical sheet, since objects can be 3D, and it would be nice if it could be part of general intelligence mechanisms.

Perception is learning; it just seems to be very local in this case. The question is how?