Language capabilities of humans, according to HTM

I had a random thought, and I would like someone to tell me whether my reasoning using reference frames might be correct.

It concerns the subtleties of human language. I noticed that if someone uses a particular phrase in normal conversation, say a simile to insult someone, we never think about the literal imagery behind the expression. If I say “It’s raining cats and dogs”, our first instinct is not to think of literal cats and dogs raining down. The first things that flash through our minds are wetness, and maybe the positive or negative thoughts associated with a rainy day.

I wonder: is it basically a translation to our reference frames? By observing as kids that the idiom referred to a wet day, or perhaps by knowing from context that it refers to rain, we gained some kind of shortcut (memory pathways? skip connections? directly connected cortical columns?) that bypasses our reference frames for the individual words and reaches the reference frames associated with the raw emotions of a wet day.

I wonder: is there any technical explanation for this phenomenon (in layman's neuroscience please, I do not have a biology background), and for the reverse (when we convert our raw emotions into idioms rather than describing them literally in words)?

Or is this an unexplained aspect of the reference frames theory?

Hi Neel, I am not sure what you are asking, and that is a good thing, I think.

Maybe it helps to see words as concepts, associations, references.

I think we should separate column reference frames from language reference frames. Language reference frames are an abstraction of column reference frames.

First you have column reference frames, which allow self-awareness; some call that consciousness. A perfect robot could be made that behaves as people did in the times before conscious interiority evolved.

Second, language as an add on allows for a new abstraction layer: metaphor.

cheers, Roland

I can say that this has popped up with my kids recently, specifically the 5yo. A storm was going and I said it to her, and she looked out and remarked about the kinds of dogs coming down, but then confirmed with me that it wasn’t literal. She knows now that it just means heavy rain, but the first time or two we had to explain to her that it’s a figure of speech. At some point, some people will find out the origins of it (the trivia) and know it’s not just some random collection of words. So I think it’s a shortcut built up in stages: from a literal understanding, to knowing it’s just a figure of speech, to then understanding the why behind it. I think the literal and factoid references get pushed to the background shortly after serving their purpose for the saying, much like most root words get glossed over until you start examining the words themselves. There are a lot of shortcuts there that people just never examine.


I agree with all of your statements.

But what I am seeking is if there is any technical and deeper explanation for this phenomenon that is incorporated into HTM. Or is this a failing/unexplored part of this framework?

I’m interested in the phenomenon you described, so here is my 2-cents for the sake of engaging in a (hopefully productive) discussion, not as an attempt to “answer” the questions you raised. With that said,

  1. Is it true that “reference frames” in TBT are mainly about how the neocortex does object recognition during visual perception, i.e. an object-attached 2-D (or 3-D) space model (implemented as grid cells in the entorhinal cortex or place cells in the hippocampus)?

As such, it seems that you generalized the meaning of “reference frames” when you discussed “our actual reference frames to individual words”, and “the reference frames associated to raw emotions” …

Or it could be that my understanding of TBT “reference frames” is overly simplistic …

  2. As to the language phenomenon itself, is it true that HTM is mostly focused on visual perception (incorporated with movement, as sensori-motor integrated processing), hence not using that physical 2-D/3-D space model to explain human language?

The way I see it, human language is simply a labeling system, complex as it is. At the most basic level, it is about giving names to objects (concrete, then abstract, then features/relations/actions of/about/with objects …).

When we give names to a complex pattern of things, like “democracy”, or “bravery”, we have abstract words, or phrases.

“Cats and Dogs”, as an idiom, seems to be used as an abstract phrase, a label for some pattern we humans intuitively grasp, just as we intuitively grasp the meaning of “democracy” without visualizing a crowd of humans casting votes … it does bypass a lot of related concrete specifics, but somehow I fail to see its relevance to TBT/reference frames…

I am always interested in figuring out how our brains handle the complex patterns of natural language, and how language gets connected to the language-neutral part of our mental “model or image” of the external world.

Not as an answer, but maybe as a reference frame: The Conscience of Color, from Chemistry to Culture – Brain Pickings

Yeah, I should have clarified that I think things like that wind up getting handled at a much higher level of reasoning and HTM is insufficient towards that end right now.

I believe that reference frames are relevant to all sensory modalities.

I choose to share your belief, but I still wonder: how, for example, can we relate the sense of smell to reference frames? Care to come up with an idea, something like a hypothetical circuit? I am REALLY curious here… :)

The term “reference frames” conjures up location, and as you pointed out, there are parts of the cortex that process data that is not spatially oriented.
Try “context” for the less spatial parts of the cortex.


Don’t get hung up on the idea that grid cells are only useful for encoding spatial adjacency relationships. As described in the latest Brains@Bay meetup, there is strong evidence that the grid cell mechanisms may also be able to encode (or assist with encoding) arbitrary relationships via learned transitions. It is also fairly likely that there exists a similar mechanism that allows us to encode and recall temporal sequence adjacency. The current TM model does this to some extent for immediate sensory successor/predecessor relationships; however, I’m referring to a mechanism that allows us to project actions and behaviors forwards and backwards in time over much larger increments.


So you mean to say that the idiom is just the human way of representing that abstract/high-level thinking that relates to emotions or sensory inputs such as the feeling of wetness in the air and memories related to a wet day?

If so, it sounds awfully complicated for a simple idiom to carry such diverse abstract meanings and yet be common enough that most humans understand its underlying meaning.

Most processes and complex thoughts we have are in the higher-up hierarchies. Isn’t the aim of HTM to meaningfully replicate those processes and use that in pursuit of its AGI goal?

I think the aim of HTM was to try to replicate much of the CC network, and then use that to get to the other goals. Considering the number of other brain components outside the Pre-frontal Cortex involved, I don’t think HTM’s explanation of CCs alone gets there. That said, I think reproducing a lot of CC functionality will get us close enough, and I think that in getting to that point we’re going to solve a lot of the language issues (because you’ll have to in order to get there: the path is paved with words).


That seems to be well said …

As an example, think of how MUCH we humans like to use acronyms … when we come across the word “ASCII” during casual reading, do we stop and think about the specifics like “American, Standard, something …”? Probably not. The concrete meanings represented by “A”, “S”, “C”, “I”, “I” all seem to get bypassed, leaving only the abstract meaning of “a character encoding scheme” in the mind.


Alright then, we will see how it progresses. This is such a basic concept that failure to account for it may be one of the biggest arguments against HTM.

“This is such a basic concept”

I would put language a lot farther down the line towards the “hard” end of the spectrum, if that’s what you were referring to. Honestly, I think it’s hard parts all the way down, lol.

Great subject and questions, of very fundamental importance to higher reasoning. I just want to add one important point to this thread. TBT and its underlying HTM concepts of the neocortex have a more foundational aim: to explain how our neocortex builds a model of the real world. Without a model of the real world, there is neither purpose nor foundation for language. The semantic elements of language require a model of the world as an anchor before they can move on to convey higher-level ideas, which are usually transitive in nature. At its current stage of investigation, TBT is trying to explain how that model of the world is created and how we disambiguate at different levels in order to eliminate uncertainty and create stable perceptions. Once that is understood, we will still have a long path ahead to understand other aspects of linguistics, but we will probably have found the set of tools our brain employs for such higher-level processing. I personally am quite sure we will also have to learn a lot more from linguistics, not about specific languages, but about how language acquires its properties, resolves complex communication, and establishes higher-order relationships between concepts. With a solid understanding of TBT models and disambiguation, and some additional knowledge from the field of linguistics, we will have a clearer path to follow than is apparent at the moment.


Let me also add a few terms that should outline the roadmap towards understanding natural intelligence. We have the “REPRESENTATION” problem for knowledge. That is probably the most fundamental and important problem to be resolved in the puzzle of human intelligence, and it is, correctly, what we are focusing on. Sparsity and frameworks are key points for representation, and this community has made very important contributions to both. Then we have the “BINDING” problem. This requires that we understand how multi-sensory (motor-sensory) perception disambiguates and establishes a stable, consistent and multi-contextual model of reality. We are also making great progress on this front with TBT. Next we will have to confront the field of “SEMANTICS”. Semantics can be broken down into at least two components: first, the “static meanings” of the “physical world”, like objects, mountains, people, places, animals; and second, the “transitive, changing concepts” involving “actions” between these physical objects, or upon them, or changes in state resulting from other actions. We are “causal” in our way of thought. We convey causality in our language, probably for evolutionary reasons. Once we have explained how semantics are handled in our brain, the remaining elements of intelligence, at the levels of logic, reasoning, planning, intentionality, empathy, reflecting etc., will not be as difficult to understand in terms of neuroanatomy. That is my take on this great challenge, which we also call the “hard problem”.

Please also note that I have avoided the term “Consciousness”, for I am now absolutely convinced that this term has varying definitions for different people. It would be senseless to discuss a term that has differing definitions for the parties involved in the discussion. And to complicate matters even further, “consciousness” also seems to have multiple levels of manifestation. We often start with a multi-level model of consciousness in the medical fields, like perception, awareness, self-awareness, understanding, empathy, etc. I am convinced we will be able to explain consciousness very well, once we agree on its definition and perhaps the levels we are referring to.


Our mind’s model of the world is intrinsically hierarchical. A large portion of it could be highly abstract and composite.

Could this model have come into existence without (or independent of) language?

In other words, our model of the real world might be intrinsically coupled/entangled with “language”.

For example, names of objects (i.e. Symbols or Labels) might be the most basic elements of a language. Their usage naturally enables the mind to perform abstraction and generalization: upon hearing or seeing the name of a known object, the mind can think of the object (e.g. a coffee cup) without having any sensory input about its concrete properties (color, shape, hardness, smoothness, etc.).

That means in the neocortex, a very limited number of neurons (representing the abstract concept of a coffee-cup) can now fire in isolation, without the visual/physical perception of a real coffee-cup and related sensory-motor circuitry being fired.

So the question is, is it possible for TBT to explore the ways our neocortex builds a realistic model of the real world successfully, without involving language modeling, or at least some modeling of the most basic elements of a language?

For example, in the sensori-motor processing of recognizing a coffee cup as an object, what if the Temporal Pooling layer output a label such as “coffee_cup” to represent it, so possible later-on processing can handle content like “1 coffee_cup on the table” or “2 coffee_cup(s) on the table” compositely, instead of treating them as two independent representations?
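To make the idea a bit more concrete, here is a toy Python sketch of what I mean by “compositely”. It is not HTM or Numenta code: the `Scene` class, the `observe` method and the `shared_labels` helper are names I made up for illustration, and a plain string stands in for whatever stable representation a pooling layer would actually output. The only point is that both scenes reuse one label and differ only in a count.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Scene:
    """Hypothetical scene: a surface plus counts of labeled objects on it."""
    surface: str
    objects: Counter = field(default_factory=Counter)

    def observe(self, label: str) -> None:
        # Each recognition event adds to the count for an existing label
        # instead of creating a brand-new, unrelated representation.
        self.objects[label] += 1

    def describe(self) -> str:
        parts = [
            f"{count} {label}" + ("(s)" if count > 1 else "")
            for label, count in sorted(self.objects.items())
        ]
        return f"{', '.join(parts)} on the {self.surface}"


def shared_labels(a: Scene, b: Scene) -> set:
    """Labels that two scenes have in common (the reusable part)."""
    return set(a.objects) & set(b.objects)


# Assumed label "coffee_cup" stands in for the pooled object representation.
one = Scene("table")
one.observe("coffee_cup")

two = Scene("table")
two.observe("coffee_cup")
two.observe("coffee_cup")

print(one.describe())  # 1 coffee_cup on the table
print(two.describe())  # 2 coffee_cup(s) on the table
print(shared_labels(one, two))
```

Both scenes share the same `coffee_cup` label, so anything learned about the label applies to either scene; only the count differs.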

Of course this immediately involves hierarchy, which is not currently on the Numenta development roadmap. Could/would/should it be on the radar? Just asking out of curiosity.


A human without language is not very human.
