Applying location/sensory theory to reading text - some thoughts

I am reading the most recent Numenta papers, and a thought occurred to me as I ploughed through an early one. Suppose we look at the reading of text as a high-level example of HTM theory (it may not be; I'm just doing this as a thought experiment). So if you think of a line of text such as:
“‘The Black Dog’ was Winston Churchill’s term for his depression”
You could look at it as a hierarchy - on one level the letters, on another the words, and on another the entire sentence. Now according to the theory, when there is ambiguity, the union of all possibilities fires. This seems to be a problem. Think about this: first the reader sees a "T" (the 'T' in 'The'). Does that mean that all possible words that start with 'T' are activated? Then he sees an 'h' and an 'e'. Does that mean that words such as "Theory, Theology, The, There, Then, Theodore" and so forth all get activated? It seems unlikely that so many possibilities could simultaneously be activated.
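As a toy illustration of the union idea (my own sketch, not Numenta's code - the word list and bit assignments are invented), you can represent each candidate word as a small sparse code and take the union of every word still consistent with the letters seen so far. The union is densest at the first letter and shrinks as each new letter prunes candidates:

```python
# Toy sketch: each word gets a random sparse "SDR" (a set of 4 active
# bits out of 64). The network state for an ambiguous prefix is the
# union of the codes of all words consistent with that prefix.
import random

random.seed(42)
VOCAB = ["theory", "theology", "the", "there", "then", "theodore", "tiger", "tree"]

sdr = {w: frozenset(random.sample(range(64), 4)) for w in VOCAB}

def union_for_prefix(prefix):
    """Candidates consistent with the input so far, and the union of their codes."""
    candidates = [w for w in VOCAB if w.startswith(prefix)]
    active = set().union(*(sdr[w] for w in candidates)) if candidates else set()
    return candidates, active

for prefix in ["t", "th", "the", "theo"]:
    words, active = union_for_prefix(prefix)
    print(f"{prefix!r}: {len(words)} candidate words, {len(active)} active bits")
```

The point of the sketch is that sparse codes make the union cheap: even many simultaneous candidates only raise the number of active bits, and each additional letter collapses the union toward a single word's code.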
In this case, the location signal is one-dimensional, not two-, three-, or four-dimensional. The eye can move left or right along the text (actually it saccades and looks at parts of letters, but let's assume that is handled at a lower level).
Another issue to look at would be displacement cells. Displacement cells let you move from a whole to a part and back again. In this case, does that mean moving from the meaning of a sentence as a whole to a particular word in the sentence? Sometimes when we read a sentence, we have to get to the end to disambiguate some word in the middle. Would this be analogous to the part/whole movement in some way?
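To make the 1-D analogy concrete (a deliberately simple sketch of my own, not Numenta's displacement-cell mechanism): with text, a location can be just a character offset, and a "displacement" is the signed difference between two locations - from the whole (the sentence start) to a part (a word), and back:

```python
# Toy 1-D location/displacement analogy for text.
sentence = "The Black Dog was Winston Churchill's term for his depression"

part = "term"
whole_loc = 0                      # location of the sentence as a whole
part_loc = sentence.index(part)    # location of one part (a word)
displacement = part_loc - whole_loc

# Adding the displacement moves whole -> part; subtracting it moves back.
assert whole_loc + displacement == part_loc
assert part_loc - displacement == whole_loc
print(f"displacement from whole to {part!r}: {displacement}")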
I know this is very vague, and that the reading of text may not work the same way as recognizing an object, but I thought I'd throw this item to the wolves (namely, readers of this forum) to see what happens!

You really should listen to this podcast with Francisco from Cortical.IO that just came out yesterday. He talks a lot about language understanding, which should give you some great context for the types of questions you are asking.


Not exactly on point, but we were chatting about applying HTM to the text problem on Discord. Maybe you can get something from this exchange:

tachion - Yesterday at 10:56 AM

is teaching an HTM system words as hard as teaching it speech?
one obviously needs stuff from the thalamus like time dilation (I listened to that Matt + Jeff podcast on the thalamus, good listen)
but I'm not sure if the same is required for learning concepts of words; it may or may not need something similar to learning an object, like we're doing with the 2D object recognition
words (the letter kind) have a structure made out of 26 letters, and it has a temporal structure to it
it's more or less static, though you can change the structure of it to mean the same thing
like a lot of you probably noticed me using "though" and "tho"
of course the "chatting contexts" are different, so the word can basically mean the same thing in different spaces
using "tho" in a serious scientific setting is out of place; using it in a laid-back Twitter rant setting is more acceptable

bitkingx - Yesterday at 12:57 PM

Words are sounds first. We learn to use letters to stand for the words later. When learning the letters we may learn phonemes first, but in the end we are learning to substitute a symbol for a sound.

tachion - Yesterday at 12:58 PM

yeah, but if there is an agent that can only detect letters, I think it would be easier for an HTM system to learn words from that first; later it can extrapolate what they mean from sounds
from a technical standpoint it's cheaper that way

bitkingx - Yesterday at 1:00 PM

Perhaps, but I am most interested in learning what the brain is doing before trying to optimize it.

tachion - Yesterday at 1:01 PM

of course

bitkingx - Yesterday at 1:07 PM

In the Dads Song Slack we think that you hear and learn the sound first. Then you learn to imitate that sound using your hardware. The extension to visual recognition does not use letter tokens so much as spatial patterns. Perhaps thinking of it that way will help rearrange your mental furniture on how to represent the problem.

bitkingx - Yesterday at 1:27 PM

I would not get caught up so much in the number of letters and the weirdness of grouping as in the pattern recognition. If it helps - you could be using Mandarin, where all the foolishness of letters goes away.

tachion - Yesterday at 1:28 PM

but Mandarin has a rule set that you can still group it by

bitkingx - Yesterday at 1:28 PM

(Yes, I know - strokes and stroke order / blah blah bla)

tachion - Yesterday at 1:28 PM

every symbol is made out of composite symbols that you can group together and encode that way

bitkingx - Yesterday at 1:29 PM

Yes - I speak Mandarin - poorly.

tachion - Yesterday at 1:29 PM

oh right on

bitkingx - Yesterday at 1:30 PM

But a completely different rule set. See the bigger picture - patterns standing for sounds.

tachion - Yesterday at 1:30 PM

yeah, we're facing the same problem, like Matt said. It takes every moving part of intelligence to get anything to work

bitkingx - Yesterday at 1:31 PM

Mandarin mostly has no phonetic component. Pure symbols.
I have been preaching system level from day one here.

tachion - Yesterday at 1:32 PM

doesn't it have some phonetic rule? I remember there being a joke where you can pronounce different words using the same "sounds"
like the word we would say in English, "Ma"
it has 7 other meanings/contexts depending on the slight subtlety of how you're pronouncing it

bitkingx - Yesterday at 1:33 PM

My name is Mark. The first syllable translates to the symbol ma. Depending on the tone it is mother, horse, hemp, or a curse. I use the horse tone.
Cantonese is the one with 7 tones.

tachion - Yesterday at 1:34 PM

oh right


In this post I describe Broca’s and Wernicke’s regions in speech production and propose that they work together to both recognize and produce speech.

The Retina as implemented by Cortical.IO roughly corresponds to Wernicke's area. To bring this to the next level you need to add in Broca's area. This is the grammar or speech-rules area. It provides templates for utterances, for both perception and production. The parts of speech are associated with the object/word store in Wernicke's area, and Broca's area strings them together into a stream of words.

The next thing for Cortical.IO is to build up the word store with online learning. Its offline SOM formation method is powerful, but not how the brain does it.
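To show what I mean by the online/offline distinction, here is a minimal sketch of an online SOM-style update - my own toy version with invented vectors and parameters, not Cortical.IO's method. Each incoming word vector immediately nudges its best-matching unit, with no batch pass over a corpus:

```python
# Minimal online competitive-learning sketch: each input vector nudges
# its best-matching unit (BMU) toward itself, one sample at a time.
import random

random.seed(0)

DIM, UNITS, LR = 16, 8, 0.2
units = [[random.random() for _ in range(DIM)] for _ in range(UNITS)]

def dist2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def learn_online(vec):
    """One-shot update: move the best-matching unit a step toward the input."""
    bmu = min(range(UNITS), key=lambda i: dist2(units[i], vec))
    units[bmu] = [u + LR * (v - u) for u, v in zip(units[bmu], vec)]
    return bmu

# Feed a stream of (made-up) word vectors one at a time - no batch pass.
stream = [[random.random() for _ in range(DIM)] for _ in range(50)]
for vec in stream:
    learn_online(vec)
```

A full SOM would also pull the BMU's map neighbors along and decay the learning rate over time; the point here is only that the word store can grow and adapt sample by sample, which is closer to how the brain would have to do it.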

Here is a delightful paper that shows work to capture the concepts in a biologically plausible model:

It is a spiking model but I think that the ideas could be implemented in other models. The model presented is the most comprehensive I have seen.

I'm sorry this answer is strung out over several posts and references. If there is any interest I may be able to combine them into a single post with better explanations and pictures. If this is not helping anyone, I have other things that need doing.