I meant a sequence. If the HTM sees the sequence “A” -> “B”, then after input “A” it would expect “B” (depolarize SDR “B”). In my example, the model sees the sequence “apple” -> “red light” (e.g. a red light bulb comes on). Then we show it a “banana” and expect the model to predict “red light”.
The point is, as far as I know, the HTM has only two ways to depolarize predicted patterns: by the previous pattern on the same level or by representations from a higher level. But a “banana” isn’t connected to the “red light”. And the model doesn’t have a high-level representation for “red light” yet.
So, I don’t see how the HTM could predict the appearance of “red light” after it sees a fruit.
Let’s say I told you some random word, “апельсин”. You would certainly create an SDR for this pattern. Then I tell you that “the word I told you is a fruit”. So, does it change the SDR of “апельсин” or does it change the SDR for “fruit”? Because if it doesn’t, we won’t have any overlapping bits between those two SDRs.
If I say “I saw a bat flying in the woods”, you definitely would understand what I mean. The representation “Bat (baseball)” would be completely inhibited.
If I say “I bought a bat in a store”, you wouldn’t even think about “Bat (animal)”.
So there have got to be two separate SDRs for “Bat (animal)” and “Bat (baseball)”. They definitely have some overlapping bits, but very few. Different contexts activate different “bat” SDRs.
I have to say, it is really hard to make a fixed universal representation that would have overlapping bits with every other necessary representation and fit every possible case. We can’t rely on a fixed hierarchy. It should be dynamic.
In my experience, we don’t have a “Grandmother neuron” (or “Grandmother SDR”). We just have thousands of neurons representing grandmother in different contexts.
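To make the “bat” point concrete, here is a minimal sketch of context selecting between two sense SDRs. Everything in it is an assumption for illustration: the bit indices, the context SDRs, and the `activate_sense` helper are made up, and real HTM does not pick a sense with a simple maximum-overlap rule like this.

```python
# Toy SDRs as sets of active bit indices (all values are hypothetical).
# The two senses share only a couple of "bat" bits.
BAT_ANIMAL = frozenset({1, 2, 10, 11, 12, 13, 14, 15})
BAT_BASEBALL = frozenset({1, 2, 20, 21, 22, 23, 24, 25})

# Hypothetical context SDRs; each happens to share bits with one sense.
CONTEXT_WOODS = frozenset({10, 11, 12, 30, 31, 32})
CONTEXT_STORE = frozenset({20, 21, 22, 40, 41, 42})

SENSES = {"bat (animal)": BAT_ANIMAL, "bat (baseball)": BAT_BASEBALL}


def activate_sense(context, senses):
    """Return the name of the sense whose SDR overlaps the context most."""
    return max(senses, key=lambda name: len(senses[name] & context))


shared_bits = BAT_ANIMAL & BAT_BASEBALL           # the small overlap
woods_sense = activate_sense(CONTEXT_WOODS, SENSES)
store_sense = activate_sense(CONTEXT_STORE, SENSES)
```

In this toy setup the “woods” context activates the animal sense and the “store” context activates the baseball sense, while the two senses themselves overlap in only two bits.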
I successfully tested this task in my program. If I tell you to count elephants, you would easily do that without specific training. That’s not so hard.
Nope. Not really. It is more than enough to just do the sequence and connect a number to the pattern of the object. I can come up with examples, if necessary.
So, that’s what I did differently. I developed everything at once and didn’t use anything that wouldn’t work in all cases. That’s hard, but it gives a big picture. Currently, my program can’t be separated into individual elements. It is now one relatively simple algorithm that does everything.
About the “location signal”: my point is, why would we even need a location signal? Could you please give me an example of a task that we absolutely can’t do with current (even HTM) methods and that needs a “location signal”?
You hinted at the answer when you mentioned predicted patterns on the same level. The representation for the concept “fruit” doesn’t have to be higher in the hierarchy than the concepts for “apple” or “banana”. All three can be sibling concepts (existing at the same level of the hierarchy). What matters from the perspective of hierarchy is that the lower-level inputs or abstractions which make up these concepts, and which are spread across multiple areas in the cortex, need to converge up a pyramid hierarchy so that representations can be formed for them in an area of cells that are physically close to each other. This allows those representations to share semantics as overlapping bits in their SDRs.
Now, the concept for “fruit” will be semantically similar to both “banana” and “apple” (all three have some overlapping bits). If the system has learned that a red light comes on after sensing an apple, then it should be able to also predict a red light after it sees something else that is semantically similar enough to an apple. Depending on past experiences/training, a banana could be similar enough.
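As a rough illustration of that idea, here is a toy sketch in which a learned “apple” -> “red light” transition also fires for an input that overlaps “apple” enough. The bit patterns, the `ToySequenceMemory` class, and the threshold are all assumptions made up for this example; it is not how HTM temporal memory is actually implemented, only the overlap argument in miniature.

```python
# Toy SDRs as sets of active bit indices (all values are hypothetical).
APPLE = frozenset({1, 4, 7, 12, 19, 23, 31, 40})
BANANA = frozenset({1, 4, 9, 12, 19, 27, 31, 44})   # shares 5 bits with APPLE
HAMMER = frozenset({2, 5, 8, 13, 21, 33, 38, 47})   # shares no bits with APPLE
RED_LIGHT = frozenset({50, 51, 52, 53})


def overlap(a, b):
    """Number of active bits two SDRs have in common."""
    return len(a & b)


class ToySequenceMemory:
    """Stores (previous, next) SDR pairs and predicts by overlap."""

    def __init__(self, threshold):
        self.threshold = threshold   # minimum overlap needed to predict
        self.transitions = []        # learned (previous SDR, next SDR) pairs

    def learn(self, prev, nxt):
        self.transitions.append((prev, nxt))

    def predict(self, current):
        """Union of every 'next' SDR whose 'previous' SDR matches the input."""
        predicted = set()
        for prev, nxt in self.transitions:
            if overlap(current, prev) >= self.threshold:
                predicted |= nxt
        return predicted


memory = ToySequenceMemory(threshold=4)
memory.learn(APPLE, RED_LIGHT)                 # only the apple was ever paired
banana_prediction = memory.predict(BANANA)     # similar enough to the apple
hammer_prediction = memory.predict(HAMMER)     # not similar, nothing predicted
```

Whether a banana is “similar enough” depends entirely on the threshold and on how many bits past experience has caused the two representations to share, which is the point made above.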
It does both. Synaptic connections change over time, so the representation for any concept is always changing (it is not static). “Object” representations are influenced by each different context in which they are encountered. This is somewhat obscured in classic HTM because the SP process is designed specifically for an inference layer. Changes have to be made when using it for a pooling layer (I’ll actually be posting more information about this pretty soon, as I am wrapping up my latest pooling implementation).
Cortical.io’s Semantic Folding video provides a nice (albeit very high-level) idea of how they were able to generate SDRs for text words that derive their meaning from context. It’s not exactly what I am talking about here, but hopefully it will trigger the basic idea that I am trying to communicate.
I recommend watching the HTM Chat with Jeff, as he explains the problem much better than I can in a forum post.
Consider the visual system - as the eye scans over various aspects of an object, the same chunk of the cortex is presented with a stream of features. Spatially, the same neurons are presented with information. The stream is a mix of spatial and temporal information. Parsing this information requires digesting it through the connected maps.
Part of the visual process uses the fact that the feature resolution varies as you move out from the fovea. A coarse, low-resolution version of the objects in the visual field is sampled by the lower brain stem. The eyes are directed to areas of interest to force-feed the V1 map with high-resolution sampling, playing a cortical version of the old 20-question game to discover what you are looking at.
I like to think that the older brain structures force-feed the cortex with whatever it is processing and the cortex feeds back a “sharpened” version; partially by making connections with other parts of the cortex that are sampling the overall situation at the same time. (awareness and consciousness)
Much of your model starts with a great set of assumptions - I use most of them myself in trying to understand what I am reading. You may want to rethink this aspect (item #3 on your difference list) and see how it changes your considerations.
As far as item #1 on your difference list - consider what combinations of layers and hierarchy bring to the party. By bringing different representations together in alignment you can perform local calculations between representations. An SDR segment can sample some of each representation and learn a relationship between these two patterns. Since the connections are part of streams that run both up and down the parsing chain, a partial recognition can trigger a filling-in effect on noisy or incomplete representations.
Instead of forcing a single part of the cortex to recognize “cup” in the fingertips, a sense of smooth roundness can be formed at that level. “Shapeness” can be recognized in a different area without any differentiation into a cup or ball. This collage of features can be combined later in a naming area.
Trying to force all this into a single area introduces a host of problems that have killed many models with combinatorial explosions when the project moves past a few test cases.
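The filling-in effect can be sketched very roughly as pattern completion over stored SDRs. This is only a toy under stated assumptions: the stored patterns, the `fill_in` helper, and the `min_overlap` threshold are invented for illustration, and a nearest-match lookup stands in for the up-and-down feedback streams described above.

```python
# Toy stored SDRs as sets of active bit indices (all values are hypothetical).
CUP = frozenset({3, 8, 15, 22, 29, 36, 41, 48})
BALL = frozenset({5, 8, 17, 24, 29, 33, 45, 50})
STORED = {"cup": CUP, "ball": BALL}


def fill_in(partial, stored, min_overlap):
    """Complete a noisy or partial SDR from the best-matching stored SDR.

    Returns (name, completed_bits). If nothing overlaps enough, the name is
    None and the input is returned unchanged.
    """
    best_name, best_score = None, 0
    for name, sdr in stored.items():
        score = len(partial & sdr)
        if score > best_score:
            best_name, best_score = name, score
    if best_score < min_overlap:
        return None, set(partial)
    return best_name, set(stored[best_name])


noisy_cup = {3, 15, 22, 36, 48, 60}   # five CUP bits plus one noise bit (60)
name, completed = fill_in(noisy_cup, STORED, min_overlap=4)
```

Here the five matching bits are enough to recognize the stored pattern, so the missing bits are restored and the noise bit is dropped.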
This is an important point. A concept exists not just in one level of the hierarchy, but spans across many levels. When Jeff is talking about his coffee cup, he is talking about a concept that spans low-level vision and touch to the sounds making up the spoken word “cup”, to the textual representation of the word, to the sense of comfort and familiarity that it provides him, to the working theory of SMI for which he uses it extensively as an example.