For me at least, I am deeply invested in my own implementations of HTM and in using them to explore various problems, so it is difficult to break away and start over with a new core AI algorithm. It is possible that I will eventually reach a point where I can’t expand any further with HTM concepts alone and will have to start exploring other approaches, but that day is nowhere in sight: HTM has enormous unexplored frontiers to expand into at its current stage.
I think of hierarchy as solving this problem in a slightly different way. You don’t want a banana to be predicting a red signal, since a banana is not red. Where hierarchy helps with this particular problem is by combining spatially separated inputs (physically different populations of cells) into representations that preserve their semantic similarities.
In the case of fruit, when you are being “trained” on an apple in the real world, you are getting low-level inputs like: “red signal”, “sweet smell”, “smooth skin”, “hard stem”, “thin skin”, “sugary taste”, “hunger satisfied”, etc. You are also getting somewhat higher-level audio input for words, like “Fruit for sale!”, “Here, have an apple.”, etc., and higher-level visual input, such as signs that say “Fuji Apples”, “All fruit is on sale!”, etc. These are all activating populations of cells in the brain that may be physically very far apart from each other.
If you roll all these lower-level inputs up into a pyramid-shaped hierarchy, then physically far apart representations can now become features of more abstract representations of “apple”, “fruit”, etc. These representations serve to establish semantic similarities between objects lower in the hierarchy, by creating SDR representations for them that have varying amounts of overlapping bits.
Now bring the new, novel banana into the picture. It comes with some of the same low-level inputs as the apple did, such as “sweet smell”, “smooth skin”, “sugary taste”, “hunger satisfied”. It also has some of the same higher-level audio input, such as “Fruit for sale!”, and higher-level visual input such as “All fruit is on sale!”. This smaller subset of the same inputs will traverse up the hierarchy as they did for “apple”, along with some new inputs, such as “yellow signal”, “soft stem”, “thick skin”, “bananas for sale!”, etc.
As a result, the representation generated higher in the hierarchy for the concept of “banana” will share some bits with the SDR for “apple”, but will also have some new bits of its own. Some of the bits shared by “apple” and “banana” will also overlap with the SDR for “fruit”. These high-level concepts can all exist at the same level in the hierarchy (i.e. “apple” and “banana” don’t necessarily have to be lower in the hierarchy than “fruit”; they just need to share the proper percentage of overlapping bits to represent their semantic similarities, which means those representations need to be spatially near each other).
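To make the overlap idea concrete, here is a rough Python sketch. It is a toy, not HTM's spatial pooler: I am assuming each low-level input simply contributes a fixed pseudo-random handful of bits, and that a higher-level SDR is the union of the bits from everything that fed into it. The names `feature_bits`, `concept_sdr`, `overlap`, the sizes `SDR_SIZE` and `BITS_PER_FEATURE`, and the choice of which inputs feed “fruit” are all made up for illustration.

```python
import hashlib

SDR_SIZE = 2048        # total bits in an SDR (assumed value)
BITS_PER_FEATURE = 8   # bits each low-level input contributes (assumed value)


def feature_bits(feature):
    """Map an input name to a fixed, pseudo-random set of bit indices."""
    bits = set()
    counter = 0
    # Keep hashing until we have enough distinct bit positions.
    while len(bits) < BITS_PER_FEATURE:
        digest = hashlib.sha256(f"{feature}:{counter}".encode()).digest()
        bits.add(int.from_bytes(digest[:4], "big") % SDR_SIZE)
        counter += 1
    return frozenset(bits)


def concept_sdr(features):
    """Toy 'hierarchy': union the bits of every input feeding the concept."""
    sdr = set()
    for feature in features:
        sdr |= feature_bits(feature)
    return frozenset(sdr)


def overlap(a, b):
    """Number of active bits two SDRs have in common."""
    return len(a & b)


# Inputs taken from the apple/banana walkthrough above.
shared = ["sweet smell", "smooth skin", "sugary taste", "hunger satisfied",
          "Fruit for sale!", "All fruit is on sale!"]
apple = concept_sdr(shared + ["red signal", "hard stem", "thin skin",
                              "Here, have an apple.", "Fuji Apples"])
banana = concept_sdr(shared + ["yellow signal", "soft stem", "thick skin",
                               "bananas for sale!"])
# Assumption: "fruit" is built only from the inputs common to many fruits.
fruit = concept_sdr(shared)

print("apple/banana overlap:", overlap(apple, banana))
print("apple/fruit overlap: ", overlap(apple, fruit))
print("banana-only bits:    ", len(banana - apple))
```

Note that “banana” never picks up the bits for “red signal” here, yet it still overlaps with “apple” and “fruit” through the inputs they have in common, which is the point of the argument.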
The way I see it, in the example of “Bat (animal)” versus “Bat (baseball)”, the word “Bat” could have an SDR with a percentage of bits that overlap with the SDR for “Animal” and a percentage that overlap with the SDR for “Baseball”. The word “Bat” therefore becomes a shared feature of the two concepts “Bat (animal)” and “Bat (baseball)”. In this case, hierarchy would be used to bring together spatially separated inputs, like the feeling of fur, the shape of a bat, the sound of a crowd cheering, etc., in order to create a representation of “Bat” which encodes the proper semantic relationships between those various lower-level parts. This “feature” can then be used in various abstract “objects” higher in the hierarchy, such as “Take me out to the ball game”, or “I’ve come to drink your blood”.
A working SMI implementation requires reinforcement learning, because actions are learned through reward and punishment.
I strongly suspect that Numenta will be tackling RL in the near future as part of their SMI research. This is an area that several folks on the forum (including myself) are actively exploring as well.
This one is a bit more complex, but I don’t really see hierarchy being the go-to solution for it either. In this case we are asking the system to count occurrences of a particular letter. The system would first need to have an idea of what “counting” means (I would argue that there is a lot more to counting than just memorizing the sequence “1 2 3 4 …” – it also requires abstract concepts of numbers themselves). Then we also need a way to tell it what letter to count (meaning it needs to be intelligent enough to understand English or at least some low-level system of commands). Ultimately, the sequence being learned is not “a - 1, a - 2, a - 3…” and projecting that to “q - 1, q - 2, q - 3”. Rather, it is “subject - 1, subject - 2, subject - 3”, where “subject” is the thing that the system was told to count.
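As a loose illustration of that last distinction, here is a small Python sketch. It hard-codes exactly the parts I said are hard (knowing what “counting” is and being told what to count), so it says nothing about how HTM would learn them. The function name `count_subject` and the test strings are hypothetical. The only point is that the emitted sequence is stated in terms of the role “subject”, so it comes out the same whichever letter is bound to that role.

```python
from typing import Iterable, Iterator, Tuple


def count_subject(stream: Iterable[str], subject: str) -> Iterator[Tuple[str, int]]:
    """Yield ("subject", n) each time the bound subject appears in the stream."""
    n = 0
    for item in stream:
        if item == subject:
            n += 1
            # Emit the role label rather than the letter itself, so the
            # sequence is the same for "a", "q", or any other subject.
            yield ("subject", n)


seq_a = list(count_subject("banana", "a"))   # count the letter "a"
seq_q = list(count_subject("quiqueq", "q"))  # count the letter "q"

print(seq_a)
print(seq_a == seq_q)
```

Both calls produce the same “subject - 1, subject - 2, subject - 3” sequence, even though nothing about “q” was ever seen while counting “a”.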
Of course, I am being rather vague here, but my point is that a task like this would, it seems to me, require a far more robust hierarchy than it appears to on the surface. You are basically describing the concept of imagining one abstraction in terms of another (which is something that even modern hominid species didn’t do very well, as evidenced by the stone hand axe, which didn’t change for nearly a million years prior to the emergence of our species).
I think this is more a means of focusing the research on other areas than it is the end goal. When you’re working out the individual elements of a system, you don’t usually start out by developing everything all at once. Instead, you hard-code some elements so that you can focus on others. Eventually, you go back and revisit the things that you previously punted on.
@dwrrehman has posted some of his ideas related to the “location signal” as well. In theory, representations of “location” can be derived by distilling semantics from patterns of input and motor actions over time. In my opinion, this is really the only definition of “location” that makes sense in an abstract space (IMO grid representations only really make sense for spatial concepts like “my position in the room”, etc.).