I'm glad to see someone else coming to this conclusion as well. I've got a few theories as to how this might work, but I don't think any of them are worth sharing just yet. Instead, I'd like to take a few moments to review some details of my current working theory of how HTM can be used to create what I've dubbed a "behavioral AI". I'd like to preface this by saying that a lot of the ideas below are unsubstantiated hypotheses or speculation on my part, and that my understanding of HTM and its applications may differ significantly from the common understanding.
The two parts of the HTM architecture are the spatial pooler and the temporal memory algorithms. Let's first look carefully at the functions of these two pieces, and then I'd like to offer an explanation of how they can be used as part of a larger learning system.
The spatial pooler is a very powerful and robust algorithm that acts as a sort of autoencoder, but not the kind that most people are familiar with. Contemporary autoencoders are typically thought of as ML models that learn to generate encodings of data that preserve all or most of the important features while reducing the encoding footprint. Typically this is done by sending an error signal (in this case, the reconstruction loss) back through an ANN encoder/decoder combo, which lets the network map the "feature space" of a given dataset; over time the network learns to prioritize features that correlate closely with that error signal. The spatial pooler is a bit different, primarily in that there is no error signal to learn from. Instead, the spatial pooler learns to encode things by essentially building a sort of "hash map" for inputs it has never seen before. Its set of basic rules ensures that over time inputs will create unique encodings, such that inputs with similar features should generate similar encodings. The way I like to imagine this is that each column in the spatial pooler essentially becomes a "feature detector", and each column learns which feature(s) from the input it wants to represent. So on an individual level the columns become flags that say "the thing I've learned to represent is either present or not". By itself, this algorithm isn't really anything special; most ANN architectures can considerably outperform the spatial pooler. It's really just a basic learning algorithm without the key component: temporal memory.
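To make the "feature detector" picture a bit more concrete, here's a minimal toy sketch in Python/numpy of a spatial-pooler-style encoder. To be clear, this is my own simplification, not Numenta's implementation: the class, parameter names, and numbers are placeholders, and it leaves out boosting, potential pools, duty cycles, and so on. The point is just that columns compete on overlap and the winners strengthen their synapses to the active input bits, with no error signal anywhere.

```python
import numpy as np

class ToySpatialPooler:
    """Simplified spatial-pooler-style encoder (illustrative only).
    Each column keeps a permanence value per input bit; the columns with the
    highest overlap with the input win, and only the winners learn."""

    def __init__(self, input_size, num_columns=128, sparsity=0.05, seed=0):
        rng = np.random.default_rng(seed)
        self.permanences = rng.uniform(0.0, 1.0, size=(num_columns, input_size))
        self.connected_threshold = 0.5        # synapse counts only if permanence >= this
        self.num_active = max(1, int(num_columns * sparsity))
        self.inc, self.dec = 0.05, 0.02       # permanence increment / decrement

    def encode(self, input_bits, learn=True):
        """input_bits: binary numpy array of shape (input_size,).
        Returns a sparse binary encoding of shape (num_columns,)."""
        connected = (self.permanences >= self.connected_threshold).astype(np.int32)
        overlaps = connected @ input_bits                  # active inputs seen per column
        winners = np.argsort(overlaps)[-self.num_active:]  # k-winners-take-all
        if learn:
            for c in winners:
                # Hebbian-style update: strengthen synapses to active bits,
                # weaken synapses to inactive bits -- no error signal involved
                self.permanences[c] += np.where(input_bits > 0, self.inc, -self.dec)
                np.clip(self.permanences[c], 0.0, 1.0, out=self.permanences[c])
        sdr = np.zeros(self.permanences.shape[0], dtype=np.int8)
        sdr[winners] = 1
        return sdr
```

Feeding two inputs that share most of their active bits through something like this will tend to produce overlapping sets of winning columns, which is the "similar inputs generate similar encodings" property.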
The temporal memory algorithm takes the spatial pooler to the next level (or dimension, I guess) by adding functionality that creates some interesting emergent properties. The first property is given away by its name: it allows the HTM system to learn temporal sequences. It does this through two mechanisms. First, it "granularizes" the feature detectors such that a single feature (column) may be encoded in many different ways (via the specific set of neurons that are active in the column). Second, it uses a predictive state in the individual neurons to differentiate similar but distinct sequences. So if the column says "the feature I have learned to detect is present", then the individual neurons in the column say "the feature I've learned to detect is present, and we've seen it before in the context of what we just saw a second ago".
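The column/cell distinction is easier to see in code. Below is a stripped-down sketch of just the activation rule (no synapse learning, no real data structures): if a winning column contains a cell that was put into the predictive state on the previous step, only that cell fires and the sequence context carries forward; if not, the whole column bursts. Again, the function and names are my own simplification, not the actual temporal memory implementation.

```python
CELLS_PER_COLUMN = 8  # placeholder; real systems typically use more

def activate_cells(active_columns, predicted_cells):
    """Sketch of the temporal memory activation rule.

    active_columns : set of column indices chosen by the spatial pooler
    predicted_cells: set of (column, cell) pairs predicted on the previous step
    Returns (active_cells, bursting_columns).
    """
    active_cells = set()
    bursting_columns = set()
    for col in active_columns:
        predicted_here = [(col, c) for c in range(CELLS_PER_COLUMN)
                          if (col, c) in predicted_cells]
        if predicted_here:
            # The feature was anticipated: only the predicted cell(s) fire, so
            # "feature X after A" and "feature X after B" end up on different cells.
            active_cells.update(predicted_here)
        else:
            # Unanticipated feature: every cell in the column fires ("bursting").
            bursting_columns.add(col)
            active_cells.update((col, c) for c in range(CELLS_PER_COLUMN))
    return active_cells, bursting_columns
```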
At first glance, these functions don't actually seem to be all that useful. Even if you can reliably generate an encoding for a specific sequence, you don't have any way to classify that encoding other than comparing it against all previous encodings and using the similarity to find an output that was generated by a similar input. As dmac put it:
Notice the qualifier "combined with Numenta's SDR-classifier". The HTM system doesn't do any sort of classification on its own; it only provides a feature encoding for the actual classifier (which, as far as I'm aware, involves a lot of hand-waving with other algorithms that would be just as effective even without HTM).
For the longest time I have been trying to figure out a way around this: how can HTM be made useful for actual agent-based tasks, so that we can get to the next phase of using HTM to build AI agents? So far I have three key ideas that I'm working with to try to develop something like this.
First is the fundamental idea that most people expect too much from HTM. Most people seem to view HTM as a general learning algorithm, but it is only a single part of a much larger puzzle. My view is that the goal of the HTM algorithm is to produce what I call the "ground truth"; that is, its entire function is to create an encoding for its input that captures the most salient features of that input. I'm probably not doing a great job of articulating the distinction, but I tend to think of HTM as a learning algorithm for the part of the brain that asks "what am I looking at right now?", while the part that asks "what do we do about what we're looking at right now?" is a completely different section that follows completely different rules. Making this distinction is important, I think, because if we treat HTM as a goal-directed learning algorithm, then we need to analyze the system with respect to the correct goal. Once I started to view things this way, I began to consider how to make HTM better at its actual goal, instead of trying to get it to reach some abstract goal (like MNIST classification) when it can't even do what it was designed for effectively.
This leads me to my second idea - modulating the learning signal based on some criterion (namely, an error signal). With the temporal memory algorithm, the HTM system is constantly trying to predict the next input by putting neurons into a predictive state, such that columns containing a predicted neuron will activate only that neuron. This means that incorrect predictions manifest in two ways: first, neurons that are predicted but not activated indicate that the system predicted something that didn't happen, and second, bursting columns indicate that the system failed to predict a feature that was present. This is an interesting concept because, in the context of what was mentioned in the previous paragraph, it offers a potential metric by which to modulate the learning signal so that the HTM system becomes more effective at achieving its ACTUAL goal, which is to encode the "ground truth" so that the rest of the brain can use that encoding to make decisions. My current idea for implementing this is to make the permanence delta a function of a "confidence score" for the current input, determined by some combination of the metrics listed above. This is still in early testing.
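To show roughly what I have in mind (and nothing more than that; this is untested, and the exact form is precisely what I'm experimenting with), a confidence score could reward active columns that were correctly predicted and penalize predictions that never came true, with the permanence delta then scaled by that score. Whether learning should be scaled up or down with confidence is itself something to test, and the names below are placeholders.

```python
def confidence_score(active_columns, bursting_columns, predicted_cells, active_cells):
    """One possible confidence metric in [0, 1]: fraction of active columns
    that were correctly predicted, minus a penalty for predicted cells that
    never became active.  All arguments are sets."""
    if not active_columns:
        return 0.0
    correctly_predicted = len(active_columns) - len(bursting_columns)
    false_predictions = len(predicted_cells - active_cells)  # predicted but not activated
    hit_rate = correctly_predicted / len(active_columns)
    penalty = false_predictions / (false_predictions + len(active_cells) + 1)
    return max(0.0, hit_rate - penalty)

def modulated_delta(base_delta, confidence):
    """Scale the permanence change by the confidence score (one of several
    plausible choices; the inverse relationship may also be worth testing)."""
    return base_delta * confidence
```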
The third idea that I'm working with is an extrapolation from the first two. If the HTM system is generating a reliable encoding for a given state, then a behavior (output) can be mapped to that encoding through a general reinforcement learning algorithm, such that the same encoding generates the same output (behavior) at a later time. On the other hand, if an encoding is not considered reliable (e.g. it has too many non-activated predicted neurons or too many bursting columns), then the encoding should not be used for learning the associated behavior (output).
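Here is a minimal sketch of the gating idea, assuming the encoding arrives as a set of active column indices and using a deliberately simple per-column action-value table as a stand-in for whatever reinforcement learner would actually sit on top. All names and the learning rule are placeholders; the only point being illustrated is that unreliable encodings are excluded from behavioral learning.

```python
import numpy as np

class BehaviorMap:
    """Maps an HTM-style encoding (set of active column indices) to an action,
    and only learns from encodings whose confidence exceeds a threshold."""

    def __init__(self, num_columns, num_actions, lr=0.1, min_confidence=0.7):
        self.values = np.zeros((num_columns, num_actions))
        self.lr = lr
        self.min_confidence = min_confidence

    def act(self, active_columns):
        # each active column votes with its learned action values
        idx = list(active_columns)
        return int(self.values[idx].sum(axis=0).argmax())

    def learn(self, active_columns, action, reward, confidence):
        # gate: encodings with too much bursting or too many false predictions
        # are not trusted, so they do not update the behavioral mapping
        if confidence < self.min_confidence:
            return
        idx = list(active_columns)
        # simple reward-driven update nudging the chosen action's values
        self.values[idx, action] += self.lr * (reward - self.values[idx, action])
```

The same (or a very similar) encoding seen later then maps to the same action, which is the "same encoding generates the same behavior" property described above.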
So to summarize: the general idea is to separate a goal-oriented AI agent into two sections, one being the HTM encoder and the other the behavioral mapping. The HTM section can use its own error signal (bursting, etc.) to modulate its own learning and generate reliable encodings more effectively. The behavioral mapping, meanwhile, uses a completely different learning signal to reinforce the connections between the output and the HTM encoding, such that when the output satisfies some goal, encodings similar to that input's will generate the same behavior at a later time. I'm hoping that these modifications will both alleviate some of the issues with HTM as it currently stands (like forgetting) and create a framework for building "agents" that can use HTM to accomplish some task.
I'm currently working on a few projects to validate some of these hypotheses, but it's slow going, so I don't have much to show yet. If anyone would like to work together to establish a more constructive workflow for testing these ideas, I'd love to hear from you. If these changes show promising results, I already have a few ideas for how more complex agents (read: general learning agents) could be developed using this or similar methods.
I hope that makes sense; if not, please let me know and I'll try to clear things up if I can.