Hello,
My question today is motivated by a question asked yesterday by @iyerlr here: https://discourse.numenta.org/t/my-own-implementation-of-temporal-memory
So, say I have an HTM system that I want to set up as a classifier, to tackle everyone’s favorite artificial-neural-network teaching example of classifying hand-written MNIST digits… Now, I understand that using HTM this way is an abuse of HTM, and that shoving it into a feed-forward-NN-shaped box is highly suboptimal. But I’m asking the question this way to elucidate something I don’t understand. Sorry about that.
So with that preemptive apology out of the way, imagine the system has one input-space bit for each pixel, and it’s just running a plain-vanilla spatial pooler as described in HTM School Ep. 7. Eventually, it would succeed in modeling the structure of the variation in the input digits. However, that isn’t the same thing as associating the digits with the meanings we want it to associate.
So, how do I get the meaningful output from the HTM system?
In some of the HTM School videos, @rhyolight compares the similarity of SDRs representing the column states at various times. So I could imagine creating a composite SDR for each digit (maybe a union, or perhaps an intersection, of the column activations produced by many versions of the same digit), and then the output answer would be whichever digit was associated with the composite SDR that has the most overlap with the current state, after observing a given input.
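To make sure I’m describing this concretely, here’s roughly what I have in mind, in plain numpy rather than any real HTM library (`column_activations` is a stand-in for whatever the spatial pooler produces as a boolean column-state vector, and `NUM_COLUMNS` is a made-up parameter):

```python
import numpy as np

NUM_COLUMNS = 2048  # hypothetical spatial pooler output size

def build_composite_sdrs(examples_by_digit, column_activations):
    """Union the column activations over many examples of each digit."""
    composites = {}
    for digit, images in examples_by_digit.items():
        composite = np.zeros(NUM_COLUMNS, dtype=bool)
        for image in images:
            composite |= column_activations(image)  # union of activations
        composites[digit] = composite
    return composites

def classify(image, composites, column_activations):
    """Answer = the digit whose composite SDR most overlaps the current state."""
    state = column_activations(image)
    return max(composites, key=lambda d: int(np.sum(state & composites[d])))
```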
But this approach seems fragile - if the HTM system continued to learn, it might reorganize which columns respond to which features in the input data, and the fixed SDRs that represent the “essential” activation for each digit may drift.
So another approach might be to augment the input with some additional bits encoding the correct value of the digit, and over time the HTM system would likely learn the associations between the features in the drawn bitmap and the digit’s value.
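Concretely, I imagine something like this, where the label block and `BITS_PER_LABEL` are invented for illustration (not anything from an actual HTM encoder):

```python
import numpy as np

PIXELS = 28 * 28
BITS_PER_LABEL = 20  # hypothetical width of each digit's label block

def encode(image_bits, digit=None):
    """Append a block of 'answer' bits to the pixel bits.

    image_bits: boolean array of length PIXELS.
    digit: 0-9 during training, or None when no answer is available.
    """
    label_bits = np.zeros(10 * BITS_PER_LABEL, dtype=bool)
    if digit is not None:
        start = digit * BITS_PER_LABEL
        label_bits[start:start + BITS_PER_LABEL] = True
    return np.concatenate([image_bits, label_bits])
```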
But then, how can you get the HTM system to not rely on that additional input data? In other words, what happens when you don’t have the “training value” as part of the input?
It seems like you’d want to space the input from the training answer temporally so the system can observe the input and predict the meaning, which will arrive one time-step later. This way we can be sure the answer doesn’t contaminate the prediction.
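In pseudo-code-ish Python, with `tm_compute`, `encode_image`, `encode_label`, and `decode_prediction` all standing in for whatever temporal memory implementation is in play (so this is only a sketch of the protocol, not working HTM code):

```python
def train_step(tm_compute, encode_image, encode_label, image, digit):
    tm_compute(encode_image(image), learn=True)   # t: observe the drawn digit
    tm_compute(encode_label(digit), learn=True)   # t+1: observe the answer

def infer_step(tm_compute, encode_image, decode_prediction, image):
    predicted = tm_compute(encode_image(image), learn=False)  # t: observe only
    return decode_prediction(predicted)  # read the predicted "answer" cells
```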
But if the predicted answer never arrives because the system is now deployed (assuming continuous learning is still enabled, since we want it to keep learning other things), how do we prevent the absence of the correct answer from degrading the system’s ability to recognize digits?
My 13-month-old needs to be told every time she correctly identifies that something is red, but I, as an adult, can be confident in my ability to recognize primary colors.
It seems like HTM must have some kind of solution for this general class of problem, even if it’s not theoretically pure - basically a simple placeholder for the old-brain structures that allow a more complex animal to function.
Thank you in advance for any insight.