To make predictions with HTM that are meaningful to a user, it seems to me that there is a need to attach labels to an SDR. When HTM predicts the next few steps of a time series, it outputs a sequence of SDRs. To be meaningful to a user of the HTM, these SDRs need to be translated into something the user understands, i.e. a label.
- Is this above description correct?
- If so, a large HTM system could contain an enormous number of SDRs with associated labels. What method is used to quickly retrieve the label for a given predicted SDR from a database of SDRs with associated labels?
Thanks in advance!
In our implementations, the CLAClassifier is what has historically filled this role. This is discussed in both Raul's introductory video and, with more rigor, in Subutai's video.
In short, the CLAClassifier keeps a reference from the input value to the output SDR for that time step. The programmer can then “query” the classifier to get the input value that was used for that time step.
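To make the idea concrete, here is a minimal sketch of that association in plain Python. This is not the actual CLAClassifier API; `SDRLabelStore` and its methods are hypothetical names, and the best-overlap lookup is just one simple way to resolve a predicted SDR back to a label:

```python
# Toy sketch (NOT the real NuPIC CLAClassifier): pair each SDR seen
# during learning with the input value (label) that produced it, then
# retrieve the label for a predicted SDR by best overlap.

class SDRLabelStore(object):
    def __init__(self):
        self._entries = []  # list of (set of active bit indices, label)

    def learn(self, active_bits, label):
        """Associate an SDR (iterable of active bit indices) with a label."""
        self._entries.append((set(active_bits), label))

    def query(self, active_bits):
        """Return the label whose stored SDR overlaps the query SDR most."""
        query_set = set(active_bits)
        best_label, best_overlap = None, -1
        for bits, label in self._entries:
            overlap = len(bits & query_set)
            if overlap > best_overlap:
                best_label, best_overlap = label, overlap
        return best_label

store = SDRLabelStore()
store.learn([3, 17, 42, 90], "temperature=21C")
store.learn([5, 19, 60, 88], "temperature=22C")
print(store.query([3, 17, 42, 91]))  # nearest stored SDR wins
```

A linear scan like this is fine for small models, but as you note below, a large system would need something faster than comparing against every stored SDR.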
Is this what you mean?
Yes, this is what I meant. Thanks!
Would it still be useful to have a function that can quickly retrieve the best-matching input value for an output SDR? High up in a hierarchy of CLAClassifiers you might end up with a large collection of references to the original inputs, and it might be useful there?
Anyway, I was brainstorming about a quick way to solve this problem. It reminds me a little of how Shazam solves fast retrieval of audio fingerprints. They use SDRs as well, and have a really effective method of finding the label (song name) given an SDR (a constellation map of an audio spectrogram).
This sounds very interesting. Although I’ve been around for almost two years now, I haven’t really used NuPIC as much as I’ve been continually contributing to the Java-port code base; this would seem to me to be most valuable for end-user applications? We should see what other engineers think, and maybe some of the Numenta engineers responsible for “steering” the development of the codebase.
In order to address the problem of robust identification in the presence of highly significant noise and distortion, we experimented with a variety of candidate features that could survive GSM encoding in the presence of noise. We settled on spectrogram peaks, due to their robustness in the presence of noise and approximate linear superposability.
Thanks for the reference to Shazam, I found that very interesting. Above is a quote from a paper they wrote. However, SDRs are very specific things, and what Shazam uses isn’t quite an SDR from what I can see. I only point this out to keep clear what we mean when we talk about SDRs - and what other technologies mean when they talk about “vectors” - usually not the same thing.
To learn more about SDRs, if you haven’t already, you should check out the video series Matt (@rhyolight - Numenta Flag Bearer) is developing. Here is the first in the series…
It’s an interesting problem. I was going to suggest looking at the locality-sensitive hashing literature. Maybe you could do a fast similarity search using multi-index hashing or something. cortical.io must have some fancy algorithm like that.
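For anyone curious what that would look like, here is a minimal locality-sensitive hashing sketch using MinHash over the set of active bit indices. This is only an illustration of the general LSH idea mentioned above, not anything cortical.io or NuPIC actually implements; all names and parameters here are assumptions:

```python
# Minimal LSH sketch for SDRs: MinHash signatures over active bit
# indices, so similar SDRs tend to land in the same bucket and
# lookup avoids a full scan of the database.
import random

random.seed(42)

NUM_HASHES = 8
PRIME = 2053  # any prime larger than the SDR width works
# Random (a, b) parameters for hash functions of the form (a*x + b) % PRIME.
HASHES = [(random.randrange(1, PRIME), random.randrange(PRIME))
          for _ in range(NUM_HASHES)]

def minhash_signature(active_bits):
    """One minimum value per hash function, taken over the active bits."""
    return tuple(min((a * bit + b) % PRIME for bit in active_bits)
                 for a, b in HASHES)

buckets = {}  # signature -> list of labels

def index_sdr(active_bits, label):
    buckets.setdefault(minhash_signature(active_bits), []).append(label)

def lookup(active_bits):
    return buckets.get(minhash_signature(active_bits), [])

index_sdr({3, 17, 42, 90}, "song A")
index_sdr({200, 301, 555, 720}, "song B")
print(lookup({3, 17, 42, 90}))  # -> ['song A']
```

A real LSH index would split the signature into bands so that near-duplicate SDRs (not just identical ones) collide in at least one band; this sketch only buckets on the full signature to keep the idea visible.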
But actually I think the problem in HTM is solved best by a simple classifier.