Questions regarding SDR encoding

Firstly, to the mods: I’m not certain whether this falls under HTM Theory or HTM Hacking. Given the practical focus, I chose Hacking. Please feel free to move it if required.

To everyone else: the preamble is fairly long, and you may skip it if you wish. I’m mostly coming at this from an uneducated perspective, however, and wanted to provide some background to help clarify possible peculiarities in my question. It mostly contains context about my abilities (or lack thereof) and what I’m trying to do. I’m hoping to use this as my “learning” thread to keep track of what I’ve done; hopefully someone else might find it useful as well.

Starting with my background: I have no grounding in math and cannot read mathematical notation beyond multiplication and division. I find I can usually understand math when it’s converted to programming syntax, though complex algorithms are still beyond me. I understand some C and Lua, and can read Python with some effort. I never finished school and have no tertiary education. Lastly, I’m simply a hobbyist programmer and thus not very good. I have, however, been considered competent enough for an internship in kernel development, and although I was sweating blood trying to learn everything (I was responsible for producing a working graphics driver), I did make some progress until the company failed.

As for what I’m trying to accomplish: SDRs and HTMs caught my attention (yesterday), and I’m trying to make sense of how and why they work from a practical perspective. I am thus attempting to implement my own limited version. The reason I’m asking about encoding specifically is that HTMs are based on SDRs, SDRs represent other data, and that data needs to somehow be encoded into well-formed SDRs; encoders are thus the logical starting point.

As for what I’m hoping to accomplish with this: I’m trying to incrementally learn how these systems work by coding them. I’ve read a fair amount of theory, but I don’t understand how to turn it into a practical understanding. And without this practical grounding, other parts of the theory are beyond my reach. Chicken, meet egg.

So, getting to my first questions (I have a lot):
How exactly is “semantic similarity” defined?

Why does “related data” end up in similar positions in an SDR?
For example, taking the MD5 sum (the BaMI paper suggested a deterministic hash function) of the words “tiger” and “jaguar” will not give representations that are in any way similar to each other. Appending “is a car” and “is a cat” will not improve the situation at all. The scalar encoder demonstrated does not help either, as it assumes the SDR has a fixed representation: position one will always relate to 1, and thus the learning system needs to know this. My understanding is clearly lacking, as one of the key points of SDRs is that the same algorithms can operate on them regardless of what they encode. The way I’m understanding them, they seem more like a cute way of encoding ints, which cannot be accurate at all.
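To illustrate what I mean with a quick Python check (hashlib is from the standard library; treating the digest as 128 bit positions is just my convention):

```python
import hashlib

def md5_on_bits(word):
    """Return the positions of the ON bits in the 128-bit MD5 digest of a word."""
    digest = int(hashlib.md5(word.encode("utf-8")).hexdigest(), 16)
    return {i for i in range(128) if (digest >> i) & 1}

tiger, jaguar = md5_on_bits("tiger"), md5_on_bits("jaguar")
shared = tiger & jaguar
# Any overlap here is pure chance: changing one input character scrambles
# the entire digest, so no semantic similarity can survive the hash.
print(len(tiger), len(jaguar), len(shared))
```

Whatever overlap appears is about what chance predicts for two independent, roughly half-dense 128-bit arrays (around a quarter of the positions), never anything reflecting meaning.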

Is it a requirement that an SDR have a specific number of “ON” bits (for example, no more and no fewer than ten)? If so, how on earth does one achieve that?

I’ll stop here for now. Thanks to all who’ve read this far; it’s greatly appreciated. :slight_smile:


Before I go into any detail, I want to point you to some resources. You may or may not have already seen these.

These resources should answer many of your questions.

Thank you, I somehow missed the videos which have so far done a great deal in explaining the papers. I’ll come back here once I have more questions. :slight_smile:

Okay, I have a basic idea of how things fit together. To summarize some of the conclusions that seem to follow from what I understood:
Semantic relatedness seems to be defined when writing the encoder. For example, with a scalar encoder, if one decides that 1 is related to 700, one can write the encoder to do so. This is also how related data ends up in similar positions.
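For instance, my mental model of such a scalar encoder is something like the following toy sketch (the bit counts are arbitrary choices of mine):

```python
def scalar_encode(value, min_val=0, max_val=100, n=64, w=11):
    """Toy scalar encoder: a contiguous block of w ON bits out of n, whose
    position slides with the value. Nearby values share ON bits, and that
    shared overlap is the encoded 'semantic similarity'."""
    value = max(min_val, min(max_val, value))
    buckets = n - w + 1
    start = int((value - min_val) / (max_val - min_val) * (buckets - 1))
    return [1 if start <= i < start + w else 0 for i in range(n)]

overlap = lambda x, y: sum(i & j for i, j in zip(x, y))
a, b, c = scalar_encode(10), scalar_encode(12), scalar_encode(90)
print(overlap(a, b), overlap(a, c))  # 10 0: 10 and 12 overlap heavily, 10 and 90 not at all
```

So whoever writes the encoder decides which values end up sharing bits, simply by deciding how the ON block moves.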

I have some notion of how the spatial pooler functions, although I’m not sure what its purpose is. Taking ASCII text as an example: one encodes the text using a simple scalar encoder to turn the ASCII codes into SDRs. What would the spatial pooler achieve when run over this data? Or is the main purpose of the spatial pooler to handle more complex data patterns?

Next, temporal memory seems to be responsible for learning sequence patterns in data. Taking the ASCII example again: if one would like to learn words in the text, as far as I understand, this would be the job of temporal memory. This leads me to two questions. First, how would the TM differentiate between single words and multiple words? I have a very vague notion about this: within a small window, individual words would vary less than multi-word groups, due to fewer possible combinations. Assuming a total vocabulary of 100 words, the number of three-word combinations is huge. This might extend to morphemes too: morphemes, being fewer in number, would have less variability. As a result, whether the TM learns words, morphemes, or sentences would depend on the window length. Does this seem to be the right track? If it is, could someone help with the details, as my understanding so far is pretty vague?

The second question is simply whether one could use the output of the TM as a second-layer SDR encoder: detect words in text, then output SDRs of those words for another system?

And one last question out of curiosity: if one creates a stack like the one I described in my example above, using the TM to output “word SDRs”, can one actually work backwards and recover the original “ASCII SDRs”?

Resources would be great too. I’ve yet to look at the actual NuPIC code, as I’d like to get a higher-level idea before digging into code; that makes things a touch easier for me. I’ve found the BaMI papers, which have been really helpful.

rhyolight, those links, especially the videos you posted, were gold. Thank you very much for them. :slight_smile:

From the SP pseudocode document (PDF):

The most fundamental function of the spatial pooler is to convert a region’s input into a sparse pattern.
This function is important because the mechanism used to learn sequences and make predictions
requires starting with sparse distributed patterns.
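Just to make that one sentence concrete, here is a toy sketch of the conversion as a fixed-sparsity winner-take-all over column overlaps. It leaves out everything else the real SP does (learning, boosting, topology), so treat it as an illustration, not the algorithm:

```python
import numpy as np

rng = np.random.default_rng(0)
n_inputs, n_columns, sparsity = 400, 1024, 0.02

# Each column watches a random subset of the input bits (its "potential pool").
connections = (rng.random((n_columns, n_inputs)) < 0.3).astype(int)

def spatial_pool(input_bits):
    overlaps = connections @ input_bits           # how many of its inputs each column sees ON
    winners = np.argsort(overlaps)[-int(n_columns * sparsity):]
    sdr = np.zeros(n_columns, dtype=int)
    sdr[winners] = 1                              # a fixed 2% of columns win, whatever the input
    return sdr

dense_input = (rng.random(n_inputs) < 0.5).astype(int)  # even a 50%-dense input...
print(spatial_pool(dense_input).sum())                  # ...comes out with exactly 20 active columns
```

Note how the output sparsity is fixed by construction, regardless of how dense or noisy the input is; that is the “conversion into a sparse pattern” the document refers to.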

I’ll produce videos in the HTM School series about the temporal memory algorithm later this year.

That won’t work, because ASCII contains no semantic meaning. Watch this video for an explanation of what “semantic meaning” in bit arrays means:

There’s a lot to explain here. Here is a bit of an introduction to NLP with NuPIC:

Yes, the idea behind hierarchy in HTM is to pass SDRs out from one region of cells up to a higher region, which performs the same process on the input data without knowing where it comes from. But in this case, I would not call that “encoding”, it’s just one node in the hierarchical process.

This is sort of like how we extract predictions from HTM. We use classifiers to convert the cellular state of the system into a prediction of what the next state will be, in the same units as the input data.

Start here: GitHub - numenta/nupic-legacy: Numenta Platform for Intelligent Computing is an implementation of Hierarchical Temporal Memory (HTM), a theory of intelligence based strictly on the neuroscience of the neocortex.

Glad you enjoyed them, I’m still working on episodes. Subscribe to our YouTube channel so you won’t miss the new ones:

Welcome to our community!


This is partly where my confusion stems from. If the encoding layer already creates well-formed sparse representations, then apart from the learning aspect of the spatial pooler, why create another layer to do the same thing?

The best I can figure is that the spatial pooler does two things. First, it increases robustness: one doesn’t have to assume the input is well formed; it will still function. Second, it handles what I’d describe (for lack of terminology) as “complex” data, for example the gym data you presented in the spatial pooler episode of HTM School. Is this correct, or am I missing something?

Taking into account what you said about ASCII not encoding semantic meaning, I would like to try to refine what I’m asking.
First, I should have clarified that the purpose of the exercise in my question is not actually NLP, but rather recognition of multi-layered patterns. NLP is still a bit too far from my current understanding to be worth looking at right now.

So to refine my question: first, why would the lack of semantic meaning preclude learning temporal patterns, if the neuron works regardless of the data coming in? The best answer I can work out is that the HTM system relies on preexisting relations in the data to make predictions.

As for the original question: if one modifies it to include semantic information for ASCII (for example, whether or not a character is punctuation), would the TM learn the patterns in text over time? Such that “cat” and “cats” would be related not because of the underlying concept, but because of their temporal similarity. Similarly to my original question, as far as I understand, the most commonly occurring patterns would be learned first; thus “word” patterns would be learned separately from “sentence” patterns, since a word is a smaller pattern that recurs more frequently.

Would you mind giving me a simple high level overview of the data flow from component to component when extracting predictions?

Thanks for the wiki; I came across it when I first found NuPIC but couldn’t make much sense of it at the time. I suspect it’ll be more useful now.

I unfortunately don’t have a YouTube account, but I’ve put the YouTube channel on a list I have for URLs to check frequently. :slight_smile:

You’ll understand better why the SP does this once you understand how our sequence memory algorithm works. Even though the described implementation is outdated, this video provides a view into how sequence memory works on top of spatial pooling:

You must keep in mind that the SP is performing a type of normalization to the input data, constructing mini-columns of cells that will be used to recognize sequences of SDRs over time.

I don’t think you’re right about either of these assumptions. Think about this from a biological standpoint and re-watch the first 4 minutes of this video:

If the input data has no semantic meaning, then nothing can be inferred from it. Data structures that have no semantics generally have something like a decoder or lookup table (like ASCII) that is used to extract meaning. The spatial pooler cannot assume anything about the incoming data except that it contains semantic meaning, which it will attempt to analyze first by learning the spatial patterns in the data and representing them from a columnar standpoint (each column looks at different spatial patterns).

Yes. You could, for example, represent the letter “m” by attributes like:

  • capital or not
  • noun or not
  • fricative or not

However, I don’t think this is a good strategy for applying NLP.
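Just to make the idea concrete (not a recommendation), a toy attribute encoder could look like this; the attribute list and block width here are made up:

```python
# Hypothetical attribute encoder: every boolean attribute owns two blocks of
# bits (one for "true", one for "false"), so each letter always has the same
# number of ON bits, and letters sharing attribute values share ON bits.
ATTRS = ["capital", "noun", "fricative"]
BLOCK = 8  # bits per half-block (arbitrary)

def encode_letter(attrs):
    sdr = []
    for name in ATTRS:
        if attrs.get(name):
            sdr += [1] * BLOCK + [0] * BLOCK  # "true" half ON
        else:
            sdr += [0] * BLOCK + [1] * BLOCK  # "false" half ON
    return sdr

m = encode_letter({"capital": False, "noun": False, "fricative": False})
M = encode_letter({"capital": True, "noun": False, "fricative": False})
overlap = sum(a & b for a, b in zip(m, M))
print(overlap)  # 16: "m" and "M" agree on two of the three attributes
```

The point is only that shared attribute values translate into shared bits; whether those attributes carry useful semantics for NLP is a separate question.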

@amalta was recently working on a diagram of a sample use of our Network API. Andrew did you ever get that finished?

This is a very good question!
I think we have to keep in mind that Numenta’s main goal is to replicate the brain, not necessarily to solve practical problems in the near future. From my experimentation I can confirm that when the encoder outputs properly sparse and distributed representations, temporal memory is still able to learn and do useful things without the spatial pooler.

Thank you! I seem to have missed this point, and it was creating a fair amount of confusion for me. I was coming at this from more of a hobbyist-programming perspective, and apart from some pet theories I don’t have much knowledge of neurophysiology. This greatly helps… not sure what to call it, it “places” the context? Anyway, it helps reduce some subtle confusions I had. :slight_smile:

Rhyolight, I haven’t had time yet to look into your suggestions, once I have I’ll respond more fully. I seem to not quite be finding the words to explain what I’m trying to ask yet so I’ll have to think about that too. But I think Dorinclisu’s comment may help in that regard.

So I’ll be back here once I’ve checked the suggestions and have a clearer notion of what I’m struggling with. :slight_smile: Thanks to everybody so far, I really appreciate it!

First, let me mention that I have an implementation of some parts of HTM, which can also serve as an intro to HTM.
You can find out about it here: http://ifni.co/bbHTM.html

You could also skim this discussion.

The reason is that with this project I’m trying to apply HTM to text processing and NLP.
One thing I discovered while working on it is that, so far, the best encoding for words is to use a category encoder to encode every character of a word (and then combine the character SDRs to form the final SDR). (I tried several other encoding ideas, which didn’t work as well.)
Why does this sort of encoding behave better? Because the Hamming distance between any character and every other character is constant/the same.
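A rough sketch of that scheme (the block width W and combining by plain concatenation are just illustrative choices):

```python
import string

W = 5  # ON bits per character (assumed width)
ALPHABET = string.ascii_lowercase

def encode_char(ch):
    """Category encoding: each character owns its own non-overlapping block
    of W ON bits, so any two distinct characters share zero bits and their
    Hamming distance is always exactly 2 * W."""
    i = ALPHABET.index(ch)
    bits = [0] * (len(ALPHABET) * W)
    bits[i * W:(i + 1) * W] = [1] * W
    return bits

def encode_word(word):
    # One possible way to "combine": concatenate the per-character SDRs.
    return [b for ch in word for b in encode_char(ch)]

a, b = encode_char("a"), encode_char("b")
print(sum(x != y for x, y in zip(a, b)))  # 10, and the same for every distinct pair
```

Because every pair of characters is equidistant, no character is accidentally “more similar” to another, which is exactly the property the other encodings I tried were missing.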

What this encoding fails to catch is “character-slippage”, as you can see here :

Any ideas to remedy that are welcome.

PS: I’ve currently put this project on hold, but I hope to pick it up in a couple of months :( when I have time.
My current idea is to build a semantic encoder out of the char-word encoder and a hierarchy of SPs (and TM if necessary), i.e. the encoder itself would be built from HTM modules.