I’m coming at this question from the perspective of having a deep learning background.
In deep learning, word2vec (glove) generally works by taking all the words that exist in a corpus, sorting them, then creating a potentially large representation, depending on the corpus.
If I had a corpus of a couple sentences, “Hello, how are you? I’m fine, thank you, and you?”, this would break down into a potential vector of length of 8 binary values, one for each possible word (how you choose to deal with punctuation varies… I’m ignoring it here, while of course it really does matter for context).
Then there is a sliding window that goes over the line, encoding the vectors. The width of this window can vary. If we order our possible vectors alphabetically, and have a three word window, we could have our first vector encoding as:
0,1,0,1,1,0,0,0
for
“are, hello, how”
That vector is then fed into a neural network. The goal in deep learning then becomes to train a network to predict, based on a given input vector, the sentiment, possible responses, next word in the sequence, etc., based on massive amounts of data, calculating the error value, and back propagation to adjust the weights for the “neurons” in the system.
Depending on your implementation, architecture choices, and training data, some decent systems are built out of this sort of approach. But it takes a ton of time and training to get to that level… lots of electricity is involved, frequently not knowing for certain if your tweaks will actually make the system better or worse. Deep Learning, while being powerful, produces idiot savant systems that struggle to move beyond the area on which they were trained. They don’t really learn in real time, and the methods of training them to get good results are far removed from how our brain seems to work. But there are ideas we can also learn from it too.
Sometimes, since certain word combinations occur frequently in any language, for a large corpus these designs will incorporate hashmaps/dictionaries to store calculated values (as a lookup can be cheaper than a doing the same maths operations repeatedly). These chunks have meaning and influence context.
Perhaps in HTM we could employ hashmaps/dictionaries in an attempt to speed up calculations? Or take a hash of a layer state rather than read through an entire binary array, so that we when see two previously known calculated hashes pop up, we can recal the results rather than do everything from scratch.