Another way to combine HTM and LLM

I haven’t looked at the Numenta site for years, and I came up with the following idea this morning before realizing that Numenta is already working with LLMs. But here’s the idea anyway: a way of combining Numenta’s HTM (Hierarchical Temporal Memory) with a large language model (LLM) such as OpenAI’s ChatGPT.
The two have something basic in common: they are both about sequences.
In the case of ChatGPT (and other LLMs), the basic idea is that you take a sequence of words and predict the next word. For variety, you might not choose the very best next word; you might choose the second or third best instead. But this way, word by word, ChatGPT constructs entire answers to user prompts.
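To make that sampling idea concrete, here is a toy sketch in Python. The vocabulary and scores are made up, and a real model scores tens of thousands of tokens, not five words:

```python
import numpy as np

rng = np.random.default_rng(0)

vocab = ["dog", "cat", "ran", "sat", "the"]
logits = np.array([2.0, 1.5, 0.3, 0.1, -1.0])  # made-up next-word scores

def sample_next_word(logits, k=3, temperature=0.8):
    top = np.argsort(logits)[-k:]              # indices of the k best words
    probs = np.exp(logits[top] / temperature)  # softmax over just those
    probs /= probs.sum()
    return vocab[rng.choice(top, p=probs)]

print(sample_next_word(logits))  # usually "dog", sometimes "cat" or "ran"
```

Lower temperatures make the best word more likely; higher ones make the output more varied.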
Numenta’s HTM theory distinguishes between sequences. It can tell the difference between A B C and A C C, for instance. As you feed it inputs one by one from a sequence, the sparse pattern of which neurons are active keeps changing. So the final pattern of A B C has different neurons active than the final pattern of A C C.
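Here is a toy illustration of that property. This is not real HTM, just a stand-in state update where the next set of active cells depends on the entire history, so sequences that share elements still end in different patterns:

```python
import hashlib

def step(state, symbol, n_cells=2048, sparsity=40):
    # Next sparse set of active cells, derived from (previous state, input).
    seed = hashlib.sha256((str(sorted(state)) + symbol).encode()).digest()
    cells = set()
    i = 0
    while len(cells) < sparsity:
        h = hashlib.sha256(seed + i.to_bytes(4, "big")).digest()
        cells.add(int.from_bytes(h[:4], "big") % n_cells)
        i += 1
    return cells

def encode(sequence):
    state = set()
    for symbol in sequence:
        state = step(state, symbol)
    return state

abc = encode(["A", "B", "C"])
acc = encode(["A", "C", "C"])
print(len(abc), len(acc), len(abc & acc))  # 40 40 ~0: distinct final patterns
```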
You might ask, why combine the two? LLMs do just fine already.
Well, I was reading a paper by some Hong Kong researchers titled "Predicting the next sentence (not word) in large language models: What model-brain alignment tells us about discourse comprehension".
The authors (if I understand them) say that they improved LLMs by adding an additional training signal. They looked at pairs of whole sentences and taught the network which pairs make sense together and which do not. So even though LLMs operate on the level of words (or even subwords), the authors added a piece of information about entire sentences.
And this actually helped the performance of the LLMs.
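If I have the right picture, the extra signal looks something like the sketch below. The pooling, the scorer, and all the names here are my own stand-ins, not the paper's actual method:

```python
import numpy as np

def pool(word_vectors):
    # Crude sentence vector: mean of the word vectors.
    return np.mean(word_vectors, axis=0)

def coherence_score(sent_a, sent_b, W, b):
    # Tiny logistic scorer over the concatenated pair; in a real setup,
    # W and b would be trained on coherent vs. shuffled sentence pairs.
    x = np.concatenate([pool(sent_a), pool(sent_b)])
    return 1.0 / (1.0 + np.exp(-(W @ x + b)))

dim = 8
rng = np.random.default_rng(1)
W, b = rng.normal(size=2 * dim), 0.0          # untrained, illustration only
sent_a = rng.normal(size=(3, dim))            # word vectors, "The man ran"
sent_b = rng.normal(size=(4, dim))            # a candidate next sentence
print(coherence_score(sent_a, sent_b, W, b))  # a score between 0 and 1
```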
I’m not sure why it helped, but since there is an element of randomness in LLM output, maybe it’s correcting for that. Or maybe there is some other reason, such as the benefit of looking at a problem at different levels of abstraction.
My idea was that perhaps we could use a representation of sentence information, as produced by HTM, as input to the LLM. The LLM would still use a stream of words (in a so-called context window) to produce the next word. But in addition to that stream, it would have available the HTM representation of the prior sentence.
I should explain that the ‘words’ an LLM actually works with are vectors of numbers (embeddings). These vectors are not arbitrary, and you can do interesting arithmetic on them to get answers to questions in the form of other vectors. But I digress.
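The classic example of that arithmetic is king - man + woman landing near queen. The four vectors below are fabricated just to show the operation; real embeddings are learned and have hundreds of dimensions:

```python
import numpy as np

emb = {  # made-up 3-d "embeddings", just to show the operation
    "king":  np.array([0.9, 0.8, 0.1]),
    "queen": np.array([0.9, 0.1, 0.8]),
    "man":   np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
}

def nearest(v):
    # Closest word by cosine similarity.
    cos = lambda a, b: a @ b / (np.linalg.norm(a) * np.linalg.norm(b))
    return max(emb, key=lambda w: cos(emb[w], v))

print(nearest(emb["king"] - emb["man"] + emb["woman"]))  # prints "queen"
```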
So the way you would create a representation of the sentence “The man ran” in HTM would be to feed a sequence of three vectors, one for “The”, one for “man”, and one for “ran”, in order, to the HTM model. The HTM model would end up with a sparse pattern that uniquely represents that sentence.
You would feed that sparse pattern, along with the embedding vectors for the three words (assuming the context window allows for at least three vectors, which it easily does), to the LLM, and it would predict the next word.
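Putting the pieces together, a minimal sketch of the plumbing might look like this. The projection matrix, the dimensions, and the stand-in sentence pattern are all hypothetical; a real version would learn the projection and take the pattern from an actual HTM:

```python
import numpy as np

N_CELLS, DIM = 2048, 8                       # HTM cell count, embedding size
rng = np.random.default_rng(2)

def sdr_to_vector(active_cells, projection):
    # Sparse pattern (set of active cell indices) -> dense vector for the LLM.
    sdr = np.zeros(N_CELLS)
    sdr[list(active_cells)] = 1.0
    return projection @ sdr

# Random stand-in for a learned projection from SDR space to embedding space.
projection = rng.normal(size=(DIM, N_CELLS)) / np.sqrt(N_CELLS)

# Stand-in for the HTM pattern of the prior sentence; in practice this is
# what you'd get by feeding "The", "man", "ran" through HTM in order.
sentence_sdr = set(rng.choice(N_CELLS, size=40, replace=False))

word_embs = rng.normal(size=(3, DIM))        # embeddings for the 3 words
sentence_vec = sdr_to_vector(sentence_sdr, projection)

# Prepend the sentence vector as one extra position in the context window.
llm_input = np.vstack([sentence_vec, word_embs])
print(llm_input.shape)                       # (4, 8): 1 sentence slot + 3 words
```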
You could make variations on this: for instance, a hierarchy of HTM representations, with the higher levels more abstract.
So a higher level might use a stream of sentences (HTM patterns) to predict the next paragraph.
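As a sketch, the higher level would just treat each sentence’s pattern as one input symbol. This is purely illustrative and reuses the toy encode() idea from above:

```python
def encode_document(sentences, encode):
    # `encode` is any sequence -> sparse-pattern function, like the toy above.
    sentence_patterns = [frozenset(encode(words)) for words in sentences]
    # The higher level treats each whole sentence pattern as one "symbol".
    return encode([str(hash(p)) for p in sentence_patterns])

# e.g. encode_document([["The", "man", "ran"], ["He", "fell"]], encode)
```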
I guess my main question with this is whether you are adding anything by doing it. It is true that in the brain, words are associated together in small chunks, and those chunks in turn are used in bigger aggregations. But what does that buy you? I guess the only way to know would be to try it out. And now, I’m going to read the posts on the Numenta blog to get up to speed on what Numenta is already doing with LLMs.
