HTM and language modelling


Have there been any attempts to build (serious, not toy-scale) language models using HTM? A sparse, hierarchical alternative to the (IMO slightly overhyped) transformer architecture would seem to make for an impressive demonstration of HTM’s capabilities. (And before you suggest them: I’m not interested in summarization or sentiment analysis; I’m interested in a cheaper GPT-3.)

But I’ve not yet found any code to that end. I considered writing my own, but I got hopelessly confused by all the different GitHub repos (nupic.torch, nupic.research, htm-community, …). Before I waste my time, could anyone tell me where I should start? Thank you! :slight_smile:


There are two aspects to consider. The first is the word encoding. You could go with random SDRs for each word, but IMO you’ll be better off with SDRs that encode the semantics of the word.
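For the first option, a random SDR per word is only a few lines. A minimal sketch, assuming NumPy and typical HTM parameters (2048 bits, ~2% sparsity); the function name and sizes are illustrative, not from any of the repos mentioned:

```python
import hashlib

import numpy as np

def random_word_sdr(word, size=2048, active_bits=40):
    """Deterministic random SDR for a word: `active_bits` of `size`
    positions set (~2% sparsity). Unrelated words overlap only by
    chance, so no semantics are encoded."""
    # Seed from a stable hash so the same word always maps to the same SDR
    # (Python's built-in hash() is salted per process, so avoid it here).
    seed = int(hashlib.md5(word.encode()).hexdigest(), 16) % (2**32)
    rng = np.random.default_rng(seed)
    sdr = np.zeros(size, dtype=np.int8)
    sdr[rng.choice(size, size=active_bits, replace=False)] = 1
    return sdr

cat, dog = random_word_sdr("cat"), random_word_sdr("dog")
overlap = int((cat & dog).sum())  # small and purely incidental
```

The expected overlap between two such SDRs is about 40 × 40 / 2048 ≈ 0.8 bits, which is exactly the problem: “cat” and “dog” look no more similar than “cat” and “carburetor”.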

For word SDRs that encode semantics, I would normally recommend getting them from Cortical.io, but it seems they no longer support their lower-level APIs (which you would need to access the word SDRs), so you might want to start by implementing the semantic folding algorithm yourself. Cortical.io has published a lot of information on how it works, so it shouldn’t be too difficult to implement. Semantic folding is patented, though, so whether that’s an option depends on your application.
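To give a flavor of the idea: a very rough sketch of semantic folding, assuming NumPy. This skips the 2D self-organizing-map step Cortical.io describes and simply treats each corpus snippet as one SDR position, so a word’s SDR marks the snippets it occurs in, thresholded to a fixed bit count. All names and parameters here are illustrative:

```python
from collections import defaultdict

import numpy as np

def semantic_fold(snippets, sdr_size=1024, active_bits=20):
    """Toy semantic folding (no 2D topology): each snippet is one SDR
    position; a word's SDR marks the snippets it appears in, keeping
    at most `active_bits` positions. Words sharing contexts share bits."""
    counts = defaultdict(lambda: np.zeros(sdr_size))
    for i, snippet in enumerate(snippets[:sdr_size]):
        for word in set(snippet.lower().split()):
            counts[word][i] += 1
    sdrs = {}
    for word, vec in counts.items():
        sdr = np.zeros(sdr_size, dtype=np.int8)
        top = np.argsort(vec)[-active_bits:]      # highest-count snippets
        sdr[top[vec[top] > 0]] = 1                # never mark empty contexts
        sdrs[word] = sdr
    return sdrs

snips = ["the cat sat", "the cat ran", "a dog ran", "a dog sat"]
sdrs = semantic_fold(snips, sdr_size=4, active_bits=2)
```

Even in this tiny example, “cat” and “sat” share a bit (they co-occur in a snippet) while “cat” and “dog” share none, which is the property that makes semantic SDRs useful for an HTM sequence memory.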

The second thing you’ll need is the text generation component. This is where I assume you are considering applying HTM. There are a couple of projects here on the forum that have explored text generation, but AFAIK not with the greatest results. This one, for example.

I believe the lack of hierarchy is probably part of the reason for the less-than-satisfactory results. I also think adding an object/output layer from the Thousand Brains Theory would help. Either or both of these might be worth exploring if you are looking for something that isn’t “out of the box” and want to contribute to HTM theory.


Thanks for the response, Paul! That Esperanto project is interesting, but I agree: hierarchy seems to be crucial, and I would even assume that some degree of time-warping (i.e. higher layers’ predictions staying active for longer) may be required to capture the long-range grammatical interdependencies between words in a sentence.

As for the encoder, I found a paper called SPINE, which turns GloVe embeddings into sparse representations that could serve as SDRs. That might be enough.

Could you point me in the right direction regarding which of the many HTM-related GitHub repos I should start with, if I end up finding the time to try to build something? I am comfortable with PyTorch, but if the htm-community repo lends itself better, I would use that instead.


From a performance perspective (especially if you want to go to large-scale networks), you can’t beat Etaler. It has a couple of key differences compared to the vanilla TM algorithm, though (see my reply and the references to other threads there for more details on the possible ramifications). If you are looking to work with the vanilla algorithms, the HTM community repo is probably the way to go.


Brilliant, thanks so much. Etaler looks promising, I’ll take a look!


Thanks @Paul_Lamb. I’m still active, but busy with my exams until mid-January. Feel free to submit issues or send a DM; I’ll reply as soon as I log into the forum.