Esperanto NLP using HTM and my findings


Some rambling Bitking pontifications:

RNN and HTM can both do sequences and they both can match up a sequence in progress and do auto-completion. Great if you are doing spell checking. Beyond that - meh.

That is rather limited. In the bigger picture you can imagine typing in “It was a dark and stormy night” and a terrible potboiler would spew out. I think that bad writing could be automated that way. No - I don’t think it should be, but it could.

As far as something a little more useful goes, you need a purpose to write - it has to be about something. HTM is supposed to be based on how the brain works, and the brain is thought to work in what is sometimes called a predictive mode. I don’t know if that is really the best way to describe it. It is more about recall of a learned sequence(s). This may involve bits and pieces being jammed together in a synthesis to match a novel perception.

In the H of HTM we expect that the higher level representation is formed and is fed back to the lower level so there is recall at all levels. If there is a “topic” that has been registered (we are talking about x) then the recognition at lower levels is guided to select fragments that are compatible with this current topic.

The canned generator should not be based on letter sequence but on parts of speech.

In sentence generation a more interesting version would work like Parsey McParseface and discover what it was that we are talking about and construct sentences that have something to do with the discussion.

I think of this as a blackboard. I can conceive of this as holding the current nouns and the relations between them. (verbs and adjectives)

The logic of generation should be distributed over several levels of representation rather than at a single level like this example program.

It would be asking a lot to have world knowledge of what practical actions could be taken (common sense) but it would be interesting to see what is possible. Hook this up to Eliza and let it go!


No, there is no need for an SDRClassifier in this case, since we have only one input variable (the character) and TM does not care about spatial features. I can send the encoded character (encoded by a CategoryEncoder) directly into a TM without first going through an SP, which would break the spatial structure of the SDR. And since the spatial structure is still intact, I can reverse the process and extract a (fake) probability distribution by summing the bits that are 1 for each category in the SDR and then applying softmax to the sums.

Anyone is free to take, use and modify the code.

The Esperanto word for Esperanto is Esperanto. English is called Angla because it is short for La Angla Lingvo. But Esperanto doesn’t belong to any country, so it’s just Esperanto.


I totally agree that sentence generation should be done by multiple layers. However that is not possible without support for injecting a distal signal. I might be wrong, but I believe most HTM implementations don’t support it.


I need a better understanding of why you don’t need an SDRClassifier in that case.

Isn’t TM predicting the next value as the SDR formed by the predicted cells? How could you decode it? Only based on which columns have been activated? Or do you get down to the cell level?

Thanks for making a correction to that Esperanto mistake of mine.


I don’t know what the case is elsewhere, but in NuPIC.core TemporalMemory does not perform pooling on the proximal input and always has the same input/output SDR shape. (In HTM School, TM does perform pooling for some reason.)

Let’s work on a simpler case. Assume we have a CategoryEncoder which encodes 3 categories (1, 2 and 3) with 4 bits per category. The CategoryEncoder generates an SDR that is 12 bits (3 categories x 4 bits) long, where the bits corresponding to the encoded category are set to on. For example, when encoding category 1, the generated SDR is:

[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
|--1st cat--|--2nd cat--|--3rd cat--| # cat = category
# NOTE: CategoryEncoder in NuPIC will automatically create an extra 'unknown' category for
# unknown inputs. Thus when encoding 3 possible categories with 4 bits each, the SDR
# will end up being 16 bits long.
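A minimal sketch of such an encoder in plain Python (a toy illustration only, ignoring the extra 'unknown' category the real NuPIC CategoryEncoder adds; the function name is made up here):

```python
# Toy category encoder sketch (not the real NuPIC CategoryEncoder):
# each of the 3 categories gets its own contiguous block of 4 on-bits.
def encode_category(category, num_categories=3, bits_per_category=4):
    sdr = [0] * (num_categories * bits_per_category)
    start = (category - 1) * bits_per_category  # categories are 1-based
    for i in range(start, start + bits_per_category):
        sdr[i] = 1
    return sdr

print(encode_category(1))  # → [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
```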

With that in mind, let’s see how TM works. To describe it in one sentence: TemporalMemory tries to predict the next value based on the currently activated cells (context) and the connections between cells.

Assume we have a TM with 12 columns, one corresponding to each bit in the input SDR. When the input sequence (1, 2, 3) is sent into the TM, connections grow from cells in the columns representing category 1 (the first 4 columns) to cells representing category 2 (the middle 4 columns), and from category 2 to category 3. Therefore when category 1 is sent to the TM, those connections make cells in the middle columns predictive. By that we say the TM learns and predicts the pattern (1, 2, 3). Higher order memory works about the same way; it is just different sets of connections causing different cells in the same column to go into a predictive state.

TL;DR: TM learns which cells are normally active before a column turns on. So we can send an SDR into a TM and the TM will predict which columns should be on, no matter what the format of the SDR is. In broad terms: feed the TM a series of categories and it will try to predict the next category (in the original structure).
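As a very rough illustration of that idea, here is a toy first-order transition table (the real TM uses per-cell distal connections, which is what gives it higher-order context; this sketch captures only the "learn what follows what, then predict" part):

```python
from collections import defaultdict

# Toy first-order sequence memory: record which category normally
# follows which, then "predict" by looking up the learned transitions.
transitions = defaultdict(set)  # prev category -> categories seen next

def learn(sequence):
    for prev, nxt in zip(sequence, sequence[1:]):
        transitions[prev].add(nxt)

def predict(category):
    return sorted(transitions[category])

learn([1, 2, 3])
print(predict(1))  # → [2]
print(predict(2))  # → [3]
```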

So, to recover the predicted class probability distribution from the predicted SDR: first we sum up how many bits in each category are on, then apply softmax over the sums.

import math

result_sdr = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]  # assume this is the predicted SDR
#            |--1st cat--|--2nd cat--|--3rd cat--|
on_sum = [sum(result_sdr[i*4:(i+1)*4]) for i in range(3)]  # [1, 4, 0]
exps = [math.exp(s) for s in on_sum]
prob_dist = [e / sum(exps) for e in exps]  # [0.04661262, 0.93623955, 0.01714783]

No SDRClassifier is needed.

BTW, HTMHelper is simply a wrapper around NuPIC.core so I can make use of ND-arrays like in NuPIC and have a cleaner API. It eventually calls NuPIC.core to do the computation.

Edit: Fix mistakes.


Older work with HTM did consider multiple levels. The H of HTM is hierarchy.

Since that time the work has pulled back from “H” to focus on loading a single level with as much function as possible. That has brought the current “thousand brains” view as the state of the art.

Now if only these “thousand brains” could work together with hierarchy.

I think that the direction of future work will have to move back to hierarchy to see the full power of HTM blossom.

Right now there are questions regarding the connections between layers 2/3 and 4 and from what I can see this is being considered with a strong focus on the local computation. If you pull back and notice that the map to map connections use L2/3 as the connection point this may cause a rethink on how the column fits in a larger system.

More plainly - if the column is part of a larger system the function locally has to fit into this larger picture. Making more of an effort to bring in “H” will inform consideration of what is being computed locally.

I am encouraged that there is news that the elves at Numenta are considering the connections to the thalamus in the lower levels. If this widening of focus is extended to the L2/3-to-L2/3 connections all the better.


That’s very interesting work! The result might be gibberish, but I think overall it is very encouraging.

When using RNN for NLP problems, the threshold between gibberish and actual words and sentences is very small… so further tweaking might be just enough.

Also, I wondered: did you use any kind of tokenization? I assume not and I do not know if that exists for Esperanto.


I would be surprised if HTM gave me coherent stuff, as that requires higher level features that are not possible with a single layer of TM. Gibberish is expected. Though it does outperform my expectations - the sentences look more like Esperanto than I expected.

It is a character level model. So, no, no tokenization is applied.


Are there ways to access the hierarchical feature in HTM? I have seen ancient code from way back that does that, but it is unrunnable now.

Just out of curiosity… What happened that caused Numenta and the community to drop the H in HTM?


@marty1885, I was thinking about a challenge for us. What do you think about instead of a character level NLP we try to build a phoneme level NLP? Just like a baby learns a language, identifying the phonemes and then the words…

What do you think about it?


I am wondering about the same. In the long run it should come back, I think it really is a must-have feature.


Hmm, first time hearing about a phoneme level NLP…
I think they are the same in Esperanto (it would be an interesting problem for natural languages). Esperanto has a definite mapping from phonemes to characters: a is always a, d͡ʒ is always ĝ, o-i is always oj… Character level NLP is phoneme level NLP in Esperanto.

I might be wrong - I haven’t used Esperanto as much as you have. What do you think? Maybe diphthongs like oj, uj will make a difference?


You’re right. That unique mapping is one of the features that make Esperanto easy to learn.

But, for example, don’t you think that it would be easier to learn “mi mangas polmon” phoneme by phoneme than character by character? Just to normalize our understanding: what I mean by phonemes would result in just five units - mi, man, gas, pol, mon - rather than 15 characters.
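As a quick illustration, a naive syllable splitter along those lines can be sketched in a few lines of Python. This is a rough sketch only: it ignores Esperanto diphthongs (aj, oj, uj…) and the circumflex letters, and uses a simple V-CV / VC-CV split that won’t always be right:

```python
import re

# Naive syllable splitter: each syllable is (consonants)(vowel)(consonants),
# where a trailing consonant is kept only when it is not the onset of the
# next syllable. Diphthongs and ĉ/ĝ/ĥ/ĵ/ŝ/ŭ are deliberately ignored here.
SYLLABLE = re.compile(r"[^aeiou]*[aeiou][^aeiou]*?(?=[^aeiou]?[aeiou]|$)")

def syllabify(word):
    return SYLLABLE.findall(word.lower())

print([syllabify(w) for w in "mi mangas polmon".split()])
# → [['mi'], ['man', 'gas'], ['pol', 'mon']]
```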


Ahhhh… Thanks for clearing my misunderstanding.
That is a great idea! Learning by phonemes/syllables should be way easier than character by character. And a lot more natural. Sounds like a unique and fun challenge.

I’m in!


There is an older thread that I just bumped up that is very relevant to this thread.

I added a post to that thread that also applies to the current consideration of the letter/syllable/word choice.

That thread also has some mentions of the Cortical.io product. This is sort of the “next step” with language and may be worth your time to look into. They have some APIs that I have looked at but not implemented. Leveraging those tools might give you a lot of bang-for-the-buck in extending your work.


I hacked up a voting system over the weekend with 16 TMs. It is a bust.
When HTM is asked to predict what comes after Estas multajn pomo, HTM predicts Estas multaj pomo ke la viro kajn polindan sensonis al lia. It is still not predicting the -jn that it should be predicting.

Randomly generated text is still gibberish too, though it looks more Esperanto-ish than a single TM’s.

Sensovas la vestis la vo ke la viro kiu aŭteza krome la vesi paĝen la vo kaj kajn pipeza kromalpli an sensovas linduste en si peza krome la voĉo. Ian sono. Is la voj korigis linduste en si paĉi tis en sonis en sia korigis alta dan sis en ludis al lindumovoĉo. Iro kie la voĉo. Ie lindan sis la vesta dumovas la viro kajn plejo kajn pro kie lindumovas lian sis en lumo kajn por paĥvesis en sensovas li

I think we do need multiple layers to make NLP truly work. Voting helps to a degree, but we don’t get coherent stuff from a bunch of non-coherent generators.


Have you tried to do it with an RNN to see if that works?


I’m curious what your overall strategy looks like for this. I assume multiple regions trained on different data, all predicting next letter, and voting is for what will be the next letter? If so, I think this is the wrong approach.

Recall from the SMI experiments that the regions were not voting on what will be the next feature, but rather what object is being sensed. The object representation then biases the predictions in the input layers for the next feature. This system thus requires a layer to represent something more abstract than the features (letters) that it is constructed from. This layer could be representing words, phrases, or sentences for example.

There is not currently a working algorithm to form these more abstract “object” representations in an unsupervised way, though. In the experiments they chose a random SDR for the output layer to represent each object during a training phase. To replicate a similar supervised learning strategy here, you might for example choose a random SDR for each phrase or sentence during training. To move from this to an unsupervised system will require a new pooling algorithm that can learn to form these more abstract “object” representations by itself (something I and other theorists on the forum are working on).
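Sketching that supervised strategy in plain Python (hypothetical sizes; just the bookkeeping of assigning each phrase a fixed random SDR, not the pooling layer itself):

```python
import random

# Hypothetical sketch: during a supervised training phase, assign each
# phrase a fixed random sparse representation, the way the SMI experiments
# assigned a random SDR to the output layer for each object.
_rng = random.Random(42)   # fixed seed so representations are stable
_phrase_sdrs = {}

def sdr_for_phrase(phrase, size=2048, active_bits=40):
    # Lazily pick `active_bits` random on-bit indices out of `size`.
    if phrase not in _phrase_sdrs:
        _phrase_sdrs[phrase] = sorted(_rng.sample(range(size), active_bits))
    return _phrase_sdrs[phrase]

# The same phrase always maps to the same 40-of-2048 representation:
assert sdr_for_phrase("estas pomo") == sdr_for_phrase("estas pomo")
```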

Also recall from the SMI experiments that the purpose of the voting was to replace the need for multiple “touches” so that recognition could happen in a single shot. This implies that the “sensors” are observing different locations on the object (not all the same part of the object). This doesn’t apply to a single temporal stream directly (without doing something implausible like feeding the regions different parts of the sequence simultaneously). Instead you’d need to add another “modality” (another perspective of the abstract “object” which is being represented). For example you could have some streaming images of what the phrases or sentences are describing, convert those to SDRs, and run them through a TM layer that is able to vote along with the letter predictors about what phrase or sentence is being observed.


I have avoided this “supervised learning” issue on this forum as it seems to be close to a “religious or political” third rail.

When it comes to language there is no example I can point to where a system forms internal models of functional language by “just listening.” There is always a teacher.

In the dad’s song project we determined that the “sexy dad” bird sang a song that was learned as the guiding training signal. Baby bird learned to reproduce the whole song as an entity. In this model, there was no underlying model of semantics. Entire songs are the words learned. This does not help us much with language but it could be very helpful in getting from letters to words.

In the human model, we have an external teacher to pair word sounds to perceived objects and sequences. These sequences could be verbs and relationships. If you expect the model to form internal models I think it is perfectly biologically plausible to have some sort of training signal paired with the stimulus to form this internal model.

How should we train up a hierarchy to form these internal models? Most self-organizing models I have seen have been underwhelming; discovering the underlying patterns in the presented data takes massive repetition, and the result is usually brittle. What is learned is often some irrelevant feature that misses the “essential” structure of the data.

Not all.

In the “three visual streams” paper they have a bi-directional hierarchy with some primitive training on one end of the stream and the inputs on the other end. The “lower” end is the senses. In the paper, the “top” end is some facts about the physical world (object permanence, connectedness, and such) that provide weak guidance as feedback and serve as a “light at the end of the tunnel” to help form internal models. For language, this guidance could be facts about the language such as semantic structure or word pairs. Even if the entire rest of the paper is utterly useless to you, this principle is extremely valuable.

I will add that Numenta is just getting to the point where they are looking at the thalamus streams as part of the HTM canon. Work in this area could well be useful in advancing the HTM model.


The closest thing that I am aware of is semantic folding. The problem with the original algorithm (as described in YouTube videos – I don’t have any inside information) is that it requires compiling all the text snippets ahead of time and assigning them a coordinate on the semantic map.

I was resisting the urge to inject my favorite topic onto another thread again, but it is probably relevant here. I’ve implemented an alternative to the semantic folding algorithm which learns online using eligibility traces. Combining this with hex grid formation for feature binding is the route I am currently exploring for the Dad’s Song project. As you mentioned, self-labeling requires emotional flavoring as one of the “other perspectives” of the “object” being encountered.

I should also point out that the system I am describing would be for repeating phrases/sentences that it has previously learned – it wouldn’t be for generating new novel sentences. I realize that this is more in alignment with the goal of Dad’s Song than perhaps the goal of the Esperanto NLP project, so hopefully I haven’t gone too far off topic :blush: