Esperanto NLP using HTM and my findings

I am wondering about the same thing. In the long run it should come back; I think it really is a must-have feature.

Hmm, first time hearing about a phoneme level NLP…
I think they are the same in Esperanto (it would be an interesting problem for natural languages). Esperanto has a definite mapping from phonemes to characters: a is always a, d͡ʒ is always ĝ, o-i is always oj… Character-level NLP is phoneme-level NLP in Esperanto.

I might be wrong. I haven’t used Esperanto as much as you have. What do you think? Maybe diphthongs like oj, uj will make a difference?

You’re right. That unique mapping is one of the features that make Esperanto easy to learn.

But, for example, don’t you think that it would be easier to learn “mi mangas polmon” phoneme by phoneme than character by character? Just to align our understanding: what I mean by phoneme (strictly speaking, a syllable) would result in just five units: mi, man, gas, pol, mon, rather than 14 characters.
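
To make the idea concrete, here is a minimal sketch of such a splitter. It is only a heuristic under my own assumptions (the names `syllabify` and `tokenize` are made up for illustration): every vowel is a syllable nucleus, and when consonants sit between two vowels, the last consonant opens the next syllable. It is not a complete Esperanto syllabification rule.

```python
VOWELS = set("aeiou")  # Esperanto vowels; j and ŭ are treated as consonants here


def syllabify(word):
    """Split a lowercase word into rough syllables (heuristic, see assumptions above)."""
    nuclei = [i for i, ch in enumerate(word) if ch in VOWELS]
    syllables, start = [], 0
    for prev, nxt in zip(nuclei, nuclei[1:]):
        # Adjacent vowels: cut right before the second vowel.
        # Otherwise the last consonant of the cluster opens the next syllable.
        cut = nxt if nxt - prev == 1 else nxt - 1
        syllables.append(word[start:cut])
        start = cut
    syllables.append(word[start:])
    return syllables


def tokenize(sentence):
    """Turn a sentence into a flat list of syllable tokens."""
    return [s for word in sentence.lower().split() for s in syllabify(word)]


print(tokenize("mi mangas polmon"))  # ['mi', 'man', 'gas', 'pol', 'mon']
```

Each distinct syllable would then get its own encoding, so the sequence memory sees 5 steps for this sentence instead of one step per letter.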

Ahhhh… Thanks for clearing my misunderstanding.
That is a great idea! Learning by phonemes/syllables should be way easier than character by character, and a lot more natural. Sounds like a unique and fun challenge.

I’m in!

There is an older thread that I just bumped up that is very relevant to this thread.

I added a post to that thread that also applies to the current consideration of the letter/syllable/word choice.

That thread also has some mentions of the Cortical.io product. This is sort of the “next step” with language and may be worth your time to look into. They have some APIs that I have looked at but not implemented. Leveraging those tools might give you a lot of bang-for-the-buck in extending your work.


I hacked up a voting system during the weekends with 16 TMs. It is a bust.
When HTM is asked to predict what comes after “Estas multajn pomo”, it predicts “Estas multaj pomo ke la viro kajn polindan sensonis al lia”. It is still not predicting the -jn that it should be predicting.

Randomly generated text is still gibberish too, though it looks more Esperanto-ish than with a single TM.

Sensovas la vestis la vo ke la viro kiu aŭteza krome la vesi paĝen la vo kaj kajn pipeza kromalpli an sensovas linduste en si peza krome la voĉo. Ian sono. Is la voj korigis linduste en si paĉi tis en sonis en sia korigis alta dan sis en ludis al lindumovoĉo. Iro kie la voĉo. Ie lindan sis la vesta dumovas la viro kajn plejo kajn pro kie lindumovas lian sis en lumo kajn por paĥvesis en sensovas li

I think we do need multiple layers to make NLP truly work. Voting helps to a degree, but we don’t get coherent output from a bunch of incoherent generators.


Have you tried to do it with an RNN to see if that works?

I’m curious what your overall strategy looks like for this. I assume multiple regions trained on different data, all predicting next letter, and voting is for what will be the next letter? If so, I think this is the wrong approach.

Recall from the SMI experiments that the regions were not voting on what will be the next feature, but rather what object is being sensed. The object representation then biases the predictions in the input layers for the next feature. This system thus requires a layer to represent something more abstract than the features (letters) that it is constructed from. This layer could be representing words, phrases, or sentences for example.

There is not currently a working algorithm to form these more abstract “object” representations in an unsupervised way, though. In the experiments they chose a random SDR for the output layer to represent each object during a training phase. To replicate a similar supervised learning strategy here, you might for example choose a random SDR for each phrase or sentence during training. To move from this to an unsupervised system will require a new pooling algorithm that can learn to form these more abstract “object” representations by itself (something I and other theorists on the forum are working on).
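
As a rough illustration of that supervised strategy, the sketch below assigns a fixed random SDR to each training sentence. The sizes (2048 bits, 40 active), the helper names, and the example sentences are my own assumptions for illustration, not values from the SMI experiments.

```python
import random


def random_sdr(size, active, rng):
    """Return a random SDR as a frozenset of active bit indices."""
    return frozenset(rng.sample(range(size), active))


def assign_sdrs(labels, size=2048, active=40, seed=42):
    """Give every label (phrase or sentence) its own fixed random SDR."""
    rng = random.Random(seed)  # seeded so the assignment is reproducible
    return {label: random_sdr(size, active, rng) for label in labels}


def overlap(a, b):
    """Number of active bits two SDRs share."""
    return len(a & b)


# Hypothetical training sentences; each gets one output-layer representation.
sentences = ["estas multaj pomoj", "la viro legas"]
targets = assign_sdrs(sentences)
for sentence, sdr in targets.items():
    print(sentence, len(sdr))
```

During training the output layer would be clamped to `targets[sentence]` while the letters of that sentence stream through the input layer. Because the SDRs are random, two similar sentences get unrelated representations, which is one reason this stays a supervised stopgap.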

Also recall from the SMI experiments that the purpose of the voting was to replace the need for multiple “touches” so that recognition could happen in a single shot. This implies that the “sensors” are observing different locations on the object (not all the same part of the object). This doesn’t apply to a single temporal stream directly (without doing something implausible like feeding the regions different parts of the sequence simultaneously). Instead you’d need to add another “modality” (another perspective of the abstract “object” which is being represented). For example you could have some streaming images of what the phrases or sentences are describing, convert those to SDRs, and run them through a TM layer that is able to vote along with the letter predictors about what phrase or sentence is being observed.

I have avoided this “supervised learning” issue on this forum as it seems to be close to a “religious or political” third rail.

When it comes to language there is no example I can point to where a system forms internal models of functional language by “just listening.” There is always a teacher.

In the dad’s song project we determined that the “sexy dad” bird sang a song that was learned as the guiding training signal. Baby bird learned to reproduce the whole song as an entity. In this model, there was no underlying model of semantics. Entire songs are the words learned. This does not help us much with language but it could be very helpful in getting from letters to words.

In the human model, we have an external teacher to pair word sounds to perceived objects and sequences. These sequences could be verbs and relationships. If you expect the model to form internal models I think it is perfectly biologically plausible to have some sort of training signal paired with the stimulus to form this internal model.

How should we train up a hierarchy to form these internal models? Most self-organizing models I have seen have been underwhelming: they need massive repetition to discover the underlying structure in the presented data, and the result is usually brittle. What is learned is often some irrelevant features that miss the “essential” structure of the data.

Not all.

In the “three visual streams” paper they have a bi-directional hierarchy with some primitive training on one end of the stream and the inputs on the other end. The “lower” end is the senses. In the paper, the “top” end is some facts about the physical world (object permanence, connectedness, and such) that provide feedback. The weak guidance on the “top” end serves as a “light at the end of the tunnel” to help form internal models. For language, this guidance could be facts about the language such as semantic structure or word pairs. Even if the entire rest of the paper is utterly useless to you, this principle is extremely valuable.

I will add that Numenta is just getting to the point where they are looking at the thalamus streams as part of the HTM canon. Work in this area could well be useful in advancing the HTM model.

The closest thing that I am aware of is semantic folding. The problem with the original algorithm (as described in YouTube videos – I don’t have any inside information) is that it requires compiling all the text snippets ahead of time and assigning them a coordinate on the semantic map.

I was resisting the urge to inject my favorite topic onto another thread again, but it is probably relevant here. I’ve implemented an alternative to the semantic folding algorithm which learns online using eligibility traces. Combining this with hex grid formation for feature binding is the route I am currently exploring for the Dad’s Song project. As you mentioned, self-labeling requires emotional flavoring as one of the “other perspectives” of the “object” being encountered.

I should also point out that the system I am describing would be for repeating phrases/sentences that it has previously learned – it wouldn’t be for generating new novel sentences. I realize that this is more in alignment with the goal of Dad’s Song than perhaps the goal of the Esperanto NLP project, so hopefully I haven’t gone too far off topic :blush:


Cortical.io builds an SOM with the corpus of snippets. Nothing particularly fancy. I don’t know if it is possible to use an SOM as you are populating it. This is a technology that I have been wanting to implement in HTM but my Neural Workbench project is getting the love at this time.

I have read about affine transformations and see that it may be possible to adapt that to the SOM formation, but at this point it is one of a basket of projects on the back burner. That burner is getting very full.

As an AI hobbyist with too many projects and not enough time I have to say: having to work at a day job to pay the bills sucks.

This is exactly my problem. In my case, it doesn’t help that I have a number of other competing hobbies as well :smile:




I’ve found HTM to be well suited for evolving representations, especially if the changes are gradual. Sudden shifts don’t take long to recover from either, as long as they aren’t happening too frequently, or stability increases over time. As bits in a representation are added/dropped, the synaptic connections are updated. This is the same behavior you see in anomaly detection with learning enabled. You get some bursting in the following input depending on how large the changes are, then those changes are learned.
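
Here is a toy sketch of that adaptation for a single dendritic segment. It is a simplified Hebbian-style permanence update, not the full TM learning rule, and the function names, increment/decrement values, and the 0.5 connection threshold are assumptions chosen for illustration.

```python
def learn(permanences, active_bits, perm_inc=0.1, perm_dec=0.05):
    """Strengthen synapses to active bits, weaken the rest (simplified rule)."""
    updated = {}
    for bit in set(permanences) | set(active_bits):
        p = permanences.get(bit, 0.0)
        p = p + perm_inc if bit in active_bits else p - perm_dec
        p = min(1.0, max(0.0, p))  # keep permanence inside [0, 1]
        if p > 0.0:                # synapses that decay to zero are pruned
            updated[bit] = p
    return updated


def connected(permanences, threshold=0.5):
    """Bits whose synapses are strong enough to count as connected."""
    return {bit for bit, p in permanences.items() if p >= threshold}


# The segment initially represents bits {1, 2, 3}.
segment = {1: 0.6, 2: 0.6, 3: 0.6}

# The representation drifts: bit 1 is dropped and bit 4 is added.
for _ in range(10):
    segment = learn(segment, active_bits={2, 3, 4})

print(sorted(connected(segment)))  # [2, 3, 4]
```

With a small `perm_dec` the old bit fades over several presentations rather than at once, which matches the short burst of relearning described above.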

Good point – I suppose what I really meant is that the level of manual interaction from a human teacher during the training process can be significantly reduced with a good auto-encoder/SOM, versus having to manually generate an SDR for every high-level concept, such as every sentence (especially if you want those SDRs to capture semantic similarities between similar sentences). Some form of hierarchy is a necessary component for this (the interaction between an HTM “input layer” and “output layer” is a simple two-level hierarchy)

I trained multiple regions with the same input and slightly different parameters (different seed, permInc, permDec, etc…) and let them all predict the future, forming an ensemble. Maybe this is not the right way.

I’m working on it… Let’s just say PyTorch, although powerful, is tricky to handle.
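
For reference, the voting step of such an ensemble can be sketched as below. This only shows how per-model scores might be combined; the scores are made-up numbers standing in for the output of the individual TMs, and `vote` is a hypothetical helper, not part of any HTM library.

```python
from collections import Counter


def vote(predictions):
    """Pick the next character by summing each model's scores per character."""
    totals = Counter()
    for scores in predictions:
        totals.update(scores)  # Counter.update adds the values key by key
    return totals.most_common(1)[0][0]


# Hypothetical next-character scores from three ensemble members.
predictions = [
    {"j": 0.6, " ": 0.4},
    {"n": 0.5, "j": 0.5},
    {" ": 0.7, "j": 0.3},
]
print(repr(vote(predictions)))  # 'j'
```

Since every member sees the same stream, the vote can only smooth out noise between similar predictors; it does not add the more abstract context discussed earlier in the thread.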


Initially I tried to generate Esperanto using LSTMs and PyTorch. But the network ended up predicting ’ ’ spaces all the time…
So I tried to generate Esperanto using the same approach this post uses.

But again the network is predicting spaces all the time. I can’t reproduce the author’s result.

My PyTorch notebook

And the Keras notebook (code from the linked post)


Try fastai… they have a lesson about RNNs and next-letter prediction (for English, but they do it more or less from scratch). They got pretty good results very quickly.

Check their other lessons, too, I am not sure I got the right one.

From a philosophical point of view, RNNs are the closest “traditional neural network” to HTM in terms of what they do and how you structure problems.

As an HTM hobbyist it may be very helpful to become familiar with RNN technology. One of the areas where RNNs have seen success is NLP (Natural Language Processing). I don’t think it is necessary to mindlessly copy what RNNs are doing; treat them as a way to identify and think about the tasks in language processing when using a predictive system.

I just got these in my latest dispatch from my Medium daily digest:

But really - this stuff is everywhere; Google “RNN nlp” for a cornucopia of information on this topic.
Perhaps an overwhelming amount. That said - I would rather have too much than too little.

A pointer to posts in a related thread:



One of the main differences between RNN and TM is that an RNN is a universal function approximator, while TM is basically a series autoencoder in a box. RNNs have a huge advantage in that they can learn hierarchical meta-representations of sequences by stacking RNNs together and training through backprop, while in HTM we are stuck with one single layer of TM. (Although HTM is way more sample efficient than RNN.)

I think HTM needs some automatic meta-learning method to compete with neural networks.


A post was merged into an existing topic: H is for Hierarchy, but where is it?