Esperanto NLP using HTM and my findings

Right, though as readers we go from interpreting the text letter by letter as beginners, to word by word, and then clause by clause (I believe). For instance, I only find myself sounding out words that are unfamiliar to me. We come to recognize larger structures as we gain more experience, so in reality I don’t think we choose any one level of data type to stick with. This seems inevitably related to the latest HTM theory about learning objects of objects.

1 Like

I am guessing that you would fire strings of letters with a stop at the white spaces. When you have trained it so that it is good with word units, fire pairs of words with a stop.
Parsey McParseface needs access to earlier parts of the string as context - in HTM that is the same as parts of the SDR being connected to samples of ‘older’ presentations of the string. These would be zero for shorter strings.
I suppose you could repeat the process, getting longer, until you hit a period.
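
A rough sketch of that feeding scheme, just to illustrate; model.feed() and model.reset() are hypothetical stand-ins, not a real HTM API:

def feed_letters(model, text):
    # Stage 1: fire one letter at a time, with a 'stop' (reset) at white space.
    for ch in text:
        if ch.isspace():
            model.reset()
        else:
            model.feed(ch)

def feed_word_pairs(model, text):
    # Stage 2: once the model is good with word units, fire pairs of words with a stop.
    words = text.split()
    for first, second in zip(words, words[1:]):
        model.feed(first)
        model.feed(second)
        model.reset()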

1 Like

The model I developed is a character-level one. I send character after character into the model and it learns character by character. To generate a paragraph, I start by sending HTM the ’ ’ (space) character, then collect what HTM predicts, sample from the distribution, and feed the sample into HTM to generate another prediction. Rinse and repeat. And yes, this HTM has only seen this paragraph. I haven’t implemented model loading/saving in HTMHelper yet. :stuck_out_tongue:
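
A minimal sketch of that generation loop (my own illustration rather than the actual HTMHelper code; encode(), tm_predict() and the character set are hypothetical stand-ins for the real encoder and TM calls):

import numpy as np

def generate(n_chars, charset, encode, tm_predict):
    # encode(ch) returns the character's SDR,
    # tm_predict(sdr) returns a probability distribution over charset.
    text = ' '                                          # seed HTM with the space character
    for _ in range(n_chars):
        dist = tm_predict(encode(text[-1]))             # distribution for the next character
        next_ch = np.random.choice(list(charset), p=dist)  # sample, don't just take argmax
        text += next_ch                                 # feed the sample back in next round
    return text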

No, the paragraph makes no sense and is simply Esperanto-looking gibberish. I don’t even know how Google manages to translate it. Just to demonstrate: the first sentence, Ŝi vojaĝis mia domo en merikis sed ĝizilo, has a lot of errors. Ŝi vojaĝis mia domo can be translated as She travelled to my house. But Esperanto has an accusative marker, -n, and the sentence is missing al, the Esperanto equivalent of to. So in fact it should be: Ŝi vojaĝis al mian domon. Besides, AFAIK merikis and ĝizilo aren’t even words. And there are still other major problems in this sentence.

A character-level model is usually more powerful than a word/sentence-level one. (At least that’s the case in the traditional ML/DL world.)

1 Like

As amazing as what HTM has demonstrated is, it is still missing some key features of the language, especially the -n accusative marker. HTM seems to be placing -n randomly. (Well, Esperanto is very free about its word order, so it can be SVO, OVS and even VSO, etc. But most sentences are SVO, as in English, so I don’t think this is the cause.)
When I ask HTM to extend the sentence Estas multaj pomo (English: There are many apple), it should figure out that it should put -jn (plural and accusative, making the sentence There are many apples) at the end of the sentence. But instead it puts a space directly after pomo: Estas multaj pomo ke lan loriste ensonoktista ŭdikajn. Yet at the end of the prediction HTM generates ŭdikajn, which contains the -jn that it should have predicted.

I guess it is because HTM is a memory-based algorithm. It doesn’t learn how to infer anything; it memorizes patterns. For HTM to learn that -n is needed for all nouns after a verb, it has to see every noun with a -n after a word ending in -as, -is, -os, -u or -us (all possible endings for an Esperanto verb). I don’t think HTM can generalize such knowledge. (At least not with a TM. Maybe grid cells and displacement cells can help? Anyone?)

Anyway. This is a fun experiment.

@marty1885, I think HTM is correct in that sentence. Since Estas is not an action verb and multaj pomoj is the only possible subject in that sentence, I think “Estas multaj pomoj” is the right way to form that sentence. But if you try the sentence “I eat many apples”, I’m really hoping that HTM will auto-complete it into “Mi manĝas multajn pomojn”, because in that case we need the accusative -n in order to know that it’s I who am eating the apple and not the contrary, the apple eating me.

1 Like

Unfortunately no. Even with “mi manĝas multajn pomo”, HTM predicts “mi manĝas multajn pomo ke lan loriste ensonoktista ŭdikajn”.

I’m a bit disappointed with you, HTM :stuck_out_tongue:
Just kidding.
Could you inspect the probability distribution for the next character? I mean, HTM is predicting the most probable value for the next character, but there are other possible alternatives that may give us some clue about what’s happening.

You can try it yourself! I always make the code available for everyone.
JK, C++ is not for everyone. :stuck_out_tongue:

This is how many neurons are predicting each category right after pomo.
{ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 16., 0., 16.}
So… TM is predicting ’ ’ (space) and v. :(

Space is reasonable. But v? My guess is that the TM has never seen the word pomo, so it bursts, and that somehow gets us a v.

Edit:
I tried adding some noise to my input SDR but it makes no difference. HTM is still predicting ’ ’ and v. There are way more -on endings in the training material. I have no clue now.

{ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 0., 1., 0., 0., 0., 0., 1., 0., 0., 0., 0., 0., 0., 0., 0., 16., 0., 16.}

1 Like

I’m a big enthusiast of La Esperantan (despite having forsaken it for a long while) and of HTM, so if your code is available, maybe I’ll give it a try if my personal life allows me to. I worked with C++ in 2006; I used to build audible response units in C++ for a Brazilian telecommunications company.

Aren’t you using SDRClassifier? https://github.com/numenta/htm.java/blob/master/src/main/java/org/numenta/nupic/algorithms/SDRClassifier.java

Forget about it. I now see that you’ve built HTMHelper on your own and there is an SDRClassifier implemented there.

Try this data: https://www.kaggle.com/wcukierski/the-simpsons-by-the-data

Some rambling Bitking pontifications:

RNNs and HTM can both do sequences, and both can match up a sequence in progress and do auto-completion. Great if you are doing spell checking. Beyond that - meh.

That is rather limited. In the bigger picture you can imagine typing in “It was a dark and stormy night” and a terrible potboiler would spew out. I think that bad writing could be automated that way. No - I don’t think it should be - but it could.

As far as something a little more useful goes, you need a purpose to write - it has to be about something. HTM is supposed to be based on how the brain works, and it is thought that the brain works in what is sometimes called a predictive mode. I don’t know if that is really the best way to describe it. It is more about recall of learned sequence(s). This may involve bits and pieces being jammed together in a synthesis to match a novel perception.

In the H of HTM we expect that a higher-level representation is formed and fed back to the lower level, so there is recall at all levels. If a “topic” has been registered (we are talking about x), then recognition at the lower levels is guided to select fragments that are compatible with this current topic.

The canned generator should not be based on letter sequences but on parts of speech.

In sentence generation, a more interesting version would work like Parsey McParseface: discover what it is that we are talking about and construct sentences that have something to do with the discussion.

I think of this as a blackboard. I can conceive of it as holding the current nouns and the relations between them (verbs and adjectives).

The logic of generation should be distributed over several levels of representation rather than sitting at a single level as in this example program.

It would be asking a lot to have world knowledge of the practical actions that could be taken (common sense), but it would be interesting to see what is possible. Hook this up to Eliza and let it go!

1 Like

No, there is no need for an SDRClassifier in this case, since we have only one input variable (the character) and the TM does not care about spatial features. I can send the encoded character (encoded by a CategoryEncoder) directly into a TM without first going through an SP, which would break the spatial structure of the SDR. And since the spatial structure is still intact, I can reverse the process and extract a (fake) probability distribution by summing the bits that are 1 for each category in the SDR and then applying softmax to the sums.

Anyone is free to take, use and modify the code.

The Esperanto word for Esperanto is Esperanto. English is called La Angla because it is short for La Angla Lingvo. But Esperanto doesn’t belong to any country, so it’s just Esperanto.

1 Like

I totally agree that sentence generation should be done by multiple layers. However, that is not possible without support for injecting distal signals. I might be wrong, but I believe most HTM implementations don’t support it.

I need a better understanding of why you don’t need an SDRClassifier in that case.

Isn’t the TM predicting the next value as the SDR formed by the predicted cells? How could you decode it? Only based on which columns have been activated? Or do you go down to the cell level?

Thanks for making a correction to that Esperantan mistake of mine.

1 Like

I don’t know what the case is in htm.java. But in NuPIC.core, TemporalMemory does not perform pooling on the proximal input and always has the same input/output SDR shape. (In HTM School, TM does perform pooling for some reason.)

Let’s work on a simpler case. Assume we have a CategoryEncoder which encodes 3 categories (1, 2 and 3) with 4 bits per category. The CategoryEncoder then generates an SDR that is 12 bits (3 categories × 4 bits) long, where the bits corresponding to the encoded category are set to on. For example, when encoding category 1, the generated SDR is:

[1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
|-1st cat--|-2nd cat---|-3rd cat---| #cat = category
# NOTE: CategoryEncoder in NuPIC will automatically create an extra category 'unknown' for
# unknown inputs. Thus when encoding 3 possible categories with 4 bits each, the SDR
# will end up being 16 bits long.
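
Just for illustration, a minimal pure-Python stand-in for this encoding (not NuPIC's actual CategoryEncoder, which also handles the extra 'unknown' category):

def encode_category(index, num_categories=3, bits_per_category=4):
    # index is 0-based: 0 -> category 1, 1 -> category 2, ...
    sdr = [0] * (num_categories * bits_per_category)
    start = index * bits_per_category
    for i in range(start, start + bits_per_category):
        sdr[i] = 1
    return sdr

encode_category(0)  # [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0], i.e. category 1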

With that in mind, let’s see how the TM works. To describe it in one sentence: TemporalMemory tries to predict the next value based on the currently active cells (the context) and the connections between cells.

Assume we have a TM with 12 columns, corresponding to every bit in the input SDR. When the input sequence (1, 2, 3) is sent into the TM, connections grow from cells in the columns representing category 1 (the first 4 columns) to cells in the columns representing category 2 (the middle 4 columns), and from category 2 to category 3. Therefore, when category 1 is sent to the TM, those connections make cells in the middle columns predictive. By that we say the TM learns and predicts the pattern (1, 2, 3). Higher-order memory works in much the same way; it is just different sets of connections causing different cells in the same columns to go into a predictive state.

TL;DR: the TM learns which cells are normally active before a column turns on. So we can send an SDR into a TM and it will predict which columns should be on next, no matter what the format of the SDR is. In broad terms: feed the TM a series of categories and it will try to predict the next category (and the prediction comes back in the original encoded structure).
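
To make that concrete, here is a small sketch using NuPIC’s Python TemporalMemory (not HTMHelper; the parameters are shrunk so that 4-bit categories can form predictions, so treat it as an illustration rather than my exact setup):

from nupic.algorithms.temporal_memory import TemporalMemory

tm = TemporalMemory(columnDimensions=(12,), cellsPerColumn=8,
                    activationThreshold=3, minThreshold=2,
                    initialPermanence=0.55, maxNewSynapseCount=4)

# Columns occupied by each category's 4 bits.
columns = {1: [0, 1, 2, 3], 2: [4, 5, 6, 7], 3: [8, 9, 10, 11]}

for _ in range(10):                        # repeat the sequence (1, 2, 3) a few times
    for category in (1, 2, 3):
        tm.compute(columns[category], learn=True)
    tm.reset()

tm.compute(columns[1], learn=False)        # show category 1 ...
predicted_columns = {tm.columnForCell(c) for c in tm.getPredictiveCells()}
print(predicted_columns)                   # expect {4, 5, 6, 7}, i.e. category 2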

So, to recover the predicted class probability distribution from the predicted SDR, we first sum up how many bits in each category are on, then apply softmax over the sums.

result_sdr = [0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0] #Assuming this is the predicted SDR.
#representing|-1st cat--|-2nd cat---|-3rd cat---|
on_sum = sum_category(result_sdr) #[1, 4, 0]
prop_dist = softmax(on_sum) #[0.04661262, 0.93623955, 0.01714783]
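
The two helpers are nothing fancy. A possible implementation (my sketch, not code taken from HTMHelper):

import numpy as np

def sum_category(sdr, num_categories=3, bits_per_category=4):
    # Count the on bits inside each category's block of the SDR.
    bits = np.asarray(sdr).reshape(num_categories, bits_per_category)
    return bits.sum(axis=1)                  # e.g. [1, 4, 0]

def softmax(x):
    e = np.exp(x - np.max(x))                # subtract the max for numerical stability
    return e / e.sum()

softmax(sum_category([0, 1, 0, 0, 1, 1, 1, 1, 0, 0, 0, 0]))
# -> approximately [0.0466, 0.9362, 0.0171], matching the numbers above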

No SDRClassifier is needed.

BTW, HTMHelper is simply a wrapper around NuPIC.core so I can make use of ND-arrays like in NuPIC and have a cleaner API. It eventually calls NuPIC.core to do the computation.

Edit: Fix mistakes.

3 Likes

Older work with HTM did consider multiple levels. The H of HTM is hierarchy.

Since that time the work has pulled back from “H” to focus on loading a single level with as much function as possible. That has brought the current “thousand brains” view as the state of the art.

Now if only these “thousand brains” could work together with hierarchy.

I think that the direction of future work will have to move back to hierarchy to see the full power of HTM blossom.

Right now there are questions regarding the connections between layers 2/3 and 4, and from what I can see this is being considered with a strong focus on the local computation. If you pull back and notice that the map-to-map connections use L2/3 as the connection point, this may cause a rethink of how the column fits into a larger system.

More plainly - if the column is part of a larger system, the local function has to fit into this larger picture. Making more of an effort to bring in the “H” will inform consideration of what is being computed locally.

I am encouraged by the news that the elves at Numenta are considering the connections to the thalamus in the lower levels. If this widening of focus is extended to the L2/3-to-L2/3 connections, all the better.

3 Likes

That’s very interesting work! The result might be gibberish, but I think overall it is very encouraging.

When using RNN for NLP problems, the threshold between gibberish and actual words and sentences is very small… so further tweaking might be just enough.

Also, I wondered: did you use any kind of tokenization? I assume not and I do not know if that exists for Esperanto.

1 Like

I would be surprised if HTM gave me coherent output, as that requires higher-level features that are not possible with a single layer of TM. Gibberish is expected. Though it does outperform my expectations: the sentences look more like Esperanto than I expected.

It is a character-level model. So no, no tokenization is applied.

1 Like

Are there ways to access the hierarchical features of HTM? I have seen ancient code from way back that does that, but it is unrunnable now.

Just out of curiosity… what happened that caused Numenta and the community to drop the H in HTM?

1 Like