Using HTM for logical deduction/induction and prediction of sentences

Hello all,

I have recently been experimenting with the htm.core code to train an HTM network to learn the logic behind a corpus of sentences. Currently, the sentences look like this:

“If A is in X, where is A?”

Here is my training data:

[
    {"input": "Bob is in China. Where is Bob? ", "output": "China"},
    {"input": "Ronald is in Vietnam. Where is Ronald? ", "output": "Vietnam"},
    {"input": "Tracy is in Korea. Where is Tracy? ", "output": "Korea"},
    {"input": "Erick is in Indonesia. Where is Erick? ", "output": "Indonesia"},
    {"input": "Jonathan is in Germany. Where is Jonathan? ", "output": "Germany"},
    {"input": "Joseph is in Russia. Where is Joseph? ", "output": "Russia"},
    {"input": "Lila is in Bulgaria. Where is Lila? ", "output": "Bulgaria"}
]

Here is my testing data (learning off):

[
    {"input": "Where is Joseph? ", "output": "Russia"},
    {"input": "Where is Jonathan? ", "output": "Germany"},
    {"input": "Bob is in China. Where is Bob? ", "output": "China"}
]

And I am attempting to train the model to predict X. However, the results have not been satisfactory. I am using a custom classifier, which I'll admit I vibe coded (I'm in high school and still very new to ML and cortical learning), to predict the appropriate token given the current active columns of the network's base layer. I have tried two approaches: word-level tokenization using nltk, and simply feeding the network character by character.
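In spirit, the classifier does something like the following minimal tally-based sketch (this is the idea, not my actual code — `TallyClassifier` and its methods are illustrative names; htm.core's built-in `Predictor` does the same job with proper probability smoothing):

```python
from collections import defaultdict

class TallyClassifier:
    """Minimal sketch: each active cell accumulates a vote count per token
    it has co-occurred with; inference sums the votes of the currently
    active cells and returns the token with the highest total."""

    def __init__(self):
        # cell index -> token -> co-occurrence count
        self.votes = defaultdict(lambda: defaultdict(float))

    def learn(self, active_cells, token):
        for cell in active_cells:
            self.votes[cell][token] += 1.0

    def infer(self, active_cells):
        totals = defaultdict(float)
        for cell in active_cells:
            for token, count in self.votes[cell].items():
                totals[token] += count
        return max(totals, key=totals.get) if totals else None

clf = TallyClassifier()
clf.learn({1, 5, 9}, "China")
clf.learn({2, 5, 7}, "Russia")
print(clf.infer({1, 9}))  # → China (these cells were only seen with "China")
```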

My first approach uses the HTM REST API, which I modified slightly to get around some safeguards that were preventing me from accessing parts of the HTM network's data. My second approach uses the Python HTM bindings, and this is where I have been trying character-level tokenization, which has worked better.

I've spent weeks trying to figure this out. The REST-based, word-level approach simply refuses to learn either the training or the testing sequences, while the Python, character-level approach does successfully learn character-level predictions. I would have expected that to scale up to word-level tokens.
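For the character-by-character feeding, each character needs a stable sparse encoding. Something like this sketch captures the idea (a seeded-RNG stand-in for a real encoder, not htm.core's actual encoder classes):

```python
import random

def char_sdr(ch, size=512, active=16):
    """Deterministic sparse code for one character: seed an RNG with the
    character itself, so the same character always maps to the same bits
    while different characters get (almost always) different bits."""
    rng = random.Random(f"char-sdr:{ch}")
    return sorted(rng.sample(range(size), active))

# Same character -> identical encoding; different characters -> low overlap.
assert char_sdr('a') == char_sdr('a')
print(len(set(char_sdr('a')) & set(char_sdr('b'))))
```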

Interestingly, the word-level approach does seem to be predicting plausible tokens as the anomaly converges, but the best prediction is a different word than the expected one.

Here is a graph of the anomaly from one of my tests with the REST approach:

Ignore the red and green; those are for the second layer. My end goal is to use apical feedback to learn higher-level patterns.

The histogram displays the anomalies for the testing data.

However, here is an example of predictions during training after several epochs (the word below each task line is the model's prediction):

Epoch 9: New training task: Joseph is in Russia . Where is Joseph ? Russia

Germany

Epoch 9: New training task: Lila is in Bulgaria . Where is Lila ? Bulgaria

Russia

Epoch 9: New training task: Bob is in China . Where is Bob ? China

Bulgaria

Epoch 9: New training task: Ronald is in Vietnam . Where is Ronald ? Vietnam

China

Epoch 9: New training task: Tracy is in Korea . Where is Tracy ? Korea

Vietnam

Epoch 9: New training task: Erick is in Indonesia . Where is Erick ? Indonesia

Korea

And the final predictions:

Epoch 9: New testing task: Where is Joseph ? Russia

Bulgaria

Epoch 9: New testing task: Where is Jonathan ? Germany

Bulgaria

Epoch 9: New testing task: Bob is in China . Where is Bob ? China

Bulgaria

Meanwhile, for the second experiment, the model seems to be learning words from characters, although not the right ones. Here are its final predictions:

Where is Joseph?
onesia. Wh



Where is Jonathan?
esia. Wher



Bob is in China. Where is Bob?
d? Bd? Bd?

Here is the code for the first, REST-based approach:

And here is the code for the Python bindings approach:

Thank you for your time!


Also, here is the classifier:


That’s a pretty neat project!

It does look like there is an off-by-one error: the REST API results are predicting the previous response.

You seem to be experimenting with two layers of HTMs, so I will share my findings on that topic too: Video Lecture of Kropff & Treves, 2008


I have a hypothesis that the network may be predicting the next tokens correctly (evident in the low anomaly score), but that it isn't inhibiting connections for the other possibilities. As a result, with every training example the predictions become slightly biased in favor of the current correct token, which causes an incorrect response for the next prediction.

Could this be the case? Or does HTM simply struggle with learning multiple sequences? I'll try putting all the sentences into one long sequence to see if that improves anything.
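One knob that seems relevant here: htm.core's TemporalMemory has a `predictedSegmentDecrement` parameter that punishes segments which predicted activity that never arrived, which is exactly the kind of inhibition of wrong alternatives I'm describing. A sketch of the constructor call, assuming htm.core is installed (parameter values are illustrative, not tuned):

```python
from htm.bindings.algorithms import TemporalMemory

# Parameter names follow htm.core's TemporalMemory; the values here are
# illustrative starting points, not a recommended configuration.
tm = TemporalMemory(
    columnDimensions=(2048,),
    cellsPerColumn=16,
    permanenceIncrement=0.10,         # reinforce synapses on correct segments
    permanenceDecrement=0.05,         # weaken active synapses on wrong segments
    predictedSegmentDecrement=0.004,  # punish segments that predicted but were wrong
)
```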


An HTM should be able to do this task. By the time it’s done training, there should be one anomaly at the very start, and none thereafter.

I have not looked closely at your code, but I assume that your classifier is using the final state of the TM to predict the next token? I’m not sure how you’re getting an output out of the HTM.

That’s a great idea! Resetting the system is unnatural. Animals only reset their brains when they sleep. The HTM should not need to be reset in between runs either. I would also recommend shuffling the samples into a new order in between each epoch, so that it doesn’t merge into one big sequence. This will cause anomalies, and if you plot the anomalies over time they should coincide with the transitions between sequences.
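In rough Python, the loop I am suggesting looks like this (`Recorder` and `feed` are hypothetical stand-ins for a real encode → spatial pooler → temporal memory pipeline):

```python
import random

def train(model, samples, epochs, seed=0):
    """Sketch of the suggested loop: never reset between samples, but
    shuffle the sample order each epoch so the network cannot simply
    memorize one fixed concatenation of all sequences."""
    rng = random.Random(seed)
    order = list(samples)
    for _ in range(epochs):
        rng.shuffle(order)
        for sample in order:
            for token in sample:
                model.feed(token)  # note: no model.reset() anywhere

class Recorder:
    """Stand-in model that just records the tokens it is fed."""
    def __init__(self):
        self.seen = []
    def feed(self, token):
        self.seen.append(token)

rec = Recorder()
train(rec, [["a", "b"], ["c", "d"]], epochs=2)
print(len(rec.seen))  # → 8 (2 epochs x 4 tokens)
```

If you then plot the anomaly over time, the spikes should line up with the (shuffled) transitions between sequences.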


Sorry for my rather late response, I’ve been swamped with schoolwork…

I've given the idea of not resetting the network between training examples a shot, but that doesn't seem to have done the trick. I suspect my REST experiment may be configured suboptimally or may have a timing error, because according to my pyplot visualization the active cells are all the same for the final token in my training examples (I removed the question marks). I'm not tokenizing the spaces, so I assume it's a silly bug of some kind…

Anyhow, what are your thoughts on using the anomaly score with IRT (Item Response Theory) to attempt an "optimal" training system? I'm thinking about implementing my own logic for eligibility tracing to propagate the error signal backwards, and I'll probably steer away from the REST API for now, given how much easier the pure Python bindings are (I still want to figure out the REST bug, however).

Finally, what are your thoughts on the role of the hippocampus in predictive coding? Does the hippocampus depolarize/bias the temporal layers in the same way as apical connections? How might one implement a hippocampal “cache” of sorts, and then use some sort of replay for further training? Curious if you might have anything I could look at in that domain.

I looked at your grid cell work, and I find the use of high-pass and low-pass filters intriguing, although I'm personally curious to see how leaky integration might work, taking a page out of the SNN playbook.


I ran another experiment, and it seems as if the network is encoding the final word/name fed to it identically (regardless of what the name is), even though the first occurrence of each name is encoded differently.

Shouldn't the HTM network encode a token to the same columns regardless of context? Obviously the cells within those columns should differ, but I suspect the answer MUST be some annoying bug.


Thanks for sharing! As one of the developers, it's really interesting to hear about other people's experiences using the htm.core software, as well as programming HTMs in general. It sounds like there is a bug in the REST API version of your program, or possibly in the REST API or htm.core itself, and I cannot help you with that. I also prefer to use the Python API.

If you encode the same token twice, it should yield the same encoding. The minicolumns should also be the same, unless the spatial pooler is learning, in which case they might differ a little; even then I would still expect them to have a high overlap.
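A quick way to sanity-check this is to log the overlap between the column sets produced for the same token at different points in the sequence. A small helper, working on plain lists of active indices rather than htm.core SDR objects:

```python
def overlap(a, b):
    """Fraction of shared active bits between two sparse encodings,
    given as lists or sets of active column indices."""
    a, b = set(a), set(b)
    return len(a & b) / max(len(a), len(b), 1)

# Identical column sets -> 1.0; partially shared columns -> proportionally lower.
print(overlap([1, 2, 3, 4], [1, 2, 3, 4]))  # → 1.0
print(overlap([1, 2, 3, 4], [1, 2, 8, 9]))  # → 0.5
```

If this comes out near 1.0 for *different* names, the bug is upstream of the temporal memory, in the encoder or spatial pooler step.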

I recommend reading Florian Fiebig's PhD thesis, "Active Memory Processing on Multiple Time-scales in Simulated Cortical Networks with Hebbian Plasticity." Don't be intimidated by the length; it starts with a long introduction to the topic that is very helpful but can be skipped if you're confident you already know that material. Fiebig then summarizes three papers that he published; paper #1 is about memory, sleep, and the hippocampus.