Hello all,
I have recently been experimenting with htm.core, trying to train an HTM network to learn the logic behind a corpus of sentences. Currently, the sentences all follow this template:
“If A is in X, where is A?”
Here is my training data:
[
{"input": "Bob is in China. Where is Bob? ", "output": "China"},
{"input": "Ronald is in Vietnam. Where is Ronald? ", "output": "Vietnam"},
{"input": "Tracy is in Korea. Where is Tracy? ", "output": "Korea"},
{"input": "Erick is in Indonesia. Where is Erick? ", "output": "Indonesia"},
{"input": "Jonathan is in Germany. Where is Jonathan? ", "output": "Germany"},
{"input": "Joseph is in Russia. Where is Joseph? ", "output": "Russia"},
{"input": "Lila is in Bulgaria. Where is Lila? ", "output": "Bulgaria"}
]
Here is my testing data (learning off):
[
{"input": "Where is Joseph? ", "output": "Russia"},
{"input": "Where is Jonathan? ", "output": "Germany"},
{"input": "Bob is in China. Where is Bob? ", "output": "China"}
]
I am attempting to train the model to predict X, but the results have not been satisfactory. I am using a custom classifier, which I will admit I vibe coded (I am in high school and still very new to ML and cortical learning), to predict the appropriate token given the current active columns of the network's base layer. I have tried two approaches: word-level tokenization with nltk, and simply feeding the network character by character.
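In case it helps, here is a stripped-down sketch of the idea behind my classifier. This is an illustration, not my actual code: it just counts which column indices co-occur with each output token during training, then predicts the token whose learned columns overlap most with the current active columns.

```python
from collections import defaultdict

class OverlapClassifier:
    """Toy token classifier for SDR-like inputs.

    learn() accumulates, per token, a vote count for each active
    column index seen alongside that token. predict() returns the
    token whose vote table overlaps most with the current set of
    active columns. (Simplified stand-in for my real classifier,
    which reads the HTM layer's active columns.)
    """

    def __init__(self):
        # token -> {column index -> vote count}
        self.votes = defaultdict(lambda: defaultdict(int))

    def learn(self, active_columns, token):
        for col in active_columns:
            self.votes[token][col] += 1

    def predict(self, active_columns):
        best_token, best_score = None, -1
        for token, col_votes in self.votes.items():
            score = sum(col_votes.get(c, 0) for c in active_columns)
            if score > best_score:
                best_token, best_score = token, score
        return best_token
```

The real version sits on top of the temporal memory, so the columns it sees are context-dependent, not raw input encodings.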
My first approach uses the HTM REST API, which I modified slightly to get around some safeguards that were preventing me from accessing parts of the HTM network's data. My second approach uses the Python HTM bindings; this is where I have been trying character-by-character tokenization, which has worked better.
I've spent weeks trying to figure this out. The REST-based, word-level approach simply refuses to learn either the training or the testing sequences, while the Python, character-level approach does successfully learn character-level predictions. I would have thought the same setup would scale up to word-level tokens.
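For the character-by-character approach, each character needs its own sparse binary encoding before going into the spatial pooler. My real code uses htm.core's encoders, but the idea is roughly this (a simplified, self-contained stand-in; the sizes and the hashing scheme here are made up for illustration):

```python
import hashlib

def encode_char(ch, size=1024, active_bits=20):
    """Map one character to a fixed set of `active_bits` indices
    out of `size` total bits -- a toy stand-in for an htm.core
    encoder. The same character always yields the same indices,
    so the spatial pooler sees a stable representation."""
    indices = set()
    counter = 0
    while len(indices) < active_bits:
        h = hashlib.sha256(f"{ch}:{counter}".encode()).digest()
        indices.add(int.from_bytes(h[:4], "big") % size)
        counter += 1
    return sorted(indices)

# Feeding a sentence one character at a time:
for ch in "Where is Bob? ":
    sdr_indices = encode_char(ch)
    # ...then sp.compute(...) and tm.compute(...) on each step
```

With this setup the temporal memory only ever sees one character of context per step, which may be part of why the word-level structure is hard for it to capture.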
Interestingly, the word-level approach does seem to be learning something, since the anomaly score converges, but the best prediction is consistently wrong: looking at the training log, it appears to output the answer from the previous training task, as if it were off by one.
Here is a graph of the anomaly from one of my tests with the REST approach:
Ignore the red and green; those are for the second layer. My end goal is to use apical feedback to learn higher-level patterns.
The histogram displays the anomalies for the testing data.
However, here is an example of predictions during training after several epochs (the word below each task is the model's prediction):
Epoch 9: New training task: Joseph is in Russia . Where is Joseph ? Russia
Germany
Epoch 9: New training task: Lila is in Bulgaria . Where is Lila ? Bulgaria
Russia
Epoch 9: New training task: Bob is in China . Where is Bob ? China
Bulgaria
Epoch 9: New training task: Ronald is in Vietnam . Where is Ronald ? Vietnam
China
Epoch 9: New training task: Tracy is in Korea . Where is Tracy ? Korea
Vietnam
Epoch 9: New training task: Erick is in Indonesia . Where is Erick ? Indonesia
Korea
And the final predictions:
Epoch 9: New testing task: Where is Joseph ? Russia
Bulgaria
Epoch 9: New testing task: Where is Jonathan ? Germany
Bulgaria
Epoch 9: New testing task: Bob is in China . Where is Bob ? China
Bulgaria
Meanwhile, in the second experiment, the model does seem to be assembling word-like fragments from characters, just not the right ones. Here are the final predictions from that experiment:
Where is Joseph?
onesia. Wh
Where is Jonathan?
esia. Wher
Bob is in China. Where is Bob?
d? Bd? Bd?
Here is the code for the first, REST-based approach:
And here is the code for the Python bindings approach:
Thank you for your time!