I have been following Numenta and its technology (I would rather say its philosophy, and I too believe in it) since 2017. After 5 years, and after working with DL for my research in Natural Language Processing, I want to ask for guidance: can we explore Numenta's technology for sequence labelling tasks? Can anyone please guide me? Thank you in advance.
I don’t know if this will help you, but I consider the layers 5 & 6 to be the sequence portion of HTM, and 2 & 3 to be the pattern portion, with layer 4 being the receiving & distribution of thalamus input.
In this framework, the state of layers 2/3 could be considered the label.
If you accept my hex-grid proposal as a signaling framework, the next question is: what do the labels mean?
In the brain, the content of the pattern layer is learned as inputs to the co-trained brain regions and no interpretation is required. If you want a stand-alone system you may have to have an interpreter network to decode the labels into a form that some outside system can use.
Thank you. I shall explore it. Will share our experiences here.
Can you suggest some resources we can start from? I've found https://github.com/numenta/nupic.nlp-examples. Do you know of any other work along these lines (using HTM)? Thank you!
No - you are doing the cutting-edge research on this.
I made a proof-of-concept sequence labeling program using HTM, and then I made a video presenting it.
My demo looks at a sequence of letters and determines what word it is looking at.
Great. Will dive into it right now. Many thanks.
What exactly do you mean by sequence labeling?
I know the spatial pooler's output is an SDR - a learned label for the input it sees, except that these labels are meaningless to us humans. So there is usually an extra step that processes the spatial pooler's output and does the actual classification work.
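To make that extra step concrete, here is a minimal sketch of an overlap-based decoder. It assumes an SDR is represented as a set of active bit indices. The class name `OverlapClassifier` and its methods are hypothetical, made up for illustration, and are not part of NuPIC or any other HTM library.

```python
from collections import Counter, defaultdict


class OverlapClassifier:
    """Toy decoder (hypothetical): maps SDRs to human-readable labels.

    An SDR is assumed to be a set of active bit indices.
    """

    def __init__(self):
        # label -> Counter of how often each bit was active for that label
        self.bit_counts = defaultdict(Counter)

    def learn(self, sdr, label):
        # Remember which bits tend to be active when this label is present.
        self.bit_counts[label].update(sdr)

    def infer(self, sdr):
        # Score each label by the summed counts of the bits it shares with the SDR.
        scores = {
            label: sum(counts[bit] for bit in sdr)
            for label, counts in self.bit_counts.items()
        }
        return max(scores, key=scores.get) if scores else None


clf = OverlapClassifier()
clf.learn({1, 5, 9, 12}, "noun")
clf.learn({1, 5, 9, 14}, "noun")
clf.learn({3, 7, 20, 22}, "verb")

# A noisy SDR that mostly overlaps the "noun" examples.
prediction = clf.infer({1, 5, 20, 12})
```

Scoring by shared active bits relies on the usual SDR property that similar inputs produce overlapping representations, so a few flipped bits should not change the decoded label.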
Yes, it can be considered classification. A straightforward example would be labelling each word in a stream of words with its correct part of speech. Currently we do this with LSTMs and BiLSTMs. I want to see how HTM or TBT behaves on this class of problems.
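To show the shape of the task, here is a small sketch of part-of-speech labelling with a most-frequent-tag baseline. This is not HTM, and it is not the LSTM setup mentioned above; it only illustrates that the input is a sequence of words and the output is one tag per word. The toy sentences, tag names, and function names are assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical toy data: each sentence is a list of (word, tag) pairs.
train = [
    [("the", "DET"), ("dog", "NOUN"), ("barks", "VERB")],
    [("the", "DET"), ("cat", "NOUN"), ("sleeps", "VERB")],
    [("dogs", "NOUN"), ("bark", "VERB")],
]


def fit_most_frequent_tag(sentences):
    """Build a word -> most frequent tag lookup from tagged sentences."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        for word, tag in sentence:
            counts[word.lower()][tag] += 1
    return {word: tags.most_common(1)[0][0] for word, tags in counts.items()}


def label_sequence(words, lexicon, default="NOUN"):
    """Assign one tag per word; unseen words fall back to the default tag."""
    return [lexicon.get(word.lower(), default) for word in words]


lexicon = fit_most_frequent_tag(train)
tags = label_sequence(["The", "cat", "barks"], lexicon)
print(tags)  # ['DET', 'NOUN', 'VERB']
```

A baseline like this ignores context entirely, which is exactly the part a sequence model (BiLSTM, or potentially HTM) would be expected to add.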
Can you recommend any datasets for sequence labeling?
As far as I'm aware, HTM is used for unsupervised time series learning/prediction. I'm not sure how it could be used with a labeled dataset.
Yes, you can download the Brown corpus from nltk.org/nltk_data (serial no 10). It contains POS-tagged English sentences. There are unsupervised methods for POS tagging too, so we can try that with HTM. Moreover, our brain is capable of both supervised and unsupervised learning. So, to truly learn like the brain, we must invent a system that can learn both ways. Correct me if I'm wrong.
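For anyone who wants to try this, here is a minimal sketch of loading the Brown corpus through NLTK's Python package instead of downloading it by hand. It assumes NLTK is installed and that the machine is allowed to fetch corpus data. The helper names `load_brown_tagged_sentences` and `split_words_and_tags` are hypothetical, and using the coarse "universal" tagset is my own choice for the example.

```python
from itertools import islice


def load_brown_tagged_sentences(limit=None):
    """Return Brown sentences as lists of (word, tag) pairs.

    Assumes NLTK is installed and corpus downloads are permitted.
    """
    import nltk
    from nltk.corpus import brown

    nltk.download("brown", quiet=True)
    nltk.download("universal_tagset", quiet=True)  # needed for tagset="universal"
    sentences = brown.tagged_sents(tagset="universal")
    return list(islice(sentences, limit))


def split_words_and_tags(tagged_sentence):
    """Separate a tagged sentence into parallel word and tag lists."""
    words = [word for word, _ in tagged_sentence]
    tags = [tag for _, tag in tagged_sentence]
    return words, tags


# Works on any (word, tag) sequence, e.g. a hand-written sample:
words, tags = split_words_and_tags([("The", "DET"), ("jury", "NOUN"), ("said", "VERB")])
```

For an unsupervised experiment, one option is to feed only `words` to the model and keep `tags` aside for evaluating whatever clusters it learns.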
Have you seen these guys?
They embed some (not a lot) of HTM ideas and get to and from sparse representations.
Some toys to play with:
From what I can tell, they do off-line training using SOMs.
It would be pretty awesome to do a SOM-like mapping using HTM/TBT one-shot learning.
I think cortical.io is mainly focused on classifying/clustering full documents.
If I got it right, sequence labeling is more concerned with classifying individual words within a phrase - which one is noun/adjective/verb/etc…
I wonder how hard it would be to build an HTM-like AI that infers grammar rules/parts by itself, unsupervised, from plain text datasets. That is something transformers seem to handle pretty well, provided they are trained on lots of examples.
It's hard, but it will be worth exploring. If we can do that, a new paradigm for HTM or TBT will open up.