2013 Hackathon NLP Planning Meeting Notes

rhyolight · April 5, 2017, 9:14pm

On Friday, Sep 15, a planning meeting was held to help decide the direction for the Natural Language Processing (NLP) focus of the upcoming 2013 Fall Hackathon. The agenda and notes from the meeting follow. Also in attendance were Francisco Webber, Fergal Byrne, Jeff Hawkins, Subutai Ahmad, and Matt Taylor. A partial video of this meeting is available on YouTube.

Meeting Agenda

Primary Goals

Provide the tools, documentation, and guidance that hackathon participants need as a foundation for NLP work.
Identify higher-level building blocks for easier usage of NuPIC in general.

Current Tools

retina text to sdr mapping file
cept api for word-to-sdr mapping
- pycept python client
linguist project for feeding letters into nupic
nupic-texts project for converting text into better formats

Docs to Create

General introduction to NLP
How to convert text into SDRs
How to feed raw SDRs into NuPIC
How to tweak SP / TP settings
Links to relevant external resources

Open Questions

Possible to link multiple CLAs into a hierarchy for NLP usage?
- Get one level working well first

Meeting Notes

Provide a “back conversion” from the output of the CLA to translate an output SDR back into a word?
- Does cept have an API endpoint for this that returns the “closest” word to the input SDR? Yes.
- results could actually be outside of the input language corpus
provide a baseline
- Random word SDR input vs word SDR comparison
- known sentences vs unknown sentences
anomaly detection guidance could be a good idea; it should be easier to do than prediction
- after training on sentences from a known corpus of words, feed in nonsense sentences and we should see high anomaly scores (re: syntactic anomalies)
- several sentences like: “Tom drove a X”, where X is a reasonable word, train on lots of values for X, then send unreasonable values to confirm high anomaly scores (re: semantic anomalies)
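
To make the back-conversion and anomaly ideas above concrete, here is a minimal sketch. It assumes an SDR is represented as a set of active bit indices. The vocabulary, the bit values, and the function names are all made up for illustration; this is not the cept API, and the anomaly formula (the fraction of active bits that were not predicted) is only one common way to score a mismatch.

```python
def overlap(a, b):
    """Count the active bits shared by two SDRs (sets of bit indices)."""
    return len(a & b)


def closest_word(sdr, vocabulary):
    """Return the word whose SDR shares the most active bits with `sdr`."""
    return max(vocabulary, key=lambda word: overlap(sdr, vocabulary[word]))


def raw_anomaly_score(active, predicted):
    """Fraction of currently active bits that were NOT predicted (0.0 to 1.0)."""
    if not active:
        return 0.0
    return 1.0 - overlap(active, predicted) / len(active)


# Hypothetical toy vocabulary; real word SDRs are far larger and sparser.
vocabulary = {
    "car": frozenset({1, 4, 9, 16, 25}),
    "truck": frozenset({1, 4, 9, 36, 49}),
    "banana": frozenset({2, 3, 5, 7, 11}),
}

# A made-up output SDR, e.g. what a model might predict after "Tom drove a".
predicted = frozenset({1, 4, 16, 25, 64})

best = closest_word(predicted, vocabulary)
reasonable = raw_anomaly_score(vocabulary["car"], predicted)
unreasonable = raw_anomaly_score(vocabulary["banana"], predicted)
print(best, reasonable, unreasonable)
```

In this toy setup a reasonable completion scores lower than an unreasonable one, which is the behavior the "Tom drove a X" experiment is meant to confirm on a trained model.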

Tasks To Do:

test results of random word SDRs vs real word SDRs and compare
train on the known language set, then feed in grammatically incorrect input for anomaly detection
Aside from the existing children's text corpus, provide another more general simple corpus of common nouns, verbs, etc., for experimentation.
- Francisco will gather the most frequent words in language and create another mapping file
Experiment: Train on the raw cept SDRs through the SP with the pass-through encoder, then train on normalized SDRs with fixed density, and compare the results.
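
As a rough sketch of two of the tasks above, the helpers below build a random SDR for the baseline comparison and force an SDR to a fixed number of active bits. The notes do not say how normalization should be done, so the approach here (randomly dropping bits when there are too many, randomly adding unused bits when there are too few) is an assumption, as are the function names and the sizes used.

```python
import random


def random_sdr(size, active, rng):
    """Build a random SDR with exactly `active` bits set out of `size`."""
    return frozenset(rng.sample(range(size), active))


def normalize_density(sdr, size, active, rng):
    """Return a copy of `sdr` with exactly `active` bits set.

    Assumed strategy: subsample when too dense, pad with random
    unused bits when too sparse.
    """
    bits = sorted(sdr)
    if len(bits) >= active:
        return frozenset(rng.sample(bits, active))
    unused = [i for i in range(size) if i not in sdr]
    padding = rng.sample(unused, active - len(bits))
    return frozenset(bits) | frozenset(padding)


# Hypothetical dimensions, chosen only to keep the example small.
SIZE, ACTIVE = 1024, 20
rng = random.Random(42)

dense = random_sdr(SIZE, 60, rng)   # stands in for a word with many bits
sparse = random_sdr(SIZE, 5, rng)   # stands in for a word with few bits

dense_fixed = normalize_density(dense, SIZE, ACTIVE, rng)
sparse_fixed = normalize_density(sparse, SIZE, ACTIVE, rng)
print(len(dense_fixed), len(sparse_fixed))
```

Padding a sparse SDR with random bits adds noise that carries no meaning, so comparing results on raw versus normalized input, as the experiment proposes, is a sensible check on whether fixed density helps or hurts.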
