How can I create an artificial intelligence system that can learn new words outside its initial vocabulary?


I use and other natural language processing systems, but I cannot find a way to make my program learn new words that are outside its initial vocabulary. Learning new words in real time would let my system expand its database of concepts and apply them to new situations automatically.


Wouldn’t it need to know something about word etymology to accomplish that? That information, as far as I know, is not part of the Cortical IO fingerprint. So you would have to encode word etymology as part of the term’s representation.


It is fairly easy to create an SDR for a new word by defining it in terms of, say, the 20 best existing keywords. Take the SDRs of all the keywords and score every index (if the same index is active in two keywords, it gets a score of 2, and so on). Then keep the highest-scoring indices, up to 2% of the total array size (breaking ties at random), and discard the rest. The remaining indices make up your new SDR.
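A minimal Python sketch of that scoring recipe, assuming each keyword SDR is available as a set of active bit indices. The function name `sdr_from_keywords`, its parameters and the toy sizes in the usage lines are illustrative assumptions, not the Cortical IO API.

```python
import random
from collections import Counter


def sdr_from_keywords(keyword_sdrs, sdr_size, sparsity=0.02, seed=None):
    """Build an SDR for a new word from the SDRs of the keywords that define it.

    keyword_sdrs: list of sets, each holding the active bit indices of one keyword.
    sdr_size:     total number of bits in the SDR array.
    sparsity:     fraction of sdr_size to keep active (2% as described above).
    """
    rng = random.Random(seed)
    # Score each index by how many keyword SDRs have it active.
    scores = Counter(index for sdr in keyword_sdrs for index in sdr)
    n_active = max(1, round(sdr_size * sparsity))
    # Highest score first; a random secondary key breaks ties between equal scores.
    ranked = sorted(scores, key=lambda i: (-scores[i], rng.random()))
    return set(ranked[:n_active])


# Toy example: three keywords in a 100-bit space, so 2% keeps 2 bits.
keywords = [{1, 2, 3}, {2, 3, 4}, {3, 5}]
print(sorted(sdr_from_keywords(keywords, sdr_size=100, seed=0)))  # [2, 3]
```

If the keywords together have fewer active indices than the 2% target, this sketch simply returns all of them.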

BTW, if you need an online learning system, you’ll need to change the semantic folding process a bit. As I understand the implementation, it requires all of the source text to be known ahead of time. I used eligibility traces to solve that problem.


Eligibility traces?


This is a concept from reinforcement learning (Google “RL backward TD”). A nice image that is often used to help visualize the concept:


Here the nodes depict streaming inputs, and the current input is being shouted backwards in time. The closer an earlier input is to the megaphone, the louder it hears the shout.

The basic idea is that inputs nearer to a given input are more likely to have a cause/effect relationship with it than inputs farther away. We assume that word semantics are based largely on causality (which’s algorithm does as well), so words that appear near each other often share semantics.

For example, consider the sequence A -> B -> C -> D. When forming associations for D, C carries more weight than B, which in turn carries more weight than A. If we later see something like X -> C -> Y -> D, the semantics of C and D can be nudged a tiny bit closer together. Rinse and repeat.
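A minimal sketch of this backward-view idea in Python, assuming an accumulating eligibility trace that is multiplied by a hypothetical `decay` factor at every step and reset between sequences. Each new input strengthens its link to every earlier input in proportion to that input's current trace, so nearer inputs count more. The name `learn_associations` and the pair-keyed dictionary are illustrative choices, not part of any library.

```python
from collections import defaultdict


def learn_associations(sequences, decay=0.5):
    """Trace-weighted association learning (backward view).

    Each earlier input strengthens its link with the current input in
    proportion to its eligibility trace, which is multiplied by `decay`
    every step; for non-repeating inputs, an input k steps back
    contributes decay**(k - 1).
    """
    assoc = defaultdict(float)
    for sequence in sequences:
        trace = defaultdict(float)  # assumption: reset the trace at sequence boundaries
        for token in sequence:
            # The current token is "shouted back"; nearer inputs hear it louder.
            for earlier, weight in trace.items():
                if earlier != token:
                    assoc[tuple(sorted((earlier, token)))] += weight
            # Decay all traces, then give the current token a fresh trace.
            for key in trace:
                trace[key] *= decay
            trace[token] += 1.0
    return assoc


first = learn_associations([list("ABCD")])
print(first[("C", "D")], first[("B", "D")], first[("A", "D")])  # 1.0 0.5 0.25

# Seeing X -> C -> Y -> D afterwards pulls C and D a bit closer together.
both = learn_associations([list("ABCD"), list("XCYD")])
print(both[("C", "D")])  # 1.5
```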

Over time, inputs that frequently appear near each other come to share more semantics than inputs that rarely do. Inputs that appear near each other only by chance generate noise, which is overwritten by subsequent random noise. Semantic similarity only coalesces between inputs that consistently appear near each other.
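To illustrate the signal-versus-noise claim, here is a streaming variant, again only a sketch. It adds one assumption the text does not spell out: every association is multiplied by a hypothetical `retain` factor at each step, so chance co-occurrences fade unless they recur. The stream is synthetic (random filler words `w0` to `w19`, plus a C -> D pair that always appears together), and all names and parameter values are illustrative.

```python
import random


def stream_associations(tokens, decay=0.5, retain=0.995, min_trace=1e-4):
    """Online association learning over a token stream with eligibility traces.

    decay:     per-step trace decay (nearer inputs hear the current one louder).
    retain:    hypothetical forgetting factor applied to every association per step.
    min_trace: traces below this are dropped so the trace stays small.
    """
    assoc = {}
    trace = {}
    for token in tokens:
        # Forget a little of everything learned so far.
        for pair in assoc:
            assoc[pair] *= retain
        # Strengthen links between the current token and recent tokens.
        for earlier, weight in trace.items():
            if earlier != token:
                pair = tuple(sorted((earlier, token)))
                assoc[pair] = assoc.get(pair, 0.0) + weight
        # Decay every trace, drop negligible ones, then mark the current token.
        trace = {k: v * decay for k, v in trace.items() if v * decay > min_trace}
        trace[token] = trace.get(token, 0.0) + 1.0
    return assoc


rng = random.Random(42)
filler = [f"w{i}" for i in range(20)]
stream = []
for _ in range(500):
    # Three random filler words, then the consistent pair C -> D.
    stream += rng.sample(filler, 3) + ["C", "D"]

assoc = stream_associations(stream)
ranked = sorted(assoc, key=assoc.get, reverse=True)
print(ranked[0], round(assoc[ranked[0]], 1), ranked[1], round(assoc[ranked[1]], 1))
```

With these settings the consistent pair `('C', 'D')` should end up with by far the strongest association, while links between C or D and any particular filler word stay much weaker because they are only reinforced occasionally and keep decaying in between.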