Ideas for better word encoding

I’m testing different variations of encoding WORDS (as sequences of characters) and passing them through bbHTM:SPooler to see what happens.

So far I have tried 3 variants:

  1. 5 bits per character, enough to cover all 26 letters (dense encoding)
  2. A one-hot 26-bit vector per character
  3. 3 active bits out of 78 bits per character (i.e. a category encoder)
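For concreteness, here is a minimal sketch of what variant 3 might look like. The layout is my assumption, not something stated above: each of the 26 letters gets its own block of 3 contiguous bits inside a 78-bit vector, and a word is encoded by concatenating the per-character vectors.

```python
def encode_char(c):
    """Return a 78-bit list with 3 active bits for one lowercase letter.
    ASSUMPTION: letter i occupies the contiguous bits 3*i .. 3*i+2."""
    bits = [0] * 78
    idx = ord(c) - ord('a')      # letter index 0..25
    for offset in range(3):      # 3 contiguous active bits
        bits[3 * idx + offset] = 1
    return bits

def encode_word(word):
    """Concatenate per-character encodings: len(word) * 78 bits total."""
    encoded = []
    for c in word:
        encoded.extend(encode_char(c))
    return encoded

sdr = encode_word("cat")
print(len(sdr), sum(sdr))        # 234 bits total, 9 of them active
```

With this layout, words that differ in one character differ in exactly 3 bit positions per mismatched slot, which is why an off-by-one shift in the characters destroys most of the overlap.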

So far option 3 behaves the best…
I compare the SP-generated word-SDRs and get, for example:

13 bits of overlap for “interface” and “intersection”, which is good … the interesting part is that I get zero overlap for
“management” and “measurement”; you can see why if you put them one below the other:

  management
  measurement
Do you see how the second example “slips” by one character :slight_smile:

The question is: do you have any other ideas for an encoder that would capture the slippage?!

What about letter n-grams? That’s what I use for my project.
Here is the Python to convert a string into n-grams:

  def create_letter_n_grams(s, N):
    # Yield every contiguous substring of length N from s.
    for i in range(len(s) - N + 1):
      yield s[i:i + N]
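For reference, here is what the generator yields for one of the example words (redefined here so the snippet runs standalone):

```python
def create_letter_n_grams(s, N):
    # Yield every contiguous substring of length N from s.
    for i in range(len(s) - N + 1):
        yield s[i:i + N]

print(list(create_letter_n_grams("interface", 3)))
# ['int', 'nte', 'ter', 'erf', 'rfa', 'fac', 'ace']
```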

Using 1-, 2-, and 3-letter n-grams I get these similarity results:
45.8% interface, intersection
47.8% management, measurement
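The exact similarity metric behind those percentages isn’t stated; one common choice is the Jaccard similarity over the combined 1/2/3-gram sets, sketched below. Note this is only an assumption about the metric, so the numbers it produces will differ from the ones quoted if a different formula was used.

```python
def create_letter_n_grams(s, N):
    for i in range(len(s) - N + 1):
        yield s[i:i + N]

def ngram_set(word, sizes=(1, 2, 3)):
    """All distinct 1-, 2- and 3-letter n-grams of a word."""
    return {g for n in sizes for g in create_letter_n_grams(word, n)}

def jaccard(a, b):
    """Shared n-grams divided by total distinct n-grams."""
    sa, sb = ngram_set(a), ngram_set(b)
    return len(sa & sb) / len(sa | sb)

print(round(jaccard("interface", "intersection"), 3))
print(round(jaccard("management", "measurement"), 3))
```

The key point survives regardless of the metric: “management” and “measurement” share plenty of n-grams (em, me, en, nt, eme, men, ent, …), so the one-character slip no longer zeroes out the similarity.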


That’s good, but I want to pass an encoded WORD to the SP, not an encoded string!
Also, how do you convert the n-grams to binary? What encoding do you use for them? If I use 2-grams I can no longer encode with a per-character 26-category encoder, because there are too many combinations (325)!
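One common way to sidestep that combinatorial blow-up (my own sketch, not anything from this thread or from bbHTM) is not to enumerate the n-grams at all, but to hash each one to a few deterministic bit positions in a fixed-width sparse vector. Two words that share n-grams then automatically share active bits:

```python
import hashlib

def encode_ngram(gram, width=1024, bits_per_gram=3):
    """Map one n-gram to up to `bits_per_gram` deterministic
    positions in [0, width). Collisions may yield fewer positions."""
    positions = set()
    for salt in range(bits_per_gram):
        h = hashlib.md5(f"{salt}:{gram}".encode()).hexdigest()
        positions.add(int(h, 16) % width)
    return positions

def encode_word(word, sizes=(1, 2, 3), width=1024):
    """Union of the hashed bit positions of all the word's n-grams."""
    active = set()
    for n in sizes:
        for i in range(len(word) - n + 1):
            active |= encode_ngram(word[i:i + n], width)
    return active

a = encode_word("management")
b = encode_word("measurement")
print(len(a & b))   # shared n-grams hash to the same bits, so overlap survives the slip
```

The vector width, the number of bits per n-gram, and the hash are all tunable; the only requirement is that the hash is deterministic so the same n-gram always lights the same bits.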