Ideas for better word encoding …
I’m testing different ways of encoding WORDS (as sequences of characters) and passing them through bbHTM:SPooler to see what happens.
So far I have tried 3 variants:
- A dense 5-bit binary code per character (enough to cover all 26 letters)
- A one-hot 26-bit vector per character
- 3 active bits out of 78 bits per character (i.e. a category encoder)
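For anyone who wants to reproduce the three variants, here is a minimal Python sketch. It is not the bbHTM code: the function names are made up for illustration, and it assumes lowercase a-z only, that the 3-of-78 code gives each letter its own contiguous block of 3 bits, and that a word is encoded by concatenating the per-character codes position by position.

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz"  # assumption: lowercase a-z only

def dense5(ch):
    """Variant 1: dense 5-bit binary code of the letter index (LSB first)."""
    idx = ALPHABET.index(ch)
    return [(idx >> bit) & 1 for bit in range(5)]

def one_hot(ch):
    """Variant 2: 26-bit vector with a single active bit."""
    idx = ALPHABET.index(ch)
    return [1 if i == idx else 0 for i in range(26)]

def category(ch, width=3):
    """Variant 3: 3 active bits out of 78 (assumed: one contiguous block per letter)."""
    idx = ALPHABET.index(ch)
    bits = [0] * (len(ALPHABET) * width)
    for i in range(idx * width, (idx + 1) * width):
        bits[i] = 1
    return bits

def encode_word(word, char_encoder):
    """Assumed word encoding: concatenate the per-character codes in order."""
    out = []
    for ch in word:
        out.extend(char_encoder(ch))
    return out

def overlap(a, b):
    """Number of positions where both bit vectors are active (zip stops at the shorter one)."""
    return sum(x & y for x, y in zip(a, b))
```

With this layout each character position owns its own slice of the input vector, so two words only share input bits where they have the same letter at the same position.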
So far option 3 behaves the best…
When I compare the word SDRs generated by the SP, I get, for example,
a 13-bit overlap for “interface” and “intersection”, which is good … the interesting part is that I get zero overlap for
“management” and “measurement”. You can see why if you put them one below the other:
interface
intersection
management
measurement
Do you see how the second pair “slips” by one character?
The question is: do you have any other ideas for an encoder that will capture the slippage?!
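To make the slip concrete, here is a small Python sketch that counts how many letters line up at the same position. The helper name `positional_matches` and its `shift` parameter are just for illustration and are not part of bbHTM.

```python
def positional_matches(a, b, shift=0):
    """Count positions i where a[i] equals b[i + shift]."""
    return sum(
        1
        for i, ch in enumerate(a)
        if 0 <= i + shift < len(b) and ch == b[i + shift]
    )

print(positional_matches("interface", "intersection"))          # 6
print(positional_matches("management", "measurement"))          # 1
print(positional_matches("management", "measurement", shift=1)) # 6
```

Aligned as written, “management” and “measurement” share only the leading “m”; shifted by one character they share as many letters as the first pair.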