Questions regarding SDR encoding

First, let me mention that I have an implementation of some parts of HTM, which can also serve as an intro to HTM.
You can read about it here: http://ifni.co/bbHTM.html

You could also skim this discussion.

The reason I mention it is that with this project I’m trying to apply HTM to text processing and NLP.
One thing I discovered while working on it is that, so far, the best encoding of words is to
use a Category encoder to encode every character of a word (and then combine them to form the final SDR).
/I tried several other encoding ideas, which don’t work as well./
Why does this sort of encoding behave better? Because the Hamming distance between any character and every
other character is constant.
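
To make the constant-distance point concrete, here is a minimal sketch in plain Python. It is not the bbHTM code: the names `ALPHABET`, `BITS_PER_CHAR`, `encode_char` and `encode_word` are made up for illustration, and combining the characters by concatenating one slot per position is only an assumption about how the final SDR could be formed.

```python
import string

ALPHABET = string.ascii_lowercase   # assumed symbol set (hypothetical)
BITS_PER_CHAR = 4                   # assumed number of active bits per category

def encode_char(ch):
    """Category-encode one character as its own disjoint block of ones."""
    idx = ALPHABET.index(ch)
    sdr = [0] * (len(ALPHABET) * BITS_PER_CHAR)
    start = idx * BITS_PER_CHAR
    for i in range(start, start + BITS_PER_CHAR):
        sdr[i] = 1
    return sdr

def encode_word(word, max_len=8):
    """Concatenate per-character SDRs, one slot per position (assumed scheme).

    Words longer than max_len are truncated; shorter words leave empty slots.
    """
    slot = len(ALPHABET) * BITS_PER_CHAR
    sdr = []
    for pos in range(max_len):
        sdr.extend(encode_char(word[pos]) if pos < len(word) else [0] * slot)
    return sdr

def hamming(a, b):
    """Number of bit positions where the two SDRs differ."""
    return sum(x != y for x, y in zip(a, b))

# Every pair of distinct characters is exactly the same distance apart.
distances = {hamming(encode_char(a), encode_char(b))
             for a in ALPHABET for b in ALPHABET if a != b}
print(distances)  # {8}, i.e. 2 * BITS_PER_CHAR
```

Because each character owns a disjoint block of bits, no character is accidentally "closer" to another, so the encoder itself adds no spurious similarity between letters.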

What this encoding fails to catch is “character-slippage”, as you can see here:

Any ideas to remedy that are welcome.
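
For anyone who wants to reproduce the problem, here is a small sketch of what I mean, assuming "slippage" is a character being dropped or inserted so that the rest of the word shifts by one position. The helpers `active_bits` and `overlap` are hypothetical; each active bit of a positional character encoding is modelled simply as a (position, character) pair.

```python
def active_bits(word):
    """Active bits of a positional character encoding, as (position, char) pairs."""
    return {(pos, ch) for pos, ch in enumerate(word)}

def overlap(a, b):
    """Number of active bits the two word encodings share."""
    return len(active_bits(a) & active_bits(b))

# A substitution at the end keeps almost all of the overlap ...
print(overlap("encoder", "encoded"))  # 6
# ... but dropping the first letter shifts everything and the overlap vanishes.
print(overlap("encoder", "ncoder"))   # 0
```

So two words that a human reads as nearly identical can end up with no shared bits at all, purely because of the position shift.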

PS> I’ve currently put this project on hold, but hope to pick it up in a couple of months :(, when I have time.
My current idea is to build a Semantic encoder out of the char-word encoder plus a hierarchy of SPs, and TMs if necessary, i.e. the
Encoder itself is built out of HTM modules.