I’m trying to figure out IF, HOW, and WHAT kind of sequence similarity the TPooler “preserves”.
From what I understand, the TP is a “glorified” UNION, and it is not clear to me what happens when different but similar sequences are encoded/merged by the TP.
Do they preserve their similarity once encoded as SDRs?
But then the question arises: what similarity? Let's take, for example, the following sequences:
ABC : ABX, AXC, BAC, CAB, XBC, ...
Which one is the most similar to ABC? Which ones are equally similar?
In practice, it is often the Minimum Edit Distance (MED) that is used!
Is this a feasible similarity measure? If not, why not, and which one should be used instead?
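To make the question concrete, here is a minimal sketch of how MED would rank the candidates above. It assumes plain Levenshtein distance (insert, delete, and substitute each cost 1); the function name `edit_distance` is just mine for illustration. Other variants, such as ones that count a swap of adjacent symbols as a single edit, would rank BAC differently.

```python
def edit_distance(a, b):
    """Levenshtein distance between sequences a and b (unit costs)."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty prefix of b
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or match)
            ))
        prev = curr
    return prev[-1]


candidates = ["ABX", "AXC", "BAC", "CAB", "XBC"]
distances = {s: edit_distance("ABC", s) for s in candidates}
for seq, d in distances.items():
    print(seq, d)
```

Under this measure ABX, AXC and XBC are all at distance 1 from ABC, while BAC and CAB are at distance 2, so MED treats a single changed symbol as "closer" than a reordering of the same symbols.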
The TP's goal is to be unique and stable, but should it also preserve similarity?
In the same way that the SP preserves SDR overlap, the TP should preserve MED (or some other measure) as TP-SDR overlap, right?
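Here is a toy sketch of why I'm unsure. It takes the "glorified UNION" reading literally: each symbol gets a fixed random SDR and the pooled SDR is just the union of the symbol SDRs. This is NOT the actual TP (which pools over context-dependent cell activity); the sizes `N_BITS` and `N_ACTIVE` and the helpers `symbol_sdr`, `union_pool` and `overlap` are all made up for illustration.

```python
import random

N_BITS = 2048    # assumed SDR width (illustrative only)
N_ACTIVE = 40    # assumed number of active bits per symbol (illustrative only)


def symbol_sdr(symbol):
    """Hypothetical encoder: a fixed random set of active bits per symbol."""
    rng = random.Random(symbol)  # seeded by the symbol, so it is repeatable
    return frozenset(rng.sample(range(N_BITS), N_ACTIVE))


def union_pool(sequence):
    """Toy pooler: the union of the SDRs of all symbols in the sequence."""
    pooled = set()
    for symbol in sequence:
        pooled |= symbol_sdr(symbol)
    return pooled


def overlap(a, b):
    """Number of active bits shared by two SDRs."""
    return len(a & b)


abc = union_pool("ABC")
for seq in ["ABX", "AXC", "BAC", "CAB", "XBC"]:
    print(seq, overlap(abc, union_pool(seq)), "of", len(abc))
```

With a pure union the order is lost, so BAC and CAB come out identical to ABC, while ABX, AXC and XBC share only about two thirds of the bits. That is the opposite of the MED ranking in some cases, so whatever similarity the real TP preserves has to come from the order-sensitive representations feeding it, not from the union itself.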