So anyway, it’s your right to say it “completely fails” but I don’t think that’s right in ML terms. By that “perfect match” criterion, a digit classifier layer stating that an image of the digit 5 has a 20% chance of having label 5 is sooo wrong (by 80%), even when the probabilities for the other labels are all lower than 20% - and based on that, the classifier makes the correct prediction, with only a 20% “mathematical match”.
I’m really sorry, but you badly mistake me; when I say “completely fails”, I mean “0 overlapping bits”. Let’s take it as given that a high overlap combined with a not-too-high distance means that the SDRs match (the distance matters because, for example, an SDR with all 1s would have perfect overlap but would be useless); I never doubted that was the case.
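To make the overlap-versus-distance point concrete, here is a minimal sketch in Python. It assumes SDRs are represented as sets of active bit indices; the helper names `overlap` and `hamming` and the example SDRs are made up for illustration and are not taken from either implementation.

```python
def overlap(a, b):
    """Number of active bits the two SDRs have in common."""
    return len(set(a) & set(b))

def hamming(a, b):
    """Number of bit positions where the two SDRs differ."""
    return len(set(a) ^ set(b))

N, P = 1000, 20

# Hypothetical SDRs, chosen only to illustrate the three cases.
target = set(range(0, P))        # the SDR we expected to recall
exact = set(range(0, P))         # a perfect recall
disjoint = set(range(P, 2 * P))  # "completely fails": 0 overlapping bits
all_ones = set(range(N))         # every bit set

for name, sdr in [("exact", exact), ("disjoint", disjoint), ("all_ones", all_ones)]:
    print(name, "overlap:", overlap(target, sdr), "distance:", hamming(target, sdr))
```

The all-ones SDR overlaps the target on all 20 bits but sits at distance 980 from it, which is why overlap alone is not enough to call something a match.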
So, a 1000/20 sequence memory (that is, a triadic-based sequence memory with N=1000 and P=20) does fine with long sequences, as you say, as long as there are only one or two sequences. But when you try to store and recall, say, five sequences of 1,000 tokens each (each token a random SDR), here’s what happens. Everyone following along is encouraged to try this themselves: if you copy and paste what I type, you will get the same results. You should be able to replicate it with your Python implementation, should you desire, but you’ll need to set up your test framework there:
cargo run --release --example=sequence -- -s 5 -l 1000 -N 1000 -P 20 | awk '{print $NF}' |sort |uniq -c |sort -n
Finished release [optimized] target(s) in 0.01s
Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -s 5 -l 1000 -N 1000 -P 20`
1 13
1 4
1 5
2 14
2 16
2 17
3 18
13 3
18 19
140 2
726 1
1655 0
2426 20
The first column is the number of tokens with a given overlap, and the second is how many bits of overlap that was. This tells us that it perfectly recognized about half of the tokens (2,426 of 4,990), and almost all of the rest had two bits or less of overlap. If you drop P to 11, you get:
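For anyone who wants to check the arithmetic, here is a small Python sketch that summarizes the P=20 histogram. The numbers are copied from the output above; the variable names are just for illustration.

```python
# overlap bits -> number of tokens, copied from the P=20 run above
hist = {13: 1, 4: 1, 5: 1, 14: 2, 16: 2, 17: 2, 18: 3,
        3: 13, 19: 18, 2: 140, 1: 726, 0: 1655, 20: 2426}
P = 20

total = sum(hist.values())
perfect = hist[P]
two_or_less = sum(count for bits, count in hist.items() if bits <= 2)
in_between = total - perfect - two_or_less

print(f"total tokens scored: {total}")
print(f"perfect overlap:     {perfect} ({perfect / total:.1%})")
print(f"overlap <= 2 bits:   {two_or_less} ({two_or_less / total:.1%})")
print(f"everything else:     {in_between} ({in_between / total:.1%})")
```

Only 43 tokens land anywhere between 3 and 19 bits; the result is essentially all-or-nothing.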
cargo run --release --example=sequence -- -s 5 -l 1000 -N 1000 -P 11 | awk '{print $NF}' |sort |uniq -c |sort -n
Finished release [optimized] target(s) in 0.01s
Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -s 5 -l 1000 -N 1000 -P 11`
1 9
32 10
4957 11
The lowest overlap is 9, and that was a single SDR; almost all the rest (4,957 of 4,990) are fully overlapped.
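For comparison with the 0, 1, and 2-bit rows in the P=20 run, here is a rough Python sketch of how much overlap two unrelated random SDRs share by chance. It assumes each token has exactly P active bits drawn uniformly from N, which may not be exactly how the example generates them; the function name `chance_overlap_pmf` is made up for this illustration.

```python
from math import comb

def chance_overlap_pmf(n, p, k):
    """Probability that two independent random SDRs, each with exactly
    p of n bits active, share exactly k active bits (hypergeometric)."""
    return comb(p, k) * comb(n - p, p - k) / comb(n, p)

N = 1000
for P in (20, 11):
    expected = P * P / N  # mean of the hypergeometric distribution
    at_most_two = sum(chance_overlap_pmf(N, P, k) for k in range(3))
    print(f"P={P}: expected chance overlap = {expected:.3f} bits, "
          f"probability of <= 2 bits = {at_most_two:.4f}")
```

If the failed recalls in your own run are distributed like this chance baseline, the recalled SDR is effectively unrelated to the expected one.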