Triadic Memory — A Fundamental Algorithm for Cognitive Computing

So anyway, it’s your right to say it “completely fails”, but I don’t think that’s right in ML terms. By that “perfect match” criterion, a digit classifier saying an image of the digit 5 has a 20% chance of having label 5 is sooo wrong (by 80%), even when the probabilities for all other labels are lower than 20%; and yet, based on that, the classifier makes the correct prediction, with only a 20% “mathematical match”.

I’m really sorry, but you badly misunderstand me; when I say “completely fails”, I mean “0 overlapping bits”. Let’s take it as granted and accepted that a high overlap with not-too-high density (for example, an SDR of all 1s would have perfect overlap but would be useless) means that the SDRs match; I never doubted that was the case.
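For concreteness, treating an SDR as a collection of active bit indices, the overlap criterion both of us mean is just the size of the set intersection. A minimal sketch (not any particular implementation from this thread):

```python
import random

def overlap(a, b):
    """Number of shared active bits between two SDRs,
    represented as collections of active bit indices."""
    return len(set(a) & set(b))

# A 1000/20 SDR: 20 active bits out of 1000 positions.
rng = random.Random(0)
x = rng.sample(range(1000), 20)
y = rng.sample(range(1000), 20)

print(overlap(x, x))  # an SDR fully overlaps itself: 20
print(overlap(x, y))  # two independent random SDRs share few or no bits
```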

So, a 1000/20 sequence memory (that is, a triadic-based sequence memory with N=1000 and P=20) does fine, as you say, with long sequences, as long as there are only one or two sequences. But when you try to store and recall, say, five sequences of 1,000 tokens each (each token a random SDR), here’s what happens. (Everyone following along is encouraged to try this themselves; if you copy and paste what I type, you will get the same results. You should be able to replicate with your Python implementation if you like, but you’ll need to set up your test framework there.)

cargo run --release --example=sequence -- -s 5 -l 1000 -N 1000 -P 20 | awk '{print $NF}' |sort |uniq -c |sort -n
    Finished release [optimized] target(s) in 0.01s
     Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -s 5 -l 1000 -N 1000 -P 20`
      1 13
      1 4
      1 5
      2 14
      2 16
      2 17
      3 18
     13 3
     18 19
    140 2
    726 1
   1655 0
   2426 20

The first column is the count, and the second is the number of bits of overlap. This tells us that it perfectly recognized about half of the tokens, while the rest had two bits or less of overlap. If you drop P to 11, you get:

cargo run --release --example=sequence -- -s 5 -l 1000 -N 1000 -P 11 | awk '{print $NF}' |sort |uniq -c |sort -n
    Finished release [optimized] target(s) in 0.01s
     Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -s 5 -l 1000 -N 1000 -P 11`
      1 9
     32 10
   4957 11

The lowest overlap is 9, and that was a single SDR; the rest are almost all fully overlapped.
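As an aside, the `awk | sort | uniq -c | sort -n` pipeline in these runs just builds a histogram of per-token overlap values. The equivalent in Python, for anyone replicating there (the sample data below is made up, not the actual test output):

```python
from collections import Counter

# One overlap value per recalled token.
overlaps = [20, 20, 20, 0, 0, 1, 2, 20, 19]
hist = Counter(overlaps)

# Print "count bits" pairs sorted by count, like `uniq -c | sort -n`.
for bits, count in sorted(hist.items(), key=lambda kv: kv[1]):
    print(count, bits)
```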


Thanks, that’s interesting. I apologize if I was aggressively pedantic.
Knowing the actual limits of an instrument is important.

If I recall correctly, the triadic sequence code takes an OR (union) of some SDRs, which would double the SDRs used as keys to 40 bits of length. That would limit the underlying triadic memory capacity to somewhere under 20,000 records.

It might be the cause here.

I think it is relatively easy to project a pair (or more) of SDRs into a “slim” SDR of a specified size instead of OR-ing them; that should increase sequence capacity significantly.
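One possible way to do such a projection (a sketch of the idea, not the thread’s code) is to hash the pair deterministically and let the hash draw a fresh P-bit SDR, so every distinct pair maps to the same fixed slim SDR:

```python
import hashlib
import random

def slim_project(a, b, n=1000, p=20):
    """Deterministically map a pair of SDRs to a single n/p SDR,
    instead of taking their ~2p-bit union. The pair's contents seed
    a PRNG, so the same pair always yields the same slim SDR."""
    seed = hashlib.sha256(repr((sorted(a), sorted(b))).encode()).digest()
    rng = random.Random(seed)
    return sorted(rng.sample(range(n), p))
```

Note this discards any similarity structure in the inputs: two nearly identical pairs map to unrelated slim SDRs, which is fine for random tokens but not for locality-sensitive encoders.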

Also, if that’s not too much to ask, would you mind using a dictionary of a specified size? I mean, instead of generating a unique SDR (aka token) for each step, test with a dictionary of e.g. 100 or 1000 unique SDRs (call them words or symbols), and have individual sequences pick their SDRs from this dictionary instead of generating a new random token each time.

Such data would mimic a sequence of commands in a robot, or algorithm steps, or text, etc.

I wonder how a limited dictionary would influence the above results.
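A sketch of what such a test harness might generate (function names and seeds are mine, purely illustrative):

```python
import random

def make_dictionary(size, n=1000, p=20, seed=42):
    """A fixed vocabulary of `size` random n/p SDRs ("words")."""
    rng = random.Random(seed)
    return [sorted(rng.sample(range(n), p)) for _ in range(size)]

def make_sequences(dictionary, num_sequences, length, seed=7):
    """Each sequence draws its tokens from the shared dictionary
    (with reuse), instead of generating fresh random SDRs."""
    rng = random.Random(seed)
    return [[rng.choice(dictionary) for _ in range(length)]
            for _ in range(num_sequences)]

words = make_dictionary(100)
seqs = make_sequences(words, num_sequences=5, length=50)
```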

Also, regarding sequence length, I think sequence lengths of a dozen (or a few dozen) tokens would be more biologically plausible than thousands or even hundreds.

One minus of dyadic/triadic memories is that when their capacity is exceeded, instead of forgetting old data, they just go… nuts.

I guess some sort of automated or imperative forgetting would work: store all write records in a queue, and have methods to query the queue size and “trim” the old records by popping them out of the queue, or do this automatically by computing a “safe” queue size and removing old records from memory when needed.

I know storing all writes sounds excessive, but a 1M-entry queue of (10/1000)-bit SDRs needs ~50 MBytes of storage, while the triadic memory itself is 1 GByte large.
So it’s actually only a 5% increase in memory (ab)use. 20 bits instead of 10 would double the storage per SDR but would limit the number of unique records to 1/8 ~= 125k.
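The queue idea could look something like this (a toy sketch over abstract weight counters, not the actual triadic memory internals):

```python
from collections import Counter, deque

class BoundedWriteQueue:
    """Remember every write in FIFO order; when the queue exceeds
    max_records, erase the oldest write's contribution by decrementing
    the same counters it incremented."""

    def __init__(self, max_records):
        self.max_records = max_records
        self.queue = deque()
        self.counters = Counter()  # stand-in for the memory's weights

    def store(self, addresses):
        self.queue.append(addresses)
        for a in addresses:
            self.counters[a] += 1
        while len(self.queue) > self.max_records:
            for a in self.queue.popleft():
                self.counters[a] -= 1
```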

The same strategy could be implemented at a higher level, in the user of the triadic memory, e.g. the sequence learner.

An internal ValueMap could record how often each SDR was used to query the memory; that would allow the pruning algorithm to write back the most-queried (i.e. interesting, useful) old records.

Memory usage of the ValueMap is only 4 bytes per address, i.e. 0.4% at an SDR size of 1000 bits.

Its performance cost is much smaller than a di/triadic memory read, and can be reduced further if used sparsely (e.g. counting only every 10th read).

There’s a simpler “forgetting” approach which doesn’t require additional memory: For each counter that gets incremented when writing into the memory, decrement a randomly selected counter (unless its value is zero).

This method is already available in the C implementation.
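In sketch form (the real C implementation will differ in details), the scheme is:

```python
import random

def store_with_random_decay(counters, addresses, rng=None):
    """For each counter incremented by a write, decrement one randomly
    selected counter, unless the selected counter is already zero."""
    rng = rng or random.Random()
    for a in addresses:
        counters[a] += 1
        victim = rng.randrange(len(counters))
        if counters[victim] > 0:
            counters[victim] -= 1
```

The total weight in the memory can never grow past what the writes put in, which is what bounds the stored information.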

Hmm… that should work; I’m not sure performance is better for “clustered” data, though, which leaves lots of zeros, so random sampling would hit lots of zeros too.

And it still erodes the memory; it is equivalent to adding an equal amount of negative noise to every store.

Forget the ValueMap optimizations; a simple queue size limit and removal (a -1 decrement) of old writes would keep the remaining records clean, and might be less computationally expensive, since random removal needs a non-zero test for every bit decremented.

And the memory increase… 5% at most, it’s really not significant.

It is not too much to ask at all! It’s something I’ve been very curious about too.

I’ve been wondering about this as well, but haven’t thought of a good way to do it. You’d want it to be deterministic, so maybe a union, then remove every other connection or something?

I actually tried this, but found it didn’t work as well as keeping track of the maximum weight and doing a proportional decrement of all weights once a threshold is crossed. Which reminded me that that was a knob I could twiddle in the sequence memory, so setting the threshold weight to be 2 * pop instead of 1.5 * pop (the default) yields:

$ sequence -s 5 -l 1000 -N 1000 -P 20 | awk '{print $NF}' |sort | uniq -c | sort -n
      1 10
      1 8
      2 14
      2 15
      2 16
      2 7
      4 17
      4 18
      4 3
     28 19
     54 2
    261 1
    621 0
   4004 20

which is a pretty good improvement! Disabling forgetting completely yielded a slight improvement over the default (but not as good as what I just pasted), which shows that I should remove the default-value and force users to supply one.

Still, I’m planning on keeping a valuemap of some sort anyway and there’s already an erase() method.
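The saturation-triggered proportional decrement described above might be sketched like this (the halving factor and names are illustrative, not the actual code):

```python
def decay_if_saturated(weights, pop, threshold_factor=2.0):
    """Once the maximum weight crosses threshold_factor * pop,
    proportionally decrement all weights (here: integer halving),
    so old traces fade together instead of being erased one by one."""
    if max(weights) >= threshold_factor * pop:
        for i, w in enumerate(weights):
            weights[i] = w // 2
    return weights
```

Raising `threshold_factor` from 1.5 to 2 simply delays the decay, which is the knob twiddled above.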


Yeah, I guess you can shoot first and ask questions later.
Like a union between the first half of the first SDR and the second half of the second one. They’re random anyway; with SDRs generated by bit-positional encoders it would be an issue. I guess a fixed random permutation beforehand would “mix” it enough to make it less sensitive to positional encoders.
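A sketch of that half-and-half combination with a fixed permutation applied first (parameters and the permutation seed are illustrative):

```python
import random

def split_union(a, b, n=1000, seed=123):
    """Permute bit positions with a fixed random permutation, then
    union the first half of SDR `a` with the second half of SDR `b`,
    keeping the result near the original sparsity."""
    perm = list(range(n))
    random.Random(seed).shuffle(perm)  # same seed -> same permutation
    pa = sorted(perm[i] for i in a)
    pb = sorted(perm[i] for i in b)
    return sorted(set(pa[: len(pa) // 2]) | set(pb[len(pb) // 2 :]))
```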


OK, here’s the data; my repo contains the updated code for the test.

I’ll paste the exact commands I ran, as well as the output, but without the messages from cargo about compiling or running. I’m also sorting the output by number of bits of overlap.

First, five 1,000 word sequences, without re-using SDRs (our baseline from last post):

cargo run --release --example=sequence -- -s 5 -l 1000 -N 1000 -P 20 | awk '{print $NF}' |sort |uniq -c |sort -rn -k 2
   3170 20
     18 19
      5 18
      1 17
      2 16
      2 15
      1 12
      1 9
      4 4
     11 3
     94 2
    491 1
   1190 0

Next, re-using SDRs; this creates 1,000 SDRs then shuffles them for each sequence (the -r argument). Spoiler: it does much worse.

cargo run --release --example=sequence -- -s 5 -l 1000 -N 1000 -P 20 -r | awk '{print $NF}' |sort |uniq -c |sort -rn -k 2
   1086 20
     15 19
      4 18
      1 17
      3 16
      1 12
      1 11
      1 8
      1 7
      2 6
      2 5
      7 4
     44 3
    291 2
   1155 1
   2376 0

As you can see, it only correctly recognized SDRs in sequence about a thousand times when there are only 1,000 SDRs in the vocabulary.

Even a much sparser memory gets tripped up pretty quickly:

cargo run --release --example=sequence -- -s 40 -l 50 -N 1000 -P 10 -r | awk '{print $NF}' |sort |uniq -c |sort -rn -k 2
   1218 10
     36 9
      2 8
      3 3
      9 2
     81 1
    571 0

That’s 40 sequences of 50 SDRs each, using the same 50 SDRs each time, just shuffled between sequences, with P=10. Even a P=20 memory would have no problem with 40 sequences of 50 different SDRs each time.

I also tried an alternative to the normal union of SDRs to preserve the sparsity, but it slightly degraded performance.

There’s a “deep temporal memory” algorithm, implemented in C so far, which works well for small dictionaries. It combines subsequent inputs into temporal bigrams, like the algorithm you’re testing, but then continues combining: bigrams into trigrams, trigrams into 4-grams, and so on.
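As I understand the description (this is a guess at the structure, not the actual C code), each level combines adjacent items of the level below, so the n-grams grow by one token per level:

```python
def build_ngram_levels(tokens, combine):
    """Level 0 holds the raw tokens; each higher level combines
    adjacent items of the level below (bigrams, then trigrams, ...)."""
    levels = [list(tokens)]
    while len(levels[-1]) > 1:
        prev = levels[-1]
        levels.append([combine(prev[i], prev[i + 1])
                       for i in range(len(prev) - 1)])
    return levels

# With set-union as the combiner, level k holds (k+1)-grams.
union = lambda a, b: sorted(set(a) | set(b))
levels = build_ngram_levels([[1], [2], [3], [4]], union)
```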

I’m wondering, by the way, if similar tests have been done with the HTM temporal pooler…

I saw that as well, and it’s in my queue to port :slight_smile: Any kind of real-world system would have to have some kind of hierarchy, I think, and it’s interesting there are multiple ways to build one.

I also wonder this! My initial plan was to implement an HTM, actually, but I got side-tracked by your triadic memory due to its simplicity :slight_smile:

Dang, I just gave this a shot, and it was not pretty. On the other hand, it was almost 2x better than merging them then taking every other addr, so at least there’s that :slight_smile:

I guess the condition where the input is compared against the one retrieved from memory, checking for at least P bits of overlap, should be halved too, since the length halves.


Holy crap, that makes a huge difference! I apologize for the long-ish pastes here, but it’s worth seeing the differences.

First, the baseline. 40 sequences, each 50 tokens long, using the same 50 tokens for each sequence, N=1000, P=20 (similar qualitative behavior exists for sparser P):

cargo run --release --example=sequence -- -s 40 -l 50 -N 1000 -P 20 -r | awk '{print $NF}' |sort |uniq -c |sort -rn -k 2
     73 20
      7 19
      8 18
      2 17
      1 16
      2 15
      1 13
      2 12
      2 11
      4 10
      5 9
      7 8
      3 7
      8 6
     15 5
     46 4
    117 3
    303 2
    633 1
    681 0

As you can see, it does not have a good time.

Next, “split union” (glomming the first and second halves of each SDR’s weights together):

    300 20
      5 19
      4 18
      5 17
      3 16
      1 15
      1 13
      1 12
      3 11
      2 10
      1 8
      2 7
      7 6
     19 5
     35 4
     93 3
    220 2
    426 1
    792 0

More than 4x improved retention!

Last, “sparse merge”, which does a union and then removes every other connection:

    309 20
      4 19
      3 18
      1 16
      1 14
      2 13
      2 12
      2 11
      2 10
      4 9
      4 8
     17 7
     20 6
     20 5
     46 4
    120 3
    256 2
    496 1
    611 0

Also greater than 4x improvement! I think I’ll stick with the sparse merge method, just because it should still work with locality-sensitive encoders without having to convolve them first.
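For reference, my reading of the “sparse merge” described above (union, then drop every other connection):

```python
def sparse_merge(a, b):
    """Union two SDRs, then keep every other bit of the sorted union,
    restoring roughly the original sparsity."""
    merged = sorted(set(a) | set(b))
    return merged[::2]
```

Because it keeps alternating bits of the sorted union rather than re-hashing, nearby inputs still yield nearby outputs, which is why it should tolerate locality-sensitive encoders.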


@POv In the meantime I did some experiments with TemporalMemory, with N=300, P=3, and 10 SDR inputs like:
sdr[0]=1,2,3
sdr[1]=4,5,6
sdr[2]=7,8,9
sdr[3]=10,11,12
sdr[4]=13,14,15
sdr[5]=16,17,18
sdr[6]=19,20,21
sdr[7]=22,23,24
sdr[8]=25,26,27
sdr[9]=28,29,30
After 2 iterations of learning, it learns this pattern correctly.

Now I switch it to prediction mode (in my case, TriadicMemory does NOT store any pattern in prediction mode) and input a random SDR.
For example, I put sdr[4] into the TemporalMemory to ask it for a prediction. Ideally, it should produce sdr[5] as the predicted SDR.
But it fails to predict.

I do not get any better result even with more iterations of learning (e.g. 50 or 100).
I really do not understand what is happening here. Any ideas?
Thanks

One iteration should be sufficient.

P=3 is extremely small; the associative memories work through redundancy, by expanding the P bits to P*P physical addresses.

Besides a higher P, it needs more than a single SDR to recall the sequence it is embedded in. Give it a few consecutive SDRs from a previously memorized sequence (2 or 3 SDRs should suffice, depending on the variant).

@cezar_t I even tested it with P = 10, without any success!

With a sequence memory made of two triadic memories, it needs at least two prompts in the sequence to begin accurate next-item prediction.

@nebkor your idea is implemented in Temporal Memory, which is what failed to predict in my experiments.

It would be more useful to explain in more detail what code you are using and how. There are a lot of incarnations of these ideas, all at a very young, infant stage: Mathematica, C, Python, Rust, Julia, and… Scheme? I might have missed some.

We don’t know which one you are testing with, how you expect it to work, or how you actually use the unspecified code to achieve those expectations. We don’t even know whether you are trying your own new implementation or… who knows?

If you expect useful feedback then bring forward more relevant information.

@cezar_t I used the C version from GitHub, and the idea comes from @POv.
In my first test I used as many separate TemporalMemory modules as prediction steps.
The new idea is to use only one TM and feed the output back to the input many times.