Triadic Memory — A Fundamental Algorithm for Cognitive Computing

If you are familiar with Python, here is a simplified version of the address computation.
The SDR is basically an ordered list of the positions of its 1 bits.
See below: if all bits from 0 to 6 are 1, the resulting addresses fill the space from 0 to 20 (21 address pairs).

def address(sdr):
    # pair each lower active bit sdr[x] with each higher one sdr[y];
    # with sdr[x] < sdr[y], the pair maps to the unique triangular index
    # sdr[y]*(sdr[y]-1)//2 + sdr[x]
    for y in range(1, len(sdr)):
        for x in range(y):
            yield sdr[y]*(sdr[y]-1)//2 + sdr[x]

[a for a in address([0,1,2,3,4,5,6])]
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]

To make it clearer how the bits are paired:

def bit_pairs(sdr):
    for y in range(1,len(sdr)):
        for x in range(y):
            print(sdr[x],sdr[y])

bit_pairs([4,5,6])
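
For the active bits 4, 5 and 6, this prints the three pairs:

4 5
4 6
5 6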

In my implementations I need to make sure the SDRs are ordered.


@POv Yes. I have seen the current C version on GitHub. It now looks correct and has better performance than my version. Thanks for your update.

@POv Could you please share the C version of SDRnoise so that I can test and compare some results with your Mathematica DyadicMemory or MonadicMemory? Thanks

I’ve just finished my first pass at an implementation of triadic memory in Rust:

My very limited testing shows that it works, though I’ve not yet run benchmarks (it should be at least as fast as the C version).

I really struggled to understand the binarize function in the C code, but eventually it clicked. One reason for my confusion was the use of an SDR as the input: the values in the SDR’s array were not interpreted as indices where there’s a connection, but as connection weights inside the memory itself. So I moved that functionality into a constructor for SDRs that takes an array of connection weights as input:

    pub fn from_weights(weights: &[u8], pop: usize) -> Self {
        // first, find the "pop"-largest connection weight to use as a threshold value
        let uniq_weights: HashSet<_> = weights.iter().copied().collect();

        // using a heap means we only have to sort `pop` elements instead of the whole thing
        let mut sorted_weights: BinaryHeap<_> = uniq_weights.into_iter().collect();
        let mut ranked_max = 1;
        for _ in 0..pop {
            if let Some(w) = sorted_weights.pop() {
                ranked_max = w;
            } else {
                break;
            }
        }
        ranked_max = ranked_max.max(1);

        // now, walk through the weights and add the index of every value at or above the threshold (`ranked_max`) to `ones`.
        let mut ones = Vec::with_capacity(pop);
        for (i, &v) in weights.iter().enumerate() {
            if v >= ranked_max {
                ones.push(i as u16);
            }
        }

        Self {
            ones,
            size: weights.len(),
        }
    }

@nebkor Awesome. I modified the binarize function, removing that confusing hack. Thanks for your feedback!


Welcome.

I’m not familiar with Rust, but what I can tell about binarize() is that it:

  1. takes in a vector of size N, the actual N, and P = the desired output size
  2. by sorting, finds the P highest values in the vector and uses the lowest of these as a threshold
  3. outputs the positions of all values greater than or equal to the threshold. That means if more positions match the threshold, the output is longer than P
  4. except when the P-th highest value is 0, in which case (I believe) it picks some arbitrary zeros to fill the result up to P positions
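
A minimal Python sketch of steps 1-3 (with the threshold clamped to at least 1, as in the Rust from_weights above, but without the unique-ifying of weights; it does not reproduce the zero-filling of step 4):

def binarize(weights, p):
    # threshold = the P-th highest weight, but never below 1
    threshold = max(sorted(weights, reverse=True)[:p][-1], 1)
    # positions of all weights at or above the threshold
    return [i for i, w in enumerate(weights) if w >= threshold]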

Yep :slight_smile: The code I pasted does steps 1-3 as you describe (though I pass the weights through a set to unique-ify them, and then only sort the top P elements by pulling them from a max-heap). For step 4, all I do, and all the C code does, is ensure that the threshold value is at least 1.

My SDR implementation doesn’t store a P value; it just returns the length of the array that contains the sparse bit locations, which is usually P but could be more or less, depending on how things went with the fetch method that called the “binarize” (from_weights) constructor.


Here are the numbers I just got, with N = 1000 and P = 11:

Inserted 10000 triples in 267ms, 37453 insertions per second
Queried 10000 triples for X in 7481ms, 1336 queries per second, total err: 0
Queried 10000 triples for Y in 4511ms, 2216 queries per second, total err: 0
Queried 10000 triples for Z in 345ms, 28985 queries per second, total err: 0

I think I need to better understand @rogert’s nybble trick to fix up those query times.


The trick is that every triplet write is done twice:

  1. Store the x,y > z relation in the first 4 bits of each byte - this works as usual, by adding 1 to every byte that needs to be incremented.
  2. Store the x,z > y relation in the last 4 bits of each byte - this works by adding 16 to every byte that needs to be incremented. If you want fast queries for X, then store the y,z > x relation instead.

It is equivalent to having two separate memories using 4-bit counters instead of 8-bit bytes, hence using half the space.

The drawback is that half a byte can only accept 15 increments before spilling into the other 4 bits, which could be quite restrictive.
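
A toy Python sketch of that packing, with two independent 4-bit counters sharing one byte (the helper names are made up; the saturation guard just illustrates the 15-increment limit described above):

LIMIT = 15  # a 4-bit counter can hold at most 15

def inc_low(byte):       # x,y > z counter lives in bits 0..3: add 1
    return byte + 1 if (byte & 0x0F) < LIMIT else byte

def inc_high(byte):      # x,z > y counter lives in bits 4..7: add 16
    return byte + 16 if (byte >> 4) < LIMIT else byte

def low(byte):           # read the x,y > z counter
    return byte & 0x0F

def high(byte):          # read the x,z > y counter
    return byte >> 4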

In certain usage cases - random (aka orthogonal, low-overlap) SDRs written only once - when the memory saturates (gets ugly with more writes), the highest values are ~12 (12 increments on a single byte), which is close to but within the limit of 15.

But that assumption has important limits:

  1. It forces the user to somehow avoid duplicate writes, which is limiting and tricky. For example, performing a query first and doing the write only if the data to be written isn’t already there. But that is both expensive and (again) tricky, since given the imperfect nature of SDRs you need to define a measure of “already there”.
  2. Or somehow check for overflow of individual (half-)bytes and… do something about it.
  3. And most importantly, it deters using these associative memories directly on “real” data instead of randomly generated SDRs - like a sensor repeating the same or very close values many times.

The same limitations apply to full-byte encoding too, but that has an order of magnitude more… cushioning.

Associative memories can be powerful tools, yet whoever uses them has to understand their limitations.
E.g. that’s why the “Temporal Memory” above (not to be confused with HTM’s TM) makes extensive use of randomly generated seed tricks - to avoid writing several Z’s at the same (X,Y) address.


@cezar_t

Triadic Memory and the algorithms derived from it learn in one shot, so one shouldn’t store similar associations on top of each other, as is always done in slow-learning systems. Before storing any triple, one should do a query to test whether the information has been seen before. The right measure is an overlap metric with threshold p (so there’s no additional parameter to be tuned). See the Deep Temporal Memory implementation for an example.

Doing this extra query before storing a new triple certainly has a cost, but overall this ought to be less expensive than any kind of slow learning…
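
A hedged sketch of that query-before-store pattern, assuming a hypothetical TriadicMemory API with store(x, y, z) and query_z(x, y), and SDRs represented as lists of ON-bit indices:

def overlap(a, b):
    # number of ON bits the two SDRs have in common
    return len(set(a) & set(b))

def store_if_new(mem, x, y, z, p):
    # query the Z slot first; only store if the recalled Z isn't already close enough
    recalled = mem.query_z(x, y)
    if overlap(recalled, z) < p:
        mem.store(x, y, z)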

Awesome work, Cezar.

I’d be interested in whether variations of the algorithm could be applied to simulation learning. What if we could instantly save the delta versus the SIM, to apply and capture training data for AI? This would also force the deep dive into asking “why”, accelerating learning and the remodeling of the data to decide whether we retrain AI/ML models.

Alex Seguin

@POv Sure, it learns in one shot, and that’s great, but repetition is much more than just an embarrassing drawback of “slow learning”.

There is a difference between requiring thousands of repetitions because… “backpropagation”, and using a reasonable number of repetitions to filter out the 0.1% of associations that are actually rules from the remaining 99.9% of irrelevant coincidences.


@alexseguin - this is @POv’s work, I just mentioned it on this forum. His paper clicked because I had used similar addressing for different content structures.

I don’t think I understand your question. I don’t know what simulation learning is, nor what “delta versus SIM” means. Can you be more specific?

Ah, ha! This system is really interesting, in that there are a lot of knobs to tweak on the implementation to tune performance. Dropping N from 1,000 to 700 doubles the speed across the board, for example, P also has a sharp performance/capacity curve, and of course this is all before considering parallelization.

I’ve been doing a lot of benchmarking of different parameters, and the sparse population turns out to be really sensitive in terms of performance (capacity, query rate). The short version is that capacity and the speed of writes and reads all improve with P between 3 and 8 for N between 500 and 1000, vs. the (10, 1000) for (P, N) discussed previously.

I’ve also tested an elementary temporal memory using two triadic memories, with (P, N) of (4, 750), and it does one-shot learning of a sequence 100,000 elements long, with almost perfect recall:

$ ./temporal -s 1 -l 100000 |grep -v 'distance: 0'       
Sequence 0, SDR 6250: distance: 2, overlap: 3
Sequence 0, SDR 28044: distance: 1, overlap: 4

(using grep to filter out perfect recalls; only two out of 100k were not perfect)

If you have the Rust toolchain installed (https://rustup.rs), you can clone the repo and then run:

cargo run --release --example=temporal -- -h

(note the -- -h; the double dash means "pass the rest of the args to the example program"), which will print out:

triadic-memory 0.1.0

USAGE:
    temporal [OPTIONS]

OPTIONS:
    -h, --help                     Print help information
    -l, --length <LENGTH>          Length of each stored sequence [default: 100]
    -N, --size <SIZE>              Size of the SDRs [default: 750]
        --nostalgia                Check recall on earliest sequence at the end of the test
    -P, --pop <POP>                Number of active connections in the SDRs [default: 4]
    -r, --pre-roll <PRE_ROLL>      How many items to feed each sequence in before trying to recall
                                   [default: 2]
    -s, --sequences <SEQUENCES>    Number of sequences to store [default: 10]
    -V, --version                  Print version information

cargo run --release --example=temporal -- -s 1 -l 1000 will store and recall a 1,000 item sequence, for example.


I’m not sure whether such a low P is really useful. It could be in certain cases.
SDR encoding usually reflects some real-life data, and for redundancy and gap-overlap reasons encoders represent a single value by multiple ON bits.

e.g. redundancy:

  • a 20-bit P expands into 190 (= 20*19/2) bit-pair addresses.
  • a 4-bit P expands into only 6 (= 4*3/2) addresses, which makes it roughly 30 times more vulnerable to random overlaps (collisions) and other types of noise (see the quick check below).
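
A quick check of that pair-count arithmetic, C(P, 2) pairs from P active bits:

from math import comb

# number of bit-pair addresses generated by a P-bit SDR: C(P, 2) = P*(P-1)/2
print(comb(20, 2))   # 190
print(comb(4, 2))    # 6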

While lower P makes collisions less likely, the alterations they produce are much more significant.

Sure, for symbolic data like disjoint categories a very low P might be fine, but then one also wonders what the practical advantages of very sparse SDRs are over plain integer IDs and hash tables.


Yeah, whether or not to use a more traditional structure is always a question, but it seems to me that for associative recall you should be as sparse as your noise and error tolerance allows. A P=20 SDR with N=1000 starts to fall apart almost immediately:

$ time cargo run --release --example=sequence -- -s 1 -l 10000 -N 1000 -P 20 |rg -v 'distance: 0'
    Finished release [optimized] target(s) in 0.01s
     Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -s 1 -l 10000 -N 1000 -P 20`
Sequence 0, SDR 3: distance: 3, overlap: 19
Sequence 0, SDR 6: distance: 2, overlap: 19
Sequence 0, SDR 101: distance: 2, overlap: 19
Sequence 0, SDR 156: distance: 2, overlap: 19
Sequence 0, SDR 207: distance: 3, overlap: 19
Sequence 0, SDR 552: distance: 2, overlap: 19
Sequence 0, SDR 656: distance: 3, overlap: 19
Sequence 0, SDR 887: distance: 3, overlap: 19
Sequence 0, SDR 957: distance: 2, overlap: 19
Sequence 0, SDR 1017: distance: 2, overlap: 19
Sequence 0, SDR 1425: distance: 2, overlap: 19
Sequence 0, SDR 2056: distance: 2, overlap: 19
Sequence 0, SDR 2155: distance: 2, overlap: 19
Sequence 0, SDR 2379: distance: 2, overlap: 19
Sequence 0, SDR 3000: distance: 2, overlap: 19
Sequence 0, SDR 3399: distance: 3, overlap: 19
Sequence 0, SDR 3414: distance: 2, overlap: 19
Sequence 0, SDR 3627: distance: 2, overlap: 19
Sequence 0, SDR 3673: distance: 2, overlap: 19
Sequence 0, SDR 4034: distance: 2, overlap: 19
Sequence 0, SDR 4429: distance: 2, overlap: 19
Sequence 0, SDR 4556: distance: 2, overlap: 19
Sequence 0, SDR 4627: distance: 2, overlap: 19
Sequence 0, SDR 4834: distance: 2, overlap: 19
Sequence 0, SDR 4847: distance: 2, overlap: 19
Sequence 0, SDR 4885: distance: 2, overlap: 19
Sequence 0, SDR 5161: distance: 2, overlap: 19
Sequence 0, SDR 5541: distance: 2, overlap: 19
Sequence 0, SDR 5718: distance: 2, overlap: 19
Sequence 0, SDR 6015: distance: 2, overlap: 19
Sequence 0, SDR 6543: distance: 2, overlap: 19
Sequence 0, SDR 6578: distance: 2, overlap: 19
Sequence 0, SDR 6679: distance: 2, overlap: 19
Sequence 0, SDR 6882: distance: 4, overlap: 18
Sequence 0, SDR 7221: distance: 2, overlap: 19
Sequence 0, SDR 7415: distance: 2, overlap: 19
Sequence 0, SDR 7781: distance: 3, overlap: 19
Sequence 0, SDR 7866: distance: 2, overlap: 19
Sequence 0, SDR 7931: distance: 2, overlap: 19
Sequence 0, SDR 7943: distance: 2, overlap: 19
Sequence 0, SDR 8091: distance: 2, overlap: 19
Sequence 0, SDR 9575: distance: 3, overlap: 19
Sequence 0, SDR 9686: distance: 2, overlap: 19
cargo run --release --example=sequence -- -s 1 -l 10000 -N 1000 -P 20  40.22s user 0.65s system 100% cpu 40.726 total

I’m imagining that a triadic-based system would have hierarchical layers, with sparsity tending to be greater as you go deeper, because by the time you get there, your inputs should be cleaned up/autoencoded/etc.

I also slightly changed the structure of the sequence memory compared to the reference C version; my struct declaration is roughly:

pub struct SequenceMemory {
    size: usize,
    pop: usize,
    contexts: TriadicMemory,
    predictions: TriadicMemory,
    // persistent state variables
    previous_input: Sdr,
    previous_context: Sdr,
    backstory: Sdr, // union of previous_input and previous_context
    predicted_input: Sdr,
}

(in Rust, the type is specified to the right of the :.)

Compared to the C code, the variables map as follows:

y and v => combined into previous_input
c => previous_context
u => backstory
x => eliminated

This is the same algorithm, just streamlined and annotated.

Wait, that’s… 100-ish “failures” in a 10000-long sequence memorized in a single run? If you were an optimist, wouldn’t you call that a ~99% success rate?

Does the C version yield the same results?

Actually, I don’t understand what your test does, like:

  • how many sequences are there - many short ones, or a single one of 10000 SDRs?
  • are the SDRs random, or do they encode actual data, which naturally has lots and lots of both repeating and overlapping (close) values?
  • what do distance and overlap mean in your output? You are aware that in HTM terms a 6-bit overlap is so statistically significant that even if it isn’t taken as a perfect match, it is statistically “very close”, as in “very unlikely to happen by coincidence”?
  • a 19-bit overlap is virtually identical; if you generate random pairs of 20/1000 SDRs, you need quite a bit of patience (dozens of millions of trials) to get a meager 7-bit overlap. And the +1 (8th) bit of overlap is almost 1000 times less likely to happen by chance.

PS: or 100 times… whatever; 19 bits of overlap is a very significant match even for a P of 100 bits.
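
As a quick way to check those odds: the overlap of two independent random P-of-N SDRs follows a hypergeometric distribution, so a small sketch like this gives the exact probabilities:

from math import comb

def p_overlap(k, p=20, n=1000):
    # probability that two independent random p-of-n SDRs share exactly k ON bits
    return comb(p, k) * comb(n - p, p - k) / comb(n, p)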

Those are all very fair points :slight_smile:

The test I ran there (-s 1 -l 10000 -N 1000 -P 20) is one sequence, 10k elements long, using 1,000-bit SDRs with a target sparse population of 20 bits. distance is the Hamming distance between the stored SDR and the recalled SDR, and overlap is the bit overlap between the same two SDRs. -s 5 -l 1000 would do five sequences of 1,000 elements each. With N=1000 and P=20 it completely fails to learn more than a couple of sequences of 1,000 tokens each, but reducing P to 10 has no problem with fifty 1,000-token sequences.

The SDRs are completely random; the test generates them and stores them in an array, then feeds them to the triadic sequence memory once, then tries to recall the sequence, comparing recalled SDRs with the originals.
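
In rough pseudocode (random_sdr, SequenceMemory, feed and predict_next are all hypothetical placeholder names, not the actual Rust test code):

def overlap(a, b): return len(set(a) & set(b))
def hamming(a, b): return len(set(a) ^ set(b))

originals = [random_sdr(n=1000, p=20) for _ in range(10_000)]

mem = SequenceMemory(n=1000, p=20)
for sdr in originals:                     # feed the sequence once (one-shot storage)
    mem.feed(sdr)

# recall pass: walk the sequence again and compare each prediction with the original
for i in range(1, len(originals)):
    predicted = mem.predict_next(originals[i - 1])   # hypothetical recall-only query
    d = hamming(predicted, originals[i])
    o = overlap(predicted, originals[i])
    if d != 0:
        print(f"Sequence 0, SDR {i}: distance: {d}, overlap: {o}")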

I’ve not tested the C version the same way, but it’s the same algorithm, so I expect the same behavior.

Ok, Hamming distance - that’s what confused me. It isn’t used as a metric in HTM, since overlap is considered the actual relevant match.
Take noise, for example: remove 10 random bits and add another 10 random bits, and the Hamming distance between the original and the heavily changed SDR is 20 while the overlap is “only” 10, yet the chances of an actual mismatch (like another random SDR having the same 10-bit overlap) are very low.
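
A tiny Python illustration of that noise case (assuming 20-of-1000 SDRs represented as sets of ON-bit indices):

import random

n, p = 1000, 20
original = set(random.sample(range(n), p))
# drop 10 of the original ON bits, then add 10 new random ones
kept  = set(random.sample(sorted(original), p - 10))
noisy = kept | set(random.sample(sorted(set(range(n)) - original), 10))

print(len(original & noisy))   # overlap: 10
print(len(original ^ noisy))   # Hamming distance: 20 (10 removed + 10 added)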

I don’t know the exact collision math, but in 10 billion trials I could not generate a 9-bit overlap on random pairs of 20-bit SDRs. Beware that at 20/1000 the triadic/dyadic memory capacity is on the order of ~100k;
how that translates to a sequential memory built on top of it I can’t tell.

So anyway, it’s your right to say it “completely fails”, but I don’t think that’s right in ML terms. Using that “perfect match” criterion, a digit classifier stating that an image of the digit 5 has a 20% chance of having label 5 would be sooo wrong (by 80%), even when the probabilities for all other labels are lower than 20% - and based on that, the classifier still makes the correct prediction, with only a 20% “mathematical match”.

Another argument that it didn’t fail is the fact that the imperfect SDRs were still very good at correctly predicting the following SDR in the sequence.
Humans and animals work the same way: we don’t remember every detail of a path in the woods, yet we remember enough of them not to lose the trail.