Triadic Memory — A Fundamental Algorithm for Cognitive Computing

Just so we can compare:

My implementation takes 60 s to store a million random patterns and 250 s to Z-read all of them with N=1000, on a Ryzen 3 3.7 GHz CPU with 2666 MHz RAM.

That’s around 16,000 writes/s and 4,000 Z-reads/s.

I’m sure it can go faster, though, since the code is partially implemented in Python.

What speeds are you getting?

Edit: corrected values.

If I just time the writes, writing one million triples (so three million SDRs) takes about 40 seconds, or about 25k writes/sec. That’s using SDRs with N=1000 and P=20.

In terms of operations per second, with those SDRs I’m getting:

writes/sec: 25k
x reads/sec: 1100
y reads/sec: 2800
z reads/sec: 4500

Each of the read rates roughly triples if I set P=10 (and writes jump to 178k/sec), but I’ve found that with single-bit connections in the memories, larger P values work better.
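For anyone following along, N is the SDR size in bits and P is the number of active bits, so an SDR is just a sorted list of P indices out of N. A minimal Rust sketch of generating one such random sparse pattern, purely for illustration and not code from the repo:

// An SDR here is the sorted list of its P active bit positions out of N.
fn random_sdr(n: usize, p: usize, seed: &mut u64) -> Vec<usize> {
    // xorshift64 RNG, good enough for a demo
    let mut next = || {
        *seed ^= *seed << 13;
        *seed ^= *seed >> 7;
        *seed ^= *seed << 17;
        *seed
    };
    // Partial Fisher-Yates shuffle: keep the first p of 0..n.
    let mut pool: Vec<usize> = (0..n).collect();
    for i in 0..p {
        let j = i + (next() as usize) % (n - i);
        pool.swap(i, j);
    }
    let mut sdr = pool[..p].to_vec();
    sdr.sort_unstable();
    sdr
}

fn main() {
    let mut seed = 0x1234_5678_9abc_def0_u64;
    let x = random_sdr(1000, 20, &mut seed); // N = 1000, P = 20
    println!("{} active bits: {:?}", x.len(), x);
}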

My hardware is an AMD Ryzen 9 3900X at 3.8 GHz. I can’t find the specs for the RAM, though; reasonably fast DDR3 or DDR4 from a couple of years ago, probably.

1 Like

@JarvisGoBrr, if you’d like to replicate, do the following:

  1. install Rust via https://rustup.rs
  2. clone my repo from Joe Ardent / triadic rust · GitLab
  3. cd triadic-rust
  4. run cargo run --release --example=triadic -- -i 5000 -P 20 -N 1000 -t 20000, which will store 20k triples, reporting write and recall rates every 5k triples. When I run it, I get:
cargo run --release --example=triadic -- -i 5000 -P 20 -N 1000 -t 20000
warning: unused manifest key: package.maintenance
    Finished release [optimized] target(s) in 0.02s
     Running `target/x86_64-unknown-linux-gnu/release/examples/triadic -i 5000 -P 20 -N 1000 -t 20000`
5000,1000,20,25641,1091,2837,4500,20,20,20,20,20,20,
10000,1000,20,25380,1096,2836,4504,20,20,20,20,20,20,
15000,1000,20,25641,1101,2844,4504,20,20,20,20,20,20,
20000,1000,20,25510,1099,2832,4504,20,20,20,20,20,20,

The columns are

triples so far,N,P,writes/sec,x/sec,y/sec,z/sec,...

and the last six columns are the maximum populations in the recalled SDRs and the minimum overlaps between recalled and stored SDRs.
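In case anyone wants to script against that output, here’s a quick Rust sketch of splitting a row into the named fields; nothing here goes beyond the column description above:

// Parse one benchmark row like "20000,1000,20,25510,1099,2832,4504,20,20,20,20,20,20,".
fn parse_row(line: &str) -> Option<(u64, u64, u64, u64, u64, u64, u64, Vec<u64>)> {
    let f: Vec<u64> = line
        .trim_end_matches(',')
        .split(',')
        .map(|s| s.trim().parse().ok())
        .collect::<Option<Vec<u64>>>()?;
    if f.len() < 7 {
        return None;
    }
    Some((
        f[0],            // triples stored so far
        f[1],            // N
        f[2],            // P
        f[3],            // writes/sec
        f[4],            // x reads/sec
        f[5],            // y reads/sec
        f[6],            // z reads/sec
        f[7..].to_vec(), // max populations and min overlaps
    ))
}

fn main() {
    let row = parse_row("20000,1000,20,25510,1099,2832,4504,20,20,20,20,20,20,");
    println!("{:?}", row);
}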

I wanted to call attention to the numbers from this comment. The best-case perfect recall for this memory system, using 8-bit connection weights, was roughly 300 elements in a sequence, out of a possible 2000 positions.

My current system, using 1-bit connection weights and no random SDRs in context creation, gets the following numbers:

$ cargo run --release --example=sequence -- -N 1000 -P 20 -s 40 -l 50 |awk '{print $NF}' |sort |uniq -c |sort -k 2 -rn
   Compiling triadic-memory v0.1.0 (/home/ardent/git/triadic_rust)
    Finished release [optimized] target(s) in 0.76s
     Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -N 1000 -P 20 -s 40 -l 50`
   1432 20
      1 15
      1 14
      2 13
      2 12
      3 11
      3 10
      4 9
      1 7
      3 6
     10 5
     20 4
     37 3
     92 2
    119 1
    190 0

Note the nearly 5x improvement in perfect recall.
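For reference, the number being counted in the last field above is, as I read it, the overlap between a recalled SDR and the stored one, so 20 means perfect recall at P=20. A small sketch of that overlap measure (illustrative only, not the repo’s code):

// Overlap = number of active bit positions two SDRs share.
// Both inputs are sorted lists of active indices; overlap == P means perfect recall.
fn overlap(a: &[usize], b: &[usize]) -> usize {
    let (mut i, mut j, mut count) = (0, 0, 0);
    while i < a.len() && j < b.len() {
        if a[i] == b[j] {
            count += 1;
            i += 1;
            j += 1;
        } else if a[i] < b[j] {
            i += 1;
        } else {
            j += 1;
        }
    }
    count
}

fn main() {
    let stored = vec![3, 17, 42, 99, 250];
    let recalled = vec![3, 17, 50, 99, 250];
    println!("overlap = {}", overlap(&stored, &recalled)); // prints: overlap = 4
}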

2 Likes

So how does the context in your version work?

You’re right, it makes no difference.

Your Rust benchmarks look impressive.

One caveat: At N=1000 and P=20, one million random triples exceeds the capacity. I’d stick to N=1000 and P=10 for performance comparisons.
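(Rough numbers, assuming capacity for random triples scales as (N/P)^3: at P=10 that is (1000/10)^3 = 1,000,000, while at P=20 it is only (1000/20)^3 = 125,000, so a million random triples is well past it.)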

On an Apple M1 system with the 1-bit C implementation:

writes/sec: 117k
x reads/sec: 4.3k
y reads/sec: 12k
z reads/sec: 14.5k

If I remember correctly, the Ryzen 9 runs at almost the same clock speed but has 16x more cache than a Ryzen 3; maybe that’s where the difference comes from.

Using the previous input as the new context: Use previous input as default new context. (36549793) · Commits · Joe Ardent / triadic rust · GitLab
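In other words, the context for each step is simply the previous input itself, with no random SDR mixed in. A toy sketch of that recall rule, where read stands in for a triadic-memory query and all names here are made up:

// Toy stand-in for the recall step being discussed. `read(context, input)`
// represents a triadic-memory query; the only point illustrated is that the
// context is the previous input itself, with no random SDR mixed in.
fn predict_next<F>(sequence: &[Vec<usize>], read: F) -> Option<Vec<usize>>
where
    F: Fn(&[usize], &[usize]) -> Vec<usize>,
{
    if sequence.len() < 2 {
        return None; // need a previous item to serve as the context
    }
    let context = &sequence[sequence.len() - 2]; // previous input reused as context
    let input = &sequence[sequence.len() - 1];   // most recent input
    Some(read(context.as_slice(), input.as_slice()))
}

fn main() {
    let seq = vec![vec![1, 2, 3], vec![4, 5, 6], vec![7, 8, 9]];
    // Dummy read that just echoes the input, to show the call shape.
    let guess = predict_next(&seq, |_ctx, input| input.to_vec());
    println!("{:?}", guess);
}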

1 Like

On my system, with P=10 and N=1000, I get

writes: 178k
x reads: 3.1k
y reads: 8.3k
z reads: 17k

I wonder why our Y speeds are so different 🙂

Hmm, not really sure, but that simplification might fail to distinguish between similar but not identical sequences.

1 Like

I’ll test that later today. My motivation was to come up with something that could be used in a recall method that doesn’t modify the memory stores, and this bumped recognition performance up by nearly 50%.

Well, motivation is good 🙂

1 Like

The performance of y reads will be closer to that of x or z reads depending on (1) the system’s cache and (2) the inner loop’s stride length N/8.

Due to the switch from 8-bit to 1-bit storage, the stride length decreased by a factor of 8, which on some systems will be small enough to benefit from better caching behavior.
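To make the stride argument concrete, here is a back-of-the-envelope sketch; it assumes the N x N x N 1-bit cube is stored row-major with z fastest, which is an assumption on my part:

fn main() {
    let n: usize = 1000;
    let bytes_per_row = n / 8;       // one (x, y, *) row: N bits = N/8 bytes
    let bytes_per_plane = n * n / 8; // one (x, *, *) plane

    // z read (x and y fixed): one contiguous N/8-byte row.
    // y read (x and z fixed): N steps, each jumping a whole row ahead.
    // x read (y and z fixed): N steps, each jumping a whole plane ahead.
    println!("z read: {} contiguous bytes", bytes_per_row);
    println!("y read: stride of {} bytes per step", bytes_per_row);
    println!("x read: stride of {} bytes per step", bytes_per_plane);
}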

1 Like

I tried installing Rust but I’m having some issues; my distro is probably broken, or maybe I’m just dumb.

But I managed to run @POv’s C implementation and got:

  • 77k writes/s
  • 8.1k z reads/s

After some optimization of my code I got:

  • 63k writes/s
  • 6.8k z reads/s when 1M patterns are stored
  • 11k z reads/s when 100k patterns are stored
  • 15k z reads/s when 10k patterns are stored

I wish I had a more powerful CPU to run those tests on.

1 Like

Read performance shouldn’t depend much on the number of stored patterns.

That depends on what optimizations it does, e.g. if a whole byte is 0 there’s no need to scan through it to see which bits are 1.

Then obviously, at low population, it can skip the bit-finding step a lot of the time.
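A rough sketch of that optimization on 64-bit words (just the generic trick, not anyone’s actual implementation):

// Walk a bit-packed row one 64-bit word at a time, skip empty words entirely,
// and only decode set bits where they exist.
fn active_bits(row: &[u64]) -> Vec<usize> {
    let mut out = Vec::new();
    for (w, &word) in row.iter().enumerate() {
        let mut bits = word;
        // At low population most words are zero, so this loop is usually skipped.
        while bits != 0 {
            let b = bits.trailing_zeros() as usize;
            out.push(w * 64 + b);
            bits &= bits - 1; // clear the lowest set bit
        }
    }
    out
}

fn main() {
    let mut row = vec![0u64; 16]; // 1024 bits, mostly empty
    row[3] |= 1 << 7;
    row[10] |= 1 << 63;
    println!("{:?}", active_bits(&row)); // [199, 703]
}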

1 Like

My implementation uses the bit-skipping trick I mentioned earlier to ignore zero 64-bit words.

What I find a bit frustrating is that the triadic memory handles 1M entries without error, while the sequence memory on top of it already has a significant failure rate at 2000 elements.

I don’t think a simple increase of P from 10 to 20 should have such a high impact.

The capacity of a temporal/sequence memory mostly depends on the dictionary size and the depth of the feedback lines.

If you wanted to store a sequence of 1M all-different items, the basic temporal/sequence memory algorithm would handle that without problem.

1 Like