Triadic Memory — A Fundamental Algorithm for Cognitive Computing

They seem to be performing the same task, but they are structured differently. That's only part of the task, though; there's probably much more going on. I'd bet on ring attractors.


@JarvisGoBrr you have mentioned your tests with N=2000 several times.
In my case I have N=1920, P=60, but I cannot allocate the memory even with the current 1-bit C version of TriadicMemory.
The memory size is (1920 * 1920 * 1920 + 7) / 8 bytes.
Does anybody have this problem?


I found out why it failed. Thanks!


Testing with N=1920, P=60 and the current 1-bit C version of TemporalMemory, I found that the time per prediction increases about 25-fold compared to N=768, P=24: a single prediction takes 100 ms.
Maybe we need to optimize/accelerate the query process somehow.


What language are you using? If it's just Python, it makes total sense for it to be slow.


I use C or C++ with GCC in Ubuntu.


A larger memory will benefit less from cache locality, and 60³ = 216,000 is roughly 15× more addresses to look up than 24³ = 13,824, so maybe that's the reason, then.

But what exactly are you predicting?


@JarvisGoBrr I am testing it with SDRs of 2D data, which is why we need a longer SDR.
I believe the problem is more a bigger N than a bigger P: the array has N³ bytes in the 8-bit version and (N³ + 7)/8 bytes in the 1-bit version, so memory access time increases dramatically!


Well, something is not right; a triadic memory usually gets thousands of reads per second.
There must be something funky going on in your code. How are you iterating over the addresses?


@JarvisGoBrr I tested TemporalMemory1Bit (the original C version on GitHub) in a very common way:

  1. I have 7 SDRs, each with N=1920, P=60;
  2. I feed them into the TM as a repeating sequence: sdr1, sdr2, …, sdr7, sdr1, sdr2, sdr3, …
  3. and check whether it can predict sdr(j+1) when given sdr(j).

They’re getting about 1k/sec; the limitation is RAM bandwidth, and with N that large, it requires a ton of memory IO. When I run the same size SDRs through my benchmark, I only get 290 z-reads/sec.


@nebkor your benchmark results look similar to mine, because each prediction involves 3 reads and 2 writes.


No, it’s slower than that, because my benchmark is not doing any prediction, just storing and reading SDRs in a single triadic memory.


@nebkor if you use the C version, you can speed it up using OpenMP. For me it is 2-2.5× faster.
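As a sketch of where the `#pragma` can go: parallelizing the z-accumulation of a 1-bit query over output positions needs no locks, since each thread writes disjoint `counts[z]` entries. The bit layout below (bit index `(x·N + y)·N + z`) and the function name are my assumptions, not necessarily what the published C version does:

```c
#include <stddef.h>
#include <stdint.h>

/* z-query sketch for a 1-bit triadic store: for every active (x,y)
   pair, count the set bits along z.  Compile with -fopenmp to run in
   parallel; without it the pragma is ignored and the loop is serial. */
void query_z(const uint8_t *w, int n, int p,
             const int *x, const int *y, int *counts) {
    #pragma omp parallel for
    for (int z = 0; z < n; z++) {
        int c = 0;
        for (int i = 0; i < p; i++)
            for (int j = 0; j < p; j++) {
                size_t bit = ((size_t)x[i] * n + y[j]) * (size_t)n + z;
                c += (w[bit >> 3] >> (bit & 7)) & 1;
            }
        counts[z] = c;   /* each thread owns distinct z, so no races */
    }
}
```

Looping z on the outside trades the row-contiguous access of the serial version for race-free parallelism; keeping the pair loop outermost with `reduction(+:counts[:n])` is the other common option.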


Hey, I don't know if anyone has had this idea yet, but I tried to leverage the fact that in many cases the weights may be sparse too, in order to trade some processing performance for large memory savings. By using a hash map instead of a matrix fully initialized with 0s, we can potentially save a lot of memory by simply deducing the 0s when needed. The implementation I produced is definitely slower (first because I had to avoid using numba; I haven't really tried to work around that yet, I just wanted a quick POC), but it works, and the memory savings are most likely substantial in most cases, i.e. as long as the associative memory is not too loaded.

For some use cases, trading computation for memory might be advantageous, so I thought it would be good to share this idea and implementation. I will definitely use it myself, at least.

Here is the PR: add an implementation trading computation for memory by clems4ever · Pull Request #20 · PeterOvermann/TriadicMemory · GitHub.

Let me know what you think.


@clems4ever interesting!
Do you have any performance comparison results?


I’ve done a bit of very un-rigorous math.

A list of an int and a float (a synapse) takes up 72 bytes of memory.
A dictionary takes on average 96 bytes per stored item, if we assume 2/3 of the hash table is full.
So for a dict-based implementation to use less memory than a regular dense byte array (1 byte per cell), fewer than 1 / (96 + 72) ≈ 0.6% of the cells can be occupied; in other words, the memory must be more than about 99.4% empty to benefit.

It's good to remember that Python dictionaries already rely on large arrays with sparsely spread pointers under the hood.

I've tried making a custom hash table to get more efficiency, but I couldn't manage to make anything remotely better than the Python implementation.

In practice, I found it very hard to beat the good ol' array of bytes at anything, honestly; CPUs are optimized to the bone for these kinds of arrays.


@thanh-binh.to , I have nothing rigorous, and I don't think it's a good idea to benchmark this implementation anyway because, as I said, it is far from optimized; it was just a POC. With the code in the PR, the Dyadic Memory is around 10× slower on 10k queries, but it reduced memory consumption by ~5800× for my specific use case.

@JarvisGoBrr , I don't think Python is ideal for this kind of computation anyway. Also, if the SDR size is big, I would not assume the hash table can reach a 2/3 load factor; that's the point. Besides, it was just prototyping to give me confidence that it could work, so that I can raise the SDR size without worrying too much about memory usage on my machine during experimentation.


I think numba uses dictionaries with fixed-type keys and values, which should allow for even more memory savings. Speed improvements might be less impressive for the dictionary/hash-table lookups themselves, but other operations should benefit.


I eventually wrote an implementation with hash maps in Rust and bindings in Python. After a few attempts with numba I just gave up, lol…
