Triadic Memory — A Fundamental Algorithm for Cognitive Computing

I already posted this in another thread, but here is probably the correct place.

I implemented a binary version of the triadic memory that compresses the synapses down to single bits in a dense array, so it takes up 8x less space in RAM at the cost of precision in the weights…

I made a plot of the retrieval error rate for random triplets as a function of the number of stored triplets on an N=1000, P=10 memory.
It starts degrading noticeably at around 2.5M stored random triplets.

this implementation takes up 125MB in RAM for N=1000.

assuming you need at least 300 bits to store a triplet in raw address form (3 components × P=10 active bits × 10 bits per index), 125MB is enough for about 3.3M triplets, so a binary version of the triadic memory has about 75% theoretical storage efficiency.
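Spelled out, the arithmetic behind those numbers can be checked in a few lines (assuming 10-bit indices at N=1000):

```python
from math import ceil, log2

N, P = 1000, 10

# one bit per (x, y, z) cell in the binary cube
memory_bits = N * N * N                      # 1e9 bits = 125 MB

# raw address form: 3 components, P active bits each,
# ceil(log2(N)) = 10 bits per active-bit index
triplet_bits = 3 * P * ceil(log2(N))         # 300 bits

theoretical = memory_bits // triplet_bits    # ~3.33M triplets
observed = 2_500_000                         # where degradation sets in
print(observed / theoretical)                # ~0.75
```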


Congratulations! I find it very interesting and very helpful, especially if you want to use it for multiple predictions.
Could you please share your (pseudo-)code? Thanks

This is an exciting discovery – reducing the counter size from 8 bits to 1 bit even increases the capacity per storage location!

You can easily verify this with the C implementation by changing the maximum counter value from 255 to 1. The “random forgetting” mode also seems to work quite well with 1-bit counters.

Here’s a stripped-down implementation with only the necessary components for it to work.
As you can see, it has the limitation of only being able to go in the X ⊗ Y => Z direction; that’s a side effect of how the data is stored in a packed binary vector.

import numpy as np
from math import ceil


def addresses(sdra, sdrb):
    # Cartesian product of the two key SDRs: every (x, y) pair of
    # active bits addresses one row of packed z-bits.
    addx = []
    addy = []
    for x in sdra:
        for y in sdrb:
            addx.append(x)
            addy.append(y)
    return addx, addy


def sdr_to_bin(sdr, arr):
    # Set the SDR's active bits in a packed byte array
    # (big-endian within each byte).
    for x in sdr:
        i = x // 8
        arr[i] |= (128 >> (x % 8))


class TriadicMemoryBinary:
    def __init__(self, n=1000, p=10):
        # n x n addresses, each holding n z-bits packed into ceil(n/8) bytes
        self.data = np.zeros((n, n, ceil(n / 8)), dtype=np.uint8)
        self._zsum = np.zeros(self.data.shape[2] * 8, dtype=np.int_)
        self._zbin = np.zeros(self.data.shape[2], dtype=np.uint8)
        self.retrieved_z = []
        self.p = p

    def store(self, x, y, z):
        # OR the packed z pattern into every (x, y) address
        self._zbin[:] = 0
        sdr_to_bin(z, self._zbin)
        self.data[addresses(x, y)] |= self._zbin

    def retrieve(self, x, y):
        # Sum the unpacked bits over all (x, y) addresses and take
        # the p most frequent z positions as the retrieved SDR
        self._zsum[:] = 0
        for zcol in self.data[addresses(x, y)]:
            self._zsum += np.unpackbits(zcol, bitorder='big')
        self.retrieved_z = np.argsort(self._zsum)[-self.p:]
        return self.retrieved_z


mem = TriadicMemoryBinary()

mem.store([1, 2, 3, 4, 9, 50, 80, 60, 81, 150],
          [5, 6, 7, 8, 151, 10, 61, 84, 92],
          (0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

mem.retrieve([1, 2, 3, 4], [5, 6, 7, 8])
print(sorted(mem.retrieved_z))



thanks

Quite cool, thanks!

I wonder whether using np.packbits() at store time, instead of sdr_to_bin(), would make it faster.
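For what it’s worth, a quick check (reusing the sdr_to_bin routine from the posted code) suggests the two produce identical packed arrays, so swapping them in should be safe:

```python
import numpy as np
from math import ceil


def sdr_to_bin(sdr, arr):
    # same routine as in the posted code: set bit x, big-endian per byte
    for x in sdr:
        arr[x // 8] |= (128 >> (x % 8))


n = 1000
sdr = [1, 2, 3, 4, 9, 50, 80, 60, 81, 150]

# loop version
packed_loop = np.zeros(ceil(n / 8), dtype=np.uint8)
sdr_to_bin(sdr, packed_loop)

# vectorized version: densify, then pack 8 bits per byte
dense = np.zeros(n, dtype=np.uint8)
dense[sdr] = 1
packed_np = np.packbits(dense)   # bitorder='big' is the default

print(np.array_equal(packed_loop, packed_np))
```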

Well, I guess it can be numba-fied, which would take advantage of sparsity.

Hopefully someone will eventually take the time to compare the two versions in terms of performance and… quality? (instead of accuracy).

Stuff like conditions in which they fail and how they fail.

The idea of trying it with bits alone is great anyway

PS I’m not really disappointed by not having the possibility to query for X or Y

  • that was a PITA in terms of complexity and performance anyway
  • whoever feels they need the capability to retrieve e.g. Y from X and Z can simply add a second memory. It will still be 1/4 the size of the 8-bit version. Some applications might even be happy writing twice into the same memory, just rotating the x, y, z order to avoid collisions between the two writes. Or use permuted keys if that isn’t sufficient
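A sketch of that second-memory idea, using a compact re-implementation of the binary memory from the post above (the meshgrid addressing and small n=200 are my own shorthand, not the original code):

```python
import numpy as np
from math import ceil


class TriadicMemoryBinary:
    """Compact copy of the posted binary memory (X ⊗ Y -> Z only)."""

    def __init__(self, n=200, p=10):
        self.data = np.zeros((n, n, ceil(n / 8)), dtype=np.uint8)
        self.p = p

    def store(self, x, y, z):
        zbin = np.zeros(self.data.shape[2], dtype=np.uint8)
        for bit in z:                      # pack the z SDR into bits
            zbin[bit // 8] |= 128 >> (bit % 8)
        ix, iy = np.meshgrid(x, y, indexing='ij')
        self.data[ix.ravel(), iy.ravel()] |= zbin

    def retrieve(self, x, y):
        ix, iy = np.meshgrid(x, y, indexing='ij')
        sums = np.unpackbits(self.data[ix.ravel(), iy.ravel()],
                             axis=1).sum(axis=0)
        return np.sort(np.argsort(sums)[-self.p:])


# Two instances give both query directions at twice the binary cost,
# which is still 1/4 the size of the 8-bit version.
mem_xy_z = TriadicMemoryBinary()   # answers (x, y) -> z
mem_zx_y = TriadicMemoryBinary()   # answers (z, x) -> y, rotated key order


def store(x, y, z):
    mem_xy_z.store(x, y, z)
    mem_zx_y.store(z, x, y)        # same triplet, rotated


x = [1, 2, 3, 4, 9, 50, 80, 60, 81, 150]
y = [5, 6, 7, 8, 151, 10, 61, 84, 92, 130]
z = list(range(10))
store(x, y, z)

print(mem_xy_z.retrieve(x, y))   # recovers z
print(mem_zx_y.retrieve(z, x))   # recovers y
```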

yeah, it can definitely be made faster. I have another implementation in C++ where I reused some code I wrote for processing sparse binary matrices, with some bit-shifting hacks for optimization, and it’s slightly faster.
I just didn’t share it because the code is a big pile of spaghetti I’m ashamed of, and I find C++ ugly.

I agree that we need better tests for this kind of stuff; storing random patterns is close to the best-case scenario. I wonder how it deals with real-world data, where correlations can cause memory interference.

The potential is promising anyway. One could even consider increasing it 8 times (back to a “cube”) while still using bits for storage: like 8 binary memories stacked on top of each other, aka an 8-shelf stack.
It would work if there’s a reliable means to compute, from the key SDRs, a “storage shelf” between 0 and 7. Or up to 31 or 63 for really massive memories.
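A minimal sketch of the shelf-selection part, assuming any stable hash of the key SDRs will do (md5 here is just an illustrative choice):

```python
import hashlib

N_SHELVES = 8   # 8 binary shelves ~ the footprint of one 8-bit cube


def shelf_index(x_sdr, y_sdr, n_shelves=N_SHELVES):
    # Deterministic "storage shelf" computed from the key SDRs, so that
    # store and retrieve agree on which shelf a triplet lives in.
    key = repr((sorted(x_sdr), sorted(y_sdr))).encode()
    return hashlib.md5(key).digest()[0] % n_shelves


# usage pattern, assuming the TriadicMemoryBinary class from earlier:
#   shelves = [TriadicMemoryBinary() for _ in range(N_SHELVES)]
#   shelves[shelf_index(x, y)].store(x, y, z)
#   shelves[shelf_index(x, y)].retrieve(x, y)
print(shelf_index([1, 2, 3], [4, 5, 6]))
```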


If you make N = 2000, it will take up about the same amount of space as the 8-bit version at N = 1000 (1 GB); maybe that’s enough to make a larger memory.

It is larger, but P needs to remain at 10 bits in order to take advantage of the extra real estate.

Sounds reasonable, but I’m not sure how well that works with “real” data.
E.g. numerical encodings depend on overlap in order to “sense” that two values are “close” to each other. Sparsifying too much would be counterproductive.
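A toy example of what’s at stake, using a hypothetical sliding-window scalar encoder where nearby values share bits and distant ones don’t:

```python
def encode_scalar(value, n=1000, p=10, resolution=1.0):
    # Toy encoder: p consecutive bits starting at a position
    # proportional to the value, so nearby values share bits.
    start = int(value / resolution) % (n - p)
    return set(range(start, start + p))


a = encode_scalar(42.0)
b = encode_scalar(45.0)
c = encode_scalar(900.0)
print(len(a & b))   # 7 shared bits: close values overlap
print(len(a & c))   # 0 shared bits: distant values don't
```

Shrinking p (sparsifying) shrinks exactly that shared signal, which is the overlap the memory relies on.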

yeah, unless we find a way to make reliable encoders that take larger SDRs and output sparser but still reliable ones, sort of like how CA3 does it, maybe.

but ok, maybe it’s time to look for a good real-world test dataset for playing around with this concept.

I recently looked at examples of reservoir computing (which is quite cool, btw); they use different chaotic systems for benchmarking.

That’s on the “physics” end of the spectrum; the opposite one is language. Convert words to SDRs and see what it can remember/predict.

PS I think some kind of reservoir for SDRs (or liquid state machine) would be really useful for time-series recall and prediction.

well, I remember seeing somewhere a reservoir network that was able to learn the Lorenz attractor from just a snippet of a generated sequence.

maybe the output of some chaotic systems could be a good dataset for testing this.

as for SDR reservoirs, I have a few ideas. I’m thinking of something like a stack of sequence memories feeding SDRs up the stack but running at different timescales.
or maybe a large shifting stack of arrays, again running at different timescales.

The “deep temporal memory” algorithm is based on a reservoir computing architecture:

A temporal signal propagates through a cascade of self-supervising memory instances.
The prediction of the next step in time is attached to a readout of the reservoir’s state.

I was thinking of building something more like this:

The blue stacked layers with circle arrows are “slow” feedforward layers whose output is not simply a feedforward projection but an exponential average of the projection.

those layers’ linear outputs get converted to SDRs via WTA or some other method and sent up the stack for more “slowification”.

this way, as you go up the stack, the layers’ outputs change at an increasingly slower pace.
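A rough sketch of one such “slow” stage as I imagine it (the random projection, decay constants and WTA details are all placeholder choices, not a worked-out design):

```python
import numpy as np


class SlowLayer:
    """One 'slow' stage: project the input SDR, keep an exponential
    moving average of the projection, emit a k-winners-take-all SDR."""

    def __init__(self, n_in=1000, n_out=1000, p=10, decay=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((n_out, n_in))  # random projection
        self.state = np.zeros(n_out)
        self.decay = decay      # closer to 1 = slower-changing layer
        self.p = p

    def step(self, sdr):
        dense = np.zeros(self.w.shape[1])
        dense[list(sdr)] = 1.0
        # exponential average: state drifts slowly toward the projection
        self.state = self.decay * self.state + (1 - self.decay) * self.w @ dense
        # WTA: the p most active units form the output SDR
        return np.sort(np.argsort(self.state)[-self.p:])


# stack: higher layers see increasingly slow-changing SDRs
stack = [SlowLayer(decay=d) for d in (0.5, 0.9, 0.99)]
sdr = [1, 2, 3, 4, 9, 50, 80, 60, 81, 150]
for layer in stack:
    sdr = layer.step(sdr)
print(sdr)
```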

What do you think about an HTM encoder, e.g. RDSE, for this case?

That is quite tricky. Bit-pair associative memories need high overlap in order to recognize similarity, but they also need dispersion (aka low overlap in general) in order to ensure high capacity.

The two fight against each other; simply projecting into a sparser P/N is unlikely to be sufficient.

What’s RDSE? sorry, I’m not familiar with this term.

I think I found a way to make P=100 without affecting storage capacity; maybe that can mitigate the robustness problem. I’ll try implementing it and post the results.


Randomly Distributed Scalar Encoder - It’s a scalar encoding algorithm. I think this is a good explanation
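For readers who don’t want to follow the link, here is a toy RDSE-style sketch (the hashing scheme is my own illustration, not Numenta’s actual implementation): the value picks a bucket, each of w consecutive bucket offsets hashes to a randomly distributed bit position, so adjacent buckets share w-1 of their w offsets.

```python
import hashlib


def rdse_encode(value, n=1000, w=10, resolution=1.0, seed="rdse"):
    # RDSE-style sketch: value -> bucket; each of w consecutive bucket
    # offsets is hashed to a (randomly distributed) bit position.
    bucket = int(value / resolution)
    bits = set()
    for k in range(w):
        h = hashlib.md5(f"{seed}:{bucket + k}".encode()).digest()
        bits.add(int.from_bytes(h[:4], "big") % n)
    return bits


a = rdse_encode(10.0)
b = rdse_encode(11.0)
c = rdse_encode(500.0)
print(len(a & b))   # close values share most of their bits
print(len(a & c))   # distant values share few, if any
```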

oh, yeah, that would work really well for scalars, but we need a more general way to encode arbitrary data like images, sound, and text.