Triadic Memory — A Fundamental Algorithm for Cognitive Computing

I already posted this in another thread, but here is probably the correct place.

I implemented a binary version of the triadic memory that compresses the synapses down to single bits in a dense array, so it takes up 8x less space in RAM at the cost of precision in the weights…

I made a plot of the retrieval error rate for random triplets as a function of the number of stored triplets on an N=1000, P=10 memory.
It starts degrading noticeably at around 2.5M stored random triplets.

this implementation takes up 125MB in RAM for N=1000.

assuming you need at least 300 bits to store a triplet in raw address form (3 components × P=10 active bits × 10 bits per index), 125MB is enough for about 3.3M triplets, so a binary version of the triadic memory has about 75% theoretical storage efficiency.
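Spelled out, the arithmetic behind those numbers can be checked in a few lines (assuming 10-bit indices at N=1000):

```python
from math import ceil, log2

N, P = 1000, 10

# one bit per (x, y, z) cell in the binary cube
memory_bits = N * N * N                      # 1e9 bits = 125 MB

# raw address form: 3 components, P active bits each,
# ceil(log2(N)) = 10 bits per active-bit index
triplet_bits = 3 * P * ceil(log2(N))         # 300 bits

theoretical = memory_bits // triplet_bits    # ~3.33M triplets
observed = 2_500_000                         # where degradation sets in
print(observed / theoretical)                # ~0.75
```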


Congratulations! I find it very interesting and very helpful, especially if you want to use it for multiple predictions.
Could you please share your (pseudo-)code? Thanks

This is an exciting discovery – reducing the counter size from 8 bits to 1 bit even increases the capacity per storage location!

You can easily verify this with the C implementation by changing the maximum counter value from 255 to 1. The “random forgetting” mode also seems to work quite well with 1-bit counters.

Here’s a stripped-down implementation with only the necessary components for it to work.
As you can see, it has the limitation of only being able to go in the X ⊗ Y => Z direction; that’s a side effect of how the data is stored in a packed binary vector.

import numpy as np
from math import ceil


def addresses(sdra, sdrb):
    # Cartesian product of the two key SDRs: every (x, y) pair of
    # active bits addresses one row of packed z-bits.
    addx = []
    addy = []
    for x in sdra:
        for y in sdrb:
            addx.append(x)
            addy.append(y)
    return addx, addy


def sdr_to_bin(sdr, arr):
    # Set the SDR's active bits in a packed byte array
    # (big-endian within each byte).
    for x in sdr:
        i = x // 8
        arr[i] |= (128 >> (x % 8))


class TriadicMemoryBinary:
    def __init__(self, n=1000, p=10):
        # n x n addresses, each holding n z-bits packed into ceil(n/8) bytes
        self.data = np.zeros((n, n, ceil(n / 8)), dtype=np.uint8)
        self._zsum = np.zeros(self.data.shape[2] * 8, dtype=np.int_)
        self._zbin = np.zeros(self.data.shape[2], dtype=np.uint8)
        self.retrieved_z = []
        self.p = p

    def store(self, x, y, z):
        # OR the packed z pattern into every (x, y) address
        self._zbin[:] = 0
        sdr_to_bin(z, self._zbin)
        self.data[addresses(x, y)] |= self._zbin

    def retrieve(self, x, y):
        # Sum the unpacked bits over all (x, y) addresses and take
        # the p most frequent z positions as the retrieved SDR
        self._zsum[:] = 0
        for zcol in self.data[addresses(x, y)]:
            self._zsum += np.unpackbits(zcol, bitorder='big')
        self.retrieved_z = np.argsort(self._zsum)[-self.p:]
        return self.retrieved_z


mem = TriadicMemoryBinary()

mem.store([1, 2, 3, 4, 9, 50, 80, 60, 81, 150],
          [5, 6, 7, 8, 151, 10, 61, 84, 92],
          (0, 1, 2, 3, 4, 5, 6, 7, 8, 9))

mem.retrieve([1, 2, 3, 4], [5, 6, 7, 8])
print(sorted(mem.retrieved_z))



thanks

Quite cool, thanks!

I wonder whether using np.packbits() at store time, instead of sdr_to_bin(), would make it faster.
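For what it’s worth, a quick check (reusing the sdr_to_bin routine from the posted code) suggests the two produce identical packed arrays, so swapping them in should be safe:

```python
import numpy as np
from math import ceil


def sdr_to_bin(sdr, arr):
    # same routine as in the posted code: set bit x, big-endian per byte
    for x in sdr:
        arr[x // 8] |= (128 >> (x % 8))


n = 1000
sdr = [1, 2, 3, 4, 9, 50, 80, 60, 81, 150]

# loop version
packed_loop = np.zeros(ceil(n / 8), dtype=np.uint8)
sdr_to_bin(sdr, packed_loop)

# vectorized version: densify, then pack 8 bits per byte
dense = np.zeros(n, dtype=np.uint8)
dense[sdr] = 1
packed_np = np.packbits(dense)   # bitorder='big' is the default

print(np.array_equal(packed_loop, packed_np))
```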

Well, I guess it can be numba-fied, which would take advantage of sparsity.

Hopefully someone will eventually take the time to compare the two versions in terms of performance and… quality? (instead of accuracy).

Stuff like conditions in which they fail and how they fail.

The idea of trying it with bits alone is great anyway

PS I’m not really disappointed by not having the possibility to query for X or Y

  • that was a PITA in terms of complexity and performance anyway
  • whoever feels they need the capability to retrieve e.g. Y from X and Z can simply add a second memory. It will still be 1/4 the size of the 8-bit version. Some applications might even be happy writing twice into the same memory, just rotating the x, y, z order to avoid collisions between the two writes. Or use permuted keys if that isn’t sufficient
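A sketch of that second-memory idea, using a compact re-implementation of the binary memory from the post above (the meshgrid addressing and small n=200 are my own shorthand, not the original code):

```python
import numpy as np
from math import ceil


class TriadicMemoryBinary:
    """Compact copy of the posted binary memory (X ⊗ Y -> Z only)."""

    def __init__(self, n=200, p=10):
        self.data = np.zeros((n, n, ceil(n / 8)), dtype=np.uint8)
        self.p = p

    def store(self, x, y, z):
        zbin = np.zeros(self.data.shape[2], dtype=np.uint8)
        for bit in z:                      # pack the z SDR into bits
            zbin[bit // 8] |= 128 >> (bit % 8)
        ix, iy = np.meshgrid(x, y, indexing='ij')
        self.data[ix.ravel(), iy.ravel()] |= zbin

    def retrieve(self, x, y):
        ix, iy = np.meshgrid(x, y, indexing='ij')
        sums = np.unpackbits(self.data[ix.ravel(), iy.ravel()],
                             axis=1).sum(axis=0)
        return np.sort(np.argsort(sums)[-self.p:])


# Two instances give both query directions at twice the binary cost,
# which is still 1/4 the size of the 8-bit version.
mem_xy_z = TriadicMemoryBinary()   # answers (x, y) -> z
mem_zx_y = TriadicMemoryBinary()   # answers (z, x) -> y, rotated key order


def store(x, y, z):
    mem_xy_z.store(x, y, z)
    mem_zx_y.store(z, x, y)        # same triplet, rotated


x = [1, 2, 3, 4, 9, 50, 80, 60, 81, 150]
y = [5, 6, 7, 8, 151, 10, 61, 84, 92, 130]
z = list(range(10))
store(x, y, z)

print(mem_xy_z.retrieve(x, y))   # recovers z
print(mem_zx_y.retrieve(z, x))   # recovers y
```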

yeah, it can definitely be made faster. I have another implementation in C++ where I reused some code I wrote for processing sparse binary matrices, with some bit-shifting hacks for optimization, and it’s slightly faster.
I just didn’t share it because the code is a big pile of spaghetti I’m ashamed of, and I find C++ ugly.

I agree that we need better tests for this kind of stuff; storing random patterns is close to the best-case scenario. I wonder how it deals with real-world data, where correlations can cause memory interference.

The potential is promising anyway. One could even consider increasing it 8 times (back to a “cube”) while still using bits for storage: like 8 binary memories stacked on top of each other, aka an 8-shelf stack.
It would work if there’s a reliable means to compute, from the key SDRs, a “storage shelf” between 0 and 7. Or up to 31 or 63 for really massive memories.
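A minimal sketch of the shelf-selection part, assuming any stable hash of the key SDRs will do (md5 here is just an illustrative choice):

```python
import hashlib

N_SHELVES = 8   # 8 binary shelves ~ the footprint of one 8-bit cube


def shelf_index(x_sdr, y_sdr, n_shelves=N_SHELVES):
    # Deterministic "storage shelf" computed from the key SDRs, so that
    # store and retrieve agree on which shelf a triplet lives in.
    key = repr((sorted(x_sdr), sorted(y_sdr))).encode()
    return hashlib.md5(key).digest()[0] % n_shelves


# usage pattern, assuming the TriadicMemoryBinary class from earlier:
#   shelves = [TriadicMemoryBinary() for _ in range(N_SHELVES)]
#   shelves[shelf_index(x, y)].store(x, y, z)
#   shelves[shelf_index(x, y)].retrieve(x, y)
print(shelf_index([1, 2, 3], [4, 5, 6]))
```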


If you make N = 2000, it will take up about the same amount of space as the 8-bit version at N = 1000 (1 GB); maybe that’s enough to make a larger memory.

It is larger, but P needs to remain at 10 bits in order to take advantage of the extra real estate.

Sounds reasonable, but I’m not sure how well that works with “real” data.
E.g. numerical encodings depend on overlap in order to “sense” that two values are “close” to each other. Sparsifying too much would be counterproductive.
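A toy example of what’s at stake, using a hypothetical sliding-window scalar encoder where nearby values share bits and distant ones don’t:

```python
def encode_scalar(value, n=1000, p=10, resolution=1.0):
    # Toy encoder: p consecutive bits starting at a position
    # proportional to the value, so nearby values share bits.
    start = int(value / resolution) % (n - p)
    return set(range(start, start + p))


a = encode_scalar(42.0)
b = encode_scalar(45.0)
c = encode_scalar(900.0)
print(len(a & b))   # 7 shared bits: close values overlap
print(len(a & c))   # 0 shared bits: distant values don't
```

Shrinking p (sparsifying) shrinks exactly that shared signal, which is the overlap the memory relies on.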

yeah, unless we find a way to make reliable encoders that take larger SDRs and output sparser but still reliable ones, sort of like how CA3 does it, maybe.

but ok, maybe it’s time to look for a good real-world test dataset for playing around with this concept.

I recently looked at examples of reservoir computing (which is quite cool, btw); they use different chaotic systems for benchmarking.

That’s on the “physics” end of the spectrum; the opposite one is language. Convert words to SDRs and see what it can remember/predict.

PS I think some kind of reservoir for SDRs (or liquid state machine) would be really useful for time-series recall and prediction.

well, I remember seeing somewhere a reservoir network that was able to learn the Lorenz attractor from just a snippet of a generated sequence.

maybe the output of some chaotic systems could be a good dataset for testing this.

as for SDR reservoirs, I have a few ideas. I’m thinking of something like a stack of sequence memories feeding SDRs up the stack but running at different timescales.
or maybe a large shifting stack of arrays, again running at different timescales.

The “deep temporal memory” algorithm is based on a reservoir computing architecture:

A temporal signal propagates through a cascade of self-supervising memory instances.
The prediction of the next step in time is attached to a readout of the reservoir’s state.

I was thinking of building something more like this:

The blue stacked layers with circle arrows are “slow” feedforward layers whose output is not simply a feedforward projection but an exponential average of the projection.

those layers’ linear outputs get converted to SDRs via WTA or some other method and sent up the stack for more “slowification”.

this way, as you go up the stack, the layers’ outputs change at an increasingly slower pace.
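A rough sketch of one such “slow” stage as I imagine it (the random projection, decay constants and WTA details are all placeholder choices, not a worked-out design):

```python
import numpy as np


class SlowLayer:
    """One 'slow' stage: project the input SDR, keep an exponential
    moving average of the projection, emit a k-winners-take-all SDR."""

    def __init__(self, n_in=1000, n_out=1000, p=10, decay=0.9, seed=0):
        rng = np.random.default_rng(seed)
        self.w = rng.standard_normal((n_out, n_in))  # random projection
        self.state = np.zeros(n_out)
        self.decay = decay      # closer to 1 = slower-changing layer
        self.p = p

    def step(self, sdr):
        dense = np.zeros(self.w.shape[1])
        dense[list(sdr)] = 1.0
        # exponential average: state drifts slowly toward the projection
        self.state = self.decay * self.state + (1 - self.decay) * self.w @ dense
        # WTA: the p most active units form the output SDR
        return np.sort(np.argsort(self.state)[-self.p:])


# stack: higher layers see increasingly slow-changing SDRs
stack = [SlowLayer(decay=d) for d in (0.5, 0.9, 0.99)]
sdr = [1, 2, 3, 4, 9, 50, 80, 60, 81, 150]
for layer in stack:
    sdr = layer.step(sdr)
print(sdr)
```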

What do you think about an HTM encoder, e.g. RDSE, for this case?

That is quite tricky. Bit-pair associative memories need high overlap in order to recognize similarity, but they also need dispersion (aka low overlap in general) in order to ensure high capacity.

The two fight against each other; simply projecting into a sparser P/N is unlikely to be sufficient.

What’s RDSE? sorry, I’m not familiar with this term.

I think I found a way to make P=100 without affecting storage capacity; maybe that can mitigate the robustness problem. I’ll try implementing it and post the results.


Randomly Distributed Scalar Encoder - It’s a scalar encoding algorithm. I think this is a good explanation
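For readers who don’t want to follow the link, here is a toy RDSE-style sketch (the hashing scheme is my own illustration, not Numenta’s actual implementation): the value picks a bucket, each of w consecutive bucket offsets hashes to a randomly distributed bit position, so adjacent buckets share w-1 of their w offsets.

```python
import hashlib


def rdse_encode(value, n=1000, w=10, resolution=1.0, seed="rdse"):
    # RDSE-style sketch: value -> bucket; each of w consecutive bucket
    # offsets is hashed to a (randomly distributed) bit position.
    bucket = int(value / resolution)
    bits = set()
    for k in range(w):
        h = hashlib.md5(f"{seed}:{bucket + k}".encode()).digest()
        bits.add(int.from_bytes(h[:4], "big") % n)
    return bits


a = rdse_encode(10.0)
b = rdse_encode(11.0)
c = rdse_encode(500.0)
print(len(a & b))   # close values share most of their bits
print(len(a & c))   # distant values share few, if any
```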

oh, yeah, that would work really well for scalars, but we need a more general way to encode arbitrary data like images, sound, and text.