Triadic Memory — A Fundamental Algorithm for Cognitive Computing

What’s the test setup for the temporal memory, so I can reproduce it?

Indeed, this performed very poorly with sequences made of SDRs that had strong overlap (my scalar_encoder_sequence example can construct SDRs that differ by one bit for each successive element, and using the previous input as the new context just went off the rails completely).

The new algorithm uses a deterministic PRNG seeded with the latest count of how many items have been added to the current sequence, and this seems to give best-of-all-worlds performance: it has no problem with long sequences of closely related SDRs, nor with many short sequences drawn from a shared, limited alphabet of tokens.

Because the random context SDRs are generated deterministically, the recall method (which doesn’t modify the triadic stores) can regenerate exactly the same context SDRs that were stored in the learning step, which helps recall performance.
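As a rough illustration of the scheme (a minimal sketch; the N/P values and the helper name are placeholders, not the actual implementation):

import random

def context_sdr(position, n=1000, p=10):
    # Deterministic "random" context: seed a private PRNG with the item's
    # position in the current sequence, so learning and recall regenerate
    # the exact same context SDR without storing it anywhere.
    rng = random.Random(position)
    return sorted(rng.sample(range(n), p))

assert context_sdr(5) == context_sdr(5)   # same position -> same context SDR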

1 Like

What’s interesting is that my implementation, with the one-bit weights, does better with P > 10. For example, for a single sequence of 10k random SDRs, N=1000, P=10, I get the following counts for bits of overlap, sorted by bits of overlap, descending:

$ cargo run --release --example=sequence -- -N 1000 -P 10 -s 1 -l 10000 |awk '{print $NF}' |sort |uniq -c |sort -k 2 -rn
    Finished release [optimized] target(s) in 0.01s
     Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -N 1000 -P 10 -s 1 -l 10000`
   6082 10
      1 9
      4 8
      5 7
      4 6
      4 5
      3 4
      5 3
     27 2
    354 1
   3509 0

With P=20, I get

$ cargo run --release --example=sequence -- -N 1000 -P 20 -s 1 -l 10000 |awk '{print $NF}' |sort |uniq -c |sort -k 2 -rn
    Finished release [optimized] target(s) in 0.01s
     Running `target/x86_64-unknown-linux-gnu/release/examples/sequence -N 1000 -P 20 -s 1 -l 10000`
   9994 20
      1 19
      1 18
      2 13

I doubt you’re dumb 🙂 I’m happy to help out if you’d like.

1 Like

But I think we are losing track of the goal here.

We are building some kind of esoteric memory storage here, but we don’t know exactly how to apply it.

I don’t expect the human brain to remember ultra-long sequences of random symbols exactly. I can’t even remember my phone number.

From introspection, I think we are built to extract meaning from short snippets of sequences and infer the rules behind them, not to store 2000 sequences of random characters.

I’d like to make a memory system that can learn to generate novel sequences naturally, the same way we learn to speak as if it were nothing, yet find it hard to memorize one page of a book.

2 Likes

I’ll look up some tutorials later. I’ll end up trying to learn this language sooner or later, so maybe now is the time.

I was playing around a little and ended up implementing a version of triadic memory that works directly on symbols, without SDRs or neural grids. It uses Python dictionaries instead and works with tuples, strings and integers as “symbols”, so it has no overlap between symbols and no capacity limits.

I think it’s cool to play with this in an “ideal world”, so I ended up implementing a sequence memory as well, and it works quite well.

from collections import defaultdict
from random import randint, seed
seed(0)

class IdealTriadic:
    # Symbolic triadic memory: every pair drawn from two of the inputs is an
    # address holding a set of symbols from the third input.
    def __init__(self, p=1):
        self.x_data = defaultdict(set)
        self.y_data = defaultdict(set)
        self.z_data = defaultdict(set)
        self.p = p

    def addresses(self, a, b):
        # every (a_, b_) pair of symbols is a storage address
        for a_ in a:
            for b_ in b:
                yield a_, b_

    def write(self, x, y, z):
        # store each of x, y, z under the addresses formed by the other two
        for addr in self.addresses(y, z):
            self.x_data[addr].update(x)
        for addr in self.addresses(x, z):
            self.y_data[addr].update(y)
        for addr in self.addresses(x, y):
            self.z_data[addr].update(z)

    def _read(self, a, b, data):
        # vote: count how often each candidate symbol appears across the
        # addresses, and return the p symbols with the most votes
        out = defaultdict(int)
        for addr in self.addresses(a, b):
            for o in data.get(addr, ()):
                out[o] += 1
        if not out:
            return []
        return sorted(out, key=out.get)[-self.p:]

    def read_x(self, y, z):
        return self._read(y, z, self.x_data)

    def read_y(self, x, z):
        return self._read(x, z, self.y_data)

    def read_z(self, x, y):
        return self._read(x, y, self.z_data)


class ShallowTemporalMemory:
    # Sequence memory built from two triadic stores: `memory` links the current
    # (context, input) to the next context, `predictive_memory` links the
    # current context to the inputs that followed it.
    def __init__(self):
        self.memory = IdealTriadic()
        self.predictive_memory = IdealTriadic()
        self.z = []

    def input(self, inp, new=True):
        # record that `inp` followed the current context, then look up
        # (or create) the next context and advance to it
        self.predictive_memory.write(self.z, self.z, inp)

        x = self.z
        y = inp
        z = self.memory.read_z(x if new else x + y, y)
        if not z:
            z = [randint(0, 0xffffffffffffffff)]

        self.memory.write(x + y, y, z)
        self.z = z
        return z

    def predict(self, p=1):
        # return the p inputs most strongly associated with the current context
        self.predictive_memory.p = p
        return self.predictive_memory.read_z(self.z, self.z)

    def reset(self):
        self.z = []


mem = ShallowTemporalMemory()


for letter in 'this is a text any text is a text and this is a text being texted by a typer this':
    inp = [letter]
    mem.input(inp)

mem.reset()
print()

for letter in 'an text and this is a text beyng texted by a typer this':
    inp = [letter]
    mem.input(inp, new=False)
    print(letter, *mem.predict())

1 Like

Yeah, that is nice. I guess there have been quite a few attempts to implement symbolic processing directly.

The promise of an associative approach is that it should be able to handle both “vague” representations, like encodings of sensory inputs, and symbols, together with a yet-to-be-discovered bridge between these two realms.

The impression I get is that Triadic Memory, HTM, the Spatial Pooler, Assembly Calculus and other variants of the same idea are just ways people have attempted to implement those symbolic processors using neural-like storage.

To be completely honest, when I look at all of those, I just see differently shaped Hopfield networks with enforced sparsity.

We already have several different ways to implement the memory, but I think maybe it would be better to focus on the algorithm instead.

I’d like to bring your attention back to Pentti Kanerva’s Sparse Distributed Memory, a visionary concept introduced more than 30 years ago.

It seems hard to find research on how one would use such a memory if there were a high-performance implementation. Did anyone ever say “I tried this, but SDM was too small / too slow for my purpose”?

The Dyadic Memory algorithm in its latest incarnation with 1-bit storage locations is certainly today’s most efficient SDM: at n=1000/p=10 one can write 250k and read 30k heteroassociations x->y per second. The dimension of x can be as big as 20k, while there’s no limit on the dimension of y. The usable capacity is in the millions, and the speed of read or write operations remains independent of the number of items stored.
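To make the x->y interface concrete, here is a toy dictionary-based sketch in the spirit of the IdealTriadic code earlier in the thread (symbols instead of real SDR bits, Python sets instead of 1-bit locations, so it is an illustration rather than the actual implementation):

from collections import defaultdict

class IdealDyadic:
    # Toy hetero-associative memory (SDM-style x -> y): every bit-pair of x
    # is an address holding the bits of the y values stored under it.
    def __init__(self, p=10):
        self.data = defaultdict(set)
        self.p = p

    def pairs(self, x):
        x = sorted(x)
        for i, a in enumerate(x):
            for b in x[i + 1:]:
                yield a, b

    def write(self, x, y):
        for addr in self.pairs(x):
            self.data[addr].update(y)

    def read(self, x):
        votes = defaultdict(int)
        for addr in self.pairs(x):
            for bit in self.data.get(addr, ()):
                votes[bit] += 1
        # return the p best-supported bits of y
        return sorted(sorted(votes, key=votes.get)[-self.p:])

mem = IdealDyadic(p=3)
mem.write([1, 5, 9], [2, 4, 6])
print(mem.read([1, 5, 9]))   # -> [2, 4, 6]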

The question is now: How can we put SDM to work in real-world applications?

2 Likes

Exactly. We already have some really powerful implementations for sparse vector storage and search, and triadic memory is certainly the best I’ve seen so far, but we have no way to actually apply it to real-world problems.

I say that because I don’t think sequence memories are the right way to approach this: we are not extracting information from the input, we are just storing the input in a new data format.

We are basically building overfitting machines so far. There has to be a better way to do it, which I suspect will include some form of lossy storage and deliberate corruption of data.

However, Kanerva’s memory features 1000-bit dense arrays, which have a representational power on the order of 1e301 possible states, while 10/1000-bit SDRs have only 2.6e23 available states.

I’m not saying this is necessarily insufficient, just that it might not be as expressive as one might need.

I’m thinking of embedding representations in both vision and language modelling, where float vectors of size 200 to 2000 are common.

The underlying idea behind them is that they need to be rich enough to encode not only the identity of the “thing” they represent but also its content/structure.

SDRs were intended as a simplification of the same concept; a 50/1000 SDR “floats” in a space of ~1e85 states, still much smaller than Kanerva’s, yet much larger than a 10/1000-bit representation.

from math import factorial

def sdr_states(n, p):
    # number of distinct p-of-n SDRs: the binomial coefficient C(n, p)
    return factorial(n) / factorial(n - p) / factorial(p)

print(sdr_states(1000, 10))   # ~2.6e23
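For reference, the other figures quoted above (roughly 1e301 states for dense 1000-bit vectors and roughly 1e85 for a 50/1000 SDR) can be checked the same way, e.g. with math.comb:

from math import comb

print(f"{2**1000:.2e}")         # dense 1000-bit vectors: ~1.07e301 states
print(f"{comb(1000, 10):.2e}")  # 10/1000 SDRs: ~2.6e23 states
print(f"{comb(1000, 50):.2e}")  # 50/1000 SDRs: on the order of 1e85 states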

PS: not to mention that GPT-3 embeds words (tokens) in vectors of size 12288.

There’s a tradeoff between how much information you need to represent and how much hardware you need to do the computation.

Those dense representations have more content, therefore they take up more storage space, so we can fit fewer items in memory no matter the implementation.

I believe brains solve this problem by using clever forms of lossy data compression; somehow you need to learn what’s irrelevant in the data and throw it away while storing it.

An extreme example: here’s some data, and you may guess it takes a fair amount of space:

[image]

But I can use my brain to convert this data into a string which has no resemblance to the original data, yet still better captures its meaning and takes up less space: [120, 94, 50, 43, 121, 94, 50, 61, 49]
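(Those nine values are just ASCII codes, so the 9-byte description can be read back directly:)

print(bytes([120, 94, 50, 43, 121, 94, 50, 61, 49]).decode())   # -> x^2+y^2=1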

Yes, but the question is: can you represent anything with 9 bytes?

How many bits in an SDR are needed to encode a vector of 9 int8s, and how many more bits do you need to add to the SDR for it to have some overlap with neighboring (i.e. similar) encodings?
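For a rough sense of the overlap part, here is a toy sliding-window scalar encoder (a sketch, not any particular library’s encoder): each int8 gets a window of w consecutive active bits, so neighboring values share w-1 of them, and encoding a vector of 9 int8s would concatenate nine such blocks of 255+w bits each.

def encode_int8(value, w=21):
    # toy scalar encoder: a window of w consecutive active bits out of
    # 255 + w positions; adjacent values overlap in w - 1 bits
    start = value + 128            # map the int8 range [-128, 127] onto [0, 255]
    return set(range(start, start + w))

a, b = encode_int8(10), encode_int8(11)
print(len(a & b))                  # -> 20: neighboring values share 20 of 21 bits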

Of course not, but neither does a cortical column with 200k bytes.

At Cortical.io they use N=16k SDRs to represent text content, which is rich enough to search for and retrieve documents similar to a “query” document.

Which, as an example application, shows that this is quite sufficient to embed rich content, and… if it were to be replicated via tri/dyadic memories there are some obstacles:

  • implicitly, dyadic or triadic associations point to an SDR, which does not in itself tell us which actual documents we are looking for
  • it points to a single SDR, or to a union of SDRs if multiple Y-s are stored with the same key X (see the sketch after this list)
  • not to mention its P size would be in the hundreds, though it might still get a response within a second or so.
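To illustrate the second point with a deliberately over-simplified sketch (keying on the whole X rather than on bit-pairs; the bit-pair store shows the same union effect in its readout):

from collections import defaultdict

store = defaultdict(set)                 # key X -> stored Y bits

def write(x, y):
    store[frozenset(x)] |= set(y)        # a second Y under the same X just merges in

def read(x):
    return sorted(store[frozenset(x)])

write([1, 5, 9], [10, 20, 30])
write([1, 5, 9], [40, 50, 60])           # a different Y, same key X
print(read([1, 5, 9]))                   # -> [10, 20, 30, 40, 50, 60]: a union, not either Y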

PS: a “document” could be anything from an email to application forms, contracts, or novels hundreds of pages long; there is no actual restriction on the content, as long as it is textual.

2 Likes

That’s another thing that confuses me: triadic memory is too sparse. From sources I find online, the brain seems to work at 1-5% sparsity, while triadic uses 0.01% sparsity.

Maybe we are treating SDRs as symbols, while the brain and Cortical.io are working with superpositions of several symbols to deliberately cause interference?

Not to mention, a 1000^3 block of memory would need a million neurons to implement; that’s too big for a cortical column. Jeff estimates a column at 700 neurons, and the biggest estimate I found is 200k neurons, which is still too few. Maybe we should drop the idea of a self-contained column?

1 Like

It is 1%, not 0.01%. At 100/10000 it would be pretty rich.

I would not push the analogy with HTM too far… beyond the SDR representation. If we want to force that analogy, I would consider each bit-pair location a “column”. So an N=1000 dyadic memory is an abstract representation of N*(N-1)/2 ≈ 500k micro-columns, each “learning” a 1000-bit SDR.
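A quick sanity check of the two counts in this exchange, purely as arithmetic:

N = 1000
print(N * N)             # 1,000,000 (x, y) address pairs in an N^3 triadic cube,
                         # each holding an N-bit column: the "million neurons" above
print(N * (N - 1) // 2)  # 499,500 distinct bit-pairs in an N-bit SDR: the ~500k
                         # micro-columns of an N=1000 dyadic memory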

1 Like

I was making analogies with HTM because although the algorithm is different, the limitations should be pretty similar.

1 Like

I’d wager they’re complementary.
There’s another sense in which the bit-pair associations match the Thousand Brains Theory: their pattern recognition mechanism IS voting.

1 Like