Triadic Memory — A Fundamental Algorithm for Cognitive Computing

It has many encoders for handling different kinds of data types.
I recommend using those encoders to encode your data into SDRs and then feeding them into Triadic Memory.

I added a simple linear_encoder in sdr_util.py. It is a scalar encoder that produces a sparse SDR.
Usage:

from sdr_util import linear_encoder
encode = linear_encoder(min_val, max_val, sdr_size, sdr_len) 
sdr = encode(value) 

See example at bottom of sdr_util.py
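For reference, here is a minimal sketch of what such a scalar encoder could look like. It is only an illustration of the idea, not the actual sdr_util.py implementation; linear_encoder and its four parameters are the only names taken from above, and the contiguous-block encoding is an assumption.

import numpy as np

def linear_encoder(min_val, max_val, sdr_size, sdr_len):
    """Return an encode(value) function mapping a scalar to a sparse SDR.
    sdr_size: total number of bits; sdr_len: number of active bits.
    Sketch only -- the real sdr_util.py may differ."""
    span = sdr_size - sdr_len
    def encode(value):
        # Clamp to the supported range, then map linearly to a block position
        value = min(max(value, min_val), max_val)
        start = int(round((value - min_val) / (max_val - min_val) * span))
        sdr = np.zeros(sdr_size, dtype=np.uint8)
        sdr[start:start + sdr_len] = 1   # contiguous block of active bits
        return sdr
    return encode

# Usage, mirroring the snippet above
encode = linear_encoder(0.0, 1.0, sdr_size=1000, sdr_len=20)
sdr = encode(0.5)
print(np.flatnonzero(sdr))   # indices of the active bits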


I also think I made a more … comprehensive sequence predictor (the temporal memory discussed here).
A description follows.

You can follow this text together with the SequencePredictor class in seq_learn.py


init() instantiates two triadic memories and what I call a rolling context.
There are two types of keys, “random keys” (rkeys) and “union keys” (ukeys), which are described below.

  • self.Mkeys - “memory for keys”: a triadic memory storing (union_key, input, random_key) as (X, Y, Z) associations.
  • self.Mpreds - “memory for predictions”: the second triadic memory, storing (union_key, input, next_input) as (X, Y, Z).
  • self.context simply preserves the previous random_key, union_key, and input.

ukey is the union (bit-wise OR) of the previous rkey and the previous input.
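As a concrete illustration, assuming SDRs are represented as sorted arrays of active bit indices (which is an assumption on my part, not necessarily how seq_learn.py stores them):

import numpy as np

def make_ukey(prev_rkey, prev_input):
    # Union (bit-wise OR) of two SDRs given as arrays of active bit indices
    return np.union1d(prev_rkey, prev_input)

prev_rkey  = np.array([3, 17, 42])
prev_input = np.array([5, 17, 99])
print(make_ukey(prev_rkey, prev_input))   # [ 3  5 17 42 99]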

predict(new_input) function:

  • retrieves prev_ukey, prev_rkey, prev_input from self.context

  • looks up Mpreds (using prev_ukey, prev_input) for the predicted input. If the prediction does not match the current input, it saves the current input in Mpreds as “the right” prediction

Now it processes Mkeys to keep it “in order” (line numbers refer to seq_learn.py; a sketch of the full predict() flow follows after this list):

  • 60: computes the new ukey from the previous input and the previous random key
  • 62, 63: looks up Mkeys for whatever “memorized rkey and ukey” might already be there
  • 64-66: if the memorized ukey is very different from the computed ukey, generates a new rkey and stores the (computed ukey, input, new rkey) association in Mkeys
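To make that control flow concrete, here is a heavily simplified, runnable sketch of the whole predict() step. It is not the seq_learn.py code: the triadic memories are replaced by an exact-match dictionary stand-in, SDRs are frozensets of active bits, and all names other than Mkeys, Mpreds, context, ukey and rkey are mine. It illustrates the control flow only, not the noise tolerance of the real SDR-based memories.

import random

SDR_SIZE, SDR_LEN = 1000, 10

def random_sdr():
    """A fresh random SDR, represented as a frozenset of active bit indices."""
    return frozenset(random.sample(range(SDR_SIZE), SDR_LEN))

class DictTriadicMemory:
    """Toy stand-in for a triadic memory: exact (X, Y) -> Z lookup only."""
    def __init__(self):
        self.assoc = {}
    def store(self, x, y, z):
        self.assoc[(x, y)] = z
    def query_z(self, x, y):
        return self.assoc.get((x, y), frozenset())

class SequencePredictorSketch:
    def __init__(self):
        self.Mkeys = DictTriadicMemory()    # (ukey, input) -> rkey
        self.Mpreds = DictTriadicMemory()   # (ukey, input) -> next_input
        # rolling context: previous rkey, ukey and input
        self.context = (frozenset(), frozenset(), frozenset())

    def predict(self, new_input):
        prev_rkey, prev_ukey, prev_input = self.context

        # 1. Prediction memory: was the current input predicted? If not,
        #    store it as "the right" prediction for the previous context.
        predicted = self.Mpreds.query_z(prev_ukey, prev_input)
        if len(predicted & new_input) < SDR_LEN // 2:
            self.Mpreds.store(prev_ukey, prev_input, new_input)

        # 2. Key memory: compute the new ukey and look up a memorized rkey.
        #    (The real code compares the memorized ukey with the computed one;
        #    this exact-match stand-in simply generates a new rkey whenever
        #    nothing is stored yet for this (ukey, input) pair.)
        ukey = prev_rkey | prev_input
        rkey = self.Mkeys.query_z(ukey, new_input)
        if not rkey:
            rkey = random_sdr()
            self.Mkeys.store(ukey, new_input, rkey)

        # 3. Roll the context and return the prediction for the next step
        #    (the return value is my guess at the predictor's output).
        self.context = (rkey, ukey, new_input)
        return self.Mpreds.query_z(ukey, new_input)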

So, in a sense, Mkeys maintains the “known contexts” while Mpreds holds the “expected predictions” that link a context to a future input.

I hope it makes sense. Somewhat.

PS As a comment on a performance improvement: since querying for X from a (Y, Z) pair is unbearably slow (especially since ukey is a union: double the length means roughly 4 times slower than the already slow X search), Mpreds could store both
(ukey, rkey, input) and (rkey, input, ukey).
Since the element positions are rotated and unrelated, they should not collide easily.
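A toy illustration of that rotation idea; a dictionary stands in for the triadic memory, whereas in the real thing both triples would live in the same SDR-based store and the rotated positions are assumed not to collide, exactly as argued above:

class DictTriadicMemory:
    """Toy stand-in: exact (X, Y) -> Z lookup only."""
    def __init__(self):
        self.assoc = {}
    def store(self, x, y, z):
        self.assoc[(x, y)] = z
    def query_z(self, x, y):
        return self.assoc.get((x, y))

m = DictTriadicMemory()
ukey, rkey, inp = "ukey-sdr", "rkey-sdr", "input-sdr"

# Store the triple twice, with rotated element positions ...
m.store(ukey, rkey, inp)
m.store(rkey, inp, ukey)

# ... so that recovering ukey from (rkey, input) becomes an ordinary
# forward query instead of a slow X search.
print(m.query_z(rkey, inp))   # -> 'ukey-sdr'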


So I wrote a simple linear encoder and decoder for real numbers and tested the temporal memory algorithm with a sine signal. It works just fine.


@POv excellent! Could you please put your code on GitHub so that I can study your ideas over the weekend?
Thanks for sharing your work.

A sinusoidal signal should be “remembered” by a single triadic memory. Even a dyadic memory could be used, with overlap, storing (a + permuted(b), c) to remember an arbitrary (a, b, c) triple sequence (a small sketch of this trick follows below).
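A hedged sketch of that (a + permuted(b), c) trick, with a plain dictionary standing in for the dyadic memory and SDRs as index arrays (all names here are mine):

import numpy as np

SDR_SIZE = 1000
rng = np.random.default_rng(0)
PERM = rng.permutation(SDR_SIZE)          # fixed random permutation of bit positions

def permuted(sdr):
    # Move every active bit to its permuted position
    return np.sort(PERM[sdr])

# Dyadic memory stand-in: address SDR -> content SDR
dyadic = {}

a = np.array([1, 50, 200])
b = np.array([1, 70, 300])
c = np.array([9, 33, 800])

# Store (a + permuted(b)) -> c; permuting b makes the pair order-sensitive,
# so (a, b) and (b, a) produce different addresses.
address = np.union1d(a, permuted(b))
dyadic[tuple(address)] = c

print(dyadic[tuple(np.union1d(a, permuted(b)))])   # -> [  9  33 800]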

A slightly more serious challenge is to have it learn two different signals - e.g. a pair of (x, y) values traveling along circular and square paths, then being able to differentiate between them.

Another interesting experiment is to play with overlaps.
E.g. take two sinusoidal signals, each at a different frequency: the first encoded with bits 0-499, the other with bits 500-999.
Teach them separately, then use an overlap of both signals at a random phase for prediction.
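A sketch of how the encoding side of that experiment could be set up; the bit ranges follow the description above, while the block encoder, frequencies and amplitude range are assumptions:

import numpy as np

def block_encoder(lo_bit, hi_bit, sdr_len):
    """Encode a value in [-1, 1] as sdr_len active bits inside [lo_bit, hi_bit)."""
    span = (hi_bit - lo_bit) - sdr_len
    def encode(value):
        pos = lo_bit + int(round((value + 1.0) / 2.0 * span))
        return np.arange(pos, pos + sdr_len)      # active bit indices
    return encode

enc_a = block_encoder(0, 500, 20)     # first sinusoid: bits 0-499
enc_b = block_encoder(500, 1000, 20)  # second sinusoid: bits 500-999

t = np.arange(200)
signal_a = np.sin(2 * np.pi * t / 40)   # one frequency
signal_b = np.sin(2 * np.pi * t / 25)   # a different frequency

# Teach each encoded signal to the sequence memory separately, then present
# the union of both encodings at a random phase and check which continuation
# is predicted:
phase = np.random.randint(len(t))
mixed_sdr = np.union1d(enc_a(signal_a[phase]), enc_b(signal_b[phase]))
print(mixed_sdr)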

It does ring a bell :slight_smile: M1 resembles CA3 with its recurrent collaterals…
In the HC, CA3 performs a storage & completion step given an incomplete input from the EC.
M1 stores a memory along with its past context. It also does a completion step: if a few bits are off, the right output will still be restored due to the big difference between memories in high dimensions.
In the HC, CA3 feeds CA1, and in CA1 you can find place cells (which today we know are actually general memory cells and not only “place” cells). CA1 also shows previous and future place-cell activation while doing a planning/goal task. The future and past cells are activated at the early and late phases of the theta cycle. The algorithm above seems to “imagine” a possible future (but not the past) at the M2 output…
Exciting stuff here :slight_smile:


I just posted an extended temporal memory algorithm which is a circuit of three triadic memory instances (the original algorithm used only two). The newly added first stage encodes the
input stream as bigrams, which improves prediction accuracy. For example, it can learn 100 digits of Pi in one go. It also works reasonably well learning 50,000 words of a novel in a few iterations.

As it is rather easy to draw such circuit diagrams and generate the corresponding program code, the question is now what’s the right brain-inspired architecture for solving more complex problems with increasingly larger circuits?

Links to PDFs:


I doubt there’s anything in the brain akin to memorizing 100 digits of Pi in a single shot.

PS However, I’m not saying an “internal experience recorder” which I could “google” to compare mind-generated scenarios would not be useful. But we have something much more… simplistic.

I wonder whether simplicity isn’t actually a feature.

Impressive. :face_with_open_eyes_and_hand_over_mouth:

Can I ask, where is the state (the ‘memory’) encoded after a sequence is learned?

Can one circuit learn more than one sequence?

What I find interesting about this is the similarity between this diagram and the Petri Net diagram used by Jerome Feldman to describe embodied cognition. Could be just coincidence, of course.

One example of many:

https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.22.8212&rep=rep1&type=pdf

Interesting analogy. The Petri net “places” would be the equivalent of two sides of a triadic memory, and the “transitions” would be the respective third side, firing an output when the inputs are populated.
In such a framework, the circuit would operate asynchronously, and might show non-deterministic behavior.

My elementary (two-stage) temporal memory was modelled after an Elman network, with the two memory units operating in synch like in a clocked digital circuit.


@david.pfx I don’t know what you mean by “where” the state is encoded. It is in a pair of associative memories, or three in the newer variant.

The way @POv designed this, it learns continuously, which means that when it fails to predict a new input it takes the current context (which is an encoding of previous inputs) as a fork or branching point.

If you want to have “independent” sequences I guess you can use start/stop “tokens” as separators.

Which works decently, yet:

  • it will still make various predictions after a stop/start sequence, which the user code should discard.
  • there is no per-sequence identifier, in the sense of having a “word SDR” assigned to a “chain of letter SDRs”
  • and consequently, at this stage there is no support for reverse remembering (restoring previous SDRs), although I think it should be possible.

PS On second thought, it definitely needs a “query only” mode which “rolls” the sequential context without updating the associative memories. Here’s why (a minimal sketch of such a mode follows below).

Let’s say an application wants to teach it the first 20 digits of Pi. After entering them all, there is currently no other way to test that they were remembered correctly than to re-enter the sequence, beginning with the first digit. But this way the memory learns that the 20th digit is followed by the first one! It has no notion of a test mode, which all other machine learning models have. After checking that 10 digits are correctly learned, we want to test a new sequence, e.g. the digits of e. But the memory then learns that after the 10th digit the Pi sequence can branch to either the 11th digit of Pi OR the first digit of e!
It all looks clean and elegant as a formal diagram without if-then-elses, but it’s a nightmare in practice.
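A minimal sketch of what such a “query only” mode could look like: a learn flag that gates every store while the context still rolls. The toy memory below is a plain first-order dictionary, so it shows only the flag mechanics, nothing of the real algorithm:

class QueryableSketch:
    def __init__(self):
        self.mem = {}            # previous token -> next token (toy memory)
        self.context = None

    def predict(self, token, learn=True):
        if learn and self.context is not None and self.mem.get(self.context) != token:
            self.mem[self.context] = token    # updates happen only when learning
        self.context = token                  # the context always rolls
        return self.mem.get(self.context)     # prediction for the next token

p = QueryableSketch()
for ch in "ABCDE":                            # teach a short sequence
    p.predict(ch, learn=True)

p.context = None                              # reset, then test without learning
for ch in "ABC":
    print(p.predict(ch, learn=False))         # -> B, C, D; memory untouched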

@david.pfx @cezar_t It’s possible to use the temporal memory algorithm for learning separate sequences. For this purpose, the circuit state variables (but not the triadic memory contents) need to be flushed between sequences. In the latest implementation this is done by sending a zero SDR.
(@cezar_t this is what you call a start/stop token).

I tested this with a data set from the SPMF sequence learning library. The prediction accuracy reaches 95 percent out of the box. The scoring could be further improved using a proper overlap metric, as most “mispredictions” have bits in common with the expected SDR.
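For reference, a small sketch of the kind of overlap-based scoring meant here (the function name and scoring choice are mine):

import numpy as np

def overlap_score(predicted_sdr, expected_sdr):
    """Fraction of the expected SDR's active bits also present in the prediction."""
    expected = np.asarray(expected_sdr)
    if expected.size == 0:
        return 0.0
    return np.intersect1d(predicted_sdr, expected).size / expected.size

predicted = np.array([3, 17, 42, 99, 250])
expected  = np.array([3, 17, 42, 77, 250])
print(overlap_score(predicted, expected))   # 0.8 -- a near miss rather than a hard error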

I guess my problem is that I really don’t understand what the diagram represents. What are those circles with values in them? What do the arrows mean? If the state (‘memory’) is not part of the diagram, where is it?

There is mention of triples being ‘stored’. Where? Not in the diagram?

Then there is mention of ‘a query on M2’. How do you query a triple that contains a random value (which by definition you cannot know)?

At the end of the day, how is this different from simply recording the last sequence and comparing it to the next one?

My understanding was that an HTM network can store multiple sequences and choose which to predict based on the first few elements. But maybe that’s not so either.

Imo this is still insufficient. In your example, what happens when the user needs to query the middle of a sequence? You can’t touch the start token, since that would wrongfully hint that we’re either seeking the beginning of a sequence OR that we want to teach the memory a new sequence.

@david.pfx I’m aware this is all a bit unconventional - to my knowledge there’s no precedent for building such networks of associative memories.

In the diagram, the blue boxes are associative memories that can store triples of SDRs. A typical capacity would be 1M random triples per memory unit. The arrows carry SDRs between memory units, or also within the same memory unit from one port to another. When three arrows hit a memory unit, it learns (if the information is new). When two arrows arrive at a memory, a query is performed to recall the third position. The state of the circuit consists of the SDRs moving around via the arrows.

Sequence information is stored in the form of associations in sparse distributed memories; therefore it is robust against noise in the input. Predictions can be superpositions/bundles of SDRs in case of ambiguities.
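Read as code, the rule could be sketched like this: a unit learns when all three ports carry an SDR and queries when exactly two do. This is only my reading of the description above, with a toy triple store instead of a real triadic memory:

class MemoryUnit:
    """Associative memory node with three ports X, Y, Z carrying SDRs (or None)."""
    def __init__(self):
        self.triples = []                      # toy store of (X, Y, Z) triples

    def step(self, x=None, y=None, z=None):
        filled = sum(p is not None for p in (x, y, z))
        if filled == 3:
            # Three arrows arrive: learn the triple (if the information is new)
            if (x, y, z) not in self.triples:
                self.triples.append((x, y, z))
            return None
        if filled == 2:
            # Two arrows arrive: query to recall the third position
            for tx, ty, tz in self.triples:
                if (x is None or x == tx) and (y is None or y == ty) \
                        and (z is None or z == tz):
                    return (tx, ty, tz)
        return None

m = MemoryUnit()
m.step("ukey", "input", "rkey")          # learn
print(m.step("ukey", "input", None))     # query -> ('ukey', 'input', 'rkey')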


This sequence memory does not store semantic start and end tokens for sequences (although one could of course prepend and append those to the sequences). The only purpose of that special zero-SDR token is to clear the current context. To query the middle of a sequence, send a zero SDR and then three consecutive tokens, and it will automatically find the right sequence position to continue. Only if your three input tokens are not part of any known sequence will the system store them as a new sequence.

It’s a feature, not a limitation, of this sequence memory that it won’t store a subsequence separately from an enclosing sequence. An example: Send ABCDEFGHIJ, and terminate with a zero-SDR. Now this sequence is stored. Then send DEF. It will predict G, but not learn anything new.
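In toy code, the protocol described in the last two posts might look as follows. The stand-in below only remembers pairs of consecutive tokens and treats None as the zero SDR; it is meant to show the usage pattern, not the real triadic-memory circuit:

class ToySequenceMemory:
    def __init__(self):
        self.mem = {}            # (token[t-1], token[t]) -> token[t+1]
        self.context = []        # last two tokens seen

    def send(self, token):
        if token is None:        # zero SDR: flush the circuit state, learn nothing
            self.context = []
            return None
        if len(self.context) == 2:
            pair = tuple(self.context)
            if pair not in self.mem:
                self.mem[pair] = token             # new continuation: learn it
        self.context = (self.context + [token])[-2:]
        return self.mem.get(tuple(self.context))   # prediction for the next token

tm = ToySequenceMemory()
for ch in "ABCDEFGHIJ":          # store the sequence
    tm.send(ch)
tm.send(None)                    # terminate with the zero SDR

# Query the middle: clear the context, then send a few consecutive tokens
tm.send(None)
tm.send("D"); tm.send("E")
print(tm.send("F"))              # -> 'G', and nothing new is learned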

It is a limitation if, e.g., you want to check unreliable data (maybe it is fake? maybe it is real?) against a memory purposed to serve as a discriminator between fake and real data: “Yup, sure I’ve seen this before, so it must be real!”