Maybe the brain is capable of remembering everything, but doesn’t to reduce calorie usage (which is important for how well the brain works, not just starvation). Some people can remember things much better than most people (e.g. hyperthymesia). That’d be hippocampus rather than neocortex probably.
Just one example: The piano part of Rachmaninov’s concerto no 3 is a 30000 token long sequence.
Ok hyper … 61 known cases, fair enough.
If that is relevant why not apply the same judgment to Diadic/Triadic memories themselves? Shouldn’t they get rid of “write” and “read” operations and replace them with a single function that both makes a query and a write? How the temporary memory would be possible with such a “simplification”?
I think this goes to my question. So how precisely does this work?
- The state derived from a previously recognised sequence is stored, but where and in what form?
- That state can be queried by an ordered sequence of 3 tokens, how?
- The stored triples contain random values, do they not? How does that affect lookup?
My impression is that HTM does not have this concept of a separate isolated store of previously recognised sequences, but rather each element learns its own small subsequence. It can also recognise more than one such subsequence, and thus take part in recognising multiple longer sequences.
The key to understanding the sequence memory is the feedback loop in M1:
At each time step, a new SDR comes into the top position of this triadic memory. It’s previous value propagates down to the center position. Now these two consecutive SDRs are used to query the memory for the third (bottom) position. At this point the memory learns in a self-supervised way: If the query input has not been seen before, it fills the the bottom position with a new random SDR, otherwise with the recalled value. (The neural equivalent to deploying a new random pattern at once is to increment a “counter” on a whole dendritic segment, rather than slow synaptic learning.) The SDR at the bottom position of M1 encodes memory-internal context, it’s never a value entered from outside. The context is fed back to the center position, recursively building up sequence-specific context in the subsequent steps.
You always feed one token after the other into the circuit. In the most basic implementation with just two triadic memories, two tokens are sufficient to recall context and make a prediction. The version with three triadic memories has an additional stage that combines consecutive tokens into bigrams – this algorithm needs to see three tokens to start making predictions.
When working with symbols ABCD it’s common to encode them with fixed random SDRs which are then nearly (not perfectly) orthogonal. This means there will be small overlaps, which can indeed have a rare effect on sequence prediction, in particular in ambiguous situations.
The HTM temporal memory (please correct me if I’m wrong) is a component with one SDR input and one SDR output. From an electrical engineering point of view it’s clear that you cannot build interesting circuits from such a component – this requires components with three ports, such as logical gates, transistors, or … triadic memories.
Actually if the input is doubled in size one half can be used as a controlling context for the other half.
In general any subset of input space bits can be regarded (or used) as a “controlling gate” for the reminder input bitspace.
The main limitation being input and output spaces are same size.
Which is no longer the case with brainblocks .
There is an explicit context learner class that could be a powerful twist on HTM principles.
today I use HTM RDSE for testing TemporalMemory (N=1024, P=21) for one-step prediction (having 2 TriadicMemory modules) with different sinus data.
I put the sparse SDR, the output of the RDSE encoder, into TM.
TM learns very quickly different kind of input data and immediately predicts data if it has seen it before.
Thanks @POv for sharing your codes and interesting algorithms.
I will invest more for multiple steps prediction next time.
Adding a new algorithm to the collection:
Monadic Memory is an auto-associative sparse distributed memory to be used as a
cleanup memory and for clustering/pooling a stream of SDRs. It learns new patterns in one shot.
It’s based on two mirrored Dyadic Memory instances which effectively form a hidden layer.
The capacity is similar to the capacity of Dyadic Memory, for example 500k random patterns at n=1000 and p=10.
The algorithm works at a fixed sparsity. A sparsity-reducing version can be derived from this algorithm (more on that later).
C code is available on GitHub. PDF with examples linked below.
Triadic memory doesn’t always learn, but that’s just when it already knows it perfectly. It doesn’t treat the digits of pi as a distinct episodic memory. Is that what you meant? I guess if it isn’t like episodic memory, maybe it shouldn’t be doing one-shot learning.
(I’m not a neuroscientist and this is just brainstorming based on memory.)
It’s hard to compare the triadic temporal memory circuit with biology. It’s ambiguous because many parts of it have corresponding SDRs. Those could be the same neurons at different points in time.
The delay mechanism is interesting. Before updating the sequence-so-far, it associates it with the current (already updated) input. When it updates the sequence-so-far cells, those cells implicate the next input.
In biology, it’d probably update the sequence-so-far near the end of an oscillatory cycle (timestep). @LiorA talked about something related in the hippocampus (HC), I think.
In neocortex, L6ct cells could be the sequence-so-far cells. They seem to suppress a surprise signal in thalamus, so they might implicate the next input. They also mostly target inhibitory cells in cortex, so they could trigger the end of the oscillatory cycle.
@POv Could you please share the asymmetric C-Version of Dyadic Memory? Thanks
just pushed it on GitHub
Continuing the HC analogy, If you make p high enough it will act like the Dentate Gyrus in the HC. It will take similar memory (comprised of similar sensorial representations) and return (or forward to CA3/CA1 & back to EC) a well separated representation of one episodic memory…
for example, you enter a room… your eyes are sending similar & close views of the environment, yet you remember one room only, and not all the “different” rooms your eyes saw …
I think this “module” is a needed one on top of the previous memories that remember everything (like the numbers of pi) regardless of their emotional or significance importance. People who remember everything are very rare …
Now it is unclear for me with your calculation of storage locations.
static int storagelocation(int n, int i, int j)
return n * (-2 - 3i - ii + 2j + 2i*n) / 2; // cell in base triangle
According to your neuronal net in pp 3 Triadic Memory Docu, the index of the hidden unit, which connects to bit i and j of the input bits should be
i * N - summe of all integer from 0 to i
iN - i(i+1)/2
Am I right here?
The calculation of storage locations is not documented in the paper. It works like this: i and j are all ordered pairs you can form from the non-zero indices in x. In the C code, indices run from 0 to n-1. Therefore i runs from 0 to n-2, and j from i+1 to n-1. The formula in storagelocations maps this triangle to a linear storage array. (This triangle, by the way, is half of the triadic memory storage cube).
Hope this helps…
@POv thanks for your response.
But my algo is immediately for this purpose:
The storage location index = j + N - i(i+1)/2 for all j = i+1 to N
I think bit pair addressing logic deserves a topic on its own. Different structures - diadic, triadic memories and ID & value maps - all use similar addressing, the differences being in what is stored at these addresses.
In a sense each bit pair address is like a micro-dendrite with only two synapses, each looking at a different bit in the input SDR
You are right that we have different ways for pairing them. I only tried to understand your algorithm. I tested your algorithm for the case N=7, but can not find all possible pairs between 7 input nodes and 21 hidden nodes
@thanh-binh.to There was actually a bug in the asymmetric version … thanks for pointing this out!
If you are familiar with python here is a simplified version of address computing.
the sdr is basically an ordered list of 1 bits.
See below if all bits from 0 to 6 are 1, then the resulting addresses fill the space from 0 to 20 (21 address pairs)
def address(sdr): for y in range(1,len(sdr)): for x in range(y): yield (sdr[y]*(sdr[y]-1)//2+sdr[x]) [a for a in address([0,1,2,3,4,5,6])] [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20]
To be more clear how bits are paired:
def bit_pairs(sdr): for y in range(1,len(sdr)): for x in range(y): print(sdr[x],sdr[y]) bit_pairs([4,5,6])
In my implementations I need to make sure the SDRs are ordered.
@POv Yes. I have seen the current C-version in github. It looks now correct and has a better performance than my version. Thanks for your update.