TLDR: Instead of predicting the input at the next time step from the recent past context alone, concatenate the past context with future contexts retrieved from an associative memory, using the current context as a search key.
If the TLDR above didn’t put you off, here are some details.
We have the following elements:
- Some sort of algorithmic next-step predictor; let’s use “RNN” as a generic term. It could mean an (H)TM, a generative LLM, an Echo State Network, etc… even an “actual” RNN, LSTM, GRU, or whatever else.
- Some form of associative memory, which has been mentioned here pretty often, but it could also be a vector database or vector memory. In general it is a tool capable of performing nearest-neighbor search, or approximate K-NN, over a database of vectors (a minimal sketch of what I mean follows below).
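A minimal sketch of element (2), assuming plain cosine similarity over numpy arrays. The names (`ContextMemory`, `add`, `query`) are made up for illustration; a real vector database or ANN library would replace the brute-force search.

```python
import numpy as np

class ContextMemory:
    """Stores context embeddings tagged with the time step they were recorded at,
    and answers "which past moments looked most like this one?" queries."""

    def __init__(self, dim):
        self.dim = dim
        self.vectors = []    # stored context embeddings
        self.timesteps = []  # when each embedding was recorded

    def add(self, vec, t):
        self.vectors.append(np.asarray(vec, dtype=np.float32))
        self.timesteps.append(t)

    def query(self, key, k=3):
        """Return the time steps of the k stored contexts most similar to `key`."""
        if not self.vectors:
            return []
        mat = np.stack(self.vectors)                     # (N, dim)
        key = np.asarray(key, dtype=np.float32)
        sims = mat @ key / (np.linalg.norm(mat, axis=1) * np.linalg.norm(key) + 1e-8)
        best = np.argsort(-sims)[:k]                     # indices of most similar contexts
        return [self.timesteps[i] for i in best]
```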
The technique described below is about how to use 2 (the associative memory) to improve 1 (the RNN).
All RNNs have a limited-size context which is a summary of the past. It is called the “hidden state” in RNNs, the “cell activations” in HTM, or the “context window” in generative transformers.
A common problem across all of them is that in order to improve predictions the context needs to get bigger, because it has to encode the relevant clues needed to figure out the future. The longer the past, the more complex and more numerous the possible interactions between those clues, so a “larger” context is needed. Which slows everything down quadratically: an RNN’s weight count is proportional to the hidden state size squared, and the attention cost in transformers has a similar issue.
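To make the quadratic part concrete, here is a quick back-of-the-envelope sketch (vanilla RNN cell, arbitrary sizes, output layer ignored):

```python
def rnn_params(input_size, hidden_size):
    # W_xh + W_hh + bias: the recurrent matrix W_hh is hidden*hidden,
    # so doubling the hidden size roughly quadruples the recurrent weights.
    return input_size * hidden_size + hidden_size * hidden_size + hidden_size

print(rnn_params(128, 256))   #  98,560 parameters
print(rnn_params(128, 512))   # 328,192 parameters (hidden x2 -> recurrent part x4)
```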
However, the prediction of the token at time t becomes much easier when, besides the past context (t-N up to t-1), we could also have available the future context (t up to t+N) - which sounds kind of absurd, since that future hasn’t happened yet.
And here’s where (2), the associative memory, comes to help. The next best thing besides knowing the near future is knowing what the futures were in contexts similar to the current one.
In short: use (2), with the current context embedding as a search key, to retrieve (e.g.) the three “past moments” that look most similar to the current one. Since each retrieved moment comes with an exact pointer in time, we can assemble three condensed “potential future contexts” and feed the actual predictor network not only the “recent past” context but also a short list of “what the future might look like” contexts. A rough sketch of the whole loop is below.
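This sketch reuses the `ContextMemory` above; `predictor` and `embed_future` are placeholders for whatever RNN and encoder you actually use, and the window sizes are arbitrary.

```python
K = 3          # how many similar past moments to retrieve
FUTURE = 8     # how many steps of "what happened next" to pull per match

memory = ContextMemory(dim=64)
history = []   # full sequence of context embeddings seen so far

def step(t, current_context, predictor, embed_future):
    # 1. Ask the memory which past moments looked most like the current one.
    matches = memory.query(current_context, k=K)

    # 2. Each match is an exact pointer in time, so grab the contexts that
    #    *followed* it as a condensed "potential future context".
    potential_futures = []
    for tm in matches:
        window = history[tm + 1 : tm + 1 + FUTURE]
        if window:
            potential_futures.append(embed_future(window))

    # 3. Feed the predictor the recent past plus the retrieved futures.
    prediction = predictor(current_context, potential_futures)

    # 4. Store the current moment so future queries can find it.
    memory.add(current_context, t)
    history.append(current_context)
    return prediction
```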
Biological plausibility? Well, we do seem to have a means to make predictions over longer terms. I don’t think about the next letter I want to type, nor the next word; it’s a larger (yet less exact) idea of what I want to say.
So I don’t know, hypotheses about brains having some sort of associative memory capability keep popping up; this is just a clue about how we might use it.