Building up longer term contextual representations with HTM

Li et al. write: “higher levels of cortical organisation increasingly integrate information over longer timescales, similar to how higher layers in the visual cortex correspond to larger spatial patterns.”

An LSTM retains information by learning, through supervised learning, which earlier information is useful to retain over a longer timescale. HTM, however, is unsupervised: is it possible to learn such a representation without labelled examples? I think so. An LSTM maintains a short-term and a long-term representation, but we don’t need to be constrained to two; we can build a model that integrates information over arbitrarily many, increasingly long temporal intervals. Let me demonstrate with an example:

You’re working on your computer, receiving sensory information and your own motor commands as input. A first-order, HTM-like system models the short-term (100 ms), immediate patterns and sequences, such as your immediate sensory input and motor commands. At a higher level of cortical organisation sits a slightly longer-term predictor, on the order of a few seconds, modelling your recent mouse movements, your short-term clicking or your immediate emotional reaction to information. Higher still may be a representation of your work: what you are currently doing over a couple of minutes, and so on. At the highest level is your continuous self-representation of the current context, things like “I am a human being”, “I am working towards my degree/job”, “I am currently living in country X”. These are like stack frames for different contexts, letting you track tasks at many different timescales simultaneously, e.g. I am performing a saccade in direction x, I am navigating this website, I am writing this paper, I am getting my degree, I am trying to be existentially fulfilled, etc.

Your brain clearly does not store every input sequence from a year ago in order to train your continuous representation, in a supervised way, on what is important; all of that is learned implicitly. Imagine you’re working and suddenly hear the doorbell ring. Your medium-term prediction encounters a massive prediction error with respect to the incoming stimuli. This causes you to update your mid-level context and switch tasks, which in turn changes how you predict lower-level stimuli. Your long-term goals are unaffected by this contextual switch, as it’s already “priced in” to the predictions of what is normal. The medium-term context switches rapidly, but still persists across many timesteps. This lets the brain make predictions through space and time simultaneously, operate within a stack of contexts, and predict more complex patterns and dependencies across time.

My questions are: How do we build these increasingly long-term representations? What does HTM theory tell us about increasing timescales at increasing levels of cortical organisation? How do we make contexts span time in a robust, unsupervised way, while still updating rapidly when new information contradicts the contextual prediction?

My own thought is that some unsupervised measure of what is important is crucial. I’m guessing this is something like the prediction error from higher layers. The more incongruent a lower-level prediction is with the current hypothesis (possibly past some error threshold), the more likely it is to update the higher context, moving up the layers according to what each higher-level context allows, with each level being harder to adjust and more robust. But does every level need to be continually checking for these prediction errors? And does that mean each level has to maintain essentially the same SDR over longer and longer times? How is that possible, implementation-wise, with HTM?
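A minimal sketch of the threshold idea in the paragraph above, assuming a single scalar prediction error per event and hand-picked thresholds (the `ContextLevel` class and `update_stack` function are hypothetical, not part of any HTM library). Each level switches its context only if the error clears its threshold, and the error stops climbing at the first level that absorbs it, so upper levels are only consulted when everything below them has already switched:

```python
from dataclasses import dataclass


@dataclass
class ContextLevel:
    """One level of a hypothetical context stack (not an HTM API)."""
    name: str
    threshold: float  # prediction error needed to force this level to switch
    context: str


def update_stack(levels, error, proposals):
    """Push a prediction error up through increasingly robust levels.

    Levels are ordered from fastest (lowest threshold) to slowest. A level
    adopts its proposed context only if `error` reaches its threshold;
    propagation stops at the first level that absorbs the error, so the
    levels above it are never consulted. Returns the names of switched levels.
    """
    switched = []
    for level, proposal in zip(levels, proposals):
        if error < level.threshold:
            break  # this level absorbs the surprise; higher levels are untouched
        level.context = proposal
        switched.append(level.name)
    return switched


levels = [
    ContextLevel("sensory", threshold=0.1, context="typing"),
    ContextLevel("task", threshold=0.5, context="writing report"),
    ContextLevel("life", threshold=0.95, context="working on degree"),
]

# The doorbell: a large, but not world-changing, surprise.
result = update_stack(levels, error=0.7,
                      proposals=["hearing bell", "answering door", "n/a"])
print(result)  # → ['sensory', 'task']
```

With the doorbell modelled as a 0.7 surprise, the sensory and task levels switch while the long-term level keeps its context. A real model would compute a separate error at each level from that level’s own predictions rather than reuse one number.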


The paper missed an obvious explanation:

“While an increased receptive field is accounted for by spatial summation of inputs from neurons in an upstream area, the emergence of timescale hierarchy cannot be readily explained,”

Temporal pooling of sequences.

If a given stage is recognizing a known sequence, its output to the next stage will be a constant value.
The next stage will then be recognizing sequences of sequences, and so on up the hierarchy.
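To make this concrete, here is a toy, symbolic sketch, assuming exact matching against a fixed dictionary of known sequences and using string labels as stand-ins for SDRs (`pool` and `events` are hypothetical helpers, not how HTM’s temporal memory or any pooler is actually implemented). While a stage is inside a recognized sequence its output is held constant; collapsing those constant runs gives the next stage one symbol per sequence, so it sees sequences of sequences:

```python
from itertools import groupby


def pool(stream, known):
    """Emit one output per input step; while inside a recognized sequence
    the output is held constant at that sequence's label."""
    out, i = [], 0
    while i < len(stream):
        for label, seq in known.items():
            if tuple(stream[i:i + len(seq)]) == seq:
                out.extend([label] * len(seq))  # constant while the sequence runs
                i += len(seq)
                break
        else:
            out.append(stream[i])  # unrecognized input passes through unchanged
            i += 1
    return out


def events(stream):
    """Collapse runs of a constant output into single events for the next stage."""
    return [key for key, _ in groupby(stream)]


level1_known = {"W1": ("c", "a", "t"), "W2": ("s", "a", "t")}
level2_known = {"S": ("W1", "W2")}

raw = list("catsat")
l1 = pool(raw, level1_known)         # ['W1', 'W1', 'W1', 'W2', 'W2', 'W2']
l2 = pool(events(l1), level2_known)  # ['S', 'S']
```

Six raw steps become two events at the first stage and a single held label at the second. One limitation of this toy: two back-to-back repeats of the same sequence collapse into a single event.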


Interesting idea, thank you for the insight. Forgive me if I haven’t fully understood, but how would this apply to, for example, nonlinear sequences, or to keeping track of memory or context over time where there isn’t an obvious sequential link? What if I need to retain information about an event from 10 minutes ago that is causally related to the current sequence, but with no obvious sequence-of-sequences between them? This paper presents an analytical approach to determining common longer-term sequences, which is closer to what I’m looking for:

In the doorbell example, what would a learned sequence of sequences even look like? It seems intuitively too rigid to capture a prediction of the entire context: “doing work on a computer” isn’t just a longer sequence of sequences, but essentially a summary of those patterns over longer timescales, together with an independent identification of relevant, high-information patterns worth remembering. If one neuron’s prediction of the next active neuron counts as one active neuron in the next layer, each higher-level neuron represents a link between two sequences. This means that each additional layer doubles the “range” of sequences it can observe, giving 2^layer timesteps as you ascend the hierarchy. What about finer-grained control and short-term memory? What about cases where sequences can arrive completely out of order, yet you can determine the higher-level context from just one of many potential cues (the environment, a sound, the time, etc.), rather than by identifying one extremely specific meta sequence-of-sequences-of-sequences?
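A quick check of the 2^layer arithmetic, assuming strict pairwise composition and the 100 ms base step from the computer-work example (`layers_needed` is a hypothetical helper). Because range doubles per layer, the depth needed grows only logarithmically with the time span:

```python
def layers_needed(span_ms, step_ms=100):
    """Smallest L such that 2**L base steps cover `span_ms`, assuming each
    higher-level unit joins exactly two lower-level units (range = 2**layer)."""
    steps = -(-span_ms // step_ms)    # ceiling division: base steps in the span
    return (steps - 1).bit_length()   # exact ceil(log2(steps)) for steps >= 1


print(layers_needed(3_000))    # a few seconds: 30 steps   → 5  (2**5 = 32)
print(layers_needed(600_000))  # ten minutes: 6000 steps   → 13 (2**13 = 8192)
```

This only addresses range; it says nothing about the rigidity and out-of-order concerns raised above.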


“Time Myopia” is a critique of HTM I had been meaning to discuss.

The reason is that TM (temporal memory) seems focused on predicting only what will happen in the next 5-10 milliseconds, which is quite a handicap for a “higher intelligence”.

There are several pathways to address this:

  • Head in the sand: assume this is not the cortex’s problem, that some other (hyper-/mega-/hippo-)thalamus (-campus?) takes care of stitching larger spans of the time fabric together, and wait for new neurological breakthroughs that point to the right solution.
  • Assume again that TM is correct, but instead of waiting for breakthroughs, try to invent them: test them “in silico”, then check later whether neuro-/biological data confirms them.
  • Go wild: assume TM misses certain important points, change the TM assumptions/model, and see where those changes lead.
    Then either look further for neurological feedback, or skip that and move on entirely in silico if the changes you made show promising practical results.

Have you considered metabotropic receptors for the longer-timescale activity?

“These receptors can remain open from seconds to minutes and are associated with long-lasting effects, such as modifying synaptic strength and modulating short- and long-term synaptic plasticity.”

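One way to play with this in silico is a set of leaky traces with different time constants: a fast one standing in for millisecond-scale activity and a slow one standing in for the seconds-to-minutes persistence quoted above. This is a rate-level abstraction with made-up time constants, not a model of receptor kinetics, and `run_traces` is a hypothetical helper:

```python
import math


def run_traces(inputs, taus, dt=0.1):
    """Leaky integrators: each trace decays with its own time constant
    (seconds) and adds the current input. Returns the traces at every step."""
    traces = [0.0] * len(taus)
    history = []
    for x in inputs:
        traces = [v * math.exp(-dt / tau) + x for v, tau in zip(traces, taus)]
        history.append(traces)
    return history


# One brief input, then 5 s (50 steps of 100 ms) of silence.
pulse = [1.0] + [0.0] * 50
fast, slow = run_traces(pulse, taus=[0.1, 60.0])[-1]
print(f"fast trace: {fast:.2e}, slow trace: {slow:.3f}")
```

After five seconds the fast trace has effectively vanished while the slow one still holds roughly 0.92 of the pulse, so a downstream stage reading the slow trace can still “see” an event the fast one has forgotten.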

For episodic memory, the hippocampus is clearly involved. I have not read any compelling papers on how the episodic sequences are stored there.

The structure of the hippocampus is very different from that of the cortex, and this may be important for sequential storage: the arrangement of CA1-3 and the dentate gyrus is strongly suggestive of a recurrent network.

I am very interested in learning how these episodic memories are formed and communicated to the rest of the cortex, but I consider this an open question for now.

Researchers are looking at how metabotropic receptors may play a part in episodic memory.
If you are ready to really deep-dive down the hippocampus rabbit hole, look at the papers cited in this link:

Thanks, that’s a new word I’ll use to justify a few past and lots of future speculations about my brain.

The only problem that will remain is the apparent inability to test them, which is one of the sloooww-changing parameters of my cognitive states.

For a start I’ll just dump them here, or rather in separate topics.


I’ve since realised that I’m directly trying to implement temporal pooling (I think it was renamed union pooling), a concept that has been discussed on this forum many, many times. If anyone has new ideas or information about temporal pooling, please don’t hesitate to message me.


@seanjhardy As far as I know, union pooling is also implemented in htmresearch, in Python.


OK, I found the union pooler, but I have trouble understanding what it does.
It seems to transform a fixed sequence of SDRs into a single… “summary” SDR.

But I don’t understand whether order is significant or not, i.e. whether
“ABCDEF” gets a different output than “BACDEF”, and if so, how.
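This is not the htmresearch code, just a toy that separates the two cases in the question, using a hypothetical encoder that gives each letter its own block of bits (`sdr`, `or_union` and `decayed_union` are illustrative names). A plain OR-union of the SDRs cannot tell “ABCDEF” from “BACDEF”, because set union is order-independent. A union where bits fade each step and are refreshed on re-activation keeps some order information in the weights, mostly about the most recent inputs:

```python
def sdr(letter, width=3):
    """Hypothetical encoder: each letter gets its own block of `width` bits."""
    base = (ord(letter) - ord("A")) * width
    return set(range(base, base + width))


def or_union(seq):
    """Plain union: every bit active at any step. Order is lost."""
    out = set()
    for s in seq:
        out |= sdr(s)
    return out


def decayed_union(seq, decay=0.5):
    """Union with persistence: bits fade each step and are reset to 1.0 when
    re-activated, so the most recent inputs carry the largest weights."""
    weights = {}
    for s in seq:
        weights = {b: w * decay for b, w in weights.items()}
        for b in sdr(s):
            weights[b] = 1.0
    return weights


print(or_union("ABCDEF") == or_union("BACDEF"))            # True: order lost
print(decayed_union("ABCDEF") == decayed_union("BACDEF"))  # False: A and B weights swap
```

Once the weights are thresholded back into a binary SDR, the order information disappears again unless the threshold is high enough to drop the oldest bits, so whether order “counts” depends on the decay rate and on how the output is read out.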
