Li et al. write: “higher levels of cortical organisation increasingly integrate information over longer timescales, similar to how higher layers in the visual cortex correspond to larger spatial patterns.” (https://www.pnas.org/doi/full/10.1073/pnas.2110274119)
An LSTM retains information by learning, via supervised training, which past information is useful to keep over a longer timescale. HTM, however, is unsupervised: is it possible to learn such a representation without labelled examples? I think so. An LSTM uses a short-term and a long-term representation, but we don’t need to be constrained by that; we can build a model that integrates information over arbitrarily many increasing temporal intervals. Let me demonstrate with an example:
You’re doing some work on your computer, receiving sensory information and your own motor commands as input. A first-order, HTM-like system models the short-term (~100 ms), immediate patterns and sequences: your immediate sensory input, motor commands, etc. At a higher level of cortical organisation is a slightly longer-term predictor, on the order of a few seconds, modelling your recent mouse movements, your recent clicks, or your immediate emotional reaction to information. Higher still may be a representation of your current work, what you are doing over the order of a couple of minutes. At the highest level is your continuous self-representation of the current context, things like “I am a human being”, “I am working towards my degree/job”, “I am currently living in X country”. These are like stack frames for different contexts, letting you understand many different timescales of tasks simultaneously, e.g. I am performing a saccade in the x direction, I am navigating this website, I am writing this paper, I am getting my degree, I am trying to be existentially fulfilled.
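To make the stack-of-timescales idea concrete, here is a toy sketch. This is pure illustration, not HTM: the `TimescaleLevel` class, the window values, and the use of an exponential moving average as "integration" are all my own inventions. The only point is that each level changes more slowly than the one below it.

```python
# Hypothetical sketch: a stack of levels, each integrating input over an
# exponentially longer timescale. Higher levels change more slowly.

class TimescaleLevel:
    def __init__(self, window):
        self.window = window          # effective integration window (timesteps)
        self.state = 0.0              # toy scalar "context" representation

    def update(self, signal):
        # Exponential moving average: a larger window means a slower-changing state.
        alpha = 1.0 / self.window
        self.state += alpha * (signal - self.state)
        return self.state

# ~100 ms sensation -> seconds of mouse movement -> minutes of task -> stable self-model
hierarchy = [TimescaleLevel(w) for w in (1, 10, 100, 1000)]

for t in range(50):
    signal = 1.0                      # a constant stimulus
    for level in hierarchy:
        signal = level.update(signal) # each level integrates the one below

# After 50 steps the bottom level has fully tracked the stimulus,
# while the top level has barely moved.
```

In a real system the per-level state would be a high-dimensional SDR rather than a scalar, but the slow/fast separation across levels is the point being illustrated.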
Your brain clearly does not replay every single input sequence from a year ago to train your continuous representation, in a supervised way, on what’s important; all of that information is learned implicitly. Imagine you’re working and you suddenly hear the doorbell ring. Your medium-term prediction encounters a massive prediction error with respect to the incoming stimuli, which causes you to update your mid-level context and switch tasks, changing how you predict lower-level stimuli in response. Your long-term goals are unaffected by this contextual switch, as it’s already “priced in” to their predictions of what is normal. The medium-term context switches rapidly, but still persists across many timesteps. This lets the brain make simultaneous predictions through space and time, operate in a stack of contexts, and predict more complex patterns and dependencies across time.
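The doorbell scenario could be sketched as surprise-driven context switching. Again, this is a made-up toy (the level names, thresholds, and the error-halving rule are all assumptions of mine, not HTM): a level resets its context only when the prediction error exceeds its threshold, and higher levels have higher thresholds, so most surprises never propagate all the way up.

```python
# Toy sketch of surprise-driven context switching. Each level resets its
# context only when prediction error exceeds its threshold; higher levels
# have higher thresholds, so they are insulated from most surprises.

def propagate_surprise(levels, error):
    """levels: list of dicts with 'threshold' and 'context', ordered low -> high."""
    for level in levels:
        if error > level["threshold"]:
            level["context"] = "reset"   # switch context at this level
            error *= 0.5                 # assume each reset absorbs some of the error
        else:
            break                        # remaining error is "priced in" above
    return levels

levels = [
    {"name": "sensory", "threshold": 0.1, "context": "typing"},
    {"name": "task",    "threshold": 0.4, "context": "writing-paper"},
    {"name": "self",    "threshold": 5.0, "context": "getting-degree"},
]

propagate_surprise(levels, error=1.0)    # the doorbell: a large surprise
# sensory and task contexts reset; the "self" level is unaffected
```

The error-halving is just a stand-in for the idea that re-contextualising a level explains away part of the surprise before it reaches the level above.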
My questions are: How do we build these increasingly long-term representations? What does HTM theory tell us about increasing timescales at increasing levels of cortical organisation? And how do we encourage contexts to span across time in a robust, unsupervised way, while also updating rapidly when new information contradicts the contextual prediction?
My own thought is that some unsupervised representation of what is important is crucial. I’m guessing this is something like the prediction error signalled up to higher layers: the more incongruent a lower-level prediction is with the current hypothesis (possibly relative to some error threshold), the more likely it is to update the higher context, moving up the layers according to what each higher-level context allows, with each level being harder to adjust and more robust. But does every level need to be continually checking for these prediction errors? Does that mean a level needs to maintain essentially the same SDR across longer and longer times? How is that possible, implementation-wise, in HTM?
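To show the kind of mechanism I have in mind for "maintaining essentially the same SDR", here is a toy sketch. This is not how HTM temporal pooling is actually implemented; the SDR sizes, the threshold, and the deterministic bit-shift on surprise are all my own assumptions, chosen only so the behaviour is easy to see: the level's SDR stays identical while the lower level remains predictable, and changes only when surprise crosses the threshold.

```python
# Toy sketch: a level keeps one SDR stable across timesteps until
# prediction error crosses its threshold, then switches context.
import random

SIZE, ON_BITS = 64, 8

def random_sdr(seed):
    # A toy SDR: a fixed-size set of active bit indices.
    return frozenset(random.Random(seed).sample(range(SIZE), ON_BITS))

class PoolingLevel:
    """Holds its SDR constant while the level below stays predictable."""
    def __init__(self, threshold):
        self.threshold = threshold
        self.sdr = random_sdr(seed=0)

    def step(self, prediction_error):
        if prediction_error > self.threshold:
            # Surprise: switch to a new context SDR. A deterministic shift
            # guarantees a visible change here; a real system would instead
            # pool over its inputs to form the new representation.
            self.sdr = frozenset((i + 1) % SIZE for i in self.sdr)
        return self.sdr

level = PoolingLevel(threshold=0.5)
a = level.step(0.1)   # predictable input: SDR unchanged
b = level.step(0.2)   # still predictable: same SDR
c = level.step(0.9)   # surprise: a different SDR
```

Under this framing, "continually checking" is cheap: each level only compares one error scalar against its threshold per timestep, and otherwise simply keeps its representation fixed.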