In current implementations, as far as I know, the cells in a region can only predict one step ahead. But this paper explains that a region can predict multiple steps ahead.
As mentioned earlier, predictions are not just for the next time step. Predictions in
an HTM region can be for several time steps into the future. Using melodies as an
example, an HTM region would not just predict the next note in a melody, but might
predict the next four notes. This leads to a desirable property. The output of a
region (the union of all the active and predicted cells in a region) changes more
slowly than the input. Imagine the region is predicting the next four notes in a
melody. We will represent the melody by the letter sequence A,B,C,D,E,F,G. After
hearing the first two notes, the region recognizes the sequence and starts predicting.
It predicts C,D,E,F. The “B” cells are already active so cells for B,C,D,E,F are all in one
of the two active states. Now the region hears the next note “C”. The set of active
and predictive cells now represents “C,D,E,F,G”. Note that the input pattern changed
completely going from “B” to “C”, but only 20% of the cells changed. (page 25)
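To make the quoted 20% figure concrete, here is a tiny sketch (my own toy illustration, not from the paper): treat each note's cells as a set, take the union of active and predicted cells at two successive steps, and measure how much of the representation is new.

# Toy illustration of the "output changes slowly" claim in the excerpt above.
# Each letter stands in for the set of cells that represent that note; in a
# real region these would be sparse sets of cell indices.
melody = ["A", "B", "C", "D", "E", "F", "G"]

def region_output(note_index, n_predicted=4):
    # Union of the currently active note and the next n_predicted notes.
    return set(melody[note_index : note_index + 1 + n_predicted])

out_at_B = region_output(1)   # {B, C, D, E, F}
out_at_C = region_output(2)   # {C, D, E, F, G}

fraction_new = len(out_at_C - out_at_B) / float(len(out_at_C))
print(fraction_new)   # 0.2 -> only 20% of the output representation changed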
I am wondering how a region achieves this. It has been made clear exactly how one-step prediction is achieved, but it would be interesting to know how a region can predict an arbitrary number of steps into the future.
I have a vague memory that it is achieved through temporal pooling? Something like: an SDR in layer 3 remains stable because it is associated with all the TM SDRs of a sequence in layer 4, and then that stable representation is projected back to layer 4 to make the multi-step prediction.
Maybe I’m missing something in the paper that explains this, or maybe it’s still just a high-level theory?
Older versions of the HTM whitepaper had a step in TM which allowed connecting with earlier timesteps (I don’t recall the exact implementation offhand; I’ll look it up in my notes later if you are interested).
Personally, I think this is a function of a temporal pooling layer, as you mentioned. If you think of a sequence as an object, apical feedback of that object’s representation in a pooling layer causes all the elements of the sequence (or a section of the sequence, depending on the pooling implementation) to become predictive.
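Here is a very rough sketch of that idea. Everything in it (the layer names, the data structures) is invented for illustration and is not NuPIC code; the point is just that a single stable “object” SDR in a pooling layer, fed back apically, can put every element of the learned sequence into the predictive state at once.

# Hypothetical sketch of apical feedback from a pooling layer.

# A stable "object" SDR in the pooling layer that represents the whole melody.
MELODY_OBJECT = frozenset({3, 17, 42, 88, 101})

# Sequence-layer cells that were active at each step while this melody was
# learned, keyed by the pooled object that stayed stable above them.
APICAL_CONNECTIONS = {
    MELODY_OBJECT: [
        {"A-cells"}, {"B-cells"}, {"C-cells"}, {"D-cells"},
        {"E-cells"}, {"F-cells"}, {"G-cells"},
    ],
}

def predictive_cells_from_apical(feedback_sdr):
    # Every sequence-layer cell with apical support from the feedback SDR
    # becomes predictive. With the whole-sequence object fed back, all the
    # elements of the sequence are predicted at once, without rolling the
    # temporal memory forward one step at a time.
    predictive = set()
    for step_cells in APICAL_CONNECTIONS.get(feedback_sdr, []):
        predictive |= step_cells
    return predictive

print(predictive_cells_from_apical(MELODY_OBJECT))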
I am curious about this too. I know there’s a parameter in NuPIC for the number of steps into the future to predict, but how does this work currently?
If I set that param to, say, 5, does that mean NuPIC will generate a set of predicted cells for t+1, then feed that back in and get predicted cells for t+2, and so on until t+5? Or would a separate model be learned for t+5 prediction, apart from the default t+1? Thanks
To answer my own question, it seems the predicted cells(t) are treated as activeCells and fed back in to get the next predicted cells(t+1). This function is in backtrackingTM.py.
def predict(self, nSteps):
  """
  This function gives the future predictions for <nSteps> timesteps starting
  from the current TM state. The TM is returned to its original state at the
  end before returning.

  1. We save the TM state.
  2. Loop for nSteps

     a. Turn-on with lateral support from the current active cells
     b. Set the predicted cells as the next step's active cells. This step
        in learn and infer methods use input here to correct the predictions.
  """
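So the multi-step prediction is just repeated one-step prediction with no new input. A simplified sketch of that loop (not the actual backtrackingTM implementation; the tm methods here are stand-ins for whatever your TM provides):

def predict_n_steps(tm, n_steps):
    saved_state = tm.save_state()                  # 1. save the TM state
    predictions = []
    active_cells = tm.get_active_cells()
    for _ in range(n_steps):                       # 2. loop for n_steps
        # a. compute predictions with lateral support from the active cells
        predicted_cells = tm.compute_predicted_cells(active_cells)
        predictions.append(predicted_cells)
        # b. the predicted cells become the next step's active cells
        #    (no new input is used to correct them)
        active_cells = predicted_cells
    tm.restore_state(saved_state)                  # put the TM back as it was
    return predictions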
It’s important to remember that multi-step predictions produced this way, by rolling the network’s predictive state forward in time along its current lateral dendrites, represent a compilation of possible futures. If multiple futures are plausible given the signal, they will all be predicted on the same SDR.
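A toy example of what I mean (made-up sequences, with symbols standing in for cell-level SDRs): if the TM has learned both A,B,C,D and A,B,X,Y, rolling prediction forward from B superimposes both branches on the same prediction SDR.

# Toy example: multi-step prediction as a union ("compilation") of futures.
one_step = {
    "A": {"B"},
    "B": {"C", "X"},   # B was followed by C in one sequence and X in another
    "C": {"D"},
    "X": {"Y"},
}

def roll_forward(active, n_steps):
    # Union of everything reachable within n_steps of the active set.
    predicted = set()
    frontier = set(active)
    for _ in range(n_steps):
        frontier = set().union(*(one_step.get(s, set()) for s in frontier))
        predicted |= frontier
    return predicted

print(roll_forward({"B"}, 2))   # {'C', 'X', 'D', 'Y'}: both futures superimposed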
However, in my own research I have found, counter-intuitively, that the ratio of predicted columns often falls to 0 at high-order prediction->activation transitions instead of accumulating toward 1. I presume this is because the vast combinatoric complexity of cell-to-cell connections requires a well-trained and plentiful population of lateral dendrite connections to produce meaningful high-order transitions of predicted state.
I’ve attached two plots from my research which illustrate the plunging prediction density at high-order transitions, as well as its effect on prediction-error statistics. When “thresholding on density,” I am enforcing a column-wise prediction SDR density of between 2% and 10%, and simply throwing out anything that doesn’t fall within those bounds.
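For reference, the thresholding is nothing more than this (a sketch with made-up names, roughly what I do when filtering my own results, not library code): compute the fraction of columns containing at least one predicted cell, and discard any time step whose prediction SDR falls outside the 2%-10% band.

def prediction_density(predicted_cells, cells_per_column, num_columns):
    # Fraction of columns that contain at least one predicted cell.
    predicted_columns = {cell // cells_per_column for cell in predicted_cells}
    return len(predicted_columns) / float(num_columns)

def keep_prediction(predicted_cells, cells_per_column, num_columns,
                    low=0.02, high=0.10):
    # Throw out any prediction SDR whose column-wise density falls outside
    # the [low, high] band.
    density = prediction_density(predicted_cells, cells_per_column, num_columns)
    return low <= density <= high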