Difference between TM and backtracking TM

I came across nupic.algorithms.temporal_memory and nupic.algorithms.backtracking_tm.py. Can I know whats the difference between them? I saw from a source that the latter performs better for anomaly detections. Can anyone point out the differences between those (whether algorithms are different…if so, whats the diff) or the latter is just an optimization over the first one (if so, please elaborate on what optimization).


I think the idea of backtracking basically says:

“We are surprised by this input, since it wasn’t predicted by the set of previously active cells from t-1. But maybe it is part of a known pattern which started before t-1, maybe t-2, t-3…up to maxInfBacktrack steps ago. Let’s go back to each prior step in maxInfBacktrack and see if starting from any of these times generates prediction that lead to the current input”.

Here are a couple pics from the code which seem to summarize the core logic:

I think this can help in scenarios where known patterns are interspersed with noise.

So let’s say there’s a known pattern:
But you’re currently getting something like:
If you’re seeing noise input ‘(’ you won’t predict ‘D’, but if you backtracked a couple steps to “C” you then would predict “D”.

This is what I gather, but would love to know I’m right.

Am I missing any concepts here?
Does this interspersed noise scenario make sense?
Are there other scenarios that backtracking is thought to help with?

If I may I bug @subutai, @Scott, @mrcslws or anyone else who might know :grin:


I’m not sure either (so I would also like to read any insights from someone who does know). But I had a different understanding of backtracking TM, which is not really to address noise. It is more for reducing the ambiguity that piles up when a particular sequence is repeated. For example, say we have a system configured for one-shot learning, and we have input the following (using bold to indicate when minicolumns burst):


In this case, the minicolumns for B will burst. With normal TM, a new representation for B would be chosen and connect to the representation for A, growing the sequence a little bit longer. With backtracking TM, though, it would check if, for example, the A minicolumns had burst one timestep ago, would B have been correctly predicted in the current timestep? (this isn’t the best example, but you get the idea…) Thus, it can recognize that it already knows this sequence.


Ok, so it checks if the prior active columns had all burst?

My understanding was that it checks the active states from not just 1 timestep back, but each prior up to maxInfBacktrack steps. So in the case of your example, that bursting 3rd B would become predicted by backtracking 5 steps to the 2nd bursting A which predicted B.

Does this sound right to you? Thanks for your intuition on this!
I really want to understand this backtracking concept completely.

1 Like

Yes, I was just using 1 for that value, since it happened to work for the short sequence in the example. I probably should have spent a little more time coming up with a better example :wink:

Anyway, my point was that (if I understand it correctly, which may be false) it goes back in time to see if the sequence had “started” (which I read as “minicolumns bursting”) some timesteps back, would the current timestep have been correctly predicted?

The comments in the algorithm also refer to something about a maximum learned sequence length, which I don’t entirely understand either (I’m guessing maybe to also address the problem of repeating sequences growing infinitely over time?) so my grasp on this algorithm is also pretty shaky


I think you’re right on checking if the sequence has started, but I don’t think it means bursting.
I found a treasure-trove toward our questions here in the _inferBacktrack method.

Its seeks out the earliest timestep which still predicts the current input with confidence.
The assumption is that an earlier high-confidence offset will yield more context than a more recent one.
If I have it right, ‘Confidence’ (of prior input ‘S’ to current input ‘I’) is essentially:

number of known sequences starting with 'S' and leading to 'I' / number of known sequences starting with 'S'


@sheiser1 and @Paul_Lamb: thanks for your explain, so that I better understand it.
I do not want to distract you from this topic, but I am very motivated here for using this method for multiple-steps prediction using SDR classifier:
By backtracking T-steps we learn about the probability, that current value @ the current time t is the follow-up value from all time instant (t - k), where k=1 to T.
There is only one difference to backtracking TM, where T = const. For SDR classifier based multisteps prodiction, T is dynamic and means the sequence length (exactly: number of different SDR are seen before).

If you are interested in it we can start it as a new topic. Thanks