Time scale invariance of anomaly detection



Let an HTM model do anomaly detection on a scalar time series x(t), which doesn’t have very low frequencies, and which we can assume samples a physical phenomenon, like temperature. Let’s also interpolate x to make y(t) = x(t/2), so that y has twice as many samples covering the same stretch of the phenomenon. y therefore has no extra information on the underlying phenomenon (no extra anomalies), but exhibits the same anomalous behaviour as x, albeit with many extra uninformative transitions.
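To make the construction of y concrete, here is a minimal sketch in Python with NumPy. It assumes linear interpolation (the post does not fix the interpolation scheme), and the function name `stretch_by_two` and the sample values are made up for illustration.

```python
import numpy as np

def stretch_by_two(x):
    """Return y with y[j] = x(j / 2), filling half-steps by linear interpolation.

    Linear interpolation is an assumption made for this sketch; any other
    scheme (e.g. polynomial) would serve the same purpose.
    """
    x = np.asarray(x, dtype=float)
    t_x = np.arange(len(x))                  # original sample times 0, 1, 2, ...
    t_y = np.arange(2 * len(x) - 1) / 2.0    # half-step times 0, 0.5, 1, ...
    return np.interp(t_y, t_x, x)

# Hypothetical temperature readings, for illustration only.
x = np.array([20.0, 21.0, 23.0, 22.0])
y = stretch_by_two(x)

print(y)        # values: 20, 20.5, 21, 22, 23, 22.5, 22
print(y[::2])   # every second sample of y is exactly a sample of x
```

Every even-indexed sample of y is a sample of x, and every odd-indexed one is fully determined by its two neighbours, which is why y carries no extra information.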

Is it reasonable to expect that a sufficiently large HTM model should perform equivalently on predicting anomalies in x and in y? Is this a desirable property? And what would happen in practice?

I would call this property time scale invariance.


The neurons in a temporal memory system make predictive connections to the active neurons in the next timestep. So, while the networks for x and y wouldn’t look the same, they’d make similar predictions. While y’s predictive connections would go from a to c, x’s should go from a to b to c.

However, y’s data stream may be more or less predictable than x’s, so having both could actually be a benefit. This benefit is kind of what we’re getting into in the recent numerical encoders based on biology.



You mean the opposite, x: a->c, y: a->b->c

I would actually expect an efficient learning algorithm to get maximum information out of x, and gain nothing from y.

Can you point me to them?


Err, yup. My bad.

But actually, useful information can be found in both x and y. It’s like seeing the forest vs. the trees.

Sure thing. Look at the bottom here (Encoders created by the community). Also, look at the more recent HTM School videos for the grid cell stuff.

(I really need to wait until I’m off mobile for this stuff.)


Not really, I believe. As I assumed at the beginning, x consists of discrete-time samples of a continuous-time natural phenomenon; let’s call it p. Then y is created by assuming an interpolation of x (e.g. polynomial) and drawing out “extra samples” as if by magic. Of course, y has nothing new to tell us about the underlying physical phenomenon p, and no new anomalies; any anomaly should come directly from p.

If, for example, p has an unexpected sharp peak (anomalous) lasting Δt << Δt_sampling, then most likely it will not be reflected in x at all, nor in y.
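A small numerical sketch of this point, under stated assumptions: p is stood in for by a slow sine on a fine grid, the spike width, height and position are invented for illustration, and y is built by linear interpolation.

```python
import numpy as np

dt_fine = 0.001      # fine grid standing in for continuous time (assumption)
step = 1000          # x keeps every 1000th fine sample, so dt_sampling = 1.0
n_fine = 10_000
t_fine = np.arange(n_fine) * dt_fine

# Hypothetical phenomenon p: a slow sine wave.
p = np.sin(2 * np.pi * t_fine / 10.0)

# Add a narrow spike (width about 0.02) centred at t = 3.5, between two samples of x.
spike = np.abs(t_fine - 3.5) < 0.01
p_with_spike = p + 5.0 * spike

# Sample x from p with and without the spike.
x_clean = p[::step]
x = p_with_spike[::step]

def interpolate_half_steps(samples):
    """Build y(t) = x(t / 2) by linear interpolation (assumed scheme)."""
    t_x = np.arange(len(samples))
    t_y = np.arange(2 * len(samples) - 1) / 2.0
    return np.interp(t_y, t_x, samples)

y_clean = interpolate_half_steps(x_clean)
y = interpolate_half_steps(x)

print(p_with_spike.max() > 5.0)     # the spike is present in p
print(np.array_equal(x, x_clean))   # x never sees it
print(np.array_equal(y, y_clean))   # y is built from x, so it cannot see it either
```

Because y is computed from x alone, anything that x missed is missing from y as well, whatever interpolation is used.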

Seeing the forest vs the trees would imply sampling y directly from p with a higher sampling rate and focusing on a smaller time window.


Ah, right. In that case, part of the temporal memory would be wasted on predicting the interpolation, and would be a little bit less useful in predicting x. I’m not sure exactly how it’d compare mathematically though.

Theoretically, it’d be interesting to go into the math. Practically though, I wouldn’t recommend interpolation with temporal memory.

Going back to your questions in your first post, I’d say that yes, a sufficiently large HTM model should perform approximately equivalently. I’d say the performance would be asymptotically equivalent when increasing size. While it’s a desirable quality to predict further in the future, I would say interpolating the data would be similar to adding noise, which HTM can handle, but other than for testing, it’s never really a good thing.

In practice, they should predict the same anomalies at the same times, if the time scales are accounted for correctly. If there’s a short spike in the data, the model with interpolation should predict interpolating back down from the spike very well.
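One way to read "accounting for the time scales" is sketched below: map each flagged index of y back onto x's time axis before comparing. The helper names, the tolerance, and the flagged indices are all hypothetical; no real HTM output is used here.

```python
def y_index_to_x_time(j, stretch=2):
    """Time on x's axis (in x-sample units) of sample j of y, assuming y(t) = x(t / stretch)."""
    return j / stretch

def same_anomalies(x_flags, y_flags, stretch=2, tolerance=1.0):
    """True if every anomaly flagged in one stream has a match in the other.

    A match means the two flags lie within `tolerance` x-samples of each other
    once y's indices are rescaled onto x's time axis.
    """
    x_times = [float(i) for i in x_flags]
    y_times = [y_index_to_x_time(j, stretch) for j in y_flags]

    def covered(a, b):
        return all(any(abs(s - t) <= tolerance for t in b) for s in a)

    return covered(x_times, y_times) and covered(y_times, x_times)

# Hypothetical flagged sample indices from two detectors.
print(same_anomalies([40, 97], [80, 81, 195]))  # True: 80, 81, 195 map to 40, 40.5, 97.5
print(same_anomalies([40], [40]))               # False: index 40 of y is time 20 on x's axis
```

The second call shows the failure mode when the rescaling is skipped: identical indices in x and y refer to different moments of the underlying phenomenon.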