TemporalAnomaly detection questions

Hello NuPIC community,

I’m trying to build an anomaly detection application based on NuPIC for Prometheus and have a few questions regarding this. I’m kind of new to NuPIC so I’m sorry if my questions seem trivial :slight_smile:

  • I’ll run TemporalAnomaly detection on a single scalar. I stumbled upon the “best single metric anomaly params” modelParams, and also getScalarMetricWithTimeOfDayAnomalyParams. My question is, how important are the min/max values? Let’s say I take a few days’ data and feed its min/max to generate the resolution. What happens if the actual value exceeds the maximum/minimum? Or is the resolution what really matters here?

  • How important is the resolution? What happens if I let it use the default 0.001? Will I get many false positives for metrics that don’t match this resolution?

  • Regarding the timestamp parameter: do the data points need to arrive at consistent intervals? Can it handle small shifts between the provided data points?

  • Between application restarts I save the models I currently have. The saved models are a bit large, reaching tens of megabytes per metric. I have a couple of questions about this. Can I reduce the size before saving? Or maybe I don’t need to save them at all? For example, it’s possible to query all the data points of the last 24 hours (10 second interval) and continue from there. Would reusing (saving/loading) the model, so that it learns weeks/months of data, improve the anomaly detection?


Here’s a quick answer to get you started.

The resolution refers to the “precision” of the input’s units of measure and is used by the encoders to put similar inputs into the same “bucket”. So for example, given the inputs “0.002 → 0.234 → 0.474 → 0.009”, if your resolution were 10 then these inputs would all go into the same bucket (bucket 0), because the resolution would be far too coarse: each bucket would span a width of 10, so values 0–9.999 would go in bucket 0, 10–19.999 in bucket 1, 20–29.999 in bucket 2, and so on.

If you set your resolution to 0.001, inputs varying by thousandths would instead be spread into multiple buckets: 0.000–0.0009 would go in bucket 0, 0.001–0.0019 (e.g. 0.001, 0.0013, 0.0014875) in bucket 1, and 0.002–0.0029 in bucket 2.

Keep in mind this is just a conceptual answer, but you can hopefully see now that the resolution makes a difference in just how finely tuned the system is to your input.
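If it helps to see it in code, here’s a tiny, purely conceptual sketch of that bucketing idea. If I recall correctly the RDSE rounds (value − offset) / resolution to pick a bucket, so the exact boundaries differ a little from the ranges I described above, but the effect is the same:

```python
# Purely conceptual sketch of how resolution groups values into buckets.
# (The real encoders manage offsets, bucket counts and SDR bits internally.)
def bucket_index(value, resolution, offset=0.0):
    return int(round((value - offset) / resolution))

for v in [0.002, 0.234, 0.474, 0.009]:
    print("%s -> bucket %d (resolution=10), bucket %d (resolution=0.001)"
          % (v, bucket_index(v, 10), bucket_index(v, 0.001)))
# With resolution=10 all four values share bucket 0 (too coarse);
# with resolution=0.001 they spread across buckets 2, 234, 474 and 9.
```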

Typically we use the RandomDistributedScalarEncoder, which (I believe I’m putting this right) can “auto-size” the resolution to give you the best distribution of buckets, given your data. But if you know the “range” of your data beforehand, you can use other encoders such as the ScalarEncoder.
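For example (take the parameter values here as placeholders, not recommendations), constructing the two encoders looks roughly like this:

```python
from nupic.encoders.scalar import ScalarEncoder
from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder

# ScalarEncoder: you must know the range up front; clipInput makes
# out-of-range values clip to the min/max instead of raising an error.
scalar_enc = ScalarEncoder(w=21, minval=0.0, maxval=100.0, n=400, clipInput=True)

# RDSE: no min/max needed, just a resolution (the width of each bucket).
rdse = RandomDistributedScalarEncoder(resolution=0.001, w=21, n=400)

print(scalar_enc.encode(42.0))   # SDR as a numpy array of 0s and 1s
print(rdse.encode(0.234))
```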

That’s just a quick answer to that one question…

Cheers, and Welcome @lightpriest!!

P.S. The usual flow of data through NuPIC’s algorithms is: Data → Encoder → Spatial Pooler → TemporalMemory → Classifier -or- Anomaly (the anomaly detector).
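In practice you usually don’t wire those pieces up by hand; the OPF model factory does it for you. Here’s a rough, hedged sketch using the getScalarMetricWithTimeOfDayAnomalyParams helper you mentioned (the min/max values are placeholders, the import path may be nupic.frameworks.opf.modelfactory in older releases, and the “c0”/“c1” field names are what the generated params expect, if I remember right):

```python
import datetime

from nupic.frameworks.opf.model_factory import ModelFactory
from nupic.frameworks.opf.common_models.cluster_params import (
    getScalarMetricWithTimeOfDayAnomalyParams)

# Generate model params for a single-scalar + time-of-day anomaly model.
params = getScalarMetricWithTimeOfDayAnomalyParams(
    metricData=[0],               # ignored when minVal/maxVal are supplied
    minVal=0.0, maxVal=100.0,     # used to derive the encoder resolution
    tmImplementation="cpp")

model = ModelFactory.create(modelConfig=params["modelConfig"])
model.enableInference({"predictedField": "c1"})  # "c1" is the scalar field

# Feed one record: "c0" is the timestamp, "c1" is the metric value.
result = model.run({"c0": datetime.datetime.utcnow(), "c1": 42.0})
print(result.inferences["anomalyScore"])
```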

I recommend this video which covers encoders - but the rest of HTM School is essential for newcomers to HTM Theory! Enjoy!


@lightpriest,

I took a look at Prometheus, and it looks very interesting! Grok (a partner of Numenta that uses NuPIC to offer monitoring/management of application “health”) is another example of a very specific monitoring solution already in use… So it can be done!


Min/max encoder parameters are really important if you are using a ScalarEncoder. Any values that fall outside the min/max range will be interpreted as either the min or max (depending on the extreme). The resolution is a parameter for the RDSE, and indicates the range of values that get grouped into each bucket. You have to pay attention to either min/max or resolution, depending on the encoder you are using. (More details about both encoders in HTM School: Scalar Encoding.)

I assume you are talking about the RDSE’s resolution configuration. It is really important. Here is an example of using the min and max of a specific data set to determine the resolution, given how many buckets you want to have:

https://github.com/numenta/nupic.workshop/blob/master/part-1-scalar-input/run_prediction.py#L36-L41
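If you don’t want to follow the link, the idea in that snippet is roughly this (the bucket count and floor value here are illustrative, not the exact numbers from the repo):

```python
# Derive an RDSE resolution from the observed range of your data.
values = [0.002, 0.234, 0.474, 0.009]   # e.g. a few days of historical metric values
minVal, maxVal = min(values), max(values)

numBuckets = 130.0       # how finely you want to split the range into buckets
minResolution = 0.001    # floor so the resolution never collapses to zero
resolution = max(minResolution, (maxVal - minVal) / numBuckets)
print(resolution)
```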

Consistent intervals in the input data are not required as long as the time is encoded using the DateEncoder. Just be sure they are in chronological order.
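For example (the widths here are just the typical values I’ve seen in examples, not a recommendation), a DateEncoder set up for time-of-day plus a weekend component looks something like this, and it is happy with irregular gaps between records:

```python
import datetime
from nupic.encoders.date import DateEncoder

# Encode only the aspects of time you care about; here time-of-day
# (width 21, radius 1 hour) and a weekend flag.
date_enc = DateEncoder(timeOfDay=(21, 1), weekend=21)

# Irregular gaps between records are fine; each timestamp is encoded on its own.
for ts in [datetime.datetime(2017, 6, 1, 10, 0, 0),
           datetime.datetime(2017, 6, 1, 10, 0, 11),   # 11s later, slight jitter
           datetime.datetime(2017, 6, 1, 10, 0, 19)]:
    print(date_enc.encode(ts))
```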

The current serialization technique in NuPIC leaves much to be desired. We have been working on a different mechanism that should improve things all around (speed and size). But it is not done yet.

Currently, no. We depend on Python’s pickling for serialization. Not much control there.

Yes! That way it can remember patterns it has seen a long time ago. You won’t get it to learn longer-period patterns (like daily, weekly, etc) without having a model that’s been alive for that long and seen that much data.
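In case it helps, OPF models can be checkpointed to a directory and reloaded after a restart; roughly like this (the path here is just an example, and I believe it has to be absolute; in older releases the import may be nupic.frameworks.opf.modelfactory):

```python
import os
from nupic.frameworks.opf.model_factory import ModelFactory
from nupic.frameworks.opf.common_models.cluster_params import (
    getScalarMetricWithTimeOfDayAnomalyParams)

# Build a model -- same as the earlier sketch in this thread.
params = getScalarMetricWithTimeOfDayAnomalyParams(
    metricData=[0], minVal=0.0, maxVal=100.0, tmImplementation="cpp")
model = ModelFactory.create(modelConfig=params["modelConfig"])
model.enableInference({"predictedField": "c1"})

# Checkpoint the model to disk.
checkpoint_dir = os.path.abspath("checkpoints/my_metric")
model.save(checkpoint_dir)

# ... after an application restart, reload and continue feeding records ...
model = ModelFactory.loadFromCheckpoint(checkpoint_dir)
# model.run(...) picks up with all previously learned patterns intact.
```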


@rhyolight, @cogmission thank you very much guys. :thumbsup:

Sorry about the confusion between ScalarEncoder and RDSE, it’s still kind of new to me :).
I must admit that initially when I started looking into this (about 7 months ago) it was a bit overwhelming, partly because of how awesome it is. I see that you’ve added videos to the youtube channel and I will certainly check them out. Really looking forward to doing a lot more with HTM!
