Auto-selecting encoder resolution for anomaly detection

Hi guys. I’ve implemented my own version of HTM anomaly detection (largely based on NuPIC). I’ve run the detector on both NAB and non-NAB data, and it appears to work well only for certain encoder resolutions. The right resolution depends on the nature of the data, which seems intuitive. My question is: has anyone validated a technique for determining the appropriate resolution to use from some representative portion of the data? [Note: simply determining the min/max range doesn’t appear sufficient, since what seems to be crucial is the actual number of buckets used.]

My initial idea was to split the calibration input into a training set and a test set (this portion of the data should obviously be non-anomalous), run the detector on both, and compare the anomaly results. My theory is that for a good encoding resolution, the stats of the two result sets (mean/std) will be similar (within some minimum similarity threshold).
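
For concreteness, here’s a minimal sketch of that comparison (run_detector and the tolerance tol are hypothetical stand-ins for whatever your detector actually exposes):

```python
import numpy as np

def stats_match(train_scores, test_scores, tol=0.1):
    """Return True if the mean and std of the two anomaly-score sets agree
    within a relative tolerance, i.e. the similarity check described above."""
    train_mean, train_std = np.mean(train_scores), np.std(train_scores)
    test_mean, test_std = np.mean(test_scores), np.std(test_scores)
    mean_ok = abs(train_mean - test_mean) <= tol * max(train_mean, test_mean, 1e-9)
    std_ok = abs(train_std - test_std) <= tol * max(train_std, test_std, 1e-9)
    return mean_ok and std_ok

# calibration = known non-anomalous portion of the stream
# train, test = calibration[:split], calibration[split:]
# train_scores = run_detector(train, resolution)  # hypothetical: one anomaly score per record
# test_scores = run_detector(test, resolution)
# resolution_ok = stats_match(train_scores, test_scores)
```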

Thoughts?

Hi @mrrubato, welcome!

Good question!
I got to discuss this once with the great @rhyolight, who recommended taking the min & max found from a subsample of the data and padding them. For instance, if the min & max found were 0 & 100, you could add padding of 10, which would set the min & max to -10 & 110. His assumption was that the population distribution would have more variation than the sample, hence the padding.
I’ve also tried an alternate percentile-based approach, where the min is set to the 1st percentile of the sample and the max is set to the 99th.
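
In case it’s useful, here’s roughly how I’d code those two heuristics (just a sketch; the pad fraction and percentile cutoffs are arbitrary choices):

```python
import numpy as np

def padded_range(sample, pad_fraction=0.1):
    """Pad the observed min/max by a fixed fraction of the sample range
    (e.g. a 0..100 sample with 10% padding becomes -10..110)."""
    lo, hi = np.min(sample), np.max(sample)
    pad = pad_fraction * (hi - lo)
    return lo - pad, hi + pad

def percentile_range(sample, low=1, high=99):
    """Use the 1st/99th percentiles instead, to ignore outliers in the sample."""
    return np.percentile(sample, low), np.percentile(sample, high)
```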

This is interesting; I haven’t tried changing the number of buckets myself. I believe the standard number of buckets for RDSE is 140, so I left it there. It makes sense that performance is so sensitive to bucket count though, since it sets the level of spatial granularity used by the HTM.

I’d be curious to see the results of this, and I’d recommend another approach as well – checking how long it takes for the anomaly scores to flatline on a predictable dataset (like a simple sine wave). Since the patterns are clear and simple, we know it should take the HTM very little time to learn and the anomaly scores should flatline fast, assuming valid encodings. If the anomaly scores are taking too long to flatline, it means the encoding is missing the clear signal that we know is there.
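
Something like this is what I have in mind (sketch only; detector.compute() is a stand-in for whatever per-record call your implementation has that returns a raw anomaly score in [0, 1]):

```python
import math

def steps_to_flatline(detector, n_points=2000, threshold=0.1, window=100):
    """Feed a clean sine wave and report how many steps it takes for the
    rolling mean of anomaly scores to drop below `threshold`."""
    scores = []
    for i in range(n_points):
        value = math.sin(2 * math.pi * i / 50.0)  # simple, perfectly periodic signal
        scores.append(detector.compute(value))
        if i >= window and sum(scores[-window:]) / window < threshold:
            return i  # flatlined here: the encoding is capturing the obvious pattern
    return None  # never flatlined: the encoding is likely missing the clear signal
```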

Hope this helps at all, and again welcome to the forum!

Thanks @sheiser1. Yes, I know about this part of it. There also appears to be a variation of this that pads using the std-dev of the data sample (in nupic.core, I believe). Getting the encoder range right appears helpful, but the critical piece in my testing seems to be the number of buckets used (which translates to the resolution).
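
If it helps to make the relationship explicit, this is roughly how I think of range and bucket count combining into a resolution, including that std-dev padding variant (paraphrased from memory, not the actual nupic.core code):

```python
import numpy as np

def derive_resolution(sample, num_buckets, pad_stds=0.0, min_resolution=0.001):
    """Resolution = (estimated range) / (number of buckets), optionally
    widening the range by a multiple of the sample's std-dev first."""
    lo, hi = np.min(sample), np.max(sample)
    if pad_stds:
        sd = np.std(sample)
        lo, hi = lo - pad_stds * sd, hi + pad_stds * sd
    return max(min_resolution, (hi - lo) / float(num_buckets))
```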

I understand the motivation behind the sine-wave check, but I don’t think it would work, mainly because the optimal resolution param appears heavily dependent on the specific pattern of data being analyzed. So it wouldn’t help to switch to a simple sine wave, since then you’d be analyzing a different (and likely simpler) dataset. Or am I misunderstanding you?

In my testing, what seems to give good results is to split the test portion into two halves and do a three-way comparison between their stats and the training stats.
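
Roughly like this (same idea as the earlier two-way check, just applied pairwise across the three segments; the score arrays are whatever my detector produces on each segment):

```python
import numpy as np
from itertools import combinations

def three_way_match(train_scores, test_a_scores, test_b_scores, tol=0.1):
    """Accept the resolution only if every pair of segments has a similar
    mean anomaly score."""
    means = [np.mean(s) for s in (train_scores, test_a_scores, test_b_scores)]
    return all(abs(a - b) <= tol * max(a, b, 1e-9)
               for a, b in combinations(means, 2))
```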

True, good point! So I guess my idea is circular, since you’d have to know the granularity & predictability of your data going in.

Haha, no worries. I’ve had my own circular-reasoning moments dealing with HTM. My basic premise is that, for a good tuning, the mean of recent errors should converge to within a narrow band of the historical error mean - and that this mean should stay fairly constant for non-anomalous data. Essentially, I believe I’m just restating the assumptions behind Numenta’s anomaly likelihood algorithm. If that’s true, then segments within this non-anomalous data should have similar error means. If, after some sufficient number of data points, they still don’t - all things being equal - that would appear to implicate the choice of encoding resolution.
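
As a sketch of that convergence check (the window size and band are arbitrary; the scores are whatever raw errors the detector produces):

```python
import numpy as np

def converged(scores, recent_window=200, band=0.05):
    """True if the mean of the recent scores sits within a narrow band of
    the historical mean, i.e. the errors have settled for this resolution."""
    scores = np.asarray(scores, dtype=float)
    if len(scores) <= recent_window:
        return False
    historical_mean = scores[:-recent_window].mean()
    recent_mean = scores[-recent_window:].mean()
    return abs(recent_mean - historical_mean) <= band
```

If this stays False well past a reasonable learning period on non-anomalous data, I treat that as a sign the encoding resolution is off.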
