Anomaly detection - params and questions / definition of terms

phil_d_cat · May 12, 2017, 11:07pm

Would one / some of you please go into some (greater) detail explaining the following terms / usages:

a - ‘inference’ - how is it used in the context of anomaly detection?
b - the third line of ‘PublisherSupplier.builder().addHeader()’ - what’s a ‘blank’? how are these settings used? I’ve read the source and (kind of) get ‘category’ and ‘timestamp’ but not ‘blank’. What happens if this field is left empty?
c - when I run ~20K iterations of my dataset (timestamp, param0, p1, p2, result), where only the timestamp and result fields change, the values for ‘SDR.cellsAsColumnIndices(inf.getPredictiveCells(), …)’ and ‘inf.getAnomalyScore()’ never change. The first is always integers from 0-18 and the second is consistently 0.0. Is this insufficient data to find patterns? The ‘hot gym’ example seems to find an equilibrium fairly rapidly. Perhaps my data is not regular enough within the span of time? How would I expand the ‘memory’ of the HTM setup?
d - how does HTM (nupic specifically) deal with timestamped data that’s out of sequence?

cogmission · May 13, 2017, 11:32am

“Inference” is just the name of the object containing output from the NAPI (Network API).

Don’t worry about it. “Blank” is an internal representation that “crept” into the API. Blank entries are filled in automatically so that all the header lines have the same length, which eases and speeds up parsing. All user-facing exposure of it should actually be removed.
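To make the "filled in automatically" point concrete, here is a minimal sketch of padding a flags row so it has as many fields as the field-names row. This is not HTM.Java's actual code: the class `HeaderPaddingSketch`, the method `padFlags`, and the use of the letters "B" (blank) and "T" (timestamp) as flag values are assumptions made for illustration only.

```java
import java.util.ArrayList;
import java.util.List;

final class HeaderPaddingSketch {
    // Assumed placeholder meaning "no special flag on this field".
    static final String BLANK = "B";

    /** Pads a comma-separated flags row with BLANK until it matches the width of the names row. */
    static String padFlags(String namesRow, String flagsRow) {
        int width = namesRow.split(",", -1).length;
        List<String> flags = new ArrayList<>();
        if (!flagsRow.isEmpty()) {
            for (String f : flagsRow.split(",", -1)) {
                String trimmed = f.trim();
                // An empty field in the flags row is treated as blank.
                flags.add(trimmed.isEmpty() ? BLANK : trimmed);
            }
        }
        while (flags.size() < width) {
            flags.add(BLANK);
        }
        return String.join(",", flags);
    }

    public static void main(String[] args) {
        String names = "timestamp,param0,p1,p2,result";
        System.out.println(padFlags(names, "T")); // only the first field is flagged
        System.out.println(padFlags(names, ""));  // flags row left empty
    }
}
```

Under these assumptions, leaving the third header line empty simply yields a row of blanks, which matches the advice above that the field can be ignored.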

You should probably ask all HTM functionality questions in the NuPIC forum, since the functionality is the same in both. I or other HTM.Java-savvy people can answer Java and Java API specific questions here, but I personally haven’t “used” NuPIC to any credible extent, so I wouldn’t be the best person to ask those questions.

You have to make sure your settings are efficient, which takes some learning and fiddling, though the “Anomaly” parameters have been tuned by Numenta engineers to a best fit for most situations. “Prediction” modeling, however, may require more specific parameter settings. One thing I noticed is that you use 500 bits for each field of your encoders, which I think is probably overdoing it by about 5x (depending, of course, on how much variation there is and on the “resolution” of the field data), though the NuPIC forum might reach a different conclusion. Also, an anomaly score of “0.0” indicates either no inference (I believe) or that the data is completely predicted. Anyway, I would ask these questions on the NuPIC forum.
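For context on why a score of 0.0 can mean "completely predicted": the raw anomaly score in HTM is commonly described as the fraction of currently active columns that were not predicted at the previous step. The following is a minimal standalone sketch of that calculation, not the HTM.Java implementation itself; the class name `RawAnomalySketch` and the method `rawAnomalyScore` are hypothetical.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RawAnomalySketch {
    /**
     * Returns the fraction of active columns that were NOT in the previously
     * predicted set. Returns 0.0 when there are no active columns at all.
     */
    static double rawAnomalyScore(Set<Integer> activeColumns, Set<Integer> prevPredictedColumns) {
        if (activeColumns.isEmpty()) {
            return 0.0;
        }
        Set<Integer> predictedAndActive = new HashSet<>(activeColumns);
        predictedAndActive.retainAll(prevPredictedColumns);
        return 1.0 - (double) predictedAndActive.size() / activeColumns.size();
    }

    public static void main(String[] args) {
        Set<Integer> active = new HashSet<>(Arrays.asList(1, 2, 3, 4));
        // Half of the active columns were predicted.
        System.out.println(rawAnomalyScore(active, new HashSet<>(Arrays.asList(1, 2, 9)))); // 0.5
        // Every active column was predicted: the input is fully expected.
        System.out.println(rawAnomalyScore(active, new HashSet<>(Arrays.asList(1, 2, 3, 4)))); // 0.0
        // Nothing was predicted: the input is fully surprising.
        System.out.println(rawAnomalyScore(active, new HashSet<Integer>())); // 1.0
    }
}
```

In this sketch a constant 0.0 arises either when every active column was predicted or when there are no active columns, which lines up with the two explanations given above.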

phil_d_cat · May 13, 2017, 12:17pm

Thank you!

cogmission · May 13, 2017, 1:42pm

Of course, you could test your anomaly detection by throwing in a piece of labelled anomalous data at, say, the 5000th iteration to see if it’s detected.
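One way to set up such a test is to generate a regular series and overwrite a single known position with an out-of-range value, then check whether the anomaly score rises at that position when the series is fed to the network. The sketch below only builds the data; it does not call HTM.Java. The class `InjectedAnomalySketch`, the sine-wave shape, the period of 24, and the spike value are all assumptions chosen for illustration.

```java
final class InjectedAnomalySketch {
    /**
     * Builds a periodic series oscillating between 40 and 60 and replaces
     * the value at spikeIndex with spikeValue, the labelled anomaly.
     */
    static double[] seriesWithSpike(int length, int period, int spikeIndex, double spikeValue) {
        double[] values = new double[length];
        for (int i = 0; i < length; i++) {
            values[i] = 50.0 + 10.0 * Math.sin(2.0 * Math.PI * i / period);
        }
        values[spikeIndex] = spikeValue;
        return values;
    }

    public static void main(String[] args) {
        int spikeIndex = 5000;
        double[] series = seriesWithSpike(20000, 24, spikeIndex, 500.0);
        // Print the neighbourhood of the injected value so it can be compared
        // against the anomaly scores recorded at the same iterations.
        for (int i = spikeIndex - 2; i <= spikeIndex + 2; i++) {
            System.out.println(i + "," + series[i]);
        }
    }
}
```

If the score stays at 0.0 even at the injected position, that would point to a configuration problem (for example, no inference being produced) rather than to perfectly predictable data.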
