Anomaly detection - params and questions / definition of terms

#1

Would one / some of you please go into some (greater) detail explaining the following terms / usages:

a - ‘inference’ - how is it used in the context of anomaly detection?
b - the third line of ‘PublisherSupplier.builder().addHeader()’ - what’s a ‘blank’? how are these settings used? I’ve read the source and (kind of) get ‘category’ and ‘timestamp’ but not ‘blank’. What happens if this field is left empty?
c - when I run ~20K iterations of my dataset (timestamp, param0, p1, p2, result) where only the timestamp and result fields are changing - the values for ‘SDR.cellsAsColumnIndices(inf.getPredictiveCells(), …)’ and ‘inf.getAnomalyScore()’ never change. The first are integers from 0-18 and the second is consistently 0.0. Is this insufficient data to find patterns? The ‘hot gym’ seems to find an equilibrium fairly rapidly. Perhaps my data is not regular enough within the span of time? How would I expand the ‘memory’ of the HTM setup?
d - how does HTM (nupic specifically) deal with timestamped data that’s out of sequence?


#2

“Inference” is just the name of the object containing output from the NAPI (Network API).

Don’t worry about it. “Blank” is an internal representation that “crept” into the API; these values get filled in automatically so that all the headers are the same length, which eases and speeds up parsing. All front-facing knowledge of it should actually be removed.

You should probably ask all HTM functionality questions in the NuPIC forum, since the functionality is the same in both. I or other HTM.Java-savvy people can answer Java and Java API specific questions here, but I personally haven’t “used” NuPIC to any credible extent, so I wouldn’t be the best person to ask.

You have to make sure your settings are efficient, which takes some learning and fiddling, though the “Anomaly” parameters have been normalized by Numenta engineers to a best fit for most situations. “Prediction” modeling, however, may require more specific parameter settings. One thing I noticed is that you use 500 bits for each field of your encoders, which I think is probably overdoing it by about 5x (depending, of course, on how much variation there is and on the “resolution” of the field data), though the NuPIC forum might yield a different conclusion. Also, an Anomaly of “0.0” indicates either no inference (I believe) or that the data is completely predicted. Anyway, I would ask these questions on the NuPIC forum.
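
To make the “0.0” case concrete, here is a minimal sketch of how a raw anomaly score is commonly defined in HTM: the fraction of currently active columns that were not predicted on the previous step. This is an illustration only, not the HTM.Java implementation; the class and method names are hypothetical, the input arrays are assumed to hold distinct column indices, and the handling of the "no active columns" case is an assumption (conventions differ between versions).

```java
import java.util.Arrays;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical stand-alone sketch; does not use or mirror the HTM.Java API.
public class RawAnomalySketch {

    /**
     * Fraction of active columns that were NOT among the columns predicted
     * on the previous step. 0.0 = fully predicted, 1.0 = fully unexpected.
     */
    static double rawAnomalyScore(int[] activeColumns, int[] prevPredictedColumns) {
        if (activeColumns.length == 0) {
            // Assumption: with nothing active there is nothing to be surprised by.
            return 0.0;
        }
        Set<Integer> predicted = Arrays.stream(prevPredictedColumns)
                                       .boxed()
                                       .collect(Collectors.toSet());
        long hits = Arrays.stream(activeColumns)
                          .filter(predicted::contains)
                          .count();
        return (activeColumns.length - hits) / (double) activeColumns.length;
    }

    public static void main(String[] args) {
        // Every active column was predicted.
        System.out.println(rawAnomalyScore(new int[] {1, 2, 3}, new int[] {1, 2, 3, 4}));
        // Half of the active columns were predicted.
        System.out.println(rawAnomalyScore(new int[] {1, 2, 3, 4}, new int[] {1, 2}));
        // None of the active columns were predicted.
        System.out.println(rawAnomalyScore(new int[] {1, 2, 3}, new int[] {7, 8}));
    }
}
```

Under this definition, a score that stays at 0.0 for the whole run means that every active column was already predicted, which fits a data set in which almost nothing changes between rows.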


#3

Thank you!


#4

Of course, you could test your Anomaly detection by throwing in a piece of labelled data at, say, the 5000th iteration to see if it’s detected.
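
A rough sketch of that test, assuming a synthetic, regularly repeating series stands in for the real data. Everything here is hypothetical (the class, the `Row` type, the 24-step sine pattern, the spike size); the rows would still have to be fed to the network by whatever means you already use, and the `injected` flag is only a ground-truth label for checking the scores afterwards.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical test-data generator; not part of HTM.Java.
public class InjectedAnomalySketch {

    /** One input row plus a ground-truth label that is never shown to the model. */
    static final class Row {
        final int step;
        final double value;
        final boolean injected;

        Row(int step, double value, boolean injected) {
            this.step = step;
            this.value = value;
            this.injected = injected;
        }
    }

    /** Builds a periodic series and adds a spike to the single row at index anomalyAt. */
    static List<Row> build(int total, int anomalyAt, double spike) {
        List<Row> rows = new ArrayList<>(total);
        for (int i = 0; i < total; i++) {
            // Regular pattern with a period of 24 steps, values between 40 and 60.
            double normal = 50.0 + 10.0 * Math.sin(2 * Math.PI * i / 24.0);
            boolean inject = (i == anomalyAt);
            rows.add(new Row(i, inject ? normal + spike : normal, inject));
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Row> rows = build(20_000, 5_000, 100.0);
        for (Row r : rows) {
            if (r.injected) {
                // After the run, compare the anomaly score at this step with its neighbours.
                System.out.println("Injected anomaly at step " + r.step + ", value " + r.value);
            }
        }
    }
}
```

If the anomaly score at the injected step stays at 0.0 as well, that points to a problem with the encoders or the network setup rather than with the regularity of the data.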
