Anomaly score logic for multiple inferred fields

styagi · November 10, 2018, 9:28pm

I’m calculating the anomaly score for an input with two inferred fields, both of which are scalar encoded. One field (field1) ranges between 0 and 250, while the other (field2) lies between 0 and 12000.
I’ve added both fields to the field encodings, but I don’t understand how the two input fields influence the anomaly score.

To test my code, I deliberately inject anomalies into my input set; each anomaly is simply a value of 0. The model detects an anomaly when field2 is 0 and field1 isn’t. However, when field1 is 0 and field2 isn’t, the model has a hard time detecting it. Can someone explain why this happens?

rhyolight · November 10, 2018, 11:24pm

Can you show your model parameters? Especially for the encoders?

styagi · November 11, 2018, 2:15am

The entire code is available here: https://github.com/sahiltyagi4/Indy500/blob/master/src/main/java/com/dsc/iu/utils/TwoMetricsDetection.java

The parameters are listed in the method getNetworkLearningEncoderParams().

sheiser1 · November 11, 2018, 2:50am

One core thing to know about the way NuPIC generates anomaly scores from multi-field streams is that the multiple encodings are all concatenated into one when fed into the SP and TM. This means the resulting SDRs are like swirls or mixtures of the input encodings, so there’s no way to tell what role each field played in producing the anomaly score. If you have 2 fields as in your example and the anomaly score is, say, 0.5, it’s unknowable whether 1 field was acting predictably while the other was unpredictable, or whether both fields were semi-predictable.
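A minimal sketch of the two ideas above, modelling SDRs as Python sets of active bit indices (a simplification, not NuPIC’s data structures; the spatial pooler step that maps input bits to columns is omitted). `concat_encodings` shows how two field encodings become one input vector, and `raw_anomaly_score` uses the standard raw anomaly formula: the fraction of active columns that were not predicted. Two quite different prediction patterns both score 0.5, so the score alone cannot say which field was surprising. Both function names are hypothetical.

```python
def concat_encodings(enc1, n1, enc2):
    """Concatenate two field encodings into one input SDR.

    enc1 uses bits [0, n1); enc2's bit indices are shifted up by n1,
    so the two encoder outputs sit side by side in one vector.
    """
    return set(enc1) | {bit + n1 for bit in enc2}


def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of currently active columns that were not predicted."""
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active, nothing to be surprised by
    predicted_and_active = active & set(predicted_columns)
    return 1.0 - len(predicted_and_active) / len(active)


if __name__ == "__main__":
    # field2's bits 0 and 3 land at positions 10 and 13 after concatenation
    print(sorted(concat_encodings({0, 1}, 10, {0, 3})))
    active = set(range(10))
    # Scenario A: the first half of the active columns was predicted.
    print(raw_anomaly_score(active, set(range(5))))
    # Scenario B: every other active column was predicted.
    print(raw_anomaly_score(active, {0, 2, 4, 6, 8}))
```

Once the SP has mixed the concatenated bits into columns, the score is just a count over columns, which is why it cannot be split back into per-field contributions.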

You could try running a separate model for each field, to shed light on where each one is more and less predictable. You could then compare these series of anomaly scores to those from the 2-field model. If there are points in time where both 1-field models are anomalous but the 2-field model is predictable, that (I think) is when the behavior of one field is helping to predict the other.
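The comparison suggested above could be scripted along these lines, assuming you have already collected three equal-length lists of anomaly scores (field1-only model, field2-only model, two-field model). The 0.8 / 0.2 thresholds and the function name are illustrative assumptions, not NuPIC defaults.

```python
def cross_field_help_points(scores_f1, scores_f2, scores_both,
                            high=0.8, low=0.2):
    """Time steps where both single-field models look anomalous
    (score >= high) but the two-field model looks predictable (score <= low).

    The thresholds are hypothetical; tune them for your data.
    """
    if not (len(scores_f1) == len(scores_f2) == len(scores_both)):
        raise ValueError("score series must have equal length")
    return [
        t
        for t, (a, b, c) in enumerate(zip(scores_f1, scores_f2, scores_both))
        if a >= high and b >= high and c <= low
    ]


if __name__ == "__main__":
    f1 = [0.1, 0.9, 0.9, 0.0]
    f2 = [0.1, 0.9, 0.5, 0.9]
    both = [0.0, 0.1, 0.1, 0.1]
    print(cross_field_help_points(f1, f2, both))
```

Only step 1 qualifies here: at step 2 the field2-only model is not anomalous enough, and at step 3 the field1-only model is predictable.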

rhyolight · November 12, 2018, 6:56pm

It is working better on field2 because the min - max range is so much farther away from zero. Try making the anomaly -1000 and I bet they will both have high anomalies.
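One way to reason about how distinct an injected 0 is from a field’s normal values is to count how many active bits their encodings share. The sketch below assumes a simplified non-periodic scalar encoder; n=400 and w=21 are illustrative values, not the parameters used in the linked code, and the clipping behaviour is an assumption noted in the docstring.

```python
def scalar_bucket(value, minval, maxval, n=400, w=21):
    """First active bit of a simplified, non-periodic scalar encoder.

    Assumption: out-of-range values are clipped to [minval, maxval].
    Real encoders may instead raise an error (no clipping) or, like a
    random distributed scalar encoder, create new buckets.
    """
    value = min(max(value, minval), maxval)
    return round((value - minval) / (maxval - minval) * (n - w))


def shared_bits(v1, v2, minval, maxval, n=400, w=21):
    """How many of the w active bits the encodings of v1 and v2 share."""
    b1 = scalar_bucket(v1, minval, maxval, n, w)
    b2 = scalar_bucket(v2, minval, maxval, n, w)
    return max(0, w - abs(b1 - b2))


if __name__ == "__main__":
    # Same relative distance (2% of the range) from 0 in each field:
    print(shared_bits(0, 5, 0, 250))      # field1
    print(shared_bits(0, 240, 0, 12000))  # field2
```

In this simplified model the overlap depends only on where the injected 0 sits relative to the field’s usual values, as a fraction of the encoder range. If field1’s normal values sit close to 0 within its range while field2’s sit far from it, a 0 in field1 shares many more bits with normal input and looks less surprising.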

styagi · November 13, 2018, 6:35pm

@sheiser1 I ran the models with each field separately, and it turns out that field2 is predictable. However, it’s field1 that’s hard to predict. As @rhyolight mentioned, it’s happening because the range is wider on field2 than on field1.

In an attempt to fix this, I normalized field2 to the range of field1 by multiplying the field2 input by 0.02 (the field1 range is [0,250] and the field2 range is [0,12500], so the multiplier is 250/12500 = 0.02). The resulting anomaly scores make it look like my model has gone bonkers.

Any suggestions on how to scale the inputs so they both influence my model equally, @rhyolight @sheiser1?

rhyolight · November 13, 2018, 6:40pm

Have you tried calling getScalarMetricWithTimeOfDayAnomalyParams() to get encoder params? If you don’t need a time encoding, you can just ignore it. It’s worth trying, because this is typically where we get the best scalar anomaly encoding parameters.

sheiser1 · November 13, 2018, 7:01pm

My personal hunch is that it’s not just the wider range, but also the way the inputs are distributed across the range in terms of frequency. Have you checked a histogram of all field1 and field2 values? I’d be curious to see it. If the frequencies are lower at the tails (most of the values are in the middle), you could restrict the range to contain, say, 80 or 90% of all data values, and the encoder will clip all values above and below the max and min to the max and min themselves. I do this in my current project and I find it helps.
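The range-restriction idea above can be sketched as follows: find bounds that enclose the central 80 to 90% of historical values, use them as the encoder’s min and max, and clamp everything else. This assumes you have a representative sample of past values; `central_range` and `clip` are hypothetical helper names.

```python
def central_range(values, coverage=0.9):
    """Return (lo, hi) enclosing roughly the central `coverage` share of values.

    Uses nearest-rank positions on the sorted data, trimming an equal
    share from each tail.
    """
    if not values:
        raise ValueError("values must be non-empty")
    if not 0 < coverage <= 1:
        raise ValueError("coverage must be in (0, 1]")
    data = sorted(values)
    tail = (1 - coverage) / 2
    lo_idx = round(tail * (len(data) - 1))
    hi_idx = len(data) - 1 - lo_idx
    return data[lo_idx], data[hi_idx]


def clip(value, lo, hi):
    """Clamp a value into [lo, hi], as an encoder with clipping would."""
    return min(max(value, lo), hi)


if __name__ == "__main__":
    history = list(range(101))  # stand-in for real field values
    lo, hi = central_range(history, coverage=0.9)
    print(lo, hi)
    print(clip(-1000, lo, hi), clip(50, lo, hi))
```

One consequence to keep in mind: a value clipped to `lo` encodes exactly like `lo` itself, so an injected anomaly that falls outside the chosen range will look like the range minimum to the model.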

styagi · November 13, 2018, 7:33pm

@sheiser1 @rhyolight One correction, guys. I had normalized the field2 input w.r.t. the field1 range but did not adjust field2’s range in the encodings (it was still 0 to 12500). That was a mistake on my part.
I corrected it to 0 to 250 and ran again; the anomalies generated are exactly the same as in the initial case. So it’s certain that normalizing one input w.r.t. another does not change anomaly and prediction capability.
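This result is what you would expect from an encoder that buckets values linearly over [min, max]: scaling both the input and the range by the same factor leaves each value’s normalized position, and therefore its bucket, unchanged. Below is a minimal sketch using exact fractions, so floating-point rounding at bucket edges doesn’t blur the point; `bucket_index` and the 100-bucket count are hypothetical. It also suggests why the first attempt misbehaved: scaling the input without changing the range squeezes all of field2 into the bottom few buckets.

```python
from fractions import Fraction

# 250 / 12500 = 0.02: field1's range divided by field2's range.
SCALE = Fraction(250, 12500)


def bucket_index(value, minval, maxval, num_buckets=100):
    """Bucket a scalar falls into when [minval, maxval] is split evenly.

    Values outside the range are clamped into the first or last bucket.
    """
    frac = (Fraction(value) - minval) / (maxval - minval)
    frac = min(max(frac, Fraction(0)), Fraction(1))
    return min(int(frac * num_buckets), num_buckets - 1)


if __name__ == "__main__":
    for v in (0, 3060, 9990, 12500):
        original = bucket_index(v, 0, 12500)          # field2 as-is
        rescaled = bucket_index(v * SCALE, 0, 250)    # input and range both scaled
        squeezed = bucket_index(v * SCALE, 0, 12500)  # input scaled, range not
        print(v, original, rescaled, squeezed)
```

With ordinary floats the `original` and `rescaled` columns can occasionally differ by one bucket for values sitting exactly on a bucket boundary, which is a rounding artifact rather than a real change in the encoding.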

@sheiser1 You’re right about the data spread. I’ll check again and confirm though, and maybe try what you suggest and see if that works for me.

@rhyolight any idea of the equivalent of getScalarMetricWithTimeOfDayAnomalyParams() in htm.java (as this is what I’m mostly using) ?

rhyolight · December 31, 2018, 3:24pm

There is no equivalent in HTM.Java, sorry.
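Since HTM.Java has no such helper, one option is to compute the encoder resolution by hand and pass it to whatever random distributed scalar encoder configuration you use. Below is a Python sketch of the heuristic as we understand NuPIC’s getScalarMetricWithTimeOfDayAnomalyParams() to apply it: when no bounds are given, pad the observed min and max by one standard deviation, then divide the range into 130 buckets, with a floor of 0.001 on the resolution. These constants are an assumption from memory and should be checked against the NuPIC source before relying on them; `rdse_resolution` is a hypothetical name. The arithmetic ports directly to Java.

```python
import statistics


def rdse_resolution(data, min_val=None, max_val=None,
                    min_resolution=0.001, num_buckets=130.0):
    """Sketch of an RDSE resolution heuristic (constants are assumptions).

    If bounds are missing, derive them from the data padded by one
    population standard deviation (or by 1.0 when the data is constant).
    """
    if min_val is None or max_val is None:
        std = statistics.pstdev(data) or 1.0
        if min_val is None:
            min_val = min(data) - std
        if max_val is None:
            max_val = max(data) + std
    if min_val == max_val:
        max_val = min_val + 1  # avoid a zero-width range
    return max(min_resolution, (max_val - min_val) / num_buckets)


if __name__ == "__main__":
    # Using the ranges quoted in this thread as fixed bounds:
    print(rdse_resolution([], 0, 250))    # field1
    print(rdse_resolution([], 0, 12500))  # field2
```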
