Anomaly score logic for multiple inferred fields



I’m calculating the anomaly score for input with two inferred fields, both of which are scalar encoded. One field (field1) has a range between 0 and 250, while the other (field2) lies between 0 and 12000.
I’ve added both fields to the field encodings, but I don’t understand how the two input fields influence the anomaly score.

To test my code, I deliberately inject anomalies into my input set; each one is just a value of 0. The model detects an anomaly when field2 is 0 and field1 isn’t. However, when field1 is 0 and field2 isn’t, the model has a hard time detecting the anomaly. Can someone explain why this happens?


Can you show your model parameters? Especially for the encoders?


The entire code is available here:

The parameters are listed in the method getNetworkLearningEncoderParams().


One core thing to know about the way NuPIC generates anomaly scores from multi-field streams is that the encodings of all fields are concatenated into one before being fed into the SP and TM. This means the resulting SDRs are like swirls or mixtures of the input encodings, so there’s no way to tell what role each field played in producing the anomaly score. If you have two fields, as in your example, and the anomaly score is, say, 0.5, it’s unknowable whether one field was behaving predictably and the other unpredictably, or whether both fields were semi-predictable.
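A toy illustration of that point, as a minimal sketch under simplifying assumptions: it skips the Spatial Pooler and treats encoder bits directly as columns, and the field width `N1` and bit positions are made up. The raw score is taken as the fraction of active bits that were not predicted, computed over the whole concatenated input, so two very different situations give the same number.

```python
# Toy sketch: raw anomaly score over a concatenated two-field input.
# Assumption: encoder bits stand in for TM columns (no Spatial Pooler).

N1 = 100  # hypothetical width of field1's encoding; field2's bits come after it


def concat(bits1, bits2):
    """Concatenate two field encodings by offsetting field2's bits past field1's."""
    return set(bits1) | {N1 + b for b in bits2}


def raw_anomaly_score(active, predicted):
    """Fraction of active bits that were NOT predicted (0.0 = fully predicted)."""
    active = set(active)
    if not active:
        return 0.0
    return len(active - set(predicted)) / len(active)


f1 = set(range(0, 10))    # hypothetical active bits for field1
f2 = set(range(20, 30))   # hypothetical active bits for field2
active = concat(f1, f2)   # 20 active bits in total

# Scenario A: field1 fully predicted, field2 not predicted at all
pred_a = concat(f1, set())
# Scenario B: half of each field's bits predicted
pred_b = concat(set(range(0, 5)), set(range(20, 25)))

print(raw_anomaly_score(active, pred_a))  # 0.5
print(raw_anomaly_score(active, pred_b))  # 0.5
```

Both scenarios score 0.5, and nothing in the single number says which field was responsible.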

You could try running models with each field separately, to shed light on where each one is more or less predictable. You could then compare those anomaly-score series to the one from the 2-field model. If there are points in time where both 1-field models are anomalous but the 2-field model is predictable, that (I think) is when the behavior of one field is helping to predict the other.
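The comparison described above could look roughly like this. It is a sketch only: the function name and the 0.8 threshold are hypothetical choices, and the score lists are made-up numbers just to show the call, not results from any model.

```python
def helped_by_other_field(scores_f1, scores_f2, scores_both, threshold=0.8):
    """Return time steps where both single-field models flag an anomaly
    but the two-field model does not (threshold is a hypothetical choice)."""
    return [
        t
        for t, (a1, a2, ab) in enumerate(zip(scores_f1, scores_f2, scores_both))
        if a1 >= threshold and a2 >= threshold and ab < threshold
    ]


# Made-up anomaly scores, only to illustrate the call
s1 = [0.1, 0.9, 0.95, 0.2]   # field1-only model
s2 = [0.2, 0.85, 0.3, 0.9]   # field2-only model
sb = [0.1, 0.2, 0.9, 0.1]    # two-field model

print(helped_by_other_field(s1, s2, sb))  # [1]
```

With real data you would feed in the per-timestep anomaly scores from the three runs, aligned on the same timestamps.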


It is working better on field2 because its min-max range is so much farther from zero. Try making the injected anomaly -1000 and I bet both fields will show high anomaly scores.
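One way to picture why the distance from zero matters is to look at bit overlap in a scalar encoding. This is a simplified, NuPIC-style scalar encoder written from scratch (not the library code); `n`, `w`, and the "typical" values 10 and 5000 are hypothetical, and inputs are assumed to lie inside [minval, maxval]. An injected 0 that still shares bits with normal input looks less novel to the model.

```python
def scalar_encode(value, minval, maxval, n=400, w=21):
    """Simplified scalar encoder (assumption, not NuPIC's implementation):
    w contiguous ON bits out of n; value is assumed to be in [minval, maxval]."""
    start = round((value - minval) / (maxval - minval) * (n - w))
    return set(range(start, start + w))


def overlap(a, b):
    """Number of shared ON bits between two encodings."""
    return len(a & b)


# field1: range 0..250, hypothetical typical value 10
zero1 = scalar_encode(0, 0, 250)
typical1 = scalar_encode(10, 0, 250)

# field2: range 0..12000, hypothetical typical value 5000
zero2 = scalar_encode(0, 0, 12000)
typical2 = scalar_encode(5000, 0, 12000)

print(overlap(zero1, typical1))  # 6 shared bits: a 0 still resembles normal field1 input
print(overlap(zero2, typical2))  # 0 shared bits: a 0 looks completely new for field2
```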


@sheiser1 I ran the models with each field separately, and it turns out that field2 is predictable. However, it’s field1 that’s hard to predict. As @rhyolight mentioned, it’s happening because the range is wider on field2 than on field1.

In an attempt to fix this, I normalized field2 to the range of field1 by multiplying the field2 input by 0.02 (the field1 range is [0,250] and the field2 range is [0,12500], so the multiplier is 250/12500 = 0.02). The resulting anomaly scores make it look like my model has gone bonkers.
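The scaling step is a plain linear map from one range to another; when both ranges start at 0 it reduces to multiplying by 250/12500 = 0.02. A small sketch (the helper name is hypothetical). Note that the encoder's min/max for the rescaled field must also be set to the new range, otherwise all rescaled values land in the bottom fraction of the old range.

```python
def rescale(value, src_min, src_max, dst_min, dst_max):
    """Linearly map value from [src_min, src_max] to [dst_min, dst_max]."""
    return dst_min + (value - src_min) * (dst_max - dst_min) / (src_max - src_min)


print(rescale(12500, 0, 12500, 0, 250))  # 250.0
print(rescale(5000, 0, 12500, 0, 250))   # 100.0
```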

Any suggestions on how to scale down the inputs so they both influence my model equally, @rhyolight @sheiser1?


Have you tried calling getScalarMetricWithTimeOfDayAnomalyParams() to get encoder params? If you don’t need a time encoding, you can just ignore it. But try it, because this is typically where we get the best scalar anomaly encoding parameters.
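As far as I know, that helper configures a RandomDistributedScalarEncoder whose resolution is derived from the metric’s min and max divided by a fixed number of buckets. The sketch below applies the same idea per field so that each field’s range spans the same number of buckets. The bucket count of 130 and the minimum resolution of 0.001 are assumptions from memory, not values checked against the library, and `rdse_resolution` is a hypothetical helper.

```python
NUM_BUCKETS = 130        # assumption: bucket count, as I recall it from NuPIC's anomaly params
MIN_RESOLUTION = 0.001   # assumption: lower bound on the resolution


def rdse_resolution(min_val, max_val, num_buckets=NUM_BUCKETS,
                    min_resolution=MIN_RESOLUTION):
    """Resolution so that [min_val, max_val] spans roughly num_buckets buckets."""
    return max(min_resolution, (max_val - min_val) / num_buckets)


res1 = rdse_resolution(0, 250)     # field1
res2 = rdse_resolution(0, 12500)   # field2

# Each field's full range now covers about the same number of buckets,
# so neither field is encoded at a finer granularity than the other.
print(250 / res1, 12500 / res2)
```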


My personal hunch is that it’s not just the wider range, but also how the inputs are distributed across the range in terms of frequency. Have you checked a histogram of all field1 and field2 values? I’d be curious to see it. If the frequencies are lower at the tails (most of the values are in the middle), you could restrict the range to contain, say, 90% or 80% of all data values, and the encoder will clip all values above the max and below the min to the max and min themselves. I do this in my current project and I find it helps.
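A minimal sketch of picking a restricted range that covers the central 90% of the data, using a nearest-rank percentile from the standard library. The helper names are hypothetical, and the explicit `clip` only mimics what an encoder configured to clip its input would do.

```python
import math


def percentile(sorted_vals, p):
    """Nearest-rank percentile (p in (0, 100]) of an already-sorted list."""
    k = max(0, math.ceil(p * len(sorted_vals) / 100) - 1)
    return sorted_vals[k]


def clipped_range(values, coverage=90):
    """Encoder min/max that cover the central `coverage` percent of the data."""
    vals = sorted(values)
    tail = (100 - coverage) / 2
    return percentile(vals, tail), percentile(vals, 100 - tail)


def clip(value, lo, hi):
    """Clamp value into [lo, hi], as a clipping encoder would."""
    return min(max(value, lo), hi)


lo, hi = clipped_range(list(range(1, 101)))
print(lo, hi)                # 5 95
print(clip(-1000, lo, hi))   # 5
```

In practice you would run `clipped_range` over the historical values of each field separately and use the results as that field’s encoder min and max.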


@sheiser1 @rhyolight One correction, guys: I had normalized the field2 input w.r.t. the field1 range but did not adjust the range for field2 in the encodings (it was still 0 to 12500). That was a mistake on my part.
I corrected it to 0 and 250 and ran again, and the anomalies generated are exactly the same as in the initial case. So it’s certain that normalizing one input w.r.t. another does not change anomaly and prediction capability.
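Why identical results are expected: in a simple scalar encoder the bucket depends only on (value - min) / (max - min), which does not change when value, min, and max are all multiplied by the same factor. The sketch below uses a simplified bucket formula (an assumption, not NuPIC’s code, with hypothetical `n` and `w`) to compare raw field2, correctly rescaled field2, and the earlier mismatched setup (scaled input with the old 0..12500 range).

```python
def bucket_index(value, minval, maxval, n=400, w=21):
    """Simplified scalar-encoder bucket index (assumption, not NuPIC's code)."""
    return round((value - minval) / (maxval - minval) * (n - w))


SCALE = 250 / 12500  # 0.02

for v in (0, 3100, 9000, 12500):
    original = bucket_index(v, 0, 12500)            # raw field2, range 0..12500
    rescaled = bucket_index(v * SCALE, 0, 250)      # scaled field2, range 0..250
    mismatched = bucket_index(v * SCALE, 0, 12500)  # scaled input, old range
    print(v, original, rescaled, mismatched)
```

The first two bucket columns match for every value, while the mismatched setup squeezes the whole of field2 into the lowest handful of buckets, which fits the "bonkers" results from the first attempt.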

@sheiser1 You’re right about the data spread. I’ll check again and confirm though, and maybe try what you suggest and see if that works for me.

@rhyolight any idea what the equivalent of getScalarMetricWithTimeOfDayAnomalyParams() is in (as this is what I’m mostly using)?