Problem with false positives after anomaly detection

Hi,

I’m doing some experiments with NuPIC to detect anomalies in DNS requests over the timeOfDay.

Basically, I’m using a dataset of 3000 records to train the algorithm, with a timeOfDay DateEncoder and an SDRCategoryEncoder (a sketch of the encoder configuration is shown after the listing below). Each of the 3000 records is a pair (timestamp, “dns1”), with the timestamp advancing by a 30 min delta. The training dataset therefore looks like:

"00:00:00", "dns1"
"00:00:30", "dns1"
"00:01:00", "dns1"
"00:01:30", "dns1"
"00:02:00", "dns1"
"00:02:30", "dns1"
"00:03:00", "dns1"
… (until 3000)
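
For reference, here is a minimal sketch of how the two encoders might be configured in the OPF model params; the parameter values below are illustrative placeholders, not my exact settings:

# Illustrative encoder section of the model params (values are placeholders)
ENCODERS = {
    "timestamp_timeOfDay": {
        "fieldname": "timestamp",
        "name": "timestamp_timeOfDay",
        "type": "DateEncoder",
        "timeOfDay": (21, 1),        # (width in bits, radius in hours)
    },
    "connection": {
        "fieldname": "connection",
        "name": "connection",
        "type": "SDRCategoryEncoder",
        "n": 121,
        "w": 21,
        "categoryList": None,        # categories ("dns1", ...) learned on the fly
    },
}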

Then I disable learning and try to detect anomalies, i.e. a different address appearing at a specific time (a sketch of the detection pass with learning disabled follows the listing). Basically, the anomaly dataset looks like:

"00:00:00", "dns1"
"00:00:30", "dns1"
"00:01:00", "dns1"
"00:01:30", "dns1"
"00:02:00", "dns1"
"00:02:30", "anomaly"
"00:03:00", "dns1"

After computing the anomaly scores and plotting them, it looks like every time an anomaly is encountered, the next “good” record is detected as an anomaly as well.

What could be causing this false positive? Is it a bug, a wrong parameter, or something expected?

Thanks. :slight_smile:


This doesn’t seem like unreasonable behavior. Once the anomaly has been seen, the model cannot predict when the anomaly will end. Such is the nature of an anomaly ;). But once the data returns to normal, after seeing two steps it settles down and recognizes that it is back.

Are you plotting the raw anomaly score or the anomaly likelihood?

@rhyolight Thanks for your quick reply to my question.

I used the raw anomaly score for the plot because the likelihood was always “0.5”. The code is below:

anomalyLikelihood = an.AnomalyLikelihood()  # an is presumably nupic.algorithms.anomaly_likelihood
for i in xrange(data.size()):
    # Build a dict record from the next row of the input stream
    record = dict(zip(data.getFieldNames(), data.next()))
    result = model.run(record)
    # Store the raw anomaly score for plotting
    anomaly_results.add([record["connection"], result.inferences["anomalyScore"]])
    # Convert the raw score into an anomaly likelihood
    likelihood = anomalyLikelihood.anomalyProbability(
        record["connection"], result.inferences["anomalyScore"], record["timestamp"]
    )
    print likelihood

Note: what I plot are the contents of anomaly_results, and “print likelihood” only ever prints “0.5”.

“But once the data returns to normal, after seeing two steps it settles down and recognizes that it is back.” - I thought the anomaly score was based only on the trained data. Since I disabled learning, I was expecting the anomaly score to be computed from the predictions of the trained model, without taking the previous anomalous record into account.

In my experience, the anomaly likelihood process returns 0.5 until it has seen about 500 records (I think). After that it should start emitting real results. Is the likelihood value always 0.5 throughout the data stream?
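
If I remember correctly, that warm-up length is controlled by the constructor arguments, so for a small dataset you can shorten it; something like the sketch below (argument names and defaults may differ slightly between NuPIC versions, so check your installed code):

from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood

# Shorter warm-up so real likelihoods appear earlier (values are illustrative)
anomalyLikelihood = AnomalyLikelihood(
    learningPeriod=100,       # records ignored while the model stabilizes
    estimationSamples=100,    # records used to estimate the first distribution
)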