Problem with false positives after anomaly detection

Hi,

I’m doing some experiments with NuPIC to detect anomalies in DNS requests over the timeOfDay.

Basically, I’m using a dataset of 3000 records to train the algorithm, with a timeOfDay DateEncoder and an SDRCategoryEncoder (a sketch of the encoder configuration is shown after the listing below). Each of the 3000 records is a pair (timestamp, “dns1”), with the timestamp advancing by a 30 min delta. The training dataset therefore looks like:

"00:00:00", "dns1"
"00:00:30", "dns1"
"00:01:00", "dns1"
"00:01:30", "dns1"
"00:02:00", "dns1"
"00:02:30", "dns1"
"00:03:00", "dns1"
… (until 3000)
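
For reference, here is a minimal sketch of how the two encoders might be configured in the OPF model params; the parameter values below are illustrative placeholders, not my exact settings:

# Illustrative encoder section of the model params (values are placeholders)
ENCODERS = {
    "timestamp_timeOfDay": {
        "fieldname": "timestamp",
        "name": "timestamp_timeOfDay",
        "type": "DateEncoder",
        "timeOfDay": (21, 1),        # (width in bits, radius in hours)
    },
    "connection": {
        "fieldname": "connection",
        "name": "connection",
        "type": "SDRCategoryEncoder",
        "n": 121,
        "w": 21,
        "categoryList": None,        # categories ("dns1", ...) learned on the fly
    },
}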

Then I disable learning and try to detect anomalies, i.e. a different address appearing at a specific time (a sketch of the detection pass with learning disabled follows the listing). Basically, the anomaly dataset looks like:

"00:00:00", "dns1"
"00:00:30", "dns1"
"00:01:00", "dns1"
"00:01:30", "dns1"
"00:02:00", "dns1"
"00:02:30", "anomaly"
"00:03:00", "dns1"

After computing the anomaly scores and plotting them, it looks like every time an anomaly is encountered, the next “good” record is detected as an anomaly as well.

What could be causing this false positive? Is it a bug, a wrong parameter, or something expected?

Thanks. :slight_smile:


This doesn’t seem like unreasonable behavior. Once the anomaly has been seen, the model cannot predict when the anomaly will end. Such is the nature of an anomaly ;). But once the data returns to normal, after seeing two steps it settles down and recognizes that it is back.

Are you plotting the raw anomaly score or the anomaly likelihood?

@rhyolight Thanks for your quick reply to my question.

I used the raw anomaly score for the plot because the likelihood was always “0.5”. The code is below:

anomalyLikelihood = an.AnomalyLikelihood()  # an is presumably nupic.algorithms.anomaly_likelihood
for i in xrange(data.size()):
    # Build a dict record from the next row of the input stream
    record = dict(zip(data.getFieldNames(), data.next()))
    result = model.run(record)
    # Store the raw anomaly score for plotting
    anomaly_results.add([record["connection"], result.inferences["anomalyScore"]])
    # Convert the raw score into an anomaly likelihood
    likelihood = anomalyLikelihood.anomalyProbability(
        record["connection"], result.inferences["anomalyScore"], record["timestamp"]
    )
    print likelihood

Note: what I plot are the contents of anomaly_results, and “print likelihood” only ever prints “0.5”.

“But once the data returns to normal, after seeing two steps it settles down and recognizes that it is back.” - I thought the anomaly score was based only on the trained data. Since I disabled learning, I was expecting the anomaly score to be computed from the predictions of the trained model, without taking the previous anomalous record into account.

In my experience, the anomaly likelihood process returns 0.5 until it has seen about 500 records (I think). After that it should start emitting real results. Is the likelihood value always 0.5 throughout the data stream?
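
If I remember correctly, that warm-up length is controlled by the constructor arguments, so for a small dataset you can shorten it; something like the sketch below (argument names and defaults may differ slightly between NuPIC versions, so check your installed code):

from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood

# Shorter warm-up so real likelihoods appear earlier (values are illustrative)
anomalyLikelihood = AnomalyLikelihood(
    learningPeriod=100,       # records ignored while the model stabilizes
    estimationSamples=100,    # records used to estimate the first distribution
)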