Suggested enhancement for NAB: raw->likelihood functionality


#1

I’ve been testing some new algorithms on NAB. Like NuPIC, these algorithms produce “raw” anomaly scores, but deriving a likelihood score from the characteristics of each dataset (as the NuPIC detector does) improves the NAB score significantly.

My suggestion for new algorithms (not integrated into the NAB detector namespace) is a new workflow:

  1. Generate a CSV file with the columns timestamp, value, raw_score and store it in results/yourdetector
  2. Run with a new option, e.g. python run.py -d <detector> --likelihood --score --normalize, which fills in the anomaly_score column using NuPIC’s AnomalyLikelihood class
  3. Scoring proceeds as before

This would separate the contribution of the algorithm generating the raw scores from the method used to threshold them. If it changes the relative performance of competing algorithms, it might also draw appropriate attention to the idea of using the AnomalyLikelihood process to augment other anomaly-detection systems.
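
To make step 2 concrete, here is a rough sketch of what a raw-to-likelihood pass over a results CSV could look like. It is not NuPIC’s AnomalyLikelihood class. It is a simplified stand-in that fits a normal distribution to a rolling window of raw scores and reports one minus the Gaussian tail probability of the recent short-term average. The function names, window sizes, and the 0.5 output during the warm-up period are all assumptions chosen for illustration; only the column names come from the workflow above.

```python
import csv
import math
from collections import deque


def gaussian_tail(x, mean, std):
    """P(X > x) for X ~ Normal(mean, std)."""
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2)))


def raw_to_likelihood(raw_scores, window=100, short_window=10,
                      probation=20, min_std=1e-3):
    """Convert raw anomaly scores into likelihood-style scores in [0, 1].

    Simplified stand-in (assumption), not NuPIC's implementation:
    the mean of the last `short_window` raw scores is compared against a
    normal distribution fitted to the last `window` raw scores.
    """
    history = deque(maxlen=window)       # long-term raw scores
    recent = deque(maxlen=short_window)  # short-term raw scores
    out = []
    for raw in raw_scores:
        history.append(raw)
        recent.append(raw)
        if len(history) < probation:
            out.append(0.5)  # too little data to fit a distribution
            continue
        mean = sum(history) / len(history)
        var = sum((s - mean) ** 2 for s in history) / len(history)
        std = max(math.sqrt(var), min_std)  # avoid division by zero
        short_mean = sum(recent) / len(recent)
        # High output means the recent average is unusually large.
        out.append(1.0 - gaussian_tail(short_mean, mean, std))
    return out


def fill_anomaly_scores(in_path, out_path):
    """Read timestamp,value,raw_score rows and write them back with anomaly_score."""
    with open(in_path, newline="") as f:
        rows = list(csv.DictReader(f))
    likelihoods = raw_to_likelihood([float(r["raw_score"]) for r in rows])
    for row, score in zip(rows, likelihoods):
        row["anomaly_score"] = score
    fields = ["timestamp", "value", "raw_score", "anomaly_score"]
    with open(out_path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=fields, extrasaction="ignore")
        writer.writeheader()
        writer.writerows(rows)
```

With something like this in place, any detector only has to emit raw scores; the filled-in anomaly_score column is what the existing scoring step would consume.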


#2

I think a tool like that would be a nice idea to try. The likelihood calculation is not specific to HTMs - it could be applied to a large set of anomaly detection algorithms.

We updated NAB last week to show the impact of likelihood over raw anomaly scores in the case of HTMs. The NAB score for HTM with likelihood is 65.3, and the score without likelihood (just raw anomaly scores) is 52.5 (see the Additional Scores section of the NAB README). That is a gain of 12.8 points, which shows that the likelihood calculation has a significant positive effect. It’s nice that even without likelihood the HTM does very well.