Noobie Question: How to use NuPIC for a NAB dataset?


#1

Hello everyone,

I took some time off from working on HTM due to some other work but have come back to take a look at it again for streaming anomaly detection tasks. I’ve been reading some HTM papers and I have also come across the NAB corpus. I’m interested in trying to test HTM on some of the datasets in NAB. Right now I’m going to try testing specifically on the datasets: Twitter_volume_GOOG.csv and ambient_temperature_system_failure.csv.

However, I’m not entirely sure how to get started. At the moment I’m using NuPIC to build the spatial pooler (SP) and temporal memory (TM).

Here is my rudimentary start: https://github.com/flarelink/HTM_Nupic_Streaming_Anomaly

However, I’m not sure how to go from there. I’ve been trying to look around for tutorials using the NAB dataset and haven’t come across any. Is there any code available to show a quick tutorial on using HTM for NAB? I may not be looking in the right places if there is.

I want to make an implementation to mimic the Numenta HTM result for NAB:
Detector: Numenta HTM*; Standard Profile: 70.5-69.7; Reward Low FP: 62.6-61.7; Reward Low FN: 75.2-74.2

Thank you for your time and have a nice day! :slight_smile:


#2

Personally, I found this whitepaper (particularly Appendix D) and this diagram super helpful when trying to run NAB on an arbitrary detector.

Are you able to run the default Numenta detector with NAB?

From my own experience, I was able to use a custom anomaly detector by generating results in the format specified by Appendix D (linked above).

  1. First I ran the default numenta detector and the null detector to see how the results files should be formatted.
  2. Then I made a little script that used my custom detector to output files in the same format.
  3. Once you’ve got that, you can use python run.py -d [detector_name] --optimize --score --normalize to run the process of optimizing anomaly thresholds, scoring each file, and then normalizing the results.

The end result is data in the format you are looking for. You’ll find that data in a file called final_results.json, if I recall.
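For step 2, a minimal sketch of what such a script might look like is below. It assumes the results file is a CSV with the columns timestamp, value, anomaly_score, and label, which is what I remember from the default detector's output; please check that against Appendix D and against the files from step 1 before relying on it. The names write_results and toy_scores are hypothetical, and toy_scores is only a stand-in for a real detector such as an HTM model. The sketch uses only the standard library and targets Python 3.

```python
import csv
import io


def toy_scores(values):
    """Hypothetical stand-in detector (NOT HTM): distance from the running
    mean, scaled by the largest distance seen so far, so scores are in [0, 1]."""
    total, count, max_dev = 0.0, 0, 0.0
    scores = []
    for v in values:
        mean = total / count if count else v
        dev = abs(v - mean)
        max_dev = max(max_dev, dev)
        scores.append(dev / max_dev if max_dev > 0 else 0.0)
        total += v
        count += 1
    return scores


def write_results(rows, out):
    """Write (timestamp, value, score, label) rows as CSV to a file object.

    The column names are an assumption; compare with a results file
    produced by the default detector."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["timestamp", "value", "anomaly_score", "label"])
    for timestamp, value, score, label in rows:
        # Clamp the score to [0, 1] before writing it.
        score = min(1.0, max(0.0, float(score)))
        writer.writerow([timestamp, value, "%.6f" % score, int(label)])


# Example with made-up data; a real run would read one NAB data file instead.
stamps = ["2014-04-01 00:00:00", "2014-04-01 00:05:00",
          "2014-04-01 00:10:00", "2014-04-01 00:15:00"]
values = [10.0, 10.0, 10.0, 50.0]
labels = [0, 0, 0, 1]  # 1 where the row falls inside a labeled anomaly window

buf = io.StringIO()
write_results(zip(stamps, values, toy_scores(values), labels), buf)
lines = buf.getvalue().splitlines()
```

To write to disk instead, pass a file opened with open(path, "w", newline="") in place of the StringIO buffer. The directory layout and file naming that NAB expects are not shown here; copy them from the default detector's results folder.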

Hopefully that’s enough to help you make some forward progress!


#3

@Balladeer , @marty1885
Do you have any idea how I can use the NAB metrics (anomaly windows, scoring function, and application profiles) with my own detector and my own dataset?


#4

If you want to add a new dataset for NAB scoring, you’ll need to make sure it meets the NAB anomaly labeling guidelines.
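As a rough illustration of the anomaly windows part, here is a minimal sketch of how windows could be derived from hand-labeled anomaly timestamps. It assumes the rule described in the NAB paper as I understand it: the total window length per file is 10% of the file's rows, split evenly across the labeled anomalies, with each window centered on its anomaly. The function name anomaly_windows is hypothetical, this is not NAB's own labeling code, and it does not merge overlapping windows, so treat the labeling guidelines as the authority.

```python
from datetime import datetime, timedelta


def anomaly_windows(timestamps, anomalies, window_fraction=0.10):
    """Return a [start, end] timestamp pair centered on each labeled anomaly.

    timestamps: ordered list of timestamp strings for one data file.
    anomalies:  timestamps (taken from that list) labeled as anomalies.
    window_fraction: share of the file covered by all windows together
                     (0.10 is an assumption based on the NAB paper).
    """
    if not anomalies:
        return []
    # Rows available to each window, then half of that on either side.
    rows_per_window = int(window_fraction * len(timestamps) / len(anomalies))
    half = rows_per_window // 2
    last = len(timestamps) - 1
    windows = []
    for anomaly in anomalies:
        i = timestamps.index(anomaly)
        # Clip the window at the start and end of the file.
        windows.append([timestamps[max(0, i - half)],
                        timestamps[min(last, i + half)]])
    return windows


# Example with made-up data: 100 rows at 5-minute intervals, one anomaly.
base = datetime(2014, 4, 1)
stamps = [(base + timedelta(minutes=5 * i)).strftime("%Y-%m-%d %H:%M:%S")
          for i in range(100)]
one_window = anomaly_windows(stamps, [stamps[50]])
two_windows = anomaly_windows(stamps, [stamps[20], stamps[80]])
```

With one anomaly in 100 rows the window spans 5 rows on each side of the label; with two anomalies each window shrinks to 2 rows on each side.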

To use a custom detector, you can either register it directly in NAB (if your detector is written in Python 2.7) or you can generate results files with your detector for use in NAB scoring (as detailed in Appendix D of the whitepaper I linked above).

The NAB wiki contains a lot of good info (including links to the whitepaper, labeling guidelines, etc).


#5

So you mean the only way to evaluate my detector with the Standard, Reward Low FP, and Reward Low FN profiles on my own dataset is to add a new dataset to NAB? That seems very complicated.
BTW, I’m using HTM to detect anomalies in an ECG dataset, and I want to evaluate my HTM detector. I read a thesis that used the NAB score with its own dataset.


#6

You can use the numenta detector code on its own if you adjust it a bit.


#7

Thanks @rhyolight for your reply.
Yes, this is what I do, but I want to evaluate my results with the Standard, Reward Low FP, and Reward Low FN profiles on my ECG dataset, like Sutubai does in his paper.
Or do you mean I should just add my dataset to NAB?


#8

Yes, in order to evaluate it, you’ll have to add it to NAB. Remember you don’t have to run all the NAB detectors (that will take a LONG time). There is an option to only run one.


#9

OK, thanks a lot @rhyolight, I will work on it.