Is there any real world public dataset of anomaly detection example?

Hello everyone!
After some serious searching online I could not find any real-world example of using HTM for anomaly detection.
By “real world dataset” I mean a standard, public, well tested, annotated dataset - for example this.
By “example” I mean a .py or notebook that shows how to take the data, preprocess it, configure the SP and TM, train, and get at least decent results.
I am aware that there is the example in the git repo, but it seems that running the same pipeline and configs on real-world datasets just doesn’t work properly.
I would be extremely appreciative if someone could share a link to such an example, or even better a .py/JN of one.
BTW I am using HTM.Core since I want Python 3.


Hey @Yoni_Cohen,

Welcome! Here’s NAB for the win:

Hi @sheiser1 thank you for your answer!

I am aware of NAB, but I’m trying to find an example of using HTM for anomaly detection on datasets outside of NAB.
The reason I’m asking is that I tried to use HTM for anomaly detection on several public datasets (for example Dodgers) and I could not get any reasonable results.
I’m using the HTM.Core implementation and I tried both the raw tm.anomaly and the anomaly score, neither of which worked well (the prediction ability, however, is pretty great). I used the TM and SP parameters from the hotgym example.
This makes me think that I might be missing the basic approach to using HTM as an out-of-the-box model for AD, so I wanted an example of how to take a simple dataset and use HTM for AD.
Thank you again

Hey @Yoni_Cohen ,

I bet the encoding parameters are the problem.

In my experience there are 2 main causes of bad performance:

  1. Encoders not suited to the data

  2. Sampling frequency too high
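
To illustrate the second cause, here’s a minimal sketch (plain NumPy, not htm.core) of reducing the sample rate by block-averaging, so a repeating pattern spans fewer time steps and the TM can learn it from less data. The block size of 100 is just an illustrative choice:

```python
import numpy as np

def downsample(values, factor):
    """Reduce the sample rate by averaging non-overlapping blocks of `factor` samples."""
    n = (len(values) // factor) * factor  # drop the ragged tail
    return values[:n].reshape(-1, factor).mean(axis=1)

# A slow sine pattern sampled far too densely: 10,000 steps per cycle.
raw = np.sin(np.linspace(0, 2 * np.pi, 10_000))

# Averaging blocks of 100 keeps the shape but shortens the sequence 100x,
# so the TM sees the pattern repeat much sooner.
coarse = downsample(raw, 100)
print(len(coarse))  # 100 steps per cycle instead of 10,000
```
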

As for a worked example, a most reasonable request, tho I’m not aware of one for htm.core.

I’d suggest:

  1. Check for bad encoders – Plot histograms of all input feature values to inform the encoding parameters.

  2. Check for bad sample rate – Plot out your data over time, to see how many time steps it takes for the patterns to repeat. The longer, noisier & more complex the sequences are, the more data it will take for the TM to learn them well. If they are long, try reducing the sample rate somehow to make the patterns show themselves faster.

  3. Re-run and check anomaly scores over time – Use a Scalar encoder (rather than the RDSE). The RDSE is more flexible, but to me Scalar is more intuitive to control since you can set the min & max. Try setting the min & max based on the feature distributions from step 1; they should be such that most of the feature values fall between them.
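
Steps 1 and 3 can be sketched like so (plain NumPy, no htm.core calls): histogram the feature, then pick the encoder min & max from low/high percentiles so that outliers don’t stretch the range, and clip inputs into that range before encoding. The 1st/99th percentile cutoffs are an illustrative assumption, not an htm.core default:

```python
import numpy as np

def choose_encoder_range(values, lo_pct=1, hi_pct=99):
    """Pick Scalar-encoder min & max from the feature distribution,
    ignoring the extreme tails so outliers don't stretch the range."""
    lo, hi = np.percentile(values, [lo_pct, hi_pct])
    return float(lo), float(hi)

# Mostly values around 0-100, plus a couple of wild outliers.
rng = np.random.default_rng(0)
feature = np.concatenate([rng.uniform(0, 100, 5000), [10_000.0, -5_000.0]])

# Step 1: histogram the feature to eyeball its distribution.
counts, edges = np.histogram(feature, bins=20)

# Step 3: set min & max so most values fall between them.
minval, maxval = choose_encoder_range(feature)

# Clip inputs into the encoder's range before feeding them to the encoder.
clipped = np.clip(feature, minval, maxval)
print(minval, maxval)
```

The same percentile trick works per-feature if you’re encoding several inputs; whatever encoder you end up with, the point is that its range should come from the data’s actual distribution, not from the hotgym defaults.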

Hope this helps some!