Evaluating the performance of the real-time anomaly detection included in the Elastic Stack

Dima · February 4, 2020, 1:43pm

I am about to start my MSc thesis, and I plan to do a comparative study of the anomaly detection algorithm included in the Elastic Stack and one or two other frameworks/implementations.

I was wondering whether it is possible to use NAB to evaluate the performance and quality of the Elastic anomaly detection algorithm. I know that Elastic ML is open source; does anybody know whether it has already been evaluated?

Thank you in advance for your help.

rhyolight · February 4, 2020, 6:42pm

Definitely possible, see NAB Entry Points.

It is not on the scoreboard, so go for it!

Dima · March 10, 2020, 3:07pm

Hello again,

I have spent the past two weeks trying to understand how to do it, but I find it very difficult.
That repository is pretty complex, and I cannot tell whether this is feasible. Could you give me any help or point me to more documentation on the entry points?

rhyolight · March 10, 2020, 6:07pm

Sure. NAB runs natively in Python 3. What environment does the Elastic anomaly detection algorithm run in? If there is a way to run it from Python 3, that makes it easier.

Dima · March 11, 2020, 8:29am

The Elastic AD algorithm is written in C++. Setting up a build environment for the ml-cpp native code is complex; you can have a look at this: build-setup.
Normally, anomaly detection jobs are run through the Kibana GUI, but that won't be useful at all for recording the score in the benchmark.

Maybe I am underestimating the complexity of doing such a thing. If you think this cannot be achieved within the scope of an MSc thesis, considering it won't be my only focus, please feel free to tell me.

Any suggestion would be appreciated.

rhyolight · March 11, 2020, 4:09pm

It can be achieved. I suggest you follow the example of the Twitter ADVec detector, which is written in R, executed in its own runtime on the NAB dataset, and integrates with the NAB scorer through a file interface. The primary work is getting the NAB datasets into your environment, running the algorithm, and writing out the intermediate files the NAB scorer requires.
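To make the "file interface" concrete, here is a minimal sketch of writing one results file for an external detector. The column set (`timestamp`, `value`, `anomaly_score`, `label`) and the path convention `<results>/<detector>/<category>/<detector>_<data file>` are assumptions based on the files NAB's bundled detectors produce; check them against the `results/` folder in the repository before relying on them. The function name and the detector name `elastic` are hypothetical.

```python
import csv
import os
from typing import Sequence


def write_nab_results(out_dir: str, detector: str, category: str, data_file: str,
                      timestamps: Sequence[str], values: Sequence[float],
                      scores: Sequence[float], labels: Sequence[int]) -> str:
    """Write one NAB-style results CSV and return its path (hypothetical helper).

    Assumed layout: <out_dir>/<detector>/<category>/<detector>_<data_file>,
    with columns timestamp, value, anomaly_score, label.
    """
    if not (len(timestamps) == len(values) == len(scores) == len(labels)):
        raise ValueError("all columns must have the same length")
    # NAB sweeps a threshold over scores, so they are expected to lie in [0, 1].
    for s in scores:
        if not 0.0 <= s <= 1.0:
            raise ValueError(f"anomaly_score must be in [0, 1], got {s}")
    target_dir = os.path.join(out_dir, detector, category)
    os.makedirs(target_dir, exist_ok=True)
    path = os.path.join(target_dir, f"{detector}_{data_file}")
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["timestamp", "value", "anomaly_score", "label"])
        for row in zip(timestamps, values, scores, labels):
            writer.writerow(row)
    return path
```

For example, `write_nab_results("results", "elastic", "realKnownCause", "nyc_taxi.csv", ...)` would produce `results/elastic/realKnownCause/elastic_nyc_taxi.csv`, one row per row of the input data file.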

Dima · March 12, 2020, 7:56am

Well, I looked at that, but building a detector around the Elasticsearch code seems very different. It is very hard to build, and that repo contains not only the anomaly detection but the entire set of Elasticsearch ML tools: forecasting, regression, etc.

I was wondering whether I could achieve this with a more manual approach, such as running the anomaly detection on the Numenta dataset and then reporting the scores in the right format myself. I am asking because Elasticsearch AD is meant to be run through the GUI.
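The manual approach mostly comes down to converting Elastic's output into one score per NAB row. Elastic reports an `anomaly_score` from 0 to 100 per time bucket, while NAB expects a score in [0, 1] per data row. The sketch below assumes the bucket results have already been exported (for example through the get buckets API) into a dict mapping bucket start time in epoch seconds to that score, that NAB timestamps are UTC, and that buckets are aligned to multiples of the span since the Unix epoch. The function name is hypothetical.

```python
from datetime import datetime, timezone
from typing import Dict, List


def bucket_scores_to_rows(row_timestamps: List[str], bucket_scores: Dict[int, float],
                          bucket_span_s: int) -> List[float]:
    """Give every NAB row the rescaled score of the bucket containing it (sketch).

    row_timestamps: NAB timestamps such as "2014-07-01 00:00:00", assumed UTC.
    bucket_scores: bucket start (epoch seconds) -> Elastic anomaly_score in 0..100.
    Rows whose bucket has no result get 0.0.
    """
    out = []
    for ts in row_timestamps:
        dt = datetime.strptime(ts, "%Y-%m-%d %H:%M:%S").replace(tzinfo=timezone.utc)
        epoch = int(dt.timestamp())
        # Assumes buckets start at multiples of the span since the epoch.
        bucket_start = epoch - epoch % bucket_span_s
        out.append(bucket_scores.get(bucket_start, 0.0) / 100.0)
    return out
```

Spreading one bucket score over every row in that bucket is only one choice. Under NAB scoring it is likely to add false positives when a high-scoring bucket falls outside an anomaly window, and it credits rows with a result that only exists once the bucket closes. Attaching each score to the last row of its bucket would respect the real-time setting more closely.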

The only problem with this strategy is that, when I run an AD job in Elasticsearch, I have to choose a bucket span for the time series, and I think this will affect the detection. Is there any workaround?
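One possible way to keep the bucket span from distorting the comparison (an assumption, not an Elastic recommendation) is to set it to each series' native sampling interval, so that each bucket holds roughly one NAB data point. A sketch of estimating that interval from the timestamps, using the median gap so occasional missing rows do not skew it; the function name is hypothetical:

```python
from datetime import datetime
from statistics import median
from typing import List


def median_sampling_interval_s(timestamps: List[str]) -> int:
    """Median gap in seconds between consecutive NAB timestamps (sketch)."""
    parsed = [datetime.strptime(t, "%Y-%m-%d %H:%M:%S") for t in timestamps]
    gaps = [(b - a).total_seconds() for a, b in zip(parsed, parsed[1:])]
    if not gaps:
        raise ValueError("need at least two timestamps")
    return int(median(gaps))
```

A result of 300 would suggest trying a `bucket_span` of `5m` for that file. Whether Elastic's models behave well with about one point per bucket is something the experiment itself would have to establish.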

Thank you

rhyolight · March 13, 2020, 3:37pm

What do you mean by this?

Dima · March 16, 2020, 8:57am

When analyzing data, Elasticsearch uses the concept of a bucket to divide up a continuous stream of data into batches for processing.

For example, if you were monitoring the average response time of a system, using a bucket span of 1 hour means that at the end of each hour we would calculate the average (mean) value of the last hour’s worth of data and compute the anomalousness of that average value compared to previous hours.
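The aggregation step in this example can be sketched as grouping timestamped points into fixed-width buckets and taking the mean of each. This illustrates only the bucketing described above, not how Elastic then models the bucket values; epoch-aligned buckets and the function name are assumptions.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def bucket_means(points: List[Tuple[int, float]], bucket_span_s: int) -> Dict[int, float]:
    """Average (epoch_seconds, value) points into fixed-width buckets (sketch).

    Keys are bucket start times, epoch - epoch % bucket_span_s, in ascending order.
    """
    sums: Dict[int, float] = defaultdict(float)
    counts: Dict[int, int] = defaultdict(int)
    for epoch, value in points:
        start = epoch - epoch % bucket_span_s
        sums[start] += value
        counts[start] += 1
    return {start: sums[start] / counts[start] for start in sorted(sums)}
```

With a 1-hour span, `bucket_means([(0, 100.0), (1800, 200.0), (3600, 50.0)], 3600)` gives `{0: 150.0, 3600: 50.0}`: the two readings in the first hour collapse into one value, which is why the span choice changes what the detector sees.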

You can read more here.

rhyolight · March 16, 2020, 7:52pm

It sounds like there may be some complications getting the data into a format that works for Elasticsearch? I don't have the time to look into this. I wish you luck.
