Clarification on anomaly score and anomaly likelihood


Can someone clarify my doubt about the anomaly score and anomaly likelihood?
I got an anomaly score of 0.1 and an anomaly likelihood of 0.999.
Can I consider this an anomaly?

There is a lot of stuff on the forums about anomaly likelihood and anomaly scores. Try out a few searches and see if you find anything useful before starting a new post. See the Before Posting section of the Read this first post.

You should find that the anomaly likelihood is generally more reliable and fluctuates less than the raw anomaly score. You can also find out how it is generated, and what thresholds you might use to indicate anomalies (0.9999 is typical, but you might want more or fewer 9s).

Hi Matt(@rhyolight)
I didn't post this question without going through previous posts and discussions. I got confused by the different responses and clarifications on anomaly likelihood, hence my question.

One piece of documentation says the anomaly likelihood is the probability, or confidence level, in the current anomaly score. For example: the anomaly score is 0.3 and the anomaly likelihood is 0.9999. What I understood from that documentation is that the system is 99.99% confident that the anomaly score is 0.3.

But as per the documentation below, the system is 99.99% confident that the current score represents an anomaly.

anomalyProbability(value, anomalyScore, timestamp=None)
Compute the probability that the current value plus anomaly score represents an anomaly given the historical distribution of anomaly scores. The closer the number is to 1, the higher the chance it is an anomaly.
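To make the docstring above concrete, here is a minimal, self-contained sketch of the idea (this is not the NuPIC implementation; the class and parameter names are made up for illustration): model the historical distribution of anomaly scores as a Gaussian and ask how unlikely the recent average score is under it.

```python
import math
from collections import deque

def q_function(x):
    # Gaussian tail probability: Q(x) = P(X > x) for a standard normal X.
    return 0.5 * math.erfc(x / math.sqrt(2.0))

class SimpleAnomalyLikelihood:
    """Toy sketch: compare the recent mean anomaly score against the
    historical distribution of anomaly scores."""

    def __init__(self, history_size=1000, averaging_window=10):
        self.history = deque(maxlen=history_size)   # long-term distribution
        self.recent = deque(maxlen=averaging_window)  # short-term average

    def anomaly_probability(self, anomaly_score):
        self.history.append(anomaly_score)
        self.recent.append(anomaly_score)
        n = len(self.history)
        mean = sum(self.history) / n
        var = sum((s - mean) ** 2 for s in self.history) / n
        std = max(math.sqrt(var), 1e-6)  # avoid division by zero
        recent_mean = sum(self.recent) / len(self.recent)
        # Likelihood = 1 - Q(z): close to 1 when recent scores are
        # unusually high relative to the historical distribution.
        return 1.0 - q_function((recent_mean - mean) / std)
```

Feeding a steady stream of low scores keeps the likelihood near 0.5; a sudden burst of high scores pushes it toward 1, which is exactly why a threshold like 0.9999 can flag anomalies even when individual raw scores stay moderate.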

Please let me know which one is correct.

Please also confirm my understanding:

We have to use anomaly likelihood when the environment is extremely noisy (a lot of fluctuation in the values).
We need to look into the anomaly likelihood score when the anomaly score is high.

Hi @wip_user, I understand your confusion. The best reference code for anomaly detection is here:

In short, you should ignore the anomaly score completely and use only the anomaly likelihood. I recommend a threshold of >= 0.99999. The above code includes some other best practices and has been proven to work well in a very wide variety of situations.

@rhyolight There are a lot of good questions about anomaly detection on the forum, and it is hard for me to reply to each one. Perhaps at a future Hacker's Hangout we could cover the various questions in more detail and more comprehensively?


Thanks Subutai for the valuable clarification. Apologies for any inconvenience caused.

1 Like

Good idea, I’ll plan it!

1 Like

No inconvenience at all! Just hope it helps!


Hey guys,

Super quick here, I just wanna make sure I'm interpreting the Anomaly Likelihood algorithm right as it's shown here:

As I understand it (in plain words):

M_total(t) = mean of all anomaly scores so far
M_recent(t) = mean of all anomaly scores in recent time window
Sigma(t) = standard deviation of all anomaly scores so far

Likelihood(t) = Z-score of: ( ( M_total(t) - M_recent(t) ) / Sigma(t) )

I'm taking 'k' to mean the number of inputs seen to date, though I'd also think 'W' is that. Do I have this right? I'm trying to implement it myself. Finally, to that end, is the time window for 'M_recent' generally held at 100 across data sets, as per the 'reestimationPeriod' value in the file?


Hey @rhyolight, would you mind affirming or correcting me on this super quick? Sorry to bother you.

This is not really my forte. Maybe @scott or @subutai could respond.

1 Like

Gotcha! Could I impose on either of you, @scott or @subutai, to shore me up on this? My home-brew TM implementation (from the BAMI pseudocode) yields anomaly scores larger than (but seemingly proportional to) NuPIC's, and I'm very curious whether the likelihood values would therefore fall in line with NuPIC's. Thanks!!

1 Like

In short, Anomaly Score is the fraction of active columns that were not predicted correctly. In contrast, Anomaly Likelihood is the likelihood that a given anomaly score represents a true anomaly. In any dataset, there will be a natural level of uncertainty that creates a certain “normal” number of errors in prediction. Anomaly likelihood accounts for this natural level of error.
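The raw anomaly score described above can be sketched in a few lines; `active_columns` and `predicted_columns` are hypothetical sets standing in for the TM's actual output at consecutive timesteps.

```python
def anomaly_score(active_columns, predicted_columns):
    """Fraction of currently active columns that were NOT predicted
    at the previous timestep (0.0 = fully predicted, 1.0 = fully surprising)."""
    if not active_columns:
        return 0.0
    unpredicted = active_columns - predicted_columns
    return len(unpredicted) / len(active_columns)
```

For example, `anomaly_score({1, 2, 3, 4}, {2, 3, 4, 9})` returns 0.25: one of the four active columns (column 1) was not predicted.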

1 Like

k here should just be W, and W is the number of recent anomaly scores to include in the historical distribution. It will be the entire history initially, but once enough records have been seen it will be the most recent W scores. Note that this is NOT the reestimation period, which is simply an optimization so that the historical statistics are not recalculated on every record. See the code here for the difference between historicalWindowSize (W) and reestimationPeriod:

We don't usually change either value, but they certainly can have an impact. Larger values of historicalWindowSize result in slower adaptation to changes in the statistics. The reestimationPeriod shouldn't be increased by much without negative impacts; you could lower it for a minor benefit at the cost of processing time.

I'd recommend this paper for a more up-to-date description of anomaly detection and the formulae:


Hi @sheiser1 - sorry for the delayed reply. I think it is best to look at this more recent paper, where we tried to be a bit more careful with the notation. Here's a screenshot of the relevant section.

To answer your specific questions:

This is the mean of anomaly scores over a large window, the last W samples. In the code this is historicWindowSize, and it defaults to about a month's worth of data at 5-minute intervals.

Yes, but it is very short: usually about 10 samples. It is averagingWindow, which is different from reestimationPeriod (the latter is just an optimization hack, and not that important).

Yes, k=W (fixed in the paper).


Hi all,
I see the anomaly likelihood (L_t) is calculated using the Q-function. Does anyone know what this function is (in the simplest terms)?

Thank you very much
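For reference, the Q-function is just the tail probability of the standard normal distribution: Q(x) = P(X > x) = 1 - Φ(x), where Φ is the standard normal CDF. In Python it can be computed from the standard library's complementary error function:

```python
import math

def q_function(x):
    """Standard normal tail probability: Q(x) = P(X > x) for X ~ N(0, 1)."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))
```

So Q(0) = 0.5 and Q(3) ≈ 0.00135. In the likelihood calculation, the Q-function is applied to the standardized difference between the short-term and long-term mean anomaly scores, so the likelihood approaches 1 as recent scores become improbably high under the historical distribution.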