Anomaly Likelihood is always over 0.5

koichi · April 19, 2018, 11:34am

I have been working on anomaly detection using NuPIC and found that the anomaly likelihood is always over 0.5.

Here is a typical example of an anomaly likelihood result.

This result was calculated with a program based on this example: https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym

The anomaly likelihood algorithm is described in this paper:

Real-Time Anomaly Detection for Streaming Analytics, by Subutai Ahmad and Scott Purdy.
https://arxiv.org/pdf/1607.02480.pdf

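The raw anomaly score is defined as:

$$s_t = 1 - \frac{\pi(x_{t-1}) \cdot a(x_t)}{|a(x_t)|}$$

where $a(x_t)$ is the sparse encoding of the current input and $\pi(x_{t-1})$ is the prediction made at the previous step.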

Please note that $0 < s_t < 1$.

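The mean and variance over a window of the last $W$ raw scores are defined as follows:

$$\mu_t = \frac{\sum_{i=0}^{W-1} s_{t-i}}{W}, \qquad \sigma_t^2 = \frac{\sum_{i=0}^{W-1} (s_{t-i} - \mu_t)^2}{W - 1}$$

The short-term mean over a shorter window $W' \ll W$ is:

$$\tilde{\mu}_t = \frac{\sum_{i=0}^{W'-1} s_{t-i}}{W'}$$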

The anomaly likelihood is then defined as follows:

$$L_t = 1 - Q\left(\frac{\tilde{\mu}_t - \mu_t}{\sigma_t}\right)$$

Here Q is the Q-function, the Gaussian tail probability.
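As a rough illustration of the paper's definition, 1 - Q((short-term mean - long-term mean) / standard deviation), here is a minimal Python sketch. It is not NuPIC's code: the names `q_function` and `paper_likelihood`, the window sizes, and the sample scores are all hypothetical, and the Q-function is taken literally as the one-sided upper tail.

```python
import math
from statistics import mean, stdev


def q_function(z):
    """Gaussian tail probability Q(z) = P(Z > z) for a standard normal Z."""
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def paper_likelihood(scores, long_window, short_window):
    """One-sided form: 1 - Q((short_mean - long_mean) / long_std).

    Assumes the long window holds at least two scores that are not all
    equal, so the sample standard deviation is positive.
    """
    long_scores = scores[-long_window:]
    short_scores = scores[-short_window:]
    mu = mean(long_scores)
    sigma = stdev(long_scores)  # sample standard deviation (divides by W - 1)
    mu_short = mean(short_scores)
    return 1.0 - q_function((mu_short - mu) / sigma)


# Hypothetical raw anomaly scores, for illustration only.
recent_spike = [0.1, 0.3, 0.2, 0.4, 0.2, 0.3, 0.9, 0.9]
recent_quiet = [0.1, 0.3, 0.2, 0.4, 0.2, 0.3, 0.05, 0.05]

likelihood_spike = paper_likelihood(recent_spike, long_window=8, short_window=2)
likelihood_quiet = paper_likelihood(recent_quiet, long_window=8, short_window=2)
```

With these made-up scores, the one-sided form gives a value above 0.5 when the recent scores are higher than the long-term mean and below 0.5 when they are lower, so the sign of the short-term minus long-term difference is what decides which side of 0.5 the result falls on.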

[Figure: Gaussian distribution]

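$\mu_t$ is the mean of $s_t$, and $s_t$ is greater than 0 and less than 1. That means $\mu_t$ is also greater than 0 and less than 1.

$0 < \mu_t < 1$

The short-term mean $\tilde{\mu}_t$ is also greater than 0 and less than 1.

Since anomalous situations rarely happen, $\tilde{\mu}_t > \mu_t$. That is, $\tilde{\mu}_t - \mu_t > 0$, and also $(\tilde{\mu}_t - \mu_t) / \sigma_t > 0$, where $\sigma_t$ is the standard deviation of $s_t$.

As a result, $1 - Q\left((\tilde{\mu}_t - \mu_t) / \sigma_t\right) > 0.5$, because $Q(z) < 0.5$ whenever $z > 0$, and the anomaly likelihood is always greater than 0.5.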

That’s why NuPIC anomaly likelihood results are always over 0.5.

Does anyone have any comments on the above discussion?

rhyolight · April 19, 2018, 1:46pm

Koichi showed me this in January, and I replicated it with hot gym:

I don’t think this is affecting anomaly likelihood functionality, but it would be good to figure out if he’s right about the cause of this.

rhyolight · April 19, 2018, 4:09pm

So I asked for more details about this around the office and came up with a response.

The anomaly likelihood value never falls below 0.5 by design. We use 1.0 – Q(anomaly_scores – mean) as the anomaly likelihood. In our case, values far to the right or far to the left of the mean are both treated as being more anomalous. We handle this explicitly in the code:

https://github.com/numenta/nupic/blob/fcaea0f0cf5fc74b930a45f138279c654f870a80/src/nupic/algorithms/anomaly_likelihood.py#L742-L763


    def tailProbability(x, distributionParams):
      """
      Given the normal distribution specified by the mean and standard deviation
      in distributionParams, return the probability of getting samples further
      from the mean. For values above the mean, this is the probability of getting
      samples > x and for values below the mean, the probability of getting
      samples < x. This is the Q-function: the tail probability of the normal distribution.

      :param distributionParams: dict with 'mean' and 'stdev' of the distribution
      """
      if "mean" not in distributionParams or "stdev" not in distributionParams:
        raise RuntimeError("Insufficient parameters to specify the distribution.")

      if x < distributionParams["mean"]:
        # Gaussian is symmetrical around mean, so flip to get the tail probability
        xp = 2 * distributionParams["mean"] - x
        return tailProbability(xp, distributionParams)

      # Calculate the Q function with the complementary error function, explained
      # here: http://www.gaussianwaves.com/2012/07/q-function-and-error-functions

      # (The excerpt is truncated here in the original post.)

With this understanding, the likelihood should never be smaller than 0.5 and it is enforced in the code.
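To see why the flip in the excerpt keeps the result at or above 0.5, here is a small self-contained sketch of the same idea. It is not the NuPIC source: `tail_probability` and `folded_likelihood` are hypothetical names, and the final step (computing Q with `math.erfc`) is an assumption based on the comment in the excerpt, since the rest of the function is truncated in the post.

```python
import math


def tail_probability(x, mean, stdev):
    """Probability of a sample further from the mean than x, on x's side."""
    if x < mean:
        # The Gaussian is symmetric, so mirror x to the right of the mean.
        x = 2.0 * mean - x
    z = (x - mean) / stdev  # z is never negative after the flip
    # Q(z) via the complementary error function; Q(z) <= 0.5 for z >= 0.
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def folded_likelihood(x, mean, stdev):
    """1 - tail probability; never below 0.5 because of the flip above."""
    return 1.0 - tail_probability(x, mean, stdev)


# Values on either side of the mean give the same result.
below = folded_likelihood(0.1, mean=0.2, stdev=0.1)
above = folded_likelihood(0.3, mean=0.2, stdev=0.1)
```

Because z is non-negative after the flip, the tail probability is at most 0.5, so one minus it is at least 0.5, with equality exactly at the mean.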

koichi · April 20, 2018, 12:52am

Of course, I have checked the code, and it does implement the anomaly likelihood as defined in the paper.

What I am pointing out here is that “likelihood” is a technical term in statistics. For example, please refer to the following page.

In actual deployment, we think the system should output the anomaly likelihood so that users can change the threshold themselves or pass the threshold as a parameter. It’s really confusing for users who are familiar with statistics that the likelihood is never smaller than 0.5. That goes for me too!
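A minimal sketch of the thresholding idea, assuming likelihood values in the range [0.5, 1] as discussed in this thread. The function `flag_anomalies` and the example threshold are hypothetical and not part of NuPIC.

```python
def flag_anomalies(likelihoods, threshold):
    """Return one boolean per likelihood: True where it reaches the threshold.

    Rejects thresholds at or below 0.5, because every value in the
    assumed [0.5, 1] range would then be flagged.
    """
    if not 0.5 < threshold <= 1.0:
        raise ValueError("threshold must be in (0.5, 1.0]")
    return [value >= threshold for value in likelihoods]


# Hypothetical likelihood values and a user-supplied threshold.
flags = flag_anomalies([0.5, 0.7, 0.99999], threshold=0.9999)
```

The range check is there because a threshold that would be meaningful for a quantity spanning [0, 1], such as 0.3, flags every point when the values never drop below 0.5.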

I think NuPIC might be using the Q-function incorrectly.

The central limit theorem should be considered.
