# Anomaly Likelihood is always over 0.5

I have been working on anomaly detection using nupic and found that the anomaly likelihood is always over 0.5.

Here is a typical example of an anomaly likelihood result.

This result was calculated using a program based on this example: https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym

The anomaly likelihood algorithm is described in this paper:

"Real-Time Anomaly Detection for Streaming Analytics" by Subutai Ahmad and Scott Purdy.
https://arxiv.org/pdf/1607.02480.pdf

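The raw anomaly score is defined as:

$$s_t = 1 - \frac{\pi(x_{t-1}) \cdot a(x_t)}{|a(x_t)|}$$

where $a(x_t)$ is the current sparse activation and $\pi(x_{t-1})$ is the prediction made at the previous step.

Please note that $0 < s_t < 1$.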

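The mean and variance over a window of the last $W$ raw scores are defined as follows:

$$\mu_t = \frac{\sum_{i=0}^{W-1} s_{t-i}}{W}$$

$$\sigma_t^2 = \frac{\sum_{i=0}^{W-1} (s_{t-i} - \mu_t)^2}{W - 1}$$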

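And then the anomaly likelihood is defined as follows:

$$\tilde{\mu}_t = \frac{\sum_{i=0}^{W'-1} s_{t-i}}{W'}$$

$$L_t = 1 - Q\left(\frac{\tilde{\mu}_t - \mu_t}{\sigma_t}\right)$$

Here $\tilde{\mu}_t$ is a short-term mean over the last $W'$ raw scores, and $Q$ is the Q-function, i.e. the tail probability of the Gaussian distribution.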
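To make the definitions concrete, here is a minimal Python sketch of the calculation. It assumes the formulas from the cited paper (windowed mean, sample variance with a $W - 1$ denominator, a short-term mean over the last $W'$ scores, and $L_t = 1 - Q(\cdot)$). The function names and the sample scores are made up for illustration; this is not the NuPIC implementation.

```python
import math
from statistics import mean, variance


def q_function(z):
    """Tail probability of the standard Gaussian, P(Z > z)."""
    return 0.5 * math.erfc(z / math.sqrt(2))


def anomaly_likelihood(scores, short_window):
    """Likelihood from a window of raw anomaly scores.

    `scores` holds the last W raw scores; `short_window` is W'.
    Both names are hypothetical, chosen for this sketch only.
    """
    mu = mean(scores)                         # long-term mean over W scores
    sigma = math.sqrt(variance(scores))       # sample std dev (divides by W - 1)
    mu_short = mean(scores[-short_window:])   # short-term mean over last W' scores
    return 1.0 - q_function((mu_short - mu) / sigma)


# Illustrative raw scores only (not NuPIC output): mostly low, a burst at the end.
scores = [0.1, 0.0, 0.2, 0.1, 0.0, 0.1, 0.2, 0.0, 0.9, 1.0]
print(anomaly_likelihood(scores, short_window=2))
```

With this sample window the short-term mean is above the long-term mean, so the printed value lies above 0.5.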

$\mu_t$ is the mean of $s_t$, and $s_t$ is greater than 0 and less than 1. That means $\mu_t$ is also greater than 0 and less than 1.

$0 < \mu_t < 1$

The short-term mean $\tilde{\mu}_t$ is also greater than 0 and less than 1.

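Since anomalous situations rarely happen, $\tilde{\mu}_t > \mu_t$. That is, $\tilde{\mu}_t - \mu_t > 0$, and also $(\tilde{\mu}_t - \mu_t) / \sigma_t > 0$.

As a result, $1 - Q\left(\frac{\tilde{\mu}_t - \mu_t}{\sigma_t}\right) > 0.5$, and the anomaly likelihood is always greater than 0.5.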

That's why nupic anomaly likelihood results are always over 0.5.

Does anyone have any comments on the above discussion?

Kiochi showed me this in January, and I replicated it with hot gym:

I don't think this is affecting anomaly likelihood functionality, but it would be good to figure out if he's right about the cause of this.

The anomaly likelihood value never falls below 0.5 by design. We use 1.0 - Q(anomaly_scores - mean) as the anomaly likelihood. In our case, values far to the right or far to the left of the mean are both treated as being more anomalous. We handle this explicitly in the code:

With this understanding, the likelihood should never be smaller than 0.5, and this is enforced in the code.
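The behaviour described above can be illustrated with a short Python sketch. It assumes the tail probability is mirrored around the mean, so a value below the mean is scored like a value the same distance above it. The function names and the numbers are hypothetical and this is not copied from the NuPIC source.

```python
import math


def tail_probability(x, mean, stdev):
    """Gaussian tail probability, mirrored around the mean.

    A value below the mean is reflected to the other side, so both
    directions count as equally far from typical (assumed behaviour).
    """
    if x < mean:
        x = 2 * mean - x  # reflect around the mean
    z = (x - mean) / stdev
    return 0.5 * math.erfc(z / math.sqrt(2))


def likelihood(x, mean, stdev):
    """1 - tail probability; with the mirroring this cannot go below 0.5."""
    return 1.0 - tail_probability(x, mean, stdev)


# Illustrative values only: a distribution with mean 0.3 and stdev 0.1.
for x in (0.0, 0.2, 0.3, 0.4, 0.9):
    print(x, likelihood(x, mean=0.3, stdev=0.1))
```

Because the reflected value is never below the mean, `z` is never negative, the tail probability is at most 0.5, and the likelihood is at least 0.5.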

Of course, I have checked the code, and it does implement the definition of anomaly likelihood given in the paper.

What I'm pointing out here is that "likelihood" is a technical term in statistics. For example, please refer to the following page.

In actual deployment, we think the system should output the anomaly likelihood so that users can change the threshold themselves or pass the threshold as a parameter. It's really confusing for users who are familiar with statistics that the likelihood is never smaller than 0.5. That includes me!

I think nupic might be using the Q-function incorrectly.

The central limit theorem should be considered.
