Anomaly Likelihood is always over 0.5

I have been working on anomaly detection using NuPIC and found that the anomaly likelihood is always over 0.5.

Here is a typical example of an anomaly likelihood result.

This result was calculated with a program based on this example: https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym

The anomaly likelihood algorithm is described in this paper:

Real-Time Anomaly Detection for Streaming Analytics, by Subutai Ahmad and Scott Purdy.
https://arxiv.org/pdf/1607.02480.pdf

The raw anomaly score is defined as:

[Image: raw anomaly score equation]

Please note that 0 < St < 1.

The mean and variance are defined as follows:

[Image: mean and variance equations]

And then, the anomaly likelihood is defined as follows:

[Image: anomaly likelihood equation]

The Q-function is the Gaussian tail probability.

[Image: Q-function equation]

[Image: Gaussian distribution]
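For readers who cannot see the images, the definitions can be written out roughly as follows. This is a reconstruction from the cited paper, not a copy of the original images, so the exact notation is an assumption to check against the paper: $W$ is the long window, $W'$ the short window, and St, myu, tid and sig in this post correspond to $s_t$, $\mu_t$, $\tilde{\mu}_t$ and $\sigma_t$.

```latex
% Raw anomaly score: share of the active columns a(x_t) that were
% not predicted by \pi(x_{t-1})
s_t = 1 - \frac{\pi(x_{t-1}) \cdot a(x_t)}{|a(x_t)|}

% Long-window mean and variance of the raw scores
\mu_t = \frac{1}{W} \sum_{i=0}^{W-1} s_{t-i}, \qquad
\sigma_t^2 = \frac{1}{W-1} \sum_{i=0}^{W-1} \left( s_{t-i} - \mu_t \right)^2

% Short-window mean
\tilde{\mu}_t = \frac{1}{W'} \sum_{i=0}^{W'-1} s_{t-i}

% Anomaly likelihood
L_t = 1 - Q\!\left( \frac{\tilde{\mu}_t - \mu_t}{\sigma_t} \right)

% Q-function: tail probability of the standard normal distribution
Q(x) = \frac{1}{\sqrt{2\pi}} \int_x^{\infty} e^{-u^2/2} \, du
     = \tfrac{1}{2} \, \mathrm{erfc}\!\left( \frac{x}{\sqrt{2}} \right)
```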

myu is the mean of St, and St is greater than 0 and less than 1. That means myu is also greater than 0 and less than 1.

0 < myu < 1

tid, the short-window mean of St, is also greater than 0 and less than 1.

Since anomalies rarely happen, tid > myu. That is, tid - myu > 0, and therefore (tid - myu) / sig > 0.

As a result, Q < 0.5, so the anomaly likelihood L = 1 - Q is always greater than 0.5.

That’s why NuPIC anomaly likelihood results are always over 0.5.
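The argument above can be checked numerically with a small sketch. This is not NuPIC code: the function names, the window size, and the score lists are all made up for illustration, and it evaluates the plain formula L = 1 - Q((tid - myu) / sig) with no special handling. It suggests the conclusion depends on the assumption tid > myu.

```python
import math
import statistics


def q_function(x):
    """Gaussian tail probability Q(x) = P(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2))


def likelihood(scores, short_window):
    """Plain L = 1 - Q((tid - myu) / sig) over a list of raw anomaly scores."""
    myu = statistics.mean(scores)                   # long-window mean
    sig = statistics.stdev(scores)                  # long-window sample std dev
    tid = statistics.mean(scores[-short_window:])   # short-window mean
    return 1.0 - q_function((tid - myu) / sig)


# Hypothetical raw scores: mostly low, with a burst at the end.
rising = [0.1, 0.2, 0.1, 0.15, 0.1, 0.2, 0.6, 0.8]
# Hypothetical raw scores: high early on, quiet at the end.
falling = [0.8, 0.6, 0.2, 0.1, 0.15, 0.1, 0.2, 0.1]

print(likelihood(rising, 2))   # tid > myu, so the result is above 0.5
print(likelihood(falling, 2))  # tid < myu, so the plain formula drops below 0.5
```

With the plain formula, a window where recent scores are lower than the long-run mean gives a value under 0.5, so the formula alone does not force L > 0.5.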

Does anyone have any comments on the above discussion?

Kiochi showed me this in January, and I replicated it with hot gym:

I don’t think this is affecting anomaly likelihood functionality, but it would be good to figure out if he’s right about the cause of this.

So I asked for more details about this around the office and came up with a response.

The anomaly likelihood value never falls below 0.5 by design. We use 1.0 – Q(anomaly_scores – mean) as the anomaly likelihood. In our case, values far to the right or far to the left of the mean are both treated as being more anomalous. We handle this explicitly in the code:

With this understanding, the likelihood should never be smaller than 0.5 and it is enforced in the code.
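As a rough illustration of what "far to the right or far to the left of the mean are both treated as more anomalous" could look like, here is a minimal sketch. It is not copied from the NuPIC source; the function names and the example mean and standard deviation are hypothetical. Values below the mean are reflected to the other side before the Q-function is applied, so the tail probability never exceeds 0.5.

```python
import math


def tail_probability(x, mean, stdev):
    """Gaussian tail probability, mirrored so it never exceeds 0.5."""
    if x < mean:
        # The Gaussian is symmetric, so reflect x to the right of the mean.
        x = 2 * mean - x
    z = (x - mean) / stdev
    return 0.5 * math.erfc(z / math.sqrt(2))


def mirrored_likelihood(x, mean, stdev):
    """1 - tail probability; at least 0.5 for any x."""
    return 1.0 - tail_probability(x, mean, stdev)


# Hypothetical distribution parameters, for illustration only.
for x in (0.0, 0.3, 0.6):
    print(x, mirrored_likelihood(x, mean=0.3, stdev=0.1))
```

With this mirroring, a value exactly at the mean gives 0.5, and values equally far above or below the mean give the same result.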

Of course, I have checked the code, and it does implement the definition of anomaly likelihood given in the paper.

What I am pointing out here is that “likelihood” is a technical term in statistics. For example, please refer to the following page.

In actual deployment, we think the system should output the anomaly likelihood so that users can change the threshold themselves or pass it in as a parameter. It is really confusing for users who are familiar with statistics that a likelihood is never smaller than 0.5. That includes me!!

I think NuPIC might be using the Q-function incorrectly.

The central limit theorem should be considered.
