Why is the anomaly likelihood so high for a repeated data pattern? HELP

white · March 30, 2018, 3:45am

Hi, everyone

I am using the NuPIC anomaly model to detect anomalous points in sine data, which is generated by repeating the following many times:

import numpy as np
np.sin(np.linspace(0, 2 * np.pi, 100))

The picture below shows a very high anomaly_score, where the score is computed as follows:

from nupic.algorithms import anomaly_likelihood
anomaly_score = anomaly_likelihood.AnomalyLikelihood.anomalyProbability(...)

I really don’t understand why, for such an extremely simple data pattern, NuPIC still produces a very high anomaly likelihood after seeing the pattern many times. Why?

Here are the model params used for anomaly detection, in JSON format.

{
    "aggregationInfo": {
        "hours": 0,
        "microseconds": 0,
        "seconds": 0,
        "fields": [],
        "weeks": 0,
        "months": 0,
        "minutes": 0,
        "days": 0,
        "milliseconds": 0,
        "years": 0
    },
    "model": "HTMPrediction",
    "version": 1,
    "predictAheadTime": null,
    "modelParams": {
        "sensorParams": {
            "verbosity": 0,
            "sensorAutoReset": null,
            "encoders": {
                "value": {
                    "name": "value",
                    "resolution": 0.001,
                    "n": 400,
                    "seed": 50,
                    "fieldname": "value",
                    "w": 21,
                    "type": "RandomDistributedScalarEncoder"
                }
            }
        },
        "anomalyParams": {
            "anomalyCacheRecords": null,
            "autoDetectThreshold": null,
            "autoDetectWaitRecords": 5030
        },
        "clEnable": true,
        "spParams": {
            "columnCount": 2048,
            "synPermInactiveDec": 0.0005,
            "spatialImp": "cpp",
            "synPermConnected": 0.2,
            "seed": 1956,
            "numActiveColumnsPerInhArea": 40,
            "globalInhibition": 1,
            "inputWidth": 0,
            "spVerbosity": 0,
            "synPermActiveInc": 0.003,
            "potentialPct": 0.8,
            "boostStrength": 1
        },
        "trainSPNetOnlyIfRequested": false,
        "clParams": {
            "alpha": 0.035828933612158,
            "verbosity": 0,
            "steps": "1",
            "regionName": "SDRClassifierRegion"
        },
        "inferenceType": "TemporalAnomaly",
        "spEnable": true,
        "tmParams": {
            "columnCount": 2048,
            "activationThreshold": 13,
            "pamLength": 3,
            "cellsPerColumn": 32,
            "permanenceDec": 0.1,
            "minThreshold": 10,
            "inputWidth": 2048,
            "maxSynapsesPerSegment": 32,
            "outputType": "normal",
            "globalDecay": 0.0,
            "initialPerm": 0.21,
            "newSynapseCount": 20,
            "maxAge": 0,
            "maxSegmentsPerCell": 128,
            "permanenceInc": 0.1,
            "temporalImp": "cpp",
            "seed": 1960,
            "verbosity": 0
        },
        "tmEnable": true
    }
}

Looking forward to your reply!

Thanks

rhyolight · March 30, 2018, 3:17pm

Let me suggest something… While you are generating your sine curve, after the model has learned it for like 100 cycles, start adding a random perturbation to the signal and see how it changes the anomaly score.

I don’t know what it will look like, I’m honestly curious. I hope it at least changes, if not increases. The anomaly score is a funny thing. It looks like your example is set up to make a change like this and plot it pretty easily. What does it look like?
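For anyone who wants to try this, here is a minimal sketch of the suggested experiment using only the standard library. The function name `sine_stream` and its parameters are made up for illustration; the 100 points per cycle and the 100 clean cycles come from this thread, and the noise amplitude is an assumption.

```python
import math
import random


def sine_stream(cycles, points_per_cycle=100, clean_cycles=100,
                noise_amplitude=0.1, seed=0):
    """Yield sine samples; add uniform noise once clean_cycles have passed."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    for cycle in range(cycles):
        for k in range(points_per_cycle):
            value = math.sin(2.0 * math.pi * k / points_per_cycle)
            if cycle >= clean_cycles:
                # Hypothetical perturbation: uniform noise in [-amp, +amp].
                value += rng.uniform(-noise_amplitude, noise_amplitude)
            yield value


# 100 clean cycles followed by 2 noisy ones.
samples = list(sine_stream(cycles=102, clean_cycles=100))
```

Each sample would then be fed to the model one at a time. Note that this version does not repeat the endpoint of each cycle, so consecutive cycles join without a duplicated zero.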

white · March 31, 2018, 5:46am

I finished training on 100 cycles and then added noise as follows for the remaining cycles:

noise = random.uniform(-0.1, 0.1)

The anomalous points show as red points at the top of the picture, with the corresponding marker lines at the bottom, starting where the noise is added. However, this lasts only for a while.

During the experiment, there was no sign of the anomaly score (or likelihood) decreasing or ever dropping below 0.5.

rhyolight · March 31, 2018, 2:58pm

That’s good! I expected that type of behavior to happen. Remember that a sine wave is not a good pattern for an HTM to learn. You need more random noise. It will probably give you better anomaly scores if you added a little random jitter to the entire sine wave.

The anomaly score is pretty erratic. It always is. We always use an anomaly likelihood instead. There are instructions for using it in the API docs I linked above.

This is the 2nd time I’ve seen this. I’m going to investigate this.

white · March 31, 2018, 11:58pm

Actually, the anomaly score in the picture is anomaly likelihood in my case, which is calculated as below

from nupic.algorithms import anomaly_likelihood
anomaly_score = anomaly_likelihood.AnomalyLikelihood.anomalyProbability(…)

I use the concept of anomaly_score from https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py in my case

subutai · April 1, 2018, 11:08pm

Hi @white - are you plotting the log likelihood? We always use a 0.4 or 0.5 threshold on the log score. (See line 102 of the numenta_detector file.)

white · April 2, 2018, 2:03am

@subutai Let me guess. The real difference between using log_score and using the anomaly score is that log_score makes the trend easier to see, right? A log_score of 0.4 to 0.6 is equivalent to a likelihood of 0.9999 to 0.999999.

Here is the picture. Noise is added from training step 10000 onward; the log_score bursts to a very high value and decreases afterwards.

I would like to confirm that the anomaly score behavior in my experiment is correct or normal, and that my model_param is reasonable.

Could you give me some advice on configuring model_param? Thanks a lot

subutai · April 5, 2018, 12:14am

Yes, that’s exactly right. It’s very hard to interpret plots and notice the difference between 0.999 and 0.9999.
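To make the 0.4 to 0.6 correspondence concrete, here is a small sketch of the conversion. It assumes the log score is log(1.0000000001 - likelihood) divided by log(1e-10); that formula is an assumption about what the detector does and is not quoted in this thread, and `log_score` is just an illustrative name.

```python
import math


def log_score(likelihood):
    """Rescale a likelihood in [0, 1] so values near 1 spread out.

    Assumed formula: log(1.0000000001 - likelihood) / log(1e-10).
    The small offset keeps the logarithm finite when likelihood == 1.
    """
    return math.log(1.0000000001 - likelihood) / math.log(1e-10)


for likelihood in (0.9, 0.999, 0.9999, 0.99999, 0.999999):
    print(likelihood, round(log_score(likelihood), 3))
```

Under this assumption, each extra "9" in the likelihood adds about 0.1 to the log score, which is why a 0.4 or 0.5 threshold is easier to read off a plot than 0.9999 or 0.99999.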

It certainly looks a lot better! Once it sees the noise for a while, it will adapt and the anomaly likelihood will go back down. Your SP and TM params look the same as what we normally use. I don’t know whether the encoder resolution is ok or not, particularly for sine waves. Usually for real datasets we set resolution as follows:

resolution = max(minResolution,
                 (maxVal - minVal) / numBuckets
                )

where numBuckets=130 and minResolution=0.001
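Written as a function, the rule above might look like the sketch below. The function and argument names are illustrative only; the defaults of 130 buckets and a 0.001 floor are the values given in this post.

```python
def encoder_resolution(min_val, max_val, num_buckets=130, min_resolution=0.001):
    """Bucket width for a scalar encoder, never smaller than min_resolution."""
    return max(min_resolution, (max_val - min_val) / float(num_buckets))


# A sine wave spans [-1, 1], so the resolution comes out at 2 / 130.
sine_resolution = encoder_resolution(-1.0, 1.0)
print(sine_resolution)
```

For the sine data in this thread this gives roughly 0.0154, about fifteen times coarser than the 0.001 in the posted model params.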

white · April 8, 2018, 1:10am

Here is my code for encoders

            padding = abs(max_input - min_input) * 0.2

            resolution = max(0.001, (max_input - min_input + 2*padding) / 130)
            encoders[f] = {
                'name': f,
                'fieldname': f,
                'type': 'RandomDistributedScalarEncoder',
                'seed': 42,
                'resolution': resolution,
                'w': 21,
                'n': 400,
            }

Regarding the anomaly likelihood, I read a little of the source code in anomaly_likelihood.py. I am confused about how the value and timestamp are involved in computing the anomaly likelihood.

    anomalyProbability = anomalyLikelihood.anomalyProbability(
        value, anomalyScore, timestamp)

Could you please give me some clues about that? Thanks

rhyolight · April 9, 2018, 2:38pm

Have you read these docs? Or watched this?

white · April 10, 2018, 1:28am

I have read the online documentation many times, including the links you mention. I don’t think it discusses the algorithm behind anomalyProbability in detail.

The source code is the best documentation anyway.

rhyolight · April 10, 2018, 3:26pm

Ok, here’s the source code and api docs.

Shahar · August 13, 2019, 9:33pm

Hi,
I am trying to generate this example to have a better understanding of anomaly detection. A few questions:

The configurable parameters are:

learningPeriod
estimationSamples
historicWindowSize - in the paper denoted by W
reestimationPeriod

What would be equivalent to the sliding window size (W’)? How can we change this parameter?

Why is a sine wave not a good pattern to learn?
When training with a sine wave without noise, and resetting the TM every period, I get the following results:

[Image: Sin0]

When training with a sine wave without noise, without resetting the TM every period, I get the following results:

[Image: SinWoRST]

I expected the anomaly score to stay at zero after one period, but we still see the score repeatedly going to 1. Why does this occur?

My helper command is

anomaly_history = AnomalyLikelihood( historicWindowSize = 250, reestimationPeriod = 40 )

anomalyScores.append( tm.anomaly )

Thank you!

rhyolight · August 13, 2019, 9:54pm

I think there are ambiguous representations within the TM for different cycles. Because the sine wave is not stochastic, every cycle looks exactly the same. The only ambiguity is which cycle within the longer pattern it is. If you’re resetting the TM after every cycle, it might help. I would also add some random jitter to the wave and see what happens.

Shahar · August 13, 2019, 10:06pm

Thank you!

When adding 10 spatial anomalies to the input (after training on noiseless samples and resetting the TM) I get the following peaks in the loglikelihood:

When changing the frequency of the wave after N/2 samples (length of sine wave is N), training the SP only with the first type of wave and resetting the TM after complete periods, I get the following results:

if i % T == 0 and i < N / 2:  # reset when a period starts
    tm.reset()
elif i % (4 * T) == 0 and i >= N / 2:
    tm.reset()

rhyolight · August 13, 2019, 10:08pm

Interesting, but add the noise constantly. Don’t train on a smooth signal. In fact, don’t train at all. Keep learning on all the time.

Shahar · August 13, 2019, 10:35pm

If I thought I had a better understanding of how the anomaly scoring works… now I am really confused

When learning all the time on noisy data I got a pretty peaky likelihood. So I increased the resolution of the encoder to get a flatter likelihood curve :

But the anomaly score is pretty wild. How do you suggest I go forward and gain an intuitive understanding of the anomaly score? Which I think is equivalent to getting an intuitive understanding of the TM, right?
Would you completely disregard the anomaly score in anomaly detection?

rhyolight · August 13, 2019, 10:38pm

This looks better! Now introduce an anomaly into the signal and see if the log likelihood responds in an obvious way. An anomaly would be doubling the amplitude or changing the wavelength. Or even suddenly jumping out of cycle or repeating a value a bunch of times.

sheiser1 · August 13, 2019, 11:00pm

The anomaly likelihood works by comparing a small distribution of recent anomaly scores (going back ‘reestimationPeriod’ steps) to a big distribution of anomaly scores (going back ‘historicWindowSize’ steps). So the sliding window size for the big distribution is ‘historicWindowSize’ and the window size for the small one is ‘reestimationPeriod’.
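The idea of comparing a short recent window against a long history can be sketched as follows. This is a toy under stated assumptions, not NuPIC's implementation: the class name, `history_window` and `short_window` are hypothetical, the long window is modeled as a Gaussian, and how these windows map onto the constructor parameters is not confirmed here.

```python
import math
from collections import deque


class RollingLikelihood(object):
    """Toy likelihood: Gaussian tail probability of a short-term mean."""

    def __init__(self, history_window=250, short_window=10):
        self.history = deque(maxlen=history_window)  # long distribution
        self.recent = deque(maxlen=short_window)     # short distribution

    def update(self, anomaly_score):
        self.history.append(anomaly_score)
        self.recent.append(anomaly_score)
        n = len(self.history)
        mean = sum(self.history) / n
        variance = sum((s - mean) ** 2 for s in self.history) / n
        std = max(math.sqrt(variance), 1e-3)  # floor avoids division by zero
        recent_mean = sum(self.recent) / len(self.recent)
        z = (recent_mean - mean) / std
        tail = 0.5 * math.erfc(z / math.sqrt(2.0))  # P(X >= recent_mean)
        return 1.0 - tail


model = RollingLikelihood()
for i in range(200):                  # quiet period: scores of 0.0 and 0.1
    steady = model.update(0.0 if i % 2 == 0 else 0.1)
for _ in range(10):                   # burst of fully unpredicted inputs
    burst = model.update(1.0)
print(steady, burst)
```

In this toy, the value sits near 0.5 while recent scores look like the history and moves very close to 1 during the burst. It also drifts back down if the burst continues, because the burst itself becomes part of the history.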

When you reset the TM it’s a bit like wiping its short term memory, in that it no longer knows where it is in the sequence and has to rediscover it. If you keep resetting after every sequence it has to keep regaining its bearings, it’ll keep being surprised by certain transitions and the anomaly scores won’t settle down.

Since you’re not resetting there’s no limit to the length of sequence it can learn, so the anomaly score spikes get further and further between. For example let’s take the sequence:

‘A,B,C,D,E,F,A,B,C,D,E,F,…’

If you reset the TM after each iteration of the base sequence (in this case ‘A,B,C,D,E,F’), then there will always be an anomaly spike when ‘A’ arrives. This is because the TM won’t learn the transition from ‘F’ to ‘A’ – since you reset after ‘F’. However with no reset it will learn that transition and the anomaly score spike will die down after several repetitions.
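The effect of the reset can be illustrated with a deliberately simplified stand-in for the TM. The sketch below is a first-order transition memory that learns each transition after seeing it once; the real TM is high-order and learns more gradually, so treat the function `count_surprises` as a hypothetical illustration only.

```python
def count_surprises(sequence, period, reset_each_period):
    """Return 1 for each unpredicted symbol, 0 for each predicted one."""
    learned = set()   # (previous, current) transitions seen so far
    previous = None
    surprises = []
    for i, symbol in enumerate(sequence):
        if reset_each_period and i % period == 0:
            previous = None  # reset: forget where we are in the sequence
        predicted = previous is not None and (previous, symbol) in learned
        surprises.append(0 if predicted else 1)
        if previous is not None:
            learned.add((previous, symbol))
        previous = symbol
    return surprises


sequence = list("ABCDEF") * 5
with_reset = count_surprises(sequence, 6, True)
without_reset = count_surprises(sequence, 6, False)
print(with_reset[-6:], without_reset[-6:])
```

With the reset, ‘A’ stays surprising in every period because the ‘F’ to ‘A’ transition is never stored. Without the reset, that transition is learned on its second appearance and the last period contains no surprises.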

Now if the sequence were instead say:

‘A,B,C,D,E,F,A,B,C,D,E,F,1,2,3,A,B,C,D,E,F,A,B,C,D,E,F,1,2,3’

It would take longer to learn transition from ‘F’ to ‘1’ since it emerges over a longer time. This transition would produce an anomaly spike for a longer time than the transition ‘A’ to ‘B’, which is learned faster since it recurs more.

First, make sure you know exactly what the anomaly score means (the proportion of columns which activated at time t that contained no predictive cells at time t-1). It’s a quantified answer to the question: ‘how much of what I am seeing did I expect to see?’
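That definition translates directly into a few lines of code. The sketch below assumes columns are identified by integer indices; the function name is illustrative and the empty-input convention is an assumption.

```python
def raw_anomaly_score(active_columns, previously_predicted_columns):
    """Fraction of currently active columns that were not predicted at t-1."""
    active = set(active_columns)
    if not active:
        return 0.0  # assumed convention: nothing active, nothing surprising
    unpredicted = active - set(previously_predicted_columns)
    return len(unpredicted) / float(len(active))


print(raw_anomaly_score([1, 2, 3, 4], [1, 2, 3, 4]))  # fully predicted
print(raw_anomaly_score([1, 2, 3, 4], [1, 2, 9]))     # half predicted
print(raw_anomaly_score([1, 2, 3, 4], []))            # nothing predicted
```

A score of 0 means every active column was predicted, and 1 means none were, which is why a single unexpected input can push the raw score straight to 1.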

It follows that the anomaly score should generally be lower and less volatile in highly periodic (predictable) environments, while higher and more volatile in less periodic (noisier) environments.

With your noiseless sine wave example there are still anomaly spikes at times, but if you looked at a list of all the anomaly scores you’d see mostly ~0 values. In this case the system is entirely predictable so the anomaly scores flatline, but how long it takes for them to fully flatline depends on length and complexity of the recurring sequences.

With your noise-injected sine wave example, the anomaly score will never go totally flat because the sequence has some amount of randomness baked in. I would suggest you plot a smaller range of those anomaly scores too, maybe like 1000, and I think you’ll see many of them are closer to 0. The less noise you inject the more this distribution should lean towards 0.

I’d also say that these volatile anomaly score and anomaly likelihood values aren’t necessarily bad. Real data is very often noisy, non-stationary and not nearly as well behaved as a sine wave, and the anomaly likelihood is designed to adapt to that. Though it looks like the anomaly likelihood is very high much of the time, there’s usually a very high threshold before actually declaring an anomaly (default is like 0.9999 I think), so there are still few false positives.

As a sanity check, I find the main params to worry about have to do with granularity, of both the encoders and the data. I’d make sure that similar inputs are generating encoding vectors with appropriate amounts of overlap between them. So if the distribution ranges from ~0 to ~100, there should be almost total overlap between say 7 and 9, much between 7 and 12, little/none between 7 and 20 and none between 7 and 50. I’d test this first.
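An overlap check of this kind can be sketched with a toy encoder. The code below is not the RandomDistributedScalarEncoder: it uses a contiguous block of w active bits per bucket, which is an assumption made to keep the example self-contained. With the real encoder you would compare the encoded bit arrays in the same way.

```python
def toy_encode(value, resolution, w=21):
    """Toy scalar encoder: w consecutive active bits starting at the bucket."""
    bucket = int(round(value / resolution))
    return set(range(bucket, bucket + w))


def overlap(a, b, resolution, w=21):
    """Number of active bits shared by the encodings of a and b."""
    return len(toy_encode(a, resolution, w) & toy_encode(b, resolution, w))


resolution = 100.0 / 130  # data range ~0 to ~100 split into 130 buckets
for other in (7, 9, 12, 20, 50):
    print(7, other, overlap(7, other, resolution))
```

With w=21 the overlap falls steadily as the values move apart and reaches zero well before 50. If nearby values show almost no overlap, the resolution is probably too fine; if distant values overlap heavily, it is too coarse.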

Then I’d make sure that the velocity of the data makes sense. For instance in the nupic hot gym example the patterns to be found are mostly daily and weekly (maybe some seasonal). Therefore it makes much more sense to sample every 15 minutes than every 15 seconds. If it were coming every 15 seconds it would take any algorithm much longer to learn a daily or weekly pattern - though it would also enable it to learn a minute-by-minute pattern which the coarser 15 minute sampling would miss.

Shahar · August 13, 2019, 11:03pm

For completion:

We can see that at the beginning of the anomalous sequence, and at the end, the likelihood is higher. In addition, the initial random peaks in the data do stand out in the likelihood.

Last few questions:

Why?

Would it be correct to say that the anomaly score is needed only for calculating the distribution but for detection, thresholding the likelihood is much more robust?
