Why is the anomaly likelihood so high for a repeated data pattern? HELP

Hi, everyone

I use the nupic anomaly model to detect anomalous points in sine data, which is generated by repeating the following many times

np.sin(np.linspace(0,3.14*2,100))

The picture below shows a very high anomaly_score, where the anomaly is computed as follows

from nupic.algorithms import anomaly_likelihood
anomaly_score = anomaly_likelihood.AnomalyLikelihood.anomalyProbability(...)

I really don’t understand why, for such an extremely simple data pattern, nupic still gives a very high anomaly likelihood after learning it many times.

Here are the model params used for anomaly detection, in JSON format.

{
    "aggregationInfo": {
        "hours": 0,
        "microseconds": 0,
        "seconds": 0,
        "fields": [],
        "weeks": 0,
        "months": 0,
        "minutes": 0,
        "days": 0,
        "milliseconds": 0,
        "years": 0
    },
    "model": "HTMPrediction",
    "version": 1,
    "predictAheadTime": null,
    "modelParams": {
        "sensorParams": {
            "verbosity": 0,
            "sensorAutoReset": null,
            "encoders": {
                "value": {
                    "name": "value",
                    "resolution": 0.001,
                    "n": 400,
                    "seed": 50,
                    "fieldname": "value",
                    "w": 21,
                    "type": "RandomDistributedScalarEncoder"
                }
            }
        },
        "anomalyParams": {
            "anomalyCacheRecords": null,
            "autoDetectThreshold": null,
            "autoDetectWaitRecords": 5030
        },
        "clEnable": true,
        "spParams": {
            "columnCount": 2048,
            "synPermInactiveDec": 0.0005,
            "spatialImp": "cpp",
            "synPermConnected": 0.2,
            "seed": 1956,
            "numActiveColumnsPerInhArea": 40,
            "globalInhibition": 1,
            "inputWidth": 0,
            "spVerbosity": 0,
            "synPermActiveInc": 0.003,
            "potentialPct": 0.8,
            "boostStrength": 1
        },
        "trainSPNetOnlyIfRequested": false,
        "clParams": {
            "alpha": 0.035828933612158,
            "verbosity": 0,
            "steps": "1",
            "regionName": "SDRClassifierRegion"
        },
        "inferenceType": "TemporalAnomaly",
        "spEnable": true,
        "tmParams": {
            "columnCount": 2048,
            "activationThreshold": 13,
            "pamLength": 3,
            "cellsPerColumn": 32,
            "permanenceDec": 0.1,
            "minThreshold": 10,
            "inputWidth": 2048,
            "maxSynapsesPerSegment": 32,
            "outputType": "normal",
            "globalDecay": 0.0,
            "initialPerm": 0.21,
            "newSynapseCount": 20,
            "maxAge": 0,
            "maxSegmentsPerCell": 128,
            "permanenceInc": 0.1,
            "temporalImp": "cpp",
            "seed": 1960,
            "verbosity": 0
        },
        "tmEnable": true
    }
}

Looking forward to your replies!

Thanks

Let me suggest something… While you are generating your sine curve, after the model has learned it for like 100 cycles, start adding a random perturbation to the signal and see how it changes the anomaly score.

I don’t know what it will look like, I’m honestly curious. I hope it at least changes, if not increases. The anomaly score is a funny thing. It looks like your example is set up to make a change like this and plot it pretty easily. What does it look like?

I finished training for 100 cycles and then added some noise, as follows, for the remaining cycles

noise = random.uniform(-0.1, 0.1)

The anomalies show up as red points at the top of the picture, with the corresponding marker lines at the bottom, starting where the noise is added. However, this lasts only for a while.

During the experiment, there was no sign of the anomaly score (or likelihood) decreasing; it never dropped below 0.5.

That’s good! I expected that type of behavior to happen. Remember that a sine wave is not a good pattern for an HTM to learn. You need more random noise. It will probably give you better anomaly scores if you added a little random jitter to the entire sine wave.
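One way to set this up is sketched below: a repeated sine cycle with uniform jitter added to every sample. This is only an illustration using the standard library; the function name `noisy_sine`, the jitter range, and the seed are assumptions, not part of the original experiment.

```python
import math
import random


def noisy_sine(num_cycles, points_per_cycle=100, jitter=0.1, seed=42):
    """Return a repeated sine wave with uniform jitter on every sample.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)  # local RNG so results are reproducible
    values = []
    for _ in range(num_cycles):
        for k in range(points_per_cycle):
            # The phase stops short of 2*pi, so the cycle boundary
            # is not sampled twice when cycles are concatenated.
            phase = 2 * math.pi * k / points_per_cycle
            values.append(math.sin(phase) + rng.uniform(-jitter, jitter))
    return values


signal = noisy_sine(num_cycles=3)
```

Each value in `signal` would then be fed to the model one record at a time, with learning left on.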

The anomaly score is pretty erratic. It always is. We always use an anomaly likelihood instead. There are instructions for using it in the API docs I linked above.

This is the 2nd time I’ve seen this. I’m going to investigate this.

Actually, the anomaly score in the picture is the anomaly likelihood, which is calculated as follows

from nupic.algorithms import anomaly_likelihood
anomaly_score = anomaly_likelihood.AnomalyLikelihood.anomalyProbability(…)

I use the concept of anomaly_score from https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py in my case

Hi @white - are you plotting the log likelihood? We always use a 0.4 or 0.5 threshold on the log score. (See line 102 of the numenta_detector file.)

@subutai Let me guess. The real difference between using log_score and using anomaly score is that log_score makes the trend appear nicely, right? The log_score of 0.4 - 0.6 is equivalent to the likelihood of 0.9999 - 0.999999.

Here is the picture. After 10000 training samples, noise is added; the log_score bursts to a very high value and decreases afterwards

I would like to verify that the anomaly score behavior in my experiment is right (or at least normal), and that the model_param is reasonable too.

Could you give me some advice regarding the configuration of model_param? Thanks a lot

Yes, that’s exactly right. It’s very hard to interpret plots and notice the difference between 0.999 and 0.9999.
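To make the likelihood-to-log-score mapping concrete, here is a small sketch. The constants follow my recollection of `computeLogLikelihood` in anomaly_likelihood.py, so treat them as an assumption and check the source; the function name `log_likelihood_score` is made up for this example.

```python
import math


def log_likelihood_score(likelihood):
    """Map an anomaly likelihood in [0, 1] to a log-scaled score.

    Assumed formula:
        log(1.0000000001 - likelihood) / log(1.0 - 0.9999999999)
    """
    return math.log(1.0000000001 - likelihood) / math.log(1e-10)


# Print the log score for likelihoods with an increasing number of nines.
for p in (0.9, 0.99, 0.999, 0.9999, 0.999999):
    print(p, round(log_likelihood_score(p), 3))
```

Under this formula each extra "9" in the likelihood adds about 0.1 to the log score, which is why a 0.4 to 0.6 threshold on the log score corresponds to likelihoods between 0.9999 and 0.999999.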

It certainly looks a lot better! Once it sees the noise for a while, it will adapt and the anomaly likelihood will go back down. Your SP and TM params look the same as what we normally use. I don’t know whether the encoder resolution is ok or not, particularly for sine waves. Usually for real datasets we set resolution as follows:

resolution = max(minResolution,
                 (maxVal - minVal) / numBuckets
                )

where numBuckets=130 and minResolution=0.001

Here is my code for encoders

            padding = abs(max_input - min_input) * 0.2

            resolution = max(0.001, (max_input - min_input + 2*padding) / 130)
            encoders[f] = {
                'name': f,
                'fieldname': f,
                'type': 'RandomDistributedScalarEncoder',
                'seed': 42,
                'resolution': resolution,
                'w': 21,
                'n': 400,
            }

Regarding the anomaly likelihood, I read a little of the source code in anomaly_likelihood.py. I am confused about how the value and timestamp are involved in the computation of the anomaly likelihood.

    anomalyProbability = anomalyLikelihood.anomalyProbability(
        value, anomalyScore, timestamp)

Could you please give me some clues about that? Thanks

Have you read these docs? Or watched this?

I have read the online documentation many times, including the links you mention. I don’t think it discusses the implementation of anomalyProbability in detail.

The source code is the best documentation anyway.

Ok, here’s the source code and api docs.

Hi,
I am trying to generate this example to have a better understanding of anomaly detection. A few questions:

  1. The configurable parameters are:
  • learningPeriod
  • estimationSamples
  • historicWindowSize - In the paper denoted by W
  • reestimationPeriod
    What would be equivalent to the sliding window size (W’)? How can we change this parameter?
  2. Why is a sine wave not a good pattern to learn?
  3. When training with a sine wave without noise, and resetting the TM every period, I get the following results:

    When training with a sine wave without noise, without resetting the TM every period, I get the following results:

    I expected the anomaly score to be constantly zero after one period but we still see the score going to 1 constantly. Why does this occur?
  • My helper command is

anomaly_history = AnomalyLikelihood( historicWindowSize = 250, reestimationPeriod = 40 )

anomalyScores.append( tm.anomaly )

Thank you!


I think there are ambiguous representations within the TM for different cycles. Because the sine wave is not stochastic, every cycle looks exactly the same. The only ambiguity is which cycle within the longer pattern it is. If you’re resetting the TM after every cycle, it might help. I would also add some random jitter to the wave and see what happens.


Thank you!

When adding 10 spatial anomalies to the input (after training on noiseless samples and resetting the TM) I get the following peaks in the loglikelihood:

When changing the frequency of the wave after N/2 samples (length of sine wave is N), training the SP only with the first type of wave and resetting the TM after complete periods, I get the following results:

if i % T == 0 and i < N / 2:  # reset when a period starts
    tm.reset()
elif i % (4 * T) == 0 and i >= N / 2:
    tm.reset()


Interesting, but add the noise constantly. Don’t train on a smooth signal. In fact, don’t train at all. Keep learning on all the time.


If I thought I had a better understanding of how the anomaly scoring works… now I am really confused

When learning all the time on noisy data I got a pretty peaky likelihood. So I increased the resolution of the encoder to get a flatter likelihood curve :


But the anomaly score is pretty wild. How do you suggest to go forward and gain an intuitive understanding about the anomaly score? Which I think is equivalent to getting an intuitive understanding of the TM, right?
Would you completely disregard the anomaly score in anomaly detection?


This looks better! Now introduce an anomaly into the signal and see if the log likelihood responds in an obvious way. An anomaly would be doubling the amplitude or changing the wavelength. Or even suddenly jumping out of cycle or repeating a value a bunch of times.


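The anomaly likelihood works by comparing a short-term average of recent anomaly scores to a distribution estimated from a long history of anomaly scores (going back ‘historicWindowSize’ steps). So the sliding window size for the big distribution is ‘historicWindowSize’. As far as I can tell from anomaly_likelihood.py, ‘reestimationPeriod’ only sets how often that distribution is re-estimated, and the short window is a separate averaging window (10 by default, I believe) that is not exposed as a constructor parameter.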
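A minimal sketch of that comparison, assuming a Gaussian model of the long-window scores. The names `long_window` and `short_window` are hypothetical and used only for illustration; they are not NuPIC parameters, and the real implementation differs in details (for example, it re-estimates the distribution only periodically).

```python
import math
from collections import deque


def gaussian_tail(x, mean, std):
    """P(X >= x) for X ~ Normal(mean, std)."""
    z = (x - mean) / std
    return 0.5 * math.erfc(z / math.sqrt(2))


def simple_anomaly_likelihood(scores, long_window=200, short_window=10,
                              min_std=1e-3):
    """Toy likelihood: how unusual is the recent average score
    compared with the long-run distribution of scores?"""
    history = deque(maxlen=long_window)   # big distribution
    recent = deque(maxlen=short_window)   # small, recent window
    likelihoods = []
    for score in scores:
        history.append(score)
        recent.append(score)
        mean = sum(history) / len(history)
        var = sum((h - mean) ** 2 for h in history) / len(history)
        std = max(math.sqrt(var), min_std)  # avoid division by zero
        short_avg = sum(recent) / len(recent)
        likelihoods.append(1.0 - gaussian_tail(short_avg, mean, std))
    return likelihoods


# Flat, low raw scores followed by a short burst of high scores.
likelihoods = simple_anomaly_likelihood([0.05] * 300 + [1.0] * 5)
```

In this toy version a flat history gives a likelihood of about 0.5, and the likelihood climbs toward 1 only when the recent average sits far in the upper tail of the long-run distribution.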

When you reset the TM it’s a bit like wiping its short-term memory, in that it no longer knows where it is in the sequence and has to rediscover it. If you keep resetting after every sequence it has to keep regaining its bearings, so it’ll keep being surprised by certain transitions and the anomaly scores won’t settle down.

Since you’re not resetting there’s no limit to the length of sequence it can learn, so the anomaly score spikes get further and further apart. For example let’s take the sequence:

‘A,B,C,D,E,F,A,B,C,D,E,F,…’

If you reset the TM after each iteration of the base sequence (in this case ‘A,B,C,D,E,F’), then there will always be an anomaly spike when ‘A’ arrives. This is because the TM won’t learn the transition from ‘F’ to ‘A’, since you reset after ‘F’. However, with no reset it will learn that transition and the anomaly score spike will die down after several repetitions.

Now if the sequence were instead say:

‘A,B,C,D,E,F,A,B,C,D,E,F,1,2,3,A,B,C,D,E,F,A,B,C,D,E,F,1,2,3,…’

It would take longer to learn the transition from ‘F’ to ‘1’ since it emerges over a longer time. This transition would produce an anomaly spike for a longer time than the transition ‘A’ to ‘B’, which is learned faster since it recurs more.
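The effect of resetting can be illustrated with a deliberately simplified toy model. This is not the TM: it is a first-order lookup that flags an element as anomalous when the transition from the previous element has not been seen before, and a reset simply clears the "previous element" context. The function name and the `reset_every` parameter are made up for the illustration.

```python
def toy_anomaly_scores(sequence, reset_every=None):
    """First-order toy model: score 1 when the transition from the
    previous element has not been seen before, else 0."""
    learned = set()   # transitions (previous, current) seen so far
    previous = None   # None means "no context", e.g. right after a reset
    scores = []
    for i, current in enumerate(sequence):
        if reset_every and i % reset_every == 0:
            previous = None  # reset: forget where we are in the sequence
        if previous is not None and (previous, current) in learned:
            scores.append(0)
        else:
            scores.append(1)
        if previous is not None:
            learned.add((previous, current))
        previous = current
    return scores


cycle = list("ABCDEF")
data = cycle * 10
with_reset = toy_anomaly_scores(data, reset_every=len(cycle))
without_reset = toy_anomaly_scores(data)
```

With the reset, every ‘A’ keeps scoring 1 because the ‘F’ to ‘A’ transition is never learned; without it, the scores go to 0 once that transition has been seen.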

First, make sure you know exactly what the anomaly score means (the proportion of columns which activated at time t that contained no predictive cells at time t-1). It’s a quantified answer to the question: ‘how much of what I am seeing did I not expect to see?’

It follows that the anomaly score should generally be lower and less volatile in highly periodic (predictable) environments, and higher and more volatile in less periodic (noisier) environments.
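The definition above can be written out as a few lines of code. This is a sketch of the idea rather than the library's implementation; the function and argument names are assumptions, and columns are represented as plain integer indices.

```python
def raw_anomaly_score(active_columns, previously_predicted_columns):
    """Fraction of currently active columns that were not predicted
    at the previous time step (0.0 = fully expected, 1.0 = total surprise)."""
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active, so nothing to be surprised by
    predicted = set(previously_predicted_columns)
    unpredicted = active - predicted
    return len(unpredicted) / float(len(active))


# Two of the four active columns (3 and 4) were not predicted.
score = raw_anomaly_score([1, 2, 3, 4], [1, 2, 9])
```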

With your noiseless sine wave example there are still anomaly spikes at times, but if you looked at a list of all the anomaly scores you’d see mostly ~0 values. In this case the system is entirely predictable so the anomaly scores flatline, but how long it takes for them to fully flatline depends on length and complexity of the recurring sequences.

With your noise-injected sine wave example, the anomaly score will never go totally flat because the sequence has some amount of randomness baked in. I would suggest you plot a smaller range of those anomaly scores too, maybe like 1000, and I think you’ll see many of them are closer to 0. The less noise you inject the more this distribution should lean towards 0.

I’d also say that these volatile anomaly score and anomaly likelihood values aren’t necessarily bad. Real data is very often noisy, non-stationary and nowhere near as well behaved as a sine wave, and the anomaly likelihood is designed to adapt to that. Though it looks like the anomaly likelihood is very high much of the time, there’s usually a very high threshold before actually declaring an anomaly (default is like 0.9999 I think), so there are still few false positives.

As a sanity check, I find the main params to worry about have to do with granularity, of both the encoders and the data. I’d make sure that similar inputs are generating encoding vectors with appropriate amounts of overlap between them. So if the distribution ranges from ~0 to ~100, there should be almost total overlap between say 7 and 9, much between 7 and 12, little/none between 7 and 20 and none between 7 and 50. I’d test this first.
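A rough sketch of such an overlap check is below. It uses a toy encoder (a run of `w` consecutive active bits per bucket), not the RandomDistributedScalarEncoder, so the exact numbers are only indicative; the resolution of 100/130 and `w=21` are taken from the values discussed in this thread, and the helper names are hypothetical. To check a real model you would encode the same values with the actual encoder and count the shared active bits.

```python
def toy_encode(value, min_val=0.0, resolution=100.0 / 130, w=21):
    """Toy encoder: w consecutive active bits starting at the value's bucket."""
    bucket = int(round((value - min_val) / resolution))
    return set(range(bucket, bucket + w))


def overlap(a, b):
    """Number of active bits shared by the encodings of a and b."""
    return len(toy_encode(a) & toy_encode(b))


# Compare 7 against progressively more distant values.
for other in (9, 12, 20, 50):
    print("overlap(7, %d) = %d of 21 bits" % (other, overlap(7, other)))
```

If nearby values share almost no bits, the resolution is probably too fine; if distant values share most of their bits, it is probably too coarse.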

Then I’d make sure that the velocity of the data makes sense. For instance in the nupic hot gym example the patterns to be found are mostly daily and weekly (maybe some seasonal). Therefore it makes much more sense to sample every 15 minutes than every 15 seconds. If it were coming every 15 seconds it would take any algorithm much longer to learn a daily or weekly pattern, though it would also enable it to learn a minute-by-minute pattern which the coarser 15 minute sampling would miss.


For completeness:


We can see that at the beginning of the anomalous sequence, and at the end, the likelihood is higher. In addition, the initial random peaks in the data do stand out in the likelihood.

Last few questions:

  1. Why?
  2. Would it be correct to say that the anomaly score is needed only for calculating the distribution but for detection, thresholding the likelihood is much more robust?