Why is the anomaly likelihood so high for a repeated data pattern? HELP


#1

Hi, everyone

I use the NuPIC anomaly model to detect anomaly points in sine data, which is generated by repeating the following pattern many times:

import numpy as np
np.sin(np.linspace(0, 3.14 * 2, 100))  # one cycle of a sine wave, sampled at 100 points
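
To be concrete, the full input stream is built roughly like this (the number of cycles here is just an example; I feed the values to the model one record at a time):

import numpy as np

NUM_CYCLES = 200  # example value; the cycle is repeated many times
one_cycle = np.sin(np.linspace(0, 3.14 * 2, 100))
data = np.tile(one_cycle, NUM_CYCLES)  # repeated sine pattern, fed record by record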

The picture below shows a very high anomaly score, which is computed as follows:

from nupic.algorithms import anomaly_likelihood

# anomalyProbability() is an instance method, so the helper is created once and reused
likelihood_helper = anomaly_likelihood.AnomalyLikelihood()
anomaly_score = likelihood_helper.anomalyProbability(...)

I really don’t understand why, for such an extremely simple data pattern, NuPIC still reports a very high anomaly likelihood after learning the pattern many times. Why?

Here are the model params used for anomaly detection, in JSON format.

{
    "aggregationInfo": {
        "hours": 0,
        "microseconds": 0,
        "seconds": 0,
        "fields": [],
        "weeks": 0,
        "months": 0,
        "minutes": 0,
        "days": 0,
        "milliseconds": 0,
        "years": 0
    },
    "model": "HTMPrediction",
    "version": 1,
    "predictAheadTime": null,
    "modelParams": {
        "sensorParams": {
            "verbosity": 0,
            "sensorAutoReset": null,
            "encoders": {
                "value": {
                    "name": "value",
                    "resolution": 0.001,
                    "n": 400,
                    "seed": 50,
                    "fieldname": "value",
                    "w": 21,
                    "type": "RandomDistributedScalarEncoder"
                }
            }
        },
        "anomalyParams": {
            "anomalyCacheRecords": null,
            "autoDetectThreshold": null,
            "autoDetectWaitRecords": 5030
        },
        "clEnable": true,
        "spParams": {
            "columnCount": 2048,
            "synPermInactiveDec": 0.0005,
            "spatialImp": "cpp",
            "synPermConnected": 0.2,
            "seed": 1956,
            "numActiveColumnsPerInhArea": 40,
            "globalInhibition": 1,
            "inputWidth": 0,
            "spVerbosity": 0,
            "synPermActiveInc": 0.003,
            "potentialPct": 0.8,
            "boostStrength": 1
        },
        "trainSPNetOnlyIfRequested": false,
        "clParams": {
            "alpha": 0.035828933612158,
            "verbosity": 0,
            "steps": "1",
            "regionName": "SDRClassifierRegion"
        },
        "inferenceType": "TemporalAnomaly",
        "spEnable": true,
        "tmParams": {
            "columnCount": 2048,
            "activationThreshold": 13,
            "pamLength": 3,
            "cellsPerColumn": 32,
            "permanenceDec": 0.1,
            "minThreshold": 10,
            "inputWidth": 2048,
            "maxSynapsesPerSegment": 32,
            "outputType": "normal",
            "globalDecay": 0.0,
            "initialPerm": 0.21,
            "newSynapseCount": 20,
            "maxAge": 0,
            "maxSegmentsPerCell": 128,
            "permanenceInc": 0.1,
            "temporalImp": "cpp",
            "seed": 1960,
            "verbosity": 0
        },
        "tmEnable": true
    }
}

Looking forward to your replies!

Thanks


#2

Let me suggest something… While you are generating your sine curve, after the model has learned it for like 100 cycles, start adding a random perturbation to the signal and see how it changes the anomaly score.

I don’t know what it will look like, I’m honestly curious. I hope it at least changes, if not increases. The anomaly score is a funny thing. It looks like your example is set up to make a change like this and plot it pretty easily. What does it look like?
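
Something like this is what I have in mind (just a rough sketch; the cycle count, noise amplitude, and the feed_to_model() call are placeholders for however you are running the model and plotting the score):

import numpy as np

one_cycle = np.sin(np.linspace(0, 2 * np.pi, 100))

for cycle in range(200):                      # e.g. 200 total cycles
    for value in one_cycle:
        if cycle >= 100:                      # after 100 clean cycles, perturb the signal
            value += np.random.uniform(-0.1, 0.1)
        feed_to_model(value)                  # placeholder: run the model and record the anomaly score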


#3

I finished 100 cycles of training and then added some noise, as follows, for the remaining training cycles:

import random
noise = random.uniform(-0.1, 0.1)

The anomaly points show as red points at the top of the picture, with a corresponding marker line at the bottom of the picture once the noise is added. However, it lasts just for a while.

During the experiment, there is no sign that the anomaly score (or likelihood) decreases; it never goes below 0.5.


#4

That’s good! I expected that type of behavior. Remember that a pure sine wave is not a good pattern for an HTM to learn; you need more random noise. It will probably give you better anomaly scores if you add a little random jitter to the entire sine wave.

The anomaly score is pretty erratic. It always is. We always use an anomaly likelihood instead. There are instructions for using it in the API docs I linked above.

This is the 2nd time I’ve seen this, so I’m going to investigate.


#5

Actually, the anomaly score in the picture is already the anomaly likelihood in my case, which is calculated as below:

from nupic.algorithms import anomaly_likelihood
likelihood_helper = anomaly_likelihood.AnomalyLikelihood()
anomaly_score = likelihood_helper.anomalyProbability(…)

I took the usage of anomaly_score from https://github.com/numenta/NAB/blob/master/nab/detectors/numenta/numenta_detector.py for my case.


#6

Hi @white - are you plotting the log likelihood? We always use a 0.4 or 0.5 threshold on the log score. (See line 102 of the numenta_detector file.)
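
Roughly, the conversion looks like this (a sketch of the same idea; value, rawScore, and timestamp stand for whatever your per-record loop provides, and the exact code is in the detector file):

from nupic.algorithms import anomaly_likelihood

likelihood_helper = anomaly_likelihood.AnomalyLikelihood()

# per record: value and timestamp from the data stream, rawScore from the HTM model
likelihood = likelihood_helper.anomalyProbability(value, rawScore, timestamp)
logScore = likelihood_helper.computeLogLikelihood(likelihood)

if logScore > 0.5:   # threshold on the log score, as in the NAB detector
    print("anomaly detected")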


#7

@subutai Let me guess: the real difference between using the log score and using the anomaly likelihood is that the log score makes the trend show up nicely, right? A log score of 0.4 - 0.6 is equivalent to a likelihood of 0.9999 - 0.999999.
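
Checking that against the log transform I see in anomaly_likelihood.py (roughly log(1 - likelihood) scaled by log(1e-10), if I am reading the source right, so take the constants as approximate):

import math

def approx_log_score(likelihood):
    # same shape as computeLogLikelihood(): map 1 - likelihood onto [0, 1] on a log scale
    return math.log(1.0000000001 - likelihood) / math.log(1e-10)

print(approx_log_score(0.9999))     # ~0.4
print(approx_log_score(0.999999))   # ~0.6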

Here is the picture. From record 10000 of training onward, noise is added, and the log score bursts to a very high value and decreases afterwards.

I would like to confirm that the anomaly score behavior in my experiment is right or normal, and that my model_param is reasonable.

Could you give me some advice regarding the configuration of model_param? Thanks a lot.


#8

Yes, that’s exactly right. It’s very hard to interpret plots and notice the difference between 0.999 and 0.9999.

It certainly looks a lot better! Once it sees the noise for a while, it will adapt and the anomaly likelihood will go back down. Your SP and TM params look the same as what we normally use. I don’t know whether the encoder resolution is ok or not, particularly for sine waves. Usually for real datasets we set resolution as follows:

resolution = max(minResolution,
                 (maxVal - minVal) / numBuckets)

where numBuckets=130 and minResolution=0.001
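
For a plain sine wave (values roughly in [-1, 1]), that rule of thumb would give a noticeably coarser resolution than the 0.001 in your params (just plugging the numbers in):

minResolution = 0.001
numBuckets = 130

minVal, maxVal = -1.0, 1.0          # range of np.sin(...)
resolution = max(minResolution, (maxVal - minVal) / numBuckets)
print(resolution)                    # ~0.0154, vs. the 0.001 used above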


#9

Here is my code for the encoders:

    padding = abs(max_input - min_input) * 0.2

    resolution = max(0.001, (max_input - min_input + 2 * padding) / 130)
    encoders[f] = {
        'name': f,
        'fieldname': f,
        'type': 'RandomDistributedScalarEncoder',
        'seed': 42,
        'resolution': resolution,
        'w': 21,
        'n': 400,
    }

Regarding the anomaly likelihood, I read a bit of the source code in anomaly_likelihood.py. I am confused about the mechanism: how are the value and timestamp involved in the computation of the anomaly likelihood?

    anomalyProbability = anomalyLikelihood.anomalyProbability(
        value, anomalyScore, timestamp)
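
My rough understanding from the docstrings is something like the sketch below - definitely not the real nupic implementation, just the idea as I understand it (a rolling distribution over recent raw anomaly scores, with the value and timestamp stored alongside each record; I think the real code also smooths the scores with a moving average and handles several edge cases):

import math
from collections import deque

class SimplifiedLikelihood(object):
    """Toy illustration only: model recent raw anomaly scores with a normal
    distribution and report how unusual the latest score is."""

    def __init__(self, window=1000):
        self.history = deque(maxlen=window)   # (timestamp, value, rawScore) records

    def anomaly_probability(self, value, raw_score, timestamp):
        self.history.append((timestamp, value, raw_score))
        scores = [s for (_, _, s) in self.history]
        if len(scores) < 100:                 # not enough data to estimate a distribution yet
            return 0.5
        mean = sum(scores) / len(scores)
        var = sum((s - mean) ** 2 for s in scores) / len(scores)
        std = max(math.sqrt(var), 1e-6)
        z = (raw_score - mean) / std
        # one-sided tail probability of the current score under the fitted normal
        tail = 0.5 * math.erfc(z / math.sqrt(2))
        return 1.0 - tail                     # high when the score is unusually large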

Could you please give me some clues about that? Thanks


#10

Have you read these docs? Or watched this?


#11

I have read the online documentation many times, including the links you mention. I don’t think the algorithm behind the implementation of anomalyProbability is discussed there in detail.

The source code is the best documentation anyway.


#12

Ok, here are the source code and the API docs.