Strange anomaly likelihood plot

magarwal · March 29, 2017, 10:38pm

I am using code from here: https://github.com/subutai/nupic.subutai/tree/master/run_anomaly

I made my changes to run anomaly detection on my dataset, and I have three questions:

1. The plot looks strange:
https://github.com/marutiagarwal/tempGitRepo/blob/master/nupic_anomaly/anomaly_graph.jpg
The values in the output CSV seem more legitimate, though:
https://github.com/marutiagarwal/tempGitRepo/blob/master/nupic_anomaly/fb_paid_post_like_hourly_out.csv
The output CSV above has some values as high as 0.9, but they are clearly not visible in the graph.
2. I generated the modelParams using getScalarMetricWithTimeOfDayAnomalyParams(), but I am not sure how to select “n” and “w”:
https://github.com/marutiagarwal/tempGitRepo/blob/master/nupic_anomaly/model_params/fb_paid_post_like_hourly_model_params.py
3. I need to pass the min and max values from the CSV when generating the model parameters, and I need to write the min and max of the feature I want to track into the modelParams again. What if future data coming from the data stream has values higher or lower than that range? Will the anomaly detection continue to handle them without breaking?

Thanks.

magarwal · March 30, 2017, 3:41pm

Solved part (1). There were a lot of duplicates in my training data. Once I fixed that, everything looked fine. I used the pandas function pandas.DataFrame.drop_duplicates to address this.
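
For anyone hitting the same problem, here is a minimal sketch of that de-duplication step. The post does not say which columns were used, so the column names `timestamp` and `value` and the sample rows are assumptions for illustration only.

```python
import pandas as pd

# Hypothetical hourly data with a repeated timestamp (made up for illustration).
df = pd.DataFrame({
    "timestamp": ["2017-03-01 00:00", "2017-03-01 01:00",
                  "2017-03-01 01:00", "2017-03-01 02:00"],
    "value": [12.0, 15.0, 15.0, 9.0],
})

# Keep the first row seen for each timestamp and drop the repeats.
# Assumption: one row per timestamp is what the model should receive.
deduped = (
    df.drop_duplicates(subset="timestamp", keep="first")
      .reset_index(drop=True)
)

print(deduped)
```

Calling `drop_duplicates()` with no `subset` only removes rows that are identical in every column, which may or may not be what you want for time series data.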

Still waiting for answers to part (2) & (3).

rhyolight · March 30, 2017, 6:42pm

I’ll explain these things in a short video, using a visualization I used for the Scalar Encoder episode of HTM School.

In addition to increasing the n value, you might also try lowering the w value (but keep it an odd number).
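
To make the effect of n and w concrete, here is a rough sketch of a simplified, non-periodic scalar encoder: w contiguous active bits out of n. It is a toy model written for this thread, not NuPIC's implementation, and the function names and the clamping of out-of-range values are choices made for the sketch.

```python
def bucket_resolution(min_val, max_val, n, w):
    """Input range covered by one bucket when w contiguous bits
    slide across n total bits (simplified, non-periodic model)."""
    if w % 2 == 0:
        raise ValueError("w should be an odd number")
    if n <= w:
        raise ValueError("n must be larger than w")
    return (max_val - min_val) / float(n - w)


def encode(value, min_val, max_val, n, w):
    """Return a list of n bits with w contiguous ones."""
    res = bucket_resolution(min_val, max_val, n, w)
    # Assumption of this sketch: values outside the range are clamped.
    clamped = min(max(value, min_val), max_val)
    start = int(round((clamped - min_val) / res))
    return [1 if start <= i < start + w else 0 for i in range(n)]


# Compare a few (n, w) choices over the same hypothetical 0..100 range.
for n, w in [(50, 21), (400, 21), (400, 11)]:
    print(n, w, bucket_resolution(0.0, 100.0, n, w))
```

In this model a larger n or a smaller w both leave more positions for the active bits to occupy, so each bucket covers a narrower slice of the input range.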

magarwal · April 3, 2017, 6:12pm

How could I decide the best value of n and w automatically to encode a completely new scalar data stream?

rhyolight · April 3, 2017, 6:20pm

Does the new scalar data stream have a new min/max? You could use the min/max to identify a resolution for the RandomDistributedScalarEncoder, which is also explained in HTM School. Here is an example of how we use min/max to get a resolution:

https://github.com/numenta/nupic.workshop/blob/master/part-1-scalar-input/run_prediction.py#L36-L41

# RDSE - resolution calculation
valueEncoderParams = \
  modelParams["modelParams"]["sensorParams"]["encoders"]["value"]
numBuckets = float(valueEncoderParams.pop("numBuckets"))
resolution = max(0.001, (maxInput - minInput) / numBuckets)
valueEncoderParams["resolution"] = resolution
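
As a worked example of the calculation above, here is a self-contained sketch that does not need NuPIC installed. The min/max of 0 and 260 are made-up numbers, and `rdse_resolution` is a hypothetical helper name; only the formula itself comes from the snippet.

```python
def rdse_resolution(min_input, max_input, num_buckets, floor=0.001):
    """Spread the observed input range over num_buckets buckets,
    never going below a small floor (0.001 in the snippet above)."""
    return max(floor, (max_input - min_input) / float(num_buckets))


# Encoder params shaped like the "value" encoder entry in the workshop repo.
value_encoder_params = {
    "name": "value",
    "fieldname": "value",
    "numBuckets": 130.0,
    "seed": 42,
    "type": "RandomDistributedScalarEncoder",
}

# Hypothetical min/max measured from the input data.
min_input, max_input = 0.0, 260.0

# Swap numBuckets for the resolution derived from it.
num_buckets = value_encoder_params.pop("numBuckets")
value_encoder_params["resolution"] = rdse_resolution(
    min_input, max_input, num_buckets
)

print(value_encoder_params["resolution"])  # 260 / 130 = 2.0
```

The floor matters when the data is nearly constant: with min equal to max, the division gives 0 and the resolution falls back to 0.001.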

magarwal · April 5, 2017, 6:13pm

There is no [“value”] field in modelParams generated using getScalarMetricWithTimeOfDayAnomalyParams().

rhyolight · April 5, 2017, 6:20pm

No, the value is the name of the encoder in this case. It is referring to:

https://github.com/numenta/nupic.workshop/blob/master/part-1-scalar-input/model_params/model_params.json#L50-L56

"value": {
    "name": "value",
    "fieldname": "value",
    "numBuckets": 130.0,
    "seed": 42,
    "type": "RandomDistributedScalarEncoder"
}

magarwal · April 5, 2017, 6:26pm

Oh, OK. How do I compute n and w from this “resolution”? Also, how do we determine the value of the “numBuckets” given above?

rhyolight · April 5, 2017, 6:33pm

The RDSE docs say this:

The only required parameter is resolution, which determines the resolution of input values.

The numBuckets is used to compute the resolution in the example above because it is a little easier to reason about. @scott might have more to say here.

magarwal · April 5, 2017, 7:07pm

Now I don’t need to worry about n and w. The only parameter is resolution, which greatly affects the number of anomalies detected on my dataset. Could you suggest reliable ways to determine “resolution”?

magarwal · April 7, 2017, 1:50pm

Any suggestions on this?

rhyolight · April 7, 2017, 4:33pm

The resolution depends on your data. How many continuous values do you want to be stored in each bucket? I think this is your resolution.

magarwal · April 7, 2017, 5:52pm

If I set resolution as the minimum distance between any 2 data samples, will that be fine?

rhyolight · April 7, 2017, 5:57pm

You probably want at least some overlap in the encodings for nearby values. Recommended reading: Encoding Data for HTM Systems.
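
A rough way to reason about that overlap is sketched below. It assumes a simplified model in which each value falls into a bucket of width `resolution` and two encodings lose one shared bit for every bucket they are apart. The default `w=21` and the function names are assumptions of this sketch, not values read from the encoder.

```python
def bucket_index(value, resolution, offset=0.0):
    """Bucket a value falls into for a given resolution."""
    return int(round((value - offset) / resolution))


def overlap(a, b, resolution, w=21):
    """Shared active bits between the encodings of a and b, assuming
    encodings lose one shared bit per bucket of distance."""
    distance = abs(bucket_index(a, resolution) - bucket_index(b, resolution))
    return max(0, w - distance)


# Two hypothetical samples that are 0.5 apart.
a, b = 10.0, 10.5
for resolution in (0.5, 0.1, 0.01):
    print(resolution, overlap(a, b, resolution))
```

In this model, a resolution equal to the gap between two samples puts them in adjacent buckets with almost full overlap, while a resolution much smaller than the gap leaves them with no shared bits, so the encoder treats them as unrelated values.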
