This plot shows the number of Twitter posts over time for a particular user. The anomaly likelihood and anomaly score plots seem very counter-intuitive: the periods where the post count spikes appear to be exactly the periods where the anomaly likelihood is at its minimum.
The min/max range of my data is roughly 0-80,000, but most of the values fall between 0 and 1,000.
The encoder resolution I am using is between 1.0 and 2.0.
What could be the reason for this behavior of the NuPIC anomaly detection on this data?
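(For context, here is a minimal sketch of how the RandomDistributedScalarEncoder resolution determines which values get similar encodings; the specific values are illustrative only, not taken from the data above.)

from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder

# With resolution=1.0, inputs less than 1.0 apart land in the same bucket and
# get (nearly) identical encodings, while inputs many resolutions apart share
# few or no active bits.
encoder = RandomDistributedScalarEncoder(resolution=1.0)
a = encoder.encode(100)
b = encoder.encode(100.4)   # same bucket as 100
c = encoder.encode(300)     # 200 buckets away from 100
print "overlap(100, 100.4):", int((a * b).sum())   # full overlap
print "overlap(100, 300):  ", int((a * c).sum())   # close to zero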
What do the rest of your model parameters look like?
maxBoost for spatial params is 0.1
global inhibition is 0
potentialPct for spatial params is 0.85
temporal params permanence is 0.15
remaining params are default.
encoder is RandomDistributedScalarEncoder
What about time encoders?
These are the model parameters I am using currently:
'modelParams': {u'anomalyParams': {u'anomalyCacheRecords': None,
                                   u'autoDetectThreshold': None,
                                   u'autoDetectWaitRecords': 5030},
                u'clParams': {u'alpha': 0.035828933612158,
                              u'regionName': u'SDRClassifierRegion',
                              u'steps': u'1',
                              u'verbosity': 0},
                u'inferenceType': u'TemporalAnomaly',
                u'sensorParams': {u'encoders': {'_classifierInput': {'classifierOnly': True,
                                                                     'fieldname': 'post_count',
                                                                     'name': '_classifierInput',
                                                                     'resolution': 1.0,
                                                                     'type': 'RandomDistributedScalarEncoder'},
                                                'post_count': {'fieldname': 'post_count',
                                                               'name': 'post_count',
                                                               'resolution': 1.0,
                                                               'type': 'RandomDistributedScalarEncoder'}},
                                  u'sensorAutoReset': None,
                                  u'verbosity': 0},
                u'spEnable': True,
                u'spParams': {u'columnCount': 2048,
                              u'globalInhibition': 0,
                              u'inputWidth': 0,
                              'maxBoost': 0.1,
                              u'numActiveColumnsPerInhArea': 40,
                              u'potentialPct': 0.85,
                              u'seed': 1956,
                              u'spVerbosity': 0,
                              u'spatialImp': u'cpp',
                              u'synPermActiveInc': 0.02,
                              u'synPermConnected': 0.2,
                              u'synPermInactiveDec': 0.005},
                u'tpEnable': True,
                u'tpParams': {u'activationThreshold': 13,
                              u'cellsPerColumn': 32,
                              u'columnCount': 2048,
                              u'globalDecay': 0.0,
                              u'initialPerm': 0.21,
                              u'inputWidth': 2048,
                              u'maxAge': 0,
                              u'maxSegmentsPerCell': 128,
                              u'maxSynapsesPerSegment': 32,
                              u'minThreshold': 10,
                              u'newSynapseCount': 20,
                              u'outputType': u'normal',
                              u'pamLength': 3,
                              u'permanenceDec': 0.15,
                              u'permanenceInc': 0.15,
                              u'seed': 1960,
                              u'temporalImp': u'cpp',
                              u'verbosity': 0},
                u'trainSPNetOnlyIfRequested': False},
u'predictAheadTime': None,
u'version': 1}
Also, initializing the model params requires minVal and maxVal as inputs. What happens when future input values fall well outside those limits? For example, in this case the input range is initially around 0-1,000 but later grows to about 80,000. Are the model params updated as the input changes?
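(As far as I know, minVal and maxVal are only used once, at model-creation time, to pick an RDSE resolution; the encoder itself has no hard limits, so later values outside that range are still encoded, just with coarser discrimination, and the model params are not updated automatically. A rough sketch of the kind of calculation involved; the constants numBuckets and minResolution are assumptions about the defaults, not values taken from this model:)

minVal, maxVal = 0.0, 1000.0   # hypothetical initial range
numBuckets = 130.0             # assumed default number of buckets
minResolution = 0.001          # assumed lower bound on resolution

# The resolution is derived once from the initial range; a later jump to
# 80,000 does not change it.
resolution = max(minResolution, (maxVal - minVal) / numBuckets)
print resolution               # ~7.7: values closer than this share an encoding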
How are the tweet counts aggregated? What does the tweet count represent? Only one user’s tweets? That’s a lot of tweets, even if this is a daily aggregation.
Ideally, the aggregation would be small enough that you could incorporate time of day into the encoding. So a 15 minute aggregation is usually good.
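(If it helps, here is a minimal sketch of a 15-minute aggregation, assuming the raw tweets live in a CSV with a timestamp column; pandas is not part of NuPIC, just a convenient way to do the bucketing, and the file and column names are hypothetical:)

import pandas as pd

# One row per tweet, with a "timestamp" column (assumed layout).
raw = pd.read_csv("tweets.csv", parse_dates=["timestamp"])

# Count tweets in 15-minute buckets; empty buckets become zero counts.
counts = (raw.set_index("timestamp")
             .resample("15T")
             .size()
             .rename("post_count"))
counts.to_csv("post_counts_15min.csv", header=True)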
I think I’ve been confused about some of the things I’ve told you about RDSE resolution… stand by, I will probably have corrections.
Right now the aggregation is hourly (it may also be changed to daily or 15 minutes). Actually, it is not for a single user; it is more like tweets related to a particular topic from multiple users.
How do I include time of day in the encoding?
I did figure out how to encode the date and time. The parameters for that are as follows:
time of day: (21, 6)
day of week: (21, 3)
season: (21, 4)
Now the plot that I get is as follows:
It does look much better than the previous one. But I observed that after the model processes a lot of data, the anomaly score becomes very low. I think that is due to the date encoder parameters I’ve used. How should I decide these parameters?
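(As a reference point, here is a minimal sketch of what those (width, radius) tuples mean for nupic's DateEncoder; the example datetime is arbitrary:)

from datetime import datetime
from nupic.encoders.date import DateEncoder

# Each tuple is (number of bits, radius). The radius is in hours for timeOfDay
# and in days for dayOfWeek and season; a larger radius means coarser buckets.
date_encoder = DateEncoder(timeOfDay=(21, 6),
                           dayOfWeek=(21, 3),
                           season=(21, 4))
bits = date_encoder.encode(datetime(2017, 3, 15, 14, 30))
print date_encoder.getWidth(), int(bits.sum())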
In the model params you pasted earlier, your spParams.inputWidth is 0, which is not right. That should be the number of bits in the encoding. How did you come up with your model params?
To find out what spParams.inputWidth should be, I think you might be able to get this number by calling:
model._getEncoder().getWidth()
I am using the getScalarMetricWithTimeOfDayAnomalyParams function from nupic.frameworks.opf.common_models.cluster_params and then configuring some of the parameters as needed. I didn’t change the inputWidth; that was returned by this function. The encoder width is 400, as given by model._getEncoder().getWidth().
There may be a problem with that function, or maybe you are using it wrong. Here is the code example from the docs:
from nupic.frameworks.opf.model_factory import ModelFactory
from nupic.frameworks.opf.common_models.cluster_params import (
    getScalarMetricWithTimeOfDayAnomalyParams)

params = getScalarMetricWithTimeOfDayAnomalyParams(
    metricData=[0],
    tmImplementation="cpp",
    minVal=0.0,
    maxVal=100.0)

model = ModelFactory.create(modelConfig=params["modelConfig"])
model.enableLearning()
model.enableInference(params["inferenceArgs"])
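(Continuing from that example, here is a rough sketch of how the anomaly score and anomaly likelihood are typically pulled out of the OPF model on each record. The field names "c0" and "c1" are what I believe this params function uses by default, so treat them as an assumption and match them to the encoder fieldnames in your modelConfig; the sample counts are made up:)

from datetime import datetime, timedelta
from nupic.algorithms.anomaly_likelihood import AnomalyLikelihood

likelihood_helper = AnomalyLikelihood()
start = datetime(2017, 1, 1)
sample_counts = [120, 95, 103, 4000, 110]   # made-up hourly post counts

for i, count in enumerate(sample_counts):
    timestamp = start + timedelta(hours=i)
    # "model" is the OPF model created above; "c0"/"c1" must match the
    # encoder fieldnames in modelConfig.
    result = model.run({"c0": timestamp, "c1": float(count)})
    anomaly_score = result.inferences["anomalyScore"]
    likelihood = likelihood_helper.anomalyProbability(count, anomaly_score,
                                                      timestamp)
    print timestamp, anomaly_score, likelihood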
That is exactly how I am using it. The only difference is that I modify some of the parameters from params before I call ModelFactory.create(modelConfig=params["modelConfig"]).
I assume you are using maxVal=50000.0 (or whatever the max actually is)?
Yes, I am using the max value and min value of the actual data for that.
Then it returns params that include datetime encoder configurations, right? Your first set of model params did not have those. You should use the datetime encoder configurations the function returns.
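(One quick way to check, assuming params is the dict returned by getScalarMetricWithTimeOfDayAnomalyParams, is to print the encoder section and confirm it contains the time-of-day entries alongside the scalar one:)

import json

# Dump the encoder configurations that the params function returned.
encoders = params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]
print json.dumps(encoders, indent=2)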
I tried with the default datetime encoder parameters as returned by the function. I get the plot as follows:
It still looks like it would label a lot of data points as anomalies.
The anomaly likelihood levels look much better. Now you can flag anomalies by setting a 0.9999 threshold on the anomaly likelihood. Adjust this value for higher or lower sensitivity to anomalies.
It looks like it is finding things that are not directly attributed to the spikes in tweets. It would be interesting to see some of these plots closer up with the dates displayed.
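(For what it's worth, a minimal sketch of that thresholding step; the likelihood values below are hypothetical stand-ins for the per-record anomaly likelihoods computed by the model:)

THRESHOLD = 0.9999

# Hypothetical (timestamp, anomalyLikelihood) pairs, just to make this runnable.
likelihoods = [("2017-01-01 00:00", 0.42),
               ("2017-01-01 01:00", 0.99995),
               ("2017-01-01 02:00", 0.87)]

for timestamp, likelihood in likelihoods:
    if likelihood >= THRESHOLD:
        print "anomaly at", timestamp, "likelihood =", likelihood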
In this case, the anomaly likelihood values seem to be pretty high, but that can change over time. With an earlier signal, I was getting significantly lower anomaly likelihood values, and there I would need to select a lower threshold. If we do not know the range of the input data beforehand, or if it is probable that the range of the input stream will change significantly over time, would the same anomaly likelihood threshold work for the complete time series?