Anomaly score always 0

white · March 13, 2018, 3:32am

I am using NuPIC to predict the duration of function calls. In the following graph, the blue line is the actual duration, the green line is the predicted duration, and the red line is the difference between the actual and predicted values. The difference clearly varies a lot. However, the anomaly score line (from NuPIC) is always zero.

Can anyone explain why the difference seems unrelated to the anomaly score? Or is this the expected behavior? Thanks.

rhyolight · March 13, 2018, 4:47am

Something is wrong if the anomaly score is always 0. Can you show your model params and code?

white · March 13, 2018, 7:33am

Sorry for pasting plain text here. This website does not allow uploading Python files.

The swarm file is as follows:

SWARM_DESCRIPTION = {
  "includedFields": [
    {
      "fieldName": "timestamp",
      "fieldType": "datetime"
    },
    {
      "fieldName": "duration",
      "fieldType": "float",
      "maxValue": 300.0,
      "minValue": 1.0
    }
  ],
  "streamDef": {
    "info": "duration",
    "version": 1,
    "streams": [
      {
        "info": "duration",
        "source": "file://../data/aaa.csv",
        "columns": [
          "*"
        ]
      }
    ]
  },
  "inferenceType": "TemporalAnomaly",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "duration"
  },
  "swarmSize": "medium"
}

The model params file, which was generated from the swarm file above, looks like this:

MODEL_PARAM = \
{ 'aggregationInfo': { 'days': 0,
                       'fields': [],
                       'hours': 0,
                       'microseconds': 0,
                       'milliseconds': 0,
                       'minutes': 0,
                       'months': 0,
                       'seconds': 0,
                       'weeks': 0,
                       'years': 0},
  'model': 'HTMPrediction',
  'modelParams': { 'anomalyParams': { u'anomalyCacheRecords': None,
                                      u'autoDetectThreshold': None,
                                      u'autoDetectWaitRecords': None},
                   'clParams': { 'alpha': 0.0001,
                                 'regionName': 'SDRClassifierRegion',
                                 'steps': '1',
                                 'verbosity': 0},
                   'inferenceType': 'TemporalAnomaly',
                   'sensorParams': { 'encoders': { 
                                 u'duration': { 'clipInput': True,
                                                'fieldname': 'duration',
                                                'maxval': 300.0,
                                                'minval': 1.0,
                                                'n': 22,
                                                'name': 'duration',
                                                'type': 'ScalarEncoder',
                                                'w': 21},
                                 u'timestamp_dayOfWeek': None,
                                 u'timestamp_timeOfDay': None,
                                 u'timestamp_weekend': None},
                                     'sensorAutoReset': None,
                                     'verbosity': 0},
                   'spEnable': True,
                   'spParams': { 'boostStrength': 0.0,
                                 'columnCount': 2048,
                                 'globalInhibition': 1,
                                 'inputWidth': 0,
                                 'numActiveColumnsPerInhArea': 40,
                                 'potentialPct': 0.8,
                                 'seed': 1956,
                                 'spVerbosity': 0,
                                 'spatialImp': 'cpp',
                                 'synPermActiveInc': 0.05,
                                 'synPermConnected': 0.1,
                                 'synPermInactiveDec': 0.1},
                   'tmEnable': True,
                   'tmParams': { 'activationThreshold': 12,
                                 'cellsPerColumn': 32,
                                 'columnCount': 2048,
                                 'globalDecay': 0.0,
                                 'initialPerm': 0.21,
                                 'inputWidth': 2048,
                                 'maxAge': 0,
                                 'maxSegmentsPerCell': 128,
                                 'maxSynapsesPerSegment': 32,
                                 'minThreshold': 9,
                                 'newSynapseCount': 20,
                                 'outputType': 'normal',
                                 'pamLength': 1,
                                 'permanenceDec': 0.1,
                                 'permanenceInc': 0.1,
                                 'seed': 1960,
                                 'temporalImp': 'cpp',
                                 'verbosity': 0},
                   'trainSPNetOnlyIfRequested': False},
  'predictAheadTime': None,
  'version': 1}

As for the code, I wrapped the example from NuPIC.

rhyolight · March 13, 2018, 1:42pm

Did you change the field name in the code to duration? What do the header and a few sample rows of your data file look like? Also, you’ll get better anomalies if you use these canned parameters.

white · March 13, 2018, 2:48pm

I am sure the field name used in my code is duration, because the code fetches the field name from the swarm file. The data file, with its header and a few sample rows, looks like this:
func,caller,callee,timestamp,duration
string,string,string,datetime,float
, , , T ,
ord_IInvQueryCSV_funcQ:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcK:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcQ:127.0.1.1,2018-01-01 03:26:13.960,48
ord_IInvQueryCSV_funcQ:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcK:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcQ:127.0.1.1,2018-01-01 04:26:16.187,51
ord_IInvQueryCSV_funcQ:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcK:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcQ:127.0.1.1,2018-01-01 04:26:16.957,43
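
For readers following along, here is a minimal sketch of reading a file laid out like the sample above: three header rows (field names, field types, special flags) followed by data rows. It uses only the standard library. The `read_records` helper and the shortened `f,caller,callee` strings are illustrative assumptions, not part of the original data or of NuPIC.

```python
import csv
import io
from datetime import datetime

# Abbreviated stand-in for the data file: the long func/caller/callee
# strings are replaced by placeholders (an assumption for brevity); the
# three header rows and the timestamp/duration values follow the sample.
SAMPLE = """func,caller,callee,timestamp,duration
string,string,string,datetime,float
, , , T ,
f,caller,callee,2018-01-01 03:26:13.960,48
f,caller,callee,2018-01-01 04:26:16.187,51
f,caller,callee,2018-01-01 04:26:16.957,43
"""

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def read_records(handle):
    """Yield {'timestamp': datetime, 'duration': float} for each data row."""
    reader = csv.reader(handle)
    names = next(reader)  # row 1: field names
    next(reader)          # row 2: field types
    next(reader)          # row 3: special flags (T marks the timestamp)
    ts_idx = names.index("timestamp")
    dur_idx = names.index("duration")
    for row in reader:
        if not row:
            continue  # skip blank lines
        yield {
            "timestamp": datetime.strptime(row[ts_idx], TIMESTAMP_FORMAT),
            "duration": float(row[dur_idx]),
        }


records = list(read_records(io.StringIO(SAMPLE)))
```

Each resulting dictionary holds only the two fields listed under "includedFields" in the swarm description, which is the shape a model expecting `timestamp` and `duration` would be fed.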

rhyolight · March 13, 2018, 3:26pm

Can you print out the model result’s inferences you get back from the compute function? It is what contains the anomaly score value.
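
For context on why the prediction error and the anomaly score can diverge: the raw anomaly score is not derived from the numeric difference between predicted and actual values. It is the fraction of currently active columns that were not predicted at the previous time step. A minimal pure-Python sketch of that calculation follows; the function name and the use of sets are my own choices, not NuPIC's API.

```python
def raw_anomaly_score(active_columns, prev_predicted_columns):
    """Fraction of active columns that were not predicted one step earlier.

    Returns 0.0 when nothing is active, 1.0 when no active column was
    predicted, and 0.0 when every active column was predicted.
    """
    active = set(active_columns)
    if not active:
        return 0.0
    unpredicted = active - set(prev_predicted_columns)
    return len(unpredicted) / float(len(active))


# Every active column was predicted: nothing surprising.
print(raw_anomaly_score([1, 2, 3, 4], [1, 2, 3, 4]))
# Half of the active columns were unexpected.
print(raw_anomaly_score([1, 2, 3, 4], [1, 2]))
```

Under this definition, a model whose active columns barely change from one input to the next would score 0 at every step, regardless of how far the classifier's predicted value is from the actual one.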

white · March 13, 2018, 3:54pm

I added the following print snippet:

    anomaly_score = result.inferences['anomalyScore']
    print(result.inferences)

The output is:

{'multiStepPredictions': {1: {113.48496495441054: 0.50022495748788509, 188.0: 0.49977504251211491}}, 'multiStepBucketLikelihoods': {1: {0: 0.50022495748788509, 1: 0.49977504251211491}}, 'multiStepBestPredictions': {1: 113.48496495441054}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}
{'multiStepPredictions': {1: {103.73947546808738: 0.49944997772429861, 188.0: 0.50055002227570133}}, 'multiStepBucketLikelihoods': {1: {0: 0.49944997772429861, 1: 0.50055002227570133}}, 'multiStepBestPredictions': {1: 188.0}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}
{'multiStepPredictions': {1: {100.21763282766115: 0.50132502190910699, 188.0: 0.49867497809089317}}, 'multiStepBucketLikelihoods': {1: {0: 0.50132502190910699, 1: 0.49867497809089317}}, 'multiStepBestPredictions': {1: 100.21763282766115}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}
{'multiStepPredictions': {1: {111.5523429793628: 0.50132496690992512, 188.0: 0.49867503309007483}}, 'multiStepBucketLikelihoods': {1: {0: 0.50132496690992512, 1: 0.49867503309007483}}, 'multiStepBestPredictions': {1: 111.5523429793628}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}
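
As an aid to reading this output, here is a small sketch that pulls the interesting fields out of one inference dictionary. The literal is the first line of the output above; the `summarize` helper is hypothetical and only illustrates the dictionary layout.

```python
# First line of the printed output above, copied as a literal.
inferences = {
    'multiStepPredictions': {1: {113.48496495441054: 0.50022495748788509,
                                 188.0: 0.49977504251211491}},
    'multiStepBucketLikelihoods': {1: {0: 0.50022495748788509,
                                       1: 0.49977504251211491}},
    'multiStepBestPredictions': {1: 113.48496495441054},
    'anomalyLabel': '[]',
    'anomalyScore': 0.0,
}


def summarize(inferences, step=1):
    """Collect the anomaly score, best prediction and buckets for one step."""
    likelihoods = inferences['multiStepBucketLikelihoods'][step]
    return {
        'anomaly_score': inferences['anomalyScore'],
        'best_prediction': inferences['multiStepBestPredictions'][step],
        'buckets_seen': sorted(likelihoods),
    }


summary = summarize(inferences)
print(summary)
```

In all four printed lines, only buckets 0 and 1 appear, each with a likelihood close to 0.5.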

rhyolight · March 13, 2018, 3:57pm

Thank you for your patience.

Which example please? There are several.

sheiser1 · March 13, 2018, 4:37pm

How about the encoder parameters? ‘w’ is 21 but ‘n’ is only 22! Should it be like 10x ‘w’?
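
To make the concern concrete, here is a simplified model of a non-periodic scalar encoder that places `w` contiguous active bits somewhere in `n` bits. This is a sketch under that assumption, not NuPIC's actual ScalarEncoder code; the `encode` and `num_buckets` helpers are hypothetical. With `n=22` and `w=21` the block can only start at position 0 or 1, so there are just two distinct encodings.

```python
import math


def encode(value, minval, maxval, n, w):
    """Return the set of active bit indices for a clipped, non-periodic input."""
    clipped = min(max(value, minval), maxval)
    last_start = n - w  # highest index at which the block of w bits can start
    fraction = (clipped - minval) / (maxval - minval)
    start = int(math.floor(fraction * last_start + 0.5))
    return set(range(start, start + w))


def num_buckets(n, w):
    """Number of distinct block positions in this simplified model."""
    return n - w + 1


# Encoder settings from the swarm-generated params in this thread.
swarm = dict(minval=1.0, maxval=300.0, n=22, w=21)

low = encode(1.0, **swarm)     # smallest representable input
high = encode(300.0, **swarm)  # largest representable input

print(num_buckets(22, 21))     # distinct encodings available
print(len(low & high))         # bits shared by the two extreme inputs
```

In this model the two extreme inputs share 20 of their 21 active bits, and the sample durations 48, 51 and 43 all map to the same encoding. Choosing `n` around 10x `w`, for example `n=210` with `w=21`, would give 190 positions instead of 2.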

rhyolight · March 13, 2018, 4:40pm

Good point. OP should use the canned anomaly detection model params I suggested earlier.

sheiser1 · March 13, 2018, 5:05pm

Do you know what the ‘n’ and ‘w’ values are offhand? I’m having trouble finding them in there. I think I remember seeing defaults somewhere of 50 and 21 and thinking that ‘n’ seemed small. In my own work I’ve been getting better results with an ‘n’ around 10x ‘w’.

rhyolight · March 13, 2018, 5:08pm

I would use the RDSE like the anomaly model defaults to:

https://github.com/numenta/nupic/blob/master/src/nupic/frameworks/opf/common_models/anomaly_params_random_encoder/best_single_metric_anomaly_params_cpp.json#L38-L44

"c1": {
  "name": "c1",
  "fieldname": "c1",
  "numBuckets": 130.0,
  "seed": 42,
  "type": "RandomDistributedScalarEncoder"
}

sheiser1 · March 14, 2018, 4:34am

Hey @rhyolight!

Quick question on the RDSE. So I’m trying to implement it and getting the error:

#### Error in constructing RandomDistributedScalarEncoder encoder. Possibly missing some required constructor parameters. Parameters that were provided are: {'seed': 42, 'name': 'dist', 'numBuckets': 140}
Traceback (most recent call last):
 
  File "<ipython-input-12-6911e8f5dd5c>", line 1, in <module>
    runfile('/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py', wdir='/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym')
 
  File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
 
  File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 81, in execfile
    builtins.execfile(filename, *where)
 
  File "/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py", line 203, in <module>
    TMrunModel(only_csv_files,train_files,test_files,plot=plot)
 
  File "/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py", line 169, in TMrunModel
    model = createModel(getModelParamsFromName(GYM_NAME))
 
  File "/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py", line 88, in createModel
    model = ModelFactory.create(modelParams)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/frameworks/opf/model_factory.py", line 85, in create
    return modelClass(**modelConfig['modelParams'])
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/frameworks/opf/htm_prediction_model.py", line 240, in __init__
    clParams, anomalyParams)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/frameworks/opf/htm_prediction_model.py", line 1125, in __createHTMNetwork
    encoder = MultiEncoder(enabledEncoders)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/encoders/multi.py", line 74, in __init__
    self.addMultipleEncoders(encoderDefinitions)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/encoders/multi.py", line 173, in addMultipleEncoders
    self.addEncoder(fieldName, eval(encoderName)(**fieldParams))
 
TypeError: __init__() got an unexpected keyword argument 'numBuckets'

I’m using this structure for the ‘encoders’ dictionary within modelParams:

    encoder_dict[field] = {"name": field,
                           "fieldname": field,
                           "numBuckets": 140.0,
                           "seed": 42,
                           "type": "RandomDistributedScalarEncoder"}

I’m trying to imitate what you showed above but something seems missing - any intuition what it may be? Thanks again

sheiser1 · March 14, 2018, 4:39am

Here’s the whole MODEL_PARAMS dict too:

https://pastebin.com/81nRsy6v

white · March 14, 2018, 1:44pm

I managed to run hotgym_anomaly.py from https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly, and the anomaly score appears to work there. However, when I print(result), I find that ‘multiStepBestPredictions’ is missing, which means the prediction is not available. How can I fix this?

Would it be reasonable to write an anomaly detection program based on hotgym_anomaly.py?

rhyolight · March 14, 2018, 1:53pm

Convert numBuckets into resolution. See:

https://github.com/numenta/nupic.workshop/blob/master/part-1-scalar-input/run_anomaly.py#L36-L41

# RDSE - resolution calculation
valueEncoderParams = \
  modelParams["modelParams"]["sensorParams"]["encoders"]["value"]
numBuckets = float(valueEncoderParams.pop("numBuckets"))
resolution = max(0.001, (maxInput - minInput) / numBuckets)
valueEncoderParams["resolution"] = resolution
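
The excerpt above relies on `minInput` and `maxInput` being defined elsewhere in that script. Here is a self-contained sketch of the same conversion applied to a single encoder dictionary. The helper name is hypothetical, and the 1.0 to 300.0 range is taken from the swarm description earlier in this thread as an assumed input range.

```python
def numbuckets_to_resolution(encoder_params, min_input, max_input):
    """Return a copy of the encoder params with numBuckets replaced by resolution."""
    params = dict(encoder_params)  # copy so the caller's dict is untouched
    num_buckets = float(params.pop("numBuckets"))
    # Same formula as the excerpt: never let the resolution drop below 0.001.
    params["resolution"] = max(0.001, (max_input - min_input) / num_buckets)
    return params


# Encoder entry in the style of the canned anomaly params, renamed to "duration".
encoder = {
    "name": "duration",
    "fieldname": "duration",
    "numBuckets": 130.0,
    "seed": 42,
    "type": "RandomDistributedScalarEncoder",
}

converted = numbuckets_to_resolution(encoder, min_input=1.0, max_input=300.0)
print(converted["resolution"])
```

The converted dictionary no longer contains "numBuckets", which is the keyword the RandomDistributedScalarEncoder constructor rejected in the traceback above.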

rhyolight · March 14, 2018, 1:56pm

See prediction hotgym example (and tutorial video) here.

See how to convert into an anomaly model (and tutorial video) here.

white · March 15, 2018, 10:07am

Thanks for the link! It really helps a lot.

I learned a lot from https://github.com/numenta/nupic.workshop/tree/master/part-1-scalar-input as well.

BTW, if I train for several epochs in a row on the same data, will NuPIC see no anomalies in the last epoch?
