Anomaly score always 0

I am using nupic to predict the duration of a function call. In my graph, the blue line is the actual duration, the green line is the predicted duration, and the red line is the difference between the two. The difference clearly varies a lot, yet the anomaly score from nupic is always zero.

Can anyone explain why the difference seems unrelated to the anomaly score? Or is this expected behavior? Thanks.

Something is wrong if the anomaly score is always 0. Can you show your model params and code?

Sorry for pasting plain text here; this website does not allow uploading Python files.

The swarm file is as follows:

SWARM_DESCRIPTION = {
  "includedFields": [
    {
      "fieldName": "timestamp",
      "fieldType": "datetime"
    },
    {
      "fieldName": "duration",
      "fieldType": "float",
      "maxValue": 300.0,
      "minValue": 1.0
    }
  ],
  "streamDef": {
    "info": "duration",
    "version": 1,
    "streams": [
      {
        "info": "duration",
        "source": "file://../data/aaa.csv",
        "columns": [
          "*"
        ]
      }
    ]
  },
  "inferenceType": "TemporalAnomaly",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "duration"
  },
  "swarmSize": "medium"
}

The model params file, generated from the swarm file above, looks like this:

MODEL_PARAM = \
{ 'aggregationInfo': { 'days': 0,
                       'fields': [],
                       'hours': 0,
                       'microseconds': 0,
                       'milliseconds': 0,
                       'minutes': 0,
                       'months': 0,
                       'seconds': 0,
                       'weeks': 0,
                       'years': 0},
  'model': 'HTMPrediction',
  'modelParams': { 'anomalyParams': { u'anomalyCacheRecords': None,
                                      u'autoDetectThreshold': None,
                                      u'autoDetectWaitRecords': None},
                   'clParams': { 'alpha': 0.0001,
                                 'regionName': 'SDRClassifierRegion',
                                 'steps': '1',
                                 'verbosity': 0},
                   'inferenceType': 'TemporalAnomaly',
                   'sensorParams': { 'encoders': { 
                                 u'duration': { 'clipInput': True,
                                                'fieldname': 'duration',
                                                'maxval': 300.0,
                                                'minval': 1.0,
                                                'n': 22,
                                                'name': 'duration',
                                                'type': 'ScalarEncoder',
                                                'w': 21},
                                 u'timestamp_dayOfWeek': None,
                                 u'timestamp_timeOfDay': None,
                                 u'timestamp_weekend': None},
                                     'sensorAutoReset': None,
                                     'verbosity': 0},
                   'spEnable': True,
                   'spParams': { 'boostStrength': 0.0,
                                 'columnCount': 2048,
                                 'globalInhibition': 1,
                                 'inputWidth': 0,
                                 'numActiveColumnsPerInhArea': 40,
                                 'potentialPct': 0.8,
                                 'seed': 1956,
                                 'spVerbosity': 0,
                                 'spatialImp': 'cpp',
                                 'synPermActiveInc': 0.05,
                                 'synPermConnected': 0.1,
                                 'synPermInactiveDec': 0.1},
                   'tmEnable': True,
                   'tmParams': { 'activationThreshold': 12,
                                 'cellsPerColumn': 32,
                                 'columnCount': 2048,
                                 'globalDecay': 0.0,
                                 'initialPerm': 0.21,
                                 'inputWidth': 2048,
                                 'maxAge': 0,
                                 'maxSegmentsPerCell': 128,
                                 'maxSynapsesPerSegment': 32,
                                 'minThreshold': 9,
                                 'newSynapseCount': 20,
                                 'outputType': 'normal',
                                 'pamLength': 1,
                                 'permanenceDec': 0.1,
                                 'permanenceInc': 0.1,
                                 'seed': 1960,
                                 'temporalImp': 'cpp',
                                 'verbosity': 0},
                   'trainSPNetOnlyIfRequested': False},
  'predictAheadTime': None,
  'version': 1}

Regarding the code, I wrapped the example from nupic.

Did you change the field name in the code to duration? What do the header and a few sample rows of your data file look like? Also, you’ll get better anomalies if you use these canned parameters.

I am sure the field name used in my code is duration, because the code reads the field name from the swarm file. The data file, with its header and a few sample rows, looks like this:
func,caller,callee,timestamp,duration
string,string,string,datetime,float
, , , T ,
ord_IInvQueryCSV_funcQ:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcK:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcQ:127.0.1.1,2018-01-01 03:26:13.960,48
ord_IInvQueryCSV_funcQ:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcK:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcQ:127.0.1.1,2018-01-01 04:26:16.187,51
ord_IInvQueryCSV_funcQ:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcK:127.0.1.1,com.gyl.scm.center.query.service.impl.InvQueryCSVImpl.funcQ:127.0.1.1,2018-01-01 04:26:16.957,43
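
For anyone reproducing this, here is a minimal sketch of reading a file laid out like the one above: three header rows (field names, field types, flags) followed by data. It uses only the Python standard library rather than nupic's own file reader, the long function names are shortened to placeholders, and the timestamp format string is an assumption inferred from the sample rows.

```python
import csv
from datetime import datetime

# Rows copied from the post above; func/caller/callee are shortened placeholders.
SAMPLE = """func,caller,callee,timestamp,duration
string,string,string,datetime,float
, , , T ,
f,k,q,2018-01-01 03:26:13.960,48
f,k,q,2018-01-01 04:26:16.187,51
f,k,q,2018-01-01 04:26:16.957,43
"""

# Assumed format, inferred from the sample rows (milliseconds after the dot).
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S.%f"


def read_records(lines):
    """Parse three header rows, then return (names, types, flags, records)."""
    reader = csv.reader(lines)
    names = [cell.strip() for cell in next(reader)]
    types = [cell.strip() for cell in next(reader)]
    flags = [cell.strip() for cell in next(reader)]
    ts_col = names.index("timestamp")
    dur_col = names.index("duration")
    records = []
    for row in reader:
        if not row:  # skip blank lines
            continue
        records.append({
            "timestamp": datetime.strptime(row[ts_col], TIMESTAMP_FORMAT),
            "duration": float(row[dur_col]),
        })
    return names, types, flags, records


names, types, flags, records = read_records(SAMPLE.splitlines())
for record in records:
    print(record["timestamp"], record["duration"])
```

Checking that the parsed durations match the raw file is a quick way to rule out a field-name or column mix-up before looking at the model itself.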

Can you print out the model result’s inferences you get back from the compute function? It is what contains the anomaly score value.

I added the following print snippet:

    anomaly_score = result.inferences['anomalyScore']
    print(result.inferences)

The output is:

{'multiStepPredictions': {1: {113.48496495441054: 0.50022495748788509, 188.0: 0.49977504251211491}}, 'multiStepBucketLikelihoods': {1: {0: 0.50022495748788509, 1: 0.49977504251211491}}, 'multiStepBestPredictions': {1: 113.48496495441054}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}
{'multiStepPredictions': {1: {103.73947546808738: 0.49944997772429861, 188.0: 0.50055002227570133}}, 'multiStepBucketLikelihoods': {1: {0: 0.49944997772429861, 1: 0.50055002227570133}}, 'multiStepBestPredictions': {1: 188.0}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}
{'multiStepPredictions': {1: {100.21763282766115: 0.50132502190910699, 188.0: 0.49867497809089317}}, 'multiStepBucketLikelihoods': {1: {0: 0.50132502190910699, 1: 0.49867497809089317}}, 'multiStepBestPredictions': {1: 100.21763282766115}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}
{'multiStepPredictions': {1: {111.5523429793628: 0.50132496690992512, 188.0: 0.49867503309007483}}, 'multiStepBucketLikelihoods': {1: {0: 0.50132496690992512, 1: 0.49867503309007483}}, 'multiStepBestPredictions': {1: 111.5523429793628}, 'anomalyLabel': '[]', 'anomalyScore': 0.0}

Thank you for your patience.

Which example please? There are several.

How about the encoder parameters? ‘w’ is 21 but ‘n’ is only 22! Should it be like 10x ‘w’?
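
A rough illustration of why this matters, assuming the usual non-periodic scalar encoder layout (w contiguous active bits sliding across n total bits, so n - w + 1 distinct positions). The helper names are made up for this sketch and it does not call nupic itself.

```python
def scalar_encoder_stats(minval, maxval, n, w):
    """Return (bucket_count, resolution) for a non-periodic scalar encoder.

    Assumes w contiguous active bits out of n total bits.
    """
    if n <= w:
        raise ValueError("n must be larger than w")
    buckets = n - w + 1
    resolution = float(maxval - minval) / (n - w)
    return buckets, resolution


def bucket_index(value, minval, maxval, n, w):
    """Index of the bucket a value falls into, clipping like clipInput=True."""
    value = min(max(value, minval), maxval)
    _, resolution = scalar_encoder_stats(minval, maxval, n, w)
    return int((value - minval) / resolution + 0.5)


# Parameters from the posted model params: n=22, w=21, range 1.0 to 300.0.
posted = scalar_encoder_stats(1.0, 300.0, n=22, w=21)
posted_buckets = [bucket_index(v, 1.0, 300.0, 22, 21) for v in (48, 51, 43)]

# A wider encoding for comparison (n=400 is only an illustrative choice).
wider = scalar_encoder_stats(1.0, 300.0, n=400, w=21)
wider_buckets = [bucket_index(v, 1.0, 300.0, 400, 21) for v in (48, 51, 43)]

print("posted:", posted, posted_buckets)
print("wider: ", wider, wider_buckets)
```

With n = 22 and w = 21 there are only two buckets, which is consistent with the two entries under multiStepBucketLikelihoods in the output above. The sample durations 48, 51 and 43 all land in the same bucket, so the model would see nearly identical input on every step, and that may be why nothing ever looks anomalous.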

Good point. OP should use the canned anomaly detection model params I suggested earlier.

Do you know what the ‘n’ and ‘w’ values are offhand? I’m having trouble finding them in there :sweat_smile:. I think I remember seeing defaults somewhere of 50 and 21 and thinking that ‘n’ seemed small. In my own work I’ve been getting better results with an ‘n’ of about 10x ‘w’.

I would use the RDSE like the anomaly model defaults to:

Hey @rhyolight!

Quick question on the RDSE. So I’m trying to implement it and getting the error:

#### Error in constructing RandomDistributedScalarEncoder encoder. Possibly missing some required constructor parameters. Parameters that were provided are: {'seed': 42, 'name': 'dist', 'numBuckets': 140}
Traceback (most recent call last):
 
  File "<ipython-input-12-6911e8f5dd5c>", line 1, in <module>
    runfile('/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py', wdir='/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym')
 
  File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
 
  File "/usr/lib/python2.7/dist-packages/spyderlib/widgets/externalshell/sitecustomize.py", line 81, in execfile
    builtins.execfile(filename, *where)
 
  File "/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py", line 203, in <module>
    TMrunModel(only_csv_files,train_files,test_files,plot=plot)
 
  File "/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py", line 169, in TMrunModel
    model = createModel(getModelParamsFromName(GYM_NAME))
 
  File "/home/sheiser1/nupic-master/examples/opf/clients/hotgym/anomaly/one_gym/5D_run_new.py", line 88, in createModel
    model = ModelFactory.create(modelParams)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/frameworks/opf/model_factory.py", line 85, in create
    return modelClass(**modelConfig['modelParams'])
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/frameworks/opf/htm_prediction_model.py", line 240, in __init__
    clParams, anomalyParams)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/frameworks/opf/htm_prediction_model.py", line 1125, in __createHTMNetwork
    encoder = MultiEncoder(enabledEncoders)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/encoders/multi.py", line 74, in __init__
    self.addMultipleEncoders(encoderDefinitions)
 
  File "/usr/local/lib/python2.7/dist-packages/nupic/encoders/multi.py", line 173, in addMultipleEncoders
    self.addEncoder(fieldName, eval(encoderName)(**fieldParams))
 
TypeError: __init__() got an unexpected keyword argument 'numBuckets'

I’m using this structure for the ‘encoders’ dictionary within modelParams:

    encoder_dict[field] = {"name": field,
                           "fieldname": field,
                           "numBuckets": 140.0,
                           "seed": 42,
                           "type": "RandomDistributedScalarEncoder"
                           }

I’m trying to imitate what you showed above but something seems missing - any intuition what it may be? Thanks again :slight_smile:

Here’s the whole MODEL_PARAMS dict too:

https://pastebin.com/81nRsy6v

I managed to run hotgym_anomaly.py from https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly. and the anomaly score apparently works. However, when I print(result), I find that ‘multiStepBestPredictions’ is missing, which means no prediction is available. How can I fix this?

Is it reasonable to write an anomaly detection program based on hotgym_anomaly.py?

Convert numBuckets into resolution. See:
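
A minimal sketch of that conversion, assuming the resolution is simply the value range divided by the desired number of buckets, with a small floor so it never reaches zero. The function names, the 0.001 floor, and the 0.0 to 280.0 range for the ‘dist’ field are all hypothetical; substitute the real minimum and maximum of your data. As the TypeError above suggests, the RandomDistributedScalarEncoder constructor takes ‘resolution’ rather than ‘numBuckets’.

```python
def resolution_from_buckets(min_val, max_val, num_buckets, min_resolution=0.001):
    """Turn a desired bucket count into an RDSE resolution.

    min_resolution is an assumed floor, not a value taken from this thread.
    """
    if num_buckets <= 0:
        raise ValueError("num_buckets must be positive")
    return max(min_resolution, float(max_val - min_val) / num_buckets)


def rdse_params(field, min_val, max_val, num_buckets, seed=42):
    """Build an encoder entry that passes 'resolution' instead of 'numBuckets'."""
    return {
        "name": field,
        "fieldname": field,
        "resolution": resolution_from_buckets(min_val, max_val, num_buckets),
        "seed": seed,
        "type": "RandomDistributedScalarEncoder",
    }


# Hypothetical range for the 'dist' field; replace with your data's min and max.
encoder = rdse_params("dist", min_val=0.0, max_val=280.0, num_buckets=140)
print(encoder)
```

The resulting dict can replace the entry in encoder_dict[field] shown earlier; the only structural change is the swapped key.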

See prediction hotgym example (and tutorial video) here.

See how to convert into an anomaly model (and tutorial video) here.

Thanks for the link! It really helps a lot.

I learned a lot from https://github.com/numenta/nupic.workshop/tree/master/part-1-scalar-input as well.

BTW, if I train for several epochs on the same data, will nupic see no anomalies in the last epoch?