Anomaly Detection - Poor results - Build issues or Tuning issues on Real Data

Dear HTM Anomaly Detection community,
I was wondering if anyone can shed some light on what I am doing wrong. I have been pulling my hair out over this for the last week.

I am getting poor results and not sure if it is due to tuning parameters or the library versions.


PROBLEM 1: GitHub build / version issues:
I started with htmengine, which depends on an old version of nupic (0.5.7). I was not able to install the preferred version, nupic 1.0.7.
So I have nta.utils 0.0.0 and htmengine 0.0.0.
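For reference, this is how I confirm which versions actually got installed (a quick sketch using setuptools' `pkg_resources`; the package names are the ones from my setup, adjust if yours differ):

```python
# Report the installed versions of the HTM-related packages, or flag
# them as missing. pkg_resources ships with setuptools.
import pkg_resources

for pkg in ("nupic", "htmengine", "nta.utils"):
    try:
        print("%s == %s" % (pkg, pkg_resources.get_distribution(pkg).version))
    except pkg_resources.DistributionNotFound:
        print("%s: not installed" % pkg)
```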

PROBLEM 2: results don't make sense.

I saw Matt's video on the hot gym and those results totally make sense, but on my data the results don't. I was wondering whether that is due to the libraries I used to build the tool, or the tunable params below:

Here is a screenshot of where the results don't make sense:

The predicted values are very close to the actual values, yet the anomaly likelihood goes high. This contradicts Matt's hot gym video, where the anomaly likelihood (and score) go high when the model's prediction and the actual value have a large differential.

Any clarification on what is going on would be highly appreciated. Is it the build itself, my tunable parameters, or something else that I am missing?


My parameters are:

{'aggregationInfo': {'days': 0,
                     'fields': [],
                     'hours': 0,
                     'microseconds': 0,
                     'milliseconds': 0,
                     'minutes': 0,
                     'months': 0,
                     'seconds': 0,
                     'weeks': 0,
                     'years': 0},
 'model': 'CLA',
 'modelParams': {'anomalyParams': {u'anomalyCacheRecords': None,
                                   u'autoDetectThreshold': None,
                                   u'autoDetectWaitRecords': None},
                 'clParams': {'alpha': 0.014695645742164247,
                              'regionName': 'SDRClassifierRegion',
                              'steps': '1',
                              'verbosity': 0},
                 'inferenceType': 'TemporalAnomaly',
                 'sensorParams': {'encoders': {u'network_latency': {'clipInput': True,
                                                                    'fieldname': 'network_latency',
                                                                    'maxval': 53.0,
                                                                    'minval': 0.0,
                                                                    'n': 102,
                                                                    'name': 'network_latency',
                                                                    'type': 'ScalarEncoder',
                                                                    'w': 21},
                                               u'timestamp_dayOfWeek': None,
                                               u'timestamp_timeOfDay': {'fieldname': 'timestamp',
                                                                        'name': 'timestamp',
                                                                        'timeOfDay': (21,
                                                                        'type': 'DateEncoder'},
                                               u'timestamp_weekend': None},
                                  'sensorAutoReset': None,
                                  'verbosity': 0},
                 'spEnable': True,
                 'spParams': {'columnCount': 2048,
                              'globalInhibition': 1,
                              'inputWidth': 0,
                              'maxBoost': 1.0,
                              'numActiveColumnsPerInhArea': 40,
                              'potentialPct': 0.8,
                              'seed': 1956,
                              'spVerbosity': 0,
                              'spatialImp': 'cpp',
                              'synPermActiveInc': 0.05,
                              'synPermConnected': 0.1,
                              'synPermInactiveDec': 0.1},
                 'tpEnable': True,
                 'tpParams': {'activationThreshold': 13,
                              'cellsPerColumn': 32,
                              'columnCount': 2048,
                              'globalDecay': 0.0,
                              'initialPerm': 0.21,
                              'inputWidth': 2048,
                              'maxAge': 0,
                              'maxSegmentsPerCell': 128,
                              'maxSynapsesPerSegment': 32,
                              'minThreshold': 9,
                              'newSynapseCount': 20,
                              'outputType': 'normal',
                              'pamLength': 2,
                              'permanenceDec': 0.1,
                              'permanenceInc': 0.1,
                              'seed': 1960,
                              'temporalImp': 'cpp',
                              'verbosity': 0},
                 'trainSPNetOnlyIfRequested': False},
 'predictAheadTime': None,
 'version': 1}
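As a quick sanity check on the ScalarEncoder settings above (plain Python, no nupic required): for a non-periodic ScalarEncoder my understanding is that the resolution is (maxval - minval) / (n - w), i.e. the smallest input change guaranteed to alter the encoding. With n=102, w=21 and a 0-53 range, changes in network_latency smaller than roughly 0.65 are essentially invisible to the model:

```python
# Back-of-the-envelope check of the ScalarEncoder settings above.
# Assumed formula for a non-periodic ScalarEncoder:
#   resolution = (maxval - minval) / (n - w)
minval, maxval = 0.0, 53.0
n, w = 102, 21

resolution = (maxval - minval) / (n - w)
buckets = n - w + 1  # number of distinct bucket positions

print("resolution: %.3f" % resolution)    # ~0.654
print("distinct buckets: %d" % buckets)   # 82
```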

Hi @dk25094, welcome!

This is what jumped out at me: all recent versions I've seen use 'HTMPrediction' instead of 'CLA' for the 'model' parameter.

In case you haven’t already I’d have a look at the model_params file for hotgym anomaly:

The Classifier is not involved here, so anomaly detection won't yield predicted values, just anomaly scores and likelihoods. You can still use a Classifier and it'll have no effect on the anomaly scores, but I wouldn't assume the system is predicting well just because the Classifier's 'predicted' and 'actual' values are close. I'm not totally familiar with the Classifier, but I feel I've seen this before: it may be that when the TM generates no predictions at all, the Classifier defaults to the last observed value, which would often make 'predicted' and 'actual' look close.
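Also keep in mind that anomaly likelihood is not the raw anomaly score: it measures how unusual the current score is relative to the recent distribution of scores. The sketch below is a deliberately simplified, hypothetical illustration of that idea (nupic's actual `AnomalyLikelihood` class uses more elaborate windowing and smoothing); it shows how even a modest raw score can yield a likelihood near 1.0 when the recent history has been very quiet, which may be part of what you're seeing:

```python
import math
from collections import deque

def tail_probability(x, mean, std):
    """One-sided Gaussian tail P(X >= x), via the complementary error function."""
    if std < 1e-6:
        std = 1e-6  # avoid division by zero on a perfectly flat history
    return 0.5 * math.erfc((x - mean) / (std * math.sqrt(2.0)))

class SimpleAnomalyLikelihood:
    """Toy likelihood: how surprising is a raw score given recent raw scores?"""

    def __init__(self, window=100):
        self.scores = deque(maxlen=window)

    def likelihood(self, raw_score):
        if len(self.scores) >= 10:
            mean = sum(self.scores) / len(self.scores)
            var = sum((s - mean) ** 2 for s in self.scores) / len(self.scores)
            lik = 1.0 - tail_probability(raw_score, mean, math.sqrt(var))
        else:
            lik = 0.5  # not enough history yet to judge
        self.scores.append(raw_score)
        return lik

det = SimpleAnomalyLikelihood()
for _ in range(50):              # long run of very quiet raw scores
    det.likelihood(0.02)
print(det.likelihood(0.30))      # modest raw score, but high likelihood here
```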

I'm not sure of the implications of your version issue, but my instinct is to try your setup on some well-understood data first. This could be the hotgym data itself, since we know what the output should look like when the system is behaving normally.