Doubt: Predicted field, anomaly Likelihood and multiple inputs

juanhorta · February 21, 2017, 3:13pm

Hello,

I’m working on a model with multiple inputs in order to get a global anomaly output, one that is not based on a specific input field but on the whole set.

I understand, as Matt says: "Anomaly output is for the entire input."

My encoders are as follows:

"encoders": {
    "timestamp_timeOfDay": null,
    "timestamp_dayOfWeek": null,
    "timestamp_weekend": null,

    "req301": { "name": "req301", "fieldname": "req301", "resolution": 0.01, "type": "RandomDistributedScalarEncoder" },
    "delta301": { "name": "delta301",  "fieldname": "req301",  "clipInput": true,  "forced": true,  "w": 21, "n": 100, "type": "DeltaEncoder"  },

    "req302": { "name": "req302", "fieldname": "req302", "w": 21, "resolution": 0.01, "type": "RandomDistributedScalarEncoder" },
    "delta302": { "name": "delta302",  "fieldname": "req302",  "clipInput": true,  "forced": true,  "w": 21, "n": 100, "type": "DeltaEncoder"  },

    "req303": { "name": "req303", "fieldname": "req303", "w": 21, "resolution": 0.01, "type": "RandomDistributedScalarEncoder" },
    "delta303": { "name": "delta303",  "fieldname": "req303",  "clipInput": true,  "forced": true,  "w": 21, "n": 100, "type": "DeltaEncoder"  },

    "req304": { "name": "req304", "fieldname": "req304", "w": 21, "resolution": 0.01, "type": "RandomDistributedScalarEncoder" },
    "delta304": { "name": "delta304",  "fieldname": "req304",  "clipInput": true,  "forced": true,  "w": 21, "n": 100, "type": "DeltaEncoder"  },

    "req305": { "name": "req305", "fieldname": "req305", "w": 21, "resolution": 0.01, "type": "RandomDistributedScalarEncoder" },
    "delta305": { "name": "delta305",  "fieldname": "req305",  "clipInput": true,  "forced": true,  "w": 21, "n": 100, "type": "DeltaEncoder"  },

    "req309": { "name": "req309", "fieldname": "req309", "w": 21, "resolution": 0.01, "type": "RandomDistributedScalarEncoder" },
    "delta309": { "name": "delta309",  "fieldname": "req309",  "clipInput": true,  "forced": true,  "w": 21, "n": 100, "type": "DeltaEncoder"  },

    "req310": { "name": "req310", "fieldname": "req310",  "w": 21, "resolution": 0.01, "type": "RandomDistributedScalarEncoder" },
    "delta310": { "name": "delta310",  "fieldname": "req310",  "clipInput": true,  "forced": true,  "w": 21, "n": 100, "type": "DeltaEncoder"  }
},

But I’ve found that anomalyLikelihood depends on the predicted field. I mean, if I run the model:

result = model.run({
    'timestamp': timestamp,
    'req301' : req301,
    'req302' : req302,
    'req303' : req303,
    'req304' : req304,
    'req305' : req305,
    'req309' : req309,
    'req310' : req310
})

anomalyLikelihood is different for these two cases:

model.enableInference({"predictedField": "req301"})

or

model.enableInference({"predictedField": "req310"})

Does this make sense? Any idea what I am doing wrong? (It doesn’t look any better when I add an average field and use it as the predicted field.)

Regards.

Juan

sheiser1 · February 22, 2017, 6:01am

I just want to second this question.

If you’re running NuPIC with the same 7 input fields and just changing which one is the predicted field, that shouldn’t affect the anomaly score, right?

In those two different cases where you have predicted field ‘req301’ and ‘req310’, do they produce different anomaly score values as well? My understanding is that the anomaly likelihood value is calculated directly from the anomaly scores, so it seems very weird to me that identical sets of anomaly scores would produce different anomaly likelihood values.

I’m curious for any explanation here as well.
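To make the point about likelihood being derived only from the scores concrete, here is a minimal sketch. It is a simplified stand-in, not NuPIC's AnomalyLikelihood implementation: the function name, the `short_window` parameter and the single Gaussian over the whole history are all assumptions made for illustration. The only input is the list of anomaly scores, so two runs with identical scores must give identical likelihoods.

```python
import math


def simple_anomaly_likelihood(scores, short_window=10):
    """Simplified likelihood for the LAST score in `scores` (illustrative only).

    Models the whole score history as a Gaussian and asks how unusual the
    mean of the most recent `short_window` scores is under that model.
    """
    n = len(scores)
    if n < 2:
        return 0.5  # not enough history to estimate a distribution

    mean = sum(scores) / float(n)
    variance = sum((s - mean) ** 2 for s in scores) / float(n)
    std = max(math.sqrt(variance), 1e-6)  # floor avoids division by zero

    recent = scores[-short_window:]
    recent_mean = sum(recent) / float(len(recent))

    # Gaussian tail probability Q(z) = 0.5 * erfc(z / sqrt(2))
    z = (recent_mean - mean) / std
    tail = 0.5 * math.erfc(z / math.sqrt(2.0))
    return 1.0 - tail
```

Nothing about the predicted field enters this calculation, which is why differing likelihoods would point back to differing anomaly scores.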

juanhorta · February 22, 2017, 1:29pm

I ran the same dataset through the same model, changing only the predicted field (req301 or req310).

Here is my doubt graphically: both the anomaly score and the anomaly likelihood are different between the two runs.

rhyolight · February 22, 2017, 5:25pm

I don’t know the answer to this question. Hoping another @committer might be able to help.

sheiser1 · February 22, 2017, 11:31pm

If I’m reading that output plot right it looks like the anomaly scores are different depending which field is marked as the predicted field. Does this make sense @rhyolight? Is it possible that the anomaly score is in some way affected by which field is marked as ‘predicted’? If the anomaly score from a multi-field model should be the same regardless of which field is ‘predicted’ then this result shouldn’t happen right? I just wonder if we can first establish that much.

subutai · February 22, 2017, 11:53pm

Yes, you’re right. If everything else is identical (random seeds, model parameters, exact data stream, etc.) then the anomaly score should not be affected by which field is predicted. Marking something predicted just affects the classifier, which is not used to compute the anomaly score.
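As a rough illustration of why the classifier should not matter, here is a sketch of the usual raw anomaly score definition in HTM: the fraction of currently active columns that were not predicted at the previous step. The function and argument names are hypothetical and this is not NuPIC's own code; the thing to notice is that no classifier output and no predicted field appear among the inputs.

```python
def raw_anomaly_score(active_columns, prev_predicted_columns):
    """Fraction of active columns that were NOT predicted one step earlier."""
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active, so nothing can be unexpected
    predicted = set(prev_predicted_columns)
    overlap = len(active & predicted)
    return 1.0 - overlap / float(len(active))
```

For example, with active columns [1, 2, 3, 4] and previously predicted columns [3, 4, 5], half of the active columns were unexpected, so the score is 0.5.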

sheiser1 · February 23, 2017, 12:20am

Ok good to know, thanks @subutai! I wonder in that case, do you think the differences in anomaly score have to do with those random seeds then? It seems that he’s running the exact same data with the same input fields and the same params, so would that be the only thing left to explain it? Thanks again,

– Sam

subutai · February 24, 2017, 8:50pm

Hmm, usually the random seeds are all fixed in our model params, but perhaps that is not the case here.

@juanhorta - If you run with the same predicted field twice, do you get identical results?
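One way to check this is to collect the anomaly scores from each run into a list and look for the first record where they part ways. The helper below is only a sketch with a hypothetical name; it assumes the scores have already been read from each run's output (for example from the anomalyScore column of the output CSV).

```python
def first_divergence(scores_a, scores_b, tolerance=0.0):
    """Return the index of the first differing score, or None if the runs match."""
    for i, (a, b) in enumerate(zip(scores_a, scores_b)):
        if abs(a - b) > tolerance:
            return i
    # One run produced more records than the other.
    if len(scores_a) != len(scores_b):
        return min(len(scores_a), len(scores_b))
    return None
```

A result of None for two runs with the same predicted field, and an early index for runs with different predicted fields, would narrow down where the behaviour starts to differ.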

juanhorta · February 27, 2017, 8:43am

Running twice with the same predicted field, I’m getting identical results:

Juan.

Jos · February 27, 2017, 2:27pm

Isn’t it possible that the RDSEncoders are slightly different? In that case the predicted field could be different, even when the input fields are the same? Because the different RDSEncoders could give different values for the same input values.

juanhorta · February 27, 2017, 6:43pm

Before opening this topic I considered the same possibility: maybe the random part of the RDSEncoder could be the origin. So I repeated the test using ScalarEncoder instead, but I obtained different anomaly scores again.

Juan

juanhorta · March 29, 2017, 10:54am

I think I found the cause of that behaviour: SDRClassifier.

When SDRClassifier is enabled:

"clEnable": true,
"clParams": {
  "implementation": "cpp",
  "alpha": 0.1,
  "verbosity": 0,
  "regionName": "SDRClassifierRegion",
  "steps": "1,5"
},

anomalyLikelihood depends on the predicted field. It doesn’t happen when SDRClassifier is disabled ("clEnable": false).

The problem is that I’m getting somewhat better results with SDRClassifier active.

As you can see in the next graphs:

rhyolight · March 29, 2017, 3:37pm

That’s interesting. Can you share your code? I think we should file a ticket for this, but having code that reproduces this is essential.
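A reproduction could be organised as a small grid: each predicted field combined with the classifier switched on and off. The sketch below only prepares the parameter dictionaries. The function name and the shape of the returned records are assumptions; the "modelParams" and "clEnable" keys are the ones used in the model params discussed in this thread. Each variant would then be passed to ModelFactory.create and model.enableInference as in the snippets above.

```python
import copy
import itertools


def build_variants(base_params, predicted_fields, classifier_flags=(True, False)):
    """Return one independent params copy per (predicted field, clEnable) pair."""
    variants = []
    for field, cl_enable in itertools.product(predicted_fields, classifier_flags):
        params = copy.deepcopy(base_params)  # never mutate the shared base
        params["modelParams"]["clEnable"] = cl_enable
        variants.append({
            "predictedField": field,
            "clEnable": cl_enable,
            "params": params,
        })
    return variants
```

With the two fields from this thread, build_variants(params, ["req301", "req310"]) yields four configurations, which is enough to show whether the difference appears only when the classifier is enabled.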

juanhorta · March 30, 2017, 7:23pm

Here you are!

The code is strongly inspired by

Launcher:

import ntpath
import os
import simplejson as json
import numpy as np
import pandas as pd
import time
from datetime import datetime

from nupic.algorithms import anomaly_likelihood
from nupic.data.inference_shifter import InferenceShifter
from nupic.frameworks.opf.modelfactory import ModelFactory

DATE_FORMAT = "%Y-%m-%d %H:%M:%S"


def getDataFrame(dataFilePath):
  df = pd.read_csv(dataFilePath, skiprows=3, names=['timestamp',
                                                     'avg',
                                                     'req301',
                                                     'req302',
                                                     'req303',
                                                     'req304',
                                                     'req305',
                                                     'req309',
                                                     'req310'])

  return df


def runDataThroughModel(model, dataFrame):
    shifter = InferenceShifter()
    anomalyLikelihood = anomaly_likelihood.AnomalyLikelihood()
    out = []


    for index, row in dataFrame.iterrows():
        timestamp = datetime.strptime(row["timestamp"], DATE_FORMAT)
        avg = float(row["avg"])
        req301 = float(row["req301"])
        req302 = float(row["req302"])
        req303 = float(row["req303"])
        req304 = float(row["req304"])
        req305 = float(row["req305"])
        req309 = float(row["req309"])
        req310 = float(row["req310"])

        result = model.run({
            'timestamp': timestamp,
            'avg': avg,
            'req301': req301,
            'req302': req302,
            'req303': req303,
            'req304': req304,
            'req305': req305,
            'req309': req309,
            'req310': req310
        })

        if index % 100 == 0:
            print time.strftime("%d %b %Y %H:%M:%S", time.localtime()) + " Read %i lines..." % index
        result = shifter.shift(result)
        resultOut = convertToWritableOutput(result, anomalyLikelihood)
        out.append(resultOut)

    return pd.DataFrame(out)

def convertToWritableOutput(result, anomalyLikelihood):
  timestamp = result.rawInput["timestamp"]
  avg = result.rawInput["avg"]
  req301 = result.rawInput["req301"]
  req310 = result.rawInput["req310"]


  inferences = result.inferences
  output = {
      "timestamp": timestamp,
      "avg": avg,
  }


  if "anomalyScore" in inferences and inferences["anomalyScore"] is not None:
    anomalyScore = inferences["anomalyScore"]
    output["anomalyScore"] = anomalyScore
    likelihood = anomalyLikelihood.anomalyProbability(avg, anomalyScore, timestamp)
    # Store the likelihood too; otherwise it is computed and then discarded.
    output["anomalyLikelihood"] = likelihood

  return output


def createAnomalyDetectionModel(dataFrame):
  with open(MODEL_PARAMS_PATH, "r") as dataIn:
    modelParams = json.loads(dataIn.read())


  model = ModelFactory.create(modelParams)
  model.enableInference({"predictedField": "req301"})
  return model


def main(inputPath):
    inputFileName = ntpath.basename(inputPath)
    dataFrame = getDataFrame(inputPath)
    model = createAnomalyDetectionModel(dataFrame)

    outputFrame = runDataThroughModel(model, dataFrame)

    outputFrame.to_csv(
        os.path.join('data', "anomaly_" + inputFileName),
        index=False
    )

if __name__ == "__main__":
  dataPath = 'data/dataset.csv'
  MODEL_PARAMS_PATH = 'model_params/multivariable.json'
  main(dataPath)

Model:

{
  "aggregationInfo": {
    "days": 0,
    "fields": [],
    "hours": 0,
    "microseconds": 0,
    "milliseconds": 0,
    "minutes": 0,
    "months": 0,
    "seconds": 0,
    "weeks": 0,
    "years": 0
  },
  "predictAheadTime": null,
  "version": 1,
  "model": "CLA",
  "modelParams": {

    "anomalyParams": {
            "anomalyCacheRecords": null,
            "autoDetectThreshold": null,
            "autoDetectWaitRecords": 1000
    },

    "clEnable": true,
    "clParams": {
      "implementation": "cpp",
      "alpha": 0.1,
      "verbosity": 0,
      "regionName": "SDRClassifierRegion",
      "steps": "1"
    },
    "inferenceType": "TemporalAnomaly",
    "sensorParams": {
      "encoders": {
        "timestamp_timeOfDay":  null,
        "timestamp_dayOfWeek": null,
        "timestamp_weekend": null,

        "avg": { "name": "avg", "fieldname": "avg", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta": { "name": "delta",  "fieldname": "avg",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  },

        "req301": { "name": "req301", "fieldname": "req301", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta301": { "name": "delta301",  "fieldname": "req301",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  },

        "req302": { "name": "req302", "fieldname": "req302", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta302": { "name": "delta302",  "fieldname": "req302",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  },

        "req303": { "name": "req303", "fieldname": "req303", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta303": { "name": "delta303",  "fieldname": "req303",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  },

        "req304": { "name": "req304", "fieldname": "req304", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta304": { "name": "delta304",  "fieldname": "req304",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  },

        "req305": { "name": "req305", "fieldname": "req305", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta305": { "name": "delta305",  "fieldname": "req305",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  },

        "req309": { "name": "req309", "fieldname": "req309", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta309": { "name": "delta309",  "fieldname": "req309",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  },

        "req310": { "name": "req310", "fieldname": "req310", "resolution": 2.00,"type": "RandomDistributedScalarEncoder" },
        "delta310": { "name": "delta310",  "fieldname": "req310",  "clipInput": true,  "forced": true,  "w": 41, "n": 2048, "type": "DeltaEncoder"  }
      },
      "sensorAutoReset": null,
      "verbosity": 0
    },
        "spEnable": true,
        "spParams": {
            "potentialPct": 0.8,
            "columnCount": 2048,
            "globalInhibition": 1,
            "inputWidth": 0,
            "maxBoost": 1.0,
            "numActiveColumnsPerInhArea": 40,
            "seed": 1956,
            "spVerbosity": 0,
            "spatialImp": "cpp",
            "synPermActiveInc": 0.003,
            "synPermConnected": 0.2,
            "synPermInactiveDec": 0.0005
        },
        "tpEnable": true,
        "tpParams": {
            "activationThreshold": 13,
            "cellsPerColumn": 32,
            "columnCount": 2048,
            "globalDecay": 0.0,
            "initialPerm": 0.21,
            "inputWidth": 2048,
            "maxAge": 0,
            "maxSegmentsPerCell": 128,
            "maxSynapsesPerSegment": 32,
            "minThreshold": 10,
            "newSynapseCount": 20,
            "outputType": "normal",
            "pamLength": 3,
            "permanenceDec": 0.1,
            "permanenceInc": 0.1,
            "seed": 1960,
            "temporalImp": "cpp",
            "verbosity": 0
        },

    "trainSPNetOnlyIfRequested": false
  }
}

Dataset (only a few rows, for confidentiality reasons):

timestamp,avg,req301,req302,req303,req304,req305,req309,req310
datetime,float,float,float,float,float,float,float,float
T,,,,,,,,
1970-11-05 17:15:00,7625.71,7484,8034,8036,7412,7647,7592,7175
1970-11-05 17:20:00,8048.86,8151,7976,8638,8201,8261,7890,7225

I hope it can help.
