Int Assertion Error

Hi,

I am getting the following error when running a swarm:

Generating experiment files in directory: /tmp/tmpQyrcyS...
Writing  313 lines...
Writing  114 lines...
done.
None
Traceback (most recent call last):
  File "./swarm.py", line 109, in <module>
    swarm(INPUT_FILE)
  File "./swarm.py", line 101, in swarm
    modelParams = swarmForBestModelParams(SWARM_DESCRIPTION, name)
  File "./swarm.py", line 78, in swarmForBestModelParams
    verbosity=0
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/swarming/permutations_runner.py", line 277, in runWithConfig
    return _runAction(runOptions)
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/swarming/permutations_runner.py", line 218, in _runAction
    returnValue = _runHyperSearch(runOptions)
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/swarming/permutations_runner.py", line 161, in _runHyperSearch
    metricsKeys=search.getDiscoveredMetricsKeys())
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/swarming/permutations_runner.py", line 825, in generateReport
    raise Exception(jobInfo.workerCompletionMsg)
Exception: E10002: Exiting due to receiving too many models failing from exceptions (6 out of 6). 
Model Exception: Exception occurred while running model 1152: AssertionError() (<type 'exceptions.AssertionError'>)
Traceback (most recent call last):
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/swarming/hypersearch/utils.py", line 435, in runModelGivenBaseAndParams
    (completionReason, completionMsg) = runner.run()
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/swarming/ModelRunner.py", line 236, in run
    maxTimeout=readTimeout)
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/data/stream_reader.py", line 201, in __init__
    bookmark, firstRecordIdx)
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/data/stream_reader.py", line 297, in _openStream
    firstRecord=firstRecordIdx)
  File "/home/komalydedhia/.local/lib/python2.7/site-packages/nupic-0.5.3.dev0-py2.7.egg/nupic/data/file_record_stream.py", line 234, in __init__
    FieldMetaType.integer)
AssertionError

The following is my swarm description:

SWARM_DESCRIPTION = {
  "includedFields": [
    {
      "fieldName": "startTime",
      "fieldType": "datetime"
    },
    {
      "fieldName": "flag",
      "fieldType": "string"
    },
    {
      "fieldName": "packets",
      "fieldType": "int",
      "maxValue": 20000,
      "minValue": 1
    },
    {
      "fieldName": "bytes",
      "fieldType": "int",
      "maxValue": 300000000,
      "minValue": 40
    },
    {
      "fieldName": "duration",
      "fieldType": "float",
      "maxValue": 55.0,
      "minValue": 0.0
    }
  ],
  "streamDef": {
    "info": "sample_ddos",
    "version": 1,
    "streams": [
      {
        "info": "Rec Center",
        "source": "file://3.csv",
        "columns": [
          "*"
        ]
      }
    ]
  },

  "inferenceType": "TemporalMultiStep",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "duration"
  },
  "iterationCount": 1,
  "swarmSize": "small"
}

My Data set:

type,sIP,dIP,sPort,dPort,startTime,endTime,flag,packets,bytes,duration
string,string,string,string,string,datetime,datetime,string,int,int,float
,,,,,T,T,C,,,
0,0.0.0.0,0.0.0.0,1234,5678,2009-01-15 13:01:12.615,2009-01-15 13:01:36.767,PA,2,777,24.152

Note: My swarm.py is the same as the one used in the oneHotGym example; I have only modified the input data file name.

Any help is appreciated.

Thanks and regards,
Komal.


Check your input data set for non-integer values in the packets and bytes columns.

Also, what are you trying to predict? And what timescale is this data at? If the interval between data points is sub-second or even sub-minute, I’m not sure that encoding a timestamp is going to help at all.

Hi,

I verified my data and it is correct.
Even with just one row of data I still get the assertion error.

startTime,flag,packets,bytes,duration
datetime,string,int,int,float
T,C,
2009-01-15 13:01:12.615,PA,2,777,24.152

I want to detect Distributed Denial of Service attacks using NuPIC and am planning to use NuPIC’s anomaly detection. However, I have started with prediction first.
I am not sure what “encoding a timestamp” means.
Here the data is at sub-second/sub-minute intervals, so what would be a good choice in this case?
Also, can you share a doc or explain the difference between temporal and non-temporal?

Thanks and regards,
Komal.

You don’t need to swarm for anomaly detection. See this example: https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly/one_gym (and video tutorial).

“This example code assumes a swarm has already been run against the input data (see One Hot Gym Prediction Tutorial for details). Model parameters therefore already exist in the model_params directory, and the only step to run this tutorial is to simply execute ./run.py”

Following are my queries:

  1. As per this doc, I can’t run anomaly detection without running a swarm. Please give your suggestion.

  2. Also, if the timestamp field has inconsistent intervals at sub-second or sub-millisecond resolution, will my anomaly detection work?

Thank you for helping me !

Regards,
Komal.

Sorry about that, I gave you a link to the wrong example for your case. In the example I posted, I was converting a prediction model into an anomaly model. We have an API somewhere for getting canned anomaly model parameters for scalar input, given min/max values, I think. Can someone help me find it? @alavin or @scott? I know we use it in our apps…

Hi,

I want to do anomaly detection on streaming data.
My data doesn’t have

  1. a date field, or
  2. any single answer for “predictedField”.

In this case, which NuPIC/HTM model should I use?

Thanks and regards,
Komal.

@Komal, I saw your new topic and moved it back into this thread. We’ll try to help you here.

You want to call this function. You can import it like this:

from nupic.frameworks.opf.common_models.cluster_params import getScalarMetricWithTimeOfDayAnomalyParams

Once you get the params, I suggest you change them to remove the datetime encoder and just keep the scalar encoding for your input field.
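
For example, here is a rough sketch (untested; I’m assuming the datetime encoders in the returned params are keyed "c0_timeOfDay" and "c0_dayOfWeek", so print them first to check):

from pprint import pprint

from nupic.frameworks.opf.common_models.cluster_params import (
  getScalarMetricWithTimeOfDayAnomalyParams)

# Canned anomaly model params for a single scalar metric.
# minVal/maxVal are the expected bounds of your scalar field.
params = getScalarMetricWithTimeOfDayAnomalyParams(
  metricData=[0],  # placeholder; the explicit bounds below are used
  minVal=0.0,
  maxVal=55.0)

encoders = params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]
pprint(encoders)  # see which encoders are actually in there

# Drop the time-of-day / day-of-week (datetime) encoders, keeping only
# the scalar encoder for the metric field.
for key in ("c0_timeOfDay", "c0_dayOfWeek"):
  encoders.pop(key, None)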


Thanks @rhyolight

I am trying to build my model using the oneHotGym anomaly/prediction example. However, I don’t have one specific answer for “predictedField”.
I will later be converting my model to run on real-time streaming data for anomaly detection. Can you share some sample code for anomaly detection which doesn’t require a “predictedField”?

If you use the getScalarMetricWithTimeOfDayAnomalyParams() function, you don’t need to identify a predictedField. That function will generate a Python dict which is a configuration object for NuPIC’s model. You can alter this object before creating a model with it; look inside it to explore.

https://github.com/numenta/numenta-apps/blob/master/unicorn/py/unicorn_backend/param_finder.py#L324

https://github.com/numenta/numenta-apps/blob/master/htmengine/htmengine/runtime/scalar_metric_utils.py#L97

@rhyolight I am getting the Int assertion error because the “C” (category) flag only accepts the INT data type and not STRING. I think this contradicts what is written in the doc: “Encodes a list of discrete categories (described by strings)…”

Hi,

I went through some code and made some changes.
I have the following queries:

  1. What are the min and max values of getScalarMetricWithTimeOfDayAnomalyParams used for?
  2. How do I determine the min and max values for my model?
  3. I understand “c0” is the datetime field, but wouldn’t NuPIC generate it automatically? How should I pass “c0” as input while running the model? I am getting ValueError: Unknown field name ‘c0’ in input record.

Thanks and regards,
Komal.

Hi Komal,
The min and max values are the expected lower and upper bounds on your data values. For example, if my data is temperature readings from the office, I can confidently set these bounds to something like 50F to 90F. The min and max values are not required (for the random distributed scalar encoder used in those parameters), but certainly help with the encodings.
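
For example (just a sketch), with the temperature bounds above the call would look something like:

params = getScalarMetricWithTimeOfDayAnomalyParams(
  metricData=[0],  # placeholder data; the bounds below are used instead
  minVal=50.0,
  maxVal=90.0)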

I don’t understand your 3rd question, but my assumption is there is a mismatch between the field name specified in the parameters and the headers in the data file.

Cheers,
Alex

Delete that encoder from the model params entirely before using it to create a new model. This will remove the date encoding and the “c0” field (you might need to look through the other params to ensure there are no other “c0” references; you can print them all to stdout with from pprint import pprint; pprint(params)).

Then when you pass each row of data into the model, exclude the “c0” field and data.
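
Roughly like this (a sketch, not a tested program; I’m assuming the scalar field in the canned params is keyed "c1", and the ModelFactory import path may differ slightly between NuPIC versions):

# `params` is the dict returned by getScalarMetricWithTimeOfDayAnomalyParams,
# with the "c0_timeOfDay" and "c0_dayOfWeek" encoders already removed.
from nupic.frameworks.opf.modelfactory import ModelFactory

model = ModelFactory.create(modelConfig=params["modelConfig"])
model.enableLearning()
model.enableInference(params["inferenceArgs"])

# Feed rows with only the scalar field -- no "c0"/timestamp value at all.
for value in (24.152, 31.7, 18.4):
  result = model.run({"c1": value})
  print result.inferences["anomalyScore"]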

Thanks @alavin. So since I have 4 fields, I can’t give a single definite answer for the min and max values, correct?

@rhyolight Columns “c0” and “c1” are present in https://github.com/numenta/nupic/blob/master/src/nupic/frameworks/opf/common_models/anomaly_params_random_encoder/best_single_metric_anomaly_params_cpp.json, which is the default set of params returned by getScalarMetricWithTimeOfDayAnomalyParams.
So are you suggesting I modify that file itself?

No, you can modify the dict object that getScalarMetricWithTimeOfDayAnomalyParams is passing back to you when you call it.

Each field needs to be encoded. The getScalarMetricWithTimeOfDayAnomalyParams function will get you a complete set of model params with one scalar encoding and one datetime encoding. You can take these params and modify them programmatically, or even just save them as JSON and modify them in a file. Then use those params (with your modifications) to create a new model using ModelFactory.create(), like in our examples. It just takes a dict, so you can change the model params into whatever you need.

So if you want to get model params for each of 4 fields, you could call that function for each scalar field with the proper min/max values and extract the encoding parameters for each one into another dict you’ll use to create the model.
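
In code, that could look something like the sketch below (assumptions: the scalar encoder in each returned params dict is keyed "c1" with "fieldname"/"name" entries, and the field names and bounds are just examples taken from your data; verify the actual structure with pprint first):

# Build one combined encoders dict by calling the canned-params function
# once per scalar field.
fields = {
  "packets": (1, 20000),
  "bytes": (40, 300000000),
  "duration": (0.0, 55.0),
}

base = None
encoders = {}
for name, (minVal, maxVal) in fields.items():
  p = getScalarMetricWithTimeOfDayAnomalyParams(
    metricData=[0], minVal=minVal, maxVal=maxVal)
  if base is None:
    base = p  # use the first set of params as the base model config
  enc = p["modelConfig"]["modelParams"]["sensorParams"]["encoders"]["c1"]
  enc["fieldname"] = name
  enc["name"] = name
  encoders[name] = enc

# Swap the combined encoders into the base config (this also drops the
# datetime encoders) and point the predicted field at one of your fields.
base["modelConfig"]["modelParams"]["sensorParams"]["encoders"] = encoders
base["inferenceArgs"]["predictedField"] = "duration"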

@rhyolight Thanks a lot, now I understand.
Just one more thing: if I don’t have a proper timestamp field, I can’t use TemporalAnomaly and have to use NonTemporalAnomaly instead, correct?

I don’t think so, but I really don’t know. Try TemporalAnomaly first. If that causes an error, try the other.
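
If you do need to switch, it’s just a change to the params dict before creating the model, something like this sketch (I’m assuming inferenceType sits at this path in the returned config; check the exact value spelling against NuPIC’s InferenceType enum):

params["modelConfig"]["modelParams"]["inferenceType"] = "NontemporalAnomaly"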