std::bad_alloc runtime error in swarming

Hi,

I encountered a std::bad_alloc runtime error when I tried to run swarming.
The error occurred after running for tens of minutes.

My key configuration was as follows:
MaxWorker=4
swarmSize=medium
input file size ~= 80 MB (~180,000 records)
number of included fields = 4 (one of them is the predicted field)

I’m working on a Win7 64-bit desktop with 16 GB of RAM.

The error does not occur when I set swarmSize to small.

Has anybody faced this problem before?
What can be done to avoid this error?

Thanks.

That’s a lot of data to run swarming over, so this sounds like a memory problem. There is a way to limit the amount of data processed by each model run during the swarm: change "iterationCount": -1 to "iterationCount": 3000 to limit each model to only 3000 rows of data. This might help.
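To make that concrete, here is a minimal sketch of the change in swarm_description.py (only the two relevant keys are shown; the rest of the real swarm description stays as-is):

```python
# Sketch of the suggested change; SWARM_DESCRIPTION here stands in for the
# full dict in your swarm_description.py file.
SWARM_DESCRIPTION = {
    # ... includedFields, streamDef, inferenceType, inferenceArgs, etc. ...
    "iterationCount": -1,       # -1 means "use every row" (~180,000 here)
    "swarmSize": "medium",
}

# Cap each model run at 3000 rows instead of the whole dataset.
SWARM_DESCRIPTION["iterationCount"] = 3000
```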

Thank you rhyolight!

Then, limiting it to 3000 rows assumes that the first 3000 rows contain enough of a pattern to extract, right?
Or are the 3000 rows automatically sampled from anywhere in the entire dataset?

OK, the same error is also observed during anomaly detection on the same dataset (180,000 records) with more than 10 fields at the same time.

I understand now that the problem is definitely insufficient memory.
Do you think this problem could be solved with a system with much higher memory capacity (currently 16 GB; increase it to maybe 128 GB)?

Or, could you please give us some general guidelines for using HTM without memory issues, beyond what you commented above? For example:

  • recommended spec of a computing system (in terms of memory)
  • number of columns that HTM can handle simultaneously

Thanks.

10 fields is a lot. The more fields you use, the larger the input space for the spatial pooler. Can I see your swarm parameters? What are the column dimensions for your spatial pooler, and how many cells per column in the TM settings?
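For reference, a hedged sketch of where those two settings typically live in a NuPIC-style model params dict (key names follow the usual NuPIC OPF layout — "tpParams" in older versions; the values shown are the common defaults, not a recommendation):

```python
# Fragment of a NuPIC-style model params dict, showing only the two knobs
# asked about. Values are illustrative defaults, not tuned recommendations.
MODEL_PARAMS = {
    "modelParams": {
        "spParams": {
            "columnCount": 2048,   # spatial pooler column dimensions
        },
        "tpParams": {
            "cellsPerColumn": 32,  # temporal memory cells per column
        },
    }
}
```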

I slightly modified the swarm_description.py file and did not specify parameters like column dimensions and number of cells for now (attached below).
Here I included only 7 fields, but there are actually more.
What I want to do is predict the value of TargetF (device fault label) at time (t + n), and it is likely that the temporal patterns in F1–F6 up to time t collaboratively influence the TargetF value.
But at this moment, I don’t know which field is the most important one.
That’s why I would like to include as many fields as possible and let swarming prioritize their importance.
Do you think this approach is feasible?

Thanks.

SWARM_DESCRIPTION = {
  "includedFields": [
    {
      "fieldName": "Modified_Time",
      "fieldType": "datetime"
    },
    {
      "fieldName": "F1",
      "fieldType": "float"
    },
    {
      "fieldName": "F2",
      "fieldType": "string"
    },
    {
      "fieldName": "F3",
      "fieldType": "float"
    },
    {
      "fieldName": "F4",
      "fieldType": "string"
    },
    {
      "fieldName": "F5",
      "fieldType": "float"
    },
    {
      "fieldName": "F6",
      "fieldType": "float"
    },
    {
      "fieldName": "TargetF",
      "fieldType": "string"
    }
  ],
  "streamDef": {
    "info": "sac_0A",
    "version": 1,
    "streams": [
      {
        "info": "SAC Stream",
        "source": "file://0A_test.txt",
        "columns": [
          "*"
        ]
      }
    ]
  },

  "inferenceType": "TemporalMultiStep",
  "inferenceArgs": {
    "predictionSteps": [
      1
    ],
    "predictedField": "TargetF"
  },
  "iterationCount": -1,
  "swarmSize": "small"
}