Why doesn't the HTM algorithm follow the time series when it predicts?


#1

When I used the HTM algorithm for prediction, I ran into a problem. My dataset has two columns, dttm and value, where dttm is the timestamp and value is the column I want to predict. The timestamps increase in one-second steps. I finished the code and the predictions were almost correct. Then I made a change to the dttm column: there are 3,500 data rows in total; the timestamps of the first 3,000 rows still increase by one second, and for the remaining rows the timestamps increase by 4 seconds. I expected the algorithm to keep predicting according to the corresponding time, but it does not. The forecast results are visualized as follows:
[image: chart of the forecast results]
I do not think this result is good. It suggests that the algorithm's prediction is based on the sequence of values itself, rather than on the timestamps.

Here is my parameter configuration:

SWARM_CONFIG = {
  "includedFields": [
    {
      "fieldName": "dttm",    # CHANGED
      "fieldType": "datetime"
    },
    {
      "fieldName": "value",  # CHANGED
      "fieldType": "float",
      "minValue": -1,
      "maxValue": 1
    }
  ],
  "streamDef": {
    "info": "test",
    "version": 1,
    "streams": [
      {
        "info": "small_test.csv", # CHANGED
        "source": "file://small_test.csv", # CHANGED
        "columns": [
          "*"
        ],
        # Note: the last_record field specifies how many records to run. Leave this OUT to run
        # on the whole file. Leave this to 100 if you want to do quicker runs for debugging.
        "last_record": 5000
      }
    ],
    "aggregation": {
      "hours": 0,
      "microseconds": 0,
      "seconds": 1,
      #"fields": [],
      "fields": [
        [
          "value", # CHANGED
          "sum"
        ],
        # Note: The lines referring to the field 'gym' which is
        # no longer present have been removed
        [
          "dttm", # CHANGED
          "first"
        ]
      ],
      "weeks": 0,
      "months": 0,
      "minutes": 0,
      "days": 0,
      "milliseconds": 0,
      "years": 0
    }
  },
  "inferenceType": "TemporalAnomaly",
  "inferenceArgs": {
    "predictionSteps": [
      3
    ],
    "predictedField": "value" # CHANGED
  },
  #"iterationCount": -1,
  "swarmSize": "medium"
}

And here is the data around the point where the interval changes:

1970-01-01 08:49:54,-0.3681245526846894
1970-01-01 08:49:55,-0.3090169943749561
1970-01-01 08:49:56,-0.2486898871648605
1970-01-01 08:49:57,-0.1873813145857273
1970-01-01 08:49:58,-0.125333233564332
1970-01-01 08:49:59,-0.06279051952933808
1970-01-01 08:50:00,-2.1558735510086122e-14
1970-01-01 08:50:04,0.24868988716484627
1970-01-01 08:50:08,0.48175367410171877
1970-01-01 08:50:12,0.6845471059286802
1970-01-01 08:50:16,0.8443279255020004
1970-01-01 08:50:20,0.9510565162951491
1970-01-01 08:50:24,0.9980267284282714
1970-01-01 08:50:28,0.9822872507286919
1970-01-01 08:50:32,0.9048270524660215
1970-01-01 08:50:36,0.770513242775784

The interval was originally one second, then became 4 seconds.
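For anyone who wants to reproduce this, here is a minimal sketch that regenerates rows of the same shape. It rests on assumptions inferred from the sample rows above, not on my actual script: the value is taken to be sin(2πt/100) with t in seconds, and the time origin is taken to be 1970-01-01 08:00:00. The name `make_rows` and its parameters are made up for illustration.

```python
import math
from datetime import datetime, timedelta


def make_rows(n_fast=3000, n_slow=500, slow_step=4, period=100):
    """Build (timestamp, value) rows: n_fast rows 1 s apart, then n_slow rows slow_step s apart."""
    start = datetime(1970, 1, 1, 8, 0, 0)  # assumed time origin
    rows = []
    t = 0  # elapsed seconds since the assumed origin
    for i in range(n_fast + n_slow):
        t += 1 if i < n_fast else slow_step
        stamp = (start + timedelta(seconds=t)).strftime("%Y-%m-%d %H:%M:%S")
        # Assumed signal: a sine wave with a 100-second period.
        rows.append((stamp, math.sin(2 * math.pi * t / period)))
    return rows


rows = make_rows()
for stamp, value in rows[2997:3003]:
    print(stamp, value)
```

Note that under this assumption the value depends on the real elapsed time, so once the step becomes 4 seconds each row moves four times further along the wave than before.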

So I suspect there is a problem in my configuration, but I don't know how to fix it.

Thanks


#2

It is hard to tell how bad your results are from that chart. But anyway I wouldn’t expect it to predict as well on the data after you make a major structure change like this. If you let it run for 5000 records or so with the new time interval, prediction results should get better, because it will get used to it.

You’ve basically taught NuPIC two patterns here, and it’s not going to predict the new pattern well at all until it has seen enough data points to lock it in and forget about the first pattern.
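To make the "two patterns" point concrete, here is a rough sketch in plain Python (no NuPIC involved). It assumes the value column is a sine wave with a 100-second period, which is only inferred from the sample rows in the first post; `row_values` and `max_jump` are hypothetical helper names. It compares how far the value moves from one row to the next at each sampling step.

```python
import math


def row_values(step, n=200, period=100):
    """Values of the assumed sine wave sampled every `step` seconds."""
    return [math.sin(2 * math.pi * step * i / period) for i in range(n)]


def max_jump(values):
    """Largest absolute change between two consecutive rows."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


fast = max_jump(row_values(step=1))  # rows one second apart
slow = max_jump(row_values(step=4))  # rows four seconds apart
print("largest row-to-row change at 1 s:", fast)
print("largest row-to-row change at 4 s:", slow)
```

Under these assumptions the row-to-row changes are roughly four times larger in the second part of the file, so the sequence of values the model sees there is genuinely different from the one it learned on.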


#3

Thanks for your answer. I am a beginner with this algorithm. Could I go further on this issue?
The chart above does show that the algorithm fits the old pattern, but this is not what I want. I originally thought the algorithm's prediction was based on the column that represents time (that is, the first of the two columns), but that does not seem to be the case; it seems that the algorithm makes predictions based on the predicted column itself. Am I correct?

Thank you very much


#4

In NuPIC, many fields might contribute to prediction. That is why adding a timestamp to your data can help, because then NuPIC can associate data with time of day, day of week, etc.
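As a rough illustration of what "time of day" and "day of week" mean for this dataset, the sketch below derives both from a timestamp string using only the standard library. This is not NuPIC's date encoder, and `time_features` is a hypothetical helper; it only shows the kind of information a timestamp carries at that granularity.

```python
from datetime import datetime


def time_features(stamp):
    """Derive coarse calendar features from a 'YYYY-MM-DD HH:MM:SS' string."""
    dt = datetime.strptime(stamp, "%Y-%m-%d %H:%M:%S")
    hours = dt.hour + dt.minute / 60.0 + dt.second / 3600.0
    return {"time_of_day_hours": hours, "day_of_week": dt.weekday()}  # Monday is 0


a = time_features("1970-01-01 08:50:00")
b = time_features("1970-01-01 08:50:04")
print(a)
print(b)
print("difference in hours:", b["time_of_day_hours"] - a["time_of_day_hours"])
```

Two rows four seconds apart differ by only about a thousandth of an hour and fall on the same weekday, so features this coarse are unlikely to tell a one-second step from a four-second step.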