About Timestamps in OPF & Reset

question

#1

Hi all,
I’m confused about the timestamps in OPF.

I have data with a time interval of 1 minute, but there are some points where the date jumps.
(This means that data collection failed for a few days, which can happen in the real world.)

timestamp, other fields exist but not shown here
2016-08-16 20:54:00
2016-08-16 20:55:00
2016-08-16 20:56:00
2016-08-16 20:57:00
2016-08-16 20:58:00
2016-08-16 20:59:00
2016-08-16 21:00:00
2016-08-16 21:01:00
2016-08-16 21:02:00
2016-08-16 21:03:00
2016-09-12 00:01:00
2016-09-12 00:02:00
2016-09-12 00:03:00
2016-09-12 00:04:00
2016-09-12 00:05:00
2016-09-12 00:06:00

Using this data, I ran OPF (anomaly detection and 1-step prediction) and got anomaly scores and predicted values.

1) Predicted value
In this case, what is the meaning of the next value predicted from the values up to 2016-08-16 21:03:00?
I guess it’s the value predicted for 2016-08-16 21:04:00, which is not what I wanted to get.

Then, what about the value predicted from the values up to 2016-09-12 00:01:00? Is it for 2016-09-12 00:02:00?
Is my understanding right?

2) Reset
To my understanding, I could use a “reset” at 2016-09-12 00:01:00.
Does “reset” discard everything learned before 2016-09-12 00:01:00 and start learning from scratch?
Or does it just ignore the transition from 2016-08-16 21:03:00 to 2016-09-12 00:01:00 and resume learning?

The predicted values are the same whether or not I call model.resetSequenceStates() at the beginning of the 10/13 data, but the raw anomalyScore and anomalyLikelihood changed as shown below.

Result with reset called on 2016-10-13 11:47
timestamp         actualValue  predictedValue  rawAnomalyScore  anomalyLikelihood
2016-10-10 15:36  6.3          6.279269193     0.025            0.840672164
2016-10-10 15:37  6.3          6.285488435     0.025            0.539804591
2016-10-10 15:38  6.1          6.289841904     0                0.651632888
2016-10-10 15:39  6.3          6.292889333     0.025            0.651632888
2016-10-10 15:40  6.3          6.235022533     0.025            0.651632888
2016-10-13 11:47  7.2          6.254515773     1                0.730254637
2016-10-13 11:48  7.4          6.268161041     0.875            0.931660019
2016-10-13 11:49  7.2          6.547712729     0.25             0.952377002
2016-10-13 11:50  7.4          6.80339891      0                0.931660019

Result without reset
timestamp         actualValue  predictedValue  rawAnomalyScore  anomalyLikelihood
2016-10-10 15:36  6.3          6.279269193     0.075            0.605100109
2016-10-10 15:37  6.3          6.285488435     0.025            0.503793428
2016-10-10 15:38  6.1          6.289841904     0.025            0.616145098
2016-10-10 15:39  6.3          6.292889333     0.025            0.605454426
2016-10-10 15:40  6.3          6.235022533     0                0.616145098
2016-10-13 11:47  7.2          6.254515773     0.625            0.656029388
2016-10-13 11:48  7.4          6.268161041     0.3              0.742986019
2016-10-13 11:49  7.2          6.547712729     0.275            0.809383258
2016-10-13 11:50  7.4          6.80339891      0                0.733915274

For this kind of analysis, which one should I use?
I’m also curious why the predicted values did not change at all.

3) Aggregation in swarming
What if I want to aggregate data into 5-minute blocks?
Does NuPIC automatically do this only for records with consecutive timestamps, or does it collapse any five consecutive records regardless of the actual timestamps (e.g. a mix of data from 8/16 and 9/12)?

Thank you in advance.


#2

Hello, I also ran into this question, and I do not know why. My current opinion is that the algorithm just makes predictions based on the predicted column. Also, do you know what aggregationInfo means?

thank you


#3

I guess I never responded to this initially. Sorry!

It depends on your model params. If you are predicting 1 step ahead, the prediction will be for one minute in the future. If you’re predicting 5 steps ahead, 5 minutes in the future. This is only valid if all the previous data intervals have been at 1 minute.
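The timing logic above can be sketched with plain datetime arithmetic (this is not NuPIC code, just an illustration of what an n-step prediction refers to under a fixed 1-minute cadence, which is exactly the assumption a multi-day gap breaks):

```python
from datetime import datetime, timedelta

def prediction_target(last_timestamp, steps, interval=timedelta(minutes=1)):
    """Timestamp that an n-step-ahead prediction refers to,
    assuming every record arrives at a fixed interval."""
    return last_timestamp + steps * interval

t = datetime(2016, 8, 16, 21, 3)
print(prediction_target(t, 1))  # 2016-08-16 21:04:00
print(prediction_target(t, 5))  # 2016-08-16 21:08:00
```

When the next actual record is 2016-09-12 00:01:00 instead of 21:04:00, the “1 step = 1 minute” interpretation no longer holds for that prediction.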

It is trying to predict both the time and the scalar value, but the prediction right after the jump won’t be good, because the interval was abnormal. See this topic for more info:

I would not reset in this case. A reset is like manually telling the algorithm that a sequence just ended, and it should sever any current segments that are trying to learn the next point. It ends the sequence. You would only do this if there were a cycle you could identify in the input that was not already described by time itself. What I mean is you don’t have to reset for end of day, or hourly, because those semantics are already a part of the encoding.

You might want to reset if the input data is associated with a machine that just went through a cycle, and was starting again. More info on this here.
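If you did decide a gap should end a sequence, the feed loop could look like the sketch below. The gap threshold, field names, and the loop itself are my assumptions; resetSequenceStates() is the OPF model call mentioned earlier in the thread, and run() is assumed to accept a record dict as OPF models do.

```python
from datetime import datetime, timedelta

EXPECTED_INTERVAL = timedelta(minutes=1)  # assumed data cadence

def run_with_gap_resets(model, records):
    """Feed (timestamp, value) records to a model, resetting sequence
    state whenever the timestamp jumps by more than the expected
    interval. `model` is assumed to expose run() and
    resetSequenceStates(), like an OPF model."""
    prev_ts = None
    for ts, value in records:
        if prev_ts is not None and ts - prev_ts > EXPECTED_INTERVAL:
            model.resetSequenceStates()  # sever the sequence at the gap
        model.run({"timestamp": ts, "value": value})
        prev_ts = ts
```

Per the advice above, though, this is usually unnecessary: the date encoder already carries the time semantics, so a reset only makes sense for cycle boundaries that time itself does not describe.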

My suggestion around aggregation has always been to do it manually, not rely on the swarming process to do it for you. You’ll have better success this way. How you aggregate depends greatly on the structure of your data. The main rule is to chart the aggregated data and see if you, as a human, can see patterns in the chart. If so, NUPIC will be more successful. If you cannot find any patterns, maybe there are none to find at that aggregation level.
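A minimal sketch of doing that aggregation manually, in plain Python (the mean is just one choice of aggregation function, and the record shape is an assumption). Note that because records are bucketed by timestamp, data from 8/16 and 9/12 naturally land in separate buckets rather than being collapsed together:

```python
from collections import OrderedDict
from datetime import timedelta
from statistics import mean

def aggregate_5min(records):
    """Group (timestamp, value) records into 5-minute buckets keyed by
    the bucket's start time, averaging the values in each bucket."""
    buckets = OrderedDict()
    for ts, value in records:
        # floor the timestamp to the start of its 5-minute bucket
        start = ts - timedelta(minutes=ts.minute % 5,
                               seconds=ts.second,
                               microseconds=ts.microsecond)
        buckets.setdefault(start, []).append(value)
    return [(start, mean(values)) for start, values in buckets.items()]
```

You can then chart the output of this step and check, by eye, whether patterns survive the aggregation before feeding it to a model.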


#4

Can we manually aggregate through the model params here, like setting ‘minutes’ to 5 for the 5-minute aggregation? Or do you mean modifying the data file itself in preprocessing? Thanks

'aggregationInfo': {'days': 0,
                    'fields': [],
                    'hours': 0,
                    'microseconds': 0,
                    'milliseconds': 0,
                    'minutes': 0,
                    'months': 0,
                    'seconds': 0,
                    'weeks': 0,
                    'years': 0}

#5

I’ve never used the aggregationInfo section of the model params, myself. I’ve always manually aggregated my data.