How does multi-step prediction work on irregular time series data?
For example:
I have data in the format below (the time gap between two events is not fixed).
How does the current algorithm calculate future steps?
Is it based on previous time gaps and values?
What will the time gap be between the current step and a future step (say P4)? Is it 4 * (average of previous time gaps), or does Numenta use a different statistical approach?
Date/time information is encoded along with the scalar value, thus associating that value with a particular timestamp. You can see how this is done in the HTM School videos for datetime encoding and SP input space.
So if you provide a timestamp for each point, it does not matter if the interval is irregular. However, if you don’t provide the timestamp, it could matter a lot more, because the data points are assumed to arrive at regular intervals.
I’m not sure that is a valid assumption. It really depends on the data it has seen and the timestamps at which it saw them. If it has only seen the sequence once or twice, it will have a worse prediction. However, it should generalize, so the same values in a sequence at approximately the same times will be recognized.
I’m not sure that Matt’s suggestion will work, because HTM assumes that data arrives at regular time intervals. Putting both a datetime and a data value into a MultiEncoder is treated the same as putting two scalar values into a MultiEncoder. I think NuPIC should handle datetime as a special kind of data.
About the datetime encoder:
Why don’t we convert the datetime to UTC time and then use a scalar encoder to encode it?
What are the advantages of the datetime encoder over a scalar encoder (after converting the datetime to UTC time)?
Are you referring to how the DateEncoder uses periodic scalar encoders? Yes, we use the scalar encoder because it’s easy, but we are not encoding numbers, we’re encoding meaning: the meaning of the calendar that we humans all choose to use. It’s a hack, because we humans learn this calendar through years of experience. But the rules of the calendar aren’t that complex, especially when the encoding difference between Feb 28 and Jan 1 is negligible, and sometimes it doesn’t matter if Nov 23 and Nov 28 are both encoded as “the end of a month”. The DateEncoder’s semantics are also configurable, so if you want to give more meaning to “day of week” than “time of day”, it’s easy.
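To make the periodic point concrete, here is a minimal sketch in plain Python. It does not use NuPIC: `encode_scalar` and `overlap` are hypothetical helpers written only for illustration, and the sizes (`n=100` bits, `w=11` active bits) are arbitrary assumptions. It compares how many active bits Dec 31 and Jan 1 share under a wrapping (periodic) encoding versus a plain linear one.

```python
def encode_scalar(value, min_val, max_val, n=100, w=11, periodic=False):
    """Toy scalar encoder: returns the set of active bit indices.

    Illustrative only; this is NOT NuPIC's ScalarEncoder.
    """
    span = float(max_val - min_val)
    if periodic:
        # Bits wrap around, so max_val lands on the same position as min_val.
        start = int((value - min_val) / span * n) % n
        return {(start + i) % n for i in range(w)}
    # Non-periodic: the w-wide window slides over n - w + 1 positions.
    start = int(round((value - min_val) / span * (n - w)))
    return {start + i for i in range(w)}


def overlap(a, b):
    """Number of active bits two encodings share."""
    return len(a & b)


# Day of year: Dec 31 is day 365, Jan 1 is day 1, Jul 1 is day 182.
dec31_periodic = encode_scalar(365, 1, 366, periodic=True)
jan1_periodic = encode_scalar(1, 1, 366, periodic=True)
jul1_periodic = encode_scalar(182, 1, 366, periodic=True)

dec31_linear = encode_scalar(365, 1, 366)
jan1_linear = encode_scalar(1, 1, 366)

print("periodic Dec 31 vs Jan 1:", overlap(dec31_periodic, jan1_periodic))
print("periodic Dec 31 vs Jul 1:", overlap(dec31_periodic, jul1_periodic))
print("linear   Dec 31 vs Jan 1:", overlap(dec31_linear, jan1_linear))
```

With the wrapping encoder the two adjacent days share most of their bits, while the linear encoder puts them at opposite ends of the range with nothing in common. That shared-bit overlap is what carries the calendar meaning; a single ever-growing UTC number has no such wrap-around.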
You are right, it is special because it is a contextual feature of data, not the data itself. In the neocortex, date-time semantics are almost certainly not acquired this way. It’s a brain hack to provide this data as a feature of the input data instead.
You might notice I used the term “hack” a lot in this post, and it’s because I’m talking about encoders, and they are not very biological. They are honestly all hacks, or kludgy first attempts at semantic SDR encoding. I look forward to more encoder hacks because I can see there are so many ways you might encode semantics into an SDR stream.
I’m not sure if my ramblings above gave any insight into this, but I don’t think the time zone matters at all, as long as there is consistency.
@rhyolight thank you for your explanation. By encoding the meaning of the datetime together with the sensor data, I believe we can handle irregular timestamps. But if I remember correctly, in earlier discussions about using neuromorphic sensors with NuPIC, some people, including you, said that NuPIC does not support the irregular timestamps of those sensors.
Could you please confirm this again?
Can we use datetime to handle fast sensor data (with time steps in milliseconds or microseconds)? If not, can you suggest a solution?
If you input rows with irregular timestamps between them, that is fine as long as you are feeding the datetime as one of the fields in the row, because you’re encoding the time along with the data.
But if you have irregular data rows with NO datetime semantics encoded in the input row, the model will get confused because it can’t tell the time context for each row.
@rhyolight For my applications, all of our data has a time interval of milliseconds. For encoding the timestamp with my data, I think in principle there are two ways:
1. Use two scalar encoders, one for the timestamp and one for my scalar data, and input the union of the two SDRs into the CLA.
2. Use the DateEncoder, modified to handle timestamps with seconds and milliseconds, plus a scalar encoder for my data.
The second way looks better because it uses the meaning of time. What do you think?
What is the typical interval between data points? For subsecond streams, I don’t think encoding date/time will help predictions, because the DateEncoder focuses on much larger concepts of time like hours in a day.
Those events are triggered by human activity, so there is no interval. Also, time is very important for that use case because human activity is very time-based. So we certainly need to encode timestamp info for these events.
For rogue behavior detection, it really depends on human activity. A human might go home over the weekend, so there would be no activity for over 48 hours, but then all during the week there could be constant activity during work hours.
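As a rough illustration of the time context that matters for human-driven events, the sketch below pulls a few calendar features out of a raw timestamp using only the standard library. `time_features` and its field names are made up for this example and are not NuPIC's API; they simply mirror the kinds of semantics mentioned in this thread (time of day, day of week, weekend).

```python
from datetime import datetime


def time_features(ts):
    """Derive simple calendar features from a datetime (hypothetical helper)."""
    day = ts.weekday()  # Monday is 0, Sunday is 6
    return {
        "time_of_day": ts.hour + ts.minute / 60.0,  # hours since midnight
        "day_of_week": day,
        "is_weekend": day >= 5,
    }


# Two events with the same clock time but very different human context.
saturday_night = time_features(datetime(2017, 3, 4, 23, 30))
monday_night = time_features(datetime(2017, 3, 6, 23, 30))

print(saturday_night)
print(monday_night)
```

Each of these features could then be encoded alongside the event value, so that a login at 23:30 on a Saturday and one at 23:30 on a Monday produce different input patterns even though the gap between events is irregular.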
Hi all! First post. I just wanted to pose a potentially naive question on this topic. If we have irregular timestamps associated with the scalar data, what does a 1-, 5-, or 10-step prediction look like? In the hot gym example, the prediction is based on the 1-hour interval in the CSV data file, so a step is 1 hour ahead, 5 hours ahead, and so on. Thanks!
The model treats records as if they are at regular intervals so you’d need to aggregate your data into regular periods, which would also give meaning to your prediction steps.
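One possible way to do that aggregation before feeding the model is sketched below in plain Python. The `aggregate` helper is hypothetical (it is not part of NuPIC), and it makes two assumptions you may want to change: values in the same period are averaged, and periods with no events are kept with a value of `None` so that the output stays evenly spaced.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate(records, period):
    """Bucket (timestamp, value) pairs into fixed-length periods.

    Returns a list of (period_start, mean_value) with one entry per period,
    starting at the earliest timestamp. Empty periods get mean_value None.
    """
    records = sorted(records)
    if not records:
        return []
    origin = records[0][0]
    buckets = defaultdict(list)
    for ts, value in records:
        # timedelta // timedelta gives the integer index of the period.
        buckets[(ts - origin) // period].append(value)
    regular = []
    for i in range(max(buckets) + 1):
        values = buckets.get(i)
        mean = sum(values) / len(values) if values else None
        regular.append((origin + i * period, mean))
    return regular


# Irregularly spaced events (made-up values for illustration).
events = [
    (datetime(2017, 1, 1, 10, 5), 2.0),
    (datetime(2017, 1, 1, 10, 20), 4.0),
    (datetime(2017, 1, 1, 11, 50), 6.0),
    (datetime(2017, 1, 1, 13, 10), 8.0),
]
regular = aggregate(events, timedelta(hours=1))
for start, mean in regular:
    print(start, mean)
```

After this step every row is exactly one hour after the previous one, so a 5-step prediction means 5 hours ahead. How to fill the empty periods (skip them, carry the last value forward, or use zero) depends on what the data represents.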