Data sampling frequency/interval questions

Hi everyone,

I’m trying to use NuPIC to detect anomalies in very fast data that arrives at irregular intervals (500ms, 1s, 2s, 10s) and that can be really noisy (order book changes).

Here are my questions:

  • Should I use a sliding window of N seconds (which will produce data at irregular intervals)?
  • Should I use a tumbling window of N seconds (which may miss some anomalies that happen within those N seconds)?

Also, as you may know, this type of data can be really noisy. I was thinking of using a sliding window of 1 minute, but this will still produce data at irregular intervals: a sliding window emits a new aggregate at every new datapoint, and those datapoints don’t arrive at regular intervals.
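To make the difference between the two options concrete, here is a minimal sketch in plain Python (standard library only, not NuPIC code). The function names `tumbling_mean` and `sliding_mean` and the choice of a mean as the aggregate are assumptions made for illustration. It shows that the tumbling window emits on bucket boundaries, while the sliding window emits at the (irregular) timestamps of the incoming points.

```python
from collections import deque


def tumbling_mean(points, width):
    """Average values per fixed, non-overlapping bucket of `width` seconds.

    `points` is an iterable of (timestamp_seconds, value) sorted by time.
    Returns a list of (bucket_start, mean); empty buckets are skipped.
    """
    out = []
    bucket_start = None
    total, count = 0.0, 0
    for ts, value in points:
        start = (ts // width) * width  # bucket this point falls into
        if bucket_start is not None and start != bucket_start:
            out.append((bucket_start, total / count))
            total, count = 0.0, 0
        bucket_start = start
        total += value
        count += 1
    if count:
        out.append((bucket_start, total / count))
    return out


def sliding_mean(points, width):
    """Emit one mean per incoming point over the trailing `width` seconds.

    Output timestamps are the input timestamps, so irregular input
    produces irregular output.
    """
    window = deque()
    total = 0.0
    out = []
    for ts, value in points:
        window.append((ts, value))
        total += value
        # Drop points that are `width` seconds old or older.
        while window[0][0] <= ts - width:
            total -= window.popleft()[1]
        out.append((ts, total / len(window)))
    return out
```

With points at 0s, 0.5s, 1.5s, 3.5s and 13.5s and a 2 second width, `tumbling_mean` returns one row per non-empty 2 second bucket, while `sliding_mean` returns one row per input point at exactly those five irregular timestamps.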

I also saw that the OPF has a window aggregation option (HTM Studio has one too), but I guess this uses a tumbling window, right? Is there a built-in option for sliding windows? If not, and if those windows would be useful, I could contribute one so that this processing doesn’t have to be done externally.

Thanks!

I don’t know how to solve this with NuPIC today. Does anyone else have ideas?

The problem is that it is hard to label very fast data with time encoding that has meaning. So you can’t encode the time and have the system understand it well. The fact that data is close to the end or beginning of a 300ms period is useless to encode.

So if you can’t encode time into the representation, you have to have regular intervals to build a consistent model of reality. Imagine if your brain suddenly started to get randomly chosen intervals of input data. You would totally freak out.

Oh I see, your comparison made me understand.

So when speed is not an issue but the interval is, the best thing to do is to use a “hybrid” window: a window of length N that emits an output at a fixed step n (for example, a 1 minute window that outputs every second).
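A window like this is often called a hopping window. Here is a rough sketch of it in plain Python (standard library only, not a NuPIC feature). The name `hopping_mean`, the use of a mean, and the choice to carry the last value forward when the window is empty are all assumptions made for illustration; the point is only that the output lands on a regular grid of `step` seconds even when the input is irregular.

```python
import math
from collections import deque


def hopping_mean(points, length, step):
    """Mean over the trailing `length` seconds, emitted every `step` seconds.

    `points` is an iterable of (timestamp_seconds, value) sorted by time.
    Ticks start at the first multiple of `step` at or after the first point
    and stop at the last point. Assumes length >= step. If no point falls
    in the window at a tick, the previous mean is carried forward
    (an arbitrary choice for this sketch).
    """
    points = list(points)
    if not points:
        return []
    out = []
    window = deque()
    total = 0.0
    last = None
    i = 0
    first_tick = math.ceil(points[0][0] / step) * step
    end = points[-1][0]
    k = 0
    while first_tick + k * step <= end:
        tick = first_tick + k * step  # computed from k to avoid drift
        # Pull in every point that has arrived by this tick.
        while i < len(points) and points[i][0] <= tick:
            window.append(points[i])
            total += points[i][1]
            i += 1
        # Drop points that have fallen out of the trailing window.
        while window and window[0][0] <= tick - length:
            total -= window.popleft()[1]
        if window:
            last = total / len(window)
        out.append((tick, last))
        k += 1
    return out
```

For the example in the post this would be called as `hopping_mean(points, length=60, step=1)`, and the resulting rows could then be fed to the model one per second.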

I have heard several times that data can be “too fast”. For example, in one of the demos with the jury, one jury member said that an interval of 30s was too fast, but I didn’t understand why.