What should I do about gaps of various intervals in the timestamps of my CSV?


Firstly, I’d like to say I’m really happy I found HTM Theory and this community! I started having similar ideas and concepts all the way back in 2006 (about when you guys started?), and I always knew that we like-minded “hierarchical thinkers” would come together into a community some day :slight_smile:

As my first practical step using HTM, I’m simply attempting to load a CSV I created into HTM Studio. The timestamp is a Unix timestamp in milliseconds. However, the rows don’t occur at a fixed interval. The spacing is quite random (at the millisecond/second resolution), and I’ve also combined both 1-minute-resolution data and 1-second-resolution data in the same CSV. A row is only added to the CSV when one of the columns changes.

My questions are:

  • Does it matter to the anomaly detection model if there are such gaps or randomness in interval?
  • Does it matter to swarming or the prediction model if there are such gaps or randomness in interval?
  • What can I do to change my data to better suit HTM Studio, if the gaps/randomness are indeed a problem?

For example, I’ve learned that with machine learning algorithms, when you don’t have any new data for a given timestep, you basically need to copy the data from the last timestep into the current one, so that the algorithm still sees the last known data point.
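To make that concrete, here is a minimal sketch of the copy-forward idea in plain Python. It assumes rows of `(timestamp_ms, value)` sorted by time; the function name `forward_fill` and the `step_ms` parameter are made up for illustration and are not part of HTM Studio or NuPIC.

```python
def forward_fill(rows, step_ms):
    """Resample (timestamp_ms, value) rows onto a regular grid.

    Each grid point carries the most recent value seen at or before it.
    `rows` must be sorted by timestamp. Hypothetical helper, for illustration.
    """
    if not rows:
        return []
    filled = []
    i = 0
    last_value = rows[0][1]
    t = rows[0][0]
    end = rows[-1][0]
    while t <= end:
        # Consume every observation up to and including grid time t.
        while i < len(rows) and rows[i][0] <= t:
            last_value = rows[i][1]
            i += 1
        filled.append((t, last_value))
        t += step_ms
    return filled


rows = [(0, 1.0), (2500, 2.0), (3000, 3.0)]
print(forward_fill(rows, 1000))
```

Note that with a coarse step, values that change between two grid points (like the one at 2500 ms above) are overwritten before they are ever emitted.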

The CSV file: https://pastebin.com/raw/j7mjCVXw


Thanks for the kind words. :slight_smile:

Probably not. Remember that time is part of the encoding, so interval won’t really matter. There is only one dimension of time for HTM, same as us… the dimension of each moment coming after the previous one. But this dimension is not necessarily linear. As you have noted, sensors may not have regular sampling intervals when gathering data. Chemicals in the brain may also affect how much energy is dedicated to processing certain important moments (fight or flight).

For extracting predictions, however, random intervals make the problem much harder, because we have to think about these moments and predict what comes next. To do this, we have to understand the function of time in the equation. If it is linear (all data points are equally spaced), this makes the problem easy. But if there is any randomness, the problem becomes much, much harder.

Don’t use swarming if you are doing scalar anomaly detection. Swarming is tuned to a predicted field, and getting the best predicted values for that particular field. It doesn’t translate well to anomaly detection, IMO. We have a set of model params we use for scalar anomaly detection, I would start there.

You can pre-process your data by aggregating it into a regular interval. There are many ways to do this, which are outside the scope of this conversation. (If someone has advice, please break off a thread from this post.)
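As one possible starting point, here is a rough sketch of bucketing irregular rows into fixed intervals using only the Python standard library. It assumes rows of `(timestamp_ms, close, volume)` sorted by time, keeps the last `close` in each bucket, and sums `volume`. Those aggregation choices, the column layout, and the name `aggregate` are assumptions for illustration; summing only makes sense if `volume` is a per-row amount rather than a running total.

```python
def aggregate(rows, interval_ms):
    """Group (timestamp_ms, close, volume) rows into fixed-width buckets.

    Returns one (bucket_start_ms, last_close, total_volume) tuple per
    non-empty bucket, in time order. Hypothetical helper, for illustration.
    """
    buckets = {}
    for ts, close, volume in rows:
        start = ts - ts % interval_ms  # floor to the bucket boundary
        if start in buckets:
            # Later rows overwrite close; volume accumulates.
            buckets[start] = (close, buckets[start][1] + volume)
        else:
            buckets[start] = (close, volume)
    return [(start, c, v) for start, (c, v) in sorted(buckets.items())]


rows = [(1000, 10.0, 1), (1500, 11.0, 2), (61000, 12.0, 3)]
print(aggregate(rows, 60000))
```

Buckets with no rows are simply skipped here, so the output can still have gaps; filling them (for example by carrying the last value forward) would be a separate step.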

But if you just want anomalies, it won’t be a problem. I just ran your data through HTM Studio, and it seems to work:

At least I assumed this was the anomaly you were expecting it to find? It saw it both in volume and close.