I have a question about applying HTM to real data. What if my data is not exactly regularly spaced, i.e. the measurements arrive approximately every 30 sec (sometimes 29, 30, or 31 sec)? The data comes from a real process and has an obvious daily seasonal pattern. Does it need to be aggregated to some extent? Thanks
It shouldn’t be a problem as long as the intervals vary only slightly, but it may be a good idea to aggregate to a larger time interval anyway. The system will learn patterns faster the shorter they are (in number of samples), so if there’s a daily pattern that can be observed by sampling every 30 minutes, it will be learned much faster than if sampled every 30 seconds. If there are different patterns that are only visible minute by minute, that would be a reason to keep the 30-second sample rate, but if larger time chunks are still viable I’d do that.
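If it helps, here is a minimal sketch of that kind of aggregation using only the Python standard library. It assumes the readings are available as `(timestamp, value)` pairs with naive `datetime` timestamps and that averaging is an acceptable way to summarize each bucket. The function name `aggregate_mean` and the 30-minute default are just illustrative choices, not anything from HTM itself.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_mean(samples, bucket=timedelta(minutes=30)):
    """Average (timestamp, value) pairs into fixed-width time buckets.

    Slightly irregular spacing (29 s, 30 s, 31 s, ...) does not matter here,
    because each sample is simply assigned to the bucket it falls into.
    """
    bucket_s = int(bucket.total_seconds())
    epoch = datetime(1970, 1, 1)
    sums = defaultdict(lambda: [0.0, 0])  # bucket start -> [running total, count]

    for ts, value in samples:
        offset = int((ts - epoch).total_seconds())
        # Round the timestamp down to the start of its bucket.
        start = epoch + timedelta(seconds=offset - offset % bucket_s)
        acc = sums[start]
        acc[0] += value
        acc[1] += 1

    # One (bucket start, mean value) pair per non-empty bucket, in time order.
    return [(start, total / count)
            for start, (total, count) in sorted(sums.items())]
```

The resulting evenly spaced series is what would then be fed to the model in place of the raw 30-second readings. Empty buckets are simply skipped here, so gaps in the data would need separate handling.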
Here’s the source for the datetime encoder. By default it isn’t set up to handle sub-minute sampling, so if you do keep the 30-second rate it would need some tweaking (in the ScalarEncoder it calls on line 226), or you could even drop that encoder entirely and feed in only the raw data.
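As a rough illustration of what the tweak would involve, the sketch below computes a time-of-day value that keeps the seconds, so two samples 30 seconds apart get different values. This is plain Python, not the encoder's own code. The helper name `time_of_day_hours` is hypothetical, and the resolution of whatever scalar encoder consumes this value would still need to be fine enough to tell such samples apart.

```python
from datetime import datetime


def time_of_day_hours(ts):
    """Return the time of day as fractional hours in [0, 24), including seconds."""
    return ts.hour + ts.minute / 60.0 + ts.second / 3600.0


# Two readings 30 seconds apart map to distinct values.
a = time_of_day_hours(datetime(2020, 1, 1, 12, 30, 0))
b = time_of_day_hours(datetime(2020, 1, 1, 12, 30, 30))
```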