Hi all,
I’m new to HTM and would like your advice on anomaly detection in multivariate data.
The data looks like this:
timestamp, category, value
03/04/21 07:10, CAT1, 5
03/04/21 07:10, CAT2, 2
03/04/21 07:10, CAT3, 3
03/04/21 07:10, CAT4, 7
03/04/21 07:10, CAT5, 1
03/04/21 07:15, CAT1, 9
03/04/21 07:15, CAT2, 3
03/04/21 07:15, CAT3, 4
03/04/21 07:15, CAT4, 2
03/04/21 07:15, CAT5, 6
03/04/21 07:20, CAT1, 2
03/04/21 07:20, CAT2, 21
03/04/21 07:20, CAT3, 44
03/04/21 07:20, CAT4, 4
03/04/21 07:20, CAT5, 12
I experimented with the hotgym example in htm.core (Python 3.7) and applied it to anomaly detection for a single category, using only the timestamp and value columns of the chosen category.
Now I’m having trouble understanding how to apply that example to multiple categories.
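A minimal sketch of a first step, assuming the plain CSV layout above and only Python's standard library: split the long-format rows into one time-ordered (timestamp, value) series per category, which is the same shape the single-category hotgym setup consumed. The day/month/year timestamp format is an assumption; switch `TIME_FORMAT` to `"%m/%d/%y %H:%M"` if the dates are month-first.

```python
import csv
import io
from collections import defaultdict
from datetime import datetime

# The sample rows from the post above.
SAMPLE = """timestamp, category, value
03/04/21 07:10, CAT1, 5
03/04/21 07:10, CAT2, 2
03/04/21 07:10, CAT3, 3
03/04/21 07:10, CAT4, 7
03/04/21 07:10, CAT5, 1
03/04/21 07:15, CAT1, 9
03/04/21 07:15, CAT2, 3
03/04/21 07:15, CAT3, 4
03/04/21 07:15, CAT4, 2
03/04/21 07:15, CAT5, 6
03/04/21 07:20, CAT1, 2
03/04/21 07:20, CAT2, 21
03/04/21 07:20, CAT3, 44
03/04/21 07:20, CAT4, 4
03/04/21 07:20, CAT5, 12
"""

# Assumption: day/month/year. Use "%m/%d/%y %H:%M" if the data is month-first.
TIME_FORMAT = "%d/%m/%y %H:%M"


def group_by_category(text):
    """Split long-format rows into one (timestamp, value) series per category."""
    series = defaultdict(list)
    # skipinitialspace drops the blank after each comma, in the header too.
    reader = csv.DictReader(io.StringIO(text), skipinitialspace=True)
    for row in reader:
        ts = datetime.strptime(row["timestamp"], TIME_FORMAT)
        series[row["category"]].append((ts, float(row["value"])))
    # Sort each series by time so it can be fed to a model in order.
    for points in series.values():
        points.sort(key=lambda p: p[0])
    return dict(series)


series = group_by_category(SAMPLE)
print([v for _, v in series["CAT2"]])  # [2.0, 3.0, 21.0]
```

Each series can then go through the same per-record loop as the single-category example.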
I have a few days of data at 1-minute intervals for over 1000 categories, which amounts to over 5 million records in the above format. The number of categories is not fixed, as new categories may appear over time.
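Because categories can appear at any time, one option is to create each category's model lazily, the first time that category is seen. In this sketch, `ModelRegistry` and `PlaceholderModel` are hypothetical names, and `PlaceholderModel` is NOT an HTM model: it only scores the absolute change from the previous value so the example runs without htm.core. The idea is to swap its `compute` for the per-record encode, Spatial Pooler and Temporal Memory step from the hotgym example. Whether 1000+ full models fit in memory depends on the model parameters and is worth checking early.

```python
class PlaceholderModel:
    """Stand-in for a per-category HTM model (NOT htm.core).

    Scores each value by its absolute change from the previous value.
    Hypothetical: replace compute() with the hotgym-style encode ->
    SpatialPooler -> TemporalMemory step and return its anomaly score.
    """

    def __init__(self):
        self.previous = None

    def compute(self, timestamp, value):
        # timestamp is unused here; an HTM model would encode it too.
        value = float(value)
        score = 0.0 if self.previous is None else abs(value - self.previous)
        self.previous = value
        return score


class ModelRegistry:
    """Keeps one model per category, created on first sight of that category."""

    def __init__(self, factory):
        self.factory = factory  # any zero-argument callable returning a model
        self.models = {}

    def score(self, category, timestamp, value):
        if category not in self.models:
            self.models[category] = self.factory()
        return self.models[category].compute(timestamp, value)


registry = ModelRegistry(PlaceholderModel)
# CAT1 values from the sample; CAT6 is a hypothetical category appearing later.
stream = [
    ("07:10", "CAT1", 5),
    ("07:15", "CAT1", 9),
    ("07:20", "CAT1", 2),
    ("07:25", "CAT6", 7),
]
scores = [registry.score(cat, ts, val) for ts, cat, val in stream]
print(sorted(registry.models))  # ['CAT1', 'CAT6']
print(scores)  # [0.0, 4.0, 7.0, 0.0]
```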
My use case: I need to find anomalies within each category, and also detect when any category deviates from the rest.
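For the second part (a category deviating from the rest), one simple option is a statistical post-processing step outside HTM: at each timestamp, compare every category's anomaly score (or raw value) with the cross-category median using the modified z-score, 0.6745 * |x - median| / MAD, and flag categories above a threshold. The function name, the 3.5 threshold and the example scores are illustrative assumptions, not anything provided by htm.core.

```python
import statistics


def deviating_categories(scores, threshold=3.5):
    """Return categories whose score is far from the cross-category median.

    Uses the modified z-score 0.6745 * |x - median| / MAD, where MAD is the
    median absolute deviation across all categories at one timestamp.
    """
    median = statistics.median(scores.values())
    deviations = {cat: abs(s - median) for cat, s in scores.items()}
    mad = statistics.median(deviations.values())
    if mad == 0:
        # Most categories share the exact same score; flag any that differ at all.
        return sorted(cat for cat, d in deviations.items() if d > 0)
    return sorted(
        cat for cat, d in deviations.items() if 0.6745 * d / mad > threshold
    )


# Illustrative anomaly scores for one timestamp (made-up values, not real output).
scores = {"CAT1": 0.10, "CAT2": 0.12, "CAT3": 0.90, "CAT4": 0.08, "CAT5": 0.11}
print(deviating_categories(scores))  # ['CAT3']
```

The median and MAD are used instead of mean and standard deviation so that a single extreme category does not inflate the baseline it is compared against.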
Can anyone guide me on handling multivariate data in htm.core (preferably with a sample or suggested settings), and on how I should structure my data?
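If one row per timestamp is more convenient (for example, to step all categories forward together), the long rows can be pivoted into a wide layout. This sketch assumes timestamps are already parsed to `datetime` (day/month/year, as above) and that the category list is known at pivot time; categories absent at a timestamp become `None`. The rows are a subset of the sample, with CAT3 kept only at 07:20 to mimic a newly appearing category. Whether a single HTM model over 1000+ fields is practical is a separate question worth testing.

```python
from datetime import datetime
from itertools import groupby


def pivot_wide(rows, categories):
    """Turn long (timestamp, category, value) rows into one row per timestamp.

    Each output row is (timestamp, values), where values follows the order of
    `categories` and holds None for categories with no record at that time.
    """
    rows = sorted(rows, key=lambda r: r[0])  # groupby needs rows sorted by key
    wide = []
    for ts, group in groupby(rows, key=lambda r: r[0]):
        values = {cat: val for _, cat, val in group}
        wide.append((ts, [values.get(cat) for cat in categories]))
    return wide


def t(minute):
    # Assumes 03/04/21 means 3 April 2021 (day/month/year).
    return datetime(2021, 4, 3, 7, minute)


rows = [
    (t(10), "CAT1", 5), (t(10), "CAT2", 2),
    (t(15), "CAT1", 9), (t(15), "CAT2", 3),
    (t(20), "CAT1", 2), (t(20), "CAT2", 21), (t(20), "CAT3", 44),
]
categories = sorted({cat for _, cat, _ in rows})
wide = pivot_wide(rows, categories)
for ts, values in wide:
    print(ts.strftime("%H:%M"), values)
# 07:10 [5, 2, None]
# 07:15 [9, 3, None]
# 07:20 [2, 21, 44]
```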
Thanks for any guidance.