I am building a system whose data has three columns: a timestamp, a value, and a category. Will the HTM anomaly detection system work if I simply create a MultiEncoder over all three columns, or do I have to split the time series by category before feeding it in? There could be on the order of 10^4 categories.
Anomaly detection should work either way. See the details of the anomaly score calculation in the API docs for more.
Having so many categories might be a problem. At a minimum you’ll need 10^4 bits in the input space, and I don’t like having only one bit represent a discrete data point. I would rather spread it out so several bits are used. Is there any way to generalize over the categories to narrow them down?
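To make the "several bits per category" idea concrete, here is a minimal sketch that hashes each category name to a small set of active bits in a fixed-width input space. It is not NuPIC's category encoder; the function name `encode_category` and the sizes `n=2048` and `w=21` are assumptions chosen only for illustration. Because the mapping is pseudo-random, two categories share few or no bits, so it expresses no similarity between them.

```python
import hashlib

def encode_category(name, n=2048, w=21):
    """Hypothetical helper: map a category name to w distinct active-bit
    indices in [0, n). Deterministic, so the same name always yields the
    same bits."""
    active = set()
    counter = 0
    while len(active) < w:
        # Hash the name with a counter to draw successive pseudo-random indices.
        digest = hashlib.sha256(f"{name}:{counter}".encode("utf-8")).digest()
        active.add(int.from_bytes(digest[:8], "big") % n)
        counter += 1
    return sorted(active)

a = encode_category("app-0001")
b = encode_category("app-0002")
overlap = len(set(a) & set(b))  # number of bits the two categories share
print(len(a), len(b), overlap)
```

With this scheme the input width stays fixed as categories are added, instead of growing by one bit per category.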
No, each of them is independent of the others, and over time (years) their number will keep increasing.
Are some of these categories more important to monitor than others?
Those categories are actually different applications on our IT infrastructure. We offer users a drop-down to select the application, so all are equally important; moreover, an anomaly in any one of them will trigger an alert.
Also, multiple applications may be active and report a value at the same time. So it is not the timestamp alone that is unique but the pair <timestamp, category>. Do you think this use case can be handled using HTM?
I wonder if pure volumes would be a pattern. Are there common values that can be accumulated across categories? Perhaps they have patterns.
Could you provide more information about each “category”? What I mean is that there may be a better way to encode that data than treating each one as unique. There must be some similarities between categories, right?
The categories are various applications running on a server. We are designing a system to monitor various parameters related to those applications. An application may be added over time or decommissioned. I am using the flink-htm implementation because the data is streamed continuously.
As far as I understand, though, each application performs a unique operation and there is no pattern among them. They do run on a limited number of hosts, however, and one application may run on more than one host.
It sounds like if you were to use HTM for this problem, you would need one HTM model for each category, which is likely not possible. I’m not sure this is a good fit until we have better HTM hardware options.
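As a rough illustration of what "one model for each category" means for a stream keyed by <timestamp, category>, the sketch below routes each record to a model created lazily for its category. `PlaceholderModel` is a stand-in that uses a rolling z-score, not HTM, and `ModelRegistry`, `process`, and `decommission` are hypothetical names; the point is only the routing. With ~10^4 categories this implies ~10^4 live models, which is the cost concern raised above.

```python
from collections import deque
from statistics import mean, pstdev

class PlaceholderModel:
    """Stand-in for a per-category model. NOT HTM: a rolling z-score."""
    def __init__(self, window=50):
        self.history = deque(maxlen=window)

    def score(self, value):
        # Score against past values only, then remember the new value.
        if len(self.history) < 2:
            result = 0.0
        else:
            mu = mean(self.history)
            sigma = pstdev(self.history)
            result = abs(value - mu) / sigma if sigma > 0 else 0.0
        self.history.append(value)
        return result

class ModelRegistry:
    """Keeps one model per category, created on first sight."""
    def __init__(self, factory=PlaceholderModel):
        self.factory = factory
        self.models = {}

    def process(self, timestamp, category, value):
        if category not in self.models:
            self.models[category] = self.factory()
        return self.models[category].score(value)

    def decommission(self, category):
        # Drop the model when an application is retired.
        self.models.pop(category, None)

registry = ModelRegistry()
records = [
    (1, "app-a", 10.0), (1, "app-b", 200.0),
    (2, "app-a", 10.5), (2, "app-b", 201.0),
    (3, "app-a", 9.5),  (3, "app-b", 500.0),
]
scores = [(ts, cat, registry.process(ts, cat, v)) for ts, cat, v in records]
```

New applications get a model automatically on their first record, and `decommission` frees the model for a retired one.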
Yes, this was the limitation I was facing when I thought of encoding the category as well. Anyway, I will see if there is some similarity among the categories and get back in a day or so.