Multivariate data

Hi all,

I have a question related to multivariate data: I need to find anomalies in 2000+ sensors with different characteristics, and some of the sensors may have a trend in them.

Will HTM handle the trend and still find the anomalies, or do we need to remove the trend first?

Also, do we need 2000+ HTM models? And how long might it take for a time series with 200,000 points?

Are they sampling the same physical phenomenon or are they independent? Is it the same type of signal source for each sensor (e.g. temperature) or are there multiple modalities?

I think removing the trend would be good if possible. HTM does well with periodic data, so if detrending makes the signal less drifty and more periodic, that helps.
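For example, here's a minimal detrending sketch in Python, assuming pandas and a hypothetical sensor_001.csv file with a value column (the rolling window size is illustrative):

```python
import pandas as pd

# Load one sensor's series (file and column names are hypothetical)
series = pd.read_csv("sensor_001.csv")["value"]

# First-order differencing removes a linear trend; the differenced
# series is what you'd feed to the model instead of the raw values
detrended = series.diff().dropna()

# Alternatively, subtract a rolling mean to remove slow drift
# while keeping shorter-period structure intact
detrended_rolling = series - series.rolling(window=288, min_periods=1).mean()
```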

You certainly can't fit 2000+ sensor values into a single model, so yes, one model per sensor is the better way to scale. If there are clusters of sensors that are highly related to each other, you could build fewer models that each contain several sensor values (for instance, 400 models with 5 sensors each). I think one model per sensor is the easiest way.

Also, if certain groups of sensors are measuring very similar things, there may be a lot of shared signal among them. If that's true, you could drop whatever sensors are redundant, but that requires some exploratory analysis beforehand.
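One quick way to do that exploratory pass, assuming the sensors are columns of a single pandas DataFrame (the file name, sample size, and 0.95 threshold are all illustrative):

```python
import pandas as pd

# df: one column per sensor (hypothetical layout)
df = pd.read_csv("all_sensors.csv")

# Pairwise correlation on a row sample to keep the computation cheap
corr = df.sample(n=min(len(df), 20_000), random_state=0).corr().abs()

# Greedily drop one sensor from each highly correlated pair
threshold = 0.95
to_drop = set()
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if a not in to_drop and b not in to_drop and corr.loc[a, b] > threshold:
            to_drop.add(b)

kept = [c for c in df.columns if c not in to_drop]
print(f"Keeping {len(kept)} of {len(df.columns)} sensors")
```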

I don't have a precise answer for this one, since it depends on how your environment is set up. Obviously, if you can run all these models in parallel it'll be much, much faster.
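As a rough sketch of that, you could fan the per-sensor models out over a process pool; run_model here is a placeholder you'd fill in with your actual HTM model code:

```python
from multiprocessing import Pool

def run_model(sensor_id):
    # Placeholder: build the HTM model for this sensor, stream its
    # ~200,000 points through it, and collect the anomaly scores.
    # Swap in your actual model code (e.g. htm.core) here.
    anomaly_scores = []
    return sensor_id, anomaly_scores

sensor_ids = [f"sensor_{i:04d}" for i in range(2000)]

if __name__ == "__main__":
    with Pool(processes=8) as pool:  # tune to your core count
        results = dict(pool.map(run_model, sensor_ids))
```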

There's also the issue of setting encoder params for each sensor. Assuming they're not all the same unit of measurement following the same distribution, each sensor should have custom encoder settings matched to its own distribution.

I use a simple rule of thumb: sample the first x values for each sensor, calculate a custom min/max for each, and generate the encoder dict based on that.
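Something along these lines, as a sketch (the sample size, padding, and parameter names like minval/maxval/resolution are illustrative, not a fixed HTM API):

```python
import numpy as np

def encoder_params_for(values, sample_size=1000, padding=0.1):
    """Derive per-sensor encoder settings from an initial sample.
    Parameter names below are illustrative, not a fixed API."""
    sample = np.asarray(values[:sample_size], dtype=float)
    lo, hi = sample.min(), sample.max()
    span = (hi - lo) or 1.0  # avoid a zero-width range
    return {
        "minval": lo - padding * span,  # leave headroom for values
        "maxval": hi + padding * span,  # outside the initial sample
        "resolution": span / 130,       # heuristic: ~130 buckets across the range
    }
```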

Happy anomaly detecting!