NAB: codependent metrics from a single system?

anomaly-detection
question
prediction
multiple-inputs

#1

Do any of the NAB datasets provide different metrics from the same physical system, which would therefore give different views into the same behavior? For example, do the different realAWSCloudwatch metrics come from the same machines, or from machines that are correlated in some way (say, a single cluster)?

The idea is to get a set of “codependent” metrics, in the sense that information from one of them can help predict another. If NAB’s metrics don’t have this property, I’m looking into a few possibilities on Kaggle, but any suggestions are more than welcome :)


#2

Good question. I don’t know the answer, though to search for ‘codependent’ metrics you could in principle use swarming. Set the predictedField to one metric, and the swarm will tell you which other fields it found most helpful in predicting that metric (along with the best hyperparameter settings it found). You could repeat this for as many predicted metrics as you like and keep track of the ‘codependencies’.
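A minimal sketch of how that loop might be set up, assuming the swarm description dictionary format as I recall it from NuPIC (`includedFields`, `streamDef`, `inferenceArgs`). The metric names and the CSV path are hypothetical, and the swarm run itself is not shown because it needs a working NuPIC install.

```python
# Hypothetical metric names and CSV path, for illustration only.
METRICS = ["cpu_utilization", "network_in", "disk_write_bytes"]
CSV_PATH = "file://metrics.csv"


def build_swarm_description(predicted_field, metrics=METRICS, csv_path=CSV_PATH):
    """Build one swarm description that predicts `predicted_field`.

    Every metric is offered to the swarm as a candidate input; the swarm
    decides which of them actually help the prediction.
    """
    if predicted_field not in metrics:
        raise ValueError("unknown metric: %s" % predicted_field)
    included = [{"fieldName": "timestamp", "fieldType": "datetime"}]
    included += [{"fieldName": m, "fieldType": "float"} for m in metrics]
    return {
        "includedFields": included,
        "streamDef": {
            "info": "codependency search",
            "version": 1,
            "streams": [
                {"info": "metrics", "source": csv_path, "columns": ["*"]}
            ],
        },
        "inferenceType": "TemporalMultiStep",
        "inferenceArgs": {
            "predictionSteps": [1],
            "predictedField": predicted_field,
        },
        "iterationCount": -1,
        "swarmSize": "medium",
    }


# One description per predicted metric.
descriptions = {m: build_swarm_description(m) for m in METRICS}
```

Each description would then go to the swarm runner, one run per predicted metric. The fields that survive in the resulting model parameters are the candidate ‘codependencies’ for that metric.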


#3

True, we could perform statistical tests of the predictive power of some of the metrics on others. In this case though, I’d prefer a theoretical reason to expect a codependence, such as the metrics coming from different sensors on the same system.
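For reference, one very simple stand-in for such a test is a lagged correlation: does metric `a` at time t line up with metric `b` a few steps later? This is only a sketch on toy data, not a proper significance test, and the series below are made up.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0  # a constant series carries no linear information
    return cov / math.sqrt(vx * vy)


def lagged_correlation(source, target, lag):
    """Correlation between source[t] and target[t + lag], for lag >= 1."""
    if lag < 1 or lag >= len(source):
        raise ValueError("lag must be in [1, len(source) - 1]")
    return pearson(source[:-lag], target[lag:])


# Toy data: `b` repeats `a` two steps later, so lag 2 should stand out.
a = [math.sin(0.7 * t) + 0.1 * (t % 3) for t in range(200)]
b = [0.0, 0.0] + a[:-2]
best_lag = max(range(1, 6), key=lambda k: abs(lagged_correlation(a, b, k)))
```

A high lagged correlation found this way says nothing about why the two series move together, which is the reason for wanting metrics that are known to share a source.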

Let’s say that the system (which you only know through the metrics) is in a state A; then all the metrics will embed some info about state A. Then the system transitions to state B, and so the metrics will now embed info about B. (Hidden Markov Model :) )
If my task is to guess the state of the system from the metrics, and no metric is sufficiently informative on its own, maybe I can combine their info. This is the scenario I’d like to work on with NAB.
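To make the scenario concrete, here is a small sketch of combining two weak metrics into one state estimate with Bayes’ rule. It assumes Gaussian emissions and that the metrics are independent given the state. The metric names, means, and standard deviations are invented for illustration, and state transitions are left out.

```python
import math

# Hypothetical emission models: (mean, std) of each metric in each state.
EMISSIONS = {
    "A": {"cpu": (30.0, 15.0), "latency": (100.0, 40.0)},
    "B": {"cpu": (50.0, 15.0), "latency": (160.0, 40.0)},
}
PRIOR = {"A": 0.5, "B": 0.5}


def gaussian_logpdf(x, mean, std):
    """Log density of a normal distribution at x."""
    return -0.5 * math.log(2 * math.pi * std ** 2) - (x - mean) ** 2 / (2 * std ** 2)


def state_posterior(observation, emissions=EMISSIONS, prior=PRIOR):
    """P(state | observed metrics), assuming metrics are independent given the state."""
    log_scores = {}
    for state, models in emissions.items():
        score = math.log(prior[state])
        for metric, value in observation.items():
            mean, std = models[metric]
            score += gaussian_logpdf(value, mean, std)
        log_scores[state] = score
    # Normalise in log space for numerical stability.
    top = max(log_scores.values())
    weights = {s: math.exp(v - top) for s, v in log_scores.items()}
    total = sum(weights.values())
    return {s: w / total for s, w in weights.items()}


# Each metric alone only weakly favours state B; together they are more decisive.
cpu_only = state_posterior({"cpu": 45.0})
both = state_posterior({"cpu": 45.0, "latency": 150.0})
```

A full HMM would also fold in the transition probabilities between A and B at each step; this sketch only covers the part where evidence from several metrics is pooled.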


#4

Agreed, for sure. Without something tying the metrics together, the codependencies found would be coincidental.


#5

I spend considerable effort in my day job sorting out the 4 basic measurement error sources (linearity, span, zero/offset, non-repeatability) and the ways they can interact in electronic scales.

How hard can that be, really?

If you take the basic binary table of 2 inputs and one output as the possible ways that two errors might combine, and toss in variable scaling between them for each of the listed error terms, you are already deep into the realm of pseudo-random numbers.

Start to troubleshoot mechanical systems with many components bolted together and you move firmly from science to the dark arts. The flex of a support beam might turn out to be as much as the deflection of the load cell it is supposed to be supporting. The output from that sensor may be very complex; and it could well have some very messy relationship with other sensors on the same system.

Good luck with putting ear tags on your inputs and sorting them out as they work through your system.


#6

I don’t think so.


#7

@rhyolight Thanks. I am considering the data used in this paper instead: https://arxiv.org/abs/1703.07015
I found it here: https://paperswithcode.com/task/multivariate-time-series-forecasting
Their code and data are open on GitHub.

I’ll report back on how clean and useful these data are and what results I can get with them. Maybe they could be added to NAB afterwards?