Generating correlation between input variables

Hello everyone,

I have been working with the NuPIC online learning models. I have tried tweaking the hot gym example to make various sets of predictions and to detect anomalies in data. However, I was wondering if there is a way to correlate the input variables so as to detect which combination actually triggered the anomaly.

For example, if there are 5 input sources and 1 anomaly variable, I would like to understand which of the 5 variables caused the anomaly, and in which combination.

Any help in this direction will be appreciated.

Hi raghuram,

From what I understand, I think you’ll have to change your approach in order to viably attribute the anomaly to some subset of your input fields. This is because NuPIC takes any number of input fields and condenses them into a single SDR (sparse distributed representation) at each time step. This internal SDR doesn’t maintain any separation between the input fields that went into forming it, which is what makes your objective difficult. NuPIC takes this input SDR and uses it to output a prediction in the form of another SDR. The anomaly score is basically the difference between the SDR of the next input and the SDR NuPIC predicted: the fraction of the currently active columns that were not predicted. In other words, NuPIC is sort of saying: “Having seen the prior 5-dimensional input, I’m this much surprised by the current 5-dimensional input.” It doesn’t say “I’m this much surprised by field 1, this much surprised by field 2, and so on.”
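To make this concrete, here is a minimal Python sketch of the raw anomaly score as I understand NuPIC defines it: the fraction of currently active columns that were not among the columns predicted at the previous step. The column indices are invented for illustration, and `raw_anomaly_score` is a hypothetical helper written here, not a call into NuPIC. Notice that nothing in the inputs records which field each column came from, so the score cannot be split back into per-field contributions.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Return 1 - |active & predicted| / |active|, or 0.0 if nothing is active."""
    active = set(active_columns)
    if not active:
        return 0.0
    overlap = active & set(predicted_columns)
    return 1.0 - len(overlap) / len(active)


# Made-up column indices. All 5 fields were encoded and pooled into this one
# set of active columns, so no column is labelled with the field it came from.
active_now = {3, 17, 42, 55, 81}
predicted = {3, 17, 42, 90, 91}

score = raw_anomaly_score(active_now, predicted)
print(score)  # 0.4: two of the five active columns were not predicted
```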

One simple recommendation could be to train 5 separate NuPIC models, one for each input field. Look at the anomalies picked up by each of these and see which of these model(s) found your specific anomaly. If the 5-dimensional NuPIC model raises an anomaly, it means that at least one of the fields is doing something unexpected, but the SDR doesn’t say which. Splitting it into 5 models could help shed some light on this. Another idea is to run a swarm on the current 5-dimensional data set, which can help tell which subset of the fields is most useful for making accurate predictions.
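A rough sketch of the one-model-per-field idea follows. To keep it self-contained and runnable without NuPIC, each per-field model is replaced by a hypothetical `RollingZScoreDetector` (a simple rolling z-score, not an HTM model and not part of the NuPIC API); in practice you would swap in one NuPIC model per field. The data, window, warm-up length and threshold of 3.0 are all made up.

```python
import statistics


class RollingZScoreDetector:
    """Hypothetical stand-in for one single-field NuPIC model (NOT the NuPIC API).

    Scores each new value by how many standard deviations it lies from the
    mean of the previous `window` values of that same field.
    """

    def __init__(self, window=20, warmup=5):
        self.window = window
        self.warmup = warmup  # number of values to see before scoring anything
        self.history = []

    def score(self, value):
        z = 0.0
        if len(self.history) >= self.warmup:
            mean = statistics.mean(self.history)
            stdev = statistics.pstdev(self.history)
            if stdev > 0:
                z = abs(value - mean) / stdev
        # Learn online: remember this value and keep only the last `window` values.
        self.history.append(value)
        if len(self.history) > self.window:
            self.history.pop(0)
        return z


def flag_fields(rows, field_names, threshold=3.0):
    """Run one detector per field and report which fields were anomalous at each step."""
    detectors = {name: RollingZScoreDetector() for name in field_names}
    flagged = []
    for t, row in enumerate(rows):
        # Every detector sees every row so its history stays up to date.
        scores = {name: detectors[name].score(row[name]) for name in field_names}
        hits = [name for name in field_names if scores[name] > threshold]
        if hits:
            flagged.append((t, hits))
    return flagged


# Made-up data: 5 fields cycling 10, 11, 12, with a spike injected into "c".
fields = ["a", "b", "c", "d", "e"]
rows = [{name: 10.0 + (t % 3) for name in fields} for t in range(30)]
rows[25]["c"] = 50.0
print(flag_fields(rows, fields))  # [(25, ['c'])]
```

A limitation of this design is that each detector only ever sees its own field, so an anomaly that lives purely in the relationship between fields will not be flagged by any of them.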

I of course stand open to correction from anyone, that’s just the basic intuition I have on this.

– Sam

@sheiser1 Thanks for the reply.

You are right about SDRs and how they make it difficult to relate the input variables to the anomaly.
However, the issue with creating separate models is that in several scenarios these variables would be interdependent, or two or more variables together can cause the anomaly.

Swarming definitely gives some insight into the data, but it doesn't exactly point out what caused that spike at a particular time in the data. I'm trying to see whether it is possible in some other way.

Gotcha. Well, you could also run models on all the possible subsets of input fields. That would certainly be a fair number of models (2^5 - 1 = 31 non-empty subsets for 5 fields) and somewhat tedious, though I think it would get you there, barring a more efficient way. Let me know if you have success this way or another.
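Enumerating the subsets is straightforward with `itertools.combinations`. For 5 fields there are 2^5 - 1 = 31 non-empty subsets (5 + 10 + 10 + 5 + 1), of which 26 contain two or more fields. The field names below are placeholders, and training one model per subset is left out of this sketch.

```python
from itertools import combinations


def field_subsets(fields):
    """Yield every non-empty subset of `fields`, smallest subsets first."""
    for size in range(1, len(fields) + 1):
        for subset in combinations(fields, size):
            yield subset


fields = ["f1", "f2", "f3", "f4", "f5"]  # hypothetical field names
subsets = list(field_subsets(fields))
print(len(subsets))  # 31, i.e. 2**5 - 1

# Subsets with two or more fields, i.e. the candidate interdependent combinations.
print(sum(1 for s in subsets if len(s) >= 2))  # 26
```

Because the count doubles with every extra field (10 fields would already give 1023 subsets), this brute-force approach only stays practical for a handful of fields.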