Anomaly Detection Model


#1

Hi,
I created a model for anomaly detection without swarming. While testing the model on a real stream of data, I had certain findings:

  1. The anomaly likelihood of the model becomes constant after learning a certain pattern; it stops changing after reading about 1000 points from my stream.

  2. I fed around 29 features to the model. Is there any way I could find optimal values of n and w for all my features?

  3. How can I improve my model? Please help.

  4. One more concern: is it fine to provide this many features as input to a NuPIC anomaly detection model?

  5. Is my anomaly detection model purely dependent on the predicted field, or does it take all 29 columns into consideration when computing the anomaly likelihood?
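On point 1, it may help to see roughly what an anomaly likelihood computation does. The idea (as in NuPIC's `AnomalyLikelihood`) is to model the recent distribution of raw anomaly scores and report how unlikely the latest score is relative to that history. Below is a minimal self-contained sketch of that principle, assuming a Gaussian over a rolling window; it is not NuPIC's actual implementation:

```python
import math
from collections import deque

class SimpleAnomalyLikelihood(object):
    """Toy sketch: Gaussian tail probability of the latest raw anomaly
    score relative to a rolling window of past scores."""

    def __init__(self, window=100):
        self.scores = deque(maxlen=window)

    def probability(self, raw_score):
        self.scores.append(raw_score)
        if len(self.scores) < 10:          # not enough history yet
            return 0.5
        mean = sum(self.scores) / len(self.scores)
        var = sum((s - mean) ** 2 for s in self.scores) / len(self.scores)
        std = math.sqrt(var) or 1e-6       # avoid division by zero
        z = (raw_score - mean) / std
        # One-sided tail probability: high when the score is unusually large.
        return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

likelihood = SimpleAnomalyLikelihood()
for _ in range(50):
    likelihood.probability(0.1)            # steady stream of low scores
spike = likelihood.probability(0.9)        # sudden high score
print(spike)                               # close to 1.0
```

Note that if the raw anomaly scores themselves stop changing, the likelihood flattens out too, which is one possible explanation for the constant value you are seeing after ~1000 points.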


#2

Let’s start here. To optimize, we need to understand these features better. What do they represent and how are they correlated? What are their original data types, and how are you encoding them?


#3

Hi rhyo,

My main objective is to find server anomalies in real time, so the major parameters are load on the server (per second, per minute, per hour) and memory/disk parameters like disk read count and disk write count. The real stream gives me 29 features of server data.
Actually, while going through one video I learned that correlation of features does not affect anomaly detection; the model takes all the features into consideration and learns the pattern accordingly.


#4

That's why I included all the features that I got from the stream…
Most of the features are either integer or float values, so I encoded them with the Adaptive Scalar Encoder, and there are some features which are categorical in nature, so I used the SDRCategory encoder.


#5

A couple quick recommendations:

  • Try the Random Distributed Scalar Encoder (RDSE). My understanding is that it is usually favored over the Adaptive Scalar Encoder in practice.

  • Try one model for each feature, or at most 5 per model. Too many features can muddy the waters and make the predictive signal harder to find. Having multiple features correlated with each other usually doesn't help, so if you can identify which features behave similarly, you can drop the redundant ones.
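To make the RDSE suggestion concrete: each scalar value maps to a bucket, and each bucket activates a small, deterministic, pseudo-random set of bits, with adjacent buckets sharing most of their underlying indices so that nearby values get overlapping encodings. A minimal sketch of the principle (not NuPIC's actual `RandomDistributedScalarEncoder`, whose details differ):

```python
import hashlib

def rdse_encode(value, resolution=1.0, n=400, w=21, seed="rdse"):
    """Sketch of random-distributed scalar encoding: map the value to a
    bucket, then derive w deterministic pseudo-random bit positions.
    Adjacent buckets share w-1 of their underlying indices, so nearby
    values produce overlapping encodings."""
    bucket = int(round(value / resolution))
    active = set()
    for k in range(w):
        digest = hashlib.md5(("%s-%d" % (seed, bucket + k)).encode()).hexdigest()
        active.add(int(digest, 16) % n)
    return active

a = rdse_encode(10.0)
b = rdse_encode(11.0)   # adjacent bucket
c = rdse_encode(50.0)   # far-away bucket
print(len(a & b), len(a & c))  # nearby values overlap heavily, distant ones barely
```

The practical advantage over the Adaptive Scalar Encoder is that the representation does not shift as the observed value range grows, so previously learned patterns stay valid.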


#6

OK… I will try it out and let you know my findings.


#7

Here is one way to try to find out which input fields are correlated with one important field. It might help, but it will take some programming.

A swarm is going to try to optimize model parameters for prediction of one input field, so in order to swarm you have to pick one input field as your predictedField. If you're just doing anomaly detection, it may be hard to figure out which field that is, but I would choose the one with the most obvious patterns (least noisy). The swarm will return model params that only include encoders for the fields it found to affect prediction accuracy. In many cases, the only field worth encoding is the predictedField, meaning processing the rest is wasted (you mentioned this above). But hopefully it will encode other fields as well, indicating that those are the good ones to feed into NuPIC.
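For reference, a swarm is driven by a search description that names the predictedField. Here is a hedged sketch of what such a description might look like, modeled on the swarm configs in NuPIC's examples; the field names (`cpu_load`, `disk_read_count`, the CSV path) are hypothetical, and the exact schema should be checked against the swarming documentation:

```python
# Hypothetical swarm description; "cpu_load" is a made-up predicted field
# and "server_metrics.csv" a made-up data source.
SWARM_DESCRIPTION = {
    "includedFields": [
        {"fieldName": "timestamp", "fieldType": "datetime"},
        {"fieldName": "cpu_load", "fieldType": "float"},
        {"fieldName": "disk_read_count", "fieldType": "int"},
    ],
    "streamDef": {
        "info": "server_metrics",
        "version": 1,
        "streams": [
            {"info": "server stream",
             "source": "file://server_metrics.csv",
             "columns": ["*"]},
        ],
    },
    "inferenceType": "TemporalAnomaly",
    "inferenceArgs": {
        "predictionSteps": [1],
        "predictedField": "cpu_load",   # the field the swarm optimizes for
    },
    "swarmSize": "medium",
}
```

The fields the swarm chooses to encode in the returned model params are the ones it found useful for predicting `cpu_load`, which is the signal you can use to prune the rest.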

So my advice is to reduce the number of input fields; 29 is too many. Analyze your data a bit to reduce it to a few important input fields.
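One simple way to do that analysis before touching NuPIC at all (a plain-Python sketch; the helper names are my own): compute pairwise Pearson correlation between feature columns and drop one of each highly correlated pair.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = float(len(xs))
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0

def drop_redundant(features, threshold=0.95):
    """Keep a feature only if it is not highly correlated with one
    already kept. `features` maps name -> list of values."""
    kept = []
    for name in sorted(features):
        if all(abs(pearson(features[name], features[k])) < threshold
               for k in kept):
            kept.append(name)
    return kept

# Toy data: disk_write is just disk_read scaled, so it gets dropped.
data = {
    "disk_read":  [1.0, 2.0, 3.0, 4.0, 5.0],
    "disk_write": [2.0, 4.0, 6.0, 8.0, 10.0],
    "cpu_load":   [0.9, 0.1, 0.8, 0.2, 0.7],
}
print(drop_redundant(data))  # ['cpu_load', 'disk_read']
```

Running something like this over a sample of your 29 columns should quickly surface which ones are redundant and narrow the set down to a handful of genuinely distinct signals.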