My main objective is to detect server anomalies in real time, so the major parameters are server load (per second, per minute, and per hour) and memory/disk parameters like disk read count and disk write count. The real-time stream gives me 29 features of server data.
Actually, while going through one video I learned that feature correlation does not really affect anomaly detection: the model takes all the features into consideration and learns the patterns accordingly.
That's why I included all the features I got from the stream…
Most of the features are either integer or float values, so I encoded them with the Adaptive Scalar Encoder, and there were some features that are categorical in nature, so for those I used the SDRCategory encoder.
Try the Random Distributed Scalar Encoder (RDSE). My understanding is that it is usually favored over the Adaptive Scalar Encoder in practice.
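For example, here's a minimal sketch of creating one (the parameter values are illustrative, not tuned for your server metrics):

```python
# Minimal RDSE sketch (parameter values illustrative, not tuned).
from nupic.encoders.random_distributed_scalar import RandomDistributedScalarEncoder

# 'resolution' sets how far apart two values must be to land in distinct
# buckets; w active bits out of n total bits in the output SDR.
encoder = RandomDistributedScalarEncoder(resolution=0.88, w=21, n=400)

sdr = encoder.encode(42.5)  # binary numpy array of length n
```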
Try one model for each feature, or at least five or fewer features per model. Too many features can basically muddy the waters and make the predictive signal harder to find. Having multiple features correlated to each other usually doesn't help, so if you can identify which features behave similarly, you can drop those that are redundant.
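As a quick first pass at spotting redundant features, a pairwise-correlation check can help. A rough sketch, assuming your 29 features are columns in a CSV and you have pandas available (the file and column names here are placeholders):

```python
# Sketch: flag highly correlated (redundant) feature pairs with pandas.
import pandas as pd

df = pd.read_csv("server_metrics.csv")
corr = df.corr().abs()

# Print feature pairs whose absolute correlation exceeds 0.9 --
# these are candidates for dropping one of the two.
for i, a in enumerate(corr.columns):
    for b in corr.columns[i + 1:]:
        if corr.loc[a, b] > 0.9:
            print(a, b, corr.loc[a, b])
```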
Here is one way to try to find out what input fields are correlated to one important field. It might help, but it will take some programming.
A swarm is going to try to optimize model parameters for predicting one input field, so in order to swarm you have to pick one field of input as your predictedField. If you're just doing anomaly detection, it may be hard to figure out which field that should be, but I would choose the one with the most obvious patterns (the least noisy). The swarm will return model params that only include encoders for the fields it found to affect prediction accuracy. In many cases, the only field worth encoding is the predictedField, meaning processing the rest is wasted (you mentioned this above). But hopefully it will also encode other fields, which would indicate that those are the good ones to feed into NuPIC.
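For reference, kicking off a swarm looks roughly like this. This is a sketch trimmed down from the hot gym swarming example; the field names and file path are placeholders for your own:

```python
# Rough sketch of swarming over a few fields (names/paths are placeholders).
from nupic.swarming import permutations_runner

SWARM_CONFIG = {
    "includedFields": [
        {"fieldName": "timestamp", "fieldType": "datetime"},
        {"fieldName": "server_load", "fieldType": "float",
         "minValue": 0.0, "maxValue": 100.0},
        {"fieldName": "disk_read_count", "fieldType": "int"},
    ],
    "streamDef": {
        "info": "server metrics",
        "version": 1,
        "streams": [{
            "info": "server_metrics.csv",
            "source": "file://server_metrics.csv",
            "columns": ["*"],
        }],
    },
    # People often swarm with TemporalMultiStep and then switch the
    # resulting params to TemporalAnomaly for anomaly detection.
    "inferenceType": "TemporalMultiStep",
    "inferenceArgs": {"predictedField": "server_load"},
    "swarmSize": "medium",
}

# Returns model params whose 'encoders' only cover the fields the swarm kept.
model_params = permutations_runner.runWithConfig(
    SWARM_CONFIG, {"maxWorkers": 4, "overwrite": True})
```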
So my advice is to reduce the number of input fields; 29 is too many. Analyze your data a bit to reduce it to a few important input fields.
Hi @rhyolight and @sheiser1, thank you! I was able to pick a predicted field by reducing the features from 29 to 5, and the RDSE really is better at giving output in real time.
Now I want to know how to provide multiple fields while building the anomaly detection model. Please help me with that.
To pass in multiple fields you need to set up the model params file for it. I'd recommend looking at this example:
It's basically a big nested dict structure, where 'modelParams' contains 'sensorParams', which contains 'encoders'. Within 'encoders' there are sub-dicts for each field; in this case 'kw_energy_consumption', 'timestamp_dayOfWeek', 'timestamp_timeOfDay', and 'timestamp_weekend'.
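Trimmed down, that encoders section looks something like this (the parameter values here are illustrative and follow the hot gym example's style; I've used an RDSE for the scalar field since that was recommended above — tune everything to your own data):

```python
# Sketch of the 'encoders' sub-dict with multiple fields (values illustrative).
encoders = {
    'kw_energy_consumption': {
        'type': 'RandomDistributedScalarEncoder',
        'fieldname': 'kw_energy_consumption',
        'name': 'kw_energy_consumption',
        'resolution': 0.88,
        'seed': 42,
    },
    'timestamp_dayOfWeek': {
        'type': 'DateEncoder',
        'fieldname': 'timestamp',
        'name': 'timestamp_dayOfWeek',
        'dayOfWeek': (21, 1),   # (width, radius)
    },
    'timestamp_timeOfDay': {
        'type': 'DateEncoder',
        'fieldname': 'timestamp',
        'name': 'timestamp_timeOfDay',
        'timeOfDay': (21, 9.5),  # (width, radius in hours)
    },
    'timestamp_weekend': {
        'type': 'DateEncoder',
        'fieldname': 'timestamp',
        'name': 'timestamp_weekend',
        'weekend': 21,
    },
}
```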
Each of these fields is encoded separately, and then the encodings are all concatenated into one input that goes to the Spatial Pooler and Temporal Memory. Each data type obviously has its own set of encoding parameters, and once you set them accordingly, the multi-encoder will automatically combine them into one input for the model.
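Once the params dict is filled in, feeding multi-field records to the model is just a matter of passing one dict per record. A sketch, assuming MODEL_PARAMS is a full params dict like the example linked above with 'inferenceType' set to 'TemporalAnomaly':

```python
# Sketch: running a multi-field record through an OPF model.
# MODEL_PARAMS is assumed to be a complete params dict (see example above).
from datetime import datetime
from nupic.frameworks.opf.model_factory import ModelFactory

model = ModelFactory.create(MODEL_PARAMS)
model.enableInference({"predictedField": "kw_energy_consumption"})

# One record = one dict with an entry per input field.
result = model.run({
    "timestamp": datetime(2019, 1, 1, 10, 0),
    "kw_energy_consumption": 5.3,
})
print(result.inferences["anomalyScore"])
```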
You can get a sense of why it's good not to have too many input fields, since the one model has to represent more dimensions the more fields there are. Along with your own dimensionality reduction approach, it may be worth trying swarming too.