Anomaly Model Detection

Prabal_Kumar · January 9, 2019, 12:49pm

Hi,
I created a model for anomaly detection without swarming. While testing the model on a real stream of data, I had a few findings and questions:

The anomaly likelihood of the model becomes constant once it has learned a certain pattern; it stops changing after reading about 1,000 points from my stream.
I feed around 29 features to the model. Is there any way to find optimal values of n and w for all my features?
How can I improve my model? Please help.
One more concern: is it fine to provide this many features as input to a NuPIC anomaly detection model?
Does my anomaly detection model depend purely on the predicted field, or does it take all 29 columns into consideration when computing the anomaly likelihood?

rhyolight · January 9, 2019, 3:43pm

Let’s start here. To optimize, we need to understand these features better. What do they represent and how are they correlated? What are their original data types, and how are you encoding them?

Prabal_Kumar · January 10, 2019, 5:41am

Hi rhyo,

My main objective is to detect server anomalies in real time, so the major parameters are the load on the server (per second, per minute, and per hour) and memory parameters such as disk read count and disk write count. The real stream gives me 29 features of server data.
While going through one video, I learned that correlation between features does not actually affect anomaly detection: the model takes all the features into consideration and learns the pattern accordingly.

Prabal_Kumar · January 10, 2019, 5:45am

That's why I included all the features that I got from the stream…
Most of the features are either integer or float values, so I encoded them with the Adaptive Scalar Encoder. There are also some features that are categorical in nature, so I used the SDRCategory encoder for those.

sheiser1 · January 10, 2019, 7:30am

A couple quick recommendations:

Try the Random Distributed Scalar Encoder (RDSE). My understanding is that it is usually favored over the Adaptive Scalar Encoder in practice.
Try one model for each feature, or at most five features per model. Too many features can basically muddy the waters and make predictive signal harder to find. Having multiple features correlated to each other usually doesn’t help, so if you can identify which features behave similarly you can drop those that are redundant.
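
One rough way to spot redundant features is to compute pairwise Pearson correlations and keep only one feature from each strongly correlated group. The sketch below is plain Python and does not use NuPIC. The column names, the sample values, and the 0.95 threshold are hypothetical and only illustrate the idea.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = float(len(xs))
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def drop_redundant(columns, threshold=0.95):
    """Greedily keep a feature only if it is not strongly correlated
    (in absolute value) with a feature that was already kept."""
    kept = []
    for name in sorted(columns):
        if all(abs(pearson(columns[name], columns[k])) < threshold for k in kept):
            kept.append(name)
    return kept


# Hypothetical sample of three server metrics (not data from this thread).
columns = {
    "load_1min": [1.0, 2.0, 3.0, 4.0, 5.0],
    "load_5min": [1.1, 2.1, 3.0, 4.2, 5.1],
    "disk_reads": [10.0, 3.0, 8.0, 1.0, 7.0],
}
kept = drop_redundant(columns)
print(kept)
```

Here `load_5min` tracks `load_1min` almost exactly, so only one of the two survives. Which one to keep is a judgment call; the greedy pass simply keeps whichever name sorts first.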

Prabal_Kumar · January 10, 2019, 10:39am

OK, I will try it out and let you know my findings.

rhyolight · January 10, 2019, 5:14pm

Here is one way to try to find out what input fields are correlated to one important field. It might help, but it will take some programming.

A swarm is going to try to optimize model parameters for prediction of one input field. So in order to swarm you have to pick one field of input as your predictedField. If you’re just doing anomaly detection, it may be hard to figure out what field that is, but I would choose one with the most obvious patterns (least noisy). The swarm will return model params that only include encoders for the fields it found to affect prediction accuracy. In many cases, the only field worth encoding is the predictedField, meaning processing the rest is wasted (you mentioned this above). But hopefully it will also encode other fields, which indicates that those are the good ones to feed into NuPIC.

So my advice is to reduce the number of input fields; 29 is too many. Analyze your data a bit to reduce it to a few important input fields.
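
Once a swarm has produced model params, the fields it kept can be read off the encoders section. The sketch below is plain Python over a dict and makes several assumptions: that the encoders live under modelParams, then sensorParams, then encoders; that fields the swarm dropped show up with a value of None; and that entries flagged `classifierOnly` feed only the classifier. The helper name and the sample params are hypothetical, so check them against what your NuPIC version actually writes out.

```python
def fields_kept_by_swarm(model_params):
    """Return the input field names that still have an encoder.

    Assumes dropped fields are listed with a value of None and that
    'classifierOnly' entries are not part of the Spatial Pooler input.
    """
    encoders = model_params["modelParams"]["sensorParams"]["encoders"]
    kept = set()
    for encoder in encoders.values():
        if encoder is None:
            continue  # field was not found useful, so no encoder was kept
        if encoder.get("classifierOnly"):
            continue  # assumed to feed the classifier only
        kept.add(encoder["fieldname"])
    return sorted(kept)


# Hypothetical swarm output, trimmed to the encoders section.
example_params = {
    "modelParams": {
        "sensorParams": {
            "encoders": {
                "cpu_load": {
                    "fieldname": "cpu_load",
                    "type": "RandomDistributedScalarEncoder",
                    "resolution": 0.5,
                },
                "disk_reads": None,
                "timestamp_timeOfDay": {
                    "fieldname": "timestamp",
                    "type": "DateEncoder",
                    "timeOfDay": (21, 1),
                },
                "_classifierInput": {
                    "fieldname": "cpu_load",
                    "classifierOnly": True,
                    "type": "ScalarEncoder",
                },
            }
        }
    }
}
kept_fields = fields_kept_by_swarm(example_params)
print(kept_fields)
```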

Prabal_Kumar · January 22, 2019, 8:20am

Hi @rhyolight and @sheiser1, thank you. I am able to detect a predicted field by reducing the features from 29 to 5, and RDSE is really better at giving output in real time.
I wanted to know how to provide multiple predicted fields while building an anomaly detection model.
Please help me with that.

rhyolight · January 22, 2019, 5:39pm

I always hate pointing this out but… the code to extract multiple predictions has not been written yet. See Predicting Multiple Output Values.

sheiser1 · January 22, 2019, 5:41pm

To pass in multiple fields you need to set up the model params file for it. I’d recommend looking at this example:

github.com/numenta/nupic/blob/master/examples/opf/clients/hotgym/anomaly/one_gym/model_params/rec_center_hourly_model_params.py

MODEL_PARAMS = \
{ 'aggregationInfo': { 'days': 0,
                       'fields': [],
                       'hours': 0,
                       'microseconds': 0,
                       'milliseconds': 0,
                       'minutes': 0,
                       'months': 0,
                       'seconds': 0,
                       'weeks': 0,
                       'years': 0},
  'model': 'HTMPrediction',
  'modelParams': { 'anomalyParams': { u'anomalyCacheRecords': None,
                                      u'autoDetectThreshold': None,
                                      u'autoDetectWaitRecords': None},
                   'clParams': { 'alpha': 0.01962508905154251,
                                 'verbosity': 0,
                                 'regionName': 'SDRClassifierRegion',
                                 'steps': '1'},
                   'inferenceType': 'TemporalAnomaly',

This file has been truncated.

It’s basically a big nested dict structure, where ‘modelParams’ contains ‘sensorParams’, which contains ‘encoders’. Within ‘encoders’ there are sub-dicts for each field, in this case ‘kw_energy_consumption’, ‘timestamp_dayOfWeek’, ‘timestamp_timeOfDay’ and ‘timestamp_weekend’.

Each of these fields is encoded separately, then they’re all combined into one representation, which is the input to the Spatial Pooler and TM. Each data type has its own set of encoding parameters, and once you set them accordingly the multi-encoder will automatically combine the fields into one input for the model.
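
As a minimal sketch of what such an ‘encoders’ section could look like for a few server metrics, the helpers below build one sub-dict per field. The field names, value ranges, and the helpers themselves are hypothetical. The 130-bucket resolution rule and the n and w values for the category encoder are assumptions rather than recommended settings, so tune them for your data and verify the parameter names against your NuPIC version.

```python
def rdse_encoder(field, min_val, max_val, num_buckets=130.0, seed=42):
    """Encoder params for a numeric field using the RDSE.

    The resolution is derived from the expected value range; the
    bucket count is an assumed heuristic, not a required value.
    """
    resolution = max(0.001, (max_val - min_val) / num_buckets)
    return {
        "fieldname": field,
        "name": field,
        "type": "RandomDistributedScalarEncoder",
        "resolution": resolution,
        "seed": seed,
    }


def category_encoder(field, n=121, w=21):
    """Encoder params for a categorical field (n and w are placeholders)."""
    return {
        "fieldname": field,
        "name": field,
        "type": "SDRCategoryEncoder",
        "n": n,
        "w": w,
    }


# One sub-dict per input field, keyed by field name.
encoders = {
    "cpu_load": rdse_encoder("cpu_load", 0.0, 100.0),
    "disk_reads": rdse_encoder("disk_reads", 0.0, 5000.0),
    "server_role": category_encoder("server_role"),
}

# Fragment only: a full params file also needs the other sections
# (spParams, tmParams, clParams, and so on) from a working example.
model_params_fragment = {"modelParams": {"sensorParams": {"encoders": encoders}}}
print(sorted(encoders))
```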

You can get a sense of why it’s good not to have too many input fields, since the one model has to represent more dimensions the more fields there are. Along with your own dimensionality reduction approach, it may be worth trying swarming too.

– Oops, seems I’ve answered the wrong question

Prabal_Kumar · January 23, 2019, 5:43am

Hi @rhyolight, sorry to bother you. I already went through all these articles; I just wanted to confirm.
