Outlier prediction using nupic

Millan · May 25, 2018, 11:24pm

I have a time series dataset of around 5k records. It contains about 10 outlier values, and my aim is to predict these outliers. I tried both OPF and network models and changed various parameters, such as the encoder resolution and the alpha value, but the outliers are still not being predicted. Of course, the changes do improve the prediction of commonly seen values.

So I wanted to know whether outlier prediction is possible using NuPIC. If yes, how can I do it? Should I change any specific parameter, or is there a codebase I can use?
Thanks in advance.

rhyolight · May 26, 2018, 3:17pm

Show me the data!

Millan · May 28, 2018, 1:11pm

Here is the data.
https://www.pastiebin.com/5b0bff787829a

Thanks.

rhyolight · May 29, 2018, 4:45pm

@Millan I ran your data through HTM Studio after changing the date format so it would work. I saw anomalies around Apr 9 in your data.

What are the outliers you are expecting it to catch? I don’t see any.

Millan · May 31, 2018, 9:57am

I am aiming for prediction, not anomaly detection.
In the dataset, the outliers are the values greater than or equal to 90. For these actual values, the predicted values come out much too low. Even for actual values greater than or equal to 80, the predictions are too low.

So basically I want to improve the predictions for higher values.

sheiser1 · May 31, 2018, 12:09pm

Do these higher outliers occur with any pattern? Like do they tend to come after certain values or sequences? The system can definitely predict those higher values, but only if they occur as part of some discernible pattern. The more complex the pattern the more repetitions it will take to learn. If they’re basically noise values on the other hand then it won’t predict them and they’ll produce high anomaly scores.

Also it may be worth trying a simple Scalar Encoder with a max-val equal to the minimum out-lier value. This will clip all higher inputs to 90 for example, so the encoder would group anything above 90 into the same encoding (basically a catch-all ‘out-lier’ category). This could help, though only to the extent that the outliers occur with some kind of temporal pattern.
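The clipping idea above can be illustrated with a toy encoder. This is only a sketch and does not use NuPIC's own ScalarEncoder: the function name `encode_scalar` and the values `n=100`, `w=21`, and `minval=0.0` are assumptions chosen for illustration, and only the cap at 90 comes from the thread.

```python
def encode_scalar(value, minval=0.0, maxval=90.0, n=100, w=21):
    """Toy clipping scalar encoder (hypothetical, not NuPIC's implementation).

    Returns a list of n bits with w contiguous 1s. Inputs above maxval are
    clipped to maxval, so every outlier gets the same encoding.
    """
    clipped = min(max(value, minval), maxval)
    n_buckets = n - w + 1
    # Map the clipped value linearly onto the available bucket positions.
    bucket = int(round((clipped - minval) / (maxval - minval) * (n_buckets - 1)))
    bits = [0] * n
    for i in range(bucket, bucket + w):
        bits[i] = 1
    return bits


# Everything at or above 90 lands in the same catch-all "outlier" encoding.
same_bucket = encode_scalar(90) == encode_scalar(97) == encode_scalar(150)

# Nearby values still share bits, so the encoding stays semantically smooth.
overlap_88_90 = sum(a & b for a, b in zip(encode_scalar(88), encode_scalar(90)))
```

With these settings the encoder no longer distinguishes 91 from 150, which is the point: the temporal memory sees one repeated "high" input instead of many rare ones.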

rhyolight · May 31, 2018, 3:34pm

Prediction is harder. I suggest you swarm over your data and see what parameters it comes up with if you have not already. How far ahead are you trying to predict?

Millan · June 6, 2018, 11:13am

Thank you, Sheiser1. I'll try out this approach.

Millan · June 6, 2018, 11:16am

@rhyolight I have done the swarming. Starting from the swarming result, I tuned the parameters in different ways, which improved prediction, but I have yet to see good results for the higher values. I am trying to predict 4 steps ahead.
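To see whether a tuning change helps the higher values specifically, it may be useful to score them separately from the rest. A minimal sketch, assuming `predictions[i]` is the forecast made at record `i` for record `i + 4`; the helper name `mae_by_range` and the sample numbers are made up for illustration and are not taken from the dataset.

```python
def mae_by_range(actuals, predictions, steps=4, threshold=90):
    """Mean absolute error split into (high, low) by the actual value.

    predictions[i] is assumed to be the forecast made at record i
    for the value at record i + steps.
    """
    high, low = [], []
    for i, predicted in enumerate(predictions):
        target = i + steps
        if target >= len(actuals) or predicted is None:
            continue  # no actual value to compare against yet
        error = abs(actuals[target] - predicted)
        if actuals[target] >= threshold:
            high.append(error)
        else:
            low.append(error)

    def mean(errors):
        return sum(errors) / float(len(errors)) if errors else None

    return mean(high), mean(low)


# Made-up sample: one spike at index 4 that the model badly under-predicts.
actuals = [10, 12, 11, 13, 95, 12, 10, 11]
predictions = [20, 12, 10, 13, 11, 12, 10, 11]
high_mae, low_mae = mae_by_range(actuals, predictions)
```

A single overall error figure is dominated by the common values, so a drop in it says little about the roughly 10 outliers; tracking the high-value error on its own makes that visible.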

rhyolight · June 6, 2018, 3:13pm

Thank you for trying. It is hard for the system to predict outliers like these without manual resets. For example, if you know the period of the pattern, you could tell the TM to reset its current sequence state, essentially cutting off what it is learning. On the next time step it will start to learn a new sequence. These outliers are parts of a periodic pattern, and the only way to really nail it down is to reset at the same point in the period.

This is a crappy answer, I know, but we honestly don’t really know how temporal patterns are reset in the brain. I talked about this with Jeff in a video.

Millan · June 8, 2018, 8:25am

The higher values mostly occur between 9pm of the current day and 4am of the next day. So should I reset the sequence at 9pm or at the start of the day (at 0:00)?

Currently I am resetting the sequence at the start of the day (0:00), which is after the 96th record (counter=95) of each day. As I am using the algorithm API, below is the code snippet.

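    # Assumes counter is a zero-based record index and each day has 96 records.
    tm.compute(activeColumnIndices, learn=True)
    activeCells = tm.getActiveCells()
    # Reset after the last record of each day (counter = 95, 191, ...);
    # counter % 95 == 0 would drift by one record per day.
    if counter % 96 == 95:
      tm.reset()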

Please suggest whether this is the right way to do it, or should I stop learning as well?
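One way to try both reset points (0:00 and 9pm) without recounting records is to key the reset on the record's timestamp instead of a counter. A minimal sketch, assuming evenly spaced 15-minute records (96 per day); the helper names and the start date are hypothetical, and the `tm` calls appear only in comments.

```python
from datetime import datetime, time, timedelta


def is_period_start(timestamp, reset_at):
    """True when this record is the first one of a new period."""
    return timestamp.time() == reset_at


def reset_indices(timestamps, reset_at):
    """Indices of the records before which the sequence would be reset."""
    return [i for i, ts in enumerate(timestamps) if is_period_start(ts, reset_at)]


# Two made-up days of 15-minute timestamps (2 * 96 records).
start = datetime(2018, 4, 1, 0, 0)
timestamps = [start + timedelta(minutes=15 * i) for i in range(2 * 96)]

midnight_resets = reset_indices(timestamps, time(0, 0))
evening_resets = reset_indices(timestamps, time(21, 0))

# In the real loop this would look roughly like:
#   if is_period_start(ts, time(21, 0)):
#       tm.reset()
#   tm.compute(activeColumnIndices, learn=True)
```

Resetting before the first record of a period is equivalent to resetting after the last record of the previous one, so the midnight variant matches the counter-based approach when no records are missing.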

rhyolight · June 8, 2018, 2:55pm

That seems right to me. I’m not sure what advice to give you. We have found that HTM in this form provides the most value giving anomaly indications, not predictions.
