Outlier prediction using nupic


#1

I have a time series dataset of around 5k records. About 10 of the values are outliers, and my aim is to predict these outliers. I tried both OPF and Network API models and changed several parameters, such as the encoder resolution and the alpha value, but the outliers are still not being predicted. Of course, the prediction of commonly seen values is improving.

So I wanted to know whether outlier prediction is possible using NuPIC. If yes, how can I do it? Should I change any specific parameter, or is there a codebase I can use?
Thanks in advance.


#2

Show me the data! :wink:


#3

Here is the data.
https://www.pastiebin.com/5b0bff787829a

Thanks.


#4

@Millan I ran your data through HTM Studio after changing the date format so it would work. I saw anomalies around Apr 9 in your data.

What are the outliers you are expecting it to catch? I don’t see any.


#5

I am aiming for prediction, not anomaly detection.
In the dataset, the outliers are the values greater than or equal to 90. For these actual values, the predicted values come out much too low. Even for actual values greater than or equal to 80, the predictions are too low and not impressive.

So basically I want to improve the predictions for higher values.


#6

Do these higher outliers occur with any pattern? Like do they tend to come after certain values or sequences? The system can definitely predict those higher values, but only if they occur as part of some discernible pattern. The more complex the pattern the more repetitions it will take to learn. If they’re basically noise values on the other hand then it won’t predict them and they’ll produce high anomaly scores.

Also it may be worth trying a simple Scalar Encoder with a max value equal to the minimum outlier value. This will clip all higher inputs to 90, for example, so the encoder would group anything above 90 into the same encoding (basically a catch-all "outlier" category). This could help, though only to the extent that the outliers occur with some kind of temporal pattern.
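
To make the clipping idea concrete, here is a minimal pure-Python sketch of a toy scalar encoder. It is not NuPIC's encoder: the function name `encode_scalar` and the values `n=100` and `w=21` are assumptions chosen only for illustration, and 90 is the example threshold from this thread. As far as I know NuPIC's own `ScalarEncoder` has a `clipInput` option for the same purpose, but check the API docs before relying on that.

```python
def encode_scalar(value, minval=0.0, maxval=90.0, n=100, w=21):
    """Toy scalar encoder: returns a list of n bits with w contiguous 1s.

    Inputs outside [minval, maxval] are clipped, so every value >= maxval
    gets the same encoding (the catch-all "outlier" bucket).
    """
    clipped = max(minval, min(maxval, value))
    n_buckets = n - w + 1
    # Map the clipped value onto the index of the first active bit.
    start = int(round((clipped - minval) / (maxval - minval) * (n_buckets - 1)))
    return [1 if start <= i < start + w else 0 for i in range(n)]


# Every input at or above maxval lands in the same bucket.
same_bucket = encode_scalar(90) == encode_scalar(97.5)
```

With this kind of encoding the model no longer has to tell 91 from 99; it only has to learn when "something in the outlier bucket" tends to follow the preceding sequence.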


#7

Prediction is harder. I suggest you swarm over your data and see what parameters it comes up with if you have not already. How far ahead are you trying to predict?


#8

Thank you, Sheiser1. I'll try out this approach.


#9

@rhyolight I have done the swarming. Starting from the swarm result, I tuned several parameters, which improved the predictions, but I have yet to see good results for the higher values. I am trying to predict 4 steps ahead.
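
One way to track whether tuning helps the higher values specifically is to score only the records whose actual value is at or above the threshold. The sketch below is an assumption about how such a check could look, not part of NuPIC: `high_value_mae` is a hypothetical helper, and it assumes `predictions[t]` is the forecast made at time `t` for time `t + steps`.

```python
def high_value_mae(actuals, predictions, steps=4, threshold=90):
    """Mean absolute error over records whose actual value >= threshold.

    predictions[t] is assumed to be the forecast made at time t for
    time t + steps. Returns None if no scored actual reaches the threshold.
    """
    errors = []
    for t, predicted in enumerate(predictions):
        target = t + steps
        if target >= len(actuals):
            break  # no actual value is available yet for this forecast
        if actuals[target] >= threshold:
            errors.append(abs(actuals[target] - predicted))
    return sum(errors) / float(len(errors)) if errors else None


# Made-up numbers, purely to show the alignment between the two lists.
example_actuals = [10, 20, 30, 40, 95, 50, 92]
example_predictions = [60, 0, 80, 0, 0, 0, 0]
example_score = high_value_mae(example_actuals, example_predictions)
```

Comparing this number before and after a parameter change shows whether the high values improved, independently of the overall error.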


#10

Thank you for trying. It is hard for the system to predict outliers like these without manual resets. For example, if you know the period of the pattern, you could tell the TM to reset its current sequence state, essentially cutting off what it is learning. On the next time step it will start to learn a new sequence. These outliers are part of a periodic pattern, and the only way to really nail it down is to reset at the same point in each period.

This is a crappy answer, I know, but we honestly don't really know how temporal patterns are reset in the brain. I talked about this with Jeff in a video.
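
The manual reset described above can be sketched roughly as follows. `StubTM` is a hypothetical stand-in that only logs calls so the example runs without NuPIC, and `run_with_resets` and the `period` argument are illustrative names. With a real temporal memory you would pass the actual active column indices for each record.

```python
class StubTM(object):
    """Hypothetical stand-in for a temporal memory; it only logs calls."""

    def __init__(self):
        self.log = []

    def compute(self, active_columns, learn=True):
        self.log.append("compute")

    def reset(self):
        self.log.append("reset")


def run_with_resets(tm, records, period):
    """Feed records to tm, resetting after the last record of each period."""
    reset_after = []
    for counter, active_columns in enumerate(records):
        tm.compute(active_columns, learn=True)
        # counter is 0-based, so the last record of a period has
        # counter % period == period - 1.
        if counter % period == period - 1:
            tm.reset()
            reset_after.append(counter)
    return reset_after


stub = StubTM()
resets = run_with_resets(stub, [[0]] * 10, period=4)
```

Here `resets` holds the 0-based indices of the records after which a reset was issued, which makes it easy to check that the reset falls on the same point of every period.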


#11

The higher values mostly occur between 9 pm on one day and 4 am on the next. So should I reset the sequence at 9 pm or at the start of the day (0:00)?

Currently I am resetting the sequence at the start of the day (0:00), which is after the 96th record (counter=95) of each day. As I am using the algorithm API, the code snippet is below.

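    tm.compute(activeColumnIndices, learn=True)
    activeCells = tm.getActiveCells()
    # counter is 0-based with 96 records per day, so the last record of
    # each day satisfies counter % 96 == 95 (counter % 95 == 0 would drift).
    if counter % 96 == 95:
      tm.reset()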

Please let me know whether this is the right approach, or whether I should stop learning as well.


#12

That seems right to me. I’m not sure what advice to give you. We have found that HTM in this form provides the most value giving anomaly indications, not predictions.