Hi, I have a dataset of memory usage in which a certain application runs for 5 minutes at random instants of time and increases memory consumption. I trained HTM on this dataset. During real-time prediction, I am not getting any anomaly when this application runs for more than 5 minutes. During training I ensured that whenever this particular application runs, it runs for only 5 minutes. What parameters should be changed so that I get an anomaly when it runs for more than 5 minutes during real-time prediction?
Does this mean you trained a NuPIC model with 5 minutes worth of memory usage values, saved the model and then fed it data of >5 minutes of memory usage values?
You may be better off not splitting the data into train/test as is common, since NuPIC models learn continuously from streams while doing anomaly detection. If you train (‘enableLearning’ in NuPIC) for only 5 minutes, then any behavior the model sees that wasn’t in the training set should yield high anomaly scores.
Does this mean the model is not outputting anomaly scores at all? Or that it’s not outputting high anomaly scores at anomalous times when it should be?
No, I trained the model for a total of 4 hours, during which I ran a particular high-memory-usage application for 5 minutes at various instants of time. So basically I have patterns corresponding to those 5-minute runs.
I am simply calling the enableLearning() method during training and disableLearning() during prediction.
The model outputs an anomaly whenever I cross the memory-consumption limit that it saw during training. For example, if during training the upper limit for memory usage is 85% and I cross this limit during prediction, it shows an anomaly for all those points. But it has not learned the 5-minute pattern, otherwise it should have raised an anomaly after the app ran for more than 5 minutes.
So what parameters should be tweaked to make sure it learns that 5 min pattern?
Just leave learning on all the time. There is no need to have a training vs test phase. In HTM you are typically always learning.
How granular is your time input? And are you encoding time at all along with the memory usage? Are there large gaps in the time between 5-min periods?
Between 5-min periods, the maximum gap is just 10 minutes. Mostly I am running this 5-min application at 3-7 minute intervals. Yes, I am encoding time along with memory usage. I am recording the memory usage every second over the entire duration of 4 hours.
Actually the requirement is such that I have to record this pattern of 5 min and train and save it so that I can run prediction using this model at any later time.
If each new data point to the NuPIC model covers just 1 second, I don’t think the datetime encoder is set up by default to handle that. So that encoder may be producing identical values for hundreds of points in a row, basically injecting noise into the system. If you’re working at that velocity I’d drop the datetime encoder and just use a single scalar (or RDSE) encoder on the raw numbers.
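Dropping the datetime encoder just means removing its entries from the encoders section of the model params. The sketch below uses a plain dict with made-up key names in the style of the OPF examples; the exact field and encoder names in your config are assumptions, so adapt them to what your model_params file actually contains.

```python
# Hypothetical model_params encoder section (names are illustrative,
# modeled on the NuPIC OPF examples -- check your own config).
encoders = {
    "memory_usage": {
        "fieldname": "memory_usage",
        "name": "memory_usage",
        "type": "ScalarEncoder",
        "minval": 0,
        "maxval": 100,
        "n": 400,
        "w": 21,
    },
    "timestamp_timeOfDay": {"fieldname": "timestamp", "type": "DateEncoder"},
    "timestamp_dayOfWeek": {"fieldname": "timestamp", "type": "DateEncoder"},
}

# Drop every encoder keyed off the timestamp field, keeping only the
# scalar encoder on the raw memory-usage numbers.
for key in [k for k, v in encoders.items() if v["fieldname"] == "timestamp"]:
    del encoders[key]

print(sorted(encoders))  # -> ['memory_usage']
```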
I think it could also help us if you plotted out the data over time, maybe highlighting the portions you’re training / testing on and where you think the anomalies should be getting raised. Also couldn’t hurt to share the model_params config you’re using.
As shown in the plot, I am using this entire data for training and I had enabled the learning mode during training. These regular patterns with peaks is when I am running this 5 min application.
So during real time prediction, I just want to see how I can get an anomaly if I run the app for more than 5 min. I just disabled learning during prediction to see if model has learnt this 5 min pattern or not.
I am using this model_param file : https://github.com/numenta/nupic/blob/master/examples/opf/clients/cpu/model_params.py
Ok cool, there’s clearly some periodicity in there to work with.
Each of those periods is a 5-minute app run? If you want the model learning that pattern, it should see it start to finish a number of times, depending on how many time steps compose it. So if you’re sampling every second, then 5 minutes means 300 time steps, which theoretically means it’ll take that many repetitions before it’s learned completely. If this is right I’d suggest sampling at a lower rate, so the pattern can be learned faster.
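Down-sampling here can be as simple as averaging each block of consecutive 1-second samples into one value. A minimal sketch (the 10x factor and the fake data are just for illustration):

```python
# Average every `factor` one-second samples into a single data point,
# shortening a 300-step pattern to 30 steps at factor=10.
def downsample(values, factor=10):
    return [
        sum(values[i:i + factor]) / len(values[i:i + factor])
        for i in range(0, len(values), factor)
    ]

# Fake 5-minute run: 150 s of baseline, then 150 s of elevated usage.
one_second_samples = [70.0] * 150 + [85.0] * 150
ten_second_samples = downsample(one_second_samples)
print(len(ten_second_samples))  # 300 points -> 30 points
```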
I see in the model_params file that the ‘minval’ and ‘maxval’ are set to 0 & 100 respectively. Looking at your data, all the action is from like 65 to 85. First thing I’d do is bring the min/max in to something like those. With a min/max of 0/100, any values between say 75 & 85 will get very similar or identical encodings, depriving the SP of the granularity it needs to catch the patterns at play here.
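A quick way to see why this matters: for a ScalarEncoder, the smallest value difference that produces a distinct encoding is roughly (maxval - minval) / (n - w). The n and w values below are illustrative assumptions, not necessarily what your model_params uses:

```python
# Approximate bucket resolution of a ScalarEncoder: values closer
# together than this get near-identical encodings.
def resolution(minval, maxval, n=134, w=21):
    return (maxval - minval) / float(n - w)

wide = resolution(0, 100)   # full 0-100 range
tight = resolution(65, 90)  # range tightened to where the data lives
print(wide, tight)
assert tight < wide  # tighter range -> finer granularity per bucket
```

So with the same n and w, narrowing the range from 0-100 to 65-90 makes the encoder roughly 4x more sensitive to small changes in memory usage.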
Seems like I am getting slightly better results when I sample at 10 seconds per data point instead of 1 second per data point.
When I run the app for more than 5 min, I get some anomaly scores close to 90% for about 1 minute, and then nothing after that even though the app is still running.
Are there any other parameters that I can play with like something in tmParams or spParams ? Will encoding timestamp help ?
So in your case ‘better’ results means having a greater difference in anomaly scores between times when the app is running vs otherwise?
It makes sense to me that down-sampling is helping, since it effectively shortens and smoothens the patterns, thus making them easier for the system to learn faster.
I think 10 seconds is still too fast for timestamp to help. The datetime encoder helps with patterns over larger time frames – so if the system would benefit from knowing that it’s, say, 1:00 PM vs 6:00 AM, Monday vs Thursday or Fall vs Spring. In your case all the timestamps fall within the same couple hours on the same day – so the datetime encoder will be giving you nearly identical values at each timestamp, which again is just injecting noise.
I’d say the most important thing is still the encoders – dropping timestamp and especially tightening the minval & maxval to like 65 & 90 or so if you haven’t already.
Once you’ve done that, one thing you could play with is the ratio of permanenceInc to permanenceDec. By default they are both 0.1, but you could drop the Dec to say 0.05. This effectively turns up the learning rate, since it’ll take more to drop a link (synapse) than build one.
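Here’s a toy illustration (not the real TM learning rule, just the inc/dec arithmetic) of why a smaller decrement speeds things up: a synapse that is reinforced only every other step holds steady when inc == dec, but climbs toward full strength when dec is halved.

```python
# Toy permanence dynamics: alternate one reinforced step (increment)
# with one missed step (decrement), clamped to [0, 1].
def final_permanence(inc, dec, start=0.2, steps=40):
    perm = start
    for i in range(steps):
        if i % 2 == 0:                    # synapse matched the input
            perm = min(1.0, perm + inc)
        else:                             # synapse missed the input
            perm = max(0.0, perm - dec)
    return perm

equal = final_permanence(0.1, 0.1)    # inc == dec: hovers at the start value
skewed = final_permanence(0.1, 0.05)  # dec halved: climbs toward 1.0
print(equal, skewed)
```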
Generally though, if you want to do any param tweaking (outside the encoders), you should get to know the TM well enough to have the intuition for how any such tweak will impact the learning process. The default params there have been found to work well across many data sets, so I’d always emphasize data preprocessing (like the 10 vs 1 second sampling rate, etc) and the encoder settings.
By “better results” I mean I am getting some anomaly scores close to 90% and above when the app runs for more than 5 min, which is what I want. The requirement is just that I should be able to get an anomaly when the app runs for more than 5 min.
However, these 90% anomaly scores that I get after the app passes the 5-min mark last for just 1 minute, after which I don’t get any anomaly. Ideally I should keep getting anomalies for the entire duration past 5 minutes that the app is running.
Well I have tightened the min-max interval and it has certainly helped.
So in terms of your plot, this would mean an extended peak compared to the others right? Are there any such >5 minute app runs in your plot there?
I’d be curious to see a plot containing the training (which I think is what you showed) along with the testing (which I assume contains normal 5 minute runs and extended runs). It’d also help to see the anomaly scores plotted right below. I assume you’re already looking at something like this to guide your summaries here.
It could be the case that a >5 minute run would show high anomaly scores right around 5-6 minutes when the period would normally end and memory usage would drop back down to around 65 – tho it may not keep showing high anomalies the whole time after.
This is what the plot looks like. We can see the build-up of anomalies after the 5-min period is over. But it stops after 1 min even though the app is still running.
Ok great, tho would you split the anomaly scores into a separate plot there? With this one it’s hard to see the memory usage values since the y-axis is so stretched by the anomaly scores.
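A minimal sketch of the two-panel layout I mean, with memory usage on top and anomaly scores below on their own 0–1 axis. The series here are fake stand-ins; swap in your recorded data:

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen
import matplotlib.pyplot as plt

# Fake data: memory usage alternating each minute, one anomalous window.
timestamps = list(range(600))
memory = [70 + ((t // 60) % 2) * 15 for t in timestamps]
anomaly = [0.9 if 300 < t < 360 else 0.05 for t in timestamps]

fig, (ax_mem, ax_anom) = plt.subplots(2, 1, sharex=True, figsize=(10, 6))
ax_mem.plot(timestamps, memory)
ax_mem.set_ylabel("Memory usage (%)")
ax_anom.plot(timestamps, anomaly, color="red")
ax_anom.set_ylabel("Anomaly score")
ax_anom.set_ylim(0, 1)
ax_anom.set_xlabel("Time (s)")
fig.savefig("anomaly_plot.png")
```

With a shared x-axis you can still line up an anomaly spike with the memory trace above it, but each panel keeps its own natural scale.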
[quote=“sheiser1, post:14, topic:7388”]
With this one it’s hard to see the memory usage values since the y-axis is so stretched by the anomaly scores.
[/quote] The axis on the right side shows memory usage with range from 50-100. The memory usage is about 84-88% in this plot.
For the case when I close the app before 5 min, I get the following plot with almost no anomalies which is expected as per the training data.
So there’s a huge drop off in memory usage at one point, which could have a huge effect on the encoders’ behavior.
This plot is cool with the 2 axes, although for me at least it’d still be easiest to look at the anomaly scores on their own plot – since they have their own scale. And then have another for the memory usage values like your 1st plot, where we could clearly see the periodic patterns. That’s just if you want me to dig into the model behavior more, don’t mean to ask anything unneeded from your end.
Since Python is managed code, I find it hard to trust memory usage metrics.
These are the plots for prediction in real-time where the data point is taken every second. x-axis corresponds to timestamp for each second in real-time.
The high-memory app, as can be seen, starts at about 78. When this app runs for more than 5 min, we can see anomalous behaviour starting at about 452, ending at around 518, and dying off after that.
Does the anomaly score graph periodically go back up (for example ~every 5 minutes) if the app continues to run for a long time? Also, did you have learning enabled or disabled when you ran it for these graphs?