High anomalylikelihood values for hot gym anomaly example

mseinstein · May 12, 2017, 3:39pm

Hey everyone, I’ve been playing with HTM for the past few weeks and I seem to be getting very high anomalylikelihood values for all of my datasets. As a sanity check I re-ran the hot gym anomaly tutorial, with the Tuesdays removed, and am getting likelihood values which are consistently and significantly higher than @rhyolight was getting in the Hot Gym Anomaly Tutorial. Also, if you look at the likelihoods in the inserted image, you can see that there is a high anomaly likelihood for not just the first Tuesday, but the following ones as well. Any help would be much appreciated.

I am running nupic 0.6.0 and nupic.bindings 0.6.0. I am using the GitHub version of the requisite hot gym files, including the model params file, i.e., not running my own swarm. Or, to be more accurate, a slightly modified version of those files: the GitHub version targets nupic 0.7.0, so I changed the relevant names back to their 0.6.0 versions, e.g., HTMPrediction to CLA, tmParams to tpParams, and so on.

https://gist.github.com/mseinstein/a4775f460c0e60e824dfd688f646f3cb
https://gist.github.com/mseinstein/4448491cb328b3f1a2354cc680569fcd

rhyolight · May 12, 2017, 4:48pm

So here are my results after re-running this example on tip of master:

Definitely looks different than yours. These use all the default values of the example; I just checked it out and ran python run.py.

If you have changed any model parameters even just a bit, it could drastically change the results of the model.

rhyolight · May 12, 2017, 4:57pm

Running with Tuesdays gone now…

rhyolight · May 12, 2017, 5:11pm

Here are my results with Tuesdays gone.

With any of these results, you need to decide upon an anomaly likelihood threshold to flag anomalies. I think we typically use a value between 0.999 and 0.99999, depending on the data set.
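As a rough illustration of the thresholding step described above, here is a minimal sketch. It assumes the anomaly likelihoods have already been read into a plain list of floats; `flag_anomalies` and the sample values are hypothetical and are not part of NuPIC or of either result file in this thread.

```python
def flag_anomalies(likelihoods, threshold=0.9999):
    """Return the indices whose anomaly likelihood is at or above the threshold.

    `threshold` is a tuning choice; 0.999 to 0.99999 is the range
    mentioned in the post above.
    """
    return [i for i, value in enumerate(likelihoods) if value >= threshold]


# Made-up likelihood values, for illustration only.
sample = [0.5, 0.93, 0.9995, 0.99999, 0.42]

loose = flag_anomalies(sample, threshold=0.999)     # flags more records
strict = flag_anomalies(sample, threshold=0.99999)  # flags fewer records
print(loose, strict)
```

With a threshold this close to 1, a likelihood of 0.9 or even 0.99 does not count as an anomaly, which is worth keeping in mind when reading the plots.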

mseinstein · May 12, 2017, 8:02pm

Here is my version of Tuesdays gone using plot.ly.

So my likelihoods are still higher than yours on average, but I think your plot has a higher average likelihood than the one from the video (please excuse the poor quality screenshot below). Whereas before your likelihood would drop down to the 0.3 - 0.4 range, now it never drops below 0.5. Also, in the video your likelihood had already dropped to about 0.9 by the second Tuesday, but in your current graph the likelihood hits 0.99 every Tuesday. I assume this is due to changes in the underlying code over the past 3 years, as the old threshold was 0.9 and now you are saying you use a threshold of 0.999 - 0.99999.

Instead of trying to hunt down any possible changes in the base code I might have made, I am going to try running the code in a clean version of nupic and see what I get. Since the seeds are set in the code, can I assume that I should be able to get (almost) the exact same values as you for the anomaly likelihood?

rhyolight · May 12, 2017, 9:04pm

Good idea. They should be the same if we are both using the same code version. I ran mine at SHA d86e484. To get exactly the same version, run:

git checkout d86e484

mseinstein · May 15, 2017, 1:10pm

So I tried running with a clean version of the code. Since I haven’t compiled the nupic code from scratch in the past I started with the easier test case of just setting up a virtualenv using nupic 0.6 and nupic-bindings 0.6 from the pypi repository, while using SHA d86e484, and got the same results as I did before.

I then built nupic from source, bringing me up to nupic 0.7.0.dev0 and nupic.bindings 0.6.2.dev0, and once again got the same results as I did in the past. I then repeated this entire process on a different computer and got the same results for both builds. I guess I am left with two questions:

Why am I getting different results than you? Could it be due to differences in OS (I am using windows 7.1 x64)?
Do the differences really matter, i.e., are they reflective of some underlying flaw in my code which will give me incorrect results from here on out?

rhyolight · May 15, 2017, 4:39pm

That should not work because of breaking changes between 0.6.0 and SHA d86e484. Something seems fishy. Here is what I get if I install nupic at 0.6.0 via pip install nupic==0.6.0:

nupic 0.6.0
nupic.bindings 0.6.0

When I then run the example, I get this expected runtime error:

Traceback (most recent call last):
  File "run.py", line 31, in <module>
    from nupic.frameworks.opf.model_factory import ModelFactory
ImportError: No module named model_factory

I’d like to see the exact results CSV file so I can compare with mine, if you can post it.

[quote=“mseinstein, post:7, topic:2311”]
Why am I getting different results than you? Could it be due to differences in OS (I am using windows 7.1 x64)?
[/quote]

The only thing I can think of is that random seeds may be generated differently on different operating systems, which could affect initialization. But the overall quality of results should be similar, even if individual predictions and anomaly scores are not exactly the same.

[quote=“mseinstein, post:7, topic:2311”]
Do the differences really matter, i.e., are they reflective of some underlying flaw in my code which will give me incorrect results from here on out?
[/quote]

Big differences matter. The data plot you posted in your first post looked quite different from the 2nd plot you posted. If you are still getting the same consistently high anomaly likelihood values as you were showing in your first post, it could be a problem.

mseinstein · May 15, 2017, 6:43pm

Apologies for not being clear about what I did. I did use nupic 0.6.0, and it does initially break, but I changed the variable names back from the 0.7 version to the 0.6 version (pretty much the reverse of your WARNING: 0.7.0 breaking changes post).

rec_center_hourly_model_params.py
- Line 12: 'model': 'HTMPrediction', ==> 'model': 'CLA',
- Line 64: 'tmEnable': True, ==> 'tpEnable': True,
- Line 65: 'tmParams': ==> 'tpParams':
run.py
- Line 33: model_factory ==> modelfactory
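The renames listed above could also be applied programmatically instead of by hand. The following is only a sketch: `downgrade_params` is a hypothetical helper, not part of NuPIC, and the nested dict in the example is a toy stand-in for the real model params file. Only the three renames from the list are encoded; the module rename in run.py still has to be done separately.

```python
# Renames taken from the list above (0.7 name -> 0.6 name).
KEY_RENAMES = {"tmEnable": "tpEnable", "tmParams": "tpParams"}
MODEL_RENAMES = {"HTMPrediction": "CLA"}


def downgrade_params(params):
    """Return a copy of a nested params structure using the 0.6 names."""
    if isinstance(params, dict):
        converted = {}
        for key, value in params.items():
            new_key = KEY_RENAMES.get(key, key)
            if key == "model" and isinstance(value, str):
                # The model type is a value, not a key, so map it separately.
                converted[new_key] = MODEL_RENAMES.get(value, value)
            else:
                converted[new_key] = downgrade_params(value)
        return converted
    if isinstance(params, list):
        return [downgrade_params(item) for item in params]
    return params


# Toy structure for illustration; the real params file has many more entries.
toy_params = {
    "model": "HTMPrediction",
    "modelParams": {"tmEnable": True, "tmParams": {"cellsPerColumn": 32}},
}
print(downgrade_params(toy_params))
```

Because only names are changed and every value is copied through untouched, a conversion like this should not by itself alter the model's behaviour.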

As far as I can tell, though, these are just variable name changes, right?

[quote=“rhyolight, post:8, topic:2311”]
I’d like to see the exact results CSV file so I can compare with mine, if you can post it.
[/quote]

Here is the original rec-center-hourly_out using nupic 0.7
https://gist.github.com/mseinstein/bdd0b8980905f17bf8e8881db8c0db03

And here it is after removing the Tuesdays

https://gist.github.com/mseinstein/2065766c2176e77e95af34ca1f5bff84

rhyolight · May 15, 2017, 8:19pm

I’m not sure if this is a big deal. Here is a comparison chart.

Predictions are compared on the bottom, anomaly scores on top. There is some divergence, which could be because of changes made to the algorithms recently. We do have some regression tests that run performance tests, but they are pretty loose.
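For anyone who wants a number to go with a comparison like this, here is a minimal sketch of one way to quantify the divergence between two result files. It assumes both files have a header row and a shared numeric column; the column name `anomaly_score` and the inline CSV text are assumptions made for illustration, not data from either run in this thread.

```python
import csv
import io


def read_column(csv_text, column):
    """Parse CSV text with a header row and return one column as floats."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [float(row[column]) for row in reader]


def divergence(a, b):
    """Return (mean, max) absolute difference over the overlapping records."""
    n = min(len(a), len(b))
    if n == 0:
        raise ValueError("need at least one record in each series")
    diffs = [abs(x - y) for x, y in zip(a[:n], b[:n])]
    return sum(diffs) / n, max(diffs)


# Made-up placeholder rows, for illustration only.
run_a = "timestamp,anomaly_score\nt1,0.0\nt2,0.5\nt3,1.0\n"
run_b = "timestamp,anomaly_score\nt1,0.0\nt2,0.25\nt3,1.0\n"

mean_diff, max_diff = divergence(
    read_column(run_a, "anomaly_score"),
    read_column(run_b, "anomaly_score"),
)
print(mean_diff, max_diff)
```

A summary like this says how far apart two runs are on average, but not whether they would flag the same records, so it complements a plot rather than replacing it.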

@scott Are you concerned about divergence like this after 3000 input records between 0.6.0 and 0.7.0.dev0?

scott · May 15, 2017, 8:33pm

No, small changes are expected.

mseinstein · May 15, 2017, 10:32pm

Thanks @rhyolight and @scott. Just wanted to confirm that the base case was working for me before spending at least the next few months applying HTM to a new project.
