Help debugging for anomaly detection


@cogmission the script you seek is here: (thank you @mrcslws).

For instructions on how to run this, refer to the readme at, particularly the second invocation for running specific NAB data files.


@alavin I saw you guys working on that - thank you so much!

Matt advised me to get some down time. I think we could all agree it’s sorely needed :slight_smile:

So I’m at the movies with my son, and will try this out later tonight or tomorrow morning.

You guys are great, thanks again!



@alavin @mrcslws

Guys - thank you very much for this!

Does NAB use the likelihood functionality, or just the raw scores? Any chance I can get this the way I’ve been looking at it, with just the raw anomaly calculation, so I have as little complexity as possible to debug? Then maybe later we can get into the likelihood stuff, but I don’t want to have to deal with that just to vet the difference between HTM.Java and NuPIC, ok?

EDIT: Sorry I saw the likelihood stuff first and wrote this without looking in the handleRecord method…

@rhyolight this doesn’t get you off the hook though :wink: LOL! I can’t do a direct thing-for-thing comparison using NAB because the Java detector is impossible to troubleshoot due to its cross-process nature… So I still need something simple that I can run side by side in Eclipse to really see what’s going on in a direct comparison…



@alavin @mrcslws @rhyolight

Ok I got Marcus’ script to run like this:

python -d numentaTMLowLevel --dataDir ~/git/NAB/data --windowsFile ~/git/NAB/labels/combined_windows_tiny.json --profilesFile ~/git/NAB/config/profiles.json --detect

But when I try to run the htmjava detector so that I can see the same output side by side, I get:

Traceback (most recent call last):
  File "", line 183, in <module>
  File "", line 68, in main
    detectorConstructors = getDetectorClassConstructors(args.detectors)
  File "", line 41, in getDetectorClassConstructors
    d : globals()[detectorNameToClass(d)] for d in detectors}
  File "", line 41, in <dictcomp>
    d : globals()[detectorNameToClass(d)] for d in detectors}
KeyError: 'HtmjavaDetector'

I’m assuming it can’t find the htmjava detector due to some kind of path error or something?

I only have Luiz’ branch checked out as NAB, so any NAB directory lookup will find the version of Luiz’ branch which includes the htmjava detector. It resides at /Users/cogmission/git/NAB which, as you can see from the above python run command, is the default relative directory for NAB.

Here’s a pic to answer any name resolution questions…


I know what’s up here. I’ll send you a reply later this morning when I have time. What command are you using to run htmjava?


I forgot to mention this is the command I tried to use to run the htmjava detector:

python -d htmjava --dataDir ~/git/NAB/data --windowsFile ~/git/NAB/labels/combined_windows_tiny.json --profilesFile ~/git/NAB/config/profiles.json --detect

Thanks Alex!


Thanks David. And where do you have the script – where are you running from?


Here’s what to run @cogmission:

  • Run from your NAB directory, i.e.:
cd ~/git/NAB
python -d htmjava --detect --windowsFile labels/combined_windows_tiny.json
# This runs the detector in "~/git/NAB/detectors/htmjava/" on the subset of data specified in the windowsFile json.
  • Run the NuPIC TM detector from your nupic.research directory, i.e.:
cd <your path to nupic.research/projects/nab_experiments>
python -d numentaTMLowLevel --detect --dataDir ~/git/NAB/data --windowsFile ~/git/NAB/labels/combined_windows_tiny.json --profilesFile ~/git/NAB/config/profiles.json
# This runs the NuPIC TM detector in "<path to nupic.research/htmresearch/algorithms/anomaly_detection/>" on the subset of data specified in the windowsFile json.



Comparing the two outputs, what strikes me immediately is the lock-step synchronicity in the “reactions” of both anomaly detectors. They both have an anomaly score of 1.0 until record 11; then at record 11 they both register “less” of an anomaly…

NuPIC == 0.825
htmjava == 0.25

I think what’s going on here is that htmjava’s RDSE is not getting initialized with the same parameters. It may not be that the “NODE” isn’t getting read, as I thought, but that the variables and setup are not aligned such that they get applied correctly - either due to naming mismatches or a mismatch in how the htmjava RDSE expects its configs.

What I see is a lack of gradations in the “resolution” of the detections. Htmjava seems to swing more “intensely” between 1.0 and 0.0 where NuPIC seems to have “finer” differences.

I would say that things are working, but that the resolution is somehow not getting set: the changes are in lockstep synchronicity, they are just vastly different in magnitude.

These are just my initial observations…
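A side note on where that resolution number may come from: NAB-style runners typically pre-scan the data file and derive the RDSE resolution from the observed min/max. Here is a rough sketch of that derivation; the constants (numBuckets=130, minResolution=0.001) are assumptions based on NuPIC's common defaults, not values verified against this runner:

```python
def derive_resolution(min_val, max_val, num_buckets=130, min_resolution=0.001):
    """Sketch: spread the pre-scanned value range of the data file across a
    fixed number of encoder buckets, with a floor on the resolution.
    num_buckets and min_resolution are assumed defaults, not confirmed."""
    return max(min_resolution, (max_val - min_val) / float(num_buckets))

# With these made-up inputs the result is about 1.58, in the ballpark of
# the 1.5825004536384615 seen in the output above.
derive_resolution(0.0, 205.725)
```

If the Java side receives a different min/max (or never receives them), its resolution, and therefore its bucketing, would diverge from NuPIC's even with identical TM parameters.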

I posted these files; please let me know what you think:

The “anomaly_score” column seems to hold the same value in both files until line #606, where they both change for the first time…

They both have 0.0301029996659 until line #606, when they both change, but to different values.


It would be best to focus on the raw scores, not the “anomaly_score” column, leaving the anomaly likelihood calculation out of it.
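For reference, the raw score being suggested here is the per-record anomaly score NuPIC computes from column overlap, before any likelihood post-processing. A minimal sketch of that calculation:

```python
def raw_anomaly_score(active_columns, prev_predicted_columns):
    """Fraction of currently active columns that were NOT predicted at the
    previous timestep. This mirrors NuPIC's raw anomaly score; the likelihood
    column is a separate statistical post-processing step on top of it."""
    active = set(active_columns)
    if not active:
        return 0.0
    unpredicted = active - set(prev_predicted_columns)
    return len(unpredicted) / float(len(active))

# 2 of the 4 active columns were unpredicted, so the score is 0.5.
raw_anomaly_score({1, 2, 3, 4}, {3, 4, 5})
```

If the two detectors agree on this number record by record, the encoder and column-level behavior match; the likelihood calculation only transforms this signal afterward.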



Yes, but what can we deduce from the comparison of these outputs? What insights can Numenta give me about how the Java TM is performing, or about possible configuration differences? Anything?

This is Numenta’s territory, not mine - have at it. :slight_smile:

Also, from the meeting, you or Luiz were going to look into what configurations HTM.Java actually has (the RDSE and the other params), and whether they are actually being set on the RDSE? I don’t believe they are, because the variable names don’t match up. But I’m at a loss for how to debug this, since I can only run it directly and can’t debug across the Python runner, its pre-scanning of the data, and its passing of the min/max values to the Java side. I really think this should happen on the Java side so we can debug it, rather than being hidden away in the Python where we can’t actually manipulate the JSON node preparation or the scanning of the data to derive the min/max values.

I at least determined that the direct loading of HTMModel is broken (because it in fact loads from a file and not a pre-manicured JSON node), so Luiz needs to guarantee that both logic paths load the same way so that we can see what’s going on… This is why I mistakenly thought that the configs were never getting read: that Python path of logic loads a different way than the path which loads the config from a file.


@alavin @lscheinkman

What I mean to say is that I’m interested in your analysis of the output, in terms of how the configuration might affect the differences.

Mainly because I see things changing in lock-step synchronization, which indicates to me that there is no algorithmic or network-compositional problem, and nudges me toward some form of misconfiguration as the likely cause.


@alavin @lscheinkman @rhyolight,

I was able to print out the actual properties used by the Java RDSE during execution…

Does this look right?

  minIndex: 500
  maxIndex: 500
  w: 21
  n: 400
  resolution: 1.5825004536384615
  offset: null
  numTries: 0
  name: [1.5825004536384615]
  buckets : 
  [ 500 ]: [194, 38, 160, 390, 151, 84, 104, 340, 243, 255, 159, 358, 101, 355, 249, 14, 100, 19, 224, 369, 269]
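For context on those numbers: in NuPIC's RDSE the bucket indices are centered at maxBuckets/2 (1000/2 = 500 by default), and the offset is set to the first value encoded. So minIndex == maxIndex == 500 with a single bucket is what you'd expect after encoding only one distinct value; the offset: null alongside an existing bucket may be worth checking, since NuPIC sets the offset on the first encode. A sketch of the index math, assuming HTM.Java mirrors NuPIC's scheme (bucket_index is a made-up name):

```python
def bucket_index(value, offset, resolution, max_buckets=1000):
    """Sketch of NuPIC's RDSE bucket mapping: indices are centered at
    max_buckets/2, and the first value encoded becomes the offset,
    which is why a fresh encoder's only bucket sits at index 500."""
    if offset is None:
        offset = value  # NuPIC sets the offset to the first value encoded
    return max_buckets // 2 + int(round((value - offset) / resolution))
```

With the resolution above (1.5825…), a value 3.165 above the offset would land two buckets away, at index 502; a range of values would spread buckets on both sides of 500.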


@alavin @lscheinkman

Here’s the comparison of JSON parameters passed in, to HTM.Java’s parameters…

It does look like they’re getting set - so the only outstanding question is the RDSE above?

JSON Params passed in:



Printout of HTM.Java Connections

	Spatial: {
	Temporal: {
	Other: {
		fieldEncodings:{value={fieldName=value, fieldType=float, resolution=1.5825004536384615, encoderType=RandomDistributedScalarEncoder}, timestamp={fieldName=timestamp, formatPattern=YYYY-MM-dd HH:mm:ss, fieldType=datetime, encoderType=DateEncoder, timeOfDay='21':9.49}}


@alavin @lscheinkman @rhyolight


Is this alright to do?

     if (t1.getFeedForwardSparseActives() == null || t1.getPreviousPredictiveCells() == null) {
         return t1.anomalyScore(1.0);
     }

…or should this be some other return value if there aren’t any predictions?
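Returning 1.0 there matches NuPIC's raw-score convention: with nothing predicted yet, every active column counts as unpredicted. In fact, if the previous-prediction set simply defaults to empty, the general formula already yields 1.0 and no special case is needed. A tiny check of that equivalence (made-up values):

```python
# With no prior prediction, treat the predicted set as empty; then the
# general formula |active - predicted| / |active| already yields 1.0,
# so returning 1.0 explicitly (as in the Java snippet above) is consistent.
active = {3, 7, 11}
predicted = set()  # no predictions exist at the first timestep
score = len(active - predicted) / float(len(active))
assert score == 1.0
```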


@alavin @rhyolight


I updated the script myself. Can you guys tell me if I’m making any Pythony mistakes here?: (check out the few lines at #'s 73 to 79)



@cogmission I’ll take a look later this morning…


Pythonic issues aside, here are some problems with the script:

  • lines 62 and 64 should use self.encoder:
    62: encoding = self.encoder.encode(value)
    64: bucketIdx = self.encoder.getBucketIndices(value)[0]

  • The TM compute step should come after the anomaly score calculation. We want to compare the region’s representation of the current value (i.e. activeCols) against what the region predicted from the last timestep (i.e. prevPredictedColumns). I don’t see you doing this. This is fundamental to how anomaly detection with HTM works.

  • Why do anything with this “anomaly computer”? Your lines 71-86 should ==

  • Okay one Pythonic thing :stuck_out_tongue:: line 71 is odd, better to do activeCols = set(self.spOutput.nonzero()[0].tolist())
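The ordering point in the second bullet can be sketched as a loop. ToyTM and run_records below are made-up stand-ins rather than NuPIC or HTM.Java APIs; the point is only the order of operations: score the current active columns against the previous timestep's prediction first, then advance the TM and stash its new prediction.

```python
class ToyTM:
    """Stand-in for the TM: it just 'predicts' that the current columns
    will repeat. A real TM forms predictions from learned sequences."""
    def __init__(self):
        self.predictive = set()

    def compute(self, active_cols):
        self.predictive = set(active_cols)

def run_records(records, tm):
    """For each record: (1) score current activity against the PREVIOUS
    timestep's prediction, (2) only then run the TM compute step and
    save its prediction for the next record."""
    scores = []
    prev_predicted = set()
    for active in records:
        score = len(active - prev_predicted) / float(len(active))
        tm.compute(active)
        prev_predicted = set(tm.predictive)
        scores.append(score)
    return scores

# First record is fully anomalous (no prior prediction), a repeat scores
# 0.0, and a surprise scores 1.0 again.
run_records([{1, 2}, {1, 2}, {3, 4}], ToyTM())
```

Doing the TM compute before the score comparison instead would compare the current activity against a prediction that already includes the current input, flattening the anomaly signal.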


@cogmission David, you asked me to update your script so that it reads its input from one of the NAB scalar anomaly files, so I did that here:

I updated some of the params, and added a date encoder. Also, I am not sure how to hook up the classifier, so I left that part commented out.



Wait! What? What?

Yes that totally makes sense! In HTM.Java I store the previous prediction; it seems I got confused here.