Help debugging HTM.java for anomaly detection

Alex,

Since the calculation is so simple, you’re right that I can just add the basic calculation inline to the file. Eventually I may need an example of the “likelihood” code (and of how to incrementally add entries to it and work with the weighted average et al.), but for now I guess we don’t have to complicate things.
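
For reference, the basic calculation here is, as I understand it, the raw anomaly score NuPIC computes over columns: the fraction of currently active columns that were not predicted at the previous step. A minimal sketch in Java (the class and method names are mine, not NuPIC’s or HTM.java’s):

```java
import java.util.HashSet;
import java.util.Set;

// Raw anomaly score over columns: 1 - |active ∩ previouslyPredicted| / |active|.
// A score of 0 means perfect prediction; 1 means nothing active was predicted.
public final class AnomalyScore {
    public static double compute(Set<Integer> activeColumns,
                                 Set<Integer> prevPredictedColumns) {
        if (activeColumns.isEmpty()) {
            return 0.0; // nothing active, nothing to be surprised about
        }
        Set<Integer> hits = new HashSet<>(activeColumns);
        hits.retainAll(prevPredictedColumns); // correctly predicted columns
        return 1.0 - (double) hits.size() / activeColumns.size();
    }
}
```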

Thanks,
David

@lscheinkman can you try this HTMModel.java? It bypasses the Observer stuff and just uses a naked Layer. I can’t test it on MacOS due to some NuPIC-related issue.

We’ve identified that Layers with Observers diverge in their previousPredictedColumns from those without; this definitely happens when perfect prediction breaks down. It might be the cause of your issue. There’s no need to use the Observer to do NAB, so hopefully this version works better.

@fergalbyrne

This Gist (posted last week) contains code that compares the entire Network (even in a threaded state) to code using NO NETWORK, LAYER, OR OBSERVABLES WHATSOEVER, AND THE OUTPUT IS EXACTLY THE SAME.

Sorry for the caps, but I don’t believe you read or ran this when I mentioned it earlier in this conversation.

Again. That code shows NAPI vs. Nothing and the output is exactly the same.

You can’t compare the Layer fields because those are constantly being overwritten. I’m not certain that comparing the Observable output to those Layer fields is a valid test.

I’m interested to see the output from your HTMModel code, as that may be helpful in diagnosing the problem. Also we should try using absolutely no NAPI at all (and just piece together the algorithms) - and compare that as well.

Please let me thoroughly analyze the code before drawing conclusions?

@fergalbyrne @lscheinkman @alavin,

I updated the Gist test to show the active cells and the predictive cells in addition to the other fields being compared.

Again…

This comparison is between the Full NAPI and just the RAW algorithms (no Network or Layer).

The output is exactly the same. This proves that there is consistency in the output. @fergalbyrne, I dumped the cell content of your entire interaction test, and the cells are not the same.

Upon further inspection of the Cell output, your claim that the NAPI output and the Layer fields are the same until you introduce an anomaly doesn’t appear to be accurate. The fact is that I don’t see them ever being the same.

This just proves that you can’t rely on the Layer fields which are constantly getting overwritten. I’ll probably remove those fields and force the user to use the Observer in non-alpha code. If this experience has taught me anything, it’s taught me that offering methods which aren’t guaranteed to be “sane” is not a good thing.

Here’s my sample, from your “interaction test”, of the cell, segment, and column content. It shows that there is never any consistency between the Observer and the Layer (whereas the Gist above comparing NAPI vs. nothing, which anyone can run themselves, shows absolute equality):


Cycle: 4209
-----------
**From the Observer:**
activeSegments = [27, 47, 57, 59, 58, 62, 60, 61, 126, 125, 127, 128, 28, 37, 50, 85, 82, 98, 94, 95, 92, 88, 93, 87, 89, 91, 90, 104, 100, 102, 105, 101, 103]
predictiveCells = [11, 42, 9, 43, 32, 101, 69, 76, 49, 20, 74, 107, 44, 102, 52, 103, 22, 6, 112, 45, 117, 16, 53, 57, 72, 7, 96, 31, 67]
activeCells = [21, 75, 14, 17, 23, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113]
successfullyPredictedColumns = [3, 12, 2]
winnerCells = [21, 75, 14, 17, 23, 50, 57, 106, 111]
sdr = [21, 75, 14, 17, 23, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113]

**From the Layer:**
activeSegments = [22, 23, 177, 178, 139, 140, 138, 144, 142, 141, 143, 27, 32, 97, 95, 98, 99, 100, 101, 96, 94, 89, 91, 93, 90, 92, 21, 106, 105, 102, 103, 104, 176, 179, 180, 175, 107, 109, 111, 112, 108, 110, 113, 153, 155, 151, 156, 152, 154, 48, 49, 46, 148, 146, 147, 28, 18, 20, 132, 134, 136, 63, 59, 61, 62, 58, 60]
predictiveCells = [104, 111, 53, 73, 43, 48, 15, 117, 102, 56, 110, 6, 77, 49, 17, 54, 107, 108, 114, 47, 12, 52, 22, 72, 106, 58, 8, 44, 21, 109, 30, 68, 11, 45, 96, 42, 10, 99, 34, 70, 55, 46, 50, 13, 51, 97, 32, 67, 7]
activeCells = [74, 16, 19, 12, 20, 76, 17, 13, 22, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113]
successfullyPredictedColumns = [12, 2, 3]
winnerCells = [74, 16, 19, 12, 20, 76, 17, 13, 22, 52, 59, 105, 112]
sdr = [74, 16, 19, 12, 20, 76, 17, 13, 22, 48, 49, 50, 51, 52, 53, 54, 55, 56, 57, 58, 59, 102, 103, 104, 105, 106, 107, 108, 109, 110, 111, 112, 113]


Cycle: 4210
-----------
**From the Observer:**
activeSegments = [85, 82, 96, 98, 94]
predictiveCells = [103, 22, 48, 102, 6]
activeCells = [76, 49, 20, 74, 107, 102, 52, 103, 22, 112, 16, 53, 72]
successfullyPredictedColumns = [12, 8, 3, 17, 18, 2]
winnerCells = [76, 49, 20, 74, 107, 102, 52, 103, 22, 112, 16, 53, 72]
sdr = [76, 49, 20, 74, 107, 102, 52, 103, 22, 112, 16, 53, 72]

**From the Layer:**
activeSegments = [148, 147, 106, 105, 102, 103, 104, 97, 95, 98, 99, 100, 101, 96, 139, 140, 138, 144, 142, 141, 143]
predictiveCells = [55, 50, 106, 58, 8, 44, 53, 49, 17, 54, 107, 108, 114, 47, 43, 48, 15, 117, 102, 56, 110]
activeCells = [104, 111, 53, 73, 48, 15, 102, 110, 77, 49, 17, 107, 108, 12, 52, 22, 72, 106, 21, 109, 50, 13, 51]
successfullyPredictedColumns = [17, 18, 8, 12, 2, 3]
winnerCells = [104, 111, 53, 73, 48, 15, 102, 110, 77, 49, 17, 107, 108, 12, 52, 22, 72, 106, 21, 109, 50, 13, 51]
sdr = [104, 111, 53, 73, 48, 15, 102, 110, 77, 49, 17, 107, 108, 12, 52, 22, 72, 106, 21, 109, 50, 13, 51]

Fergal, all I’m trying to say at this point is that the Layer can’t be used, and was never intended to be used, as a de facto output. We need to continue the analysis from another direction. Running the code in HTMModel without the Observer may be an interesting test, but I think we need to run things without the NAPI altogether to really test the theory of Observable interference?

It shows you precisely what the inputs and outputs are for each region, and also how to see any data structure you want from the regions.

Then I would be rewriting all of nupic into your test script.

If you want more info than was specced out in your test script, you can inspect TM state with self.tm.getSelf()._tfdr.getX(), where X is any of the getter methods defined in the algorithm implementation, for example getNumSegmentsInCell().


I ran @fergalbyrne’s gist on NAB (with the most recent NuPIC) and got the same results as the code currently in the HTM.java detector PR.


Thanks @alavin for running the tests. I’ve written an RNG in Go that mimics the output from numpy. I’ll open it up this week. Using the numpy-mimicking RNG, we may be able to isolate what the problem is here.
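
For context, numpy’s legacy RandomState is built on the MT19937 Mersenne Twister, so a numpy-mimicking RNG has to reproduce that generator bit for bit. Here is a minimal sketch of the core algorithm (in Java rather than Go, to match HTM.java; note that numpy also supports array-based seeding via init_by_array, which this sketch omits):

```java
// Minimal MT19937 (32-bit Mersenne Twister) sketch: single-integer seeding
// plus the standard twist and tempering steps.
public final class MersenneTwister {
    private static final int N = 624, M = 397;
    private static final int MATRIX_A = 0x9908b0df;
    private static final int UPPER = 0x80000000, LOWER = 0x7fffffff;
    private final int[] mt = new int[N];
    private int mti = N; // forces a twist on the first draw

    public MersenneTwister(int seed) {
        mt[0] = seed;
        for (int i = 1; i < N; i++) {
            // Knuth-style initialization; int arithmetic wraps mod 2^32
            mt[i] = 1812433253 * (mt[i - 1] ^ (mt[i - 1] >>> 30)) + i;
        }
    }

    public long nextUInt32() {
        if (mti >= N) { // regenerate the state block ("twist")
            for (int i = 0; i < N; i++) {
                int y = (mt[i] & UPPER) | (mt[(i + 1) % N] & LOWER);
                mt[i] = mt[(i + M) % N] ^ (y >>> 1);
                if ((y & 1) != 0) mt[i] ^= MATRIX_A;
            }
            mti = 0;
        }
        int y = mt[mti++];
        y ^= y >>> 11;                 // tempering
        y ^= (y << 7) & 0x9d2c5680;
        y ^= (y << 15) & 0xefc60000;
        y ^= y >>> 18;
        return y & 0xffffffffL;        // as unsigned 32-bit value
    }
}
```

A handy sanity check: the C++11 standard specifies that, seeded with the default 5489, the 10000th output of MT19937 is 4123659995.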

Hi Alex,

Please forgive my tone; I was a bit unwound in my frustration and I believe a bit of it bled into my responses to you. You have done nothing but offer your time, concern, and effort, and I appreciate it.

That being said. (<— Never trust a sentence that begins like that) :slight_smile: I think merely adding in the Anomaly code at that line would be much simpler than what you wound up giving me (with the whole thing converted to Network API use)?

Anyway, I believe I can add it myself, but one of the points of having someone from Numenta do it was to get confirmation of exactly which field I take from the TM to hand off to the Anomaly code (the actual field, not some value held in a Network API field with a slightly different name than the TM variable the data comes from - I need to see the actual field that the value is taken from).

Maybe I’ll add the line myself and I can have you confirm that I’m taking the correct field from the TM to give to the anomaly code? That’s an idea?

Cheers,
David


Ok, now this causes me to question whether the NAPI ever had a problem in the first place (we did uncover a few problems, but they may not have been as far-reaching as we assumed)? On that note, I will provide a version of HTMModel that uses the raw Java algorithms, and we’ll run that and see what output we get?


That’s a good plan.


@lscheinkman @alavin

Need a little help. Anybody know where the “params JSON file” is that I pass into the HTMModel(JsonNode) constructor? (where it is found in the NAB repo?)


@cogmission the json is in nupic here, which is pulled into the detector initialization here.


@alavin @lscheinkman,

I see the HTMModel expects a certain data file with the inputs “fieldName” and “timeOfDay”? So it runs only one data file apparently? Can you tell me which input file in the NAB data files it expects? (looks a lot like HotGym params) (I need to be pointed to the file’s exact location though)

Thanks,
David

P.S. I found the problem… I’m working out the testing of it now…

These are keys within the sensor params that I linked in the previous message. In the test code I sent (__setupEncoder(encoderParams)), I show how to take the params from that json and set them up for use in NAB.


In NAB, an instance of an algorithm is run on one CSV data file at a time, where the headers are always “timestamp” and “value”. The sensor params are set up to identify the timestamp and value fields and encode them accordingly.
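
To make the row shape concrete, here is a hedged sketch of parsing NAB-style data (the class and method names are hypothetical, not NAB’s actual reader):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative parser for NAB-style CSV data, where the header line is
// always "timestamp,value". Names here are hypothetical, not NAB code.
public final class NabCsv {
    public static List<String[]> parseRows(String csv) {
        List<String[]> rows = new ArrayList<>();
        String[] lines = csv.split("\\R");         // split on line breaks
        for (int i = 1; i < lines.length; i++) {   // skip the header line
            if (lines[i].isEmpty()) continue;
            rows.add(lines[i].split(",", 2));      // [timestamp, value]
        }
        return rows;
    }
}
```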

Alex, I need the input file now, not the config… (File name and location) :wink: I couldn’t find the input file name anywhere, please just point me to it so I can make progress?

There’s nothing in either of those links that tells me the project or repo, path, and filename of the input file (CSV file)? Please, please let me get this done!

Also, thank you for responding on your Sunday off! :slight_smile:

I’m not running the entire NAB, that might be what’s confusing you? I need to point the HTMModel to the exact file so I can verify it runs - then I’ll hand it to you or Luiz to run the NAB because I don’t have NuPIC set up yet.

If you mean the input CSV data files, they’re in the NAB data dir: https://github.com/numenta/NAB/tree/master/data. For example, we use this machine sensor data file often in presentations.

When you do what we do there are no days off :wink: – I like it that way.


@alavin

You’re a dedicated man! I knew it! Like me!

Oh I see, all the files have 2 parameters! Great, thanks!

@alavin @lscheinkman,

So the problem was that the configs were never getting set because the JSON node wasn’t parsed. We wound up with all models having bogus configurations, leading to performance <= random.

Here’s a gist for the fixed HTMModel.java and the best_single_metric_anomaly_params_tm_cpp.json files…

Things I changed were:

=== HTMModel.java

Line #240

  1. The path to the modelParams parameter JSON node was missing the modelConfig parent node entry…

Changed:

JsonNode modelParams = params.path("modelParams");

to…

JsonNode modelParams = params.path("modelConfig").path("modelParams");
  2. Added a logic branch to install a default “resolution” for the RDSE if none is specified in the config file (which it isn’t); resolution now defaults to 0.1.

  3. The model must be run with the command line parameter "-s 1" <— make sure you do this or it will bomb. The “s” parameter is described in your main() method as the number of lines to skip.

  4. This isn’t crucial, but I added a “finally” block to close the resources, and a close() call before the exception thrown on line #350.

I still think the config should have “n” and “w” set for the RDSE; there’s no mention of those parameters in the config file, but I use “n=25” and “w=3” for my HotGym example (not sure if those are best or not). Should these be left unset?
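
For what it’s worth, the resolution-defaulting branch amounts to something like the following sketch, with a plain Map standing in for the parsed JsonNode (the helper name is hypothetical, not the actual HTMModel code):

```java
import java.util.Map;

// Hypothetical helper mirroring the defaulting logic: if the RDSE's
// "resolution" is absent from the parsed encoder params, fall back to 0.1.
public final class EncoderDefaults {
    static final double DEFAULT_RESOLUTION = 0.1;

    public static double resolution(Map<String, Object> encoderParams) {
        Object r = encoderParams.get("resolution");
        return (r instanceof Number) ? ((Number) r).doubleValue()
                                     : DEFAULT_RESOLUTION;
    }
}
```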

=== best_single_metric_anomaly_params_tm_cpp.json

These must be changed…

  1. Line #33 “fieldName”: “timestamp”
  2. Line #40 “fieldName”: “value”

Please try running these when you get a chance… Please don’t forget to run with “-s 1” so that the first line of each data file is skipped. This is needed because the PublisherSupplier is already set up with the configured headers, so the code doesn’t require the header to be in the files.
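
To illustrate why the skip matters: if the header row is fed in as data, parsing its value column blows up. A small sketch (names and shape here are illustrative only, not the HTMModel code):

```java
import java.util.ArrayList;
import java.util.List;

// The PublisherSupplier already carries the configured header, so the file's
// own header row must be skipped, or the first "record" seen by the model is
// literally "timestamp,value" and parsing its value column throws.
public final class HeaderSkip {
    public static List<Double> parseValues(List<String> lines, int skipCount) {
        List<Double> values = new ArrayList<>();
        for (int i = skipCount; i < lines.size(); i++) {
            String[] cols = lines.get(i).split(",", 2);
            values.add(Double.parseDouble(cols[1])); // throws on a header row
        }
        return values;
    }
}
```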

Please ignore the last post. I’m trying to work backwards and remember the cause of the exceptions I got while debugging, so I can map each cause, and the remedy I used, to its exception… But that wasn’t the cause of the one I posted… Still working to document everything…