In HTM.Java’s real code, and in the NetworkConsistencyTest I just wrote because of this issue, I store the previous predictions and use those to compare against the current active columns.
This QuickTest.py file was written a year ago, before I had the “epiphany” about what I was doing wrong in the Network API with the inputs and outputs.
Sorry, I overlooked this. However, the **current/real** code doesn’t have this oversight. I was hoping it did, because that would have accounted for the problems.
Even though I calculate the anomaly after the call to TM.compute(), the anomaly is calculated on the previousPredictiveCells (converted to columns, of course), which get stored before the current call to TM.compute().
I will make the changes to QuickTest.py and QuickTest.java (if they’re needed there), and then you can give them a “look-over” to see if they’re correct.
Once again, HTM.Java stores the previous predictions and uses those in its anomaly calculation; here it is.
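To make the bookkeeping concrete, here is a minimal Python sketch of that pattern (illustrative only — these names and the cells-per-column constant are stand-ins, not HTM.Java’s actual code). The key point is that the prediction snapshot is taken at t-1 and used to score the input at t, before it is overwritten:

```python
# Illustrative sketch of the "previous predictions" bookkeeping.
# CELLS_PER_COLUMN is an assumed parameter for the cell -> column conversion.
CELLS_PER_COLUMN = 32

def cells_to_columns(cells):
    """Convert predictive cell indexes to their column indexes."""
    return {cell // CELLS_PER_COLUMN for cell in cells}

prev_predicted_cols = set()   # columns predicted at t-1 (empty at t=0)
scores = []
# fake (SP active columns, TM predictive cells) pairs for two timesteps
for active_cols, predictive_cells in [
    ({1, 2, 3}, {64, 96}),    # predicts columns {2, 3} for next step
    ({2, 3, 4}, {128}),
]:
    # score the CURRENT input against the PREVIOUS predictions
    unpredicted = active_cols - prev_predicted_cols
    scores.append(len(unpredicted) / len(active_cols))
    # ... tm.compute(active_cols) would run here; afterwards, snapshot
    # its predictive cells for use at the NEXT timestep:
    prev_predicted_cols = cells_to_columns(predictive_cells)
```

At t=0 nothing was predicted, so the score is 1.0; at t=1, columns {2, 3} were predicted and only column 4 is surprising, giving 1/3.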
@rhyolight Sorry I didn’t look at this right away, I stepped away for most of the day. Thank you very much, I can definitely work with your changes! I will integrate that along with the changes @alavin suggested and we’ll have an actual raw test layer to compare the entire processing chain.
Taking a brief “respite” in the interest of having “identical” behavior, and re-writing the TemporalMemory to be in sync with the new “columnSegmentWalk” approach recently merged into NuPIC…
After that, I will continue with the comparison/analysis…
I haven’t looked at it closely, but one thing to note: part of the reason we used a generator in Python is because it feels very natural to use them in Python. “yield” is already part of the language, so we’re not introducing conceptual overhead by using it.
In other words, you don’t necessarily have to use a generator. It’s probably best to use whatever approach feels best in Java. And maybe adding a generator class is the right approach – I just wanted to point out that it’s not the only option.
Cool, thanks for the cpp reference - I’ll take a look and see what “feels” best. I have to say that the generator approach is very similar to Iterators and the new streaming functionality of Java 8. Thanks for the feedback - that’s exactly the kind of response I was hoping for!
Not sure if you’ve been following everything, but I just wrote a universal random number generator, UniversalRandom (and tested it side-by-side in both languages, confirming identical output), and now I’m re-writing the Java TemporalMemory to bring it in line with the new Python TM (with the columnSegmentWalk treatment). I’ve been chronicling my progress here (I know this thread is very long, so you might not have wanted to read all of the content).
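For anyone curious how a cross-language RNG like that can work, here is a hedged sketch in Python — this is NOT the actual UniversalRandom source, just an illustration of the idea. It reproduces the well-known 48-bit linear congruential generator used by java.util.Random (multiplier 0x5DEECE66D, increment 0xB), so a Python port and a Java port seeded identically emit identical integers:

```python
# Sketch of a Java-compatible LCG (not the real UniversalRandom code).
MASK48 = (1 << 48) - 1

class CrossLangRandom:
    def __init__(self, seed):
        # java.util.Random scrambles the seed on construction
        self._seed = (seed ^ 0x5DEECE66D) & MASK48

    def _next(self, bits):
        # one LCG step, then take the top `bits` of the 48-bit state
        self._seed = (self._seed * 0x5DEECE66D + 0xB) & MASK48
        return self._seed >> (48 - bits)

    def next_int(self, bound):
        """Uniform int in [0, bound); power-of-two bounds only, to keep
        the sketch short (Java's rejection loop is omitted)."""
        assert bound > 0 and bound & (bound - 1) == 0
        return (bound * self._next(31)) >> 31
```

Because every operation is integer arithmetic masked to 48 bits, the same sequence falls out on any platform — which is exactly the property you need for side-by-side output comparison.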
I am due to finish up the TM unit tests hopefully today, and using the new UR I will be able to get the exact same output in both languages. I then plan to use the work @rhyolight did to provide a barebones Layer in Python, and compare that to the Java version of the same thing to see if we get the same anomaly scores (we should).
If they are different, I will record the output of the SP and just use a direct TM -> Anomaly assembly pumping in the same SP SDRs and see if we then get the same output. We HAVE to at that point.
Anyway, I will move up and down the levels of breakdown until I find where the “departure” is - possibly re-writing the SpatialPooler too (there have been updates since I wrote the current one in 2014/2015).
So I’m doing preparatory work to break things down - with the goal of getting identical output.
Using the new UniversalRandom RNG, I have finished the re-write of the TM and have completed 13 tests (so far) - all with exactly the same output (same cell, segment, and synapse indexes chosen from bursting, etc.)…
No worries - I know this. It’s just that there’s so much here, and I think I misspoke because some of the other stuff I mentioned is actually in other threads, not here. I really appreciate your support, Alex!
Next step: assemble @rhyolight’s handiwork with yours and my own to make a rudimentary layer in both Python and Java, and get back to testing NAB! (Probably starting tomorrow, because I’ve already been working 13 hours [since 1:00am].) Just a heads-up! (I’ll check the code in for merging tomorrow; I have to remove the old stuff and make sure the TemporalMemoryMonitorMixin stuff still works.)
Upon removal of the “old” TemporalMemory, I have more than a few tests to examine; either for fixing/adjustment, removal or rewrite - depending on the nature of the failure. This is to say that the anticipated step of swapping the old for the new is a bit more involved than previously expected.
My current goal is for the new code to be error free by Monday, and to start back in on preparing the skeleton code to be able to return to NAB testing then.
Hi - I just want to make sure I understand exactly what is supposed to go into the Anomaly.computeRawAnomalyScore() method. I can’t believe I’m asking this again, but this is all so confusing…
I’m pretty sure Subutai said that you take:

- the output of the SpatialPooler, which represents the “activeColumns” coming into the TemporalMemory, and
- the previously predictive columns (as derived from the predictive cells at t-1).
So the comparison is between the predicted input of the TM in t - 1 and the current input coming from the SP?
But now I’m not sure if you take the currently activated columns from the TM and not the TM’s input?
Also, above you probably meant “columns” where you said “cells” - I just want to make sure?
```
for each timestep t:
  1. run data through encoder
  2. run SP computation
  3. get values for raw anomaly score calculation:
     a = representation of input at time t (i.e. SP currently active columns)
     b = representation of prediction at time t-1 (i.e. columns of TM's currently predicted cells)
  4. compute raw anomaly score:
     rawScore = 1 - |a ∩ b| / |a|
  5. run TM computation
```
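The scoring arithmetic in step 4 can be sketched directly in Python. This follows my understanding of NuPIC’s computeRawAnomalyScore convention (score is the fraction of currently active columns that were NOT predicted at t-1, and 0.0 when nothing is active); treat it as a sketch, not the library source:

```python
def compute_raw_anomaly_score(active_cols, prev_predicted_cols):
    """rawScore = 1 - |a ∩ b| / |a|, where a = SP active columns at t
    and b = columns of the TM's predictive cells from t-1."""
    if not active_cols:
        return 0.0  # nothing active -> nothing to be surprised about
    overlap = len(active_cols & prev_predicted_cols)
    return 1.0 - overlap / len(active_cols)

# fully predicted input -> no anomaly
assert compute_raw_anomaly_score({1, 2, 3}, {1, 2, 3}) == 0.0
# nothing predicted -> full anomaly
assert compute_raw_anomaly_score({1, 2, 3}, set()) == 1.0
# half the active columns predicted -> 0.5
assert compute_raw_anomaly_score({1, 2, 3, 4}, {1, 2, 7}) == 0.5
```

Note that `b` only enters through its overlap with `a` — predicted columns that never became active do not raise the raw score (false positives are handled elsewhere, e.g. by the anomaly likelihood).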