Help debugging for anomaly detection



Great! Thanks a lot, I saw that you had done this earlier!



Thank you for the explanation @cogmission, I’ll take a look tonight when I have time.

Now that you’ve corrected this, have you tried running the detector on NAB?



Thanks, having the additions to that file will help me analyze the problem. When I said I wasn’t using the “predictiveColumns”, I was referring to the “control” file, not the NAPI. The NAPI was doing it right from the beginning. This confirmed that the NAPI was doing everything correctly, just like the “control” file, which was a simple chain of algorithms.

Therefore, the NAB problem lies elsewhere. That’s why I want a simple, officially vetted Python version of the chain of algorithms, so I can compare the entire thing; maybe there is a problem in one of the algorithms, or there is a problem in the NAB detector. But I need your additions to the Python example file to begin narrowing things down.

Thanks for your help Alex!



@alavin, @rhyolight,

Any word on adding the Anomaly calculation to that file? Who is actually going to do this - you, Alex, or you, Matt? :stuck_out_tongue:

I don’t want to rush anyone; I just want to know what’s going on. I know both of you are stretched thin and very busy - I just want to know the status, that’s all.

@fergalbyrne I’m going to post (edit) the gist with the comparison code so you can see there’s no discrepancy in the NAPI. To summarize, to be clear: running the data through the raw algorithms and through the NAPI produces exactly the same output, so we can consider that proof that the NAPI doesn’t impact the process - though there may be a problem elsewhere (in the original Java Anomaly code, or in the detector). If you’ve been following this, I am going to use a bare-bones Python version of my test code to compare and see where any discrepancies lie. Given this information, let me know from what angle you want to continue your investigation, if at all. Also, clean-up of the code is still very welcome, if that’s all you want to do.

Link to most recent Gist revision

Thanks guys!


Like I said earlier, I can’t get to this until next week. I can probably look into it then. I need to write some code against the NuPIC Network API anyway, so this will be a good chance.


Thank you Matt. I didn’t mean to rush you; it’s just that Alex wrote, after you said this, that he would look at it that night… So I got confused as to who was going to do it and when. No problem, I appreciate your help! :slight_smile:


I plan on taking a look at this, but haven’t had a chance to come up for air this week. Hopefully later today :wink:


Ok, you guys can duke it out in the corner… just don’t hold me responsible! :stuck_out_tongue:

No really, it’s fine whoever can get to it and when… I really appreciate it guys!


Hey Guys,

I found a bug that could very well be causing the errors. I added a more stringent test to the test class whose output I had said was identical. Instead of merely printing the results as they are produced, I collected a sample from both the control code and the NAPI code. I then did a DeepEquals on the samples and discovered that the activeColumns (the feedForwardActiveColumns field) were later overwritten! As it turns out, that variable in the Layer class was being reused internally to set a value in the ongoing calculations!

So thanks to this rigid test, I was able to catch it and fix it. Taking samples from both the control and the NAPI-generated output now leads to identical (non-overwritten) results. This could have been the problem, and it was obscured because the immediate printout was clean.

I’ve now submitted a PR and added the NetworkConsistencyTest, which demonstrates that all fields are now isolated, nothing is overwritten, and no “reused pointers” occur.
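For anyone hitting a similar issue, the failure mode is easy to reproduce in miniature. This is a hedged sketch (class and variable names are mine, not from HTM.Java): a layer whose compute method returns its live internal buffer lets later cycles silently rewrite samples a caller already collected, while a defensive copy preserves them.

```python
class LeakyLayer(object):
    """Sketch of the bug: compute() hands callers the live internal buffer."""

    def __init__(self, n):
        self._active = [0] * n  # working buffer, mutated in place every cycle

    def compute(self, bits):
        for i in range(len(self._active)):
            self._active[i] = 1 if i in bits else 0
        return self._active  # bug: later cycles overwrite what callers hold


class SafeLayer(LeakyLayer):
    """Fix: snapshot the buffer before handing it out."""

    def compute(self, bits):
        return list(super(SafeLayer, self).compute(bits))


leaky, safe = LeakyLayer(8), SafeLayer(8)

sample = leaky.compute({1, 2})   # caller records "columns 1 and 2 active"
leaky.compute({5})               # next cycle silently rewrites that sample

snapshot = safe.compute({1, 2})  # defensive copy
safe.compute({5})

print(sample)    # [0, 0, 0, 0, 0, 1, 0, 0] -- overwritten!
print(snapshot)  # [0, 1, 1, 0, 0, 0, 0, 0] -- preserved
```

A DeepEquals on collected samples catches exactly this, where printing each result as it is produced never would.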

See here:

@lscheinkman or @alavin, if you could update your HTM.Java clones and test this, it would be appreciated. I want to see if the NAB runs better with this fix. I can’t run the NAB until NuPIC is building again, and it will still take me some time because I have to reinstall NuPIC to get the latest code. So if one of you could run this, I’d appreciate it, if you have time.



I ran NAB with the latest ‘’ from master (e3c50a2) but unfortunately I got the same results.

Final score for 'htmjava' detector on 'reward_low_FP_rate' profile = 3.74
Final score for 'htmjava' detector on 'reward_low_FN_rate' profile = 12.80
Final score for 'htmjava' detector on 'standard' profile = 9.29




Ok, we’ll keep digging! :wink:


@cogmission @rhyolight I’m having issues pushing to the gist, so here’s the reworked script:

"""
Created on Feb 8, 2015

@author: David Ray
"""

import numpy as np
import pprint

from nupic.frameworks.opf.common_models.cluster_params import (
    getScalarMetricWithTimeOfDayAnomalyParams)
from nupic.frameworks.opf.modelfactory import ModelFactory


class Layer(object):

    """ Makeshift Layer to contain and operate on algorithmic entities """

    def __init__(self, networkInstance):

        self.networkInstance = networkInstance

        self.sensor = self.networkInstance._getSensorRegion().getSelf()
        self.sp = self.networkInstance._getSPRegion().getSelf()
        self.tm = self.networkInstance._getTPRegion()

        self.theNum = 0

    def input(self, value, recordNum, sequenceNum):
        """ Feed the incremented input into the Layer components """

        if recordNum == 1:
            recordOut = "Monday (1)"
        elif recordNum == 2:
            recordOut = "Tuesday (2)"
        elif recordNum == 3:
            recordOut = "Wednesday (3)"
        elif recordNum == 4:
            recordOut = "Thursday (4)"
        elif recordNum == 5:
            recordOut = "Friday (5)"
        elif recordNum == 6:
            recordOut = "Saturday (6)"
        else:
            recordOut = "Sunday (7)"

        if recordNum == 1:
            self.theNum += 1
            if self.theNum == 100:
                print "bl"  # breakpoint hook

            print "--------------------------------------------------------"
            print "Iteration: " + str(self.theNum)

        print "===== " + str(recordOut) + " - Sequence Num: " + str(sequenceNum) + " ====="

        # Run through network model
        inputData = {"value": value}
        result =
        rawScore = result.inferences["anomalyScore"]

        # Print out some info for the...

        # ... encoder
        print "RDSEncoder Input = ", value
        print "RDSEncoder Output = "
        print "\t", self.sensor.getOutputValues('sourceEncodings')[0].nonzero()[0]

        # ... spatial pooler
        print "SpatialPooler Output = "
        print "\t", self.sp._spatialPoolerOutput.nonzero()[0]

        # ... temporal memory
        print "TemporalMemory Output (active cells) = "
        print "\t", self.tm.getOutputData("bottomUpOut").nonzero()[0]
        print "TemporalMemory correct predictions (active cells that were previously predicted) = "
        print "\t", self.tm.getOutputData('predictedActiveCells').nonzero()[0]

        # ... anomaly score
        print "Anomaly raw score = ", rawScore


def _createNetwork(minVal, maxVal, verbosity=1):
    # Create model params
    modelParams = getScalarMetricWithTimeOfDayAnomalyParams(
        metricData=np.array([]),
        minVal=minVal,
        maxVal=maxVal,
        tmImplementation="cpp")

    if verbosity > 0:
        print "Model params:"
        pprint.pprint(modelParams)

    # Setup encoder params for this test data
    __setupEncoder(
        modelParams["modelConfig"]["modelParams"]["sensorParams"]["encoders"])

    model = ModelFactory.create(modelParams["modelConfig"])
    model.enableInference({"predictedField": "value"})

    return model


def __setupEncoder(encoderParams):
    encoderParams["timestamp_dayOfWeek"] = None
    encoderParams["timestamp_timeOfDay"] = None
    encoderParams["timestamp_weekend"] = None
    encoderParams["value"] = encoderParams.pop("c1")
    encoderParams["value"]["fieldname"] = "value"
    encoderParams["value"]["name"] = "value"


def _runThroughLayer(layer, recordNum, sequenceNum):
    # The metric value fed in is the day-of-week number itself
    layer.input(recordNum, recordNum, sequenceNum)


if __name__ == '__main__':

    # Create a network model that expects metric data in the range 1-7.
    net = _createNetwork(1, 7, verbosity=1)

    layer = Layer(net)

    i = 1
    for x in xrange(2000):
        _runThroughLayer(layer, i, x)
        i = 1 if i == 7 else i + 1



Thank you, I’ll work on it first thing tomorrow!


Hi Alex,

First, let me say, “Thank You”, for all of your hard work!

The idea was to avoid container constructs like Model and ModelFactory and to show what goes into and out of each algorithm (no support frameworks). So, starting around line 84 in the file, could you physically add the necessary code to feed the output of the TemporalMemory into the Anomaly code, and provide printouts of the input and output data for it?

We want to see:

  • What exactly goes into the Anomaly code, and what exactly comes out of it?
  • What transformations of the data are necessary (if any) to take the output from the TemporalMemory and pass it into the Anomaly code?
  • The structure kept the same as the original file, so that it is the mirror image of the file. (I apologize, you didn’t know that existed.)
  • So, if possible: on line 12, please add the direct import of the Anomaly code; on line 24, please declare the reference variable to the Anomaly code; and on line 85, please begin transforming the TM output (with printouts) and feed it into the Anomaly class, keeping the same style of documentation - and print out the Anomaly result.
  • Nothing hidden, everything simple and inlined right in front of us, so we can see exactly what’s going on. (Please don’t include anything that requires debugging into another file that isn’t the Anomaly code itself.)

Please keep in mind: this is “HTMNetwork for Dummies”. :stuck_out_tongue:

Another point: if the file isn’t exactly like what I described, then I can’t make a Java file that does the same simple thing, and I can’t compare everything minutely enough to troubleshoot our issue. I guess I failed to mention there is a file as well; it looks almost exactly the same and has exactly the same output. Also, we don’t want to get rid of the Classifier code; we want to output everything so we can see what’s going into and out of everything.

I really apologize; I know your time is valuable, but this code has to act as a “sanity check” showing, right in front of my face, what goes into and comes out of each and every algorithm, so that I can apply that knowledge to HTM.Java.

In addition, this is the file I point to (in both languages) when people want the most accessible example of how to work with the algorithms themselves.

Anyway, please let me know if my craziness requires any more explanation? :stuck_out_tongue: And thanks a million for letting me pester you with these exact requirements, again I know how valuable your time is and I really appreciate everything you’re doing to help debug things.



@cogmission, there’s nothing happening in the Anomaly code except the calculation of the rawScore, which is just 1 - (number of active columns that were predicted / number of active columns) - that is the correct calculation. The problem is (or was, if the TM/Observer code is now fixed) that the wrong information is/was coming out of the TM/Observer to feed that calculation.
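For reference, that raw score can be sketched as follows (a minimal illustration in my own naming, not NuPIC’s actual Anomaly implementation; the zero-active case is my assumption):

```python
def raw_anomaly_score(active_columns, prev_predicted_columns):
    """1 minus the fraction of active columns predicted on the previous step."""
    active = set(active_columns)
    if not active:
        return 0.0  # nothing active: treated as no anomaly (an assumption)
    predicted_and_active = active & set(prev_predicted_columns)
    return 1.0 - float(len(predicted_and_active)) / len(active)


print(raw_anomaly_score([1, 2, 3, 4], [1, 2, 3, 4]))  # 0.0 (fully predicted)
print(raw_anomaly_score([1, 2, 3, 4], [5, 6]))        # 1.0 (total surprise)
print(raw_anomaly_score([1, 2, 3, 4], [1, 2]))        # 0.5 (half predicted)
```

If the right inputs go in (the previous step’s predicted columns and the current active columns), this formula can’t be the culprit - which is exactly the point: the suspect is whatever feeds it.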



I don’t believe it is fixed just yet, at least, because @lscheinkman tested my changes - but as soon as @alavin finishes the simple Python layer file, I’m going to test it against the Java version, which should get results of similar quality; and if that works, I’m going to test it against the NAPI and see where it goes wrong… Those are my plans for now; I don’t know how else to approach this…

What’s coming out of the TM Observer is the same as what’s coming out of the TM by itself, exactly! That’s the problem!


I don’t know. I suggest you test that hypothesis:

  1. Run the TM/Observer over a dataset and record its outputs.
  2. Don’t connect the Observer and just record the TM outputs on the same data (with the same RNG seed).
  3. Compare.

If they don’t match exactly, then the attachment of the Observer is causing mutation in the TM, and creating erroneous output.

edit: The outputs of interest to the NAB issue are previousPredictedColumns and activeColumns.
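The comparison in steps 1-3 can be sketched generically (function and variable names here are mine; `run_a`/`run_b` stand in for the TM-with-Observer and bare-TM runs on identically seeded data):

```python
def first_divergence(run_a, run_b, data):
    """Feed identical records to both pipeline variants and report the
    first step at which their outputs differ, or None if they never do."""
    for step, record in enumerate(data):
        out_a, out_b = run_a(record), run_b(record)
        if out_a != out_b:
            return step, out_a, out_b
    return None


# Toy stand-ins: two running sums, one of which mutates its state
# differently (imitating an Observer that perturbs the TM it watches).
a_sums, b_sums = [0], [0]

def run_a(x):
    a_sums.append(a_sums[-1] + x)
    return a_sums[-1]

def run_b(x):
    b_sums.append(b_sums[-1] + x + (1 if x == 3 else 0))  # injected mutation
    return b_sums[-1]

print(first_divergence(run_a, run_b, [1, 2, 3, 4]))  # (2, 6, 7)
```

In the real experiment the recorded outputs would be the previousPredictedColumns and activeColumns arrays at each step; the first step where they diverge localizes where the Observer perturbs the TM.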



I did exactly that! Did you not read over the earlier messages? I even posted the Gist code that proves it?

It’s now proven… So I don’t know what to do…? But we’ll push on…


Eh, no. Look at the NAB code. It doesn’t use Anomaly.compute(). It accesses fields of the Observer, and runs methods on the Observer. You’re not testing the same thing. And the NAB code uses a standard Layer, not this SimpleLayer which doesn’t run the same methods.

As I’ve been saying all along, I suspect that the Observer and TM are interacting in some way in the computation actually being run by the NAB code. Why don’t you just run the NAB code with and without the Observer there at all, and compare what comes out? The NAB runner doesn’t need to use the Observer, because it feeds the HTM the inputs one at a time.



I’m sorry buddy, but I don’t believe that’s correct. The NAB interacts with HTM.Java code through the HTMModel bridge that @lscheinkman wrote.

Unless there is something I’m missing? @lscheinkman/@alavin ? Can you please confirm from where the NAB is getting its data?

Also, I can’t run the NAB because I have to reinstall and update NuPIC (yesterday I was waiting on the build to be fixed).

Are you still investigating this? Are you able to run the NAB?