Help debugging HTM.java for anomaly detection

This first exception was caused by an incorrect path to the configuration node (the offending line is shown first, then the resulting trace; a guard sketch follows the trace):

JsonNode modelParams = params.path("modelConfig").path("modelParams");
Exception in thread "main" java.lang.IllegalArgumentException: Cannot initialize this Sensor's MultiEncoder with a null settings
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoders(HTMSensor.java:641)
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoder(HTMSensor.java:600)
	at org.numenta.nupic.network.Network.setSensor(Network.java:791)
	at org.numenta.nupic.network.Region.setNetwork(Region.java:164)
	at org.numenta.nupic.network.Network.add(Network.java:723)
	at nab.detectors.htmjava.HTMModel.<init>(HTMModel.java:57)
	at nab.detectors.htmjava.HTMModel.main(HTMModel.java:346)
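
For reference, here is a minimal sketch of the kind of guard that would surface the problem earlier (assuming the params are read with Jackson; the file name and node name here are illustrative, not the actual NAB config layout):

    import java.io.File;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class ConfigCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical config file name, for illustration only
            JsonNode params = new ObjectMapper().readTree(new File("htm_config.json"));
            // "modelParams" is illustrative -- use whatever node your config actually nests the settings under
            JsonNode modelParams = params.path("modelParams");
            if (modelParams.isMissingNode()) {
                // Fail fast here instead of letting a missing node surface later
                // as "null settings" inside HTMSensor.initEncoders()
                throw new IllegalArgumentException("Encoder settings not found in config");
            }
        }
    }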

The next exception I got was caused by there being no resolution in the config JSON file:
best_single_metric_anomaly_params_tm_cpp.json

Exception in thread "main" java.lang.IllegalStateException: Resolution must be a positive number
	at org.numenta.nupic.encoders.RandomDistributedScalarEncoder.init(RandomDistributedScalarEncoder.java:132)
	at org.numenta.nupic.encoders.RandomDistributedScalarEncoder$Builder.build(RandomDistributedScalarEncoder.java:707)
	at org.numenta.nupic.encoders.RandomDistributedScalarEncoder$Builder.build(RandomDistributedScalarEncoder.java:1)
	at org.numenta.nupic.encoders.MultiEncoderAssembler.assemble(MultiEncoderAssembler.java:75)
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoders(HTMSensor.java:646)
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoder(HTMSensor.java:600)
	at org.numenta.nupic.network.Network.setSensor(Network.java:791)
	at org.numenta.nupic.network.Region.setNetwork(Region.java:164)
	at org.numenta.nupic.network.Network.add(Network.java:723)
	at nab.detectors.htmjava.HTMModel.<init>(HTMModel.java:57)
	at nab.detectors.htmjava.HTMModel.main(HTMModel.java:346)

This exception was then caused by not having "-s 1" (the skip configuration of 1); a sketch of the header-skip idea follows the trace:

Exception in thread "Sensor Layer [NAB Region:NAB Layer] Thread" java.lang.IllegalStateException: java.lang.IllegalArgumentException: Invalid format: "timestamp"
	at org.numenta.nupic.encoders.MultiEncoder.encodeIntoArray(MultiEncoder.java:105)
	at org.numenta.nupic.encoders.Encoder.encode(Encoder.java:625)
	at org.numenta.nupic.network.sensor.HTMSensor.input(HTMSensor.java:429)
	at org.numenta.nupic.network.sensor.HTMSensor.lambda$0(HTMSensor.java:362)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$222(StreamSpliterators.java:294)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at org.numenta.nupic.network.sensor.HTMSensor$Copy.hasNext(HTMSensor.java:280)
	at java.util.Iterator.forEachRemaining(Iterator.java:115)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at org.numenta.nupic.network.Layer$5.run(Layer.java:2006)
Caused by: java.lang.IllegalArgumentException: Invalid format: "timestamp"
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:899)
	at org.numenta.nupic.encoders.DateEncoder.parse(DateEncoder.java:446)
	at org.numenta.nupic.FieldMetaType.decodeType(FieldMetaType.java:98)
	at org.numenta.nupic.network.sensor.HTMSensor$InputMap.get(HTMSensor.java:306)
	at org.numenta.nupic.encoders.Encoder.getInputValue(Encoder.java:699)
	at org.numenta.nupic.encoders.MultiEncoder.encodeIntoArray(MultiEncoder.java:102)
	... 20 more
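
For context, that failure happens because the literal CSV header token "timestamp" reaches the DateEncoder, which then tries to parse it as a date. Here is a minimal sketch of the idea behind the skip, in plain Java I/O rather than HTM.Java's sensor API:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.stream.Stream;

    public class SkipHeader {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            // Drop the first line (the "timestamp,value" header) before any rows reach
            // the encoders -- this is what the "-s 1" skip setting accomplishes
            Stream<String> rows = in.lines().skip(1);
            rows.forEach(row -> System.out.println("data row: " + row));
        }
    }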

That's it. Those should be all the exceptions you might encounter if you don't have the fixes I mentioned in place… this is just to help while you configure the NAB settings to apply these remedies. Hopefully this helps, because I'm pretty sure you'll have to change a file name and/or alter which file is pointed to, and then make sure the params are set up correctly…

Again, I'm wondering if "n" and "w" need to be put in that JSON file as well?

I moved the declarations of the output and input variables back to where they were and removed the finally clause, because for some reason the local final reference to System.out couldn't be copied to another variable; very weird, but I stopped getting output to Standard Out! That's one for the record books! It doesn't make sense…

So if you've copied the HTMModel.java file from the gist before this post, please get it again with the latest changes.

The RDSE only needs the desired resolution, which is calculated from numBuckets in the config.

@cogmission @lscheinkman I'll try running this before we chat later this morning.


@alavin I don't think HTM.Java has a numBuckets parameter, so I may have to make allowances for this param in the RDSE's Network initialization… That's probably why there is an exception surrounding resolution.

Thank you!

EDIT: I'm going to add an issue for this…

You don't need to add an issue for this, although it has been a point of confusion for me as well. See:

The RDSE wants a resolution, but in order to get a decent resolution, you need to know the min/max of the data. That's all this calculation does. It cannot be "baked into" the RDSE without knowing about the data.
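
For what it's worth, that calculation boils down to a one-liner. A hedged sketch of it, assuming the NuPIC convention of dividing the data range by numBuckets with a small floor on the result:

    public class ResolutionSketch {
        // Derive an RDSE resolution from numBuckets plus the observed min/max of the
        // data; mirrors the NuPIC-side convention as I understand it (0.001 floor).
        static double resolutionFromBuckets(double minVal, double maxVal, int numBuckets) {
            double minResolution = 0.001;  // keeps the resolution strictly positive
            return Math.max(minResolution, (maxVal - minVal) / numBuckets);
        }

        public static void main(String[] args) {
            // e.g. data spanning 0..650 with 130 buckets -> resolution of 5.0
            System.out.println(resolutionFromBuckets(0, 650, 130));
        }
    }

A positive value like that is exactly what the RandomDistributedScalarEncoder builder is refusing to go without in the "Resolution must be a positive number" exception above.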

I only intended this for HTM.Java?

The numBuckets parameter can be specified from an external config file in the NAB (and therefore NuPIC), but not in HTM.Java.

The issue I mean to file is against HTM.Java, to be able to specify this externally and have the NAPI pick it up and apply it when found?

@rhyolight - Please check this out: https://github.com/numenta/nupic/blob/master/src/nupic/frameworks/opf/common_models/anomaly_params_random_encoder/best_single_metric_anomaly_params_tm_cpp.json#L41

…and tell me if you still recommend I not add this to HTM.Java?

It's really up to you. I don't like having configuration for an encoder that isn't a direct pass-through into the encoder, but requires code to interpret it.


I understand… That's exactly what we do. The NAPI takes parameters and applies them to each encoder, that's all. By putting things in the Parameters class, we avoid asking the user to manicure each point in each component's setup. However, this isn't being done for the RDSE at the present time…

Otherwise the user has to write scaffolding code like what is found within the detector and in other places. I'm trying to remove as much of this as possible…
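
As a rough illustration of the pattern (generic Java here, not necessarily HTM.Java's actual Parameters API), the idea is a single map of per-field encoder settings that the framework can walk and apply:

    import java.util.HashMap;
    import java.util.Map;

    public class EncoderSettingsSketch {
        // Generic sketch: one map of per-field encoder settings that a framework can
        // iterate over and apply, instead of hand-written wiring for every component.
        static Map<String, Map<String, Object>> buildFieldEncodings() {
            Map<String, Object> valueSettings = new HashMap<>();
            valueSettings.put("fieldName", "value");
            valueSettings.put("encoderType", "RandomDistributedScalarEncoder");
            valueSettings.put("resolution", 0.88);  // hypothetical value; what numBuckets would be translated into
            Map<String, Map<String, Object>> fieldEncodings = new HashMap<>();
            fieldEncodings.put("value", valueSettings);
            return fieldEncodings;
        }

        public static void main(String[] args) {
            System.out.println(buildFieldEncodings());
        }
    }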

@cogmission
Upon further inspection of your gist, @lscheinkman and I are confident this is not the problem. The "fixes" you called out are handled in htmjava_detector.py:

FWIW, to run one data file through NAB at a time for debugging purposes, I recommend modifying the NAB runner to detect one dataset: simply change the multiprocess call self.pool.map(detectDataSet, args) to detectDataSet(args[0]). This allows you to debug your detector (HTM.java) on a single file without having to modify any of its internals. Alternatively, the less hacky way of doing this is detailed here in the NAB readme.

Guys,

Can someone give me explicit detailed instructions for how to run both the Python and the Java versions with the file of my choosing?

This is Greek to me. How do I run it? What are the instructions for running it on both the Python and the Java side of things? How do I specify a single file in that line? Do I need a path to the file? I don't use Python, so I need this explained. Actually, this is where I said before that I would hand this off to you and Luiz, but I just want to see the outputs from both languages to compare the quality of the Anomaly scores? I also want to see what the internal parameters are for both so I can verify, in a matter-of-fact way, what is happening?

Also, from the meeting, I asked for outputs from the same input file for both the Python and Java versions so I could see how the quality of the anomaly scores compares?

EDIT: From what I've read you can't get the scores when this is run in "one-file" mode? I feel like I'm blindfolded with one hand tied behind my back. I can't debug across Python and Java processes using my IDE (Eclipse). I can't output things to Standard Out because that's being co-opted for inter-process communication between Python and Java - so how am I supposed to debug this? I can't even get the scores so that I can compare the quality of results between Python and Java without doing a 30-minute run!?!? :stuck_out_tongue:

This is why I asked for QuickTest.py to be updated with simply the Anomaly code, so that I can compare it directly to QuickTest.java with no framework indirection; but I'm given some NetworkAPI stuff that is useless to me because I need to see everything working and not a black box I can request results from. I'm really at a loss here for what to do next? (I'm very grateful for @alavin's effort to do this, but it is useless to me as a means to do porting work.)

I'm in crisis mode here…

Here's an example of debugging an algorithm by running one NAB file, "realKnownCause/nyc_taxi.csv". In nab/runner.py, this would be your detect() method:

	def detect(self, detectors):
		"""Generate results file given a dictionary of detector classes

		Function that takes a set of detectors and a corpus of data and creates a
		set of files storing the alerts and anomaly scores given by the detectors

		@param detectors     (dict)         Dictionary with key value pairs of a
		                                    detector name and its corresponding
		                                    class constructor.
		"""
		print "\nRunning detection step"

		count = 0
		args = []
		for detectorName, detectorConstructor in detectors.iteritems():
			for i, (relativePath, dataSet) in enumerate(self.corpus.dataFiles.iteritems()):

				if self.corpusLabel.labels.has_key(relativePath):
					args.append(
						(
							count,
							detectorConstructor(
								dataSet=dataSet,
								probationaryPercent=self.probationaryPercent),
							detectorName,
							self.corpusLabel.labels[relativePath]["label"],
							self.resultsDir,
							relativePath
						)
					)
					if "nyc_taxi" in relativePath:
						detectDataSet(args[-1])  # run the just-appended job directly, bypassing multiprocessing

					count += 1

		# self.pool.map(detectDataSet, args)

And then I run from the command line: python run.py --skipConfirmation -d htmjava --detect. NAB automagically writes the detection results to CSV files. For this example you'll find it in "/results/htmjava/realKnownCause/htmjava_nyc_taxi.csv". If you want to compare to any results file with the numenta detector, all of those are in the numentaTM dir of the repo results.

If you want NAB to detect and score for a single file (or a subset of files), follow the approach I previously linked: https://github.com/numenta/NAB#run-subset-of-nab-data-files. Your command line run would be python run.py -d htmjava --detect --score --windowsFile labels/combined_windows_tiny.json. You'll see that after the scoring step, NAB writes scoring info into those same results CSVs.


I looked for examples of this, but they all involve an abstraction through a Region. I don't see any examples of SP / TM ==> Anomaly.

@rhyolight

We can't be satisfied with that. It's obvious that whoever wrote the Region knows how to pass the data and what should be passed? If it's done within a Region, it can be done outside of it? Can you ask the author or an expert who knows what should be passed and how? I mean, the Anomaly tests do it - why can't we do it? I mean, after all - I even did it. (In Java, and maybe not correctly - which is why I need to see it for myself in Python) :stuck_out_tongue:

You don't need an expert to tell you that. I can tell you that. I think we've already discussed what should be passed. The Anomaly.compute() function takes the current active columns from the SP, the previously predicted columns from the TM, and the raw input value.
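
To make that concrete, here is a minimal sketch of the raw score itself (just the formula, not necessarily the exact NuPIC / HTM.Java signature): the score is the fraction of currently active columns that were not predicted on the previous step.

    import java.util.HashSet;
    import java.util.Set;

    public class RawAnomalySketch {
        // Raw anomaly score from the SP's currently active columns and the TM's
        // previously predicted columns: 0.0 = fully predicted, 1.0 = fully surprising.
        static double rawAnomalyScore(int[] activeColumns, int[] prevPredictedColumns) {
            if (activeColumns.length == 0) {
                return 0.0;
            }
            Set<Integer> predicted = new HashSet<>();
            for (int c : prevPredictedColumns) {
                predicted.add(c);
            }
            int overlap = 0;
            for (int c : activeColumns) {
                if (predicted.contains(c)) {
                    overlap++;
                }
            }
            return 1.0 - (double) overlap / activeColumns.length;
        }

        public static void main(String[] args) {
            // 2 of the 4 active columns were predicted -> score of 0.5
            System.out.println(rawAnomalyScore(new int[]{1, 2, 3, 4}, new int[]{2, 4, 9}));
        }
    }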

You can see this in the CLAModel:

I don't understand why you wrote this then?

@alavin,

Thank you Alex. I really appreciate that. This should be very helpful in diagnosing things.

You have been asking for an explicit example of data coming from a SpatialPooler instance and a TemporalMemory instance, passing directly into an Anomaly instance, right? I don't see any examples of this that do not involve the Region abstraction. But you can see what Anomaly.compute() wants just by looking at how it is used in the CLAModel; that data is just not coming directly from SP / TM objects the way you are asking.

Oh… I see what you're saying… You aren't telling me my request is impossible to help with, right? fingers crossed :slight_smile:

No, of course not. But it is going to take some time to put together the example you are asking for.


I finally see (from the other side) why I was taught something when I used to write programs for stockbrokers. My manager told me that…

"…sometimes people don't always want to hear every little step in your thinking or approach to a problem, because they view every interim step as something final. Sometimes you just need to keep it to yourself until you come to them with a solution."

Now I understand why, and why I used to get stockbrokers freaking out on me… :wink:

Err… Besides the fact that they were freaks in a very tense environment to begin with… :stuck_out_tongue: