Help debugging HTM.java for anomaly detection

This first exception was caused by an incorrect path to the configuration node (the offending line is shown first, then the resulting trace; a guard sketch follows the trace):

JsonNode modelParams = params.path("modelConfig").path("modelParams");
Exception in thread "main" java.lang.IllegalArgumentException: Cannot initialize this Sensor's MultiEncoder with a null settings
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoders(HTMSensor.java:641)
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoder(HTMSensor.java:600)
	at org.numenta.nupic.network.Network.setSensor(Network.java:791)
	at org.numenta.nupic.network.Region.setNetwork(Region.java:164)
	at org.numenta.nupic.network.Network.add(Network.java:723)
	at nab.detectors.htmjava.HTMModel.<init>(HTMModel.java:57)
	at nab.detectors.htmjava.HTMModel.main(HTMModel.java:346)
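
For reference, here is a minimal sketch of the kind of guard that would surface the problem earlier (assuming the params are read with Jackson; the file name and node name here are illustrative, not the actual NAB config layout):

    import java.io.File;
    import com.fasterxml.jackson.databind.JsonNode;
    import com.fasterxml.jackson.databind.ObjectMapper;

    public class ConfigCheck {
        public static void main(String[] args) throws Exception {
            // Hypothetical config file name, for illustration only
            JsonNode params = new ObjectMapper().readTree(new File("htm_config.json"));
            // "modelParams" is illustrative -- use whatever node your config actually nests the settings under
            JsonNode modelParams = params.path("modelParams");
            if (modelParams.isMissingNode()) {
                // Fail fast here instead of letting a missing node surface later
                // as "null settings" inside HTMSensor.initEncoders()
                throw new IllegalArgumentException("Encoder settings not found in config");
            }
        }
    }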

The next exception I got was caused by there being no resolution in the config JSON file:
best_single_metric_anomaly_params_tm_cpp.json

Exception in thread "main" java.lang.IllegalStateException: Resolution must be a positive number
	at org.numenta.nupic.encoders.RandomDistributedScalarEncoder.init(RandomDistributedScalarEncoder.java:132)
	at org.numenta.nupic.encoders.RandomDistributedScalarEncoder$Builder.build(RandomDistributedScalarEncoder.java:707)
	at org.numenta.nupic.encoders.RandomDistributedScalarEncoder$Builder.build(RandomDistributedScalarEncoder.java:1)
	at org.numenta.nupic.encoders.MultiEncoderAssembler.assemble(MultiEncoderAssembler.java:75)
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoders(HTMSensor.java:646)
	at org.numenta.nupic.network.sensor.HTMSensor.initEncoder(HTMSensor.java:600)
	at org.numenta.nupic.network.Network.setSensor(Network.java:791)
	at org.numenta.nupic.network.Region.setNetwork(Region.java:164)
	at org.numenta.nupic.network.Network.add(Network.java:723)
	at nab.detectors.htmjava.HTMModel.<init>(HTMModel.java:57)
	at nab.detectors.htmjava.HTMModel.main(HTMModel.java:346)

This exception was then caused by not having "-s 1" (the skip configuration of 1); a sketch of the header-skip idea follows the trace:

Exception in thread "Sensor Layer [NAB Region:NAB Layer] Thread" java.lang.IllegalStateException: java.lang.IllegalArgumentException: Invalid format: "timestamp"
	at org.numenta.nupic.encoders.MultiEncoder.encodeIntoArray(MultiEncoder.java:105)
	at org.numenta.nupic.encoders.Encoder.encode(Encoder.java:625)
	at org.numenta.nupic.network.sensor.HTMSensor.input(HTMSensor.java:429)
	at org.numenta.nupic.network.sensor.HTMSensor.lambda$0(HTMSensor.java:362)
	at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193)
	at java.util.Spliterators$IteratorSpliterator.tryAdvance(Spliterators.java:1812)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.lambda$initPartialTraversalState$222(StreamSpliterators.java:294)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.fillBuffer(StreamSpliterators.java:206)
	at java.util.stream.StreamSpliterators$AbstractWrappingSpliterator.doAdvance(StreamSpliterators.java:161)
	at java.util.stream.StreamSpliterators$WrappingSpliterator.tryAdvance(StreamSpliterators.java:300)
	at java.util.Spliterators$1Adapter.hasNext(Spliterators.java:681)
	at org.numenta.nupic.network.sensor.HTMSensor$Copy.hasNext(HTMSensor.java:280)
	at java.util.Iterator.forEachRemaining(Iterator.java:115)
	at java.util.Spliterators$IteratorSpliterator.forEachRemaining(Spliterators.java:1801)
	at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
	at java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
	at java.util.stream.ForEachOps$ForEachOp.evaluateSequential(ForEachOps.java:151)
	at java.util.stream.ForEachOps$ForEachOp$OfRef.evaluateSequential(ForEachOps.java:174)
	at java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
	at java.util.stream.ReferencePipeline.forEach(ReferencePipeline.java:418)
	at org.numenta.nupic.network.Layer$5.run(Layer.java:2006)
Caused by: java.lang.IllegalArgumentException: Invalid format: "timestamp"
	at org.joda.time.format.DateTimeFormatter.parseDateTime(DateTimeFormatter.java:899)
	at org.numenta.nupic.encoders.DateEncoder.parse(DateEncoder.java:446)
	at org.numenta.nupic.FieldMetaType.decodeType(FieldMetaType.java:98)
	at org.numenta.nupic.network.sensor.HTMSensor$InputMap.get(HTMSensor.java:306)
	at org.numenta.nupic.encoders.Encoder.getInputValue(Encoder.java:699)
	at org.numenta.nupic.encoders.MultiEncoder.encodeIntoArray(MultiEncoder.java:102)
	... 20 more
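
For context, that failure happens because the literal CSV header token "timestamp" reaches the DateEncoder, which then tries to parse it as a date. Here is a minimal sketch of the idea behind the skip, in plain Java I/O rather than HTM.Java's sensor API:

    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.util.stream.Stream;

    public class SkipHeader {
        public static void main(String[] args) throws Exception {
            BufferedReader in = new BufferedReader(new InputStreamReader(System.in));
            // Drop the first line (the "timestamp,value" header) before any rows reach
            // the encoders -- this is what the "-s 1" skip setting accomplishes
            Stream<String> rows = in.lines().skip(1);
            rows.forEach(row -> System.out.println("data row: " + row));
        }
    }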

That's it. Those should be all the exceptions you might encounter if you don't have the fixes I mentioned in place… this is just to help while you configure the NAB settings to apply these remedies. Hopefully this helps, because I'm pretty sure you'll have to change a file name and/or alter which file is pointed to, and then make sure the params are set up correctly…

Again, I'm wondering if "n" and "w" need to be put in that JSON file as well?

I moved the declarations of the output and input variables back to where they were and removed the finally clause, because for some reason the local final reference to System.out couldn't be copied to another variable; very weird, but I stopped getting output to Standard Out! That's one for the record books! It doesn't make sense…

So if you've copied the HTMModel.java file from the gist before this post, please get it again with the latest changes.

The RDSE only needs the desired resolution, which is calculated from numBuckets in the config.

@cogmission @lscheinkman I'll try running this before we chat later this morning.


@alavin I don't think HTM.Java has a numBuckets parameter, so I may have to make allowances for this param in the RDSE's Network initialization… That's probably why there is an exception surrounding resolution.

Thank you!

EDIT: I'm going to add an issue for this…

You don't need to add an issue for this, although it has been a point of confusion for me as well. See:

The RDSE wants a resolution, but in order to get a decent resolution, you need to know the min/max of the data. That's all this calculation does. It cannot be "baked into" the RDSE without knowing about the data.
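
For what it's worth, that calculation boils down to a one-liner. A hedged sketch of it, assuming the NuPIC convention of dividing the data range by numBuckets with a small floor on the result:

    public class ResolutionSketch {
        // Derive an RDSE resolution from numBuckets plus the observed min/max of the
        // data; mirrors the NuPIC-side convention as I understand it (0.001 floor).
        static double resolutionFromBuckets(double minVal, double maxVal, int numBuckets) {
            double minResolution = 0.001;  // keeps the resolution strictly positive
            return Math.max(minResolution, (maxVal - minVal) / numBuckets);
        }

        public static void main(String[] args) {
            // e.g. data spanning 0..650 with 130 buckets -> resolution of 5.0
            System.out.println(resolutionFromBuckets(0, 650, 130));
        }
    }

A positive value like that is exactly what the RandomDistributedScalarEncoder builder is refusing to go without in the "Resolution must be a positive number" exception above.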

I only intended this for HTM.Java?

The numBuckets parameter can be specified from an external config file in the NAB (and therefore NuPIC), but not in HTM.Java.

The issue I mean to file is against HTM.Java, to be able to specify this externally and have the NAPI pick it up and apply it when found?

@rhyolight - Please check this out: https://github.com/numenta/nupic/blob/master/src/nupic/frameworks/opf/common_models/anomaly_params_random_encoder/best_single_metric_anomaly_params_tm_cpp.json#L41

…and tell me if you still recommend I not add this to HTM.Java?

It's really up to you. I don't like having configuration for an encoder that isn't a direct pass-through into the encoder, but requires code to interpret it.


I understand… That's exactly what we do. The NAPI takes parameters and applies them to each encoder, that's all. By putting things in the Parameters class, we avoid asking the user to manicure each point in each component's setup. However, this isn't being done for the RDSE at the present time…

Otherwise the user has to write scaffolding code like what is found within the detector and in other places. I'm trying to remove as much of this as possible…
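
As a rough illustration of the pattern (generic Java here, not necessarily HTM.Java's actual Parameters API), the idea is a single map of per-field encoder settings that the framework can walk and apply:

    import java.util.HashMap;
    import java.util.Map;

    public class EncoderSettingsSketch {
        // Generic sketch: one map of per-field encoder settings that a framework can
        // iterate over and apply, instead of hand-written wiring for every component.
        static Map<String, Map<String, Object>> buildFieldEncodings() {
            Map<String, Object> valueSettings = new HashMap<>();
            valueSettings.put("fieldName", "value");
            valueSettings.put("encoderType", "RandomDistributedScalarEncoder");
            valueSettings.put("resolution", 0.88);  // hypothetical value; what numBuckets would be translated into
            Map<String, Map<String, Object>> fieldEncodings = new HashMap<>();
            fieldEncodings.put("value", valueSettings);
            return fieldEncodings;
        }

        public static void main(String[] args) {
            System.out.println(buildFieldEncodings());
        }
    }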

@cogmission
Upon further inspection of your gist, @lscheinkman and I are confident this is not the problem. The "fixes" you called out are handled in htmjava_detector.py:

FWIW, to run one data file through NAB at a time for debugging purposes, I recommend modifying the NAB runner to detect one dataset: simply change the multiprocess call self.pool.map(detectDataSet, args) to detectDataSet(args[0]). This allows you to debug your detector (HTM.java) on a single file without having to modify any of its internals. Alternatively, the less hacky way of doing this is detailed here in the NAB readme.

Guys,

Can someone give me explicit detailed instructions for how to run both the Python and the Java versions with the file of my choosing?

This is Greek to me. How do I run it? What are the instructions for running it on both the Python and the Java side of things? How do I specify a single file in that line? Do I need a path to the file? I don't use Python, so I need this explained. Actually, this is where I said before that I would hand this off to you and Luiz, but I just want to see the outputs from both languages to compare the quality of the Anomaly scores? I also want to see what the internal parameters are for both so I can verify, in a matter-of-fact way, what is happening?

Also, from the meeting, I asked for outputs from the same input file for both the Python and Java versions so I could see how the quality of the anomaly scores compares?

EDIT: From what I've read you can't get the scores when this is run in "one-file" mode? I feel like I'm blindfolded with one hand tied behind my back. I can't debug across Python and Java processes using my IDE (Eclipse). I can't output things to Standard Out because that's being co-opted for inter-process communication between Python and Java - so how am I supposed to debug this? I can't even get the scores so that I can compare the quality of results between Python and Java without doing a 30-minute run!?!? :stuck_out_tongue:

This is why I asked for QuickTest.py to be updated with simply the Anomaly code, so that I can compare it directly to QuickTest.java with no framework indirection; but I'm given some NetworkAPI stuff that is useless to me because I need to see everything working and not a black box I can request results from. I'm really at a loss here for what to do next? (I'm very grateful for @alavin's effort to do this, but it is useless to me as a means to do porting work.)

I'm in crisis mode here…

Here's an example of debugging an algorithm by running one NAB file, "realKnownCause/nyc_taxi.csv". In nab/runner.py, this would be your detect() method:

	def detect(self, detectors):
		"""Generate results file given a dictionary of detector classes

		Function that takes a set of detectors and a corpus of data and creates a
		set of files storing the alerts and anomaly scores given by the detectors

		@param detectors     (dict)         Dictionary with key value pairs of a
		                                    detector name and its corresponding
		                                    class constructor.
		"""
		print "\nRunning detection step"

		count = 0
		args = []
		for detectorName, detectorConstructor in detectors.iteritems():
			for i, (relativePath, dataSet) in enumerate(self.corpus.dataFiles.iteritems()):

				if self.corpusLabel.labels.has_key(relativePath):
					args.append(
						(
							count,
							detectorConstructor(
								dataSet=dataSet,
								probationaryPercent=self.probationaryPercent),
							detectorName,
							self.corpusLabel.labels[relativePath]["label"],
							self.resultsDir,
							relativePath
						)
					)
					if "nyc_taxi" in relativePath:
						detectDataSet(args[-1])  # run the just-appended job directly, bypassing multiprocessing

					count += 1

		# self.pool.map(detectDataSet, args)

And then I run from the command line: python run.py --skipConfirmation -d htmjava --detect. NAB automagically writes the detection results to CSV files. For this example you'll find it in "/results/htmjava/realKnownCause/htmjava_nyc_taxi.csv". If you want to compare to any results file with the numenta detector, all of those are in the numentaTM dir of the repo results.

If you want NAB to detect and score for a single file (or a subset of files), follow the approach I previously linked: https://github.com/numenta/NAB#run-subset-of-nab-data-files. Your command line run would be python run.py -d htmjava --detect --score --windowsFile labels/combined_windows_tiny.json. You'll see that after the scoring step, NAB writes scoring info into those same results CSVs.


I looked for examples of this, but they all involve an abstraction through a Region. I don't see any examples of SP / TM ==> Anomaly.

@rhyolight

We can't be satisfied with that. It's obvious that whoever wrote the Region knows how to pass the data and what should be passed? If it's done within a Region, it can be done outside of it? Can you ask the author or an expert who knows what should be passed and how? I mean, the Anomaly tests do it - why can't we do it? I mean, after all - I even did it. (In Java, and maybe not correctly - which is why I need to see it for myself in Python) :stuck_out_tongue:

You don't need an expert to tell you that. I can tell you that. I think we've already discussed what should be passed. The Anomaly.compute() function takes the current active columns from the SP, the previously predicted columns from the TM, and the raw input value.
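
To make that concrete, here is a minimal sketch of the raw score itself (just the formula, not necessarily the exact NuPIC / HTM.Java signature): the score is the fraction of currently active columns that were not predicted on the previous step.

    import java.util.HashSet;
    import java.util.Set;

    public class RawAnomalySketch {
        // Raw anomaly score from the SP's currently active columns and the TM's
        // previously predicted columns: 0.0 = fully predicted, 1.0 = fully surprising.
        static double rawAnomalyScore(int[] activeColumns, int[] prevPredictedColumns) {
            if (activeColumns.length == 0) {
                return 0.0;
            }
            Set<Integer> predicted = new HashSet<>();
            for (int c : prevPredictedColumns) {
                predicted.add(c);
            }
            int overlap = 0;
            for (int c : activeColumns) {
                if (predicted.contains(c)) {
                    overlap++;
                }
            }
            return 1.0 - (double) overlap / activeColumns.length;
        }

        public static void main(String[] args) {
            // 2 of the 4 active columns were predicted -> score of 0.5
            System.out.println(rawAnomalyScore(new int[]{1, 2, 3, 4}, new int[]{2, 4, 9}));
        }
    }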

You can see this in the CLAModel:

I don't understand why you wrote this then?

@alavin,

Thank you Alex. I really appreciate that. This should be very helpful in diagnosing things.

You have been asking for an explicit example of data coming from a SpatialPooler instance and a TemporalMemory instance, passing directly into an Anomaly instance, right? I don't see any examples of this that do not involve the Region abstraction. But you can see what Anomaly.compute() wants just by looking at how it is used in the CLAModel; that data is just not coming directly from SP / TM objects the way you are asking.

Oh… I see what you're saying… You aren't telling me my request is impossible to help with, right? fingers crossed :slight_smile:

No, of course not. But it is going to take some time to put together the example you are asking for.


I finally see (from the other side) why I was taught something when I used to write programs for stockbrokers. My manager told me that…

"…sometimes people don't always want to hear every little step in your thinking or approach to a problem, because they view every interim step as something final. Sometimes you just need to keep it to yourself until you come to them with a solution."

Now I understand why, and why I used to get stockbrokers freaking out on me… :wink:

Err… Besides the fact that they were freaks in a very tense environment to begin with… :stuck_out_tongue: