SDRClassifier returning confidence of NaN

PhilGoddard · May 7, 2019, 5:15pm

I’ve created various models where after the model has run for some time the classifier confidence value starts returning NaN.

As a sanity check, I’ve gone back to the standard hotgym example (http://nupic.docs.numenta.org/stable/quick-start/network.html) and found that it too suffers from this problem.
For data, I am using the rec-center-hourly.csv file (https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/prediction/one_gym), but I am replicating/extending the data to have 30000 time steps (rather than the default 4394 time points).
Nothing else is being changed from the default hotgym code/model.
Specifically, at about the 19000th data point, the 5-step ahead confidence starts returning NaN.

What would be causing this?
Is it a bug?
Is there any way to guarantee that the classifier will not return a NaN?

rhyolight · May 8, 2019, 2:34pm

This is probably a bug, so I created a bug report.

I wonder if the SDR Classifier in the community fork has the same problem? cc @David_Keeney @breznak @dmac

breznak · May 8, 2019, 4:33pm

Can you try to replicate this and give us a test? Problem is you’re probably using python OPF framework, which requires nupic.bindings from nupic.core, and is not compatible with our bindings from nupic.cpp.

On the other hand, we have a C++ version of the hotgym benchmark, but you’d need to add Classifier (+ Predictor, that’s how we call the N-steps ahead prediction feature). See Real-life benchmark: Hotgym example using C++ algorithms · Issue #30 · htm-community/htm.core · GitHub

Ideally, you could create a synthetic unit test that crashes on nupic.core, so I can adapt it and test it against nupic.cpp.

There’s a high chance we’ve already fixed this, as our classifier has been revamped a lot.

PhilGoddard · May 8, 2019, 4:53pm

We’re using the Network API version (as per the link in my original posting) not the OPF version.

The only thing changed is that a longer data set is passed through the model.

I’ve attached a code package with example data to the bug report.

breznak · May 8, 2019, 5:41pm

Perfect, that means we can run it easily, we should be 100% NetworkAPI compatible!

Would you be able to create a PR to Pull requests · htm-community/htm.core · GitHub ? If not, I’ll get to it the next week.

@David_Keeney will be excited to hear this, as he’s our main NetworkAPI pusher!

breznak · May 8, 2019, 5:47pm

yep, I think community repo should be ready to give your example a shot

A PR is most welcome!

rhyolight · May 8, 2019, 8:51pm

@PhilGoddard Assuming this is the code we’re talking about, can you get access to some of these values during a compute cycle where predictionConfidence is NaN?

Does classifierRegion.getOutputData("probabilities") or classifierRegion.getOutputData("actualValues") look any different in structure?

def getPredictionResults(network, clRegionName):
  """Get prediction results for all prediction steps."""
  classifierRegion = network.regions[clRegionName]
  actualValues = classifierRegion.getOutputData("actualValues")
  probabilities = classifierRegion.getOutputData("probabilities")
  steps = classifierRegion.getSelf().stepsList
  N = classifierRegion.getSelf().maxCategoryCount
  results = {step: {} for step in steps}
  for i in range(len(steps)):
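    # stepProbabilities are probabilities for this prediction step only.
    # Note: the slice end is exclusive, so "(i + 1) * N - 1" leaves out the
    # last category of each step.
    stepProbabilities = probabilities[i * N:(i + 1) * N - 1]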
    mostLikelyCategoryIdx = stepProbabilities.argmax()
    predictedValue = actualValues[mostLikelyCategoryIdx]
    predictionConfidence = stepProbabilities[mostLikelyCategoryIdx]
    results[steps[i]]["predictedValue"] = predictedValue
    results[steps[i]]["predictionConfidence"] = predictionConfidence
  return results
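
To answer the question about how the outputs look during a bad compute cycle, a small helper can report which prediction steps hold NaN entries. This is only a sketch: find_nan_steps is a hypothetical name, not part of NuPIC, and it assumes the flat layout the function above relies on (N consecutive entries per step).

```python
import numpy as np

def find_nan_steps(probabilities, steps, n_categories):
    """Return {step: [category indices holding NaN]} for affected steps only."""
    probabilities = np.asarray(probabilities, dtype=float)
    report = {}
    for i, step in enumerate(steps):
        # Entries for prediction step i occupy one contiguous block.
        block = probabilities[i * n_categories:(i + 1) * n_categories]
        nan_idx = np.flatnonzero(np.isnan(block))
        if nan_idx.size:
            report[step] = nan_idx.tolist()
    return report
```

Calling it with the array from classifierRegion.getOutputData("probabilities"), the stepsList and maxCategoryCount would show whether the NaNs are confined to one step.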

I will try to replicate and work on this issue in my live-stream tomorrow. I have a bunch of other community NuPIC issues to work on.

PhilGoddard · May 9, 2019, 3:30pm

Yes, that’s the function being used.

Not sure what you mean by “different in structure” – they are the expected size and data type.

Superficially the problem is that probabilities has at least one NaN value in it, and argmax is returning the index of that NaN, so predictionConfidence is NaN.

An initial thought was to get the index of the max probability ignoring NaN values. But that just masks the problem - it doesn’t fix it, and doesn’t explain why there is a NaN in the first place.
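
For reference, the NaN-ignoring lookup could look roughly like the sketch below. It uses NumPy's nanargmax; most_likely_category is a hypothetical helper name, and as noted above this only hides the symptom rather than fixing the classifier.

```python
import numpy as np

def most_likely_category(step_probabilities):
    """Return (index, confidence) ignoring NaNs, or (None, nan) if all are NaN."""
    probs = np.asarray(step_probabilities, dtype=float)
    # nanargmax raises ValueError on an all-NaN input, so guard that case.
    if np.all(np.isnan(probs)):
        return None, float("nan")
    idx = int(np.nanargmax(probs))
    return idx, float(probs[idx])

probs = np.array([0.1, np.nan, 0.7, 0.2])
print(int(np.argmax(probs)))        # plain argmax lands on the NaN entry
print(most_likely_category(probs))  # NaN entries are skipped
```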

rhyolight · May 9, 2019, 4:00pm

QQ: are you still sending in timestamps in the extended data? Are they duplicates of existing data the model has seen? Or do they continue forward in time logically?

PhilGoddard · May 9, 2019, 5:59pm

They continue forward in time logically. See the example input data file included in the code package attached to the bug report.
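
For anyone who wants to reproduce this without the attached package, the extension could be sketched as below. This is not the script from the bug report: extend_hourly_series is a hypothetical helper, and it assumes the rows are already parsed into (datetime, value) pairs spaced one hour apart.

```python
from datetime import datetime, timedelta

def extend_hourly_series(rows, target_length):
    """Repeat the values in `rows` until `target_length` rows exist.

    Values cycle through the original data, while timestamps keep
    advancing by one hour so no timestamp is ever repeated.
    """
    if not rows:
        raise ValueError("rows must not be empty")
    extended = list(rows[:target_length])
    next_time = rows[-1][0] + timedelta(hours=1)
    i = 0
    while len(extended) < target_length:
        extended.append((next_time, rows[i % len(rows)][1]))
        next_time += timedelta(hours=1)
        i += 1
    return extended
```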

rhyolight · May 9, 2019, 7:25pm

Phil, I tried debugging this today for a while (video coming soon). It looks like a bug in the C++ SDRClassifier. If you change your implementation to py, does it work?

clParams:
    implementation: py

PhilGoddard · May 9, 2019, 8:01pm

Yes, changing to py eliminates the NaNs.

Can you elaborate on what the error is in the cpp implementation?

I guess I’d always assumed that the py implementation was really just a wrapper around the cpp implementation, rather than being a completely separate implementation.

rhyolight · May 9, 2019, 8:10pm

I wish I knew. I had trouble with my Python 2 / C++ build environment this morning and ended up running pip install nupic, so I could not debug into the C++ code.

Yeah, me too. The only reason we’d do this in C++ is for speed. Strangely, this bug depends not only on how much data has been seen by the model, but also on the prediction step value. I have only seen this happen on 5-step predictions (not 1, not 4, not 50). I’m a bit flummoxed by this. I suspect the problem is somewhere in the weight matrix, with a value evaluating to 0 in some calculation and causing a divide-by-zero error. These matrices are summed in order to get the output. If any of them are NaN, it could cause this.
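
To illustrate the kind of failure being suspected here, the sketch below shows two common ways a normalization step can produce NaN (inf / inf after overflow, and 0 / 0 after underflow), plus the usual guard of subtracting the maximum first. It is an assumption-laden illustration, not the NuPIC code: the thread does not establish that this is what the C++ SDRClassifier does, and both function names are hypothetical.

```python
import numpy as np

def naive_softmax(activations):
    """Softmax without any guard; can yield NaN for extreme inputs."""
    exp = np.exp(np.asarray(activations, dtype=float))
    return exp / exp.sum()

def stable_softmax(activations):
    """Softmax that subtracts the maximum first so exp() cannot overflow."""
    a = np.asarray(activations, dtype=float)
    exp = np.exp(a - a.max())
    return exp / exp.sum()

big = [1000.0, 999.0, 0.0]
with np.errstate(over="ignore", invalid="ignore"):
    print(naive_softmax(big))  # contains NaN: exp overflows, then inf / inf
print(stable_softmax(big))     # finite probabilities that sum to 1
```

Once a single NaN appears in such an output it propagates through any later sum, which would match the symptom of argmax landing on a NaN entry.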

PhilGoddard · May 9, 2019, 8:20pm

In one of our “real” models we saw the problem at 20 steps ahead – which is the model that led to starting this thread.

I agree that a div by zero somewhere is the most likely culprit.

rhyolight · May 9, 2019, 8:36pm

breznak · May 9, 2019, 10:22pm

Out of curiosity, how is HTM performing at ~20 steps ahead prediction? As that sounds pretty impressive.

OT, as I recall, all this >1 steps ahead is a hack, because natively HTM predicts only the next step. So I imagine something like for i in steps: pred = HTM(pred)

PhilGoddard · May 9, 2019, 11:18pm

It’s OK, although not as good as 1 step ahead.

pulinagrawal · May 10, 2019, 12:33am

I cannot be sure yet because I have not been able to give enough time to the cpp implementation, but this could possibly be related to the issue. If someone gets to debugging this in cpp, this may be something to look at.
