Can we get TM predictedCells from NuPIC ModelFactory models?

Anyone know if it's possible to get the number of predicted TM cells out of a ModelFactory model?

I’m creating these on the fly and storing them in a dictionary as they’re continuously updated. This allows multivariate files to be streamed in and modeled field-by-field, with no prior exploration.

I know TM objects contain the predictedCells, though I’m not sure if I can extract the TM object from a ModelFactory model object.

Thanks as always!

  • model._getSPRegion().getSelf().getAlgorithmInstance() gets you the SP
  • model._getTPRegion().getSelf().getAlgorithmInstance() gets you the TM (bad naming I know)

There’s a partial example of this hidden in the serialization docs. It’s not an official API, obviously.
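Something like this, as a sketch (again, not an official API; `params` here is just assumed to hold your usual model config and inference args):

    from nupic.frameworks.opf.model_factory import ModelFactory

    model = ModelFactory.create(modelConfig=params["modelConfig"])
    model.enableInference(params["inferenceArgs"])

    # Reach into the network regions for the algorithm instances.
    tm = model._getTPRegion().getSelf().getAlgorithmInstance()

    # If the model runs without an SP, _getSPRegion() returns None.
    spRegion = model._getSPRegion()
    sp = spRegion.getSelf().getAlgorithmInstance() if spRegion is not None else None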


Looking into this object, it seems to have a lot of stuff, though not exactly a getPredictiveCells() function.

Seems it's actually a BacktrackingTMCPP object:

type(myTM)
<class 'nupic.algorithms.backtracking_tm_cpp.BacktrackingTMCPP'>

I think I’ve actually got it here :sweat_smile:

getPredictedState()
:returns: numpy array of predicted cells, representing the current predicted
  state. ``predictedCells[c][i]`` represents the state of the i'th cell in 
  the c'th column.
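So something like this should give me the predicted columns (a sketch, assuming the (numberOfCols, cellsPerColumn) shape from the docstring):

    predictedCells = TM.getPredictedState()   # shape: (numberOfCols, cellsPerColumn)
    predictedCols = predictedCells.max(axis=1).nonzero()[0]  # columns with any predicted cell
    numPredictedCells = predictedCells.sum()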

Hey NuPIC people (like @rhyolight if I can bug you here)

So I'm trying to get the number of predictions coming out of the TM along with the anomaly score, and I'm getting conflicting output between:

getPredictedState()
and
result.inferences["anomalyScore"]

For instance, it's saying the anomaly score is 0.5 while showing only 1 predicted column. This seems impossible, since only 1 predicted column should cause an anomaly score of ~1.

I'm not using an SP, just a multi RDSE encoder with total size = 2100 (700 * 3 fields) and 111 active columns (37 per field), so not as sparse as usual.
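(That's 111 / 2100 ≈ 5.3% of bits on, versus the more typical 40 / 2048 ≈ 2%.)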

'model': 'HTMPrediction'
'inferenceType': 'TemporalAnomaly'
'temporalImp': 'cpp'

Here are the functions I wrote, which are now yielding these weird results:
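(The functions themselves were posted as screenshots; this is a rough sketch of the logic, reconstructed from the discussion below, with illustrative names:)

    def return_precision(anomaly_score, winnerCellsCount, predCols):
        # Back out how many active columns must have been correctly predicted,
        # using the anomaly score reported by the model.
        count_correctly_predicted_columns = round((1 - anomaly_score) * winnerCellsCount, 0)
        # Of all the columns predicted, the fraction that actually became active.
        return count_correctly_predicted_columns / float(len(predCols))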

That precisionScore value is supposed to be between 0 and 1, since count_correctly_predicted_columns should be <= len(predCols). But it's not turning out that way.

Any idea where I may be going wrong??
I've been digging through the source, pulling infPredictedState out of the TM object and trying to find where the TM's getOutputData method is defined, as used here:

I figure this method must be accessing the right infActiveState and infPredictedState, which I want to get hold of for my precision measure.

Thanks again!!

To simplify, here’s the sequence of calls:

init model

    model = ModelFactory.create(modelConfig=params["modelConfig"])
    model.enableInference(params["inferenceArgs"])

train & save model

    result = model.run(training_data)  # looping over each row
    model.save(save_path)

load in model & disable learning

    loaded_model = ModelFactory.loadFromCheckpoint(save_path)
    loaded_model.disableLearning()

feed in test data

    result = loaded_model.run(test_data)  # looping over each row
    anomaly_score = result.inferences['anomalyScore']
    TM = loaded_model._getTPRegion().getSelf().getAlgorithmInstance()
    prediction_density = TM.getPredictedState().sum()

The problem is that the prediction density doesn’t match with the anomaly score.
For example: (several concrete outputs, posted as screenshots, are omitted here)

Where activeColumns comes from:

    SP = model._getSPRegion()
    if SP is not None:
        activeColumns = SP.getOutputData("bottomUpOut").nonzero()[0]
    else:
        sensor = model._getSensorRegion()
        activeColumns = sensor.getOutputData('dataOut').nonzero()[0]

And predCols comes from:

TM_infPredState = TM_obj.getPredictedState()
# columns where at least one cell is in the predicted state
predCols = [i for i in range(len(TM_infPredState)) if TM_infPredState[i].sum() > 0]

Here is my model config:

    modelConfig = \
        {'aggregationInfo': {'days': 0,
                             'fields': [],
                             'hours': 0,
                             'microseconds': 0,
                             'milliseconds': 0,
                             'minutes': 0,
                             'months': 0,
                             'seconds': 0,
                             'weeks': 0,
                             'years': 0},
         'model': 'HTMPrediction',
         'modelParams': {'anomalyParams': {u'anomalyCacheRecords': None,
                                           u'autoDetectThreshold': None,
                                           u'autoDetectWaitRecords': None},
                         'inferenceType': 'TemporalAnomaly',
                         'sensorParams': {
                             'encoders': encoder_dict,
                             'sensorAutoReset': None,
                             'verbosity': 0},
                         'spEnable': False,
                         'spParams': {'columnCount': 2048,
                                      'globalInhibition': 1,
                                      'inputWidth': 0,
                                      'boostStrength': 2.0,
                                      'numActiveColumnsPerInhArea': 40,
                                      'potentialPct': 0.8,
                                      'seed': 1956,
                                      'spVerbosity': 0,
                                      'spatialImp': 'cpp',
                                      'synPermActiveInc': 0.05,
                                      'synPermConnected': 0.1,
                                      'synPermInactiveDec': 0.08568228006654939},
                         'tmEnable': True,
                         'tmParams': {'activationThreshold': 12,
                                      'cellsPerColumn': 32,
                                      'columnCount': tm_colcount,  ## 2100 (3 encoders, each 700 width)
                                      'computePredictedActiveCellIndices': True,
                                      'globalDecay': 0.0,
                                      'initialPerm': 0.21,
                                      'inputWidth': tm_colcount,  ## 2100 (3 encoders, each 700 width)
                                      'maxInfBacktrack': 10,
                                      'maxLrnBacktrack': 5,
                                      'maxAge': 0,
                                      'maxSegmentsPerCell': 128,
                                      'maxSynapsesPerSegment': 32,
                                      'minThreshold': 10,
                                      'newSynapseCount': 20,
                                      'outputType': 'normal',
                                      'pamLength': 1,
                                      'permanenceDec': 0.1,
                                      'permanenceInc': 0.1,
                                      'seed': 1960,
                                      'temporalImp': 'cpp',
                                      'verbosity': 0},
                         'clEnable': False,
                         'clParams': None,
                         'trainSPNetOnlyIfRequested': False},
         'predictAheadTime': None,
         'version': 1}

Could I maybe bug you @Scott as well?
Sorry about this! I wouldn’t be asking if I hadn’t already dug around for many hours!

Wait, is this precision already being measured in TM’s _internalStats?
If I understand them right, this should be a valid calculation!?:

TM = TRAIN_model._getTPRegion().getSelf().getAlgorithmInstance()
TM_stats = TM._internalStats
winnerCellsCount = len(activeColumns)
Anomaly_Score = TM_stats['curMissing'] / float(winnerCellsCount)
Precision_Score = TM_stats['curExtra'] / float(winnerCellsCount)

OK, update. I'm now using:

result = model.run(inputRecord=modelInput)                
sensor = model._getSensorRegion()
activeColumns = sensor.getOutputData('dataOut').nonzero()[0]

TM = model._getTPRegion().getSelf().getAlgorithmInstance()
prevPredictedColumns = TM.topDownCompute().copy().nonzero()[0]
predActiveCols = [c for c in prevPredictedColumns if c in activeColumns]
anomalyScore = 1 - ( len(predActiveCols) / float(len(activeColumns)) )                
precisionScore = len(predActiveCols) / float(len(prevPredictedColumns))

total_distance = anomalyScore + (1 - precisionScore)
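And as a defensive sketch of the same thing, since topDownCompute() can return zero predicted columns (which would divide by zero in precisionScore):

    numActive = len(activeColumns)
    numPredicted = len(prevPredictedColumns)
    numPredActive = len(predActiveCols)

    # Same logic as above, just guarded against empty sets.
    anomalyScore = 1.0 if numActive == 0 else 1 - numPredActive / float(numActive)
    precisionScore = 0.0 if numPredicted == 0 else numPredActive / float(numPredicted)
    total_distance = anomalyScore + (1 - precisionScore)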

I'm using TM.topDownCompute() because TM.infPredictedState is almost always nearly empty. This confuses me, since the anomaly scores from result.inferences aren't always as high as they should be in theory, given the very sparse TM.infPredictedState.

I'm also confused that my anomalyScore sometimes aligns with result.inferences['anomalyScore'], but most often doesn't :thinking:.

These are not easy questions. I’ll try to dig into it later today, but I’m not sure how much I can help.


Hey @rhyolight! Thank you! I really appreciate the help.

The core problem, if you can replicate it, is that the TM.getPredictedState() function is showing too few predictions, given the anomaly scores in result.inferences['anomalyScore'].

I’m not so sure about this. It might make more sense if you rename winnerCellsCount to numActiveColumns. Thinking in that way, does this line still make sense?

count_correctly_predicted_columns = round( (1-anomaly_score) * numActiveColumns, 0)

The anomaly score calculation is: (quoted as a screenshot, not reproduced here)

count_correctly_predicted_columns is something you don't really know at this point, right? I don't understand the equation and what return_precision is trying to do. Are you trying to guess the precision of the current predictions based on the anomaly score?

Yes, I was using the anomaly score to figure out how many columns must’ve been correctly predicted.

I'm trying to compare the total number of columns predicted to the number of columns predicted correctly.

The idea is: an anomaly score of 0 should be seen differently when it took more predictions to achieve it. More predictions means less precision.

Yes, the renaming makes sense; numActiveColumns is what I meant.
My latest code now instead uses:

    activeColumns = sensor.getOutputData('dataOut').nonzero()[0]

    prevPredictedColumns = TM.topDownCompute().copy().nonzero()[0] 

    predActiveCols = [c for c in prevPredictedColumns if c in activeColumns] 

    anomalyScore = 1 - ( len(predActiveCols) / float(len(activeColumns)) ) 

    precisionScore = len(predActiveCols) / float(len(prevPredictedColumns))

Precision is the proportion of predicted columns that actually activate.
I see this as a false-positive measure, to go with the anomaly score's false-negative measure.
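In set terms (my shorthand, with A = columns active at time t and P = columns predicted at t-1):

anomalyScore = |A \ P| / |A|
precisionScore = |A ∩ P| / |P|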

The thing I don’t get is why
TM.topDownCompute()
yields so many more predictions than
TM.getPredictedState()

I've traced this to _inferPhase2().
The colConfidence values returned by topDownCompute() are being incremented, while the infPredictedState returned by getPredictedState() isn't being set.

This seems to mean that
numActiveSyns >= activationThreshold
and yet
_isSegmentActive(s, infActiveState['t']) is returning False

Also, I set both of these to True in the config:

anomalyMode
computePredictedActiveCellIndices
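For reference, here's how I'm comparing the two signals on each record (a sketch, using the same unofficial accessors as above):

    TM = model._getTPRegion().getSelf().getAlgorithmInstance()

    predStateCols = TM.getPredictedState().max(axis=1).nonzero()[0]  # from infPredictedState
    confidenceCols = TM.topDownCompute().nonzero()[0]                # from colConfidence

    # confidenceCols comes out consistently much larger in my runs
    print len(predStateCols), len(confidenceCols)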

At this point in your code:

anomalyScore = 1 - ( len(predActiveCols) / float(len(activeColumns)) )

Is your computed anomalyScore different from the one in result.inferences["anomalyScore"]?

Yes, it's almost always different!!

My anomaly calculation here uses
TM.topDownCompute()
which returns
TM._columnConfidences()

Whereas the standard anomaly score from
result.inferences
uses
TM.infPredictedState

Sorry it's taking me a while to get back into this code. I'm still hung up on the anomaly score formula. Here is your code to get your own anomaly score:

activeColumns = sensor.getOutputData('dataOut').nonzero()[0]
prevPredictedColumns = TM.topDownCompute().copy().nonzero()[0] 
predActiveCols = [c for c in prevPredictedColumns if c in activeColumns] 
anomalyScore = 1 - ( len(predActiveCols) / float(len(activeColumns)) ) 

And here is the formula:

anomalyScore = ( |A_t| - |A_t ∩ P_{t-1}| ) / |A_t|

where A_t is the set of active columns at time t and P_{t-1} is the set of columns predicted at time t-1.

What would you say is equivalent to this term in your code?

|A_t ∩ P_{t-1}|

It looks like it is predActiveCols. If so, wouldn’t the last line be something more like

anomalyScore = (len(activeColumns) - len(predActiveCols)) / float(len(activeColumns))

Am I making sense?


Absolutely!

Yes, I think it should yield the same as mine, right? (just with the subtraction done differently)

I'm all but positive that the source of the difference is my using:
TM.topDownCompute()
as opposed to:
TM.infPredictedState['t']

Yes, our anomaly score calculations are equivalent; just checked:
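With a = len(activeColumns) and p = len(predActiveCols):

1 - p/a = a/a - p/a = (a - p)/a

so the two agree whenever a > 0.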

:frowning_face: I don’t know what is going on here. I’ve asked for help, but everyone is neck-deep in ML optimizations for an internal milestone. Sorry, I’m a little out of my depth.


Awesome, thank you!!

I know NuPIC is in maintenance mode and not a priority for you guys at this point.

If there is a bug, as it appears, it'd be great to find it!

I've been digging through the source and am glad to investigate any ideas here!