SDR Classifier question

c.compute(recordNum=0, patternNZ=[1, 5, 9],
          classification={"bucketIdx": 4, "actValue": 34.7},
          learn=True, infer=False)

In the SDR classifier, am I right to assume the following?

  • bucketIdx is an index for the target class label while

  • patternNZ are the active indices of the encoded raw data in an SDR?

classification –
Dict of the classification information where:
bucketIdx: list of indices of the encoder bucket
actValue: list of actual values going into the encoder
Classification could be None for inference mode.

What exactly is “actValue”? The documentation says it is a list, but in the snippet above a float is used. Is actValue the raw value before being encoded?

Thanks!

bucketIdx is an index for the target class label while

Yes and in the case of numeric inputs, the classifier requires that they are separated into discrete buckets and bucketIdx would be the index of the actual value (while the actual numeric value is passed as actValue)
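To make the bucketIdx/actValue relationship concrete, here is a minimal sketch of how a raw scalar maps to a bucket index. This is an illustration, not NuPIC's actual encoder internals, and the `minval`/`resolution` values are chosen so the example matches the `bucketIdx=4, actValue=34.7` snippet above:

```python
# Hypothetical bucketing (illustrative only, not nupic's ScalarEncoder):
# a scalar lands in the bucket index determined by minval and resolution.
def bucket_index(value, minval=0.0, resolution=8.0):
    """Return the index of the discrete bucket containing `value`."""
    return int((value - minval) // resolution)

# With these example settings, 34.7 falls in bucket 4, so you would pass
# bucketIdx=4 and actValue=34.7 to compute().
idx = bucket_index(34.7)
```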

patternNZ are the active indices of the encoded raw data in an SDR?

Yup!

Is actValue the raw values before being encoded?

Yup!


@scott

Yes and in the case of numeric inputs, the classifier requires that they are separated into discrete buckets and bucketIdx would be the index of the actual value (while the actual numeric value is passed as actValue)

So the ‘compute’ method is not to be called once per sample, but once for each value in each sample?

If you take the iris data for example, [5.0,4.1,1.3] is a sample, are you saying bucketIdx for 5.0 is 1…4.1 is 2…1.3 is 3, and the actual value for each compute step is 5.0, 4.1, 1.3?

What is a sample? And what are you trying to predict? All three values?

The NuPIC API doesn’t handle predicting multiple fields very well. The bucketIdx and actValue are for a single predicted field and each model can only have one predicted field. If you want to predict all three values, the simple initial way to do so is have three separate models, each predicting a different value (but potentially with all three fields as inputs). You could drop down to the network API interface to have multiple classifiers in the same “model”, each predicting a different input value.
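The "three separate models" idea above can be sketched as follows. `StubClassifier` is a stand-in for NuPIC's SDRClassifier (so the example runs on its own), and the field names, pattern, and bucket index are illustrative assumptions:

```python
# Sketch: one classifier per predicted field, each fed the same input SDR.
# StubClassifier just records its compute() calls; swap in a real
# SDRClassifier in an actual NuPIC setup.
class StubClassifier:
    def __init__(self):
        self.calls = []

    def compute(self, recordNum, patternNZ, classification, learn, infer):
        self.calls.append((recordNum, classification["actValue"]))

classifiers = {name: StubClassifier()
               for name in ("petal_length", "petal_width", "sepal_length")}

sample = {"petal_length": 5.0, "petal_width": 4.1, "sepal_length": 1.3}
pattern_nz = [1, 5, 9]        # active bits of the encoded sample (example)

for record_num in range(1):   # one record shown; loop over your dataset
    for field, clf in classifiers.items():
        clf.compute(recordNum=record_num, patternNZ=pattern_nz,
                    classification={"bucketIdx": 0,  # bucket of sample[field]
                                    "actValue": sample[field]},
                    learn=True, infer=False)
```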

I’m trying to predict a class label. Each sample is a list of floating-point values, e.g. [5.4, 1.2, 3.5], along with an associated category label. I was using the MultiEncoder to encode each list into a single SDR and train the model on those SDRs with the given label.

I do the same thing for the test data. The desired output is a predicted probability distribution over the categories for each sample, so I can compute the log loss on the test set.

I did this for the KNN classifier and it worked extremely well with very little effort.

It sounds like you should be able to just pass the class label as both bucketIdx and actValue and a single model will work fine. Do you have problems with this?
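As a small sketch of that suggestion, assuming the compute() signature shown at the top of the thread: when the target is a categorical label, the label itself doubles as both the bucket index and the actual value.

```python
# For categorical targets, the class label serves as both bucketIdx and
# actValue in the classification dict passed to compute().
def label_classification(label):
    """Build the classification dict for a class-label target."""
    return {"bucketIdx": label, "actValue": label}

# Example usage (c is assumed to be an SDRClassifier instance):
# c.compute(recordNum=0, patternNZ=pattern_nz,
#           classification=label_classification(2),
#           learn=True, infer=False)
```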

I think I understand now, I believe I was doing that, but I was getting the same class prediction for every sample so I assumed I was doing something wrong.

It appears, though, that this is because all of my SDRs have the exact same active indices. I’m looking into how to get some diversity out of the encoders. I think my resolution is wrong.

Ahh got it. If you have trouble finding good encoding parameters then feel free to post which encoder you are using and the range of values and I can suggest some parameters.

I’m using the MultiEncoder to combine these:

{'petal_length': {'clipInput': True,
                  'fieldname': 'petal_length',
                  'maxval': 6.9000000000000004,
                  'minval': 1.0,
                  'name': 'petal_length',
                  'resolution': 3,
                  'type': 'ScalarEncoder',
                  'w': 41},
 'petal_width': {'clipInput': True,
                 'fieldname': 'petal_width',
                 'maxval': 2.5,
                 'minval': 0.10000000000000001,
                 'name': 'petal_width',
                 'resolution': 3,
                 'type': 'ScalarEncoder',
                 'w': 41},
 'sepal_length': {'clipInput': True,
                  'fieldname': 'sepal_length',
                  'maxval': 7.9000000000000004,
                  'minval': 4.2999999999999998,
                  'name': 'sepal_length',
                  'resolution': 3,
                  'type': 'ScalarEncoder',
                  'w': 41}}

The resolution value is too high here. I’d recommend dropping the resolution entirely and adding 'n': 121 or similar. I’d also change w to be around 21. So something like this:

'petal_length': {'clipInput': True,
                 'fieldname': 'petal_length',
                 'maxval': 6.9000000000000004,
                 'minval': 1.0,
                 'name': 'petal_length',
                 'n': 121,
                 'type': 'ScalarEncoder',
                 'w': 21},

(and similar changes to the other two encoders as well)

You can play with the value of n within a range of around 31 to 201 (assuming w=21 and min/max the same)
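To see why fixing n instead of resolution helps, here is a rough back-of-the-envelope calculation using the petal_length range from above. This approximates how the encoder's sizing works (resolution ≈ range divided by the number of bucket positions); it is not the exact internals:

```python
# Rough approximation of the effective resolution when n is fixed,
# using the petal_length range from the config above.
minval, maxval, w, n = 1.0, 6.9, 21, 121

num_buckets = n - w + 1                    # distinct bucket positions
approx_resolution = (maxval - minval) / (num_buckets - 1)
# ~101 positions with a bucket width of roughly 0.06 -- far finer than
# the resolution of 3 in the original config.
```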

The way you have it right now, it chooses the total number of bits, n, based on the min, max, w, and resolution, but with such a large resolution n comes out too small, so encodings for petal_length have only three possible values. Here is a sample encoding for petal_length with the parameters you listed:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=uint8)
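A quick sanity check on those numbers (an approximation of how the encoder is sized, not its exact internals): with resolution=3 over a range of only 5.9, there are just a handful of distinct buckets, and the resulting n matches the 43-bit array above.

```python
import math

# Approximate encoder sizing for the petal_length config with resolution=3.
minval, maxval, resolution, w = 1.0, 6.9, 3, 41

num_buckets = math.ceil((maxval - minval) / resolution) + 1  # ~3 positions
n = w + num_buckets - 1                                      # ~43 total bits
# Only three possible encodings, so nearly all inputs look identical.
```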

I’m getting some really terrible results from this classifier. The data here is spatial, not temporal at all. Is this a bad classifier to use for something like this?

This classifier, or any similar one, should work fine on your problem. As with most things in machine learning, any improperly configured component can ruin the results. When the encoder parameters were bad in your setup, for instance, it wouldn’t have mattered how much you tweaked the other parts; you would always get bad results. So you really need to validate each step in your setup to make sure it is performing correctly.

But you said the KNN classifier worked well for you, so I’d recommend sticking with that (“if it ain’t broke…”). I am curious how it worked given the bad encoder parameters you had before, but the bottom line is that if it worked well, I’d stick with it!