SDR Classifier question

c.compute(recordNum=0, patternNZ=[1, 5, 9],
          classification={"bucketIdx": 4, "actValue": 34.7},
          learn=True, infer=False)

In the SDR classifier, am I right to assume the following?

  • bucketIdx is an index for the target class label while

  • patternNZ are the active indices of the encoded raw data in an SDR?

classification –
Dict of the classification information where:
bucketIdx: list of indices of the encoder bucket
actValue: list of actual values going into the encoder
Classification could be None for inference mode.

What exactly is “actValue”? The documentation says it is a list, but in the snippet above a float is used. Is actValue the raw value before being encoded?

Thanks!

bucketIdx is an index for the target class label while

Yes and in the case of numeric inputs, the classifier requires that they are separated into discrete buckets and bucketIdx would be the index of the actual value (while the actual numeric value is passed as actValue)
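To make the bucketIdx/actValue relationship concrete, here is a minimal sketch of how a raw scalar maps to a bucket index. This is an illustration, not NuPIC's actual encoder internals, and the `minval`/`resolution` values are chosen so the example matches the `bucketIdx=4, actValue=34.7` snippet above:

```python
# Hypothetical bucketing (illustrative only, not nupic's ScalarEncoder):
# a scalar lands in the bucket index determined by minval and resolution.
def bucket_index(value, minval=0.0, resolution=8.0):
    """Return the index of the discrete bucket containing `value`."""
    return int((value - minval) // resolution)

# With these example settings, 34.7 falls in bucket 4, so you would pass
# bucketIdx=4 and actValue=34.7 to compute().
idx = bucket_index(34.7)
```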

patternNZ are the active indices of the encoded raw data in an SDR?

Yup!

Is actValue the raw values before being encoded?

Yup!


@scott

Yes and in the case of numeric inputs, the classifier requires that they are separated into discrete buckets and bucketIdx would be the index of the actual value (while the actual numeric value is passed as actValue)

So the ‘compute’ method is not to be called once per sample, but once for each value in each sample?

If you take the iris data for example, [5.0,4.1,1.3] is a sample, are you saying bucketIdx for 5.0 is 1…4.1 is 2…1.3 is 3, and the actual value for each compute step is 5.0, 4.1, 1.3?

What is a sample? And what are you trying to predict? All three values?

The NuPIC API doesn’t handle predicting multiple fields very well. The bucketIdx and actValue are for a single predicted field and each model can only have one predicted field. If you want to predict all three values, the simple initial way to do so is have three separate models, each predicting a different value (but potentially with all three fields as inputs). You could drop down to the network API interface to have multiple classifiers in the same “model”, each predicting a different input value.
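The "three separate models" idea above can be sketched as follows. `StubClassifier` is a stand-in for NuPIC's SDRClassifier (so the example runs on its own), and the field names, pattern, and bucket index are illustrative assumptions:

```python
# Sketch: one classifier per predicted field, each fed the same input SDR.
# StubClassifier just records its compute() calls; swap in a real
# SDRClassifier in an actual NuPIC setup.
class StubClassifier:
    def __init__(self):
        self.calls = []

    def compute(self, recordNum, patternNZ, classification, learn, infer):
        self.calls.append((recordNum, classification["actValue"]))

classifiers = {name: StubClassifier()
               for name in ("petal_length", "petal_width", "sepal_length")}

sample = {"petal_length": 5.0, "petal_width": 4.1, "sepal_length": 1.3}
pattern_nz = [1, 5, 9]        # active bits of the encoded sample (example)

for record_num in range(1):   # one record shown; loop over your dataset
    for field, clf in classifiers.items():
        clf.compute(recordNum=record_num, patternNZ=pattern_nz,
                    classification={"bucketIdx": 0,  # bucket of sample[field]
                                    "actValue": sample[field]},
                    learn=True, infer=False)
```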

I’m trying to predict a class label. Each sample is a list of floating-point values, e.g. [5.4, 1.2, 3.5], along with an associated category label. I was using the MultiEncoder to encode each list into a single SDR and train the model on those SDRs with the given label.

I do the same thing for the test data. The desired output is a predicted probability distribution over the categories for each sample, so I can compute the log loss on the test set.

I did this for the KNN classifier and it worked extremely well with very little effort.

It sounds like you should be able to just pass the class label as both bucketIdx and actValue and a single model will work fine. Do you have problems with this?
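As a small sketch of that suggestion, assuming the compute() signature shown at the top of the thread: when the target is a categorical label, the label itself doubles as both the bucket index and the actual value.

```python
# For categorical targets, the class label serves as both bucketIdx and
# actValue in the classification dict passed to compute().
def label_classification(label):
    """Build the classification dict for a class-label target."""
    return {"bucketIdx": label, "actValue": label}

# Example usage (c is assumed to be an SDRClassifier instance):
# c.compute(recordNum=0, patternNZ=pattern_nz,
#           classification=label_classification(2),
#           learn=True, infer=False)
```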

I think I understand now, I believe I was doing that, but I was getting the same class prediction for every sample so I assumed I was doing something wrong.

It appears, though, that this is because all of my SDRs have the exact same active indices. I’m looking into how to get some diversity out of the encoders. I think my resolution is wrong.

Ahh got it. If you have trouble finding good encoding parameters then feel free to post which encoder you are using and the range of values and I can suggest some parameters.

I’m using the MultiEncoder to combine these:

{'petal_length': {'clipInput': True,
                  'fieldname': 'petal_length',
                  'maxval': 6.9000000000000004,
                  'minval': 1.0,
                  'name': 'petal_length',
                  'resolution': 3,
                  'type': 'ScalarEncoder',
                  'w': 41},
 'petal_width': {'clipInput': True,
                 'fieldname': 'petal_width',
                 'maxval': 2.5,
                 'minval': 0.10000000000000001,
                 'name': 'petal_width',
                 'resolution': 3,
                 'type': 'ScalarEncoder',
                 'w': 41},
 'sepal_length': {'clipInput': True,
                  'fieldname': 'sepal_length',
                  'maxval': 7.9000000000000004,
                  'minval': 4.2999999999999998,
                  'name': 'sepal_length',
                  'resolution': 3,
                  'type': 'ScalarEncoder',
                  'w': 41}}

The resolution value is too high here. I’d recommend dropping the resolution entirely and adding 'n': 121 or similar. I’d also change w to be around 21. So something like this:

'petal_length': {'clipInput': True,
                 'fieldname': 'petal_length',
                 'maxval': 6.9000000000000004,
                 'minval': 1.0,
                 'name': 'petal_length',
                 'n': 121,
                 'type': 'ScalarEncoder',
                 'w': 21},

(and similar changes to the other two encoders as well)

You can play with the value of n within a range of around 31 to 201 (assuming w=21 and min/max the same)
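To see why fixing n instead of resolution helps, here is a rough back-of-the-envelope calculation using the petal_length range from above. This approximates how the encoder's sizing works (resolution ≈ range divided by the number of bucket positions); it is not the exact internals:

```python
# Rough approximation of the effective resolution when n is fixed,
# using the petal_length range from the config above.
minval, maxval, w, n = 1.0, 6.9, 21, 121

num_buckets = n - w + 1                    # distinct bucket positions
approx_resolution = (maxval - minval) / (num_buckets - 1)
# ~101 positions with a bucket width of roughly 0.06 -- far finer than
# the resolution of 3 in the original config.
```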

The way you have it right now, it chooses the total number of bits, n, based on the min, max, w, and resolution, but with such a large resolution n comes out too small, so encodings for petal_length have only three possible values. Here is a sample encoding for petal_length with the parameters you listed:

array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
       1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0, 0], dtype=uint8)
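A quick sanity check on those numbers (an approximation of how the encoder is sized, not its exact internals): with resolution=3 over a range of only 5.9, there are just a handful of distinct buckets, and the resulting n matches the 43-bit array above.

```python
import math

# Approximate encoder sizing for the petal_length config with resolution=3.
minval, maxval, resolution, w = 1.0, 6.9, 3, 41

num_buckets = math.ceil((maxval - minval) / resolution) + 1  # ~3 positions
n = w + num_buckets - 1                                      # ~43 total bits
# Only three possible encodings, so nearly all inputs look identical.
```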

I’m getting some really terrible results from this classifier. The data here is spatial, not temporal at all. Is this a bad classifier to use for something like this?

This classifier, or any similar one, should work fine on your problem. As with most things in machine learning, any improperly configured component can ruin the results. When the encoder parameters were bad in your setup, for instance, it wouldn’t have mattered how much you tweaked the other parts; you would always get bad results. So you really need to validate each step in your setup to make sure it is performing correctly.

But you said the KNN classifier worked well for you, so I’d recommend sticking with that (“if it ain’t broke…”). I am curious how it worked given the bad encoder parameters you had before, but the bottom line is that if it worked well, I’d stick with it!