How the SDR Classifier Works

Hopding · October 9, 2016, 11:02pm

Hi all. I just finished writing up a blog post on the SDR Classifier. Just wanted to share it here.

Excerpt of post:

I was recently involved in the porting of NuPIC’s SDR Classifier to Java, for the HTM.Java project. When I began the process, I knew pretty much nothing about the SDR Classifier or neural networks. So, I took extensive notes while I learned about them. I’ve decided to curate my notes and present them in this blog post, in the hope that they will prove useful.

cogmission · October 10, 2016, 7:44am

Andrew (@Hopding),

Awesome job! Thank you so much for sharing your learning experience and hard work with us! Your blog made it very easy to follow the inner workings of the SDRClassifier, and I really appreciate the care you took in keeping things very easy to understand!

life_happy · September 29, 2019, 10:40am

Thank you very much. Your post is very useful and valuable to me.

By the way, can you explain in more detail the pink components (Bucket Index, Record Number, Other Fields, Predicted Field) in your figure below?

You also give a very clear example in "Example Learning Case". Can you add a more detailed example starting from the first step (input space --> encoder)?

Thank you very much.

rhyolight · October 1, 2019, 5:49pm

You might enjoy watching this video explanation of the SDR Classifier.

life_happy · October 1, 2019, 10:33pm

Thank you @rhyolight. I will watch that video. By the way, I would like to ask:

In general, is the number of buckets calculated as n - w + 1?

In the RandomDistributedScalarEncoder, however, the active bits (1-bits) are not contiguous but distributed (sparsely) throughout the space. So how can I calculate the number of buckets and the bucket index (the algorithm of getBucketIndices)?

Thank you very much

life_happy · October 18, 2019, 1:51pm

Could anyone help me answer the question above? Thank you very much.

rhyolight · October 18, 2019, 4:17pm

I don’t know, but I would look at the RDSE tests first.

brev · October 19, 2019, 2:41am

Hi @life_happy,

In general, for the original simple scalar encoder, yes, the bucket calculation is simple like you mention (although I’m not sure on the exact formula).
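For the simple scalar encoder, the n - w + 1 count can be checked by enumeration: a contiguous run of w active bits can start at any of n - w + 1 positions in an n-bit vector. The snippet below is only a counting sketch, not NuPIC code. `contiguous_encodings` is a made-up helper, and it assumes a non-periodic encoder (no wrap-around).

```python
def contiguous_encodings(n, w):
    """Return every n-bit vector holding one contiguous run of w ones.

    Hypothetical helper for illustration; assumes no wrap-around.
    """
    return [
        [1 if start <= i < start + w else 0 for i in range(n)]
        for start in range(n - w + 1)
    ]


encodings = contiguous_encodings(n=8, w=3)
print(len(encodings))  # one encoding per bucket
for bits in encodings:
    print("".join(map(str, bits)))
```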

As for the RDSE, from the source:

Numbers within [offset-resolution/2, offset+resolution/2] will fall into the same bucket and thus have an identical representation. Adjacent buckets will differ in one bit.

resolution and offset are encoder parameters.

The RDSE keeps a map of buckets in memory for lookup (unlike the simple scalar encoder, which can be reverse-engineered). The map is created on encoder init.

The key to the RDSE bucket lookup is this little bit of code on line 210:

 bucketIdx = (
     (self._maxBuckets/2) + int(round((x - self._offset) / self.resolution))
 )
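Pulled out into a standalone function, that lookup might look like the sketch below. The function name, the `max_buckets=1000` default, and the clipping to the valid index range are assumptions for illustration rather than a copy of the NuPIC source. Note also that NuPIC runs on Python 2, where `round()` treats exact .5 ties differently from Python 3.

```python
def rdse_bucket_index(x, offset, resolution, max_buckets=1000):
    """Map a scalar to a bucket index, centred on max_buckets // 2."""
    idx = max_buckets // 2 + int(round((x - offset) / resolution))
    # Assumed behaviour: keep the index inside [0, max_buckets - 1].
    return max(0, min(max_buckets - 1, idx))


print(rdse_bucket_index(10.0, offset=10.0, resolution=0.5))  # 500, the centre bucket
print(rdse_bucket_index(10.2, offset=10.0, resolution=0.5))  # 500, within resolution/2 of offset
print(rdse_bucket_index(10.3, offset=10.0, resolution=0.5))  # 501, the adjacent bucket
```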

Does that help?

sheiser1 · October 20, 2019, 10:53pm

If I’m sampling each metric to set the resolution for its encoder, should I be setting the offset from this sample too?

I see in the RDSE source that offset is set to the first value by default, but what if the first value happens to be an outlier (far from the median of the metric’s distribution)? Wouldn’t that skew the resulting encoder from then on?

I see that the _fixupRandomEncoderParams() function in getScalarMetricWithTimeOfDayAnomalyParams() only addresses the resolution, but I wonder if an RDSE would be more robust if the offset were determined from a sample rather than the first value alone.

Thanks!

brev · October 21, 2019, 12:38am

Yes, I think you’re on to something. Good idea on the sampling solution!

I wonder if that’s why offset was offered as a parameter in the first place.

sheiser1 · October 22, 2019, 12:39am

Alright thanks @brev!

So here’s my current logic (seeking all critiques!):

Sample first n rows in stream, accumulating n values for each metric.
Get basic summary statistics for the metrics, something like:

[Screenshot: table of per-metric summary statistics]

For each metric, set RDSE minVal & maxVal to its ‘5%’ and ‘95%’ values (representing 5th & 95th percentile). The resolution is then calculated by:

 resolution = max(minResolution,
                  (maxVal - minVal) / encoder.pop("numBuckets")
                  )

NEW: For each metric, set RDSE offset equal to its ‘50%’ value (representing the median). This seems more robust than just using the first value for each metric as default.

If anyone sees any issues or caveats to that please let fly!
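The logic above can be sketched as follows, using only the standard library. `rdse_params_from_sample` is a hypothetical helper, and the `num_buckets` and `min_resolution` defaults are placeholder values, not settings taken from this thread.

```python
import statistics


def rdse_params_from_sample(values, num_buckets=130.0, min_resolution=0.001):
    """Derive RDSE resolution and offset from a sample of one metric."""
    # quantiles(n=20) returns 19 cut points: the first is the 5th
    # percentile and the last is the 95th.
    cuts = statistics.quantiles(values, n=20, method="inclusive")
    min_val, max_val = cuts[0], cuts[-1]
    resolution = max(min_resolution, (max_val - min_val) / num_buckets)
    # The median does not depend on where in the stream an outlier appears.
    offset = statistics.median(values)
    return {"resolution": resolution, "offset": offset}


# The first value is an outlier; the remaining 100 values are 1..100.
sample = [100000] + list(range(1, 101))
params = rdse_params_from_sample(sample)
print(params)
```

In this sample the outlier arrives first, yet the offset lands on the median of the sample rather than on the outlier, which is the robustness the proposal is after.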
