How the SDR Classifier Works

Hi all. I just finished writing up a blog post on the SDR Classifier. Just wanted to share it here.

Excerpt of post:

I was recently involved in the porting of NuPIC’s SDR Classifier to Java, for the HTM.Java project. When I began the process, I knew pretty much nothing about the SDR Classifier or neural networks. So, I took extensive notes while I learned about them. I’ve decided to curate my notes and present them in this blog post, in the hope that they will prove useful.


Andrew (@Hopding),

Awesome job! Thank you so much for sharing your learning experience and hard work with us! Your blog made it very easy to follow the inner workings of the SDRClassifier, and I really appreciate the care you took in keeping things very easy to understand! :tada:


Thank you very much. Your post is very useful and valuable to me.

By the way, can you explain in more detail the pink components (bucket index, record number, other fields, predicted field) in your figure below?

You also give a very clear example in "Example Learning Case". Can you add a more detailed example that starts from the beginning step (input space --> encoder)?

Thank you very much.

You might enjoy watching this video explanation of the SDR Classifier.


Thank you, @rhyolight. I will watch that video. By the way, I would like to ask:

In general, is the number of buckets calculated as n - w + 1?

But in the RandomDistributedScalarEncoder, the active bits (1-bits) are not contiguous; they are distributed sparsely throughout the space. So how can I calculate the number of buckets and the bucket index (the algorithm behind getBucketIndices)?

Thank you very much

Can anyone help me answer the question above? Thank you very much.

I don’t know, but I would look at the RDSE tests first.

Hi @life_happy,

In general, for the original simple scalar encoder, yes, the bucket calculation is simple like you mention (although I’m not sure on the exact formula).
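For reference, here is a minimal sketch of the simple case, assuming a non-periodic encoder that slides a run of w contiguous active bits across n total bits. The function names are made up for illustration and this is not the NuPIC implementation, just the counting argument: a run of w bits can start at n - w + 1 positions, so that is the number of buckets.

```python
def simple_bucket_count(n, w):
    """Number of distinct start positions for w contiguous bits within n bits."""
    return n - w + 1


def simple_bucket_index(x, minval, maxval, n, w):
    """Map x to a bucket in [0, n - w]; x is clipped to [minval, maxval] first."""
    num_buckets = simple_bucket_count(n, w)
    if num_buckets < 2:
        raise ValueError("need n > w to get more than one bucket")
    x = min(max(x, minval), maxval)
    # Width of one bucket (assumption: minval maps to the first bucket,
    # maxval to the last one).
    resolution = (maxval - minval) / float(num_buckets - 1)
    # Adding 0.5 before truncating rounds to the nearest bucket.
    return int((x - minval) / resolution + 0.5)
```

With n = 14 and w = 3 this gives 12 buckets, so a range of 0 to 11 yields one bucket per unit.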

As for the RDSE, from the source:

Numbers within [offset-resolution/2, offset+resolution/2] will fall into the same bucket and thus have an identical representation. Adjacent buckets will differ in one bit.

resolution and offset are encoder parameters.

The RDSE keeps a map of buckets in memory for lookup (unlike the simple scalar encoder, which can be reverse-engineered). The map is created on encoder init.

The key to the RDSE bucket lookup is this little bit of code on line 210:

 bucketIdx = (
     (self._maxBuckets/2) + int(round((x - self._offset) / self.resolution))
 )

Does that help?
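As a standalone sketch of that lookup (not the encoder itself): the function name is made up, and the default of 1000 for max_buckets plus the clipping to the valid index range are assumptions to check against the RDSE source.

```python
def rdse_bucket_index(x, offset, resolution, max_buckets=1000):
    """Bucket index for x, with the bucket for `offset` at the middle of the range."""
    # Same arithmetic as the snippet quoted above; // keeps the division
    # integral on Python 3 as well.
    idx = max_buckets // 2 + int(round((x - offset) / resolution))
    # Assumption: indices outside [0, max_buckets - 1] are clipped to the ends.
    return min(max(idx, 0), max_buckets - 1)
```

So with offset = 10 and resolution = 0.5, the values 10.0 and 10.2 land in the same bucket, 10.3 lands in the next one up, and anything very far from the offset ends up in the first or last bucket.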


If I’m sampling each metric to set the resolution for its encoder, should I also be setting the offset from this sample too?

I see in the RDSE source that offset is set to the first value by default, but what if the first value happens to be an outlier (far from the median of the metric’s distribution)? Wouldn’t that skew the resulting encoder from then on?

I see that the _fixupRandomEncoderParams() function in getScalarMetricWithTimeOfDayAnomalyParams() only addresses the resolution, but I wonder if an RDSE would be more robust if the offset were determined from a sample rather than the first value alone.

Thanks!

Yes, I think you’re on to something. Good idea on the sampling solution!

I wonder if that’s why offset was offered as a parameter in the first place.


Alright thanks @brev!

So here’s my current logic (seeking all critiques!):

  1. Sample first n rows in stream, accumulating n values for each metric.

  2. Get basic summary statistics for the metrics, something like:

  3. For each metric, set RDSE minVal & maxVal to its ‘5%’ and ‘95%’ values (representing 5th & 95th percentile). The resolution is then calculated by:

resolution = max(minResolution,
                 (maxVal - minVal) / encoder.pop("numBuckets"))

  4. NEW: For each metric, set RDSE offset equal to its ‘50%’ value (representing the median). This seems more robust than just using the first value for each metric as default.
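The sampling logic above could be sketched roughly like this for a single metric, using only the standard library. The helper names, the linear-interpolation percentile, and the defaults for num_buckets and min_resolution are illustrative assumptions, not values taken from NuPIC.

```python
import statistics


def percentile(sorted_vals, q):
    """Linearly interpolated percentile of an already sorted list, q in [0, 100]."""
    if not sorted_vals:
        raise ValueError("need at least one sample")
    pos = (len(sorted_vals) - 1) * q / 100.0
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    frac = pos - lo
    return sorted_vals[lo] * (1 - frac) + sorted_vals[hi] * frac


def rdse_params_from_sample(values, num_buckets=130, min_resolution=0.001):
    """Derive resolution and offset from the first n sampled values of a metric."""
    vals = sorted(values)
    min_val = percentile(vals, 5)    # 5th percentile stands in for minVal
    max_val = percentile(vals, 95)   # 95th percentile stands in for maxVal
    resolution = max(min_resolution, (max_val - min_val) / float(num_buckets))
    offset = statistics.median(vals)  # median instead of the first value
    return {"resolution": resolution, "offset": offset}
```

For example, if the sample is 1000 followed by 0 through 99, the offset comes out as 50 rather than the outlying first value of 1000.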

If anyone sees any issues or caveats to that please let fly!
