I was recently involved in the porting of NuPIC’s SDR Classifier to Java, for the HTM.Java project. When I began the process, I knew pretty much nothing about the SDR Classifier or neural networks. So, I took extensive notes while I learned about them. I’ve decided to curate my notes and present them in this blog post, in the hope that they will prove useful.
Awesome job! Thank you so much for sharing your learning experience and hard work with us! Your blog made it very easy to follow the inner workings of the SDRClassifier, and I really appreciate the care you took in keeping things very easy to understand!
You also give a very clear example in "Example Learning Case". Can you add a more detailed example starting from the beginning step (input space --> encoder)?
Thank you @rhyolight, I will look at that video. BTW, I would like to ask:
In general, is the number of buckets calculated as n - w + 1?
But in the RandomDistributedScalarEncoder, the active bits (1-bits) are not contiguous; they are sparsely distributed throughout the space. So how can I calculate the number of buckets and the bucket index (the algorithm of getBucketIndices)?
In general, for the original simple scalar encoder, yes, the bucket calculation is simple like you mention (although I’m not sure on the exact formula).
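To make the n - w + 1 count concrete, here is a minimal sketch of a non-periodic encoder that uses one contiguous run of w active bits out of n. It is not NuPIC's ScalarEncoder; the function names and the linear value-to-bucket mapping are assumptions for illustration only. The count follows from the fact that the run can start at any position from 0 to n - w.

```python
def num_buckets(n, w):
    """Count the start positions of a contiguous run of w bits within n bits."""
    if not 0 < w <= n:
        raise ValueError("need 0 < w <= n")
    return n - w + 1


def bucket_index(value, min_val, max_val, n, w):
    """Map a scalar to a bucket index (hypothetical linear mapping)."""
    value = min(max(value, min_val), max_val)  # clip to the encoder range
    buckets = num_buckets(n, w)
    fraction = (value - min_val) / float(max_val - min_val)
    # The top of the range would land on index `buckets`, so clamp it.
    return min(int(fraction * buckets), buckets - 1)


def encode(value, min_val, max_val, n, w):
    """Return n bits with w contiguous 1-bits starting at the bucket index."""
    start = bucket_index(value, min_val, max_val, n, w)
    return [1 if start <= i < start + w else 0 for i in range(n)]
```

With n = 14 and w = 3 this gives 12 buckets, and neighbouring buckets share w - 1 of their active bits.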
Numbers within [offset-resolution/2, offset+resolution/2] will fall into the same bucket and thus have an identical representation. Adjacent buckets will differ in one bit.
resolution and offset are encoder parameters.
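The bucket rule quoted above can be sketched as follows. This is an illustration of the rule, not code copied from the NuPIC source: the function name, the max_buckets default of 1000, and the choice to put the offset at the middle bucket are assumptions.

```python
def rdse_bucket_index(x, offset, resolution, max_buckets=1000):
    """Return the bucket index for x under the offset/resolution rule.

    Values within half a resolution step of `offset` land in the centre
    bucket; each further resolution step moves one bucket up or down.
    """
    center = max_buckets // 2  # assumed: offset maps to the middle bucket
    idx = center + int(round((x - offset) / resolution))
    # Clamp so values far from the offset reuse the first or last bucket.
    return min(max(idx, 0), max_buckets - 1)
```

Note that the index only says which bucket a value falls in. Which bits that bucket activates still has to come from the stored bucket map, because the RDSE's active bits cannot be derived from the index the way a contiguous run can.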
The RDSE keeps a map of buckets in memory for lookup (unlike the simple scalar encoder, which can be reverse-engineered). The map is created on encoder init.
The key to the RDSE bucket lookup is this little bit of code on line 210:
If I’m sampling each metric to set the resolution for its encoder, should I also be setting the offset from this sample too?
I see in the RDSE source that offset is set to the first value by default, but what if the first value happens to be an outlier (far from the median of the metric’s distribution)? Wouldn’t that skew the resulting encoder from then on?
I see that the _fixupRandomEncoderParams() function in getScalarMetricWithTimeOfDayAnomalyParams() only addresses the resolution, but I wonder if an RDSE would be more robust if the offset were determined from a sample rather than the first value alone.
NEW: For each metric, set RDSE offset equal to its ‘50%’ value (representing the median). This seems more robust than just using the first value for each metric as default.
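A minimal sketch of that change, assuming the sample for a metric is a plain list of floats. The function name, the num_buckets and min_resolution defaults, and the range-based resolution formula are illustrative assumptions here, not values confirmed from the NuPIC source.

```python
import statistics


def rdse_params_from_sample(sample, num_buckets=130, min_resolution=0.001):
    """Derive RDSE resolution and offset from a sample of one metric."""
    if not sample:
        raise ValueError("sample must not be empty")
    lo, hi = min(sample), max(sample)
    # Assumed rule: spread the observed range over num_buckets buckets.
    resolution = max(min_resolution, (hi - lo) / float(num_buckets))
    # The median (the '50%' row of a summary) is insensitive to one outlier.
    offset = statistics.median(sample)
    return {"resolution": resolution, "offset": offset}


# The first value is an outlier; the median ignores it.
params = rdse_params_from_sample([1000.0, 10.0, 11.0, 12.0, 9.0])
```

Here the default behaviour would centre the encoder on 1000.0, while the median centres it on 11.0. The resolution is still stretched by the outlier, since it is computed from the min and max of the sample.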
If anyone sees any issues or caveats to that please let fly!