I don’t think I explained resolution very well.
The resolution parameter is the smallest unit of data that you care about. Its units are those of the data set. A data point divided by the resolution and truncated to an integer is its bucket number. The encoder will generate a single unique pattern for each bucket.
If your data contains categories, then the resolution is 1 and each category has its own bucket, so each category gets its own unique pattern.
As an example, if you have data that ranges from -1.00 to +1.00 and you care about increments of 0.02 (the resolution), then all values from 0.32 up to (but not including) 0.34 would be in the same bucket and result in the same pattern.
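To make the bucket arithmetic concrete, here is a minimal sketch of the rule described above (divide by the resolution, truncate to an integer). The function name bucket_index is made up for illustration; a real encoder may also offset by the minimum value or round differently, so treat this as the idea rather than the library's code.

```python
def bucket_index(value, resolution):
    """Bucket number: the value divided by the resolution, truncated to an integer."""
    # int() truncates toward zero. Values that sit exactly on a bucket
    # boundary can land on either side because of floating-point rounding.
    return int(value / resolution)


resolution = 0.02
for value in (0.32, 0.33, 0.339, 0.35):
    print(value, "-> bucket", bucket_index(value, resolution))
```

The first three values fall in the same bucket and would get the same pattern; 0.35 falls in the next one.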
The entire range of numbers in your data (the maximum value minus the minimum value) divided by the resolution gives the total number of buckets. Each bucket occupies activeBits bits in the output array. If there are too many buckets, they may not fit in the output array, so the resolution parameter should be the largest value that you can get by with.
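A rough way to check whether the buckets will fit is sketched below. It assumes the common layout where each bucket is a block of activeBits contiguous bits and neighboring buckets are shifted by one bit; that layout and the helper names are my assumptions for illustration, and the exact size a given encoder asks for may differ slightly.

```python
import math


def bucket_count(minimum, maximum, resolution):
    """Total number of buckets: (maximum - minimum) / resolution, rounded up."""
    # round() before ceil() absorbs floating-point noise such as 99.99999999.
    return math.ceil(round((maximum - minimum) / resolution, 9))


def min_output_size(num_buckets, active_bits):
    """Bits needed if each bucket is a window of active_bits contiguous bits
    and each successive bucket shifts the window by one bit (assumed layout)."""
    return num_buckets + active_bits - 1


buckets = bucket_count(-1.0, 1.0, 0.02)
print("buckets:", buckets)
print("output bits needed with 21 active bits:", min_output_size(buckets, 21))
```

Halving the resolution doubles the bucket count, which is why a resolution finer than you actually need can quickly outgrow the output array.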
With the Scalar Encoder, each bucket is encoded directly into the output bits. There are times when you may have too many buckets, or you may not know the maximum data value. In that case, if the number of distinct data values actually used is reasonable, there is another encoder that can be used. The RandomDistributedScalarEncoder (or RDSE) makes a hash of the bucket number and uses the hash to generate the pattern.
As with any hash, care must be taken to make sure the total number of unique values actually used is small compared to the number of values the hash can produce, to avoid excessive collisions.
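The hashing idea can be sketched as follows. This is not the library's implementation: it uses SHA-256 from Python's hashlib as a stand-in hash, and the function name hashed_pattern and the parameters size, active_bits and seed are made up for illustration. Each of the active_bits positions comes from hashing the bucket number plus an offset, so neighboring buckets share most of their bits.

```python
import hashlib


def hashed_pattern(bucket, size=400, active_bits=21, seed=0):
    """Return the set of active bit positions for a bucket number."""
    bits = set()
    for offset in range(active_bits):
        # Hash (bucket + offset) so that adjacent buckets reuse most keys.
        key = f"{seed}:{bucket + offset}".encode()
        digest = hashlib.sha256(key).digest()
        # Fold the hash into the output array; collisions can happen here,
        # so a pattern may end up with fewer than active_bits distinct bits.
        bits.add(int.from_bytes(digest[:8], "big") % size)
    return bits


a = hashed_pattern(16)
b = hashed_pattern(17)
far = hashed_pattern(5000)
print("bits set for bucket 16:", len(a))
print("overlap of buckets 16 and 17:", len(a & b))
print("overlap of buckets 16 and 5000:", len(a & far))
```

Because the position is derived from a hash, no maximum value is needed up front; the cost is that unrelated buckets can share bits by chance, and that gets worse as more distinct buckets are squeezed into a small output array.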
With either encoder, remember that the output is a bit array but it is not really an SDR. It is the job of the Spatial Pooler to apply the sparsity and create a real SDR.