Small-size encoders

I’m trying to encode a bunch of scalars: SMART stats for hard-disk health metrics, eventually hoping to predict drive failure in advance with temporal memory / anomaly detection. I used the hotgym.py example as a base structure.

I’ve tried defining several unique ScalarEncoders or RDSEs (concatenating them into one encoding to feed the SP), but they crash if I set the bit-size allocation too low for any of them - somewhere under 30ish bits for each SE or RDSE.

This is a little odd to me, since some of the features in this dataset are strings (~45 unique categories; I used pandas’ categorical coding to clean these up for the scalar encoder) and one is just True/False, so allocating a minimum of 30 bits is a huge computational waste.

However, it runs fine (just slow) when I give “enough” bit space to each feature encoder. My code is up on my GitHub; here is the relevant snippet:

# failure 1-bit encoder to minimize SDR size. We only need one bit to encode 1 or 0, after all
from htm.bindings.encoders import ScalarEncoder, ScalarEncoderParameters

fail_params = ScalarEncoderParameters()
fail_params.minimum = 0
fail_params.maximum = 1
fail_params.size = 30  # if I go under 30, it crashes: CHECK FAILED: "args_.activeBits > 0u"
    # this is a pretty awful problem, considering I'm wasting 29 bits of calculation
    # I'm probably missing some configuration in a setting somewhere
fail_params.sparsity = 0.02
# fail_params.resolution = 0.88
fail_encoder = ScalarEncoder(fail_params)  # it probably could just be a ScalarEncoder, not an RDSE ...

I wonder if it’s something to do with the min/max range declarations or something similar. Has anyone had any success with few-bit ScalarEncoders or RDSEs, or successful code I could compare with?

Where is this class defined?
I haven’t seen an issue like this before, so this may or may not be any help.

ScalarEncoderParameters can be somewhat difficult to work with because it covers many use cases, and some of the options are mutually exclusive.
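
For example, specifying both activeBits and sparsity should be rejected outright. A minimal sketch of what that looks like (assuming the underlying C++ parameter checks surface as Python exceptions when the encoder is constructed):

from htm.bindings.encoders import ScalarEncoder, ScalarEncoderParameters

bad = ScalarEncoderParameters()
bad.minimum = 0
bad.maximum = 1
bad.activeBits = 1   # an explicit active-bit count ...
bad.sparsity = 0.02  # ... conflicts with a sparsity fraction
try:
    ScalarEncoder(bad)  # the constructor validates the parameter combination
except Exception as err:  # exact exception type depends on the bindings
    print(err)  # complains about the conflicting parameters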

Try using the following parameters:

from htm.bindings.encoders import ScalarEncoder, ScalarEncoderParameters

p = ScalarEncoderParameters()
p.category = True
p.minimum = 0
p.maximum = 1
p.activeBits = 1
ScalarEncoder(p).size  # size == 2

In order to use this encoder, you will need to convert your boolean values into integers in the range [0, 1], like this:

from htm.bindings.encoders import ScalarEncoder
import htm.bindings.sdr

x = ScalarEncoder(p)
x.encode(int(False)).dense  # returns: array([1, 0], dtype=int8)
x.encode(int(True)).dense   # returns: array([0, 1], dtype=int8)
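
The same category mode should also handle the ~45-unique-string feature mentioned above. A sketch, assuming the pandas categorical codes run from 0 to 44; activeBits = 3 is just an illustrative choice (a few dedicated bits per category buys some noise tolerance):

from htm.bindings.encoders import ScalarEncoder, ScalarEncoderParameters

cat = ScalarEncoderParameters()
cat.category = True
cat.minimum = 0     # smallest pandas category code
cat.maximum = 44    # largest pandas category code (~45 categories)
cat.activeBits = 3  # bits dedicated to each category, with no overlap between categories
cat_encoder = ScalarEncoder(cat)
cat_encoder.size    # expected: 45 categories * 3 bits = 135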

This worked terrifically, thanks. I wonder if my crash came from specifying sparsity=0.02, which probably doesn’t work out to a whole active bit at 5 bits (compared to 30). params.activeBits=X seems a lot cleaner when I think back to the HTM School example of encoders having a fixed maxBits & activeBits on for any given data input.
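
A quick back-of-the-envelope check of that theory (assuming activeBits is derived as roughly round(size * sparsity), which would line up with the CHECK that fired):

# implied active bits for sparsity = 0.02 at a few encoder sizes
# (Python's round() differs from C++ at exact halves, but the trend holds)
for size in (5, 10, 25, 30, 100):
    print(size, "->", round(size * 0.02))  # 0, 0, 0, 1, 2
# anything that rounds to 0 active bits trips CHECK FAILED: "args_.activeBits > 0u"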

Is there a computational reason we need 2 bits (size did indeed check out to 2 when I ran this) to encode a boolean feature? I would have guessed that size=1 could work, with the bit inactive = False and the bit active = True, but I’m surely missing something important.


Possibly not for your example.
But in general, yes, you need lots of bits for HTM stuff to work correctly. Reducing the number of bits to the absolute minimum might not yield good results.


For HTM, and possibly other network architectures, it’s best to give input vectors roughly the same activation energy (i.e. the same number of on bits) for all valid states. That way the network isn’t biased towards some inputs simply because they have more active bits than others. That’s why we don’t use the dense binary representations of chars, ints, and floats as inputs. The semantics are in the bit positions, which provides the opportunity for more robust representations of semantic overlap.
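
To make that concrete with the boolean encoder from earlier in the thread (a sketch; the exact bit positions are an implementation detail):

from htm.bindings.encoders import ScalarEncoder, ScalarEncoderParameters

p = ScalarEncoderParameters()
p.category = True
p.minimum = 0
p.maximum = 1
p.activeBits = 1
enc = ScalarEncoder(p)

# both states carry exactly one on bit, just in different positions,
# so neither input "shouts louder" than the other downstream
print(enc.encode(0).dense.sum())  # 1
print(enc.encode(1).dense.sum())  # 1
# a single-bit encoding could not do this: False would have zero on bits
# and contribute nothing to the Spatial Pooler's overlap scores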
