I have now put together a delta encoder that I think works as expected (or at least behaves reasonably in its context) and would appreciate some feedback and a bit of guidance before I move on to the next stage:

Setup:
n = 2048
sparsity = 0.01
w = 20
Range = 232 (min = 0, max = 232)
Buckets = 232 (i.e. buckets = range)

Q1: is this a reasonable encoding framework?

Q2: I’m still unsure how to use buckets, or whether the above is reasonable.
The data I’m basing my encoder on has a mean of about 4.9 and an SD of 6.6, so it’s heavily skewed towards zero (differences are coerced to be only positive). A graph looks like this:

Bit ranges:
With how it’s set up at the moment, I’m getting the following SDR breakdown:
Bits 0-20 represent the values 0-10 (I’ve bounded it at the top and bottom values), and each successive value is represented by moving one bit further along, so 11 is represented by bits 1-21, 12 by bits 2-22, and so on.
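For what it’s worth, here is a minimal sketch of the sliding mapping you describe. Note that bits 0-20 is actually a 21-bit window, which differs slightly from w = 20 in the setup; the sketch uses 21, and the lo/hi bounds are assumptions for illustration, not values from your post:

```python
def active_bits(value, lo=10, hi=222, w=21):
    """Indices of the active bits for a delta value, following the
    sliding scheme described above: values at or below lo share the
    first window; each unit above lo shifts the window one bit along.
    lo/hi bounds here are illustrative assumptions."""
    v = max(lo, min(hi, value))
    start = v - lo
    return list(range(start, start + w))

active_bits(10)  # → bits 0-20
active_bits(11)  # → bits 1-21
```

Values below the lower bound (e.g. 5) collapse onto the first window, which matches “I’ve bounded it at the top and bottom values”.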

Rob, I’m not sure you’re going about it the right way. A delta encoder should be very similar to a scalar encoder. Except you’re encoding the difference between two scalar values, not the values themselves. Here are some references:

The delta is what is being encoded: I take the delta (this value - last value) and coerce it to be positive (i.e. the absolute value, computed as the square root of the squared value). This is the value that is sent to the encoder. In delta.py it appears that the initial delta is set to 0, whereas I’m ignoring it completely. The graph shows the distribution of the deltas (x = delta values, y = count). As you noted, it has a very long tail which I need to account for in the encoding.
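In code terms, the delta step is just this (a sketch of what I described, with the first value skipped rather than set to 0 as delta.py does):

```python
def deltas(values):
    """Yield the magnitude of change between consecutive values,
    skipping the first value (there is no previous value to diff)."""
    prev = None
    for v in values:
        if prev is not None:
            yield abs(v - prev)  # same as sqrt((v - prev) ** 2)
        prev = v

list(deltas([5, 3, 8, 8]))  # → [2, 5, 0]
```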

I have read Encoding Data for HTM Systems multiple times but I guess at some point I got confused and applied sparsity to the encoder. Does it matter though if the encoding itself is in a sparse array?

In delta.py there’s a line of comment that reads:
“It returns an actual value when decoding and not a delta”. Does this mean that I need to somehow encode the value as well? The end goal of my implementation is to look for outliers in the pattern of changes themselves, regardless of the underlying values.

In Building HTM Systems in the Encoder section you state that:

Data must be encoded into sparse binary arrays for an HTM system to process it. These binary arrays define an input space for the Spatial Pooling algorithm

As I understand the encodings can vary in sparsity, as long as they still satisfy the requirements for encoders in Encoding Data for HTM:

Semantically similar data should result in SDRs with overlapping active bits.

The same input should always produce the same SDR as output.

The output should have the same dimensionality (total number of bits) for all inputs.

The output should have similar sparsity for all inputs and have enough one-bits to handle noise and subsampling.
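Those four requirements can be sanity-checked mechanically against any encoder. Here is a hedged sketch (not part of any HTM library; `toy_encode` is a made-up stand-in, not your encoder):

```python
def check_encoder(encode, inputs):
    """Check the four encoder properties listed above (a sketch)."""
    outs = [encode(x) for x in inputs]
    # Determinism: the same input always produces the same SDR
    assert all(encode(x) == o for x, o in zip(inputs, outs))
    # Fixed dimensionality: same total number of bits for all inputs
    assert len({len(o) for o in outs}) == 1
    # Similar sparsity: on-bit counts should barely vary
    on = [sum(o) for o in outs]
    assert max(on) - min(on) <= 1
    # Semantic similarity: adjacent inputs share active bits
    for a, b in zip(outs, outs[1:]):
        assert sum(x & y for x, y in zip(a, b)) > 0
    return True

# Toy sliding-window scalar encoder (n=64, w=8), for illustration only
def toy_encode(v, n=64, w=8):
    start = max(0, min(n - w, int(v)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

check_encoder(toy_encode, [0, 1, 2, 3, 4])  # passes: returns True
```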

Encoders always (in my experience) have many more 0s than 1s, though they are not as sparse as the Sparse Distributed Representations (SDRs) output from the Spatial Pooler and passed into the Temporal Memory, which are strictly limited to 2% activation.

It also seems like you could preprocess the raw data by differencing, so the new raw data would be the differences themselves, and you could just use a scalar encoder from there. Smart to write your own encoder regardless, though.
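That differencing-then-scalar-encode pipeline might look like this sketch (all names hypothetical; `toy_encode` stands in for whatever scalar encoder you would actually plug in):

```python
def preprocess_and_encode(raw, encode):
    """Difference the raw stream, coerce each delta to be positive,
    and feed it to a scalar encoder (hypothetical pipeline sketch)."""
    prev = None
    for v in raw:
        if v is None:          # skip NA values in the stream
            continue
        if prev is not None:
            yield encode(abs(v - prev))
        prev = v

# Toy scalar encoder standing in for the real one (n=32, w=4)
def toy_encode(delta, n=32, w=4):
    start = max(0, min(n - w, int(delta)))
    return [1 if start <= i < start + w else 0 for i in range(n)]

sdrs = list(preprocess_and_encode([10, None, 12, 9], toy_encode))
# Two deltas (|12 - 10| = 2 and |9 - 12| = 3) → two SDRs
```

The first raw value produces no delta, which matches ignoring the initial delta rather than setting it to 0.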

Thanks Sam - yes, I’m pre-processing as the data stream (of raw values) arrives, so only the non-NA deltas get passed to the encoding function. I guess that at some point I confused “encoders are sparse” with “encoders output an SDR” (per the Spatial Pooler).