Hi,

I’m not sure this is a good term for it. Another term could be bit pair addressing.

One can think of it as a simple function that:

Takes any SDR (a bit vector in which most bits are 0) of size N with P one bits, P << N, and transforms it into a much larger and sparser SDR of size N*(N-1)/2 with P*(P-1)/2 one bits.

The output space represents all possible bit pairings in the input N bit space, and the 1 bit positions in the output vector are formed by all possible pairings of bits of value 1 in the input space.

A short example: take an input space of N=10 bits with P=4 (four active bits), i.e. a 4/10 input:

Input in dense format: [0,1,0,1,1,0,0,0,1,0]

Input in sparse format: [1,3,4,8]

Output pairs: [(1,3), (1,4), (1,8), (3,4), (3,8), (4,8)]

We see there are 6 output pairs, which is 4*(4-1)/2.

The total number of possible pairs for N=10 is 10*(10-1)/2 = 45 possible pairs in the output space.
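To make the mapping concrete, here is a minimal sketch of the expansion in Python. The function name and the row-major pair-to-index mapping are my own choices for illustration, not taken from any particular package:

```python
from itertools import combinations

def bit_pair_expand(active, n):
    """Expand the active-bit indices of an n-bit SDR into the
    active-bit indices of its n*(n-1)/2-bit pair SDR.

    Pair (i, j) with i < j maps to the flat index
    i*n - i*(i+1)//2 + (j - i - 1), which enumerates the upper
    triangle of the pair matrix row by row.
    """
    return [i * n - i * (i + 1) // 2 + (j - i - 1)
            for i, j in combinations(sorted(active), 2)]

# The 4/10 example above: input [1, 3, 4, 8]
pairs = bit_pair_expand([1, 3, 4, 8], 10)
```

Running this on the example gives 6 active positions out of the 45 possible, matching the counts above.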

So the bit pair expansion transforms the 4/10 input into a 6/45 one, so sparsity decreases from 0.4 to about 0.13, while the storage space grows proportionally to N**2/2.

When we work with more… “meaningful” SDRs the expansion is significant, e.g. a 50/1000 bit SDR expands to a “huge” 1225/499500 bit SDR.
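The size arithmetic is easy to check directly; a quick sanity check (the helper name is mine, chosen for illustration):

```python
def expanded_shape(n, p):
    """Active-bit count and total size of the bit pair expansion
    of a p/n SDR: p*(p-1)/2 active bits out of n*(n-1)/2 total."""
    return p * (p - 1) // 2, n * (n - 1) // 2

print(expanded_shape(10, 4))     # (6, 45)
print(expanded_shape(1000, 50))  # (1225, 499500)
```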

Ok, I hope the above is clear enough; the obvious follow-up question is why such a redundant, wasteful encoding would be more “powerful” than the original one in any way.

If you try to feed that into a Spatial Pooler or Temporal Memory, it would almost certainly slow them down dramatically, assuming the available RAM is not exhausted first.

Well, let’s begin with the SDR Classifier, from the HTM core package.

When trained directly against MNIST images (represented as a 784 bit SDR), the classifier plateaus at almost 92% accuracy.

When trained on the bit pair encodings of the same images, the same classifier will plateau at over 97.8% accuracy.

Put differently, the failure rate drops almost four-fold, from 8% to 2.2%.

Just to be clear: bit pair expansion only creates an intermediate, very wide (~300 kbit) and very sparse hidden layer between input and classifier. There are no weights, no biases, no convolutions, no data augmentation, and no parameters to tune, so no time is wasted training a first layer of ~240M potential weights. A simple, fast expansion function alone drops the error rate from 8% to 2.2%.
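The pipeline is just: expand the input SDR, then train a single-layer classifier on the expanded indices. As an illustration of where the expansion plugs in, here is a minimal numpy stand-in for a single-layer classifier over active-bit indices. This is not htm.core’s actual SDRClassifier implementation, only a sketch with a similar learn/infer shape:

```python
import numpy as np

class SparseSoftmax:
    """Minimal stand-in for a single-layer SDR classifier: a softmax
    over one weight matrix, indexed only by the active bits. An
    illustration of where bit pair expansion plugs in, not htm.core's
    SDRClassifier."""

    def __init__(self, input_size, num_classes, lr=0.1):
        self.w = np.zeros((input_size, num_classes))
        self.lr = lr

    def infer(self, active):
        # Sum the weight rows of the active bits, then softmax.
        scores = self.w[active].sum(axis=0)
        e = np.exp(scores - scores.max())
        return e / e.sum()

    def learn(self, active, label):
        probs = self.infer(active)
        probs[label] -= 1.0           # softmax cross-entropy gradient
        self.w[active] -= self.lr * probs
```

Training on raw images means calling `learn` with the 784-bit SDR’s active indices; the bit-pair variant simply applies the expansion first, with everything downstream unchanged.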

I think this result alone is reason enough to motivate a few of you to investigate two key questions:

- Why is this happening? That is, what properties hidden in the input data are exposed, which a single-layer output classifier finds so useful?
- What other interesting SDR/ML machinery can be built on top of this function, bit pair expansion?