2D MNIST Encoder

Hi, I did some experiments involving the SDRClassifier and a few encoders:
the SpatialPooler, a FlyHash encoder, and a newer FH 2D encoder that combines the random-projection idea from fly hashing with spatial proximity.
This means two neighboring pixels in an image contribute similar projections in the encoder, while two distant pixels have relatively dissimilar (random) projections.

In order to get results comparable with other ML classification tasks, I modified the HTM MNIST example from an online style to a “batched” operation:

  • The first stage encodes the whole MNIST dataset (60k train + 10k test) into SDRs with the specified encoder and parameters.
  • The second runs the SDRClassifier for a variable number of epochs until the results converge.
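The second stage can be sketched in plain numpy. The softmax layer below is only my stand-in for the SDRClassifier (which is essentially a single-layer classifier trained with cross-entropy), not the actual htm.core implementation, and the function names are my own:

```python
import numpy as np

def train_on_sdrs(X_sdr, y, n_classes=10, epochs=30, lr=0.1, seed=0):
    """Multi-epoch SGD softmax training on pre-computed binary SDRs.
    A plain-numpy stand-in for the single-layer SDRClassifier."""
    rng = np.random.default_rng(seed)
    n, d = X_sdr.shape
    W = np.zeros((d, n_classes))
    for _ in range(epochs):
        for i in rng.permutation(n):          # shuffle each epoch
            logits = X_sdr[i] @ W
            p = np.exp(logits - logits.max()) # stable softmax
            p /= p.sum()
            p[y[i]] -= 1.0                    # gradient of cross-entropy
            W -= lr * np.outer(X_sdr[i], p)
    return W

def accuracy(W, X_sdr, y):
    """Fraction of SDRs whose highest logit matches the label."""
    return ((X_sdr @ W).argmax(axis=1) == y).mean()
```

Since the encodings are precomputed once, running 30 epochs of this stage is cheap compared to the encoding stage.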

The most important point was that classification results depend heavily on the amount of information available, mostly the number of 1 bits in the input SDRs.

In order to compare the three encoders (SP, FH, FH_2D), they were tuned to produce similar SDRs in terms of size and solidity (number of ON bits). Since I could not do much to control the SpatialPooler’s solidity, I simply parametrized the other two encoders to output SDRs at the average solidity reported by the SP.

The result: while the SpatialPooler slightly outperforms the Fly Hash encoder, the 2D encoder is way ahead.

Sample results:

77/1024-bit SDRs (77 ON bits), corresponding to (32,32) columnDimensions in SP, with potentialRadius: 7

Classifier epochs on pre-computed encodings: 30

FH Encoder: 94.03%
2 x SpatialPooler: 94.34%
FH_2D Encoder: 96.13%

2x SpatialPooler means I fed it the x_train dataset twice in learning mode; this produces slightly better results than a single pass.

611/6241-bit SDRs, corresponding to (79,79) columnDimensions and a potentialRadius of 11

FH Encoder: 97.13%
SpatialPooler: 97.24%
FH_2D Encoder: 97.86%

I will also post the script; currently it has dependencies on modules that are still changing/unstable.


If you are interested in the low end:

31/400-bit SDRs, corresponding to (20,20) columnDimensions

FH Encoder: 89.74%
2x SpatialPooler: 90.24%
FH_2D Encoder: 93.45%

For reference, 30 epochs of the SDRClassifier on raw B/W, unencoded 28x28 digit images reached 91.82%, with 120/784-bit SDRs.

Which means the 2D encoder was significantly more useful with significantly less data.

@cezar_t very interesting results with the new 2D encoder.

It might work well only with MNIST-style images though:
thick lines, simple figures, no grays.

Could you please explain how it works? Thanks

Ok, it takes a bit of explaining. Let’s say there are three stages; their names sound cryptic, but I’ll explain each one.
In reality the execution order is different; here is the… “semantic” order: 1. is the last stage, which encodes the whole dataset, 2. prepares the encoder, and 3. computes the “balancing weights”.

  1. Running a dense matrix projection of the image pixels.
    With a random matrix RM of shape e.g. 28x28x1000 floats, applied to any 28x28 image, it produces a 1000-bit “Fly Hash” SDR. The algorithm is simple: for every pixel in the input image, e.g. row 5, col 7, I multiply its value by RM[5,7] and add the result to a total buffer.
    That is simply a dot product between the image and RM.
    The resulting 1000-float vector I then multiply by a “balancing vector” (see 3. below), then select the highest P values, e.g. 50, for a resulting 50/1000 sparsity.
  2. But in order to make it sensitive to position in the 2D (28x28) input space, I first start with a 34x34x1000 random matrix and convolve every 7x7x1000 submatrix into a single 1000-float line of the final 28x28x1000 matrix.
    So instead of applying convolutions to every image, I convolve the encoder itself, then use it for the fly hash (random projection) encoding described above. This makes adjacent lines in the RM more “close” to each other. I haven’t tested much with different kernel sizes; 7x7 seemed quite lucky :stuck_out_tongue:
  3. The final preparation stage I call “VU-meter balancing”. It uses a subset of the dataset to figure out which values in the 1000-long dense vector need to be “stretched” or “compressed” in order to avoid having some bits in the SDR over-represented and others under-represented. So if total activations average 100 per SDR bit, and some bit activates 125 times while another activates 70 times, these weights are adjusted so the bits have more even chances (each activating approx. 100 times).
    This balancer is run only once, on 5000-10000 images, before the actual encoding. The number is picked so the sample is significant yet, the step being quite costly, doesn’t impact overall encoding time too much.
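The three stages above might look roughly like this in numpy. This is a sketch under my reading of the description: the normal-distributed random matrix, the single-pass balancer, and names like `fit_balancer` are my assumptions, not the actual implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

N = 1000        # output SDR size
K = 7           # convolution kernel size (7x7, as in the post)
H = W = 28      # MNIST image size
P = 50          # number of ON bits (50/1000 sparsity)

# Stage 2: "convolve the encoder itself".  Start from a
# (H+K-1, W+K-1, N) = 34x34xN random matrix and sum every KxK window
# into one line of the final 28x28xN matrix, so neighboring pixels
# end up with correlated (similar) projection rows.
raw = rng.standard_normal((H + K - 1, W + K - 1, N))
RM = np.zeros((H, W, N))
for r in range(H):
    for c in range(W):
        RM[r, c] = raw[r:r + K, c:c + K].sum(axis=(0, 1))

def encode(image, weights=None):
    """Stage 1: dot product of the image with RM, optional balancing,
    then keep the top-P values as the ON bits of the SDR."""
    dense = np.tensordot(image, RM, axes=([0, 1], [0, 1]))  # (N,) floats
    if weights is not None:
        dense = dense * weights
    sdr = np.zeros(N, dtype=np.uint8)
    sdr[np.argsort(dense)[-P:]] = 1
    return sdr

def fit_balancer(images):
    """Stage 3: "VU-meter" balancing on a calibration subset.
    Bits that win more often than average get down-weighted,
    rare bits get boosted, so activations even out."""
    counts = np.zeros(N)
    for img in images:
        counts += encode(img)          # unbalanced first pass
    target = counts.mean()             # average activations per bit
    return target / np.maximum(counts, 1)
```

Usage follows the "semantic" order in reverse: build RM once, fit the balancer on a few thousand images, then `encode(img, weights)` for the whole dataset.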

I hope it makes sense.

@cezar_t thanks for your explanation. That looks like Numenta’s patch encoder, where the matrix RM plays the role of a receptive field.