Lossy MNIST encoder

I found a way to make a spatial-pooler-like SDR encoder which is reversible, and by reversible I mean the encoder and decoder share the same weight matrix: you can just transpose it to get the decoder.

This encoder is trained in a way similar to the spatial pooler but has scalar weights instead of integers. It's loosely inspired by Hinton's greedy learning algorithm, and it finds features rather quickly.

The algorithm goes as follows (a rough NumPy sketch follows the list):

  • initialize the weights randomly in a small range like (-0.01, 0.01)
  • for each training example:
    • run the image forward through the matrix and apply k-WTA (with boosting) at the output to get an SDR
    • run the SDR backwards through the matrix to get a reconstruction estimate of the image.
    • set reconstructed pixels with negative values to zero.
    • subtract the original image from the reconstruction to get a difference
    • subtract from the weights the outer product of the difference and the resulting SDR, times a learning rate
    • rinse and repeat
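A minimal NumPy sketch of that loop, assuming flattened MNIST images scaled to [0, 1]. The post doesn't spell out the exact boosting rule, so the exponential duty-cycle boost here (and names like `n_features`, `boost_strength`) are illustrative, not the author's exact code:

```python
import numpy as np

def train_encoder(images, n_features=1000, k=30, lr=0.01, boost_strength=0.1, seed=0):
    """images: array of shape (n_samples, n_pixels), values in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_pixels = images.shape[1]
    # a single weight matrix shared by encoder and decoder (decode = transpose)
    W = rng.uniform(-0.01, 0.01, size=(n_pixels, n_features))
    target = k / n_features                 # desired activation frequency
    duty = np.full(n_features, target)      # running duty cycle per feature

    for x in images:
        # forward: overlap scores, boosted so under-used features can still win
        scores = x @ W
        boosted = scores * np.exp(boost_strength * (target - duty) / target)
        winners = np.argpartition(boosted, -k)[-k:]   # k-WTA
        sdr = np.zeros(n_features)
        sdr[winners] = 1.0
        duty = 0.99 * duty + 0.01 * sdr               # track how often each feature fires

        # backward: reconstruct the image through the transposed weights,
        # clamping negative pixels to zero
        recon = np.maximum(W @ sdr, 0.0)

        # update: subtract lr * outer(reconstruction - image, sdr) from the weights
        diff = recon - x
        W -= lr * np.outer(diff, sdr)
    return W
```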

results:

top = original, middle = SDR, bottom = reconstructed.
[result images]

some features learned:
[feature images]

6 Likes

That’s a nice little auto-encoder. It looks like the main differences between your approach and a more traditional auto-encoder are that yours is both shallow and wide. In other words, you are using only one hidden layer, and that layer encodes the latent variables/features. By enforcing k-WTA, you are essentially forcing each image through a bottleneck (similar to traditional auto-encoders), but on effectively different sub-networks.

Are the latent variables scalar values? Or do you force them to 0/1? Are the features at the bottom the result of passing a single latent variable (SDR size of 1) back through the decoder? If that’s the case, then it looks like your network has learned something akin to an orthogonal basis set (eigen-vectors) or perhaps a set of convolution filters.

2 Likes

I force the values to 0/1, but all features are updated simultaneously so they do affect each other in ways I don't completely understand.
I need to include boosting, otherwise the features get stuck and all images end up producing the same SDR.

well, about that, it still kinda works if I make it shallow and narrow.

[result images]

It seems to be learning some kind of on/off Gabor filters/wavelets.
[filter images]

2 Likes

oh, sorry, I misinterpreted what you meant.
yea, those are the weights from a single bit; it's basically doing a reconstruction from a single bit indeed.
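In code terms, using the same pixels-by-features `W` layout as the sketch above (illustrative, not the author's exact code), a single-bit reconstruction is just one clamped column of the weight matrix:

```python
import numpy as np

def feature_image(W, i, shape=(28, 28)):
    """Decode an SDR with only bit i set: equivalent to column i of W."""
    sdr = np.zeros(W.shape[1])
    sdr[i] = 1.0
    recon = np.maximum(W @ sdr, 0.0)   # same clamping as the normal decode path
    return recon.reshape(shape)        # == np.maximum(W[:, i], 0).reshape(shape)
```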

1 Like

I guess that would mean that you have more data being encoded in the weight matrix than you need. The words that I would use to describe it are that the columns (or rows) of your weight matrix are encoding eigenvectors, and your latent variables are the eigenvalues. Once you have a set of eigenvectors that span all (or nearly all) of the space of possible MNIST digits, then the remaining weights are redundant. That’s not to say that additional weights are useless. Often, the redundant vectors will match different images preferentially and allow you to encode them with smaller SDRs (fewer on-bits).

Since you are forcing your latent variables to 0/1, it would seem that the filters are adjusting to allow you to stack them in such a way as to obtain brighter pixels in regions where they overlap positive weights and darker pixels where they overlap a mix of signed weights, thus allowing them to cancel out.

1 Like

yea, that's what I think is happening, but the particular shapes they choose are always a little funny and seem to have some gravitating effect where they force others to be similar instead of different. If I could get rid of that, I think the encoder would have better performance.

1 Like

How can you guarantee the sparsity?
How fast is your algorithm?

1 Like

Sparsity is enforced by k-WTA with boosting, just like the spatial pooler, and it runs at around 2000 iterations per second for 1000/30 sparsity on my machine.
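For reference, a sketch of how that k-WTA-plus-boosting selection might look (the boost formula and decay constants are assumptions, since the thread doesn't give them; 1000/30 is read here as 1000 features with 30 winners):

```python
import numpy as np

def k_wta_boosted(scores, duty, k=30, boost_strength=0.1, decay=0.99):
    """Pick the k highest boosted scores; boost under-used features so none get stuck."""
    target = k / scores.size
    boosted = scores * np.exp(boost_strength * (target - duty) / target)
    winners = np.argpartition(boosted, -k)[-k:]
    sdr = np.zeros_like(scores)
    sdr[winners] = 1.0
    new_duty = decay * duty + (1.0 - decay) * sdr   # running activation frequency
    return sdr, new_duty
```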

1 Like

@JarvisGoBrr thanks.
I am interested in how your encoder could be realized with HTM neurons or the SP instead of using floating-point weights.

1 Like

@thanh-binh.to starting with step 2:

  • run the SDR backwards through the matrix to get a reconstruction estimate of the image.

here's a matrix multiplication. HTM works with binary synapses attached to dendrites, attached to cells grouped in columns, so you would have a bit of a problem figuring that step out, and it is much worse than picking “floats” vs “ints” vs “bits”.

1 Like

Perhaps you could create a separate weight matrix for each digit, or more generally use something akin to a gated network for the decoder.

  • Use the filters you have now to encode the image and generate a latent variable SDR.
  • Use this SDR as input to a linear classifier.
  • Based on the results of the classifier, pass the latent SDR into a secondary weight matrix which is trained specifically to decode the latent variable SDR back into an image.

You should be able to do this with an arbitrary number of categories in the classifier, but you should have an equal number of decoder matrices that could be switched on based on the detected class.
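A rough sketch of that gated-decoder idea (the `classify` function and the per-class decoder matrices are placeholders, not anything from the thread; the SDR is assumed to come from the k-WTA encoder above):

```python
import numpy as np

def gated_decode(sdr, classify, decoders):
    """Route a latent SDR to a class-specific decoder matrix.

    classify: function mapping an SDR to a class index
              (e.g. a linear classifier's argmax).
    decoders: one weight matrix per class, each trained to reconstruct
              images of that class from the latent SDR.
    """
    cls = classify(sdr)
    recon = decoders[cls] @ sdr      # decode with the matrix gated in by the classifier
    return np.maximum(recon, 0.0)    # same negative-pixel clamp as the shared decoder
```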