Encoding high dimensional vector data/ word embeddings

You have best chances with an encoding that preserves similarity. I would try fly hash encoder, which roughly is a simple random projection from a (relatively) low dimension space of floats to a (relatively) higher dimensional space of sparse bits (== SDRs)

If you have no idea what I’m talking about here-s an article to begin with. I found this image quite relevant.

My attempt at implementing a simple one was quite satisfying on MNIST, if you want to try it on your data too, I can help explaining what it actually does, if the source seems too cryptic.

1 Like