Encoding high-dimensional vector data / word embeddings

cezar_t · April 17, 2022, 12:53pm

You have the best chance with an encoding that preserves similarity. I would try a fly hash encoder, which is roughly a simple random projection from a (relatively) low-dimensional space of floats to a (relatively) higher-dimensional space of sparse bits (== SDRs).
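
The encoder described above can be sketched in a few lines. This is a minimal illustration in the spirit of FlyHash, not the implementation mentioned in this post. Each output bit is wired to a few randomly chosen input dimensions (a sparse random projection into a larger space), and a winner-take-all step keeps only the strongest outputs so the result is a sparse binary SDR. The class name `FlyHashEncoder`, the helper `overlap`, and the parameter values (`output_dim=2048`, `fan_in=6`, `active_bits=40`) are hypothetical choices for illustration; tune them for your embeddings.

```python
import random
from typing import List, Sequence, Set


class FlyHashEncoder:
    """Sketch of a FlyHash-style encoder: dense float vector -> sparse SDR.

    Assumptions (illustrative, not from the original post): each output bit
    sums `fan_in` randomly sampled input dimensions, and a winner-take-all
    step keeps the `active_bits` largest sums as the active bits.
    """

    def __init__(self, input_dim: int, output_dim: int = 2048,
                 fan_in: int = 6, active_bits: int = 40, seed: int = 0):
        if fan_in > input_dim:
            raise ValueError("fan_in must not exceed input_dim")
        if active_bits > output_dim:
            raise ValueError("active_bits must not exceed output_dim")
        rng = random.Random(seed)
        self.input_dim = input_dim
        self.output_dim = output_dim
        self.active_bits = active_bits
        # Sparse random projection: each output unit reads a small random
        # subset of the input dimensions.
        self.connections: List[List[int]] = [
            rng.sample(range(input_dim), fan_in) for _ in range(output_dim)
        ]

    def encode(self, vector: Sequence[float]) -> Set[int]:
        """Return the indices of the active bits of the SDR."""
        if len(vector) != self.input_dim:
            raise ValueError("vector has the wrong dimension")
        # Project into the larger space.
        activations = [sum(vector[i] for i in conn) for conn in self.connections]
        # Winner-take-all: keep the top `active_bits` outputs
        # (ties broken by index so the result is deterministic).
        ranked = sorted(range(self.output_dim),
                        key=lambda j: (-activations[j], j))
        return set(ranked[: self.active_bits])


def overlap(a: Set[int], b: Set[int]) -> int:
    """Number of shared active bits, the usual SDR similarity measure."""
    return len(a & b)


# Usage: a vector, a slightly perturbed copy, and an unrelated vector.
rng = random.Random(1)
dim = 50
base = [rng.gauss(0.0, 1.0) for _ in range(dim)]
near = [x + rng.gauss(0.0, 0.1) for x in base]
far = [rng.gauss(0.0, 1.0) for _ in range(dim)]

encoder = FlyHashEncoder(input_dim=dim)
sdr_base = encoder.encode(base)
print("near overlap:", overlap(sdr_base, encoder.encode(near)))
print("far overlap:", overlap(sdr_base, encoder.encode(far)))
```

Similar input vectors should share many active bits while unrelated ones share few, which is the similarity-preserving property that makes this a reasonable fit for word embeddings. Sampling a few inputs per output (rather than using a dense Gaussian matrix) keeps the projection cheap, and the winner-take-all step fixes the number of active bits regardless of the input's magnitude.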

If you have no idea what I'm talking about, here's an article to begin with. I found this image quite relevant.

My attempt at implementing a simple one gave quite satisfying results on MNIST. If you want to try it on your data too, I can help explain what it actually does, in case the source seems too cryptic.

Related topics

Topic | Category / tags | Replies | Views | Last activity
SDR questions for image encoding (newbie) | Engineering; encoders, question | 5 | 2410 | December 15, 2016
How can I encode data with large number of categories? | HTM.Java; encoders, category-encoding | 8 | 912 | January 4, 2020
Would HTM be good for anomaly detection in a sensor network? | Getting Started; anomaly-detection, question | 4 | 1117 | February 19, 2020
I want advice on Using HTM for Anomaly Detection in Streaming Data | Machine Learning | 0 | 40 | September 26, 2024
Anamoly detection with HTM | NuPIC | 2 | 819 | January 22, 2018