Conditional locality sensitive hashing associative memory

I wrote this note about an idea I had:

After reading this Memory Mosaics paper on arXiv where they replace neural networks in transformers with associative memory:


Thanks for the paper reference, it is quite interesting.

Regarding your note on using a hyperplane to split data points into two equal-sized subsets, I ran a small experiment with a slightly different purpose: to choose P out of N routes (or “clusters”), e.g. for a MoE with P=4/N=32 classifiers.

The problem there (as in your case) is finding a way for each plane to cut through the data space at the desired ratio (1/2, 1/8, etc.).
The planes need not be entirely random: the algorithm can start with planes that pass through the centroid of a sufficiently large number of points, then shift them “up” or “down” iteratively until each one gives the desired split ratio.
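
To make the idea concrete, here is a rough sketch of one way it could look in Python with NumPy. It is not the code from my experiment: the function name `fit_split_plane`, its parameters, and the choice of bisection for the “shift up or down” step are all assumptions made for illustration. The plane gets a random direction, starts at the centroid, and its offset is then moved along the normal until the fraction of points on the positive side is close to the target ratio.

```python
import numpy as np

def fit_split_plane(points, ratio=0.5, tol=0.01, max_iter=50, rng=None):
    """Return (normal, offset) so that roughly `ratio` of the points
    satisfy points @ normal > offset. Hypothetical helper, for illustration."""
    rng = np.random.default_rng(rng)

    # Random orientation, normalised to unit length.
    normal = rng.standard_normal(points.shape[1])
    normal /= np.linalg.norm(normal)

    # Project every point onto the normal; the split is now a 1-D problem.
    proj = points @ normal

    # Start with the plane passing through the centroid of the points.
    offset = proj.mean()
    lo, hi = proj.min(), proj.max()

    for _ in range(max_iter):
        frac = np.mean(proj > offset)  # fraction on the positive side
        if abs(frac - ratio) <= tol:
            break
        if frac > ratio:
            lo = offset  # too many points above: shift the plane "up"
        else:
            hi = offset  # too few points above: shift the plane "down"
        offset = 0.5 * (lo + hi)

    return normal, offset


# Usage: a plane that keeps about 1/8 of the points on its positive side.
data = np.random.default_rng(0).standard_normal((10_000, 16)) * 3.0 + 5.0
normal, offset = fit_split_plane(data, ratio=1 / 8, rng=1)
frac_above = float(np.mean(data @ normal > offset))
```

Since only the offset moves, the whole search happens on the 1-D projections, so one could also take the matching quantile of `proj` directly instead of iterating. The iterative version is shown because it mirrors the shifting described above.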
