I wrote this note about an idea I had:
https://archive.org/details/conditional-lsh-am
after reading this Memory Mosaics paper on arXiv, in which the authors replace the neural networks in transformers with associative memory:
Thanks for the paper reference, it is quite interesting.
Regarding your note on using a hyperplane to split data points into two equal-sized subsets, I ran a small experiment with a slightly different purpose: choosing P out of N routes (or “clusters”), e.g. for a MoE with P=4, N=32 classifiers.
The problem there (as in your case) is to make each plane cut through the data space at the desired ratio, whether that is 1/2, 1/8, etc.
The algorithm can start with planes that are not entirely random but pass through the centroid of a sufficiently large sample of points, then shift them “up” or “down” iteratively until each one produces the desired split ratio.
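A minimal sketch of the centroid-then-shift idea in plain Python. It makes a few assumptions that the description above does not spell out: the plane normal is drawn from a Gaussian, the "up"/"down" shifts are done by bisecting the plane's offset, and a point "takes" route j when it lies on the positive side of plane j. The names `calibrate_plane` and `route` are hypothetical, chosen for illustration only.

```python
import random

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def calibrate_plane(points, target_ratio, iters=40, rng=None):
    """Random hyperplane, started through the centroid of `points`, whose
    offset is bisected until about `target_ratio` of the points satisfy
    dot(normal, x) > offset. (Bisection is an assumed shifting rule.)"""
    rng = rng or random.Random()
    dim = len(points[0])
    normal = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    centroid = [sum(p[i] for p in points) / len(points) for i in range(dim)]
    proj = [dot(normal, p) for p in points]
    lo, hi = min(proj), max(proj)
    offset = dot(normal, centroid)  # start: plane passes through the centroid
    for _ in range(iters):
        frac = sum(s > offset for s in proj) / len(proj)
        if frac > target_ratio:
            lo = offset  # too many points above: shift the plane "up"
        else:
            hi = offset  # too few points above: shift the plane "down"
        offset = (lo + hi) / 2
    return normal, offset

def route(x, planes):
    """Hypothetical router: indices of the planes with x on their positive side."""
    return [j for j, (n, b) in enumerate(planes) if dot(n, x) > b]

if __name__ == "__main__":
    rng = random.Random(0)
    sample = [[rng.gauss(0.0, 1.0) for _ in range(16)] for _ in range(1000)]
    N, P = 32, 4
    # Each plane is calibrated to cut off P/N = 1/8 of the sample.
    planes = [calibrate_plane(sample, P / N, rng=rng) for _ in range(N)]
    avg = sum(len(route(x, planes)) for x in sample) / len(sample)
    print(f"average routes per point: {avg:.2f}")
```

Under this reading, calibrating every plane to a ratio of P/N gives each point about P routes on average over the sample, not exactly P per point; a strict top-P choice would need an extra selection step.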