OpenAI Paper Review: GPU Kernels for Block-Sparse Weights

Using some form of locality-sensitive hashing (LSH) to switch groups of parameters in and out, in various possible ways, could be a cheap way of boosting performance.
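To make the idea concrete, here is a minimal sketch of LSH parameter switching, not anything taken from the paper. It assumes a random-hyperplane (sign) hash of the input picks one of several parameter groups, and that each group is just a vector of per-element weights. The names make_hyperplanes, lsh_bucket, switched_layer and weight_banks are all made up for illustration.

```python
import random

def make_hyperplanes(n_bits, dim, seed=0):
    """Random Gaussian hyperplanes; each one contributes one hash bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]

def lsh_bucket(x, hyperplanes):
    """Sign-of-projection hash: nearby inputs tend to share a bucket."""
    bucket = 0
    for plane in hyperplanes:
        dot = sum(p * v for p, v in zip(plane, x))
        bucket = (bucket << 1) | (1 if dot >= 0.0 else 0)
    return bucket

def switched_layer(x, hyperplanes, weight_banks):
    """Apply the parameter group selected by the hash of the input.

    Assumption: a parameter group is a per-element weight vector.
    Any other kind of parameter set could be switched the same way.
    """
    weights = weight_banks[lsh_bucket(x, hyperplanes)]
    return [w * v for w, v in zip(weights, x)]

# Example setup: 2 hash bits select among 4 parameter groups.
dim, n_bits = 4, 2
planes = make_hyperplanes(n_bits, dim, seed=1)
rng = random.Random(2)
banks = [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(2 ** n_bits)]
y = switched_layer([0.5, -1.0, 2.0, 0.25], planes, banks)
```

Only the selected group is touched for a given input, so the total parameter count grows with the number of buckets while the per-input cost stays about the same, apart from the hash itself.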

There is even a question of whether you need explicit matrix operations at all, apart from the synthetic ones that are carried out implicitly by fast transform algorithms.
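As an illustration of a matrix operation carried out by a fast transform, here is a sketch that uses the fast Walsh-Hadamard transform as the example. The choice of transform is mine, since none is named above. fwht multiplies a vector by the n-by-n Hadamard matrix in about n*log2(n) additions and subtractions, without ever forming the matrix. hadamard_matvec does the same product the explicit O(n^2) way, purely for comparison.

```python
def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized, natural order).

    Uses n*log2(n) additions/subtractions; len(x) must be a power of two.
    """
    a = list(x)
    n = len(a)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v  # butterfly step
        h *= 2
    return a

def hadamard_matvec(x):
    """Same product done explicitly in O(n^2) operations.

    Entry (i, j) of the Hadamard matrix is +1 or -1 depending on the
    parity of the number of set bits in i & j.
    """
    n = len(x)
    return [
        sum((-1 if bin(i & j).count("1") % 2 else 1) * x[j] for j in range(n))
        for i in range(n)
    ]

v = [1, 0, 1, 0, 0, 1, 1, 0]
fast = fwht(v)
slow = hadamard_matvec(v)
```

Both routines give the same vector. Applying fwht twice returns the input scaled by n, so the same routine serves as its own inverse up to that factor.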

It would be amusing and not amusing at the same time if there are currently clusters of hundreds of GPUs burning serious electrical power to run a bubble-sort-level O(n^2) algorithm where an O(n log n) algorithm might do instead.

Anyway, the paper is a reminder to try LSH parameter switching with fixed filter bank neural networks and see what happens. It would be unfortunate if it interfered with the evolutionary training algorithm I use; I'm not sure what will happen.