SLIDE : In Defense of Smart Algorithms over Hardware Acceleration for Large-Scale Deep Learning Systems

This might be interesting. It uses a form of “sparsity” to reduce the computational cost of deep learning.

More info on LSH in:

Can this be used for a better encoder?


I think they did use a 44-core CPU, which I would consider in the same bracket as a GPU. Though they didn’t use the CPU’s full SIMD (single instruction, multiple data) instruction set. Or maybe they did unknowingly, as their compiler may have autovectorized their code. Even the Java just-in-time HotSpot compiler will autovectorize these days.

Anyway, I am a big fan of random projection / locality-sensitive hashing.
You can just compute HDx, or HDHDx, etc., where H is a matrix multiply that in practice is replaced by the fast Walsh-Hadamard transform and D is a random diagonal matrix with +1, -1 entries. That gives a random projection by sequency (similar to frequency) scrambling.
If you binarize the output of HDx, you have a fast locality-sensitive hash.
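For the curious, here is a minimal JS sketch of that construction, assuming the input length is a power of two. The helper names (whtInPlace, randomSigns, lshHash) are mine, not from any library:

```javascript
// In-place fast Walsh-Hadamard transform: the "H" step, O(n log n).
function whtInPlace(x) {
  const n = x.length;
  for (let h = 1; h < n; h *= 2) {
    for (let i = 0; i < n; i += 2 * h) {
      for (let j = i; j < i + h; j++) {
        const a = x[j], b = x[j + h];
        x[j] = a + b;
        x[j + h] = a - b;
      }
    }
  }
}

// The "D" step: fixed random +1/-1 signs, chosen once and reused.
function randomSigns(n) {
  const d = new Float32Array(n);
  for (let i = 0; i < n; i++) d[i] = Math.random() < 0.5 ? 1 : -1;
  return d;
}

// HDx followed by binarization: sign-flip, transform, keep only the signs.
function lshHash(input, signs) {
  const x = Float32Array.from(input, (v, i) => v * signs[i]);
  whtInPlace(x);
  return x.map(v => (v >= 0 ? 1 : -1)); // the hash bits, as +1/-1
}
```

The whole thing costs O(n log n) per hash instead of the O(n*n) a dense random projection matrix would need, which is the entire appeal.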

Mild hashing (e.g. HD) followed by a nonlinearity (e.g. binarization) allows a weighted sum to act as a general associative memory.
https://ai462qqq.blogspot.com/2019/11/artificial-neural-networks.html
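As a toy illustration of that claim, here is one way a weighted sum can store and recall scalar values behind such a hash, reusing randomSigns and lshHash from the sketch above. The error-correcting write rule is my guess at the simplest workable scheme, not lifted from the linked post:

```javascript
// Toy associative memory: hash the key, binarize, read/write a weighted sum.
class AssociativeMemory {
  constructor(n) {
    this.n = n;                   // vector length, a power of two
    this.signs = randomSigns(n);  // the fixed D matrix
    this.w = new Float32Array(n); // weights of the weighted sum
  }
  recall(key) {
    const h = lshHash(key, this.signs);
    let sum = 0;
    for (let i = 0; i < this.n; i++) sum += this.w[i] * h[i];
    return sum / this.n; // weighted sum over the hash bits
  }
  store(key, value) {
    const h = lshHash(key, this.signs);
    const err = value - this.recall(key);
    // dot(h, h) = n for a +1/-1 hash, so this single write makes
    // recall(key) return value exactly.
    for (let i = 0; i < this.n; i++) this.w[i] += err * h[i];
  }
}
```

Distinct keys hash to nearly orthogonal +1/-1 patterns, so stored items interfere with each other only weakly.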


I did a JS version of a Fixed Filter Bank neural network. You keep the neural network weights fixed (by using a fast transform for them) and adjust the non-linear activation functions instead:
View: https://editor.p5js.org/siobhan.491/present/Bgk9KvmMn
Code: https://editor.p5js.org/siobhan.491/sketches/Bgk9KvmMn
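For anyone who doesn’t want to read through the sketch, here is roughly what one such layer looks like, reusing whtInPlace from the earlier post. The two-slope per-element activation is an assumption about the parameterization, so treat this as a sketch rather than the linked code:

```javascript
// One fixed-filter-bank layer: the fast Walsh-Hadamard transform supplies
// the fixed "weights"; only the per-element activation slopes are trained.
function ffbLayer(x, posSlope, negSlope) {
  const y = Float32Array.from(x);
  whtInPlace(y); // fixed mixing step; nothing here is learned
  // Per-element two-sided ReLU with separate learned slopes per sign.
  for (let i = 0; i < y.length; i++) {
    y[i] *= y[i] >= 0 ? posSlope[i] : negSlope[i];
  }
  return y;
}
```

Stacking such layers alternates fixed O(n log n) transforms with learned activations, so all the trainable parameters live in the slope arrays.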

All the hardware and libraries neural network researchers use these days are tuned and specialized for conventional O(n*n) nets. And all their high-level work, like GANs, ResNets, etc., is based on the characteristics of that sort of net.
They are 90% locked in already. It would be rather an upheaval to have to go down into the basement, replace the broken light bulb, look around and tidy up the mess.

Here’s an updated paper on SLIDE; they report further improvements and provide supporting code on GitHub.

Since the algorithm speculatively selects a sparse set of active neurons for each layer, I wonder if similar optimizations could be applied to HTM.
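To make that concrete, here is a rough single-hash-table sketch of the speculation step in JS, reusing lshHash and randomSigns from earlier in the thread. SLIDE itself uses multiple hash tables and more refined LSH families, so this is only an illustration:

```javascript
// Reduce a hash to a small bucket key using its first `bits` signs.
function bucketKey(vec, signs, bits) {
  const h = lshHash(vec, signs);
  let key = 0;
  for (let i = 0; i < bits; i++) key = (key << 1) | (h[i] > 0 ? 1 : 0);
  return key;
}

// Build once per layer: bucket key -> indices of neurons hashed there.
function buildTable(weights, signs, bits) {
  const table = new Map();
  weights.forEach((w, idx) => {
    const key = bucketKey(w, signs, bits);
    if (!table.has(key)) table.set(key, []);
    table.get(key).push(idx);
  });
  return table;
}

// Forward pass that only computes the neurons colliding with the input.
function sparseForward(input, weights, table, signs, bits) {
  const active = table.get(bucketKey(input, signs, bits)) || [];
  const out = new Map();
  for (const idx of active) {
    let dot = 0;
    for (let i = 0; i < input.length; i++) dot += weights[idx][i] * input[i];
    out.set(idx, Math.max(0, dot)); // ReLU on the few selected neurons
  }
  return out; // neurons in other buckets are treated as outputting zero
}
```

Since sign-random-projection hashes collide more often for vectors with high angular similarity, the neurons picked this way are the ones most likely to have large activations.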

If such techniques could scale across networked machines instead of only CPU cores, lots of possibilities would open up.
