How Can We Be So Dense? The Benefits of Using Highly Sparse Representations

Hi @beginner

In DropConnect, at each training step a random subset of weights is set to zero. During inference all weights are used, so it is still a dense model: every connection has a weight attributed to it. A common interpretation of dropout-style techniques (though not the only one) is that you are learning several different models within a single network, i.e. an ensemble of smaller networks that share parameters.
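To make the distinction concrete, here is a minimal sketch of a DropConnect-style linear layer in plain NumPy (names like `dropconnect_forward` and the inference-time rescaling are my own simplification, not the exact procedure from the original DropConnect paper):

```python
import numpy as np

rng = np.random.default_rng(0)

def dropconnect_forward(x, W, drop_prob=0.5, training=True):
    """Linear layer with DropConnect applied to the weights.

    During training each weight is independently zeroed with
    probability `drop_prob`; at inference the full, dense weight
    matrix is used (scaled here to match the training-time
    expectation, a common approximation).
    """
    if training:
        mask = rng.random(W.shape) >= drop_prob  # keep ~(1 - drop_prob) of weights
        return x @ (W * mask)
    # inference: every connection is active, so the model stays dense
    return x @ (W * (1.0 - drop_prob))

# toy usage: 4 inputs -> 3 units
x = rng.standard_normal((1, 4))
W = rng.standard_normal((4, 3))
y_train = dropconnect_forward(x, W, training=True)
y_infer = dropconnect_forward(x, W, training=False)
```

The key point is in the `else` branch: at inference time no weights are removed, which is why DropConnect does not give you a sparse model.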

In the paper you cited, the weights are sparse both at initialization and at inference. But most importantly, what leads to robustness is not sparse weights alone, but the combination of sparse weights and sparse activations (k-winners with boosting).
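For intuition, here is a small sketch of the sparse-activation half, a k-winners step that keeps only the k largest activations per sample (the boosting that rotates which units win over time is omitted; this is just an illustration, not the paper's exact implementation):

```python
import numpy as np

def k_winners(x, k):
    """Keep the k largest activations in each row and zero the rest."""
    out = np.zeros_like(x)
    # indices of the top-k entries in each row
    top_idx = np.argpartition(x, -k, axis=1)[:, -k:]
    rows = np.arange(x.shape[0])[:, None]
    out[rows, top_idx] = x[rows, top_idx]
    return out

# toy usage: 2 samples, 6 units, keep 2 winners each
acts = np.array([[0.1, 0.9, 0.3, 0.7, 0.2, 0.0],
                 [0.5, 0.4, 0.8, 0.1, 0.6, 0.2]])
print(k_winners(acts, k=2))
```

Combined with sparse weight matrices, this kind of activation sparsity is what the paper argues gives the representations their robustness to noise.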
