Sparse Networks from Scratch: Faster Training without Losing Performance


I like the overview of sparse neural networks here:

And this video that is trying to tell you something:

Anyway, a single neuron in a ReLU layer connects forward to n weights in the next layer. In the positive activation state (x>=0), ReLU is f(x)=x, and the vector pattern defined by those n forward weights is projected onto the next layer with intensity x. In the negative activation state (x<0), ReLU is f(x)=0, and nothing is projected onto the next layer.
That doesn’t seem great to me. Perhaps you should keep the behavior in the positive activation state and change the behavior in the negative one: give each neuron an alternative set of forward-connected weights into the next layer, and project that vector pattern onto the next layer with intensity x when x is negative. The fact that x is negative doesn’t really matter. The activation function is then always f(x)=x, with the caveat that the neuron switches between 2 sets of forward-connected weights in the next layer depending on whether x>=0 or not.
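
To make the idea concrete, here is a minimal sketch in plain Python comparing the two behaviors. The function names (relu_project, switched_project) and the weight tables (w_pos, w_neg) are made up for illustration, with row j holding the n forward weights of neuron j. It only shows the forward projection, not training.

```python
def relu_project(x, w_pos):
    """Standard ReLU: neuron j projects its weight row with intensity max(x[j], 0)."""
    n_out = len(w_pos[0])
    out = [0.0] * n_out
    for xj, row in zip(x, w_pos):
        a = xj if xj >= 0 else 0.0  # negative state projects nothing
        for k in range(n_out):
            out[k] += a * row[k]
    return out


def switched_project(x, w_pos, w_neg):
    """Proposed variant: f(x) = x always, but the sign of x[j] selects
    which of two weight rows neuron j projects onto the next layer."""
    n_out = len(w_pos[0])
    out = [0.0] * n_out
    for xj, row_pos, row_neg in zip(x, w_pos, w_neg):
        row = row_pos if xj >= 0 else row_neg  # switch weight set on sign
        for k in range(n_out):
            out[k] += xj * row[k]  # intensity is x itself, even when negative
    return out


# Hypothetical values: two neurons feeding two units in the next layer.
x = [2.0, -1.0]
w_pos = [[1.0, 0.0], [0.0, 1.0]]
w_neg = [[0.5, 0.5], [3.0, -2.0]]

print(relu_project(x, w_pos))             # second neuron contributes nothing
print(switched_project(x, w_pos, w_neg))  # second neuron projects its w_neg row
```

If every row of w_neg is zero, switched_project gives the same result as relu_project, so ordinary ReLU is a special case of the switched version.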