Some activation functions in neural networks directly cause information loss.

ReLU zeroes all negative inputs and so discards roughly 50% of the information; a hard threshold function is even worse, reducing each activation to a single bit. Even invertible activation functions, when used in conjunction with the weighted sum, cause information loss due to magnitude mismatches.
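A quick way to see the ReLU claim is to push zero-mean random inputs through it and count how many activations get zeroed (a minimal sketch; the variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(10_000)  # zero-mean inputs

relu = np.maximum(x, 0.0)
frac_zeroed = np.mean(relu == 0.0)
print(f"fraction of activations zeroed by ReLU: {frac_zeroed:.2f}")  # ~0.50
```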

The result is that input information is lost too early, before it can be used, and output information cannot be composed over a number of layers because it is chopped up along the way.

One solution is ResNet, where you decide in advance how much information routing there is and where it goes.

If you use a zero-switched slope activation function, f(x) = a·x for x >= 0 and f(x) = b·x for x < 0, the system has the option to set a = b = 1, which lets information through unscathed. The system can then decide the information routing itself and act like a ResNet to whatever extent is necessary.
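The activation is simple to sketch (the function name is mine). With a = b = 1 it is the identity, and with a = 1, b = 0 it reduces to ordinary ReLU:

```python
import numpy as np

def switched_slope(x, a, b):
    """Zero-switched slope activation: a*x for x >= 0, b*x for x < 0."""
    return np.where(x >= 0, a * x, b * x)

x = np.array([-2.0, -0.5, 0.0, 1.0, 3.0])

# a = b = 1: the identity, information passes through unscathed.
print(switched_slope(x, 1.0, 1.0))

# a = 1, b = 0: behaves exactly like ReLU.
print(switched_slope(x, 1.0, 0.0))
```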

That activation function also works very well with evolution-based network optimization because there is no quantization.

Also, if you use the zero-switched slope activation function with random projection neural networks, the a's and b's become the system weights directly.
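A minimal sketch of that idea, assuming a fixed random projection matrix followed by the switched-slope activation, so the per-unit a's and b's are the only trainable parameters (function and variable names are mine):

```python
import numpy as np

rng = np.random.default_rng(1)

def rp_layer(x, P, a, b):
    """Fixed random projection P followed by the zero-switched slope
    activation; a and b (one pair per unit) are the only trainable
    parameters in this layer."""
    h = P @ x
    return np.where(h >= 0, a * h, b * h)

dim = 8
P = rng.standard_normal((dim, dim)) / np.sqrt(dim)  # fixed, never trained
a = np.ones(dim)  # positive-side slopes: these act as the weights
b = np.ones(dim)  # negative-side slopes

x = rng.standard_normal(dim)
y = rp_layer(x, P, a, b)

# With a = b = 1 the layer is exactly the fixed projection itself.
print(np.allclose(y, P @ x))  # True
```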
