Auto ResNet

SeanOConnor · June 6, 2019, 12:59pm

Some activation functions in neural networks directly cause information loss.
ReLU discards the entire negative half of its input, which for zero-centered inputs is roughly 50% of the information; the threshold function is even worse, reducing each value to 1 bit. Even invertible activation functions cause information loss when used in conjunction with the weighted sum, due to magnitude mismatches.
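To make the information-loss point concrete, here is a small sketch in plain Python. The function names and sample inputs are only illustrative. It shows distinct inputs collapsing onto the same output, which is what makes these activations non-invertible.

```python
def relu(x):
    # Every negative input maps to 0, so the negative half is indistinguishable afterwards
    return max(0.0, x)

def threshold(x):
    # Only the sign survives: one bit per value
    return 1.0 if x >= 0 else 0.0

inputs = [-3.0, -1.0, 0.5, 2.0]
relu_out = [relu(x) for x in inputs]
thr_out = [threshold(x) for x in inputs]

# Count how many distinct values survive each activation
distinct_in = len(set(inputs))
distinct_relu = len(set(relu_out))
distinct_thr = len(set(thr_out))
```

Four distinct inputs go in; ReLU leaves three distinct values and the threshold function leaves two.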
The result is that input information is lost too early, before it can be used, and output information cannot be composited over a number of layers because it is chopped up as it goes along.
One solution is ResNet, where you decide in advance how much information routing there is and where it goes.
If you use a zero switched slope activation function, f(x)=a.x for x>=0 and f(x)=b.x for x<0, the system has the option to let a=b=1, which lets information through unscathed. The system then has the option to decide information routing and act like a ResNet to whatever extent is necessary.
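A minimal sketch of that activation function in plain Python. The names switched_slope and apply_layers are made up for illustration, and the scalar "stack of layers" leaves out the weighted sums to keep the pass-through behaviour easy to see.

```python
def switched_slope(x, a=1.0, b=1.0):
    # f(x) = a*x for x >= 0, b*x for x < 0; a and b are per-unit parameters
    return a * x if x >= 0 else b * x

def apply_layers(x, slopes):
    # Pass a scalar through a stack of activations, one (a, b) pair per layer
    for a, b in slopes:
        x = switched_slope(x, a, b)
    return x

# a=b=1 in every layer: the signal passes through unchanged (-2.5 stays -2.5)
passthrough = apply_layers(-2.5, [(1.0, 1.0)] * 4)

# a=1, b=0 recovers ReLU, which zeroes the negative input
relu_like = apply_layers(-2.5, [(1.0, 0.0)] * 4)
```

Anything between those two settings gives partial pass-through, which is the sense in which the network can choose its own degree of ResNet-like behaviour.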
That activation function also works very well with evolution based network optimization because there is no quantization.
Also, if you use the zero switched slope activation function with random projection neural networks, the a’s and b’s directly become the system weights.
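One way to read that, sketched in plain Python under some assumptions: the random projection is taken to be a fixed dense Gaussian matrix (the post does not say which projection is meant), and the only adjustable parameters are one a and one b per unit. The names make_projection and layer are hypothetical.

```python
import random

def make_projection(n, seed=0):
    # Assumption: a fixed, untrained Gaussian random matrix stands in for the projection
    rng = random.Random(seed)
    scale = n ** -0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n)] for _ in range(n)]

def layer(x, proj, a, b):
    # Fixed random projection followed by the switched slope activation.
    # a[i] and b[i] are the only trainable values in the layer.
    out = []
    for row, ai, bi in zip(proj, a, b):
        s = sum(w * xi for w, xi in zip(row, x))
        out.append(ai * s if s >= 0 else bi * s)
    return out

n = 4
proj = make_projection(n)
x = [1.0, -2.0, 0.5, 3.0]

# With a=b=1 the layer reduces to the plain projection
linear = layer(x, proj, a=[1.0] * n, b=[1.0] * n)
plain = [sum(w * xi for w, xi in zip(row, x)) for row in proj]

# With a=b=0 the layer blocks everything
blocked = layer(x, proj, a=[0.0] * n, b=[0.0] * n)
```

Since the projection never changes, an evolution based optimizer would only need to mutate the a and b lists.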
