Why you should use wide neural networks

Thanks for the links. It is surprising that current deep neural networks don't let you dial how much non-linearity the activation function applies, especially when you consider how the non-linearity compounds as the data passes through many layers. I see a clear improvement by tuning that parameter to match the depth of the network and the problem.
That comes with the proviso that I only have the hardware and time to explore simple test problems.
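
For concreteness, here is a minimal sketch of what a tunable non-linearity could look like. It assumes the "dial" is a simple blend between the identity and a ReLU, and that the per-layer blend weight is scaled down as depth grows; the names BlendedReLU, alpha, alpha_total and make_mlp, and the depth-scaling heuristic itself, are illustrative assumptions rather than a specific published method.

import torch
import torch.nn as nn

class BlendedReLU(nn.Module):
    """Activation with a tunable amount of non-linearity.

    alpha = 0.0 gives the identity (fully linear);
    alpha = 1.0 gives a plain ReLU (fully non-linear).
    """
    def __init__(self, alpha: float):
        super().__init__()
        self.alpha = alpha

    def forward(self, x):
        # Blend between the linear pass-through and the ReLU output.
        return (1.0 - self.alpha) * x + self.alpha * torch.relu(x)


def make_mlp(width: int, depth: int, alpha_total: float = 2.0) -> nn.Sequential:
    # Spread a fixed "budget" of non-linearity across the layers, so a
    # deeper network uses a gentler activation per layer (an illustrative
    # heuristic, not a tuned rule).
    alpha_per_layer = min(1.0, alpha_total / depth)
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), BlendedReLU(alpha_per_layer)]
    return nn.Sequential(*layers)


if __name__ == "__main__":
    net = make_mlp(width=64, depth=8)
    x = torch.randn(16, 64)
    print(net(x).shape)  # torch.Size([16, 64])

The idea is just that with 8 layers each activation only applies a quarter of the non-linearity, so the compounded effect over the whole stack stays roughly comparable to a shallower network with full ReLUs.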