You can view the hidden layers in a deep neural network in an alternative way.

First, a nonlinear function acts on each element of the layer's input vector. Then each neuron is an independent weighted sum of that same small, limited set of nonlinearized elements.
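As a rough sketch of that view (the names `relu` and `shared_layer` and the toy numbers are just for illustration, and ReLU is an assumed choice of nonlinearity), the nonlinearity runs once and every weighted sum reads the same values:

```python
def relu(v):
    # Assumed nonlinearity; any elementwise function would do here.
    return max(0.0, v)

def shared_layer(x, weights, f=relu):
    """One layer viewed as: nonlinearity first, then weighted sums.

    Every output neuron reads the same len(x) nonlinear values.
    """
    z = [f(v) for v in x]  # one shared set of nonlinear values
    return [sum(w * v for w, v in zip(row, z)) for row in weights]

x = [1.0, -2.0, 3.0, -4.0]
weights = [[1.0, 1.0, 1.0, 1.0],
           [0.5, 0.5, -1.0, 2.0]]
print(shared_layer(x, weights))  # [4.0, -2.5]
```

Both outputs are built from the single list `z`, which is the sharing being described.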

An alternative construction would be to take multiple invertible (information-preserving) random projections of the input data, each giving a different mixture of the input. Then apply the nonlinear function to every element of every projection. Then use the resulting increase in dimension to give each independent weighted sum its own independent set of nonlinearized values, instead of having all of them share just one small set.
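Here is a minimal sketch of that construction, under some loud assumptions: the invertible random projection is taken to be random sign flips followed by an orthonormal fast Walsh-Hadamard transform (one possible choice, not the only one), the nonlinearity is ReLU, and all function names are made up for the example.

```python
import random

def relu(v):
    # Assumed nonlinearity; any elementwise function would do here.
    return max(0.0, v)

def wht(v):
    """Orthonormal fast Walsh-Hadamard transform (len(v) must be a power of 2)."""
    v = list(v)
    n = len(v)
    h = 1
    while h < n:
        for i in range(0, n, 2 * h):
            for j in range(i, i + h):
                a, b = v[j], v[j + h]
                v[j], v[j + h] = a + b, a - b
        h *= 2
    scale = n ** -0.5
    return [scale * t for t in v]

def make_projection(n, rng):
    """Assumed projection: random sign flips, then a WHT. It is orthogonal, so invertible."""
    signs = [rng.choice((-1.0, 1.0)) for _ in range(n)]
    return lambda x: wht([s * v for s, v in zip(signs, x)])

def unshared_layer(x, weights, projections, f=relu):
    """Each weighted sum gets its own projection of x, so its own nonlinear values."""
    out = []
    for row, project in zip(weights, projections):
        z = [f(v) for v in project(x)]  # private set of nonlinear values
        out.append(sum(w * v for w, v in zip(row, z)))
    return out

rng = random.Random(0)
n = 4
x = [1.0, -2.0, 3.0, -4.0]
projections = [make_projection(n, rng) for _ in range(n)]
weights = [[rng.gauss(0.0, 1.0) for _ in range(n)] for _ in range(n)]
y = unshared_layer(x, weights, projections)
print(y)
```

With n inputs and n outputs there are still n * n weights, but the nonlinearity is now evaluated n * n times instead of n, and no two weighted sums read the same values. The projections themselves are fixed and carry no trainable weights in this sketch.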

What’s the difference? The network layer has the same number of weights either way. I’ll have to think about it.

In the biological brain you likely won’t find such structured sharing of nonlinearities as in current deep networks, if you are looking for a difference.