I suppose, then, that a deep neural network is a partition forest, where each layer funnels information into some among many branches going upward (forward), and this process is driven predominantly by the dot product operation. The nonlinear activation functions are only there to break symmetries.
It might be unhelpful for a symmetry such as y = net(x), -y = net(-x) to exist: such a net couldn't learn y = net(x), z = net(-x) with y not equal to z.
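A minimal sketch of this symmetry, assuming a bias-free tanh MLP (my own illustrative construction, not anything from the text): every layer is then an odd function, so the whole composition is forced to be odd, and the net cannot assign independent outputs to x and -x.

```python
import numpy as np

rng = np.random.default_rng(0)

# Bias-free layers with tanh: W @ (-x) = -(W @ x) and tanh(-z) = -tanh(z),
# so the composed network is an odd function of its input.
W1 = rng.standard_normal((8, 4))
W2 = rng.standard_normal((1, 8))

def net(x):
    return W2 @ np.tanh(W1 @ x)

x = rng.standard_normal(4)
# The symmetry y = net(x), -y = net(-x) holds exactly:
print(np.allclose(net(-x), -net(x)))
```

Adding bias terms (or an activation that isn't odd, like ReLU) breaks this symmetry, which is one way to read "activations are there to break symmetries."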
How is a dot product nonlinear? If you take the dot product of random vectors with a fixed vector, the result will typically have low magnitude in higher dimensions due to cancellation effects. Only a small subset of possible vectors will produce any significant output (a selective filter).
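The cancellation claim is easy to check numerically. This is a small NumPy sketch of my own (not from the text): dot a fixed unit vector against many random unit directions and watch the typical magnitude shrink as the dimension grows, roughly like 1/sqrt(d).

```python
import numpy as np

rng = np.random.default_rng(1)

# For each dimension d, measure the mean |dot| between a fixed unit
# vector and 10,000 random unit vectors. Almost all random directions
# give near-zero output in high d -- the fixed vector is a selective filter.
means = {}
for d in (4, 64, 1024):
    w = rng.standard_normal(d)
    w /= np.linalg.norm(w)
    xs = rng.standard_normal((10000, d))
    xs /= np.linalg.norm(xs, axis=1, keepdims=True)
    means[d] = np.abs(xs @ w).mean()
    print(d, means[d])
```

Only vectors substantially aligned with the filter vector escape the cancellation, which is the "selective filter" behavior described above.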
Then you have a (fuzzy) decision forest whose branches are determined by dot products. Depending on the network topology, after each branching all the results can be merged before another branching process. It is perhaps a little denser in some respects than an ordinary decision forest.
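One way to make the branching concrete, as a sketch under my own assumptions (a single random ReLU layer, nothing from the text): each unit's on/off state is a dot-product threshold decision, and the pattern of active units selects a "branch" (a linear region of input space). Counting the distinct patterns over random inputs shows how many branches one layer already carves out.

```python
import numpy as np

rng = np.random.default_rng(2)

# One ReLU layer over 3-D inputs: each of the 16 units fires iff its
# dot product with the input exceeds a threshold. The binary pattern of
# firing units is the branch taken; merging happens when the next layer
# sums over all units.
W = rng.standard_normal((16, 3))
b = rng.standard_normal(16)
X = rng.standard_normal((5000, 3))
patterns = (X @ W.T + b > 0)
n_branches = len({tuple(p) for p in patterns.astype(int)})
print(n_branches)
```

Unlike a hard decision tree, an input near a boundary moves the corresponding unit's output continuously through zero, which is the "fuzzy" part of the picture.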