I’m watching this video about over-parameterization in neural networks:

https://youtu.be/XfzlCYHkhmI

I added this comment to it (using my daughter’s account, because I’m banned forever from that site):

“If each layer is normalized to a constant vector length (or nearly so) then the weighted sum in each neuron is an associative memory (AM.) It can recall n patterns to give n exact scalar value outputs. The non-linearity separates patterns that are not linearly separable. Thus you have a pattern of AM, Separation, AM, Separation… It is not surprising in that light that if you over-parameterize the AM it will simply result in repetition code error correction which is more likely to be helpful than harmful…”
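Here is a minimal sketch of the associative-memory part of that comment, under assumptions that are mine and not from the video. One weighted sum stores `n_patterns` random patterns as exact scalar outputs. The weights are the minimum-norm solution from the pseudo-inverse, standing in for whatever training would actually find. Each pattern is scaled to length sqrt(dim), so the per-component size stays fixed as the width grows. For independent Gaussian input noise with standard deviation `noise_std` per component, the output noise has standard deviation `noise_std` times the length of the weight vector, so that product is what the sketch reports. The function name `recall_stats` and all parameter values are hypothetical.

```python
import numpy as np

def recall_stats(dim, n_patterns=8, noise_std=0.1, seed=0):
    """Store n_patterns in one weighted sum of width dim (illustrative only)."""
    rng = np.random.default_rng(seed)
    # Targets are drawn first so they are identical for every width.
    targets = rng.standard_normal(n_patterns)
    patterns = rng.standard_normal((n_patterns, dim))
    # Assumption: constant vector length sqrt(dim), i.e. unit RMS per component.
    patterns *= np.sqrt(dim) / np.linalg.norm(patterns, axis=1, keepdims=True)
    # Minimum-norm weights satisfying patterns @ weights = targets.
    weights = np.linalg.pinv(patterns) @ targets
    # Worst-case error when recalling the clean stored patterns.
    exact_error = np.max(np.abs(patterns @ weights - targets))
    # Std of the output error under per-component Gaussian input noise.
    output_noise_std = noise_std * np.linalg.norm(weights)
    return exact_error, output_noise_std

for dim in (8, 64, 512):
    exact_error, output_noise_std = recall_stats(dim)
    print(f"dim={dim:4d}  recall error={exact_error:.1e}  "
          f"output noise std={output_noise_std:.4f}")
```

With `dim` equal to the number of patterns, the memory is at capacity and recall is exact only up to floating-point error. As `dim` grows past that, the same targets are spread over more weights, the weight vector gets shorter, and input noise is averaged down, which is the repetition-code effect the comment describes.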

I think I demystify better! Or at least I can explain every point in the video to myself in a way that impresses itself as fully coherent to my neural processing unit.