I found this video helpful:
https://youtu.be/QEWe-aRBUAs
The number of breakpoints a network can generate definitely has an impact on its ability to fit curves. I would worry that a conventional artificial neural network is very breakpoint-deficient: each neuron in a layer of width n has n weights but only one ReLU, so there is only 1 ReLU per n weight parameters. A very cheap way to generate large numbers of breakpoints is to take the intermediate calculations of the out-of-place Walsh-Hadamard transform algorithm and follow them with a parameterized (weighted) ReLU, that is, a function whose slope switches at zero. The cost is then one addition or subtraction per nonlinear function, plus an occasional predetermined random sign flip. In real hardware, though, the rate is limited by memory bandwidth, which can cut your dreams down to size mightily.
https://github.com/S6Regen/IntermediateWHT
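To make the idea concrete, here is a rough sketch in plain Python of how I read it; it is not taken from the linked repository and may differ from what that code does. It assumes the nonlinearity is applied after every stage of the transform and that each element at each stage has its own pair of slopes. The names `wht_stage`, `two_slope`, `make_sign_flips` and `intermediate_wht_layer` are made up for illustration.

```python
import random

def wht_stage(x):
    """One out-of-place Walsh-Hadamard stage: sums of adjacent pairs go to the
    lower half of a new list, differences go to the upper half."""
    half = len(x) // 2
    out = [0.0] * len(x)
    for i in range(half):
        a, b = x[2 * i], x[2 * i + 1]
        out[i] = a + b          # one addition per output element
        out[half + i] = a - b   # one subtraction per output element
    return out

def two_slope(v, pos, neg):
    """Switch-slope-at-zero function: slope pos[i] for v[i] >= 0, else neg[i].
    With pos = 1 and neg = 0 this is an ordinary ReLU."""
    return [p * t if t >= 0 else q * t for t, p, q in zip(v, pos, neg)]

def make_sign_flips(n, seed=0):
    """Predetermined random sign flips: fixed by the seed, never trained."""
    rng = random.Random(seed)
    return [rng.choice((-1.0, 1.0)) for _ in range(n)]

def intermediate_wht_layer(x, signs, pos_slopes, neg_slopes):
    """Sign flip, then log2(n) stages, each followed by the two-slope function.
    pos_slopes[k][i] and neg_slopes[k][i] are the assumed trainable parameters
    for element i after stage k."""
    n = len(x)
    stages = n.bit_length() - 1
    assert n == 1 << stages, "length must be a power of two"
    v = [s * t for s, t in zip(signs, x)]
    for k in range(stages):
        v = wht_stage(v)
        v = two_slope(v, pos_slopes[k], neg_slopes[k])
    return v
```

With all slopes set to 1 and no sign flips, the stages compose into the plain unnormalized Walsh-Hadamard transform, which is a handy sanity check. Each stage produces n nonlinear outputs from n additions or subtractions, so a full pass has n·log2(n) potential breakpoints for 2·n·log2(n) slope parameters under these assumptions.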