One reason the fast Walsh-Hadamard transform may be useful in neural networks is that it allows more efficient (less correlated) use of parameters. This is examined here from the two-path ReLU perspective:
https://archive.org/details/technical-note-whd-mixing-with-two-path-re-lu
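As a minimal sketch of the transform itself (not the linked note's specific construction): the fast Walsh-Hadamard transform mixes every input element into every output in O(n log n) time using only additions and subtractions, which is what makes it attractive as a cheap fixed mixing layer. The function name `fwht` and the unnormalized convention are choices made here for illustration.

```python
import numpy as np

def fwht(x):
    """Fast Walsh-Hadamard transform (unnormalized, out-of-place).

    Input length must be a power of two. Equivalent to multiplying
    by the Hadamard matrix H_n, but in O(n log n) instead of O(n^2).
    """
    x = np.asarray(x, dtype=float).copy()
    n = x.size
    assert n & (n - 1) == 0 and n > 0, "length must be a power of two"
    h = 1
    while h < n:
        # Butterfly step: pairwise sums and differences at stride h.
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

# Every output is a +/-1 combination of all inputs, so a single
# transform fully mixes the vector. Applying it twice recovers the
# input scaled by n, since H_n @ H_n = n * I.
v = np.array([1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 1.0, 0.0])
print(fwht(v))
```

Because the transform is its own inverse up to a factor of n, it is cheap to use both in the forward and backward pass of a network layer.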