I have a simplification of some ideas I’ve worked on.
You have some input vector X. Unusually, you weight each element of the vector and then apply the fast Walsh–Hadamard transform (WHT). You regard each output of the WHT as the result of the summing step of a "normal" artificial neuron. Then you just apply one of the usual non-linear activation functions, and there you have your neuron output. What are you getting here? A fully connected neuron in touch with all the inputs, while the whole layer uses only one weight per input element (n weights instead of n*n). You are getting weight compression/sharing.
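Here is a minimal sketch of that layer in NumPy, assuming a ReLU activation and a power-of-two input length; the names fwht and wht_layer are my own, not from any particular library.

    import numpy as np

    def fwht(x):
        """Fast Walsh–Hadamard transform of a 1-D array whose length is a power of two."""
        y = x.astype(float).copy()
        n = len(y)
        h = 1
        while h < n:
            for i in range(0, n, h * 2):
                a = y[i:i + h].copy()
                b = y[i + h:i + 2 * h].copy()
                y[i:i + h] = a + b          # butterfly: sums
                y[i + h:i + 2 * h] = a - b  # butterfly: differences
            h *= 2
        return y

    def wht_layer(x, w, activation=lambda z: np.maximum(z, 0.0)):
        """One layer: elementwise weights, the WHT as the 'summing' step, then a nonlinearity.
        Only len(x) weights are used, yet every output depends on every input, because each
        WHT output is a +/- signed sum over all elements of w * x."""
        return activation(fwht(w * x))

    # Tiny usage example with n = 8 (a power of two).
    rng = np.random.default_rng(0)
    x = rng.standard_normal(8)
    w = rng.standard_normal(8)   # one weight per input element
    print(wht_layer(x, w))

The point of the sketch is just the cost structure: the transform does the all-to-all mixing in O(n log n) operations with no weight matrix, so the only trainable parameters are the n elements of w.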
An objection is that a weight may become zero during training, and then you completely lose the information from the associated input element. However, for other technical reasons you should be using wide networks anyway. If you double the width of the network and replicate the input, then each input element feeds the WHT through more than one weight and there are more ways to preserve its information.
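Continuing the sketch above (it reuses fwht, rng and x), here is one possible reading of that widening trick; the replication scheme, simply concatenating two copies of the input, is my own guess at a reasonable interpretation rather than anything prescribed.

    def wide_wht_layer(x, w2, activation=lambda z: np.maximum(z, 0.0)):
        """Double-width variant: the input is replicated, so each element of x gets two
        independent weights; if one is driven to zero, the other can still carry
        that element's information forward."""
        x2 = np.concatenate([x, x])   # replicate the input to double the layer width
        return activation(fwht(w2 * x2))

    w2 = rng.standard_normal(16)      # 2n weights for an n = 8 input
    print(wide_wht_layer(x, w2))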
That is extremely simple, but it takes some hands-on experience, and the insight gained from that experience, to see it as a sensible thing to try. You would have to go into the details yourself to see that it makes substantial sense.
I’ll try out the idea soon. I have some game code to write too.
Uber has a paper about weight compression in neural networks, and it is generally known that deep networks can often be compressed to around 1/50th of their original size without much loss of accuracy.
https://en.wikipedia.org/wiki/Fast_Walsh%E2%80%93Hadamard_transform