I did a little arithmetic regarding “no multiply operation” neural networks: using random sign flipping together with Walsh-Hadamard transform (WHT) patterns of addition and subtraction to do random projections, followed by binarization and weighting. I would say you could fit 100 million integer add/subtract logic units (low transistor count, low power) on a current semiconductor chip. Clocked at 1 billion operations per second, that would give you 100 peta-operations per second for your network. I presume you could evolve deep neural nets in real time with that sort of performance.
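To make the idea concrete, here is a rough Python/NumPy sketch of one such layer. It's only an illustration of the scheme I described (sign flips, a fast WHT butterfly built from adds and subtracts, binarization, then weighting by picking +w or -w), not any particular hardware design; the function and variable names are my own.

```python
import numpy as np

def fwht(x):
    """In-place fast Walsh-Hadamard transform.
    Uses only additions and subtractions; len(x) must be a power of two."""
    n = len(x)
    h = 1
    while h < n:
        for i in range(0, n, h * 2):
            for j in range(i, i + h):
                a, b = x[j], x[j + h]
                x[j], x[j + h] = a + b, a - b
        h *= 2
    return x

def no_multiply_layer(x, flips, weights):
    """One 'no multiply' layer sketch:
    random sign flip (negation only), WHT random projection,
    binarize to +/-1, then 'weight' by selecting +w or -w,
    which is an add/subtract at accumulation time rather than a multiply."""
    y = np.where(flips, x, -x)                 # random sign flipping
    y = fwht(y.astype(np.int64))               # add/subtract butterfly network
    b = np.where(y >= 0, 1, -1)                # binarization
    return np.where(b > 0, weights, -weights)  # weighting by sign selection

# Example: an 8-wide layer with a fixed random flip pattern and weights.
rng = np.random.default_rng(0)
x = rng.integers(-4, 5, 8)
flips = rng.integers(0, 2, 8).astype(bool)
weights = rng.integers(1, 10, 8)
print(no_multiply_layer(x, flips, weights))
```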
Also, I noted that you may not need full integer precision, since 2’s complement wrap-around would only introduce some additional nonlinearity, which might not be a problem.
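A tiny example of what I mean by that wrap-around nonlinearity (just an illustration of the folding behaviour, not a claim about any specific hardware):

```python
import numpy as np

# Sums that exceed the int8 range fold back around instead of growing,
# which acts like a cheap periodic nonlinearity on the accumulator.
sums = np.array([50, 100, 150, 200, 300], dtype=np.int64)
print(sums.astype(np.int8))   # -> [ 50  100 -106  -56   44]
```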
Some people are working on binary cellular-automata-type networks that could similarly fit onto silicon more efficiently than, say, current GPUs and CPUs. Those could be even more interesting because they would have dynamic, reservoir-type properties.
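I haven't built one of those, but a toy version might look like the sketch below: an elementary cellular automaton used as a binary reservoir, with inputs injected by XOR and the evolving states collected as features for a simple readout. The choice of rule, the XOR injection, and the readout scheme are all just my assumptions about how you might wire it up.

```python
import numpy as np

def ca_step(state, rule=110):
    """One update of an elementary cellular automaton with wrap-around edges."""
    left = np.roll(state, 1)
    right = np.roll(state, -1)
    idx = (left << 2) | (state << 1) | right          # neighborhood code 0..7
    table = np.array([(rule >> i) & 1 for i in range(8)], dtype=np.uint8)
    return table[idx]

def ca_reservoir(inputs, width=256, steps_per_input=4, rule=110, seed=0):
    """Toy binary CA reservoir: XOR each binary input vector into the state,
    run a few CA steps, and collect the states as reservoir features."""
    rng = np.random.default_rng(seed)
    state = rng.integers(0, 2, width, dtype=np.uint8)
    features = []
    for u in inputs:                                  # u: 0/1 vector of length `width`
        state = state ^ u.astype(np.uint8)            # inject input
        for _ in range(steps_per_input):
            state = ca_step(state, rule)
        features.append(state.copy())
    return np.array(features)                         # feed these to a simple readout
```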