It’s kind of weird that conventional artificial neural networks mix decision making and construction of the output together.
The ReLU switching decisions are based on boosting many weak learners in the prior layer, which boils down to looking for specific dot product correlations in the input. It seems less than ideal to muddle that in with construction of the output response. Perhaps it would be better to do the 2 things as separate streams, by having each ReLU switch 2 separate dot product systems: one starting with the input data and making decisions based on dot products with that, the other starting with an all 1s vector and leading to the output through its own independent system of dot products. The common thing would be the switching decisions.
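To make the idea concrete, here is a rough NumPy sketch of how the 2 streams might be wired up. It is only one possible reading of the idea: the function name `two_stream_forward`, the weight lists `decision_weights` and `output_weights`, and the choice to take the final output-stream vector as the network output are all assumptions for illustration, and nothing here is trained.

```python
import numpy as np

def two_stream_forward(x, decision_weights, output_weights):
    """Forward pass of a hypothetical two-stream ReLU net.

    decision_weights[i] and output_weights[i] must have the same number
    of rows, because each gate switches one unit in both streams.
    """
    h = x                                # decision stream starts from the input data
    g = np.ones_like(x, dtype=float)     # output stream starts from an all 1s vector
    for D, C in zip(decision_weights, output_weights):
        z = D @ h                        # dot products with the decision stream
        gate = z > 0                     # the ReLU switching decisions
        h = np.where(gate, z, 0.0)       # ordinary ReLU on the decision stream
        g = np.where(gate, C @ g, 0.0)   # same switches applied to the output stream
    return g

# Tiny hand-checkable example with one layer.
x = np.array([1.0, -1.0])
D = [np.eye(2)]                              # gates become [on, off]
C = [np.array([[2.0, 3.0], [4.0, 5.0]])]     # row sums are 5 and 9
print(two_stream_forward(x, D, C))           # [5. 0.]
```

One thing this sketch makes visible: the input only reaches the output through the pattern of switch states, so scaling x by a positive constant leaves the result unchanged. Whether that is a feature or a limitation is an open question.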