Crossover, temporal decorrelation and mixability

You have a modular learning system. You create a twin for each module. During training, one twin is active as part of the system and the other is inactive, switched out of the system. The choice between the two is made at random.
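To make that concrete, here is a minimal sketch in Python. Everything in it (the name TwinModule, plain feed-forward modules with a tanh, the initialization scale) is my own illustration of the twin-swapping idea, not a fixed design:

```python
import numpy as np

class TwinModule:
    """One slot in a modular system, backed by two interchangeable twins."""

    def __init__(self, in_dim, out_dim, rng):
        # Two independent weight matrices (the twins) for the same slot.
        self.twins = [rng.standard_normal((in_dim, out_dim)) * 0.1
                      for _ in range(2)]
        self.active = 0  # index of the twin currently wired into the system

    def reselect(self, rng):
        # Randomly decide which twin takes part in the next training step.
        self.active = int(rng.integers(2))

    def forward(self, x):
        # Only the active twin contributes; the other one sits out entirely.
        return np.tanh(x @ self.twins[self.active])
```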
I don’t know what would happen with HTM if you did that. For neural nets, however, it may let you avoid training on mini-batches (which is messy and stupid). Very likely the twins would converge to the same function (even if their internal parameters were different), because mixability would end up being enforced by the optimization. There could be other interesting effects as well. I’m just trying out the concept at the moment. It seems to work, at least for neural nets trained by evolutionary strategies (ES).
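Here is a minimal sketch of that ES experiment, under my own assumptions: a greedy (1+1)-ES hill climber rather than any particular ES variant, a toy regression target, and arbitrary step sizes. It reuses the TwinModule class from the sketch above; per step, each module randomly switches one twin into the system and only the active twins get mutated:

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: learn y = sin(x) on [-2, 2].
X = rng.uniform(-2, 2, size=(64, 1))
Y = np.sin(X)

layers = [TwinModule(1, 16, rng), TwinModule(16, 1, rng)]

def forward(x):
    for layer in layers:
        x = layer.forward(x)
    return x

def loss():
    return float(np.mean((forward(X) - Y) ** 2))

SIGMA = 0.05  # mutation step size (arbitrary)

for step in range(5000):
    # Randomly switch one twin per module into the system.
    for layer in layers:
        layer.reselect(rng)

    before = loss()

    # Mutate only the active twins, keeping backups for rollback.
    backups = []
    for layer in layers:
        w = layer.twins[layer.active]
        backups.append(w.copy())
        w += SIGMA * rng.standard_normal(w.shape)

    # Greedy (1+1)-ES: revert the mutation if it did not lower the loss.
    if loss() > before:
        for layer, w_old in zip(layers, backups):
            layer.twins[layer.active] = w_old

print("final MSE:", loss())
```

Because each evaluated mutation has to improve the loss with a random combination of twins switched in, any improvement that only works with one particular combination tends to get rejected, which is where the pressure toward mixability would come from.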