There was a paper indicating 4-to-16 demultiplexing occurring in the brain. That is also a sort of step function (4 binary inputs light up one of 16 outputs). If I do 1-to-2 demultiplexing in single-layer nets, I get great generalization ability but rather slow training. If I do 4-to-16 demultiplexing, I get less good generalization but much faster training. Obviously there are trade-offs, and perhaps evolution picked one for us?
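As a minimal sketch of the kind of setup described: a 4-to-16 mapping (4 binary inputs lighting up one of 16 outputs) can be written as a truth table and learned by a single-layer softmax net with plain gradient descent. The function names and hyperparameters here are illustrative assumptions, not the original experiment.

```python
import numpy as np

def demux_dataset(n_bits):
    # All 2**n_bits binary input codes, each mapped to a one-hot output.
    # n_bits=4 gives the 4-to-16 case; n_bits=1 gives the 1-to-2 case.
    n_out = 2 ** n_bits
    X = np.array([[(i >> b) & 1 for b in range(n_bits)]
                  for i in range(n_out)], dtype=float)
    Y = np.eye(n_out)
    return X, Y

def train_single_layer(X, Y, lr=1.0, steps=5000, seed=0):
    # One linear layer + softmax, trained with full-batch gradient descent
    # on cross-entropy loss (illustrative hyperparameters).
    rng = np.random.default_rng(seed)
    W = rng.normal(0.0, 0.1, (X.shape[1], Y.shape[1]))
    b = np.zeros(Y.shape[1])
    for _ in range(steps):
        logits = X @ W + b
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability
        P = np.exp(logits)
        P /= P.sum(axis=1, keepdims=True)
        grad = P - Y                                 # softmax cross-entropy gradient
        W -= lr * X.T @ grad / len(X)
        b -= lr * grad.mean(axis=0)
    return W, b

X, Y = demux_dataset(4)
W, b = train_single_layer(X, Y)
pred = np.argmax(X @ W + b, axis=1)
```

This mapping happens to be learnable by a single linear layer: for each output unit, weights proportional to the matching bit pattern (with a bias offsetting the number of set bits) score the correct code strictly higher than any other, so the table is linearly separable.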