Step activation functions

I've recently become convinced that step activation functions give single-layer neural networks the best performance in terms of learning speed and separation of similar inputs.

I use the sign (signum) function:
fn(x) =  1, x >= 0
fn(x) = -1, x <  0

Or a soft version, the signed square root:
fn(x) =  sqrt(x),  x >= 0
fn(x) = -sqrt(-x), x <  0
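
As a concrete illustration, here is a minimal C sketch of both activations (the function names are mine, not from any particular library):

#include <math.h>

/* Hard step: the sign (signum) activation, mapping to +1 / -1. */
float sign_act(float x)
{
    return (x >= 0.0f) ? 1.0f : -1.0f;
}

/* Soft step: the signed square root, steep near zero like a step
   but continuous everywhere. */
float signed_sqrt_act(float x)
{
    return (x >= 0.0f) ? sqrtf(x) : -sqrtf(-x);
}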

There are very fast bit hack versions of the square root function if you need them.
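For example, the well-known fast inverse square root bit hack (the 0x5f3759df trick) can be turned into an approximate square root. This is only a sketch, assuming 32-bit floats and ints; on modern hardware sqrtf or SIMD intrinsics are often at least as fast:

#include <string.h>

/* Approximate 1/sqrt(x) using the classic 0x5f3759df bit hack,
   refined with one Newton-Raphson step. Assumes 32-bit float/int. */
static float fast_rsqrt(float x)
{
    float half = 0.5f * x;
    unsigned int i;
    memcpy(&i, &x, sizeof i);          /* reinterpret the float's bits */
    i = 0x5f3759dfu - (i >> 1);        /* magic-constant initial guess */
    memcpy(&x, &i, sizeof x);
    x = x * (1.5f - half * x * x);     /* one Newton iteration */
    return x;
}

/* sqrt(x) ~= x * (1/sqrt(x)); valid for x > 0. */
static float fast_sqrt(float x)
{
    return (x > 0.0f) ? x * fast_rsqrt(x) : 0.0f;
}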
Anyway, this paper provides some justification:
https://www.ncbi.nlm.nih.gov/pmc/articles/PMC3921404/

There was a paper suggesting that 4-to-16 demultiplexing occurs in the brain. That is also a sort of step function (4 binary inputs light up one of 16 outputs). If I do 1-to-2 demultiplexing in single-layer nets I get great generalization ability but rather slow training. If I do 4-to-16 demultiplexing I get weaker generalization but much faster training. Obviously there are trade-offs, and perhaps evolution picked one for us?
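
Here is a minimal sketch of what I mean by 4-to-16 demultiplexing: four sign-thresholded inputs are packed into a 4-bit index, which selects one of 16 one-hot outputs (again, the function name is mine):

/* 4-to-16 demultiplexing of sign activations:
   each of the 4 inputs is thresholded at zero to give one bit,
   and the resulting 4-bit code lights up exactly one of 16 outputs. */
void demux_4_to_16(const float in[4], float out[16])
{
    unsigned int index = 0;
    for (int i = 0; i < 4; i++) {
        index |= (in[i] >= 0.0f ? 1u : 0u) << i;   /* sign bit of each input */
    }
    for (int j = 0; j < 16; j++) {
        out[j] = (j == (int)index) ? 1.0f : 0.0f;  /* one-hot output */
    }
}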
