There's a possibility that genetic algorithms have more potential when they are used to alter (aka evolve) the input representation or "embedding" rather than the network itself.
All learning algorithms work reasonably well on most inputs. Sure, some outperform others in various respects, e.g. some achieve higher accuracy, others better sample efficiency or compute efficiency, others forget less, some overfit, some are better suited for noisy data, etc. But be it ANN, k-NN, linear models, trees, forests, XGBoost, whatever: there's no clear winner.
The takeaway here is that IF there are learnable correlations or patterns within some input data, all learning algorithms manage to figure out, to a greater or lesser degree, that these patterns exist.
What we (animals) seem to have is a power to simplify: to clarify the few essential features that expose a property or characteristic.
I don't know of any algorithm able to do that: to figure out not only that there's a cat in the room, but also to pinpoint the few key elements in the sensory input which make it sure that fact is true.
Some rules to combine/extract smaller parts from sensory channels, rules which can be evolved, should not be hard to implement. And with simple enough input, some learning algorithms learn very fast: tens or hundreds of milliseconds per core.
Here's a simplified schematic:
Raw input (multiple sensory channels) → Simplifier/combiner layer → Generic Learner → Results
The evolving algorithm targets the simplifier/combiner layer, which generates simple(r) representations of the raw input, NOT the learner. We know it has discovered an improved "perspective" on the input when the results of the generic learner improve.
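To make the schematic concrete, here is a minimal sketch of the pipeline in numpy. Everything here is illustrative, not from the original: the "sensory" data is synthetic, the combiner collapses groups of raw dimensions into one scalar each, and the "generic learner" is a trivial nearest-centroid classifier standing in for whatever fast learner you prefer; the evolving algorithm would only ever touch `groups`.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic "sensory" input: two classes that differ only in a
# handful of informative raw dimensions (the first 5)
X = rng.normal(size=(200, 100))
y = (X[:, :5].sum(axis=1) > 0).astype(int)

def combiner(X, groups):
    # Simplifier/combiner layer: each group of raw dimensions is
    # collapsed into one scalar (here, a plain sum). This is the
    # only part the evolving algorithm would modify.
    return np.stack([X[:, g].sum(axis=1) for g in groups], axis=1)

def learner_accuracy(F, y):
    # Generic learner: nearest-centroid classifier. The GA never
    # looks inside it, it only sees the resulting accuracy.
    c0, c1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
    pred = np.linalg.norm(F - c1, axis=1) < np.linalg.norm(F - c0, axis=1)
    return (pred.astype(int) == y).mean()

# One candidate combiner: 5 random groups of 10 raw dimensions
groups = [rng.choice(100, 10, replace=False) for _ in range(5)]
print(learner_accuracy(combiner(X, groups), y))
```

A candidate combiner whose groups happen to cover the informative dimensions will score higher, which is exactly the signal the evolving algorithm needs.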
Here's an MNIST example task: find a topology of 20 patches of 10 pixels each such that, when the input image is represented as 20 scalars (each patch sums up the values of the pixels it contains), we get the best accuracy.
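The 784-pixels-to-20-scalars representation described above can be sketched in a few lines of numpy. The function name `encode` and the random patches are illustrative; a patch is simply a set of 10 pixel indices, with no requirement that they be spatially contiguous.

```python
import numpy as np

def encode(image, patches):
    """Represent a flattened image as one scalar per patch.

    image:   1-D array of 784 pixel values (28x28 MNIST digit)
    patches: int array of shape (20, 10); each row holds the 10
             pixel indices belonging to one patch
    """
    # Fancy indexing gives shape (20, 10); summing each row
    # collapses every patch to a single scalar
    return image[patches].sum(axis=1)

rng = np.random.default_rng(0)
patches = rng.choice(784, size=(20, 10), replace=False)  # one random patch set
image = rng.random(784)  # stand-in for a real MNIST digit
features = encode(image, patches)
print(features.shape)  # (20,)
```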
So the genetic algorithm starts with a population of 100 sets of 20 random patches (10 pixels per patch).
The testing algorithm picks 1000 digits from the training dataset.
And re-trains the same small initial network 100 times, each time with a different set of 20 patches.
Then it tests each trained network/patch-set combination.
It doesn't need to reach top accuracy, only to figure out which of the 100 patch sets allowed its "learner" to outperform the others, and to combine the winning sets (e.g. the top 20 out of 100) into the following generation of the genetic algorithm.
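The steps above can be sketched end to end. This is a toy sketch under stated assumptions, not the definitive experiment: the data is synthetic (swap in 1000 real MNIST digits for the task described), the "small network" is replaced by a nearest-centroid learner for speed, and the selection/crossover/mutation scheme is one simple choice among many.

```python
import numpy as np

rng = np.random.default_rng(0)

N_PIXELS, N_PATCHES, PATCH_SIZE = 784, 20, 10
POP, KEEP, GENERATIONS = 100, 20, 5

# Stand-in for 1000 training digits with binary labels
# (illustrative only; use real MNIST for the actual experiment)
X = rng.random((1000, N_PIXELS))
y = (X[:, :50].mean(axis=1) > 0.5).astype(int)

def encode(X, patches):
    # 20 scalars per image: each patch sums its 10 pixels
    return X[:, patches].sum(axis=2)

def fitness(patches):
    # "Generic learner", retrained from scratch for every patch set;
    # the GA only ever sees the resulting accuracy
    F = encode(X, patches)
    c0, c1 = F[y == 0].mean(axis=0), F[y == 1].mean(axis=0)
    pred = np.linalg.norm(F - c1, axis=1) < np.linalg.norm(F - c0, axis=1)
    return (pred.astype(int) == y).mean()

def random_patches():
    return rng.choice(N_PIXELS, size=(N_PATCHES, PATCH_SIZE), replace=False)

def crossover(a, b):
    # Child inherits each whole patch from either parent,
    # then one patch is replaced (mutation)
    child = np.where(rng.random((N_PATCHES, 1)) < 0.5, a, b)
    child[rng.integers(N_PATCHES)] = rng.choice(N_PIXELS, PATCH_SIZE, replace=False)
    return child

pop = [random_patches() for _ in range(POP)]
for gen in range(GENERATIONS):
    scores = np.array([fitness(p) for p in pop])
    elite = [pop[i] for i in np.argsort(scores)[-KEEP:]]  # winning 20 of 100
    pop = elite + [crossover(elite[rng.integers(KEEP)],
                             elite[rng.integers(KEEP)])
                   for _ in range(POP - KEEP)]
    print(gen, scores.max())
```

Note the key property of the setup: the learner is deliberately cheap and fixed, so each generation's 100 retrainings stay fast, and all evolutionary pressure falls on the patch topology alone.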