Brittleness vs sample efficiency

The varying results in this particular case are not caused by the model itself; the brittle-fast-learner and the reliable-slower-learner differ only in their input encodings.

What a genetic optimizer could do in this case is to explore various input encodings.
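A minimal sketch of what such a search could look like, assuming the encoder exposes a couple of tunable hyperparameters (the SDR size and active-bit count here are placeholders) and that a `fitness` function trains and scores a model with a given encoding; this is illustrative, not the actual optimizer:

```python
import random

def make_encoding():
    # Hypothetical encoder hyperparameters: SDR size and number of active bits.
    return {"sdr_size": random.choice([500, 1000, 2000]),
            "active_bits": random.choice([10, 20, 30, 40])}

def mutate(enc):
    # Perturb one hyperparameter of a parent encoding.
    child = dict(enc)
    key = random.choice(list(child))
    child[key] = max(1, int(child[key] * random.uniform(0.5, 1.5)))
    return child

def evolve(fitness, generations=20, pop_size=8):
    # fitness(encoding) -> validation score of a model trained with that encoding.
    pop = [make_encoding() for _ in range(pop_size)]
    for _ in range(generations):
        ranked = sorted(pop, key=fitness, reverse=True)
        parents = ranked[: pop_size // 2]            # keep the best encodings
        pop = parents + [mutate(random.choice(parents)) for _ in parents]
    return max(pop, key=fitness)
```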


Regarding ensemble learning, here are a couple of observations:

  1. In certain cases (using the same input encoder), multiple ValueMap evaluators trained separately on different data points do not need to be evaluated individually and averaged as in normal ensembles. They can all be merged into a single one by simply overlapping (adding) their parameters (see the first sketch after this list). This is unlikely to be an option with a normal NN, where the function/meaning of each weight is influenced by the specific data points used in training.

  2. An option to expand the parameter space (the number of evaluators) with little performance penalty is to use SDR-based routing in a preliminary layer.
    This simply means, e.g. with 1000 evaluators, encoding each input as a 30/1000 SDR and using only the corresponding sub-ensemble of 30 evaluators (out of 1000) for learning/inference on each data point (see the second sketch after this list). Not sure if this is clearer, but it's the same idea.
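To illustrate the merge-by-addition point, here is a minimal sketch. The ValueMap internals shown (per-bit sum and count accumulators keyed by SDR bit) are an assumption for illustration, not the actual implementation; the point is that with purely additive parameters, summing separately trained evaluators equals training one evaluator on all the data:

```python
from collections import defaultdict

class ValueMap:
    def __init__(self):
        self.sums = defaultdict(float)   # per-bit accumulated target values (assumed internals)
        self.counts = defaultdict(int)   # per-bit number of updates

    def learn(self, sdr_bits, target):
        for b in sdr_bits:
            self.sums[b] += target
            self.counts[b] += 1

    def predict(self, sdr_bits):
        total = sum(self.sums[b] for b in sdr_bits)
        n = sum(self.counts[b] for b in sdr_bits)
        return total / n if n else 0.0

def merge(evaluators):
    # Overlap (add) the parameters of separately trained evaluators.
    merged = ValueMap()
    for ev in evaluators:
        for b, s in ev.sums.items():
            merged.sums[b] += s
        for b, c in ev.counts.items():
            merged.counts[b] += c
    return merged
```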

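And a corresponding sketch of the SDR-based routing, again with assumed internals: `encode_sdr()` is a stand-in for whatever encoder produces the 30/1000 code, and each evaluator is just a placeholder accumulator:

```python
import random

NUM_EVALUATORS = 1000   # total evaluator pool
ACTIVE_BITS = 30        # active bits per input SDR = evaluators used per data point

def encode_sdr(x, size=NUM_EVALUATORS, on=ACTIVE_BITS):
    # Deterministically map an input to a 30/1000 SDR (hypothetical encoder).
    return random.Random(str(x)).sample(range(size), on)

evaluators = [{"sum": 0.0, "count": 0} for _ in range(NUM_EVALUATORS)]

def learn(x, target):
    # Only the 30 evaluators indexed by the active bits see this data point.
    for i in encode_sdr(x):
        evaluators[i]["sum"] += target
        evaluators[i]["count"] += 1

def predict(x):
    # Average the 30 routed evaluators instead of the whole pool of 1000.
    routed = [evaluators[i] for i in encode_sdr(x) if evaluators[i]["count"]]
    return sum(e["sum"] / e["count"] for e in routed) / len(routed) if routed else 0.0
```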