Is billions of parameters (or more) the only solution?

These terms are not even definable, so there is no contradiction. I think NNs do comparison, just backwards, top-layer-first. And search is iterative comparison.
AFAICT, the common / defining core of all NNs is the perceptron, and a stand-alone perceptron can only learn in a Hebbian fashion. That is: weighted summation, then comparison of the normalized sum to each input, then the weight of that input is increased / decreased in proportion to the match from that comparison (roughly the sketch below). This is basically the same principle as in centroid-based clustering.
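
A minimal numpy sketch of that update rule, under my own toy formulation (treating "match" as the product of the normalized sum with each input; not any standard library's rule):

```python
import numpy as np

def hebbian_update(w, x, lr=0.01):
    """One Hebbian-style step for a single linear unit.

    The unit sums its weighted inputs, normalizes the sum, then each
    weight is nudged in proportion to how well its input "matches"
    that normalized output -- the same move a centroid makes toward
    the points assigned to it.
    """
    y = w @ x                                  # weighted summation
    y_norm = y / (np.linalg.norm(x) + 1e-8)    # normalized sum
    match = y_norm * x                         # per-input match / similarity
    return w + lr * match                      # strengthen matching inputs

# toy usage
rng = np.random.default_rng(0)
w = rng.normal(size=4)
for _ in range(100):
    w = hebbian_update(w, rng.normal(size=4))
```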

Backprop is similar, but it only compares at the output layer, and learning is done by propagating the resulting gradient back. That gradient is basically an inverted match / similarity, so it's the same principle (see the sketch below).
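
A single-layer sketch of what I mean, assuming MSE loss and my own toy setup: the only comparison is at the output, and what flows back is the mismatch.

```python
import numpy as np

def backprop_step(W, x, target, lr=0.01):
    """One gradient step for a single linear layer with MSE loss.

    The comparison happens only at the output: (y - target) is an
    "inverted match" -- zero when output and target agree, large when
    they don't -- and it is pushed back onto the weights.
    """
    y = W @ x                     # forward pass: weighted summation
    error = y - target            # output-layer comparison (inverted similarity)
    grad_W = np.outer(error, x)   # gradient propagated to the weights
    return W - lr * grad_W        # descend: reduce the mismatch

# toy usage
rng = np.random.default_rng(0)
W = rng.normal(size=(2, 4))
x, target = rng.normal(size=4), np.array([1.0, -1.0])
for _ in range(200):
    W = backprop_step(W, x, target)
```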
As for compression, it's done by reducing the number of active nodes in the middle layer of an autoencoder; that bottleneck layer represents the highest order of generalization (shape-only sketch below).
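
A shape-only illustration of the bottleneck idea (sizes are arbitrary, just to show where the compression lives):

```python
import numpy as np

# The middle layer has far fewer units than the input, so the network
# is forced to compress; training would minimize ||x - x_hat||^2.
d_in, d_mid = 784, 32

rng = np.random.default_rng(0)
W_enc = rng.normal(scale=0.01, size=(d_mid, d_in))   # 784 -> 32 (compress)
W_dec = rng.normal(scale=0.01, size=(d_in, d_mid))   # 32 -> 784 (reconstruct)

x = rng.normal(size=d_in)
code = np.tanh(W_enc @ x)   # middle layer: the compressed / generalized code
x_hat = W_dec @ code        # reconstruction from the bottleneck
```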
So it's definitely not conventional search, but there is this coarse, backward, serial comparison that serves the same purpose.
