What is next?

I agree. I don’t think the brain is computing gradients, though I do think the solution is seeking predictive energy minima. So it's the same goal as computing gradients, just a different way of finding those minima.

Instead of computing gradients, I think the brain finds predictive energy minima as network resonances using oscillations.

There is no need to update model weights with fine-grained values, and there is the additional benefit that the energy minima can vary dynamically. Perhaps that explains why LLMs have such enormous parameter blowouts.
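To make the idea concrete, here is a toy sketch of oscillators settling into an energy minimum with no weight updates at all. It is only an illustration under my own assumptions: a Kuramoto model with identical natural frequencies and a fixed, symmetric coupling matrix. The names `K`, `energy`, `settle`, and `order_parameter` are made up for this example, and nothing here claims to be how the brain actually does it.

```python
import math
import random

def energy(theta, K):
    # Assumed energy: E = -1/2 * sum_ij K[i][j] * cos(theta_i - theta_j)
    n = len(theta)
    return -0.5 * sum(K[i][j] * math.cos(theta[i] - theta[j])
                      for i in range(n) for j in range(n))

def step(theta, K, dt):
    # One Euler step of Kuramoto phase dynamics (identical frequencies,
    # written in the co-rotating frame). The couplings K are never modified.
    n = len(theta)
    dtheta = [sum(K[i][j] * math.sin(theta[j] - theta[i]) for j in range(n))
              for i in range(n)]
    return [(theta[i] + dt * dtheta[i]) % (2 * math.pi) for i in range(n)]

def settle(theta, K, dt=0.05, steps=2000):
    # Let the oscillators interact until the phases stop drifting.
    for _ in range(steps):
        theta = step(theta, K, dt)
    return theta

def order_parameter(theta):
    # Magnitude of the mean phase vector: 0 = incoherent, 1 = fully phase-locked.
    n = len(theta)
    re = sum(math.cos(t) for t in theta) / n
    im = sum(math.sin(t) for t in theta) / n
    return math.hypot(re, im)

rng = random.Random(0)
n = 8
# Hypothetical fixed coupling: uniform, symmetric, no self-coupling.
K = [[0.0 if i == j else 1.0 / n for j in range(n)] for i in range(n)]
theta0 = [rng.uniform(0, 2 * math.pi) for _ in range(n)]
theta1 = settle(theta0, K)

print("energy before:", energy(theta0, K))
print("energy after: ", energy(theta1, K))
print("coherence after:", order_parameter(theta1))
```

The phases drift downhill on this energy purely through their coupling, and the phase-locked state they end in is the minimum. Changing `K` on the fly would move the minima without retraining anything, which is the sense of "vary dynamically" I have in mind.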

I outlined what I think is the appropriate contrast here:
