I agree. I don’t think the brain is computing gradients. Though I do think the solution is seeking predictive energy minima. So the same goal as gradient descent, just a different way of reaching the minima of that energy landscape.
Instead of computing gradients, I think the brain finds predictive energy minima as network resonances using oscillations.
No need to update model weights with fine-grained values. And with the additional benefit that the energy minima can vary dynamically. Perhaps explaining why LLMs have such enormous parameter blowouts.
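A toy way to see the contrast: extremum-seeking control finds an energy minimum by injecting a small oscillation and correlating it with the measured energy, with no explicit gradient computation anywhere. This is only a loose engineering analogy for the oscillation idea, not a claim about the brain's actual mechanism; the double-well `energy` function and all the constants are assumptions for illustration.

```python
import math

def energy(x):
    # Toy "predictive energy" landscape (an assumption for illustration):
    # a double well with minima at x = -1 and x = +1.
    return (x * x - 1.0) ** 2

def extremum_seeking(x0, steps=200000, dt=1e-3, a=0.1, w=50.0, k=4.0):
    """Oscillation-based minimum seeking (extremum-seeking control).

    Probe the energy with a small sinusoidal dither a*sin(w*t) and drift
    in whichever direction correlates with lower measured energy.
    On average this descends the energy, yet the update rule only ever
    evaluates energy(...) -- it never computes a gradient.
    """
    x = x0
    for n in range(steps):
        t = n * dt
        dither = a * math.sin(w * t)
        # Demodulate: multiply the measured energy by the dither phase;
        # the product's time average points downhill.
        x -= dt * k * math.sin(w * t) * energy(x + dither)
    return x

x_star = extremum_seeking(x0=0.5)
print(x_star)  # drifts into the nearby energy minimum around x = 1
```

The point of the sketch is only that "same minima, different search dynamics" is a coherent position: the oscillation plays the role the gradient plays in backprop, without any fine-grained weight arithmetic.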
I outlined what I think is the appropriate contrast here: