I was watching a video of Geoffrey Hinton in which he says that ANNs can basically do anything. I take him to mean that they can approximate any function (the universal approximation theorem). A deep feedforward net maps inputs to outputs; recurrent nets essentially do the same thing, but over time. So I wonder: could some architecture be used to approximate gradient descent itself (or some other optimization procedure)? That would amount to optimizing an optimizer.
Sorry Matt, I couldn’t resist going meta.
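
To make it concrete, here's roughly what I have in mind: a minimal sketch (assuming PyTorch) where a tiny MLP looks at the gradient of a simple quadratic objective and proposes a parameter update, and that MLP is itself trained by unrolling the inner optimization and backpropagating through the whole trajectory. The names (`opt_net`, `unrolled_loss`) and the quadratic toy problem are just illustrative choices, not anyone's published setup.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical learned optimizer: a per-coordinate MLP that maps a gradient
# value to a proposed parameter update. (Published versions use an LSTM so
# the update can depend on gradient history; an MLP keeps the sketch short.)
opt_net = nn.Sequential(nn.Linear(1, 16), nn.Tanh(), nn.Linear(16, 1))

def unrolled_loss(steps=20):
    # Inner problem: minimize 0.5 * ||theta - target||^2 for a random target.
    target = torch.randn(5)
    theta = torch.zeros(5, requires_grad=True)
    total = torch.zeros(())
    for _ in range(steps):
        loss = 0.5 * ((theta - target) ** 2).sum()
        # create_graph=True keeps this gradient differentiable, so the
        # meta-gradient can flow back into opt_net's weights.
        (grad,) = torch.autograd.grad(loss, theta, create_graph=True)
        update = opt_net(grad.unsqueeze(-1)).squeeze(-1)
        theta = theta + update  # the net must learn the step direction itself
        total = total + loss    # sum of losses along the unrolled trajectory
    return total

# Meta-optimization: ordinary Adam on the optimizer-net's weights,
# i.e. literally optimizing the optimizer.
meta_opt = torch.optim.Adam(opt_net.parameters(), lr=1e-2)
for step in range(201):
    meta_opt.zero_grad()
    meta_loss = unrolled_loss()
    meta_loss.backward()
    meta_opt.step()
    if step % 50 == 0:
        print(f"meta-step {step}: unrolled inner loss = {meta_loss.item():.3f}")
```

This is essentially the "learning to learn by gradient descent by gradient descent" idea (Andrychowicz et al., 2016), so at least the answer to "is it possible?" seems to be yes; my question is more about how far the idea can be pushed.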