“A significant component of the DQN training algorithm is a mechanism called experience replay. Transitions experienced from interacting with the environment are stored in the experience replay memory. These transitions are then uniformly sampled from to train on in an offline manner. From a theoretical standpoint this breaks the strong temporal correlations that would affect learning online.” - torch, Dueling Deep Q-Networks
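That does not appear to be online learning: by its own description, training happens offline on stored transitions rather than on the live stream of experience.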
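For readers unfamiliar with the mechanism in the quote, here is a minimal sketch of an experience replay memory. It is not the torch implementation; the names `Transition`, `ReplayMemory`, `push` and `sample`, and the toy integer states, are assumptions made purely for illustration. It shows the two properties the quote describes: transitions are stored as they occur, and training batches are drawn uniformly at random from the store rather than in the order they were experienced.

```python
import random
from collections import deque, namedtuple

# Hypothetical record type for one step of interaction with an environment.
Transition = namedtuple(
    "Transition", ["state", "action", "reward", "next_state", "done"]
)


class ReplayMemory:
    """Fixed-size store of past transitions, sampled uniformly at random."""

    def __init__(self, capacity, seed=None):
        # A deque with maxlen silently drops the oldest item when full.
        self.buffer = deque(maxlen=capacity)
        self.rng = random.Random(seed)

    def push(self, state, action, reward, next_state, done):
        """Store one transition as it is experienced."""
        self.buffer.append(Transition(state, action, reward, next_state, done))

    def sample(self, batch_size):
        """Draw a batch uniformly, without replacement, ignoring time order."""
        return self.rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


# Toy usage: integer "states" stand in for real observations.
memory = ReplayMemory(capacity=5, seed=0)
for t in range(8):
    memory.push(state=t, action=0, reward=0.0, next_state=t + 1, done=False)

batch = memory.sample(3)  # three stored transitions, in shuffled order
```

Because the batch is drawn from the stored buffer rather than from the current moment of interaction, the network is in effect trained on old data, which is the sense in which the quote calls it offline.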
“In both the brain, and in deep networks, learning is not immediate. HTM abstracts the forming and strengthening of synapses into a single step process, but it’s actually a multiple-step chain of biochemical interactions that requires many repetitions of the pattern you want to learn. So each time a neuron sees a pattern, its connections are only updated by a small amount, and many repetitions are required to fully solidify the pattern detector. Deep networks are trained in an analogous way, where each presentation of a pattern only updates the synapses a small amount.”
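The "small update per presentation" idea in the quote can be illustrated with a toy sketch. This is an assumption-laden example, not how HTM or any particular deep network is implemented: a single weight is nudged toward a target by a fraction (`learning_rate`) of the remaining error each time the pattern is presented, and the hypothetical helper `repetitions_to_learn` counts how many presentations it takes to get within a tolerance.

```python
def repetitions_to_learn(learning_rate, target=1.0, tolerance=0.01):
    """Count presentations until the weight is within tolerance of target.

    Each presentation moves the weight by learning_rate times the
    remaining error (a simple delta-rule update on one weight).
    """
    weight = 0.0
    repetitions = 0
    while abs(target - weight) > tolerance:
        weight += learning_rate * (target - weight)
        repetitions += 1
    return repetitions


# A small step size needs many repetitions of the same pattern.
slow = repetitions_to_learn(learning_rate=0.1)

# A step size of 1.0 is the "single step" abstraction: one presentation.
one_shot = repetitions_to_learn(learning_rate=1.0)
```

With a small learning rate the remaining error shrinks geometrically, so dozens of repetitions are needed, whereas a learning rate of 1.0 learns the pattern in one presentation.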
Within seconds, brain tissue undergoes structural changes that can affect performance. Even without a hippocampus, short-term memory works and learning can occur; it is just that what is learned is quickly forgotten and not permanent. The need for the hippocampus probably has to do with metaplasticity rules in the brain. If neurons in the higher areas are active over longer periods of time than those in areas lower in the hierarchy, they must have different requirements for making changes to permanence. Metaplasticity can supposedly solve catastrophic forgetting and drastically improve the memory capacity of a neural system.
As regards online learning, the brain can learn while it is active in the environment and can change and adapt to novel information quickly, even within seconds, without being taken offline. That is, it can undergo changes, even drastic ones, without interrupting waking activity. True, in some cases it takes time to improve performance, but in the simpler cases even drastic improvements in performance are possible within seconds.
“There are 30 million seconds in a year, and if we assume we process input at 10Hz or more, that’s hundreds of millions of training examples per year, and humans don’t become useful until after more than a decade of training.”
Those are not millions of unique labelled examples. A baby may spend most hours asleep, and in the few hours it is awake it may spend a lot of time looking at one or two toys, perhaps even a blank wall. The number of unique voices and sentences it hears can be quite limited. Yet in a few years it will eclipse most anything, and some children can even do advanced mathematics and speak multiple languages.