This is the essence of all online learning algorithms, DL and HTM included. The training set is the continuously arriving input stream. When the statistics of the input stream change, the model adjusts to fit the new patterns. DQN is one example of learning online*, and if the task domain changes (let’s say you change how rewards are delivered), then the model will adapt to that. In fact, this is the idea behind a technique in RL called “reward shaping”: you reshape the reward signal so the network faces an easier task at first and learns useful preliminary behaviors, then progressively withdraw that scaffolding to coax it into learning more complex skills.
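To make that concrete, here’s a minimal sketch of what “the training set is the stream” means in practice. This is a toy linear model trained by plain SGD, not DQN or HTM, and every name and number in it is illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)

def stream(n=10_000, drift_at=5_000):
    """Yield (x, y) pairs whose underlying rule changes partway through."""
    w_true = np.array([1.0, -2.0, 0.5, 3.0])
    for t in range(n):
        if t == drift_at:                        # the world changes here
            w_true = np.array([-1.0, 0.5, 2.0, -0.5])
        x = rng.normal(size=4)
        yield x, w_true @ x + 0.1 * rng.normal()

w = np.zeros(4)   # model weights
lr = 0.05         # learning rate

for x, y in stream():
    err = w @ x - y      # prediction error on the current sample only
    w -= lr * err * x    # immediate SGD step; no fixed training set anywhere
```

After the drift the model’s error spikes and then falls again as the weights track the new rule; that re-convergence is the “adjusts to fit the new patterns” above.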
So HTM has no monopoly on adapting to a changing world. It does so in a much more biologically appealing way, certainly, and as a result it has the potential to explain the brain, which most deep networks do not. But the capability to adapt is not beyond mainstream machine learning. As I mentioned above, however, HTM has a silver bullet that should in principle help it adapt more quickly to a changing world: the sparsity of its learning updates. Each HTM synaptic update is less likely to interfere with previously learned patterns than a dense update in a standard deep network would be, so you can be more aggressive with your updates and therefore learn faster.
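Here’s a quick way to see why sparse updates interfere less (a toy sketch, not Numenta’s actual learning rule): for a linear unit the weight update is proportional to the input activity, so if that activity is, say, 2% sparse, 98% of the weights are left exactly as they were:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000
dense_x = rng.normal(size=n)                     # dense activity: everything moves
sparse_x = np.zeros(n)
active = rng.choice(n, size=20, replace=False)   # ~2% active, SDR-style
sparse_x[active] = 1.0

lr, err = 0.1, 0.5
dense_step = lr * err * dense_x     # perturbs all 1000 weights
sparse_step = lr * err * sparse_x   # perturbs only the 20 active ones

print((dense_step != 0).sum(), (sparse_step != 0).sum())   # 1000 20
```

Anything learned from inputs that don’t overlap those 20 active units is untouched, which is what licenses the more aggressive updates.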
I really do view the benefits of sparsity as the sole performance advantage of HTM. But that advantage is beyond massive, so it is certainly worth exploiting.
(*DQN is a bit of a special case because of experience replay: when the task changes, the replay buffer goes stale, full of data about the world as it used to be rather than as it is now. A better example would be A3C, the new-ish hotness in reinforcement learning, which makes on-policy updates from present experience instead of a replay buffer, in a much more “online” way.)
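To see the staleness problem concretely, here’s a toy illustration with no real environment or network, just a buffer of labeled transitions:

```python
from collections import deque

buffer = deque(maxlen=1000)    # off-policy: transitions are kept for reuse

for t in range(2000):
    world = "old" if t < 1500 else "new"   # the task changes at t = 1500
    buffer.append(world)                   # DQN-style: store and replay later
    fresh = world                          # A3C-style: use it now, then discard

stale = buffer.count("old") / len(buffer)
print(f"{stale:.0%} of the replay buffer still describes the old world")  # 50%
```

Five hundred steps after the change, half of every sampled minibatch would still describe the world as it used to be, while the on-policy learner never touches those transitions again.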