Hi, I work with "traditional" NNs in the "ML world", and I've noticed people emphasizing the distinction between continuous online learning and offline learning, so I'd like to comment on this.
First of all, one way to prevent overfitting is to enlarge the training dataset to the point where no training input pattern is seen more than once. This can be done, for example, by continuously distorting or transforming patterns in the original training dataset. If we feed the network one such pattern at a time (rather than a batch of patterns, as is usually done for efficiency), the learning effectively becomes "online". Therefore, we can say that online learning should lead to less overfitting. Overfitting can still happen in this scenario if the patterns seen so far are very similar to each other.
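To make the idea concrete, here is a minimal sketch of that kind of online augmentation stream. The `distort` function is a hypothetical placeholder (small additive Gaussian noise stands in for whatever distortion/transformation you'd use); the point is just that each step emits a fresh variant, so no exact input repeats:

```python
import numpy as np

rng = np.random.default_rng(0)

def distort(pattern, rng, noise_scale=0.1):
    """Hypothetical augmentation: return a randomly perturbed copy
    of a pattern (small additive Gaussian noise)."""
    return pattern + rng.normal(0.0, noise_scale, size=pattern.shape)

# A tiny "original" dataset of 3 patterns.
dataset = [np.ones(4), np.zeros(4), np.full(4, 0.5)]

def stream(dataset, rng, steps):
    """Online-style stream: each step picks one source pattern and
    emits a fresh distortion of it, one pattern at a time."""
    for _ in range(steps):
        base = dataset[rng.integers(len(dataset))]
        yield distort(base, rng)

seen = [tuple(np.round(p, 6)) for p in stream(dataset, rng, 100)]
# With continuous noise, exact repeats essentially never occur.
print(len(set(seen)))  # 100 distinct inputs from only 3 originals
```

Feeding such a stream one pattern at a time is exactly the "online" regime described above, even though the underlying dataset is static.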
A network might also perform poorly for other reasons: perhaps it hasn't been trained long enough, or it doesn't have enough weights (parameters, synapses, neurons, etc.) to encode all the important features of the patterns (this would be "underfitting"), or it hasn't been set up properly (wrong initial conditions, learning rate, regularization, etc.). These reasons apply equally to online and offline learning, because there is no fundamental difference between the two learning types.
Second, dropout regularization is nothing more than random "cell death". If an ANN is large enough, we can remove a random subset of its neurons (up to 50%, or even more) every time we show it a pattern, and it will learn better as a result (though more slowly). I don't quite understand why this technique wouldn't help HTM learn better features (more fundamental or robust associations/transitions)?
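For reference, the per-pattern "cell death" can be sketched in a few lines of NumPy. This is a generic single-hidden-layer example (the layer sizes and ReLU are arbitrary choices for illustration), using the standard inverted-dropout scaling so the expected activation stays the same with or without the mask:

```python
import numpy as np

rng = np.random.default_rng(42)

def dropout_mask(n_units, drop_prob, rng):
    """Boolean mask drawn fresh for each presented pattern:
    True = unit survives, False = unit is 'dead' this time."""
    return rng.random(n_units) >= drop_prob

def forward(x, w, mask, drop_prob):
    """One hidden layer with random cell death: dropped units output 0;
    surviving units are rescaled by 1/(1 - drop_prob)."""
    h = np.maximum(0.0, x @ w)           # ReLU hidden activations
    return h * mask / (1.0 - drop_prob)  # zero the dead units, rescale the rest

x = rng.normal(size=8)            # one input pattern
w = rng.normal(size=(8, 16))      # weights of a 16-unit hidden layer
mask = dropout_mask(16, drop_prob=0.5, rng=rng)
out = forward(x, w, mask, drop_prob=0.5)

# Dropped units contribute exactly zero for this pattern.
print(np.all(out[~mask] == 0.0))  # True
```

Because a different random subset dies on every presentation, no single unit can be relied on, which is what pushes the network toward redundant, more robust features.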
EDIT: I forgot to bring up the most important bit:
What happens if you use HTM on a static dataset? If you repeat the data enough times, I think the model will eventually memorize every transition and give you perfect predictions on the training data, but that performance will not carry over to a novel dataset.
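The worry can be illustrated with a toy transition memorizer (this is not HTM, just a stand-in that stores every observed next-step transition, which is my assumption about what repeated exposure to a static sequence converges to):

```python
from collections import defaultdict

def train_transitions(sequence, epochs):
    """Toy stand-in for a sequence memory: record every observed
    transition. Repeating a static sequence enough times stores
    all of its transitions."""
    table = defaultdict(set)
    for _ in range(epochs):
        for a, b in zip(sequence, sequence[1:]):
            table[a].add(b)
    return table

train = "ABCDAB"  # static training sequence, shown repeatedly
table = train_transitions(train, epochs=10)

# Perfect next-step "prediction" on the training data...
print(all(b in table[a] for a, b in zip(train, train[1:])))  # True
# ...but a transition never seen in training is simply unknown.
print("B" in table["D"])  # False: D -> B was never observed
```

The memorizer is flawless on the data it was trained on yet has nothing to say about novel transitions, which is exactly the overfitting-without-generalization failure mode described above.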
This is a problem, in my opinion, because the human brain generalizes extremely well, especially through repetition.
I haven't yet read the "Continuous Online Sequence Learning" paper, so I apologize if these concerns have been addressed there.