HTM underfit/overfit


#1

As everyone knows, HTM does not perform well on certain specific tests. In ML terms, is it underfitting or overfitting?

As I understand it, it is overfitting. So two questions arise:

  1. Would it improve if we train on more (huge…) data?
  2. Why doesn’t dropout work?

#2

Hi @roy.rm.gu,

Welcome! I’d just like to weigh in here and say that HTM systems don’t suffer from over/underfitting because they aren’t pre-trained; they are fully online learners that learn from the data being fed in. As the data changes its characteristics, the HTM begins to learn the new input, discerning spatial and temporal relationships between the data points being streamed. Nothing at all prevents it from learning a completely new set of data at any point in time.

While I have little to no experience with classical ML NNs, I’m pretty confident that what I’m saying is true. More experienced responders may corroborate or refute what I’m saying, I don’t know…

Also, you can add up to 50% noise (I believe that’s the figure), or have 50% cell death and the HTM will still compensate and recover… I don’t have the link to @ycui’s experiment with this (and there are others), but maybe @rhyolight can find it?


#3

@cogmission It’s in @ycui’s recent paper: “Continuous Online Sequence Learning with an Unsupervised Neural Network Model”

http://www.mitpressjournals.org/doi/abs/10.1162/NECO_a_00893#.WHZw3NUrKpr

Figure 9 is the one you’re thinking of, but this paper shows other comparisons with classic ML techniques as well.


#4

Thanks Christy! (@cmaver)

I see from that paper that my figure of 50% was off - it’s 30%. Anyway, there are quite a few distinguishing features not found in classical NNs that make HTMs really interesting! I’ve also seen other talks/writings that speak of the robustness of HTMs besides that recent experiment… very cool stuff!


#5

@roy.rm.gu Interesting question. I don’t have a clean answer to your question. There are several things to consider.

First, the training scheme of HTM is quite different from traditional ML. Since we consider prediction tasks on streaming data (with no buffer at all), there are no separate training and testing datasets. The performance of HTM improves initially as a function of training time (which is proportional to the amount of streaming data it has seen) and eventually stabilizes if the statistics of the data stream don’t change. However, it is important to note that this is different from having a huge, static training dataset as in most machine learning applications. In general the model won’t overfit as long as the noise is non-repeatable and the parameters are set properly. Only the repeatable part of the sequence will be remembered.

Second, there is a tradeoff between flexibility and stability, which is controlled by some learning rate parameter in any continuous online learning algorithm. If the learning rate in HTM is high, the model adapts quickly to changes but may not have good stable performance. The bad performance can be thought of as overfitting in this case, because the model is trying to memorize noise that isn’t really repeatable. If you have a lot of training data and you know the statistics don’t change much, you can use a low learning rate. I guess the performance will improve in that case.
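
To make the flexibility/stability tradeoff concrete, here is a minimal sketch using a plain exponential moving average as the online learner. This is not HTM; it is only a generic stand-in, and the function names, the noise level, and the two learning rates (0.5 and 0.02) are assumptions chosen for illustration.

```python
import random

def run_online_mean(stream, learning_rate):
    """Track a stream with an exponential moving average; return all estimates."""
    estimate = 0.0
    estimates = []
    for x in stream:
        # Move the estimate toward the new sample by a fraction `learning_rate`.
        estimate += learning_rate * (x - estimate)
        estimates.append(estimate)
    return estimates

def mean_abs_error(estimates, targets):
    """Average absolute gap between the estimates and the true signal."""
    return sum(abs(e - t) for e, t in zip(estimates, targets)) / len(targets)

rng = random.Random(0)
# True signal: 1.0 for 500 steps, then a jump to 5.0 (the statistics change).
truth = [1.0] * 500 + [5.0] * 500
# Observations are the true signal plus non-repeatable Gaussian noise.
noisy = [t + rng.gauss(0.0, 1.0) for t in truth]

fast = run_online_mean(noisy, learning_rate=0.5)   # flexible, but chases noise
slow = run_online_mean(noisy, learning_rate=0.02)  # stable, but slow to adapt

# Stable regime (steps 300-499): the statistics are not changing.
stable = slice(300, 500)
fast_stable = mean_abs_error(fast[stable], truth[stable])
slow_stable = mean_abs_error(slow[stable], truth[stable])

# Right after the change (steps 500-529).
change = slice(500, 530)
fast_change = mean_abs_error(fast[change], truth[change])
slow_change = mean_abs_error(slow[change], truth[change])

print("stable regime: fast", fast_stable, "slow", slow_stable)
print("after change:  fast", fast_change, "slow", slow_change)
```

In this toy setup the high learning rate gives the larger error while the stream is stable, because it keeps fitting the noise, and the smaller error right after the jump, because it adapts sooner.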

Finally, what happens if you use HTM on a static dataset? If you repeat the data enough times, I think the model will eventually memorize every transition and give you perfect predictions on the training data, but the good performance will not translate to a novel dataset. This is definitely overfitting in my opinion. There are no easy ways to get around this problem in HTM. We don’t have effective regularization or dropout techniques in HTM yet. It is an interesting research direction to explore.
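
The static-dataset scenario can be illustrated with a toy sequence memorizer. This is a plain lookup table keyed on the last two symbols, not HTM's temporal memory; the sequences, the context length `ORDER`, and the function names are all made up for the example.

```python
def train(table, sequence, order):
    """Memorize which symbol followed each context of `order` symbols."""
    for i in range(order, len(sequence)):
        context = tuple(sequence[i - order:i])
        table[context] = sequence[i]

def accuracy(table, sequence, order):
    """Fraction of next-symbol predictions that match the sequence."""
    hits = 0
    total = len(sequence) - order
    for i in range(order, len(sequence)):
        context = tuple(sequence[i - order:i])
        # An unseen context yields None, which counts as a miss.
        if table.get(context) == sequence[i]:
            hits += 1
    return hits / total

ORDER = 2                         # hypothetical context length
train_seq = list("ABCDACBD")      # the static "training" dataset
novel_seq = list("ABDCADBC")      # same symbols, different transitions

table = {}
for _ in range(10):               # repeat the static dataset several times
    train(table, train_seq, ORDER)

train_acc = accuracy(table, train_seq, ORDER)
novel_acc = accuracy(table, novel_seq, ORDER)
print("training accuracy:", train_acc)
print("novel accuracy:", novel_acc)
```

Every transition in the training sequence is memorized, so predictions on it are perfect, while none of that carries over to the novel sequence because learning is switched off when it is evaluated.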


#6

I believe I disagree with this even though I have no real training or experience to bring to bear on this subject. I thought the definition of “overfitting” was to configure your model to perform really well for a given dataset and then find that it was so specifically configured (either by adjustment of parameters or some other means), that it no longer is able to extrapolate and be applied to unequal but similar problems?

If that definition holds, wouldn’t an HTM merely adapt to data representing a new problem domain without “getting stuck” with only being suitable to one specific data set? Wouldn’t the permanences of the SP gradually change in their entirety and wouldn’t the synapses for the TM specifically set for previous data get culled (deleted) and others would replace them in a gradual relearning of the new problem domain - on the fly?
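
The gradual relearning described here can be sketched with a single toy set of synapse permanences. This is a heavy simplification that leaves out column competition, boosting, and the TM's segment handling; the parameter names and values (`increment`, `decrement`, `threshold`) are assumptions for illustration, not settings taken from any HTM implementation.

```python
def update_permanences(permanences, active_inputs, increment=0.1, decrement=0.05):
    """Strengthen synapses to active inputs and weaken all the others, in place."""
    for i in range(len(permanences)):
        if i in active_inputs:
            permanences[i] = min(1.0, permanences[i] + increment)
        else:
            permanences[i] = max(0.0, permanences[i] - decrement)

def connected(permanences, threshold=0.5):
    """Indices of synapses whose permanence is at or above the threshold."""
    return {i for i, p in enumerate(permanences) if p >= threshold}

permanences = [0.3] * 8       # eight potential synapses, all below threshold
old_pattern = {0, 1, 2}       # input bits active in the "old" data
new_pattern = {5, 6, 7}       # input bits active in the "new" data

for _ in range(10):           # stream the old data for a while
    update_permanences(permanences, old_pattern)
after_old = connected(permanences)

for _ in range(30):           # then the data changes; learning stays on
    update_permanences(permanences, new_pattern)
after_new = connected(permanences)

print("connected after old data:", sorted(after_old))
print("connected after new data:", sorted(after_new))
```

With learning left on, the synapses tuned to the old pattern decay below the threshold and the ones matching the new pattern take over.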

Btw, please understand that me challenging you on this is just my way of learning in my obstinate way :wink:


#7

@cogmission I agree with you: if you let HTM learn on the new (testing) dataset, it will eventually perform well there. By “using HTM on a static dataset”, I am trying to think in terms of classical machine learning applications, where you train your model on one large static dataset, and then turn off learning and evaluate its performance on a separate test dataset. We don’t use HTM in this way, but this is how machine learning is done in most cases. This is where the distinction between overfitting and underfitting clearly makes sense. It is hard to think about overfitting in continuous learning scenarios.


#8

@ycui

Thanks for taking the time and helping me get clarity on this. You guys are always so responsive and courteous - I appreciate it! This community has always been a safe space to learn and communicate, and it’s because of the many patient interactions like this one!


#9

Hi, I work with “traditional” NNs in the “ML world”, and I’ve noticed people emphasizing the distinction between continuous online learning and offline learning, so I’d like to comment on this.

First of all, one of the ways to prevent overfitting is to enlarge the training dataset to the point where no training input pattern is seen more than once. This could be done, for example, by continuously distorting or transforming patterns in the original training dataset. If we feed the network one such pattern at a time (rather than a batch of patterns, as is usually done for efficiency), the learning effectively becomes “online”. Therefore, we can say that “online learning” should lead to less overfitting. Overfitting can still happen in this scenario if the patterns the network has seen so far are very similar to each other.

A network might not perform well for other reasons: perhaps it hasn’t been trained enough, or it does not have enough weights (or parameters, synapses, neurons, etc.) to encode all important features of the patterns (this would be “underfitting”), or it hasn’t been properly set up (wrong initial conditions, learning rate, regularization, etc.). These reasons apply equally well to both online and offline learning networks, because there is no fundamental difference between the two learning types.
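
As a rough sketch of the "continuously distorting" idea, a generator can hand out one freshly jittered pattern at a time, so that no exact pattern repeats. The base patterns, the Gaussian jitter, and the `noise_std` value are assumptions picked for the example; real augmentation would use transformations suited to the data.

```python
import random

def distorted_stream(base_patterns, noise_std, rng):
    """Yield an endless stream of jittered copies of the base patterns."""
    while True:
        pattern = rng.choice(base_patterns)
        # Add independent Gaussian jitter to every element of the pattern.
        yield tuple(x + rng.gauss(0.0, noise_std) for x in pattern)

rng = random.Random(0)
base = [(0.0, 1.0, 0.0), (1.0, 0.0, 1.0)]   # a tiny "original" dataset
stream = distorted_stream(base, noise_std=0.1, rng=rng)

# Consume the stream one pattern at a time, as an online learner would.
samples = [next(stream) for _ in range(1000)]
unique_count = len(set(samples))
print("samples drawn:", len(samples), "distinct patterns:", unique_count)
```

Two base patterns are enough to produce a stream in which every sample is distinct, although the samples remain very similar to each other, which is the case where overfitting can still occur.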

Second, dropout regularization is nothing more than random “cell death”. If an ANN is large enough, we can remove a random subset of neurons (up to 50%, or even more) every time we show it a pattern, and it will learn better as a result (but more slowly). I don’t quite understand why this technique wouldn’t help HTM learn better features (more fundamental or robust associations/transitions).
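
For readers who haven't seen it, here is a minimal sketch of dropout applied to one layer's activations. It uses the common "inverted" variant, where surviving activations are scaled up so their expected value is unchanged; the function name and signature are made up for this example and do not come from any particular library.

```python
import random

def dropout(activations, drop_prob, rng, training=True):
    """Randomly zero activations during training; pass them through otherwise."""
    if not 0.0 <= drop_prob < 1.0:
        raise ValueError("drop_prob must be in [0, 1)")
    if not training or drop_prob == 0.0:
        return list(activations)
    keep_prob = 1.0 - drop_prob
    # Each unit survives with probability keep_prob; survivors are scaled
    # by 1 / keep_prob so the expected activation stays the same.
    return [a / keep_prob if rng.random() < keep_prob else 0.0
            for a in activations]

rng = random.Random(0)
activations = [1.0, 2.0, 3.0, 4.0]

train_out = dropout(activations, drop_prob=0.5, rng=rng)                 # random "cell death"
test_out = dropout(activations, drop_prob=0.5, rng=rng, training=False)  # untouched
print("training pass:", train_out)
print("test pass:", test_out)
```

A different random subset of units is silenced on every training pass, and at test time all units are used.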

EDIT: I forgot to bring up the most important bit:

“What happens if you use HTM on a static dataset? If you repeat the data enough times, I think the model will eventually memorize every transition and give you perfect predictions on the training data, but the good performance will not translate to a novel dataset.”

This is a problem, in my opinion, because a human brain generalizes extremely well, especially through repetition.

I haven’t yet read the “Continuous Online Sequence Learning” paper, so I apologize if these concerns have been addressed there.