Continuous Online Learning

I just read a nice paper [1], which reminded me that sometimes people on this forum say that continuous online learning is a distinguishing feature of HTM-based models. That opinion probably comes from a lack of familiarity with relevant mainstream research [2].

[1] https://openreview.net/forum?id=HyxAfnA5tm
[2] http://www.arxiv-sanity.com/search?q=online+learning

1 Like

It is. Also, by distinguishing/distinctive I mean it’s a first-class feature and, most importantly, not that it is exclusive to HTM. I think that is what is always emphasized in this forum, at least from what I have observed.

You must be new here. There have been countless discussions about how HTM is both different from and superior to DL because it employs online learning.

Relatively new, I guess. I have not encountered anything like that, though. It’s good to know your observation, and thanks for sharing the links.

I did some DL work before I became focused on HTM, and I don’t make the same argument with respect to online learning, but I can agree that the first thing that comes to my mind when I hear HTM ML is online learning. Superior? There has not been enough work yet to prove it is superior to NN-based ML, in my opinion.

The first thing that comes to my mind is that HTM is supposed to be “like a brain”. To me, online learning is neither an especially interesting nor a unique feature of a brain. I’m more interested in how the brain forms long-term memories, how it accesses (and quickly modifies) them, and how it organizes higher-level concepts. How does it build a “world model” or acquire “common sense”? These are important intelligence features, which current ML methods can’t model or even mimic, and I think that HTM can actually get there first.

@michaelklachko
From the paper you linked:
In principle, commonly used gradient-based learning methods, such as SGD, can easily be used as online learning algorithms (Bottou, 1998). In practice, their performance with deep neural network function approximators is limited (Sahoo et al., 2017): such high-dimensional models must be trained with batch-mode methods, minibatches, and multiple passes over the data

Fair call, this is what I have seen in the models I have looked at.

The system described in the paper is still RL-based, so there must be some teacher, and something has to pick which of the Chinese-menu priors it is adding the learning to. I did not see where the system builds its own Chinese menu from scratch. Unless I am badly misunderstanding what I read - they trained the model up first (the prior) using offline learning and then switched in different model segments for training as needed - this prior was what was tuned with online gradient descent as the new training data was encountered. Calling this online learning is a very different thing from what HTM does online from a naive model.

Based on this I have not been inspired to dig through the collection offered in your second link, but perhaps you are familiar with that material and can offer your best model to illustrate something that is as capable as HTM?

3 Likes

In order to achieve state-of-the-art results, yes, currently you want to use batch SGD with deep NNs. This does not mean that it won’t work in an online learning setting (batch size of 1) - it still works, just not as well. For example, that Sahoo et al. paper says “yes, it’s difficult to do online learning, but here is a novel approach that makes it work better.” That’s pretty much the point of any online learning research, including HTM.

It seems you’re confusing “supervised vs unsupervised” learning with “online vs offline” learning. They are separate concepts, and one does not require, or follow from, the other. If you’re interested in the former, you should try training an LSTM for language modeling (an unsupervised task) while backpropagating the error and updating parameters after each sample (online learning). Probably won’t work as well as batch training, but it should still work. Here we’re discussing the latter (the online learning aspect).
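
To make that concrete, here is a rough toy sketch of the kind of per-sample (“batch size of 1”) training loop I mean, for a character-level LSTM language model. This is my own illustration, not code from either paper; the byte-level vocabulary, layer sizes, learning rate, and the repeated text stream are placeholder assumptions.

```python
# Toy sketch: online (per-sample) training of a character-level LSTM LM.
# All sizes and data below are illustrative assumptions, not from the papers.
import torch
import torch.nn as nn

vocab_size, embed_dim, hidden_dim = 256, 32, 128   # bytes as the "vocabulary"

class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, vocab_size)

    def forward(self, x, state=None):
        h, state = self.lstm(self.embed(x), state)
        return self.head(h), state

model = TinyLM()
opt = torch.optim.SGD(model.parameters(), lr=0.1)
loss_fn = nn.CrossEntropyLoss()

text = "the quick brown fox jumps over the lazy dog " * 50
stream = torch.tensor([ord(c) for c in text], dtype=torch.long)

seq_len = 32
for i in range(0, len(stream) - seq_len - 1, seq_len):
    x = stream[i : i + seq_len].unsqueeze(0)          # one sample at a time
    y = stream[i + 1 : i + seq_len + 1].unsqueeze(0)  # next-character targets
    logits, _ = model(x)
    loss = loss_fn(logits.reshape(-1, vocab_size), y.reshape(-1))
    opt.zero_grad()
    loss.backward()
    opt.step()                                        # update after every sample
```

The only difference from the usual recipe is that the parameters are updated after every single sample instead of after a minibatch; nothing about the task is supervised by a teacher.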

The prior in this case is the initial weights. The model has been initialized in such a way that when it has to learn online it learns quickly. It does not mean it learned to perform those different tasks in advance.
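
To illustrate the general idea, here is a minimal Reptile-style meta-learning sketch (not the algorithm from the linked paper): meta-training only shapes the initial weights, and at deployment the model still learns the new task online, one gradient step per sample. The sine-wave task family, network size, and step counts are illustrative assumptions.

```python
# Reptile-style sketch of "the prior is the initial weights" (illustrative only).
import copy
import math
import torch
import torch.nn as nn

def sample_sine_task():
    amp = torch.rand(1) * 4.5 + 0.5
    phase = torch.rand(1) * math.pi
    return lambda x: amp * torch.sin(x + phase)

model = nn.Sequential(nn.Linear(1, 64), nn.Tanh(), nn.Linear(64, 1))
meta_lr, inner_lr, inner_steps = 0.1, 0.01, 5

# Meta-training: shape the prior (the initialization), task by task.
for _ in range(1000):
    task = sample_sine_task()
    fast = copy.deepcopy(model)
    opt = torch.optim.SGD(fast.parameters(), lr=inner_lr)
    for _ in range(inner_steps):               # a few SGD steps on this task
        x = torch.rand(10, 1) * 10 - 5
        loss = ((fast(x) - task(x)) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    with torch.no_grad():                      # Reptile meta-update:
        for p, q in zip(model.parameters(), fast.parameters()):
            p += meta_lr * (q - p)             # nudge the init toward the adapted weights

# "Deployment": online learning from that prior, one sample per update.
task = sample_sine_task()
opt = torch.optim.SGD(model.parameters(), lr=inner_lr)
for _ in range(20):
    x = torch.rand(1, 1) * 10 - 5
    loss = ((model(x) - task(x)) ** 2).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
```

The point is that the “prior” lives entirely in where the weights start, not in a pre-trained solution to the deployment task.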

To me, systems with strong priors are actually more brain-like than HTM: a human is most likely primed for quick learning when born, just like this model.

I’m not sure what you mean here, current HTM-based models are not really capable at any task. Any mainstream ML model will destroy an HTM-based model if we care about performance on any particular benchmark.
Need I remind you of that time when the Nupic team entered a Kaggle anomaly detection competition? For those who don’t remember, the Nupic team couldn’t even get into the top 100 on the scoreboard: American Epilepsy Society Seizure Prediction Challenge | Kaggle, which is embarrassing, considering anomaly detection is the task HTM is most suited to.

Having said that, I’m not here to criticize HTM. I’d rather discuss my previous comment, and how that could be incorporated into HTM theory.

2 Likes

I strongly doubt that in the case of the cortex. “Priors” might be located in other structures (and modulate the cortical learning process via BG->thalamic matrix->pia). But a cortical column can’t have priors; otherwise, the cortex’s flexibility would be gone.

BTW, I think that we are quite slow learners at “boot time” :slight_smile:

2 Likes

Ooo gauntlet thrown :wink:

BTW, do you know if there is documentation of the strategy the team used in the competition? I am curious to see if I can understand what about that particular case made it difficult to perform as well as many others did (encoding concerns, fields involved, pattern complexity, repeating inputs, etc.).

4 Likes

I don’t remember; Fergal led that team, so you might want to ask him.

1 Like

Most likely the HTM model ignores many learning modes, methods, and dynamics employed by a human brain at boot time. There are imitation learning, reinforcement learning, semi-supervised learning, self-supervised learning, long/short-term memory interactions, attention, and probably many other things we don’t even know about yet.

Actually, a particular structure (or structures) of a cortical column might serve as a “prior”. NNs used in RL are a lot less structured (they are typically just fully connected layers).

1 Like

The reference to supervision goes to the nature of the online learning. HTM learns online the delta against the contents of its memory, whereas RL has to get some feedback on whether it is doing the right thing or not. The nature of what is learned by HTM’s online learning is distinctly different. RL spoon-feeds information from the trainer, so I don’t see this as running unattended. HTM can be left to its own devices and learns purely from the data that is being fed to it. It is a different degree of “online”, in the sense of an autonomous agent.

As far as the utility of what operation is performed - I agree to a certain point. HTM as it exists is not a complete system; it must be combined with something more to make it do something useful.

I violently disagree with your assertion that the priors are encoded in the cortex. I firmly believe that this is completely the realm of the old lizard brain. The cortex is a pure data sponge.

These two threads outline the broad strokes of what I think is necessary to harness the cortex algorithm into a functional system. There are many moving parts. Many of the details are not included; some are elucidated elsewhere, some are a work in progress.
This is the “big picture” view:

It takes the accompanying posts to explain how the various blocks work.

This post is the broad strokes of what is in the orange block:

You demonstrate an analytical mind; I am interested in your criticisms of my thoughts on this.

2 Likes

I think you are right: HTM/CLA ignores most details in the cortical column. High-level considerations can be modeled there, but at lower levels many details (such as bursting dynamics, neuromodulation, etc.) aren’t considered yet. Nevertheless, it is a step in the right direction. I think it is the cornerstone on which many other pieces will be built.

In any case, my point was about neuroscience. The structure of the cortex is regular, and although you can somehow “define” separate regions [1], I think most of that is a consequence of the “sensory” input [2]. Ergo, if the cortical columns are regular, the only prior there is the “base” algorithm, which is agnostic to the data.

[1] M. F. Glasser, T. S. Coalson, E. C. Robinson, C. D. Hacker, J. Harwell, E. Yacoub, K. Ugurbil, J. Andersson, C. F. Beckmann, M. Jenkinson, S. M. Smith, and D. C. Van Essen, “A multi-modal parcellation of human cerebral cortex,” Nature, vol. 536, no. 7615, pp. 171–178, 2016.

[2] N. V. De Marco García, R. Priya, S. N. Tuncdemir, G. Fishell, and T. Karayannis, “Sensory inputs control the integration of neurogliaform interneurons into cortical circuits,” Nat. Neurosci., vol. 18, no. 3, pp. 393–401, Mar. 2015.