Learning long-range dependencies

adepesters · June 2, 2018, 2:40pm

Hi everyone,

I have a question regarding the way HTM learns long-range dependencies, and in particular how it compares to the way LSTM does it.

It is my understanding that in order to learn long-range dependencies between two events A and B separated by x number of events, HTM would need to memorize the entire sequence of x events, even though those events may not be causally related to A or B. In contrast, LSTM does not need to learn the entire sequence, thanks to its gating mechanism. Rather, it learns a direct relationship between A and B. Am I understanding correctly that currently, in the particular scenario where the x events are not causally related to A or B, HTM cannot actually learn a direct connection between A and B?

I am aware of the good results achieved by HTM on learning long sequences, as shown by Cui et al. (2016). However, the need for HTM to learn the entire sequence, instead of learning a direct connection between A and B, seems like an important limitation. Would you agree?

Thank you very much

sheiser1 · June 2, 2018, 3:25pm

It seems you’re right, especially if the x-event sequences between A and B are not steadily predictable. Even if they were, the longer they get, the more repetitions of the overall sequence the TM would need to learn that total context. I think connecting A to B in those cases at least calls for minimal noise and more repetitions.

sunguralikaan · June 2, 2018, 9:08pm

Definitely a valid criticism and an actual problem in many cases.

There is actually a way to do it with HTM, but I do not think it is practical for most use cases. Temporal Memory ™ can be configured so that the currently active cells representing B at time t form connections not only to the cells that were active at time t-1 (the last of the x events) but also to cells that were active at earlier steps (t-n). If you run it long enough, the x events would not be captured as causes of B, but A would. However, this is not how vanilla TM works, and if it is configured this way, the predictions caused by a single activation may not be useful because of the many false positives.
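
To make the idea concrete, here is a toy sketch in Python. It is not HTM code and does not use any NuPIC API: cells, SDRs and dendritic segments are replaced by plain symbol counts, and every name in it (`make_sequence`, `train`, `followers`, `max_lag`) is made up for illustration. It only shows the effect of letting the current event link back to events at t-1 through t-n instead of t-1 alone, on sequences where A1 is always followed by B1 and A2 by B2 after three random filler events.

```python
import random
from collections import defaultdict

def make_sequence(start, end, gap, fillers, rng):
    """Return [start, <gap random filler events>, end]."""
    return [start] + [rng.choice(fillers) for _ in range(gap)] + [end]

def train(sequences, max_lag):
    """Count how often each symbol follows another at lags 1..max_lag.

    max_lag=1 links an event only to its immediate predecessor (t-1);
    a larger max_lag also links it to events at t-2 ... t-max_lag.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sequences:
        for t, current in enumerate(seq):
            for lag in range(1, min(max_lag, t) + 1):
                counts[(seq[t - lag], lag)][current] += 1
    return counts

def followers(counts, symbol, lag):
    """Symbols observed `lag` steps after `symbol`, with their counts."""
    return dict(counts.get((symbol, lag), {}))

rng = random.Random(0)  # fixed seed so the run is repeatable
GAP = 3                 # number of filler events between A and B
pairs = [("A1", "B1"), ("A2", "B2")]
sequences = [make_sequence(a, b, GAP, ["x", "y", "z"], rng)
             for _ in range(100) for a, b in pairs]

first_order = train(sequences, max_lag=1)      # t-1 only
multi_lag = train(sequences, max_lag=GAP + 1)  # t-1 ... t-n

print(followers(first_order, "A1", GAP + 1))  # no link from A1 reaches B1
print(followers(multi_lag, "A1", GAP + 1))    # direct A1 -> B1 link
print(followers(multi_lag, "x", 1))           # fillers also link to B1 and B2
```

With only t-1 links, nothing connects A1 to B1, so the first print is empty. With links back to t-n, A1 at lag 4 points only to B1. The last print shows the downside mentioned above: the filler events pick up links to both B1 and B2 as well, so a single activation can trigger several competing predictions.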

subutai · June 4, 2018, 4:08pm

Yes, this is correct for the HTM Temporal Memory algorithm. We discussed this (and a couple of other limitations) in the Cui et al. (2016) paper (see the third limitation in Section 6.4). We used a variation of your example, the Reber grammar task, as an illustration of this.

It is quite possible that we could extend the algorithm to handle these cases, but we have not focused on this. It would be great if someone wanted to tackle it.

TaherHabib · June 1, 2020, 1:39pm

Hi,

Could you please elaborate on what this configuration of TM might look like? I have found this discussion on using graded SDRs to improve random access of SDRs quite helpful. Does your answer relate to this in any way?

I would really appreciate any help on this matter since I am studying the problem of learning long-range dependencies in HTM, using the cases of Extended and Continual Reber Grammars. Thanks

Bitking · June 1, 2020, 2:42pm

Trademarked?

Jose_Cueto · June 1, 2020, 3:23pm

What do you mean by “causally” here? Did you mean correlation?

What do you mean by “direct relationship”?

Also, for context: is the task sequence prediction?

Paul_Lamb · June 1, 2020, 7:43pm

Haha, that one has got me before too. Discourse turns (TM) into ™
