My analysis of why Temporal Memory prediction doesn't work on sequential data

First of all, I hope that my analysis is flawed and someone points out what I’ve got wrong. I very much like Jeff Hawkins’ theory of how the brain works, and I really want to see HTM solving real-world problems! However, I think I’ve gained enough knowledge about HTM, through reading almost all of the papers published by Numenta, stepping through NuPIC’s code, and playing with the sanity-nupic visualization tool, to share my observations.

It seems that the main problem with TM is that it can’t recognize when an event (i.e., an “input pattern”) occurs again in the same context. I’ll use a very simple input sequence: “A,B,C,A,B,C,A,B,C,…”. TM, in the form presented in the papers, can’t learn to predict that “B” occurs after “A”, “C” after “B”, and “A” after “C”. Assume we use the following parameters (a rough construction sketch follows the list):

  • columnCount: 6,
  • cellPerColumn: 16,
  • initialPerm: 0.51,
  • connectedPerm: 0.50 (a synapse becomes connected as soon as it is created),
  • maxSegmentsPerCell: 1,
  • maxSynapsesPerSegment: 1,
  • activationThreshold: 1.
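
Here is roughly how I would set these parameters up in code; the names below are my best mapping onto NuPIC’s TemporalMemory constructor (nupic.algorithms.temporal_memory), so treat this as a sketch and double-check it against your NuPIC version:

    from nupic.algorithms.temporal_memory import TemporalMemory

    # Sketch only: mapping the parameters above onto the (assumed) constructor.
    tm = TemporalMemory(
        columnDimensions=(6,),     # columnCount: 6
        cellsPerColumn=16,
        initialPermanence=0.51,
        connectedPermanence=0.50,  # a synapse is connected as soon as it is created
        activationThreshold=1,
        maxSegmentsPerCell=1,
        maxSynapsesPerSegment=1,
    )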

Also assume that the spatial pooler assigns columns 1,2 to “A”, columns 3,4 to “B”, and columns 5,6 to “C”. Here’s a visualization of what happens to TM cells on each step through the input sequence:

In steps 1-4, TM bursts the columns corresponding to the input events (shown in red).

In step 1, cell #1 in each of A’s columns (#1 and #2) is randomly chosen to represent A’s context.

In step 2, cell #6 in column #3 and cell #3 in column #4 are randomly chosen, and each grows a single synapse to one of the cells that were chosen to represent “A” (in this example I won’t talk about segments, since there is one synapse per segment).

The same is done in steps 3 and 4: TM randomly chooses a cell in each active column and grows a synapse on it to a cell chosen to represent the previous input.

In step 5, cell #6 in column #3 and cell #3 in column #4 were predicted, since cells #1 in columns #1 and #2 were active in the previous step (as these columns bursted). So, only these predicted cells are activated.

Similarly, events “C” and “A” are correctly predicted in steps 6-7.

However, in step 8, “B” is unexpected, since its representative cells #6 (column #3) and #3 (column #4) were waiting for cells #1 in columns #1 and #2 to get activated. Instead, cell #13 in column #1 and cell #11 in column #2 were active. Thus, both columns #3 and #4 burst.

This scenario repeats continuously: three correct predictions followed by a misprediction.
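
To make this failure mode concrete, here is a small toy model of exactly this scenario (it is not NuPIC, just the simplified rules above: one segment per cell, one synapse per segment, activationThreshold = 1). It steps through the repeating sequence and prints which columns burst. The main assumption, noted in the comments, is that a bursting column always picks a cell that has not been a winner before, which is how I drew it; the real TM breaks the tie randomly among the least-used cells.

    import random

    random.seed(42)
    CELLS_PER_COLUMN = 16
    COLUMNS = {"A": (1, 2), "B": (3, 4), "C": (5, 6)}   # assumed spatial pooler mapping

    synapse = {}          # cell -> the single presynaptic cell its one synapse targets
    ever_winner = set()   # cells that have been picked as winners before
    prev_active = set()   # cells active at t-1
    prev_winners = set()  # winner cells at t-1

    for step, label in enumerate("ABC" * 6, start=1):
        active, winners, bursting = set(), set(), []
        for col in COLUMNS[label]:
            cells = [(col, i) for i in range(CELLS_PER_COLUMN)]
            predicted = [c for c in cells if synapse.get(c) in prev_active]
            if predicted:              # correctly predicted: only those cells fire
                active.update(predicted)
                winners.update(predicted)
            else:                      # unexpected input: the whole column bursts
                bursting.append(col)
                active.update(cells)
                # Assumption matching the drawings: pick a cell that was never a
                # winner; the real TM picks randomly among the least-used cells.
                fresh = [c for c in cells if c not in ever_winner] or cells
                winner = random.choice(fresh)
                winners.add(winner)
                ever_winner.add(winner)
                if prev_winners:       # grow the single allowed synapse to one
                    synapse[winner] = random.choice(sorted(prev_winners))
        print("step %2d, input %s, bursting columns: %s" % (step, label, bursting or "none"))
        prev_active, prev_winners = active, winners

With these rules the burst pattern does not depend on the random choices: the columns burst on steps 1-4, then “B” bursts on step 8, “C” on step 12, and “A” on step 16, i.e. three correct predictions followed by a misprediction.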

As I figured out, TM needs the user to specify when to “reset” an input sequence; that is, it needs to be told when the context of the first event occurs again. I suppose that’s why all the datasets used in NAB encode the date as part of the input. For example, in the “hotgym” dataset used in the HTM School YouTube videos to teach how TM predicts time series, the context of the energy consumption is defined by the day of the week and the time of day. Correct me if I’m wrong about this.
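
To be concrete about what I mean by resetting: as far as I understand the NuPIC API, the user-driven workaround would look roughly like the loop below, reusing the tm object from the sketch above (encode() is a hypothetical stand-in for the encoder plus spatial pooler):

    # Hypothetical training loop with an explicit reset at every known sequence
    # boundary, so the first element always starts from an empty context.
    for _ in range(100):                        # feed the repeating sequence many times
        for symbol in ["A", "B", "C"]:
            active_columns = encode(symbol)     # stand-in for encoder + spatial pooler
            tm.compute(active_columns, learn=True)
        tm.reset()                              # "this sequence has ended"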

P.S. I’m aware of the backtracking_TM used by default in sanity-nupic (I couldn’t even find a value for the “temporalImp” parameter that would force NuPIC to use the original TM class defined in temporal_memory.py). The comment on the _inferBacktrack() method in backtracking_TM.py sort of explains why it’s used. However, I don’t see how backtracking can help with the issue I described above. Also, why is it implemented (and used by default) if it’s not part of the “official” HTM model and is not explained in the papers?

11 Likes

You’re exactly right.

The thing you’re describing is a well-known problem with the TM. We talked about it in a couple of past HTM Hackers’ Hangouts. We don’t know how to solve it, but it seems like current research might shed some light on this. It could involve attention.

By the way, I really like your visualizations above!

1 Like

Thank you for your response!

How is it a well-known problem if it’s not mentioned in any published HTM paper? Correct me if I’m wrong, but after spending almost a month researching the HTM theory, I haven’t seen this problem mentioned in any tutorial, documentation, or HTM School video. And this is a major limitation, because it basically makes TM useless for temporal sequences whose values do not depend on the moment in time. There is no mention of this in the NAB publication from 2017. How so?

I still hope that I’m missing something… Could you please share those Hangout videos discussing this problem?

3 Likes

It shouldn’t repeat continuously. Once each learned element in a repeating sequence has bursted for the second time (so the first element will burst three times), the representations for all elements in the sequence should stabilize.

Note that I haven’t actually tested this particular scenario in NuPIC itself, so I don’t know if there is some slight logical difference in my own implementation (I’ll have to check). Here is a demo I wrote which visualizes the TM process similarly to your drawings above (mouse over cells to see how they are connected).

Just use C, D, and E instead of A, B, and C, since it is a piano. It should stabilize after the following point in the repeating sequence (it bursts on the first four elements, then once more on each of D, E, and C): “CDECDECDECDECDEC”.

There is a different problem, however, which may be related if there is in fact a logical difference between my implementation and NuPIC. I have discussed it on another thread. Some of the columns end up with two cells representing the same context (with one connected a bit better than the other). This of course impacts the overall capacity of the system. I think it should be possible to tweak the learning process a bit so that the best connected of the two will eventually win out.

2 Likes

I see one problem in your depiction (you might have just drawn it this way for simplicity’s sake, but I think it is relevant). The problem starts between steps 8 and 9. In your depiction, you have shown one-to-one cell connections, but in reality the cells active in step 9 would grow connections to the two new cells chosen in step 8. I’ve drawn in all the missing connections that should be present after learning in step 9:

[image: the step 9 diagram redrawn with the missing connections]

EDIT

NM, I see you have configured 1 for max segments per cell and 1 for max synapses per segment. In that scenario, a cell can never connect to multiple cells, and you will see the behavior you described (bursting every round and never stabilizing). However, I do not think that is a normal configuration to use.

You might be interested to look into the Sparsey model. I know that the author has pointed out weaknesses of the temporal memory as implemented in HTM, which his architecture purports to solve, and that he has acquired a patent on something (which I haven’t studied in detail) reminiscent of this “resetting” situation.

1 Like

For what it’s worth, my implementation doesn’t have a reset function. I haven’t personally found a need for one. The function of bursting already takes care of unexpected switches from one sequence to another.

1 Like

@Paul_Lamb thank you for pointing this out! The TM pseudocode in BAMI says that the winning cells do grow additional synapses to previously active cells, up to SYNAPSE_SAMPLE_SIZE synapses.
This would make the winning cells for “B” in step 5 grow synapses to cell #13 in column 1 and cell #11 in column 2, right? This means that we shouldn’t observe any unpredicted input from now on. In your demo, though, the columns for “B” in step 8 burst (then the columns for “C” in step 12 burst, and finally the columns for “A” in step 16 burst).
Could you please explain why the bursting is happening in step 8? Cell #6 in column 3 and cell #3 in column 4 have synapses to cell #13 in column 1 and cell #11 in column 2, which were grown in step 5. It seems like the bursting shouldn’t be happening…

1 Like

Sure, the reason is that when B is first learned in step 2 (let’s call these cells B’), they grow connections to a random set of cells in the A columns (let’s call these A’). However, those A’ cells do not have connections to anything (since they were first in the sequence). So when A is learned for the first time (in step 4), a new set of cells is selected (let’s call this A’’). B’ is predicted in step 5 because the A columns were bursting (meaning A’ are active). However, in step 7 the active cells are A’’ (which B’ is not connected with). This results in the B columns bursting in step 8.

Oh, wait a minute, you are right :blush:

In step 5, B’ should grow some connections with A’’, and thus be predicted for step 8. I think you have found a bug in my implementation. I’ll have to investigate…

This isn’t a bug in your implementation. The BAMI pseudocode has the same behavior. In step 5, it won’t grow new connections to the previous winner cells, because it already has SYNAPSE_SAMPLE_SIZE (a.k.a. “maxNewSynapseCount”) active synapses.

If this logic used “number of synapses to previous winner cells” rather than “number of synapses to previous active cells”, then it would have the alternate behavior that you’re expecting. But that would have other bad effects: if the TM learns sequences “ABCD” and “XBCY”, it would assign the same SDR for both occurrences of C, and then it would always predict a union of D and Y afterward, regardless of whether it had seen “ABC” or “XBC”.
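
For anyone following along in BAMI, the relevant learning step paraphrased into Python looks roughly like this (the names are mine and permanence updates are omitted):

    import random

    SAMPLE_SIZE = 4   # a.k.a. maxNewSynapseCount / SYNAPSE_SAMPLE_SIZE

    def learn_on_active_segment(segment, prev_active_cells, prev_winner_cells):
        """`segment` is modeled as the set of presynaptic cells it already targets."""
        # The growth budget is measured against previously ACTIVE cells, not
        # previously WINNER cells -- which is why step 5 above grows nothing.
        num_active_potential = len(segment & prev_active_cells)
        num_new = SAMPLE_SIZE - num_active_potential
        if num_new > 0:
            candidates = list(prev_winner_cells - segment)
            segment.update(random.sample(candidates, min(num_new, len(candidates))))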

3 Likes

Ah, yes of course. I like topics like these because they challenge and reinforce my understanding.

I’m thinking I do still have a bug (or I just need to go back to the basics and refresh my memory), since now I would expect to see the behavior that @oiegorov described in the OP (i.e., the representations never fully stabilizing for all elements in a repeating sequence).

Thank you for your response. Do you confirm that the problem I described in the OP is valid?

Also, what exactly prevents the highlighted cells in step 5 from growing synapses to cell #13 in column 1 and cell #11 in column 2 if we had maxSegmentsPerCell = 1, maxSynapsesPerSegment = 4, maxNewSynapseCount = 4? Wouldn’t the highlighted cells try to grow 2 more synapses? I’m not sure about that, because I don’t remember seeing any discussion about the necessity of growing additional synapses for a cell that was correctly predicted…

You’re correct that the TM handles repeating sequences poorly by default. The only immediately available solution is to use resets. The Backtracking TM addresses this problem by, upon bursting, asking, “Would this have bursted if the sequence had ‘started’ more recently?”, although I can’t say for certain whether it handles this flawlessly. Another imperfect approach I’ve used is to change the “winner cell” selection process so that it selects the same cell within the minicolumn every time the minicolumn bursts, within a limited timespan. The timespan would need to be at least as long as the repeating sequence.
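
Roughly, that second idea looks like the sketch below (the names and structure here are mine, not actual NuPIC code):

    import random

    class StickyWinnerSelector(object):
        """Sketch: reuse a minicolumn's previous winner if it bursts again soon."""
        def __init__(self, timespan):
            self.timespan = timespan      # should be >= the repeating sequence length
            self.last_winner = {}         # column -> (cell, timestep it was chosen)

        def choose(self, column, candidate_cells, timestep):
            remembered = self.last_winner.get(column)
            if remembered is not None and timestep - remembered[1] <= self.timespan:
                winner = remembered[0]    # same cell as the last burst, so a repeating
            else:                         # sequence converges on one representation
                winner = random.choice(candidate_cells)
            self.last_winner[column] = (winner, timestep)
            return winner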

You’re correct that if maxNewSynapseCount (which is poorly named – elsewhere I call this “sampleSize”) is greater than the number of active minicolumns (and hence the number of winner cells), then in Step 5 it will grow 2 more synapses, connecting A’’ -> B’. But if this sample size is ≤ the number of active minicolumns, then it won’t connect A’’ -> B’. Typically the sample size is less than the number of active minicolumns – otherwise it wouldn’t really be “subsampling”, and it would have the ABCD / XBCY problem that I mentioned.

6 Likes

I found the bug. I had written a loophole where a random sampling of up to maxNewSynapseCount previously active cells (which may not include the currently connected cells) could end up forming connections to the currently active cells. I believe this is also the cause of my aforementioned issue with multiple cells in the same column representing the same context.

1 Like

Thank you for the clarification! So backtracking is used to replace resetting?
I found out that NAB uses backtracking by default. Does it still use resetting? If yes, does it reset the sequence when a new week starts?

2 Likes

I hope this isn’t going off topic (I think it is relevant to the topic), but this particular goof may hint at a possible direction to explore for stabilizing the representations in a repeating sequence. If the system is allowed to make a smaller number of additional connections (beyond maxNewSynapseCount) for some of the current winning cells (some number of them above activationThreshold) to a random sampling of the winning cells from T-1 that they aren’t already connected with, then the representations would stabilize after the second time through the repeated sequence.

This would then lead to the case you mentioned of ambiguity for the C in ABCD vs. XBCY. However, this implementation would result in duplicate cells in a subset of the C columns for the C in XBCY. One of the duplicates would be the same cell from ABCD and would be connected more weakly than the other duplicate, which is unique to XBCY. The learning step could be tweaked to degrade the weaker of the two, which would eliminate the ambiguity.
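
To picture that cleanup step, something like this sketch is what I have in mind (my own names, not tied to any particular implementation, with permanence simplified to one number per cell):

    def degrade_weaker_duplicates(active_cells_in_column, permanence, decrement=0.05):
        """If several cells in one column fire for the same context, decay all but
        the best-connected one so the weaker duplicate eventually disconnects."""
        if len(active_cells_in_column) < 2:
            return
        best = max(active_cells_in_column, key=lambda c: permanence[c])
        for cell in active_cells_in_column:
            if cell != best:
                permanence[cell] = max(0.0, permanence[cell] - decrement)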

It is an interesting idea. I’ll have to think it out further, and I’ll let you know what I learn from it.

1 Like

Yeah, backtracking can act as a replacement for resets. I haven’t spent much time considering how effectively it replaces resets, but it definitely helps.

The “Numenta HTM” NAB detector uses backtracking, and it doesn’t use resets. The same is true of HTM Studio and HTM for Stocks, and it’s what we used in Grok. You can think of the Backtracking TM as a productized version of the Temporal Memory: it’s the pure algorithm, plus some non-biological stuff.

One quick note: the NAB README has a second table of results of different HTM variations. In that table, the “NumentaTM HTM” row is pure temporal memory, without backtracking. You can see that it does okay, but it’s better with backtracking.

3 Likes

That’s so interesting! Might I ask roughly how to implement Backtracking in NuPIC? Are there any examples? I’m using it for my thesis and would be very curious to see if/how it might affect my results. Thanks again for all of your thought leadership and guidance; I’m loving this thread.