My analysis of why Temporal Memory prediction doesn't work on sequential data

Yes, I haven’t needed one myself (partly because of the unintentional bug I mentioned above, which had the interesting side effect of stabilizing repeating sequences).

Oh, cool.

This seems like an interesting concept for stabilizing sequences (and possibly even biologically plausible?). I for one would be curious to see any results you get testing this out!

The root of the problem seems to be the need for (and current lack of) an ability to consolidate and generalize across the possible contexts for a given input. The semantic similarity between contexts is not taken into account when the network makes predictions.

My first thought for alleviating this would be hierarchical processing, but there may be a better way the brain solves this within a layer itself.

1 Like

Occam’s razor - go for the H of HTM here.

1 Like

This is an interesting question. I also never found anything in any paper regarding this issue. Is there any working example which shows how HTM learns sequences?
@rhyolight: In which hangouts was this issue talked about?

Thanks

I am not aware that this problem is defined in our papers, but we’ve talked about it a lot, especially on this forum, and also in some research meetings and hangouts. Finding the specific videos is kind of hard.

2 Likes

Could this issue be mitigated by getting presynaptic cells from more than just the prior timestep? Learning in the TM only connects current WinnerCells to WinnerCells from t-1. What if the pool of presynaptic cells didn’t just include WinnerCells from t-1? What if it included, say, t-1, t-2, and t-3?

Does anyone know if this is implemented anywhere? I don’t know of a NuPIC param which toggles this.
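
To make the idea concrete, here’s a rough sketch of what I’m imagining (my own names and history length, nothing taken from the actual NuPIC code):

```python
# A minimal sketch (hypothetical, not NuPIC API) of widening the presynaptic
# pool for distal learning from WinnerCells(t-1) to WinnerCells(t-1 .. t-3).
from collections import deque
import random

HISTORY_LENGTH = 3      # draw candidates from t-1, t-2 and t-3
MAX_NEW_SYNAPSES = 16   # analogous to maxNewSynapseCount

class Segment:
    def __init__(self):
        self.presynaptic_cells = set()   # cells this distal segment connects to

# Rolling window of the last few timesteps' WinnerCells (most recent first).
winner_history = deque(maxlen=HISTORY_LENGTH)

def grow_synapses(segment, winner_history):
    """Grow new synapses by sampling from WinnerCells of the last few
    timesteps, instead of only from WinnerCells(t-1)."""
    pool = set().union(*winner_history)
    candidates = list(pool - segment.presynaptic_cells)
    random.shuffle(candidates)
    segment.presynaptic_cells.update(candidates[:MAX_NEW_SYNAPSES])

# Example of the per-timestep bookkeeping (cell ids are just integers here):
seg = Segment()
for current_winner_cells in [{1, 2}, {3, 4}, {5, 6}, {7, 8}]:
    grow_synapses(seg, winner_history)
    winner_history.appendleft(current_winner_cells)
print(seg.presynaptic_cells)   # after the 4th step it can hold cells from t-1, t-2 and t-3
```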

This is a really interesting idea. My initial conclusion is that this alone won’t address the problem, but I need to think it through some more. It might be possible to combine this idea with something else in order to establish a link between the end of one pass and start of the next one.

I drew up a quick visualization for a simple single repeating input and one minicolumn. This depicts a single minicolumn moving through time from left to right (so 15 total timesteps). I put distal connections to T-1 on the left of the minicolumn, and connections to T-2 on the right of the minicolumn. Red cells are active, blue ones are predictive, and split red/blue cells are both active and predictive.

[image: a single minicolumn across 15 timesteps, with T-1 distal connections drawn on the left and T-2 connections on the right]

Hopefully this is enough iterations to demonstrate the pattern that occurs when the T-2 connections are added. I’ve also thought through slightly more complicated examples in my head, and I think they behave similarly to this one. This is how an otherwise vanilla TM algorithm would behave, but there are other variations of the algorithm that might be worth considering (such as the ability to predict in the same step a distal connection is formed, preventing simultaneous predictive and active states, etc.).

2 Likes

Do you happen to have a framework for visualizing/analyzing the behavior on these more complicated examples? I’m thinking that if you had, say, 2 sequences which share a common subsequence, like:

A,B,1,2,3,4,5,C,D,A,B,1,2,3,4,5,C,D…
&
W,X,1,2,3,4,5,Y,Z,W,X,1,2,3,4,5,Y,Z…

Maybe we could find a relationship between the length of the shared subsequence, the Markov order of distal learning used (t-1, t-2, t-3, etc), and the number of repetitions it takes for each TM version to distinguish between the 2 overall sequences.

I think the NuPIC TM would need a new data structure to track earlier WinnerCells than those from t-1 in order to implement this.
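
For the experiment itself, something like this is roughly what I have in mind (just a sketch against NuPIC’s TemporalMemory that I haven’t run; the symbol encodings are fixed random column sets standing in for a real encoder, and the compute/getActiveCells calls should be checked against your NuPIC version):

```python
# Sketch of the experiment: two sequences sharing the subsequence 1,2,3,4,5,
# fed repeatedly with no resets, counting bursting columns per pass.
import random
from nupic.algorithms.temporal_memory import TemporalMemory

COLUMNS = 2048
CELLS_PER_COLUMN = 32
ACTIVE_COLUMNS_PER_SYMBOL = 40

random.seed(42)
symbols = ['A', 'B', 'C', 'D', 'W', 'X', 'Y', 'Z', '1', '2', '3', '4', '5']
# Fixed random SDR (set of active columns) per symbol -- stands in for a real encoder.
encoding = {s: sorted(random.sample(range(COLUMNS), ACTIVE_COLUMNS_PER_SYMBOL))
            for s in symbols}

seq1 = ['A', 'B', '1', '2', '3', '4', '5', 'C', 'D']
seq2 = ['W', 'X', '1', '2', '3', '4', '5', 'Y', 'Z']

tm = TemporalMemory(columnDimensions=(COLUMNS,), cellsPerColumn=CELLS_PER_COLUMN)

def run_pass(tm, sequence):
    """Feed one repetition of the sequence; return the bursting-column count per step."""
    bursting_per_step = []
    for symbol in sequence:
        tm.compute(encoding[symbol], learn=True)
        cells_per_active_column = {}
        for cell in tm.getActiveCells():
            col = cell // CELLS_PER_COLUMN
            cells_per_active_column[col] = cells_per_active_column.get(col, 0) + 1
        # A column is bursting when every one of its cells is active.
        bursting_per_step.append(
            sum(1 for n in cells_per_active_column.values() if n == CELLS_PER_COLUMN))
    return bursting_per_step

for rep in range(20):
    b1 = run_pass(tm, seq1)
    b2 = run_pass(tm, seq2)
    print("pass %d  seq1 bursting per step: %s  seq2: %s" % (rep, b1, b2))
```

Swapping in a modified TM that samples presynaptic cells from t-1, t-2, and t-3 and comparing how quickly the bursting counts drop on the shared 1,2,3,4,5 subsequence would give the relationship I’m talking about.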

Not really. I usually just tweak HTM.js. I haven’t had time to do that with this idea yet, though. I’ll post a video (and link if you want to play around with it further) once I have.

1 Like

I did this in 2015.

1 Like

Oh cool! Any results you’d be willing to share?

It was for my master's thesis. I verified the concept for sequence prediction of images from the MNIST dataset (as far as I remember). You can check it here:
https://ieeexplore.ieee.org/stamp/stamp.jsp?arnumber=8620601

Original work is here (2014-2015):
http://scholarworks.rit.edu/cgi/viewcontent.cgi?article=9856&context=theses

4 Likes

Hey everybody =)

This is a very nice and interesting discussion :heart_eyes:
Now I have some questions:

@Paul_Lamb

Is this demo supposed to be correct? I tried some things and I think there are small mistakes. E.g., if I choose Max Segments per Cell = 1, Max Synapses per Segment = 1, and Max New Synapse Count = 3, one cell actually connects to 3 other cells… I am sure this shouldn’t be the case… or am I totally wrong there?

Sooo besides this I have a question about the backtracking (BT). I read the source code and wanted to investigate when backtracking will happen for the example in the first post:

As I understand it, we will use backtracking whenever we have too many bursting columns or not enough predicted cells for the next timestep.
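Roughly, in pseudocode, this is the condition I have in mind (the threshold values here are placeholders, not the actual numbers from the NuPIC BacktrackingTM source):

```python
# Pseudocode of my reading of the backtrack trigger; thresholds are placeholders.
MAX_BURSTING_FRACTION = 0.5   # "too many bursting columns"
MIN_PREDICTED_CELLS = 1       # "not enough predicted cells for the next timestep"

def should_backtrack(num_bursting_columns, num_active_columns, num_predicted_cells_next):
    too_much_bursting = num_bursting_columns > MAX_BURSTING_FRACTION * num_active_columns
    too_little_prediction = num_predicted_cells_next < MIN_PREDICTED_CELLS
    return too_much_bursting or too_little_prediction
```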
Now I was investigating what will happen for this example if we use Backtracking TM.
In my opinion BT will occur in steps 1-4 without making any change, because we have not yet learned the sequence.
The next time BT will happen is in step 7, when we do not have any predicted cells, right?
And if so, then I think nothing will change, because we have not learned any sequence that would lead to a prediction of B in this specific context.
So in my opinion the bursting in step 8 is unavoidable.

In my understanding, in step 8 the activation will change, because BT recognizes that if we started with the A without context, then B would be predicted.
So I think that in step 9, cells 3 and 6 in columns 3 and 4 should be activated, so we can start from step 5 again.

But there are also some uncertainties. If we follow the sequence again (starting from step 5 because BT changed the activation), what will happen when A occurs the next time? Will the columns for B burst again?
Or does the backtracking also change the connections some steps before? What I mean is: after C comes again (with the representation from step 6), which cell will be in a predictive state to predict A?

I hope I could express myself clearly :sweat_smile: :see_no_evil:
Thanks a lot in advance for any clarification :blush:

2 Likes

I agree with you – that seems like a logic error. It is probably a question of whether or not the max new synapse count has precedence over the other two params. In edge cases like this, where the parameters contradict each other, it isn’t uncommon to see counter-intuitive behavior.
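
For example, if the sample size were clamped the way I’d expect, something like this would happen (hypothetical logic for illustration, not a quote from HTM.js or NuPIC):

```python
# Hypothetical clamping logic showing why Max New Synapse Count = 3 with
# Max Synapses per Segment = 1 is contradictory: the per-segment cap should win.
def num_synapses_to_grow(max_new_synapse_count, max_synapses_per_segment,
                         synapses_already_on_segment, num_candidate_cells):
    room_left_on_segment = max_synapses_per_segment - synapses_already_on_segment
    return max(0, min(max_new_synapse_count, room_left_on_segment, num_candidate_cells))

# With the settings from the question, only 1 synapse should be grown, not 3:
print(num_synapses_to_grow(max_new_synapse_count=3, max_synapses_per_segment=1,
                           synapses_already_on_segment=0, num_candidate_cells=3))  # -> 1
```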

1 Like

Thanks for the super fast reply :blush:
I see… is this an error in your implementation, or should I expect such behavior in NuPIC as well?
Do you know?

I haven’t used NuPIC enough to have tested this configuration, unfortunately.

1 Like

ok thanks =)