TemporalMemory for prediction

Hi all,
I am using TM to predict a sine wave, where the TM is directly connected to a ScalarEncoder (n=1024, w=21).
After learning, I switch the TM into inference mode and observe the predictive cells.
I found that there is one bucket for which the TM is NOT able to bring any cells into a predictive state, i.e. no predictive cells at that point in every period.
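For reference, the setup is roughly like this minimal sketch (written against htm.core's Python API as I understand it; exact class and parameter names may differ between versions, and cellsPerColumn is just a placeholder):

```python
import math

from htm.bindings.algorithms import TemporalMemory
from htm.encoders.scalar_encoder import ScalarEncoder, ScalarEncoderParameters

# Sine values in [-1, 1] -> 1024-bit encoding with 21 active bits.
params = ScalarEncoderParameters()
params.minimum = -1.0
params.maximum = 1.0
params.size = 1024
params.activeBits = 21
encoder = ScalarEncoder(params)

# The TM is fed the encoder output directly (no spatial pooler in between).
tm = TemporalMemory(columnDimensions=(params.size,), cellsPerColumn=16)

def step(value, learn):
    """Feed one scalar into the TM and return the number of predictive cells."""
    active_columns = encoder.encode(value)     # returns an SDR
    tm.compute(active_columns, learn=learn)
    # Some htm.core versions require activateDendrites() before reading
    # the cells predicted for the *next* time step.
    tm.activateDendrites(learn=False)
    return len(tm.getPredictiveCells().sparse)

# Learning phase: many periods of a 15-sample sine wave.
for t in range(3000):
    step(math.sin(2 * math.pi * t / 15), learn=True)

# Inference phase: learning off, watch how many cells are predictive per step.
for t in range(30):
    print(t % 15, step(math.sin(2 * math.pi * t / 15), learn=False))
```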

Could anyone explain this phenomenon and recommend a way to avoid it?
Thanks


There seems to be a common fundamental misunderstanding interwoven with lots of attempts at using HTM's TM. Even @marty1885's recent nice experiment is imho flawed by some misconception…
Although his good results may be a hint that it is I who did not understand something after all, here is my take:
The HTM TM algorithm is intended to predict finite sequences with a definite starting point. Give it a C, and it will propose the set of 'next notes' from all learnt adagios including a C. Then give it a D, and in the event that there's only one learnt sequence containing 'CD', it will now correctly predict the next note, then the next one after that… all the way to the end. And there is an end.

Either you did reset the TM at some periodic point during the learning phase, in which case you understood this part well… but then it is no wonder that it can't predict that (even repetitive) starting point itself.
Or you did not reset the TM at all… in which case it tried to learn the whole thing as a single sequence, where every sample is independent of the others. Nothing told it that the sequence is a loop… there will necessarily be some finite n where, at tn+1, almost nothing can be inferred from tn, even with the knowledge of t0, t1, …, even with your (meta) expectation that it will have noticed that t0 = tn. It didn't notice that.

Long story short: by design, each instance of a same input pattern, with a distinct index in a learned and recognized sequence, is a distinct instance from the TM's viewpoint. Even when your human brain would have spotted "regularities" within the sequence itself.
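If it helps, here is a toy sketch of those two regimes (just an illustration with htm.core-style calls and a made-up 15-bucket encoding, not your actual encoder or parameters):

```python
from htm.bindings.sdr import SDR
from htm.bindings.algorithms import TemporalMemory

# Toy setup: a repeating 15-step sequence of bucket indices, each bucket
# encoded as its own block of 21 bits (no overlap, purely for illustration).
N_BUCKETS, W = 15, 21

def encode(bucket):
    sdr = SDR(N_BUCKETS * W)
    sdr.sparse = list(range(bucket * W, (bucket + 1) * W))
    return sdr

def train(use_reset, periods=200):
    tm = TemporalMemory(columnDimensions=(N_BUCKETS * W,), cellsPerColumn=8)
    for _ in range(periods):
        if use_reset:
            tm.reset()               # drop all sequence context at the loop point
        for bucket in range(N_BUCKETS):
            tm.compute(encode(bucket), learn=True)
    return tm
```

With use_reset=True, the bucket right after each reset has no context and can never be predicted; with use_reset=False, every repetition of a bucket sits at a different depth of one endless sequence, and the loop itself is never explicitly represented.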


@gmirey thanks for your information.
In my test, I do not reset after learning.
I use the SDRClassifier for multi-step prediction with HTM.core from the HTM community.
One-step prediction works very well, except for this problem. My sequence has a length of 15 buckets. I want to find a systematic solution!

A systematic solution to spotting subsequences nested inside a sequence would require, as @Bitking would put it, finding the H to work with that TM.

Some discussions around that:


I think that is mathematically not true. Also, Etaler implements a modified TM algorithm. Let me try to explain.

First of all, for any given bit in an SDR, since the TM predicts whether a bit is on or not by using the cells in its column, the TM has cell_per_column ways to activate it. Secondly, Etaler implements a connect-to-all synapse-growing algorithm. This causes it to assume that identical input patterns with unknown context are the same, unless the context suggests otherwise or it learns the relation later on.

From the perspective of Etaler's TM, the sequence [A, A, A] is [A, A, A]. And the sequence [A, B, C, B, C, D] will initially be recognized as [A, B, C, B, C, D], until it discovers that it is actually [A, B, C, B', C', D].
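To make that last point concrete, here is a tiny plain-Python illustration (this is not Etaler code, just a transition table over symbols):

```python
from collections import defaultdict

def transitions(seq):
    """Map each symbol to the set of symbols that ever follow it."""
    table = defaultdict(set)
    for cur, nxt in zip(seq, seq[1:]):
        table[cur].add(nxt)
    return dict(table)

# Treating every C as "the same C" leaves the next step ambiguous:
print(transitions(["A", "B", "C", "B", "C", "D"]))
# {'A': {'B'}, 'B': {'C'}, 'C': {'B', 'D'}}   <- after C: B or D?

# Once the second B and C get context-specific representations (different
# cells in the same columns), every transition is unambiguous:
print(transitions(["A", "B", "C", "B'", "C'", "D"]))
# {'A': {'B'}, 'B': {'C'}, 'C': {"B'"}, "B'": {"C'"}, "C'": {'D'}}
```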


But that is not the reality in my test case, where we do not have a clean sequence. Sometimes there is a slight change at some element of the sequence, e.g.
[A, B, C, B, C', D] or
[A, B, C', B, C, D].
In my test with sine waves we have bucket numbers instead of letters, so C and C' are neighbours, e.g.
C=102, C'=103
For this reason, the TM fails to predict…


Okay.
Well, if A, B, C, D, and C' are all very distinct (i.e., their input patterns after spatial pooling do not share set bits), I believe the TM would indeed have a hard time recovering from that C'… (however, if you have only one learned sequence, I guess it could fall on its feet after a few iterations).

Now, if you followed @rhyolight's recommendations wrt the setup of a scalar encoder, your value of '102' should have many bits in common with that '103'… and thus, I'd bet the TM would do almost fine.

If that is not the case… maybe try to make the encoder output sparse itself, and shortcut the SP step?


@gmirey I do not think it is an encoder problem, because C and C' are two neighboring buckets and they will share at least one bit.
It seems to be an effect where the TM learns a sequence, but gets confused at the end of the sequence and at the beginning of the new one…
In my snapshot below, you can see the input in red and the prediction in blue. The predicted value at -0.2 fails every 2 sine periods…

I think your n/w ratio may be too high here; I remember that it need not be more than 10 or 11 to 1 (so you could try n=231, w=21). With n=1024 and w=21, even numbers that are relatively close together may have no overlap in their encoding vectors.

1024 / 21 = ~49 distinct buckets. Assuming the sine wave ranges from -1 to 1, this makes each distinct bucket around 0.04 wide, which means that inputs like 0.95 and 0.99 have essentially no overlapping encoding bits. So depending on the granularity of your sine wave plus this coarse encoding, you may effectively be feeding in:
A,A,A,A,A,B,B,B,B,B,C,C,C,C,C…Z,Z,Z,Z,Z,1,1,1,1,1…23,23,23,23,23
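To put rough numbers on that (a quick back-of-the-envelope script using a simplified model of the scalar encoder, not the actual library code):

```python
n, w = 1024, 21
value_range = 2.0                       # sine wave from -1 to 1

n_buckets = n - w + 1                   # 1004 overlapping buckets
resolution = value_range / n_buckets    # change in value per bucket step

def overlap(i, j):
    """Shared bits between bucket i and bucket j (w consecutive bits each)."""
    return max(0, w - abs(i - j))

print(round(n / w, 1))           # ~48.8 completely distinct encodings
print(round(resolution, 4))      # ~0.002 per bucket step
print(overlap(102, 103))         # 20: neighbouring buckets share almost everything
print(overlap(102, 123))         # 0: buckets >= 21 apart share nothing
print(round(w * resolution, 3))  # ~0.042: value gap at which overlap drops to zero
```

So with these settings, two values only keep substantial overlap when they are within a few thousandths of each other, and lose all overlap once they are roughly 0.04 apart.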

@sheiser1 sorry, it does not help with your parameter setup. From my experience, only the parameters w and activationThreshold have a significant influence on the prediction performance.

In some discussion thread, @sheiser1, you said you are working intensively on sequence prediction. Do you use the SDRClassifier for this purpose?

@sheiser1 for n=1024, w=21 you should have 1024 - 21 + 1 = 1004 buckets (why 49 buckets?), so two neighboring buckets have enough overlap by my calculation.

There are 1004 overlapping buckets, yes; I meant 49 distinct buckets. So the input 0.0 activates 21 bits that are separate from those of the input 0.04.

Just to make sure it's clear, by 'distinct' I mean encodings like these, which have no overlap:
1111000000000000000000000000000000000000 --> -1.0
0000111100000000000000000000000000000000 --> -0.96
…
0000000000000000000000000000000011110000 --> 0.96
0000000000000000000000000000000000001111 --> 1.0

To the spatial pooler, this makes -1.0 and -0.96 as distinct from each other as the categorical inputs 'A' & 'B'.

@sheiser1 understood, regarding the term "distinct". Again, reducing n from 1024 to 240 cannot solve the prediction failure.
But I really do not understand why, in my case, the TM cannot predict the same bucket in the same context. If I let the TM learn a little longer, then the same problem happens with another bucket.

Neither n nor w matters on its own, only the ratio between them, because that sets the granularity with which the encoder sees the data.

So if you prefer to keep n at 1024, you could raise w to ~93 for the same effect. I'd highly recommend trying this, because a very disparate encoder like n=1024, w=21 will make for little or no overlap between encodings, which the SP relies on to activate relevant columns. Have you seen the HTM School videos on the SP? They illustrate this point very well and are very watchable, in case you haven't seen them yet.

So that must mean you've tried it?

Anyone will tell you that the n/w encoding ratio can, and often does, have major effects on model performance, as I have experienced.

What would really help to diagnose this would be plotting the raw anomaly scores over time as well as the number of predicted cells. The anomaly score measures how surprised the system was by each input, and the number of predicted cells measures how precisely the system is predicting. I usually take the number of predicted cells / 40, since 40 is the default number of columns to activate when the system makes one prediction.

Mostly I've been using anomaly detection to do sequence classification. This hasn't involved the SDRClassifier, though as I understand it, the SDRClassifier maps a set of predicted cells to a set of encoding buckets, ranking them from most strongly to least strongly predicted based on which cells are predicted and how many there are.

In my opinion, I'd try using something other than a standard sine wave to test out forecasting, maybe sin + cos to make the patterns more intricate, and adding some noise too.

I think what'd really shed light on this would be to plot the raw data with the predicted values on one chart, then the raw anomaly scores on another, and the number of predicted cells / 40 on a third.
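Something along these lines, for example (a rough sketch assuming htm.core; the exact calls for reading the anomaly score and the predictive cells may differ between versions):

```python
import math
import matplotlib.pyplot as plt

from htm.bindings.algorithms import TemporalMemory
from htm.encoders.scalar_encoder import ScalarEncoder, ScalarEncoderParameters

# Stand-in pipeline just to show the bookkeeping; swap in your own encoder/TM.
p = ScalarEncoderParameters()
p.minimum, p.maximum, p.size, p.activeBits = -1.0, 1.0, 1024, 21
enc = ScalarEncoder(p)
tm = TemporalMemory(columnDimensions=(p.size,))

raw, anomalies, pred_ratio = [], [], []
for t in range(1000):
    value = math.sin(2 * math.pi * t / 15)
    tm.compute(enc.encode(value), learn=True)
    tm.activateDendrites(learn=False)   # may be required before getPredictiveCells()
    raw.append(value)
    anomalies.append(tm.anomaly)        # how surprising this input was (0..1)
    # 40 = default active column count when using an SP; with the encoder wired
    # straight into the TM, dividing by w (21 here) is the closer analogue.
    pred_ratio.append(len(tm.getPredictiveCells().sparse) / 40.0)

fig, axes = plt.subplots(3, 1, sharex=True)
axes[0].plot(raw)                 # overlay the SDRClassifier's predictions here if you have them
axes[0].set_ylabel("raw input")
axes[1].plot(anomalies)
axes[1].set_ylabel("anomaly score")
axes[2].plot(pred_ratio)
axes[2].set_ylabel("predicted cells / 40")
plt.show()
```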

@sheiser1
I have just tested with w=93, but I get the same problem.
I do find your idea of plotting the number of predicted cells cool… I'll try it.

In my test, I have some combinations of sin and cos with different frequencies and amplitudes too.

Interesting, thanks for giving it a shot! Please do post what you find, I'm curious to learn from this case!

@sheiser1 here you can see my plot of the number of predicted cells for n=1024, w=21 (top plot).
In the 2nd plot, my TM is first in learning mode (the blue line is horizontal) and then in inference mode (the blue sine curve is the predicted value).
It is very interesting that the TM cannot learn one bucket (the number of predicted cells = 0), and the prediction behaviour stays the same after switching into inference mode.
I think, for sequence learning, the one unpredictable bucket can be the first or the last of the sequence, depending on when you start learning…
