TemporalMemory for prediction

Interesting! How many time steps are in each iteration of the sequence? Say it's 30, I'd show the model 30 repeats of it and make the same plot. Hopefully the number of predicted cells will settle down. Would you crank up the repeats and generate more of the same charts?

No! Note the xOffset at the bottom left of the plot: the number of learning steps was already more than 440 before inference.
Tomorrow I'd like to feed every third input data point into the encoder, so that the input data will be distinct. Let's see how the TM behaves!

Does xOffset 403 mean it learned from 403 data points? It's known that the TM can learn any sequence, but the longer the sequence, the more repetitions it needs to learn it.

What I’d do is leave learning on for at least 900 time steps, so it can see the pattern of length 30 for 30 repetitions, make the same plots and see if the number of predicted cells changes at all.
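Here's a minimal sketch of that experiment, assuming nupic's TemporalMemory API (the import path, column sets, and parameters below are illustrative, not the exact setup in this thread):

```python
from nupic.algorithms.temporal_memory import TemporalMemory

tm = TemporalMemory(columnDimensions=(2048,), cellsPerColumn=32)

# A repeating sequence of length 30: each step activates a distinct set of columns.
pattern = [list(range(i * 40, i * 40 + 40)) for i in range(30)]

predicted_counts = []
for step in range(900):                       # 30 repetitions of the length-30 pattern
    tm.compute(pattern[step % 30], learn=True)
    predicted_counts.append(len(tm.getPredictiveCells()))

# Plot predicted_counts over time; it should settle once the sequence is learned.
```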

@sheiser1 your hypothesis seems to be correct: N repetitions per pattern for a pattern of length N. Thanks for your recommendation.

To illustrate, I put a snapshot of my plots here. I'd say the TM learns sequences very quickly (in my test, after seeing a complete sequence twice), but to make a stable prediction it needs to see the sequence N times.



@sheiser1 as you wished in your last post, you are interested in TM prediction for a complex wave. Here is the result (top plot: number of predicted cells; bottom plot: red for the input, blue for the one-step-ahead predicted value).


I ran the same experiments across different runs and found that the TM behavior is totally different: sometimes it learns very quickly, sometimes very slowly. It depends on how many buckets of the input space get encoded: the more buckets appear, the faster the learning.

I will try to study it more, but currently I really do not understand this phenomenon with continuous data. I think there is no problem with discrete, fixed sequences.

Does anyone have any idea?


By this do you mean that more granular encodings make for faster TM learning by helping to distinguish more finely between inputs?

@sheiser1 I can explain as follows:
I use the same parameters in all experiments with the sine wave.
But in one run, after 100 time steps, my encoder had output at most 12 buckets, and no new bucket appeared for a very long time afterwards.
In this case, the TM cannot make predictions in every 20-time-step window.
But in some runs, my encoder produces up to 16 different buckets within 200 time steps. In those cases, after 300 time steps the TM can always predict some cells. It works perfectly!

I found this phenomenon today and have not had time left to understand why.

I don't quite follow; what exactly is changing? Is it the number of buckets contained in the encoder, i.e. 16 vs. 12?

The bucket indices input into the SP-TM look like this:
after 10 time steps:
256 268 302 357 425 501 502 578 646 701 735 747
after 50 time steps:
256 268 302 303 357 425 501 502 578 646 701 735 747
after 100 time steps:
256 268 302 303 357 425 501 502 578 646 700 701 735 747
after 500 time steps:
256 268 302 303 357 425 501 502 577 578 646 700 701 735 747
after 900 time steps:
256 268 302 303 357 425 426 501 502 577 578 646 700 701 735 747

and after 500 time steps, the TM can predict well…
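For anyone wanting to reproduce this kind of log, here is a sketch of tracking the growing set of bucket indices, assuming nupic's ScalarEncoder and that these numbers are its bucket indices (as clarified below); the parameters and the sine sampling are illustrative:

```python
import math
from nupic.encoders.scalar import ScalarEncoder

enc = ScalarEncoder(w=21, minval=-1.0, maxval=1.0, n=1024, forced=True)
seen = set()

for t in range(901):
    value = math.sin(2.0 * math.pi * t / 30.0)   # sampled sine wave
    seen.add(enc.getBucketIndices(value)[0])
    if t in (10, 50, 100, 500, 900):
        print(t, sorted(seen))                   # watch new indices appear over time
```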

I am confused about this conversation, so let’s try to talk about and define some terms. This will ensure we are talking about the same things.

@thanh-binh.to What you are calling “buckets” may not be the things we are talking about. You seem to be using them as indices of the input space, but that’s not a bucket. A bucket is a set of indices that can represent many values.

Do these numbers represent some index to a bucket you’re tracking? Or are they actually directly indexed to the input space?
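To make the distinction concrete, here is a quick sketch, assuming nupic's ScalarEncoder (parameters illustrative): a bucket index names a whole range of input values, while the encoding is the set of active bit indices.

```python
from nupic.encoders.scalar import ScalarEncoder

enc = ScalarEncoder(w=21, minval=0.0, maxval=1.0, n=1024, forced=True)

print(enc.getBucketIndices(0.42))     # a single bucket index for this value
print(enc.encode(0.42).nonzero()[0])  # the w=21 active bits of its encoding
```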


@rhyolight exactly, it is the bucket index used in the ScalarEncoder.


In a real application with continuous learning, two scalar values that are very close together can belong to two bucket indices or to a single one. In my observation, if at any time instant they belong to two different bucket indices, then the TM is able to predict more cells. Currently I do not understand this and want to discuss it with you all.


If they belong to two buckets, does that mean the encoder creates an encoding with twice as many bits? If the TM got that, it would have more predictions because it would be predicting two or more sequences at once.


@rhyolight no, the encoder has a constant number of active bits. Imagine that the encoder resolution is 0.5; then the input values 0.4994 and 0.5002 belong to two different bucket indices, say 1 and 2. In my sine-wave application, the wave is sampled with period T, so that for a long time the sampled values stay in bucket index 1, because they lie in [0, 0.5). Then at some time step a value is slightly over 0.5, say 0.5002, the encoder generates a new bucket index, and so on.
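A toy illustration of this resolution effect; the bucket formula below is the usual fixed-resolution computation, stated here as an assumption rather than the exact encoder internals:

```python
resolution = 0.5

def bucket_index(value, minval=0.0):
    return int((value - minval) / resolution)

print(bucket_index(0.4994))  # stays in the lower bucket, since the value is in [0, 0.5)
print(bucket_index(0.5002))  # crosses 0.5, so a new bucket index appears
```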


This seems really important. If I understand right, the encoder is too coarse if it's grouping too many different raw inputs into the same buckets. That effectively compresses elements out of the sequences.

So the TM may be essentially seeing:

‘A,A,A,A,A,A’
when there exists a pattern at a lower granularity, like
‘A1,A2,A3,A1,A2,A3’

If the encoding space is more fine-grained, it'll make more buckets and create more chances for new subsequences to emerge. This will also open the door to more noise and to obscuring longer-term patterns (losing the forest for the trees, basically).
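A toy demonstration of the coarse-vs-fine point, using simple resolution-based bucketing (an assumption for illustration, not any specific NuPIC encoder):

```python
def bucket(value, resolution):
    return int(value / resolution)

sequence = [0.10, 0.30, 0.45] * 4            # repeating pattern of three raw inputs

print([bucket(v, 0.50) for v in sequence])   # coarse: [0, 0, 0, ...] -> 'A,A,A'
print([bucket(v, 0.15) for v in sequence])   # fine:   [0, 2, 3, ...] -> 'A1,A2,A3'
```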
