As part of trying to understand the confidence/probability value returned along with a prediction, I have come across a curious situation that I’m wondering if anyone can help explain.
After a suitable run-in period, this generates the 1-step ahead prediction (upper axis) at the specified confidence level (lower axis) shown in the image below.
I don’t understand why the confidence is consistently high for predictions that are rising (from below 0.5 towards 1) and then going over the top of the sine wave, whereas the confidence for predictions that are falling (from -0.5 towards -1) and then going around the bottom of the sine wave can be low.
Why does the CLA have more difficulty making a prediction in one region than another when the symmetry of the data would indicate (at least naively) that there is no difference between the regions?
The sine wave is perfect periodic input, but every time it sees a cycle, the TM cannot distinguish between seeing one repetition of the pattern, or two, or three or 1000. It never knows to close the period and see that it’s just a simple pattern repeating. So the confidence will never settle.
Try resetting the TM once per period and see what the confidence looks like. (I assume the confidence is some form of anomaly likelihood?)
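For anyone wanting to try the reset suggestion, here is a minimal sketch. It assumes `model` is an OPF model like the one the hotgym example builds, that it exposes `run(record)` and `resetSequenceStates()` (worth checking against your NuPIC version), and that the period length in samples is known. The helper name `run_with_resets` and the `points_per_cycle` parameter are made up for illustration.

```python
def run_with_resets(model, records, points_per_cycle):
    """Feed records to the model, resetting sequence state at each period start.

    `model` is assumed to be an OPF-style object with run() and
    resetSequenceStates(); nothing else about it is used here.
    """
    results = []
    for i, record in enumerate(records):
        # Reset at the first sample of every period except the very first one.
        if i > 0 and i % points_per_cycle == 0:
            model.resetSequenceStates()
        results.append(model.run(record))
    return results
```

Comparing the confidence trace from this loop with the one from a plain loop without resets should show whether the never-closing sequence is what drives the fluctuation.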
That would make sense if the confidence was consistent (either always good or always bad or always jittery).
However I don’t see how it explains the observed behavior.
If it were true, then at the start of the pattern (from, say, midnight to 6am, i.e. the rise of the sine at the beginning of the day) it shouldn’t be able to distinguish between 1, 2 or 1000 patterns either, and the confidence in that region would be bad too. Yet it isn’t.
I’m still trying to better understand why the prediction confidence is not symmetric with the above sine-wave example.
The link given by @marty1885 is very interesting in and of itself, but it’s not clear that the sine-wave example fits into the problem described there. Specifically, the sine-wave example, since it is based on the hotgym example, does use time encoding, and as with hotgym the data is tagged with time of day and day of week.
Unfortunately, resetting the TM every period doesn’t make sense either. It would be possible with the sine-wave example, but not for the practical problem that this would feed into.
But even if those things applied, they don’t explain why the prediction confidence would be asymmetric. (They might explain why it is good or bad, but not why it’s asymmetric.)
I’m still looking for suggestions on why the asymmetry exists.
There are two things being encoded in your example: sine & time. There is one sine wave per time period (day). What if you increased it to be several waves per period? Or 1.5 waves per time period? I would be interested to see how the prediction confidence responded over time to the dual resolving semantics. I’m not sure if it would answer any questions, however, just curious.
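A minimal sketch of how the test data for that experiment could be generated, assuming hotgym-style records of (timestamp, value). The function name `sine_records`, the parameter names, the start date, and the default of 100 points per day are all illustrative assumptions, not taken from the original script.

```python
import math
from datetime import datetime, timedelta

def sine_records(days, points_per_day=100, waves_per_day=1.0,
                 start=datetime(2020, 1, 1)):
    """Yield (timestamp, value) pairs for a sine sampled evenly through each day."""
    step = timedelta(seconds=86400.0 / points_per_day)
    for i in range(days * points_per_day):
        # waves_per_day controls how many full sine cycles fit in one day.
        phase = 2.0 * math.pi * waves_per_day * i / points_per_day
        yield start + i * step, math.sin(phase)

# 1.5 waves per day: the sine only lines up with the time of day every 2 days.
records = list(sine_records(days=2, waves_per_day=1.5))
```

With `waves_per_day=1.5` the same time of day alternates between two different sine phases, so the time encoding and the value encoding no longer agree on a single daily pattern.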
The only thing that comes to my mind is perhaps the encodings for the values not being granular enough. Just intuitively, I would have expected the confidence to be lowest around the middle of the upward slope and middle of the downward slope, and for confidence to be highest near the peaks and valleys. Your graph is more what I would intuitively expect if there were a lot of overlap in the encodings between (-1, 0.5), and less overlap between those encodings and the ones between (0.5, 1). I hadn’t considered @rhyolight’s observation about time also being part of the encoding, so I’ll ponder on it some more, taking that into account.
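The overlap idea can be checked with a toy encoder. The sketch below is not NuPIC's encoder: it is a simplified stand-in that maps a value to a contiguous run of `w` active bits out of `n`, and the values of `n`, `w`, and 100 samples per cycle are assumptions chosen for illustration.

```python
import math

def toy_encode(value, n=100, w=21, vmin=-1.0, vmax=1.0):
    """Return the active bit indices: a contiguous run of w bits out of n."""
    clipped = min(max(value, vmin), vmax)
    frac = (clipped - vmin) / (vmax - vmin)
    first = int(round(frac * (n - w)))
    return set(range(first, first + w))

def overlap(a, b):
    """Number of active bits shared by the encodings of a and b."""
    return len(toy_encode(a) & toy_encode(b))

points = 100
values = [math.sin(2 * math.pi * i / points) for i in range(points)]

# Overlap between each sample and the one that follows it.
step_overlap = [overlap(values[i], values[(i + 1) % points])
                for i in range(points)]

print("zero crossing:", step_overlap[0])
print("peak:         ", step_overlap[25])
print("trough:       ", step_overlap[75])
```

Under these assumptions consecutive samples overlap more near the peak and trough than near the zero crossings, and the peak and trough behave identically. So a uniform encoder like this one would not by itself produce a top/bottom asymmetry; a non-uniform range or resolution might.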
It’s tempting to still see patterns in the confidence for these images. In particular, the predictions seem to have better confidence just after the peaks/troughs and worse confidence near 0. This is consistent with there being a higher density of points near the peaks/troughs than near 0.
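The density claim is easy to check numerically. This sketch counts how many samples of one evenly sampled sine cycle fall into bands of |value|; the 100 samples per cycle and the band width of 0.25 are assumptions for illustration.

```python
import math
from collections import Counter

points = 100
values = [math.sin(2 * math.pi * i / points) for i in range(points)]

def band(v, width=0.25):
    """Index of the |value| band that v falls in; the top band includes 1.0."""
    last = int(1.0 / width) - 1
    return min(int(abs(v) / width), last)

counts = Counter(band(v) for v in values)
for b in sorted(counts):
    print("%.2f <= |v| < %.2f: %d samples" % (b * 0.25, (b + 1) * 0.25, counts[b]))
```

The band nearest the peaks/troughs collects noticeably more samples than the band nearest 0, because the sine changes slowly where it turns around.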
They’re fresh off the press, so I haven’t decided how to interpret them properly yet.
Thanks for running this experiment, Phil. How many data points have these models seen at the point where you graphed them?
It might be helpful to plot the anomaly scores and likelihoods in addition to (or instead of) your confidence value. I’m not sure what extra calculations you are doing, but the raw anomaly score can sometimes be enlightening in situations like this.
I hate to add yet another experiment parameter to this, but you might also add a jitter to the sine wave (random noise), which makes it more natural. I suspect the system might perform a bit better with added noise like this, since it forces it to generalize rather than memorize the same locked-in pattern over and over.
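A minimal sketch of the jitter idea, assuming Gaussian noise added to each sample. The function name `noisy_sine`, the noise level, and the fixed seed are illustrative choices; the seed keeps runs repeatable so the experiment stays comparable between runs.

```python
import math
import random

def noisy_sine(points_per_cycle=100, cycles=10, noise_std=0.05, seed=42):
    """Yield sine samples with zero-mean Gaussian jitter added to each one."""
    rng = random.Random(seed)  # private generator, so results are repeatable
    for i in range(points_per_cycle * cycles):
        clean = math.sin(2 * math.pi * i / points_per_cycle)
        yield clean + rng.gauss(0.0, noise_std)

values = list(noisy_sine())
```

Note that the noisy values can fall slightly outside [-1, 1], so the encoder's min/max range may need widening to avoid clipping.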
All the images are for a model that has seen about 16000 data points, corresponding to about 160 sine wave cycles.
I was deliberately avoiding adding noise to remove randomness from the problem (although I have wondered whether some randomness would improve the learning.)
We see stuff like this when doing integer math on MPUs. I don’t know the code and am totally uninterested in digging through nested OOP crap but this problem looks strangely familiar.
Problems with signed vs absolute values come to mind.
Where this gets to be a real problem is comparisons.
Another common and related problem is shifting frames of reference during calculations, i.e. picking the wrong intermediate value for further calculations.
Two other perennial problems are: a) Getting scaling wrong. Note that the error seems to vary with the absolute value of the sine wave. If the calculation were using this input to establish the scaling of the value, the error would look something like what is shown. b) This is based on a time-varying value; using the “before” in one part of the calculation and the “after” in a different part can be maddeningly difficult to find. This is an insidious version of the wrong-frame problem.
Adding two cents that prolly do NOT apply to this problem.
Picking the wrong stride for the measurement can produce “chunky” calculations. The delta between readings is the dT and dV part of your slope calculation. If this is either aliasing with the sampling of the raw data (unlikely in this case) or too small for the representation (integer math), it can cause problems.
Yes, the confidence values are the ones that are fluctuating abnormally, for no apparent reason.
Yes, the first plot shows the prediction (in blue), the confidence (in orange) and the anomaly score (in red). The anomaly score is the red line across the center, mostly 0 with tiny blips here and there.