We’ve been playing with the anomaly detection application of the HTMPrediction model to look for anomalies in sinusoid signals. I used swarming to create model_params.py, with 1-step prediction based on a perfect sine input sampled at 50 samples per period.
The model’s performance is almost perfect after manually changing ‘clParams’ ‘alpha’ from 0.09663 to 0.001 or smaller. It correctly detects glitches added to the signal and predicts the next sample accurately; the biggest errors are at the peaks and troughs.
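For reference, this is the shape of the change in model_params.py. The key names (‘modelParams’, ‘clParams’, ‘alpha’) follow NuPIC’s OPF conventions, but the excerpt below is an illustrative placeholder, not my actual swarm output:

```python
# Hypothetical excerpt of the swarm-generated model_params.py.
# Only the alpha change is the point; surrounding values are placeholders.
MODEL_PARAMS = {
    'modelParams': {
        'clParams': {
            # Swarming produced alpha = 0.09663; lowering it to 0.001
            # (or smaller) smooths the classifier's learning rate and
            # gave near-perfect predictions on the 50-samples/period sine.
            'alpha': 0.001,
        },
    },
}
```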
However, the model is very sensitive to the number of samples per period. With fewer than 30 or more than 200 samples per period, the model can’t predict the sinusoid. Regenerating model_params.py with a different number of samples per period, using a longer or shorter input, or varying alpha doesn’t seem to make any difference.
Does the HTM model have an innate sequence length (memory) that it works best with? We’ve tried doubling many of the tmParams and spParams without any improvement in results.
We’ve also found that, for the case that works well, a better anomaly metric is a simple error value = abs(best prediction - actual value), instead of using the anomaly_score output, which has high values at the start during learning but also spurious large values at unexpected moments.
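A minimal sketch of that metric, using plain numbers rather than the OPF result objects (in the real script the predicted value would come from the model’s best prediction output; the 0.1 threshold is an arbitrary illustrative choice):

```python
import math

def prediction_error(predicted, actual):
    """Simple anomaly metric: absolute difference between the model's
    best prediction and the actual value."""
    return abs(predicted - actual)

# Toy demonstration on a sine with one injected glitch: a model that
# predicted the clean signal perfectly would flag only the glitch.
samples_per_period = 50
clean = [math.sin(2 * math.pi * i / samples_per_period) for i in range(100)]
observed = list(clean)
observed[60] += 0.5  # injected glitch

errors = [prediction_error(p, a) for p, a in zip(clean, observed)]
flagged = [i for i, e in enumerate(errors) if e > 0.1]
print(flagged)  # only the glitched sample is flagged
```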
It’s my first time posting on this forum, so forgive me if this is not the correct location.
Is there a way to share some snapshots, etc.?
Just spitballing without plugging in actual numbers - it may not be the length of the sequence.
The size of the step may be too large or too small for your SDR-ifier to capture the differences between samples.
Also, are your step sizes an integer multiple of your waveform?
Thanks for your reply. I generated some sinusoids in Python where the step sizes are an integer multiple of the waveform, and I’ve tried other sinusoids that aren’t. The results look largely the same.
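For concreteness, this is roughly how I generate the test signals (a sketch; function name and amplitude default are mine, not from any library):

```python
import math

def make_sine(samples_per_period, n_samples, amplitude=1.0):
    """Generate a sampled sinusoid. When n_samples is an integer
    multiple of samples_per_period, the series covers whole cycles
    exactly; a fractional period makes each cycle's samples drift."""
    return [amplitude * math.sin(2.0 * math.pi * i / samples_per_period)
            for i in range(n_samples)]

whole_cycles = make_sine(50, 500)    # 10 complete periods
fractional = make_sine(50.5, 500)    # period is not a whole number of steps
```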
What determines the size of the SDR-ifier? I think in my case it’s 2048, based on ‘columnCount’ and ‘inputWidth’; I’ve tried modifying these values too…
[Image: working well with 50 samples per period]
[Image: not working well, 1000 samples per period (same model)]
[GIF: sequence showing results for different samples per period]
[Image: average error versus samples per period]
I don’t know the inner workings of that code so I am out of suggestions.
My guess is there’s too high an overlap between samples at low frequencies and too low an overlap at high frequencies.
Low overlap is relatively easy to understand: there’s a high chance that samples in the next cycle fall somewhere in between the previous cycle’s samples, with similar but not matching values.
High overlap results in very long sequences to memorize for each cycle, which might exceed the temporal memory’s capacity, so it struggles to keep track.
I guess you can test the two assumptions above by adjusting the encoder’s output sparsity.
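Here’s a toy way to see the overlap effect without running HTM at all. The encoder below is a stand-in I wrote for illustration (not NuPIC’s ScalarEncoder; n, w, and the value range are arbitrary choices): it measures how many active bits two consecutive sine samples share near the zero crossing, where the signal changes fastest.

```python
import math

def encode(value, n=400, w=21, vmin=-1.0, vmax=1.0):
    """Toy scalar encoder: w contiguous active bits out of n, positioned
    by the value's place in [vmin, vmax]."""
    frac = (value - vmin) / (vmax - vmin)
    start = int(round(frac * (n - w)))
    return set(range(start, start + w))

def consecutive_overlap(samples_per_period):
    """Shared active bits between the encodings of two consecutive
    samples starting at the zero crossing."""
    a = encode(math.sin(0.0))
    b = encode(math.sin(2.0 * math.pi / samples_per_period))
    return len(a & b)

print(consecutive_overlap(50))    # coarse sampling: little or no overlap
print(consecutive_overlap(1000))  # fine sampling: nearly identical SDRs
```

With these parameters, 50 samples per period shifts the active bits clear past the encoder’s width, while 1000 samples per period moves them by only about one bit, which matches the low/high-overlap intuition above.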
Having this done as an internal feedback loop would be cool.
I mean having the algorithm “seek” the lowest anomaly by slowly controlling its upstream SDR source (encoder, spatial pooler, etc.) to either increase or decrease sparsity.
A cycle/rhythm detector, as discussed here, could be useful to figure out in which “direction” the sparsity should be adjusted.
Unless I miss my guess, the problem with most attempts to model continuously varying functions with the temporal memory algorithm is that there is currently no proper way to encode the rate at which the independent and dependent variables are changing together. That is to say, the sequence memory is storing discrete transitions of both variables, let’s say x and y, but is not actually creating a model of that transition.
In order to generalize to variations in step size, the network would have to be exposed to multiple data sets with varying step sizes, and it would have to store these as unique sequences if the SDR encodings of either value changes from one transition sequence to the next. Sure, the SDR overlap may be able to generalize a little bit and make a decent guess at the next value, but it is in effect just playing back previously recorded sequences. If there is a significant deviation in sampling rate, or the encoder was not sufficiently sensitive to resolve the subtle change in input, then it will not be able to generalize to the new data.
What I would propose is to find a way to encode not just the discrete samples, but also their derivatives. I would even go so far as to say that the encoder should be dedicating more bits to storing the changes in the sensed variables (delta-coding), than the actual variables themselves. You will also need to anchor your representation to absolute values, but to make accurate predictions, especially when the sampling rate may change, you need to encode how the values are changing from one sample to the next.
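A sketch of that delta-coding idea, using the same kind of toy encoder as above (all names, widths, and the delta range are illustrative assumptions, not a worked-out design): concatenate an encoding of the absolute value (the anchor) with an encoding of its step-to-step change, so the same value sampled at different rates yields distinguishable representations.

```python
import math

def encode(value, n=200, w=11, vmin=-1.0, vmax=1.0, offset=0):
    """Toy scalar encoder (w contiguous bits of n), shifted by `offset`
    so several fields can share one SDR without colliding."""
    value = max(vmin, min(vmax, value))
    frac = (value - vmin) / (vmax - vmin)
    start = int(round(frac * (n - w)))
    return {offset + b for b in range(start, start + w)}

def encode_with_delta(value, prev_value, n=200):
    """Delta-coding sketch: the first n bits encode the absolute value,
    the next n bits encode the change since the last sample. The delta
    range below is an assumed tuning choice."""
    value_bits = encode(value, n=n)
    delta_bits = encode(value - prev_value, n=n, vmin=-0.2, vmax=0.2, offset=n)
    return value_bits | delta_bits

# The same absolute value reached with different step sizes now yields
# different SDRs, because the delta half of the encoding differs.
v = math.sin(2 * math.pi / 50)
coarse = encode_with_delta(v, 0.0)        # big step up to v
fine = encode_with_delta(v, v - 0.001)    # tiny step up to v
```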
In theory, this is where a grid cell module would come in handy. The GCM should be able to path integrate a continuously varying function by occasionally anchoring to an absolute value, but frequently updating that value with a model of the instantaneous changes.
I wonder whether some kind of reservoir between (or alongside) the encoder and TM would bring that… insight into the recent past.
A reservoir can be a bunch of randomly connected neurons without learning or boosting.
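Something like this echo-state-style sketch (sizes, weight scales, and the class itself are arbitrary illustrative choices; no claim this is how it should be wired into HTM):

```python
import math
import random

class Reservoir:
    """Fixed random recurrent layer: no learning, no boosting, just a
    state that mixes the current input with a decaying trace of recent
    inputs, which a downstream TM could consume."""

    def __init__(self, n_res=50, seed=42, input_scale=1.0, rec_scale=0.02):
        rng = random.Random(seed)
        self.w_in = [rng.uniform(-input_scale, input_scale) for _ in range(n_res)]
        self.w_rec = [[rng.uniform(-rec_scale, rec_scale) for _ in range(n_res)]
                      for _ in range(n_res)]
        self.state = [0.0] * n_res

    def step(self, x):
        """Advance one time step; tanh keeps the state bounded, and the
        small recurrent weights keep the echo of the past from blowing up."""
        new_state = []
        for i in range(len(self.state)):
            rec = sum(w * s for w, s in zip(self.w_rec[i], self.state))
            new_state.append(math.tanh(self.w_in[i] * x + rec))
        self.state = new_state
        return self.state

res = Reservoir()
for i in range(100):
    state = res.step(math.sin(2 * math.pi * i / 50))
```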
Thank you for your insightful replies. You linked to an interesting discussion on detecting long temporal patterns - cycles and rhythms. I get the feeling that using the temporal memory algorithm to model a long period sinusoid is not currently a great fit.
I don’t have enough knowledge of the workings of HTM to be able to test some of the modifications you proposed. However, I can adapt the sampling rate of my input signal to try to keep it in the region that works well…