I am using HTM to implement anomaly detection on my data.
According to my understanding:
HTM continuously learns the patterns in the data and predicts the next input. It produces an anomaly score by measuring how far the actual input deviates from the model's predicted input.
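To make the description above concrete, here is a minimal sketch of how a raw anomaly score can be computed from one time step. It assumes the usual HTM formulation, where the score is the fraction of currently active columns that were not predicted at the previous step. The function name and the plain-list inputs are illustrative only and are not the NuPIC API.

```python
def raw_anomaly_score(active_columns, predicted_columns):
    """Fraction of active columns that the model failed to predict.

    active_columns: indices of columns active at this time step.
    predicted_columns: indices of columns predicted at the previous step.
    Returns a value in [0, 1]; 0 means fully predicted, 1 means fully unexpected.
    """
    active = set(active_columns)
    if not active:
        # No active columns: nothing to be surprised about (assumed convention).
        return 0.0
    predicted = set(predicted_columns)
    overlap = len(active & predicted)
    return 1.0 - overlap / float(len(active))


# Half of the active columns were predicted, so the score is 0.5.
score = raw_anomaly_score([1, 2, 3, 4], [1, 2, 9])
```

Because the score only depends on the current step's overlap, a single time series can easily alternate between high and low scores from one point to the next.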
and I think:
When the data is abnormal, the network still updates its synaptic connections.
The idea comes from:
The given data contains a sequence anomaly, but the HTM results mark some points in this sequence as abnormal and the other points as normal. I guess this is caused by the behavior described above.
If so, can a point that is considered abnormal be fed back to the HTM model so that the synaptic connections are restored to their values from the previous step?
Any help will be appreciated.
Yes, you are correct. But in normal operation, anomalies should occur very rarely, so rarely that HTM has time to recover from the mis-learning by updating synaptic strengths over the sequences that follow. Continuing to learn while an anomaly happens also helps HTM adapt gradually to new patterns and distribution shifts.
And that leads to your question about restoring the synaptic connections.
That is not doable in NuPIC or HTM.core, as the design doesn't allow it. But other HTM implementations (like mine) do.
Thank you for your reply, I understand what you mean.
However, I am still confused about one thing. The following is my original data. The black points are the original data, the red points are the HTM anomaly detection results, and the green points are the anomaly labels.
Although the above figure looks good, the HTM algorithm considers some abnormal points (larger values, judged from empirical knowledge) to be normal, as shown in the following figure.
Traditional methods are generally statistical: a rise in the gradient beyond a threshold is taken as the beginning of the anomaly, and the anomaly continues until the gradient falls below the threshold, which concludes it.
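A rough sketch of that kind of gradient-threshold method is below. It is only one reading of the description: it assumes the anomaly opens on a point-to-point rise larger than `threshold` and closes on a drop of more than `threshold`. The function name and the symmetric threshold are assumptions made for illustration, not a reference implementation.

```python
def gradient_threshold_segments(values, threshold):
    """Return (start, end) index pairs of segments flagged as anomalous.

    A segment opens at the first point whose rise from the previous point
    exceeds `threshold`, and closes just before the first later point whose
    drop from the previous point exceeds `threshold` (assumed closing rule).
    """
    segments = []
    start = None
    for i in range(1, len(values)):
        gradient = values[i] - values[i - 1]
        if start is None and gradient > threshold:
            start = i  # steep rise: the anomaly begins at point i
        elif start is not None and gradient < -threshold:
            segments.append((start, i - 1))  # steep fall: point i is normal again
            start = None
    if start is not None:
        # The data ended while a segment was still open.
        segments.append((start, len(values) - 1))
    return segments


# The plateau at indices 3..5 is flagged as one continuous segment.
segments = gradient_threshold_segments([1, 1, 1, 9, 9, 8, 1, 1], threshold=5)
```

Note that this approach labels every point between the rise and the fall as anomalous, whereas HTM scores each point on its own against what it predicted.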
During this period, however, the HTM algorithm considers some points normal and others abnormal.
I am not quite sure why this is happening.
Can you suggest a possible direction?
I can’t tell for sure without your code. But it looks like HTM (to be specific, the Spatial Pooler) has just started learning and has no idea what’s going on. Remember that HTM is doing real-time anomaly detection. It can’t see into the future to determine whether an anomaly is present; it has to rely on past data.
I’m currently using the Hot Gym anomaly example script as a template (https://github.com/numenta/nupic/tree/master/examples/opf/clients/hotgym/anomaly) and just made some minor changes to fit my data set.
My data set is similar to the example data set, with about 50,000 data points. The data shown in the figure above covers about 20,000 of them, so I think HTM has been learning for a long time.
So I am still confused about why HTM thinks the yellow points in the figure above are normal.
My guess is that the anomalies you highlighted are at the very beginning of the data, where the SP hasn’t been trained enough yet.
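If that is the cause, one simple workaround is to ignore the scores produced during an initial warm-up period. The sketch below assumes you already have a list of per-record anomaly scores; `warmup_records` and `threshold` are hypothetical parameters you would need to tune for your data, and none of this is part of the Hot Gym example.

```python
def flag_anomalies(scores, threshold, warmup_records):
    """Return indices of records flagged as anomalous.

    scores: anomaly score per record, in time order.
    threshold: a score at or above this value counts as anomalous.
    warmup_records: number of initial records to ignore while the
        model is still learning (hypothetical, data-dependent value).
    """
    flagged = []
    for index, score in enumerate(scores):
        if index < warmup_records:
            continue  # model not trained enough yet; do not trust this score
        if score >= threshold:
            flagged.append(index)
    return flagged


# The first two high scores fall inside the warm-up window and are skipped.
flagged = flag_anomalies([1.0, 0.9, 0.1, 0.95, 0.2], threshold=0.8, warmup_records=2)
```

This does not change what the model learns. It only keeps early, unreliable scores out of the final result.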