I’ve done a lot of work on anomaly detection lately, mostly on finite data sets, trying to find transient phenomena. In that sense I haven’t been using the continuous learning capability and have been treating HTM just like the more traditional training set & test set paradigm. Because this is not necessarily how HTM was envisioned for anomaly detection, I’ve had to go deep into the internals to understand things and make them work for this case.
I have several comments and questions that I’d like to bring up during the discussion today:
(1) There are parts of the HTM code that appear to be no longer used. These include the Anomaly Likelihood Region and the Anomaly Auto-Classifier. The latter appears to be in operation but isn’t really used in any of the examples, which prefer AnomalyLikelihoodHelper instead. Is it useful for anything?
I found that I can disable its operation by doing the following:
classifierRegion = model._getAnomalyClassifier()
(2) Can you do swarming for anomaly detection? The instrumentation appears to be broken for the TemporalAnomaly use case, since it tries to access the features above, which don’t appear to work correctly. In practice, for new data I’ve swarmed with the TemporalMultiStep inference type and then converted the result to TemporalAnomaly afterward. I also set predictionSteps to (1, 2, 3, 4, 5), which seems to give a more robust solution.
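To make the conversion concrete, here is a minimal sketch of how I switch the swarmed config over. The dict layout mirrors NuPIC’s model params from memory; the exact keys (`clParams`, `steps`) are assumptions and may differ in your version.

```python
# Sketch: convert a swarmed TemporalMultiStep config to TemporalAnomaly.
# Key names ('clParams', 'steps') are assumptions; check your modelParams.
import copy

def to_temporal_anomaly(model_params):
    """Return a copy of the swarmed params with the inference type switched."""
    params = copy.deepcopy(model_params)
    params['inferenceType'] = 'TemporalAnomaly'
    # Predicting several steps ahead seemed more robust than a single step.
    params['modelParams']['clParams']['steps'] = '1,2,3,4,5'
    return params

swarmed = {
    'inferenceType': 'TemporalMultiStep',
    'modelParams': {'clParams': {'steps': '1'}},
}
converted = to_temporal_anomaly(swarmed)
```

The deep copy keeps the original swarm output intact in case you want to compare both model types side by side.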
(3) I am also doing multi-variable anomaly detection. I don’t use time-of-day encoders or anything like that: all of my data starts at t=0 with timesteps on the order of seconds, so that kind of contextual info is not useful to me. Instead, I add multi-channel data to provide the contextual clues for anomaly detection, and I synchronize the starting point so the learned sequences are comparable across data sets.
Are there any tips and best practices for multi-variable anomaly detection? I remember watching a video by @subutai that mentioned some of these things verbally. One was avoiding correlated signals, but I think correlation might actually be desirable in an anomaly detection application, because you want to know when signals that normally move together diverge. Another was using one input as a precursor to a change in another input at the next step; I’ve found this useful in practice for triggering learning and comparing temporal sequences.
Any other advice or guidance on this issue?
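For reference, this is roughly how I combine channels: one record per timestep with one field per channel, so each field can get its own encoder. A minimal sketch; the field names are illustrative.

```python
# Sketch: merge several synchronized channels into one record per timestep.
# Field names ('pressure', 'temp') are illustrative only.

def merge_channels(**channels):
    """Zip equal-length channel lists into per-timestep record dicts."""
    lengths = {len(v) for v in channels.values()}
    assert len(lengths) == 1, "channels must be synchronized to the same length"
    names = sorted(channels)
    return [{name: channels[name][i] for name in names}
            for i in range(lengths.pop())]

records = merge_channels(pressure=[1.0, 1.2], temp=[20.0, 21.0])
# records[0] == {'pressure': 1.0, 'temp': 20.0}
```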
(4) In the absence of correlated signals, you can always create your own start signal: a unit-step function fed into a ScalarEncoder. So long as it changes one step before the beginning of the sequence, the model will learn that sequence. Otherwise, without a synchronized start signal, the anomaly score always goes high when a new sequence starts without any context.
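The start-signal trick above can be sketched as a preprocessing step: a synthetic channel that fires exactly one step before each known sequence start. Field names here are illustrative, not from any NuPIC API.

```python
# Sketch: add a synthetic 'start' channel that fires one timestep before
# each sequence begins, giving the model context for sequence starts.

def add_start_signal(values, seq_start_indices):
    """Pair each scalar with a start flag that fires one step early."""
    fire = {i - 1 for i in seq_start_indices if i > 0}
    return [{'value': v, 'start': 1.0 if i in fire else 0.0}
            for i, v in enumerate(values)]

records = add_start_signal([5.0, 7.0, 9.0, 5.0, 7.0, 9.0],
                           seq_start_indices=[0, 3])
# records[2]['start'] == 1.0  (one step before the sequence at index 3)
```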
(5) In my training set, I select only a handful of normal examples (around 8). I then train on each about 20 times in sequence, non-interleaved. At the end of each training run I call model.resetSequenceStates(), so that the end of one sequence does not get tied to the beginning of the next.
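The training regime above can be sketched as follows. StubModel is a stand-in I made up so the loop is runnable here; its run() and resetSequenceStates() methods merely mirror the shape of the NuPIC OPF model calls.

```python
# Sketch of the training regime: ~8 normal examples, ~20 passes each,
# non-interleaved, with a sequence reset after every pass.
# StubModel is a made-up stand-in for a NuPIC OPF model.

class StubModel:
    """Records calls for illustration; a real OPF model would learn here."""
    def __init__(self):
        self.records_seen = 0
        self.resets = 0
    def run(self, record):
        self.records_seen += 1
    def resetSequenceStates(self):
        self.resets += 1

def train(model, examples, passes=20):
    # Non-interleaved: finish all passes of one example before the next.
    for example in examples:
        for _ in range(passes):
            for record in example:
                model.run(record)
            # Reset so the end of one pass isn't chained to the start
            # of the next replay of the same example.
            model.resetSequenceStates()

model = StubModel()
examples = [[{'value': float(v)} for v in range(10)] for _ in range(8)]
train(model, examples)
# model.resets == 160  (8 examples x 20 passes)
```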
(6) How do I learn long sequences effectively? How can I learn them robustly, so that there are multiple forks, joins, and paths that can be considered non-anomalous? Since this is non-periodic data, I’ve had to stretch the modelParams to try to make this happen.
I have manually adjusted a few of these parameters, and they seem to help, although I’m not quite clear on their purpose or how they fit into temporal sequence learning. This seems to work, but can you explain how they work so that I might optimize them further?
I’ve found that a pamLength of 1 doesn’t learn very well. Increasing pamLength to 100 makes it learn very quickly, but the result is brittle and generalizes poorly.
I don’t think the ‘anomalyParams’ will help, since they are part of the anomaly classifier, which isn’t used. Or am I mistaken?
(7) Selecting the scale of the phenomena we’re interested in. Basically, you have to choose your encoders based on how sensitive you want them to be. Choose a larger ‘w’ when you want the learning to be more forgiving of variation in the values; however, it won’t be able to find high-frequency, small-scale anomalous behavior.
If you make your encoders very sensitive with a small ‘w’, the anomaly detection becomes extremely sensitive to any deviation from the training examples. In fact, it’s likely that for N training examples it learns N different sequences with no overlap between them, due to the sensitivity of the encoders. That would necessitate many more training examples, and much more training, to produce a useful anomaly detector.
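The effect of ‘w’ on sensitivity can be illustrated with a toy scalar encoder (a simplified sketch, not NuPIC’s ScalarEncoder): nearby values share many active bits when ‘w’ is wide, and almost none when it is narrow.

```python
# Toy scalar encoder sketch (not NuPIC's): w contiguous active bits out of n.

def encode(value, minval=0.0, maxval=100.0, n=100, w=3):
    """Map a clipped scalar to a set of w contiguous active bit indices."""
    buckets = n - w + 1
    frac = (min(max(value, minval), maxval) - minval) / (maxval - minval)
    start = int(round(frac * (buckets - 1)))
    return set(range(start, start + w))

def overlap(a, b):
    return len(a & b)

# Nearby values share many bits with a wide w (forgiving) ...
wide = overlap(encode(10, w=21), encode(11, w=21))
# ... and almost none with a narrow w (sensitive).
narrow = overlap(encode(10, w=3), encode(11, w=3))
# wide == 20, narrow == 2
```

High overlap is what lets the temporal memory treat slightly different runs as the same sequence; near-zero overlap is what makes every run look novel.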
Any advice on encoder selection for anomaly detection? I did try DeltaEncoder as well, but discovered that there was an AdaptiveScalarEncoder underneath, which made it difficult to understand its impact when examining the SDRs. Instead, I just computed the difference between steps and fed that value into another ScalarEncoder.
Of course, the first difference can be quite noisy and have high variance between runs. However, it can work well if you filter for the magnitude of differences you want to look for and use that as an input.
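A minimal sketch of that manual delta approach: first differences, with changes below a chosen magnitude squashed to zero so only meaningful deltas reach the ScalarEncoder. The threshold value here is arbitrary.

```python
# Sketch: first differences with a magnitude filter, so only deltas above
# a chosen threshold (arbitrary here) are passed on to a ScalarEncoder.

def filtered_deltas(values, min_magnitude=0.5):
    """Return step-to-step differences, zeroing out small changes."""
    deltas = [b - a for a, b in zip(values, values[1:])]
    return [d if abs(d) >= min_magnitude else 0.0 for d in deltas]

signal = [1.0, 1.1, 1.05, 3.0, 3.1, 0.5]
result = filtered_deltas(signal)
# Only the jump up to 3.0 and the drop to 0.5 survive the filter.
```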
(8) Training on the same fixed input over and over, ad infinitum, should eventually learn the sequence completely, and the anomaly score should flat-line at zero. However, I’ve discovered situations where this is not the case: anomaly scores persist at a constant level, and the algorithm never learns to flatten them out. This is with continuous learning on and calling model.resetSequenceStates() at the end of each run. Maybe something about that function is preventing it from continually learning?
Anyway, this should be good for conversation on the livecast today.