It’s been a week since I started working with HTMs, and I am a little lost. We are working on anomaly detection for autonomous cars, and we are recording fields such as position, velocity, acceleration and predecessor distance.
Max 5 fields with a timestamp.
Here are my questions:
Q1. Is it better to work with individual models or to create a single model with multiple inputs? We are not interested in which field causes the anomaly, but it would be great if the model could take advantage of the correlation between the fields. (I was also not able to find any implementations or guides for the multi-field model, so any direction would be helpful.)
Q2. I want to create anomaly windows, but I can’t figure out the best way to do that.
I am working in Python on my local machine with CSV files, by the way.
Hi @thos1996, welcome!
With just 5 fields plus a timestamp, either way could work.
The multi-field model has the advantage of potentially finding relationships between the fields, and the single-field model has the advantage of showing different activity for each field.
I generally like the approach of multiple single-field models, where an anomaly is raised when a certain % of the fields are anomalous at the same time. I’d say it’s worth trying both, though.
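As a rough illustration of the "certain % of fields" idea, here is a minimal sketch. It assumes each single-field model has already produced an anomaly likelihood for the current time step. The function names, the 0.9 threshold and the 50% cut-off are hypothetical choices for the example, not values from any HTM library.

```python
def fraction_anomalous(likelihoods, threshold=0.9):
    """Fraction of fields whose anomaly likelihood is at or above threshold.

    likelihoods: dict mapping field name -> anomaly likelihood in [0, 1]
    for one time step (assumed to come from one model per field).
    """
    if not likelihoods:
        return 0.0
    flagged = sum(1 for value in likelihoods.values() if value >= threshold)
    return flagged / len(likelihoods)


def is_system_anomalous(likelihoods, threshold=0.9, min_fraction=0.5):
    """Raise a system-level anomaly when enough fields are anomalous at once."""
    return fraction_anomalous(likelihoods, threshold) >= min_fraction


# One time step with made-up likelihoods for four fields.
step = {
    "position": 0.95,
    "velocity": 0.97,
    "acceleration": 0.40,
    "predecessor_distance": 0.92,
}
print(fraction_anomalous(step))   # 3 of 4 fields are at or above 0.9
print(is_system_anomalous(step))
```

Both the per-field threshold and the required fraction would need tuning on your own data.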
I’d create a data structure that stores the anomaly scores/likelihoods of each field for all time steps thus far (or the last n time steps). Then craft sliding-window logic which uses that info in some way. A simple baseline could just be: ‘any model is anomalous at all times where its last n anomaly likelihood values average over 0.9’.
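A minimal sketch of that baseline, assuming you already have one anomaly likelihood per time step for a given model. The class name and the choice to stay silent until the window is full are assumptions made for the example.

```python
from collections import deque


class SlidingWindowDetector:
    """Flags a time step when the mean of the last n likelihoods exceeds a threshold."""

    def __init__(self, n=10, threshold=0.9):
        self.threshold = threshold
        # deque with maxlen drops the oldest value automatically.
        self.recent = deque(maxlen=n)

    def update(self, likelihood):
        """Add the newest likelihood and return True if the window is anomalous."""
        self.recent.append(likelihood)
        window_full = len(self.recent) == self.recent.maxlen
        mean = sum(self.recent) / len(self.recent)
        # Assumption: do not flag anything until n values have been seen.
        return window_full and mean > self.threshold


detector = SlidingWindowDetector(n=3, threshold=0.9)
flags = [detector.update(v) for v in [0.5, 0.95, 0.96, 0.97, 0.2]]
print(flags)  # only the fourth step has a full window averaging above 0.9
```

For several single-field models you could keep one detector per field, for example in a dict keyed by field name.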
Here are a couple threads about handling multiple input fields:
Best of luck!
This is the GitHub link: GitHub - tushar-1996/Anamoly-Detection-using-HTM
I have made some changes to the hotgym example. I am able to encode the data, but I can’t figure out what to use instead of “consumption” on lines 195 and 199, because I have multiple variables now.
I want to do unsupervised learning so it is okay if there is no other way to generate anomaly windows.
Also, timestamps are not included because my data has 40 values per second, so I don’t know how to encode them.
Any suggestions for improving the code are welcome. Right now I am using a multi-field model.
‘Consumption’ is the metric being predicted in the hot gym example, so in your case that would be replaced by acceleration or velocity or something – but you don’t actually need the ‘Predictor’ object at all for anomaly detection!
You can get the anomaly score, and thus the anomaly likelihood, right from the TM object. The classifier you have (the ‘Predictor’ object) is only needed to get prediction output in the raw data’s unit of measurement, but that isn’t actually used for anomaly detection with HTM, since anomaly detection is based on anomaly scores.
So basically you can skip the ‘Predictor’ and get the anomaly score from the TM. You can do this by comparing the predicted active cells from the TM to the total active cells. The proportion of active cells which were NOT predicted is the anomaly score.
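The proportion described above can be sketched in plain Python. This assumes you have two collections of indices for the current time step: what is active now, and what was predicted for this step. How you pull those out of the TM object depends on the HTM library and version you use, so that part is not shown, and the function name is hypothetical. HTM implementations commonly do this comparison at the column level, and some expose the score directly.

```python
def raw_anomaly_score(active, predicted):
    """Proportion of active indices that were NOT predicted.

    active:    indices (cells or columns) active at this time step
    predicted: indices that were predicted for this time step
    Returns 0.0 when everything active was predicted, 1.0 when none was.
    """
    active = set(active)
    if not active:
        # Assumption: with nothing active, treat the step as non-anomalous.
        return 0.0
    unpredicted = active - set(predicted)
    return len(unpredicted) / len(active)


# Four active indices, two of which (2 and 3) were predicted.
print(raw_anomaly_score([1, 2, 3, 4], [2, 3, 9]))  # half of the active set was unpredicted
```

The anomaly likelihood is a separate step on top of this raw score, so check which of the two your library hands back before thresholding at 0.9.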
Then craft sliding-window logic which uses that info in some way. A simple baseline could just be: ‘any model is anomalous at all times where its last n anomaly likelihood values average over 0.9’.
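Once each time step has a True/False flag from whatever rule you settle on, one possible way to turn those flags into anomaly windows is to merge consecutive flagged steps into (start, end) index pairs. This is only a sketch; the function name is hypothetical and the indices refer to row positions in your CSV.

```python
def flags_to_windows(flags):
    """Group consecutive True flags into inclusive (start, end) index windows."""
    windows = []
    start = None
    for i, flagged in enumerate(flags):
        if flagged and start is None:
            start = i                      # a new window opens here
        elif not flagged and start is not None:
            windows.append((start, i - 1))  # the window closed on the previous step
            start = None
    if start is not None:
        # The data ended while a window was still open.
        windows.append((start, len(flags) - 1))
    return windows


print(flags_to_windows([False, True, True, False, True]))
```

You may want to pad each window by a few steps or merge windows separated by short gaps, depending on how noisy the flags are.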