Learning from real world data (ECG Heartbeat Categorization) as anomaly detection using HTM


#1

I came across the ECG Heartbeat Categorization dataset on Kaggle and thought I might be able to approach it as an anomaly detection problem. The dataset consists of data in 5 categories: 1 for normal ECG and 4 abnormal. Since I’m approaching it as anomaly detection, I’ll train on the “normal” set of data only and use the anomaly score to decide whether a given ECG is “normal” or not.

The setup is simple. Read data from the CSV file (ptbdb_normal.csv), feed it into a scalar encoder, then send the SDR into a Temporal Memory to learn the pattern of a normal heartbeat. Then load data from a test set (ptbdb_abnormal.csv and the part of ptbdb_normal.csv not used during training). Run each ECG record through the TM and count how many anomalies it contains. Records containing a large number of anomalies are considered abnormal.
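As a rough illustration (not the code in the repo linked below), a Python sketch of this pipeline against the legacy NuPIC API might look like this; the encoder parameters, cellsPerColumn, and the 0.5 threshold are placeholders, not the values I ended up with:

```python
# Rough Python sketch of the pipeline, assuming the legacy NuPIC API.
# All parameter values here are placeholders, not tuned ones.
import csv
import numpy as np

from nupic.encoders.scalar import ScalarEncoder
from nupic.algorithms.temporal_memory import TemporalMemory
from nupic.algorithms.anomaly import computeRawAnomalyScore

NUM_COLUMNS = 512  # matches the SDR length mentioned later in the thread

encoder = ScalarEncoder(w=23, minval=0.0, maxval=1.0, n=NUM_COLUMNS,
                        clipInput=True, forced=True)
tm = TemporalMemory(columnDimensions=(NUM_COLUMNS,), cellsPerColumn=16)


def load_records(path):
    """Each CSV row is one ECG record; assumes the last column is the class label."""
    with open(path) as f:
        return [[float(v) for v in row[:-1] if v != ""] for row in csv.reader(f)]


def run_record(record, learn):
    """Feed one ECG record through the TM and return its mean anomaly score."""
    tm.reset()  # treat every record as an independent sequence
    scores = []
    prev_predicted_columns = np.array([], dtype=np.int64)
    for value in record:
        active_columns = np.nonzero(encoder.encode(value))[0]
        tm.compute(set(active_columns), learn=learn)
        scores.append(computeRawAnomalyScore(active_columns, prev_predicted_columns))
        prev_predicted_columns = np.unique(
            [tm.columnForCell(c) for c in tm.getPredictiveCells()])
    return np.mean(scores)


# Train on normal heartbeats only.
for record in load_records("ptbdb_normal.csv"):
    run_record(record, learn=True)

# Score unseen records; a high average anomaly score flags the record as abnormal.
THRESHOLD = 0.5  # placeholder; would be tuned on held-out normal data
for record in load_records("ptbdb_abnormal.csv"):
    print("abnormal" if run_record(record, learn=False) > THRESHOLD else "normal")
```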

The results are… interesting. I can get at most 65% of all abnormal records recognized as abnormal (i.e. true positives) while 10% of normal records are recognized as abnormal (false positives). I consider this a good result, given that this is really a classification problem, not anomaly detection. However, it seems that HTM can be very sensitive to hyperparameters. Set PermanenceIncrement off by 0.05 and the true positive rate drops by 20%. The same happens when ConnectedPermanence is set too low or too high. Tuning the hyperparameters can be really frustrating.
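For example, probing one hyperparameter at a time (reusing the helpers from the sketch above) would look something like this; the candidate values are arbitrary examples:

```python
# Reuses load_records/run_record/NUM_COLUMNS/THRESHOLD from the sketch above:
# probe one hyperparameter at a time and watch how the detection rate moves.
for perm_inc in (0.05, 0.10, 0.15, 0.20):
    tm = TemporalMemory(
        columnDimensions=(NUM_COLUMNS,),
        cellsPerColumn=16,
        permanenceIncrement=perm_inc,  # shifting this by 0.05 changed results a lot
        connectedPermanence=0.5,       # too low or too high also hurts
    )
    for record in load_records("ptbdb_normal.csv"):
        run_record(record, learn=True)
    scores = [run_record(r, learn=False) for r in load_records("ptbdb_abnormal.csv")]
    print(perm_inc, np.mean([s > THRESHOLD for s in scores]))  # true positive rate
```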

Result

Here is the result:
[image: results plot]

Source code: https://github.com/marty1885/heartbeat-htm/tree/master
(Please forgive my laziness in not writing a CMake/build script :stuck_out_tongue:)

Happy HTM hacking.

This is actually my presentation for an introductory ML class at my college, but I decided to also share my experiences and results here.


#2

This is great! Thanks for sharing!

I suggest you use getScalarMetricWithTimeOfDayAnomalyParams() like NAB does:

These are the best-tuned params we have found for anomaly detection on generic streaming scalar data. (Although the “time of day” part will not help you and you can remove that encoder entirely).
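Roughly, the usage looks like this (a sketch against the OPF API; module paths and the encoder key names inside the returned config can differ between NuPIC versions):

```python
# Sketch of pulling NAB's tuned anomaly params out of NuPIC's OPF framework.
# Module paths and encoder key names can differ between NuPIC versions.
from nupic.frameworks.opf.common_models.cluster_params import (
    getScalarMetricWithTimeOfDayAnomalyParams)
from nupic.frameworks.opf.model_factory import ModelFactory

params = getScalarMetricWithTimeOfDayAnomalyParams(
    metricData=[0],   # placeholder; real data is fed later via model.run()
    minVal=0.0,
    maxVal=1.0,       # range of the normalized ECG samples
)

# ECG samples carry no wall-clock timing, so disable the datetime encoders.
encoders = params["modelConfig"]["modelParams"]["sensorParams"]["encoders"]
for key in list(encoders):
    if key != "c1":   # "c1" is assumed to be the scalar value encoder here
        encoders[key] = None

model = ModelFactory.create(modelConfig=params["modelConfig"])
model.enableInference(params["inferenceArgs"])
```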


#3

Thanks! I tried the values from getScalarMetricWithTimeOfDayAnomalyParams, but they don’t work as well as the ones I ended up with.

Also, I tried to visualize what the HTM is predicting. It seems that sometimes the HTM predicts nothing (the gap on the left side of the graph). (The blue line is the value fed to the HTM, the orange dots are the HTM’s predictions, and the red line is the anomaly score.) (Parameters: SDR length = 512, density = 4.6%)
Is there any way to fix this?
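(For clarity, “predicting nothing” here means steps where the TM has no cells in the predictive state at all; with NuPIC that can be checked with something like this:)

```python
# One way to see where the TM "predicts nothing": no cell is in the predictive
# state, so the next input cannot be predicted and the raw anomaly score for
# that step will be 1.0 no matter what value arrives.
def has_prediction(tm):
    """Return True if the TemporalMemory predicts anything for the next step."""
    return len(tm.getPredictiveCells()) > 0
```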


#4

If you want to optimise the hyperparameters for your task/dataset, I wrote a script that does this using hyperopt/hyperas: https://github.com/JonnoFTW/htm-models-adelaide/blob/master/engine/vs_model/optimize_htm.py

The script should be simple to modify for your use case. You can even do it in a distributed fashion if you have access to many different machines.
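For anyone unfamiliar with hyperopt, the general shape of such a search is something like this (a generic sketch, not the contents of the linked script; the evaluation step is stubbed out):

```python
# Generic hyperopt sketch: search a space of TM parameters and minimize a loss.
# This only shows the shape of the search, not the linked optimize_htm.py script.
from hyperopt import fmin, hp, tpe

space = {
    "permanenceIncrement": hp.uniform("permanenceIncrement", 0.01, 0.2),
    "connectedPermanence": hp.uniform("connectedPermanence", 0.2, 0.8),
    "cellsPerColumn": hp.choice("cellsPerColumn", [8, 16, 32]),
}

def objective(params):
    # Stub: in a real run, train the HTM on the normal set with these
    # parameters, score the test set, and return a loss to minimize
    # (e.g. false-positive rate minus true-positive rate).
    tp_rate, fp_rate = 0.65, 0.10  # placeholder numbers, not a real evaluation
    return fp_rate - tp_rate

best = fmin(fn=objective, space=space, algo=tpe.suggest, max_evals=100)
print(best)
```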

