I came across the ECG dataset on Kaggle and thought that I might be able to approach it as an anomaly detection problem. The dataset consists of data in 5 categories: 1 for normal ECG and 4 abnormal. But since I’m approaching it as anomaly detection, I’ll be training on the “normal” set of data only and using the anomaly score to decide whether a given ECG is “normal” or not.
The setup is simple. Read data from the CSV file (ptbdb_normal.csv), feed it into a scalar encoder, then send the SDR into a Temporal Memory to learn the pattern of a normal heartbeat. Then load data from a test set (ptbdb_abnormal.csv and the part of ptbdb_normal.csv that is not used during training). Run each ECG record through the TM and count how many anomalies it contains. All records containing a large number of anomalies are then considered abnormal.
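As a rough illustration of the two ends of that pipeline, here is a minimal Python sketch of a scalar encoder and of the "too many anomalies means abnormal" rule. It is not the code from the repository: the function names, the encoder size, the number of active bits, and both thresholds are hypothetical, and it assumes the ECG samples are already scaled to the range [0, 1]. The Temporal Memory itself is left out; the anomaly scores are assumed to come from whichever HTM implementation you use.

```python
def encode_scalar(value, min_val=0.0, max_val=1.0, size=64, active_bits=8):
    """Encode a scalar as the indices of a contiguous run of active bits.

    Nearby values share most of their active bits, which is the property
    a Temporal Memory relies on. All defaults here are illustrative.
    """
    # Clamp out-of-range inputs to the encoder's range.
    value = min(max(value, min_val), max_val)
    # Number of distinct start positions for the run of active bits.
    n_buckets = size - active_bits + 1
    start = int(round((value - min_val) / (max_val - min_val) * (n_buckets - 1)))
    return list(range(start, start + active_bits))


def count_anomalies(anomaly_scores, score_threshold=0.5):
    """Count the time steps whose anomaly score exceeds the threshold."""
    return sum(1 for s in anomaly_scores if s > score_threshold)


def is_abnormal(anomaly_scores, score_threshold=0.5, max_anomalies=10):
    """Flag a record as abnormal when it contains too many anomalous steps."""
    return count_anomalies(anomaly_scores, score_threshold) > max_anomalies


if __name__ == "__main__":
    print(encode_scalar(0.50))
    print(encode_scalar(0.52))
    # Hypothetical per-step anomaly scores for one ECG record.
    print(is_abnormal([0.1, 0.9, 0.8, 0.0, 0.7]))
```

Both thresholds are knobs in their own right, so they would need tuning alongside the TM parameters.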
The results are… interesting. At best, I can get 65% of all abnormal records recognized as abnormal (i.e. true positives) while 10% of normal records are recognized as abnormal (false positives). I consider this a good result, as this is really a classification problem, not anomaly detection. However, it seems that HTM can be very sensitive to hyper-parameters. Set PermanenceIncrement off by 0.05 and the true positive rate drops by 20%. The same thing happens when setting ConnectedPermanence too low or too high. It can be really frustrating tuning the hyper-parameters.
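For anyone wanting to reproduce this kind of measurement, the true positive and false positive rates can be computed from per-record anomaly counts roughly as below. This is only a sketch: the function name, the threshold, and the counts are made up for illustration and are not taken from the actual experiment.

```python
def detection_rates(abnormal_counts, normal_counts, max_anomalies):
    """Return (true positive rate, false positive rate).

    A record is flagged as abnormal when its anomaly count exceeds
    max_anomalies. Inputs are per-record anomaly counts.
    """
    true_pos = sum(1 for c in abnormal_counts if c > max_anomalies)
    false_pos = sum(1 for c in normal_counts if c > max_anomalies)
    return true_pos / len(abnormal_counts), false_pos / len(normal_counts)


if __name__ == "__main__":
    # Hypothetical anomaly counts, one entry per ECG record.
    abnormal_counts = [12, 3, 25, 18, 7]
    normal_counts = [1, 0, 4, 11, 2]
    tpr, fpr = detection_rates(abnormal_counts, normal_counts, max_anomalies=10)
    print(f"true positive rate: {tpr:.0%}, false positive rate: {fpr:.0%}")
```

Re-running this after each hyper-parameter change makes it easier to see how much a single setting moves both rates at once.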
And here is the result
Source code: https://github.com/marty1885/heartbeat-htm/tree/master
(Please forgive me for being lazy and not writing a CMake/build script.)
Happy HTM hacking.
This is actually my presentation for an introductory ML class at my college, but I decided to also share my experiences and results here.