Explainability and HTM

Hello,

I’m interested in anomaly detection on multivariate time series. I read several papers that handle multivariate time series by using one HTM per input variable. This approach is fine as long as you consider the variables to be independent. In some cases, however, the variables in a time series have dependency relationships that should be taken into account to detect anomalies and reduce the likelihood of false negatives and false positives. On the other hand, to my knowledge, using a single HTM to detect anomalies in multivariate data implies a loss of information about the detected anomaly: it seems harder to retrieve the variable responsible for the anomaly. We could also be interested in understanding why the HTM flagged this variable as anomalous.
That’s why I was wondering whether an “explainability” module for HTM-based anomaly detection on multivariate time series is feasible. When an anomaly is detected, this module should be able to analyze the HTM in order to retrieve the anomalous variable (by detecting which neurons were activated but not predicted, and linking those unexpectedly activated neurons back to the inputs), determine the expected value of the anomalous variable (by analyzing the predicted neurons), and give the dependencies between this variable and the other variables in the time series.

What do you think about it? Is it feasible? Does this kind of solution already exist?


I think someone did this in the early days of HTM, and the code lives in someone’s Java implementation. But it never got back into NuPIC. Anyway…

Assuming you are using the standard SP + TM architecture: let P be the predicted SDR and R the SDR actually produced from the real input sent to the TM. You can calculate the not-predicted part N = (not P) AND R. Then, using N as the SP column activity, walk backwards from every active column, going through every connected synapse, back into the input space. That gives you a (possibly rough) map of the input bits that drove the unexpected activity; doing the same walk from P gives a map of what the network was expecting to see as the input.
From there, you can try to explain what’s going wrong.
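Here is a minimal sketch of that walk-back, just to make the idea concrete. It assumes (these are my assumptions, not anything from the post above) that the SP’s proximal connections are available as a boolean matrix `connected` of shape (num_columns, input_size), and that `P` and `R` are dense boolean column-level SDRs for the current timestep:

```python
import numpy as np

def unexpected_input_map(P: np.ndarray, R: np.ndarray,
                         connected: np.ndarray) -> np.ndarray:
    """Rough per-input-bit 'surprise' score.

    N = (not P) AND R  -> columns that became active without being predicted.
    Walking back through each such column's connected proximal synapses
    gives the input bits that drove the unexpected activity.
    """
    N = np.logical_and(np.logical_not(P), R)   # active but not predicted
    # Sum, over all unexpected columns, the input bits they connect to.
    # Higher score = input bit contributed to more unexpected activity.
    return connected[N].sum(axis=0)

def expected_input_map(P: np.ndarray, connected: np.ndarray) -> np.ndarray:
    """Same walk-back over the predicted columns: a rough picture of the
    input the network was expecting to see."""
    return connected[P].sum(axis=0)
```

If the encoder concatenates per-variable encodings, you know which slice of the input space belongs to each variable, so summing these scores per slice points at the variable most implicated in the anomaly.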


I take the model-per-variable approach, with an anomaly likelihood threshold for each. When enough models breach the threshold at the same time, the script reports which fields’ models were anomalous. This addresses the issue of multivariate processing, though it doesn’t account for the between-variable relationships.
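Roughly, the reporting logic looks like this (the threshold values and names here are illustrative assumptions, not the actual script):

```python
from typing import Dict, List

LIKELIHOOD_THRESHOLD = 0.9999   # per-model anomaly likelihood threshold (assumed)
MIN_MODELS_BREACHED = 2         # models that must breach at the same step (assumed)

def report_anomalous_fields(likelihoods: Dict[str, float]) -> List[str]:
    """Given this timestep's anomaly likelihood per field's model, return the
    fields whose models breached the threshold, but only when enough of them
    breach at once."""
    breached = [field for field, ll in likelihoods.items()
                if ll >= LIKELIHOOD_THRESHOLD]
    return breached if len(breached) >= MIN_MODELS_BREACHED else []

# Example for one timestep:
# report_anomalous_fields({"temp": 0.99995, "pressure": 0.5, "flow": 0.99999})
# -> ["temp", "flow"]
```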
