Hello,
I am currently working on a BA thesis on the topic “Hierarchical temporal memory for in-car network anomaly detection”. I am nearing the end of my work and have run into a few observations that I can’t explain by investigating on my own, so I hope someone here can help me out.
I am using HTM-core 2.1.15 with Python 3.7 and based my program on the hotgym example. I am analyzing four metrics (jitter, average gap between packets, bandwidth, and frequency, over 10 ms timeframes), encoded into the same SDR, and it works quite well on our TSSDN network.
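For context, the combined encoding is built roughly like this (a simplified sketch with placeholder RDSE parameters following the hotgym pattern, not my exact code):

```python
from htm.bindings.sdr import SDR
from htm.encoders.rdse import RDSE, RDSE_Parameters

def make_encoder(resolution, size=400, sparsity=0.02):
    # Placeholder RDSE settings; the real resolution/size differ per metric.
    p = RDSE_Parameters()
    p.size = size
    p.sparsity = sparsity
    p.resolution = resolution
    return RDSE(p)

jitter_enc    = make_encoder(0.1)
gap_enc       = make_encoder(0.1)
bandwidth_enc = make_encoder(1.0)
frequency_enc = make_encoder(1.0)

def encode_window(jitter, gap, bandwidth, frequency):
    """Encode one 10 ms window by concatenating the four metric SDRs."""
    parts = [jitter_enc.encode(jitter),
             gap_enc.encode(gap),
             bandwidth_enc.encode(bandwidth),
             frequency_enc.encode(frequency)]
    combined = SDR(sum(sdr.size for sdr in parts))
    combined.concatenate(parts)
    return combined
```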
One of my main confusions is that the de-/serialization functions for storing and loading the TM/SP (saveToFile/loadFromFile) produce very different results depending on whether the model is trained, stored, and loaded again (with learning turned off afterwards), or simply kept running in online unsupervised learning mode the whole time.
After many tests, the TM seems to be the main culprit: after loading, it needs a few hundred iterations before the anomaly score stops oscillating. Another observation is that the anomaly score is far less sensitive to the data after loading. Let me show a few pictures to explain.
First, an image of the anomaly score oscillating when the TM is reloaded but not given enough learning iterations before learn=False is set (forgive the many different values shadowing each other, but you should get the gist):
Now for the live vs. reloaded comparison:
This is normal learning with 3 anomalies (DoS attacks) in the data.
This is the reloaded TM/SP fed with the same data, where the TM gets 900 startup iterations with learn=True:
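For clarity, the startup phase after reloading looks roughly like this (a sketch; `encoded_stream` is a stand-in for my encoded input SDRs, and `sp_tmp`/`tm_tmp` are the reloaded instances from the loading code further down):

```python
from htm.bindings.sdr import SDR

WARMUP = 900   # startup iterations with learning enabled after reloading
scores = []

for i, encoding in enumerate(encoded_stream):
    learn = i < WARMUP                       # re-prime the reloaded TM first
    active = SDR(sp_tmp.getColumnDimensions())
    sp_tmp.compute(encoding, learn, active)  # SP: encoding -> active columns
    tm_tmp.compute(active, learn=learn)      # TM: sequence memory + anomaly
    if not learn:
        scores.append(tm_tmp.anomaly)        # raw anomaly score per timestep
```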
I have done other tests where I trained on a clean dataset with no anomalies, then reloaded and analyzed the dataset including the anomalies afterwards. This run used different parameters and showed very noisy scores on the training set, while showing adequate results after reloading. The same thing happens here: the model somehow gets desensitized after loading:
Learning on the clean dataset (the green values are the raw anomaly score, which I used for testing/comparison against tm.anomaly)
Analyzing anomalous data after reloading
I am not sure if the parameters would be of any help, since it always seems to be the same difference in sensitivity. My main question is: why is there any difference at all? Is the serialization working properly, or is something getting lost in the process?
Of course, for completeness, here are the lines that store/load the TM/SP:
storing:

```python
self.sp.saveToFile(_TEST_DIR + '/sp_' + timestring + '.tmp')
self.tm.saveToFile(_TEST_DIR + '/tm_' + timestring + '.tmp')
```
loading:

```python
sp_tmp = SpatialPooler()
SpatialPooler.loadFromFile(sp_tmp, _TEST_DIR + '/sp_' + self.parameters["application"]["model"] + '.tmp')

tm_tmp = TemporalMemory()
TemporalMemory.loadFromFile(tm_tmp, _TEST_DIR + '/tm_' + self.parameters["application"]["model"] + '.tmp')
```
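And in case it helps with reproducing the issue, this is the kind of minimal round-trip check I would expect to pass if serialization is lossless (a standalone sketch, not my actual thesis code):

```python
import os
import tempfile

from htm.bindings.sdr import SDR
from htm.bindings.algorithms import TemporalMemory

# Train a small TM on a fixed pattern, save it, reload it, and compare the
# anomaly score of the original and the reloaded copy on identical input.
tm = TemporalMemory(columnDimensions=(1024,))
pattern = SDR(1024)
pattern.randomize(0.02)          # ~2% active bits, like a typical SP output

for _ in range(100):             # brief training on the static pattern
    tm.compute(pattern, learn=True)

path = os.path.join(tempfile.gettempdir(), 'tm_roundtrip.tmp')
tm.saveToFile(path)

tm_copy = TemporalMemory()
TemporalMemory.loadFromFile(tm_copy, path)

tm.compute(pattern, learn=False)
tm_copy.compute(pattern, learn=False)
print(tm.anomaly, tm_copy.anomaly)   # should match if nothing is lost
```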