Advice For a New MS Student

anomaly-detection
research
projects

#1

Hello HTM community, this is my first ever post!

I’ve spent the past few months learning about HTM and I have a passion for brain-inspired AI computing, like the rest of you. I’ve read Jeff’s book, watched the instructional videos, installed NuPIC, etc. and I’m very humbled and impressed by the great work you guys are doing.

I want to get involved but I’m unsure where to begin. I’m planning to complete an MS thesis in the next academic year and I’d love to work on HTM, but what I lack most is a concrete problem to tackle and a path to follow, and I’m unsure where to look for inspiration. I have experience, knowledge, and interest in deep learning, data science, and computer vision, with a background in computational mathematics and computer science.

This past summer I worked as a data scientist at a computer security company in Silicon Valley, and I think HTM has great potential in security applications. An initial idea I had was to adapt HTM to build a brain that monitors temporal network traffic data and ultimately works to detect intrusions as a security measure. I imagine this would be an anomaly detection use case. Because of HTM’s temporal pattern detection, noise tolerance, and online learning properties, I think it might be able to outperform traditional DL methods. I’m unsure if this is even feasible or worth exploring, however.

Any advice would be greatly appreciated.

Best,
Brody


#2

If you have a DL background and you like a challenge, find a problem that fits both HTM and DL well (this alone is already hard), then do a comparative study, both functionally and in terms of results. It may sound simple, and there have been a couple of attempts, but as far as I am aware none is comprehensive or complete enough to satisfy the ML crowd, having read a couple myself. Even if you weren’t completely successful, we would gain a community member who has actually run these experiments, so that we are better informed about what exactly we are missing in practice, or what exactly we should underline when communicating HTM. We all have our own answers on this matter, but very little is backed by actual experiments.


#3

Thanks for your response!

That sounds like a good idea; strong comparative experimental evidence is always important in any scientific endeavor. Perhaps I could compare HTM with deep LSTMs and other DL methods on a variety of time-series anomaly detection datasets and problems.

The other thing I was considering is designing a parallelized implementation of the spatial pooler and/or temporal memory algorithm. As far as I’m aware there doesn’t seem to be much support for utilizing GPUs or multicore machines. Given the independent nature of each cell’s computation, you’d think running times could be reduced substantially with GPUs…


#4

I second the idea of solid comparisons.

GPU usage has significantly improved the runtime of the spatial pooler in my own implementations (using TensorFlow). The temporal memory is a trickier beast: a naive approach using dense matrices needs a prohibitively large amount of RAM and can’t fit into system memory, let alone VRAM. A more intelligent approach using sparse matrices is possible, but it gets less benefit from the GPU due to the scattered reads and/or writes. Still, I encourage you to see what you can do; there’s definite potential.
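To put a rough number on the dense-matrix point above, here is a back-of-envelope sketch. The dimensions are illustrative only (borrowed from commonly cited HTM defaults, not from the poster’s implementation):

```python
# Rough RAM estimate for a dense cell-to-cell connection matrix in
# temporal memory. All sizes are illustrative, not from the post.
columns = 2048
cells_per_column = 32
cells = columns * cells_per_column            # 65,536 cells total

# Dense float32 permanence matrix, every cell to every cell:
dense_bytes = cells * cells * 4
print(dense_bytes / 2**30, "GiB")             # 16 GiB for a single matrix

# A sparse layout stores only actual synapses, e.g. 128 distal
# synapses per cell, each as an (index, permanence) pair of 4 bytes:
synapses_per_cell = 128
sparse_bytes = cells * synapses_per_cell * (4 + 4)
print(sparse_bytes / 2**20, "MiB")            # 64 MiB
```

The ~250x gap between the two layouts is why the dense approach blows past VRAM while the sparse one fits comfortably, at the cost of the scattered memory accesses mentioned above.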


#5

I did a comparative study on HTM and LSTM (not deep LSTM, just 2 layers). The LSTM code (using keras) is here:

My HTM code is here:

I’ll PM you my paper if you like. It’s been submitted to IEEE Transactions on Intelligent Transportation Systems. The main result was that LSTM with online learning (periodically resetting the memory cells) outperformed HTM on most measures. HTM came closer when the distributions in the data changed over time, though; it was better than LSTM at adapting to that.


#6

Hey @Jonathan_Mackenzie, thanks for sharing your work! Oh, and I hope it hasn’t been published yet (sounds wrong, I know :slight_smile: ). I would be sad if I had somehow missed referencing your work. I looked hard for HTM vs. LSTM comparisons, and the closest I found last time I checked was done by Numenta through anomaly prediction.


#7

Can you provide a rough estimate for the performance increase if you are comfortable with it?


#8

Interesting! I’d love to read more about your experiments. It sounds like we’ll be doing similar things. I’m currently exploring potential time-series anomaly detection datasets and problems to use as benchmarks.

Periodically resetting the memory cells sounds like pseudo-online learning, or simulated online learning, if you get what I mean. If I understand you correctly, it effectively erases and creates a new network periodically to reflect new data, versus adapting an existing model to new data in real time the way HTM does. It seems like the problem itself is easier for LSTMs in that case. It’s still worth comparing, though, and I’m interested in your results.
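As a toy illustration of the periodic-reset scheme being discussed, here is a sketch in plain Python. The “model” is just an exponential moving average standing in for a stateful predictor; all names and parameters are hypothetical, not the actual LSTM setup from the paper:

```python
# Toy one-step-ahead predictor: an exponential moving average whose
# internal state is optionally discarded every `reset_every` steps,
# mimicking the "periodically erase and restart" style of online
# learning, versus adapting one state continuously. Illustrative only.

def run(series, reset_every=None, alpha=0.3):
    """Return the mean absolute one-step prediction error."""
    state = None
    errors = []
    for t, x in enumerate(series):
        if reset_every and t % reset_every == 0:
            state = None                       # erase state, start fresh
        pred = x if state is None else state   # cold start: no error
        errors.append(abs(pred - x))
        # Online update of the running state toward the new observation:
        state = x if state is None else (1 - alpha) * state + alpha * x
    return sum(errors) / len(errors)
```

Comparing `run(data)` against `run(data, reset_every=50)` on a drifting series shows the trade-off: continuous adaptation accumulates lag when the distribution shifts, while periodic resets throw away history but recover faster.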


#9

#10

Lots of data! Thank you!


#11

Sure! Depending on the size (larger is better), I get up to a 10x speedup on the forward pass that computes spatial pooler overlap. With 2000 columns and 7056 input bits the speedup is around 6x on my system, but in my experiments where each cell has its own proximal connections (so 20000x7056 matrices) it’s closer to 10x. And with a convolutional spatial pooler it’s greater still (12x+, depending on size).

With smaller input sizes (84x84 is reasonably large) it’s hard to get as much of a speedup since transfer overhead starts to dominate. But if you can justify running multiple datapoints through in a batch (parallel sequences), you can get up around 10x even with small inputs and column/cell counts.
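The batched forward pass described above boils down to a single matrix product, which is exactly the operation that maps well onto a GPU. A minimal NumPy sketch of the overlap step (column/input/batch sizes taken from the posts; the connection density, sparsity, and `k` are arbitrary choices, and boosting and learning are omitted):

```python
import numpy as np

# Sketch of the spatial pooler overlap step as one batched matmul.
rng = np.random.default_rng(0)

n_inputs, n_columns, batch = 7056, 2048, 32

# Binary proximal connection matrix: which input bits each column is
# connected to (i.e. permanence above threshold). Density is arbitrary.
connected = (rng.random((n_columns, n_inputs)) < 0.02).astype(np.float32)

# A batch of binary input SDRs, one per row.
inputs = (rng.random((batch, n_inputs)) < 0.05).astype(np.float32)

# Overlap of every column with every input in the batch: one matmul,
# the part that parallelizes well on a GPU.
overlaps = inputs @ connected.T               # shape (batch, n_columns)

# Global inhibition: keep the k columns with the highest overlap.
# (This selection step is the part that is awkward to vectorize.)
k = 40
winners = np.argsort(overlaps, axis=1)[:, -k:]
print(overlaps.shape, winners.shape)
```

Batching the inputs this way is what amortizes the host-to-device transfer overhead mentioned above: one large matmul instead of many small ones.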

Unfortunately the selection of winning columns, the learning step, and the temporal memory algorithm are much harder to speed up on the GPU (and all three are fairly awkward to vectorize with Tensorflow) so I haven’t had much speedup on those parts in my implementations. Sometimes they’re even slower, so I tend to restrict those steps to CPU only.


#12

We were initially rejected by ACM Transactions on Intelligent Systems and Technology, so this is the second attempt at publishing; I still haven’t heard back from IEEE TITS. I can PM you the paper if you like; the downside is that you can’t reference it.