HTM anomaly detection model too large for streaming analysis

I’m an intern at a company working on an anomaly detection system. I’ve been experimenting with an HTM model for a while now and used it to detect anomalies in our telemetry data. I’m trying to extend the detection system to nearly a thousand other metrics, with an individual model for each of them. However, I noticed the model size was around 8-10 MB when I saved the HTMPredictionModelProto to a file. I was planning to save a model per metric in a Cassandra backend and continuously learn from the new data point generated every minute. But it wouldn’t be practical to have a ~10 MB model for each metric (almost 80k of them). I applied compression algorithms on top of the packed proto bytes to cut the size down to 4-5 MB, which is still large for a database like Cassandra. I’m wondering if there’s another approach I could leverage to retain the state of a learned model and apply it to yet-unseen data points.
Appreciate your time and help!
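For reference, the compress-before-write step described above can be sketched as follows. The `packed` payload here is just a repetitive stand-in for real packed HTMPredictionModelProto bytes, and `compress_model`/`decompress_model` are names I made up for illustration:

```python
import zlib

def compress_model(packed: bytes, level: int = 9) -> bytes:
    # Max compression level trades CPU time for size; proto bytes with
    # repetitive structure often shrink substantially.
    return zlib.compress(packed, level)

def decompress_model(blob: bytes) -> bytes:
    return zlib.decompress(blob)

# Stand-in payload; replace with the real packed proto bytes.
packed = b"\x08\x01\x12\x04colA" * 10_000
blob = compress_model(packed)
assert decompress_model(blob) == packed and len(blob) < len(packed)
```

Generic compression tops out quickly on dense numeric data, which is consistent with only getting down to 4-5 MB; the suggestions below attack the model size itself instead.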


You’re asking if there’s a way to shrink the model further. I’ll see if others can answer that, but I’m going to examine your overall project and its assumptions, because this seems to be a systems engineering problem rather than something specific to HTM.

You’re collecting 80k metrics a minute? I worked in an environment like that a while ago… Logging at that scale already takes quite a bit of storage and bandwidth. From an application-support perspective, it may be more practical to aim smaller, targeting the critical bits of telemetry first. Not everything that CAN be captured is worth capturing.

Let’s pretend you work at a financial institution and you’re trying to prioritize which data markers to store, because the cost of storing, backing up, and maintaining all those metrics is a PITA. Sometimes programmers forget that all data needs a cradle-to-grave policy at all… In our pretend scenario, you could theoretically have a cheaper solution by running the 80k models across multiple servers (a middle-of-the-road server now has 256 cores and ~512GB RAM), saving only anomalies out to disk for humans or other systems to analyze. Depending on how much compute time a pass over your data requires per model (how many ms of compute, plus loading/saving time across the network), a few tens or hundreds of servers might be able to continuously detect and update each model once a minute in round-robin fashion (you’d distribute your models across your hosts). This would depend on how much compute your model requires.

Are you using C++ or Python? Are you able to see the size of the model while it is loaded in memory (it may be slightly larger when saved out)? You could try keeping each model in memory, each in its own thread, which would reduce the save/load time, and only occasionally dump out to disk (don’t save after every single update). Assuming each model is 10MB even in memory, the overall size is about 781GB, which could easily be spread across many servers. At that point, the real question is how much compute is required per model, so that you can distribute the load appropriately.
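Writing that back-of-envelope math out (using the figures above: 10 MB per model, 80k models, ~512 GB of usable RAM per server; all assumptions, not measurements):

```python
# Back-of-envelope capacity math from the figures above.
models = 80_000
model_mb = 10
ram_gb_per_server = 512

total_gb = models * model_mb / 1024                       # ~781 GB overall
servers_for_ram = int(-(-total_gb // ram_gb_per_server))  # ceil division -> 2

# With models spread evenly and one pass per model per minute,
# this is the per-model compute budget (ignoring network time):
models_per_server = models / servers_for_ram
ms_budget = 60_000 / models_per_server                    # 1.5 ms per model
```

RAM alone only demands two such servers; in practice the server count would be driven by that per-model compute budget, which is why the compute cost per pass is the number to measure first.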

This is assuming all the metrics you’re tracking actually matter, and that a majority of it isn’t just noise. Like I said, not every metric that can be recorded, needs to be recorded.

Especially on the large scale with computing, it still comes back to basics… RAM, Compute, Network, and Storage (and don’t forget good cradle-to-grave data policy planning). Analyze what your app is doing on its lowest levels, then scale that up and distribute.

Good luck!


Hi MaxLee,

Thanks for your valuable input. Yes, I considered those factors during the initial system design. Since there are too many metrics to consider, I built a PoC that processed around 10 metrics offline over a month’s worth of historic data to validate the anomalies detected. The system performed quite well, so we decided to extend it into a continuous online system over a larger set of metrics initially, narrowing down later. I’m building a distributed system in Python, leveraging Celery, to extend this; it’s almost entirely finished except for the step where I save the model into a Cassandra column. Saving megabytes of data there is an overload, since Cassandra isn’t optimized for retrieving rows with large columns. Hence I’m trying to understand whether I’m approaching this correctly.

Another thing when designing these systems: don’t over-engineer it. You’re trying to shove a binary blob into what is primarily a distributed wide-column NoSQL store built for many small, indexed rows. You’re trying to force the data into a box it just might not belong in.

If the organization you’re at has a decent NFS server (or several), it may honestly make more sense to mount it on your host(s) and save directly to a file, rather than incur the penalty of having middle-man DB software as part of your system… as you’ve pointed out, saving large binary files into Cassandra isn’t what it was really meant for. A proper NFS server should be mirroring itself to a backup server anyway, so you’d have your redundancy there. And if you only dump your models to disk every Nth update (pick a random N between 10 and 20 at each thread’s startup, and trigger a dump whenever the update count modulo N hits zero), you’d reduce the bandwidth and stress on your NFS server(s).
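A minimal sketch of that every-Nth-update dump policy. `ModelWorker` and `save_fn` are names I invented for illustration; `save_fn` stands in for whatever writes the serialized model to the NFS-mounted path:

```python
import random

class ModelWorker:
    """Per-thread wrapper: dump the model only every Nth update, with N
    randomized at startup so threads don't all flush in lockstep."""

    def __init__(self, save_fn, lo=10, hi=20):
        self.save_fn = save_fn              # e.g. writes to an NFS mount
        self.every = random.randint(lo, hi) # per-thread dump interval
        self.updates = 0

    def update(self, model):
        # ... run the model's compute/learn step here ...
        self.updates += 1
        if self.updates % self.every == 0:
            self.save_fn(model)
```

The randomized interval is doing the same job as jitter in any periodic-flush design: it spreads the write load over time instead of creating synchronized spikes against the file server.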

If your organization needed more help than this, I’m available for consulting ;-). I did this kind of support for several years at a large financial firm that ran distributed systems across several hundred thousand processes, with occasional expansions of up to 1.5 million processes when needed at the time I left.

Also, out of respect for Numenta, their work, and the research they’ve done and are currently doing, don’t forget to see where your project falls within their generous licensing.


We are in an experimental phase, leveraging this to see if the process works, but sure! I’ll put in a word with the team if we end up looking for consulting. Also, I’d really appreciate it if folks here could comment on my original question :).

I know I’ve heard @subutai say that distal synapses can be sampled. I think this means that before you save the TM, you could remove a percentage of the distal connections completely. You’ll have to experiment with how many you can safely prune during a save, and with how it affects your detection performance as well as the on-disk size.
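A toy sketch of such pruning, using plain lists of synapses as a stand-in for the real TM segment structures (the actual TM API differs, so treat this as pseudocode made runnable; `prune_distal` and `keep` are my names):

```python
import random

def prune_distal(segments, keep=0.7, rng=random):
    """Keep each distal synapse with probability `keep` before
    serialization; drop segments that end up empty entirely."""
    pruned = []
    for synapses in segments:
        kept = [s for s in synapses if rng.random() < keep]
        if kept:
            pruned.append(kept)
    return pruned

# Example: three toy segments with 3, 2, and 1 synapses.
smaller = prune_distal([[1, 2, 3], [4, 5], [6]], keep=0.7)
```

Since each synapse is sampled independently, the expected on-disk reduction tracks `1 - keep` directly, which makes it easy to sweep against NAB scores.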


Hi @rvsandeep,

Here is what I would try to reduce the size of the model:

  • Use the code in $NAB/nab/detectors/numenta/ as a starting point for your anomaly detector. Use NAB to test the impact of any parameter changes.
  • Try changing columnCount from 2048 down to 1536 or 1024. Note that you have to change this in both the SP and TM (they should always be the same).
  • Try changing the number of active columns from 40 down to 30 or 20. As you change this, you will want to also reduce activationThreshold, minThreshold, and newSynapseCount.
  • Try changing cellsPerColumn from 32 down to 16, 12, or 8.
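Putting those reductions together as a hypothetical set of parameter overrides (the names follow the NuPIC modelParams convention used by the NAB numenta detector; the specific values are illustrative and should be tuned against NAB, not taken as recommendations):

```python
# Hypothetical slimmed-down overrides; every value here is a guess
# to be validated with the NAB benchmark.
slim_params = {
    "spParams": {
        "columnCount": 1024,                # down from 2048
        "numActiveColumnsPerInhArea": 20,   # down from 40
    },
    "tmParams": {
        "columnCount": 1024,                # must match spParams
        "cellsPerColumn": 8,                # down from 32
        # Scale these down along with the active-column count:
        "activationThreshold": 10,
        "minThreshold": 7,
        "newSynapseCount": 15,
    },
}

# The SP and TM column counts should always agree.
assert (slim_params["spParams"]["columnCount"]
        == slim_params["tmParams"]["columnCount"])
```

Halving the column count and quartering cellsPerColumn shrinks the number of cells (and hence potential segments/synapses) by roughly 8x, which is where most of the serialized size lives.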

In addition, you will find that the model size grows slowly over time. This is because the HTM always adds new synapses. To counteract this, I’ve speculated for a while that you might be able to randomly downsample the number of synapses once you have trained on a few thousand records. Here’s what I would try first: on each iteration, keep track of the number of synapses you add (the maximum on any given iteration is newSynapseCount × the number of active columns). After the compute step, randomly choose that many synapses throughout the network and delete them. If a segment becomes empty, delete the whole segment. You might need to add some methods in the class to support this operation.
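A simplified, runnable sketch of that downsampling idea, again using plain lists in place of real segments and synapses (the real TM would need new methods for this, as noted above; `downsample_synapses` is a name I made up):

```python
import random

def downsample_synapses(segments, n_remove, rng=random):
    """Delete `n_remove` synapses chosen uniformly across all segments,
    then drop any segment left empty. Keeps total synapse count roughly
    constant if n_remove matches the synapses added this iteration."""
    # Flatten to (segment_index, synapse_index) pairs.
    flat = [(i, j) for i, syns in enumerate(segments)
                   for j in range(len(syns))]
    # Delete in reverse index order so earlier deletions
    # don't shift the indices of later ones.
    picks = rng.sample(flat, min(n_remove, len(flat)))
    for i, j in sorted(picks, reverse=True):
        del segments[i][j]
    return [s for s in segments if s]
```

Running this after each compute step with `n_remove` set to that iteration’s additions (bounded by newSynapseCount × active columns) would keep the serialized size from creeping upward.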

With all of the above, test with the full NAB benchmark to ensure the changes are not too disruptive to performance. This is slow, but the best quantitative way we know of to ensure decent anomaly detection accuracy. You’ll want to make sure you understand our anomaly detection paper, NAB codebase, and the datasets used in NAB. I think NAB is a valuable tool in debugging.

I have not tried the above, so these are my best guesses. I would be very curious to see what you find out!! Of course, doing the above will speed up the serialization process as well.

There are also many code optimization strategies that have not been implemented and can work (e.g. using smaller bitsize floating point instead of 64 bit, or going down to 8 bit integer math), but that would be more work. (A quick version of this might be to just change the Proto bit sizes. This will lead to a situation where you won’t get identical results after deserialization, but it might be “good enough”. I don’t know enough about Proto to know whether this will really work.)
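As a rough illustration of the precision idea: storing permanence-like values as 32-bit instead of 64-bit floats halves the storage with rounding error that is negligible next to typical thresholds. Whether the Proto schema actually permits this is a separate question, as noted above:

```python
import numpy as np

# Illustrative only: a synapse-permanence-sized array at 64 vs 32 bits.
perms64 = np.random.rand(100_000)       # float64 by default
perms32 = perms64.astype(np.float32)    # half the bytes

assert perms32.nbytes == perms64.nbytes // 2
# Rounding error is tiny next to typical permanence steps (~0.1):
assert float(np.abs(perms64 - perms32).max()) < 1e-6
```

Going further down to 8-bit integer math would quarter it again, but as the post says, that is real implementation work rather than a serialization tweak.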


Thank you @subutai and @rhyolight . Appreciate your detailed response and suggestions. I’ll try to integrate those changes and publish my analysis.


I’m tracking this thread, and I’ve described some ways to validate the experiment and implement synapse decay and death.
It should behave like dropout in its effect on learning (robustness), but should additionally improve model performance.