I already searched this forum for hours, but couldn’t find anything about my question (though I read tons of interesting posts).
So my first question is: how do I decide on the number of data points I should use to run a swarm? I know that the swarm is just there to find the parameters for the model.
In one post I read that I should take around 3000.
Someone asked why exactly that number, but the question remained unanswered… → see the last post at:
Maybe someone can help me with this
And also: does it matter if I take the last points, the first ones, or some random ones in between?
My second question is the following:
If there is a temporary anomaly (so the statistics, and maybe also the min and max values of the data, change), do I have to run a new swarm?
I’d say if you’re doing anomaly detection, forget about swarming! Swarming searches for a set of model hyperparameters that optimizes the model’s prediction accuracy on one chosen ‘predicted’ field, not its anomaly detection performance.
This function will return a set of hyperparameters like swarming does, though for a univariate model. For multivariate anomaly detection, my approach is to track multiple models (one for each variable) and declare a system anomaly when many of the independent models show anomalies simultaneously.
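For what it’s worth, here is a rough sketch of that “many univariate models” idea in plain Python. The `models` dict, the `get_anomaly_score` callable, the threshold, and the vote count are all placeholders for whatever your setup actually uses, not NuPIC API:

```python
# Sketch of the "many univariate models" idea described above.
# Assumes you already have one trained anomaly model per variable and a way
# to get an anomaly score/likelihood in [0, 1] from each of them; the names
# `models` and `get_anomaly_score` are placeholders, not NuPIC API.

ANOMALY_THRESHOLD = 0.9   # per-model score above which that variable is "anomalous"
MIN_AGREEING_MODELS = 3   # how many variables must agree before we flag the system

def system_anomaly(models, row, get_anomaly_score):
    """Return True when enough independent univariate models flag an anomaly
    for the same timestep `row` (a dict of variable name -> value)."""
    flagged = 0
    for var_name, model in models.items():
        score = get_anomaly_score(model, row[var_name])
        if score >= ANOMALY_THRESHOLD:
            flagged += 1
    return flagged >= MIN_AGREEING_MODELS
```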
Sure…when I am doing anomaly detection that’s another thing.
But say the statistics and the values change and I want to do PREDICTION…
Do I have to rerun the swarm or not?
I have read the two papers twice, but I can’t find an answer to my question there.
I would really appreciate an answer to that.
Thanks a lot in advance!
The swarming process generates many candidate models, each with a different combination of hyperparameter values, and evaluates those models over the course of N time steps. So the larger N is, the more thorough an evaluation each candidate model gets. Given this, it’d be theoretically ideal to make N as large as possible so each candidate model has maximal time to prove its worth.
The major problem with large N is that it can make the whole process take a REALLY long time, since each candidate model needs to be initialized, run on N data points, and evaluated on its MAPE (mean absolute percentage error) over those points, I believe. This can be mitigated, though, by setting the swarm size to ‘small’ or ‘medium’ at most, which caps the number of candidate models by limiting the set of hyperparameter combinations to try.
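To make that concrete, here is roughly what a swarm description looks like, modeled on the layout in NuPIC’s hot gym swarming example. The field names, CSV path, and min/max values are placeholders for your own data, and the exact keys can vary between NuPIC versions, so treat this as a sketch rather than a drop-in config:

```python
# Sketch of a swarm description, modeled on NuPIC's "hot gym" swarming example.
# Field names, the CSV path, and min/max values are placeholders for your data.
import nupic.swarming.permutations_runner as permutations_runner

SWARM_DESCRIPTION = {
    "includedFields": [
        {"fieldName": "timestamp", "fieldType": "datetime"},
        {"fieldName": "x1", "fieldType": "float",
         "minValue": 0.0, "maxValue": 1.0},
    ],
    "streamDef": {
        "info": "x1",
        "version": 1,
        "streams": [
            {"info": "my data", "source": "file://my_data.csv", "columns": ["*"]},
        ],
    },
    "inferenceType": "TemporalMultiStep",
    "inferenceArgs": {"predictionSteps": [1], "predictedField": "x1"},
    # How many records the swarm runs each candidate model on (the "N"
    # discussed above); -1 means use every record in the file.
    "iterationCount": 3000,
    # "small" / "medium" / "large" caps how many hyperparameter combos get tried.
    "swarmSize": "medium",
}

model_params = permutations_runner.runWithConfig(
    SWARM_DESCRIPTION, {"maxWorkers": 4, "overwrite": True})
```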
The other major caveat in my mind about swarming is that it evaluates each model by one specific criterion: how well did this model forecast the value of metric X1? Swarming finds model configs with the sole goal of minimizing forecasting MAPE for X1, so there are no guarantees for X2 or any other metric, nor for anomaly detection performance, since that’s a different objective.
If the statistics of the data have changed so much that the model hyperparameters are no longer valid, then yes, that would theoretically call for re-running the swarm. But this should hopefully be quite unlikely, barring some kind of tectonic shift in the data, where instead of X1 ranging mostly from 0 to 1 it now ranges from 100 to 1000 or something.
Having a larger N in your swarm should make this scenario less likely, since the chosen candidate model is vetted on more data, though if your N is 3000 and the tectonic shift only happens after point 3000, I suppose the issue would persist.
One other thing running the swarm periodically would do is limit continuous learning, since each new swarm run yields a new-born model that has to train from scratch. You’d also need to introduce some criterion to trigger a new swarm, having declared the current model obsolete. Barring that tectonic-shift scenario, the continuous learning nature of HTM should make this unnecessary!
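If you did want such a trigger, it could be as simple as something like this hypothetical check (the helper name, tolerance, and window are arbitrary choices I’m making up for illustration):

```python
# Hypothetical "tectonic shift" check: compare the value range the swarm was
# run on against what recent data actually looks like, and only consider
# re-swarming when the data drifts far outside that range.

def needs_reswarm(recent_values, swarmed_min, swarmed_max, tolerance=0.5):
    """Return True if recent data drifts far outside the range the swarm saw.

    `tolerance` is the fraction of the original range the data may exceed on
    either side before we declare the model parameters stale.
    """
    swarmed_range = swarmed_max - swarmed_min
    lo = min(recent_values)
    hi = max(recent_values)
    return (lo < swarmed_min - tolerance * swarmed_range or
            hi > swarmed_max + tolerance * swarmed_range)

# Example: the swarm saw values in [0, 1]; the data now sits around 100-1000.
print(needs_reswarm([120.0, 450.0, 980.0], swarmed_min=0.0, swarmed_max=1.0))  # True
```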
A very simple (and very possibly wrong) answer is 1000 data points. I say “very possibly wrong” because it depends on the data. More specifically, it depends on the temporal patterns, the time scales at which they express themselves, and the actual period of data availability. The questions “What time interval is there between data points?” and “Are the intervals always the same?” are important to answer now. You may need to take some data samples and plot them at different intervals to find these answers.
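If your data has timestamps, one quick way to do that plotting is to resample the series at a few candidate intervals, e.g. with pandas (the column names, file path, and chosen intervals below are placeholders for your data):

```python
# Eyeball the time scales by resampling the series at a few intervals and
# plotting each one. Column names and the CSV path are placeholders.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("my_data.csv", parse_dates=["timestamp"], index_col="timestamp")

fig, axes = plt.subplots(3, 1, sharex=True, figsize=(10, 8))
for ax, rule in zip(axes, ["5min", "1H", "1D"]):   # 5 minutes, hourly, daily
    df["x1"].resample(rule).mean().plot(ax=ax, title="interval = %s" % rule)
plt.tight_layout()
plt.show()
```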
The anomaly models that @sheiser1 suggested will not work well for prediction. You will need to run a swarm to get a model that is optimized to perform prediction on a specific field of your data. If you want both anomaly detection and prediction, you might find that the canned anomaly model will perform better anomaly detection than the model tuned for field prediction.
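Once the swarm gives you parameters, feeding them into a prediction model follows the usual OPF pattern, roughly as below. This assumes `model_params` is the dict returned by the swarm (as in the earlier sketch) and “x1” is your predicted field; module paths and inference keys can differ slightly between NuPIC versions:

```python
# Sketch of using the swarm's output for prediction, following the pattern in
# NuPIC's OPF examples. `model_params` is the dict returned by the swarm above.
from datetime import datetime
# Older NuPIC versions expose this as nupic.frameworks.opf.modelfactory instead.
from nupic.frameworks.opf.model_factory import ModelFactory

model = ModelFactory.create(model_params)
model.enableInference({"predictedField": "x1"})

# Feed one record (timestamp + value) and read off the 1-step-ahead prediction.
result = model.run({"timestamp": datetime(2017, 1, 1, 0, 0), "x1": 0.42})
one_step_prediction = result.inferences["multiStepBestPredictions"][1]
print(one_step_prediction)
```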
Another tip: if a parameter returned by the swarm doesn’t make sense to you, bring it up. Swarms will never find the best parameters; they are just a tool to help you manually tune the network for your data. It is always best to look at the parameters and understand what they mean if you can.