So, I decided to try using multiple fields as input to get better results, following this guide here.
I ran the exact same test setup as in the guide (although on Ubuntu 14.04 LTS instead of a Mac), using the same commands, the same data, and the same search_def.json files. I first ran the Basic swarm with one field here as the control test, and got a slightly higher error rate of 2.1, which is fine.
Next, I followed the Multiple fields example 1 guide here. The predicted results were pretty good, and my error score of 1.99% was only somewhat higher than the guide's 0.88%. The main problem was the swarm's assessment of the field contributions, which differed substantially from the guide's.
The first time, I got these results, though I'll ignore them on the assumption that something went badly wrong somewhere:
Field Contributions:
{ u'metric1': 0.0,
u'metric2': 1.5404411210203293,
u'metric3': -111.60297906162586,
u'metric4': -119.64311053972988,
u'metric5': -120.26278962342069}
The second time:
Field Contributions:
{ u'metric1': 0.0,
u'metric2': 20.773117774925623,
u'metric3': -111.60297906162586,
u'metric4': -155.41099945573757,
u'metric5': -120.26278962342069}
The third time:
Field Contributions:
{ u'metric1': 0.0,
u'metric2': 18.53701573428193,
u'metric3': -111.60297906162586,
u'metric4': -165.92191109011168,
u'metric5': -120.26278962342069}
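To see exactly what is (and isn't) changing between runs, the three contribution dicts pasted above can be summarized with a short script. This is just a sketch over the numbers already shown, not anything produced by the swarm itself:

```python
# Summarize the variation in field contributions across the three swarm runs.
# The dicts below are copied verbatim from the three outputs pasted above.
runs = [
    {'metric1': 0.0, 'metric2': 1.5404411210203293,
     'metric3': -111.60297906162586, 'metric4': -119.64311053972988,
     'metric5': -120.26278962342069},
    {'metric1': 0.0, 'metric2': 20.773117774925623,
     'metric3': -111.60297906162586, 'metric4': -155.41099945573757,
     'metric5': -120.26278962342069},
    {'metric1': 0.0, 'metric2': 18.53701573428193,
     'metric3': -111.60297906162586, 'metric4': -165.92191109011168,
     'metric5': -120.26278962342069},
]

for field in sorted(runs[0]):
    values = [run[field] for run in runs]
    spread = max(values) - min(values)
    print('%s: min=%10.2f  max=%10.2f  spread=%8.2f'
          % (field, min(values), max(values), spread))
```

Interestingly, only metric2 and metric4 vary between runs; metric1, metric3, and metric5 come out bit-for-bit identical every time, so whatever nondeterminism is at play seems confined to how the swarm evaluates those two fields.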
Additionally, when I plotted the results on a graph, both the predictions and anomaly scores were more jittery than those of the single-field Basic swarm control test. This is something I've experienced multiple times on my own data sets: even though the swarm reports that certain extra fields lowered the error score, the resulting predictions and anomaly scores were always more jittery or simply worse.
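Rather than eyeballing the plots, "jitteriness" can be quantified. One simple proxy is the mean absolute first difference of the prediction series; the snippet below is a sketch with made-up toy series, not my actual model output:

```python
def jitter(series):
    """Mean absolute first difference: higher means a more jittery curve."""
    return sum(abs(b - a) for a, b in zip(series, series[1:])) / float(len(series) - 1)

# Toy illustration: a smooth ramp versus the same ramp with alternating noise
# added on top (stand-ins for the one-field and multi-field predictions).
smooth = [float(i) for i in range(100)]
noisy = [x + (2.0 if i % 2 else -2.0) for i, x in enumerate(smooth)]

print(jitter(smooth))  # -> 1.0
print(jitter(noisy))   # noticeably larger
```

Running this on the actual one-field and multi-field prediction columns from the swarm's output CSVs would make the "more jittery" claim concrete.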
I also ran the Multiple fields 3 example, and here my results were much closer to the guide's, with a near-identical error rate of 3.87% and somewhat similar field contributions:
Field Contributions:
{ u'metric1': 1.478597204035958,
u'metric2': 2.9664000399750816,
u'metric3': -3.73566779252196,
u'metric4': 95.38232870409723,
u'metric5': 0.0}
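For readability, the example-3 contributions can be ranked from most helpful to most harmful. Note the interpretation here is an assumption on my part: my reading of the tutorial is that positive values mean including the field reduced the error score and negative values mean it increased it.

```python
# Rank the example-3 field contributions (copied from the output above).
# Assumption: positive = field reduced the error, negative = field hurt.
contributions = {
    'metric1': 1.478597204035958,
    'metric2': 2.9664000399750816,
    'metric3': -3.73566779252196,
    'metric4': 95.38232870409723,
    'metric5': 0.0,
}

ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
for field, value in ranked:
    print('%-8s %+8.2f%%' % (field, value))
```

On this reading, metric4 dominates while metric3 actively hurts, which is roughly the shape of the guide's result even if the magnitudes differ.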
My guess is that since the guide was created two years ago, the swarm's implementation has changed, which would explain the divergence from the guide's results. However, as the guide explains, metric2 should have helped metric1 about twice as much as my results indicate. I'm also really surprised that the predictions and anomaly scores end up worse than without the contributed fields' help. Could someone please confirm or deny these "findings"?