Trouble replicating results in "multiple fields guide"

So, I decided to try using multiple fields as input to get better results, following this guide here.

I ran the exact same test setup as in the guide (although on Ubuntu 14.04 LTS instead of a Mac): the same commands, on the same data, with the same search_def.json files. I first ran the Basic swarm with one field here as the “control test”, which gave a slightly higher error rate of 2.1, which is fine.
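For completeness, this is roughly how I invoked each swarm. It’s a minimal sketch using NuPIC’s permutations_runner; the maxWorkers value and the search_def.json filename are from my setup, not necessarily what the guide uses:

import json

import nupic.swarming.permutations_runner as permutations_runner

# Load the guide's search definition and run the swarm with it.
with open("search_def.json") as f:
    swarm_def = json.load(f)

# Returns the best model parameters the swarm found.
model_params = permutations_runner.runWithConfig(
    swarm_def, {"maxWorkers": 4, "overwrite": True})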

Next, I followed the Multiple fields example 1 guide here. The predicted results were pretty good; my error score was a bit higher at 1.99% versus the guide’s 0.88%. The main problem, though, was the swarm’s assessment of the field contributions, which was quite different.
The 1st time, I got these results, though I’ll ignore them on the assumption that something went really wrong somewhere:

Field Contributions:

{   u'metric1': 0.0,
    u'metric2': 1.5404411210203293,
    u'metric3': -111.60297906162586,
    u'metric4': -119.64311053972988,
    u'metric5': -120.26278962342069}

The 2nd time:
Field Contributions:

{   u'metric1': 0.0,
    u'metric2': 20.773117774925623,
    u'metric3': -111.60297906162586,
    u'metric4': -155.41099945573757,
    u'metric5': -120.26278962342069}

The 3rd time:
Field Contributions:

{   u'metric1': 0.0,
    u'metric2': 18.53701573428193,
    u'metric3': -111.60297906162586,
    u'metric4': -165.92191109011168,
    u'metric5': -120.26278962342069}
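For context, my understanding is that each field contribution is the percent change in error relative to the best model that uses only the predicted field (which is why one field is always 0.0), so negative values mean that including the field made the error worse. Here’s a minimal sketch of that calculation with hypothetical numbers; this is my reading of the output, not NuPIC’s actual code:

def field_contribution(baseline_error, error_with_field):
    # Percent improvement over the predicted-field-only baseline model;
    # negative means adding the field increased the error.
    return (baseline_error - error_with_field) / baseline_error * 100.0

# e.g. a baseline error of 2.0 that rises to 4.4 when a field is added:
print(field_contribution(2.0, 4.4))  # -120.0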

Additionally, when I plotted the results on a graph, both the predictions and anomaly scores were more jittery than those of the one-field Basic swarm control test. This is something I’ve experienced multiple times on my own data sets: even though the swarm says that certain fields contributed to a lower error score, the resulting predictions and anomaly scores were always more jittery, or simply worse, than without them.
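In case it helps, this is roughly how I plotted the two runs against each other. It’s a sketch; the CSV filenames and column names are from my setup and may differ in yours:

import pandas as pd
import matplotlib.pyplot as plt

single = pd.read_csv("basic_swarm_output.csv")  # one-field control run
multi = pd.read_csv("multi_field_output.csv")   # multiple-fields run

fig, (ax1, ax2) = plt.subplots(2, 1, sharex=True)
ax1.plot(single["prediction"], label="one field")
ax1.plot(multi["prediction"], label="multiple fields")
ax1.set_ylabel("prediction")
ax1.legend()

ax2.plot(single["anomaly_score"], label="one field")
ax2.plot(multi["anomaly_score"], label="multiple fields")
ax2.set_ylabel("anomaly score")
ax2.legend()
plt.show()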

I also ran the Multiple fields example 3, and here my results were much closer to the guide’s, with a near-identical error rate of 3.87% and somewhat similar field contributions:

Field Contributions:

{   u'metric1': 1.478597204035958,
    u'metric2': 2.9664000399750816,
    u'metric3': -3.73566779252196,
    u'metric4': 95.38232870409723,
    u'metric5': 0.0}

My guess is that, since the guide was created 2 years ago, the swarm’s implementation has changed, which would explain the divergence from the guide’s results. However, as the guide explains, metric2 should have helped metric1 about twice as much as my results indicate. I’m really puzzled as to why the predictions and anomaly scores are worse than without the contributing fields’ help, though. Could someone please confirm or deny these “findings”?


Sorry I haven’t responded. I bookmarked this post and then forgot about it.

I don’t think the swarming implementation has changed since we open sourced in 2013. I’m going to try to look into this on Tuesday.

No problem, Matt, I know that you’re busy 🙂 I appreciate that you’re looking into it.

@Setus Thanks for this post. Sorry for the delay, but I’ve been able to replicate your finding, which is troubling. I suspect you are right: something has changed in between. Although the swarming implementation hasn’t changed, I know there were some changes to the Spatial Pooler initialization a while ago. There might be something else as well. It will take a little longer to figure out.

I think this NuPIC issue is very related:


Thank you for replying 😄 I’m glad to hear that you were able to replicate my finding, despite the implications. OK, I see: changes in certain parts of the system were inevitable with time and progress, so it only makes sense that the swarming algorithm would eventually have to be updated accordingly. It’s fine, I’m in no particular hurry.