Problems with boosting?

Continuing the discussion from Why maxBoost is always 2.0?:

Marek (@breznak) brought up that…

(emphasis mine)

I found these issues:

As can be read in the discussion between @breznak and @scott, the claim that boosting is broken is currently disputed. I would say the jury is out until we have proof that a change in boosting can improve the performance of the algorithms.

TL;DR of the discussion (please read it for the details, as I’m simplifying): I claimed boosting is “broken” because it causes artificial rises in column activation (which is what it is designed to do), but that in turn produces spurious anomaly scores (false-positive anomaly errors). @scott countered that this is exactly what boosting is supposed to do.
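
To make the mechanism concrete, here is a minimal sketch (plain numpy, not NuPIC code) of how boosted overlaps can change which columns win even when the input is unchanged; if the temporal memory had learned to predict the old winning set, the changed columns show up as unpredicted activity and the anomaly score rises:

```python
import numpy as np

rng = np.random.default_rng(0)
num_columns, num_active = 100, 10

# Raw overlap scores for one fixed input (these do not change below).
overlaps = rng.integers(0, 40, size=num_columns).astype(float)

def active_columns(boost):
    """Winner-take-all on boosted overlaps (global inhibition)."""
    return set(np.argsort(overlaps * boost)[-num_active:])

unboosted_winners = active_columns(np.ones(num_columns))

# Pretend boosting has ramped up the factors of some rarely active columns.
boost = np.ones(num_columns)
boost[rng.choice(num_columns, size=20, replace=False)] = 10.0
boosted_winners = active_columns(boost)

# If the temporal memory had learned to predict exactly the unboosted set,
# every column that changed counts as "active but unpredicted".
unpredicted = boosted_winners - unboosted_winners
print("columns that changed:", len(unpredicted))
print("resulting anomaly score:", len(unpredicted) / float(num_active))
```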

So, how to test the impact?

  • I’ve replicated the issue on a simple dataset (a sine wave) where the problem shows up.
  • Indirect evidence is that swarming selects maxBoost=1.0 as the optimal parameter setting; see NAB.
  • A real-world test would be to run NAB with maxBoost=1.0 and with maxBoost=10.0 (a toy proxy for this comparison is sketched after this list).
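
As a self-contained proxy for that comparison (this is not NAB and not NuPIC code; the encoder, the miniature spatial pooler, and the boost formula below are all simplifications), the sketch feeds a repeating sine wave to a toy pooler and counts how often the winning column set changes for identical inputs under maxBoost=1.0 versus 10.0:

```python
import numpy as np

def encode(value, n_bits=200, active_bits=21):
    """Crude scalar encoder: a contiguous block of active bits for values in [-1, 1]."""
    sdr = np.zeros(n_bits)
    start = int((value + 1.0) / 2.0 * (n_bits - active_bits))
    sdr[start:start + active_bits] = 1.0
    return sdr

def run(max_boost, steps=3000, n_cols=256, n_bits=200, n_active=10,
        duty_period=1000.0, period=100):
    rng = np.random.default_rng(42)
    # Fixed random potential connections; permanence learning is left out on purpose.
    connections = (rng.random((n_cols, n_bits)) < 0.3).astype(float)
    duty = np.full(n_cols, n_active / float(n_cols))
    boost = np.ones(n_cols)
    last_winners = {}   # input bucket -> winning column set on the previous cycle
    disruptions = 0
    for t in range(steps):
        sdr = encode(np.sin(2.0 * np.pi * t / period), n_bits)
        winners = set(np.argsort((connections @ sdr) * boost)[-n_active:])
        bucket = t % period             # same bucket -> identical encoded input
        if bucket in last_winners and winners != last_winners[bucket]:
            disruptions += 1
        last_winners[bucket] = winners
        # Duty-cycle update and a linear boost, roughly in the spirit of the
        # old maxBoost formulation: rarely active columns get boosted toward maxBoost.
        active = np.zeros(n_cols)
        active[list(winners)] = 1.0
        duty += (active - duty) / duty_period
        target = n_active / float(n_cols)
        boost = np.clip(1.0 + (max_boost - 1.0) * (1.0 - duty / target),
                        1.0, max_boost)
    return disruptions

for mb in (1.0, 10.0):
    print("maxBoost=%.1f -> winning-set changes for identical inputs: %d"
          % (mb, run(mb)))
```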

I don’t have NAB running here at the moment; can someone replicate the experiment and confirm or refute the performance impact?

For simple streaming data, including all the NAB data files, maxBoost should be set to 1.0. It should only be greater than 1.0 for more complex datasets.
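
For illustration, an OPF-style spParams fragment along those lines; the parameter names assume an older NuPIC release that still exposes maxBoost (newer releases replaced it with boostStrength), and the values are only ballpark anomaly-model settings, not a prescription:

```python
spParams = {
    "columnCount": 2048,
    "globalInhibition": 1,
    "numActiveColumnsPerInhArea": 40,
    "potentialPct": 0.8,
    "synPermActiveInc": 0.003,
    "synPermConnected": 0.2,
    "synPermInactiveDec": 0.0005,
    # Boosting disabled for simple streams such as the NAB files; raise only
    # for datasets that are demonstrably complex enough to need it.
    "maxBoost": 1.0,
}
```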

Before we can say that boosting is broken, we need to test it on data streams that really require it.

If you test it on NAB or simple datasets with maxBoost=10.0, you will get lots of problems, but that is totally expected.

Then the default should be 1.0, as you suggest, and there should be a prominent warning in the parameter docs that values above 1.0 are only appropriate for sufficiently complex datasets.

Now, can we test it? Do you know of a dataset that is “complex” enough to benefit from boosting? If there is no such dataset, I would go even further and suggest removing the boosting code altogether.

Side note: regardless of whether the impact of boosting is positive or negative, part of the PR was an implementation of boosting that would not have the negative property of “visible disruptions”.
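
The PR itself is not reproduced here, but for illustration, one smoother scheme that is often suggested (this is an assumption about what such an implementation could look like, not the PR’s code) drives an exponential boost from each column’s active duty cycle, so the factors drift gradually instead of jumping:

```python
import numpy as np

def update_boost_factors(active_duty_cycles, target_density, boost_strength):
    """Columns active less often than the target get boosted, smoothly and continuously."""
    return np.exp(boost_strength * (target_density - np.asarray(active_duty_cycles)))

# Example: five columns, target density 0.02.
duty = np.array([0.00, 0.01, 0.02, 0.05, 0.10])
print(update_boost_factors(duty, target_density=0.02, boost_strength=10.0))
# Factors above 1 for under-active columns, below 1 for over-active ones.
```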

Sure, spatial pooling with 2D topology on a large set of natural images would be a good test set. Setting up the task to do this properly requires some thought.
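
As a rough sketch of one way such a setup could start (the preprocessing and the sp_config names below are assumptions, not a worked-out protocol): binarize whitened image patches into sparse bit arrays and feed them to a spatial pooler configured with matching 2-D dimensions and local inhibition.

```python
import numpy as np

def patch_to_sdr(patch, sparsity=0.05):
    """Keep only the pixels that deviate most from the patch mean as active bits."""
    flat = np.abs(patch - patch.mean()).ravel()
    k = max(1, int(sparsity * flat.size))
    threshold = np.sort(flat)[-k]
    return (flat >= threshold).astype(np.uint8).reshape(patch.shape)

rng = np.random.default_rng(0)
image = rng.random((256, 256))              # stand-in for a whitened natural image
sdr = patch_to_sdr(image[100:132, 50:82])   # one 32x32 patch

sp_config = {                                # assumed topological SP settings
    "inputDimensions": (32, 32),
    "columnDimensions": (32, 32),
    "potentialRadius": 8,
    "globalInhibition": False,
    "localAreaDensity": 0.02,
}
print(sdr.sum(), "active bits out of", sdr.size)
```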

That would be fine, but a PR doesn’t cut it in my opinion. It should be tested somewhere external (such as nupic-community) in a form that others can run and see the results. This would be a non-trivial change to a core part of NuPIC, so it needs to be well described and tested on a range of data. For a core algorithm like spatial pooling, we also need to make sure it is something that is plausible biologically. Then a proposal can be made to incorporate it into NuPIC. (This is the same process we use ourselves - we test and document in nupic.research first.)

If you could point me to any tests and results for the existing boosting mechanisms, I would appreciate it…

That is a very good question. I don’t know that we have algorithm-level tests of boosting (it hasn’t really been a focus of ours). I think we can come up with a list of desired properties and then test whether the current implementation matches those properties.
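
For example (the property list here is only a suggestion, not an existing Numenta test suite), two candidate properties and toy metrics for them: column usage should become roughly uniform over time, and the SDR for a fixed input should stay stable once learning has settled.

```python
import numpy as np

def usage_entropy(active_duty_cycles):
    """Normalized entropy of column usage: 1.0 means perfectly uniform usage."""
    p = np.asarray(active_duty_cycles, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return float(-(p * np.log(p)).sum() / np.log(len(active_duty_cycles)))

def sdr_stability(sdr_before, sdr_after):
    """Overlap between two active-column sets for the same input (1.0 = identical)."""
    a, b = set(sdr_before), set(sdr_after)
    return len(a & b) / float(max(len(a), len(b)))

# Toy checks of the helpers themselves:
print(usage_entropy([0.02] * 50))            # uniform usage -> 1.0
print(usage_entropy([1.0] + [0.0] * 49))     # one column hogs activity -> 0.0
print(sdr_stability([1, 5, 9], [1, 5, 42]))  # 2 of 3 columns unchanged -> ~0.67
```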
