How boosting changes the SP representation

So, I’m working on more visualizations to compare how learning improves the spatial representations produced by the SP. In this example, the same noisy encoding is passed into two different spatial poolers: one is random (learning is turned off), and the other is learning over time.

When there is no boost factor (maxBoost=1.0), it looks great and shows very nicely how learning improves the SP’s output. But when I turn boosting on (maxBoost=2.0), it is pretty obvious that the SP’s output representation changes drastically.

Now, I understand the point of boosting, but I don’t understand all the internals yet, so maybe this comment is off-base. But how can this be right? If the main goal of the SP is to maintain the overlap properties of the input space, and boosting changes the output representation so drastically, how can it still be doing a good job?


So I let the visualization run all the way after I stopped recording, and boosting never occurred again within the data I sent it. And you can very clearly see that the SP’s representations recovered after the boosting period and were once again comparable to the representations it was creating before boosting:

So it appears my concerns may be overblown, because the representations were only non-comparable to previous ones during the boosting period. This confusion is very likely simply because I don’t understand the boosting mechanism yet. That’s going to be another episode. I’d better get to studying!

But still, if anyone has any insight, I would be happy to discuss.

Cool visuals Matt! And it looks like you answered your original question, no?

Boosting is used in the SP to keep columns competitive such that all the columns are used; a goal of the SP is to efficiently use its resources, so boosting helps keep all columns competing for activations. Seems simple enough, but the underlying mechanism relies on monitoring both the overlap and active duty cycles of the columns. Can we expect these to be explained a bit in an upcoming episode?

Yes, that’s the plan. I’ve already hacked together a backend that is storing the active duty cycles and overlap duty cycles. The boosting is all tied into inhibition, right? It seems simple enough with global inhibition, but local inhibition complicates everything. I’m considering explaining global inhibition and boosting in episode 9, then topology and local inhibition in episode 10 (episode 8 is about learning).
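For reference, the duty cycles themselves are just moving averages of how often each column overlaps the input or wins inhibition. Here is a rough sketch of that bookkeeping (the array and parameter names are mine, not NuPIC’s exact internals):

```python
import numpy as np

def update_duty_cycles(duty_cycles, new_values, period):
    """Moving-average update: recent activity gradually replaces old activity."""
    return (duty_cycles * (period - 1.0) + new_values) / period

# Hypothetical bookkeeping for 2048 columns.
num_columns = 2048
period = 1000.0
active_duty_cycles = np.zeros(num_columns)
overlap_duty_cycles = np.zeros(num_columns)

# After one SP compute step:
overlaps = np.zeros(num_columns)        # raw overlap score of each column with the input
active_columns = np.zeros(num_columns)  # 1.0 where the column survived inhibition

overlap_duty_cycles = update_duty_cycles(overlap_duty_cycles, (overlaps > 0).astype(float), period)
active_duty_cycles = update_duty_cycles(active_duty_cycles, active_columns, period)
```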

Just a quick suggestion @rhyolight, it may be better to include bumping along with boosting. Bumping seems to be more valuable in terms of creating the competition from what I observe. Usually, I turn off the boosting for more stable representations in the short term because of the things you mentioned and just work with bumping.

For the sake of a shared understanding of these terms:

Boosting

An artificial increase of a less used column’s overlap to force it to become active. This allows less used columns to survive inhibition, which helps them adapt their synapses to input patterns, because only activated columns are allowed to do this. The goal is to get these columns activated without the help of boosting in the future. The active duty cycle is the main metric for this.

Bumping

Increasing the permanences of all the synapses of less used columns until the moving average of their overlap is above a threshold. This does not force an activation, but it helps more synapses become connected, which indirectly leads to the column becoming active on its own in the future because of increased input reception. The overlap duty cycle is the main metric for this.
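To make the contrast concrete, here is a rough sketch of the two updates as I understand them (not the literal NuPIC code; the parameter names are placeholders):

```python
import numpy as np

def boost_overlaps(overlaps, active_duty_cycles, min_active_duty_cycle, max_boost):
    """Boosting: multiply the overlaps of under-active columns so they can win
    inhibition. Driven by the active duty cycle."""
    # Interpolate from max_boost (duty cycle 0) down to 1 (at the minimum duty
    # cycle); well-used columns keep a factor of 1.
    shortfall = np.maximum(0.0, 1.0 - active_duty_cycles / min_active_duty_cycle)
    boost_factors = 1.0 + (max_boost - 1.0) * shortfall
    return overlaps * boost_factors

def bump_weak_columns(permanences, overlap_duty_cycles, min_overlap_duty_cycle, increment):
    """Bumping: raise the permanence of every potential synapse of columns whose
    overlap duty cycle is too low. Driven by the overlap duty cycle."""
    weak = overlap_duty_cycles < min_overlap_duty_cycle
    permanences[weak] += increment
    return np.clip(permanences, 0.0, 1.0)
```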


@sunguralikaan - non-spiking adaptation is plausible and has been discussed before. I’m not sure if there were any concrete objections other than it not making too big of an impact overall. The main negative impact is in anomaly detection so it would be interesting to see if non-spiking adaptation improved NAB results.

Clever ideas, but I need to stick with Numenta’s current implementation of HTM for these videos. That’s not to say other algorithms might not perform better in different situations, though.

I think @sunguralikaan was referring to the current implementation. Specifically, _bumpUpWeakColumns:
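Roughly, that method does the following (paraphrased from memory, not the verbatim NuPIC source; the attribute names below are approximations of the real internals):

```python
import numpy as np

def bump_up_weak_columns(sp):
    """For every column whose overlap duty cycle fell below the minimum,
    nudge the permanence of all its potential synapses upward."""
    weak_columns = np.where(sp.overlap_duty_cycles < sp.min_overlap_duty_cycles)[0]
    for column in weak_columns:
        perm = sp.permanences[column]
        potential = sp.potential_pools[column] > 0
        perm[potential] += sp.syn_perm_below_stimulus_inc
        sp.permanences[column] = np.clip(perm, 0.0, 1.0)
```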


Is this a normal part of boosting? Or is this some kind of alternative method? Sorry, I wasn’t planning on researching boosting thoroughly until next week! :sweat_smile:

One thing to keep in mind: If there aren’t enough unique inputs, boosting will wreak havoc.

An easy example: my “trivial sequence”: http://mrcslws.com/blocks/2016/03/13/column-overlaps-and-boosting.html

I have 25 unique inputs, 40 active columns, and 2048 columns. That means at most 25 * 40 = 1000 columns will be used. So at any given time, at least 1048 columns are getting starved, and boosting is going to teeter-totter the representations.

I’m not sure if you’re running into this problem, but you might be.


@mrcslws - I thought there was specific logic to catch that case. So if columns are perfect matches then ignore other columns that were boosted or something like that.


@rhyolight The visualizations are very nice! I am also studying the effect of boosting during SP learning. You used the term “boosted period” (or when boosting occurs) in the video, I assume this is the period where the output of the SP is drastically different from previous time steps with the same input, right? Do you have boosting on throughout the video?

I think this oscillatory behavior is due to how boostFactors are updated in the SP. boostFactors are initialized to all ones for all the columns. They keep increasing while a column is not active, and they get reset to 1 once the column becomes active. I can see how this scheme could lead to oscillations in the SP output. If you monitor the boostFactors over training, will you see an increase in the average boostFactor during the boosted period?
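If I follow that description, the per-iteration update would look something like this sketch (only to illustrate the oscillation argument; boost_step is a made-up parameter, not actual NuPIC code):

```python
import numpy as np

def update_boost_factors(boost_factors, active_columns, boost_step, max_boost):
    """Scheme described above: inactive columns creep upward toward max_boost,
    and any column that becomes active snaps back to 1."""
    boost_factors = np.minimum(boost_factors + boost_step, max_boost)
    boost_factors[active_columns] = 1.0
    return boost_factors
```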

Also, you mentioned that after the boosted period, the SP output becomes reliable again. This is expected because boosting does not necessarily lead to much synaptic permanence changes, especially if you have small increment/decrement values. If a column becomes active due to boosting, it is likely to be active for only a very short time period, because the boostFactor will quickly get reset. I don’t think it can learn much during this short window.

Anyway, I am still trying to understand boosting. I understand the logic of encouraging a distributed representation through boosting, but I am not sure whether it actually does the job.


Yes, you’re right. And I’m not sure what you mean by “have boosting on”. The only thing I did differently was set maxBoost to 2.0. I watched it play out, and I didn’t see a disruption like that again. So I assume boosting only occurred once. Honestly, I never understand this stuff until I do a visualization. Hopefully I can come up with one to see boosting live in action even better. :slight_smile:

@sunguralikaan Sorry about the ignorant post above. I haven’t studied boosting much yet, and now that I’ve re-read your post, it’s very helpful. Thanks for that!


Hi everyone,

I’m now also learning about boosting in SP, and I’ve read this in the code:

                  boostFactor
                      ^
        maxBoost _    |
                      |\
                      | \
               1 _    |  \ _ _ _ _ _ _ _
                      |
                      +--------------------> activeDutyCycle
                         |
                  minActiveDutyCycle

Is this a representation of how the boost factor of a single column changes over iterations? And how can we set the “minActiveDutyCycle” parameter? Also, the boost factor of each column does not decrease or increase linearly as described; it only takes two values, 1 or maxBoost. I have a lot of confusion about this.

Hi @nluu. Yes, that sounds right.

For global (simple) inhibition: During each SP compute iteration, you’ll calculate ActiveDutyCycle values for all Columns in the region. Take all these values and find their mean. You can use this value as minActiveDutyCycle to get started.

You can see how NuPIC does it here (including local inhibition; named targetDensity instead):
https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/spatial_pooler.py#L1476-L1509
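In other words, something like this simplified global-inhibition sketch (NuPIC computes its own target slightly differently, so treat the mean here as a starting point only):

```python
import numpy as np

# One activeDutyCycle value per column, updated every compute() call.
active_duty_cycles = np.random.rand(2048) * 0.05  # placeholder values

# Global inhibition: one shared target, taken here as the mean duty cycle.
min_active_duty_cycle = active_duty_cycles.mean()

# Columns below the target are the ones whose boost factors will rise.
columns_to_boost = np.flatnonzero(active_duty_cycles < min_active_duty_cycle)
```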

That chart doesn’t look formatted correctly, I’ll link to a better version here:
https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/spatial_pooler.py#L1457-L1468

The main ideas the chart is trying to express:

  • If your column’s activeDutyCycle is very low (2%), its boostFactor will be very high (greater than 1).
  • Conversely, a high activeDutyCycle (90%) will result in a small boostFactor (nearer to 0).
  • If activeDutyCycle is a middle value, its boostFactor will be nearly equal to 1.
  • The change between the above states is exponential and continuous, not linear or discrete (a small numeric sketch follows below).
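Numerically, the curve in the linked code comes from an exponential of the gap between the target density and each column’s active duty cycle. Here is a small sketch with made-up values:

```python
import numpy as np

boost_strength = 2.0    # how aggressively under-used columns are corrected
target_density = 0.02   # fraction of columns we want active on average

active_duty_cycles = np.array([0.02, 0.90, 0.02, 0.001])
boost_factors = np.exp((target_density - active_duty_cycles) * boost_strength)
# -> approximately [1.00, 0.17, 1.00, 1.04]: columns near the target stay ~1,
#    over-active columns drop below 1, rarely-active columns rise above 1.
```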

Can I ask why BOOSTING is said to be the reason for learning granularity information (as mentioned in the boosting episode of HTM School) and not SPATIAL POOLER LEARNING? Learning also happens during the spatial pooler’s learning phase, so the columns could have learned about time granularity (from the boosting episode again) while the spatial pooler learns, right? What’s the reason to say time granularity was learned only during boosting?

@baymaxx Boosting is the mechanism the SpatialPooler relies on to allow every column to express itself. Otherwise some columns can dominate and never turn off, which is bad because such a column ends up carrying no information after learning.

The actual learning is still done by the SP’s learning algorithm. Boosting is only assisting the SP to learn better.

You can read more about how boosting works in my blog post (the post is about me implementing a different Numenta paper, but the boosting part is the same thing).

The Boosting section is the only relevant one in this context.


Hi all, I’m still having some confusion about the effect of boosting on the results of HTM learning. As I understand it, because of boosting, the output (active columns) of the Spatial Pooler for the same input vector will change after some iterations. If I feed these results to the Temporal Memory, it will return different prediction results, which is not what we want. Please correct me if I am wrong. Thank you so much.

Learning in the SP adapts the permanence values between each SP column and its receptive field of encoding bits, so yes, the same raw input that activates the same encoding bits can activate different sets of SP columns at different times.

However, the two sets of SP columns representing the same input at different times should overlap quite a bit, unless the SP learning rates are so high that they change these permanences too drastically and basically destabilize the SP. So it seems this could potentially become an issue in that case.
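If you want to check this empirically, one way is to feed the same encoding to the SP at two different points during training and compare how much the two active-column sets overlap. A minimal sketch of such a comparison (the helper below is mine, not part of NuPIC):

```python
import numpy as np

def column_overlap(active_t1, active_t2):
    """Fraction of the first SDR's active columns that are also active in the second."""
    set1 = set(np.flatnonzero(active_t1))
    set2 = set(np.flatnonzero(active_t2))
    return len(set1 & set2) / float(len(set1)) if set1 else 0.0

# active_t1 / active_t2: the SP output (e.g. 2048-length 0/1 arrays) for the
# *same* encoding, captured before and after a stretch of learning.
```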