Is Boosting biologically inspired?
What is the significance of boosting?
What are the drawbacks of HTM models without boosting?
How is the output instability caused by boosting handled?
Could we use a homeostasis mechanism that ensures all the minicolumns maintain a relatively stable and similar total number of synapses? This process would act slowly over consecutive activations and could influence individual neurons in a minicolumn instead of the entire minicolumn.
Thanks @rhyolight, I understand why we should give relatively inactive minicolumns a chance, since they might represent additional information about the input space. And we need boosting to do this because we can't simply increase the number of active minicolumns in the layer, since sparsity has to be maintained (homeostasis). Am I right?
When you mention increased efficiency, you are referring to the inclusion of additional semantic information through boosting, right?
I suppose there is no such instability, due to SDR properties and the effect of boosting being gradual?
Could you describe the boost factor algorithm? Is it published?
This homeostasis idea is nice, but IMHO it is too generic (… I can’t see how it is directly applicable to the physiology of the cells).
I think the refractory period fits better with the “boosting” idea. You can’t blindly activate the same column again and again. You have to respect the refractory period before firing again.
From a practical standpoint, using an estimate of previous activation (e.g., an exponential moving average) you can determine if a column is firing too close to the expected limit (say, 2% of the time for a sparsity of 2%). In that case, you can “de-boost” the overlap of the input with that column and decrease its chances of winning the inhibition again and again.
You can estimate the activation frequency of each column as:

```
ActEMA(t+1) = alpha + (1 - alpha) * ActEMA(t)   if the column wins at t+1
ActEMA(t+1) = (1 - alpha) * ActEMA(t)           otherwise
```
Then the boost applied to the overlap of each column can be BF1·exp(-(BF2·ActEMA)/Sparsity). Something similar can be done to break ties (instead of using a fixed random tie-breaker).
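Putting the two pieces together, here is a minimal Python sketch of this scheme. ALPHA, BF1, BF2 and SPARSITY are free parameters of the idea above, and the function names are just illustrative:

```python
import numpy as np

ALPHA = 0.01         # EMA smoothing factor
BF1, BF2 = 1.0, 1.0  # boost shape parameters
SPARSITY = 0.02      # target fraction of winning columns

def update_activation_ema(act_ema, winners):
    """Update each column's activation-frequency estimate.
    act_ema: float array, one entry per column.
    winners: boolean array, True where the column won the inhibition."""
    act_ema = (1.0 - ALPHA) * act_ema
    act_ema[winners] += ALPHA
    return act_ema

def boosted_overlap(overlap, act_ema):
    """De-boost columns that have been firing above the target sparsity."""
    boost = BF1 * np.exp(-(BF2 * act_ema) / SPARSITY)
    return overlap * boost
```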
Exponential moving average might be a better choice.
bumpUpWeakColumns is an additional mechanism on top of boosting. It reinforces (increments) all the proximal synapses of columns that have a lower average input overlap compared to the others. Personally I found this to be inadequate, so in my implementation I bump all the synapses by the boosting factor as well (not by the average input).
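For illustration, a rough sketch of what such a bump-up step could look like; the duty-cycle bookkeeping and the names here are assumptions of this sketch, not NuPIC’s exact bumpUpWeakColumns API:

```python
import numpy as np

# Sketch of a bump-up step, assuming a permanence matrix of shape
# (num_columns, num_inputs) and a per-column estimate of average input
# overlap (e.g. an overlap duty cycle). Names are illustrative only.
def bump_up_weak_columns(permanences, overlap_duty_cycles,
                         min_duty_cycle, perm_increment):
    """Increment all proximal permanences of columns whose average input
    overlap falls below the floor, nudging them back into contention."""
    weak = overlap_duty_cycles < min_duty_cycle
    permanences[weak] += perm_increment
    np.clip(permanences, 0.0, 1.0, out=permanences)
    return permanences
```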
No, I’m talking about spreading the meaning out to more minicolumns in the SP, not the inclusion of more semantic info. The efficiency means there is less overloading of meaning in the minicolumns that are used, because more minicolumns participate. The storage of meaning stays the same; it is just spread out.
I’m not sure what you mean. Boosting does not introduce instability.
To clarify @abshej’s point here a little more: essentially, when using boosting in a system that wouldn’t normally utilize all of the minicolumns, you end up with a larger/denser SDR of minicolumns representing a particular input, where only a subset of that larger representation is active at a given time. This means there is a greater chance that the distal connections a cell made during TM while one of the representations was active will be below threshold when a different one is active. This means potentially more bursting and more cycles to stabilize on a particular input + context representation.
Picking the right boosting factor is essentially a balancing act between overall capacity and cycles required to form stable representations in TM.
Are you saying this is the instability? Please elaborate.
This would result in different SDRs for the same patterns in some cases, so those patterns would have to be learned again.
I thought that since boosting is gradual, even if the representation of an input changes over time it would only be slightly different. But that cannot be guaranteed, since the same input can repeat at any time, after the winning columns have already been influenced considerably by the boost factors. And how exactly does this help recognize previous instances more precisely, as shown in the video?
I am talking about instability in SP outputs.
Instability in SP outputs of course leads to instability in TM outputs, so I think we are talking about the same issue. Since TM is the ultimate goal, I clarified that facet of the issue.
Anyway, if you keep the boosting factor low enough, that reduces instability in TM, as long as enough of the distal synapses remain active while the minicolumns for a previous input change, so that TM learning adapts to cover more of the larger representation.
Agreed. But the instability in TM’s distal connections isn’t my focus. Wouldn’t disabling boosting be a better option, since we cannot predict whether enough distal synapses will be active? That depends on when the input repeats and new connections are formed, along with the activation of other minicolumns (during previous and later inputs), which influences the boosting factors.
And how does the overall mechanism help in more precise recognition of previous occurrences of an input, if no new (different) semantic information is added and other columns are selected only due to bursting?
Sure, disabling boosting is a perfectly valid option for some scenarios. What boosting brings to the table is greater capacity for semantically similar contexts in TM. Without boosting, a smaller set of minicolumns is used to represent all contexts of a particular input. With boosting, there is a larger set of minicolumns, and thus capacity to learn more contexts of a particular input. That greater capacity comes at the cost of greater instability.
I would add that boosting can also result in finer columnar representation. Here is a scenario I encounter from time to time:
A minicolumn can only update its proximal synapses if it is active. Imagine a scenario where minicolumn A is active on both input patterns X and Y because of its existing potential connections to both. This happens very frequently due to initial synapse configurations. Depending on the increment (say 300) and decrement (say 100) parameters, this minicolumn may keep getting active on both of these inputs. There is no way of making sure that the column activates on only one of the patterns with these parameters. If boosting is enabled, the overlap of this column decreases, giving another column B a chance to become active. This newer column now learns patterns X and Y as the boosting allows. A and B both reinforce their connections to X and Y, so B now competes with A for activation on X and Y. I regularly observe that this competition results in specialization to one of the inputs, because the columns have different potential synapse pools and because of the synaptic update rules. Previously you had column A representing both input patterns X and Y. With boosting on, A represents X and B represents Y.
If you set the parameters so that decrement>=increment (for example 100 and 100), you can make sure a column always represents a single input. However, you then lose spatial generalization capacity, because no column would overlap on the representations of two different input patterns.
TL;DR Previously you had column A representing both input patterns X and Y. With boosting on, A represents X and B represents Y.
Edit: decrement<increment should’ve been decrement>=increment.
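To make the mechanics concrete, here is a toy Python sketch of the proximal update rule behind this scenario, using the example increment/decrement values above; the integer permanence scale and names are assumptions of this sketch, not any particular implementation:

```python
import numpy as np

# Toy model of the proximal learning rule discussed above, on an integer
# permanence scale (an assumption of this sketch). With increment 300 >
# decrement 100, a column active on both X and Y keeps both sets of
# synapses reinforced; only de-boosting its overlap lets column B win
# and eventually specialize on one of the patterns.
INCREMENT, DECREMENT = 300, 100
PERM_MAX = 1000

def learn_proximal(permanences, input_bits, is_active):
    """If the column is active, reinforce synapses on active input bits
    and punish synapses on inactive ones; inactive columns never learn."""
    if not is_active:
        return permanences
    permanences = permanences.copy()
    permanences[input_bits] += INCREMENT
    permanences[~input_bits] -= DECREMENT
    return np.clip(permanences, 0, PERM_MAX)
```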
That is what I would like to know as well, since homeostasis alone cannot explain boosting, but I haven’t read about it in detail.
At first I thought homeostasis was the maintenance of overall sparsity and nothing more.
Also, can anyone explain why boosting would lead to the same semantic information being represented in a different distribution? With boosting, a minicolumn can be replaced as the winner by another that doesn’t share a spatially close receptive field, right?