There is one piece of the spatial pooling process that I do not properly understand yet: the concept of boosting. The objective is to make sure all the columns learn to represent something useful, regardless of how many columns there are. There are two mechanisms involved. The first is that if a column does not win often enough, its overall boosting factor is increased. The second is that if a column’s connected synapses do not overlap well with any inputs often enough, its permanence values are boosted. The net effect is that columns which are not active very often end up getting higher scores when competing against other, more active columns.
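To make the two mechanisms concrete, here is a minimal sketch of how I picture them. It is not the actual implementation: the function name, the linear boost rule, and the parameters `max_boost` and `perm_bump` are all assumptions made for illustration.

```python
import numpy as np

def apply_boosting(active_duty, overlap_duty, min_duty, permanences,
                   max_boost=10.0, perm_bump=0.01):
    """Return (boost_factors, new_permanences) for one update step.

    All names and default values here are illustrative assumptions.
    """
    active_duty = np.asarray(active_duty, dtype=float)
    overlap_duty = np.asarray(overlap_duty, dtype=float)
    min_duty = np.broadcast_to(np.asarray(min_duty, dtype=float),
                               active_duty.shape)

    # Mechanism 1: a column that wins too rarely gets a boost factor that
    # grows linearly from 1 (at min_duty) up to max_boost (never active).
    boost = np.ones_like(active_duty)
    rarely_wins = active_duty < min_duty
    boost[rarely_wins] = max_boost - (max_boost - 1.0) * (
        active_duty[rarely_wins] / min_duty[rarely_wins])

    # Mechanism 2: a column whose synapses rarely overlap any input gets
    # all of its permanence values bumped up, clipped to [0, 1].
    new_perm = np.array(permanences, dtype=float)
    rarely_overlaps = overlap_duty < min_duty
    new_perm[rarely_overlaps] = np.clip(
        new_perm[rarely_overlaps] + perm_bump, 0.0, 1.0)
    return boost, new_perm
```

Here `permanences` is assumed to have one row per column, so a starved column has its whole row bumped.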
The question in my mind is how to prevent this process from leading to constant shifting and oscillations for a given input. This would be particularly the case in a system where there are a lot of columns and not a very large range of different inputs, or where there are a lot of repeating inputs (such as cycling the first part of Mary Had a Little Lamb “EDCDEEE EDCDEEE EDCDEEE…”). Basically, the idea of making sure all columns get used seems to conflict with the idea of maintaining a fixed sparsity. In the above example, if you have 2048 columns and want to maintain 2% sparsity, then the columns representing the small number of different inputs (C, D, and E) would keep changing as unused columns get boosted and beat out the more active columns, then those get beaten by other unused columns, and so on (eventually cycling back around to the original columns).
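The shifting described above can be reproduced in a toy simulation. This is only a sketch under strong assumptions: each column gets a fixed random overlap score per note (no learning at all), the boost factor is taken to be an exponential function of how far the column's active duty cycle is from the target sparsity, and `boost_strength`, `period`, and `steps` are hypothetical parameters.

```python
import numpy as np

def representation_overlap(boost_strength, n_cols=2048, sparsity=0.02,
                           period=1000, steps=700, seed=0):
    """Fraction of columns shared by the first and last representation of E."""
    rng = np.random.default_rng(seed)
    k = int(n_cols * sparsity)              # number of winning columns
    song = "EDCDEEE"
    # Assumption: fixed random overlap score per column for each note.
    overlap = {note: rng.random(n_cols) for note in "CDE"}
    duty = np.zeros(n_cols)                 # active duty cycle per column
    first = last = None
    for t in range(steps):
        note = song[t % len(song)]
        # Assumed boost rule: columns active more than the target sparsity
        # are penalised, columns active less are favoured.
        boost = np.exp(-boost_strength * (duty - sparsity))
        winners = np.argsort(overlap[note] * boost)[-k:]
        active = np.zeros(n_cols)
        active[winners] = 1.0
        # Moving average of activity over roughly `period` iterations.
        duty = ((period - 1) * duty + active) / period
        if note == "E":
            current = set(winners.tolist())
            if first is None:
                first = current
            last = current
    return len(first & last) / k

stable = representation_overlap(boost_strength=0.0)
drifting = representation_overlap(boost_strength=20.0)
print(f"no boosting: {stable:.2f}, with boosting: {drifting:.2f}")
```

With boosting disabled the representation of E never changes, while with boosting enabled the set of winning columns for the same note drifts over time.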
Is that the expected/desired behavior, or am I misunderstanding the concept? I suppose another way to avoid oscillations would be to not use so many columns in the first place when you know the set of possible inputs to be very small (or alternatively to use representations much denser than 2%).
I do see that the boosting is based on averages over the last 1000 iterations. Maybe that number is high enough that the shifting/oscillations move so slowly that they don’t really have any negative impact?
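As a rough way to put a number on "slowly": assuming the average is kept as a simple moving average of the form `duty = ((period - 1) * duty + active) / period` (an assumption, as are the function names below), the duty cycle of a column that has stopped winning halves in about 0.69 times the period.

```python
import math

def update_duty_cycle(duty, active, period):
    """One step of the assumed moving average of a column's activity."""
    return ((period - 1) * duty + active) / period

def steps_to_halve(period):
    """Iterations for an inactive column's duty cycle to drop by half."""
    # Each inactive step multiplies the duty cycle by (1 - 1/period).
    return math.ceil(math.log(0.5) / math.log(1.0 - 1.0 / period))

print(steps_to_halve(1000))  # 693
print(steps_to_halve(100))   # 69
```

So with a period of 1000, a column's boost responds over hundreds of iterations rather than from one input to the next.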
@Paul_Lamb You are right that the objective of boosting is to make sure all columns are being used to represent something in the input. However, this can sometimes contradict another desired property of the SP, stability: similar inputs should be represented by similar outputs. It is particularly problematic if the input space is not rich enough. It is simply not possible to use all columns to represent the cycling of Mary Had a Little Lamb while keeping the representation stable. In this scenario, I think stability has higher priority and boosting should be turned off; otherwise I don’t see how temporal memory can learn any meaningful sequence.
I am currently doing research on boosting. Specifically, I want to know under what scenarios boosting is helpful. This is still a work in progress. My preliminary results suggest that boosting might be required if (1) the input is rich enough (a large number of distinct inputs is present in the data stream) and (2) parts of the inputs are correlated. If every single input is random and independent, I find that columns are naturally used in a distributed way. If inputs are correlated, either spatially or temporally, some columns can become activated much more frequently than others, and boosting tends to prevent that from happening. This is because when a column becomes activated, it tends to extend its connections to the input bits that are correlated with its connected bits, and eventually becomes activated for more inputs.
That is what should happen. There has to be some change, but not so much that it constantly unsettles the stability. A little instability is good because otherwise the layer does not continue to learn. The amount is controlled by the boosting factor and the moving average iteration count. If the input combinations are few, then boosting leads to several columns learning the same stuff and getting activated in an interleaving fashion. For spatial pooling purposes, different columns learning the same stuff isn’t really problematic; it may actually be good. But when you try to learn sequences out of those columns with temporal memory, things get bad really quickly, as @ycui pointed out, because a lot of column combinations represent the same sequences and temporal memory has to learn them all. In the long run it may be healthy when it settles (if it settles), but it delays learning sequences in the short term. Another thing I observed is that if you have a hierarchy of layers, the higher layer has a harder time learning from the input layer if that layer has boosting on, for the same reasons.
Bumping synapses, on the other hand, is very useful, and it is what I enable for competition. It basically allocates synapses in an efficient manner to level the receptive activity of the columns. It also allows better fine-tuning for different patterns, leading to more and more specialized columns. In my use cases, learning is affected drastically if I turn it off; it helps to balance the layer.
Learning quickly is my priority, by the way, and I work with real-time data, so boosting may work better for other scenarios.
I’m very excited about you undertaking this investigation!
Just curious… Are you using the new TM or old TP? When doing your quality assessment, are you extending your assessment to boosting’s impact on the TM results or limiting your work to direct observation of the SP?