I was reading up on the spatial pooler and had some thoughts on how local inhibition would affect the fixed sparsity property of the SP:
One of the core properties of the SP seems to be that it creates fixed-sparsity representations. With global inhibition the SP undoubtedly fulfils this property as it simply achieves s% sparsity by activating the columns with the s% highest input overlaps.
With local inhibition, however, this no longer seems to be the case. For example, consider some column a that is not activated because it is not within the top s% of its neighbourhood. Some other column b within a’s neighbourhood is within the top s% of that neighbourhood, thereby contributing to a not being activated. However, there’s no guarantee that b itself gets activated, since b’s neighbourhood might contain other columns with even higher input overlaps that were not within a’s neighbourhood.
An extreme example of this would be a one-dimensional SP where the input overlaps decrease monotonically from one side to the other. Depending on the hyperparameters, every single column except for a small fraction at the very edge of the SP would be inhibited by their neighbours on one side, leading to an actual sparsity significantly lower than the ‘intended’ sparsity.
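To make the edge effect concrete, here is a toy 1-D simulation of the local k-winners-take-all rule (the column count, radius, and k values here are hypothetical, not taken from any particular SP configuration):

```python
import numpy as np

n_cols = 2048   # hypothetical 1-D SP size
radius = 100    # hypothetical inhibition radius
k = 16          # winners per inhibition area (numActiveColumnsPerInhArea)

# Input overlaps decrease monotonically from left to right
overlaps = np.arange(n_cols, 0, -1, dtype=float)

active = []
for c in range(n_cols):
    lo, hi = max(0, c - radius), min(n_cols, c + radius + 1)
    # k-th highest overlap among c's neighbors (including c itself)
    kth = np.sort(overlaps[lo:hi])[-k]
    if overlaps[c] >= kth:
        active.append(c)

# The intended density is roughly k winners per inhibition area (~8% here),
# but only the k columns at the high-overlap edge survive: ~0.78%.
print(len(active), len(active) / n_cols)
```

Every column past the left edge has k higher-overlap neighbours to its left, so it loses its local competition, which is exactly the edge effect described above.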
I confirmed that this occurs to some extent using NuPIC. I trained an SP on a dataset of around 50k items for one epoch, with columnDimensions=2048, numActiveColumnsPerInhArea=16 and inhibitionRadius=366 (this last one was determined by NuPIC internally). I then fed the same dataset to the SP again. The number of active columns now ranged between 16 (0.78% sparsity) and 48 (2.34% sparsity), which I would not consider a “fixed” sparsity. The Spatial Pooler paper by Cui et al., in contrast, shows the sparsity of an SP with local inhibition always staying near the target 2%, calling this an “inherent property of the network due to the use of local k-winners-take-all activation rules”.
My questions on this are:
Is what I described above actually the case?
(if so:) Could such fluctuations in SP sparsity pose any practical problems?
(if so:) Has there been any effort in designing some alternative method to pick active columns when local inhibition is active to ensure the target sparsity is adhered to more closely?
I agree with Yuwei. The SP should produce a consistent sparsity of active columns. It seems like there is something wrong here. Can you share your complete SP parameter set? And do you have code you can share as well? Even if we can’t get to the data it could still help.
I’m moving this to nupic because I think this is a configuration problem, not a theory issue.
The SP does not guarantee fixed sparsity unless you use global inhibition. Local inhibition results in some variation. This could be problematic if the variation is large, but I doubt it will be an issue in most applications. We haven’t actually done much experimentation with local inhibition or topology recently, though.
Ah, this makes sense when you think about local inhibition along with topology. Of course if there is input in some spatial areas much more than others, those corresponding local neighborhoods will have higher activations than neighborhoods with lower activity.
“Of course if there is input in some spatial areas much more than others, those corresponding local neighborhoods will have higher activations than neighborhoods with lower activity.”
This is true with global inhibition, since the strongly activated areas will win out over the less activated areas.
But with local inhibition it isn’t: each neighborhood ends up with the same relative sparsity, since each neighborhood separately selects its top k columns, and k is the same regardless of the overall input activity in the different neighborhoods.
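A toy numpy sketch of that difference, using two disjoint, equally sized neighborhoods with hypothetical overlap values:

```python
import numpy as np

k = 2
strong = np.array([9.0, 8.0, 7.0, 6.0])   # neighborhood with strong input
weak   = np.array([3.0, 2.0, 1.0, 0.5])   # neighborhood with weak input
overlaps = np.concatenate([strong, weak])

# Global inhibition: pick the top 2k columns overall --
# every winner comes from the strongly activated neighborhood.
global_active = sorted(np.argsort(overlaps)[-2 * k:])

# Local inhibition: each neighborhood picks its own top k --
# both areas end up with the same number of winners.
local_active = sorted(list(np.argsort(strong)[-k:]) +
                      [len(strong) + i for i in np.argsort(weak)[-k:]])

print(global_active)  # [0, 1, 2, 3] -- only strong-area columns
print(local_active)   # [0, 1, 4, 5] -- k winners from each area
```

This treats the neighborhoods as fully disjoint for simplicity; in a real SP the neighborhoods overlap, which is exactly what creates the fluctuations discussed earlier in the thread.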
“A small percentage of columns within the inhibition radius with the highest activations (after boosting) become active, and disable the other columns within the radius. The inhibition radius is itself dynamically determined by the spread of input bits. There is now a sparse set of active columns.”
However, later in Phase 3 it states:
“The inhibition logic will ensure that at most numActiveColumnsPerInhArea columns become active in each local inhibition area”
which fits the algorithm pseudocode and your discussion.
Which is actually the case? I believe the second one, but then the first statement is confusing, or I’m interpreting it wrongly.
If I understand you correctly, one could interpret it as: “at most numActiveColumnsPerInhArea columns become active in each inhibition area, but in practice none become active except in the inhibition area with the highest activations.”
However, not only would that be written very confusingly; according to the pseudocode, we choose as active all columns whose overlap is above the stimulusThreshold and at least the k-th ranked score in their inhibition area, not just the ones from the highest-activation area.
```
for c in columns:
    minLocalActivity = kthScore(neighbors(c), numActiveColumnsPerInhArea)
    if overlap(c) > stimulusThreshold and overlap(c) ≥ minLocalActivity:
        activeColumns(t).append(c)
```
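For reference, a minimal executable version of that pseudocode might look like this (assuming a 1-D topology; the behavior of kthScore and neighbors is my interpretation, not NuPIC’s actual implementation):

```python
import numpy as np

def local_inhibition(overlaps, radius, k, stimulus_threshold=0.0):
    """A column is active if its overlap beats the stimulus threshold and is
    at least the k-th highest overlap within its local neighborhood."""
    active = []
    n = len(overlaps)
    for c in range(n):
        lo, hi = max(0, c - radius), min(n, c + radius + 1)
        neighborhood = np.sort(overlaps[lo:hi])[::-1]   # descending order
        min_local_activity = neighborhood[min(k, len(neighborhood)) - 1]
        if overlaps[c] > stimulus_threshold and overlaps[c] >= min_local_activity:
            active.append(c)
    return active

# Two local maxima -> one winner from each area, not just the strongest area
print(local_inhibition(np.array([1.0, 5.0, 3.0, 2.0, 4.0]), radius=1, k=1))
# [1, 4]
```

Note that winners come from every inhibition area whose columns clear the threshold, which supports the second reading of the text.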
I’m currently writing SP docs and am about to start writing one myself, so I read all of that very carefully. I’ve been reading the BAMI SP chapters, but I’ve deliberately skipped local inhibition so I don’t have to explain it early on. This thread is making me reconsider that decision.
@kaikun Tell me how you would put into words how local inhibition works. I don’t think the code is wrong; I think the words could be better. Since I’m in the process of rephrasing things, I’m unusually open to suggestions.
As you know, I have been wallowing in grids lately. This topic has a direct impact on the discussion of the spatial pooler: grids are a very special case of spatial pooling.
If you compare a grid-forming area to one that is not grid-forming, I see the primary difference as whether or not the layer II neurons have excitatory mutual connections at 0.5 mm spacing. Without these connections, there is still competitive action to recognize a local pattern, but no influence to extend that pattern recognition beyond the local cell area.
In a nutshell:
The SDRs are the little pattern each dendrite senses. One cell may have dozens of dendrites - each is keyed to record a few patterns. Please note that these patterns are local to the cell - they only extend as far as the dendrite arbor can reach. For cells in the cortex that range is between 0.3 mm and 3.0 mm.
Each column (a cluster of 100 or so cells) may sense hundreds or thousands of little patterns local to the column. The grid is the structure that ties these little pattern sensors together into a larger pattern on a single map. Grid node spacing is about 0.5 mm.
The entire cortex after unfolding is about 1000 mm x 1000 mm, so the individual SDRs don’t cover very much of the brain. The brain is thought to be composed of about 100 areas of local processing; doing the math gives each area roughly 100 mm x 100 mm. Since the reach of individual cells is small even relative to these smaller maps, the grid structure gives a mechanism whereby the cells can work together to recognize a pattern that covers the larger extent of an area map.
Not all areas work to form these larger patterns - some work to refine a local pattern.
In both cases there is competition via the inhibitory interneurons: the strongest response triggers the inhibitors to suppress cells within a certain local topological area. The winner gains whatever benefits there are for training reinforcement.
@rhyolight Yes, the code is all correct. I would simply change the formulation in this one sentence.
Instead of
“A small percentage of columns within the inhibition radius with the highest activations (after boosting) become active, and disable the other columns within the radius.”
Something like
A small percentage of columns with the highest overlap score (after boosting) within each inhibition radius becomes active, and disables the other columns within the radius.
Or, since even that is still not quite right, as pointed out by the author of the thread:
Every column competes with its neighbors within its inhibition radius to become active. The k columns with the highest overlap score (after boosting) disable the other columns within the radius from becoming active. After the competition, a small percentage of columns is active.
@Bitking Interesting, but that seems to be more about the hierarchical aspect (as mentioned by Jeff @MIT 15dec)? Higher areas have a larger scale and can tie the little pattern sensors together. I’m not entirely sure how you mean this is grid-structured.
The point is that the unified pattern formed at the map level combines the small single-column pattern recognitions beyond the reach of any one column, in a parsimonious way.
It is a way to form a global pattern with local computation.
The brain is doing this (proven without a doubt), and there must be a good reason for it. I offer this as an explanation of why.