Differences Between White Paper and Implementation
Years ago, Numenta published a paper called the “CLA White Paper”. The spatial pooling implementation has changed since its publication; the differences are described below.
Global Inhibition
In the white paper, active columns are determined based on their neighboring columns: a column generally becomes active if it has a relatively high overlap score compared to its neighbors. Computing inhibition in this manner is computationally expensive and represents a significant bottleneck in the implementation of the CLA algorithm. Global inhibition tackles this problem by picking the columns with the highest overlap scores from the region as a whole. This basically amounts to sorting the columns by overlap score and selecting the top N to become active. This is particularly useful when there is no topological information present in the data (i.e. the order in which columns are laid out is irrelevant).
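The top-N selection described above can be sketched as follows. This is a minimal illustration, not the actual NuPIC implementation; the function and parameter names are assumptions.

```python
import numpy as np

def global_inhibition(overlaps, num_active):
    """Pick the top `num_active` columns by overlap score across the
    whole region (names are illustrative, not NuPIC's API)."""
    # argsort ascends, so the indices of the N largest scores are at the end
    winners = np.argsort(overlaps)[-num_active:]
    return np.sort(winners)

# columns 1 and 3 have the two highest overlap scores
active = global_inhibition(np.array([3, 9, 1, 7, 5]), 2)
```

Because the whole region is sorted at once, no per-column neighborhood computation is needed, which is where the speedup comes from.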
Trimming
Theoretically, every column has a permanence value associated with every input bit in its potential pool. In practice, however, many of the permanence values are extremely small. To reduce the memory footprint of the spatial pooler, permanence values below a certain threshold are zeroed out. This is referred to as trimming.
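Trimming amounts to a simple thresholding pass over a column's permanence array. A sketch, assuming a hypothetical threshold parameter name:

```python
import numpy as np

def trim_permanences(perms, trim_threshold=0.05):
    """Zero out permanence values below `trim_threshold`
    (the threshold value and name are assumptions)."""
    perms = perms.copy()
    perms[perms < trim_threshold] = 0.0
    return perms

trimmed = trim_permanences(np.array([0.01, 0.2, 0.04, 0.5]))
```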
Raising Permanences To Overlap Threshold
Columns must have a certain overlap score in order to be considered for activation; this is called the stimulus threshold. If the number of synapses in a connected state drops below this threshold, it becomes impossible for that column to ever be activated. While the activation duty cycle of such a column will eventually decrease enough that its permanence values are boosted, we also take extra care to ensure that every column has at least the minimum number of connected synapses required to exceed the stimulus threshold.
Orphan Columns
Orphan columns are columns whose connected synapses all map to input bits that are turned on for a given input pattern, i.e. columns with a 100% overlap with the input. Yet despite this good match to the input, these columns did not become active after inhibition was applied. Such columns are referred to as orphans because they have learned to represent a particular input pattern, but one or more other columns have learned to represent the input pattern more accurately.
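The definition can be made concrete with a small sketch. This is an illustration under simplified assumptions (a boolean connected-synapse matrix), not NuPIC's data structures:

```python
import numpy as np

def find_orphan_columns(connected, input_vector, active_columns):
    """Return columns whose every connected synapse maps to an ON input
    bit (100% overlap) yet which did not survive inhibition.

    `connected` is a (num_columns, num_inputs) boolean matrix of
    connected synapses -- a simplified, hypothetical representation."""
    orphans = []
    for col in range(connected.shape[0]):
        syns = connected[col]
        # 100% overlap: every connected synapse's input bit is ON
        if syns.any() and np.all(input_vector[syns]):
            if col not in active_columns:
                orphans.append(col)
    return orphans
```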
Input Border
For vision problems, the columns are laid out regularly in a 2D grid, and each column has a square radius of inputs to which it is connected. Specifying an input border excludes the input bits on the exterior of the grid from the potential pools (which in the domain of vision are also referred to as receptive fields). For example, specifying an input border of 2 excludes the bits in the 2 leftmost columns, the 2 rightmost columns, the 2 top rows, and the 2 bottom rows from the potential pools of all columns.
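The border can be expressed as a boolean mask over the 2D input grid; potential pools would then be restricted to bits where the mask is True. A sketch with assumed names:

```python
import numpy as np

def input_border_mask(rows, cols, border):
    """Boolean mask over a 2D input grid: True for bits eligible for
    potential pools, False for the excluded border (illustrative)."""
    mask = np.zeros((rows, cols), dtype=bool)
    mask[border:rows - border, border:cols - border] = True
    return mask

# with a 5x5 grid and border of 2, only the center bit remains eligible
mask = input_border_mask(5, 5, 2)
```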
Shared Inputs
Shared inputs are inputs that are connected, via connected synapses, to more than one active column. Remember that our goal is to have each column learn a unique spatial pattern; shared inputs, by definition, do not lend themselves to this goal. As such, the permanence values associated with shared inputs used to be incremented slightly less than those of non-shared inputs.
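A sketch of how such a learning rule could look. The increment values and all names are assumptions for illustration; this feature has since been removed from the code base:

```python
import numpy as np

def update_permanences(perms, connected, active_columns, input_vector,
                       inc=0.05, shared_inc=0.02):
    """Increment permanences on active input bits; shared inputs (ON bits
    connected to more than one active column) get the smaller increment.
    All names and values are illustrative."""
    perms = perms.copy()
    # for each input bit, count how many active columns connect to it
    conn_counts = connected[active_columns].sum(axis=0)
    shared = (conn_counts > 1) & (input_vector > 0)
    for col in active_columns:
        on = (input_vector > 0) & connected[col]
        perms[col, on & ~shared] += inc        # unique inputs: full increment
        perms[col, on & shared] += shared_inc  # shared inputs: smaller increment
    return perms
```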
Cloning
In vision-related problems, it is typical for lower-level regions to learn to represent nearly identical patterns, namely edges. This makes it possible to optimize the learning process by replicating a set of learned permanences from a small subset of columns throughout the entire CLA region. This is also known as ‘weight sharing’ in other machine learning domains. Cloning has been removed from the code base because its implementation introduced a great deal of complexity.
High Tier Columns
High Tier columns were added to support datasets with very few input patterns. It is important to note that this scenario is only relevant for artificial data sets, as real sensory inputs do not lack input patterns (quite the contrary). In scarce-input-pattern scenarios, only a small subset of the columns would learn to represent the entire range of input patterns, leaving the remainder of the columns to ‘starve’, i.e. never become active. As these ‘starved’ columns have their boost values raised (see boosting), they will eventually displace the columns that have learned to represent the input patterns, which in turn starves a new set of columns. This leads to perpetual oscillation among the columns, with different sets of columns taking turns learning to represent the same spatial patterns. High Tiering was conceived to prevent this from happening: we first look at the overlap percent (i.e. what percentage of a column's connected synapses are connected to input bits which are turned on). If this percentage is sufficiently high, we exclude the column from participating in the inhibition round, securing its spot in the set of active columns.
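The overlap-percent check described above can be sketched as follows. The threshold value and names are assumptions, not NuPIC's actual parameters:

```python
import numpy as np

def high_tier_columns(overlaps, connected_counts, min_pct=0.9):
    """Return indices of columns whose overlap percent (overlap score
    divided by number of connected synapses) meets `min_pct`; these
    columns would bypass inhibition. Names and threshold are illustrative."""
    counts = np.maximum(connected_counts, 1)  # guard against divide-by-zero
    pct = overlaps / counts
    return np.flatnonzero(pct >= min_pct)

# columns 0 and 2 have 90%+ of their connected synapses on ON bits
guaranteed = high_tier_columns(np.array([9, 4, 10]), np.array([10, 10, 10]))
```

These columns are simply appended to the winners chosen by inhibition, so a well-matching column can never be starved out.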
Read the CLA White Paper first!
Change Log

| Date | Change |
| --- | --- |
| August 20 2013 | Cloning, orphan columns, shared inputs support removed. High Tier relegated to separate base class |
| ??? | Spatial Pooler Memorization (aka “high tier”) was added to increase learning speed. TODO: describe this |
| ??? | Option to increment shared active input bits less than non-shared |
| Early | Orphan columns added |
| Early | Input border added |
| Early | Topology added for vision |