Why doesn't the Spatial Pooler scale well to numerous fields?

Hi all,

My question refers to a point that Subutai made in the video ‘Multiple Fields in Nupic’ (at about 4:00), where he said that more than 5 fields or so becomes difficult for the spatial pooler to handle. If I understand it right, the encodings for each field are concatenated and then fed into the spatial pooler, so having many fields means producing a very long concatenated bit string from which the spatial pooler must produce an SDR within its 2048-column space. He says that each column ‘has to participate in understanding the full richness of the space’, since there is no topology imposed that limits each column to looking at only a certain subset of that large encoding space (as he says there is in primary visual cortex).

I’m confused because I thought that the spatial pooler did impose such a topology. I thought that in the spatial pooler each column has a dendrite segment of synapses that each (potentially) connect to one bit in the input space, meaning that each column is effectively looking at a small subset of the encoding space. If this is correct, wouldn’t it mean that no column is responsible for looking at the whole encoding space? Maybe I’m not clear on what is meant by ‘topology’, but intuitively it seems that feeding a longer (concatenated) encoding string to the spatial pooler would simply increase the number of ‘potentialSynapses’ on each column’s dendrite segment. It does seem, on a basic level, that feeding in more and more fields means cramming more and more information into the same 2048-column space, which would eventually exceed the capacity of the system and require tons of data to find patterns in, but other than that, am I missing something here?

To wrap up, I’m trying to find something novel I can do somewhere in the HTM algorithms for my Ph.D., and thought I could implement a ‘topology’ as Subutai describes if it hasn’t been done yet. I’m really interested in the scalability of HTM to an arbitrary number of fields, so I thought this may be a good fit, but I’m totally open to other ideas if anyone has any! I remember Fergal mentioning that Francisco had dealt with this issue in handling data for NASA, but I’m not sure how to find out how exactly he did that. Thanks so much, and sorry for the long post,

– Sam


The SP can be initialized with or without topology. In most of Numenta’s experiments and all the applications we’ve created, we have not used topology because it has not added value. Topology adds significant value when the input space of the SP contains topological semantic relationships (like with images). For example (from MNIST):

Here, topology is extremely important. But with the type of streaming scalar data Numenta has used in our apps, we don’t need topology.

Stay tuned for more on topology in upcoming HTM School episodes. This Friday I’ll talk a little bit about “global inhibition”, which means no topology and all the columns are in the same “neighborhood” when inhibition is calculated, so there are no localized groups of columns, just one big one.

Sorry, I should clarify: topology is implemented in the NuPIC SP, even though we don’t use it much. Also, enabling topology and local inhibition decreases performance significantly.


Why doesn’t the SP scale well to numerous fields? The function of the spatial pooler is to identify common spatial patterns. The number of possible spatial patterns grows exponentially with the number of fields. With real data, the fields are typically correlated, and identifying common spatial patterns requires learning from all of them simultaneously. Individual columns in the SP have limited connectivity with the input space, so it could be hard for a single column to learn from many fields.
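To make the "grows exponentially" point concrete, here is a toy count. It assumes each field is bucketed into a fixed number of distinct encoder values (10 here, an arbitrary illustrative choice) and that fields vary independently. Real data only visits a fraction of these combinations, but the space the SP has to carve up still grows as k^n.

```python
from itertools import product

def num_joint_patterns(values_per_field, num_fields):
    """Size of the joint input space when each of `num_fields` fields
    can independently take `values_per_field` distinct encoded values."""
    return values_per_field ** num_fields

def enumerate_joint_patterns(field_values):
    """Brute-force check: list every joint pattern (one value per field)."""
    return list(product(*field_values))

# Illustrative assumption: every field is bucketed into 10 distinct values.
for n in range(1, 7):
    print(n, "fields ->", num_joint_patterns(10, n), "possible combined patterns")
```

Going from 5 to 6 fields multiplies the space by another factor of 10, while the SP's column count stays fixed.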

From a neuroscience point of view, you can ask a similar question: why are touch, vision, and hearing kept separate in the primary sensory cortices? Why not feed all sensory information to a single cortical region and let it figure out the common SDR patterns? Within each sensory modality, why have a retinotopic map in V1 or a body map in S1, rather than scrambling everything together? I think it is also related to the efficiency of learning. It is easier to identify commonly occurring patterns in a single sensory modality at a small scale. The problem gets harder quickly as you concatenate multiple sensory fields together. You can try that with the spatial pooler, though I doubt you will detect any meaningful patterns with it.

Topology means “topographic map” here, which is the ordered projection of a sensory surface, like the retina or the skin, to the cortex. If you look at the primary visual cortex or the primary somatosensory cortex, no cortical column covers the whole encoding space. Instead, each column only covers a small part of the sensory space (e.g., a small patch of retina/skin), and neighboring columns cover nearby patches.

The current spatial pooler in NuPIC has the option to use “topology”, meaning that each SP column is connected to a small patch of the input space, nearby columns connect to nearby patches, and inhibition is local. I am not sure how well topology is tested in NuPIC, and I am pretty sure that it cannot handle topology with multiple fields (e.g., vision and touch). I think there could be some interesting projects there.
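The topographic arrangement described above (each column connected to a small patch of the input, neighboring columns connected to neighboring patches) can be sketched in 1-D as follows. The helper names, the even spacing of column centers, and clipping at the edges instead of wrapping around are assumptions for illustration only. NuPIC's SpatialPooler exposes related parameters such as `potentialRadius` and `globalInhibition`, but this is not its code.

```python
def column_center(col, num_columns, input_size):
    """Map a column index onto the input space so that neighboring columns
    land on neighboring input positions (a 1-D topographic map)."""
    # Spread column centers evenly, each in the middle of its stride.
    stride = input_size / num_columns
    return int(col * stride + stride / 2)

def potential_pool(col, num_columns, input_size, radius):
    """Input indices a column may connect to: a window of +/- radius
    around its center, clipped at the edges (no wrap-around)."""
    center = column_center(col, num_columns, input_size)
    lo = max(0, center - radius)
    hi = min(input_size - 1, center + radius)
    return list(range(lo, hi + 1))

# 8 columns over a 64-bit input, each seeing a 9-bit patch.
for c in range(8):
    pool = potential_pool(c, 8, 64, 4)
    print("column", c, "sees input bits", pool[0], "to", pool[-1])
```

Column 0 sees bits 0 to 8 and column 1 sees bits 8 to 16, so adjacent columns share a little of the input while distant columns share nothing.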


Interesting questions, Sam. Hopefully I can help a little.

The inhibition step determines the activity of columns based on their neighboring columns. In practice, we typically set inhibition to be global, because computing inhibition relative to local neighborhoods of columns can be computationally expensive. Global inhibition is appropriate for data that doesn’t have topological information, i.e., where the order in which columns are laid out is irrelevant. In vision, as in the MNIST example Matt mentioned, topology can be a significant component of learning features of the inputs. Enforcing a topology in the SP would mean using local (as opposed to global) inhibition, such that the receptive fields of the columns are limited to a subset of the input space. This subset is a specific local neighborhood of the input space. So in the MNIST example, one set of columns may be confined to the center of the image, while another set is confined to a corner.
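A toy contrast between the two inhibition modes, under simplifying assumptions: overlaps are given directly as numbers, columns are laid out in 1-D, and the local rule is a simple "at most k winners per neighborhood" test with k derived from a density parameter. This is not NuPIC's implementation; it only illustrates how a weakly driven region can be shut out entirely under global inhibition, and why local inhibition costs more (every column scans its own neighborhood).

```python
def global_inhibition(overlaps, num_active):
    """Global inhibition: rank every column against every other column in
    the layer and keep the `num_active` strongest (ties go to lower index)."""
    ranked = sorted(range(len(overlaps)), key=lambda c: (-overlaps[c], c))
    return sorted(ranked[:num_active])

def local_inhibition(overlaps, radius, density):
    """Local inhibition, 1-D sketch: a column wins if fewer than k columns in
    its own neighborhood (+/- radius) have a strictly higher overlap, where
    k = density * neighborhood size (at least 1). Equal overlaps can co-win."""
    n = len(overlaps)
    winners = []
    for c in range(n):
        if overlaps[c] <= 0:
            continue  # a column with no input never wins
        lo, hi = max(0, c - radius), min(n, c + radius + 1)
        k = max(1, int(round(density * (hi - lo))))
        stronger = sum(1 for i in range(lo, hi) if overlaps[i] > overlaps[c])
        if stronger < k:
            winners.append(c)
    return winners

# Strong input on the left of the column layout, weak input on the right.
overlaps = [9, 8, 7, 0, 0, 0, 0, 1, 2, 0]
print(global_inhibition(overlaps, 2))      # [0, 1]: the strong region takes every winner
print(local_inhibition(overlaps, 2, 0.2))  # [0, 8]: the weak region still gets a winner
```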

For the implications this has on multiple fields, consider that an SP column tries to learn to represent features in whatever inputs it has access to. With more and more input fields, there are more and more spatial patterns that the full SP tries to represent in the output space. If we allocate local neighborhoods of the input space to sets of columns – i.e., local inhibition – then each SP column can learn to represent a manageable number of patterns. However, with global inhibition, each column tries to learn spatial patterns across the entire input space.

Awesome! What is your area of study? Perhaps we can help with suggestions…


We often set the potential pool in our networks to be a large portion of the input space, so I don’t think the statement that individual columns have limited connectivity with the input space is accurate in many cases. An individual column may not be able to represent some trivial pattern if its potential pool doesn’t happen to overlap that pattern, but even if the column only has 50% potential overlap with a complex feature, it should be no problem. So columns should have no problem representing features across many fields. The problem is that the column initially representing some important feature now has a bunch of new inputs from the new fields and has to represent the most salient set of coincident active bits, which may no longer be the important feature.
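To illustrate the point about large potential pools, the sketch below draws a column's potential pool as a random 50% of a concatenated 5-field input. The field and feature sizes are made-up numbers, and the helper names are hypothetical, loosely echoing the SP's "potential percentage" idea rather than reproducing any NuPIC function. Roughly half of a feature that spans all five fields lands in the pool, so the column can in principle see bits from every field.

```python
import random

def make_potential_pool(input_size, potential_pct, rng):
    """Hypothetical helper: a column's potential pool drawn as a random
    fraction of ALL input bits (no topology)."""
    k = int(round(potential_pct * input_size))
    return set(rng.sample(range(input_size), k))

def potential_overlap(pool, active_bits):
    """How many of a feature's active bits the column could ever connect to."""
    return len(pool & set(active_bits))

FIELDS, BITS_PER_FIELD = 5, 400  # illustrative sizes, not NuPIC defaults
input_size = FIELDS * BITS_PER_FIELD
# A "complex feature": 21 contiguous active bits inside every one of the 5 fields.
feature = [f * BITS_PER_FIELD + b for f in range(FIELDS) for b in range(100, 121)]

rng = random.Random(42)
pool = make_potential_pool(input_size, 0.5, rng)
overlap = potential_overlap(pool, feature)
fields_seen = {bit // BITS_PER_FIELD for bit in feature if bit in pool}
print(len(feature), "feature bits,", overlap, "inside the pool; fields seen:", sorted(fields_seen))
```

Note that this only shows the column *can* see the feature; as described above, which coincident bits it actually ends up representing depends on what else is active in those new fields.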

The rest of the responses from @ycui and @alavin do a good job of explaining the fundamental trade-off in choosing the inhibition radius. You want it large enough that you learn interesting and useful features, but not so large that important but subtle features are lost.

But there are ways to mitigate this problem. @rhyolight, @ycui, and @alavin mention topology. This is absolutely necessary in some cases, but you have to make sure your inhibition areas are large enough to learn the important features in the input. A very simple alternative is to increase the number of active columns (at some point you will also need to increase the total number of columns to keep the SP output sufficiently sparse). This results in more features being represented uniquely, so you can add fields without losing as many important features from the original fields.
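The bookkeeping behind "increase the active columns, then grow the column count to stay sparse" is a simple proportion. The sketch assumes a baseline of 40 active columns out of 2048 (roughly the 2% figure discussed in this thread; the exact baseline is an assumption) and computes how many columns each larger active count needs to keep the same sparsity.

```python
from fractions import Fraction

def sparsity(num_active, num_columns):
    """Exact fraction of columns active per input."""
    return Fraction(num_active, num_columns)

def columns_for_same_sparsity(base_active, base_columns, new_active):
    """Smallest column count keeping `new_active` columns at or below the
    baseline sparsity base_active / base_columns (integer ceiling division)."""
    return -(-new_active * base_columns // base_active)

BASE_ACTIVE, BASE_COLUMNS = 40, 2048  # assumed baseline, ~2% active
for new_active in (40, 60, 80, 120):
    cols = columns_for_same_sparsity(BASE_ACTIVE, BASE_COLUMNS, new_active)
    print(new_active, "active ->", cols, "columns, sparsity", float(sparsity(new_active, cols)))
```

Doubling the active columns to 80 therefore means doubling the layer to 4096 columns if you want the output to stay as sparse as before.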


Thank you guys for such thorough responses!! After reading them all 3 times, I think I have the gist of it, and I just hope to verify that by paraphrasing below. Please let me know where I go wrong or come up short! I’ll try to be as concise as possible.

With global inhibition, the synapses on each SP column’s dendrite segment can potentially connect to any bit, wherever it is in the encoding space, rather than being limited to a certain sub-area of the encoding space. This is what Subutai meant when he said that each SP column has to ‘participate in the full richness’ of the encoding space. The top 2% of all SP columns (the ‘winning columns’: those with the most connected synapses to active encoding bits) are activated. These top 2% globally inhibit all other SP columns.
With local inhibition, the synapses on each SP column’s dendrite segment are limited in that they can only potentially connect with encoding bits within a certain designated subset of the encoding space. Likewise, the ‘winning columns’ don’t inhibit all other SP columns, but only those within their local ‘inhibition radius’. So the total set of ‘winning columns’ is made up of a bunch of columns that each won their respective local areas (‘inhibition radii’), rather than having to win over all columns in the entire SP space (or ‘global’ area).

One instance where it is appropriate (or even necessary) to use local inhibition is when there are semantic topological relationships between the bits in the encoding space. If I understand this concept correctly as it’s seen in the written digit example from MNIST, there are semantic relationships between encoding bits because there are relationships between the greyscale values of neighboring pixels. In other words, the greyscale value of each pixel has implications for the values of those nearby. For example, if a certain pixel or group of pixels is in the middle of the greyscale (between fully white and dark), it may mean that that area is phasing in from blank space where nothing is written to dark where the pencil was pushed down hardest. So a greyish area may imply that one side of it will be blank while the opposite side will be darker. Do I have this somewhat right??
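The intuition about neighboring pixels can be checked numerically. Below, a made-up 1-D slice through a pencil stroke (illustrative values, not taken from MNIST) compares how much adjacent pixels differ versus how much arbitrary pairs of pixels differ. Neighbors are far more alike than random pairs, which is the sense in which position in the input carries meaning; a concatenation of independent scalar fields typically lacks this kind of ordering.

```python
from itertools import combinations

# Hypothetical cross-section through a pencil stroke: blank -> dark -> blank.
stroke = [0, 0, 40, 120, 200, 255, 255, 200, 120, 40, 0, 0]

def mean_adjacent_diff(pixels):
    """Average |difference| between each pixel and its immediate neighbor."""
    diffs = [abs(a - b) for a, b in zip(pixels, pixels[1:])]
    return sum(diffs) / len(diffs)

def mean_all_pairs_diff(pixels):
    """Average |difference| between every pair of pixels, ignoring position."""
    diffs = [abs(a - b) for a, b in combinations(pixels, 2)]
    return sum(diffs) / len(diffs)

print(round(mean_adjacent_diff(stroke), 1))   # 46.4: neighbors are similar
print(round(mean_all_pairs_diff(stroke), 1))  # 118.5: arbitrary pairs much less so
```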

Also, would it be fair to say that ‘topological semantic relationships’ between bits in the encoding space (input to the SP) is like saying that the different bits in the input space affect each other? In other words, is it therefore most appropriate to use topology in general when the input bits directly affect each other? For example, if I have a big concatenated encoding space composed of numerous fields, would it be appropriate to use topology if I know these fields directly affect each other, and global inhibition if the fields are totally independent?

Intuitively, it makes sense to me that local inhibition (or ‘imposed topology’, as Subutai put it) may be a better fit for handling a big encoding space containing numerous interdependent fields, for two reasons: 1) local inhibition would guarantee that each field gets a ‘winning column’ (or several) to represent it, as it’d be impossible for any one dominant field to claim all the ‘winning columns’, since the columns covering that field would only be connected to a limited subset of the encoding space; and 2) breaking a really big encoding space down into pieces using local inhibition could allow the SP to scale more easily to really long concatenated encoding vectors. Could this second reason be why the brain imposes topology in V1, as Subutai says?

Thanks so much again, guys!! I can’t tell you how cool it is to have your brains in on this.

– Sam