Spatial Pooler Implementation for MNIST Dataset



I implemented a spatial pooler (S.P.) and a Maximum Likelihood (M.L.) classifier and used them to achieve 77% accuracy on the MNIST dataset. My implementation differs from Numenta’s in a few ways.

The code and figures are available at:
It is written in Python 3.

Test Methods
I trained the S.P. for 60,000 cycles on the MNIST training data, which contains 60,000 samples. The training data is synthetic: I applied random rotations (up to 15 degrees) and random shifts to the samples. The resulting system (Encoder + S.P. + M.L. Classifier) correctly classifies approximately 77% of the test dataset.
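As a rough illustration of the kind of augmentation described above, here is a minimal NumPy-only sketch. The function name `augment`, the nearest-neighbour sampling, and the `max_shift` parameter are assumptions for illustration; the post states only the 15 degree rotation limit and does not say how far the images were shifted.

```python
import numpy as np

def augment(image, rng, max_rotation_deg=15.0, max_shift=2):
    """Return a randomly rotated and shifted copy of a 2-D image.

    max_shift is a hypothetical parameter: the shift range is not
    given in the original post.
    """
    height, width = image.shape
    angle = np.deg2rad(rng.uniform(-max_rotation_deg, max_rotation_deg))
    shift_row, shift_col = rng.integers(-max_shift, max_shift + 1, size=2)

    # Inverse mapping: for every output pixel, find the source pixel,
    # rotating about the image centre and undoing the shift.
    rows, cols = np.mgrid[0:height, 0:width]
    center_row, center_col = (height - 1) / 2, (width - 1) / 2
    r = rows - center_row - shift_row
    c = cols - center_col - shift_col
    src_row = np.rint(np.cos(angle) * r + np.sin(angle) * c + center_row).astype(int)
    src_col = np.rint(-np.sin(angle) * r + np.cos(angle) * c + center_col).astype(int)

    # Pixels that map outside the source image stay as background (zero).
    inside = (src_row >= 0) & (src_row < height) & (src_col >= 0) & (src_col < width)
    out = np.zeros_like(image)
    out[inside] = image[src_row[inside], src_col[inside]]
    return out

# Usage with a stand-in 28x28 "digit" (a filled rectangle).
rng = np.random.default_rng(0)
digit = np.zeros((28, 28))
digit[10:18, 12:16] = 1.0
augmented = augment(digit, rng)
```

Nearest-neighbour sampling keeps the sketch short; an interpolating rotation would give smoother edges.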

Implementation Differences
This implementation is based on, but differs from, the one described in Numenta’s Spatial Pooler white paper (Cui, Ahmad, Hawkins, 2017, “The HTM Spatial Pooler - a neocortical…”) in two main ways: the boosting function and the local inhibition mechanism.

Logarithmic Boosting Function:
In (Cui, Ahmad, Hawkins, 2017), they use an exponential boosting function (see figure 1D from their paper). Notice that their curve intercepts the boost-factor axis and has an asymptote along the activation frequency axis. The activation frequency is by definition constrained to the range [0, 1].

I use the inverse of their function, which intercepts the activation-frequency axis and asymptotically approaches the boost-factor axis. I then scale the boost factor so that it equals 1.0 at the desired sparsity:

 boost_function( f ) = -log( f )
 scale_factor   = 1 / boost_function( target_sparsity )
 boost_factor   = boost_function( activation_frequency ) * scale_factor
                = log( activation_frequency ) / log( target_sparsity )
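The simplified form on the last line translates directly into Python. This is a minimal sketch: the function name and the small floor that guards against `log(0)` for columns that have never been active are assumptions, not part of the original description.

```python
import math

def boost_factor(activation_frequency, target_sparsity):
    """Logarithmic boost: 1.0 at the target sparsity, larger for rarer columns."""
    # Floor to avoid log(0) for never-active columns (assumed value).
    frequency = max(activation_frequency, 1e-9)
    return math.log(frequency) / math.log(target_sparsity)

# A column active exactly as often as desired is left unchanged,
# an under-active column is boosted, an over-active one is suppressed.
at_target = boost_factor(0.02, 0.02)
under_active = boost_factor(0.002, 0.02)
over_active = boost_factor(0.2, 0.02)
```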

This mechanism has the advantage of having no parameters.
This mechanism yields an entropy of 97% of the theoretical maximum.
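The post does not say how the entropy figure was computed. One plausible definition, assumed here, sums the binary entropy of each column's activation frequency and divides by the value obtained when every column is active at exactly the target sparsity. The function names below are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a binary variable that is 1 with probability p."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def entropy_fraction(activation_frequencies, target_sparsity):
    """Total column entropy as a fraction of the maximum at the target sparsity."""
    actual = sum(binary_entropy(f) for f in activation_frequencies)
    maximum = len(activation_frequencies) * binary_entropy(target_sparsity)
    return actual / maximum

# Perfectly even usage reaches the maximum; uneven usage falls short.
even = entropy_fraction([0.02] * 100, 0.02)
uneven = entropy_fraction([0.04, 0.0] * 50, 0.02)
```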

Plot comparing the two boosting functions.

Faster Local Inhibition:
In (Cui, Ahmad, Hawkins, 2017), they activate the top K most excited columns in each area,
where K is proportional to the sparsity, and the area is a fixed radius from each column
which is proportional to the radius of the receptive field.

I activate the top K most excited columns globally, after normalizing each column by the
mean and standard deviation of its local area. The local area is defined by a Gaussian filter
whose standard deviation is proportional to the radius of the receptive field.
In pseudocode:

mean_normalized = excitement - gaussian_blur( excitement, radius )
standard_deviation = sqrt( gaussian_blur( mean_normalized ^ 2, radius ) )
normalized = mean_normalized / standard_deviation
activate = top_k( normalized, sparsity * number_of_columns )
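The pseudocode can be turned into a runnable NumPy sketch. The helper `gaussian_blur`, its edge padding, the `sigma` argument, and the small constant that guards against division by zero are all assumptions made for illustration; the author's actual code may differ.

```python
import numpy as np

def gaussian_blur(array, sigma):
    """Separable Gaussian blur with edge padding, using NumPy only."""
    half_width = int(3 * sigma) + 1
    offsets = np.arange(-half_width, half_width + 1)
    kernel = np.exp(-0.5 * (offsets / sigma) ** 2)
    kernel /= kernel.sum()
    blurred = array.astype(float)
    for axis in range(blurred.ndim):
        pad = [(half_width, half_width) if a == axis else (0, 0)
               for a in range(blurred.ndim)]
        padded = np.pad(blurred, pad, mode="edge")
        blurred = np.apply_along_axis(np.convolve, axis, padded, kernel, mode="valid")
    return blurred

def local_inhibition(excitement, sigma, sparsity):
    """Normalize by local mean and std-dev, then pick the top k columns globally."""
    mean_normalized = excitement - gaussian_blur(excitement, sigma)
    std = np.sqrt(gaussian_blur(mean_normalized ** 2, sigma))
    normalized = mean_normalized / np.maximum(std, 1e-9)  # avoid divide-by-zero
    k = max(1, int(round(sparsity * excitement.size)))
    top = np.argpartition(normalized.ravel(), -k)[-k:]
    active = np.zeros(excitement.size, dtype=bool)
    active[top] = True
    return active.reshape(excitement.shape)

# Usage on random excitement for a 56x56 grid of columns at 2% sparsity.
rng = np.random.default_rng(0)
excitement = rng.random((56, 56))
active = local_inhibition(excitement, sigma=3.0, sparsity=0.02)
```

Because the selection is a single global top-k, the cost is dominated by the two blurs rather than by a per-column neighbourhood sort.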

The figure shows the S.P. pipeline. From left to right, top down: input is accumulated, yielding the raw excitement; the activation frequency (aka duty cycle) is used to boost each column (labeled “Boosted”); that is then normalized by each area’s mean and std-dev (labeled “Locally Inhibited Excitement”); finally, the top 2% of columns are selected.

The following samples of inputs and outputs show that although the active columns overlap with the ‘interesting’ areas of the input, they are evenly distributed across those areas. This supports my method of local inhibition followed by global selection, which prevents any one area from activating disproportionately.

Note: The input shape is 28x28, the columns shape is 56x56, and the radius for connecting inputs to columns is 3 units (units from the input space). In these figures the column locations line up with their locations in the input space.

Link to Sample Inputs and Outputs:
*Sorry to break the links, new users are restricted to a single image and 2 links

In an ablative experiment, I disabled the local inhibition step. Active columns are still selected globally and columns still have an input radius. You can see that the active columns still tend to be near the interesting areas of the input but that they are clumped. Also, there appear to be areas which are systematically more active.

Link to Sample Inputs and Outputs, without local inhibition:

I would like to continue working with the MNIST dataset. I think that with work I could reach my goal of 90% accuracy.


This is really nice work. I think that you should shoot for ~95% accuracy with the spatial pooler. That is what we (Numenta) get with our SP-only implementation and is still very far from state of the art. But keep in mind that MNIST is a very simple image classification task and the state of the art solutions overfit and get better than human results.

Also just a note that I have been reorganizing [1] so there will be some changes. But I will keep the MNIST SP-only implementation in task and hopefully make it easier to run.



I have a question regarding the ~95% accuracy. As I understand it, you were able to reach this accuracy because of the kNN on top of the spatial pooler, and not because patterns from the SP had 95% overlap. Is that right?


My SP reaches 93-94% accuracy at best and does not use a kNN classifier. I experimented with kNN classifiers but found that they used a lot of memory and took a long time to run, to the point where kNN was not feasible to use. Instead, my classifier keeps track of which neurons correlate with which input categories.
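The post gives no further detail on this classifier, so the following is only a guess at its general shape: count how often each column is active for each category, then let the active columns vote. The class name, method names, and voting rule are all hypothetical.

```python
import numpy as np

class CorrelationClassifier:
    """Tracks how often each column is active for each category (a sketch)."""

    def __init__(self, num_columns, num_categories):
        self.counts = np.zeros((num_columns, num_categories))

    def train(self, active_columns, category):
        # active_columns: unique indices of the columns active for this input.
        self.counts[active_columns, category] += 1

    def predict(self, active_columns):
        # Estimate P(category | column active) per column, then sum the votes.
        totals = self.counts.sum(axis=1, keepdims=True)
        probabilities = self.counts / np.maximum(totals, 1)
        votes = probabilities[active_columns].sum(axis=0)
        return int(np.argmax(votes))

# Usage with six columns and two categories.
classifier = CorrelationClassifier(num_columns=6, num_categories=2)
classifier.train(np.array([0, 1, 2]), category=0)
classifier.train(np.array([3, 4, 5]), category=1)
mostly_zero = classifier.predict(np.array([0, 1, 5]))
mostly_one = classifier.predict(np.array([2, 4, 5]))
```

Memory use is one count per column per category, which stays fixed as training continues, unlike a kNN classifier that stores every training pattern.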

As for Numenta’s MNIST classifiers, I do not know how they work.

I’m assuming you are referring to the overlap between SP mini-column patterns from different inputs in the same category. The statistical classifier which I use will actually recognise many different patterns as the same category, and within a category different inputs will not yield a 95% overlap in output. If the SP had 95% overlap within all categories that would mean that it had learned viewpoint invariance…


Why Python3? I’ve seen reports of problems integrating Python3 with nupic core. I’m interested because I’m now installing a Windows Visual Studio environment to try some HTM code and it’s prompting me to install Anaconda3 by default with optional support for Anaconda2 checked. I believe the default “Python Language Support” option will do straight Python2, and I’m going with that. Since I don’t know how well MS’s installer will support later installation of options, I’m going to take a chance by accepting their default inclusion of Anaconda3, and also include Anaconda2 – hoping these won’t interfere with getting some baseline nupic core programs to run in Python2.


YMMV with Anaconda. Hopefully you can work around it, but it gets hairy having multiple versions of Python on one system.


Yes, I saw that Anaconda doesn’t work and play well with nupic core in other posts. I’m taking a chance with this because:

  1. Seeing a Python3 program show up here means people are expanding the range of Python configurations for HTM code – and I’d like the option of running those other Python configurations if more such configurations show up (such as Anaconda2 or 3).
  2. By going with Visual Studio’s installation of all this, I’m hoping the way they configure their own versions of these components will be at least integrated and therefore permit me to not involve them in any builds without explicitly asking for them.
  3. I don’t trust MS’s installer to permit me to change my mind later if I want to add an optional component I skipped in the initial VS install.

Of course 30GB is a lot to download over a 3Mbps WISP out here in rural Iowa.