Spatial Pooler Implementation for MNIST Dataset

Summary
I implemented a spatial pooler (S.P.) and a Maximum Likelihood (M.L.) classifier and used them to achieve 77% accuracy on the MNIST dataset. My implementation differs in a few ways from Numenta’s.

The code and figures are available at https://github.com/ctrl-z-9000-times/HTM_experiments (an archive of old experiments; the sdr_algorithms repo is recommended instead).
It is written in Python 3.

Test Methods
I trained the S.P. for 60,000 cycles on the MNIST training data, which contains 60,000 samples. I synthetically augmented the training data by applying random rotations (up to 15 degrees) and random shifts. The resulting (Encoder + S.P. + M.L. Classifier) system correctly classifies approximately 77% of the test dataset.
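
For illustration, the augmentation step might look like the following sketch (my reconstruction, not the repository's code; the 15-degree limit is from above, while the ±2-pixel shift range is an assumed value):

    import numpy as np
    import scipy.ndimage

    def augment(image, max_rotation=15, max_shift=2):
        """Randomly rotate and shift one 28x28 MNIST image."""
        angle = np.random.uniform(-max_rotation, max_rotation)
        image = scipy.ndimage.rotate(image, angle, reshape=False)
        shift = np.random.randint(-max_shift, max_shift + 1, size=2)
        return scipy.ndimage.shift(image, shift)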

Implementation Differences
This implementation is based on, but differs from, the one described in Numenta’s Spatial Pooler white paper (Cui, Ahmad, Hawkins, 2017, “The HTM Spatial Pooler - a neocortical…”) in two main ways: the boosting function and the local inhibition mechanism.

Logarithmic Boosting Function:
In (Cui, Ahmad, Hawkins, 2017), they use an exponential boosting function (see figure 1D from their paper). Notice that their curve intercepts the boost-factor axis and has an asymptote along the activation frequency axis. The activation frequency is by definition constrained to the range [0, 1].

I use the inverse of their function, which intercepts the activation-frequency axis and asymptotically approaches the boost-factor axis. I then scale the boost factor such that it equals 1.0 at the target sparsity:

 boost_function( x ) = -log( x )
 scale_factor        = 1 / boost_function( target_sparsity )
 boost_factor        = boost_function( activation_frequency ) * scale_factor
                     = log( activation_frequency ) / log( target_sparsity )

This mechanism has the advantage of having no free parameters, and it yields an entropy of 97% of the theoretical maximum.
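
For concreteness, here is a minimal NumPy rendering of both rules (my sketch, not the repository's code; the epsilon floor and the exponential strength value are illustrative assumptions):

    import numpy as np

    def log_boost(activation_frequency, target_sparsity, epsilon=1e-6):
        """Parameter-free logarithmic boosting: equals 1.0 at the target
        sparsity, rising as a column's activation frequency falls below it."""
        af = np.clip(activation_frequency, epsilon, 1.0)
        return np.log(af) / np.log(target_sparsity)

    def exp_boost(activation_frequency, target_sparsity, strength=100.0):
        """Exponential boosting in the style of (Cui, Ahmad, Hawkins, 2017);
        the strength knob is the parameter that the log rule avoids."""
        return np.exp(strength * (target_sparsity - activation_frequency))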

(Figure: plot comparing the two boosting functions.)

Faster Local Inhibition:
In (Cui, Ahmad, Hawkins, 2017), they activate the top K most excited columns in each area, where K is proportional to the sparsity, and the area is a fixed radius around each column, proportional to the radius of the receptive field.

I activate the top K most excited columns globally, after normalizing all columns by their local-area mean and standard deviation. The local area is computed with a Gaussian filter whose standard deviation is proportional to the radius of the receptive field.
In pseudo code:

mean_normalized = excitement - gaussian_blur( excitement, radius )
standard_deviation = sqrt( gaussian_blur( mean_normalized ^ 2, radius ) )
normalized = mean_normalized / standard_deviation
activate = top_k( normalized, sparsity * number_of_columns )
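
A runnable rendering of this pseudocode for a 2D grid of columns might look like the following (the small epsilon guarding against division by zero is my addition):

    import numpy as np
    from scipy.ndimage import gaussian_filter

    def local_inhibition(excitement, radius, sparsity, epsilon=1e-6):
        """Normalize each column by its local mean and standard deviation,
        then activate the top K columns globally."""
        mean_normalized = excitement - gaussian_filter(excitement, radius)
        std_dev = np.sqrt(gaussian_filter(mean_normalized ** 2, radius))
        normalized = mean_normalized / (std_dev + epsilon)
        k = int(round(sparsity * normalized.size))
        winners = np.argsort(normalized, axis=None)[-k:]  # flat indices of the top K
        return np.unravel_index(winners, normalized.shape)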

The S.P. pipeline figure (in the repository’s MNIST_figures directory) shows, from left to right and top to bottom: input is accumulated, yielding the raw excitement; the activation frequency (aka duty cycle) is used to boost each column (labeled “Boosted”); that is then normalized by each area’s mean and std-dev (labeled “Locally Inhibited Excitement”); finally, the top 2% of columns are selected.

The following samples of inputs and outputs show that although the active columns overlap with the ‘interesting’ areas of the input, they are evenly distributed across those areas. This validates my method of local inhibition followed by global selection, which prevents any one area from activating disproportionately.

Note: The input shape is 28x28, the column shape is 56x56, and the radius for connecting inputs to columns is 3 units (units of the input space). In these figures the column locations line up with their locations in the input space.

Link to Sample Inputs and Outputs:
https://github.com/ctrl-z-9000-times/HTM_experiments/blob/master/MNIST_figures/sample_activations.png

In an ablation experiment, I disabled the local inhibition step. Active columns are still selected globally, and columns still have an input radius. You can see that the active columns still tend to be near the interesting areas of the input, but they are clumped. Also, there appear to be areas which are systematically more active.

Link to Sample Inputs and Outputs, without local inhibition:
https://github.com/ctrl-z-9000-times/HTM_experiments/blob/master/MNIST_figures/sample_activations_no_local_inhib.png

Conclusion
I would like to continue working with the MNIST dataset. I think that with work I could reach my goal of 90% accuracy.

8 Likes

This is really nice work. I think that you should shoot for ~95% accuracy with the spatial pooler. That is what we (Numenta) get with our SP-only implementation, and it is still very far from the state of the art. But keep in mind that MNIST is a very simple image classification task, and the state-of-the-art solutions overfit and get better-than-human results.

Also, just a note that I have been reorganizing nupic.vision [1], so there will be some changes. But I will keep the MNIST SP-only implementation intact and hopefully make it easier to run.

[1] https://github.com/numenta/nupic.vision

2 Likes

I have a question related to the ~95% accuracy. As I understand it, you were able to reach this accuracy because of a kNN classifier on top of the spatial pooler, and not because patterns from the SP had 95% overlap. Right?

My SP reaches 93-94% accuracy at best and does not use a kNN classifier. I experimented with kNN classifiers but found that they used a lot of memory and took a long time to run, to the extent that kNN was not feasible to use. Instead, my classifier keeps track of which neurons correlate with which input categories.
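
Roughly, the idea is as follows (a minimal sketch of the concept, not the actual implementation): count how often each column is active together with each category, and let the active columns vote for the categories they correlate with.

    import numpy as np

    class CorrelationClassifier:
        def __init__(self, n_columns, n_categories):
            self.counts = np.zeros((n_columns, n_categories))

        def learn(self, active_columns, category):
            # Tally co-occurrences of active columns with the true label.
            self.counts[active_columns, category] += 1

        def classify(self, active_columns):
            # Normalize rows so frequently active columns don't dominate.
            rows = self.counts[active_columns]
            probs = rows / (rows.sum(axis=1, keepdims=True) + 1e-6)
            return int(np.argmax(probs.sum(axis=0)))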

As for Numenta’s MNIST classifiers, I do not know how they work.

I’m assuming you are referring to the overlap between SP mini-column patterns from different inputs in the same category. The statistical classifier which I use will actually recognise many different patterns as the same category, and within a category different inputs will not yield a 95% overlap in output. If the SP had 95% overlap within all categories, that would mean it had learned viewpoint invariance…

1 Like

Why Python3? I’ve seen reports of problems integrating Python3 with nupic core. I’m interested because I’m now installing a Windows Visual Studio environment to try some HTM code and it’s prompting me to install Anaconda3 by default with optional support for Anaconda2 checked. I believe the default “Python Language Support” option will do straight Python2, and I’m going with that. Since I don’t know how well MS’s installer will support later installation of options, I’m going to take a chance by accepting their default inclusion of Anaconda3, and also include Anaconda2 – hoping these won’t interfere with getting some baseline nupic core programs to run in Python2.

YMMV with Anaconda. Hopefully you can work around it but it gets hairy having multiple versions of python on one system.

1 Like

Yes, I saw that Anaconda doesn’t work and play well with nupic core in other posts. I’m taking a chance with this because:

  1. Seeing a Python3 program show up here means people are expanding the range of Python configurations for HTM code – and I’d like the option of running those other Python configurations if more such configurations show up (such as Anaconda2 or 3).
  2. By going with Visual Studio’s installation of all this, I’m hoping the way they configure their own versions of these components will at least be integrated, and therefore won’t involve them in any builds unless I explicitly ask for them.
  3. I don’t trust MS’s installer to permit me to change my mind later if I want to add an optional component I skipped in the initial VS install.

Of course 30GB is a lot to download over a 3Mbps WISP out here in rural Iowa.

1 Like

@rhyolight, does NuPIC work on Python 3? (I’m using Python 2.7 and I have many problems with modules.) If yes, can you give me the steps to install NuPIC with Python 3?

No it does not. Stay tuned for the community fork.

3 Likes

A post was split to a new topic: Import Error : cannot Import name SpatialPooler

Hi, great work. May I ask whether the sequence of inputs during training affects the accuracy? In other words, if you randomly pick samples from the training data during training, does each training run yield a different accuracy? Reason I ask is that I’m studying the SP’s properties and I want to understand more of it. Cheers.

My training data is shuffled and iterated through several times. It should not make that much of a difference, but know that the SP has a natural distribution of accuracies. As with all measurements, if you really want to be certain of any measurement you should measure many times and find the mean and standard deviation.
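
For example (a generic sketch, not my actual harness; run_experiment is a hypothetical stand-in for one full train/test cycle):

    import numpy as np

    def run_experiment(seed):
        # Hypothetical stand-in for one full train/test cycle.
        rng = np.random.default_rng(seed)
        return 0.77 + rng.normal(0.0, 0.01)  # accuracy of one run

    accuracies = [run_experiment(seed) for seed in range(10)]
    print(f"accuracy = {np.mean(accuracies):.3f} +/- {np.std(accuracies):.3f}")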

Thanks for mentioning these. Would you mind also mentioning the mean and standard deviation of the accuracies? If it’s not too much to ask, which classifier algorithm did you use?

I do not remember the mean and standard deviation of this experiment. I’ve actually since come up with a different method of local inhibition for the spatial pooler which works better than what’s described here … My latest work in htm-community/nupic.cpp consistently scores >=95%.

For this experiment I used my own SDR-Classifier, which is a clone / re-implementation of the nupic SDR-Classifier.

1 Like

I see, thanks. I will read your code, as I’m really interested in the algorithm and why it works so well, and will eventually try to fit it into my understanding of the SP.

Hi Scott, do you have a code sample or more information about MNIST with your SP-only implementation?

Just a heads up: it is now really easy and convenient to experiment with HTM (SP) on MNIST-like datasets for image classification in htm.core (formerly nupic.cpp).

I’ve tuned the params so the model is much smaller (fewer columns) and thus faster (~60s), and it reaches over 95% in a single pass through the data, without any data augmentation (boxing, image rotation, moving around, …).

But I’m not reaching this because of the log boost function.


It should now be easy to experiment with different boosting methods: log, exp, or none.
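
A minimal setup might look like this sketch, assuming htm.core’s Python bindings (parameter values are illustrative, not the tuned ones, and API details may differ between versions):

    import numpy as np
    from htm.bindings.sdr import SDR
    from htm.bindings.algorithms import SpatialPooler

    inp = SDR([28, 28])
    columns = SDR([28, 28])   # illustrative column shape
    sp = SpatialPooler(
        inputDimensions=inp.dimensions,
        columnDimensions=columns.dimensions,
        globalInhibition=True,
        localAreaDensity=0.05,   # illustrative sparsity
        boostStrength=0.0,       # 0 disables boosting; > 0 enables it
        seed=42,
    )

    image = np.random.randint(0, 256, size=(28, 28))   # stand-in for an MNIST digit
    inp.dense = (image >= 128).astype(np.uint8)        # trivial binary threshold encoder
    sp.compute(inp, True, columns)                     # one learning step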

Compared to the original post, once the SP’s other params have been tuned, log boosting performs slightly worse than exp (and is slower, too).
Interestingly, if entropy of the columns were the measure, no boosting performs best (and is the fastest); on the validation test the best results are with exp boosting, followed by log.

4 Likes

Can I just ask whether you are using a 2D column setup, by any chance?

And if so, if you completely remove learning on the SP and just feed the generated SDR into the kNN classifier, what accuracy would you get?

Log boosting seems to create a lot of spurious synapses in the TM. If I’m correct, when the activity of a mini-column is low, the boostOverlaps_ of that mini-column is “huge”. That means the winners for the same input change substantially.
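
For concreteness, here is how the log rule from earlier in the thread scales as a mini-column's activity drops, with a 2% target sparsity (my arithmetic):

    from math import log
    for f in (0.02, 1e-3, 1e-6, 1e-9):
        print(f, log(f) / log(0.02))
    # 0.02 -> 1.00  (at the target sparsity)
    # 1e-3 -> 1.77
    # 1e-6 -> 3.53
    # 1e-9 -> 5.30  (grows without bound as activity -> 0)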

As a note of caution, it seems that homeostatic synaptic plasticity (the mechanism behind boosting) is disabled in L4 quite soon after the animal is born [1]. Most of the papers about homeostatic synaptic plasticity talk about the inverse proportion between activation frequency and mEPSC voltage [2] (which ultimately affects who will win the inhibition).

In any case, in [2] you can read:

“Currently, we know little about the cellular and molecular mechanisms underlying homeostatic plasticity in vivo.”

[1] A. Maffei, S. B. Nelson, and G. G. Turrigiano, “Selective reconfiguration of layer 4 visual cortical circuitry by visual deprivation,” Nat. Neurosci., vol. 7, no. 12, pp. 1353–1359, 2004.

[2] G. Turrigiano, “Homeostatic synaptic plasticity: Local and global mechanisms for stabilizing neuronal function,” Cold Spring Harb. Perspect. Biol., vol. 4, no. 1, pp. 1–17, 2012.

1 Like