Experimenting with stacking Spatial Poolers

I have been experimenting with stacking multiple layers of Spatial Poolers to see what happens. I originally thought this was a bad idea, but heck, stacked RBMs are a thing. Why not stacked Spatial Poolers?

The setup is simple.

  1. Train a stack of N Spatial Poolers on random values between 0 and 1, encoded using a GridCell encoder.
  2. Encode the value 0.5, send it through the stack of SPs, and keep the output.
  3. Encode and send values from 0 to 1, and see how many bits of each output overlap with the output for 0.5.


Since the Spatial Pooler is essentially a dimensionality-reduction algorithm, I supposed that, like any such algorithm, the more you reduce, the less information gets through, and thus HTM should have a harder time distinguishing between values as I stack more and more SPs together. So, with enough SPs stacked, the plot of overlapping bits should become a flat line.


Here is the source code:

You will need

  1. Etaler, at least commit ab9f15d (the latest as of writing)
  2. ROOT for plotting in C++

to run the experiment.


This is a plot of how many bits overlap when there is no SP; the encoded value goes straight from the encoder to the comparison.

And with 1 SP, the plot looks more concentrated.

With 2 Spatial Poolers, it gets rectangular-ish.

With 4 Spatial Poolers, it looks much more like a rectangle.

At 8 Spatial Poolers, it becomes a rectangle. A funny thing I noticed is that the base of the plotted shape doesn't widen; instead it contracts.

And with 16 SPs, it is the same as with 8 SPs. Nothing changes. Why? Why does stacking SPs together concentrate the activation? This makes no sense mathematically.


I was expecting the plot to go from a bell shape initially (which I got) to a flat line as more and more SPs are stacked together. But no; instead, the range of values that can activate the final layer of SPs gets more and more concentrated.

Maybe this is how the neocortex stabilizes and categorizes data: by using multiple layers of Spatial Poolers? Any comments or theories on why this is happening?


Awesome @marty1885. What is the output (the input to the next SP) of an SP in this case? Is it the active columns or the connected synapses?

This was a fun experiment!

How long did you train, and what were the perm increment / decrement settings? Are you training one SP and then stacking it? Or setting up the stack and training them all individually? I would be interested in seeing the results from untrained SPs as well.

I trained the SPs 1000 times, each time with a randomly generated value. And no, I stacked all the SPs together and then trained them all together.

As for untrained SPs: it works as I originally expected - the number of overlapping cells eventually flattens out.

Edit: The perm inc/dec settings are exactly the same as NuPIC's defaults: perm_inc = perm_dec = 0.1.


Personally, I think the results have some practical implications. Maybe the cortex does this implicitly to decide what class of object it is processing, etc… Maybe the research team can provide a better explanation and take advantage of this?


It works like a neural network does. There is your input SDR x0; you send it into sp1 and get x1, you send x1 to sp2 and get x2, and so on, until you have xN-1, send it to spN, and get xN. xN is the output of the entire stack.


Nice work!

I possibly did essentially the same thing by connecting self-learning memory systems in series. RAM data from the first addresses the next RAM space.

My best guess is that the connection represents going from brain area to brain area. Each memory area tries to make sense of the one before it.

This might be interesting to try on visual information, similar to making a V1, V2, V3, but with all doing the same thing, like now, instead of trying to start from research papers on how the human brain divides up the problem.


Any volunteers? I built Etaler for this exact purpose - performing research and handling large amounts of computation. I'm out of time for this sort of large experiment… but I'll do my best to support you. Or maybe we could collaborate.


If you are going to process anything visual, I highly suggest you enable local inhibition, which literally adds a whole new dimension to the problem space. :wink: I would also be really interested in seeing how stacked 2D SPs work.


Before possibly reinventing the wheel I spent time searching Google Scholar and elsewhere for helpful information, but I did not find anything worth mentioning.

After a few days of thinking, a much easier starting point became an earlier, more hippocampus-related project where I needed to connect memory layers much as you described. It gets added to an already existing full-scale navigation network that vectors out paths between us and what attracts us.

All the 2D layers do the same thing. The only difference is that the 2D networks are different sizes, from maybe 100 wide down to a 1-place/node-wide network. There would then be multiple interconnected levels of detail, from the whole room down to its cups (with close-distance detail filled in by recall of multiple experiences of cups), on its mind at the same time.

If this turns out to make sense in neuroscience too (any evidence either way anyone?) then that will take care of the lesser understood farthest end of the visual stream, and what to do from there, to make something come to life in some biologically plausible way. Any ideas how to wire that one up?


No idea on the biological side of things, nor how to wire it up. If you need grid cell modules, not just encoders: Lior (who is often not on the forums) is working on one, and the PR should come soon.

I'm thinking about potential ways to make it work, and I'll update when I have an idea.


It’s here assumed grid cell signals are part of self-location in each 2D map/module. Program calculated X,Y location variables for all body parts already exist, and we can start with these exact head/body angle and coordinates for body center and mouth. Some might call that an easy way to cheat (on a most baffling part) but we can say it is a “machine learning” enabled gift resulting in self-location super-powers, instead of only mortal cellular approximates like ours.

Starting with the standard animal cognition upper motor commands of Left/Right and Forward/Reverse further simplifies, while at the same time being true to biology. Going straight to text output would make a chatbot, not something new of possible interest to neuroscience.

The smallest (1D or) 2D map grid would represent a unique room, while the most detailed map further places itself inside the boundaries of objects to navigate around or over. Sparse data would be one bit per place, for mapping surfaces of solids to touch or bump into, in the 2D maps that stack into a 3D representation. Most all else in each map is empty space around the object, all 0’s. At first the representation would be mapped by a 2D flatland world view of invisible shock zones and invisible wall locations. Can later add bits for color and other properties.

Map geometry must contain (at center is intersection of surrounding 2D or 3D triangles) hexagonal places, but it may be possible that the exact geometry of the mapped data does not matter. In that case each Y location can be shifted one place radius to the right, from the previous, or use 6 cells per hexagonal place/column/subpopulation/group that senses and memorize navigational traveling waves received at its one input.

Less detailed maps would at some level fill the remaining gaps seen in the most detailed one. There is this way already an articulation mechanism where, at the very tip, the entire arena circle can be seen as one place. Pooling horizontally as well as vertically should generalize in a way that predicts a connected shape, based upon a limited number of points. This would add something missing from behavior when using only one 2D map, which causes it to have to bash into the wall everywhere before seeing itself fully enclosed.

One question would be (without adding code to instruct to do so) whether after bashing into the invisible walls enough times the virtual critter predicts the wall locations it didn’t bash into yet, and will (when not overly hungry) test its predictions/guesses/hypotheses by slowing down to pleasant bump for touching solid object surfaces at these locations. If true then the bit for that place gets (where necessary) set to 1 in all map layers, else nothing was really there and false prediction remains 0.

Jeff recently mentioned how he thinks this older part of the brain pertains to the later added neocortex:

If cortical columns repeat the same overall methodology in miniature then HTM spatial pooling can be expected to in some way work for both.

I wrote more here in regards to modeling having become easier, and new Torch code example to help get things started:

I’m hoping what I described makes better sense to you at the HTM coding level. Grid module signals became good clues for an underlying memory organization, where a machine learning approach may better demonstrate fundamental basics of how it works.

What is now most needed is the horizontal interconnection geometry of “grid” cell sized modules each 1.4 to 1.8 or so different in size from the next. Bitking?


Hey @marty1885,

I'm very interested in your findings; in fact, this is related to some of my hypotheses about the SP. I'm quite busy with my day job, but I'd be happy to play with your framework and the stacked SPs.

One of my hypotheses about the SP's capabilities is that it can be used to search for potential encoders for a particular dataset, akin to a CNN's ability to learn image filters/kernels for feature extraction. This could hopefully replace hand-coded encoders. What you have shown here is, I believe, concrete evidence of this capability. I tried to experiment with this encoder idea but didn't make any progress. I hope to gain more understanding of your experiment and draw more conclusions.


@Jose_Cueto Glad you found my experiment useful! The framework should be runnable on Windows, Mac, and Linux. But my experiment does rely on ROOT for plotting, which currently supports only Linux and OS X (Windows support is in alpha). You might want to replace it with something else if you are on Windows.


@marty1885 I inhabit the *nix world. Anyway, thanks for the response. Can you please link me to any instructions for getting started with your framework? Is it possible to freeze an SP and make a copy of it?

You can find the source code of the framework here.

To build it, you’ll need:

  1. C++17 capable compiler
  2. Intel TBB
  3. Catch2 (for tests, you don’t need it if you are not building the tests)
  4. OpenCL headers for GPU support (only if you enable GPU)

For building instructions, see https://github.com/etaler/Etaler#building-from-source

And you’ll also need ROOT for plotting.

I had never thought of this… There's no proper way to copy an SP right now, but you can work around it. I'll add the feature soon.

  1. Serialize to disk and load it back:
save(sp.states(), "sp.cereal");
  2. Transfer the SP from its backend to itself:
SpatialPooler sp_backup = sp.to(sp.connections_.backend());

Unfortunately, copying sp.states() directly won't work for architectural reasons (without some updates, at least).


@Jose_Cueto I've just pushed the copy feature to master. Please pull again and the function should be there. Now you can do

SpatialPooler sp2 = sp.copy();

The feature is implemented using the send-to-the-same-backend trick above, but I figure it is a proper way to do so.


Hey thanks, I’m going to have a look at this this weekend.


@marty1885 In your gist, what is the value of “y” here? Is it the activated columns, or something else? Could you please elaborate in HTM terms? I believe y is the output of the SP, but what is it exactly?

The y variable is the output of the SP, as active columns (or active cells, since the SP has one cell per column).