Experimenting with stacking Spatial Poolers

Nice work!

I may have done essentially the same thing by connecting self-learning memory systems in series: the data read from the first RAM addresses the next RAM space.

My best guess is that the connection represents going from brain area to brain area. Each memory area tries to make sense of the one before it.

This might be interesting to try on visual information, similar to making a V1, V2, V3, but with all doing the same thing, like now, instead of trying to start from research papers on how the human brain divides up the problem.

Any volunteers? I built Etaler for this exact purpose: doing research and handling large amounts of computation. I don’t have time to run this sort of large experiment myself, but I’ll do my best to support it, or maybe we could collaborate.


If you are going to process anything visual, I highly suggest you enable local inhibition, which literally adds a whole new dimension to the problem space. :wink: I would also be really interested in seeing how stacked 2D SPs work.
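
For readers who have not used local inhibition, here is a minimal sketch of the idea on a 2D sheet of columns. It is not Etaler's implementation; the function name `localInhibition` and its parameters are hypothetical, and real SPs handle ties and target density differently.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper, not part of Etaler: local inhibition on a 2D sheet.
// A column becomes active when fewer than `k` columns inside its
// (2*radius+1) x (2*radius+1) neighbourhood have a strictly higher overlap.
std::vector<bool> localInhibition(const std::vector<int>& overlap,
                                  std::size_t width, std::size_t height,
                                  int radius, int k)
{
    assert(overlap.size() == width * height);
    std::vector<bool> active(overlap.size(), false);
    const long w = static_cast<long>(width);
    const long h = static_cast<long>(height);
    for (long y = 0; y < h; ++y) {
        for (long x = 0; x < w; ++x) {
            const int self = overlap[static_cast<std::size_t>(y * w + x)];
            if (self <= 0) continue; // columns with no overlap never fire
            int stronger = 0;
            for (long ny = y - radius; ny <= y + radius; ++ny) {
                for (long nx = x - radius; nx <= x + radius; ++nx) {
                    if (nx < 0 || ny < 0 || nx >= w || ny >= h) continue;
                    if (overlap[static_cast<std::size_t>(ny * w + nx)] > self)
                        ++stronger;
                }
            }
            active[static_cast<std::size_t>(y * w + x)] = stronger < k;
        }
    }
    return active;
}
```

With global inhibition the competition runs over the whole sheet at once; here each column only competes with its neighbours, so the winners stay spread across the 2D input.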

Before possibly reinventing the wheel I spent time searching Google Scholar and elsewhere for helpful information, but I did not find anything worth mentioning.

After a few days of thinking, a much easier starting point turned out to be an earlier, more hippocampus-related project where I need to connect memory layers much as you described. It gets added to an already existing full-scale navigation network that vectors out paths between us and what attracts us.

All 2D layers do the same thing. The only difference is that the 2D networks are different sizes, from maybe 100 places wide down to a network 1 place/node wide. There would then be multiple interconnected levels of detail, from the whole room to its cups (with close-distance detail filled in by recall of multiple experiences of cups), all on its mind at the same time.

If this turns out to make sense in neuroscience too (any evidence either way anyone?) then that will take care of the lesser understood farthest end of the visual stream, and what to do from there, to make something come to life in some biologically plausible way. Any ideas how to wire that one up?


I have no idea about the biological side of things, nor how to wire it. If you need grid cell modules, not just encoders, Lior (who is often not on the forums) is working on one and the PR should come soon.

I’m thinking about potential ways to make it work and I’ll update when I have an idea.

It is assumed here that grid cell signals are part of self-location in each 2D map/module. Program-calculated X,Y location variables for all body parts already exist, and we can start with these exact head/body angles and coordinates for body center and mouth. Some might call that an easy way to cheat (on a most baffling part), but we can say it is a “machine learning” enabled gift resulting in self-location super-powers, instead of only mortal cellular approximations like ours.

Starting with the standard animal cognition upper motor commands of Left/Right and Forward/Reverse further simplifies, while at the same time being true to biology. Going straight to text output would make a chatbot, not something new of possible interest to neuroscience.

The smallest (1D or) 2D map grid would represent a unique room, while the most detailed map further places itself inside the boundaries of objects to navigate around or over. Sparse data would be one bit per place, for mapping surfaces of solids to touch or bump into, in the 2D maps that stack into a 3D representation. Almost everything else in each map is empty space around the object, all 0’s. At first the representation would be mapped by a 2D flatland world view of invisible shock zones and invisible wall locations. Bits for color and other properties can be added later.

Map geometry must contain hexagonal places (each centered on the intersection of surrounding 2D or 3D triangles), but it may be possible that the exact geometry of the mapped data does not matter. In that case each Y location can be shifted one place radius to the right of the previous one, or 6 cells can be used per hexagonal place/column/subpopulation/group that senses and memorizes navigational traveling waves received at its one input.

Less detailed maps would at some level fill the remaining gaps seen in the most detailed one. In this way there is already an articulation mechanism where, at the very tip, the entire arena circle can be seen as one place. Pooling horizontally as well as vertically should generalize in a way that predicts a connected shape based upon a limited number of points. This would add something missing from behavior when using only one 2D map, which causes it to have to bash into the wall everywhere before seeing itself fully enclosed.

One question would be whether (without adding code that instructs it to do so), after bashing into the invisible walls enough times, the virtual critter predicts the wall locations it has not bashed into yet, and will (when not overly hungry) test its predictions/guesses/hypotheses by slowing down to a pleasant bump for touching solid object surfaces at these locations. If true, then the bit for that place gets (where necessary) set to 1 in all map layers; otherwise nothing was really there and the false prediction remains 0.
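
The "one bit per place, set in all map layers" idea can be sketched as below. This is only an illustration under simple assumptions: square maps instead of hexagonal places, arena coordinates normalized to [0, 1), and a hypothetical `MapStack` type that is not taken from any existing code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical structure for illustration: a stack of square occupancy maps
// covering the same arena at different resolutions (e.g. 100, 10, 1 wide).
struct MapStack {
    std::vector<std::size_t> widths;              // places per side, per layer
    std::vector<std::vector<unsigned char>> bits; // one bit (stored as a byte) per place

    explicit MapStack(const std::vector<std::size_t>& layerWidths)
        : widths(layerWidths)
    {
        for (std::size_t w : widths) {
            assert(w > 0);
            bits.emplace_back(w * w); // zero-initialised: empty space is all 0's
        }
    }

    // Index of the place containing arena position (x, y), x and y in [0, 1).
    std::size_t placeIndex(std::size_t layer, double x, double y) const {
        assert(x >= 0.0 && x < 1.0 && y >= 0.0 && y < 1.0);
        const std::size_t w = widths[layer];
        const std::size_t col = static_cast<std::size_t>(x * static_cast<double>(w));
        const std::size_t row = static_cast<std::size_t>(y * static_cast<double>(w));
        return row * w + col;
    }

    // A confirmed bump sets the matching place to 1 in every layer.
    void confirmSolid(double x, double y) {
        for (std::size_t l = 0; l < widths.size(); ++l)
            bits[l][placeIndex(l, x, y)] = 1;
    }

    bool isSolid(std::size_t layer, double x, double y) const {
        return bits[layer][placeIndex(layer, x, y)] != 0;
    }
};
```

After one bump, the coarsest layer already marks its single place as containing something solid, while the finest layer only knows about the one spot that was touched. The prediction step itself (guessing untested wall locations) is not covered by this sketch.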

Jeff recently mentioned how he thinks this older part of the brain pertains to the later added neocortex:

If cortical columns repeat the same overall methodology in miniature then HTM spatial pooling can be expected to in some way work for both.

I wrote more here in regards to modeling having become easier, and a new Torch code example to help get things started:

I’m hoping what I described makes better sense to you at the HTM coding level. Grid module signals became good clues for an underlying memory organization, where a machine learning approach may better demonstrate fundamental basics of how it works.

What is now most needed is the horizontal interconnection geometry of “grid” cell sized modules, each a factor of 1.4 to 1.8 or so different in size from the next. Bitking?
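
To get a feel for how many modules such a ratio implies, here is a small sketch that lists map widths from the finest map down to a single place. The helper name `moduleWidths` and the rounding to whole places are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical helper: widths of successive maps, each `ratio` times coarser
// than the previous one, ending with a map that is a single place wide.
std::vector<std::size_t> moduleWidths(std::size_t finest, double ratio) {
    assert(finest >= 1 && ratio > 1.0);
    std::vector<std::size_t> widths;
    double w = static_cast<double>(finest);
    while (w > 1.0) {
        const std::size_t rounded = static_cast<std::size_t>(std::lround(w));
        // Skip duplicates that rounding could produce for small ratios.
        if (widths.empty() || rounded != widths.back()) widths.push_back(rounded);
        w /= ratio;
    }
    if (widths.empty() || widths.back() != 1) widths.push_back(1);
    return widths;
}
```

For example, `moduleWidths(8, 2.0)` gives the widths 8, 4, 2, 1. A ratio in the 1.4 to 1.8 range starting from 100 places wide gives on the order of ten layers.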

Hey @marty1885,

I’m very interested in your finding; in fact it is related to some of my hypotheses about the SP. I’m quite busy with my day job, but I’d be happy to play with your framework and the stacked SPs.

One of my hypotheses about the SP’s capabilities is that it can be used to search for potential encoders for a particular dataset, akin to a CNN’s ability to search for image filters/kernels for feature extraction. This could then hopefully replace the hand-coded SP encoders. What you have just shown here is, I believe, concrete proof of this capability. I tried to experiment with this encoder idea but I didn’t make any progress. I hope to gain more understanding from your experiment and draw more conclusions.

@Jose_Cueto Glad you found my experiment useful! The framework should be runnable on Windows, Mac and Linux. But my experiment relies on ROOT for plotting, which only supports Linux and OS X now (Windows is in alpha). You might want to replace it with something else if you are on Windows.

@marty1885 I live in the *nix world. Anyway, thanks for the response. Can you please link me to any instructions for getting started with your framework? Is it possible to freeze an SP and make a copy of it?

You can find the source code of the framework here.

To build it, you’ll need:

  1. C++17 capable compiler
  2. Intel TBB
  3. Catch2 (for tests, you don’t need it if you are not building the tests)
  4. OpenCL headers for GPU support (only if you enable GPU)

For building instructions, see https://github.com/etaler/Etaler#building-from-source

And you’ll also need ROOT for plotting.

I have never thought of this… There is no proper way to copy an SP right now, but you can work around it. I’ll add the feature soon.

  1. Serialize to disk and load
     save(sp.states(), "sp.cereal");
  2. Transfer the SP from the backend to itself
     SpatialPooler sp_backup = sp.to(sp.connections_.backend());

Unfortunately copying sp.states() won’t work for architecture reasons (without some updates, at least).

@Jose_Cueto I’ve just pushed the copy feature to master. Please pull again and the function should be there. Now you can do

SpatialPooler sp2 = sp.copy();

The feature is implemented using the send-to-the-same-backend hack, but I figure it is a proper way to do so.


Hey thanks, I’m going to have a look at this this weekend.

@marty1885 In your gist, what is the value of “y” here? Is it the activated columns? Or something else? Could you please elaborate in HTM terms? I believe y here is the output of the SP but what is it really?

The y variable is the output of the SP, expressed as active columns (or active cells, since an SP has one cell per column).

I’d like to disagree with this conclusion, because a well-trained SP will result in a good classifier. A good classifier can distinguish patterns from sets of inputs, hence it will also have a stable output (e.g. active columns). The output of a well-trained SP (e.g. SP1) retains only the features that matter the most, hence it will restrict SP2’s input domain; consequently SPN will restrict SPN+k’s input domain. It is similar to how function composition works, where the output of each function is a smaller set: in $f(g(h(i(j(k(\mathbb{R}))))))$, k reduces its input to the output set $k(\mathbb{R})$, and so on, leaving f with a smaller input domain. I think the experiment’s result, the opposite of what was expected, makes sense, at least to me.

Edit: Restrict here means reducing the set size.
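
Spelling out the set-size argument, under the assumption that each trained SP is frozen and therefore acts as a deterministic function on SDRs (the symbols $\mathrm{SP}_n$ and $X_n$ are notation introduced here for illustration, with $X_0$ the finite set of distinct encoder outputs fed to the stack):

```latex
X_n = \mathrm{SP}_n(X_{n-1}), \qquad
|X_n| = |\mathrm{SP}_n(X_{n-1})| \le |X_{n-1}|
\quad\Longrightarrow\quad
|X_N| \le |X_{N-1}| \le \dots \le |X_0| .
```

The inequality only says that the number of distinct representations cannot grow from layer to layer. It does not by itself say how quickly the set shrinks, or which inputs end up merged.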

Someone just linked me to your post when I asked about stacking layers of spatial poolers and temporal memories. I didn’t think to stack JUST spatial poolers, but from the results on random floats, it looks rather promising.

Now, I’m still very new to this field of research, but I have a basic understanding of the mechanics of HTM components, so I’ll give it my best guess theory as to what’s happening.

What I think is happening is that, while one layer is able to form a good, albeit “shallow”, SDR of the random values, having a whopping EIGHT SPs all work together encodes a structure (could I call it a three-dimensional structure?) that’s much, much deeper than a single layer. In my opinion, this could yield more detailed and accurate representations, and thus better predictions and outputs. What the overlap graphs tell me (and again, still new here!) is that the SP stack was able to narrow down an allocation of columns specifically for these values and, with the greater representational accuracy, is much more clearly able to “predict” the value 0.5. I don’t know if predict is the right term here, but it’s clear that the more stacked SPs there are, the less overlap there is per given value.

Of course, I do worry about the classic ML problem of overfitting data, which may or may not be happening here. What happens when you feed it other values, like 0.1 or 0.7? Do you still get a narrow square, or curve, in the overlap graphs?
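
For anyone new to the overlap graphs: the quantity being compared is the number of bits that two SDRs have active in common. A minimal sketch, with a hypothetical helper name and plain `std::vector<bool>` standing in for whatever tensor type the framework uses:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Number of positions where both SDRs have an active bit.
std::size_t overlapScore(const std::vector<bool>& a, const std::vector<bool>& b) {
    assert(a.size() == b.size());
    std::size_t count = 0;
    for (std::size_t i = 0; i < a.size(); ++i)
        if (a[i] && b[i]) ++count;
    return count;
}
```

Computing this between the SDR for a reference value (0.5, 0.1 or 0.7) and the SDR for every other input value gives one curve of such a graph; a narrower peak means fewer nearby values share columns with the reference.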

Sure! These are the plots when centering around 0.1.

And the plots when centering around 0.7.

And… GPUs are awesome. They generate these plots so fast.


I think more research is needed. :smile:

I was not expecting this result in the first place. So, well… Maybe this could be someone’s master’s thesis!

The role of the SP is to recognise static patterns in encoded input data. Input is sensor data, output is SDR.

The role of the TM is to recognise sequences of SDR patterns over time. Input is a sequence of SDRs.

Neither of these algorithms would be expected to do a good job at recognising higher order static patterns. Rather than stacking SPs, I would expect a new component, specialised for the purpose. It would take two or more SDRs as input, each derived from a different input source, and recognise patterns of input across multiple modalities. You might get that result by just concatenating SDRs and feeding them to an SP, but it probably isn’t the optimal solution.
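
The concatenation fallback mentioned above is simple enough to sketch. The helper name `concatSDR` is hypothetical, and plain `std::vector<bool>` is used in place of any framework tensor type:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Join several SDRs end to end so a single SP can see all modalities at once.
std::vector<bool> concatSDR(const std::vector<std::vector<bool>>& parts) {
    std::size_t total = 0;
    for (const auto& p : parts) total += p.size();

    std::vector<bool> joined;
    joined.reserve(total);
    for (const auto& p : parts)
        joined.insert(joined.end(), p.begin(), p.end());

    assert(joined.size() == total);
    return joined;
}
```

Each source keeps its own block of bits in the joined SDR, so the SP receiving it is free to form columns that connect to bits from more than one source. Whether that learns useful cross-modal patterns is exactly the open question.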

Image processing is a case in point. One SP might recognise features like lines or angles, another might recognise locations or displacements, both need to be combined to recognise an object. Numenta has done some work on this, but not much published that I can see.


If we think of the stacked SP in this case as an encoder that learns encodings, rather than learning for prediction/classification, then I think we can get rid of the classical overfitting problem.

Another way of looking at this is that the stacked SPs are an instance of a problem solver in a particular problem space. Now if it is an instance, then there must be other instances out there that need to be discovered; hence the meaning of “overfitting” in mainstream ML becomes irrelevant, because in theory generalization here can be done by consensus of these instances.
