Can you stack multiple spatial pooler regions on top of each other?

Hello, forum!

I’m trying to stack multiple spatial pooler regions on top of each other in a network. They are preceded by a sensor and followed by a temporal memory region. My question really is the following: how would the linking between each pooler change? This is what I have for the sensor-to-pooler links:

```python
network.link("sensor", "spatialPoolerRegion", "UniformLink", "")
network.link("sensor", "spatialPoolerRegion", "UniformLink", "",
             srcOutput="resetOut", destInput="resetIn")
network.link("spatialPoolerRegion", "sensor", "UniformLink", "",
             srcOutput="spatialTopDownOut", destInput="spatialTopDownIn")
network.link("spatialPoolerRegion", "sensor", "UniformLink", "",
             srcOutput="temporalTopDownOut", destInput="temporalTopDownIn")
```

Can I add another region and call it “spatialPoolerRegion2”, or is that unacceptable?
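For reference, a second pooler region should only need a unique name. Here is a minimal sketch of the sensor → SP1 → SP2 → TM wiring with the Network API; the region names, region types, and parameter dicts are illustrative (exact type strings such as `py.TPRegion` vary between nupic versions), so treat this as a starting point rather than a known-good configuration:

```python
def build_stacked_sp_network(sensor_params, sp_params, tm_params):
    """Sketch: two SPRegions in series between a sensor and a
    temporal memory region. Assumes nupic is installed; names and
    parameters are illustrative, not a tested configuration."""
    import json
    from nupic.engine import Network

    network = Network()
    network.addRegion("sensor", "py.RecordSensor", json.dumps(sensor_params))
    network.addRegion("spatialPoolerRegion", "py.SPRegion", json.dumps(sp_params))
    # A second pooler region just needs a distinct name.
    network.addRegion("spatialPoolerRegion2", "py.SPRegion", json.dumps(sp_params))
    # Region type for temporal memory may be py.TPRegion or py.TMRegion
    # depending on the nupic version.
    network.addRegion("temporalMemoryRegion", "py.TPRegion", json.dumps(tm_params))

    # Sensor feeds the first pooler, as in the original setup.
    network.link("sensor", "spatialPoolerRegion", "UniformLink", "")
    network.link("sensor", "spatialPoolerRegion", "UniformLink", "",
                 srcOutput="resetOut", destInput="resetIn")
    # The first pooler's column activity becomes the second pooler's input.
    network.link("spatialPoolerRegion", "spatialPoolerRegion2", "UniformLink", "",
                 srcOutput="bottomUpOut", destInput="bottomUpIn")
    # The second pooler feeds the temporal memory region.
    network.link("spatialPoolerRegion2", "temporalMemoryRegion", "UniformLink", "",
                 srcOutput="bottomUpOut", destInput="bottomUpIn")
    return network
```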



I am trying to get 5 different scalar inputs into the HTM and hopefully predict 3 of them. Each of the scalar inputs changes over the same time scale. Should each scalar encoder be paired with its own temporal memory, as in the file?


Hi @eW3bb,

Is this question related to this discussion? Does the Geospatial Coordinate Encoder support prediction?

If so, I wouldn’t mind starting with a summary of the problem to help anchor the discussion. Would it be correct to say that you have an object moving through 3D space in some sort of pattern that would allow prediction, and you want to predict the next location?

Have you tried simply feeding each scalar stream into its own Encoder–>SP–>TM model? As the simplest approach I think this is worth starting with, and will always let you predict each value individually. You can also feed the 5 streams into a single model, though I don’t think you can predict each of them individually, only detect anomalies on the total stream as a whole.

I’m not suggesting that hierarchical stacking wouldn’t work, though I think this approach (how well and why it works) is a lot more uncharted.

Okay, here’s what I want to do:

  • I have an object moving over time, going to one of a few specified locations. I know its position at every time step and I want to predict either the final location or the next step in the path.

The current goal is to pass in the position and receive the final location as an output. Is this possible with a single HTM (with stacked scalar encoders)? Do I need to encode the final locations of the training set into the network? Then, would I stack them as such:

  • Scalar/Scalar/Scalar/Temporal



Huge disclaimer: I’m very far from the most experienced person on this forum; if anyone else chimes in, I’ll likely defer to them :grinning: Also, my approach is to build up the answer from basics. This is for my benefit as much as anything, so don’t think I’m being condescending!

It’s an interesting problem. @cogmission’s comment from the other topic is worth keeping in mind:

Encoders transform input data into a format which encodes the semantics of the problem domain

(I’d merge these topics but don’t have the permissions).

Definitely read/watch these if you haven’t already:

So keeping in mind that semantically similar input should result in a higher overlap, you have two different measurements at play (location and a movement vector).

In isolation, the concepts are straightforward:

  • The closer two objects are in 3D space, the more overlap should result because we’d assume close proximity implies semantic similarity
  • The closer the two movement vectors match, the more overlap should result because we’d assume that similar direction and speed implies semantic similarity
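To make the overlap idea concrete, here’s a toy 1D sketch, a much simplified stand-in for nupic’s ScalarEncoder (the bit widths and ranges are made up for illustration): closer values share more active bits, and sufficiently distant values share none.

```python
def encode_scalar(value, min_val=0.0, max_val=100.0, n=64, w=9):
    """Toy scalar encoder: a contiguous block of w active bits out of n,
    whose position tracks the value. Illustrative only."""
    span = n - w
    start = int(round((value - min_val) / (max_val - min_val) * span))
    return set(range(start, start + w))

def overlap(a, b):
    """Number of active bits two encodings share."""
    return len(a & b)

# Nearby values share most of their active bits...
print(overlap(encode_scalar(50), encode_scalar(52)))   # large overlap
# ...while distant values share none.
print(overlap(encode_scalar(50), encode_scalar(90)))   # no overlap
```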

As others have noted, there’s an implementation here, although it’s been flagged as incomplete and in need of some work.

But if you don’t use this approach, as @sheiser1 said you’ll have a bunch of predictions in isolation. In your scenario, to me this pushes the overall “semantic similarity” problem downstream. Let me butcher the tutorial image from “Building HTM Systems” to try to illustrate:

You now know on each individual axis how close the objects are, but we’re still just as far from semantic similarity in the 3D space (e.g. similar position on the x axis does not imply similar overall location in 3D).
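Here’s a quick sketch of that caveat, using the same kind of toy per-axis encoder as above (parameters made up for illustration): if you simply concatenate independent x/y/z encodings, a point that matches on one axis still gets a sizeable overlap even when it’s far away in 3D space.

```python
def encode_axis(value, offset, n=32, w=5, max_val=10.0):
    """Toy per-axis encoder: w contiguous active bits in a dedicated
    n-bit range starting at `offset`. Illustrative only."""
    start = offset + int(round(value / max_val * (n - w)))
    return set(range(start, start + w))

def encode_point(x, y, z):
    """Concatenate independent encodings of each coordinate."""
    return encode_axis(x, 0) | encode_axis(y, 32) | encode_axis(z, 64)

a = encode_point(1, 1, 1)
b = encode_point(1, 9, 9)   # same x, but far away in 3D
c = encode_point(2, 2, 2)   # genuinely nearby in 3D

# b still overlaps a on the whole x range despite the 3D distance;
# the per-axis encoding can't distinguish that from real proximity.
print(len(a & b), len(a & c))
```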

The problem of encoding the movement vector is similar and done in parallel, then I imagine stacking the spatial poolers would work to predict the next step.

As for predicting the final location, I guess this would be a subsequent post-HTM classifier step, predicting a categorical value that you’d need to provide alongside the other data. Of course the usual machine learning caveats apply: the sequence data would have to have predictive power over the final location, and wherever it’s ambiguous in the real world, the prediction won’t be accurate.
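As a toy stand-in for that downstream step (in nupic itself you’d feed HTM output into something like the SDRClassifier), here’s the simplest possible shape of the idea: extrapolate the current heading one step and pick the nearest of the known endpoints. The function and data are entirely hypothetical:

```python
def predict_final_location(path, known_endpoints):
    """Toy stand-in for a downstream classifier: extrapolate the last
    movement step one step ahead and pick the nearest known endpoint.
    path: list of (x, y) positions; known_endpoints: candidate goals."""
    (x0, y0), (x1, y1) = path[-2], path[-1]
    # Project the current heading one step forward.
    ahead = (x1 + (x1 - x0), y1 + (y1 - y0))

    def dist2(p, q):
        return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2

    return min(known_endpoints, key=lambda e: dist2(e, ahead))

endpoints = [(0, 0), (10, 10)]
path = [(5, 5), (6, 6), (7, 7)]   # heading toward (10, 10)
print(predict_final_location(path, endpoints))
```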


Well said @jimmyw.

I want to make sure I follow. By ‘stacking’, do you mean sending SP1’s activeColumns to SP2, then sending SP2’s output into TM.compute for the prediction? I’m really curious about both your and @eW3bb’s intuition for this stacking.

Your task sounds similar to mine @eW3bb. I have joystick-generated x,y,z movement and acceleration values, made by people on a repeated task. The goal is to classify new unlabeled play-streams to known subjects. It is simpler though as I’m doing just anomaly detection w/out prediction.

Now I’m simply using the multi-encoder to feed all 4 scalars into one SP->TM model for each subject. When a new unidentified stream flows in, I run all saved models on it and rank them by total anomaly score. The one with the lowest score is the best match.

This seems pretty crude, and I wonder what you (and anyone else) think.
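The ranking step of that procedure can be sketched in a few lines; this assumes you’ve already run each saved model over the unlabeled stream and collected its per-timestep anomaly scores (the function and subject names are illustrative):

```python
def best_matching_subject(anomaly_scores_by_model):
    """Rank saved per-subject models on a new stream by mean anomaly
    score; the model that found the stream least anomalous is the
    best match. Input: dict of subject name -> list of per-timestep
    anomaly scores that model produced on the unlabeled stream."""
    ranked = sorted(anomaly_scores_by_model.items(),
                    key=lambda kv: sum(kv[1]) / len(kv[1]))
    best_subject = ranked[0][0]
    return best_subject, ranked

# Hypothetical scores from two saved models on one unlabeled stream.
scores = {
    "subjectA": [0.1, 0.2, 0.1],   # low anomaly: familiar pattern
    "subjectB": [0.7, 0.8, 0.9],   # high anomaly: unfamiliar pattern
}
best, ranking = best_matching_subject(scores)
print(best)
```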


Thanks @sheiser1, it sounds like you’ve explored this area quite a bit already.

My intuition was that it makes sense to divide the encoding into two independent parts, because putting location and direction on the same SDR would break the principle of maintaining overlap for semantic similarity. I’ll show my thinking a bit more, hopefully without labouring the point too much.

Here you have the location as a blue dot, the scalar encodings as the thick blue lines, and the dashed sphere is roughly what the CoordinateEncoder produces, with a configurable radius (flattened to 2D in the picture, of course).

Now introducing direction vectors:

In our problem domain, the light green arrow makes for a very different overall semantic meaning to the dark green arrow, to the extent that the similarity we had by having similar locations is completely eroded.

For example in the below picture:

  • the blue location is semantically different to the orange location, but
  • the combined (blue location + light green vector) is semantically similar to the (orange location + purple vector) because they are heading toward the same location
  • the combined (blue location + light green vector) is semantically completely different to the (blue location + dark green vector) because they are heading in opposite directions

Therefore I figured that what we’d want to avoid is confusing the spatial pooler by having them in the same input space.

So back to @eW3bb’s initial question, which of these should it look like? :

I think for accuracy it needs to be the right hand side, but I’m not experienced enough with nupic yet to know whether or not this makes architectural sense.


Upon reflection, I think in my last post I’ve put too much focus on the movement stream which could be redundant anyway. The history of movement is really included in the location history, and knowing the current movement really just means you know the very next location value.
