How SMI might inform Temporal Pooling

Now that you are moving full speed ahead with sensorimotor learning, where does temporal pooling stand?

So far there are lots of resources and documentation on spatial pooling and temporal sequence learning. There is even lots of info on sensorimotor learning! But the concept of temporal pooling does not seem to be very solidified. It was never covered in HTM School, and the only discussions I’ve seen on the forum here are about different strategies to try.

You have a TemporalMemory in NuPIC, but I’m pretty sure that’s just sequence learning. What’s the status of true Temporal Pooling? Is it on hold, or will it be part of the sensorimotor learning research thrust? Did it fall through the cracks?


It’s because not a lot of people understand this very well at this point.

There is no focus on temporal pooling research, but in the SMI* model, the way objects are identified in the output layer feels very similar to temporal pooling in the sequence memory model. It makes sense that similar processes occur in different layers (because of similar cellular structures), which hints that temporal pooling might work in a similar way (some type of union? just spitballing).

* SMI: sensorimotor integration (am I allowed to invent jargon?)

By ‘Temporal Pooling’, are we referring to the process of recognizing what sequence we’re in, as opposed to predicting the next input as in Temporal Memory? I remember being told that this was now being referred to as ‘Union Pooling’; is that referring to something different?


@mrcslws Correct me if I’m wrong here, but union pooling is happening in the output layer of the SMI model to identify objects based on feature/location pairs. I would not call it temporal pooling. It might hint at ways we might perform temporal pooling, but that is not what current research is focused on.

Ok, I gotcha. So if Temporal Pooling is in effect, the pooling layer (above the TM layer) should be changing more slowly than the TM layer, right? Because while the TM layer is seeing A, B, C, D, …, X, Y, Z, the pooling layer is just seeing ‘alphabet’ the whole time. Is this broadly/conceptually right?

In the current implementation of hierarchy using the Network API, Temporal Pooling is not being applied, right? Although the lower-level TM is being affected by the apical dendrites coming from the upper-level TM, that upper level isn’t changing any more slowly than the lower one, right? Thanks again @rhyolight

More like the pooling layer is seeing every combination of every letter in the alphabet that has ever led to the current stream of input letters from any point in the past.

So let me be clear: there is no current implementation of hierarchy at Numenta, not in NuPIC or in the SMI code. The Network API is the framework upon which we might build the hierarchy. We don’t have any code in NuPIC’s OPF that structures a hierarchical network. I’m also not aware of any networks that link TMs together in the way you are referring to. Maybe you are referring to the SMI model, but saying TM instead of “input layer”?

This is confusing because terminology changes sometimes, but I’ll talk to Jeff and clarify it before I start making SMI videos.

So I just want to make sure I understand things clearly. What was the resolution of TP? What was the conclusion?

I remember there was a lot of talk about this, youtube videos and such, but there was no resultant documentation or papers. What did you learn? Why are you no longer researching it?

It just kind of fell off the radar. I’m sure you had this discussion internally, but we internet enthusiasts are not privy to it.

One of the reasons I’m asking, besides actually wanting my own TP implementation, is that I gave a 3-part talk series at my company presenting HTM theory, starting from the HTM neuron, SDRs, encoders, spatial poolers, and temporal sequence learning. I went into the advanced concepts of Sensorimotor Learning and Temporal Pooling. I found I had a lot to talk about in Sensorimotor Learning, but hardly any material for Temporal Pooling.

The only thing I could do was explain what it was and what the end result would be, but I had nothing else to talk about after that. It seems like such a critical component yet there is hardly any material on it.

As far as I know it has never been properly implemented. We know it must occur in the brain, but we don’t know exactly how. You might ask this question on #htm-theory and get a better answer there.

I was referring to the hierarchy as implemented in this demo:

I think I misspoke in saying that TMs were linked together; rather, an SP from level 2 links to the TM from level 1.

I hardly know anything about the SMI stuff, so I’ve gotta catch up there, though from what you say it seems that you guys are able to focus on it without an immediate need for Temporal Pooling, correct? This seems a bit counterintuitive to me, but I really don’t know enough to say.

We are writing software. I don’t think RNI did software.

That’s correct. Our solutions for object recognition in the output layer of SMI might inform us in the area of temporal pooling.

[quote=“rhyolight, post:12, topic:2490”]More like the pooling layer is seeing every combination of every letter in the alphabet that has ever led to the current stream of input letters from any point in the past.[/quote]

This may be becoming a topic of its own, but I just wanted to check whether my understanding is correct, since I didn’t quite understand the point @rhyolight is making. From my understanding, what is represented in a temporal pooling layer is not just the collection of inputs, but also the collection of their contexts. The TP layer should have a different representation of “ABCDEFG” than of “GFEDCBA”, even though the collection of letters is the same, because in the TM layer the cells representing each of the letters were different due to the two different temporal contexts.

Based on my limited understanding, I would venture to guess that they will turn out to be extremely similar, if not identical. In both cases, the pooling/object layer is receiving distal input from other cells in the same layer, receiving proximal input from the lower layer, and providing a biasing feedback to the lower layer. The main difference, I think, is in the lower layer, not the pooling/object layer: specifically, where the lower layer gets its distal input from to generate context (other cells in the same layer in the case of TM, or cells representing allocentric location in the case of SMI). Perhaps I am looking at it naively, but I can’t imagine the pooling/object layer really needs to know where the context is coming from (sequence or location). Its job is simply to form a union of input + context representations.

I think about it like this (I may be wrong).

Say you have a spatial input “D” represented as a set of active columns in the SP, and you want to find out which memorized sequences “D” is known to have been a part of. This is hard, because you can only identify cells that become predictive because of the spatial input “D”. As the sequence progresses (assuming we can label a sequence as it plays out), we should be able to narrow down the set of potential sequences that match the pattern. But I’ve never tried this, and I don’t know how it works. @Paul_Lamb I know you have done a lot of independent research on this topic, so you might be more informed than I am.


Ah, yes, I see your point. An easy way to imagine what you are describing: say you have trained two sequences A, B’, C’, D’, E’, F’, G’ and G, F’’, E’’, D’’, C’’, B’’, A’’, and you have done a reset (so there are no active or predictive cells in either the TM layer or the TP layer). Then an input D comes in unexpectedly. The TM layer columns for D burst, with cells for E’ and C’’ in predictive state. In this scenario, your thought is that a representation containing a union of both possible sequences would activate in the TP layer. A subsequent input of E would then narrow down the active cells in the TP layer to represent just the first sequence.
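To make the narrowing concrete, here is a minimal Python sketch that tracks candidate (sequence, position) pairs symbolically. It is a toy stand-in for TM predictive cells, not how NuPIC’s TemporalMemory works internally; the names `SEQUENCES`, `start`, `advance`, and `pooled` are hypothetical.

```python
# Hypothetical toy: the two trained sequences from the example, stored as labeled strings.
SEQUENCES = {"forward": "ABCDEFG", "backward": "GFEDCBA"}


def start(symbol):
    """After a reset, an unexpected symbol matches every (sequence, position) containing it."""
    return {
        (name, i)
        for name, seq in SEQUENCES.items()
        for i, s in enumerate(seq)
        if s == symbol
    }


def advance(candidates, symbol):
    """Keep only candidates whose next element is the new symbol (a confirmed prediction)."""
    survivors = set()
    for name, i in candidates:
        seq = SEQUENCES[name]
        if i + 1 < len(seq) and seq[i + 1] == symbol:
            survivors.add((name, i + 1))
    return survivors


def pooled(candidates):
    """The pooling-layer view: which sequences are still possible."""
    return {name for name, _ in candidates}


candidates = start("D")
print(sorted(pooled(candidates)))  # ['backward', 'forward']
candidates = advance(candidates, "E")
print(sorted(pooled(candidates)))  # ['forward']
```

The pooled set plays the role of the TP representation: a union right after the unexpected D, and a single sequence once E confirms the forward prediction.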

Admittedly, I haven’t actually explored resets with pooling yet (the application I am working toward will not use a reset function and I haven’t had a need for one yet). So I have only explored scenarios where the TP/object layer always has active cells (unless of course you just started up the application and nothing has been learned yet). My main focus has been on switching from one sequence/object to another one, and how the TP/object layer transitions between different representations (and how two sequences/objects can become merged into a single representation).

From my understanding, the TP/object layer should be more stable than the TM/SMI layer. In other words, a single unexpected input should not significantly change the active cells in the TP/object layer. In the above scenario, say I input A, B, C, E, F, G (i.e. I accidentally skipped D). When I get to E and the columns burst, I don’t want the TP layer to immediately switch to a representation of every sequence that contains E (in fact, I don’t even want the columns for “E” to burst at all in this case – the biasing signal from the TP layer should already have cells for E’ predictive). A single error in the input stream should have minimal effect on the active cells in the TP/object layer, and it should recover quickly. If, on the other hand, I input “A, B, C, E, D, C, B, A” (i.e. I’ve switched to the second sequence), the further into the second sequence I get, the more the TP layer will shift its representation to the one for the second sequence. How quickly this transition happens depends on the configuration parameters.
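One way to sketch this stability (a hypothetical stand-in, not the htm.js algorithm) is leaky evidence accumulation: each known sequence gains support whenever an observed transition belongs to it, and old support fades by a `decay` factor. The names `TRANSITIONS` and `track` and the decay rule are assumptions for illustration.

```python
# Hypothetical stand-in, not the htm.js algorithm: each known sequence accumulates
# leaky evidence whenever an observed transition (previous -> current symbol) belongs to it.
SEQUENCES = {"forward": "ABCDEFG", "backward": "GFEDCBA"}
TRANSITIONS = {name: set(zip(seq, seq[1:])) for name, seq in SEQUENCES.items()}


def track(stream, decay=0.9):
    """Return the pooled label (sequence with the most evidence) after each input."""
    evidence = {name: 0.0 for name in SEQUENCES}
    labels = []
    prev = None
    for symbol in stream:
        for name in evidence:
            hit = prev is not None and (prev, symbol) in TRANSITIONS[name]
            # Old evidence fades slowly; a matching transition adds fresh support.
            evidence[name] = evidence[name] * decay + (1.0 if hit else 0.0)
        labels.append(max(evidence, key=evidence.get))
        prev = symbol
    return labels


print(track("ABCEFG"))    # skipped D: label stays on the forward sequence
print(track("ABCEDCBA"))  # switched sequences: label moves to backward
```

A larger `decay` makes the pooled label stickier; a smaller one lets it switch sooner. With `decay=0.9`, skipping D leaves the label on the forward sequence, while ‘A, B, C, E, D, C, B, A’ flips it only after two backward transitions (E to D, then D to C) have been seen.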

Obviously, I am describing my own implementation of a TP/object layer, so I could be completely off base :slight_smile: But hopefully you can see what I meant by there being no difference in implementation between a TP layer and an object layer. Its purpose, as I see it, is simply to form a stable representation of the sequence/object, and use that representation to bias the predictions (and ultimately the activations) in the TM/SMI layer.

I just split this off of htm-hackers-hangout-july-7-2017. @jhawkins I wanted you to be aware of this discussion.


@Paul_Lamb Sounds interesting! Do you have any working examples you can share?

Thanks Matt. The following might help.

Numenta did not implement a Temporal Pooling (TP) layer for sequence memory (TM). We talked about it a lot, but didn’t do it. However, for the past year we have been working on sensory-motor inference (SMI), and as part of that effort we have implemented a TP layer. The temporal pooling layer for sequence memory and for sensory-motor inference should ideally be identical, so the TP we did for SMI should work for TM.

The TP layer forms an SDR that is unique for the object being sensed. The TP SDR is also stable over changing input as long as the underlying object being sensed is the same. To achieve this, the cells in the TP layer have to learn to recognize every pattern in the SMI/TM layer. Bear in mind that the SDRs in the TM/SMI layer are unique to the object. If the TM is following a melody, that is, it is correctly predicting the next input, then each TM SDR can uniquely identify the melody. The TP SDR will be unique to the melody, not a union. If an input is ambiguous or unexpected, then it will cause a union of SDRs in the TM layer, which will cause a union of SDRs in the TP layer.
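A toy illustration of the unique-versus-union behavior, under loudly simplifying assumptions: each TM column is a letter, each learned context gets its own cells, and the pooling layer simply activates the object SDR of any learned TM pattern that is fully present in its input. All names (`tm_sdr`, `bursting_sdr`, `OBJECT_SDR`, `pool`) are hypothetical; this is not Numenta’s TP algorithm.

```python
# Toy model (hypothetical names): one TM column per letter, separate cells per learned context.
SEQUENCES = {"forward": "ABCDEFG", "backward": "GFEDCBA"}
CELLS_PER_CONTEXT = 2


def tm_sdr(letter, context):
    """Cells for `letter` when it is correctly predicted in the given sequence context."""
    return {f"{letter}:{context}:{k}" for k in range(CELLS_PER_CONTEXT)}


def bursting_sdr(letter):
    """An unexpected letter activates every cell in its column: all contexts at once."""
    return set().union(*(tm_sdr(letter, ctx) for ctx in SEQUENCES))


# Hypothetical pooling-layer object SDRs, one per sequence.
OBJECT_SDR = {"forward": {"P0", "P1", "P2"}, "backward": {"P3", "P4", "P5"}}

# "Learning": every context-specific TM SDR is associated with its sequence's object SDR.
LEARNED = [
    (frozenset(tm_sdr(letter, name)), OBJECT_SDR[name])
    for name, seq in SEQUENCES.items()
    for letter in seq
]


def pool(active_tm_cells, threshold=1.0):
    """Activate the object SDR of every learned TM pattern sufficiently present in the input."""
    out = set()
    for pattern, obj in LEARNED:
        if len(pattern & active_tm_cells) / len(pattern) >= threshold:
            out |= obj
    return out


print(sorted(pool(tm_sdr("D", "forward"))))  # ['P0', 'P1', 'P2']
print(sorted(pool(bursting_sdr("D"))))       # ['P0', 'P1', 'P2', 'P3', 'P4', 'P5']
```

A predicted D (one context) maps to one object SDR, while a bursting D (every context) maps to the union of both object SDRs.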

The implementation of TP we did for SMI has two more features. First, the cells in an SDR for an object form connections to each other. This means that once the TP layer has locked onto an object SDR, it will continue to be biased toward that interpretation. Second, we model multiple columns, and the TP layers in adjacent columns reinforce each other. This way, if column 1 can’t tell whether the object is A or B, and column 2 can’t tell whether it is B or C, together they settle on B.
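The column example can be sketched with plain vote counting. This is a simplification assumed for illustration: in the model described above, the reinforcement comes from lateral connections between the TP layers of adjacent columns, not from an explicit counter. `settle` is a hypothetical name.

```python
from collections import Counter


def settle(column_candidates):
    """Simplified stand-in for lateral reinforcement between columns: keep the
    objects supported by the largest number of columns.
    Assumes at least one column with at least one candidate."""
    support = Counter()
    for candidates in column_candidates:
        support.update(set(candidates))
    best = max(support.values())
    return {obj for obj, votes in support.items() if votes == best}


print(settle([{"A", "B"}, {"B", "C"}]))  # {'B'}
```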

Our manuscript describing SMI, including the TP algorithm, is very, very close to finished. We hope to post it to bioRxiv within a week. In the manuscript, we show that the TP layer in a single column can reliably recognize hundreds of objects.

Using the TP layer with TM introduces one little twist. The TM sequence memory can learn extremely long sequences. That’s not true for SMI with objects. There are only so many locations on an object that can be sensed. It is likely in the brain that when you learn melodies, for example, not all notes in the melody are equally learned by the TP layer. The beginning of the melody and the refrains would be more likely to be learned. I would expect that it is easier to name the tune at the beginning and during refrains (they repeat more)…


@jhawkins Thanks for weighing in! This was very helpful in clarifying the current state of TP. I look forward to the manuscript.

Sure, it will be up on GitHub (this implementation is in JavaScript). I started some commits for TP, which you can see in the Temporal Pooling demo (which I’m currently hosting). That demo is incomplete: the TP layer isn’t feeding back to the TM layer yet, and the default parameters are pretty bad. I unfortunately coded myself into a bit of a corner during refactoring of the library, when I attached the process steps at the layer level. This makes bidirectional interactions between two layers problematic. I can go into this issue in more detail if you are interested, but the point is that I am refactoring this again to allow better coordination between layers during the process.

As for the parameters, you can try and tinker around with them to improve the behavior. You’ll find that there is a bit of a balancing act between:

  1. How quickly a representation gets locked in
  2. How stable the representation is
  3. How quickly it transitions when the sequence changes

These are also affected by the inputs themselves. I’ll tweak the default parameters to work better for the demo once I finish refactoring. In the meantime, you can at least get a sense for how representations are formed by inputting a simple sequence like “CDCDCDCDCDCD…” until the TP layer stabilizes, then switching to a different simple sequence like “FGFGFGFGFGFG…”

I’ll also be pushing up an object recognition demo once the refactoring is done (as I mentioned, there is no difference in the pooling layer between sequence memory and object recognition). This one will demonstrate what @jhawkins described, where cells in the pooling layers of adjacent cortical columns form distal connections with each other (the so-called “long distance” connections of the two-layer SMI circuit).

Good point – I definitely used the word “union” too loosely above, since it has a specific meaning in relation to SDRs. In my TP implementation, I take a union of active cells in the TM layer over the past few time steps to use as the input to a modified spatial pooler, selecting a unique combination of columns to activate in the TP layer. The slight modification from the normal SP process is the addition of a column persistence property which factors into which columns are activated (to increase stability of the representations).
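A minimal Python sketch of the approach described here, assuming fixed random potential pools, no synaptic learning, and an exponentially decaying persistence bonus standing in for the “column persistence” property. `ToyUnionPooler`, its parameters, and the decay rule are hypothetical, not the actual htm.js code.

```python
import random
from collections import deque


class ToyUnionPooler:
    """Hypothetical sketch: union of recent TM activity -> SP-style k-winners with persistence."""

    def __init__(self, num_inputs, num_columns=64, k=8, window=3,
                 persistence_boost=0.5, decay=0.5, seed=0):
        rng = random.Random(seed)
        # Fixed random potential pool per pooling column (no synaptic learning here).
        self.pools = [set(rng.sample(range(num_inputs), num_inputs // 4))
                      for _ in range(num_columns)]
        self.k = k
        self.history = deque(maxlen=window)  # active TM cells for the last `window` steps
        self.persistence = [0.0] * num_columns
        self.persistence_boost = persistence_boost
        self.decay = decay

    def compute(self, active_tm_cells):
        self.history.append(set(active_tm_cells))
        pooled_input = set().union(*self.history)  # union over recent time steps
        scores = [len(pool & pooled_input) + self.persistence_boost * self.persistence[c]
                  for c, pool in enumerate(self.pools)]
        ranked = sorted(range(len(scores)), key=lambda c: scores[c], reverse=True)
        winners = set(ranked[:self.k])
        # Recently active columns get a decaying bonus, which stabilizes the representation.
        for c in range(len(self.persistence)):
            self.persistence[c] = self.persistence[c] * self.decay + (1.0 if c in winners else 0.0)
        return winners


# Hypothetical TM activity: 10 distinct cells per letter.
cells = {letter: set(range(i * 10, i * 10 + 10)) for i, letter in enumerate("CDFG")}
pooler = ToyUnionPooler(num_inputs=40)
cd_reps = [pooler.compute(cells[ch]) for ch in "CD" * 6]
fg_reps = [pooler.compute(cells[ch]) for ch in "FG" * 6]
print(cd_reps[-1] == cd_reps[-2])  # compare consecutive representations while "CDCD..." repeats
print(fg_reps[-1] == cd_reps[-1])  # compare the "FGFG..." representation with the "CDCD..." one
```

Feeding ‘CDCD…’ and then ‘FGFG…’ mirrors the demo suggestion earlier in the thread. In this sketch, `window`, `persistence_boost`, and `decay` trade off how quickly a representation locks in, how stable it stays, and how quickly it moves on when the sequence changes.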

I’m really looking forward to the manuscript being posted as well. I expect the algorithm to be a lot better than the one I am using in htm.js :smile: