Temporal Pooling Strategies

I have been experimenting with several different strategies for temporal pooling, and thought I would get some input from others who might have done their own implementations as well (or at least theorized on the process). To start the conversation, I’ll describe a couple of the implementations that I have come up with so far that seem to work, and list advantages and disadvantages I’ve encountered.

Strategy #1: Modify a spatial pooler so that it sub-samples its input from a union of cells in the lower layer which have been active at any point over a configurable number of time steps. Behavior of the columns and cells of the pooling layer is otherwise identical to typical HTM (learning, bursting, etc). Additionally, cells in the lower layer grow apical connections with cells in the pooling layer which are active over multiple time steps.
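
For concreteness, here is a minimal sketch of the union-window part of this strategy, assuming SDRs are represented as Python sets of cell indices. The `UnionWindow` name and its methods are purely illustrative (not NuPIC or any existing HTM codebase), and learning, bursting, and apical connections are omitted.

```python
from collections import deque

class UnionWindow:
    """Accumulates a union of the lower layer's active cells over the last
    `window` time steps; this union (rather than the instantaneous activity)
    is what the pooling layer's spatial pooler samples from.
    Illustrative sketch only."""

    def __init__(self, window=5):
        self.history = deque(maxlen=window)  # one set of active cells per step

    def compute(self, active_cells):
        """active_cells: set of lower-layer cell indices active this step."""
        self.history.append(set(active_cells))
        union = set()
        for step in self.history:
            union |= step
        return union

# Usage: feed the union, not the raw activity, to the pooling layer's SP.
window = UnionWindow(window=3)
for active in [{1, 5, 9}, {2, 5, 7}, {3, 8, 11}]:
    sp_input = window.compute(active)
    # pooling_sp.compute(sp_input, learn=True)  # hypothetical SP call
```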

For sequence memory, this means multiple elements in a sequence get combined into a single representation in the pooling layer. When enough elements in that particular section of the sequence are re-encountered later, the cells representing that section of the sequence become active. The other cells in the lower layer which represent that section of the sequence become predictive (due to apical connections). This generates predictions for elements of the sequence multiple time steps into the future.

For object recognition, this means that feature/location pairs get combined into a single representation in the pooling layer. When enough feature/location inputs are re-encountered later, the cells representing the object become active. The other cells in the lower layer representing the object become predictive (due to apical connections).

This strategy seems to work best for object recognition. It also seems to be more resistant to switches between sequences or objects than the other strategies I have tried.

One disadvantage for sequence memory with long sequences is that you end up with multiple different representations in the pooling layer (versus a single stable representation). For object recognition, this issue seems to resolve itself over time as the feature/location pairs are encountered in different orders.

Strategy #2: Cells in the pooling layer grow proximal connections with active cells in the lower layer at the current time step. Once activated, they remain active over a configurable number of time steps. While cells in the pooling layer are active, cells in the lower layer grow apical connections with them over multiple time steps. This creates a sort of trailing effect, where the full representation in the pooling layer changes more slowly (the longest-active cells deactivate as newer cells activate, maintaining a fixed sparsity).
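
A rough sketch of that trailing behavior, under the assumption that activity is tracked as a per-cell countdown. `TrailingPooler` is a made-up name, and the sketch omits proximal learning and apical feedback entirely.

```python
class TrailingPooler:
    """Sketch of Strategy #2: a cell stays active for `persistence` steps
    after being driven by proximal input; when the active set exceeds the
    sparsity budget, the cells closest to expiring are dropped first.
    Illustrative only -- no learning, no real proximal segments."""

    def __init__(self, persistence=5, max_active=40):
        self.persistence = persistence
        self.max_active = max_active
        self.remaining = {}  # cell index -> remaining steps of activity

    def compute(self, newly_active):
        # age out currently active cells by one step
        self.remaining = {c: r - 1 for c, r in self.remaining.items() if r > 1}
        # cells driven by current proximal input get a fresh counter
        for c in newly_active:
            self.remaining[c] = self.persistence
        # enforce a roughly fixed sparsity by dropping the longest-active cells
        if len(self.remaining) > self.max_active:
            keep = sorted(self.remaining, key=self.remaining.get,
                          reverse=True)[:self.max_active]
            self.remaining = {c: self.remaining[c] for c in keep}
        return set(self.remaining)
```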

The advantage of this strategy is that it is able to more quickly begin predicting multiple time steps into the future in the lower layer (because some of the cells in the pooling layer representing the sequence or object activate right away, versus trailing behind a few time steps).

The disadvantage of this strategy is that it isn’t able to take advantage of the learning process that comes with a spatial pooler. I haven’t gotten around to implementing learning for it, as the first strategy seems to be a better approach. Without learning, it also has the obvious problem of not specializing to particular inputs.

Strategy #3: Cells in the pooling layer grow proximal connections with active cells in the lower layer at the current time step. When activated, cells in the pooling layer also grow proximal connections with a percentage of active cells in the lower layer at the previous time step. Cells in the pooling layer do not remain active over multiple time steps if they only receive proximal input from a single time step; instead, they remain active because they receive proximal input across multiple consecutive time steps. The more frequently a context is encountered, the longer the cells in the pooling layer remain active.
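
To make the mechanism concrete, here is a toy sketch of the idea for a single pooling cell, where the proximal "segment" is just a set of lower-layer cell indices. The class name, threshold, and sampling rule are all assumptions for illustration; there is no permanence learning or column competition.

```python
import random

class ForwardChainingCell:
    """Sketch of Strategy #3 for one pooling cell: it is driven by current
    lower-layer activity, and whenever it is active it also grows proximal
    connections to a sample of the *previous* step's active cells. With
    repetition it therefore matches more consecutive steps of a familiar
    sequence and stays active longer. Illustrative only."""

    def __init__(self, initial_input, threshold=3, sample_frac=0.3, seed=42):
        self.connected = set(initial_input)  # lower-layer cells it samples
        self.threshold = threshold           # proximal overlap needed to activate
        self.sample_frac = sample_frac       # fraction of prev-step cells to add
        self.rng = random.Random(seed)

    def compute(self, active_now, active_prev):
        """Returns True if the cell is active this time step."""
        active = len(self.connected & set(active_now)) >= self.threshold
        if active and active_prev:
            # learn: connect to a random sample of the previous step's cells
            k = max(1, int(self.sample_frac * len(active_prev)))
            self.connected |= set(self.rng.sample(sorted(active_prev), k))
        return active
```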

The advantage of this strategy when used in sequence memory is that it can be used to generate an active representation of elements of the sequence multiple time steps into the future. The other strategies do this too, but with this strategy it is able to look further and further into the future the more often a particular sequence is encountered, with no theoretical limit. Other strategies are limited in how far forward they can predict by the configuration. Additionally the representation is a forward-looking representation (versus a representation that includes elements of the sequence that have already happened). This is a useful property for RL in particular.

The disadvantage of this strategy is that it does not maintain a fixed sparsity. The more often a sequence is encountered, the more dense the representation becomes. In practice, this would probably require a system of sub-sampling to maintain a fixed sparsity (which shouldn’t have too negative an impact, given the properties of SDRs).
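
As a rough illustration of that sub-sampling idea (purely an assumption about how it might be done), a fixed number of active cells could simply be drawn at random from the overly dense representation; SDR overlap properties mean the subset should still match the full representation well.

```python
import random

def subsample_to_sparsity(active_cells, num_cells, target_sparsity=0.02, seed=0):
    """Randomly sub-sample an overly dense pooled representation down to a
    fixed number of active cells. Illustrative sketch only."""
    budget = int(num_cells * target_sparsity)
    cells = sorted(active_cells)
    if len(cells) <= budget:
        return set(cells)
    return set(random.Random(seed).sample(cells, budget))

# e.g. a 4096-cell pooling layer held at 2% sparsity (about 81 active cells)
sparse = subsample_to_sparsity(set(range(500)), num_cells=4096)
```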

2 Likes

Having a cap on the length of the learnable sequence might not be a bad thing, since long sequences probably need multiple levels of abstraction anyway. An extremely long sequence isn’t best in most situations. This doesn’t mean you need a hierarchy; a mix of durations in the same pooler could work.

To create a fixed sparsity, you could send the pooling layer’s output to a spatial pooler, like the union pooler on GitHub does (or did).

Maybe a mix of those strategies would be most flexible. For example, you could randomly assign each cell a strategy. That way, at least some cells will (for example) learn quickly in each situation, creating a recognizable pattern for downstream cells, while unfit cells have little impact because they aren’t part of that pattern.
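
As a toy illustration (the strategy labels and durations are made up), the assignment could be as simple as:

```python
import random

# Illustrative: give each pooling cell its own randomly chosen pooling
# behaviour and duration, so a single layer contains a mix of strategies.
rng = random.Random(7)
num_cells = 4096
strategies = ("union_window", "trailing", "forward_chaining")  # labels only
cell_strategy = {c: rng.choice(strategies) for c in range(num_cells)}
cell_duration = {c: rng.choice((3, 5, 10, 20)) for c in range(num_cells)}
```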

1 Like

There is another facet of temporal pooling that I thought I would get some ideas on. Say I go with Strategy #1 above, where I use active cells over N time steps from a typical sequence memory layer as the input to a spatial pooler controlling active columns in a second typical sequence memory layer.

Should there be a parity of time steps (where one time step in the first layer equals one time step in the second layer), or should the second layer work on a slower timescale (incrementing its time step only once every N time steps of the first layer)?
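
For illustration, here is roughly how I think of the two options in code, assuming the lower layer’s per-step activity arrives as a list of cell-index sets and `pooling_step` is a placeholder callable standing in for the pooling layer’s compute (both of these are just assumptions for the sketch, not anyone’s actual API).

```python
def pool_with_parity(lower_activity, pooling_step):
    """One pooling-layer time step per lower-layer time step."""
    return [pooling_step(active) for active in lower_activity]

def pool_every_n(lower_activity, pooling_step, n=3):
    """Pooling layer advances once every n lower-layer steps, operating on
    the union of lower-layer activity accumulated since its last step."""
    outputs, union = [], set()
    for t, active in enumerate(lower_activity, start=1):
        union |= set(active)
        if t % n == 0:
            outputs.append(pooling_step(union))
            union = set()
    return outputs

# Toy usage with a stand-in pooling step that just reports its input size:
activity = [{1, 2}, {2, 3}, {4, 5}, {5, 6}, {7, 8}, {8, 9}]
print(pool_with_parity(activity, len))   # 6 pooling steps
print(pool_every_n(activity, len, n=3))  # 2 pooling steps
```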

I have implemented it both ways, and the second seems to be somewhat more stable in initial tests, but I’m just curious how others may have implemented time relationships in temporal pooling.

I’ve always been working with parity of timesteps, because the neural plausibility of slower timesteps at higher levels is questionable at best.

1 Like

Good point. I think I need to explore the case of a single repeating input some more, since this has a big impact on the pooling layer when using a parity of timesteps. I had thought of incrementing the time step (in temporal memory) only when spatial pooling causes the active columns to change, but that is probably even less plausible in a biological system. Of course the whole concept of discrete time isn’t all that plausible in the first place :wink:

True! Caveats about spike time with respect to the phase of an underlying oscillation notwithstanding. :slight_smile:

@Paul_Lamb A bit late to the discussion, but what have you gone with, and have you tried accumulating and decaying overlaps? For example, every column’s overlap decays with a 0.9 multiplier each time step while new overlaps are accumulated on top. This allows the pooling layer to learn multiple activations from the lower layer, because columns stay active longer than their inputs. This activation time becomes even longer as those columns adapt to the sequence in the lower layer. Accumulating also seems like a biologically plausible solution.
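
For reference, here is a minimal sketch of the accumulate-and-decay idea as I mean it (the column count, decay value, and random stand-in for the real SP overlaps are all just placeholders for illustration).

```python
import numpy as np

def accumulate_overlaps(accumulated, current, decay=0.9):
    """Decay the previously accumulated column overlaps and add the current
    step's overlaps, so pooling columns integrate evidence across time and
    stay active longer than their instantaneous inputs."""
    return accumulated * decay + current

# Toy usage: 2048 pooling columns, random per-step overlaps, top-40 winners.
rng = np.random.default_rng(0)
num_columns, k = 2048, 40
accumulated = np.zeros(num_columns)
for _ in range(10):
    step_overlaps = rng.integers(0, 5, size=num_columns)  # stand-in for SP overlaps
    accumulated = accumulate_overlaps(accumulated, step_overlaps)
    active_columns = np.argsort(accumulated)[-k:]          # winning columns this step
```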

3 Likes