I have been experimenting with several different strategies for temporal pooling, and thought I would get some input from others who might have done their own implementations as well (or at least theorized on the process). To start the conversation, I’ll describe a couple of the implementations that I have come up with so far that seem to work, and list advantages and disadvantages I’ve encountered.
Strategy #1: Modify a spatial pooler so that it sub-samples its input from a union of cells in the lower layer which have been active at any point over a configurable number of time steps. The behavior of the columns and cells of the pooling layer is otherwise identical to typical HTM (learning, bursting, etc.). Additionally, cells in the lower layer grow apical connections with cells in the pooling layer which are active over multiple time steps.
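To make that concrete, here is a rough Python sketch of how I think about the union pooling step. The class and parameter names (`UnionPoolingLayer`, `window_size`, etc.) are just mine for illustration, and the learning rule is an ordinary SP-style permanence update, not anything from an existing library:

```python
import numpy as np
from collections import deque

class UnionPoolingLayer:
    """Spatial-pooler-like layer whose columns compute overlap against a union
    of lower-layer activity over the last `window_size` time steps."""

    def __init__(self, num_columns, input_size, window_size=5,
                 sparsity=0.02, potential_pct=0.5, seed=42):
        rng = np.random.default_rng(seed)
        self.window = deque(maxlen=window_size)        # recent lower-layer activity
        self.num_active = max(1, int(num_columns * sparsity))
        # Each column sub-samples a random subset of the input (its potential pool).
        self.potential = rng.random((num_columns, input_size)) < potential_pct
        self.permanences = rng.random((num_columns, input_size)) * self.potential

    def compute(self, active_cells, learn=True):
        """`active_cells` is a binary vector of lower-layer activity at time t."""
        self.window.append(active_cells.astype(bool))
        union = np.logical_or.reduce(list(self.window))     # union over the window
        connected = (self.permanences > 0.5) & self.potential
        overlaps = connected.astype(np.int32) @ union.astype(np.int32)
        winners = np.argsort(overlaps)[-self.num_active:]   # k-winners-take-all
        if learn:
            # Ordinary SP-style permanence update, but toward the union.
            delta = np.where(union, 0.03, -0.015)
            self.permanences[winners] += delta * self.potential[winners]
            np.clip(self.permanences, 0.0, 1.0, out=self.permanences)
        return winners
```

The only real difference from a normal spatial pooler is that overlap and learning are computed against the union over the window rather than the current input alone.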
For sequence memory, this means multiple elements in a sequence get combined into a single representation in the pooling layer. When enough elements in that particular section of the sequence are re-encountered later, the cells representing that section of the sequence become active. The other cells in the lower layer which represent that section of the sequence become predictive (due to apical connections). This generates predictions for elements of the sequence multiple time steps into the future.
For object recognition, this means that feature/location pairs get combined into a single representation in the pooling layer. When enough feature/location inputs are re-encountered later, the cells representing the object become active. The other cells in the lower layer representing the object become predictive (due to apical connections).
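In both cases the feedback step is the same idea: lower-layer cells with enough active apical synapses onto the currently active pooling cells become predictive. A minimal sketch (the threshold and names are made up):

```python
import numpy as np

def apical_predictions(pooling_active, apical_synapses, threshold=3):
    """Indices of lower-layer cells depolarized (made predictive) by apical input.

    pooling_active  -- binary vector over pooling-layer cells active now
    apical_synapses -- (num_lower_cells, num_pooling_cells) connected-synapse matrix
    threshold       -- active apical synapses needed to depolarize a cell
    """
    overlap = apical_synapses.astype(np.int32) @ pooling_active.astype(np.int32)
    return np.flatnonzero(overlap >= threshold)
```

In sequence memory those predictive cells cover elements several steps ahead; in object recognition they cover the object's remaining feature/location pairs.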
This strategy seems to work best for object recognition. It also seems to be more resistant to switches between sequences or objects than the other strategies I have tried.
One disadvantage for sequence memory is that long sequences end up with multiple different representations in the pooling layer (rather than a single stable one). For object recognition this seems to resolve itself over time, as the feature/location pairs are encountered in different orders.
Strategy #2: Cells in the pooling layer grow proximal connections with active cells in the lower layer at the current time step. Once activated, they remain active for a configurable number of time steps. While cells in the pooling layer are active, cells in the lower layer grow apical connections with them over multiple time steps. This creates a sort of trailing effect, where the full representation in the pooling layer changes more slowly (the longest-active cells deactivate as newer cells activate, maintaining a fixed sparsity).
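A bare-bones sketch of the trailing mechanism, ignoring proximal learning entirely since I haven't implemented it (the cell counts, persistence, and recruitment rate are arbitrary):

```python
import numpy as np

class TrailingPoolingLayer:
    """Pooling cells stay active for up to `persistence` steps; new cells are
    recruited each step and the longest-active ones drop out first, so the
    overall sparsity stays fixed and the representation changes slowly."""

    def __init__(self, num_cells, num_active=40, persistence=10):
        self.num_active = num_active
        self.persistence = persistence
        self.age = np.full(num_cells, -1)   # -1 = inactive, else steps since activation

    def compute(self, proximal_overlap, recruit_per_step=5):
        """`proximal_overlap` is the feedforward overlap score per pooling cell."""
        self.age[self.age >= 0] += 1
        self.age[self.age >= self.persistence] = -1          # time out the oldest cells
        inactive = np.flatnonzero(self.age < 0)
        # Recruit the best-overlapping currently-inactive cells for this time step.
        new = inactive[np.argsort(proximal_overlap[inactive])[-recruit_per_step:]]
        self.age[new] = 0
        # If over budget, deactivate the longest-active cells to hold sparsity fixed.
        active = np.flatnonzero(self.age >= 0)
        excess = len(active) - self.num_active
        if excess > 0:
            oldest = active[np.argsort(self.age[active])[-excess:]]
            self.age[oldest] = -1
        return np.flatnonzero(self.age >= 0)
```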
The advantage of this strategy is that it can begin predicting multiple time steps into the future in the lower layer more quickly (because some of the cells in the pooling layer representing the sequence or object activate right away, rather than trailing a few time steps behind).
The disadvantage of this strategy is that it isn’t able to take advantage of the learning process that comes with a spatial pooler. I haven’t gotten around to implementing one, since the first strategy seems to be a better approach. Without learning, it also has the obvious problem of not specializing to particular inputs.
Strategy #3: Cells in the pooling layer grow proximal connections with active cells in the lower layer at the current time step. When activated, cells in the pooling layer also grow proximal connections with a percentage of the active cells in the lower layer at the previous time step. Cells in the pooling layer do not persist on their own after a single time step of proximal input; they remain active only as long as proximal input keeps arriving on successive time steps. The more frequently a context is encountered, the longer the cells in the pooling layer remain active.
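Here is roughly how I picture that learning rule in code (again, all names and thresholds are illustrative; the key part is that active cells grow proximal synapses to both the current input and a sampled fraction of the previous one):

```python
import numpy as np

class ForwardLookingPooler:
    """Active pooling cells grow proximal synapses to the current input *and* to a
    sampled fraction of the previous time step's input. A cell stays active only
    while proximal input keeps arriving, so familiar contexts keep it active longer."""

    def __init__(self, num_cells, input_size, threshold=8, prev_sample_pct=0.3, seed=7):
        self.rng = np.random.default_rng(seed)
        # Start each cell with a small random set of connected proximal synapses.
        self.proximal = self.rng.random((num_cells, input_size)) < 0.05
        self.threshold = threshold
        self.prev_sample_pct = prev_sample_pct
        self.prev_input = np.zeros(input_size, dtype=bool)

    def compute(self, active_cells, learn=True):
        inp = active_cells.astype(bool)
        overlap = self.proximal.astype(np.int32) @ inp.astype(np.int32)
        active = np.flatnonzero(overlap >= self.threshold)
        if learn and len(active):
            # Grow synapses to the current input...
            self.proximal[np.ix_(active, np.flatnonzero(inp))] = True
            # ...and to a sampled percentage of the previous time step's input.
            prev_idx = np.flatnonzero(self.prev_input)
            if len(prev_idx):
                k = max(1, int(len(prev_idx) * self.prev_sample_pct))
                sample = self.rng.choice(prev_idx, size=k, replace=False)
                self.proximal[np.ix_(active, sample)] = True
        self.prev_input = inp
        return active
```

With repetition, a cell's proximal synapses reach further and further back in the sequence, so it activates earlier and stays active longer, which is also what produces the growing density mentioned below.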
The advantage of this strategy when used in sequence memory is that it can be used to generate an active representation of elements of the sequence multiple time steps into the future. The other strategies do this too, but this one can look further and further into the future the more often a particular sequence is encountered, with no theoretical limit; the other strategies are limited by their configuration in how far forward they can predict. Additionally, the representation is forward-looking (rather than including elements of the sequence that have already happened), which is a useful property for RL in particular.
The disadvantage of this strategy is that it does not maintain a fixed sparsity. The more often a sequence is encountered, the denser the representation becomes. In practice, this would probably require some form of sub-sampling to restore a fixed sparsity (which shouldn’t have too negative an impact, given the properties of SDRs).
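The sub-sampling itself could be as simple as randomly keeping a fixed number of the active cells (this is my own guess at a fix, not something I have tested):

```python
import numpy as np

def subsample_to_sparsity(active_cells, target_active, rng=None):
    """Randomly keep at most `target_active` of the active cell indices. SDR
    overlap degrades gracefully under sub-sampling, so the thinned representation
    should still match well against downstream apical/distal segments."""
    rng = rng if rng is not None else np.random.default_rng()
    active_cells = np.asarray(active_cells)
    if len(active_cells) <= target_active:
        return active_cells
    return rng.choice(active_cells, size=target_active, replace=False)
```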