Reconciling multiple simultaneous predictions with temporal pooling

As far as I understand, the following is true:

i) Temporal pooling is envisioned as the representation of a sequence by a pattern of activity that remains relatively stable while that sequence is predictable.

ii) When the data contains branching sequences, multiple simultaneous predictions are made at every branch point.

My question is about how temporal pooling should operate during sequences that contain branches. The pooled representation of two sequences AB and AC should presumably be the same during the subsequence A and should change at the branch point depending on whether B or C is encountered. But if both sequences have already been learned, then both continuations are predicted at the branch point, so the branch never registers as a surprise, and the pooling cells downstream cannot "know" from the upstream activity alone that a branch has occurred.
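To make the tension concrete, here is a tiny toy sketch (just an illustration I made up, not any actual HTM code): a hypothetical temporal memory that has learned both AB and AC, driving a pooler that only changes its representation on surprise.

```python
# A toy illustration of the tension (not NuPIC code, just made-up Python):
# the "temporal memory" below has learned both AB and AC, so after A it
# predicts B and C simultaneously, and a naive "stay stable while predicted"
# pooling rule never updates at the branch.

predictions = {"A": {"B", "C"}, "B": set(), "C": set()}  # hypothetical learned transitions

def pooled_run(sequence):
    """Pooled representation at each step under a naive stability rule:
    only a surprising input replaces the current pooled representation."""
    predicted = set()
    pooled = None
    trace = []
    for symbol in sequence:
        if symbol not in predicted:
            pooled = symbol                     # surprise: form a new representation
        # predicted input: leave the pooled representation unchanged
        predicted = predictions.get(symbol, set())
        trace.append(pooled)
    return trace

print(pooled_run("AB"))  # ['A', 'A']
print(pooled_run("AC"))  # ['A', 'A'] -- identical to AB: the branch never shows up downstream
```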

How can multiple simultaneous predictions be reconciled with the idea that a sequence representation should be stable while it is well-predicted?


To maybe jumpstart discussion a bit, here are some disorganized thoughts on this. In all of this I’m using letters ABC etc to represent entire predictable sub-sequences, rather than individual timestep inputs as is otherwise common in this community.

In an idealized situation where the temporal memory is completely pre-trained on the data, such that all possible transitions are learned, degenerate situations can arise when the pooled representations are then learned. For example:

Since the temporal memory can completely predict every possible transition, a single pooled representation is learned for the entire dataset. This seems undesirable.

If instead the temporal memory is trained alongside the temporal pooling population, you can get degenerate situations as well. For example:

If the network sees the meta-sequence AB AB CB CB CB CB, then during each B of the CB sections the representation might be the same as the AB representation, because after only seeing AB repeated several times the network has no way of knowing that it is actually composed of two individually predictable sub-sequences. So you end up in a weird situation where, each time B occurs later on, you get a representation that includes A, even though A hasn't been seen in a very long time.

I’m not sure if that made any sense, but it seems to me like a potential problem. Maybe it’s solved by competitive self-organization analogous to the spatial pooler, but I haven’t verified this by experiment.

This could also possibly be solved by a finite limit (whether soft or hard) to the length of a sequence that can be pooled over, but I still see problems when pooling across the boundary between A and {B or C}.
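For what it's worth, a hard version of that cap could look something like this toy extension of the sketch above (MAX_POOL_LEN and the reset rule are made up for illustration; a soft limit could decay instead):

```python
# Toy sketch of a hard limit on pooled-chunk length (my own illustration,
# not an existing implementation).

MAX_POOL_LEN = 3   # hypothetical maximum number of steps one pooled chunk may span

def pooled_run_capped(sequence, predictions):
    predicted, pooled, age, trace = set(), None, 0, []
    for symbol in sequence:
        if symbol in predicted and age < MAX_POOL_LEN:
            age += 1                       # still inside the current pooled chunk
        else:
            pooled, age = symbol, 1        # surprise, or cap reached: start a new chunk
        predicted = predictions.get(symbol, set())
        trace.append(pooled)
    return trace
```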

If anyone has done this or has been thinking through it, I’d love to hear about it.

Hi @jakebruce, I was a silent reader back when you were contributing, but it's nice to have you here again :slight_smile:

If the pooling is implemented in such a way that columns actually "pool" their recent overlaps like a reservoir, the unions of AB and CB would be represented by different columns. The activation of the pooling layer does not change instantly at every iteration, so the overlap causing any activation at B is only part of the total overlap at that time. As a result, the two sequences would have different columns active at the point of B, because the input at that time also includes the previous step, reservoir-style, not only B.

At least that is my understanding of how Nupic implementation handles that.

Edit: To clarify, some columns should have more overlap because of the pooled C input, and those should inhibit the columns that adapted to the AB sequence. Even if the input A is still part of the pooled input at that point, it should have decayed much more, so C would carry a higher weight because of recency. Still, this is just a description of an ideally functioning pooler.

From a different perspective, pooling just collapses temporal information, so maybe that is how it should work. If the pooler has only learnt AB so far, it seems natural that any sequence involving B is partially represented by the columns of AB. I would not see that as a problem as long as it settles into different columns for CB and AB in the long run, and the reservoir-like implementation of the union pooler tells me that it should.
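To show roughly what I mean by a reservoir, here is a toy sketch (sizes, decay rate, and the random SDRs are made up; this is not the actual Nupic code): the input that the pooling columns compete over is a recency-weighted union of recent activity, so at the point of B the pooled input for AB and for CB is genuinely different.

```python
import numpy as np

N_INPUT = 100
DECAY = 0.5                                  # hypothetical per-step decay of pooled input

def sdr(seed, n_active=10):
    """Random sparse vector standing in for the activity of one sub-sequence."""
    v = np.zeros(N_INPUT)
    v[np.random.default_rng(seed).choice(N_INPUT, n_active, replace=False)] = 1.0
    return v

def pooled_input(inputs):
    """Recency-weighted union ('reservoir') of the inputs seen so far."""
    pool = np.zeros(N_INPUT)
    for x in inputs:
        pool = DECAY * pool + x              # older contributions fade, recent ones dominate
    return pool

A, B, C = sdr(1), sdr(2), sdr(3)
ab, cb = pooled_input([A, B]), pooled_input([C, B])
# The decayed trace of A vs C is still mixed into the pooled input at B, so the
# two pooled inputs differ and different columns can win the competition.
print(np.count_nonzero(ab != cb))
```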

Hi @sunguralikaan, thanks for the welcome.

I think this will become clearer once the community settles on a mechanism.

As it stands, the mechanism of "well-predicted cells activate slow currents in their postsynaptic targets" does exhibit the behavior of pooling across predictable boundaries even when those boundaries can branch. Basically, what's to stop AB from having the same representation as AC, since the A representation will persist through both AB and AC due to the slow persistent currents?
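Here is a toy sketch of what I mean (my own illustration with made-up sizes and decay, not any published implementation): if well-predicted inputs inject a slowly decaying current into their pooling-layer targets, the cells recruited during A tend to stay above threshold through either continuation.

```python
import numpy as np

N_IN, N_POOL = 50, 20
SLOW_DECAY = 0.9                             # hypothetical slow per-step decay
rng = np.random.default_rng(0)
W = rng.random((N_POOL, N_IN)) < 0.2         # random feedforward connectivity

def sdr(seed, n_active=8):
    """Random sparse vector standing in for the TM activity of one sub-sequence."""
    v = np.zeros(N_IN)
    v[np.random.default_rng(seed).choice(N_IN, n_active, replace=False)] = 1.0
    return v

def run(inputs):
    """Pooling cells left above half-maximum current after the sequence."""
    current = np.zeros(N_POOL)
    for x in inputs:
        current = SLOW_DECAY * current + (W @ x)   # slow current accumulates
    return set(np.flatnonzero(current >= 0.5 * current.max()))

A, B, C = sdr(1), sdr(2), sdr(3)
ab, ac = run([A, A, A, B]), run([A, A, A, C])
# Because A's current persists through both runs, the two pooled cell sets
# tend to overlap heavily -- exactly the AB-vs-AC concern above.
print(len(ab & ac), "shared of", len(ab | ac))
```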

Since I’ve been away for a while, I’m not sure what you mean by a “reservoir.” Is this a new concept in temporal pooling, and is it documented somewhere?

1- This current decays over time, so recency is a factor here. Any particular activation is biased toward recent inputs.
2- To my understanding it should not be about how many branches there are but how many frequent/significant ones there are. If there were a few ACs and lots of ABs, I do not see the same columns representing both as a problem in the short term. On the contrary, maybe it should start out like that.

What stops the same columns from representing A and B and C in the default Spatial Pooler? They get represented differently because there are mechanisms creating competition among columns, such as synapse adaptation, boosting, and bumping synapse permanences based on activation frequency.

The most recent temporal pooling implementation published on Nupic is an extension of the Spatial Pooler, so there is competition among columns just as in the SP. If AB and AC appear distinctly enough, some columns will specialize on the total overlap of A+B and some will specialize on A+C.
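As a rough sketch of the kind of competition I mean (a simplified k-winners-take-all with permanence adaptation; the published pooler adds boosting and duty-cycle bumping on top of this, and all numbers here are made up):

```python
import numpy as np

# Simplified sketch of competition over pooled unions: the k columns with the
# most connected-synapse overlap win, and winners adapt their permanences
# toward the input. Not the published code; boosting etc. are omitted.

N_IN, N_COL, K = 100, 32, 4
CONNECTED, INC, DEC = 0.5, 0.05, 0.02
rng = np.random.default_rng(0)
perm = rng.random((N_COL, N_IN))             # synapse permanences per column

def winners(pooled):
    overlap = (perm >= CONNECTED) @ pooled   # overlap through connected synapses
    return np.argsort(overlap)[-K:]          # k-winners-take-all

def learn(pooled):
    for c in winners(pooled):
        perm[c] += np.where(pooled > 0, INC, -DEC)   # Hebbian-style adaptation
    np.clip(perm, 0.0, 1.0, out=perm)

def sdr(seed, n_active=10):
    v = np.zeros(N_IN)
    v[np.random.default_rng(seed).choice(N_IN, n_active, replace=False)] = 1.0
    return v

A, B, C = sdr(1), sdr(2), sdr(3)
AB, AC = np.clip(A + B, 0, 1), np.clip(A + C, 0, 1)   # pooled unions of sub-sequences

for _ in range(50):                          # alternate the two pooled unions
    learn(AB)
    learn(AC)

print(sorted(winners(AB)), sorted(winners(AC)))   # inspect which columns specialize on each
```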

Pseudo-Pseudo Code
Published Implementation

"Reservoir" is just a term I use, not an official analogy, but there were topics discussing temporal pooling implementations on the forums, in case you missed them: 1, 2

Maybe these would help clear things up a bit if I am not making sense.

Thanks @sunguralikaan, those are useful resources.

I’m not convinced that competition is a good solution to this problem, although I can see how it would work given the right implementation and the right assumptions about the data (minimal noise, for example).

In my work I have "solved" this problem with a hack that constrains a layer not to learn transitions that go predicted->surprise->predicted. This way only predictable chunks are stably encoded, but it means that multiple layers are strictly necessary for encoding meta-sequences. One could justify this by appealing to an acetylcholine-gated learning process, since ACh is often implicated in novelty detection, but it does have the unfortunate consequence of being unable to learn multiple branches within a single layer. However, I'm not yet convinced that multiple prediction branches are in fact required, given that we have a hierarchy to work with.
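In spirit, the gate boils down to a predicate over three consecutive timesteps, something like the following simplified illustration (not the actual learning code):

```python
# Simplified illustration of the rule (the real gate operates on synaptic
# learning in the layer; this just states the constraint): learning is blocked
# around a surprise that interrupts an otherwise predictable run, so chunk
# boundaries are never bridged within a single layer.

def block_learning(predicted_before: bool, surprise_now: bool, predicted_after: bool) -> bool:
    """True when learning should be suppressed: predicted -> surprise -> predicted."""
    return predicted_before and surprise_now and predicted_after

# A lone surprise splitting two predictable chunks is never learned across:
assert block_learning(True, True, True)
# Early on, when nothing is predicted yet, learning proceeds normally:
assert not block_learning(False, True, False)
```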

More investigation will certainly be in order, and it’s definitely something I’m interested in exploring.


Sometimes my posts sound like I am convinced of my own answer, but that is generally not the case :slight_smile: I am just trying to come up with an explanation.

On the other hand, this solution reminded me of another write-up about temporal pooling. Maybe it could spark some ideas here and there. The paragraph below in particular is relevant to your approach.

Relevant paragraph from the write-up:

> Any change that a layer cannot predict will be passed on as a change in the next layer. Any change that a layer can predict will be pooled and not result in a change in the next layer. Put another way, any change that cannot be predicted will continue up the hierarchy, first layer 4 to layer 3, then region to region. If you were looking at a walking dog, layer 4 would provide an input to layer 3 that after temporal pooling would be partially stable (the image of the dog) and partly not stable because the dog is moving. Layer 3 would try to learn the pure high-order sequence of walking. All these examples require a hierarchy, but applied to simple problems you might not need a hierarchy.
