Neuroscience behind Temporal Pooling

Yes, the condition for temporal pooling is predictability. As long as the inputs are correctly predicted, the pooling layer keeps the same activation state and learns to pool the changing input.
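
To make the pooling rule concrete, here is a minimal Python sketch. Its assumptions are hypothetical: the names ToyPooler and step, the layer sizes, and the idea that the lower layer hands over a ready-made "was this input predicted?" flag. It is not Numenta's implementation. The pooled state stays fixed while inputs are predicted, and each new input gets associated with that state. An unpredicted input triggers a fresh sparse state.

```python
import random


class ToyPooler:
    """Hypothetical toy pooling layer: a sparse state that persists while inputs are predicted."""

    def __init__(self, n_cells=2048, n_active=40, seed=0):
        self.rng = random.Random(seed)
        self.n_cells = n_cells
        self.n_active = n_active
        self.state = None   # current pooled representation (frozenset of active cell indices)
        self.learned = {}   # pooled state -> set of inputs it has learned to pool

    def step(self, x, predicted):
        """Process input x; `predicted` is an assumed flag supplied by the lower layer."""
        if self.state is None or not predicted:
            # Unpredicted input: activate a new random sparse representation.
            self.state = frozenset(self.rng.sample(range(self.n_cells), self.n_active))
        # While inputs stay predicted, the same state persists and learns to pool x.
        self.learned.setdefault(self.state, set()).add(x)
        return self.state


pooler = ToyPooler()
states = [pooler.step(x, p) for x, p in [("A1", False), ("A2", True), ("A3", True), ("B1", False)]]
print(states[0] == states[1] == states[2])  # True: predicted inputs share one pooled state
print(states[3] == states[0])               # False: an unpredicted input starts a new state
```

The only design choice here is that "predicted" is treated as a given. In the real circuit, that signal would itself come from learned sequence memory in the input layer.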

(It gets a bit more complicated when considering multiple columns. L2 cells connect to other L2 cells in nearby columns. We believe these inter-column connections allow columns to vote, or reach a consensus, on what object is being observed. In the multiple-column scenario, if several columns think they are observing object “A” and another nearby column isn’t sure, the columns that think it is “A” will bias the pooling layer in the unsure column. Feedback from L2 to L4 in the unsure column then biases its input to be interpreted as part of “A”. In summary, a column that can’t predict its input on its own may be able to predict it based on the beliefs of neighboring columns. For example, if you look at an object through a straw you might have to move the straw multiple times before recognizing the object, but with a full retina you can recognize it in one glance. Or, if you touch an unknown object with one finger (no vision) you probably can’t tell what it is without multiple touches, but if you grab it with multiple fingers you can often tell what it is in one touch.)
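
The voting idea can be illustrated with a deliberately simple Python sketch. Everything in it is hypothetical: the object library OBJECTS, its feature names, and the vote function. The sketch replaces lateral L2 connections with a plain count of how many columns support each object. Each column first lists the objects consistent with its own sensed feature. Then each column keeps only those candidates with the most support across all columns. This stands in for the biasing described above and does not model the actual circuitry.

```python
from collections import Counter

# Hypothetical object library: object name -> set of features it contains.
OBJECTS = {
    "cup":  {"rim", "handle", "curved_side"},
    "bowl": {"rim", "curved_side", "flat_base"},
    "mug":  {"rim", "handle", "flat_base"},
}


def candidates(feature):
    """Objects consistent with a single sensed feature (one column on its own)."""
    return {name for name, feats in OBJECTS.items() if feature in feats}


def vote(sensed_features):
    """One column per sensed feature; each column is biased toward the
    objects most supported across all columns (a crude stand-in for L2 voting)."""
    per_column = [candidates(f) for f in sensed_features]
    support = Counter(obj for cands in per_column for obj in cands)
    consensus = []
    for cands in per_column:
        if not cands:
            consensus.append(set())  # column recognizes nothing; no vote to adopt
            continue
        best = max(support[o] for o in cands)
        consensus.append({o for o in cands if support[o] == best})
    return per_column, consensus


alone, together = vote(["rim", "handle", "curved_side"])
print(sorted(alone[0]))  # ['bowl', 'cup', 'mug']
print(together)          # [{'cup'}, {'cup'}, {'cup'}]
```

Sensing "rim" alone leaves three candidates, like a single fingertip. Three columns sensing different features settle on "cup" in one step, like grasping with several fingers.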

Capacity. I was worried about the capacity of temporal pooling. For example, L2 cells have a finite number of synapses. These synapses have to recognize multiple patterns in L4, and they also have to recognize multiple patterns in L2, both in their own column and in nearby columns. This limits the number of objects that can be learned, the number of features per object, and the number of columns that can vote together. We have done some analysis and simulations of this. So far it looks like L2 can learn quite a bit under a reasonable set of assumptions. Capacity increases as L2 activation becomes sparser. Empirical observations are that L2 is indeed quite sparse, which is one of the reasons experimentalists have difficulty finding active cells in L2.
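
For a feel of why sparsity matters, here is a back-of-envelope Python estimate. Its simplifying assumptions are not taken from the analysis mentioned above. Every object is a sparse code of a fixed number of L2 cells. Each of those cells spends a fixed number of synapses per feature (feedforward from L4) plus a fixed number per voting column (lateral from L2) to learn that object. The function name, parameter names, and numbers are all hypothetical. The estimate also ignores false matches and overlap between object codes.

```python
def object_capacity(n_cells, cells_per_object, synapses_per_cell,
                    features_per_object, synapses_per_feature,
                    voting_columns, synapses_per_column):
    """Back-of-envelope count of objects an L2 layer could store (hypothetical model).

    Assumes every active cell of an object's code dedicates synapses to each of the
    object's L4 features and to each voting column's L2 code. Ignores false matches.
    """
    # Synapses one L2 cell spends to learn one object.
    cost_per_object = (features_per_object * synapses_per_feature
                       + voting_columns * synapses_per_column)
    # How many objects a single cell can take part in before its synapses run out.
    objects_per_cell = synapses_per_cell // cost_per_object
    # Total cell "slots" in the layer divided by the slots each object needs.
    return n_cells * objects_per_cell // cells_per_object


# Illustrative numbers only, not measured values.
base = dict(n_cells=4096, synapses_per_cell=10000, features_per_object=10,
            synapses_per_feature=20, voting_columns=5, synapses_per_column=20)
for k in (160, 80, 40):
    print(k, object_capacity(cells_per_object=k, **base))  # 160 844, 80 1689, 40 3379
```

Halving the number of active cells per object roughly doubles the estimate. Adding features per object or voting columns raises the per-object synapse cost and lowers it, which mirrors the three limits listed above.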

Even though temporal pooling capacity is large, it still has a limit. This is why a single region such as V1 or S1 can’t possibly recognize complete objects that span the full region. We still need a hierarchy. We don’t yet have a comprehensive theory of hierarchy; that is something I am working on.
