Spatial Hierarchy: The visual cortex is a hierarchy that represents small features at the bottom (edges) and gradually builds up larger features to the top (whole objects).
Temporal Hierarchy: The auditory cortex represents small features at the bottom (tones) and gradually builds up larger features to the top (words, sentences).
By this logic, spatial pooling pools spatial features while temporal pooling pools temporal features. The receptive fields could be represented like this:
The spatial receptive field pools features at one moment in time, while the temporal receptive field pools features over a duration of time. So there could be a temporal field at the bottom of the hierarchy that spans only milliseconds (and covers, say, 64 cells) and a field at the top that spans seconds (again covering 64 cells). This means that a temporal pooling cell at the bottom only remains active for milliseconds (since it responds to small features), while cells at the top remain active for seconds (since they respond to larger features). I guess this is what Jeff meant when he said ‘representation becomes more stable the higher up the hierarchy’?
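To make the idea concrete, here is a rough Python sketch of the fixed-length scheme I have in mind. It is only a toy illustration of my description above, not HTM's actual temporal pooling algorithm. The names `pool_over_time` and `active_duration_ms`, the window size of 4, and the 1 ms base time step are all assumptions I made up for the example.

```python
def pool_over_time(inputs, window):
    """Group consecutive inputs into non-overlapping, fixed-length windows.

    Each window stands in for one hypothetical temporal pooling cell that
    stays active while its `window` inputs play out. Leftover inputs that
    do not fill a whole window are dropped.
    """
    usable = len(inputs) - len(inputs) % window
    return [tuple(inputs[i:i + window]) for i in range(0, usable, window)]


def active_duration_ms(level, window, base_step_ms=1):
    """Time covered by one pooled cell at `level` (level 0 = raw input).

    Assumes every level pools the same number of inputs from the level below.
    """
    return base_step_ms * window ** level


tones = list(range(16))              # 16 raw inputs, one per time step
level1 = pool_over_time(tones, 4)    # 4 features, each spanning 4 time steps
level2 = pool_over_time(level1, 4)   # 1 feature spanning all 16 time steps
```

With the same window size at every level, the time covered by a cell grows geometrically going up the hierarchy. Under these assumptions, a window of 64 and a 1 ms step give 64 ms at the first level and 4096 ms (about 4 seconds) at the second, which is the "milliseconds at the bottom, seconds at the top" picture I described.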
My question is: does HTM temporal pooling do this (or something similar)? Or does each region learn an arbitrary-length sequence (as opposed to a fixed-length sequence, as I’ve described above)?