Correct me if I’m wrong: there was an old temporal pooler that is now called temporal memory, so I may actually be talking about something else. Here goes:

As far as I understand, if we take two regions in the hierarchy, the bottom region receives a sequence of inputs. As long as the sequence is correctly predicted, the active cells of the bottom region are pooled over time and become the input to the higher region. In other words, if the A->B->C->D sequence is correctly predicted by the bottom region, the higher region gets a union of all active cells for A, B, C and D. When the prediction breaks (predicted D->X, but got D->Y), pooling stops. This way we learn A->B->C->D as an object that commonly occurs in the world. As far as I understand, the higher region can also bias the bottom region into perceiving A->B->C->D through top-down feedback once it has recognized the sequence. If I haven’t messed up anywhere (and please correct me if I did), my question is:
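To make the pooling I have in mind concrete, here is a toy Python sketch. It is only my mental model, not Numenta's implementation: `pool_sequence`, `transitions` and `cells_for` are names I made up, each input is mapped to a fixed, hand-picked set of cell indices, and "prediction" is just a lookup of the single expected next input.

```python
def pool_sequence(inputs, transitions, cells_for):
    """Toy union pooling over correctly predicted inputs.

    inputs      -- iterable of input symbols seen by the bottom region
    transitions -- dict mapping a symbol to the symbol the region predicts next
    cells_for   -- dict mapping a symbol to its (hypothetical) set of active cells

    Returns the list of pooled cell unions that would be handed to the
    higher region, one union per unbroken run of correct predictions.
    """
    pooled = []
    current = set()
    prev = None
    for sym in inputs:
        # The first input has nothing to be predicted from, so treat it as fine.
        predicted = prev is None or transitions.get(prev) == sym
        if not predicted and current:
            # Prediction broke: close the current pool and start a new one.
            pooled.append(frozenset(current))
            current = set()
        current |= cells_for[sym]
        prev = sym
    if current:
        pooled.append(frozenset(current))
    return pooled


# Made-up example: the region has learned A->B->C->D->X, but Y arrives after D.
transitions = {"A": "B", "B": "C", "C": "D", "D": "X"}
cells_for = {
    "A": {0, 1}, "B": {2, 3}, "C": {4, 5}, "D": {6, 7}, "Y": {8, 9},
}
unions = pool_sequence("ABCDY", transitions, cells_for)
```

In this example the higher region would see two inputs: the union of the cells for A, B, C and D, and then a new pool that starts with the cells for Y.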

What is the neuroscience behind this, considering that the bottom and top regions operate on very different time scales? Suppose the bottom region’s ability to learn A->B->C->D (the temporal memory algorithm) is explained by spike timing, and it all just works beautifully. The upper region would then have to do the same thing on a time scale that is 4 times slower in order to learn the transition from one object (A->B->C->D) to another object (which can be another sequence that gets pooled together).

If we take a hierarchy of 4 regions, the top region should operate on an even slower time scale. How does that work, if it works the way I described? Could there perhaps be a limit on how long a sequence can be when the input cells are pooled over time?
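Here is the back-of-the-envelope arithmetic behind my worry, as a small Python sketch. The big assumption (mine, purely for illustration) is that every region pools exactly 4 inputs into one output, like A->B->C->D; `steps_per_input` and `pooled_length` are made-up names.

```python
def steps_per_input(level, pooled_length=4):
    """Bottom-level time steps between new inputs at a given level.

    Level 1 is the bottom region, which gets a new input every step.
    Assumes (hypothetically) that each region pools exactly
    `pooled_length` inputs into one output for the region above.
    """
    return pooled_length ** (level - 1)


# Time scale of each region in a 4-region hierarchy.
scales = {level: steps_per_input(level) for level in range(1, 5)}
for level, steps in scales.items():
    print(f"region {level}: one new input every {steps} bottom-level steps")
```

Under that assumption the slowdown compounds with every level, which is why I am asking whether spike timing can still explain learning at the top.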