I think these two are connected. We cannot assume the system knows a sequence without first having formed the necessary connections. What I mean is that the way a sequence is packed together holds the answer to how to unpack it. The agent I worked on never reliably encountered a sequence A->B->C->D without first learning to do C->D and then B->C->D (expanding backward from the reward). Random actions would not produce predictable sequences for the HTM to capture, so naturally the higher level did not start from a stable A->B->C->D. The system needs to learn a single transition, perform it reliably, and expand from there. This goes for both top-down and lateral learning.
In this context, the first element of the sequence is the point at which the higher level's apical depolarization can influence the lower level. From the perspective of the lower layer, this can even be mid-sequence.
I believe it is better to brainstorm starting from an empty layer rather than from known sequences, because the folding shapes how the unfolding happens.
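To make the backward expansion from the reward concrete, here is a minimal toy sketch. It is plain Python, not HTM and not the actual agent; the environment, the `CORRECT` action table, and the names `step` and `train` are all assumptions made up for illustration. The agent starts with an empty policy, acts randomly, and only treats a transition as reliable once it lands on the reward or on a state that already has a reliable path to the reward.

```python
import random

CHAIN = ["A", "B", "C", "D"]          # D is the rewarded end state
CORRECT = {"A": 1, "B": 0, "C": 1}    # hypothetical "right" action per state

def step(state, action):
    """Toy environment: the right action advances, anything else resets to A."""
    if action == CORRECT[state]:
        return CHAIN[CHAIN.index(state) + 1]
    return "A"

def train(steps=2000, seed=0):
    rng = random.Random(seed)
    policy = {}         # state -> action known to lead reliably to the reward
    learned_order = []  # order in which transitions became reliable
    state = "A"
    for _ in range(steps):
        # Use the learned action if there is one, otherwise act randomly.
        action = policy.get(state, rng.choice([0, 1]))
        nxt = step(state, action)
        # Credit a transition only if it lands on the reward or on a state
        # that already has a reliable path to the reward.
        if state not in policy and (nxt == "D" or nxt in policy):
            policy[state] = action
            learned_order.append(state)
        # Start a new episode after the reward, otherwise keep going.
        state = "A" if nxt == "D" else nxt
    return policy, learned_order

policy, learned_order = train()
print(learned_order)
```

In this toy setup the only possible learning order is C first, then B, then A: nothing can be credited at B until C->D is reliable, and nothing at A until B->C->D is. That is the sense in which the way the sequence gets packed determines how it can be unpacked.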
With the approach above in mind, I imagine the higher level does not capture discrete subsequences. First it captures a few transitions here and there, then it expands these to cover larger portions of behavior. The actual transitions of the higher level would be driven by the patterns that the lower level encounters (assuming it is the highest level). I picture the higher level as a very elderly person whose memories of experiences are fuzzy. They come and go. A memory comes in the first place because she encounters something she can associate it with. Then things make sense… at least for some time. Then it is a blur again.
That came out more vague than I hoped.
Edit:
This sounds a lot like the Go/No-Go circuitry of the basal ganglia.