From my perspective, sequences are the same thing as objects (the only difference is where the distal signal is coming from).
I do not think that TP is part of TM. In my current understanding, these two processes must run in different populations of cells, because they require a temporal differential. This separation also matches TBT as it currently stands.
In any case, there is a lot of evidence that the brain does this sort of chunking, and HTM is ultimately intended to faithfully model biology. And really, just from observing how I replay music in my head, I know that I construct the “object” of a song out of components (especially sections that repeat themselves: I don’t think of them as different, but as semantically identical apart from their position).
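To make the chunking idea a bit more concrete, here is a rough Python sketch. It is only a toy illustration, not anything taken from HTM or an existing TP implementation: the note names, the `chunk_song` helper, and the choice to identify a section purely by its contents are all assumptions made up for the example. The point is just that a repeated section gets the same identity every time, and only its position differs.

```python
def chunk_song(sections):
    """Encode each section as (chunk_id, position).

    Sections with identical contents share a chunk_id, so a repeat is
    "semantically identical" and differs only in where it occurs.
    """
    chunk_ids = {}  # section contents -> chunk id (hypothetical scheme)
    encoded = []
    for position, notes in enumerate(sections):
        key = tuple(notes)
        if key not in chunk_ids:
            chunk_ids[key] = len(chunk_ids)
        encoded.append((chunk_ids[key], position))
    return encoded


# Made-up song structure, purely for illustration.
verse = ["C", "E", "G", "E"]
chorus = ["F", "A", "C", "A"]
bridge = ["D", "F", "A", "F"]
song = [verse, chorus, verse, chorus, bridge, chorus]

encoded = chunk_song(song)
print(encoded)
# [(0, 0), (1, 1), (0, 2), (1, 3), (2, 4), (1, 5)]
```

All three chorus occurrences come out as chunk 1, which is roughly how the song "object" feels to me when I replay it.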
This is very different from the way the TM algorithm alone currently functions, where for a given sequence, each iteration through its sub-sequences involves a completely different (semantically dissimilar) set of representations with virtually no overlap.
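A quick way to see the lack of overlap is a toy model like the one below. This is a sketch under stated assumptions, not the real TM algorithm: actual TM selects cells through learned distal segments, whereas here the cell in each column is simply picked by a random generator seeded with the context. The layer size, cells per column, and sparsity are also assumed values chosen for illustration.

```python
import random

NUM_COLUMNS = 2048      # assumed layer width
CELLS_PER_COLUMN = 32   # assumed cells per minicolumn
ACTIVE_COLUMNS = 40     # assumed number of active columns (~2% sparsity)


def active_cells(columns, context_seed):
    """Pick one cell per active column, driven by the context seed.

    Stand-in for TM's context-dependent cell selection (an assumption,
    not how TM actually chooses cells).
    """
    rng = random.Random(context_seed)
    return {(col, rng.randrange(CELLS_PER_COLUMN)) for col in columns}


# The same input (same active columns) seen in two different contexts,
# e.g. the first and second pass through a repeated sub-sequence.
column_rng = random.Random(0)
columns = sorted(column_rng.sample(range(NUM_COLUMNS), ACTIVE_COLUMNS))

first_pass = active_cells(columns, context_seed=1)
second_pass = active_cells(columns, context_seed=2)

column_overlap = len({c for c, _ in first_pass} & {c for c, _ in second_pass})
cell_overlap = len(first_pass & second_pass)

# Column overlap is 40 by construction; cell overlap is expected to be
# small (around 40 / 32 on average in this toy model).
print(column_overlap, cell_overlap)
```

At the column level the two passes are identical, but at the cell level they share almost nothing, so nothing downstream can tell that the two passes were "the same section".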
BTW, I posted on this thread a while back about how I see this sort of thing working in a hierarchy, where abstractions work their way down the hierarchy the more frequently their components are encountered. I still see this as one of the requirements for a “good” TP algorithm.