A couple of observations which might help with understanding this view of hierarchy (or might help me learn where there are still gaps in my own understanding).
The first observation is that a single HTM region is by itself a 2-level hierarchy. The representations in the output layer are more abstract than the representations in the input layer. This is the first piece of architecture which allows the lowest level to form abstractions.
Imagine a hierarchy of three regions separated by Spatial Poolers. For the sake of simplicity, I’ll depict each region as two layers, input and output (obviously HTM theory currently has more layers involved in SMI, but these are the two that matter conceptually for my point).
You can see that the transition from the input layer to the output layer within the same region is actually the logical boundary between hierarchical levels (not the transition from the output layer of one region to the input layer of another region). The SP algorithm is designed to fix sparsity while preserving semantics, not to increase abstraction.
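As a rough illustration of the "fix sparsity, not abstraction" point, here is a minimal sketch of an SP-like step. It assumes a stripped-down model with no learning, boosting, or permanences: each column just counts how many active input bits fall in a fixed random pool, and the top `k` columns win. The names `make_columns` and `toy_spatial_pool` are hypothetical and not taken from any HTM library.

```python
import random


def make_columns(n_inputs, n_columns, pool_size, seed=0):
    """Give each column a fixed random subset of input bits (its potential pool)."""
    rng = random.Random(seed)
    return [
        frozenset(rng.sample(range(n_inputs), pool_size))
        for _ in range(n_columns)
    ]


def toy_spatial_pool(active_inputs, columns, k):
    """Return the k columns whose pools overlap the active input bits the most."""
    active = set(active_inputs)
    scores = [len(pool & active) for pool in columns]
    # Rank by descending overlap; ties are broken by column index so the
    # result is deterministic.
    ranked = sorted(range(len(columns)), key=lambda c: (-scores[c], c))
    return set(ranked[:k])


columns = make_columns(n_inputs=200, n_columns=100, pool_size=40)

sparse_input = set(range(0, 20))                 # 10% of input bits active
dense_input = set(range(0, 80))                  # 40% of input bits active
similar_input = set(range(0, 18)) | {100, 101}   # shares 18 of 20 bits with sparse_input

a = toy_spatial_pool(sparse_input, columns, k=10)
b = toy_spatial_pool(dense_input, columns, k=10)
c = toy_spatial_pool(similar_input, columns, k=10)

print(len(a), len(b))   # both are exactly k, whatever the input density
print(len(a & c))       # winning columns shared by the two similar inputs
```

The output always has `k` active columns regardless of how dense the input was, and inputs that share most of their bits will tend to share winning columns. Nothing in this step combines several inputs into a more abstract representation, which is why I place the hierarchical boundary inside the region rather than at the SP.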
The second observation is that the representations in each output layer (depending on the temporal pooling algorithm chosen) should be able to integrate more details into fewer consolidated representations the more frequently a particular pattern is encountered. This means, when a new complex concept is encountered, it might initially require hooking into more levels of hierarchy to reach a single stable representation for the overall concept. As the object is more frequently encountered, the abstraction should be able to push further down the hierarchy.
If you apply this to sequence memory and pooling (the same concept would apply to objects, but I’m lazy and it is easier to depict sequences), initially a particular sequence might require three hierarchical levels to form a single abstract representation. As it is encountered more frequently, some of the lower abstractions would start to merge, and may only require two hierarchical levels. Even more frequent encounters might push the abstraction down to the lowest level.
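To make the counting concrete, here is a toy model. It is an assumption made purely for illustration, not part of HTM theory: suppose each output layer can fold `chunk_size` lower-level representations into one, and that `chunk_size` grows as the sequence becomes more familiar. The function `levels_needed` and the specific chunk sizes are hypothetical.

```python
def levels_needed(sequence_length, chunk_size):
    """Count pooling levels needed to reduce a sequence to one representation.

    Each level replaces every run of up to `chunk_size` representations
    with a single, more abstract one.
    """
    if chunk_size < 2:
        raise ValueError("chunk_size must be at least 2")
    levels = 0
    remaining = sequence_length
    while remaining > 1:
        remaining = -(-remaining // chunk_size)  # ceiling division
        levels += 1
    return levels


# An 8-element sequence at three (assumed) stages of familiarity.
for label, chunk_size in [("novel", 2), ("familiar", 3), ("very familiar", 8)]:
    print(label, levels_needed(8, chunk_size))
# novel 3
# familiar 2
# very familiar 1
```

With a small pooling capacity the sequence needs three levels to collapse into one representation; as the capacity grows, the same sequence collapses in two levels and eventually in one, which is the "pushing down the hierarchy" effect described above.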
Combining these two points, one can theorize that the lowest levels should be capable of recognizing complex abstractions for familiar things which are encountered frequently.