I think there are spatial pattern detection layers and temporal pattern (sequence) detection layers, which collect the temporal sequences produced by the spatial pattern layers. There is a spatiotemporal conversion in both directions: sequences of patterns are converted into static spatial signals, and static spatial signals (from the temporal or sequence detection layers) are converted back into sequences. These sequences in turn trigger further sequences.
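To make the two-way conversion concrete, here is a minimal sketch in Python. It is only a toy under stated assumptions, not a model of real cortex: the class names `SpatialLayer` and `SequenceLayer` are hypothetical, a "spatial pattern" is assumed to be a set of active units, and a "static signal" is assumed to be a single integer code.

```python
from typing import Dict, Iterable, List, Tuple


class SpatialLayer:
    """Toy spatial detector: each distinct set of active units gets a stable id."""

    def __init__(self) -> None:
        self.ids: Dict[frozenset, int] = {}

    def detect(self, active_units: Iterable[int]) -> int:
        key = frozenset(active_units)
        # New patterns receive the next free id; known patterns reuse theirs.
        return self.ids.setdefault(key, len(self.ids))


class SequenceLayer:
    """Toy temporal layer: a whole sequence of pattern ids <-> one static code."""

    def __init__(self) -> None:
        self.codes: Dict[Tuple[int, ...], int] = {}
        self.sequences: List[Tuple[int, ...]] = []

    def encode(self, pattern_ids: Iterable[int]) -> int:
        """Sequence -> static signal (assumed here to be a single integer)."""
        key = tuple(pattern_ids)
        if key not in self.codes:
            self.codes[key] = len(self.sequences)
            self.sequences.append(key)
        return self.codes[key]

    def decode(self, code: int) -> List[int]:
        """Static signal -> sequence, which could drive further layers."""
        return list(self.sequences[code])


spatial = SpatialLayer()
temporal = SequenceLayer()

frames = [{0, 1}, {1, 2}, {0, 1}]           # three successive input frames
ids = [spatial.detect(f) for f in frames]   # [0, 1, 0]
code = temporal.encode(ids)                 # 0, one static code for the sequence
replayed = temporal.decode(code)            # [0, 1, 0]
```

The lookup tables stand in for whatever learning mechanism would really form these codes; the point is only that a sequence can be collapsed to a static signal and unfolded again.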
I think the system can work with such a simple scheme in part because of the phenomenon of postdiction: we feel we predicted certain things, but the conscious sensation of the present is actually constructed after the fact, in a postdictive manner.
Regarding Hebbian learning, metaplasticity, and STDP, I think the existence of some long-range connections allows the smaller, simpler, lower-dimensional patterns from different sensory organs to act in a self-reinforcing loop. This loop positively selects those patterns that are part of a larger sparse spatiotemporal pattern spanning the multilevel structure, that is, a higher-dimensional model of an external object or causal actor. This internal, evolution-like competitive force causes an extremely rapid convergence towards associating patterns with their true underlying predictive causes.
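The competitive selection idea can be illustrated with a very rough sketch. Note the assumptions: this is a plain co-occurrence (rate-based) Hebbian rule with normalization, not STDP or metaplasticity, and the function name `hebbian_associate`, the learning rate, and the pattern labels are all made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def hebbian_associate(
    events: Iterable[Tuple[str, str]], lr: float = 0.1
) -> Dict[str, Dict[str, float]]:
    """Learn cross-modal association weights from co-occurring patterns.

    events: (pattern_a, pattern_b) pairs observed at the same time in two
    hypothetical sensory streams linked by long-range connections.
    """
    weights = defaultdict(lambda: defaultdict(float))
    for a, b in events:
        weights[a][b] += lr  # Hebbian step: co-active patterns strengthen
        total = sum(weights[a].values())
        for other in weights[a]:
            # Competition: the outgoing weights of `a` are rescaled to sum to 1,
            # so strengthening one association weakens its rivals.
            weights[a][other] /= total
    return {a: dict(row) for a, row in weights.items()}


# A sound that usually co-occurs with one shape and only rarely with another.
events = [("bark_sound", "dog_shape")] * 8 + [("bark_sound", "car_shape")] * 2
w = hebbian_associate(events)
print(w["bark_sound"])  # the dog_shape weight ends up well above car_shape
```

Because the weights compete for a fixed total, the association that recurs most reliably wins out, which is the kind of selection pressure described above, though a real account would need spike timing and multiple levels.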