We are able to visually recognize the same object in different poses the same way we recognize objects by touch. Each saccade ends in a glance. Each glance is like a touch, but with millions of sensors firing simultaneously (the rods and cones of the retina). There is spatial information encoded in the distribution of light on the retina, and certain features (low-level patterns) are recognized from this information. But it is the temporal sequence of recognized spatial patterns from successive glances (as well as the sensorimotor information gleaned from the saccadic motion itself) that forms the invariant representation of the object in our minds.
To use the example of an eagle given in the previous post: We notice an object in the sky above us and glance in that direction. Within a couple of glances, we have determined its basic size and coloration, and then we notice that what we recognize as the head feature is bright white. Within a couple more glances, we have confirmed that the head feature is consistent with the many images and sightings of bald eagles that we have been exposed to in our lifetime. Thus, the object is classified as a bald eagle. This same process is at play whether the eagle is in flight, in its nest, or is being rendered stylistically on any number of government seals.
I believe that our brains have a natural tendency to rapidly drive saccades towards the most distinguishing features of an object because they want to reduce uncertainty as quickly as possible. Categorization must happen almost immediately in order for us to recognize and respond to imminent danger in our immediate environment or to potentially advantageous opportunities.
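To make the "reduce uncertainty as quickly as possible" idea concrete, here is a minimal toy sketch in Python. It is not a model of the visual system: it assumes a made-up table of feature likelihoods (`LIKELIHOOD`), treats each glance as a yes/no observation of one feature, and uses a standard greedy rule (pick the feature whose observation is expected to leave the least entropy in the belief over categories). All names and numbers are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy numbers: P(feature is seen as present | category).
LIKELIHOOD = {
    "bald eagle":   {"large": 0.90, "dark body": 0.90, "white head": 0.95},
    "golden eagle": {"large": 0.90, "dark body": 0.90, "white head": 0.05},
    "crow":         {"large": 0.10, "dark body": 0.95, "white head": 0.02},
}
FEATURES = ["large", "dark body", "white head"]


def entropy(belief):
    """Shannon entropy (in bits) of a belief over categories."""
    return -sum(p * math.log2(p) for p in belief.values() if p > 0)


def update(belief, feature, present):
    """Bayesian update of the belief after one glance at `feature`."""
    posterior = {}
    for category, prior in belief.items():
        p = LIKELIHOOD[category][feature]
        posterior[category] = prior * (p if present else 1.0 - p)
    total = sum(posterior.values())
    return {category: v / total for category, v in posterior.items()}


def expected_entropy(belief, feature):
    """Entropy we expect to be left with after glancing at `feature`."""
    p_present = sum(belief[c] * LIKELIHOOD[c][feature] for c in belief)
    result = 0.0
    for present, p_outcome in ((True, p_present), (False, 1.0 - p_present)):
        if p_outcome > 0:
            result += p_outcome * entropy(update(belief, feature, present))
    return result


def next_glance(belief, features=FEATURES):
    """Greedy choice: the feature expected to reduce uncertainty the most."""
    return min(features, key=lambda f: expected_entropy(belief, f))


# Start with no idea what the object is, then take one glance.
prior = {category: 1.0 / len(LIKELIHOOD) for category in LIKELIHOOD}
target = next_glance(prior)
belief = update(prior, target, present=True)
```

With these invented numbers, the greedy rule directs the first glance at the head, because "white head" separates the candidates far better than size or body color, and a single confirming glance moves most of the belief onto the bald eagle. A different likelihood table would of course steer the glances differently.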