If you start from the naive idea that the world is perceived as an “image” to be analyzed, then change blindness may be puzzling.
When you consider the visual system as it actually is and what it has to do, change blindness is almost predictable from first principles.
I have posted elsewhere on the nature of saccades and the way each one layers a small snapshot of the world onto the visual processing stream as a collection of 2-degree-wide foveal features. [1]
Our relation to other objects (including people, predators, and prey) can present a constantly changing array of features as the objects turn and move. As this 2-degree view of an object transforms with motion, what it contains is an ever-changing mix of features, feature scales, and visual-angle separations: one moment it is a cube, and the next, a corner of a cube.
Our “primitive view” processing (our sub-cortical structures) can track the center of mass and rough outlines, but this primal sketch must be populated with a stream of features for recognition. Our internal representation is just this collection of features and their relative positions.
The example video offered above presents the “victim” with about as much change as we would see if the person simply turned in relation to us, and for most people that is not a significant event. If they are not looking directly at the person, then even details such as gender may not register, although I believe gender to be one of the primitives recognized by sub-cortical structures.
[1] Foveal vision primer: