Wondering about visual hierarchy across mammals

Hi all. I’ve been wondering about V1 and how it can be so large in humans (the quoted factoid is that it’s about the size of a passport). A cat, for example, obviously has a much smaller V1, yet similar or better visual processing: depth perception, visual acuity, target recognition and low-light vision all seem better in most carnivores than in humans.

The standard interpretation is that V1 does low-level processing and then hands off to V2 and V4, which are higher-level and increasingly stable, and finally to IT, where invariant representations are stored. But if that’s true, I don’t see why humans need such a massive V1 when a cat’s tiny V1 seems to do the same job so well. What is V1 doing in humans that needs all that extra processing and storage capacity?

The only explanation I can come up with is that our large V1 stores a huge bank of images that a cat lacks, but that clashes with the idea that the invariant representations are stored in IT. Either way I’m a bit stymied. Does anyone have any thoughts?