Here's an intriguing perspective which supposes that all "learning systems" (whether brains or ANNs) converge toward similar internal representations approximating the external "world", across a (very) wide range of NN architectures, datasets and modalities.
What matters more (than architecture, modality, or dataset) in making two different networks develop more similar (and better) representations are network size/capacity, compute, and dataset size.
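To make "similar representations" concrete: one common way to quantify this (not necessarily the metric used in the work discussed here) is linear CKA, computed over the activations of two networks on the same probe inputs. A minimal sketch below; `model_a`, `model_b`, and `extract_features` are hypothetical placeholders for whatever models and feature-extraction hooks you actually use.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two activation matrices.

    X: (n_samples, d1) activations from network A on the same inputs
    Y: (n_samples, d2) activations from network B on the same inputs
    Returns a similarity score in [0, 1]; higher = more aligned representations.
    """
    # Center each feature dimension so mean offsets don't affect the comparison
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)

    # Feature-space form of CKA: ||X^T Y||_F^2 / (||X^T X||_F * ||Y^T Y||_F)
    numerator = np.linalg.norm(X.T @ Y, "fro") ** 2
    denominator = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return numerator / denominator

# Hypothetical usage: run the same probe inputs through both models and
# compare activations from a chosen layer of each.
# acts_a = model_a.extract_features(probe_inputs)   # shape (n, d1)
# acts_b = model_b.extract_features(probe_inputs)   # shape (n, d2)
# print(linear_cka(acts_a, acts_b))
```

The claim, then, is that this kind of cross-network similarity score tends to rise as capacity, compute, and data grow, regardless of the particular architectures being compared.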
IMO, the philosophical implications are quite heavy; after all, "what is it like to be a bat/ANN/etc…" might be less alien than we would normally assume, with intelligence being a natural emergent property of any complex learning system.