The Platonic Representation Hypothesis

Here's an intriguing perspective that supposes all “learning systems” (whether brains or ANNs) converge towards similar internal representations approximating the external “world”, across a (very) wide range of NN architectures, datasets and modalities.
What matters more (than architecture, modality, or dataset) in making two different networks develop more similar (and better) representations is network size/capacity, compute, and dataset size.
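
As a toy illustration of what “similar representations” could mean in practice, here is a minimal sketch that compares two networks' embeddings of the same inputs with linear CKA. This is just one common similarity metric, not necessarily the alignment measure used in the paper itself, and all names, shapes and the toy data are made up for illustration:

```python
import numpy as np

def linear_cka(X, Y):
    """Linear Centered Kernel Alignment between two representation matrices
    X (n_samples x d1) and Y (n_samples x d2). Returns a value in [0, 1];
    values near 1 mean the two representational geometries are very similar."""
    # Center each representation's features
    X = X - X.mean(axis=0, keepdims=True)
    Y = Y - Y.mean(axis=0, keepdims=True)
    # HSIC-style numerator and normalizers
    hsic = np.linalg.norm(Y.T @ X, 'fro') ** 2
    norm_x = np.linalg.norm(X.T @ X, 'fro')
    norm_y = np.linalg.norm(Y.T @ Y, 'fro')
    return hsic / (norm_x * norm_y)

# Toy example: two "networks" of different widths that both embed
# the same underlying latent structure of the "world".
rng = np.random.default_rng(0)
latent = rng.normal(size=(500, 16))          # shared latent factors
net_a = latent @ rng.normal(size=(16, 64))   # embedding of network A
net_b = latent @ rng.normal(size=(16, 128))  # embedding of network B
print(linear_cka(net_a, net_b))              # high (near 1) despite different widths
```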

IMO, the philosophical implications are quite heavy; after all, “what it is like to be a bat/ANN/etc…” might be less alien than we would normally assume, with intelligence being a natural emergent property of any sufficiently complex learning system.


Trying to take away some useful principles from this:

  • Different architectures, topologies, and learning algorithms, while relevant, do not fundamentally outmatch each other.
  • What matters more is the ability to scale, by efficiently exploiting the strengths of the underlying physical “machine” and finding workarounds for its limitations.
  • Some design principles are useful either way, e.g. sparsity has value in both brains and silicon simply because it helps deal with the energy limits of computation. From an “information quality” standpoint, it matters less whether data or activation states are represented as dense scalar vectors or as sparse binary ones. Vector size matters, with dense vectors being somewhat more expressive, e.g. a representation that requires an SDR (sparse, binary vector) of a given size can be “translated” into an equivalent, slightly shorter scalar vector (see the sketch after this list).
  • Activation sparsity in binary form should also lead to faster learning, because such a system has to find learning mechanisms that avoid the repetitive, gradual weight updates used in BP/DL.
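
Here is the sketch referenced above: a minimal illustration (with made-up sizes, using a plain random projection, not any specific HTM mechanism) of how the similarity structure of long, sparse binary SDRs survives “translation” into much shorter dense scalar vectors:

```python
import numpy as np

rng = np.random.default_rng(42)

def random_sdr(n=2048, k=40):
    """A k-sparse binary vector (SDR) of length n."""
    v = np.zeros(n)
    v[rng.choice(n, size=k, replace=False)] = 1.0
    return v

def overlapping_sdr(sdr, shared=30, k=40):
    """A new SDR sharing `shared` of its `k` active bits with `sdr`."""
    on, off = np.flatnonzero(sdr), np.flatnonzero(sdr == 0)
    v = np.zeros(sdr.size)
    v[rng.choice(on, size=shared, replace=False)] = 1.0
    v[rng.choice(off, size=k - shared, replace=False)] = 1.0
    return v

def cos(x, y):
    return x @ y / (np.linalg.norm(x) * np.linalg.norm(y))

# "Translate" 2048-bit SDRs into 256-dim dense vectors with a fixed
# random projection (Johnson-Lindenstrauss style).
d = 256
P = rng.normal(size=(2048, d)) / np.sqrt(d)

a = random_sdr()
b = overlapping_sdr(a, shared=30)   # similar pattern (30/40 bits shared)
c = random_sdr()                    # unrelated pattern

print("SDR overlap   a~b:", int(a @ b), "  a~c:", int(a @ c))
print("dense cosine  a~b:", round(cos(a @ P, b @ P), 2),
      "  a~c:", round(cos(a @ P, c @ P), 2))
```

Similar SDRs stay similar and unrelated ones stay dissimilar after projection, which is the sense in which the same information can live in a shorter dense vector.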

So a useful path would be to imagine/invent learning methods optimized for the hardware we actually have at hand: instead of attempting to replicate biological brains with different means, understand what makes them work.

…just trying to (re)animate this forum
