Attention Approximates Sparse Distributed Memory

Nice talk from MITCBMM

This is fantastic, thanks for sharing!
I love it when somebody discovers that an important equation in one field is “the same as” an important equation in another field.

The “Attention/Transformer is like cerebellum” idea is news to me.