A Deeper Look at Transformers: Famous Quote: "Attention is all you need"

It’s not that language alone won’t get us towards AGI/HLAI. But rather, the current path for DL is steep, with heavy compute and data requirements.

Alleviating those requirements through priors is the obvious solution, but we should recognize why transformers have been so successful from a theoretical point of view: any priors that should be embedded should be learnt by the model itself, not hardcoded through inductive biases.

So the only way to learn priors as generic and useful as possible is to leverage the exact modalities biology faced and to learn across them. Of course, being in the 21st century, we’ll also leverage other novel (and arguably more informative) data sources such as RL trajectories, and use IDA to enhance our models, culminating in ASI.

But the core idea remains - to solve those problems, we need to prove that multi-modal scaling laws transfer across all domains in a compute-efficient manner.

1 Like

With the best will in the world, neel_g, you are not addressing my ideas at all. You are just restating your own ideas:

Your ideas:

  1. Why transformers have been successful - learning?

“Any priors that should be embedded should be learnt by the model itself, not hardcoded through inductive biases.”

  2. The way to improve on transformers - go multi-modal

“the only way to learn priors as generic and useful as possible are to leverage the exact modalities biology faced and to learn across them”

Is that fair? It seems to me to be the key points you are trying to make. You are ignoring the points I am trying to make. But if we both ignore each other we will make no progress, so I’m at least trying to grasp your points.

If this is indeed a fair summation of the points you want to make, they seem to not only ignore the ideas I’m stating, but to completely surrender the ideas which made HTM good in the first place, and which I think could make it a better substrate on which to implement a better solution. What distinguished HTM from the mainstream of neural networks was that it made a conscious decision to avoid the “learning” paradigms of the ANN thread of research. ANNs depended on back-propagation. This was biologically implausible. So HTM avoided it.

Now you’re saying that “learning” is what has made transformer ANNs successful. And the way forward is to “learn” more priors from more data modes.

So, not only do you want to embrace transformers fully. You want to abandon the rejection of “learning” which made HTM distinct from them.

I disagree. I want to keep the rejection of “learning” which made HTM distinct. I think that part of HTM was correct. The “learning” paradigm of ANNs is biologically implausible, and ultimately wrong.

The wrongness reveals itself in the explosion of size generated by transformers. That is the key lesson of transformers, that they are LARGE. They are not “learning” a prior at all. They are learning a… for want of a better word, a “post”. Not a generative principle, but the infinitely unenumerable products of a generative principle.

The true prior remains trapped in the data.

Transformers can’t learn that prior because they are trapped within a body of techniques based around gradient descent over energy surfaces, back-propagation, which was exactly what HTM saw was not biologically plausible, and sought to avoid.

I think the solution is to recognize that this enormous size means the “learning” paradigm is indeed back to front. HTM was right. Gradient descent is not a biologically plausible mechanism. The enormous size of transformer models is a demonstration that transformers are not learning the fundamental “prior”. And that we need to go back and seek a generative prior which can generate all the billions of transformer “parameters”. Which generative prior I think will actually be chaotic.

Language is the best data set to learn this, because it is actually the simplest. It is the simplest, and the closest to a fundamental cognitive prior, because it is the data set which is most purely and simply generated by the brain itself.

With language the brain is telling us that sequence is fundamental.

Indeed, since you don’t ask, and display zero curiosity to know, let me say that I think the prior we are seeking is cause and effect. But transformers don’t learn this. Instead they learn an eternally finite subset of actual causes and actual effects. They learn actual causes and actual effects, examples, enormous numbers of examples expressed by humans previously, instead of the active generating principle of cause and effect itself, and the infinitely expanding patterns generated by it.

Sure, apply this cause and effect prior to multi-modal data eventually. But let’s implement it properly for language first. That’s where it reveals itself most concisely. If we have eyes to see it.

But I don’t think you will see this. The entire industry is being steamrolled by the size of transformers. To judge by your comments even HTM is now crushed by it into accepting back-propagation.

That there is talk of priors may lead us eventually to the right solution. But for now the drift seems to be in the direction you indicated, moving away from the simplicity of language, away from the simplest and best hint that cause and effect is the fundamental prior. In practice AI is embracing size, dominated by enormous companies, which are the only ones big enough to try and model a generative infinity, by generating all examples of it.

1 Like

Here’s the problem: if attention scans multiple dimensions there is no implicit or “good” order in which to scan them. Left to right, top to bottom, or otherwise?
If the positional embedding is one-dimensional, then on every new run the relative position of sun to tree could be encoded differently, despite the fact that spatially they did not move.
If the positional encoding is modally and spatially “correct”, then the same relationship between parts would be transcribed regardless of their sequential position. I mean, with an encoding that acknowledges the existence of, and tracks, multiple dimensions, different modalities won’t need to sync with each other in order to provide the same encoding when relationships between tokens from different modalities/dimensions do not change.
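Here’s a rough sketch of what I mean (my own toy illustration, just standard sinusoidal encodings applied once per axis; the positions and dimensions are made up). The useful property is that the similarity between two positions’ encodings depends only on their offset, so the sun/tree relationship is encoded the same way wherever the pair sits in the frame:

```python
import numpy as np

def sinusoidal_1d(pos, dim):
    """Standard sinusoidal encoding of a scalar position into `dim` values."""
    i = np.arange(dim // 2)
    freqs = 1.0 / (10000 ** (2 * i / dim))
    angles = pos * freqs
    return np.concatenate([np.sin(angles), np.cos(angles)])

def sinusoidal_2d(x, y, dim):
    """Encode a 2-D position by concatenating one encoding per axis."""
    return np.concatenate([sinusoidal_1d(x, dim // 2), sinusoidal_1d(y, dim // 2)])

dim = 64
# "sun" and "tree" in one frame, then the same pair translated in another frame
sun1, tree1 = sinusoidal_2d(3, 7, dim), sinusoidal_2d(5, 2, dim)
sun2, tree2 = sinusoidal_2d(13, 27, dim), sinusoidal_2d(15, 22, dim)  # same offset (+2, -5)

# The dot product between two encodings depends only on the positional offset,
# so the sun/tree relationship looks identical in both frames.
print(np.allclose(np.dot(sun1, tree1), np.dot(sun2, tree2)))  # True
```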

Language is a particular case: it forces all parallel streams into a single one, for various reasons. Conscious experience itself seems to be single-threaded, but that is only the visible, “unifying” tip of the mind’s processes, which underneath are massively parallel.

I haven’t proposed bigger sequential models, but:
Multimodality will challenge ANY model with exponentially more data, regardless of how it is fed in - sequentially or in parallel.

Well, I don’t really understand what you mean by that, nor why/how it would help?

PS And I don’t think size in itself is the implicit culprit here. After all, brains are above LLMs, at least by a strict parameter-count metric.
The actual “sin” of deep learning/backpropagated models is that this size needs to be “monolithic”, otherwise it can’t leverage a GPU’s teraflop/sec capacity.
One of the unfortunate consequences is the need for massive amounts of training data. Another is computing/energy cost - every parameter needs to be touched at every time step.

And I think that @neel_g suggests transformers will actually require less data once they become multimodal, which to some extent might be true.

1 Like

cezar_t, March 5:

What I’m suggesting is that we consider that the complexity we already see with language might be chaotic. And consider a simplification which generates that chaos.

Well, I don’t really understand what you mean by that, nor why/how it would help?

Thanks for saying so cezar_t! It’s not surprising that you don’t. People have different backgrounds. It’s hard to know what prior knowledge to build on. But if you ask, I can give more depth.

First thing, mathematical chaos. I don’t know if you’re familiar with that generally. It’s worth exploring, because it’s pretty clear that patterns of neural firing in the brain are chaotic. If you do a Web search on that, you’ll find lots of references. Here’s one that came up in my Twitter feed recently:

Could One Physics Theory Unlock the Mysteries of the Brain?
https://youtu.be/hjGFp7lMi9A

Chaos is kind of weird, and only discovered quite recently. ~60 years ago?

It’s weird, but in another way of thinking, there is nothing strange about it. It just describes a state of extreme context sensitivity which, it turns out, is possessed by some dynamical systems. It’s not inherently less meaningful than any other dynamical system. The use of the word “chaos” just comes from the fact it can’t be predicted. It’s not itself “chaotic” in the traditional sense of disorder; it just seems to defy order to outside observers. So they call it chaotic because the order is more than they as observers can know, not because the system itself does not have order. Actually it is the opposite of disorder. It is extreme order. So much order, that the order can’t be described more compactly than the thing itself! There’s a parallel to free will. The only thing which really knows what a chaotic system will do, is the chaotic system itself. Even the creator of the chaotic system cannot know fully what it will do.

A good example of chaos, and actually the first place it was observed, is the weather. The chaotic character of the weather is the reason you can’t make useful predictions more than a few days forward. The only way to really predict the weather with full accuracy, is to wait and see what it does.

But for all the complexity of structure chaos can generate, the actual generating function can be extremely simple. So, for example, a double pendulum is a chaotic system. Just two degrees of freedom, but it generates chaos!
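If you want to see the sensitivity for yourself, here’s a tiny toy illustration (my own example, nothing to do with the brain specifically): the logistic map, one of the simplest systems known to behave chaotically. Two starting points that differ by one part in a million end up completely different within a few dozen steps:

```python
def logistic_map(x0, r=4.0, steps=50):
    """Iterate x_{n+1} = r * x_n * (1 - x_n); chaotic for r = 4."""
    xs = [x0]
    for _ in range(steps):
        xs.append(r * xs[-1] * (1 - xs[-1]))
    return xs

a = logistic_map(0.200000)
b = logistic_map(0.200001)  # initial condition differs by 1e-6

for n in (0, 10, 25, 50):
    print(n, abs(a[n] - b[n]))
# The gap grows from 1e-6 to order 1: no compact summary of the start state
# lets you predict the far future; you have to run the system itself.
```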

Here’s a nice example of “dancing” robots, with motions expressed as chaotic attractors (attractors being stable states of a chaotic system.) A good example of a very simple “robot” model, which generates behaviour which is quite complex:

How the body shapes the way we think: Rolf Pfeifer at TEDxZurich
http://www.youtube.com/watch?v=mhWwoaoxIyc&t=8m23s

There’s lots more examples I can give. I can give more if you like. Or you can just google up heaps yourself.

The upshot is it seems clear cognitive patterns generated by the brain are actually chaotic. Some people may dispute that. But I think their objection would be on the level that the chaos is just at some irrelevant substrate level. Like people who argue whether it’s necessary to model neural spikes.

But at the level of modeling the actual chaos which is observed, the problem comes because nobody knows exactly what the generating function is.

Given such a function, though, some way of relating elements which is inherently meaningful to the system, any “prior”, really, there is no reason to imagine the forms generated by that “prior”, should not be chaotic.

The only thing about it is that you would not be able to “learn” it. Where by “learn” I mean abstract it in a form more compact than the system itself. A chaotic system cannot be fully abstracted more compactly than itself.

It would be like the weather in that sense. No way to abstract it more compactly than the actual weather. If you tried to do so, if you tried to “learn” all the patterns, you would get one hell of a lot of particular storms that happened at particular times, but they would all be just that little bit idiosyncratic to themselves. You might be able to analogize between them a bit, saying, “Today was a particularly strong SW’er”, or "Red sky at night shepherd’s this that and the other… " etc. But two storms would never be exactly the same. And you would never know with absolute certainty where a tornado might touch down. A bit like the old Greek proverb of the same river never quite the same.

So a model of the weather based on “learning” actual patterns developed by the weather, would be extremely LARGE. Would become more accurate the larger it became. But could never be completely accurate.

LARGEness of attempts to model the weather based on “learning” actual patterns expressed by the weather, would be an indicator that the weather is actually a chaotic system.

Maybe I’ll stop there and see if you are with me so far. Does what I’ve written above make sense?

3 Likes

Compared to a biological neuron, the parameter count of an LLM is absolutely nothing. Current LLMs are barely equivalent to a bee, or something even simpler. It’s hard to admit, but our von Neumann architecture simply isn’t the most efficient. We stick with it because it’s flexible and useful everywhere, but that comes at a cost in FLOPs/$

Transformers do learn priors; they mostly learn them in the form of circuits and algorithms embedded in their weights, some of which are quite ingenious really.
They also learn some complex mechanisms which are beyond the explanation of current interpretability work. For instance, they learn to meta-GD, which is quite insane to think about - gradient descent converges to a solution which implements itself. Talk about meta.

This is the crux of at least 3 of your points. I have a simple position about this: backpropagation could be better, more efficient and much faster while still being scalable.
However, I see no promising alternatives yet. All initial ideas never scale and never work outside toy domains.

It also doesn’t discount the fact that whatever algorithm the brain uses would be resource constrained, thus locality would be enforced implicitly to preserve energy. This would imply that it would be inferior to true full backpropagation, and backprop could be the way to potentially ASI rather than focusing solely on AGI.

Disagree. Being sequential has nothing to do with biology or the brain, which is rather recursive in nature.

That’s a lot of assertions. Any evidence to back up:

  1. cause and effect is the prior we need to create a generalist agent, and can be quantitatively embedded in algorithms
  2. proving transformers don’t learn that

I couldn’t care less what HTM uses to “learn”. It could be a bunch of slaves playing bingo with the weights for all I care. I simply wish to see results. If HTM can’t deliver it, then perhaps its entire foundation is wrong, the core model is wrong or the learning update rule is flawed.

So I would look for scientific literature ablating and modifying those components to see how they work better, if I were you.

Also, it doesn’t help your case that Numenta has pretty much given up on HTM and its paradigm :man_shrugging:

Well, because it simply isn’t enough. You aren’t raised through millennia by just language. In fact, visual stimuli were the main reason why the brain became complex in the first place. Darwinism selected organisms with more neurons which could process the complex 3D world and its spatio-temporal connections properly.

It’s simply a great way to increase unimodal data and compute efficiency by transferring knowledge across domains. Look at visual transformers like BLIP 2 and their amazing capabilities across domains (despite never being trained on multimodal data directly; it’s an NLP model with a frozen adapter)

Just a small correction - the models being monolithic has nothing to do with the lacking sample efficiency. You could train smaller models on the same dataset and still get away with outperforming large ones (Chinchilla; in a few weeks, you’ll see LLaMa being a bigger rage than ChatGPT)

The training data requirements come with priors, the same ones I’ve been harping on about for months now :wink:. LLMs start out with a blank slate - they make no assumptions about the world. We, on the other hand, have plenty of biological priors to aid sample efficiency on Earth and to be generally very flexible/adaptable.

Larger models learn more complex priors, hence the more complex capabilities and increased transfer learning across all domains. As for being monolithic, they just work the best empirically. The brain (more accurately, the neocortex) is also quite uniform and monolithic. Perhaps there’s some link here, but empirical results are hard to argue against :slight_smile:

1 Like

That’s a lot of assertions. Any evidence to back up:

  1. cause and effect is the prior we need to create a generalist agent, and can be quantitatively embedded in algorithms
  2. proving transformers don’t learn that

How many assertions was that by me?

“2” would be proving a negative. I’ll concede no proof for that.

Aside from proving a negative, that leaves “1”, which is one assertion. Is one assertion a lot?

You can’t prove a negative. On the other hand, proving transformers do learn cause and effect as a generative principle would only require one existence proof. Should be easy for you. Just point me to the paper demonstrating it.

Evidence for “1”? Well, it works for language. That goes back a long way. It was the basis for American Structuralism in the '30s. You can look up Harris’s Principle, distributional analysis, Latent Semantic Analysis, Grammatical Induction…, for instance:

You shall know an object by the company it keeps: An investigation of semantic representations derived from object co-occurrence in visual scenes
Zahra Sadeghi, James L McClelland, Paul Hoffman
https://pubmed.ncbi.nlm.nih.gov/25196838/

Bootstrapping Structure into Language: Alignment-Based Learning, Menno van Zaanen
https://www.researchgate.net/publication/1955893_Bootstrapping_Structure_into_Language_Alignment-Based_Learning

The only problem is that it generates contradictions. Which prevent “learning” as such. That’s what destroyed American Structuralist linguistics, and ushered in Chomsky’s Generativism (Chomsky being enormously dismissive of transformers.) Here’s a bunch of papers which characterize the inability of linguistic structure to be learned (starting at phonemes) as non-linearity:

Lamb, review of Chomsky … American Anthropologist 69.411-415 (1967).

Lamb, Prolegomena to a theory of phonology. Language 42.536-573 (1966) (includes analysis of the Russian obstruents question, as well as a more reasonable critique of the criteria of classical phonemics).

Lamb, Linguistics to the beat of a different drummer. First Person Singular III. Benjamins, 1998 (reprinted in Language and Reality, Continuum, 2004).

Lamb and Vanderslice, On thrashing classical phonemics. LACUS Forum 2.154-163 (1976).

Or for a more mainstream analysis of that learning problem here:

Generative Linguistics a historical perspective, Routledge 1996, Frederick J. Newmeyer:

“Part of the discussion of phonology in ’LBLT’ is directed towards showing that the conditions that were supposed to define a phonemic representation (including complementary distribution, locally determined biuniqueness, linearity, etc.) were inconsistent or incoherent in some cases and led to (or at least allowed) absurd analyses in others.”

That this then can be extended to something which generates meaningful structure for a generalist agent might be an assertion. More of a hypothesis. But first we should apply it fully to language.

What other assertions are you attributing to me?

By contrast, your thesis to the best of my ability to understand it, seems to be that transformers are fine. They are the full solution. We only need to make them even bigger, and give them even more data. Back-propagation is perfectly biologically plausible. HTM has been abandoned (quite possible) so the insights which motivated it are not worthy of consideration (less justifiable.)

And reiterated, that you are sure the way to move forward is to do more learning, over more multi-modal data.

Well, that has the advantage of closely aligning with what maybe 90% of people in the industry currently believe. Maybe you’re right. Maybe HTM was completely wrong. Maybe that Google chatbot really did just suddenly become conscious, and all we need to do to achieve final AGI is to get yet bigger, feed ANN back-propagation gradient descent learning algorithms yet more data, build a speech recognition engine with 100 years of training data instead of 77 (Whisper?) Learn it 2^42 parameters as Hinton jokes…

Maybe all Elon Musk needs to do to finally achieve fully human level object recognition for driving really is to perfect his auto-labelling system, in his automated driving example generating system, so that he really can finally label every possible corner case of driving that could ever occur in the world, and then train his networks to recognize everything that’s been labeled:

https://www.youtube.com/watch?v=2cNLh1gfQIk&t=1241s

1 Like

Hmm, I doubt it:

In my book that means 4x lower sample efficiency. LLaMa I think used > 1T tokens for training.
Which, I think, again, is significantly more than what GPT-3 was trained with.

PS sorry the above quote is from here


Sorry again, I have some idea of what a chaotic system means; what I asked is what you mean by an AI being chaotic. Any particular algorithm/mechanism you can describe, or just a general feeling that natural intelligence exhibits chaotic properties, hence (as with backprop) if an artificial system doesn’t replicate “chaotism” it won’t be able to be intelligent?
Actually, your assertion is even more confusing: you say it has to generate chaos. What I learned is that some systems behave chaotically, but I haven’t heard of chaos generators. Or do you mean they are the same thing?


@neel_g DeepMind’s paper puts it even more clearly:

we find that for compute-optimal training, the model size and the training dataset size should be scaled equally: for every doubling of model size the training dataset size should also be doubled

which indeed means that, as the model size increases, sample efficiency gets lower.

The fact Chinchilla reached the same performance with fewer parameters but more training data is used to argue that the larger model could have made use of (much) more data. Yet that could be a problem.
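To put rough numbers on that rule (my own back-of-the-envelope sketch, using the ~20 tokens-per-parameter figure commonly quoted for Chinchilla; the constant is only an approximation):

```python
def chinchilla_optimal_tokens(params, tokens_per_param=20):
    """Approximate compute-optimal training tokens for a parameter count,
    under the 'scale model size and data together' rule of thumb."""
    return params * tokens_per_param

for params in (70e9, 140e9, 280e9):
    tokens = chinchilla_optimal_tokens(params)
    print(f"{params / 1e9:.0f}B params -> ~{tokens / 1e12:.1f}T tokens")
# Doubling the parameter count doubles the token budget under this rule,
# which is exactly the "data appetite grows with model size" point above.
```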

2 Likes

Natural language, with or without extra sensory grounding, is too complex for estimating how sample-efficient a model is, because we have little context.

Here’s a much simpler RL problem: an I-maze (which is actually an H turned 90 degrees; see page 12 for what that means) is a conceptually simple problem. A transformer model learned to solve it in 2 million time steps, which, since the game can be won in 20 steps, means it needed to play it 100k times to “figure it out”.

Which I doubt equates with “understanding” the problem. It should be solvable in a few dozen trials by humans, and probably by simpler animals too (dogs, cuttlefish, mice).

That’s what I call sample-inefficient: orders of magnitude more trials before “figuring it out”.

My statement is simpler. Transformers have quite some way to go before AGI. But they’re still closer to being a generalist agent than HTM. Insights gained from HTM may be useful, but their application as of right now leads to no breakthroughs, so I can’t really comment on their impact.

Self-driving won’t be solved anytime soon simply because the scale of the models which are deployed is pitiful. It’s like asking why we couldn’t stream 4K content in the 1990s. They just didn’t have the hardware back then to pull it off.

Models deployed on the edge are highly resource constrained. If Tesla ever wants to solve FSD, it would have to scale its models up by OOMs to start making a dent in the problem. They do recognize the problem, hence Dojo, but again it’s yet to be fully rolled out.

As I said,

Chinchilla scaling laws are simply a compute-optimal version of Kaplan et al. You could keep the dataset size constant and scale up parameters alone, still leading to improvements. The improvement would just be slower than what Chinchilla-optimal models can achieve.

The models being monolithic, however, has nothing at all to do with sample efficiency. It’s a question of the priors and inductive biases we bake into our networks.

Again, priors. I feel I’ve repeated myself hundreds of times now. Animals have biological priors carried down by evolution for adaptability and flexibility. Every time you start a transformer’s training, it’s learning from a blank slate. It makes no assumptions about its environment or data distribution.

When you do find optimal priors, like AdA, you quickly find transformers learning as fast as humans (if not faster) in real time, on tasks that humans hand-crafted. This is extreme sample efficiency, rivaling humans with a fraction of the compute and parameters. XLand 2.0 is also pretty complex; arguably the humans have more knowledge about how the 10^{40} possible tasks work than the agents themselves, yet AdA still performs on par.

All of this, just with ICL/few-shot and a frozen LM to boot. Simple learning through multiple trials about how the environment and its dynamics work, and optimizing its strategy to complete the task at hand better and faster. (I cannot stress this enough - watch the results reel!)

Transfer learning also helps; MGDT shows how the DT pre-trained on expert trajectories outperforms baseline, randomly initialized agents, as well as improving on the expert trajectories themselves. It’s much more efficient despite being just 1B parameters. There is a huge opportunity just in grabbing the low-hanging fruit here, and scaling it up to improve performance by magnitudes easily.

2 Likes

robf:

Maybe I’ll stop there and see if you are with me so far. Does what I’ve written above make sense?

Sorry again, I have some idea of what a chaotic system means; what I asked is what you mean by an AI being chaotic. Any particular algorithm/mechanism you can describe, or just a general feeling that natural intelligence exhibits chaotic properties, hence (as with backprop) if an artificial system doesn’t replicate “chaotism” it won’t be able to be intelligent?
Actually, your assertion is even more confusing: you say it has to generate chaos. What I learned is that some systems behave chaotically, but I haven’t heard of chaos generators. Or do you mean they are the same thing?

“Generate” chaos or “behave” chaotically, I’m not distinguishing here. You can take them to be the same thing in my expression. They’re dynamical systems, which generate behaviour, and that behaviour happens to have the extreme context sensitivity and resistance to abstraction of chaos.

For a concrete model, I don’t think the leap is too far. Actually I think the process will be very similar to what is happening now in transformers. Transformers also “generate” structure. They will be learning to do it in the same way. Which is to say they will be learning structure which predicts effectively: cause and effect.

This is something which the language problem constrains you to nicely. It constrains you to think about a cognitive problem as cause and effect.

So what’s the difference?

I’m saying the difference is that grouping elements according to how they effectively predict cause and effect will actually find structure which is not static. Not “learnable”. Instead it will find (generate) structure which changes dynamically from one moment to another. The groupings of the network, the hierarchies it finds/generates, will change from moment to moment and problem to problem.

Transformers don’t look for such dynamically changing network groupings/structure. They assume the network structures they find will be static. They must. Because the mechanism they use is gradient descent to find energy minima (where energy minima means the groupings predict the next element maximally.)

It’s a different assumption about the nature of the system you might find. That is all. If you assume one kind of structure, that is the structure you will find. If you assume what you find will be static, you will only find the static bit.

I’m saying we use the same principle of grouping according to effective prediction, but drop the assumption that structure will be static.

Dropping the assumption the structure you find will be static means you can’t use gradient descent. It means we need some other way of finding groupings of elements in a network which are predictive energy minima.

So the problem becomes, how do we find predictive energy minima in a network of observed (language) sequences, when we can’t simply track a fixed energy surface using gradient descent?

The solution to this problem for me came when I found this paper:

A Network of Integrate and Fire Neurons for Community Detection in Complex Networks, Marcos G. Quiles, Liang Zhao, Fabricio A. Breve, Roseli A. F. Romero

https://www.fabriciobreve.com/artigos/dincon10_69194_published.pdf
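To give a flavour of the general idea, here’s a toy sketch (my own illustration only, not the paper’s actual algorithm, and all the parameters are made up): leaky integrate-and-fire units coupled along the edges of a graph tend to fall into step within densely connected groups, so the “communities” show up in which potentials co-vary, with no gradient descent over a fixed energy surface anywhere.

```python
import numpy as np

rng = np.random.default_rng(0)
n_per, n_comm = 12, 2
n = n_per * n_comm
community = np.repeat(np.arange(n_comm), n_per)

# Random graph: dense within communities, sparse between them.
p = np.where(community[:, None] == community[None, :], 0.9, 0.05)
adj = (rng.random((n, n)) < p).astype(float)
np.fill_diagonal(adj, 0)

drive = rng.uniform(0.08, 0.12, size=n)        # heterogeneous intrinsic drive
v = rng.random(n)                              # membrane potentials
coupling, threshold, leak = 0.15, 1.0, 0.98
steps = 3000
trace = np.zeros((steps, n))
fired = np.zeros(n)

for t in range(steps):
    # Leak, intrinsic drive, plus a kick from neighbours that just fired.
    v = leak * v + drive + coupling * (adj @ fired)
    fired = (v >= threshold).astype(float)
    v = np.where(fired == 1, 0.0, v)           # fire and reset
    trace[t] = v

corr = np.corrcoef(trace.T)
same = community[:, None] == community[None, :]
off_diag = ~np.eye(n, dtype=bool)
print("mean correlation within communities :", corr[same & off_diag].mean())
print("mean correlation between communities:", corr[~same].mean())
# Within-community correlation is typically much higher: the groupings emerge
# from the firing dynamics themselves, not from a learned, fixed energy surface.
```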

3 Likes

Nope, it’s not. The I-maze problem has a 50% chance of being solved correctly at random, and once you learn the “trick” there’s no way to fail it, unless you’re a total idiot. At least that’s how humans do it. And it has little to do with priors, since, well, it is a very artificial problem.

The DL models they test do not exhibit this ramp - ~50% success followed by a sudden jump to 100% within a few (like a dozen) trials, when they “get” the clue. For ~100,000 games (at least 2M steps) they show a gradual improvement. Which means they never get any clue; they solve it statistically. Which you glorify here as the success of priors.

AdA sure is interesting but does not settle the case. The sample inefficiency is about how long it takes until it finds those optimal priors. If you want the same model to play decent chess too, it needs to be trained on another few million chess moves. The priors in one game do not apply in another, not as fast as for us. Yes, we have some important instinctive priors, but much fewer and more “general” - the part of the genome encoding brain structure isn’t too large, so it cannot count as pretraining. Whatever priors we have (if you claim it’s only a matter of priors), those priors are very good at sample efficiency across a wide domain.

My bet is it’s about a different learning mechanism, not just priors, and large DL models simply ignore these mechanisms.

2 Likes

Yes, there’s something to causality, but it might be deeper. I don’t think it is our language skills that shape our minds to conceive causality.
I think the assumption of causality is one of those important priors which shapes everything else about our minds, language included.

That assumption is so strong we gladly assert any correlation is in fact a causation (logically wrong, but when a logical fallacy saves our ass, it IS evolutionarily correct).
Look at how prone we are to superstitions and believing dangerously silly stuff like witchcraft.

The big question is how one could engineer a speculative causality prior in a machine learning framework. I call it speculative because that’s what it is: we make assumptions, and when one is confirmed we take it as “correct” for as long as it fits the data.

Regarding the chaos discussion - that could be a fertile one … we’re better off moving it to another topic. Where I encountered it was in reservoir computing: the closer to the chaotic threshold a reservoir is, the better it gets at modelling its input signals, but not if it passes that threshold. Reservoirs need to be almost chaotic to be useful, at least in that particular case.
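For anyone who hasn’t met reservoir computing, here’s a bare-bones sketch of what I mean (my own minimal illustration, not any particular paper’s code; the task and sizes are made up). The recurrent “reservoir” weights are random and never trained, only rescaled so their spectral radius sits just below 1, i.e. just under the chaotic threshold; only a linear readout is fitted:

```python
import numpy as np

rng = np.random.default_rng(1)
n_res, spectral_radius = 200, 0.95

W = rng.standard_normal((n_res, n_res))
W *= spectral_radius / np.max(np.abs(np.linalg.eigvals(W)))  # tune toward the edge of chaos
W_in = rng.uniform(-0.5, 0.5, size=n_res)

def run_reservoir(u):
    """Drive the reservoir with a 1-D input sequence and collect its states."""
    x = np.zeros(n_res)
    states = []
    for u_t in u:
        x = np.tanh(W @ x + W_in * u_t)
        states.append(x.copy())
    return np.array(states)

# Toy task: predict the next value of a noisy sine wave.
t = np.arange(2000)
u = np.sin(0.1 * t) + 0.05 * rng.standard_normal(len(t))
X = run_reservoir(u[:-1])
y = u[1:]

# Linear readout fitted by least squares - the only trained part.
W_out, *_ = np.linalg.lstsq(X, y, rcond=None)
pred = X @ W_out
print("readout MSE:", np.mean((pred - y) ** 2))
```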

2 Likes

I remain convinced that the key difference is that the animal brain is continually creating models of reality. You (or a rat) look at a problem, try a few things, pick a known model that might work, tweak it a bit, solve the problem and move on. Next time that model gets used earlier and the solution is found sooner.

Until our AI builds explicit models of reality, it will continue to fail on many things we do so well. And we have absolutely no idea how to do that, do we?

2 Likes

Yes, actually we have one grand model of reality and update it here and there - wherever it fails. I won’t say we have a whole idea of how that works, but even the fact that we know that only small incongruent parts are updated might be a useful clue in an attempt to replicate it.

try a few things, pick a known model that might work, tweak it a bit, solve the problem and move on.

Yes, I think this is the case, yet on a massive scale. My assumption is that out of the ~100M columns, 1% are stabilized, shaping the grand “world model”, while the remaining 99M are speculating. It’s another kind of search engine, but it searches not only within the stable model: in a massively parallel manner it searches for correlations between the model and details within current experience through raw speculation. Maybe not raw (as in random) but loosely informed by prior knowledge.

And I think the above assumption isn’t too hard to test, in code.

2 Likes

Think of it this way: when you see a maze (or are in one), the grid cells fire and you have a spatial understanding. You know things which are extremely basic, but the network does not. For instance, a single look at the maze and you can deduce the correct path. But why should you go towards the vertical stem of the I-maze? Why not just keep going horizontally? Well, the answer is obvious - you would hit the wall.

Now why would I not progress on hitting the wall? If I press the right arrow, then I move. But near that wall, this rule breaks down. Why? Because walls are immovable and there’s no way you can pass through them. I.e., at the edge of the maze you simply cannot traverse, let’s say, the blacked-out area.

These are assumptions. You make millions of them every time, every day. There are certain priors in this world - such as that things repeat. If I drew a pattern here:
1,2,1,0, -1, -2, _, _

Most would answer that the blanks are -1, 0. But that’s a prior. This is like a sine wave, and we find that in this world there’s lots of periodic motion. The sun rises and sets. Day and night follow, etc.

But Imagine an Alien who lived in a mirror world. As in, their world was covered by mirrors and they survived there. In that world, things aren’t periodic but are symmetrical. Mirror images of each other. So they might look at it, and say the solution is obviously:

1,2,1,0, -1, -2, ||| -2, -1, where ||| is a mirror.

From their POV, the solution is obvious too. Why should it repeat, when it should clearly continue to form a symmetry?

The exact same happens here. You, as a human, can tell instantly what the correct way to solve the maze is, because of those same priors. Here’s an idea - what if I give the agent +1000 reward if it hits the wall 3 times in a row? A human would never figure that out - why would you intentionally make the same mistake thrice? But the agent would, due to the simple \epsilon-greedy strategy which explores the environment. The agent doesn’t think it’s stupid, because it doesn’t know how things work. That’s what it’s finding out. For it, hitting the wall makes perfect sense if it gets the +1000 reward.
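For reference, \epsilon-greedy is nothing more than this (a generic sketch, not any particular paper’s code; the action values are hypothetical):

```python
import random

def epsilon_greedy(q_values, epsilon=0.1):
    """Pick an action: explore with probability epsilon, otherwise exploit."""
    if random.random() < epsilon:
        return random.randrange(len(q_values))                    # explore: try anything, even the wall
    return max(range(len(q_values)), key=q_values.__getitem__)    # exploit: best known action

# Hypothetical action values for [up, down, left, right] in one maze state.
q = [0.1, -0.2, 0.0, 0.4]
actions = [epsilon_greedy(q, epsilon=0.2) for _ in range(1000)]
print("fraction of 'right' picks:", actions.count(3) / 1000)
# With epsilon = 0.2 the agent still takes a random action 20% of the time,
# so "stupid-looking" moves are a built-in part of how it discovers rewards.
```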

So every time you let the agent loose, it’s spending most of its iterations learning that. “OK, pressing right moves me in this vague direction. Black areas are some weird things which I can’t escape from, but which waste time when I collide with them - avoid them. I travel this area of pixels to reach the green one. Oh! I just got a reward. Maybe reaching those pixels gets me the reward?”

Which is why transfer learning helps so much. The agent already has some spatial and visuo-temporal priors, maybe even a map-like representation (even simple DQNs learn that, according to the literature). It’s why sample efficiency is directly dependent on priors. Because assumptions help - a lot more than you think.

Oh, but that’s the whole point of these DTs, my friend. Priors do apply to other tasks; pre-trained networks a) learn faster, b) are more accurate, and c) often outperform the very demonstrators they were trained on. See the MGDT I cited above.

This behavior doesn’t even require huge scale to study!

2 Likes

You should look more carefully at the I (or H) labyrinth problem. Having spatial priors can’t increase an agent’s likelihood of winning above 50%.

And I don’t think a decision transformer pre-trained on 1B (!!!) Atari gameplay experiences would need many fewer samples to play decent chess than the same millions of games it takes to reach the same level of performance training on chess from scratch.

PS Chess is hard, just let a DT (pretrained or not) play the H labyrinth.

1 Like

A little busy these last two days so haven’t been able to follow this thread fully.

cezar_t, March 6:

robf:

This is something which the language problem constrains you to nicely. It constrains you to think about a cognitive problem as cause and effect.

Yes, there’s something to causality, but it might be deeper. I don’t think it is our language skills that shape our minds to conceive causality.
I think the assumption of causality is one of those important priors which shapes everything else about our minds, language included.

Agreed.

I don’t think it is that our language skills shape our minds to think causally. I just think language might be the simplest place to observe this more fundamental causal bias in our cognition. I believe it is most transparently visible in language, because language is the data set most directly generated by the brain itself. It makes sense that the brain would pare down unnecessary information as much as possible in the signal which it produces itself, to be consumed by another perceptual system of the same kind.

So I think a more fundamental bias is just more easily seen in language.

But yes, I don’t think language is fundamental at all. (I’ve actually spent a lifetime learning different languages to explore that idea for myself, subjectively!)

The big question is how one could engineer a speculative causality prior in a machine learning framework. I call it speculative because that’s what it is: we make assumptions, and when one is confirmed we take it as “correct” for as long as it fits the data.

I think it might be easy to “engineer a speculative causality prior in a machine learning framework”. We can do it with the path shown by the simple example of language in the first instance: just build a network of perceptual stimuli. Observations. Observations in sequence. In the first instance build a network of language sequences. Because it is a very simple example. Not because it is anything fundamental.

Then the “speculative causality” might just be groupings in that network of perceptual observations (energy maxima/minima) which tend to share predictions - i.e. appear to have shared causality.
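A toy sketch of that first step (purely illustrative: tiny made-up sentences, and “shared prediction” reduced to shared neighbouring words):

```python
from collections import defaultdict

# Tiny toy corpus (illustrative only).
sentences = [
    "the cat chased the mouse",
    "the dog chased the cat",
    "the cat ate the fish",
    "the dog ate the bone",
    "a cat chased a bird",
    "a dog chased a ball",
]

# Build a network of observed sequences: for each word, the set of
# (previous word, next word) contexts it appears in.
contexts = defaultdict(set)
for s in sentences:
    words = s.split()
    for i, w in enumerate(words):
        prev_w = words[i - 1] if i > 0 else "<s>"
        next_w = words[i + 1] if i < len(words) - 1 else "</s>"
        contexts[w].add((prev_w, next_w))

def shared_prediction(a, b):
    """Overlap (Jaccard) of the contexts two words appear in."""
    ca, cb = contexts[a], contexts[b]
    return len(ca & cb) / len(ca | cb)

# Words that predict (and are predicted by) the same things group together:
print("cat ~ dog   :", shared_prediction("cat", "dog"))
print("cat ~ chased:", shared_prediction("cat", "chased"))
```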

Regarding the chaos discussion - that could be a fertile one … we’re better off moving it to another topic. Where I encountered it was in reservoir computing: the closer to the chaotic threshold a reservoir is, the better it gets at modelling its input signals, but not if it passes that threshold. Reservoirs need to be almost chaotic to be useful, at least in that particular case.

Great! Yes. Reservoir computing. Total agreement. That’s my current guess at an origin story for “cognition” (along the energy minima/maxima, prediction-enhancing grouping lines I’m describing). I imagine cognition started as a kind of “echo state machine” over even very simple early nervous systems, imprinting events and consequences, causes and effects, as a kind of “echo mechanism”. And then evolution simply enhanced that cause and effect echo state network, by evolving to enhance the “echo” mechanism. Some kind of simple amplification of the “echo”. Amplification by “stacking” events which shared causes and effects. And that “enhancement” is what we now call “concepts”. Proto-“concepts” might just be groupings of things which tend to share causes and effects. And found by seeking energy minima in networks of sequences of observations.

For language, those “stacks” would be hierarchies of groupings of words which share causes and effects within a text. That’s what linguists found useful for building “grammars” for unknown languages way back in the '20s/'30s. And it will be what is implicitly found in transformers. But the general case might apply to any kind of perceptual sequence. Tactile, whatever. (I recall it was one of Jeff Hawkins’ insights for HTM that touch also requires the skin to move back and forth over a rough surface. Perceptual sequence, but perceptual sequence for touch, this time. You don’t perceive texture unless you move your finger… So HTM was on to this importance of sequence thing, early… Also, we don’t have any general explanation yet for why our visual system resolves images using saccades…) It would just be more easily identified for language. Only olfaction might be an exception; I don’t think olfaction uses sequence. (Though my namesake Walter Freeman did a lot of interesting work on chaotic behaviour in olfaction networks… Just not sequential??)

These groupings (starting with language) will be what are “learned” by transformers. But transformers assume the energy surfaces described by such groupings are static and can (must!) be found using gradient descent. In the context of reservoir computer/echo/liquid state machines, the energy surfaces described by these prediction enhancing “stacks” of shared cause and effect observations, would not be static. They might correspond more to dynamic clusters such as those described in the Brazilian paper.

Anyway, good, yes. Some interesting similarities of thinking. Would be happy to explore it further.

Happy to contribute to a separate thread on any of these themes if you want to create one.

3 Likes

I’m also very interested to hear more of your thoughts regarding language, causes & effects, and etc., already felt a great deal of inspiration from your posts in this thread!

Maybe the semantics part can be fundamental? As the syntax on the surface and even rule-based grammar are not apparently fundamental.

I have “rational thinking” compared to “daily language usage” in my mind, e.g. one usually says “there’s a straight line sharply rising”, but with formal math language, one can give y = 5x + 3 to state it more precisely. I suppose math, as a language, is one (or even more) orders/levels higher than the usual “natural languages”? But semantically one can always answer that y is 48 when asked “what if x is 9”? Maybe less accurately and with more vague phrasing, but as well as the in-situ pragmatics demands.

What if “the language skill” is “causality thinking” itself? Language is but the vehicle for communication, and the mere goal of communication is to change your counterpart’s mind somehow. Then what else can it be, other than your communication be the “cause”, and the mind-change of your counterpart be the “effect”?

Further, for a successful communication, i.e. to convince your counterpart of any idea, can causal relations be avoided entirely? I suspect not. Obviously not when telling someone rules/laws, and even when telling some fact, there is an implication that “since you live in this world, you must be interested in that the world has …”. Even when telling some fictional fact, the implication can be: “in case you enjoy thinking about subject X, it appears …”.

“Causality” makes people “believe”, if not with causality semantics, I doubt one’s mind can be changed any other way.

Along this line of thought, I would suggest our (humans’) modeling of the world is not a compact/precise description of the underlying (possibly chaotic) function at all, but a (good enough) approximation. We can apply Newtonian physics perfectly okay for daily living; even though we know it’s incorrect (compared to general relativity, and quantum mechanics in turn, and …), we can still believe it’s universally true where it applies.

We humans are capable of “learning” where the chaotic boundaries are, and enjoy smooth (trivially approximatable) continuums between them. I suggest that there are “deep knowledges” which are about where/when/how (including at what scale) some “shallow knowledge” applies. Contemporary AIs seem to have learnt the shallow knowledges well, but not at all the deeper ones.

1 Like

The 1B refers to the parameter size, not dataset :slight_smile: For comparison, 1B models are so small that you can train them on your own on gaming hardware. A free Colab can get you to ~2-3B easily.

And yes, if you do that pre-training and fine-tune on some other game you do require less demonstrations. That is “positive-transfer” and there are overwhelming results for it.

Coincidentally, there was a work published today which happened to prove all those claims for embodied multi-modal LMs: https://twitter.com/DannyDriess/status/1632904675124035585

  1. PaLM-E exhibits positive transfer, outperforming SOTA on a VQA (Visual Q/A) task by ~10% more.
  2. Scaling leads to a drastic drop in catastrophic forgetting; despite being fine-tuned on a totally different domain, it retains NLU/NLG performance proportional to the scale of the model.
  3. LLMs can handle a variety of domains and cross-transfer between them, again with scale. PaLM-E for instance handles language, images, an embodied robot, and a combination of either very well.

and the kicker is that PaLM isn’t even the best model - we’ve yet to Chinchilla scale it, or apply MoD training which further transcends scaling laws with a mere fraction of the compute (often ~0.1% of what’s employed in the base LLM)

2 Likes

I am agreeing with @complyue on this:
I’m also very interested to hear more of your thoughts regarding language, causes & effects, and etc., already felt a great deal of inspiration from your posts in this thread!

I find your proposal that HTM’s predictive properties may be a good base on which to construct a transformer-like architecture very interesting. I see the higher-level connection structures of the brain as being a little like what I understand about multi-head attention, so this fires my imagination.

1 Like