Mind blown!

Zipf's law.
It’s magic!
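For reference, here is Zipf's law in its usual form. Sort words by how often they occur. The frequency of the word at rank r then falls off roughly as a power of the rank, with an exponent s close to 1 for ordinary text. C is a normalizing constant, and the exact exponent varies from corpus to corpus:

```latex
f(r) \approx \frac{C}{r^{s}}, \qquad s \approx 1
```

With s = 1, the second most common word shows up about half as often as the most common one, and the tenth shows up about a tenth as often.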

I have no idea why Discourse wants to start this in the middle. You should watch the whole thing.


I blame grammar rules.

… he said quizzaciously


Still, I have to ask: for these high-repetition words, what part of speech do they serve?
How much of human language is devoted to what part(s) of speech?
Is it the same part of speech in other languages?

I, you, they, it:           pronouns; placeholders that point back to something mentioned earlier

the, a:                     articles; "the" = we both know which one, "a" = only one of us knows

and, but:                   conjunctions; set clusivity (inclusion or exclusion)

be, is, was:                verb of existence

of, to, in, for, with, on:  prepositions; form relationships between words

have:                       verb of contingent existence

as:                         conjunction; conditional or comparison

that:                       relative or demonstrative pronoun; introduces a relative clause or acts as a pointing word

They are not nouns or verbs. They help us to externalize i-language.
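One way to put a number on "how much of human language is devoted to what part(s) of speech" is the minimal Python sketch below. It tags a text using only the words in the list above and reports what share of the tokens they cover. The dictionary, the tokenizer regex, and the sample sentence (borrowed from this thread) are my own assumptions. A real study would use a proper part-of-speech tagger and a large corpus.

```python
import re
from collections import Counter

# Function words and labels taken from the list above (simplified: "that" and
# "as" can play more than one role, so each gets a single combined label).
FUNCTION_WORDS = {
    "i": "pronoun", "you": "pronoun", "they": "pronoun", "it": "pronoun",
    "the": "article", "a": "article",
    "and": "conjunction", "but": "conjunction", "as": "conjunction",
    "be": "verb of existence", "is": "verb of existence", "was": "verb of existence",
    "of": "preposition", "to": "preposition", "in": "preposition",
    "for": "preposition", "with": "preposition", "on": "preposition",
    "have": "verb of contingent existence",
    "that": "relative/demonstrative pronoun",
}

def function_word_profile(text):
    """Return (total tokens, function-word tokens, counts per part of speech)."""
    tokens = re.findall(r"[a-z']+", text.lower())
    tags = [FUNCTION_WORDS[t] for t in tokens if t in FUNCTION_WORDS]
    return len(tokens), len(tags), Counter(tags)

sample = ("I have no idea why Discourse wants to start this in the middle. "
          "You should watch the whole thing.")
total, function, by_pos = function_word_profile(sample)
print(f"{function}/{total} tokens are on the list ({function / total:.0%})")
print(by_pos.most_common())
```

Even in this short sample, a handful of list words (I, have, to, in, the, you) make up more than a third of the tokens.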

Superb, exactly the level needed to do the rest of this.

Do we have the same word list for other languages AND do we have the same level of word description in those languages?

Lists? Check!

Now on to some sort of automatic dictionary?
I feed Google Translate these words:

  1. que
  2. de
  3. no
  4. a
  5. la
  6. el
  7. es
  8. y
  9. en
  10. lo
  11. un
  12. por
  13. qué
  14. me
  15. una

and I get this back:

  1. that
  2. of
  3. No
  4. a
  5. the
  6. the
  7. is
  8. and
  9. in
  10. what
  11. a
  12. by
  13. what
  14. me
  15. one
    Repeat as needed for your favorite language.
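A first step toward an "automatic dictionary" is to pair the two lists and check where the mapping is many-to-one. This Python sketch hard-codes the two lists above and calls no translation API. It groups the Spanish words by their English output:

```python
from collections import defaultdict

# The two lists above, in order: Spanish input and Google Translate output.
spanish = ["que", "de", "no", "a", "la", "el", "es", "y", "en", "lo",
           "un", "por", "qué", "me", "una"]
english = ["that", "of", "No", "a", "the", "the", "is", "and", "in", "what",
           "a", "by", "what", "me", "one"]

# Invert the pairing: which Spanish words land on the same English word?
by_english = defaultdict(list)
for es, en in zip(spanish, english):
    by_english[en.lower()].append(es)

collisions = {en: src for en, src in by_english.items() if len(src) > 1}
for en, src in collisions.items():
    print(f"{en!r} <- {src}")
```

Some of these collisions are artifacts of translating single words out of context. For example, Spanish "a" is normally a preposition ("to", "at") rather than an article.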

Arabic: maybe harder to match up.

| No. | Lemma | Frequency | Part of speech |
|---|---|---|---|
| 1 | مِن‎ (min) | 3226 | preposition |
| 2 | ٱللَّه‎ (l-lah) | 2699 | proper noun |
| 3 | فِي‎ (fī) | 1701 | preposition |
| 4 | إِنّ‎ (ʾinn) | 1682 | accusative particle |
| 5 | عَلَىٰ‎ (ʿalā) | 1445 | preposition |
| 6 | ٱلَّذِي‎ (llaḏī) | 1442 | relative pronoun |
| 7 | لَا‎ (lā) | 1364 | negative particle |
| 8 | مَا‎ (mā) | 1266 | relative pronoun |
| 9 | رَبّ‎ (rabb) | 975 | noun |
| 10 | إِلَىٰ‎ (ʾilā) | 742 | preposition |
| 11 | مَا‎ (mā) | 704 | negative particle |
| 12 | مَن‎ (man) | 606 | relative pronoun |
| 13 | إِن‎ (ʾin) | 578 | conditional particle |
| 14 | أَن‎ (ʾan) | 578 | subordinating conjunction |
| 15 | إِلَّا‎ (ʾillā) | 558 | restriction particle |

Feed Google Translate this:

  1. مِن‎
  2. ٱللَّه‎
  3. فِي‎
  4. إِنّ‎
  5. عَلَىٰ‎
  6. ٱلَّذِي‎
  7. لَا‎
  8. مَا‎
  9. رَبّ‎
  10. إِلَىٰ‎
  11. مَا‎
  12. مَن‎
  13. إِن‎
  14. أَن‎
  15. إِلَّا‎

Get this back:

  1. From
  2. God
  3. In
  4. The
  5. On
  6. Who?
  7. No
  8. What
  9. Lord
  10. To
  11. What
  12. From
  13. The
  14. That
  15. Only

Keep in mind that this list is drawn from the text of the Quran, so it is likely to be biased toward "God".
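A rough way to check the Arabic lemma table above against Zipf's law is to anchor a pure 1/r curve on the top word and compare. This Python sketch takes the frequencies straight from the table. It is only a rough check: the table covers 15 lemmas from a single text, and the two senses of mā are counted as separate rows.

```python
# Frequencies from the Arabic lemma table above, in rank order.
counts = [3226, 2699, 1701, 1682, 1445, 1442, 1364, 1266,
          975, 742, 704, 606, 578, 578, 558]

top = counts[0]
ratios = []
for rank, observed in enumerate(counts, start=1):
    predicted = top / rank  # Zipf with exponent 1, anchored on the top word
    ratios.append(observed / predicted)
    print(f"{rank:2d}  observed={observed:5d}  predicted={predicted:7.1f}  "
          f"ratio={observed / predicted:.2f}")
```

For this table, every ratio after the first row stays above 1. Over these top ranks, the counts fall off more slowly than a pure 1/r curve would predict.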

Of course, as with everything else in neuroscience, when you look you find that someone has already been doing this. Zipf's law gets a tiny footnote here.

Nice paper on the topic:

Keep in mind that Russian has no articles, so there is no word for "the".
There is a lot of conservation across languages. 

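| Russian | English |
|---|---|
| и | and |
| в (во) | (+pr) in; (+a) into, to |
| не | not |
| на | (+pr) on, at; (+a) onto, to |
| я | I |
| он | he |
| что | what, that |
| с (со) | (+inst) with; (+g) from, off |
| это | this, that, it |
| быть (i) (pres: есть) | to be; there is, there are |
| а | and, but (slight contrast) |
| весь (f вся, n всё, pl все) | all |
| они | they |
| она | she |
| как | how, as, like |
| мы | we |
| к (ко) (+d) | towards, to |
| у (+g) | by; at (used in ‘have’ construction) |

Abbreviations: +a accusative, +g genitive, +d dative, +inst instrumental, +pr prepositional case; (i) imperfective; f, n, pl = feminine, neuter, plural; pres = present tense.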

Arabic word list: http://www.qamus.org/wordlist.htm


This is different from English.

More mind-blowing is that it can apply to almost everything. Take my cat: every day, and probably over his whole lifetime, he is 20% awake and 80% asleep.

I would think 80/20 (or 20/80) would be a good ratio for a signal-to-noise filter as well… the perfect joke is probably 80 percent predictable with a 20 percent twist.
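The 80/20 intuition connects to the Zipf curve from the top of this thread. This Python sketch assumes an idealized Zipf distribution with exponent 1, which real corpora only approximate. It computes what share of all word occurrences comes from the most common 20% of distinct words.

```python
def top_share(n_types, top_fraction=0.2, s=1.0):
    """Share of all tokens covered by the most frequent `top_fraction` of word
    types, assuming frequencies follow f(r) proportional to 1 / r**s."""
    weights = [1 / r**s for r in range(1, n_types + 1)]
    k = int(n_types * top_fraction)
    return sum(weights[:k]) / sum(weights)

for n in (1_000, 10_000, 100_000):
    print(n, round(top_share(n), 3))
```

For a 10,000-word vocabulary the top 20% of words cover roughly 84% of the text, close to the 80/20 rule. The share creeps upward slowly as the vocabulary grows.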

Thanks Mark. My head will be spinning for days.


I have read this discussion with interest, especially the interest in “words” and word frequencies in different languages. This discussion is fundamentally about how the structure of verbal behavior relates to the brain's systems. It also concerns how the brain processes and creates information from and about its internal and external environments. So I would like to point out some fundamentals that the debate has not touched on but that are nevertheless important for the questions raised and the answers provided. First of all, the brain asks questions, answers them, and creates behaviors that show what its questions and answers were or are… humans don't drink water unless the brain has asked “Am I thirsty?” or “Should I drink now to survive in the desert?” or questions of that sort. Proof of this: drinking water is not a random behavior resulting from a randomized decision process… only very few behaviors are caused by randomized decisions. Instead, a solid sequence of questions and answers lies behind most behaviors.
So “doubt” (= entropy) is fundamental, and is most probably hardwired into the structure of the brain (in the neocortex), and the subcortex is the machine delivering the answers…
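One reading of “doubt (= entropy)” is Shannon's sense of the word. In that sense, a question whose possible answers have probabilities p carries H = -Σ p log₂ p bits of uncertainty. This Python sketch reflects my interpretation, not the poster's own formalism. It computes that quantity for a few example questions:

```python
import math

def entropy_bits(probs):
    """Shannon entropy H = -sum(p * log2 p) in bits; zero-probability outcomes are skipped."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

print(entropy_bits([0.5, 0.5]))   # a fair yes/no question: 1.0 bit
print(entropy_bits([0.9, 0.1]))   # a question whose answer is nearly certain
print(entropy_bits([1 / 6] * 6))  # six equally likely answers: log2(6) bits
```

The less predictable the answer, the more doubt the question carries, and the more an answer resolves.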
Now we arrive at human languages… it is interesting that all human languages (at least the roughly 50% most important languages I have researched) use the same six question words to articulate doubt: where, what, which, why, when, how… and in Indo-European languages and Japanese these six question words are also the same, even though these two language lines separated more than 8000 years ago… so these six question words are all we need to raise qualified doubt… then we basically use nouns, verbs and adjectives to produce one-dimensional strings about multidimensional questions, answers and behaviors… the interesting thing is that there are only six question words across all languages… they are enough to qualify doubt in action (= the verbal behavior)…
We can then ask why it is possible at all to translate one language into another. This is, of course, because any language is based on asking and answering these six questions and transforming the answers into action… this is the only thing that can account for the fact that any language can be translated meaningfully into another… because every language models the same physical realities that restrict action all around the world, for all peoples. When we see linguistic novelties in the form of new words, it is because we have discovered new distinctions in nature that we can name…
So my advice to you brain scientists who want to use texts as data (to discover something about the text, the writer, the brain???)… you should start working out how one could automatically analyse how many questions and how many answers a given text poses… and how many commands for action… because this is always how texts reflect what goes on in the brain… unfortunately, most texts are incomplete and rest on unwritten assumptions and expectations… like this text: I am thirsty, go get me the usual…!!! The first sentence, “I am thirsty”, is an answer… then “go get me the usual”… is difficult… is it a question, or a command for action based on the answer to a question? As written, it is a command… and the question was “what will get me the usual”…
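This Python sketch is a very rough starting point for counting how many questions, answers, and commands a text contains. It uses surface cues only:

* A question mark or a leading question word (from the six listed above) marks a question.
* A leading verb from a small hand-picked list marks a command.
* Everything else counts as a statement, standing in for an "answer".

The verb list and the rule of also splitting on commas are hypothetical choices for illustration. Real detection of questions and imperatives needs a parser.

```python
import re

# The six question words named above.
QUESTION_WORDS = {"where", "what", "which", "why", "when", "how"}
# Hypothetical, deliberately tiny list of verbs that often open a command.
COMMAND_VERBS = {"go", "get", "bring", "give", "take", "stop", "look"}

def classify(sentence):
    """Crude label for one clause: 'question', 'command', or 'statement'."""
    words = re.findall(r"[a-z']+", sentence.lower())
    if sentence.rstrip().endswith("?") or (words and words[0] in QUESTION_WORDS):
        return "question"
    if words and words[0] in COMMAND_VERBS:
        return "command"
    return "statement"

def profile(text):
    """Split after sentence punctuation and commas, then count each label."""
    parts = [p.strip() for p in re.split(r"(?<=[.!?,])\s+", text) if p.strip()]
    labels = [classify(p) for p in parts]
    return {label: labels.count(label) for label in ("question", "statement", "command")}

print(profile("I am thirsty, go get me the usual!"))
print(profile("Where is the water? I am thirsty."))
```

On the example from the post, this labels "I am thirsty" a statement and "go get me the usual" a command. It cannot recover the unwritten question behind the command, which is exactly the hard part described above.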
It is analysis of this kind that will make the correct splits of texts and sentences. This holds especially if the analysis can end up asking the relevant questions that let the machine construct a text without any hidden assumptions and expectations… this will be the first step for AI language machines: find and ask the questions… then receive the answers… and then figure out the actions… All this will be based on a logical theory of language, which I have shown is about Q/A and solving doubt… then on empirical data from language users… and only then on empirical data from decision makers and actors about how to act…
Finn Gilling


I bet this is because these words are often stored as predictive strings… differently in different languages. Just a thought… could we do a functional MRI looking directly at the language areas of the cortex in a bilingual or multilingual subject? I expect that, as language is learned, these predictive strings form in similar ways across languages. This might explain phonics… sounds and letter combinations form into repeated words, and words into repeated strings (mannerisms of speech), and so on… As I think I mentioned earlier, a baby's first word could be predicted by observing the most common word and object exposures and the baby's interactions with those items, then projecting them onto an SDR. I think conventions in writing are probably just the brain's way of doing data compression on strings of commonly used words, and these develop into mannerisms of speech.
I am thinking particularly about how my father constructs speech in German in ways very similar to his English speech patterns.
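One concrete reading of "predictive strings" is an n-gram model: for each word, remember which words tend to follow it. That reading is my assumption, not necessarily what is meant above. Here is a minimal bigram sketch in Python, run on a made-up toy token sequence:

```python
from collections import Counter, defaultdict

def build_bigrams(tokens):
    """Map each word to a Counter of the words that immediately follow it."""
    following = defaultdict(Counter)
    for prev, nxt in zip(tokens, tokens[1:]):
        following[prev][nxt] += 1
    return following

def predict_next(following, word):
    """Most frequent word seen after `word`, or None if `word` was never seen."""
    counts = following.get(word)
    return counts.most_common(1)[0][0] if counts else None

# Toy token stream (made up for illustration, not real speech data).
tokens = "go get me the usual . go get me a drink . go get me the usual".split()
model = build_bigrams(tokens)
print(predict_next(model, "me"))  # "the" follows "me" twice, "a" only once
```

Frequent strings like "go get me" become highly predictable, which is one simple sense in which common word sequences can be compressed.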


As for using fMRI to match words to cortical locations, this paper and its references should be a good start if you are interested in pursuing this line:


Has anyone ever tried to rank animal behaviours the same way as words? I bet they follow the same curve, because it's a language.
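The rank-frequency procedure for behaviour data would be the same as for words: count each behaviour label, sort the counts, and fit the slope on a log-log plot. This Python sketch uses an ordinary least-squares fit. That is a common but fairly crude way to estimate a Zipf exponent; maximum-likelihood methods are usually preferred for real data. The behaviour labels and counts below are synthetic, built to follow 1/r exactly, and are not observations of any animal.

```python
import math
from collections import Counter

def rank_frequency(events):
    """Count event labels (words, behaviours, ...) and return counts sorted high to low."""
    return sorted(Counter(events).values(), reverse=True)

def zipf_exponent(counts):
    """Least-squares slope of log(count) vs log(rank); returns s in f(r) ~ C / r**s."""
    xs = [math.log(r) for r in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    mean_x, mean_y = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var = sum((x - mean_x) ** 2 for x in xs)
    return -cov / var

# Synthetic check: hypothetical labels with counts 600/r, so s should come out as 1.
labels = ["sleep", "groom", "eat", "explore", "play", "hunt"]
log = [label for rank, label in enumerate(labels, start=1) for _ in range(600 // rank)]
print(round(zipf_exponent(rank_frequency(log)), 3))
```

With a real ethogram, you would replace `log` with the recorded sequence of behaviour labels.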

Emotions are the language before there are words, and behaviour is the externalized manifestation of this language. The neocortex has to take this emotional input because it's baked in. That, I think, makes sense of a lot of human nature.
Almost like having to be backward compatible with DOS… ha ha… little computer joke…