Words to SDR?

I travel extensively for my job and have been doing so since the early 1990s. This included frequent visits to China and it became useful to learn Mandarin Chinese. I used the Pimsleur course and eventually was able to function on my own in day-to-day interactions. Immersion in the language and culture really drives this home. I can read and write a bit but my tones are terrible. I understand what you are saying about Chinese stroke order and character formation. I am struck that in Chinese many of the pictograms are actually reasonably good renditions of the thing that they are communicating.

Some examples:
火 - fire; I love this one = stick-figure person running around FIRE!
水 - water; hard to see the original picture - see chart below.
山 - mountain
口 - mouth/opening
品 - goods/commodities; a pile of boxes
門 - door; western movie bar doors anyone?

In some cases, the original picture has evolved to be difficult to recognize - for example - water (shuǐ), fourth line in this chart.


(evolution/versions of Sun, moon, mountain, water, rain, wood)

This is an ongoing process. For example, the door has evolved from the swinging bar doors 門 of the traditional Chinese to the more generic 门 in simplified Chinese.

The combination of these symbols is also somewhat based on more than just strokes or radicals; they often tell a short story.

日 - Sun (rì) - also used to say “day” things, with 月 - moon (yuè) to say “month” things.
間 - interval, as in 時間 (shíjiān, time); you look to the door to see the sun position, learning the time.
Putting things in a door is used for other constructions:
心 - heart
悶 - Stifling - you are inside but your heart is out the door.
But marking an actual door?
出口 - exit; mouth/opening to the mountains/outdoors.
Chinese is filled with these short stories; I love this language. There can be layers of meanings in a line.

What is not found in stroke order (or in simple letter sequences or syllables for a Western script) is any sort of useful semantic or grammatical information. This information is loaded at the word/pictograph and symbol-grouping level. Any sort of generation algorithm will have to consider this level if the output is to look like it makes any grammatical sense. Languages that use conjugation have to consider word grouping to influence construction at the word level. There is no local information at the word level that tells me if the word I am building (tener in Spanish, “to have”) should be tener or tengo or tienes or tiene or tenemos or tenéis or tienen or (there are a bunch more). This strongly suggests that the higher levels will feed back to the lower levels in word construction.
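To make the conjugation point concrete, here is a minimal Python sketch. The table of present-tense forms is standard Spanish; the `SUBJECTS` table and the function names are my own illustrative assumptions, not part of any real library. The point is only that the verb form cannot be chosen from the verb alone: the selecting information comes from the word grouping.

```python
# Present-tense forms of Spanish "tener", keyed by (person, number).
TENER_PRESENT = {
    (1, "sg"): "tengo",
    (2, "sg"): "tienes",
    (3, "sg"): "tiene",
    (1, "pl"): "tenemos",
    (2, "pl"): "tenéis",
    (3, "pl"): "tienen",
}

# Hypothetical subject table: the information that selects the verb form
# lives here, at the phrase level, not inside the verb itself.
SUBJECTS = {
    "yo": (1, "sg"),
    "tú": (2, "sg"),
    "ella": (3, "sg"),
    "nosotros": (1, "pl"),
    "vosotros": (2, "pl"),
    "ellos": (3, "pl"),
}


def conjugate_tener(subject):
    """Pick the surface form of 'tener' from phrase-level context (the subject)."""
    return TENER_PRESENT[SUBJECTS[subject]]


def build_phrase(subject, obj):
    """Higher level (the phrase) feeds back into lower-level word construction."""
    return f"{subject} {conjugate_tener(subject)} {obj}"
```

For example, `build_phrase("yo", "hambre")` and `build_phrase("ellos", "hambre")` produce different verb forms from the same infinitive, purely because of the subject.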

But this will still be gibberish. Without any semantic guidance, the output of a system that follows reasonable grammar rules will look like the human affliction Wernicke’s aphasia.

Note the huge range of signs and symptoms listed. This gives some indication of the range of factors that go into speech production. The clusters of defects suggest to me that the production of words is the product of several maps working in harmony, each contributing to some aspect of the ongoing production stream.

This tells us that if a map, or the connections to it, are failing, you suffer the related defect. From a modeling perspective this suggests your functional building blocks.

A little more on structuring these building blocks: a key difference between biological hardware and computers is that computers have variables, while brains have connections.

  • An important programming task is to learn the producer(s) and consumer(s) of a chunk of information and WHEN and WHERE it is produced and consumed. If a chunk of information in a computer is needed in several places it is tucked into a storage space and accessed wherever it is needed.
  • In the brain information exists in SOME FORM in SOME PLACE in the brain. If there is a producer and a consumer there has to be a connection. If there is some order or stages to this process there has to be a physical pipeline. Parts of these pipelines may be selectively enabled to gate the flow of information but the connections are always there. If you think about it, there is no other way that neural hardware can work.
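The contrast in the two bullets above can be sketched in a few lines of Python. This is only a toy illustration under my own assumptions (the `Stage` class and its names are hypothetical, and it is not a neural model): each stage has fixed wiring to its consumers, and the only control available is a gate that enables or blocks flow. Nothing is tucked into a shared variable for arbitrary later access.

```python
class Stage:
    """A processing stage with fixed downstream connections and a gate."""

    def __init__(self, name, transform):
        self.name = name
        self.transform = transform
        self.targets = []        # fixed wiring: who consumes this stage's output
        self.enabled = True      # gate: the wiring stays, the flow can be blocked
        self.last_output = None

    def connect(self, other):
        """Wire this stage to a consumer; returns the consumer for chaining."""
        self.targets.append(other)
        return other

    def receive(self, signal):
        """Transform the signal and push it down every existing connection."""
        if not self.enabled:
            return  # the connection is still there, but nothing passes
        self.last_output = self.transform(signal)
        for target in self.targets:
            target.receive(self.last_output)


# Build a three-stage pipeline: producer -> relay -> consumer.
producer = Stage("producer", lambda x: x * 2)
relay = Stage("relay", lambda x: x + 1)
consumer = Stage("consumer", lambda x: x)
producer.connect(relay).connect(consumer)

producer.receive(3)      # flows through: 3 -> 6 -> 7 -> 7
relay.enabled = False    # gate the middle stage
producer.receive(10)     # producer fires, but nothing reaches the consumer
```

Note that the consumer can only ever see what arrives over its connection; there is no way for it to "look up" the producer's value the way a program reads a variable.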

Whenever I see someone describing an AI proposal, I keep this distinction firmly in mind to test whether it is biologically plausible. Once I started thinking this way, it shaped how I view papers describing neural research and proposed architectures.
