Have you considered that, in the search for “underlying rules,” part of the problem is that much of the human network’s training arises from embodiment? Since an AI program lacks this grounding, it is flailing around trying to recreate the portion of human learned experience that depends on it.
Look to this paper to see that much of semantics is grounded in the parts of the brain associated with how we sense and control our lived experience.
The functions are distributed over many processing centers, with significant portions split between the grammar store (motor portion) and the object store (sensory portion). The filling-in process of speech generation seems to me somewhat related to the Hopfield model mentioned above, with various processing areas providing cues and constraints as the utterance is assembled. I see the cues as a combination of external perception, internally stored precepts, and the current contents of consciousness; the constraints are stored precepts.
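To make the “cues and constraints” picture concrete, here is a minimal classical Hopfield sketch in Python. This is my own toy illustration, not anything from the paper: the stored patterns play the role of constraints, a corrupted input plays the role of a partial cue, and the update dynamics fill the cue in toward the nearest stored pattern.

```python
import numpy as np

# Minimal classical Hopfield network: stored patterns act as constraints,
# and a partial cue is "filled in" toward the nearest stored pattern.
# The patterns and sizes below are arbitrary toy values for illustration.

def train(patterns):
    # Hebbian outer-product rule; diagonal zeroed so units don't self-excite.
    n = patterns.shape[1]
    W = sum(np.outer(p, p) for p in patterns) / n
    np.fill_diagonal(W, 0)
    return W

def recall(W, cue, steps=10):
    s = cue.copy()
    for _ in range(steps):
        s = np.sign(W @ s)   # synchronous update sweep, kept short for brevity
        s[s == 0] = 1
    return s

patterns = np.array([
    [1, -1, 1, -1, 1, -1, 1, -1],
    [1, 1, -1, -1, 1, 1, -1, -1],
])
W = train(patterns)

cue = patterns[0].copy()
cue[4:] = 1                # corrupt part of the pattern: the "partial cue"
print(recall(W, cue))      # settles back onto the first stored pattern
```

The point of the toy is only that partial, noisy input plus stored structure is enough to complete a pattern, which is the flavor of “filling in” I have in mind for utterance assembly.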
If you consider the frontal lobe as a motor-program generator driven by subcortical commands, speech is just another goal-oriented motor command sequence, guided by connectionist feedback from the sensory regions.
As is “thinking.”
Don’t forget that, since this is a motor program, the cerebellum will be providing coordination and sequencing among its various components.
If you look at this the right way, you may notice some similarity between the major components of transformer networks and how the brain is organized. The large number of processing areas (about 100 in the human brain) corresponds to the multiple attention heads. The “loop of consciousness” roughly corresponds to the transformer’s deep in/out buffer.
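To show what that analogy looks like in code, here is a toy single-layer multi-head self-attention pass in plain numpy. This is my own sketch, and I am reading “deep in/out buffer” as the residual stream, which may not be exactly what was meant: many heads operate in parallel over the same sequence, and all of them read from and write back into one shared stream that carries the state from layer to layer.

```python
import numpy as np

# Toy single-layer multi-head self-attention: several heads (cf. the ~100
# cortical processing areas) each attend over the same sequence in parallel,
# and their combined output is added back into one shared residual stream
# (the "in/out buffer" above). Dimensions and weights are arbitrary.

rng = np.random.default_rng(0)
seq_len, d_model, n_heads = 6, 32, 4
d_head = d_model // n_heads

x = rng.normal(size=(seq_len, d_model))          # the shared stream
Wq, Wk, Wv, Wo = (rng.normal(scale=0.1, size=(d_model, d_model)) for _ in range(4))

def softmax(a, axis=-1):
    a = a - a.max(axis=axis, keepdims=True)
    e = np.exp(a)
    return e / e.sum(axis=axis, keepdims=True)

def split_heads(m):
    # (seq, d_model) -> (heads, seq, d_head)
    return m.reshape(seq_len, n_heads, d_head).transpose(1, 0, 2)

q, k, v = split_heads(x @ Wq), split_heads(x @ Wk), split_heads(x @ Wv)
attn = softmax(q @ k.transpose(0, 2, 1) / np.sqrt(d_head))   # per-head attention weights
heads = attn @ v                                              # per-head outputs
merged = heads.transpose(1, 0, 2).reshape(seq_len, d_model)

x = x + merged @ Wo    # every head's contribution is added back into the stream
print(x.shape)         # (6, 32): the same buffer, updated for the next layer
```

The structural similarity I am pointing at is only this: many specialized units working in parallel, all coupled through a single shared read/write medium.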