Commonality of language - self-emergent learning

Yaqiong Xiao, Angela D. Friederici, Daniel S. Margulies, Jens Brauer, "Development of a selective left-hemispheric fronto-temporal network for processing syntactic complexity in language comprehension," Neuropsychologia, Volume 83, Pages 274-282, ISSN 0028-3932 (via ScienceDirect).


A lot of reading and going down various rabbit holes of referenced documents, but I still can’t find anything that matches.

How easy is it with previous methods to self-segment with no input other than the text (and have it work on any language)? My process is a relatively simple recursive decay/inhibit method, which I think is closer to how part of the hippocampus works, in that it does not need to know a cortical column’s contents or attributes, nor even that some columns exist. The groups then grow on their own into sub-groups (some of which are map areas). This is just the hippocampus component.
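The post doesn’t include code, but a toy sketch of one way a recursive decay/inhibit grouping could look (purely an illustration on my part, not the poster’s actual method): every step all word activations decay, the current word gets a boost, and only words that recur densely enough to outrun the decay cross a threshold and join the high-frequency “attention” group.

```python
def decay_inhibit_groups(tokens, decay=0.9, boost=1.0, threshold=5.0):
    """Toy leaky-accumulator grouping (illustrative only).

    Each step, all activations decay; the current token gets a boost.
    Words recurring densely enough to outrun the decay cross the
    threshold and join the high-frequency 'attention' group."""
    activation = {}
    group = set()
    for tok in tokens:
        for w in activation:          # global decay / inhibition
            activation[w] *= decay
        activation[tok] = activation.get(tok, 0.0) + boost
        if activation[tok] >= threshold:
            group.add(tok)            # survived the decay: a frequent word
    return group

# Synthetic stream: 'the' recurs constantly, filler words appear once.
stream = []
for i in range(40):
    stream += ["the", "word%d" % i]
print(decay_inhibit_groups(stream))   # only 'the' survives the decay
```

With these parameters a word recurring every other token converges on an activation of about 1/(1 - 0.81) ≈ 5.3, so it crosses the threshold, while one-off words never exceed 1.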


This is one of the initial segmentations/groups that I now get out of my process reliably with nearly any quality of text input. This group (before further segmentation) coincidentally maps to articles, demonstratives, determiners, etc.

For a given block of text this is the underlying word frequency.

Word / Count
you 18,181
the 106,332
he 18,522
in 25,124
to 51,298
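Frequency counts like these can be reproduced in a few lines of standard-library Python; a minimal sketch:

```python
from collections import Counter
import re

def word_frequencies(text):
    """Lower-case the text and count each word's occurrences."""
    return Counter(re.findall(r"[a-z']+", text.lower()))

freqs = word_frequencies("To be or not to be, that is the question.")
print(freqs.most_common(2))  # 'to' and 'be' top the list, 2 each
```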

Why I think this is my dancing-around-the-campfire moment: I believe this is part of the “attention” mechanism for the HTM process, which, together with the other groups, allows the HTM structure to be created from the ground up. It forms the basis of the learning signal to the cortex!

The hippocampus knows what and the cortex knows where.

Some of the words in the list have no “meaning”, other than to biologically direct attention beyond the hippocampus. This is where I think our interpretation (and groupings) of language fails: we assume all language has a coherent meaning “in” the cortex only, and have not factored in the real biological mechanism in aggregate (which is inconsistent across cortex, hippocampus and other structures).

One further group that evolves then directs attention towards working memory rather than the broader cortex, as part of the way recall reinforcement works (e.g. “his”), allowing constructs to take part in the “evaluation” process of cortical thought later on.

This type of segmentation I believe is critical to the way HTM works, because it is how the differing maps can integrate into a coherent structure.

To me language does not appear as an interpretation of what’s a noun, adjective, etc.; it appears as a particular sequential pattern of constructs that breaks the existing “rules” and assumptions.

The groupings seem to work for me, at least for the first million or so words so far.

Maybe just my conjecture / evolving working theory at the moment, but hopefully a kick for some to think outside the box and not to follow the herd. Back to coding again…

I made a prototype of a model that does this (but not any of the other things which you wanted to do). I made a presentation about it here: Video Lecture of Kropff & Treves, 2008 . At the end of the presentation I demonstrate segmenting a string of letters into distinct words.
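Not the presenter’s model, but the standard trick that kind of demo illustrates can be sketched in a few lines: estimate letter-to-letter transition probabilities from the stream itself, then cut wherever predictability drops. The three-word artificial alphabet below is my own assumption for the demo.

```python
import random
from collections import Counter

def transition_probs(stream):
    """P(next letter | current letter), estimated from the stream itself."""
    pairs = Counter(zip(stream, stream[1:]))
    totals = Counter(stream[:-1])
    return {(a, b): n / totals[a] for (a, b), n in pairs.items()}

def segment(stream, threshold=0.5):
    """Cut the stream wherever the observed transition probability drops
    below the threshold -- low predictability marks a word boundary."""
    probs = transition_probs(stream)
    words, current = [], stream[0]
    for a, b in zip(stream, stream[1:]):
        if probs[(a, b)] < threshold:
            words.append(current)   # boundary: close the current word
            current = b
        else:
            current += b            # predictable: keep extending the word
    words.append(current)
    return words

random.seed(1)
stream = "".join(random.choice(["abc", "def", "ghi"]) for _ in range(300))
print(sorted(set(segment(stream))))  # the three 'words' fall out
```

Within-word transitions are fully predictable (probability 1.0), while each cross-word transition splits its probability mass three ways, so the cuts land on the word boundaries.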

Interesting presentation, thanks. I realise that I’m looking at the process a bit differently, and another difficulty is that I’m not going the SDR route, which makes communicating some of the aspects a bit harder. Plus I just try to implement ideas in code to see what works, so all I see is code; my ability to write the maths at this level is near zero, but I can still code what I can’t formalise.

Interesting results.

I like the work you’re doing modelling the brain at the neuron level; it’s awesome, and provides some very profound insights.

I had not thought of the decay wave as a conventional filter in the frequency domain (with a temporal window effect); I had not seen it that way before, although now I clearly do. I know filters from radio, so it allows a different perspective when looking at the code. Thanks.

Your chart at 15:55 looked sort of similar, but very different underneath. My initial testing looked a bit like your chart, with over 5bn entries on the horizontal axis and a temporal feed-forward prediction through all 5bn entries. I realised later that it would not work, though, because the temporal structure was incorrect: it lacked sufficient structure to actually be useful in a PFC-type process (thought). Good for basic recall, though. At that scale some other effects and complexities start to occur as well.

The bigger difference is that I’m not looking at changing weights in an existing network, but rather at creating the network as the learning process itself (which is where the hippocampus is key). I see the brain as a just-in-case biological probability network, which ends up with massive over-provisioning to allow for “just in case” it’s needed (fire together, wire together: proximal probability). With computers we can skip this pre-provisioning overhead and look at it in reverse: wire only what is needed, and the resulting aggregate structure ends up comparable, just without the noise and with the absolute minimum of connections. The update method/process can then change as well, reducing compute significantly.
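The “wire only what is needed” idea can be shown with a toy sketch (again my own illustration, not the poster’s implementation): instead of pre-allocating a dense weight matrix, a connection is created the first time two items co-occur and merely reinforced on repeats.

```python
from collections import defaultdict

class GrowOnlyNet:
    """Toy 'wire only what is needed' network: no pre-provisioned
    weight matrix; an edge exists only once its items have co-occurred."""
    def __init__(self):
        self.edges = defaultdict(int)  # (a, b) -> co-occurrence count

    def observe(self, a, b):
        self.edges[(a, b)] += 1        # wire (or reinforce) on demand

    def connections(self, a):
        """All targets ever wired from `a`, with their strengths."""
        return {b: n for (x, b), n in self.edges.items() if x == a}

net = GrowOnlyNet()
for a, b in [("the", "cat"), ("the", "dog"), ("the", "cat")]:
    net.observe(a, b)
print(net.connections("the"))  # {'cat': 2, 'dog': 1}
```

Storage grows with what was actually seen, so the structure is the learning history itself, with no noise connections to prune later.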

The process that I’m experimenting with starts off with dominant excitatory signals; as the model learns, what is learnt then creates a larger source of inhibitory signals (new connections), which from my reading of some papers seems to more closely match what happens biologically in the early hippocampus. This also seems to fit with why we can learn languages more easily as children: this learnt wiring ends up having to offload unknown patterns to the cortex. The dominant inhibitory framework prevents some new language patterns from being recognised properly in the hippocampus, so the cortex has to do it when we try to learn a language as adults. Sleep replay then tries to help the hippocampus change, because it creates the only pattern of weakening that allows changes to occur.

The profound conjecture at this stage would be to say that the inhibitory layer in the thalamus is created in a similar manner in relation to the cortex structure learning triggered by the Hippocampus. This I have yet to implement because it is the higher level abstractions that trigger the inhibitory connection requirements. Still much to think about on this because the emotional chemical basket effects complicate things.

This hippocampus process I think is part of what Chomsky sees as the missing universal grammar; only, in reality, the words (concepts) in the cortex don’t really have a coherent grammatical consistency as such (green ideas sleeping furiously; had had had had), but the hippocampus does have the abstraction of a universal grammar of sorts, and it is the resulting structure that is key. This is the only coherent universal-type grammar/structure, and it’s in the hippocampus.

This does end up with a “universal” grammar of sorts; it’s just impossible to define with any consistency against our tagged words, because they are completely incompatible. This is where I’m at complete odds with the existing order of things… Maybe just my conjecture, and still a (million-1) to 1 against working as a whole, but I really do hope this triggers some other lightbulb moments and good debate… so back to more coding.


Are you familiar with bouba and kiki? My conjecture is that that is where we will find universal grammar.

I had not, so it was an interesting read. However, I believe that what is being observed is commonality in associated learning patterns in the cortex (recursive evaluation), rather than what I think is the universal grammar of sorts via the hippocampus. Aggressive, sharp, shock sounds also tend to be inversely associated with calm (the ability to sleep through a sound being a key divisor), so they become emotionally associated (trigger a chemical release), which adds further recursion or feedback within an evaluation. Any sound pattern that triggers adrenalin, for example, may well not be considered “calm” or “soft” upon evaluation, regardless of whether the audible pattern has any real associated meaning.

The emotional attachment to the learning of concepts I believe is fundamentally critical to the provision of an evaluation mechanism. Words would otherwise all “mean” the same, and just end up as a pattern that the likes of GPT-3 can throw out without any emotional coherence as to meaning in context.

Ok, so here goes… given a Star Trek book to start with and no input tagging of any sort… a caveman attempts to light a fire (maybe in the rain).

= = = = ATTENTION = = = = = == = CONSTRUCT = = = = = = CONCEPT = = = = =

Attention - abstract reference (thalamic inhibitory relaxation) or working memory recursion. These words are intended as a directional signalling mechanism. Other than that they mean nothing; they only mean something to the biology of the hippocampus in relation to universal grammar processing. “They” triggers a recursive sensitivity amplification in working memory to reinforce the associated active memory fragments. “They” is learnt by the hippocampus. Words like “Constantinople” mean nothing to the hippocampus, only to the cortical HTM structures.

Construct - The type by which the cortex pathways are created (HTM structure) in a single-pass feed-forward manner, which is biologically plausible via an ART mechanism for filtering the targets. The constructs I believe end up like recursive cross-linked graphs that wire up probabilistic pathways like Markov chains, but with a feed-forward element to alter the probability dynamically, plus a recursive element. Thought is then effectively multiple activations of consistent pathways through the cortex, which jump chains via recursive association (cortical column to cortical column sequencing).
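A minimal sketch of a Markov-style pathway with a recursive bias (the class name, boost factor and API are my own illustrative choices, not anything from the post): transition counts are learnt in a single pass, and recently active tokens get their transition scores boosted, so the same chain can be steered dynamically.

```python
from collections import defaultdict, Counter

class SequenceChain:
    """Toy Markov-style pathway: transition counts learnt in one
    feed-forward pass, with a simple recursive bias -- tokens that are
    already 'active' get their transition scores boosted."""
    def __init__(self):
        self.next_counts = defaultdict(Counter)

    def learn(self, tokens):
        for a, b in zip(tokens, tokens[1:]):
            self.next_counts[a][b] += 1   # wire the pathway a -> b

    def predict(self, token, active=()):
        """Most likely successor; active tokens get a 3x boost."""
        scores = {b: c * (3.0 if b in active else 1.0)
                  for b, c in self.next_counts[token].items()}
        return max(scores, key=scores.get) if scores else None

chain = SequenceChain()
chain.learn("the cat sat on the mat the cat ran".split())
print(chain.predict("the"))           # 'cat' (seen twice after 'the')
print(chain.predict("the", {"mat"}))  # 'mat' -- the boost redirects the pathway
```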

Concept - The associated items/locations to be connected together (constrained by probabilistic connection proximity in biology - why learning does not always happen in one pass). These are recursive triggering associations based on the existing HTM structures. Within a computer there is no constraint of connection proximity so learning is then single pass.

The way I am looking at the language side is to completely forget existing POS groupings and structures, and make it an abstract split between hippocampal biological mechanisms and cortical recursive evaluation of HTM structure. The structures end up different as well, because they are recursively linked/associated. How language is structured is however and whatever the word sequences happen to be for any given type of language. Language is just a set of constructs of concepts; it has no inherent pattern beyond the hippocampal mechanism.

Does this sound anything remotely sensible, or have I lost the plot? It seems to fit with my thoughts and system at the moment (maybe confirmation bias, lol).

My knee-jerk reaction answer is ‘No’. However, under a different context; i.e., NCC, it might be sensible and even encouraged.

Pan troglodytes, which may become Homo troglodytes someday, have a near-identical brain structure to ours: a little bit smaller, no torque, but very similar nonetheless, at least when it comes to HTM. What they lack is language: real, human syntactical language. We do not have it either until after we learn it. They can’t even learn it, and that is where a lot of ‘don’t bother going here’ research gold can be found.


Another set of articles to read and ponder… had not heard of NCC.

Two aspects of interest… one based on publication quotes (google search the quotes) and the other conjecture from looking at active scans and the dog…

  1. Arcuate Fasciculus
    “arcuate fasciculus that is much smaller or absent in nonhuman primates.” and “Modern neuroradiological studies suggest that the AF connects posterior receptive areas with premotor/motor areas, and not with Broca’s area.”
  2. I believe that the buffer in the hippocampus also needs to be of a certain temporal dimension before linguistic understanding can occur at the level “we” understand, because of the nature and structure of our syntax. We talk over a longer temporal dimension, but it’s “packaged” into smaller batches via recursive triggers from the cortex. Brief, simple linguistic instructions seem to work for most larger animals. This temporal buffer also has a significant impact on the characteristics of working memory, because it constrains what the cortex can encode, and therefore what can be passed on and used in thought in the PFC.

I can tell the dog to go get my slippers and he runs off and returns with a slipper all excited as a game. That may well be a “pattern” or a learned response to a stimulus for food, but to me it’s just a vocabulary constraint of temporal dimension that limits asking the dog to then put my slippers back where he got them from. That would require recursion involving first encoding the concept of “returning” something. “ing”'s are all longer temporal sequences.

Elephants (“Suda”) can paint, so is that not a more complex form of communication than speech, as it involves a much longer temporal sequence of events? Understanding the input does not require an equivalent response, which may be biologically impossible to create for some animals (see 1).

I may have a working proof in a few weeks/months/years, or never, but it will be quite interesting finding out the unknown unknowns along the way. I already know I don’t know a lot, and I don’t yet know that it will not work, lol.


Sorry, “Neural Correlates of Cognition/Consciousness.” My point is that the so-called mainstream cognitive science is pretty much locked into this; i.e., finding out where in the brain things happen. HTM is pretty much in line with that, but very much on the fringe since it is computational as opposed to studies that elucidate structural components via FMRI or probes, etc.—in vivo.

One must be extremely careful (read ‘skeptical’) of results like the elephant painting thing. Basically a learned behavior and, at least in my opinion, not far from teaching great apes to talk–all of those basically lifelong experiments have failed.

The Rilling paper is far more interesting, I found the statement “This does not preclude the possibility that the modified pathways mediate functions in addition to language, such as tool use…” provocative.

Read the article “A Cognitive Neural Architecture Able to Learn and Communicate through Natural Language”. Maybe it will be useful for you.


Available here:

or here:


Great article, thanks. I will point out that it was published in 2015. Apparently, they did not achieve machine sentience or sustained self-emergent learning.


Their idea, as well as Jeff’s, encouraged me to implement their project with HTM. I hope to succeed.


Keep us informed, that is a good plan.


You might find this one interesting as well

Dracula resurrection, as it’s turning to night…

RNNs are viewed as a formula, which can be unwrapped or temporally laid out as in the googled and first hijacked image (sorry: it lacks the recursion, and the feed-forward dimensions are not quite labelled properly…):

In this instance, change the perspective as to what the input represents… and for each input, ripple through the input “forward in time”… ahead of the calculation… and where it meets an associative “correlate”, ripple this effect “back in time”… association… ok, maybe lost 99% of you… It’s a feed-forward with a recursive loop that is derived and created during the learning process (the learnt hierarchical structure is key). The whole process has a temporal correlate as to the update, the what-is-now.

View this from an SDR perspective and the current logic of HTM as to how distal and proximal connections are emulated in arrays at a unitary temporal resolution. The current SDR/HTM process relies on temporal coherence, i.e. a particular value of time… coherence. This effect always feeds forward +1 (with a temporal criticality of 20 ms as a result), as that’s the biological implementation of the SDR/HTM structure as implemented (highly probabilistically modelled). This is where it creates issues in the basic understanding of what a comma represents when reading a book.

The comma represents nothing but time.

Those proximal coherences are just that: signals that arrive within a given time (e.g. in a burst context). The problem is slipping into the easy abstraction of time, or the daemon of the complexity of time. SDRs and HTM are the easy perspective (whilst still very much a mind f…k when interpolating them through time in your head).

I know nothing, as I don’t have (and will never have - I’m too old to see any reason for this) any published papers, but what I do know is what works and is biologically plausible, whilst being very argumentative.

What relevance does it have to language? Language is attention defined by time… the comma…


Slightly better diagram for thought…

The hand drawn signal levels are not quite what they should be…


Hi, I have been researching the topic of language and learning all my life. Being myself multilingual in three languages, and having acquired language skills in seven further languages, I have been teaching actively for 20 years now, especially English.

Based on discourse ethics, I embarked into a quest for the phenomenon of “personhood”, and issues of learning, memory, action, and language were central to it.

Your statement that HTM should “evolve” rather than rely on top-down learning is supported by my findings. It is very important to highlight that “memory” is not only a process that emerges in time; it is also an interpersonal process, and it makes “time emerge” in an interdependent way. I approached learning and memory as politically constitutive phenomena of mind and society, at both the individual and societal level. And I find your research very interesting, because there is indeed a “biological structure” that changes. In other words, learning is also a physiological process. And it depends on an ethical decision-making process. Learning and language are at the transition between the material and the transcendental of a person.

In the following, I can give you a glimpse of my insights, and if it triggers some interest, point to where you can look further.

In short, learning and language acquisition, and the biological changes that accompany them, are always self-emergent and always communal, i.e. interpersonal, processes.

First, let me say that, from my practical experience, I would not come to the conclusion that English is closest to “physiological structures”. But to fully answer this question, I would need to know what brings you to this conclusion.
My guess is that Sign Language and Chinese are much closer. I have the intuition that it depends on the level of structural complexity, with more analytical languages being closer to biological structures (and closer to structures of the mind); in case you are interested, we can discuss this separately.

Second, learning is not a physiological phenomenon but a socio-political phenomenon that is triggered by an ethical decision in an interpersonal discourse. The key to this is that a person is constituted at all levels (physiologically, psychologically and noetically) within and through the interaction with another person. There is sufficient research supporting the impossibility of keeping babies alive without any interpersonal interaction, caring only for their physiological needs. For a better understanding, see my research (Hirzel, 2015), as well as Dewey and Mead.

Currently, I work on questions of information security in the digital environment (the manipulation of mind through programmed messages). NLP is an important aspect of these phenomena. It would be great to follow your research.

Dewey, J. (2012). The Public and Its Problems: An Essay in Political Inquiry.

Hirzel, T. (2015). Principles of Liberty: A Design-based Research on Liberty as A Priori Constitutive Principle of the Social in the Swiss Nation Story.

Mead, G. H. (2009). Mind, self, and society: From the standpoint of a social behaviorist.


Words (and any other external form of conceptual representation) are a low quality coarse serial sensory stream that is inherently a lossy communication method. It never represents the entire conceptual context in our minds. Some forms are better at getting more of the concepts across than others in a given time span. The time span is also critical because it defines the bounds of complexity due to attention decay. The form of external representation (English, sign language, Mandarin), is just a hand or sensory stream to the mind with different approximations. Underneath it’s still the same.

The way I look at language is to try and see it as the base structure (fragments of memory sequences), irrespective of the particular words used, because over time we have segmented words into notional groupings that bear no commonality with any underlying biological plausibility. The groupings are then akin to second-order derivatives and artifacts of human-interpreted structural complexity. Trying to apply these notional groups ends up with the colourless sheep sleeping furiously (which could happen within a computer game): a human comfort blanket of perceived control over our environment, rather than reality as to what goes on inside the brain, and a bounding mechanism for commonality.

The word “the” (and its equivalent in other languages) is the most common word used (and likely the first to be learnt) because we ground language in attention. We learn attention first. That attention is either current (forecasts are based on the current) or past, and serves to help direct the brain to where activation should be attended to. The words “the” and “a” are just temporally different forms of attention; they mean nothing on their own, they are temporal instruments of attention.

Words are then cast in a temporal (memory sequence structure) or non-temporal (associative) frame, and this is how the brain sees them and creates structures with them. Distal dendrites are the associative (non-temporal) and priming (learning) mechanism, while proximal ones may be more temporal. This is where I fail to see how the human groupings of words make any sense, other than as a psychological derivative or partial reflection of what is going on. We can learn any representation we want; society just defines a bound on that form.

I am looking at learning from probably the equivalent of 0-2 year old, where it’s more about the initial building blocks of conceptual structures. Later in life we just use larger and larger building blocks, but the process is the same, it just continues. It’s how we get the initial bricks to build the house that is key, it’s that process that I’m looking at because a 1yr old can grow into an 80yr old.

For some species, energy consumption dictates the maintenance of a pairing for survival and the progress of evolution; for humans it’s the energy cost of the brain. Tribalism and interpersonal interaction just changed the mortality rate and the perception of sustainability for humans. Sea turtles lay eggs on a beach and the hatchlings fend for themselves from day 1, where mortality is very high and evolution has a lower energy balance - typically a smaller brain.

Ethics are just a complex byproduct of interaction (in an over-populated environment) from emotionally attached memory derivatives. Take away food from today’s society and see how ethics changes or mostly disappears as the population shrinks.

The more I look into the whole process, the more I realise how susceptible the human brain is to long-term, complex forms of manipulation that are near impossible to defend against. Attention is an artifact of surprise that causes us to learn, whether we “want” to learn the surprise or not. What we learn becomes our past and influences our future, whether we want it to or not. Through this lens the world looks quite different.