Commonality of language - self emergent learning

Available here:

or here:


Great article, thanks. I will point out that it was published in 2015. Apparently, they did not achieve machine sentience or sustained self-emergent learning.


Their idea, as well as Jeff’s, encouraged me to implement their project with HTM. I hope to succeed.


Keep us informed, that is a good plan.


You might find this one interesting as well

Dracula resurrection as it’s turning to night…

RNNs are viewed as a formula, which can be unwrapped or temporally laid out as in the first image I googled and hijacked (sorry - it lacks the recursion, and the feed forward dimensions are not quite labelled properly…):

In this instance, change the perspective as to what the input represents… and for each input, ripple through the input “forward in time”… ahead of the calculation… and where it meets an associative “correlate”, ripple this effect “back in time”… association… ok, maybe lost 99% of you… It’s a feed forward with a recursive loop that is derived and created during the learning process (the learnt hierarchical structure is key). The whole process has a temporal correlate as to the update, the what is now.

View this in an SDR perspective and the current logic of HTM as to how distal and proximal connections are emulated in arrays at a unitary temporal resolution. The current SDR/HTM process relies on temporal coherence, i.e. a particular value of time… coherence. This effect always feeds forward +1 (with a temporal criticality of 20 ms as a result), as that’s the biological implementation of the SDR/HTM structure as implemented (highly probabilistically modelled). This creates issues in the basic understanding as to what a comma represents when reading a book.

The comma represents nothing but time.

Those proximal coherences are just that… signals that arrive within a given time (e.g. in a burst context). The problem is slipping into the easy abstraction of time or the daemon of the complexity of time. SDRs and HTM are the easy perspective (whilst still very much a mind f…k when interpolating them through time in your head).

I know nothing as I don’t have (and will never have - I’m too old to see any reason for this) any published papers, but what I do know is what works that is biologically plausible, whilst very argumentative.

What relevance does it have to language ? Language is attention defined by time… the comma…


Slightly better diagram for thought…

The hand drawn signal levels are not quite what they should be…


Hi, I have been researching language and learning all my life. Being multilingual in three languages, and having acquired language skills in seven further languages, I’ve been teaching actively for 20 years now, especially English.

Based on discourse ethics, I embarked on a quest for the phenomenon of “personhood”, and issues of learning, memory, action, and language were central to it.

Your statement about “HTM to “evolve” rather than top down learning” is supported by my findings. It is very important to highlight that “memory” is not only a process that emerges in time, but also an interpersonal process, and it makes “time emerge” in an interdependent way. I approached learning and memory as political constitutive phenomena of mind and society, at both the individual and the societal level. And I find your research very interesting, because there is a “biological structure” that changes indeed. In other words, learning is also a physiological process. And it depends on an ethical decision-making process. Learning and language are at the transition between the material and the transcendental of a person.

In the following, I can give you a glimpse of my insights and, if it triggers some interest, where you can look further.

In short, learning and language acquisition, and the biological changes that accompany them, are always self-emergent and always communal, i.e. interpersonal processes.

First, let me say that, based on my findings from practical experience, I would not come to the conclusion that English is closest to “physiological structures”. But to fully answer this question, I would need to know what brings you to this conclusion.
My guess is that Sign Language and Chinese are much closer. I have the intuition that it depends on the level of structural complexity, with more analytical languages being closer to biological structures (and closer to structures of the mind). In case you are interested, we can discuss this separately.

Second, learning is not a physiological phenomenon but a socio-political phenomenon that is triggered by an ethical decision in an interpersonal discourse. The key to this is that a person is constituted at all levels, physiologically, psychologically and noetically, within and through the interaction with another person. There is sufficient research that supports the impossibility of keeping babies alive without any interpersonal interaction, caring only for their physiological needs. For a better understanding see my research (Hirzel, 2015), as well as Dewey and Mead.

Currently, I work on questions of information security in the digital environment (the manipulation of mind through programmed messages). NLP is an important aspect of these phenomena. It would be great to follow your research.

Dewey, J. (2012). The Public and Its Problems: An Essay in Political Inquiry.

Hirzel, T. (2015). Principles of Liberty: A Design-based Research on Liberty as A Priori Constitutive Principle of the Social in the Swiss Nation Story.

Mead, G. H. (2009). Mind, self, and society: From the standpoint of a social behaviorist.


Words (and any other external form of conceptual representation) are a low quality coarse serial sensory stream that is inherently a lossy communication method. It never represents the entire conceptual context in our minds. Some forms are better at getting more of the concepts across than others in a given time span. The time span is also critical because it defines the bounds of complexity due to attention decay. The form of external representation (English, sign language, Mandarin), is just a hand or sensory stream to the mind with different approximations. Underneath it’s still the same.

The way I look at language is to try and see it as the base structure (fragments of memory sequences), irrespective of the particular words used, because over time we have segmented words into notional groupings that bear no commonality with any underlying biological plausibility. The groupings are then akin to 2nd order derivatives and artifacts of human interpreted structural complexity. Trying to apply these notional groups ends up with the colourless sheep sleeping furiously (which could happen within a computer game). They are a human comfort blanket of perceived control of our environment, rather than the reality of what goes on inside the brain, and a bounding mechanism for commonality.

The word “the” (and its equivalent in other languages) is the most common word used (and likely the first to be learnt) because we ground language in attention. We learn attention first. That attention is either current (forecasts are based on the current) or past and serves to help direct the brain into where activation should be attended to. The words “the” and “a” are just temporally different forms of attention; they mean nothing on their own, they are temporal instruments of attention.

Words are then cast in a temporal (memory sequence structure) or non-temporal (associative) frame and this is how the brain sees them and creates structures with them. Distal dendrites are the associative (non-temporal) and priming (learning) mechanism while proximal may be more temporal. This is where I fail to see how the human groupings of words make any sense, other than as a psychological derivative or partial reflection of what is going on. We can learn any representation we want, society just defines a bound on that form.

I am looking at learning from probably the equivalent of 0-2 year old, where it’s more about the initial building blocks of conceptual structures. Later in life we just use larger and larger building blocks, but the process is the same, it just continues. It’s how we get the initial bricks to build the house that is key, it’s that process that I’m looking at because a 1yr old can grow into an 80yr old.

For some species, energy consumption dictates the maintenance of a pairing for survival and the progress of evolution; for humans it’s the energy cost of the brain. Tribalism and interpersonal interaction just changed the mortality rate and perception of sustainability for humans. Sea turtles lay eggs on a beach and the hatchlings fend for themselves from day 1, where mortality is very high and evolution has a lower energy balance - typically a smaller brain.

Ethics are just a complex byproduct of interaction (in an overpopulated environment) from emotionally attached memory derivatives. Take away food from today’s society and see how ethics change or mostly disappear as the population shrinks.

The more I look into the whole process the more I realise how susceptible the human brain is to long term complex forms of manipulation that are near on impossible to defend against. Attention is an artifact of surprise that causes us to learn, whether we “want” to learn the surprise or not. What we learn becomes our past and influences our future whether we want it to or not. Through this lens the world looks quite different.


Only part of this is working at the moment… and still a lot to resolve (how to code)… but this is how I see HTM working with a sparse structure and close to how it would look programmatically.

Blue circles are sensory inputs (words, touch, etc.)

I have no idea what this structure would be called as it’s not a typical graph as such because of the way time is represented and the way the hierarchy is applied and used… yeah, having time on the 2 axis is just a relative representation, not scaled, but hopefully you see or get the idea, alternatively I’m just confusing everyone… lol.


In the EC/HC (episodic memory) there are various spatial cells (head direction, place, proximity, vector, …) and time cells.

I suspect that there are other types of cells but the testing regimes don’t necessarily sample the right inputs to activate the cells.

In the cortex portion, the hierarchy offers the possibility to do spatial pooling as well as temporal pooling. (The H of HTM) This could be how the output of lower levels of temporal pooling ends up in episodic memory as time cells.

This could go a long way to doing your grouping of different levels of representation as you indicate. If you specifically add time as some of your nodes it may help with forming your graph.


I need a while to think about this properly because the abstract pooling effect is quite a difficult dimension to think through. I don’t think there is really any spatial pooling when looking at the process this way, but it still does not feel right.

Within the image the horizontal links between points are the temporal context and are variable - vertical links are non-temporal. The calculation takes into account the relative time of the whole space and the update process ripples through with time. I think the EC/HC ? (see below) is partly, perhaps primarily, involved in the normalisation of the temporal dimension (I see it as evident in how language is structured, buffered and … no idea on the word - a type of recursive conceptual compression / folding) so that the cortex stores a type of normalised temporal fragment of memory. This normalisation process then allows for a way around the binding problem for other senses and dimensions. It’s an abstraction of time.
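A minimal Python sketch of how a structure like this might be held in code, purely as an illustration. Everything here is an assumption rather than the poster’s implementation or any HTM library: the names `Node`, `MemoryGraph`, `add_sequence` and `associate` are hypothetical, timestamps are taken to be integer milliseconds, and the temporal normalisation step is left out entirely. Horizontal links carry a variable relative time gap; vertical links carry no time at all.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    label: str
    # Horizontal links: (next label, relative time gap in ms). The gap is variable.
    temporal: list = field(default_factory=list)
    # Vertical links: labels of higher-level nodes. No time is attached to these.
    associative: set = field(default_factory=set)


class MemoryGraph:
    """Hypothetical container for temporal and non-temporal links."""

    def __init__(self):
        self.nodes = {}

    def node(self, label):
        # Return the existing node for this label, or create it.
        return self.nodes.setdefault(label, Node(label))

    def add_sequence(self, events):
        """events: list of (label, timestamp_ms) pairs in order of arrival."""
        for label, _ in events:
            self.node(label)
        # Store only the relative gap between neighbours, not absolute time.
        for (a, t_a), (b, t_b) in zip(events, events[1:]):
            self.node(a).temporal.append((b, t_b - t_a))

    def associate(self, lower, higher):
        """Add a non-temporal (vertical) link from a node to a higher-level node."""
        self.node(lower).associative.add(higher)
        self.node(higher)


graph = MemoryGraph()
graph.add_sequence([("go", 0), ("to", 200), ("bed", 500)])
for word in ("go", "to", "bed"):
    graph.associate(word, "GO_TO_BED")
```

Storing gaps instead of absolute timestamps is one simple way to keep time relative; whether that matches the normalisation described above is an open question.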

The problem I’m stuck with is that there should be a lot more vertical lines (non-temporal / equivalent of spatial pooling), and I’m trying to figure out the blend of biology / code and reality, lol.

I have avoided mentioning how I encode the input streams (after a lot of trial and error in other approaches) and have zero proof as to the validity at the moment. I’m not able to explain some of the parts because I’m going by intuition and only figuring it out later. It’s worked quite well for me in the past as a way of doing what I do… clueless conscious behaviour. lol.

EC/HC - suspect it’s somewhere but no real idea on the true bits/process of biology, even after looking at heaps of papers, scans, dissections, etc. The process would need to be in the same area(s) / sub-areas as place cells though in order to temporally normalise the sensory stream to memory. The biology for me is more of a bandwidth pattern/mapping abstract at the moment.

Quite likely and suspect they end up far too abstract for us to associate with (we are the stick men trying to look at a pencil). Time-Place blend of abstract velocity would seem to be an entirely plausible possibility for a semi-relatable example, which I think has already been mentioned but guess it would be near on impossible to measure in biology as it’s a 2nd order sensory derivative ? Just a guess at this point…

Can you explain how you’re looking at it ?

Damn, more reading…
“We identify time cell populations in the medial temporal lobe of humans during memory encoding and retrieval”

Even more reading…

Theta wave phase timing…

Might just have found a plausible source for why I think of the language structure the way I do… ramping cells… it sounds like the effect that I am using to separate language in the extracts in earlier posts… Figure 5, although the R^2 looks like throwing darts at the dart board after 5 pints, lol.

“Human traveling waves showed a distinctive pattern of spatial propagation such that there is a consistent phase spread across the hippocampus regardless of the oscillations’ frequency.”
This to me looks like part of the conceptual recall / compression process… temporal memory reflection to access higher level concepts from the input stream. They are accessing the vertical lines in the diagram in hierarchical sequence correlated to the arrival in the input stream…


What we learn after the first 25,000 words of star trek, lol…

Probably this is redundant - there should be the same “processing engine” which works with language as with any other activity humans can do.

How/what should such an engine be capable of?

  • First, it would translate sensory input into a flow of identifiable symbols.
  • Second, it would have to ignore most of those symbols and select a much smaller (sparser?) subset of significant ones that should indeed allow encoding of “experience” as a stream of the most significant (== meaningful, relevant) symbols.
  • And sure, there is branching as in your chart. What is the branching triggered by - could it be an action/decision of the agent itself or a surprising external event?
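The second point can be sketched in a few lines of Python. This is only one possible reading, with loud assumptions: “significant” is taken to mean “surprising given the stream so far”, measured in bits from running symbol counts, and the function name `select_significant` and the 2-bit threshold are made up for illustration.

```python
import math
from collections import Counter


def select_significant(symbols, threshold_bits=2.0):
    """Keep only symbols that are surprising given what has been seen so far.

    Assumption: surprise = -log2(relative frequency so far). A symbol that
    has never been seen before is always treated as surprising.
    """
    counts = Counter()
    total = 0
    kept = []
    for s in symbols:
        if counts[s] == 0:
            surprising = True  # novel symbol
        else:
            surprising = -math.log2(counts[s] / total) >= threshold_bits
        if surprising:
            kept.append(s)
        # Update the running statistics after the decision.
        counts[s] += 1
        total += 1
    return kept


stream = ["the"] * 8 + ["cat"] + ["the"] * 3 + ["cat"]
kept = select_significant(stream)
```

On this toy stream the repeated “the” is dropped after its first appearance, while both occurrences of the rarer “cat” are kept. A real system would presumably use context-dependent prediction instead of plain frequency counts.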

What is important to notice is that words/language are only the visible part of a much larger … structure. Most attempts to infer intelligence via language models only are like trying to infer Archimedes’ law by studying the tips of floating icebergs instead of full icebergs.
The key point here is that while the tip (language) is made of the same substance as the entire iceberg, we’ll have a very hard time understanding the nature of floating by focusing only on whatever is not submerged.


As I have pointed out in several posts in the forum, language is a learned skill. Likewise, the concept of naming objects is a sub-set of this learned language skill. While the bi-directional loop from frontal-motor-planning back to the experiential stream in the temporal lobe is common to most (all?) mammals the higher degree of connectivity to support speech seems to be uniquely human.

I think all mammals have this substrate for “simple” consciousness.

So I support cezar_t’s assertion that the substrate for language exists without necessarily exhibiting this learned behavior. It does not have to have GOFAI symbols to do what it does.

In particular, the many maps at the top ends of the WHAT & WHERE (ventral and dorsal) streams end up in the posterior portion of the temporal lobe. I see the combination of the contents of the maps in that area to be the spatial and temporal contents of the fragments of speech, organized into a syntactically correct stream by the corresponding motor planning parts of the frontal lobe. Each map is the smallest fragment of speech - somewhat like letters in a word. These contents are experienced by the portions of the medial temporal lobe to be registered as the “here and now” portions of episodes in the EC/HC complex. (See the Arcuate_fasciculus tract below for corresponding cortical locations)

Language is not a given in humans and we are greatly diminished without it:
See Ildefonso discovering naming below:

Note the various speech areas connected to this fiber bundle:


Except that the tip here is …1 percent of the iceberg. But I think this sort of understanding is almost impossible to communicate. You either get it in your childhood, or never; introspection can’t be taught.


I’m not the best at explaining stuff… and still have a lot to do…

The words are more like the visible surface of the iceberg. The sensory input.

The interior of the visible part is what I would think of as say “green roof” - simple derived conceptual references or directly correlated sensory reception. The equivalent of sub 20 ms type post-synaptic inputs.

Under the water is where you end up with parts of concept abstraction structures akin to say ‘go to bed’ or ‘going shopping for food’. That’s where our understanding of them also disappears underwater, like when you consider a sensory touch stream that identifies a keyboard.

For the model if I pass in a stream of words and pass in a couple of emotional senses as well (blended with the input stream), they are just another input, but now the “language model” has emotions involved… How different senses are processed is a separate work in progress theory.

The initial focus is on the memory and the structure of the memory because if you don’t have that then the rest of the process will not work properly.

What looks like branching in the diagram is part of the conceptual hierarchy (the process of going under water). You could think of them as dendrites and the nodes as neurons/synapses - it’s one dimension of the hierarchical form (non-temporal), whilst the sequences are more the temporal streams that reflect the temporal relativity of the sensed input, i.e. the relative timing of the sensory inputs. Time is an abstract normalisation at this stage.

HTM with SDRs are the same type of pattern identifiers, so if there are words/emotions/senses/etc. that are all combined in the input process/encoder(s), does the SDR break or is it agnostic… it’s agnostic to the pattern, but are emotions processed in the cortex or elsewhere ? Do emotions have the same temporal relativity to the other senses ?
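To make the “agnostic to the pattern” point concrete, here is a toy Python sketch, not taken from any HTM implementation. An SDR is represented as a set of active bit indices; the `encode` function is a hypothetical hash-based stand-in for a real encoder (so it carries no semantic similarity between different tokens), and the sizes are assumed values. The word and emotion streams are simply placed side by side in one larger SDR.

```python
import hashlib

SDR_SIZE = 2048   # bits per modality (assumed value)
ACTIVE_BITS = 40  # roughly 2% sparsity (assumed value)


def encode(token, size=SDR_SIZE, active=ACTIVE_BITS):
    """Map a token to a fixed, pseudo-random set of active bit indices."""
    bits = set()
    counter = 0
    while len(bits) < active:
        digest = hashlib.sha256(f"{token}:{counter}".encode()).digest()
        bits.add(int.from_bytes(digest[:4], "big") % size)
        counter += 1
    return frozenset(bits)


def combine(word, emotion):
    """Concatenate two modalities: emotion bits are shifted past the word bits."""
    return encode(word) | frozenset(b + SDR_SIZE for b in encode(emotion))


def overlap(a, b):
    """Number of shared active bits, the usual SDR similarity measure."""
    return len(a & b)


calm_bed = combine("bed", "calm")
fear_bed = combine("bed", "fear")
shared = overlap(calm_bed, fear_bed)
```

The overlap code never needs to know which bits came from words and which from emotions; the two combined SDRs share at least the 40 bits for “bed”. Whether emotions should enter at this level at all is exactly the open question above.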

Yes, if you try and use only words as particular symbolic assumptions I would very much agree a language model will get you nowhere, or rather seeing colourless green sheep sleeping furiously. Been there and wasted about 6 months…lol.


Is language really just a more efficient emotional mirroring mechanism and we have figured out how to abuse/use it to save clubbing someone to death for the food or a better mate ?

Is saying “hello” really that different to waving when it comes to what is really driving the behavioural action ? Does the cortex really mandate a friendly emotional mirroring attempt with words ? Or does the dumb boss just say “wow there’s Fred my mate…” (the image of Forrest Gump madly waving at this point came to mind…) and the cortex never really gets involved in changing the action…

I’ll still have a go with crappy diagrams, lol.


I think that misses the main point. We may initiate a motor action such as reaching or walking to achieve a goal such as getting food or water.

Humans have learned to use a different motor action (speech) to achieve our goals. You can ask for food from mom and get something to eat. This speech thing really ups our game for meeting our needs socially.


I’m in broad agreement with this as a plausible model. Two points:

  1. The translator (sensory flow to symbols) is a separate thing: sound, smell, skin, vision all have their own, but emitting similar symbols. (We have about 24 senses, not just 5).
  2. There are 3 ‘channels’, not just one: language, vision and ‘sensation’ (the rest). There are many things you can hear/smell/feel/see whether or not they have words, but those three modalities (see, hear and feel) are pervasive throughout language.

But the core model (sensory => symbol => processing => symbol => motor) is one I find convincing. And SDR is the best candidate for symbol right now.