Commonality of language - self-emergent learning

I had seen Cortical (I had not read the patent) and it seems to work for what it does. 16 kbit is a lot of data and a lot of compute. The implementation seems closer to a single oversized cortical column?

My representations currently use 32 bytes / 256 bits. It is not a bitwise SDR-style representation but a different, higher-level abstraction. It is sparsely connected and sparsely updated, which reduces the compute requirement significantly but adds complexity in a different dimension, because the structure becomes the conceptual representation rather than the nodes. This, I think, is somewhat like the pattern recognition in a column becoming the concept, rather than the constituents of the column.

This is also where I think part of the hippocampus is actually playing a rather basic but clever role (which fits with HM and memory creation), and why I was curious whether anyone in NLP had tried to explain or incorporate this role in any explanation of, or correlation with, language structure. I don’t have open access to research papers, so I am limited to what the search engines return. Currently I am trying to divert all my time to coding, as my philosophy is that if it does not work 100%, it does not work, and at the moment it does not work. Close, but no cigar.

Please don’t let that stop you. You have two or three options:

  1. Search and check if a PDF is available.
  2. When it isn’t, two common options are available:
  • Contact one of the authors and ask for the PDF. The vast majority of authors will be glad that somebody is reading their paper, and will share it with you.
  • Learn about sci-hub (find the DOI for the paper and then search there).

Looking forward to a more detailed presentation of your ideas and models on language.


I have searched around quite a bit for something specific to what I am thinking, but there is nothing that matches, as it may be near-on impossible to prove. The abstract perspective does not help either, lol. The grid/place cell research is interesting but needs a little twist in thought to apply to phonetics. Grid cells are no different from any other input; we just think they are special.

Looking at Fig. 2, the type of scale I think fits with my thoughts, because the area would otherwise need far more connections and neurons.

“CA3 system operates as a single attractor or auto association network to enable rapid, one-trial, associations”

The one-time nature is key for what I think is the initial step in language processing, although I think it gets quite abstract, because it is dealing with phonetic-type language compression/abstraction rather than cortex feedback of recognised pattern “objects”/“concepts”. The blending of a few other sensory inputs also complicates it a bit, but I think they all share the same logical pattern.

The hippocampus in areas like CA3 simply lacks the neurons to act as any sort of long-term memory, and does not have the connectivity to hold significant working concepts, especially if synchrony has to be factored into the processing, because that would require a lot more recursive, iterative feedback from the cortex.

“Indeed, a hallmark of a “pure” case of anterograde amnesia is the inability to encode (or learn) new information, despite relatively intact abilities to retrieve premorbid memories, to remember a small amount of information, such as a phone number, for tens of seconds or even longer, and to demonstrate the improvements that accompany repeated performance of routine behaviors or repeated exposure to stimuli. Also spared in anterograde amnesia is every other major domain of cognition–sensory perception, language comprehension and production, motor control, intelligence, and so on.”

This also fits with my thought that the hippocampus is effectively the site foreman for the cortex construction site… He does not do any work himself; he just directs what is right (and can recognise what is right) so that the relevant workers can all see it. The gatekeeper. The additional difference is the way in which the green light is given, because the cortex knows what but not how to build. This creates an interesting position for the reference-frame concept, because it in part turns it on its head. The hippocampus knows how but not what, and that is where it starts getting very abstract, which is what I’m trying to implement at scale.

The associative memory setup I have at the moment (being rewritten again) is sort of a gated Hopfield-type arrangement, but the dimensions change for each input, which makes it a bit abstract due to the lack of consistency. The next variation adds to this (and gets even stranger), because I think memory and higher concepts are in a way one and the same thing; it just depends on the recursive mapping and construction, which provides the hierarchical constructs and the predictive functionality by default.

I may well be a million miles off (my view is I have a million-to-one chance of it working fully). At the end of the day it is just my conjecture, which through the human bias flaws of the brain happens to fit my personal belief bias, lol… Back to coding…


Yaqiong Xiao, Angela D. Friederici, Daniel S. Margulies, Jens Brauer,
“Development of a selective left-hemispheric fronto-temporal network for processing syntactic complexity in language comprehension”,
Neuropsychologia, Volume 83, Pages 274-282, ISSN 0028-3932.


A lot of reading, and going down various rabbit holes of referenced documents, but I still can’t find anything that matches.

How easy is it, with previous methods, to self-segment with no input other than the text (and to work on any language)? My process is a relatively simple recursive decay/inhibit method, which I think is closer to how part of the hippocampus works, whereby it does not need to know a cortical column’s contents or attributes, nor even that some columns exist. The groups then self-grow into sub-groups (some of which are map areas). This is just the hippocampus component.
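To give a flavour of what I mean by a decay/inhibit method, here is a toy sketch (deliberately simplified, not my actual implementation; the function name, decay constant and threshold are all illustrative). Each word’s activation decays a little at every step and is boosted each time the word occurs, so only very frequent words hold a high steady-state activation and fall out as the first group:

```python
from collections import defaultdict

def decay_inhibit_groups(tokens, decay=0.9, boost=1.0, threshold=1.5):
    """Toy decay/inhibit grouping: every word's activation decays at
    each step and is boosted when the word occurs.  Only very frequent
    words reach a high steady-state activation; those still above the
    threshold at the end form the first group."""
    act = defaultdict(float)
    for tok in tokens:
        for w in act:
            act[w] *= decay          # global decay each step
        act[tok] += boost            # occurrence boost
    return {w for w, a in act.items() if a >= threshold}

text = ("the cat sat on the mat and the dog saw the cat "
        "on the mat by the door").split()
group = decay_inhibit_groups(text)   # high-frequency words separate out
```

On real text, the same decaying-activation idea pulls out the high-frequency function words first, before any further sub-grouping.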


This is one of the initial segmentations/groups that I now get out of my process reliably, with near-on any quality of text input. This group (before further segmentation) coincidentally maps to articles, demonstratives, determiners, etc.

For a given block of text this is the underlying word frequency.

Word / Count
you 18,181
the 106,332
he 18,522
in 25,124
to 51,298
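The counts themselves come from nothing more sophisticated than a plain frequency pass over the tokenised text (illustrative Python; the regex tokeniser is a stand-in for my actual input handling):

```python
from collections import Counter
import re

def word_counts(text, words_of_interest):
    """Count word occurrences in raw text and report the chosen words."""
    counts = Counter(re.findall(r"[a-z']+", text.lower()))
    return {w: counts[w] for w in words_of_interest}

sample = "The cat and the dog went to the park in the morning to play."
table = word_counts(sample, ["the", "to", "in"])
```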

Why I think this is my dancing-around-the-campfire moment is that I believe this is part of the “attention” mechanism for the HTM process, which, together with the other groups, also allows the HTM structure to be created from the ground up. It forms the basis of the learning signal to the cortex!

The hippocampus knows what and the cortex knows where.

Some of the words in the list have no “meaning”, other than to biologically direct attention beyond the hippocampus. This is where I think our interpretation (and groupings) of language fails: we assume all language has a coherent meaning “in” the cortex only, and have not factored in the real biological mechanism in aggregate (which is inconsistent between cortex, hippocampus and others).

One further group that evolves then directs attention towards working memory, rather than the broader cortex, as part of the way recall reinforcement works (e.g. “his”), so that constructs can take part in the “evaluation” process of cortex thought later on.

This type of segmentation I believe is critical to the way HTM works, because it is how the differing maps can integrate into a coherent structure.

To me, language does not appear as an interpretation of what is a noun, adjective, etc.; it appears as a particular sequential pattern of constructs that breaks the existing “rules” and assumptions.

The groupings seem to work for me, at least for the first million or so words so far.

Maybe just my conjecture / evolving working theory at the moment, but hopefully a kick for some to think outside the box and not to follow the herd. Back to coding again…

I made a prototype of a model that does this (but not any of the other things which you wanted to do). I made a presentation about it here: Video Lecture of Kropff & Treves, 2008. At the end of the presentation I demonstrate segmenting a string of letters into distinct words.

Interesting presentation, thanks. I realise that I’m looking at the process a bit differently, and there is another difficulty in that I’m not going the SDR route, which makes communicating some of the aspects harder. Plus I just try to implement things in code to see what works, so all I see is code; my mathematical writing ability at this level is near zero, but I can still code what I can’t formalise.

Interesting results.

I like the work you’re doing modelling the brain at the neuron level; it’s awesome, providing some very profound insights.

I had not thought of the decay wave as a conventional filter in the frequency domain (with a temporal window effect); I did not see it that way, although now I clearly do. I know filters from radio. It allows for a different perspective when looking at the code, though. Thanks.

Your chart at 15:55 looked sort of similar, but very different underneath. My initial testing looked a bit like your chart, with over 5 bn entries on the horizontal axis and a temporal feed-forward prediction through all 5 bn entries. I realised later that it would not work, because the temporal structure was incorrect: it lacked sufficient structure to actually be useful in a PFC-type process (thought). Good for basic recall, though. At that scale some other effects and complexities start to occur as well.

The bigger difference is that I’m not looking at changing weights in an existing network, but rather at creating the network as the learning process itself (which is where the hippocampus is key). I see the brain as a just-in-case biological probability network, which ends up with massive over-provisioning to allow for “just in case” it’s needed (fire together, wire together: proximal probability). With computers we can skip this pre-provisioning overhead and look at it in reverse: wire only what is needed, and the resulting aggregate structure ends up comparable, just without the noise and with the absolute minimum of connections. The update method/process can then change as well, reducing compute significantly.
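As a toy illustration of “wire only what is needed” (not my actual code; the class and names are invented for the example): connections come into existence only when units actually co-activate, so the learned structure is the network itself, with no pre-provisioned weight matrix:

```python
from collections import defaultdict

class GrownNetwork:
    """Edges are created on demand, during learning, rather than
    pre-allocated: the network structure *is* the learned state."""

    def __init__(self):
        self.edges = defaultdict(int)   # (a, b) -> co-activation count

    def observe(self, active_units):
        """Wire together only the units that fired together."""
        units = sorted(active_units)
        for i, a in enumerate(units):
            for b in units[i + 1:]:
                self.edges[(a, b)] += 1  # edge exists only once needed

net = GrownNetwork()
net.observe({"cat", "mat"})
net.observe({"cat", "mat", "sat"})
```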

The process I’m experimenting with starts off with dominant excitatory signals; then, as the model learns, what is learnt creates a larger source of inhibitory signals (new connections), which, on reading some papers, seems to more closely match what happens biologically in the early hippocampus. This also seems to fit with why we can learn languages more easily as children, because this learnt wiring then ends up having to offload unknown patterns to the cortex. The dominant inhibitory framework precludes some new language patterns from being recognised properly in the hippocampus, so the cortex has to do it when we try to learn a language as an adult. Sleep replay then tries to help the hippocampus change, because it creates the only pattern of weakening that allows changes to occur.
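A minimal sketch of that excitatory-first, inhibitory-after flow (purely illustrative; the class, capacity parameter and return strings are all invented for the example): early on everything can be learned, but once the inhibitory framework dominates, unfamiliar patterns are no longer taken in and get offloaded elsewhere:

```python
class FastLearner:
    """Toy model: early on, everything can be learned (excitatory
    dominance).  Once enough patterns are learned, an inhibitory
    framework forms and unfamiliar patterns are no longer taken in
    here -- they get offloaded (to the "cortex")."""

    def __init__(self, capacity=3):
        self.known = set()
        self.capacity = capacity   # after this, inhibition dominates

    def present(self, pattern):
        if pattern in self.known:
            return "recognised"
        if len(self.known) < self.capacity:   # childhood: excitatory
            self.known.add(pattern)
            return "learned"
        return "offloaded to cortex"          # adulthood: inhibited

h = FastLearner(capacity=2)
r1 = h.present("ba")
r2 = h.present("da")
r3 = h.present("ga")   # inhibitory framework now dominates
r4 = h.present("ba")
```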

The profound conjecture at this stage would be to say that the inhibitory layer in the thalamus is created in a similar manner, in relation to the cortex structure learning triggered by the hippocampus. This I have yet to implement, because it is the higher-level abstractions that trigger the inhibitory connection requirements. Still much to think about here, because the emotional/chemical basket effects complicate things.

This hippocampus process, I think, is what Minsky is partly seeing as the missing universal grammar; only, in reality, the words (concepts) in the cortex don’t really have a coherent grammar consistency as such (green ideas sleeping; had had had had), but the hippocampus does have the abstraction of a universal-type grammar of sorts, and it is the resulting structure that is key. This is the only coherent universal-type grammar/structure, and it’s in the hippocampus.

This does end up with a “universal” grammar of sorts; it’s just impossible to define with any consistency against our tagged words, because they are completely incompatible. This is where I’m at complete odds with the existing order of things… Maybe it’s just my conjecture, and still a (million-1) to 1 chance of working as a whole, but I really do hope this triggers some lightbulb moments and good debate… so back to more coding.


Are you familiar with bouba and kiki? My conjecture is that this is where we will find universal grammar.

I had not, so it was an interesting read. However, I believe that what is being observed is commonality in associated learning patterns in the cortex (recursive evaluation), rather than what I think of as the universal grammar of sorts via the hippocampus. Aggressive, sharp, shock sounds also tend to be inversely associated with calm (the ability to sleep through a sound being a key divider), so they become emotionally associated (they trigger a chemical release), which adds additional recursion or feedback within an evaluation. Any sound pattern that triggers adrenalin, for example, may well not be considered “calm” or “soft” upon evaluation, regardless of whether the audible pattern has any real associated meaning.

Emotional attachment to the learning of concepts, I believe, is fundamentally critical to providing an evaluation mechanism. Words would otherwise all “mean” the same and just end up as patterns that the likes of GPT-3 can throw out without any emotional coherence as to meaning in context.

Ok, so here goes… given a Star Trek book to start with and no input tagging of any sort… caveman attempts to light fire (maybe in the rain).

= = = = ATTENTION = = = = CONSTRUCT = = = = CONCEPT = = = =

Attention - abstract reference (thalamic inhibitory relaxation) or working-memory recursion. These words are intended as a directional signalling mechanism. Beyond that they mean nothing; they only mean something to the biology of the hippocampus in relation to universal grammar processing. “They” triggers a recursive sensitivity amplification in working memory to reinforce the associated active memory fragments. “They” is learnt by the hippocampus. Words like “Constantinople” mean nothing to the hippocampus, only to the cortex HTM structures.

Construct - the type by which the cortex pathways are created (HTM structure) in a single-pass, feed-forward manner, which is biologically plausible via an ART mechanism for filtering the targets. The constructs, I believe, end up like recursive cross-linked graphs that wire up probabilistic pathways like Markov chains, but with a feed-forward element to alter the probability dynamically, plus a recursive element. Thought is then effectively multiple activations of consistent pathways through the cortex, which jump chains via recursive association (cortical column to cortical column sequencing).
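A crude sketch of the “Markov chain with a feed-forward element” idea (illustrative only; the class name and the bias mechanism are stand-ins, not my implementation): base transition counts give the static chain, and a feed-forward context signal re-weights the next-step probabilities before normalisation:

```python
from collections import defaultdict

class DynamicChain:
    """Markov-style transition counts whose next-step probabilities
    can be re-weighted dynamically by a feed-forward context signal."""

    def __init__(self):
        self.counts = defaultdict(lambda: defaultdict(int))

    def learn(self, sequence):
        for a, b in zip(sequence, sequence[1:]):
            self.counts[a][b] += 1

    def next_probs(self, state, context_bias=None):
        """Static counts, re-weighted by context before normalising."""
        bias = context_bias or {}
        weights = {b: c * bias.get(b, 1.0)
                   for b, c in self.counts[state].items()}
        total = sum(weights.values())
        return {b: w / total for b, w in weights.items()}

chain = DynamicChain()
chain.learn(["the", "cat", "sat"])
chain.learn(["the", "dog", "sat"])
p_plain = chain.next_probs("the")                         # 50/50 split
p_biased = chain.next_probs("the", context_bias={"dog": 3.0})
```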

Concept - the associated items/locations to be connected together (constrained by probabilistic connection proximity in biology, which is why learning does not always happen in one pass). These are recursively triggering associations based on the existing HTM structures. Within a computer there is no constraint of connection proximity, so learning is then single-pass.

The way I am looking at the language side is to completely forget existing POS groupings and structures, and to make it an abstract split between hippocampus biological mechanisms and cortex recursive evaluation of HTM structure. The structures end up different as well, because they are recursively linked/associated. How language is structured is however and whatever the word sequences are for any given type of language. Language is just a set of constructs of concepts; it has no inherent pattern beyond the hippocampal mechanism.

Does this sound remotely sensible, or have I lost the plot? It seems to fit with my thoughts and system at the moment (maybe confirmation bias, lol).

My knee-jerk reaction answer is ‘No’. However, under a different context; i.e., NCC, it might be sensible and even encouraged.

Pan troglodytes, which may become Homo troglodytes someday, have a near-identical brain structure to ours: a little bit smaller, no torque, but very similar nonetheless, at least when it comes to HTM. What they lack is language: real, human, syntactical language. We do not have it either until after we learn it. They can’t even learn it, and that is where a lot of ‘don’t bother going here’ research gold can be found.


Another set of articles to read and ponder… had not heard of NCC.

Two aspects of interest… one based on publication quotes (google search the quotes) and the other conjecture from looking at active scans and the dog…

  1. Arcuate Fasciculus
    “arcuate fasciculus that is much smaller or absent in nonhuman primates.” and “Modern neuroradiological studies suggest that the AF connects posterior receptive areas with premotor/motor areas, and not with Broca’s area.”
  2. I believe that the buffer in the hippocampus also needs to be of a certain temporal dimension before linguistic understanding can occur at the level “we” understand, because of the nature and structure of our syntax. We talk in a longer temporal dimension, but it’s “packaged” into smaller batches via recursive triggers from the cortex. Brief, simple linguistic instructions seem to work for most larger animals. This temporal buffer also has a significant impact on the characteristics of working memory, because it constrains what the cortex can encode and therefore what can be passed on and used in thought in the PFC.

I can tell the dog to go get my slippers and he runs off and returns with a slipper, all excited, as a game. That may well be a “pattern” or a learned response to a stimulus for food, but to me it’s just a vocabulary constraint of temporal dimension that limits asking the dog to then put my slippers back where he got them from. That would require recursion, involving first encoding the concept of “returning” something. “-ing”s are all longer temporal sequences.

Elephants (“Suda”) can paint, so is that not a more complex form of communication than speech, as it involves a much longer temporal sequence of events? Understanding the input does not require an equivalent response, which may be biologically impossible to create for some animals (see 1).

I may have a working proof in a few weeks/months/years, or never, but it will be quite interesting finding out the unknown unknowns along the way. I already know I don’t know a lot, and I don’t know that I know it will not work, lol.


Sorry, “Neural Correlates of Cognition/Consciousness.” My point is that the so-called mainstream cognitive science is pretty much locked into this; i.e., finding out where in the brain things happen. HTM is pretty much in line with that, but very much on the fringe since it is computational as opposed to studies that elucidate structural components via FMRI or probes, etc.—in vivo.

One must be extremely careful (read ‘skeptical’) of results like the elephant painting thing. It is basically a learned behavior and, at least in my opinion, not far from teaching great apes to talk; all of those basically lifelong experiments have failed.

The Rilling paper is far more interesting, I found the statement “This does not preclude the possibility that the modified pathways mediate functions in addition to language, such as tool use…” provocative.

Read the article “A Cognitive Neural Architecture Able to Learn and Communicate through Natural Language”. Maybe it will be useful for you.


Available here:

or here:


Great article, thanks. I will point out that it was published in 2015. Apparently, they did not achieve machine sentience or sustained self-emergent learning.


Their idea, as well as Jeff’s, encouraged me to implement their project with HTM. I hope to succeed.


Keep us informed, that is a good plan.


You might find this one interesting as well

Dracula resurrection, as it’s turning to night…

RNNs are viewed as a formula, which can be unwrapped, or temporally laid out, as in the first googled and hijacked image (sorry: it lacks the recursion, and the feed-forward dimensions are not quite labelled properly…):

In this instance, change the perspective as to what the input represents… and for each input, ripple through the input “forward in time”, ahead of the calculation… and where it meets an associative “correlate”, ripple this effect “back in time”… association… OK, maybe I’ve lost 99% of you… It’s a feed-forward with a recursive loop that is derived and created during the learning process (the learnt hierarchical structure is key). The whole process has a temporal correlate as to the update: the what-is-now.

View this from an SDR perspective, and the current logic of HTM as to how distal and proximal connections are emulated in arrays at a unitary temporal resolution. The current SDR/HTM process relies on temporal coherence, i.e. a particular value of time. This effect always feeds forward +1 (with a temporal criticality of 20 ms as a result), as that is the biological implementation of the SDR/HTM structure as implemented (highly probabilistically modelled). This is where it creates issues in the basic understanding of what a comma represents when reading a book.

The comma represents nothing but time.

Those proximal coherences are just that… signals that arrive within a given time (e.g. in a burst context). The problem is slipping into the easy abstraction of time, or the daemon of the complexity of time. SDRs and HTM are the easy perspective (whilst still very much a mind-f…k when interpolating them through time in your head).

I know nothing, as I don’t have (and will never have; I’m too old to see any reason for this) any published papers, but what I do know is what works and is biologically plausible, whilst being very argumentative.

What relevance does it have to language? Language is attention defined by time… the comma…
