I was recently reading Stephen Wolfram’s remarks on his website about ChatGPT, where he says it is discovering semantic rules and that language may follow rules of meaning we didn’t know about. As far as I understand, ChatGPT works by predicting the next word - just one word at a time, based on the words it has already generated. So, like HTM, it is a sequence predictor. Somehow, learning to predict (ChatGPT uses backpropagation) creates an internal body of knowledge. One problem researchers are running into is deciphering the internal rules (or models) that ChatGPT has discovered. So this leads to a question - does HTM build an internal model? If so, is that model easier to decipher than the connections of a transformer or a deep learning net? Could HTM learn language?
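To make sure I’m describing the “one word at a time” part correctly, here is roughly the loop I have in mind (just a toy sketch; `score_next` is a random stand-in for the trained network, not anything from the real model):

```python
# Toy sketch of autoregressive (next-word) generation.
import random

def score_next(context, vocab):
    # Stand-in scorer: a real model assigns a probability to every vocabulary
    # token, conditioned on the whole context generated so far.
    return {tok: random.random() for tok in vocab}

def generate(prompt, vocab, n_words=5):
    words = prompt.split()
    for _ in range(n_words):
        scores = score_next(words, vocab)
        words.append(max(scores, key=scores.get))  # greedily pick the top word
    return " ".join(words)                         # each pick is fed back in

# The output is nonsense here because the scorer is random; the point is the loop.
print(generate("the cat sat on the", vocab=["mat", "dog", "ran", "the"]))
```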
I’d say the answer is neither yes nor no. Neither system has the capability to learn an explicit world model, if by HTM you mean SP+TM.
A world model is like a simulation. It allows the system to infer facts about the world (and consequently, about external systems) without needing to observe them directly. In that sense, both a transformer and an HTM have that sort of ability to some degree. But that “world model” is a severely weaker version and is very much implicit.
The transformer has a very rich capability to learn various complex functions, and through backpropagation (gradient descent) it learns a very efficient factorized/decomposed representation of the data distribution. That helps it generalize well, because the constraint of making the factorizations efficient sometimes makes them coincide with, or at least closely resemble, semantics. But its world model (if you can call it that) is implicit and limited. Just ask ChatGPT to play chess with you: it’ll make a bunch of illegal moves very confidently, despite claiming to know the rules and being able to describe every rule very accurately. It’s very surreal.
The HTM is non-Markovian, meaning it doesn’t directly model the relationship between the current input and the immediate past N inputs (usually N=1 for Markovian models). For such a system to predict the next input, it has to have internal states. Specifically, HTM learns and recognizes the context of the input. But I doubt it has a good capability to understand the underlying structure of the inputs, compared to deep-learning systems such as the transformer, which do have such an ability to an extent (albeit a weak one), through the complex credit assignment achieved by gradient descent.
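To make the contrast concrete, here is a toy illustration (not Numenta’s temporal memory algorithm, just the idea): a first-order predictor keyed only on the current symbol can’t disambiguate a shared subsequence, while a predictor that carries context as internal state can.

```python
# First-order (Markovian) prediction vs. context-carrying prediction.
from collections import defaultdict

seqs = [list("ABCD"), list("XBCY")]   # same middle "BC", different endings

# 1) First-order model: the prediction depends only on the current symbol.
first_order = defaultdict(set)
for seq in seqs:
    for cur, nxt in zip(seq, seq[1:]):
        first_order[cur].add(nxt)
print(first_order["C"])               # {'D', 'Y'} (order may vary): ambiguous

# 2) Context model: the internal state is the whole prefix seen so far.
contextual = defaultdict(set)
for seq in seqs:
    for i in range(1, len(seq)):
        contextual[tuple(seq[:i])].add(seq[i])
print(contextual[("A", "B", "C")])    # {'D'}: the context resolves the ambiguity
print(contextual[("X", "B", "C")])    # {'Y'}
```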
So, could both systems model a relationship that coincides with the “rules of a simple external system” (a world model)? Yes.
Do they actively try to discover and deliberately exploit the “world model”? Ehh, not so much.
I think the Thousand Brains Theory has components of learning the world model, though.
P.S. Chain of Thought (“think step by step”) in LLMs (transformers) has some components of exploring the world model.
Mark,
The question you are asking applies to a higher level assembly of the current HTM model.
I have collected what I think the individual columns are doing in HTM models here:
Numenta has started to branch out from this basic model in two directions: one is in the interactions with the thalamus, and the other is what happens when you add lateral connections. Both of these lead to higher-level behaviors.
The TBT (Thousand Brains Theory) is an exploration of the lateral-connection concept. I don’t recall seeing a paper based on the interactions with the thalamus.
I have not seen additional published explorations of either concept from Numenta, but I am not privy to what goes on there, and there have not been many new research meetings published lately.
That said, I am coming to the conclusion that at a higher level (the H of HTM), each map/region/area of the brain works on the principle of pattern completion using these local lateral connections. Each fiber tract carries the ‘output’ from one map to another. The computational aspect comes in when two or more tracts impinge on the same target: the computation is the pattern that best satisfies the partial patterns arriving at that target.
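To make “the pattern that best satisfies the partial patterns arriving at that target” concrete, here is a toy Hopfield-style sketch - my shorthand for the idea, not a claim about the actual cortical wiring: two tracts each deliver a fragment, and the lateral weights settle the target onto the best-matching stored pattern.

```python
# Local pattern completion from partial inputs, Hopfield-style.
import numpy as np

stored = np.array([[1, -1, 1, -1, 1, -1],
                   [1, 1, -1, -1, 1, 1]])   # patterns this "map" has learned (+/-1 units)
W = stored.T @ stored                       # Hebbian lateral weights
np.fill_diagonal(W, 0)

tract_a = np.array([1, -1, 0, 0, 0, 0])     # fragment arriving on one input tract
tract_b = np.array([0, 0, 1, -1, 0, 0])     # fragment arriving on another tract

state = np.sign(tract_a + tract_b)          # superimpose the arriving fragments
for _ in range(10):                         # settle toward the best-matching pattern
    state = np.sign(W @ state)

print(state)                                # -> [ 1 -1  1 -1  1 -1], the first stored pattern
```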
In one direction, you have the senses that flow from the parietal lobe to the temporal lobe, where these senses are registered as ‘awareness.’ As time progresses, these are assembled at the highest levels as episodes and registered in the EC/hippocampus.
Going the other way, starting with projections of needs from the subcortex into the forebrain, you have the connection streams flowing the opposite way. These projections into the sense stream serve to drive attention and recall. These interactions also flow up the same pathways as the senses and are also registered in the temporal lobe/EC/hippocampus, converting ‘awareness’ into ‘consciousness.’
There are about 100 maps/regions/areas, with the arrangement of the many connecting fiber tracts being explored in the connectome project. The thalamus plays an important role in awareness (gating surprise) and spreading activation. The conjunction of maps into a single activation roughly corresponds to the global workspace model. I think of this as symbols being assembled into a word or sentence. A given global pattern roughly corresponds to the instantaneous ‘contents of consciousness.’
I see this bidirectional interaction (using local pattern completion) as the basis for speech production, and I see this as having much the same dynamics and properties as large language models.
I have skipped over the role of the subcortex, episodic training during sleep, how the pallidum does the routing of activation, the sequential nature of processing in the system I am describing, and numerous other details, but this post is already at the TLDR point, and adding more will lose more readers.
In summary, HTM may well be part of a model that could learn language like an LLM, but as a sub-component, and not at the level most of the people on the forum are currently working at.
globus pallidus? Globus pallidus - Wikipedia
Nice summary. IMO model building is the critical factor in AGI and the one big thing missing from any current AI, no matter how clever it seems on the surface.
Animal brains build models of the real world; smarter brains build better models. We can even build ‘models of models’ to figure out why other brains make the choices they do.
Models are inherently physical and can be projected forward and backward in time, so they depend on location and time senses.
So are you aware of any work being done in this area?
We may be lucky and find that this is new ground - finally - something new to discover!
But yes, to see how that is performed, you have to keep in mind the behaviors of a mass of HTM-like units. With lateral connections of limited size but a reasonable distribution of contact lengths, many relationships can be established with incoming fiber tracts.
Consider the outputs of the early-stage forebrain: following learned mappings, motor plans are selected and sequenced (details elsewhere). Some of the motor fibers project to the parts of the cortex that select parts of learned patterns that are labeled as memories of a sequence. This makes more sense when you think of how states are represented in the cortex.
More plainly, the forebrain can impose/command some key into the stored memories and “pin” that part of a pattern. The surrounding maps try to compute the lowest-energy match with as much of the pattern as they can see. If there is a continuous pattern that resonates with the surrounding patterns, they act as accelerators for each other. This ripples along the same paths as sensory patterns, and is sensed at the temporal lobe, digested, and passed to the subcortex.
Some handwavy magic happens in the subcortex as it digests part of the pattern - as the pattern is decoded and recognized as part of the solution the subcortex is seeking, a pattern is selected, that pattern is deployed, the focus (the pinned cortex) shifts, and a new pattern forms around it. The search is a sequential process that continues until a goal state is recognized/accepted by the subcortex.
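Reusing the same toy pattern-completion setup from my earlier post, here is a sketch of that pin-and-settle search loop: clamp (“pin”) a few units, let the lateral weights settle the rest, and have a goal test decide whether to accept the result or shift the focus and try again. Purely illustrative - not real anatomy, and not Numenta’s code.

```python
# Pin part of a pattern, settle the rest, test against a goal, shift focus.
import numpy as np

stored = np.array([[1, -1, 1, -1, 1, -1],    # stored memory A
                   [1, 1, -1, -1, 1, 1]])    # stored memory B
W = stored.T @ stored                        # Hebbian lateral weights
np.fill_diagonal(W, 0)

def settle(pin_idx, pin_vals, steps=10):
    """Clamp ('pin') the chosen units and relax the others via the weights."""
    state = np.zeros(W.shape[1], dtype=int)
    state[pin_idx] = pin_vals
    for _ in range(steps):
        state = np.sign(W @ state)
        state[pin_idx] = pin_vals            # the pinned units never change
    return state

goal = stored[1]                             # the state the goal test will accept
for pin in ([0], [0, 2]):                    # the pinned focus shifts between rounds
    result = settle(pin, goal[pin])
    ok = np.array_equal(result, goal)
    print(pin, result, "accepted" if ok else "rejected -> shift focus")
    if ok:
        break

# Pinning only unit 0 is ambiguous (both memories share it), so the completion
# stays partial; pinning units 0 and 2 disambiguates and the goal is accepted.
```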
As goals evolve, the process continues. A sort of Bayesian navigation.
As far as the forebrain is concerned, walking, talking, and thinking are all motor programs to bring about some goal for the subcortical overlord.
Similar to this:
Substituting HTM behavior for the “oscillation”, and adding in the thalamus to coordinate the various activated areas, you can see the similarities to what I have been promoting above.
The various areas acting together are the H of HTM.