In HTM the brain is made up of a hierarchy of “learner modules” (aka columns), and each of them has both a narrow receptive window from which it gets its inputs and a narrow purpose: to detect & learn whatever spatio-temporal patterns those inputs expose it to.
What is intriguing, however, is that each column is a sensory-motor assembly in itself - it is fitted not only with inputs (receptive fields) but also with action outputs, and HTM theory hypothesizes that the actual motions of the animal machine are the result of a voting process across columns.
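To make the voting idea a bit more concrete, here is a toy sketch (not HTM’s actual mechanism, just an illustration of the shape of it): each column proposes an action from its own narrow slice of the input, and the motor output is whatever proposal gathers the most votes. All names here are made up for the example.

```python
import random
from collections import Counter

ACTIONS = ["left", "right", "forward", "stay"]

class Column:
    """Toy 'learner module': sees a narrow slice of the input, proposes an action."""
    def __init__(self, receptive_slice):
        self.receptive_slice = receptive_slice  # which part of the input this column sees

    def propose(self, full_input):
        view = full_input[self.receptive_slice]           # narrow receptive window
        return ACTIONS[hash(tuple(view)) % len(ACTIONS)]  # stand-in for a learned policy

def vote(columns, full_input):
    """Motor output = the action proposal with the most votes across columns."""
    proposals = [c.propose(full_input) for c in columns]
    return Counter(proposals).most_common(1)[0][0]

columns = [Column(slice(i, i + 4)) for i in range(0, 32, 4)]
sensory_input = [random.randint(0, 1) for _ in range(32)]
print(vote(columns, sensory_input))
```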
On the other hand we have reinforcement learning with a different perspective: there’s an agent that, by whatever algorithm happens to drive it (called a policy), interacts with its environment through observations, rewards and actions.
Observations represent a narrow perceptive window into the environment, actions represent what the agent does, and rewards… well, rewards are trickier: in RL they are treated as a simple scalar measure of gains/losses the agent receives from the environment.
I think the real world isn’t wired that way; rewards are internally generated signals within the agent that:
- serve to direct the agent’s actions towards internally valued targets,
- aren’t one big reward with fixed rules but many,
- fluctuate with context and/or the agent’s internal needs,
- and, in intelligent creatures, can to some extent be reshaped by the agent itself (see the sketch below).
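Here is a minimal, purely hypothetical sketch of such an internal reward: several drive-specific signals, each with a weight that depends on the agent’s internal state, and a hook for the agent to reweight its own drives. None of this is a standard RL interface; the drive names and fields are invented for illustration.

```python
class InternalReward:
    """Hypothetical internal reward: many drive-specific signals, not one fixed scalar rule."""
    def __init__(self):
        # each drive has its own evaluation rule and its own weight
        self.drives = {
            "hunger":    lambda obs, state: -state["energy_deficit"],
            "curiosity": lambda obs, state:  state["novelty"],
            "safety":    lambda obs, state: -obs.get("threat", 0.0),
        }
        self.weights = {"hunger": 1.0, "curiosity": 0.5, "safety": 2.0}

    def reshape(self, drive, new_weight):
        """Intelligent agents can (within limits) reweight their own drives."""
        self.weights[drive] = new_weight

    def __call__(self, obs, internal_state):
        # the reward handed to the learning algorithm is assembled inside the agent
        # and fluctuates with its internal state, not with the environment's rules
        return sum(self.weights[d] * f(obs, internal_state)
                   for d, f in self.drives.items())

reward_fn = InternalReward()
r = reward_fn({"threat": 0.1}, {"energy_deficit": 0.3, "novelty": 0.8})
```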
But rewards are a digression here; I’ll return to the hypothesis.
What is interesting to notice first is the symmetric feedback loop in RL’s view of an agent and its environment: the agent responds with actions towards the environment as a result of the observations the environment provides, and so on, in a never ending cycle of action->observation->action->observation.
A game played between agent and environment.
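In code, that game is literally a loop. A minimal sketch with placeholder interfaces (act, learn, reset and step are assumed names here, not any particular library’s API):

```python
def interaction_loop(agent, environment, steps=1000):
    """The never ending action->observation cycle, written as plain code.
    'agent' and 'environment' are placeholders with the usual RL interfaces."""
    observation = environment.reset()
    for _ in range(steps):
        action = agent.act(observation)                  # policy: observation -> action
        observation, reward = environment.step(action)   # environment answers with a new observation
        agent.learn(observation, reward)                 # and the agent updates its policy
```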
What if we regard the “thousand brains” we have under the skull as a lot of individual RL agents, each with its own policy (algorithm), each interacting with its own internal environment?
The obvious questions are why and how. A slightly subtler one is: ok, we have a pretty good idea of how to make RL agents play in various environments, and we can hope that somehow we can make them swarm & coordinate, but where do we get/represent the whole complexity of our actual environment for each individual agent, and how is their “collaboration” supposed to happen?
Regarding why - we all know (or assume) our intelligence is capable of building an internal, interactive model of the world.
As mammals we can internally simulate what-if scenarios of actions and interactions with the “real world” before actually engaging with it.
One big question of intelligence is how this model comes to be: how it works, how it is built.
And here’s the hypothesis:
Under the lid, we’re not the single agent we perceive ourselves to be, but a me + thousands of other agents - aka otherlings, or actors if you like.
Why? Because, since we already have the code for one RL agent available, we (== biology) can make many copies of it, each copy’s purpose being to simulate the behavior of one piece (aka thing) of the environment we become aware of.
Long story short, each tiny agent’s purpose is to model the looks & behavior of a “real-thing-or-who” (hence the otherling term), and this way we get rid of “environments” per se: each otherling’s observations are simply pooled from the actions of its neighbor otherlings in the current arena or scene.
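A very rough sketch of that arrangement (every name here is invented for illustration; this is not an established architecture): there is no environment object at all, and each otherling’s observation is just the pooled output of the other otherlings in the scene.

```python
import random

class Otherling:
    """Hypothetical tiny agent whose job is to model one 'thing' in the scene."""
    def __init__(self, name):
        self.name = name
        self.last_action = random.random()  # whatever this otherling 'did' on the previous tick

    def act(self, observation):
        # stand-in for a learned policy that imitates the real thing's behavior
        self.last_action = 0.9 * self.last_action + 0.1 * observation
        return self.last_action

class Scene:
    """No external environment: each otherling observes the pooled actions of its neighbors."""
    def __init__(self, otherlings):
        self.otherlings = otherlings

    def tick(self):
        for o in self.otherlings:
            neighbors = [p for p in self.otherlings if p is not o]
            observation = sum(p.last_action for p in neighbors) / len(neighbors)
            o.act(observation)

scene = Scene([Otherling(n) for n in ("cup", "table", "cat")])
for _ in range(10):
    scene.tick()
```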
The Cartesian theater might not be as bad an idea as most cognitive experts believe. Maybe it wasn’t the model of the theater itself that was wrong, but rather the assumptions about it.