A thousand agents hypothesis

In HTM the brain is made up of a hierarchy of “learner modules” (aka columns), each of which has both a narrow receptive window it gets its inputs from and a narrow purpose: to detect & learn whatever spatio-temporal patterns its inputs are exposed to.
What is intriguing, however, is that each column is a sensory-motor assembly in itself - it is fitted not only with inputs (receptive fields) but also with action outputs, and HTM theory hypothesizes that the actual motions of the animal machine are the result of a voting process.


On the other hand we have reinforcement learning with a different perspective: there’s an agent that, by whatever algorithm (called a policy), interacts with its own environment through observations, rewards and actions.
Observations represent a narrow perceptive window into the environment, actions represent what the agent does, and rewards… well, these are trickier, in the sense that in RL the reward is treated as a simple scalar measure of gains/losses the agent receives from the environment.
I think the real world isn’t wired that way; rewards are internally generated signals within the agent that:

  • serve to direct the agent’s actions towards internally valued targets,
  • aren’t one big reward with fixed rules but many,
  • fluctuate with context and/or the agent’s internal needs,
  • and, in intelligent creatures, can to some degree be reshaped by the agent itself (a minimal sketch follows this list).
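Here’s a minimal Python sketch of that reward picture, under my own assumptions: the names (Drive, internal_reward, the specific drives and context weights) are all illustrative, not anything from the post. The point is just that many internally generated drive signals, each modulated by context and by the agent’s own needs, get summed into the single scalar an RL algorithm would consume.

```python
# Hypothetical sketch: reward as a context-weighted mix of internal drive signals,
# not one fixed external scalar. All names and numbers are illustrative assumptions.
from dataclasses import dataclass

@dataclass
class Drive:
    name: str
    target: float   # internally valued set-point (e.g. desired energy level)
    weight: float   # importance of this drive; could itself be re-learned over time

def internal_reward(drives, state, context):
    """Sum of per-drive rewards, each modulated by the current context."""
    total = 0.0
    for d in drives:
        error = abs(state[d.name] - d.target)     # distance from the valued target
        modulation = context.get(d.name, 1.0)     # context / current need scales the drive
        total += d.weight * modulation * -error   # closer to target => higher reward
    return total

drives = [Drive("energy", target=1.0, weight=2.0),
          Drive("novelty", target=0.5, weight=0.5)]
state = {"energy": 0.6, "novelty": 0.9}
context = {"energy": 1.5}   # e.g. hunger makes the energy drive matter more right now
print(internal_reward(drives, state, context))
```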

But rewards are a digression here; I’ll return to the hypothesis.

What is interesting to notice first is the symmetric feedback loop in RL’s view of an agent and its environment: the agent responds with actions towards the environment as a result of the observations the environment provides, in a never-ending cycle of action -> observation -> action -> observation.
A game played between agent and environment.
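As a rough illustration of that loop, here is a self-contained Python sketch; the Environment and Agent classes are stand-in stubs of my own (not any particular RL library’s API), just to make the action -> observation cycle concrete.

```python
# Minimal sketch of the RL interaction loop described above: an endless
# cycle of observation -> action -> observation. Environment and Agent
# are stand-in stubs, not any real library's API.
import random

class Environment:
    def reset(self):
        return 0.0                      # initial observation
    def step(self, action):
        observation = random.random()   # what the agent gets to see next
        reward = -abs(action - observation)
        return observation, reward

class Agent:
    def act(self, observation):         # the policy: observation -> action
        return observation              # trivially echo what it saw

env, agent = Environment(), Agent()
obs = env.reset()
for _ in range(5):                      # in principle a never-ending cycle
    action = agent.act(obs)             # agent responds with an action...
    obs, reward = env.step(action)      # ...environment responds with a new observation
```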

What if we regard the “thousand brains” we have under the skull as a lot of individual RL agents, each with its own policy (algorithm), each interacting with its own internal environment?

The obvious questions are why and how. A slightly subtler question: OK, we have a pretty good idea of how to make RL agents play in various environments, and we hope that somehow we can make them swarm & coordinate, but where do we get/represent the whole complexity of our actual environment for each individual agent, and how should their “collaboration” happen?

Regarding why - we all know (or assume) our intelligence is capable of building an internal, interactive model of the world.
As mammals we can simulate internally what-if scenarios of actions and interactions with the “real world” before actually interacting with it.

One big question of intelligence is: how does this model happen, how does it work, how is it built?

And here-s the hypothesis:
Under the lid, we’re not a single agent as we perceive ourselves, but a “me” + thousands of other agents. Aka otherlings, or actors if you like.

Why? Because, since we already have the code for one RL agent, we (== biology) can make many copies of it, each copy’s purpose being to simulate the behavior of one piece (aka thing) of the environment we become aware of.

Long story short, each tiny agent’s purpose is to model the looks & behavior of a “real-thing-or-who” (hence the otherling term), and this way we get rid of “environments” per se by having each otherling’s observations simply pooled from the actions of its neighbor otherlings in the current arena or scene.
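One hedged way to picture that, as a sketch rather than an implementation: a scene is just a collection of otherlings, and each one’s observation at every tick is the pooled (here, averaged) last actions of its neighbors, with no separate environment object at all. The Otherling class, its state and the averaging rule are my own illustrative assumptions.

```python
# Sketch of the "no environment" idea: each otherling's observation is
# simply the pooled actions of its neighbours in the scene.
import random

class Otherling:
    def __init__(self, name):
        self.name = name
        self.last_action = random.random()   # what it "did" on the previous tick
    def act(self, observation):
        # stand-in policy: drift toward what the neighbours are doing
        self.last_action = 0.5 * self.last_action + 0.5 * observation
        return self.last_action

scene = [Otherling(f"otherling-{i}") for i in range(4)]

for tick in range(3):
    # each agent observes only the pooled (averaged) actions of the others
    pooled = []
    for agent in scene:
        neighbours = [o.last_action for o in scene if o is not agent]
        pooled.append(sum(neighbours) / len(neighbours))
    for agent, obs in zip(scene, pooled):
        agent.act(obs)
```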

The Cartesian theater might not be as bad an idea as most cognitive experts believe. Maybe it wasn’t the model of the theater itself that was wrong but rather the assumptions about it.


(I haven’t read the post carefully (yet), but) I immediately thought of Daniel Dennett’s Multiple drafts model, which includes:

A wide variety of quite different specific models of brain activity could qualify as multiple drafts models of consciousness if they honored its key propositions:

  1. The work done by the imaginary homunculus in the Cartesian Theater must be broken up and distributed in time and space to specialized lesser agencies in the brain.
    [+ 3 more key propositions]

I always had the intuition that it has to work like this. A higher area’s environment is the lower areas, and it can influence its input via feedback; you can interpret that as a context signal or as an “action”.

One thing that bugs me, and I’d like the answer to, is: how big is this individual agent? Is it a minicolumn or a hypercolumn? An entire area like V1?

If it’s a hypercolumn, how big is it? Would it be closer to 700, 1k, 5k or 100k neurons? I’ve seen people claiming it to be every possible value between 700 and 100k.


My take on this is a little different, as I think our “perception” of a model is just a side effect of the many “otherlings” (using your phrase) creating their own forecasts, of which, through attention/inhibition focus, we only perceive the elements (correlated/winner clusters of otherlings) that we can externally relate to. What goes on under the hood, so to say, we just can’t relate to from an external perspective, so when attention focuses/inhibits, the winning “model” is the one that fits with the external world we relate to. The perception of a model is then a creation within a separate area that only knows coherence of patterns with external reality - unless it is chemically induced with drugs to go on a trip.

The other part to this is that each otherling operates within its own time domain, which may show up as a correlation of sorts with the neural cluster size.


And there is this idea of general intelligence as a more general property than we currently conceive.


Nothing really new here, just repackaged. A truly excellent explanation of biological self-organization. Side note, Levin is at Tufts, think Dennett.

Nice little piece, although it doesn’t go all that far. But I agree.

The core theme of intelligence (it seems to me) is pretty obvious: finding solutions to the problem of surviving and doing it way faster. If evolution by mutation takes thousands of generations and adaptation (in the sense of reusing existing genes) takes tens, intelligence and learning can get there inside one. And instead of passing on knowledge in DNA, the intelligent animal passes it on by teaching the young.

The human animal by intelligence alone can out-compete every species on the planet in just about every ecological niche. That’s the goal of evolution: survival, and intelligence is the fastest (only?) way to keep surviving when the rules keep changing.
