A thousand agents hypothesis

In HTM the brain is made up of a hierarchy of “learner modules” (aka columns), connected hierarchically; each of them has both a narrow receptive window from which it gets its inputs and a narrow purpose: to detect & learn whatever spatio-temporal patterns its inputs are exposed to.
What is intriguing, however, is that each column is a sensory-motor assembly in itself: it is fitted not only with inputs (receptive fields) but also with action outputs, and the whole HTM theory hypothesizes that the actual motions of the animal machine are the result of a voting process.

On the other hand we have reinforcement learning, with a different perspective: there’s an agent that, by whatever algorithm (called a policy), interacts with its environment through observations, rewards and actions.
Observations represent a narrow perceptive window into the environment, actions represent what the agent does, and rewards… well, rewards are trickier, in the sense that in RL a reward is treated as a simple scalar measure of the gains/losses the agent receives from the environment.
I think the real world isn’t wired that way; rewards are internally generated signals within the agent that:

  • serve to direct the agent’s actions toward internally valued targets,
  • aren’t one big reward with fixed rules, but many,
  • fluctuate with context and/or the agent’s internal needs,
  • in intelligent creatures, can to some degree be reshaped by the agent itself.
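A toy sketch of what such internally generated rewards could look like. The drive names, the weights, and the `internal_reward` function are all made up for illustration; the only point is that the same observation yields different rewards under different internal states:

```python
def internal_reward(observation, needs):
    """Sum several internal drive signals; the weights shift with current needs."""
    signals = {
        "food":   needs["hunger"] * observation.get("food_seen", 0),
        "safety": needs["fear"] * -observation.get("threat_seen", 0),
    }
    return sum(signals.values())

# Same observation, different internal states, different rewards.
obs = {"food_seen": 1, "threat_seen": 1}
hungry = internal_reward(obs, {"hunger": 1.0, "fear": 0.1})   # food dominates
scared = internal_reward(obs, {"hunger": 0.1, "fear": 1.0})   # threat dominates
print(hungry, scared)
```

So there is no single fixed reward rule: the “reward” is recomputed from many signals every time the internal needs change.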

But the reward is a digression at this point; I’ll return to the hypothesis.

What is interesting to notice first is the symmetric feedback loop in RL’s perspective on an agent and its environment: the agent responds with actions toward the environment as a result of observations provided by the environment, and so on, in a never-ending cycle of action -> observation -> action -> observation.
A game played between agent and environment.
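That loop, in a minimal purely illustrative sketch. The `ToyEnvironment`/`ToyAgent` names and the trivial 1-D world are my own assumptions, not any standard RL library:

```python
class ToyEnvironment:
    """A 1-D world: the agent's position, with a goal it should reach."""
    def __init__(self, goal=5):
        self.goal = goal
        self.position = 0

    def step(self, action):
        self.position += action                    # action in {-1, 0, +1}
        observation = self.position                # what the agent gets to see
        reward = -abs(self.goal - self.position)   # scalar reward, RL-style
        return observation, reward

class ToyAgent:
    """A trivial hand-written policy: walk toward a known goal position."""
    def __init__(self, goal_hint=5):
        self.goal_hint = goal_hint
        self.last_obs = 0

    def act(self):
        if self.last_obs < self.goal_hint:
            return +1
        if self.last_obs > self.goal_hint:
            return -1
        return 0

    def observe(self, observation):
        self.last_obs = observation

env, agent = ToyEnvironment(), ToyAgent()
for _ in range(10):                # the "never-ending" cycle, truncated
    action = agent.act()
    obs, reward = env.step(action)
    agent.observe(obs)

print(env.position)  # prints 5: the agent has walked to the goal
```

The two `step`/`act` calls are the whole game; everything interesting hides inside the policy.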

What if we regard the “thousand brains” we have under the skull as many individual RL agents, each with its own individual policy (algorithm), each interacting with its own internal environment?

The obvious questions are why and how. A slightly subtler question is: OK, we have a pretty good idea of how to make RL agents play in various environments, and we can hope to somehow make them swarm & coordinate, but where do we get/represent the whole complexity of our actual environment for each individual agent, and how should their “collaboration” happen?

Regarding why - we all know (or assume) our intelligence is capable of building an internal, interactive model of the world.
As mammals we can simulate internally what-if scenarios of actions and interactions with the “real world” before actually interacting with it.

One big question of intelligence is how this model happens: how does it work, how is it built?

And here’s the hypothesis:
Under the lid, we’re not a single agent as we perceive ourselves, but a “me” + thousands of other agents. Aka otherlings, or actors if you like.

Why? Because, since we already have one RL agent’s code available, we (== biology) can make many copies of it, each copy’s purpose being to simulate the behavior of every piece (aka thing) of the environment we become aware of.

Long story short, each tiny agent’s purpose is to model the looks & behavior of a “real-thing-or-who” (hence the otherling term), and this way we get rid of “environments” per se: each otherling’s observations are simply pooled from the actions of its neighbor otherlings in the current arena or scene.
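One way to sketch this pooling idea. The `Otherling` class, its toy policy, and the three named agents are all hypothetical; the only point is that there is no separate environment object, each otherling just observes what its neighbors did:

```python
class Otherling:
    """Models one 'thing'; its observation is pooled from its neighbors' actions."""
    def __init__(self, name, bias):
        self.name = name
        self.bias = bias      # stands in for whatever this otherling has learned
        self.action = 0.0

    def act(self, observation):
        # toy policy: drift toward the mean of what the neighbors just did
        self.action = 0.5 * observation + self.bias
        return self.action

agents = [Otherling("ball", 1.0), Otherling("hand", -1.0), Otherling("table", 0.0)]

for _ in range(20):
    for a in agents:
        # no environment: each agent observes the pooled (mean) action of the others
        others = [b.action for b in agents if b is not a]
        a.act(sum(others) / len(others))

print([round(a.action, 3) for a in agents])
```

With this toy policy the little society settles into a stable joint pattern (roughly 0.8, -0.8, 0.0), which is the flavor of “scene” the hypothesis is after: the scene is nothing but the agents watching each other.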

The Cartesian theater might not be as bad an idea as most cognitive experts believe. Maybe it wasn’t the model of the theater itself that was wrong, but rather the assumptions about it.


(I haven’t read the post carefully (yet), but) I immediately thought of Daniel Dennett’s Multiple drafts model, which includes:

A wide variety of quite different specific models of brain activity could qualify as multiple drafts models of consciousness if they honored its key propositions:

  1. The work done by the imaginary homunculus in the Cartesian Theater must be broken up and distributed in time and space to specialized lesser agencies in the brain.
    [+ 3 more key propositions]

I always had the intuition that it has to work like this. A higher area’s environment is the lower areas, and it can influence its input via feedback; you can interpret that as a context signal or as an “action”.
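That interpretation could be sketched like this, with the lower area playing the role of the higher area’s environment and the feedback playing the role of its “action”. The class names and the toy threshold rule are invented purely for illustration:

```python
class LowerArea:
    """The higher area's 'environment': what it reports upward depends on feedback."""
    def __init__(self):
        self.bias = 0.0

    def process(self, stimulus):
        return stimulus + self.bias      # feedback from above biases the output

    def receive_feedback(self, feedback):
        self.bias = feedback

class HigherArea:
    """Observes the lower area and 'acts' on it via feedback."""
    def step(self, lower, stimulus):
        observation = lower.process(stimulus)
        feedback = 0.5 if observation < 1.0 else -0.5   # boost weak, damp strong
        lower.receive_feedback(feedback)
        return observation

lower, higher = LowerArea(), HigherArea()
obs = [higher.step(lower, 0.8) for _ in range(4)]
print(obs)
```

The same action -> observation cycle appears, just pointed inward: the higher area alternately boosts and damps what the lower area sends it.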

One thing that bugs me, and I’d like an answer to, is: how big is this individual agent? Is it a minicolumn or a hypercolumn? An entire area like V1?

If it’s a hypercolumn, how big is it? Would it be closer to 700, 1k, 5k or 100k neurons? I’ve seen people claim every possible value between 700 and 100k.


My thought on this is a little different. I think our “perception” of a model is just a side effect of the many “otherlings” (using your phrase) creating their own forecasts, of which, through attention/inhibition, we only perceive the elements (correlated/winner clusters of otherlings) that we can externally relate to. What goes on under the hood, so to speak, we just can’t relate to from an external perspective; so when attention focuses on (and inhibition prunes down to) a winner “model”, it is one that fits the external world we relate to. The perception of the model is a creation within a separate area that only accepts patterns coherent with external reality, unless chemically induced with drugs to go on a trip.

The other part of this is that each otherling operates within its own time domain, which may show up as a correlation of sorts with neural cluster size.


And there is this idea of general intelligence as a more general property than we currently conceive of.


Nothing really new here, just repackaged. A truly excellent explanation of biological self-organization. Side note: Levin is at Tufts; think Dennett.


Nice little piece, although it doesn’t go all that far. But I agree.

The core theme of intelligence (it seems to me) is pretty obvious: finding solutions to the problem of surviving, and doing it way faster. If evolution by mutation takes thousands of generations and adaptation (in the sense of reusing existing genes) takes tens, intelligence and learning can get there inside one. And instead of passing knowledge on in DNA, the intelligent animal passes it on by teaching its young.

The human animal by intelligence alone can out-compete every species on the planet in just about every ecological niche. That’s the goal of evolution: survival, and intelligence is the fastest (only?) way to keep surviving when the rules keep changing.


Regarding the individual agent’s size: ideally it won’t matter, if a general teamwork algorithm can be discovered.
That would need a generic task division by which a single agent forks (or clones) itself into multiple instances such that

  • each forked instance starts with the initial agent’s knowledge, so it isn’t “dumb”,
  • yet it specializes in its own allotted sub-space of the original input space,
  • individual agents’ sub-spaces overlap in a way that covers all relevant areas without excessive redundancy,
  • the new “cluster” is externally perceived as a single agent.

From that perspective, an actual agent’s “size” simply influences how big its allotted sub-space is; sure, there would be an optimum, yet it should not be critical.

An important factor is how to decide whether an individual agent needs to fork & cluster or not.
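A rough sketch of such a fork & cluster step under assumed rules. The fork trigger (a load threshold), the mid-point sub-space split, and the overlap factor are all placeholders of mine, not a proposal for the real mechanism:

```python
class Agent:
    def __init__(self, subspace, knowledge=None):
        self.subspace = subspace                 # (low, high) slice of input space
        self.knowledge = dict(knowledge or {})   # copied, so forks aren't "dumb"
        self.load = 0

    def observe(self, x):
        self.load += 1

    def should_fork(self, max_load=100):
        # assumed trigger: fork when this agent sees too much traffic
        return self.load > max_load

    def fork(self, overlap=0.1):
        # split the sub-space in two, with a small overlapping band in the middle
        low, high = self.subspace
        mid = (low + high) / 2
        pad = (high - low) * overlap / 2
        return [Agent((low, mid + pad), self.knowledge),
                Agent((mid - pad, high), self.knowledge)]

root = Agent((0.0, 1.0), {"prior": "shared"})
for _ in range(150):
    root.observe(0.5)

cluster = root.fork() if root.should_fork() else [root]
print([a.subspace for a in cluster])   # two halves overlapping on [0.45, 0.55]
```

Note both children inherit the parent’s knowledge and their sub-spaces overlap, matching the bullet points above; what the right trigger is remains the open question.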


There are several research efforts on this theme, e.g. https://arxiv.org/pdf/2007.02382.pdf

(the authors have several papers & presentations)

Biological processes, corporations, and ecosystems – physically decentralized, yet in some sense functionally unified. A corporation, for example, optimizes for maximizing profits as it were a single rational agent. But this agent abstraction is an illusion: the corporation is simply a collection of human agents, each solving their own optimization problems, most not even knowing the existence of many of their colleagues. But the human as the decision-making agent is also simply an abstraction of the trillions of cells making their own simpler decisions. The society of agents is itself an


Well put.


Yes, I’ve seen this line of argument before, but I’m not convinced. It seems to me that the algorithms executed at the intracellular, extracellular, neuronal, columnar, cortical, human, tribe, company, state, planet level probably have little in common. Scale matters, if nothing else.


Well, yeah, it’s a moonshot. A good one IMO. The analogy with societies might be a stretch, yet “living” algorithms, being the result of evolution, might share a lot in common with each other, all being incremental variations of the simplest one(s), like the one slime mold uses:
A memory without a brain: How a single cell slime mold makes smart decisions without a central nervous system -- ScienceDaily


Minsky got a lot of mileage on that concept.


Well, it might be that some fundamental intelligence algorithms were discovered even before brains; neurons only inherited and adapted them.

Food for thought:

How is that level of problem-solving ability even possible?

Sorry, too long. Please summarise.

It’s pretty dense material, hard to summarize.

Touching on matters like

  • collective intelligence at different levels of living tissue/cells,
  • defining intelligence as the ability to solve the same problem in many ways, and how that applies to living stuff at many levels,
  • similarities in communication/organization between “regular” cells and neurons, showing that neurons are not fundamentally different, they just do the same stuff faster,
  • xenobots,
  • how, if you convince a tadpole to grow an eye on its ass instead of its head, somehow the new topology makes no functional difference: the retina cells and brain cells figure out a way to connect to each other, and the eye is functional,
  • how a planaria worm can learn some stuff, you cut off its head, and the headless part grows a new brain that remembers what was learned,
  • or how it can be made to grow a head with the shape of a different planaria species by hacking the inter-cell communication,
  • or how a frog, which normally cannot regrow a cut leg, can be convinced to regrow it by “imitating” the cell signals a salamander uses to regrow its broken leg.

There’s your spoiler.

It’s a scientific freak show, the best kind.


Thank you. But we already knew that a wide variety of organisms show at least some ability to choose between strategies, and learn from results, and that this contributes to survival and is thus preserved and amplified by evolution over countless millions of years.

The bit we still really don’t know is (a) the means by which relatively uniform brain material does so many different things, and (b) how cortex got so powerful and evolved so fast in humans. I don’t think this presentation helps with that.


That was a spoiler, not a summary.
It is not about anything “we already knew”. We did not know it.

If you think an hour of your time is too precious, that’s fine, I won’t comment on a matter you decline to look at.


I kind of hypothesized this in the past. But instead of merely individual and unrelated agents, an agent spawns a new string/tree of neurons/agents after it has been primed for a specific set of inputs. This also means that an object can be represented by many agents, and these agents decide by consensus which of them is used to predict/classify the object.

On another related topic: if the brain operates at the quantum level, then it might be that these multiple agents are simply superpositions of itself.