The definition of self

When talking about RL systems that use an SMI model, how would you define your RL problem with respect to the boundary between the agent and the environment?

RL practitioners classically define the “body” as part of the environment. But where does the body begin and end? Should the SMI model be considered part of the environment? Does that contradict the embodied cognition framework, which treats mind and body as a coherent whole? Do those questions really matter for progress in combining RL & HTM?

Extract from the “bible” of RL by Sutton & Barto:

The boundary between agent and environment is typically not the same as the physical boundary of a robot’s or animal’s body. Usually, the boundary is drawn closer to the agent than that. For example, the motors and mechanical linkages of a robot and its sensing hardware should usually be considered parts of the environment rather than parts of the agent. Similarly, if we apply the MDP framework to a person or animal, the muscles, skeleton, and sensory organs should be considered part of the environment. Rewards, too, presumably are computed inside the physical bodies of natural and artificial learning systems, but are considered external to the agent.
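To make that boundary concrete, here is a minimal toy sketch of the standard agent–environment loop with the line drawn where Sutton & Barto describe it: only the decision-making policy sits inside the agent, while the motors that execute actions, the sensors, and the reward computation all live on the environment side. All class and variable names here are invented for illustration, not taken from any real RL library.

```python
class Environment:
    """Includes the robot's body: motors, sensors, and reward computation."""
    def __init__(self):
        self.position = 0.0
        self.goal = 5.0

    def step(self, action):
        # The "motors and mechanical linkages" execute the action here,
        # inside the environment, per the boundary quoted above.
        self.position += action
        observation = self.position               # sensing hardware
        reward = -abs(self.goal - self.position)  # computed outside the agent
        return observation, reward

class Agent:
    """Only the decision-making part sits inside the agent boundary."""
    def act(self, observation):
        # Trivial hand-written policy: move toward the goal.
        return 1.0 if observation < 5.0 else -1.0

env, agent = Environment(), Agent()
obs, reward = env.step(0.0)
for _ in range(10):
    obs, reward = env.step(agent.act(obs))
```

Under this convention, moving the boundary (e.g. treating the SMI model as part of the agent rather than the environment) just changes which code lives in which class; the interaction loop itself is unchanged.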


The boundary is pretty fuzzy.
There is considerable evidence that your interactions with “personal space” are modeled in the parietal cortex; the location of this map in the processing stream is about midway between vision and the somatosensory area. This is the space you can direct your limbs to touch. There does seem to be a visually guided component to this learned space.

I agree with what Mark is saying above. I think we can also think of this from the Thousand Brains point of view. If your whole isocortex is performing the same type of “object modeling” (and I am using object in the absolute vaguest way here), you must include the pre-frontal cortex, where it seems there are egocentric reference frames, meaning a representation of the space where the agent’s sensors exist with respect to each other.

There is a sense of self in this space, but like Mark said, it’s very fuzzy. It’s not just where your skin ends and air begins. It’s a sphere of influence in reality, where actions have immediate effect (remember we stored this with respect to our own movements). This same sense of space can be said to extend to non-physical realms as well, like language and social interactions. Our sense of self transcends physical reality, reaching across brains and across society; it includes some definition of all our social abilities, unseen, yet just as important as our physical abilities.

All that being said, the same framework where these abstractions exist must at its core represent a much simpler survival mechanism: a model of one’s physical self in physical reality. We must deal with the “self” defined as an object in egocentric space, but with all the additional complexities of the abstract abilities and behaviors of “self” as an object with behaviors.

In a way, the most detailed object you model is yourself. You have the most first-hand experience with this object.


I think that it begins in the vestibular system, and radiates out from there.

In part of this system, we have a gyro-stabilized reference platform (the vestibular system) that is directly mapped to the eye tracking system to keep the eyes from being distracted by self-motion. At that point, in that little spot on the brain stem, is the closest thing that, with the correct instrumentation, you could measure as the neural correlates of a sense of self.

To help those joining the conversation, this thread walks through this basic chain of connections, with the second post covering the details of expression in the posture system.

We learn to extend our control out from this frame of reference with the ability to project our agency “into” attached systems. Watch a skilled gamer or a backhoe operator to see this extension of agency in action. For that matter, typing or playing an instrument can reach the point where it is an extension of your body.


I’m sorry I know little about RL and SMI, so maybe I should just keep my mouth shut about this… but I’m not going to.

I think the general principle is that everything is part of the environment if you look at it at the right scale.

The body has an environment (which is the environment), the brain has an environment (which includes the body), indeed, it mostly is the body. The brain sends messages to the body, it doesn’t send messages anywhere else. The body sends messages to the brain. The cortex sends data to the rest of the brain, so the rest of the brain is its environment, etc, etc all the way down to the neuron.
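That nesting can be sketched directly: each layer treats everything outside itself as its environment, all the way down. A minimal illustration, with the layer names taken from the paragraph above and everything else invented:

```python
def make_layers():
    # Innermost to outermost: each layer's environment is everything
    # outside it in this ordering.
    return ["neuron", "cortex", "brain", "body", "world"]

def environment_of(layers, index):
    """Everything outside layer `index` is that layer's environment."""
    return layers[index + 1:]

layers = make_layers()
print(environment_of(layers, 1))  # cortex's environment: ['brain', 'body', 'world']
print(environment_of(layers, 3))  # body's environment: ['world']
```

The point of the sketch is only that “agent vs. environment” is a choice of index, not a fact about the system.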

So it’s all about the scale you’re looking at. I’m sure this post is really annoying because it’s pure principle, and real-life implementations of technologies lag behind pure principles, so purely principled arguments seem to mean very little to any real-life practitioner. But I think the more we keep pure principle in mind, the better we can engineer new systems to match it. It doesn’t do much for those working in existing systems, sorry.

This is why I’ve been obsessed with learning the structure and design of the ‘smallest unit’ of intelligence such that an amalgamation of these smallest units creates macro images of themselves and all their capabilities at larger scales.

Think of, say, a motorcyclist. He cannot afford to make any big mistakes while learning how to ride. How much to lean when turning has to be learned, but he really doesn’t want to fail. So, as we know from HTM, there’s a hierarchy: the macro models predict a broad range of how far to lean over the duration of the turn, the sub-macro models converge on a range mostly within that one, and the micro models converge and fine-tune, moment to moment, on minute alterations of what has already been agreed upon.

The hierarchy itself mirrors this scale hierarchy of boundaries between self and other, entity vs environment.

And learning is distributed: when leaning into a turn next time, the macro models predict a narrower range of leaning, not only because they’ve done it before successfully, but because they were able to see how far off they were by witnessing how their predictions were altered by the consensus of the lower-level micro models. I think we have to capitalize on this hierarchical feedback loop for machines to start learning from as few observations as humans need. That essentially comes down to treating different scales of the hierarchy as external environments in order to learn from them in a reinforcement pattern.
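The macro→sub-macro→micro narrowing described above can be sketched as nested range refinement. This is a toy illustration only: all numbers are invented, and `refine` is a hypothetical helper, not anything from HTM code.

```python
def refine(rng, factor, target):
    """Shrink a (lo, hi) range by `factor`, centered on `target`,
    clamped so the center stays within the parent's range."""
    lo, hi = rng
    mid = min(max(target, lo), hi)
    half = (hi - lo) * factor / 2
    return (mid - half, mid + half)

true_lean = 32.0                      # degrees; unknown to the models
macro = (0.0, 60.0)                   # coarse prior over lean angles
sub = refine(macro, 0.5, true_lean)   # sub-macro converges inside it
micro = refine(sub, 0.2, true_lean)   # micro fine-tunes moment to moment

# Feedback loop: the macro level narrows its prior for the next turn
# toward the consensus of the micro level.
macro_next = refine(macro, 0.5, sum(micro) / 2)
```

Each level only ever searches inside the range handed down from above, and the top level updates from the bottom level's consensus, which is the "hierarchical feedback loop" in miniature.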