I thought about this question all of yesterday, and I think I’ve got an answer.
What exactly is it trying to do? Learn the environment. Now, that’s an interesting goal because it’s always within a context - learn the environment in order to… gather resources, fight off invaders, find a mate, etc. But to do any of these things - or anything at all in the environment - you must first get a view of it: you must develop a distributed mental model of whatever in the environment can help you achieve your goal.
In ML I know we start with the goal (the context for why we should learn the environment) and then work backward, and I think that’s what you’re asking for. The issue I take with this is that, framed this way, the agent learns a policy, not the environment itself in general.
Seems to me that we don’t want our AGI to learn a policy. That seems like the wrong approach, because when a baby is born it doesn’t start learning a policy - what has it got to do? It has no goal except the implicit one encoded in its brain’s baseline structure, given to it by DNA: figure out how to make sense of this weird and wild data coming in from everywhere.
But of course, we’re not making babies here. I just think we should keep that in mind rather than focus entirely on the traditional policy-oriented frame of reference. We’re not actually trying to get the agent to learn a set of triggers and behaviors (a policy); we’re trying to get it to learn how to build an abstracted memory structure on top of which it can embed millions of learned policies.
Ok, but what does that actually mean for what goal we should give the agent? Because at some point - even if it’s programmed to walk around randomly for a while - it has to have a goal. Either it has to develop or infer a goal itself (I think that’s the ideal), or we have to give it one explicitly, but either way it must have a goal: what should it be?
I think it should be something akin to survival. That seems to be the baseline goal for all living things (at least until reproduction might overtake it), so let’s start there. I think another way to frame “survival” is “seeking the path of most options.”
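To make “the path of most options” a bit more concrete, here’s a rough toy sketch in Python (entirely my own illustration - names like `reachable_states`, `neighbors`, `step`, and `horizon` are made up, and it assumes a small discrete state graph, not any real agent design). The agent simply prefers the action whose resulting state keeps the most states reachable within a short horizon:

```python
from collections import deque

def reachable_states(start, neighbors, horizon):
    """Count distinct states reachable from `start` within `horizon` steps.
    `neighbors(state)` returns the states one action away (assumed interface)."""
    seen = {start}
    frontier = deque([(start, 0)])
    while frontier:
        state, depth = frontier.popleft()
        if depth == horizon:
            continue
        for nxt in neighbors(state):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, depth + 1))
    return len(seen)

def most_options_action(state, actions, step, neighbors, horizon=5):
    """Pick the action whose successor state keeps the most future options open."""
    return max(actions, key=lambda a: reachable_states(step(state, a), neighbors, horizon))
```

The point isn’t the code itself, just that “survival” can be cashed out as a measurable preference: stay in regions of the state-space where many futures remain open.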
And that, I think, gets us very close to an answer. If it’s ideal that the brain itself come up with the goal of survival, and if we posit that the SUI will form a hierarchical brain, and if the brain unfolds predicted futures into the lower layers as actual behaviors to perform to generate the higher-level predictions, then all we need to do is make the SUI tend towards paying attention to contexts with higher variety.
Let’s talk through an example to unpack that statement: the agent is in the environment, and information about the agent’s location in the environment (the environment state at the current timestep) flows into the lowest layers of the hierarchy. The information flows up. The highest layers understand, more or less, where the agent is (its location). From this, they know what other states it can get to (they have a union of predictions for where the agent could go, what it could see next). That union of predictions about the future gets sent down the hierarchy as the agent’s current context, by which it should interpret all new incoming data. If those predictions - the context - tend towards variety, the agent will be moved to see novel things: things it has not seen in a while, or ever. If the context is instead biased to predict things that are already known, the agent may get caught in a loop where it prefers to see the same thing over and over.
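As a toy illustration of that “bias the context toward variety” idea (again, just my own sketch - `visit_counts`, `variety_biased_context`, and the weighting scheme are invented for the example, not a claim about how the hierarchy actually works), imagine sampling which of the top layer’s predictions to attend to, weighted toward rarely-seen states:

```python
import random
from collections import Counter

visit_counts = Counter()  # how often each predicted state has been attended to before

def variety_biased_context(predicted_states, temperature=1.0):
    """Sample one prediction from the top-down union to attend to,
    favoring states the agent has seen least often."""
    weights = [1.0 / (1 + visit_counts[s]) ** (1.0 / temperature)
               for s in predicted_states]
    chosen = random.choices(predicted_states, weights=weights, k=1)[0]
    visit_counts[chosen] += 1
    return chosen
    # Flip the weighting (e.g. weight = visit_counts[s]) and you get the
    # opposite failure mode: the agent loops on what it already knows.
```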
Now, this is in no way completely understood by me, or anyone else, but what I do understand is that goals for the agent can (and ideally should) be internally generated as predictions made by the hierarchy itself - which is fundamentally created by the SUI, since the repeating smallest unit of intelligence makes up the entirety of the network.
Now, if none of that seems valid or makes sense, then I think there’s one other thing I could say that would be of service to the conversation: any goal we could possibly give it is technically none other than pathfinding. The agent lives in an environment. Manipulating a Rubik’s cube is no different from walking through a higher-dimensional maze. It’s all pathfinding to a particular state of the environment. Everything is. If the agent is in a sensorimotor feedback loop with the environment, then it is only ever pathfinding through the state-space of that environment.
So if we have to give it a goal like “gather all these things and put them in this room” that’s just a high-level abstraction of many pathfinding tasks to form one pathfinding task: find the state of the environment where all the things are in this room.
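In code terms (purely illustrative - the state representation and `neighbors` function are hypothetical), any such goal reduces to a predicate over environment states plus a search for a path to some state satisfying it:

```python
from collections import deque

def find_path(start, goal_reached, neighbors):
    """Generic pathfinding: breadth-first search from `start` to any state
    satisfying the predicate `goal_reached(state)`."""
    frontier = deque([[start]])
    seen = {start}
    while frontier:
        path = frontier.popleft()
        if goal_reached(path[-1]):
            return path
        for nxt in neighbors(path[-1]):
            if nxt not in seen:
                seen.add(nxt)
                frontier.append(path + [nxt])
    return None  # no reachable state satisfies the goal

# "Gather all these things and put them in this room" is just one such predicate:
# all_in_room = lambda state: all(loc == "this_room" for loc in state.item_locations)
# find_path(current_state, all_in_room, neighbors)
```

The Rubik’s cube, the maze, the tidy-the-room task - only the predicate and the state graph change; the operation is the same.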
That’s all the structure is doing when it comes to its interaction with the environment: pathfinding. So I’m not sure if that helps answer your question, but I think it’s important to understand that we can unify all seemingly different types of goals into one goal this way.
Lastly, just to get down to the nuts and bolts of what to actually do:
Early on I suggested that we simply put the agent in the environment for some time and allow it to interact with the environment according to its own nature, without a goal or policy. Only in that way can it explore the state-space of the environment according to its own policy - the one embedded in the structure of the hierarchy itself, determined by the makeup of its particular seed SUI. Then, after it’s explored a while, we can give it a goal of reaching a particular state and see how it goes about learning to achieve that goal (as opposed to seeing how it goes about merely achieving that goal).
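A bare-bones sketch of that two-phase experiment might look like the following (the `agent`/`env` interfaces - `act`, `reset`, `step`, the goal argument - are all assumed for illustration, not a spec):

```python
def run_experiment(agent, env, explore_steps, goal_state, max_goal_steps=1000):
    # Phase 1: free exploration, no external goal; the agent acts according
    # to whatever its internal structure (its seed SUI) prefers.
    state = env.reset()
    for _ in range(explore_steps):
        state = env.step(agent.act(state, goal=None))

    # Phase 2: hand it an explicit target state and watch *how* it learns
    # to reach it, not just whether it eventually does.
    steps_per_episode = []
    for _ in range(20):
        state, steps = env.reset(), 0
        while state != goal_state and steps < max_goal_steps:
            state = env.step(agent.act(state, goal=goal_state))
            steps += 1
        steps_per_episode.append(steps)  # a falling curve = it's learning how to learn the task
    return steps_per_episode
```

What we’d look at is the shape of that curve across episodes, not just the final score.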
That’s what we really care about if we want AGI - we want to build a mind that finds the easiest path to learning the widest variety of new things. We are not trying to build a mind that finds the easiest path to a goal.
I think we need to let the agent learn what it should learn to pay attention to, just as much as we have it learn what to pay attention to.
Maybe there is a way to give it explicit goals all the time and see how well it learns to learn. How do you think that might be implemented?