Crafting a Reward Function with a Hierarchy of Needs for Complex Goal Formation
Tofara Moyo
Abstract
We present a simple method by which an agent learns levels of abstraction, ordered by priority, that ultimately increase the global expected reward. Each level is associated with a separate scalar output of the neural network at time step t, which is fed back to the agent as part of the state at time t+1. The agent initially correlates these scalars with features of the state at random, but learns the correct assignment because doing so increases the global reward. We describe an equation that orders these scalar values and the global reward by priority, inducing a hierarchy of needs for the agent. This hierarchy then forms the basis of the agent's goal formation.
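The abstract leaves the priority-ordering equation and training procedure to the body of the paper; the following is only a minimal sketch, in PyTorch, of the feedback wiring it describes: a network that emits one scalar per abstraction level alongside its action output, with those scalars recycled into the next step's input. All names (HierarchyOfNeedsAgent, need_heads, num_levels), the layer sizes, and the zero initialization of the scalars are illustrative assumptions, not the author's implementation.

```python
import torch
import torch.nn as nn

class HierarchyOfNeedsAgent(nn.Module):
    """Policy whose per-level scalar outputs are fed back into the next state."""

    def __init__(self, obs_dim: int, action_dim: int, num_levels: int = 3):
        super().__init__()
        self.num_levels = num_levels
        # Input is the raw observation plus the previous step's level scalars.
        self.body = nn.Sequential(
            nn.Linear(obs_dim + num_levels, 64),
            nn.Tanh(),
        )
        self.action_head = nn.Linear(64, action_dim)  # ordinary policy head
        self.need_heads = nn.Linear(64, num_levels)   # one scalar per level

    def forward(self, obs: torch.Tensor, prev_needs: torch.Tensor):
        x = self.body(torch.cat([obs, prev_needs], dim=-1))
        action_logits = self.action_head(x)
        needs = self.need_heads(x)  # fed back to the agent at time t+1
        return action_logits, needs

# One interaction step: the scalars produced at time t become part of
# the agent's input at time t+1.
agent = HierarchyOfNeedsAgent(obs_dim=8, action_dim=4)
needs = torch.zeros(1, agent.num_levels)  # initial scalars (assumption)
obs = torch.randn(1, 8)                   # stand-in for an environment observation
logits, needs = agent(obs, needs)         # `needs` is recycled at the next step
```

How the learned scalars are correlated with state features, and how the priority-ordering equation combines them with the global reward, is specified by the equation described in the paper body rather than by this sketch.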