I’ve been playing with this idea of an ultra-simplified sensory-motor engine, and I thought, surely someone has tried to make this already!
HTM, fully actualized, will be a sensory-motor inference engine: an agent that explores an environment and learns how the environment can be manipulated.
My question is, what projects already exist that try to achieve that goal? They would have to be less complex, because HTM theory isn’t fully conceived yet. So I’m assuming they’d have to use more naive techniques like basic pattern matching and route finding instead of the more intuitive approach to generalization that HTM promises. I’m thinking it might employ a method such as ‘unsupervised reinforcement learning without reward’ or something.
But anyway, does anyone know of any projects or products that might adequately be classified as an attempt at a generalized sensory-motor agent?
I don’t know of any projects doing that, but it’s an intriguing concept. I’d also be interested in any projects doing this. It could be fun to plug a sensory-motor agent into a small robot running on a Raspberry Pi or the like.
I have been working on exactly that for some time: operating an agent with limited sensors inside a 3D game environment simulated in real time. I guess you want to know about a possible approach:
You can use full-blown CLA units (layers) and build hierarchies out of them, with some sort of reinforcement learning (Q-learning or temporal-difference learning) biasing the current activation towards a chosen set among the predictions. You obviously need to implement some sensors to feed in the online data. You need both the spatial pooling and the temporal memory on your CLA layers; there is no simplifying it, in my experience. I tried and failed a lot when trying to simplify things.
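To make the reinforcement-learning part concrete, here is a minimal sketch of the kind of TD/Q-learning rule that can bias action selection (illustrative only, not my actual code; the `state_key` is assumed to be something hashable derived from the layer’s active columns, and the CLA layer itself is not shown):

```python
import random
from collections import defaultdict

class PredictionBiaser:
    """Q-learning used to bias which predicted outcome the agent tries to realize."""
    def __init__(self, actions, alpha=0.1, gamma=0.9, epsilon=0.1):
        self.actions = actions
        self.q = defaultdict(float)          # (state_key, action) -> estimated value
        self.alpha, self.gamma, self.epsilon = alpha, gamma, epsilon

    def choose(self, state_key):
        # Epsilon-greedy: mostly pick the action whose predicted outcome
        # currently looks best, occasionally explore.
        if random.random() < self.epsilon:
            return random.choice(self.actions)
        return max(self.actions, key=lambda a: self.q[(state_key, a)])

    def update(self, s, a, reward, s_next):
        # One-step Q-learning update (a temporal-difference rule).
        best_next = max(self.q[(s_next, b)] for b in self.actions)
        self.q[(s, a)] += self.alpha * (reward + self.gamma * best_next
                                        - self.q[(s, a)])
```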
The obvious challenge is that the agent first needs to learn how its motor behavior relates to its sensory changes before it can produce any voluntary behavior. This is doable up to some complexity through associative learning: you form synapses from the currently active cells towards the active motor cells that produced the behavior, which happens on its own at first. Then you need to be able to activate some of your predictions (depolarized cells) on your deciding layer through your reward circuitry (reinforcement learning). If the relation between motor activity and sensory change has been learned successfully, this forced activation will produce the behavior needed to actually see activation in the sensors similar to the wanted one.
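A rough sketch of that associative step (not my actual implementation; the class and parameter names are placeholders):

```python
import numpy as np

class SensorMotorAssociator:
    """Strengthen connections from active cells to the motor cells that just fired,
    so a wanted (predicted/forced) sensory pattern can later recall the motor
    pattern that tends to produce it."""
    def __init__(self, n_cells, n_motor, inc=0.05, dec=0.01):
        self.perm = np.zeros((n_cells, n_motor))   # permanence-like weights
        self.inc, self.dec = inc, dec

    def learn(self, active_cells, active_motor):
        # Hebbian-style update: reinforce cell->motor pairs that co-occur,
        # slightly decay the other outgoing weights of the active cells.
        self.perm[active_cells, :] -= self.dec
        self.perm[np.ix_(active_cells, active_motor)] += self.inc + self.dec
        np.clip(self.perm, 0.0, 1.0, out=self.perm)

    def recall_motor(self, depolarized_cells, k=1):
        # Given cells representing a wanted sensory state (e.g. a prediction
        # selected by the reward circuitry), return the k motor cells most
        # strongly associated with it.
        votes = self.perm[depolarized_cells].sum(axis=0)
        return np.argsort(votes)[-k:]
```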
This pipeline gets you some primitive behavior, depending on your sensors and motor actions. What you really want, though, is an agent that can form higher-level behavior over time out of these lower-level behaviors through temporal pooling, and so become more abstract. This is my main goal, and a strong reason to pick HTM for this problem.
Nice to see your post. I believe Numenta is currently actively researching S-M related problems, though I haven’t heard an update from them lately… =) I agree with you that it would be interesting to hear about others working in this area.
I’d like to point out that this is a hard problem to research. What environment does one use? It has to be complex but feasible. When I was at Numenta, they used a grid world to experiment with SM inference and considered (but ultimately shelved, I believe) using the Unity game engine. Your stated goal strikes me as ambitious, and surely it has to be broken down into smaller problems, e.g. 1) how to learn the effects of one’s actions vs. 2) action selection vs. 3) action execution. Additionally, which reinforcement learning algorithm, and how does it work with the HTM SP and TP?
Another question: has anyone played around with the OpenAI Gym? (Maybe @fergalbyrne and @ericlaukien have, based on this comment: HTM in OpenAI Gym.) I feel there’s likely an environment in OpenAI Gym suitable for S-M research.
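For reference, the interaction loop the Gym gives you is very small, so any agent (HTM-based or otherwise) just has to map observations to actions. A minimal sketch (FrozenLake is only one example of a small discrete environment, and the random action is a placeholder for a real agent; exact reset/step signatures depend on your Gym version):

```python
import gym

env = gym.make("FrozenLake-v0")      # small, discrete grid-world task
obs = env.reset()
done = False
while not done:
    action = env.action_space.sample()           # replace with agent.choose(obs)
    next_obs, reward, done, info = env.step(action)
    # agent.update(obs, action, reward, next_obs) would go here
    obs = next_obs
```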
Hi @mccall.ryan, yes we have been using the OpenAI Gym since it came out to test our RL “exoskeleton” for the Feynman Machine (that’s why @ericlaukien developed and submitted his PGE 3D engine as a replacement for the non-Open Source one that’s in there).
We’ll be releasing some sensorimotor/RL-centric software and a paper in the coming weeks, using a new encoder design which is better at both prediction and SM/RL than the one in the current paper.
Thinking about this, I’ve come up with what I hope are some answers to those specific questions.
I thought, if you want to solve this problem you want to isolate it, so for research purposes it might be easiest to create an environment which is:
fully observable, so that your agent can know everything that changes given a specific action,
and
deterministic, so that the agent won’t get confused by other actors in its environment mucking things up. You may even need it to be static, so that if the agent doesn’t send an action to the environment, nothing happens. This is not the case in most video games, where you have physics simulations; even in the OpenAI Gym playgrounds there is simulated gravity and other bots doing things, so I wouldn’t start there. I’d start with a static environment.
Instead of using the Unity game engine, I’d start with something simple and uniform like a number line, eventually move to 2D environments like grids and mazes, then to puzzles like the 8-puzzle and the Rubik’s Cube, and after that maybe move on to environments which aren’t entirely static, like Mario or whatever.
You could hook this up to a simulated number line where the only thing that changes in the environment is the agent’s location on the line: it can go up or down. You can let it learn all the transitions from 0 to 999 and then let it predict what might be above 999. If it comes up with the correct structure, repeating the base-10 rule set of symbols, then you’re good to go and can move on to a grid, etc.
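A sketch of what that number-line environment could look like, just to show how little is needed (names and the digit-based observation are my own choices, not a prescription):

```python
class NumberLineEnv:
    """Fully observable, deterministic, static: nothing changes unless the agent acts."""
    def __init__(self, low=0, high=999, start=0):
        self.low, self.high = low, high
        self.position = start

    def step(self, action):
        # action is +1 ("up") or -1 ("down"); the position is the whole state.
        self.position = max(self.low, min(self.high, self.position + action))
        return self.position

    def observe(self):
        # Expose the state as digits so the agent has a chance to discover
        # the base-10 structure (e.g. encode each digit separately).
        return tuple(int(d) for d in str(self.position).zfill(3))

env = NumberLineEnv()
print(env.step(+1), env.observe())   # 1 (0, 0, 1)
```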
Anyway, that’s just my thoughts on how I’d go about researching this problem or developing solutions.
1- My first try was based entirely on some abstract sparse distributed representation classifier with cascaded representations that would somehow capture the functionality of a CLA layer. Maybe it was possible, but I could not do it; it was too “handcrafted”.
2- The initial CLA architecture was composed of units without temporal memory. The idea was that on every frame the agent would “imagine” the outcomes of all the possible actions and pick the best of them. The same layer would be fed every possible action along with the current state, and the outputs of the spatial pooling would be compared to pick the one with the best reward.
The problem is that if you want to really interact with the world itself, there is no causality in this system. Yes, the agent picked the best action according to the patterns learnt by the spatial pooler, but it did not have any knowledge about the actual state the action would lead it to (the sensory change); it only knew how rewarding taking action a in state s was. You cannot really obtain behavior sequences and interact with the world.
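For clarity, that “imagine every action” scheme boils down to something like the following (a sketch only; `spatial_pool` and `estimate_reward` are hypothetical stand-ins for the SP output of the layer and whatever value estimate was learned):

```python
def pick_action(state_sdr, possible_actions, spatial_pool, estimate_reward):
    """Feed the same layer each candidate action merged with the current state,
    score the resulting columnar representation, and keep the best action."""
    best_action, best_value = None, float("-inf")
    for action in possible_actions:
        columns = spatial_pool(state_sdr, action)
        value = estimate_reward(columns)
        if value > best_value:
            best_action, best_value = action, value
    return best_action
```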
3- I tried temporal memory with a single cell per column. Multiple actions for the same state could not be represented with “enough” similarity: if you merge the motor activity and the current state as the input to spatial pooling, you cannot have state information independent from the action. You get different representations of the same state for different actions, which leads to exhaustive learning for every action-state tuple and lacks generalization. If you have multiple cells per column, then you can represent multiple actions on the same state without drastically changing the whole representation.
4- Then there were experiments without boosting or bumping mechanisms:
Without bumping (strengthening the synapse permanences of the proximal dendrites of the “lesser used” columns), you simply do not utilize every column, because the patterns appearing on the sensors are actually a fraction of what is possible in the input space. So you have to somehow adapt your column connections to those patterns.
Without boosting (artificially forced activation of lesser-used columns), early representation stability is better, but the columnar representations for similar states keep changing over time, so the increase in stability is slow. With boosting, the representational change is massive in the early phases, but it leads to better stability and more balanced column activation loads, though it takes time and hinders early learning. Keep in mind that the effect is massively correlated with the richness of the patterns on your sensors. Boosting is still turned off for the layers of the architecture where stability is critical.
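To show roughly what these two mechanisms do, here is an illustrative sketch in the spirit of a spatial pooler’s homeostasis step (the parameter names and exact formulas are mine, not NuPIC’s):

```python
import numpy as np

def update_homeostasis(duty_cycle, boost_factors, permanences,
                       min_duty=0.01, boost_strength=2.0, bump_amount=0.05):
    # Boosting: columns that are rarely active get their overlap scores
    # multiplied up so they can win inhibition and start learning.
    boost_factors[:] = np.exp(boost_strength * (min_duty - duty_cycle))
    boost_factors[duty_cycle >= min_duty] = 1.0

    # Bumping: columns whose duty cycle is too low get all their proximal
    # permanences nudged upward so they eventually connect to *something*.
    starved = duty_cycle < min_duty
    permanences[starved] = np.clip(permanences[starved] + bump_amount, 0.0, 1.0)
```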
5- There are also synapse decay mechanisms to prevent false positives, similar to those in the NuPIC implementation.
I was forced to implement these as my experiments became more complex. The representations of similar states change over time as the layers learn, but the predictions of those outdated representations stay. You have to get rid of these one way or another, or else the agent keeps trying to reach outdated states.
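A minimal sketch of that decay idea (again not the actual code): slowly weaken distal permanences that are never reinforced, so predictions of outdated representations fade instead of lingering.

```python
import numpy as np

def decay_segments(segments, reinforced_ids, decay=0.001, min_perm=0.0):
    """`segments` maps segment id -> numpy array of synapse permanences.
    Segments that were not reinforced this timestep lose a little permanence."""
    for seg_id, perms in segments.items():
        if seg_id not in reinforced_ids:
            perms -= decay
            np.clip(perms, min_perm, 1.0, out=perms)
```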
These are the important trials that I can remember.
I’ve done some experimentation in this area as well. I’ll be posting the details on the forum in the next few days or so, but to give you a quick idea of the application, I initially started with what @jordan.kay recommended: a deterministic system (no environmentally-caused state changes). The application is a maze that a robot can navigate using four directional motors, receiving inputs from four sensors when it encounters walls, as well as an input about its current position. It learns the maze by “watching” a human navigate it, and uses this input to build its temporal memory and initial RL policy. Switching out of “watching” mode, it can then use what it has learned to move through the maze and improve on the initial policy over time. The RL system I came up with is a bit unique and would probably start to get off-topic here, so I’ll talk about that in more detail, with some nice drawings, when I am ready to post my own thread.