A testcase of HTM with reinforcement learning

The problem that you’re describing is a sensorimotor and goal-oriented behavior problem.

Sensorimotor means that the snake must model changes that it observes that are caused by its own actions, as opposed to external changes that it passively observes.

Goal-oriented behavior means the snake must additionally drive motor commands in pursuit of some goals. In your case, there are positive and negative signals and the snake must learn to drive sequences of motor commands to get to the positive rewards.

NuPIC does not currently have models for sensorimotor or goal-oriented behavior. Numenta and others are actively working on advancing HTM theory and extending our understanding to include these components of cortical function.

If you don’t want to wait do theory advanced that could take years, there are some ways to set up models to solve the snake problem. You can try using the experimental sensorimotor model in nupic.research:
https://github.com/numenta/nupic.research/blob/master/htmresearch/algorithms/TM_SM.py

We can’t provide support for that code and we will be changing it substantially in the future but you can try it out. You can search that repository for example code that uses it. The idea would be to learn a sensorimotor representation of your grid world and then when you want to drive behavior you can simulate different motor sequences and look at the predictions.

4 Likes