A testcase of HTM with reinforcement learning

NuPIC implements a model that mimics the core way the brain responds to all kinds of input.

Presuming all of that works well, we get a system that can respond to sensor input yet stays passive, because it lacks a motive of its own.

Pain and desire are the compelling forces that drive living things to act.

So I built a test case, a snake game, that introduces pain and desire into the system, in the hope that they help the snake adjust its actions to fulfil its desire with the least pain.

In the game, the snake moves around a square divided into grid cells, in which food appears at random. If the snake catches the food, its desire score goes up; if it hits the wall, it feels pain. The vision, pain, desire, and direction control can all be easily encoded as SDRs.
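To make the setup concrete, here is a minimal sketch of such a grid world and one possible sparse encoding. It is not the code from the snake_nupic package and it does not use NuPIC's encoders; the grid size, the one-cell snake, and the names `step`, `encode_category`, and `encode_observation` are all assumptions made for illustration.

```python
import random

GRID = 5  # side length of the square grid (assumed value)
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}


def step(head, food, action, rng):
    """Advance a one-cell snake; return (head, food, desire, pain)."""
    dx, dy = MOVES[action]
    x, y = head[0] + dx, head[1] + dy
    if not (0 <= x < GRID and 0 <= y < GRID):
        return head, food, 0, 1  # hit the wall: pain, snake stays in place
    head = (x, y)
    if head == food:
        # Food caught: desire signal, and new food appears in a random free cell.
        free = [(i, j) for i in range(GRID) for j in range(GRID) if (i, j) != head]
        return head, rng.choice(free), 1, 0
    return head, food, 0, 0


def encode_category(index, n_categories, width=4):
    """Active bit indices for one category: a contiguous block of `width` bits."""
    assert 0 <= index < n_categories
    return set(range(index * width, (index + 1) * width))


def encode_observation(head, food, desire, pain, width=4):
    """Concatenate head cell, food cell, desire and pain into one sparse bit set."""
    cells = GRID * GRID
    fields = [
        (head[1] * GRID + head[0], cells),  # where the snake is
        (food[1] * GRID + food[0], cells),  # where the food is ("vision")
        (desire, 2),
        (pain, 2),
    ]
    active, offset = set(), 0
    for index, n in fields:
        active |= {offset + b for b in encode_category(index, n, width)}
        offset += n * width
    return active, offset  # active bit indices and total SDR size
```

Each field contributes exactly `width` active bits, so every observation has the same number of active bits regardless of what is sensed. The direction command could be encoded the same way as a four-category field.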

If this works as predicted, the system should evolve the snake into a food hunter, which would demonstrate a complete machine intelligence that adapts to its environment, avoids danger, and pursues its own desires.

I’ve been working with NuPIC for several days and still can’t figure out how to make the snake act on the vision, pain, and desire it senses. Any suggestions?

Below are Python scripts that illustrate the idea.

https://pypi.python.org/pypi/snake_nupic

The problem that you’re describing is a sensorimotor and goal-oriented behavior problem.

Sensorimotor means that the snake must model changes that it observes that are caused by its own actions, as opposed to external changes that it passively observes.

Goal-oriented behavior means the snake must additionally drive motor commands in pursuit of some goals. In your case, there are positive and negative signals and the snake must learn to drive sequences of motor commands to get to the positive rewards.

NuPIC does not currently have models for sensorimotor or goal-oriented behavior. Numenta and others are actively working on advancing HTM theory and extending our understanding to include these components of cortical function.

If you don’t want to wait for the theory to advance, which could take years, there are some ways to set up models to solve the snake problem. You can try using the experimental sensorimotor model in nupic.research:
https://github.com/numenta/nupic.research/blob/master/htmresearch/algorithms/TM_SM.py

We can’t provide support for that code, and we will be changing it substantially in the future, but you can try it out. You can search that repository for example code that uses it. The idea would be to learn a sensorimotor representation of your grid world and then, when you want to drive behavior, simulate different motor sequences and look at the predictions.
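As a rough illustration of that last idea, the sketch below learns what each motor command does in each grid cell and then picks an action by simulating motor sequences against the learned predictions. It does not use TM_SM.py or any NuPIC API: a plain lookup table stands in for the sensorimotor model, and the fixed food cell, the reward values, the discount factor, and the names `true_step`, `learn_model`, and `plan` are assumptions made for this example.

```python
import itertools
import random

GRID = 5
MOVES = {"up": (0, -1), "down": (0, 1), "left": (-1, 0), "right": (1, 0)}
FOOD = (3, 3)  # fixed food cell, to keep the sketch small (assumption)


def true_step(head, action):
    """The environment: return (next_head, reward); +1 food, -1 wall, else 0."""
    dx, dy = MOVES[action]
    x, y = head[0] + dx, head[1] + dy
    if not (0 <= x < GRID and 0 <= y < GRID):
        return head, -1
    return (x, y), (1 if (x, y) == FOOD else 0)


def learn_model(n_steps=5000, seed=0):
    """Explore with random motor commands and record what each one did."""
    rng = random.Random(seed)
    model, head = {}, (0, 0)
    for _ in range(n_steps):
        action = rng.choice(list(MOVES))
        nxt, reward = true_step(head, action)
        model[(head, action)] = (nxt, reward)  # learned prediction
        head = nxt
    return model


def plan(model, head, horizon=3):
    """Simulate every motor sequence of length `horizon` in the learned model
    and return the first action of the best-scoring one."""
    best_score, best_action = float("-inf"), None
    for seq in itertools.product(MOVES, repeat=horizon):
        state, score = head, 0.0
        for t, action in enumerate(seq):
            if (state, action) not in model:
                break  # never observed: no prediction to simulate with
            state, reward = model[(state, action)]
            score += reward * (0.9 ** t)  # sooner rewards count more
        if score > best_score:
            best_score, best_action = score, seq[0]
    return best_action
```

Calling `plan(learn_model(), (2, 3))` should steer the snake toward the adjacent food cell. Exhaustive search over sequences grows exponentially with the horizon, so this only works for short look-ahead on a small grid.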

Sounds like a worthy experiment.

Here are some other test problems for reinforcement learning: https://gym.openai.com/

btw, I changed the topic title from “A testcase of HTM theory”.

A lot of the results in deep learning neural networks come from empiricism. When you read the papers by top researchers, the mathematics often has little or nothing to do with the systems they have built up by trial and error (and guessing). There are two strands there that don’t actually meet. A more focused and sensible form of empiricism is called engineering. That is the middle way. Also, since engineering components are well understood, anything you build with them you can explain and rationally defend.