I’m still working on this, but I’m too excited to hold back and want to share the results right now.
It’s my graduation project: I’m building RL agents using HTM algorithms. To my surprise, HTM works rather well (compared to DQN and A2C) in environments with sufficiently dense rewards. HTM can learn how to act in just a few episodes. But the learning seems to collapse after, say, 200 training loops. After that, HTM just doesn’t know how to act.
(Figure: An HTM agent in the CartPole-v1 environment, getting high rewards very early.)
I have. It has a variety of reinforcement learning games of varying difficulties. All of the games have the same API, so applying an AI to many different games is easy. It’s nice to work with and experiment with, though I haven’t solved any of them yet.
Never mind, I found out why the agent goes nuts after a while. It turns out that the environment I use (CartPole-v1) requires the agent to alternate between pushing the cart right and pushing it left in order to keep it moving slowly. But HTM is bad at that.
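To make the alternation point concrete, here is a rough sketch of a cart-pole simulation in plain Python. It is not the HTM agent and does not use the real CartPole-v1 environment. The physics constants and the Euler update are my assumption of what the Gym environment does, and `always_right`, `lean_following` and `run` are hypothetical names made up for this illustration. It compares a policy that always pushes right with a hand-written policy that pushes toward the side the pole is falling, and counts how often each one switches actions.

```python
import math

# Assumed constants, intended to approximate Gym's CartPole (not verified here).
GRAVITY = 9.8
CART_MASS = 1.0
POLE_MASS = 0.1
TOTAL_MASS = CART_MASS + POLE_MASS
POLE_HALF_LENGTH = 0.5
FORCE = 10.0
DT = 0.02                         # seconds per step
MAX_ANGLE = 12 * math.pi / 180    # episode ends past 12 degrees
MAX_X = 2.4                       # episode ends if the cart leaves the track


def step(state, action):
    """One Euler step of the cart-pole. action: 0 = push left, 1 = push right."""
    x, x_dot, theta, theta_dot = state
    force = FORCE if action == 1 else -FORCE
    sin_t, cos_t = math.sin(theta), math.cos(theta)
    temp = (force + POLE_MASS * POLE_HALF_LENGTH * theta_dot ** 2 * sin_t) / TOTAL_MASS
    theta_acc = (GRAVITY * sin_t - cos_t * temp) / (
        POLE_HALF_LENGTH * (4.0 / 3.0 - POLE_MASS * cos_t ** 2 / TOTAL_MASS)
    )
    x_acc = temp - POLE_MASS * POLE_HALF_LENGTH * theta_acc * cos_t / TOTAL_MASS
    return (x + DT * x_dot, x_dot + DT * x_acc,
            theta + DT * theta_dot, theta_dot + DT * theta_acc)


def run(policy, max_steps=200):
    """Run one episode; return (steps survived, number of action switches)."""
    state = (0.0, 0.0, 0.02, 0.0)  # fixed start: pole leaning slightly right
    steps, switches, last_action = 0, 0, None
    while steps < max_steps:
        action = policy(state)
        if last_action is not None and action != last_action:
            switches += 1
        last_action = action
        state = step(state, action)
        steps += 1
        if abs(state[0]) > MAX_X or abs(state[2]) > MAX_ANGLE:
            break
    return steps, switches


def always_right(state):
    """Never alternates: keeps pushing the cart to the right."""
    return 1


def lean_following(state):
    """Push toward the side the pole is leaning or falling (hand-tuned weight)."""
    _, _, theta, theta_dot = state
    return 1 if theta + 0.5 * theta_dot > 0 else 0


if __name__ == "__main__":
    for name, policy in [("always_right", always_right),
                         ("lean_following", lean_following)]:
        steps, switches = run(policy)
        print(f"{name}: survived {steps} steps, switched action {switches} times")
```

In this sketch the constant policy drops the pole almost immediately, while the balancing policy ends up flipping between left and right very frequently, which seems to be the kind of rapid alternation the agent has to learn.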