I’m still working on this, but I’m too excited to hold back and want to share the results right now.
It’s my graduation project: I’m building RL agents using HTM algorithms. To my surprise, HTM works quite well (compared to DQN and A2C) in environments with sufficiently dense rewards. HTM can learn how to act in just a few episodes. But the learning seems to collapse after, say, 200 training loops; after that the HTM agent just doesn’t know how to act anymore.
(Figure: An HTM agent in the CartPole-v1 environment, reaching high rewards very early.)
I’ll release the source code once it’s ready.