An open-source community research project comparing HTM-RL to conventional RL

The most naive idea, which I outlined in Trying to make an HTM augmented/based RL algorithm, is to let the model predict both the next state's value and the next action, then choose an action by sampling N random actions in addition to the predicted one and picking whichever has the best predicted value. The hope is that the model will eventually converge on the best policy.
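To make that a bit more concrete, here is a rough Python sketch of the action-selection step. The names `model.predict_action`, `model.predict_value`, and `action_space.sample` are just placeholders for whatever the HTM model and environment end up exposing, not an actual API:

```python
# Minimal sketch of the naive action-selection idea described above.
# Assumed (hypothetical) interfaces:
#   model.predict_action(state)        -> the action the model predicts next
#   model.predict_value(state, action) -> predicted value of taking `action` in `state`
#   action_space.sample()              -> a random action from the environment

def select_action(model, state, action_space, n_random=10):
    """Pick the model's predicted action or one of n_random sampled actions,
    whichever has the highest predicted value."""
    candidates = [model.predict_action(state)]
    candidates += [action_space.sample() for _ in range(n_random)]
    return max(candidates, key=lambda a: model.predict_value(state, a))
```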

I have not yet fully read @sunguralikaan's work on making an HTM-TD(lambda) hybrid, which seems like a more sensible approach.
