The most naive idea, which I outlined in Trying to make an HTM augmented/based RL algorithm, is to have the model predict both the next state's value and the next action, then choose an action by sampling N random actions in addition to the predicted one and picking whichever has the best predicted value. Supposedly the model would converge on the best policy.
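As a rough sketch of that selection step (the model interface here is hypothetical, standing in for whatever prediction calls the HTM model would actually expose):

```python
import random

N_SAMPLES = 10  # how many random actions to evaluate alongside the prediction

def select_action(model, state, action_space):
    """Pick the best-valued action among the model's predicted action
    and N randomly sampled alternatives.

    `model.predict_action` and `model.predict_value` are assumed names
    for the model's next-action and next-state-value predictions.
    """
    candidates = [model.predict_action(state)]
    candidates += [random.choice(action_space) for _ in range(N_SAMPLES)]
    # Keep whichever candidate the model expects to lead to the best value.
    return max(candidates, key=lambda a: model.predict_value(state, a))
```

The random samples are what keep this from collapsing early: if the model's predicted action is bad, a sampled action can still win on predicted value and pull the policy elsewhere.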
I have not yet fully read @sunguralikaan's work on building an HTM-TD(lambda) hybrid, which seems like a more sensible approach.