Reinforcement Learning and HTM Algorithm

Paul_Lamb · May 6, 2019, 8:34pm

Be sure to check out @sunguralikaan work here: HTM Based Autonomous Agent

I have also spent a lot of time studying how to apply RL to HTM. I essentially started with this high-level view:

Given a state, predict all the future expected reward for each possible action that can be performed from that state, and choose the action which is predicted to give the maximum reward. Then use the actual reward to improve future predictions when that state is encountered again in the future.

From this, you can see one obvious place that HTM could be applied. The temporal memory algorithm captures the semantics of temporal patterns, and recalls them when they are encountered again. This could most easily be connected with a backward view TD(λ) algorithm (https://youtu.be/PnHCvfgC_ZA), where each bit in the output of the TM layer would have some weight on what is the current state. Learning in a setup like this could also happen online – as the states (sequences) are learned and re-encountered, simultaneously the model could be predicting and running TD(λ) to improve its predictions over time.

I personally believe one could take this even further than just marrying the TM algorithm with an RL algorithm. The TM algorithm out of the box actually does a lot of the legwork required for backward view RL, by connecting a series of states into an “eligibility trace”. What is missing is a way to pool sequences along with their reinforcement “value”.

What I am thinking about currently is combining multiple sources of information into common stable pooled representations. The activity in a TM layer tracking sensory input would be one source, a TM tracking motor actions would be another, and emotions would be another. Combined, these would produce representations that include not only sequences of states and actions, but also their emotional context. Emotional context could then be weighed against the current needs of the system and used for action selection. The pooled representations, once chosen, could then be used to temporally unfold the motor sequence, comparing predicted sensory input and predicted emotional context with reality and updating the model online.

Topic		Replies	Views
An open-source community research project on comparing HTM-RL to conventional RL Related Papers	63	3343	June 19, 2018
Trying to make an HTM augmented/based RL algorithm Engineering	7	1353	May 6, 2019
Reinforcment learning using NuPIC(.core) Engineering	3	1060	April 17, 2018
Deep Reinforcement Learning, HTM Numenta Theory	5	1268	May 14, 2016
Help with an HTM implementation NuPIC help-wanted	5	1182	November 9, 2017

Reinforcement Learning and HTM Algorithm

Related topics