I’m trying out some experiments with HTM for reinforcement learning, where the Markov decision process (MDP) is a fundamental type of task.
My answer to the question is yes, because spatial poolers are able to represent mappings from the current state to an action, or from the current state to the next state. I have managed to solve some of the simplest MDP tasks with a method similar to swarming: finding a winner out of a swarm. But this method doesn’t change the connections in the spatial poolers much, so it lacks a sense of “learning”.
But how can a spatial pooler learn a Markov process efficiently? I know the spatial pooler’s learning rule is based on Hebbian learning, which is unsupervised. Is it possible to add some kind of guidance to the learning? My current idea is to see whether some evolutionary algorithms will work.
@hwangtamu I think you need to use temporal memory, not the spatial pooler, to learn a Markov process. The spatial pooler converts binary input patterns to SDRs. Temporal memory takes SDRs as input and makes predictions of future inputs. Unlike a first-order Markov process, temporal memory can maintain long-term sequence context and learn very high-order Markov sequences.
I suggest you take a look at our papers on the spatial pooler and temporal memory to learn more about them.
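To illustrate the first-order versus high-order distinction made above, here is a minimal sketch in plain Python. It is not temporal memory and does not use NuPIC: it is a simple context-counting predictor, used here only as a stand-in, and the names `train` and `predict` and the two example sequences are made up for illustration. It shows that when two sequences share a middle part, the current symbol alone is not enough to predict what comes next, while a longer context is.

```python
from collections import Counter, defaultdict

def train(sequences, order):
    """Count which symbol follows each context of the last `order` symbols."""
    table = defaultdict(Counter)
    for seq in sequences:
        for i in range(order, len(seq)):
            context = tuple(seq[i - order:i])
            table[context][seq[i]] += 1
    return table

def predict(table, context):
    """Return the set of symbols that were seen after this context."""
    return set(table[tuple(context)])

# Two hypothetical sequences that share the middle part "B C".
sequences = [list("ABCD"), list("XBCY")]

first_order = train(sequences, order=1)   # context = current symbol only
third_order = train(sequences, order=3)   # context = last three symbols

# With only the current symbol as context, "C" is ambiguous.
ambiguous = predict(first_order, "C")

# With the longer context, the two sequences are told apart.
after_abc = predict(third_order, "ABC")
after_xbc = predict(third_order, "XBC")

print(ambiguous, after_abc, after_xbc)
```

Note that a fixed-length context like this is only an illustration; it says nothing about how temporal memory actually stores sequence context.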
I’ve read some of the papers. Temporal memory may help to evaluate the current state, but Markovian tasks do not require temporal memory. I’ll see what I can get from the temporal pooler.
Hmm, I don’t understand your comment “Markovian tasks do not require temporal memory”. Maybe I am missing something here. How could you use a spatial pooler to represent a mapping between the current state and a future state?
You could put the SDRClassifier on top of the SP to get first order predictions, which are sufficient for the problem. The classifier gives multiple predictions with different likelihoods as well.
In this case, though, it is the classifier that is solving the prediction problem, not the SP.
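As a rough illustration of a classifier producing first-order predictions with likelihoods from SDRs, here is a minimal sketch in plain Python. It is not NuPIC's SDRClassifier and does not reproduce its algorithm or API: `BitVoteClassifier` is a made-up toy that just counts, per active bit, which next state followed. The SDRs in `SDRS` are hand-picked stand-ins for spatial pooler output, and the transitions are invented.

```python
from collections import defaultdict

# Hypothetical fixed SDRs standing in for spatial pooler output:
# each state maps to a small set of active column indices.
SDRS = {
    "s0": frozenset({0, 1, 2}),
    "s1": frozenset({3, 4, 5}),
    "s2": frozenset({6, 7, 8}),
}

class BitVoteClassifier:
    """Toy classifier: every active bit votes for the labels it was seen with."""

    def __init__(self):
        self.votes = defaultdict(lambda: defaultdict(float))

    def learn(self, sdr, label):
        for bit in sdr:
            self.votes[bit][label] += 1.0

    def infer(self, sdr):
        """Return a dict of label -> likelihood (empty if nothing was learned)."""
        totals = defaultdict(float)
        for bit in sdr:
            for label, count in self.votes[bit].items():
                totals[label] += count
        norm = sum(totals.values())
        return {label: v / norm for label, v in totals.items()} if norm else {}

# Invented transitions: s0 usually leads to s1, sometimes to s2.
transitions = [("s0", "s1")] * 3 + [("s0", "s2")] + [("s1", "s2")] * 2

clf = BitVoteClassifier()
for current, nxt in transitions:
    clf.learn(SDRS[current], nxt)   # train on (current SDR, next state)

likelihoods = clf.infer(SDRS["s0"])
print(likelihoods)
```

All of the learning here happens in the classifier's vote table; the SDRs never change, which matches the point that the classifier, not the SP, is doing the prediction.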
Scott cleared up exactly what I was confused about.
The architecture of the system can be something like:
Raw Input -> Binary Encoder -> Spatial Pooler -> Layer of SDR Classifier -> Decoder to output
My previous assumption was incorrect. The classifier plays the key role.
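The architecture above could be wired together roughly as follows. This is only a sketch in plain Python under heavy assumptions: none of it uses NuPIC, the `pool` function is a fixed, non-learning stand-in for a spatial pooler, `Classifier` is a toy vote counter rather than the SDR Classifier, and the four-state cycle A -> B -> C -> D -> A is an invented task. All names and sizes (`STATES`, `W`, `K`) are hypothetical.

```python
from collections import defaultdict

STATES = ["A", "B", "C", "D"]   # hypothetical raw inputs
W = 4                            # active bits per encoded state
N_IN = len(STATES) * W           # input width; also the number of columns
K = 2                            # active columns per SDR

def encode(state):
    """Binary encoder: one block of W active bits per state."""
    start = STATES.index(state) * W
    return [1 if start <= i < start + W else 0 for i in range(N_IN)]

def pool(bits):
    """Fixed stand-in for a spatial pooler (no learning): column c watches
    input bits c, c+1, c+2 (wrapping); the K highest-overlap columns win."""
    overlap = [sum(bits[(c + d) % N_IN] for d in range(3)) for c in range(N_IN)]
    ranked = sorted(range(N_IN), key=lambda c: (-overlap[c], c))
    return frozenset(ranked[:K])

class Classifier:
    """Toy classifier: each active column votes for the labels it was seen with."""

    def __init__(self):
        self.votes = defaultdict(lambda: defaultdict(int))

    def learn(self, sdr, label_index):
        for col in sdr:
            self.votes[col][label_index] += 1

    def infer(self, sdr):
        totals = defaultdict(int)
        for col in sdr:
            for label_index, count in self.votes[col].items():
                totals[label_index] += count
        return max(totals, key=totals.get) if totals else None

def decode(label_index):
    """Decoder: map a label index back to a raw state name."""
    return STATES[label_index]

clf = Classifier()
for i, state in enumerate(STATES):
    next_index = (i + 1) % len(STATES)          # invented cycle A->B->C->D->A
    clf.learn(pool(encode(state)), next_index)

def run(state):
    """Raw input -> encoder -> pooler stand-in -> classifier -> decoder."""
    return decode(clf.infer(pool(encode(state))))

predictions = [run(s) for s in STATES]
print(predictions)
```

In this sketch the only component that learns is the classifier, which is consistent with the conclusion that the classifier plays the key role.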