I’m running some experiments on HTM for reinforcement learning, and the Markov decision process (MDP) is a fundamental type of task in reinforcement learning.

My answer to the question is yes, because spatial poolers can represent mappings from the current state to an action, or from the current state to the next state. I have managed to solve some of the simplest MDP tasks with a method similar to swarming: finding a winner out of a swarm. But this method doesn’t change the connections in the spatial poolers much, so it lacks a sense of “learning”.
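As a rough illustration of the "winner out of a swarm" idea, here is a minimal sketch in plain Python. It does not use a real spatial pooler: each candidate is just a fixed state-to-action table standing in for one, and the 5-state chain task, the step penalty, and the names `run_episode` and `pick_winner` are all assumptions made up for this example.

```python
import random

N_STATES = 5          # states 0..4; state 4 is the goal (hypothetical toy task)
ACTIONS = (-1, +1)    # action 0 moves left, action 1 moves right


def run_episode(policy, max_steps=20):
    """Return the total reward earned by a fixed state -> action table."""
    state, total = 0, 0.0
    for _ in range(max_steps):
        # Apply the chosen move and clamp the state to the ends of the chain.
        state = min(max(state + ACTIONS[policy[state]], 0), N_STATES - 1)
        if state == N_STATES - 1:
            return total + 1.0   # reward for reaching the goal
        total -= 0.01            # small penalty for every non-goal step
    return total


def pick_winner(swarm_size=200, seed=0):
    """Sample random candidates and keep the one with the highest reward."""
    rng = random.Random(seed)
    swarm = [
        [rng.randrange(len(ACTIONS)) for _ in range(N_STATES)]
        for _ in range(swarm_size)
    ]
    return max(swarm, key=run_episode)


if __name__ == "__main__":
    winner = pick_winner()
    print(winner, run_episode(winner))
```

Note that nothing inside any candidate is ever updated here; the search only selects among fixed mappings, which is exactly why it doesn't feel like learning.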

But how can a spatial pooler learn a Markov process efficiently? I know the spatial pooler’s learning rule is based on Hebbian learning, which is unsupervised. Is it possible to add some kind of guidance to the learning? My current idea is to see whether some evolutionary algorithm will work.
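To make the evolutionary idea concrete, here is one way it could look. This is only a sketch on a toy chain task, not something validated on an actual spatial pooler. The matrix `perm` is a hypothetical stand-in for permanence values (one row per state, one column per action), and the task, the fitness function, and the mutation scheme are all assumptions chosen for illustration.

```python
import random

N_STATES, N_ACTIONS = 5, 2   # toy chain; action 0 = left, action 1 = right


def fitness(perm, max_steps=20):
    """Reward earned when every state picks its highest-valued action."""
    state = 0
    for step in range(max_steps):
        action = max(range(N_ACTIONS), key=lambda a: perm[state][a])
        move = 1 if action == 1 else -1
        state = min(max(state + move, 0), N_STATES - 1)
        if state == N_STATES - 1:
            return 1.0 - 0.01 * step   # reaching the goal sooner scores higher
    return 0.0


def evolve(generations=200, children=10, sigma=0.1, seed=0):
    """Hill-climbing evolution: mutate the parent, keep children that are no worse."""
    rng = random.Random(seed)
    parent = [[rng.random() for _ in range(N_ACTIONS)] for _ in range(N_STATES)]
    best = fitness(parent)
    for _ in range(generations):
        for _ in range(children):
            # Gaussian mutation, clipped to [0, 1] like a permanence value.
            child = [
                [min(max(v + rng.gauss(0, sigma), 0.0), 1.0) for v in row]
                for row in parent
            ]
            score = fitness(child)
            if score >= best:   # accepting ties lets the search drift across plateaus
                parent, best = child, score
    return parent, best


if __name__ == "__main__":
    perm, score = evolve()
    print(score)
```

Unlike picking a winner from a fixed swarm, the values themselves change from generation to generation, with the reward acting as the only guidance signal. Whether that combines well with the spatial pooler's Hebbian updates is an open question.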