Probabilistic Predictions and Reinforcement Learning Concept in HTM Theory

Dear All,

I have studied the literature related to HTM, but perhaps I have not understood it deeply, which is why I have two questions.

1- An HTM network can be used for predicting upcoming events or instances in a sequence, and that is how anomalies are detected. At a given time instance there can be multiple predictions, but is it possible for HTM to attach a specific probability value to each prediction?
In short, does HTM return probabilistic or non-probabilistic predictions?

2- HTM learns sequences with its dendrite segments by making synapses stronger or weaker, incrementing or decrementing their permanence values. Having understood this mechanism of learning, it seems to me that it resembles reinforcement learning, in which reward and punishment occur.
So please confirm for me: is the concept of learning in HTM inspired by reinforcement learning, or am I wrong and mapping the wrong concept onto HTM learning?

Looking forward to your kind response.

Yes, it gives probabilistic predictions.

No, HTM is not inspired by RL. They are very different things.

Remember, almost all of these concepts involve neural networks and Hebbian learning, but that does not make them the same.


Thanks for your explanation, @rhyolight.

Is it possible to get all of HTM's predictions with specific probabilities?

Can you please refer me to any literature so that I can work out for myself how the HTM learning approach differs from neural networks, RL, and Hebbian learning?

I admit that HTM is built on neuroscience (neocortex, CLA) working principles. But I am presenting HTM next week at my university, and I want to be prepared to answer questions from professors who belong to different domains of computer science.


Yes, instructions are in the quick start. Extracted predictions contain both classified prediction values and probabilities.

Let me know if you have any other questions!
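As a rough illustration of the idea (this is not the actual NuPIC API), here is how a classifier's per-bucket "votes" for candidate next values might be normalized into a probability distribution; the bucket labels and vote counts below are hypothetical:

```python
# Illustrative sketch only: turning raw per-bucket "votes" from a
# classifier into a probability distribution over predicted next values.

def prediction_probabilities(votes):
    """Normalize raw vote counts into probabilities that sum to 1."""
    total = sum(votes.values())
    if total == 0:
        return {bucket: 0.0 for bucket in votes}
    return {bucket: count / total for bucket, count in votes.items()}

# Three hypothetical candidate next values with raw classifier votes.
votes = {"A": 6, "B": 3, "C": 1}
probs = prediction_probabilities(votes)  # probs["A"] == 0.6
```

This is the general shape of what the quick-start extraction gives you: each candidate prediction paired with a likelihood.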

The TM’s predictions are composed of a set of cells which become predictive. There can be any number of cells in the predictive state, so the system can make none, one, or many predictions at each time step.

  • Each predictive cell is made predictive because it has active dendrite segment(s).

  • Each active dendrite segment is made active because it has enough active synapses (formed connections to active cells from the prior time step).

  • Each active synapse is made active because the ‘presynaptic’ cell it connects to is currently active, and the synapse’s permanence value (a scalar from 0.0 to 1.0) is above a threshold (TM parameter connectedPermanence).

So each cell’s predictive state is ultimately binary, in that it is either predictive or not. It is not probabilistic, though the permanence values making them predictive can be checked to see how close each sequence is to being forgotten. For instance, if the permanence values from the currently predictive cells to the prior active cells are ~1.0, then those memories would take longer to forget than if the permanence values were ~0.3.
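The bullet points above can be sketched in a few lines of Python. This is an illustrative toy, not the real TM implementation, and the threshold values are assumed:

```python
# Toy sketch of the predictive-state logic described above (not real TM code):
# a segment is active when enough of its connected synapses point at cells
# that were active last step; a cell is predictive if any segment is active.

CONNECTED_PERMANENCE = 0.5   # assumed value of TM parameter connectedPermanence
ACTIVATION_THRESHOLD = 2     # assumed number of active synapses to fire a segment

def segment_is_active(segment, prev_active_cells):
    """A segment is a list of (presynaptic cell id, permanence) pairs."""
    active_synapses = sum(
        1 for presyn_cell, permanence in segment
        if permanence >= CONNECTED_PERMANENCE and presyn_cell in prev_active_cells
    )
    return active_synapses >= ACTIVATION_THRESHOLD

def cell_is_predictive(segments, prev_active_cells):
    """Binary predictive state: any active segment makes the cell predictive."""
    return any(segment_is_active(s, prev_active_cells) for s in segments)

segments = [[(1, 0.9), (2, 0.7), (3, 0.2)], [(4, 0.6), (5, 0.4)]]
prev_active = {1, 2, 5}
cell_is_predictive(segments, prev_active)  # True: first segment has 2 connected active synapses
```

Note that the output is binary, exactly as described: the permanences gate which synapses count, but the cell itself is simply predictive or not.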

Yep, each cell’s segment’s synapse’s permanence is learned locally. This gives each TM model very high potential for variance, and thus very high capacity for uniqueness.


…and in no small part there is the plasticity itself! It creates capacity to distinguish values by adding dendritic connections to parts of the knowledge space (and culling those which no longer express meaning along a certain vector).

There is research that indicates that the brain prunes heavily every single night. If I can find it tonight I will edit into this post.

Does it mean that it has the same function as the Q function in RL?


I wonder if @lucasosouza might say something about this? HTM does not keep a queue at all; it performs high-order memory without this non-biological trick.


I wouldn’t say so. A Q function is a function that maps a state and action to a scalar value. The scalar value represents the expected reward at the next timestep plus the expected reward at all future timesteps, discounted by a discount factor.

Like Matt said, HTM is not at all inspired by RL. But if we want to make this comparison, the TM prediction would be more related to a transition function, which outputs the next state given the current state and current action. The transition function in RL is one of the two functions that define the environment (the other is the reward function), and it is not usually learned by the agent. The most common strategy in RL is to search directly in the space of policies, or to search in the space of value functions (or Q functions) and use the value function to derive an optimal policy.

There is a group of RL algorithms commonly referred to as model-based that also seek to model the transition function to accelerate the learning process (if you model the environment, you can generate possible transitions and learn offline, as in planning). Model-based algorithms go all the way back to Dyna-Q, in 1991, up to recent state-of-the-art algorithms like Google’s SimPLe.
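For contrast with the TM’s transition-style prediction, here is the standard tabular Q-learning backup, with illustrative parameter values. Note that a reward and a discount factor appear explicitly, and those concepts have no counterpart in HTM’s learning rule:

```python
# Standard tabular Q-learning update, for contrast with HTM's learning rule.
# ALPHA and GAMMA values are illustrative.

ALPHA = 0.5   # learning rate
GAMMA = 0.9   # discount factor

def q_update(Q, state, action, reward, next_state, actions):
    """One Q-learning backup: move Q(s, a) toward reward + discounted best next value."""
    best_next = max(Q.get((next_state, a), 0.0) for a in actions)
    old = Q.get((state, action), 0.0)
    Q[(state, action)] = old + ALPHA * (reward + GAMMA * best_next - old)
    return Q

Q = q_update({}, "s0", "a", 1.0, "s1", ["a", "b"])  # Q[("s0", "a")] == 0.5
```

The scalar value being learned here is an expected return, not a prediction of the next input, which is exactly the distinction drawn above between a Q function and a transition function.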