Difference between SMI and RL?

I’m writing my graduation project report and got confused. What is the difference between SMI (Sensor-Motor Inference) and RL (Reinforcement Learning)? Can I assume SMI to be HTM’s version of RL?



SMI does not include reward / punishment. SMI can simply be a model that accepts sensory input and incorporates self action into its model of an environment over time.

RL requires a function that calculates some reward or punishment (usually) based upon a model, then makes a decision upon what action to take next.

An RL system could use an SMI model.
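To make the split concrete, here is a minimal sketch, not HTM code. The class `SensorimotorModel`, the function `choose_action`, and the toy corridor are all hypothetical names invented for illustration. The model only learns what follows from an action; the reward function sits entirely outside it, and a greedy one-step rule stands in for a real RL policy.

```python
from collections import Counter, defaultdict


class SensorimotorModel:
    """Learns (observation, action) -> next observation. No reward involved."""

    def __init__(self):
        self.counts = defaultdict(Counter)

    def learn(self, obs, action, next_obs):
        # Count how often each outcome followed this observation/action pair.
        self.counts[(obs, action)][next_obs] += 1

    def predict(self, obs, action):
        # Return the most frequently seen outcome, or None if never experienced.
        seen = self.counts.get((obs, action))
        if not seen:
            return None
        return seen.most_common(1)[0][0]


def choose_action(model, obs, actions, reward_fn):
    """Decision layer: score each action by the reward of its predicted outcome."""
    def score(action):
        predicted = model.predict(obs, action)
        return reward_fn(predicted) if predicted is not None else float("-inf")
    return max(actions, key=score)


# Toy corridor (assumption): positions 0..4, actions step left (-1) or right (+1).
def step(pos, action):
    return min(4, max(0, pos + action))


model = SensorimotorModel()
for pos in range(5):
    for action in (-1, 1):
        model.learn(pos, action, step(pos, action))

# The reward is defined outside the model: prefer being close to position 4.
best = choose_action(model, 2, (-1, 1), lambda pos: -abs(pos - 4))
print(best)  # prints 1
```

Swapping in a different `reward_fn` changes the chosen action without touching anything the model has learned, which is roughly the sense in which an RL system could sit on top of an SMI model.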


In my own taxonomy of machine learning techniques, the main distinction that I draw between the two relates to how much a priori knowledge, or meta-understanding of the system, is available to guide the learning of the agent.

Reinforcement Learning is usually considered a supervised (or semi-supervised) learning technique. This typically implies that the reward (or cost) function is known in advance, or that there is some form of supervisor (e.g. a human or a GAN) available that knows the difference between right and wrong.

Sensor-motor Inference, on the other hand, is generally considered to be completely unsupervised. In SMI, the agent is attempting to make predictions about what to expect from its sensory input based on external environmental factors as well as its interactions with the environment (including its own movement and placement of its sensors). The agent adjusts its internal representation based on the deviation of the prediction from the observed sensory input. Sometimes this is due to an insufficient model of the environment and/or self-interactions with the environment. Sometimes it’s due to the surprise of novel phenomena or interactions. Ultimately, the agent learns to adapt to whatever behavior the environment exhibits. If there is sufficient regularity to this behavior, then the predictions of the agent should improve over time as its internal representation is refined.
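As a rough illustration of that prediction-error loop, here is a minimal sketch. It uses a plain delta-rule linear predictor, not HTM, and the function name `train_predictor` and the toy environment (next reading = current reading + action) are assumptions made up for the example. Note that no reward appears anywhere; the only learning signal is the mismatch between predicted and observed input.

```python
import random


def train_predictor(steps=500, lr=0.1, seed=0):
    """Learn to predict the next sensor reading from (sensor, action)."""
    rng = random.Random(seed)
    w_sensor, w_action = 0.0, 0.0  # the agent's internal representation
    errors = []
    for _ in range(steps):
        sensor = rng.uniform(-1.0, 1.0)   # current sensory input
        action = rng.choice([-1.0, 1.0])  # the agent's own movement
        predicted = w_sensor * sensor + w_action * action
        observed = sensor + action        # toy environment, unknown to the agent
        error = observed - predicted      # deviation of prediction from input
        # Adjust the internal representation in proportion to the error.
        w_sensor += lr * error * sensor
        w_action += lr * error * action
        errors.append(abs(error))
    return (w_sensor, w_action), errors


weights, errors = train_predictor()
early = sum(errors[:50]) / 50
late = sum(errors[-50:]) / 50
print(weights, early, late)
```

Because the toy environment is perfectly regular, the late errors end up far smaller than the early ones, which matches the point above: predictions improve only to the extent that the environment's behavior is regular.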


How do you think a captured training signal fits in your scheme?
In the dad’s song project we posit that the structure of language is learned by listening first, then imitation.

In a nutshell:
Passively learning dad’s song, coupled with emotional flavoring.
Passively learning other sounds, with no (or different) emotional flavoring.
(Could be alert calls or other signalling, such as food.)
Actively learning to produce sounds through exploratory vocalizing (motor skills).
Time passes.
Hormonal / reinforcement-driven learning to reproduce the previously learned dad’s song.

This turns out to be a very interesting question. There isn’t even a wiki about Sensor-Motor Inference.

Could we try to come up with a precise definition for this? Where does it start and end? Maybe draw a flow chart using HTM terminology?

How could we compare it to an affordance (Cisek’s term)? Is an SMI a set of affordances? Or are there different SMIs?


I’m not familiar with this concept.

I’ve also not considered the behavioral aspects of imitation to be a direct consequence of sensory-motor inference. Such behavior may not be adequately explained by the over-simplified model of SMI that I outlined above. But then again, I was only trying to draw a distinction between RL and SMI as requested by the OP.


Please see this paper (page 444) for a description of latent learning that is expressed after a very long interval:
The Evolution of Multiple Memory Systems - David F. Sherry, Daniel L. Schacter



This latent learning seems to be related to imprinting.

This learned template forms a strong training signal that is as immediate during the SMI as tactile feedback. It is of a different nature but I see the basic mechanism as being very similar; the error signal is closely coupled to the motor performance.


This is interesting, but it brings up a few questions.

  • Since birds didn’t evolve the same structure as our neocortex, does that mean that SMI omits the neocortex?

  • Bird song SMI only involves the vocal cords (or whatever a bird uses to sing) and auditory feedback. Isn’t the SMI supposed to move the subject in the world, and thereby change its world model?

To me, the inference part means that every output to a muscle, coming from the neocortex, receives direct feedback from the muscle to the emitting zone, combined with one or more secondary inputs that help rectify the output.

This is why I bring this up: the connection is somewhat more distant than the direct connections across the central sulcus (motor to somatosensory cortex).

In this case, we have some internal model that is compared to the perceived sound - the difference being the error signal.
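A minimal sketch of that comparison, with everything in it assumed for illustration (the `imitate` function, the linear `toy_vocal_tract`, and the pitch values are hypothetical, not taken from the dad’s song project): a stored template is compared with the sound that is heard, and the difference is fed straight back into the motor command.

```python
def imitate(template, vocal_tract, rounds=50, lr=0.5):
    """Adjust one motor command per note until the heard sound matches the template."""
    commands = [0.0] * len(template)
    for _ in range(rounds):
        for i, target in enumerate(template):
            heard = vocal_tract(commands[i])  # auditory feedback of own output
            error = target - heard            # template vs. perceived sound
            commands[i] += lr * error         # error closely coupled to the motor output
    return commands


def toy_vocal_tract(command):
    # Hypothetical mapping from motor command to pitch; unknown to the learner.
    return 0.8 * command + 50.0


template = [440.0, 494.0, 523.0]  # remembered pitches (arbitrary example values)
commands = imitate(template, toy_vocal_tract)
errors = [t - toy_vocal_tract(c) for t, c in zip(template, commands)]
print(errors)
```

The learner never sees the vocal tract mapping; it only needs the error to point in a consistent direction. The template itself is acquired earlier by listening, so it acts as a captured training signal rather than an external reward.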

With SMI, you have to consider that linking perception to a movement still requires feedback from some other part of the brain indicating that, in guided motion, the movement is achieving what was wanted. These are two slightly different activities.

I don’t think it takes much imagination to think of humans trying to imitate a perceived and remembered sound. You can see babies babbling to learn to run the voice mechanism, and then a distinct difference when they try to make some remembered sound. Again, two slightly different activities.

Babies are a wonderful laboratory to see learning in action if you pay attention.


I see.

There is another difference though, I think.

My brother is a physiotherapist, and he told me that people who have sprained an ankle run a higher risk of spraining it again, because they need to relearn how to walk after their cast is taken off. It sounds like while they were wearing the cast, their SMI was modified to walk differently, and they forgot how to use the muscles properly.

So this means the learning is continuous and constantly adaptive to new situations.

I should get me one of those. Are they expensive? ;-).

Agreed, had a newborn in the house at the time I was first getting into HTM. Really useful watching them develop while you’re mulling over the neuroscience.

I experienced exactly this with a soccer injury, an ankle sprain. Re-injured it every subsequent season until I found a good physio, who explained why proper rehab was important for this very reason.

The physical injury will always heal itself, but if you don’t give the brain enough practice re-calibrating with the muscles through various daily balancing exercises, then you won’t be able to keep your balance during the awkward moments, and that’s when you get injured again.

They involve a significant investment of your time. Much easier to borrow one from a friend or family member for short periods.


I love these comments on the forums. They are soooo to the heart and made my day that much better. :smiling_face_with_three_hearts:

Anyone want to make one using HTM? :stuck_out_tongue:


SMI does not necessarily require the agent to move in the world, only that the agent is able to perform an action that has some effect on the environment. SMI then learns to recognize these environmental changes due to self-action as distinct from other changes due to external causes.
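One simple way to picture that separation, as a sketch under assumptions (the names `split_change` and `push_model` are hypothetical, and a real system would have to learn the forward model rather than be handed it): predict what your own action should have done, and attribute whatever is left over to external causes.

```python
def split_change(before, after, action, forward_model):
    """Split an observed change into a self-caused part and an external residual."""
    predicted = forward_model(before, action)  # expected result of own action
    return {
        "self": predicted - before,   # change explained by the agent's action
        "external": after - predicted,  # change the action does not explain
    }


def push_model(position, force):
    # Hypothetical forward model: each unit of force moves the object 2 units.
    return position + 2.0 * force


result = split_change(before=1.0, after=4.5, action=1.0, forward_model=push_model)
print(result)  # prints {'self': 2.0, 'external': 1.5}
```

If the external residual stays near zero, the agent’s model of its own interactions is accounting for everything it observes; a persistent residual means either something else is acting on the environment or the forward model needs refining.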

It may also be possible for these same (or similar) systems to build up a model of the interactions of other agents with the environment. If the other agents (or their behaviors) are sufficiently similar to the observing agent’s, then some of its self-interaction modeling could be applied towards this task. This may then be a part of the cognitive capabilities that lead to behaviors like imitation.


In the same vein, the fiber tracts from the frontal lobe back to the sensory areas mean that you can perceive some of the actions wholly internally without passing out to the body and back through the senses.

While this might be hard to envision, with these connections, it should work that you can direct early “motor” activity to the higher-level object store to trigger recall that is perceived. These actions will be learned by the cerebellum much the same way as learning to walk, resulting in rapid and fluid mental sequences.

This is the same basic mechanism that I propose underlies consciousness.

