Embodied HTM with Raspberry Pi Sensory-Motor Loop


Hello everyone,

I’m interested in experimenting with HTM and NuPIC in a simple embodied system with a sensory-motor loop, to see if basic learning similar to an infant’s can take place. The setup would be: sensory input from a Raspberry Pi 3 with a microphone and camera; motor output from a Raspberry Pi 3 controlling pan-tilt motors for the camera; and learning/processing in NuPIC, likely running on a remote computer.

I was wondering if anyone here had tried similar experiments, and if so, hopefully you’d be willing to share the approach you took in terms of NuPIC architecture, and also perhaps the stumbling blocks you ran into.

I already anticipate that the introduction of reinforcement learning might be a little more than NuPIC is equipped to handle, and I’d love to hear from more experienced folks what other problems might creep up.

I’m still deciding on the “metric” to use in order to evaluate the learning abilities of the system, but I’m thinking if I can get it to the point where it can replicate a simple visual habituation task that has been conducted with infants, that would be a good starting point.
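One possible way to turn habituation into a number, as a rough sketch only: treat the system as habituated once its attention to a repeated stimulus over the last few trials falls below some fraction of its attention over the first few trials. The function name, the window of 3 trials, and the 0.5 fraction are all placeholder assumptions, and "looking time" here just means however many time steps the camera stayed on the stimulus.

```python
def mean(values):
    """Arithmetic mean of a non-empty list of numbers."""
    return sum(values) / len(values)


def habituated(looking_times, window=3, fraction=0.5):
    """Return True once recent looking time has dropped below
    `fraction` of the initial looking time.

    looking_times: one number per trial (e.g. time steps spent
                   pointing at the repeated stimulus).
    window:        how many trials to average at each end (assumed).
    fraction:      how large a drop counts as habituation (assumed).
    """
    # Need two non-overlapping windows before any comparison is meaningful.
    if len(looking_times) < 2 * window:
        return False
    baseline = mean(looking_times[:window])
    recent = mean(looking_times[-window:])
    return recent < fraction * baseline


if __name__ == "__main__":
    print(habituated([10, 9, 11, 6, 4, 3, 2]))
    print(habituated([10, 9, 11, 10, 9, 10]))
```

The thresholds would need tuning against whatever infant task ends up being replicated.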

Any thoughts, considerations, or warnings are much appreciated!



At least as a thought experiment, one thing that’s missing is some sort of output from your system. Without that, you’re essentially creating a locked-in network: it takes in information but produces no output, and it can’t necessarily draw any independent conclusions.

On the other hand, if you combine all the sensory input (I’d add a thermometer and some kind of gas sensor, for feeling and smell) with the ability to output something, such as a speaker, an LED with a potentiometer, or a motor that can interact with the world, then you would have an experiment to see whether it:

  1. does anything at all,
  2. simply does random things, or
  3. starts to associate certain inputs with certain outputs on its own.

For example, would it start out randomly moving some motor, making a noise, or flashing its LED, and then eventually settle down to only produce a response when it detects some stimulus (e.g. you clap and it twitches or blinks)?

The first goal would be to have it start out doing anything at all, then see it calm down, then see it respond to stimuli.

There needs to be both input and output.
If you haven’t seen it yet, take a look at this video from a month ago about the functioning of the different layers.

I think in a very simplified way, that’s what our brain does, right?

If I have time, I may lend my support to this, but I currently have a few other things I’m working on.


Thanks for the input @MaxLee! You are absolutely correct about the output, and I should have been more clear about that in my post. So there actually are two motors (pan and tilt for the camera) and my plan was to have them start out moving randomly (like a baby’s eyes) so that there is initial sensory input.

There will be motor signal outputs from the model to control those two motors. Ideally, the “test” of whether anything is being learned will be whether the system can direct the camera motors to point at novel input patterns (i.e. show curiosity) and habituate to old patterns (I will have to build in some reinforcement learning to promote that behavior).
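As a rough sketch of what “novel” versus “old” could mean here, assuming each camera frame has already been turned into an SDR represented as a set of active bit indices: score a pattern by how little it overlaps with anything seen recently. This is plain Python 3 and does not use any NuPIC calls; `NoveltyTracker` and its parameters are made-up names for illustration.

```python
from collections import deque


def overlap(a, b):
    """Number of active bits shared by two SDRs (sets of bit indices)."""
    return len(a & b)


class NoveltyTracker:
    """Keeps a bounded memory of recent SDRs and scores new ones."""

    def __init__(self, memory_size=100):
        # Oldest patterns fall out once memory_size is reached.
        self.memory = deque(maxlen=memory_size)

    def novelty(self, sdr):
        """1.0 = shares no bits with any remembered pattern,
        0.0 = every bit matches some remembered pattern."""
        if not sdr:
            return 0.0
        best = max((overlap(sdr, seen) for seen in self.memory), default=0)
        return 1.0 - best / len(sdr)

    def observe(self, sdr):
        """Score the pattern, then remember it."""
        score = self.novelty(sdr)
        self.memory.append(frozenset(sdr))
        return score


if __name__ == "__main__":
    tracker = NoveltyTracker()
    pattern = {1, 2, 3, 4}
    print(tracker.observe(pattern))        # first exposure
    print(tracker.observe(pattern))        # repeated exposure
    print(tracker.observe({1, 2, 9, 10}))  # partly familiar pattern
```

A score like this could serve as the reward signal for pointing the camera, with repeated patterns naturally scoring lower over time.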

In the long run I do hope to add more sensors and actuators so that the system can do more exploring and build up a more sophisticated model of the world, but two vision motors seemed like a good place to start. It’s a long shot, but with the processing happening remotely (the Raspberry Pi basically just streams sensory data and takes motor commands) it’d be cool to allow others to test their models using this system remotely.

Would love any help/input whenever you get more free time, so by all means let me know.


This is a good experiment, and I was planning to do it myself.
I think what you can do, after converting the sensory and motor signals to SDRs and setting up the HTM model, is first control the bot yourself using motor commands. You give those motor commands as part of the SDR and train the bot for some time.
Then you pool the motor predictions (if you use a single SDR representation) and feed them back as motor commands (after converting them to a suitable format, for which you can use a mapping method or anything else).
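To make the “motor command as part of the SDR” idea concrete, here is a minimal sketch, assuming the SDR is a set of active bit indices, the sensory bits occupy indices below `sensory_width`, and each command gets its own block of bits after that. The command names, the block size, and the vote-counting decoder are all my own placeholders for the “mapping method”; no NuPIC calls are used.

```python
# Hypothetical command set and block size, chosen only for illustration.
MOTOR_COMMANDS = ["pan_left", "pan_right", "tilt_up", "tilt_down", "hold"]
BITS_PER_COMMAND = 8


def encode_motor(command, sensory_width):
    """Active bit indices for one command, placed after the sensory bits."""
    slot = MOTOR_COMMANDS.index(command)
    start = sensory_width + slot * BITS_PER_COMMAND
    return set(range(start, start + BITS_PER_COMMAND))


def combine(sensory_sdr, command, sensory_width):
    """Single SDR holding both the sensory bits and the motor bits."""
    return set(sensory_sdr) | encode_motor(command, sensory_width)


def decode_motor(predicted_bits, sensory_width):
    """Map predicted bits back to a command by counting how many
    predicted bits fall in each command's block. Returns None when
    no motor bits were predicted at all."""
    votes = [0] * len(MOTOR_COMMANDS)
    for bit in predicted_bits:
        if bit >= sensory_width:
            slot = (bit - sensory_width) // BITS_PER_COMMAND
            if slot < len(MOTOR_COMMANDS):
                votes[slot] += 1
    best = max(range(len(votes)), key=lambda i: votes[i])
    return MOTOR_COMMANDS[best] if votes[best] > 0 else None


if __name__ == "__main__":
    sdr = combine({1, 5}, "tilt_up", sensory_width=100)
    print(sorted(sdr))
    print(decode_motor(sdr, sensory_width=100))
```

During the manual training phase you would feed `combine(...)` to the model; afterwards you would pass the model's predicted bits to `decode_motor(...)` and send the result to the motors.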
Then let the bot be. If it is about to, say, crash into something, you can give it your own manual command to stop it instead of using reinforcement learning.
I would like to see how this turns out. I explained it this way because I was planning to use motors to move the bot itself, with the camera fixed in place, since the bot would move the way it wants.


Why not do this virtually first? You can see exactly what it saw, the input doesn’t have to be so complex, and you can iterate fast.
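A virtual version can be very small. As a sketch, assuming the “world” is just a 2D grid of values and the camera is a window that pan and tilt slide around inside it (the class and method names below are made up for illustration):

```python
class VirtualPanTilt:
    """Simulated pan-tilt camera: a small window over a 2D scene."""

    def __init__(self, scene, view=3):
        self.scene = scene  # list of equal-length rows
        self.view = view    # window is view x view cells
        # Furthest positions that still keep the window inside the scene.
        self.max_row = len(scene) - view
        self.max_col = len(scene[0]) - view
        self.row = 0
        self.col = 0

    def move(self, d_tilt, d_pan):
        """Shift the window, clamping at the scene edges
        the way a real servo stops at its limits."""
        self.row = min(max(self.row + d_tilt, 0), self.max_row)
        self.col = min(max(self.col + d_pan, 0), self.max_col)

    def look(self):
        """Return the cells currently in view."""
        rows = self.scene[self.row:self.row + self.view]
        return [r[self.col:self.col + self.view] for r in rows]


if __name__ == "__main__":
    scene = [[r * 10 + c for c in range(5)] for r in range(4)]
    cam = VirtualPanTilt(scene, view=2)
    print(cam.look())
    cam.move(1, 2)
    print(cam.look())
```

Because `look()` returns exactly what the model was given, you can log every frame alongside the motor commands and replay a run step by step.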