A Sensorimotor Machine - proof of concept with existing tech

I’m trying to make a (proof of concept) Sensorimotor Machine: something I’ve called a sensorimotor inference engine in the past…

The difference is now I’m thinking it might be possible to make it out of existing technology.

Jeff is always talking about defining the principles inherent in intelligence as the most important fundamental first step towards intelligent machines. I 100% agree.

And I wonder, can we find a mashup of existing technologies that allow us to approximate these principles?

For instance, we have a recent example of a combination of simple technologies that gave us exponential improvement because they embodied special principles: Generative Adversarial Networks. The principle they embodied: a feedback loop that produced an arms race.

I’m hoping we can do the same towards the goal of making a sensorimotor machine: a machine that naturally (without external rewards, direction, or influence) learns how to control its sensory input through motor output.

I believe we can more or less define what kind of principles are essential to the task (these are straight from the HTM playbook):

  • Modularity - the machine’s mind should be a network of components.
  • Homogeneity - each component of the machine’s mind should be similar to all (or many) other components at the most granular scale (like how cortical columns are similar).
  • Compression - of representation (ideally into SDR-type representations) is necessary in order for components to communicate with each other.
  • Spatio-Temporal Representations - the compressed information must capture spatio-temporal patterns in the data, at least the transition from one moment to the next.
  • Protocol Variation - ‘compressed’ representations should be similar between nearby nodes, but not identical. Computer systems typically speak a precise common protocol; we want a system where nearby components speak nearly the same language, a language that changes gradually the further you move from any component, with those changes reflecting the changes in the patterns the nodes experience.
  • Composability - each granular component should easily interact with other components.
  • Hierarchy - composability should allow the network to tend towards forming hierarchical interactions.
  • Invariance - further-away nodes should represent larger-scale structures, which should in turn produce more highly invariant representations (spatially and temporally).
  • Others? - there are probably other principles we’ll find valuable that are as yet unknown.

Can we combine existing technology in such a way as to embody these principles so we can create a simple, generalized Sensorimotor Inference Engine proof of concept? I think so, and here’s where I am starting:

Autoencoders produce a compressed representation. They can be made to represent the transitions from one timestep to the next rather than mere spatial patterns. They can be modular and homogeneous, and they can send their compressed representations as input to one another. They can therefore form hierarchies, and with a little extra information (such as how many inputs this other node and I have in common) they can send more compressed (and, I think, therefore more invariant) representations to ‘further away’ nodes.
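To make the "autoencoder over transitions" idea concrete, here is a minimal sketch of my own (not code from any repo): a tiny NumPy autoencoder trained to compress the concatenated pair (x_t, x_{t+1}) of a toy sequence into a small code. Everything here — the ring-shaped toy sequence, the dimensions, the training loop — is an illustrative assumption, not a claimed design.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy sequence: a point moving around a ring; each training sample is the
# concatenation of two consecutive one-hot states (x_t, x_{t+1}).
N = 8                                   # number of discrete states
states = np.eye(N)
pairs = np.array([np.concatenate([states[i], states[(i + 1) % N]])
                  for i in range(N)])   # shape (N, 2N)

# One-hidden-layer autoencoder: 2N -> code_dim -> 2N.
code_dim = 4
W1 = rng.normal(0, 0.1, (2 * N, code_dim))
W2 = rng.normal(0, 0.1, (code_dim, 2 * N))

def forward(x):
    code = np.tanh(x @ W1)              # compressed transition code
    recon = code @ W2                   # linear decoder
    return code, recon

lr = 0.2
for step in range(5000):
    code, recon = forward(pairs)
    err = recon - pairs                 # gradient of squared error
    gW2 = code.T @ err / len(pairs)
    gcode = err @ W2.T * (1 - code ** 2)
    gW1 = pairs.T @ gcode / len(pairs)
    W1 -= lr * gW1
    W2 -= lr * gW2

_, recon = forward(pairs)
print(np.mean((recon - pairs) ** 2))    # reconstruction error drops well below baseline
```

The 4-dimensional code is the kind of compressed transition representation that could, in principle, be handed to a neighboring node as input.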

In other words, what I’m suggesting is that by wiring a network of autoencoders up in a special way (to each other and to an environment they’re interacting with), we may be able to embody these principles of intelligence and therefore produce a prototypical design of the most simple AGI conceivable: a Sensorimotor Inference Engine.

If my intuition is correct, we could create something called a Sensorimotor Autoencoder.

Here’s a diagram of an extremely simple version of what I’m describing:

Notice how the compressed representations for T0 and T1 are fed as inputs to all other neighboring autoencoders (this is a fully connected network). Also notice how each autoencoder receives different input about the environment. It isn’t shown here, but they would receive overlapping information about the environment; the same would be true of their motor output.

Of course, this is extremely simplified. The ideal version of what I have in mind would probably have several autoencoders per ‘node’ in the network, all receiving basically the same input but compressing it at varying rates in order to send the correct invariance level to other nodes. Furthermore, each node would have to be able to modify its connections, not only to the environment but to each other as well.

Getting from this simple version to the more complex version may take a lot of work! But I’m hoping each step along the way would serve as better and better proofs of concept for the idea and general principles involved therein.

I’m looking for feedback, ideas, concerns, programming help, anything. Thanks for your valuable attention!


A great ambition I’d say, and very well described @jordan.kay.

The first order of business IMO is to design validation scenarios for such a system.
What would it look like if this were actually working?
How do we know it’s not overfitting to the task or just getting lucky?

My sense is that it’d be easier to use touch-based sensors rather than visual (to start at least) – since visual data is non-trivial to even encode into SDRs (as I understand it).
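For intuition on why simple sensor values are easier to encode, here is a minimal scalar-to-SDR encoder sketch (my own hypothetical, in the general spirit of HTM scalar encoders, not Numenta's implementation): a contiguous block of ON bits whose position tracks the value, so nearby values share bits.

```python
import numpy as np

def scalar_to_sdr(value, vmin=0.0, vmax=1.0, size=64, active=8):
    """Encode a scalar as a sparse binary vector: a contiguous block of
    `active` ON bits whose position tracks the value. Nearby values
    share bits, so overlap encodes similarity."""
    value = min(max(value, vmin), vmax)
    start = int((value - vmin) / (vmax - vmin) * (size - active))
    sdr = np.zeros(size, dtype=int)
    sdr[start:start + active] = 1
    return sdr

a = scalar_to_sdr(0.50)
b = scalar_to_sdr(0.52)
c = scalar_to_sdr(0.90)
print(int(a @ b), int(a @ c))   # prints "7 0": near values overlap, far values don't
```

A touch sensor's pressure reading maps onto this directly; a camera frame does not, which is the asymmetry I'm pointing at.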

I find myself gravitating toward control task(s) of some kind, where the system learns to operate something on its own.

The task(s) should be complex enough to demonstrate non-trivial competency, which to me at least means:

  • multiple control movements at the system’s disposal

  • multiple and interdependent moving parts in the controlled environment (“plant”)

The learning process here reminds me of a new human trainee, learning to become an air traffic controller for instance. He/she has to learn the dynamics of planes and crews, what they need to do and how much time and space they need to do it safely for all involved.

I think this objective (to control a somewhat complex system) can potentially demonstrate real robustness in the system, since there’s so much to learn and so many ways to screw up.

Early in the learning process the system would be clueless and dangerous, like a 16-year-old new driver who brakes abruptly, misses stop signs, cuts people off and bumps other cars when parking.
But when you get in the car w/them 2 years later all those incompetencies have receded, and you feel they’re a relatively “safe driver” on the whole. You have confidence that they could operate safely in novel scenarios, with unknown streets and traffic patterns etc.

This kind of general confidence in system competency should be our holy grail IMO.
And I think this test environment design is the best place to start, since it makes the rubber meet the road.


I would recommend an even more trivial task to start with. Can you design a thermostat that learns to recognize the effects of its own actions as distinct from changes in the environment due to external causes? Can you then engineer a feedback mechanism that allows the system to discover an optimal operating point with minimal guidance? I think there is more than enough challenge in just that task to be worthy of study.
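To make the thermostat suggestion concrete, here is one hypothetical toy reading of it (environment dynamics, action effects, and the averaging trick are all my own assumptions): the temperature moves by an external drift plus the effect of the agent's own action, and the agent can separate the two causes by averaging the observed change per action, since the drift averages out.

```python
import random

class ThermostatEnv:
    """Toy room: temperature moves by an external drift plus the
    effect of the agent's own action (heater on/off). The agent's
    challenge is to tease apart those two causes."""
    def __init__(self, seed=0):
        self.rng = random.Random(seed)
        self.temp = 20.0

    def step(self, heater_on):
        drift = self.rng.uniform(-0.5, 0.5)     # external cause
        effect = 1.0 if heater_on else -0.2     # self-caused change
        self.temp += drift + effect
        return self.temp

# Crude self-model: average the observed temperature delta per action.
random.seed(1)
env = ThermostatEnv()
sums = {0: 0.0, 1: 0.0}
counts = {0: 0, 1: 0}
prev = env.temp
for _ in range(5000):
    a = random.choice([0, 1])
    temp = env.step(a)
    sums[a] += temp - prev
    counts[a] += 1
    prev = temp

est = {a: sums[a] / counts[a] for a in (0, 1)}
print(round(est[1], 1), round(est[0], 1))   # recovers roughly 1.0 and -0.2
```

Even this crude averaging recovers "my heater action warms the room by about 1 degree; doing nothing lets it cool by about 0.2" — which is exactly the self-vs-world distinction being asked for, before any optimal-setpoint feedback mechanism is added.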


I agree with you 100%. I’ve spent some time trying to develop my understanding of environments that could be used to test a true Sensorimotor Inference Engine. I think a great ultimate goal would be for it to learn the dynamics of a system that is challenging to most humans. A great puzzle for it to learn on its own is the Rubik’s Cube: very many interdependent parts. It’s entirely deterministic, which is great for a proof of concept where dealing with fuzzy logic is difficult, and it can easily be modeled in code. The state space of a Rubik’s Cube is too big to handle naively (it can’t be memorized), so you have to have multiple agents working together to understand it. It seems like a good goal for this system.

But in the meantime, for interim development we may as well use any/every environment in the OpenAI Gym. I’ve already made 3 environments to test against besides the ones that come with it: an extremely simple one (a number line), a pretty similar one (a Rubik’s Cube with only 2 colors instead of 6) and a regular Rubik’s Cube. They are in the “envs” folder in the repo.
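For readers who haven't seen the repo, here is a gym-style sketch of the number-line idea — this is my own illustrative guess at its shape, not the repo's actual code:

```python
class NumberLineEnv:
    """Gym-style toy environment: the state is a position on a bounded
    number line; actions move it left or right. Fully deterministic,
    so an agent can learn exact transition dynamics."""
    def __init__(self, low=0, high=9):
        self.low, self.high = low, high
        self.pos = low

    def reset(self):
        self.pos = self.low
        return self.pos

    def step(self, action):            # action: 0 = left, 1 = right
        delta = 1 if action == 1 else -1
        self.pos = max(self.low, min(self.high, self.pos + delta))
        return self.pos                # observation only, no reward

env = NumberLineEnv()
env.reset()
obs = [env.step(a) for a in [1, 1, 1, 0]]
print(obs)   # prints [1, 2, 3, 2]
```

Note there is deliberately no reward signal — the sensorimotor engine is supposed to learn the transition structure without one.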

It’s easy to test the naive version of the sensorimotor inference engine (naive meaning basically a database with a pathfinding algorithm), but the sensorimotor autoencoder becomes a bit more difficult to test.
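Since the naive version is described as "a database with a pathfinding algorithm", here is one minimal reading of that in miniature (the number-line environment, the random-babbling exploration, and the BFS planner are all assumptions of mine, not the repo's implementation): memorize (state, action) → next-state transitions, then breadth-first-search over them to reach a goal state.

```python
from collections import deque
import random

# Naive sensorimotor engine in miniature: memorize (state, action) -> state
# transitions by random exploration, then path-find to a goal state.
# The "environment" here is a deterministic 10-position number line.
def env_step(state, action):
    return max(0, min(9, state + (1 if action == 1 else -1)))

random.seed(0)
memory = {}                             # the "database" of transitions
state = 0
for _ in range(500):                    # random motor babbling
    action = random.choice([0, 1])
    nxt = env_step(state, action)
    memory[(state, action)] = nxt
    state = nxt

def plan(start, goal):
    """Breadth-first search over memorized transitions; returns the
    action sequence that reaches the goal, or None if unreachable."""
    frontier = deque([(start, [])])
    seen = {start}
    while frontier:
        s, path = frontier.popleft()
        if s == goal:
            return path
        for (ms, a), nxt in memory.items():
            if ms == s and nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, path + [a]))
    return None

print(plan(0, 4))   # four 'right' actions: [1, 1, 1, 1]
```

Testing this version is easy precisely because you can hand it an explicit goal state; the autoencoder version has no such slot to put the goal into, which is the problem discussed next.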

I think what you basically have to do is watch the structure that the autoencoders make and then implant your own representations at the top of the hierarchy. That may sound a bit creepy, but imagine you had Neuralink implants all over your brain, connected to a supercomputer that learned which cortical columns were your highest level of the hierarchy for every context. It could essentially implant ideas at that highest level, and the idea would flow down, producing behaviors that bring the idea about.

I think you’d have to do something similar in order to tell the sensorimotor autoencoder what state you want it to bring about in the environment. Obviously, this is something we have to think more about. I’m glad you brought it up!


Actually, to reduce the amount of complexity required for this system to work, I assumed it would be best to unleash it in strictly deterministic environments (environments where it is the only actor acting upon the system).

This seems like a simpler approach, and sufficient to build a proof of concept, even though it may only be useful in a very limited context. I think we (meaning the industry of AGI) could expand it to include non-deterministic environments later, probably through some kind of evolutionary process (putting multiple of these agents in a cage and seeing who wins).

As I see it there are a few ways to categorize environments: how large they are, how complex they are, and if they’re deterministic or not.

Small, simple, deterministic environments are on one side of the environment-complexity spectrum and could be handled by a naive (memorizing, with no AI components) version of the sensorimotor engine. Large environments with a huge state space that are also complex (have a huge transition space) yet deterministic lie near the other end of the spectrum. A Rubik’s Cube is an example of such an environment - a complicated puzzle. I think that should be the goal of this sensorimotor proof-of-concept idea.

I see introducing another player in the env as the last step in making it more complex, and I’d like others to figure that piece out.


Wow, this is a great deal of work!

I should find more time to experiment with it! How can I run the code?


I’ll put in some documentation on that tonight. I have created the environments and the naive sensorimotor version, but haven’t really started on the sensorimotor autoencoder because I’m no expert when it comes to building neural networks. That’s where I could really use some help!


The Naïve Agent demo is done, follow the “Getting Started” instructions in the readme.md :slight_smile:


You forgot “attention.” Motor behavior, high-order sequential inference, and attention must all seamlessly flow out of the modular and homogeneous network you’re describing. At the highest level of abstraction, what are the mathematical operations that characterize your data set? The foundation of SDRs is set-theoretic; personally, I think HTM resonates with the three attributes I mentioned even at the most fundamental level. Briefly speaking, discrete operations just seem more correct than theories that prioritize statistical or more continuous math operations.


Agreed, those must be considered, I think. How would you integrate the previous principles? What would you change so they are embodied? I think getting down to set-theoretic data operations is ideal. The data will always reflect that, no matter the environment.