I’m trying to make a (proof of concept) Sensorimotor Machine: something I’ve called a sensorimotor inference engine in the past…
The difference is that now I’m thinking it might be possible to build it out of existing technology.
Jeff is always talking about defining the principles inherent in intelligence as the most important fundamental first step towards intelligent machines. I 100% agree.
And I wonder: can we find a mashup of existing technologies that allows us to approximate these principles?
For instance, we have a recent example of a combination of simple technologies that gave us dramatic improvement because it embodied a special principle: Generative Adversarial Networks. The principle they embody: a feedback loop that produces an arms race between a generator and a discriminator.
I’m hoping we can do the same towards the goal of making a sensorimotor machine: a machine that naturally (without external rewards, direction, or influence) learns how to control its sensory input through motor output.
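To pin down what that loop means operationally, here’s a minimal interface sketch (every name here is hypothetical, not an existing API): the agent receives only sensations and emits only motor commands, and there is no reward signal anywhere in the loop.

```python
class Environment:
    """Hypothetical stand-in: applies a motor command, returns the next sensation."""
    def step(self, motor):
        raise NotImplementedError

class Agent:
    """Hypothetical stand-in: maps sensation to motor output, learning as it goes."""
    def act(self, sensation):
        raise NotImplementedError

def run(env, agent, sensation, steps=1000):
    # The closed sensorimotor loop: act, sense, repeat. No reward anywhere.
    for _ in range(steps):
        motor = agent.act(sensation)   # motor output...
        sensation = env.step(motor)    # ...shapes the next sensory input
    return sensation
```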
I believe we can more or less define the kinds of principles essential to the task (these are straight from the HTM playbook):
- Modularity - the machine’s mind should be a network of components.
- Homogeneity - each component of the machine’s mind should be similar to all (or many) other components at the most granular scale (like how cortical columns are similar).
- Compression - representations must be compressed (ideally into SDR-type representations) in order for components to communicate with each other (a toy sketch of SDR-style codes follows this list).
- Spatio-Temporal Representations - the compressed information must capture spatio-temporal patterns in the data, at minimum the transition from one moment to the next.
- Protocol Variation - compressed representations should be similar across nearby nodes, but not identical. Computer systems typically speak a precise common protocol; we want a system where nearby components speak nearly the same language, one that changes slowly the further you move from any component, and where these changes reflect the changes in the patterns the nodes experience.
- Composability - each granular component should easily interact with other components.
- Hierarchy - composability should allow the network to tend towards forming hierarchical interactions.
- Invariance - further away nodes should represent larger-scale structures, which should in turn produce more highly invariant representations (spatially and temporally).
- Others? - there are probably other principles we’ll find valuable that are as yet unknown.
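To make the Compression bullet above a bit more concrete, here is a toy sketch of SDR-style codes: sparse binary vectors whose pairwise overlap is a cheap similarity measure that nearby nodes could use to compare what they’re saying. The vector size and sparsity below are arbitrary assumptions, not values from any established design.

```python
import numpy as np

def random_sdr(n=2048, active=40, rng=None):
    """A toy SDR: a binary vector with a small number of active bits."""
    if rng is None:
        rng = np.random.default_rng()
    sdr = np.zeros(n, dtype=bool)
    sdr[rng.choice(n, size=active, replace=False)] = True
    return sdr

def overlap(a, b):
    """Bits active in both codes -- the usual SDR similarity measure."""
    return int(np.count_nonzero(a & b))

rng = np.random.default_rng(0)
a, b = random_sdr(rng=rng), random_sdr(rng=rng)
print(overlap(a, a))  # 40: a code fully overlaps itself
print(overlap(a, b))  # near 0: two random codes barely overlap
```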
Can we combine existing technology in such a way as to embody these principles so we can create a simple, generalized Sensorimotor Inference Engine proof of concept? I think so, and here’s where I am starting:
Autoencoders produce a compressed representation. They can be made to represent the transitions from one timestep to the next rather than mere spatial patterns. They could be modular and homogeneous, and they can send their compressed representations as input to one another. They can therefore form hierarchies, and with a little extra information (such as how many inputs another node and I have in common) they can send more compressed (and, I think, therefore more invariant) representations to ‘further away’ nodes.
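To make that concrete, here’s one way (a minimal PyTorch sketch, not the only way) an autoencoder can be pushed toward representing transitions rather than mere spatial patterns: train it to reconstruct the next timestep from the current one, so the bottleneck code has to capture how one moment becomes the next. All layer sizes are arbitrary and the training data below is random stand-in noise.

```python
import torch
import torch.nn as nn

class TransitionAutoencoder(nn.Module):
    """Encodes the sensory input at time t; decodes a prediction of t+1."""
    def __init__(self, sense_dim=64, code_dim=8):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(sense_dim, 32), nn.ReLU(),
            nn.Linear(32, code_dim),          # the compressed representation
        )
        self.decoder = nn.Sequential(
            nn.Linear(code_dim, 32), nn.ReLU(),
            nn.Linear(32, sense_dim),
        )

    def forward(self, x_t):
        code = self.encoder(x_t)
        return self.decoder(code), code

model = TransitionAutoencoder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)

x_t  = torch.randn(16, 64)  # sensory input at time t (stand-in data)
x_t1 = torch.randn(16, 64)  # sensory input at time t+1

pred, code = model(x_t)
loss = nn.functional.mse_loss(pred, x_t1)  # reconstruct t+1 from the code of t
opt.zero_grad(); loss.backward(); opt.step()
```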
In other words, what I’m suggesting is that by wiring a network of autoencoders up in a special way (to each other and to an environment they’re interacting with), we may be able to embody these principles of intelligence and thereby produce a prototypical design for the simplest AGI conceivable: a Sensorimotor Inference Engine.
If my intuition is correct, we could create something called a Sensorimotor Autoencoder.
Here’s a diagram of an extremely simple version of what I’m describing:
Notice how the compressed representations for T0 and T1 are fed as inputs to all the other neighboring autoencoders (this is a fully connected network). Also notice that each autoencoder receives different input from the environment. It isn’t shown here, but they would receive overlapping information about the environment, and the same would be true of their motor output.
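Here’s a rough sketch of that wiring, reusing the hypothetical TransitionAutoencoder from the previous snippet: each node encodes an overlapping slice of the sensory vector concatenated with the previous step’s codes from every other node (fully connected). The slice sizes and overlap scheme are illustrative assumptions.

```python
import torch

n_nodes, code_dim, slice_dim, field_overlap = 4, 8, 24, 8
sense_dim = slice_dim + (n_nodes - 1) * code_dim  # local slice + neighbors' codes
nodes = [TransitionAutoencoder(sense_dim, code_dim) for _ in range(n_nodes)]
codes = [torch.zeros(1, code_dim) for _ in range(n_nodes)]

sensory = torch.randn(1, 80)  # the whole environment state (stand-in)

new_codes = []
for i, node in enumerate(nodes):
    start = i * (slice_dim - field_overlap)        # overlapping receptive fields
    local = sensory[:, start:start + slice_dim]    # this node's slice of the world
    context = torch.cat([c for j, c in enumerate(codes) if j != i], dim=1)
    _, code = node(torch.cat([local, context], dim=1))
    new_codes.append(code)
codes = new_codes  # the other nodes see these codes at the next timestep
```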
Of course, this is extremely simplified. The ideal version of what I have in mind would probably have several autoencoders per ‘node’ in the network, all receiving basically the same input but compressing it at varying rates in order to send the right invariance level to other nodes. Furthermore, each node would need the ability to modify its connections, not only to the environment but to the other nodes as well.
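For the multi-rate version, a node might look something like this sketch (again building on the hypothetical TransitionAutoencoder; the specific code sizes are assumptions): a small stack of autoencoders over the same input, each compressing more aggressively, with the smaller (and, if the intuition holds, more invariant) codes being the ones sent to more distant nodes.

```python
import torch.nn as nn

class MultiRateNode(nn.Module):
    """One 'node': several autoencoders over the same input, compressing at
    different rates so different invariance levels can be sent to other nodes."""
    def __init__(self, sense_dim=48, code_dims=(16, 8, 4)):
        super().__init__()
        self.aes = nn.ModuleList(
            TransitionAutoencoder(sense_dim, d) for d in code_dims)

    def forward(self, x):
        # codes[0] (largest) for nearby nodes; codes[-1] (smallest, most
        # compressed) for the most distant nodes
        return [ae(x)[1] for ae in self.aes]
```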
Getting from this simple version to the more complex version may take a lot of work! But I’m hoping each step along the way will serve as a better and better proof of concept for the idea and the general principles involved.
I’m looking for feedback, ideas, concerns, programming help, anything. Thanks for your valuable attention!