Is this a good idea?

Please tell me what you think of this idea.

Through Jeff’s books and my research into what Numenta knows, I have noticed several operating principles of intelligence, or at least attributes of its makeup, and have for a long time wanted to embody those principles in something I call a Sensorimotor Inference Engine.

Though it’s been on the back burner of my mind for years, I think I’ve finally come up with an initial design that begins to take some intelligent principles into account.

I’m not attempting to build NuPIC or instantiate the cortical algorithm. I’m just trying to use what I’ve learned, combined with simple neural net technology, to approximate a machine that maps a system and can control it, even if in a highly unadvanced manner.

This is the Sensorimotor Inference Engine that I mentioned. Now, I already made a Sensorimotor Inference Engine, but a naive one. I’ll explain how it works, and then you’ll be able to see what I’m trying to do to advance it.

The naive implementation of the Sensorimotor Inference Engine works like this: the naive agent is in a continual feedback loop with the environment. The environment gives it sensory input, and it gives the environment motor output. Every time the naive agent sees sensory data, it saves that data in a database along with the action it chooses to take.

After it has explored the environment by making mostly random actions, you can tell it to put the environment in any state that you want. It will look in its database to see if it has ever seen that state before, and if it finds it, it will search for a path from where it is now (whatever state the environment is in now) to that particular state. This path is a list of behaviors, a list of motor commands that it must output to the environment.
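The record-and-path-search loop described above can be sketched roughly like this (a minimal sketch only; the class and method names are my own, not from the original implementation):

```python
from collections import deque

class NaiveAgent:
    """Minimal sketch of the naive agent: it memorizes every observed
    (state, action) -> next_state transition, then finds a motor-command
    path to a requested state by breadth-first search."""

    def __init__(self):
        self.transitions = {}  # (state, action) -> next_state

    def record(self, state, action, next_state):
        self.transitions[(state, action)] = next_state

    def path_to(self, current, goal):
        """Return a list of actions leading from `current` to `goal`,
        or None if the goal state was never observed / is unreachable."""
        frontier = deque([(current, [])])
        seen = {current}
        while frontier:
            state, actions = frontier.popleft()
            if state == goal:
                return actions
            for (s, a), nxt in self.transitions.items():
                if s == state and nxt not in seen:
                    seen.add(nxt)
                    frontier.append((nxt, actions + [a]))
        return None
```

This also makes the scaling problem below concrete: the `transitions` table grows with every distinct state ever seen, which is exactly what blows up on larger environments.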

It’s a simple idea. I can now control an environment via the Sensorimotor Inference Agent without actually knowing how the environment works or even how to work it.

The problem is, of course, that the agent I built only works for really simple environments. Because it’s naive, it has no intelligence. It doesn’t do anything probabilistic and it doesn’t have any generalization. If the environment is too big to put into a database, it cannot learn that environment.

Most environments are too big to put into a database. Even a simple 3x3 Rubik’s Cube is too big. I was able to train the naive agent on a 2x2 Rubik’s Cube because there are only something like 3.6 million transformations. But not the 3x3.

So I wanted to make an intelligent version, a very simple, but intelligent version of the sensorimotor inference engine.

This is the design I’ve come up with so far. I figure you’re going to need an Encoder to simplify the environment so that it can be passed up a hierarchy, and you’re going to need that encoder to exist on every layer of the hierarchy. You’re going to want every layer of the hierarchy to look pretty much the same. You’re going to need a Predictor to know what next state to go to, with higher layers in the hierarchy telling you which next state they want you to go to. And you’re going to need to translate a state-to-state transition into a particular motor output. So I’m calling this the EPA circuit: Encoder, Predictor, Actor. It’s a way to wire up a few different models so that together they manage an environment the way you want them to.
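One layer of that wiring might be sketched like this (the three models are stand-ins passed in as plain functions; nothing about their internals is specified yet, and the names are mine):

```python
class EPALayer:
    """Sketch of one layer of the proposed EPA circuit: an Encoder
    compresses the observation, a Predictor picks the next state (biased
    by the goal the layer above hands down), and an Actor turns the
    state-to-state transition into a motor output."""

    def __init__(self, encoder, predictor, actor):
        self.encoder = encoder
        self.predictor = predictor
        self.actor = actor

    def step(self, observation, goal_from_above=None):
        state = self.encoder(observation)                    # simplify the input
        next_state = self.predictor(state, goal_from_above)  # where to go next
        action = self.actor(state, next_state)               # transition -> motor output
        return action, state  # `state` is passed up as the layer above's observation
```

Since each layer returns its encoded state as the observation for the layer above, stacking identical `EPALayer`s gives the "every layer looks pretty much the same" property.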

So I’ve already made the “Naive” implementation of a Sensorimotor Inference Engine. If this implementation works it’ll be the “Simple Agent”, because I’m sure there are much more efficient and effective implementations that can be made. (I mean, the neocortex is basically the most advanced implementation you could imagine.)

If you’re interested in this, let me know what you think. I know almost nothing about neural nets so I need all the feedback I can get.


In the drawing there are two … pipelines (encoder, predictor, actor). Is the repetition in the drawing hierarchical, parallel (neighbors), or sequential (consecutive timesteps of the same network, as when illustrating an RNN unrolled over two timesteps)?

What is the memory module?

How many neighbors? (fixed number or dynamically expanding)?

Could the “user” or “trainer” be substituted by something akin to exploration/curiosity? I mean, if the memory module records each timestep, then it can report a “density” of predicted future observations, and a loop may then “seek” actions toward the least explored “territory”.

Otherwise the system seems quite complex architecturally. My intuition is that all of this could be stitched onto a (relatively) simple sequence predictor, a.k.a. a transformer model. For a simple “insect”, a too-large (and sluggish) model might not be needed. See the TinyStories models.
The trick in using it is to figure out a means to build simple “phrases” from sequences of “observation 1”, “obs 2”, … “obs N”, “goal”, “action”, “next step token”, “obs 1”, “obs 2”… .

That needs a common vector representation for observations, goals (a goal is simply a desired future observation, just watermarked as “goal”), and motor actions. Each with its specific encoding, or encoder, if that can be trained.
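A minimal sketch of what one such tagged “phrase” could look like (the tag names and placeholder vectors here are purely illustrative, not a proposed format):

```python
# One training "phrase" for a small sequence model: observations, a goal
# (a desired future observation, watermarked by its tag), an action, and
# a next-step separator all share one (tag, vector) token representation.
def token(kind, vec):
    return (kind, tuple(vec))

phrase = [
    token("OBS", [0.1, 0.2]),
    token("OBS", [0.3, 0.1]),
    token("GOAL", [0.9, 0.9]),   # same shape as an observation, different tag
    token("ACTION", [1.0, 0.0]),
    token("STEP", []),           # next-step separator token
]
```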

All that memory machinery is already somewhat worked on in the form of vector databases.


The drawing shows the bottom layer of reality, the environment; above that is the first layer of intelligence, the circuit. It repeats again above itself to indicate that this circuit is (or might be) a simplified version of the “smallest unit of intelligence.” The SUI is a circuit that can be repeated horizontally and vertically.

If it is repeated horizontally, it acts as multiple units watching different parts of the environment, with some collaboration to coordinate their efforts (motor outputs). Horizontal repetition is not shown in the diagram, but implied as an option by the arrows coming in from the side of the diagram.

What I think is more important is that the circuit can scale vertically (in theory), and this is shown explicitly in the diagram. Indeed, this rendition of the circuit is a 2-level circuit. Vertical scaling means that higher-level circuits, though they are almost identical to the lowest level, see a broader perspective of the environment. “Broader” in terms of space and time, but most importantly time. In this way, higher levels indicate a long path in the state space of the environment, guiding the lower levels’ intricate manipulation of the environment to conform with the higher levels’ predictions or, in their context, goals.

So to answer your question specifically: it is not consecutive timesteps of the same circuit.


The memory module, described in the text of the diagram (near the bottom), is essentially a database. We can think of it as short-term, explicit memory used to train the models correctly. It is not intelligent.


As many as you like. It could be made dynamic, but since I’m focused on producing the simplest possible version of this, I have to say no, not dynamic; I have to reduce complexity at all costs right now.

Horizontally you could have many neighbor circuits, but you would do so in order to break up the sensorimotor space of the environment. In other words, each neighbor would see a portion of the environment. (This might be ideal if you have lots of overlapping circuits, such as 12 circuits looking at the 6 sides of a Rubik’s Cube, where no circuit watches one side exclusively.)

But I’m not focused on horizontal scaling at this time. More important than breaking up the input space is, I think, scaling vertically, so that higher levels of circuits can see broader areas of the state space. For instance, the lowest level can only translate the current state into some next state (one timestep). The second level sees 2-state pairs transitioning into some other 2-state pair, and the third level sees 2 2-state pairs transitioning into some other 2 2-state pairs. So it’s binary exponential: 1, 2, 4, 8, 16, 32.

Consider the Rubik’s Cube example: from any state you can (if you know exactly how) change it into any other state in 22 moves. Therefore, in theory, a well-trained 6-level Sensorimotor Circuit could fully manipulate the environment of the Rubik’s Cube.
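The doubling arithmetic above works out like this (a trivial sketch of the scheme as I read it):

```python
def horizon(level):
    """Base timesteps spanned by one state at a given level under the
    doubling scheme: level 1 sees single transitions, and each level
    above doubles the span."""
    return 2 ** (level - 1)

# A 6-level circuit tops out at horizon(6) == 32 base timesteps,
# which (in theory) covers the 22-move paths mentioned above.
```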


The “User” is the human that uses the trained agent. He does so by issuing high-level commands to the top level of the circuit, which get passed down in the form of predictions and generally guide the agent’s behavior (this circuit is mainly meant to solve the problem of using the trained intelligence). The user issues commands, in other words, in order to achieve his immediate goal: he wants the environment to change in a certain way.

The “Trainer” is probably an AI or program that issues commands to the environment in order to achieve the general goal of making the agent learn the environment efficiently. The trainer bot doesn’t care about the state of the environment; the trainer bot wants to help the agent learn how it works quickly.

So the Trainer would have the curiosity metrics and would have a full view of the mind of the bot. What it’s trying to do is reduce confusion in the models as fast as possible by seeking surprise and minimizing free energy. (I think perhaps the best trainer policy would be Karl Friston’s active inference.)

But at its simplest (which will be the first iteration), the “Trainer” will tell the agent to 1. never do something it’s done before (since we assume a deterministic environment), and 2. otherwise behave randomly.
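That first-iteration policy could be sketched as follows (illustrative only; the function and argument names are my own):

```python
import random

def first_trainer(state, tried, actions):
    """Simplest trainer policy: act randomly, but prefer a (state, action)
    pair that has never been tried before. We assume a deterministic
    environment, so repeating a pair teaches the agent nothing new."""
    untried = [a for a in actions if (state, a) not in tried]
    choice = random.choice(untried if untried else actions)
    tried.add((state, choice))
    return choice
```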


I don’t really know much about anything you’re saying here, but thanks for the advice. Yes, this circuit relies most heavily on the concept that a goal is merely conformity to a prediction from a higher level.

I’m not sure if you’re suggesting this or not, but I don’t want humans to have to train this engine on every single environment, so I’ve designed it this way in order for training to be automatic, as well as the querying of the learned data (i.e. using the agent to manipulate the environment).


Since you want to make it automatic… how do you know which information is necessary, or what the essential parts of any information are? The reason I’m asking is that our brain filters information by eliminating low-energy-spike information and accelerating high-energy-spike information.


I think the predictor should initially be allowed to choose from only a few options. I see the predictor as the one who creates “order from chaos,” which is essential for any model to work. E.g., human babies have limited options to learn from: they cannot see the world in great detail or move their muscles with great flexibility. This reduction of freedom is important so that the brain can form a narrative about only a few things, from which it can later form many complex narratives. In your model, first reduce the options, like teaching the model only how to walk (give it two legs), then make it walk on ups and downs (now give it hands and flexibility). I don’t know if I’ve expressed it correctly; if not, think about this question: “How can you know how to jump if you don’t know how to walk?” There is a reason for the early limitations of the human body. Let the predictor learn what to do with the environment by gradually increasing the options or degrees of freedom.


To have the agent train itself, I think the ideal might be something like implementing Karl Friston’s minimization-of-free-energy approach, which he called active inference. But I don’t really know. It’s definitely seeking surprise, but it also needs to understand the order and the patterns and to verify that patterns exist, so there’s a balance there, and I don’t know enough to know what the ideal is for sure. But whatever the policy is for training the agent, that policy should be carried out by the trainer bot; it can be automated.

The very first trainer bot I think is not an approximation of the ideal. I think it’s just a random walk. That will teach the agent, just not as efficiently as possible.

So I guess the answer to your question is: I’m not actually too focused on finding an optimal solution for that yet.

But I think it’s a good question and I’d like to compare it to what’s being done out there in the world of agents and environments today.

If you want to train an agent today you give the agent a reward for good behavior. Good behavior is goal-oriented behavior. In other words you train the agent to solve the Rubik’s Cube, you don’t train the agent to learn how to put the Rubik’s Cube into any position.

Now, since selecting the goal and communicating it to the network is done at run-time, not during training, that’s really the problem I’m trying to solve with the EPA circuit.

You know, every environment is just a map of state spaces and the actions that lead from one state to another. I’m just trying to make a neural network that learns the state space, and a circuitry that allows us to request the path of actions from one state to another.


This resonates with me a great deal. In fact this is what defines my approach. I’m embodying this principle as I’m trying to develop the system. I think in order to train ourselves on learning how to make the best sensorimotor inference engine, we should set for ourselves the simplest tasks possible.

For instance we should first learn how to make one according to the appropriate principles that works in a:

  1. small environment, then in a
  2. large environment but a simple one, then in a
  3. small but complex environment, then in a
  4. large and complex environment, then in a
  5. small, simple, but non-deterministic environment, then in a
  6. small, complex, non-deterministic environment, then in a
  7. large, simple, non-deterministic environment, then in a
  8. large, complex, non-deterministic environment, then in
  9. the real world.

I already wrote a Naive Agent (no AI) implementation of the sensorimotor inference engine which can handle #1 and #3 (small, complex, deterministic environments). I’m hoping this AI implementation will at least do the same, but in a way that involves generalization instead of memorization, and which will allow us to learn how to better scale it.

Now, as far as training the model on a set of behaviors rather than all possible behaviors: perhaps you’re right that that’s important, but I think it should be the concern of the trainer bot. The trainer bot should not be a human, because we want the agent to train itself automatically.

Ultimately we want to map the state space, but if the agent can make some simplifying assumptions, even if that means the path it must traverse is not as ideal as possible, and still build a mental model of the environment’s state space that is much more simplified yet still useful, I think that’s great.

A really, really good example of this (and this is why I always use the Rubik’s Cube example) is that by memorizing a certain set of moves and knowing when to employ them, you can solve the Rubik’s Cube from whatever scrambled combination it’s in. This algorithm, though much more simplified than the ideal algorithm, is not too inefficient: the most efficient algorithm might solve the cube in 22 moves, whereas this one might take 60 to 100. But because this algorithm is so much simpler, it’s the one 99% of people learn.

I think finding those heuristics of an environment that are not perfectly accurate to its state space, but are nonetheless sufficient, will be a massive optimization over the kind of blind-learning approach that this initial design takes.


A different way of looking at this is to see how nature does it. While opinions vary on the role of phylogenetically older structures (nobody likes to talk about the triune brain anymore), the parts that correspond to what equips crocodiles are very powerful. They can explore, run a body to move, eat, fight, mate, and do all those things a crocodile does.
Add in a cortex, and it can learn the things a cortex learns while the hard-programmed parts do what evolution has programmed in.
@Socradeez has referred to these as kick-starters.


Can I get more details on your theory?


This algorithm is very similar to one I had come up with for a toy self-driving car. The number of steps is much larger, though; this car maps the sensory representation directly to an output, but it uses this desired-vs-undesired future to learn the mapping.

I deleted the code long ago because I got frustrated with the fact that it couldn’t do well on any other task. But I’ll try to make it again, now that I know it was so dumb because I had simply refused to use gradient descent.


Interesting! Do you use TD-learning?


In terms of path finding, would it be useful to build from experience a vector database which stores, in overlapped form, a triple vector:
(source point, destination point, middle point)?
Even if the agent itself learns to search and navigate only a small nearby space (within 1-2 steps), long-distance targets can be reached by recursive planning:
“I’m in A and want to reach B; the database recommends intermediate step C”
“I’m in A and want to reach C; the database recommends intermediate step D”
and so on, until we get to a point Z that is closest to A but toward B.
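That recursive lookup might be sketched like this, with a plain dict standing in for the vector database’s (source, destination) → middle recommendation (names and the depth cutoff are illustrative):

```python
def plan(src, dst, midpoints, depth=10):
    """Recursive planning sketch: to get from src to dst, ask the store
    for an intermediate point, then plan both halves. `midpoints` stands
    in for the vector database; a missing entry means dst is assumed to
    be within the agent's direct (1-2 step) reach."""
    if src == dst or depth == 0:
        return []
    mid = midpoints.get((src, dst))
    if mid is None:
        return [dst]  # dst is directly reachable, stop recursing
    return plan(src, mid, midpoints, depth - 1) + plan(mid, dst, midpoints, depth - 1)
```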

Regarding the Rubik’s Cube as an example… meh, it is too algorithmic/digital. I prefer more… analogue spaces, of the kind all animals have to deal with.

e.g. what kind of task?


I think that from an animal’s perspective, the large, the complex, and the non-deterministic are treated the same way: stuff it cannot fully record or remember.

For simplification I would pick one robot virtual environment (there are quite a few) that is sophisticated enough (large, complex, non-deterministic) to transfer well to a real-world counterpart, and “grow” an agent within it. Learning to make baby steps from a “home” position and then return can count as a simpler/smaller/more predictable environment.


Not in the way you normally would; the credit assignment extends only one timestep back into the past.


I haven’t envisioned it with that, but maybe you could.


That’s actually the data structure I’m using for my naive agent, yes. And my conjectured hybrid agent. The one that I’ve outlined above also has that same structure, but embodied in neural nets.
