Here’s my take on prediction in brains/machines. I think it’s partially orthogonal to most of what’s been said here.
Prediction is useful for anticipating the outcome of your actions and the actions of others (and the passive dynamics of the environment). But that kind of forward modelling is nowhere near the most useful thing about prediction.
Prediction is useful because it forces you to form better representations.
Ultimately, an agent is interested in choosing actions that optimize some sort of utility. Behaviour is the name of the game. Learning behaviour can happen directly from raw sensory input (see DeepMind’s success with model-free RL) but that takes an enormous amount of experience. For an agent to learn quickly, it needs to have representations available that expose the true latent causes of events in the world that are useful for mapping directly to actions. That’s the most important reason to try to predict the future: it provides a self-supervised learning signal that allows you to form good representations of the true underlying causes of your sensory input, and those representations are good for learning to act.
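To make the "prediction forces better representations" point concrete, here's a minimal toy sketch (my own illustration, not anyone's actual model): observations are high-dimensional projections of a low-dimensional latent cause, and a rank-limited next-step predictor is forced to discover that cause, because only the latent directions carry predictive information. All names, dimensions, and dynamics here are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

d_latent, d_obs, T = 2, 20, 2000
# Latent cause follows simple linear dynamics (a slow rotation).
theta = 0.1
A = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
W = rng.normal(size=(d_obs, d_latent))      # latent -> observation map

z = rng.normal(size=d_latent)
obs = []
for _ in range(T):
    obs.append(W @ z + 0.05 * rng.normal(size=d_obs))  # noisy observation
    z = A @ z
obs = np.array(obs)

X, Y = obs[:-1], obs[1:]                    # task: predict the next observation
P, *_ = np.linalg.lstsq(X, Y, rcond=None)   # full-rank linear predictor

# Bottleneck the predictor to rank d_latent: the directions that survive the
# truncation are exactly the ones carrying predictive information -- i.e. the
# latent cause, recovered without any labels.
U, s, Vt = np.linalg.svd(P)
P_low = U[:, :d_latent] @ np.diag(s[:d_latent]) @ Vt[:d_latent, :]

err_full = np.mean((Y - X @ P) ** 2)
err_low = np.mean((Y - X @ P_low) ** 2)
print(f"full-rank error {err_full:.4f}, rank-{d_latent} error {err_low:.4f}")
```

The rank-2 predictor does essentially as well as the full one, which is the toy version of the claim above: predicting the future distils the observation stream down to its true underlying causes, and those low-dimensional features are exactly what you'd want to hand to a policy.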
It’s worth noting that currently, HTM does not form good representations. It forms quite poor representations because the temporal memory splits representations by context. But temporal pooling, and sensorimotor pooling, have the potential to change that once people figure out how to do it right.
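A toy illustration of the context-splitting problem (this is a caricature of my own, not real HTM/NuPIC code): in temporal memory, an input always activates the same columns, but which cell fires within each column depends on the preceding input, so the same input in two different contexts produces two cell-level codes with no overlap.

```python
CELLS_PER_COLUMN = 4

def tm_code(prev, symbol):
    """Cell-level code for `symbol` arriving after `prev` (None = no context).
    Columns depend only on the input; the active cell encodes the context.
    The ord()-based scheme is a stand-in for real SDRs and learned segments."""
    base = sum(map(ord, symbol)) % 10
    columns = range(base, base + 5)
    cell = sum(map(ord, prev)) % CELLS_PER_COLUMN if prev else 0
    return {(col, cell) for col in columns}

rep_ab = tm_code("A", "B")   # "B" in the context of "A"
rep_xb = tm_code("X", "B")   # "B" in the context of "X"

# Same input, so the same columns are active...
same_columns = {c for c, _ in rep_ab} == {c for c, _ in rep_xb}
# ...but zero cell-level overlap: a naive downstream readout sees two
# unrelated codes for the one concept "B".
overlap = len(rep_ab & rep_xb)
print(same_columns, overlap)   # True 0
```

This is exactly what temporal pooling would need to fix: re-unify the context-split cell codes into one stable representation of "B" that a downstream layer can map to actions.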
For an agent to learn quickly, it needs to have representations available that expose the true latent causes of events in the world that are useful for mapping directly to actions.
That’s a helpful summary of the problem, thanks Jake. Would you mind helping me explore an example?
Let’s say you’re standing next to someone who is holding an expensive vase. Without warning, you notice that they are fumbling it: they are losing their grip, it is going to fall, and it is probably within your reach. Your instinct is to reach out and grab it, but set aside the specific action for now (a professional footballer might instinctively stick out a foot instead!)
Do you think that this reaction necessarily involves thinking about gravity and the fragility of the object, and projecting forward in time to imagine the breakage? Or could there be a more direct shortcut from the fumbling action to your reaction, built from past experiences of dropped objects?
As long as you are conscious, you have a representation in your brain of the objects immediately surrounding you. In this case, a person and a vase. Subconsciously, you know that vases are expensive and fragile. You also know that humans are unpredictable, which leads the subconscious to prepare for a possible future where the vase is falling. I think representations of objects, actions, behaviors, etc. are all stored in the same fashion in the brain, and can be linked together by associations. All objects have behaviors, or at least behave in typical ways when they are interacted with. For example, vases don’t do much of anything except hold flowers and break.
No, because gravity is a constant in everything you’ve ever done. You’ve learned every object and all object behaviors with gravity applied. There is no reason to ever think about gravity (unless you’re a pilot or astronaut).
A fumble is a tangible concept. We both know what it is. The mechanics of the fumble are unimportant and contextual. When people fumble something, it falls toward the ground. Yes, I do believe this idea is built over time by witnessing hundreds, possibly thousands, of fumbles (and of course fumbling things yourself). And I also think there are many responsive behaviors that can be deployed in response to any number of fumbles, but that behavior is always custom-crafted for the specific situation.
The direct shortcut is that, through having learned to predict the details of fumbles throughout your life, you have a very reliable representation of the general category of “fumble events”. This allows you to generalize from the abstract representation of a fumble event to an abstract general motor response, without having to think of the specific details of the fumble in particular. And of course the low level motor controllers can figure out the exact motor command to send to accomplish the specific implementation of the general “catch the object” response.
These abstract “fumble event” and “catch the object” representations are formed originally, I contend, because they are helpful in predicting what will happen across large variations in fumbles and catches.
Somewhat relevant to this conversation, especially about representation:
We talked a lot about how things are represented in the brain, especially about how objects have behaviors and associated motor commands tied very closely to those object representations. This is my favorite interview so far. I think this stuff is really interesting, and I encourage anyone trying to understand representation in the brain to have a watch.
It seems you are approaching motor planning from a human macro-planning perspective. A movement is a practiced action that can be emulated. You have watched a fumble and learned to emulate the fumbling movement.
Nature has no such bias. In your brain, in the multiple interconnected maps, each token of information is the product of multiple levels of automatic parsing. Each sensory stream gets this treatment. Each planned movement is formed with layers of interacting maps - from the desired action to motor planning and interaction with your visual sense and vestibular system. In each of these areas, the experience of life builds models of the world and its physics.
An interpretation of the world passes through the sensory stream, and a “what & where” stream of interpretation tears it apart. The nonverbal half of your brain is a wizard at geometry and parses action as well as you parse a stream of words. The pairing of elements, actors and objects, value, and the relationships between objects is a level we seldom see mapped in a human-written program.
I will leave it to you to describe how the tokens of people, objects, and motions moved you to swing a cello case you are carrying under the falling iPhone: where is that represented in the brain, and what drives actions from that internal model of the perceived world?
I think it is unlikely that you ever saw anyone catch something with a cello case before, but at that moment it was just an object under the influence of your agency. So - if you see something of value falling, you act to interpose something to break the fall. It may be your hand or some part of an object you are carrying. Note that this usually happens so rapidly that there cannot be many stages of activity - it’s hardwired to run in parallel.
If I had to describe the basic action plan, it would be (starting with the situation already in play): high-level action - catch/rescue (falling/danger + child/object), with “falling” colored bad and “object” colored good. World parsing has already determined the locations of the actors and their relative velocities. A brief constraint-based access of the physics of the parts shows that this extension-of-body object can be manipulated under the child/object by manipulating the body system. The motor plan is entered into the motion parsing system as soon as it is accessed/considered, and the basic SAVE instinct lowers the action-potential gate in the basal ganglia - you start to move almost before you even realize it is happening.
The creation of an effective AI will involve contemplating system design around these interactions and their effect on the final product. The predictive parts of HTM go a long way toward showing how a chunk of neural tissue decides which of its bits and pieces to move in order to stay alive. HTM still needs some tuning, but it matches a laundry list of features that MUST be present to match up with the known properties of the brain. It still needs to work out the interface between the oscillating systems of the subcortical structures and the wave-based processing of the cortex.
Thanks everyone for the replies to my scenario, they’ve been very useful and interesting.
Also, the mirror neurons topic from the Jonathan Michael’s interview was quite relevant; it helps to show how observing fumble reactions and performing fumble reactions could contribute to the same action representations.
One thing though:
Aren’t these two quite different explanations? What would be the benefit of the subconscious preparing for that specific fumble if the abstract fumble -> response mapping sufficed?
Is @rhyolight’s explanation an optional optimisation, one that happens only because there was enough time to process the scene?