It seems you are approaching motor planning from a human macro-planning perspective, where a movement is a practiced action that can be emulated: you have watched a fumble and learned to make the fumble move.
Nature has no such bias. In your brain, across its multiple interconnected maps, each token of information is the product of multiple levels of automatic parsing. Each sensory stream gets this treatment. Each planned movement is formed by layers of interacting maps - from the desired action, to motor planning, to interaction with your sense of vision and your vestibular system. In each of these areas, the experience of life builds models of the world and its physics.
An interpretation of the world passes through the sensory stream, and a “what & where” stream of interpretation tears it apart. The nonverbal half of your brain is a wizard at geometry and parses action as well as you parse a stream of words. The pairing of elements (actors and objects, their value, and the relationships between them) is a level we seldom see mapped in a human-written program.
I will leave it to you to describe how the tokens of people, objects, and motions moved you to swing the cello case you were carrying under the falling iPhone: where that is represented in the brain, and what drives actions from that internal model of the perceived world.
I think it is unlikely that you ever saw anyone catch something with a cello case before, but at that moment it was just an object under the influence of your agency. So - if you see something of value falling, you act to interpose something to break the fall. It may be your hand or some part of an object you are carrying. Note that this usually happens so rapidly that there cannot be many stages of activity - it’s hardwired to run in parallel.
If I had to describe the basic action plan, it would be (starting with the situation already in play): high-level action - catch/rescue (falling/danger + child/object), with falling “colored” bad and the object “colored” good. World parsing has already determined the locations of the actors and their relative velocity. A brief constraint-based access of the physics of the parts shows that this extension-of-body object can be maneuvered under the child/object by manipulating the body system. The motor plan was already entered into the motion parsing system as soon as it was accessed/considered, and the basic SAVE instinct lowers the action potential gate in the basal ganglia - you start to move almost before you even realize it is happening.
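To make the shape of that plan concrete, here is a loose toy sketch in Python. It is only an illustration of the steps described above, not a model of real neural circuitry and not anything taken from HTM. Every name in it (`Percept`, `Effector`, `plan_catch`, the `valence` tag, the `base_gate` threshold, and the urgency formula) is a hypothetical stand-in I made up for the example, and the "physics" is just free fall from rest.

```python
from dataclasses import dataclass

G = 9.81  # gravitational acceleration, m/s^2


@dataclass
class Percept:
    """One parsed item from the scene (hypothetical structure)."""
    name: str
    valence: float   # > 0 means "colored" good/valued, < 0 means bad
    height_m: float  # height above the floor
    falling: bool


@dataclass
class Effector:
    """A hand, or an object currently acting as an extension of the body."""
    name: str
    reach_time_s: float  # assumed time needed to get it under the target


def time_to_impact(p: Percept) -> float:
    """Free-fall time from rest: t = sqrt(2h / g)."""
    return (2.0 * p.height_m / G) ** 0.5


def plan_catch(target: Percept, effectors: list, base_gate: float = 1.0):
    """Return (effector name, lowered gate threshold), or None if no action.

    The gate value is a made-up stand-in for the idea that a strong SAVE
    drive lowers the threshold for releasing the movement.
    """
    # Only valued, falling things trigger the catch/rescue action.
    if not (target.falling and target.valence > 0):
        return None
    deadline = time_to_impact(target)
    # Constraint check: which effectors can get there before impact?
    feasible = [e for e in effectors if e.reach_time_s < deadline]
    if not feasible:
        return None
    best = min(feasible, key=lambda e: e.reach_time_s)
    # More value and less time both raise urgency, which lowers the gate.
    urgency = target.valence / deadline
    return best.name, base_gate / (1.0 + urgency)


if __name__ == "__main__":
    phone = Percept("iPhone", valence=1.0, height_m=1.2, falling=True)
    options = [Effector("hand", 0.6), Effector("cello_case", 0.3)]
    print(plan_catch(phone, options))
```

In this toy version the cello case wins simply because it is the only effector that can arrive before impact. The brain presumably evaluates all of this in parallel rather than in a loop, which is exactly the part a sequential program like this one does not capture.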
The creation of an effective AI will involve contemplating system design around these interactions and their effect on the final product. The predictive parts of HTM go a long way toward showing how a chunk of neural tissue decides which of its bits and pieces to move to stay alive. HTM needs some tuning, but it matches a laundry list of features that MUST be present to match up with the known properties of the brain. It still needs to work out the interface between the oscillating systems of the subcortical structures and the wave-based processing of the cortex.