Learning to Act by Predicting the Future

We present an approach to sensorimotor control in immersive environments. Our approach utilizes a high-dimensional sensory stream and a lower-dimensional measurement stream. The cotemporal structure of these streams provides a rich supervisory signal, which enables training a sensorimotor control model by interacting with the environment. The model is trained using supervised learning techniques, but without extraneous supervision. It learns to act based on raw sensory input from a complex three-dimensional environment. The presented formulation enables learning without a fixed goal at training time, and pursuing dynamically changing goals at test time. We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on challenging tasks. The results also show that trained models successfully generalize across environments and goals. A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments.



If you have a system that performs dimensionality reduction from a sensor stream to a measurement (controller input) stream, and the system has short-term memory, it will probably learn to move around so that it gets better overall information about its environment into that memory.
Okay, I’ll go read the paper later when I have time.

Okay, I see. The measurements are not derived from the pixel data. However, they are intermeshed with the pixel data after it has undergone smart dimensionality reduction by a deep network. I’ll think about it some more. I guess there are degrees of dimensionality reduction you can apply to pixel data, from passive Fourier transforms or random projections, to single-layer networks, all the way to deep convolutional networks.
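As a rough illustration of the simplest end of that spectrum, here is a minimal sketch of a random projection in plain Python. The function name `random_projection`, the Gaussian matrix, and the `1/sqrt(out_dim)` scaling are assumptions chosen for illustration; they are not taken from the paper.

```python
import random


def random_projection(pixels, out_dim, seed=0):
    """Project a flat list of pixel values down to out_dim numbers.

    Uses a fixed (untrained) Gaussian random matrix, regenerated from the
    seed on every call, so the same seed always gives the same projection.
    """
    rng = random.Random(seed)
    scale = 1.0 / out_dim ** 0.5  # common scaling choice (an assumption here)
    projected = []
    for _ in range(out_dim):
        # One row of the random matrix dotted with the pixel vector.
        row_sum = 0.0
        for p in pixels:
            row_sum += rng.gauss(0.0, 1.0) * p
        projected.append(scale * row_sum)
    return projected


if __name__ == "__main__":
    frame = [0.1 * i for i in range(64)]  # stand-in for a flattened 8x8 image
    print(random_projection(frame, out_dim=4))
```

Nothing is learned here, which is why it sits at the "passive" end: a single-layer or deep network would replace the fixed random matrix with trained weights.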


If you say the brain is mostly composed of predictive memory, then the paper definitely fits with what Numenta is doing. What do you need? Predictive memory, some goal functions that tell the animal whether it’s doing well or not, and some way of searching through the action space to find the best thing to do. As an alternative to deep networks or Numenta predictive memory, I think you might get away with using high-density single-layer networks for prediction. I presume the OP is with Intel, in which case they should get on with creating the 100 Peta ops/sec chips I suggested to them:

I was experimenting today with both a binarized activation function and binarized weights for single-layer nets, and it worked out fine:

There is this related paper, but it is about deep networks:

Try ternary signals (-1, 0, 1). They can still be considered binary if you use a differential representation, and they don’t require any multiplications. The issue is the range of the quantization operation: it most likely needs to be scaled on a per-layer basis. Also, in my experience, activations are more sensitive to quantization than weights. Finally, backprop through a hard threshold is tricky, so try different gradient estimators. If you’re interested in this research, there have been a few hundred papers [1] published on this topic since that breakthrough Courbariaux paper you linked to.
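To make the ternary idea concrete, here is a minimal sketch in plain Python. The threshold rule (`threshold_ratio` times the mean magnitude) and the single per-layer scale are assumptions picked for illustration, not a specific published recipe, and the function names are hypothetical.

```python
def ternarize(values, threshold_ratio=0.7):
    """Map real values to codes in {-1, 0, 1} plus one per-layer scale.

    threshold_ratio is a hypothetical tuning knob: values whose magnitude
    is at most threshold_ratio * mean(|values|) are set to 0.
    """
    if not values:
        return [], 0.0
    mean_abs = sum(abs(v) for v in values) / len(values)
    threshold = threshold_ratio * mean_abs
    codes = [0 if abs(v) <= threshold else (1 if v > 0 else -1) for v in values]
    # Per-layer scale: average magnitude of the values that were kept.
    kept = [abs(v) for v, c in zip(values, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def ternary_dot(codes, inputs):
    """Dot product with ternary codes using only additions and subtractions."""
    total = 0.0
    for c, x in zip(codes, inputs):
        if c == 1:
            total += x
        elif c == -1:
            total -= x
    return total


if __name__ == "__main__":
    weights = [0.9, -0.05, -1.1, 0.02]
    codes, scale = ternarize(weights)
    # One multiplication per layer output (by scale) instead of one per weight.
    print(codes, scale * ternary_dot(codes, [1.0, 2.0, 3.0, 4.0]))
```

The sketch covers only the forward pass. Training through the hard threshold would still need a gradient estimator of some kind, which is left out here.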

[1] http://www.arxiv-sanity.com/search?q=quantization

I have a quibble with this. Pyramidal cells are excitatory only: zero and one. Nature deals with this by providing cells that signal the zeroes at the encoder stage.

This may seem like a minor difference, but we are trying to stay biologically plausible if at all possible with HTM systems.

This is just about converting real values to (-1, 0, 1) instead of to (-1, 1). So you simply add a third state, 0 (“don’t change connection strength”).

Could you use your approach without training on a goal? In other words, could the goal that you train on actually be “create the best predictions of the future”? It seems you should be able to train the agent agnostic to any state of the environment; then, once it understands the environment, you provide a state-goal and it would be able to put the environment into that state. From how I understand your paper (and I don’t understand it well), the prediction at the end is only used to select an action, always in relation to some other goal. But I wonder if that goal could be self-referential, that is, to increase the variety and confidence of correct predictions?
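For readers trying to picture what "select an action in relation to a goal" could look like, here is a minimal sketch. It assumes the predictor outputs, for each candidate action, a vector of predicted future measurement changes, and that the goal is a weight vector over those measurements. The action names, the two measurements, and the function `choose_action` are hypothetical and only for illustration.

```python
def choose_action(predicted_changes, goal):
    """Return the action whose predicted measurement changes best match the goal.

    predicted_changes: dict mapping action name -> list of predicted changes,
                       one entry per measurement (hypothetical predictor output).
    goal:              list of weights, one per measurement.
    """
    def score(action):
        # Goal-weighted sum of the predicted changes for this action.
        return sum(g * p for g, p in zip(goal, predicted_changes[action]))

    return max(predicted_changes, key=score)


if __name__ == "__main__":
    # Two illustrative measurements, e.g. [health, ammo].
    predictions = {
        "forward": [5.0, -1.0],
        "shoot": [-2.0, 3.0],
    }
    print(choose_action(predictions, goal=[1.0, 0.0]))  # goal favors measurement 0
    print(choose_action(predictions, goal=[0.0, 1.0]))  # goal favors measurement 1
```

Because the goal enters only at selection time in this sketch, the same predictions can serve different goals without retraining. That is the sense in which the goal could change at test time.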

This is not my paper (I just post interesting stuff when I think it’s relevant to what Numenta is doing).

A “predict the future” goal is useful for building a world model, and a neural network should probably be trained for it before pursuing higher-level goals (e.g. planning). Predicting the future falls under the “self-supervised” learning category. Here’s LeCun explaining some of the challenges. If you’re interested in this field, check out these resources.