Learning to Act by Predicting the Future


We present an approach to sensorimotor control in immersive environments. Our approach utilizes a high-dimensional sensory stream and a lower-dimensional measurement stream. The cotemporal structure of these streams provides a rich supervisory signal, which enables training a sensorimotor control model by interacting with the environment. The model is trained using supervised learning techniques, but without extraneous supervision. It learns to act based on raw sensory input from a complex three-dimensional environment. The presented formulation enables learning without a fixed goal at training time, and pursuing dynamically changing goals at test time. We conduct extensive experiments in three-dimensional simulations based on the classical first-person game Doom. The results demonstrate that the presented approach outperforms sophisticated prior formulations, particularly on challenging tasks. The results also show that trained models successfully generalize across environments and goals. A model trained using the presented approach won the Full Deathmatch track of the Visual Doom AI Competition, which was held in previously unseen environments.



If a system performs dimensionality reduction from a sensor stream to a measurement (controller input) stream, and it also has short-term memory, it will probably learn to move around so as to get better overall information about its environment into that memory.
Okay, I’ll go read the paper later when I have time.


Okay, I see. The measurements are not derived from the pixel data. However, they are intermeshed with the pixel data after it has undergone smart dimensionality reduction with a deep network. I’ll think about it some more. I guess there are degrees of dimensionality reduction you can do on pixel data, from passive Fourier transforms or random projections, to single-layer networks, all the way to deep convolutional networks.
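To make the "passive" end of that spectrum concrete, here is a minimal sketch of a random projection: a fixed, untrained matrix that maps a pixel vector to a much shorter feature vector. The function names, the toy 64-pixel image, and the choice of 8 output features are all assumptions made for illustration; nothing here comes from the paper.

```python
import random


def random_projection_matrix(in_dim, out_dim, seed=0):
    """Build a fixed out_dim x in_dim matrix of scaled Gaussian entries."""
    rng = random.Random(seed)  # seeded so the projection is reproducible
    scale = 1.0 / out_dim ** 0.5
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(in_dim)]
            for _ in range(out_dim)]


def project(matrix, pixels):
    """Reduce a pixel vector to len(matrix) features; no training involved."""
    return [sum(w * p for w, p in zip(row, pixels)) for row in matrix]


# Toy "image": 64 pixel intensities in [0, 1], reduced to 8 features.
pixels = [(i % 8) / 7.0 for i in range(64)]
matrix = random_projection_matrix(in_dim=64, out_dim=8)
features = project(matrix, pixels)
```

A single-layer or deep convolutional network would replace the fixed matrix with learned weights, which is the other end of the range described above.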



If you say the brain is mostly composed of predictive memory, then the paper definitely fits with what Numenta is doing. What do you need? Predictive memory, some goal functions that tell the animal whether it’s doing well or not, and some way of searching through the action space to find the best thing to do. As an alternative to deep networks or Numenta predictive memory, I think you might get away with using high-density single-layer networks for prediction. I presume the OP is with Intel, in which case they should get on with creating the 100 peta-ops/sec chips I suggested to them:


I was experimenting today with both a binarized activation function and binarized weights for single layer nets and it worked out fine:

There is this related paper but it is about deep networks:


Try ternary signals (-1, 0, 1): they could still be considered binary if you use a differential representation, and they don’t require any multiplications. The issue is the range of the quantization operation. It most likely needs to be scaled on a per-layer basis. Also, in my experience, activations are more sensitive to quantization than weights. Finally, backprop through a hard threshold is tricky, so try different estimators. If you’re interested in this research, there have been a few hundred papers [1] published on this topic since that breakthrough Courbariaux paper you linked to.

[1] http://www.arxiv-sanity.com/search?q=quantization
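As a rough illustration of the per-layer scaling and multiplication-free points above, the sketch below ternarizes one layer's weights and then computes a dot product using only additions and subtractions, followed by a single per-layer scaling at the end. The threshold rule (a fraction of the mean absolute value) and the `threshold_factor=0.7` default are assumptions chosen for the example, not something stated in this thread, and the function names are hypothetical.

```python
def ternarize(values, threshold_factor=0.7):
    """Map real values to (-1, 0, 1) and return one scale for the whole layer."""
    mean_abs = sum(abs(v) for v in values) / len(values)
    threshold = threshold_factor * mean_abs  # assumed heuristic, tune per layer
    codes = [0 if abs(v) <= threshold else (1 if v > 0 else -1)
             for v in values]
    # The scale is the mean magnitude of the values that were not zeroed out.
    kept = [abs(v) for v, c in zip(values, codes) if c != 0]
    scale = sum(kept) / len(kept) if kept else 0.0
    return codes, scale


def ternary_dot(codes, inputs, scale):
    """Accumulate with adds and subtracts only, then apply the layer scale once."""
    acc = 0.0
    for c, x in zip(codes, inputs):
        if c == 1:
            acc += x
        elif c == -1:
            acc -= x
        # c == 0 contributes nothing
    return scale * acc


weights = [0.9, -0.05, -1.1, 0.02, 0.6]
codes, scale = ternarize(weights)
output = ternary_dot(codes, [1.0] * 5, scale)
```

This covers only the forward pass. Training through the hard threshold still needs a gradient estimator of some kind, which is the tricky part mentioned above.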


I have a quibble with this. Pyramidal cells are excitatory only: zero and one. Nature deals with this by providing a cell that signals zeroes in the encoder stage.

This may seem like a minor difference, but we are trying to stay biologically plausible if at all possible with HTM systems.


This is just about converting real values to (-1, 0, 1) instead of to (-1, 1). So you simply add a third state, 0 (“don’t change connection strength”).
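One possible reading of the "differential representation" mentioned earlier in the thread is that each ternary value is carried on a pair of 0/1 lines, so that no single line ever has to signal a negative value. The sketch below is only that reading, with hypothetical function names. It is not a claim about how HTM encoders or biological cells actually work.

```python
def to_differential(codes):
    """Encode each ternary value as a (plus, minus) pair of 0/1 lines."""
    pairs = []
    for c in codes:
        if c not in (-1, 0, 1):
            raise ValueError("expected a ternary value, got %r" % (c,))
        # +1 -> (1, 0), -1 -> (0, 1), 0 -> (0, 0): both lines stay silent.
        pairs.append((1 if c == 1 else 0, 1 if c == -1 else 0))
    return pairs


def from_differential(pairs):
    """Recover the ternary values as plus minus minus."""
    return [plus - minus for plus, minus in pairs]


codes = [1, 0, -1, 0, 1]
pairs = to_differential(codes)
```

Under this encoding, the extra zero state is simply the case where neither line is active.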