Viviane Clay on Sparse and Meaningful Representations Through Embodiment - October 20, 2020

We invited guest speaker Viviane Clay from the University of Osnabrück to talk about her research on learning sparse and meaningful representations through embodiment. In the first part, she explores how these kinds of representations of the world are learned in an embodied setting by training a deep reinforcement learning agent on a 3D navigation task with RGB images as the main sensory input. She then discusses how the model learns a sparse encoding of high-dimensional visual inputs without sparsity being explicitly enforced, and what the possible hypotheses for this phenomenon are.
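The claim that sparsity emerges without an explicit penalty can be checked empirically by measuring how many hidden units are (near) silent on a batch of observations. The sketch below is a hypothetical illustration, not her actual analysis code; the function name, threshold, and toy data are assumptions.

```python
import numpy as np

def activation_sparsity(activations, threshold=1e-3):
    """Fraction of activations whose magnitude falls below `threshold`.

    activations: array of shape (n_samples, n_units), e.g. the agent's
    hidden-layer outputs evaluated on a batch of RGB observations.
    """
    return float(np.mean(np.abs(activations) < threshold))

# Toy example: zero out most entries of a random activation matrix,
# then confirm the measure reports a high sparsity level.
rng = np.random.default_rng(0)
acts = rng.standard_normal((100, 256))
acts[np.abs(acts) < 1.5] = 0.0  # silence most units
print(activation_sparsity(acts))
```

Tracking this number over the course of training would show whether the representation becomes sparser as the agent learns, with no sparsity term in the loss.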

In the second part, she covers her ongoing work on extracting concepts by identifying a minimal set of co-occurring activations that represents an object in a curiosity-driven learning setting. These concepts can be used to improve sample efficiency and performance on downstream tasks, such as object classification or the full reinforcement learning task.
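One simple way to operationalize "a set of co-occurring activations that represents an object" is to binarize the hidden activations and keep the units that are active in nearly all frames containing the object and rarely otherwise. This is a hypothetical sketch of that idea, not her actual method; the function name, rates, and toy data are assumptions.

```python
import numpy as np

def concept_units(activations, object_mask, on_rate=0.9, off_rate=0.1, threshold=0.0):
    """Indices of units that co-occur with an object's presence.

    activations: (n_frames, n_units) hidden activations.
    object_mask: boolean (n_frames,) marking frames where the object appears.
    A unit joins the concept if it is active (above `threshold`) in at least
    `on_rate` of object frames and at most `off_rate` of the other frames.
    """
    active = activations > threshold
    p_on = active[object_mask].mean(axis=0)
    p_off = active[~object_mask].mean(axis=0)
    return np.flatnonzero((p_on >= on_rate) & (p_off <= off_rate))

# Toy example: units 3 and 7 fire only on the three object frames,
# so they form the extracted concept.
acts = np.zeros((6, 10))
mask = np.array([True, True, True, False, False, False])
acts[:3, [3, 7]] = 1.0
print(concept_units(acts, mask))  # → [3 7]
```

The resulting index set could then serve as a compact feature for a downstream classifier, which is one way such concepts could improve sample efficiency.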


One thing I learned from this session is that the “third person” view is just as relevant as the “through the eyes” view. When you think about it, your eyes are “way up here” relative to your feet and hands, and yet you learn to do everything you do with that frame of reference. As long as the view is stable, has some fixed relationship to the agent, and provides feedback on your manipulation activities, it should be useful for navigation and learning.

We humans have a tight coupling between our vestibular sensors and our visual sensors. This parks our sense of self somewhere behind our eyes.

I wonder if this is also true for the visually impaired?

The head sits at the top end of the kinematic chain whose other end is “out there,” as far as our posture-control system is concerned. I would expect the head to be the seat of the sense of self for all of us.