Cortical Column Networks: Learning object identity and pose representations from pixel observations

Drawing inspiration from the Thousand Brains Theory of Intelligence, guest speakers Tim Verbelen and Toon Van de Maele from Ghent University share their recent work on learning object identity and pose representations from pixel observations.

➤ Paper: [2108.11762] Disentangling What and Where for 3D Object-Centric Representations Through Active Inference
➤ Blog Post: Cortical Column Networks - The Smart Robot
➤ For more information on The Smart Robot: https://thesmartrobot.github.io/

Abstract
Although modern object detection and classification models achieve high accuracy, they are typically trained in advance on a fixed dataset and are therefore not flexible enough to deal with novel, unseen object categories. Moreover, these models most often operate on a single frame, which may yield incorrect classifications in the case of ambiguous viewpoints. In this paper, we propose an active inference agent that actively gathers evidence for object classifications and can learn novel object categories over time. Drawing inspiration from the Thousand Brains Theory of Intelligence, we build object-centric generative models composed of two information streams: a what-stream and a where-stream. The what-stream predicts whether the observed object belongs to a specific category, while the where-stream is responsible for representing the object in its internal 3D reference frame. In this talk, we will present our models and some initial results, both in simulation and on a real-world robot.
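To make the two-stream idea concrete, here is a minimal sketch of an encoder that maps a pixel observation to a what-code (a belief over object categories) and a where-code (a pose estimate in the object's own reference frame). All names, dimensions, and the simple linear layers are illustrative assumptions for this post, not the authors' implementation (the paper uses learned generative models).

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative sizes (assumptions, not from the paper).
PIXELS = 64 * 64      # flattened pixel observation
N_CLASSES = 5         # known object categories
POSE_DIM = 6          # e.g. 3D translation + 3 rotation parameters

# Stand-in linear weights for the two streams.
W_what = rng.normal(scale=0.01, size=(N_CLASSES, PIXELS))
W_where = rng.normal(scale=0.01, size=(POSE_DIM, PIXELS))

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def encode(observation):
    """Split an observation into an identity belief and a pose estimate."""
    what = softmax(W_what @ observation)   # belief over object categories
    where = W_where @ observation          # pose in the object's 3D frame
    return what, where

obs = rng.normal(size=PIXELS)
what, where = encode(obs)
print(what.shape, where.shape)
```

In the paper's setting, ambiguous single-frame observations leave the what-belief spread over several categories; the active inference agent then selects new viewpoints that are expected to sharpen this belief.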