https://www.biorxiv.org/content/10.1101/2022.01.20.477125v1.full.pdf
"Hinton and colleagues [19, 10, 6] have explored a class of networks called Capsule networks which use a group of neurons (“capsule”) to explicitly represent not only the presence of an object but also parameters such as position and orientation. More recently, Hinton [5] has proposed an “imaginary system” called GLOM to overcome some of the limitations of capsule networks. Independently, Hawkins and colleagues [15] have taken inspiration from neuroscience, specifically cortical columns and grid cells, to propose that the brain uses object-centered reference frames to represent objects, spatial environments and even abstract concepts.
What has been missing is a scalable framework that solves the following problem: how can neural networks learn intrinsic reference frames for objects and parse visual scenes into part-whole hierarchies by dynamically allocating nodes in a parse tree?
Here we introduce Active Predictive Coding Networks (APCNs), a class of structured neural networks inspired by the neocortex that address this problem using hypernetworks [4] to learn and dynamically generate parse trees from images.
At each level of the hierarchy, the APCN model uses two embedding vectors: one to represent the current “state” denoting an object/part, and the other to represent the current “action” denoting the position (or more generally, the transformation) of the object/part. Nonlinear functions (implemented as hypernetworks [4]) map these vectors to lower-level state transition and action functions, which act as “programs” that parse parts/sub-parts via sequences of sampled locations/transformations. This process can be repeated for an arbitrary number of levels."
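The hypernetwork mechanism described above can be sketched minimally as follows. This is an illustrative toy, not the paper's implementation: the dimensions, the linear hypernetwork, and all function names are assumptions. The key idea it demonstrates is that a higher-level embedding vector is mapped to the *parameters* of a lower-level function, so different higher-level states generate different lower-level "programs".

```python
import math
import random

STATE_DIM = 2   # higher-level embedding size (assumed for illustration)
LOWER_DIM = 3   # lower-level input/output size (assumed)
N_PARAMS = LOWER_DIM * LOWER_DIM + LOWER_DIM  # weights + bias of lower-level map

# Fixed (here random, normally learned) hypernetwork weights: they map a
# higher-level state vector to the N_PARAMS parameters of a lower-level
# linear-tanh transition function.
random.seed(0)
W_HYPER = [[random.uniform(-1, 1) for _ in range(STATE_DIM)]
           for _ in range(N_PARAMS)]

def generate_lower_fn(state):
    """Hypernetwork: emit a lower-level transition function for `state`."""
    # Linear hypernetwork: params = W_HYPER @ state
    params = [sum(s * w for s, w in zip(state, row)) for row in W_HYPER]
    W = [params[i * LOWER_DIM:(i + 1) * LOWER_DIM] for i in range(LOWER_DIM)]
    b = params[LOWER_DIM * LOWER_DIM:]

    def lower_fn(x):
        # Generated lower-level dynamics: tanh(W x + b)
        return [math.tanh(sum(wij * xj for wij, xj in zip(wi, x)) + bi)
                for wi, bi in zip(W, b)]
    return lower_fn

# Two different higher-level states yield two different lower-level
# "programs" acting on the same lower-level input.
f_a = generate_lower_fn([1.0, 0.0])
f_b = generate_lower_fn([0.0, 1.0])
x = [0.5, -0.2, 0.1]
print(f_a(x), f_b(x))  # the generated dynamics differ
```

In the full model this construction is stacked: the outputs of one level's generated functions become the states and actions that a further hypernetwork maps into the next level's transition functions, which is what allows the parse tree to be built dynamically rather than with a fixed architecture.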