Hi y’all! Wanted to share a recent proposal for building object-centric representations in ANNs from Geoff Hinton that has caused a fair amount of discussion in certain ML circles. I bring it up because, at least at the surface level, it seems to converge on pretty similar ideas to the Thousand Brains theory.
“How to represent part-whole hierarchies in a neural network”
Hinton also gave a recent ACM IKDD talk with a nice walkthrough of the ideas.
Some key similarities of note:
Many structurally similar columns, arranged topographically
Each column transforms inputs according to its own learned coordinate frames, enabled by positional codes and “neural fields”
Projections between adjacent layers within a column, making top-down and bottom-up predictions
Local voting (in the form of “attention”) to build agreement between columns
Builds multiple maps of the same object, with a composition of part-whole maps even within the levels of a single column
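To make the "local voting" point a bit more concrete, here is a minimal sketch of attention-weighted consensus between columns at a single level. It is my own toy illustration, not Hinton's implementation: the function name `attention_consensus`, the sharpness parameter `beta`, and the example vectors are all assumptions, and it leaves out the bottom-up and top-down contributions that GLOM also averages in.

```python
import math

def attention_consensus(embeddings, beta=1.0, steps=1):
    """Toy consensus step (hypothetical helper, not from the paper).

    Each column's vector is replaced by a softmax-weighted average of
    all columns' vectors at the same level, with weights derived from
    dot-product similarity. Similar columns pull each other together.
    """
    vecs = [list(v) for v in embeddings]
    for _ in range(steps):
        new_vecs = []
        for xi in vecs:
            # Similarity of this column to every column (including itself).
            scores = [beta * sum(a * b for a, b in zip(xi, xj)) for xj in vecs]
            top = max(scores)  # subtract the max for numerical stability
            exps = [math.exp(s - top) for s in scores]
            total = sum(exps)
            weights = [e / total for e in exps]
            # Weighted average of all column vectors, one dimension at a time.
            new_vecs.append([
                sum(w * xj[d] for w, xj in zip(weights, vecs))
                for d in range(len(xi))
            ])
        vecs = new_vecs
    return vecs

def distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Four columns: two roughly agree on one interpretation, two on another.
columns = [[1.0, 0.1], [0.9, 0.0], [0.0, 1.0], [0.1, 0.9]]
settled = attention_consensus(columns, beta=5.0, steps=3)
print(distance(settled[0], settled[1]))  # within a group: small
print(distance(settled[0], settled[2]))  # across groups: still large
```

With a reasonably sharp `beta`, columns that start out similar collapse onto nearly identical vectors while dissimilar groups stay apart for these few steps, which is roughly the "islands of agreement" picture from the paper.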
Would love to hear what folks here make of this! Note that the author is not aiming for biological plausibility per se, but nevertheless converges on what I think are familiar concepts.
I think it’s an exciting direction and a logical next step from capsule networks. One major point of difference from Numenta models is that Hinton is explicitly trying to model a parse tree.
Digression:
Personally I love the notion of a parse tree for visual perception, because it allows us to apply the Chomsky hierarchy to biological systems. As Dileep George has pointed out, both CNNs and frogs are basic pattern-matching systems that fall prey to “adversarial examples”. Why? I hypothesize it’s because they are effectively finite-state automata attempting to recognize data that belongs to a higher rung of the Chomsky hierarchy. Symmetric, recursive, compositional structures with unbounded nesting cannot be expressed by regular languages or recognized by finite-state automata.
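A toy way to see the formal-language point (only the textbook fact about nesting, not the hypothesis about CNNs or frogs): balanced parentheses are context-free but not regular. The sketch below compares a recognizer with an unbounded counter against one restricted to a fixed number of states. Both function names and the `max_depth` parameter are made up for illustration.

```python
def balanced_with_counter(s):
    """Recognize balanced parentheses using an unbounded depth counter."""
    depth = 0
    for ch in s:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:  # a close with no matching open
                return False
        else:
            return False  # only parentheses are in the alphabet
    return depth == 0

def balanced_finite_state(s, max_depth):
    """Same task, but with only the states 0..max_depth (plus reject).

    This is a finite-state automaton, so it can only track nesting up
    to max_depth and must reject anything deeper, even if it is valid.
    """
    state = 0
    for ch in s:
        if ch == "(":
            if state == max_depth:
                return False  # ran out of states
            state += 1
        elif ch == ")":
            if state == 0:
                return False
            state -= 1
        else:
            return False
    return state == 0

shallow = "(())"
deep = "(" * 5 + ")" * 5
print(balanced_with_counter(shallow), balanced_finite_state(shallow, 3))
print(balanced_with_counter(deep), balanced_finite_state(deep, 3))
```

However large you make `max_depth`, there is always a valid string nested one level deeper that the finite-state version gets wrong, which is the sense in which recursive structure sits on a higher rung of the hierarchy.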
Anyway, GLOM has a dynamical systems flavor, as is clear from Hinton’s reference to 2D Ising models. This takes him a step closer to Friston et al. There is also overlap with recent work that emphasizes the power and generality of local interactions.