I’m right there with you, Marcus. Over the past couple of weeks I’ve come to think along almost exactly the same lines as what you presented in the meeting. I’m just surprised that Jeff couldn’t see it as well. Or maybe he was starting to get there toward the end of the meeting, when he mentioned voting between the columns.
The concept of a cup is a very high-level construction. At the lowest level there are simply persistent features that have observed relationships to other features, and potentially predictable behaviors with respect to one another and to the actions and movements of the observer. It is these low-level features, and their orientations with respect to a specific sensor, that each column needs to represent. Each column then builds up a representation of the expected behavior of that feature regardless of what object it is attached to. We have a lifetime of experience observing these features (in all of our sensory modalities). Objects with persistent properties and predictable behaviors can then be inferred, and eventually recognized, from the combination of features observed over time or observed simultaneously by multiple sensors.
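To make that concrete, here’s a toy sketch of my own (not anything from the meeting, and the object and feature names are purely illustrative): a column that has associated sets of low-level features with learned objects can recognize an object by narrowing the candidates as features are observed over time.

```python
# Toy model: each learned object is just the set of low-level features
# a column has come to associate with it. All names here are made up.
learned_objects = {
    "mug":   {"curved_surface", "rim", "handle"},
    "bowl":  {"curved_surface", "rim"},
    "plate": {"flat_surface", "rim"},
}

def recognize(observed_features):
    """Narrow the candidate objects as features are observed one by one."""
    candidates = set(learned_objects)
    for feature in observed_features:
        # Keep only the objects consistent with everything seen so far.
        candidates = {obj for obj in candidates
                      if feature in learned_objects[obj]}
    return candidates

# A single sensor observing features over time:
print(recognize(["curved_surface"]))            # still ambiguous: mug or bowl
print(recognize(["curved_surface", "handle"]))  # narrowed to the mug
```

The same intersection step stands in for features observed simultaneously by multiple sensors: each sensor contributes a constraint, and the object is whatever remains consistent with all of them.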
In the same way that we can classify groups of objects by their similarity (in appearance and/or behavior), columns can also learn to classify groups of features by their similarity. That means it is no longer necessary to observe every object, and all of its features, from every possible orientation and distance. A column should be able to generalize the appearance and behavior of an unknown object’s properties from the previously learned appearance and behavior of other objects with similar features.
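Continuing the toy sketch from above (again my own invention, with made-up names and a simple set-overlap similarity standing in for whatever the cortex actually does): generalization falls out if the column predicts an unknown object’s behavior from the learned object whose features overlap it most.

```python
# Toy generalization: predict an unknown object's behavior from the
# most feature-similar learned object. Names and the Jaccard measure
# are assumptions for illustration only.
learned = {
    "mug":  {"curved_surface", "rim", "handle"},
    "ball": {"curved_surface", "rolls"},
}

def jaccard(a, b):
    """Overlap of two feature sets, from 0 (disjoint) to 1 (identical)."""
    return len(a & b) / len(a | b)

def most_similar(unknown_features):
    """Learned object whose features best overlap the unknown object's."""
    return max(learned, key=lambda obj: jaccard(learned[obj], unknown_features))

# A never-before-seen handleless cup generalizes from the mug,
# so its expected behavior (holds liquid, sits upright) comes along for free:
print(most_similar({"curved_surface", "rim"}))
```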