What is the clever trick Marcus Lewis came up with to represent the position of the logo on the coffee cup?
If the attended object is the coffee cup, then L4 represents the features of the coffee cup (including the logo) and L2/3 represents the stable object representation of the coffee cup. L6a has to store the position of those features, but in what space? If it’s an allocentric representation, then L6a must store the positions of all features of all stable objects in order to relate them to each other, right?
This is purely based on my own intuitions, so take it with a huge grain of salt (no doubt very wrong!). Numenta is light years ahead of me in this theory, but I’ll share my own thoughts on this circuit. I am definitely aware that these ramblings deviate from what Numenta believes is going on, so this is just me trying to justify things that I believe must exist in the circuit but which do not yet seem to be part of the current theory (as I understand it).
L4 is taking sensory input, and there are undoubtedly a large number of ways the same feature could be sensed, depending on the heading of the sensor as it sweeps over the feature. For example, if the input at a given point on an object registers as a horizontal edge when the sensor is heading north, it would register as a vertical edge if the sensor were heading west.
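To make that concrete, here is a tiny toy sketch (entirely my own construction, not anything from Numenta’s code) of a sensor whose registered edge orientation depends on its heading. The heading angles and the horizontal/vertical labeling are illustrative assumptions:

```python
# Toy model: how the same physical edge registers differently in the
# sensor's own frame depending on the sensor's heading. All numbers and
# labels here are illustrative assumptions, not part of the theory.

HEADINGS = {"north": 0.0, "west": 90.0, "south": 180.0, "east": 270.0}

def sensed_orientation(edge_deg_world: float, heading: str) -> float:
    """Edge orientation in the sensor's frame (edges are modulo 180 degrees)."""
    return (edge_deg_world - HEADINGS[heading]) % 180.0

def label(orientation_deg: float) -> str:
    """Coarse horizontal/vertical labeling of an orientation."""
    d = orientation_deg % 180.0
    return "horizontal" if d < 45.0 or d > 135.0 else "vertical"

edge = 0.0  # one physical edge, horizontal in world coordinates
print(label(sensed_orientation(edge, "north")))  # -> horizontal
print(label(sensed_orientation(edge, "west")))   # -> vertical
```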
So my thought is that L2/3 pools the different ways the input at a given point on an object can be sensed. What it represents, then, are stable representations of the pooled senses from different headings (the most obvious source for the heading signal being L6a, based on the diagram). Note that I believe the heading signal is equivalent to head direction, is from the perspective of the sensor (not the head or body), and is specific to different classes of semantically similar objects.
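Here is how I picture that pooling step, again as a toy sketch; the StablePool class and its names are hypothetical stand-ins for whatever L2/3 is actually doing:

```python
# Toy model: a stable cell (standing in for an L2/3 representation) learns
# the union of the heading-dependent senses of one point, so any of them
# later activates the same stable code. Entirely hypothetical structure.

class StablePool:
    def __init__(self):
        self.pools = {}  # stable_id -> set of (heading, sensed_feature)

    def learn(self, stable_id, heading, sensed_feature):
        self.pools.setdefault(stable_id, set()).add((heading, sensed_feature))

    def recognize(self, heading, sensed_feature):
        """Return every stable id whose pool contains this sense."""
        return {sid for sid, pool in self.pools.items()
                if (heading, sensed_feature) in pool}

pool = StablePool()
# Learn the same point on the cup as sensed from two headings:
pool.learn("cup-rim-point", "north", "horizontal-edge")
pool.learn("cup-rim-point", "west", "vertical-edge")

# Either heading-specific sense now maps to the same stable representation:
print(pool.recognize("north", "horizontal-edge"))  # {'cup-rim-point'}
print(pool.recognize("west", "vertical-edge"))     # {'cup-rim-point'}
```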
The representations from L2/3 would then need to be pooled with locations, presumably in layer 5 (the most obvious source for the location signal being L6b, based on the diagram). Note that I believe the location signal (like the heading signal) is from the perspective of the sensor and is specific to different classes of semantically similar objects. L5 would be where the stable concept of the “coffee cup” (or “logo”, or whatever attention has locked onto) resides.
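Continuing the same toy model, an object layer standing in for L5 could pool the stable L2/3 features together with a location signal (the L6b analogue); all names and structure here are my own assumptions:

```python
# Toy model: an "L5" object layer pools (location, stable_feature) pairs
# into one object representation, and recognizes an object by keeping only
# the objects consistent with everything sensed so far. Hypothetical names.

class ObjectLayer:
    def __init__(self):
        self.objects = {}  # object_name -> set of (location, stable_feature)

    def learn(self, object_name, location, stable_feature):
        self.objects.setdefault(object_name, set()).add((location, stable_feature))

    def vote(self, observations):
        """Objects consistent with every (location, stable_feature) sensed."""
        return {name for name, pairs in self.objects.items()
                if all(obs in pairs for obs in observations)}

layer5 = ObjectLayer()
layer5.learn("coffee cup", (0, 3), "cup-rim-point")
layer5.learn("coffee cup", (0, 1), "logo-corner")
layer5.learn("soup bowl", (0, 3), "cup-rim-point")  # shares one feature

# One observation is ambiguous; a second narrows it down:
print(layer5.vote([((0, 3), "cup-rim-point")]))
# -> {'coffee cup', 'soup bowl'}
print(layer5.vote([((0, 3), "cup-rim-point"), ((0, 1), "logo-corner")]))
# -> {'coffee cup'}
```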
BTW, to clarify what I mean by heading or location being specific to different classes of objects: Geoffrey Hinton’s demonstration of perception using a sliced tetrahedron is a good example of this concept in action.
I realized I never circled back around to answer this specific question. Positions are represented by grid cells. Grid cells are logically organized into “modules”, each of which tiles an area with a repeating pattern. Multiple such modules cover the same area at slightly different scales and orientations. Because any one module’s pattern repeats, a single module is ambiguous on its own; for a given activation, the point where the representations from the different modules overlap is what defines a specific position.
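A 1D toy version of that overlap idea (real grid cell modules are 2D, and these periods are made up) shows why one module is ambiguous but several together are not:

```python
# Toy model: each module represents position only modulo its own scale,
# so one module alone is ambiguous, but the intersection across modules
# with different scales pins down a unique position.

MODULE_SCALES = [4, 5, 7]  # illustrative periods, one per module

def encode(position: int) -> list[int]:
    """Phase of the position within each module's repeating tile."""
    return [position % scale for scale in MODULE_SCALES]

def decode(phases: list[int], max_position: int = 140) -> list[int]:
    """All positions whose per-module phases match the given activation."""
    return [p for p in range(max_position)
            if all(p % s == ph for s, ph in zip(MODULE_SCALES, phases))]

code = encode(23)
print(code)          # [3, 3, 2] -- each module alone matches many positions
print(decode(code))  # [23] -- but together they are unique within 4*5*7 = 140
```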
The coordinate spaces depicted by the grid cells are biased by the input representations. This is why, in the diagram, the connection between the input and location layers is depicted as a bi-directional arrow. I believe the differences should be directly related to the semantics of the object: two objects that share a lot of semantics should have correspondingly similar coordinate spaces, while two objects that share no semantics would have dissimilar ones. This would allow, after sensing enough features of a previously learned object, the coordinate space to be recalled, so that you suddenly know not only what object is being sensed but also where specifically on the object the sensor is.
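Here is a toy sketch of that recall process, with made-up 1D locations and feature names: each sensed feature proposes candidate (object, location) pairs, and the movements between sensations prune them until the object and the position on it fall out together:

```python
# Toy model: recalling a learned coordinate space from sensed features.
# Each feature proposes (object, location) hypotheses; shifting the
# hypotheses by the known movement and intersecting with the next
# sensation narrows them to one object AND one location at once.

learned = {
    "coffee cup": {0: "rim", 3: "logo", 6: "handle"},
    "soup bowl":  {0: "rim", 4: "base"},
}

def candidates(feature):
    """All (object, location) pairs where this feature was learned."""
    return {(obj, loc) for obj, feats in learned.items()
            for loc, f in feats.items() if f == feature}

# Sense "rim": still ambiguous between the cup and the bowl.
hypotheses = candidates("rim")
print(hypotheses)  # {('coffee cup', 0), ('soup bowl', 0)}

# Move +3 along the object and sense "logo": shift each hypothesis by the
# movement and keep only those consistent with the new sensation.
moved = {(obj, loc + 3) for obj, loc in hypotheses}
hypotheses = moved & candidates("logo")
print(hypotheses)  # {('coffee cup', 3)} -- object and position recalled together
```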
I believe the same applies to the head direction cells and their associated input layer (though that is not part of the official theory currently).