Has anyone tried reproducing the location layer experiments (such as these) from htmresearch on 3D objects? Are there any conceptual reasons it should not be possible (e.g. by simply stacking the location modules on top of each other in lattice-like fashion and adding another coordinate to the locations), and if so, what problems need to be solved? I’ve had time to only quickly skim the code, but have no deep understanding of it yet, so an expert guidance would be very appreciated. Thanks!
There is no dimensional limitation in the way that it is implemented: you could give the grid encoding an arbitrary number of dimensions and add extra coordinates to the motor commands. One interesting aspect of grid cells in the brain is that they generally appear to be 2D representations. There is evidence of 3D encoding, but it is unclear whether there is ever a truly 3D representation rather than a projection of the three dimensions onto a 2D encoding space. Given that most animals learn environments that require much higher capacity in two dimensions and much less resolution in the third, this strategy makes sense; a truly orthogonal third dimension in the grid cell encoding would likely be inefficient.
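To make the "no dimensional limitation" point concrete, here is a minimal toy sketch of phase-based path integration, where a grid module tracks a phase in [0, 1)^n and updates it from movement vectors. This is not the htmresearch implementation (the class, its transform, and the discretization are all made up for illustration); it just shows that going from 2D to 3D is only a matter of the dimension parameter and the length of the motor command:

```python
import numpy as np

class GridModule:
    """Toy n-dimensional grid module using phase-based path integration.

    Illustrative sketch only, not the htmresearch code: the module's
    location is a phase in [0, 1)^n, updated by motor commands.
    """

    def __init__(self, dims, scale, rng=None):
        rng = np.random.default_rng(rng)
        # Random mixing of movement into phase space; a real module would
        # use a fixed scale and orientation per module.
        self.transform = rng.normal(size=(dims, dims)) / scale
        self.phase = rng.random(dims)  # current location phase in [0, 1)^n

    def movement_update(self, displacement):
        """Path-integrate a motor command of the module's dimensionality."""
        self.phase = (self.phase + self.transform @ np.asarray(displacement)) % 1.0

    def active_cell(self, cells_per_axis=10):
        """Discretize the phase into one active cell index per axis."""
        return tuple((self.phase * cells_per_axis).astype(int))

# The 2D and 3D cases differ only in `dims` and the motor command length.
mod2d = GridModule(dims=2, scale=0.3, rng=42)
mod3d = GridModule(dims=3, scale=0.3, rng=42)
mod2d.movement_update([0.1, -0.05])
mod3d.movement_update([0.1, -0.05, 0.02])
print(mod2d.active_cell(), mod3d.active_cell())
```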
There is a lot of literature on the dimensionality and representational capacity of grid cells, but much is still unknown, including how exactly 3D representations are encoded, whether and how these representations are learned while simultaneously incorporating sensory anchoring (in a fully online algorithm), and how the suspected analogous representations in the neocortex operate. We will be sharing more of our theory on this subject in the coming months.
For me, 3D is 2D with perspective. Detection of a sub-feature such as a corner in the middle of the image and detection of another corner at the lower edge would activate different detectors. The algorithm that makes them equivalent would be derived from a previously learned temporal sequence that moves the lower sub-feature (“corner”) into the middle of the image, for example the shift of the sub-feature during a movement of the eye or head, as in the sketch below.
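Here is a toy numpy illustration of that idea (everything here is a made-up stand-in, not anything from htmresearch): a fixed “corner” detector only fires on the central receptive field, but after a saccade-like shift, the sub-feature from the lower edge lands in the center and the central detector responds identically:

```python
import numpy as np

def corner_detector(patch):
    """Toy 'corner' detector: responds to an L-shaped pattern in a 3x3 patch."""
    kernel = np.array([[1,  1, 0],
                       [1, -1, 0],
                       [0,  0, 0]])
    return float(np.sum(patch * kernel))

def saccade_shift(image, dy, dx):
    """Shift the image as an eye movement would (toy: wrap-around roll)."""
    return np.roll(np.roll(image, dy, axis=0), dx, axis=1)

# A corner sub-feature sits at the lower edge of a toy 9x9 image.
image = np.zeros((9, 9))
image[6:9, 3:6] = np.array([[1, 1, 0],
                            [1, 0, 0],
                            [0, 0, 0]])

center = image[3:6, 3:6]  # central receptive field: empty, detector silent
lower  = image[6:9, 3:6]  # lower-edge receptive field: detector fires
print(corner_detector(center), corner_detector(lower))

# After a learned upward shift (the 'eye movement'), the central detector
# sees the same sub-feature and produces the same response.
shifted = saccade_shift(image, dy=-3, dx=0)
print(corner_detector(shifted[3:6, 3:6]))
```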