This is based on the idea that a single grid cell represents location ambiguously, but the combination of many grid cells unambiguously indicates the animal's location. It is also based on "The hippocampus as an associator of discontiguous events" (Wallenstein et al.).
In the current model of object recognition, unless I'm mistaken, features are discrete. For example, the virtual fingertip always touches each feature in exactly the same way, producing the same sensory input each time. However, real touch usually doesn't work that way. When you touch a feature, you can recognize it no matter which part you contact. Touching points a few millimeters apart on a feature probably produces similar patterns of pressure, but points farther apart than that probably produce drastically different patterns. So the model needs to produce a similar representation of the same feature regardless of the exact point the fingertip touches.
I'm not sure how to solve this problem in a scenario where a fingertip pokes various points on the object, but I have a possible solution for a different scenario. Let's say the fingertip is dragged across the surface, producing a sequence of pressures. To keep things simple for now, let's say the fingertip is always dragged in the same direction, always producing the same sequence for the same object. I will address the issue of different directions later.
In this scenario, the same feature always produces the same sequence of pressures. So as the fingertip is dragged across the surface, the portion of the sequence for a given feature is always the same. By recognizing portions of the entire sequence, neurons can represent features almost as if they were discrete. If neurons recognize random portions, they won't necessarily respond to a neatly selected portion of the surface which you would call a feature. However, by learning to recognize common sequence portions, they can reliably respond to common features, portions of features, or formations of features. A quick sketch of portions of a surface neurons might respond to:
If the neurons respond to partially overlapping portions, they can produce an allocentric representation. At any given moment, neurons responsive to adjacent (or nearby) features will be on. As a result, the sequence of active neurons indicates the relative positions of the features. (Sorry, I'm really foggy on some details so the following is probably really confusing.) One way to use this sequence to produce an allocentric representation is to treat the object like a hierarchy of adjacencies and build adjacency contexts into contexts of contexts and so on, eventually putting each feature in the context of the relative positioning of all other features. For a given neuron or feature, its direct adjacencies are simply the neurons which respond to overlapping portions and are therefore active at the same time, at least for part of the duration of the given neuron's activity. Its slightly more distant neighbors can be found by looking for the features which are adjacent to the directly adjacent features. The process repeats to find the relative positioning of more distant features.
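To make the adjacency idea a bit more concrete, here is a rough Python sketch. It is only an illustration under my own assumptions, not part of any existing model: each neuron is reduced to a hypothetical time interval during which it is active in one drag (the names `ACTIVE_SPANS`, `direct_adjacencies`, and `adjacency_distance` are made up). Neurons with overlapping intervals count as directly adjacent, and a breadth-first search then repeats the "adjacent to the adjacent" step to find how many hops away every other feature is.

```python
from collections import deque

# Hypothetical data: neuron -> (start, end) time steps during which it is
# active while the fingertip is dragged across the surface once.
ACTIVE_SPANS = {
    "A": (0, 4),
    "B": (3, 7),
    "C": (6, 10),
    "D": (9, 12),
}


def direct_adjacencies(spans):
    """Treat two neurons as adjacent if their activity intervals overlap."""
    adj = {name: set() for name in spans}
    names = list(spans)
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            (start_a, end_a), (start_b, end_b) = spans[a], spans[b]
            if start_a < end_b and start_b < end_a:  # half-open overlap test
                adj[a].add(b)
                adj[b].add(a)
    return adj


def adjacency_distance(adj, start):
    """Breadth-first search: adjacency hops from `start` to each neuron."""
    dist = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in adj[node]:
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return dist


adj = direct_adjacencies(ACTIVE_SPANS)
dist_from_a = adjacency_distance(adj, "A")
print(adj)
print(dist_from_a)
```

With these made-up intervals the neurons form a chain, so the hop count from "A" grows by one per neuron. The hop counts are a crude stand-in for relative position; a real representation would presumably need something richer than a single number per pair.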
I'm not entirely sure how to produce a representation which is invariant to the direction in which the fingertip is dragged. If relative positioning is only based on overlapping sequences (adjacencies, essentially), then there are only two issues. The first is that each neuron must respond to its feature in the same way regardless of the direction in which the fingertip moves, since the direction changes the sequence. The second issue is that the fingertip might not always contact the same points on each feature, even if it moves across it, because features aren't one-dimensional. To solve these issues, neurons probably need what are essentially place fields on objects. So whenever the fingertip moves across a certain area on the surface, the neuron responds, regardless of how the fingertip moves across it. However, these might not actually be issues, since the L2/3a model works by reducing ambiguity of possible features on the object. Even a direction-sensitive representation might be able to reduce ambiguity just as much as a place field representation.
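As a small check on the place field idea, the following Python sketch assumes (purely for illustration) that each neuron already has a place field, modeled as a hypothetical interval of positions along a one-dimensional strip of surface. It does not say anything about how such fields would be learned. It only shows that, if they exist, the set of co-active neuron pairs comes out the same whether the fingertip is dragged forward or backward.

```python
# Hypothetical place fields: neuron -> (low, high) positions on a 1-D strip.
PLACE_FIELDS = {"A": (0, 4), "B": (3, 7), "C": (6, 10)}


def active_at(position, fields):
    """Neurons whose place field contains the given position."""
    return {name for name, (low, high) in fields.items() if low <= position < high}


def coactive_pairs(trajectory, fields):
    """Unordered pairs of neurons active together at any point on the path."""
    pairs = set()
    for position in trajectory:
        active = sorted(active_at(position, fields))
        for i, a in enumerate(active):
            for b in active[i + 1:]:
                pairs.add((a, b))
    return pairs


forward = list(range(10))   # drag from position 0 to 9
backward = forward[::-1]    # drag from position 9 to 0

forward_pairs = coactive_pairs(forward, PLACE_FIELDS)
backward_pairs = coactive_pairs(backward, PLACE_FIELDS)
print(forward_pairs == backward_pairs)
```

The order in which neurons turn on still flips with direction, so anything downstream that depends on order would remain direction-sensitive. Only the adjacencies themselves are unchanged.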