Discussion on Grid Cell Distortions and Reference Frame Transformations - May 17, 2021

In this research meeting, Marcus Lewis discusses the importance of explaining grid cell distortions and, to generate discussion, proposes a possible explanation, showing some results from an experiment he conducted. He hypothesized that an animal localizes by detecting its distance from various points on boundaries, and that those points “vote” on the location. The weight of each vote is determined by nearness, and distortions occur when the animal’s idealized map differs from the actual environment. The team then discusses the hypothesis and raises further questions.

Jeff then explores the possible processes and mechanisms that underlie reference frame transformations in the neocortex. He describes a few problems with a previous hypothesis he proposed about reference frame transformations in the thalamus and further explores the thalamus’s role. He then suggests a relationship between reference frame transformations and temporal memory.

Papers from Marcus’ presentation:


Distortions are a natural result of fitting idealized parts onto actual environments.

I’ve been thinking about this a lot over the past few weeks. The folks in my study/research group will have heard me talking about filters and residuals.

The filters are the learned representations of objects/features. Over time, the object/feature representations modeled by these filters tend to take on some kind of weighted average of a specific feature or set of features that commonly appear together. (This might be related to principal components of a feature correlation matrix.) The residuals are the parts of the input pattern that don’t match the learned filters. My thinking of late has been focused on two things:

  • Does the brain use this residual information to discriminate between similar objects and possibly form unique representations for them?
  • Does the brain use some of this residual information as a means to store and/or access unique episodic memories?
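To make the filter/residual split concrete, here is a minimal sketch of the decomposition being described, under the assumption that learned filters span a linear subspace (as the principal-components aside suggests); the function name and shapes are illustrative, not part of any HTM implementation:

```python
import numpy as np

def decompose(x, filters):
    """Split input x into the part explained by learned filters
    and the leftover residual (orthogonal to the filter subspace)."""
    F = np.asarray(filters, dtype=float)
    # Least-squares coefficients expressing x in the filter basis
    coeffs, *_ = np.linalg.lstsq(F.T, x, rcond=None)
    reconstruction = F.T @ coeffs
    residual = x - reconstruction
    return reconstruction, residual

filters = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]  # two learned "features"
x = np.array([2.0, 1.0, 0.5])                 # sensed input
recon, res = decompose(x, filters)
# res lies entirely outside the filter subspace: [0, 0, 0.5]
```

In this picture, the first question above asks whether `res` carries usable information rather than just noise.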

This also naturally leads to some related lines of thought:

  • How does the brain know when/how to chunk features together?
  • Under what circumstances is it more important to focus on the residuals than on the filters? (i.e. How do we decide that we need to be more concerned with differences rather than similarities between an input pattern and our learned internal models of similar patterns?)
  • Are there separate pathways that process the similarities and differences?

This then led me to think about disparity maps. In computer vision applications, disparity maps are generated by overlapping two images (usually taken from stereo camera pairs) and subtracting one from the other. This allows a kind of depth processing, since one can quickly find areas of the images that lie at approximately the same depth (those locations have similar displacements due to parallax).
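The subtract-and-compare idea above can be sketched in a few lines: for each pixel, try a range of horizontal shifts and keep the shift at which the two images differ least. This is a toy illustration of the principle, not a production stereo algorithm (real pipelines use block matching and regularization):

```python
import numpy as np

def disparity_map(left, right, max_shift):
    """Per-pixel shift minimizing |left - shifted(right)|."""
    h, w = left.shape
    best = np.zeros((h, w), dtype=int)
    best_err = np.full((h, w), np.inf)
    for d in range(max_shift + 1):
        shifted = np.roll(right, d, axis=1)  # candidate displacement
        err = np.abs(left.astype(float) - shifted)
        mask = err < best_err                # keep the best shift so far
        best[mask] = d
        best_err[mask] = err[mask]
    return best
```

Regions of uniform disparity in the output correspond to surfaces at roughly the same depth, which is the property the analogy below leans on.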

Now, to bring this back around to Marcus’s presentation. Could it be that the distortions in the rat’s grid cell receptive field have something to do with these disparity maps? Or perhaps there is some other mechanism by which the brain is contrasting its current input with some stored representation (or maybe even its prior inputs) and using those differences to key in on the parts of the input field that are different (or changing) in ways that are recognizable and interpretable (e.g. associated with a specific kind of motion or behavior). Could these be what the grid cells are anchoring to, in addition to recognizable features (or instead of them, in the case where there are no recognizable features)?


Maybe different grid cells distort the environment differently, to match different environments. If all cells distorted the environment the same way, that seems like it’d prevent trying different things in order to learn (like how the cells in the spatial pooler each start with different random synapses).

If each grid cell has different distortions, maybe they’re trying/learning a bunch of different distortions to make the environment a triangle. Their firing fields are arranged triangularly, and a triangle has the fewest sides, so I guess it’s the easiest shape to distort toward.

(Or maybe they make the environment multiple triangles, because distorting a square into a triangle would require a lot of compression somewhere. But I’m not sure there are discrete environments, so perhaps they make overlapping patches of the world each a triangle.)


If this happens, would the brain have a need to store (at least temporarily) this residual information as long as the deviant similar object is kept in mind? Would that not create a new filter or set of filters?

I mean, if you detect a new but similar object (let’s say an apple with a hole in it), then your brain recognises it because it knows all the features old and new. Every imaginable feature would have a set of filters in your brain. Even if you would see a blue apple or a musical apple or a bouncing apple, none of this would be essentially new.

@Falco and I talked about this online on Sunday, but I’ll reiterate it here for those who couldn’t make it due to other commitments.

I think I have a solid grasp on how to create a system that can create learned representations for recognized spatial patterns (features, objects, etc.). I also have a pretty decent understanding of the existing HTM model for learning temporal sequences of patterns. However, I cannot shake the feeling that I’m missing something important.

Up until recently, I’ve been focusing on using the aforementioned techniques to recognize patterns and sequences of patterns. I’ve started to realize that the more often a sensed pattern is recognized, the more its representation begins to settle on the average of the inputs that match it. While this permits a very efficient way of capturing the principal components of sensed and learned patterns, it does leave open a question of what, if anything, to do with the residual information: that is, the part of the sensed input signal that is distinct from (orthogonal to) the stored representations of recognized patterns.

To be clear, I’m not talking about what to do with a novel input that is not recognized by the network. An agent could easily expand its dictionary of recognized patterns when encountering a new feature or object that does not sufficiently overlap or align with any previously learned patterns. In this case, the magnitude of the residual exceeds some threshold which triggers the learning algorithm to learn and store a new pattern, of which the current residual is the first exemplar.
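The thresholded-learning behavior described here can be sketched as a small online dictionary: match an input against stored patterns, and if even the best match leaves too large a residual, treat the input as novel and store it. The class name, distance metric, and threshold value are illustrative assumptions:

```python
import numpy as np

class PatternDictionary:
    def __init__(self, novelty_threshold=0.5):
        self.patterns = []              # learned exemplars
        self.threshold = novelty_threshold

    def present(self, x):
        x = np.asarray(x, dtype=float)
        if self.patterns:
            errs = [np.linalg.norm(x - p) for p in self.patterns]
            i = int(np.argmin(errs))
            if errs[i] <= self.threshold:
                # recognized: a small residual is left over
                return ("recognized", i, x - self.patterns[i])
        # novel: residual exceeds threshold, so learn a new pattern
        self.patterns.append(x)
        return ("learned", len(self.patterns) - 1, np.zeros_like(x))

d = PatternDictionary()
d.present([1.0, 0.0])   # novel -> learned as pattern 0
d.present([1.1, 0.0])   # recognized as pattern 0, residual [0.1, 0]
d.present([0.0, 1.0])   # far from pattern 0 -> learned as pattern 1
```

The question in the next paragraph is precisely about the middle case: what to do with the small residual returned when a pattern *is* recognized.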

What’s been nagging at the back of my mind is what to do with the information contained in residuals that do not meet the threshold for learning a novel input. To my mind, this information is the part of the sensed input that makes this moment unique. It is everything about the current environment that didn’t fit neatly into existing bins. The imperfections, if you will, with respect to the idealized forms that have already been sensed, recognized and passed on for further processing in other parts of the brain.

Up to now I’ve been presuming that this residual information is simply discarded as background noise. That may very well be true; however, I’ve recently started to wonder whether the residual might fulfill some other purpose. It could provide a unique signal to the lower brain structures that allows them to disambiguate two very similar moments in time. Or perhaps it adds a random bit of color for the HC/EC, allowing it to assign a unique identity to the memory before it gets correlated and stored with other long-term memories.

Alternatively, there might be some spatial information encoded in the residual. In this case, it may be that the residual is the representation that encodes a spatial heat map of the negative space (i.e. the space not occupied by recognized objects or features). This would act sort of like a disparity map, but instead of subtracting common features from a stereo pair of images, you are removing all features that have been recognized and/or are being attended to.
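A minimal sketch of this negative-space idea: zero out every region accounted for by a recognized feature, leaving a heat map of whatever was not explained. The masks and field values here are made-up illustrative data, not a claim about how the brain represents them:

```python
import numpy as np

def negative_space(input_field, recognized_masks):
    """Remove recognized features, leaving the unexplained remainder."""
    residual = input_field.astype(float).copy()
    for mask in recognized_masks:
        residual[mask] = 0.0   # subtract out the recognized/attended feature
    return residual

field = np.ones((4, 4))
mask = np.zeros((4, 4), dtype=bool)
mask[1:3, 1:3] = True          # a recognized object in the middle
heat = negative_space(field, [mask])
# heat is 1 everywhere except the recognized region, which is 0
```

The analogy to a disparity map is that both operations highlight what remains after a known correspondence has been subtracted away.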

This last bit might help to explain the figure-ground perception problem. The features that you are attending to get filtered out for further processing, while everything else becomes background. When you shift your attention, you focus on the parts of the input that are currently salient, while everything else goes to background. The question remains: is there information in the background/residual that the brain is using for other purposes, even if it is not the current focus of our attention?


Sounds like you are getting yourself into philosophical territory. Maybe contemporary philosophy will be more useful than Plato. A first step might be to undo the spell of representationalism, Mark Bickhard could help there.

I think that the “residual information” is amplified and transmitted alongside the primary data. The way it would work is:

  • Through training, inhibitory cells learn the primary pattern. They then learn to inhibit the primary pattern.
  • When a stimulus is presented: the excitatory cells which respond to the primary pattern activate and are then immediately inhibited from further activity.
  • The residual information is not inhibited, so cells can still activate and transmit those details which are not part of the primary pattern.
    The result is that both the stereotypical primary pattern as well as any residual info gets transmitted at the same time.
  • To take it a step further: the primary and residual inputs could have very different magnitudes (as in firing rates), and the inhibition should do gain control so that both aspects are represented in the output with roughly equal magnitude.

The key points here are that inhibition can be triggered by specific inputs, and can precisely target specific cells.
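The bullet points above can be sketched numerically: the primary pattern is subtracted (inhibited) from the stimulus, and crude gain control rescales both signals to comparable magnitude. All names and the normalization rule are assumptions for illustration, not a model of real inhibitory circuits:

```python
import numpy as np

def transmit(stimulus, primary, gain_target=1.0):
    """Suppress the learned primary pattern; pass the residual
    with gain control so both signals have similar magnitude."""
    stimulus = np.asarray(stimulus, dtype=float)
    primary = np.asarray(primary, dtype=float)
    residual = stimulus - primary           # what inhibition leaves behind
    residual = np.clip(residual, 0, None)   # firing rates are non-negative
    norm = np.linalg.norm(residual)
    if norm > 0:
        residual = residual * (gain_target / norm)  # amplify faint residual
    primary_norm = max(np.linalg.norm(primary), 1e-9)
    return primary * (gain_target / primary_norm), residual

primary_out, residual_out = transmit([1.0, 1.0, 0.3], [1.0, 1.0, 0.0])
# both outputs now have unit magnitude, so the residual detail is
# represented as strongly as the stereotyped primary pattern
```

The point of the rescaling step is the last bullet: even a faint residual ends up represented in the output at roughly the same strength as the dominant pattern.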


Indeed, I had been considering a similar mechanism for allowing multiple features to be detected in sequence rather than simultaneously.

Adjacent minicolumns are observing the same set (or overlapping subsets) of sensor inputs. Each minicolumn implements a different learned filter through its proximal dendrite connections to the input layer, and temporal sequence memory through its distal dendrite connections. To this, we add a local inhibition mechanism to select a winning minicolumn and temporarily silence nearby minicolumns.

When presented with a spatially distributed signal on the input layer, one or more features are detected by the proximal dendrite filters. The minicolumn with the strongest activation fires first, selecting the winner neuron (or bursting) in accordance with the temporal memory algorithm. All other minicolumns are temporarily silenced.

Once the winning neuron in the winning minicolumn has fired and its activation has passed onto its axon, the neuron goes into a refractory period where it is unable to activate for some interval. The rest of the minicolumn may or may not also need to be suppressed during this time.

At this point, the rest of the adjacent minicolumns can now compete to be the next to activate following the same procedure as before. This process repeats until the refractory period of the first neuron/minicolumn has elapsed, at which time it reenters the competition.

The way I see it, this mechanism could allow multiple features to be extracted from the same region of the input layer and encoded as a temporal sequence of SDRs. For a static input, the response would be a cyclic repetition of the same sequence (a unique spatial-temporal representation for that specific combination of features). If the input is gradually shifting in response to some movement in a continuous space, then the sequences should modulate in a predictable (learnable) manner. Even discrete transitions could potentially be learned and represented in this way.
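The competition-with-refractory-period loop described above can be sketched as follows; the overlap scores, time steps, and refractory interval are illustrative assumptions, not parameters from any HTM implementation:

```python
def feature_sequence(activations, refractory=3, steps=6):
    """Winner-take-all competition among minicolumns with a
    refractory period, turning co-present features into a sequence.

    activations: dict of minicolumn -> overlap score for a static input.
    """
    refractory_until = {col: 0 for col in activations}
    sequence = []
    for t in range(steps):
        eligible = [c for c in activations if refractory_until[c] <= t]
        if not eligible:
            sequence.append(None)   # all minicolumns silenced this step
            continue
        winner = max(eligible, key=lambda c: activations[c])
        refractory_until[winner] = t + refractory  # sideline the winner
        sequence.append(winner)
    return sequence

# three minicolumns detecting three features in the same input region
seq = feature_sequence({"A": 0.9, "B": 0.7, "C": 0.5})
# -> ['A', 'B', 'C', 'A', 'B', 'C']: cyclic repetition for a static input
```

The cyclic output for a static input is exactly the "repetition of the same sequence" claimed above; a shifting input would change the scores and hence modulate the sequence.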

I’ve also been considering this as a potential remedy to @Paul_Lamb’s repeating input problem. At some level of granularity, it may be that the network naturally allows cyclical repetition of an input pattern while simultaneously waiting for possible transitions to a new stable representation. Sort of like driving around the traffic circle until you get the signal to take one of the available exits. If this were an autoencoder, I might be able to get away with describing it as the network state orbiting around a local minimum in the configuration space (an attractor) until a gap opens up, allowing it to transition to another local minimum.


Marcus mentioned this article:

The entorhinal cognitive map is attracted to goals

Charlotte N. Boccara, Michele Nardin, Federico Stella,
Joseph O’Neill, Jozsef Csicsvari

Grid cells with their rigid hexagonal firing fields are thought to provide an invariant metric to the hippocampal cognitive map, yet environmental geometrical features have recently been shown to distort the grid structure. Given that the hippocampal role goes beyond space, we tested the influence of nonspatial information on the grid organization. We trained rats to daily learn three new reward locations on a cheeseboard maze while recording from the medial entorhinal cortex and the hippocampal CA1 region. Many grid fields moved toward goal location, leading to long-lasting deformations of the entorhinal map. Therefore, distortions in the grid structure contribute to goal representation during both learning and recall, which demonstrates that grid cells participate in mnemonic coding and do not merely provide a simple metric of space.

DOI: 10.1126/science.aav4837 (free pdf download available)