Follow-up question to Podcast 1 with Jeff (location, orientation, and attention)


I’m not clear on this either, but in some sense I believe this has to be learned (since it applies to abstract, dimensionless concepts as well as the more obviously spatial ones, and can be applied to weird physics like Portal or Paper Mario). Perhaps a clue can be found in another phenomenon that occurs if you reach into a black box to feel an object (or, in your example, when you take off your blindfold after wandering around the gym for a bit). You start with one spatial sense (perhaps random in the case of the black box, or perhaps drifting from reality in the case of the gym). As you get more sensory cues from your movements (touching with your fingers, saccades from your eyes, etc.), you then recognize (based on the sequences of actions/inputs, or voting between sensors) a location on the object (or in the room) that you remember, and the spatial sense suddenly “snaps” to the one that is remembered.

This same “snapping” strategy could potentially be used to learn different paths to the same location on a novel object, room, or concept as well. The system might continue to build out a new random set of spatial information as it performs actions (likely relying on semantic similarities with previously learned objects, rooms, or concepts, so probably not usually completely random). When it recognizes a location it has visited previously, it can snap back to those representations and associate them with the motor action performed at the previous location.
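To make the “snapping” idea concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the names `integrate` and `snap`, the 2D coordinates, the constant drift, and the idea that a recognized feature maps directly to one remembered location. It is not a model of what the cortex does.

```python
def integrate(estimate, movement, drift=(0.0, 0.0)):
    """Update a 2D location estimate from a movement, plus accumulated error."""
    return (estimate[0] + movement[0] + drift[0],
            estimate[1] + movement[1] + drift[1])


def snap(estimate, sensed_feature, memory):
    """If the sensed feature is remembered, replace the estimate with the
    remembered location; otherwise keep the current estimate."""
    return memory.get(sensed_feature, estimate)


# Hypothetical memory of one previously learned feature and its location.
memory = {"door handle": (3.0, 0.0)}

estimate = (0.0, 0.0)
for _ in range(3):
    # Each step moves one unit along x, with a small error added every time.
    estimate = integrate(estimate, (1.0, 0.0), drift=(0.1, 0.05))

# The estimate has drifted away from (3.0, 0.0); recognition snaps it back.
snapped = snap(estimate, "door handle", memory)
```

An unrecognized feature leaves the drifting estimate untouched, which is the case where the system would keep building out new spatial information.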



This is an area of active research. My favourite theory is that of Kropff and Treves (2008). They propose a spatial pooler (it’s not called that in their paper though) with two additional mechanisms. One of the mechanisms causes grid cells to respond to large contiguous areas of their input; the other mechanism (named fatigue in their paper) shapes those large receptive fields into spheres. The spatial pooler’s competition then packs these spheres in as tightly as possible, which on a two-dimensional plane yields a hexagonal grid.

Kropff and Treves, 2008:



My guess is a lot of the input to entorhinal cortex about walking is pre-processed so it doesn’t need to handle a huge sequence of exact muscle positions and movements. For example, it might get a subcortically produced movement direction or movement direction change signal.

The purpose of path integration is sort of to get rid of the sequence aspect. I don’t think it’s feasible to learn every single sequence between locations on an object. Those sequences don’t apply to every object, because the solid object is in the way of many of them, so if the cortex did path integration only by pooling sequences, it would take forever to learn.

Some theories use oscillations (with or without phase offsets that produce a travelling peak along the cortical sheet) to do path integration and form grid cells. Oscillations are periodic like grid cell response fields and can form hexagonal grids with multiple oscillations interfering.
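As a rough illustration of how interfering periodic signals can give a hexagonal pattern, the sketch below sums three cosine waves whose directions are 60 degrees apart. This is a standard idealization, not any specific published model; the function name `grid_response` and the `spacing` parameter are made up for the example.

```python
import math


def grid_response(x, y, spacing=1.0):
    """Sum of three plane waves with directions 0, 60 and 120 degrees.

    The wave number is chosen so that the peaks of the sum sit on a
    hexagonal lattice whose nearest-neighbor distance is `spacing`.
    """
    k = 4 * math.pi / (math.sqrt(3) * spacing)
    total = 0.0
    for i in range(3):
        theta = i * math.pi / 3  # 0, 60, 120 degrees
        total += math.cos(k * (x * math.cos(theta) + y * math.sin(theta)))
    return total


# Maximum response (all three waves in phase) at the origin and at two
# neighboring lattice points, which are 60 degrees apart from each other.
peak_origin = grid_response(0.0, 0.0)
peak_up = grid_response(0.0, 1.0)
peak_diag = grid_response(math.sqrt(3) / 2, 0.5)

# Halfway between two peaks, two of the three waves are out of phase.
between = grid_response(0.0, 0.5)
```

The three peak values are all 3 (up to floating-point error), while the point between peaks is much lower, so thresholding the sum gives firing fields arranged in a hexagonal grid.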

I’m extremely biased but I think the hypothesis I described in a post is close to the truth.

I think there needs to be some form of pure path integration. Otherwise, it would have to experience every path and pool them, like the old temporal pooler or union pooler were meant to pool sequences. It’s physically possible to do path integration without that, so I don’t think it does path integration by pooling sequences.
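Here is a minimal sketch of what I mean by “pure” path integration: the location is updated directly from each movement, so the result depends only on the net displacement and not on which path was taken. The square, integer-valued module with a wrap-around `period` is a simplifying assumption for the example (real grid modules are hexagonal), and `path_integrate` is a hypothetical name.

```python
def path_integrate(start, movements, period=10):
    """Update a periodic 2D location by adding each movement in turn.

    The location wraps around at `period`, loosely like the phase of a
    grid cell module.
    """
    x, y = start
    for dx, dy in movements:
        x = (x + dx) % period
        y = (y + dy) % period
    return (x, y)


# Three different paths with the same net displacement of (3, 4).
path_a = path_integrate((0, 0), [(3, 0), (0, 4)])
path_b = path_integrate((0, 0), [(0, 4), (3, 0)])
path_c = path_integrate((0, 0), [(5, 5), (-2, -1)])

# A longer movement that wraps around the module lands on the same phase.
wrapped = path_integrate((0, 0), [(13, 4)])
```

None of these paths had to be experienced beforehand, which is the point: nothing is learned per sequence.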

Or at least, that’s probably not the only way. There are probably multiple complementary methods of path integration at play. Some sort of automatic, sensory-insensitive version of path integration, and then some sort of flexible, sequence learning path integration (perhaps related to object behavior).

The automatic path integration brings you from point A to point B on a fairly routine journey. But then from point B to point C, you take an elevator, which the automatic system can’t handle. So you learn the sequence from point B to point C. The automatic system gets you to point B and point C by other routes, so you already know what those places look like, and you just need to learn that one sequence. As a more grounded example, moving your fingertip through the air leads to consistent transitions between locations, but because objects have behaviors and you can’t phase through them, you also need flexible, learned path integration.
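A toy sketch of the two systems working together, using the elevator example. All of it is hypothetical: the `step` function, the 3D integer coordinates, and the choice to store learned transitions in a dictionary keyed by (location, action) are assumptions made for illustration only.

```python
def step(location, action, learned):
    """Return the next location.

    Learned transitions take priority. Otherwise, a displacement tuple is
    handled by automatic path integration. Any other action that has not
    been learned leaves the location estimate unchanged.
    """
    if (location, action) in learned:
        return learned[(location, action)]
    if isinstance(action, tuple):
        return tuple(p + d for p, d in zip(location, action))
    return location


# One learned sequence: riding the elevator at point B leads to point C.
learned = {((5, 0, 0), "ride elevator"): (5, 0, 3)}

point_a = (0, 0, 0)
point_b = step(point_a, (5, 0, 0), learned)        # automatic: walk to B
point_c = step(point_b, "ride elevator", learned)  # learned: B to C
```

Only the one transition the automatic system can’t handle needs to be stored; everything else falls through to plain displacement addition.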

There are like ten functional layers, twenty if you count what and where pathways separately, and thalamus, basal ganglia, and so on, so there’s plenty of room for multiple forms of path integration.



There is sure to be a great deal of pre-wiring involved, given that eons of generations have been born onto a planet with some stable physical parameters (I like to use the example of wildebeest infants, which are able to run from predators within hours of birth). I’m sure you are right that there are a lot of potential mechanisms that the brain can leverage (many of which have been around a lot longer than the neocortex).



There is a mechanism for this. It is widely believed that the thalamus is integral to attention. The most important input to every region goes through relay cells in the thalamus. These cells have two modes of operation (burst and tonic), plus there is an inhibitory network in the thalamus. The relay cells can be switched between tonic and burst modes by either a top-down feedback signal from the receiving region or a very strong signal from the lower sending region. The idea is that an unexpected input causes the relay cells to attend to that input, and the higher region can direct attention as well. For example, I can tell you to attend to some area of your visual field (top-down). Or, if something unexpected happens, your attention will automatically go there (bottom-up); you can’t prevent it.
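A deliberately crude sketch of the two routes to attention described above. It is only a toy: the function name `relayed_inputs`, the idea of scoring each input with a single “surprise” number, and the threshold value are all assumptions, and it says nothing about how relay cells actually implement this.

```python
def relayed_inputs(surprise_by_input, top_down_focus=None,
                   surprise_threshold=0.8):
    """Return the set of inputs that get attended.

    An input is attended if the higher region asks for it (top-down) or
    if it is unexpected enough to force its way through (bottom-up).
    """
    return {name for name, surprise in surprise_by_input.items()
            if name == top_down_focus or surprise >= surprise_threshold}


# Hypothetical inputs with a surprise score between 0 and 1.
inputs = {"left visual field": 0.1,
          "right visual field": 0.2,
          "loud noise": 0.95}

# Top-down: "attend to the left visual field". The loud noise still gets
# through, because bottom-up capture can't be prevented.
attended = relayed_inputs(inputs, top_down_focus="left visual field")
```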

This is an interesting question that we are still trying to understand. Consider vision. Do the grid cells in a column in V1 represent the location of the eye in the space of the viewed object, or do they represent the location of the actual feature on the object? With touch, it is easy to imagine that the location represented by a column in S1 is both the location of the skin and the sensed feature, but as you point out, you can touch something with a tool such as a toothpick. Do the grid cells represent the location of the finger or the location of the tip of the toothpick? I believe the cortex represents the location of the sensed feature and not the sense organ. This is cleaner and more powerful; however, it then raises the question: how does a column know where the sensed feature is? How does it know the location of the tip of the toothpick? We don’t know. We have some ideas but no answers yet.



Thanks, I just want to make sure I understand the mechanism you are describing here for the bottom-up route.

It sounds like you are describing the nodes in a hierarchy routing their most important input through the thalamus between levels of the hierarchy. Something like this:


When something sufficiently anomalous occurs (I’m assuming some competition here), the thalamus gates input from other regions:


And global attention shifts due to feedback from the top of the hierarchy cascading down (essentially the anomalous node gets an overwhelming vote, due to other input being blocked from traveling up the hierarchy):




The mechanism is simpler than you are describing. Take two regions, where R1 projects to R2. The feedforward connections from R1 to R2 are routed through the thalamus. It appears that the thalamus plays a role in what part of the output of R1 is attended to by R2. What exactly attention is, and what exactly the thalamus does when it passes on the signal, is not known. Much of the anatomy and many of the cellular mechanisms are known, but the function is not clear. If attention is related to the “burst vs tonic” modes of the relay cells, as some believe, then both top-down and bottom-up input to the thalamus can direct attention. I was only letting you know that bottom-up control of attention is possible and that some of the mechanisms are known. I would recommend reading Murray Sherman’s book about the thalamus if you want more detail.



Ok, so the thalamus is essentially responsible for establishing and/or enforcing the context (the thing that should be attended to). How specifically it does so is not entirely known (but in theory could involve something like the global workspace, or something else entirely).

I’ll play around with some of these ideas then and see what works (I will probably deviate from the biology for now). I can always go back to the drawing board later, when more is understood about the mechanism.

Thanks again for taking the time to reply to my queries!



So I thought I understood what you were describing, until I found mention of “Displacement Modules”, which I don’t understand. But I thought I’d share my interpretation anyway, since I think it’s interesting:

Layer 6 contains grid cells, which are organized into mini-columns. The mini-columns accept distal input from other layer 6 grid mini-columns, so that layer 6 forms a temporal memory, representing the current location in the context of the previous locations. Layer 6 would represent the current location in a trajectory of motion, if that makes sense? Then layer 5 would be doing temporal pooling over these layer 6 cells, which would cause layer 5 to represent the overall trajectory of motion. Layer 5 then projects to the muscles which drive that motion.

When the animal wants to go somewhere, the thalamus simply activates the location in layer 6 where it wants to go. The layer 6 mini-columns burst, which represents every trajectory passing through the destination. This in turn activates the layer 5 actions which pass through both the current location and the destination.
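In set terms, the selection step I’m imagining looks something like the sketch below. The trajectory names, the grid-like location labels, and the function `candidate_trajectories` are all made up for illustration, and the sketch ignores the order in which a trajectory visits the two locations.

```python
def candidate_trajectories(trajectories, current, destination):
    """Return the names of trajectories that pass through both the
    current location and the destination."""
    # Activating the destination stands in for the layer 6 burst.
    through_destination = {name for name, path in trajectories.items()
                           if destination in path}
    # The current location is already active in layer 6.
    through_current = {name for name, path in trajectories.items()
                       if current in path}
    # The overlap stands in for the layer 5 trajectories that get activated.
    return through_destination & through_current


# Hypothetical learned trajectories, each a sequence of location labels.
trajectories = {
    "reach forward": ["A1", "B1", "C1"],
    "reach diagonal": ["A1", "B2", "C3"],
    "sweep right": ["C1", "C2", "C3"],
}

candidates = candidate_trajectories(trajectories, "A1", "C3")
```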

A separate set of layer 6 grid mini-columns is doing something different, by accepting distal input from layers 2/3, which represent the current object being sensed. These grid cells represent the location on the object, without any memory of how it got to this location. These cells are specific to both the object and the location on the object. These grid cells project to layer 4, where they’re used to predict sensory features given the sensor’s current location and the object being sensed.
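The prediction step in that last sentence can be pictured as a lookup keyed by both the object and the location on it. This is only a sketch of the idea as I understand it; the dictionary, the object and location names, and `predict_feature` are hypothetical.

```python
def predict_feature(learned_features, obj, location):
    """Predict the sensory feature at a location on an object, or None if
    nothing has been learned for that (object, location) pair."""
    return learned_features.get((obj, location))


# Hypothetical learned associations: (object, location on object) -> feature.
learned_features = {
    ("coffee cup", "rim"): "curved edge",
    ("coffee cup", "side"): "handle",
    ("pen", "tip"): "point",
}

predicted = predict_feature(learned_features, "coffee cup", "rim")
```

The same location label on a different object gives a different prediction (or none), which matches the cells being specific to both the object and the location.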

I am interested in reading the paper you mentioned about animals finding their way home in the woods, if you would be willing to share the citation.