Follow-up question to Podcast 1 with Jeff (location, orientation, and attention)

I’ve given some thought to how the concept of something lower “rising to the top” of a hierarchy might be implemented. This is what I’ve come up with so far (please feel free to tear this apart if I am way off the mark).

The first thing that becomes clear is that there isn’t any obvious mechanism for literally pushing something lower in a hierarchy up through each level to the top (at least I couldn’t imagine one that seemed plausible).
Instead, the top of the hierarchy must have direct connections to each of the lower levels. This is of course
a deviation from the normal view of a hierarchy, so I’m open to criticism here.

Borrowing some ideas from the Global Workspace paper that @Bitking referenced in his Grids to Maps thread, you start with feed forward input traversing a hierarchy level by level in the traditional sense:

[image]

Next, you add direct connections from each level to the top of the hierarchy. The top node will be receiving anomalies from each level, and sending stimulation:

[image]

The signals will compete at the top node, and the most interesting/anomalous signal will be selected. The originating node will be stimulated. This stimulation will combine with the feed forward signal, and excite the node:

[image]

Each node will have lateral connections to other hierarchical branches across other modalities. When a node is excited, it will send a stronger signal from its lateral connections. In this example, let’s imagine this node in the hierarchy is related to sensory input from your hand, and the anomaly was an unexpected bump on your favorite coffee cup:

[image]

The lateral signal will recruit nodes from other hierarchies and modalities. In this case, let’s assume there is a connection with a hierarchy related to sensory input from your eyes. Whatever the eyes were subconsciously attending to before will be overruled, and they will now be recruited to help resolve the anomaly with the coffee cup:

[image]

At this point, the global (conscious) attention has shifted to the coffee cup, and now coordinate spaces across the various sensors involved are all in relation to the cup.
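To make the walkthrough concrete, here is a minimal toy sketch of the selection and recruitment steps. Everything in it (the anomaly scores, the winner-take-all rule, the `Node` class) is my own invention for illustration, not a claim about the biology:

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.lateral = []          # nodes in other hierarchies / modalities
        self.attending_to = None   # current focus of this node

    def stimulate(self, context):
        # Excitation from the top node: adopt the context and broadcast laterally.
        self.attending_to = context
        for other in self.lateral:
            other.attending_to = context   # recruit other modalities

def workspace_step(nodes, anomaly_scores, threshold=0.5):
    """Winner-take-all: stimulate the node with the most anomalous signal."""
    winner = max(nodes, key=lambda n: anomaly_scores[n.name])
    if anomaly_scores[winner.name] > threshold:
        winner.stimulate(context=winner.name)
    return winner

# Example: an unexpected bump felt by the hand recruits the visual hierarchy.
hand = Node("hand/cup-bump")
eyes = Node("eyes")
hand.lateral.append(eyes)
workspace_step([hand, eyes], {"hand/cup-bump": 0.9, "eyes": 0.1})
print(eyes.attending_to)   # -> hand/cup-bump : the eyes are recruited to the cup
```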

2 Likes

Quick question, when you say “foot and hand hierarchies”, are you talking about a direct touch sensation only, or does it include anything motor related? For example does it include holding a toothpick in your hand and using it to examine the shape of the cup?

If I may, I’d like to offer a modification to the thought experiment of @Paul_Lamb that may get at one (or more) of the points that @jhawkins has made. Imagine that you are in a gym and are holding a basketball, and that somewhere in the gym is a hoop. Initially you know where you are with respect to the hoop, and could probably easily shoot the ball towards the hoop and come pretty close. Now, imagine that you are blindfolded and told to wander around the gym for a while (10, 20, 30 seconds), and then take a shot (or throw the ball towards the hoop). Chances are, the longer you wander, the worse your final aim will be, but odds are that you would still have some sense of the general direction of the hoop. It would be interesting to see if there is a significant difference depending on whether you were attending to the gym or the ball during the wandering, but I suspect that in both cases your aim would be much better than if someone were to wheel you around the gym in an office chair before taking the shot.

The point of the exercise is that there are many ways in which your brain can maintain spatial awareness, and it’s remarkably clever at picking up on subtle cues for maintaining relative orientations and positions. I often find that when I close my eyes and wander in an environment, I attend to audible cues as reference points. Sometimes, I imagine a visual representation of the audible sources embedded in my surroundings shifting as I move, essentially binding the sounds to locations. I do much the same thing when I’m wandering around my house in the dark at night. I find that I’m usually able to put out my hand to find the door frame within a few inches of where I expect it to be.

2 Likes

Now, I’d like to get to the crux of the problem that I have been trying to grapple with ever since the topic of grid cells was introduced: What is the mechanism that drives these cells to fire in the grid pattern? With only raw sensory input to go on (i.e. no explicit position information), how do the cells know they have arrived at a given location/orientation? I could understand it if a particular body pose (proprioceptive inputs) generated a certain SDR that happened to select for a given set of grid cells. However, if my understanding is correct, these same grid cells would still be active regardless of the body pose when the body returns to the previous location. I could also understand it if a given combination of environmental cues gave rise to an SDR that would also select for a location/orientation representation. But what causes the same cells to fire in such a regular pattern w.r.t. location (hexagons!) even if the environmental cues do not have a corresponding regularity?

I suppose it all comes down to path integration. It’s not the position, it’s the motion. Is there a network of cells that recognize temporal sequences corresponding to spatial translations and rotations? Are these transitions somehow translated into recurring patterns of grid cell activation?
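To illustrate what I mean by “it’s the motion”: here is a minimal sketch of path integration on a single periodic module. The square wrap-around, scale, and starting phase are assumptions for the toy (a real grid module would need a skewed/hexagonal basis to produce hexagonal firing fields), but it shows how the same representation can recur at the same place using motion alone:

```python
import numpy as np

scale = 1.0                      # spatial period of this grid module

def integrate(path):
    """Accumulate movement vectors and wrap onto the module's period."""
    phase = np.zeros(2)
    for step in path:
        phase = (phase + np.asarray(step)) % scale
    return phase

# Two different movement sequences that end at the same place produce the
# same phase, with no positional input at all -- only motion was used.
path_a = [(0.3, 0.1), (0.2, 0.4)]
path_b = [(0.1, 0.5), (0.4, 0.0)]
print(integrate(path_a), integrate(path_b))   # both -> [0.5 0.5]
```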

2 Likes

I’m not clear on this either, but in some sense I believe this has to be learned (since it applies to abstract dimensionless concepts as well as the more obviously spatial ones, and can be applied to weird physics like Portal or Paper Mario). Perhaps a clue can be found in another phenomenon that occurs if you reach into a black box to feel an object (or, in your example, when you take off your blindfold after wandering a bit around the gym). You started with one spatial sense (perhaps random in the case of the black box, or perhaps drifting from reality in the case of the gym). As you get more sensory clues from your movements (touching with your fingers, saccades from your eyes, etc.), you then recognize (based on the sequences of actions/inputs, or voting between sensors) a location on the object (or in the room) that you remember, and the spatial sense suddenly “snaps” to the one that is remembered.

This same “snapping” strategy could potentially be used to learn different paths to the same location on a novel object, room, or concept as well. The system might continue to build out a new random set of spatial information as it performs actions (likely relying on semantic similarities with other previously learned objects, rooms, or concepts, so probably not usually completely random). When it recognizes a location it has been to previously, it can snap back to those representations and associate them with the motor action performed at the previous location.
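Here is a toy sketch of that snapping behaviour, with an invented one-dimensional “object” and feature list purely for illustration: keep every candidate location consistent with what has been sensed so far, shift the candidates by each movement, and snap once only one remains.

```python
features = ["ridge", "smooth", "smooth", "ridge", "bump"]   # feature at each location
n = len(features)

def sense_and_move(candidates, sensed, move):
    """Keep candidates that predict the sensed feature, then apply the move."""
    return {(loc + move) % n for loc in candidates if features[loc] == sensed}

candidates = set(range(n))                                  # no idea where we are yet
candidates = sense_and_move(candidates, "ridge", move=1)    # consistent starts: 0 or 3
candidates = sense_and_move(candidates, "bump",  move=1)    # only the path 3 -> 4 fits
print(candidates)   # {0}: after the last move we know exactly where we are ("snap")
```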

1 Like

This is an area of active research. My favourite theory is Kropff and Treves (2008). They propose a spatial pooler (it’s not called that in their paper, though) with two additional mechanisms. One of the mechanisms causes grid cells to respond to large contiguous areas of their input; the other mechanism (named fatigue in their paper) shapes those large receptive fields into spheres. Then the spatial pooler’s competition packs these spheres in as tightly as possible, which on a two-dimensional plane yields a hexagonal grid.

Kropff and Treves, 2008: https://onlinelibrary.wiley.com/doi/abs/10.1002/hipo.20520
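As a rough illustration of the fatigue mechanism only (the time constants and input trace below are made up, and the competitive stage that actually packs the fields into a hexagonal grid is omitted): a fast activation variable tracks the input while a slower “inactivation” variable builds up and subtracts from it.

```python
b_act, b_inact = 0.1, 0.033         # fast activation, slower fatigue (b_inact < b_act)

def step(r_act, r_inact, h):
    r_act   += b_act   * (h - r_inact - r_act)   # respond to input minus fatigue
    r_inact += b_inact * (h - r_inact)           # fatigue slowly follows the input
    return r_act, r_inact

r_act = r_inact = 0.0
rates = []
for t in range(300):
    h = 1.0 if t < 150 else 0.0     # a long, sustained input, then silence
    r_act, r_inact = step(r_act, r_inact, h)
    rates.append(max(r_act, 0.0))   # simple threshold-linear firing rate

# The rate peaks shortly after input onset and then sags as fatigue builds up,
# which limits how long a cell can stay active as the animal moves through its
# receptive field and so keeps the firing fields compact.
print(round(max(rates[:50]), 3), round(rates[149], 3))
```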

My guess is a lot of the input to entorhinal cortex about walking is pre-processed so it doesn’t need to handle a huge sequence of exact muscle positions and movements. For example, it might get a subcortically produced movement direction or movement direction change signal.

The purpose of path integration is sort of to get rid of the sequence aspect. I don’t think it’s feasible to learn every single sequence between locations on an object. Those sequences don’t apply to every object because the solid object is in the way of many of them, so if it does path integration only by pooling sequences, it would take forever to learn.

Some theories use oscillations (with or without phase offsets that produce a travelling peak along the cortical sheet) to do path integration and form grid cells. Oscillations are periodic like grid cell response fields and can form hexagonal grids with multiple oscillations interfering.
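For a worked illustration of that last point: summing three plane waves whose directions differ by 60 degrees already gives a hexagonal arrangement of peaks, which is the basic construction behind the interference idea. The wave number and grid size here are arbitrary.

```python
import numpy as np

size, k = 60, 2 * np.pi / 15                     # sample grid, spatial wave number
y, x = np.mgrid[0:size, 0:size]
angles = [0, np.pi / 3, 2 * np.pi / 3]           # three directions 60 degrees apart
pattern = sum(np.cos(k * (x * np.cos(a) + y * np.sin(a))) for a in angles)

# Thresholding the summed waves leaves blobs arranged on a hexagonal lattice.
peaks = pattern > 2.0
print(peaks.sum(), "grid-field points out of", peaks.size)
```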

I’m extremely biased but I think the hypothesis I described in a post is close to the truth.

I think there needs to be some form of pure path integration. Otherwise, it would have to experience every path and pool them, like the old temporal pooler or union pooler was meant to pool sequences. It’s physically possible to do path integration without that, so I don’t think it does path integration by pooling sequences.

Or at least, that’s probably not the only way. There are probably multiple complementary methods of path integration at play. Some sort of automatic, sensory-insensitive version of path integration, and then some sort of flexible, sequence learning path integration (perhaps related to object behavior).

The automatic path integration brings you from point A to point B on a fairly routine journey. But then from point B to point C, you take an elevator, which the automatic system can’t handle. So you learn sequences from point B to point C. The automatic system gets you to point B and point C other ways, so you already know what those places look like, and you just need to learn that one sequence. As a more grounded example, moving your fingertip through the air leads to consistent transitions between locations, but with object behavior and not being able to phase through objects, you also need flexible, learned path integration.

There are like ten functional layers, twenty if you count what and where pathways separately, and thalamus, basal ganglia, and so on, so there’s plenty of room for multiple forms of path integration.

2 Likes

There is sure to be a great deal of pre-wiring involved, given that eons of generations have been born onto a planet with some stable physical parameters (I like to use the example of wildebeest infants, which are able to run from predators within hours of birth). I’m sure you are right that there are a lot of potential mechanisms that the brain can leverage (many of which have been around a lot longer than the neocortex).

1 Like

There is a mechanism for this. It is widely believed that the thalamus is integral to attention. The most important input to every region goes through relay cells in the thalamus. These cells have two modes of operation (burst and tonic), plus there is an inhibitory network in the thalamus. The relay cells can be switched between tonic and burst modes by either a top-down feedback signal from the receiving region or a very strong signal from the lower sending region. The idea is that an unexpected input causes the relay cells to attend to the unexpected input, and the higher region can direct attention as well. For example, I can tell you to attend to some area of your visual field (top down). Or, if something unexpected happens, your attention will automatically go there (bottom up); you can’t prevent it.
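A toy caricature of that two-route switching (the threshold and the winner-take-all rule here are invented purely for illustration, not part of the known anatomy): a relay can be pushed into emphasizing a channel either by a top-down selection from the receiving region or by an unusually strong bottom-up signal from the sender.

```python
def relay(bottom_up, top_down_select=None, burst_threshold=0.8):
    """Return the channel the relay passes on with emphasis, if any."""
    if top_down_select is not None:
        return top_down_select                      # top-down: "attend there"
    strongest = max(bottom_up, key=bottom_up.get)
    if bottom_up[strongest] >= burst_threshold:
        return strongest                            # bottom-up: can't prevent it
    return None                                     # nothing stands out; plain relay

signals = {"cup-bump": 0.9, "background": 0.2}
print(relay(signals))                               # -> cup-bump (bottom-up capture)
print(relay(signals, top_down_select="background")) # -> background (directed attention)
```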

This is an interesting question that we are still trying to understand. Switch to vision. Do the grid cells in a column in V1 represent the location of the eye in the space of the viewed object, or do they represent the location of the actual feature on the object? With touch it is easy to imagine that the location represented by a column in S1 is both the location of the skin and the sensed feature, but as you point out, you can touch something with a tool such as a toothpick. Do the grid cells represent the location of the finger or the location of the tip of the toothpick? I believe the cortex represents the location of the sensed feature and not the sense organ. This is cleaner and more powerful; however, it then begs the question, how does a column know where the sensed feature is? How does it know the location of the tip of the toothpick? We don’t know. We have some ideas but no answers yet.

4 Likes

Thanks, I just want to make sure I understand the mechanism you are describing here for the bottom-up route.

It sounds like you are describing the nodes in a hierarchy routing their most important input through the thalamus between levels of the hierarchy. Something like this:

[image]

When something sufficiently anomalous occurs (I’m assuming some competition here), the thalamus gates input from other regions:

[image]

And global attention shifts due to feedback from the top of the hierarchy cascading down (essentially the anomalous node gets an overwhelming vote, due to other input being blocked from traveling up the hierarchy):

[image]

Paul,
The mechanism is simpler than you are describing. Take two regions, where R1 projects to R2. The feed forward connections from R1 to R2 are routed through the thalamus. It appears that the thalamus plays a role in what part of the output of R1 is attended to by R2. What exactly attention is and what exactly the thalamus does when it passes on the signal is not known. Much of the anatomy and cellular mechanisms are known, but the function is not clear. If attention is related to the “burst vs tonic” modes of the relay cells, as some believe, then both top-down and bottom-up input to the thalamus can direct attention. I was only letting you know that bottom-up control of attention is both possible and that some of the mechanisms are known. I would recommend reading Murray Sherman’s book about the thalamus if you want more detail.
Jeff

4 Likes

Ok, so the thalamus is essentially responsible for establishing and/or enforcing the context (the thing that should be attended to). How specifically it does so is not entirely known (but in theory could involve something like the global workspace, or something else entirely).

I’ll play around with some of these ideas then and see what works (probably will deviate from the biology for now). I can always go back to the drawing board when more is understood about the mechanism.

Thanks again for taking the time to reply to my queries!

1 Like

So I thought I understood what you were describing, until I found mention of “Displacement Modules”, which I don’t understand. But I thought I’d share what I thought anyways since I think it’s interesting:

Layer 6 contains grid cells, which are organized into mini-columns. The mini-columns accept distal input from other layer 6 grid mini-columns, so that layer 6 forms a temporal memory, representing the current location in the context of the previous locations. Layer 6 would represent the current location in a trajectory of motion, if that makes sense? Then layer 5 would be doing temporal pooling over these layer 6 cells, which would cause layer 5 to represent the overall trajectory of motion. Layer 5 then projects to the muscles which drive that motion.

When the animal wants to go somewhere, the thalamus simply activates the location in layer 6 where it wants to go. The layer 6 mini-columns burst which represents every trajectory passing through the destination, which in turn activates the layer 5 actions which pass through both the current location and the destination.

A separate set of layer 6 grid mini-columns is doing something different, by accepting distal input from layers 2/3, which represent the current object being sensed. These grid cells represent the location on the object, without any memory of how it got to this location. These cells are specific to both the object and the location on the object. These grid cells project to layer 4, where they’re used to predict sensory features given the sensor’s current location and the object being sensed.
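A toy version of that goal-directed selection step, with invented trajectories and actions standing in for the layer 6 and layer 5 representations (this is just my reading of the idea, not a model anyone has proposed): activating a destination recovers every trajectory through it (the mini-column “burst”), and intersecting with the trajectories through the current location picks the action to take.

```python
trajectories = {                      # trajectory name -> locations it passes through
    "reach-left":  ["A", "B", "C"],
    "reach-right": ["A", "D", "E"],
    "sweep":       ["B", "C", "E"],
}
layer5_action = {"reach-left": "flex-elbow",
                 "reach-right": "extend-arm",
                 "sweep": "rotate-wrist"}

def trajectories_through(location):
    """Layer-6 'burst': every stored trajectory passing through a location."""
    return {name for name, locs in trajectories.items() if location in locs}

def choose_action(current, goal):
    """Layer-5 selection: actions whose trajectory contains both locations."""
    viable = trajectories_through(current) & trajectories_through(goal)
    return {layer5_action[t] for t in viable}

print(choose_action("A", "C"))   # -> {'flex-elbow'}: the trajectory through A and C
```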

I am interested in reading that paper which you mentioned about animals finding their way home in the woods, if you would be willing to share the citation.

Paul Lamb’s point about recognizing with one hand an object that that hand never felt before is not just a problem for later. I think we can draw some conclusions immediately, or at least raise some questions.
Let’s suppose that your finger traced out an object with a peculiar shape. Let’s also suppose that there are no lateral connections between the finger on your hand and your big toe. Later you are blindfolded, and trace an object with your big toe, and recognize it as having the same shape as the earlier object you traced with your finger.
The implication, if we believe Numenta’s theory, is that there is a higher-level representation somewhere, maybe more abstract, that has bidirectional connectivity with both the finger and the toe. It has to be bidirectional, because the toe has to send up some pattern that is in some way similar to the pattern that was sent up by the finger, and the abstract pattern has to tell the toe what it is feeling.
But that raises another question. What is similar about the two patterns - the one sent up by the toe, and the one sent up by the finger?
Is it the SDRs for the features being sensed? I would think not; no two SDRs are alike.
Is it the location vectors for the toe and the finger? Again, Numenta theory says no, in fact even adjacent fingers that connect to adjacent columns produce different location vectors for the same location on the same object. A big toe would also have different location vectors.

So this raises a problem.

The big toe pattern has to be similar in some way to the finger pattern. The big toe pattern is a paired location vector with a sensory vector. But if the features are not similar, or the location grid cell patterns are not similar, or both together are not similar, then how can that abstract pattern at a higher level know that the finger and the toe felt the same object?

Here is one possibility: perhaps the set of displacement vectors IS the same for both the finger and the toe. But according to Numenta theory, they are not - displacement vectors are dependent on the location vectors that they are computed from, and so are unique to each sensory patch!

I don’t really see this as a problem. I don’t talk much about the “what” and “where” pathways and how they converge, because I don’t know much about that yet, but if you consider the egocentric locations of your sensors, they should represent space in a comparable way.

1 Like

This has been an area of considerable attention. This post scratches at the surface of the answer.
While this was asked about letters in a sentence, some reflection should show that this is the visual form of the sensory-perception question you are asking.

There is a different and related answer to your question - your personal space. This is special as the area that you can reach and manipulate is learned as if it was part of your body. Coordinated movement in this space is learned as part of your sense of agency.

2 Likes

Matt, you mentioned a WHAT and a WHERE pathway, and you said the solution to my question had to do with them. To confess here: I’m trying to write a blog post on Numenta’s ideas of grid cells and compositionality and so forth, and I have a feeling I’m missing something. Basically, that discomfiting ‘feeling’ started with the idea that similar objects should have similar representations (which didn’t seem to be true if location space is different for every object), and then via a Bitking reply I came across Paul Lamb’s observation about feet versus hands. If the foot and hand sensory areas in the brain don’t have direct lateral communication, they probably only communicate via some level higher up (I assume). So that is the motivation for all my questions here (apart from interest, of course).

Do you have any source on using WHAT and WHERE pathways to find a commonality? I had thought that WHERE in Numenta’s theory is solved by layer 6 (at least WHERE in object space), and I had also thought that WHERE is solved in the hippocampus - in egocentric space. But now you are talking about the WHERE and WHAT pathways (which I have come across but only vaguely understand), which seem to supplement the other WHEREs.

To summarize: if the sensory/location paired layers that work with your fingers don’t communicate directly with the sensory/location paired layers of your toe, and if the representations from both are different, then you seem to be saying that there is an additional answer to this, which lies in the WHAT and WHERE pathways, which find a commonality between the two. I do not grok that yet.
Thanks.

To make this clear, this is not about the layers in a given section of cortex, this is about how the areas of the cortex are connected together with projecting fiber tracts.

Check this reference on the WHAT and WHERE streams; start with the seventh reference link:

1 Like

I have another perspective on this:

Consider the neocortex / thalamus. In the very beginning, it discovers it is connected to some new hardware that some people call a baby. It has all kinds of appendages the neocortex needs to learn. Some are fixed (nose, ears), others are flexible (arms, legs), and some become disconnected at some point (pacifier, diaper, floor, mother).

To the brain, my body is an object. Or a set of objects that interact with other objects. It must be. From the point of view of a central processing unit with connectivity to external input, there is no other way to process all this information but to treat it all the same way.

The grid cells and displacement cells that help me orient my fingertip must be the same kind as for the tool I am currently using (toothpick). The violin player who studies 4+ hours a day must have a brain that knows her violin almost literally as well as the back of her hand. To the soccer player, a football is part of his body, even when it is detached (at least as long as it is in his vicinity).

So this makes me wonder whether there is a difference between egocentric and allocentric space at all. You are not your body. You are a product of your brain, which considers the world as a collection of objects, including your body.

1 Like

Personal space is that region that you can interact with. Peripersonal space is special because you have learned how all of your senses interact with it.
As far as ego vs. allo goes: for the somatic-sensory sense, this is bound to your kinematic body chain, to your supporting structures, and to the “other end” of your vestibular sensory system. “Out there” is the frame of reference you orient to.

I do some animation, and kinematic figure animation is a hard bit to get right. One of the to-do items is to try running this with an ANN and see if I can get it to look more natural. This potential project has driven my attention to this aspect of neural networks for years. The project never gets done, but it has driven the reading of many books and papers.