Follow-up question to Podcast 1 with Jeff (location, orientation, and attention)

Thought I would start a thread to talk about some of the information from the first podcast (part 1 and part 2). There is one concept that I am still trying to wrap my head around, so hoping some further discussion will help to solidify things in my mind. It cuts to the core of “attention”, so it may not be something which has been sufficiently explored yet.

I’ll start with a thought experiment, which I think exemplifies the area where I am having trouble. (It is also something you can do IRL with the help of a friend, if you are having trouble visualizing the experience I am trying to convey)

Imagine yourself in a large room sitting in a typical swivel chair with casters, and you are holding your favorite coffee cup. At this point you are not focused on the cup, and are aware of the room around you. Your friend slowly spins your chair and pushes it around the room randomly. You are perfectly aware of where you are in the room and which direction you are facing at all times.

Now imagine while your friend continues spinning the chair and pushing it around, you start to explore the coffee cup with your index finger. You move your finger around the cup, changing directions. You are perfectly aware of where your finger is on the cup and which direction it is facing at all times. (Also assume that you do not have AADD and are able to explore the cup without being distracted by your friend and constantly shifting your attention)

Then you look up. Initially, you will have no idea where you are in the room or which direction you are facing. A quick glance or two will re-establish your location and orientation in the room. At the same time, you will have lost the location and orientation you had in the context of the finger and cup.

Where my confusion comes into play is that it seems you can be globally (across all sensors) attending to only one thing at a time (yes, it is possible to “multitask” by shifting focus quickly back and forth between multiple things, but you are still only attending to one thing at a time). The positions/orientations that you experience at any given time are globally in the space of that one thing. However, from the discussions on the podcast, it seems as though positions and orientations are theorized to be distributed all around the whole neocortex. If that were the case, then it would seem that my finger exploring the cup should have no bearing on the other sensors on my body which are experiencing the movement of the chair. Why do I lose my context in the room when I am exploring the cup with one sensor, and then lose my context on the cup when I attend to the room with completely different sensors?

Another way to think about this problem: if you explore a novel object with your right index finger, you can transfer the object to your left hand and immediately recognize it and make position/orientation inferences about it with a completely different finger that has never encountered it before.

The above thought experiments would seem to indicate that there is a global mechanism for things like position and direction, one that can refer to only one thing at a time, globally across all sensors. If that is the case, couldn’t a much smaller network of cells (perhaps even the same ones in the entorhinal cortex that are used for navigating a room) also be used for exploring any object with any sensor? I wonder if there have been any experiments to watch EC activity while exploring an object with a finger, to see if there is any similarity to exploring a room WRT location and head direction cells.


Also, when you talk on the phone while driving, your mental attention “goes into the phone” and you mostly become unaware of what is going on around you.
I see this as much the same thing as the thought experiment that you propose.


Or listen to a podcast while driving… :laughing:


I think that attention is a bottleneck in the brain. The fundamental issue is that in order to make any decision, you need to think about the evidence. This entails making the cortex think about the relevant things and not think about other, irrelevant things.

As long as you don’t have to actually make any decisions, then you can multi-task quite well.

As evidence for multitasking, I present Gandalf Playing Bagpipes and Riding a Unicycle:

In this video Gandalf doesn’t make any decisions. He’s certainly practised all of the skills he displays, and he’s probably planned out his song choices and his route in advance of when he needs to act on those decisions. If he got lost and needed to figure out where he was, he would likely lose his place in the song; and if he was paying too much attention to his song, he might miss his turn and become lost; but as is, he knows how to do all of these things without making decisions.

This is a virtuoso demonstration of muscle memory. There really is not much in the way of attention going on when exercising a well-learned task; at this point the cerebellum takes over.
It is also well known that if you try to think about the performance while doing it, you screw it up.

A slightly simpler version is typing from text or sight-reading music. At some level of mastery your mind can wander as you work and your training takes over. At least my mind does.


Thanks, @dmac, that actually illustrates quite well what is confusing me about the theory presented in the podcast. This is a nice example of temporal unfolding in action. Most of what the unicycler is doing is going as predicted, allowing him to comfortably shift his attention between the two tasks to deal with any minor surprises. As those shifts happen, the global position/direction context is also shifting on the fly.

The interesting point is that even in this example, the unicycler’s sense of position and direction is still globally tuned to what he is attending to at a given time (and like you mentioned, if he has to focus on one of them for a long time, he can find himself lost when he eventually switches his attention back to the other one).


Perhaps to state a more succinct question which directly addresses the source of my confusion:

If location/direction mechanisms are replicated across the entire neocortex, how is it possible that I can learn a novel object with my right hand, and then place my left hand into a black box and recognize the object even though those sensors have never encountered it before?


Well, one possible explanation could be that the low-level recognition is “simple features and location given by the entire posture/body chain,” and those features are processed by the usual hierarchy and assembled in the association cortex, as is usually explained by the most common neuroscience theories.

If the Numenta theories are going to replace this dogma it will be necessary to offer an alternative mechanism that explains your question as well as the entrenched theories do.


Thanks @Bitking that is a good point about hierarchy possibly being involved. But I still have some concerns with respect to room navigation versus object exploration.

It might help if I explain where in the podcast I started to have some concerns. It was during the “wormhole” discussion (in Part 2, I believe). Jeff and Matt were talking about the phenomenon that occurs when shifting attention between a cup and the logo on the cup. The object space (the sense of location, specifically) changed even though there was no physical movement involved. You can even do this exercise with your eyes closed, and still have exactly the same sense of the location space changing as attention shifts between cup and logo.

It occurred to me at this point that if I were doing the exercise with my right index finger while my left hand was doing whatever, I could hold my right index finger at one point on the cup, and pass the cup to my left hand, placing my left index finger next to my right finger, then release the cup from my right hand. I could then continue the exercise with my left hand without any disruption in my sense of object spaces, even though I switched to a completely different sensor. This is where I started to have some concerns about the theory of location / orientation being generated in a layer within the same region that is processing the sensory input.

It then occurred to me that if I were performing this exercise while sitting in a chair that someone was moving around the room, I would lose my sense of position in the room while I was attending to the cup. And when I shifted my attention to the room and regained my sense of position and direction there, I would simultaneously lose my sense of location and orientation with respect to the finger and cup.

At this point, it occurred to me that my sense of location and direction in a room feels exactly the same as my sense of location and orientation in an object space. I discussed this idea a couple of weeks ago on the forum when I talked about the sense of “projecting oneself into a video game”. However, I didn’t notice the conceptual conflict this idea has with current HTM theory until that “wormhole” discussion in the podcast.

The sense of space doesn’t seem to be related at all to which sensors are involved, only to what is being attended at a given point in time. Is this simply a mental illusion, or is it an indication that location and direction mechanisms for both navigating a room and exploring an object might actually be performed “somewhere else” by a common network of cells?

It is clear that attention is a major player here, so we definitely can’t rule this out as some global effect produced by the mechanisms behind attention (unrelated to the local object space generated by layers in the same region as the sensory input). Hopefully things will start to become clearer as the theory evolves.


This might get a bit sketchy without pictures and supporting papers so please bear with me.
Maybe this idea will help and maybe it will make things totally confusing: A lightning ball!
Imagine each arc as the activation of a single area/map, with the thalamus as the controlling element that projects the basic alpha drive to unify each map’s activity.

Now: At the same time, add in a pattern of local loops that join map-to-map in the hierarchy to the hub of each of the cortical lobes. Each arc in the picture is a bidirectional data connection between maps. They have been training each other in interpreting patterns your entire life. For recall/readout, you put in a pattern and a matching/transformed pattern is induced at the other end. This pair of patterns is what was paired when you experienced them before; these patterns are learned online as you go about your life. This association of patterns helps both ends match, as it restricts the recall to the pair that most closely matches what you are experiencing at the moment. This greatly reduces ambiguity.
The individual sensory streams that enter each primary sensory area start this hierarchy at the “bottom,” and the “top” of each hierarchy is joined at the hub of each lobe. In between are these banks of transformative maps that have experienced every single thing you have ever perceived, a lot of it learned in a single shot. The activations and connections form the higher-dimensional representation that is the interpretation of the perceived and remembered world.
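That pattern-pairing between maps resembles a classic bidirectional associative memory (BAM). Here is a minimal sketch under that analogy; the bipolar patterns and the `train_bam`/`recall` names are my own illustrative choices, not anything from the model described above:

```python
import numpy as np

def sgn(v):
    """Bipolar threshold; ties resolve to +1."""
    return np.where(v >= 0, 1.0, -1.0)

def train_bam(pairs):
    """Hebbian outer-product learning over bipolar (+1/-1) pattern pairs."""
    W = np.zeros((len(pairs[0][0]), len(pairs[0][1])))
    for x, y in pairs:
        W += np.outer(x, y)
    return W

def recall(W, x, steps=5):
    """Bounce activity between the two 'maps' until the pair settles."""
    x = np.asarray(x, dtype=float)
    for _ in range(steps):
        y = sgn(W.T @ x)   # pattern induced at the far end...
        x = sgn(W @ y)     # ...which in turn cleans up the near end
    return x, y

# Two learned pattern pairs ("experiences")
a = (np.array([1, 1, -1, -1]), np.array([1, -1, 1]))
b = (np.array([1, -1, 1, -1]), np.array([-1, 1, 1]))
W = train_bam([a, b])

# A noisy cue on one end still pulls out the paired pattern on the other,
# restricting recall to the closest-matching pair and reducing ambiguity.
x, y = recall(W, np.array([1, 1, -1, 1]))   # one bit flipped from a's input
```

Here the cooperative settling between the two ends is what does the disambiguation: each side constrains the other until a stored pair wins.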

The lizard-brain needs are fed in at the “bottom” of the forebrain hierarchy, and this also joins at the “top” of the sensory hierarchies at the hub level. This point is where the “global workspace” theory has the most significance. If you are not familiar with it, I highly recommend looking at it.

One of the areas closely associated with the hubs is the entorhinal cortex & hippocampus. This is the spatial sub-processor that relates “things to each other” in a 2D sheet. I am still thinking about how that turns into episodic memory but that is not important to this explanation.

As you may have read in my prior posts, I see the WHAT and WHERE streams terminating in the temporal lobe which is also directly connected to the highest levels of the various hubs. This is where your episodic memory is formed and is your “finished” perception of what is happening. This is what is fed to the inner-lizard as input to its simple deliberations.

These high-level connections all speak the lingua-franca of hex-grid coding (as they have all their lives) so the various representations are compatible with each other and the spatial sub-processor.

The spatial sub-processor has a rather limited capacity to represent things (current thinking is somewhere between 4 and 11 items), so its contents have to be switched as your focus of attention changes. Bidirectional connections feed back to the activation chains to pull the related maps into line as a cooperative relaxation process.

An interesting feature of this system is that the items in the spatial sub-processor can have just part of the set changed, which looks to the rest of the cortex as if you have switched to a related topic or level of an item in your mental focus.
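As a toy illustration of that limited-capacity switching, here is a sketch of a small attention buffer whose contents get swapped (wholly or in part) as focus shifts. The class, the four-slot capacity, and the least-recent eviction rule are all my own assumptions for illustration:

```python
from collections import OrderedDict

CAPACITY = 4   # within the 4-11 item range mentioned above

class SpatialBuffer:
    """A capacity-limited store of items and their relative locations."""
    def __init__(self):
        self.items = OrderedDict()   # item -> relative location

    def attend(self, item, location):
        """Bring an item into the buffer, evicting the oldest if full."""
        if item in self.items:
            self.items.move_to_end(item)   # refresh an existing item
        elif len(self.items) >= CAPACITY:
            self.items.popitem(last=False)  # evict the least-recent item
        self.items[item] = location

buf = SpatialBuffer()
for item, loc in [("door", (0, 5)), ("desk", (2, 1)),
                  ("window", (4, 3)), ("chair", (1, 1))]:
    buf.attend(item, loc)

buf.attend("cup", (1, 2))   # attention shifts to the cup...
# ..."door" (the oldest item) is pushed out of the buffer
```

Note that only one slot changed when the cup arrived; the rest of the set carried over, which matches the partial-swap behavior described above.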

Your inner lizard handles pulling on these hierarchy chains to switch attention, but this is invisible to the cortex, as there must be a cortical activation for it to be perceived in the temporal lobe; what the lizard does is all behind (well, below really) the scenes as far as the cortex is concerned. The lizard pulls on this through unfolding plans projected in nascent form to the forebrain.

I was going to do a post like the HTM - Hex-grids post on this material but this concept is so far from where everyone is here that I did not know if it was worth the effort; those posts take a huge effort. I am instead working on a program that will let me build up this system as a working model.

Too much?


Thanks, that analogy is actually quite good. It fits with how I perceive the experience as well.


I very much enjoy reading the posts, but agree that building it in software is a better use of your time. Once it’s modelled in code, each component’s behaviour can be more easily communicated, and it can even be used to drive visualisations.


On the contrary. There are separate areas of the brain dealing with riding a bike and with playing music, and according to the podcast each of those areas maintains its own location relative to the task which that area is doing. The ears have a location relative to the start of the song, and the legs have a location relative to the road. It seems that each area of the brain is just doing its own thing.

I think all of your thought experiments require attention and possibly memory. My understanding of grid cells is that they are an entirely unsupervised system and thus should be capable of functioning independently of other brain areas. I’m sure that attention can manipulate and interfere with cortical grid cells, but I also think that the grid cells can function just fine without any attention or memory.


Yes, agree that is what the podcast is describing and part of current HTM theory. My confusion comes from the fact that this view seems to be contradicted by the “wormhole” thought experiment in the podcast (which, if thought about more deeply, reveals that object space is global across all sensors in relation to what you are attending to)

Yes, I think this is a key point. The confusion will likely only be cleared by better understanding how attention works. I’m comfortable with saying that the HTM theory for location/orientation encoding across the neocortex could still be accurate, and that the mechanism of (conscious) attention happens at a higher level of abstraction than that.


Good conversation. I see two related, but different, problems being discussed. One is about attention, whether there is a global position and orientation, and the other is about how learning occurs across sensory areas, such as learning a cup with my left hand and recognizing it with my right hand. These are both topics we have discussed at length at Numenta so I can share some of our thinking, perhaps that will be useful.

First, attention. I like to distinguish between what “you” are aware of and what you are not aware of. I believe that most of what is happening in the neocortex is not available to introspection. There is a simple non-dualistic explanation for this. If we assume that only the representations at the top of the hierarchy are available for episodic memory and for verbal expression, then most of what happens in the cortex will be invisible to these mechanisms. When I drive a car, part of my cortex is attending to varied items on the road, making decisions, and taking actions. If this can be handled lower down in the cortex, then I will not be aware of this activity. At any moment I can direct my attention to this activity, and this causes it to rise to the top and be available for introspection and verbalization. Also, if the lower regions experience something that they can’t handle, that will force my attention (bottom-up) to these items. I will stop talking while I attend to the anomaly.

The thought experiment of being pushed in a chair is perhaps not the best to illustrate these principles. Instead, imagine that you are walking around the room and touching the cup. You can attend to the cup and then attend to the room and never get confused. We do this kind of thing all day long. By having someone push the chair, the normal mechanisms for keeping track of location (path integration based on motor commands) are lost. The only way to keep track of your location in the room is to constantly attend to what you are seeing. When you walk yourself, you don’t need to attend and you won’t get confused. So, I believe attention is occurring everywhere in the cortex. However, only one attended thing can make it to the top. That is what you are aware of; the rest is not available to introspection.

The second topic is how one finger can learn an object that another finger can then recognize. Or how can part of an object be learned with one finger and other parts of the object be learned by another? Or how can I learn an object by looking at it and then recognize the object by touch alone? We call this the “disjoint pooling” problem (not a great name). I believe I wrote about this in another forum post. Briefly, we don’t know how it occurs, but we have several ideas.

1) Information spreads laterally from a column that is getting input to adjacent columns that are not. This is documented, and the spread occurs in L3 and L5. I believe this is part of the solution. When column A learns something, it trains its neighbors if they are idle.

2) The other idea involves the hierarchy and attention. Imagine I have two hierarchies, one for my hand and one for my foot. The hierarchies are separate (disjoint) except that the topmost region is shared by both. I then learn a cup with my hand. This means all the regions in the hand hierarchy learn the cup, including the topmost region. The regions in the foot hierarchy have learned nothing. What happens if I try to infer the object by touching it with my foot? Unlike with the hand, this cannot be done low down in the foot hierarchy. What we do is attend to one part of the foot, say the big toe, and with attention move the toe, sense a feature, move the toe and sense a feature. These features will be something the toe can recognize, like an edge or a rounded surface. These basic features are passed all the way up the hierarchy to the shared region, where recognition occurs. You can’t do this without top-down attention to the toe. As I said, these are just ideas. The problem has a solution.
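Idea (1), lateral training of idle neighbors, can be caricatured in a few lines. This is only a toy sketch; the `Column` class and the spread rule are my own assumptions for illustration, not Numenta's actual mechanism:

```python
class Column:
    """A cortical column caricatured as a feature -> object lookup."""
    def __init__(self, name):
        self.name = name
        self.memory = {}

    def learn(self, feature, obj):
        self.memory[feature] = obj

    def recognize(self, feature):
        return self.memory.get(feature)

def sense(active_column, neighbors, feature, obj):
    """The active column learns; idle neighbors get trained laterally."""
    active_column.learn(feature, obj)
    for n in neighbors:
        if feature not in n.memory:   # neighbor is "idle" for this input
            n.learn(feature, obj)

# The right index finger's column learns the cup...
right = Column("right-index")
left = Column("left-index")
sense(right, [left], "curved-handle", "coffee cup")

# ...and the left finger's column now recognizes it on first touch.
print(left.recognize("curved-handle"))   # prints: coffee cup
```

The point of the sketch is only that recognition by a never-used sensor needs no hierarchy at all if the association was copied sideways at learning time.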


I particularly like this concept of “rising to the top” (it would explain a lot of things). I’ll have to do some thinking about how to actually implement a mechanism like that (it would require something different than a traditional view of hierarchy, I think). It seems like more of a “tangle” than a “blockchain” (a reference to IOTA).



I’ve given some thought to how the concept of something lower “rising to the top” of a hierarchy might be implemented. This is what I’ve come up with so far (please feel free to tear this apart if I am way off the mark).

The first thing that becomes clear is that there isn’t any obvious mechanism for literally pushing something lower in a hierarchy up through each level to the top (at least I couldn’t imagine one that seemed plausible). Instead, the top of the hierarchy must have direct connections to each of the lower levels. This is of course a deviation from the normal view of hierarchy, so I am open to criticism here.

Borrowing some ideas from the Global Workspace paper that @Bitking referenced in his Grids to Maps thread, you start with feed forward input traversing a hierarchy level by level in the traditional sense:


Next, you add direct connections from each level to the top of the hierarchy. The top node will be receiving anomalies from each level, and sending stimulation:


The signals will compete at the top node, and the most interesting/anomalous signal will be selected. The originating node will be stimulated. This stimulation will combine with the feed forward signal, and excite the node:


Each node will have lateral connections to other hierarchical branches across other modalities. When a node is excited, it will send a stronger signal from its lateral connections. In this example, let’s imagine this node in the hierarchy is related to sensory input from your hand, and the anomaly was an unexpected bump on your favorite coffee cup:


The lateral signal will recruit nodes from other hierarchies and modalities. In this case, let’s assume there is a connection with a hierarchy related to sensory input from your eyes. Whatever the eyes were subconsciously attending to before will be overruled, and they will now be recruited to help resolve the anomaly with the coffee cup:


At this point, the global (conscious) attention has shifted to the coffee cup, and now coordinate spaces across the various sensors involved are all in relation to the cup.
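The steps above can be sketched end to end. Everything here (the node names, the winner-take-all rule, the lateral links as plain strings) is hypothetical scaffolding to make the flow concrete, not a claim about how it is actually wired:

```python
class Node:
    """One node in a sensory hierarchy."""
    def __init__(self, name):
        self.name = name
        self.excited = False
        self.lateral = []   # names of nodes in other hierarchies/modalities

def attend(anomalies, nodes):
    """Top node: select the most anomalous signal across all levels,
    stimulate its source, and let the excited node recruit laterally."""
    winner = max(anomalies, key=anomalies.get)   # competition at the top
    nodes[winner].excited = True                 # top-down stimulation
    return winner, list(nodes[winner].lateral)   # lateral recruitment

# The hand hierarchy reports a large anomaly (the unexpected bump on the
# cup); the visual hierarchy was subconsciously attending elsewhere.
hand = Node("hand-L3")
eyes = Node("eyes-V2")
hand.lateral.append("eyes-V2")

winner, recruited = attend({"hand-L3": 0.9, "eyes-V2": 0.2},
                           {"hand-L3": hand, "eyes-V2": eyes})
# winner == "hand-L3"; the eyes are recruited to resolve the cup anomaly
```

In this toy version, the shift of global attention is just the change of `winner` plus the recruitment list, after which every recruited hierarchy would work in the coordinate space of the winning object.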


Quick question, when you say “foot and hand hierarchies”, are you talking about a direct touch sensation only, or does it include anything motor related? For example does it include holding a toothpick in your hand and using it to examine the shape of the cup?

If I may, I’d like to offer a modification to the thought experiment of @Paul_Lamb that may get at one (or more) of the points that @jhawkins has made. Imagine that you are in a gym holding a basketball, and that somewhere in the gym is a hoop. Initially you know where you are with respect to the hoop, and could probably easily shoot the ball toward the hoop and come pretty close. Now, imagine that you are blindfolded and told to wander around the gym for a while (10, 20, 30 seconds), and then take a shot (or throw the ball toward the hoop). Chances are, the longer you wander, the worse your final aim will be, but odds are that you would still have some sense of the general direction of the hoop. It would be interesting to see whether there is a significant difference depending on whether you were attending to the gym or the ball during the wandering, but I suspect that in both cases your aim would be much better than if someone were to wheel you around the gym in an office chair before you took the shot.

The point of the exercise is that there are many ways in which your brain can maintain spatial awareness, and it’s remarkably clever at picking up on subtle cues for maintaining relative orientations and positions. I often find that when I close my eyes and wander in an environment, that I attend to audible cues as reference points. Sometimes, I imagine a visual representation of the audible sources embedded in my surroundings shifting as I move, essentially binding the sounds to locations. I do much the same thing when I’m wandering around my house in the dark at night. I find that I’m usually able to put out my hand to find the door frame within a few inches of where I expect it to be.


Now, I’d like to get to the crux of the problem that I have been trying to grapple with ever since the topic of grid cells was introduced: What is the mechanism that drives these cells to fire in the grid pattern? With only raw sensory input to go on (i.e. no explicit position information), how do the cells know they have arrived at a given location/orientation? I could understand it if a particular body pose (proprioceptive inputs) generated a certain SDR that happened to select for a given set of grid cells. However, if my understanding is correct, these same grid cells would still be active, regardless of body pose, when the body returns to the previous location. I could also understand it if a given combination of environmental cues gave rise to an SDR that would also select for a location/orientation representation. But what causes the same cells to fire in such a regular pattern w.r.t. location (hexagons!) even if the environmental cues do not have a corresponding regularity?

I suppose it all comes down to path integration. It’s not the position, it’s the motion. Is there a network of cells that recognize temporal sequences corresponding to spatial translations and rotations? Are these transitions somehow translated into recurring patterns of grid cell activation?
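One standard idealization from the grid-cell literature (not HTM-specific) captures exactly that "it's the motion" intuition: integrate self-motion into a position estimate, then read it out through three plane waves 60° apart, which yields the hexagonal firing pattern with no environmental cues at all. A toy sketch, with all function names my own:

```python
import math

SPACING = 1.0   # grid spacing, arbitrary units

def grid_activity(x, y):
    """Sum of three plane waves 60 degrees apart -> hexagonal firing map."""
    k = 4 * math.pi / (math.sqrt(3) * SPACING)
    return sum(math.cos(k * (x * math.cos(t) + y * math.sin(t)))
               for t in (0.0, math.pi / 3, 2 * math.pi / 3))

def integrate(path):
    """Accumulate self-motion vectors (path integration); no sensory cues."""
    x = y = 0.0
    trace = []
    for dx, dy in path:
        x, y = x + dx, y + dy
        trace.append(grid_activity(x, y))
    return (x, y), trace

# A wandering path that happens to return to the start: the readout at the
# end matches the readout at the origin, regardless of the route taken.
path = [(0.3, 0.1), (0.2, -0.4), (-0.5, 0.3)]
pos, trace = integrate(path)
# pos is (approximately) (0, 0) and trace[-1] is the peak activity 3.0
```

The regularity comes entirely from the periodic readout of an integrated motion signal, which is why the firing pattern can be hexagonal even when the environment has no corresponding regularity; the open question this sidesteps is how real networks of cells implement the integration and keep it from drifting.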