An Apical Depolarization for Numenta: How to Generate the Allocentric Location Signal

Hi all,

My name is Daniel Rehman, and I am a junior at Washington State University in Vancouver, studying Computer Science with a minor in Neuroscience. I have been following Numenta’s development of its theory of sensory-motor inference, and I have been especially intrigued by a missing piece of Numenta’s currently disclosed theory: the generation of the allocentric location signal.

Over the last 6 months, I have immersed myself in the field of neuroscience, and after a long and hard process of reading neuroscience papers, thinking about neuroscience, and drawing things on whiteboards, I have created a theory for how the allocentric location signal might be generated in the neocortex.

A PDF document that explains the theory in detail is linked at the end of this post, and I hope it is accessible to as many people as possible. I am also eager to hear any feedback you might have after reading the document.

I hope this theory is helpful on the path to understanding the neocortex and creating true machine intelligence!

Daniel Rehman.


This is great, I love seeing stuff like this. Just to be clear that this is a community expansion of Numenta’s HTM theory, I’m moving this from #htm-theory into #htm-theory:tangential-theories.

Ah, yes that move makes sense, thanks for the heads up!

Hi Daniel

Thanks for this thoroughly professional walkthrough. I would like to write you an email, so would you give me your email?
Finn Gilling

Great work Daniel!

I wish I had the time and desire to arrange my ideas in such a proper scientific form.
All I can do is argue. :slight_smile:

"An alleged scientific discovery has no merit unless it can be explained to a barmaid."
Ernest Rutherford.

What’s good about the scientific form of writing is that you have to present the facts first; argument and explanation can follow. That’s why it is useful to read scientific material even from authors whose ideas you don’t completely agree with.

Can you please make an “explanation for a barmaid” of your work? How the allocentric location signal might be generated in the neocortex?


Yeah, I’ll give you my email, although it should be noted that messages on the forum are my preferred way of communicating. My email is:

Thank you for your kind words!

Yes, I might be able to write up a simplified version of the document. It might be a little while, as I am currently working on a couple of other documents on my newest theories about the neocortex and hippocampus. I will try to get around to making that “explanation for a barmaid” document at some point. :slight_smile:


I read your paper, and watched your video, and have a question.

Suppose a person is sitting in a chair, and feeling the edges of a Rubik’s Cube that is resting on a table.
What is being predicted, and in what order?
There is FA, LA, FE, LE, m(FE), m(LE), m(FA), m(LA), which (oversimplified) are:

  1. object-feature (FA)
  2. object-orientation (LA)
  3. the orientation and position of the person’s hand (FE)
  4. the motor command from the person’s motor cortex (LE)
  5. the goal to be achieved (or the motor command to achieve the next movement, or the sum of all motor commands so far) m(FE)
  6. the full surroundings (the locations of all objects) m(LE)
  7. the object itself, independent of orientation m(FA)
  8. the orientation of the object m(LA)

You can see that this gets confusing. What does orientation mean anyway? How can you know the orientation of the object without incorporating the features of the object?
In the case of the person’s body, does orientation include space - that is - the distance of the object from the hand, as well as that his hand is at an angle?

What is the order of events? Do you first have a goal m(FE) and if so, how does it unfold? When you use the term FA, for instance, I’m not sure when ‘FA’ refers to the current edge or corner of the object being felt, or the predicted future edge or corner of the object that will be felt.
Plus, in the Numenta temporal pooler paper, there is no sequence prediction. There is no way of knowing what the user plans to touch next. All you know is that he touched several points at several locations, and from that, you can infer what the object was. Why would your model predict the new location - what basis does it have except that the hand was at a nearby location?

There are many symmetries in your theory, but I don’t think people can build a software model based on it, given what you’ve said so far.

You need to give a sequence:
For instance, does the person first reach for and touch a corner of the object, which means he knows FE and LE, and by touching finds out FA? If that’s the case, where does prediction come into the picture? What does he predict next - the location of his hand? Or the type of feature (edge, corner, texture) he is going to feel next?

We could also ask: are the predictions in mutual directions, for instance FA predicts LA and LA predicts FA? Or do you have FA at time t predicting LA at time (t+1)?
If you have any time to explain, it’s appreciated.

“What I cannot create, I do not understand.”—Richard Feynman


Thank you for your interest in my theory! I’ll gladly answer your questions.

The list you gave of the semantic meanings of each aspect of reality (e.g., FA, LE, m(FE), etc.) is very close; however, I see a couple of errors:

number. semantic meaning (another way to think of it) = SIGNIFIER,
–> {manifestation in the Rubik’s cube example}

------- predicted aspects --------

  1. an object feature, (part of an object’s abstract model) = FA,
    –> {feeling an edge on the cube}

  2. an object location, (part of an object’s orientation) = LA
    –> {knowing that the edge is on the red face, on the bottom left corner area on the cube}

  3. part of the body’s orientation, state, and position (a new or previous motor command) = FE
    –> {the previous motor movement of my left hand, which caused it to become closer to the cube sitting on the table}

  4. an object’s location in space, (not a motor command, but a location) = LE
    –> {knowing that the red side of the cube is 20 cm and 10 degrees away from the tip of my hand, for example}

------- modelled aspects --------

  1. the current or desired state of the body, i.e., the body’s entire orientation, position, etc. (note: when this state is the new desired state, this is the goal) = m(FE)
    –> {the current model of my left arm, which maybe posits that my forearm is at a 30 degree angle from my chest, and my hand is oriented 20 degrees counterclockwise to my forearm}

  2. the full surroundings (the location of all objects being modelled) = m(LE)
    –> {the model of the location of every object in the room, like the chair, the Rubik’s Cube, the table, the walls, all smushed together into a single model of the location of my surroundings.}

  3. the full model of the object, independent of orientation or location in space = m(FA)
    –> {the Rubik’s Cube thought of in its entirety: every unique somatosensory feature-location pair I could feel on every side of the cube.}

  4. the full model of the orientation of the object = m(LA)
    –> {the cube’s current orientation in space: the fact that the cube is sitting with the blue side downwards, the red side facing me, and the yellow side facing left.}

With this said, most of your understanding of these aspects of reality and their names was actually correct; this list just corrects a couple of errors.

An important note about how I organized this list: I separated the predicted aspects from the modelled aspects. This separation matters for answering your initial question of which aspects are predicted, when, and in what order.

Now for your questions:

question: What does orientation mean anyway?
answer: Other than the intuitive definition of orientation, it’s hard to answer this cleanly. It’s sort of like taking a Rubik’s Cube and simply rotating it in space relative to you: if you aren’t changing its location in space, you are changing its orientation in space. It is this orientation that is “m(LA)”.

question: How can you know the orientation of an object without incorporating the features of the object?
answer: You are exactly right: you can’t. Knowing the previously predicted object feature is imperative to predicting a piece of the new orientation of the object. (Note: “a piece of the new orientation of the object” is identical to an allocentric location on an object.)

question: Does orientation include (egocentric) space?
answer: No, it does not. Space, or in other words location in space, is handled exclusively by LE-predicting or m(LE)-modelling layers; orientation is merely an object’s “rotation in space” (more or less), as described earlier. You noted the angle and distance an object is away from you; this is actually a good rough idea of what LE is predicting, or what m(LE) is modelling. (Note: this is not exactly how brains deal with egocentric locations, however.)

question: What is the order of events? Do you first have a goal m(FE)? How does it unfold?
answer: This is a really good question. I didn’t talk about this in the paper at all, but I’d say there is actually very little complicated “order” to it (such as FE, then LE, then LA, etc.). I have a hypothesis that the predictions and models happening in the CT modules in both E and A regions are simultaneously the first things to be modelled or predicted in the process of developing a model of the world. In fact, it may even be silly to think of the modeling and predicting happening in an E.CTmod or an A.CTmod as separate. They are actually intimately dependent on each other’s activity, and one can only succeed at the moment the other succeeds.

So in other words, in the case of prediction, FE and LA must successfully predict their inputs at the same time, and only then can FA or LE predictions start to be made. (Take this with a grain of salt, however, because it is based simply on my understanding of the theory, not on any neuroscience evidence or software simulation.)

This situation is further complicated by the activity of the pooling layers, or in other words, the modelling of the aspects of reality, like m(FE), m(LA), etc. I postulate that the modelling only becomes successful after a sufficient number of successful predictions are made about the inputs (after LA, FE, FA, or LE aspects are successfully predicted), OR after help (apical depolarizations) from the parent region allows a stable model of a particular aspect of reality to coalesce.
Either of these allows a stable model of the inputs to be made.

question: Does FA refer to the current edge on an object being felt, or the predicted future edge that will be felt?
answer: In the case of the output of layer A.4 (or A.3b-alpha), always the former: it represents the current edge being sensed. In order to develop this FA at timestep {t} (i.e., the current feature on an object), you need to do the following. This is given specifically for layer 4; however, layer 3b-alpha is not much different.

  1. At timestep {t-1}, you need to develop a prediction of which allocentric location the FA at {t} will appear at on the object. This information is simply the output of A.6a (in the case of A.4), which, as we know, is responsible for producing the allocentric location of the feature being sensed. For the sake of example, we will assume that A.6a has successfully predicted the allocentric location of the sensory feature that occurred at {t-2}. We can now imagine that this new activity of A.6a causes a distal depolarization in layer A.4. This “allocentric location of the previous feature” is, for some non-intuitive reason, the prediction of the allocentric location of the feature that will occur at the next timestep, {t}. I am assuming there must be some learnable translation between “the allocentric location of the previous feature” and “the predicted allocentric location of the newly arriving feature”, which might be learned through the exact dendritic inputs that A.6a gives to the distal dendrites of A.4. (This is the part where I’m not really sure, to be perfectly honest.)

  2. At timestep {t}, now that this prediction of the allocentric location of the newly arriving feature has been made (through A.6a distally depolarizing cells in A.4), the moment the new sensory feature arrives proximally in layer 4, it is made more specific by the current distal depolarizations from 6a, through what I call “Competitive Ion Update Inhibition”. The resultant set of cells that received both proximal input about the sensory feature and the LA distal depolarization is the current allocentric feature being perceived, or in other words, the currently predicted allocentric feature.
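The two steps above can be sketched as a toy Python program. This is purely illustrative: SDRs are modeled as sets of active cell indices, and the function names and the dict-based “dendrite” map are my own simplifications, not structures from the paper.

```python
# Step 1: A.6a's allocentric-location SDR at {t-1} distally depolarizes
# cells in A.4. `learned_map` stands in for the hypothesized learnable
# translation from "location of the previous feature" to "cells predicted
# for the newly arriving feature".
def distal_depolarize(location_sdr, learned_map):
    depolarized = set()
    for bit in location_sdr:
        depolarized |= learned_map.get(bit, set())
    return depolarized

# Step 2: when the new sensory feature arrives proximally at {t}, cells
# that are both proximally driven and distally depolarized win out,
# yielding the currently predicted allocentric feature (FA).
def competitive_inhibition(proximal_sdr, depolarized):
    winners = proximal_sdr & depolarized
    # With no prediction at all, fall back to the raw proximal input.
    return winners if winners else proximal_sdr

# Hypothetical toy run: location bits 0 and 1 depolarize cells 10-12; the
# arriving feature drives cells {11, 12, 99}; the resulting FA is {11, 12}.
fa = competitive_inhibition({11, 12, 99},
                            distal_depolarize({0, 1}, {0: {10, 11}, 1: {12}}))
```

The set intersection is a stand-in for the “Competitive Ion Update Inhibition” described above; the real mechanism is presumably more nuanced than a hard intersection.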

question: Are the predictions in mutual directions? For instance, does FA predict LA, and LA predict FA?
answer: Not quite. Each layer that works with FA or LA (let’s use A.4 and A.6a, respectively) predicts its own future semantic activity (semantic activity meaning an SDR that represents an FA, or an SDR that represents an LA). Each layer makes predictions about its own future activity using the previous activity of neighboring layers; for instance, layer 4 would use the previous activity of 6a, which deals with LA.

question: Do you have an FA at time {t} predicting an LA at time {t+1}?
answer: Yes, technically. If 4 produces an FA at {t}, then at {t} that FA arrives distally at 6a. This distal input can be thought of as PART of the prediction of the LA that will be produced at time {t+1}.

Moving on to your observation about Numenta’s temporal pooler, you are completely correct: the temporal pooling algorithm has no sequence memory/prediction functionality. It is unable to determine what the user plans to touch next; it can only build a cohesive model of what is currently being touched.

I think the confusion here stems from a misunderstanding of the place of temporal pooler functionality in my theory: it is not used to predict the new (allocentric) location on an object. Rather, that is done by an inference layer, specifically A.6a or A.5b-alpha. Temporal pooling layers, on the other hand, are simply in charge of smushing all the allocentric locations that have ever been felt on an object into a single model: the orientation of the object.

I hope that helps in understanding this part of the theory. I’d be happy to answer additional questions about it if it still doesn’t make sense.

It’s funny you say that people probably can’t build a software model based on this theory. I am currently in the process of finishing the architecture and code for a piece of software that will simulate every aspect of my theory in detail. I will be posting it to the forums quite soon.

Very nice Feynman quote at the end. :slight_smile:


Hey Daniel. All of this is very intriguing. I’m very interested in attempting to understand it. However, your notation is pretty dense, and the explanation is quite long.

Do you have a brief layman’s explanation you can give? The sort of thing that might fit in a paper abstract, or a press release?


Thanks for the answers. One other question before I re-read the article. You say pooling is used just to create the big picture, whether it be a model of where everything is around you (m(LE)), the current state of your body (m(FE), the position/orientation of your body relative, I suppose, to your surroundings), or the full model of the object (m(FA)).

So then, apart from that pooling, everything else is ‘inference’. By that, I suppose you mean the kind of learning in Numenta’s temporal memory, where the states of cells at t-1 predict the state of a cell they feed into at time ‘t’.

To learn a sequential pattern like that, in the case of a motor exploration, I would think you would be assuming a plan by the person exploring. In other words, the person decides he is going to reach for a Rubik’s cube, and turn it clockwise 4 times. If he just feels it at random, then I don’t see how he could predict what he is going to feel next. I do see how, if the cortex already has a general guess of what object the person’s hand is feeling, and has already decided where to move the hand next, that it could predict the next feature.

I read once that in the case of exploration by saccades, there is a non-random exploration - people generally look at certain parts of an image in a particular order. In that case, sequential prediction might work.

Jake Bruce asked for a very brief explanation of what you accomplished. Would it be correct to summarize it by saying that you are putting the exploration and identification of an object into a context that includes the person doing the exploring, and the local environment of the person and the object?

Here is an interesting paragraph from an abstract I just found on the internet:
Prediction is well known to occur in both saccades and pursuit movements and is likely to depend on some kind of internal visual model as the basis for this prediction. However, most evidence comes from controlled laboratory studies using simple paradigms. In this study, we examine eye movements made in the context of demanding natural behavior, while playing squash. We show that prediction is a pervasive component of gaze behavior in this context. We show in addition that these predictive movements are extraordinarily precise and operate continuously in time across multiple trajectories and multiple movements. This suggests that prediction is based on complex dynamic visual models of the way that balls move, accumulated over extensive experience. Since eye, head, arm, and body movements all co-occur, it seems likely that a common internal model of predicted visual state is shared by different effectors to allow flexible coordination patterns. It is generally agreed that internal models are responsible for predicting future sensory state for control of body movements. The present work suggests that model-based prediction is likely to be a pervasive component in natural gaze control as well.


This paper is fantastic!


Just to elaborate a bit on what would be useful (not just for me, but for solidifying the theory and communicating it to others in the future). If this summary is correct, it’s actually not that useful. It’s kind of a restatement of the definition of an allocentric location signal.

What I would prefer is a statement in engineering terms, along the following lines (I’m not sure if this is at all relevant to the theory, but it’s the kind of thing that would satisfy people looking for a brief explanation):

This theory proposes that the allocentric location signal is computed by fusing two sources of information using an SDR-intersection based multiplicative filter: 1) the current sensory input as encoded by the spatial pooler, and 2) a movement signal capable of transforming the current allocentric location estimate into the predicted location estimate at the next timestep, using a procedure described in this theory. In this way, the sensory input at time t corrects for drift caused by the repeated application of noisy motion signals.

Basically, I’m heavily biased by the robotics literature on SLAM (RatSLAM uses grid-cell-like techniques for map subdivision), and I expect that any solution to the allocentric localization problem will have a lot in common with SLAM techniques, since they solve precisely the same problem. Would it be reasonable to put your proposal into the language of SLAM, Daniel?
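The hypothetical predict-then-correct fusion described above could be sketched like this. Everything here is illustrative: SDRs are Python sets, and the dict-based movement transform and all names are made up for the sake of the example.

```python
# Predict: each active bit of the current allocentric location estimate
# votes, via a (learned) movement transform, for its successor bits.
def predict_location(location_sdr, motion_map):
    predicted = set()
    for bit in location_sdr:
        predicted |= motion_map.get(bit, set())
    return predicted

# Correct: the SDR-intersection "multiplicative filter" keeps only the
# predicted bits consistent with current sensory evidence, limiting the
# drift that repeated noisy motion updates would otherwise accumulate.
def correct(predicted_location, sensory_votes):
    agreed = predicted_location & sensory_votes
    return agreed if agreed else predicted_location  # no evidence: coast

# SLAM-style loop: `steps` is a list of (motion, observation) pairs, and
# `motion_maps[motion]` is that movement's learned transform.
def localize(initial, steps, motion_maps):
    estimate = set(initial)
    for motion, obs in steps:
        estimate = correct(predict_location(estimate, motion_maps[motion]), obs)
    return estimate
```

This is just the generic prediction/correction skeleton shared by SLAM-style localizers; whether the theory’s layers actually decompose this way is exactly the question being asked.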


Hi Jake,

Thank you for your interest in my theory! I would say (although keep in mind this is a rough approximation of what I will post later on) that if I had to summarize the theory in one long sentence, it would be the following:

This theory hypothesizes a way in which objects, their location in space, their orientation in space, and the body’s state are all modeled and inferred, which results in a method for modeling objects abstractly, as well as for predicting and producing motor movements based on the location and orientation of objects in space, and finally it introduces a mechanism for goal-oriented behavior, and for the production of those goals, based on the modeling of objects and their orientations, the surroundings, and the body’s current state.

Moving on to your comments about RatSLAM: I had not heard of SLAM prior to this; however, from what I understand about the grid cell techniques that were used, I hypothesize that it is logically identical to the pooling layer located in an egocentric layer 2/3a, as well as possibly 3b-beta. This is because these layers within an egocentric region are hypothesized to achieve the exact same functionality as grid cells. Given what I know about my theory, I have concluded that the process of modeling egocentric locations in space into a cohesive map of the environment (or the surroundings, if you will) is actually nowhere near a solution to the allocentric location signal generation problem.

I realize that this may be a controversial view to take on this forum, but I have come to that conclusion mainly because I hypothesize that the act of modeling the locations of objects in space is completely useless to the act of inferring features on an object at that modeled location.

In other words, egocentric location modeling does not have any direct effect on inferring allocentric features on objects; only the inferring of allocentric locations does. It is hypothesized that this inferring, and subsequent production, of allocentric locations is done in layer 6a in an allocentric region (not an egocentric region), and does not involve grid cell functionality at all (which is equivalent to egocentric CC module pooling layer functionality). It actually involves inference layer functionality, specifically allocentric CT module inference layer functionality.

(It should be noted that in most egocentric regions of the cortex, the phenomenon of grid cells is not found; it is really only found in the medial entorhinal cortex. I hypothesize this is because this type (see {1}) of egocentric location modeling was a necessity for the functionality of the hippocampus, but not for all other cortical tissue, which used a different type (see {2}) of egocentric location modeling, which can pretty much be thought of as the typical or normal type.)

To explain the difference between the two hypothesized types of LE modeling in the brain (which are functionally equivalent), I like to draw an analogy with the difference between a basic scalar encoder and a random distributed scalar encoder (RDSE).

If you are familiar with these two ways of representing scalar values, you know they are functionally equivalent (for most usage). The basic encoder takes a very simple but intuitive approach: a contiguous array of consecutive on-bits, which has an obvious way of producing semantically similar values. Just shift the position of the on-bits to a nearby position, and you have a semantically similar scalar value, because a decent number of bits will still overlap between the given value and the new one.

There is another, less obvious way to construct a scalar encoder: simply preserve the fact that semantically similar scalar values should have encodings with a high overlap score, and then represent any scalar value however you feel is adequate, as long as it adheres to that rule.

Given this analogy for the two ways scalars may be encoded, we can apply the same idea to the modeling of locations of objects.

{1}: this is the “basic scalar encoder” model, where locations that are near each other in space are represented by cells that are physically near each other. This is what grid cells do.
{2}: this is the RDSE technique: represent locations in whichever way is most convenient, as long as the representation satisfies the property that similar locations have representations with high overlap scores. This is achieved by the general pooling layer functionality in the brain; in the case of modeling locations, it is done by a pooling layer located in the CC module of an egocentric region. It should be noted that this particular pooling layer is identical in functionality to all other isocortex pooling layers in the brain, whether in an A or an E region.
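The encoder analogy above can be made concrete. Below is a minimal sketch of both encoders (parameters and helper names are my own choices, not NuPIC’s exact API): the basic encoder slides a contiguous block of on-bits, while the RDSE-style encoder scatters its bits but still gives adjacent buckets w-1 bits of overlap.

```python
import random

W = 11  # on-bits per encoding

def basic_encode(value, n=100, w=W, vmin=0.0, vmax=50.0):
    """Basic scalar encoder ({1} in the analogy): a contiguous block of
    on-bits slides with the value, so nearby values overlap heavily."""
    start = int(round((value - vmin) / (vmax - vmin) * (n - w)))
    return set(range(start, start + w))

def make_rdse(num_buckets=60, w=W, bitspace=400, seed=42):
    """RDSE-style encoder ({2} in the analogy): bucket i takes a sliding
    window over a list of randomly chosen bits, so adjacent buckets still
    share w-1 bits even though the bits themselves are scattered."""
    rng = random.Random(seed)
    bits = rng.sample(range(bitspace), num_buckets + w - 1)
    return [set(bits[i:i + w]) for i in range(num_buckets)]

def rdse_encode(value, table, resolution=1.0):
    bucket = min(int(value / resolution), len(table) - 1)
    return table[bucket]

# Both encoders satisfy the same property: nearby values -> high overlap,
# distant values -> no overlap. Only the *placement* of the bits differs.
table = make_rdse()
assert len(basic_encode(10.0) & basic_encode(11.0)) >= W - 2
assert len(rdse_encode(10.0, table) & rdse_encode(11.0, table)) == W - 1
```

The point of the analogy carries over directly: grid cells are the “basic” encoder for locations, while a generic pooling layer is free to be the “RDSE” version, since only the overlap property matters downstream.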

I hope this clears up some confusion about the theory and how it relates to grid cells!


Thank you for the very kind words! I hope everything made sense in the paper! If you have any questions, I’ll gladly answer them. :slight_smile:


Hmm. I’m afraid that leaves me more confused than I was when I started; I think I’ll have to make an effort to digest the notation in the paper.

Yes, @gidmeister, you are exactly right about the functionality of pooling layers in this theory.

  • when pooling layers are given LE information to be pooled, the result can be thought of as the current surroundings, or the environment the organism is in (medial entorhinal cortex grid cell functionality), or in short, m(LE).
  • when pooling layers are given FA information to be pooled, they produce a full model of an object.
  • when pooling layers are given LA information to be pooled (i.e., the location of a certain feature on an object), they produce the full orientation of an object in space.
  • and finally, when pooling layers are given FE information to be pooled (i.e., the previous motor commands that were executed, or something similar), they produce a current model of the body. (Or possibly a future model of the body; however, let’s just stick with saying the “current model” for now.)
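All four bullets above describe one shared operation applied to different inputs. As a purely illustrative simplification (set-union pooling stands in for whatever the real pooling mechanism is, and the class name is mine), that shared functionality might look like:

```python
# One pooling mechanism, four uses: feed it LE, FA, LA, or FE SDRs and the
# stable model it accumulates is m(LE), m(FA), m(LA), or m(FE) respectively.
class PoolingLayer:
    def __init__(self):
        self.model = set()  # the stable model being built, e.g. m(FA)

    def pool(self, input_sdr):
        """Smush one more input (e.g. one allocentric feature-location
        pair) into the running model; return the current stable model."""
        self.model |= set(input_sdr)
        return self.model

    def reset(self):
        """Start over when attention shifts to a new object or state."""
        self.model = set()

# Toy usage: pooling successive FA inputs across touches yields m(FA).
m_fa = PoolingLayer()
m_fa.pool({1, 2})
m_fa.pool({2, 3})  # model is now {1, 2, 3}
```

A plain union has no forgetting or sparsification, so it is only a stand-in for the point being made: the layer’s job is accumulation into a single stable representation, regardless of which aspect of reality it is fed.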

You are also right: according to my theory, everything else in the neocortex must be an inference layer. An inference layer achieves its inference by two means:

  • inferring which pattern will occur next using memory of which patterns preceded it in the past: this is the temporal memory algorithm.
  • inferring which pattern will occur next by using the allocentric location of the previous feature as the allocentric location of the new feature: this is the “layer 4” sensory-motor inference mechanism proposed by Numenta not too long ago.

It is important to note that, according to my theory, every single inference layer in the cortex infers its inputs using both of these methods simultaneously, in the same set of cells.
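As a toy illustration of that claim (set SDRs, dict-based segment maps, and all names are my own assumptions), the two prediction sources could share one cell population like this:

```python
# A cell becomes predictive if EITHER kind of distal segment is active:
# (a) segments fed by the layer's own previous activity (temporal memory),
# (b) segments fed by an allocentric location signal (the "layer 4"
#     sensory-motor mechanism). Both run in the same set of cells.
def predicted_cells(prev_activity, location_sdr,
                    sequence_segments, location_segments):
    predicted = set()
    for bit in prev_activity:
        predicted |= sequence_segments.get(bit, set())
    for bit in location_sdr:
        predicted |= location_segments.get(bit, set())
    return predicted
```

The union is the whole point of the sketch: a single population of cells can carry both prediction mechanisms at once, because each mechanism just contributes its own distal segments.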

Your information on the non-sequential nature of saccades is very good; this is the very problem that necessitates my theory, as opposed to inference based purely on sequences. However, the example you gave of saccades being partially sequential in nature is, I think, a red herring with respect to the actual problem of sensory-motor integration, which is that most of the time, movement and motor exploration are not sequential.

In order to resolve this problem and infer things correctly in the face of non-repeatable sequences of movements, my theory hypothesizes that you need to model four aspects of reality: objects (which is sort of the whole point of this), their orientation in space, their location in space, and the body’s current state.

The exact hypothesized cortical connections involved in utilizing these four aspects of reality (three, not counting objects themselves) are portrayed in the paper, as well as in the recent YouTube video. Those cortical connections/projections are, I think, the key to understanding how the cortex uses the four aspects of reality to model objects in the abstract when there are no repeatable sequences to use instead.


Ah, crappo. :frowning: Well, as you read the paper, if you have any questions about what my notation means, feel free to post them here, and I’ll try to answer to the best of my ability without getting into too much deep theory. Personally, if you haven’t already watched my recently posted YouTube video, I recommend watching that before reading the paper. It describes essentially what the paper describes, but in a more understandable, relatable, and less detailed format.


Thanks again for the answers. I think the basic confusion I have is this: Suppose we could build a blind robot that knows its surroundings, knows the positions of every joint in its robot arm that is holding on to an object, and knows how far that object is. Suppose it has already felt enough of the object to be almost sure of what it is.
So now what does the robot’s artificial cortex-layer predict? Does it predict its next motion? Does it predict what it’s going to feel after its next motion?
I apologize for not grasping this.


To help me comprehend your theory better, I’m going to attempt to write up an implementation (will be starting on it this weekend). The process of going from theory to implementation always leads to interesting questions and further evolution of the concepts, so it should be a good exercise.

I’m curious how much of your theory you (or other readers here) might have implemented so far? Besides the notes in Appendix B, any concerns that might have come up along the way? I have to admit there are still some elements that are fuzzy to me, but I’ll dive into the paper in more detail before I start asking a ton of questions :slight_smile: