Why Does the Neocortex Have Layers and Columns, A Theory of Learning the 3D Structure of the World



I don’t think it’s that simple. Don’t forget that after you have switched to a new object and begun exploring each of its features (or explored an unexplored area of a known object), each new input is also completely unexpected.


Not completely unexpected.
Once I have a finger on an unknown object, I have some set of [feeling, location] pairs that I can reasonably expect. Let’s say I have you place your finger, blindfolded, on something that seems to be a plastic surface. There is a range of feelings you wouldn’t be surprised to feel while moving your finger one inch to the right (more of the plastic surface, an edge, and so on); those would make you think you’re still touching the same object. If instead you felt something completely different (for example, soft fur), you would probably conclude that you had landed on a second object (a dog?).
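The bookkeeping behind “expected” versus “surprising” sensations can be sketched as a toy model. This is a minimal illustration, not the paper’s neural mechanism, and it assumes (hypothetically) that locations are integer coordinates in the object’s own reference frame and are already known to the sensor. Each learned object is a set of (location, feature) pairs, and a new sensation is “expected” if at least one object consistent with the earlier sensations also contains it. The object names and features are made up for the example.

```python
# Toy model: each known object is a set of (location, feature) pairs.
# Assumptions (hypothetical, for illustration only): locations are integer
# grid coordinates in the object's own reference frame and are already known
# to the sensor; features are plain strings. This is not the paper's neural
# mechanism, just the bookkeeping behind "expected" vs. "surprising" inputs.

KNOWN_OBJECTS = {
    "plastic_box": {((0, 0), "smooth_plastic"), ((1, 0), "smooth_plastic"), ((2, 0), "edge")},
    "dog": {((0, 0), "soft_fur"), ((1, 0), "soft_fur")},
}


def candidates(observations, objects=KNOWN_OBJECTS):
    """Names of learned objects that contain every observed (location, feature) pair."""
    return {name for name, pairs in objects.items()
            if all(obs in pairs for obs in observations)}


def is_expected(observations, new_obs, objects=KNOWN_OBJECTS):
    """True if new_obs fits at least one object still consistent with earlier observations."""
    return bool(candidates(list(observations) + [new_obs], objects))


seen = [((0, 0), "smooth_plastic")]
print(is_expected(seen, ((1, 0), "smooth_plastic")))  # True: same surface continues
print(is_expected(seen, ((1, 0), "soft_fur")))        # False: no learned object matches
```

Note that a model like this can only flag surprise relative to objects it has already learned; an empty candidate set by itself does not tell a new object apart from an unexplored part of a known one.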


Only based on other objects that you have previously learned, though (which the term “reasonably expect” implies). How does a new system learn those reasonable expectations? Also, what about objects that don’t conform to what is reasonably expected (maybe I have encountered a strange, plastic, furry mug…)?

I don’t necessarily disagree with you that the concept of “reset” may be necessary, but this does warrant some further exploration to see if it can be addressed another way.


In your paper you propose that some set of neurons sends location information that is the result of a complex calculation. Wouldn’t it be enough if the movement command (in the case of a finger) were sent instead? If we are exploring a tabletop surface, there are four simple commands: move left, right, forward, or backward, which the local column could handle without external computation.


The problem is that many sequences of movements can bring the sensor from one location to another, and it would have to learn the same things for each of those movements.
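A toy count makes the point concrete. The sketch below assumes a hypothetical tabletop grid with the four unit moves from the previous post and computes location by summing displacements (path integration). Anything keyed by raw movement sequence needs one entry per sequence, while anything keyed by location needs one entry per reachable point; the grid and move names are illustrative assumptions, not the paper’s model.

```python
from itertools import product

# Hypothetical unit moves on a tabletop grid (the four commands discussed above).
MOVES = {"left": (-1, 0), "right": (1, 0), "forward": (0, 1), "backward": (0, -1)}


def integrate(path, start=(0, 0)):
    """Path integration: add up each move's displacement to get the final location."""
    x, y = start
    for move in path:
        dx, dy = MOVES[move]
        x, y = x + dx, y + dy
    return (x, y)


# Every 4-step movement sequence that starts from the same point.
paths = list(product(MOVES, repeat=4))
endpoints = {integrate(p) for p in paths}

print(len(paths), "movement sequences")   # 256 movement sequences
print(len(endpoints), "distinct locations")  # 25 distinct locations
```

With only four steps, 256 sequences collapse onto 25 locations, and the number of sequences grows exponentially with path length while the number of locations grows far more slowly. That is why learning features against a computed location, rather than against the movement history, avoids relearning the same thing for every route.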


Probably not so simple. Most likely the commands are relative, not absolute. If you are right-handed, try writing left-to-right with your left hand: difficult. Now try writing right-to-left with your left hand: easy. This indicates that movements are coded as “medial-lateral” instead of “left-right”.
Also, “moving the finger along the edge” (an example of a codified command) is executed differently depending on the shape of the edge, the size of the object, and so on.
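The mirror-writing argument can be spelled out in a few lines. This is a toy illustration under one stated assumption: horizontal hand movement is encoded relative to the body midline (medial/lateral) rather than in external left/right coordinates. Under that assumption, the practiced case and the “easy” left-hand case share the same motor code, while the “difficult” case does not.

```python
# Toy illustration of the mirror-writing observation. Assumption: the motor
# system encodes horizontal hand movement relative to the body midline
# (medial/lateral) rather than in external left/right coordinates.


def to_body_relative(direction: str, hand: str) -> str:
    """Map an external direction ("left"/"right") to a body-relative one for the given hand."""
    if direction not in ("left", "right") or hand not in ("left", "right"):
        raise ValueError("direction and hand must be 'left' or 'right'")
    # Moving toward the same side as the hand means moving away from the midline.
    return "lateral" if direction == hand else "medial"


# Writing left-to-right moves the pen rightward; right-to-left moves it leftward.
cases = {
    ("right", "right"): "practiced: right hand, left-to-right",
    ("right", "left"): "difficult: left hand, left-to-right",
    ("left", "left"): "easy: left hand, right-to-left",
}
for (direction, hand), label in cases.items():
    print(f"{label}: {to_body_relative(direction, hand)}")
# practiced: right hand, left-to-right: lateral
# difficult: left hand, left-to-right: medial
# easy: left hand, right-to-left: lateral
```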


Apologies for resurrecting an old thread - the forum rules don’t explicitly discourage it and I only just got around to reading the paper. :grinning:

My question is related to these quotes on page 9:

Thus, allocentric locations must be computed at a fine granular level, and therefore, locations must be computed in a part of the brain where somatic topology is similarly granular. For touch, this suggests location computations are occurring throughout primary regions such as S1 and S2. The same arguments hold for primary visual regions as each patch of the retina observes different parts of objects.


These analogs, plus the fact that grid cells are phylogenetically older than the neocortex, lead us to hypothesize that the cellular mechanisms used by grid cells were preserved and replicated in the sub-granular layers of each cortical column.

Given that non-mammals appear to possess navigation abilities that include remembered landmarks, and assuming this requires an allocentric location system, is part of the hypothesis that these mechanisms have been similarly preserved in other phylogenetic branches? Alternatively, is it possible that the location computation abilities present in the old brain of the common ancestor are sophisticated enough that they need not require a neocortex?

Warning: I’m not a biologist so I may have misunderstood this part entirely.


The short answer to your question is, yes, we believe that the mechanisms used by grid cells existed prior to mammals. We have no idea how many phylogenetic branches use the same mechanism. But because the mechanism is so clever and somewhat complicated I would imagine it extends a long way back in time. (Didn’t dinosaurs have nests, and wouldn’t they need to know where they were and how to return?)

However, it is also likely that some animals have evolved different solutions to this problem. I don’t know, but I would guess that some insects that know where they are use different mechanisms.
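One property that makes the grid cell scheme “clever” can be shown with a toy analogue. This sketch is not the paper’s model: it assumes 1D integer positions and hypothetical module periods of 3, 4, and 5 (pairwise coprime), whereas real grid cells are 2D with continuous phases. Each module only knows the phase of the position within its own small period, yet together the modules give a unique code over a range equal to the product of the periods, and each module can update its phase from movement alone.

```python
# Toy 1D analogue of grid-cell modules. Assumptions (hypothetical): integer
# positions and pairwise coprime module periods. Real grid cells are 2D with
# continuous phases; this only shows the combinatorial idea.
from math import prod

PERIODS = (3, 4, 5)  # hypothetical module periods


def encode(x: int) -> tuple:
    """Each module reports only the phase of x within its own period."""
    return tuple(x % p for p in PERIODS)


def move(code: tuple, dx: int) -> tuple:
    """Path integration inside each module: shift its phase by the movement dx."""
    return tuple((phase + dx) % p for phase, p in zip(code, PERIODS))


capacity = prod(PERIODS)  # positions before the joint code repeats
codes = {encode(x) for x in range(capacity)}
print(capacity, len(codes))  # 60 60

# Updating phases by movement gives the same code as encoding the new position.
print(move(encode(7), 25) == encode(32))  # True
```

No module ever represents a position beyond its own period, yet the combined code distinguishes 60 positions, and movement updates it without any module knowing the absolute location.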


Thanks for the response, Jeff; that makes sense.

I found an interesting article on this: Neuronal implementation of hippocampal-mediated spatial behavior: a comparative evolutionary perspective.

In examining the similarities and differences between pigeon HF grid-like cells and rat entorhinal cells, they appear to have arrived at much the same view.


jimmyw, that was an excellent review paper; thank you for pointing it out. They speculate that the basic mechanisms an animal uses to know where it is and to navigate have been preserved from a common ancestor of birds and mammals, possibly going back as far as 300 million years. They looked at cell responses to explore this idea. As I said, I reached this hypothesis based on the fact that every animal has to solve this problem, and on the complexity and cleverness of the grid cell solution. It is hard to imagine that another solution exists that solves the same problem. It could, but it seems unlikely. Thanks again.



This paper is a pre-print posted in July. Is there a more up-to-date version coming?


The July date is the first date we posted to bioRxiv. The latest version, from September, is what will be downloaded. Click the Info/History tab to see the history. The final approved version from Frontiers is not posted, but it is very close to what is on bioRxiv.


If the objective is to represent 3D objects, then the CAD world has several methods for modeling these objects, using XYZ coordinates or, in your terms, lateral layers and implied columns. The thesis that 3D models make a good representation of the world is obvious given our ability to see things in 3D. Note also that if you think in terms of interpolation functions, curvature may be represented by three nodes along a column.
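The “three nodes” remark matches the standard quadratic (three-node) interpolation used in CAD and finite-element work: a second-degree Lagrange polynomial through three nodes has a constant second derivative, so three samples are enough to estimate curvature. The sketch below shows only that interpolation math; any mapping of nodes onto cortical columns is the speculation above, not something the code models. The sample nodes are arbitrary test values.

```python
import math


def quadratic_derivatives(xs, ys, x):
    """First and second derivative at x of the quadratic Lagrange interpolant through 3 nodes."""
    (x0, x1, x2), (y0, y1, y2) = xs, ys
    d0 = (x0 - x1) * (x0 - x2)
    d1 = (x1 - x0) * (x1 - x2)
    d2 = (x2 - x0) * (x2 - x1)
    # Derivatives of the Lagrange basis polynomials L_i(x), weighted by the node values.
    first = (y0 * (2 * x - x1 - x2) / d0
             + y1 * (2 * x - x0 - x2) / d1
             + y2 * (2 * x - x0 - x1) / d2)
    # The second derivative of a quadratic is constant along the element.
    second = 2 * (y0 / d0 + y1 / d1 + y2 / d2)
    return first, second


def curvature(xs, ys, x):
    """Curvature of the interpolated curve y(x): |y''| / (1 + y'^2)^(3/2)."""
    d1, d2 = quadratic_derivatives(xs, ys, x)
    return abs(d2) / (1 + d1 ** 2) ** 1.5


# Three unevenly spaced nodes sampled from y = x^2.
print(quadratic_derivatives((0, 1, 3), (0, 1, 9), 1))  # (2.0, 2.0)
print(round(curvature((0, 1, 3), (0, 1, 9), 1), 3))    # 0.179
```

Because the nodes lie on a parabola, the interpolant reproduces it exactly: slope 2 and second derivative 2 at x = 1.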


I find the cup example a little jarring when contrasted with what I know about localization of function in the brain.
This would imply that “cup” and “pen” are distributed across all sensory modalities; ALL instances of “cup” and “pen.”

It seems that this pushes too much parsing out to the primary sensory areas.

I do accept and welcome that there is a counter-flowing stream from “higher” areas that are cooperatively working to shape perception. (Examples: visual filling-in and the cocktail party effect in hearing.)

I expect the pressure/deformation primitives at the raw sensory area would be something more like “curved convex surface,” which is passed up to the associative areas to be combined with joint positions, texture, and temperature perception to form “large hot cylinder” and “small cool smooth oblong object,” with processing much further along handling naming and associative properties such as “white cup” and “blue pen.”

This distribution of function more closely matches what I have read about various deficits resulting from focal damage. I don’t have a reference ready at hand but am fairly certain that this could be confirmed with minimal effort.


Hi Mark,

I agree that this “seems” to “concentrate too much” on primary regions. This was a surprising result of our discovery, and it goes against what most neuroscientists would say, including myself in the past. That is one of the nice things about this theory: it is surprising and testable. In some sense, the discovery of “border ownership cells” is a solid validation of the theory. The observed shifting of receptive fields in V1 and V2 is also predicted by the theory. One can argue that border ownership and shifting receptive fields are achieved via hierarchical feedback, although that seems unlikely. Consider also that primary and secondary sensory regions are the largest in the cortex. For example, monkeys have a visual system equal to ours. In the macaque, V1 and V2 are about equal in size and represent about 25% of the entire cortex. One could naively say that 25% of everything a macaque knows about the world is stored in V1 and V2.

As we point out in the paper, each column can’t learn everything due to its limited input space, but they can learn a lot more than people currently imagine.


What is needed here is something like Shannon’s information theory, to provide a measure of what is represented and how much.

There is some sort of “representation” and “transformation” that happens as you pass from map to map but without some vocabulary and measure it is hard to talk about what is going on.

We are reduced to saying “a lot more than people currently imagine” without a means to quantify it.
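As a starting point for such a measure, the two basic quantities are easy to compute: the Shannon entropy of a distribution, H = Σ p·log2(1/p), and the raw capacity of a code, log2 of the number of distinct patterns it can form. The sketch assumes sparse binary codes with n bits of which exactly w are active; the values n = 2048 and w = 40 are illustrative assumptions commonly seen in HTM discussions, not figures from this thread. Capacity is only an upper bound on what a layer could distinguish, not a measure of what it has actually learned; quantifying the latter would need something like the mutual information between inputs and representations.

```python
import math
from collections import Counter


def entropy_bits(samples):
    """Shannon entropy H = sum p * log2(1/p) of the empirical distribution of samples."""
    counts = Counter(samples)
    total = sum(counts.values())
    return sum((c / total) * math.log2(total / c) for c in counts.values())


def sdr_capacity_bits(n, w):
    """log2 of the number of distinct binary codes with exactly w active bits out of n."""
    return math.log2(math.comb(n, w))


print(entropy_bits("aabb"))  # 1.0
print(entropy_bits("abcd"))  # 2.0
# Illustrative (assumed) sparse code: 2048 bits with 40 active.
print(f"{sdr_capacity_bits(2048, 40):.1f} bits")
```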


Hi all,

I have found an unsupervised method of forming viewpoint-invariant representations in the output layer’s mini-columns. The method is implemented and tested on random/fake data.

The full write-up is at:



Please discuss @dmac’s “Proof of concept Basa Ganglia” on the associated thread.



According to this, the “time scale” of each layer varies greatly across the visual cortex (from 20 ms to 10 s). Do you think it could be possible to reconcile it with these experimental results? I’ve been thinking about it, but I haven’t been able to.