Preliminary details about new theory work on sensory-motor inference

I am back from traveling. I will try to answer a few questions prior to our office hours tomorrow. I will work from Fergal’s list of questions.

We haven’t yet settled on the language to use for the new theoretical ideas and in the recording I didn’t define my terms carefully. A mini-column is about 100 neurons in a very skinny column that span all layers of a region. They are about 30um wide and 2.5mm tall. Mini-columns are a physical reality and we have proposed a function for them in the spatial pooler and temporal memory algorithms. The output of the SP is a set of active mini-columns. The new theory I describe in the video does not change anything about mini-columns. We assume they are still there and performing in the same fashion. What is new is we are modeling L4 and L3 cells in each mini-column, whereas the TM models just L3 cells. L3 cells get almost all their input from other L3 cells (which is what we need for sequence memory). L4 cells get 65% of their input from L6a, some from the equivalent “where” region, some from L2, a few from other L4 cells.

The TM requires a set of mini-columns to work. We typically use 2048 mini-columns, if you go much below that then some of the properties of SDRs start to fail. A couple of thousand mini-columns is the smallest amount of cortex that can actually perform TM. This is roughly equivalent to a patch of cortex 1.5mm x 1.5mm. We didn’t call this a “column” or anything else.

In the new theory we are sticking with the same basic dimensions, just adding more layers. I have been thinking of touch by imaging a small patch of sensors on the tips of each finger. Each patch would feed into a 2048 mini-column patch of cortex. I am pretending there are no other sensors on the hand. This is a simplification, but I believe it keeps the important attributes without throwing away anything essential.

So we now have multiple 2048 mini-column patches of cortex, one for each finger tip. We need a way to refer to them. In the recorded talk I just referred to them as “columns” but we may need a better term. These columns are roughly equivalent to barrel columns in rat.

The important attributes of the “column” are it receives a bundle of sensory input bits and that all these bits are run through a spatial pooler. There is no “topology” within the column, or put another way, all the mini-columns in the column are trying to inhibit all the other mini-columns. This makes it much easier to understand and model, yet allows us to build systems with multiple columns.

Yes. We are trying to understand a 1.5mm x 1.5mm patch of cortex with all layers. That is the goal. The hope is all the important functions of a cortical region are represented in this small patch. It is well known that cells in some layers send axons longer distances within their layer. This occurs in layers 3a, 2, 5, 6a, and 6b. The idea we are pursuing is that the representations in these layers can be a union of possible values and that the “inter-column” projections are a way for multiple columns to reach a consensus on the correct value.

I am sorry if I was confusing on this matter. The model assumes that L4 cells receive a “location” representation on their basal dendrites. Our current best guess is that this location SDR is coming from L6a (again L6a is 65% of the input to L4). L6a and L6b are massively interconnected to the equivalent layers in the equivalent region in the “where” pathway. The basic idea is cells in the “where” pathway represent a location in body or sensor space, this gets sent to L6a in the “what” region (the region we are concerned with), L6a converts this body-centric representation into an object-centric representation. Similarly, a location in object-centric representation is passed to L6b which is sent back to the where pathway. Somewhere along the way it gets converted to body-coordinates.

If you think about what it takes to move your fingers and to predict the next input you realize that the brain has to continually convert between body-centric to object-centric coordinates and vice-versa. This need has well known in the robotics community, all we are doing is bringing it to cortical theory and trying to understand the biological mechanisms.

There is a lot we don’t understand about the location coordinate transformation. In one possible implementation it requires a transformation of the sensory input as well as a transformation of the location representation. IF that is true I propose it is happening in L4 itself. Over the past few weeks i read some papers that suggest we learn objects in body-centric coordinates but then we mentally “rotate” our models to fit the sensory input. This is more than I can write here. We can leave it as transformations have to occur continually and rapidly yet there is a lot we don’t understand about it.

Yes, that is my understanding. This new theory that a column is actually a model of entire objects provides a simple explanation for what has been a mysterious phenomenon.

I didn’t understand this question.

I know. The specificity people report is somewhat contrary to very notion of common cortical function. When I think about this specificity I don’t think it is wrong, but I suspect it is misleading. For example, V4 is often associated with color processing. But input to V4 had to come through V1 and V2, the data was there all along and was also processed in V1 and V2. It is not possible that V4 processes color and V1 and V2 don’t. The biggest problem with most of these studies is it is hard to find cells that reliably fire in response to a stimulus. To get around this problem the animal is often anesthetized, and/or the stimulus is made very unnatural. The simplest example is that cells in V1 behave completely differently when and animal is awake and looking at natural stimuli than when not awake and exposed to gratings.

I am looking forward to more discussions on this topic.

9 Likes