How is HTM supposed to deal with spatial invariance?

A paper on vision in primates:

Large-scale, high-resolution comparison of the core visual object recognition behavior of humans, monkeys, and state-of-the-art deep artificial neural networks


Spin,

The entorhinal cortex is cortex. One of the takeaway lessons I have gotten from books like “On Intelligence” is that cortex is cortex. The distributed grid representation is clearly spread over a very large chunk of the entorhinal cortex. Why should this computing pattern not be found elsewhere in the cortex?

The older allocortex in the hippocampus communicates with the neocortex and distills this representation into chunks like "places." That strongly suggests that the place information is in the cortex, distributed over this grid pattern to be sampled. Hence the claim of a distributed representation in the grid cells. SDRs are perfect for getting a little taste of several different intermingled distributed representations from many sensory modalities and combining them into a meaning.
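The "little taste" idea can be sketched in a few lines of Python. This is only my illustration of the SDR arithmetic (the sizes and names are invented, not from any Numenta implementation): several modalities' sparse codes are unioned onto one map, and a downstream element that samples only a small subset of the union still picks up bits from every modality.

```python
import random

random.seed(0)
N, W = 2048, 40  # illustrative sizes: 2048-bit SDRs with 40 active bits

def random_sdr():
    """One modality's representation: a random set of active bit indices."""
    return set(random.sample(range(N), W))

vision, sound, touch = random_sdr(), random_sdr(), random_sdr()
combined = vision | sound | touch  # modalities intermingled on one map

# A downstream neuron only "tastes" a small random subset of the union...
taste = set(random.sample(sorted(combined), 30))

# ...yet that taste still contains bits drawn from each modality,
# while two unrelated SDRs share almost nothing by chance.
print(len(taste & vision), len(taste & sound), len(taste & touch))
print(len(vision & sound))  # chance overlap of unrelated SDRs is tiny
```

The key property is that union barely costs anything at this sparsity: each modality's full pattern survives inside the combined map, yet accidental collisions between unrelated patterns stay near zero.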

I have many reasons to think that there are other codings like "place" in the hippocampus, but they are not as easy to stumble into by monitoring free-roaming critters. It may be a real stretch to suggest that other parts of the brain use a grid pattern, but consider this: we were able to sample grid cells because there was a strong relationship to something happening in the cage to correlate with. (See reference [7] above) I don't think I have seen anyone looking for this sort of property in other parts of the cortex.

I find distributed representation in grid patterns to be a delightful solution to the coding problem. Even if this turns out to be non-biological for most of the cortex, it codes very well in neural networks. The RatSLAM people are doing very interesting things with this. In the H of HTM, we need to take a point in the input pattern space and spread it out to interact with other parts of the pattern - a distributed representation. Humans have trouble visualizing this as we see in 2.5D (2D with stereo cues & color).

Once we move beyond the highly topographic V1 cortex, it gets hard to conceptualize how the data gets blended together. The V1 cells look like a topographic pattern because the retina feeds it that. Once we get into progressive visual maps, it gets less clear what is happening to the representation. I can see how a sort of FFT transform could be spreading and combining different frequencies of sampled data across a distributed pattern in these maps. By the time we get to the association areas we have multiple streams of senses blended together. What is the lingua franca? Grids would work as well as anything here.
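To make the "grid as a code" intuition concrete, here is a toy sketch (my own illustration, not Numenta's or the RatSLAM algorithm; every function name and parameter is invented) of how a few periodic grid modules, each ambiguous on its own, combine into a code that distinguishes positions over a much larger space:

```python
import math

def grid_module_cell(x, y, spacing, orientation, cells_per_axis=8):
    """Return the index of the active cell in one grid module for position (x, y).

    Each module tiles the plane periodically: the position modulo the module's
    spacing, in the module's rotated frame, picks one cell from a small array.
    """
    # Rotate the position into the module's frame.
    c, s = math.cos(orientation), math.sin(orientation)
    xr, yr = c * x - s * y, s * x + c * y
    # Wrap into the module's repeating tile and discretize.
    i = int((xr % spacing) / spacing * cells_per_axis)
    j = int((yr % spacing) / spacing * cells_per_axis)
    return i * cells_per_axis + j

# Several modules with different spacings and orientations: together their
# active cells form a distributed code that changes with position even
# though each individual module repeats.
modules = [(0.3, 0.0), (0.5, 0.4), (0.7, 0.9), (1.1, 1.3)]  # (spacing, orientation)

def encode(x, y):
    return tuple(grid_module_cell(x, y, sp, th) for sp, th in modules)

print(encode(0.2, 0.1))
print(encode(5.2, 3.1))  # a far-away position yields a different combination
```

A single module repeats every `spacing` units, so by itself it cannot tell far-apart positions apart; it is the combination across modules with incommensurate spacings that makes the joint code nearly unique over a large area.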

I agree that this is highly speculative and that it may be premature to assert that this is the ground truth but it does fit very well with many years of reading about this sort of thing - it just feels right. Win or lose I will be researching to attempt to learn if this is how the brain does it. I strongly suspect that the answer will be this or something very close.


Bitking, super interesting post. I expect it will take me years to understand. Thanks.

It took me years. On the other hand - nobody showed me what to look for.

Frankly, I can’t find the purpose of using the periodic grid everywhere in the cortex. I was charmed by the hexagon magic of grid cells too, but after spending a reasonable amount of time thinking about it, I didn’t find any use for them outside of creating a virtual space, which is super important for some modalities at the lowest levels of the hierarchy - but how could it be useful in all other cases?
BTW, even the co-discoverer of grid cells, Edvard Moser, who continues his work in this field, quite often speaks about the importance of place cells for the memory mechanics, but not grid cells (he also said the idea of an FFT as the key mechanics for grid cells hadn’t been confirmed).
I agree that we are still at the early stage of collecting facts in this field, and it’s very complicated and time-consuming to conduct research here, so some important information may have been missed. However, how could the brain use such a grid as a general base for everything else? Perhaps you could provide a simple illustrative example of such mechanics for some nonspatial patterns, to give us the flavour of your vision?


Consider the entorhinal cortex and what is being represented: a mixture of the vestibular system, visual representation, self-motion, postural feedback. Perhaps more. When you look at the repeating patterns where do you see any of that in the distributed dispersal that we choose to call a grid system?

I see that as the output of a common digestion system that ends up in a “common format.” The grid cells under discussion are a byproduct of explaining the coding that surrounds the hippocampus. Even though there are some very interesting theories, to the best of my knowledge nobody is really sure how that grid pattern is formed.

Looking at the dizzying maze of fiber tracts that connect the maps in the brain, I have often wondered about the palimpsest of patterns that lie one on top of another in these many tracts. Consider the areas that receive the foveal vision as the eye darts over a scene, bit by bit - color, texture, stereo disparity, edges - all overlapping on the same maps. Within a few layers of maps, that stew is mixed with the digested versions of body sensors (vibration, temperature, joint position and muscle stretch) to guide end-effector motion in space in the association areas.

The same general thing is going on in the auditory tracts.

Fat fiber bundles connect these diverse association areas and communicate some sort of meaningful information.

It gives me hope that if we can tease out how the wildly dissimilar information is combined in the grid maps, then perhaps the same general process is working everywhere in the brain.

You call grids a specialized navigation system - I see them as a possible general coding scheme that has been demonstrated to work in part of the brain. I will say it again - if the cortex is a general fabric that is uniform throughout, then working out how part of it functions could apply to all of it.

Ask yourself - if you were going to take all of these wildly diverse sensory organs and apply a common process to turn their output into a maximally dispersed (and compatible) coding, to gain the best resistance to decay over time, what format would you expect it to end up as? For bonus points, use a fabric of HTM-style columns that do local processing to do this job.


I really support everything you’ve just said, but I see generalized place cells in this role, not grid cells. Again, what could be the reason to use a 2D grid for the auditory modality, and how can it be applied at abstract levels like the meaning of speech?

Spin, because hearing is binaural. We need to place a sound on a 2D grid, left/right and up/down, exactly as we map vision onto a 2D grid.

To amplify this point: a next level of representation, speech sounds, can be arranged in a 2-d mapping arrangement.
http://www.animations.physics.unsw.edu.au/jw/voice.html

Something to think about in comparing place cells and grid cells: the place cells are in the older three-layer tissue we call the hippocampus.

The grid cell arrangement is found in the entorhinal cortex, which is the same six-layer structure found throughout the neocortex. It does differ in the lack of cell bodies in layer 4, but it is more like the rest of the cortex than the hippocampus is.

I accept that the reciprocal connection with the hippocampus may make this a special case but I will be looking for similarities with the association areas for the reasons I mentioned in prior posts.

So people who can hear with only one ear need a different organization of the corresponding part of the cortex?
We don’t place sound into space; we create a model of the world using all available information, including sound. We need only one grid to support this space, not a separate one for each modality.

I was talking about meanings, not sounds. Nevertheless, even for mapping sounds, I think we don’t need such a complicated system of grids, because there is only the time dimension, and everything else at each moment has its own dimension. There are no directions (for the sound itself), no 3D (or 2.5D if you wish) space, no necessity to deal with the scale of objects, etc.

Well, @jhawkins claims every minicolumn has some analogue of place cells, and it sounds very logical to me, because we know that after destroying the hippocampus we lose the capability to create new memories, but not the existing ones. So our brain supports space/time-related interrelationships without the hippocampus, which is possible only with some structural elements (cells) performing this role in the cortex.

I won’t argue that ALL areas signal in grid formations. I suspect that this is the format used in the association areas. That means that the maps closest to the senses are speaking in the language of whatever sense is hooked up to them.

Where it gets interesting is the one or two maps between the sense and the association area - what is going on there?

Due to the hippocampus’s special relationship with the temporal lobe and episodic memory formation, there may be a more direct connection with the grid structure in the entorhinal cortex.

As far as “place cells” in the columns - I don’t recall ever seeing that in any of the papers I have read. Are you aware of anyone who has actually demonstrated that function in the cortex?

Let’s do a bit of a deep dive on this: the representation of sounds as a time sequence. This leads to more general questions that seem relevant to this conversation.

In programming of computers, we can assign a space for variables, pull them apart and bring them back together as needed for various operations.

In neural tissue, we don’t have this luxury. The various sensory streams come in together, with successive waves laying one atop the next. In vision, we have the overall scene being recognized in an early part of the vision system, with directions being tapped off to the frontal eye fields, moving the foveal vision to and fro, picking up select features of the scene in turn. In the center of V1, we have successive presentations being layered one right on top of the next - perhaps dozens - to build up knowledge of the current environment.
In hearing, we have combinations of frequencies and phase shifts that are combined into sounds, parts of speech, sentences, up to communications that span several sentences. We are able to parse out meanings that span the entire length of an utterance, so simple word combinations are not enough to convey meaning.

I have been thinking about this exact point for many years - looking for possible mechanisms that are biologically plausible and capable of holding this information in a palimpsest.

One possibility is that the stream is parsed into a distributed form of some sort where orthogonal information is spatially separated into regions so that a buffer map can hold more than one piece of information at a time. I envision these islands of meaning as cow spots. William H. Calvin describes them as the Cerebral code.[1]

Where this runs into trouble is that SDRs can only reach so far. The SDR mechanism is strings of synapses along a dendrite. If we had patches of meaning scattered here and there around a map then it would be non-biological to expect dendrites to connect this information; the patches are simply too far apart.

How else can we distribute the information so it is available over some larger space, and still not interfere with other information present at the same time?

If each perception went through a “hashing function” that spread it out into a distributed form, it could make many little islands of information that reliably signaled a certain ground truth about what was sensed, and this spread-out form could co-exist with other hashed representations simultaneously: a sparse distributed representation.
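Here is a minimal sketch of that hashing idea (my own toy code, not a biological or Numenta mechanism; the function name, map size, and sparsity are invented): a deterministic hash scatters each perception into a few islands of activity, and two hashed perceptions can sit on the same buffer map at once while each remains individually detectable.

```python
import hashlib

N, W = 1024, 20  # hypothetical map size and number of active "island" bits

def hash_to_sdr(perception: str) -> set:
    """Deterministically 'hash' a perception into W scattered active bits.

    The same input always lands on the same islands, and two different
    inputs land on (almost entirely) different islands.
    """
    bits = set()
    counter = 0
    while len(bits) < W:
        h = hashlib.sha256(f"{perception}:{counter}".encode()).digest()
        bits.add(int.from_bytes(h[:4], "big") % N)
        counter += 1
    return bits

cup = hash_to_sdr("coffee cup")
dog = hash_to_sdr("dog bark")

# Both hashed patterns coexist on the same buffer map...
buffer_map = cup | dog
# ...and each is still reliably detectable by its overlap with the buffer.
print(len(cup & buffer_map))  # every 'cup' island survives
print(len(cup & dog))         # small: distinct inputs rarely share bits
```

The hashing makes the code reliable (the same percept always lights the same islands) while the sparsity makes superposition cheap (co-resident patterns barely interfere).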

An SDR neuron could learn a local combination of these islands of meaning that represents some fact about the sensed object.

Coming back to the entorhinal cortex, we have an actual experimentally measured pattern that satisfies the conditions I just set out. We know that the hippocampus does, in fact, pick out place information from whatever these distributed islands of information we call grids are signaling.

When I encountered papers describing grids, I saw exactly what I was predicting to solve this theoretical problem, in exactly the form I was expecting. This is much too strong an alignment of Marr’s level 1 and level 3 problem spaces to be ignored.

This is very much the same problem I expect to solve in other areas of the brain, and as I have stated several times now - since the cortex is able to use the columns to form this grid pattern here, and the same columns are present in other areas, it is highly suggestive that these other areas are doing the same thing.

If not actual grids, then something very much like them.

COMPRESSING THE CEREBRAL CODE - William H. Calvin
http://williamcalvin.com/socns94.html
http://williamcalvin.com/bk9/bk9ch6.htm
The entire book:
http://williamcalvin.com/bk9/


I haven’t read it myself, but @jhawkins refers to the related research here https://youtu.be/yVT7dO_Tf4E and he is always very careful with such stuff.


Do you have any proof it works literally this way? And the same question about hearing. We definitely need a buffer to be able to inhibit odd patterns, but I doubt it works at the lowest level.

It sounds like some kind of hierarchy - do you see any fundamental difference here?

I think in general we are talking about the same thing - some neurons which are responsible for interconnections of different pieces of related patterns; I’m just confused by your references to the highly periodic nature of the space created by the grid cells.

Consider that we have measured, and know with a high degree of certainty, that the cortex in the entorhinal region codes for a distributed representation that we are calling grids.

It complicates things to assume that different areas use some other method to mingle streams of processed sensory data together for SDRs to sample and learn. Maybe there are many other forms but I am applying Occam’s Razor and assuming that this is a gateway to understanding a fundamental coding method of areas that communicate with older brain structures and perhaps between different association maps.

Matt Taylor made this delightful example to show what this might look like:


Just look at it from another perspective: all parts of the cortex are organized in the same way, but different regions are responsible for processing different kinds of patterns. The entorhinal cortex region is responsible for spatial patterns; that’s why we find “grid cells,” which are actually ordinary cortical cells activated by the virtual grid-space patterns.

The cortex is not activated by a grid pattern but forms the grid pattern.
Look here to see how the columns do this feat with all the elements of SDR neurons.
It’s pretty much a one-to-one mapping. Please look at slide 45: note each column’s dendrites sampling around the area and picking up some learned local part of a pattern, with the winners inhibiting their neighbors. This is all HTM canon law.
