Spatial Pooling, SDRs, and Grid Cell Dimensionality Phase Transitions

I’ve noticed lately in the research streams that Numenta seems to be having a hard time reasoning about the properties of the models that the HTM algorithm generates. I came up with a fairly straightforward model of it a few months ago, but figured it was simple enough that Numenta probably had come across it. After the recent research streams however, and how much @jhawkins has been stressing about not understanding the origin of grid cells, I’m starting to think this may not be the case.

I’d like to first point out that my reasoning here is based mostly on some basic properties of spatial pooling, and relatively few other assumptions. As for hard evidence, I don’t have any quite yet, though I plan to push some NuPIC models through UMAP, which I’m pretty sure should allow for most of the phenomena I describe here to be directly visualized. I’ve been a bit busy as of late however, and so it’ll be a little while before I have the time. If someone else wants to set up the visualization for me, I’d be happy to discuss what that would take. It should be fairly easy; I’m just short on time and am not great with Python.

I hope this solves some problems Numenta is facing.


The Inner Workings of Spatial Pooling

Imagine the space of possible input SDRs to a macrocolumn. The input to the column is some point within this space, which moves each timestep. This, over many timesteps, traces out a complex path through the input space. The space of observed inputs is, in practice, a subspace of the total input space.

The spatial pooling algorithm is fundamentally an algorithm for distributing Receptive Fields (abbreviated RFs) across this space. Each time step, it performs a k-nearest-neighbors search across all RFs in the macrocolumn (ignoring ambiguity, unions for now), and activates the corresponding neurons/minicolumns. Hebbian learning is applied to active neurons, moving their receptive fields toward the input point. Boosting can attract RFs that lie far outside the input subspace until they lie within it.

The subspace mapped by the RFs can be thought of as the subspace of the model.

This has several effects:

  • Eventually, all RFs overlap with the input subspace.
  • Eventually, the model subspace overlaps the explored input subspace completely.
  • The density of RFs in a given portion of the input subspace is correlated with how much time the input point spends near it (more frequently visited regions are modelled in more detail).
  • RFs can be thought of as “landmarks”, keeping track of common reference points in the input subspace.
  • The selected RFs in any given timestep form an SDR that serves as a coordinate, representing location in the model subspace based on providing a set of nearby landmarks.

A Brief Aside on Temporal Memory (not directly relevant, just related and interesting)

The role of Temporal Memory in this interpretation is not to model the coordinate space, but rather to model paths commonly taken between landmarks. Within a TM layer, each minicolumn represents a landmark within the input subspace, which each neuron in that minicolumn represents a set of possible paths that may have been taken to that point. A TM layer encodes temporal sequences as “path spaces”.

The extra information encoded in an SDR where active minicolumns fire normally as opposed to all active minicolumns bursting (encoding no temporal context) encodes the temporal context as a fuzzy intersection of path spaces. Each neuron has its own path space that it responds to. These spaces may be somewhat large and ambiguous, however the effect of adding more neurons/minicolumns to the representation is to generate a “true path space”, which can be thought of as the space of possible paths that lie at the intersection of all the path spaces provided by active neurons.

As more path spaces are contributed to the temporal context, they constrain the space of possible paths toward the current input even further. In the case of a minicolumn bursting, it contributes “all possible paths”. The intersection between a path space P and the “space of all possible paths” is of course P.

The introduction of non-temporal information into prediction (apical dendrites, etc.) simply shifts the model to including other information into these paths. In this case, the temporal context may not consist purely of path spaces, but also perhaps “action spaces”, or “column voting spaces”.

For example, a minicolumn with the majority of its distal connections coming from motor commands rather than local connections would base its predictions on motor spaces; the predicted neurons are those whose “motor paths” overlap with recent motor activity.

GCM Dimensionality and SDRs

Suppose we want a spatial pooler to map out receptive fields across an N-dimensional manifold. The total number of landmarks to exhaustively map the space scales with the volume of the manifold (adjusted for density). As volume scales exponentially with the number of dimensions, this very quickly demands far more RFs than we actually have!

Given an input space of say, 1000 bits, an input that traces out the full 1000-bit space with an even distribution, and an RF for every point in that space, we’d expect HTM (in theory) to eventually evenly distribute all these points, making a fully saturated map of the space, with 1 bit in the output SDR for each of the 2^1000 possible inputs. (This is of course just a theoretical statement about the general approach to distribution, and I am of course ignoring sparsity here.)

However, SDRs have some interesting properties that come into play here. Any given landmark has a corresponding coordinate in the space of input SDRs (the center of its RF). Nearby landmarks will have similar SDRs, or else they wouldn’t be nearby (nearby meaning they are similar enough that an input point near one will often activate both).

Suppose there is a collection of grid cells forming a 1D basis vector. We can expect the neurons along this basis vector to have significant overlap with the points in the manifold that project to them. In this case, we can expect that, lacking sufficient neurons to properly fill the manifold, a more achievable local minima for the spatial pooler would be to fill points along just the basis vectors. The grid cells along these projections would likely find themselves centered on an SDR at roughly the average of all the points that project to them.

Of course, exactly how many RFs are available to perform this mapping depends on many RFs are available to the macrocolumn, and how many are being used to map other portions of the input subspace. If there is just one manifold to map, the macrocolumn may have hundreds of RFs to throw at the map, and perhaps may even be able to form a 3 or 4 dimensional grid cell module if necessary (this would likely be somewhat rare). If the manifold has a larger number of dimensions, a more achievable local minima would be to create grid cell modules of slices or projections of the manifold. A 7-dimensional manifold, for example, may be more easily modelled with a combination of two 2D GCMs and a 3D GCM.

The input subspace may also contain several disconnected manifolds, which would need to be modelled separately. For example, suppose we had two manifolds, A and B, and the input spent 70% of its time in A, and 30% in B. We’d expect to see HTM model A with ~70% of its available RFs, and B with ~30%. We’d likely expect to see a larger number of 2D GCMs in A than in B, assuming that A and B have a similar number of dimensions. This may not be the case however, as A could perhaps be very high-dimensional while B is very low-dimensional (2 or 3), meaning that while A has more than twice as many landmarks available for mapping, it would also have an exponentially larger space to map.

It’s also worth noting that higher-dimensional GCMs may not be regular. Manifolds with more than 1 dimension may frequently contain holes that cannot be mapped (for example, while a collection of GCMs may be capable of representing a location inside a wall, you may find yourself physically unable to occupy that space). In this case, higher-dimensional GCMs will likely not allocate RFs to such locations. High-dimensional GCMs may, as a result, look less like grids, and more like a collection of landmarks.

Place Cells and Phase Transitions

While a high-dimensional manifold may not be able to be fully mapped without resorting to lower-dimensional projections, there may be regions of the manifold that are visited frequently enough that the spatial pooler allocates an entire RF to it. In this case, the resulting RF isn’t found along a basis vector or projection, but rather at a specific location. Considering this to be a grid cell may not make much sense, and it makes much more sense to refer to it as a place cell.

While mapping a particular space, the spatial pooler may find itself with a spare RF to reallocate. Perhaps it decides to boost an RF serving as a landmark in a region of the input subspace that hasn’t been visited lately. Slowly, such RFs would be reallocated to mark common locations inside other manifolds. A 2D manifold represented by a pair of 1D GCMs may slowly be filled with place cells if the input point begins to explore it more heavily.

Eventually, as the pair of 1D GCMs transitions into a 2D GCM, the original 1D GCMs would eventually become active less frequently, as the new 2D grid cells would be closer to the input point. Eventually, boosting would cause these now redundant 1D GCMs to be repurposed toward mapping other parts of the input space.

This process can be thought of as a phase transition from one dimensionality of GCM to another.

Phase transitions may also work in the opposite direction, reducing dimensionality of GCMs. If a manifold mapped by a 2D GCM is less explored, boosting would repurpose its RFs to map out other portions of the input space. Eventually, as the 2D GCM would begin thinning out, the remaining RFs would find themselves moving to a different local minima. They would be expected to move toward the 1D basis vectors, being the new local minima.

Sparsity, Unions, and Manifolds

When inputs become more ambiguous, SDRs become less sparse, and begin modelling a variety of different potential objects as a union. Properties of SDRs allow for these different objects to still be differentiated. However, each object in the union can also be seen as a union of yet simpler objects. Even in cases where an SDR might not be thought of as being in a union state, it can’t really be said for certain that it is a single object.

Arguably, in much the same way that the context component of an SDR can be thought of as a path space, the spatial component of an SDR (which RFs are active) can be thought of not as a specific object, but rather as a space of possible objects (the space of all possible input SDRs similar enough to each other to generate this specific output SDR).

However, in much the same way that different path spaces intersect, these object spaces intersect too. A given active RF can be thought of as the representing the space of all possible objects close enough to activate it. Multiple active RFs together represent the space of possible objects that would activate all of them at once. As more RFs are added, the intersection shrinks exponentially. If the sparsity of the input SDR is limited, the sparsity of the output SDR will be limited as well, as adding too many neurons would quickly shrink this space down to nothing.

More activity in the input creates some complex behavior. Because a k-nearest-neighbors selection breaks down as an accurate model of RF selection during unions, the interpretation of intersecting object spaces changes (this applies to path spaces too).

In this case, rather than the object space being the intersection of the object space of each contributing RF, it is instead the space that lies at the intersection of at least k contributing object spaces. If the input SDR is a union of 2 SDRs, we can expect 2k contributing object spaces, and roughly 2 disjoint spaces. Sometimes these spaces will overlap. This starts to become a significant factor at the same time that the SDRs in the union begin overlapping too much to be differentiated.

As more objects are added to the union, the object space begins to expand. While this allows one to represent more points in parallel, this also increases error rates. If too many objects are added to the union, the object space becomes saturated and nearly the entire space lies within it.

However, we can also go in the opposite direction as well. Individual objects can be thought of as a union of simpler object spaces. They are close enough to be considered more or less the same object (roughly k active RFs, perhaps somewhat close together), but aside from that, they behave similarly.

In this case, an object is a collection of its features, and its features can change independently of each other. A feature may lie at the intersection of several projected GCMs. As the input changes, the activity of the projected GCMs changes. Temporal memory would be able to predict each independently, based on context and overlapping context spaces.

However in other cases, features may interact. Applying temporal memory to these manifolds can be thought of as turning them from simple coordinate spaces to attractors, or as projections or holograms of a phase space. While in some cases, the previous state of just that one GCM may be sufficient for accurate predictions within it, it may also be just a projection of a larger manifold, and thus be closely connected to several other projections. In this case, you’d expect to see different projections/basis vectors of a single manifold strongly connected to each other in their temporal connections.

If you’re tracking a moving ball, its x coordinate at the next time step won’t just depend on its previous x coordinate; it’ll also depend on its previous y and z coordinates, as well as perhaps other factors such as velocity vectors and other objects in the environment it could collide with.

Navigation and Landmarks

Now for something completely different. A higher-level perspective.

Think about how you navigate a familiar space. The most important objectives in navigating a space is, of course, to figure out your location, and determine which direction to travel in.

The way this typically operates in humans is to base your location on the relative locations of familiar landmarks. If you sit at your desk, you may see your computer, your coffee mug, and the surrounding items around your desk. If you turn your head, you may see the window, and perhaps a very recognizable building down the street.

Being a location you find yourself in often, you can expect your brain to have place cells dedicated specifically to your office and common points of interest nearby - your computer, your desk, the window, the building down the street that can be seen through the window.

If you suddenly found yourself transported from home to your office, you’d be able to quickly determine from your surroundings where exactly you are. Presumably, there should be some pretty strong associations between the place cells for your office and cells representing these familiar experiences.

It’s likely that the way that humans perform path integration is different from how computers perform it. While they may find optimal paths, computers will often use pathfinding algorithms that do so through what is close to a brute-force search. It’s clear that this isn’t the approach that humans take; we don’t find ourselves thinking through every possible path (far from it), and often prioritize paths that are familiar or easy to think of, rather than paths that are absolutely optimal.

Path Integration

Think about how you make it to your office. You don’t plan out every possible step; you remember familiar landmarks, and remember which direction to take from each landmark to get to the next. The circuitry required for this doesn’t demand any expensive search algorithms; a sequence of landmarks and associated actions/directions to take from one to get to the next.

If you find yourself asking for directions to an unfamiliar location, you don’t ask for a precise step-wise path. You ask for a sequence of recognizable landmarks, and the direction to take to get from one to the next.

Place cells, marking landmarks, can be expected to be fairly robust. The place cells associated with your office don’t exist in isolation; they’re strongly associated with nearby items. If you find yourself in your office but find your desk missing, you’ll still know exactly where you are when you look out the window and see the same building down the street.

It can be expected that more abstract spaces can be mapped in similar ways. If you are trying to solve a problem, you have a set of familiar sub-problems you expect to face. In these cases, place cells may not be locations in space, but rather specific situations and problems. In much the same way that you map out your commute based on familiar landmarks and where you expect familiar actions to take you from each one (essentially breaking path integration into a collection of smaller, simpler path integration problems), you solve general problems by breaking large problems into smaller problems that you are more familiar with.

When you come across a common problem, you can quickly recognize a collection of familiar properties it has, and have a library of known actions and strategies for breaking it up into smaller pieces, and how you expect each problem to affect it. You can run through common sequences of events to imagine how each strategy will break it up, and look for familiar consequences that suggest ways to break it down further.


Wow, that’s a bit dense. I’ll try to add some visual aids if I can find the time. Otherwise, a wall of text is at least better than nothing.

I’m open for discussions on this. I hope it leads research into a useful direction.


Edit:

I figure a simple and less technical explanation (TL;DR?) might be worth putting here to more clearly explain the point of this thread.

Numenta has been trying to understand lately on how the cortex models abstract concepts; we can tell it clearly uses grid cells to model locations in physical spaces, but what about abstract spaces like language and math? When it does use these grid cell modules, does it use 1D GCMs? 2D GCMs? If it uses 1D GCMs, why does it appear that it uses 2D GCMs to model physical locations?

This thread is my attempt to explain how a little exploration into how the spatial pooler and the math of SDRs interact leads to an answer to this. It’s a complex answer, and suggests that the brain can build arbitrary N-D GCMs, but only if it has enough receptive fields (the number of RFs needed for a detailed, “optimal map” would scale with the volume of the space, and thus exponentially with the number of dimensions).

If it doesn’t, it opts for a combination of place cells and lower-dimensional projections. If the dimensionality of the projections gets low enough, they start to look like 1D and 2D GCMs. The mechanisms of the spatial pooler (really hebbian learning in general, since this kind of applies to the temporal pooler too) should even create phase transitions between dimensionality of GCMs as the network explores certain parts of the input space.

I also made a point that if you look at how humans navigate these spaces, and the role that grid cells and place cells seem to play in this, the strategies humans use for navigating physical spaces seem to match well with strategies used for abstract problem solving (navigating abstract spaces). In fact, it seems to me like the mechanisms to perform navigation of complex abstract spaces would actually be fairly simple based on these observations (mostly just a matter of replaying learned sequences, with a little reinforcement learning to find good sequences), though that’s a conversation for a different time.

7 Likes

I like that you’re thinking with manifolds. Manifolds are a good way to analyze these problems.

Something to consider about the Spatial Pooler is that the exact size of the Receptive Fields is controlled by the input manifold. This is good if you are trying to represent sensory features, which are locations on the input-space manifold.

However grid cells are not representing locations on the input-space manifold, they are representing locations in the physical world (which is a different manifold: Euclidean space).

Two locations could be far apart on the input-manifold, but very close together in the physical world. One challenge for grid cells is to associate adjacent locations with the same RF, even if the locations look totally different.

I’ve written at length on this forum about how to modify a Spatial Pooler such that it forms receptive fields which respond to physical locations (as opposed to sensory features).

1 Like

Observed landmarks in the environment would map onto place cells pretty well, and grid cells would form a manifold of how actions affect location relative to them.

The space of possible actions would be quite complex, and would lead to a very complex manifold.

My main point here was that if a spatial pooler lacks enough neurons to properly represent such a complex manifold, it’ll approximate the manifold with multiple lower-dimensional projected GCMs. Lower-dimensional projections are roughly the average of all the points that project to them, so it would make sense that a simple 2D grid cell representation would form as a rough approximation. It’s an oversimplification, but if all the other factors cancel out, it makes sense. Factor in that there’s a lot of neurons found in the same regions as grid cells that aren’t obviously grid cells, and it doesn’t seem too far-fetched that they could be modelling the more complex relationships between actions and location.

Also, Group Theory seems to suggest that there aren’t very many ways to accurately model euclidean space. If you have a system that models Euclidean space in any way, it’s either going to look like a coordinate system, or it’s going to be inaccurate.

Are you sure the brain reliably forms accurate models in such cases? There are a lot of cases where people misjudge distances, etc. People will frequently underestimate how long it takes to travel a frequently-taken route, or will misjudge the size of a room (“this room seems bigger on the inside”).

Yes, a spatial pool can make place cells, where each place is a small location in the input space.

But the Spatial Pooler won’t. The Spatial Pooler can not understand actions, because it has no concept of time. The spatial pooler has no memory or temporal state.

Absolutely. Grid cells can have large receptive fields, covering areas of the world which look very different. For example take two steps to the side, and almost all of your visual inputs change, but many grid cells will activate in both places.

1 Like

The temporal pooler is forms grid cell modules too. It just forms them in phase space, not input space. If distal dendrites get their inputs from motor commands, then you can think of individual minicolumns as being akin to independent grid cell modules (similar to what Jeff has been talking about in recent research meetings), that represent dimensions of action space.

The interaction between Spatial Pooler GCMs and Temporal Pooler GCMs is a subject I need to explore much more, though.

Kind of? Many grid cells will be different too. Depends on what you’re talking about. Low-dimensional features will change, but high dimensional features won’t. Take a few steps to the side, and your visual inputs change heavily, but you still recognize a lot of the same objects.

Even if you don’t (say you just stepped into a different room), your brain probably has mapped out that the locations in these two rooms are different. For example, you may have a place cell that represents one room and another that represents the other. If you take an action to move from one room to the other, the brain should model the transition from one place cell to another in action-phase-space, and change the predicted, high-level sensory observations accordingly.

In this case, the low-level sensory features would vary too much to be accurately modeled. The dimensionality of that manifold is way too high for a single place cell to represent. So in that case, any synapses on that place cell that get inputs from low-level sensory regions would average out to nothing. When dealing with SDRs, ignoring noisy, low-level sensory input on a place cell would just be another example of a lower-dimensional projection.

High-level sensory features, which change much more slowly would be averaged out to a much lesser extent, and would correlate to actual common features for a given location. Even if an object or two become appear, or become occluded, most objects are likely to stay the same, and will be within the error tolerance of the place cell. If everything changes too much, that’ll probably be modeled with a different place cell, representing a different place (a different room?).

I’m not sure we are still talking about the same things, so here is what I mean when I say manifold.

Lets start by defining what an SDR is. An SDR is a vector of length N containing the values 0 or 1. It represents which neurons are emitting Action Potentials (APs) in an instant of time.

  • Because an SDR is a vector of numbers, we can define it as an N-Dimensional space, and define the distance between two points as the Hamming distance.

  • The sparsity is the fraction bits which are ‘1’. The sparsity of an SDR is directly proportional to its distance from the origin (the SDR with all zeros).

Inhibitory interneurons control how many neurons activate. SDRs typically have a fixed sparsity of around a few percent. This constrains which SDR’s are possible, and which SDR’s can never exist in the brain.

  • Setting the sparsity to a constant number reduces the space of possible SDRs from an N-dimensional cube to the surface of an N-dimensional sphere. The center of the sphere is the origin, and all valid SDRs have a fixed distance from the origin.

The surface of this sphere is a manifold, with many dimensions. All SDRs exist on this manifold, or a sub-set of it. The manifold is the set of all possible SDRs.

  • The sensory organ turns a sensory feature into an SDR. The resulting SDR is arbitrary, determined by the input.

  • The spatial pooler maps from one SDR to another SDR. It maps every location on the input manifold to a location on the output manifold. This mapping is initially random.

We know what the manifold of a Grid Cell module should look like, from probing animal brains. The open question is how the brain maps locations from the input manifold onto the grid cell manifold.

2 Likes

This bit seems a bit off.

If you start from my basic hex-grid idea, or the related TBT concept, the “center of activation” in the column can move around. This changes the context and a SDR sequence that might be passive in one context is the center of activity in a slightly different context.

This local winner of the competition through lateral voting extends across some area of a given map, and though fiber tracts, from map to map. If I point to one area of a map that is receiving some input I have to also consider the neighboring cells that are also being driven - this explores the entire solution space and settles into the lowest energy that satisfies the largest number of local cells in all columns simultaneously. At each node there is some activation that is part of the larger energy (hyper?) surface. This is also the bi-directional signal to related maps, also seeking the lowest energy solution.

The hubs of attractor networks are dynamic based on context; the entire manifold “base” is as dynamic as a soap bubble blown on the breeze. Given a certain context a local assembly will form and grow, influencing and flipping the stable states throughout the connected maps. Changing inputs or fatigue will erode this assembly to give way to the formation of a new assemble - the evolution of thought.

This dynamic behavior requires addressing the geometry at the cell, hex-grid assembly, and intermaps connections simultaneously.

I have tried many times to come up with a workable mental image of this ridiculously complicated multi-actor interaction and so far - bupkis. If you can clarify this it would be appreciated.

To be clear: I am not talking about any physical space. Physical spaces have ~3 dimensions. The space of possible SDR’s has as many dimensions as there are neurons. The “manifolds” which I discussed are part of a mathematical analysis of SDRs.

You quoted me saying the “center of activity”, but in fact the “center” which I speak of is defined by zero activity. These are two different concepts.

I hope this clarifies?

1 Like

Do the lateral connections between these neurons have an influence, beyond the nominal proximal dendrite/axon and apical dendrite/axon SDRs?

Just a minor update.

I have a project I’m working on that I [hopefully] should get done in the next few days. If that works out, it should free up some time to work on some visualization code to prove my points here.

I noticed a situation where UMAP might fail to preserve important structure of an HTM network necessary for the visualization to work properly. There would be work-arounds, though they would be a bit complex. I think I have a much simpler way of visualizing the internal mapping of an HTM network in a way that would preserve it, but it won’t be able to do dimensionality reduction well, and will be limited to visualizing simple 2D/3D manifolds/GCMs, and associated place cells.

Some of these ideas have also led me to an interesting way in which the grid cells in L6/ERC could be tracking orientation, which I’ll discuss in more depth later. Brief summary is that, because GCMs (specifically those mapping physical space, not conceptual space) map out grids at seemingly random orientations and scales relative to each other, and because displacement cells appear as though they may be orientation-sensitive, it seems as though the GCMs may be aligned to these motor commands, and would essentially be modelling integrations of those motor commands. This would naturally lead to 1D GCMs though, rather than 2D. However, if you account for how different motor commands interact, it looks like hebbian learning rules would naturally push such a system toward using 2D GCMs in a way that would look like what you see in the ERC, and could also account for rotation. It would be approximate and a little prone to mistakes, but with how error-tolerant neurons are, I don’t think that would matter much.

1 Like