Space is a latent sequence: Structured sequence learning as a unified theory of representation in the hippocampus
… we provide a unifying principle that the mental representation of space is an emergent property of latent higher-order sequence learning. Treating space as a sequence resolves myriad phenomena, and suggests that the place-field mapping methodology, where sequential neuron responses are interpreted in spatial and Euclidean terms, might itself be a source of anomalies. Our model, called the Clone-structured Causal Graph (CSCG), uses a specific higher-order graph scaffolding to learn latent representations by mapping sensory inputs to unique contexts. Learning to compress sequential and episodic experiences using CSCGs results in the emergence of cognitive maps - mental representations of spatial and conceptual relationships in an environment that are suited for planning, introspection, consolidation, and abstraction. We demonstrate that over a dozen different hippocampal phenomena, ranging from those reported in classic experiments to the most recent ones, are succinctly and mechanistically explained by our model.
As a graph model, this model seems fairly straightforward. His description does appear to invoke a temporal memory algorithm similar to what we are familiar with from HTM.
As a physicist, I can’t help but draw the connection to a position-momentum phase space plot from dynamical systems theory.
Here, each node in the graph encodes both position and velocity. Each trajectory should generate a unique path through this phase space (e.g. forward vs. backward). The only ambiguity that might result in a bifurcation of paths would be when the agent decides to change direction or keep going while passing through the zero momentum state (middle path above). However, even this might be resolved with a third axis to the space representing orientation.
It should be possible to discretize this space with a neural representation if position, orientation, and velocity are each provided as independent inputs (or can be inferred from the input sensory stream). Acceleration might also be measured by the proprioceptive system or inferred from motion cues. With these inputs, it should be possible to track each unique path through this phase space, allowing for disambiguation of current states and prediction of future states by position (where am I?), orientation (what direction am I facing?), and duration (how long do I expect to remain in this location?).
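As a rough sketch of that discretization (the bin counts and the (position, orientation, velocity) encoding are my own illustrative assumptions, not from the paper):

```python
def make_node(position, orientation, velocity, pos_bins=10, vel_bins=5):
    """Map continuous inputs onto a discrete phase-space node.

    Illustrative assumptions: position normalized to [0, 1), orientation
    reduced to a binary facing direction, velocity binned by magnitude.
    """
    p = min(int(position * pos_bins), pos_bins - 1)       # position bin
    o = 0 if orientation >= 0 else 1                      # facing +x or -x
    v = min(int(abs(velocity) * vel_bins), vel_bins - 1)  # speed magnitude bin
    return (p, o, v)

# The same corridor traversed forward and backward visits disjoint node
# sets, so direction of travel is disambiguated by the representation.
forward = [make_node(x / 10, +1.0, 0.3) for x in range(10)]
backward = [make_node(x / 10, -1.0, 0.3) for x in reversed(range(10))]
```

Because orientation is part of each node, a single physical location maps to different nodes on the outbound and return trips, which is exactly the disambiguation described above.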
There may be evidence for the generation of unique place cell representations based on orientation and/or direction of travel. I’ve heard of an experiment measuring place cells in rats that were tasked with traveling back and forth along a corridor. Place cells associated with specific locations were observed. However, over time, some of the place cells began to respond only while the animal traveled through a location in a specific direction; on the return trip, a different place cell was observed to be active at the same location.
There are if-except-if trees: if some test is true, then return a response value, except if a further test is true, then return that test’s associated response value, except if a further test is true, then …
That way you can go from general to more specific responses to an input.
It is probably better to use n-way tree nodes than just the binary ones in the example.
I think their main point is that there are no explicit separate structures/algorithms for place/grid cells (or reference frames). The agent uses only “strips of experience” (action->observation->action…) to map its territory, be it physical, conceptual, or mixed, by generating a knowledge graph.
And that’s all that is happening in the agent’s brain.
What they also observed - and claim - is that the nodes that become active in the above process parallel the activations of place/grid cells, so the observations that led to the “discovery” of grid cells were only the “noise” made by the functioning of their proposed mechanism - the Clone-Structured Causal Graph.
PS: Even space itself (the mental representation, aka model, of space) emerges from learned strips of experience, with place/grid cells being artifacts of how this model is encoded.
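A toy illustration of the clone idea (just a sketch of the intuition; the paper actually learns clone assignments with EM over a clone-structured HMM, and the episode contents below are made up): give the same observation a different latent node depending on how it was reached, so aliased observations split into distinct graph nodes.

```python
from collections import defaultdict

def learn_graph(episodes):
    """Each episode is a list of (action, observation) steps.

    A latent node here is (observation, arriving action, previous
    observation) - a crude stand-in for CSCG's learned clones, enough
    to split one observation into context-dependent nodes.
    """
    transitions = defaultdict(set)
    for steps in episodes:
        prev_obs, prev_node = None, None
        for action, obs in steps:
            node = (obs, action, prev_obs)
            if prev_node is not None:
                transitions[prev_node].add(node)
            prev_node, prev_obs = node, obs
    return transitions

# The aliased observation "wall" occurs in two different strips of
# experience and ends up as two distinct nodes in the graph.
g = learn_graph([
    [("start", "door"), ("left", "wall"), ("left", "lamp")],
    [("start", "door"), ("right", "wall"), ("right", "exit")],
])
wall_nodes = {n for targets in g.values() for n in targets if n[0] == "wall"}
```

The point matches the posts above: no special place-cell machinery, just sequence context deciding which latent node an observation maps to.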
One example of an if-except-if tree: if the previous letter (at n-1) is “a”, then the current letter (at n) should be “s”, except if the letter before that (at n-2) is “c”, then the current letter should be “t”.
Then you have memorized “as” and “cat”. And of course you can memorize a billion things that way, using a simple greedy learning algorithm.
That is imperfect because the further back you go in a sentence the less informative the letter tests are.
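Here is one way to sketch the “as”/“cat” example in code (my own illustration of the idea, not code from any reference): rules are ordered from general to specific, and the most specific matching rule wins.

```python
def predict_next(context, rules):
    """context: the letters seen so far.

    rules: list of (tests, response) pairs, where tests maps negative
    offsets (e.g. -1 = previous letter) to required letters. Later,
    more specific rules override earlier, more general ones - the
    "except if" behavior.
    """
    response = None
    for tests, value in rules:
        if all(len(context) >= -off and context[off] == letter
               for off, letter in tests.items()):
            response = value
    return response

rules = [
    ({-1: "a"}, "s"),           # if the previous letter is "a" -> "s"
    ({-1: "a", -2: "c"}, "t"),  # except if the one before is "c" -> "t"
]

predict_next("a", rules)   # "s"  (memorized "as")
predict_next("ca", rules)  # "t"  (memorized "cat")
```

A greedy learner in this style would just append a more specific rule whenever a prediction error occurs, which is how you can memorize a large number of sequences cheaply.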
You could turn an image into a stream of symbols using a random projection and binarization, and then all the symbols carry about equal information. I don’t know if that would get you to neural network level performance, but it is something to think about.
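A minimal sketch of that projection step (the projection width k and the image size are arbitrary assumptions):

```python
import random

random.seed(0)
k, pixels = 16, 8 * 8  # k random directions, 8x8 "images"
projection = [[random.gauss(0, 1) for _ in range(pixels)] for _ in range(k)]

def image_to_symbol(image_flat):
    """Project a flattened image onto k random directions and threshold
    at zero; the resulting k-bit pattern is treated as one discrete
    symbol suitable for a sequence learner."""
    bits = "".join(
        "1" if sum(w * x for w, x in zip(row, image_flat)) > 0 else "0"
        for row in projection
    )
    return int(bits, 2)

# A stream of (random stand-in) images becomes a stream of symbols.
stream = [image_to_symbol([random.gauss(0, 1) for _ in range(pixels)])
          for _ in range(5)]
```

With zero-mean random projections, each bit is close to a fair coin flip, which is why the resulting symbols carry roughly equal information regardless of where they sit in the stream.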