Complementary Learning Systems theory and HTM as a theory of the hippocampus

HTM is intended to be a theory of the neocortex. But in some ways the algorithm seems to apply more appropriately to the hippocampus. In particular, both HTM and the hippocampus learn from experience online, and very quickly. In many theories and experiments, the neocortex on the other hand appears to learn gradually.

I recently reviewed this paper [1] summarizing and extending Complementary Learning Systems theory. Briefly: the neocortex has relatively dense activity and connections whose weights are generally updated slowly, whereas the hippocampus has very sparse activity and connections whose weights are updated rapidly. The hippocampus learns episodic memories in one shot; these are then replayed to the neocortex, interleaved with ongoing experience and during sleep and rest, in order to gradually update the less plastic neocortical synapses.
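
To make that division of labor concrete, here's a minimal sketch in Python/NumPy of the two learners as I read the theory. All names, sizes, and learning rates are hypothetical, and the "cortex" here is just a linear regressor standing in for any slow parametric model:

```python
import numpy as np

rng = np.random.default_rng(0)

class FastEpisodicStore:
    """Hippocampus-like: stores (input, target) episodes in one shot."""
    def __init__(self):
        self.episodes = []

    def store(self, x, y):
        self.episodes.append((x, y))  # a single exposure suffices

    def replay(self, k):
        idx = rng.integers(0, len(self.episodes), size=k)
        return [self.episodes[i] for i in idx]

class SlowParametricLearner:
    """Neocortex-like: a linear model nudged with a small learning rate."""
    def __init__(self, dim, lr=0.01):
        self.w = np.zeros(dim)
        self.lr = lr

    def update(self, x, y):
        err = y - self.w @ x
        self.w += self.lr * err * x  # gradual; interference-prone on its own

dim = 32
hippo = FastEpisodicStore()
cortex = SlowParametricLearner(dim)

# Online loop: every episode is stored once, and the slow learner trains
# on it interleaved with replayed old episodes (awake replay / sleep).
for t in range(200):
    x = rng.normal(size=dim)
    y = float(np.tanh(x[0] + 0.5 * x[1]))  # stable structure to extract
    hippo.store(x, y)
    for xr, yr in [(x, y)] + hippo.replay(k=5):
        cortex.update(xr, yr)
```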

This is supported by the observed sparsity in the various regions (~10% in many upper regions of neocortex, ~4% in CA1 of hippocampus, ~2.5% in CA3 of hippocampus, ~0.5-1% in dentate gyrus of hippocampus), the highly-plastic synapses in the hippocampus compared to the neocortex, the severe learning impairments of hippocampus-lesioned animals, and theoretical problems with attempting to support both generalization ability and fast learning free from catastrophic interference (also called catastrophic forgetting).

In many respects the hippocampus can be seen as a primordial neocortex from which the true neocortex evolved, and there is preserved structural similarity between the two (at least in CA3/CA1) in terms of lamination, cell types, and so on.

It’s clear to me as someone who does a lot of “traditional” machine learning that some sort of high-plasticity episodic memory needs to be combined with a low-plasticity generalizing memory, so I do expect intelligent online learning agents to eventually require both of these. But can a fast-learning system like HTM successfully generalize while preserving its one-shot capability? I’m leaning toward HTM being an effective theory of CA3/CA1, with slower-updating neocortex (while surely preserving some of the insights of HTM) ultimately having more in common with current deep networks.

Thoughts? I recommend reading the paper in any case; it’s a great modern take on a long-standing theory of the systems-level learning mechanisms in animals.


[1] Kumaran, Dharshan, Demis Hassabis, and James L. McClelland. “What learning systems do intelligent agents need? Complementary learning systems theory updated.” Trends in Cognitive Sciences 20.7 (2016): 512-534.


First, great post. My hippocampus has no rapid, one-shot thoughts to add yet. Here’s a link to the cited paper for anyone else curious.


Thank you for the link to such an interesting paper.

I noticed one point worth highlighting: hippocampal memories lack structure.
This makes me wonder: could it be that the hippocampus is the way to quickly absorb as-yet-unstructured data (data still to be made sense of), and that the “replaying” of such memories serves not only to consolidate them in the cortex, but also to make sense of them?
This would match my understanding that the hippocampus is “called” when there is a piece of perception the cortex doesn’t manage to make sense of.

This might be useful because most current applications of HTM learn from a stream of the same type of data. In real life that is true for sensory areas but not for higher ones; often we have to learn from a single episode in our lives. This mechanism would allow cortical synapses to learn a lot from a single piece of sensory data.

Looks interesting. That would explain a lot. I haven’t gotten a chance to read through the paper yet, but thanks for the reference. I’ve tried coming up with a hippocampal model myself, but I’ve been much more focused on the basal ganglia, and I haven’t read much on the exact properties of hippocampal neurons aside from the fact that they seem similar to cortical neurons. What about the different pathways in the hippocampus, though? Any ideas on what they’re doing?

I remember at one point having an idea that one of the regions might form something similar to a D-latch in digital logic (proximal inputs acting like the data input, apical inputs acting like the write-enable input), but it’s been quite a while since I looked into it, and I’ll have to recheck how much of that lines up with the neuroscience.
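
For what it’s worth, here’s a toy Python sketch of that D-latch analogy. The threshold and the scalar treatment of the inputs are entirely hypothetical; it’s only meant to show the write-enable behavior I have in mind:

```python
def d_latch_cell(state, proximal, apical, write_threshold=0.5):
    """Toy D-latch analogy (hypothetical, not a claim about real circuits):
    the apical input acts as the write-enable line. While it is high, the
    cell's stored state tracks the proximal ('data') input; while it is
    low, the old state is held regardless of proximal drive."""
    if apical > write_threshold:
        return proximal   # transparent: latch follows the data line
    return state          # opaque: latch holds its previous value

state = 0.0
for proximal, apical in [(1.0, 0.9), (0.0, 0.1), (0.0, 0.9), (1.0, 0.1)]:
    state = d_latch_cell(state, proximal, apical)
    print(proximal, apical, "->", state)
# states: 1.0 (written), 1.0 (held), 0.0 (written), 0.0 (held)
```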

Taking a quick look at some online hippocampal diagrams, it seems like the pathways generally go EC -> DG -> CA3 -> CA1 -> EC. Based on the sparsity measurements you mentioned, that would mean the hippocampus converts the cortical output to an extremely sparse SDR right away, and then gradually builds denser representations before sending the result back to the cortex.
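
Here’s a quick toy sketch of that loop in Python/NumPy, using random projections for the pathway weights (purely hypothetical) and a k-winners step at each stage set to the sparsity figures quoted above:

```python
import numpy as np

rng = np.random.default_rng(0)

def k_winners(x, sparsity):
    """Keep only the top-k units active, with k set by the target sparsity."""
    k = max(1, int(sparsity * x.size))
    out = np.zeros_like(x)
    out[np.argsort(x)[-k:]] = 1.0
    return out

n = 500
# Random projections standing in for the pathway weights (hypothetical).
W = {p: rng.normal(size=(n, n)) / np.sqrt(n)
     for p in ["EC->DG", "DG->CA3", "CA3->CA1", "CA1->EC"]}

# Sparsity levels quoted earlier in the thread (~1% DG, ~2.5% CA3,
# ~4% CA1, ~10% neocortex).
stages = [("DG", "EC->DG", 0.01), ("CA3", "DG->CA3", 0.025),
          ("CA1", "CA3->CA1", 0.04), ("EC out", "CA1->EC", 0.10)]

act = k_winners(rng.normal(size=n), 0.10)     # cortical input at ~10%
for name, path, sp in stages:
    act = k_winners(W[path] @ act, sp)        # sparsify, then re-densify
    print(f"{name}: {act.mean():.3f} active")
```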

That’s precisely correct. The sparsity of the regions also appears to correlate with the plasticity of their synapses, so in addition to being re-densified on the way back to the cortex, the representations get “re-generalized” through less plastic synapses that can structure the data in a more parametric (as opposed to episodic) way.

In addition to the pathway you mentioned, there’s a pathway that skips the DG, going right from EC to CA3. This is considered a slower pathway by default, so one idea of its function is that a novel pattern in EC will first trigger a fast sparsification in DG and the formation of a new episodic memory sequence in CA3 (pattern separation). If, on the other hand, a pattern is familiar, the previously strengthened EC-to-CA3 connections will activate before DG has a chance to respond, reactivating the familiar old episodic memory sequence (pattern completion). A detailed spiking model of this theory is presented in [2], which calls this a “race to learn”.


[2] Nolan, Christopher R., et al. “The race to learn: spike timing and STDP can coordinate learning and recall in CA3.” Hippocampus 21.6 (2011): 647-660.
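
To illustrate the race, here is a toy Python sketch. This is a caricature of the outcome, not Nolan et al.’s spiking model; the sizes and the 0.5 familiarity threshold are hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)
n, sparsity = 1000, 0.025        # CA3 size and ~2.5% sparsity from above
ca3_memories = []                # stored (EC cue, CA3 code) episodes

def race_to_learn(ec_pattern, match_threshold=0.5):
    """Toy race: the direct EC->CA3 path looks for a stored episode whose
    EC cue overlaps the input strongly enough to reactivate it; otherwise
    the DG path wins and a fresh, very sparse CA3 code is stored."""
    for ec_cue, ca3_code in ca3_memories:
        if len(ec_cue & ec_pattern) / len(ec_cue) >= match_threshold:
            return ca3_code, "pattern completion"
    ca3_code = set(rng.choice(n, int(sparsity * n), replace=False).tolist())
    ca3_memories.append((ec_pattern, ca3_code))
    return ca3_code, "pattern separation"

cue = set(rng.choice(n, 100, replace=False).tolist())
print(race_to_learn(cue)[1])     # novel input -> pattern separation
print(race_to_learn(cue)[1])     # same input -> pattern completion
```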

Related: this new paper was just published: “Building concepts one episode at a time: The hippocampus and concept formation” (unfortunately I forgot who brought it to my attention).


Hi,
You should look at TVA (Theory of Visual Attention) by professor Claus Bundesen at the University of Copenhagen’s Department of Psychology. He has developed exactly this kind of race formula for object foreground and background.

Dead link. Everything I could find online was behind a paywall.

Earlier this year @jhawkins reviewed that paper in a research meeting.

At 6:26 Jeff mentions another paper that offers an alternative account of memory consolidation from the hippocampus. I wonder if this is the one he meant (by Alison R. Preston and Howard Eichenbaum):


Ok, I get the hippocampus and cortex as two memory systems.

But we have to bring in the other major learning system: The cerebellum.


Does that exclude the basal ganglia from that learning system?

The hippocampus may detect surprise, but can the basal ganglia detect the lack of a prediction (an anxiety broadcast release) without needing to learn alongside the neocortex? Wouldn’t the lack of a good prediction in a multi-column HTM network require some type of cross-column detection of a missing next output from a column? In biology, 20 correct column activations and 2 missing is not necessarily an error as such, just a degree of uncertainty or a missed throw. Are the basal ganglia learning column desynchronisation as predictability declines? Is that predictability context-based (column-dependent or agnostic)?

Is that only the case if you assume that all HTM columns have to be interlinked and temporally coherent? What if they are not as temporally coherent as the current approach to HTM assumes?


If you look at William H. Calvin’s theoretical neuroscience work — particularly the ideas he’s developed around distributed cerebral codes and Darwinian mechanisms in cortex — there’s a strong conceptual resonance with Hierarchical Temporal Memory even though the traditions come from different communities (neuroscience vs AI).

In Calvin’s model, cortex doesn’t rely on single “grandmother” neurons or simple lookup tables. Rather, representations emerge from large populations of local elements interacting in parallel, with stochastic variations and competitive selection shaping which patterns stabilize and propagate. Over time, local operations — copying, variation, and selection among slightly different pattern variants — tend to produce distributed representations that are both sparse and meaningful across the network. This is why Calvin often uses the metaphor of a “Darwin Machine” in cortex: each local microcircuit can be seen as generating and competing variations, and the winners form the building blocks of higher-level concepts.
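
As a caricature (not Calvin’s actual model), that copy/vary/select loop can be sketched in a few lines of Python; the population size, mutation scale, and resonance score here are all hypothetical:

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical sheet of competing local pattern variants.
n_sites, dim = 50, 64
population = rng.normal(size=(n_sites, dim))
x = rng.normal(size=dim)              # the input to be represented

for generation in range(100):
    scores = population @ x           # 'resonance' of each variant (toy)
    order = np.argsort(scores)
    winners, losers = order[-10:], order[:-10]
    # Copy with variation: losers are overwritten by mutated clones of
    # randomly chosen winners, the core loop of a 'Darwin Machine'.
    parents = rng.choice(winners, size=losers.size)
    population[losers] = (population[parents]
                          + 0.1 * rng.normal(size=(losers.size, dim)))

print(np.mean(population @ x))        # population converges on good resonators
```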

This contrasts with classical neural models where learning is purely gradient descent or static clustering (e.g., standard K-means). Instead, Calvin’s view is that the cortex itself discovers good representations through a bottom-up, locally competitive process that naturally yields distributed encoding — similar in spirit to how sparse distributed representations (SDRs) underpin HTM. SDRs in HTM are not arbitrary dense vectors; they are high-dimensional patterns where semantic meaning is distributed across a small active subset of bits. That distributed structure gives HTM robustness and overlap semantics.

What I find compelling in William H. Calvin’s work is that the basic computational unit is effectively the same as in HTM: the mini-column. In both frameworks, learning and representation are not properties of individual neurons, but of small, repeating cortical modules that participate in larger population codes.

Where Calvin’s proposal diverges is not in the unit itself, but in how competition and coordination occur between nearby mini-columns.

In standard HTM, the Spatial Pooler can be viewed (loosely) as implementing a k-winners-take-all competition: inhibition selects a sparse subset of columns based on overlap scores, with a largely algorithmic notion of “winner selection.” In Calvin’s model, the same sparsification pressure arises instead from biological lateral interactions, with inhibitory control mediated by interneuron classes (notably chandelier cells) rather than a global or quasi-global normalization step.

This difference matters. Winner selection of the Spatial Pooler kind is fundamentally global within a pool: all columns compete simultaneously against a shared criterion. Chandelier-mediated inhibition, by contrast, is local, directional, and geometry-constrained. Competition occurs over the physical span of lateral connections (on the order of ~7 mini-columns), not across the entire representational field. The result is still sparsity, but it is emergent from local dynamics, not imposed by a centralized selection rule.
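
A toy Python sketch of the contrast, on a 1-D strip of columns (the radius and overlap scores are hypothetical; real lateral connectivity is 2-D and roughly hexagonal):

```python
import numpy as np

def global_kwta(overlaps, k):
    """Spatial-Pooler-style competition: one shared criterion for the
    whole pool; exactly the k best columns win, wherever they sit."""
    active = np.zeros_like(overlaps)
    active[np.argsort(overlaps)[-k:]] = 1.0
    return active

def local_inhibition(overlaps, radius=3):
    """Calvin-style competition (toy): a column wins only if it beats its
    neighbors within the reach of lateral connections (~7 mini-columns
    for radius=3). Sparsity emerges from geometry, not from a global k."""
    n = overlaps.size
    active = np.zeros(n)
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        if overlaps[i] >= overlaps[lo:hi].max():
            active[i] = 1.0
    return active

rng = np.random.default_rng(0)
overlaps = rng.random(100)
print(global_kwta(overlaps, k=5).sum())   # always exactly 5 winners
print(local_inhibition(overlaps).sum())   # ~n/(2*radius+1) winners, emergent
```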

Importantly, this does not eliminate distributed representation. Like the Numenta 1000 Brains model, a stable representation is formed by the collective state of many mini-columns. The key distinction is how lateral connections are used. In the 1000 Brains framework, lateral connections primarily serve to align object models across cortical regions. In Calvin’s formulation, lateral connections are instead intrinsic to the formation of the representation itself — shaping which nearby variants survive through local competition and reinforcement.

So while this can be seen as a modification of the 1000 Brains idea, it preserves the same core principle: a globally meaningful, distributed representation emerges entirely from local operations. “Local” here simply means the reach of lateral connectivity between neighboring mini-columns. No column ever needs global knowledge of the representation; it only responds to its inputs and its immediate cortical neighborhood.

From an HTM perspective, this suggests an alternative way to think about the Spatial Pooler: not as a clustering algorithm approximating biology, but as a higher-level abstraction of mechanisms that, in cortex, may be implemented through dense lateral connectivity and biologically plausible inhibitory circuits. The end result — sparse distributed representations with semantic overlap — is the same. The path to get there is different, and arguably closer to cortical reality.


Gemini on more recent stuff:

The “Natural Continual Learning” (NCL) paper (Kao, Jensen, et al.) and the related attractor-geometry work (Xie et al.) together provide the mathematical “how” for William Calvin’s “Darwin Machine”: they explain how a system can continuously evolve new representations without destroying the old ones (“catastrophic forgetting”).

Here is the breakdown of that work and why it resonates so strongly with the Calvin/HTM perspective:

1. The Core Paper: “Natural Continual Learning” (NCL)

Paper: Natural Continual Learning: Success is a Journey, not (just) a Destination (Kao, Jensen, et al., NeurIPS 2021; extended analysis 2023/24).

This work attacks the central problem of the “Darwin Machine”: How do you keep the “winning” clones (memories) stable while using the same hardware to compete for new concepts?

  • The “Null Space” Projection: Standard neural networks update all weights to minimize error, often overwriting old tasks. NCL introduces a mechanism that strictly calculates the “Null Space” of previous tasks—the directions in synaptic space that do not affect the output of old memories.

  • The Mechanism: It forces all new learning (the “variation” in Calvin’s sense) to happen only in this Null Space (a minimal sketch of the projection follows this list).

    • Calvin’s Parallel: This is the mathematical equivalent of Calvin’s “interstitial” learning. The “winners” of the previous generation (stiff synapses) are locked in; new variants (plastic synapses) must compete in the remaining degrees of freedom.
  • Biological Equivalent: They map this to Metaplasticity (synaptic stiffness). In the cortex, synapses that code for stable, winning patterns become chemically resistant to change (high stiffness), forcing new learning into the “silent” or less active synapses (high plasticity).
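
Here is a minimal sketch of that null-space projection in Python/NumPy. It shows only the geometric core described above; the full NCL algorithm wraps this kind of constraint in a natural-gradient (trust-region) update, and the matrix of “old-task activations” here is a stand-in:

```python
import numpy as np

def null_space_projector(A, tol=1e-10):
    """Projector onto the null space of A's rows. If the rows of A are
    inputs seen on old tasks, any weight change P @ g leaves the
    old-task outputs A @ w unchanged."""
    # SVD-based null space: directions with (near-)zero singular values.
    _, s, vt = np.linalg.svd(A, full_matrices=True)
    rank = int((s > tol).sum())
    null_basis = vt[rank:]            # rows spanning null(A)
    return null_basis.T @ null_basis  # P = N N^T

rng = np.random.default_rng(0)
dim = 20
A_old = rng.normal(size=(5, dim))     # activations from old tasks (stand-in)
P = null_space_projector(A_old)

w = rng.normal(size=dim)
g = rng.normal(size=dim)              # raw gradient for the new task
w_new = w + 0.1 * (P @ g)             # learn only in the null space

# Old-task outputs are untouched (up to numerical noise):
print(np.allclose(A_old @ w, A_old @ w_new))   # True
```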

2. The “Attractor” Connection (Xie et al.)

Paper: The Geometry of Sequence Working Memory in Prefrontal Cortex (Xie et al., 2022/23).

While NCL handles the synapses, this work explains the dynamics—specifically validating the hexagonal/grid-like codes you mentioned.

  • Manifold Attractors: Xie and colleagues showed that the brain doesn’t store sequences as discrete links (A → B → C) but as trajectories on a low-dimensional manifold.

  • The “Hexagonal” Link: They found that these manifolds often take the form of twisted toroids or grid-like structures. This confirms Calvin’s hunch: the “code” isn’t a single neuron firing; it is a stable, geometric attractor state (a “crystal” of activity) maintained by local lateral inhibition.

  • Significance: This suggests that the “mini-column” competition doesn’t just produce a winner; it produces a stable location on a representational map (a toy bump-attractor sketch follows below).
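
As a toy illustration of such an attractor state, here is a bump attractor on a ring in Python/NumPy (all parameters hypothetical): local excitation plus broader inhibition holds a localized “crystal” of activity in place after the cue is removed:

```python
import numpy as np

n = 100
theta = np.linspace(0, 2 * np.pi, n, endpoint=False)

# Local excitation minus broader inhibition on a ring (toy parameters).
d = np.abs(theta[:, None] - theta[None, :])
d = np.minimum(d, 2 * np.pi - d)
W = 1.5 * np.exp(-(d / 0.3) ** 2) - 0.5

rate = np.exp(-((theta - np.pi) / 0.3) ** 2)   # transient cue at theta = pi
for _ in range(200):                            # cue is gone; recurrence only
    rate = np.clip(np.tanh(W @ rate), 0.0, None)

# A localized bump persists with no input: the memory is a stable point
# on a continuous map, not the firing of one labeled neuron.
print(theta[np.argmax(rate)])                   # stays near pi (~3.14)
```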
