Hebbian Learning



I’ve been spending some time breaking down cortical theory into small pieces. I believe Hebbian learning is the basis of cortical function. So I’ve produced a simulation to illustrate this fundamental operation.

The idea is that when a pattern of activity (shown in the top row of cells) consistently activates the post-synaptic cell (the single cell at the bottom), the synaptic connections between them strengthen, while those that do not contribute to the post-synaptic activity weaken.

As I understand it, dendrites grow synapses to any cell within their target layer, without any bias or preference. The potential pool of synaptic connections is therefore arbitrary, so the cell can learn to activate to any arbitrary pattern of cells. But when a consistent pattern is repeated enough (as you will notice with the activation of the last 4 cells), the synaptic connections tune to that specific pattern. By the end of learning you’ll notice that the cell activates only to that pattern and no other. In the visualization, the opacity of each connection represents the permanence of that synaptic connection. Although all 4 cells are shown activating the target cell, only a subset would be needed due to sub-sampling; sub-sampling is more efficient with larger pools.
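To make the operation concrete, here is a minimal sketch of permanence-based Hebbian learning on a single segment with 8 synapses. The constants (thresholds, increments) are my own illustrative choices, not values from the simulation:

```python
import random

N_SYNAPSES = 8
PERM_CONNECTED = 0.5      # a synapse transmits only above this permanence
PERM_INC, PERM_DEC = 0.05, 0.02
ACTIVATION_THRESHOLD = 3  # segment fires if >= 3 connected synapses are active

random.seed(42)
permanences = [random.uniform(0.3, 0.7) for _ in range(N_SYNAPSES)]

def segment_active(input_bits):
    overlap = sum(1 for bit, perm in zip(input_bits, permanences)
                  if bit and perm >= PERM_CONNECTED)
    return overlap >= ACTIVATION_THRESHOLD

def learn(input_bits):
    # Hebbian rule: when the cell fires, synapses on active inputs are
    # reinforced and synapses on inactive inputs are weakened.
    if not segment_active(input_bits):
        return
    for i, bit in enumerate(input_bits):
        if bit:
            permanences[i] = min(1.0, permanences[i] + PERM_INC)
        else:
            permanences[i] = max(0.0, permanences[i] - PERM_DEC)

# Repeat a consistent pattern: the last 4 cells active.
for _ in range(50):
    learn([0, 0, 0, 0, 1, 1, 1, 1])

print(segment_active([0, 0, 0, 0, 1, 1, 1, 1]))  # True: tuned to the pattern
print(segment_active([1, 1, 1, 1, 0, 0, 0, 0]))  # False
```

After enough repetitions the contributing synapses saturate and the rest decay to zero, so the segment responds to the learned pattern only.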

I wanted to post this to provide some visual analogy for one of the fundamental principles in HTM. Also, I would appreciate any corrections/additions to my explanation.

(The dendrite diverges into 8 connections, representing the 8 synapses of its single segment. This model dendrite has 1 segment with 8 synapses.)


I think it varies somewhat though, based on the region of the neocortex you’re in. Also, obviously there is a physical limit to how far a neuron can reach other neurons, which also matters for the long-term goal of making a true copy of the brain. (Jeff Hawkins mentioned this in his video on the limitations of HTM theory, which you can watch on YouTube.)

I wonder why you don’t have many individual dendrites going into the bottom cell, but instead have one dendrite going from the bottom cell that then connects all the others.

By the way, I saw your comment on the “Magic Hebbian Learning” video. I don’t know why that video has so many dislikes.


Yup, this is a model of a distal dendrite, so it can only span out so far, which results in local processing.

[quote=“Addonis, post:2, topic:1763”]
I wonder why you don’t have many individual dendrites going into the bottom cell, but instead have one dendrite going from the bottom cell that then connects all the others.
[/quote]

I should have made it clearer in the explanation that the single line coming from the single cell represents the dendrite, and the divergent lines to the 8 cells represent the synaptic connections of that segment. Essentially it models a distal dendrite. However, the process is very similar for a proximal dendrite that learns spatial patterns from input.

Proximal Dendrites:

The interesting thing is that although Hebbian learning is a general principle, it applies to both spatial and temporal memory; it all depends on where the dendrites of a layer project. Layer 4 projects to the inputs, forming spatial memory. Layer 4 also projects to other cells in layer 4, forming temporal memory. Layers 2/3 project to layer 4, forming temporal pooling memory. The same operation in different configurations.


Brain Dump:

In spatial pooling, all columns compete with surrounding local columns using inhibition. The columns that survive the aggregate inhibition have their cells activate. Local competition between columns and cells has many functions beyond spatial pooling. It also plays a big part in attractor networks for memory-completion and decision-making.

Competition is important for Hebbian learning. If a number of cells shared the same potential pool, they would end up representing the same patterns.

Even if different consistent patterns were presented, the cells would still all converge on the same one.

Even if each dendrite had different permanence values, so that each cell started from different initial conditions (and could, in principle, represent a different pattern), they would still tend to represent the same pattern.

If the cells had inhibitory connections to each of their neighbors, they would each compete to represent a different pattern. When a cell wins at representing a pattern, it inhibits its neighboring cells, preventing them from representing the same pattern. Self-organization occurs.
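A toy sketch of this competition, assuming a simple winner-take-all form of inhibition (only the best-matching cell learns on each input; all names and constants here are my own illustrative choices):

```python
import random

random.seed(1)
N_INPUTS = 8
N_CELLS = 3
PERM_INC, PERM_DEC = 0.05, 0.02

# Every cell starts with random permanences over the same potential pool.
cells = [[random.uniform(0.0, 1.0) for _ in range(N_INPUTS)]
         for _ in range(N_CELLS)]

def overlap(perms, input_bits):
    return sum(p for p, bit in zip(perms, input_bits) if bit)

def compete_and_learn(input_bits):
    # Inhibition as winner-take-all: only the best-matching cell learns,
    # which also makes it worse at matching the other patterns.
    winner = max(range(N_CELLS), key=lambda c: overlap(cells[c], input_bits))
    for i, bit in enumerate(input_bits):
        if bit:
            cells[winner][i] = min(1.0, cells[winner][i] + PERM_INC)
        else:
            cells[winner][i] = max(0.0, cells[winner][i] - PERM_DEC)
    return winner

pattern_a = [1, 1, 1, 1, 0, 0, 0, 0]
pattern_b = [0, 0, 0, 0, 1, 1, 1, 1]
for _ in range(30):
    win_a = compete_and_learn(pattern_a)
    win_b = compete_and_learn(pattern_b)

print(win_a != win_b)  # True: different cells self-organize to each pattern
```

Because the winner for one pattern is weakened on all other inputs, a different cell wins the other pattern, and the assignments stabilize: self-organization through competition.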

It seems that throughout nature, competition is the primary driver of self-organisation and adaptation. Another term for competition (in the biological sense) is “natural selection”, which defines it as a selective system. A good example is clonal selection in the immune system. When defending against an invading infection, the immune system puts millions of different varieties of antibodies to the test. Those that ‘fit’ the bacteria or virus are then selected for reproduction through the system. And in the cortex, those cells that happen to have the best connection to an input pattern (even if the connection is weak) are selected to represent that pattern through reinforcement and local inhibition. (A pattern is never really represented by a single cell. A pattern is represented by many distributed cells, each outside of the others’ neighborhoods.)


You should be careful about assuming a learning model. There are many.
You could have temporally sparse snapshot learning where the neuron is non-active in terms of learning almost all the time, then just suddenly decides it is going to learn the next pattern that comes along.
There are variations on Winnow/Multiplicative Weight update learning.
Minimum probability flow batch learning algorithms could be possible with short term replay.
There are many others.


Thanks Sean for pointing this out. I’m curious about other learning models. Whenever I do research on the cortex I’ve only found Hebbian/STDP, which is why I’ve used it as the basis of my models.

Do you happen to have a link on temporally sparse snapshot learning? I wasn’t able to find much on Google. The others you’ve mentioned seem to be non-biological/non-cortical learning models (or are they?).


Okay, this picture helped me understand it better.

So dendrites are the outer rims of the cell that receive synapses or have outgoing axons connected to them. Axons then connect a dendrite to another nucleus, and the other end of an axon (on a new neuron) is called a synapse.

Sorry for dumb questions: Is this how NuPIC currently works, I mean with Hebbian learning, or does it use a completely different kind of learning model? Also, NuPIC currently uses only one of those layers, right? I think it was Layer 6? And does NuPIC currently use that competition you described?



My knee-jerk processing of this is that this is just part of the “normal” 2% sparsity found in the neocortex to begin with - and the reason for forming another type-distinction is just researchers being out of sync with accepted nomenclature?
…not saying this is true - just saying this is my suspicion (i.e. that “temporally sparse snapshot learning” is not a “real” thing)? Any other neuroscientist types want to weigh in? (@Sean_O_Connor I apologize up front for equivocating on this :stuck_out_tongue:)


That is a generic neuron. In the cortex there are more specialized pyramidal cells that have spiny dendrites. The HTM Neuron model shows the basal and apical dendrites as a set of segments.

NuPIC uses Hebbian (or Hebbian-like) learning. Although NuPIC does not explicitly implement the classical Hebbian rule, Hebbian learning is the basis of HTM theory. You can read more here.

So far only the input and layer 4 are modeled, for spatial pooling and temporal memory.

Competition is used between columns in spatial pooling, where local inhibition causes competition and sparsity.


Please keep in mind that all “Hebbian” means is that learning takes place via an increase or decrease in firing likelihood based on how frequently a path is “excited” or used. In classical ML “neurons”, learning is implemented by changing weights, while all connections remain static and unchanging; in HTM theory, learning is implemented via increases and decreases in “permanence”, and in addition the connections themselves can change (“grow”/“be culled”) to form new connections or remove existing ones.
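A hypothetical side-by-side sketch of the two mechanisms: classical weight updates over fixed connections versus HTM-style permanences with a connected threshold, where synapses can be culled and grown. All names and constants here are illustrative, not from any library:

```python
# Classical ML "neuron": connectivity is fixed; learning changes weights only.
weights = [0.2, -0.1, 0.7]

def classical_update(inputs, error, lr=0.1):
    # Connections never appear or disappear; only their weights change.
    for i, x in enumerate(inputs):
        weights[i] += lr * error * x

classical_update([1, 0, 1], error=0.5)

# HTM-style synapses: binary "connected" state (permanence >= threshold);
# weak synapses are culled and new ones can be grown.
PERM_CONNECTED = 0.5   # illustrative connection threshold
PERM_MIN = 0.05        # below this, the synapse is removed

synapses = {0: 0.6, 3: 0.45, 7: 0.04}   # presynaptic cell id -> permanence

def htm_update(active_cells, inc=0.1, dec=0.05):
    for cell in list(synapses):
        if cell in active_cells:
            synapses[cell] = min(1.0, synapses[cell] + inc)
        else:
            synapses[cell] = max(0.0, synapses[cell] - dec)
        if synapses[cell] < PERM_MIN:
            del synapses[cell]           # connection is culled
    for cell in active_cells:
        synapses.setdefault(cell, 0.21)  # a new connection can grow

htm_update({0, 3, 5})
print(sorted(synapses))  # cell 7 culled, cell 5 grown: [0, 3, 5]
```

The key structural difference is the last two steps: the HTM synapse set itself changes over time, whereas the classical weight vector has a fixed shape.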


Yup, indeed. The opacity of the synaptic connections in these visualizations represents the permanence values. Synapses that consistently contribute to the activation of the cell are reinforced (increased permanence), while those that don’t are weakened (decreased permanence). The dendrite segment therefore comes to represent a specific pattern (even when sub-sampled).

What is interesting about synapses is that they can relearn even without synaptic genesis/pruning. They can forget redundant patterns when new consistent patterns win over.


Cool, what do you think about the Blue Brain project? At their research lab they have a giant screen on which they look at the neurons that they simulate. https://www.youtube.com/watch?v=LS3wMC2BpxU


Yeah, it’s really awesome. They’ve learned a lot from their simulations. This is my favorite video from Henry Markram. The way he describes dendrite growth inspired some other simulations I’ve done.


That is very weird for me to comprehend. So the actual connection growth involves no chemicals or guidance, just random growing and bumping into each other? They state that there are exceptions to this, but this still sounds like something fundamental that impacts the whole design. Is this assumption still up to date?


Yeah, others have also come to the same (or similar) conclusions. The initial connections formed are arbitrary and not functionally useful. But through Hebbian learning the connections are then modified or pruned (based on activity patterns) to become functionally useful.


Then might I ask the reasoning behind potential synapse fields of HTM, especially to the Numenta people?

If this is the case, shouldn’t the potential fields actually include any synaptic target rather than a subset of cells or columns? Are the potential fields there to emulate the effect of proximity? In other words, is it because closer cells/columns have a higher chance of growing into and bumping into each other?


That’s the way I understand it. However, in simulations the potential pool can be arbitrary/sparse, and therefore need not conform to proximity. That is possible simply because there are no physical constraints on how far dendrites/axons can project in silico. So it seems proximity is not even essential (except perhaps for vision).


Here is what we know about synapses in the brain and how we model them in software for HTM.

Brain (facts, or at least things that many neuroscientists would agree with)

  • Pyramidal neurons are the primary cell type in the neocortex
  • In pyramidal cells, all excitatory synapses are on dendrites, none on the cell body
  • There are three different “integration zones” on a dendrite: proximal (close to the cell body), distal (farther from the cell body), and apical (at the end of the “apical shaft”)
  • The three zones have differing effects
  • The dendrite is like a tree with lots of branches
  • Branches typically have a few hundred synapses
  • Each branch acts like a coincidence detector. Patterns in large populations of cells can be recognized by forming synapses to a subset of the active neurons. Typically 10 to 20 synapses are sufficient
  • Individual synapses are stochastic, and can’t be relied on to always do what you expect, but a set of synapses is pretty reliable
  • Some synapses are stable but others are constantly forming and disappearing
  • An axon and a dendrite branch need to be near each other to form a new synapse, they don’t have to be touching. Glial cells can act as mediators and tell an axon and dendrite they should grow towards each other over short distances
  • A “potential synapse” is when a dendrite segment and axon are close enough that they could form a new synapse via Hebbian type learning
  • The tips of dendrites and the tips of axons grow, explore, and retract, so the morphology of the dendrites and axons do change over time. There is some evidence that if useful synapses form near the end of a dendrite or axon, then the axon and dendrite continue to grow from there. If no good synapses are formed then the tip might retract and grow elsewhere.
  • An individual dendrite segment cannot be expected to be near the axons of all the cells in a pattern it needs to learn, but if it can be near enough to 20 or so of the axons, that’s good enough. This proximity is somewhat random.
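The coincidence-detector and sub-sampling points above can be sketched numerically: a segment that forms synapses to only 20 cells of a sparse pattern still recognizes it reliably, because chance overlap with other sparse patterns is tiny. The sizes below are illustrative:

```python
import random

random.seed(0)
POPULATION = 2048      # cells in the layer (illustrative)
PATTERN_SIZE = 40      # ~2% sparsity
SAMPLE = 20            # synapses formed on a random subset of the pattern
THRESHOLD = 15         # segment fires if >= 15 sampled cells are active

pattern = set(random.sample(range(POPULATION), PATTERN_SIZE))
segment = set(random.sample(sorted(pattern), SAMPLE))  # sub-sample the pattern

def fires(active_cells):
    return len(segment & active_cells) >= THRESHOLD

other = set(random.sample(range(POPULATION), PATTERN_SIZE))
print(fires(pattern))  # True: all 20 sampled cells are in the learned pattern
print(fires(other))    # False: expected chance overlap is only ~0.4 cells
```

With 20 synapses out of 2048 cells, an unrelated sparse pattern overlaps the segment by about 20 × 40 / 2048 ≈ 0.4 cells on average, so a threshold of 15 gives essentially no false positives.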

How we model this in HTM

  • In HTM we model the growth of a new synapse using a “permanence” value
  • In SW we don’t have to worry about whether an axon and a dendrite are near enough to form a new synapse; typically we just pick a subset of the active cells to form synapses to.
  • However, sometimes we designate a subset of cells as being in a “potential synapse” pool. We pick this pool randomly and do this mostly to guarantee that all the cells that are learning don’t select the same subset of cells to form synapses to
  • We don’t model the proximity of two neurons, at least not explicitly. We may designate a set of potential synapses, then pick a subset of those to train; the rest are considered to have a permanence of zero, although in SW we only need to store the values of synapses with permanence above zero

I hope that is clear; if not, let me know and I will try to make it so.


@jhawkins, The detailed points are very much appreciated!

So this probably is the reason the NuPIC spatial pooler variable potentialPct has a default value of 0.5: to help with synapse diversity and prevent multiple columns from converging to exactly the same connectivity. Setting this variable to 1.0 would then nullify the point of having potential synapse pools in the first place.
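A toy sketch of that idea (this mirrors the role of the potentialPct parameter, but is not NuPIC code): each column draws a random 50% potential pool, so two columns see different subsets of the input:

```python
import random

random.seed(7)
INPUT_SIZE = 100       # illustrative input width
POTENTIAL_PCT = 0.5    # mirrors the NuPIC potentialPct default

def make_potential_pool(input_size, potential_pct):
    # Each column gets its own random subset of the input space.
    k = int(input_size * potential_pct)
    return set(random.sample(range(input_size), k))

pool_a = make_potential_pool(INPUT_SIZE, POTENTIAL_PCT)
pool_b = make_potential_pool(INPUT_SIZE, POTENTIAL_PCT)

# At potentialPct = 1.0 every pool would be the full input (identical);
# at 0.5 the pools differ, so columns cannot all learn the same connectivity.
print(len(pool_a), len(pool_b), pool_a == pool_b)
```

Since each pool excludes half the input at random, two columns almost never end up with identical pools, which is exactly the diversity argument above.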

For temporal memory, I would guess similar functionality is obtained through the maxNewSynapseCount variable, which indirectly subsamples the previous activation randomly, just like having a potentialPct value. So this is one extra reason to set maxNewSynapseCount lower than the number of active cells in a predicted scenario (a single cell active in every column): synapse diversity.

Does that sound correct?


Yes, that’s exactly right on both aspects.

Another reason we did the temporal memory that way is more practical: in the temporal memory there are so many cells (65K) that it would really slow things down if we had a potential pool like the spatial pooler’s. maxNewSynapseCount gives us essentially the same functionality with many fewer potential connections and allows implementations to go much faster.