HTM learning algorithm - temporal memory

I have a few questions on a particular topic, corresponding to the HTM algorithm:

When no predictive cell is active, all the cells of the column burst. How is the winner cell chosen? The paper and video give some clarity, but I still have some doubts I would like to clear up.

Let us suppose (only for the sake of example and discussion of the current topic) there are 10 columns with 10 cells each, so 100 total cells in the system.

  1. Each cell contains several segments. So each segment would be a binary vector of 100 dimensions, 1 denoting an active synapse, and 0 denoting an inactive one, correct?
  2. If no segment is active, I pick the segment with the highest number of active synapses, and the cell corresponding to that segment as the winner cell - is that correct? So suppose column 2 bursts, in the absence of any predictive neuron. Suppose col. 2 cell 5 contains 6 segments, and segment 1 has 13 active synapses, the highest across all cells in col. 2. Then I pick segment 1, and the corresponding cell 5 in column 2, as the winner cell.
  3. Do I increase the permanences on cell 5 (the winner cell), or should I increase the permanences on all the bursting cells?



Why Neurons Have Thousands of Synapses, a Theory of Sequence Memory in Neocortex

If a winning column was unpredicted, we need to select one cell that will represent the context in the future if the current sequence transition repeats. To do this we select the cell with the segment that was closest to being active, i.e., the segment that had the most input even though it was below threshold.


Conceptually, yes. Practically, since the vector would be extremely sparse, it’s implemented as a list of entries, each consisting of a pointer (or index) to the presynaptic cell and a permanence value.
The details depend on how the system as a whole is implemented and on the platform or language you’re implementing it in. But it would be severely impractical for a segment to track every cell in the system.
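A minimal sketch of that sparse representation (all names and thresholds here are illustrative, not NuPIC's actual API): a segment keeps only the synapses that exist, as (presynaptic cell index, permanence) pairs.

```python
# A segment stored sparsely: only existing synapses are kept, each as
# (presynaptic cell index, permanence). Names/values are illustrative.
segment = [(4, 0.31), (17, 0.55), (63, 0.48)]  # 3 synapses out of 100 possible cells

CONNECTED_PERM = 0.5  # permanence threshold for a synapse to count as "connected"

def active_connected_synapses(segment, active_cells):
    """Count synapses that are both connected and point at an active cell."""
    return sum(1 for cell, perm in segment
               if perm >= CONNECTED_PERM and cell in active_cells)

print(active_connected_synapses(segment, {17, 63}))  # only (17, 0.55) qualifies -> 1
```

With 100 cells and a handful of synapses per segment, this list form stays tiny, whereas a dense 100-dimensional binary vector per segment would waste most of its entries.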

Yes, exactly. The cell that has the segment with the most active potential synapses.
If no such segment exists, the winner cell is the cell with the fewest distal segments so far; if there’s a tie, choose a random cell among them.

According to the standard specification, I think only the winner cell’s permanences would change.
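The selection rule described above could be sketched like this (a hypothetical implementation, assuming cells are dicts holding a list of segments, each segment a list of (presynaptic cell, permanence) pairs):

```python
import random

# Hypothetical winner-cell selection for a bursting column.
# Potential synapses count toward the overlap regardless of whether
# their permanence crosses the connected threshold.

def num_active_potential(segment, prev_active):
    return sum(1 for cell, _perm in segment if cell in prev_active)

def choose_winner(column_cells, prev_active, rng=random):
    best_cell, best_overlap = None, 0
    for cell in column_cells:
        for seg in cell["segments"]:
            overlap = num_active_potential(seg, prev_active)
            if overlap > best_overlap:
                best_cell, best_overlap = cell, overlap
    if best_cell is not None:
        return best_cell  # cell owning the best-matching segment
    # No segment overlaps at all: pick among cells with the fewest segments.
    fewest = min(len(c["segments"]) for c in column_cells)
    candidates = [c for c in column_cells if len(c["segments"]) == fewest]
    return rng.choice(candidates)
```

In the worked example from the question, cell 5's segment 1 (13 active potential synapses, the column's best) would make cell 5 the winner.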

Yes, I saw the description, but I still do not understand some things.
Firstly, what does it mean that such a segment does not exist? Is it as follows:

For each active cell i
    For each segment j
        For each synapse k
            k is inactive

Is that the meaning?

When the cell with the fewest segments is picked, a new segment is created. Which cells does that new segment connect to? Or is the segment created with random connections?

Putting it more technically, “if there’s no segment that has potential synapses that sufficiently overlaps with the active cells from the previous timestep.”

A random subset of the winner cells from the previous timestep.

So then there is no guarantee that the new segment has potential synapses that overlap with the active cells from the previous timestep, right?

Sorry, I don’t seem to understand what you’re saying.
The new distal segment would absolutely have potential synapses that overlap completely with the active cells from the previous timestep, as its targets would be a subset of them.
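That "overlap by construction" property can be seen in a small sketch (function name, sample size, and initial permanence are assumptions for illustration):

```python
import random

# Sketch of growing a new distal segment. Its potential synapses are
# sampled from the previous timestep's winner cells, so every new
# synapse necessarily points at a cell that was active in the prior
# context.

SAMPLE_SIZE = 32      # target number of synapses per segment (illustrative)
INITIAL_PERM = 0.21   # starting permanence for new synapses (illustrative)

def grow_segment(prev_winner_cells, rng=random):
    k = min(SAMPLE_SIZE, len(prev_winner_cells))
    sampled = rng.sample(sorted(prev_winner_cells), k)
    return [(cell, INITIAL_PERM) for cell in sampled]

seg = grow_segment({3, 8, 15, 42})
# Every synapse targets one of the previous winner cells.
assert all(cell in {3, 8, 15, 42} for cell, _ in seg)
```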

Sorry, I misunderstood. Thanks! That’s right.


Is this not the manner in which a cell becomes predictive: when some segment has sufficient synapses overlapping with the previous timestep’s active cells? By "not sufficient" I mean no segment crosses the threshold. If no cell becomes predictive, then how is the segment chosen? Obviously none of these segments have sufficient overlap with the active cells (i.e. none are active yet).

I meant active potential synapses, as in a “synapse” whose permanence is below the threshold but which points to an active cell.

Your description is basically right.

The column burst because no cell was predicted, so the winner cell in essence is the cell which came closest to being predictive.

In the TM source code there’s a variable called, I think, BestMatchingSegment that’s used to decide the winner cell in bursting columns. Like your example:

This, however, deviates from the algorithm (though it may still be effective):

Each segment is a list of cells which are essentially monitored by the segment, so segments won’t all have the same dimension, since different segments can hold different numbers of cells at any given time.

Each segment has a maximum number of cells it can hold, and a maximum number of new cells that can be grown at once.

At each time step every segment is either Inactive, Active, or Matching, depending on how many of its monitored cells are currently active.

If enough cells are active the segment is Active and its cell is Predictive; if fewer are active the segment may be Matching (not enough to be Active but still significant). There are two thresholds on a segment’s active cell count, one for Active status and one for Matching.

When a column bursts because no cell was predicted (thus no Active segments on any of the column’s cells), the algorithm looks for the cell that best matches the prior input. If no cells match at all (say it’s only time step 3), a random cell is chosen among those with the fewest existing segments.
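The Inactive/Active/Matching distinction described above can be sketched as follows (threshold values are illustrative stand-ins for the hyper-parameters, not NuPIC's defaults):

```python
# Classify a segment at the current timestep. A segment is a list of
# (presynaptic cell, permanence) pairs; thresholds are illustrative.

ACTIVATION_THRESHOLD = 13   # connected active synapses needed for Active
MIN_THRESHOLD = 8           # active potential synapses needed for Matching
CONNECTED_PERM = 0.5        # permanence needed for a synapse to be connected

def segment_state(segment, prev_active):
    connected = sum(1 for cell, perm in segment
                    if perm >= CONNECTED_PERM and cell in prev_active)
    potential = sum(1 for cell, _perm in segment if cell in prev_active)
    if connected >= ACTIVATION_THRESHOLD:
        return "Active"      # its cell becomes Predictive
    if potential >= MIN_THRESHOLD:
        return "Matching"    # sub-threshold but still significant overlap
    return "Inactive"
```

A bursting column then means no cell had an Active segment; the winner is the cell with the best Matching segment, or, failing that, a random cell among those with the fewest segments.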


Thanks a lot.

Your reply, along with the linked document, makes it very clear.

Actually I had looked at the much older Whitepaper document, which was not clear at all. But thanks a lot; I think this is very useful.

I’ll come back here when I have further questions 🙂


More questions:

  1. How do we initialize a cell’s segments? How many segments does a cell have to begin with, and which synapses is each segment connected to?
  2. When punishment occurs and a synapse’s permanence drops to 0, is that synapse deleted from the segment? There is the idea of growing new synapses and segments; is it balanced in some way by pruning synapses?

Cells have no distal segments initially. For the spatial pooler part (i.e. proximal segments), on the other hand, a cell starts with one segment that points to about 80% of the input cells, and this segment is shared with the other cells in the same column. This is because the cells in the same column are modeled to respond in the same way to a proximal input pattern.
So for the temporal memory, a cell has no initial segments, thus no synapses, and it has to rely entirely on learning.

Yes, it’s usually implemented that way. And as you put it, this removal of synapses is balanced by growing new ones whenever the number of synapses pointing to active cells (whether above the permanence threshold or not) falls below the sub-sampling hyper-parameter (usually 32, I think).
The reason you would remove synapses is that they might be pointing to the wrong cells, ones that don’t contribute to recognizing the context and probably came from noise in the first place.
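The prune/grow balance might look like this in a sketch (constants and helper names are assumptions, not NuPIC's API):

```python
# Sketch of balancing synapse removal and growth on a learning segment.
# A segment is a list of (presynaptic cell, permanence) pairs.

SAMPLE_SIZE = 32       # target number of active potential synapses
PERM_DECREMENT = 0.1   # punishment step (illustrative)

def punish_and_prune(segment, prev_active):
    """Decrement synapses pointing at active cells; drop any that hit 0."""
    updated = []
    for cell, perm in segment:
        if cell in prev_active:
            perm = max(0.0, perm - PERM_DECREMENT)
        if perm > 0.0:       # a permanence of 0 means the synapse is deleted
            updated.append((cell, perm))
    return updated

def synapses_to_grow(segment, prev_active):
    """Grow back up toward SAMPLE_SIZE active potential synapses."""
    active_potential = sum(1 for cell, _ in segment if cell in prev_active)
    return max(0, SAMPLE_SIZE - active_potential)
```

Pruning removes synapses whose permanence bottoms out, and `synapses_to_grow` tops the segment back up toward the sub-sampling target, which is the balancing act described above.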


Thank you! This really helped. I have one last question before jumping into implementation.

For the first pattern in a sequence, how is the winner cell chosen? Is it chosen randomly? For example, in the sequence ABCX, proximal input activates the column corresponding to A, and the column bursts, as there is no prior context. How is the winner cell chosen?

The winner cell of a bursting column is:

And according to this logic, the cell with the fewest segments becomes the winner.


I have a second question:

For the growNewSegment routine, the segment initially has 0 synapses. So: newSynapseCount = (SAMPLE_SIZE - numActivePotentialSynapses(t-1, learningSegment))

The value numActivePotentialSynapses is always 0 then, right? Since a new segment, once grown, does not have any synapses yet.


For some implementations, when the new segments are created, the permanence values for the potential synapses are randomly initialized with a normal distribution centered on the permanence threshold. That way approximately 50% should already be connected.

But without an initial context signal to guide the synapse growth in the new segment, the dendrite should form a sort of random sampling of about half of the neurons in its local neighborhood. However, since the permanences are initialized to values very near the threshold, they should adapt very quickly toward actual input patterns.
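That initialization scheme could be sketched as follows (the function name and the spread of the distribution are assumptions; only the idea of centering on the connected threshold comes from the description above):

```python
import random

# Sketch: initialize new synapse permanences from a normal distribution
# centered on the connected threshold, so roughly half start connected.

CONNECTED_PERM = 0.5  # connected threshold (illustrative)

def init_permanence(rng=random, sigma=0.05):
    # Clamp into [0, 1] so later increments/decrements stay well-defined.
    return min(1.0, max(0.0, rng.gauss(CONNECTED_PERM, sigma)))

perms = [init_permanence() for _ in range(1000)]
connected_fraction = sum(p >= CONNECTED_PERM for p in perms) / len(perms)
# connected_fraction should hover around 0.5
```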

As an alternative, I’ve often thought that there should also be a null pattern that could be used to indicate the beginning and/or ending of a sequence. For text, that could be whitespace. For audio, it could be silence of a sufficient duration.