Why Neurons Have Thousands Of Synapses, A Theory Of Sequence Memory In Neocortex


Thank you @subutai for your response.


Hi @subutai,

What are active presynaptic cells?

“The matrices “delta D” are added to the current matrices of permanence values at every time step” means that delta D is added to matrix D. Is that correct?

In part 5 of the paper, eq. (6) does the reinforcing and punishing, and eq. (7) does something like punishing. Is that correct?


@Niki There are quite a few posts on the forum mentioning presynaptic cells. Our search page is quite good.


Thanks @rhyolight , I will check them.


If cell a connects to cell b (i.e. cell a forms a synapse onto cell b’s dendrite), then cell a is a presynaptic cell for cell b. Cell b will be a postsynaptic cell for cell a.

Yes. Delta D^d_ij will be added to D^d_ij

Yes, that’s right. Eq 7 is a much weaker punishment in case a cell was predicted but did not become active later.
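The three answers above can be sketched for a single dendritic segment stored as a flat list of permanences. This is a simplified reading of the rules as discussed in this thread, not the paper's exact equations or the NuPIC code: the function names, the parameter names `p_plus`, `p_minus`, `p_minus_minus`, and all numeric values are illustrative assumptions.

```python
def clip(x, lo=0.0, hi=1.0):
    """Keep a permanence value inside [lo, hi]."""
    return max(lo, min(hi, x))


def reinforce(perms, prev_active, p_plus=0.10, p_minus=0.02):
    """Eq. 6 style update (simplified): for every existing synapse
    (permanence > 0), add p_plus if its presynaptic cell was active at
    the previous time step, otherwise subtract p_minus."""
    return [
        clip(p + (p_plus if a else -p_minus)) if p > 0 else p
        for p, a in zip(perms, prev_active)
    ]


def weak_punish(perms, p_minus_minus=0.002):
    """Eq. 7 style update (simplified): a much smaller decrement applied
    to the existing synapses of a segment whose cell was predicted but
    did not become active. Assumes p_minus_minus << p_minus."""
    return [clip(p - p_minus_minus) if p > 0 else p for p in perms]


# One segment with three candidate presynaptic cells; the third has no synapse.
perms = [0.5, 0.5, 0.0]
prev_active = [True, False, True]   # which presynaptic cells were active

after_learning = reinforce(perms, prev_active)
after_false_prediction = weak_punish(perms)
```

In both functions the returned list plays the role of D + delta D: the increment is computed and added to the current permanences in one step.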


Thank you for your reply @subutai.


Hi @subutai,

Are the three states “Active, predictive, and nonactive” mutually exclusive?


No, an HTM Neuron can be active and predictive at the same time.
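One way to picture this is to store the two properties as independent flags rather than as a single three-valued state. The sketch below is only an illustration; `CellState` and `describe` are hypothetical names, not part of any HTM library.

```python
from dataclasses import dataclass


@dataclass
class CellState:
    active: bool = False      # the cell is active at this time step
    predictive: bool = False  # the cell is depolarized (predicted)


def describe(state):
    """Return a readable label; both flags may be set at once."""
    labels = []
    if state.active:
        labels.append("active")
    if state.predictive:
        labels.append("predictive")
    return "+".join(labels) or "nonactive"


both = CellState(active=True, predictive=True)
neither = CellState()
```

With this representation, "nonactive" is simply the case where neither flag is set.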


I’ve just started reading the paper, however I’m wondering what operation is denoted by the slash in the graphic.

I think it would become obvious later on, but it’s important to me while I’m still at pages 3-4 to already know it.


It just denotes a linear operation, as opposed to a threshold. The proximal synapses (green) tend to have a linear impact on the cell body. In this paper, we focused almost completely on the blue context synapses and didn’t discuss the proximal ones too much.


Hi, I am new to HTM community.

Several years ago, I read the HTM whitepaper published in 2010. Is there any update on the theory between that and this paper, which is published in 2016?


Yes, see our latest paper: Why Does the Neocortex Have Layers and Columns, A Theory of Learning the 3D Structure of the World.


Thanks, but I already know there were some updates to the theory between the 2016 paper and the latest paper, which is very interesting. I am asking what changed between 2010 and 2016, because the following page says that the whitepaper is obsolete.


@subutai, @jhawkins,

Sorry, but I am unhappy with the “matrices” D^d_ij and A being considered “matrices”. D is supposed to be, more or less, a matrix as indicated by the indices “ij”, and each element of this “matrix” is again supposed to be a matrix, ostensibly forming some sort of matrix of matrices or tensor of rank 4. We then see some operation of “element-wise multiplication with subsequent summation” being used in equations (4) to (7).

While, unlike with vectors, there is no exact mathematical definition of a “matrix” other than “some numbers written in a rectangular form”, using the notation does imply at least one thing: that the numbers in each of the columns are somehow related to one another in some meaningful relationship, and that the numbers in each of the rows are somehow related to one another in another relationship. In addition, a matrix usually represents a linear map and has some matrix multiplications happening for it.

As for the relationship of the columns in the matrix D (and by extension, each of the matrices D_ij and the matrix A), that’s easy: they are the cells in the cortical mini-columns, and their co-location in a mini/matrix column plays a role in some sort of inhibition process that activates only some of them on input in each iteration. As for the rows, however, they have no meaning. So this “D_ij matrices” business is a red herring.

Wouldn’t it be better to consider the entire set of cells in the temporal memory, and hence also the synapses on the dendrites, each to be a vector of length N x M? With e.g. 32 cells per mini-column, the first 32 elements of such a vector would represent the cells of the first mini-column and so on. The weird “element-wise multiplication with subsequent summation” would then be a simple vector dot product. And D would be a vector of vectors or simple matrix. Incidentally it seems that that is more or less how it’s implemented in the NuPIC reference implementation, albeit probably only for some reason of coding convenience. See e.g. the “segToCol” function.
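The equivalence claimed here is easy to check numerically. The sketch below uses plain Python lists, with rows as cells within a mini-column and columns as mini-columns; the helper names and the tiny example values are assumptions made for illustration only.

```python
def elementwise_sum(D, A):
    """Element-wise multiplication of two same-shaped matrices,
    followed by summation over all entries."""
    return sum(d * a for d_row, a_row in zip(D, A) for d, a in zip(d_row, a_row))


def flatten_by_minicolumn(matrix):
    """Flatten so that all cells of the first mini-column come first,
    then all cells of the second mini-column, and so on."""
    n_rows, n_cols = len(matrix), len(matrix[0])
    return [matrix[i][j] for j in range(n_cols) for i in range(n_rows)]


def dot(u, v):
    """Ordinary vector dot product."""
    return sum(x * y for x, y in zip(u, v))


# 2 cells per mini-column, 3 mini-columns.
D = [[0.0, 0.6, 0.0],   # permanences of one segment
     [0.3, 0.0, 0.7]]
A = [[0, 1, 0],         # binary activity of the same cells
     [1, 0, 0]]

as_matrix = elementwise_sum(D, A)
as_vector = dot(flatten_by_minicolumn(D), flatten_by_minicolumn(A))
```

Both computations visit the same pairs of entries, so they agree up to floating-point rounding; only the bookkeeping of indices differs.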

This notation would also highlight similarities to the matrix that is the spatial pooler (also called “D” in the spatial pooler paper) and its learning algorithms, and allow for further investigation of the algorithm’s relationship to linear algebra.

– Rik


Hi Rik,

I agree the math notation could be clearer. I actually started with the above as it’s closer to the code, but then moved away from it. I don’t remember the exact reason, but it could have been that the section “Computing cell states” became more complex as you need some way to index cells within minicolumns. Perhaps I could have used D^ijd instead of D^d_ij. What we really need is the equivalent of numpy.reshape in our math notation!
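The indexing problem mentioned above amounts to converting between a (cell, mini-column) pair and a position in one long vector. A minimal sketch, assuming i is the cell within a mini-column, j is the mini-column, and cells of the same mini-column are stored contiguously (the function names are hypothetical):

```python
def to_flat(i, j, cells_per_column):
    """Position of cell i of mini-column j in the flattened vector."""
    return j * cells_per_column + i


def from_flat(k, cells_per_column):
    """Inverse mapping: recover (i, j) from the flat position k."""
    j, i = divmod(k, cells_per_column)
    return i, j


cells_per_column = 32
first_cell_of_second_column = to_flat(0, 1, cells_per_column)
round_trip = from_flat(to_flat(5, 7, cells_per_column), cells_per_column)
```

This is the same arithmetic that a reshape between an N x M layout and a flat vector performs implicitly, which is why the code can switch views freely while the written notation has to commit to one.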

I’m definitely open to simpler notation - if you want to try writing out the whole thing using your suggestion, I’m open to looking at it, and perhaps switching to using it in the future.

BTW, it was even worse in the Columns paper as we also had to index the cortical column, describe two layers, and deal with external, intercolumn and intracolumn connections!



Thanks for replying. Now that “I have your permission” to think of segments as vectors, I’ll see what I come up with.


Hi @subutai,

Sorry another worry. I see 3 different definitions of “potential synapse”:

  1. In the section “Synaptic Learning Rule” the paper says: “For each dendritic segment we maintain a set of “potential” synapses between the dendritic segment and other cells in the network that could potentially form a synapse with the segment. […] A permanence value close to zero represents an axon and dendrite with the potential to form a synapse but that have not commenced growing one.” So far, so good.
  2. Now, later, in the section “Materials and Methods” the paper says: “the network is initialized such that each segment contains a set of potential synapses (i.e., with nonzero permanence value) to a randomly chosen subset of cells in the layer.” Should it say “i.e., with potentially nonzero permanence value”? Because according to definition (1) a “potential synapse” can have a permanence value of zero? Or is the difference between “close to zero” and “zero” significant in this context?
  3. In the reference implementation there is a variable “numActivePotentialSynapsesForSegment” that plays a role in the learning bits. This variable seems to reflect the set of permanence values greater than zero (or rather epsilon), more or less, where a permanence value can be recorded for any pair of segment and cell.

It seems definition (3) doesn’t match (1) or (2) and that the variable should be called something else. It also means that effectively the reference implementation doesn’t implement a concept of “potential synapse”, or rather that all pairs of segment and cell are potential synapses. This seems to be a departure from the biology described earlier in the paper. It might be an implementation shortcut. Could this be affecting the functioning of the temporal memory in any way?

Two more clarifications about the implementation please:

  • There is a mechanism for creating and destroying “segments”. The temporal memory starts with zero segments per cell and gradually adds them to cells as patterns are being learned, up to some maximum number of segments per cell. At that point a regime of forgetting and renewal sets in, involving the variable lastUsedIterationForSegment, that maintains a constant segment count per cell. Why all this code for the “ramp up” to the maximum? Wouldn’t it be simpler to initialize every cell with the maximum upfront? Presumably you expect temporal memories to have widely varying counts of segments across cells, so that there is no good initial upfront maximum value for all cells?
  • Also, the paper says, in the section “Testable Predictions”: “There should be few, ideally only one, excitatory synapses formed between a given axon and a given dendritic segment”. I understand this to mean “a given cell and a dendritic segment” (1 axon per cell). So a cell forming several synapses to a segment of another cell is a biological possibility, but a bug, not a feature. For that reason the implementation aims to do better than nature and allows only one synapse per segment-cell pair, and this here is the code that does this, correct?

One more minor nitpick, still section “Materials and Methods”: In the paper it says: “Initialization: […] The permanence values of these potential synapses are chosen randomly”. The implementation seems to initialize permanence values with a constant.


– Rik


Hi Rik,

For (2) it may have been clearer to say “the network is initialized such that each segment contains a set of potential synapses to a randomly chosen subset of cells in the layer.” (omit the “i.e.”)

In the TM implementation (3), that variable is counting the number of weak matches to find the segment that was closest to becoming active (eq 5 in the paper). It could be renamed to avoid the confusion. I do think we may have shortcuts in the code where a permanence value of zero is treated as a non-potential synapse (e.g. sometimes we remove synapses when permanences hit zero).
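The distinction between the two counts can be sketched as follows. A segment is modelled here as a dict from presynaptic cell index to permanence, which also makes it impossible to hold more than one synapse per segment-cell pair. The function names, the `connected_perm` value, and the example data are assumptions for illustration, not the NuPIC API.

```python
def count_synapses(segment, active_cells, connected_perm=0.5):
    """Return (connected, potential) counts of synapses onto active cells.

    potential: any synapse with permanence > 0 to an active cell
    connected: the subset whose permanence reaches connected_perm
    """
    potential = sum(
        1 for cell, perm in segment.items() if cell in active_cells and perm > 0
    )
    connected = sum(
        1 for cell, perm in segment.items()
        if cell in active_cells and perm >= connected_perm
    )
    return connected, potential


def best_matching_segment(segments, active_cells):
    """Index of the segment with the most weak matches, i.e. the one
    that came closest to becoming active."""
    return max(
        range(len(segments)),
        key=lambda s: count_synapses(segments[s], active_cells)[1],
    )


segments = [
    {1: 0.6, 2: 0.1, 9: 0.7},   # one connected, one weak match to active cells
    {1: 0.2, 2: 0.3, 3: 0.4},   # no connected synapses, three weak matches
]
active_cells = {1, 2, 3}

counts = [count_synapses(seg, active_cells) for seg in segments]
best = best_matching_segment(segments, active_cells)
```

In this example the second segment is selected even though none of its synapses are connected, because the selection uses the weak-match count rather than the connected count.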

This is a performance optimization. Most cells never come close to the maximum number of segments in most datasets, and the code is faster with fewer segments.

Yes, our prediction is that multiple connections from a given cell onto a single segment would be very rare. You’re right that we enforce it in the code.

That’s true - good catch.


Thanks for replying. Sorry to be dwelling on this particular point here, but it seems important:

“It also means that effectively the reference implementation doesn’t implement a concept of “potential synapse” or rather all pairs of segment and cell are potential synapses.”

This deviation from the paper and the biology isn’t significant, is it?



I think there are two issues there. In the past we’ve described implementations of the SP and TM that either have topology or are global. The description in the paper applies to a global TM, where cells in all minicolumns are within reach of any given cell. A cell in a TM with topology could only connect to cells in “nearby” minicolumns.

The second issue is the percentage of cells that are actually potential synapses to a given cell. In the global SP, the potential synapses for each column are determined ahead of time by randomly sampling from the input vector. In the TM, in theory we could do the same thing by preinitializing potential synapses for every segment in every cell (as described in the Initialization paragraph). In practice though this strategy would use up a huge amount of memory and slow things down quite a bit. In our implementation, we instead create synapses on the fly by creating segments when we need them, and randomly sampling from the set of active cells. It is not the same as initializing potential synapses in advance, but it is still a random sampling for each segment. This is described in the “Implementation details” paragraph in the paper.
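The on-the-fly strategy can be sketched like this. It is an illustration under stated assumptions, not the NuPIC implementation: `grow_segment` and all of its parameters are hypothetical, the initial permanence is a constant, and when a cell is full the oldest segment is dropped, which is a simplification of the least-recently-used bookkeeping mentioned earlier in the thread.

```python
import random


def grow_segment(cell_segments, active_cells, sample_size=3,
                 initial_perm=0.21, max_segments=4, rng=None):
    """Create a new segment for one cell only when it is needed.

    The new segment gets synapses to a random sample of the currently
    active cells, so each segment is still a random sampling even though
    nothing is pre-initialized.
    """
    rng = rng or random.Random()
    if len(cell_segments) >= max_segments:
        cell_segments.pop(0)  # assumption: discard the oldest segment
    candidates = sorted(active_cells)
    sampled = rng.sample(candidates, min(sample_size, len(candidates)))
    segment = {cell: initial_perm for cell in sampled}
    cell_segments.append(segment)
    return segment


segments_of_cell = []               # the cell starts with no segments
active = {10, 11, 12, 13, 14}       # active cells to sample from
new_segment = grow_segment(segments_of_cell, active, rng=random.Random(0))

for _ in range(5):                  # keep learning past the maximum
    grow_segment(segments_of_cell, active, rng=random.Random(1))
```

Memory use grows with the number of patterns actually learned rather than with cells x segments x potential synapses, which is the trade-off described above.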

I don’t know if this deviation is significant or not, though I suspect we would get similar results in both cases.