Thank you @subutai for your response.
Hi @subutai,
What are active presynaptic cells?
"The matrices 'delta D' are added to the current matrices of permanence values at every time step" means delta D will be added to the matrix D, is that correct?
In part 5 of the paper, eq. (6) does both reinforcing and punishing, and eq. (7) does something like punishing, is that correct?
@Niki There are quite a few posts on the forum mentioning presynaptic cells. Our search page is quite good.
If cell a connects to cell b (i.e. cell a forms a synapse onto cell b's dendrite), then cell a is a presynaptic cell for cell b. Cell b will be a postsynaptic cell for cell a.
Yes. Delta D^d_ij will be added to D^d_ij.
Yes, that's right. Eq 7 is a much weaker punishment in case a cell was predicted but did not become active later.
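For concreteness, here is how those two update rules might look in numpy. This is a rough sketch of the eq. 6 / eq. 7 style updates for a single segment's permanence vector; the constants P_INC, P_DEC, and P_PUNISH are illustrative, not the paper's actual parameter values:

```python
import numpy as np

# Illustrative constants, not the paper's actual parameter values.
P_INC = 0.10      # increment for synapses with active presynaptic cells
P_DEC = 0.02      # decrement for synapses with inactive presynaptic cells
P_PUNISH = 0.001  # much smaller decay for false-positive predictions

def reinforce(d, a):
    """Eq. 6-style update for a correctly active segment: strengthen
    synapses whose presynaptic cell was active, weaken the rest."""
    return np.clip(d + P_INC * a - P_DEC * (1 - a), 0.0, 1.0)

def punish(d, a):
    """Eq. 7-style update for a segment that predicted activity that
    never came: a much weaker decay of the active synapses only."""
    return np.clip(d - P_PUNISH * a, 0.0, 1.0)

d = np.array([0.30, 0.30, 0.30])  # permanences of one segment
a = np.array([1, 0, 1])           # binary presynaptic activity
print(reinforce(d, a))            # active synapses go up, inactive go down
print(punish(d, a))               # tiny decrement on active synapses only
```

Note how the punishment step touches only the synapses that contributed to the wrong prediction, and with a much smaller step size, matching "much weaker punishment" above.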
No, an HTM Neuron can be active and predictive at the same time.
I've just started reading the paper, however I'm wondering what operation is denoted by the slash in the graphic.
I think it would become obvious later on, but it's important to me, while I'm still at pages 3-4, to already know it.
It just denotes a linear operation, as opposed to a threshold. The proximal synapses (green) tend to have a linear impact on the cell body. In this paper, we focused almost completely on the blue context synapses and didn't discuss the proximal ones too much.
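A toy sketch of that distinction, assuming a simple weighted-sum model for the proximal input and a fixed all-or-none threshold for a distal segment (the function names are made up for illustration):

```python
import numpy as np

def proximal_drive(weights, inputs):
    """Proximal (green) synapses: roughly linear effect on the soma,
    so their contribution is just a weighted sum."""
    return float(np.dot(weights, inputs))

def distal_segment_active(overlap, theta):
    """Distal (blue) context synapses: all-or-none. The segment
    generates a dendritic spike only if its overlap with active cells
    reaches the threshold theta; below that it contributes nothing."""
    return overlap >= theta

print(proximal_drive([0.5, 0.5, 1.0], [1, 0, 1]))  # linear: partial credit
print(distal_segment_active(5, 10))                # thresholded: nothing
print(distal_segment_active(12, 10))               # thresholded: spike
```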
Hi, I am new to the HTM community.
Several years ago I read the HTM whitepaper published in 2010. Were there any updates to the theory between that and this paper, which was published in 2016?
Yes, see our latest paper: Why Does the Neocortex Have Layers and Columns, A Theory of Learning the 3D Structure of the World.
Thanks, but I know there were some updates to the theory between the 2016 paper and the latest paper, which is very interesting. I am asking what changed between 2010 and 2016, because the following page says that the whitepaper is obsolete.
https://numenta.com/papers-videos-and-more/resources/hierarchical-temporal-memory-white-paper/
Sorry, but I am unhappy with the "matrices" D^d_ij and A being considered "matrices". D is supposed to be, more or less, a matrix, as indicated by the indices "ij", and each element of this "matrix" is again supposed to be a matrix, ostensibly forming some sort of matrix of matrices, or a tensor of rank 4. We then see some operation of "element-wise multiplication with subsequent summation" being used in equations (4) to (7).
While, unlike with vectors, there is no exact mathematical definition of a "matrix" other than "some numbers written in a rectangular form", using the notation does imply at least one thing: that the numbers in each of the columns are somehow related to one another in some meaningful relationship, and that the numbers in each of the rows are somehow related to one another in another relationship. In addition, a matrix usually represents a linear map and has some matrix multiplications happening to it.
As for the relationship of the columns in the matrix D (and, by extension, each of the matrices D_ij and the matrix A), that's easy: they are the cells in the cortical mini-columns, and their co-location in a mini-column plays a role in some sort of inhibition process that activates only some of them on input in each iteration. The rows, however, have no meaning. So this "D_ij matrices" business is a red herring.
Wouldn't it be better to consider the entire set of cells in the temporal memory, and hence also the synapses on the dendrites, each to be a vector of length N x M? With e.g. 32 cells per mini-column, the first 32 elements of such a vector would represent the cells of the first mini-column, and so on. The weird "element-wise multiplication with subsequent summation" would then be a simple vector dot product, and D would be a vector of vectors, or a simple matrix. Incidentally, it seems that this is more or less how it's implemented in the NuPIC reference implementation, albeit probably only for reasons of coding convenience. See e.g. the "segToCol" function.
This notation would also highlight similarities to the matrix that is the spatial pooler (also called "D" in the spatial pooler paper) and its learning algorithms, and allow for further investigation of the algorithm's relationship to linear algebra.
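A minimal numpy sketch of the suggested reformulation, with toy sizes (not from the paper): each segment becomes a permanence vector over the flattened N x M cell space, and the "element-wise multiplication with subsequent summation" collapses to an ordinary dot product:

```python
import numpy as np

N, M = 4, 32  # mini-columns x cells per mini-column (toy sizes)
rng = np.random.default_rng(0)

# One segment = one permanence vector over all N*M cells, as suggested.
segment = rng.random(N * M)

# Binary activity vector over the same flattened cell space; the first
# M entries are the cells of the first mini-column, and so on.
active = np.zeros(N * M)
active[rng.choice(N * M, size=8, replace=False)] = 1.0

# "Element-wise multiplication with subsequent summation" is then
# just a dot product between the two vectors.
overlap = segment @ active
print(overlap)
```

Stacking all segments' permanence vectors as rows of one matrix would then make the whole overlap computation a single matrix-vector product, which is the linear-algebra connection argued for above.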
- Rik
Hi Rik,
I agree the math notation could be clearer. I actually started with the above as it's closer to the code, but then moved away from it. I don't remember the exact reason, but it could have been that the section "Computing cell states" became more complex, as you need some way to index cells within minicolumns. Perhaps I could have used D^ijd instead of D^d_ij. What we really need is the equivalent of numpy.reshape in our math notation!
I'm definitely open to simpler notation. If you want to try writing out the whole thing using your suggestion, I'm open to looking at it, and perhaps switching to using it in the future.
BTW, it was even worse in the Columns paper as we also had to index the cortical column, describe two layers, and deal with external, intercolumn and intracolumn connections!
Thanks!
Thanks for replying. Now that "I have your permission" to think of segments as vectors, I'll see what I come up with.
Hi @subutai,
Sorry, another worry. I see three different definitions of "potential synapse":
- In the section "Synaptic Learning Rule" the paper says: "For each dendritic segment we maintain a set of 'potential' synapses between the dendritic segment and other cells in the network that could potentially form a synapse with the segment. [...] A permanence value close to zero represents an axon and dendrite with the potential to form a synapse but that have not commenced growing one." So far, so good.
- Now, later, in the section "Materials and Methods", the paper says: "the network is initialized such that each segment contains a set of potential synapses (i.e., with nonzero permanence value) to a randomly chosen subset of cells in the layer." Should it say "i.e., with potentially nonzero permanence value"? Because according to definition (1) a "potential synapse" can have a permanence value of zero. Or is the difference between "close to zero" and "zero" significant in this context?
- In the reference implementation there is a variable "numActivePotentialSynapsesForSegment" that plays a role in the learning bits. This variable seems to reflect the set of permanence values greater than zero (or rather epsilon), more or less, where a permanence value can be recorded for any pair of segment and cell.
It seems definition (3) doesn't match (1) or (2), and that the variable should be called something else. It also means that effectively the reference implementation doesn't implement a concept of "potential synapse", or rather, all pairs of segment and cell are potential synapses. This seems to be a departure from the biology described earlier in the paper. It might be an implementation shortcut. Could this be affecting the functioning of the temporal memory in any way?
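To make the distinction concrete, here is a toy sketch of the two counts for one segment, following the implementation's reading of "potential" (permanence above epsilon) versus "connected" (permanence above a connection threshold). The function name, epsilon, and threshold are illustrative, not NuPIC's actual API:

```python
import numpy as np

EPSILON = 1e-8         # illustrative: permanence <= epsilon means "no synapse"
CONNECTED_PERM = 0.5   # illustrative connection threshold

def count_active_synapses(permanences, active):
    """For one segment, count (active potential, active connected)
    synapses, given a permanence vector and binary presynaptic
    activity. Hypothetical helper, not from the reference code."""
    is_active = active > 0
    n_potential = int(np.sum((permanences > EPSILON) & is_active))
    n_connected = int(np.sum((permanences >= CONNECTED_PERM) & is_active))
    return n_potential, n_connected

perms = np.array([0.0, 0.2, 0.6, 0.7])  # one segment's permanences
act = np.array([1, 1, 1, 0])            # presynaptic activity
print(count_active_synapses(perms, act))
```

Under this reading, the first synapse (permanence 0.0) is counted in neither set, which is exactly the zero-permanence corner case questioned above.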
Two more clarifications about the implementation please:
- There is a mechanism for creating and destroying "segments". The temporal memory starts with zero segments per cell and gradually adds them to cells as patterns are learned, up to some maximum number of segments per cell, at which point a regime of forgetting and renewal sets in, involving the variable lastUsedIterationForSegment, that maintains a constant segment count per cell. Why all this code for the "ramp up" to the maximum? Would it be simpler to initialize every cell with the maximum upfront? Presumably you expect temporal memories to have widely varying segment counts across cells, so that there is no good initial upfront maximum value for all cells?
- Also, the paper says, in the section "Testable Predictions": "There should be few, ideally only one, excitatory synapses formed between a given axon and a given dendritic segment". I understand this to mean "a given cell and a dendritic segment" (one axon per cell). So a cell forming several synapses onto a segment of another cell is a biological possibility, but a bug, not a feature. For that reason the implementation aims to do better than nature and allows only one synapse per segment-cell pair, and this here is the code that does this, correct?
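A toy sketch of the grow-then-recycle policy asked about in the first bullet, assuming least-recently-used eviction in the spirit of lastUsedIterationForSegment (the class and method names are made up; this is not the reference code):

```python
class Cell:
    """Toy model of on-demand segment growth with LRU recycling."""

    def __init__(self, max_segments):
        self.max_segments = max_segments
        self.segments = []  # list of (segment_id, last_used_iteration)

    def get_or_create_segment(self, iteration):
        """Grow a new segment if below the cap; otherwise recycle the
        least recently used one, keeping the per-cell count constant."""
        if len(self.segments) < self.max_segments:
            seg_id = len(self.segments)
            self.segments.append((seg_id, iteration))
            return seg_id
        # At capacity: overwrite the segment unused for the longest time.
        idx = min(range(len(self.segments)), key=lambda i: self.segments[i][1])
        seg_id = self.segments[idx][0]
        self.segments[idx] = (seg_id, iteration)
        return seg_id

cell = Cell(max_segments=2)
print(cell.get_or_create_segment(0))  # grows segment 0
print(cell.get_or_create_segment(1))  # grows segment 1
print(cell.get_or_create_segment(2))  # at cap: recycles LRU segment 0
```

The memory argument in Subutai's reply is visible here: most cells would never reach the cap, so allocating `max_segments` slots per cell upfront would waste space on segments that are never grown.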
One more minor nitpick, still in the section "Materials and Methods". The paper says: "Initialization: [...] The permanence values of these potential synapses are chosen randomly". The implementation seems to initialize permanence values with a constant.
Thanks
- Rik
Hi Rik,
For (2) it may have been clearer to say "the network is initialized such that each segment contains a set of potential synapses to a randomly chosen subset of cells in the layer." (omit the "i.e.")
In the TM implementation (3), that variable is counting the number of weak matches to find the segment that was closest to becoming active (eq 5 in the paper). It could be renamed to avoid the confusion. I do think we may have shortcuts in the code where a permanence value of zero is treated as a non-potential synapse (e.g. sometimes we remove synapses when permanences hit zero).
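A rough sketch of that matching step, assuming "weak match" means counting active potential synapses (permanence > 0) and requiring at least some minimum. The function name and threshold are illustrative, and this is my reading of the eq. 5 selection, not the actual implementation:

```python
import numpy as np

def best_matching_segment(segments, active, min_threshold=2):
    """Among a cell's segments (each a permanence vector), return the
    index of the one with the most active potential synapses, or None
    if no segment reaches min_threshold. Illustrative names only."""
    best, best_count = None, min_threshold - 1
    for i, perms in enumerate(segments):
        count = int(np.sum((perms > 0) & (active > 0)))
        if count > best_count:
            best, best_count = i, count
    return best

segs = [np.array([0.1, 0.0, 0.1]),   # 2 potential synapses
        np.array([0.1, 0.1, 0.1])]   # 3 potential synapses
print(best_matching_segment(segs, np.array([1, 1, 1])))
```

The point of counting potential rather than connected synapses is that a segment can be "closest to becoming active" through synapses that are not yet strong enough to connect.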
This is a performance optimization. Most cells never come close to the maximum number of segments in most datasets, and the code is faster with fewer segments.
Yes, our prediction is that multiple connections from a given cell onto a single segment would be very rare. You're right that we enforce it in the code.
Thatâs true - good catch.
Thanks for replying. Sorry to be dwelling on this particular point, but it seems important:
It also means that effectively the reference implementation doesn't implement a concept of "potential synapse", or rather, all pairs of segment and cell are potential synapses.
This deviation from the paper and the biology isn't significant, is it?
Thanks
I think there are two issues there. In the past we've described implementations of the SP and TM that either have topology or are global. The description in the paper applies to a global TM, where cells in all minicolumns are within reach of any given cell. A cell in a TM with topology could only connect to cells in "nearby" minicolumns.
The second issue is the percentage of cells that are actually potential synapses to a given cell. In the global SP, the potential synapses for each column are determined ahead of time by randomly sampling from the input vector. In the TM, in theory we could do the same thing by preinitializing potential synapses for every segment in every cell (as described in the Initialization paragraph). In practice though, this strategy would use up a huge amount of memory and slow things down quite a bit. In our implementation, we instead create synapses on the fly by creating segments when we need them, and randomly sampling from the set of active cells. It is not the same as initializing potential synapses in advance, but it is still a random sampling for each segment. This is described in the "Implementation details" paragraph in the paper.
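A minimal sketch of that on-the-fly strategy, assuming cells are identified by integers. The function name and signature are hypothetical, not the actual NuPIC code; it also enforces the one-synapse-per-cell-segment-pair rule discussed earlier in the thread:

```python
import numpy as np

def grow_synapses(existing_presyn, active_cells, n_desired, rng):
    """Grow up to n_desired new synapses on a segment by randomly
    sampling from the currently active cells, skipping cells the
    segment already connects to (one synapse per cell-segment pair)."""
    candidates = [c for c in active_cells if c not in existing_presyn]
    n = min(n_desired, len(candidates))
    if n == 0:
        return []
    picks = rng.choice(len(candidates), size=n, replace=False)
    return [candidates[i] for i in picks]

rng = np.random.default_rng(0)
# Segment already synapses onto cells 3 and 5; cells 1,3,5,7,9 are active.
print(grow_synapses({3, 5}, [1, 3, 5, 7, 9], 2, rng))
```

Because the sampling happens only when a segment is actually needed, the reachable cells are still chosen at random per segment, just lazily rather than at initialization time.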
I don't know if this deviation is significant or not, though I suspect we would get similar results in both cases.