The 2016 Neuron paper is very inspiring, but I still have a few questions after reading it and some posts on the forum, specifically:
For the learning/update rule for cells in an active column (equation (6) in the paper), it seems possible that more than one cell is selected for learning (and ultimately has segment activity above threshold) when a winning column was unpredicted. However, based on other descriptions (e.g., Figure 3 in the paper), only one cell learns. Does this depend on a specific set of hyperparameters (i.e., could more than one cell theoretically learn)? Or is there some detail in the update rule that ensures only one cell is selected for learning?
In the paper it mentions that "Even a small overlap (such as 20%) is highly significant and implies that the representations share significant semantic meaning." Is this based on the assumption that the brain learns SDRs that carry semantic meaning? E.g., if we assign random embedding vectors to English words, it is hard to say that words with higher overlap are more similar; we would have to learn the embedding through a contrastive-learning-like approach. (The "highly significant" part is easy to check numerically; see the sketch after these questions.)
In the paper it mentions that "The most critical parameters are the dendritic spike threshold and the number of synapses stored per pattern." Does "the number of synapses stored per pattern" mean the number of potential connections on a dendritic segment?
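On the "highly significant" part of the overlap question: a quick back-of-envelope check shows how unlikely a 20% overlap is between purely random SDRs. The sizes below (n = 2048 bits, w = 40 active) are typical HTM values I am assuming, not numbers quoted in this thread:

```python
# Chance that two *random* SDRs overlap in >= 20% of their active bits,
# computed from the hypergeometric distribution.
from math import comb

n, w, k_min = 2048, 40, 8  # n total bits, w active bits, 8 = 20% of 40
p = sum(comb(w, k) * comb(n - w, w - k) for k in range(k_min, w + 1)) / comb(n, w)
print(p)  # ~1e-6: a 20% overlap essentially never happens by chance
```

So if two learned SDRs do share 20% of their bits, that overlap is almost certainly not accidental, although, as noted above, it still takes learning to make overlap track semantics.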
Question 1) If an active column was not predicted then, by definition, there were no active segments in that column.
In this case the learning algorithm selects one cell in the column to learn the current context. It adds one new segment to that cell and populates the segment with synapses from the currently active inputs.
The HTM algorithm will also check for segments with potential (but not connected) synapses; if there are enough active potential synapses that the segment would have activated had they been connected, then it will select that segment to learn.
If an active column was predicted then there could be more than one active segment in the column, in which case all of the active segments will learn.
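Here is a minimal, runnable Python sketch of that selection logic. The names (Segment, Cell, learn_in_active_column) and constants are hypothetical; this loosely paraphrases the Temporal Memory pseudocode rather than reproducing Numenta's actual implementation:

```python
from dataclasses import dataclass, field

@dataclass
class Segment:
    # presynaptic cell id -> permanence; "connected" means permanence >= 0.5
    synapses: dict = field(default_factory=dict)

    def active_connected(self, prev_active):
        return sum(1 for c, p in self.synapses.items()
                   if c in prev_active and p >= 0.5)

    def active_potential(self, prev_active):
        return sum(1 for c in self.synapses if c in prev_active)

@dataclass
class Cell:
    segments: list = field(default_factory=list)

def learn_in_active_column(cells, prev_active, theta=3, min_theta=2):
    """Return the segments selected to learn for one active column."""
    all_segs = [s for c in cells for s in c.segments]
    active = [s for s in all_segs if s.active_connected(prev_active) >= theta]

    if active:
        # Predicted column: every active segment learns.
        return active

    # Bursting column: prefer a "matching" segment that would have fired
    # had its potential synapses been connected...
    matching = [s for s in all_segs
                if s.active_potential(prev_active) >= min_theta]
    if matching:
        return [max(matching, key=lambda s: s.active_potential(prev_active))]

    # ...otherwise grow one new segment on the least-used cell.
    winner = min(cells, key=lambda c: len(c.segments))
    seg = Segment({c: 0.21 for c in prev_active})  # start below 0.5 threshold
    winner.segments.append(seg)
    return [seg]
```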
Thanks, David! So for Question 1), the general principle is that we intentionally select one segment of one cell to learn, is that correct? If so:
If an active column was not predicted, we select one cell to learn (either one that already has a segment with the most active potential synapses, or one that gets a new segment). I.e., we apply equation (6) in the paper only to that one segment of the one selected cell in the active column?
If an active column was predicted, are there two cases?
Case 1: it was unpredicted at the beginning and became predicted through learning. Because we selected one segment of one cell to learn at the beginning, only the synapses of that segment were reinforced, so only that segment will learn.
Case 2: it was predicted from the beginning (if we allow initial permanences to be above threshold); in this case, there could be multiple active segments (possibly even on multiple cells?) that learn.
Multiple segments can learn on the same cell / column.
The algorithm will only intentionally create one segment at a time, but while it’s running it might encounter ambiguous inputs and then all active segments will learn.
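Continuing the hypothetical sketch from above, here is the ambiguous-input case in miniature: two cells in the same column each have a segment matching the previous input, so both segments are selected to learn:

```python
# Ambiguous previous input: two cells each have a segment that matches it.
a, b = Cell(), Cell()
a.segments.append(Segment({1: 0.6, 2: 0.6, 3: 0.6}))
b.segments.append(Segment({1: 0.6, 2: 0.6, 4: 0.6}))
learners = learn_in_active_column([a, b], prev_active={1, 2, 3, 4})
print(len(learners))  # -> 2: both active segments are selected to learn
```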
Ah, I see. Got it! Assuming there is only one pattern, I was confused about whether our learning is:
Option 1: Learn synapses between the previously active cells and all cells in the current active column, or
Option 2: Selectively learn synapses between the previously active cells and only one cell in the current active column.
Sounds like we are doing Option 2. Do we know if there is any physiological evidence that the brain does Option 2 (i.e., neurons intentionally vote and select among peers to enhance connections)?
Yes, the theory is that the predicted cells (the ones with active segments) will activate/fire much faster than unpredicted cells, because the active segments bring the cell closer to its activation threshold. And once cells start activating, they will activate inhibitory neurons, which in turn inhibit the entire mini-column.
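In code terms, that inhibition story reduces to the usual Temporal Memory activation rule. A hypothetical sketch (function and parameter names are mine):

```python
# Minicolumn inhibition: predicted cells fire first and suppress neighbors;
# if no cell in the column was predicted, the whole column bursts.
def activate_column(cells, predicted_cells):
    winners = [c for c in cells if c in predicted_cells]
    return winners if winners else list(cells)  # burst: every cell fires
```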
To clarify, we would like only a single segment per column to learn.
But that's just not feasible: a single segment represents (partial knowledge of) a single context, and without the future inputs that may disambiguate it, we just don't know whether that context is the one we're currently encountering; only in retrospect can we know what the correct context was.
So the best we can do is to reinforce all active segments in active columns. Conversely, we penalize active segments in inactive columns as they are likely to be wrong.
TL;DR: we want the single segment that represents the correct context to learn, but we just don’t know what the correct context is.
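As a sketch of those two update rules (reusing the hypothetical Segment above; the constants are illustrative, not taken from the paper):

```python
P_INC, P_DEC, P_PUNISH = 0.10, 0.05, 0.02  # illustrative learning rates

def reinforce(segment, prev_active):
    # Active column: strengthen synapses to previously active cells and
    # weaken the segment's other synapses.
    for cell, perm in list(segment.synapses.items()):
        delta = P_INC if cell in prev_active else -P_DEC
        segment.synapses[cell] = min(1.0, max(0.0, perm + delta))

def punish(segment, prev_active):
    # Active segment in an *inactive* column: the prediction was wrong,
    # so slightly weaken the synapses that produced it.
    for cell in segment.synapses:
        if cell in prev_active:
            segment.synapses[cell] = max(0.0, segment.synapses[cell] - P_PUNISH)
```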
PS: backpropagation is useful exactly for this purpose. I vaguely remember Numenta employing something like BP for TM in the past, but I could be wrong.