I have noticed a problem with the Temporal Memory learning algorithm which, under some circumstances, can cause a previously unseen input to trigger a prediction even though it is not “semantically close” to any previous input. (And, perversely, if this prediction happens to be correct it causes more trouble, namely an inability to distinguish sequences, than if it is wrong, in which case the algorithm recovers the situation and makes the correct distinction.)
The problem arises when training sequences are repeated. Let us take a very simple example: we try to learn two sequences, “A,B,C” and “D,B,E”, by repeating the first sequence n times and then repeating the second sequence n times, i.e. the training input is “A,B,C”, “A,B,C”, … followed by “D,B,E”, “D,B,E”, … (By doing so we aim to be able to input “A,B” and predict “C”, and input “D,B” and predict “E”.)
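For concreteness, here is a minimal sketch of the training set-up I have in mind, in Python. The column counts, sparsity and the amount of overlap between A and D are illustrative assumptions on my part, not values taken from any particular encoder:

```python
import random

NUM_COLUMNS = 2048        # total minicolumns in the TM layer (assumed)
ACTIVE_COLUMNS = 40       # active columns per input (assumed ~2% sparsity)

random.seed(42)

def random_sdr():
    """Pick a random set of active columns to represent one input."""
    return set(random.sample(range(NUM_COLUMNS), ACTIVE_COLUMNS))

# Column codings for the inputs; A, B, C and E are independent random SDRs.
A, B, C, E = random_sdr(), random_sdr(), random_sdr(), random_sdr()

# D is novel but happens to share a handful of columns with A,
# far too few to count as semantically similar.
shared_with_A = set(random.sample(sorted(A), 4))
others = [col for col in range(NUM_COLUMNS) if col not in A]
D = shared_with_A | set(random.sample(others, ACTIVE_COLUMNS - len(shared_with_A)))

n = 10
training = [("A,B,C", (A, B, C))] * n + [("D,B,E", (D, B, E))] * n
# After training we want "A,B" to predict C and "D,B" to predict E.
```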
For sequence 1: “A,B,C”:
- The first input, A, has no cells in a predictive state, so its TM columns will burst and winner cells are chosen at random. No cells are put into the predictive state because A has never been seen before.
- When B is input there are again no predictive cells, so its TM columns will burst and winner cells are chosen at random. A new segment is then created on each of these winner cells, with synapses connecting it to the winner cells from input A.
- C is processed in a similar manner.
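To make the mechanics concrete, here is a much-simplified sketch of this first presentation, continuing from the set-up code above. Permanences, segment limits and most of the real algorithm’s bookkeeping are deliberately left out; the only point is how B’s winner cells grow a segment back to A’s winner cells:

```python
CELLS_PER_COLUMN = 32      # assumed
SYNAPSE_SAMPLE_SIZE = 20   # synapses a new segment grows (assumed)

# segments[cell] is a list of segments; a segment is just a set of
# pre-synaptic cells (permanences are omitted for brevity).
segments = {}

def cells_in(column):
    return range(column * CELLS_PER_COLUMN, (column + 1) * CELLS_PER_COLUMN)

def burst(columns, prev_winners):
    """Every cell in the columns becomes active, one winner cell is picked
    per column, and each winner grows a new segment that samples the
    previous input's winner cells."""
    winners = []
    for col in columns:
        winner = random.choice(list(cells_in(col)))
        winners.append(winner)
        if prev_winners:
            sample = random.sample(prev_winners,
                                   min(SYNAPSE_SAMPLE_SIZE, len(prev_winners)))
            segments.setdefault(winner, []).append(set(sample))
    return winners

# Sequence 1, "A,B,C": nothing is predicted, so every input bursts.
winners_A = burst(A, prev_winners=[])
winners_B = burst(B, prev_winners=winners_A)  # B's winners connect back to A's winners
winners_C = burst(C, prev_winners=winners_B)
```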
For sequence 2: “A,B,C”:
- The first input, A, has no cells in a predictive state, so its TM columns will burst and winner cells are chosen at random. Bursting activates every cell in A’s columns, including the cells that were A’s winners in sequence 1, so the segments grown on B’s winner cells become active and a prediction of B is made.
- When B is input, its columns match the predictive cells, so the winner cells are the predictive cells and the segments on these cells are grown/strengthened. The problem is that this growth is done with respect to the winner cells from input A of sequence 2, and not the cells that actually caused the prediction of B, which were the winner cells from sequence 1’s A. This grows extra synapses on each segment that are not really needed (i.e. it creates synapses to multiple cells in the same pre-synaptic column). This can become a problem, as will be seen below.
Sequences 3…n are processed as explained for sequence 2.
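Continuing the sketch, here is roughly how the segments on B’s cells fatten up over the repeated presentations. The growth rule below is a simplification of the real one (it simply grows a fresh sample of synapses on every presentation), but it shows the effect I am describing: A bursts every time and picks fresh winner cells, so B’s segments end up with synapses to several different cells in each of A’s columns:

```python
def columns_of(cells):
    return {cell // CELLS_PER_COLUMN for cell in cells}

def reinforce_predicted(predicted_cells, prev_winners):
    """Learning on correctly predicted cells: the active segments are
    reinforced and also grow synapses to the previous winner cells
    (simplified: the real rule grows up to a budget of new synapses)."""
    for cell in predicted_cells:
        for segment in segments.get(cell, []):
            sample = random.sample(prev_winners,
                                   min(SYNAPSE_SAMPLE_SIZE, len(prev_winners)))
            segment.update(sample)  # new synapses into the same A columns

# Presentations 2..n of "A,B,C": A always bursts with new random winners,
# B is correctly predicted on the same cells as before, and its segments grow.
# (Learning across the sequence boundary and C's processing are ignored here.)
predicted_B = winners_B
for _ in range(9):                      # n = 10 presentations in total
    winners_A = burst(A, prev_winners=[])
    reinforce_predicted(predicted_B, winners_A)

one_segment = segments[predicted_B[0]][0]
print("synapses on one of B's segments:", len(one_segment))
print("distinct pre-synaptic columns:  ", len(columns_of(one_segment)))
# Prints far more synapses than columns: several synapses per A column.
```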
For sequence n+1: “D,B,E”:
- The first input, D, has no cells in a predictive state, so its TM columns will burst and winner cells are chosen at random. Even though this input has never been seen before, suppose its SDR coding shares just a few columns with input A’s coding (not enough to be semantically significant). B might still be predicted from D: the repeated learning of “A,B,C” has grown multiple synapses to multiple cells in each column of A’s coding, and bursting D activates every cell in each of D’s columns, so sharing just a few columns with A can be enough for the segments on B’s cells to reach the activation threshold, as sketched after this list.
- The second input, B, happens by accident to match the prediction, so the winner cells will be the predicted cells (the same cells that have been B’s winners since sequence 1). The segments on these cells then grow/strengthen synapses to the winner cells of input D. This is a problem because we have now lost the ability to distinguish “A,B” from “D,B”: at the end of the whole learning process, if we input either “A,B” or “D,B” and ask for a prediction, the answer will be the union of C and E in both cases, which is not what we want.
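A back-of-the-envelope check of why the threshold can be reached; the numbers are purely illustrative (the threshold is a common default, and the synapses-per-column figure is roughly what the sketch above ends up with):

```python
activation_threshold = 13   # active synapses needed to activate a segment (typical default)
synapses_per_A_column = 5   # roughly what repeated training leaves on B's segments
shared_columns = 4          # columns D happens to share with A's coding

# D is unseen, so its columns burst: every cell in the shared columns is
# active, so every synapse B's segments have into those columns counts.
active_synapses = shared_columns * synapses_per_A_column
print(active_synapses, ">=", activation_threshold,
      "->", active_synapses >= activation_threshold)   # B is (wrongly) predicted
```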
I have found I can mitigate the above effect by reducing maxSynapsesPerSegment to just above the expected number of active columns for a single input, but this is a bit of a fudge.
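In other words, something along these lines; the parameter names are the usual TM parameter names and the values are only illustrative for a 40-active-column input:

```python
tm_params = {
    "activationThreshold": 13,
    "maxNewSynapseCount": 20,
    # Cap segment size just above the expected number of active columns,
    # so a segment cannot accumulate many synapses per pre-synaptic column.
    "maxSynapsesPerSegment": 45,
}
```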
I wish to propose the following ways the TM algorithm could be improved (choose one, not both):
- Process bursting columns differently from non-bursting columns. When bursting columns put some cells into a predictive state, and some (or all) of those predictive cells are in columns of the next input, then strengthen the permanence of the synapses that connect each correctly predicted cell to the existing pre-synaptic cells lying in columns of the previous input. Do not add new synapses to any correctly predicted cell when the previous input burst.
- Without detecting or processing bursting columns differently: when deciding whether to put a cell into a predictive state, i.e. when evaluating whether the number of matching pre-synaptic cells reaches the required threshold, count only one cell per pre-synaptic column.
I think option 2 would be a bit faster to implement, but option 1 is really the better one, as it gets to the root cause of the problem (it stops unnecessary synapses from being created, whereas option 2 only mitigates the effect of having them).
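To make the two options concrete, here is a rough sketch of both, reusing the simplified data structures from the sketches above. This is how I imagine the changes, not a tested patch:

```python
def option1_learn_on_predicted(predicted_cells, prev_winners, prev_burst):
    """Option 1: on correctly predicted cells, strengthen only the existing
    synapses that connect back to the previous input's columns, and grow
    new synapses only when the previous input did NOT burst."""
    prev_columns = columns_of(prev_winners)
    to_strengthen = []
    for cell in predicted_cells:
        for segment in segments.get(cell, []):
            to_strengthen += [(cell, pre) for pre in segment
                              if pre // CELLS_PER_COLUMN in prev_columns]
            if not prev_burst:
                segment.update(random.sample(prev_winners,
                               min(SYNAPSE_SAMPLE_SIZE, len(prev_winners))))
    return to_strengthen   # synapses whose permanence would be incremented


def option2_segment_is_active(segment, active_cells, threshold):
    """Option 2: leave learning as it is, but when testing whether a segment
    should put its cell into the predictive state, count at most one active
    pre-synaptic cell per column, so several synapses into the same column
    cannot add up to reach the threshold."""
    active_columns = {pre // CELLS_PER_COLUMN for pre in segment
                      if pre in active_cells}
    return len(active_columns) >= threshold
```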