Suggest change to Sequence Memory algorithm

#1

I have noted a problem with the Temporal Memory learning algorithm which under some circumstances can cause a previously unseen input to make a prediction even when it is not “semantically close” to any previous input. (And, perversely, if this prediction is actually correct this causes more trouble – inability to distinguish sequences - than if the prediction is wrong, in which case the algorithm recovers the situation, making the correct distinction.)

The problem arises when training sequences are repeated. Let us take a very simple example: we try to learn two sequences “A,B,C” and “D,B,E”, and do this by repeating the first sequence n times, and then repeating the second sequence n times, i.e. training sequences: “A,B,C”, “A,B,C”, … then “D,B,E”, “D,B,E”, … (By so doing we aim to be able to input “A,B” and predict “C”, and input “D,B” and predict “E”.)

For sequence 1: “A,B,C”:

  • The first input, A, has no cells in a predictive state so its TM columns will burst, and winner cells chosen at random. No cells are put in the predictive state because A has never been seen before.
  • When B is input, since there are no predictive cells, its TM columns will burst, and winner cells are chosen at random. A new segment will then be created on each of these winner cells, with synapses connecting it to the winner cells from input A.
  • C is processed in a similar manner.

For sequence 2: “A,B,C”:

  • The first input, A, has no cells in a predictive state so its TM columns will burst, and winner cells chosen at random. The bursting of columns will include those columns that won in this stage of learning sequence 1 and so a prediction of B will be made.
  • When B is input, its columns match the predictive cells and so the winner cells are the predictive cells and the segments on these cells are grown/strengthened. The problem is that this is done with respect to the winner cells from input A (of sequence 2), and not the cells that actually caused the prediction of B, which were the winner cells from sequence 1’s A. This causes extra synapses to be grown on each segment which are not really needed (i.e. it creates synapses to multiple cells in the same pre-synaptic Column). This can become a problem as will be seen below.

Sequences 3…n are processed as explained for sequence 2.

For sequence n+1: “D,B,E”:

  • The first input, D, has no previous predictions, so its TM columns will burst and winner cells are chosen at random. Even though this input has never been seen before, suppose its SDR coding shares just a few columns with input A’s coding (not enough to be semantically significant). B might still be predicted from D, because the activation threshold can be reached on the segments of B’s winner cells from sequence 1: the repeated learning of “A,B,C” has grown multiple synapses to multiple cells in the same columns of A’s coding, and if just a few of those columns are shared with D, the bursting of D (which activates all the cells in each of D’s columns) can be enough to reach the activation threshold.
  • The second input, B, happens accidentally to match the prediction and so the winner cells will be the predicted cells (which originally came from the winners of the B input in sequence 2). The segments on these cells will grow/strengthen synapses to the winner cells of input D. This is a problem because we have now lost the ability to distinguish “A,B” from “D,B”. And so at the end of the whole learning process when we input either “A,B” or “D,B” and ask to infer a prediction the answer will be the union of C and E in both cases, which is not what we want.

I have found I can mitigate the above effect by reducing maxSynapsesPerSegment to just above the expected number of active columns for a single input, but this is a bit of a fudge.
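The effect can be reproduced with a toy model (illustrative Python with hypothetical numbers, not NuPIC code): because a segment's overlap is counted per presynaptic cell, duplicate synapses to several cells of the same column let a bursting, weakly-overlapping input reach the activation threshold:

```python
# Toy illustration (hypothetical numbers, not NuPIC code).  A segment
# stores presynaptic (column, cell) pairs; a bursting column activates
# every one of its cells.
CELLS_PER_COLUMN = 4
ACTIVATION_THRESHOLD = 8

# After repeated A,B,C training, the segment holds synapses to several
# cells of each of A's columns (the duplicates grown across repetitions).
a_columns = range(10)                       # A's 10 active columns
segment = {(col, cell) for col in a_columns for cell in range(3)}

# D shares only 3 of A's columns -- not semantically significant.
d_columns = [0, 1, 2, 100, 101, 102, 103, 104, 105, 106]
bursting_cells = {(col, cell) for col in d_columns
                  for cell in range(CELLS_PER_COLUMN)}

# Overlap is counted per presynaptic *cell*, so the duplicates add up.
overlap = len(segment & bursting_cells)
print(overlap)  # 9 -> reaches the activation threshold of 8
```

Only 3 shared columns, yet the per-cell duplicates push the count to 9 and the segment fires.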

I wish to propose the following ways the TM algorithm could be improved (choose one, not both):

  1. Process bursting columns differently from non-bursting columns. When a bursting input puts some cells into a predictive state, and some (or all) of those predictive cells fall in columns of the next input, then strengthen the permanence of the existing synapses that connect each correctly predicted cell to pre-synaptic cells in columns belonging to the previous input, but do not add new synapses to a correctly predicted cell when the previous input burst.
  2. Without detecting or processing bursting columns differently, when deciding whether to put a cell into a predictive state, count at most one matching pre-synaptic cell per column when evaluating whether the activation threshold is reached.

I think option 2 will be a bit faster to implement, but option 1 is better, as it addresses the root cause of the problem (it stops unnecessary synapses from being created, whereas option 2 only mitigates the effect of having them).
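Option 2 could be sketched along these lines (hypothetical helper, not NuPIC code): when evaluating a segment against the active cells, count at most one matching presynaptic cell per column:

```python
# Sketch of option 2 (hypothetical helper, not NuPIC code): when testing
# a segment against the active cells, count at most one matching
# presynaptic cell per column.
def overlap_one_per_column(segment_synapses, active_cells):
    matched_columns = {col for (col, cell) in segment_synapses
                       if (col, cell) in active_cells}
    return len(matched_columns)

# Same toy data as in the problem description: three duplicate synapses
# per column in each of 10 columns, and a burst over 3 shared columns.
segment = {(col, cell) for col in range(10) for cell in range(3)}
active = {(col, cell) for col in [0, 1, 2] for cell in range(4)}

print(len(segment & active))                    # 9: per-cell count
print(overlap_one_per_column(segment, active))  # 3: per-column count
```

The per-column count stays at the true number of shared columns, so a weakly-overlapping burst can no longer reach the threshold.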

#2

If I understand correctly, I think the official TM algorithm already does this. If a segment has “max new synapse count” active synapses when its post-synaptic cell becomes predicted active, then it will not grow any additional synapses.

The relevant pseudocode from BAMI:

newSynapseCount = (SAMPLE_SIZE -
    numActivePotentialSynapses(t-1, learningSegment))
growSynapses(learningSegment, newSynapseCount)

Here, “SAMPLE_SIZE” is what is more commonly referred to as “max new synapse count”. When a new dendrite segment is grown, it connects “max new synapse count” synapses with winners from the previous timestep. If that same cell is later correctly predicted (such as after some bursting minicolumns), newSynapseCount will evaluate to zero in the above equation, so no new synapses will be grown.
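In concrete numbers (a sketch of the BAMI rule above; the SAMPLE_SIZE of 20 is an assumption):

```python
# Numeric sketch of the BAMI rule (SAMPLE_SIZE of 20 is an assumption).
SAMPLE_SIZE = 20

def new_synapse_count(num_active_potential):
    # Only "top up" the segment to SAMPLE_SIZE active potential synapses.
    return max(0, SAMPLE_SIZE - num_active_potential)

# First time the transition is learned: nothing on the segment is active.
print(new_synapse_count(0))   # 20 -> grow a full sample
# Correctly predicted later (e.g. after bursting): already saturated.
print(new_synapse_count(20))  # 0 -> no duplicate synapses grown
```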

In your example you are not using resets, which means you are also conflating the original topic with the “repeating inputs” problem (this makes things a little more difficult to visualize). Not a big issue, but I thought I would mention it. Let’s walk through your example with the above point from the TM algorithm in mind (we’ll still forgo any resets, since that is the scenario you described). I’m assuming (as you did in your example) that the system is configured for one-shot learning.

  1. Input A bursts, and winner cells A’ are selected
  2. Input B bursts, and winner cells B’ are selected, connecting to A’
  3. Input C bursts, and winner cells C’ are selected, connecting to B’
  4. Input A bursts, and winner cells A’’ are selected, connecting to C’. Cells for B’ are predicted (because A’ is also active due to the bursting A minicolumns)
  5. Input B causes cells B’ to become active, and they strengthen their connection to A’. (note: no connections to A’’ are created, for the reasons described above). Cells for C’ are predicted.
  6. Input C causes C’ to become active. Cells for A’’ are predicted.
  7. Input A causes A’’ to become active. No predictions (see step 5 for why)
  8. Input B bursts, and winner cells B’’ are selected, connecting to A’’. Cells for C’ are predicted (because B’ is also active due to the bursting B minicolumns)
  9. Input C causes C’ to become active, and they strengthen their connection to B’. (note: no connections to B’’ are created, for the reasons described above). Cells for A’’ are predicted.

Rinse and repeat. Hopefully you see the pattern. Each time the long ABCABCABCABC sequence reaches the end of what it has learned, there is a burst, one more element is added to the end of the sequence, and you then cycle back through until you reach the end again. There is an unrelated issue with learning this type of repeating sequence without resets, but that is off topic; you can search “repeating inputs” here on the forum if you are interested in more details.

Now let’s move on to sequence D, B, E:

  1. Input D bursts, and winner cells D’ are selected.
  2. Input B bursts, and winner cells B’’’ are selected, connecting to D’. Cells C’ and C’’ are predicted (because B’ and B’’ are also active due to the bursting B minicolumns)
  3. Input E bursts, and winner cells E’ are selected, connecting to B’’’
  4. Input D bursts, and winner cells D’’ are selected, connecting to E’. Cells for B’’’ are predicted (because D’ is also active due to the bursting D minicolumns)
  5. Input B causes cells B’’’ to become active, and they strengthen their connection to D’. (note: no connections to D’’ are created, for the reasons described above). Cells for E’ are predicted.
  6. Input E causes cells E’ to become active. D’’ is predicted.
  7. Input D causes D’’ to become active. No predictions (see step 5 for why)
  8. Input B bursts, and winner cells B’’’’ are selected, connecting to D’’. Cells for E’, C’, and C’’ are predicted (because B’’’, B’’, and B’ are also active)
  9. Input E causes cells E’ to become active and strengthen their connection to B’’’. (note: no connections to B’’’’ are created, for the reasons described above). Cells for D’’ are predicted.

As you can see, the behavior for this one is very similar to the previous repeating sequence, and the connections formed do not cause any ambiguity between the two. When a burst happens on any shared elements (B in this scenario), you get multiple predictions due to the lost context, but the system quickly recovers in the subsequent timesteps.

Let me know if anything is inaccurate with this example (it is possible that I still have gaps in my understanding, so always good to be corrected when I get things wrong).

#3

Thank you for your reply, which I am still digesting – however I wish to clarify that I am using a reset after each sequence, i.e.: learn A,B,C; tm.reset(); learn A,B,C again; tm.reset(); … learn D,B,E; tm.reset(); learn D,B,E again; tm.reset(); … etc.

I will reply again later when I have examined your response further.
Impressed with your fast and detailed response!

#4

Oh, cool. It is a lot simpler with the reset. That should play out like so:

  1. Input A bursts, and winner cells A’ are selected
  2. Input B bursts, and winner cells B’ are selected, connecting to A’
  3. Input C bursts, and winner cells C’ are selected, connecting to B’
  4. Reset
  5. Input A bursts. Cells B’ become predictive, because A’ is active.
  6. Input B causes cells B’ to become active. Cells C’ become predictive.
  7. Input C causes cells C’ to become active. No predictions.
  8. GOTO step 4

Thus the following representations have been connected:
A’ -> B’ -> C’

  1. Input D bursts, and winner cells D’ are selected.
  2. Input B bursts, and winner cells B’’ are selected, connecting to D’. Cells for C’ are predicted (because B’ is also active due to the bursting B minicolumns)
  3. Input E bursts, and winner cells E’ are selected, connecting to B’’
  4. Reset
  5. Input D bursts. Cells B’’ become predictive, because D’ is active.
  6. Input B causes cells B’’ to become active. Cells E’ become predictive.
  7. Input E causes cells E’ to become active. No predictions.
  8. GOTO step 4

Thus the following (unambiguous) representations have been connected:
D’ -> B’’ -> E’
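The two reset-separated chains can be summarized as a toy transition map (illustrative Python, not TM code), showing why the shared B element stays unambiguous:

```python
# Toy transition map of the learned representations (illustrative, not
# TM code): B has two context-specific representations, B' and B''.
transitions = {"A'": "B'", "B'": "C'", "D'": "B''", "B''": "E'"}

def predict_after_two(first):
    # Follow the chain two steps from the first element of a sequence.
    return transitions[transitions[first]]

print(predict_after_two("A'"))  # C'
print(predict_after_two("D'"))  # E'
```

Because B’ and B’’ are different cell sets in the same minicolumns, “A,B” and “D,B” lead to distinct predictions.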

#5

I am confused as to which source code you are citing (and where I can find it?). I am looking at the Python source code on GitHub at https://github.com/numenta/nupic/blob/master/src/nupic/algorithms/temporal_memory.py, which has the following lines of code in the method _burstColumn (starting at line 608):

def _burstColumn(...):
  ...
  if columnMatchingSegments is not None:
    numActive = lambda s: numActivePotentialSynapsesForSegment[s.flatIdx]
    bestMatchingSegment = max(columnMatchingSegments, key=numActive)
    winnerCell = bestMatchingSegment.cell

    if learn:
      cls._adaptSegment(connections, bestMatchingSegment, prevActiveCells,
                        permanenceIncrement, permanenceDecrement)

      nGrowDesired = maxNewSynapseCount - numActive(bestMatchingSegment)

      if nGrowDesired > 0:
        cls._growSynapses(connections, random, bestMatchingSegment,
                          nGrowDesired, prevWinnerCells, initialPermanence,
                          maxSynapsesPerSegment)

In this version nGrowDesired is the equivalent of the variable “newSynapseCount” in your response, but it is not calculated in the way that you have indicated (hence my problem).

Am I using the wrong version of the Temporal Memory algorithm? (I am using what is provided by you in your nupic docker image, which includes the source code that I have cited above.)

#6

It looks the same to me (am I missing something?). I was referencing the BAMI pseudocode; you are showing the actual NuPIC code.

nGrowDesired = maxNewSynapseCount - numActive(bestMatchingSegment)

This line means a winning segment will only grow new synapses if fewer than maxNewSynapseCount of its synapses are active. The first time a transition is learned, this will grow maxNewSynapseCount new synapses. Then when that same segment is activated later after bursting, there will already be maxNewSynapseCount synapses active, so nGrowDesired will be zero (and no new synapses will be grown).

#7

OK, glad we are on the same page. Now I can see how we each interpret the line you cited differently. You appear to be assuming that the number of pre-synaptic winner cells available for connection exceeds maxNewSynapseCount. In my particular SDR coding (with a low sparsity) it does not. So although, the first time around, an nGrowDesired value of 255 (my setting of maxNewSynapseCount) is passed to the method _growSynapses, only a much smaller number of synapses is initially established, due to the calculation of nActual in the code quoted below:

def _growSynapses(cls, connections, random, segment, nDesiredNewSynapes,
                  prevWinnerCells, initialPermanence, maxSynapsesPerSegment):
  ...
  candidates = list(prevWinnerCells)

  for synapse in connections.synapsesForSegment(segment):
    i = binSearch(candidates, synapse.presynapticCell)
    if i != -1:
      del candidates[i]

  nActual = min(nDesiredNewSynapes, len(candidates))

  # Check if we're going to surpass the maximum number of synapses.
  overrun = connections.numSynapses(segment) + nActual - maxSynapsesPerSegment
  if overrun > 0:
    cls._destroyMinPermanenceSynapses(connections, random, segment, overrun,
                                      prevWinnerCells)

  # Recalculate in case we weren't able to destroy as many synapses as needed.
  nActual = min(nActual,
                maxSynapsesPerSegment - connections.numSynapses(segment))

  for _ in range(nActual):
    i = random.getUInt32(len(candidates))
    connections.createSynapse(segment, candidates[i], initialPermanence)
    del candidates[i]

This then leaves room for the unwanted synapses to be grown when the same sequence is learned again, causing the problem I have reported.
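A toy repetition of that bookkeeping (a deliberate simplification of _growSynapses, with illustrative numbers) shows how the room left on the segment fills up with duplicates:

```python
# Toy bookkeeping in the spirit of _growSynapses (a deliberate
# simplification, not NuPIC code).  Cells are (column, cell-index) pairs.
MAX_NEW_SYNAPSE_COUNT = 255
NUM_ACTIVE_COLUMNS = 40

segment = set()  # presynaptic cells this segment has synapses to

def grow(prev_winner_cells):
    # Mirrors nGrowDesired / nActual: the desired count minus what is
    # already connected, capped by the not-yet-connected candidates.
    n_desired = MAX_NEW_SYNAPSE_COUNT - len(segment & prev_winner_cells)
    candidates = prev_winner_cells - segment
    segment.update(list(candidates)[:max(0, n_desired)])

# Repetition 1: winners A' = cell 0 in each of A's 40 columns.
grow({(col, 0) for col in range(NUM_ACTIVE_COLUMNS)})
# Repetition 2: A bursts again; fresh winners A'' = cell 1, same columns.
grow({(col, 1) for col in range(NUM_ACTIVE_COLUMNS)})

print(len(segment))  # 80 -> two synapses per presynaptic column
```

With only 40 winner cells per input but room for 255 synapses, each repetition adds another cell from the same presynaptic columns.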

#8

Are you describing the scenario where maxSynapsesPerSegment is less than maxNewSynapseCount (or alternately, where number of active minicolumns is less than maxNewSynapseCount)? I don’t think I’ve ever run the algorithm configured that way, so maybe someone else has better insights?

#9

I have maxSynapsesPerSegment and maxNewSynapseCount both set to 255 (default settings).
However, I do have the number of active minicolumns less than maxNewSynapseCount.

As I said in my first post, I can fudge the issue by reducing maxSynapsesPerSegment to just above the expected number of active mini-columns. This works fairly well but is not ideal, because there is a random process of selecting which synapses will be destroyed to make room for the new ones, and there is no guarantee that this process will leave some synapses for each field of the SDR (I have an SDR comprising 8 fields, each with its own encoding). Thus sometimes a field might be missing, impairing performance. I am not currently using a Spatial Pooler (partly to make it easier to understand what the TM is doing! It’s so much easier to follow when the columns of the TM predictions correspond with the input columns). It is not clear to me whether introducing a Spatial Pooler would solve this problem.

I could reduce maxNewSynapseCount to the number of expected active mini-columns, but I do not think this would solve the problem: it would still allow enough unwanted “duplicate” synapses (different cells of the same columns) to be created to cause the reported problem.

#10

Interesting. I never realized that was the default – this isn’t a very good value to use for a typical 2048-minicolumn TM layer where only around 40 minicolumns are active per timestep. With this value so high, each cell will connect to every winning cell from the previous timestep, which is overkill. It isn’t necessary to connect to every bit in an SDR to be able to recognize it.
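For example (illustrative Python; the 2048 columns, 40 active bits, and 20-bit subsample are assumptions): a segment connected to only half of an SDR’s active bits still matches that SDR perfectly, while its overlap with an unrelated SDR stays near zero:

```python
import random
random.seed(0)  # deterministic toy example

NUM_COLUMNS, NUM_ACTIVE, SAMPLE_SIZE = 2048, 40, 20

target = set(random.sample(range(NUM_COLUMNS), NUM_ACTIVE))
# Connect the segment to only a 20-bit subsample of the 40 active bits.
segment = set(random.sample(sorted(target), SAMPLE_SIZE))

other = set(random.sample(range(NUM_COLUMNS), NUM_ACTIVE))  # unrelated SDR

print(len(segment & target))  # 20 -> the target is fully recognized
print(len(segment & other))   # almost always 0 or 1 -> no false match
```

The expected chance overlap is only about 20 × 40 / 2048 ≈ 0.4 bits, far below any sensible activation threshold.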

--EDIT-- Looks like the default should be 20.

#11

You are of course correct, and I was mistaken when I said 255 was the default value (but it was the value I used all the same). I have just tried reducing it to 20, and can confirm that this does not solve the problem (it still allows unwanted “duplicate” synapses to be added).

I can see from your last response (and your experience of what is “normal”) that the issue is partly to do with the way I have done the encoding: it is too sparse, such that correctly distinguishing between inputs requires nearly all the bits to be connected to a segment. I will try changing the encoding next. However, in my (inexperienced) opinion there is still room for improvement of the TM algorithm along the lines I have suggested. Having a higher sparsity in the encoding may make the problem occur less frequently, but I do not think it will remove it altogether. I will do some more experiments.

#12

There really is no “default”. For our Quick Start, we use:

maxNewSynapseCount=20
#13

If I understand the algorithm, as long as you keep maxNewSynapseCount equal to or less than the number of active minicolumns (and less than or equal to maxSynapsesPerSegment), it should behave as I described above. I could be wrong about that though (my implementation has been incorrect on other points in the past).
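As a sketch, that rule of thumb amounts to the following (parameter names from nupic’s TemporalMemory; the 40-minicolumn figure is an illustrative assumption):

```python
# Sketch of the rule of thumb, using TemporalMemory parameter names
# (the 40 active minicolumns figure is an illustrative assumption).
num_active_minicolumns = 40

tm_params = dict(
    maxNewSynapseCount=num_active_minicolumns,  # <= active minicolumns
    maxSynapsesPerSegment=255,                  # >= maxNewSynapseCount
)

# The first learning pass then saturates the segment (one synapse per
# winner cell), so later correct predictions leave nGrowDesired at zero.
print(tm_params["maxNewSynapseCount"] <= tm_params["maxSynapsesPerSegment"])  # True
```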

#14

Just tried your latest suggestion of reducing maxNewSynapseCount to the number of active mini-columns, keeping maxSynapsesPerSegment at the default 255 - AND IT WORKS PERFECTLY!

I see now that this follows from

617 nGrowDesired = maxNewSynapseCount - numActive(bestMatchingSegment)

As you said before, this causes nGrowDesired to become zero when a training sequence is repeated, which solves the problem.

Thank you very much for this advice.

PS It would be a good idea to put this guidance in the source-code/documentation so others will know how to set this configuration parameter.
