Column Pooler exceeds target SDR size when there is insufficient lateral support (and inertia is also insufficient). Why?

Hello, all. I am working on understanding the inner workings of the ColumnPooler and am curious about why it exceeds the target SDR size when there is insufficient lateral support and not enough inertia from the prior state to reach the target. I understand the how, as documented in the code (line 404 of column_pooler.py):

        # If we haven't filled the sdrSize quorum, add cells that have feedforward
        # support and no lateral support.
        discrepancy = self.sdrSize - len(chosenCells)
        if discrepancy > 0:
            remainingFFcells = np.setdiff1d(feedforwardSupportedCells, chosenCells)

            # Inhibit cells proportionally to the number of cells that have already
            # been chosen. If ~0 have been chosen activate ~all of the feedforward
            # supported cells. If ~sdrSize have been chosen, activate very few of
            # the feedforward supported cells.

            # Use the discrepancy:sdrSize ratio to determine the number of cells to
            # activate.
            n = (len(remainingFFcells) * discrepancy) // self.sdrSize
            # Activate at least 'discrepancy' cells.
            n = max(n, discrepancy)
            # If there aren't 'n' available, activate all of the available cells.
            n = min(n, len(remainingFFcells))

            if len(remainingFFcells) > n:
                selected = self._random.sample(remainingFFcells, n)
                chosenCells = np.append(chosenCells, selected)
            else:
                chosenCells = np.append(chosenCells, remainingFFcells)

I am just curious why you would not take the top-k (where k = target SDR size) based on forward overlaps; a rough sketch of what I mean follows the test excerpt below. This would still give a blend of possible states if the forward input is ambiguous. It seems, though, that the code is suggesting you want more than “just a blend”. The test case for this assumes a 100% union (line 219 of column_pooler_test.py):

        # feed unions of patterns in objects A and B
        pattern = objectA[0] | objectB[0]
        self.pooler.reset()
        self.infer(feedforwardPattern=pattern)
        self.assertEqual(
            self._getActiveRepresentation(),
            representationA | representationB,
            "The active representation is incorrect"
        )
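
For concreteness, the alternative I have in mind would look roughly like this (a sketch only, not actual ColumnPooler code; the function and parameter names are mine):

    import numpy as np

    def topKByOverlap(feedforwardOverlaps, chosenCells, sdrSize):
        """Hypothetical alternative: fill the remaining quota with the cells
        that have the highest feedforward overlap instead of sampling randomly.
        `feedforwardOverlaps` is an overlap count per cell."""
        discrepancy = sdrSize - len(chosenCells)
        if discrepancy <= 0:
            return chosenCells
        candidates = np.setdiff1d(np.flatnonzero(feedforwardOverlaps), chosenCells)
        # Rank candidates by overlap score, highest first, and keep only enough
        # to reach the target SDR size.
        order = np.argsort(feedforwardOverlaps[candidates])[::-1]
        return np.append(chosenCells, candidates[order[:discrepancy]])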

Thanks for any guidance.

EDIT: was calling it ContextPooler instead of ColumnPooler. Corrected.

It’s important that it activate an entire object SDR, not pieces of a few. Generally these SDRs are being read out by dendrites that expect at least 13/20 of the SDR to be active (though this depends on the parameters). A sampling of a few SDRs would likely lead to zero dendrites detecting a known SDR.

Also, the feedforward input doesn’t really provide a prioritized list of cells. All cells whose feedforward input is above a threshold are considered equal, which is why we sample randomly rather than using forward overlaps to rank them.
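
In other words, the feedforward step amounts to a binary, thresholded notion of support, roughly like this sketch (the overlap counts and threshold value are made up for illustration):

    import numpy as np

    # Every cell whose overlap clears the threshold is an equally valid
    # candidate; the margin above the threshold carries no extra weight.
    overlaps = np.array([3, 9, 10, 25, 40])
    threshold = 10
    feedforwardSupportedCells = np.flatnonzero(overlaps >= threshold)
    # -> array([2, 3, 4]); an overlap of 40 counts no more than an overlap of 10.
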


Thanks @mrcslws for answering. I apologize for the delay in replying. I wanted to make sure I understood your points and, if I had difficulty, that I could articulate my questions properly. Also, my real job got in the way.

You make two points, and I wonder if I might treat them separately and in reverse order, as I think the second is easier for me to ask about. You say:

Also, the feedforward input doesn’t really provide a prioritized list of cells. All cells whose feedforward input is above a threshold are considered equal, which is why we sample randomly rather than using forward overlaps to rank them.

I am wondering why you do not take advantage of the differences in overlap counts on the forward input. Having a threshold is fine, as it weeds out weak stimuli, but if some inputs barely clear the threshold while others blow past it, shouldn't the latter carry more weight?

As for the random sample, I see that this would only apply when some outputs have already been chosen due to lateral support or inertia. Otherwise, it appears that the output would simply be every cell activated by the forward input above threshold, which amounts to 100% sampling. This may go back to my prior question, but if we have already started to specify the output with laterally supported forward input and inertia, wouldn't we want to prioritize the remaining output based on the strength of the learned connections to the forward input?
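
To check my reading of the code, here is the degenerate case where nothing has been chosen yet (again with illustrative numbers):

    sdrSize = 40
    chosenCount = 0          # nothing chosen via lateral support or inertia
    remainingFFcells = 200   # feedforward-supported cells

    discrepancy = sdrSize - chosenCount              # 40
    n = (remainingFFcells * discrepancy) // sdrSize  # 200 * 40 // 40 = 200
    n = max(n, discrepancy)
    n = min(n, remainingFFcells)                     # 200 -> every FF-supported cell activates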

Your first point is a little harder for me to dissect, with its multiple references to SDRs, but I will give it a try:

It’s important that it activate an entire object SDR, not pieces of a few. Generally these SDRs are being read out by dendrites that expect at least 13/20 of the SDR to be active (though this depends on the parameters). A sampling of a few SDRs would likely lead to zero dendrites detecting a known SDR.

When you say “It’s important that it activate an entire object SDR”, I am assuming you are referring to maximizing the ColumnPooler output SDR. This is because “these SDRs are being read out by dendrites that expect at least 13/20 of the SDR to be active”. The papers do not have another layer on top of the column pooler’s output (though we might envision a classifier). Are you referring to other ColumnPoolers and ApicalTiebreakMemories? If so, I can start to see your point that not having sufficient ColumnPooler output might lead to understimulation of the peer ColumnPoolers or the dependent ApicalTiebreakMemories.

I am curious, though, whether this plays out in experimentation. If the output were limited to the target number of bits, backfilled at the last step by unsupported forward input based on the highest overlap scores, might this not lead to better learning (strong signals to learn against, or none at all)? Is the purpose of going past the target number of output bits to jump-start learning, with the hope (or intuition) that weak signals across cases might build up into a strong signal in general?

I am still working through these ideas, so I am sorry if I am not explaining myself well or am not picking up on a more fundamental assumption about the code or the biology. Thanks for any help in improving my understanding.

I was discussing this with @klokare offline, and we found that this is one of the differences between HTM and BrainBlocks. In BrainBlocks, for our PatternPooler, PatternClassifier, ContextLearner, and SequenceLearner blocks, when receiving feedforward proximal inputs we use k-WTA activation strategies: the overlap scores are ranked and the k highest-scoring cells are activated (subject to any rules specific to the “block”).
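
A minimal sketch of the k-WTA idea in numpy (an illustration only, not actual BrainBlocks code):

    import numpy as np

    def kWTA(overlaps, k):
        """Activate the k cells with the highest overlap scores (ties broken
        arbitrarily by argpartition)."""
        if k >= len(overlaps):
            return np.arange(len(overlaps))
        winners = np.argpartition(overlaps, -k)[-k:]
        return np.sort(winners)

    # With k=3, the three highest-scoring cells win, regardless of how far
    # above any threshold the others are.
    print(kWTA(np.array([3, 9, 10, 25, 40, 7]), 3))  # -> [2 3 4]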

Of course, this approach doesn’t work if you’re trying to maintain other properties of the SDR, such as its “distributedness” through local area density, or to activate neglected cells through the boosting method of the SpatialPooler.
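
For reference, a rough sketch of the SpatialPooler-style boosting rule being alluded to (a simplification for illustration, with made-up parameter names, not the actual implementation):

    import numpy as np

    def boostedOverlaps(overlaps, activeDutyCycles, targetDensity, boostStrength):
        """Cells that have been active less often than the target density get
        their overlaps scaled up, so neglected cells eventually win the
        k-WTA competition."""
        boostFactors = np.exp(boostStrength * (targetDensity - activeDutyCycles))
        return overlaps * boostFactors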