Since I changed my TM implementation to activate multiple cells in a single column when more than one of them is in the predictive state, I have been noticing a phenomenon that affects sparsity: for certain input sequences, multiple cells in a single column can end up connected to the same context.
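Here is roughly what I mean by that change (a simplified sketch, not my actual code; the function name and data structures are made up):

```python
def activate_cells(active_columns, predictive_cells, cells_per_column):
    """Return the set of (column, cell) pairs that activate this timestep.

    predictive_cells: set of (column, cell) pairs that were predictive last step.
    """
    active_cells = set()
    for col in active_columns:
        predicted_here = {(col, i) for i in range(cells_per_column)
                          if (col, i) in predictive_cells}
        if predicted_here:
            # Every correctly predicted cell in the column activates,
            # rather than choosing a single winner.
            active_cells |= predicted_here
        else:
            # No cell was predicted, so the whole column bursts.
            active_cells |= {(col, i) for i in range(cells_per_column)}
    return active_cells
```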
The easiest way to understand the problem is to consider a case where a single input is repeated and the columns for that input are repeatedly reactivated by bursting (there are many other ways to produce the phenomenon, but this one is easy to describe and understand).
For example, let’s input the sequence A, A, A, A:
- First input A is unexpected, so the columns burst. Cells representing A’ are selected randomly for learning.
- Second input A is also unexpected, so the columns for A burst. Cells representing A’’ are selected randomly for learning. They form distal connections with cells representing A’.
- Third input A is also unexpected, so the columns burst. Cells representing A’’’ are selected randomly for learning. They form distal connections with cells representing A’’. Cells for A’’ also become predictive, because the bursting column means the cells for A’ are also now active.
- Fourth input A is expected, and cells for A’’ become active. They grow distal connections with cells representing A’’’.
At this point we have a circular connection between cells representing A’’ and cells representing A’’’. In the future, any time A bursts, it will put two cells in the column into the predictive state, and thus activate and reinforce both cells on a correct prediction, and equally degrade both on a wrong prediction. So there does not appear to be any way for one of the cells to ever win out over the other in representing the context.
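To make the loop concrete, here is a toy sketch of the state after the fourth input, with hypothetical cell indices and a threshold of one active presynaptic cell instead of a real segment/permanence model:

```python
# Say column 0 has cells 0-3, where cell 1 stands for A'' and cell 2 for A'''.
distal_connections = {
    (0, 2): {(0, 1)},   # A''' learned on A''  (third input)
    (0, 1): {(0, 2)},   # A''  learned on A''' (fourth input) -- the circular link
}

def predictive_cells(active_cells):
    """A cell becomes predictive if any of its presynaptic cells are active."""
    return {cell for cell, presyn in distal_connections.items()
            if presyn & active_cells}

# When the column bursts, every cell is active, so both A'' and A''' end up
# predictive for the next A and are then reinforced or punished together.
bursting_column = {(0, i) for i in range(4)}
print(predictive_cells(bursting_column))   # -> {(0, 1), (0, 2)}
```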
Here are a couple of screenshots of this happening in practice. In this case, there was a more complex sequence, but you can see that the phenomenon has occurred in four of the “C” columns:
Looking at this, I can see that one of the predictive cells is better connected to the input than the other. The solution that comes to mind would be to select only the better-matching cell for learning and degrade the less-matching one. Does that seem like a good solution, or could it lead to problems in other scenarios?
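Concretely, something along these lines is what I have in mind (again a simplified sketch with hypothetical helper functions, not a real API):

```python
def resolve_duplicate_predictions(predicted_cells, overlap, reinforce, punish):
    """Keep only the best-matching correctly predicted cell in a column.

    predicted_cells: cells of one column that were predictive and are now correct.
    overlap(cell): count of active connected synapses on the cell's best segment.
    """
    if not predicted_cells:
        return
    best = max(predicted_cells, key=overlap)
    reinforce(best)              # the better-connected cell keeps the context
    for cell in predicted_cells:
        if cell != best:
            punish(cell)         # the duplicate gradually loses its connections
```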
Anyway, I thought I would get some feedback. Maybe there is a process I have overlooked, or I might be interpreting something incorrectly.