Visualizations may take a while to draw, so let me start with a quick analysis of how one input A,A,A,A,… repeating can be represented. This may be enough to demonstrate if and where I am going wrong in my understanding of the algorithm with respect to capacity.
In the TM algorithm, when an input is completely unpredicted, all of the cells in that input’s minicolumns activate (the algorithm calls this bursting), and one winning cell per minicolumn is chosen to represent the input in that context. The winning cell for each minicolumn is selected from the cell(s) with the fewest distal segments, using a random tie breaker.
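For concreteness, here is a minimal Python sketch of that selection rule as I understand it; the function and variable names are my own, not anything from the NuPIC API:

```python
import random

def choose_winner_cell(column_cells, segment_counts, rng=random):
    """Pick the winner cell for a bursting minicolumn: among the cells in the
    minicolumn, keep those with the fewest distal segments and break ties randomly."""
    fewest = min(segment_counts[c] for c in column_cells)
    candidates = [c for c in column_cells if segment_counts[c] == fewest]
    return rng.choice(candidates)

# Example: a 3-cell minicolumn where cell 0 already has one segment,
# so the winner is chosen randomly between cells 1 and 2.
print(choose_winner_cell([0, 1, 2], {0: 1, 1: 0, 2: 0}))
```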
Suppose we have already formed representations for the first three inputs of A, and all cells now have one distal segment each. Let’s call these representations A(1), A(2), and A(3). If your layer dimensions were 4 minicolumns per input and 3 cells per minicolumn (and assuming SP boosting is not enabled), the representations might look something like this (we do not need to consider the 196 other inactive minicolumns, since they will never have active cells in them in this scenario).
If “A” is input a fourth time and a representation is chosen for A(4), the chance of the four random tiebreakers reproducing exactly one of these three existing representations is 3 / 3^4 (“number of representations already used” / “all possible representations”). For A(5) it would be 4 / 81, for A(6) it would be 5 / 81, and so on. The numerator increases by one for each new set of tiebreakers (i.e., the longer the sequence, the higher the likelihood of randomly selecting a representation that has already been used in the sequence).
What are the chances of this happening with the layer dimensions you mentioned earlier? The number of possible representations for 20 minicolumns containing 32 cells each is 32 ^ 20. Thus, in a layer of this size, it is vanishingly unlikely that any element in a sequence of repeating inputs would by random chance be exactly the same as a previous representation, until that sequence has become astronomically long.
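To make the arithmetic concrete, here is a quick back-of-the-envelope calculation (plain Python, no HTM code involved); it just mirrors the counting argument above, so treat the exact numbers as illustrative:

```python
def collision_chance(used, minicolumns, cells_per_column):
    """Chance that a fresh set of random tiebreakers reproduces one of the
    `used` representations already chosen for this input."""
    return used / cells_per_column ** minicolumns

# Small layer: 4 minicolumns x 3 cells -> 3^4 = 81 possible representations
print(collision_chance(3, 4, 3))        # A(4): 3/81 ~ 0.037
print(collision_chance(4, 4, 3))        # A(5): 4/81 ~ 0.049

# Larger layer: 20 minicolumns x 32 cells
print(32 ** 20)                         # ~1.27e30 possible representations
print(collision_chance(32, 20, 32))     # ~2.5e-29, even after 32 prior uses
```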
With this in mind, let’s consider an HTM system with the dimensions that you mentioned earlier, trained on a repeating input A, A, A, A, A…
Let’s train it like this, where we start with some other input Z after a reset, then increase the number of A’s one at a time. This will ensure that we always start with the A(1) representation (versus a burst in the A minicolumns, which would require multiple iterations to disambiguate). Training the system this way will speed up the training process in case you want to repeat this experiment with NuPIC (a small generator for this schedule is sketched after the listing below). The different representations involved will be learned like so:
(reset) Z → A(1)
(reset) Z → A(1) → A(2)
(reset) Z → A(1) → A(2) → A(3)
(reset) Z → A(1) → A(2) → A(3) → A(4)
…
(reset) Z → A(1) → A(2) → (…) → A(31) → A(32)
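In case you do want to script this, here is a minimal sketch of the schedule generation in Python; it only builds the symbol sequences, and the actual encoding and TM calls (whatever your setup uses) are left out:

```python
def training_schedule(max_repeats=32):
    """Yield one training pass per sequence: Z followed by a growing number
    of A's, with a reset implied between passes."""
    for n in range(1, max_repeats + 1):
        yield ["Z"] + ["A"] * n

for sequence in training_schedule(4):
    print("(reset) " + " -> ".join(sequence))
# (reset) Z -> A
# (reset) Z -> A -> A
# (reset) Z -> A -> A -> A
# (reset) Z -> A -> A -> A -> A
```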
Now, let’s assume we set the activation threshold to 1 (lowest capacity). We then input the next sequence:
(reset) Z → A(1) → A(2) → (…) → A(32) → A(33)
The representation for A(33) will consist of random cells, each of which is already contained in one of the previous 32 representations for A (since at this point every cell in the A minicolumns will have been used exactly once). Thus, when A(33) becomes active, some combination of the previous A representations will become predictive, and the A minicolumns will never burst again (and thus no new representations for A will ever be chosen).
Thus the capacity of a system of this size and configuration (for a single repeating input) is 32 transitions. Hopefully it is intuitive that if the sequence has more diversity than just one repeating input (or is using SP boosting), the number of transitions would be larger than that. So 32 (i.e. the number of cells per minicolumn) represents a lower bound of capacity for this size of system, when the activation threshold is equal to one.
Now consider the other extreme, and assume the activation threshold is 20 (i.e. the number of minicolumns per input), with max synapses per segment at least 20. In this case each segment fully connects to a previous representation and can uniquely distinguish it from the others, so we are bounded only by the number of distal segments a cell is allowed to grow, which can be set to any arbitrary value (setting biological plausibility aside).
Thus the capacity of a system of this size and configuration (for a single repeating input) with activation threshold 20 is 32 ^ 20 transitions (in practice it will be somewhat lower than this, because the unlikely random event I mentioned earlier, where a previous representation is chosen again, will happen some time before then). This astronomical capacity of course comes at the cost of zero noise tolerance.
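Putting the two extremes side by side (again just plain arithmetic, not a claim about any particular implementation):

```python
cells_per_column = 32
minicolumns_per_input = 20

# Activation threshold 1: capacity (for a single repeating input) is bounded
# by the number of cells per minicolumn, since the minicolumns stop bursting
# once every cell has been used.
lower_bound = cells_per_column

# Activation threshold 20: every representation is uniquely distinguishable,
# so capacity approaches the total number of possible representations.
upper_bound = cells_per_column ** minicolumns_per_input

print(lower_bound)   # 32
print(upper_bound)   # 1267650600228229401496703205376 (~1.27e30)
```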
Adding in biological constraints, or configuring your system for various properties like noise tolerance, one-shot learning, etc. could place the capacity of a given HTM system with these same dimensions anywhere within this vast range of possibilities.
Does anyone see if and where I am going astray in my understanding of capacity?