At first, the TM would predict all three outcomes after A (‘B’, ‘C’, and ‘D’), because it would not yet have learned enough context to distinguish between the three A’s.
However, if the sequence is repeated enough times, the TM will learn all of the available context, so it will come to know that ‘A after D’ is different from ‘A after B’ or ‘A after C’. Once it knows that the current A is ‘A after D’, it will predict more precisely: ‘B’ only.
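One rough way to see the difference is to compare a table that predicts from A alone with one that also looks at the symbol before A. This is a higher-order Markov lookup table used purely as an analogy, not how the TM stores context (the TM uses cells within columns). The cyclic sequence ‘ABACAD’ and the name `learn_transitions` are assumptions chosen to match the example above.

```python
from collections import defaultdict


def learn_transitions(sequence, order):
    """Map each context (the `order` most recent symbols, ending with the
    current one) to the set of symbols observed next.

    The sequence is treated as cyclic, i.e. repeated indefinitely.
    """
    n = len(sequence)
    table = defaultdict(set)
    for i in range(n):
        context = "".join(sequence[(i - k) % n] for k in range(order - 1, -1, -1))
        table[context].add(sequence[(i + 1) % n])
    return table


seq = "ABACAD"
first_order = learn_transitions(seq, order=1)
print(sorted(first_order["A"]))  # ['B', 'C', 'D']: A alone is ambiguous

with_context = learn_transitions(seq, order=2)
print(sorted(with_context["DA"]))  # ['B']: 'A after D' predicts only B
```

With one symbol of context every A looks the same; adding the predecessor splits A into three distinct contexts, each with a single prediction.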
These different versions of A are sometimes denoted A’, A’’, A’’’, and so on.
Each version of A activates the same columns, but a different set of cells within those columns. So the A’ cells would connect to the D’ cells, while A’’ would connect to B’ and A’’’ to C’.
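The cell-assignment idea can be sketched as a toy model: each symbol owns one column, and each distinct predecessor of that symbol gets its own cell in that column. This is a deliberate simplification, not the TM learning rule (real TM cells learn through distal segments and synapse permanences). `assign_cells`, `distal_links`, and `CELLS_PER_COLUMN` are hypothetical names, and keying on a single predecessor symbol is an assumption that happens to suffice for ‘ABACAD’.

```python
from collections import defaultdict

# Hypothetical toy parameter; real TM configurations use many more cells per column.
CELLS_PER_COLUMN = 4


def assign_cells(sequence):
    """Give each (symbol, predecessor) pair its own cell in the symbol's column.

    Returns a dict mapping (symbol, predecessor) -> (column, cell_index).
    The sequence is treated as cyclic, so the first symbol follows the last.
    """
    next_free = defaultdict(int)  # next unused cell index in each column
    cells = {}
    for i, sym in enumerate(sequence):
        key = (sym, sequence[i - 1])  # index -1 wraps to the last symbol
        if key not in cells:
            if next_free[sym] >= CELLS_PER_COLUMN:
                raise ValueError(f"column {sym!r} has no free cells left")
            cells[key] = (sym, next_free[sym])
            next_free[sym] += 1
    return cells


def distal_links(sequence, cells):
    """For each cell, return the predecessor cell it connects to."""
    links = {}
    for i, sym in enumerate(sequence):
        prev, prev2 = sequence[i - 1], sequence[i - 2]
        links[cells[(sym, prev)]] = cells[(prev, prev2)]
    return links


seq = "ABACAD"
cells = assign_cells(seq)
links = distal_links(seq, cells)
for prev in "DBC":
    column, index = cells[("A", prev)]
    print(f"A after {prev}: column {column}, cell {index}, connects to {links[(column, index)]}")
# A after D: column A, cell 0, connects to ('D', 0)
# A after B: column A, cell 1, connects to ('B', 0)
# A after C: column A, cell 2, connects to ('C', 0)
```

All three A’s share column A, but A’ (cell 0), A’’ (cell 1) and A’’’ (cell 2) each link back to a different predecessor, mirroring the D’, B’ and C’ connections described above.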
For a more complex version of this pattern (for instance, one with multiple ‘ABA’ or ‘CAD’ subsequences), the TM will eventually learn to distinguish between these as well; it simply needs to see the pattern repeated more times to make those distinctions.
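To get a feel for why repeated subsequences make the problem harder, the sketch below finds how many symbols of context (counting the current one) are needed before every next symbol is determined. It measures only context length, not how many repetitions a real TM would need, and it is a lookup-table analogy rather than the TM algorithm. The function names and the test sequence ‘ABACABAD’ (which contains ‘ABA’ twice) are assumptions for illustration.

```python
def is_unambiguous(sequence, order):
    """True if every context of `order` symbols (ending with the current one)
    is always followed by the same next symbol in the cyclic sequence."""
    n = len(sequence)
    seen = {}
    for i in range(n):
        context = tuple(sequence[(i - k) % n] for k in range(order - 1, -1, -1))
        nxt = sequence[(i + 1) % n]
        if seen.setdefault(context, nxt) != nxt:
            return False
    return True


def min_context(sequence):
    """Smallest context length that makes every prediction unique."""
    for order in range(1, len(sequence) + 1):
        if is_unambiguous(sequence, order):
            return order
    return len(sequence)  # not reached: a full-length context always suffices


print(min_context("ABACAD"))    # 2
print(min_context("ABACABAD"))  # 4
```

For ‘ABACAD’ two symbols suffice (A plus its predecessor), while ‘ABACABAD’ needs four, because its two ‘ABA’ runs differ only in the symbol before them (‘D’ vs. ‘C’).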