Measuring/Testing TM Sequence storage capacity?

I know how to measure TM transition capacity:

 TCapacity = (nseg * pat-per-seg) / sparsity

My question is how to measure sequence capacity (probably impossible), or failing that, how to test the SeqCapacity of a specific TM implementation.

My conundrum is that there are many possible sequence “types/forms”, e.g.:

    aaaaaaaa........
    ababab...........
    ab***ab****ab****.........
    ab*c***ab*c***ab*c.........
    and so on .........

Or what about a sequence that combines all of the above?

What sort of tests do you devise, and what sort of statistics or other measures do you use?
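
As a rough sketch of how such test sequences could be generated, the snippet below expands a template into a symbol list, treating each `*` as a random "don't care" symbol. The function name `make_sequence`, the templates, and the choice to draw noise only from letters the template does not use are all assumptions for illustration, not part of any TM library.

```python
import random
from itertools import cycle, islice

def make_sequence(template, length, alphabet="abcdefghijklmnopqrstuvwxyz", seed=0):
    """Repeat `template` up to `length` symbols; each '*' becomes a random symbol."""
    rng = random.Random(seed)
    fixed = set(template) - {"*"}
    # Assumption: noise symbols come only from letters the template does not use,
    # so a wildcard can never accidentally reproduce a pattern element.
    noise = [ch for ch in alphabet if ch not in fixed]
    return [rng.choice(noise) if ch == "*" else ch
            for ch in islice(cycle(template), length)]

constant = make_sequence("a", 8)          # aaaaaaaa
alternating = make_sequence("ab", 8)      # abababab
gapped = make_sequence("ab***", 10)       # ab, three noise symbols, repeated
gapped_c = make_sequence("ab*c***", 14)   # ab, noise, c, three noise symbols, repeated

# One possible "combination" sequence: the forms simply concatenated.
mixed = constant + alternating + gapped + gapped_c
```

Each list can then be encoded and fed to the TM under test one symbol per time step.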

Yes, this makes the testing of TM inherently subjective to some extent.

When I think of sequence “types/forms”, I imagine two basic parameters:

  • degree of noise/randomness present

  • length & complexity of patterns present

So you could test sequences which are:

  1. low noise & low complexity (which would consume the least resources)

  2. high noise & high complexity (which would consume the most)

  3. low noise & high complexity (somewhere in between)

  4. high noise & low complexity (somewhere in between)

Then there’s the question of how to measure the level of resources consumed. I think the number of TM distal segments is a good basic one.
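
A minimal sketch of the four test regimes, assuming noise is modelled as random symbol replacement and complexity as a longer pattern that reuses elements in different contexts. The patterns, the 0.3 noise rate, and the helper names are placeholders chosen for illustration.

```python
import random

def repeat_pattern(pattern, repeats):
    """Return the pattern repeated `repeats` times as a list of symbols."""
    return list(pattern) * repeats

def add_noise(sequence, noise_rate, noise_symbols="0123456789", seed=0):
    """Replace each symbol with a random noise symbol with probability `noise_rate`."""
    rng = random.Random(seed)
    return [rng.choice(noise_symbols) if rng.random() < noise_rate else s
            for s in sequence]

SIMPLE = "ABCD"             # short, no repeated elements (assumed "low complexity")
COMPLEX = "ABCDXBCYABCZ"    # longer, elements reused in different contexts
LOW_NOISE, HIGH_NOISE = 0.0, 0.3

cases = {
    "low noise, low complexity": add_noise(repeat_pattern(SIMPLE, 30), LOW_NOISE),
    "high noise, high complexity": add_noise(repeat_pattern(COMPLEX, 10), HIGH_NOISE),
    "low noise, high complexity": add_noise(repeat_pattern(COMPLEX, 10), LOW_NOISE),
    "high noise, low complexity": add_noise(repeat_pattern(SIMPLE, 30), HIGH_NOISE),
}
```

After running each case through the TM, the number of distal segments would be read from whatever implementation is being tested and compared across the four cases.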

How do you represent and calculate complexity in the sequence?

Then there’s the question of how to measure the level of resources consumed. I think the number of TM distal segments is a good basic one.

Yeah, I would test using different numbers of segments.

I basically mean how much overlap there is between the sequence elements.

So take two repeating sequences of equal length (seq.1 & seq.2), say:

A, B, C, D, X, B, C, Y, …
and
A, B, C, D, E, F, G, H, …

Both are 8 time steps long, but seq.1 has repeating elements (B and C).
This means that to fully learn seq.1, a TM has to learn to distinguish B after A from B after X, and C before D from C before Y. The 2 learned B’s and C’s mean there are 2 winner cells (and thus 2 segments) in all B columns and all C columns.

Seq.2, however, has no repeating elements, so there is just 1 segment in each of the A-H columns.
This means the TM has fewer segments per column, and also that it learns the precise sequence faster.
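
The overlap idea can be sketched by counting how often each element occurs in one period of the sequence. This assumes every occurrence ends up with its own learned context (one winner cell per column), which is a simplification of what a real TM does.

```python
from collections import Counter

def contexts_per_symbol(period):
    """Count how often each symbol occurs in one period of a repeating sequence."""
    return Counter(period)

seq1 = ["A", "B", "C", "D", "X", "B", "C", "Y"]
seq2 = ["A", "B", "C", "D", "E", "F", "G", "H"]

counts1 = contexts_per_symbol(seq1)
counts2 = contexts_per_symbol(seq2)

# Symbols that need more than one context, i.e. more than one winner cell per column.
repeated1 = {s: n for s, n in counts1.items() if n > 1}   # {'B': 2, 'C': 2}
repeated2 = {s: n for s, n in counts2.items() if n > 1}   # {}
```

The largest count in a sequence gives a crude lower bound on segments per column for its busiest element.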

I’d measure sequence complexity by how long it takes the TM to stabilize, meaning how long before the anomaly score settles at 0 and the prediction count settles at 1. This should theoretically happen at some point for any noiseless sequence, assuming it repeats enough times.
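
One way to turn that into a number, sketched under the assumption that the anomaly score and the prediction count have already been logged per time step from the TM run (the function name and the example values are made up):

```python
def steps_to_stabilize(anomaly_scores, prediction_counts):
    """Return the first step from which anomaly stays 0 and prediction count stays 1.

    Returns None if the run never settles before the log ends.
    """
    stable_from = None
    for step, (anomaly, n_pred) in enumerate(zip(anomaly_scores, prediction_counts)):
        if anomaly == 0 and n_pred == 1:
            if stable_from is None:
                stable_from = step      # start of a candidate stable stretch
        else:
            stable_from = None          # any relapse resets the candidate
    return stable_from

# Made-up log: briefly clean at step 3, relapses at step 4, stable from step 5 on.
anomaly = [1.0, 1.0, 0.5, 0.0, 0.2, 0.0, 0.0, 0.0]
n_predictions = [0, 2, 2, 1, 1, 1, 1, 1]
settled_at = steps_to_stabilize(anomaly, n_predictions)   # 5
```

Comparing `settled_at` across sequences of the same length gives a relative complexity ranking; a result of None means the sequence was not learned within the run.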