The (H)TM theory is that the purpose of a column is to predict its input one time step ahead, the assumption being that higher HTM layers or blocks use the lower layers' predictions for their own purpose (which is, again, predicting their own input).
Put otherwise, a TM learns to produce an output at time t that matches its input at time t+1, and since this is what it learns, that must be its purpose or utility.
The problem with this perspective is that there's very limited use in anticipating the future a mere 10 ms in advance (roughly how long it takes a signal to propagate across several layers in a minicolumn).
One might hope the upper TM would use this prediction to make a new 10+10 = 20 ms prediction, and so on. But wait, after 20 ms the lower level's prediction is already obsolete, and it takes 5 ms just to propagate the inputs up... So we would need hundreds of layers stacked on top of each other to predict the future even a few seconds ahead.
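A back-of-the-envelope sketch of this stacking argument, using the illustrative figures above (10 ms of prediction gained per level, 5 ms of propagation delay lost per level):

```python
import math

# Illustrative values from the argument above, not measured quantities.
PREDICT_MS = 10   # prediction horizon each TM level nominally adds
PROPAGATE_MS = 5  # delay for inputs to propagate up one level

def levels_needed(target_ms):
    # By the time the input reaches level k it is already k * PROPAGATE_MS
    # old, so the net horizon gained per level is only the difference.
    net_gain = PREDICT_MS - PROPAGATE_MS
    return math.ceil(target_ms / net_gain)

print(levels_needed(2000))  # levels to look ~2 seconds ahead -> 400
```

Even with these generous numbers, looking a couple of seconds ahead needs hundreds of stacked levels, which is the point of the objection.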
So what if whatever a TM column predicts is NOT what the upper layers really need (or not the most)?
What else is left to be desired (== useful) if not the input SDR at t+1?
And the only plausible answer is the overall cell activation state, because it is a generally useful representation of whatever happened during the (relatively) recent past. A cell out of N_cols x N_layers represents some spatio-temporal feature that is potentially useful.
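A minimal sketch of what that "overall cell activation state" looks like as data. The sizes and the (column, cell) encoding here are illustrative assumptions, not any specific library's API:

```python
import numpy as np

# Illustrative dimensions; a real TM block would define its own.
N_COLS, CELLS_PER_COL = 2048, 16

def cell_state_vector(active_cells):
    """active_cells: iterable of (column, cell) pairs currently active.
    Returns a flat binary vector of length N_COLS * CELLS_PER_COL,
    where each bit stands for one potential spatio-temporal feature."""
    v = np.zeros(N_COLS * CELLS_PER_COL, dtype=np.uint8)
    for col, cell in active_cells:
        v[col * CELLS_PER_COL + cell] = 1
    return v

state = cell_state_vector([(0, 3), (7, 15), (100, 0)])
print(int(state.sum()))  # 3 active bits
```

The point is just that the full state is a large, sparse binary vector, and each set bit carries context (which cell in the column fired), not just which column fired.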
“Nature” makes the assumption that any bit of information useful for making a prediction 10 ms ahead is potentially relevant in other places too. Similarly to how convolution activations in a CNN (learn to) expose local-scale spatial features of an input image, a TM cell learns to expose spatio-temporal features.
A simple proof that it is useful for other tasks is that some have managed to train a sequence classifier on top of the global cell activation state. I think I’ve seen that in BrainBlocks.
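A toy sketch of that idea, with the TM itself stubbed out by random but class-consistent binary state vectors (in practice these would come from a real TM, e.g. BrainBlocks or htm.core). The classifier is a deliberately simple nearest-centroid over time-pooled states; everything here is an illustrative assumption:

```python
import numpy as np

rng = np.random.default_rng(0)
STATE_DIM = 512  # stands in for N_cols * cells_per_col

def fake_tm_states(class_id, steps=20):
    # Stand-in for a TM: each class activates a distinct subset of cells,
    # plus 5% random noise, so pooled states are separable.
    base = np.zeros(STATE_DIM)
    base[class_id * 50:(class_id + 1) * 50] = 1
    return [(base + (rng.random(STATE_DIM) < 0.05)) > 0 for _ in range(steps)]

def pool(states):
    # Average cell activity over the sequence -> fixed-size feature vector.
    return np.mean(states, axis=0)

# "Train": one centroid per class from a few example sequences.
centroids = {c: np.mean([pool(fake_tm_states(c)) for _ in range(5)], axis=0)
             for c in range(3)}

def classify(states):
    f = pool(states)
    return min(centroids, key=lambda c: np.linalg.norm(f - centroids[c]))

print(classify(fake_tm_states(1)))
```

Note that the classifier never sees the TM's t+1 predictions, only its internal activation states, which is exactly the point being argued.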
This is very similar to using the intermediate embedding of an image autoencoder as a “summary” of the image, used e.g. to search for similar images or as a “world view” when training RL agents.
TLDR: a TM block’s main purpose is to learn a useful encoding of the (relatively) long past, not to predict the immediate (t+1) future.
Learning to predict its own input at t+1 is the means by which it discovers patterns in the longer-term (t-N) past, and for whoever wants to predict the t+N future, those patterns are much more useful than the prediction at t+1.