Are we confusing the means for the purpose in TM?

The (H)TM theory is that the purpose of a column is to predict its input one time step ahead, assuming other H(igher)TM layers or blocks use the lower layers’ predictions for their own purpose (which is, again, predicting their own input).

Put otherwise, a TM learns to provide an output at time t that matches its input at time t+1, and since this is what it learns, that must be its purpose or utility.
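To make that concrete, here is a deliberately oversimplified toy in plain Python (not HTM): a predictor trained so that its output at time t matches its input at t+1. A real TM does this per cell with distal dendrites and shared context; the transition table below only stands in for that machinery.

```python
# Toy illustration (not HTM): output at time t is the prediction of the input at t+1.
from collections import defaultdict

class OneStepPredictor:
    def __init__(self):
        # maps an observed input (frozenset of active bits) to the input that followed it
        self.transitions = defaultdict(frozenset)
        self.prev = None

    def step(self, active_bits):
        """Feed the input at time t; return the prediction for t+1."""
        current = frozenset(active_bits)
        if self.prev is not None:
            self.transitions[self.prev] = current   # learn: prev -> current
        self.prev = current
        return self.transitions[current]            # predicted input at t+1

p = OneStepPredictor()
for sdr in [{1, 5, 9}, {2, 6}, {1, 5, 9}, {2, 6}]:
    print(sorted(p.step(sdr)))   # after one pass it predicts {2, 6} follows {1, 5, 9}
```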

The problem with this perspective is that there is very limited use in anticipating the future with a ~10 ms head start (or however long it takes a signal to propagate across the layers of a minicolumn).

One might hope the upper TM would use this prediction to make a new 10+10 = 20 ms prediction, and so on. But wait, after 20 ms the lower level’s prediction is already obsolete! It takes 5 ms just to propagate the inputs up… So there would need to be hundreds of layers stacked on top of each other in order to predict the future even a few seconds ahead.


So what if whatever a TM column predicts is NOT what the upper layers really need (or at least not what they need the most)?
What else could be useful, then, if not the input SDR at t+1?

And the only plausible answer is the overall cell activation state, because it is a generally useful representation of whatever happened during the (relatively) recent past. Each of the N_cols x N_layers cells represents some spatio-temporal feature that is potentially useful.

“Nature” makes the assumption that if a bit of information is useful for making a prediction 10 ms ahead, then that bit is potentially relevant in other places too. Similarly to how convolution activations in a CNN (learn to) expose local-scale spatial features of an input image, a TM cell learns to expose spatio-temporal features.

A simple proof that it is useful for other tasks is that some people have managed to train a sequence classifier on top of the global cell activation state. I think I’ve seen that in BrainBlocks.
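For illustration, a minimal sketch of that kind of experiment. `run_tm_on_sequence` is a hypothetical placeholder, not a real BrainBlocks or htm.core call; in a real setup it would step an actual TM through the sequence and return the final active-cell state as a 0/1 vector.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

N_CELLS = 2048 * 16   # e.g. 2048 columns x 16 cells per column

def run_tm_on_sequence(sequence):
    # placeholder producing a deterministic sparse binary "cell state" per sequence,
    # just so the plumbing below runs end to end; a real TM would go here
    rng = np.random.default_rng(abs(hash(tuple(sequence))) % (2**32))
    state = np.zeros(N_CELLS, dtype=np.uint8)
    state[rng.choice(N_CELLS, size=80, replace=False)] = 1
    return state

# one feature vector (final cell state) per labelled training sequence
train_set = [((1, 2, 3), 0), ((3, 2, 1), 1), ((2, 3, 4), 0), ((4, 3, 2), 1)]
X = np.stack([run_tm_on_sequence(seq) for seq, _ in train_set])
y = np.array([label for _, label in train_set])

clf = LogisticRegression(max_iter=1000).fit(X, y)
print(clf.predict(run_tm_on_sequence((1, 2, 3)).reshape(1, -1)))   # -> [0]
```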

This is very similar to using the intermediate embedding of an image autoencoder as a “summary” of an image, used e.g. for searching for similar images or as a “world view” when training RL agents.
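In SDR terms, the analogue of comparing autoencoder embeddings is comparing overlap between binary cell states. A small sketch (illustrative names only, no library assumed):

```python
import numpy as np

def overlap(a, b):
    return int(np.sum(a & b))   # number of shared active cells

def most_similar(query_state, stored_states):
    scores = [overlap(query_state, s) for s in stored_states]
    return int(np.argmax(scores)), scores

# tiny illustration with 12-cell states; real states would be much larger
state_a = np.array([1,1,0,0,1,0,0,0,0,0,1,0], dtype=np.uint8)
state_b = np.array([1,1,0,0,1,0,0,0,0,1,1,0], dtype=np.uint8)   # close to state_a
state_c = np.array([0,0,1,1,0,1,1,0,0,0,0,1], dtype=np.uint8)   # unrelated
print(most_similar(state_a, [state_b, state_c]))   # -> (0, [4, 0])
```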

TL;DR: a TM block’s main purpose is to learn a useful encoding of the (relatively) long past, not to predict the immediate (t+1) future.

Learning to predict its own input at t+1 is the means by which it discovers patterns in the longer-term (t-N) past, and for whoever wants to predict the t+N future, those patterns are much more useful than the prediction for t+1.

1 Like

Right. I don’t know how literally we can take it, but I can see that the main purpose of the TM is to recognize the context (through the encoding of the long past).
It’s just that the way it achieves this is by predicting the short term and detecting unknown contexts by finding discrepancies in the short-term predictions (surprise).
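That surprise signal can be made concrete as the fraction of currently active columns that were not predicted at the previous step, which is essentially the raw anomaly score HTM implementations report. A small sketch:

```python
def surprise(active_columns, previously_predicted_columns):
    """Fraction of the currently active columns that were not predicted."""
    active = set(active_columns)
    if not active:
        return 0.0
    unexpected = active - set(previously_predicted_columns)
    return len(unexpected) / len(active)

print(surprise({3, 7, 11, 20}, {3, 7, 11, 20}))  # 0.0 -> fully anticipated context
print(surprise({3, 7, 11, 20}, {3, 7}))          # 0.5 -> partially novel
print(surprise({3, 7, 11, 20}, set()))           # 1.0 -> unknown context
```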

2 Likes

OK, so there is one case in which anticipating the input by only 10 ms would make sense: as consensus synchronization feedback.

Imagine that in order for you to see a ball, there are many (mini)columns that have to say “yup, there’s a ball” before you (a.k.a. the “upper layers”) become aware of the presence of the ball.

But there’s a problem: there are no memory locations in the brain to increment in order to count the votes. So for a 1000-column message to get through, the columns not only need to say “there’s a ball!”, they have to say it in sync.

So the “input” of each minicolumn serves as reinforcement feedback: “Bravo, you were right on the spot.”

This way each mini-column gets informed whether consensus was achieved or not.

1 Like

I try to think of the column/minicolumn behavior in terms of spatial and temporal integration. Consider the magno- and parvo-cellular pathways. One is doing fast integration to detect motion or other sudden changes in the environment. The other is integrating its inputs slower to accumulate more detailed information or to average out transient noise or unsteadiness.

So, for me, the role that the minicolumn serves is as a spatio-temporal pattern detector / autoencoder. It serves this role as part of a network that is generating a stable pattern of its own, which could be interpreted as a consensus representation of the bottom-up input and/or a confirmation of a top-down expected state.

Regardless of your interpretation, I think you should consider the fact that the minicolumns are not just sensing and responding to their current and previous states to predict their next state. They are accumulating input signals (proximal, distal, and apical) as evidence and generating a stable pattern that could be considered an SDR for a probability distribution of potential next states consistent with the current input, recent events, and prior experiences.
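One conceptual reading of that “distribution over potential next states”: count how many depolarized (predictive) cells each column has and normalize. This is purely illustrative; no particular implementation exposes its state this way.

```python
import numpy as np

def next_state_distribution(predictive_cells, n_columns, cells_per_column):
    # tally predictive cells per column, then normalize into a distribution
    votes = np.zeros(n_columns)
    for cell in predictive_cells:               # flat cell index -> its column
        votes[cell // cells_per_column] += 1
    total = votes.sum()
    return votes / total if total > 0 else votes

dist = next_state_distribution(predictive_cells={0, 1, 33, 64}, n_columns=8, cells_per_column=32)
print(dist)   # columns 0, 1, and 2 share the probability mass
```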

Of course, that is just for the upper layers. The lower layers (5 & 6) are also working out their own interpretation that, I think, is related to trying to work out an encoding of the space of affordances for potential interaction with the environment.

1 Like

Yeah, that is interesting.

What would also be interesting is how to use that information to push predictive power beyond what a PC-class machine can deliver when running a “beefy” HTM model.

A couple of points would be:

  1. make it more computationally efficient
  2. make it more “flexible” and “stable” at the same time
  3. make it scalable beyond a single many-core, many-gigabyte machine.

I recall only a few observations on how big vs. how fast an HTM model gets. There is obviously an inverse correlation, but there are only a few actual figures.

PS: here’s one of the few example benchmarks, although not on an actual HTM: brainblocks/examples/python/experiments/scalability at master · the-aerospace-corporation/brainblocks · GitHub

1 Like

Well, I just happen to do high performance computing research for my day job. If you have something specific in mind, we should probably have a discussion.

1 Like

I’ll have to check that this really happens.
Yes, the pattern of firing cells at time t should act as an encoding of the recent past, but I’m not sure it actually behaves like one.

What an autoencoder is valuable for: given two similar but slightly different inputs, the outputs (embeddings) are similar too. It is specifically trained for this purpose.


But I digress. What I wanted to say with this topic is that indeed the current cell activation state provides a much richer context than the predicted SDR.

If we are to connect multiple TMs into more complex networks, they should share these encodings instead of just inputs and outputs. Or we should at least begin experimenting with that: what happens when a TM sees not only its own cell activations + input, but also “sees” the inner state of different TM blocks? Or any external bits of information it is not meant to change and does not need to predict, but that might provide useful context?
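A rough sketch of what that sharing could look like: the other blocks’ cell states are concatenated into a read-only context vector that a block can grow distal synapses to, but never modifies or is asked to predict. Names and shapes here are illustrative; how the context is actually wired in depends on the implementation.

```python
import numpy as np

def build_context(own_cells, other_blocks_cells):
    """Concatenate this block's cell state with other blocks' cell states.

    own_cells / other_blocks_cells are 0/1 numpy arrays; the result is a larger
    binary vector the block can learn from as extra context, but never changes.
    """
    return np.concatenate([own_cells] + list(other_blocks_cells))

# usage (sizes are arbitrary examples, ~2% sparse random states)
block_a = np.random.binomial(1, 0.02, 4096).astype(np.uint8)
block_b = np.random.binomial(1, 0.02, 4096).astype(np.uint8)
context_bits = build_context(block_a, [block_b])
print(context_bits.shape)   # (8192,)
```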

1 Like