If you think of a temporal memory as a series of nodes, then when any of its nodes are selected, that is technically “predicting” the entire series. Setting aside the voting aspect and bursting (I think those just obfuscated my point): when the output layer in a single CC recognizes a couple of nodes in the series activating in the input layer, it activates a representation for that series. This is an active prediction of the series. If you take the activity which represents the series, and the activity which represents the position in the series, those together encode not only the same information as the next node in the series, but also all future nodes. This is much more useful than simply transmitting what will happen a few milliseconds from now.
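To make that concrete, here is a minimal toy in Python (my own illustration with made-up series names, not an actual temporal memory): knowing which series is active and the position within it is enough to recover not just the next node but the entire remaining future of the series.

```python
# Toy illustration only: a "series" is just an ordered list of nodes.
# The series names and elements below are invented for the example.
SERIES = {
    "walk_to_store": ["front_door", "sidewalk", "crosswalk", "store_entrance"],
    "make_coffee":   ["grind", "boil", "pour", "drink"],
}

def next_node(series_id, position):
    """What a pure 'next step' prediction would transmit."""
    return SERIES[series_id][position + 1]

def remaining_nodes(series_id, position):
    """What (series, position) actually encodes: the whole future of the series."""
    return SERIES[series_id][position + 1:]

print(next_node("walk_to_store", 1))        # 'crosswalk'
print(remaining_nodes("walk_to_store", 1))  # ['crosswalk', 'store_entrance']
```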
A couple of other thoughts I had about this:
Timing. For streaming data, when a belief is correct, the amount of time between when a cell is predictive and when it becomes active is very short. There is probably not a lot to gain by sending the next element of information a few milliseconds sooner. More useful would be sending information about what will happen a few seconds or minutes in advance. This is what I see the output layer doing. Since its activity is more stable than the input layer’s, it represents a temporal correlation that encodes not only the current input, but also future inputs.
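A tiny sketch of that contrast (again my own toy, not HTM code): the input layer’s representation changes every timestep, while the output layer’s representation stays the same for as long as the belief holds, so it carries information about the whole unfolding sequence rather than just the next few milliseconds.

```python
# Invented sequence and representation names, purely for illustration.
sequence = ["A", "B", "C", "D"]      # streaming inputs, one per timestep
series_code = "series_ABCD"          # stable output-layer representation of the whole series

for t, element in enumerate(sequence):
    input_layer_activity = f"{element}_in_context_{t}"  # changes with every input
    output_layer_activity = series_code                 # unchanged while the belief holds
    print(t, input_layer_activity, output_layer_activity)
```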
Division of labor. What does a CC within a hierarchy need to know? Of course, I need to know what will happen next (my prediction based on my belief), but what about the levels above and below me?
The CC below me needs to know my current belief. Since I am already predicting what will happen next at my own level of abstraction, the CC below me does not need to know what I am predicting, and doesn’t need to copy my predictions (nor likely can it, since it is modelling lower abstractions). It just needs to unfold my current belief using its own predictions at its own level of abstraction. I’ll tell it when to move to my next belief to unfold.
The CC above me needs to know if my current belief is panning out for me and matching reality. It doesn’t need me to tell it my predictions; it is making its own predictions at a higher level of abstraction. When my belief is wrong, it needs me to tell it that (and it needs me to provide some alternate possibilities based on my own memory). It uses this information to judge how well its own belief is matching reality, and to inform the next level up about that.
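Here is a rough sketch of that division of labor as message passing, with hypothetical message names of my own choosing (a framing device, not an HTM API): the level below only ever receives my current belief to unfold, and the level above only hears from me when that belief stops matching reality.

```python
from dataclasses import dataclass, field

@dataclass
class CCMessages:
    belief: str                        # current belief at this CC's level of abstraction
    matching_reality: bool = True      # is the belief panning out?
    alternatives: list = field(default_factory=list)

    def to_level_below(self):
        # The lower CC only needs my current belief, which it unfolds with its
        # own predictions at its own level of abstraction.
        return {"belief_to_unfold": self.belief}

    def to_level_above(self):
        # The higher CC only hears from me when my belief stops matching
        # reality, along with some alternatives from my own memory.
        if self.matching_reality:
            return {}                  # nothing worth reporting upward
        return {"belief_failed": self.belief, "alternatives": self.alternatives}

cc = CCMessages(belief="walking_to_store")
print(cc.to_level_below())             # {'belief_to_unfold': 'walking_to_store'}
cc.matching_reality = False
cc.alternatives = ["road_blocked_detour", "turn_back"]
print(cc.to_level_above())
```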
Learning. When predictions are going well, there is nothing to learn, so there is no reason to transmit them outside of where they are being locally managed. When they are not going well, it is time to get the hierarchy involved to help figure out what has gone wrong, and to update models accordingly.
The wrong belief could be at any level of the hierarchy (tripping on a newly formed crack in the sidewalk vs. a road being blocked off on my walk to the store, etc.). The lowest levels of the hierarchy (being where evidence from the world enters the picture) are always going to be the first to recognize when things are going wrong, and they have a mechanism (bursting) to transmit that quickly up the hierarchy to whatever level needs it.
I think this is actually a desired property for temporal abstractions (for example, the temporal difference between “left, right, left, right” and “walk forward”). But where you want to shortcut this temporal difference is when beliefs are not matching reality and you need to update your model. There is a mechanism for this: bursting. Being a much denser activity, it shouts for the attention of the next higher level. If this “shouting” contradicts the beliefs of the next higher level, then that level will also start bursting and shouting up to the next level, and so on, until the source of the problem is reached and learning can occur to fix the faulty model.
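A toy model of that escalation, reusing the sidewalk/road example (my own simplification, not a claim about the actual cortical circuitry): a burst propagates upward for as long as it contradicts each level’s belief, and stops at the first level whose belief can accommodate the surprise.

```python
# Beliefs and "surprises" are invented labels; the only point is the control flow.
def escalate_burst(surprise, hierarchy):
    """Return the highest level the burst reaches before it is absorbed."""
    for level, cc in enumerate(hierarchy):
        if surprise in cc["compatible_with_belief"]:
            print(f"level {level}: belief '{cc['belief']}' accommodates '{surprise}'; escalation stops")
            return level
        print(f"level {level}: belief '{cc['belief']}' contradicted; burst, update model, shout upward")
    return len(hierarchy) - 1          # reached the top of the hierarchy

hierarchy = [
    {"belief": "smooth_sidewalk",  "compatible_with_belief": set()},
    {"belief": "walk_this_street", "compatible_with_belief": {"crack_in_sidewalk"}},
    {"belief": "walk_to_store",    "compatible_with_belief": {"crack_in_sidewalk", "road_blocked"}},
]

escalate_burst("crack_in_sidewalk", hierarchy)  # stops at level 1; only level 0 relearns
escalate_burst("road_blocked", hierarchy)       # escalates to level 2 before being absorbed
```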
There is a feedback loop. The input layer nudges the activity in the output layer (along with other sibling CCs), and as the output layer changes, it biases cells in the input layer, causing them to change context.
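A crude sketch of that loop over a toy pair of sequences (invented names again, not Numenta’s implementation): the input layer’s recent elements nudge the output layer toward the best-matching series, and the output layer’s belief in turn supplies the context, i.e. the predicted next element, in which the next input will be interpreted.

```python
# Two invented series that share a prefix, so the belief has to be revised mid-stream.
SERIES = {
    "ABCD": ["A", "B", "C", "D"],
    "ABXY": ["A", "B", "X", "Y"],
}

recent_inputs = []                     # input layer: elements seen so far
for element in ["A", "B", "X", "Y"]:
    recent_inputs.append(element)
    # Input layer nudges the output layer: keep the series consistent with what was seen.
    matches = [name for name, seq in SERIES.items()
               if seq[:len(recent_inputs)] == recent_inputs]
    output_belief = matches[0] if matches else None
    # Output layer biases the input layer: the predicted next element is the
    # context in which the next input will be interpreted.
    if output_belief and len(recent_inputs) < len(SERIES[output_belief]):
        predicted_next = SERIES[output_belief][len(recent_inputs)]
    else:
        predicted_next = None
    print(element, "-> belief:", output_belief, "| context for next input:", predicted_next)
```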
Haha, no it doesn’t require anything that magical. There are simple, local-rule-based mechanisms that can do this. One example is self-reinforcing hex grids (borrowing from William Calvin’s book “The Cerebral Code”). I am in the process of drawing visualizations to explain this particular implementation of an “output layer”, and will be posting a thread about it soon.
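In the meantime, here is a bare-bones guess at what such a local rule could look like (a speculative toy of my own inspired by Calvin’s idea, not the implementation I will be posting about): each cell reinforces cells at a fixed lattice spacing in the six hex directions, so a small seed pattern recruits a growing triangular/hexagonal lattice using nothing but local rules.

```python
# The six neighbour offsets of an axial hex lattice at unit spacing.
HEX_OFFSETS = [(1, 0), (-1, 0), (0, 1), (0, -1), (1, -1), (-1, 1)]

def step(active):
    """Local rule: a cell becomes (or stays) active if at least two of its
    six lattice neighbours are currently active."""
    candidates = {(q + dq, r + dr) for q, r in active for dq, dr in HEX_OFFSETS}
    nxt = set()
    for q, r in candidates | active:
        support = sum((q + dq, r + dr) in active for dq, dr in HEX_OFFSETS)
        if support >= 2:
            nxt.add((q, r))
    return nxt

active = {(0, 0), (1, 0), (0, 1)}      # small seed triangle
for _ in range(3):
    active = step(active)
print(sorted(active))                  # the seed has recruited a growing hex lattice
```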