Thank you so much for all your feedback. It's very helpful. My speculation is no more valid than yours, so if I disagree about something, I'm sorry if it sounds like I think your idea is stupid. I'm just trying to explain my speculation and possibly show what I'm not understanding. Sorry about this long post. The bottom third isn't too important, so you might want to skip it.
Maybe our definitions of temporal pooling differ in the length of sequences which it needs to pool. I guess I'm uncomfortable with the idea that sequences of sequences totaling a second or less can be abstract enough. Maybe it's because I don't understand the new sensorimotor work very well. I'm also uncomfortable with applying the current ideas to abstract things because abstract things are often very disconnected from sensory input and some abstract things can manifest as various sequences. To be clear, I don't think predictive firing can replace temporal pooling.
Equating duration with abstraction is a big error in my thinking. Thanks.
I'm still not sure how temporal pooling could recognize different clouds as the same type of thing. They come in various forms, so they come in various sequences. Those sequences can have some similarities because of context, but that context isn't reliable. The sky isn't always blue, for example.
I agree that it needs a stable representation. The issue is making sure that some bits of the stable SDR are shared across all types of clouds (while the other bits can represent specifics, like the cloud's shape). Predictive firing can create a stable representation if the details are right. For example, if you look at a cloud, you can predict seeing blue sky if you move your eyes and/or attention. It's not that you will definitely see blue sky, just that you could in the near future. Because those predictions keep firing, they form a stable representation: the possibilities persist as long as you look at the cloud. Similarly, you can predict that the cloud might dissipate. Various predictions like these continue for as long as you see the cloud.
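Here's a toy sketch of what I mean, just my own illustration in Python (the bit assignments and the `prediction_map` structure are made up for the example): the raw input SDR changes with every eye movement, but the union of what the input predicts stays the same.

```python
# Toy sketch (my own illustration, not from any HTM codebase): while the
# cloud stays in view, the union of everything it makes you predict is
# stable even though the raw input SDR changes with every fixation.

def predictions_for(input_sdr, prediction_map):
    """Union of the SDRs predicted by the currently active input bits."""
    predicted = set()
    for bit in input_sdr:
        predicted |= prediction_map.get(bit, set())
    return predicted

# Hypothetical bit assignments: any part of the cloud predicts
# "might see blue sky" (100) and "might dissipate" (101).
prediction_map = {
    1: {100, 101},   # cloud edge
    2: {100, 101},   # cloud center
    3: {100, 101},   # wispy part
}

fixation_a = {1, 2}   # one eye position over the cloud
fixation_b = {2, 3}   # another eye position

# Different inputs, same predictive representation:
assert predictions_for(fixation_a, prediction_map) == predictions_for(fixation_b, prediction_map)
```

The point of the sketch is only that a representation built from what *could* happen next doesn't need the input itself to be stable.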
It's just that I don't trust context to be reliable. It can be unstable, nonexistent, inconsistent, subtle, out of the scope of attention, or the same for multiple concepts.
Predictions and context are pretty similar. Predictive firing just adds more context, including context which isn't currently present: you could still see it, and that's enough to make a prediction.
Predictive firing can also perform some other functions that are part of the theory, although probably not as well. For example, it can create stability across actions by firing both before and after the action. It can create an allocentric representation by firing before you see the object from a different relative location. It's also already part of the (possibly tentative) theory for motor control: as I understand it, layer 5 fires predictively to generate motor output. Layer 5 has an apical dendrite in L2/3, so it would be easy to remove sequence context.
I'm not sure. When I've looked for persistent firing, I couldn't find any mention of cells firing for more than maybe a minute, probably more like 20 seconds. Maybe that counts, but I'm not sure whether it resulted from stable input, such as staying in the same place field. It's also possible that cells at the highest level of abstraction select subsets of that abstraction or otherwise do something which prevents persistent firing. For example, grid cells in the entorhinal cortex fire at place fields arranged in a triangular grid. They do this everywhere and keep the same relative spacing between each cell's fields everywhere. I guess that's pretty abstract, and it's definitely not very specific, but they don't fire for long durations because their fields are discontinuous. High abstraction might not imply persistent firing, either. It might be more about degree of separation from the input: a cell could fire for a very long time in response to lines if you look at the same thing for a while, whereas a cell which responds to dogs could fire very briefly if you just glance at a dog.
I'm not exactly sure what you mean. I think we might think about the hippocampus differently or I didn't explain my thinking very well. The rest of what I write here might not be too important to read, since all I'm arguing is that place cells don't do temporal pooling.
Here's the model I have in my head: each level of the hierarchy does temporal pooling to learn sequences of sequences. When a sequence is novel, the next level of the hierarchy can't pool it, so the novel sequence gets passed up the hierarchy until it reaches a region which recognizes it. The hippocampus is at the top of the hierarchy, where it stores completely novel sequences (or completely novel continuations of known sequences) even if it sees them just once, if they're important to remember. During sleep and waking states of inactivity, it replays sequences to transfer them to the neocortex, so the neocortex can learn a sequence after the brain sees it just once. This rapid learning is important for paths, for example, when you want to take the exact same path again.
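To make the model concrete, here's a deliberately tiny sketch of it in Python (my own simplification; the level dictionaries and sequences are made up): each level replaces a sequence it recognizes with a single pooled token, and anything no level recognizes climbs to the top and gets stored there.

```python
# Toy sketch of the hierarchy described above (my own simplification):
# each level pools a sequence it knows into one stable token; a sequence
# nothing recognizes is passed all the way up and stored at the top
# (the "hippocampus" here) after a single exposure.

def run_hierarchy(sequence, levels, store):
    seq = tuple(sequence)
    for known in levels:              # bottom of the hierarchy to the top
        if seq in known:
            return known[seq]         # pooled: a stable name for the sequence
        # this level can't pool it; pass the novel sequence upward
    store.append(seq)                 # top level stores it after one exposure
    return None

level1 = {("a", "b"): "AB"}           # hypothetical learned sequences
level2 = {("c", "d"): "CD"}
hippocampus = []

run_hierarchy(["a", "b"], [level1, level2], hippocampus)   # pooled at level 1
run_hierarchy(["x", "y"], [level1, level2], hippocampus)   # novel, stored at the top
```

Obviously real pooling works on SDRs, not exact tuples; the sketch is only meant to show the "pass novelty upward" control flow.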
The problem I see with this is that novelty exists on a lower time scale than the sequences of sequences high in the hierarchy. If the hippocampus were to receive a stable representation of a known sequence and then a representation of a novel sequence, it would store this: a stable representation of the known sequence followed by each SDR of the novel sequence. To teach the neocortex that the known sequence is actually followed by each SDR of the novel sequence, it would have to replay the known portion and then play the novel portion. So the brain would have to unfold the known portion, since the hippocampus stored it as a single SDR, and the hippocampus would have to wait a little while before moving on to the novel portion, to give the unfolded sequence time to play. It's much easier to store the whole sequence by learning synapses between the cells of successive SDRs, so it can replay the whole sequence later without unfolding anything. As far as I know, there aren't any cells in the hippocampus which are active for a whole path (unless that path is small enough to fit in a place field). It seems like the hippocampus learns paths as sequences of SDRs rather than as pooled sequences. There are some theories about how it does this, such as CRISP and something a bit like a modified temporal pooler. CRISP seems to fit biology better, but it ignores learning in the CA3 recurrent collaterals, so it can't explain some things which the other source can. I don't have the expertise to say that confidently, though.
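To show the contrast I'm drawing, here's another toy sketch (again my own illustration, with made-up SDRs): if the path is stored as a chain of SDR-to-SDR transitions, replay is just following the chain step by step, with no pooled representation that would have to be unfolded first.

```python
# Toy contrast (my own sketch): store a path as pairwise transitions
# between successive SDRs, then replay it by chaining through them.
# Nothing here needs unfolding, because nothing was pooled.

def learn_chain(sdrs):
    """Learn transitions between successive SDRs (stored as frozensets)."""
    chain = {}
    for prev, nxt in zip(sdrs, sdrs[1:]):
        chain[frozenset(prev)] = frozenset(nxt)
    return chain

def replay(chain, start, max_steps=10):
    """Replay the stored sequence from a starting SDR."""
    out, cur = [frozenset(start)], frozenset(start)
    while cur in chain and len(out) <= max_steps:
        cur = chain[cur]
        out.append(cur)
    return out

path = [{1, 2}, {3, 4}, {5, 6}]   # hypothetical place-cell SDRs along a path
chain = learn_chain(path)
replay(chain, {1, 2})             # walks the whole path, one SDR at a time
```

A real model would use distributed, overlapping SDRs and higher-order context rather than an exact lookup, but the point is just that chained transitions replay directly.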