It’s also related to how GPUs work.
Both the feedforward and the feedback passes happen in parallel per minibatch.
Thus, it HAS access to all of the information from the previous “time step” (i.e. the activations caused by the previous minibatch input).
You can make use of that information if you want.
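Here’s a rough sketch of what I mean (NumPy, with made-up layer shapes, not the actual Deep HTM code): the previous minibatch’s activations are just sitting in memory when the next minibatch arrives, so nothing stops you from reading them.

```python
import numpy as np

# Minimal sketch (hypothetical shapes): one layer processing minibatches,
# caching its activations so the next "time step" (the next minibatch)
# can read them.

rng = np.random.default_rng(0)
W = rng.standard_normal((64, 32)) * 0.1   # hypothetical layer weights

prev_activations = None  # activations caused by the previous minibatch

for step in range(3):
    x = rng.standard_normal((128, 64))    # minibatch of 128 inputs
    h = np.tanh(x @ W)                    # feedforward, parallel over the whole batch

    if prev_activations is not None:
        # The previous time step's activations are still available here;
        # use them however you want (e.g. as a temporal context signal).
        context = prev_activations.mean(axis=0)
        print(f"step {step}: context norm = {np.linalg.norm(context):.3f}")

    prev_activations = h                  # cache for the next minibatch
```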
Individual backpropagation passes (gradient descent through the layers) aren’t computationally expensive either.
They have about the same computational complexity as the feedforward stage.
The formulas for the feedforward and the feedback stages are quite similar, too.
That’s where it got its name, “backpropagation”.
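For a plain linear layer the mirror image is easy to see. A minimal sketch (NumPy, hypothetical shapes): the backward pass is the same kind of matrix product as the forward pass, just with the weights transposed, which is why the two stages cost about the same.

```python
import numpy as np

# Sketch of the forward/backward symmetry for a linear layer y = x @ W.

rng = np.random.default_rng(1)
x = rng.standard_normal((128, 64))    # minibatch input
W = rng.standard_normal((64, 32))

y = x @ W                             # feedforward: one (128x64)@(64x32) matmul
grad_y = rng.standard_normal(y.shape) # gradient arriving from the layer above

grad_x = grad_y @ W.T                 # feedback: one (128x32)@(32x64) matmul
grad_W = x.T @ grad_y                 # weight gradient, again a single matmul

# Every pass is dominated by matmuls of the same size, hence the
# comparable complexity and the mirrored formulas.
```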
Training deep learning models is costly because of the number of iterations it takes for them to work well.
What you’re suggesting (a minibatch size of 1) might not be parallel enough to be computed efficiently by a GPU.
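A rough illustration of why (this runs NumPy on the CPU, so the gap is smaller than on an actual GPU, and the shapes are made up): 64 single-sample passes through the same weights vs. one batch-of-64 pass. The batched version does one big matmul instead of 64 tiny ones, which is what keeps the hardware busy.

```python
import time
import numpy as np

# Compare 64 passes with minibatch size 1 against one pass with
# minibatch size 64 through the same hypothetical weight matrix.

rng = np.random.default_rng(2)
W = rng.standard_normal((1024, 1024))
batch = rng.standard_normal((64, 1024))

t0 = time.perf_counter()
for i in range(64):                   # minibatch size of 1, 64 times over
    _ = batch[i:i + 1] @ W
t1 = time.perf_counter()

_ = batch @ W                         # one minibatch of 64
t2 = time.perf_counter()

print(f"64 x (batch of 1): {t1 - t0:.4f}s, 1 x (batch of 64): {t2 - t1:.4f}s")
```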
I’m currently implementing Deep HTM for the GPU, and indeed it uses a minibatch size of a few dozen.
I’m not there yet, but I don’t think the minibatch size will cause any particular problem.
If you think it might cause a problem of some kind, please let me know how.