Backprop is all well and good with the data representation used in “those” models.
With sparse representations I suggest a different direction - look at how the brain does it - modulation of the learning rate.
We know that the amygdala releases learning-rate modifiers, and I believe the RAC gates the stream of senses, again with the effect of greatly enhancing the learning rate for novel presentations.
In this case the bursting drives more gating in that area - in effect, sipping from the firehose.
There are numerous lines of evidence pointing to this mechanism in the brain.
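As a toy sketch of what I mean (my own strawman, nothing from a paper - the novelty score here is just distance from a running mean, standing in for the amygdala/RAC signal):

```python
import numpy as np

# Novelty-gated learning rate: a scalar "surprise" signal scales each
# weight update, so familiar inputs barely move the weights while novel
# presentations get a much larger step.

rng = np.random.default_rng(0)
w = np.zeros(16)                       # toy linear model
running_mean = np.zeros(16)            # crude familiarity memory
base_lr, boost = 0.01, 10.0

def novelty(x, mean):
    # Large when the input is far from anything seen so far.
    return np.linalg.norm(x - mean) / (np.linalg.norm(x) + 1e-8)

for step in range(1000):
    x = rng.standard_normal(16)
    y = x.sum()                        # toy target
    err = y - w @ x

    lr = base_lr * (1.0 + boost * novelty(x, running_mean))
    w += lr * err * x                  # gated delta-rule update

    running_mean = 0.99 * running_mean + 0.01 * x
```

The gate is the whole point - most of the stream produces near-zero updates, and only the novel/bursting fraction gets a big one.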
Another way to work around the learning-rate bottleneck is fast learning in the hippocampus, with replay into the slower-learning cortex.
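Roughly that split, as another sketch (the buffer-plus-linear-model setup and the replay schedule are just placeholders I picked): the "hippocampus" memorises episodes one-shot by storing them, and the "cortex" only ever learns from low-learning-rate interleaved replay out of that store.

```python
import numpy as np
from collections import deque

rng = np.random.default_rng(0)
hippocampus = deque(maxlen=500)        # fast, one-shot episodic store
w_cortex = np.zeros(16)                # slow statistical learner
slow_lr = 0.001

def experience(x, y):
    hippocampus.append((x, y))         # "learned" instantly by storing it

def replay(n_samples=32):
    # Consolidation: interleaved replay of stored episodes into the cortex.
    global w_cortex
    for i in rng.integers(len(hippocampus), size=n_samples):
        x, y = hippocampus[i]
        err = y - w_cortex @ x
        w_cortex += slow_lr * err * x

for step in range(2000):
    x = rng.standard_normal(16)
    experience(x, x.sum())             # live experience goes to the fast store
    if step % 10 == 0:
        replay()                       # offline-style consolidation
```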
Last but not least - the “three visual streams” paper proposes that the top-down temporal stream does an effective back-prop-style learning-rate modification, since in that direction we do have a local target value to train against.
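My reading of that, again only as a toy sketch (this is closer to target-prop / predictive coding than to whatever the paper actually implements): the top-down weights predict the lower layer's input, and that prediction error is a locally available target the lower layer can train against - no global backward sweep needed.

```python
import numpy as np

rng = np.random.default_rng(0)
W_up = rng.standard_normal((8, 16)) * 0.1    # bottom-up weights
W_down = rng.standard_normal((16, 8)) * 0.1  # top-down (generative) weights
lr = 0.01

for step in range(1000):
    x = rng.standard_normal(16)

    h = np.tanh(W_up @ x)              # bottom-up (feedforward) activity
    x_hat = W_down @ h                 # top-down prediction of the input

    # The top-down weights train on a purely local reconstruction error.
    e_x = x - x_hat
    W_down += lr * np.outer(e_x, h)

    # The top-down stream hands the lower layer a local target for h:
    # nudge h toward the activity that would have explained the input better.
    h_target = h + W_down.T @ e_x
    e_h = (h_target - h) * (1 - h**2)  # local error, scaled by tanh slope
    W_up += lr * np.outer(e_h, x)
```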