Does anybody have thoughts/experience with spatially pooling the output of a spatial pooler before passing it to Temporal Memory?
I think of it as akin to stacking multiple convolutions, as is done in traditional CNNs, where successive convolutions pare the encodings down to the most important distinguishing features.
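For concreteness, this is roughly what I mean, as a minimal sketch assuming the htm.core Python bindings (SpatialPooler, TemporalMemory, SDR); all dimensions and parameters are placeholders:

```python
# Sketch: feed one spatial pooler's output into a second SP before TM.
from htm.bindings.sdr import SDR
from htm.bindings.algorithms import SpatialPooler, TemporalMemory

enc = SDR((1024,))        # encoder output, filled elsewhere
sp1_out = SDR((2048,))
sp2_out = SDR((1024,))

sp1 = SpatialPooler(inputDimensions=enc.dimensions,
                    columnDimensions=sp1_out.dimensions,
                    globalInhibition=True)
sp2 = SpatialPooler(inputDimensions=sp1_out.dimensions,   # pools the pooled output
                    columnDimensions=sp2_out.dimensions,
                    globalInhibition=True)
tm = TemporalMemory(columnDimensions=sp2_out.dimensions, cellsPerColumn=16)

def step(encoding_sparse, learn=True):
    enc.sparse = encoding_sparse
    sp1.compute(enc, learn, sp1_out)      # first spatial pooling
    sp2.compute(sp1_out, learn, sp2_out)  # second spatial pooling
    tm.compute(sp2_out, learn)            # TM sees the twice-pooled SDR
```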
First, it would add depth (as in DNNs) rather than convolutions (as in CNNs), since CNNs are not only deep but also restrict each lower layer's input to a small "patch" of the input.
Second, without backprop or some other means of adjusting the lower layers' processing with feedback from the ones above, I would not expect too much from it.
But the experiment could be interesting, especially if you somehow manage to have many tiny, local low-level SPs and another large one on top of them, roughly like the sketch below.
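Here is what that topology might look like, again as a hypothetical htm.core sketch; the patch count and sizes are made up:

```python
# Sketch: many tiny, local SPs over input patches, one large SP on top.
from htm.bindings.sdr import SDR
from htm.bindings.algorithms import SpatialPooler

PATCHES, PATCH_IN, PATCH_OUT = 16, 64, 128   # hypothetical sizes

local_sps = [SpatialPooler(inputDimensions=(PATCH_IN,),
                           columnDimensions=(PATCH_OUT,),
                           globalInhibition=True)
             for _ in range(PATCHES)]
top_sp = SpatialPooler(inputDimensions=(PATCHES * PATCH_OUT,),
                       columnDimensions=(2048,),
                       globalInhibition=True)

def step(encoding, learn=True):    # encoding: SDR of PATCHES * PATCH_IN bits
    dense = encoding.dense
    active = []
    for i, sp in enumerate(local_sps):
        patch = SDR((PATCH_IN,))
        patch.dense = dense[i * PATCH_IN:(i + 1) * PATCH_IN]  # local "patch"
        out = SDR((PATCH_OUT,))
        sp.compute(patch, learn, out)          # tiny local pooling
        active.extend(i * PATCH_OUT + j for j in out.sparse)
    combined = SDR((PATCHES * PATCH_OUT,))
    combined.sparse = active                   # concatenated local outputs
    top_out = SDR((2048,))
    top_sp.compute(combined, learn, top_out)   # large SP on top of them all
    return top_out
```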
Reminds me of this idea with the “sandwich pooling” architecture…
But they don’t spatially pool the first spatial pooler’s output directly. There is an array called “Pooling Activations” in between, where the outputs of the first pooler are accumulated over several time steps. In addition, these pooling activations are allowed to decay, so that not too many bits are active at once when they enter the second pooler.
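If I read it right, that intermediate stage is essentially a leaky accumulator. A toy NumPy version, where the decay factor and the top-k cutoff are my own guesses rather than values from their work:

```python
import numpy as np

DECAY = 0.9       # hypothetical per-step decay factor
N_COLS = 2048     # width of the first pooler's output

pooling_activations = np.zeros(N_COLS)

def pool(sp1_active_columns, k=40):
    """Accumulate SP1 activity over time, let it decay, and return the
    k strongest bits as the input SDR for the second pooler."""
    global pooling_activations
    pooling_activations *= DECAY                     # old activity leaks away
    pooling_activations[sp1_active_columns] += 1.0   # new activity accumulates
    # keep only the top-k bits so the second pooler's input stays sparse
    top_k = np.argpartition(pooling_activations, -k)[-k:]
    return np.sort(top_k)
```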