A while back I implemented the HTM spatial pooler and tested it on MNIST (I don’t have access to the code right now). I used 1024 hidden units with a top-32 WTA activation scheme, and I also implemented boosting. It was pretty textbook and followed the description of the spatial pooler in the YouTube “HTM School” series almost exactly.
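For reference, the core of it was roughly this (a sketch reconstructed from memory since I don’t have the code handy; the exact boosting formula, constants, and names are approximations):

```python
import numpy as np

class SpatialPooler:
    # Rough sketch: 1024 columns, top-32 winner-take-all, Hebbian
    # permanence updates, and duty-cycle-based boosting.
    def __init__(self, n_inputs=784, n_columns=1024, k=32,
                 perm_threshold=0.5, perm_inc=0.03, perm_dec=0.015,
                 boost_strength=2.0, duty_alpha=0.01, seed=0):
        rng = np.random.default_rng(seed)
        # Permanences; synapses with permanence >= threshold count as connected.
        self.perm = rng.uniform(0.3, 0.7, size=(n_columns, n_inputs))
        self.k = k
        self.perm_threshold = perm_threshold
        self.perm_inc = perm_inc
        self.perm_dec = perm_dec
        self.boost_strength = boost_strength
        self.duty_alpha = duty_alpha
        # Running estimate of how often each column wins.
        self.duty_cycle = np.full(n_columns, k / n_columns)

    def step(self, x, learn=True):
        # x: binary input vector of shape (n_inputs,); returns winning column indices.
        connected = (self.perm >= self.perm_threshold).astype(float)
        overlap = connected @ x
        # Boost under-active columns so every column gets used (homeostasis).
        target = self.k / self.perm.shape[0]
        boost = np.exp(self.boost_strength * (target - self.duty_cycle))
        winners = np.argsort(boost * overlap)[-self.k:]
        if learn:
            # Hebbian update on winners only: strengthen synapses to active
            # input bits, weaken synapses to inactive bits.
            self.perm[winners] += np.where(x > 0, self.perm_inc, -self.perm_dec)
            np.clip(self.perm, 0.0, 1.0, out=self.perm)
            active = np.zeros(self.perm.shape[0])
            active[winners] = 1.0
            self.duty_cycle = ((1 - self.duty_alpha) * self.duty_cycle
                               + self.duty_alpha * active)
        return winners
```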
Anyway, I found that most features ended up being simple prototypes: the prototypes most similar to the input were the ones that activated. It seems that, if the network has high enough capacity, it prefers to memorize prototypes rather than learn a factorized representation.
My theory is that this behavior is due to the lack of a negative/contrastive learning phase (as in restricted Boltzmann machines) or any kind of error-driven learning. So I hacked in a simple contrastive learning term, and indeed, the features became much more diverse, and the representations became more factorized and less redundant.
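The contrastive term was along these lines (again approximate, since the actual decrement and how I formed the negative pattern may have differed), basically a CD-1-style negative phase bolted onto the sketch above:

```python
def contrastive_step(sp, x, neg_dec=0.01):
    # Positive phase: ordinary WTA + Hebbian update toward the real input.
    winners = sp.step(x, learn=True)
    # Negative phase (sketch): reconstruct the input from the winners'
    # connected synapses, then push those same columns away from their own
    # reconstruction, so they can't just sit on memorized prototypes.
    connected = sp.perm[winners] >= sp.perm_threshold
    x_neg = (connected.sum(axis=0) > 0).astype(float)
    sp.perm[winners] -= np.where(x_neg > 0, neg_dec, 0.0)
    np.clip(sp.perm, 0.0, 1.0, out=sp.perm)
    return winners
```

The intuition is that a column gets penalized for whatever it reconstructs on its own, so columns that converge on the same prototype end up competing and are forced to diversify.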
I suspect that whitening the input data, as an alternative to contrastive learning, would also help, although since the HTM spatial pooler uses binary inputs, I’m not sure how that would work.
What are your thoughts?