I’ve been trying to improve the accuracy of my MNIST classification. The ~65% accuracy I got using the SP was a bit disappointing, especially since MNIST isn’t a hard problem in ML.
I found the problem was mostly in how I implemented the classifier. The classifier computes the overlap score of an input SDR against a stored reference SDR, but my old implementation did a poor job of maintaining the sparsity of that stored SDR. Fixing that immediately raised the classification accuracy to 87.15%, on par with the earliest neural networks. Surprisingly, the optimal hyperparameters also changed dramatically: instead of a boost strength of 0.1, the optimal boost strength is now a very high value like 9.
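To make the fix concrete, here is a rough sketch of the kind of overlap classifier I mean, in plain numpy. The class and method names are made up for illustration; this isn't Etaler's actual API, and the top-k trick shown here is just one way to keep the stored reference sparse.

```python
import numpy as np

class OverlapClassifier:
    """Illustrative overlap-based SDR classifier (hypothetical, not Etaler's API)."""

    def __init__(self, sdr_size: int, num_classes: int, active_bits: int):
        self.counts = np.zeros((num_classes, sdr_size))  # per-class bit activation counts
        self.active_bits = active_bits                   # target number of ON bits per reference SDR

    def learn(self, sdr: np.ndarray, label: int):
        # Accumulate how often each bit is active for this class.
        self.counts[label] += sdr

    def references(self) -> np.ndarray:
        # Keep only the top-k most frequently active bits per class, so the
        # stored reference SDRs stay as sparse as the inputs.
        refs = np.zeros_like(self.counts)
        top = np.argsort(self.counts, axis=1)[:, -self.active_bits:]
        np.put_along_axis(refs, top, 1, axis=1)
        return refs

    def infer(self, sdr: np.ndarray) -> int:
        # Predict the class whose reference SDR has the largest overlap with the input.
        return int(np.argmax(self.references() @ sdr))
```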
I think the next step to further improve performance would be building a proper vision encoder, but that is beyond my capability. Hopefully my result can inspire someone to look further into it.
No, that’s someone else using NuPIC’s SDRClassifier, which is internally a softmax regressor. I can only achieve 72% using a biologically plausible classifier with a 16384-bit SDR.
My apologies for the confusing naming. NuPIC/HTM.core’s SDRClassifier is softmax regression, but SDRClassifier in Etaler is the old CLAClassifier from way back.
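To spell out the difference: NuPIC’s version learns a weight matrix and applies a softmax over the SDR, roughly like the plain-numpy sketch below (illustrative only, not NuPIC/HTM.core’s actual implementation).

```python
import numpy as np

def softmax(x: np.ndarray) -> np.ndarray:
    e = np.exp(x - x.max())
    return e / e.sum()

class SoftmaxRegression:
    """Illustrative softmax-regression classifier over a binary SDR input."""

    def __init__(self, sdr_size: int, num_classes: int, lr: float = 0.1):
        self.weights = np.zeros((num_classes, sdr_size))
        self.lr = lr

    def infer(self, sdr: np.ndarray) -> np.ndarray:
        # Class probabilities from a linear projection of the SDR.
        return softmax(self.weights @ sdr)

    def learn(self, sdr: np.ndarray, label: int):
        # One gradient step on the cross-entropy loss.
        probs = self.infer(sdr)
        target = np.zeros_like(probs)
        target[label] = 1.0
        self.weights += self.lr * np.outer(target - probs, sdr)
```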
It would be interesting to relate your work to the Sparse-CNN models of @subutai and @lscheinkman in their paper “How Can We Be So Dense? The Benefits of Using Highly Sparse Representations”.
They modified their classical Spatial Pooler so it could compete with other CNN models on the MNIST benchmark:
Here we discuss a particular sparse network implementation that is designed to exploit Eq. 3. This implementation is an extension of our previous work on the HTM Spatial Pooler, a binary sparse coding algorithm that models sparse code generation in the neocortex (Hawkins et al., 2011; Cui et al., 2017). Specifically, we formulate a version of the Spatial Pooler that is designed to be a drop-in layer for neural networks trained with back-propagation. Our work is also closely related to previous literature on k-winner take all networks (Majani et al., 1989) and fixed sparsity networks (Makhzani & Frey, 2015).
They achieve near state-of-the-art results (around 99%) with their sparse implementation. Though it can be argued that their implementation is not biologically plausible, since they trained their network with backpropagation, unlike your approach (the point of their paper was more about the added value of sparse representations for noise robustness than about a biologically plausible training algorithm).
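For context, the core of their sparse layer is a k-winners-take-all activation that keeps only the k largest activations in each layer. A rough numpy sketch of that idea (illustrative only, not the authors’ actual implementation, which is a backprop-trained network):

```python
import numpy as np

def k_winners(x: np.ndarray, k: int) -> np.ndarray:
    """Keep the k largest activations and zero out the rest,
    enforcing a fixed sparsity on the layer's output."""
    out = np.zeros_like(x)
    top = np.argpartition(x, -k)[-k:]   # indices of the k largest values
    out[top] = x[top]
    return out

# Example: a 100-unit layer output restricted to 5% sparsity.
activations = np.random.randn(100)
sparse_activations = k_winners(activations, k=5)
print(np.count_nonzero(sparse_activations))  # -> 5
```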