From one of the authors of A Mathematical Formalization of Hierarchical Temporal Memory's Spatial Pooler.
To put this into perspective, 91% on MNIST is an extremely bad result. In fact, just last night I was testing a new architecture from a recent paper, and I couldn't get MNIST accuracy above 97.5%. That was pretty much a show-stopper. An acceptable result would be at least 99%.
MNIST recognition ain't precisely a good fit for HTM, but we didn't need a neuromorphic architecture to know that.
So, what I see as interesting here is the 1364x speedup. However, we poor implementers-in-a-garage won't benefit from such things for a while :x
Pardon the newbie question, but can you explain why you think MNIST is not a good fit for HTM? Are you aware of anyone building an SDR encoder for image learning/classification in general, perhaps using ImageNet or CIFAR datasets?
From what I can find, the human error rate for classifying MNIST is 2-2.5%. Can you explain why you think that an HTM version that achieves human-level accuracy is a show-stopper? Isn't it just as likely that for a simple dataset like MNIST the neural network solutions are overfitting?
Current HTM theory is built around a single level that deals with temporal sequences and can detect unexpected elements once a sequence has started to match.
Recognition of objects, on the other hand, would in my view involve a hierarchical model. Neural networks that achieve very high MNIST classification rates are typically deep (many layers in a hierarchy). Even assuming the problem of encoding a visual image as binary SDRs has been solved, a single "spatial pooler" would only get you so far, so an HTM-based solution to image recognition would also need a hierarchy.
Some people here are trying to tweak the current "single-level" HTM into a model supporting a hierarchy, but this is not part of the current Numenta implementation and is, as far as I know, an open problem.
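To make the stacking idea concrete, here's a toy numpy sketch of chaining two pooler-like stages. This is not Numenta's implementation: the overlap-plus-k-winners rule below is a bare-bones stand-in, and learning, boosting, and topology are all omitted.

```python
import numpy as np

def toy_spatial_pooler(input_sdr, connections, n_active):
    """One spatial-pooler-like step: each mini-column scores its overlap
    with the input SDR, and the top n_active columns win (a crude
    global-inhibition / k-winners-take-all rule)."""
    overlaps = connections @ input_sdr            # overlap per column
    winners = np.argsort(overlaps)[-n_active:]    # k winners take all
    output_sdr = np.zeros(connections.shape[0], dtype=np.int64)
    output_sdr[winners] = 1
    return output_sdr

rng = np.random.default_rng(0)
input_sdr = (rng.random(1024) < 0.02).astype(np.int64)  # sparse binary input

# Random binary "connected synapse" matrices for two levels.
w1 = (rng.random((2048, 1024)) < 0.5).astype(np.int64)
w2 = (rng.random((2048, 2048)) < 0.5).astype(np.int64)

# Because a pooler's output is itself an SDR, levels chain naturally.
level1 = toy_spatial_pooler(input_sdr, w1, n_active=40)
level2 = toy_spatial_pooler(level1, w2, n_active=40)
```

The plumbing is easy; the open question is what the upper levels should learn and how feedback between them should work.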
Thanks for the explanation. I remember that hierarchy was discussed in the HTM videos, and since the output of the pooler is another SDR, it seems that a hierarchical system is certainly possible. So are you saying simply that Numenta hasn't *yet* implemented a hierarchical solution in code, or that there's a more fundamental part of the model that is missing (and thus preventing implementation)?
Perhaps @rhyolight has more to add to this.
I believe that human result came from this paper, and it was not done on MNIST but on the smaller USPS dataset, which was considered harder than NIST. So it's not really clear what human performance on MNIST would be, but I'd say there are only a few dozen truly unrecognizable digits (to a human). In any case, HTM is at 91%, so it's not even close to the 97.5% "human performance".
I'm not sure you can say "it's overfitting": it's tested on a previously unseen test set. Actually, I agree with others who say that HTM is poorly suited to image recognition. It's just not very impressive to see these results published as an achievement.
If some of the digits are unrecognizable to a human, then providing them with a label is basically arbitrary.
A human should have practically perfect performance on something like MNIST, and whatever errors there are would be due to ambiguity; such images should really be classified as both of the digits they ambiguously represent. Arbitrarily disambiguating and then penalizing based on that arbitrary labelling is nonsense.
Couldn't agree with you more. I mean, try categorising these guys: http://neuralnetworksanddeeplearning.com/images/mnist_really_bad_images.png
Some of them are quite easy (at least for me) to categorize! It didn't take me more than 5 seconds to categorize each of them. (Note: normally you would classify a "decently" written digit in under a second.) Nonetheless, I agree there are ambiguous pictures. Furthermore, we aren't able to categorize some of them because they really aren't digits anymore (to us).
Wow, you're good! I don't think I could do any. I haven't seen the labels for them, but just looking at them, it would be pure guesswork for me.
Then you probably haven't seen as many handwritten digits as I have. Anyway, again, some of them could be, say, a 4 or a 1, but this is actually a common case!
Sure, but you can judge the performance of your model by looking at the misclassified images. Ignore all images that are unrecognizable or mislabeled, and make sure there are no easy examples the model has missed. My point was that if a network gets 250 examples wrong (which is what 97.5% accuracy means on the 10,000-image test set), then it definitely got many easy examples wrong.
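If you want to do that inspection yourself, here is a minimal sketch; `y_true`, `y_pred`, and `x_test` are placeholders for your own test labels, model predictions, and test images, not names from any particular library.

```python
import numpy as np

def misclassified(y_true, y_pred, x_test):
    """Collect the test images the model got wrong, with their true and
    predicted labels, so a human can eyeball whether the errors are
    genuinely ambiguous digits or easy ones the model missed."""
    wrong = np.flatnonzero(y_true != y_pred)
    accuracy = 1.0 - wrong.size / y_true.size
    print(f"{wrong.size} errors -> {100 * accuracy:.2f}% accuracy")
    return [(x_test[i], int(y_true[i]), int(y_pred[i])) for i in wrong]
```

At 97.5% accuracy on MNIST's 10,000 test images this returns 250 entries; scanning them by hand makes it obvious how many are clean digits rather than truly ambiguous ones.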
To provide some more perspective, state of the art on CIFAR-10 is about 98.5% (CIFAR is a much harder dataset), and ~85% on ImageNet (which has 1000 classes, rather than 10).
True, it helps judge performance, but a human eye on the errors made is also welcome.
As for other models, many DNN models are based on decades of research and build on top of that.
An entirely novel architecture may not necessarily fare as well on its first outings.
91% classification accuracy on the MNIST benchmark is still not acceptable in the machine learning community, but that figure is for an HTM network with only 100 mini-columns. Increasing the number of mini-columns to 2048 can bring the accuracy up to ~95-96%.
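For anyone who wants to try that scaling, here is a hedged sketch using nupic's `SpatialPooler`. The exact parameters behind the 95-96% figure aren't given here, so the values below are illustrative rather than the reported setup, and the import path may differ between nupic versions.

```python
import numpy as np
from nupic.algorithms.spatial_pooler import SpatialPooler

sp = SpatialPooler(
    inputDimensions=(28 * 28,),     # a binarized MNIST image as input
    columnDimensions=(2048,),       # 2048 mini-columns instead of 100
    potentialRadius=28 * 28,        # every column can see the whole image
    globalInhibition=True,
    numActiveColumnsPerInhArea=40,  # ~2% output sparsity
)

image = (np.random.rand(28 * 28) > 0.9).astype(np.uint32)  # stand-in input
active = np.zeros(2048, dtype=np.uint32)
sp.compute(image, True, active)     # learn=True; winning columns -> `active`
```

The active columns would then feed a simple classifier (e.g. SDR overlap or k-NN) to produce the digit prediction.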
As others have said, the accuracy is not the interesting part of the story. The interesting part is the neuromorphic architecture.