NIST SD19 (alphabetical version of MNIST digits)

davidra · February 17, 2017, 2:34am

Hi all

I know that a lot of people here have tried digit classification using the famous MNIST dataset. But HTM is really about sequences, and sequences of digits are hard to make sense of. Sequences of letters are much easier for a human to appreciate!

So we have written some code to preprocess the widely studied NIST SD19 dataset of handwritten uppercase characters A…Z into the same image format and resolution as the MNIST dataset of digits 0…9. If you combine MNIST + NIST SD19 you get an alphanumeric dataset.
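To give a rough idea of what "same image format and resolution" means, here is a minimal sketch in Java. It is not the code from our repository: the class and method names are made up for illustration, and it assumes the source glyph has dark ink on a light background and that the target is a 28x28 8-bit grayscale image with bright ink on a dark background, as in MNIST. It also skips the bounding-box normalization and centering that MNIST applies, so treat it only as an outline of the scaling and inversion step.

```java
import java.awt.Graphics2D;
import java.awt.RenderingHints;
import java.awt.image.BufferedImage;
import java.awt.image.WritableRaster;

// Hypothetical helper, not part of the actual preprocessing code.
class Sd19ToMnistSketch {
    // Assumed MNIST image side length in pixels.
    static final int MNIST_SIZE = 28;

    /** Scales a glyph image to 28x28 grayscale and inverts it so that ink becomes bright. */
    static BufferedImage toMnistFormat(BufferedImage src) {
        BufferedImage out = new BufferedImage(
                MNIST_SIZE, MNIST_SIZE, BufferedImage.TYPE_BYTE_GRAY);

        // Downscale with bilinear interpolation to soften the binary edges.
        Graphics2D g = out.createGraphics();
        g.setRenderingHint(RenderingHints.KEY_INTERPOLATION,
                RenderingHints.VALUE_INTERPOLATION_BILINEAR);
        g.drawImage(src, 0, 0, MNIST_SIZE, MNIST_SIZE, null);
        g.dispose();

        // Invert intensities: dark ink on light paper -> bright ink on dark background.
        WritableRaster raster = out.getRaster();
        for (int y = 0; y < MNIST_SIZE; y++) {
            for (int x = 0; x < MNIST_SIZE; x++) {
                int v = raster.getSample(x, y, 0);
                raster.setSample(x, y, 0, 255 - v);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // Synthetic stand-in for a source image (all pixels 0); real input would be a loaded glyph.
        BufferedImage src = new BufferedImage(128, 128, BufferedImage.TYPE_BYTE_GRAY);
        BufferedImage out = toMnistFormat(src);
        System.out.println(out.getWidth() + "x" + out.getHeight());
    }
}
```

In practice you would load each character image with `javax.imageio.ImageIO.read` and write the result out in whatever container your MNIST loader expects.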

So now you can feed text whose individual characters are handwritten, and therefore uncertain, into your NuPIC/HTM engines and see how they predict sentences etc. Your existing code using MNIST digit encoding will work without any changes.
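If you do combine the two datasets, you need some convention for the class labels. The sketch below shows one possible scheme, which is purely an assumption for illustration and not something the preprocessing code prescribes: digits keep their MNIST labels 0..9 and letters A..Z take 10..35. The class and method names are hypothetical. It turns a piece of text into the sequence of class labels for which you would then pick handwritten samples.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical label scheme for a combined MNIST + NIST SD19 dataset.
class AlphanumericLabels {
    static final int DIGIT_CLASSES = 10;   // 0..9, same as MNIST
    static final int LETTER_CLASSES = 26;  // A..Z, assumed to follow the digits

    /** Returns the class label for a character, or -1 if it is not in the dataset. */
    static int toClassIndex(char c) {
        if (c >= '0' && c <= '9') {
            return c - '0';
        }
        if (c >= 'a' && c <= 'z') {
            c = (char) (c - 'a' + 'A'); // the dataset only has uppercase letters
        }
        if (c >= 'A' && c <= 'Z') {
            return DIGIT_CLASSES + (c - 'A');
        }
        return -1;
    }

    /** Encodes text as a label sequence, skipping spaces and unsupported characters. */
    static int[] encode(String text) {
        List<Integer> labels = new ArrayList<>();
        for (char c : text.toCharArray()) {
            int label = toClassIndex(c);
            if (label >= 0) {
                labels.add(label);
            }
        }
        int[] result = new int[labels.size()];
        for (int i = 0; i < result.length; i++) {
            result[i] = labels.get(i);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(encode("HTM 2")));
    }
}
```

For each label in the sequence you would draw a random image of that class, so the same sentence produces a different stream of handwritten characters every time it is presented.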

Java code here to preprocess the images:

See the README for info on where to get the NIST data (they recently made it free to download).
