Question on MNIST data set

Is there a version that reports the error rate when the system correctly identifies the poorly formed or ambiguous characters and sets them aside?
For that matter, is there any work where identifying these bad characters is itself the goal?

When you say “system” here, what are you referring to?

At least in deep learning, while the usual aim is to take the highest-certainty result and return it, nothing stops you from selecting a medium- or low-certainty result instead, or from doing some analysis on the output certainties.

In fact, in one of my projects (a chatbot), I’m specifically selecting and logging those “uncertain” or ambiguous responses as an important function that keeps humans in the loop, and helps with ongoing training and feedback.

The basic algorithm would be:

  1. Feed input data.

  2. Get the result.

  3. Check probability/certainty score, while noting the known answer.
    a. If the score falls within a chosen range, such as 30-80% certainty, log it.

  4. Post-process the log to find which digits appear most frequently among the “ambiguous” data.
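
As a rough illustration, the steps above could be sketched in Python like this. It assumes the model already gives you one list of class probabilities per sample (for example a softmax output); the function names and the 30-80% band are just placeholders for this example, not part of any library.

```python
from collections import Counter

def log_ambiguous(probabilities, labels, low=0.30, high=0.80):
    """Collect samples whose top certainty falls inside the ambiguous band.

    probabilities: one list of class probabilities per sample (assumed input).
    labels: the known answer for each sample.
    low, high: hypothetical band limits; tune them for your own model.
    """
    ambiguous = []
    for index, (probs, known) in enumerate(zip(probabilities, labels)):
        # Steps 1-2: the "result" is the class with the highest probability.
        predicted = max(range(len(probs)), key=probs.__getitem__)
        certainty = probs[predicted]
        # Step 3: log the sample if its certainty is inside the band.
        if low <= certainty <= high:
            ambiguous.append((index, predicted, known, certainty))
    return ambiguous

def most_frequent_ambiguous(ambiguous):
    """Step 4: count how often each known digit shows up in the log."""
    return Counter(known for _, _, known, _ in ambiguous).most_common()
```

Calling `most_frequent_ambiguous(log_ambiguous(probabilities, labels))` then gives a list of (digit, count) pairs, most frequent first, which is one way to see which digits the model finds hardest to call.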

By system I mean the entire process of preparing the data set, training and testing.

I have read image recognition papers which reported two classification accuracies:

  1. Whole dataset accuracy, uses all inputs
  2. High confidence accuracy, which rejects uncertain classifications. I think they also report the fraction of samples which were rejected. I honestly can’t remember any college statistics any more so I can’t elaborate or help calculate.
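
For what it's worth, the calculation itself is simple. Here is a minimal sketch, assuming you have per-sample class probabilities and the known labels; the function name and the 0.80 rejection threshold are made up for illustration and are not taken from any particular paper.

```python
def accuracy_with_rejection(probabilities, labels, threshold=0.80):
    """Return whole-dataset accuracy, high-confidence accuracy and rejected fraction.

    A sample is rejected when its top probability is below `threshold`
    (a hypothetical cut-off chosen for this example).
    """
    total = len(labels)
    if total == 0:
        raise ValueError("need at least one sample")

    correct_all = 0   # correct predictions over every input
    kept = 0          # samples that pass the confidence threshold
    correct_kept = 0  # correct predictions among the kept samples

    for probs, known in zip(probabilities, labels):
        predicted = max(range(len(probs)), key=probs.__getitem__)
        is_correct = predicted == known
        correct_all += is_correct
        if probs[predicted] >= threshold:
            kept += 1
            correct_kept += is_correct

    return {
        "whole_accuracy": correct_all / total,
        # Undefined if every sample was rejected, so report None in that case.
        "high_confidence_accuracy": correct_kept / kept if kept else None,
        "rejected_fraction": (total - kept) / total,
    }
```

Raising the threshold usually trades a larger rejected fraction for a higher accuracy on what is kept, which is why the rejected fraction needs to be reported alongside the high-confidence accuracy.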