Comparison and Combination of State-of-the-art Techniques for Handwritten Character Recognition: Topping the MNIST Benchmark

Reading time: 5 minutes

📝 Original Info

  • Title: Comparison and Combination of State-of-the-art Techniques for Handwritten Character Recognition: Topping the MNIST Benchmark
  • ArXiv ID: 0710.2231
  • Date: 2007-10-12
  • Authors: Daniel Keysers (as listed in the original paper)

📝 Abstract

Although the recognition of isolated handwritten digits has been a research topic for many years, it continues to be of interest for the research community and for commercial applications. We show that despite the maturity of the field, different approaches still deliver results that vary enough to allow improvements by using their combination. We do so by choosing four well-motivated state-of-the-art recognition systems for which results on the standard MNIST benchmark are available. When comparing the errors made, we observe that they differ between all four systems, suggesting the use of classifier combination. We then determine the error rate of a hypothetical system that combines the output of the four systems. The result obtained in this manner is an error rate of 0.35% on the MNIST data, the best result published so far. We furthermore discuss the statistical significance of the combined result and of the results of the individual classifiers.


📄 Full Content

The recognition of handwritten digits is a topic of practical importance because of applications like automated form reading and handwritten zip-code processing. It is also a subject that has continued to produce much research effort over the last decades for several reasons:

• The problem is prototypical for image processing and pattern recognition, with a small number of classes.
• Standard benchmark data sets exist that make it easy to obtain valid results quickly.
• Many publications and techniques are available that can be cited and built on, respectively.
• The practical applications motivate the research performed.
• Improvements in classification accuracy over existing techniques continue to be obtained using new approaches.

The objective of this paper is to analyze four state-of-the-art methods for the recognition of handwritten digits [3,9,15,26] by comparing the errors made on the standard MNIST benchmark data. (A part of this work has been described in [13].) We perform a statistical analysis of the errors using a bootstrapping technique [5] that not only uses the error count but also takes into account which errors were made. Using this technique we can determine more accurate estimates of the statistical significance of improvements.
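
To make the idea concrete, here is a minimal sketch of this kind of bootstrap comparison, assuming boolean per-pattern error vectors for two classifiers. The function and variable names are ours, and the paper's actual procedure may differ in detail.

```python
import numpy as np

def p_a_beats_b(errors_a, errors_b, n_boot=10_000, seed=0):
    """Bootstrap estimate of the probability that classifier A truly
    outperforms classifier B on the same test set.

    errors_a, errors_b: boolean arrays with one entry per test pattern,
    True where that classifier misclassified the pattern. Resampling
    whole test patterns keeps track of *which* errors coincide, not
    just how many errors each system makes.
    """
    errors_a = np.asarray(errors_a, dtype=bool)
    errors_b = np.asarray(errors_b, dtype=bool)
    rng = np.random.default_rng(seed)
    n = len(errors_a)
    wins = 0
    for _ in range(n_boot):
        idx = rng.integers(0, n, size=n)   # resample the test set with replacement
        if errors_a[idx].sum() < errors_b[idx].sum():
            wins += 1
    return wins / n_boot

# Toy usage: two synthetic error vectors over a 10,000-pattern test set.
rng = np.random.default_rng(1)
err_a = rng.random(10_000) < 0.0035   # ~0.35% error rate
err_b = rng.random(10_000) < 0.0063   # ~0.63% error rate
print(p_a_beats_b(err_a, err_b))      # close to 1.0 for a gap this large
```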

When analyzing the errors made we observe that, although the error rates obtained are all very similar, there are substantial differences in which patterns are classified erroneously. This can be interpreted as an indicator for using classifier combination. An experiment shows that a combination of the classifiers indeed performs better than the single best classifier. The statistical analysis shows that the probability that this result constitutes a real improvement, and is not based on chance alone, is 94%.
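
The simplest combination scheme consistent with this observation is a plurality vote over the four outputs. The sketch below is our illustration of that idea, not necessarily the combination rule used in the paper; ties are broken by falling back to the first classifier.

```python
from collections import Counter

def combine_by_voting(predictions):
    """Combine per-classifier digit predictions by plurality vote.

    predictions: one list of predicted labels per classifier, e.g. four
    lists of 10,000 digits each. When all classifiers disagree, we fall
    back to the first classifier's output (an arbitrary convention).
    """
    combined = []
    for labels in zip(*predictions):
        label, votes = Counter(labels).most_common(1)[0]
        combined.append(label if votes > 1 else labels[0])
    return combined

# Example: for the first pattern, three of four systems say '7'.
print(combine_by_voting([[7, 1], [7, 1], [7, 2], [9, 1]]))  # -> [7, 1]
```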

This paper is of course only possible because the results of the four chosen base methods [3,9,15,26] were available. These approaches are presented in more detail in Section 4. We are aware that there exist other methods that also achieve very good classification error rates on the data used, e.g. [18]. However, we feel that the four methods chosen comprise a set of well-motivated and self-contained approaches. Furthermore, they represent the classification methods most commonly used in the research literature, that is, the nearest neighbor classifier, neural networks, and the support vector machine. All four methods use the appearance-based paradigm in the broad sense and can thus be considered sufficiently general to be applied to other object recognition tasks.

There is a large amount of work available on the topic of classifier combination as well (an introduction can be found e.g. in [16]) and much work exists on applying classifier combination to handwriting recognition (e.g. [4,7,8,12]). Note that we do not propose new algorithms for classification of handwritten digits or for the combination of classifiers. Instead, our contribution is to present a statistical analysis that compares different classifiers and to show that their combination improves the performance even though the individual classifiers all reach state-of-the-art error rates by themselves.

The modified NIST handwritten digit database (MNIST, [17]) contains 60,000 images in the training set and 10,000 patterns in the test set, each of size 28×28 pixels with 256 gray levels. The data set is available online, and some examples from the MNIST corpus are shown in Figure 1.
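
For readers who want to work with the same data, a minimal loader for the IDX files distributed on the MNIST page might look as follows. This is a sketch of ours assuming the standard gzipped file names, not code from the paper.

```python
import gzip
import struct
import numpy as np

def load_idx_images(path):
    """Read an IDX image file, e.g. 'train-images-idx3-ubyte.gz'."""
    with gzip.open(path, "rb") as f:
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX image file"
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows, cols)

def load_idx_labels(path):
    """Read an IDX label file, e.g. 'train-labels-idx1-ubyte.gz'."""
    with gzip.open(path, "rb") as f:
        magic, n = struct.unpack(">II", f.read(8))
        assert magic == 2049, "not an IDX label file"
        return np.frombuffer(f.read(), dtype=np.uint8)

# images = load_idx_images("train-images-idx3-ubyte.gz")   # (60000, 28, 28)
# labels = load_idx_labels("train-labels-idx1-ubyte.gz")   # (60000,)
```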

The preprocessing of the images is described as follows in [17]: “The original black and white (bilevel) images were size normalized to fit in a 20×20 pixel box while preserving their aspect ratio. The resulting images contain gray levels as a result of the antialiasing (image interpolation) technique used by the normalization algorithm. […] the images were centered in a 28×28 image by computing the center of mass of the pixels and translating the image so as to position this point at the center of the 28×28 field.” Note that some authors use a ‘deslanted’ version of the database.
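
The described normalization can be approximated in a few lines. The sketch below assumes SciPy and is our reconstruction of the quoted procedure, not the original NIST code:

```python
import numpy as np
from scipy import ndimage

def mnist_style_normalize(binary_img):
    """Apply the preprocessing described above to a bilevel image:
    fit the ink into a 20x20 box preserving aspect ratio, then center
    it by center of mass inside a 28x28 field. A sketch, not the
    original NIST normalization code."""
    # Crop to the bounding box of the foreground pixels.
    ys, xs = np.nonzero(binary_img)
    crop = binary_img[ys.min():ys.max() + 1,
                      xs.min():xs.max() + 1].astype(float)

    # Scale so the longer side becomes 20 pixels; the interpolation
    # introduces the gray levels mentioned in the quotation.
    small = ndimage.zoom(crop, 20.0 / max(crop.shape), order=1)

    # Paste into a 28x28 canvas and shift the center of mass to the
    # geometric center (13.5, 13.5).
    canvas = np.zeros((28, 28))
    oy = (28 - small.shape[0]) // 2
    ox = (28 - small.shape[1]) // 2
    canvas[oy:oy + small.shape[0], ox:ox + small.shape[1]] = small
    cy, cx = ndimage.center_of_mass(canvas)
    return ndimage.shift(canvas, (13.5 - cy, 13.5 - cx), order=1)
```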

The task is generally not considered a ‘difficult’ recognition task (in the sense that absolute error rates are high) for two reasons. First, the human error rate is estimated to be only about 0.2%, although it has not been determined for the whole test set [27]. Second, the large training set allows machine learning algorithms to generalize well. With respect to the connection between training set size and classification performance for OCR tasks, it is argued in [28] that increasing the training set size by a factor of ten cuts the error rate approximately in half.
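
Taken literally, this rule of thumb implies a power-law dependence of the error rate E on the training set size n (our extrapolation from the statement above, not a formula from the paper):

$$E(n) = c\,n^{-\alpha}, \qquad E(10n) = \tfrac{1}{2}\,E(n) \;\Rightarrow\; 10^{-\alpha} = \tfrac{1}{2} \;\Rightarrow\; \alpha = \log_{10} 2 \approx 0.301.$$

Under this model, growing MNIST’s 60,000 training images to 600,000 would be expected to roughly halve a given error rate, e.g. from 0.7% to about 0.35%.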

Table 1 gives a comprehensive overview of the error rates reported for the MNIST data. One disadvantage of the MNIST corpus is that there exists no development test set, which leads to effects known as ‘training on the testing data’. This is not necessarily true for each of the research groups performing experiments, but across the community as a whole, repeated tuning against the same test set means the published error rates are likely to be somewhat optimistic.
