Digit Image Recognition Using an Ensemble of One-Versus-All Deep Network Classifiers
In multiclass deep network classifiers, the burden of separating samples of all classes falls on a single classifier. As a result, optimal classification accuracy is often not reached, and training times are long because the CNN is trained on a single CPU/GPU. It is well known that ensembles of classifiers improve performance, and that training time can be reduced by running each member of the ensemble on a separate processor. Ensemble learning has long been used with traditional methods and remains an active research topic; with the advent of deep learning, it has been applied to deep networks as well. However, One-Versus-All (OVA) deep ensemble learning remains largely unexplored and has clear potential. In this paper we explore it and show that the classification capability of deep networks can be increased by using an ensemble of binary (OVA) deep network classifiers. In the proposed approach, a single OVA deep network classifier is dedicated to each category, and every network in the ensemble is trained with an OVA training scheme using the Stochastic Gradient Descent with Momentum Algorithm (SGDMA). To classify a test sample, the sample is presented to each network in the ensemble; after prediction-score voting, the class whose network produces the largest score is assigned to the sample. We implement and evaluate this novel technique on digit image recognition using the MNIST digit dataset, the USPS+ digit dataset, and the MATLAB digit image dataset. The proposed technique outperforms the baseline on digit image recognition for all three datasets.
💡 Research Summary
The paper addresses a fundamental limitation of conventional multiclass deep learning classifiers: a single network must learn to separate all classes simultaneously, which can lead to sub‑optimal decision boundaries, especially when classes are imbalanced or highly confusable. To overcome this, the authors propose an ensemble of One‑Versus‑All (OVA) deep networks, where each of the N digit classes is assigned its own binary classifier (“class i vs. the rest”). Each OVA network is trained independently using stochastic gradient descent with momentum (SGDM) and a binary cross‑entropy loss, allowing it to focus on features that are most discriminative for its target class.
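The per-class training scheme described above can be sketched in code. This is a minimal illustration, assuming a single-layer logistic model on toy feature vectors rather than the paper's convolutional networks; the function names (`ova_labels`, `train_ova_sgdm`, `score`) are hypothetical. It shows the two essential ingredients: relabeling the data as "class *i* vs. the rest", and the SGD-with-momentum update driven by the binary cross-entropy gradient.

```python
import math

def ova_labels(labels, target_class):
    """Relabel a multiclass label list for the 'target_class vs. rest' task."""
    return [1.0 if y == target_class else 0.0 for y in labels]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def bce_grad(w, b, x, y):
    """Per-sample gradient of binary cross-entropy for a logistic model.

    With a sigmoid output, dL/dz simplifies to (p - y)."""
    p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = p - y
    return [err * xi for xi in x], err

def train_ova_sgdm(xs, ys, target_class, lr=0.1, momentum=0.9, epochs=300):
    """Train one binary 'target_class vs. rest' expert with SGD + momentum."""
    bin_ys = ova_labels(ys, target_class)
    dim = len(xs[0])
    w, b = [0.0] * dim, 0.0
    vw, vb = [0.0] * dim, 0.0            # momentum (velocity) buffers
    for _ in range(epochs):
        for x, y in zip(xs, bin_ys):
            gw, gb = bce_grad(w, b, x, y)
            vw = [momentum * v - lr * g for v, g in zip(vw, gw)]
            vb = momentum * vb - lr * gb
            w = [wi + vi for wi, vi in zip(w, vw)]
            b += vb
    return w, b

def score(w, b, x):
    """The expert's confidence that x belongs to its positive class."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
```

In the paper the same update rule is applied to CNN weights; only the model and feature extractor differ.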
All OVA networks share the same architecture (a modest convolutional backbone followed by fully‑connected layers) and hyper‑parameters, but they differ in weight initialization and mini‑batch sampling. This diversity encourages each model to converge to a different local optimum, which is essential for a robust ensemble. Crucially, the training of each network is performed in parallel on separate CPU/GPU resources, dramatically reducing overall training time. In the authors’ experiments, four GPUs were used to train ten OVA models concurrently, achieving roughly a 30 % reduction in wall‑clock time compared to sequential training of a single model.
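Because the OVA experts are fully independent, dispatching them to separate workers is straightforward. The sketch below uses Python's `concurrent.futures` with a stand-in training function; in the paper's setting each job would be a full CNN training run pinned to its own CPU/GPU, and real speedups require process- or device-level parallelism rather than threads. The names (`train_expert`, `train_ensemble`) are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor

def train_expert(target_class):
    """Stand-in for training one OVA network; returns (class, model).

    In the paper each expert is a CNN trained with SGDM on its own
    processor; here we return a tag so the dispatch pattern is visible.
    """
    return target_class, f"model-for-class-{target_class}"

def train_ensemble(num_classes, max_workers=4):
    """Launch one independent training job per class and collect results."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(train_expert, range(num_classes)))
    return dict(results)
```

With `max_workers=4` and ten digit classes, this mirrors the authors' setup of ten OVA models scheduled across four devices.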
During inference, a test image is fed to every OVA network. Each network outputs a confidence score (logit or softmax probability) for its positive class. The final prediction is obtained by a simple “max‑score voting” rule: the class whose network yields the highest score is selected. This approach bypasses the need for complex meta‑learners or probability calibration, while still leveraging the confidence of each binary expert.
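The max-score voting rule itself is a one-liner once every expert has scored the sample. A minimal sketch, assuming each expert is a callable that returns its positive-class confidence (the `classify` helper is hypothetical):

```python
def classify(experts, x):
    """Feed x to every binary OVA expert and apply max-score voting.

    experts: dict mapping class label -> callable returning that expert's
    confidence (logit or probability) that x belongs to its positive class.
    Returns the winning label and the full score table.
    """
    scores = {label: net(x) for label, net in experts.items()}
    return max(scores, key=scores.get), scores
```

Note that raw scores from independently trained binary models are compared directly; as the summary notes, no calibration or meta-learner is used.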
The method was evaluated on three widely used handwritten digit datasets: MNIST, USPS+, and a MATLAB digit image collection. Compared with baseline single‑network CNNs (e.g., LeNet‑5 and a standard shallow CNN), the OVA ensemble consistently achieved higher accuracy. On MNIST the improvement was modest (99.45 % vs. 99.20 %), but on the more challenging USPS+ and MATLAB datasets the gains were more pronounced (approximately 0.4–0.5 % absolute increase). The authors attribute the larger improvements on these datasets to the greater variability in image resolution and writing style, which benefits from class‑specific feature learning.
Key contributions of the work are:
- Demonstrating that OVA training can be effectively applied to deep convolutional networks, enabling each classifier to specialize in a single digit.
- Showing that an ensemble of such OVA networks yields both higher classification accuracy and reduced training time when parallel hardware is available.
- Introducing a straightforward max‑score voting scheme that combines the binary experts without additional training overhead.
- Providing empirical evidence that the approach generalizes across multiple digit datasets and suggesting its applicability to other domains with ambiguous class boundaries (e.g., medical imaging, speech recognition).
In summary, the paper presents a practical and scalable strategy for improving multiclass digit recognition by decomposing the problem into parallel binary tasks and aggregating their predictions. The combination of class‑specific deep learning, ensemble diversity, and parallel computation offers a compelling alternative to monolithic multiclass networks, and it opens avenues for further research into deeper architectures, alternative voting mechanisms, and cross‑domain extensions.