Handwritten Bangla Basic and Compound character recognition using MLP and SVM classifier
A novel approach for recognition of handwritten compound Bangla characters, along with the Basic characters of the Bangla alphabet, is presented here. Compared to English-like Roman script, one of the major stumbling blocks in Optical Character Recognition (OCR) of handwritten Bangla script is the large number of complex-shaped character classes in the Bangla alphabet. In addition to 50 basic character classes, there are nearly 160 complex-shaped compound character classes in the Bangla alphabet. Dealing with such a large variety of handwritten characters with a suitably designed feature set is a challenging problem. Uncertainty and imprecision are inherent in handwritten script. Moreover, such a large variety of complex-shaped characters, some of which closely resemble one another, makes the problem of OCR of handwritten Bangla characters more difficult. Considering the complexity of the problem, the present approach attempts to identify compound character classes from the most frequently to the less frequently occurring ones, i.e., in order of importance. This is to develop a framework for incrementally increasing the number of learned classes of compound characters, from more frequently occurring ones to less frequently occurring ones, along with Basic characters. On experimentation, the technique is observed to produce an average recognition rate of 79.25% after three-fold cross-validation of the data, with future scope for improvement and extension.
💡 Research Summary
The paper tackles the challenging problem of handwritten Bangla (Bengali) character recognition, which includes 50 basic characters and roughly 160 compound characters, amounting to about 210 distinct classes. The authors argue that the large number of visually similar, complex-shaped compound characters makes Bangla OCR considerably harder than Roman-script OCR. To address this, they propose a hybrid classification framework that combines a multilayer perceptron (MLP) with a support vector machine (SVM).
Data preparation begins with a modestly sized handwritten Bangla dataset (approximately 12,000 samples). Each image undergoes grayscale conversion, binarization, noise removal, and size normalization. Feature extraction is handcrafted and stroke-oriented: for every character the algorithm records stroke start and end coordinates, direction angles, relative lengths, and spatial relationships among strokes. These measurements are normalized and concatenated into a feature vector of 120 to 150 dimensions, which is deliberately kept compact so that both the MLP and the SVM can process it efficiently.
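The preprocessing steps listed above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the fixed threshold of 128, the 32 × 32 target size, the nearest-neighbour resize, and all function names are assumptions, and noise removal and stroke feature extraction are left out because the summary does not specify them.

```python
import numpy as np

def binarize(gray, threshold=128):
    """Mark dark (ink) pixels as 1 and light background as 0.
    The fixed threshold is an assumption, not a value from the paper."""
    return (np.asarray(gray) < threshold).astype(np.uint8)

def crop_to_ink(binary):
    """Trim empty rows and columns around the character."""
    rows = np.flatnonzero(binary.any(axis=1))
    cols = np.flatnonzero(binary.any(axis=0))
    if rows.size == 0:
        return binary  # blank image: nothing to crop
    return binary[rows[0]:rows[-1] + 1, cols[0]:cols[-1] + 1]

def resize_nearest(binary, size=32):
    """Nearest-neighbour resize to a size x size grid (assumed target size)."""
    h, w = binary.shape
    row_idx = np.arange(size) * h // size
    col_idx = np.arange(size) * w // size
    return binary[np.ix_(row_idx, col_idx)]

def preprocess(gray, threshold=128, size=32):
    """Binarize, crop to the ink bounding box, then normalize the size."""
    return resize_nearest(crop_to_ink(binarize(gray, threshold)), size)
```

Cropping before resizing makes the normalized image independent of where the character sits on the page, at the cost of distorting its aspect ratio.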
The MLP architecture consists of three hidden layers (256, 128, and 64 neurons) with ReLU activations, culminating in a soft‑max output over the 210 classes. In parallel, an SVM with a radial basis function (RBF) kernel is trained; its hyper‑parameters (C and γ) are tuned via cross‑validation. The two classifiers are not used independently; instead, their predictions are fused by a probability‑weighted averaging scheme (MLP weight = 0.6, SVM weight = 0.4). This fusion leverages the MLP’s strength in learning highly non‑linear patterns and the SVM’s ability to maximize margins for well‑separated classes.
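The probability-weighted averaging described above might look like the sketch below. It assumes both classifiers already emit per-class probabilities of the same shape (for an SVM this would require some form of probability calibration, which the summary does not describe); the function name and the toy probabilities are hypothetical, and only the 0.6 / 0.4 weights come from the text.

```python
import numpy as np

def fuse_probabilities(p_mlp, p_svm, w_mlp=0.6, w_svm=0.4):
    """Weighted average of two (n_samples, n_classes) probability arrays.
    Returns the fused probabilities and the predicted class indices."""
    p_mlp = np.asarray(p_mlp, dtype=float)
    p_svm = np.asarray(p_svm, dtype=float)
    fused = w_mlp * p_mlp + w_svm * p_svm
    return fused, fused.argmax(axis=-1)

# Toy three-class example (made-up numbers): the MLP alone would pick
# class 0, but the SVM's confidence in class 1 tips the fused decision.
fused, predicted = fuse_probabilities([[0.5, 0.4, 0.1]], [[0.2, 0.7, 0.1]])
```

Because the weights sum to 1, each fused row remains a valid probability distribution whenever both inputs are.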
A distinctive aspect of the work is the “frequency‑driven incremental learning” strategy. The authors first train the system on the most frequently occurring compound characters, then gradually introduce less common ones while retaining the previously learned parameters as initialization. This staged approach mitigates class‑imbalance problems and allows the model to progressively acquire discriminative knowledge for increasingly subtle character variations.
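One way to picture the staged schedule is to rank compound classes by sample count and grow the active class set stage by stage. The sketch below only builds that schedule; the number of stages, the equal-sized chunks, and the function name are assumptions, and the warm-started retraining at each stage is not shown.

```python
from collections import Counter

def build_stages(labels, basic_classes, n_stages=3):
    """Return cumulative class lists, one per training stage.
    Basic classes are always active; compound classes are added in
    descending order of frequency (ties broken by label)."""
    basic = set(basic_classes)
    counts = Counter(label for label in labels if label not in basic)
    ranked = [c for c, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]
    if not ranked:
        return [sorted(basic)]
    chunk = -(-len(ranked) // n_stages)  # ceiling division
    stages, active = [], set(basic)
    for start in range(0, len(ranked), chunk):
        active = active | set(ranked[start:start + chunk])
        stages.append(sorted(active))
    return stages

# Hypothetical labels: "a" stands for a basic class, "x", "y", "z" for compounds.
stages = build_stages(["a", "a", "x", "x", "x", "y", "y", "z"], {"a"})
```

Each stage's class list is a superset of the previous one, which is what lets the earlier parameters serve as initialization for the next round.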
Evaluation is performed using three-fold cross-validation. Across the three folds, the hybrid system achieves an average recognition rate of 79.25 %. Basic characters are recognized with a high accuracy of about 92 %, whereas compound characters attain roughly 65 % accuracy, highlighting the remaining difficulty. When examined in isolation, the MLP alone yields 75.8 % and the SVM alone 73.4 % accuracy; the combined model therefore improves on the individual classifiers by roughly 3.5 and 5.9 percentage points, respectively. Error analysis reveals that most misclassifications involve compound characters with near-identical glyphs (e.g., “ক্ষ” vs. “ক্শ”) and samples where stroke connections are ambiguous or heavily distorted.
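For readers unfamiliar with the protocol, three-fold cross-validation can be sketched as below: the data is shuffled once, split into three parts, and each part serves as the test set exactly once while the other two are used for training. The random shuffle, the seed, and the function names are assumptions; the summary does not say how the folds were drawn.

```python
import numpy as np

def three_fold_indices(n_samples, n_folds=3, seed=0):
    """Yield (train_idx, test_idx) pairs, one per fold."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(n_samples)
    folds = np.array_split(order, n_folds)
    for k in range(n_folds):
        test_idx = folds[k]
        train_idx = np.concatenate([folds[j] for j in range(n_folds) if j != k])
        yield train_idx, test_idx

def mean_recognition_rate(fold_rates):
    """Average the per-fold recognition rates (in percent)."""
    return float(np.mean(fold_rates))
```

The reported figure is then simply the mean of the three per-fold rates, so every sample contributes to testing exactly once.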
The authors acknowledge several limitations. The dataset is relatively small, especially for low‑frequency compound characters, which hampers the model’s ability to generalize. The handcrafted stroke features, while interpretable, may not capture all the variability present in natural handwriting, such as slant, pressure, or writer‑specific quirks. Consequently, the authors suggest future work that incorporates deep convolutional neural networks (CNNs) or transformer‑based architectures to learn features automatically, as well as data‑augmentation techniques (rotation, scaling, elastic distortion) and transfer learning to bolster performance on scarce classes. They also propose integrating a language model or a stroke‑segmentation pre‑processor to exploit contextual information and further disambiguate visually similar compounds.
In summary, the paper presents an early but solid attempt at solving handwritten Bangla OCR by fusing MLP and SVM classifiers and by introducing a frequency-based incremental learning protocol. While the achieved 79.25 % overall accuracy indicates that substantial challenges remain, particularly for complex compound characters, the proposed framework demonstrates a viable path forward, offering a foundation upon which more sophisticated deep-learning methods and larger annotated corpora can be built.