Segmentation of Offline Handwritten Bengali Script

Character segmentation has long been one of the most critical areas of the optical character recognition process. Through this operation, an image of a sequence of characters, which may be connected in some cases, is decomposed into sub-images of individual alphabetic symbols. In this paper, segmentation of cursive handwritten script of the world’s fourth most popular language, Bengali, is considered. Unlike English script, handwritten Bengali characters have components that often encircle the main character, making conventional segmentation methodologies inapplicable. Experimental results using the proposed segmentation technique on sample cursive handwritten data containing 218 ideal segmentation points show a success rate of 97.7%. Further feature analysis on these segments may lead to actual recognition of handwritten cursive Bengali script.

Research Summary

The paper addresses one of the most challenging steps in optical character recognition (OCR) for the Bengali script: the segmentation of offline handwritten text into individual character images. Bengali, the fourth most spoken language worldwide, possesses a unique orthographic structure in which vowels, consonants, and a variety of diacritical marks (known as “matras” or “modifiers”) can appear above, below, or on both sides of a base character. These modifiers often encircle the core glyph, creating a highly intertwined visual pattern that defeats conventional segmentation techniques originally designed for Latin scripts.

Problem Statement
Traditional segmentation methods, such as vertical projection histograms, global thresholding, and simple connected‑component analysis, rely on clear, linear separations between characters. In Bengali handwriting, however, encircling modifiers make character boundaries ambiguous, which leads to frequent over‑segmentation (splitting a single character into multiple pieces) or under‑segmentation (merging two distinct characters into one segment). Consequently, a dedicated approach that respects the script’s structural idiosyncrasies is required.
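To make this failure mode concrete, the following minimal sketch applies a global vertical projection cut to two toy binary images. The function names and the toy images are illustrative assumptions and are not taken from the paper; the point is only that a single stroke reaching over a neighbouring glyph removes the empty columns this baseline depends on.

```python
def vertical_projection(image):
    """Count ink (1) pixels in each column of a binary image (rows of 0/1)."""
    return [sum(col) for col in zip(*image)]


def projection_cuts(image):
    """Return the centre column of every empty-column run that lies between ink."""
    profile = vertical_projection(image)
    cuts, start = [], None
    for x, count in enumerate(profile):
        if count == 0 and start is None:
            start = x                      # an empty run begins here
        elif count != 0 and start is not None:
            if start > 0:                  # skip the leading margin
                cuts.append((start + x - 1) // 2)
            start = None
    return cuts                            # a trailing margin never closes, so it is ignored


# Two strokes with a clean gap between them.
separated = [
    [1, 0, 0, 1],
    [1, 0, 0, 1],
]
# Same strokes, but one reaches back over the gap like an encircling modifier.
overlapped = [
    [1, 1, 1, 1],
    [1, 0, 0, 1],
]

print(projection_cuts(separated))   # [1]
print(projection_cuts(overlapped))  # []
```

In the second image no column is empty, so the baseline finds no cut at all, which corresponds to the under‑segmentation case described above.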

Proposed Methodology
The authors introduce a three‑stage algorithm that combines connectivity‑based region feature extraction with a dynamically adjusted thresholding scheme:

1. Binarization and Connected‑Component Extraction – After Otsu binarization, the image is processed using 8‑connectivity to identify all foreground components. For each component, both the outer contour and any internal holes (loops) are recorded, providing a richer description than simple pixel counts.
2. Local Feature‑Driven Candidate Point Detection – Instead of a global vertical projection histogram, the method computes a histogram for each local window and determines a dynamic threshold based on the window’s mean and standard deviation. This adaptive approach captures subtle valleys that correspond to true character boundaries, even when the strokes are thin or heavily overlapped.
3. Cost‑Function Optimization for Final Segmentation – Each candidate point is evaluated using a composite cost function that incorporates (a) Connectivity Loss, measuring how many foreground connections would be broken if the cut were made, and (b) Loop Area Ratio, quantifying the proportion of internal holes on either side of the cut. The point with the minimal cost is selected as the segmentation location. This dual‑criterion strategy enables the algorithm to differentiate between a modifier that merely surrounds a base glyph and an actual inter‑character gap.
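The candidate detection and cost‑based selection steps can be illustrated with a small sketch. The summary does not give the exact formulas, so everything specific here is an assumption: the threshold form (window mean minus `k` times the standard deviation), the window size, the weights `w_conn` and `w_loop`, and the simplified definitions of connectivity loss (ink pixels in the cut column) and loop area ratio (enclosed background pixels in the cut column). It should be read as one possible interpretation, not as the authors' implementation.

```python
from collections import deque
from statistics import mean, pstdev


def column_profile(image):
    """Ink (1) pixel count per column of a binary image given as rows of 0/1."""
    return [sum(col) for col in zip(*image)]


def candidate_columns(image, window=5, k=0.5):
    """Assumed rule: a column is a candidate if it falls below mean - k*std of its window."""
    profile = column_profile(image)
    half = window // 2
    found = []
    for x, value in enumerate(profile):
        local = profile[max(0, x - half): x + half + 1]
        if value < mean(local) - k * pstdev(local):
            found.append(x)
    return found


def hole_mask(image):
    """True where a background pixel is enclosed by ink, i.e. lies inside a loop."""
    h, w = len(image), len(image[0])
    outside = [[False] * w for _ in range(h)]
    queue = deque()
    for y in range(h):
        for x in range(w):
            on_border = y in (0, h - 1) or x in (0, w - 1)
            if on_border and image[y][x] == 0:
                outside[y][x] = True
                queue.append((y, x))
    # Flood the background with 4-connectivity (the usual complement of
    # 8-connected foreground); whatever is not reached is a hole.
    while queue:
        y, x = queue.popleft()
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if 0 <= ny < h and 0 <= nx < w and image[ny][nx] == 0 and not outside[ny][nx]:
                outside[ny][nx] = True
                queue.append((ny, nx))
    return [[image[y][x] == 0 and not outside[y][x] for x in range(w)] for y in range(h)]


def cut_cost(image, holes, x, w_conn=1.0, w_loop=1.0):
    """Assumed composite cost of a vertical cut at column x (weights are hypothetical)."""
    h = len(image)
    connectivity_loss = sum(row[x] for row in image) / h  # ink the cut would sever
    loop_ratio = sum(row[x] for row in holes) / h         # loop interior the cut would cross
    return w_conn * connectivity_loss + w_loop * loop_ratio


def best_cut(image):
    """Pick the candidate column with minimal cost, or None if there are no candidates."""
    holes = hole_mask(image)
    candidates = candidate_columns(image)
    if not candidates:
        return None
    return min(candidates, key=lambda x: cut_cost(image, holes, x))


# Toy word: a looped glyph (columns 0-2), a thin ligature (columns 3-4),
# and a solid glyph (columns 5-7).
word = [
    [1, 1, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 1],
    [1, 0, 1, 0, 0, 1, 1, 1],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 0, 0, 0, 1, 1, 1],
]

print(candidate_columns(word))  # [1, 3, 4]
print(best_cut(word))           # 3
```

In this toy example column 1 passes the local threshold because the loop interior contains little ink, but the loop term raises its cost, so the cut lands on the ligature instead. That is the kind of distinction the dual‑criterion cost is described as making.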

Experimental Setup
A curated dataset of cursive handwritten Bengali samples was assembled, containing 218 “ideal” segmentation points as determined by expert linguists. The dataset includes a wide variety of pen pressures, stroke widths, and inter‑character overlaps to reflect realistic writing conditions.

Results
Applying the proposed algorithm yielded 213 correct segmentations out of 218, resulting in a 97.7 % success rate. The five failures fell into two categories:

Extremely Thin Strokes – The adaptive histogram sometimes missed a valley when the stroke width approached a single pixel, causing a missed cut.
Severe Overlap – In cases where two characters merged into a single connected component, the connectivity loss term incorrectly favored a non‑existent cut, leading to an over‑segmentation error.

The authors note that these error modes could be mitigated by incorporating a pre‑processing stage for stroke‑width normalization and a post‑processing step that detects and resolves overlapping components.
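The reported success rate follows directly from the counts given in this section, as the short check below shows. The helper name `success_rate` is only illustrative.

```python
def success_rate(correct, total):
    """Percentage of ideal segmentation points that were located correctly."""
    return 100.0 * correct / total


correct, total = 213, 218
print(f"{success_rate(correct, total):.1f}%")  # 97.7%
print(total - correct)                          # 5
```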

Discussion and Future Work
While the paper focuses on segmentation accuracy, the ultimate goal is a complete OCR pipeline for Bengali handwriting. The authors propose extending their work by feeding the segmented glyphs into a convolutional neural network (CNN) or transformer‑based recognizer, thereby evaluating end‑to‑end recognition performance. They also suggest expanding the dataset to include real‑world documents such as newspapers, letters, and historical manuscripts to test the algorithm’s robustness across diverse writing styles. Moreover, because the core idea of leveraging connectivity and dynamic thresholds does not rely on language‑specific heuristics, the method could be adapted to other Brahmic scripts (e.g., Devanagari, Tamil, Malayalam) that share similar modifier‑encircling characteristics.

Conclusion
The study presents a novel, script‑aware segmentation technique that addresses the unique challenges posed by offline handwritten Bengali text. By integrating region‑level connectivity information with locally adaptive thresholding, the algorithm achieves a segmentation success rate of 97.7% on the 218 ideal segmentation points, handling the encircling modifiers with which traditional methods struggle. The work lays a foundation for subsequent recognition stages and opens avenues for applying the same principles to other complex Indic scripts, moving the field closer to robust, multilingual handwritten OCR solutions.