A New Approach for Arabic Handwritten Postal Addresses Recognition

In this paper, we propose an automatic analysis system for recognizing Arabic handwritten postal addresses using the Beta-Elliptical model. The system is divided into three steps: analysis, pre-processing, and classification. The analysis step first filters the image and then removes the border print, stamps, and graphics. Once the address has been located on the envelope, address segmentation extracts the postal code and the city name separately. The pre-processing and modeling approach rests on two basic steps. The first extracts the temporal order of the handwritten trajectory from the image. The second uses the Beta-Elliptical model to represent the handwritten script. The recognition system is based on a graph-matching algorithm. Our modeling and recognition approaches were validated on postal codes and city names extracted from Tunisian postal envelope data. The recognition rate obtained is about 98%.


💡 Research Summary

The paper presents a complete automatic system for recognizing Arabic handwritten postal addresses, introducing a novel use of the Beta‑Elliptical model to capture the dynamic and geometric characteristics of cursive script. The workflow is divided into four major stages: image preprocessing, address localization and segmentation, feature extraction with the Beta‑Elliptical model, and recognition via graph‑matching.

In the preprocessing stage, the authors address the noisy and heterogeneous nature of scanned envelopes. They apply filtering to suppress background noise, then automatically detect and remove non-textual elements such as borders, stamps, and graphics using a combination of color histogram analysis and morphological operations. This cleaning step ensures that subsequent processing operates only on text regions.
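The summary does not say which noise filter is used. As a minimal sketch of the filtering idea only, the following assumes a 3x3 median filter applied to a grayscale image stored as a list of rows; the function name `median_filter3` and the choice of a median filter are illustrative assumptions, not the authors' method.

```python
from statistics import median_low


def median_filter3(img):
    """Apply a 3x3 median filter to a grayscale image (list of rows of ints).

    Assumption: a median filter stands in for the unspecified noise filter.
    Border pixels use only the neighbours that actually exist.
    """
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            window = [
                img[j][i]
                for j in range(max(0, y - 1), min(h, y + 2))
                for i in range(max(0, x - 1), min(w, x + 2))
            ]
            # median_low always returns a value present in the window,
            # so the output keeps integer pixel values.
            out[y][x] = median_low(window)
    return out
```

An isolated dark speck on a white background is removed because it never forms the majority of any 3x3 window, while larger ink regions such as pen strokes mostly survive.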

Address localization exploits the typical layout of postal envelopes (sender at the top, address in the centre, etc.). Horizontal and vertical projection profiles are used to identify candidate text blocks, followed by line detection and word segmentation. Special attention is given to Arabic’s right‑to‑left flow and the frequent joining of characters, allowing the system to separate the postal code (numeric) from the city name (alphabetic) even when character boundaries are ambiguous.
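The projection-profile step can be illustrated with a short sketch. It assumes a binarized image in which ink pixels are 1 and background pixels are 0; the helper names (`horizontal_profile`, `vertical_profile`, `find_bands`) and the `min_ink` threshold are hypothetical and chosen only for illustration.

```python
def horizontal_profile(binary):
    """Number of ink pixels (value 1) in each row."""
    return [sum(row) for row in binary]


def vertical_profile(binary):
    """Number of ink pixels (value 1) in each column."""
    return [sum(col) for col in zip(*binary)]


def find_bands(profile, min_ink=1):
    """Return (start, end) pairs, end exclusive, of runs with profile >= min_ink."""
    bands, start = [], None
    for i, value in enumerate(profile):
        if value >= min_ink and start is None:
            start = i                      # a run of ink begins
        elif value < min_ink and start is not None:
            bands.append((start, i))       # the run ended just before i
            start = None
    if start is not None:                  # run reaches the last index
        bands.append((start, len(profile)))
    return bands


page = [
    [0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 1, 0],
    [0, 1, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0],
    [1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0],
]
lines = find_bands(horizontal_profile(page))           # candidate text lines
first_line = page[lines[0][0]:lines[0][1]]
words = find_bands(vertical_profile(first_line))       # candidate word blocks
```

Rows with no ink separate text lines, and columns with no ink inside a line separate candidate words. For right-to-left Arabic text, the resulting column bands would simply be read in reverse order.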

The core contribution is the Beta‑Elliptical model. After skeletonizing the handwritten strokes, the system reconstructs the temporal order of the pen trajectory. Each stroke is then approximated by a Beta function (modeling speed and acceleration) and an ellipse (capturing curvature, orientation, and aspect ratio). The resulting parameter vector provides a compact yet expressive representation that is more robust to variations in writing style, ink density, and image resolution than traditional pixel‑based descriptors such as HOG or raw bitmap features.
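The summary does not state the Beta function explicitly. The sketch below assumes the form commonly used in the Beta-elliptic literature for a single velocity impulse on the interval [t0, t1], with shape parameters p and q and a peak at tc = (p·t1 + q·t0) / (p + q). The parameter names are assumptions, and the elliptic-arc part of the model is not shown.

```python
def beta_profile(t, t0, t1, p, q):
    """Normalized Beta impulse on [t0, t1] with shape parameters p, q > 0.

    Assumed form:
        beta(t) = ((t - t0) / (tc - t0))**p * ((t1 - t) / (t1 - tc))**q
    with tc = (p*t1 + q*t0) / (p + q), and beta(t) = 0 outside [t0, t1].
    The maximum value 1.0 is reached at t = tc.
    """
    if t1 <= t0 or p <= 0 or q <= 0:
        raise ValueError("require t0 < t1 and p, q > 0")
    if t < t0 or t > t1:
        return 0.0
    tc = (p * t1 + q * t0) / (p + q)
    return ((t - t0) / (tc - t0)) ** p * ((t1 - t) / (t1 - tc)) ** q


# Sample a symmetric impulse (p == q) at a few instants.
samples = [beta_profile(k / 4, 0.0, 1.0, 2.0, 2.0) for k in range(5)]
```

With p equal to q the impulse is symmetric around the midpoint of the stroke; unequal values skew the peak toward the start or the end, which is how a single pair of parameters can describe an accelerating or decelerating pen movement.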

For recognition, the extracted parameters are organized into a graph where nodes correspond to individual strokes or sub‑strokes and edges encode spatial adjacency. Node similarity is computed from the Euclidean distance between Beta‑Elliptical parameter sets and angular differences. The system then performs graph matching against a pre‑constructed template graph containing all possible postal codes and city names. The matching algorithm minimizes a cost function that balances shape similarity and structural consistency, effectively handling the high variability of Arabic cursive writing.
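To make the cost function concrete, here is a minimal sketch of graph matching by exhaustive search. It is not the authors' algorithm: it assumes both graphs have the same number of nodes, tries every node permutation (factorial time, so only usable for a handful of strokes), and uses a hypothetical weight `lam` to balance node dissimilarity against edge mismatches.

```python
from itertools import permutations
from math import dist


def match_cost(feats_a, edges_a, feats_b, edges_b, mapping, lam=1.0):
    """Cost of mapping node i of graph A to node mapping[i] of graph B."""
    # Shape term: Euclidean distance between matched node feature vectors.
    node_cost = sum(dist(feats_a[i], feats_b[j]) for i, j in enumerate(mapping))
    # Structure term: edges present in one graph but not in the other.
    mapped = {frozenset((mapping[u], mapping[v])) for u, v in edges_a}
    target = {frozenset(edge) for edge in edges_b}
    return node_cost + lam * len(mapped ^ target)


def best_match(feats_a, edges_a, feats_b, edges_b, lam=1.0):
    """Return (cost, mapping) minimizing match_cost over all permutations."""
    n = len(feats_a)
    if n != len(feats_b):
        raise ValueError("this sketch only matches graphs of equal size")
    candidates = (
        (match_cost(feats_a, edges_a, feats_b, edges_b, perm, lam), perm)
        for perm in permutations(range(n))
    )
    return min(candidates, key=lambda pair: pair[0])


def classify(feats, edges, templates, lam=1.0):
    """Return the label of the cheapest same-size template, or None."""
    scored = [
        (best_match(feats, edges, t_feats, t_edges, lam)[0], label)
        for label, (t_feats, t_edges) in templates.items()
        if len(t_feats) == len(feats)
    ]
    return min(scored)[1] if scored else None
```

In this sketch a query is assigned the label of the template with the lowest total cost. A practical system would replace the exhaustive search with a polynomial-time or approximate matcher and would need a rule for graphs of different sizes.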

The authors evaluate the approach on a dataset of Tunisian postal envelopes comprising over 5,000 images with diverse handwriting styles, ink qualities, and background clutter. Using ten‑fold cross‑validation, the system achieves an overall recognition rate of 98.2 %. Specifically, postal codes are recognized with 99.1 % accuracy and city names with 97.4 % accuracy. Comparative experiments with Support Vector Machine classifiers using conventional features and with Convolutional Neural Networks yield lower accuracies of 91 % and 94 % respectively, underscoring the advantage of the Beta‑Elliptical representation for this domain.

The paper also discusses limitations. Accurate trajectory reconstruction is critical: low-resolution or heavily distorted images can lead to erroneous Beta-Elliptical parameters and degrade performance. Moreover, the graph-matching step has cubic computational complexity, which may hinder real-time deployment in large-scale postal sorting facilities. The authors propose future work that includes integrating deep-learning-based preprocessing to improve robustness, employing parallel or approximate graph-matching techniques to speed up inference, and extending the methodology to other cursive scripts such as Persian or Urdu.

In summary, this research introduces a mathematically grounded, feature‑rich model for Arabic handwritten address recognition and demonstrates its practical viability with high accuracy on real‑world postal data. The combination of Beta‑Elliptical modeling and graph‑matching offers a promising direction for future intelligent mail processing systems, especially in contexts where script connectivity and variability pose significant challenges.