Arabic Character Segmentation Using Projection-Based Approach with Profile's Amplitude Filter
📝 Abstract
Arabic is one of the languages that present special challenges to optical character recognition (OCR). The main challenge in Arabic is that it is mostly cursive. Therefore, a segmentation process must be carried out to determine where each character begins and where it ends. This step is essential for character recognition. This paper presents an Arabic character segmentation algorithm. The proposed algorithm uses projection-based approach concepts to separate lines, words, and characters. This is done using a profile's amplitude filter and a simple edge tool to find character separations. Our algorithm shows promising performance when applied to different machine-printed documents with different Arabic fonts.
📄 Content
Arabic Character Segmentation Using Projection-Based Approach with Profile's Amplitude Filter

Mahmoud A. A. Mousa
Dept. of Computer and Systems Engineering, Zagazig University, Zagazig, Egypt
mamosa@zu.edu.eg

Mohammed S. Sayed and Mahmoud I. Abdalla
Dept. of Electronics and Communications Engineering, Zagazig University, Zagazig, Egypt
msayed@zu.edu.eg, mabdalla@zu.edu.eg

Abstract: Arabic is one of the languages that present special challenges to optical character recognition (OCR). The main challenge in Arabic is that it is mostly cursive. Therefore, a segmentation process must be carried out to determine the character's start and end. This step is essential for character recognition. This paper presents an Arabic character segmentation algorithm. The proposed algorithm uses projection-based approach concepts to separate lines, words, and characters. This is done using a profile's amplitude filter and a simple edge tool to find character separations. Our algorithm shows promising performance when applied to different printed documents with different Arabic fonts.

Keywords: Character Segmentation, Arabic Text OCR, Projection-Based Approach, Amplitude Filter

I. INTRODUCTION

Optical character recognition (OCR) is an application for image recognition that studies automatic reading. This is done by taking an image of text written in a specific language, to be understood by the computer, and producing the final computer representation of this text. OCR techniques may vary according to the language which will be used, its nature, and the application in which the technique is applied [1]. The ultimate goal of OCR is to imitate the human ability to read at a much faster rate by associating symbolic identities with images of characters.

Arabic is one of the languages that present special challenges to OCR. The main challenge in Arabic is that it is mostly cursive. Arabic is written by connecting characters together to produce words or parts of words, as shown in Fig. 1. Arabic text is written from right to left. The Arabic language has 28 basic characters, of which 16 have from one to three dots.

Arabic characters have many shapes, which depend mainly on their position in the word. For example, the character "noon" is written in the form "ﻧـ" at the start, "ـﻨـ" in the middle, and "ـﻦ" at the end of a word, but the separated form of this character is "ن". The shape and the size of Arabic characters vary with respect to their position in the word, and this is a great challenge in Arabic text [1].

Figure 1. The characters connectivity of Arabic text.

Because of the different nature of Arabic text fonts, characters may overlap vertically to produce certain compounds of characters at certain positions of the Arabic word segments, such as "ﻣﺤـ", "ﺣﻤـ", and "ﻧﺠـ", which can be represented by single atomic graphemes called ligatures. The Traditional Arabic font, for example, contains around 220 graphemes, and another common, less involved font (with fewer ligatures) like Simplified Arabic contains around 151 graphemes [1, 17, 18].

Some Arabic characters have a single dot, such as "ب , ج , ن", other characters have double dots, such as "ﺗـ , ﻳـ", and other characters have triple dots, such as "ﺛـ , ﺷـ". The dotted characters pose a big problem while being processed.

This paper presents an Arabic character segmentation algorithm. The proposed algorithm uses projection-based approach concepts to separate lines, words, and characters using a profile's amplitude filter and a simple edge tool. The rest of the paper is organized as follows: Section 2 reviews different segmentation techniques. Section 3 presents the proposed algorithm. Section 4 demonstrates the results and performance analysis. Section 5 concludes this paper.

II. SEGMENTATION TECHNIQUES

In this part, methods for converting an image that contains Arabic text into character images are discussed. This is done using three segmentation stages: line segmentation, word segmentation, and character segmentation.

A. Line segmentation approaches:

Projection-based approach, in which the pixels of the image are summed along the horizontal axis for each y value, which is referred to as a horizontal projection [2-5, 10, 14-18], or along the vertical axis for each x value on the segmented line image, which is called a vertical projection [2, 3, 12, 13, 14-18].

Smearing approach, in which consecutive black pixels along the horizontal direction are smeared. The distance between the white space is calculated. If the distance lies within a predefined threshold, it is filled with black pixels. The text lines are bounded with connected shapes of black pixels [6, 10, 11].

Grouping approach, in which text lines are constructed by grouping neighboring ...
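As a rough illustration of the projection-based and smearing approaches reviewed in Section II.A, the following minimal Python sketch computes horizontal and vertical projection profiles of a binary image, splits the image into text lines at empty rows, and smears short white gaps in a row. It is not the algorithm proposed in this paper: the image representation (nested lists with 1 for a black pixel), all function names, and the `min_sum` and `max_gap` parameters are assumptions made for the example, and `min_sum` is a plain threshold rather than the profile's amplitude filter.

```python
from typing import List, Tuple

# Assumed representation: binary image as rows of 0/1, where 1 = black (ink) pixel.
Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Sum the black pixels of each row (one value per y)."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Sum the black pixels of each column (one value per x)."""
    if not img:
        return []
    return [sum(col) for col in zip(*img)]


def runs_above(profile: List[int], min_sum: int = 0) -> List[Tuple[int, int]]:
    """Return half-open (start, end) ranges where profile > min_sum.

    min_sum is a hypothetical threshold for this sketch only.
    """
    runs: List[Tuple[int, int]] = []
    start = None
    for i, value in enumerate(profile):
        if value > min_sum and start is None:
            start = i                      # a run of ink begins
        elif value <= min_sum and start is not None:
            runs.append((start, i))        # the run ended at the previous index
            start = None
    if start is not None:
        runs.append((start, len(profile)))
    return runs


def segment_lines(img: Image) -> List[Tuple[int, int]]:
    """Text lines are the row ranges with a non-empty horizontal projection."""
    return runs_above(horizontal_projection(img))


def smear_row(row: List[int], max_gap: int) -> List[int]:
    """Fill white gaps of at most max_gap pixels lying between two black pixels."""
    out = list(row)
    last_black = None
    for x, pixel in enumerate(row):
        if pixel:
            if last_black is not None and 0 < x - last_black - 1 <= max_gap:
                for k in range(last_black + 1, x):
                    out[k] = 1
            last_black = x
    return out


if __name__ == "__main__":
    page = [
        [0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 1, 0],
        [0, 1, 0, 0, 1, 0],
        [0, 0, 0, 0, 0, 0],
        [1, 1, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0],
    ]
    for top, bottom in segment_lines(page):
        line_img = page[top:bottom]
        # Columns with ink inside this line; gaps between them are candidate cuts.
        print((top, bottom), runs_above(vertical_projection(line_img)))
```

Applying the vertical projection to each segmented line, as in the demo loop, gives the column ranges that contain ink; the white gaps between those ranges are the candidate separation points for the word and character stages.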