A case study on English-Malayalam Machine Translation
Sreelekha. S IIT Bombay India Pushpak Bhattacharyya IIT Bombay India sreelekha@cse.iitb.ac.in pb@cse.iitb.ac.in
Abstract
In this paper we present a case study on Statistical Machine Translation (SMT) and Rule-Based Machine Translation (RBMT) for translation from English to Malayalam and from Malayalam to English. One motivation of our study is to make a three-way performance comparison: a) SMT versus RBMT, b) English-to-Malayalam SMT versus Malayalam-to-English SMT, and c) English-to-Malayalam RBMT versus Malayalam-to-English RBMT. We describe the development of baseline phrase-based SMT systems for English to Malayalam and Malayalam to English, and the evaluation of their performance against the RBMT systems. Our observations are: a) SMT systems outperform RBMT systems; b) for SMT, English-to-Malayalam systems perform better than Malayalam-to-English systems; c) for RBMT, Malayalam-to-English systems perform better than English-to-Malayalam systems. Based on our evaluations and a detailed error analysis, we describe the requirements for incorporating morphological processing into SMT to improve translation accuracy.
1 Introduction
In a large multilingual society like India, there is great demand for translating documents from one language to another. Most state governments work in their respective regional languages, whereas the Union Government’s official documents and reports are bilingual (English/Hindi). For proper communication, these documents and reports need to be translated into the respective regional languages. Newspapers in regional languages must translate news received in English from international news agencies. Given the limitations of human translators, most of these reports and documents remain unavailable and do not percolate down. A machine-assisted translation system or a translator’s workstation would increase the efficiency of human translators. As is clear from the above, India is rich in linguistic diversity: it has many morphologically rich languages that differ considerably from English as well as from each other, so there is a great need for machine translation between them.
There are many ongoing attempts to develop MT systems for regional languages using various approaches (Kunchukuttan et al., 2014). Approaches to machine translation are categorized as Rule-Based (Knowledge-Driven) approaches and Corpus-Based (Data-Driven) approaches. RBMT approaches are further classified into Transfer-based MT, Interlingua MT and Dictionary-based MT, while Corpus-Based approaches are classified into Example-Based MT and SMT. For English to Indian languages and Indian to Indian languages, there have been fruitful attempts with all of these approaches (Antony, 2013; Sreelekha et al., 2013; Sreelekha et al., 2014). This paper discusses various approaches used in English-to-Malayalam and Malayalam-to-English MT systems.
The rest of the paper is organized as follows: Section 2 deals with challenges in MT; Section 3 deals with approaches to MT, namely RBMT and SMT; Section 4 deals with the experiments conducted, evaluations and error analysis, which concludes the main components of the paper.
2 Challenges in English–Malayalam MT
The major difficulties in machine translation are handling the structural differences between the two languages and handling ambiguities.
2.1 Challenge of Ambiguity
There are three types of ambiguities: structural
ambiguity, lexical ambiguity and semantic
ambiguity.
2.1.1 Lexical Ambiguity
Words and phrases in one language often have multiple meanings in another language.
For example, consider the English sentence:
English- His view was good
Malayalam-
അവന്റെ അഭിപ്രായം നല്ലതായിരുന്നു
{ avante abhiprayam nallathayirunnu}
In the above sentence, the word “view” is ambiguous. It is not clear whether “view” is used in the “opinion” sense (“അഭിപ്രായം” {abhiprayam} in Malayalam) or in the “eye sight” sense (“കാഴ്ച” {kazhcha} in Malayalam). This kind of ambiguity has to be resolved from the context.
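As a rough illustration of resolving such an ambiguity from context, the following minimal sketch picks the sense of “view” whose cue words overlap most with the surrounding words. The cue-word lists and the function name `choose_sense` are hypothetical and chosen only for illustration; this is not the method used in the systems described in this paper.

```python
# Hypothetical sense inventory for the English word "view".
# The cue words are illustrative assumptions, not taken from the paper.
SENSES = {
    "opinion": {
        "translation": "അഭിപ്രായം",
        "cues": {"meeting", "argued", "proposal", "debate", "said"},
    },
    "eye sight": {
        "translation": "കാഴ്ച",
        "cues": {"window", "hill", "mountain", "balcony", "saw"},
    },
}


def choose_sense(context_words):
    """Return (sense, translation) for the sense whose cue words overlap
    most with the context, or None if no cue word appears at all."""
    context = {w.lower() for w in context_words}
    # Count overlapping cue words for every candidate sense.
    scores = {name: len(info["cues"] & context) for name, info in SENSES.items()}
    best = max(scores, key=scores.get)
    if scores[best] == 0:
        return None  # the context gives no evidence; the word stays ambiguous
    return best, SENSES[best]["translation"]


print(choose_sense("His view from the hill was good".split()))
print(choose_sense("His view was good".split()))
```

With no helpful context, as in the bare sentence “His view was good”, the sketch returns `None`, which mirrors the ambiguity described above.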
2.1.2 Structural Ambiguity
Here, the structural order of the sentence gives rise to multiple meanings. For example,
Malayalam-
അവിടെ വണ്ണമുള്ള പശുവും കാളയും ഉണ്ടായിരുന്നു
{avide vannamulla pashuvum kalayum undayirunnu}
English- There were fat cows and buffalos there
Here, from the words “വണ്ണമുള്ള പശുവും കാളയും” {vannamulla pashuvum kalayum}, it is clear that the cows are fat, but it is not clear whether the buffalos are fat, since in Malayalam only one word, “വണ്ണമുള്ള” {vannamulla} {fat}, is used to represent fat cows and buffalos. It can have two interpret