A case study on English-Malayalam Machine Translation


📝 Abstract

In this paper we present a case study on Statistical Machine Translation (SMT) and Rule-Based Machine Translation (RBMT) for translation from English to Malayalam and from Malayalam to English. One motivation of our study is to make a three-way performance comparison: a) SMT versus RBMT, b) English-to-Malayalam SMT versus Malayalam-to-English SMT, and c) English-to-Malayalam RBMT versus Malayalam-to-English RBMT. We describe the development of baseline phrase-based SMT systems for English to Malayalam and Malayalam to English, and the evaluation of their performance against the RBMT system. Our observations are: a) SMT systems outperform RBMT systems; b) in the case of SMT, English-to-Malayalam systems perform better than Malayalam-to-English systems; c) in the case of RBMT, Malayalam-to-English systems perform better than English-to-Malayalam systems. Based on our evaluations and detailed error analysis, we describe the requirements for incorporating morphological processing into SMT to improve translation accuracy.

📄 Content

Sreelekha. S, IIT Bombay, India (sreelekha@cse.iitb.ac.in)
Pushpak Bhattacharyya, IIT Bombay, India (pb@cse.iitb.ac.in)

1 Introduction
In a large multilingual society like India, there is a great demand for translation of documents from one language to another. Most state government work is conducted in the respective regional languages, whereas the Union Government’s official documents and reports are in bilingual form (English/Hindi). For proper communication, these documents and reports need to be translated into the respective regional languages. Newspapers in regional languages are likewise required to translate English news received from international news agencies. Given the limitations of human translators, most of these reports and documents are missing and do not percolate down. A machine-assisted translation system or a translator’s workstation would increase the efficiency of human translators. As is clear from the above, India is rich in linguistic divergence: there are many morphologically rich languages that are quite different from English as well as from each other, so there is a great need for machine translation between them.
There are many ongoing attempts to develop MT systems for regional languages using various approaches (Kunchukuttan et al., 2014). The approaches to machine translation are categorized as Rule-Based or Knowledge-Driven approaches and Corpus-Based or Data-Driven approaches. The RBMT approaches are further classified into Transfer-based MT, Interlingua MT and Dictionary-based MT, while the Corpus-Based approaches are classified into Example-Based MT and SMT. For English to Indian languages and between Indian languages, there have been fruitful attempts with all of these approaches (Antony, 2013; Sreelekha et al., 2013; Sreelekha et al., 2014). This paper discusses various approaches used in English-to-Malayalam and Malayalam-to-English MT systems.
The rest of the paper is organized as follows: Section 2 deals with challenges in MT; Section 3 deals with approaches in MT, namely RBMT and SMT; and Section 4 deals with the experiments conducted, the evaluations and the error analysis, which completes the main components of the paper.

2 Challenges in English–Malayalam MT

The major difficulties in machine translation are handling the structural differences between the two languages and handling ambiguities.

2.1 Challenge of Ambiguity
There are three types of ambiguity: structural ambiguity, lexical ambiguity and semantic ambiguity.

2.1.1 Lexical Ambiguity
Words and phrases in one language often have multiple meanings in another language. For example:
English- His view was good
Malayalam- അവന്റെ അഭിപ്രായം നല്ലതായിരുന്നു {avante abhiprayam nallathayirunnu}
In the above sentence, the word “view” is ambiguous. It is not clear whether “view” is used in the “opinion” sense (“അഭിപ്രായം” {abhiprayam} in Malayalam) or in the “eye sight” sense (“കാഴ്ച” {kazhcha} in Malayalam). This kind of ambiguity has to be resolved from the context.
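
As a rough illustration of resolving the sense of “view” from context, the following Python sketch scores each candidate sense by how many hand-picked cue words appear in the sentence. It is a toy example and not the method used in this paper: the `SENSES` table, its cue words and the `pick_sense` function are all assumptions introduced here, and only the two Malayalam glosses come from the example above.

```python
import re

# Hypothetical sense inventory for "view". The Malayalam glosses are taken
# from the example in the text; the cue words are assumed for illustration.
SENSES = {
    "view": [
        {"gloss": "opinion", "malayalam": "അഭിപ്രായം",
         "cues": {"opinion", "argument", "debate", "meeting", "policy"}},
        {"gloss": "eye sight", "malayalam": "കാഴ്ച",
         "cues": {"window", "mountain", "hill", "scenery", "balcony"}},
    ]
}


def pick_sense(word, context):
    """Return the sense whose cue words overlap most with the context.

    Returns None when the context gives no evidence or the best two
    senses are tied, i.e. when the word stays ambiguous.
    """
    tokens = set(re.findall(r"[a-z]+", context.lower()))
    scored = [(len(sense["cues"] & tokens), sense)
              for sense in SENSES.get(word, [])]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    if not scored or scored[0][0] == 0:
        return None  # no cue word found in the context
    if len(scored) > 1 and scored[0][0] == scored[1][0]:
        return None  # tie between the top senses
    return scored[0][1]
```

With these assumed cue words, the bare sentence “His view was good” yields `None`, which mirrors the ambiguity described above, while a longer context such as “The view from the balcony was good” selects the “കാഴ്ച” sense.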

2.1.2 Structural Ambiguity
In this case, the structural order of the sentence gives rise to multiple meanings. For example:
Malayalam- അവിടെ വണ്ണമുള്ള പശുവും കാളയും ഉണ്ടായിരുന്നു {avide vannamulla pashuvum kalayum undayirunnu}
English- There were fat cows and buffalos there

Here, from the words “വണ്ണമുള്ള പശുവും കാളയും” {vannamulla pashuvum kalayum}, it is clear that the cows are fat, but it is not clear whether the buffalos are fat, since in Malayalam only one word, “വണ്ണമുള്ള” {vannamulla} {fat}, is used to represent fat cows and buffalos. It can have two interpret

This content is AI-processed based on ArXiv data.
