Demo of Sanskrit-Hindi SMT System

📝 Original Info

  • Title: Demo of Sanskrit-Hindi SMT System
  • ArXiv ID: 1804.06716
  • Date: 2018-04-19
  • Authors: Rajneesh Pandey, Atul Kr. Ojha, Girish Nath Jha

📝 Abstract

This demo proposal presents SaHiT, a phrase-based Sanskrit-Hindi Statistical Machine Translation system. The system was developed on Moses, using a Sanskrit-Hindi parallel corpus of 43k sentences and a monolingual corpus of 56k sentences in the target language (Hindi). The system achieves a BLEU score of 57.

📄 Full Content

Sanskrit and Hindi belong to the Indo-Aryan language family. Hindi is considered to be a direct descendant of an early form of Sanskrit, through Sauraseni Prakrit [...]. Today Hindi is widely spoken across India and in parts of other countries such as Mauritius. According to the 2001 Census, India has more than 378,000,000 Hindi speakers. Translating Sanskrit texts into other languages makes the knowledge they contain accessible to users. Development of a Machine Translation [...]

The first step was to create a parallel (Sanskrit-Hindi) corpus and a monolingual corpus of the target language (Hindi). We prepared 43k parallel sentences. To build the system, we followed these steps: tokenization of the parallel and monolingual corpora, filtering out long sentences, creation of the language and translation models, tuning, testing, and automatic and human evaluation.
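The step of filtering out long sentences can be illustrated with a minimal Python sketch. It is not the authors' code: the function name `filter_parallel`, the whitespace tokenization, and the thresholds (`max_len=80`, `max_ratio=9.0`) are assumptions chosen for illustration, and the paper does not report the limits it actually used.

```python
def filter_parallel(pairs, min_len=1, max_len=80, max_ratio=9.0):
    """Keep only sentence pairs whose lengths are within the given limits.

    pairs     -- iterable of (source, target) strings, e.g. (Sanskrit, Hindi)
    min_len   -- minimum number of tokens per side (assumed value)
    max_len   -- maximum number of tokens per side (assumed value)
    max_ratio -- maximum allowed length ratio between the two sides (assumed value)
    """
    kept = []
    for src, tgt in pairs:
        # Whitespace tokenization is a simplification; a real pipeline
        # would run a proper tokenizer before this step.
        src_tokens = src.split()
        tgt_tokens = tgt.split()
        n_src, n_tgt = len(src_tokens), len(tgt_tokens)

        # Drop pairs with an empty side so the ratio below is well defined.
        if n_src == 0 or n_tgt == 0:
            continue
        # Drop pairs that are too short or too long on either side.
        if not (min_len <= n_src <= max_len and min_len <= n_tgt <= max_len):
            continue
        # Drop pairs whose lengths are badly mismatched.
        if max(n_src, n_tgt) / min(n_src, n_tgt) > max_ratio:
            continue

        # Re-join tokens so that repeated whitespace is normalized.
        kept.append((" ".join(src_tokens), " ".join(tgt_tokens)))
    return kept
```

Calling `filter_parallel(pairs, max_len=5)` on a list of (Sanskrit, Hindi) string pairs returns only the pairs in which both sides have between 1 and 5 tokens. Filtering both sides together keeps the parallel corpus aligned line by line, which the later training steps depend on.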

In the third phase of experiments, we obtained a BLEU score of 57. We also carried out a human evaluation of the last-phase experiment. The Sanskrit-Hindi MT system was evaluated by three evaluators, who judged the MT output on adequacy and fluency. Adequacy and fluency are calculated from scores between 1 and 5 given by the evaluators. The system achieved 91% adequacy and 66.72% fluency.
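The paper does not state how the 1-5 ratings are turned into percentages. One common convention, assumed here purely for illustration, is to divide the mean rating by the maximum rating. The function name `rating_percentage` and the sample scores are hypothetical and are not data from the paper.

```python
def rating_percentage(scores_by_evaluator, max_score=5):
    """Convert 1..max_score ratings from several evaluators into a percentage.

    scores_by_evaluator -- list of lists, one list of integer ratings per evaluator
    max_score           -- top of the rating scale (5 in the paper)

    Assumption: percentage = 100 * mean rating / max_score.
    """
    # Pool the ratings of all evaluators into one flat list.
    all_scores = [s for scores in scores_by_evaluator for s in scores]
    if not all_scores:
        raise ValueError("no ratings supplied")
    for s in all_scores:
        if not 1 <= s <= max_score:
            raise ValueError(f"rating {s} is outside 1..{max_score}")
    return 100.0 * sum(all_scores) / (max_score * len(all_scores))


# Hypothetical ratings from three evaluators for two output sentences each.
adequacy = rating_percentage([[5, 4], [5, 5], [4, 4]])
```

Under this convention a rating of 1 on every sentence already yields 20%, so the percentages are not directly comparable with schemes that rescale the 1-5 range to 0-100.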

The MT system produced several kinds of errors. During the linguistic evaluation, we found that the system is not able to produce correct target-language output for Karaka relational sentences, complex sentences, and words involving compounding and Sandhi, which reduced the system's accuracy to around 68.43% (Pandey 2016). This happens because the system was trained on a very small corpus. For this reason, the system is not able to generate the correct output. For example: [...]

SaHiT translates Sanskrit text into Hindi. It gives decent results compared with previous rule-based MT systems and others. It currently achieves 91% adequacy and 66.72% fluency. In future, we will collect more data to train an NMT system and will also work on improving translation quality for complex and long sentences, compounding, and related problems.

http://mpinfo.org/News/SanskritNews.aspx
