Spoken Language Translation (SLT) is becoming more widely used as a communication tool that helps cross language barriers. One of the challenges of SLT is translating from a language without gender agreement to a language with gender agreement, such as English to Arabic. In this paper, we introduce an approach to tackle this limitation by enabling a Neural Machine Translation (NMT) system to produce gender-aware translations. We show that an NMT system can model speaker/listener gender information to produce gender-aware translations. We also propose a method to generate the data used to adapt an NMT system for this purpose. The proposed approach achieves a significant improvement in translation quality of 2 BLEU points.
Nearly half of the world's languages have a grammatical gender system. For native speakers of these languages, violations of gender agreement are associated with difficulty in comprehension. In one study [1], gender agreement violations resulted in a delay of 500 to 700 ms in response time while reading Spanish sentences. A similar study [2] reached analogous conclusions for spoken language comprehension. These findings suggest that gender agreement violations place an additional cognitive load on the listener.
In conversational settings, pronouns are frequently used to refer to the speaker or to address the listener(s). Pronominal gender agreement is particularly challenging for machine translation (MT), especially when the source language does not have gender agreement while the target language does, as is the case for English-to-Arabic translation. The focus of this paper is to enable an SLT system to produce gender-aware translations for both parties participating in a conversation.
For instance, let us consider an SLT session involving English and French participants. If an English speaker says “I am certain”, the appropriate translation of the adjective “certain” into French depends on the speaker's gender, since French has a grammatical gender system. For a male speaker the correct translation is “Je suis certain”, while “Je suis certaine” is the correct form for a female speaker. Similarly, in Arabic, “I am certain” should be translated to "أنا متأكد" (?na mt?kd) or "أنا متأكدة" (?na mt?kdt) for a male or female speaker respectively. The listener's gender affects the translation as well. Consider the translation of “You said it” into Arabic: for a male listener it should be "أنت قلته" (?nt qlth), while for a female listener the correct translation becomes "أنت قلتيه" (?nt qltyh). Since the listener is also a speaker in a conversational setting, the term “speakers' gender agreement” here refers to both speaker-dependent and listener-dependent gender agreement, unless making the distinction is necessary for the clarity of the presentation.
To assess the prevalence of speakers' gender agreement in SLT, we randomly selected 1000 sentences from the English-Arabic OpenSubtitles data [3]. These sentences were manually analyzed for speaker-dependent or listener-dependent gender agreement. More than half of the sample contained at least one form of gender dependency; however, a smaller number of sentences had both speaker and listener dependency. Detailed findings are given in Table 1. We also observed that listener dependency is considerably more common than speaker dependency. Fortunately, speaker gender determination from speech has reached high accuracy even for relatively short speech segments [4], so we can rely on having this information at runtime. However, training an SLT system to generate gender-aware translations requires gender-tagged parallel sentences. This is particularly important for the pipelined approach commonly used in large-scale SLT systems, which combines a speech recognition component followed by machine translation. A promising direction is training end-to-end speech-to-speech translation systems [5], which are trained on source-language audio and produce target-language audio (or text). In such a setting, the speaker's gender information can be easily extracted from the source-language audio; however, the listener's gender information would still be required to produce gender-aware SLT.
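As an illustration of why speaker-gender information can reasonably be assumed to be available at runtime, the sketch below guesses the speaker's gender from the median fundamental frequency of a speech segment using a plain autocorrelation pitch estimator. This is not the method of [4], and the function names, the 40 ms frame length, and the 165 Hz threshold are assumptions made only for this illustration; a production system would use a trained classifier.

```python
# Crude, illustrative sketch of runtime speaker-gender estimation from audio
# using median-pitch thresholding. NOT the method of [4]; names, frame length,
# and the 165 Hz threshold are assumptions for illustration only.
import numpy as np

def estimate_f0(frame: np.ndarray, sr: int,
                fmin: float = 75.0, fmax: float = 300.0) -> float:
    """Estimate the fundamental frequency of one frame via autocorrelation."""
    frame = frame - frame.mean()
    corr = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min, lag_max = int(sr / fmax), int(sr / fmin)
    if lag_max >= len(corr):
        return 0.0
    lag = lag_min + int(np.argmax(corr[lag_min:lag_max]))
    return sr / lag if corr[lag] > 0 else 0.0

def guess_speaker_gender(signal: np.ndarray, sr: int,
                         frame_ms: int = 40, threshold_hz: float = 165.0) -> str:
    """Label a speech segment "F" or "M" by comparing its median F0 to a threshold."""
    frame_len = int(sr * frame_ms / 1000)
    f0s = []
    for start in range(0, len(signal) - frame_len, frame_len):
        f0 = estimate_f0(signal[start:start + frame_len], sr)
        if f0 > 0:                      # skip frames with no usable pitch estimate
            f0s.append(f0)
    if not f0s:
        return "unknown"
    return "F" if np.median(f0s) >= threshold_hz else "M"
```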
One of the main challenges in training gender-aware SLT is finding a large gender-tagged parallel corpus that has both the speaker's and the listener's gender information. To address this challenge, we propose an approach to automatically label a parallel conversational corpus with gender information. Applying this approach to the OpenSubtitles dataset produced the training data needed for this work. The proposed approach uses a part-of-speech tagger and a set of rules to automatically tag sentences with speaker and listener genders. The tagged sentences are used to adapt a baseline neural MT system trained with sequence-to-sequence training with attention. This baseline system is trained using both gender-dependent and gender-independent sentences, then adapted using the sentences with identified gender dependence.
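To make the labelling idea concrete, the following is a minimal sketch of rule-based gender tagging of an English-Arabic sentence pair. It does not reproduce the paper's actual rule set or tagger: the surface-suffix heuristics, the pronoun lists, and the tag tokens (<SPK_M>, <SPK_F>, <LST_M>, <LST_F>) are assumptions made for illustration, and a real pipeline would rely on a proper Arabic part-of-speech tagger or morphological analyzer rather than suffix matching.

```python
# Minimal, illustrative sketch of rule-based speaker/listener gender labelling
# for an English-Arabic sentence pair. The suffix heuristics, pronoun lists,
# and tag names below are assumptions for illustration only; the paper's
# actual approach uses a part-of-speech tagger and its own rule set.

FIRST_PERSON = {"أنا"}                 # "I"   -> speaker-dependent context
SECOND_PERSON = {"أنت", "انت"}         # "you" -> listener-dependent context
FEMININE_SUFFIXES = ("ة", "تي", "يه")  # crude feminine surface markers

def tag_pair(en_src: str, ar_tgt: str) -> str:
    """Prepend speaker/listener gender tags to the English source when the
    Arabic target shows a gendered form next to a 1st/2nd person pronoun."""
    tokens = ar_tgt.split()
    tags = []
    for i, tok in enumerate(tokens[:-1]):
        feminine = tokens[i + 1].endswith(FEMININE_SUFFIXES)
        if tok in FIRST_PERSON:
            tags.append("<SPK_F>" if feminine else "<SPK_M>")
        elif tok in SECOND_PERSON:
            tags.append("<LST_F>" if feminine else "<LST_M>")
    # Deduplicate while preserving order; untagged pairs are left unchanged.
    return " ".join(dict.fromkeys(tags)) + " " + en_src if tags else en_src

if __name__ == "__main__":
    print(tag_pair("I am certain", "أنا متأكدة"))   # -> <SPK_F> I am certain
    print(tag_pair("You said it", "أنت قلته"))      # -> <LST_M> You said it
```

Prepending such pseudo-tokens to the source sentence is one common way to expose speaker/listener gender to a sequence-to-sequence model during adaptation; whether the authors use exactly this encoding is not stated in this section.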
The contribution of this paper is twofold: enabling NMT systems to produce gender-aware translations and providing a method to generate the data needed to achieve that. The remainder of this paper is structured as follows. Section 2 reviews related work on speaker gender determination from speech. Section 3 describes the sentence labelling process used to extract speaker-gender-dependent and listener-gender-dependent utterances. Section 4 outlines the NMT training and testing setup. Section 5 summarizes the experiments we conducted, and Section 6 concludes the paper.
Humans can easily identify the gender of the speaker from their voice.