Inferring Shallow-Transfer Machine Translation Rules from Small Parallel Corpora

This paper describes a method for the automatic inference of structural transfer rules, to be used in a shallow-transfer machine translation (MT) system, from small parallel corpora. The structural transfer rules are based on alignment templates, like those used in statistical MT. Alignment templates are extracted from sentence-aligned parallel corpora and extended with a set of restrictions that are derived from the bilingual dictionary of the MT system and that control their application as transfer rules. The experiments conducted on three different language pairs in the free/open-source MT platform Apertium show that translation quality improves over word-for-word translation (when no transfer rules are used), and that the resulting translation quality is close to that obtained with hand-coded transfer rules. The method we present is entirely unsupervised and benefits from information in the rest of the modules of the MT system in which the inferred rules are applied.


💡 Research Summary

The paper presents a fully unsupervised method for automatically inferring structural transfer rules for shallow‑transfer machine translation (MT) systems from very small parallel corpora. The approach builds on alignment templates, which are widely used in statistical MT, but adapts them to the rule‑based architecture of the open‑source Apertium platform. The workflow can be divided into four main stages:

1. A sentence‑aligned parallel corpus, typically only a few thousand sentence pairs, is processed with a standard word‑alignment tool (e.g., GIZA++, which implements the IBM alignment models) to obtain bilingual token correspondences.
2. Contiguous groups of aligned tokens are abstracted into alignment templates that capture the source‑side pattern and the corresponding target‑side pattern, together with lexical and part‑of‑speech information.
3. The bilingual dictionary that already exists in the Apertium system is consulted to generate a set of restrictions for each template. These restrictions encode morphological and syntactic constraints, such as verb tense, agreement features, case, or gender, ensuring that a template is applied only when the surrounding linguistic context satisfies the conditions derived from the dictionary.
4. The templates, together with their restrictions, are compiled into transfer rules that are inserted into Apertium's transfer module, where they are invoked after lexical lookup and before generation.
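The template-extraction stage can be sketched in a few lines. The function below is an illustrative simplification, not the paper's implementation: it uses the standard phrase-pair consistency criterion from statistical MT to find contiguous span pairs whose alignment links do not leave the span, and, for brevity, generalizes words to their part-of-speech tags only (the paper's templates also retain lexical information). All names and data structures are assumptions made for this sketch.

```python
from collections import Counter

def extract_templates(src_tokens, tgt_tokens, alignment, max_len=4):
    """Extract simplified alignment templates from one aligned sentence pair.

    src_tokens / tgt_tokens: lists of (surface, pos) pairs.
    alignment: set of (i, j) links between source position i and target j.
    Returns a Counter over (source POS pattern, target POS pattern, links).
    """
    templates = Counter()
    n = len(src_tokens)
    for i1 in range(n):
        for i2 in range(i1, min(i1 + max_len, n)):
            # Target positions linked to the source span [i1, i2]
            linked = [j for (i, j) in alignment if i1 <= i <= i2]
            if not linked:
                continue
            j1, j2 = min(linked), max(linked)
            # Consistency: no link from the target span may leave the source span
            if any(i < i1 or i > i2 for (i, j) in alignment if j1 <= j <= j2):
                continue
            src_pat = tuple(pos for _, pos in src_tokens[i1:i2 + 1])
            tgt_pat = tuple(pos for _, pos in tgt_tokens[j1:j2 + 1])
            links = frozenset((i - i1, j - j1) for (i, j) in alignment
                              if i1 <= i <= i2)
            templates[(src_pat, tgt_pat, links)] += 1
    return templates
```

For the Spanish–English pair "gato negro" / "black cat" with crossing links, the sketch yields, among others, the reordering template (noun adjective → adjective noun), which is exactly the kind of local structural pattern a shallow-transfer rule encodes.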

The authors evaluate the method on three language pairs: Spanish‑Catalan, Spanish‑Aragonese, and Spanish‑French. For each pair they construct a parallel corpus of roughly 5 000 sentences, which is deliberately kept small to simulate low‑resource scenarios. They compare three systems: (1) a baseline that performs only word‑for‑word translation using the bilingual dictionary (no transfer rules), (2) the same baseline augmented with the automatically inferred transfer rules, and (3) a reference system that uses hand‑crafted transfer rules created by experienced developers. Translation quality is measured with BLEU and TER. The results show that the automatically generated rules raise BLEU scores by 5–7 points over the baseline and reduce TER by about 7 % on average. Moreover, the performance gap between the automatically induced rules and the hand‑crafted rules is remarkably small—typically less than 1.5 BLEU points—demonstrating that the method can approach expert‑level quality even with a modest amount of data.

A key insight of the work is that the bilingual dictionary, already an integral part of any Apertium installation, can be leveraged not only for lexical lookup but also as a source of linguistic constraints that dramatically prune the space of possible rule applications. Because this reuses a resource the MT system already contains, the method gains linguistic knowledge without requiring any additional annotation, and the pipeline remains fully unsupervised. The authors also discuss the limitations of their approach. Because the method relies on the quality of word alignments, noisy alignments in very small corpora can lead to incorrect templates. Furthermore, the current implementation is optimized for morphologically rich languages with relatively fixed word order; extending it to languages with highly flexible order or agglutinative morphology (e.g., Japanese, Korean) would require additional preprocessing or more sophisticated constraint generation.
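The role of the dictionary-derived restrictions can be illustrated with a minimal sketch. The check below is a simplification invented for this summary, not Apertium's actual data format or the paper's exact mechanism: a template is allowed to fire only when, for every word it covers, the morphological features the template fixes on the target side are compatible with the features the bilingual dictionary assigns to that word's translation.

```python
def template_applies(template, src_phrase, bidix):
    """Return True if the dictionary licenses this template for src_phrase.

    template: dict with "target_features", a list (one entry per covered
              word) of feature requirements, e.g. {"gender": "f"}.
    src_phrase: list of dicts with a "lemma" key.
    bidix: maps a source lemma to (target_lemma, target_features).
    All structures here are illustrative assumptions.
    """
    for word, required in zip(src_phrase, template["target_features"]):
        entry = bidix.get(word["lemma"])
        if entry is None:
            return False  # no dictionary entry: fall back to word-for-word
        _, dict_features = entry
        # Every feature the template fixes must match the dictionary value
        if any(dict_features.get(k) != v for k, v in required.items()):
            return False
    return True
```

For instance, a template that imposes feminine gender on its target noun would apply to Spanish "casa" (whose dictionary translation is feminine) but be blocked for "gato" (masculine), so the restriction prunes incorrect applications without any human annotation.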

Future work outlined in the paper includes integrating neural alignment models to improve alignment robustness, exploring multi‑language dictionaries to generate cross‑lingual constraints, and developing automatic validation techniques (e.g., confidence scoring or post‑editing feedback) to filter out low‑quality rules. The authors also suggest that the inferred rules could be combined with neural MT components in a hybrid system, where rule‑based transfer handles well‑understood linguistic phenomena while a neural decoder deals with idiomatic or out‑of‑vocabulary content.

In summary, the study demonstrates that high‑quality shallow‑transfer rules can be automatically induced from tiny parallel corpora by coupling alignment‑template extraction with dictionary‑derived restrictions. This approach dramatically reduces the manual effort required to build rule‑based MT systems for low‑resource language pairs, while delivering translation quality that rivals hand‑engineered solutions. The method’s compatibility with the existing Apertium pipeline and its fully unsupervised nature make it a promising candidate for rapid prototyping of new language pairs and for integration into hybrid MT architectures.