AfriNLLB: Efficient Translation Models for African Languages


In this work, we present AfriNLLB, a series of lightweight models for efficient translation from and into African languages. AfriNLLB supports 15 language pairs (30 translation directions), including Swahili, Hausa, Yoruba, Amharic, Somali, Zulu, Lingala, Afrikaans, Wolof, and Egyptian Arabic, as well as other African Union official languages such as Arabic (MSA), French, Portuguese, and Spanish. Our training data covers bidirectional translation between English and 13 languages, and between French and two languages (Lingala and Wolof). AfriNLLB models are based on NLLB-200 600M, which we compress using iterative layer pruning and quantization. We fine-tune the pruned models on parallel corpora we curated for African languages, employing knowledge distillation from a larger teacher model. Our work aims at enabling efficient deployment of translation models for African languages in resource-constrained settings. Our evaluation results demonstrate that AfriNLLB models achieve performance comparable to the baseline while being significantly faster. We release two versions of the AfriNLLB models, a Transformers version that allows further fine-tuning and a CTranslate2 version for efficient inference. Moreover, we release all the training data that we used for fine-tuning the baseline and pruned models to facilitate further research.


💡 Research Summary

AfriNLLB introduces a suite of lightweight machine‑translation models specifically designed for African languages. Building on Meta’s NLLB‑200 600‑million‑parameter model, the authors apply iterative decoder‑layer pruning and FP16 quantization to drastically reduce model size and inference latency while preserving translation quality. The system supports 15 language pairs (30 directions), covering ten native African languages—Swahili, Hausa, Yoruba, Amharic, Somali, Zulu, Lingala, Afrikaans, Wolof, and Egyptian Arabic—and five African Union official languages (Modern Standard Arabic, French, Portuguese, Spanish, and English).

Data collection is a major contribution. The team harvested parallel corpora from OPUS, HuggingFace, GitHub, and other public sources, initially gathering 1.2 million sentence pairs for the African language pairs (English↔13 languages, French↔2 languages) plus additional data for the high‑resource pairs. A four‑stage cleaning pipeline — rule‑based filtering, language detection (AfroLID for African languages, fastText for others), semantic‑similarity filtering with Sentence‑Transformers (LaBSE for African languages, DistilUSE for high‑resource pairs), and reference‑free quality estimation (COMET for high‑resource pairs, AfriCOMET‑QE‑STL for African languages) — yields 1.6 million high‑quality sentence pairs (≈3.2 million training examples once both directions are included). This careful curation mitigates the noise that typically plagues low‑resource MT datasets.
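The staged filtering described above can be sketched as a simple cascade. This is an illustrative reconstruction, not the authors' code: only the rule‑based stage is implemented concretely, while the language‑ID, semantic‑similarity, and quality‑estimation stages (AfroLID/fastText, LaBSE/DistilUSE, and COMET/AfriCOMET‑QE‑STL in the paper) are represented by pluggable scoring callables, and all thresholds are assumptions.

```python
from typing import Callable, Iterable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def rule_based_ok(src: str, tgt: str,
                  min_len: int = 3, max_len: int = 200,
                  max_ratio: float = 3.0) -> bool:
    """Stage 1: drop too-short/too-long or length-mismatched pairs."""
    ls, lt = len(src.split()), len(tgt.split())
    if ls < min_len or lt < min_len or ls > max_len or lt > max_len:
        return False
    return max(ls, lt) / max(min(ls, lt), 1) <= max_ratio

def clean(pairs: Iterable[Pair],
          lang_id_ok: Callable[[str, str], bool] = lambda s, t: True,
          sim_score: Callable[[str, str], float] = lambda s, t: 1.0,
          qe_score: Callable[[str, str], float] = lambda s, t: 1.0,
          sim_threshold: float = 0.7,
          qe_threshold: float = 0.5) -> List[Pair]:
    """Run the four stages in order; a pair must survive all of them."""
    kept, seen = [], set()
    for src, tgt in pairs:
        if (src, tgt) in seen:                   # exact-duplicate removal
            continue
        seen.add((src, tgt))
        if not rule_based_ok(src, tgt):          # stage 1: rule-based filters
            continue
        if not lang_id_ok(src, tgt):             # stage 2: language detection
            continue
        if sim_score(src, tgt) < sim_threshold:  # stage 3: semantic similarity
            continue
        if qe_score(src, tgt) < qe_threshold:    # stage 4: quality estimation
            continue
        kept.append((src, tgt))
    return kept
```

Because each stage only removes pairs, the cascade order matters for cost: cheap rule‑based checks run first so the expensive model‑based scorers see fewer candidates.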

Model adaptation proceeds in two phases. First, the NLLB‑200 600M model is fine‑tuned on the curated African data, establishing a strong in‑domain baseline for the target languages. Next, iterative layer pruning is performed exclusively on decoder layers: at each iteration, the authors evaluate the impact of removing each remaining layer on chrF++ scores (using the Flores200 dev set where African languages are targets) and prune the layer with the smallest degradation. They experiment with removing 4, 6, and 8 decoder layers, yielding models of 548 M, 498 M, and 448 M parameters respectively. After pruning, a short (one‑epoch) fine‑tuning restores most of the lost quality.
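The greedy importance‑guided loop can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: `score_fn` stands in for re‑evaluating chrF++ on the Flores200 dev set with a candidate layer temporarily removed.

```python
from typing import Callable, List

def iterative_prune(layers: List[int],
                    n_remove: int,
                    score_fn: Callable[[List[int]], float]) -> List[int]:
    """Greedily remove `n_remove` layers; each iteration drops the layer
    whose removal degrades the validation score the least."""
    kept = list(layers)
    for _ in range(n_remove):
        # Evaluate every candidate subset with one layer removed and
        # keep the subset that scores best (least degradation).
        kept = max(
            ([l for l in kept if l != cand] for cand in kept),
            key=score_fn,
        )
    return kept
```

Note the cost: each of the `n_remove` iterations requires one dev‑set evaluation per remaining layer, which is why the paper restricts the search to decoder layers.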

To further close the performance gap, sequence‑level knowledge distillation is employed. A larger NLLB‑200 3.3 B teacher generates synthetic translations for the same training data; after deduplication and the same four‑stage filtering, 568 k synthetic segments are mixed with authentic data for a second round of fine‑tuning. This step particularly benefits the low‑resource African pairs, where the distilled student models recover or even surpass the baseline quality.
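A hedged sketch of this data step: the teacher (NLLB‑200 3.3B in the paper) re‑translates the training sources, and the synthetic pairs are deduplicated against the authentic data before mixing. `teacher_translate` is a placeholder for real teacher inference, and the paper's additional four‑stage filtering of the synthetic output is noted but not implemented here.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, str]  # (source sentence, target sentence)

def build_distillation_mix(authentic: List[Pair],
                           teacher_translate: Callable[[str], str]) -> List[Pair]:
    """Sequence-level KD data: mix authentic pairs with deduplicated
    teacher-generated translations of the same sources."""
    synthetic = [(src, teacher_translate(src)) for src, _ in authentic]
    seen = set(authentic)
    deduped = []
    for pair in synthetic:
        if pair not in seen:  # drop synthetic pairs identical to authentic ones
            seen.add(pair)
            deduped.append(pair)
    # In the paper, the synthetic pairs also pass through the same
    # four-stage cleaning pipeline before mixing; omitted here for brevity.
    return authentic + deduped
```

Mixing rather than replacing keeps the authentic references as an anchor while the teacher's simpler, more regular outputs give the pruned student an easier distribution to fit.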

Evaluation uses the Flores200 benchmark (dev for validation, devtest for final testing). Metrics include BLEU, chrF++, and COMET (AfriCOMET for African languages). Results show that the pruned 548 M model (four decoder layers removed) achieves average chrF++ scores within 0.5 % of the original 600 M baseline while delivering a 23 % speedup in tokens‑per‑second throughput. With FP16 quantization applied, throughput improves by up to 57 % with negligible quality loss. Across language directions, BLEU scores remain in the mid‑30s for English↔African pairs and mid‑20s for the French↔African pairs, comparable to the fine‑tuned baseline.

Ablation studies compare (i) middle‑layer pruning (removing a contiguous block of layers) versus importance‑guided iterative pruning, (ii) pruning encoder layers in addition to decoder layers, and (iii) varying the number of pruned decoder layers. Importance‑guided pruning consistently outperforms middle‑layer removal, and retaining all encoder layers proves beneficial, echoing findings from speech‑model compression literature.

Two deployment formats are released: a HuggingFace Transformers checkpoint for further research and fine‑tuning, and a CTranslate2 binary optimized for fast inference on GPUs (e.g., NVIDIA A40) with beam size 3 and large batch sizes. All code, processed datasets, and model weights are publicly available on GitHub and HuggingFace, encouraging reproducibility and community‑driven extensions.
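A hypothetical usage sketch for the CTranslate2 release, following the standard CTranslate2 + NLLB recipe (SentencePiece pieces plus a source‑language token on the input, and the target‑language token as a decoding prefix). The model directory and tokenizer path are placeholders, not the actual released filenames; the beam size matches the paper's setting.

```python
from typing import List

def nllb_source_tokens(pieces: List[str], src_lang: str) -> List[str]:
    """NLLB-style input: source-language token first, </s> last."""
    return [src_lang] + pieces + ["</s>"]

def translate(sentences: List[str], src_lang: str, tgt_lang: str,
              model_dir: str = "afrinllb-ct2",      # placeholder path
              spm_path: str = "spm.model") -> List[str]:
    import ctranslate2                 # third-party; imported lazily
    import sentencepiece as spm
    sp = spm.SentencePieceProcessor(model_file=spm_path)
    translator = ctranslate2.Translator(model_dir, device="auto")
    batch = [nllb_source_tokens(sp.encode(s, out_type=str), src_lang)
             for s in sentences]
    results = translator.translate_batch(
        batch,
        target_prefix=[[tgt_lang]] * len(batch),  # e.g. "swh_Latn"
        beam_size=3,                              # paper's decoding setting
    )
    # Drop the target-language prefix token, then detokenize.
    return [sp.decode(r.hypotheses[0][1:]) for r in results]
```

Larger batches amortize per‑call overhead on a data‑center GPU such as the A40 used in the paper, which is where CTranslate2's throughput advantage over the Transformers checkpoint shows up most clearly.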

Limitations include the lack of a suitable sentence‑embedding model for Lingala (preventing semantic filtering for that language), limited exploration of encoder‑layer pruning, and the absence of real‑world deployment benchmarks on edge devices or low‑power environments. Future work may address these gaps, expand language coverage, explore multimodal translation, and incorporate user‑feedback loops for continual learning.

In summary, AfriNLLB demonstrates that large multilingual models can be systematically compressed and adapted to serve low‑resource African languages without sacrificing translation quality, providing a practical, open‑source foundation for deploying MT services in resource‑constrained settings across the continent.

