TRepLiNa: Layer-wise CKA+REPINA Alignment Improves Low-Resource Machine Translation in Aya-23 8B


The 2025 Multimodal Models for Low-Resource Contexts and Social Impact (MMLoSo) Language Challenge addresses one of India’s most pressing linguistic gaps: the lack of resources for its diverse low-resource languages (LRLs). In this study, we investigate whether enforcing cross-lingual similarity in specific internal layers of a decoder-only multilingual large language model (LLM) can improve translation quality from an LRL into a high-resource language (HRL). Specifically, we combine Centered Kernel Alignment (CKA), a similarity metric that encourages representations of different languages to align, with REPINA, a regularization method that constrains parameter updates to remain close to the pretrained model, into a joint method we call TRepLiNa. We experiment with zero-shot, few-shot, and fine-tuning settings using Aya-23 8B with QLoRA across the MMLoSo shared-task language pairs (Mundari, Santali, Bhili) with Hindi/English pivots. Our results show that aligning mid-level layers with TRepLiNa (CKA+REPINA) is a low-cost, practical way to improve LRL translation, especially in data-scarce settings.


💡 Research Summary

This paper, titled “TRepLiNa: Layer-wise CKA+REPINA Alignment Improves Low-Resource Machine Translation in Aya-23 8B,” presents a novel method to enhance translation from low-resource languages (LRLs) to high-resource languages (HRLs) by explicitly aligning their internal representations within a multilingual large language model (LLM). The work is situated within the context of the MMLoSo 2025 challenge, which focuses on bridging linguistic resource gaps for Indian LRLs like Bhili, Mundari, Santali, and Gondi, using Hindi and English as pivot HRLs.

The core hypothesis is that multilingual decoder-only LLMs, such as Aya-23 8B, develop language-agnostic representations in their intermediate layers. The authors propose that actively guiding LRL representations to align with these shared, stable HRL representations at specific layers should improve cross-lingual transfer for translation. To operationalize this, they introduce TRepLiNa, a joint training objective that combines Centered Kernel Alignment (CKA) and Representation Projection Invariance (REPINA). CKA acts as a similarity loss that pulls the hidden states of parallel LRL and HRL sentences closer together at a chosen layer. REPINA acts as a stabilizer, penalizing deviations of the HRL representations from their original, pre-trained states, thus preventing the degradation of HRL knowledge while the LRL aligns to it.
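The joint objective described above can be sketched as a per-layer penalty added to the usual translation loss: a CKA term pulling LRL hidden states toward HRL hidden states, plus a REPINA-style term keeping the HRL states near their pretrained values. The sketch below is a minimal illustration using linear CKA over a mini-batch; the weight names `lam` and `beta` are illustrative, not the paper's notation.

```python
import numpy as np

def linear_cka(X, Y):
    """Linear CKA between two representation matrices (rows = sentences)."""
    X = X - X.mean(axis=0, keepdims=True)  # center features
    Y = Y - Y.mean(axis=0, keepdims=True)
    num = np.linalg.norm(X.T @ Y, "fro") ** 2
    den = np.linalg.norm(X.T @ X, "fro") * np.linalg.norm(Y.T @ Y, "fro")
    return num / den

def treplina_penalty(h_lrl, h_hrl, h_hrl_pretrained, lam=1.0, beta=1.0):
    """Alignment term added to the translation loss at one chosen layer.

    h_lrl / h_hrl are hidden states of parallel LRL / HRL sentences at
    that layer; h_hrl_pretrained are the frozen pretrained HRL states.
    """
    cka_loss = 1.0 - linear_cka(h_lrl, h_hrl)               # pull LRL toward HRL
    repina_loss = np.mean((h_hrl - h_hrl_pretrained) ** 2)  # keep HRL stable
    return lam * cka_loss + beta * repina_loss
```

In a real training loop this penalty would be computed on the model's differentiable hidden states (e.g. in PyTorch) and summed with the token-level cross-entropy loss.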

Experiments were conducted in two main phases. First, a layer-wise sweep on a small subset of the data (1k sentence pairs) identified the most effective layer for alignment. While CKA alone peaked at layer 10, the combined TRepLiNa (CKA+REPINA) objective performed best at a deeper layer, 15. This suggests that REPINA's stabilization allows effective alignment at higher layers, where CKA alone might cause harmful drift in the HRL features.
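The phase-1 sweep amounts to scoring each candidate layer on a dev set and keeping the best one. A minimal sketch, where `score_at_layer` is a hypothetical callable standing in for the short train-and-evaluate run at each layer:

```python
def layerwise_sweep(num_layers, score_at_layer):
    """Score alignment at each candidate layer and return the best.

    score_at_layer(layer) is a stand-in for briefly fine-tuning with the
    alignment loss applied at that layer and evaluating on a dev set.
    """
    scores = {layer: score_at_layer(layer) for layer in range(num_layers)}
    best = max(scores, key=scores.get)
    return best, scores
```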

In the second phase, QLoRA fine-tuning was run on approximately 20k sentence pairs for up to 5 epochs, applying TRepLiNa at the optimal layer identified in the sweep (15). The method was evaluated against strong baselines: standard QLoRA fine-tuning (NoAlign), REPINA-only regularization, and prompt-based (zero-shot and few-shot) approaches. TRepLiNa outperformed these baselines on three of the four language pairs (Mundari→Hindi, Gondi→Hindi, Santali→English), with consistent gains in the composite score (0.6·BLEU + 0.4·ChrF). On Santali→English, for instance, TRepLiNa reached a BLEU score of 25.24, a 2.27x relative improvement over a prior reported result. The sole exception was Bhili→Hindi, where REPINA-only slightly outperformed TRepLiNa. The authors attribute this to the typological proximity of Bhili and Hindi: for closely related languages, strong alignment pressure may oversmooth beneficial language-specific features, warranting a lower CKA weight (λ).
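The composite metric quoted above is a fixed weighted sum of BLEU and ChrF. A minimal sketch (the weights are the ones reported; the example scores are made up for illustration):

```python
def composite_score(bleu, chrf, w_bleu=0.6, w_chrf=0.4):
    """Composite evaluation score: 0.6*BLEU + 0.4*ChrF."""
    return w_bleu * bleu + w_chrf * chrf

# Hypothetical scores, for illustration only
print(composite_score(20.0, 30.0))  # → 24.0
```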

The study concludes that layer-wise alignment, particularly using the TRepLiNa framework at mid-level layers, is a low-cost and practical strategy for improving LRL-to-HRL translation in data-scarce scenarios. Key takeaways include the importance of stabilizing the HRL representation during alignment and the need to adjust the alignment strength based on the linguistic distance between the source and target languages. Limitations and future work include exploring alternative similarity metrics, more sophisticated hyperparameter scheduling, and human evaluation of translation quality.

