Can you map it to English? The Role of Cross-Lingual Alignment in Multilingual Performance of LLMs

Notice: This research summary and analysis were generated automatically using AI. For full accuracy, please refer to the original arXiv source.

Large language models (LLMs) can answer prompts in many languages, despite being trained predominantly on English; yet the mechanisms driving this generalization remain poorly understood. This work asks: how does an LLM’s ability to align representations of non-English inputs to English impact its performance on natural language understanding (NLU) tasks? We study the role of representation alignment in instance-level task decisions, complementing prior analyses conducted both at the language level and task-independently. We introduce the Discriminative Alignment Index (DALI) to quantify instance-level alignment across 24 languages other than English and three distinct NLU tasks. Results show that incorrect NLU predictions are strongly associated with lower representation alignment with English in the model’s middle layers. Through activation patching, we show that incorrect predictions in languages other than English can be fixed by patching in their parallel English activations in the middle layers, thereby demonstrating the causal role of representation (mis)alignment in cross-lingual correctness.


💡 Research Summary

This paper investigates how the ability of a large language model (LLM) to align non‑English inputs with their English counterparts influences its performance on natural language understanding (NLU) tasks. While LLMs are trained predominantly on English data, they often exhibit strong multilingual capabilities, yet the internal mechanisms enabling this transfer remain unclear. Prior work has measured cross‑lingual alignment at the language level using metrics such as MEXA, which correlates aggregate alignment scores with overall task accuracy. However, these studies do not address whether alignment at the level of individual test instances predicts correct answers.

To fill this gap, the authors introduce the Discriminative Alignment Index (DALI) and a stricter variant, DALI_st. For a given discriminative NLU instance (premise + multiple‑choice options), they extract hidden representations of each premise‑option pair in both English and a target language across all transformer layers. DALI assigns a binary score of 1 if the cosine similarity between each matched English‑target pair exceeds the similarity of every mismatched cross‑lingual pair; otherwise it is 0. DALI_st adds the requirement that the matched similarity also surpass all within‑language mismatched pairs, making the criterion more robust to anisotropic representation spaces. A task‑specific version of MEXA, called MEXA_T, is also defined for comparison.
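The binary scoring rule above can be sketched in a few lines of NumPy. This is one plausible reading of the summary's definition, not the authors' code: the function name `dali` and the exact comparison over option indices are assumptions.

```python
import numpy as np

def cosine(u, v):
    # Cosine similarity between two hidden-state vectors.
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

def dali(en_reps, tgt_reps, strict=False):
    """Binary instance-level alignment score (sketch of DALI / DALI_st).

    en_reps, tgt_reps: one hidden vector per premise-option pair, in the
    same option order for English and the target language. Returns 1 if
    every matched English-target pair is more similar than all mismatched
    cross-lingual pairs (and, with strict=True, also all within-language
    mismatched pairs, as in DALI_st); otherwise 0.
    """
    n = len(en_reps)
    for i in range(n):
        matched = cosine(en_reps[i], tgt_reps[i])
        for j in range(n):
            if j == i:
                continue
            # Mismatched cross-lingual pairs must score lower.
            if matched <= cosine(en_reps[i], tgt_reps[j]):
                return 0
            if matched <= cosine(en_reps[j], tgt_reps[i]):
                return 0
            # DALI_st: within-language mismatches must also score lower.
            if strict and (matched <= cosine(en_reps[i], en_reps[j])
                           or matched <= cosine(tgt_reps[i], tgt_reps[j])):
                return 0
    return 1
```

For a two-option instance with well-separated options, e.g. `en = [[1, 0], [0, 1]]` and `tgt = [[0.9, 0.1], [0.1, 0.9]]`, both variants return 1; swapping the target vectors makes every matched pair lose to a mismatched one and the score drops to 0.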

The empirical study spans 24 non‑English languages and three multilingual NLU benchmarks: Belebele (reading comprehension), XStoryCloze (narrative understanding), and XCOPA (commonsense reasoning). For each language and benchmark, instances are split into Transfer Success (TS) – correctly answered in both English and the target language – and Transfer Failure (TF) – correctly answered in English but wrong in the target language. DALI, DALI_st, and MEXA_T are computed for every transformer layer, yielding a binary alignment vector per instance. The layer with the highest overall alignment rate (λ_max) is identified, and the proportion of aligned instances at λ_max is compared between TS and TF groups using a one‑sided z‑test for proportions (α = 0.05). Across virtually all languages and tasks, TS instances show significantly higher alignment rates than TF instances, especially in middle layers (typically layers 8–12). The differences are statistically significant (p < 0.01), indicating that correct cross‑lingual transfer is strongly associated with better alignment in these intermediate representations.
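The TS-vs-TF comparison is a standard one-sided two-proportion z-test; a minimal standard-library implementation with a pooled standard error might look like the following (the helper name and argument order are ours, not the paper's):

```python
import math

def one_sided_prop_ztest(success_ts, n_ts, success_tf, n_tf):
    """One-sided two-proportion z-test with H1: p_TS > p_TF.

    success_*: number of aligned instances in each group at lambda_max;
    n_*: group sizes. Returns (z, p_value) using the pooled estimate of
    the common proportion under H0.
    """
    p_ts = success_ts / n_ts
    p_tf = success_tf / n_tf
    pooled = (success_ts + success_tf) / (n_ts + n_tf)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_ts + 1 / n_tf))
    z = (p_ts - p_tf) / se
    # Upper-tail p-value: survival function of the standard normal,
    # expressed via the complementary error function.
    p_value = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p_value
```

For example, if 80 of 100 TS instances but only 50 of 100 TF instances are aligned at λ_max, the test yields z ≈ 4.4 and p well below 0.01, the significance level the summary reports.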

To establish causality, the authors employ activation patching. For each TF instance, they run two forward passes: one in English (which yields the correct answer) and one in the target language (which yields an incorrect answer). They then replace the hidden activations at a specific layer λ of the target‑language pass with the corresponding activations from the English pass. The patched model’s prediction is examined; a successful patch flips the answer to the correct one. Patching is performed across all layers, and the flip rate is recorded. When the patched layer corresponds to the λ_max identified by DALI_st = 1, the flip rate exceeds 70%, whereas control patches using unrelated English instances (with the same answer token but different semantics) achieve flip rates below 15%. This stark contrast demonstrates that aligned middle‑layer representations are not merely correlated with success but are causal mediators of correct cross‑lingual predictions.
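The mechanics of activation patching can be illustrated on a toy forward pass. The four-layer tanh "model" below is a hypothetical stand-in for the LLM's decoder stack, used only to show where the cached English hidden state is injected; it is not the authors' setup.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for a stack of transformer layers: each "layer"
# is a fixed linear map followed by tanh, so the sketch stays
# self-contained without loading a real LLM.
WEIGHTS = [rng.standard_normal((8, 8)) / 8 for _ in range(4)]

def forward(x, patch_layer=None, patch_value=None):
    """Run the toy forward pass on input x.

    If patch_layer is set, the hidden state produced by that layer is
    overwritten with patch_value (e.g. the activation cached from the
    parallel English pass), and later layers continue from it.
    """
    h = x
    for i, w in enumerate(WEIGHTS):
        h = np.tanh(h @ w)
        if i == patch_layer:
            h = patch_value  # inject the cached English activation here
    return h
```

Because everything after the patched layer now operates on the English hidden state, patching the target-language pass at layer λ with the English activation from layer λ reproduces the English run's output exactly in this toy setting; in the real experiment, this is what lets a patch flip an incorrect prediction to the correct one.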

The paper also discusses the anisotropy of transformer embeddings, which can inflate cosine similarities even for unrelated sentences. By binarizing DALI scores, the authors mitigate spurious high similarities. They note that DALI’s reliance on a small set of mismatched pairs (especially in two‑option tasks) can lead to false positives, which DALI_st alleviates by incorporating intra‑language mismatches.

Key contributions and insights are:

  1. Introduction of instance‑level alignment metrics (DALI, DALI_st, MEXA_T) that capture how well a non‑English representation aligns with its English counterpart within a specific NLU instance.
  2. Empirical evidence that higher alignment in middle transformer layers correlates with successful transfer (TS) across a wide range of languages and tasks.
  3. Causal validation via activation patching, showing that injecting aligned English activations into the target‑language forward pass can repair erroneous predictions, pinpointing the middle layers as the critical locus of alignment.
  4. Analysis of methodological trade‑offs (binary vs. continuous scores, effect of anisotropy, number of options) and recommendations for future multilingual LLM design, such as explicitly encouraging English‑centric alignment during fine‑tuning or adding alignment‑regularization modules targeting middle layers.

Overall, the study advances our understanding of multilingual LLM behavior by linking cross‑lingual representation alignment to concrete task outcomes at the granularity of individual examples, and by providing a clear experimental framework for probing and manipulating this alignment. This work paves the way for more principled approaches to improve low‑resource language performance through targeted alignment interventions.

