Multilingual LLMs Struggle to Link Orthography and Semantics in Bilingual Word Processing
Bilingual lexical processing is shaped by the complex interplay of phonological, orthographic, and semantic features of two languages within an integrated mental lexicon. In humans, this is evident in the ease with which cognate words - words similar in both orthographic form and meaning (e.g., blind, meaning “sightless” in both English and German) - are processed, compared to the challenges posed by interlingual homographs, which share orthographic form but differ in meaning (e.g., gift, meaning “present” in English but “poison” in German). We investigate how multilingual Large Language Models (LLMs) handle such phenomena, focusing on English-Spanish, English-French, and English-German cognates, non-cognates, and interlingual homographs. Specifically, we evaluate their ability to disambiguate meanings and make semantic judgments, both when these word types are presented in isolation and when they appear in sentence contexts. Our findings reveal that while certain LLMs demonstrate strong performance in recognizing cognates and non-cognates in isolation, they exhibit significant difficulty in disambiguating interlingual homographs, often performing below random baselines. This suggests that LLMs rely heavily on orthographic similarities rather than semantic understanding when interpreting interlingual homographs. Further, we find that LLMs have difficulty retrieving word meanings, with performance on isolated disambiguation tasks showing no correlation with semantic understanding. Finally, we study how LLMs process interlingual homographs in incongruent sentences. We find that models adopt different strategies for English and non-English homographs, highlighting the lack of a unified approach to handling cross-lingual ambiguities.
💡 Research Summary
The paper investigates how multilingual large language models (LLMs) handle lexical phenomena that are central to bilingual cognition: cognates (words sharing form and meaning across languages), non‑cognates (different forms but same meaning), and interlingual homographs (identical orthography but divergent meanings). Using three language pairs—English‑Spanish, English‑French, and English‑German—the authors construct three experimental tasks. Task A asks models to retrieve the meaning of isolated words; Task B requires disambiguation of homographs presented without context; Task C provides semantically constrained sentences and asks the model to select the appropriate meaning. Five publicly available multilingual LLMs (including mBERT, XLM‑R, LLaMA‑2‑13B‑multilingual, BLOOM‑z, and GPT‑3.5‑turbo) are evaluated in zero‑shot and few‑shot settings, with accuracy, F1, and comparison to a random baseline as metrics. Results show that all models perform well on cognates and non‑cognates in isolation (≈85%+ accuracy), but struggle dramatically with homographs, often falling below the random baseline (≈30–45% accuracy). Even when strong sentence context is provided, performance improves only modestly, remaining under 60%. Moreover, high scores on the isolated‑meaning task do not correlate with success on the contextual disambiguation task, suggesting that orthographic pattern learning and semantic understanding are largely decoupled. Model‑specific analysis reveals a systematic bias: English homographs are interpreted using an “English‑most‑likely” heuristic, whereas non‑English homographs are resolved by selecting the most frequent cross‑language token, indicating divergent internal strategies. The authors conclude that current multilingual LLMs rely heavily on shared orthographic cues and lack a unified mechanism for cross‑lingual semantic alignment.
They propose future work to (1) train language‑specific meaning embeddings, (2) explicitly incorporate homograph‑type examples into pre‑training, and (3) develop architectures that jointly align form and meaning across languages. This study highlights a critical gap between human bilingual lexical processing and present‑day LLM capabilities, offering concrete directions for building more cognitively plausible multilingual AI systems.
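The "below the random baseline" comparison above depends on what chance performance means when different trials offer different numbers of answer options. A minimal sketch of that bookkeeping is shown below; the trial records and meanings are purely illustrative (not data from the paper), and this is not the authors' evaluation code.

```python
# Illustrative records for a homograph-disambiguation evaluation:
# the model's chosen meaning, the gold meaning, and how many answer
# options the trial offered. (Hypothetical data, not from the paper.)
trials = [
    {"word": "gift", "chosen": "present", "gold": "poison", "n_options": 2},
    {"word": "blind", "chosen": "sightless", "gold": "sightless", "n_options": 2},
    {"word": "rat", "chosen": "advice", "gold": "advice", "n_options": 2},
]

def accuracy(records):
    """Fraction of trials where the chosen meaning matches the gold meaning."""
    return sum(r["chosen"] == r["gold"] for r in records) / len(records)

def random_baseline(records):
    """Expected accuracy of uniform random guessing over each trial's options."""
    return sum(1 / r["n_options"] for r in records) / len(records)

acc, base = accuracy(trials), random_baseline(trials)
print(f"accuracy={acc:.2f}  random baseline={base:.2f}  "
      f"{'below' if acc < base else 'at/above'} chance")
```

A model is "below chance" when its accuracy falls under the per-trial expected guessing rate; averaging `1 / n_options` per trial keeps the baseline correct even when trials mix two-way and multi-way choices.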