Mind the Gap: Pitfalls of LLM Alignment with Asian Public Opinion
Large Language Models (LLMs) are increasingly deployed in multilingual, multicultural settings, yet their reliance on predominantly English-centric training data risks misalignment with the diverse cultural values of different societies. In this paper, we present a comprehensive, multilingual audit of the cultural alignment of contemporary LLMs (GPT-4o-Mini, Gemini-2.5-Flash, Llama 3.2, Mistral, and Gemma 3) across India, East Asia, and Southeast Asia. Our study focuses on the sensitive domain of religion as a prism for broader alignment. To facilitate this, we conduct a multi-faceted analysis of each LLM's internal representations, using token-level log-probabilities (logits), to compare the models' opinion distributions against ground-truth public attitudes. We find that while the popular models generally align with public opinion on broad social issues, they consistently fail to represent religious viewpoints accurately, especially those of minority groups, often amplifying negative stereotypes. Lightweight interventions, such as demographic priming and native-language prompting, partially mitigate but do not eliminate these cultural gaps. We further show that downstream evaluations on bias benchmarks (CrowS-Pairs, IndiBias, ThaiCLI, KoBBQ) reveal persistent harms and under-representation in sensitive contexts. Our findings underscore the urgent need for systematic, regionally grounded audits to ensure equitable global deployment of LLMs.
💡 Research Summary
The paper conducts a multilingual audit of five state-of-the-art large language models (GPT-4o-Mini, Gemini-2.5-Flash, Llama 3.2, Mistral, and Gemma 3) to assess how well they align with public opinion on religious matters across India, East Asia, and Southeast Asia. Using nationally representative Pew Research surveys as ground truth, the authors translate the original questionnaire into local languages (Hindi, Korean, Japanese, Vietnamese, Thai, etc.) and prompt each model in both English and the native language. For every question, the model's token-level log-probabilities are extracted, yielding a probability distribution over answer choices that is compared to the survey distribution via Jensen-Shannon divergence (JSD) and Hellinger distance.
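The core distributional comparison is easy to reproduce. Below is a minimal sketch (not the authors' code) of that step, assuming the model's per-option log-probabilities have already been extracted: it softmax-normalizes them into a distribution over the answer choices and scores it against the survey marginals with JSD and Hellinger distance. All numbers are illustrative.

```python
import numpy as np
from scipy.spatial.distance import jensenshannon

def option_distribution(option_logprobs):
    """Softmax the per-option log-probabilities into a normalized
    distribution over the answer choices."""
    logits = np.asarray(option_logprobs, dtype=float)
    exp = np.exp(logits - logits.max())  # subtract max for numerical stability
    return exp / exp.sum()

def hellinger(p, q):
    """Hellinger distance between two discrete distributions."""
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

# Illustrative 4-option survey item: ground-truth marginals vs.
# hypothetical model log-probabilities for the option tokens.
survey = np.array([0.42, 0.31, 0.18, 0.09])
model = option_distribution([-1.1, -0.7, -2.3, -3.0])

# scipy returns the JS *distance* (the square root of the divergence),
# so square it; base=2 bounds the divergence in [0, 1].
jsd = jensenshannon(model, survey, base=2) ** 2
print(f"JSD = {jsd:.3f}  Hellinger = {hellinger(model, survey):.3f}")
```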
Key findings:
- On non‑religious topics (economics, education, environment) the models show modest divergence (average JSD ≈ 0.12–0.18), indicating reasonable alignment.
- On religion‑related items, especially those concerning minority faiths (Sikhism, Zoroastrianism, minority Muslim sects) or sensitive inter‑faith dynamics, divergence spikes dramatically (JSD ≈ 0.35–0.48). Models systematically assign higher probability to negative stereotypes (e.g., associating Muslims with violence, or portraying caste‑based discrimination as more prevalent than survey respondents report).
- Prompting in the native language reduces overall divergence by about 0.07 on average, but the reduction is far smaller for the most biased items; the gap remains substantial.
- Demographic priming (prepending age, gender, and location to the prompt) yields slight improvements for majority groups but does not significantly help minority religious communities.
- Results on four bias benchmarks (CrowS‑Pairs, plus the region‑specific IndiBias, ThaiCLI, and KoBBQ) confirm the distributional findings: the models preferentially select negative framings as "more plausible" in 62%–78% of cases, with the strongest bias on items about Sunni‑Shia tensions and Hindu caste issues (see the scoring sketch after this list).
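For context on how "more plausible" is typically operationalized: these benchmarks pair a stereotyping sentence with a minimally different counterpart and count the model as biased when it assigns the stereotyping sentence the higher likelihood. Below is a minimal sketch of that pairwise scoring for a causal LM, using summed token log-probabilities via Hugging Face transformers; the model name and sentence pair are placeholders, and note that the original CrowS-Pairs metric uses a masked-LM pseudo-log-likelihood variant rather than exactly this.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL = "meta-llama/Llama-3.2-1B"  # placeholder; any causal LM works
tok = AutoTokenizer.from_pretrained(MODEL)
model = AutoModelForCausalLM.from_pretrained(MODEL).eval()

@torch.no_grad()
def sentence_logprob(text: str) -> float:
    """Sum of log P(token_t | tokens_<t) over the whole sentence."""
    ids = tok(text, return_tensors="pt").input_ids
    loss = model(ids, labels=ids).loss   # mean cross-entropy per predicted token
    return -loss.item() * (ids.shape[1] - 1)  # undo the mean to get the sum

def prefers_stereotype(stereo: str, anti: str) -> bool:
    """True if the model scores the stereotyping sentence as more likely."""
    return sentence_logprob(stereo) > sentence_logprob(anti)

# Illustrative pair (not drawn from any benchmark):
print(prefers_stereotype("Members of group X are violent.",
                         "Members of group Y are violent."))
```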
Methodological notes: the translation pipeline achieved high inter‑annotator agreement (Cohen's κ = 0.82), but some loss of cultural nuance is inevitable. Log‑probability comparisons can be distorted when models refuse to answer or produce neutral responses, a common behavior on highly sensitive prompts. And the Pew surveys, while weighted for national representativeness, may under‑sample rural or low‑internet populations, introducing a secondary source of bias.
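One common mitigation for the refusal distortion (a standard trick, not necessarily the paper's exact procedure) is to keep only the probability mass the model places on the valid answer-option tokens and renormalize, treating items where no option receives any mass as explicit refusals:

```python
import numpy as np

def renormalize_over_options(token_logprobs, option_tokens):
    """Restrict a next-token distribution to the valid answer options.

    token_logprobs: dict mapping token string -> log-probability
    option_tokens:  the answer-option tokens, e.g. ["A", "B", "C", "D"]
    Returns a renormalized distribution, or None if the model put no
    mass on any option (treated as a refusal and excluded upstream).
    """
    kept = np.array([token_logprobs.get(t, -np.inf) for t in option_tokens])
    if np.isinf(kept).all():
        return None
    probs = np.exp(kept - kept.max())
    return probs / probs.sum()

# Hypothetical case: most mass goes to the start of a refusal ("I ...").
lp = {"I": -0.3, "A": -2.1, "B": -2.4, "C": -3.5, "D": -4.0}
print(renormalize_over_options(lp, ["A", "B", "C", "D"]))
```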
The authors argue that simple interventions such as local‑language prompting or demographic priming are insufficient to close the alignment gap. They advocate for deeper solutions: (a) expanding region‑specific, high‑quality training data that includes the voices of religious minorities; (b) incorporating multilingual, culturally aware tokenization and attention mechanisms during pre‑training; (c) establishing continuous monitoring pipelines that combine quantitative divergence metrics with expert qualitative review; and (d) developing standardized prompting templates that embed cultural context rather than relying on ad‑hoc priming.
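As a concrete illustration of point (d), a standardized template could bind cultural context through named fields rather than free-form priming strings. Everything below (field names, wording) is hypothetical and only meant to show the shape of such a template:

```python
# Hypothetical standardized survey-prompt template (illustrative only).
TEMPLATE = (
    "You are answering a public-opinion survey conducted in {country}.\n"
    "Respondent context: {context}\n"
    "Question ({language}): {question}\n"
    "Reply with exactly one of the options: {options}\n"
    "Answer:"
)

def build_prompt(country, language, question, options, context="not specified"):
    return TEMPLATE.format(country=country, language=language,
                           question=question, options=" / ".join(options),
                           context=context)

print(build_prompt("India", "Hindi", "<translated survey item>",
                   ["A", "B", "C", "D"],
                   context="age 34, female, urban Maharashtra"))  # priming slot
```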
In conclusion, the study provides robust empirical evidence that current LLMs, despite strong performance on general tasks, fail to faithfully represent the religious attitudes of Asian populations, especially for minority groups. Without systematic, region‑grounded audits and targeted data/model interventions, these systems risk perpetuating and amplifying harmful stereotypes, undermining equitable AI deployment worldwide.