A Systematic Analysis of Biases in Large Language Models

Reading time: 5 minutes
...

📝 Original Info

  • Title: A Systematic Analysis of Biases in Large Language Models
  • ArXiv ID: 2512.15792
  • Date: 2025-12-16
  • Authors: Xulang Zhang, Rui Mao, Erik Cambria – Nanyang Technological University, Singapore

📝 Abstract

Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness across varied contexts is critical to their safe and responsible deployment. In this study, we undertake a comprehensive examination of four widely adopted LLMs, probing their underlying biases and inclinations across the dimensions of politics, ideology, alliance, language, and gender. Through a series of carefully designed experiments, we investigate their political neutrality using news summarization, ideological biases through news stance classification, tendencies toward specific geopolitical alliances via United Nations voting patterns, language bias in the context of multilingual story completion, and gender-related affinities as revealed by responses to the World Values Survey. Results indicate that while the LLMs are aligned to be neutral and impartial, they still show biases and affinities of different types.

💡 Deep Analysis

📄 Full Content

A Systematic Analysis of Biases in Large Language Models

Xulang Zhang, Rui Mao, Erik Cambria
Nanyang Technological University, Singapore.
*Corresponding author(s). E-mail(s): cambria@ntu.edu.sg; Contributing authors: xulang.zhang@ntu.edu.sg; rui.mao@ntu.edu.sg

Abstract

Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness across varied contexts is critical to their safe and responsible deployment. In this study, we undertake a comprehensive examination of four widely adopted LLMs, probing their underlying biases and inclinations across the dimensions of politics, ideology, alliance, language, and gender. Through a series of carefully designed experiments, we investigate their political neutrality using news summarization, ideological biases through news stance classification, tendencies toward specific geopolitical alliances via United Nations voting patterns, language bias in the context of multilingual story completion, and gender-related affinities as revealed by responses to the World Values Survey. Results indicate that while the LLMs are aligned to be neutral and impartial, they still show biases and affinities of different types.

1 Main

Humans have a propensity to trust the suggestions and decisions made by automated systems as neutral and reliable. However, as Large Language Models (LLMs) are progressively integrated into the daily lives and decision-making processes of users around the world, there are growing doubts about whether LLMs are able to give fair and unbiased responses [1]. As such, for the betterment of Artificial Intelligence (AI) safety, it is important to examine the various biases that may be propagated into LLMs during the training procedure, so as not to perpetuate prejudice, stereotypes, and harmful messaging to a global user base.

Existing works have explored different types of bias analyses on LLMs. Biases against certain demographics have long been a focal point of research in this field, e.g., gender bias [2–5], racial bias [6], ableist bias [7], and various harmful stereotypes and biases in LLMs [8–12] and VLMs [13, 14]. With different probing methodologies, these works have consistently shown that LLMs exhibit varying degrees of bias on different subject matters, mirroring human prejudice and discrimination in their decision-making and generated text. Furthermore, with the widespread use of LLMs around the world, there is growing scrutiny of the cultural bias [15–19] of LLMs. Notably, LLMs are susceptible to the semantic anglocentrism inherited from the predominantly US-based English training corpora [20–23]. It can be concluded that most LLMs are ill-equipped to handle cultural nuances, not only because of the lack of knowledge on certain cultural practices and conventions, but also because of the misalignment with non-English-speaking cultures embedded in the semantic space. In a similar vein, as interest grows in employing LLMs as tools for media bias analyses [24, 25], some works explored the inherent political bias of LLMs by prompting them to generate a stance on selected political and ideological questions and topics [26–29].

Other studies investigated whether LLMs show signs of human-like social identity bias, confirming that, with explicitly or implicitly assigned identity, most LLMs exhibit a degree of ingroup solidarity and outgroup hostility similar to what humans demonstrate in the pretraining data and in real life [30, 31]. More interestingly, Laurito et al. showcased that LLMs have an AI-AI bias, consistently favoring texts generated by LLMs over those written by humans [32]. It is therefore apparent that LLMs are not yet the agents of fairness and neutrality that we aim for them to be.

Despite the plethora of existing works dissecting LLMs' biases, we have only scratched the surface. As many LLMs nowadays are becoming closed-source or resource-demanding for local deployment and finetuning, it is important for the research community to develop accessible ways to probe LLMs for their biases in various domains. In this paper, we meticulously designed a set of experiments to systematically examine the underlying biases and predispositions of four widely used LLMs (namely, Qwen, DeepSeek, Gemini, and GPT) in the areas of politics, ideology, alliance, language, and gender. While some existing works explored some of these areas individually, as introduced above, this paper aims to provide a more comprehensive evaluation through rigorous experiment design, offering a novel perspective on the nature of the biases present in widely used LLMs.

We examined the LLMs' political leaning by analyzing their generated text in the task of news summarization, showing that while the LLMs are, at large, politically neutral, some of them do show a slight inclination toward certain political leanings. We investi…
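To make the summarization-based political-leaning probe concrete, here is a minimal sketch of such a black-box experiment. It is not the authors' released code: the model list, prompt wording, `query_llm` client, and `score_leaning` classifier are all assumed stand-ins for whatever the paper actually uses; the idea is simply to summarize the same articles with each model and compare the leaning of the outputs.

```python
"""Hedged sketch of a political-leaning probe via news summarization."""

from collections import defaultdict
from statistics import mean
from typing import Callable

MODELS = ["qwen", "deepseek", "gemini", "gpt"]  # the four model families studied

SUMMARIZE_PROMPT = (
    "Summarize the following news article in three sentences, "
    "as neutrally as possible:\n\n{article}"
)

def probe_political_leaning(
    articles: list[str],
    query_llm: Callable[[str, str], str],   # (model, prompt) -> generated text
    score_leaning: Callable[[str], float],  # text -> score in [-1 (left), +1 (right)]
) -> dict[str, float]:
    """Return each model's mean leaning score over its generated summaries."""
    scores: dict[str, list[float]] = defaultdict(list)
    for article in articles:
        prompt = SUMMARIZE_PROMPT.format(article=article)
        for model in MODELS:
            summary = query_llm(model, prompt)
            scores[model].append(score_leaning(summary))
    # A mean near 0 suggests neutrality; a consistent sign suggests a slant.
    return {model: mean(vals) for model, vals in scores.items()}

if __name__ == "__main__":
    # Toy stand-ins so the sketch runs end to end without any API keys.
    demo_articles = ["Parliament passed the budget after a lengthy debate."]
    fake_llm = lambda model, prompt: f"[{model}] neutral three-sentence summary."
    fake_scorer = lambda text: 0.0
    print(probe_political_leaning(demo_articles, fake_llm, fake_scorer))
```

The same loop structure extends to the paper's other probes by swapping the prompt and the scorer: stance classification of news, United Nations voting choices, multilingual story completion, and World Values Survey responses.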

Reference

This content is AI-processed based on open access ArXiv data.
