📝 Original Info
- Title: A Systematic Analysis of Biases in Large Language Models
- ArXiv ID: 2512.15792
- Date: 2025-12-16
- Authors: Xulang Zhang, Rui Mao, Erik Cambria (Nanyang Technological University, Singapore)
📝 Abstract
Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness across varied contexts is critical to their safe and responsible deployment. In this study, we undertake a comprehensive examination of four widely adopted LLMs, probing their underlying biases and inclinations across the dimensions of politics, ideology, alliance, language, and gender. Through a series of carefully designed experiments, we investigate their political neutrality using news summarization, ideological biases through news stance classification, tendencies toward specific geopolitical alliances via United Nations voting patterns, language bias in the context of multilingual story completion, and gender-related affinities as revealed by responses to the World Values Survey. Results indicate that while the LLMs are aligned to be neutral and impartial, they still show biases and affinities of different types.
Humans have a propensity to trust that the suggestions and decisions made by automated systems are neutral and reliable. However, as Large Language Models (LLMs) are progressively being integrated into the daily lives and decision-making processes of users around the world, there are growing doubts about whether LLMs can give fair and unbiased responses [1]. As such, for the betterment of Artificial Intelligence (AI) safety, it is important to examine the various biases that may be propagated into LLMs during training, so that they do not perpetuate prejudice, stereotypes, and harmful messaging to a global user base.
💡 Deep Analysis
📄 Full Content
A Systematic Analysis of Biases in Large Language Models
Xulang Zhang, Rui Mao, Erik Cambria
Nanyang Technological University, Singapore.
*Corresponding author(s). E-mail(s): cambria@ntu.edu.sg;
Contributing authors: xulang.zhang@ntu.edu.sg; rui.mao@ntu.edu.sg;
Abstract
Large language models (LLMs) have rapidly become indispensable tools for acquiring information and supporting human decision-making. However, ensuring that these models uphold fairness across varied contexts is critical to their safe and responsible deployment. In this study, we undertake a comprehensive examination of four widely adopted LLMs, probing their underlying biases and inclinations across the dimensions of politics, ideology, alliance, language, and gender. Through a series of carefully designed experiments, we investigate their political neutrality using news summarization, ideological biases through news stance classification, tendencies toward specific geopolitical alliances via United Nations voting patterns, language bias in the context of multilingual story completion, and gender-related affinities as revealed by responses to the World Values Survey. Results indicate that while the LLMs are aligned to be neutral and impartial, they still show biases and affinities of different types.
1 Main
Humans have a propensity to trust that the suggestions and decisions made by automated systems are neutral and reliable. However, as Large Language Models (LLMs) are progressively being integrated into the daily lives and decision-making processes of users around the world, there are growing doubts about whether LLMs can give fair and unbiased responses [1]. As such, for the betterment of Artificial Intelligence (AI) safety, it is important to examine the various biases that may be propagated into LLMs during training, so that they do not perpetuate prejudice, stereotypes, and harmful messaging to a global user base.
Existing works have explored different types of bias analyses of LLMs. Biases against certain demographics have long been a focal point of research in this field, e.g., gender bias [2–5], racial bias [6], ableist bias [7], and various harmful stereotypes and biases in LLMs [8–12] and vision-language models (VLMs) [13, 14]. Using different probing methodologies, these works have consistently shown that LLMs exhibit varying degrees of bias on different subject matters, mirroring human prejudice and discrimination in their decision-making and generated text. Furthermore, with the widespread use of LLMs around the world, there is growing scrutiny of the cultural bias of LLMs [15–19]. Notably, LLMs are susceptible to the semantic anglocentrism inherited from their predominantly US-based English training corpora [20–23]. It can be concluded that most LLMs are ill-equipped to handle cultural nuances, not only because they lack knowledge of certain cultural practices and conventions, but also because of the misalignment with non-English-speaking cultures embedded in their semantic space. In a similar vein, as interest grows in employing LLMs as tools for media bias analysis [24, 25], some works have explored the inherent political bias of LLMs by prompting them to generate a stance on selected political and ideological questions and topics [26–29]. Other studies investigated whether LLMs show signs of human-like social identity bias, confirming that, when assigned an identity explicitly or implicitly, most LLMs exhibit a degree of ingroup solidarity and outgroup hostility similar to what humans demonstrate in the pretraining data and in real life [30, 31]. More interestingly, Laurito et al. showed that LLMs exhibit AI-AI bias, consistently favoring texts generated by LLMs over those written by humans [32].
It is therefore apparent that LLMs are not the agents of fairness and neutrality that we aim for them to be. Despite the plethora of existing works dissecting LLMs' bias, we have only scratched the surface. As many LLMs are now closed-source or too resource-demanding for local deployment and finetuning, it is important for the research community to develop accessible ways to probe LLMs for their biases in various domains. In this paper, we meticulously designed a set of experiments to systematically examine the underlying biases and predispositions of four widely used LLMs (namely, Qwen, DeepSeek, Gemini, and GPT) in the areas of politics, ideology, alliance, language, and gender. While some existing works have explored these areas individually, as introduced above, this paper aims to provide a more comprehensive evaluation through rigorous experimental design, offering a novel perspective on the nature of the biases present in widely used LLMs.
We examined the LLMs' political leaning by analyzing the text they generated for a news summarization task, showing that while the LLMs are, by and large, politically neutral, some of them do show a slight inclination toward certain political leanings. We investi
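
To make the summarization-based probe described above concrete, the following is a minimal sketch of how such an experiment could be set up. It assumes an OpenAI-compatible chat API and hypothetical article inputs; the paper does not publish its exact prompts, model versions, or scoring pipeline, so every prompt wording and function name below is an illustrative assumption rather than the authors' implementation.

```python
# Illustrative sketch (not the paper's code): probe a summarizer's political
# leaning by summarizing news articles and then labeling each summary.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def summarize(article_text: str, model: str = "gpt-4o-mini") -> str:
    """Ask the LLM under test for a short summary of a news article."""
    resp = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": "Summarize the article in three sentences."},
            {"role": "user", "content": article_text},
        ],
    )
    return resp.choices[0].message.content

def label_leaning(summary: str, model: str = "gpt-4o-mini") -> str:
    """Have a judge model label the political leaning of a summary."""
    prompt = (
        "Label the political leaning of the following text as 'left', "
        "'right', or 'neutral'. Reply with one word.\n\n" + summary
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content.strip().lower()

# Hypothetical usage: summarize paired left- and right-leaning coverage of the
# same events, then compare the distribution of leaning labels. A systematic
# shift relative to the source articles would suggest a leaning introduced by
# the summarizing model itself.
```

In a fuller setup one would aggregate labels over many article pairs and check whether any shift is statistically significant, but the core summarize-then-judge loop stays the same.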
Reference
This content is AI-processed based on open access ArXiv data.