A word association network methodology for evaluating implicit biases in LLMs compared to humans
📝 Original Info
- Title: A word association network methodology for evaluating implicit biases in LLMs compared to humans
- ArXiv ID: 2510.24488
- Date: 2025-10-28
- Authors: Not listed in the provided source (please consult the original paper for author names, affiliations, and contributions).
📝 Abstract
As large language models (LLMs) become increasingly integrated into our lives, their inherent social biases remain a pressing concern. Detecting and evaluating these biases can be challenging because they are often implicit rather than explicit, so evaluation methods that assess the implicit knowledge representations of LLMs are essential. We present a novel word association network methodology for evaluating implicit biases in LLMs based on simulating semantic priming within LLM-generated word association networks. Our prompt-based approach taps into the implicit relational structures encoded in LLMs, providing both quantitative and qualitative assessments of bias. Unlike most prompt-based evaluation methods, our method enables direct comparisons between various LLMs and humans, providing a valuable point of reference and offering new insights into the alignment of LLMs with human cognition. To demonstrate the utility of our methodology, we apply it to both humans and several widely used LLMs to investigate social biases related to gender, religion, ethnicity, sexual orientation, and political party. Our results reveal both convergences and divergences between LLM and human biases, providing new perspectives on the potential risks of using LLMs. Our methodology contributes to a systematic, scalable, and generalizable framework for evaluating and comparing biases across multiple LLMs and humans, advancing the goal of transparent and socially responsible language technologies.
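The core idea in the abstract is simulating semantic priming over an LLM-generated word association network. The sketch below illustrates one plausible reading of that idea, not the paper's actual procedure: a spreading-activation pass over a toy directed, weighted association graph, where activation starts at a prime cue, diffuses along association edges, and a bias score compares how much activation reaches two attribute word sets. The toy graph, the `decay` and `steps` parameters, and the `bias_score` formula are all illustrative assumptions.

```python
import networkx as nx
from collections import defaultdict


def spreading_activation(G, prime, steps=3, decay=0.5):
    """Simulate semantic priming: activation starts at the prime cue and
    spreads along weighted association edges, attenuating at each step."""
    activation = defaultdict(float)
    activation[prime] = 1.0
    frontier = {prime: 1.0}
    for _ in range(steps):
        next_frontier = defaultdict(float)
        for node, act in frontier.items():
            edges = list(G.out_edges(node, data="weight", default=1.0))
            total = sum(w for _, _, w in edges)
            if total == 0:
                continue
            for _, nbr, w in edges:
                spread = act * decay * (w / total)
                next_frontier[nbr] += spread
                activation[nbr] += spread
        frontier = next_frontier
    return activation


def bias_score(G, prime, attr_a, attr_b, **kwargs):
    """Relative activation reaching attribute set A vs. set B after priming."""
    act = spreading_activation(G, prime, **kwargs)
    a = sum(act[w] for w in attr_a)
    b = sum(act[w] for w in attr_b)
    return (a - b) / (a + b + 1e-9)


# Toy association network standing in for LLM-generated free associations
# (cue -> associate, weight = how often the associate was produced).
G = nx.DiGraph()
G.add_weighted_edges_from([
    ("nurse", "caring", 3), ("nurse", "woman", 2),
    ("engineer", "logical", 3), ("engineer", "man", 2),
    ("woman", "caring", 1), ("man", "logical", 1),
])

# Positive score -> the prime activates attribute set A more than set B.
print(bias_score(G, "nurse", attr_a={"woman"}, attr_b={"man"}))
```

Because humans can answer the same free-association prompts, networks built from human and LLM responses can be scored with the same procedure, which is what enables the direct LLM-human comparisons the abstract emphasizes.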
Reference
This content was AI-processed from open-access arXiv data.