Towards A Cultural Intelligence and Values Inference Quality Benchmark for Community Values and Common Knowledge

Reading time: 5 minutes
...

📝 Original Info

  • Title: Towards A Cultural Intelligence and Values Inference Quality Benchmark for Community Values and Common Knowledge
  • ArXiv ID: 2512.05176
  • Date: 2025-12-04
  • Authors: Brittany Johnson, Erin Reddick, Angela D. R. Smith

📝 Abstract

Large language models (LLMs) have emerged as a powerful technology, and thus we have seen widespread adoption and use on software engineering teams. Most often, LLMs are designed as "general purpose" technologies meant to represent the general population. Unfortunately, this often means alignment with predominantly Western Caucasian narratives and misalignment with other cultures and populations that engage in collaborative innovation. In response to this misalignment, there have been recent efforts centered on the development of "culturally-informed" LLMs, such as ChatBlackGPT, that are capable of better aligning with historically marginalized experiences and perspectives. Despite this progress, there has been little effort aimed at supporting our ability to develop and evaluate culturally-informed LLMs. A recent effort proposed an approach for developing a national alignment benchmark that emphasizes alignment with national social values and common knowledge. However, given the range of cultural identities present in the United States (U.S.), a national alignment benchmark is an ineffective goal for broader representation. To help fill this gap in the U.S. context, we propose a replication study that translates the process used to develop KorNAT, a Korean national LLM alignment benchmark, to develop CIVIQ, a Cultural Intelligence and Values Inference Quality benchmark centered on alignment with community social values and common knowledge. Our work provides a critical foundation for research and development aimed at cultural alignment of AI technologies in practice.

💡 Deep Analysis

Figure 1

📄 Full Content

Towards A Cultural Intelligence and Values Inference Quality Benchmark for Community Values and Common Knowledge

Brittany Johnson (johnsonb@gmu.edu), George Mason University, USA; Erin Reddick (tech@chatblackgpt.com), ChatBlackGPT, USA; Angela D. R. Smith (adrsmith@utexas.edu), University of Texas at Austin, USA

Abstract

Large language models (LLMs) have emerged as a powerful technology, and thus we have seen widespread adoption and use on software engineering teams. Most often, LLMs are designed as "general purpose" technologies meant to represent the general population. Unfortunately, this often means alignment with predominantly Western Caucasian narratives and misalignment with other cultures and populations that engage in collaborative innovation. In response to this misalignment, there have been recent efforts centered on the development of "culturally-informed" LLMs, such as ChatBlackGPT, that are capable of better aligning with historically marginalized experiences and perspectives. Despite this progress, there has been little effort aimed at supporting our ability to develop and evaluate culturally-informed LLMs. A recent effort proposed an approach for developing a national alignment benchmark that emphasizes alignment with national social values and common knowledge. However, given the range of cultural identities present in the United States (U.S.), a national alignment benchmark is an ineffective goal for broader representation. To help fill this gap in the U.S. context, we propose a replication study that translates the process used to develop KorNAT, a Korean national LLM alignment benchmark, to develop CIVIQ, a Cultural Intelligence and Values Inference Quality benchmark centered on alignment with community social values and common knowledge. As a proof of concept for our approach, we focus our initial efforts on the Black community in the U.S. and leverage both general-purpose (e.g., ChatGPT) and culturally-informed (e.g., ChatBlackGPT) LLMs in our efforts. In this paper, we discuss our plans for conducting this research, including engaging our audience of interest in our efforts. Our work provides a critical foundation for research and development aimed at cultural alignment of AI technologies in practice.

Keywords: large language models, culturally-informed AI, benchmark

1 Introduction

The advent of large language models (LLMs) has led to a shift in how we collaborate and engage in software engineering. LLMs can support natural language interactions for tasks such as facilitating asynchronous communication and learning, information retrieval, analysis and summarization, and practical guidance across domains, a trait that makes them highly attractive and frequently integrated into everyday workflows [40], [24], [9]. As a result, we have evolved to the era of AI-assisted collaboration and innovation [20], where artificial intelligence (AI) technologies like LLMs are rapidly being integrated to support collaborative tasks such as decision-making [19], communication [37], code review [10], and project management [26].

While the integration of LLMs to support collaborative tasks shows promise, it also has the potential to introduce workplace and team inequities. These inequities stem from both the nature of AI as a technology (e.g., centrality of data, nature of decision-making) and the broader context in which it has been developed and would potentially be used [13, 32]. As a result, those from historically marginalized backgrounds are less likely to reap the benefits, due to the higher likelihood of discriminatory outcomes and often being less equipped to find ways to extract the value of AI-assisted collaboration. Therefore, as with any AI technology, the outcomes and quality of support are only as good as the data and evaluation mechanisms that are engaged in its design and development. One of the most common ways to assess the capabilities and risks assoc…
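The excerpt above does not specify how CIVIQ would score a model, so the following is only a rough illustration of what a KorNAT-style alignment evaluation over community-surveyed items could look like: each benchmark item pairs a prompt with the distribution of answers gathered from community respondents, and a model is scored by how well its choices match that distribution. All names here (BenchmarkItem, alignment_score, evaluate, query_model) and the toy answer distribution are hypothetical, not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class BenchmarkItem:
    """One survey-derived item: a prompt, its answer choices, and the
    distribution of answers collected from community respondents
    (hypothetical schema, for illustration only)."""
    prompt: str
    choices: List[str]
    community_distribution: Dict[str, float]  # choice -> fraction of respondents


def alignment_score(item: BenchmarkItem, model_choice: str) -> float:
    """Score one response by how popular the model's choice was among
    community respondents. This is one simple scoring rule; the actual
    KorNAT/CIVIQ metrics may differ."""
    return item.community_distribution.get(model_choice, 0.0)


def evaluate(items: List[BenchmarkItem],
             query_model: Callable[[str, List[str]], str]) -> float:
    """Average alignment across all items for one model. `query_model`
    stands in for calling an LLM such as ChatGPT or ChatBlackGPT."""
    scores = [alignment_score(it, query_model(it.prompt, it.choices))
              for it in items]
    return sum(scores) / len(scores) if scores else 0.0


if __name__ == "__main__":
    # Toy item with a made-up answer distribution, purely for illustration.
    item = BenchmarkItem(
        prompt="Which statement best reflects the community's view on X?",
        choices=["A", "B", "C"],
        community_distribution={"A": 0.6, "B": 0.3, "C": 0.1},
    )
    always_a = lambda prompt, choices: "A"
    print(f"Mean alignment: {evaluate([item], always_a):.2f}")  # -> 0.60
```

In practice, `query_model` would wrap API calls to the general-purpose and culturally-informed models under comparison, and the same item set would be run against each so their mean alignment scores can be compared directly.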


Reference

This content is AI-processed based on open access ArXiv data.
