Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models

Reading time: 6 minutes

📝 Original Info

  • Title: Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models
  • ArXiv ID: 2512.01892
  • Date: 2025-12-01
  • Authors: Heloisa Candello (IBM Research, Brazil); Muneeza Azmat (IBM Research, United States); Uma Sushmitha Gunturi (IBM, United States); Raya Horesh (IBM Research, United States); Rogerio Abreu de Paula (IBM Research, Brazil); Heloisa Pimentel (UNICAMP, Brazil); Marcelo Carpinette Grave (IBM Research, Brazil); Aminat Adebiyi (IBM Research, United States); Tiago Machado (IBM, Brazil); Maysa Malfiza Garcia de Macedo (IBM Research, Brazil)

📝 Abstract

With the rapid uptake of generative AI, investigating human perceptions of generated responses has become crucial. A major challenge is the 'aptitude' of these models for hallucinating and generating harmful content. Despite major efforts to implement guardrails, human perceptions of these mitigation strategies are largely unknown. We conducted a mixed-methods experiment to evaluate the responses of a mitigation strategy across multiple dimensions: faithfulness, fairness, harm-removal capacity, and relevance. In a within-subject study design, 57 participants assessed the responses under two conditions: a harmful response plus its mitigation, and the mitigated response alone. Results revealed that participants' native language, AI work experience, and annotation familiarity significantly influenced evaluations. Participants showed high sensitivity to linguistic and contextual attributes, penalizing minor grammar errors while rewarding preserved semantic contexts. This contrasts with how language is often treated in the quantitative evaluation of LLMs. We also introduced new metrics for training and evaluating mitigation strategies, as well as insights for human-AI evaluation studies.

CCS Concepts: • Social and professional topics → Computing occupations; • Human-centered computing → User studies; • Computing methodologies → Information extraction; • Software and its engineering
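The within-subject design described in the abstract, where each participant rates responses under both conditions, lends itself to a paired comparison per dimension. The following is a minimal Python sketch of that idea, not the authors' analysis: only the four dimension names and the two conditions come from the abstract, while the `Rating` record, the 1-5 score scale, the helper functions, and the toy ratings are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass
from statistics import mean

# Dimension and condition names follow the abstract; everything else is hypothetical.
DIMENSIONS = ("faithfulness", "fairness", "harm_removal", "relevance")
CONDITIONS = ("harmful_plus_mitigation", "mitigation_only")


@dataclass(frozen=True)
class Rating:
    participant: str  # hypothetical participant identifier
    condition: str    # one of CONDITIONS
    dimension: str    # one of DIMENSIONS
    score: int        # assumed 1-5 scale; the abstract does not state the scale


def paired_differences(ratings, dimension):
    """Per-participant difference: mitigation_only minus harmful_plus_mitigation."""
    by_participant = {}
    for r in ratings:
        if r.dimension == dimension:
            by_participant.setdefault(r.participant, {})[r.condition] = r.score
    diffs = {}
    for pid, scores in by_participant.items():
        # Keep only participants who rated under both conditions.
        if all(c in scores for c in CONDITIONS):
            diffs[pid] = scores["mitigation_only"] - scores["harmful_plus_mitigation"]
    return diffs


def mean_paired_difference(ratings, dimension):
    """Mean of the paired differences, or None when no participant has both ratings."""
    diffs = paired_differences(ratings, dimension)
    return mean(diffs.values()) if diffs else None


# Made-up toy ratings for illustration only; these are NOT data from the study.
toy = [
    Rating("p1", "harmful_plus_mitigation", "relevance", 4),
    Rating("p1", "mitigation_only", "relevance", 3),
    Rating("p2", "harmful_plus_mitigation", "relevance", 5),
    Rating("p2", "mitigation_only", "relevance", 3),
    Rating("p3", "harmful_plus_mitigation", "relevance", 2),  # unpaired, so skipped
]

print(paired_differences(toy, "relevance"))
print(mean_paired_difference(toy, "relevance"))
```

Pairing scores within each participant is what distinguishes a within-subject comparison from comparing two independent groups; any significance testing or modeling of participant attributes such as native language would sit on top of a structure like this and is not shown here.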

📄 Full Content

Additional Key Words and Phrases: Human-evaluation of LLM, Social Value Alignment, Guardrails

ACM Reference Format: Heloisa Candello, Muneeza Azmat, Uma Sushmitha Gunturi, Raya Horesh, Rogerio Abreu de Paula, Heloisa Pimentel, Marcelo Carpinette Grave, Aminat Adebiyi, Tiago Machado, and Maysa Malfiza Garcia de Macedo. 2018. Exploring Human Perceptions of AI Responses: Insights from a Mixed-Methods Study on Risk Mitigation in Generative Models. In Proceedings of Make sure to enter the correct conference title from your rights confirmation email (Conference acronym ’XX). ACM, New York, NY, USA, 21 pages. https://doi.org/XXXXXXX.XXXXXXX

Authors’ Contact Information: Heloisa Candello, IBM Research, São Paulo, Brazil, heloisacandello@ibm.com; Muneeza Azmat, IBM Research, Yorktown Heights, New York, United States; Uma Sushmitha Gunturi, IBM, San Jose, California, United States; Raya Horesh, IBM Research, Yorktown Heights, New York, United States, rhoresh@us.ibm.com; Rogerio Abreu de Paula, IBM Research, São Paulo, SP, Brazil; Heloisa Pimentel, UNICAMP, São Paulo, São Paulo, Brazil; Marcelo Carpinette Grave, IBM Research, São Paulo, SP, Brazil; Aminat Adebiyi, IBM Research, Yorktown Heights, New York, United States; Tiago Machado, IBM, São Paulo, Brazil, tiago.machado@ibm.com; Maysa Malfiza Garcia de Macedo, IBM Research, São Paulo, SP, Brazil.

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from permissions@acm.org.

© 2018 Copyright held by the owner/author(s). Publication rights licensed to ACM. Manuscript submitted to ACM

arXiv:2512.01892v1 [cs.CL] 1 Dec 2025

1 Introduction

As generative AI systems become increasingly integrated into decision-making and communication platforms, ensuring that their outputs are safe, fair, and contextually appropriate is critical. Generative AI systems may generate sentences containing hallucinations [20, 22], produce offensive content [52], and hide strategies not aligned with human expectations [14, 29]. Model-related mitigation techniques have been created recently to ensure the detection of harms [35], using adversarial training and [10]. These approaches brought significant advances in mitigating LLM outputs [3, 27, 34, 44, 47, 50, 54], and additional challenges emerged in evaluating the real representation and quality of the data being generated. To evaluate, at scale, the massive a

…(Full text truncated)…

📸 Image Gallery

  • methodology_phases.png
  • personas.png

Reference

This content is AI-processed based on ArXiv data.
