Human attribution of empathic behaviour to AI systems
Artificial intelligence systems increasingly generate text intended to provide social and emotional support. Understanding how users perceive empathic qualities in such content is therefore critical. We examined differences in perceived empathy signals between human-written and large language model (LLM)-generated relationship advice, and the influence of authorship labels. Across two preregistered experiments (Study 1: n = 641; Study 2: n = 500), participants rated advice texts on overall quality and perceived cognitive, emotional, and motivational empathy. Multilevel models accounted for the nested rating structure. LLM-generated advice was consistently perceived as higher in overall quality, cognitive empathy, and motivational empathy. Evidence for a widely reported negativity bias toward AI-labelled content was limited. Emotional empathy showed no consistent source advantage. Individual differences in AI attitudes modestly influenced judgments but did not alter the overall pattern. These findings suggest that perceptions of empathic communication are primarily driven by linguistic features rather than authorship beliefs, with implications for the design of AI-mediated support systems.
💡 Research Summary
This paper investigates how people attribute empathic qualities to advice texts that are either written by humans or generated by a large language model (LLM), focusing on relationship advice. Across two preregistered experiments (Study 1: n = 641; Study 2: n = 500), participants evaluated 100 advice excerpts (50 human‑written, 50 GPT‑4‑generated) on overall satisfaction and three empathy components: cognitive (recognizing the advisee’s mental state), emotional (sharing the advisee’s feelings), and motivational (the desire to help). The key manipulation was the authorship label presented to participants: “human‑written,” “AI‑generated,” or “no label” (Study 1 only). The actual source of each excerpt was orthogonal to the label, allowing the authors to disentangle the effects of perceived versus real authorship.
Methodologically, the authors selected the 50 most “human‑like” texts from each source based on prior human‑rating data, ensuring that any observed differences would not be driven by obvious stylistic cues. Participants, recruited via Prolific, were randomly assigned to a label condition and then rated five randomly chosen excerpts. Ratings were collected on 5‑point Likert scales, and attention checks were embedded. Individual‑difference measures included the Affinity for Technology Interaction (ATI) scale, the AI Attitude Scale (AIAS‑4), and the Empathy Components Questionnaire (ECQ).
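A minimal sketch of this assignment scheme in R (the paper’s analysis language) may help make the orthogonal manipulation concrete; the column names and the placeholder ratings are illustrative assumptions, not the authors’ code:

```r
# Minimal simulation of the Study 1 rating design (illustrative, not the
# authors' code). 100 excerpts: 50 human-written, 50 GPT-4-generated.
set.seed(1)

items <- data.frame(
  item   = factor(1:100),
  source = rep(c("human", "llm"), each = 50)
)

n_raters <- 641                        # Study 1 sample size
labels   <- c("human", "ai", "none")   # between-subjects label conditions

# Each rater is assigned one label condition and five random excerpts,
# so the displayed label is orthogonal to each excerpt's actual source.
ratings <- do.call(rbind, lapply(seq_len(n_raters), function(r) {
  data.frame(
    rater = factor(r),
    label = sample(labels, 1),
    item  = sample(items$item, 5)
  )
}))
ratings <- merge(ratings, items, by = "item")

# Placeholder 1-5 Likert responses; in the study these come from participants.
ratings$rating <- sample(1:5, nrow(ratings), replace = TRUE)
```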
Statistical analysis employed multilevel linear models (fitted with lme4 in R) with random intercepts for both text item and rater, accounting for the nested rating structure. Fixed effects comprised label, actual source, and their interaction; significance was evaluated via ANOVA, with post‑hoc contrasts corrected for the false discovery rate using the Benjamini–Hochberg procedure.
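A sketch of such a model, assuming a long‑format data frame like the one simulated above (variable names are assumptions, and the emmeans call stands in for whichever post‑hoc procedure the authors actually used):

```r
library(lme4)
library(lmerTest)   # adds denominator df and p-values to lmer ANOVA tables

# Random intercepts for text item and rater capture the nested rating
# structure: each rater judges several items, each item is judged by
# many raters.
fit <- lmer(rating ~ label * source + (1 | item) + (1 | rater),
            data = ratings)

anova(fit)          # F-tests for label, source, and their interaction

# Post-hoc pairwise contrasts with Benjamini-Hochberg FDR correction
library(emmeans)
emmeans(fit, pairwise ~ source | label, adjust = "BH")
```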
Results consistently showed that LLM‑generated advice received higher overall satisfaction scores and higher ratings on cognitive and motivational empathy than human‑written advice, regardless of the label presented. The anticipated “negativity bias” toward AI‑labelled content was weak: the AI label did not significantly lower ratings compared with the human or no‑label conditions. Emotional empathy showed no reliable main effect of source or label, suggesting that this component is more dependent on the reader’s own affective disposition than on textual cues. Individual differences in technology affinity and AI attitudes modestly predicted higher ratings overall, but they did not interact with source or label to alter the main pattern.
Study 2 replicated these findings after removing the “no label” condition, confirming that the weak bias against AI labels is robust to a slightly different design. The authors conclude that perceptions of empathic communication are driven primarily by the linguistic features of the advice rather than by beliefs about its authorship. This challenges the prevailing view that people automatically downgrade AI‑generated empathic content once they learn it is synthetic.
Implications are twofold. First, designers of AI‑mediated support systems can prioritize the quality of the language itself (clarity, relevance, appropriate emotional wording) to enhance perceived empathy, without fearing a strong backlash once users become aware the content is AI‑generated. Second, because emotional empathy appears less susceptible to textual manipulation, future research should explore whether interactive, multimodal cues (tone of voice, facial expressions) are needed to elicit genuine affective resonance.
Overall, the paper makes a valuable contribution by combining rigorous preregistration, robust multilevel modeling, and a novel context (relationship advice) to clarify how empathy attributions to AI systems operate. It suggests that, at least for written advice, AI can match or even surpass human performance on certain empathy dimensions, while the feared negativity bias is far more nuanced than previously thought.