Human-like Affective Cognition in Foundation Models

Understanding emotions is fundamental to human interaction and experience. Humans easily infer emotions from situations or facial expressions, situations from emotions, and perform a variety of other affective inferences. How adept is modern AI at these inferences? We introduce an evaluation framework for testing affective cognition in foundation models. Starting from psychological theory, we generate 1,280 diverse scenarios exploring relationships between appraisals, emotions, expressions, and outcomes. We evaluate the abilities of foundation models (GPT-4, Claude-3, Gemini-1.5-Pro) and humans (N = 567) across carefully selected conditions. Our results show foundation models tend to agree with human intuitions, matching or exceeding interparticipant agreement. In some conditions, models are "superhuman": they predict modal human judgments better than the average human does. All models benefit from chain-of-thought reasoning. This suggests foundation models have acquired a human-like understanding of emotions and their influence on beliefs and behavior.


💡 Research Summary

This paper introduces a principled evaluation framework for testing affective cognition in modern foundation models. Drawing on psychological appraisal theories of emotion, the authors construct an abstract causal template that links two appraisal dimensions, an outcome, an emotion, and (in the multimodal conditions) a facial expression. By prompting large language models to populate this template, they automatically generate 1,280 diverse scenarios covering four inference tasks: (1) predicting the emotion from the appraisals and outcome; (2) and (3) inferring each of the two appraisals from the other appraisal, the outcome, and the emotion; and (4) predicting the outcome from the appraisals and emotion. Ten background stories are written for each of two appraisal sets (goal congruence + perceived control, and safety + expectedness); crossing each story with the appraisal settings yields eight concrete scenarios per story and thus the full stimulus set. A sketch of this template-driven generation follows.
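To make the generation pipeline concrete, here is a minimal Python sketch of a template-driven scenario enumerator. All names (`Scenario`, `APPRAISAL_SETS`, `enumerate_scenarios`) are illustrative rather than from the paper, and the assumption that the eight variants per story come from crossing the two binary appraisals with a binary outcome valence is ours; in the actual pipeline, an LLM fills each slot with concrete narrative text.

```python
from dataclasses import dataclass
from itertools import product

# The two appraisal sets described above; each dimension is treated as binary.
APPRAISAL_SETS = {
    "A": ("goal_congruence", "perceived_control"),
    "B": ("safety", "expectedness"),
}

# The four inference tasks: each hides one template variable and asks for it.
TASKS = ("infer_emotion", "infer_appraisal_1", "infer_appraisal_2", "infer_outcome")

@dataclass
class Scenario:
    """One concrete instantiation of the abstract causal template."""
    story_id: int
    appraisal_set: str
    appraisal_1: bool          # e.g. goal-congruent? / safe?
    appraisal_2: bool          # e.g. in control? / expected?
    outcome_positive: bool     # assumed third binary factor (see lead-in)
    text: str = "<LLM-generated narrative>"
    emotion: str = "<LLM-generated label>"

def enumerate_scenarios(n_stories: int = 10) -> list[Scenario]:
    """Cross every background story with all 2 x 2 x 2 factor settings,
    giving the eight variants per story mentioned in the summary."""
    return [
        Scenario(story, set_name, a1, a2, out)
        for set_name in APPRAISAL_SETS
        for story in range(n_stories)
        for a1, a2, out in product((True, False), repeat=3)
    ]

# 2 appraisal sets x 10 stories x 8 variants = 160 base scenarios,
# each of which can then be posed under the four inference tasks.
print(len(enumerate_scenarios()))  # 160
```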

Human participants (N = 567) answer the same questions, providing baselines of average human judgments and inter-participant agreement (IPA). Three state-of-the-art models (GPT-4, Claude-3, and Gemini-1.5-Pro) are evaluated under two prompting regimes: a direct-answer prompt and a chain-of-thought (CoT) prompt that elicits step-by-step reasoning. Experiments 1a/1b use text-only stimuli, while 2a/2b add rendered facial expressions generated from Facial Action Units, enabling a multimodal assessment.
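The two prompting regimes can be expressed as a thin wrapper around any chat-completion client. This is a hedged sketch: the prompt wording is invented for illustration, and `model_fn` is a hypothetical stand-in for whichever API call the reader uses; only the direct-vs-CoT contrast mirrors the experimental design.

```python
def build_prompt(scenario_text: str, question: str, chain_of_thought: bool) -> str:
    """Assemble one of the two prompting regimes. The exact wording here is
    illustrative; only the direct-vs-CoT contrast mirrors the experiment."""
    prompt = f"{scenario_text}\n\nQuestion: {question}\n"
    if chain_of_thought:
        prompt += ("Reason step by step about the character's appraisals, "
                   "the outcome, and the resulting emotion, then give your "
                   "final answer on the last line.")
    else:
        prompt += "Answer with a single choice and no explanation."
    return prompt

def evaluate(model_fn, items, chain_of_thought: bool = False) -> list[str]:
    """Run every (scenario_text, question) pair through model_fn, a stand-in
    for any chat-completion client (GPT-4, Claude-3, or Gemini-1.5-Pro)."""
    return [model_fn(build_prompt(text, q, chain_of_thought))
            for text, q in items]
```

Running `evaluate` twice on the same items, once per regime, isolates the contribution of explicit reasoning without changing the underlying stimuli.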

Results show that all three models achieve high correlation with average human judgments, often matching or surpassing IPA. CoT prompting consistently improves performance, by roughly 7 to 12 percentage points across tasks. In the emotion-prediction and outcome-prediction tasks, the models predict the modal human judgment better than the average human participant does, a "superhuman" result. The appraisal-prediction tasks are slightly more challenging, but model performance still exceeds human agreement levels. The findings suggest that foundation models have acquired a human-like conceptual understanding of how appraisals, outcomes, and emotions interrelate, and that they reason about these relationships more reliably when guided by explicit chain-of-thought prompting.
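One simple way to operationalize the comparison in this paragraph is to score each model against the modal human judgment per item and compare that score to inter-participant agreement. The sketch below is our reading of the setup, not the paper's exact statistics (the paper also reports correlations with human averages).

```python
from collections import Counter

def modal_answer(human_answers: list[str]) -> str:
    """Most common human response to one item."""
    return Counter(human_answers).most_common(1)[0][0]

def agreement_with_mode(model_answers: list[str],
                        humans_per_item: list[list[str]]) -> float:
    """Fraction of items on which the model matches the modal human judgment."""
    hits = sum(m == modal_answer(h)
               for m, h in zip(model_answers, humans_per_item))
    return hits / len(model_answers)

def interparticipant_agreement(humans_per_item: list[list[str]]) -> float:
    """Average, over items, of the share of participants who give the modal
    answer: one simple reading of IPA as a human baseline."""
    shares = [sum(a == modal_answer(h) for a in h) / len(h)
              for h in humans_per_item]
    return sum(shares) / len(shares)
```

Under this reading, a model counts as "superhuman" on a task exactly when its `agreement_with_mode` exceeds `interparticipant_agreement`.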

Key contributions include: (1) a theory‑driven, scalable benchmark for affective cognition; (2) an automated pipeline that uses LLMs to generate high‑quality, theory‑consistent stimuli; (3) a multimodal extension incorporating facial expressions; and (4) empirical evidence that current LLMs can perform affective inferences at or above human levels. Limitations are acknowledged: only two binary appraisal dimensions are explored, the emotion taxonomy is limited to a small set of discrete labels, and the study does not disentangle whether models truly “understand” emotions or merely retrieve patterns from training data. Future work should expand the dimensionality of appraisals, incorporate continuous and mixed‑emotion representations, and probe the internal mechanisms underlying affective reasoning.

Overall, the paper demonstrates that large language models are capable of human-like affective cognition, especially when prompted to reason step by step. This opens avenues for more emotionally intelligent AI assistants, empathetic dialogue systems, and AI-augmented psychological support, while also highlighting the need for rigorous validation and ethical safeguards in affective AI applications.

