SocialVeil: Probing Social Intelligence of Language Agents under Communication Barriers

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Large language models (LLMs) are increasingly evaluated in interactive environments to test their social intelligence. However, existing benchmarks often assume idealized communication between agents, limiting our ability to diagnose whether LLMs can maintain and repair interactions in more realistic, imperfect settings. To close this gap, we present SocialVeil, a social learning environment that simulates social interaction under cognitive-difference-induced communication barriers. Grounded in a systematic literature review of communication challenges in human interaction, SocialVeil introduces three representative types of such disruption: semantic vagueness, sociocultural mismatch, and emotional interference. We also introduce two barrier-aware evaluation metrics, unresolved confusion and mutual understanding, to evaluate interaction quality under impaired communication. Experiments across 720 scenarios and four frontier LLMs show that barriers consistently impair performance, with mutual understanding reduced by over 45% on average and confusion elevated by nearly 50%. Human evaluations validate the fidelity of these simulated barriers (ICC ≈ 0.78, Pearson r ≈ 0.80). We further demonstrate that adaptation strategies (Repair Instruction and Interactive Learning) have only a modest effect, far from barrier-free performance. This work takes a step toward bringing social interaction environments closer to real-world communication, opening opportunities for exploring the social intelligence of LLM agents.


💡 Research Summary

The paper addresses a critical gap in the evaluation of large language models (LLMs) for social intelligence: most existing benchmarks assume ideal, noise‑free communication between agents, ignoring the myriad barriers that routinely affect human conversation. To bridge this gap, the authors introduce SocialVeil, an interactive social‑learning environment that deliberately injects three cognitively‑driven communication barriers—semantic vagueness, sociocultural mismatch, and emotional interference—into dialogues between two agents.

Barrier taxonomy and theoretical grounding
Through a systematic literature review, the authors identify the three barrier categories and anchor each in well‑established theories: Grice’s maxims and fuzzy logic for semantic vagueness, Brown & Levinson’s politeness theory and the Sapir‑Whorf hypothesis of linguistic relativity for sociocultural mismatch, and affect‑cognition models (Eysenck, Gross, Lerner & Keltner) for emotional interference. Concrete real‑world examples illustrate how each barrier manifests in everyday speech.

Design and implementation
SocialVeil creates episodes adapted from the SOTOPIA benchmark, neutralizing public scenario descriptions to prevent goal leakage. Each episode involves a barrier agent and a partner agent; the barrier agent receives a style prompt (P_b) and a set of quantitative parameters (R_b) that operationalize the chosen barrier across four dimensions: narrative stance, interaction tactics, confusion mechanisms, and exemplar templates. The partner agent receives the standard prompt only, ensuring that any disruption originates solely from the barrier agent. This unilateral design mirrors realistic asymmetries (e.g., a colleague who is habitually indirect). Dialogue proceeds turn‑by‑turn until 20 turns or an early exit.
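The asymmetric setup described above can be sketched as a short simulation loop. The prompt text, parameter names, and agent interfaces below are illustrative assumptions, not the paper's actual P_b or R_b contents; the sketch only shows the structure (a styled barrier prompt plus quantitative parameters on one side, and a turn-by-turn loop capped at 20 turns with an early-exit path):

```python
# Illustrative sketch of SocialVeil's unilateral barrier design.
# P_b (style prompt) and R_b (parameters) are hypothetical stand-ins.

BARRIER_STYLE_PROMPT = (  # P_b: qualitative style instruction
    "Speak with semantic vagueness: hedge, and leave referents underspecified."
)
BARRIER_PARAMS = {        # R_b: knobs across the four dimensions named in the paper
    "narrative_stance": "evasive",
    "interaction_tactics": "deflect_clarification",
    "confusion_mechanisms": "ambiguous_referents",
    "exemplar_templates": ["It's sort of about that thing we discussed..."],
}

def make_barrier_system_prompt(style_prompt: str, params: dict) -> str:
    """Combine the style prompt (P_b) with the parameter block (R_b)."""
    lines = [style_prompt] + [f"{dim}: {val}" for dim, val in params.items()]
    return "\n".join(lines)

def run_episode(barrier_agent, partner_agent, max_turns: int = 20):
    """Alternate turns until max_turns is reached or an agent exits early.

    Each agent is a callable that maps the transcript so far to an
    utterance, or to None to signal an early exit.
    """
    transcript = []
    speakers = [barrier_agent, partner_agent]
    for turn in range(max_turns):
        speaker = speakers[turn % 2]
        utterance = speaker(transcript)
        if utterance is None:  # early exit, e.g. the agent leaves the conversation
            break
        transcript.append((speaker.__name__, utterance))
    return transcript
```

In the actual environment the two callables would wrap LLM calls, with only the barrier agent receiving the combined P_b/R_b prompt on top of the standard one.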

Evaluation protocol
Beyond conventional goal‑oriented metrics (goal completion, relationship quality, knowledge acquisition), SocialVeil introduces two barrier‑aware measures:

  1. Unresolved Confusion – a 5‑point Likert rating of remaining ambiguity at dialogue end.
  2. Mutual Understanding – a 5‑point Likert rating of alignment on shared context and goals.

Both are collected from human annotators; automatic scores are shown to correlate strongly with human judgments (Pearson r ≈ 0.80, ICC ≈ 0.78), confirming metric reliability.

Experimental setup
The authors generate 180 episodes for each barrier type and a baseline (no barrier), totaling 720 scenarios, and evaluate four frontier LLMs: GPT‑4o‑mini, GPT‑4o, Claude‑2, and Llama‑2‑70B. Two difficulty splits are used (standard + hard, hard‑only).

Key findings

  • Barrier impact – All three barriers significantly degrade performance. On average, mutual understanding drops by >45 %, and unresolved confusion rises by ~50 % compared with the barrier‑free condition. Semantic vagueness leads to a 58 % reduction in mutual understanding; emotional interference cuts relationship quality by 49 %.
  • Human validation – Annotators correctly identify barrier types with 68 % accuracy, and their ratings align closely with automatic metrics, supporting the ecological validity of the simulated barriers.
  • Adaptation strategies – Two interventions are tested: (a) Repair Instruction, a static prompt that tells the model to ask clarification questions; (b) Interactive Learning, a fine‑tuning loop where the model receives feedback from simulated partners. Both yield modest gains but fail to close the performance gap to the no‑barrier baseline, indicating that current LLMs lack robust repair mechanisms.
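Of the two interventions, Repair Instruction is the simpler: a fixed instruction prepended to an agent's prompt. A minimal sketch, assuming a hypothetical wording (the paper's actual instruction text is not reproduced here):

```python
# Hypothetical sketch of the static "Repair Instruction" intervention.
# The instruction wording below is illustrative, not the paper's prompt.
REPAIR_INSTRUCTION = (
    "If anything your partner says is ambiguous or confusing, ask a "
    "clarification question before pursuing your own goal."
)

def with_repair_instruction(system_prompt: str) -> str:
    """Prepend the static repair instruction to an agent's system prompt."""
    return f"{REPAIR_INSTRUCTION}\n\n{system_prompt}"
```

Because the instruction is static, it cannot adapt to which barrier type is active, which is consistent with the modest gains reported.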

Limitations and future work
The study’s unilateral barrier injection does not capture scenarios where both interlocutors experience barriers simultaneously. The reliance on prompt‑based barrier instantiation may oversimplify the richness of human affect and cultural nuance. Future research directions include multi‑agent barrier interactions, graded barrier intensity, and validation with real human‑LLM conversations.

Conclusion
SocialVeil provides the first systematic, controllable framework for probing LLM social intelligence under realistic communication disruptions. By demonstrating that even state‑of‑the‑art models falter dramatically when faced with semantic, cultural, or emotional barriers, the work highlights a crucial frontier for developing truly resilient, socially aware AI systems.

