Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support

Reading time: 5 minutes
...

📝 Original Info

  • Title: Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support
  • ArXiv ID: 2512.07801
  • Date: 2025-12-08
  • Authors: Raunak Jain, Mudita Khurana

📝 Abstract

LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions. Sensemaking (the ability to co-construct causal explanations, surface uncertainties, and adapt goals) is the key capability that current training pipelines do not explicitly develop or evaluate. We propose Collaborative Causal Sensemaking (CCS) as a research agenda to develop this capability from the ground up, spanning new training environments that reward collaborative thinking, representations for shared human-AI mental models, and evaluation centred on trust and complementarity. Taken together, these directions shift MAS research from building oracle-like answer engines to cultivating AI teammates that co-reason with their human partners over the causal structure of shared decisions, advancing the design of effective human-AI teams.


📄 Full Content

Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human–AI Decision Support

Raunak Jain (Intuit, Mountain View, California, USA; raunak_jain1@intuit.com)
Mudita Khurana (Airbnb, San Francisco, California, USA; mudita.khurana@airbnb.com)

ABSTRACT

LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions. Sensemaking (the ability to co-construct causal explanations, surface uncertainties, and adapt goals) is the key capability that current training pipelines do not explicitly develop or evaluate. We propose Collaborative Causal Sensemaking (CCS) as a research agenda to develop this capability from the ground up, spanning new training environments that reward collaborative thinking, representations for shared human-AI mental models, and evaluation centred on trust and complementarity. Taken together, these directions shift MAS research from building oracle-like answer engines to cultivating AI teammates that co-reason with their human partners over the causal structure of shared decisions, advancing the design of effective human–AI teams.

KEYWORDS

Human-AI Collaboration, Multi-Agent Systems, Collaborative Sensemaking, Causal Reasoning, Human–AI Complementarity, Trust

1 INTRODUCTION

Multi-agent systems (MAS) built from large language model (LLM) agents are increasingly positioned as decision-support teammates for humans in domains such as personalisation, planning, and multi-objective optimisation, where consequences are delayed, uncertain, and value-laden [1–5]. While AI assistants have unlocked productivity gains in verifiable domains like coding and translation, empirical work in decision-making under uncertainty reveals a persistent complementarity gap: where judgement is subjective and verification is costly, human–AI teams frequently underperform the best individual agent [6–10]. For next-generation MAS, this is not a minor usability flaw but a core systems failure: agents that cannot sustain calibrated, shared understanding with their human partners will systematically mis-coordinate, even if their standalone predictions are strong.

A growing body of studies documents characteristic failure modes that undermine calibrated trust. Users over-weight confident model outputs even when these conflict with domain expertise, exhibiting automation bias and over-reliance [11–14]. Verification-and-correction loops can erase efficiency gains, as experts feel compelled to second-guess model suggestions step by step [6, 7, 14]. Alignment methods that reward agreement and user satisfaction can induce sycophancy, where models collapse to the user's prior beliefs even when these conflict with evidence [15, 16]. This is fatal for sensemaking, which by definition requires the repair and restructuring of mental models, not merely their confirmation [17, 18]. The result is trust poorly calibrated to actual competence: humans rely on agents for fluency rather than causal reasoning [19–21].
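These failure modes are measurable. As a minimal illustration (not part of the paper), the sketch below assumes per-trial study logs with hypothetical fields human_correct, ai_correct, team_correct, and followed_ai, and computes the complementarity gap together with over- and under-reliance rates of the kind such studies report.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Trial:
    """One decision trial from a human-AI team study (hypothetical schema)."""
    human_correct: bool   # human deciding alone
    ai_correct: bool      # AI deciding alone
    team_correct: bool    # final joint decision
    followed_ai: bool     # did the human adopt the AI's recommendation?

def mean(xs: List[bool]) -> float:
    return sum(xs) / len(xs) if xs else float("nan")

def complementarity_gap(trials: List[Trial]) -> float:
    """Team accuracy minus the best solo accuracy; positive means genuine complementarity."""
    best_solo = max(mean([t.human_correct for t in trials]),
                    mean([t.ai_correct for t in trials]))
    return mean([t.team_correct for t in trials]) - best_solo

def over_reliance(trials: List[Trial]) -> float:
    """P(human follows the AI | the AI is wrong): automation bias / over-reliance."""
    wrong = [t for t in trials if not t.ai_correct]
    return mean([t.followed_ai for t in wrong])

def under_reliance(trials: List[Trial]) -> float:
    """P(human rejects the AI | the AI is right): costly second-guessing."""
    right = [t for t in trials if t.ai_correct]
    return mean([not t.followed_ai for t in right])
```

In these terms, the complementarity gap the authors describe is a team accuracy that fails to exceed the best solo accuracy, with over-reliance and costly second-guessing as the two characteristic failure rates behind it.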
Current training pipelines do not address this. Preference-based alignment (RLHF, DPO, and variants) shapes outputs toward helpfulness and safety [22–26]; reasoning methods (chain-of-thought, RL with verifiable rewards, process supervision) make multi-step reasoning instrumentally useful [27–31]; and world-model approaches train predictive models of environment dynamics [32, 33]. However, these methods optimise for solitary performance: they align the agent to a label, a verifier, or a simulator, not to the evolving mental model of a partner. Any collaborative sensemaking that emerges in current systems is incidental, not a first-class optimisation target.

Richer ecologies offer a complementary lever: multi-agent and open-ended environments show that strategies, tool use, and social conventions emerge when long horizons, other agents, and strategic feedback make them instrumentally valuable [34–38]. To achieve genuine complementarity, we need training ecologies where collaborative friction (disagreement, clarification, and re-framing) can emerge because the environment makes such behaviours rewarding.

Cognitive science shows that humans reason through structured mental models [17, 39–41], and team effectiveness depends on these models being sufficiently aligned [42–44]. Co-constructing causal structure improves trust and decisions [45, 46]; constructivist accounts show that learners acquire causal understanding by active exploration, not passive instruction [47, 48]. In expert settings there is no single canonical world model available during collaboration, only perspectival models held by particular humans. To be effective, an agent must align with the expert's causal framing not
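The agenda's second ingredient, representations for shared human-AI mental models, could be made concrete in many ways, and the paper does not commit to one. A deliberately simple, assumed reading is sketched below: each partner's causal framing is a set of directed edges over decision-relevant variables (the variable names here are invented), and the disputed edges are surfaced as candidates for clarification or evidence-seeking.

```python
from typing import Dict, Set, Tuple

# A causal framing as a set of directed edges (cause -> effect) over named variables.
CausalGraph = Set[Tuple[str, str]]

# Hypothetical example: an expert's framing of a retention decision vs. the agent's.
expert_model: CausalGraph = {("pricing", "churn"), ("support_latency", "churn"),
                             ("churn", "revenue")}
agent_model: CausalGraph = {("pricing", "churn"), ("marketing_spend", "revenue"),
                            ("churn", "revenue")}

def model_divergence(a: CausalGraph, b: CausalGraph) -> Dict[str, object]:
    """Summarise where two causal framings disagree (a structural-Hamming-style view)."""
    return {
        "shared": a & b,          # edges both partners assert
        "only_expert": a - b,     # expert beliefs the agent has not adopted
        "only_agent": b - a,      # agent beliefs the expert has not endorsed
        "distance": len(a ^ b),   # number of disputed edges
    }

if __name__ == "__main__":
    report = model_divergence(expert_model, agent_model)
    # Disputed edges are natural targets for the "collaborative friction"
    # (clarification, disagreement, re-framing) the agenda wants environments to reward.
    print(report["distance"], sorted(report["only_expert"] | report["only_agent"]))
```

In a training ecology of the kind the authors call for, a signal like this disputed-edge set could in principle feed the reward, so that repairing the shared model becomes a first-class optimisation target rather than an incidental byproduct.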


Reference

This content is AI-processed based on open access ArXiv data.
