Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support
📝 Original Info
Title: Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human-AI Decision Support
ArXiv ID: 2512.07801
Date: 2025-12-08
Authors: Raunak Jain, Mudita Khurana
📝 Abstract
LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions. Sensemaking (the ability to co-construct causal explanations, surface uncertainties, and adapt goals) is the key capability that current training pipelines do not explicitly develop or evaluate. We propose Collaborative Causal Sensemaking (CCS) as a research agenda to develop this capability from the ground up, spanning new training environments that reward collaborative thinking, representations for shared human-AI mental models, and evaluation centred on trust and complementarity. Taken together, these directions shift MAS research from building oracle-like answer engines to cultivating AI teammates that co-reason with their human partners over the causal structure of shared decisions, advancing the design of effective human-AI teams.
📄 Full Content
Collaborative Causal Sensemaking: Closing the Complementarity Gap in Human–AI Decision Support
Raunak Jain
Intuit
Mountain View, California, USA
raunak_jain1@intuit.com
Mudita Khurana
Airbnb
San Francisco, California, USA
mudita.khurana@airbnb.com
ABSTRACT
LLM-based agents are increasingly deployed for expert decision support, yet human-AI teams in high-stakes settings do not yet reliably outperform the best individual. We argue this complementarity gap reflects a fundamental mismatch: current agents are trained as answer engines, not as partners in the collaborative sensemaking through which experts actually make decisions. Sensemaking (the ability to co-construct causal explanations, surface uncertainties, and adapt goals) is the key capability that current training pipelines do not explicitly develop or evaluate. We propose Collaborative Causal Sensemaking (CCS) as a research agenda to develop this capability from the ground up, spanning new training environments that reward collaborative thinking, representations for shared human-AI mental models, and evaluation centred on trust and complementarity. Taken together, these directions shift MAS research from building oracle-like answer engines to cultivating AI teammates that co-reason with their human partners over the causal structure of shared decisions, advancing the design of effective human–AI teams.
KEYWORDS
Human-AI Collaboration, Multi-Agent Systems, Collaborative Sensemaking, Causal Reasoning, Human–AI Complementarity, Trust
1 INTRODUCTION
Multi-agent systems (MAS) built from large language model (LLM) agents are increasingly positioned as decision-support teammates for humans in domains such as personalisation, planning, and multi-objective optimisation, where consequences are delayed, uncertain, and value-laden [1–5]. While AI assistants have unlocked productivity gains in verifiable domains like coding and translation, empirical work in decision-making under uncertainty reveals a persistent complementarity gap: where judgement is subjective and verification is costly, human–AI teams frequently underperform the best individual agent [6–10]. For next-generation MAS, this is not a minor usability flaw but a core systems failure: agents that cannot sustain calibrated, shared understanding with their human partners will systematically mis-coordinate, even if their standalone predictions are strong.
A growing body of studies documents characteristic failure modes that undermine calibrated trust. Users over-weight confident model outputs even when these conflict with domain expertise, exhibiting automation bias and over-reliance [11–14]. Verification-and-correction loops can erase efficiency gains, as experts feel compelled to second-guess model suggestions step by step [6, 7, 14]. Alignment methods that reward agreement and user satisfaction can induce sycophancy, where models collapse to the user's prior beliefs even when these conflict with evidence [15, 16]. This is fatal for sensemaking, which by definition requires the repair and restructuring of mental models, not merely their confirmation [17, 18]. The result is trust poorly calibrated to actual competence: humans rely on agents for fluency rather than causal reasoning [19–21].
Current training pipelines do not address this. Preference-based alignment (RLHF, DPO, and variants) shapes outputs toward helpfulness and safety [22–26]; reasoning methods (chain-of-thought, RL with verifiable rewards, process supervision) make multi-step reasoning instrumentally useful [27–31]; and world-model approaches train predictive models of environment dynamics [32, 33]. However, these methods optimise for solitary performance: they align the agent to a label, a verifier, or a simulator, not to the evolving mental model of a partner. Any collaborative sensemaking that emerges in current systems is incidental, not a first-class optimisation target. Richer ecologies offer a complementary lever: multi-agent and open-ended environments show that strategies, tool use, and social conventions emerge when long horizons, other agents, and strategic feedback make them instrumentally valuable [34–38]. To achieve genuine complementarity, we need training ecologies where collaborative friction (disagreement, clarification, and re-framing) can emerge because the environment makes such behaviours rewarding.
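As a deliberately simplified illustration of what it could mean for an environment to make collaborative friction rewarding, the sketch below contrasts an answer-only objective with a shaped reward that also gives bounded credit to clarification requests and evidence-backed disagreement within an interaction episode. The episode structure, turn types, and weights (`Turn`, `Episode`, `w_clarify`, `w_disagree`) are hypothetical placeholders for this sketch, not constructs proposed in the paper.

```python
# Illustrative sketch only: reward shaping for "collaborative friction".
# Not the paper's method; all structures and weights are hypothetical.
from dataclasses import dataclass, field
from typing import List


@dataclass
class Turn:
    """One agent turn in a collaborative decision-support episode."""
    kind: str                 # e.g. "answer", "clarify", or "disagree"
    justified: bool = False   # did the turn cite evidence or causal structure?


@dataclass
class Episode:
    """Trace of a single human-agent interaction (hypothetical structure)."""
    turns: List[Turn] = field(default_factory=list)
    final_answer_correct: bool = False


def answer_only_reward(ep: Episode) -> float:
    """Conventional 'answer engine' objective: only final correctness counts."""
    return 1.0 if ep.final_answer_correct else 0.0


def collaborative_reward(ep: Episode,
                         w_clarify: float = 0.1,
                         w_disagree: float = 0.2,
                         max_bonus: float = 0.5) -> float:
    """Shaped objective: correctness plus bounded credit for collaborative friction.

    Clarification requests and evidence-backed disagreement earn a capped bonus,
    so the policy is not pushed toward sycophantic agreement or empty questioning.
    """
    bonus = sum(w_clarify for t in ep.turns if t.kind == "clarify")
    bonus += sum(w_disagree for t in ep.turns if t.kind == "disagree" and t.justified)
    return answer_only_reward(ep) + min(bonus, max_bonus)


# Example: the agent asks one clarifying question and raises one justified
# objection before the team converges on a correct joint decision.
episode = Episode(
    turns=[Turn("clarify"), Turn("disagree", justified=True), Turn("answer")],
    final_answer_correct=True,
)
print(answer_only_reward(episode))    # 1.0
print(collaborative_reward(episode))  # 1.3
```

In a full training ecology the credit for such behaviours would presumably come from the environment or a learned judge rather than hand-set weights; the point of the sketch is only that clarification and justified disagreement become instrumentally valuable rather than being treated as friction to minimise.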
Cognitive science shows that humans reason through structured mental models [17, 39–41], and team effectiveness depends on these models being sufficiently aligned [42–44]. Co-constructing causal structure improves trust and decisions [45, 46]; constructivist accounts show that learners acquire causal understanding by active exploration, not passive instruction [47, 48]. In expert settings there is no single canonical world model available during collaboration, only perspectival models held by particular humans. To be effective, an agent must align with the expert's causal framing not