From Retrieving Information to Reasoning with AI: Exploring Different Interaction Modalities to Support Human-AI Coordination in Clinical Decision-Making

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

LLMs are popular among clinicians for decision support because of their simple text-based interaction, yet their impact on clinicians' performance remains ambiguous. Without knowing how clinicians use this new technology and how they compare it to traditional clinical decision-support systems (CDSS), it is difficult to design novel mechanisms that overcome existing tool limitations and enhance performance and experience. This qualitative study examines how clinicians (n=12) perceive different interaction modalities (text-based conversation with LLMs, interactive and static UIs, and voice) for decision support. In open-ended use of LLM-based tools, our participants took a tool-centric approach, using them for information retrieval and confirmation with simple prompts rather than as active deliberation partners that can handle complex questions. Critical engagement emerged with changes to the interaction setup, and engagement also differed with individual cognitive styles. Lastly, the benefits and drawbacks of text, voice, and traditional UIs for clinical decision support show the lack of a one-size-fits-all interaction modality.


💡 Research Summary

This paper investigates how clinicians interact with large language models (LLMs) for clinical decision support across three distinct interaction modalities: free‑text conversational chat, visual user interfaces (UIs) that present AI reasoning artifacts, and voice‑based systems resembling ambient scribes. Twelve physicians from varied specialties participated in a qualitative study in which they tackled complex clinical vignettes using each modality in turn. Semi‑structured interviews conducted after each session, together with behavioral logs, were analyzed to uncover patterns of use, cognitive load, and perceived trust.

The findings reveal a clear divergence in how each modality shapes the clinician‑AI partnership. In the text‑chat condition, participants treated the LLM primarily as an information‑retrieval tool. They issued short prompts, skimmed the model’s output for a single confirming cue, and moved on, rarely engaging the model in deeper reasoning. This “tool‑centric” behavior aligns with prior observations of automation bias, where users rely on AI outputs without thorough verification.

Conversely, the visual UI externalized the AI’s reasoning process. Diagnostic lists, evidence citations, and confidence scores were displayed side‑by‑side, allowing clinicians to compare multiple hypotheses, critique the AI’s logic, and iteratively refine the decision. Dynamic UI elements (e.g., drag‑and‑drop hypothesis reordering, clickable evidence) further encouraged critical engagement, supporting the formation of shared mental models between human and AI—a core tenet of human‑AI teaming literature.
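
The paper describes these UI affordances qualitatively rather than as an implementation. As a rough illustration, the reasoning artifacts such an interface exposes could be modeled with a structure like the following Python sketch; all names are hypothetical, not drawn from the study's system.

```python
from dataclasses import dataclass, field

# Hypothetical data model for the reasoning artifacts the UI condition
# surfaces: a ranked differential in which each hypothesis carries a
# confidence score and clickable evidence, and which the clinician can
# reorder. Names are illustrative, not from the paper.

@dataclass
class Evidence:
    source: str    # e.g., a guideline or case finding the AI cites
    excerpt: str   # the supporting snippet shown when clicked

@dataclass
class Hypothesis:
    diagnosis: str
    confidence: float                       # model-reported, in [0, 1]
    evidence: list[Evidence] = field(default_factory=list)

@dataclass
class DifferentialView:
    """Side-by-side differential the clinician can critique and reorder."""
    hypotheses: list[Hypothesis]

    def reorder(self, new_order: list[int]) -> None:
        # Drag-and-drop reordering: the clinician's ranking overrides
        # the model's, keeping the human in control of the final list.
        self.hypotheses = [self.hypotheses[i] for i in new_order]
```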

The voice modality was viewed less favorably. While hands‑free input was appreciated, the auditory‑only output made it difficult to parse complex data, and ambient noise introduced additional cognitive load. Participants reported “attention‑switching costs” and occasional transcription errors, leading to a perception that voice is disruptive in high‑stakes decision making.

Individual cognitive styles moderated modality preferences. Visual‑dominant clinicians gravitated toward the UI, conversational‑oriented clinicians favored the chat, and multitasking clinicians saw voice as a supplemental input channel. Moreover, when the LLM was framed as a “specialist” and prompts were structured like formal clinical notes, participants engaged more deeply, suggesting that role framing and prompt scaffolding can shift the AI from a simple lookup engine to a collaborative reasoning partner.
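
The study reports the effect of this framing rather than an exact prompt, but a minimal sketch of the two moves combined, addressing the model as a specialist and structuring the query like a clinical note, might look like the following; the template wording is ours, not the authors'.

```python
# Illustrative prompt scaffold: role framing ("consulting specialist")
# plus a note-style structure for the case description.

SPECIALIST_PROMPT = """You are a consulting {specialty} specialist.

Chief complaint: {chief_complaint}
History of present illness: {hpi}
Relevant findings: {findings}

Walk through your differential diagnosis step by step, stating the
evidence for and against each candidate before recommending next steps."""

def build_prompt(specialty: str, chief_complaint: str,
                 hpi: str, findings: str) -> str:
    """Fill the note-style template for a given case."""
    return SPECIALIST_PROMPT.format(
        specialty=specialty,
        chief_complaint=chief_complaint,
        hpi=hpi,
        findings=findings,
    )
```

Filled this way, a vignette reads to the model like a consult request rather than a lookup query, which matches the shift in stance the participants' deeper engagement suggests.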

Design implications emerge from these insights. First, a multimodal approach that separates input (voice) from output (visual reasoning artifacts) can balance convenience with the need for thorough evaluation. Second, positioning the LLM as a specialist and providing structured note‑style templates encourage richer interaction. Third, UI design should prioritize evidence‑based summaries and comparative lists over raw probability statements, which were found to increase cognitive load. Finally, systems should adapt to users’ cognitive preferences, offering customizable modality bundles.
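
As a sketch of the first and last implications, a system could expose modality preferences as a small, user-selectable configuration that pairs an input channel with an output channel; the types below are illustrative assumptions, not part of the paper.

```python
from dataclasses import dataclass
from enum import Enum

class Modality(Enum):
    TEXT = "text"
    VOICE = "voice"
    VISUAL = "visual"

@dataclass
class ModalityBundle:
    """User-selectable pairing of input and output channels."""
    input_channel: Modality
    output_channel: Modality

# One bundle the summary's first implication suggests: speak the query
# for convenience, review the reasoning visually for thorough evaluation.
DEFAULT_BUNDLE = ModalityBundle(input_channel=Modality.VOICE,
                                output_channel=Modality.VISUAL)
```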

In sum, the study demonstrates that the effectiveness of LLM‑augmented clinical decision support is less about the raw capabilities of the model and more about how interaction design makes AI reasoning visible, comparable, and open to challenge. A one‑size‑fits‑all interface does not exist; instead, context‑aware, multimodal, and user‑personalized designs are required to transform LLMs from fact‑retrieval tools into true collaborative partners in clinical reasoning.

