Metacognitive Demands and Strategies While Using Off-The-Shelf AI Conversational Agents for Health Information
As Artificial Intelligence (AI) conversational agents become widespread, people are increasingly using them to seek health information. Off-the-shelf conversational agents could place high metacognitive demands (the need for extensive monitoring and control of one's own thought processes) on individuals, which could compromise their experience of seeking health information. However, the specific demands that arise while using conversational agents for health information seeking, and the strategies people use to cope with those demands, remain unknown. To address these gaps, we conducted a think-aloud study with 15 participants as they sought health information using an off-the-shelf AI conversational agent. We identify the metacognitive demands such systems impose and the strategies people adopt in response, and we propose considerations for designing beyond off-the-shelf interfaces to reduce these demands and better support user experiences in health information seeking.
💡 Research Summary
The paper investigates the metacognitive demands placed on users when they seek health information through off‑the‑shelf AI conversational agents (e.g., ChatGPT, Claude, Gemini) and the strategies people employ to cope with those demands. Recognizing that current general‑purpose chat interfaces are not designed for health‑focused tasks, the authors conducted a think‑aloud study with fifteen participants. Six clinically plausible health‑seeking scenarios (allergies, insomnia, digestive issues, migraines, diabetes, high blood pressure) were co‑designed with an internal‑medicine physician. Each participant was randomly assigned two scenarios and asked to imagine themselves in those situations while using a custom‑built conversational agent powered by the GPT‑4o API. The custom UI mirrored typical commercial chat windows but added side‑by‑side scenario text and logged all interactions for analysis.
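The paper does not publish its interface code, but a study setup like this can be approximated with a thin wrapper around a chat API that also logs every exchange for later qualitative coding. The sketch below is a hypothetical illustration under that assumption; the function names and log schema are invented for clarity, not taken from the paper.

```python
import time

def build_messages(history, user_turn):
    """Append the new user turn to the running conversation history
    (OpenAI-style role/content message dicts)."""
    return history + [{"role": "user", "content": user_turn}]

def log_interaction(log, user_turn, assistant_reply):
    """Record one user/assistant exchange with a timestamp so the
    transcript can later be qualitatively coded."""
    log.append({
        "time": time.time(),
        "user": user_turn,
        "assistant": assistant_reply,
    })
    return log

# The model call itself would require an API key and network access,
# so it is shown here only in outline (assumed, not from the paper):
# from openai import OpenAI
# client = OpenAI()
# resp = client.chat.completions.create(model="gpt-4o",
#                                       messages=build_messages(history, turn))
```

In a real study interface, the logged records would be joined with the think-aloud audio timestamps so each utterance can be aligned with the chat turn it refers to.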
Qualitative coding of the think‑aloud transcripts revealed five major categories of metacognitive demand: (1) Goal formulation and question specification – users often began with vague queries and needed to iteratively refine prompts; (2) Comprehension and content verification – participants had to monitor their understanding of AI responses, especially when jargon or ambiguous statements appeared; (3) Trust calibration and uncertainty awareness – without explicit confidence cues, users judged reliability by internal heuristics, leading to over‑ or under‑confidence; (4) Information integration and contextual mapping – synthesizing multiple AI outputs into a coherent personal health narrative required active monitoring; and (5) Decision‑making and action planning – users ultimately decided whether to seek professional care or adjust self‑care based on the AI’s advice.
To manage these demands, participants employed seven strategies: (a) Prompt restructuring and segmentation, breaking complex questions into smaller parts; (b) External verification via web searches, medical websites, or consulting a professional; (c) Note‑taking and keyword logging to keep track of salient points; (d) Self‑assigned uncertainty labels (e.g., “possible,” “needs confirmation”) to flag doubtful information; (e) Comparing multiple AI answers by re‑asking or re‑phrasing the same question; (f) Visual/structural organization such as tables, lists, or mind‑maps; and (g) Time and effort budgeting, limiting the depth of inquiry to avoid fatigue.
The authors analyze the effectiveness of each strategy, noting that while external verification and multiple‑answer comparison reduced false confidence, they also introduced information overload and increased cognitive load. Prompt restructuring was helpful but required a baseline skill in formulating effective queries, which many participants lacked.
Based on these findings, the paper proposes five design considerations for future health‑focused conversational agents: (1) Provide prompt templates and guided question scaffolding to aid goal setting; (2) Display real‑time confidence scores, source citations, and uncertainty visualizations alongside AI responses; (3) Offer alternative answer options or “show other perspectives” to mitigate over‑reliance on a single synthesis; (4) Integrate metacognitive support tools such as reflection checklists or a “review my reasoning” button that prompts users to articulate their thought process; and (5) Include automatic summarization and structuring features (e.g., auto‑generated tables or bullet points) to ease information integration.
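Design consideration (1), prompt templates and guided question scaffolding, can be made concrete with a small example. The sketch below is a minimal, hypothetical scaffold (the slot names are assumptions, not the authors' design): it prompts the user for the structured pieces of a well-formed health query so they do not have to formulate one from scratch.

```python
def scaffold_health_prompt(symptom, duration, tried_so_far):
    """Assemble a structured health query from guided slots, reducing the
    goal-formulation burden identified in the study's first demand category."""
    return (
        f"I have been experiencing {symptom} for {duration}. "
        f"So far I have tried: {tried_so_far}. "
        "What are possible causes, and when should I see a professional?"
    )

# Example: a user fills in the guided slots instead of writing a free-form query.
prompt = scaffold_health_prompt(
    "migraines", "two weeks", "over-the-counter painkillers"
)
```

A production interface would likely offer such templates per scenario type and let users edit the assembled prompt before sending it.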
The study contributes (i) an empirically grounded taxonomy of metacognitive demands in health‑information seeking with AI chatbots, (ii) a catalog of user‑generated coping strategies, and (iii) concrete UI design recommendations aimed at reducing cognitive burden and improving safety. Limitations include the small sample size, a focus on relatively common health concerns, and the use of a single custom‑built interface rather than multiple commercial platforms. Future work should explore diverse user populations, more complex clinical scenarios, and longitudinal effects of metacognitive support features on health outcomes and trust calibration.