MedClarify: An information-seeking AI agent for medical diagnosis with case-specific follow-up questions

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Large language models (LLMs) are increasingly used for diagnostic tasks in medicine. In clinical practice, the correct diagnosis can rarely be inferred immediately from the initial patient presentation alone. Rather, reaching a diagnosis often involves systematic history taking, during which clinicians reason over multiple potential conditions through iterative questioning to resolve uncertainty. This process requires considering differential diagnoses and actively excluding emergencies that demand immediate intervention. Yet the ability of medical LLMs to generate informative follow-up questions, and thus to reason over differential diagnoses, remains underexplored. Here, we introduce MedClarify, an information-seeking AI agent that generates follow-up questions for iterative reasoning to support diagnostic decision-making. Specifically, MedClarify computes a list of candidate diagnoses analogous to a differential diagnosis and then proactively generates follow-up questions aimed at reducing diagnostic uncertainty. By selecting the question with the highest expected information gain, MedClarify enables targeted, uncertainty-aware reasoning to improve diagnostic performance. In our experiments, we first demonstrate the limitations of current LLMs in medical reasoning, which often yield multiple, similarly likely diagnoses, especially when patient cases are incomplete or information relevant for diagnosis is missing. We then show that our information-theoretic reasoning approach generates effective follow-up questions and thereby reduces diagnostic errors by ~27 percentage points (p.p.) compared to a standard single-shot LLM baseline. Altogether, MedClarify offers a path to improve medical LLMs through agentic information-seeking and thus to promote effective dialogues with medical LLMs that reflect the iterative and uncertain nature of real-world clinical reasoning.


💡 Research Summary

The paper introduces MedClarify, an information‑seeking AI agent designed to emulate the iterative history‑taking process that clinicians use to resolve diagnostic uncertainty. While large language models (LLMs) have shown promise for medical diagnosis, they typically operate in a static, single‑shot fashion: given a patient description, they output one diagnosis or a short list. This approach fails when the initial case is incomplete—a common situation in real practice—because many plausible conditions remain indistinguishable without targeted follow‑up questions.

MedClarify addresses this gap by first generating a differential diagnosis set with associated confidence scores using an LLM (the authors primarily use Llama 3.3‑70B). Each candidate diagnosis is mapped to an ICD‑11 code, allowing the system to compute a semantic similarity matrix between diagnoses. Instead of naïve entropy, the model calculates a “diagnostic entropy” that discounts uncertainty contributed by closely related conditions, thereby focusing on truly distinct disease branches.
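A similarity-discounted entropy of this kind can be sketched as follows. The specific form below — discounting each diagnosis's surprisal by the similarity-weighted probability mass of its neighbours, in the style of similarity-sensitive diversity measures — is an illustrative assumption, as are the function and variable names; the paper's exact definition may differ.

```python
import numpy as np

def diagnostic_entropy(p, S):
    """Entropy of a diagnosis distribution p, discounted by a pairwise
    similarity matrix S (S[i, j] in [0, 1], S[i, i] = 1).
    Uncertainty split across near-identical diagnoses contributes less
    than uncertainty split across distinct disease branches.
    With S = I, this reduces to plain Shannon entropy."""
    p = np.asarray(p, dtype=float)
    p = p / p.sum()                 # normalise the LLM's confidence scores
    mass = S @ p                    # similarity-weighted mass around each diagnosis
    return float(-(p * np.log(mass)).sum())

# Three candidates: two closely related conditions and one distinct one.
p = [0.4, 0.4, 0.2]
S_related = np.array([[1.0, 0.9, 0.1],
                      [0.9, 1.0, 0.1],
                      [0.1, 0.1, 1.0]])
S_distinct = np.eye(3)              # identity similarity: plain Shannon entropy

# Entropy is lower when the competing diagnoses are semantically similar,
# so uncertainty between ICD-11 "siblings" is penalised less.
assert diagnostic_entropy(p, S_related) < diagnostic_entropy(p, S_distinct)
```

The design intuition is that two diagnoses sharing an ICD-11 branch often imply similar immediate management, so disagreement between them is less costly than disagreement between unrelated disease families.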

The core of MedClarify is the Diagnostic Expected Information Gain (DEIG) module. For every candidate follow‑up question, DEIG estimates how much the posterior distribution over diagnoses would be expected to tighten after receiving an answer. The scoring function combines three terms: (1) the classic expected information gain, (2) a “disease divergence” term that rewards questions capable of eliminating multiple disease clusters, and (3) a “disease concentration” term that penalizes questions that only separate already low‑probability alternatives. The final selection rule is

 q* = arg max_{q ∈ Q} DEIG(q),

where Q denotes the set of candidate follow-up questions.
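The selection loop can be sketched with the classic expected-information-gain term alone; plain Shannon entropy stands in for the paper's diagnostic entropy, and the divergence and concentration terms are omitted. All names and the answer-likelihood tables are illustrative assumptions, not the paper's exact scoring.

```python
import numpy as np

def entropy(p):
    """Shannon entropy; the similarity-discounted variant could be
    substituted here without changing the selection loop below."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(-(p * np.log(p)).sum())

def expected_information_gain(prior, answer_likelihoods):
    """Expected entropy reduction from asking one question.
    answer_likelihoods[a][i] = P(answer a | diagnosis i); the posterior
    after each answer follows from Bayes' rule."""
    prior = np.asarray(prior, dtype=float)
    gain = entropy(prior)
    for lik in answer_likelihoods:
        joint = np.asarray(lik, dtype=float) * prior
        p_ans = joint.sum()             # marginal probability of this answer
        if p_ans > 0:
            gain -= p_ans * entropy(joint / p_ans)
    return gain

def select_question(prior, questions):
    """q* = arg max over candidate questions of the expected gain."""
    return max(questions, key=lambda q: expected_information_gain(prior, questions[q]))

# Two equally likely diagnoses. A question whose yes/no answer separates
# them is preferred over one whose answer is uninformative.
prior = [0.5, 0.5]
questions = {
    "fever?":   [[0.9, 0.1], [0.1, 0.9]],   # discriminative answers
    "fatigue?": [[0.5, 0.5], [0.5, 0.5]],   # uninformative answers
}
assert select_question(prior, questions) == "fever?"
```

In the full DEIG score, this gain would additionally be rewarded for eliminating multiple disease clusters and penalised for separating only low-probability alternatives, per the three terms described above.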

