MediTools -- Medical Education Powered by LLMs
Artificial Intelligence (AI) has advanced rapidly, and with the advent of large language models (LLMs) in late 2022, numerous opportunities have emerged to adopt this technology across domains, including medicine. These innovations hold immense potential to modernize medical education. Our research project leverages LLMs to enhance medical education and address workflow challenges through MediTools - AI Medical Education, a prototype application of interactive tools that simulate real-life clinical scenarios, provide access to medical literature, and keep users updated with the latest medical news. The first tool is a dermatology case simulation that uses real patient images depicting various dermatological conditions and lets users interact with LLMs acting as virtual patients, allowing them to practice diagnostic skills and sharpen their clinical decision-making. The application also features two additional tools: an AI-enhanced PubMed tool for engaging with LLMs to gain deeper insights into research papers, and a Google News tool that offers LLM-generated summaries of articles across medical specialties. A survey of medical professionals and students gathered initial feedback on the effectiveness of and user satisfaction with MediTools, informing further development and refinement of the application. This research demonstrates the potential of AI-driven tools to transform medical education, offering a scalable, interactive platform for continuous learning and skill development.
💡 Research Summary
The paper presents MediTools, a prototype web application that leverages large language models (LLMs) to enhance medical education. Built with Python, Streamlit, and a suite of APIs, MediTools integrates three distinct tools: (1) a dermatology case simulation that uses real patient images from the public DermNet dataset, (2) an AI‑enhanced PubMed interface that retrieves articles, extracts metadata, fetches the full text when available, and enables LLM‑driven summarization and question‑and‑answer interaction, and (3) a Google News module that searches for recent medical news using the Google Serper API and generates concise AI‑written summaries via OpenAI’s GPT‑4o.
The system architecture separates the front‑end UI (Streamlit pages with HTML/CSS) from the back‑end logic, which relies on LangChain to orchestrate LLM calls, maintain chat memory, and format prompts. Users can select among multiple LLM providers (OpenAI, Anthropic, Meta) and switch models on the fly. Session state is stored in a dictionary to preserve variables such as selected model, chat history, and user credentials across page navigation. Speech‑to‑text and text‑to‑speech capabilities are provided through OpenAI’s Whisper‑1 and TTS APIs, allowing both typed and spoken interaction with the virtual patient.
In the dermatology simulator, a random image is chosen, its file path conveys the disease label, and a prompt constructs a virtual patient profile (name, personality, history). Users engage in a dialogue, request lab tests, and receive AI‑generated test results. When a diagnosis is submitted, the system uses fuzzy string matching (thefuzz token‑set ratio) with a 0.7 similarity threshold to determine correctness and then provides a performance report containing the correct diagnosis, a transcript, and targeted feedback. Two feedback modes are offered: end‑of‑session only, or real‑time feedback after each user utterance.
The PubMed tool first queries NCBI E‑utilities to obtain PMID lists, then parses XML metadata (title, authors, abstract). If a PubMed Central ID exists, the full text is retrieved via Diffbot, fed to the LLM, and the user can ask follow‑up questions or request a summary. The Google News tool lets users specify specialties, keywords, recency, and the number of desired summaries; the API returns article URLs, which are summarized by a LangChain‑wrapped GPT‑4o summarizer.
To evaluate usability and perceived educational value, the authors conducted an informal survey with ten healthcare professionals and students using Qualtrics. The questionnaire comprised 25 items covering demographics, tool‑specific feedback, and overall satisfaction. Data were analyzed in a Jupyter notebook using Pandas, NumPy, and visualization libraries (Matplotlib, Seaborn). Results indicated high satisfaction, perceived improvement in diagnostic reasoning, and enthusiasm for AI‑augmented learning, though the authors acknowledge the small convenience sample and lack of IRB oversight as limitations.
The discussion highlights the potential of LLM‑driven tools to reduce faculty time, provide scalable, on‑demand practice, and keep learners up‑to‑date with the latest literature and news. It also raises concerns about hallucinations, accuracy of medical content, data privacy, and copyright when using copyrighted articles or patient images. Future work is suggested to include larger, controlled studies, integration with formal curricula, robust validation of AI outputs, and expansion to multimodal interactions (e.g., image analysis). All source code and data are publicly available on GitHub, supporting reproducibility and community‑driven improvement.
In conclusion, MediTools demonstrates a feasible, modular approach to embedding LLM capabilities into medical education. The initial user feedback is promising, and with further refinement and rigorous validation, such platforms could accelerate the digital transformation of medical training.