A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties

Reading time: 5 minute
...

📝 Original Info

  • Title: A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties
  • ArXiv ID: 2512.08185
  • Date: 2025-12-09
  • Authors: ** - Jinghao Wang (The Ohio State University) - Ping Zhang (The Ohio State University) - Carter Yagemann (The Ohio State University) **

📝 Abstract

Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data -- barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk -- from high-risk domains such as emergency medicine and psychiatry to general practice -- addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems.

💡 Deep Analysis

Figure 1

📄 Full Content

A Practical Framework for Evaluating Medical AI Security: Reproducible Assessment of Jailbreaking and Privacy Vulnerabilities Across Clinical Specialties Jinghao Wang1, Ping Zhang1, and Carter Yagemann1 1The Ohio State University Abstract Medical Large Language Models (LLMs) are increasingly deployed for clinical decision support across diverse specialties, yet systematic evaluation of their robustness to adversarial misuse and privacy leakage remains inaccessible to most researchers. Existing security benchmarks require GPU clusters, commercial API access, or protected health data—barriers that limit community participation in this critical research area. We propose a practical, fully reproducible framework for evaluating medical AI security under realistic resource constraints. Our framework design covers multiple medical specialties stratified by clinical risk—from high-risk domains such as emergency medicine and psychiatry to general practice—addressing jailbreaking attacks (role-playing, authority impersonation, multi-turn manipulation) and privacy extraction attacks. All evaluation utilizes synthetic patient records requiring no IRB approval. The framework is designed to run entirely on consumer CPU hardware using freely available models, eliminating cost barriers. We present the framework specification including threat models, data generation methodology, evaluation protocols, and scoring rubrics. This proposal establishes a foundation for comparative security assessment of medical-specialist models and defense mechanisms, advancing the broader goal of ensuring safe and trustworthy medical AI systems. Keywords: Medical AI, Adversarial Attacks, AI Safety, Privacy, Jailbreaking, LLM Security, Reproducible Research, Clinical Specialties 1 Introduction Large Language Models are rapidly transforming healthcare across all clinical specialties [Singhal et al., 2024, 2023a]. GPT-4 achieves expert-level performance on medical licensing examinations [OpenAI, 2023], and AI assistants increasingly provide clinical decision support in domains ranging from emergency medicine to psychiatry. However, these systems face critical security vulnerabilities that directly threaten patient safety [Dong et al., 2024, Amodei et al., 2016]. The Problem. Medical AI systems face two critical security vulnerabilities. First, jailbreaking attacks bypass safety mechanisms through adversarial prompts, causing models to generate dangerous treatment recommendations or lethal drug information [Wei et al., 2023, Zou et al., 2023]. Zhang et al. [2024] demonstrated that medical-specialist models paradoxically show higher compliance with harmful requests than general models—domain knowledge amplifies rather than mitigates security risks. Second, privacy extraction attacks exploit the tendency of language models to memorize and 1 arXiv:2512.08185v1 [cs.CR] 9 Dec 2025 regurgitate training data [Carlini et al., 2021], creating HIPAA violations when models leak protected health information [U.S. Department of Health and Human Services, 2003]. Despite these critical risks, systematic security evaluation remains inaccessible to most researchers. Existing benchmarks such as HarmBench [Mazeika et al., 2024] and DecodingTrust [Wang et al., 2023] require GPU clusters, commercial API budgets, or access to protected health information. This accessibility barrier conflicts with the principle that security research benefits from broad participation [Ganguli et al., 2022]. Why This Matters. The consequences of medical AI security failures extend beyond typical AI risks to direct patient harm. Jailbreaking attacks that elicit dangerous medical advice can cause patient injury or death [Finlayson et al., 2019]. HIPAA violations carry penalties up to $1.5 million per incident [U.S. Department of Health and Human Services, 2003]. Critically, risks are not uniform across medical domains: emergency medicine involves time-critical decisions where errors can be immediately fatal, psychiatry deals with vulnerable populations, and pharmacology presents risks of dangerous drug interactions [Seyyed-Kalantari et al., 2021, Obermeyer et al., 2019]. A comprehensive security framework must therefore evaluate vulnerabilities across the spectrum of clinical practice. Contributions. We address this gap by proposing a practical framework for evaluating medical AI security that any researcher can replicate: 1. Multi-specialty threat model: Attack scenarios organized by clinical risk level and grounded in domain-specific risks identified by foundational medical AI research. 2. Accessible design: Framework designed to run on consumer hardware without GPU require- ments, using freely available models. 3. Synthetic data methodology: Patient record generation approach requiring no IRB approval, enabling fully reproducible evaluation. 4. Evaluation protocol: Standardized metrics and scoring rubrics adapted from established security research. 2 Related Work Foundations in AI Safe

📸 Image Gallery

Fig1.png

Reference

This content is AI-processed based on open access ArXiv data.

Start searching

Enter keywords to search articles

↑↓
ESC
⌘K Shortcut