MedXAI: A Retrieval-Augmented and
Self-Verifying Framework for Knowledge-Guided
Medical Image Analysis
Midhat Urooj, Ayan Banerjee, Farhat Shaikh, Kuntal Thakur, Ashwith Poojary, Sandeep Gupta
Impact Lab
Arizona State University
Tempe, AZ, USA
Emails: {murooj, abanerj3, fshaik12, kthakur9, apoojar4, sandeep.gupta}@asu.edu
arXiv:2512.10098v1 [cs.LG] 10 Dec 2025
Abstract—Accurate and interpretable image-based diagnosis
remains a fundamental challenge in medical AI, particularly un-
der domain shifts and rare-class conditions. Deep learning mod-
els often struggle with real-world distribution changes, exhibit
bias against infrequent pathologies, and lack the transparency
required for deployment in safety-critical clinical environments.
We introduce MedXAI (An Explainable Framework for Medical
Imaging Classification), a unified expert-knowledge-based
framework that integrates deep vision models with clinician-
derived expert knowledge to improve generalization, reduce rare-
class bias, and provide human-understandable explanations by
localizing the relevant diagnostic features rather than relying on
technical post-hoc methods (e.g., saliency maps, LIME).
We evaluate MedXAI across heterogeneous modalities on
two challenging tasks: (i) Seizure Onset Zone localization from
resting-state fMRI, and (ii) Diabetic Retinopathy grading. Ex-
periments on ten multicenter datasets show consistent gains,
including a 3% improvement in cross-domain generalization
and a 10% improvement in rare-class F1 score, substantially
outperforming strong deep learning baselines. Ablations confirm
that the symbolic components act as effective clinical priors
and regularizers, improving robustness under distribution shift.
MedXAI delivers clinically aligned explanations while achieving
superior in-domain and cross-domain performance, particularly
for rare diseases in multimodal medical AI.
I. INTRODUCTION
Medical imaging is central to disease diagnosis and treat-
ment planning in conditions such as diabetic retinopathy (DR),
tumor detection, and neurodegenerative disorders. While deep
learning (DL) models, particularly Convolutional Neural Net-
works (CNNs) and Vision Transformers (ViTs), have achieved
remarkable predictive performance [1], [2], three key chal-
lenges limit their adoption in real-world clinical practice: (i)
interpretability, as DL models are often black boxes and post-
hoc explainability methods such as Grad-CAM [3] and SHAP
[4] remain heuristic, static, and disconnected from clinical
reasoning. Attention- or uncertainty-based methods [5], [6]
provide partial insight but do not leverage structured medical
knowledge, while reinforcement learning and meta-learning
approaches [7] allow adaptive predictions but lack clinically
grounded explanations. Existing model explainability in medical
AI often uses technical terminology that does not align with
clinical language, making it difficult for healthcare professionals
and patients to interpret. (ii) rare-class learning, because
clinically significant pathologies are often infrequent and het-
erogeneous, causing traditional DL models to underperform
in capturing nuanced visual and clinical patterns of minority
disease classes [8]; and (iii) cross-domain generalization, as
models trained on one institution’s data frequently fail on data
from other centers due to variations in acquisition protocols,
imaging devices, or patient demographics [9]–[11].
Fig. 1. Conceptual overview of the MedXAI framework. Knowledge ex-
traction is based on a Retrieval-Augmented and Self-Verifying Framework
through an LLM.
Rule-based and expert-knowledge systems offer inter-
pretability but struggle to scale across heterogeneous popu-
lations and imaging protocols [12]–[15]. Expert-knowledge-
based learning, which combines DL feature extraction with
symbolic reasoning, has emerged as a promising solution [16],
[17]. These systems leverage neural networks to capture com-
plex representations while encoding domain knowledge and
logical constraints to ensure clinically consistent reasoning.
Yet, existing expert-knowledge-based approaches rarely ad-
dress rare-class bias, intra-class variability, and cross-domain
generalization in a unified framework.
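To make the general neurosymbolic idea concrete, the following is a minimal, hypothetical sketch (not the paper's implementation): a stand-in neural branch produces class probabilities, and a symbolic branch encodes a clinician-style rule (an invented lesion-count threshold) that re-weights them, acting as the kind of clinical prior and regularizer described above.

```python
import numpy as np

def neural_branch(features: np.ndarray) -> np.ndarray:
    """Stand-in for a CNN/ViT: softmax over a toy linear score."""
    logits = features @ np.array([[1.0, -0.5], [-1.0, 1.5]])
    e = np.exp(logits - logits.max())  # numerically stable softmax
    return e / e.sum()

def symbolic_branch(features: np.ndarray) -> np.ndarray:
    """Hypothetical clinician-derived rule: if the 'lesion count'
    feature exceeds a threshold, up-weight the disease class (index 1)."""
    lesion_count = features[1]
    return np.array([1.0, 2.0]) if lesion_count > 3 else np.array([1.0, 1.0])

def fuse(features: np.ndarray) -> np.ndarray:
    """Combine both branches: rule weights act as a prior on the
    neural probabilities, then renormalize."""
    p = neural_branch(features) * symbolic_branch(features)
    return p / p.sum()

probs = fuse(np.array([0.2, 5.0]))  # rule fires, disease class boosted
```

When the rule does not fire, the fused output reduces to the neural prediction alone, which is the sense in which the symbolic branch behaves as a soft prior rather than a hard override.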
To address these limitations, we propose MedXAI, an ex-
pert-knowledge-based framework that seamlessly integrates
structured clinical knowledge with deep neural representations
in a scalable and interpretable manner. Clinical expertise is
extracted from a PubMed fetcher through a retrieval-augmented
generation (RAG) pipeline connected to an LLM in the knowledge
extractor module. The framework combines: (i) a data-driven
neural branch that captures
complex imaging features, and (ii) a knowledge-informed
symbolic branch that encodes clinically derived rules. An
adaptive routing mechanism inspired by Hunt’s algorithm
constructs a decision tree of expert models, each specialized
for a specific class and drawing from both neural and symbolic
branches. The resulting diagnosis is then processed by a