MelanomaNet: Explainable Deep Learning for Skin Lesion Classification
Automated skin lesion classification using deep learning has shown remarkable accuracy, yet clinical adoption remains limited due to the “black box” nature of these models. We present MelanomaNet, an explainable deep learning system for multi-class skin lesion classification that addresses this gap through four complementary interpretability mechanisms. Our approach combines an EfficientNet V2 backbone with GradCAM++ attention visualization, automated ABCDE clinical criterion extraction, Fast Concept Activation Vectors (FastCAV) for concept-based explanations, and Monte Carlo Dropout uncertainty quantification. We evaluate our system on the ISIC 2019 dataset containing 25,331 dermoscopic images across eight diagnostic categories. Our model achieves 85.61% accuracy with a weighted F1 score of 0.8564, while providing clinically meaningful explanations that align model attention with established dermatological assessment criteria. The uncertainty quantification module decomposes prediction confidence into epistemic and aleatoric components, enabling automatic flagging of unreliable predictions for clinical review. Our results demonstrate that high classification performance can be achieved alongside comprehensive interpretability, potentially facilitating greater trust and adoption in clinical dermatology workflows. The source code is available at https://github.com/suxrobgm/explainable-melanoma
💡 Research Summary
MelanomaNet is an explainable deep‑learning framework for multi‑class skin lesion classification that couples high diagnostic performance with four complementary interpretability mechanisms. The core classifier uses EfficientNet‑V2‑M as a backbone, processing dermoscopic images at 384 × 384 pixels and outputting probabilities for eight ISIC‑2019 categories (Melanoma, Nevus, BCC, AK, BKL, VASC, DF, SCC). EfficientNet‑V2‑M provides a good trade‑off between accuracy and parameter efficiency, preserving fine‑grained dermoscopic details essential for clinical assessment.
Interpretability is addressed on four levels: (1) GradCAM++ generates class‑specific heatmaps that highlight the most influential image regions, offering sharper localization than standard Grad‑CAM. (2) An automated ABCDE analysis extracts four of the five clinical criteria—Asymmetry, Border irregularity, Color variation, and Diameter (Evolution cannot be assessed from a single image)—by first segmenting the lesion with Otsu thresholding and morphological refinement, then computing a quantitative score for each criterion. These scores are presented alongside the prediction, directly mapping model output to familiar dermatological reasoning. (3) Fast Concept Activation Vectors (FastCAV) adapt the TCAV approach for efficient concept‑based explanations. Linear classifiers are trained in the 1280‑dimensional feature space to separate concept‑positive from concept‑negative examples for each clinical attribute (asymmetry, irregular border, multicolor, large diameter). The resulting CAVs provide signed importance scores that indicate how strongly each concept supports or opposes a given prediction. (4) Monte Carlo Dropout (T = 10 stochastic forward passes) yields three uncertainty measures: predictive uncertainty (entropy of the averaged softmax), epistemic uncertainty (variance across passes), and aleatoric uncertainty (average entropy of the individual passes). Predictions whose predictive uncertainty exceeds a threshold of 0.5 are flagged as “UNRELIABLE,” prompting human review.
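The MC-Dropout decomposition in (4) is compact enough to sketch directly. The following follows the definitions given above (entropy of the mean softmax, per-pass variance, mean per-pass entropy); the random Dirichlet draws merely stand in for the T stochastic forward passes of the real model:

```python
# Minimal sketch of the uncertainty decomposition described in (4).
# `probs` simulates T=10 softmax outputs; in MelanomaNet these would come
# from 10 forward passes with dropout kept active at inference time.
import numpy as np

def decompose_uncertainty(probs, eps=1e-12):
    """probs: array of shape (T, num_classes), one softmax vector per pass."""
    mean_p = probs.mean(axis=0)
    # Predictive uncertainty: entropy of the averaged softmax.
    predictive = -np.sum(mean_p * np.log(mean_p + eps))
    # Aleatoric uncertainty: average entropy of the individual passes.
    aleatoric = -np.mean(np.sum(probs * np.log(probs + eps), axis=1))
    # Epistemic uncertainty: variance across passes, summed over classes.
    epistemic = float(probs.var(axis=0).sum())
    return predictive, aleatoric, epistemic

rng = np.random.default_rng(0)
probs = rng.dirichlet(np.ones(8), size=10)  # T=10 passes, 8 classes
pred, alea, epi = decompose_uncertainty(probs)
flag = "UNRELIABLE" if pred > 0.5 else "OK"  # threshold from the paper
```

By Jensen's inequality (entropy is concave), the predictive term is always at least as large as the aleatoric term, so the decomposition never produces a negative epistemic gap.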
Training employs a weighted cross‑entropy loss to counteract the severe class imbalance (Nevus ≈ 51 % vs. Dermatofibroma < 1 %). AdamW with a cosine annealing schedule, mixed‑precision arithmetic, and extensive augmentations (flips, rotations, affine transforms, color jitter) are used over 100 epochs.
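A minimal version of this training setup, using inverse-frequency class weights over the published ISIC-2019 class counts, might look as follows; the learning rate, weight decay, and weighting scheme here are assumptions, not the paper's exact settings:

```python
# Illustrative training configuration: inverse-frequency weighted
# cross-entropy, AdamW, cosine annealing over 100 epochs, and a
# mixed-precision scaler. Hyperparameter values are assumptions.
import torch
import torch.nn as nn

# ISIC-2019 training counts in the order MEL, NV, BCC, AK, BKL, VASC, DF, SCC
class_counts = torch.tensor([4522., 12875., 3323., 867., 2624., 253., 239., 628.])
weights = class_counts.sum() / (len(class_counts) * class_counts)  # rare classes get larger weight
criterion = nn.CrossEntropyLoss(weight=weights)

model = nn.Linear(1280, 8)  # stand-in for the EfficientNet-V2-M classifier head
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4, weight_decay=1e-2)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=100)
scaler = torch.cuda.amp.GradScaler(enabled=torch.cuda.is_available())  # mixed precision
```

With this weighting, Dermatofibroma (239 samples) receives the largest loss weight and Nevus (12,875 samples, roughly 51 % of the data) the smallest, which is what counteracts the imbalance described above.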
On the ISIC‑2019 test set (3,800 images), MelanomaNet achieves 85.61 % overall accuracy and a weighted F1 of 0.8564. Per‑class performance is strongest for Nevus (F1 0.91) and BCC (F1 0.89). Melanoma detection reaches F1 0.77 (precision 0.81, recall 0.75), while minority classes show variable results (e.g., DF recall 0.86 despite few samples). GradCAM‑ABCDE alignment metrics show that, on average, 60 % of the heatmap's attention mass falls within the lesion mask and 53 % overlaps the border region, confirming that the model focuses on clinically relevant structures.
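The alignment metric reported above reduces to a simple mass ratio: the fraction of the heatmap's total attention that falls inside a binary region of interest. A hedged sketch (the function name and toy data are illustrative assumptions, not the paper's code):

```python
# Sketch of the GradCAM-ABCDE alignment metric: fraction of heatmap mass
# inside a boolean region (lesion mask or border band).
import numpy as np

def attention_in_region(heatmap, region):
    """Fraction of heatmap mass inside a boolean mask of the same shape."""
    total = heatmap.sum()
    return float(heatmap[region].sum() / total) if total > 0 else 0.0

# Toy example: uniform attention over a 10x10 map, lesion covers 60 cells.
heatmap = np.ones((10, 10))
lesion_mask = np.zeros((10, 10), dtype=bool)
lesion_mask[:6, :] = True            # 60 of 100 cells
score = attention_in_region(heatmap, lesion_mask)  # 0.6 for this toy case
```

In the real pipeline the mask would come from the Otsu-based lesion segmentation and the heatmap from GradCAM++, both resized to a common resolution.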
Qualitative examples illustrate the full explanation pipeline: a correctly classified nevus shows high confidence, low predictive uncertainty, GradCAM attention centered on the lesion, low ABCDE risk, and FastCAV scores indicating that large diameter and multicolor slightly support the prediction. Conversely, a melanoma case with 100 % confidence is flagged as uncertain due to high aleatoric uncertainty; FastCAV reveals that large diameter strongly drives the prediction while asymmetry and multicolor oppose it.
The discussion emphasizes that providing explanations in the language of dermatology (ABCDE scores, concept importance) and quantifying uncertainty bridges the gap between AI and clinicians, fostering trust and enabling safe human‑AI collaboration. Limitations include reliance on classical image‑processing for lesion segmentation, the need for expert‑defined concept labels for FastCAV, and the lack of prospective validation of uncertainty flags in real clinical workflows. Future work proposes integrating deep segmentation networks, more sophisticated Bayesian inference, and multimodal data (clinical metadata) to further improve both performance and interpretability.