Uncertainty-Aware Ordinal Deep Learning for Cross-Dataset Diabetic Retinopathy Grading

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Diabetes mellitus is a chronic metabolic disorder characterized by persistent hyperglycemia due to insufficient insulin production or impaired insulin utilization. One of its most severe complications is diabetic retinopathy (DR), a progressive retinal disease caused by microvascular damage, leading to hemorrhages, exudates, and potential vision loss. Early and reliable detection of DR is therefore critical for preventing irreversible blindness. In this work, we propose an uncertainty-aware deep learning framework for automated DR severity grading that explicitly models the ordinal nature of disease progression. Our approach combines a convolutional backbone with lesion-query attention pooling and an evidential Dirichlet-based ordinal regression head, enabling both accurate severity prediction and principled estimation of predictive uncertainty. The model is trained using an ordinal evidential loss with annealed regularization to encourage calibrated confidence under domain shift. We evaluate the proposed method on a multi-domain training setup combining APTOS, Messidor-2, and a subset of EyePACS fundus datasets. Experimental results demonstrate strong cross-dataset generalization, achieving competitive classification accuracy and high quadratic weighted kappa on held-out test sets, while providing meaningful uncertainty estimates for low-confidence cases. These results suggest that ordinal evidential learning is a promising direction for robust and clinically reliable diabetic retinopathy grading.


💡 Research Summary

Diabetic retinopathy (DR) is a leading cause of preventable blindness, and early detection is essential for effective treatment. While deep learning models have achieved expert‑level performance in automated DR grading, two critical gaps remain: (1) the ordinal nature of DR severity is typically ignored, with most approaches treating the problem as a flat K‑class classification, and (2) medical decision‑making requires calibrated uncertainty estimates, especially under domain shift caused by varying imaging devices, lighting conditions, and patient demographics.

This paper introduces an uncertainty‑aware ordinal deep learning framework that simultaneously addresses both issues. The architecture consists of three main components. First, a ConvNeXt‑Base backbone extracts hierarchical feature maps; the authors deliberately use the Stage 2 output (64 × 64 spatial resolution) to preserve fine‑grained lesion information such as micro‑aneurysms and tiny hemorrhages. Second, a Lesion‑Query Attention Pooling (LQAP) module aggregates these high‑resolution features. A set of learnable query vectors first undergoes self‑attention, allowing the queries to coordinate among themselves, and then cross‑attention with the image tokens. Temperature scaling sharpens the attention distribution, encouraging each query to focus on distinct pathological regions (e.g., exudates, neovascularization). Visualizations of query activation maps demonstrate that the queries evolve from diffuse, anatomy‑focused patterns early in training to highly localized lesion‑specific patterns later, providing a degree of interpretability.
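As a rough illustration, the cross‑attention pooling step with temperature scaling can be sketched in NumPy. This is a minimal sketch, not the paper's implementation: the query self‑attention stage is omitted, and all shapes, seeds, and the temperature value are illustrative.

```python
import numpy as np

def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lqap_pool(queries, tokens, temperature=0.5):
    """Temperature-scaled cross-attention pooling.

    queries: (Q, d) learnable lesion queries (self-attention step omitted)
    tokens:  (N, d) flattened high-resolution feature-map tokens
    A temperature below 1 sharpens the attention distribution, so each
    query concentrates its weight on a few spatial locations.
    """
    d = queries.shape[-1]
    scores = queries @ tokens.T / (np.sqrt(d) * temperature)
    attn = softmax(scores, axis=-1)   # (Q, N), each row sums to 1
    return attn @ tokens              # (Q, d) per-query pooled features

rng = np.random.default_rng(0)
queries = rng.standard_normal((4, 32))   # 4 hypothetical lesion queries
tokens = rng.standard_normal((256, 32))  # e.g. a 16x16 token grid, d = 32
pooled = lqap_pool(queries, tokens)
print(pooled.shape)  # (4, 32)
```

Each pooled vector is a weighted average of image tokens, so a query that locks onto one lesion type yields a feature summarizing just that region.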

Third, an evidential ordinal head models each ordinal threshold with a 2‑dimensional Dirichlet (Beta) distribution. For threshold k, the network outputs non‑negative evidence e_k, which is transformed into concentration parameters α_k = e_k + 1. The expected probability of exceeding the threshold is π̂_k = α_{k,1}/(α_{k,0} + α_{k,1}). Class probabilities are recovered by enforcing monotonic consistency through cumulative differences, ensuring that the ordinal ordering is respected.
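A minimal sketch of the head's probability computation, assuming two evidence values per threshold and monotonicity enforced with a running minimum (the paper's exact consistency mechanism may differ):

```python
import numpy as np

def ordinal_class_probs(evidence):
    """evidence: (K-1, 2) non-negative evidence (against, for) per threshold.
    Returns a length-K probability vector over severity grades."""
    alpha = evidence + 1.0                       # concentrations alpha_k = e_k + 1
    pi = alpha[:, 1] / alpha.sum(axis=1)         # E[P(grade > k)]
    pi = np.minimum.accumulate(pi)               # enforce non-increasing ordering
    cdf = np.concatenate(([1.0], pi, [0.0]))     # pad with P(>-1)=1, P(>K-1)=0
    return cdf[:-1] - cdf[1:]                    # cumulative differences

# Illustrative evidence for a 5-grade problem (4 thresholds)
evidence = np.array([[0.2, 5.0], [0.5, 3.0], [2.0, 1.0], [4.0, 0.1]])
probs = ordinal_class_probs(evidence)
print(probs, probs.sum())
```

Because the exceedance probabilities are forced to be non-increasing, the cumulative differences are guaranteed to be non-negative and to sum to one.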

Training uses an ordinal evidential loss:

L_EDL = ∑_k L_data(π̂_k, t_k) + λ(t) ∑_k KL(Dir(α_k) ‖ Dir(1)).

The data term is a binary cross‑entropy (or BCE‑like) loss on each threshold; t_k are binary (or soft, when MixUp/CutMix is applied) target indicators of whether the true grade exceeds k. The KL regularizer pulls the Dirichlet toward a uniform prior when the model is uncertain, preventing over‑confident predictions. λ(t) is annealed over the first t_anneal epochs, allowing the network to first learn discriminative evidence and later enforce calibration, which is crucial under domain shift.
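The loss and its annealing schedule might look roughly as follows. This is a sketch under stated assumptions: the linear schedule for λ(t), the binary (non-soft) targets, and the central-difference digamma approximation are choices made here for illustration, not details from the paper.

```python
import numpy as np
from math import lgamma

def digamma(x, h=1e-5):
    # Central-difference approximation of psi(x); adequate for this sketch.
    return (lgamma(x + h) - lgamma(x - h)) / (2.0 * h)

def kl_dir_uniform(alpha):
    """KL(Dir(alpha) || Dir(1, 1)) for one threshold's Beta parameters."""
    a0 = float(np.sum(alpha))
    kl = lgamma(a0) - sum(lgamma(float(a)) for a in alpha) - lgamma(len(alpha))
    kl += sum((float(a) - 1.0) * (digamma(float(a)) - digamma(a0)) for a in alpha)
    return kl

def ordinal_edl_loss(evidence, targets, epoch, t_anneal=10):
    """Per-threshold BCE on pi_k plus a KL regularizer annealed over epochs.

    evidence: (K-1, 2) non-negative evidence per threshold
    targets:  (K-1,) indicators t_k of whether the true grade exceeds k
    """
    alpha = evidence + 1.0
    pi = alpha[:, 1] / alpha.sum(axis=1)
    data = -(targets * np.log(pi) + (1 - targets) * np.log(1 - pi)).sum()
    lam = min(1.0, epoch / t_anneal)   # assumed linear annealing of lambda(t)
    reg = sum(kl_dir_uniform(a) for a in alpha)
    return data + lam * reg

evidence = np.array([[0.2, 5.0], [0.5, 3.0], [2.0, 1.0], [4.0, 0.1]])
targets = np.array([1.0, 1.0, 0.0, 0.0])  # true grade is 2 of 0..4
loss_early = ordinal_edl_loss(evidence, targets, epoch=0)   # KL weight is 0
loss_late = ordinal_edl_loss(evidence, targets, epoch=10)   # full KL weight
print(loss_early, loss_late)
```

At epoch 0 only the discriminative data term is active; as λ(t) ramps up, the non-negative KL term increasingly pulls weak evidence toward the uniform prior.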

To avoid query collapse (multiple queries attending to the same region such as the optic disc), an auxiliary diversity loss penalizes cosine similarity above a margin m among query embeddings, encouraging broader spatial coverage.
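A hinge-style version of this diversity loss could be written as follows; the margin value and the averaging over query pairs are assumptions of this sketch.

```python
import numpy as np

def query_diversity_loss(queries, margin=0.3):
    """Penalize pairwise cosine similarity above a margin m between queries.

    queries: (Q, d) query embeddings. Off-diagonal similarities exceeding
    the margin incur a linear penalty, pushing queries apart.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    sim = q @ q.T                                   # (Q, Q) cosine similarities
    off_diag = sim[~np.eye(len(q), dtype=bool)]     # exclude self-similarity
    return np.maximum(off_diag - margin, 0.0).mean()

print(query_diversity_loss(np.eye(3)))        # orthogonal queries -> 0.0
print(query_diversity_loss(np.ones((2, 4))))  # collapsed queries  -> 0.7
```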

Data preprocessing includes a two‑stage quality control pipeline: (1) global brightness filtering (mean intensity B_µ < τ_B = 15) removes severely under‑exposed images, and (2) Laplacian variance assesses sharpness to discard blurry samples. After quality control, images are resized, cropped to remove black borders, and subjected to stochastic augmentations (CLAHE, brightness/contrast jitter, hue/saturation shifts, Gaussian noise/blur, horizontal/vertical flips, MixUp, CutMix). These augmentations improve robustness to the heterogeneous acquisition conditions present across the three datasets.
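The two quality-control stages can be sketched as below. The sharpness threshold `tau_sharp` and the shift-based Laplacian are assumptions of this sketch; the summary only gives the brightness threshold τ_B = 15.

```python
import numpy as np

def passes_quality_control(img, tau_b=15.0, tau_sharp=50.0):
    """Two-stage quality control on a grayscale fundus image in [0, 255].

    Stage 1: reject severely under-exposed images (mean intensity < tau_b).
    Stage 2: reject blurry images via the variance of a Laplacian response.
    tau_sharp is an assumed threshold; the paper does not state its value.
    """
    if img.mean() < tau_b:
        return False
    # 4-neighbour Laplacian via shifts (wrap-around edges, fine for a sketch)
    lap = (np.roll(img, 1, axis=0) + np.roll(img, -1, axis=0)
           + np.roll(img, 1, axis=1) + np.roll(img, -1, axis=1) - 4.0 * img)
    return lap.var() >= tau_sharp

dark = np.zeros((32, 32))        # under-exposed: fails the brightness check
flat = np.full((32, 32), 120.0)  # bright but textureless: flagged as blurry
print(passes_quality_control(dark), passes_quality_control(flat))
```

A production pipeline would typically use `cv2.Laplacian` and tune both thresholds per dataset, as the paper's own dataset-specific thresholds suggest.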

The model is trained on a combined multi‑domain set comprising APTOS, Messidor‑2, and a subset of EyePACS. Evaluation is performed on held‑out test splits from each dataset, ensuring a genuine cross‑dataset assessment. The proposed method achieves an overall accuracy of 87.6 % and a quadratic weighted kappa of 0.94, comparable to or surpassing state‑of‑the‑art baselines. Importantly, the Dirichlet strength (total evidence) serves as an uncertainty proxy: images with high uncertainty scores correspond to low‑quality or ambiguous cases, indicating that the model can flag samples that merit human review.
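In the standard evidential formulation, this uncertainty proxy reduces to a simple function of the concentration parameters: with two outcomes per threshold, per-threshold uncertainty is u_k = 2/S_k, where S_k is the Dirichlet strength. A brief sketch, assuming this standard definition:

```python
import numpy as np

def threshold_uncertainty(evidence):
    """Evidential uncertainty per threshold: u_k = 2 / S_k, where
    S_k = alpha_{k,0} + alpha_{k,1} is the Dirichlet strength.
    Low total evidence -> S_k near 2 -> u_k near 1 (maximal uncertainty)."""
    alpha = evidence + 1.0
    return 2.0 / alpha.sum(axis=1)

print(threshold_uncertainty(np.zeros((4, 2))))       # no evidence   -> u = 1
print(threshold_uncertainty(np.full((4, 2), 20.0)))  # strong evidence -> small u
```

Thresholding the maximum (or mean) u_k over thresholds gives a simple rule for flagging low-quality or ambiguous images for human review.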

Ablation studies confirm the contribution of each component: removing LQAP degrades performance, especially on small‑lesion detection; omitting the evidential head reduces calibration and inflates confidence on out‑of‑distribution images; disabling KL‑annealing leads to over‑confident predictions under domain shift.

The paper’s contributions are threefold: (1) an end‑to‑end ordinal evidential learning framework that jointly predicts DR severity and epistemic uncertainty, (2) a lesion‑query attention pooling mechanism that captures fine‑grained pathological cues while preserving global context, and (3) a thorough cross‑dataset evaluation demonstrating robust generalization across heterogeneous fundus imaging benchmarks.

Limitations include the computational overhead of the transformer‑based query decoder (memory and inference time scale with the number of queries), the need for dataset‑specific quality‑control thresholds, and the lack of external validation on completely unseen populations or prospective clinical trials. Future work could explore lightweight attention designs, meta‑learning for domain adaptation, and large‑scale prospective studies to assess real‑world impact.

Overall, the study showcases how integrating ordinal modeling with evidential uncertainty estimation can produce DR grading systems that are not only accurate but also trustworthy, addressing a key barrier to the deployment of AI‑assisted retinal screening in clinical practice.

