Weakly Supervised Patch Annotation for Improved Screening of Diabetic Retinopathy
Diabetic Retinopathy (DR) requires timely screening to prevent irreversible vision loss. However, its early detection remains a significant challenge because subtle pathological manifestations (lesions) are often overlooked due to insufficient annotation. Existing literature primarily focuses on image-level supervision, weakly-supervised localization, and clustering-based representation learning, none of which systematically annotates unlabeled lesion regions to refine the dataset. Expert-driven lesion annotation is labor-intensive and often incomplete, limiting the performance of deep learning models. We introduce Similarity-based Annotation via Feature-space Ensemble (SAFE), a two-stage framework that unifies weak supervision, contrastive learning, and patch-wise embedding inference to systematically expand sparse lesion annotations. SAFE preserves fine-grained lesion details under partial clinical supervision. In the first stage, a dual-arm Patch Embedding Network learns semantically structured, class-discriminative embeddings from expert-annotated patches. Next, an ensemble of independent embedding spaces extrapolates labels to unannotated regions based on spatial and semantic proximity. An abstention mechanism balances highly reliable annotation against broader but noisier coverage. Experimental results demonstrate reliable separation of healthy and diseased patches, achieving up to 0.9886 accuracy. Annotations generated by SAFE substantially improve downstream tasks such as DR classification, increasing the F1-score of the diseased class and yielding a gain of up to 0.545 in Area Under the Precision-Recall Curve (AUPRC). Qualitative analysis with explainability methods confirms that SAFE focuses on clinically relevant lesion patterns, a finding further validated by ophthalmologists.
Research Summary
Diabetic retinopathy (DR) screening suffers from a chronic shortage of fine‑grained lesion annotations, especially for subtle lesions such as micro‑aneurysms and small hemorrhages. Existing works rely on image‑level labels, weakly‑supervised localization, or clustering‑based representation learning, none of which systematically expands sparse lesion masks. To address this gap, the authors propose Similarity‑based Annotation via Feature‑space Ensemble (SAFE), a two‑stage framework that leverages a limited set of expert‑annotated patches to generate high‑quality patch‑level labels for the whole dataset.
Stage 1 – Patch Embedding Network (PEN):
A dual‑arm architecture shares a backbone encoder (ResNet‑50) and simultaneously optimizes (i) a binary classification head with binary cross‑entropy (BCE) loss, using the preliminary patch labels derived from image‑level DR status and existing coarse masks, and (ii) a projection head trained with Supervised Contrastive Learning (SCL) loss. The SCL objective pulls together embeddings of semantically similar patches (same class) and pushes apart those of different classes, using cosine similarity on L2‑normalized vectors. This joint training yields a structured latent space E where class‑discriminative information is preserved while maintaining a smooth similarity manifold.
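The joint Stage 1 objective can be sketched in NumPy as below. This is a minimal illustration under stated assumptions, not the authors' code: the contrastive term follows the commonly used supervised contrastive (SupCon) formulation (temperature‑scaled cosine similarity on L2‑normalized vectors, averaged over each anchor's same‑class positives), while the temperature of 0.1 and the weight `lam` between the BCE and SCL terms are hypothetical values not given in the summary. In a real PEN, `logits` would come from the classification head and `z` from the projection head on top of the shared backbone.

```python
import numpy as np

def l2_normalize(z, eps=1e-12):
    # Project each embedding onto the unit hypersphere (row-wise L2 normalization).
    return z / np.maximum(np.linalg.norm(z, axis=1, keepdims=True), eps)

def supcon_loss(z, labels, temperature=0.1):
    """Supervised contrastive loss for one batch (assumes at least 2 samples).

    z: (N, D) projection-head outputs; labels: (N,) integer patch classes.
    """
    z = l2_normalize(z)
    sim = z @ z.T / temperature                 # cosine similarity scaled by temperature
    n = len(labels)
    self_mask = np.eye(n, dtype=bool)
    sim = np.where(self_mask, -np.inf, sim)     # an anchor is never compared with itself
    # Numerically stable log-softmax over all other samples in the batch.
    row_max = np.max(sim, axis=1, keepdims=True)
    log_prob = sim - row_max - np.log(np.sum(np.exp(sim - row_max), axis=1, keepdims=True))
    pos_mask = (labels[:, None] == labels[None, :]) & ~self_mask
    pos_counts = pos_mask.sum(axis=1)
    valid = pos_counts > 0                      # skip anchors without a same-class partner
    mean_log_prob_pos = np.where(pos_mask, log_prob, 0.0).sum(axis=1)[valid] / pos_counts[valid]
    return -mean_log_prob_pos.mean()            # pull positives together, push others apart

def bce_loss(logits, targets):
    # Numerically stable binary cross-entropy computed on raw logits.
    return np.mean(np.maximum(logits, 0) - logits * targets + np.log1p(np.exp(-np.abs(logits))))

def joint_loss(logits, z, labels, lam=1.0, temperature=0.1):
    # Hypothetical weighting: total = BCE(classification arm) + lam * SCL(projection arm).
    return bce_loss(logits, labels.astype(float)) + lam * supcon_loss(z, labels, temperature)
```

Anchors with no same‑class partner in the batch are skipped, so batches should contain several patches of each class for the contrastive arm to contribute a useful signal.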
Stage 2 – Feature‑space Ensemble and Label Propagation:
Multiple PEN models are trained independently (different random seeds, augmentations) to create an ensemble of embedding spaces. For each unlabeled patch in P_U, its nearest neighbors are retrieved in every embedding space based on Euclidean distance (or cosine similarity) and spatial proximity. A voting scheme aggregates the neighbors' labels across the ensemble to produce a provisional label. An abstention mechanism discards predictions whose confidence (e.g., average similarity score) falls below a predefined threshold, thereby trading coverage for reliability. To quantify this trade‑off, the authors introduce the Decided Rate (D_rate), the proportion of unlabeled patches that receive a label, and the Extended Misclassification Rate (MR), the error rate among decided patches.
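Stage 2 can be illustrated with a small ensemble k‑NN voter with abstention, plus the two coverage metrics. This is a sketch under assumptions rather than SAFE's exact procedure: it uses cosine similarity only (the spatial‑proximity term is omitted), binary labels (0 = healthy, 1 = diseased), and the fraction of agreeing votes as the confidence score, whereas the summary mentions average similarity as one possible confidence. The function names, `k`, `threshold`, and the `ABSTAIN` sentinel are hypothetical.

```python
import numpy as np

ABSTAIN = -1  # hypothetical sentinel for "no label assigned"

def knn_votes(emb_labeled, y_labeled, emb_query, k):
    # Cosine-similarity k-NN within one embedding space; returns neighbor labels, shape (Q, k).
    a = emb_labeled / np.linalg.norm(emb_labeled, axis=1, keepdims=True)
    q = emb_query / np.linalg.norm(emb_query, axis=1, keepdims=True)
    sim = q @ a.T                                   # (Q, L) similarities to labeled patches
    idx = np.argsort(-sim, axis=1)[:, :k]           # k most similar labeled patches
    return y_labeled[idx]

def ensemble_label(spaces, y_labeled, k=5, threshold=0.8):
    """spaces: list of (emb_labeled, emb_query) pairs, one per independently trained PEN."""
    votes = np.concatenate([knn_votes(el, y_labeled, eq, k) for el, eq in spaces], axis=1)
    frac_diseased = votes.mean(axis=1)              # share of all ensemble votes for class 1
    pred = (frac_diseased >= 0.5).astype(int)       # majority vote
    confidence = np.maximum(frac_diseased, 1 - frac_diseased)
    return np.where(confidence >= threshold, pred, ABSTAIN)  # abstain on low agreement

def decided_rate(pred):
    # D_rate: fraction of unlabeled patches that received a label.
    return np.mean(pred != ABSTAIN)

def misclassification_rate(pred, y_true):
    # MR: error rate restricted to the decided patches.
    decided = pred != ABSTAIN
    return np.mean(pred[decided] != y_true[decided]) if decided.any() else 0.0
```

Raising `threshold` lowers D_rate and typically lowers MR as well, which is the reliability/coverage trade‑off the abstention mechanism is meant to expose.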
Experimental Setup:
The framework is evaluated on several public DR datasets (Messidor, IDRiD, e‑ophtha). Images are tiled into non‑overlapping 128 × 128 patches. Labeled patches (P_L) are those overlapping with expert masks; the remainder constitute P_U. SAFE’s patch classifier achieves up to 0.9886 accuracy and an AUC of 0.996, indicating that the learned embeddings separate healthy from diseased patches almost perfectly.
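The tiling and labeled/unlabeled split described above could look like the following NumPy sketch. Assumptions: images and masks are NumPy arrays, the ragged border that does not fill a whole 128 × 128 tile is simply dropped (padding would be an alternative), and a patch counts as part of P_L if it overlaps any pixel of the expert mask, which is one reading of "overlapping". Both function names are hypothetical.

```python
import numpy as np

def tile_image(img, patch=128):
    # Split an (H, W[, C]) image into non-overlapping patch x patch tiles in row-major order,
    # dropping any border that does not fill a complete tile.
    h, w = img.shape[0] // patch, img.shape[1] // patch
    tiles = img[:h * patch, :w * patch].reshape(h, patch, w, patch, -1).swapaxes(1, 2)
    return tiles.reshape(h * w, patch, patch, -1)

def split_labeled(mask, patch=128):
    # mask: (H, W) boolean expert lesion mask.
    # Returns one flag per tile (same order as tile_image): True -> P_L, False -> P_U.
    h, w = mask.shape[0] // patch, mask.shape[1] // patch
    blocks = mask[:h * patch, :w * patch].reshape(h, patch, w, patch)
    return blocks.any(axis=(1, 3)).reshape(-1)
```

Because both functions enumerate tiles in the same row‑major order, `tiles[split_labeled(mask)]` selects P_L and `tiles[~split_labeled(mask)]` selects P_U.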
Downstream Impact:
When SAFE‑generated patch labels are used to augment the training data of standard DR classification models (ResNet‑34, EfficientNet‑B0), the disease class F1‑score improves by up to 0.12 and the Area Under the Precision‑Recall Curve (AUPRC) rises by as much as 0.545 compared with training on the original sparse annotations. This demonstrates that refined patch‑level supervision translates into tangible gains for image‑level diagnosis.
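For reference, the two downstream metrics can be computed as below. This is a plain NumPy sketch, not the evaluation code used in the paper: F1 is computed for the diseased (positive) class, and AUPRC is estimated as step‑wise average precision (mean precision at the rank of each positive). Tied scores are ordered by a stable sort here, so results can differ slightly from library implementations that group ties; at least one positive is assumed.

```python
import numpy as np

def f1_positive(y_true, y_pred):
    # F1-score of the diseased class (label 1): 2TP / (2TP + FP + FN).
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    return 2 * tp / (2 * tp + fp + fn) if tp + fp + fn else 0.0

def average_precision(y_true, scores):
    # Step-wise AUPRC estimate: rank by descending score, then average the
    # precision observed at the rank of each true positive.
    order = np.argsort(-scores, kind="stable")
    y = y_true[order]
    precision_at_k = np.cumsum(y) / np.arange(1, len(y) + 1)
    return precision_at_k[y == 1].mean()
```

Unlike accuracy, both metrics focus on the minority diseased class, which is why gains in F1 and AUPRC are informative when positives are scarce.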
Explainability and Clinical Validation:
Grad‑CAM and SHAP visualizations reveal that the embedding network focuses on clinically relevant patterns—micro‑aneurysms, hemorrhages, hard and soft exudates—rather than background vasculature. A panel of three ophthalmologists reviewed a random subset of 200 SAFE‑annotated images and reported a 91 % agreement with their own assessments, confirming the clinical plausibility of the automatically inferred labels.
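The core Grad‑CAM computation can be sketched in NumPy, assuming the last convolutional activations and the gradients of the target class score with respect to them have already been extracted (in practice via framework hooks, not shown). Which layer and which target score SAFE used is not stated in the summary, so those choices are assumptions; SHAP is not covered by this sketch.

```python
import numpy as np

def grad_cam(activations, gradients):
    """activations: (K, H, W) feature maps A^k of the chosen conv layer.
    gradients: (K, H, W) d(class score)/dA^k for the target class."""
    weights = gradients.mean(axis=(1, 2))           # alpha_k: spatially averaged gradients
    cam = np.maximum(np.tensordot(weights, activations, axes=1), 0.0)  # ReLU(sum_k alpha_k A^k)
    return cam / cam.max() if cam.max() > 0 else cam  # rescale to [0, 1] for display
```

The resulting (H, W) map is usually upsampled to the patch resolution and overlaid on the fundus patch to check whether high‑weight regions coincide with lesions rather than vessels.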
Strengths:
- Patch‑level granularity preserves subtle lesion information that would be lost in image‑level downsampling.
- Joint BCE + SCL training mitigates class imbalance and encourages a semantically meaningful embedding space.
- Ensemble + Abstention reduces model‑specific bias and controls label noise, quantified by novel metrics (D_rate, MR).
- Demonstrated downstream utility across multiple DR classifiers, not just a single architecture.
Limitations:
- Fixed patch size (128 × 128) may not capture lesions of vastly different scales; adaptive or overlapping patches could be explored.
- Training several PEN models incurs additional computational cost, which may limit scalability to very large datasets.
- The current formulation addresses binary DR presence; extending to multi‑grade DR staging (mild, moderate, severe, proliferative) requires further investigation.
- Potential dataset bias (e.g., camera type, population) is not explicitly examined; external validation on diverse cohorts would strengthen generalizability.
Future Directions:
The authors suggest (i) incorporating multi‑scale or transformer‑based patch encoders, (ii) expanding to multi‑class DR grading and other ophthalmic diseases (glaucoma, AMD), (iii) integrating textual reports for multimodal weak supervision, and (iv) optimizing the ensemble for real‑time clinical deployment via model distillation or lightweight architectures.
In summary, SAFE offers a principled, weakly‑supervised pipeline that transforms sparse expert masks into dense, reliable patch annotations, thereby enhancing both lesion‑level understanding and image‑level DR screening performance. Its blend of contrastive representation learning, ensemble voting, and abstention provides a robust template for tackling annotation scarcity in medical imaging domains.