Diabetic Retinopathy Grading via Hierarchical Anchor Prototype Modulation

📝 Abstract

Diabetic retinopathy (DR) grading plays a critical role in early clinical intervention and vision preservation. Recent explorations predominantly focus on visual lesion feature extraction through data processing and domain decoupling strategies. However, they generally overlook domain-invariant pathological patterns and underutilize the rich contextual knowledge of foundation models, relying solely on visual information, which is insufficient for distinguishing subtle pathological variations. Therefore, we propose integrating fine-grained pathological descriptions to complement prototypes with additional context, thereby resolving ambiguities in borderline cases. Specifically, we propose a Hierarchical Anchor Prototype Modulation (HAPM) framework to facilitate DR grading. First, we introduce a variance spectrum-driven anchor prototype library that preserves domain-invariant pathological patterns. We further employ a hierarchical differential prompt gating mechanism, dynamically selecting discriminative semantic prompts from both LVLM and LLM sources to address semantic confusion between adjacent DR grades. Finally, we utilize a two-stage prototype modulation strategy that progressively integrates clinical knowledge into visual prototypes through a Pathological Semantic Injector (PSI) and a Discriminative Prototype Enhancer (DPE). Extensive experiments across eight public datasets demonstrate that our approach achieves pathology-guided prototype evolution while outperforming state-of-the-art methods. The code is available at https://github.com/zhcz328/HAPM .

CCS Concepts • Computing methodologies → Artificial intelligence; • Applied computing → Life and medical sciences.

📄 Content

Disease grading evaluates pathological severity in medical images, guiding clinical decisions and treatment plans. In diabetic retinopathy (DR), disease progression is classified into five categories (No DR, Mild NPDR, Moderate NPDR, Severe NPDR, and PDR) according to international standards (e.g., DRSS), with each grade determined from quantitative biomarker changes such as microaneurysm count and exudate volume [1,18]. In practice, DR grading faces unique challenges: severity levels exhibit inherent semantic ambiguity, stemming from the continuity of disease progression and cross-domain heterogeneity, as shown in Figure 1. On one hand, adjacent levels may differ only by minor morphological changes; on the other hand, retinal images of the same severity level may have significantly different texture feature distributions due to equipment differences or imaging protocol variations across institutions, making cross-domain grading tasks more complex [17,26].

DR grading methods have witnessed significant advancements in recent years [3,6,10]. However, existing approaches predominantly rely on data augmentation, domain decoupling or visual feature comparisons to mitigate distribution shifts. These methods fail to effectively mine the grade-invariant pathological patterns that persist across domains. In real-world applications, the following challenges arise: 1) Cross-domain sensitivity and long-tail distribution: Imaging differences across medical centers/devices and the long-tail nature of data distribution make model localization of key lesions (e.g. microaneurysms) susceptible to style interference [29,40]. 2) Progression boundary ambiguity: The high similarity between levels makes it difficult for traditional networks to distinguish minor but clinically significant pathological changes. 3) Underutilization of foundation models: Current approaches fail to leverage the rich contextual knowledge embedded in foundation models, overlooking the potential of pre-trained architectures, LVLMs, and LLMs to provide valuable pathological context for enhanced diagnostic accuracy. 4) Limited multimodal differentiation: Visual features alone often prove insufficient for distinguishing subtle pathological variations, whereas integrating fine-grained textual descriptions could provide complementary context to resolve ambiguities in borderline cases.

Our preliminary investigations show that driving prototype classification with a frozen self-supervised pre-trained model yields particularly poor discrimination between adjacent severity levels on cross-domain DR datasets. This suggests fundamental representational inadequacies in capturing the subtle pathological variations critical for accurate DR staging. While semantics can serve as an additional supervisory signal to guide prototype evolution [33,49], we observed significant overlap of prompt embeddings across different grades, causing multi-level semantic confusion between adjacent DR severity levels. To overcome these limitations, we propose a Hierarchical Anchor Prototype Modulation (HAPM) framework for DR grading through principled representational refinement.
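To make the prototype classification baseline concrete, here is a minimal sketch of nearest-prototype grading over frozen embeddings. It assumes cosine similarity as the matching score, which the text does not specify; the function names and the toy 2-D vectors are hypothetical and chosen only to show how adjacent grades can end up with nearly identical scores.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def classify(embedding, prototypes):
    """Return the grade whose prototype is most similar to the embedding."""
    return max(prototypes, key=lambda grade: cosine(embedding, prototypes[grade]))

# Toy prototypes (hypothetical values): adjacent grades point in similar directions.
prototypes = {
    "No DR": [1.0, 0.0],
    "Mild NPDR": [0.9, 0.3],
    "Moderate NPDR": [0.8, 0.5],
}
query = [0.9, 0.35]  # stand-in for a frozen-encoder image embedding

scores = {grade: cosine(query, proto) for grade, proto in prototypes.items()}
predicted = classify(query, prototypes)
```

In this toy setup the top two scores differ only slightly, which is the kind of narrow margin between neighbouring grades that the paragraph above describes.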

Specifically, we first construct a variance spectrum-driven anchor prototype library by selecting representative samples from each severity class that minimize intra-class feature embedding variance, thereby establishing preliminary domain-invariant pathological prototypes. To address the ambiguity of grade boundaries, we design a hybrid prompt architecture that bridges global case priors from large vision-language models (LVLMs) and large language models (LLMs) with lesion-specific features. This prompt generation system combines class-level LVLM prompts with fine-grained pathological descriptions from LLMs, creating a comprehensive prompt library that captures the semantic differences between adjacent DR grades. Furthermore, we introduce a differentiated grade description mechanism that uses an LLM prompt template to capture the pathological feature differences between DR grades precisely. This generates discriminative description pairs that help differentiate between easily confused categories, particularly adjacent severity levels.
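The anchor selection step can be illustrated with a simplified sketch. It assumes that "minimizing intra-class variance" is approximated by keeping the `k` samples closest to each class mean and averaging them; the actual variance spectrum procedure is not given in this text, and `build_anchor_prototypes` and its parameters are hypothetical names.

```python
def mean(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def sq_dist(u, v):
    """Squared Euclidean distance between two vectors."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

def build_anchor_prototypes(features, labels, k):
    """For each class, keep the k samples nearest the class mean and average them.

    This is an assumed stand-in for the paper's variance-driven selection.
    """
    by_class = {}
    for feat, label in zip(features, labels):
        by_class.setdefault(label, []).append(feat)

    prototypes = {}
    for label, feats in by_class.items():
        centre = mean(feats)
        anchors = sorted(feats, key=lambda f: sq_dist(f, centre))[:k]
        prototypes[label] = mean(anchors)
    return prototypes

# Toy features: class 0 contains one outlier, e.g. an image from a different device.
features = [[0.0, 0.0], [0.2, 0.0], [0.1, 0.1], [5.0, 5.0], [1.0, 1.0], [1.2, 1.0]]
labels = [0, 0, 0, 1 - 1, 1, 1]
anchor_prototypes = build_anchor_prototypes(features, labels, k=2)
```

Because the outlier sits far from the class mean, it is never chosen as an anchor, so the resulting prototype stays near the compact cluster rather than being pulled toward the atypical sample.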

Finally, we implement a two-stage prototype modulation process through the Pathological Semantic Injector (PSI) and Discriminative Prototype Enhancer (DPE), which progressively integrate diverse description features and differentiated description features into the visual prototypes. The PSI module uses an attention-based mechanism to integrate diversified description features into initial prototypes, enabling precise mapping from macro-semantic descriptions to micro-pathological regions. The DPE module then further enhances these prototypes by incorporating differentiated descriptions through an adaptive weighting mechanism that establishes clearer decision boundaries between adjacent DR severity grades. By using a frozen self-supervised pre-trained model
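The two-stage modulation described in this paragraph can be sketched as follows. This is an illustrative toy under stated assumptions, not the authors' implementation: stage one is modelled as scaled dot-product attention over description features with a residual connection, and stage two as a sigmoid-gated blend with a fixed scalar gate (a real model would presumably learn it). The names `inject_semantics`, `enhance`, and `gate_logit` are hypothetical.

```python
import math

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def inject_semantics(prototype, text_feats):
    """Stage 1 (PSI-like, assumed form): attend over description features.

    The prototype acts as the query; the attended context is added residually.
    """
    scale = math.sqrt(len(prototype))
    weights = softmax([dot(prototype, t) / scale for t in text_feats])
    context = [
        sum(w * t[i] for w, t in zip(weights, text_feats))
        for i in range(len(prototype))
    ]
    return [p + c for p, c in zip(prototype, context)], weights

def enhance(prototype, diff_feat, gate_logit=0.0):
    """Stage 2 (DPE-like, assumed form): gated blend with a differentiated description."""
    gate = 1.0 / (1.0 + math.exp(-gate_logit))
    return [(1.0 - gate) * p + gate * d for p, d in zip(prototype, diff_feat)]

# Toy inputs (hypothetical): one visual prototype, two diversified descriptions,
# and one differentiated description contrasting this grade with its neighbour.
visual_prototype = [1.0, 0.0]
descriptions = [[1.0, 0.0], [0.0, 1.0]]
differentiated = [0.0, 2.0]

stage1, attn = inject_semantics(visual_prototype, descriptions)
stage2 = enhance(stage1, differentiated, gate_logit=0.0)
```

The attention weights favour the description that aligns with the prototype, and the gate controls how far the differentiated description moves the prototype, which is one simple way to read "adaptive weighting" in the text.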

This content is AI-processed based on ArXiv data.
