TFFM: Topology-Aware Feature Fusion Module via Latent Graph Reasoning for Retinal Vessel Segmentation
Precise segmentation of retinal arteries and veins underpins the diagnosis of systemic cardiovascular conditions. However, standard convolutional architectures often yield topologically disjointed segmentations, characterized by gaps and discontinuities that render reliable graph-based clinical analysis impossible despite high pixel-level accuracy. To address this, we introduce a topology-aware framework engineered to maintain vascular connectivity. Our architecture incorporates a Topology-Aware Feature Fusion Module (TFFM) that maps local feature representations into a latent graph space, employing Graph Attention Networks to capture global structural dependencies missed by fixed receptive fields. Furthermore, we drive the learning process with a hybrid objective, coupling Tversky loss for class imbalance with soft clDice loss to explicitly penalize topological disconnections. Evaluation on the Fundus-AVSeg dataset shows state-of-the-art performance: a combined Dice score of 90.97% and a 95% Hausdorff Distance of 3.50 pixels. Notably, our method reduces vessel fragmentation by approximately 38% relative to baselines, yielding topologically coherent vascular trees suitable for automated biomarker quantification. Our code is available at https://tffm-module.github.io/.
💡 Research Summary
The paper addresses a critical limitation of current retinal artery‑vein segmentation models: while many deep convolutional networks achieve high pixel‑wise overlap scores, they often produce fragmented vessel maps that break the underlying vascular graph. Such topological discontinuities render downstream graph‑based analyses—essential for extracting clinical biomarkers—unreliable. To overcome this, the authors propose a topology‑aware segmentation framework that integrates a novel Topology‑Aware Feature Fusion Module (TFFM) with a hybrid loss function combining Tversky and soft clDice terms.
The backbone is a U‑Net++ architecture equipped with Attention Gates and an EfficientNet‑B0 encoder, selected after extensive ablation studies for its balance of parameter efficiency and fine‑grained feature extraction. TFFM operates at each decoder level. First, the decoder feature map is compressed via a 1×1 convolution, then two pooled representations are generated: a primary grid (F_main) that defines graph nodes and an auxiliary grid (F_aux) that preserves a minimal coarse resolution. Pairwise cosine similarity between node vectors yields an adjacency matrix, which is sparsified by keeping only the top‑k neighbors per node and adding self‑connections. A Graph Attention Network (GAT) processes this sparse graph, computing masked attention scores, applying a learnable temperature τ, and aggregating neighbor features. The resulting graph features are reshaped back to spatial dimensions, concatenated with the auxiliary grid, and refined through a 3×3 convolution followed by channel and spatial attention (CAM, SAM). A lightweight vesselness gating network further modulates the fused tensor, emphasizing vessel‑relevant regions. Finally, a learnable gated residual connection blends the graph‑enhanced features with the original local features, allowing the network to adaptively inject global connectivity cues while preserving local detail.
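The graph-reasoning step described above (cosine-similarity adjacency, top-k sparsification with self-connections, masked attention with temperature τ, neighbor aggregation) can be sketched in a few lines of NumPy. This is a simplified illustration only: the function and variable names are ours, and the learnable projections and multi-head attention of the actual GAT are reduced here to a single masked softmax over the similarity scores.

```python
import numpy as np

def build_latent_graph(f_main, k=4, tau=1.0):
    """Sketch of the TFFM latent-graph step (hypothetical shapes).

    f_main: (N, C) array of pooled node features from the primary grid.
    Returns attention-aggregated node features of the same shape.
    """
    # Pairwise cosine similarity between node vectors.
    norm = f_main / (np.linalg.norm(f_main, axis=1, keepdims=True) + 1e-8)
    sim = norm @ norm.T                          # (N, N) adjacency scores

    # Sparsify: keep only the top-k neighbors per node, plus self-loops.
    mask = np.zeros_like(sim, dtype=bool)
    topk = np.argsort(-sim, axis=1)[:, :k]
    rows = np.arange(sim.shape[0])[:, None]
    mask[rows, topk] = True
    np.fill_diagonal(mask, True)

    # Masked attention with temperature tau (learnable in the paper, fixed here).
    logits = np.where(mask, sim / tau, -np.inf)
    logits -= logits.max(axis=1, keepdims=True)  # numerical stability
    attn = np.exp(logits)
    attn /= attn.sum(axis=1, keepdims=True)      # rows sum to 1

    # Aggregate neighbor features over the sparse graph.
    return attn @ f_main
```

In the full module, the output would be reshaped back to spatial dimensions and fused with F_aux through the convolution, CAM/SAM attention, vesselness gating, and gated residual described above.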
For training, the authors combine Tversky loss (α = 0.65, β = 0.35), which addresses the severe foreground‑background imbalance typical of retinal images, with soft clDice loss, which operates on differentiable soft skeletons of predictions and ground truth. The clDice component explicitly penalizes breaks in the vessel skeleton, encouraging the model to maintain continuous curvilinear structures. The total loss is L_total = L_Tversky + λ·L_clDice with λ = 0.5, a weighting empirically found to balance region accuracy and topological fidelity.
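With the stated weights, the hybrid objective can be sketched as below. This is a minimal NumPy illustration: the differentiable soft-skeletonisation step of clDice is assumed precomputed and passed in, and the assignment of α to false negatives and β to false positives follows one common convention, which the summary does not pin down.

```python
import numpy as np

def tversky_loss(pred, target, alpha=0.65, beta=0.35, eps=1e-8):
    """Soft Tversky loss; here alpha weights false negatives and beta weights
    false positives (conventions vary; values from the paper)."""
    tp = np.sum(pred * target)
    fn = np.sum((1 - pred) * target)
    fp = np.sum(pred * (1 - target))
    return 1.0 - (tp + eps) / (tp + alpha * fn + beta * fp + eps)

def soft_cldice_loss(pred, target, skel_pred, skel_target, eps=1e-8):
    """Soft clDice given precomputed soft skeletons (the differentiable
    skeletonisation step is omitted here for brevity)."""
    # Topology precision: how much of the predicted skeleton lies in the target.
    tprec = (np.sum(skel_pred * target) + eps) / (np.sum(skel_pred) + eps)
    # Topology sensitivity: how much of the target skeleton is covered.
    tsens = (np.sum(skel_target * pred) + eps) / (np.sum(skel_target) + eps)
    return 1.0 - 2.0 * tprec * tsens / (tprec + tsens)

def total_loss(pred, target, skel_pred, skel_target, lam=0.5):
    """L_total = L_Tversky + lambda * L_clDice, with lambda = 0.5."""
    return (tversky_loss(pred, target)
            + lam * soft_cldice_loss(pred, target, skel_pred, skel_target))
```

A perfect prediction drives both terms to zero, while a skeleton break lowers topology sensitivity and is penalized even when the pixel-wise overlap remains high.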
Experiments are conducted on the Fundus‑AVSeg dataset, a recent benchmark containing 100 high‑resolution fundus images with detailed artery, vein, crossing, and uncertain vessel annotations. Images are resized to 512 × 512, intensity‑normalized, and augmented with rotations, flips, contrast adjustments, Gaussian noise, and smoothing. The dataset is split 80 %/10 %/10 % for training, validation, and testing, respectively, using a fixed random seed for reproducibility. Training runs on an NVIDIA H200 GPU with AdamW optimizer (lr = 1e‑3), batch size 10, and up to 500 epochs with early stopping (patience = 10).
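The training setup maps onto a configuration like the sketch below. The structure and field names are ours, not from the paper (in particular, the summary only says a fixed seed was used, so the value 42 is a placeholder); the early-stopping helper shows the patience-10 criterion in isolation.

```python
# Hypothetical configuration mirroring the reported hyperparameters.
CONFIG = {
    "image_size": 512,
    "split": (0.8, 0.1, 0.1),  # train / val / test fractions
    "seed": 42,                # paper fixes a seed; the value is not stated
    "optimizer": "AdamW",
    "lr": 1e-3,
    "batch_size": 10,
    "max_epochs": 500,
    "patience": 10,
}

class EarlyStopping:
    """Stop when the validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=10):
        self.patience = patience
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience  # True -> stop training
```

In a real loop, `EarlyStopping.step` would be called once per epoch after validation, and training would break as soon as it returns `True`.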
A comprehensive ablation study evaluates (i) encoder choice (EfficientNet‑B0 vs. B4), (ii) presence of TFFM, (iii) graph attention versus plain convolution, and (iv) loss composition. Results show that the full model (U‑Net++ + EfficientNet‑B0 + TFFM + GAT + hybrid loss) achieves a Dice score of 90.97 %, Hausdorff distance of 3.50 pixels, and a clDice index of 0.84, outperforming state‑of‑the‑art baselines. Notably, vessel fragmentation—measured as the number of disconnected components—drops by approximately 38 % compared to the best competing method, confirming the effectiveness of the graph‑based reasoning.
The authors also provide qualitative visualizations illustrating smoother, more continuous vessel trees, especially in thin capillaries and tortuous bifurcations where conventional CNNs typically fail. The code and pretrained models are publicly released at the provided URL, facilitating reproducibility and further research.
In summary, the paper makes four key contributions: (1) a latent‑graph feature fusion module that brings global vascular topology into the feature hierarchy, (2) the integration of graph attention mechanisms to dynamically model vessel connectivity, (3) a hybrid loss that simultaneously tackles class imbalance and topological errors, and (4) state‑of‑the‑art performance on a challenging retinal artery‑vein segmentation benchmark with demonstrable reductions in fragmentation. This work paves the way for reliable, topology‑preserving vessel segmentation, enabling downstream clinical pipelines such as automated biomarker extraction, disease risk stratification, and longitudinal vascular monitoring.