VascX Models: Model Ensembles for Retinal Vascular Analysis from Color Fundus Images

Notice: This research summary and analysis were automatically generated using AI technology. For complete accuracy, please refer to the original arXiv source.

We introduce VascX models, a comprehensive set of model ensembles for analyzing retinal vasculature from color fundus images (CFIs). Annotated CFIs were aggregated from public datasets. Additional CFIs, mainly from the population-based Rotterdam Study, were annotated by graders for arteries and veins at the pixel level, resulting in a dataset diverse in patient demographics and imaging conditions. VascX models demonstrated superior segmentation performance across datasets, image quality levels, and anatomic regions when compared to existing, publicly available models, likely due to the increased size and variety of our training set. Important improvements were observed in artery-vein and disc segmentation performance, particularly in segmentations of these structures on CFIs of intermediate quality, which are common in large cohorts and clinical datasets. Importantly, these improvements translated into significantly more accurate vascular features when comparing features extracted from VascX segmentation masks with those extracted from masks generated by previous models. With VascX models we provide a robust, ready-to-use set of model ensembles and inference code aimed at simplifying the implementation and enhancing the quality of automated retinal vasculature analyses. The precise vessel parameters generated by the models can serve as starting points for the identification of disease patterns in and outside of the eye.


💡 Research Summary

The paper introduces VascX, a comprehensive suite of deep‑learning model ensembles designed to perform four core tasks required for retinal vascular analysis from color fundus images (CFIs): vessel segmentation, artery‑vein (A/V) classification, optic disc segmentation, and fovea localization. The authors argue that while numerous deep‑learning models have been proposed for each of these tasks, their generalizability is limited because publicly available training sets are small, homogeneous, and often captured with a single device or centered on a specific retinal region. To overcome these constraints, the authors assembled a massive and heterogeneous training corpus that combines all major public datasets (e.g., CHASE, HRF, FIVES, Leuven‑Haifa, REFUGE2, IDRiD, ADAM, PALM) with newly annotated images from the Rotterdam Study, the AMD‑Life trial, and the Dutch Myopia Study (MYST). In total, more than 15 000 CFIs were used, spanning a wide age range, multiple ethnicities, a variety of ocular pathologies (diabetic retinopathy, glaucoma, AMD, pathological myopia, etc.), and images captured on several camera models and OCT‑based fundus cameras.

Annotation was carried out by four professional graders using a custom tablet‑based tool. Vessel masks were initially generated by an AI model and then manually corrected; A/V masks were created on separate overlapping layers with an additional “unknown” class for ambiguous vessels, and special attention was paid to correctly label crossing points. Optic disc masks excluded peripapillary atrophy, and fovea locations were recorded by placing an ETDRS grid. The authors also employed a quality‑assessment model (based on the EyeQ dataset) to filter out unusable images.

Pre‑processing consists of detecting the circular fundus boundary, cropping to a square, resizing to 1024 × 1024 pixels, and applying Gaussian contrast enhancement. Both the original RGB image and the enhanced version are concatenated, yielding a six‑channel input that preserves fine vessel contrast. The backbone architecture for all three segmentation tasks is a U‑Net with eight down‑sampling stages, deep supervision, and six‑channel input. Vessel segmentation is a binary problem (background vs. vessel); A/V segmentation uses four classes (background, artery, vein, crossing) and a Dice + Cross‑Entropy loss with equal weighting, which emphasizes correct handling of crossing pixels. Optic disc segmentation is also binary. Data augmentation is extensive: random hue‑saturation‑value shifts, Gaussian noise, defocus blur, scaling (1.0–1.15×), rotations up to 10°, elastic deformations, and random cropping to 512 × 512 patches. During inference, a sliding window with 50 % overlap and Gaussian merging is employed, and test‑time augmentation (horizontal/vertical flips) is averaged before thresholding. The fovea localization model uses a heat‑map regression approach with the same U‑Net backbone, training the network to output a probability map centered on the fovea keypoint.
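The sliding-window inference with 50 % overlap and Gaussian merging described above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' released code: the function names `gaussian_window` and `sliding_window_predict` and the `sigma_frac` parameter are our own, the window shape is a generic choice, and the sketch assumes the image is at least one patch in each dimension.

```python
import numpy as np

def gaussian_window(size, sigma_frac=0.25):
    # 2-D Gaussian weight map so that patch centers dominate the merged output
    ax = np.arange(size) - (size - 1) / 2.0
    g = np.exp(-(ax ** 2) / (2.0 * (sigma_frac * size) ** 2))
    return np.outer(g, g)

def sliding_window_predict(image, model, patch=512, overlap=0.5):
    """image: (H, W, C) array with H, W >= patch; model: callable mapping a
    (patch, patch, C) crop to a (patch, patch) probability map.
    Returns an (H, W) merged probability map."""
    H, W, _ = image.shape
    step = max(1, int(patch * (1.0 - overlap)))
    weight = gaussian_window(patch)
    acc = np.zeros((H, W), dtype=np.float64)
    norm = np.zeros((H, W), dtype=np.float64)
    ys = list(range(0, H - patch + 1, step))
    xs = list(range(0, W - patch + 1, step))
    if ys[-1] != H - patch:  # make sure the last window reaches the border
        ys.append(H - patch)
    if xs[-1] != W - patch:
        xs.append(W - patch)
    for y in ys:
        for x in xs:
            prob = model(image[y:y + patch, x:x + patch])
            acc[y:y + patch, x:x + patch] += prob * weight
            norm[y:y + patch, x:x + patch] += weight
    return acc / norm  # Gaussian-weighted average over overlapping windows
```

Test-time augmentation would wrap `model` so that each patch is also predicted under horizontal/vertical flips, with the un-flipped outputs averaged before the final thresholding step.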

Performance was benchmarked against two publicly available pipelines: AutoMorph (Zhou et al.) and the model of Zhou et al. (2021). Evaluation metrics include the Dice coefficient, Intersection-over-Union, and mean Euclidean distance for fovea localization, stratified by image quality (good, usable, bad) as estimated by the quality model. VascX consistently outperformed the baselines across all datasets, with average Dice scores of 0.96 for vessels, 0.94 for A/V, and 0.97 for the optic disc, representing improvements of 3–7 percentage points. The gains were most pronounced for images of intermediate quality, which are common in large epidemiological cohorts. Features extracted from VascX masks (vessel caliber, tortuosity, branching angles) showed very high correlation (r > 0.9) with features derived from expert-annotated masks, and the differences were statistically significant compared with the baseline pipelines.
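For reference, the two overlap metrics used in the evaluation reduce to a few lines over binary masks. This is a generic sketch of the standard definitions (the function names are ours), not code from the VascX repository:

```python
import numpy as np

def dice_coefficient(pred, target, eps=1e-7):
    # Dice = 2|A ∩ B| / (|A| + |B|) over binary masks
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    return (2.0 * inter + eps) / (pred.sum() + target.sum() + eps)

def iou(pred, target, eps=1e-7):
    # Intersection-over-Union; related to Dice by IoU = D / (2 - D)
    pred = pred.astype(bool)
    target = target.astype(bool)
    inter = np.logical_and(pred, target).sum()
    union = np.logical_or(pred, target).sum()
    return (inter + eps) / (union + eps)
```

Because the two metrics are monotonically related, the ranking of models is the same under either; reporting both mainly aids comparison with prior work.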

The authors release the full pre‑processing pipeline, inference code, and trained weights under an open‑source license, together with detailed documentation and Docker containers to facilitate reproducibility. They acknowledge limitations: the current models have not been extensively validated on ultra‑high‑resolution images (>2000 px) or on rare ocular diseases, and the computational cost of the full ensemble may be prohibitive for real‑time clinical deployment. Future work will explore multimodal integration with OCT, model compression for edge devices, and prospective validation in clinical settings.

In summary, VascX demonstrates that assembling a large, demographically and device‑diverse training set, combined with careful annotation protocols and robust augmentation, can substantially improve the reliability of automated retinal vascular analysis across heterogeneous real‑world fundus images. The released resources are poised to accelerate research into vascular biomarkers for both ocular and systemic diseases.

