Full end-to-end diagnostic workflow automation of 3D OCT via foundation model-driven AI for retinal diseases
Optical coherence tomography (OCT) has revolutionized retinal disease diagnosis with its high-resolution, three-dimensional imaging, yet its full diagnostic automation in clinical practice remains constrained by multi-stage workflows and conventional single-slice, single-task AI models. We present the Full-process OCT-based Clinical Utility System (FOCUS), a foundation model-driven framework enabling end-to-end automation of 3D OCT retinal disease diagnosis. FOCUS sequentially performs image quality assessment with EfficientNetV2-S, followed by abnormality detection and multi-disease classification using a fine-tuned Vision Foundation Model. Crucially, FOCUS leverages a unified adaptive aggregation method to integrate 2D slice-level predictions into a comprehensive 3D patient-level diagnosis. Trained and tested on 3,300 patients (40,672 slices), and externally validated on 1,345 patients (18,498 slices) across four different-tier centers and diverse OCT devices, FOCUS achieved high F1 scores for quality assessment (99.01%), abnormality detection (97.46%), and patient-level diagnosis (94.39%). Real-world validation across centers also showed stable performance (F1: 90.22%-95.24%). In human-machine comparisons, FOCUS matched expert performance in abnormality detection (F1: 95.47% vs 90.91%) and multi-disease diagnosis (F1: 93.49% vs 91.35%), while demonstrating better efficiency. FOCUS automates the image-to-diagnosis pipeline, representing a critical advance towards unmanned ophthalmology with a validated blueprint for autonomous screening to enhance the accessibility and efficiency of population-scale retinal care.
💡 Research Summary
This paper introduces FOCUS (Full‑process OCT‑based Clinical Utility System), an end‑to‑end artificial‑intelligence framework that automates the entire clinical workflow for three‑dimensional optical coherence tomography (3D OCT) retinal disease diagnosis. Current OCT‑AI solutions are fragmented, typically handling single tasks (e.g., image quality control or disease detection) on isolated 2‑D slices, and they lack the ability to integrate volumetric context or to provide a patient‑level diagnosis. FOCUS addresses these gaps through a sequential pipeline composed of three core modules.
First, an EfficientNetV2‑S model performs image‑quality assessment, filtering out low‑quality B‑scans with an F1 score of 99.01 %. This step ensures that downstream analysis receives only diagnostically reliable data, a crucial prerequisite for real‑world deployment across heterogeneous OCT devices.
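The quality gate described above can be sketched as a simple filter over the B-scans of one volume. This is a minimal illustration, not the authors' implementation: the function name filter_b_scans, the quality_score callable (a stand-in for the EfficientNetV2-S classifier), and the 0.5 threshold are all assumptions made for the example.

```python
from typing import Callable, List, Sequence, Tuple


def filter_b_scans(
    b_scans: Sequence[object],
    quality_score: Callable[[object], float],
    threshold: float = 0.5,  # hypothetical cut-off, not taken from the paper
) -> Tuple[List[int], List[int]]:
    """Split B-scan indices into (kept, rejected) using a quality score in [0, 1]."""
    kept: List[int] = []
    rejected: List[int] = []
    for index, scan in enumerate(b_scans):
        if quality_score(scan) >= threshold:
            kept.append(index)
        else:
            rejected.append(index)
    return kept, rejected


# Stand-in data: each "scan" is already a made-up quality score, so the
# scorer is the identity function. A real scorer would run a trained model.
volume = [0.92, 0.15, 0.78, 0.40, 0.66]
kept, rejected = filter_b_scans(volume, quality_score=lambda scan: scan)
print("kept:", kept)          # indices passed on to the diagnostic stage
print("rejected:", rejected)  # indices dropped as low quality
```

Returning indices rather than images keeps the slice positions available to later stages, which may matter when predictions are aggregated across the volume.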
Second, a Vision Foundation Model (VisionFM) fine‑tuned on a large multi‑center OCT dataset is used for simultaneous abnormality detection and multi‑disease classification. The model predicts the presence of nine clinically relevant retinal conditions—including age‑related macular degeneration (AMD), choroidal neovascularization (CNV), central serous chorioretinopathy (CSC), diabetic retinopathy (DR), macular hole (MH), macular edema (ME), epiretinal membrane (ERM), retinitis pigmentosa (RP), and others—on a per‑slice basis. By leveraging the massive pre‑training of a foundation model, FOCUS inherits strong generalization capabilities while still achieving task‑specific performance after fine‑tuning.
Third, the Unified Adaptive Aggregation Classifier (UAAC) aggregates slice‑level predictions into a coherent patient‑level diagnosis. Unlike traditional multiple‑instance learning (MIL) approaches that apply static pooling (e.g., max or mean), UAAC dynamically weights each slice according to its predicted confidence and estimated uncertainty. This adaptive mechanism preserves “needle‑in‑a‑haystack” pathological signals that would otherwise be diluted, enabling accurate detection of sparse lesions spread across the volume.
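The summary does not give UAAC's formula, so the following is only a toy sketch of the general idea of confidence- and uncertainty-weighted pooling, contrasted with static mean pooling. The scoring rule (maximum class probability minus normalized entropy), the softmax normalization, the sharpness parameter, and the class layout are all assumptions for illustration.

```python
import math
from typing import List, Sequence


def normalized_entropy(probs: Sequence[float]) -> float:
    """Shannon entropy of one slice's class distribution, scaled to [0, 1].

    Requires at least two classes.
    """
    entropy = -sum(p * math.log(p) for p in probs if p > 0.0)
    return entropy / math.log(len(probs))


def mean_pool(slice_probs: Sequence[Sequence[float]]) -> List[float]:
    """Static pooling baseline: every slice gets the same weight."""
    return [sum(column) / len(slice_probs) for column in zip(*slice_probs)]


def adaptive_pool(
    slice_probs: Sequence[Sequence[float]], sharpness: float = 5.0
) -> List[float]:
    """Weight each slice by (confidence - uncertainty), softmax-normalized."""
    scores = [max(p) - normalized_entropy(p) for p in slice_probs]
    peak = max(scores)  # subtract the peak for numerical stability
    raw = [math.exp(sharpness * (s - peak)) for s in scores]
    total = sum(raw)
    weights = [r / total for r in raw]
    n_classes = len(slice_probs[0])
    return [
        sum(w * p[c] for w, p in zip(weights, slice_probs))
        for c in range(n_classes)
    ]


def argmax(values: Sequence[float]) -> int:
    return max(range(len(values)), key=lambda i: values[i])


# Made-up volume: ten ambiguous slices and two confident lesion slices.
# Hypothetical class order: 0 = normal, 1 = disease A, 2 = disease B.
volume = [[0.5, 0.3, 0.2]] * 10 + [[0.05, 0.9, 0.05]] * 2

print("mean pooling picks class", argmax(mean_pool(volume)))
print("adaptive pooling picks class", argmax(adaptive_pool(volume)))
```

In this toy volume, mean pooling lets the ten ambiguous slices outvote the two lesion slices, while the weighted version follows the confident ones. Note that under this particular rule a confidently normal slice earns as much weight as a confidently abnormal one, so the sketch illustrates the weighting idea only and not how UAAC itself handles sparse lesions.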
The authors assembled a comprehensive dataset: 3,300 patients (40,672 B‑scans) for training and internal testing, and 1,345 patients (18,498 B‑scans) for external validation, spanning four medical centers of varying tiers and OCT devices from five manufacturers. FOCUS achieved F1 scores of 99.01 % (quality assessment), 97.46 % (abnormality detection), and 94.39 % (patient‑level multi‑disease diagnosis). Real‑world validation across centers showed stable performance, with F1 ranging from 90.22 % to 95.24 %, which is consistent with resilience to device and site heterogeneity.
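For readers less familiar with the metric reported throughout, F1 is the harmonic mean of precision and recall, which equals 2TP / (2TP + FP + FN). The sketch below computes it per class and as an unweighted (macro) average. The summary does not state which averaging scheme the authors used, so the macro average and the toy labels here are assumptions.

```python
from typing import Hashable, Sequence


def f1_score(tp: int, fp: int, fn: int) -> float:
    """F1 = 2TP / (2TP + FP + FN); defined as 0.0 when the denominator is 0."""
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


def macro_f1(y_true: Sequence[Hashable], y_pred: Sequence[Hashable]) -> float:
    """Unweighted mean of per-class F1 over every label seen in either list."""
    labels = sorted(set(y_true) | set(y_pred))
    per_class = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        per_class.append(f1_score(tp, fp, fn))
    return sum(per_class) / len(per_class)


# Made-up patient-level labels, purely for illustration.
y_true = ["AMD", "AMD", "DR", "normal", "normal", "normal"]
y_pred = ["AMD", "DR", "DR", "normal", "normal", "AMD"]

# Per-class F1: AMD = 0.5, DR = 2/3, normal = 0.8.
print(f"macro F1 = {macro_f1(y_true, y_pred):.4f}")
```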
Human‑machine comparison experiments were conducted to benchmark the system against clinical staff. In abnormality detection, FOCUS (F1 = 95.47 %) outperformed four ophthalmic technicians (average F1 = 90.91 %). In multi‑disease diagnosis, the model (F1 = 93.49 %) matched or slightly exceeded nine retinal specialists (average F1 = 91.35 %). Moreover, inference time per volume was under 0.2 seconds, a substantial speed gain over conventional 3‑D CNNs, making the system suitable for large‑scale screening.
The paper discusses several technical contributions. By decoupling 2‑D feature extraction (via a foundation model) from volumetric aggregation (via UAAC), FOCUS avoids the prohibitive computational cost and massive annotation requirements of full 3‑D CNNs while still exploiting 3‑D context. The adaptive aggregation also mitigates the risk of over‑reliance on noisy slices, improving robustness in real‑world settings where image quality varies.
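The decoupling described above, with 2‑D per-slice inference followed by a separate volume-level aggregation step, can be sketched as a small orchestrator that accepts each stage as a plain callable. Everything named here (diagnose_volume, VolumeResult, the stub stages, the label list) is hypothetical, and folding abnormality detection into the classification output is a simplification made for the example rather than a description of FOCUS.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class VolumeResult:
    usable_slices: int
    abnormal: Optional[bool]   # None when no slice passed quality control
    diagnosis: Optional[str]   # None when no slice passed quality control


def diagnose_volume(
    slices: Sequence[object],
    is_gradable: Callable[[object], bool],
    slice_probs: Callable[[object], List[float]],
    aggregate: Callable[[List[List[float]]], List[float]],
    labels: Sequence[str],
    normal_label: str = "normal",
) -> VolumeResult:
    """Quality gate -> per-slice 2-D inference -> volume-level aggregation."""
    usable = [s for s in slices if is_gradable(s)]
    if not usable:
        return VolumeResult(0, None, None)
    per_slice = [slice_probs(s) for s in usable]   # 2-D model, one call per slice
    patient_probs = aggregate(per_slice)            # only this step sees the volume
    best = max(range(len(labels)), key=lambda i: patient_probs[i])
    diagnosis = labels[best]
    return VolumeResult(len(usable), diagnosis != normal_label, diagnosis)


def mean_aggregate(per_slice: List[List[float]]) -> List[float]:
    """Placeholder aggregator; an adaptive scheme could be swapped in here."""
    return [sum(column) / len(per_slice) for column in zip(*per_slice)]


# Stub stages: each fake slice carries a precomputed quality score and probabilities.
fake_volume = [
    {"quality": 0.9, "probs": [0.2, 0.7, 0.1]},
    {"quality": 0.1, "probs": [0.9, 0.05, 0.05]},  # rejected by the quality gate
    {"quality": 0.8, "probs": [0.3, 0.6, 0.1]},
]
result = diagnose_volume(
    fake_volume,
    is_gradable=lambda s: s["quality"] >= 0.5,
    slice_probs=lambda s: s["probs"],
    aggregate=mean_aggregate,
    labels=["normal", "AMD", "DR"],
)
print(result)
```

Because the aggregator is just another argument, the per-slice model never needs to process a full 3‑D volume, which is the property the paragraph above credits for avoiding the cost of a 3‑D CNN.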
Limitations are acknowledged. The training and validation cohorts are predominantly Chinese, which may restrict generalizability to other ethnic groups and disease prevalence patterns. The retrospective nature of the study means disease prevalence in the datasets is higher than in community screening scenarios, potentially inflating performance metrics. Future work should include prospective, population‑based trials, broader international data, and integration of large language models for report generation and explainability.
In summary, FOCUS represents a significant step toward “unmanned ophthalmology.” It delivers clinically comparable or superior accuracy to human experts, operates at near‑real‑time speed, and demonstrates consistent performance across multiple centers and devices. By fully automating the OCT workflow—from quality control through to patient‑level diagnosis—FOCUS paves the way for scalable, cost‑effective retinal disease screening and could substantially improve early detection and treatment access worldwide.