Tri-Reader: An Open-Access, Multi-Stage AI Pipeline for First-Pass Lung Nodule Annotation in Screening CT
Using multiple open-access models trained on public datasets, we developed Tri-Reader, a comprehensive, freely available pipeline that integrates lung segmentation, nodule detection, and malignancy classification into a unified tri-stage workflow. The pipeline is designed to prioritize sensitivity while reducing the candidate burden for annotators. To assess accuracy and generalizability across diverse practices, we evaluated Tri-Reader on multiple internal and external datasets against expert annotations and dataset-provided reference standards.
💡 Research Summary
The paper presents Tri‑Reader, an open‑access, three‑stage artificial‑intelligence pipeline designed to provide a “first‑pass” annotation of pulmonary nodules in low‑dose CT scans used for lung‑cancer screening. The authors note that while many open‑source tools exist for lung segmentation, nodule detection (computer‑aided detection, CADe) and malignancy assessment (computer‑aided diagnosis, CADx), they are typically applied in isolation, making it difficult for radiology teams to generate a manageable set of candidate lesions for human review. Tri‑Reader integrates these components into a single reproducible workflow that emphasizes high sensitivity while dramatically reducing the number of candidates that must be examined by annotators.
Stage 1 performs consensus detection using two complementary CADe models, both based on MONAI 3‑D RetinaNet. One model (CADe‑ROIonly) is trained on standard nodule patches, while the second (CADe‑FPaware) is trained with strategic hard‑negative mining to become “false‑positive‑aware.” Both models are trained on a mixture of public datasets (DLCS24, LUNA16, LUNA25, VLST). After lung segmentation with VISTA3D, only candidates that are detected by both CADe models are retained and assigned a confidence tier of 1.0.
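The Stage 1 consensus rule can be sketched as a simple spatial match between the two detectors' outputs. The sketch below is illustrative, not code from the Tri-Reader repository: the tuple layout, the center-distance matching criterion, and the `max_dist` parameter are assumptions; only the "kept by both models at confidence 1.0" logic comes from the paper.

```python
# Minimal sketch of Stage 1 consensus detection. Each CADe model is assumed
# to return detections as (z, y, x, score) tuples; real RetinaNet outputs
# would be boxes, and matching might use IoU instead of center distance.

def center_distance(a, b):
    """Euclidean distance between two detection centers (z, y, x)."""
    return sum((p - q) ** 2 for p, q in zip(a[:3], b[:3])) ** 0.5

def consensus_detections(dets_roi, dets_fpaware, max_dist=10.0):
    """Keep only candidates found by BOTH CADe models (confidence tier 1.0).

    A candidate from the ROI-only model is confirmed when the FP-aware
    model has a detection whose center lies within `max_dist` voxels;
    unconfirmed candidates are passed on to Stage 2.
    """
    confirmed, remaining = [], []
    for det in dets_roi:
        if any(center_distance(det, other) <= max_dist for other in dets_fpaware):
            confirmed.append({"det": det, "confidence": 1.0})
        else:
            remaining.append(det)
    return confirmed, remaining
```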
Stage 2 addresses candidates detected by only one of the two CADe models and therefore excluded from the Stage 1 consensus. Two CADx classifiers—LUNA25‑CADxr50 and DLCS24‑CADxr50SWS—are applied to these remaining candidates. Their malignancy scores are averaged, and any candidate whose average exceeds a pre‑determined threshold (τ_CADX = 0.10) is promoted to confidence 0.5. This step injects "malignancy‑informed" prioritization, ensuring that lesions with higher cancer risk are kept even when detection confidence is modest.
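The Stage 2 promotion rule reduces to averaging the two classifiers' scores and thresholding. In this sketch only the threshold τ_CADX = 0.10 and the confidence value 0.5 come from the paper; the data layout and function name are hypothetical.

```python
# Sketch of the Stage 2 promotion rule: average the malignancy scores from
# the two CADx classifiers (e.g. LUNA25-CADxr50 and DLCS24-CADxr50SWS) and
# promote candidates whose average exceeds tau to confidence 0.5.

TAU_CADX = 0.10  # threshold reported in the paper

def promote_by_malignancy(candidates, scores_a, scores_b, tau=TAU_CADX):
    """scores_a / scores_b: per-candidate malignancy scores, one per model."""
    promoted, remaining = [], []
    for cand, sa, sb in zip(candidates, scores_a, scores_b):
        avg = (sa + sb) / 2.0
        if avg > tau:
            promoted.append({"det": cand, "confidence": 0.5, "malignancy": avg})
        else:
            remaining.append(cand)  # falls through to Stage 3
    return promoted, remaining
```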
Stage 3 is a final refinement. All remaining candidates are evaluated by their average CADe score; those with a score ≥ τ_CADE = 0.20 are kept with confidence 0.2. The output is a single list of lesions with three confidence tiers (1.0, 0.5, 0.2) that can be fed directly into annotation pipelines. An optional rule‑based natural‑language‑processing (NLP) module can parse radiology reports (size, lobe, laterality, Lung‑RADS category) and automatically match textual descriptors to the spatial candidates, enabling semi‑automated cohort characterization.
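A rule-based report parser of the kind described could be built from a handful of regular expressions. The patterns below are a sketch under stated assumptions, not the NLP module shipped with Tri-Reader: the exact report phrasing, field names, and regexes are all illustrative.

```python
import re

# Illustrative rule-based extraction of nodule descriptors (size, lobe,
# laterality, Lung-RADS category) from a radiology-report sentence.

SIZE_RE = re.compile(r"(\d+(?:\.\d+)?)\s*mm")
LOBE_RE = re.compile(r"\b(right|left)\s+(upper|middle|lower)\s+lobe\b", re.I)
LUNG_RADS_RE = re.compile(r"Lung-?RADS\s*(?:category\s*)?([0-4][AB]?)", re.I)

def parse_report_sentence(text):
    """Return a dict of descriptors; fields are None when not mentioned."""
    size = SIZE_RE.search(text)
    lobe = LOBE_RE.search(text)
    lr = LUNG_RADS_RE.search(text)
    return {
        "size_mm": float(size.group(1)) if size else None,
        "laterality": lobe.group(1).lower() if lobe else None,
        "lobe": lobe.group(2).lower() if lobe else None,
        "lung_rads": lr.group(1).upper() if lr else None,
    }
```

In a full pipeline, descriptors such as lobe and laterality would then be matched against the anatomical location of each spatial candidate within the lung segmentation.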
The authors evaluated Tri‑Reader on four datasets that differ in geography, annotation conventions, and reference standards: two internal DLCS cohorts (benchmark and private test), the public LNDbv4 cohort from Portugal, and the IMD‑CT cohort from China. As a baseline they used a single MONAI RetinaNet model trained on LUNA16 (LUNA16‑De). All models were processed with identical preprocessing (0.7 × 0.7 × 1.25 mm resampling, HU clipping –1000 to 500, 192 × 192 × 80 patches). Performance was measured with lesion‑level free‑response operating characteristic (FROC) analysis, reporting CPM (average sensitivity across 1/8–8 false positives per scan) and sensitivity at 1 FP/scan, together with the average number of candidates per scan.
Across all cohorts Tri‑Reader consistently reduced the candidate burden by 40–55 % while preserving or modestly improving sensitivity. For example, on the DLCS benchmark set Tri‑Reader achieved CPM = 0.62 (95 % CI 0.58–0.67) and sensitivity = 0.68 at 1 FP/scan with 13.0 candidates per scan, compared with the baseline CPM = 0.57 and 23.1 candidates per scan. On the external IMD‑CT cohort the CPM rose from 0.63 to 0.73 and candidates fell from 13.9 to 6.71 per scan. Similar reductions were observed on LNDbv4 (19.3 → 11.34 candidates) with comparable CPM.
Sub‑analyses revealed that detection probability correlated strongly with radiologist consensus: unanimously agreed nodules (3‑reader consensus) had mean detection probability = 0.94 ± 0.13, whereas single‑reader nodules showed more dispersion (0.77 ± 0.32). Missed nodules were predominantly sub‑centimeter and benign; only 5.8 % of malignant nodules were missed versus 15.2 % of benign nodules. In the DLCS24 test set, Tri‑Reader identified 30 of 33 pathologically confirmed cancers (90.9 %) with median detection probability = 0.99, and captured 100 % of Lung‑RADS 4B lesions.
The discussion emphasizes that the pipeline’s modular, open‑source nature allows rapid deployment without retraining for each new site, and that the confidence tiers can serve as a proxy for annotation priority. Limitations include dependence on the quality of the underlying public models, the fact that operating thresholds were tuned on specific validation cohorts, and the retrospective nature of the evaluation. Prospective workflow studies are needed to quantify real‑world time savings and annotation quality. Future work will explore site‑specific calibration, integration of additional open‑source models, extension to longitudinal screening, and incorporation of vision‑language models.
All code, pretrained weights (where licensing permits), and evaluation scripts will be released on GitHub (https://github.com/fitushar/TriAnnot). Dataset links are provided for HAID, LUNA16, LUNA25, DLCS24, and VLST, ensuring transparency and reproducibility for the research community.