NodMAISI: Nodule-Oriented Medical AI for Synthetic Imaging
Objective: Although medical imaging datasets are increasingly available, abnormal and annotation-intensive findings critical to lung cancer screening, particularly small pulmonary nodules, remain underrepresented and inconsistently curated. Methods: We introduce NodMAISI, an anatomically constrained, nodule-oriented CT synthesis and augmentation framework trained on a unified multi-source cohort (7,042 patients, 8,841 CTs, 14,444 nodules). The framework integrates: (i) a standardized curation and annotation pipeline linking each CT with organ masks and nodule-level annotations, (ii) a ControlNet-conditioned rectified-flow generator built on MAISI-v2’s foundational blocks to enforce anatomy- and lesion-consistent synthesis, and (iii) lesion-aware augmentation that perturbs nodule masks (controlled shrinkage) while preserving surrounding anatomy to generate paired CT variants. Results: Across six public test datasets, NodMAISI improved distributional fidelity relative to MAISI-v2 (real-to-synthetic FID range 1.18 to 2.99 vs 1.69 to 5.21). In lesion detectability analysis using a MONAI nodule detector, NodMAISI substantially increased average sensitivity and more closely matched clinical scans (IMD-CT: 0.69 vs 0.39; DLCS24: 0.63 vs 0.20), with the largest gains for sub-centimeter nodules where MAISI-v2 frequently failed to reproduce the conditioned lesion. In downstream nodule-level malignancy classification trained on LUNA25 and externally evaluated on LUNA16, LNDbv4, and DLCS24, NodMAISI augmentation improved AUC by 0.07 to 0.11 at ≤20% of clinical data and by 0.12 to 0.21 at 10%, consistently narrowing the performance gap under data scarcity.
💡 Research Summary
NodMAISI addresses a critical bottleneck in lung‑cancer screening: the scarcity and inconsistent curation of small pulmonary nodule annotations in publicly available CT datasets. The authors first aggregated six major open‑source lung‑nodule collections—LNDbv4, NSCLCR, LIDC‑IDRI, DLCS24, IMD‑CT, and LUNA25—into a unified cohort comprising 7,042 patients, 8,841 CT scans, and 14,444 nodule annotations. For each scan they generated organ‑level masks using the pretrained VISTA‑3D segmentation model, derived pseudo‑nodule candidates with a detection network, and refined nodule masks via a point‑driven 3D k‑means segmentation pipeline. This standardization yields a consistent “CT + organ mask + nodule mask + bounding box” representation for every case.
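The point-driven k-means refinement step can be illustrated with a toy numpy-only sketch: given a CT patch around a detected candidate point, cluster voxel intensities into two groups and keep the cluster containing the seed as the refined nodule mask. The function name and the two-cluster intensity formulation are illustrative assumptions, not the authors' exact pipeline.

```python
import numpy as np

def refine_nodule_mask(ct_patch, seed, n_iter=10):
    """Toy point-driven k-means refinement (illustrative, not the
    paper's exact method): split patch voxels into two intensity
    clusters and keep the cluster that contains the seed point."""
    flat = ct_patch.reshape(-1).astype(float)
    # Initialize the two centroids at the intensity extremes.
    centroids = np.array([flat.min(), flat.max()])
    labels = np.zeros(flat.shape, dtype=int)
    for _ in range(n_iter):
        # Assign each voxel to its nearest centroid, then update centroids.
        labels = np.abs(flat[:, None] - centroids[None, :]).argmin(axis=1)
        for k in (0, 1):
            if np.any(labels == k):
                centroids[k] = flat[labels == k].mean()
    labels = labels.reshape(ct_patch.shape)
    # The refined mask is the cluster the seed voxel falls into.
    return labels == labels[seed]
```

In the actual pipeline this refinement operates in 3D around detector-proposed points; the sketch above only shows the clustering idea on a single patch.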
The generative core builds on the MAISI‑v2 foundation, which includes a variational auto‑encoder (VAE) and a rectified‑flow latent‑space transport model. Rather than fine‑tuning these large pretrained components, the authors introduced a ControlNet branch that learns to inject three conditioning inputs—body mask, nodule mask, and voxel spacing—into the frozen flow U‑Net. A heavily weighted loss (factor 100) on the nodule channel forces the model to preserve fine‑grained lesion details, while an additional region‑specific term further improves reconstruction of small nodules. Training was performed for 500 epochs with a batch size of 8 and a learning rate of 1e‑5. At inference time, the conditioning triplet is fed through ControlNet, its features are merged into the frozen flow backbone, and the rectified flow transports a noise sample to a CT latent, which the pretrained VAE decodes into the final volume. Because only ControlNet is trained, the system retains the stability and coverage of MAISI‑v2 while gaining precise control over both global anatomy and local lesion morphology.
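The nodule-weighted training objective can be sketched as a masked, re-weighted MSE: voxels inside the nodule mask are up-weighted (the summary reports a factor of 100) relative to the rest of the volume. This is a minimal numpy sketch under that assumption; the function name and the exact normalization are ours, not taken from the paper.

```python
import numpy as np

def weighted_flow_loss(pred_velocity, target_velocity, nodule_mask, w_nodule=100.0):
    """Mean-squared flow-matching loss with nodule voxels up-weighted.

    Sketch only: the factor-100 weighting matches the summary, but the
    normalization (dividing by the weight sum) is an assumption.
    """
    sq_err = (pred_velocity - target_velocity) ** 2
    # Weight 100 inside the nodule mask, 1 elsewhere.
    weights = np.where(nodule_mask, w_nodule, 1.0)
    return float((weights * sq_err).sum() / weights.sum())
```

The effect is that errors on the (tiny) lesion region dominate the loss, which is why the generator learns to reproduce small conditioned nodules instead of averaging them away.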
To enrich nodule diversity, the authors devised a lesion‑aware augmentation pipeline. Each nodule mask is iteratively shrunk (or expanded) by morphological operations until a target volume percentage is reached. The altered mask, together with the unchanged body mask, serves as conditioning for the ControlNet‑guided generator, producing paired CT volumes where only the lesion characteristics differ. This approach enables realistic simulation of nodule growth or regression without altering surrounding anatomy, thereby providing longitudinal variations for downstream models.
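The shrinkage step above can be sketched with a numpy-only morphological erosion loop: erode the binary nodule mask one voxel layer at a time until its volume drops to a target fraction of the original. The 6-connected structuring element and the stopping rule are our assumptions; the paper's exact morphological operations are not specified in this summary.

```python
import numpy as np

def _erode6(mask):
    """One 6-connected binary erosion step (numpy-only; assumes the
    lesion does not touch the volume boundary, since np.roll wraps)."""
    out = mask.copy()
    for axis in range(3):
        for shift in (1, -1):
            out &= np.roll(mask, shift, axis=axis)
    return out

def shrink_nodule_mask(mask, target_fraction, max_iter=50):
    """Erode a binary nodule mask until its voxel count falls to
    target_fraction of the original volume (sketch of the
    lesion-aware shrinkage described above)."""
    target = target_fraction * mask.sum()
    current = mask.copy()
    for _ in range(max_iter):
        if current.sum() <= target:
            break
        nxt = _erode6(current)
        if nxt.sum() == 0:  # never erase the lesion entirely
            break
        current = nxt
    return current
```

The shrunken mask then replaces the original nodule mask in the ControlNet conditioning, while the body mask is left untouched, so the generated pair differs only in lesion size.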
Evaluation was conducted on six public test sets. Fréchet Inception Distance (FID) between real and synthetic scans dropped from 1.69–5.21 (MAISI‑v2) to 1.18–2.99 (NodMAISI), indicating a closer match to the true data distribution. Lesion detectability was assessed with a MONAI nodule detector trained on LUNA16; on the IMD‑CT and DLCS24 cohorts, NodMAISI raised average sensitivity from 0.39 to 0.69 and from 0.20 to 0.63, respectively, with the most pronounced gains for sub‑10 mm nodules that MAISI‑v2 often failed to reproduce. Finally, a malignancy‑classification model trained on the synthetic‑augmented LUNA25 data was tested on LUNA16, LNDbv4, and DLCS24. When only 10–20% of the clinical data were available, augmentation with NodMAISI improved AUC by 0.07–0.21, consistently narrowing the performance gap caused by data scarcity.
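For readers unfamiliar with the fidelity metric, FID is the Fréchet distance between two Gaussians fitted to feature embeddings of the real and synthetic scans. The sketch below computes that distance on precomputed feature vectors; which feature extractor the authors used for 3D CT is not stated in this summary, so the extractor is deliberately left out.

```python
import numpy as np

def frechet_distance(feats_real, feats_syn):
    """Fréchet distance between Gaussians fitted to two feature sets:
    ||mu_r - mu_s||^2 + Tr(S_r + S_s - 2 (S_r S_s)^{1/2}).

    Sketch only: operates on precomputed embeddings, not raw CTs.
    """
    mu_r, mu_s = feats_real.mean(axis=0), feats_syn.mean(axis=0)
    S_r = np.cov(feats_real, rowvar=False)
    S_s = np.cov(feats_syn, rowvar=False)
    # Tr((S_r S_s)^{1/2}) equals the sum of square roots of the
    # eigenvalues of S_r @ S_s (real and non-negative for PSD inputs).
    eig = np.linalg.eigvals(S_r @ S_s)
    tr_sqrt = np.sqrt(np.clip(eig.real, 0, None)).sum()
    diff = mu_r - mu_s
    return float(diff @ diff + np.trace(S_r) + np.trace(S_s) - 2 * tr_sqrt)
```

Lower values mean the synthetic feature distribution sits closer to the real one, which is how the 1.18–2.99 vs 1.69–5.21 ranges above should be read.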
In summary, NodMAISI delivers a four‑pronged solution: (1) a large, standardized multi‑source lung‑nodule repository; (2) a ControlNet‑conditioned rectified‑flow generator that enforces anatomical consistency while faithfully rendering small lesions; (3) a lesion‑aware augmentation strategy that creates realistic size‑variant nodules without disturbing surrounding structures; and (4) comprehensive quantitative and clinical‑task evaluations demonstrating superior fidelity, detectability, and downstream classification performance. The framework not only advances synthetic medical imaging for lung cancer screening but also provides a blueprint for extending lesion‑focused synthesis and augmentation to other organ systems and pathologies.