Radiology Foundation Model Pillar-0: Advancing Clinical Performance via Large-Scale CT/MRI Pretraining and the RATE Labeling Framework

Reading time: 7 minutes

📝 Abstract

Radiology plays an integral role in modern medicine, yet rising imaging volumes have far outpaced workforce growth, contributing to burnout and challenges in care delivery. Foundation models offer a path toward assisting with the full spectrum of radiology tasks, but existing medical models remain limited: they process volumetric CT and MRI as low-fidelity 2D slices, discard critical grayscale contrast information, and lack evaluation frameworks that reflect real clinical practice. Here, we introduce Pillar-0, a radiology foundation model pretrained on 42,990 abdomen-pelvis CTs, 86,411 chest CTs, 14,348 head CTs, and 11,543 breast MRIs from a large academic center, together with RATE, a scalable framework that extracts structured labels for 366 radiologic findings with near-perfect accuracy using large language models. Across internal test sets of 14,230 abdomen-pelvis CTs, 10,646 chest CTs, 4,906 head CTs, and 1,585 breast MRIs, Pillar-0 establishes a new performance frontier, achieving mean AUROCs of 86.4, 88.0, 90.1, and 82.9, outperforming MedGemma (Google), MedImageInsight (Microsoft), Lingshu (Alibaba), and Merlin (Stanford) by 7.8-15.8 AUROC points and ranking best in 87.2% (319/366) of tasks. Pillar-0 similarly outperforms all baselines in an external validation on the Stanford abdomen-pelvis CT dataset, including Merlin (82.2 vs 80.6 AUROC), which uses the Stanford dataset for development. Pillar-0 extends to tasks beyond its pretraining, such as long-horizon lung cancer risk prediction, where it improves upon the state-of-the-art Sybil by 3.0 C-index points on NLST, and generalizes with gains of 5.9 (MGH) and 1.9 (CGMH). In brain hemorrhage detection, Pillar-0 obtained a >95 AUROC when using only 1/20 of the data of the next most sample-efficient baseline.
Pillar-0 and RATE together provide an open, clinically rigorous foundation for building high-performance radiology systems, enabling applications that were previously infeasible due to computational, data, and evaluation constraints.
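As a concrete illustration of the headline metric, the sketch below computes per-finding AUROC (the probability that a randomly chosen positive study is scored above a randomly chosen negative one) and its macro average across findings, which is how a "mean AUROC" over many binary tasks is typically reported. The finding names, labels, and scores are invented for illustration; they are not the paper's data.

```python
from typing import Sequence

def auroc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUROC via the Mann-Whitney U statistic: fraction of
    (positive, negative) pairs ranked correctly, ties counting 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def mean_auroc(per_finding: dict) -> float:
    """Macro-average AUROC over findings (each finding weighted equally)."""
    return sum(auroc(y, s) for y, s in per_finding.values()) / len(per_finding)

# Toy example with two hypothetical findings.
tasks = {
    "hemorrhage": ([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]),  # AUROC 0.75
    "nodule":     ([0, 1, 0, 1], [0.2, 0.9, 0.3, 0.7]),   # AUROC 1.00
}
print(round(mean_auroc(tasks), 3))  # prints 0.875
```

Macro averaging keeps rare findings from being drowned out by common ones, which matters when aggregating over hundreds of tasks with very different prevalences.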

📄 Content

Pillar-0: A New Frontier for Radiology Foundation Models

Kumar Krishna Agrawal1,*, Longchao Liu2,†, Long Lian1,†, Michael Nercessian2,†, Natalia Harguindeguy2,†, Yufu Wu3, Peter Mikhael4, Gigin Lin3,5,6, Lecia V. Sequist7, Florian Fintelmann8,9, Trevor Darrell1, Yutong Bai1, Maggie Chung10,‡, Adam Yala2,‡

1 Department of Electrical Engineering and Computer Science, UC Berkeley, USA
2 Computational Precision Health, UC Berkeley and UC San Francisco, USA
3 Department of Medical Imaging and Intervention, Chang Gung Memorial Hospital at Linkou, Taiwan
4 Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, USA
5 Department of Medical Imaging and Radiological Sciences, Chang Gung University, Taiwan
6 Clinical Metabolomics Core and Imaging Core Laboratory, Institute for Radiological Research, Chang Gung Memorial Hospital at Linkou and Chang Gung University, Taiwan
7 Mass General Brigham Cancer Institute, USA
8 Massachusetts General Hospital, USA
9 Harvard Medical School, USA
10 Department of Radiology and Biomedical Imaging, UC San Francisco, USA

* Project lead
† Core contributor (these authors contributed equally; order among core contributors was determined at random)
‡ Co-senior author

arXiv:2511.17803v1 [cs.CV] 21 Nov 2025
1. Main

Radiology serves a key role in modern clinical practice, as it allows for the visualization of disease and guides patient management. Imaging utilization has continued to grow significantly year over year, with studies reporting annual growth rates ranging from 5 to 7% [1, 2]. This growth has far outpaced the expansion of the radiology workforce, resulting in radiologist burnout and challenges to traditional patient care delivery models [3, 4, 5, 6].

Although numerous artificial intelligence (AI) tools have been proposed to improve the detection of pathology on imaging studies, including commercially available tools for the detection of lung nodules [7] and intracranial hemorrhage [8], their impact on overall radiology efficiency is limited. These tools assist with only a small fraction of radiologists' tasks. In practice, radiologists perform comprehensive image interpretation covering a wide range of findings across all organ systems, modalities, and protocols [9, 10, 11, 12]. Assisting with this workload requires technology that can address the full spectrum of imaging findings.

Foundation models learn broad, transferable representations from diverse datasets and therefore hold promise for enabling comprehensive image interpretation [13, 14]. An ideal radiology foundation model would 1) enhance performance across a wide range of downstream tasks, including classification, localization, prognosis, and report generation; 2) drastically reduce the amount of training data required for finetuning; and 3) serve as a de facto platform for downstream model development. Despite extensive effort
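Desideratum 2 above is usually realized by training a lightweight "linear probe" on frozen foundation-model embeddings, so that only a small labeled set is needed. A minimal sketch follows; the embeddings here are synthetic stand-ins (the actual feature-extraction interface of a pretrained encoder is not described in this excerpt), and the probe is plain logistic regression trained by stochastic gradient descent.

```python
import math
import random

random.seed(0)
DIM = 8  # dimensionality of the (hypothetical) frozen embeddings

def synthetic_embedding(label: int) -> list:
    """Stand-in for a frozen encoder: positives are shifted along the
    first coordinate so the two classes are roughly linearly separable."""
    return [random.gauss(1.5 if (label and i == 0) else 0.0, 1.0)
            for i in range(DIM)]

data = [(synthetic_embedding(y), y) for y in [0, 1] * 50]

def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))

def train_probe(data, lr=0.1, epochs=200):
    """Logistic-regression probe on frozen features via SGD on log loss."""
    w, b = [0.0] * DIM, 0.0
    for _ in range(epochs):
        for x, y in data:
            p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
            g = p - y  # gradient of log loss w.r.t. the logit
            for i in range(DIM):
                w[i] -= lr * g * x[i]
            b -= lr * g
    return w, b

w, b = train_probe(data)
acc = sum(
    (sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b) > 0.5) == bool(y)
    for x, y in data
) / len(data)
print(f"train accuracy: {acc:.2f}")
```

Because only `DIM + 1` parameters are fit while the encoder stays frozen, such probes can work with far fewer labeled studies than end-to-end finetuning, which is the sample-efficiency property foundation models are valued for.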

This content is AI-processed based on ArXiv data.
