A global shortage of radiologists has been exacerbated by the large volume of chest X-ray workloads, particularly in primary care. Although multimodal large language models show promise, existing evaluations rely predominantly on automated metrics or retrospective analyses and lack rigorous prospective clinical validation. Janus-Pro-CXR (1B), a chest X-ray interpretation system based on the DeepSeek Janus-Pro model, was developed and rigorously validated through a multicenter prospective trial (NCT07117266). Our system outperforms state-of-the-art X-ray report generation models in automated report generation, surpassing even far larger models such as ChatGPT 4o (200B parameters), while reliably detecting six clinically critical radiographic findings. Retrospective evaluation confirms significantly higher report accuracy than Janus-Pro and ChatGPT 4o. In prospective clinical deployment, AI assistance significantly improved report quality scores, reduced interpretation time by 18.3% (P < 0.001), and was preferred by experts in 54.3% of cases. Through its lightweight architecture and domain-specific optimization, Janus-Pro-CXR improves diagnostic reliability and workflow efficiency, particularly in resource-constrained settings. The model architecture and implementation framework will be open-sourced to facilitate the clinical translation of AI-assisted radiology solutions.
The global shortage of radiologists presents a critical challenge: in most regions, the expansion of imaging applications in patient care has outpaced the capacity of radiologists to meet demand [1,2]. This issue is particularly pronounced in low-income countries, which report just 1.9 radiologists per million residents, compared with 97.9 per million in high-income countries [3,4].
Chest X-ray (CXR), the most fundamental and widely used imaging modality, remains indispensable in clinical practice, from detecting pulmonary infections to screening for tumors. X-ray diagnostics contribute substantially to radiologists' workload, especially in primary healthcare settings [5].
Recent advances in artificial intelligence (AI) demonstrate substantial potential to enhance diagnostic efficiency, optimize medical resource allocation, and maintain the quality of healthcare [6].
Currently, most AI applications focus on classifying and quantitatively analyzing imaging features for specific diseases, but clinical imaging diagnoses are far more complex than simple classification tasks [7]. Developing an intelligent imaging system with automated report generation at its core could be crucial in improving diagnostic efficiency and alleviating the strain on radiologists [8].
Despite the growing body of research on CXR report generation, most existing models are built from scratch [9-11], with inherent limitations such as low data efficiency, modal fragmentation, knowledge-transfer challenges, and an exacerbated long-tail problem [12]. Transfer learning, which leverages pre-trained knowledge and cross-modal alignment, could significantly improve the accuracy and clinical relevance of report-generation systems [8,13]. However, many current models are prohibitively large, underperforming, or closed-source, limiting their clinical applicability.
Additionally, most prior studies have evaluated generated reports with natural language generation metrics alone, without assessing actual clinical impact [14-17]. Though some studies have provided more comprehensive evaluations [12,18,19], they have largely relied on retrospective data for clinician-AI collaboration scenarios, without prospective validation in real clinical settings.
Consequently, the clinical value of multimodal large models in chest radiograph interpretation remains uncertain. The recently released open-source multimodal large language model Janus-Pro by DeepSeek [20,21], with its combination of high performance and low cost, offers a new pathway for developing medical-specific report-generation systems. However, the application of this model in medical imaging has not been systematically tested, and existing general multimodal models lack task-specific optimization, necessitating further fine-tuning.
To address these gaps, this study introduces Janus-Pro-CXR, a lightweight CXR-specific model (Figure 1) developed from the unified Janus-Pro model and the public MIMIC-CXR [22] and CheXpert Plus [23] medical imaging datasets through supervised fine-tuning. With 1 billion parameters, the model achieves rapid image analysis with a latency of 1-2 seconds on a laptop equipped with a GeForce RTX 4060 (8 GB). Its low fine-tuning cost further supports deployment in regions with limited medical resources. The model's performance in core tasks, including disease diagnosis and report generation, was rigorously evaluated using multicenter retrospective data from 27 hospitals.
A multicenter prospective verification scheme was also implemented for clinical collaboration scenarios. The lightweight architecture and domain-specific optimization of Janus-Pro-CXR enhanced diagnostic accuracy and workflow efficiency, offering particular benefits for radiologists and settings with limited resources. The model architecture will be open-sourced to facilitate the clinical translation of AI-assisted radiology solutions.
The general-purpose large language model Janus-Pro underwent supervised fine-tuning using the MIMIC-CXR, CheXpert Plus, and CXR-27 (multicenter retrospective) datasets (Figure 2). In the retrospective study, 384,208 images from the MIMIC-CXR and CheXpert Plus datasets were allocated to the first two stages of model fine-tuning, and 11,156 images from 27 hospitals in China (the CXR-27 dataset) were used for supervised fine-tuning. The remaining data from the CXR-27 dataset (n = 1,240) and a portion of the MIMIC-CXR dataset (n = 2,365) were reserved to assess model performance.
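To make the retrospective partition above concrete, the sketch below tallies the image counts per split. This is purely illustrative bookkeeping of the numbers stated in the text; the split names and the helper function are ours and are not part of the released codebase.

```python
# Illustrative tally of the retrospective data partition described in the
# text; split names and this helper are hypothetical, not the authors' code.

FINE_TUNE_SPLITS = {
    # Stages 1-2: public datasets (MIMIC-CXR + CheXpert Plus)
    "stage1_2_public": 384_208,
    # Supervised fine-tuning on the multicenter CXR-27 dataset
    "sft_cxr27": 11_156,
}

EVAL_SPLITS = {
    "cxr27_heldout": 1_240,   # remaining CXR-27 data
    "mimic_heldout": 2_365,   # held-out portion of MIMIC-CXR
}

def total(splits: dict[str, int]) -> int:
    """Sum image counts across the named splits."""
    return sum(splits.values())

print(total(FINE_TUNE_SPLITS))  # 395364 images used across fine-tuning stages
print(total(EVAL_SPLITS))       # 3605 images reserved for evaluation
```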
For the prospective study, data were sourced from three hospitals in China, with a total of 296 patients enrolled in the AI-radiologist collaboration. Baseline patient data are provided in Supplementary Tables 1 and 2. The core conclusions of this study are grounded in the findings of the prospective study, while the retrospective study provides methodological scaffolding and supporting findings.
The primary outcomes of this study included report quality and interpretation time.