OpenPros: A Large-Scale Dataset for Limited View Prostate Ultrasound Computed Tomography
Prostate cancer is one of the most prevalent and deadly cancers among men, motivating the development of accurate and accessible imaging technologies for early detection. Ultrasound computed tomography (USCT) reconstructs quantitative tissue parameters such as speed-of-sound (SOS) and is a promising low-cost alternative to existing modalities. However, prostate USCT remains challenging due to limited-angle acquisition, strong tissue heterogeneity, bone-induced wave distortion, and the lack of large-scale, anatomically realistic datasets for method development and evaluation. We introduce OPENPROS, the first large-scale benchmark dataset for limited-angle prostate USCT, designed to systematically evaluate machine learning methods for quantitative inverse problems. OPENPROS contains over 280,000 paired samples of realistic 2D SOS maps and corresponding ultrasound full-waveform data, generated from anatomically accurate 3D digital prostate models derived from 4 clinical MRI/CT scans and 62 ex vivo prostate specimens with experimental ultrasound measurements. Wave propagation is simulated under clinically realistic configurations using open-source finite-difference time-domain and Runge-Kutta solvers. We provide standardized training, in-distribution, and out-of-distribution benchmarks and evaluate representative deep learning baselines. While learning-based methods substantially improve inference speed and reconstruction accuracy over physics-based approaches, results highlight persistent challenges in robustness, generalization, and high-resolution reconstruction quality. By publicly releasing OPENPROS, we establish a rigorous benchmark to support research in inverse problems, physics-guided learning, and operator learning, and to bridge the gap between machine learning research and practical USCT deployment. The dataset is available at https://open-pros.github.io/.
💡 Research Summary
OpenPros introduces the first large‑scale, anatomically realistic benchmark dataset for limited‑angle prostate ultrasound computed tomography (USCT), addressing a critical gap that has hampered progress in this field. Prostate cancer’s high prevalence and mortality demand early, affordable imaging; USCT can provide quantitative tissue parameters such as speed‑of‑sound (SOS) that serve as potential biomarkers, but prostate USCT suffers from severe limited‑view acquisition, strong heterogeneity, and bone‑induced wave distortion. Existing USCT datasets focus on breast imaging or lack anatomical fidelity, leaving researchers without a realistic testbed.
The authors construct 3‑D digital prostate phantoms by combining four clinical MRI/CT scans with 62 ex‑vivo prostate specimens measured on the QTscan platform. Expert radiologists annotate organ boundaries (prostate, bladder, fat, bone) and assign SOS and attenuation values using a mixture of direct measurements and statistical distributions drawn from the ITIS tissue database. Two probe configurations, an abdominal surface probe and a transrectal probe, are placed to mimic clinically realistic limited‑angle geometry. From these 3‑D models, 2‑D slices are extracted, and full‑waveform ultrasound data are simulated using open‑source finite‑difference time‑domain (FDTD) and Runge‑Kutta acoustic solvers. The simulation setup employs 20 sources, 322 receivers, and 1000 time steps per acquisition, yielding a 4‑D tensor of raw pressure data of shape S × R × T × B (sources, receivers, time steps, and, presumably, the samples in a batch). Each SOS map has a resolution of 401 × 161 grid points. In total, the dataset contains over 280,000 paired samples (about 224 k training, 28 k validation, and 28 k test), occupying ~6.8 TB.
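The acquisition dimensions above allow a quick sanity check of the storage figure. The sketch below is a back‑of‑envelope estimate only: it assumes every value is stored as a 4‑byte float32 with no compression or metadata, which the summary does not state, and the function names are illustrative rather than part of any OpenPros tooling.

```python
# Back-of-envelope storage estimate for one paired sample and the full dataset.
# Assumption (not stated in the summary): values are stored as 4-byte float32,
# uncompressed, with no file-format overhead.
BYTES_PER_VALUE = 4

N_SOURCES, N_RECEIVERS, N_TIME_STEPS = 20, 322, 1000  # from the simulation setup
SOS_SHAPE = (401, 161)                                # SOS map grid points
N_SAMPLES = 280_000                                   # approximate sample count


def sample_bytes() -> tuple[int, int]:
    """Return (waveform_bytes, sos_map_bytes) for a single paired sample."""
    waveform = N_SOURCES * N_RECEIVERS * N_TIME_STEPS * BYTES_PER_VALUE
    sos_map = SOS_SHAPE[0] * SOS_SHAPE[1] * BYTES_PER_VALUE
    return waveform, sos_map


def dataset_terabytes(n_samples: int = N_SAMPLES) -> float:
    """Total size in decimal terabytes (1 TB = 1e12 bytes)."""
    waveform, sos_map = sample_bytes()
    return n_samples * (waveform + sos_map) / 1e12


if __name__ == "__main__":
    waveform, sos_map = sample_bytes()
    print(f"waveform per sample: {waveform / 1e6:.2f} MB")
    print(f"SOS map per sample:  {sos_map / 1e3:.1f} kB")
    print(f"full dataset:        {dataset_terabytes():.2f} TB")
```

Under these assumptions the waveform data dominate (about 26 MB per sample against roughly a quarter of a megabyte for the SOS map), and the total comes to a little over 7 TB, the same order as the quoted ~6.8 TB. The remaining gap may reflect the storage format, numeric precision, or unit convention, none of which are specified here.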
OpenPros provides standardized training, in‑distribution, and out‑of‑distribution (OOD) splits, as well as robustness tests with added noise and altered tissue compositions. Baseline experiments compare traditional physics‑based reconstruction (time‑reversal, full‑waveform inversion) with several deep‑learning models, including CNN‑based U‑Nets and Vision Transformers. Results show that learning‑based methods achieve orders‑of‑magnitude faster inference and improve reconstruction metrics (MAE, RMSE, SSIM, PCC) by roughly 10‑15 % over physics‑based approaches. However, challenges remain: high‑resolution reconstructions still exhibit blurred edges, small synthetic lesions are missed, and performance degrades sharply on OOD data, highlighting limited generalization and robustness.
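Three of the reported metrics are simple to compute directly. The following is a minimal sketch using NumPy, not the benchmark's own evaluation code: the function name, the dictionary keys, and the synthetic SOS values are assumptions made for illustration. SSIM is omitted because it depends on window and dynamic‑range settings that the summary does not give; in practice it is usually taken from an image‑processing library.

```python
import numpy as np


def reconstruction_metrics(pred, target) -> dict:
    """Compute MAE, RMSE, and Pearson correlation (PCC) between two SOS maps.

    Illustrative helper; units follow the inputs (e.g. m/s for MAE and RMSE).
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")

    err = pred - target
    mae = float(np.mean(np.abs(err)))          # mean absolute error
    rmse = float(np.sqrt(np.mean(err ** 2)))   # root-mean-square error
    # Pearson correlation over all pixels, treating each map as a flat vector.
    pcc = float(np.corrcoef(pred.ravel(), target.ravel())[0, 1])
    return {"mae": mae, "rmse": rmse, "pcc": pcc}


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Synthetic stand-ins for a ground-truth SOS map and a noisy reconstruction.
    target = 1500.0 + 50.0 * rng.standard_normal((401, 161))
    pred = target + 5.0 * rng.standard_normal(target.shape)
    print(reconstruction_metrics(pred, target))
```

MAE and RMSE measure pointwise error in the units of the map, while PCC is insensitive to a constant offset or scale, so a reconstruction with a uniform SOS bias can still score a PCC near 1. Reporting the metrics together, as the benchmark does, guards against that blind spot.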
Key contributions are: (1) a massive, high‑fidelity dataset that captures realistic tissue heterogeneity, bone effects, and limited‑angle acquisition; (2) open‑source forward‑modeling code enabling reproducible wave simulations; (3) a comprehensive benchmarking protocol with multiple evaluation metrics and OOD tests; (4) provision of segmentation masks and synthetic lesion labels to support auxiliary tasks such as organ segmentation and lesion detection.
Limitations include the 2‑D slice‑based nature of the data (real clinical deployment requires full 3‑D reconstruction), fixed simulation parameters (single frequency, source waveform), and the substantial storage/computational resources needed to handle the dataset. The current baselines focus on deterministic supervised learning; uncertainty quantification, physics‑informed neural networks, and hybrid model‑based deep learning remain open research avenues.
Suggested future directions include extending to full 3‑D simulations, incorporating multi‑frequency and varied probe configurations, developing Bayesian or Monte‑Carlo methods for uncertainty estimation, and designing hybrid physics‑guided architectures that combine the interpretability of wave‑physics models with the speed of deep networks. By releasing OpenPros publicly (https://open-pros.github.io/), the authors aim to catalyze reproducible research in inverse problems and operator learning and, ultimately, to accelerate the translation of low‑cost USCT into routine clinical prostate cancer screening.