Developing Foundation Models for Universal Segmentation from 3D Whole-Body Positron Emission Tomography

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

Positron emission tomography (PET) is a key nuclear medicine imaging modality that visualizes radiotracer distributions to quantify in vivo physiological and metabolic processes, playing an irreplaceable role in disease management. Despite its clinical importance, the development of deep learning models for quantitative PET image analysis remains severely limited, driven by both the inherent segmentation challenge posed by PET’s paucity of anatomical contrast and the high costs of data acquisition and annotation. To bridge this gap, we develop generalist foundation models for universal segmentation from 3D whole-body PET imaging. We first build the largest and most comprehensive PET dataset to date, comprising 11,041 3D whole-body PET scans with 59,831 segmentation masks for model development. Based on this dataset, we present SegAnyPET, a foundation model with general-purpose applicability to diverse segmentation tasks. Built on a 3D architecture with a prompt engineering strategy for mask generation, SegAnyPET enables universal and scalable organ and lesion segmentation, supports efficient human correction with minimal effort, and enables a clinical human-in-the-loop workflow. Extensive evaluations on multi-center, multi-tracer, multi-disease datasets demonstrate that SegAnyPET achieves strong zero-shot performance across a wide range of segmentation tasks, highlighting its potential to advance the clinical applications of molecular imaging.


💡 Research Summary

This paper addresses the long‑standing challenge of developing deep‑learning models for quantitative analysis of whole‑body positron emission tomography (PET) scans. The authors first construct PETWB‑Seg11K, the largest and most diverse PET segmentation dataset to date, comprising 11,041 three‑dimensional (3D) whole‑body PET volumes and 59,831 organ and lesion masks. The dataset aggregates two public sources and three private cohorts, covering multiple scanners, acquisition protocols, slice counts, slice thicknesses, radiotracers (e.g., FDG, PSMA), and disease categories (oncology, neurology, inflammation). This heterogeneity is deliberately designed to mimic real‑world clinical variability and to provide a robust foundation for training a universal model.

Building on this dataset, the authors introduce SegAnyPET, a foundation model for universal volumetric PET segmentation. SegAnyPET adapts the “Segment Anything Model” (SAM) paradigm to 3D medical imaging. Its architecture consists of three main components: a 3D image encoder that extracts volumetric feature embeddings from the PET scan, a prompt encoder that converts user‑provided sparse prompts (points) or dense prompts (coarse masks) into compact embeddings via fixed positional encodings and adaptive prompt‑specific layers, and a mask decoder that fuses image and prompt embeddings, upsamples them through a multi‑scale pipeline, and produces the final segmentation mask via a multilayer perceptron. The prompt‑based design enables rapid, interactive segmentation: clinicians can place a few positive/negative points to obtain an initial mask, then iteratively refine it by adding more points or a rough mask, supporting a human‑in‑the‑loop workflow that aligns with routine radiology practice.
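The iterative point-based refinement loop described above can be illustrated with a small sketch. The helper below is not the authors' implementation; it is a minimal, assumed heuristic for how a corrective point prompt might be sampled from the disagreement between a predicted mask and a reference mask:

```python
import numpy as np

def sample_corrective_point(pred: np.ndarray, gt: np.ndarray):
    """Pick one corrective point prompt from the disagreement between a
    predicted mask and the reference mask (both binary 3D arrays).

    Returns (z, y, x, label): label=1 means a positive (foreground) point
    where the model missed tissue; label=0 means a negative point where it
    over-segmented. Returns None when the masks already agree.
    """
    false_neg = gt.astype(bool) & ~pred.astype(bool)   # missed foreground
    false_pos = pred.astype(bool) & ~gt.astype(bool)   # spurious foreground
    # Illustrative heuristic: correct the larger error type first, mimicking
    # a clinician who clicks in the most salient mistake.
    if false_neg.sum() >= false_pos.sum() and false_neg.any():
        region, label = false_neg, 1
    elif false_pos.any():
        region, label = false_pos, 0
    else:
        return None
    coords = np.argwhere(region)
    z, y, x = coords[len(coords) // 2]  # deterministic pick for the sketch
    return int(z), int(y), int(x), label
```

In an interactive session, each returned point would be fed back through the prompt encoder together with the earlier prompts, and the mask decoder would be re-run to produce a refined mask.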

Two model variants are presented. The generic SegAnyPET model is trained on the entire PETWB‑Seg11K dataset and is intended for broad organ‑and‑lesion segmentation. SegAnyPET‑Lesion is a fine‑tuned version focused on small, heterogeneous lesions, achieving higher sensitivity and boundary accuracy for oncological tasks while retaining the same prompt flexibility.

The authors conduct extensive experiments. Internal validation on in‑distribution data (same centers and protocols as training) shows mean Dice scores around 0.86 for multi‑organ and multi‑lesion tasks. External validation on out‑of‑distribution cohorts—including PSMA‑PET scans, PET/MRI‑derived PET, and previously unseen organs—demonstrates robust zero‑shot performance with mean Dice ≈ 0.81, confirming the model’s ability to generalize across tracers, scanners, and disease contexts. Comparisons with state‑of‑the‑art task‑specific segmentation networks (nnUNet, STUNet, SwinUNETR, SegResNet) show that SegAnyPET matches or exceeds their performance while requiring only prompts rather than full retraining for each new target. Notably, for unseen targets, task‑specific models need additional labeled data and re‑training, whereas SegAnyPET achieves comparable accuracy instantly via prompts.
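The Dice scores quoted in these comparisons are the standard overlap metric for volumetric segmentation; a minimal reference implementation for binary 3D masks:

```python
import numpy as np

def dice_score(pred: np.ndarray, gt: np.ndarray) -> float:
    """Dice similarity coefficient between two binary masks:
    2 * |A ∩ B| / (|A| + |B|). Two empty masks score 1.0 by convention.
    """
    pred, gt = pred.astype(bool), gt.astype(bool)
    denom = pred.sum() + gt.sum()
    if denom == 0:
        return 1.0
    return float(2.0 * (pred & gt).sum() / denom)
```

A perfect prediction yields 1.0; a mask with half its voxels overlapping an equally sized reference yields 0.5.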

A human‑in‑the‑loop study shows that adding 3–5 corrective points reduces segmentation errors by over 90% and cuts total annotation time by more than 70% relative to manual delineation. Moreover, the masks generated by SegAnyPET can be directly fed into downstream quantitative pipelines (e.g., SUVmax calculation, volumetric analysis), yielding results statistically indistinguishable from those obtained with manual masks and modestly improving the area‑under‑curve of treatment‑response prediction models.
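The downstream quantitation step amounts to simple reductions over the masked voxels. The sketch below shows how SUVmax, SUVmean, and lesion volume could be extracted from a segmentation mask; the function name and the default voxel spacing are illustrative assumptions, not values from the paper:

```python
import numpy as np

def suv_stats(suv_volume: np.ndarray, mask: np.ndarray,
              voxel_spacing_mm=(2.0, 2.0, 2.0)) -> dict:
    """Extract basic PET quantitation from a segmentation mask.

    suv_volume: 3D array of standardized-uptake values (SUV).
    mask:       binary 3D array delineating the region of interest.
    voxel_spacing_mm: physical voxel size (illustrative default).
    Returns SUVmax, SUVmean, and region volume in milliliters.
    """
    vals = suv_volume[mask.astype(bool)]
    if vals.size == 0:
        return {"suv_max": 0.0, "suv_mean": 0.0, "volume_ml": 0.0}
    voxel_ml = float(np.prod(voxel_spacing_mm)) / 1000.0  # mm^3 -> mL
    return {
        "suv_max": float(vals.max()),
        "suv_mean": float(vals.mean()),
        "volume_ml": vals.size * voxel_ml,
    }
```

Because these statistics depend only on which voxels are inside the mask, a model-generated mask that closely matches a manual one will produce nearly identical quantitative readouts.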

The paper acknowledges limitations: SegAnyPET currently processes PET alone and does not jointly learn from complementary modalities such as CT or MRI; rare tracers and ultra‑low‑dose scans remain under‑explored; and the model’s performance on extremely small lesions (<5 mm) could be further improved. Future work is proposed to incorporate multimodal prompt encodings, explore transfer learning for scarce‑tracer domains, and validate the system in large‑scale prospective clinical trials.

In summary, by coupling an unprecedentedly large, heterogeneous PET segmentation dataset with a 3D prompt‑based foundation model, the authors deliver a versatile, high‑performing tool for universal PET segmentation. SegAnyPET’s zero‑shot capability, interactive refinement, and compatibility with downstream quantitative analyses represent a significant step toward routine AI‑assisted interpretation of functional imaging in nuclear medicine.
