Opportunistic Promptable Segmentation: Leveraging Routine Radiological Annotations to Guide 3D CT Lesion Segmentation


The development of machine learning models for CT imaging depends on the availability of large, high-quality, and diverse annotated datasets. Although large volumes of CT images and reports are readily available in clinical picture archiving and communication systems (PACS), 3D segmentations of critical findings are costly to obtain, typically requiring extensive manual annotation by radiologists. On the other hand, it is common for radiologists to provide limited annotations of findings during routine reads, such as line measurements and arrows, that are often stored in PACS as GSPS objects. We posit that these sparse annotations can be extracted along with CT volumes and converted into 3D segmentations using promptable segmentation models, a paradigm we term Opportunistic Promptable Segmentation. To enable this paradigm, we propose SAM2CT, the first promptable segmentation model designed to convert radiologist annotations into 3D segmentations in CT volumes. SAM2CT builds upon SAM2 by extending the prompt encoder to support arrow and line inputs and by introducing Memory-Conditioned Memories (MCM), a memory encoding strategy tailored to 3D medical volumes. On public lesion segmentation benchmarks, SAM2CT outperforms existing promptable segmentation models and similarly trained baselines, achieving Dice similarity coefficients of 0.649 for arrow prompts and 0.757 for line prompts. Applying the model to pre-existing GSPS annotations from a clinical PACS (N = 60), SAM2CT generates 3D segmentations that are clinically acceptable or require only minor adjustments in 87% of cases, as scored by radiologists. Additionally, SAM2CT demonstrates strong zero-shot performance on select Emergency Department findings. These results suggest that large-scale mining of historical GSPS annotations represents a promising and scalable approach for generating 3D CT segmentation datasets.


💡 Research Summary

The paper introduces a novel paradigm called Opportunistic Promptable Segmentation (OPS) for CT imaging, which leverages the sparse radiologist annotations—arrows and line measurements—automatically stored as DICOM Grayscale Softcopy Presentation State (GSPS) objects in PACS. These annotations, while abundant, have traditionally been ignored for training deep learning models because they lack dense segmentation masks. The authors propose to treat these GSPS objects as prompts for a promptable segmentation model, thereby converting routine clinical annotations into high‑quality 3D lesion masks without additional radiologist effort.

To realize OPS, the authors develop SAM2CT, the first promptable segmentation model explicitly designed for CT volumes and GSPS‑style prompts. SAM2CT builds on the Segment Anything Model version 2 (SAM2) architecture but introduces two key innovations: (1) an extended prompt encoder that can ingest arrow and line prompts by learning additional token embeddings for line endpoints, arrow start, and arrow end; and (2) a Memory‑Conditioned Memories (MCM) mechanism that fuses the predicted mask with the memory‑conditioned image embeddings before storing them in the memory bank. This modification makes the stored memories lesion‑specific rather than scene‑generic, which is crucial for volumetric segmentation where consecutive slices must share consistent object information.
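The MCM fusion step can be sketched as follows. This is a minimal numpy illustration assuming a simple multiplicative modulation of the memory-conditioned features by the predicted mask; the paper's actual fusion operator and tensor layout may differ:

```python
import numpy as np

def fuse_memory(cond_embedding, pred_mask, alpha=1.0):
    """Fuse the predicted lesion mask into the memory-conditioned
    image embedding before it is stored in the memory bank, so the
    stored memory describes the lesion rather than the whole scene.

    cond_embedding: (C, H, W) memory-conditioned image features
    pred_mask:      (H, W) predicted lesion probability in [0, 1]
    alpha:          modulation strength (assumed hyperparameter)
    """
    # Up-weight feature locations covered by the predicted lesion;
    # background locations pass through unchanged.
    return cond_embedding * (1.0 + alpha * pred_mask[None, :, :])
```

In this sketch, the fused embedding for slice t would be appended to the memory bank and attended to when segmenting slice t+1, which is how lesion identity propagates through the volume.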

Training data were assembled from five public CT lesion segmentation datasets covering eight tumor types (Kidney, Liver, Colon, Lung, Pancreas, Abdominal Lymph Nodes, Mediastinal Lymph Nodes, and Bone lesions from DeepLesion3D), amounting to 1,662 training volumes. Because public datasets lack GSPS‑like annotations, the authors synthesize arrow and line prompts from ground‑truth masks. Arrow prompts are generated by selecting a centroid‑to‑boundary vector, placing the arrowhead randomly along this line, and adding a random angular perturbation; line prompts are sampled from long edge‑to‑edge connections, with length thresholds to mimic clinical measurement behavior. During training, 8‑slice sub‑volumes are sampled and processed slice‑by‑slice; the model is fine‑tuned for 80 epochs with a base learning rate of 5e‑6 and an image‑encoder learning rate of 3e‑6. Input CT values are clipped to –500 to 500 HU and converted to 8‑bit RGB to match SAM2’s input format.
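The synthesis and preprocessing steps above can be sketched in numpy. The shaft length, perturbation range, and exact boundary-point sampling are assumptions for illustration; the authors' implementation is not reproduced here:

```python
import numpy as np

def synthesize_arrow(mask, rng, max_angle_deg=10.0, shaft_len=20.0):
    """Synthesize an arrow prompt from a ground-truth mask:
    pick a centroid-to-boundary direction, place the arrowhead
    randomly along it, and perturb the shaft angle.
    shaft_len and max_angle_deg are illustrative assumptions."""
    ys, xs = np.nonzero(mask)
    centroid = np.array([ys.mean(), xs.mean()])
    i = rng.integers(len(ys))
    # Approximate "boundary" target with a random foreground pixel.
    target = np.array([ys[i], xs[i]], dtype=float)
    v = target - centroid
    tip = centroid + rng.uniform(0.5, 1.0) * v   # arrowhead on the line
    theta = np.deg2rad(rng.uniform(-max_angle_deg, max_angle_deg))
    rot = np.array([[np.cos(theta), -np.sin(theta)],
                    [np.sin(theta),  np.cos(theta)]])
    unit = v / (np.linalg.norm(v) + 1e-6)
    tail = tip - shaft_len * (rot @ unit)        # perturbed shaft
    return tail, tip

def ct_to_rgb8(hu_slice, lo=-500.0, hi=500.0):
    """Clip a CT slice to the stated HU window and map it to an
    8-bit RGB image, matching SAM2's expected input format."""
    scaled = (np.clip(hu_slice, lo, hi) - lo) / (hi - lo)
    gray8 = (scaled * 255.0).round().astype(np.uint8)
    return np.stack([gray8] * 3, axis=-1)
```

A line prompt would analogously be sampled as a pair of edge-to-edge endpoints on the mask, subject to the length thresholds described above.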

For evaluation, the authors compare SAM2CT against three baselines: (i) SAM2(FT), a fine‑tuned SAM2 that supports arrow/line prompts but lacks MCM; (ii) DynUNet, a fully convolutional nnU‑Net‑derived architecture; and (iii) Swin‑UNETR, a hybrid convolution‑transformer model. Prompt information for the UNet‑based baselines is supplied as an additional binary mask channel. All models are trained for 150 epochs with a cosine‑annealed learning rate schedule.
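Feeding the prompt to the UNet-style baselines as an extra binary channel can be sketched as follows (a simple rasterization by dense point sampling; the baselines' exact rasterizer is not specified in this summary):

```python
import numpy as np

def line_prompt_channel(shape, p0, p1, n=200):
    """Rasterize a line prompt between pixel coordinates p0 and p1
    into a binary channel, to be concatenated with the CT image as
    an additional input channel for the UNet-style baselines."""
    ch = np.zeros(shape, dtype=np.float32)
    t = np.linspace(0.0, 1.0, n)
    ys = np.round(p0[0] + t * (p1[0] - p0[0])).astype(int)
    xs = np.round(p0[1] + t * (p1[1] - p0[1])).astype(int)
    ch[ys, xs] = 1.0
    return ch

# Stack image + prompt into a 2-channel network input.
img = np.zeros((64, 64), dtype=np.float32)
x = np.stack([img, line_prompt_channel(img.shape, (10, 10), (40, 50))])
```

An arrow prompt would be encoded the same way, e.g. as the rasterized shaft from tail to tip.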

Performance is measured using Dice Similarity Coefficient (DSC) and the percent difference in RECIST 1.1 longest axial measurement. On public lesion benchmarks, SAM2CT achieves DSC = 0.649 for arrow prompts and DSC = 0.757 for line prompts, outperforming SAM2(FT) (0.602 / 0.702) and the UNet baselines (≈0.57‑0.69). The MCM component alone contributes a 4‑6 % DSC boost over the non‑MCM SAM2 variant, confirming its value for volumetric consistency.
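The two reported metrics are straightforward to compute; a minimal sketch (the longest-axial-diameter extraction itself is omitted and assumed to yield a length in mm):

```python
import numpy as np

def dice(pred, gt, eps=1e-6):
    """Dice Similarity Coefficient between two binary masks."""
    inter = np.logical_and(pred, gt).sum()
    return 2.0 * inter / (pred.sum() + gt.sum() + eps)

def recist_pct_diff(pred_mm, gt_mm):
    """Percent difference in RECIST 1.1 longest axial diameter
    (predicted vs. ground-truth measurement, in mm)."""
    return 100.0 * abs(pred_mm - gt_mm) / gt_mm
```

For example, a predicted diameter of 44 mm against a ground truth of 40 mm gives a 10 % RECIST difference, within the sub-12 % average reported in the clinical study below.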

The authors also conduct a real‑world clinical study on 60 oncology CT exams extracted from their institution’s PACS, each containing GSPS annotations (20 arrow, 20 single‑line, 20 major‑minor axis). Radiologists rated the automatically generated masks as clinically acceptable or requiring only minor adjustments in 87 % of cases. The average RECIST percent difference was below 12 %, indicating that the derived measurements are close enough for practical use. Additionally, a zero‑shot out‑of‑distribution test on 13 common emergency‑department findings (e.g., abscesses, gallstones) showed DSC values ranging from 0.61 to 0.73, demonstrating that SAM2CT generalizes to unseen pathologies without further fine‑tuning.

The paper’s contributions are threefold: (1) defining the OPS paradigm that turns routine GSPS annotations into a scalable source of 3D segmentation labels; (2) engineering SAM2CT with arrow/line prompt support and the novel MCM memory encoding; and (3) empirically validating the approach on both benchmark datasets and real clinical PACS data, achieving high segmentation quality and clinical acceptability. Limitations include reliance on the quality and consistency of radiologist‑generated GSPS objects, and the current focus on only two prompt types (arrows and lines). Future work could extend the prompt encoder to handle other GSPS shapes (e.g., circles, free‑hand ROI), incorporate automatic quality assessment of prompts, and explore self‑supervised pre‑training on massive unlabeled CT volumes to further boost performance.

In summary, SAM2CT demonstrates that opportunistically harvested radiology annotations can be transformed into high‑fidelity 3D lesion masks at scale, dramatically reducing the bottleneck of manual segmentation and paving the way for larger, more diverse CT datasets to fuel the next generation of medical imaging AI.

