SegRap2025: A Benchmark of Gross Tumor Volume and Lymph Node Clinical Target Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma


Accurate delineation of Gross Tumor Volume (GTV), Lymph Node Clinical Target Volume (LN CTV), and Organs-at-Risk (OARs) from Computed Tomography (CT) scans is essential for precise radiotherapy planning in Nasopharyngeal Carcinoma (NPC). Building upon SegRap2023, which focused on OAR and GTV segmentation using single-center paired non-contrast CT (ncCT) and contrast-enhanced CT (ceCT) scans, the SegRap2025 challenge aims to enhance the generalizability and robustness of segmentation models across imaging centers and modalities. SegRap2025 comprises two tasks. Task01 addresses GTV segmentation using paired CT from the SegRap2023 dataset, with an additional external testing set to evaluate cross-center generalization. Task02 focuses on LN CTV segmentation using multi-center training data and an unseen external testing set, where each case contains either paired CT scans or a single modality, emphasizing both cross-center and cross-modality robustness. This paper presents the challenge setup and provides a comprehensive analysis of the solutions submitted by ten participating teams. For the GTV segmentation task, the top-performing models achieved an average Dice Similarity Coefficient (DSC) of 74.61% and 56.79% on the internal and external testing cohorts, respectively. For the LN CTV segmentation task, the highest average DSC values reached 60.24%, 60.50%, and 57.23% on the paired CT, ceCT-only, and ncCT-only subsets, respectively. SegRap2025 establishes a large-scale multi-center, multi-modality benchmark for evaluating generalization and robustness in radiotherapy target segmentation, providing valuable insights toward clinically applicable automated radiotherapy planning systems. The benchmark is available at: https://hilab-git.github.io/SegRap2025_Challenge.


💡 Research Summary

The paper introduces the SegRap2025 challenge, a large‑scale benchmark designed to evaluate the generalization and modality robustness of automated segmentation algorithms for nasopharyngeal carcinoma (NPC) radiotherapy planning. Building on the earlier SegRap2023 dataset, which provided paired non‑contrast CT (ncCT) and contrast‑enhanced CT (ceCT) scans from a single institution, SegRap2025 expands the scope in two ways: (1) it adds an external test set from an unseen imaging center to assess cross‑center performance, and (2) it introduces a new task focused on lymph‑node clinical target volume (LN CTV) segmentation using multi‑center data that include both paired and single‑modality scans.

Task 01 addresses Gross Tumor Volume (GTV) segmentation. Training data consist of 120 patients (240 scans) with paired ncCT/ceCT and 500 unlabeled scans for semi‑supervised research. Validation uses 20 patients, internal testing uses 60 patients from the original center, and an external test set of 60 patients from DHCJ evaluates out‑of‑domain performance. Task 02 targets six LN CTV levels (left/right Ib, II+III+Va, IV+Vb+Vc). Training data are gathered from four institutions (total 260 patients, 520 scans) with a mixture of paired, ceCT‑only, and ncCT‑only cases; validation includes 20 patients, and testing uses 50 patients from an unseen center, again with all three modality configurations.

The datasets are heterogeneous in acquisition: they vary in scanner vendor (Siemens, Philips), slice thickness (2.5–3 mm), in‑plane matrix size (512×512 to 1024×1024), and tube current (200–380 mA), at a tube voltage of 120 kV. Expert radiation oncologists provided full annotations for the primary GTV (GTVp), the nodal GTV (GTVnd), and the six LN CTV levels. The 500 unlabeled scans are released to encourage data‑efficient methods such as semi‑supervised learning or domain adaptation.

Participants were allowed to use publicly available foundation models (e.g., pretrained ResNet, Swin‑Transformer backbones) and any open‑source segmentation architecture (UNet++, nnU‑Net, transformer‑based networks). Data augmentation (intensity scaling, random cropping, modality swapping) and domain‑adaptation techniques (entropy minimization, style transfer) were widely employed. External training data were prohibited to ensure a fair comparison. Submissions were containerized via Docker and evaluated on a hidden test server.
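Entropy minimization, one of the domain‑adaptation techniques named above, can be illustrated with a minimal NumPy sketch. It computes the mean Shannon entropy of per‑voxel softmax predictions, the quantity such methods typically minimize on unlabeled target‑domain scans. The function names, the class‑first array layout, and the `eps` constant are assumptions made for illustration; this is not taken from any participating team's code.

```python
import numpy as np


def softmax(logits: np.ndarray, axis: int = 0) -> np.ndarray:
    """Numerically stable softmax along the class axis."""
    shifted = logits - logits.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=axis, keepdims=True)


def mean_prediction_entropy(logits: np.ndarray, axis: int = 0,
                            eps: float = 1e-12) -> float:
    """Mean per-voxel Shannon entropy of the softmax predictions.

    `logits` is assumed to have shape (num_classes, *spatial_dims).
    Lower values mean more confident predictions.
    """
    probs = softmax(logits, axis=axis)
    entropy = -(probs * np.log(probs + eps)).sum(axis=axis)
    return float(entropy.mean())


# Toy two-class volumes: one maximally uncertain, one confident.
uniform_logits = np.zeros((2, 4, 4))
confident_logits = np.stack([np.full((4, 4), 10.0), np.full((4, 4), -10.0)])

print(mean_prediction_entropy(uniform_logits))    # close to ln(2)
print(mean_prediction_entropy(confident_logits))  # close to 0
```

In a training loop this value would usually be added, with a small weight, to the supervised loss computed on labeled source‑domain scans.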

Performance was primarily measured by Dice Similarity Coefficient (DSC), complemented by Hausdorff Distance (HD) and Average Surface Distance (ASD). For GTV, the best average DSC was 74.61 % on the internal test set but only 56.79 % on the external set, a gap of 17.82 percentage points that highlights the challenge of cross‑center generalization. LN CTV results were more balanced across modalities: the best average DSC was 60.24 % on paired CT, 60.50 % on ceCT‑only, and 57.23 % on ncCT‑only cases. Small structures such as the Ib level showed larger HD values, indicating persistent difficulty in accurately delineating tiny targets.
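For reference, the DSC between a predicted mask P and a ground‑truth mask T is 2|P ∩ T| / (|P| + |T|). The sketch below computes it for binary NumPy masks. The function name and the convention of returning 1.0 when both masks are empty are assumptions for illustration; the challenge's official evaluation code may handle edge cases differently.

```python
import numpy as np


def dice_coefficient(pred: np.ndarray, target: np.ndarray) -> float:
    """DSC = 2|P ∩ T| / (|P| + |T|) for two binary masks of equal shape."""
    pred = pred.astype(bool)
    target = target.astype(bool)
    if pred.shape != target.shape:
        raise ValueError("pred and target must have the same shape")
    denom = pred.sum() + target.sum()
    if denom == 0:
        # Both masks empty: treated as perfect agreement (assumed convention).
        return 1.0
    intersection = np.logical_and(pred, target).sum()
    return float(2.0 * intersection / denom)


# Toy 3D example: an 8-voxel target and a 12-voxel prediction that contains it.
target = np.zeros((4, 4, 4), dtype=bool)
target[1:3, 1:3, 1:3] = True
pred = np.zeros((4, 4, 4), dtype=bool)
pred[1:3, 1:3, 1:4] = True

print(dice_coefficient(pred, target))  # 2*8 / (12 + 8) = 0.8
```

For a multi‑structure task such as the six LN CTV levels, the same function would be applied per label and the results averaged.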

Analysis of the ten participating teams revealed that models leveraging large pretrained backbones generally attained higher DSCs, yet incurred higher computational cost and inference latency. Multi‑stream or modality‑agnostic designs proved beneficial for handling missing‑modality scenarios. The results underscore three key insights: (1) without multi‑center, multi‑modality training data, models suffer significant performance degradation on unseen domains; (2) robust segmentation in clinical practice requires architectures that can operate reliably on single‑modality inputs; (3) unlabeled data and semi‑supervised or domain‑adaptation strategies hold promise for bridging the performance gap.
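One simple way to let a two‑channel network accept single‑modality cases is to fill the missing channel with a copy of the available scan. The sketch below shows that idea only; it is a hypothetical baseline, not the design used by any specific team, and the function name and channel order (ncCT first, ceCT second) are assumptions.

```python
from typing import Optional

import numpy as np


def build_two_channel_input(ncct: Optional[np.ndarray],
                            cect: Optional[np.ndarray]) -> np.ndarray:
    """Stack ncCT and ceCT volumes into a (2, *spatial_dims) array.

    If one modality is missing (None), the available one is duplicated
    so the network always receives the same input shape.
    """
    if ncct is None and cect is None:
        raise ValueError("at least one modality is required")
    if ncct is None:
        ncct = cect
    if cect is None:
        cect = ncct
    if ncct.shape != cect.shape:
        raise ValueError("paired volumes must share the same shape")
    return np.stack([ncct, cect], axis=0)


# Toy volumes standing in for resampled, intensity-normalized CT scans.
rng = np.random.default_rng(0)
ncct_volume = rng.normal(size=(8, 16, 16))
cect_volume = rng.normal(size=(8, 16, 16))

paired = build_two_channel_input(ncct_volume, cect_volume)
cect_only = build_two_channel_input(None, cect_volume)
print(paired.shape, cect_only.shape)  # (2, 8, 16, 16) (2, 8, 16, 16)
```

Applying the same substitution at random during training on paired cases exposes the model to the single‑modality inputs it will see at test time.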

The authors conclude that SegRap2025 provides a realistic testbed for advancing NPC radiotherapy automation. Future work should focus on integrating semi‑supervised learning, sophisticated domain adaptation, and hybrid networks that capture both high‑resolution local details and global context, especially for small LN CTV regions. By fostering collaboration across institutions and encouraging data‑efficient methods, the benchmark aims to accelerate the development of clinically trustworthy, generalizable segmentation tools for NPC radiotherapy planning.

