SRA-Seg: Synthetic to Real Alignment for Semi-Supervised Medical Image Segmentation
Synthetic data, an appealing alternative to extensive expert-annotated data for medical image segmentation, consistently fails to improve segmentation performance despite its visual realism. The reason is that synthetic and real medical images occupy different semantic feature spaces, creating a domain gap that current semi-supervised learning methods cannot bridge. We propose SRA-Seg, a framework explicitly designed to align synthetic and real feature distributions for medical image segmentation. SRA-Seg introduces a similarity-alignment (SA) loss using frozen DINOv2 embeddings to pull synthetic representations toward their nearest real counterparts in semantic space. We employ soft edge blending to create smooth anatomical transitions and continuous labels, eliminating the hard boundaries from traditional copy-paste augmentation. The framework generates pseudo-labels for synthetic images via an EMA teacher model and applies soft-segmentation losses that respect uncertainty in mixed regions. Our experiments demonstrate strong results: using only 10% labeled real data and 90% synthetic unlabeled data, SRA-Seg achieves 89.34% Dice on ACDC and 84.42% on FIVES, significantly outperforming existing semi-supervised methods and matching the performance of methods using real unlabeled data.
💡 Research Summary
The paper tackles the chronic shortage of expertly annotated medical images by proposing SRA‑Seg, a semi‑supervised segmentation framework that can effectively leverage large amounts of synthetic data. Recognizing that synthetic and real images occupy distinct semantic feature spaces, the authors introduce a similarity‑alignment (SA) loss that uses frozen DINOv2 ViT‑B/16 embeddings to pull each synthetic sample toward its nearest real counterpart in feature space, directly reducing the domain gap. Synthetic images are generated with StyleGAN2‑ADA, which adapts discriminator augmentations to avoid over‑fitting on limited real data, ensuring high‑quality, diverse samples.
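The SA loss described above, the mean of minimum Euclidean distances from each synthetic embedding to its nearest real embedding, can be sketched in a few lines. This is a minimal NumPy illustration assuming the frozen-encoder features are already extracted as row vectors; the function name is ours, not from the paper's code.

```python
import numpy as np

def similarity_alignment_loss(syn_emb, real_emb):
    """Mean of minimum Euclidean distances from each synthetic embedding
    to its nearest real embedding (illustrative sketch of the SA loss).

    syn_emb:  (Ns, D) array of frozen-encoder features for synthetic images
    real_emb: (Nr, D) array of features for real images
    """
    # Pairwise Euclidean distances, shape (Ns, Nr)
    diffs = syn_emb[:, None, :] - real_emb[None, :, :]
    dists = np.sqrt((diffs ** 2).sum(axis=-1))
    # For each synthetic sample, distance to its nearest real neighbor
    return dists.min(axis=1).mean()

# Toy example: two synthetic points, three real points in 2-D
syn = np.array([[0.0, 0.0], [3.0, 4.0]])
real = np.array([[0.0, 1.0], [3.0, 0.0], [10.0, 10.0]])
loss = similarity_alignment_loss(syn, real)
# Nearest-neighbor distances are 1.0 and 4.0, so the loss is 2.5
```

In the full framework this quantity would be computed on DINOv2 ViT-B/16 embeddings within a batch, with gradients flowing only into the segmentation network.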
A teacher network, maintained as an exponential moving average (EMA) of the student, provides pseudo‑labels for synthetic images. After softmax conversion, hard class assignments are cleaned with a connected‑component filter, yielding one‑hot pseudo‑labels; uncertainty in mixed regions is preserved downstream by the soft blending and soft‑segmentation losses.
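The teacher pipeline above has two mechanical pieces: the EMA parameter update and the connected-component cleanup of the argmax prediction. The sketch below assumes 2-D single-image logits, parameters stored as NumPy arrays, and a "keep the largest 4-connected component per foreground class" filter; the exact filtering rule and decay value are our assumptions, not the paper's.

```python
import numpy as np
from collections import deque

def ema_update(teacher, student, decay=0.99):
    """EMA teacher update: teacher <- decay*teacher + (1-decay)*student.
    Parameters are dicts of numpy arrays (illustrative stand-in for a network)."""
    for k in teacher:
        teacher[k] = decay * teacher[k] + (1.0 - decay) * student[k]
    return teacher

def largest_component(mask):
    """Keep only the largest 4-connected component of a boolean mask (BFS)."""
    h, w = mask.shape
    seen = np.zeros_like(mask, dtype=bool)
    best = np.zeros_like(mask, dtype=bool)
    for i in range(h):
        for j in range(w):
            if mask[i, j] and not seen[i, j]:
                comp, q = [], deque([(i, j)])
                seen[i, j] = True
                while q:
                    y, x = q.popleft()
                    comp.append((y, x))
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            q.append((ny, nx))
                if len(comp) > best.sum():
                    best = np.zeros_like(mask, dtype=bool)
                    for y, x in comp:
                        best[y, x] = True
    return best

def pseudo_label(logits):
    """Teacher logits (C, H, W) -> one-hot pseudo-label after keeping the
    largest connected component per foreground class (class 0 = background)."""
    shifted = np.exp(logits - logits.max(0))
    probs = shifted / shifted.sum(0)          # softmax over classes
    hard = probs.argmax(0)                    # hard class assignments
    out = np.zeros_like(probs)
    for c in range(1, logits.shape[0]):
        out[c] = largest_component(hard == c)
    out[0] = out[1:].sum(0) == 0              # remainder is background
    return out
```

For real-sized volumes one would use a library routine (e.g. connected-component labeling from an image-processing package) rather than the BFS above, which is written out only to keep the sketch self-contained.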
To avoid the harsh boundaries created by traditional copy‑paste augmentation (e.g., BCP), SRA‑Seg employs soft‑mix blending. A rectangular mask is blurred with a 3×3 average pool, and the labeled real image and synthetic image (and their respective labels) are combined using a spatially varying weight α, producing two complementary blended pairs. This results in smooth anatomical transitions that better reflect real tissue continuity.
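The soft-mix step can be made concrete with a small sketch: build a binary rectangular mask, soften its edges with a 3×3 average pool, then use the result as the spatially varying weight α for both complementary image/label blends. This assumes 2-D single-channel arrays and zero padding in the blur; those details are our choices for illustration.

```python
import numpy as np

def blur3x3(mask):
    """3x3 average pooling with zero padding, stride 1 (softens mask edges)."""
    p = np.pad(mask, 1)
    out = np.zeros_like(mask, dtype=float)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            out += p[1 + dy:1 + dy + mask.shape[0], 1 + dx:1 + dx + mask.shape[1]]
    return out / 9.0

def soft_mix(real_img, syn_img, real_lbl, syn_lbl, box):
    """Blend a rectangular region with softened edges.

    box = (y0, y1, x0, x1); alpha is ~1 inside the box, ~0 outside,
    and fractional along the blurred boundary. Returns the two
    complementary blended image/label pairs.
    """
    y0, y1, x0, x1 = box
    m = np.zeros(real_img.shape, dtype=float)
    m[y0:y1, x0:x1] = 1.0
    alpha = blur3x3(m)                                  # spatially varying weight
    img_a = alpha * syn_img + (1 - alpha) * real_img
    lbl_a = alpha * syn_lbl + (1 - alpha) * real_lbl    # continuous labels
    img_b = alpha * real_img + (1 - alpha) * syn_img
    lbl_b = alpha * real_lbl + (1 - alpha) * syn_lbl
    return (img_a, lbl_a), (img_b, lbl_b)
```

The key difference from hard copy-paste (as in BCP) is that α takes fractional values in a one-pixel band around the rectangle, so both the pixels and the labels transition smoothly rather than jumping at the cut boundary.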
Training optimizes a composite loss: (1) Soft‑Segmentation loss, consisting of a soft Dice term and a soft cross‑entropy term computed directly on continuous probability maps, and (2) the SA loss, the mean of minimum Euclidean distances between synthetic and real DINOv2 embeddings. The total loss L = L_soft + λ·L_SA balances pixel‑level accuracy with feature‑space alignment; gradients flow only through the segmentation network, leaving the frozen ViT untouched.
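The composite objective can be sketched directly on continuous probability maps. The equal Dice/CE weighting inside L_soft and the value of λ are our illustrative assumptions; the SA term would be the embedding-space loss described earlier, passed in as a scalar here.

```python
import numpy as np

def soft_dice(pred, target, eps=1e-6):
    """Soft Dice loss on continuous probability maps of shape (C, H, W),
    averaged over classes."""
    inter = (pred * target).sum(axis=(1, 2))
    denom = pred.sum(axis=(1, 2)) + target.sum(axis=(1, 2))
    return (1.0 - (2.0 * inter + eps) / (denom + eps)).mean()

def soft_ce(pred, target, eps=1e-9):
    """Soft cross-entropy: -sum_c target_c * log(pred_c), averaged over pixels.
    Works with fractional (blended) targets, not just one-hot labels."""
    return -(target * np.log(pred + eps)).sum(axis=0).mean()

def total_loss(pred, target, sa_loss, lam=0.1):
    """L = L_soft + lambda * L_SA, with L_soft = Dice + CE.
    Equal Dice/CE weighting and lam=0.1 are illustrative choices."""
    return soft_dice(pred, target) + soft_ce(pred, target) + lam * sa_loss
```

Because both terms of L_soft accept fractional targets, the blended labels from the soft-mix step can be supervised directly, without rounding mixed-region pixels to a single class.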
Experiments on the ACDC and FIVES datasets use only 10% of the data as labeled real images and 90% synthetic images as unlabeled data. SRA‑Seg achieves 89.34% Dice on ACDC and 84.42% Dice on FIVES, outperforming state‑of‑the‑art semi‑supervised methods (Mean Teacher, UA‑MT, etc.) and matching the performance of methods that rely on real unlabeled data. The results demonstrate that, when properly aligned, synthetic data can be as valuable as real data for medical image segmentation. The authors also release code, facilitating reproducibility and encouraging broader adoption of synthetic‑data‑driven semi‑supervised learning in clinical imaging.