Anatomy-Preserving Latent Diffusion for Generation of Brain Segmentation Masks with Ischemic Infarct

Notice: This research summary and analysis were generated automatically using AI technology. For full accuracy, please refer to the original arXiv source.

The scarcity of high-quality segmentation masks remains a major bottleneck for medical image analysis, particularly in non-contrast CT (NCCT) neuroimaging, where manual annotation is costly and variable. To address this limitation, we propose an anatomy-preserving generative framework for the unconditional synthesis of multi-class brain segmentation masks, including ischemic infarcts. The proposed approach combines a variational autoencoder trained exclusively on segmentation masks to learn an anatomical latent representation, with a diffusion model operating in this latent space to generate new samples from pure noise. At inference, synthetic masks are obtained by decoding denoised latent vectors through the frozen VAE decoder, with optional coarse control over lesion presence via a binary prompt. Qualitative results show that the generated masks preserve global brain anatomy, discrete tissue semantics, and realistic variability, while avoiding the structural artifacts commonly observed in pixel-space generative models. Overall, the proposed framework offers a simple and scalable solution for anatomy-aware mask generation in data-scarce medical imaging scenarios.


💡 Research Summary

The paper tackles a critical bottleneck in medical image analysis: the scarcity of high‑quality, multi‑class brain segmentation masks for non‑contrast CT (NCCT) neuroimaging, especially those that include ischemic infarcts. Manual annotation of NCCT is labor‑intensive, subject to inter‑observer variability, and often limited by privacy regulations. To alleviate this shortage, the authors propose an “anatomy‑preserving latent diffusion” framework that generates realistic brain segmentation masks without any conditioning image at inference time, while still allowing coarse control over the presence of lesions via a binary prompt.

The methodology consists of two sequential stages. In the first stage, a mask‑only variational auto‑encoder (MaskVAE) is trained exclusively on one‑hot encoded segmentation masks. The encoder maps each 256×256, 7‑class mask to a 256‑dimensional diagonal Gaussian (mean µ and log‑variance). The decoder reconstructs the mask logits from a sampled latent vector. Training optimizes a categorical cross‑entropy loss (preserving sharp class boundaries) together with a KL‑divergence term weighted by β=0.01, encouraging the latent space to follow a standard normal distribution. Because ischemic lesions are rare, the authors employ a weighted sampling scheme that up‑weights lesion‑containing slices by a factor of five, ensuring the VAE learns pathological configurations despite class imbalance.
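The MaskVAE objective and the lesion up-weighting scheme described above can be sketched as follows. This is a minimal NumPy illustration (the paper does not specify its framework), and the function names are illustrative, not the authors' code:

```python
import numpy as np

def kl_to_standard_normal(mu, logvar):
    """KL divergence of a diagonal Gaussian N(mu, exp(logvar)) from N(0, I),
    summed over latent dimensions and averaged over the batch."""
    return float(np.mean(-0.5 * np.sum(1 + logvar - mu**2 - np.exp(logvar), axis=1)))

def cross_entropy(logits, onehot_target):
    """Categorical cross-entropy over a (B, C, H, W) logit map."""
    logits = logits - logits.max(axis=1, keepdims=True)  # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.sum(onehot_target * log_probs, axis=1)))

def mask_vae_loss(logits, target, mu, logvar, beta=0.01):
    """CE reconstruction term plus beta-weighted KL, as in the MaskVAE objective."""
    return cross_entropy(logits, target) + beta * kl_to_standard_normal(mu, logvar)

def sampling_weights(has_lesion, lesion_factor=5.0):
    """Per-slice sampling weights: lesion-containing slices are up-weighted 5x."""
    return np.where(np.asarray(has_lesion, dtype=bool), lesion_factor, 1.0)
```

In a PyTorch training loop, the per-slice weights returned by `sampling_weights` could feed a `WeightedRandomSampler` to realize the described imbalance correction.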

In the second stage, the trained MaskVAE is frozen and serves as a strong anatomical prior. A diffusion model is then trained in the latent space of the VAE. For each training mask, the latent code z₀ is drawn from the VAE posterior, and a forward diffusion step adds Gaussian noise according to a linear β schedule over T=100 timesteps. Each slice is also assigned a binary variable y∈{0,1} indicating whether any infarct pixels are present. A learnable embedding (16‑dimensional) maps y to a prompt vector p(y), which is concatenated with the noisy latent vector before being fed to a lightweight denoiser network. This simple concatenation suffices for binary control and avoids the overhead of cross‑attention mechanisms. During sampling, a pure Gaussian latent z_T is iteratively denoised conditioned on the desired prompt, producing a latent ẑ that is finally decoded by the frozen VAE decoder into a discrete segmentation mask.
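A minimal sketch of the forward noising step and prompt concatenation, assuming the standard DDPM parameterisation; the paper states a linear β schedule with T=100 but not its endpoints, so the commonly used 1e-4 to 0.02 range is assumed here, and the toy embedding table stands in for the learnable 16-d prompt embedding:

```python
import numpy as np

T = 100
betas = np.linspace(1e-4, 0.02, T)       # linear beta schedule (assumed endpoints)
alpha_bars = np.cumprod(1.0 - betas)     # cumulative signal-retention factors

def forward_diffuse(z0, t, rng):
    """Sample z_t ~ q(z_t | z_0) = N(sqrt(abar_t) * z_0, (1 - abar_t) * I)."""
    eps = rng.standard_normal(z0.shape)
    return np.sqrt(alpha_bars[t]) * z0 + np.sqrt(1.0 - alpha_bars[t]) * eps, eps

def prompt_embedding(y, dim=16, rng=None):
    """Toy stand-in for the learnable 16-d embedding of the binary lesion flag y."""
    table = (rng or np.random.default_rng(0)).standard_normal((2, dim))
    return table[int(y)]

# Denoiser input: noisy latent concatenated with the prompt vector.
rng = np.random.default_rng(42)
z0 = rng.standard_normal(256)            # latent drawn from the VAE posterior
zt, eps = forward_diffuse(z0, t=50, rng=rng)
denoiser_input = np.concatenate([zt, prompt_embedding(1)])
```

The concatenated 272-dimensional vector is what a lightweight denoiser would consume at each step; at inference the loop starts from pure Gaussian noise z_T and repeats the denoising step down to t=0.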

The authors collected a cohort of 86 ischemic stroke patients (≈6,500 axial slices) from a single institution. After a careful slice‑selection protocol that discards cervical, skull‑base, and posterior‑fossa regions, each slice is resized to 256×256 and one‑hot encoded into seven classes (background, CSF, bone, gray matter, white matter, other tissues, and ischemic infarct). The dataset is split patient‑wise into 61 training, 8 validation, and 7 test subjects (5,238 / 660 / 609 slices respectively). No downstream segmentation networks are trained; the focus is purely on unconditional mask synthesis.
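The seven-class one-hot encoding and the derivation of the binary lesion flag per slice can be sketched as follows; the class indices are illustrative, since the paper lists the classes but not their numeric order:

```python
import numpy as np

CLASSES = ["background", "CSF", "bone", "gray matter",
           "white matter", "other tissues", "ischemic infarct"]

def one_hot_mask(label_map, num_classes=7):
    """Turn an (H, W) integer label map into a (C, H, W) one-hot mask."""
    eye = np.eye(num_classes, dtype=np.float32)
    return eye[label_map].transpose(2, 0, 1)

# A tiny 4x4 example slice with a single infarct pixel (class index 6 here).
slice_labels = np.zeros((4, 4), dtype=int)
slice_labels[1, 1] = 6
mask = one_hot_mask(slice_labels)
has_lesion = bool(mask[6].any())   # binary prompt variable y for this slice
```

In the actual pipeline each 256×256 slice would be encoded this way, yielding a (7, 256, 256) input to the MaskVAE.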

Qualitative results demonstrate that the generated masks preserve global brain anatomy, maintain correct inter‑class adjacency (e.g., the skull encloses the brain and CSF surrounds the parenchyma), and exhibit realistic variability across samples. Compared with pixel‑space GANs and diffusion models, the proposed approach eliminates common artifacts such as fragmented regions, broken topology, and label mixing, especially for the rare infarct class. Moreover, the binary prompt reliably toggles lesion presence, enabling balanced data augmentation for downstream tasks.

To promote reproducibility, the authors release a synthetic dataset of 605 multi‑class masks, the pre‑trained MaskVAE and latent diffusion models, generation scripts, and an online interactive demo. This open‑source package allows researchers to instantly augment scarce NCCT datasets, potentially improving the robustness of supervised segmentation or classification models.

In summary, the paper contributes three key innovations: (1) a mask‑only VAE that learns an explicit anatomical latent manifold, (2) latent‑space diffusion that respects this manifold while offering efficient sampling, and (3) a minimal binary prompt that provides controllable lesion generation. The work bridges classic shape modeling with modern diffusion‑based generative techniques, offering a scalable solution for data‑scarce medical imaging scenarios. Future directions include extending the framework to 3‑D volumes, incorporating richer prompts (e.g., lesion size, location), and integrating the synthetic masks into a full image‑to‑image synthesis pipeline (e.g., SPADE‑based NCCT generation) to further enhance data augmentation pipelines.

