Semantically Conditioned Diffusion Models for Cerebral DSA Synthesis


Digital subtraction angiography (DSA) plays a central role in the diagnosis and treatment of cerebrovascular disease, yet its invasive nature and high acquisition cost severely limit large-scale data collection and public data sharing. We therefore developed a semantically conditioned latent diffusion model (LDM) that synthesizes arterial-phase cerebral DSA frames under explicit control of anatomical circulation (anterior vs. posterior) and canonical C-arm positions. We curated a large single-centre DSA dataset of 99,349 frames and trained a conditional LDM using text embeddings that encoded anatomy and acquisition geometry. To assess clinical realism, four medical experts (two neuroradiologists, one neurosurgeon, and one internal medicine expert) systematically rated 400 synthetic DSA images on a 5-point Likert scale, evaluating proximal (large), medium, and peripheral (small) vessels. The generated images achieved image-wise overall Likert scores ranging from 3.1 to 3.3, with high inter-rater reliability (ICC(2,k) = 0.80–0.87). Distributional similarity to real DSA frames was supported by a low median Fréchet inception distance (FID) of 15.27. Our results indicate that semantically controlled LDMs can produce realistic synthetic DSAs suitable for downstream algorithm development, research, and training.


💡 Research Summary

Digital Subtraction Angiography (DSA) is indispensable for diagnosing and treating cerebrovascular disease, yet its invasive nature, radiation exposure, and the need for intra‑arterial contrast agents make large‑scale data collection difficult. Publicly available DSA datasets are therefore scarce, limiting the development of data‑intensive AI models. In this work, the authors address this gap by building a large single‑center DSA repository and training a semantically conditioned latent diffusion model (LDM) capable of generating high‑fidelity arterial‑phase DSA frames from scratch, with explicit control over anatomical circulation (anterior vs. posterior) and C‑arm view (two canonical planes).

Dataset construction
From the University Medical Center Mannheim’s PACS, 1,104 DSA studies (859 patients) performed between 2015 and 2021 were extracted, yielding 22,064 series. After discarding non‑anterior/posterior series (3,694) and applying a ResNet‑18 classifier trained on a manually annotated subset (953 series) to detect arterial‑phase frames, the authors isolated 99,349 arterial‑phase frames. These comprised 78,253 frames from the anterior circulation (AC) and 21,096 from the posterior circulation (PC). The four most frequent circulation‑plane combinations—AC‑Plane A, AC‑Plane B, PC‑Plane A, PC‑Plane B—account for 45.4 % of the data and were selected as conditioning targets.
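
As a quick sanity check on the reported frame counts, the short sketch below verifies that the anterior and posterior counts add up to the stated total and computes each circulation's share. The constant and function names are illustrative only and do not come from the paper.

```python
# Frame counts as reported in the summary above.
ANTERIOR_FRAMES = 78_253
POSTERIOR_FRAMES = 21_096
TOTAL_FRAMES = 99_349


def share(part: int, whole: int) -> float:
    """Return part/whole as a percentage, rounded to one decimal place."""
    return round(100.0 * part / whole, 1)


# The two circulation counts should add up to the reported total.
assert ANTERIOR_FRAMES + POSTERIOR_FRAMES == TOTAL_FRAMES

anterior_share = share(ANTERIOR_FRAMES, TOTAL_FRAMES)
posterior_share = share(POSTERIOR_FRAMES, TOTAL_FRAMES)
print(f"AC: {anterior_share} %, PC: {posterior_share} %")
```

The split is roughly 79 % anterior to 21 % posterior, so posterior-circulation frames are clearly the minority class in this dataset.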

Semantic conditioning
Four textual prompts, one per combination (e.g., “This is an anterior DSA scan taken in Plane A, with a primary angle of 0° and a secondary angle of 0°.”), were encoded with a BERT‑based text encoder. The resulting embeddings serve as cross‑attention conditioning vectors for the diffusion UNet, so that the specified anatomy and geometry guide the image throughout the denoising process.
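
The prompt construction can be sketched as a small template function. This is a minimal illustration, assuming the other three prompts follow the same wording as the single quoted example. The function name, the article handling ("an" vs. "a"), and the condition keys are assumptions, and the angles for the other conditions are not given in the summary, so the caller must supply them.

```python
from itertools import product

CIRCULATIONS = ("anterior", "posterior")
PLANES = ("A", "B")

# The four conditioning targets as (circulation, plane) pairs.
CONDITIONS = list(product(CIRCULATIONS, PLANES))


def build_prompt(circulation: str, plane: str,
                 primary_deg: int, secondary_deg: int) -> str:
    """Fill the prompt template (hypothetical helper, wording assumed).

    Only the anterior / Plane A / 0 deg / 0 deg prompt is quoted in the
    summary; every other combination is an extrapolation of that wording.
    """
    article = "an" if circulation[0] in "aeiou" else "a"
    return (
        f"This is {article} {circulation} DSA scan taken in Plane {plane}, "
        f"with a primary angle of {primary_deg}° "
        f"and a secondary angle of {secondary_deg}°."
    )


# Reproduces the one example prompt quoted in the summary.
example_prompt = build_prompt("anterior", "A", 0, 0)
print(example_prompt)
```

In a full pipeline, each prompt string would be passed through the text encoder once and the resulting embedding reused for every image generated under that condition.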

Model architecture and training
The LDM follows the Stable Diffusion paradigm: images are first encoded into a 64 × 64 × 4 latent space, and a UNet denoiser is then trained to remove Gaussian noise over 1,000 diffusion steps. Training used 83,614 frames (84.2 %) and validation used 15,735 frames (15.8 %). The AdamW optimizer was used with a cosine learning‑rate schedule, and a conditional loss combined the usual diffusion objective with a text‑image alignment term.
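
To make the diffusion objective concrete, the sketch below noises a latent at a random timestep and computes the standard noise-prediction loss. It is not the authors' training code. The linear beta schedule (1e-4 to 0.02), the channels-first layout, and all names are assumptions, the UNet is omitted, and the text-image alignment term is left out because the summary does not specify it.

```python
import numpy as np

NUM_STEPS = 1000            # diffusion steps, as stated in the summary
LATENT_SHAPE = (4, 64, 64)  # 64 x 64 x 4 latent, stored channels-first here

# ASSUMPTION: linear beta schedule with common DDPM defaults; the schedule
# actually used in the paper is not given in the summary.
betas = np.linspace(1e-4, 0.02, NUM_STEPS)
alphas_cumprod = np.cumprod(1.0 - betas)


def add_noise(x0: np.ndarray, t: int, rng: np.random.Generator):
    """Sample x_t ~ q(x_t | x_0) and return it together with the noise used."""
    noise = rng.standard_normal(x0.shape)
    a_bar = alphas_cumprod[t]
    x_t = np.sqrt(a_bar) * x0 + np.sqrt(1.0 - a_bar) * noise
    return x_t, noise


def denoising_loss(predicted_noise: np.ndarray, true_noise: np.ndarray) -> float:
    """Mean squared error between predicted and true noise."""
    return float(np.mean((predicted_noise - true_noise) ** 2))


rng = np.random.default_rng(0)
latent = rng.standard_normal(LATENT_SHAPE)  # stand-in for an encoded DSA frame
t = int(rng.integers(0, NUM_STEPS))
noisy_latent, noise = add_noise(latent, t, rng)

# A real model would predict the noise from (noisy_latent, t, text embedding);
# an all-zero prediction is used here only to exercise the loss function.
loss = denoising_loss(np.zeros_like(noise), noise)
```

During training, the denoiser's prediction would replace the all-zero placeholder and the loss would be minimized over random timesteps and training latents.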

Synthetic image generation
Ten batches of 40 images were sampled, each batch containing 10 images per condition, yielding 400 synthetic frames (100 per condition). Visual inspection showed that vessel topology and contrast dynamics closely resembled real arterial‑phase DSA.
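
The sampling arithmetic can be checked with a short enumeration. This sketch only builds the list of (batch, condition) sampling requests and does not call any generative model. The condition labels and names are illustrative.

```python
from collections import Counter
from itertools import product

# Illustrative labels for the four conditioning targets.
CONDITIONS = [f"{circ}-Plane {plane}"
              for circ, plane in product(("AC", "PC"), ("A", "B"))]
NUM_BATCHES = 10
IMAGES_PER_CONDITION_PER_BATCH = 10


def sampling_plan():
    """Return one (batch_index, condition) entry per image to generate."""
    plan = []
    for batch_idx in range(NUM_BATCHES):
        for condition in CONDITIONS:
            plan.extend((batch_idx, condition)
                        for _ in range(IMAGES_PER_CONDITION_PER_BATCH))
    return plan


plan = sampling_plan()
per_condition = Counter(condition for _, condition in plan)
per_batch = Counter(batch_idx for batch_idx, _ in plan)
print(len(plan), dict(per_condition))
```

Each batch holds 40 requests and each condition receives 100 images in total, which matches the 400 frames reported.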

Clinical evaluation
Four blinded experts (two neuroradiologists, one neurosurgeon, one internal‑medicine physician) performed a two‑stage assessment. First, a senior neuroradiologist screened the 400 images, retaining 239 (59.8 %) as clinically usable arterial‑phase frames. In the second stage, each expert rated the visibility and delineation of proximal (large), medium, and peripheral (small) vessels on a 5‑point Likert scale (1 = very poor, 5 = excellent). The overall image‑wise mean scores ranged from 3.1 to 3.3, indicating intermediate‑to‑good quality. Proximal vessels received the highest ratings (≈3.5), while peripheral vessels were rated lower (≈2.8–3.0). The posterior‑circulation/Plane A condition performed worst (mean ≈ 2.73), likely reflecting its under‑representation in the training set. Inter‑rater reliability was strong, with ICC(2,k) values between 0.80 and 0.87.
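
For readers unfamiliar with ICC(2,k), the sketch below computes it from a ratings table using the standard two-way random-effects, absolute-agreement, average-measures formula (Shrout and Fleiss). The toy ratings are invented for illustration and are not data from the study. The function name is an assumption, and the authors' statistical software is not specified in the summary.

```python
def icc2k(ratings):
    """ICC(2,k) for a table where ratings[i][j] is rater j's score for image i.

    Uses the two-way ANOVA mean squares:
    (MSR - MSE) / (MSR + (MSC - MSE) / n), with n images and k raters.
    """
    n = len(ratings)
    k = len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    ss_rows = k * sum((m - grand) ** 2 for m in row_means)   # between images
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)   # between raters
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    msr = ss_rows / (n - 1)
    msc = ss_cols / (k - 1)
    mse = ss_error / ((n - 1) * (k - 1))
    return (msr - mse) / (msr + (msc - mse) / n)


# Toy example: 5 images rated by 4 raters on a 1-5 Likert scale (made-up values).
toy_ratings = [
    [4, 4, 3, 4],
    [2, 3, 2, 2],
    [5, 4, 5, 5],
    [3, 3, 3, 4],
    [1, 2, 1, 2],
]
toy_icc = icc2k(toy_ratings)
print(round(toy_icc, 3))
```

Values near 1 mean the averaged ratings are highly reproducible across raters, which is how the reported 0.80 to 0.87 range should be read.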

Distributional similarity
The median Fréchet Inception Distance (FID) between the synthetic set and the real arterial‑phase frames was 15.27, which the authors describe as low, supporting the claim that the generated images occupy a feature distribution similar to that of real data.
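
The Fréchet distance underlying FID can be sketched in a few lines once feature vectors are available. In this minimal illustration, random arrays stand in for Inception features, which is an assumption made purely to keep the example self-contained. The function name is hypothetical, and the trace of the matrix square root is obtained from eigenvalues rather than from a dedicated matrix square-root routine.

```python
import numpy as np


def frechet_distance(feats_a: np.ndarray, feats_b: np.ndarray) -> float:
    """Frechet distance between Gaussians fitted to two feature sets.

    feats_a and feats_b have shape (num_samples, feature_dim).
    """
    mu_a, mu_b = feats_a.mean(axis=0), feats_b.mean(axis=0)
    cov_a = np.cov(feats_a, rowvar=False)
    cov_b = np.cov(feats_b, rowvar=False)

    # Tr(sqrt(cov_a @ cov_b)) equals the sum of the square roots of the
    # eigenvalues of the product, which are real and non-negative in theory.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    trace_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()

    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b)
                 - 2.0 * trace_sqrt)


rng = np.random.default_rng(0)
# PLACEHOLDER features: a real FID would use Inception activations of
# real and synthetic DSA frames instead of random numbers.
real_feats = rng.standard_normal((500, 8))
shifted_feats = real_feats + 3.0  # same covariance, mean shifted by 3 per dim

same_score = frechet_distance(real_feats, real_feats)
shifted_score = frechet_distance(real_feats, shifted_feats)
```

Identical feature sets give a distance of essentially zero, while a pure mean shift contributes only the squared mean difference. How the reported median was aggregated (for example, over which subsets) is not stated in the summary.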

Insights and limitations
The study demonstrates that a semantically conditioned diffusion model can produce realistic, controllable DSA frames, opening avenues for data augmentation, algorithm benchmarking, and educational simulation. Limitations include reliance on a single‑center dataset, reduced quality for sparsely represented conditioning pairs, and the generation of static frames only (no temporal continuity). Future work should incorporate multi‑center data, extend the model to 4‑D (time‑resolved) DSA synthesis, and improve fine‑scale vessel fidelity.

Conclusion
By integrating explicit textual conditioning with a latent diffusion framework, the authors achieve high‑quality, anatomically and geometrically controllable synthetic cerebral DSA images. The approach yields clinically plausible outputs, as validated by expert readers and quantitative metrics, and holds promise for addressing the data scarcity that hampers AI development in neuro‑interventional radiology.

