Towards reconstructing experimental sparse-view X-ray CT data with diffusion models

Notice: This research summary and analysis were generated automatically using AI technology. For accuracy, please refer to the original arXiv source.

Diffusion-based image generators are promising priors for ill-posed inverse problems like sparse-view X-ray Computed Tomography (CT). As most studies consider synthetic data, it is not clear whether training data mismatch ("domain shift") or forward model mismatch complicates their successful application to experimental data. We measured CT data from a physical phantom resembling the synthetic Shepp-Logan phantom and trained diffusion priors on synthetic image data sets with different degrees of domain shift towards it. Then, we employed the priors in a Decomposed Diffusion Sampling scheme on sparse-view CT data sets with increasing difficulty leading to the experimental data. Our results reveal that domain shift plays a nuanced role: while severe mismatch causes model collapse and hallucinations, diverse priors outperform well-matched but narrow priors. Forward model mismatch pulls the image samples away from the prior manifold, which causes artifacts but can be mitigated with annealed likelihood schedules that also increase computational efficiency. Overall, we demonstrate that performance gains do not immediately translate from synthetic to experimental data, and future development must validate against real-world benchmarks.


💡 Research Summary

This paper investigates the practical deployment of diffusion‑based generative priors for sparse‑view X‑ray computed tomography (CT) when moving from synthetic training data to real experimental measurements. The authors first fabricate a physical phantom that mimics the classic Shepp‑Logan (SL) phantom by laser‑cutting ellipses into a 6 mm thick PMMA plate, filling most regions with epoxy and leaving two air cavities. Using a custom laboratory cone‑beam CT scanner (70 kVp, 600 µA, 30 ms exposure) they acquire 901 equally spaced fan‑beam projections (901 × 478 sinogram) of the phantom, providing a realistic experimental dataset (y_exp).

To study domain shift, three synthetic training sets are generated: (i) X_std – standard SL parameters perturbed with Gaussian noise; (ii) X_exp – the exact ellipse parameters of the physical phantom perturbed similarly; (iii) X_mix – a mixture of the two, controlled by a Bernoulli probability π, yielding a broader distribution. Each set contains 10 000 128 × 128 images. Three diffusion models are trained on these datasets (f_std, f_exp, f_mix) using a cosine noise schedule and the same UNet architecture as in guided‑diffusion.
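The mixed training set X_mix can be sketched as a simple Bernoulli mixture over the two ellipse-parameter sets. The function name, parameter layout, and perturbation scale `sigma` below are illustrative assumptions; the paper only specifies a mixture controlled by a Bernoulli probability π with Gaussian perturbations.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_mixed_ellipses(std_params, exp_params, pi=0.5, sigma=0.02):
    """Draw one training sample's ellipse parameters from the mixed
    distribution X_mix: with probability pi use the standard Shepp-Logan
    parameters, otherwise the physical-phantom parameters, then perturb
    with Gaussian noise. `sigma` is a hypothetical perturbation scale."""
    base = std_params if rng.random() < pi else exp_params
    return base + sigma * rng.standard_normal(base.shape)
```

Drawing 10 000 such parameter vectors and rasterizing each one as a 128 × 128 ellipse image would yield a training set in the spirit of X_mix.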

Reconstruction is performed with the Decomposed Diffusion Sampler (DDS). DDS replaces the costly Jacobian term of standard diffusion‑posterior samplers with a CG‑based projection onto the clean image manifold after each denoising step. The forward operator A is a fan‑beam projector matching the experimental geometry. The data‑consistency loss ℓ(x)=½‖y−Ax‖² supplies a likelihood gradient Aᵀ(y−Ax). Crucially, the authors introduce an annealed likelihood schedule γ_t: early diffusion steps rely heavily on the prior, while later steps gradually increase the weight of the likelihood term. They compare constant γ (0.5 or 5) with linearly decaying schedules (γ_max = 5 or 50).
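The CG-based data-consistency step can be sketched as follows. This is not the authors' code: it assumes the projection solves the regularized normal equations (I + γAᵀA)x = x̂₀ + γAᵀy around the denoised estimate x̂₀, with A given here as a dense matrix for illustration rather than a fan-beam projector.

```python
import numpy as np

def dds_data_consistency(x0_hat, y, A, gamma, n_cg=5):
    """Approximate DDS-style data-consistency projection: starting from
    the denoised estimate x0_hat, run a few conjugate-gradient
    iterations on (I + gamma * A^T A) x = x0_hat + gamma * A^T y,
    trading the costly Jacobian of posterior sampling for a cheap solve."""
    M = np.eye(A.shape[1]) + gamma * (A.T @ A)  # SPD system matrix
    b = x0_hat + gamma * (A.T @ y)
    x = x0_hat.copy()
    r = b - M @ x
    p = r.copy()
    for _ in range(n_cg):
        Mp = M @ p
        alpha = (r @ r) / (p @ Mp)
        x = x + alpha * p
        r_new = r - alpha * Mp
        if np.linalg.norm(r_new) < 1e-12:  # converged early
            break
        beta = (r_new @ r_new) / (r @ r)
        p = r_new + beta * p
        r = r_new
    return x
```

In the full sampler this step would follow each denoising step, with γ supplied by the (constant or annealed) likelihood schedule.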

Four test domains are considered: (a) y_sim(std) – synthetic projections of the canonical SL image; (b) y_sim(cad) – synthetic projections of the CAD model used to cut the physical phantom (lower contrast, same geometry); (c) y_sim(recon) – projections of a full‑view reconstruction of the physical phantom; (d) y_exp – the real measured data.

Results on PSNR and SSIM reveal two main findings. First, the training distribution matters: f_std excels only on y_sim(std) and collapses on the other domains; f_exp performs well on y_sim(cad) but poorly on y_sim(std). The mixed prior f_mix consistently outperforms the narrow priors across all domains, especially when the number of views exceeds five, achieving up to 5 dB PSNR gain over the domain‑specific model. Visual inspection confirms that f_mix reconstructs edges and fine structures without the hallucinations seen with mismatched priors. Second, forward‑model mismatch between the simulated fan‑beam projector and the true experimental system degrades performance. Line‑profile analysis shows spatial shifts of small holes in the phantom when using y_exp, indicating that the likelihood term forces the sampler away from the clean manifold. Increasing reconstruction resolution (128 → 512) narrows but does not eliminate the gap, highlighting persistent errors from beam hardening, scatter, and geometric mis‑calibration.

The annealed likelihood schedule proves effective: a linear decay with γ_max = 5 yields the highest SSIM (≈0.85) and PSNR (≈24 dB) even with only 10 sampling steps, outperforming constant‑γ settings. This schedule mitigates the adverse impact of model mismatch while dramatically reducing computational cost.
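A minimal sketch of such a schedule, under the assumption that the likelihood weight grows linearly from zero (prior-dominated early steps) to γ_max by the final sampling step; the paper specifies only a linear schedule with γ_max = 5 or 50, so the exact endpoints here are illustrative.

```python
def annealed_gamma(step, n_steps, gamma_max=5.0):
    """Likelihood weight for sampling step `step` (0-indexed, running
    from pure noise toward the final image). Early steps lean on the
    prior (gamma near 0); the weight grows linearly to gamma_max."""
    return gamma_max * step / max(n_steps - 1, 1)
```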

In summary, the paper demonstrates that (1) severe domain shift can cause diffusion priors to collapse, but a diversified training set (mixed prior) provides robustness to geometric variations; (2) forward‑model inaccuracies pull reconstructions away from the prior manifold, yet an annealed likelihood weighting can reconcile the two forces and improve both quality and efficiency; and (3) performance gains observed on synthetic benchmarks do not automatically transfer to real CT data, underscoring the need for realistic experimental validation and more accurate physical modeling in future work.

