FSP-Diff: Full-Spectrum Prior-Enhanced Dual-Domain Latent Diffusion for Ultra-Low-Dose Spectral CT Reconstruction
Spectral computed tomography (CT) with photon-counting detectors holds immense potential for material discrimination and tissue characterization. However, under ultra-low-dose conditions, the sharply degraded signal-to-noise ratio (SNR) in energy-specific projections poses a significant challenge, leading to severe artifacts and loss of structural details in reconstructed images. To address this, we propose FSP-Diff, a full-spectrum prior-enhanced dual-domain latent diffusion framework for ultra-low-dose spectral CT reconstruction. Our framework integrates three core strategies: 1) Complementary Feature Construction: We integrate direct image reconstructions with projection-domain denoised results. While the former preserves latent textural nuances amidst heavy noise, the latter provides a stable structural scaffold to balance detail fidelity and noise suppression. 2) Full-Spectrum Prior Integration: By fusing multi-energy projections into a high-SNR full-spectrum image, we establish a unified structural reference that guides the reconstruction across all energy bins. 3) Efficient Latent Diffusion Synthesis: To alleviate the high computational burden of high-dimensional spectral data, multi-path features are embedded into a compact latent space. This allows the diffusion process to facilitate interactive feature fusion in a lower-dimensional manifold, achieving accelerated reconstruction while maintaining fine-grained detail restoration. Extensive experiments on simulated and real-world datasets demonstrate that FSP-Diff significantly outperforms state-of-the-art methods in both image quality and computational efficiency, underscoring its potential for clinically viable ultra-low-dose spectral CT imaging.
💡 Research Summary
FSP‑Diff introduces a novel dual‑domain latent diffusion framework designed specifically for ultra‑low‑dose spectral computed tomography (CT) reconstruction. The authors first identify the core challenge: photon‑counting detectors split the X‑ray spectrum into narrow energy bins, dramatically reducing photon counts per bin and thus the signal‑to‑noise ratio (SNR) of the energy‑specific projections. Under clinically acceptable dose constraints, this leads to severe noise, artifacts, and loss of fine structural detail in the reconstructed images. Traditional analytical methods such as filtered back‑projection (FBP) and iterative algorithms with total variation or dictionary priors struggle in this regime, while recent deep‑learning approaches (U‑Net, RED‑CNN, attention‑based networks, GANs, etc.) either require massive paired datasets, lack explicit physical modeling, or incur prohibitive computational costs when extended to high‑dimensional spectral data.
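The SNR penalty of energy binning can be illustrated with a small simulation. The sketch below assumes ideal Poisson counting statistics at a single detector pixel and uses hypothetical expected counts per bin (`bin_means`); neither assumption comes from the paper. Under that model the SNR of a count with mean N is about √N, so each narrow bin is noisier than the sum of all bins.

```python
import math
import random
import statistics


def sample_poisson(mean, rng):
    """Draw one Poisson sample with Knuth's method (adequate for small means)."""
    limit = math.exp(-mean)
    count, product = 0, rng.random()
    while product > limit:
        count += 1
        product *= rng.random()
    return count


def empirical_snr(samples):
    """SNR estimated as sample mean divided by sample standard deviation."""
    return statistics.fmean(samples) / statistics.pstdev(samples)


def simulate(bin_means, n_trials=5000, seed=0):
    """Return (per-bin SNRs, SNR of the summed counts) over repeated readings."""
    rng = random.Random(seed)
    per_bin = [[] for _ in bin_means]
    summed = []
    for _ in range(n_trials):
        counts = [sample_poisson(m, rng) for m in bin_means]
        for samples, c in zip(per_bin, counts):
            samples.append(c)
        summed.append(sum(counts))  # all bins added back together
    return [empirical_snr(s) for s in per_bin], empirical_snr(summed)


# Hypothetical expected photon counts per energy bin (illustrative only).
bin_means = [20.0, 30.0, 25.0, 15.0]
bin_snrs, full_snr = simulate(bin_means)

for mean, snr in zip(bin_means, bin_snrs):
    print(f"bin mean {mean:5.1f}: SNR ~ {snr:.2f} (theory {math.sqrt(mean):.2f})")
print(f"all bins summed: SNR ~ {full_snr:.2f} (theory {math.sqrt(sum(bin_means)):.2f})")
```

A real photon-counting detector also suffers from effects such as pulse pile-up and charge sharing, which this toy model ignores.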
To address these gaps, FSP‑Diff integrates three complementary strategies.
- Complementary Feature Construction – The pipeline simultaneously processes (a) direct image‑domain reconstructions, which preserve subtle texture but are vulnerable to noise, and (b) projection‑domain denoised results, which provide a stable structural scaffold derived from the physics of the acquisition. By fusing these two sources, the method balances high‑frequency detail retention with global structural consistency.
- Full‑Spectrum Prior Integration – Multi‑energy projections are summed (or otherwise fused) to generate a high‑SNR full‑spectrum image. This image serves as a universal structural reference that guides the reconstruction of each individual energy bin, effectively transferring the strong low‑frequency information of the full spectrum while allowing each bin to retain its unique high‑frequency spectral signatures.
- Efficient Latent Diffusion Synthesis – Recognizing that diffusion models are computationally intensive in pixel space, the authors embed multi‑path features (image‑domain, projection‑domain, and full‑spectrum prior) into a compact latent space using an encoder‑decoder architecture. The diffusion process then operates on this low‑dimensional manifold, drastically reducing memory and runtime while still enabling iterative denoising and detail restoration. Two domain‑specific diffusion instances are trained: one for the image domain and one for the projection domain. During reverse diffusion, latent representations are exchanged, allowing cross‑domain information sharing and reinforcing inter‑spectral correlations.
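To make the latent-space idea concrete, here is a minimal sketch of the standard DDPM-style forward noising step applied to a short latent vector instead of a full image. The summary does not state which noise schedule, step count, or parameterization FSP‑Diff uses, so the linear beta schedule, the 1000 steps, the latent length of 512, and all function names below are assumptions made for illustration.

```python
import math
import random


def linear_alpha_bars(num_steps, beta_start=1e-4, beta_end=2e-2):
    """Cumulative products of (1 - beta_t) for a linear beta schedule (assumed)."""
    if num_steps < 2:
        raise ValueError("num_steps must be at least 2")
    alpha_bars, running = [], 1.0
    for t in range(num_steps):
        beta = beta_start + (beta_end - beta_start) * t / (num_steps - 1)
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(z0, alpha_bar, rng):
    """Forward step: z_t = sqrt(a) * z_0 + sqrt(1 - a) * eps, eps ~ N(0, I)."""
    noise = [rng.gauss(0.0, 1.0) for _ in z0]
    zt = [math.sqrt(alpha_bar) * z + math.sqrt(1.0 - alpha_bar) * e
          for z, e in zip(z0, noise)]
    return zt, noise


def predict_z0(zt, noise_estimate, alpha_bar):
    """Invert the forward step given an estimate of the injected noise."""
    return [(z - math.sqrt(1.0 - alpha_bar) * e) / math.sqrt(alpha_bar)
            for z, e in zip(zt, noise_estimate)]


rng = random.Random(0)
latent = [rng.uniform(-1.0, 1.0) for _ in range(512)]  # stand-in for an encoded prior
alpha_bars = linear_alpha_bars(num_steps=1000)

noisy, true_noise = add_noise(latent, alpha_bars[499], rng)
# A trained denoiser would estimate the noise; the true noise is reused here
# only to show that the inversion is exact when the estimate is perfect.
recovered = predict_z0(noisy, true_noise, alpha_bars[499])
max_error = max(abs(a - b) for a, b in zip(latent, recovered))
print(f"max reconstruction error with perfect noise estimate: {max_error:.2e}")
```

Each step here touches 512 values, whereas a 512 × 512 image has 262,144 pixels, which is the intuition behind the claimed memory and runtime savings. The cross-domain exchange of latents is not modeled in this sketch.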
The training pipeline consists of (i) pre‑training a latent feature projector (encoder) that compresses 512 × 512 images into a vector of length L (e.g., 512), and a dynamic Transformer‑based decoder (IRTD) that reconstructs high‑quality images from these latents; and (ii) training a latent diffusion model (LDM) to learn the distribution of the encoded representations. The encoder employs stacked residual blocks and linear layers, while the decoder incorporates Modulated Channel‑Transposed Attention (MCTA) and Modulated Gated Feed‑Forward Networks (MGFN) whose parameters are modulated by the latent code, preserving fine details. Input ground‑truth and low‑quality images are weighted and concatenated before a PixelUnshuffle operation to form the Compact IR Prior Encoder (CIPE) input, ensuring balanced value ranges for robust feature extraction.
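The weight-concatenate-unshuffle step that prepares the CIPE input can be sketched in plain Python. The weights `w_gt` and `w_lq`, the downscale factor `r`, and the function names are hypothetical, since the summary does not give them. The channel ordering follows the convention of PyTorch's `PixelUnshuffle`, where output channel `c*r*r + i*r + j` holds the pixels at row offset `i` and column offset `j` of input channel `c`.

```python
def pixel_unshuffle(image, r):
    """Rearrange C x H x W nested lists into (C*r*r) x (H/r) x (W/r)."""
    height, width = len(image[0]), len(image[0][0])
    if height % r or width % r:
        raise ValueError("height and width must be divisible by r")
    out = []
    for channel in image:
        for i in range(r):          # row offset inside each r x r block
            for j in range(r):      # column offset inside each r x r block
                out.append([[channel[h * r + i][w * r + j]
                             for w in range(width // r)]
                            for h in range(height // r)])
    return out


def build_encoder_input(ground_truth, low_quality, w_gt=1.0, w_lq=1.0, r=2):
    """Weight both images, stack them as two channels, then unshuffle.

    w_gt, w_lq and r are illustrative assumptions, not values from the paper.
    """
    weighted = [
        [[w_gt * v for v in row] for row in ground_truth],
        [[w_lq * v for v in row] for row in low_quality],
    ]
    return pixel_unshuffle(weighted, r)


# Tiny 4 x 4 example: the ground truth holds 0..15, the low-quality image is constant.
gt = [[float(4 * row + col) for col in range(4)] for row in range(4)]
lq = [[2.0] * 4 for _ in range(4)]
encoder_input = build_encoder_input(gt, lq, w_gt=1.0, w_lq=0.5, r=2)

print(len(encoder_input), "channels of size",
      len(encoder_input[0]), "x", len(encoder_input[0][0]))
```

The operation trades spatial resolution for channels without discarding any pixel values, so the encoder sees smaller feature maps while the input information is preserved.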
Experimental evaluation uses both simulated phantoms and real clinical datasets at dose levels as low as 10% of the standard dose. Quantitative metrics (PSNR, SSIM, RMSE) show improvements of 3–5 dB in PSNR and 0.05–0.07 in SSIM over state‑of‑the‑art methods such as DDPM, RED‑CNN, and PRISM. Runtime analysis indicates a 30% reduction in reconstruction time, attributed to the latent‑space diffusion. Qualitative results demonstrate superior artifact suppression (e.g., metal streaks) and clearer delineation of low‑contrast structures, which is crucial for material discrimination and tissue characterization.
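For readers who want to relate the reported decibel gains to pixel-level error, the sketch below implements RMSE and PSNR from their standard definitions on flat lists of pixel values. The function names and the toy data are illustrative, the `data_range` argument is an assumption about how intensities are normalized, and SSIM is omitted because it needs windowed local statistics.

```python
import math


def rmse(reference, estimate):
    """Root-mean-square error between two equally sized pixel lists."""
    if len(reference) != len(estimate) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(r - e) ** 2 for r, e in zip(reference, estimate)]
    return math.sqrt(sum(squared) / len(squared))


def psnr(reference, estimate, data_range):
    """Peak signal-to-noise ratio in dB: 20 * log10(data_range / RMSE)."""
    error = rmse(reference, estimate)
    if error == 0.0:
        return math.inf
    return 20.0 * math.log10(data_range / error)


def rmse_ratio_for_gain(gain_db):
    """Factor by which RMSE shrinks for a given PSNR gain at a fixed data range."""
    return 10.0 ** (gain_db / 20.0)


# Toy data: a flat reference and an estimate that is off by 0.1 everywhere.
reference = [0.0, 0.0, 0.0, 0.0]
estimate = [0.1, 0.1, 0.1, 0.1]
toy_psnr = psnr(reference, estimate, data_range=1.0)

print(f"toy PSNR: {toy_psnr:.2f} dB")
for gain in (3.0, 5.0):
    print(f"+{gain:.0f} dB PSNR <=> RMSE divided by {rmse_ratio_for_gain(gain):.2f}")
```

Because PSNR is logarithmic in RMSE, a 3 dB gain corresponds to an RMSE roughly 1.41 times smaller and a 5 dB gain to one roughly 1.78 times smaller, provided the data range is held fixed.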
In conclusion, FSP‑Diff successfully merges a physics‑driven full‑spectrum prior with dual‑domain deep features within an efficient latent diffusion framework, delivering high‑quality, computationally tractable reconstructions for ultra‑low‑dose spectral CT. The authors suggest future work on large‑scale clinical validation, unsupervised prior learning, and hardware acceleration to enable real‑time deployment.