FCDM: A Physics-Guided Bidirectional Frequency Aware Convolution and Diffusion-Based Model for Sinogram Inpainting
Computed tomography (CT) is widely used in scientific imaging systems such as synchrotron and laboratory-based nano-CT, but acquiring full-view sinograms requires high radiation dose and long scan times. Sparse-view CT reduces this burden but produces incomplete sinograms with structured signal loss, degrading reconstruction quality. Unlike RGB images, sinograms encode globally coupled projections and exhibit directional spectral patterns, making conventional RGB-oriented inpainting methods, including diffusion models, ineffective because they ignore angular dependencies and physical constraints inherent to tomographic data. We propose FCDM, a diffusion-based framework for sinogram restoration that incorporates bidirectional frequency reasoning, angular-aware masking, and physics-guided regularization to preserve global structure and physical plausibility. Experiments on real-world datasets show that FCDM consistently outperforms existing baselines, achieving over 0.93 SSIM and 31 dB PSNR across diverse sparse-view settings.
💡 Research Summary
The paper addresses the long‑standing challenge of restoring incomplete sinograms that arise in sparse‑view computed tomography (CT). In sparse‑view CT, the number of projection angles is reduced to lower radiation dose and acquisition time, but this creates structured gaps in the sinogram that lead to severe artifacts after the inverse Radon transform. Conventional inpainting techniques, originally designed for RGB images, assume that missing pixels can be inferred locally from surrounding pixels. This assumption fails for sinograms because each pixel encodes an integral of X‑ray attenuation along a specific ray, making the data globally coupled across detector rows and projection angles. Moreover, existing sinogram‑completion methods (U‑Net, GAN, Transformer) operate primarily in the spatial domain and do not explicitly enforce the physical constraints inherent to CT, such as total X‑ray absorption conservation, nor do they model the highly directional spectral patterns that result from the Radon transform.
To overcome these limitations, the authors propose FCDM (Frequency‑aware Convolution and Diffusion Model), a two‑stage framework that combines a physics‑guided encoder‑decoder with a diffusion‑based latent‑space inpainting process. The first stage introduces a Bidirectional Frequency‑Domain Convolution (BFDC) module. BFDC decomposes the latent feature map into two separate Fourier transforms: one along the detector axis and one along the projection‑angle axis. Each transform is multiplied by a learnable frequency‑domain kernel (K_w for angles, K_h for detectors) and then inverse‑transformed back to the spatial domain. The resulting angle‑wise and detector‑wise frequency‑enhanced features (l_w, l_h) are fused with a conventional spatial convolution output (h_s) using learnable weights α_w and α_h. This design respects the anisotropic nature of sinograms, allowing the network to capture directional spectral cues that are invisible to standard 2‑D convolutions.
In addition to BFDC, the encoder‑decoder is trained with a composite loss that enforces both visual fidelity and physical plausibility. The pixel‑wise MSE loss ensures accurate reconstruction of known regions. An adversarial loss encourages realistic texture. The Total‑Absorption Consistency loss (L_absorp) penalizes deviations between the sum of projection values for each angle and the total attenuation inferred via an inverse Radon transform, thereby guaranteeing that the restored sinogram does not introduce non‑physical intensity drift. The Frequency‑Domain Consistency loss (L_freq) minimizes the L2 distance between the Fourier spectra of the restored and ground‑truth sinograms, suppressing spectral artifacts.
The second stage performs diffusion‑based inpainting directly in the latent space learned in stage 1. Standard diffusion models treat missing pixels as independent binary masks and inject isotropic Gaussian noise at each timestep. FCDM modifies both aspects to suit sinogram geometry. First, a Fourier‑Enhanced Mask Embedding (FEME) encodes the angular extent of each missing region using a Fourier encoding operator Φ(θ). This produces a mask embedding M′(θ) that carries explicit angle information, allowing the diffusion model’s attention mechanism to reason about correlations across views. Second, Frequency‑Adaptive Noise Scheduling (FANS) varies the noise magnitude as a function of frequency and diffusion timestep. Early timesteps preserve low‑frequency components, stabilizing the global structure of the sinogram; later timesteps increase high‑frequency noise to refine fine details. The diffusion model is trained with a standard mean‑squared error loss between the predicted and true latent noise, but the noise itself is modulated by the FANS schedule.
Extensive experiments were conducted on two public datasets: TomoBank (high‑resolution synchrotron CT) and LoDoPaB (clinical‑grade low‑dose CT). The authors evaluated multiple sparse‑view configurations (e.g., 1/8, 1/16, 1/32 of full projections) and compared against strong baselines, including U‑Net, GAN‑based sinogram completion, Transformer‑based methods, and a recent diffusion inpainting model (LaMa). Across all settings, FCDM achieved structural similarity index (SSIM) scores above 0.93 and peak signal‑to‑noise ratio (PSNR) around 31 dB, outperforming the best baseline by 2–4 % in SSIM and 1.5–2.5 dB in PSNR. Ablation studies demonstrated that removing BFDC reduced SSIM by ~2 %, omitting the absorption loss caused noticeable intensity drift in reconstructed CT volumes, and disabling FEME or FANS each led to measurable performance drops, confirming the contribution of every component.
In summary, FCDM introduces a principled way to embed the physics of X‑ray attenuation and the anisotropic frequency structure of sinograms into a modern diffusion framework. By jointly learning frequency‑aware latent representations, enforcing total‑absorption consistency, and adapting both mask encoding and noise scheduling to the sinogram’s geometry, the method delivers high‑fidelity sinogram restoration that translates into markedly improved CT reconstructions. The work opens avenues for further research on real‑time deployment, extension to other acquisition geometries (e.g., cone‑beam CT), and tighter integration of physical models into generative diffusion processes.
Comments & Academic Discussion
Loading comments...
Leave a Comment