Improving 2D Diffusion Models for 3D Medical Imaging with Inter-Slice Consistent Stochasticity

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

3D medical imaging is in high demand and essential for clinical diagnosis and scientific research. Diffusion models (DMs) have become an effective tool for medical image reconstruction thanks to their ability to learn rich, high-quality data priors. However, learning the 3D data distribution with DMs in medical imaging is challenging, not only because data collection is difficult but also because model training carries a significant computational burden. A common compromise is to train the DMs on 2D data priors and reconstruct stacked 2D slices to address 3D medical inverse problems. However, the intrinsic randomness of diffusion sampling causes severe inter-slice discontinuities in the reconstructed 3D volumes. Existing methods often enforce continuity regularization along the z-axis, which introduces sensitive hyper-parameters and may lead to over-smoothed results. In this work, we revisit the origin of stochasticity in diffusion sampling and introduce Inter-Slice Consistent Stochasticity (ISCS), a simple yet effective strategy that encourages inter-slice consistency during diffusion sampling. Our key idea is to control the consistency of stochastic noise components during diffusion sampling, thereby aligning their sampling trajectories without adding any new loss terms or optimization steps. Importantly, the proposed ISCS is plug-and-play and can be dropped into any 3D reconstruction pipeline based on 2D-trained diffusion models without additional computational cost. Experiments on several medical imaging problems show that our method can effectively improve the performance of 3D medical imaging based on 2D diffusion models. Our findings suggest that controlling inter-slice stochasticity is a principled and practically attractive route toward high-fidelity 3D medical imaging with 2D diffusion priors. The code is available at: https://github.com/duchenhe/ISCS

💡 Research Summary

The paper tackles a fundamental obstacle that arises when using powerful 2‑D diffusion models (DMs) for reconstructing 3‑D medical images: the lack of inter‑slice consistency. Because a 2‑D DM is applied slice‑by‑slice, each slice follows an independent stochastic trajectory during the reverse diffusion process. The random Gaussian noise injected at every timestep (the “ε‑term”) is sampled independently for each slice, which, combined with the weak data‑consistency constraints typical of ill‑posed inverse problems (e.g., sparse‑view CT, undersampled MRI), leads to large, uncontrolled variations between adjacent slices. The resulting 3‑D volume exhibits conspicuous discontinuities along the z‑axis, degrading diagnostic quality.

Existing remedies either add post‑hoc regularizers such as total variation (TV) that require careful hyper‑parameter tuning and risk over‑smoothing, or they attempt to learn richer 3‑D priors (3‑D patches, multi‑plane fusion) at the cost of substantial computational and data demands. A recent idea from video restoration, Batch‑Consistent Sampling (BCS), forces the same noise across all frames, improving temporal coherence. However, directly applying BCS to medical volumes is too restrictive: a typical CT or MRI scan contains hundreds of slices with significant anatomical variation, and identical noise would suppress genuine structural changes.

Key Insight
The authors identify the root cause of inter‑slice inconsistency as uncoordinated stochasticity in the re‑noising step of diffusion sampling. By controlling the correlation of the noise across slices, one can align the sampling trajectories and thereby enforce volumetric coherence without altering the learned prior or adding new loss terms.
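For orientation, the re‑noising step can be written in the widely used DDIM form. The notation below is the standard one from the DDIM literature and is an assumption here; the paper's own equation may use different symbols. The slice index i is added only to make explicit where the per‑slice noise enters.

```latex
% Standard DDIM update for slice i (notation assumed, not taken from the paper)
x_{t-1}^{(i)} = \sqrt{\bar\alpha_{t-1}}\,\hat{x}_0^{(i)}
  + \sqrt{1-\bar\alpha_{t-1}-\sigma_t^2}\;\hat\epsilon_\theta\!\left(x_t^{(i)}, t\right)
  + \sigma_t\,\epsilon^{(i)},
\qquad
\sigma_t = \eta\,\sqrt{\frac{1-\bar\alpha_{t-1}}{1-\bar\alpha_t}}\,
           \sqrt{1-\frac{\bar\alpha_t}{\bar\alpha_{t-1}}}
```

In slice‑by‑slice sampling the ε^{(i)} are drawn independently for every i; the last term is therefore the only place where uncoordinated randomness enters, and it is the term that a correlated noise field would replace.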

Proposed Method – Inter‑Slice Consistent Stochasticity (ISCS)
ISCS replaces the independent Gaussian noise ε_i for each slice i with a smoothly varying noise field ε_ISCS(i) generated via spherical linear interpolation (Slerp). Concretely, at each diffusion timestep t:

Sample two independent Gaussian vectors ε_start and ε_end (both ∈ ℝ^{H×W}).
For slice i (i = 0,…,S‑1) compute a mixing coefficient λ_i = i/(S‑1).
Obtain ε_ISCS(i) = Slerp(ε_start, ε_end, λ_i), i.e. the unit‑norm interpolation on the hypersphere followed by scaling to maintain the standard normal distribution.

Because Slerp interpolates along the hypersphere rather than through its interior, the interpolated noise keeps essentially the same norm as its endpoints. Two independent high‑dimensional Gaussian samples are also nearly orthogonal, so each slice’s noise remains approximately N(0,I) marginally while being highly correlated with its neighbors. The method is a drop‑in replacement for the ε term in the standard DDIM/DDPM re‑noising equation (Eq. 8 in the paper). No extra training, no additional loss, and negligible computational overhead are required: only a few extra tensor operations per timestep.
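The three steps above can be sketched in NumPy as follows. This is a minimal illustration, not the authors' implementation: the function names `slerp` and `iscs_noise` are hypothetical, and the sketch uses the common Slerp variant that applies the sine weights directly to the raw Gaussian arrays, which preserves the norm only approximately when the two endpoint norms differ slightly.

```python
import numpy as np

def slerp(a, b, lam, eps=1e-8):
    """Spherical linear interpolation between two same-shape arrays.

    The angle is measured between the flattened arrays; the sine weights are
    then applied to the unnormalized inputs (an assumed, commonly used variant).
    """
    a_flat, b_flat = a.ravel(), b.ravel()
    cos_theta = np.dot(a_flat, b_flat) / (
        np.linalg.norm(a_flat) * np.linalg.norm(b_flat)
    )
    theta = np.arccos(np.clip(cos_theta, -1.0, 1.0))
    sin_theta = np.sin(theta)
    if sin_theta < eps:
        # Endpoints are (anti)parallel: fall back to plain linear interpolation.
        return (1.0 - lam) * a + lam * b
    w_a = np.sin((1.0 - lam) * theta) / sin_theta
    w_b = np.sin(lam * theta) / sin_theta
    return w_a * a + w_b * b

def iscs_noise(num_slices, height, width, rng):
    """Return a (num_slices, height, width) noise volume for one timestep."""
    eps_start = rng.standard_normal((height, width))
    eps_end = rng.standard_normal((height, width))
    # lambda_i = i / (S - 1), i = 0, ..., S - 1
    lams = np.linspace(0.0, 1.0, num_slices)
    return np.stack([slerp(eps_start, eps_end, lam) for lam in lams])
```

A call such as `iscs_noise(200, 256, 256, np.random.default_rng())` would be made once per timestep, and the resulting volume used in place of independently sampled per‑slice noise.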

Why ISCS Beats BCS and Post‑hoc Regularization

Flexibility: Unlike BCS, which forces identical noise (λ_i = 0 for all i), ISCS allows gradual variation, preserving genuine anatomical differences while still smoothing stochastic fluctuations.
No Over‑Smoothing: TV regularization smooths the final image directly, potentially erasing fine details. ISCS acts upstream on the stochastic process, preventing the artifacts from forming in the first place.
Zero Hyper‑parameters: The linear λ schedule is deterministic and the number of interpolation endpoints is fixed at two, so there is no regularization weight or other setting that needs delicate tuning.
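The flexibility argument can be made concrete with a small calculation. Under the idealizing assumption that the two endpoint noise samples are exactly orthogonal and have equal norm (approximately true for independent high‑dimensional Gaussians), Slerp reduces to cos(λπ/2)·ε_start + sin(λπ/2)·ε_end, so the cosine similarity between the noise of slices i and j is cos(π|i − j| / (2(S − 1))). The helper below is a hypothetical illustration of that approximation, not something taken from the paper.

```python
import math

def approx_iscs_correlation(i, j, num_slices):
    """Approximate cosine similarity between the ISCS noise of slices i and j.

    Assumes orthogonal, equal-norm endpoints and the linear schedule
    lambda_i = i / (S - 1); both are idealizations.
    """
    if num_slices < 2:
        return 1.0
    return math.cos(0.5 * math.pi * abs(i - j) / (num_slices - 1))

# For comparison: independent per-slice noise gives a similarity near 0 for
# every pair, and identical noise (BCS) gives exactly 1 for every pair.
adjacent = approx_iscs_correlation(0, 1, 200)    # neighbouring slices
far_apart = approx_iscs_correlation(0, 199, 200) # first vs. last slice
```

Under this approximation, neighbouring slices of a 200‑slice volume share almost identical noise, while the first and last slices are essentially uncorrelated, which is the intermediate regime between independent sampling and BCS that the text describes.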

Experimental Validation
The authors evaluate ISCS on two representative inverse problems:

Limited‑view CT (30° angular coverage) – a severely ill‑posed reconstruction where the forward operator A is highly under‑determined.
MRI isotropic super‑resolution (2 mm → 1 mm) – a classic up‑sampling task with limited k‑space samples.

Both tasks use a pre‑trained 2‑D DDPM (or DDIM) on axial slices. The reconstruction pipeline follows the standard three‑step loop: (i) denoising prediction, (ii) data‑consistency update (solving a least‑squares sub‑problem), and (iii) re‑noising. ISCS is inserted only in step (iii).
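The three‑step loop can be sketched as below. This is a schematic under stated assumptions rather than the paper's pipeline: `predict_noise` stands in for the pre‑trained 2‑D model applied to every slice, `forward` and `adjoint` are hypothetical callables for the operator A and its transpose, the data‑consistency step is simplified to a single gradient step on the least‑squares objective instead of a full sub‑problem solve, and the re‑noising follows the standard DDIM form.

```python
import numpy as np

def reconstruct(y, forward, adjoint, predict_noise, alpha_bars, noise_fn,
                shape, step=1.0, eta=1.0):
    """Toy denoise / data-consistency / re-noise loop over a (S, H, W) volume.

    alpha_bars[t] is the cumulative noise schedule, decreasing in t.
    noise_fn(shape) returns a noise volume; it is the only place where the
    choice between independent and slice-correlated noise is made.
    """
    x = noise_fn(shape)
    x0 = x
    for t in range(len(alpha_bars) - 1, 0, -1):
        a_t, a_prev = alpha_bars[t], alpha_bars[t - 1]

        # (i) denoising prediction: estimate the clean volume from x_t
        eps_hat = predict_noise(x, t)
        x0 = (x - np.sqrt(1.0 - a_t) * eps_hat) / np.sqrt(a_t)

        # (ii) data consistency: one gradient step on 0.5 * ||A x0 - y||^2
        x0 = x0 - step * adjoint(forward(x0) - y)

        # (iii) re-noising: the stochastic term is drawn from noise_fn
        sigma = eta * np.sqrt((1.0 - a_prev) / (1.0 - a_t)) \
                    * np.sqrt(1.0 - a_t / a_prev)
        det = np.sqrt(max(1.0 - a_prev - sigma ** 2, 0.0))
        x = np.sqrt(a_prev) * x0 + det * eps_hat + sigma * noise_fn(shape)
    return x0
```

Passing a `noise_fn` that draws every slice independently reproduces the baseline behaviour; passing one that returns a slice‑correlated field (for example, a Slerp‑interpolated one) changes only step (iii) and leaves the other two steps untouched.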

Quantitative results show consistent gains:

CT: PSNR improves from 31.5 dB (baseline) to 32.8 dB (+1.3 dB) and SSIM from 0.928 to 0.945. BCS yields +0.6 dB, TV +0.4 dB.
MRI SR: PSNR rises from 37.0 dB to 38.2 dB (+1.2 dB) and SSIM from 0.967 to 0.972. BCS and TV lag behind by 0.5–0.7 dB.
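To put the reported gains in perspective, PSNR is a logarithmic function of mean squared error, so a fixed dB gain corresponds to a fixed multiplicative reduction in MSE. The sketch below uses the standard PSNR definition; the `data_range` value is an assumption, since the summary does not state how intensities were normalized.

```python
import numpy as np

def psnr(reference, estimate, data_range=1.0):
    """Peak signal-to-noise ratio in dB (standard definition)."""
    mse = np.mean((np.asarray(reference) - np.asarray(estimate)) ** 2)
    if mse == 0:
        return float("inf")
    return 10.0 * np.log10(data_range ** 2 / mse)

def mse_ratio_from_psnr_gain(gain_db):
    """Factor by which MSE is multiplied when PSNR rises by gain_db."""
    return 10.0 ** (-gain_db / 10.0)

ct_ratio = mse_ratio_from_psnr_gain(1.3)   # CT gain reported above
mri_ratio = mse_ratio_from_psnr_gain(1.2)  # MRI SR gain reported above
```

By this conversion, the +1.3 dB CT gain corresponds to the MSE dropping to roughly 74 % of the baseline value, and the +1.2 dB MRI gain to roughly 76 %.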

Visual inspection confirms that ISCS eliminates “step‑like” artifacts along the z‑axis while preserving fine structures such as small vessels and tumor margins. The runtime overhead is under 5 % because the Slerp operation is lightweight and performed on the GPU alongside existing diffusion steps.

Ablation Studies

λ Scheduling: Linear interpolation works well; a cosine schedule yields marginally higher PSNR (≈0.1 dB) but adds implementation complexity.
Noise Strength η: ISCS remains effective across η ∈
