A Unified Framework for Multimodal Image Reconstruction and Synthesis using Denoising Diffusion Models

Image reconstruction and image synthesis are important for handling incomplete multimodal imaging data, but existing methods require various task-specific models, complicating training and deployment workflows. We introduce Any2all, a unified framework that addresses this limitation by formulating these disparate tasks as a single virtual inpainting problem. We train a single, unconditional diffusion model on the complete multimodal data stack. This model is then adapted at inference time to "inpaint" all target modalities from any combination of inputs of available clean images or noisy measurements. We validated Any2all on a PET/MR/CT brain dataset. Our results show that Any2all can achieve excellent performance on both multimodal reconstruction and synthesis tasks, consistently yielding images with competitive distortion-based performance and superior perceptual quality over specialized methods.

💡 Research Summary

The paper introduces Any2all, a unified framework that consolidates multimodal medical image reconstruction and synthesis into a single virtual inpainting problem. Traditional pipelines treat reconstruction (e.g., undersampled MRI) and synthesis (e.g., generating missing CT or PET) as separate tasks, each requiring its own dedicated neural network. This leads to fragmented training, increased computational overhead, and limited flexibility at inference time. Any2all eliminates this fragmentation by training one unconditional denoising diffusion probabilistic model (DDPM) on the full stack of modalities, then adapting it during inference to any combination of available data—whether clean images, noisy measurements, or completely missing modalities.
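To make the "virtual inpainting" idea concrete, the following is a minimal sketch, not the paper's implementation. It assumes a generic replacement-style inpainting sampler (in the spirit of RePaint-like methods) with deterministic DDIM updates, swapped in purely for illustration; the summary does not specify the inference procedure Any2all actually uses. The names `make_mask`, `toy_denoiser`, and `inpaint`, the channel ordering, and the noise schedule are all hypothetical, and the sketch covers only the clean-image case, not noisy measurements.

```python
import numpy as np

def make_mask(available, n_modalities, shape):
    """Binary mask over the stacked tensor: 1 where a modality is observed."""
    mask = np.zeros((n_modalities, *shape), dtype=np.float32)
    for idx in available:
        mask[idx] = 1.0
    return mask

def toy_denoiser(x_t, t):
    """Hypothetical stand-in for a trained unconditional noise predictor."""
    return np.zeros_like(x_t)

def inpaint(x_known, mask, alphas_cumprod, denoiser, rng):
    """Replacement-style inpainting with deterministic DDIM steps (assumed sampler)."""
    x = rng.standard_normal(x_known.shape)
    for t in range(len(alphas_cumprod) - 1, -1, -1):
        a_t = alphas_cumprod[t]
        a_prev = alphas_cumprod[t - 1] if t > 0 else 1.0
        eps = denoiser(x, t)
        # Estimate the clean stack from the current noisy state.
        x0_hat = (x - np.sqrt(1.0 - a_t) * eps) / np.sqrt(a_t)
        # Unknown channels: move the model's estimate to the previous noise level.
        x_unknown = np.sqrt(a_prev) * x0_hat + np.sqrt(1.0 - a_prev) * eps
        # Known channels: forward-diffuse the observed images to the same level.
        noise = rng.standard_normal(x.shape)
        x_known_t = np.sqrt(a_prev) * x_known + np.sqrt(1.0 - a_prev) * noise
        # Keep observed channels fixed and let the model fill in the rest.
        x = mask * x_known_t + (1.0 - mask) * x_unknown
    return x

# Assumed channel order for illustration only: 0 = PET, 1 = MR, 2 = CT.
rng = np.random.default_rng(0)
stack = np.zeros((3, 8, 8), dtype=np.float32)
stack[1] = 1.0                                 # pretend only MR is available
mask = make_mask([1], n_modalities=3, shape=(8, 8))
alphas_cumprod = np.cumprod(1.0 - np.linspace(1e-4, 0.02, 50))
result = inpaint(stack, mask, alphas_cumprod, toy_denoiser, rng)
```

Changing the list passed to `make_mask` switches between tasks (for example, MR to PET and CT, or PET and MR to CT) without retraining, which is the flexibility the summary attributes to the unified formulation.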

Problem formulation: Let the desired set of n modalities be represented as a stacked tensor x =
