Align & Invert: Solving Inverse Problems with Diffusion and Flow-based Models via Representation Alignment

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Enforcing alignment between the internal representations of diffusion or flow-based generative models and those of pretrained self-supervised encoders has recently been shown to provide a powerful inductive bias, improving both convergence and sample quality. In this work, we extend this idea to inverse problems, where pretrained generative models are employed as priors. We propose applying representation alignment (REPA) between diffusion or flow-based models and a DINOv2 visual encoder to guide the reconstruction process at inference time. Although ground-truth signals are unavailable in inverse problems, we empirically show that aligning model representations with approximate target features can substantially enhance reconstruction quality and perceptual realism. We provide theoretical results showing (a) that REPA regularization can be viewed as a variational approach for minimizing a divergence measure in the DINOv2 embedding space, and (b) how, under certain regularity assumptions, REPA updates steer the latent diffusion states toward those of the clean image. These results offer insights into the role of REPA in improving perceptual fidelity. Finally, we demonstrate the generality of our approach by integrating REPA into multiple state-of-the-art inverse problem solvers, and we provide extensive experiments on super-resolution, box inpainting, Gaussian deblurring, and motion deblurring confirming that our method consistently improves reconstruction quality while also providing efficiency gains, reducing the number of required discretization steps.


💡 Research Summary

The paper introduces a novel framework that leverages representation alignment (REPA) between pretrained diffusion or flow‑based generative models and a self‑supervised visual encoder (DINOv2) to improve inverse‑problem solving. Inverse problems require reconstructing an unknown signal from degraded measurements, and existing methods typically rely on a pretrained diffusion or flow prior combined with a data‑consistency term. However, these approaches often struggle with severe degradations, fine‑detail recovery, and artifacts introduced by latent‑space decoders.
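The prior-plus-data-consistency structure described above can be illustrated with a short sketch. The snippet below shows one DPS-style measurement-consistency update, where the latent state is nudged along the gradient of the measurement residual; the denoiser's posterior-mean estimate `x0_hat_fn`, the forward operator `A`, and the step size are all assumptions for illustration, not the paper's exact solver.

```python
import torch

def data_consistency_step(x_t, x0_hat_fn, A, y, step_size=1.0):
    """One hedged sketch of a measurement-consistency update, in the
    style of DPS-like solvers: move x_t down the gradient of
    ||A(x0_hat(x_t)) - y||^2.

    x_t:       current noisy latent state
    x0_hat_fn: denoiser's posterior-mean estimate of the clean image
               (assumed given; in practice derived from the diffusion model)
    A:         known degradation / forward operator
    y:         observed degraded measurement
    """
    x_t = x_t.detach().requires_grad_(True)
    x0_hat = x0_hat_fn(x_t)              # estimate clean image from x_t
    residual = A(x0_hat) - y             # mismatch with the measurement
    loss = residual.pow(2).sum()
    grad, = torch.autograd.grad(loss, x_t)
    return (x_t - step_size * grad).detach()
```

A full solver interleaves such consistency steps with the usual reverse-diffusion updates from the pretrained prior.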

To address these issues, the authors adapt the REPA technique—originally designed for training diffusion models—to the inference stage of inverse‑problem solvers. REPA enforces patch‑wise cosine similarity between the model's intermediate hidden representations and DINOv2 embeddings of a target image. Since the ground‑truth image is unavailable during inference, the authors propose a "proxy" representation. Initially the proxy is the DINOv2 embedding of the observed measurement; as the reverse diffusion proceeds, the proxy is gradually replaced by the DINOv2 embedding of the current denoised estimate (the conditional expectation E[x₀ | xₜ]).
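The patch-wise alignment objective and the proxy schedule can be sketched as follows. This is a minimal illustration, not the authors' implementation: the projection head `proj`, the linear blending schedule, and the tensor shapes are assumptions.

```python
import torch
import torch.nn.functional as F

def repa_alignment_loss(hidden, target_feats, proj):
    """Patch-wise cosine-similarity alignment, a sketch of the REPA objective.

    hidden:       (B, N, D_model) intermediate features from the diffusion model
    target_feats: (B, N, D_dino)  DINOv2 patch embeddings of the proxy target
    proj:         head mapping D_model -> D_dino (assumed small MLP or linear)
    """
    h = proj(hidden)                                     # (B, N, D_dino)
    cos = F.cosine_similarity(h, target_feats, dim=-1)   # (B, N) per-patch
    return -cos.mean()   # maximizing similarity == minimizing negative cosine

def proxy_target(dino, y_feats, x0_hat, t, T):
    """Blend measurement features with features of the current denoised
    estimate. The linear ramp below is a hypothetical schedule: at t = T
    (start of reverse diffusion) the proxy is the measurement embedding,
    and it shifts toward the embedding of the denoised estimate as t -> 0.
    """
    w = 1.0 - t / T                  # weight on the denoised-estimate features
    x0_feats = dino(x0_hat)          # DINOv2 embedding of E[x0 | xt]
    return w * x0_feats + (1.0 - w) * y_feats
```

At inference, the gradient of `repa_alignment_loss` with respect to the latent state would be combined with the solver's data-consistency update at each reverse-diffusion step.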

