ICM-SR: Image-Conditioned Manifold Regularization for Image Super-Resolution
📝 Abstract
Real-world image super-resolution (Real-ISR) often leverages the powerful generative priors of text-to-image diffusion models by regularizing the output to lie on their learned manifold. However, existing methods often overlook the choice of the regularizing manifold, typically defaulting to a text-conditioned manifold. This approach suffers from two key limitations. Conceptually, it is misaligned with the Real-ISR task, which is to generate high-quality (HQ) images directly tied to the low-quality (LQ) inputs. Practically, the teacher model often reconstructs images with color distortions and blurred edges, indicating a flawed generative prior for this task. To correct these flaws and ensure conceptual alignment, a more suitable manifold must incorporate information from the images. While the most straightforward approach is to condition directly on the raw input images, their high information density makes the regularization process numerically unstable. To resolve this, we propose image-conditioned manifold regularization (ICM), a method that regularizes the output towards a manifold conditioned on sparse yet essential structural information: a combination of a colormap and Canny edges. ICM provides a task-aligned and stable regularization signal, thereby avoiding the instability of dense conditioning and enhancing the final super-resolution quality. Our experiments confirm that the proposed regularization significantly enhances super-resolution performance, particularly in perceptual quality, demonstrating its effectiveness for real-world applications. We will release the source code of our work for reproducibility.
* indicates equal contribution.
📄 Content
Image Super-Resolution (ISR), which aims to restore a high-quality (HQ) image from its low-quality (LQ) counterpart, is a classical problem in computer vision. While recent advances in deep learning have significantly improved ISR performance [5,8,14,18,19], the resulting models often fail to generalize to the diverse and unknown degradations encountered in real-world scenarios. To address this limitation, real-world image super-resolution (Real-ISR) [31,44] aims to achieve practical super-resolution by applying significantly more diverse and complex degradation pipelines. Since training models solely with a pixel-wise reconstruction loss inevitably leads to blurry and oversmoothed results, training frameworks from generative models such as Generative Adversarial Networks (GANs) [7] and diffusion models [10,25,26] have been adopted for Real-ISR. Both GAN-based methods [15,30,31] and diffusion-based methods [20,43] generate more realistic and sharper images, with superior perceptual quality compared to models trained only with a reconstruction loss.
Figure 1. Performance comparison on the DRealSR benchmark [34]. The red and blue metrics are no-reference and reference perceptual metrics, respectively. ICM-SR (ours) stands out on perceptual metrics, highlighting its strong performance in practical scenarios. Metrics shown: LPIPS, DISTS, FID, NIQE, MUSIQ, MANIQA, CLIPIQA. Methods compared: SinSR, OSEDiff, TSD-SR, ICM-SR (ours).
The emergence of generative foundation models has opened new avenues for Real-ISR, leveraging the powerful generative priors of pretrained text-to-image diffusion models [25]. One approach [36,42] adapts these diffusion models to the Real-ISR task by training LoRA [11] or ControlNet [46]. This approach preserves the powerful generative priors inherent in the text-to-image models, thereby achieving superior generalization capabilities. Despite their impressive performance, standard diffusion models for ISR often incur a high computational cost due to their iterative sampling process. For efficient inference, several works [6,16,35,41] have explored distilling generative priors into one-step super-resolution models, adopting distillation techniques for diffusion models such as distribution matching [22,40] and consistency trajectory matching [13,27]. Among these efficient models, OSEDiff [35] regularizes the super-resolution outputs towards the natural image prior embedded within the pretrained diffusion models, facilitated by Variational Score Distillation (VSD) [33].
Figure 2. [...] In contrast, our proposed image-conditioned prior, guided by a colormap and Canny edges, provides a much more accurate prediction. It consistently reconstructs latents with faithful color and sharp structural details, demonstrating a more stable and task-aligned generative prior.
However, many existing one-step Real-ISR methods focus on applying and enhancing distillation techniques developed for general-purpose image generation. This leads them to overlook a more fundamental aspect: choosing a target manifold that aligns with the characteristics of the Real-ISR task. Specifically, these methods regularize the output towards a manifold conditioned on text prompts. This approach creates a conceptual mismatch, as the Real-ISR task requires generating an HQ image that is faithful to the LQ input, not just plausible given a text description. Moreover, as visualized in Figure 2 (Text cond.), the text-conditioned teacher models often reconstruct images with saturated colors and blunt boundaries. This indicates that the generative prior is not only misaligned but also practically flawed for the precise task of image restoration. An intuitive solution to this mismatch is to condition the target manifold on the LQ images. However, we prove that conditioning on such an information-dense signal causes VSD [33] to become numerically unstable and degenerate towards SDS [24], thereby harming the distillation performance.
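For reference, the two distillation gradients named above can be written in the standard forms used in the score-distillation literature. The notation here is an assumption made for illustration and is not taken from this paper: $\hat{x} = G_\theta(x_{LQ})$ is the one-step super-resolution output, $\hat{x}_t = \sqrt{\bar{\alpha}_t}\,\hat{x} + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ is its noised version, $\epsilon_\phi$ is the frozen teacher, $\epsilon_\psi$ is the trainable auxiliary score network, $c$ is the condition, and $\omega(t)$ is a timestep weight.

```latex
\nabla_\theta \mathcal{L}_{\mathrm{SDS}}
  = \mathbb{E}_{t,\epsilon}\!\left[\,\omega(t)\,
    \bigl(\epsilon_\phi(\hat{x}_t; t, c) - \epsilon\bigr)\,
    \frac{\partial \hat{x}}{\partial \theta}\right]

\nabla_\theta \mathcal{L}_{\mathrm{VSD}}
  = \mathbb{E}_{t,\epsilon}\!\left[\,\omega(t)\,
    \bigl(\epsilon_\phi(\hat{x}_t; t, c) - \epsilon_\psi(\hat{x}_t; t, c)\bigr)\,
    \frac{\partial \hat{x}}{\partial \theta}\right]
```

The two expressions differ only in the subtracted term: if $\epsilon_\psi(\hat{x}_t; t, c)$ reduces to the sampled noise $\epsilon$, the VSD gradient coincides with the SDS gradient. This is one plausible reading of "degenerate towards SDS"; the paper's actual proof is not reproduced here.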
To address the dilemma between conceptual alignment and distillation stability, we propose Image-Conditioned Manifold (ICM) regularization. Our method conditions the target manifold on core structural information, which we compose from a low-resolution colormap and Canny edges. This combination is specifically designed to resolve the aforementioned practical failure of the text-conditioned prior: the colormap provides global guidance to prevent color shifts, while the Canny edges enforce sharp structural details. We implement this conditioning using a pretrained T2I-Adapter [23], which extracts a more stable and accurate prior from the diffusion model, as shown in Figure 2 (bottom row).
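As a rough illustration of what such a sparse structural condition might look like, the sketch below builds a low-resolution colormap and a binary edge map with NumPy only. Several things here are assumptions rather than the paper's method: the function names, the `block` and `thresh` parameters, the use of a thresholded gradient magnitude as a simple stand-in for a real Canny detector, and the channel stacking at the end (how the two signals are actually passed to the T2I-Adapter is not specified in this text).

```python
import numpy as np

def lowres_colormap(img, block=8):
    """Average colour over block x block tiles, then upsample back (nearest)."""
    h, w, c = img.shape
    hb, wb = h // block, w // block
    # Crop so the image divides evenly into tiles (a simplification).
    tiles = img[:hb * block, :wb * block].reshape(hb, block, wb, block, c)
    coarse = tiles.mean(axis=(1, 3))  # shape (hb, wb, c)
    return np.repeat(np.repeat(coarse, block, axis=0), block, axis=1)

def edge_map(img, thresh=0.2):
    """Binary edges from thresholded gradient magnitude.

    NOTE: a simple stand-in for Canny, not a Canny implementation.
    """
    gray = img.mean(axis=2)
    gy, gx = np.gradient(gray)  # derivatives along rows, then columns
    return (np.hypot(gx, gy) > thresh).astype(np.float32)

def structural_condition(img, block=8, thresh=0.2):
    """Stack colormap (3 channels) and edge map (1 channel); illustrative only."""
    cmap = lowres_colormap(img, block)
    h, w = cmap.shape[:2]
    edges = edge_map(img[:h, :w], thresh)
    return np.concatenate([cmap, edges[..., None]], axis=2)

if __name__ == "__main__":
    # Toy image with values in [0, 1]: black left half, white right half.
    img = np.zeros((16, 16, 3))
    img[:, 8:] = 1.0
    cond = structural_condition(img, block=8)
    print(cond.shape)
```

The colormap keeps only coarse color layout and the edge map keeps only boundary locations, so the combined signal carries far less information than the raw LQ image while still pinning down color and structure.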
ICM regularization offers two key advantages. Conceptually, ICM provides a regularization manifold that is fundamentally better aligned with the objectives of Real-ISR. Practically, the structural conditioning improves score estimation accuracy, especially at large diffusion timesteps. Consequently, this synergy of conceptual alignment and practical stability allows ICM regularization to yield superior one-step diffusion models for Real-ISR.
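One way to see why accuracy at large timesteps matters is the standard DDPM relation between a noise prediction and the clean sample it implies. The sketch below is a generic illustration and not the paper's code: the linear beta schedule, the function names, and the chosen timesteps are all assumptions. It inverts $x_t = \sqrt{\bar{\alpha}_t}\,x_0 + \sqrt{1-\bar{\alpha}_t}\,\epsilon$ and exposes the factor by which an error in the predicted noise is scaled in the reconstructed $x_0$.

```python
import numpy as np

def alpha_bar(T=1000, beta_start=1e-4, beta_end=2e-2):
    """Cumulative product of (1 - beta_t) for a linear beta schedule (assumed)."""
    betas = np.linspace(beta_start, beta_end, T)
    return np.cumprod(1.0 - betas)

def predict_x0(x_t, eps_hat, a_bar_t):
    """Invert x_t = sqrt(a_bar) * x0 + sqrt(1 - a_bar) * eps for x0."""
    return (x_t - np.sqrt(1.0 - a_bar_t) * eps_hat) / np.sqrt(a_bar_t)

def x0_error_gain(a_bar_t):
    """Factor by which an error in eps_hat is scaled in the x0 estimate."""
    return np.sqrt(1.0 - a_bar_t) / np.sqrt(a_bar_t)

if __name__ == "__main__":
    a = alpha_bar()
    for t in (100, 500, 900):
        print(t, x0_error_gain(a[t]))
```

Because $\bar{\alpha}_t$ shrinks as $t$ grows, the gain increases with the timestep, so the same noise-prediction error distorts the reconstructed sample more at large $t$. Under this reading, a condition that improves the teacher's prediction at large timesteps has an outsized effect on the quality of the regularization target.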
Our key contributions are summarize
This content is AI-processed based on ArXiv data.