A Smoothing-Correction Transformer-Based High-Resolution Restoration Model
📝 Abstract
Deep unfolding networks (DUNs) combine the interpretability of model-based methods with the learning ability of deep networks, yet remain limited for blind image restoration (BIR). Existing DUNs suffer from: (1) **Degradation-specific dependency**, as their optimization frameworks are tied to a known degradation model, making them unsuitable for BIR tasks; and (2) **Over-smoothing bias**, resulting from the direct feeding of gradient descent outputs, dominated by low-frequency content, into the proximal term, suppressing fine textures. To overcome these issues, we propose UnfoldLDM, which integrates DUNs with the latent diffusion model (LDM) for BIR. In each stage, UnfoldLDM employs a multi-granularity degradation-aware (MGDA) module as the gradient descent step. MGDA models BIR as an unknown degradation estimation problem and estimates both the holistic degradation matrix and its decomposed forms, enabling robust degradation removal. For the proximal step, we design a degradation-resistant LDM (DR-LDM) to extract compact degradation-invariant priors from the MGDA output. Guided by this prior, an over-smoothing correction transformer (OCFormer) explicitly recovers high-frequency components and enhances texture details. This unique combination ensures the final result is degradation-free and visually rich. Experiments show that our UnfoldLDM achieves leading performance on various BIR tasks and benefits downstream tasks. Moreover, our design is compatible with existing DUN-based methods, serving as a plug-and-play framework. Code will be released.
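The per-stage data flow described above (MGDA gradient step, DR-LDM prior extraction, OCFormer correction) can be sketched as follows. The three module bodies are illustrative stand-ins for the paper's learned networks, which are not public; only the wiring between them follows the text:

```python
import numpy as np

# Hypothetical stand-ins for the paper's learned modules: each is a simple
# array transform, so the stage wiring can run end to end.
def mgda(x, y):
    """Gradient-descent step: pull the estimate toward the observation (stand-in)."""
    return x + 0.5 * (y - x)

def dr_ldm(v):
    """Degradation-resistant prior: a compact degradation-invariant summary (stand-in)."""
    return v.mean(keepdims=True)

def ocformer(v, prior):
    """Over-smoothing correction: re-amplify deviation from the smooth prior (stand-in)."""
    return v + 0.1 * (v - prior)

def unfold_ldm(y, n_stages=3):
    """Run the K-stage unfolding loop described in the abstract."""
    x = y.copy()
    for _ in range(n_stages):
        v = mgda(x, y)           # (i) gradient descent term
        prior = dr_ldm(v)        # (ii-a) extract degradation-invariant prior
        x = ocformer(v, prior)   # (ii-b) prior-guided high-frequency recovery
    return x
```

Note that this sketch preserves the signal mean: OCFormer only redistributes energy relative to the smooth prior, mirroring the paper's claim that the correction step targets high-frequency content rather than overall intensity.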
📄 Content
- UnfoldLDM: Deep Unfolding-based Blind Image Restoration with Latent Diffusion Priors
- Chunming He1,∗, Rihan Zhang1,∗, Zheng Chen2, Bowen Yang3, Chengyu Fang4, Yunlong Lin5, Fengyang Xiao1,†, and Sina Farsiu1,†
- 1Duke University, 2Shanghai Jiao Tong University, 3Peking University, 4Tsinghua University, 5Xiamen University
- ∗Equal Contribution, †Corresponding Author, Contact: chunming.he@duke.edu
- [Figure 1 omitted] Comparison between existing proximal gradient-based DUN methods (e.g., DGUNet [44] and DeepSN-Net [7]) and our UnfoldLDM. (i) Existing DUNs alternate gradient descent modules (GDM) and proximal modules (PM) over K stages (K: stage number) and fail to preserve fine-grained textures and high-frequency details. (ii) Our UnfoldLDM adds a latent diffusion prior (LDM) at each stage, achieving faithful detail recovery and outstanding visual fidelity. The visual comparison (LQ, DGUNet, DeepSN-Net, Ours, GT) shows that UnfoldLDM better resists unknown degradation and eliminates the over-smoothing bias that existing DUNs suffer.
- arXiv:2511.18152v1 [cs.CV] 22 Nov 2025
- Introduction
- Blind image restoration (BIR) aims to recover high-quality images from unknown degradations [20, 31, 32, 56, 70–73]. It plays a vital role in numerous applications, including photography [19, 39, 74], medical imaging [37, 38, 47], and downstream vision tasks [6, 22, 78]. Traditional methods based on handcrafted priors are interpretable but struggle to generalize to real-world degradations [27], whereas learning-based methods achieve superior performance but often lack interpretability [4, 19].
- Deep unfolding networks (DUNs) have emerged as a promising paradigm to bridge this gap [53]. By unfolding the iterative optimization into a multi-stage network, DUNs inherit model-based interpretability while leveraging learning-based representational power. Among them, proximal-gradient (PG)-based DUNs are widely adopted for their flexibility and effectiveness [23, 24]. As shown in Fig. 1, a typical PG-based DUN alternates between a gradient descent step derived from the observation model and a proximal operator parameterized by a learnable prior, enforcing data fidelity while enhancing perceptual quality.
- However, existing PG-based DUNs face two major challenges: (1) Degradation-specific dependency. Most are designed for a particular degradation type (e.g., deblurring or low-light enhancement) and rely on known physical priors, making them unsuitable for complex or mixed degradations. (2) Over-smoothing bias. Due to the dominance of low-frequency information in degraded images, gradient updates primarily recover coarse structures while neglecting high-frequency details. As the learnable step size converges to small updates, high-frequency textures are further suppressed or mistaken for noise, yielding over-smoothed outputs with reduced structural fidelity (see Fig. 1).
- To overcome these challenges, we propose UnfoldLDM, the first framework to integrate DUNs with the latent diffusion model (LDM) for BIR.
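The PG-based alternation described above unrolls the classical proximal-gradient iteration x_{k+1} = prox(x_k − η∇f(x_k)) with fidelity term f(x) = ½‖Ax − y‖². A minimal NumPy sketch under a known linear degradation A, with soft-thresholding standing in for the learned proximal module (the function name and parameters are illustrative, not from the paper):

```python
import numpy as np

def proximal_gradient_restore(y, A, n_stages=200, step=0.5, lam=1e-3):
    """Unrolled proximal-gradient iteration for the model y = A x + n.

    Each 'stage' mirrors one DUN stage: a gradient descent step on the
    data-fidelity term 0.5 * ||A x - y||^2, followed by a proximal step
    (soft-thresholding here, standing in for a learned prior module).
    """
    x = A.T @ y  # crude initialization from the observation
    for _ in range(n_stages):
        grad = A.T @ (A @ x - y)   # gradient of the fidelity term
        z = x - step * grad        # gradient descent step
        x = np.sign(z) * np.maximum(np.abs(z) - lam, 0.0)  # proximal step
    return x
```

The over-smoothing bias the paper targets is visible in this template: the gradient step only matches the (low-frequency-dominated) observation, so any detail not explained by A must come from the proximal operator, which a shrinkage-style prior tends to suppress.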
Each stage in UnfoldLDM has two components: (i) a multi-granularity degradation-aware (MGDA) module serving as the gradient descent term, and (ii) a proximal design comprising a degradation-resistant LDM (DR-LDM) and an over-smoothing correction transformer (OCFormer). In MGDA, we formulate BIR