RestoRect: Degraded Image Restoration via Latent Rectified Flow & Feature Distillation

Notice: This research summary and analysis were automatically generated using AI. For absolute accuracy, please refer to the original arXiv source.

Current approaches for restoring degraded images face a trade-off: high-performance models are too slow for practical use, while fast models produce poor results. Knowledge distillation transfers teacher knowledge to students, but existing static feature matching methods cannot capture how modern transformer architectures dynamically generate features. We propose a novel Latent Rectified Flow Feature Distillation method for restoring degraded images, called 'RestoRect'. We apply rectified flow to reformulate feature distillation as a generative process in which students learn to synthesize teacher-quality features along learnable trajectories in latent space. Our framework combines Retinex decomposition with learnable anisotropic diffusion constraints and trigonometric color space polarization. We introduce a Feature Layer Extraction loss for robust knowledge transfer between different network architectures through cross-normalized transformer feature alignment with percentile-based outlier detection. RestoRect achieves better training stability and faster convergence and inference while preserving restoration quality, demonstrating superior results against baselines across 15 image restoration datasets, covering 4 tasks, on 10 metrics.


💡 Research Summary

The paper introduces RestoRect, a novel knowledge‑distillation framework for degraded‑image restoration that treats the transfer of teacher knowledge as a generative process in latent space. Traditional distillation methods match static intermediate features, which fails to capture the dynamic feature generation inherent in modern transformer‑based restoration networks. To overcome this, the authors adopt a Latent Rectified Flow (LRF) approach: they train a learnable velocity field that linearly interpolates between random noise and the teacher’s high‑quality feature representations. By applying this velocity field, a lightweight student network can synthesize teacher‑level features in only a few flow steps, dramatically reducing the computational burden compared with conventional diffusion samplers such as DDIM or DDPM.
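The few-step sampling idea can be illustrated with a toy NumPy sketch. All names here (`rectified_flow_target`, `euler_sample`, `oracle_v`) are ours, and the "velocity field" below is an oracle stand-in for the learned predictor, not the paper's network:

```python
import numpy as np

def rectified_flow_target(x0, x1, t):
    """Linear interpolation point and its constant velocity target.

    x0: noise sample, x1: teacher feature, t in [0, 1].
    Rectified flow regresses a network v_theta(x_t, t) onto (x1 - x0).
    """
    x_t = (1.0 - t) * x0 + t * x1
    v = x1 - x0
    return x_t, v

def euler_sample(v_fn, x0, n_steps=4):
    """Integrate dx/dt = v_fn(x, t) from t=0 to t=1 with a few Euler steps."""
    x, dt = x0.copy(), 1.0 / n_steps
    for i in range(n_steps):
        x = x + dt * v_fn(x, i * dt)
    return x

# Toy check: with the exact straight-line velocity, even few steps land on x1,
# which is why rectified flow needs far fewer steps than DDIM/DDPM samplers.
rng = np.random.default_rng(0)
x0 = rng.standard_normal(8)           # "noise" start
x1 = rng.standard_normal(8) + 3.0     # stand-in for a teacher feature
oracle_v = lambda x, t: x1 - x0       # idealized learned velocity field
x_hat = euler_sample(oracle_v, x0, n_steps=4)
assert np.allclose(x_hat, x1)
```

The straighter the learned trajectory, the fewer integration steps the student needs, which is the source of the claimed speedup.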

A central contribution is the Feature Layer Extraction (FLEX) loss. FLEX cross-normalizes both teacher and student transformer features using the student's own mean and variance, aligning multi-scale representations despite their differing distributions. In addition, a percentile-based outlier detector down-weights noisy activations, preventing the loss from being dominated by extreme values. This combination enables robust alignment even when the teacher and student have different architectures.
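A minimal sketch of what such a cross-normalized, outlier-robust loss could look like, assuming only the description above; `flex_like_loss`, the 95th-percentile cutoff, and the 0.1 down-weight are illustrative choices, not the paper's actual FLEX formulation:

```python
import numpy as np

def flex_like_loss(student, teacher, pct=95.0, eps=1e-6):
    """Toy FLEX-style distillation loss (names and constants are ours).

    1. Cross-normalize: rescale *both* tensors with the student's own
       mean/variance so differing feature distributions become comparable.
    2. Down-weight elementwise errors above the `pct` percentile so a few
       extreme activations cannot dominate the loss.
    """
    mu, sigma = student.mean(), student.std() + eps
    s_norm = (student - mu) / sigma
    t_norm = (teacher - mu) / sigma      # teacher mapped into student statistics
    err = (s_norm - t_norm) ** 2
    cutoff = np.percentile(err, pct)
    weights = np.where(err <= cutoff, 1.0, 0.1)   # soft outlier rejection
    return float((weights * err).mean())

rng = np.random.default_rng(1)
s = rng.standard_normal((4, 64))
t_spiked = s + 0.05 * rng.standard_normal((4, 64))
t_spiked[0, 0] += 50.0                            # inject one huge outlier
plain = float(((s - t_spiked) ** 2).mean())       # vanilla MSE is dominated by it
robust = flex_like_loss(s, t_spiked)
assert robust < plain
```

The point of the sketch is the ordering of operations: normalization first makes the percentile threshold meaningful across layers with different scales.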

The overall architecture is built around Retinex theory. Input images are decomposed into reflectance (R) and illumination (L) components using two dedicated decomposition networks. Separate ResNet encoders extract priors from R and L, while a third encoder processes the raw image. These priors are injected as skip connections into a U‑Net‑style transformer that employs Spatial‑Channel Layer Normalization (SCLN) and Query‑Key (Q‑K) normalization. SCLN computes statistics over the full spatial‑channel tensor, preserving global image statistics and local patterns simultaneously; the authors show that its runtime overhead is below 0.5 % even in low‑precision (FP16/BF16) settings.
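One minimal reading of SCLN, computing a single mean and variance over the joint channel-spatial axes of each sample (the function name, signature, and NCHW layout are our assumptions):

```python
import numpy as np

def scln(x, gamma=1.0, beta=0.0, eps=1e-5):
    """Sketch of Spatial-Channel LayerNorm as described in the summary:
    one mean/variance over the joint channel-spatial axes of each sample,
    so per-image global statistics are preserved.

    x: (N, C, H, W) feature tensor.
    """
    mu = x.mean(axis=(1, 2, 3), keepdims=True)
    var = x.var(axis=(1, 2, 3), keepdims=True)
    return gamma * (x - mu) / np.sqrt(var + eps) + beta

x = np.random.default_rng(2).standard_normal((2, 3, 8, 8)) * 5 + 7
y = scln(x)
# Each sample is normalized to ~zero mean, ~unit variance over C*H*W.
assert np.allclose(y.mean(axis=(1, 2, 3)), 0.0, atol=1e-6)
assert np.allclose(y.var(axis=(1, 2, 3)), 1.0, atol=1e-3)
```

Because the reduction is a single fused mean/variance pass, the sub-0.5 % overhead claim is plausible even in FP16/BF16.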

Two additional physics‑inspired modules further improve quality. First, a learnable anisotropic diffusion term (c(|∇I|)=exp(−|∇I|²/s²)) enforces edge‑preserving smoothness, with a sensitivity parameter s learned under a constrained range. Second, the authors introduce a trigonometric HVI (Horizontal‑Vertical‑Intensity) color space that eliminates the red‑hue discontinuity of conventional HSV/HSL by mapping hue to continuous cosine/sine coordinates and applying an adaptive intensity collapse factor. A dedicated color loss in HVI space (L_col) encourages accurate chromatic reconstruction.
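Both physics-inspired terms are easy to sketch. The conductance follows the formula given above; the hue mapping illustrates only the cosine/sine trick and omits the adaptive intensity collapse factor (helper names are ours):

```python
import numpy as np

def diffusion_coeff(grad_mag, s):
    """Edge-preserving conductance c(|∇I|) = exp(-|∇I|² / s²).
    Large gradients (edges) give c ≈ 0 (little smoothing);
    flat regions give c ≈ 1 (full smoothing)."""
    return np.exp(-(grad_mag ** 2) / (s ** 2))

def hue_to_hv(hue_deg):
    """Map hue (degrees) to continuous (H, V) = (cos θ, sin θ) coordinates,
    removing the 0°/360° red-hue discontinuity of HSV/HSL-style spaces."""
    theta = np.deg2rad(hue_deg)
    return np.cos(theta), np.sin(theta)

# Conductance: edges suppressed, flat areas kept.
assert diffusion_coeff(0.0, s=0.1) == 1.0
assert diffusion_coeff(1.0, s=0.1) < 1e-6

# Hues 359° and 1° are far apart numerically but adjacent perceptually;
# in (cos, sin) coordinates they are close, so no wrap-around artifact.
h1, v1 = hue_to_hv(359.0)
h2, v2 = hue_to_hv(1.0)
assert np.hypot(h1 - h2, v1 - v2) < 0.05
```

In the paper the sensitivity `s` is learned under a constrained range; here it is simply passed in.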

Training proceeds in two stages. Stage 1 pre‑trains the teacher network using a composite loss L_teach = L_rec + λ_tex·L_tex + λ_col·L_col, where L_rec is a pixel‑wise reconstruction term, L_tex enforces diffusion‑based texture consistency, and L_col penalizes HVI color deviations. Stage 2 distills the teacher into the student. Phase 1 freezes the restoration network and trains only the velocity predictors to reproduce teacher features via LRF. Phase 2 unfreezes the student restoration network and jointly optimizes the velocity predictors and the FLEX loss, allowing the student to generate teacher‑quality features while performing the actual restoration.
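The Stage-2 freeze/unfreeze schedule can be expressed as a tiny sketch; the parameter-group names are hypothetical and stand in for the actual optimizer configuration:

```python
def stage2_phase(phase, student_restorer_params, velocity_params):
    """Return the trainable parameter groups for each Stage-2 phase,
    following the freeze/unfreeze schedule described above
    (group names are ours, not the paper's)."""
    if phase == 1:
        # Phase 1: restoration network frozen; only the velocity
        # predictors learn to reproduce teacher features via LRF.
        return {"velocity": velocity_params}
    # Phase 2: joint optimization of the student restorer and the
    # velocity predictors, with the FLEX loss active.
    return {"velocity": velocity_params, "restorer": student_restorer_params}

assert "restorer" not in stage2_phase(1, ["w_r"], ["w_v"])
assert set(stage2_phase(2, ["w_r"], ["w_v"])) == {"velocity", "restorer"}
```

Splitting Stage 2 this way lets the velocity predictors stabilize before the restoration weights start moving.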

Extensive experiments cover 15 publicly available datasets spanning low‑light, underwater, backlit, and fundus imaging, representing four distinct restoration tasks. The authors evaluate ten metrics (PSNR, SSIM, LPIPS, FID, etc.) and report that RestoRect consistently outperforms state‑of‑the‑art diffusion‑based and transformer‑based baselines, achieving an average PSNR gain of over 1.2 dB and a 3× speedup in inference. Notably, the student model requires 5–7× fewer sampling steps than a DDIM baseline to reach comparable or better FID scores, demonstrating the efficiency of the rectified‑flow formulation. Ablation studies confirm that removing SCLN, FLEX, anisotropic diffusion, or HVI color space each degrades performance, underscoring the importance of every component.

Critical analysis reveals several strengths and potential weaknesses. The integration of LRF with FLEX provides a principled way to align dynamic transformer features, a problem that has been largely ignored in prior distillation work. The physics‑based priors (Retinex, diffusion, HVI) are well‑motivated and empirically beneficial. However, the overall pipeline is complex, involving multiple specialized modules, which may hinder reproducibility and deployment in resource‑constrained environments. Some hyper‑parameters (e.g., the diffusion sensitivity s and HVI intensity factor k) are only described by range, without detailed initialization or scheduling strategies, making exact replication challenging. Moreover, the linear nature of rectified flow could limit its ability to model highly non‑linear feature transformations; future work might explore multi‑stage or non‑linear flow trajectories. Finally, while the authors demonstrate cross‑task robustness, the generalization to completely different domains (e.g., medical modalities beyond fundus) remains to be verified.

In summary, RestoRect presents a compelling combination of latent‑space generative distillation, robust cross‑normalization, and physically grounded priors to achieve fast, high‑quality image restoration. It pushes the frontier of transformer‑based knowledge distillation by explicitly modeling feature generation dynamics, and its empirical gains across diverse datasets suggest broad applicability. Further research could simplify the architecture, explore more expressive flow models, and test the framework on additional real‑world restoration challenges.

