ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion

Reading time: 1 minute

📝 Original Info

  • Title: ScaleDiff: Higher-Resolution Image Synthesis via Efficient and Model-Agnostic Diffusion
  • ArXiv ID: 2510.25818
  • Date: 2025-10-29
  • Authors: Not provided in the source metadata.

📝 Abstract

Text-to-image diffusion models often exhibit degraded performance when generating images beyond their training resolution. Recent training-free methods can mitigate this limitation, but they often require substantial computation or are incompatible with recent Diffusion Transformer models. In this paper, we propose ScaleDiff, a model-agnostic and highly efficient framework for extending the resolution of pretrained diffusion models without any additional training. A core component of our framework is Neighborhood Patch Attention (NPA), an efficient mechanism that reduces computational redundancy in the self-attention layer with non-overlapping patches. We integrate NPA into an SDEdit pipeline and introduce Latent Frequency Mixing (LFM) to better generate fine details. Furthermore, we apply Structure Guidance to enhance global structure during the denoising process. Experimental results demonstrate that ScaleDiff achieves state-of-the-art performance among training-free methods in terms of both image quality and inference speed on both U-Net and Diffusion Transformer architectures.
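The abstract only names the mechanisms, so the following is a minimal PyTorch sketch of what non-overlapping patch self-attention could look like. The function name, tensor layout, and `patch_size` are illustrative assumptions, not the paper's actual NPA implementation; the key idea is that attention is restricted to tokens within each patch, replacing the quadratic cost of global self-attention with a cost linear in the number of patches.

```python
import torch
import torch.nn.functional as F

def neighborhood_patch_attention(q, k, v, patch_size=8):
    """Hypothetical sketch of patch-local self-attention (not the paper's code).

    Tokens are grouped into non-overlapping spatial patches and attention is
    computed independently within each patch, so cost scales with
    N * patch_size**2 instead of N**2 for N = H * W tokens.

    q, k, v: (batch, heads, H, W, dim) latent feature maps (assumed layout).
    """
    b, h, H, W, d = q.shape
    p = patch_size
    assert H % p == 0 and W % p == 0, "pad inputs so H and W divide evenly"

    def to_patches(x):
        # (b, h, H, W, d) -> (b * num_patches, h, p*p, d)
        x = x.view(b, h, H // p, p, W // p, p, d)
        x = x.permute(0, 2, 4, 1, 3, 5, 6)  # move the patch grid to the front
        return x.reshape(b * (H // p) * (W // p), h, p * p, d)

    qp, kp, vp = map(to_patches, (q, k, v))
    out = F.scaled_dot_product_attention(qp, kp, vp)  # attention per patch

    # Undo the patch grouping back to (b, h, H, W, d).
    out = out.view(b, H // p, W // p, h, p, p, d)
    out = out.permute(0, 3, 1, 4, 2, 5, 6).reshape(b, h, H, W, d)
    return out
```

Latent Frequency Mixing is likewise only named in the abstract. A plausible FFT-based reading is that it keeps the low-frequency band (global structure) of one latent and the high-frequency band (fine detail) of another; the `cutoff` value and masking scheme below are assumptions for illustration.

```python
def latent_frequency_mixing(structure_latent, detail_latent, cutoff=0.25):
    """Hypothetical frequency-domain latent mixing (not the paper's code).

    Combines the low frequencies of `structure_latent` with the high
    frequencies of `detail_latent` via a hard radial low-pass mask.
    Both inputs: (..., H, W) tensors of equal shape.
    """
    H, W = structure_latent.shape[-2:]
    fy = torch.fft.fftfreq(H, device=structure_latent.device)
    fx = torch.fft.fftfreq(W, device=structure_latent.device)
    radius = torch.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
    low_pass = (radius <= cutoff * 0.5).to(structure_latent.dtype)

    fs = torch.fft.fft2(structure_latent)
    fd = torch.fft.fft2(detail_latent)
    mixed = fs * low_pass + fd * (1.0 - low_pass)
    return torch.fft.ifft2(mixed).real
```

In an SDEdit-style pipeline, the low-frequency band would presumably come from the upsampled base-resolution latent (preserving global layout) while the high-frequency band comes from the high-resolution denoising trajectory, though the abstract does not specify these details.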


Reference

This content is AI-processed based on open access ArXiv data.
