NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

Reading time: 5 minutes

📝 Original Info

  • Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
  • ArXiv ID: 2512.05106
  • Date: 2025-12-04
  • Authors: Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister

📝 Abstract

Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our [project page](https://yuzeng-at-tri.github.io/ppd-page/).
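The abstract's premise, that phase rather than magnitude carries spatial structure, can be sanity-checked with a few lines of NumPy. The sketch below is illustrative only (the `swap_phase` helper is our name, not the paper's): it combines one image's magnitude spectrum with another's phase spectrum, and the result resembles the phase donor.

```python
import numpy as np

def swap_phase(a, b):
    """Combine the magnitude spectrum of `a` with the phase spectrum of `b`.

    Classic signal-processing demonstration: the hybrid image looks like `b`,
    because Fourier phase carries spatial structure while magnitude carries
    texture statistics. (Illustrative helper, not code from the paper.)
    """
    A = np.fft.fft2(a)
    B = np.fft.fft2(b)
    # Unit-modulus phase of `b`, scaled by the magnitudes of `a`.
    hybrid_spectrum = np.abs(A) * np.exp(1j * np.angle(B))
    # The spectrum is Hermitian-symmetric for real inputs, so the
    # imaginary part of the inverse transform is numerical noise.
    return np.fft.ifft2(hybrid_spectrum).real
```

For real inputs the hybrid's inner product with the phase donor is a sum of non-negative terms in the frequency domain (Parseval), which is why the structural resemblance is guaranteed rather than coincidental.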

💡 Deep Analysis

Figure 1: φ-PD compared with FLUX-Kontext, QWen-Edit, and Cosmos-Transfer 2.5 on input images (full caption in the paper text below).

📄 Full Content

NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation

Yu Zeng¹, Charles Ochoa¹, Mingyuan Zhou², Vishal M. Patel³, Vitor Guizilini¹, Rowan McAllister¹
¹Toyota Research Institute  ²University of Texas at Austin  ³Johns Hopkins University

Figure 1. We present Phase-Preserving Diffusion (ϕ-PD), a model-agnostic reformulation of the diffusion process that preserves an image's phase while randomizing its magnitude, enabling structure-aligned generation with no architectural changes or additional parameters. Panels compare input images against FLUX-Kontext, QWen-Edit, Cosmos-Transfer 2.5, and ours.
arXiv:2512.05106v2 [cs.CV] 7 Dec 2025

1. Introduction

Recent advances in diffusion models have revolutionized image generation, achieving high-fidelity results for unconditional or text-conditioned synthesis. Yet many practical applications do not require generating a scene from scratch. Instead, they operate within an image-to-image setting where the spatial layout, such as object boundaries, geometry, and scene structure, should remain fixed while the appearance is modified. Examples include neural rendering, stylization, and sim-to-real transfer for autonomous driving or robotics simulation. We refer to this broad class of problems as structure-aligned generation.

Although these tasks are conceptually easier than generating an image from scratch, existing solutions are unnecessarily complex. Methods such as ControlNet [42], T2I-Adapter [21], and related variants attach auxiliary encoders or adapter branches to inject structural input into the model. While effective, this introduces additional parameters and computational cost, paradoxically making structure-aligned generation harder than it should be.

We argue that this inefficiency stems not from the network architecture, but from the diffusion process itself. The forward diffusion process injects Gaussian noise, which destroys both the magnitude and phase components in the frequency domain. Classical signal processing [23, 30, 37], however, tells us that phase encodes structure while magnitude encodes texture. Destroying the phase means destroying the very spatial coherence that structure-aligned generation depends on, forcing the model to reconstruct structure from scratch.

Motivated by this insight, we propose Phase-Preserving Diffusion (ϕ-PD). Instead of corrupting data with Gaussian noise, ϕ-PD constructs structured noise whose magnitude matches that of Gaussian noise while preserving the input phase.
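Concretely, noise with a Gaussian magnitude spectrum and the input's phase can be built by sampling white Gaussian noise, taking its Fourier magnitudes, and attaching the input's phase. The following single-channel NumPy sketch illustrates the idea only; the paper operates on latents and videos, and the authors' exact construction may differ.

```python
import numpy as np

def phase_preserving_noise(x, rng=None):
    """Structured noise: Gaussian magnitude spectrum, input phase spectrum.

    Sketch of the idea behind phase-preserving corruption; `x` is a real
    2D array. (Hypothetical helper, not the authors' implementation.)
    """
    rng = np.random.default_rng(rng)
    g = rng.standard_normal(x.shape)     # white Gaussian noise
    G = np.fft.fft2(g)                   # Rayleigh-like random magnitudes
    X = np.fft.fft2(x)
    phase = np.exp(1j * np.angle(X))     # unit-modulus phase of the input
    # Hermitian symmetry of both factors makes the inverse transform real.
    return np.fft.ifft2(np.abs(G) * phase).real
```

Such noise could then replace the Gaussian sample in a standard forward process, e.g. `x_t = alpha_t * x + sigma_t * eps` with `eps = phase_preserving_noise(x)`, which is what keeps spatial structure intact at every noise level.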
This naturally maintains spatial alignment throughout sampling (Figure 1) with no architectural modification and no extra parameters (Figure 2), and is compatible with any DDPM or flow-matching model for images or videos. To provide controllable levels of structural rigidity, we further introduce Frequency-Selective Structured (FSS) noise, which interpolates between input phase and pure Gaussian noise via a single cutoff parameter (Figure 4). This allows us to control the trade-off between strict alignment and creative flexibility.

We evaluate ϕ-PD across photorealistic re-rendering, stylized re-rendering, and simulation enhancement for embodied-AI agents. ϕ-PD consistently maintains geometry alignment while producing high-quality visual outputs, outperforming prior methods across both quantitative and qualitative metrics. When used to enhance CARLA simulations, ϕ-PD improves planner transfer to the Waymo Open Dataset by 49%, substantially narrowing the sim-to-real gap.

In summary, our contributions include:

• Phase-preserving diffusion process: A diffusion process tha
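A single-cutoff control along these lines can be sketched by keeping the input phase only inside a radial frequency band. Both the function name and the low-pass convention below (input phase at frequencies inside the cutoff, Gaussian phase outside) are our assumptions for illustration, not necessarily the paper's exact formulation.

```python
import numpy as np

def fss_noise(x, cutoff=None, rng=None):
    """Frequency-Selective Structured noise (illustrative sketch).

    Frequencies with radius <= `cutoff` keep the input's phase; higher
    frequencies fall back to random Gaussian phase. `cutoff=None` keeps
    the input phase everywhere (full structural rigidity).
    """
    rng = np.random.default_rng(rng)
    g = rng.standard_normal(x.shape)
    G = np.fft.fft2(g)
    X = np.fft.fft2(x)
    if cutoff is None:
        phase = np.angle(X)
    else:
        h, w = x.shape
        fy = np.fft.fftfreq(h) * h       # integer frequency indices
        fx = np.fft.fftfreq(w) * w
        r = np.sqrt(fy[:, None] ** 2 + fx[None, :] ** 2)
        low = r <= cutoff                # radially symmetric mask
        phase = np.where(low, np.angle(X), np.angle(G))
    return np.fft.ifft2(np.abs(G) * np.exp(1j * phase)).real
```

Sweeping `cutoff` from 0 toward the Nyquist radius would move the noise continuously from (nearly) pure Gaussian toward fully phase-preserving, matching the described trade-off between creative flexibility and strict alignment.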


Reference

This content is AI-processed based on open access ArXiv data.
