NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Reading time: 5 minutes
📝 Original Info
Title: NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
ArXiv ID: 2512.05106
Date: 2025-12-04
Authors: Yu Zeng, Charles Ochoa, Mingyuan Zhou, Vishal M. Patel, Vitor Guizilini, Rowan McAllister
📝 Abstract
Standard diffusion corrupts data using Gaussian noise whose Fourier coefficients have random magnitudes and random phases. While effective for unconditional or text-to-image generation, corrupting phase components destroys spatial structure, making it ill-suited for tasks requiring geometric consistency, such as re-rendering, simulation enhancement, and image-to-image translation. We introduce Phase-Preserving Diffusion (φ-PD), a model-agnostic reformulation of the diffusion process that preserves input phase while randomizing magnitude, enabling structure-aligned generation without architectural changes or additional parameters. We further propose Frequency-Selective Structured (FSS) noise, which provides continuous control over structural rigidity via a single frequency-cutoff parameter. φ-PD adds no inference-time cost and is compatible with any diffusion model for images or videos. Across photorealistic and stylized re-rendering, as well as sim-to-real enhancement for driving planners, φ-PD produces controllable, spatially aligned results. When applied to the CARLA simulator, φ-PD improves CARLA-to-Waymo planner performance by 50%. The method is complementary to existing conditioning approaches and broadly applicable to image-to-image and video-to-video generation. Videos, additional examples, and code are available on our project page: https://yuzeng-at-tri.github.io/ppd-page/
📄 Full Content
NeuralRemaster: Phase-Preserving Diffusion for Structure-Aligned Generation
Yu Zeng1 Charles Ochoa1 Mingyuan Zhou2 Vishal M. Patel3 Vitor Guizilini1 Rowan McAllister1
1Toyota Research Institute
2University of Texas, Austin
3Johns Hopkins University
[Figure 1 panels: Input / FLUX-Kontext / QWen-Edit / Ours; Input / Cosmos-Transfer 2.5 / Ours.]
Figure 1. We present Phase-Preserving Diffusion (ϕ-PD), a model-agnostic reformulation of the diffusion process that preserves an image's phase while randomizing its magnitude, enabling structure-aligned generation with no architectural changes or additional parameters.
arXiv:2512.05106v2 [cs.CV] 7 Dec 2025
1. Introduction
Recent advances in diffusion models have revolutionized image generation, achieving high-fidelity results for unconditional or text-conditioned synthesis. Yet many practical applications do not require generating a scene from scratch. Instead, they operate within an image-to-image setting where the spatial layout, such as object boundaries, geometry, and scene structure, should remain fixed while the appearance is modified. Examples include neural rendering, stylization, and sim-to-real transfer for autonomous driving or robotics simulation. We refer to this broad class of problems as structure-aligned generation.
Although these tasks are conceptually easier than generating an image from scratch, existing solutions are unnecessarily complex. Methods such as ControlNet [42], T2I-Adapter [21], and related variants attach auxiliary encoders or adapter branches to inject structural input into the model. While effective, this introduces additional parameters and computational cost, paradoxically making structure-aligned generation harder than it should be.
We argue that this inefficiency stems not from the network architecture, but from the diffusion process itself. The forward diffusion process injects Gaussian noise, which destroys both the magnitude and phase components in the frequency domain. Classical signal processing [23, 30, 37], however, tells us that phase encodes structure while magnitude encodes texture. Destroying the phase means destroying the very spatial coherence that structure-aligned generation depends on, forcing the model to reconstruct structure from scratch.
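This classic phase/magnitude observation is easy to verify numerically. The sketch below (a toy experiment for illustration, not from the paper) builds a hybrid image from the magnitude spectrum of one image and the phase spectrum of another; the hybrid stays correlated with the phase donor even though every Fourier magnitude was replaced:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two toy 64x64 "images": a structured one (bright square) and a textured
# one (pure noise).
structured = np.zeros((64, 64))
structured[16:48, 16:48] = 1.0
textured = rng.standard_normal((64, 64))

F_s = np.fft.fft2(structured)
F_t = np.fft.fft2(textured)

# Hybrid: magnitudes from the texture image, phases from the structured one.
# The spectrum is conjugate-symmetric, so the inverse FFT is real up to
# floating-point error.
hybrid = np.fft.ifft2(np.abs(F_t) * np.exp(1j * np.angle(F_s))).real

# Phase carries the layout: the hybrid remains positively correlated with
# the structured image despite having entirely noise-derived magnitudes.
corr = np.corrcoef(hybrid.ravel(), structured.ravel())[0, 1]
print(f"correlation with phase donor: {corr:.2f}")
```

The correlation is provably positive here: by Parseval's theorem, the inner product of the hybrid and the phase donor reduces to a sum of products of nonnegative magnitudes.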
Motivated by this insight, we propose Phase-Preserving Diffusion (ϕ-PD). Instead of corrupting data with Gaussian noise, ϕ-PD constructs structured noise whose magnitude matches that of Gaussian noise while preserving the input phase. This naturally maintains spatial alignment throughout sampling (Figure 1), requires no architectural modification and no extra parameters (Figure 2), and is compatible with any DDPM or flow-matching model for images or videos.
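Concretely, such structured noise can be sketched in a few lines (an illustrative NumPy reconstruction of the described process, not the authors' code): sample Gaussian noise, keep its magnitude spectrum, and attach the input's phase spectrum:

```python
import numpy as np

def phase_preserving_noise(x, rng):
    """Structured noise with Gaussian-noise magnitudes and the input's phase.

    Illustrative sketch of the process described in the paper; the authors'
    exact formulation may differ.
    """
    eps = rng.standard_normal(x.shape)   # standard Gaussian noise
    mag = np.abs(np.fft.fft2(eps))       # noise-like magnitude spectrum
    phase = np.angle(np.fft.fft2(x))     # input phase, kept intact
    # The combined spectrum is conjugate-symmetric, so the inverse FFT is
    # real up to floating-point error.
    return np.fft.ifft2(mag * np.exp(1j * phase)).real

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))        # stand-in for a (latent) image
n = phase_preserving_noise(x, rng)
```

Because the noise shares the input's Fourier phase exactly, every state of the forward process keeps the input's spatial layout, which is what lets sampling stay structure-aligned without an extra conditioning branch.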
To provide controllable levels of structural rigidity, we further introduce Frequency-Selective Structured (FSS) noise, which interpolates between input phase and pure Gaussian noise via a single cutoff parameter (Figure 4). This allows us to control the trade-off between strict alignment and creative flexibility.
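The single-knob idea can be sketched with a radial frequency mask (an illustrative variant; the paper's exact cutoff scheme may differ): below the cutoff the input's phase is kept, above it the Gaussian noise retains its own random phase:

```python
import numpy as np

def fss_noise(x, cutoff, rng):
    """Frequency-Selective Structured noise (illustrative sketch).

    `cutoff` is a normalized radial frequency in [0, 1]:
    0 -> (nearly) pure Gaussian noise, 1 -> fully phase-preserving noise.
    """
    eps = rng.standard_normal(x.shape)
    Fe = np.fft.fft2(eps)
    phase_x = np.angle(np.fft.fft2(x))

    # Normalized radial frequency: 0 at DC, 1 at the Nyquist corner.
    fy = np.fft.fftfreq(x.shape[0])[:, None]
    fx = np.fft.fftfreq(x.shape[1])[None, :]
    radius = np.sqrt(fx**2 + fy**2) / np.sqrt(0.5)

    keep = radius <= cutoff                        # structured low-freq band
    phase = np.where(keep, phase_x, np.angle(Fe))  # mix the two phase maps
    # The mask is symmetric under f -> -f, so the spectrum stays
    # conjugate-symmetric and the inverse FFT is real up to rounding.
    return np.fft.ifft2(np.abs(Fe) * np.exp(1j * phase)).real

rng = np.random.default_rng(0)
x = rng.standard_normal((64, 64))
n_rigid = fss_noise(x, 1.0, rng)   # strict structural alignment
n_free = fss_noise(x, 0.0, rng)    # essentially unstructured noise
```

Sweeping the cutoff between 0 and 1 then trades strict alignment against creative freedom, mirroring the single-parameter control described above.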
We evaluate ϕ-PD across photorealistic re-rendering, stylized re-rendering, and simulation enhancement for embodied-AI agents. ϕ-PD consistently maintains geometric alignment while producing high-quality visual outputs, outperforming prior methods across both quantitative and qualitative metrics. When used to enhance CARLA simulations, ϕ-PD improves planner transfer to the Waymo Open Dataset by 49%, substantially narrowing the sim-to-real gap. In summary, our contributions include:
• Phase-preserving diffusion process: A diffusion process tha