FlowBypass: Rectified Flow Trajectory Bypass for Training-Free Image Editing
Training-free image editing has attracted increasing attention for its efficiency and independence from training data. However, existing approaches predominantly rely on inversion-reconstruction trajectories, which impose an inherent trade-off: longer trajectories accumulate errors and compromise fidelity, while shorter ones fail to ensure sufficient alignment with the edit prompt. Previous attempts to address this issue typically employ backbone-specific feature manipulations, limiting general applicability. To address these challenges, we propose FlowBypass, a novel and analytical framework grounded in Rectified Flow that constructs a bypass directly connecting inversion and reconstruction trajectories, thereby mitigating error accumulation without relying on feature manipulations. We provide a formal derivation of two trajectories, from which we obtain an approximate bypass formulation and its numerical solution, enabling seamless trajectory transitions. Extensive experiments demonstrate that FlowBypass consistently outperforms state-of-the-art image editing methods, achieving stronger prompt alignment while preserving high-fidelity details in irrelevant regions.
💡 Research Summary
FlowBypass tackles a fundamental limitation of training‑free image editing that relies on the classic inversion‑reconstruction pipeline. In existing methods, an input image is first “inverted” into a noise latent using a diffusion model (often DDIM‑based), and an edited image is then “reconstructed” from that noise under the guidance of an edit prompt. Because the inversion is solved numerically (Euler discretization of an ODE) with the velocity at each step evaluated at the previous state, errors accumulate along the trajectory. By the time the process reaches the terminal noise ε, the latent has already drifted, so reconstruction loses fine details in regions that should remain untouched.
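The accumulation effect can be illustrated with a minimal sketch. It assumes a hypothetical one-line velocity field in place of a real network, a time axis on which t = 0 is the image and larger t moves toward noise, and a fixed Euler step size; none of these choices come from the paper. The round-trip error of invert-then-reconstruct grows as the trajectory is pushed further toward noise.

```python
import numpy as np

def velocity(x, t):
    # Hypothetical toy field, linear in x; NOT a real diffusion or RF network.
    return (1.0 + t) * x

def invert(x0, t_end, dt):
    """Euler steps from t=0 up to t_end, evaluating v at the previous state."""
    n = int(round(t_end / dt))
    x = np.array(x0, dtype=float)
    for k in range(n):
        x = x + dt * velocity(x, k * dt)
    return x

def reconstruct(x_end, t_end, dt):
    """Euler steps from t_end back to t=0, using the state currently at hand."""
    n = int(round(t_end / dt))
    x = np.array(x_end, dtype=float)
    for k in range(n, 0, -1):
        x = x - dt * velocity(x, k * dt)
    return x

def round_trip_error(x0, t_end, dt=0.05):
    """Largest absolute deviation after inverting to t_end and coming back."""
    x_back = reconstruct(invert(x0, t_end, dt), t_end, dt)
    return float(np.max(np.abs(x_back - np.asarray(x0, dtype=float))))

if __name__ == "__main__":
    for t_end in (0.25, 0.5, 1.0):
        print(t_end, round_trip_error([1.0, -0.5], t_end))
```

Each forward step evaluates the velocity at the state before the step, while the matching backward step evaluates it at the state after the step, so the two do not cancel exactly. The mismatch compounds with every additional step taken toward noise.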
The authors propose a principled “bypass” mechanism that avoids the error‑prone terminal noise altogether. They start from the continuous formulation of Rectified Flow (RF), where image generation is expressed as an ODE:
dx/dt = v̄(x, t, C).
For a given pre‑trained RF network vθ, two trajectories are defined: the inversion trajectory xₜ (conditioned on an “inversion” prompt pair (Cⁱⁿᵥ⁺, Cⁱⁿᵥ⁻)) and the reconstruction trajectory yₜ (conditioned on a “reconstruction” prompt pair (Cʳᵉᶜ⁺, Cʳᵉᶜ⁻)). The bypass vector is defined as the offset between them: bₜ = yₜ − xₜ. Differentiating yields
dbₜ/dt = vθ(xₜ + bₜ, t, Cʳᵉᶜ) − vθ(xₜ, t, Cⁱⁿᵥ).
Assuming bₜ is small, a first‑order Taylor expansion gives a linear ODE:
dbₜ/dt ≈ Qₜ + Pₜ bₜ,
where Qₜ = vθ(xₜ, t, Cʳᵉᶜ) − vθ(xₜ, t, Cⁱⁿᵥ) and Pₜ = ∂vθ/∂x (xₜ, t, Cʳᵉᶜ). This first‑order inhomogeneous linear differential equation admits a closed‑form solution
bₜ = ∫ₜ¹ Q_u exp(∫_u¹ P_s ds) du.
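For readers who want the intermediate steps, the standard integrating-factor argument can be written out as follows. This is a sketch under two assumptions that the summary does not state: the two trajectories share the same noise at t = 1, so b₁ = 0, and Pₜ is treated as a scalar (for a Jacobian matrix the exponentials would become time-ordered). Under these assumptions the integral in the expression above is the integrating-factor term, and solving for bₜ introduces an additional prefactor; other time conventions would move the signs and limits around.

```latex
% Assumption (not stated in the summary): noise sits at t = 1, so b_1 = 0.
% Assumption: P_t is scalar, so ordinary exponentials suffice.
\begin{aligned}
\mu_t &:= \exp\!\Big(\int_t^1 P_s\,ds\Big),
\qquad \frac{d\mu_t}{dt} = -P_t\,\mu_t,\\[4pt]
\frac{d}{dt}\big(\mu_t b_t\big)
  &= \mu_t\Big(\frac{db_t}{dt} - P_t b_t\Big) = \mu_t Q_t,\\[4pt]
\mu_1 b_1 - \mu_t b_t &= \int_t^1 Q_u\,\mu_u\,du
\quad\Longrightarrow\quad
b_t = -\exp\!\Big(-\int_t^1 P_s\,ds\Big)
      \int_t^1 Q_u \exp\!\Big(\int_u^1 P_s\,ds\Big)\,du .
\end{aligned}
```

As a quick check with constant Pₜ = −1 and Qₜ = Δ, this gives bₜ = Δ(1 − e^{1−t}), which satisfies dbₜ/dt = Δ − bₜ and b₁ = 0.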
Thus the bypass can be computed analytically without ever traversing the full inversion‑reconstruction path.
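The quality of the linearization can be probed numerically on a toy problem. The sketch below is illustrative only: the velocity field `v`, its derivative `dv_dx`, the scalar "conditions" `c_inv` and `c_rec`, and the convention that t = 1 is the shared noise (so b₁ = 0) are all assumptions made for the example, not the paper's setup. It integrates the linear ODE dbₜ/dt ≈ Qₜ + Pₜ bₜ backward from t = 1 along the stored inversion trajectory and compares the result at t = 0 with the true offset y₀ − x₀.

```python
import numpy as np

def v(x, t, c):
    # Hypothetical toy velocity field; c stands in for the prompt condition.
    return c - x + 0.2 * np.sin(x)

def dv_dx(x, t, c):
    # Analytic derivative of the toy field with respect to x (plays the role of P_t).
    return -1.0 + 0.2 * np.cos(x)

def bypass_demo(x0=0.5, c_inv=0.0, c_rec=0.1, n=1000):
    """Return (true offset y_0 - x_0, linearized bypass b_0) for the toy field."""
    dt = 1.0 / n
    ts = np.linspace(0.0, 1.0, n + 1)

    # Inversion trajectory x_t: Euler from t=0 (image) to t=1 (noise).
    xs = np.empty(n + 1)
    xs[0] = x0
    for k in range(n):
        xs[k + 1] = xs[k] + dt * v(xs[k], ts[k], c_inv)

    y = xs[n]  # reconstruction starts from the shared noise
    b = 0.0    # assumed terminal condition b_1 = 0
    for k in range(n, 0, -1):
        q = v(xs[k], ts[k], c_rec) - v(xs[k], ts[k], c_inv)  # Q_t
        p = dv_dx(xs[k], ts[k], c_rec)                       # P_t
        b = b - dt * (q + p * b)         # backward Euler step of db/dt = Q + P b
        y = y - dt * v(y, ts[k], c_rec)  # backward Euler step of the true y_t
    return y - xs[0], b

if __name__ == "__main__":
    b_true, b_lin = bypass_demo()
    print("true offset:", b_true, "linearized bypass:", b_lin)
```

In this toy the two values agree closely because the offset stays small, which is the regime the first-order expansion assumes. For a real network the product Pₜ bₜ would typically come from a Jacobian-vector product via automatic differentiation rather than an analytic derivative.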
In practice the continuous time interval