Wasserstein Propagation for Reverse Diffusion under Weak Log-Concavity: Exploiting Metric Mismatch via One-Switch Routing

Zicheng Lyu
School of Data Science, Fudan University
lyuzicheng@gmail.com

Zengfeng Huang∗
School of Data Science, Fudan University
Shanghai Innovation Institute
huangzf@fudan.edu.cn

Abstract

Existing analyses of reverse diffusion typically propagate sampling error in the Euclidean geometry underlying $W_2$ throughout the reverse trajectory. Under weak log-concavity, this can be suboptimal: Gaussian smoothing may create contraction first at large separations, while short-scale Euclidean dissipativity is still absent. We show that exploiting this metric mismatch can yield strictly sharper end-to-end $W_2$ bounds than direct full-horizon Euclidean propagation on mismatch windows. Our analysis derives an explicit radial lower profile for the learned reverse drift, whose far-field and near-field limits quantify the contraction reserve and the residual Euclidean load, respectively. This profile determines admissible switch times and leads to a one-switch routing theorem: reflection coupling damps initialization mismatch, pre-switch score forcing, and pre-switch discretization in an adapted concave transport metric; a single $p$-moment interpolation converts the damped switch-time discrepancy back to $W_2$; and synchronous coupling propagates the remaining error over the late Euclidean window. Under $L^2$ score-error control, a one-sided monotonicity condition on the score error, and standard well-posedness and coupling assumptions, we obtain explicit non-asymptotic end-to-end $W_2$ guarantees, a scalar switch-selection objective, and a conversion exponent $\theta_p = (p-2)/(2(p-1))$ that cannot be improved uniformly within the affine-tail concave class under the same $p$-moment switch assumption.
For a fixed switch, the routed and direct Euclidean bounds share the same late-window term, so any strict improvement is entirely an early-window effect.

∗ Corresponding author. Preprint.

1 Introduction

Score-based diffusion models are now a central framework for generative modeling, building on early diffusion and score-based formulations [Sohl-Dickstein et al., 2015, Song and Ermon, 2019, Ho et al., 2020, Song et al., 2020]. At a conceptual level, sampling is a reverse-time transport problem. The smoothed intermediate laws determine its geometry, score approximation perturbs the drift, and numerical samplers discretize the dynamics. An end-to-end theory of diffusion sampling should therefore identify the geometry that actually governs error propagation along reverse time and turn it into sharp non-asymptotic bounds.

A growing theoretical literature provides convergence guarantees for diffusion samplers in several discrepancy measures, including KL divergence, total variation, and Wasserstein distances [Conforti et al., 2025, Lee et al., 2023, Gao et al., 2025, Gao and Zhu, 2024, Gentiloni-Silveri and Ocello, 2025]. We focus on the 2-Wasserstein distance $W_2$, in which initialization mismatch, score approximation, and discretization admit a common coupling-based treatment. In globally regular regimes, discrepancy can often be propagated directly in $W_2$ along the entire reverse trajectory, without changing metric across phases [Gao et al., 2025, Gao and Zhu, 2024]. Once global regularity is lost, however, this Euclidean route need not remain sharp. The natural question is: what transport geometry actually governs the propagation of sampling error along the reverse trajectory, and how should end-to-end $W_2$ bounds reflect that geometry rather than a uniform Euclidean worst case?

We study this question under weak log-concavity, where the target law remains globally confining but may be locally nonconvex, in the spirit of recent Ornstein–Uhlenbeck regularization analyses beyond global log-concavity [Gentiloni-Silveri and Ocello, 2025]. In this regime, Gaussian smoothing need not produce Euclidean contractivity uniformly across scales. Instead, the radial lower profile we derive may become contractive first at large separations while remaining obstructed near the origin. A direct full-horizon Euclidean propagation is then bottlenecked by the short-scale obstruction even though contraction is already available farther out. The effective transport geometry is therefore scale-dependent, and treating it as uniformly Euclidean can make the resulting end-to-end $W_2$ bound genuinely conservative.

On an early window, discrepancy is propagated in a concave transport metric adapted to the radial profile. At an admissible switch point $s_0$ in the forward smoothing variable, the damped discrepancy is returned once to $W_2$, and the remaining late window is propagated in Euclidean geometry. For a fixed admissible switch, the routed and direct Euclidean bounds share the same late-window term, so any advantage is entirely an early-window effect; on mismatch windows, the routed bound can be strictly sharper.

Our contributions. Our contributions are fourfold.

(i) We derive an explicit radial lower profile for the learned reverse drift under weak log-concavity. This profile separates far-field contraction from near-field obstruction and yields a transparent admissibility criterion for switch points.

(ii) We develop the early-window contraction mechanism behind the routed analysis.
Reflection coupling reduces the separation process of two copies of the learned reverse SDE to a one-dimensional radial diffusion, from which we construct an adapted concave switch metric $W_{\varphi_{s_0}}$ that propagates initialization mismatch, pre-switch score forcing, and early discretization up to the switch.

(iii) We prove an end-to-end one-switch $W_2$ routing theorem. The theorem localizes metric mismatch at a single interface, returns the pre-switch discrepancy from $W_{\varphi_{s_0}}$ to $W_2$ only once under a $p$-moment budget, induces a scalar switch-selection objective over admissible grid-aligned switches, and yields the exponent $\theta_p = (p-2)/(2(p-1))$, which cannot be improved uniformly within the affine-tail concave class under the same switch-moment assumption.

(iv) We give an exact comparison with the standard direct full-horizon Euclidean route. For any fixed admissible switch, the routed and direct bounds share the same late-window term, so the comparison reduces entirely to the early window. This identifies mismatch windows on which the routed bound can be strictly sharper.

1.1 Technique overview

The proof is organized around a single geometric fact. Beyond globally regular regimes, reverse diffusion is controlled not by one uniform Euclidean dissipativity constant but by a scale-dependent radial contractivity profile of the learned reverse drift. Under weak log-concavity, Gaussian smoothing can make this profile contractive first at large separations even while the Euclidean one-sided monotonicity bound required for direct $W_2$ propagation remains non-contractive near the origin. The first usable contraction therefore appears in a transport geometry different from the Euclidean geometry in which the final error is measured.

Let $p_s$ denote the smoothed data law at noise level $s$, and write $s = T - t$ for forward time along the reverse trajectory.
In the score-based SDE formulation [Song et al., 2020], the ideal and learned reverse dynamics are
\[
d\tilde X_t = b_t(\tilde X_t)\,dt + g(T-t)\,dB_t, \qquad d\hat Y_t = \hat b_t(\hat Y_t)\,dt + g(T-t)\,dB_t, \tag{1}
\]
where $b_t$ is the ideal reverse drift and $\hat b_t$ is the learned reverse drift. For a vector field $u$, define its radial drift profile by
\[
\kappa_u(r) := \inf_{\|x-y\|=r} \frac{-\langle u(x)-u(y),\, x-y\rangle}{\|x-y\|^2}, \qquad r > 0.
\]
Under Assumptions 1 and 3 (the latter being a one-sided monotonicity condition on the score error rather than a global Lipschitz assumption), the radial profile dictionary in Section 3 yields
\[
\kappa_{\hat b_t}(r) \ge g^2(s)\Big(\alpha_s - \tfrac{1}{r} f_{M_s}(r) - \ell(s)\Big) - f(s), \qquad s = T - t, \tag{2}
\]
with explicit coefficients determined by smoothing and score-error control. We denote the right-hand side of (2) by $\kappa_s(r)$. A single lower envelope controls both geometric regimes used later:
\[
\lim_{r \downarrow 0} \kappa_s(r) = -b(s), \qquad \lim_{r \to \infty} \kappa_s(r) = m(s).
\]
Thus $b(s)$ and $m(s)$ are not separate bookkeeping quantities; they are the near-field and far-field ends of the same radial lower profile. The near-field endpoint records the residual Euclidean obstruction, while the far-field endpoint records the contraction reserve created by Gaussian smoothing.

From synchronous to reflection coupling. A direct $W_2$ propagation uses synchronous coupling. Because the two copies are driven by the same Brownian motion, the noise cancels in the distance dynamics, so a Euclidean Grönwall closure requires a global one-sided bound
\[
\langle x-y,\, \hat b_t(x) - \hat b_t(y)\rangle \le \beta(t)\,\|x-y\|^2, \qquad x, y \in \mathbb{R}^d. \tag{3}
\]
In globally regular regimes, such a closure is typically obtained from a global one-sided Lipschitz control of the learned score or reverse drift [Gao et al., 2025, Gao and Zhu, 2024]. The restriction is that (3) must hold uniformly over all radii.
Synchronous coupling is therefore governed by the worst part of the radial profile $r \mapsto \kappa_{\hat b_t}(r)$, which in the early weakly log-concave regime is precisely the near field. The direct Euclidean route thus fails not because large-scale contraction is absent, but because it cannot exploit that contraction before every radius is already Euclideanly dissipative.

To use the full scale-dependent profile, we instead employ reflection coupling [Eberle, 2016, Eberle et al., 2019]. Fix an admissible switch $s_0$, and compress the early geometry into a switch-level lower envelope $\kappa_{s_0}$, negative near the origin and positive beyond a threshold $R_{\mathrm{sw}}(s_0)$. If $Y_t$ and $\bar Y_t$ are reflected copies of the learned reverse SDE and $r_t := \|Y_t - \bar Y_t\|$, then, up to coupling time, the multidimensional dynamics reduce schematically to
\[
dr_t \lesssim -\kappa_{s_0}(r_t)\, r_t\, dt + 2 g(T-t)\, dW_t. \tag{4}
\]
Unlike synchronous coupling, the radial dynamics retains a Brownian term and therefore sees the entire switch-level profile rather than only its worst short-scale Euclidean value. This is exactly why a concave radial cost becomes effective: applying Itô's formula to $\varphi(r_t)$ produces the second-order term $2 g(T-t)^2 \varphi''(r_t)$, which can offset the near-field load.

The switch metric as an adapted radial Lyapunov gauge. The switch metric $\varphi_{s_0}$ is therefore not an auxiliary softer cost chosen after the fact. It is the radial Lyapunov gauge naturally associated with the reflected generator. Writing $\underline{g}(s_0) := \inf_{u \in [s_0, T]} g(u)$, the relevant radial operator has the schematic form
\[
\mathcal{L}_{s_0}\varphi(r) = 2\,\underline{g}(s_0)^2\, \varphi''(r) - \kappa_{s_0}(r)\, r\, \varphi'(r).
\]
We construct $\varphi_{s_0}$ as an explicit concave subsolution of
\[
2\,\underline{g}(s_0)^2\, \varphi''(r) - \kappa_{s_0}(r)\, r\, \varphi'(r) \le -c(s_0)\, \varphi(r). \tag{5}
\]
This is the conceptual core of the construction: the metric is adapted to the reflected radial generator before it is used as a transport cost.
The resulting switch metric is
\[
\varphi_{s_0}(r) =
\begin{cases}
\displaystyle\int_0^r e^{-\lambda(s_0) u^2}\, du, & 0 \le r \le R_{\mathrm{sw}}(s_0), \\[1ex]
\varphi_{s_0}\big(R_{\mathrm{sw}}(s_0)\big) + a(s_0)\big(r - R_{\mathrm{sw}}(s_0)\big), & r \ge R_{\mathrm{sw}}(s_0),
\end{cases} \tag{6}
\]
for switch-dependent parameters $\lambda(s_0)$ and $a(s_0)$. Its form mirrors the two ends of the profile. On $0 \le r \le R_{\mathrm{sw}}(s_0)$, where $\kappa_{s_0}$ may still be negative, the curvature term $2\,\underline{g}(s_0)^2 \varphi''_{s_0}$ supplied by the reflected radial noise compensates the near-field obstruction. On $r \ge R_{\mathrm{sw}}(s_0)$, where the profile carries positive radial margin, no curvature is needed; the metric is continued affinely to preserve sensitivity to large separations and hence retain the far-field reserve. Applying Itô's formula to $\varphi_{s_0}(r_t)$, together with (5), yields the early semigroup contraction
\[
W_{\varphi_{s_0}}\big(\mu P_{u,v},\, \nu P_{u,v}\big) \le e^{-c(s_0)(v-u)}\, W_{\varphi_{s_0}}(\mu, \nu), \qquad 0 \le u \le v \le t_s := T - s_0, \tag{7}
\]
where $P_{u,v}$ denotes the learned reverse semigroup. Thus the far-field reserve is converted into genuine early-window transport contraction, but in an adapted concave metric rather than directly in $W_2$. This contraction compresses the entire early window into a single damped switch-time quantity through the same one-step defect interface later used in the Euclidean phase. Since $\varphi_{s_0}(r) \le r$, one has
\[
W_{\varphi_{s_0}}(\mu, \nu) \le W_1(\mu, \nu) \le W_2(\mu, \nu),
\]
so each one-step $W_2$ defect also controls the corresponding $W_{\varphi_{s_0}}$ defect.

The one-time return to $W_2$ and what it costs. The switch should not be read as the time at which the geometry suddenly becomes Euclidean. Its role is different: it is the point at which an already damped discrepancy is returned once to the Euclidean metric in which the final error is measured. This return is possible because the switch metric has a global linear lower bound,
\[
\varphi_{s_0}(r) \ge a(s_0)\, r, \qquad r \ge 0,
\]
and hence $W_1(\mu, \nu) \le a(s_0)^{-1}\, W_{\varphi_{s_0}}(\mu, \nu)$.
Combined with a $p$-moment budget at switch time, this yields the interpolation
\[
W_2^2\big(\mathrm{Law}(Z_{t_s}),\, p_{s_0}\big) \lesssim \frac{\rho}{a(s_0)}\, \Delta^{\mathrm{sw}}_{\varphi}(s_0) + \frac{M^{\mathrm{sw}}_p(s_0)}{\rho^{\,p-2}},
\]
and optimizing in $\rho$ gives
\[
W_2\big(\mathrm{Law}(Z_{t_s}),\, p_{s_0}\big) \lesssim C^{\mathrm{sw}}_p(s_0)\, \big(\Delta^{\mathrm{sw}}_{\varphi}(s_0)\big)^{\theta_p}, \qquad \theta_p = \frac{p-2}{2(p-1)}.
\]
This is the only place where the exponent $\theta_p$ enters. It measures the price of recovering quadratic Euclidean transport from a switch metric that is deliberately softer than $W_2$.

The conversion is also where the central tradeoff becomes explicit. A more concave early metric is better at absorbing the near-field obstruction under the reflected radial dynamics, but within the affine-tail concave class this comes with weaker quadratic sensitivity at large distances and hence a more expensive return to $W_2$. The sharpness result shows that this loss is structural rather than a proof artifact: the same non-Euclidean softness that makes early damping possible also prevents a free recovery of Euclidean quadratic transport.

Late-window Euclidean propagation and optimal switching. After the switch, the remaining interval is propagated in the target metric $W_2$, for which synchronous coupling is again the natural tool. This yields
\[
W_2\big(\mathrm{Law}(Z_{t_N}),\, p_0\big) \lesssim \Delta^{\mathrm{late}}(s_0) + e^{\Gamma(s_0)}\, C^{\mathrm{sw}}_p(s_0)\, \big(\Delta^{\mathrm{sw}}_{\varphi}(s_0)\big)^{\theta_p}.
\]
Admissibility and optimization therefore play different roles. Admissibility is a geometric existence statement: it certifies that a genuinely non-Euclidean contractive regime is available on the early window. Optimal switching is a routing statement: it decides when the already compressed discrepancy should be re-exposed to $W_2$, balancing the remaining early damping, the one-time conversion cost, and the residual late Euclidean load.

Organization. Section 2 positions the paper relative to existing diffusion-model convergence theory.
Section 3 introduces the reverse-diffusion setting, the structural assumptions, and the radial geometry of the learned drift. Section 4 states the main theorem and explains its meaning. Section 5 returns to the geometric message and the open directions.

2 Related Work

Our work lies at the intersection of reverse-time diffusion representations [Anderson, 1982, Haussmann and Pardoux, 1986], Wasserstein convergence theory for score-based generative models, analyses of reverse-time geometry beyond global log-concavity, and reflection-coupling contraction methods for nonconvex diffusions.

$W_2$ theory under regular and changing geometry. A broad theoretical literature studies score-based diffusion sampling under minimal assumptions, polynomial-complexity regimes, manifold-supported data, and KL/path-space metrics [Azangulov et al., 2024, Conforti et al., 2025, Chen et al., 2022, 2023a, Lee et al., 2023]. Within Wasserstein control, strongly log-concave and globally regular settings admit direct Euclidean propagation arguments for reverse SDEs and PF-ODEs [Bruno et al., 2025, Gao et al., 2025, Gao and Zhu, 2024]. More recent $W_2$ analyses move beyond global log-concavity, including Gaussian-tail, semiconvex, weak-semiconvex, and weakly log-concave regimes [Wang and Wang, 2024, Conforti, 2024, Bruno and Sabanis, 2025, Gentiloni-Silveri and Ocello, 2025, Kremling et al., 2025]. The closest comparisons from our viewpoint emphasize different manifestations of time-varying geometry. Gentiloni-Silveri and Ocello [2025] show that weak log-concavity can produce alternating contractive and non-contractive phases along reverse time; Bruno and Sabanis [2025] study semiconvex settings with time-dependent Euclidean $W_2$ control; and Kremling et al. [2025] prove weak-log-concave guarantees for PF-ODEs under Euclidean/Lipschitz control. Our result is complementary to these works.
The additional point we emphasize is not only when contractivity appears, but also in which metric it first becomes exploitable: a reverse trajectory may already contract in a softer radial transport geometry even while direct Euclidean closure is still blocked by the near field. This metric-mismatch principle is what leads to our one-switch route and is also what allows the routed bound to be sharper than direct full-horizon Euclidean propagation on mismatch windows.

Reflection coupling and concave transport metrics. Methodologically, we build on the reflection-coupling program for diffusions with nonconvex geometry, from the classical multidimensional constructions of Lindvall and Rogers [1986], Chen and Li [1989] and the coupling overview of Kallenberg [1993] to the concave-distance contraction framework of Eberle [2016], Eberle et al. [2017, 2019]. Because our pre-switch analysis compares learned and exact reverse dynamics with different drifts, the sticky-coupling analysis of Eberle and Zimmer [2016] is also methodologically relevant. We use the Gaussian-smoothed radial profile of the learned reverse drift to determine an early-window concave transport metric, propagate initialization mismatch, score forcing, and discretization errors on that window, and then perform a single return to $W_2$. Our distinctive contribution is therefore the reverse-diffusion interface and one-switch routing theorem, not reflection coupling in isolation.

Sampler interfaces. A separate line analyzes deterministic, stochastic, and higher-order diffusion samplers [Chen et al., 2023b, Li et al., 2024, Beyler and Bach, 2025, Yu and Yu, 2025, Pfarr et al., 2026], together with sharper complexity and stability guarantees [Benton et al., 2023, Chen et al., 2026, Strasman et al., 2026]. Our theorem is modular with respect to those discretization results.
Any one-step or accumulated $W_2$ defect estimate enters only through the interface term $\xi_k$, after which the routing theorem determines how much of that sampler error is damped on the early radial window and how much is exposed to the late Euclidean window. Thus our contribution is not a new sampler analysis per se, but a propagation theorem that can incorporate sharper sampler bounds without changing the geometric argument.

3 Reverse-diffusion interface and time-dependent radial geometry

Under weak log-concavity, Gaussian smoothing can stabilize large separations before short scales satisfy a Euclidean one-sided dissipativity bound. We encode this scale dependence through a radial profile of the learned reverse drift, whose two endpoints will later determine the residual Euclidean load and the far-field contraction reserve.

Notation. We use the Euclidean inner product and norm throughout, write $\mathcal{P}_p(\mathbb{R}^d)$ for laws with finite $p$-th moment, and denote by $\Pi(\mu, \nu)$ the set of couplings of probability measures $\mu, \nu$ on $\mathbb{R}^d$. We write $W_2$ for the 2-Wasserstein distance and parametrize reverse time by $t := T - s$. Thus larger $s$ corresponds to earlier reverse time, whereas smaller $s$ corresponds to later reverse time.

3.1 Forward smoothing and reverse interface

Let $p_0(dx) \propto e^{-V_0(x)}\,dx$, where $V_0 \in C^1(\mathbb{R}^d)$ and $\int_{\mathbb{R}^d} e^{-V_0(x)}\,dx < \infty$. Given Borel measurable $f, g : [0, T] \to \mathbb{R}_{\ge 0}$ with $\int_0^T f(u)\,du < \infty$, $\int_0^T g^2(u)\,du < \infty$, and $g(s) > 0$ for every $s > 0$, consider the forward diffusion
\[
dX_s = -f(s)\, X_s\, ds + g(s)\, dB_s, \qquad X_0 \sim p_0. \tag{8}
\]
Writing
\[
a(s) := \exp\Big(-\int_0^s f(u)\,du\Big), \qquad \sigma^2(s) := \int_0^s \Big(\frac{a(s)}{a(u)}\, g(u)\Big)^2 du, \tag{9}
\]
its marginal law is
\[
p_s = \big(S_{a(s)}\, p_0\big) * \mathcal{N}\big(0, \sigma^2(s) I_d\big), \qquad s > 0, \tag{10}
\]
where $S_a \mu$ is the pushforward of $\mu$ under $x \mapsto ax$. Thus forward time combines deterministic linear shrinkage with Gaussian convolution.
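
The shrinkage and variance coefficients in (9) can be evaluated numerically for any schedule. The following is a minimal sketch, not part of the paper: the helper names `integrate`, `shrinkage`, and `variance` are hypothetical, the quadrature is a plain composite trapezoid rule, and the constant variance-preserving schedule used for the check ($f \equiv \beta/2$, $g^2 \equiv \beta$) is an assumption chosen only because it has the closed forms $a(s) = e^{-\beta s/2}$ and $\sigma^2(s) = 1 - e^{-\beta s}$.

```python
import math


def integrate(fn, lo, hi, n=400):
    """Composite trapezoid rule for fn on [lo, hi] with n subintervals."""
    if hi == lo:
        return 0.0
    h = (hi - lo) / n
    total = 0.5 * (fn(lo) + fn(hi))
    for i in range(1, n):
        total += fn(lo + i * h)
    return total * h


def shrinkage(f, s):
    """a(s) = exp(-int_0^s f(u) du), the linear shrinkage factor in (9)."""
    return math.exp(-integrate(f, 0.0, s))


def variance(f, g, s):
    """sigma^2(s) = int_0^s (a(s) / a(u) * g(u))^2 du, the Gaussian variance in (9)."""
    a_s = shrinkage(f, s)
    return integrate(lambda u: (a_s / shrinkage(f, u) * g(u)) ** 2, 0.0, s)


# Illustrative schedule (an assumption, not taken from the paper):
# constant variance-preserving coefficients f = beta / 2 and g^2 = beta.
beta = 2.0
f_vp = lambda u: 0.5 * beta
g_vp = lambda u: math.sqrt(beta)
a_1 = shrinkage(f_vp, 1.0)          # compare with exp(-beta / 2)
sigma2_1 = variance(f_vp, g_vp, 1.0)  # compare with 1 - exp(-beta)
```

The nested quadrature costs on the order of $n^2$ evaluations of $f$ per call to `variance`, which is acceptable for a sanity check; a production implementation would tabulate $\int_0^u f$ once.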
The corresponding reverse drifts are
\[
b_t(x) = f(T-t)\, x + g^2(T-t)\, \nabla \log p_{T-t}(x), \qquad \hat b_t(x) = f(T-t)\, x + g^2(T-t)\, s_\theta(x, T-t), \tag{11}
\]
and the ideal and learned reverse SDEs are [Anderson, 1982, Haussmann and Pardoux, 1986]
\[
d\tilde X_t = b_t(\tilde X_t)\,dt + g(T-t)\,dB_t, \qquad \tilde X_0 \sim p_T, \tag{12}
\]
\[
d\hat Y_t = \hat b_t(\hat Y_t)\,dt + g(T-t)\,dB_t, \qquad \hat Y_0 \sim \hat p_T. \tag{13}
\]
With $e_s(x) := s_\theta(x, s) - \nabla \log p_s(x)$, $s \in (0, T]$, the drift mismatch is exactly
\[
\hat b_t - b_t = g^2(T-t)\, e_{T-t}. \tag{14}
\]
Whenever the learned reverse SDE (13) is well posed on an interval $[u, v] \subseteq [0, T]$, we write $P_{u,v}$ for its transition kernel. Background facts on time reversal, regularity, and well posedness are collected in Appendix A.

Definition 1 (Strong well posedness on an interval). We say that the learned reverse SDE is strongly well posed on an interval $[u, v] \subseteq [0, T]$ if for every $\mathcal{F}_u$-measurable initial condition $\xi$ with $\mathrm{Law}(\xi) \in \mathcal{P}_1(\mathbb{R}^d)$, there exists a pathwise unique strong solution on $[u, v]$.

3.2 Structural assumptions

Assumption 1 (Weak log-concavity [Gentiloni-Silveri and Ocello, 2025]). Let $\alpha, M > 0$, and define
\[
f_M(r) := 2\sqrt{M}\, \tanh\Big(\tfrac{1}{2}\sqrt{M}\, r\Big).
\]
Assume that for every $r > 0$,
\[
\inf_{\|x-y\|=r} \frac{\langle \nabla V_0(x) - \nabla V_0(y),\, x-y\rangle}{\|x-y\|^2} \ge \alpha - \frac{1}{r} f_M(r).
\]

Assumption 2 (Score forcing in $L^2(p_s)$). Assume there exists a measurable function $\varepsilon : (0, T] \to [0, \infty)$ such that
\[
\mathbb{E}_{X \sim p_s} \|e_s(X)\|^2 \le \varepsilon^2(s), \qquad s \in (0, T],
\]
and $s \mapsto g^2(s)\, \varepsilon(s)$ is integrable on $(0, T]$.

Assumption 3 (One-sided monotonicity control of the score error). Assume Assumption 2 holds, and that there exists a measurable function $\ell : (0, T] \to \mathbb{R}$ such that
\[
\langle e_s(x) - e_s(y),\, x-y\rangle \le \ell(s)\, \|x-y\|^2, \qquad x, y \in \mathbb{R}^d.
\]

Remark 1 (On the score-error assumptions).
Assumption 2 controls averaged score forcing, whereas Assumption 3 controls the pairwise geometric component of the score error that enters distance evolution through $\langle e_s(x) - e_s(y),\, x-y\rangle$. Thus Assumption 3 is a one-sided monotonicity condition rather than a full global regularity requirement. Global Lipschitz or symmetrized Jacobian bounds are stronger sufficient conditions implying it, but are not intrinsic to the theorem. The two assumptions are complementary rather than interchangeable. Concrete sufficient conditions, together with a comparison against global Lipschitz-type hypotheses, are given in Appendix G.

3.3 A radial profile dictionary and admissible switches

What drives the proof is not a single Euclidean one-sided Lipschitz constant, but a family of radial lower envelopes $r \mapsto \kappa_s(r)$, one for each noise level $s$. The key point is that the same formula encodes both the near-field Euclidean obstruction and the far-field contraction reserve created by smoothing.

Definition 2 (Drift profile). For a Borel vector field $u : \mathbb{R}^d \to \mathbb{R}^d$, define its radial drift profile by
\[
\kappa_u(r) := \inf_{\substack{x \ne y \\ \|x-y\|=r}} \frac{-\langle u(x) - u(y),\, x-y\rangle}{\|x-y\|^2}, \qquad r > 0. \tag{15}
\]
Positive $\kappa_u(r)$ means contraction at separation $r$, while negative $\kappa_u(r)$ means expansion.

Gaussian smoothing transfers the weak-log-concavity parameters $(\alpha, M)$ of $p_0$ into the time-dependent quantities
\[
\alpha_s := \frac{\alpha}{a^2(s) + \alpha\, \sigma^2(s)}, \qquad M_s := \frac{M\, a^2(s)}{\big(a^2(s) + \alpha\, \sigma^2(s)\big)^2}, \qquad s \in [0, T]. \tag{16}
\]
Here $\alpha_s$ quantifies the effective large-scale convexity of $p_s$, while $M_s$ measures the residual short-scale nonconvexity after smoothing.

Definition 3. For $s \in (0, T]$, define
\[
m(s) := g^2(s)\big(\alpha_s - \ell(s)\big) - f(s), \qquad b(s) := f(s) + g^2(s)\big(M_s + \ell(s) - \alpha_s\big), \tag{17}
\]
and the radial lower envelope
\[
\kappa_s(r) := g^2(s)\Big(\alpha_s - \frac{1}{r} f_{M_s}(r) - \ell(s)\Big) - f(s), \qquad r > 0. \tag{18}
\]

For each fixed $s$, the same lower envelope can be rewritten as
\[
\kappa_s(r) = m(s) - g^2(s)\, \frac{1}{r} f_{M_s}(r) = -b(s) + g^2(s)\Big(M_s - \frac{1}{r} f_{M_s}(r)\Big), \qquad r > 0. \tag{19}
\]
This identity makes the geometry explicit: $m(s)$ is the far-field reserve, while the radius-dependent penalty $g^2(s)\, f_{M_s}(r)/r$ is largest near the origin and vanishes at infinity. Hence the reserve is converted into the Euclidean load $-b(s)$ at short scales and reappears at large scales.

Lemma 1. Under Assumptions 1 and 3, for $t \in [0, T]$ and $s := T - t$, the following hold:

(i) Profile transfer. For every $r > 0$,
\[
\kappa_{\hat b_t}(r) \ge \kappa_s(r). \tag{20}
\]

(ii) Endpoint structure. For each fixed $s \in (0, T]$, the map $r \mapsto \kappa_s(r)$ is nondecreasing on $(0, \infty)$, with
\[
\lim_{r \downarrow 0} \kappa_s(r) = -b(s), \qquad \lim_{r \to \infty} \kappa_s(r) = m(s). \tag{21}
\]
If $m(s) > 0$, then
\[
R(s) := \inf\{r > 0 : \kappa_s(r) \ge 0\} < \infty, \tag{22}
\]
and $\kappa_s(r) \ge 0$ for all $r \ge R(s)$.

(iii) Companion Euclidean load. For every $x, y \in \mathbb{R}^d$,
\[
\langle x-y,\, \hat b_t(x) - \hat b_t(y)\rangle \le b(s)\, \|x-y\|^2. \tag{23}
\]

In Figure 1, the left panel is pointwise in $s$, whereas the right panel illustrates the uniform-in-time far-field positivity that underlies admissible switches.

Definition 4 (Admissible switches). For $s \in [0, T]$ and $s_0 \in (0, T]$, define
\[
\Gamma(s) := \int_0^s b(u)\,du, \qquad \underline{m}(s_0) := \inf_{u \in [s_0, T]} m(u),
\]
whenever the integral defining $\Gamma(s)$ is finite. We call $s_0$ an admissible switch if $\underline{m}(s_0) > 0$, and write
\[
S_{\mathrm{adm}} := \{s_0 \in (0, T] : \underline{m}(s_0) > 0\}.
\]

[Figure 1, panel (a): $\kappa_s(r)$ against $r$, rising from $-b(s)$ to $m(s)$ and crossing zero at $R(s)$, with $\kappa_s < 0$ to the left and $\kappa_s \ge 0$ to the right. Panel (b): the $s$-axis from $0$ to $T$ split at $s_0$ into the late Euclidean window, labelled $\Gamma(s_0)$, and the early radial window, labelled $\underline{m}(s_0) > 0$.]

Figure 1: (a) Pointwise radial lower profile at a fixed noise level $s$. (b) Window-level quantities used by the theorem: $\underline{m}(s_0) = \inf_{u \in [s_0, T]} m(u)$ and $\Gamma(s_0) = \int_0^{s_0} b(u)\,du$. Thus $s_0$ is admissible exactly when $\underline{m}(s_0) > 0$. The right panel is schematic and does not represent the evolution of $R(s)$.
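
The profile dictionary of Definition 3 and Lemma 1 is explicit enough to evaluate directly. The sketch below, which is an illustration and not the authors' code, computes $\kappa_s(r)$, the endpoints $m(s)$ and $b(s)$, and the threshold $R(s)$ by bisection, relying on the monotonicity of $r \mapsto \kappa_s(r)$ stated in Lemma 1(ii). All function names are hypothetical, the inputs `f`, `g2`, `alpha_s`, `M_s`, `ell` stand for the pointwise values $f(s)$, $g^2(s)$, $\alpha_s$, $M_s$, $\ell(s)$, and the numbers in `example` are arbitrary assumptions.

```python
import math


def f_M(r, M):
    """Tanh profile f_M(r) = 2 sqrt(M) tanh(sqrt(M) r / 2) from Assumption 1."""
    root = math.sqrt(M)
    return 2.0 * root * math.tanh(0.5 * root * r)


def smoothed_params(alpha, M, a, sigma2):
    """Equation (16): (alpha_s, M_s) from (alpha, M), a(s) and sigma^2(s)."""
    denom = a * a + alpha * sigma2
    return alpha / denom, M * a * a / (denom * denom)


def kappa(r, f, g2, alpha_s, M_s, ell):
    """Equation (18): radial lower envelope at separation r > 0."""
    return g2 * (alpha_s - f_M(r, M_s) / r - ell) - f


def reserve_and_load(f, g2, alpha_s, M_s, ell):
    """Equation (17): far-field reserve m(s) and near-field load b(s)."""
    m = g2 * (alpha_s - ell) - f
    b = f + g2 * (M_s + ell - alpha_s)
    return m, b


def threshold_radius(f, g2, alpha_s, M_s, ell, tol=1e-10):
    """Equation (22): R(s) = inf{r > 0 : kappa_s(r) >= 0}, found by bisection."""
    m, b = reserve_and_load(f, g2, alpha_s, M_s, ell)
    if m <= 0.0:
        raise ValueError("no positive far-field reserve: m(s) <= 0")
    if b <= 0.0:
        return 0.0  # the profile is already nonnegative at every radius
    lo, hi = 0.0, 1.0
    while kappa(hi, f, g2, alpha_s, M_s, ell) < 0.0:
        lo, hi = hi, 2.0 * hi  # expand until the profile is nonnegative
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if kappa(mid, f, g2, alpha_s, M_s, ell) >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi


# Arbitrary illustrative values (assumptions, not from the paper).
example = dict(f=0.5, g2=1.0, alpha_s=1.0, M_s=2.0, ell=0.1)
m_ex, b_ex = reserve_and_load(**example)
R_ex = threshold_radius(**example)
```

With these values the profile is negative near the origin and positive in the far field, which is the mismatch situation the routing argument targets: $b(s) > 0$ blocks direct Euclidean closure while $m(s) > 0$ certifies a contraction reserve beyond $R(s)$.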
The infimum $\underline{m}(s_0)$ enforces a positive far-field margin throughout the entire early window $[s_0, T]$. Thus admissibility does not mean that direct Euclidean closure is already available; it certifies only that a contractive radial regime is present before the return to $W_2$. For each admissible switch $s_0$, the appendices construct the associated switch metric $W_{\varphi_{s_0}}$, its damping rate $c(s_0)$, and its affine-tail slope $a(s_0)$.

4 A routing theorem from radial metric mismatch

Section 3 produced the theorem input from a single object: the radial lower profile $\kappa_s$. Its far-field endpoint yields the admissible switch set $S_{\mathrm{adm}}$, and its near-field endpoint yields the Euclidean load $\Gamma$. The theorem below turns this one-profile, two-endpoint structure into a one-switch routing bound. For an admissible switch $s_0$, write
\[
\underline{g}(s_0) := \inf_{u \in [s_0, T]} g(u), \qquad \overline{b}(s_0) := \sup_{u \in [s_0, T]} b(u), \qquad \overline{G}(s_0) := \sup_{u \in [s_0, T]} g^2(u) \sqrt{M_u}.
\]
For any increasing concave function $\varphi : [0, \infty) \to [0, \infty)$ with $\varphi(0) = 0$, define the associated transport cost
\[
W_\varphi(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb{R}^d \times \mathbb{R}^d} \varphi(\|x-y\|)\, \pi(dx, dy).
\]

4.1 Main theorem

Throughout this subsection, work under Assumptions 1, 2, and 3. Let $0 = t_0 < \cdots < t_N = T$ be a numerical grid, and let $(Z_{t_k})_{k=0}^N$ be a Markovian discretization of the learned reverse SDE initialized by $Z_{t_0} \sim \hat p_T$, with numerical one-step kernels $\Psi_k$, i.e.
\[
\mathrm{Law}(Z_{t_{k+1}} \mid Z_{t_k} = x) = \Psi_k(x, \cdot), \qquad x \in \mathbb{R}^d, \quad k = 0, \ldots, N-1.
\]
Write $P_k := P_{t_k, t_{k+1}}$ for the exact learned reverse one-step kernels. Assume that the ideal reverse SDE (12) admits a global strong solution $(\tilde X_t)_{0 \le t \le T}$ on $[0, T]$ such that
\[
\mathrm{Law}(\tilde X_t) = p_{T-t}, \qquad t \in [0, T].
\]
Assume moreover that the one-step defects satisfy
\[
W_2\big(P_k(x, \cdot),\, \Psi_k(x, \cdot)\big) \le \xi_k(x), \qquad x \in \mathbb{R}^d, \quad k = 0, \ldots, N-1, \tag{24}
\]
for Borel measurable functions $\xi_k : \mathbb{R}^d \to [0, \infty)$.

Theorem 1.
Fix $p > 2$ and assume $\hat p_T \in \mathcal{P}_p(\mathbb{R}^d)$. Let $s_0 \in S_{\mathrm{adm}}$ be a grid-aligned admissible switch, set $t_s := T - s_0 = t_K$, and let $\varphi_{s_0}$, $c(s_0)$, and $a(s_0)$ denote the associated switch metric, early damping rate, and affine-tail slope. Assume that the learned reverse SDE is strongly well posed on $[0, T]$. Assume moreover that, for every $[u, v] \subseteq [0, t_s]$ and every initial coupling $\pi \in \Pi(\mu, \nu)$ with $\mu, \nu \in \mathcal{P}_1(\mathbb{R}^d)$, there exists a coalescing reflection coupling on $[u, v]$ with initial law $\pi$. Suppose further that
\[
\underline{g}(s_0) > 0, \qquad \overline{b}(s_0) < \infty, \qquad \overline{G}(s_0) < \infty, \qquad \Gamma(s_0) < \infty.
\]
Let $M^{\mathrm{sw}}_p(s_0) < \infty$ satisfy
\[
M^{\mathrm{sw}}_p(s_0) \ge \mathbb{E}\|Z_{t_s}\|^p + \mathbb{E}_{X \sim p_{s_0}} \|X\|^p. \tag{25}
\]
Then
\[
W_2\big(\mathrm{Law}(Z_{t_N}),\, p_0\big) \le \Delta^{\mathrm{late}}(s_0) + e^{\Gamma(s_0)}\, C^{\mathrm{sw}}_p(s_0)\, \big(\Delta^{\mathrm{sw}}_{\varphi}(s_0)\big)^{\theta_p}, \tag{26}
\]
where
\[
\theta_p := \frac{p-2}{2(p-1)}, \qquad C^{\mathrm{sw}}_p(s_0) := \sqrt{2(p-1)}\, (p-2)^{-\theta_p}\, a(s_0)^{-\theta_p}\, \big(M^{\mathrm{sw}}_p(s_0)\big)^{\frac{1}{2(p-1)}},
\]
and
\[
\Delta^{\mathrm{sw}}_{\varphi}(s_0) := e^{-c(s_0)\, t_s}\, W_{\varphi_{s_0}}(\hat p_T, p_T) + \sum_{k=0}^{K-1} e^{-c(s_0)(t_s - t_{k+1})} \big(\mathbb{E}[\xi_k(Z_{t_k})^2]\big)^{1/2} + \int_{s_0}^{T} e^{-c(s_0)(s - s_0)}\, g^2(s)\, \varepsilon(s)\, ds, \tag{27}
\]
\[
\Delta^{\mathrm{late}}(s_0) := \sum_{k=K}^{N-1} e^{\Gamma(T - t_{k+1})} \big(\mathbb{E}[\xi_k(Z_{t_k})^2]\big)^{1/2} + \int_0^{s_0} e^{\Gamma(s)}\, g^2(s)\, \varepsilon(s)\, ds.
\]

Theorem 1 isolates a single geometric transition at $s_0$: before the interface, error is transported in the first geometry that actually contracts; after the interface, it is propagated in the Euclidean geometry in which the terminal discrepancy is measured. Thus $\Delta^{\mathrm{sw}}_{\varphi}(s_0)$ is a damped pre-switch budget, and $\theta_p = (p-2)/(2(p-1))$ is the one-time price of returning from the softer switch metric to $W_2$. Equivalently, the theorem induces a scalar switch-selection objective on the admissible grid: minimize the right-hand side of (26) over admissible grid-aligned $s_0$; see Corollary 2.
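
The conversion exponent and constant in Theorem 1 can be checked against the truncation argument they come from. The sketch below is illustrative only. It assumes the pre-optimization inequality takes the form $W_2^2 \le \rho\, \Delta/a + 2^{p-1} M \rho^{2-p}$, where the factor $2^{p-1}$ comes from $\|x-y\|^p \le 2^{p-1}(\|x\|^p + \|y\|^p)$; that explicit factor is our assumption, since the technique overview states the interpolation only up to constants. Under this assumption, minimizing over $\rho$ reproduces $C^{\mathrm{sw}}_p(s_0)^2\, \Delta^{2\theta_p}$ exactly. The function names are hypothetical, and `a`, `M`, `delta_sw`, `delta_late`, `Gamma` stand for $a(s_0)$, $M^{\mathrm{sw}}_p(s_0)$, $\Delta^{\mathrm{sw}}_{\varphi}(s_0)$, $\Delta^{\mathrm{late}}(s_0)$, $\Gamma(s_0)$.

```python
import math


def theta(p):
    """Conversion exponent theta_p = (p - 2) / (2 (p - 1)), defined for p > 2."""
    if p <= 2:
        raise ValueError("p must exceed 2")
    return (p - 2.0) / (2.0 * (p - 1.0))


def conversion_constant(p, a, M):
    """C_p^sw = sqrt(2 (p - 1)) (p - 2)^(-theta) a^(-theta) M^(1 / (2 (p - 1)))."""
    th = theta(p)
    return (math.sqrt(2.0 * (p - 1.0)) * (p - 2.0) ** (-th)
            * a ** (-th) * M ** (1.0 / (2.0 * (p - 1.0))))


def routed_bound(p, a, M, delta_sw, delta_late, Gamma):
    """Right-hand side of (26) for given window-level inputs."""
    conv = conversion_constant(p, a, M) * delta_sw ** theta(p)
    return delta_late + math.exp(Gamma) * conv


def truncated_objective(rho, p, a, M, delta_sw):
    """Assumed bound on W_2^2 at truncation radius rho (see the lead-in)."""
    return rho * delta_sw / a + 2.0 ** (p - 1.0) * M * rho ** (2.0 - p)


def optimal_rho(p, a, M, delta_sw):
    """Stationary point of truncated_objective in rho."""
    return ((p - 2.0) * 2.0 ** (p - 1.0) * M * a / delta_sw) ** (1.0 / (p - 1.0))


# Arbitrary illustrative inputs (assumptions, not from the paper).
p, a, M, delta_sw = 4.0, 0.5, 3.0, 0.1
rho_star = optimal_rho(p, a, M, delta_sw)
best = truncated_objective(rho_star, p, a, M, delta_sw)
closed_form = (conversion_constant(p, a, M) * delta_sw ** theta(p)) ** 2
```

Evaluating `routed_bound` over a list of admissible grid-aligned switches and taking the minimizer is one way to realize the scalar switch-selection objective, provided the window-level inputs for each candidate $s_0$ are available.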
Appendix D.2 shows in Proposition 3 that this conversion price cannot be improved uniformly within the affine-tail concave class under the same switch-moment assumption.

The advantage is entirely an early-window phenomenon. Under weak log-concavity, the radial profile $\kappa_s$ can become contractive in the far field before direct Euclidean closure is available. Reflection coupling sees this profile through the radial separation process, and the adapted metric $W_{\varphi_{s_0}}$ converts that far-field reserve into genuine transport contraction while absorbing the near-field obstruction. For fixed $s_0$, the routed and direct bounds share the same late-window term, so the comparison reduces exactly to the early window. The direct route weights an error injected at noise level $u \in [s_0, T]$ by the residual Euclidean factor $e^{\int_{s_0}^{u} b(v)\,dv}$, whereas the routed route first damps the same input by $e^{-c(s_0)(u - s_0)}$ and returns to $W_2$ only once. Hence the routed bound can be strictly sharper on mismatch windows. Appendix F.2 makes the comparison exact, and Appendix G.4 makes it concrete in a closed-form variance-preserving example.

5 Conclusion

The main message of this paper is that, under weak log-concavity, end-to-end error in reverse diffusion should not be propagated in a geometry chosen a priori. The central issue is not whether $W_2$ is the final error metric, but whether it is the right geometry in which to transport error at every stage of the reverse trajectory. The resulting gain is therefore not only geometric but quantitative: direct full-horizon Euclidean propagation can be genuinely conservative on mismatch windows. More broadly, our result suggests that reverse-diffusion theory under weak regularity should be organized around geometry-adaptive error propagation rather than uniform metric propagation.
The present paper isolates this principle in the simplest setting where it can be proved cleanly: a single radial interface and theorem-level bounds. A natural next step is to understand whether the same principle persists beyond one switch, beyond radial geometries, under weaker verifiable conditions, and in complexity guarantees tailored to concrete samplers.

References

Brian DO Anderson. Reverse-time diffusion equation models. Stochastic Processes and their Applications, 12(3):313–326, 1982.

Iskander Azangulov, George Deligiannidis, and Judith Rousseau. Convergence of diffusion models under the manifold hypothesis in high-dimensions. arXiv preprint arXiv:2409.18804, 2024.

Joe Benton, Valentin De Bortoli, Arnaud Doucet, and George Deligiannidis. Nearly d-linear convergence bounds for diffusion models via stochastic localization. arXiv preprint arXiv:2308.03686, 2023.

Eliot Beyler and Francis Bach. Convergence of deterministic and stochastic diffusion-model samplers: A simple analysis in Wasserstein distance. arXiv preprint arXiv:2508.03210, 2025.

Stefano Bruno and Sotirios Sabanis. Wasserstein convergence of score-based generative models under semiconvexity and discontinuous gradients. arXiv preprint arXiv:2505.03432, 2025.

Stefano Bruno, Ying Zhang, Dong-Young Lim, Omer Deniz Akyildiz, and Sotirios Sabanis. On diffusion-based generative models and their error bounds: The log-concave case with full convergence estimates. Transactions on Machine Learning Research, 2025. URL https://openreview.net/forum?id=zjxKrb4ehr.

Fan Chen, Sinho Chewi, Constantinos Daskalakis, and Alexander Rakhlin. High-accuracy sampling for diffusion models and log-concave distributions. arXiv preprint arXiv:2602.01338, 2026.

Hongrui Chen, Holden Lee, and Jianfeng Lu. Improved analysis of score-based generative modeling: User-friendly bounds under minimal smoothness assumptions. In International Conference on Machine Learning, pages 4735–4763. PMLR, 2023a.

Mu-Fa Chen and Shao-Fu Li. Coupling methods for multidimensional diffusion processes. The Annals of Probability, pages 151–177, 1989.

Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru R Zhang. Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions. arXiv preprint arXiv:2209.11215, 2022.

Sitan Chen, Sinho Chewi, Holden Lee, Yuanzhi Li, Jianfeng Lu, and Adil Salim. The probability flow ODE is provably fast. Advances in Neural Information Processing Systems, 36:68552–68575, 2023b.

Giovanni Conforti. Weak semiconvexity estimates for Schrödinger potentials and logarithmic Sobolev inequality for Schrödinger bridges. Probability Theory and Related Fields, 189(3):1045–1071, 2024.

Giovanni Conforti, Alain Durmus, and Marta Gentiloni Silveri. KL convergence guarantees for score diffusion models under minimal data assumptions. SIAM Journal on Mathematics of Data Science, 7(1):86–109, 2025.

Andreas Eberle. Reflection couplings and contraction rates for diffusions. Probability Theory and Related Fields, 166(3):851–886, 2016.

Andreas Eberle and Raphael Zimmer. Sticky couplings of multidimensional diffusions with different drifts. 2016.

Andreas Eberle, Arnaud Guillin, and Raphael Zimmer. Quantitative Harris type theorems for diffusions and McKean-Vlasov processes. 2017.

Andreas Eberle, Arnaud Guillin, and Raphael Zimmer. Couplings and quantitative contraction rates for Langevin dynamics. 2019.

Xuefeng Gao and Lingjiong Zhu. Convergence analysis for general probability flow ODEs of diffusion models in Wasserstein distances. arXiv preprint arXiv:2401.17958, 2024.

Xuefeng Gao, Hoang M Nguyen, and Lingjiong Zhu. Wasserstein convergence guarantees for a general class of score-based generative models. Journal of Machine Learning Research, 26(43):1–54, 2025.

Marta Gentiloni-Silveri and Antonio Ocello. Beyond log-concavity and score regularity: Improved convergence bounds for score-based generative models in W2-distance. arXiv preprint arXiv:2501.02298, 2025.

Ulrich G Haussmann and Etienne Pardoux. Time reversal of diffusions. The Annals of Probability, pages 1188–1205, 1986.

Desmond J Higham, Xuerong Mao, and Andrew M Stuart. Strong convergence of Euler-type methods for nonlinear stochastic differential equations. SIAM Journal on Numerical Analysis, 40(3):1041–1063, 2002.

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. Advances in Neural Information Processing Systems, 33:6840–6851, 2020.

Olav Kallenberg. Lectures on the Coupling Method (Torgny Lindvall), volume 35. 1993. doi: 10.1137/1035121. URL https://doi.org/10.1137/1035121.

Gitte Kremling, Francesco Iafrate, Mahsa Taheri, and Johannes Lederer. Non-asymptotic error bounds for probability flow ODEs under weak log-concavity. arXiv preprint arXiv:2510.17608, 2025.

Holden Lee, Jianfeng Lu, and Yixin Tan. Convergence of score-based generative modeling for general data distributions. In International Conference on Algorithmic Learning Theory, pages 946–985. PMLR, 2023.

Runjia Li, Qiwei Di, and Quanquan Gu. Unified convergence analysis for score-based diffusion models with deterministic samplers. arXiv preprint arXiv:2410.14237, 2024.

Torgny Lindvall and L Cris G Rogers. Coupling of multidimensional diffusions by reflection. The Annals of Probability, pages 860–872, 1986.

Emanuel Pfarr, Radu Timofte, and Frank Werner. Analyzing the error of generative diffusion models: From Euler-Maruyama to higher-order schemes. arXiv preprint arXiv:2601.18425, 2026.

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics. In International Conference on Machine Learning, pages 2256–2265. PMLR, 2015.

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. volume 32, 2019.

Yang Song, Jascha Sohl-Dickstein, Diederik P Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations. arXiv preprint arXiv:2011.13456, 2020.

Stanislas Strasman, Gabriel Cardoso, Sylvain Le Corff, Vincent Lemaire, and Antonio Ocello. On forgetting and stability of score-based generative models. arXiv preprint arXiv:2601.21868, 2026.

Xixian Wang and Zhongjian Wang. Wasserstein bounds for generative diffusion models with Gaussian tail targets. arXiv preprint arXiv:2412.11251, 2024.

Yifeng Yu and Lu Yu. Advancing Wasserstein convergence analysis of score-based models: Insights from discretization and second-order acceleration. arXiv preprint arXiv:2502.04849, 2025.

Xicheng Zhang. Stochastic Monge–Kantorovich problem and its duality. Stochastics An International Journal of Probability and Stochastic Processes, 85(1):71–84, 2013.

Appendix roadmap. The appendix follows the proof spine of the paper. Section A extracts the forward-time smoothing input and derives the quantities $\alpha_s$, $M_s$, $m(s)$, and $b(s)$. Section B transfers this profile to the learned reverse drift and characterizes admissible switches. Section C proves early-window contraction in the switch metric and packages the pre-switch initialization, discretization, and score-forcing terms. Section D proves the one-time return from $W_{\varphi_{s_0}}$ to $W_2$, including the sharpness of $\theta_p$. Section E propagates the converted discrepancy through the late Euclidean window, and Section F combines the pieces into the main theorem and the routed-versus-direct comparison. Section G records auxiliary comparison principles, conservative admissibility certificates, sufficient conditions for the early coupling input, and the explicit variance-preserving comparison result.
A Reverse-diffusion background and Gaussian smoothing

This section isolates the only forward-time regularization input used later. All subsequent quantities $\alpha_s$, $M_s$, $m(s)$, and $b(s)$ ultimately come from the representation
\[
p_s = (S_{a(s)} p_0) * \gamma_{\sigma^2(s)}.
\]
For the theorem, it is not enough to know qualitatively that Gaussian smoothing improves geometry. We need the exact way in which smoothing acts on the weak profile
\[
\kappa_{V_0}(r) \ge \alpha - \frac{1}{r} f_M(r).
\]
There are three operations behind the formulas proved below: deterministic scaling by $a(s)$, which rescales both curvature and defect by the factor $a(s)^{-2}$; Gaussian convolution, which preserves the weak semiconvex defect once the available quadratic part has been peeled off; and a quadratic tilt, which returns part of that peeled-off curvature in explicit form. The outcome is the pair $(\alpha_s, M_s)$: $\alpha_s$ records the large-scale convexity that survives smoothing, while $M_s$ records the residual short-scale defect that smoothing has not yet erased. Nothing in this section depends on the later metric switch. Its outputs are the explicit profile bound for $V_s := -\log p_s$ and the moment bounds used at switch time.

Throughout, we use the drift profile from Definition 2. When $h \in C^1(\mathbb R^d)$ is a potential, we write
\[
\kappa_h(r) := \kappa_{-\nabla h}(r) = \inf_{\substack{x \ne y \\ \|x - y\| = r}} \frac{\langle \nabla h(x) - \nabla h(y),\, x - y\rangle}{\|x - y\|^2}, \qquad r > 0.
\]
Thus $\kappa_h$ is simply the profile of the gradient drift $-\nabla h$.

A.1 Forward smoothing and basic regularity

Lemma 2. For every $s \in (0, T]$, one has $\sigma(s) > 0$. Consequently, $p_s = \mathrm{Law}(X_s)$ admits a strictly positive $C^\infty$ density, and $V_s := -\log p_s \in C^\infty(\mathbb R^d)$. In particular, $\kappa_{V_s}$ is well defined for every $s > 0$.

Proof. Since $g(u) > 0$ for $u > 0$, the representation
\[
\sigma^2(s) = \int_0^s \exp\Big(-2 \int_u^s f(v)\,\mathrm dv\Big)\, g^2(u)\,\mathrm du
\]
has strictly positive integrand, hence $\sigma(s) > 0$ for $s > 0$.
Moreover,
\[
X_s \overset{d}{=} a(s) X_0 + \sigma(s)\,\xi, \qquad \xi \sim \mathcal N(0, I_d),
\]
so $p_s = (S_{a(s)} p_0) * \mathcal N(0, \sigma^2(s) I_d)$. Thus $p_s$ is the convolution of a probability measure with a nondegenerate Gaussian density, hence it is strictly positive and $C^\infty$. Therefore $V_s = -\log p_s \in C^\infty(\mathbb R^d)$.

A.2 Profile calculus under OU smoothing

The next bounds package the smoothing mechanism above into the form used later. The scaling lemma isolates what the deterministic factor $a(s)$ does to the profile. The weak-semiconvexity invariance of Gaussian convolution from Conforti [2024, Theorem 2.1] is then applied not to $U$ itself but to $h = U - \frac{\alpha'}{2}\|\cdot\|^2$, because the defect class $\mathcal F_M$ is stable only after the explicit quadratic curvature has been separated off. The factorization lemma explains why convolving $e^{-U}$ with a Gaussian becomes a heat flow of $e^{-h}$ evaluated at a rescaled point, multiplied by a new quadratic tilt. That is the precise origin of the denominator $1 + \alpha' \sigma^2$ in the final profile formulas; see also Gentiloni-Silveri and Ocello [2025, Appendix B] for the same input in the diffusion-model setting.

Lemma 3. For all $M > 0$ and $r \ge 0$,
\[
0 \le f_M(r) \le \min\{M r,\; 2\sqrt M\}.
\]
Consequently, for every $r > 0$,
\[
\frac{1}{r} f_M(r) \le \min\Big\{M,\; \frac{2\sqrt M}{r}\Big\}.
\]

Proof. The claim is trivial if $M = 0$ or $r = 0$. Otherwise, with $y = \frac12 \sqrt M\, r$, one has $f_M(r) = 2\sqrt M \tanh(y)$, and $\tanh(y) \le \min\{y, 1\}$ gives
\[
f_M(r) \le \min\{2\sqrt M\, y,\; 2\sqrt M\} = \min\{M r,\; 2\sqrt M\}.
\]
Dividing by $r > 0$ yields the second bound.

For $u > 0$, let $\gamma_u$ denote the density of $\mathcal N(0, u I_d)$, and define
\[
(P_u \phi)(x) := \int_{\mathbb R^d} \phi(y)\, \gamma_u(x - y)\,\mathrm dy.
\]
For $M > 0$, set
\[
\mathcal F_M := \Big\{ h \in C^1(\mathbb R^d) : \kappa_h(r) \ge -\frac{1}{r} f_M(r) \ \ \forall r > 0 \Big\}.
\]

Lemma 4. Let $p_0 \propto e^{-V_0}$ with $V_0 \in C^1(\mathbb R^d)$, and for $a > 0$ let $q := S_a p_0$, $U_a := -\log q$. Then for every $r > 0$,
\[
\kappa_{U_a}(r) = \frac{1}{a^2}\, \kappa_{V_0}(r/a).
\]
In particular, if
\[
\kappa_{V_0}(r) \ge \alpha - \frac{1}{r} f_M(r), \qquad r > 0,
\]
then
\[
\kappa_{U_a}(r) \ge \frac{\alpha}{a^2} - \frac{1}{r} f_{M/a^2}(r), \qquad r > 0.
\]

Proof. Since $q(x) = a^{-d} p_0(x/a)$, one has $U_a(x) = V_0(x/a) + \mathrm{const}$ and $\nabla U_a(x) = a^{-1} \nabla V_0(x/a)$. Thus, for $u = x/a$, $v = y/a$,
\[
\frac{\langle \nabla U_a(x) - \nabla U_a(y),\, x - y\rangle}{\|x - y\|^2} = \frac{1}{a^2}\, \frac{\langle \nabla V_0(u) - \nabla V_0(v),\, u - v\rangle}{\|u - v\|^2}.
\]
Taking the infimum over $\|x - y\| = r$, equivalently $\|u - v\| = r/a$, gives the first identity. The second follows by applying the assumed lower bound at radius $r/a$ and using $a^{-1} f_M(r/a) = f_{M/a^2}(r)$.

Lemma 5 (Heat semigroup invariance of $\mathcal F_M$). Let $M > 0$ and $h \in \mathcal F_M$. For $u > 0$, define
\[
H_u(x) := -\log\big(P_u(e^{-h})\big)(x).
\]
Then $H_u \in \mathcal F_M$.

Proof. This is exactly Conforti [2024, Theorem 2.1].

Lemma 6. Let $U : \mathbb R^d \to \mathbb R$, $\sigma > 0$, and $\alpha' \ge 0$. Define
\[
h(y) := U(y) - \frac{\alpha'}{2}\|y\|^2, \qquad A := 1 + \alpha' \sigma^2, \qquad \tau := \frac{\sigma^2}{A}.
\]
Let
\[
p(x) := \int_{\mathbb R^d} e^{-U(y)}\, \gamma_{\sigma^2}(x - y)\,\mathrm dy.
\]
Then there exists $C = C(\alpha', \sigma, d)$ such that for all $x \in \mathbb R^d$,
\[
p(x) = C \exp\Big(-\frac{\alpha'}{2A}\|x\|^2\Big)\, \big(P_\tau(e^{-h})\big)\Big(\frac{x}{A}\Big).
\]

Proof. Since $e^{-U(y)} = e^{-h(y)} e^{-\alpha' \|y\|^2/2}$, one gets
\[
p(x) = (2\pi\sigma^2)^{-d/2} \int_{\mathbb R^d} e^{-h(y)} \exp\Big(-\frac{\alpha'}{2}\|y\|^2 - \frac{\|x - y\|^2}{2\sigma^2}\Big)\,\mathrm dy.
\]
Completing the square gives
\[
\frac{\alpha'}{2}\|y\|^2 + \frac{\|x - y\|^2}{2\sigma^2} = \frac{\alpha'}{2A}\|x\|^2 + \frac{1}{2\tau}\Big\|y - \frac{x}{A}\Big\|^2.
\]
Recognizing the Gaussian density $\gamma_\tau$ yields
\[
p(x) = A^{-d/2} \exp\Big(-\frac{\alpha'}{2A}\|x\|^2\Big)\, \big(P_\tau(e^{-h})\big)\Big(\frac{x}{A}\Big),
\]
so the claim holds with $C = A^{-d/2}$.

Lemma 7. Let $\alpha' \ge 0$ and $M' \ge 0$. Assume $q \propto e^{-U}$ with $U \in C^1(\mathbb R^d)$ satisfies
\[
\kappa_U(r) \ge \alpha' - \frac{1}{r} f_{M'}(r), \qquad r > 0.
\]
Let $p = q * \gamma_{\sigma^2}$ and $V := -\log p$. Then for all $r > 0$,
\[
\kappa_V(r) \ge \frac{\alpha'}{1 + \alpha' \sigma^2} - \frac{1}{r} f_{M'/(1 + \alpha' \sigma^2)^2}(r).
\]

Proof. The proof separates the explicit quadratic curvature from the residual defect.
The residual part belongs to a class invariant under Gaussian convolution, while the quadratic part can be tracked exactly by completing the square. Set
\[
A := 1 + \alpha' \sigma^2, \qquad \tau := \sigma^2 / A, \qquad h := U - \frac{\alpha'}{2}\|\cdot\|^2.
\]
Then $\kappa_h(r) = \kappa_U(r) - \alpha'$, so $h \in \mathcal F_{M'}$. By Lemma 6,
\[
V(x) = \frac{\alpha'}{2A}\|x\|^2 + H(x/A) + \mathrm{const}, \qquad H := -\log\big(P_\tau(e^{-h})\big).
\]
Lemma 5 gives $H \in \mathcal F_{M'}$. Using the quadratic contribution and the same scaling computation as in Lemma 4,
\[
\kappa_V(r) \ge \frac{\alpha'}{A} + \frac{1}{A^2}\, \kappa_H(r/A) \ge \frac{\alpha'}{A} - \frac{1}{A r} f_{M'}(r/A) = \frac{\alpha'}{A} - \frac{1}{r} f_{M'/A^2}(r).
\]

Corollary 1 (OU smoothing of the weak profile). Assume $p_0 \propto e^{-V_0}$ satisfies Assumption 1 with parameters $(\alpha, M)$. For $s \in (0, T]$, let
\[
p_s = (S_{a(s)} p_0) * \gamma_{\sigma^2(s)}, \qquad V_s := -\log p_s.
\]
Then for every $r > 0$,
\[
\kappa_{V_s}(r) \ge \alpha_s - \frac{1}{r} f_{M_s}(r),
\]
where
\[
\alpha_s = \frac{\alpha}{a^2(s) + \alpha \sigma^2(s)}, \qquad M_s = \frac{M a^2(s)}{\big(a^2(s) + \alpha \sigma^2(s)\big)^2}.
\]

Proof. Let $U_s$ denote the potential of $S_{a(s)} p_0$. Lemma 4 gives
\[
\kappa_{U_s}(r) \ge \frac{\alpha}{a^2(s)} - \frac{1}{r} f_{M/a^2(s)}(r).
\]
Applying Lemma 7 with $\alpha' = \alpha / a^2(s)$, $M' = M / a^2(s)$, and $\sigma = \sigma(s)$ yields
\[
\kappa_{V_s}(r) \ge \frac{\alpha / a^2(s)}{1 + (\alpha / a^2(s))\, \sigma^2(s)} - \frac{1}{r} f_{\frac{M / a^2(s)}{(1 + (\alpha / a^2(s))\, \sigma^2(s))^2}}(r),
\]
which simplifies to the stated formulas for $\alpha_s$ and $M_s$.

Remark 2 (Why $\alpha_s$ and $M_s$ share the same denominator). The common denominator $a^2(s) + \alpha \sigma^2(s)$ is the algebraic signature of one smoothing mechanism acting on two different parts of the profile. The factor $a^2(s)$ comes from transporting the initial geometry through the linear part of the forward dynamics, while the term $\alpha \sigma^2(s)$ is the quadratic curvature that can be fed into the Gaussian-convolution step. The numerators then separate the two roles. The large-scale convex part keeps the factor $\alpha$, producing $\alpha_s$, whereas the short-scale defect keeps the factor $M a^2(s)$, producing $M_s$.
Thus smoothing improves the available convexity and the residual defect through one common denominator, but only the defect remembers how much unsmoothed nonconvexity was still present before convolution.

A.3 Moment consequences

We also need one integrability fact later at switch time. The point is that the defect term in Assumption 1 is uniformly bounded: by Lemma 3, $r^{-1} f_M(r)$ can at worst create a linear loss along rays, while the positive $\alpha r$ term integrates to a quadratic confinement. Thus weak log-concavity with $\alpha > 0$ already forces Gaussian-type tails.

Lemma 8. Under Assumption 1 with parameters $\alpha, M > 0$, there exists a constant $C_0 < \infty$ such that
\[
V_0(x) \ge \frac{\alpha}{4}\|x\|^2 - C_0, \qquad \forall x \in \mathbb R^d.
\]
Consequently, $p_0 \in \mathcal P_p(\mathbb R^d)$ for every $p \ge 1$. Moreover, all forward marginals $p_s = \mathrm{Law}(X_s)$ also belong to $\mathcal P_p(\mathbb R^d)$ for every $p \ge 1$.

Proof. Fix $x = r u$ with $r = \|x\| > 0$ and $\|u\| = 1$. Applying Assumption 1 with $y = 0$ gives
\[
\langle \nabla V_0(r u) - \nabla V_0(0),\, u\rangle \ge \alpha r - f_M(r),
\]
hence
\[
\langle \nabla V_0(r u),\, u\rangle \ge \alpha r - f_M(r) - \|\nabla V_0(0)\|.
\]
Integrating along the ray $s \mapsto s u$,
\[
V_0(r u) - V_0(0) = \int_0^r \langle \nabla V_0(s u),\, u\rangle\,\mathrm ds \ge \frac{\alpha}{2} r^2 - \int_0^r f_M(s)\,\mathrm ds - \|\nabla V_0(0)\|\, r.
\]
By Lemma 3, $f_M(s) \le 2\sqrt M$, so
\[
V_0(x) \ge V_0(0) + \frac{\alpha}{2}\|x\|^2 - \big(2\sqrt M + \|\nabla V_0(0)\|\big)\|x\|.
\]
Absorbing the linear term into the quadratic term yields
\[
V_0(x) \ge \frac{\alpha}{4}\|x\|^2 - C_0
\]
for some $C_0 < \infty$. Thus $e^{-V_0(x)} \lesssim e^{-\alpha \|x\|^2 / 4}$, so $p_0$ has finite moments of every order. The same then holds for $p_s$ because
\[
X_s \overset{d}{=} a(s) X_0 + \sigma(s)\,\xi, \qquad \xi \sim \mathcal N(0, I_d),
\]
and Gaussian moments are finite.

B Profile transfer, Euclidean closure, and admissible switches

This section translates the smoothing output into geometry of the learned reverse drift. For each noise level $s$, the learned reverse drift inherits a radial lower envelope
\[
\kappa_s(r) = m(s) - g^2(s)\, \frac{1}{r} f_{M_s}(r), \qquad r > 0,
\]
where $s = T - t$.
The key point is that this is one and the same monotone-in-$r$ curve. Its left endpoint is the Euclidean load $-b(s)$, which is the quantity seen by synchronous coupling. Its right endpoint is the far-field reserve $m(s)$, which is the quantity later converted into contraction by reflection coupling. Thus the later coupling split is not imposed from outside: it is already encoded in the shape of $\kappa_s$. No stochastic argument is needed yet; the whole section is deterministic radial bookkeeping.

B.1 Endpoint structure of the lower radial profile

Proof of Lemma 1. Fix $t \in [0, T]$ and write $s := T - t$.

Item (i): profile transfer. For $x \ne y$, set $r := \|x - y\|$. Since
\[
\hat b_t(x) = f(s)\, x + g^2(s)\, \nabla \log p_s(x) + g^2(s)\, e_s(x),
\]
one has
\[
-\frac{\langle \hat b_t(x) - \hat b_t(y),\, x - y\rangle}{\|x - y\|^2}
= -f(s) - g^2(s)\, \frac{\langle \nabla \log p_s(x) - \nabla \log p_s(y),\, x - y\rangle}{\|x - y\|^2}
- g^2(s)\, \frac{\langle e_s(x) - e_s(y),\, x - y\rangle}{\|x - y\|^2}.
\]
Because $\nabla \log p_s = -\nabla V_s$, the middle term equals $g^2(s)\, \langle \nabla V_s(x) - \nabla V_s(y),\, x - y\rangle / \|x - y\|^2$ and is therefore bounded below by $g^2(s)\, \kappa_{V_s}(r)$. By Corollary 1,
\[
\kappa_{V_s}(r) \ge \alpha_s - \frac{1}{r} f_{M_s}(r),
\]
and Assumption 3 gives
\[
\frac{\langle e_s(x) - e_s(y),\, x - y\rangle}{\|x - y\|^2} \le \ell(s).
\]
Combining the two bounds yields
\[
-\frac{\langle \hat b_t(x) - \hat b_t(y),\, x - y\rangle}{\|x - y\|^2} \ge g^2(s)\Big(\alpha_s - \frac{1}{r} f_{M_s}(r) - \ell(s)\Big) - f(s) = \kappa_s(r).
\]
Taking the infimum over all pairs with $\|x - y\| = r$ proves (20).

Item (ii): endpoint structure of the same profile. Define
\[
q_M(r) := \frac{1}{r} f_M(r), \qquad r > 0.
\]
Then $\kappa_s(r) = m(s) - g^2(s)\, q_{M_s}(r)$. If $M_s = 0$, then $f_{M_s} \equiv 0$, so $\kappa_s(r) \equiv m(s)$, and all claims are immediate. Assume now that $M_s > 0$. With $z = \frac12 \sqrt{M_s}\, r$ and $\psi(z) := \tanh(z)/z$, one has
\[
q_{M_s}(r) = M_s\, \psi\Big(\tfrac12 \sqrt{M_s}\, r\Big).
\]
Moreover,
\[
\psi'(z) = \frac{z\, \mathrm{sech}^2(z) - \tanh(z)}{z^2} = -\frac{\tanh(z) - z\, \mathrm{sech}^2(z)}{z^2}.
\]
Set $h(z) := \tanh(z) - z\, \mathrm{sech}^2(z)$. Then $h(0) = 0$ and
\[
h'(z) = 2 z\, \mathrm{sech}^2(z) \tanh(z) \ge 0 \qquad (z \ge 0),
\]
so $h(z) \ge 0$ for $z \ge 0$.
Hence $\psi'(z) \le 0$, and therefore $r \mapsto q_{M_s}(r)$ is nonincreasing on $(0, \infty)$. Consequently $r \mapsto \kappa_s(r)$ is nondecreasing on $(0, \infty)$.

The endpoint limits now follow from
\[
\lim_{z \downarrow 0} \frac{\tanh(z)}{z} = 1, \qquad \lim_{z \to \infty} \frac{\tanh(z)}{z} = 0.
\]
Indeed,
\[
\lim_{r \downarrow 0} q_{M_s}(r) = M_s, \qquad \lim_{r \to \infty} q_{M_s}(r) = 0,
\]
hence
\[
\lim_{r \downarrow 0} \kappa_s(r) = m(s) - g^2(s) M_s = -b(s), \qquad \lim_{r \to \infty} \kappa_s(r) = m(s).
\]
The identity is just the rearrangement $m(s) + b(s) = g^2(s) M_s$.

If $m(s) > 0$, the second limit implies that there exists $r_* > 0$ such that $\kappa_s(r_*) > 0$. By monotonicity, $\kappa_s(r) \ge 0$ for all $r \ge r_*$. Therefore the set in (22) is nonempty, so $R(s) < \infty$, and $\kappa_s(r) \ge 0$ for all $r \ge R(s)$.

Item (iii): companion Euclidean load. By item (ii),
\[
\kappa_s(r) \ge \lim_{u \downarrow 0} \kappa_s(u) = -b(s), \qquad r > 0.
\]
Together with item (i), this yields
\[
\kappa_{\hat b_t}(r) \ge -b(s), \qquad r > 0.
\]
Hence, for $x \ne y$,
\[
\langle x - y,\, \hat b_t(x) - \hat b_t(y)\rangle \le -\kappa_{\hat b_t}(\|x - y\|)\, \|x - y\|^2 \le b(s)\, \|x - y\|^2.
\]
The case $x = y$ is trivial, proving (23).

Remark 3 (Geometry behind the later coupling split). Fix $s \in (0, T]$. The curve $r \mapsto \kappa_s(r)$ rises from the near-field value $-b(s)$ to the far-field value $m(s)$. So Euclidean geometry and radial geometry are reading different parts of the same object. A synchronous-coupling estimate for $\|Y_t - \bar Y_t\|^2$ collapses the whole curve to its left endpoint, because it must control pairwise separation uniformly over all radii. By contrast, reflection coupling can act directly on the radial process and therefore exploit the fact that the same profile may already be positive in the tail even when its near-field endpoint is still negative. This is exactly the metric-mismatch regime: far-field contraction has appeared, but Euclidean closure has not. The switch metric introduced later is tailored to that shape: it suppresses the near-field defect while keeping sensitivity to the contractive tail.
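The endpoint structure described in Lemma 1 and Remark 3 can be checked numerically. The sketch below is an illustration under stated assumptions: it takes $f_M(r) = 2\sqrt M \tanh(\tfrac12 \sqrt M\, r)$ as in the proof of Lemma 3, treats $m(s)$, $g^2(s)$, and $M_s$ as given scalars, and reads the radius $R(s)$ from (22) as the smallest $r$ with $\kappa_s(r) \ge 0$. The function names are hypothetical.

```python
import math


def f_M(M, r):
    """Defect function f_M(r) = 2 sqrt(M) tanh(sqrt(M) r / 2)."""
    root = math.sqrt(M)
    return 2.0 * root * math.tanh(0.5 * root * r)


def q_M(M, r):
    """q_M(r) = f_M(r) / r for r > 0; nonincreasing from M down to 0."""
    if r <= 0:
        raise ValueError("q_M is defined for r > 0 only")
    return f_M(M, r) / r


def kappa(m, g2, M_s, r):
    """Lower radial profile kappa_s(r) = m(s) - g^2(s) q_{M_s}(r)."""
    return m - g2 * q_M(M_s, r)


def euclidean_load(m, g2, M_s):
    """b(s) = g^2(s) M_s - m(s), so that kappa_s(0+) = -b(s)."""
    return g2 * M_s - m


def zero_crossing_radius(m, g2, M_s, iters=200):
    """Smallest r with kappa_s(r) >= 0, found by bisection (assumes m > 0).

    The upper bracket uses q_M(r) <= 2 sqrt(M) / r from Lemma 3, which
    guarantees kappa_s(r) >= 0 once r >= 2 g^2 sqrt(M_s) / m.
    """
    if m <= 0:
        raise ValueError("a finite radius requires a positive far-field reserve")
    if euclidean_load(m, g2, M_s) <= 0:
        return 0.0  # the profile is already nonnegative near r = 0
    lo, hi = 0.0, 2.0 * g2 * math.sqrt(M_s) / m
    for _ in range(iters):
        mid = 0.5 * (lo + hi)
        if kappa(m, g2, M_s, mid) >= 0.0:
            hi = mid
        else:
            lo = mid
    return hi
```

Bisection is valid here only because $r \mapsto \kappa_s(r)$ is nondecreasing, which is exactly the monotonicity established in item (ii).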
B.2 Characterization of admissible switch times

Definition 4 asks that the far-field endpoint stay positive on the whole remaining noise interval. By Remark 3, this does not mean that Euclidean geometry has already been restored. It only means that every instantaneous profile on $[s_0, T]$ has a positive tail, which is the minimum robust input needed later to build one time-uniform switch metric and run a single reflection-coupling argument on the whole early window.

Proposition 1 (Structure of the admissible switch set). Assume that $m$ is continuous on $(0, T]$. Then the map
\[
s_0 \longmapsto \underline m(s_0) := \inf_{u \in [s_0, T]} m(u)
\]
is nondecreasing and continuous on $(0, T]$. Consequently,
\[
\mathcal S_{\mathrm{adm}} = \{ s_0 \in (0, T] : \underline m(s_0) > 0 \}
\]
is an upper interval of $(0, T]$. If, in addition, $m(T) > 0$, then a nontrivial admissible switch exists. Moreover, if $s_{\min} := \inf \mathcal S_{\mathrm{adm}} \in (0, T)$, then $\underline m(s_{\min}) = 0$.

Proof. If $0 < s_1 \le s_2 \le T$, then $[s_2, T] \subseteq [s_1, T]$, hence
\[
\underline m(s_1) = \inf_{u \in [s_1, T]} m(u) \le \inf_{u \in [s_2, T]} m(u) = \underline m(s_2),
\]
so $s_0 \mapsto \underline m(s_0)$ is nondecreasing.

To prove continuity, define
\[
F(s_0, \theta) := m\big((1 - \theta) s_0 + \theta T\big), \qquad (s_0, \theta) \in (0, T] \times [0, 1].
\]
Since $m$ is continuous, so is $F$, and
\[
\underline m(s_0) = \min_{\theta \in [0, 1]} F(s_0, \theta).
\]
The minimum of a continuous function over the compact set $[0, 1]$ depends continuously on $s_0$, hence $\underline m$ is continuous on $(0, T]$.

Because $\mathcal S_{\mathrm{adm}} = \underline m^{-1}((0, \infty))$ and $\underline m$ is nondecreasing, $\mathcal S_{\mathrm{adm}}$ is an upper interval. If $m(T) > 0$, continuity gives $\eta > 0$ such that $m(u) > 0$ on $[T - \eta, T]$, hence $\underline m(T - \eta) > 0$, so a nontrivial admissible switch exists.

Finally, let $s_{\min} := \inf \mathcal S_{\mathrm{adm}} \in (0, T)$. Since $\mathcal S_{\mathrm{adm}}$ is an upper interval, every $u > s_{\min}$ belongs to $\mathcal S_{\mathrm{adm}}$, hence
\[
\underline m(u) > 0, \qquad u \in (s_{\min}, T].
\]
Therefore, for every $u > s_{\min}$ and every $r \in [u, T]$,
\[
m(r) \ge \underline m(u) > 0.
\]
As $u > s_{\min}$ is arbitrary, this implies
\[
m(r) > 0, \qquad r \in (s_{\min}, T].
\]
Suppose, for contradiction, that $m(s_{\min}) > 0$. Then continuity of $m$ implies that $m$ is strictly positive on the compact interval $[s_{\min}, T]$, so
\[
\underline m(s_{\min}) = \min_{u \in [s_{\min}, T]} m(u) > 0.
\]
By continuity of $\underline m$, there exists $\eta > 0$ such that $\underline m(s_{\min} - \eta) > 0$, contradicting the definition of $s_{\min}$. Thus $m(s_{\min}) \le 0$. Since $m > 0$ on $(s_{\min}, T]$ and $m$ is continuous, $m(s_{\min}) = 0$, and therefore $\underline m(s_{\min}) = 0$.

Remark 4 (Why admissibility is the right threshold for the next section). Admissibility is intentionally weaker than Euclidean stability. It does not require $b(u) \le 0$ on $[s_0, T]$, so one cannot generally replace the early radial route by a direct $W_2$ argument on that whole window. What it guarantees instead is that the positive tail survives uniformly after taking the infimum over $u \in [s_0, T]$. The next section turns exactly that surviving tail positivity into a two-zone lower envelope with a bounded near-field loss and a strictly positive far-field reserve.

Remark 5 (The radial portrait in the main text is schematic). Figure 1 should be read as a pointwise visualization of the lower profile $(s, r) \mapsto \kappa_s(r)$. The threshold radius $R(s)$ is only an instantaneous zero-crossing scale. Later we will replace it by the switch-dependent radius $R_{\mathrm{sw}}(s_0)$, which is a uniform envelope radius on the whole early window rather than a pointwise radius at a single $s$. No claim is made here that $R(s)$ moves monotonically inward with the noise level.

C Early-phase damping in the switch metric

Fix a grid-aligned admissible switch $s_0 \in \mathcal S_{\mathrm{adm}}$, and write $t_s := T - s_0 = t_K$. This section proves the early-window contraction used later. The argument has three steps. First, we replace the family $\big(r \mapsto \kappa_s(r)\big)_{s \in [s_0, T]}$ by a single switch-dependent lower envelope that is uniform on $[0, t_s]$. Second, we use reflection coupling for two copies of the learned reverse SDE to obtain semigroup contraction in $W_{\varphi_{s_0}}$.
Third, we insert the three pre-switch error channels (initialization, discretization, and score forcing) at their entry times and propagate them to $t_s$ under that contraction.

Two couplings are used, for different reasons. Reflection coupling is used only for the learned-semigroup contraction. By contrast, when we compare the ideal and learned reverse dynamics on a short interval, we use synchronous coupling so that the noise cancels and only the drift mismatch $\delta_t$ remains.

Throughout this section, $P_{u,v}$ denotes the exact learned reverse semigroup on $[u, v]$, $P_k := P_{t_k, t_{k+1}}$ the exact learned one-step kernels, and $\Psi_k$ the numerical one-step kernels; we write
\[
\delta_t(x) := \hat b_t(x) - b_t(x) = g^2(T - t)\, e_{T - t}(x). \tag{28}
\]
From this point on, two theorem-level inputs are used repeatedly. Whenever Lemma 13 is invoked, we use strong well posedness of the learned reverse SDE on $[0, t_s]$, existence of coalescing reflection couplings on every subinterval, and the finiteness conditions
\[
\underline g(s_0) > 0, \qquad \overline b(s_0) < \infty, \qquad \overline G(s_0) < \infty.
\]
Whenever Lemma 14 is invoked, we also use the existence of a global ideal reverse solution $(\tilde X_t)_{0 \le t \le T}$ to (12) with
\[
\mathrm{Law}(\tilde X_t) = p_{T - t}, \qquad 0 \le t \le T.
\]

C.1 A uniform early-window profile from admissibility

The first deliberate loss of information occurs here. We replace the full family $\big(r \mapsto \kappa_s(r)\big)_{s \in [s_0, T]}$ by a switch-dependent two-zone lower envelope: a bounded near-field load and a strictly positive far-field reserve. This loses the detailed radius dependence of the exact profile, but gains the uniformity in $t$ needed by the later coupling argument.

Define
\[
\underline g(s_0) := \inf_{u \in [s_0, T]} g(u), \qquad
\overline b(s_0) := \sup_{u \in [s_0, T]} b(u), \qquad
\overline G(s_0) := \sup_{u \in [s_0, T]} g^2(u) \sqrt{M_u},
\]
\[
R_{\mathrm{sw}}(s_0) := \frac{4\, \overline G(s_0)}{\underline m(s_0)}, \qquad
m_{\mathrm{sw}}(s_0) := \frac{\underline m(s_0)}{2}.
\]
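As a rough numerical companion to these definitions, the following sketch evaluates the envelope constants from values of $m(u)$, $b(u)$, $g(u)$, and $M_u$ sampled on a finite grid of $[s_0, T]$. Replacing the continuous infima and suprema by grid minima and maxima is an assumption of this illustration, as are the function name and the dictionary keys; a certified bound would need control between grid points.

```python
import math


def envelope_constants(m_vals, b_vals, g_vals, M_vals):
    """Grid approximation of the switch-level envelope constants.

    Each argument is a sequence of samples over the same grid of [s0, T]:
    m_vals ~ m(u), b_vals ~ b(u), g_vals ~ g(u), M_vals ~ M_u.
    """
    m_low = min(m_vals)          # approximates the infimum of m on [s0, T]
    if m_low <= 0:
        raise ValueError("switch is not admissible on this grid (m_low <= 0)")
    g_low = min(g_vals)          # approximates the infimum of g
    b_high = max(b_vals)         # approximates the supremum of b
    G_high = max(g * g * math.sqrt(M) for g, M in zip(g_vals, M_vals))
    return {
        "m_low": m_low,
        "g_low": g_low,
        "b_high": b_high,
        "G_high": G_high,
        "R_sw": 4.0 * G_high / m_low,   # uniform envelope radius
        "m_sw": 0.5 * m_low,            # far-field reserve kept beyond R_sw
    }
```

The factor 4 in `R_sw` is what makes the tail estimate close: at $r = R_{\mathrm{sw}}(s_0)$ the worst-case defect $2\overline G(s_0)/r$ equals half of the far-field reserve, leaving the other half as $m_{\mathrm{sw}}(s_0)$.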
Here $R_{\mathrm{sw}}(s_0)$ is a switch-dependent uniform envelope radius on the whole early window; it is not the instantaneous zero-crossing radius $R(s)$ from (22). Define the switch-level lower profile
\[
\underline\kappa_{s_0}(r) :=
\begin{cases}
-\overline b(s_0), & 0 < r \le R_{\mathrm{sw}}(s_0), \\
m_{\mathrm{sw}}(s_0), & r > R_{\mathrm{sw}}(s_0).
\end{cases}
\]

Lemma 9. Fix $s_0 \in \mathcal S_{\mathrm{adm}}$ and assume $\overline b(s_0) < \infty$ and $\overline G(s_0) < \infty$. Then for every $t \in [0, t_s]$ and every $r > 0$,
\[
\kappa_{\hat b_t}(r) \ge \underline\kappa_{s_0}(r), \tag{29}
\]
and
\[
g(T - t) \ge \underline g(s_0). \tag{30}
\]

Proof. Fix $t \in [0, t_s]$ and set $s := T - t \in [s_0, T]$. By Lemma 1(i),
\[
\kappa_{\hat b_t}(r) \ge \kappa_s(r) = m(s) - g^2(s)\, \frac{1}{r} f_{M_s}(r), \qquad r > 0.
\]
If $0 < r \le R_{\mathrm{sw}}(s_0)$, then Lemma 3 gives $r^{-1} f_{M_s}(r) \le M_s$, hence
\[
\kappa_{\hat b_t}(r) \ge g^2(s)\big(\alpha_s - \ell(s) - M_s\big) - f(s) = -b(s) \ge -\overline b(s_0).
\]
If $r > R_{\mathrm{sw}}(s_0)$, then Lemma 3 gives $f_{M_s}(r) \le 2\sqrt{M_s}$, and therefore
\[
\kappa_{\hat b_t}(r) \ge m(s) - \frac{2 g^2(s) \sqrt{M_s}}{r} \ge \underline m(s_0) - \frac{2\, \overline G(s_0)}{r}.
\]
By the definition of $R_{\mathrm{sw}}(s_0)$, the last term is bounded below by
\[
\underline m(s_0) - \frac{2\, \overline G(s_0)}{R_{\mathrm{sw}}(s_0)} = \frac{\underline m(s_0)}{2} = m_{\mathrm{sw}}(s_0).
\]
This proves (29). The bound (30) is immediate from the definition of $\underline g(s_0)$.

C.2 Reflection coupling and radial reduction

With the uniform profile in hand, the problem becomes one-dimensional. For two copies of the learned reverse SDE, reflection coupling is the natural choice because it puts the stochastic forcing entirely in the radial direction of the separation vector. The distance process then sees the drift only through the scalar profile $\kappa_{\hat b_t}(r_t)$, exactly the object extracted in the previous subsection.

Definition 5 (Coalescing reflection coupling). Fix $0 \le t_1 < t_2 \le t_s$. Let $Y$ and $\bar Y$ be solutions of the learned reverse SDE (13) on $[t_1, t_2]$, defined on a common filtered probability space. Set
\[
D_t := Y_t - \bar Y_t, \qquad r_t := \|D_t\|, \qquad \tau_c := \inf\{ t \ge t_1 : r_t = 0 \}.
\]
Fix a deterministic unit vector $e \in \mathbb R^d$, and define
\[
n_t := \frac{D_t}{r_t}\, \mathbf 1_{\{r_t > 0\}} + e\, \mathbf 1_{\{r_t = 0\}}, \qquad
H_t := (I_d - 2 n_t n_t^\top)\, \mathbf 1_{\{t < \tau_c\}} + I_d\, \mathbf 1_{\{t \ge \tau_c\}}.
\]
If $Y$ is driven by a Brownian motion $B$, define
\[
B_t^{\mathrm{ref}} := \int_{t_1}^{t} H_u\,\mathrm dB_u.
\]
The pair $(Y, \bar Y)$ is said to be in coalescing reflection coupling if $Y$ is driven by $B$ and $\bar Y$ is driven by $B^{\mathrm{ref}}$. Since $H_t H_t^\top = I_d$, Lévy's characterization implies that $B^{\mathrm{ref}}$ is again a Brownian motion. After $\tau_c$, both copies evolve synchronously and therefore stick together.

Lemma 10 (Radial dynamics under reflection coupling). Fix $s_0 \in \mathcal S_{\mathrm{adm}}$, and let $Y, \bar Y$ be coupled as in Definition 5 on $[t_1, t_2] \subseteq [0, t_s]$. Then there exists a one-dimensional Brownian motion $W$ such that, for $t < \tau_c$,
\[
\mathrm dr_t = \Big\langle \frac{D_t}{\|D_t\|},\, \hat b_t(Y_t) - \hat b_t(\bar Y_t) \Big\rangle\,\mathrm dt + 2 g(T - t)\,\mathrm dW_t. \tag{31}
\]
Moreover,
\[
\Big\langle \frac{D_t}{\|D_t\|},\, \hat b_t(Y_t) - \hat b_t(\bar Y_t) \Big\rangle \le -\kappa_{\hat b_t}(r_t)\, r_t, \tag{32}
\]
and hence
\[
\mathrm dr_t \le -\underline\kappa_{s_0}(r_t)\, r_t\,\mathrm dt + 2 g(T - t)\,\mathrm dW_t, \qquad t < \tau_c. \tag{33}
\]

Proof. For $t < \tau_c$, one has $n_t = D_t / r_t$ and $I_d - H_t = 2 n_t n_t^\top$. Hence
\[
\mathrm dD_t = \big(\hat b_t(Y_t) - \hat b_t(\bar Y_t)\big)\,\mathrm dt + 2 g(T - t)\, n_t n_t^\top\,\mathrm dB_t
= \big(\hat b_t(Y_t) - \hat b_t(\bar Y_t)\big)\,\mathrm dt + 2 g(T - t)\, n_t\,\mathrm dW_t,
\]
where
\[
W_t - W_{t_1} := \int_{t_1}^{t} \langle n_u,\, \mathrm dB_u \rangle.
\]
Since $n_t$ is predictable and $\|n_t\| = 1$ on $\{t < \tau_c\}$, $W$ is a one-dimensional Brownian motion.

To justify the radial equation, localize away from the origin. For $n \ge 1$, let
\[
\tau_n := \inf\{ t \ge t_1 : r_t \le n^{-1} \}.
\]
On $[t_1, \tau_n]$, the map $x \mapsto \|x\|$ is $C^2$ in a neighborhood of $D_t$, so Itô's formula gives
\[
\mathrm dr_t = \Big\langle \frac{D_t}{\|D_t\|},\, \hat b_t(Y_t) - \hat b_t(\bar Y_t) \Big\rangle\,\mathrm dt + 2 g(T - t)\,\mathrm dW_t
+ \frac12 \mathrm{Tr}\big(4 g(T - t)^2\, n_t n_t^\top\, \nabla^2 \|D_t\|\big)\,\mathrm dt.
\]
Now
\[
\nabla^2 \|x\| = \frac{1}{\|x\|}\Big(I_d - \frac{x x^\top}{\|x\|^2}\Big), \qquad x \ne 0,
\]
and $n_t = D_t / \|D_t\|$, so
\[
\mathrm{Tr}\big(n_t n_t^\top\, \nabla^2 \|D_t\|\big) = 0.
\]
Thus the second-order term vanishes identically. Letting $n \to \infty$ proves (31) for $t < \tau_c$.
The bound (32) is immediate from Definition 2, because $r_t = \|D_t\|$. Combining it with Lemma 9 gives (33).

C.3 Construction and generator bound for the switch metric

Definition 6 (Radial-cost Wasserstein metric). Let $\varphi : [0, \infty) \to [0, \infty)$ be increasing, concave, and satisfy $\varphi(0) = 0$. For $\mu, \nu \in \mathcal P_1(\mathbb R^d)$, define
\[
W_\varphi(\mu, \nu) := \inf_{\pi \in \Pi(\mu, \nu)} \int_{\mathbb R^d \times \mathbb R^d} \varphi(\|x - y\|)\, \pi(\mathrm dx, \mathrm dy).
\]
For such $\varphi$, the concavity and monotonicity of $\varphi$ imply subadditivity, and the usual gluing argument shows that $W_\varphi$ is a transport metric on $\mathcal P_1(\mathbb R^d)$.

The remaining task is to choose a concave cost whose generator absorbs the two-zone radial inequality from Lemma 9. On the core $[0, R_{\mathrm{sw}}(s_0)]$, the drift may still expand, so one needs negative curvature of $\varphi$ to be paid for by the diffusion term. Beyond $R_{\mathrm{sw}}(s_0)$, the drift is already uniformly contractive, so further concavity would only waste this reserve. This leads to a Gaussian core and an affine tail.

For the fixed switch $s_0$, define
\[
\lambda(s_0) := \frac{\overline b(s_0)_+ + m_{\mathrm{sw}}(s_0)}{4\, \underline g(s_0)^2}, \qquad
a(s_0) := e^{-\lambda(s_0) R_{\mathrm{sw}}(s_0)^2}, \qquad
c(s_0) := m_{\mathrm{sw}}(s_0)\, a(s_0). \tag{34}
\]
Attach to $s_0$ the cost
\[
\varphi_{s_0}(r) :=
\begin{cases}
\displaystyle \int_0^r e^{-\lambda(s_0) u^2}\,\mathrm du, & 0 \le r \le R_{\mathrm{sw}}(s_0), \\[2mm]
\varphi_{s_0}(R_{\mathrm{sw}}(s_0)) + a(s_0)\big(r - R_{\mathrm{sw}}(s_0)\big), & r \ge R_{\mathrm{sw}}(s_0).
\end{cases} \tag{35}
\]

Lemma 11 (The switch gauge and its generator inequality). Fix $s_0 \in \mathcal S_{\mathrm{adm}}$ such that $\underline g(s_0) > 0$, $\overline b(s_0) < \infty$, and $\overline G(s_0) < \infty$. Then $\varphi_{s_0} \in C^1([0, \infty))$ is increasing and concave, and for all $r \ge 0$,
\[
a(s_0)\, r \le \varphi_{s_0}(r) \le r. \tag{36}
\]
Moreover,
\[
2\, \underline g(s_0)^2\, \varphi_{s_0}''(r) - \underline\kappa_{s_0}(r)\, r\, \varphi_{s_0}'(r) \le -c(s_0)\, \varphi_{s_0}(r) \qquad \text{for a.e. } r > 0. \tag{37}
\]
Consequently, $W_{\varphi_{s_0}}$ is finite on $\mathcal P_1(\mathbb R^d)$ and satisfies the triangle inequality.

Proof. For brevity write
\[
\lambda := \lambda(s_0), \quad R := R_{\mathrm{sw}}(s_0), \quad g := \underline g(s_0), \quad b := \overline b(s_0), \quad m := m_{\mathrm{sw}}(s_0), \quad a := a(s_0), \quad c := c(s_0), \quad \kappa := \underline\kappa_{s_0}, \quad \varphi := \varphi_{s_0}.
\]
From the definition,
$$\varphi'(r) = \begin{cases} e^{-\lambda r^2}, & 0 < r < R, \\ e^{-\lambda R^2}, & r > R, \end{cases} \qquad \varphi''(r) = \begin{cases} -2\lambda r\,e^{-\lambda r^2}, & 0 < r < R, \\ 0, & r > R, \end{cases}$$
for a.e. $r > 0$. Hence $\varphi$ is increasing and concave, and $\varphi'$ is continuous at $R$, so $\varphi \in C^1([0,\infty))$. The upper bound $\varphi(r) \le r$ follows from $0 \le \varphi' \le 1$. For the lower bound, if $0 \le r \le R$, then
$$\varphi(r) = \int_0^r e^{-\lambda u^2}\,\mathrm{d}u \ge r\,e^{-\lambda R^2} = a\,r.$$
If $r \ge R$, then
$$\varphi(r) = \varphi(R) + a(r-R) \ge aR + a(r-R) = a\,r.$$
This proves (36).

If $0 < r \le R$, then $\kappa(r) = -b$, and therefore
$$2g^2\varphi''(r) - \kappa(r)\,r\,\varphi'(r) = \bigl(b - 4g^2\lambda\bigr)\,r\,e^{-\lambda r^2}.$$
By the definition of $\lambda$,
$$b - 4g^2\lambda = b - b_+ - m \le -m.$$
Hence
$$2g^2\varphi''(r) - \kappa(r)\,r\,\varphi'(r) \le -m\,r\,e^{-\lambda r^2}.$$
Since $r \le R$ implies $e^{-\lambda r^2} \ge e^{-\lambda R^2} = a$, and $\varphi(r) \le r$,
$$c\,\varphi(r) = m\,a\,\varphi(r) \le m\,e^{-\lambda r^2}\,r.$$
Thus the generator inequality holds on $(0,R]$. If $r > R$, then $\kappa(r) = m$, $\varphi''(r) = 0$, and $\varphi'(r) = a$, so
$$2g^2\varphi''(r) - \kappa(r)\,r\,\varphi'(r) = -m\,a\,r = -c\,r \le -c\,\varphi(r),$$
again because $\varphi(r) \le r$. This proves (37).

C.4 Early semigroup contraction

Remark 6 (The extra theorem-level input used in the early phase). Beyond Assumptions 1–2, the next proposition uses one additional theorem-level input: the learned reverse SDE must be strongly well posed on $[0,t_s]$ and must admit coalescing reflection couplings on each subinterval for every initial coupling. This is not automatic from the standing assumptions. Later, Proposition 7 and Corollary 4 give convenient sufficient conditions, including the standard global/local Lipschitz regimes used in the literature.

Lemma 12 (Signed-load Grönwall for synchronized differences). Let $D : [u,v] \to \mathbb{R}^d$ be absolutely continuous. Assume that $\beta \in L^1(u,v)$, $F \in L^1(u,v;\mathbb{R}^d)$, and for a.e. $t \in [u,v]$,
$$\dot D_t = A_t + F_t, \qquad \langle D_t, A_t\rangle \le \beta(t)\,\|D_t\|^2.$$
Then for every $t \in [u,v]$,
$$\|D_t\| \le e^{\int_u^t \beta(r)\,\mathrm{d}r}\,\|D_u\| + \int_u^t e^{\int_r^t \beta(\tau)\,\mathrm{d}\tau}\,\|F_r\|\,\mathrm{d}r.$$

Proof.
For $\varepsilon > 0$, define
$$q_\varepsilon(t) := \sqrt{\|D_t\|^2 + \varepsilon}.$$
Since $D$ is absolutely continuous, so is $q_\varepsilon$, and for a.e. $t$,
$$q_\varepsilon'(t) = \frac{\langle D_t, A_t + F_t\rangle}{q_\varepsilon(t)} \le \beta(t)\,\frac{\|D_t\|^2}{q_\varepsilon(t)} + \|F_t\|.$$
Now
$$\beta(t)\,\frac{\|D_t\|^2}{q_\varepsilon(t)} = \beta(t)\,q_\varepsilon(t) - \beta(t)\,\frac{\varepsilon}{q_\varepsilon(t)} \le \beta(t)\,q_\varepsilon(t) + |\beta(t)|\sqrt{\varepsilon},$$
because $q_\varepsilon(t) \ge \sqrt{\varepsilon}$. Hence
$$q_\varepsilon'(t) \le \beta(t)\,q_\varepsilon(t) + \|F_t\| + |\beta(t)|\sqrt{\varepsilon}.$$
Grönwall's inequality yields
$$q_\varepsilon(t) \le e^{\int_u^t \beta(\tau)\,\mathrm{d}\tau}\,q_\varepsilon(u) + \int_u^t e^{\int_r^t \beta(\tau)\,\mathrm{d}\tau}\,\|F_r\|\,\mathrm{d}r + \sqrt{\varepsilon}\int_u^t e^{\int_r^t \beta(\tau)\,\mathrm{d}\tau}\,|\beta(r)|\,\mathrm{d}r.$$
Letting $\varepsilon \downarrow 0$ proves the claim.

Lemma 13. Fix $s_0 \in \mathcal{S}_{\mathrm{adm}}$ such that $g(s_0) > 0$, $b(s_0) < \infty$, and $G(s_0) < \infty$. Assume, in addition, that the learned reverse SDE is strongly well posed on $[0,t_s]$ and admits coalescing reflection couplings on every subinterval $[u,v] \subseteq [0,t_s]$ for every initial coupling $\pi \in \Pi(\mu,\nu)$ with $\mu,\nu \in \mathcal{P}_1(\mathbb{R}^d)$. Then
$$W_{\varphi_{s_0}}\bigl(\mu P_{u,v},\, \nu P_{u,v}\bigr) \le e^{-c(s_0)(v-u)}\,W_{\varphi_{s_0}}(\mu,\nu), \qquad 0 \le u \le v \le t_s, \tag{38}$$
for every $\mu,\nu \in \mathcal{P}_1(\mathbb{R}^d)$.

Proof. Fix $\mu,\nu \in \mathcal{P}_1(\mathbb{R}^d)$ and $0 \le u \le v \le t_s$. Choose any coupling $\pi \in \Pi(\mu,\nu)$, and let $(Y_t,\bar Y_t)$ be a coalescing reflection coupling of the learned reverse SDE on $[u,v]$ in the sense of Definition 5, with initial law $\pi$. Write
$$r_t := \|Y_t - \bar Y_t\|, \qquad \varphi := \varphi_{s_0}, \qquad c := c(s_0), \qquad g := g(s_0).$$
By Lemma 10,
$$\mathrm{d}r_t \le -\kappa_{s_0}(r_t)\,r_t\,\mathrm{d}t + 2g(T-t)\,\mathrm{d}W_t \qquad \text{for } t<\tau_c.$$
Apply the Meyer–Itô formula to the stopped process $t \mapsto \varphi(r_{t\wedge\tau_c})$. This is legitimate because $\varphi \in C^1$ and is piecewise $C^2$, and the only potential local-time term at the gluing point $R_{\mathrm{sw}}(s_0)$ vanishes because $\varphi'$ is continuous there. Using $\varphi' \ge 0$, $\varphi'' \le 0$, and (33), we obtain
$$\mathrm{d}\varphi(r_{t\wedge\tau_c}) \le \bigl[2g(T-t)^2\varphi''(r_t) - \kappa_{s_0}(r_t)\,r_t\,\varphi'(r_t)\bigr]\mathbf{1}_{\{t<\tau_c\}}\,\mathrm{d}t + 2g(T-t)\,\varphi'(r_t)\,\mathbf{1}_{\{t<\tau_c\}}\,\mathrm{d}W_t.$$
Since $g(T-t) \ge g$ on $[u,v]$ and $\varphi'' \le 0$,
$$2g(T-t)^2\varphi''(r_t) \le 2g^2\varphi''(r_t).$$
Hence Lemma 11 gives
$$\mathrm{d}\varphi(r_{t\wedge\tau_c}) \le -c\,\varphi(r_{t\wedge\tau_c})\,\mathrm{d}t + 2g(T-t)\,\varphi'(r_t)\,\mathbf{1}_{\{t<\tau_c\}}\,\mathrm{d}W_t.$$
Because $0 \le \varphi' \le 1$ and $\int_u^v g(T-t)^2\,\mathrm{d}t < \infty$, the stochastic integral is square integrable and therefore a true martingale. Multiplying by $e^{ct}$, integrating from $u$ to $v$, and taking expectations yields
$$\mathbb{E}[\varphi(r_v)] \le e^{-c(v-u)}\,\mathbb{E}[\varphi(r_u)].$$
Since the law of $(Y_v,\bar Y_v)$ is a coupling of $\mu P_{u,v}$ and $\nu P_{u,v}$,
$$W_\varphi(\mu P_{u,v}, \nu P_{u,v}) \le \mathbb{E}[\varphi(r_v)] \le e^{-c(v-u)}\,\mathbb{E}[\varphi(r_u)].$$
Taking the infimum over $\pi \in \Pi(\mu,\nu)$ proves (38).

Once semigroup contraction is available, every pre-switch error source is handled in the same way: insert the error at the time it enters the chain and propagate it to $t_s$ with the damping factor $e^{-c(s_0)(t_s-\cdot)}$. The next proposition separates the three places where error enters: initialization, discretization, and score forcing.

Remark 7 (Why the coupling changes). For the learned reverse SDE, synchronous coupling only sees the near-field endpoint $b(s)$, because formally
$$\frac12\,\frac{\mathrm{d}}{\mathrm{d}t}\|Y_t - \bar Y_t\|^2 \le b(s)\,\|Y_t - \bar Y_t\|^2.$$
Reflection coupling instead acts on the radial process and therefore accesses the full profile $r \mapsto \kappa_{\hat b_t}(r)$, including its contractive tail. This is the only reason the proof switches couplings.

Lemma 14. Fix $s_0 \in \mathcal{S}_{\mathrm{adm}}$ with $t_K = t_s := T - s_0$, and assume that $\hat p_T$ belongs to $\mathcal{P}_1(\mathbb{R}^d)$, that Assumption 2 and the hypotheses of Lemma 13 hold, and that the theorem-level ideal reverse SDE (12) admits a global strong solution $(\tilde X_t)_{0\le t\le T}$ with
$$\mathrm{Law}(\tilde X_t) = p_{T-t}, \qquad 0 \le t \le T.$$
Then
$$\Delta^{\mathrm{sw}}_\varphi(s_0) := W_{\varphi_{s_0}}\bigl(\mathrm{Law}(Z_{t_s}),\, p_{s_0}\bigr)$$
satisfies
$$\Delta^{\mathrm{sw}}_\varphi(s_0) \le e^{-c(s_0)t_s}\,W_{\varphi_{s_0}}(\hat p_T, p_T) + \sum_{k=0}^{K-1} e^{-c(s_0)(t_s - t_{k+1})}\bigl(\mathbb{E}[\xi_k(Z_{t_k})^2]\bigr)^{1/2} + \int_{s_0}^{T} e^{-c(s_0)(s-s_0)}\,g^2(s)\,\varepsilon(s)\,\mathrm{d}s. \tag{39}$$
In particular, the right-hand side is exactly the budget $\overline{\Delta}^{\mathrm{sw}}_\varphi(s_0)$ from (27).

Proof.
Let $\varphi := \varphi_{s_0}$ and $c := c(s_0)$. Write
$$\mu_k := \mathrm{Law}(Z_{t_k}), \qquad \rho_k := \mu_k P_{t_k,t_s}, \qquad k = 0,\dots,K.$$
Then $\rho_0 = \hat p_T P_{0,t_s}$ and $\rho_K = \mathrm{Law}(Z_{t_s})$.

Initialization mismatch. By Lemma 13,
$$W_\varphi\bigl(\hat p_T P_{0,t_s},\, p_T P_{0,t_s}\bigr) \le e^{-ct_s}\,W_\varphi(\hat p_T, p_T).$$

Early discretization. By the triangle inequality for $W_\varphi$,
$$W_\varphi(\rho_K,\rho_0) \le \sum_{k=0}^{K-1} W_\varphi(\rho_{k+1},\rho_k).$$
Since
$$\rho_{k+1} = (\mu_k\Psi_k)P_{t_{k+1},t_s}, \qquad \rho_k = (\mu_k P_k)P_{t_{k+1},t_s},$$
Lemma 13 gives
$$W_\varphi(\rho_{k+1},\rho_k) \le e^{-c(t_s - t_{k+1})}\,W_\varphi(\mu_k\Psi_k,\, \mu_k P_k).$$
Now $\varphi(r) \le r$, so $W_\varphi \le W_1 \le W_2$. Apply Zhang [2013, Theorem 1.1] on the parameter space $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_k)$ with
$$\mu_x := \Psi_k(x,\cdot), \qquad \nu_x := P_k(x,\cdot), \qquad c(y,z) := \|y-z\|^2.$$
This yields a Borel measurable family
$$x \longmapsto \pi^x_k \in \Pi\bigl(\Psi_k(x,\cdot),\, P_k(x,\cdot)\bigr)$$
of optimal couplings for the quadratic cost. Integrating $\pi^x_k$ against $\mu_k(\mathrm{d}x)$ gives a coupling of $\mu_k\Psi_k$ and $\mu_k P_k$, and therefore
$$W_2^2(\mu_k\Psi_k,\, \mu_k P_k) \le \int_{\mathbb{R}^d} W_2^2\bigl(\Psi_k(x,\cdot),\, P_k(x,\cdot)\bigr)\,\mu_k(\mathrm{d}x) \le \mathbb{E}[\xi_k(Z_{t_k})^2].$$
Hence
$$W_\varphi(\mu_k\Psi_k,\, \mu_k P_k) \le W_2(\mu_k\Psi_k,\, \mu_k P_k) \le \bigl(\mathbb{E}[\xi_k(Z_{t_k})^2]\bigr)^{1/2}.$$
Summing over $k$ gives
$$W_\varphi\bigl(\mathrm{Law}(Z_{t_s}),\, \hat p_T P_{0,t_s}\bigr) \le \sum_{k=0}^{K-1} e^{-c(t_s - t_{k+1})}\bigl(\mathbb{E}[\xi_k(Z_{t_k})^2]\bigr)^{1/2}. \tag{40}$$

Pre-switch score forcing. This step uses two different couplings. To propagate a law discrepancy from time $u_{j+1}$ to $t_s$, we use the learned-semigroup contraction from Lemma 13, whose proof is based on reflection coupling. But to compare the ideal and learned reverse dynamics on the short interval $[u_j,u_{j+1}]$, we use a synchronous comparison built from the global ideal reverse path.
No separate restart assumption for the ideal equation is used: we simply restrict the theorem-level solution $(\tilde X_t)_{0\le t\le T}$ of (12) to the interval $[u_j,u_{j+1}]$, and solve the learned equation on the same filtered space and with the same Brownian motion $B$, started from the random state $\tilde X_{u_j}$.

Let $0 = u_0 < \dots < u_m = t_s$ be a partition, and define
$$\eta_j := p_{T-u_j} P_{u_j,t_s}, \qquad j = 0,\dots,m.$$
Then $\eta_0 = p_T P_{0,t_s}$ and $\eta_m = p_{s_0}$, so
$$W_\varphi(p_T P_{0,t_s},\, p_{s_0}) \le \sum_{j=0}^{m-1} W_\varphi(\eta_j,\eta_{j+1}).$$
Fix $j$. Let $Y^{(j)}$ be the strong solution of the learned reverse SDE on $[u_j,u_{j+1}]$, driven by the same Brownian motion $B$ as $\tilde X$, and initialized by $Y^{(j)}_{u_j} = \tilde X_{u_j}$. Since $\mathrm{Law}(\tilde X_{u_j}) = p_{T-u_j}$, the Markov property of the learned reverse semigroup gives
$$\mathrm{Law}\bigl(Y^{(j)}_{u_{j+1}} \mid \mathcal{F}_{u_j}\bigr) = P_{u_j,u_{j+1}}(\tilde X_{u_j},\cdot),$$
and therefore
$$\mathrm{Law}\bigl(Y^{(j)}_{u_{j+1}}\bigr) = p_{T-u_j} P_{u_j,u_{j+1}}.$$
On the other hand, because $\tilde X$ is the global ideal reverse trajectory, $\mathrm{Law}(\tilde X_{u_{j+1}}) = p_{T-u_{j+1}}$. Therefore
$$\eta_j = \mathrm{Law}\bigl(Y^{(j)}_{u_{j+1}}\bigr)P_{u_{j+1},t_s}, \qquad \eta_{j+1} = \mathrm{Law}(\tilde X_{u_{j+1}})P_{u_{j+1},t_s}.$$
Applying Lemma 13 from $u_{j+1}$ to $t_s$ gives
$$W_\varphi(\eta_j,\eta_{j+1}) \le e^{-c(t_s - u_{j+1})}\,W_\varphi\bigl(\mathrm{Law}(Y^{(j)}_{u_{j+1}}),\, \mathrm{Law}(\tilde X_{u_{j+1}})\bigr).$$
Set $D_t := Y^{(j)}_t - \tilde X_t$. Because the Brownian motions are identical, the noise cancels and
$$\dot D_t = \hat b_t(Y^{(j)}_t) - \hat b_t(\tilde X_t) + \delta_t(\tilde X_t) \qquad \text{for a.e. } t \in [u_j,u_{j+1}].$$
By Lemma 1(iii),
$$\bigl\langle D_t,\, \hat b_t(Y^{(j)}_t) - \hat b_t(\tilde X_t)\bigr\rangle \le b(T-t)\,\|D_t\|^2 \le b(s_0)\,\|D_t\|^2 \qquad \text{for a.e. } t \in [u_j,u_{j+1}].$$
Lemma 12, with
$$A_t := \hat b_t(Y^{(j)}_t) - \hat b_t(\tilde X_t), \qquad F_t := \delta_t(\tilde X_t), \qquad \beta(t) := b(s_0),$$
and $D_{u_j} = 0$, therefore yields
$$\|D_{u_{j+1}}\| \le \int_{u_j}^{u_{j+1}} e^{b(s_0)(u_{j+1}-r)}\,\|\delta_r(\tilde X_r)\|\,\mathrm{d}r.$$
Since $W_\varphi \le W_1$, the coupling $(Y^{(j)}_{u_{j+1}}, \tilde X_{u_{j+1}})$ gives
$$W_\varphi\bigl(\mathrm{Law}(Y^{(j)}_{u_{j+1}}),\, \mathrm{Law}(\tilde X_{u_{j+1}})\bigr) \le \mathbb{E}\|D_{u_{j+1}}\| \le \int_{u_j}^{u_{j+1}} e^{b(s_0)(u_{j+1}-r)}\,\mathbb{E}\|\delta_r(\tilde X_r)\|\,\mathrm{d}r.$$
Therefore
$$W_\varphi(p_T P_{0,t_s},\, p_{s_0}) \le \sum_{j=0}^{m-1}\int_{u_j}^{u_{j+1}} e^{-c(t_s - u_{j+1})}\,e^{b(s_0)(u_{j+1}-r)}\,\mathbb{E}\|\delta_r(\tilde X_r)\|\,\mathrm{d}r.$$
Now $\tilde X_r \sim p_{T-r}$ for each $r \in [0,t_s]$ by the theorem-level hypothesis, and
$$\mathbb{E}\|\delta_r(\tilde X_r)\| = g^2(T-r)\,\mathbb{E}_{X\sim p_{T-r}}\|e_{T-r}(X)\| \le g^2(T-r)\,\varepsilon(T-r)$$
by Cauchy–Schwarz and Assumption 2. Hence the integrand is dominated by
$$e^{(b(s_0)+c)t_s}\,g^2(T-r)\,\varepsilon(T-r),$$
which is integrable on $[0,t_s]$. Letting the mesh of the partition tend to zero and using dominated convergence gives
$$W_\varphi(p_T P_{0,t_s},\, p_{s_0}) \le \int_0^{t_s} e^{-c(t_s-r)}\,\mathbb{E}\|\delta_r(\tilde X_r)\|\,\mathrm{d}r = \int_{s_0}^{T} e^{-c(s-s_0)}\,g^2(s)\,\mathbb{E}_{X\sim p_s}\|e_s(X)\|\,\mathrm{d}s,$$
where the last equality is the change of variables $s = T - r$. Applying Cauchy–Schwarz once more yields
$$W_\varphi(p_T P_{0,t_s},\, p_{s_0}) \le \int_{s_0}^{T} e^{-c(s-s_0)}\,g^2(s)\,\varepsilon(s)\,\mathrm{d}s. \tag{41}$$

Combine the three pieces. By the triangle inequality,
$$W_\varphi(\mathrm{Law}(Z_{t_s}),\, p_{s_0}) \le W_\varphi(\mathrm{Law}(Z_{t_s}),\, \hat p_T P_{0,t_s}) + W_\varphi(\hat p_T P_{0,t_s},\, p_T P_{0,t_s}) + W_\varphi(p_T P_{0,t_s},\, p_{s_0}).$$
Substituting (40), the initialization estimate, and (41) proves (39).

D Switch-time conversion back to $W_2$

This section performs the one-time return to Euclidean transport used by our one-switch family. The proof is deliberately specialized to the present switch metric. Because $\varphi_{s_0}$ has an affine tail, one can decompose $D^2$ into a small-distance piece controlled by $\varphi_{s_0}(D)$ and a large-distance piece paid for by a $p$-moment. This produces the exponent $\theta_p$ without any additional interpolation machinery and explains why the later sharpness statement is tied to affine-tail costs.

Lemma 15. Fix an admissible switch $s_0 \in \mathcal{S}_{\mathrm{adm}}$ and $p > 2$. Assume that the switch-time budget $M^{\mathrm{sw}}_p(s_0)$ from (25) is finite.
Then
$$W_2\bigl(\mathrm{Law}(Z_{t_s}),\, p_{s_0}\bigr) \le C^{\mathrm{sw}}_p(s_0)\,\bigl(\Delta^{\mathrm{sw}}_\varphi(s_0)\bigr)^{\theta_p}, \tag{42}$$
where $C^{\mathrm{sw}}_p(s_0)$ is the switch factor from Section 4.

Proof. The argument is a one-threshold decomposition of $D^2$. Small separations are controlled by the linear lower bound $\varphi(r) \ge a\,r$, while large separations are paid for by the $p$-moment budget.

Let $\varphi := \varphi_{s_0}$, $a := a(s_0)$, and $\mu := \mathrm{Law}(Z_{t_s})$, $\nu := p_{s_0}$. By Lemma 11, $\varphi(r) \ge a\,r$ for all $r \ge 0$. Fix any coupling $(U,V)$ of $(\mu,\nu)$, and set $D := \|U - V\|$. For every $\rho > 0$,
$$D^2 = D^2\,\mathbf{1}_{\{D\le\rho\}} + D^2\,\mathbf{1}_{\{D>\rho\}} \le \rho D + \rho^{-(p-2)}D^p \le \frac{\rho}{a}\,\varphi(D) + \rho^{-(p-2)}D^p.$$
Taking expectations and using
$$\mathbb{E}[D^p] \le 2^{p-1}\bigl(\mathbb{E}\|U\|^p + \mathbb{E}\|V\|^p\bigr) \le 2^{p-1}M^{\mathrm{sw}}_p(s_0)$$
gives
$$\mathbb{E}[D^2] \le \frac{\rho}{a}\,\mathbb{E}[\varphi(D)] + \frac{2^{p-1}M^{\mathrm{sw}}_p(s_0)}{\rho^{p-2}}.$$
Taking the infimum over couplings yields
$$W_2^2(\mu,\nu) \le \frac{\rho}{a}\,W_\varphi(\mu,\nu) + \frac{2^{p-1}M^{\mathrm{sw}}_p(s_0)}{\rho^{p-2}}.$$
Optimizing in $\rho$ gives
$$W_2(\mu,\nu) \le C_{\mathrm{tr}}(p)\,a(s_0)^{-\theta_p}\,\bigl(M^{\mathrm{sw}}_p(s_0)\bigr)^{\frac{1}{2(p-1)}}\,\bigl(W_\varphi(\mu,\nu)\bigr)^{\theta_p},$$
which is exactly (42).

D.1 An explicit admissible moment criterion

Proposition 2. Fix $p > 2$, an admissible switch $s_0 \in \mathcal{S}_{\mathrm{adm}}$, and let $t_s := T - s_0$. Assume the grid contains $t_s$, say $t_K = t_s$, and suppose
$$\int_{\mathbb{R}^d}\|y\|^p\,\Psi_k(x,\mathrm{d}y) \le (1+A_k)\,\|x\|^p + B_k, \qquad x \in \mathbb{R}^d,\ k = 0,\dots,K-1,$$
for constants $A_k, B_k \ge 0$. Then
$$\mathbb{E}\|Z_{t_s}\|^p \le M^{\mathrm{sw}}_{p,\mathrm{num}}(s_0) := \Biggl(\prod_{j=0}^{K-1}(1+A_j)\Biggr)\mathbb{E}_{Z_0\sim\hat p_T}\|Z_0\|^p + \sum_{i=0}^{K-1} B_i \prod_{j=i+1}^{K-1}(1+A_j).$$
Consequently, one admissible choice in (25) is
$$M^{\mathrm{sw}}_p(s_0) = M^{\mathrm{sw}}_{p,\mathrm{num}}(s_0) + \mathbb{E}_{X\sim p_{s_0}}\|X\|^p.$$

Proof. Set $m_k := \mathbb{E}\|Z_{t_k}\|^p$. Since $\mathrm{Law}(Z_{t_{k+1}}) = \mathrm{Law}(Z_{t_k})\Psi_k$, the one-step bound gives
$$m_{k+1} \le (1+A_k)\,m_k + B_k.$$
Iterating this scalar recursion yields the stated bound for $\mathbb{E}\|Z_{t_s}\|^p$. The second claim follows from $\mathrm{Law}(\tilde X_{t_s}) = p_{s_0}$ and Lemma 8.
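The scalar recursion in the proof of Proposition 2 can be checked numerically. The following is a minimal sketch, not part of the paper's argument: the function names and the sample constants are hypothetical, and the arrays `A`, `B` stand for the one-step constants $A_k$, $B_k$ with `m0` playing the role of $\mathbb{E}_{Z_0\sim\hat p_T}\|Z_0\|^p$. It compares the closed-form budget $M^{\mathrm{sw}}_{p,\mathrm{num}}(s_0)$ with the recursion run at equality.

```python
from math import prod


def moment_budget(m0, A, B):
    """Closed-form budget: prod_j (1 + A_j) * m0 + sum_i B_i * prod_{j > i} (1 + A_j).

    m0, A, B are illustrative inputs (assumptions), not quantities computed
    from an actual sampler.
    """
    K = len(A)
    head = prod(1.0 + a for a in A) * m0
    tail = sum(
        B[i] * prod(1.0 + A[j] for j in range(i + 1, K))  # empty product equals 1
        for i in range(K)
    )
    return head + tail


def moment_budget_iterative(m0, A, B):
    """Run the recursion m_{k+1} = (1 + A_k) m_k + B_k with equality at each step."""
    m = m0
    for a, b in zip(A, B):
        m = (1.0 + a) * m + b
    return m


if __name__ == "__main__":
    A = [0.1, 0.2, 0.0]
    B = [1.0, 0.5, 2.0]
    print(moment_budget(3.0, A, B), moment_budget_iterative(3.0, A, B))
```

Both functions return the same number, since the closed form is just the unrolled recursion. Under the same assumptions, adding $\mathbb{E}_{X\sim p_{s_0}}\|X\|^p$ to the result gives the admissible choice of $M^{\mathrm{sw}}_p(s_0)$ stated in the proposition.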
D.2 Sharpness of the conversion exponent

The next proposition explains why the exponent $\theta_p = (p-2)/(2(p-1))$ should be viewed as structural rather than incidental. The same affine-tail softness that makes the early metric usable is exactly what limits the price of converting back to $W_2$.

Proposition 3. Fix $p > 2$. Let $\varphi : [0,\infty) \to [0,\infty)$ be concave, nondecreasing, and affine on $[R_\varphi,\infty)$ with slope $a_\varphi > 0$. Then the exponent
$$\theta_p := \frac{p-2}{2(p-1)}$$
in the interpolation
$$W_2(\mu,\nu) \lesssim \bigl(M_p(\mu,\nu)\bigr)^{\frac{1}{2(p-1)}}\bigl(W_\varphi(\mu,\nu)\bigr)^{\theta_p}$$
cannot in general be improved within the class of affine-tail concave costs and $p$-moment switch assumptions. More precisely, there exist pairs $(\mu_R,\nu_R)$ with
$$\sup_{R\ge1} M_p(\mu_R,\nu_R) < \infty$$
such that
$$W_\varphi(\mu_R,\nu_R) \asymp R^{1-p}, \qquad W_2(\mu_R,\nu_R) \asymp R^{1-p/2} \asymp \bigl(W_\varphi(\mu_R,\nu_R)\bigr)^{\theta_p}.$$

Proof. Let $e_1 = (1,0,\dots,0)$ and define
$$\mu_R := (1 - R^{-p})\,\delta_0 + R^{-p}\,\delta_{Re_1}, \qquad \nu_R := \delta_0.$$
Then $\sup_{R\ge1} M_p(\mu_R,\nu_R) < \infty$. Since $\nu_R$ is a Dirac mass, the unique coupling gives
$$W_2^2(\mu_R,\nu_R) = R^{-p}\,\|Re_1\|^2 = R^{2-p}, \qquad W_\varphi(\mu_R,\nu_R) = R^{-p}\,\varphi(R).$$
Because $\varphi$ is affine with positive slope for large $R$, one has $\varphi(R) \asymp R$, so $W_\varphi(\mu_R,\nu_R) \asymp R^{1-p}$. Hence
$$\bigl(W_\varphi(\mu_R,\nu_R)\bigr)^{\theta_p} \asymp R^{(1-p)\frac{p-2}{2(p-1)}} = R^{1-p/2} = W_2(\mu_R,\nu_R).$$

E Late-window Euclidean propagation

After the switch, the radial geometry has already done all of its work. What remains is a purely Euclidean short-horizon propagation problem. Accordingly, this section uses only Euclidean $W_2$ estimates on $[t_s,T]$ and keeps separate the three quantities that arrive there: the converted switch-time discrepancy, late discretization, and late score forcing.

Proposition 4. Fix $s_0 \in [0,T]$ and assume that the grid contains the switch time
$$t_s := T - s_0 = t_K \qquad \text{for some } K \in \{0,\dots,N\}.$$
Assume Assumptions 1, 2, and 3.
Assume moreover that $\Gamma(s_0) < \infty$, that the learned reverse SDE is strongly well posed on $[t_s,T]$, that the ideal reverse SDE (12) admits a global strong solution $(\tilde X_t)_{0\le t\le T}$ with
$$\mathrm{Law}(\tilde X_t) = p_{T-t}, \qquad 0 \le t \le T,$$
that the one-step defects satisfy
$$W_2\bigl(P_k(x,\cdot),\, \Psi_k(x,\cdot)\bigr) \le \xi_k(x), \qquad x \in \mathbb{R}^d,\ k = K,\dots,N-1,$$
and that the relevant second moments are finite. Then
$$W_2\bigl(\mathrm{Law}(Z_{t_N}),\, p_0\bigr) \le \Delta^{\mathrm{disc}}_{\mathrm{late}}(s_0) + e^{\Gamma(s_0)}\,W_2\bigl(\mathrm{Law}(Z_{t_s}),\, p_{s_0}\bigr) + \Delta^{\mathrm{force}}_{\mathrm{late}}(s_0). \tag{43}$$

Proof. Let
$$\mu_k := \mathrm{Law}(Z_{t_k}), \qquad \nu_k := \mu_k P_{t_k,T}, \qquad k = K,\dots,N.$$
Then $\nu_N = \mathrm{Law}(Z_{t_N})$ and $\nu_K = \mu_K P_{t_s,T}$.

Late discretization. By the triangle inequality,
$$W_2(\nu_N,\nu_K) \le \sum_{k=K}^{N-1} W_2(\nu_{k+1},\nu_k).$$
Since
$$\nu_{k+1} = (\mu_k\Psi_k)P_{t_{k+1},T}, \qquad \nu_k = (\mu_k P_k)P_{t_{k+1},T},$$
it remains to propagate one-step defects from $t_{k+1}$ to $T$.

Fix $k \in \{K,\dots,N-1\}$, and let $\alpha,\beta \in \mathcal{P}_2(\mathbb{R}^d)$. Choose any coupling $\pi \in \Pi(\alpha,\beta)$, let $(Y_{t_{k+1}},\bar Y_{t_{k+1}}) \sim \pi$, and let $Y,\bar Y$ solve the learned reverse SDE on $[t_{k+1},T]$ driven by the same Brownian motion. Set $D_t := Y_t - \bar Y_t$. Because the noise is synchronized, $D$ is absolutely continuous and
$$\dot D_t = \hat b_t(Y_t) - \hat b_t(\bar Y_t) \qquad \text{for a.e. } t \in [t_{k+1},T].$$
By Lemma 1(iii),
$$\bigl\langle D_t,\, \hat b_t(Y_t) - \hat b_t(\bar Y_t)\bigr\rangle \le b(T-t)\,\|D_t\|^2 \qquad \text{for a.e. } t \in [t_{k+1},T].$$
Applying Lemma 12 with
$$A_t := \hat b_t(Y_t) - \hat b_t(\bar Y_t), \qquad F_t := 0, \qquad \beta(t) := b(T-t),$$
yields
$$\|D_T\| \le e^{\int_{t_{k+1}}^{T} b(T-r)\,\mathrm{d}r}\,\|D_{t_{k+1}}\| = e^{\Gamma(T-t_{k+1})}\,\|D_{t_{k+1}}\|.$$
Squaring and taking expectations gives
$$\mathbb{E}\|D_T\|^2 \le e^{2\Gamma(T-t_{k+1})}\,\mathbb{E}\|D_{t_{k+1}}\|^2.$$
Since $\pi$ was arbitrary, taking the infimum over all $\pi \in \Pi(\alpha,\beta)$ yields
$$W_2\bigl(\alpha P_{t_{k+1},T},\, \beta P_{t_{k+1},T}\bigr) \le e^{\Gamma(T-t_{k+1})}\,W_2(\alpha,\beta).$$
Applying this with $\alpha = \mu_k\Psi_k$ and $\beta = \mu_k P_k$ gives
$$W_2(\nu_{k+1},\nu_k) \le e^{\Gamma(T-t_{k+1})}\,W_2(\mu_k\Psi_k,\, \mu_k P_k).$$
Apply Zhang [2013, Theorem 1.1] on the parameter space $(\mathbb{R}^d, \mathcal{B}(\mathbb{R}^d), \mu_k)$ with
$$\mu_x := \Psi_k(x,\cdot), \qquad \nu_x := P_k(x,\cdot), \qquad c(y,z) := \|y-z\|^2.$$
This yields a Borel measurable family
$$x \longmapsto \pi^x_k \in \Pi\bigl(\Psi_k(x,\cdot),\, P_k(x,\cdot)\bigr)$$
of optimal couplings for the quadratic cost. Integrating $\pi^x_k$ against $\mu_k(\mathrm{d}x)$ gives a coupling of $\mu_k\Psi_k$ and $\mu_k P_k$, and therefore
$$W_2^2(\mu_k\Psi_k,\, \mu_k P_k) \le \int_{\mathbb{R}^d} W_2^2\bigl(\Psi_k(x,\cdot),\, P_k(x,\cdot)\bigr)\,\mu_k(\mathrm{d}x) \le \mathbb{E}[\xi_k(Z_{t_k})^2].$$
Therefore
$$W_2(\nu_{k+1},\nu_k) \le e^{\Gamma(T-t_{k+1})}\bigl(\mathbb{E}[\xi_k(Z_{t_k})^2]\bigr)^{1/2}.$$
Summing over $k = K,\dots,N-1$ yields
$$W_2\bigl(\mathrm{Law}(Z_{t_N}),\, \mu_K P_{t_s,T}\bigr) \le \Delta^{\mathrm{disc}}_{\mathrm{late}}(s_0). \tag{44}$$

Switch-time discrepancy and late score forcing. We separate the switch-time mismatch from the forcing accumulated after the switch:
$$W_2(\mu_K P_{t_s,T},\, p_0) \le W_2(\mu_K P_{t_s,T},\, p_{s_0}P_{t_s,T}) + W_2(p_{s_0}P_{t_s,T},\, p_0).$$
For the first term, choose an optimal coupling $\pi \in \Pi(\mu_K, p_{s_0})$, let $(Y_{t_s},\bar Y_{t_s}) \sim \pi$, and let $Y,\bar Y$ solve the learned reverse SDE on $[t_s,T]$ driven by the same Brownian motion. Set $D_t := Y_t - \bar Y_t$. Because the noise is synchronized,
$$\dot D_t = \hat b_t(Y_t) - \hat b_t(\bar Y_t) \qquad \text{for a.e. } t \in [t_s,T].$$
By Lemma 1(iii),
$$\bigl\langle D_t,\, \hat b_t(Y_t) - \hat b_t(\bar Y_t)\bigr\rangle \le b(T-t)\,\|D_t\|^2 \qquad \text{for a.e. } t \in [t_s,T].$$
Applying Lemma 12 with $F_t := 0$ and $\beta(t) := b(T-t)$ yields
$$\|D_T\| \le e^{\int_{t_s}^{T} b(T-r)\,\mathrm{d}r}\,\|D_{t_s}\| = e^{\Gamma(s_0)}\,\|D_{t_s}\|.$$
Squaring, taking expectations, and using the optimality of $\pi$ give
$$W_2(\mu_K P_{t_s,T},\, p_{s_0}P_{t_s,T}) \le e^{\Gamma(s_0)}\,W_2(\mu_K, p_{s_0}) = e^{\Gamma(s_0)}\,W_2(\mathrm{Law}(Z_{t_s}),\, p_{s_0}).$$
It remains to control $W_2(p_{s_0}P_{t_s,T},\, p_0)$. Let $\tilde X$ be the global ideal reverse solution from the theorem-level hypothesis.
Because the learned reverse SDE is strongly well posed on $[t_s,T]$, on the same filtered space and with the same Brownian motion $B$ that drives $\tilde X$, there exists a unique strong solution $Y^\star$ of the learned reverse SDE on $[t_s,T]$ with $Y^\star_{t_s} = \tilde X_{t_s}$. Since $\mathrm{Law}(\tilde X_{t_s}) = p_{s_0}$, the Markov property gives
$$\mathrm{Law}(Y^\star_T) = p_{s_0}P_{t_s,T}.$$
Moreover, $\mathrm{Law}(\tilde X_r) = p_{T-r}$ for all $r \in [t_s,T]$, in particular $\mathrm{Law}(\tilde X_T) = p_0$.

Set $D_t := Y^\star_t - \tilde X_t$. Because the Brownian motions are identical,
$$\dot D_t = \hat b_t(Y^\star_t) - \hat b_t(\tilde X_t) + \delta_t(\tilde X_t) \qquad \text{for a.e. } t \in [t_s,T].$$
By Lemma 1(iii),
$$\bigl\langle D_t,\, \hat b_t(Y^\star_t) - \hat b_t(\tilde X_t)\bigr\rangle \le b(T-t)\,\|D_t\|^2 \qquad \text{for a.e. } t \in [t_s,T].$$
Applying Lemma 12 with
$$A_t := \hat b_t(Y^\star_t) - \hat b_t(\tilde X_t), \qquad F_t := \delta_t(\tilde X_t), \qquad \beta(t) := b(T-t),$$
and using $D_{t_s} = 0$ gives
$$\|D_T\| \le \int_{t_s}^{T} e^{\int_r^T b(T-\tau)\,\mathrm{d}\tau}\,\|\delta_r(\tilde X_r)\|\,\mathrm{d}r = \int_{t_s}^{T} e^{\Gamma(T-r)}\,\|\delta_r(\tilde X_r)\|\,\mathrm{d}r.$$
Taking the $L^2$-norm and using Minkowski's integral inequality yields
$$\bigl(\mathbb{E}\|D_T\|^2\bigr)^{1/2} \le \int_{t_s}^{T} e^{\Gamma(T-r)}\bigl(\mathbb{E}\|\delta_r(\tilde X_r)\|^2\bigr)^{1/2}\,\mathrm{d}r.$$
Also, since $\tilde X_r \sim p_{T-r}$,
$$\bigl(\mathbb{E}\|\delta_r(\tilde X_r)\|^2\bigr)^{1/2} = g^2(T-r)\bigl(\mathbb{E}_{X\sim p_{T-r}}\|e_{T-r}(X)\|^2\bigr)^{1/2} \le g^2(T-r)\,\varepsilon(T-r)$$
by Assumption 2. Therefore the coupling $(Y^\star_T,\tilde X_T)$ gives
$$W_2(p_{s_0}P_{t_s,T},\, p_0) \le \bigl(\mathbb{E}\|D_T\|^2\bigr)^{1/2} \le \int_{t_s}^{T} e^{\Gamma(T-r)}\,g^2(T-r)\,\varepsilon(T-r)\,\mathrm{d}r.$$
Changing variables $s = T - r$ gives
$$W_2(p_{s_0}P_{t_s,T},\, p_0) \le \Delta^{\mathrm{force}}_{\mathrm{late}}(s_0). \tag{45}$$
Combining the two bounds proves
$$W_2(\mu_K P_{t_s,T},\, p_0) \le e^{\Gamma(s_0)}\,W_2(\mathrm{Law}(Z_{t_s}),\, p_{s_0}) + \Delta^{\mathrm{force}}_{\mathrm{late}}(s_0). \tag{46}$$

Combine the two late pieces. Combining (44) with (46) and the triangle inequality proves (43).

F Main theorem proof and Euclidean comparison

F.1 Proof of the main theorem and switch selection

Proof of Theorem 1. Fix an admissible grid-aligned switch $s_0 \in \mathcal{S}_{\mathrm{adm}}$, with $t_K = t_s := T - s_0$.
By Lemma 14,
$$\Delta^{\mathrm{sw}}_\varphi(s_0) \le \overline{\Delta}^{\mathrm{sw}}_\varphi(s_0).$$
By Lemma 15,
$$W_2\bigl(\mathrm{Law}(Z_{t_s}),\, p_{s_0}\bigr) \le C^{\mathrm{sw}}_p(s_0)\bigl(\Delta^{\mathrm{sw}}_\varphi(s_0)\bigr)^{\theta_p} \le C^{\mathrm{sw}}_p(s_0)\bigl(\overline{\Delta}^{\mathrm{sw}}_\varphi(s_0)\bigr)^{\theta_p}.$$
Finally, Proposition 4 yields
$$W_2\bigl(\mathrm{Law}(Z_{t_N}),\, p_0\bigr) \le \Delta_{\mathrm{late}}(s_0) + e^{\Gamma(s_0)}\,W_2\bigl(\mathrm{Law}(Z_{t_s}),\, p_{s_0}\bigr).$$
Substituting the switch-time conversion estimate proves (26).

Definition 7 (Scalar switch-selection objective). For an admissible grid-aligned switch $s_0 \in \mathcal{S}_{\mathrm{adm}}$, define
$$\mathcal{B}_p(s_0) := \Delta_{\mathrm{late}}(s_0) + e^{\Gamma(s_0)}\,C^{\mathrm{sw}}_p(s_0)\bigl(\overline{\Delta}^{\mathrm{sw}}_\varphi(s_0)\bigr)^{\theta_p}. \tag{47}$$

Corollary 2 (Discrete one-switch routing principle). Define the admissible grid
$$\mathcal{S}^h_{\mathrm{adm}} := \mathcal{S}_{\mathrm{adm}} \cap \{T - t_k : k = 1,\dots,N-1\}.$$
Under the assumptions of Theorem 1, every switch $s_0 \in \mathcal{S}^h_{\mathrm{adm}}$ satisfies
$$W_2\bigl(\mathrm{Law}(Z_{t_N}),\, p_0\bigr) \le \mathcal{B}_p(s_0).$$
Hence, whenever $\mathcal{S}^h_{\mathrm{adm}} \ne \emptyset$, the one-switch routing problem on the numerical grid reduces to the scalar minimization
$$\min_{s_0 \in \mathcal{S}^h_{\mathrm{adm}}} \mathcal{B}_p(s_0).$$
In particular, there exists $s_0^\star \in \mathcal{S}^h_{\mathrm{adm}}$ such that
$$\mathcal{B}_p(s_0^\star) = \min_{s_0 \in \mathcal{S}^h_{\mathrm{adm}}} \mathcal{B}_p(s_0), \qquad W_2\bigl(\mathrm{Law}(Z_{t_N}),\, p_0\bigr) \le \mathcal{B}_p(s_0^\star).$$

Remark 8. The substance of Corollary 2 is not the elementary fact that a minimum exists on a finite set. Its content is structural: Theorem 1 has already compressed an end-to-end geometric routing problem into the single scalar field
$$s_0 \longmapsto \mathcal{B}_p(s_0) = \underbrace{\Delta_{\mathrm{late}}(s_0)}_{\text{late Euclidean propagation}} + \underbrace{e^{\Gamma(s_0)}\,C^{\mathrm{sw}}_p(s_0)\bigl(\overline{\Delta}^{\mathrm{sw}}_\varphi(s_0)\bigr)^{\theta_p}}_{\text{early routed budget after one-time return to } W_2}.$$
Admissibility is the geometric existence statement that the early window supports a positive far-field contraction reserve. Optimization is the quantitative routing statement, within the one-switch family studied here, that chooses where one should leave the first contracting geometry, pay the one-time conversion back to $W_2$, and expose the remaining discrepancy to the late Euclidean load.
In this sense, the corollary turns a pathwise choice of interface between two geometries into a one-dimensional design problem on the admissible grid. It also clarifies why an exact routed-versus-direct comparison is possible: for fixed $s_0$, the late-window contribution is the same in both bounds, so any strict improvement can only come from the early window. We do not claim that this scalar optimization is globally optimal among multi-switch or continuously varying metric strategies.

Proof of Corollary 2. Theorem 1 together with Definition 7 yields
$$W_2\bigl(\mathrm{Law}(Z_{t_N}),\, p_0\bigr) \le \mathcal{B}_p(s_0), \qquad s_0 \in \mathcal{S}^h_{\mathrm{adm}}.$$
If $\mathcal{S}^h_{\mathrm{adm}} \ne \emptyset$, then $\mathcal{S}^h_{\mathrm{adm}}$ is finite, hence $\mathcal{B}_p$ attains a minimum on $\mathcal{S}^h_{\mathrm{adm}}$. Applying Theorem 1 at a minimizer gives the stated bound.

F.2 Comparison with direct Euclidean propagation

Proposition 5. Assume moreover that the ideal reverse SDE (12) admits a global strong solution $(\tilde X_t)_{0\le t\le T}$ with
$$\mathrm{Law}(\tilde X_t) = p_{T-t}, \qquad 0 \le t \le T.$$
Assume Assumptions 1, 2, and 3, that $\Gamma(T) < \infty$, and that the learned reverse SDE is strongly well posed on $[0,T]$. Assume moreover that the one-step numerical defects satisfy
$$W_2\bigl(P_k(x,\cdot),\, \Psi_k(x,\cdot)\bigr) \le \xi_k(x), \qquad x \in \mathbb{R}^d,\ k = 0,\dots,N-1,$$
and that the relevant second moments are finite. Then
$$W_2\bigl(\mathrm{Law}(Z_{t_N}),\, p_0\bigr) \le e^{\Gamma(T)}\,W_2(\hat p_T, p_T) + \sum_{k=0}^{N-1} e^{\Gamma(T-t_{k+1})}\bigl(\mathbb{E}[\xi_k(Z_{t_k})^2]\bigr)^{1/2} + \int_0^T e^{\Gamma(s)}\,g^2(s)\,\varepsilon(s)\,\mathrm{d}s. \tag{48}$$

Proof. Apply Proposition 4 with the candidate switch $s_0 = T$. Then $t_s = 0$, $K = 0$, and $p_{s_0} = p_T$. The late-window bound becomes exactly (48).

Corollary 3. Assume the hypotheses of Theorem 1 and of Proposition 5. Fix an admissible grid-aligned switch $s_0 \in \mathcal{S}_{\mathrm{adm}}$, with $t_K = t_s := T - s_0$. Write
$$d_k := \bigl(\mathbb{E}[\xi_k(Z_{t_k})^2]\bigr)^{1/2}, \qquad \Lambda_{s_0}(r) := \int_{s_0}^{r} b(u)\,\mathrm{d}u, \qquad r \in [s_0,T].$$
Define the common late-window term
$$\mathcal{L}(s_0) := \Delta^{\mathrm{disc}}_{\mathrm{late}}(s_0) + \Delta^{\mathrm{force}}_{\mathrm{late}}(s_0),$$
the routed early transport term
$$\mathcal{R}_{\mathrm{sw}}(s_0) := C^{\mathrm{sw}}_p(s_0)\Biggl[e^{-c(s_0)t_s}\,W_{\varphi_{s_0}}(\hat p_T, p_T) + \sum_{k=0}^{K-1} e^{-c(s_0)(t_s - t_{k+1})}\,d_k + \int_{s_0}^{T} e^{-c(s_0)(s-s_0)}\,g^2(s)\,\varepsilon(s)\,\mathrm{d}s\Biggr]^{\theta_p},$$
and the direct early transport term
$$\mathcal{R}_{\mathrm{dir}}(s_0) := e^{\Lambda_{s_0}(T)}\,W_2(\hat p_T, p_T) + \sum_{k=0}^{K-1} e^{\Lambda_{s_0}(T-t_{k+1})}\,d_k + \int_{s_0}^{T} e^{\Lambda_{s_0}(s)}\,g^2(s)\,\varepsilon(s)\,\mathrm{d}s.$$
Then
$$\mathcal{B}_p(s_0) = \mathcal{L}(s_0) + e^{\Gamma(s_0)}\,\mathcal{R}_{\mathrm{sw}}(s_0), \qquad \mathcal{D}_{\mathrm{dir}}(s_0) = \mathcal{L}(s_0) + e^{\Gamma(s_0)}\,\mathcal{R}_{\mathrm{dir}}(s_0),$$
and hence
$$W_2\bigl(\mathrm{Law}(Z_{t_N}),\, p_0\bigr) \le \min\bigl\{\mathcal{B}_p(s_0),\, \mathcal{D}_{\mathrm{dir}}(s_0)\bigr\}.$$
Moreover,
$$\mathcal{B}_p(s_0) \le \mathcal{D}_{\mathrm{dir}}(s_0) \iff \mathcal{R}_{\mathrm{sw}}(s_0) \le \mathcal{R}_{\mathrm{dir}}(s_0),$$
with strict inequality if and only if $\mathcal{R}_{\mathrm{sw}}(s_0) < \mathcal{R}_{\mathrm{dir}}(s_0)$.

Proof. The identity $\mathcal{B}_p(s_0) = \mathcal{L}(s_0) + e^{\Gamma(s_0)}\,\mathcal{R}_{\mathrm{sw}}(s_0)$ is just Definition 7. For the direct route, note that
$$e^{\Gamma(T)} = e^{\Gamma(s_0)}\,e^{\Lambda_{s_0}(T)}, \qquad e^{\Gamma(T-t_{k+1})} = e^{\Gamma(s_0)}\,e^{\Lambda_{s_0}(T-t_{k+1})},$$
and
$$e^{\Gamma(s)} = e^{\Gamma(s_0)}\,e^{\Lambda_{s_0}(s)}, \qquad s \in [s_0,T].$$
Substituting these identities gives $\mathcal{D}_{\mathrm{dir}}(s_0) = \mathcal{L}(s_0) + e^{\Gamma(s_0)}\,\mathcal{R}_{\mathrm{dir}}(s_0)$. The bound by $\min\{\mathcal{B}_p(s_0),\, \mathcal{D}_{\mathrm{dir}}(s_0)\}$ combines Theorem 1 with Proposition 5, and the comparison criterion follows by subtracting the common term $\mathcal{L}(s_0)$ and then dividing by the common factor $e^{\Gamma(s_0)}$.

Remark 9. The comparison becomes transparent only after factoring out the common post-switch factor $e^{\Gamma(s_0)}$. After that reduction, the two routes propagate the same three pre-switch inputs in opposite geometric directions. The direct route weights an input created at noise level $u \in [s_0,T]$ by the residual Euclidean factor $e^{\Lambda_{s_0}(u)}$, whereas the routed bound first damps the same input by $e^{-c(s_0)(u-s_0)}$, and then pays a single return to $W_2$ through $C^{\mathrm{sw}}_p(s_0)$ and $\theta_p$.
This identifies the genuine mismatch regime: radial contraction is already available, while direct Euclidean propagation on the same window is still net expansive. A convenient sufficient indicator is
$$m(s_0) > 0 \qquad \text{and} \qquad \Lambda_{s_0}(T) = \int_{s_0}^{T} b(v)\,\mathrm{d}v > 0.$$
Then the routed bound damps the early inputs, whereas the direct bound amplifies them on net. The only competing loss is the one-time metric conversion. Accordingly, a strict improvement should be expected when the early window is long enough and sufficiently inside the admissible regime so that $c(s_0)$ is appreciable and $C^{\mathrm{sw}}_p(s_0)$ is not too large. By contrast, if the residual Euclidean load is already nonpositive, or if $s_0$ is chosen too close to the admissibility threshold, there is no reason to expect a systematic advantage from routing. This regime is made explicit in the variance-preserving example of Proposition 8.

G Auxiliary comparison principles and sufficient conditions

Nothing in this section is needed for the proof of Theorem 1. It records auxiliary comparison principles and sufficient conditions: a relation between Assumption 3 and classical global score-regularity hypotheses, conservative admissibility certificates for switch selection, sufficient conditions for the early reflection-coupling hypothesis, and an explicit variance-preserving comparison result.

G.1 Relation of score-error monotonicity to global score regularity

Remark 10 (What Assumption 3 actually controls). Assumption 3 is the theorem-level geometric input on the score error. Although it is stated in one-sided Lipschitz form, what the argument actually uses is more specific: for a pair $(x,y)$, the score-error contribution to pairwise distance evolution appears only through
$$g^2(s)\,\langle e_s(x) - e_s(y),\, x - y\rangle.$$
Accordingly, the radial lower profile $\kappa_s$, the switch reserve $m(s)$, and the late Euclidean load $b(s)$ depend on $e_s$ only through the component of $e_s(x) - e_s(y)$ parallel to the separation vector $x - y$. Purely transverse or rotational components are invisible to this mechanism. In this sense Assumption 3 is a pairwise geometric monotonicity condition, closer to the one-sided dissipativity hypotheses standard in SDE analysis [Higham et al., 2002] than to a full global Lipschitz bound on the vector field. By contrast, global Lipschitz assumptions are stronger verification devices, and are also the natural hypotheses under which direct Euclidean $W_2$ arguments are typically closed [Gao et al., 2025, Gao and Zhu, 2024].

Lemma 16 (Sufficient condition for Assumption 3). Suppose that for a.e. $s \in (0,T]$, one of the following holds:

(a) the map $x \mapsto e_s(x)$ is $C^1$, and there exists a measurable $\bar\ell : (0,T] \to \mathbb{R}$ such that
$$\lambda_{\max}\Bigl(\tfrac12\bigl(\nabla e_s(x) + \nabla e_s(x)^\top\bigr)\Bigr) \le \bar\ell(s), \qquad x \in \mathbb{R}^d;$$

(b) there exists a measurable $L_e : (0,T] \to [0,\infty)$ such that
$$\|e_s(x) - e_s(y)\| \le L_e(s)\,\|x - y\|, \qquad x,y \in \mathbb{R}^d;$$

(c) there exist measurable $L_\theta, L_\star : (0,T] \to [0,\infty)$ such that
$$\|s_\theta(x,s) - s_\theta(y,s)\| \le L_\theta(s)\,\|x - y\|, \qquad \|\nabla\log p_s(x) - \nabla\log p_s(y)\| \le L_\star(s)\,\|x - y\|, \qquad x,y \in \mathbb{R}^d.$$

Then Assumption 3 holds with $\ell(s) = \bar\ell(s)$ in case (a), $\ell(s) = L_e(s)$ in case (b), and $\ell(s) = L_\theta(s) + L_\star(s)$ in case (c). In particular, the symmetrized Jacobian bound, global Lipschitz control of $e_s$, and separate Lipschitz control of $s_\theta$ and $\nabla\log p_s$ all imply Assumption 3, but they are not necessary for it.

Proof. Case (a) is the standard symmetrized-Jacobian criterion. Fix $s$ such that the assumption holds and let $x,y \in \mathbb{R}^d$. Writing $\gamma(\tau) = y + \tau(x - y)$, one has
$$e_s(x) - e_s(y) = \int_0^1 \nabla e_s(\gamma(\tau))\,(x - y)\,\mathrm{d}\tau.$$
Taking the inner product with $x - y$, symmetrizing, and using the eigenvalue bound gives
$$\langle e_s(x) - e_s(y),\, x - y\rangle \le \bar\ell(s)\,\|x - y\|^2.$$
For case (b), Cauchy–Schwarz gives
$$\langle e_s(x) - e_s(y),\, x - y\rangle \le \|e_s(x) - e_s(y)\|\,\|x - y\| \le L_e(s)\,\|x - y\|^2.$$
For case (c), since
$$e_s(x) - e_s(y) = \bigl(s_\theta(x,s) - s_\theta(y,s)\bigr) - \bigl(\nabla\log p_s(x) - \nabla\log p_s(y)\bigr),$$
the triangle inequality yields
$$\|e_s(x) - e_s(y)\| \le \bigl(L_\theta(s) + L_\star(s)\bigr)\|x - y\|,$$
and case (b) applies.

Remark 11 (Assumption 3 is strictly weaker than global Lipschitz). The converse of Lemma 16 fails dramatically. Let
$$e_s(x) = A_s x, \qquad A_s^\top = -A_s,$$
where $\|A_s\|_{\mathrm{op}}$ may be arbitrarily large. Then for every $x,y \in \mathbb{R}^d$,
$$\langle e_s(x) - e_s(y),\, x - y\rangle = \langle A_s(x - y),\, x - y\rangle = 0,$$
so Assumption 3 holds with $\ell(s) = 0$. However, the best global Lipschitz bound is
$$\|e_s(x) - e_s(y)\| \le \|A_s\|_{\mathrm{op}}\,\|x - y\|,$$
and this constant can be arbitrarily large. Thus Assumption 3 controls only the symmetric/expansive part of the score error, not the full vector increment. This is exactly why it is the right structural condition for pairwise distance propagation.

Remark 12 (Assumptions 2 and 3 are complementary). Assumption 2 and Assumption 3 address different error channels. The former is an averaged forcing bound under $p_s$:
$$\mathbb{E}_{X\sim p_s}\|e_s(X)\|^2 \le \varepsilon^2(s).$$
The latter is a pairwise geometric monotonicity bound:
$$\langle e_s(x) - e_s(y),\, x - y\rangle \le \ell(s)\,\|x - y\|^2.$$
Neither replaces the other. A field may satisfy Assumption 3 with $\ell(s) = 0$ and still have large $L^2(p_s)$-size; the skew-symmetric example above already shows this as soon as $p_s$ has nontrivial second moment. Conversely, a field may have small averaged $L^2(p_s)$-error while having a large positive one-sided slope on a region of small $p_s$-mass; such a field barely changes the forcing integral but can destroy pairwise contraction.
So the two hypotheses are not redundant: Assumption 2 controls averaged forcing, while Assumption 3 controls pairwise expansion.

G.2 Conservative admissibility and switch-selection certificates

Proposition 6 (Proxy admissibility and proxy certificates). Assume Assumptions 1, 2 and 3, and suppose that
\[ \ell(s) \le \bar\ell(s) \quad \text{and} \quad \varepsilon(s) \le \bar\varepsilon(s) \quad \text{for a.e. } s \in (0,T], \]
where $\bar\ell : (0,T] \to \mathbb{R}$ and $\bar\varepsilon : (0,T] \to [0,\infty)$ are measurable. Define
\[ m^{\mathrm{pr}}(s) := g^2(s) \big( \alpha_s - \bar\ell(s) \big) - f(s), \qquad b^{\mathrm{pr}}(s) := f(s) + g^2(s) \big( M_s + \bar\ell(s) - \alpha_s \big). \]
Then every grid-aligned $s_0$ with
\[ \underline{m}^{\mathrm{pr}}(s_0) := \inf_{u \in [s_0, T]} m^{\mathrm{pr}}(u) > 0 \]
is admissible, and the proof of Theorem 1 goes through with the conservative replacements
\[ m,\ b,\ \varepsilon \ \rightsquigarrow\ m^{\mathrm{pr}},\ b^{\mathrm{pr}},\ \bar\varepsilon. \]
In particular, every such switch yields a certified proxy bound
\[ W_2\big( \mathrm{Law}(Z_{t_N}),\, p_0 \big) \le B^{\mathrm{pr}}_p(s_0), \]
where $B^{\mathrm{pr}}_p(s_0)$ is obtained from $B_p(s_0)$ by replacing the exact geometric inputs by their proxy versions.

Proof. Introduce the proxy lower envelope
\[ \kappa^{\mathrm{pr}}_s(r) := g^2(s) \Big( \alpha_s - \tfrac{1}{r} f_{M_s}(r) - \bar\ell(s) \Big) - f(s). \]
Since $\ell(s) \le \bar\ell(s)$, one has
\[ \kappa_s(r) \ge \kappa^{\mathrm{pr}}_s(r), \qquad r > 0,\ s \in (0,T]. \]
Taking $r \downarrow 0$ and $r \to \infty$ shows
\[ m(s) \ge m^{\mathrm{pr}}(s), \qquad b(s) \le b^{\mathrm{pr}}(s), \qquad s \in (0,T]. \]
Hence $\underline{m}^{\mathrm{pr}}(s_0) > 0$ implies $\underline{m}(s_0) > 0$, so every proxy-admissible switch is genuinely admissible. The remaining claim follows by monotonicity of the constructions used in the proof: a smaller lower profile can only worsen the early contraction input, a larger forcing bound $\bar\varepsilon$ can only enlarge the forcing terms, and a larger Euclidean load $b^{\mathrm{pr}}$ can only enlarge the late propagation factor. Rerunning the early contraction, switch conversion, and late Euclidean propagation arguments with $\kappa^{\mathrm{pr}}, m^{\mathrm{pr}}, b^{\mathrm{pr}}, \bar\varepsilon$ therefore produces a certified upper bound $B^{\mathrm{pr}}_p(s_0)$.
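As a rough illustration of how the proxy admissibility test of Proposition 6 might be screened numerically, the following Python sketch evaluates $m^{\mathrm{pr}}$ and $b^{\mathrm{pr}}$ on a finite grid and tracks the running minimum of $m^{\mathrm{pr}}$ from $T$ downward. The function names, the callable interface, and the sample inputs (taken from the variance-preserving formulas of Section G.4 with arbitrarily chosen constants) are assumptions made for illustration only. A grid minimum merely approximates the infimum over $[s_0, T]$, so this is a screening device and not a certificate unless extra structure such as monotonicity of $m^{\mathrm{pr}}$ is available.

```python
import math
from typing import Callable, List, Sequence, Tuple

Fn = Callable[[float], float]


def proxy_reserve_and_load(s: float, f: Fn, g: Fn, alpha: Fn, M: Fn,
                           lbar: Fn) -> Tuple[float, float]:
    """Evaluate (m_pr(s), b_pr(s)) using the formulas of Proposition 6."""
    g2 = g(s) ** 2
    m_pr = g2 * (alpha(s) - lbar(s)) - f(s)
    b_pr = f(s) + g2 * (M(s) + lbar(s) - alpha(s))
    return m_pr, b_pr


def proxy_admissible_switches(grid: Sequence[float], f: Fn, g: Fn, alpha: Fn,
                              M: Fn, lbar: Fn) -> List[float]:
    """Grid points s0 whose grid-restricted min of m_pr over [s0, T] is > 0.

    Assumption: the largest grid point plays the role of T.
    """
    admissible = []
    running_min = math.inf
    for s in sorted(grid, reverse=True):  # scan from T down to small s
        m_pr, _ = proxy_reserve_and_load(s, f, g, alpha, M, lbar)
        running_min = min(running_min, m_pr)
        if running_min > 0.0:
            admissible.append(s)
    return sorted(admissible)


# Hypothetical sample inputs (VP-type schedule; constants chosen arbitrarily).
BETA, ALPHA0, M0, LBAR = 1.0, 0.3, 2.0, 0.1


def D(s: float) -> float:
    return ALPHA0 + (1.0 - ALPHA0) * math.exp(-BETA * s)


grid = [0.25 * k for k in range(1, 17)]  # s = 0.25, 0.5, ..., 4.0
switches = proxy_admissible_switches(
    grid,
    f=lambda s: BETA / 2.0,
    g=lambda s: math.sqrt(BETA),
    alpha=lambda s: ALPHA0 / D(s),
    M=lambda s: M0 * math.exp(-BETA * s) / D(s) ** 2,
    lbar=lambda s: LBAR,
)
```

Scanning from the largest grid point downward keeps the cost linear in the grid size, since the running minimum at $s_0$ reuses the minimum already computed for all later grid points.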
G.3 Sufficient conditions for the early reflection-coupling hypothesis

Proposition 7 (A convenient sufficient condition for the early coupling input). Fix an admissible switch $s_0 \in \mathcal{S}_{\mathrm{adm}}$ and write $t_s := T - s_0$. Assume:

(a) $g$ is continuous on $[s_0, T]$ and $g(s_0) > 0$;

(b) the map $(t, x) \mapsto \hat b_t(x)$ is Borel measurable on $[0, t_s] \times \mathbb{R}^d$;

(c) there exists $L_{s_0} < \infty$ such that
\[ \| \hat b_t(x) - \hat b_t(y) \| \le L_{s_0} \| x - y \|, \qquad t \in [0, t_s],\ x, y \in \mathbb{R}^d; \]

(d) there exists $B_{s_0} < \infty$ such that
\[ \| \hat b_t(x) \| \le B_{s_0} (1 + \| x \|), \qquad t \in [0, t_s],\ x \in \mathbb{R}^d. \]

Then the learned reverse SDE (13) is strongly well posed on $[0, t_s]$. Moreover, for every initial coupling $\pi \in \Pi(\mu, \nu)$ with $\mu, \nu \in \mathcal{P}_1(\mathbb{R}^d)$ and every subinterval $[u, v] \subseteq [0, t_s]$, there exists a coalescing reflection coupling in the sense of Definition 5; see Eberle [2016], Eberle et al. [2019].

Proof. Assumptions (b)–(d) are the standard measurable, global-Lipschitz, and linear-growth hypotheses yielding strong existence and pathwise uniqueness for (13) on $[0, t_s]$. Fix a subinterval $[u, v] \subseteq [0, t_s]$ and an initial coupling $\pi \in \Pi(\mu, \nu)$. Because the diffusion coefficient is the deterministic isotropic matrix $g(T - t) I_d$, with $g$ continuous and bounded away from $0$ on $[u, v]$ by (a), the standard reflection-coupling construction for isotropic diffusions with globally Lipschitz drift applies on $[u, v]$: one reflects the driving Brownian motion across the hyperplane orthogonal to the current separation vector up to the meeting time and switches to synchronous coupling afterwards. See Eberle [2016, Section 2] and Eberle et al. [2019, Section 2.1]. The present coefficient differs from the unit-diffusion case only by a deterministic scalar factor, so the same construction extends directly on each finite subinterval. Therefore, for every initial coupling $\pi$, there exists a coalescing reflection coupling in the sense of Definition 5.
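The construction described in the proof (reflect the driving noise across the hyperplane orthogonal to the separation vector until the two paths meet, then couple synchronously) can be sketched in discrete time as follows. This is a minimal Euler–Maruyama illustration, not the continuous-time coupling of Definition 5: the function names, the pure-Python vector representation, and in particular the overshoot rule used to declare coalescence are assumptions introduced here, and the argument `g` stands for the value of the diffusion coefficient $g(T - t)$ at the current step.

```python
import math
import random
from typing import Callable, List, Tuple

Vec = List[float]


def reflect(dB: Vec, e: Vec) -> Vec:
    """Reflect dB across the hyperplane orthogonal to the unit vector e."""
    dot = sum(a * b for a, b in zip(dB, e))
    return [a - 2.0 * dot * b for a, b in zip(dB, e)]


def coupled_euler_step(x: Vec, y: Vec, drift: Callable[[Vec], Vec], g: float,
                       dt: float, rng: random.Random) -> Tuple[Vec, Vec]:
    """One Euler-Maruyama step of a reflection coupling for dX = b(X)dt + g dB."""
    dB = [rng.gauss(0.0, math.sqrt(dt)) for _ in x]
    x_new = [a + b * dt + g * w for a, b, w in zip(x, drift(x), dB)]

    diff = [a - b for a, b in zip(x, y)]
    r = math.sqrt(sum(d * d for d in diff))
    if r == 0.0:
        # Paths have already met: synchronous coupling keeps them together.
        return x_new, list(x_new)

    e = [d / r for d in diff]
    dB_y = reflect(dB, e)  # mirrored noise along the separation direction
    y_new = [a + b * dt + g * w for a, b, w in zip(y, drift(y), dB_y)]

    # Assumed discretization convention: if the separation along e changes
    # sign within the step, treat the paths as having met and merge them.
    overlap = sum((a - b) * c for a, b, c in zip(x_new, y_new, e))
    if overlap <= 0.0:
        y_new = list(x_new)
    return x_new, y_new
```

Because a reflection is an isometry, the mirrored increment has the same Gaussian law as the original one, so each marginal path is still an Euler–Maruyama discretization of the same SDE. The overshoot rule is only a heuristic substitute for the exact meeting time, which a discrete scheme hits with probability zero.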
Corollary 4 (A score-model sufficient condition). Fix an admissible switch $s_0 \in \mathcal{S}_{\mathrm{adm}}$. Assume $g$ is continuous on $[s_0, T]$ with $g(s_0) > 0$, that $f$ and $g$ are bounded on $[s_0, T]$, and that there exist constants $L_\theta(s_0), B_{\theta,0}(s_0) < \infty$ such that
\[ \| s_\theta(x,s) - s_\theta(y,s) \| \le L_\theta(s_0) \| x - y \|, \qquad \| s_\theta(0,s) \| \le B_{\theta,0}(s_0), \qquad s \in [s_0, T],\ x, y \in \mathbb{R}^d. \]
Then Proposition 7 applies. In particular, the coupling input required by Lemma 13 is automatic.

Proof. For $t \in [0, t_s]$,
\[ \hat b_t(x) - \hat b_t(y) = f(T - t)(x - y) + g^2(T - t) \big( s_\theta(x, T - t) - s_\theta(y, T - t) \big), \]
hence
\[ \| \hat b_t(x) - \hat b_t(y) \| \le \Big( \| f \|_{L^\infty([s_0,T])} + \| g \|^2_{L^\infty([s_0,T])} L_\theta(s_0) \Big) \| x - y \|. \]
Likewise,
\[ \| \hat b_t(x) \| \le \| f \|_{L^\infty([s_0,T])} \| x \| + \| g \|^2_{L^\infty([s_0,T])} \big( \| s_\theta(0, T - t) \| + L_\theta(s_0) \| x \| \big), \]
so $\| \hat b_t(x) \| \le B_{s_0} (1 + \| x \|)$ for a suitable finite $B_{s_0}$. Thus all hypotheses of Proposition 7 are satisfied.

G.4 Variance-preserving schedules: explicit routed-versus-direct certificates

The point of this example is not merely to compute proxy admissibility explicitly. It is to exhibit a concrete regime in which the phase-aware certificate is structurally sharper than the direct Euclidean certificate. For the variance-preserving schedule, the formulas become fully explicit. In the regime $\alpha < 1$, Gaussian smoothing increases $\alpha_s$ monotonically, so both the opening of the admissible radial window and the routed-versus-direct comparison can be written in closed form.

Proposition 8 (VP proxy formulas and explicit routed-versus-direct certificates). Consider the variance-preserving schedule
\[ f(s) \equiv \frac{\beta}{2}, \qquad g(s) \equiv \sqrt{\beta}, \qquad \beta > 0, \]
and assume the proxy bounds
\[ \ell(s) \le \bar\ell, \qquad \varepsilon(s) \le \bar\varepsilon, \qquad s \in (0,T], \]
for constants $\bar\ell \in \mathbb{R}$ and $\bar\varepsilon \ge 0$. Let
\[ D(s) := \alpha + (1 - \alpha) e^{-\beta s}. \]
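Assume moreover that the grid is uniform, $t_k = kh$, and that
\[ \big( \mathbb{E}[ \xi_k(Z_{t_k})^2 ] \big)^{1/2} \le C_{\mathrm{sch}} h^q, \qquad k = 0, \dots, N - 1. \]
For a proxy-admissible grid-aligned switch $s_0$, let
\[ t_s := T - s_0 = t_K, \qquad L := T - s_0, \]
and let $\varphi^{\mathrm{pr}}_{s_0}$, $c^{\mathrm{pr}}(s_0)$, and $C^{\mathrm{sw}}_{p,\mathrm{pr}}(s_0)$ be the proxy switch metric, damping rate, and conversion constant furnished by Proposition 6 and (34)–(35) after replacing $m, b, \varepsilon \rightsquigarrow m^{\mathrm{pr}}, b^{\mathrm{pr}}, \bar\varepsilon$. Then the following hold.

(i) The smoothed weak-log-concavity parameters are
\[ a(s) = e^{-\beta s/2}, \qquad \sigma^2(s) = 1 - e^{-\beta s}, \qquad \alpha_s = \frac{\alpha}{D(s)}, \qquad M_s = \frac{M e^{-\beta s}}{D(s)^2}, \]
and the proxy reserve and load are
\[ m^{\mathrm{pr}}(s) = \beta \Big( \frac{\alpha}{D(s)} - \bar\ell - \frac12 \Big), \qquad b^{\mathrm{pr}}(s) = \beta \Big( \frac{M e^{-\beta s}}{D(s)^2} + \bar\ell + \frac12 - \frac{\alpha}{D(s)} \Big). \]

(ii) Suppose $\alpha < 1$ and $\bar\ell < \frac12$. Define
\[ s^\star_{\mathrm{pr}} := \max\Big\{ 0,\ \frac{1}{\beta} \log\Big( \frac{(\bar\ell + \frac12)(1 - \alpha)}{\alpha (\frac12 - \bar\ell)} \Big) \Big\}. \]
Then every grid-aligned $s_0 \in [s^\star_{\mathrm{pr}}, T]$ is proxy-admissible, provided $s^\star_{\mathrm{pr}} \le T$. Equivalently, proxy admissibility begins exactly when $\alpha_{s_0} > \bar\ell + \frac12$. For later use, it is also convenient to record the primitive
\[ \Gamma^{\mathrm{pr}}(s_0) := \int_0^{s_0} b^{\mathrm{pr}}(u) \, du = \beta \Big( \bar\ell + \frac12 \Big) s_0 - \log\big( \alpha e^{\beta s_0} + 1 - \alpha \big) + M_{\mathrm{VP}}(s_0), \]
where
\[ M_{\mathrm{VP}}(s_0) := \begin{cases} \dfrac{M}{1 - \alpha} \Big( \dfrac{1}{D(s_0)} - 1 \Big), & \alpha \ne 1, \\[2mm] M (1 - e^{-\beta s_0}), & \alpha = 1. \end{cases} \]
The second branch is included only to record the closed-form primitive in the degenerate case $\alpha = 1$.

(iii) Define the common late-window proxy term
\[ \Delta^{\mathrm{pr}}_{\mathrm{late}}(s_0) := C_{\mathrm{sch}} h^q \sum_{k=K}^{N-1} e^{\Gamma^{\mathrm{pr}}(T - t_{k+1})} + \beta \bar\varepsilon \int_0^{s_0} e^{\Gamma^{\mathrm{pr}}(u)} \, du, \]
and the routed early budget
\[ \Xi^{\mathrm{pr}}_{\mathrm{sw}}(s_0) := e^{-c^{\mathrm{pr}}(s_0) L}\, W_{\varphi^{\mathrm{pr}}_{s_0}}(\hat p_T, p_T) + C_{\mathrm{sch}} h^q \, \frac{1 - e^{-c^{\mathrm{pr}}(s_0) L}}{1 - e^{-c^{\mathrm{pr}}(s_0) h}} + \frac{\beta \bar\varepsilon}{c^{\mathrm{pr}}(s_0)} \big( 1 - e^{-c^{\mathrm{pr}}(s_0) L} \big). \]
Then the phase-aware proxy certificate satisfies
\[ W_2\big( \mathrm{Law}(Z_{t_N}),\, p_0 \big) \le B^{\mathrm{pr}}_p(s_0) \le \Delta^{\mathrm{pr}}_{\mathrm{late}}(s_0) + e^{\Gamma^{\mathrm{pr}}(s_0)}\, C^{\mathrm{sw}}_{p,\mathrm{pr}}(s_0) \big( \Xi^{\mathrm{pr}}_{\mathrm{sw}}(s_0) \big)^{\theta_p}. \]

(iv) Define the direct Euclidean proxy certificate by
\[ D^{\mathrm{dir},\mathrm{pr}}(s_0) := \Delta^{\mathrm{pr}}_{\mathrm{late}}(s_0) + e^{\Gamma^{\mathrm{pr}}(T)}\, W_2(\hat p_T, p_T) + C_{\mathrm{sch}} h^q \sum_{k=0}^{K-1} e^{\Gamma^{\mathrm{pr}}(T - t_{k+1})} + \beta \bar\varepsilon \int_{s_0}^{T} e^{\Gamma^{\mathrm{pr}}(u)} \, du. \]
If
\[ \underline{b}^{\mathrm{pr}}(s_0) := \inf_{u \in [s_0, T]} b^{\mathrm{pr}}(u) > 0, \]
then
\[ D^{\mathrm{dir},\mathrm{pr}}(s_0) - \Delta^{\mathrm{pr}}_{\mathrm{late}}(s_0) \ge e^{\Gamma^{\mathrm{pr}}(s_0)}\, \Xi^{\mathrm{pr}}_{\mathrm{dir}}(s_0), \]
where
\[ \Xi^{\mathrm{pr}}_{\mathrm{dir}}(s_0) := e^{\underline{b}^{\mathrm{pr}}(s_0) L}\, W_2(\hat p_T, p_T) + C_{\mathrm{sch}} h^q \, \frac{e^{\underline{b}^{\mathrm{pr}}(s_0) L} - 1}{e^{\underline{b}^{\mathrm{pr}}(s_0) h} - 1} + \frac{\beta \bar\varepsilon}{\underline{b}^{\mathrm{pr}}(s_0)} \big( e^{\underline{b}^{\mathrm{pr}}(s_0) L} - 1 \big). \]
Consequently, whenever
\[ C^{\mathrm{sw}}_{p,\mathrm{pr}}(s_0) \big( \Xi^{\mathrm{pr}}_{\mathrm{sw}}(s_0) \big)^{\theta_p} < \Xi^{\mathrm{pr}}_{\mathrm{dir}}(s_0), \]
one has the strict certificate-level improvement
\[ B^{\mathrm{pr}}_p(s_0) < D^{\mathrm{dir},\mathrm{pr}}(s_0). \]

Proof. Under the VP schedule,
\[ a(s) = \exp\Big( -\int_0^s \frac{\beta}{2} \, du \Big) = e^{-\beta s/2}, \qquad \sigma^2(s) = \int_0^s e^{-\beta (s - u)} \beta \, du = 1 - e^{-\beta s}. \]
Substituting into (16) gives
\[ \alpha_s = \frac{\alpha}{\alpha + (1 - \alpha) e^{-\beta s}} = \frac{\alpha}{D(s)}, \qquad M_s = \frac{M e^{-\beta s}}{D(s)^2}, \]
which proves the formulas in (i). The expressions for $m^{\mathrm{pr}}$ and $b^{\mathrm{pr}}$ are the specialization of Proposition 6 to $f \equiv \beta/2$, $g^2 \equiv \beta$, and the constant proxy bound $\bar\ell$.

Assume now $\alpha < 1$. Then $s \mapsto \alpha_s$ is increasing, hence so is $s \mapsto m^{\mathrm{pr}}(s)$. Therefore
\[ \underline{m}^{\mathrm{pr}}(s_0) > 0 \iff m^{\mathrm{pr}}(s_0) > 0 \iff \frac{\alpha}{D(s_0)} > \bar\ell + \frac12. \]
Solving this inequality gives exactly $s_0 \ge s^\star_{\mathrm{pr}}$, proving the admissibility-threshold claim in (ii). Independently of that monotonicity argument, the primitive $\Gamma^{\mathrm{pr}}$ is obtained from
\[ \Gamma^{\mathrm{pr}}(s_0) = \beta \Big( \bar\ell + \frac12 \Big) s_0 + \beta \int_0^{s_0} \frac{M e^{-\beta u}}{D(u)^2} \, du - \beta \int_0^{s_0} \frac{\alpha}{D(u)} \, du. \]
The two integrals are explicit:
\[ \beta \int_0^{s_0} \frac{\alpha}{D(u)} \, du = \log\big( \alpha e^{\beta s_0} + 1 - \alpha \big), \qquad \beta \int_0^{s_0} \frac{M e^{-\beta u}}{D(u)^2} \, du = M_{\mathrm{VP}}(s_0). \]
This gives the stated closed form, with the $\alpha = 1$ branch obtained by direct integration.

For (iii), the late-window proxy term is just the split late contribution under the same conservative replacements.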
On the early window,
\[ \sum_{k=0}^{K-1} e^{-c^{\mathrm{pr}}(s_0)(t_s - t_{k+1})} \big( \mathbb{E}[ \xi_k(Z_{t_k})^2 ] \big)^{1/2} \le C_{\mathrm{sch}} h^q \sum_{k=0}^{K-1} e^{-c^{\mathrm{pr}}(s_0)(t_s - t_{k+1})}. \]
Since $t_s = t_K = Kh$, one has $t_s - t_{k+1} = (K - k - 1) h$, and therefore
\[ \sum_{k=0}^{K-1} e^{-c^{\mathrm{pr}}(s_0)(t_s - t_{k+1})} = \sum_{j=0}^{K-1} e^{-c^{\mathrm{pr}}(s_0) j h} = \frac{1 - e^{-c^{\mathrm{pr}}(s_0) L}}{1 - e^{-c^{\mathrm{pr}}(s_0) h}}. \]
Because $\bar\varepsilon$ is constant,
\[ \int_{s_0}^{T} e^{-c^{\mathrm{pr}}(s_0)(u - s_0)} \beta \bar\varepsilon \, du = \frac{\beta \bar\varepsilon}{c^{\mathrm{pr}}(s_0)} \big( 1 - e^{-c^{\mathrm{pr}}(s_0) L} \big). \]
Substituting these estimates into Proposition 6 proves the routed proxy certificate.

For (iv), $D^{\mathrm{dir},\mathrm{pr}}(s_0)$ is the direct full-horizon Euclidean certificate obtained by making the same conservative replacements in Proposition 5 and splitting at the same index $K$. If $\underline{b}^{\mathrm{pr}}(s_0) > 0$, then for every $r \in [s_0, T]$,
\[ \Gamma^{\mathrm{pr}}(r) - \Gamma^{\mathrm{pr}}(s_0) = \int_{s_0}^{r} b^{\mathrm{pr}}(u) \, du \ge \underline{b}^{\mathrm{pr}}(s_0) (r - s_0). \]
Applying this to $r = T$, $r = T - t_{k+1}$, and inside the forcing integral gives
\[ e^{\Gamma^{\mathrm{pr}}(T)} \ge e^{\Gamma^{\mathrm{pr}}(s_0)} e^{\underline{b}^{\mathrm{pr}}(s_0) L}, \]
\[ \sum_{k=0}^{K-1} e^{\Gamma^{\mathrm{pr}}(T - t_{k+1})} \ge e^{\Gamma^{\mathrm{pr}}(s_0)} \sum_{j=0}^{K-1} e^{\underline{b}^{\mathrm{pr}}(s_0) j h} = e^{\Gamma^{\mathrm{pr}}(s_0)} \, \frac{e^{\underline{b}^{\mathrm{pr}}(s_0) L} - 1}{e^{\underline{b}^{\mathrm{pr}}(s_0) h} - 1}, \]
and
\[ \int_{s_0}^{T} e^{\Gamma^{\mathrm{pr}}(u)} \, du \ge e^{\Gamma^{\mathrm{pr}}(s_0)} \int_0^{L} e^{\underline{b}^{\mathrm{pr}}(s_0) v} \, dv = e^{\Gamma^{\mathrm{pr}}(s_0)} \, \frac{e^{\underline{b}^{\mathrm{pr}}(s_0) L} - 1}{\underline{b}^{\mathrm{pr}}(s_0)}. \]
This proves the lower bound on $D^{\mathrm{dir},\mathrm{pr}}(s_0) - \Delta^{\mathrm{pr}}_{\mathrm{late}}(s_0)$, and the strict comparison criterion follows immediately.

Remark 13 (Long mismatch windows). This example makes the advantage of routing completely explicit. If $c^{\mathrm{pr}}(s_0) L \gg 1$, then the routed early budget has already saturated: the terminal mismatch carries the factor $e^{-c^{\mathrm{pr}}(s_0) L}$, the pre-switch discretization term is of order $C_{\mathrm{sch}} h^{q-1} / c^{\mathrm{pr}}(s_0)$, and the forcing term is of order $\beta \bar\varepsilon / c^{\mathrm{pr}}(s_0)$. By contrast, if $\underline{b}^{\mathrm{pr}}(s_0) > 0$, then the direct Euclidean certificate transports the same three inputs with positive exponential weights over the same window.
So on a genuine mismatch window (radial damping already active, residual Euclidean load still positive), the routed certificate can be strictly sharper, provided the one-time conversion cost $C^{\mathrm{sw}}_{p,\mathrm{pr}}(s_0)$ is not too large.
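To make the closed-form quantities of Proposition 8 concrete, the following Python sketch evaluates $D(s)$, $m^{\mathrm{pr}}$, $b^{\mathrm{pr}}$, $s^\star_{\mathrm{pr}}$, and $\Gamma^{\mathrm{pr}}$, cross-checks the primitive against a Simpson quadrature of $b^{\mathrm{pr}}$, and encodes the comparison criterion $C^{\mathrm{sw}}_{p,\mathrm{pr}} (\Xi^{\mathrm{pr}}_{\mathrm{sw}})^{\theta_p} < \Xi^{\mathrm{pr}}_{\mathrm{dir}}$. All function and argument names are hypothetical. The damping rate `c`, the conversion constant `C_sw`, and the mismatch values `W_phi` and `W2` are treated as user-supplied inputs because their defining formulas, (34)–(35), lie outside this section. The threshold helper additionally assumes $0 < \alpha < 1$ and $-\frac12 < \bar\ell < \frac12$ so that the logarithm is well defined.

```python
import math


def D(s, alpha, beta):
    return alpha + (1.0 - alpha) * math.exp(-beta * s)


def m_pr(s, alpha, beta, lbar):
    """Proxy reserve from Proposition 8(i)."""
    return beta * (alpha / D(s, alpha, beta) - lbar - 0.5)


def b_pr(s, alpha, M, beta, lbar):
    """Proxy Euclidean load from Proposition 8(i)."""
    d = D(s, alpha, beta)
    return beta * (M * math.exp(-beta * s) / d ** 2 + lbar + 0.5 - alpha / d)


def s_star_pr(alpha, beta, lbar):
    """Admissibility threshold; assumes 0 < alpha < 1 and -1/2 < lbar < 1/2."""
    arg = (lbar + 0.5) * (1.0 - alpha) / (alpha * (0.5 - lbar))
    return max(0.0, math.log(arg) / beta)


def gamma_pr(s0, alpha, M, beta, lbar):
    """Closed-form primitive of b_pr on [0, s0]."""
    if alpha == 1.0:
        m_vp = M * (1.0 - math.exp(-beta * s0))
    else:
        m_vp = M / (1.0 - alpha) * (1.0 / D(s0, alpha, beta) - 1.0)
    log_term = math.log(alpha * math.exp(beta * s0) + 1.0 - alpha)
    return beta * (lbar + 0.5) * s0 - log_term + m_vp


def gamma_pr_numeric(s0, alpha, M, beta, lbar, n=2000):
    """Composite Simpson approximation of the same integral (n must be even)."""
    h = s0 / n
    total = b_pr(0.0, alpha, M, beta, lbar) + b_pr(s0, alpha, M, beta, lbar)
    for i in range(1, n):
        total += (4.0 if i % 2 else 2.0) * b_pr(i * h, alpha, M, beta, lbar)
    return total * h / 3.0


def theta(p):
    """Conversion exponent theta_p = (p - 2) / (2 (p - 1))."""
    return (p - 2.0) / (2.0 * (p - 1.0))


def xi_sw(c, L, h, W_phi, C_sch, q, beta, eps_bar):
    """Routed early budget; c > 0 is the (user-supplied) proxy damping rate."""
    damp = -math.expm1(-c * L)  # 1 - exp(-c L)
    return (math.exp(-c * L) * W_phi
            + C_sch * h ** q * damp / (-math.expm1(-c * h))
            + beta * eps_bar / c * damp)


def xi_dir(b_low, L, h, W2, C_sch, q, beta, eps_bar):
    """Direct early budget; b_low > 0 is the infimum of b_pr over [s0, T]."""
    grow = math.expm1(b_low * L)  # exp(b L) - 1
    return (math.exp(b_low * L) * W2
            + C_sch * h ** q * grow / math.expm1(b_low * h)
            + beta * eps_bar / b_low * grow)


def routed_beats_direct(C_sw, p, xi_sw_val, xi_dir_val):
    """Sufficient criterion of Proposition 8 for B_pr < D_dir_pr."""
    return C_sw * xi_sw_val ** theta(p) < xi_dir_val
```

The criterion is only sufficient: when it fails, the sketch says nothing about which certificate is smaller, since the direct side is compared through a lower bound.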
