Perturbation Duality for Robust and Distributionally Robust Optimization: Short and General Proofs

Louis L. Chen∗   Jake Roth†   Johannes O. Royset‡

Abstract

Duality is a foundational tool in robust and distributionally robust optimization (RO and DRO), underpinning both analytical insights and tractable reformulations. The prevailing approaches in the literature primarily rely on saddle-point arguments, Lagrangian techniques, and conic duality. In contrast, this paper applies perturbation duality in the sense of Fenchel–Rockafellar convex analysis and demonstrates its effectiveness as a general and unifying methodology for deriving dual formulations in RO and DRO. We first apply perturbation duality to a recently proposed DRO framework that unifies $\phi$-divergence and Wasserstein ambiguity sets through optimal transport with conditional moment constraints. We establish the associated dual representation without imposing compactness assumptions previously conjectured to be necessary, instead introducing alternative conditions motivated by perturbation analysis. We then revisit the concept of robust duality, commonly described as "primal-worst equals dual-best," and show that perturbation-based formulations provide a unified and transparent characterization of this principle. In particular, we develop a bifunction-based representation that encompasses existing formulations in the literature and yields concise and general proofs, substantially simplifying recent results. This work positions perturbation duality as a versatile and underutilized framework for RO and DRO, offering both conceptual unification and technical generality across a broad class of models.

1 Introduction

Duality is a fundamental tool for both analysis and application in the field of robust optimization.
The literature abounds with techniques and paradigms for formulating and proving dual problems, but the most popular approaches in (distributionally) robust optimization include saddle min-max, Lagrangian, and conic duality. An alternative perspective derives dual formulations through perturbation duality in the sense of Fenchel–Rockafellar convex analysis (Rockafellar, 1974). In this approach, uncertain optimization problems are embedded into a family of perturbation problems whose value function is analyzed using convex conjugacy. Although this framework has long been central in convex analysis and infinite-dimensional optimization, it has only more recently been applied explicitly to (distributionally) robust optimization models. We show that such an application can be natural, blending well with mixing arguments in the context of distributionally robust optimization, and useful in its generality for robust optimization as a whole. In this note, we use the approach to: (1) derive new results and insights in a recent, prominent robust paradigm centered around optimal transport (Blanchet et al., 2025); and (2) unify a central, overarching principle for the field known as robust duality, or primal-worst equals dual-best (Zhen et al., 2025; Beck and Ben-Tal, 2009), providing short and general proofs.

∗ Department of Operations Research, Naval Postgraduate School. louis.chen@nps.edu
† Department of Industrial and Systems Engineering, University of Minnesota. rothjakem@gmail.com
‡ Department of Industrial and Systems Engineering, University of Southern California. royset@usc.edu

1.1 Literature Review

The majority of the robust optimization (RO) literature employs conic duality, e.g., Ben-Tal and Nemirovski (1999, 2002); Ben-Tal et al. (2009); Bertsimas and Sim (2004); Bertsimas et al. (2011).
Conic duality is also popular in the distributionally robust optimization (DRO) literature, e.g., Esfahani and Kuhn (2018); Zhao and Guan (2018), with some more recent works that employ Wasserstein constraints (Blanchet and Murthy, 2019; Gao and Kleywegt, 2023; Wang et al., 2025) leveraging a combination of Lagrangian and Fenchel-type duality theorems, including Kantorovich duality.

The use of perturbation analysis in the study of duality for (distributionally) robust models is considerably less common, and we review the few entries in the literature here. Li et al. (2011) and later Dinh et al. (2017) provide a general robust conjugate duality framework for convex optimization problems with data uncertainty, establishing dual representations of robust counterparts using convex conjugates and perturbation functions; specifically, they define and investigate robust (strong) duality, first introduced as primal-worst equals dual-best in Beck and Ben-Tal (2009). Related perturbation-based approaches have also been studied in the context of vector and multiobjective robust optimization. For example, Chai (2021) investigates robust strong duality for uncertain optimization problems using an abstract conjugate duality framework, introducing generalized conjugate functions to derive dual problems even in nonconvex settings.

1.1.1 Related Work

We conclude with a review of works most related to our contributions. Two methods for modeling distributional ambiguity, $\phi$-divergence (Ben-Tal et al., 2013; Wang et al., 2025) and Wasserstein distance (Esfahani and Kuhn, 2018; Blanchet and Murthy, 2019; Gao and Kleywegt, 2023), have become popular, if not standard, modeling approaches in distributionally robust optimization. They have been recently unified by Blanchet et al.
(2025) in a framework combining optimal transport with conditional moment constraints, whose duality generalizes those of the prior models. In this note, we derive new results and insights for this framework via perturbation duality; specifically, we circumvent a compactness assumption that Blanchet et al. (2025) conjectured was necessary for its duality result. Like Zhang et al. (2024), we address the role of the Interchangeability Principle (Rockafellar, 1971) in this distributionally robust dual formulation, and we also strive for cogent, short, and general proofs throughout via perturbations.

As for a unified duality theory for robust and distributionally robust optimization, a framework was first proposed in Beck and Ben-Tal (2009), followed by subsequent investigation in Jeyakumar and Li (2014); Li et al. (2011); Dinh et al. (2017), and most recently in Zhen et al. (2025). Under the names robust duality and primal-worst equals dual-best, the literature has sought ways to derive dual formulations of the prototypical min-max forms of (distributionally) robust optimization problems. To date, there has been some inconsistency, with both nonconvex and convex formulations proposed. We demonstrate how perturbations can provide a language general enough to unify this concept, and show how previous attempts to do so by Zhen et al. (2025) can be greatly simplified.

1.2 Contributions

After Section 2's review of convex duality in terms of bifunctions, Sections 3 and 4 present our contributions, the main ones highlighted here:

• Section 3: Theorems 1, 2, and 3 establish the duality result of Blanchet et al. (2025) (with the Interchangeability Principle) but without compactness, relying instead on the alternative perturbation-inspired Assumptions 1 and 2.
• Section 4: Leveraging bifunctions, Definition 4 unifies the dual-best formulations throughout the literature. Propositions 3 and 4 demonstrate how proofs of the robust duality results in Zhen et al. (2025) can be dramatically shortened via perturbations.

2 Preliminaries

2.1 Notation

We briefly set some notation. We use "$:=$" to denote definition and "$\equiv$" to denote equivalence. We let $\overline{\mathbb{R}} := \mathbb{R} \cup \{\pm\infty\}$ and $\mathbb{R}_+$ denote the extended reals and nonnegative reals, respectively. For a positive integer $m$, we let $[m] := \{0, 1, \ldots, m\}$ and $(m) := \{1, \ldots, m\}$. Given a normed vector space $X$, we let $X^*$ denote its continuous dual space, with $\langle \cdot, \cdot \rangle$ denoting the canonical bilinear pairing, i.e., $\langle x, x^* \rangle \equiv \langle x^*, x \rangle := x^*(x)$ for all $x \in X$, $x^* \in X^*$. The effective domain of a convex function $f : X \to \overline{\mathbb{R}}$ is denoted $\operatorname{dom} f := \{x \in X : f(x) < +\infty\}$; $f$ is proper if $f > -\infty$ and $\operatorname{dom} f \neq \emptyset$; the closure of $f$ is denoted by $\operatorname{cl} f$, and $f$ is closed if $f = \operatorname{cl} f$; its subdifferential at $x \in X$ is denoted $\partial f(x)$; its convex conjugate $f^* : X^* \to \overline{\mathbb{R}}$ is defined via $x^* \mapsto \sup_x \{\langle x^*, x \rangle - f(x)\}$, and the concave conjugate $f^\circ : X^* \to \overline{\mathbb{R}}$ via $f^\circ(x^*) := -(-f)^*(-x^*)$. For a concave function $g$, we define the preceding for the convex function $(-g)$. We use $\iota_X(x)$ to denote the 0-$\infty$ indicator taking value zero if $x \in X$ and $+\infty$ otherwise; similarly, $\mathbf{1}_X(x)$ denotes the 1-0 indicator. For a scalar $c \in \mathbb{R}_+$ and a proper, convex function $f : X \to \overline{\mathbb{R}}$, convexity is preserved by both left scalar multiplication, $(cf)(x) := c \cdot f(x)$, and right scalar multiplication, $(fc)(x) := c \cdot f(x/c)$ when $c > 0$ and $(f0)(x) := \iota_{\{0\}}(x)$ when $c = 0$ (see p. 35 of Rockafellar (1970)).

We also fix some standard notation from functional analysis and measure theory.
When $(\mathcal{X}, \mathcal{A})$ is a measurable space with measures $\mu, \nu$, we let $d_{\mathrm{TV}}$ denote the total variation norm $d_{\mathrm{TV}}(\mu, \nu) := \frac{1}{2} \sup_{A \in \mathcal{A}} |\mu(A) - \nu(A)|$. Given a finite measure $\mu$ on $(\mathcal{X}, \mathcal{A})$, we write $L^1(\mu)$ for the vector space of (equivalence classes of) $\mathcal{A}$-measurable functions $f : \mathcal{X} \to \mathbb{R}$ such that $\int_{\mathcal{X}} |f| \, d\mu < \infty$, with norm $\|f\|_{L^1(\mu)} := \int_{\mathcal{X}} |f(x)| \, \mu(dx)$. Similarly, $L^\infty(\mu)$ denotes the vector space of essentially bounded, $\mathcal{A}$-measurable functions, equipped with the essential-supremum norm $\|f\|_{L^\infty(\mu)} := \operatorname{ess\,sup}_{x \in \mathcal{X}} |f(x)|$. We note the duality relationship $(L^1(\mu))^* = L^\infty(\mu)$ for a finite measure $\mu$ (see Theorem 4.14 of Brezis (2011)). We also let $C_b(\mathcal{X})$ denote the vector space of continuous, bounded functions on $\mathcal{X}$.

2.2 Bifunctions

Let $X$ be a normed vector space. If a convex (primal objective) function $f : X \to \overline{\mathbb{R}}$ is defined on $X$, then $\inf_{x \in X} f(x)$ will be referred to as a primal optimization problem.$^1$ Letting $U$ be another normed vector space, a convex function $F : U \times X \to \overline{\mathbb{R}}$ satisfying $F(0, x) = f(x)$ for all $x \in X$ will be referred to as a (primal) bifunction, which in turn yields a convex perturbation function $p : U \to \overline{\mathbb{R}}$ by
$$p(u) := (\inf_x F)(u) = \inf_{x \in X} F(u, x).$$
In words, a bifunction $F$ effectively yields a $u$-parametrized family of optimization problems, $\{p(u) : u \in U\}$.$^2$

$^1$ We adopt the convention of having an optimization problem be made synonymous with its optimal value in $\overline{\mathbb{R}}$ for the sake of expediency. We refer the interested reader to discussions in Rockafellar (1970, Sections 28-29) that clarify possible misunderstandings and the technicalities around defining a "problem" in this framework.
$^2$ "This is not so much a new concept as a different way of treating an old concept, the distinction between 'variables' and 'parameters.'" (Rockafellar, 1970, p. 291)

2.2.1 Dual Bifunctions

Further, a bifunction $F$ admits a concave dual bifunction $F^d : X^* \times U^* \to \overline{\mathbb{R}}$ defined as
$$F^d(x^*, u^*) := -(F^*)(-u^*, x^*) = \inf_{x \in X,\, u \in U} \{F(u, x) - \langle x^*, x \rangle + \langle u^*, u \rangle\},$$
where $F^*$ denotes the convex conjugate of $F$. Analogously, $F^d$ yields a dual perturbation function $q : X^* \to \overline{\mathbb{R}}$ given by
$$q(x^*) := (\sup_{u^*} F^d)(x^*) = \sup_{u^* \in U^*} F^d(x^*, u^*),$$
whereby $q(0) = \sup_{u^* \in U^*} F^d(0, u^*)$ is the dual optimization problem. Symmetrically, when $G(u, x)$ is a concave bifunction, its convex dual bifunction $G^d : X^* \times U^* \to \overline{\mathbb{R}}$ is given by
$$G^d(x^*, u^*) := -G^\circ(-u^*, x^*) = \sup_{x, u} \{G(u, x) - \langle x^*, x \rangle + \langle u^*, u \rangle\}.$$
We note that $(F^d)^d$, defined over $U^{**} \times X^{**}$, agrees with $\operatorname{cl} F$ over the subspace $U \times X$; the same holds for $(G^d)^d$ and $\operatorname{cl} G$. Hence, when $X$ and $U$ are reflexive, $(F^d)^d \equiv F$ when $F$ is closed.

2.2.2 Lagrangians

Finally, given a convex bifunction $F : U \times X \to \overline{\mathbb{R}}$, we define its Lagrangian $\mathcal{L} : U^* \times X \to \overline{\mathbb{R}}$ by
$$\mathcal{L}(u^*, x) := -[F(\cdot, x)]^*(-u^*) = \inf_u \{F(u, x) + \langle u^*, u \rangle\},$$
for which it will (usefully) follow that for any $u^{**} \in U$,
$$\sup_{u^*} \{\mathcal{L}(u^*, x) + \langle u^{**}, u^* \rangle\} = [-\mathcal{L}(\cdot, x)]^*(u^{**}) = [F(\cdot, x)]^{**}(u^{**}) = \operatorname{cl}[F(\cdot, x)](u^{**}), \quad \forall x \in X. \tag{1}$$
Lagrangians associated with concave bifunctions are treated symmetrically with "sup" in place of "inf".

2.3 Convex Perturbation Duality

In the following proposition, we record a summary of duality relations; for reference, see Ekeland and Témam (1999) and Zălinescu (2002, Theorem 2.6.1).

Proposition 1 (Convex duality). Let $X$ and $U$ be normed vector spaces, with $X^*$ and $U^*$ denoting their continuous dual spaces. Let $F : U \times X \to \overline{\mathbb{R}}$ be a given convex bifunction. Let $p : U \to \overline{\mathbb{R}}$ given by $u \mapsto \inf_{x \in X} F(u, x)$ denote its corresponding convex perturbation function.
Also let $F^d : X^* \times U^* \to \overline{\mathbb{R}}$ denote the dual (concave) bifunction of $F$ with (concave) perturbation function $q : X^* \to \overline{\mathbb{R}}$ given by $x^* \mapsto \sup_{u^* \in U^*} F^d(x^*, u^*)$. Then the following statements hold:

(a) Weak duality (Ekeland and Témam (1999, Proposition III.1.1)): $p(0) \geq (\operatorname{cl} p)(0) = q(0)$;
(b) Normality (zero duality gap) (ibid., Proposition III.2.1): $p(0) = q(0)$ iff $(\operatorname{cl} p)(0) = p(0)$;
(c) Stability (strong duality) (ibid., Proposition III.2.2): $\partial p(0)$ is the set of optimal solutions to the dual problem $q(0)$; in particular, if $\partial p(0) \neq \emptyset$, then $p(0) = q(0)$;
(c$^*$) Interior Slater (ibid., Propositions I.2.5 and III.2.3): if $0 \in \operatorname{int} \operatorname{dom} p$ and $p(0) \in \mathbb{R}$, then $\partial p(0) \neq \emptyset$;
(c$^{**}$) Relative interior Slater: if $0 \in \operatorname{ri} \operatorname{dom} p$ and $p(0) \in \mathbb{R}$, then $\partial p(0) \neq \emptyset$, i.e., there exists $u^* \in \partial p(0)$;
(d) If $F$ is closed, then $(F^d)^d|_{U \times X} = \operatorname{cl} F = F$; if, in addition, $X$, $U$ are reflexive, then $(F^d)^d \equiv F$, and all of the preceding statements remain valid with $p$ replaced by the perturbation function associated with the convex bifunction $-F^d$ and $q$ replaced by the perturbation function associated with the concave bifunction $-(F^d)^d$; in words, the dual of the dual is the primal (up to sign).

In summary, a convex bifunction $F$ yields a pair of primal and dual optimization problems that are equal in value (respectively, equal with a dual solution) if and only if the corresponding perturbation function $p$ is closed at 0 (respectively, subdifferentiable at 0). We remark that although the use of Proposition 1(c$^*$) is a common strategy to argue for (strong) duality, it is not always a viable strategy in infinite-dimensional settings. Indeed, many important sets, particularly those defined by abstract constraints (that will not be relaxed/perturbed), lack interiors.
For example, it is readily verified that the space of probability measures has no interior in the space of finite signed measures under the TV-norm and weak-* topologies. Some recent works (Zălinescu, 2015; Cuong et al., 2022, 2023; Cuong and Tran, 2025) have provided possible remedies via the notion of generalized interior, which guarantee either primal or dual solution existence.

3 Conditional Moment Wasserstein Duality via Perturbations

3.1 Problem Setting and Formulation

We now establish the setting of Blanchet et al. (2025), introducing several elements (spaces, measures, and shorthand notations and conventions) on which the exposition to follow will crucially depend.

3.1.1 Measure Spaces $\mathcal{V}$, $\mathcal{W}$

$\mathcal{V}$ will be a convex subset of a vector space, equipped with $\sigma$-algebra $\mathcal{G}$. $\mathcal{W}$ will be a convex subset of $\mathbb{R}$, equipped with $\sigma$-algebra $\mathcal{B}$. In the sequel, we will occasionally consider settings incorporating topological structure in which $\mathcal{G} = \sigma(\tau_{\mathcal{V}})$ and/or $\mathcal{B} = \sigma(\tau_{\mathcal{W}})$, where $\tau_{\mathcal{V}}$ and $\tau_{\mathcal{W}}$ are topologies defined on $\mathcal{V}$ and $\mathcal{W}$ respectively; however, in all such cases, $\tau_{\mathcal{W}}$ will denote the standard (subspace) topology for $\mathcal{W}$ embedded in $\mathbb{R}$.

3.1.2 Product Measure Space $\mathcal{U}$, An Empirical Measure, and Couplings

We write $\mathcal{U} := \mathcal{V} \times \mathcal{W}$ and $\mathcal{F} := \mathcal{G} \times \mathcal{B}$.$^3$ We denote the product measure space by $(\mathcal{U}, \mathcal{F})$ and let $\mathcal{P}(\mathcal{U})$ denote its set of probability measures, for which $\hat\mu \in \mathcal{P}(\mathcal{U})$ will be given with marginal written $\hat\nu(\cdot) := \hat\mu(\cdot \times \mathcal{W})$. Similarly, over the product space $(\mathcal{U} \times \mathcal{U}, \mathcal{F} \times \mathcal{F})$, we let $\mathcal{P}(\mathcal{U} \times \mathcal{U})$ denote its set of probability measures, a convex subset of $\mathcal{M}(\mathcal{U} \times \mathcal{U})$, the vector space of countably additive signed measures of finite total variation over $(\mathcal{U} \times \mathcal{U}, \mathcal{F} \times \mathcal{F})$.
We let $\Gamma(\hat\mu, \mu) := \{\gamma \in \mathcal{P}(\mathcal{U} \times \mathcal{U}) : \gamma(F \times \mathcal{U}) = \hat\mu(F),\ \gamma(\mathcal{U} \times F) = \mu(F),\ \forall F \in \mathcal{F}\}$ denote the set of couplings of $\hat\mu, \mu \in \mathcal{P}(\mathcal{U})$ and $\Gamma_{\hat\mu} := \bigcup_{\mu \in \mathcal{P}(\mathcal{U})} \Gamma(\hat\mu, \mu)$ the set of couplings with $\hat\mu$ as first marginal. Finally, for any $\gamma \in \Gamma_{\hat\mu}$, we will write $(\hat V, \hat W, V, W) \sim \gamma$ with the understanding that $(\hat V, \hat W) \sim \hat\mu$ and $(V, W) \sim \mu$ for some $\mu \in \mathcal{P}(\mathcal{U})$.

$^3$ Note that $\mathcal{F} = \sigma(\tau_{\mathcal{V}} \times \tau_{\mathcal{W}})$ when $\mathcal{G} = \sigma(\tau_{\mathcal{V}})$ and $\mathcal{B} = \sigma(\tau_{\mathcal{W}})$ since $\tau_{\mathcal{W}}$ has a countable basis.

3.1.3 Integration and Random Variable Shorthands/Conventions

Integration will be frequently used to evaluate measures, and to facilitate exposition we will adopt the following (standard) conventions. Given a probability space $(\mathcal{X}, \mathcal{E}, \mathbb{P})$, we will write $X \sim \mathbb{P}$ to mean that $X$ is the identity function on $\mathcal{X}$, so that $\mathbb{P}(X \in E) = \mathbb{P}(E)$ for any $E \in \mathcal{E}$ and $\mathbb{E}_{\mathbb{P}}[g(X)] = \int g(x) \, d\mathbb{P}(x)$ for any function $g : \mathcal{X} \to \mathbb{R}$ that is $\mathcal{E}$-measurable and integrable. In the case of the product measure space $(\mathcal{U} \times \mathcal{U}, \mathcal{F} \times \mathcal{F})$, this practice will allow us to emphasize any combination of marginals in a more natural and tidy way. For example, given $\gamma \in \Gamma_{\hat\mu}$ and $(\hat V, \hat W, V, W) \sim \gamma$, we can emphasize the marginal $(V, W)$ by writing $\gamma((V, W) \in F) = \gamma(\mathcal{U} \times F)$ for any $F \in \mathcal{F}$ and $\mathbb{E}_\gamma[g(V, W)] = \int g(v, w) \, d\gamma(\hat v, \hat w, v, w)$ for any $g : \mathcal{U} \to \mathbb{R}$ that is $\mathcal{F}$-measurable. In particular, we can also emphasize the lone marginal $W$ so that $\mathbb{E}_\gamma[W] = \int w \, d\gamma(\hat v, \hat w; v, w)$.

3.1.4 Transport cost $c$, Moment Constraint $h$, and Objective $f$

In our study, we will adopt a discrepancy between members of $\mathcal{P}(\mathcal{U})$, first proposed in Blanchet et al. (2025). This will incorporate an $(\mathcal{F} \times \mathcal{F})$-measurable (transport) cost $c : \mathcal{U} \times \mathcal{U} \to (-\infty, +\infty]$, a (constraint right-hand-side) function $h \in L^1(\hat\nu)$, and an $\mathcal{F}$-measurable (objective) function $f : \mathcal{U} \to \mathbb{R}$. Finally, we let $\Gamma_W := \{\gamma \in \mathcal{P}(\mathcal{U} \times \mathcal{U}) : \mathbb{E}_\gamma[|W|] < \infty\}$.

Definition 1 (Blanchet et al.
(2025)). Given $\hat\mu, \mu \in \mathcal{P}(\mathcal{U})$, and $h \in L^1(\hat\nu)$ for $\hat\nu(\cdot) = \hat\mu(\cdot \times \mathcal{W})$, we define the resulting optimal transport discrepancy with conditional moment constraints by
$$M_h(\hat\mu, \mu) := \left( \begin{array}{ll} \inf_{\gamma \in \Gamma(\hat\mu, \mu) \cap \Gamma_W} & \mathbb{E}_\gamma[c(\hat V, \hat W; V, W)] \\ \text{s.t.} & \mathbb{E}_\gamma[W \mid \hat V] - h(\hat V) = 0, \ \hat\nu\text{-a.s.} \end{array} \right). \tag{2}$$
The discrepancy $M_h(\hat\mu, \mu)$ then yields the following uncertainty quantification problem introduced and studied in Blanchet et al. (2025).

Definition 2 (Blanchet et al. (2025)). For $\rho \in \mathbb{R}$ and $\hat\mu \in \mathcal{P}(\mathcal{U})$, we define the (primal) conditional-moment-constrained Wasserstein uncertainty quantification problem, or CM Wasserstein problem,
$$\sup_{\mu \in \mathcal{P}(\mathcal{U})} \mathbb{E}_\mu[f] \quad \text{s.t.} \quad M_h(\hat\mu, \mu) \leq \rho. \tag{P}$$

Blanchet et al. (2025) demonstrate that (P) unifies several celebrated uncertainty quantification models that employ a variety of discrepancies, including: generalized $\phi$-divergence (Ben-Tal et al., 2012; Agrawal and Horel, 2021), Sinkhorn (Wang et al., 2025), and traditional Wasserstein optimal transport (Blanchet and Murthy, 2019; Zhang et al., 2024). Moreover, it is noted that (P) shares similarities with martingale optimal transport (Zhou et al., 2021; Li et al., 2022).

3.2 CM Wasserstein Duality Decomposed and a Compactness Conjecture

In this section, we briefly summarize previous efforts from the literature in deriving a dual problem to (P), including the decomposition of this duality into two relations. In particular, we present a conjecture posed by Blanchet et al. (2025), which we address in the sections to follow via a perturbation perspective.

3.2.1 Previous Results

Central to Li et al. (2022) and Blanchet et al.
(2025) is the study of the following dual(s) for (P):
$$\begin{aligned} (\mathrm{P}) &\leq \inf_{\lambda \geq 0,\, \psi \in \Psi} \ \lambda\rho + \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} \mathbb{E}_\gamma\big[f(V, W) - \psi(\hat V) \cdot (W - h(\hat V)) - \lambda c(\hat V, \hat W; V, W)\big] && (\mathrm{D}_\Psi) \\ &\leq \inf_{\lambda \geq 0,\, \psi \in \Psi} \ \lambda\rho + \mathbb{E}_{\hat\mu}\Big[\sup_{v \in \mathcal{V},\, w \in \mathcal{W}} f(v, w) - \psi(\hat V) \cdot (w - h(\hat V)) - \lambda c(\hat V, \hat W; v, w)\Big], && (\mathrm{D}_\Psi + \mathrm{IP}) \end{aligned}$$
where $\Psi \subseteq \mathbb{R}^{\mathcal{V}}$ is a set of real-valued, $\mathcal{G}$-measurable functions defined on $\mathcal{V}$. $(\mathrm{D}_\Psi)$ is a convex (Lagrangian) dual problem to (P), and $(\mathrm{D}_\Psi + \mathrm{IP})$ reflects an interchange of the sup and expectation operations in $(\mathrm{D}_\Psi)$. In sum, for a choice of $\Psi$, the CM-Wasserstein duality $[(\mathrm{P}) = (\mathrm{D}_\Psi + \mathrm{IP})]$ is equivalently the combination of two relations:

I. $[(\mathrm{P}) = (\mathrm{D}_\Psi)]$ zero duality gap; and
II. $[(\mathrm{D}_\Psi) = (\mathrm{D}_\Psi + \mathrm{IP})]$ Interchangeability Principle (IP) (Zhang et al., 2024).

This combination is shown under specific settings in both Li et al. (2022) and Blanchet et al. (2025). In particular, when $\hat\mu$ is a finitely-supported probability measure, Li et al. (2022) establish $[(\mathrm{P}) = (\mathrm{D}_\Psi + \mathrm{IP})]$ for $\Psi$ a finite-dimensional vector space (by virtue of the finite support of $\hat\mu$). Their argument leverages finite-dimensional interior conditions for semi-infinite conic LPs (see Shapiro (2001)). As for the case of arbitrary $\hat\mu$, Blanchet et al. (2025) establish $[(\mathrm{P}) = (\mathrm{D}_\Psi + \mathrm{IP})]$ for $\Psi = C_b(\mathcal{V})$ and $h \equiv 1$, assuming: (i) $\mathcal{U}$ is compact; (ii) $f : \mathcal{U} \to \mathbb{R}$ is upper semicontinuous and $f \in L^1(\hat\mu)$; and (iii) $c : \mathcal{U} \times \mathcal{U} \to (-\infty, +\infty]$ is lower semicontinuous and $c(u, u) = 0$ for all $u \in \mathcal{U}$.

3.2.2 A Compactness Conjecture

No doubt assumption (i), that $\mathcal{U}$ be compact, can pay great dividends. Indeed, compactness not only makes available classical minimax theorems (such as Sion (1958)) but also contributes to the existence of measurable selections (Blanchet et al., 2025), simultaneously facilitating the two desired halves: zero duality gap and (IP). In fact, Blanchet et al.
(2025) suggest that the compactness of $\mathcal{U}$ plays a crucial role:

    "Without conditional moment constraints, a compactness condition akin to (i) is not needed to establish strong duality, [...but...] we conjecture that [$[(\mathrm{P}) = (\mathrm{D}_{C_b(\mathcal{V})} + \mathrm{IP})]$, i.e., Theorem 4.2 of Blanchet et al. (2025)] ceases to hold if (i) is relaxed."

We address this conjecture regarding the role of compactness in the sections to follow. Section 3.3 reveals that, taken verbatim, the conjecture is indeed true; specifically, we show that $[\neg\text{(i)} \wedge \text{(ii)} \wedge \text{(iii)}] \nRightarrow [(\mathrm{P}) = (\mathrm{D}_\Psi)]$. However, it is also true that (i) is in fact not necessary; in other words, $[(\mathrm{P}) = (\mathrm{D}_\Psi + \mathrm{IP})]$ can hold under some (natural) alternative assumptions without the compactness of (i). Indeed, Sections 3.4 and 3.5 establish the two requisite relations, $[(\mathrm{P}) = (\mathrm{D}_\Psi)]$ zero duality gap and $[(\mathrm{D}_\Psi) = (\mathrm{D}_\Psi + \mathrm{IP})]$ Interchangeability Principle, respectively, without compactness assumptions when $\Psi = C_b(\mathcal{V})$.

3.3 On the Role of Compactness

In this section we present a class of problem instances illustrating that $[\neg\text{(i)} \wedge \text{(ii)} \wedge \text{(iii)}] \nRightarrow [(\mathrm{P}) = (\mathrm{D}_\Psi)]$. Notably, in constructing such a class we use an upper-semicontinuous objective function $f$ that is unbounded above, which is natural in light of the fact that $f$ is necessarily bounded above on $\mathcal{U}$ when $[\text{(i)} \wedge \text{(ii)}]$ holds.

Lemma 1 (Unbounded $f$). Let $\mathcal{V} := \mathbb{R}$ and $\mathcal{W} := \mathbb{R}_+$ be endowed with standard topologies, $\mathcal{F}$ be the product of their respective Borel sigma-fields, $f(v, w) := v \cdot (w - 1)$, $h(\hat V) \equiv 1$, $\rho > 0$, and $c(\hat v, \hat w; v, w)$ take value $+\infty$ if $v \neq \hat v$ and zero otherwise. Also let $\hat\nu$ be any probability distribution on $\mathcal{V}$ that is not essentially bounded above, i.e., $\hat\nu(\hat V > t) > 0$ for every $t \in \mathbb{R}$, and let $\hat\mu := \hat\nu \otimes \delta_1$ be the product measure of $\hat\nu$ with the point mass on 1. Finally let $\Psi := L^\infty(\hat\nu)$.
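Then $0 = (\mathrm{P}) < (\mathrm{D}_\Psi) = +\infty$; moreover, conditions (ii) and (iii) hold.

Proof. First note that (ii) and (iii) hold by construction. Indeed, since $\hat W \equiv 1$ under $\hat\mu$, $\mathbb{E}_{\hat\mu}[f] = 0 < +\infty$; moreover, $f$ is continuous, and $c$ is lower semicontinuous with $c(u, u) = 0$. Noting that $M_h(\hat\mu, \mu) \leq \rho$ is equivalent to $M_h(\hat\mu, \mu) = 0$ (the cost $c$ takes values only in $\{0, +\infty\}$), we see that any $\gamma$ feasible to (2) with finite cost must have $V = \hat V$, $\gamma$-a.s. In this case, we obtain $(\mathrm{P}) = 0$ from the calculation
$$\mathbb{E}_\gamma[f] = \mathbb{E}_\gamma[\hat V \cdot (W - 1)] = \mathbb{E}_{\hat\nu}\big[\mathbb{E}[\hat V \cdot (W - 1) \mid \hat V]\big] = \mathbb{E}_{\hat\nu}\big[\hat V \cdot \mathbb{E}[(W - 1) \mid \hat V]\big] = 0.$$
Meanwhile, $(\mathrm{D}_\Psi) = +\infty$. To see this, let $\psi \in L^\infty(\hat\nu)$, and let $D := \{\hat V > \|\psi\|_{L^\infty(\hat\nu)} + \epsilon\}$ for some $\epsilon > 0$, so that $\hat\nu(D) > 0$ and $\hat V - \psi(\hat V) > \epsilon > 0$ on $D$. For $r \geq 1$, define a coupling $(\hat V, \hat W, V_r, W_r) \sim \gamma_r \in \Gamma_{\hat\mu}$ with $V_r = \hat V$, $\gamma_r$-a.s., and $W_r := r \mathbf{1}_D + \mathbf{1}_{D^c}$, so that $\mathbb{E}_{\gamma_r}[W_r] = r \cdot \hat\nu(D) + \hat\nu(D^c) < \infty$, and hence $\gamma_r \in \Gamma_{\hat\mu} \cap \Gamma_W$ as well. Since $V_r = \hat V$ and $h \equiv 1$, the integrand $f(V_r, W_r) - \psi(\hat V) \cdot (W_r - 1)$ equals $(\hat V - \psi(\hat V)) \cdot (W_r - 1)$. Then for all $r \geq 1$,
$$\mathbb{E}_{\gamma_r}[(\hat V - \psi(\hat V)) \cdot (W_r - 1)] = (r - 1)\, \mathbb{E}_{\hat\nu}[(\hat V - \psi(\hat V)); D] \geq (r - 1)\, \mathbb{E}_{\hat\nu}[(\hat V - \|\psi\|_{L^\infty(\hat\nu)}); D],$$
and $\mathbb{E}_{\hat\nu}[(\hat V - \|\psi\|_{L^\infty(\hat\nu)}); D] > 0$ since $\hat\nu(D) > 0$ and $\hat V - \|\psi\|_{L^\infty(\hat\nu)} > \epsilon > 0$ on $D$, yielding
$$\sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} \mathbb{E}_\gamma[(\hat V - \psi(\hat V)) \cdot (W - 1)] \geq \sup_{r \geq 1}\, (r - 1)\, \mathbb{E}_{\hat\nu}[(\hat V - \|\psi\|_{L^\infty(\hat\nu)}); D] = +\infty.$$
Considering $\psi \in L^\infty(\hat\nu)$ was arbitrary and $\rho - \mathbb{E}[c(\hat V, \hat W; V_r, W_r)] = \rho > 0$, it clearly holds that $(\mathrm{D}_\Psi) = +\infty$, as desired.

Although Lemma 1 technically resolves the conjecture of Blanchet et al. (2025) that $[\neg\text{(i)} \wedge \text{(ii)} \wedge \text{(iii)}] \nRightarrow [(\mathrm{P}) = (\mathrm{D}_\Psi + \mathrm{IP})]$, this is not to say that (i) is necessary. Across Sections 3.4 and 3.5 to follow, we present an alternative route to obtaining this relation, one that circumvents the assumption (i) that $\mathcal{U}$ be compact. As already discussed in Section 3.2, the relation $[(\mathrm{P}) = (\mathrm{D}_\Psi + \mathrm{IP})]$ is composed of two parts: zero duality gap and interchangeability principle.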
We dedicate a section to each of these halves separately with Sections 3.4 and 3.5. Our approach will be fundamentally based on the perturbation duality framework, which will inspire an alternative set of (natural) conditions to be contrasted with (i), (ii), and (iii).

3.4 Circumventing Compactness: Zero Duality Gap $[(\mathrm{P}) = (\mathrm{D}_\Psi)]$

As demonstrated in the previous section, zero duality gap $[(\mathrm{P}) = (\mathrm{D}_\Psi)]$ may fail to hold when the compactness of $\mathcal{U}$ (i) is relaxed, motivating the search for an appropriate replacement. Unfortunately, the classical Slater's condition Proposition 1(c$^*$) often fails for infinite-dimensional problems as $\operatorname{int}_{\mathcal{M}(\mathcal{U})} \mathcal{P}(\mathcal{U}) = \emptyset$, limiting its utility. As an alternative, we propose two conditions (Assumptions 1 and 2) to circumvent the compactness condition (i), with discussion and motivating examples to follow, that will be shown to be sufficient for zero duality gap.

Assumption 1. There exists $(\hat V, \hat W, V_0, W_0) \sim \gamma_0 \in \Gamma_{\hat\mu} \cap \Gamma_W$ satisfying $\mathbb{E}_{\gamma_0}[c] < \rho$, $\mathbb{E}_{\gamma_0}[W_0 \mid \hat V] - h(\hat V) = 0$ $\hat\nu$-a.s., and $\mathbb{E}_{\gamma_0}[f(V_0, W_0)] > -\infty$. We will adopt the shorthand $a := \rho - \mathbb{E}_{\gamma_0}[c] > 0$ in the sequel.

Assumption 2. There exist $b > 0$ and $(\hat V, \hat W, V_+, W_+) \sim \gamma_+$, $(\hat V, \hat W, V_-, W_-) \sim \gamma_- \in \Gamma_{\hat\mu} \cap \Gamma_W$ satisfying $\mathbb{E}_{\gamma_+}[W_+ \mid \hat V] - h(\hat V) \geq b$, $\mathbb{E}_{\gamma_-}[W_- \mid \hat V] - h(\hat V) \leq -b$, $\mathbb{E}_{\gamma_\pm}[c] \leq \rho$, and $\mathbb{E}_{\gamma_\pm}[f(V_\pm, W_\pm)] > -\infty$.

3.4.1 Distinctness from Slater

We first remark that Assumptions 1 and 2, viewed collectively, are (critically) distinct from Proposition 1's (c$^*$), i.e., the Slater condition. The following example highlights this distinction.

Example 1 (Distinct from Slater). Let $\mathcal{V} := \mathbb{R}$ and $\mathcal{W} := \mathbb{R}_+$ be endowed with standard topologies, $\mathcal{F}$ be the product of their respective Borel sigma-fields, $f$ be arbitrary, $h \equiv 1$, $\rho > 0$, $c \equiv 0$, and $(\hat V, \hat W) \sim \hat\mu$ be such that the marginal $\hat V \sim \hat\nu := N(0, 1)$.
Consider the perturbation function $p : L^1(\hat\nu) \to \overline{\mathbb{R}}$,
$$p(\theta) = \left( \begin{array}{ll} \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} & \mathbb{E}_\gamma[f] \\ \text{s.t.} & \mathbb{E}_\gamma[W \mid \hat V] = 1 + \theta(\hat V), \ \hat\nu\text{-a.s.} \end{array} \right).$$
Define a collection of functions $\{\theta_r\}_{r=1}^\infty \subseteq L^1(\hat\nu)$: for each positive integer $r$, let $\theta_r(\hat v)$ take value $-2$ if $\hat v \in [r, r + 1]$ and zero otherwise. Then $\|\theta_r\|_{L^1(\hat\nu)} = 2 \cdot \hat\nu([r, r + 1]) \downarrow 0$ as $r \to \infty$, and yet for every integer $r$, it is the case that for all $\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W$, $\mathbb{E}_\gamma[W \mid \hat V] - 1 \geq 0 - 1 > -2 = \theta_r(\hat V)$ on the set $\{\hat V \in [r, r + 1]\}$, which has measure $\hat\nu([r, r + 1]) > 0$. In other words, the Slater condition Proposition 1(c$^*$) does not hold, i.e., $0 \notin \operatorname{int} \operatorname{dom} p$. With some mild additional assumption on $f$, we further obtain $0 \notin \operatorname{ri} \operatorname{dom} p$. Indeed, $\operatorname{dom} p \subseteq \{\theta \in L^1(\hat\nu) : \theta \geq -1, \ \hat\nu\text{-a.s.}\} = -1 + L^1_+(\hat\nu)$, so when $\operatorname{dom} p \supseteq -1 + L^1_+(\hat\nu)$, then $\operatorname{aff} \operatorname{dom} p = \operatorname{aff} L^1_+(\hat\nu) = L^1(\hat\nu)$, i.e., $\operatorname{ri} \operatorname{dom} p = \operatorname{int}_{\operatorname{aff} \operatorname{dom} p} \operatorname{dom} p = \operatorname{int}_{L^1(\hat\nu)} \operatorname{dom} p = \operatorname{int} \operatorname{dom} p$. For example, given the coupling $\gamma_\theta$ with $W = 1 + \theta(\hat V)$, any condition on $f$ ensuring that $\mathbb{E}_{\gamma_\theta}[f] > -\infty$ for all $\theta \in -1 + L^1_+(\hat\nu)$ is sufficient for $\operatorname{dom} p \supseteq -1 + L^1_+(\hat\nu)$. However, Assumptions 1 and 2 hold.

We now proceed to examine these assumptions individually, discussing immediate implications as well as the extent of their generality.

3.4.2 On Assumption 1

Although Assumption 1 does not always hold, if (P) is a feasible problem, then Assumption 1 holds once $\rho$ is replaced by $\rho + \epsilon$ for any $\epsilon > 0$. We remark that this point is not insignificant, since many previous works model the Wasserstein budget parameter $\rho$ as a tunable parameter representing a decision maker's level of conservatism (Bayraksan and Love, 2015; Esfahani and Kuhn, 2018; Kuhn et al., 2019; Blanchet and Murthy, 2019; Rahimian and Mehrotra, 2022; Aolaritei et al., 2026).
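As a numerical aside on Example 1, the following minimal Python sketch tabulates $\|\theta_r\|_{L^1(\hat\nu)} = 2 \cdot \hat\nu([r, r+1])$ for $\hat\nu = N(0, 1)$. It is an illustration only and not part of the original development; the helper names `gaussian_mass` and `theta_norm` and the range $r = 1, \ldots, 8$ are assumptions made here. The values are strictly positive yet shrink rapidly, which is the mechanism by which perturbations arbitrarily close to 0 leave $\operatorname{dom} p$.

```python
import math


def gaussian_mass(lo, hi):
    """Mass that N(0, 1) assigns to [lo, hi], computed with erfc for tail accuracy."""
    return 0.5 * (math.erfc(lo / math.sqrt(2)) - math.erfc(hi / math.sqrt(2)))


def theta_norm(r):
    """L^1(nu_hat) norm of theta_r, which equals -2 on [r, r + 1] and 0 elsewhere."""
    return 2.0 * gaussian_mass(r, r + 1)


# Hypothetical range of r chosen for illustration.
norms = [theta_norm(r) for r in range(1, 9)]
for r, value in enumerate(norms, start=1):
    print(f"r = {r}: ||theta_r|| = {value:.3e}")
```

Each $\theta_r$ has positive norm and is infeasible for every coupling, so no $L^1(\hat\nu)$-ball around 0 is contained in $\operatorname{dom} p$.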
An immediate consequence of Assumption 1 is that it affords a reformulation of the feasible region of (P). A similar reformulation is obtained in Blanchet et al. (2025) under a different set of assumptions. In contrast, we obtain the reformulation with Assumption 1 by leveraging a simple "mixing argument."

Lemma 2. Define the explicit problem over couplings:
$$\sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} \mathbb{E}_\gamma[f(V, W)] \quad \text{s.t.} \quad \mathbb{E}_\gamma[c] \leq \rho, \quad \mathbb{E}_\gamma[W \mid \hat V] = h(\hat V), \ \hat\nu\text{-a.s.} \tag{P$'$}$$
Under Assumption 1, it holds that $(\mathrm{P}) = (\mathrm{P}')$.

Proof. By definition, it is clear that $(\mathrm{P}) \geq (\mathrm{P}')$. Hence, it suffices to show that $(\mathrm{P}) \leq (\mathrm{P}')$. Let $\epsilon > 0$ be arbitrary. Then let $\mu_\epsilon$ be $\epsilon$-optimal to (P), meaning $\mathbb{E}_{\mu_\epsilon}[f] > (\mathrm{P}) - \epsilon$. By the definition of the infimum in $M_h$, there exists $\gamma_\epsilon \in \Gamma(\hat\mu, \mu_\epsilon) \cap \Gamma_W$ satisfying
$$\mathbb{E}_{\gamma_\epsilon}[W - h(\hat V) \mid \hat V] = 0, \ \hat\nu\text{-a.s.} \quad \text{and} \quad \mathbb{E}_{\gamma_\epsilon}[c] \leq \rho + \epsilon.$$
By Assumption 1, let $a = \rho - \mathbb{E}_{\gamma_0}[c] > 0$ with $\mathbb{E}_{\gamma_0}[W_0 \mid \hat V] - h(\hat V) = 0$, $\hat\nu$-a.s. Next construct the mixture
$$\bar\gamma_\epsilon := (1 - t)\gamma_\epsilon + t\gamma_0, \quad \text{for } t := \frac{\epsilon}{a + \epsilon} \in (0, 1),$$
which satisfies $\mathbb{E}_{\bar\gamma_\epsilon}[W - h(\hat V) \mid \hat V] = 0$, $\hat\nu$-a.s., and $\mathbb{E}_{\bar\gamma_\epsilon}[c] \leq (1 - t)(\rho + \epsilon) + t(\rho - a) = \rho$. Then $\bar\gamma_\epsilon$ is feasible to $(\mathrm{P}')$, and computing its objective value, we observe that
$$(\mathrm{P}') \geq \frac{a}{a + \epsilon} \mathbb{E}_{\mu_\epsilon}[f] + \frac{\epsilon}{a + \epsilon} \mathbb{E}_{\gamma_0}[f] > \frac{a}{a + \epsilon} \big((\mathrm{P}) - \epsilon\big) + \frac{\epsilon}{a + \epsilon} \mathbb{E}_{\gamma_0}[f].$$
Noting that $\mathbb{E}_{\gamma_0}[f] > -\infty$ and that $\epsilon > 0$ was arbitrary, we may take $\epsilon \downarrow 0$ to obtain the desired conclusion.

3.4.3 On Assumption 2

In this section we introduce a sufficient condition for Assumption 2 and provide an example motivating its usefulness. We start by commenting on the verifiability of Assumption 2, showing that Assumption 1, paired with some regularity of the transport cost $c$ and objective $f$, ensures that Assumption 2 holds.

Lemma 3 (Sufficient condition for Assumption 2). Suppose $\gamma_0$ satisfies Assumption 1.
If there exists $\beta > 0$ such that: (a) $h(\hat V) \pm \beta \in \mathcal{W}$, $\hat\nu$-a.s.; and (b) $\mathbb{E}_{\gamma_0}[c(\hat V, \hat W, V_0, h(\hat V) \pm \beta)] < +\infty$ and $\mathbb{E}_{\gamma_0}[f(V_0, h(\hat V) \pm \beta)] > -\infty$, then Assumption 2 holds. In particular, if (c) $c(\hat v, \hat w, v, \cdot)$ and $f(v, \cdot)$ are $L$-Lipschitz, then (b) holds.

Proof. Let (a) $\wedge$ (b) with $\beta > 0$ be given. Recall that by Assumption 1, the coupling $(\hat V, \hat W, V_0, W_0) \sim \gamma_0 \in \Gamma_{\hat\mu} \cap \Gamma_W$ satisfies $\mathbb{E}_{\gamma_0}[W - h(\hat V) \mid \hat V] = 0$ and $\mathbb{E}_{\gamma_0}[c] = \rho - a$ with $a > 0$. Define the coupling $(\hat V, \hat W, \bar V^+, \bar W^+) \sim \bar\gamma^+ \in \Gamma_{\hat\mu} \cap \Gamma_W$ via

$$(\hat V, \hat W, \bar V^+, \bar W^+) := (\hat V, \hat W, V_0, h(\hat V) + \beta),$$

so that $\mathbb{E}_{\bar\gamma^+}[\bar W^+ - h(\hat V) \mid \hat V] = \beta$ by construction. It follows from (b) that when

$$t^+ := \begin{cases} a / \big(\mathbb{E}_{\bar\gamma^+}[c] - \mathbb{E}_{\gamma_0}[c]\big) & \text{if } \mathbb{E}_{\bar\gamma^+}[c] > \rho, \\ 1 & \text{otherwise,} \end{cases}$$

the weighted combination $(\hat V, \hat W, V^+, W^+) \sim \gamma^+ := (1 - t^+)\gamma_0 + t^+ \bar\gamma^+$ satisfies

$$\gamma^+ \in \Gamma_{\hat\mu} \cap \Gamma_W, \quad \mathbb{E}_{\gamma^+}[c] \le \rho, \quad \mathbb{E}_{\gamma^+}[W^+ - h(\hat V) \mid \hat V] = \beta \cdot t^+, \quad \text{and} \quad \mathbb{E}_{\gamma^+}[f(V^+, W^+)] > -\infty.$$

We can construct a coupling $\gamma^-$ analogously with a mixture weight $t^- \in (0, 1]$. Then Assumption 2 holds with $b := \beta \cdot \min\{t^+, t^-\} > 0$.

Finally, if (c) holds, then

$$c(\hat V, \hat W; V_0, h(\hat V) \pm \beta) \le c(\hat V, \hat W; V_0, W_0) + L\,|h(\hat V) \pm \beta - W_0| \quad \text{and} \quad f(V_0, h(\hat V) \pm \beta) \ge f(V_0, W_0) - L\,|h(\hat V) \pm \beta - W_0|.$$

Integrating both inequalities with respect to $\gamma_0$ and noting that $\mathbb{E}_{\gamma_0}[|h(\hat V) \pm \beta - W_0|] \le \mathbb{E}_{\gamma_0}[|W_0|] + \mathbb{E}_{\hat\nu}[|h|] + \beta < +\infty$ verifies condition (b).

We make two remarks. First, note that the conditions of Lemma 3 can be satisfied in Example 1, indicating that the conditions of Lemma 3, as well as Assumptions 1 and 2, do not revert to the traditional Slater's condition. Second, we find that the idea of mixing couplings, which first appeared in Lemma 2, also appears in Lemma 3.
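The mixing arithmetic shared by Lemma 2 and Lemma 3 can be checked numerically. The following is a minimal sketch rather than part of the original argument: each coupling is summarized only by its expected cost and its conditional moment residual, the helper names `mix`, `lemma2_weight`, and `lemma3_weight` are hypothetical, and the numeric inputs are illustrative assumptions.

```python
def mix(t, base, other):
    """Expectation of a quantity under the mixture (1 - t) * base + t * other."""
    return (1.0 - t) * base + t * other


def lemma2_weight(a, eps):
    """Weight t = eps / (a + eps) used in the proof of Lemma 2."""
    return eps / (a + eps)


def lemma3_weight(a, rho, cost_bar):
    """Weight t_+ from the proof of Lemma 3, where E[c] under gamma_0 is rho - a."""
    cost_0 = rho - a
    return a / (cost_bar - cost_0) if cost_bar > rho else 1.0


# Illustrative numbers (assumptions, not taken from the paper).
rho, a, eps, beta = 1.0, 0.25, 0.1, 0.5

# Lemma 2: gamma_eps may overspend the budget by eps, while gamma_0 has slack a.
t = lemma2_weight(a, eps)
cost_lemma2 = mix(t, rho + eps, rho - a)

# Lemma 3: gamma_bar_plus shifts W to h + beta and may cost more than rho.
cost_bar_plus = 2.0
t_plus = lemma3_weight(a, rho, cost_bar_plus)
cost_lemma3 = mix(t_plus, rho - a, cost_bar_plus)
residual_plus = mix(t_plus, 0.0, beta)  # conditional moment residual, beta * t_plus

print(t, cost_lemma2, t_plus, cost_lemma3, residual_plus)
```

In both cases the weight is chosen so that the slack $a$ of $\gamma_0$ exactly absorbs the budget overshoot of the other coupling, which is why the mixed cost lands on $\rho$ in this worst-case configuration.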
Building on this point, we motivate the importance of Assumption 2 for providing an additional level of mixing on top of that provided by Assumption 1, and in the next example we show that Assumption 1 alone is insufficient for obtaining zero duality gap.

Example 2 (Insufficiency of mixing by Assumption 1 alone). Let $\mathcal{V} := \{0\}$ and $\mathcal{W} := [0, 1]$ be endowed with standard topologies, $\mathcal{F}$ be the product of their respective Borel sigma-fields, $f(v, w) := \mathbf{1}_{\{w < 1\}}(w)$, $h := 1$, $\rho := 1 + \epsilon$ for any $\epsilon > 0$, and $c(\hat u, u) := (w - \hat w)^2$. Also let $\hat\nu := \delta_0$, $\hat\mu := \hat\nu \otimes \delta_0$, and $\Psi := L^\infty(\hat\nu) = \mathbb{R}$. Assumption 1 holds by considering the coupling $\gamma_0 \in \Gamma_{\hat\mu}$ with $W \equiv 1$, as $\mathbb{E}_{\gamma_0}[W - 1 \mid \hat V] = 0$, $\hat\nu$-a.s., and $\mathbb{E}_{\gamma_0}[c] = \mathbb{E}[W^2] = 1 < \rho$. However, Assumption 2 fails, since for any coupling $\gamma \in \Gamma_{\hat\mu}$, $W$ must satisfy $0 \le W \le 1$, $\gamma$-a.s., so $\mathbb{E}_\gamma[W - 1 \mid \hat V] \le 0$, $\hat\nu$-a.s. Thus it is impossible to find $\gamma^+ \in \Gamma_{\hat\mu}$ with $\mathbb{E}_{\gamma^+}[W - 1 \mid \hat V] \ge b > 0$. In this setting, $(P) = (P')$ by Lemma 2, and because any feasible coupling must have $W \equiv 1$, we find that $(P) = 0$, yet

$$(D_\Psi) = (D_\Psi + \mathrm{IP}) = \inf_{\lambda \ge 0, \, \psi \in \mathbb{R}} \ \lambda\rho + \sup_{w \in [0, 1]} \Big\{ \mathbf{1}_{\{w < 1\}}(w) - \lambda w^2 - \psi \cdot (w - 1) \Big\} = 1,$$

since along any sequence $1 \ne w \uparrow 1$, for any $\lambda \ge 0$ and $\psi \in \mathbb{R}$, it holds that $\mathbf{1}_{\{w < 1\}}(w) = 1$, $\lambda(\rho - w^2) \to \lambda\epsilon \ge 0$, and $\psi \cdot (w - 1) \to 0$, while the value $1$ is attained at $\lambda = \psi = 0$.

While Example 2 uses the fact that $f$ is not upper semicontinuous, the failure of Assumption 2 is also important. One way to satisfy Assumption 2 and close the gap would be to perturb the upper bound of $\mathcal{W}$ by a small amount. Indeed, redefining $\mathcal{W} := [0, 1 + \sqrt{\epsilon}]$, it would then be possible to perturb the (formerly unique) feasible $W \equiv 1$ "up" and "down" within the revised domain to $W^+ := W + \sqrt{\epsilon}$ and $W^- := W - \sqrt{\epsilon}$.
Moreover, for a sequence $\delta_r \downarrow 0$ with $0 < \delta_r \le \sqrt{\epsilon}$, we may then define a sequence of couplings $\{\gamma^r\}_{r \ge 1}$ whose $W^r$-marginal is obtained by mixing $W^+$ and $W^-_r := W - \delta_r$ through a selector $t_r \sim \mathrm{Bern}(q_r)$ drawn independently of $W$, where $q_r := \delta_r / (\delta_r + \sqrt{\epsilon}) \in (0, 1)$, i.e., $W^r := (1 - t_r) W^-_r + t_r W^+$. Then $\mathbb{E}_{\gamma^r}[f] = 1 - q_r \to 1$ as $r \to \infty$, and both $\mathbb{E}_{\gamma^r}[W - 1 \mid \hat V] = 0$ and $\mathbb{E}_{\gamma^r}[c] = 1 + \delta_r \sqrt{\epsilon} \le 1 + \epsilon = \rho$ hold for all $r$, yielding $(P) = 1$ to match $(D_\Psi)$.

The ability to mix couplings afforded by Assumption 2 will help us obtain our main result via perturbation arguments. After its proof in the next section, we will discuss its standing in the context of the existing literature in Section 3.6.

3.4.4 Zero Duality Gap

We now present our main duality result, which attains zero duality gap without assuming compactness of $\mathcal{U}$. In fact, our result assumes neither (i), (ii), nor (iii). Indeed, we replace (i) with Assumptions 1 and 2. Further, we relax (ii) and (iii) by now requiring only that $f$ be bounded above and $c$ bounded below (note: [(i) $\wedge$ (ii) $\wedge$ (iii)] $\Rightarrow$ $f$ bounded above and $c$ bounded below). We note that this assumption does not rule out important loss functions (when considering the minimization form of $(P)$) such as the 0-1 loss $\mathbf{1}_{\{y \le 0\}}(y)$ (which lacks lower semicontinuity), the squared loss, and other commonly used loss functions.

In addition, the following result uses a mixing argument that works at the conditional level. More specifically, whereas Lemma 2 mixed unconditional couplings, in the following argument there will be mixing of versions of conditional couplings. For this to be valid, it will suffice to ensure the following: given any $\gamma \in \Gamma_{\hat\mu}$, there exists a collection $\{\gamma_{\hat v}(\cdot)\}_{\hat v \in \mathcal{V}}$ of measures on $(\mathcal{W} \times \mathcal{U}, \mathcal{B} \times \mathcal{F})$ such that $\gamma(D \times E) = \int_D \gamma_{\hat v}(E) \, d\hat\nu$ for all $D \in \mathcal{G}$, $E \in \mathcal{B} \times \mathcal{F}$.
Put plainly, given any random vector $(\hat V, \hat W, V, W)$, we take for granted the ability to "condition" on $\hat V$. This is precisely provided when $(\mathcal{V}, \mathcal{G}, \hat\nu)$ has the regular conditional probability property (Faden, 1985). This assumption is hardly restrictive, especially when we note that in the setting of topological spaces, many standard spaces have this property. For example, the Disintegration Theorem guarantees this property for Radon spaces.

Theorem 1. In the above setting, if the transport cost $c$ is bounded below, the objective function $f$ is bounded above, Assumptions 1 and 2 hold, and $(\mathcal{V}, \mathcal{G}, \hat\nu)$ has the Regular Conditional Probability Property (Faden, 1985), then $(P) = (D_\Psi)$, with $\Psi := L^\infty(\hat\nu)$.

Proof. Given any $\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W$, it holds that $\mathbb{E}_\gamma[W \mid \hat V] - h(\hat V) = (A\gamma)(\hat V)$ for some function $A\gamma \in L^1(\hat\nu)$. We will find this shorthand convenient, as the statement $A\gamma = 0$, $\hat\nu$-a.e., equivalently expresses the conditional moment constraint $\mathbb{E}_\gamma[W \mid \hat V] = h(\hat V)$, almost surely. Further, as this choice of notation suggests, given any two probability measures $\gamma_1, \gamma_2 \in \Gamma_{\hat\mu} \cap \Gamma_W$, it holds that

$$A\big((1 - t)\gamma_1 + t\gamma_2\big) = (1 - t) \cdot A\gamma_1 + t \cdot A\gamma_2, \quad \hat\nu\text{-a.e.}, \quad \forall t \in [0, 1].$$

Define a concave bifunction $G'(\tau, \theta; \gamma)$ over $\mathbb{R} \times L^1(\hat\nu) \times \mathcal{M}(\mathcal{U} \times \mathcal{U})$ for $(P')$ by

$$G'(\tau, \theta; \gamma) := \mathbb{E}_\gamma[f(V, W)] - \iota_{(-\infty, \tau]}\big(\mathbb{E}_\gamma[c(\hat V, \hat W; V, W)] - \rho\big) - \iota_{S_0 + \theta}(A\gamma) - \iota_{\Gamma_{\hat\mu} \cap \Gamma_W}(\gamma),$$

where $S_0 := \{s \in \mathbb{R}^{\mathcal{V}}, \ \mathcal{G}\text{-measurable} : s = 0, \ \hat\nu\text{-a.s.}\}$. Let $p'(\tau, \theta) := \sup_{\gamma \in \mathcal{M}(\mathcal{U} \times \mathcal{U})} G'(\tau, \theta; \gamma)$ denote the perturbation function of $G'$, for which $p'(0, 0) = (P')$.
The dual bifunction $(G')^d(\gamma^*; \tau^*, \theta^*)$ over $\mathcal{M}(\mathcal{U} \times \mathcal{U})^* \times \mathbb{R} \times L^\infty(\hat\nu)$ can be computed under $\gamma^* = 0$ to find

$$(G')^d(0; \tau^*, \theta^*) = \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} \ \tau^* \cdot \big(\rho - \mathbb{E}_\gamma[c(\hat V, \hat W; V, W)]\big) + \mathbb{E}_\gamma[f(V, W)] - \int \theta^* \cdot A\gamma \, d\hat\nu, \quad \text{if } \tau^* \ge 0,$$

and $(G')^d(0; \tau^*, \theta^*) = +\infty$ if $\tau^* < 0$, so that negative multipliers never contribute to the infimum below. Its perturbation function satisfies $q'(0) := \inf_{\tau^*, \theta^*} (G')^d(0; \tau^*, \theta^*) = (D_\Psi)$. On a final note, the perturbation function $p'$ is proper; indeed, $p' < +\infty$ since $f$ is bounded above, and $p'(0, 0) > -\infty$ by the feasibility of $(P)$. Consequently, the closure of $p'$ at $0$ admits the following characterization that we will leverage: $(\operatorname{cl} p')(0, 0) = \limsup_{(\tau, \theta) \to 0} p'(\tau, \theta)$.

Outline: To find $(P) = (D_\Psi)$, it will suffice to establish

$$\limsup_{(\tau, \theta) \to 0} p'(\tau, \theta) \le p'(0, 0); \tag{$\ast$}$$

indeed, by Proposition 1 and the fact that $p'(0, 0) \le (\operatorname{cl} p')(0, 0)$, $(\ast)$ is equivalent to the duality relation $(P') = (D_\Psi)$, which suffices in light of Lemma 2.

Hence, we set out to establish $(\ast)$. Towards this, we assume without loss of generality that there exists a sequence $\{(\tau_r, \theta_r)\}_{r=1}^{\infty} \subseteq \mathbb{R} \times L^1(\hat\nu)$ with $(\tau_r, \theta_r) \to 0$ strongly and for which $p'(\tau_r, \theta_r) > -\infty$ for every $r$; otherwise, $p'(0, 0) \ge -\infty = \limsup_{(\tau, \theta) \to 0} p'(\tau, \theta)$ trivially.
An immediate consequence of this sequence $\{(\tau_r, \theta_r)\}_{r=1}^{\infty}$ is the existence of a sequence of measures $\{\gamma^r\}_{r=1}^{\infty}$ in which, for every $r$, $(\hat V, \hat W, V^r, W^r) \sim \gamma^r$ is a feasible solution to the perturbed problem $p'(\tau_r, \theta_r)$ with objective value satisfying

$$\mathbb{E}_{\gamma^r}[f(V^r, W^r)] \ge p'(\tau_r, \theta_r) - 1/r.$$

The strategy will be to show that for any $\epsilon > 0$, there exists an accompanying sequence $\{\tilde\gamma^r\}_{r=1}^{\infty}$ ($\epsilon$ dependence suppressed) for which $(\hat V, \hat W, \tilde V^r, \tilde W^r) \sim \tilde\gamma^r$ is feasible to $p'(0, 0)$ and satisfies

$$\mathbb{E}_{\gamma^r}[f(V^r, W^r)] \le \mathbb{E}_{\tilde\gamma^r}[f(\tilde V^r, \tilde W^r)] + O(\tau_r + \|\theta_r\|_{L^1(\hat\nu)}) + \epsilon, \tag{$\ast\ast$}$$

which will ensure that

$$p'(\tau_r, \theta_r) - 1/r \le \mathbb{E}_{\gamma^r}[f(V^r, W^r)] \le p'(0, 0) + O(\tau_r + \|\theta_r\|_{L^1(\hat\nu)}) + \epsilon, \quad \forall \epsilon > 0,$$

yielding $(\ast)$ and the completion of the proof. In what follows, let $\epsilon > 0$ be given. The remainder of the proof proceeds in two steps: (1) constructing a feasible sequence $\{\tilde\gamma^r\}_{r=1}^{\infty}$; and (2) verifying that it satisfies $(\ast\ast)$.

(1) Feasibility: For each $r$, $\tilde\gamma^r$ will be the result of two successive edits to $\gamma^r$. For the first edit, we mix $\gamma^r$ with $\gamma_0$ in a precise way that depends on $\epsilon$ to attain $\bar\gamma^r$ via

$$\bar\gamma^r := (1 - t_r)\gamma^r + t_r \gamma_0, \quad t_r := \frac{\tau_r + a \cdot \epsilon/\kappa_\epsilon}{\tau_r + a},$$

for some $\kappa_\epsilon > \epsilon$ yet to be specified explicitly. By design, and since $\mathbb{E}_{\gamma^r}[c] \le \rho + \tau_r$ and $\mathbb{E}_{\gamma_0}[c] = \rho - a$, this yields

- $\mathbb{E}_{\bar\gamma^r}[c] \le \rho - a \cdot \epsilon/\kappa_\epsilon$,
- $\|A\bar\gamma^r\|_{L^1(\hat\nu)} \le \dfrac{a - a \cdot \epsilon/\kappa_\epsilon}{\tau_r + a} \|\theta_r\|_{L^1(\hat\nu)}$,
- $d_{\mathrm{TV}}(\gamma^r, \bar\gamma^r) \le \dfrac{\tau_r + a \cdot \epsilon/\kappa_\epsilon}{\tau_r + a}$.

For the second edit, we will modify $\bar\gamma^r$ using $\gamma^+$ and $\gamma^-$. In the following, given any $\gamma \in \Gamma_{\hat\mu}$, we will let $\{\gamma_{\hat v}(\cdot)\}_{\hat v \in \mathcal{V}}$ denote a collection of measures on $(\mathcal{W} \times \mathcal{U}, \mathcal{B} \times \mathcal{F})$ such that $\gamma(D \times E) = \int_D \gamma_{\hat v}(E) \, d\hat\nu$ for all $D \in \mathcal{G}$, $E \in \mathcal{B} \times \mathcal{F}$. Such a collection is guaranteed to exist by the assumption that $(\mathcal{V}, \mathcal{G}, \hat\nu)$ has the regular conditional probability property (Faden, 1985).
In this way, for $\bar\gamma^r$, let there be given an associated collection $\{\bar\gamma^r_{\hat v}\}_{\hat v \in \mathcal{V}}$, and we will edit $\bar\gamma^r$ by editing this collection. For each $\hat v \in \mathcal{V}$, define the pointwise mixing weights $t^+_r(\hat v), t^-_r(\hat v) \in [0, 1]$ by

$$t^+_r(\hat v) := \frac{\max\{0, -(A\bar\gamma^r)(\hat v)\}}{|(A\bar\gamma^r)(\hat v)| + |(A\gamma^+)(\hat v)|}, \qquad t^-_r(\hat v) := \frac{\max\{0, (A\bar\gamma^r)(\hat v)\}}{|(A\bar\gamma^r)(\hat v)| + |(A\gamma^-)(\hat v)|},$$

so that at most one of $t^+_r(\hat v), t^-_r(\hat v)$ is nonzero. Now define the edit $\tilde\gamma^r$ via

$$\tilde\gamma^r_{\hat v} := \big(1 - t^+_r(\hat v) - t^-_r(\hat v)\big) \bar\gamma^r_{\hat v} + t^+_r(\hat v) \gamma^+_{\hat v} + t^-_r(\hat v) \gamma^-_{\hat v}.$$

By design, $A\tilde\gamma^r = 0$ for each $r$. We now show that $\mathbb{E}_{\tilde\gamma^r}[c] \le \rho$ for sufficiently large $r$, establishing that $\tilde\gamma^r$ is eventually feasible to $(P')$. To see this, we assume without loss of generality that $c \ge 0$ (otherwise, replace $c$ with $c - \inf c$) and let $K_\epsilon$ be such that both

$$\int_{\{\int c \, d\gamma^\pm_{\hat v} > K_\epsilon\}} \Big[\int c \, d\gamma^\pm_{\hat v}\Big] d\hat\nu < a \cdot \epsilon / (4\kappa_\epsilon),$$

as ensured by the absolute continuity of the Lebesgue integral. Then we find that for any $r$,

$$\begin{aligned}
\mathbb{E}_{\tilde\gamma^r}[c] &= \int (1 - t^+_r - t^-_r) \Big[\int c \, d\bar\gamma^r_{\hat v}\Big] d\hat\nu + \int t^+_r \Big[\int c \, d\gamma^+_{\hat v}\Big] d\hat\nu + \int t^-_r \Big[\int c \, d\gamma^-_{\hat v}\Big] d\hat\nu \\
&\le (\rho - a\epsilon/\kappa_\epsilon) + \Big(a\epsilon/(4\kappa_\epsilon) + K_\epsilon \int t^+_r \, d\hat\nu\Big) + \Big(a\epsilon/(4\kappa_\epsilon) + K_\epsilon \int t^-_r \, d\hat\nu\Big) \\
&= \rho - a\epsilon/(2\kappa_\epsilon) + K_\epsilon \int (t^+_r + t^-_r) \, d\hat\nu \\
&\le \rho - a\epsilon/(2\kappa_\epsilon) + \underbrace{K_\epsilon \frac{1}{b} \Big(\frac{a - a \cdot \epsilon/\kappa_\epsilon}{\tau_r + a} \|\theta_r\|_{L^1(\hat\nu)}\Big)}_{\downarrow 0, \text{ as } r \to \infty},
\end{aligned}$$

where the last inequality follows from

$$\int (t^+_r + t^-_r) \, d\hat\nu = \int \max\{t^+_r, t^-_r\} \, d\hat\nu \le \int \frac{|(A\bar\gamma^r)(\hat v)|}{|(A\bar\gamma^r)(\hat v)| + b} \, d\hat\nu \le \frac{1}{b} \|A\bar\gamma^r\|_{L^1(\hat\nu)} \le \frac{1}{b} \Big(\frac{a - a \cdot \epsilon/\kappa_\epsilon}{\tau_r + a} \|\theta_r\|_{L^1(\hat\nu)}\Big). \tag{$\dagger$}$$

Thus we find $\mathbb{E}_{\tilde\gamma^r}[c] \le \rho$ for $r$ sufficiently large. Consequently, we proceed under the assumption that at the conclusion of these two edits, the measure $\tilde\gamma^r$ that we obtain is feasible to $(P')$.
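The second edit can be illustrated on a finite grid of $\hat v$ values. The following is a minimal sketch under simplifying assumptions that are not in the original proof: $\hat\nu$ is replaced by a hypothetical three-point distribution `nu_hat`, and each conditional coupling is summarized only by its residual $(A\gamma)(\hat v)$; the arrays `A_bar`, `A_plus`, and `A_minus` are made-up inputs chosen so that $A\gamma^+ \ge b$ and $A\gamma^- \le -b$.

```python
def pointwise_weights(a_bar, a_plus, a_minus):
    """Mixing weights (t_plus, t_minus) at one v_hat, following the proof's formulas."""
    t_plus = max(0.0, -a_bar) / (abs(a_bar) + abs(a_plus))
    t_minus = max(0.0, a_bar) / (abs(a_bar) + abs(a_minus))
    return t_plus, t_minus


def edited_residual(a_bar, a_plus, a_minus):
    """Residual of the edited conditional coupling; residuals mix linearly."""
    t_plus, t_minus = pointwise_weights(a_bar, a_plus, a_minus)
    return (1.0 - t_plus - t_minus) * a_bar + t_plus * a_plus + t_minus * a_minus


# Hypothetical discrete stand-in for nu_hat and for the residuals of
# gamma_bar^r, gamma^+, gamma^- (illustrative numbers only).
b = 0.5
nu_hat = [0.2, 0.3, 0.5]
A_bar = [-0.1, 0.0, 0.3]
A_plus = [0.5, 0.7, 1.0]
A_minus = [-0.5, -0.6, -2.0]

rows = list(zip(A_bar, A_plus, A_minus))
weights = [pointwise_weights(*row) for row in rows]
residuals = [edited_residual(*row) for row in rows]

# Both sides of the bound (dagger): total mixing mass versus ||A gamma_bar||_{L1} / b.
total_weight = sum(p * (tp + tm) for p, (tp, tm) in zip(nu_hat, weights))
l1_norm = sum(p * abs(x) for p, x in zip(nu_hat, A_bar))

print(residuals, total_weight, l1_norm / b)
```

The weights are designed so that the residual cancels pointwise, while the total mass moved is controlled by the $L^1$ norm of the residual being corrected.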
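(2) Verifying $(\ast\ast)$: In the following, without loss of generality, we suppose that $f \le 0$ (otherwise, replace $f$ with $f - \sup f$). For any $M > 0$, define

$$f_M := \max\{f, -M\}, \qquad \Delta_M := f_M - f \ge 0,$$

so that $f = f_M - \Delta_M$. Since $0 \le \Delta_M \downarrow 0$ as $M \to \infty$, we will let $M_\epsilon > \epsilon$ be such that

$$\mathbb{E}_{\gamma_0}[\Delta_{M_\epsilon}(V_0, W_0)] + \mathbb{E}_{\gamma^+}[\Delta_{M_\epsilon}(V^+, W^+)] + \mathbb{E}_{\gamma^-}[\Delta_{M_\epsilon}(V^-, W^-)] < \epsilon/2,$$

justified by the Monotone Convergence Theorem: $\lim_{M \to \infty} \int \Delta_M \, d\gamma' = 0$ for $\gamma' \in \{\gamma_0, \gamma^+, \gamma^-\}$. Then

$$\mathbb{E}_{\gamma^r}[f(V^r, W^r)] - \mathbb{E}_{\tilde\gamma^r}[f(\tilde V^r, \tilde W^r)] = \overbrace{\mathbb{E}_{\gamma^r}[f_{M_\epsilon}(V^r, W^r)] - \mathbb{E}_{\tilde\gamma^r}[f_{M_\epsilon}(\tilde V^r, \tilde W^r)]}^{\eta_{[-M_\epsilon, 0]}} + \overbrace{\mathbb{E}_{\tilde\gamma^r}[\Delta_{M_\epsilon}(\tilde V^r, \tilde W^r)] - \mathbb{E}_{\gamma^r}[\Delta_{M_\epsilon}(V^r, W^r)]}^{\eta_{(-\infty, -M_\epsilon)}}.$$

We conclude the proof by bounding each of $\eta_{[-M_\epsilon, 0]}$ and $\eta_{(-\infty, -M_\epsilon)}$. Specifically, we show: (a) $\eta_{[-M_\epsilon, 0]} \le O(\tau_r + \|\theta_r\|_{L^1(\hat\nu)}) + \epsilon/2$ and (b) $\eta_{(-\infty, -M_\epsilon)} < \epsilon/2$.

For (a), as $\eta_{[-M_\epsilon, 0]} \le M_\epsilon \cdot d_{\mathrm{TV}}(\gamma^r, \tilde\gamma^r)$, it suffices to show that $d_{\mathrm{TV}}(\gamma^r, \tilde\gamma^r) \le O(\tau_r + \|\theta_r\|_{L^1(\hat\nu)}) + \epsilon/(2M_\epsilon)$. Indeed, it is by design of $\tilde\gamma^r$ that we have

$$\begin{aligned}
d_{\mathrm{TV}}(\bar\gamma^r, \tilde\gamma^r) &\le \int d_{\mathrm{TV}}\Big(\bar\gamma^r_{\hat v}, \ \big(1 - t^+_r(\hat v) - t^-_r(\hat v)\big)\bar\gamma^r_{\hat v} + t^+_r(\hat v)\gamma^+_{\hat v} + t^-_r(\hat v)\gamma^-_{\hat v}\Big) \hat\nu(d\hat v) \\
&\le \int \Big(t^+_r(\hat v) \, d_{\mathrm{TV}}(\bar\gamma^r_{\hat v}, \gamma^+_{\hat v}) + t^-_r(\hat v) \, d_{\mathrm{TV}}(\bar\gamma^r_{\hat v}, \gamma^-_{\hat v})\Big) \hat\nu(d\hat v) \\
&\le \int \max\big(t^+_r(\hat v), t^-_r(\hat v)\big) \hat\nu(d\hat v) \ \overset{(\dagger)}{\le} \ \frac{1}{b} \Big(\frac{a - a \cdot \epsilon/\kappa_\epsilon}{\tau_r + a} \|\theta_r\|_{L^1(\hat\nu)}\Big),
\end{aligned}$$

so we explicitly specify $\kappa_\epsilon := 2M_\epsilon$ to attain the desired result

$$d_{\mathrm{TV}}(\gamma^r, \tilde\gamma^r) \le d_{\mathrm{TV}}(\gamma^r, \bar\gamma^r) + d_{\mathrm{TV}}(\bar\gamma^r, \tilde\gamma^r) \le \frac{\tau_r + a \cdot \epsilon/\kappa_\epsilon}{\tau_r + a} + O(\|\theta_r\|_{L^1(\hat\nu)}) \le O(\tau_r + \|\theta_r\|_{L^1(\hat\nu)}) + \epsilon/(2M_\epsilon).$$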
For (b), using $\Delta_{M_\epsilon} \ge 0$ and noting that $1 - t^+_r(\hat v) - t^-_r(\hat v) \le 1$ for all $\hat v \in \mathcal{V}$, as well as $1 - t_r \le 1$, it follows that

$$\begin{aligned}
\mathbb{E}_{\tilde\gamma^r}[\Delta_{M_\epsilon}(\tilde V^r, \tilde W^r)] &= \int \Delta_{M_\epsilon}(v, w) \, d\tilde\gamma^r(\hat v, \hat w, v, w) = \int \Big[\int \Delta_{M_\epsilon}(v, w) \, d\tilde\gamma^r_{\hat v}(\hat w, v, w)\Big] \hat\nu(d\hat v) \\
&\le \int \Big[\int \Delta_{M_\epsilon}(v, w) \, d\big(\gamma^r_{\hat v} + \gamma^0_{\hat v} + \gamma^+_{\hat v} + \gamma^-_{\hat v}\big)(\hat w, v, w)\Big] \hat\nu(d\hat v) \\
&= \mathbb{E}_{\gamma^r}[\Delta_{M_\epsilon}(V^r, W^r)] + \mathbb{E}_{\gamma_0}[\Delta_{M_\epsilon}(V_0, W_0)] + \mathbb{E}_{\gamma^+}[\Delta_{M_\epsilon}(V^+, W^+)] + \mathbb{E}_{\gamma^-}[\Delta_{M_\epsilon}(V^-, W^-)] \\
&< \mathbb{E}_{\gamma^r}[\Delta_{M_\epsilon}(V^r, W^r)] + \epsilon/2;
\end{aligned}$$

in other words, $\eta_{(-\infty, -M_\epsilon)} < \epsilon/2$, as desired.

As a side remark, although $(P) = (P')$ is a consequence of Assumption 1 by Lemma 2, it is also a necessary condition for zero duality gap [$(P) = (D_\Psi)$]. This is readily seen via perturbation arguments. Indeed, when $p'$ is the perturbation function for $(P')$ in Theorem 1, then by Proposition 1 and the fact that $p'(0, 0) \le (\operatorname{cl} p')(0, 0)$, the zero duality gap condition [$(P') = (D_\Psi)$] reveals

$$(P') \le (P) \le \limsup_{\tau \to 0} p'(\tau, 0) \le \limsup_{(\tau, \theta) \to 0} p'(\tau, \theta) \le p'(0, 0) = (P') = (D_\Psi).$$

3.5 Circumventing Compactness: Interchangeability Principle [$(D_\Psi) = (D_\Psi + \mathrm{IP})$]

Complementing Section 3.4's investigation into attaining zero duality gap, this section investigates the other half of the story behind [$(P) = (D_\Psi + \mathrm{IP})$], the Interchangeability Principle, and in doing so completes our approach that circumvents compactness.

The plan in this section is as follows. Theorem 1 grants zero duality gap for the case of $\Psi = L^\infty(\hat\nu)$. Upon endowing $\mathcal{U} = \mathcal{V} \times \mathcal{W}$ with more (topological) structure, short of compactness, we can in fact shrink $\Psi$ to the smaller space $C_b(\mathcal{V}) \subseteq L^\infty(\hat\nu)$. Specifically, we will let $\mathcal{V}$ be a normal topological space (e.g., a Polish space), with $\mathcal{G}$ its Borel sigma algebra, such that $(\mathcal{V}, \mathcal{G}, \hat\nu)$ is a Radon measure space, and $\mathcal{W}$ a bounded subset of $\mathbb{R}$.
The utility of this reduction is clear, as known results from the theory of normal integrands (Rockafellar and Wets, 1998) provide one (direct) route to the Interchangeability Principle whenever $\Psi \subseteq C(\mathcal{V})$, the set of continuous, real-valued functions on $\mathcal{V}$. Thus the main effort is in showing the validity of the reduction to $\Psi = C_b(\mathcal{V})$, as summarized in the next result.

Theorem 2. Suppose that $h \in L^1(\hat\nu)$, $\mathcal{W} \subseteq \mathbb{R}$ is bounded, and $\mathcal{V}$ is a Polish space with $\mathcal{G}$ its Borel sigma algebra. Then $(D_{L^\infty(\hat\nu)}) = (D_{C_b(\mathcal{V})})$.

Proof. For convenience, let

$$L_C(\lambda, \psi; \gamma) := \mathbb{E}_\gamma\Big[f(V, W) - \lambda c(\hat V, \hat W, V, W) - \psi(\hat V) \cdot (W - h(\hat V)); \ [\hat V \in C]\Big]$$

for some subset $C \subseteq \mathcal{V}$, which we take as $\mathcal{V}$ when not specified, and $J(\lambda, \psi) := \lambda\rho + \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L(\lambda, \psi; \gamma)$. Since $L^\infty(\hat\nu) \supseteq C_b(\mathcal{V})$, which implies $(D_{L^\infty(\hat\nu)}) \le (D_{C_b(\mathcal{V})})$, we will proceed under the assumption that $(D_{L^\infty(\hat\nu)}) < +\infty$; otherwise the equality $(D_{L^\infty(\hat\nu)}) = (D_{C_b(\mathcal{V})})$ follows trivially. Seeking to establish $(D_{L^\infty(\hat\nu)}) \ge (D_{C_b(\mathcal{V})})$, let $\epsilon := (D_{C_b(\mathcal{V})}) - (D_{L^\infty(\hat\nu)}) > 0$ for the sake of contradiction. Let $\bar\lambda \ge 0$, $\bar\psi \in L^\infty(\hat\nu)$, and $\bar\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W$ be such that

$$J(\bar\lambda, \bar\psi) \le (D_{L^\infty(\hat\nu)}) + \epsilon/4 \quad \text{and} \quad \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L(\bar\lambda, \bar\psi; \gamma) - \epsilon/4 \le L(\bar\lambda, \bar\psi; \bar\gamma). \tag{$\ast$}$$

Next let $\delta > 0$. As $(\mathcal{V}, \mathcal{G}, \hat\nu)$ is a Radon measure space, by Lusin's Theorem there exists a closed set $C_\delta$ with complement $C^c_\delta$ satisfying $\hat\nu(C^c_\delta) < \delta$ such that $\bar\psi$ is continuous over $C_\delta$. Further, we may construct a $\bar\psi'_\delta \in C_b(\mathcal{V})$ based on $\bar\psi$. Specifically, we can start with the restriction $\bar\psi'_\delta := \bar\psi$ over $C_\delta$ and then extend it to all of $\mathcal{V}$ such that $\|\bar\psi'_\delta\|_{L^\infty(\hat\nu)} \le \|\bar\psi\|_{L^\infty(\hat\nu)}$, by appealing to the Tietze Extension Theorem.
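We then obtain the following approximation, which is valid for all $\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W$:

$$\begin{aligned}
\big|L(\bar\lambda, \bar\psi'_\delta; \gamma) - L(\bar\lambda, \bar\psi; \gamma)\big| &= \big|L_{C^c_\delta}(\bar\lambda, \bar\psi'_\delta; \gamma) - L_{C^c_\delta}(\bar\lambda, \bar\psi; \gamma)\big| \\
&= \Big|\mathbb{E}_\gamma\Big[\big(\bar\psi(\hat V) - \bar\psi'_\delta(\hat V)\big) \cdot \big(W - h(\hat V)\big); \ [\hat V \in C^c_\delta]\Big]\Big| \\
&\le 2\|\bar\psi\|_{L^\infty(\hat\nu)} \cdot \mathbb{E}_\gamma\big[|W - h(\hat V)|; \ [\hat V \in C^c_\delta]\big].
\end{aligned} \tag{$\ast\ast$}$$

From this we obtain another approximation, writing $\eta_\delta := \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} \mathbb{E}_\gamma\big[|W - h(\hat V)|; \ [\hat V \in C^c_\delta]\big]$:

$$\begin{aligned}
\sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L(\bar\lambda, \bar\psi'_\delta; \gamma) &= \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L_{C_\delta}(\bar\lambda, \bar\psi'_\delta; \gamma) + L_{C^c_\delta}(\bar\lambda, \bar\psi'_\delta; \gamma) \\
&= \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L_{C_\delta}(\bar\lambda, \bar\psi; \gamma) + L_{C^c_\delta}(\bar\lambda, \bar\psi'_\delta; \gamma) \\
&\overset{(\ast\ast)}{\le} \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L_{C_\delta}(\bar\lambda, \bar\psi; \gamma) + L_{C^c_\delta}(\bar\lambda, \bar\psi; \gamma) + 2\|\bar\psi\|_{L^\infty(\hat\nu)} \cdot \mathbb{E}_\gamma\big[|W - h(\hat V)|; \ [\hat V \in C^c_\delta]\big] \\
&\le \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L(\bar\lambda, \bar\psi; \gamma) + 2\|\bar\psi\|_{L^\infty(\hat\nu)} \cdot \eta_\delta \\
&\overset{(\ast)}{\le} \big(L(\bar\lambda, \bar\psi; \bar\gamma) + \epsilon/4\big) + 2\|\bar\psi\|_{L^\infty(\hat\nu)} \cdot \eta_\delta \\
&\overset{(\ast\ast)}{\le} \big(L(\bar\lambda, \bar\psi'_\delta; \bar\gamma) + \epsilon/4\big) + 4\|\bar\psi\|_{L^\infty(\hat\nu)} \cdot \eta_\delta.
\end{aligned} \tag{$\dagger$}$$

Finally we compute $J(\bar\lambda, \bar\psi'_\delta)$ to find

$$\begin{aligned}
\bar\lambda\rho + \sup_{\gamma \in \Gamma_{\hat\mu} \cap \Gamma_W} L(\bar\lambda, \bar\psi'_\delta; \gamma) &\overset{(\dagger)}{\le} \bar\lambda\rho + \big(L(\bar\lambda, \bar\psi'_\delta; \bar\gamma) + \epsilon/4\big) + 4\eta_\delta \|\bar\psi\|_{L^\infty(\hat\nu)} \\
&\overset{(\ast\ast)}{\le} \bar\lambda\rho + \big(L(\bar\lambda, \bar\psi; \bar\gamma) + \epsilon/4\big) + 6\eta_\delta \|\bar\psi\|_{L^\infty(\hat\nu)} \\
&\le J(\bar\lambda, \bar\psi) + \epsilon/4 + 6\eta_\delta \|\bar\psi\|_{L^\infty(\hat\nu)} \\
&\overset{(\ast)}{\le} (D_{L^\infty(\hat\nu)}) + \epsilon/2 + 6\eta_\delta \|\bar\psi\|_{L^\infty(\hat\nu)}.
\end{aligned}$$

Noting that $\mathcal{W}$ is bounded, we can ensure that $\eta_\delta \downarrow 0$ as $\delta \downarrow 0$. Together with $\|\bar\psi\|_{L^\infty(\hat\nu)} < \infty$, this gives, for every $\delta > 0$,

$$(D_{C_b(\mathcal{V})}) \le \inf_{\lambda \ge 0, \, \psi \in C_b(\mathcal{V})} J(\lambda, \psi) \le J(\bar\lambda, \bar\psi'_\delta) \le (D_{L^\infty(\hat\nu)}) + \epsilon/2 + 6\eta_\delta \|\bar\psi\|_{L^\infty(\hat\nu)},$$

and letting $\delta \downarrow 0$ yields $(D_{C_b(\mathcal{V})}) \le (D_{L^\infty(\hat\nu)}) + \epsilon/2$, which contradicts the definition of $\epsilon$. We conclude $(D_{L^\infty(\hat\nu)}) = (D_{C_b(\mathcal{V})})$.

Still omitting the compactness assumption (i) on $\mathcal{U}$, a further refinement can be obtained upon adding the assumptions (ii) and (iii) from Blanchet et al. (2025). Namely, we will obtain what is referred to as the Interchangeability Principle (Zhang et al., 2024).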
Proposition 2 (Interchangeability Principle). Suppose that $f$ is upper semicontinuous, $c$ is lower semicontinuous, $\mathcal{V} \subseteq \mathbb{R}^n$ and $\mathcal{W} \subseteq \mathbb{R}$ are decomposable with respect to $\hat\mu$, and $\Psi$ is a subset of $C(\mathcal{V})$, the continuous, $\mathcal{G}$-measurable, real-valued functions. Then $(D_\Psi) = (D_\Psi + \mathrm{IP})$.

Proof. By assumption, the integrand in the inner supremum of $(D_\Psi)$ is upper semicontinuous and hence a normal integrand by Example 14.31 of Rockafellar and Wets (1998). Moreover, the remaining conditions of Theorem 14.60 (ib.) are met, giving the desired conclusion.

3.6 CM Wasserstein Duality Without Compactness

In this section we conclude our study of CM Wasserstein duality [$(P) = (D_\Psi + \mathrm{IP})$]. We first state a result that summarizes our efforts to circumvent compactness. We then comment on the nature of this duality and how it compares to standard approaches in the literature. The nonapplicability of the traditional Slater's condition, Proposition 1(c$^*$), to Example 1 differentiates our results from the traditional theory. In addition, the nonattainment demonstrated in Examples 3, 4, and 5 shows that our work is also distinct from results obtained by the classic Fenchel Duality Theorem (see Rockafellar (1970) for the original, finite-dimensional statement, Brezis (2011) for an infinite-dimensional statement, and Cuong et al. (2023) for a more recent generalization) or, for that matter, any result that guarantees the existence of a primal or dual solution.

3.6.1 Circumventing Compactness Completed

The previous sections established zero duality gap and the Interchangeability Principle under assumptions that do not imply compactness (of $\mathcal{U}$ or of the primal and dual feasible sets). We consolidate those results into a single statement. That is, by Lemma 2, Theorems 1 and 2, and Proposition 2, respectively, we immediately obtain the following equalities.

Theorem 3.
Suppose Assumptions 1 and 2 hold; $f$ is bounded above and upper semicontinuous; $c$ is bounded below and lower semicontinuous; $\Psi := C_b(\mathcal{V})$; $\mathcal{W} \subseteq \mathbb{R}$ is bounded, $\mathcal{V} \subseteq \mathbb{R}^n$, and $\mathcal{G}$ is the standard Borel sigma algebra; and $(\mathcal{V}, \mathcal{G}, \hat\nu)$ has the Regular Conditional Probability Property (Faden, 1985). Then,

$$(P) = (P') = (D_{L^\infty(\hat\nu)}) = (D_{C_b(\mathcal{V})}) = (D_{C_b(\mathcal{V})} + \mathrm{IP}).$$

3.6.2 On the (Non)existence of Solutions

We remark that the CM Wasserstein duality [$(P) = (D_\Psi + \mathrm{IP})$] is a statement about equality of optimal values but leaves unanswered the related question of solution attainment. We take up the question of existence in this section. As a first step, we present an example showing that the existence of primal optimal solutions is not guaranteed when the compactness of $\mathcal{U}$ is relaxed, even if $f$ and $c$ remain closed, i.e., $\neg$(i) $\wedge$ (ii) $\wedge$ (iii).

Example 3 (Primal nonexistence). Let $\mathcal{V} = \{0\}$ and $\mathcal{W} := \mathbb{R}$ be endowed with the standard (subspace) topologies, $\mathcal{F}$ be the product of their respective Borel sigma-fields, $f(v, w) := -1/(1 + |w|)$, $h := 1$, $\rho > 0$, $c := 0$, $\hat\mu := \delta_{(0, 0)}$, and $\Psi := L^\infty(\hat\nu) = C_b(\mathcal{V}) = \mathbb{R}$. Then $0 = (P) = (D_\Psi) = (D_\Psi + \mathrm{IP})$, but no optimal $\gamma$ exists. Clearly Assumptions 1 and 2 hold, so $(P) = (P') = (D_\Psi)$ by Theorem 1. We find $(D_\Psi) = (D_\Psi + \mathrm{IP})$ by noting that $f, c, \psi$ are continuous and applying Proposition 2. Finally, it is clear that $f(0, w) < 0$ for all $w \in \mathbb{R}$, so $\mathbb{E}_\mu[f] < 0$ for any coupling $\mu$ that is feasible to $(P)$, preventing primal attainment.

Next we answer the dual existence question in the negative by developing a template for constructing instances where no optimal dual solution exists, yet there is zero duality gap (via Theorem 1).

Lemma 4 (Template for dual nonexistence).
Let $(\mathcal{V}, \mathcal{G})$ be a measurable space with probability measure $\hat\nu$, $\mathcal{W} := \{0, 2\}$, $\hat\mu := \hat\nu \otimes \delta_0$, $h \equiv 1$, $\rho > 0$, and let $c$ take value $+\infty$ if $v \ne \hat v$ and zero otherwise. Let $g : \mathcal{V} \to (-\infty, 0]$ be $\mathcal{G}$-measurable with $g \in L^1(\hat\nu)$, $f(v, w) := g(v) \cdot \mathbf{1}_{\{w = 0\}}(w)$, and $\Psi := L^\infty(\hat\nu)$. Then $(P) = (D_\Psi) = (D_\Psi + \mathrm{IP}) = \mathbb{E}_{\hat\nu}[g/2]$, and in addition, $\psi^*(\hat v) := -g(\hat v)/2$ is the $\hat\nu$-a.e. unique dual optimal solution to $(D_{L^1(\hat\nu)} + \mathrm{IP})$, i.e., with dual multiplier space $L^1(\hat\nu)$.

Proof. Let $\bar\Gamma := \Gamma_{\hat\mu} \cap \Gamma_W \cap \{\gamma : \mathbb{E}_\gamma[c] < +\infty\}$. By construction of $c \in \{0, +\infty\}$, it holds that

$$\mathbb{E}_\gamma[c] < \infty \ \Rightarrow \ V = \hat V, \ \gamma\text{-a.s.} \tag{$\dagger$}$$

We now verify Assumptions 1 and 2. Let $\gamma_0$ be the coupling defined by $\hat V \sim \hat\nu$, $\hat W \equiv 0$, $V = \hat V$ a.s., and $\gamma_0(W = 0 \mid \hat V) = \gamma_0(W = 2 \mid \hat V) = 1/2$, $\hat\nu$-a.s., so that $\mathbb{E}_{\gamma_0}[W - 1 \mid \hat V] = 0$, $\hat\nu$-a.s. Then $\mathbb{E}_{\gamma_0}[c] = 0 < \rho$ and $\mathbb{E}_{\gamma_0}[f] = \mathbb{E}_{\hat\nu}[g]/2 > -\infty$ since $g \in L^1(\hat\nu)$. Thus Assumption 1 holds, so $(P) = (P')$ reduces to a problem over couplings. Let $\gamma^\pm$ be couplings with $\hat V \sim \hat\nu$, $\hat W \equiv 0$, $V = \hat V$ a.s., and $W^- \equiv 0$ and $W^+ \equiv 2$, respectively. Then $\mathbb{E}_{\gamma^\pm}[c] = 0$ and $\mathbb{E}_{\gamma^\pm}[W^\pm - 1 \mid \hat V] \equiv \pm 1$, so Assumption 2 holds.

If $\gamma$ is feasible for $(P')$, then the conditional moment constraint $\mathbb{E}_\gamma[W \mid \hat V] = 1$ and $W \in \{0, 2\}$ imply $\gamma(W = 0 \mid \hat V) = \gamma(W = 2 \mid \hat V) = 1/2$, $\hat\nu$-a.s. Hence every feasible $\gamma$ has the same objective value, so

$$\mathbb{E}_\gamma[g(\hat V) \cdot \mathbf{1}_{\{W = 0\}}] = \mathbb{E}_{\hat\nu}[g]/2 = (P) = (D_\Psi),$$

where the last equality follows from Theorem 1.

The value of the dual program $(D_\Psi)$ can be expressed with couplings restricted to $\bar\Gamma$, i.e.,

$$(D_\Psi) = \inf_{\lambda > 0, \, \psi \in \Psi} \ \lambda\rho + \sup_{\gamma \in \bar\Gamma} \mathbb{E}_\gamma[f - \lambda c - \psi(\hat V) \cdot (W - 1)] = \inf_{\lambda \ge 0, \, \psi \in \Psi} \ \lambda\rho + \sup_{\gamma \in \bar\Gamma} \mathbb{E}_\gamma[f - \lambda c - \psi(\hat V) \cdot (W - 1)].$$
In the above, we may take $\lambda = 0$, since for any $\gamma \in \bar\Gamma$, it holds that $c = 0$, $\gamma$-a.s., so $(\dagger)$ ensures that the inner supremum is independent of $\lambda$, and hence the outer term $\lambda\rho$ is minimized at $\lambda = 0$.

Next we show that the IP holds. To begin, note that any $\gamma \in \bar\Gamma$ may be expressed as

$$\gamma(d\hat v, d\hat w; dv, dw) = \hat\nu(d\hat v) \, \delta_0(d\hat w) \, \delta_{\hat v}(dv) \, \big((1 - t(\hat v))\delta_0(dw) + t(\hat v)\delta_2(dw)\big)$$

for some $\mathcal{G}$-measurable $t : \mathcal{V} \to [0, 1]$, i.e., $t(\hat v) = \gamma(W = 2 \mid \hat V = \hat v)$. Thus the inner supremum over couplings reduces to a supremum over $\mathcal{G}$-measurable $t$ and, for fixed $\psi \in \Psi$, satisfies

$$\begin{aligned}
\sup_{\gamma \in \bar\Gamma} \mathbb{E}_\gamma[f(V, W) - \psi(\hat V) \cdot (W - 1)] &= \sup_{t : \mathcal{V} \to [0, 1]} \int \Big((1 - t(\hat v)) \cdot [f(\hat v, 0) + \psi(\hat v)] + t(\hat v) \cdot [f(\hat v, 2) - \psi(\hat v)]\Big) \hat\nu(d\hat v) \\
&= \int \max\{g(\hat v) + \psi(\hat v), -\psi(\hat v)\} \, \hat\nu(d\hat v).
\end{aligned}$$

The last equality uses the pointwise identity $\sup_{t \in [0, 1]} (1 - t)a + tb = \max\{a, b\}$, whose supremum is attained by $t^*(\hat v) = \mathbf{1}_{\{-\psi(\hat v) \ge g(\hat v) + \psi(\hat v)\}}(\hat v)$, which is $\mathcal{G}$-measurable since it is defined by comparing $\mathcal{G}$-measurable functions. Thus the IP is justified, and

$$(D_\Psi) = (D_\Psi + \mathrm{IP}) = \inf_{\psi \in \Psi} \mathbb{E}_{\hat\nu}[\max\{g + \psi, -\psi\}] = \mathbb{E}_{\hat\nu}[g]/2 + \inf_{\psi \in \Psi} \mathbb{E}_{\hat\nu}[|\psi + g/2|],$$

where the last equality uses the pointwise identity $\max\{g(\hat v) + \psi(\hat v), -\psi(\hat v)\} = g(\hat v)/2 + |\psi(\hat v) + g(\hat v)/2|$. The second term is minimized if and only if $\psi(\hat v) = -g(\hat v)/2$, $\hat\nu$-a.e., which is therefore the $\hat\nu$-a.e. unique dual optimizer to $(D_{L^1(\hat\nu)} + \mathrm{IP})$ over the enlarged space of dual multipliers, $L^1(\hat\nu)$.

Using Lemma 4, we introduce a pathological example which shows that the existence of an optimal dual multiplier is not guaranteed in the setting of Theorem 4.2 of Blanchet et al. (2025) with $\Psi := C_b(\mathcal{V})$, even when $\mathcal{U}$ is compact and $f$ and $c$ are closed, i.e., [(i) $\wedge$ (ii) $\wedge$ (iii)] holds.

Example 4 (Dual nonexistence in $C_b(\mathcal{V})$).
In the setting of Lemma 4, let $\mathcal{V} := [0, 1]$ and $\mathcal{W} := \{0, 2\}$ be endowed with standard (subspace) topologies, $\mathcal{F}$ be the product of their respective Borel sigma-fields, $\hat\nu := \mathrm{Unif}[0, 1]$, and $\Psi := L^\infty(\hat\nu)$. Let $C \subseteq \mathcal{V}$ be the (closed) Smith-Volterra-Cantor set, for which $\hat\nu(C) = 1/2$, and let $O := \mathcal{V} \setminus C$ denote its complement, which is open in $\mathbb{R}$. Finally, set $g(v) := -\mathbf{1}_O(v) \in L^\infty(\hat\nu) \setminus C_b(\mathcal{V})$. Then $(P) = (D_\Psi + \mathrm{IP}) = \mathbb{E}_{\hat\nu}[g/2]$. Also by Lemma 4, the $\hat\nu$-a.e. unique minimizer is $\psi^* = -g/2$, which is attained in $\Psi$.

If we restrict the dual multiplier space to $C_b(\mathcal{V})$, then the optimal value is preserved, but there does not exist $\psi' \in C_b(\mathcal{V})$ achieving the optimal value. Indeed, for $\operatorname{dist}(\cdot, C)$ denoting the distance to $C$, we may approximate with $g_r(v) := -\min\{1, r \cdot \operatorname{dist}(v, C)\} \in C_b(\mathcal{V})$ (since $\operatorname{dist}(\cdot, C)$ is continuous) and $\psi_r := -g_r/2$, so that $g_r \downarrow g$ and $\psi_r \uparrow \psi^*$ with dual value approaching $\mathbb{E}_{\hat\nu}[g/2]$ from above by the monotone convergence theorem. However, if $\psi' \in C_b(\mathcal{V})$ attained the optimal value, then $\psi' = \psi^*$, $\hat\nu$-a.e., and since $\hat\nu$ has full support, this would force $\psi' = \psi^*$ everywhere, contradicting the discontinuity of $\psi^*$.

Moreover, $f(v, w) := -\mathbf{1}_O(v) \cdot \mathbf{1}_{\{w = 0\}}(w)$ is upper semicontinuous. Indeed, at $w = 0$, $f(\cdot, 0) = -\mathbf{1}_O(\cdot)$ is upper semicontinuous, since $\mathbf{1}_O(\cdot)$ is lower semicontinuous because $O$ is open. At $w = 2$, $f(\cdot, 2) = 0$ is continuous. This verifies closedness of $f$ (ii); moreover, compactness of $\mathcal{U}$ (i) and closedness of $c$ (iii) both hold.

Finally we present an example in which dual attainment in $\Psi := L^\infty(\hat\nu)$ is not guaranteed, again using Lemma 4.

Example 5 (Dual nonexistence in $L^\infty(\hat\nu)$). In the setting of Lemma 4, let $\mathcal{V} := (0, 1] \subset \mathbb{R}$, $\hat\nu := \mathrm{Unif}[0, 1]$, and let $g \in L^1(\hat\nu) \setminus L^\infty(\hat\nu)$ be given by $g(v) := -1/\sqrt{v}$, so $f \le 0$.
Then $(P) = (D_\Psi + \mathrm{IP}) = \mathbb{E}_{\hat\nu}[g/2]$. By Lemma 4, the $\hat\nu$-a.e. unique minimizer is $\psi^* = -g/2$; however, since $\Psi$ is a linear space and $g \notin \Psi$, we find $\psi^* \notin \Psi$, and yet the optimal dual value $\mathbb{E}_{\hat\nu}[g/2]$ is approached by a sequence $\{\psi_r\}_{r=1}^{\infty} \subseteq \Psi$, where $\psi_r := -g_r/2$ and the truncations $g_r := \max\{g, -r\}$ satisfy $g_r \to g$ in $L^1(\hat\nu)$.

4 "Primal-Worst Equals Dual-Best" via Perturbations

4.1 The Primal-Worst Problem

In this section, we review the robust counterpart of an optimization problem involving uncertain parameters, in which a decision maker anticipates the worst case in terms of both feasibility and cost. Accordingly, Beck and Ben-Tal (2009) provided the alternative moniker, primal-worst, which we formally define below using $z$ to denote the uncertain parameters.

Definition 3 (Primal-Worst). Let $X, Z$ be LCTV Hausdorff spaces, and let $\mathcal{Z} \subseteq Z$ be a closed, convex, nonempty set. If $\phi : X \times Z \to \mathbb{R}$ is a convex-concave function, then we refer to $\inf_x \sup_z \phi(x, z)$ as a primal-worst problem.

In particular, for $m \ge 0$, given functions $\{f_i : X \times Z \to \mathbb{R}\}_{i=0}^{m}$ such that $\{f_i(\cdot, z)\}_{i=0}^{m}$ are proper, closed, convex for any $z \in Z$, and $\{-f_i(x, \cdot)\}_{i=0}^{m}$ are proper, closed, convex for any $x \in X$, then

$$\inf_{x \in X} \ F_0(x) := \sup_{z_0 \in \mathcal{Z}} f_0(x, z_0) \quad \text{s.t.} \quad F_i(x) := \sup_{z_i \in \mathcal{Z}} f_i(x, z_i) \le 0, \quad i \in (m) \tag{P-W}$$

is a primal-worst problem. We remark that $-\infty < f_i < +\infty$ for all $x \in X$ and $z \in \mathcal{Z}$; consequently, both $f_i(\cdot, z)$ and $f_i(x, \cdot)$ are continuous for any $(x, z) \in X \times Z$.

For primal-worst problems of the form (P-W), Beck and Ben-Tal (2009) introduce a procedure for obtaining a lower bound in the form of another optimization problem, which they refer to as the dual-best problem. When tight, a duality relationship holds between pessimistic and optimistic mathematical programming formulations, termed primal-worst equals dual-best.
Since Beck and Ben-Tal (2009), there have been several follow-up works, including Jeyakumar and Li (2010, 2014) and Zhen et al. (2025), that explore this duality. As it stands, the literature now presents a variety of dual-best problems derived and studied under different (combinations of) duality theories, e.g., Lagrangian, conic, and Fenchel, resulting in both convex and nonconvex formulations. In the remainder of this section, we advocate and pursue a perturbation duality approach to the formulation of dual-best and the study of its equality to primal-worst. We demonstrate the flexibility of this approach in unifying the various formulations from the literature, as well as its capacity to facilitate short, simpler proofs.

4.2 Formulating Dual-Bests

We now define the notion of a dual-best. As in Section 2.2's use of a bifunction to embed a primal minimization problem in a family of perturbed minimization problems, we will use a collection of bifunctions to embed a min-max problem in a family of perturbed min-max problems.

Definition 4 (Dual-Best). Given a primal-worst problem with $\phi$ from Definition 3, let $F_z : U \times X \to \mathbb{R}$ denote a convex bifunction for each $z \in Z$ satisfying $\inf_x \sup_z \phi(x, z) = \inf_x \sup_z F_z(0, x)$. We refer to $\sup_{z \in Z, \, u^* \in U^*} (F_z)^d(0, u^*)$ as a dual-best problem.
Just as a dual convex problem lower bounds its primal problem, the dual-best afforded by a family of bifunctions $\{F_z\}_{z \in Z}$ is also a lower bound for its primal-worst $\inf_x \sup_z \varphi(x, z)$; one way to see this is as an application of weak duality to the robust bifunction $(\sup_z F_z)$ in conjunction with interchanging $\inf_{x,u} \sup_z$:

\begin{align*}
\inf_x \sup_z \varphi(x, z) \equiv \inf_x \sup_z F_z(0, x)
&\ge \sup_{u^*} (\sup_z F_z)^d(0, u^*) = \sup_{u^*} \inf_{x,u} \sup_z F_z(u, x) + \langle u^*, u \rangle \tag{i$'$} \\
&\ge \sup_{u^*} \sup_z \inf_{x,u} F_z(u, x) + \langle u^*, u \rangle = \sup_{z, u^*} (F_z)^d(0, u^*). \tag{ii$'$}
\end{align*}

However, unlike a dual to a primal problem, a dual-best to a primal-worst may not necessarily be a convex formulation. Importantly, we also note that Definition 4 admits more than one dual-best to a primal-worst, and that the form of a dual-best relies crucially on the selected family $\{F_z\}_{z \in Z}$ of convex bifunctions.

Unifying Dual-Bests to (P-W). Definition 4 affords a flexibility that allows us to unify various "dual-bests" from the literature associated with a primal-worst problem of the form (P-W).

4.2.1 A Nonconvex Dual-Best

Consider the family $\{F'_z\}_{z \in Z^{[m]}}$ defined via

\[
F'_z(u, x) := f_0(x, z_0) - \iota_{\mathcal{Z}}(z_0) + \sum_{i \in (m)} \iota_{(-\infty, u_i]}\big( f_i(x, z_i) - \iota_{\mathcal{Z}}(z_i) \big), \quad \forall z \in Z^{[m]},
\tag{3}
\]

where $u := (u_i)_{i=1}^m \in \mathbb{R}^m$ represents perturbations to the range of the component functions. The resulting dual-best is

\[
\sup_{z, u^*} (F'_z)^d(0, u^*) = \sup_{z \in \mathcal{Z}^{[m]}} \Big[ \sup_{u^* \in \mathbb{R}^m_+} \inf_x \; f_0(x, z_0) + \sum_{i \in (m)} u_i^* \cdot f_i(x, z_i) \Big],
\]

which recovers precisely the (nonconvex) dual-best program studied in Beck and Ben-Tal (2009) and Jeyakumar and Li (2010, 2014).
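The nonconvex dual-best above can be checked numerically on a one-dimensional toy instance. The sketch below is not from the paper: the data $f_0(x, z_0) = x^2 + z_0 x$, $f_1(x, z_1) = 1 - x + \tfrac{1}{2} z_1 x$ and the uncertainty set $[-1, 1]$ are hypothetical choices, the grid resolutions and the cap of 20 on the multiplier are arbitrary, and the inner infimum over $x$ is evaluated in closed form because the Lagrangian is a convex quadratic in $x$.

```python
# Toy instance (hypothetical data, chosen for illustration only):
#   X = R, uncertainty set = [-1, 1]
#   f0(x, z0) = x^2 + z0*x          (convex in x, affine in z0)
#   f1(x, z1) = 1 - x + 0.5*z1*x    (affine in x and in z1)

def grid(lo, hi, n):
    """Return n + 1 equally spaced points on [lo, hi], endpoints included."""
    return [lo + (hi - lo) * k / n for k in range(n + 1)]

Z_GRID = grid(-1.0, 1.0, 20)

def f0(x, z0):
    return x * x + z0 * x

def f1(x, z1):
    return 1.0 - x + 0.5 * z1 * x

def primal_worst():
    """Grid search for inf_x F0(x) s.t. F1(x) <= 0, where F_i(x) = max_z f_i(x, z)."""
    best = float("inf")
    for x in grid(-5.0, 5.0, 1000):
        worst_constraint = max(f1(x, z1) for z1 in Z_GRID)
        if worst_constraint <= 1e-9:
            worst_cost = max(f0(x, z0) for z0 in Z_GRID)
            best = min(best, worst_cost)
    return best

def inner_inf(z0, z1, u):
    """Closed form of inf_x f0(x, z0) + u*f1(x, z1), i.e. of x^2 + b*x + u over x."""
    b = z0 - u * (1.0 - 0.5 * z1)
    return u - b * b / 4.0

def dual_best():
    """Grid search for sup over (z0, z1) and multipliers u in [0, 20] of inner_inf."""
    u_grid = grid(0.0, 20.0, 2000)
    return max(inner_inf(z0, z1, u)
               for z0 in Z_GRID for z1 in Z_GRID for u in u_grid)

if __name__ == "__main__":
    pw, db = primal_worst(), dual_best()
    print(f"primal-worst ~ {pw:.6f}, dual-best ~ {db:.6f}")
```

For this instance both values can also be computed by hand and equal 6: the primal-worst is attained at $x = 2$, and the dual-best at $z_0 = z_1 = 1$ with multiplier $u^* = 10$. The instance has a strictly feasible point (for example $x = 3$) and a compact uncertainty set, so agreement of the two values is consistent with the weak-duality chain holding with equality here.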
4.2.2 A Convex Dual-Best

To obtain a different (convex) dual-best, we may alternatively consider the family $\{F_z\}_{z \in Z^{[m]}}$ defined via

\[
F_z(u, x) := f_0(x - d_0, z_0) - \iota_{\mathcal{Z}}(z_0) + \sum_{i \in (m)} \iota_{(-\infty, w_i]}\big( f_i(x - d_i, z_i) - \iota_{\mathcal{Z}}(z_i) \big), \quad \forall z \in Z^{[m]},
\tag{4}
\]

where $u := (\{w_i\}_{i=1}^m, \{d_i\}_{i=0}^m) \in \mathbb{R}^m \times X^{[m]}$ represents perturbations to the constraints and decisions, respectively. The dual-best problem is

\begin{align*}
\sup_{u^*, z} F_z^d(0, u^*)
&\equiv \sup_{u^*, z} \inf_{x, u} \; F_z(u, x) + \langle u^*, u \rangle \\
&\equiv \sup_{u^*} \sup_z \inf_{x, u} \; f_0(x - d_0, z_0) - \iota_{\mathcal{Z}}(z_0) + \sum_{i \in (m)} \Big[ \iota_{(-\infty, w_i]}\big( f_i(x - d_i, z_i) - \iota_{\mathcal{Z}}(z_i) \big) + w_i^* w_i \Big] + \sum_{i \in [m]} \langle d_i^*, d_i \rangle \\
&= \sup_{\substack{\{w_i^* \ge 0\}_{i=1}^m \\ \sum_{i=0}^m d_i^* = 0}} \; \sup_{\{z_i \in \mathcal{Z}\}_{i=0}^m} \; \inf_{\{y_i\}_{i=0}^m} \; \underbrace{f_0(y_0, z_0) + \sum_{i \in (m)} w_i^* \cdot f_i(y_i, z_i) - \sum_{i=0}^m \langle d_i^*, y_i \rangle}_{=:\; \ell(y;\, z, u^*)},
\tag{5}
\end{align*}

where it will be convenient to record the expression $\ell(y; z, u^*)$, and the equality holds via the coordinate change $y_i := x - d_i \in X$ for $i \in [m]$ (see pp. 322-323 of Rockafellar (1970)). We can attain a more explicit form of (5) by using the following convex conjugacy calculation (compare with Rockafellar (1970, Theorem 16.1)):

\[
\sup_{z_i \in Z} \inf_{x \in X} \; c \cdot f_i(x, z_i) - \langle d_i^*, x \rangle - \iota_{\mathcal{Z}}(z_i)
= \sup_{z_i \in Z} \; -(h_i c)(z_i, d_i^*) - \iota_{c \cdot \mathcal{Z}}(z_i), \quad \forall c \in [0, +\infty), \; \forall i \in [m],
\tag{C}
\]

where $h_i : Z \times X^* \to \overline{\mathbb{R}}$, defined by $h_i(z_i, d_i^*) := [f_i(\cdot, z_i)]^*(d_i^*)$, is convex, and for $w_i^* \ge 0$, $(h_i w_i^*)$ denotes the right scalar multiplication of $h_i$ by $w_i^*$. Using (C), we arrive at the following maximization form:

\begin{align*}
\sup_{\{z_i\}_{i=0}^m,\, \{w_i^*\}_{i=1}^m,\, \{d_i^*\}_{i=0}^m} \quad & -h_0(z_0, d_0^*) - \sum_{i \in (m)} (h_i w_i^*)(z_i, d_i^*) \\
\text{s.t.} \quad & \sum_{i \in [m]} d_i^* = 0, \quad z_0 \in \mathcal{Z}, \quad \text{and} \quad z_i \in w_i^* \cdot \mathcal{Z}, \; w_i^* \ge 0, \; i \in (m),
\tag{D-B}
\end{align*}

which we note is equivalent to Zhen et al.
(2025)'s (D-B$'$); the latter is explicitly obtainable after closing the objective (via the upper semi-continuous hull) and the feasible region (via the addition of recession directions) of (D-B).

4.3 The Primal-Worst Equals Dual-Best Principle

With various conceptualizations of dual-bests unified under Definition 4, we now turn our attention to the concept known as primal-worst equals dual-best, introduced and studied by Beck and Ben-Tal (2009); Jeyakumar and Li (2010, 2014); Zhen et al. (2025). Under our unifying framework derived from bifunctions, this will be understood as the case when a family $\{F_z\}_{z \in Z}$ yields a dual-best equal in value to the primal-worst, that is,

\[
\inf_x \sup_z F_z(0, x) = \sup_{z, u^*} (F_z)^d(0, u^*).
\]

It is clear that primal-worst equals dual-best holds when (i$'$) and (ii$'$) are satisfied with equality, i.e.,

I$'$. $\inf_x (\sup_z F_z)(0, x) = \sup_{u^*} (\sup_z F_z)^d(0, u^*)$, normality of the convex bifunction $(\sup_z F_z)$;

II$'$. $\sup_{u^*} (\sup_z F_z)^d(0, u^*) = \sup_{u^*} \sup_z (F_z)^d(0, u^*)$, commutativity of "$\sup_z$" and bifunction duality "$(\cdot)^d$".

Consequently, (I$'$) and (II$'$) can provide a straightforward roadmap for the design of $\{F_z\}_{z \in Z}$ and accompanying assumptions so as to obtain dual-bests for which primal-worst equals dual-best. In fact, we proceed to illustrate that this perspective can yield arguably simpler and shorter proofs than those currently found in the literature. Specifically, we will detail how (P-W) and (D-B) can be a primal-worst, dual-best pairing under suitable selection of bifunctions, and subsequently be made equal under familiar sufficient conditions. In doing so, we will generalize and extend the results of Zhen et al. (2025, Theorem 5).
4.3.1 (P-W) = (D-B) by Perturbing Decisions and Constraints

Here we consider the family of bifunctions $\{F_z\}_{z \in Z^{[m]}}$ given in (4), for which it was already noted that (P-W) $= \inf_x \sup_z F_z(0, x)$, i.e., the primal-worst coincides with (P-W). Moreover, we recall that a conjugacy calculation ensures that its dual-best coincides with (D-B). It follows that [(P-W) = (D-B)] is then simply a matter of collecting sufficient conditions to ensure that [(I$'$) $\wedge$ (II$'$)] holds for the family $\{F_z\}_{z \in Z^{[m]}}$. We now show how a straightforward and rapid proof can be devised for Zhen et al. (2025)'s Theorem 5 (i)-(ii).

Proposition 3 (Zhen et al. (2025)'s Theorem 5 (i)-(ii)). Consider the setting of Definitions 3 and 4, and let $\mathcal{Z}$ be compact. If one of the following conditions holds:

(i) there exists $\bar{x} \in \cap_{i=0}^m \operatorname{ri} \operatorname{dom} F_i$ such that $F_i(\bar{x}) < 0$ for all $i = 1, \ldots, m$;

(ii) $X$ is a reflexive Banach space, and (P-W) has a nonempty, bounded feasible region;

then [(P-W) = (D-B)]. If (i), then (D-B) has an optimal solution; if (ii), then (P-W) has an optimal solution.

Proof. Let $\{F_z\}_{z \in Z^{[m]}}$ be the family of bifunctions defined in (4), for which $\inf_x \sup_z F_z(0, x) =$ (P-W) and $\sup_{u^*} \sup_z (F_z)^d(0, u^*) =$ (D-B). To conclude [(P-W) = (D-B)], we will show that either of (i) or (ii) yields (I$'$), and compactness of $\mathcal{Z}$ yields (II$'$).
Evidently, (II$'$) can be immediately concluded from the compactness of $\mathcal{Z}$ via Sion's Minimax Theorem:

\begin{align*}
\sup_{u^*, z} F_z^d(0, u^*)
&\equiv \sup_{\substack{\{w_i^* \ge 0\}_{i=1}^m \\ \sum_{i=0}^m d_i^* = 0}} \; \max_{\{z_i \in \mathcal{Z}\}_{i=0}^m} \; \inf_{\{y_i\}_{i=0}^m} \; f_0(y_0, z_0) + \sum_{i \in (m)} w_i^* \cdot f_i(y_i, z_i) - \sum_{i=0}^m \langle d_i^*, y_i \rangle \\
&= \sup_{\substack{\{w_i^* \ge 0\}_{i=1}^m \\ \sum_{i=0}^m d_i^* = 0}} \; \inf_{\{y_i\}_{i=0}^m} \; \max_{\{z_i \in \mathcal{Z}\}_{i=0}^m} \; f_0(y_0, z_0) + \sum_{i \in (m)} w_i^* \cdot f_i(y_i, z_i) - \sum_{i \in [m]} \langle d_i^*, y_i \rangle \\
&\equiv \sup_{\substack{\{w_i^* \ge 0\}_{i=1}^m \\ \sum_{i=0}^m d_i^* = 0}} \; \inf_{\{y_i\}_{i=0}^m} \; F_0(y_0) + \sum_{i \in (m)} w_i^* \cdot F_i(y_i) - \sum_{i \in [m]} \langle d_i^*, y_i \rangle \\
&\equiv \sup_{\substack{\{w_i^* \ge 0\}_{i=1}^m \\ \sum_{i=0}^m d_i^* = 0}} (\sup_z F_z)^d(0, u^*).
\end{align*}

Condition (ii), given that the objective $F_0$ of (P-W) is closed and convex, ensures the existence of an optimal solution to (P-W) by a weakly convergent subsequence (Ekeland and Témam, 1999, Proposition II.1.2). Using the closedness of $(\sup_z F_z)$, we may appeal to Proposition 1(d) with $p(0) := \inf_{u^*} -(\sup_z F_z)^d(0, u^*)$ and $q(0) := \sup_x -(\sup_z F_z)(0, x)$; specifically, that (P-W) has an optimal solution means that the problem $q(0)$ has an optimal solution, so that Proposition 1(c) yields $p(0) = q(0)$, and hence (I$'$). Taken with our establishment of (II$'$) above, we can conclude [(P-W) = (D-B)], with (P-W) having an optimal solution.

Condition (i) guarantees subdifferentiability of the perturbation function $\inf_x (\sup_z F_z)$ at the origin via Proposition 1(c$^*$), so that $\inf_x (\sup_z F_z)(0, x) = \max_{u^*} (\sup_z F_z)^d(0, u^*)$ and hence (I$'$), with existence of an optimal $u^*$. The existence of an accompanying collection of optimal $z_i \in \mathcal{Z}$ to the problem $\sup_{u^*, z} F_z^d(0, u^*)$ follows from the compactness of $\mathcal{Z}$. We conclude [(P-W) = (D-B)], with (D-B) having an optimal solution.
We briefly remark that when (P-W) is feasible, a practical and easy sufficient condition for (ii) of Proposition 3 is the existence of a $\delta > 0$ and $\bar{z}_i \in \mathcal{Z}$ for some $i \in (m)$ such that $\{x : f_i(x, \bar{z}_i) \le \delta\}$ is bounded.

4.3.2 (P-W) = (D-B) by Perturbing Lagrangians

In this subsection, we restrict $X$ and $Z$ to finite-dimensional normed spaces, and show how another perturbation scheme can be designed to reveal another sufficient condition for [(P-W) = (D-B)], namely Zhen et al. (2025)'s Theorem 5 (iii). This alternative scheme will in fact be a family $\{\mathcal{F}_z\}_{z \in Z^{[m]}}$ consisting trivially of one member; in other words, $\mathcal{F}_z \equiv \mathcal{F}$ for all $z \in Z^{[m]}$. Trivially, $(\sup_z \mathcal{F}_z) = \mathcal{F}$ and $(\sup_z \mathcal{F}_z)^d = \mathcal{F}^d = \sup_z \mathcal{F}_z^d$, so that primal-worst equals dual-best reduces to normality of the bifunction $\mathcal{F}$.

Let $\Delta := \operatorname{span}(\mathcal{Z} - \mathcal{Z})$, and for any $x \in X$ and any $i \in [m]$, define the concave, closed bifunction

\[
G_x^i(\delta_i, z_i) := f_i(x, z_i - \delta_{i0}) - \iota_{\mathcal{Z}}(z_i - \delta_{i1})
\]

with perturbations $\delta_i := (\delta_{i0}, \delta_{i1}) \in \Delta \times \Delta$ and decision $z_i \in Z$, taking values in $[-\infty, +\infty)$. With $L_x^i$ denoting the Lagrangian associated with $G_x^i$, it holds that

\begin{align*}
\sup_{z_i} G_x^i(0, z_i)
&\le \inf_{\delta_i^*} (G_x^i)^d(\delta_i^*, 0)
\equiv \inf_{\delta_i^*} \sup_{z_i \in \Delta} \sup_{\delta_i} \; G_x^i(\delta_i, z_i) + \langle \delta_{i0}^*, \delta_{i0} \rangle + \langle \delta_{i1}^*, \delta_{i1} \rangle
\equiv \inf_{\delta_i^*} \sup_{z_i \in \Delta} L_x^i(\delta_i^*, z_i) \tag{6} \\
&= \inf_{\delta_i^*} \sup_{z_i \in \Delta} \; L_x^i(\delta_i^*, 0) + \langle \delta_{i0}^* + \delta_{i1}^*, z_i \rangle
= \inf_{\delta_i^*} \big\{ L_x^i(\delta_i^*, 0) : \delta_{i0}^* + \delta_{i1}^* = 0 \big\},
\end{align*}

where the first equality holds because $L_x^i(\delta_i^*, z_i) = L_x^i(\delta_i^*, 0) + \langle \delta_{i0}^* + \delta_{i1}^*, z_i \rangle$ for any $z_i \in \Delta$, and the second because the supremum over $z_i \in \Delta$ of the linear term is $+\infty$ unless $\delta_{i0}^* + \delta_{i1}^*$ vanishes on $\Delta$. We also highlight that $L_x^i(\delta_i^*, 0)$ is closed and jointly convex in $(\delta_i^*, x)$, as it is the supremum of closed, jointly convex functions in $(\delta_i^*, x)$.
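Consequently, consider the convex bifunction $\mathcal{F}$ with perturbations $u := (\{x_i\}_{i=0}^m, \{w_i\}_{i=1}^m, \{\zeta_i^*\}_{i=0}^m) \in U$ for $U := X^{[m]} \times \mathbb{R}^m \times \Delta^*$, and its dual $\mathcal{F}^d$ with prices $u^* := (\{x_i^*\}_{i=0}^m, \{w_i^*\}_{i=1}^m, \{\zeta_i^{**}\}_{i=0}^m) \in U^*$:

\begin{align*}
\mathcal{F}(u, (\delta^*, x)) &:= L^0_{x - x_0}(\delta_0^*, 0) + \sum_{i \in (m)} \iota_{(-\infty, w_i]}\big( L^i_{x - x_i}(\delta_i^*, 0) \big) + \sum_{i \in [m]} \iota_{\{\zeta_i^*\}}(\delta_{i0}^* + \delta_{i1}^*), \tag{7} \\
\mathcal{F}^d((\delta^{**}, x^*), u^*) &\equiv \inf_{\{y_i, \delta_i^*\}_{i=0}^m} \; L^0_{y_0}(\delta_0^*, 0) + \langle \delta_{00}^*, \zeta_0^{**} - \delta_{00}^{**} \rangle + \langle \delta_{01}^*, \zeta_0^{**} - \delta_{01}^{**} \rangle - \langle x_0^*, y_0 \rangle \\
&\qquad + \sum_{i \in (m)} \Big[ w_i^* \cdot L^i_{y_i}(\delta_i^*, 0) + \langle \delta_{i0}^*, \zeta_i^{**} - \delta_{i0}^{**} \rangle + \langle \delta_{i1}^*, \zeta_i^{**} - \delta_{i1}^{**} \rangle - \langle x_i^*, y_i \rangle \Big] \\
&\qquad - \sum_{i \in (m)} \iota_{[0, +\infty)}(w_i^*) - \iota_{\{x^*\}}\Big( \sum_{j \in [m]} x_j^* \Big), \tag{8}
\end{align*}

where the equivalence follows from the definition of $L_x^i$, changing variables $y_i := x - x_i \in X$ to obtain the separable form. For each $i \in (m)$ with $w_i^* > 0$,

\begin{align*}
& \inf_{y_i} \; w_i^* \cdot \Big[ \inf_{\delta_i^*} L^i_{y_i}(\delta_i^*, \zeta_i^{**} / w_i^*) + \langle \delta_i^*, \delta_i^{**} / w_i^* \rangle \Big] - \langle x_i^*, y_i \rangle \\
&\quad \overset{(1)}{=} \inf_{y_i} \; w_i^* \cdot \operatorname{cl}\big( G^i_{y_i}(\cdot, \zeta_i^{**} / w_i^*) \big)(\delta_i^{**} / w_i^*) - \langle x_i^*, y_i \rangle \\
&\quad = \inf_{y_i} \; w_i^* \big[ f_i(y_i, \zeta_i^{**} / w_i^* - \delta_{i0}^{**} / w_i^*) - \iota_{\mathcal{Z}}(\zeta_i^{**} / w_i^* - \delta_{i1}^{**} / w_i^*) \big] - \langle x_i^*, y_i \rangle \\
&\quad \overset{(C)}{=} -(h_i w_i^*)(\zeta_i^{**} - \delta_{i0}^{**}, x_i^*) - \iota_{\{w_i^* \cdot \mathcal{Z}\}}(\zeta_i^{**} - \delta_{i1}^{**}), \tag{9}
\end{align*}

and the same conclusion is found when $w_i^* = 0$. The computation for $i = 0$ is similar. In summary, we have a primal problem $\inf_{\delta^*, x} \mathcal{F}(0, (\delta^*, x))$ and a dual problem $\sup_{u^*} \mathcal{F}^d(0, u^*)$ such that

\begin{align*}
\inf_{\delta^*, x} \mathcal{F}(0, (\delta^*, x)) &\equiv \inf_{x, \{\delta_i^*\}_{i=0}^m} \Big\{ L_x^0(\delta_0^*, 0) \;\; \text{s.t.} \;\; L_x^i(\delta_i^*, 0) \le 0, \; i \in (m), \;\; \text{and} \;\; \delta_{i0}^* + \delta_{i1}^* = 0, \; i \in [m] \Big\} \overset{(6)}{\ge} \text{(P-W)}, \\
\sup_{u^*} \mathcal{F}^d(0, u^*) &= \text{(D-B)}.
\end{align*}

It is now clear that to obtain [(P-W) = (D-B)], we may collect sufficient conditions for $[\inf_{\delta^*, x} \mathcal{F}(0, (\delta^*, x)) = \text{(P-W)}]$ and $[\inf_{\delta^*, x} \mathcal{F}(0, (\delta^*, x)) = \sup_{u^*} \mathcal{F}^d(0, u^*)]$.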
Proposition 1(c$^*$) provides clear direction for this; specifically, $0 \in \operatorname{int} \operatorname{dom} \sup_{z_i} G_x^i(\cdot, z_i)$ for all $x \in X$ for the former, and $0 \in \operatorname{int} \operatorname{dom} \inf_x \mathcal{F}(\cdot, x)$ for the latter. In this way, we see that the primal-worst equals dual-best result of Zhen et al. (2025)'s Theorem 5 (iii) admits a short proof via perturbations.

Proposition 4 (Zhen et al. (2025)'s Theorem 5 (iii)). Consider the setting of Definitions 3 and 4. If $X$ and $Z$ are finite-dimensional normed vector spaces, and moreover, there exists $\bar{u}^* := (\{\bar{x}_i^*\}_{i=1}^m, \{\bar{w}_i^*\}_{i=1}^m, \{\bar{z}_i\}_{i=0}^m)$ feasible to (D-B) such that for each $i \in [m]$: $\bar{w}_i^* > 0$ with $\bar{w}_0^* := 1$, $\bar{z}_i / \bar{w}_i^* \in \operatorname{ri}(\mathcal{Z})$, and $(\bar{x}_i^*, \bar{w}_i^*, \bar{z}_i) \in \operatorname{ri} \operatorname{dom} g_i$, where $g_i(x_i^*, w_i^*, z_i) := -(h_i w_i^*)(z_i, x_i^*) - \iota_{[0, +\infty)}(w_i^*)$; then (D-B) = (P-W), and (P-W) has an optimal solution.

Proof. In this proof, we show that for the family $\{\mathcal{F}_z\}_{z \in Z}$ with $\mathcal{F}_z \equiv \mathcal{F}$ in (7): firstly, $[\inf_{\delta^*, x} \mathcal{F}(0, (\delta^*, x)) = \text{(P-W)}]$; secondly, [(I$'$) $\wedge$ (II$'$)].

To show the former, it will suffice to establish that the bifunctions $G_x^i$ exhibit normality for all $i$ and $x$. Towards this, we note that for any $i$ and $x$, it holds that

\[
(0, \bar{z}_i / \bar{w}_i^*) \in \operatorname{ri}\Big( \operatorname{dom} G_x^i = \big\{ (\delta_{i0}, \delta_{i1}, z_i) : f_i(x, z_i - \delta_{i0}) > -\infty, \; z_i - \delta_{i1} \in \mathcal{Z} \big\} \Big),
\]

so $0 \in \operatorname{proj}(\operatorname{ri} \operatorname{dom} G_x^i) = \operatorname{ri} \operatorname{proj} \operatorname{dom} G_x^i$, as desired for Proposition 1(c$^{**}$) to be invoked.

What remains is to establish the latter. Since (II$'$) holds trivially, it remains to establish (I$'$), the normality/stability of $\mathcal{F}$. In fact, in light of Proposition 1(d), it will suffice to target the stability of the counterpart, $\mathcal{F}^d$.
Observe that

\[
\operatorname{dom} \mathcal{F}^d = \left\{ ((\delta^{**}, x^*), u^*) :
\begin{array}{l}
h_0(\zeta_0^{**} - \delta_{00}^{**}, x_0^*) < +\infty, \;\; \zeta_0^{**} - \delta_{01}^{**} \in \mathcal{Z}, \;\; x_0^* \in X^*, \\
(h_i w_i^*)(\zeta_i^{**} - \delta_{i0}^{**}, x_i^*) < +\infty, \;\; \zeta_i^{**} - \delta_{i1}^{**} \in w_i^* \mathcal{Z}, \;\; x_i^* \in X^*, \;\; w_i^* \ge 0, \;\; i \in (m), \\
\sum_{i \in [m]} x_i^* = x^*
\end{array}
\right\},
\]

so by hypothesis $(0, 0, \bar{u}^*) \in \operatorname{ri}(\operatorname{dom} \mathcal{F}^d)$. It follows that $(0, 0) = \operatorname{proj}(0, 0, \bar{u}^*) \in \operatorname{proj} \operatorname{ri}(\operatorname{dom} \mathcal{F}^d)$. Given that $X$ and $Z$ are finite-dimensional Banach spaces, $\operatorname{proj} \operatorname{ri}(\operatorname{dom} \mathcal{F}^d) = \operatorname{ri} \operatorname{proj}(\operatorname{dom} \mathcal{F}^d)$; hence, Proposition 1(c$^{**}$) yields the conclusion.

5 Conclusions

This paper leverages the perturbation duality perspective for the study of robust and distributionally robust optimization, demonstrating that this form of analysis can provide a natural and unifying foundation for deriving and proving dual results in these settings. New results and insights were derived for Blanchet et al. (2025)'s recent distributionally robust model. It was also shown that perturbation-based formulations provide a unifying treatment of robust duality, captured by the "primal-worst equals dual-best" principle, and lead to concise proofs that simplify and clarify existing results in the literature. These findings suggest that perturbation duality is a versatile, albeit possibly underutilized, tool for robust and distributionally robust optimization.

References

R. Agrawal and T. Horel. Optimal bounds between $f$-divergences and integral probability metrics. J. of Mach. Learn. Res., 22, 2021.
L. Aolaritei, S. Shafiee, and F. Dörfler. Wasserstein distributionally robust estimation in high dimensions: Performance analysis and optimal hyperparameter tuning. Math. Prog., 2026.
G. Bayraksan and D. K. Love. Data-driven stochastic programming using $\phi$-divergences. INFORMS TutORials in Oper. Res., 2015.
A. Beck and A. Ben-Tal. Duality in robust optimization: Primal worst equals dual best. Oper. Res. Lett., 37(1), 2009.
A. Ben-Tal and A. Nemirovski.
Robust solutions of uncertain linear programs. Oper. Res. Lett., 25(1), 1999.
A. Ben-Tal and A. Nemirovski. Robust optimization–methodology and applications. Math. Prog., 92(3), 2002.
A. Ben-Tal, A. Nemirovski, and L. El Ghaoui. Robust optimization. Oper. Res. Lett., 2009.
A. Ben-Tal, D. den Hertog, A. D. Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Manage. Sci., 59(2), 2012.
A. Ben-Tal, D. den Hertog, A. De Waegenaere, B. Melenberg, and G. Rennen. Robust solutions of optimization problems affected by uncertain probabilities. Manage. Sci., 59(2), 2013.
D. Bertsimas and M. Sim. The price of robustness. Oper. Res., 52(1), 2004.
D. Bertsimas, D. B. Brown, and C. Caramanis. Theory and applications of robust optimization. SIAM Rev., 53(3), 2011.
J. Blanchet and K. Murthy. Quantifying distributional model risk via optimal transport. Math. Oper. Res., 44(2), 2019.
J. Blanchet, D. Kuhn, J. Li, and B. Taşkesen. Unifying distributionally robust optimization via optimal transport theory, 2025.
H. Brezis. Functional Analysis, Sobolev Spaces, and Partial Differential Equations. Universitext. Springer, 2011.
Y. Chai. Robust strong duality for nonconvex optimization problem under data uncertainty in constraint. AIMS Math., 6(11), 2021.
D. V. Cuong and T. Tran. Duality theory on vector spaces, 2025.
D. V. Cuong, B. S. Mordukhovich, N. M. Nam, and G. Sandine. Generalized differentiation and duality in infinite dimensions under polyhedral convexity. Set-Valued Var. Anal., 30, 2022.
D. V. Cuong, B. S. Mordukhovich, N. M. Nam, and G. Sandine. Fenchel–Rockafellar theorem in infinite dimensions via generalized relative interiors. Optim., 72(1), 2023.
N. Dinh, M. A. Goberna, M. A. López, and M. Volle. Characterizations of robust and stable duality for linearly perturbed uncertain optimization problems. In Jonathan M.
Borwein Commemorative Conf., volume 313. Springer, 2017.
I. Ekeland and R. Témam. Convex Analysis and Variational Problems. SIAM, 1999.
P. M. Esfahani and D. Kuhn. Data-driven distributionally robust optimization using the Wasserstein metric: Performance guarantees and tractable reformulations. Math. Prog., 171(1–2), 2018.
A. M. Faden. The existence of regular conditional probabilities: Necessary and sufficient conditions. Ann. of Probab., 13(1), 1985.
R. Gao and A. Kleywegt. Distributionally robust stochastic optimization with Wasserstein distance. Math. Oper. Res., 48(2), 2023.
V. Jeyakumar and G. Li. Strong duality in robust convex programming: complete characterizations. SIAM J. on Optim., 20(6), 2010.
V. Jeyakumar and G. Li. Strong duality in robust semi-definite linear programming under data uncertainty. Optim., 63(5), 2014.
D. Kuhn, P. Mohajerin Esfahani, V. A. Nguyen, and S. Shafieezadeh-Abadeh. Wasserstein distributionally robust optimization: Theory and applications in machine learning. INFORMS TutORials in Oper. Res., 2019.
G. Li, V. Jeyakumar, and G. M. Lee. Robust conjugate duality for convex optimization under uncertainty with application to data classification. Nonlinear Anal.: Theory, Methods & Appl., 74(6), 2011.
J. Li, S. Lin, J. Blanchet, and V. A. Nguyen. Tikhonov regularization is optimal transport robust under martingale constraints. In Adv. in Neural Inf. Proc. Sys., 2022.
H. Rahimian and S. Mehrotra. Frameworks and results in distributionally robust optimization. Open J. of Math. Optim., 3, 2022.
R. T. Rockafellar. Convex Analysis, volume 11. Princeton University Press, 1970.
R. T. Rockafellar. Convex integral functionals and duality. Contrib. to Nonlinear Funct. Anal., 1971.
R. T. Rockafellar. Conjugate Duality and Optimization. Conference Board of Math. Sciences Series, SIAM Publications, 1974.
R. T. Rockafellar and R. J.-B. Wets. Variational Analysis.
Springer, 1998.
A. Shapiro. On duality theory of conic linear problems. In M. Á. Goberna and M. A. López, editors, Semi-Infinite Program.: Recent Adv., chapter 7. Springer US, 2001.
M. Sion. On general minimax theorems. Pacific J. Math., 8(1), 1958.
J. Wang, R. Gao, and Y. Xie. Sinkhorn distributionally robust optimization. Oper. Res., 2025.
L. Zhang, J. Yang, and R. Gao. A short and general duality proof for Wasserstein distributionally robust optimization. Oper. Res., 73(4), 2024.
C. Zhao and Y. Guan. Data-driven risk-averse stochastic optimization with Wasserstein metric. Oper. Res. Lett., 46(2), 2018.
J. Zhen, D. Kuhn, and W. Wiesemann. A unified theory of robust and distributionally robust optimization via the primal-worst-equals-dual-best principle. Oper. Res., 73(2), 2025.
Z. Zhou, J. Blanchet, and P. W. Glynn. Distributionally robust martingale optimal transport, 2021. arXiv:2106.07191.
C. Zălinescu. Convex analysis in general vector spaces. World Scientific, 2002.
C. Zălinescu. On the use of the quasi-relative interior in optimization. Optim., 64(8), 2015.
