On sparsity, extremal structure, and monotonicity properties of Wasserstein and Gromov-Wasserstein optimal transport plans


Authors: Titouan Vayer

Titouan Vayer
Inria, Rennes, France.

Abstract

This note gives a self-contained overview of some important properties of the Gromov–Wasserstein (GW) distance, compared with the standard linear optimal transport (OT) framework. More specifically, I explore the following questions: are GW optimal transport plans sparse? Under what conditions are they supported on a permutation? Do they satisfy a form of cyclical monotonicity? In particular, I present the conditionally negative semi-definite property and show that, when it holds, there are GW optimal plans that are sparse and supported on a permutation.

1 Introduction

This note originated from discussions with colleagues: I found that a simple and pedagogical exposition of the fundamental properties of Gromov–Wasserstein (GW) optimal plans was perhaps missing. The aim here is not to present new results, but to highlight a few properties of GW that I find particularly interesting. While these results exist in the literature, they are rarely gathered in a single place; my goal is to offer the most self-contained exposition possible. I rely on only a few external theorems and instead prove most statements directly.

To me, GW is a particularly fascinating object in optimal transport (OT), and many of its properties are still not fully understood. I hope this note provides an instructive perspective that helps the reader develop a clearer intuition for GW, and possibly contributes, even if modestly, to a deeper overall understanding of its structure.

1.1 Linear and quadratic OT

I begin this note by fixing the notation and recalling the fundamentals of discrete OT. The goal is to be concise, so readers seeking more details can refer to Peyré et al. (2019).
Standard linear OT aims to align two distributions according to a least-effort principle. We denote by Δ_n ≜ {a ∈ R_+^n : Σ_{i=1}^n a_i = 1} the probability simplex. Let C ∈ R^{n×m} be a cost matrix, for instance encoding the pairwise distances between points from the two distributions, and let a ∈ Δ_n and b ∈ Δ_m be probability vectors representing the available mass and the demand, respectively. The set of couplings, or transport plans, with prescribed marginals a and b, is defined by

    Π(a, b) ≜ {P ∈ R_+^{n×m} : P 1_m = a, P^⊤ 1_n = b},   (1.1)

where 1_n is the vector of ones. A special case arises when n = m and the mass is uniform, a = b = (1/n)1_n: in this case a coupling P can be supported on a permutation, that is, P ∈ Perm(n), where

    Perm(n) ≜ {P ∈ R^{n×n} : ∃σ ∈ S_n, P_{ij} = 1/n if j = σ(i), and P_{ij} = 0 otherwise},   (1.2)

where S_n is the set of all permutations of [[n]]. Linear OT searches for the transport plan P ∈ Π(a, b) that minimizes the shifting cost ⟨C, P⟩ ≜ Σ_{ij} C_{ij} P_{ij}. In the following, we write

    OT(C, a, b) ≜ min_{P ∈ Π(a, b)} ⟨C, P⟩.   (LinOT)

The quantity defined in problem (LinOT) is commonly referred to as the Wasserstein distance when C represents a pairwise distance matrix. A key feature of this formulation is that the objective is linear in P, in contrast with the "quadratic" nature of the Gromov–Wasserstein problem. We introduce below a deliberately general version of this quadratic formulation, which will be specified in more detail later. Let L = (L_{ijkl}) be a 4D tensor with (i, j) ∈ [[n]] × [[m]] and (k, l) ∈ [[n]] × [[m]]. The GW problem also aims to align the two distributions, but it does so by minimizing the quadratic cost Σ_{ijkl} L_{ijkl} P_{ij} P_{kl}.
By introducing the tensor–matrix product L ⊗ P, defined as the matrix

    L ⊗ P ≜ (Σ_{ij} L_{ijkl} P_{ij})_{(k,l) ∈ [[n]] × [[m]]},

the objective minimized by GW can be written compactly as ⟨L ⊗ P, P⟩. We write

    GW(L, a, b) ≜ min_{P ∈ Π(a, b)} ⟨L ⊗ P, P⟩.   (QuadOT)

As announced, problem (QuadOT) is quadratic in P, which makes both the optimization and the theoretical analysis significantly more involved. In practice, the tensor L is typically constructed as follows: given two "intra" cost matrices C ∈ R^{n×n} and C̄ ∈ R^{m×m}, which encode pairwise similarities within each space, together with a function L : R × R → R designed to measure how comparable two similarities are, one defines L as

    L = (L_{ijkl}) where L_{ijkl} = L(C_{ik}, C̄_{jl}).   (1.3)

A standard example is the squared-loss setting, where L(a, b) = (a − b)² and C and C̄ are the matrices of squared pairwise distances within each distribution. In what follows, we say that L is symmetric if, for all (i, j, k, l), one has L_{ijkl} = L_{klij}, meaning that swapping i with k and j with l leaves the tensor unchanged. We will also need the notion of the support of P, defined as the set of indices corresponding to the nonzero entries of the coupling:

    supp(P) ≜ {(i, j) ∈ [[n]] × [[m]] : P_{ij} > 0}.   (1.4)

Finally, two general definitions. For a convex set C, an extreme point of C is a point that cannot be written as a nontrivial convex combination¹ of other points in C. In a graph G = (V, E), a cycle is a sequence of nodes u_1, u_2, ..., u_k in V such that each consecutive pair (u_i, u_{i+1}) is connected by an edge in E, it starts and ends at the same vertex (u_k = u_1), and all other vertices are distinct.
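For small problems, the tensor L and the product L ⊗ P can be formed explicitly. Below is a minimal numpy sketch of the objective of (QuadOT) under the squared-loss construction (1.3); the function names and the toy data are mine, not part of any GW library.

```python
import numpy as np

def tensor_product(L4, P):
    """(L ⊗ P)_{kl} = sum_{ij} L_{ijkl} P_{ij}, with L4 of shape (n, m, n, m)."""
    return np.einsum("ijkl,ij->kl", L4, P)

def gw_objective(L4, P):
    """GW objective <L ⊗ P, P> = sum_{ijkl} L_{ijkl} P_{ij} P_{kl}."""
    return float(np.sum(tensor_product(L4, P) * P))

# Squared-loss tensor L_{ijkl} = (C_{ik} - Cbar_{jl})^2 from two intra-cost matrices.
rng = np.random.default_rng(0)
n, m = 3, 4
C = rng.random((n, n)); C = (C + C.T) / 2          # symmetric intra-costs
Cbar = rng.random((m, m)); Cbar = (Cbar + Cbar.T) / 2
L4 = (C[:, None, :, None] - Cbar[None, :, None, :]) ** 2

# Objective of the product coupling P_{ij} = a_i b_j for uniform weights.
P = np.full((n, m), 1.0 / (n * m))
val = gw_objective(L4, P)
```

Note that since C and C̄ are symmetric, this L4 satisfies the symmetry L_{ijkl} = L_{klij} assumed in the text; the O(n²m²) contraction above is only for illustration, as Section 3.5 recalls a cheaper factorization for separable losses.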
2 Some important properties of linear OT

The fundamental properties of linear OT that we aim to investigate for GW in this note are the sparsity and monotonicity of optimal transport plans, as well as the "tightness" of the coupling relaxation. We detail these three properties below and provide proofs for each.

¹ If x is such a point and x = (1 − t)y + tz with 0 < t < 1, then x = y = z.

2.1 Cyclical monotonicity

This is one of the most fundamental properties of linear OT, sometimes referred to as the shortening principle. To illustrate, consider the following simple example: suppose that (i, j) and (i′, j′) are matched by an optimal P, that is, they belong to supp(P). This means that the pairs (i, j) and (i′, j′) are matched because doing so incurs minimal cost. Intuitively, switching the matches to (i, j′) and (i′, j) should result in a higher cost; otherwise, P would not be optimal. Formally, this can be seen by considering a matrix Q ∈ R^{n×m} that is identical to P except at these four indices:

    Q_{ij} = P_{ij} − ε,  Q_{i′j′} = P_{i′j′} − ε,  Q_{ij′} = P_{ij′} + ε,  Q_{i′j} = P_{i′j} + ε,   (2.1)

where ε = min{P_{ij}, P_{i′j′}} > 0. It is then straightforward to verify that Q ∈ Π(a, b), since the marginals remain unchanged and all entries are nonnegative by the choice of ε. Additionally,

    ⟨C, Q⟩ − ⟨C, P⟩ = ε(−C_{ij} − C_{i′j′} + C_{ij′} + C_{i′j}).   (2.2)

Since P is optimal, ⟨C, Q⟩ − ⟨C, P⟩ ≥ 0, thus C_{ij} + C_{i′j′} ≤ C_{ij′} + C_{i′j}, which can be rephrased as² "better not to cross the paths"! This argument applies to just two pairs of points in the support, but the remarkable fact is that extending this property to arbitrary collections of pairs leads to a full characterization: a transport plan is optimal if and only if, for every collection of points in its support, the total cost of the matched points is less than or equal to the total cost obtained by swapping them.

Theorem 2.1. For any cost C, a coupling P ∈ Π(a, b) is optimal for (LinOT) if and only if for any N ∈ N*, any (i_1, j_1), ..., (i_N, j_N) ∈ supp(P)^N and any permutation σ ∈ S_N,

    Σ_{k=1}^N C_{i_k j_k} ≤ Σ_{k=1}^N C_{i_k j_{σ(k)}}.   (2.3)

The direction "P optimal ⟹ monotonicity" can be proved in exactly the same way as the case N = 2 above. The other direction is a little more involved and I will not write the proof here (e.g., it can be proved using the duality of linear OT).

2.2 Sparsity of some optimal plans

Another key property is that, among all optimal transport plans, there exist sparse plans with relatively few nonzero entries: specifically, no more than n + m − 1. To establish this, we first need a small result regarding the structure of coupling matrices. Any coupling P defines a bipartite graph G(P) = (S ∪ T, E), where S = [[n]] and T = [[m]] are the source and target nodes corresponding to the two distributions, and E = supp(P) (see Figure 1).

Figure 1: (Left) Bipartite graph G(P) induced by P. Weights on the edges are the values P_{ij}. (Right) It contains a cycle i_1, j_1, i_2, j_2, i_3, j_3, i_1. The forward edges i → j are marked with a +ε perturbation, the backward ones with a −ε.

² "When the transport of the rubble is carried out in such a way that the sum of the products of the molecules by the distance travelled is a minimum, the routes of any two points A and B must no longer cross between their endpoints, for the sum Ab + Ba of the routes that cross is always greater than the sum Aa + Bb of those that do not" (Monge, 1781; translated from the French).

Proposition 2.2. P is an extreme point of Π(a, b) if and only if the graph G(P) has no cycle.

Proof. We first prove the direction "G(P) has no cycle ⟹ P is an extreme point", by contraposition.
Suppose that P is not an extreme point: there exist P_1 ≠ P_2 ∈ Π(a, b) and t ∈ (0, 1) such that P = (1 − t)P_1 + tP_2. Taking (i, j) ∉ supp(P) implies that 0 = (1 − t)[P_1]_{ij} + t[P_2]_{ij} ⟹ [P_1]_{ij} = [P_2]_{ij} = 0. Now consider H = P_2 − P_1 ≠ 0; the previous reasoning implies that supp(H) ⊆ supp(P). Since H ≠ 0, we can consider (i_1, j_1) such that H_{i_1 j_1} ≠ 0. Looking at the row i_1, we have, since P_1, P_2 ∈ Π(a, b), Σ_j H_{i_1 j} = 0, thus there exists j_2 ≠ j_1 such that H_{i_1 j_2} ≠ 0. We can do exactly the same for the column corresponding to j_2: we obtain H_{i_2 j_2} ≠ 0 with i_2 ≠ i_1. We iterate this process and obtain a sequence (i_1, j_1), (i_1, j_2), (i_2, j_2), ..., each in the support of H and thus of P. The length N of this sequence is arbitrary, but since [[n]] × [[m]] is finite there must be an N such that i_N = i_1 or j_N = j_1. Thus, there must be a cycle in the support of P.

We now prove the converse, following the same proof as Peyré et al. (2019, Proposition 3.3). Consider an extreme point P and suppose by contradiction that G(P) has a cycle. Consider the cycle i_1, j_1, i_2, j_2, i_3, j_3, i_1 illustrated in Figure 1 (any other cycle, of arbitrary length, can be treated the same way). It corresponds to a set of edges S = {(i_1, j_1), (i_2, j_1), (i_2, j_2), ..., (i_1, j_3)} in supp(P). As shown in the figure, on this cycle we mark the i → j edges as forward edges and the j → i edges as backward edges. We consider a matrix E defined as

    E_{ij} = +1 if (i, j) is a forward edge, −1 if (i, j) is a backward edge, and 0 otherwise.   (2.4)

Since this is a cycle, there are as many forward as backward edges, and any node on this cycle receives exactly one +1 and one −1. Consequently, E 1_m = 0 and E^⊤ 1_n = 0.
Now, for some sufficiently small ε > 0, define P_1 = P + εE and P_2 = P − εE, so that P = (P_1 + P_2)/2. Since the matrix E has row and column sums equal to zero, both P_1 and P_2 share the same marginals as P. By choosing 0 < ε < min_{(i,j) ∈ S} P_{ij}, we ensure that P_1 and P_2 remain nonnegative and hence valid coupling matrices. This shows that P is not an extreme point, yielding a contradiction.

This property of coupling matrices, together with the cyclical monotonicity discussed earlier, leads to the following result: some optimal plans in linear OT are both sparse and correspond to couplings that are extreme points of Π(a, b).

Proposition 2.3. For any cost C, there exists an optimal coupling P ∈ Π(a, b) for problem (LinOT) that is an extreme point of Π(a, b). It satisfies card(supp(P)) ≤ n + m − 1.

Proof. Consider an optimal coupling P with the smallest support. We will show that the corresponding graph has no cycle, so that it is an extreme point by Proposition 2.2, and we will then conclude that card(supp(P)) ≤ n + m − 1. Suppose that there is a cycle of length k = 3 in the support of P, as in Figure 1: i_1, j_1, i_2, j_2, i_3, j_3, i_1 (again, any longer cycle with k ≠ 3 can be treated similarly). We consider the perturbation E as in the previous proof, marking the i → j edges as forward and the j → i edges as backward, with ε = min_{(i,j) ∈ B} P_{ij} > 0, where B is the set of backward edges of the cycle. We define Q = P + εE. By the same arguments as in the previous proof, Q ∈ Π(a, b): indeed E 1_m = 0 and E^⊤ 1_n = 0, and Q is nonnegative (for (i, j) among the forward edges εE_{ij} > 0, and for (i, j) among the backward edges Q_{ij} = P_{ij} − ε ≥ 0, since ε is the smallest P_{ij} among backward edges). Moreover,

    ⟨C, Q⟩ − ⟨C, P⟩ = ε Σ_{ij} C_{ij} E_{ij} = ε(C_{i_1 j_1} + C_{i_2 j_2} + C_{i_3 j_3} − C_{i_2 j_1} − C_{i_3 j_2} − C_{i_1 j_3}).   (2.5)

The RHS quantity is of the form Σ_k C_{i_k j_k} − Σ_k C_{i_{k+1} j_k} with the (i_k, j_k) in the support. By the cyclical monotonicity of the optimal plan (Theorem 2.1), this is ≤ 0, hence Q is also an optimal coupling. However, Q has strictly fewer strictly positive entries than P: the entries Q_{ij} where the minimum min_{(i,j) ∈ B} P_{ij} is attained become zero. This is a contradiction, since P has the smallest support. Thus, the graph G(P) has no cycle.

Finally, a bipartite graph with no cycle has at most n + m − 1 edges. Indeed, start with n + m isolated vertices, that is, a graph with n + m components. Each newly added edge either forms a cycle or connects two components. Since cycles are forbidden, each edge reduces the number of components by 1. After n + m − 1 edges there is a single component; adding another edge would create a cycle.

This property lies at the heart of discrete algorithms for solving OT, such as the network simplex method. The key idea is to restrict attention to sparse transport plans, specifically those whose support graphs contain no cycles, throughout the iterative optimization process. By focusing on such acyclic, sparse plans, these algorithms can efficiently navigate the feasible set while maintaining optimality (see discussions in Peyré et al. 2019, Chapter 3).

2.3 Tightness of the coupling relaxation

The final important property I want to discuss concerns the special case of uniform weights, that is, when n = m and a = b = (1/n)1_n. In this setting, one can equivalently search for a permutation matrix instead of a general coupling, a formulation known as the Monge problem. A fundamental result, guaranteed by Birkhoff's theorem, is that these two formulations are equivalent, as I detail below.

Theorem 2.4 (Birkhoff). The extreme points of Π((1/n)1_n, (1/n)1_n) are the permutation matrices Perm(n).

Proof.
First, if P ∈ Perm(n), then P ∈ Π((1/n)1_n, (1/n)1_n) and it is clear that the graph associated to supp(P) has no cycle (P is a permutation matrix, with only one nonzero entry per row and column). By Proposition 2.2, P is therefore an extreme point of Π((1/n)1_n, (1/n)1_n).

Conversely, we want to show that any extreme point of Π((1/n)1_n, (1/n)1_n) is a permutation matrix. The proof is a small adaptation of the proof of Peyré (2025, Theorem 2). Consider an extreme point P ∈ Π((1/n)1_n, (1/n)1_n) and suppose that it is not a permutation matrix. Then there must be indices (i_1, j_1), (i_1, j_2) with j_1 ≠ j_2 in the support of P. Moreover, at this node i_1 we have P_{i_1 j_1} < 1/n, otherwise P_{i_1 j_2} would be zero (since in that case P_{i_1 j_1} = 1/n and all the mass would have been sent). Thus, there must be an index i_2 ≠ i_1 such that P_{i_2 j_1} > 0 (since otherwise j_1 would not receive enough mass). Similarly, there must be an index i_3 ≠ i_1 with P_{i_3 j_2} > 0. Now we have two pairs (i_1, j_2), (i_3, j_2) with i_1 ≠ i_3 in the support of P. If i_3 = i_2, we have a cycle (make a drawing). If i_3 ≠ i_2, then, by the same reasoning, i_2 must send mass to some j_3 and i_3 must send mass to some j_4: if j_3 = j_4 we have a cycle, otherwise we iterate the process. Since the graph has a finite number of vertices, some number of steps N necessarily leads to a cycle i_1, j_1, ..., i_N, j_N, i_{N+1} = i_1. This cycle can be used to split the graph into two sets of edges and construct P_1 ≠ P_2 ∈ Π((1/n)1_n, (1/n)1_n) such that P = (P_1 + P_2)/2, contradicting the hypothesis that P is an extreme point. These matrices can be obtained exactly as in the proof of Proposition 2.2: we mark forward and backward edges with +1 and −1, define E as in (2.4), and take ε sufficiently small.
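Birkhoff's theorem can be probed numerically: with uniform marginals, the LP over couplings attains its optimum at a (rescaled) permutation matrix, so its value matches the assignment problem. Below is a small scipy sketch; the LP encoding of Π(a, b) is my own, written only as a sanity check, not as an efficient solver.

```python
import numpy as np
from scipy.optimize import linprog, linear_sum_assignment

rng = np.random.default_rng(0)
n = 4
C = rng.random((n, n))                       # generic cost matrix
a = b = np.full(n, 1.0 / n)                  # uniform marginals

# Kantorovich LP over Pi(a, b): variables are the entries of P, flattened row-major.
A_eq = np.vstack([np.kron(np.eye(n), np.ones((1, n))),    # row sums: P 1 = a
                  np.kron(np.ones((1, n)), np.eye(n))])   # col sums: P^T 1 = b
res = linprog(C.ravel(), A_eq=A_eq, b_eq=np.concatenate([a, b]),
              bounds=(0, None), method="highs")
P = res.x.reshape(n, n)

# Monge / assignment problem over permutations, rescaled by 1/n.
rows, cols = linear_sum_assignment(C)
monge_value = C[rows, cols].sum() / n
```

For a generic cost matrix the LP optimum is unique, so the returned P is exactly the optimal permutation scaled by 1/n, with n ≤ n + m − 1 nonzero entries, in line with Proposition 2.3.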
Combining Proposition 2.3 with this theorem yields the well-known result often summarized as "Monge = Kantorovich":

Corollary 2.5. Let n = m and a = b = (1/n)1_n. There exists an optimal solution of (LinOT) that solves min_{P ∈ Perm(n)} ⟨C, P⟩, and this quantity is equal to OT(C, a, b).

Proof. First, since any permutation is a valid coupling, OT(C, a, b) ≤ min_{P ∈ Perm(n)} ⟨C, P⟩. Proposition 2.3 shows that there exists an optimal solution of (LinOT) that is an extreme point, which is a permutation by Birkhoff's theorem.

3 What about GW optimal transport plans?

The natural question now is: do these properties extend to the GW problem (QuadOT)? A spoiler: in general, it is much harder to establish such properties for GW, so the answer is usually no. Nevertheless, I will describe one sufficient condition, commonly used in the literature, that allows similar results to be derived for the GW case.

3.1 Conditionally negative semi-definite tensor

This property stems from the observation that the concavity of the GW loss can be exploited to derive results about the extremality of its solutions. It was first formally introduced for GW in Séjourné et al. (2021) and has since been applied in works such as Beier et al. (2023); Mémoli and Needham (2024); Dumont et al. (2025); Assel et al. (2025); Houry et al. (2026). It corresponds to a particular structure on the 4D tensor L. The formal definition is given below, and Section 3.5 will discuss in detail the conditions under which this property holds.

Definition 3.1. We say that a symmetric 4D tensor L is conditionally negative semi-definite (CND) with respect to Π ≜ Π(a, b) − Π(a, b) = {P_1 − P_2 : (P_1, P_2) ∈ Π(a, b) × Π(a, b)} if

    ∀Q ∈ Π, ⟨L ⊗ Q, Q⟩ ≤ 0.   (3.1)

As suggested above, the lemma below shows that this is exactly a reformulation of the fact that the GW loss is concave.

Lemma 3.2. The 4D tensor L is CND with respect to Π if and only if f : P ∈ Π(a, b) ↦ ⟨L ⊗ P, P⟩ is concave, that is, the GW loss function is concave on Π(a, b).

Proof. The function f is concave if and only if it satisfies the midpoint inequality f((P_1 + P_2)/2) ≥ (1/2)(f(P_1) + f(P_2)) for any P_1, P_2 ∈ Π(a, b) (for the continuous quadratic f, midpoint concavity is equivalent to concavity). Now, since L is symmetric,

    f((P_1 + P_2)/2) − (1/2)(f(P_1) + f(P_2))
      = (1/4)⟨L ⊗ P_1, P_1⟩ + (1/4)⟨L ⊗ P_2, P_2⟩ + (2/4)⟨L ⊗ P_1, P_2⟩ − (2/4)⟨L ⊗ P_1, P_1⟩ − (2/4)⟨L ⊗ P_2, P_2⟩
      = (1/4)(2⟨L ⊗ P_1, P_2⟩ − ⟨L ⊗ P_1, P_1⟩ − ⟨L ⊗ P_2, P_2⟩)
      = −(1/4)⟨L ⊗ Q, Q⟩,  where Q = P_1 − P_2.

Before stating when this property holds, we first describe its consequences for the GW problem.

3.2 First consequence: sparsity of some optimal plans

The key idea is that minimizing a concave function over a bounded convex polytope can be achieved by considering only the extreme points of the polytope. By combining this with the fact that the extreme points of the set of coupling matrices are sparse, one can deduce the sparsity of some GW solutions. More precisely, let C ⊂ R^d be a convex set that can be expressed as the convex hull of its extreme points, and let f : C → R be a continuous concave function. Then there exists an extreme point of C that solves³ min_{x ∈ C} f(x). Indeed, let x ∈ C be a minimizer of f. Since x lies in the convex hull of the extreme points of C, by Carathéodory's theorem it can be expressed as a convex combination of at most d + 1 extreme points: x = Σ_{i=1}^{d+1} λ_i x_i, with λ_i ≥ 0 and Σ_{i=1}^{d+1} λ_i = 1. By concavity and Jensen's inequality,

    f(x) = f(Σ_{i=1}^{d+1} λ_i x_i) ≥ Σ_{i=1}^{d+1} λ_i f(x_i) ≥ min_i f(x_i).

Thus, there exists at least one index i such that f(x_i) = f(x), meaning that the corresponding x_i, an extreme point of C, is a minimizer of f.
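This vertex principle is easy to sanity-check on a toy domain. In the sketch below I use the box [0,1]² instead of Π(a, b), purely for simplicity (the box and the concave function are my choices for illustration): the minimum of a concave quadratic over a fine grid of the box coincides with its minimum over the four vertices.

```python
import itertools
import numpy as np

# A concave function: f(x) = -||x - c||^2 for an interior point c of the box.
c = np.array([0.3, 0.6])
f = lambda x: -float(np.sum((x - c) ** 2))

# Minimum over a fine grid of [0,1]^2 (the grid contains the 4 vertices).
grid = np.linspace(0.0, 1.0, 101)
grid_min = min(f(np.array(p)) for p in itertools.product(grid, grid))

# Minimum over the vertices only.
vertex_min = min(f(np.array(p)) for p in itertools.product([0.0, 1.0], repeat=2))
```

Here both minima are attained at the vertex farthest from c, exactly as the extreme-point argument predicts.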
In particular, this reasoning applies whenever every point in C can be expressed as a convex combination of its extreme points. The good news is: C = Π(a, b) is such a set!

Proposition 3.3. Any point P ∈ Π(a, b) can be written as P = Σ_{i=1}^N λ_i P_i, where N ≥ 1, P_1, ..., P_N are extreme points of Π(a, b), and λ_i ≥ 0 with Σ_{i=1}^N λ_i = 1.

Proof. To prove this result, one could appeal to general theorems about bounded convex polytopes, but here we provide a constructive proof. If P is already an extreme point, the statement is immediate. Otherwise, suppose P is not an extreme point; the proof then proceeds in a manner very similar to the previous arguments. From Proposition 2.2, the graph G(P) contains a cycle. Consider the cycle i_1, j_1, i_2, j_2, i_3, j_3, i_1 of Figure 1 (any longer cycle leads to the same idea). We mark again the forward and backward edges as in the figure and consider ε_− = min_{(i,j) ∈ B} P_{ij} > 0 and ε_+ = min_{(i,j) ∈ F} P_{ij} > 0, where B, F are the sets of backward and forward edges, and E as in (2.4). Now we define

    P_1 = P + ε_− E,  P_2 = P − ε_+ E,  λ = ε_+/(ε_+ + ε_−),   (3.2)

so that 1 − λ = ε_−/(ε_+ + ε_−). With similar reasoning as before, we can check that P_1, P_2 have the same marginals as P and are both nonnegative. Also, P = λP_1 + (1 − λ)P_2. The crucial point is that we have removed at least one edge in each of P_1 and P_2; that is, card(supp(P_1)), card(supp(P_2)) < card(supp(P)). In P_1 we removed the backward edges achieving min_{(i,j) ∈ B} P_{ij}, and in P_2 the forward edges achieving min_{(i,j) ∈ F} P_{ij}. If P_1 and P_2 have no cycle, we are done. Otherwise, we iterate the process on P_1, P_2 until there is no cycle anymore. In the end we obtain P = Σ_i λ_i P_i where all the P_i have no cycle, and thus are extreme by Proposition 2.2.
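The one-step splitting (3.2) can be checked numerically on a coupling whose support contains a cycle. The example coupling and cycle below are mine: a 2×2 coupling with full support, whose bipartite graph is the cycle i_1 → j_1 → i_2 → j_2 → i_1 with forward edges (i_1, j_1) and (i_2, j_2).

```python
import numpy as np

P = np.array([[0.3, 0.2],
              [0.1, 0.4]])

E = np.array([[+1.0, -1.0],     # +1 on forward edges, -1 on backward edges
              [-1.0, +1.0]])

eps_minus = P[E < 0].min()      # smallest mass on a backward edge (= 0.1)
eps_plus = P[E > 0].min()       # smallest mass on a forward edge (= 0.3)
lam = eps_plus / (eps_plus + eps_minus)

P1 = P + eps_minus * E          # zeroes out a backward edge
P2 = P - eps_plus * E           # zeroes out a forward edge
```

One can verify that P1 and P2 keep the marginals of P, stay nonnegative, have strictly smaller supports (hence are acyclic, so extreme here), and recombine as P = λ P1 + (1 − λ) P2.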
Using this result, together with the earlier reasoning on concave functions, we can conclude that some GW optimal plans are sparse.

Corollary 3.4. When the 4D tensor L is CND with respect to Π, there exists an optimal solution P of problem (QuadOT) which is an extreme point of Π(a, b) and which satisfies card(supp(P)) ≤ n + m − 1.

³ This extends to any compact convex set and is known as Bauer's minimum principle.

Proof. When L is CND, the GW loss is concave and continuous on Π(a, b). Any coupling can be written as a convex combination of extreme points, as detailed in Proposition 3.3, so there exists an extreme point that is an optimal solution, by the previous discussion. And as noted in the proof of Proposition 2.3, since the bipartite graph associated to supp(P) has no cycle, card(supp(P)) ≤ n + m − 1.

3.3 Second consequence: tightness of the coupling relaxation

Similarly, when the tensor L is CND, one can show that the coupling relaxation is tight; that is, a "Monge = Kantorovich"-type result holds for GW. This observation was already noted for quadratic programs in the great paper Maron and Lipman (2018). By combining the facts that the extreme points of Π((1/n)1_n, (1/n)1_n) are permutation matrices (Theorem 2.4) and that at least one extreme point is an optimal solution (Corollary 3.4), we obtain:

Corollary 3.5. Let n = m and a = b = (1/n)1_n. Suppose that the 4D tensor L is CND with respect to Π. There exists an optimal solution of (QuadOT) that solves min_{P ∈ Perm(n)} ⟨L ⊗ P, P⟩, and this quantity is equal to GW(L, a, b).

3.4 Third consequence: a small detour around the bilinear relaxation

Another noteworthy consequence of the CND case is that a certain bilinear relaxation becomes exact. Before wrapping up, we briefly introduce this concept. The bilinear problem, first formally introduced for OT in Titouan et al. (2020), is formulated as

    min_{P_1, P_2 ∈ Π(a, b)} ⟨L ⊗ P_1, P_2⟩.   (BilinOT)

In other words, instead of seeking a single global transport plan, we look for two plans that realign the distributions. From a numerical standpoint, this can be advantageous because the problem becomes bilinear rather than quadratic, which opens the door to algorithms based on linear OT (Titouan et al., 2020; Séjourné et al., 2021; Beier et al., 2023). A simple bound shows that this formulation is indeed a relaxation:

    min_{P_1, P_2 ∈ Π(a, b)} ⟨L ⊗ P_1, P_2⟩ ≤ GW(L, a, b),

and the natural question is whether this relaxation is tight. In the CND case, the answer is affirmative.

Proposition 3.6. If the tensor L is CND with respect to Π, then (BilinOT) and (QuadOT) are equivalent. More precisely, if (P_1, P_2) is optimal for (BilinOT), then both P_1 and P_2 are optimal solutions for (QuadOT), and if P is optimal for (QuadOT), then (P, P) is an optimal solution for (BilinOT). In this case, GW(L, a, b) = min_{P_1, P_2 ∈ Π(a, b)} ⟨L ⊗ P_1, P_2⟩.

Proof. In the proof we define g(P_1, P_2) ≜ ⟨L ⊗ P_1, P_2⟩, the bilinear loss, and f(P) ≜ g(P, P), the GW loss, which is concave by hypothesis (Lemma 3.2). Moreover, a small calculation shows that

    g(P_1, P_2) = (1/2)(f(P_1 + P_2) − f(P_1) − f(P_2)).   (3.3)

Since f is concave, it satisfies the midpoint inequality f((P_1 + P_2)/2) ≥ (1/2)(f(P_1) + f(P_2)); since f is a homogeneous quadratic, f((P_1 + P_2)/2) = (1/4)f(P_1 + P_2), which gives f(P_1 + P_2) ≥ 2(f(P_1) + f(P_2)). Combining with (3.3), we get

    g(P_1, P_2) ≥ (1/2)(f(P_1) + f(P_2)) ≥ min{f(P_1), f(P_2)} ≥ min_P f(P) = min_P g(P, P) = GW(L, a, b),

and thus min_{P_1, P_2 ∈ Π(a, b)} ⟨L ⊗ P_1, P_2⟩ ≥ GW(L, a, b). Combined with the converse inequality, this shows GW(L, a, b) = min_{P_1, P_2 ∈ Π(a, b)} ⟨L ⊗ P_1, P_2⟩, as well as the equivalence between the solutions.

3.5 When is the tensor CND?
The case of separable losses. Now that I have presented some consequences of the CND case, I will explain when this situation actually occurs. As written in the introduction, in most applications the tensor can be written as L_{ijkl} = L(C_{ik}, C̄_{jl}) for some loss function L : R × R → R. In fact, many losses L used in practice for GW are separable, mainly for practical reasons: as described in Peyré et al. (2016), this reduces the computational complexity of the GW loss from O(n²m²) to O(nm² + mn²). These losses can be written as

    L(a, b) = f_1(a) + f_2(b) − h_1(a)h_2(b),   (3.4)

and they cover a wide range of loss functions. For instance, they include all Bregman divergences, which can be written as L(a, b) = φ(a) − φ(b) − φ′(b)(a − b) ≥ 0 for some (strictly) convex and differentiable function φ. This corresponds to f_1(a) = φ(a), f_2(b) = −φ(b) + φ′(b)b, h_1(a) = a, h_2(b) = φ′(b). Notable examples include the squared loss L(a, b) = L_2(a, b) ≜ (1/2)(a − b)², and the Kullback–Leibler divergence L(a, b) = L_KL(a, b) ≜ a log(a/b) − a + b, which corresponds to the Bregman divergence associated to φ(x) = x log(x) − x. When the loss is separable, the expression for the GW loss simplifies to

    ⟨L ⊗ P, P⟩ = ⟨f_1(C) a 1_m^⊤ + 1_n b^⊤ f_2(C̄)^⊤, P⟩ − ⟨h_1(C) P h_2(C̄)^⊤, P⟩,   (3.5)

as shown in Peyré et al. (2016, Proposition 1). In (3.5), the expressions f_1(C), f_2(C̄), h_1(C), and h_2(C̄) are to be interpreted componentwise. The goal of this section is to characterize the CND property for these separable losses. We will use the following definition:

Definition 3.7. A symmetric matrix C ∈ R^{n×n} is called conditionally negative semi-definite (resp. conditionally positive semi-definite), abbreviated as CND (resp. CPD), if for any u ∈ R^n such that u^⊤ 1_n = 0 we have u^⊤ C u ≤ 0 (resp. ≥ 0).
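Definition 3.7 can be tested numerically by sampling zero-mean vectors. As an illustration, the sketch below checks that a squared Euclidean distance matrix, the setting of Example 1 below, passes the CND test; the points and sizes are arbitrary choices of mine.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 6, 3
X = rng.standard_normal((n, d))

# Squared Euclidean distance matrix C_{ik} = ||x_i - x_k||_2^2.
sq_norms = (X ** 2).sum(axis=1)
C = sq_norms[:, None] + sq_norms[None, :] - 2 * X @ X.T

# Definition 3.7: u^T C u <= 0 for every u with u^T 1_n = 0.
quad_values = []
for _ in range(1000):
    u = rng.standard_normal(n)
    u -= u.mean()                 # project onto the hyperplane {u : u^T 1_n = 0}
    quad_values.append(float(u @ C @ u))
max_quad = max(quad_values)
```

Of course, random sampling only probes the property; an exact certificate is obtained by checking the eigenvalues of the doubly centered matrix, which is the content of Lemma 3.8 below.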
W e will also use the following simple result: a matrix is CND if and only if it is negativ e semi- definite after centering its rows and columns. Lemma 3.8. L et H n ≜ I n − 1 n 1 n 1 ⊤ n b e the c entering matrix wher e I n is the n × n identity matrix. C ∈ R n × n is CND (r esp. CPD) if and only if H n CH n is ne gative semi-definite (r esp. p ositive semi-definite). The pro of is straightforw ard by using that, for any u ∈ R n , ( Hu ) ⊤ 1 n = 0. With separable losses in mind, we arriv e at the follo wing main result, whic h characterizes the CND prop ert y for this class of losses. Prop osition 3.9. L et L b e a 4D tensor that c an b e written as L ij kl = L  C ik , C j l  for a sep ar able loss L ( a, b ) = f 1 ( a ) + f 2 ( b ) − h 1 ( a ) h 2 ( b ) and symmetric matric es C , C . The fol lowing ar e e quivalent: (i) Then 4D tensor L is CND with r esp e ct to Π . (ii) The GW loss P → ⟨ L ⊗ P , P ⟩ is c onc ave on Π( a , b ) . (iii) h 1 ( C ) , h 2 ( C ) ar e b oth CND or b oth CPD matric es. 9 Pr o of. The equiv alence b etw een the t w o first points was already established in Lemma 3.2 ; here, we only pro ve the equiv alence b et w een the last tw o p oin ts. W e define M ≜ f 1 ( C ) a1 ⊤ m + 1 n b ⊤ f 2 ( C ) ⊤ . T o ease the notation we note C 1 ≜ h 1 ( C ) , C 2 ≜ − h 2 ( C ). As written in ( 3.5 ) the loss c an b e written as f ( P ) ≜ ⟨ L ⊗ P , P ⟩ = ⟨ M , P ⟩ + ⟨ C 1 PC 2 , P ⟩ = ⟨ M , P ⟩ + tr( P ⊤ C 1 PC 2 ) (the matrices C 1 , C 2 are symmetric w e can remov e the transp ose). Supp ose that h 1 ( C ) , h 2 ( C ) are both CND, we show that f is conca ve on Π( a , b ). T o do this w e first sho w that, for any P 1 , P 2 ∈ Π( a , b ), tr  Q ⊤ C 1 QC 2  ≤ 0 where Q ≜ P 1 − P 2 ∈ R n × m . (3.6) This matrix satisfies Q1 m = 0 , Q ⊤ 1 n = 0 since the couplings hav e the same marginals. Also, with the cen tering matrices H n , H m defined in Lemma 3.8 , we hav e H n QH m = ( I n − 1 n 1 n 1 ⊤ n ) QH m = ( Q − 1 n 1 n ( 1 ⊤ n Q )) H m = QH m = Q . 
(3.7) Hence, tr  Q ⊤ C 1 QC 2  = tr  ( H n QH m ) ⊤ C 1 ( H n QH m ) C 2  = tr  Q ⊤ ( H n C 1 H n ) Q ( H m C 2 H m )  = − tr  Q ⊤ [ − ( H n C 1 H n )] Q ( H m C 2 H m )  . (3.8) Since h 1 ( C ) , h 2 ( C ) are b oth CND, C 1 = h 1 ( C ) is CND and C 2 = − h 2 ( C ) is CPD. Th us A ≜ − ( H n C 1 H n ) , B ≜ ( H m C 2 H m ) are symmetric p ositiv e semi-definite b y Lemma 3.8 , th us they admit a square ro ot. Consequently , ( 3.8 ) implies tr  Q ⊤ C 1 QC 2  = − tr  Q ⊤ A 1 / 2 A 1 / 2 QB 1 / 2 B 1 / 2  = −∥ A 1 / 2 QB 1 / 2 ∥ 2 F ≤ 0 , (3.9) where ∥ · ∥ F is the F robenius norm. The conca vity of f on Π( a , b ) is a direct consequence, since it sho ws the midp oin t inequality f ( P 1 + P 2 2 ) ≥ 1 2 ( f ( P 1 ) + f ( P 2 )) for an y P 1 , P 2 ∈ Π( a , b ). Indeed, with the same calculus as in the pro of of Lemma 3.2 , f ( P 1 + P 2 2 ) − 1 2 ( f ( P 1 ) + f ( P 2 )) ∗ = − 1 4 tr( Q ⊤ C 1 QC 2 ) = 1 4 ∥ A 1 / 2 QB 1 / 2 ∥ 2 F ≥ 0 , (3.10) where in ( ∗ ) the linear terms get cancelled. When C 1 , C 2 are b oth CPD we make the same reasoning, but instead w e consider A = ( H n C 1 H n ) , B = − ( H m C 2 H m ): this do es not change the conclusion. No w supp ose that h 1 ( C ) is CND but not h 2 ( C ): in other words, C 1 = h 1 ( C ) is CND and H m h 2 ( C ) H m has a p ositiv e eigenv alue. Since C 2 = − h 2 ( C ), there exists a negative eigenv alue λ < 0 of H m C 2 H m , with corresp onding eigenv ector v . W e will construct P 1 , P 2 ∈ Π( a , b ) suc h that the midp oin t difference in the LHS of ( 3.10 ) is negativ e. W e first construct Q ∈ R m × n with Q1 m = 0 , Q ⊤ 1 n = 0 and suc h that tr( Q ⊤ C 1 QC 2 ) > 0. Consider the rank-one matrix Q = H n u ( H m v ) ⊤ = H n uv ⊤ H m where u is an y eigen v ector of H n C 1 H n asso ciated to an eigenv alue µ ≤ 0: it satisfies the men tioned prop erties. No w tr(( H n uv ⊤ H m ) ⊤ C 1 H n uv ⊤ H m C 2 ) = ( u ⊤ H n C 1 H n u ) · ( v ⊤ H m C 2 H m v ) = µ · λ > 0. 
Consequently, this $Q$ satisfies $\operatorname{tr}(Q^\top C_1 Q C_2) > 0$, and the same is true for any $Q' = \alpha Q$ with $\alpha > 0$. We finally show that we can decompose it as $Q = P_1 - P_2$ with $P_1, P_2 \in \Pi(\mathbf{a}, \mathbf{b})$. For this we consider $P_1 = \mathbf{a}\mathbf{b}^\top + \varepsilon Q$ and $P_2 = \mathbf{a}\mathbf{b}^\top - \varepsilon Q$ for $0 < \varepsilon < \min_{(i,j)\,:\, Q_{ij} \neq 0} \frac{a_i b_j}{|Q_{ij}|}$ small enough. We have $P_1 - P_2 = 2\varepsilon Q$ and $P_1, P_2 \in \Pi(\mathbf{a}, \mathbf{b})$. Taking $Q' = 2\varepsilon Q$, the previous reasoning gives $\operatorname{tr}(Q'^\top C_1 Q' C_2) > 0$, which concludes the proof.

Below, we present a few examples that satisfy the conditions of Proposition 3.9. They can all be viewed as corollaries of the Bregman divergence setting associated with a convex function $\phi$. By the previous proposition, the corresponding GW loss is concave if and only if $C$ and $\phi'(\overline{C})$ are both CND or both CPD.

Example 1: Squared case $L = L_2$. The squared case corresponds simply to $\phi' = \mathrm{id}$, so the problem is concave whenever both $C$ and $\overline{C}$ are CND (or CPD). Examples of CND and CPD matrices can be found in the comprehensive treatment by Wendland (2004) or in Maron and Lipman (2018, Section 2). The goal here is not to provide an exhaustive list of examples. However, a particularly important and widely used setting in GW is when both $C$ and $\overline{C}$ are squared Euclidean distance matrices, i.e., $C_{ik} = \|x_i - x_k\|_2^2$ and $\overline{C}_{jl} = \|y_j - y_l\|_2^2$ for some points $x_1, \dots, x_n$ and $y_1, \dots, y_m$. It is easy to see that if $u^\top \mathbf{1}_n = 0$, then
$$u^\top C u = -2 \Big\| \sum_i u_i x_i \Big\|_2^2 \le 0,$$
so both matrices are CND. In this case, the GW problem is concave, admits sparse optimal solutions, and both the coupling and bilinear relaxations are tight. Remarkably, this example essentially captures the whole picture, thanks to the celebrated Schoenberg theorem (Schoenberg, 1938): in short, if $C$ is a symmetric $n \times n$ matrix with zero diagonal, then $C$ is CND if and only if it can be written as $C_{ik} = \|x_i - x_k\|_{\mathcal{H}}^2$ for some points $x_1, \dots, x_n$ in a Hilbert space $\mathcal{H}$.
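These claims are easy to check numerically. The following sketch is my own illustration (not part of the note) and assumes NumPy is available: it builds a squared Euclidean distance matrix, verifies the identity $u^\top C u = -2\|\sum_i u_i x_i\|_2^2$ for a centered vector $u$, and confirms the CND property via the centering criterion of Lemma 3.8.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 7, 3
x = rng.standard_normal((n, d))

# Squared Euclidean distance matrix: C_ik = ||x_i - x_k||_2^2.
C = np.sum((x[:, None, :] - x[None, :, :]) ** 2, axis=2)

# Identity from Example 1: for u with u^T 1_n = 0,
# u^T C u = -2 || sum_i u_i x_i ||_2^2 <= 0.
u = rng.standard_normal(n)
u -= u.mean()  # enforce u^T 1_n = 0
lhs = u @ C @ u
rhs = -2.0 * np.sum((u @ x) ** 2)
print(np.isclose(lhs, rhs))  # True

# Lemma 3.8: C is CND iff H_n C H_n is negative semi-definite,
# i.e., all eigenvalues of the centered matrix are <= 0.
H = np.eye(n) - np.ones((n, n)) / n
eigs = np.linalg.eigvalsh(H @ C @ H)
print(np.all(eigs <= 1e-10))  # True
```

The tolerance `1e-10` only absorbs floating-point round-off; in exact arithmetic both checks hold with equality and non-positivity, respectively.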
In other words, in the squared case $L = L_2$, most CND situations correspond precisely to squared distance matrices for $C$ and $\overline{C}$.

Example 2: Kullback–Leibler case $L = L_{\mathrm{KL}}$. The KL case is interesting because it highlights the role of a particular type of matrices. It corresponds to a Bregman divergence with $\phi(x) = x \log(x) - x$, i.e., $\phi'(x) = \log(x)$. The problem is concave when both $C$ and $\log(\overline{C})$ (the logarithm taken entrywise) are CND or CPD. Matrices whose entrywise logarithm is CPD are well studied in the literature: they are called infinitely divisible matrices. One characterization is that every elementwise power of the matrix should be CPD (Bhatia, 2006).

Remark 3.10. The previous conclusions remain valid if a linear term is added to the loss, i.e., for objectives of the form $P \mapsto \langle A, P \rangle + \langle \mathbf{L} \otimes P, P \rangle$. To study concavity, it suffices to analyze the quadratic part. As long as $\mathbf{L}$ is CND with respect to $\Pi$, the structural consequences for minimizers (such as sparsity and tightness of the coupling relaxation) remain unchanged.

3.6 What about cyclical monotonicity?

To conclude, I now return to the last property left aside: the cyclical monotonicity of optimal transport plans. For GW, deriving monotonicity-type results and extending this notion is considerably more challenging. Still, I will present an argument that is sometimes used to study optimal GW plans, and discuss its limitations. The key idea is that a solution of a quadratic program (QP) is also a solution of a suitably associated linear program (LP). By analyzing this LP, we can gain insight into the structure of the QP solutions. This perspective was used in Vincent-Cuaz et al. (2021) to differentiate the GW distance with respect to the weights $\mathbf{a}$ and $\mathbf{b}$, and more recently in Murray and Pickarski (2025) to analyze optimal transport plans in the semi-relaxed GW setting, in particular to detect when a Monge map exists (i.e., when the coupling relaxation is tight).
We state the result below.

Proposition 3.11. Consider a symmetric 4D tensor $\mathbf{L}$. If $P^\star$ is a solution of (QuadOT), then it is also a solution of the linear problem
$$\min_{P \in \Pi(\mathbf{a}, \mathbf{b})} \langle \mathbf{L} \otimes P^\star, P \rangle. \tag{3.11}$$
In other words, $P^\star$ solves (LinOT) with $C = \mathbf{L} \otimes P^\star$.

Proof. A proof can be found in Murty and Yu (1988, Theorem 1.12), but we write it for completeness. We denote by $f(P) = \langle \mathbf{L} \otimes P, P \rangle$ the GW loss. Let $P_0$ be a solution of the linear problem (LinOT) with $C = \mathbf{L} \otimes P^\star$. We consider, for $\lambda \in (0, 1)$, the matrix
$$P_\lambda = \lambda P_0 + (1 - \lambda) P^\star = P^\star + \lambda (P_0 - P^\star). \tag{3.12}$$
Then, by convexity of $\Pi(\mathbf{a}, \mathbf{b})$, we have $P_\lambda \in \Pi(\mathbf{a}, \mathbf{b})$. Also, since $P^\star$ is optimal,
$$f(P^\star) - f(P_\lambda) \le 0. \tag{3.13}$$
But
$$\begin{aligned} f(P_\lambda) &= f\big(P^\star + \lambda (P_0 - P^\star)\big) \\ &= \langle \mathbf{L} \otimes P^\star, P^\star \rangle + \langle \mathbf{L} \otimes P^\star, \lambda (P_0 - P^\star) \rangle + \langle \mathbf{L} \otimes \lambda (P_0 - P^\star), P^\star \rangle + \lambda^2 \langle \mathbf{L} \otimes (P_0 - P^\star), P_0 - P^\star \rangle \\ &= f(P^\star) + 2\lambda \langle \mathbf{L} \otimes P^\star, P_0 - P^\star \rangle + \lambda^2 \langle \mathbf{L} \otimes (P_0 - P^\star), P_0 - P^\star \rangle, \end{aligned} \tag{3.14}$$
where the last equality uses the symmetry of $\mathbf{L}$. Using $f(P^\star) - f(P_\lambda) \le 0$ and dividing by $\lambda > 0$ implies
$$2 \langle \mathbf{L} \otimes P^\star, P_0 - P^\star \rangle + \lambda \langle \mathbf{L} \otimes (P_0 - P^\star), P_0 - P^\star \rangle \ge 0. \tag{3.15}$$
Since this is true for any $\lambda \in (0, 1)$, by letting $\lambda \to 0^+$ we obtain
$$2 \langle \mathbf{L} \otimes P^\star, P_0 - P^\star \rangle \ge 0 \implies \langle \mathbf{L} \otimes P^\star, P_0 \rangle \ge \langle \mathbf{L} \otimes P^\star, P^\star \rangle. \tag{3.16}$$
Since $P_0$ is an optimal solution of the linear problem, this implies that $P^\star$ is also an optimal solution, which concludes the proof.

The previous result shows that an optimal GW plan is also optimal for a linear OT problem, with the important twist that the cost itself depends on the solution. As a consequence, we obtain the following small monotonicity-type result for GW optimal plans.

Corollary 3.12. Let $P^\star$ be optimal for GW and $C = C(P^\star) \triangleq \mathbf{L} \otimes P^\star$. Then for any $N \in \mathbb{N}^*$, any $(i_1, j_1), \dots, (i_N, j_N) \in \operatorname{supp}(P^\star)^N$ and any permutation $\sigma \in S_N$,
$$\sum_{k=1}^N C_{i_k j_k} \le \sum_{k=1}^N C_{i_k j_{\sigma(k)}}. \tag{3.17}$$
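Proposition 3.11 and Corollary 3.12 can be illustrated on a small instance. The sketch below is my own (not from the note) and assumes NumPy, the squared loss $L_2$, and uniform marginals, so that by Birkhoff's theorem the extreme points of $\Pi$ are rescaled permutation matrices; since the loss is concave here (squared Euclidean matrices, Example 1), brute force over permutations finds a global GW minimizer.

```python
import itertools
import numpy as np

rng = np.random.default_rng(3)
n = 5  # uniform marginals a = b = 1_n / n

def sqdist(z):
    # Squared Euclidean distance matrix (CND), so the GW loss is concave.
    return np.sum((z[:, None, :] - z[None, :, :]) ** 2, axis=2)

C, Cb = sqdist(rng.standard_normal((n, 2))), sqdist(rng.standard_normal((n, 2)))

def L_apply(P):
    """(L o P)_ij = sum_kl (C_ik - Cb_jl)^2 P_kl for the squared loss L2."""
    a_, b_ = P.sum(axis=1), P.sum(axis=0)
    return np.add.outer((C**2) @ a_, (Cb**2) @ b_) - 2.0 * C @ P @ Cb

def f(P):
    # GW loss <L o P, P>.
    return np.sum(L_apply(P) * P)

# Concave loss + uniform marginals: the minimum over Pi(a, b) is attained
# at an extreme point P_sigma / n, so brute force over permutations is exact.
perms = [np.eye(n)[list(s)] / n for s in itertools.permutations(range(n))]
Pstar = min(perms, key=f)
Cstar = L_apply(Pstar)  # linearized cost C = L o P*

# Proposition 3.11: P* also minimizes the linear cost <Cstar, P> over Pi.
lin = [np.sum(Cstar * P) for P in perms]
print(np.isclose(min(lin), np.sum(Cstar * Pstar)))  # True

# Corollary 3.12 with N = 2: swapping two assigned pairs never lowers the cost.
supp = np.argwhere(Pstar > 0)
print(all(Cstar[i, j] + Cstar[k, l] <= Cstar[i, l] + Cstar[k, j] + 1e-10
          for (i, j) in supp for (k, l) in supp))  # True
```

The restriction to permutations is what makes this check exact; for non-uniform marginals or a non-concave loss, one would need an LP/QP solver over the full polytope instead.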
This result is mostly a curiosity: in a sense, GW plans exhibit a form of monotonicity, but it is not something we can readily exploit. The converse is, to the best of my knowledge, false, and since the cost itself depends on the optimal plan, it is difficult to derive broad general statements. Still, when $\mathbf{L}$ has additional structure, this perspective can be pushed further to obtain meaningful information about optimal GW plans (Murray and Pickarski, 2025).

4 Conclusion

I have shown in this note that the CND property allows one to recover GW counterparts of several classical linear OT results: in particular, the existence of sparse optimal transport plans and a "Monge = Kantorovich" situation. A natural question is how far one can go beyond the CND setting. My view is that many of these properties no longer hold in general: I believe that there are GW instances where every optimal plan has "dense" support, and there are choices of $\mathbf{L}$ for which no permutation solution is optimal. However, as noted by Maron and Lipman (2018, Section 3), such situations appear to be uncommon in practice: CND-type energies arise quite frequently.

References

Gabriel Peyré, Marco Cuturi, et al. Computational optimal transport: With applications to data science. Foundations and Trends in Machine Learning, 11(5-6):355–607, 2019.

Gaspard Monge. Mémoire sur la théorie des déblais et des remblais. De l'Imprimerie Royale, 1781.

Gabriel Peyré. Optimal transport for machine learners. arXiv preprint arXiv:2505.06589, 2025.

Thibault Séjourné, François-Xavier Vialard, and Gabriel Peyré. The unbalanced Gromov Wasserstein distance: Conic formulation and relaxation. Neural Information Processing Systems (NeurIPS), 34, 2021.

Florian Beier, Robert Beinert, and Gabriele Steidl. Multi-marginal Gromov–Wasserstein transport and barycentres.
Information and Inference: A Journal of the IMA, 12(4):2753–2781, 2023.

Facundo Mémoli and Tom Needham. Comparison results for Gromov–Wasserstein and Gromov–Monge distances. ESAIM: Control, Optimisation and Calculus of Variations, 30:78, 2024.

Théo Dumont, Théo Lacombe, and François-Xavier Vialard. On the existence of Monge maps for the Gromov–Wasserstein problem. Foundations of Computational Mathematics, 25, 2025.

Hugues Van Assel, Cédric Vincent-Cuaz, Nicolas Courty, Rémi Flamary, Pascal Frossard, and Titouan Vayer. Distributional reduction: Unifying dimensionality reduction and clustering with Gromov-Wasserstein. Transactions on Machine Learning Research, 2025.

Guillaume Houry, Jean Feydy, and François-Xavier Vialard. Gromov-Wasserstein at scale, beyond squared norms. arXiv preprint arXiv:2602.06658, 2026.

Haggai Maron and Yaron Lipman. (Probably) concave graph matching. Neural Information Processing Systems (NeurIPS), 31, 2018.

Vayer Titouan, Ievgen Redko, Rémi Flamary, and Nicolas Courty. Co-optimal transport. Neural Information Processing Systems (NeurIPS), 33, 2020.

Gabriel Peyré, Marco Cuturi, and Justin Solomon. Gromov-Wasserstein averaging of kernel and distance matrices. In International Conference on Machine Learning (ICML), 2016.

Holger Wendland. Scattered data approximation, volume 17. Cambridge University Press, 2004.

Isaac J Schoenberg. Metric spaces and positive definite functions. Transactions of the American Mathematical Society, 44, 1938.

Rajendra Bhatia. Infinitely divisible matrices. The American Mathematical Monthly, 113(3):221–235, 2006.

Cédric Vincent-Cuaz, Titouan Vayer, Rémi Flamary, Marco Corneli, and Nicolas Courty. Online graph dictionary learning. In International Conference on Machine Learning (ICML), 2021.

Ryan Murray and Adam Pickarski.
On probabilistic embeddings in optimal dimension reduction. Journal of Machine Learning Research (JMLR), 26, 2025.

Katta G Murty and Feng-Tien Yu. Linear complementarity, linear and nonlinear programming, volume 3. Heldermann Berlin, 1988.
