On sparsity, extremal structure, and monotonicity properties of Wasserstein and Gromov-Wasserstein optimal transport plans

Reading time: 6 minutes
...

📝 Original Info

  • Title: On sparsity, extremal structure, and monotonicity properties of Wasserstein and Gromov-Wasserstein optimal transport plans
  • ArXiv ID: 2602.16265
  • Date: 2026-02-18
  • Authors: Anonymous (or "Author") – no author information is given in the paper itself; it appears to have been written mainly from personal notes and discussions.

📝 Abstract

This note gives a self-contained overview of some important properties of the Gromov-Wasserstein (GW) distance, compared with the standard linear optimal transport (OT) framework. More specifically, I explore the following questions: are GW optimal transport plans sparse? Under what conditions are they supported on a permutation? Do they satisfy a form of cyclical monotonicity? In particular, I present the conditionally negative semi-definite property and show that, when it holds, there are GW optimal plans that are sparse and supported on a permutation.


📄 Full Content

This note originated from discussions with colleagues: I found that a simple and pedagogical exposition of the fundamental properties of Gromov-Wasserstein (GW) optimal plans was perhaps missing. The aim here is not to present new results, but to highlight a few properties of GW that I find particularly interesting. While these results exist in the literature, they are rarely gathered in a single place; my goal is to offer the most self-contained exposition possible. I rely on only a few external theorems and instead prove most statements directly.

To me, GW is a particularly fascinating object in optimal transport (OT), and many of its properties are still not fully understood. I hope this note provides an instructive perspective that helps the reader develop a clearer intuition for GW, and possibly contributes, even if modestly, to a deeper overall understanding of its structure.

I begin this note by fixing notation and recalling the fundamentals of discrete OT. The goal is to be concise; readers seeking more details can refer to Peyré et al. (2019).

Standard linear OT aims to align two distributions according to a least-effort principle. Let C ∈ R^{n×m} be a cost matrix, for instance encoding the pairwise distances between points from the two distributions, and let a ∈ ∆_n and b ∈ ∆_m be probability vectors representing the available mass and the demand, respectively. The set of couplings, or transport plans, with prescribed marginals a and b, is defined by

Π(a, b) ≜ { P ∈ R_+^{n×m} : P 1_m = a, P^⊤ 1_n = b },

where 1_n is the vector of ones.
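As a quick sanity check, here is a minimal numpy sketch (sizes and random weights are illustrative, not from the note) showing that the independent coupling a b^⊤ always belongs to Π(a, b):

```python
import numpy as np

# Minimal sketch (hypothetical sizes, random weights): the independent
# coupling P = a b^T always lies in Pi(a, b).
n, m = 4, 6
rng = np.random.default_rng(0)
a = rng.random(n); a /= a.sum()          # a in the simplex Delta_n
b = rng.random(m); b /= b.sum()          # b in the simplex Delta_m

P = np.outer(a, b)                       # independent (product) coupling

assert np.allclose(P @ np.ones(m), a)    # row marginals: P 1_m = a
assert np.allclose(P.T @ np.ones(n), b)  # column marginals: P^T 1_n = b
assert np.all(P >= 0)                    # nonnegative entries
```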

A special case of a coupling is when n = m and the mass is uniform, a = b = (1/n) 1_n: in this case a coupling P can be supported on a permutation, that is P ∈ Perm(n), where

Perm(n) ≜ { (1/n) P_σ : σ ∈ S_n },   with (P_σ)_{ij} = 1 if j = σ(i) and 0 otherwise,

and S_n is the set of all permutations of [[n]]. Linear OT searches for the transport plan P ∈ Π(a, b) that minimizes the shifting cost ⟨C, P⟩ ≜ Σ_{ij} C_{ij} P_{ij}. In the following, we note

OT(C, a, b) ≜ min_{P ∈ Π(a, b)} ⟨C, P⟩.   (LinOT)
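Since (LinOT) is a finite-dimensional linear program, it can be solved with an off-the-shelf LP solver. The sketch below (illustrative data; the helper name `solve_linear_ot` is mine, not from the note) encodes the marginal constraints of Π(a, b) explicitly and calls `scipy.optimize.linprog`; it also lets one observe the sparsity of the returned vertex solution, one of the properties discussed in Section 2.

```python
import numpy as np
from scipy.optimize import linprog

def solve_linear_ot(C, a, b):
    """Solve (LinOT) as a linear program over the coupling polytope Pi(a, b)."""
    n, m = C.shape
    # Row-marginal constraints: sum_j P_ij = a_i   (n equations)
    A_rows = np.kron(np.eye(n), np.ones((1, m)))
    # Column-marginal constraints: sum_i P_ij = b_j (m equations)
    A_cols = np.kron(np.ones((1, n)), np.eye(m))
    res = linprog(C.ravel(),
                  A_eq=np.vstack([A_rows, A_cols]),
                  b_eq=np.concatenate([a, b]),
                  bounds=(0, None))
    return res.x.reshape(n, m), res.fun

# Illustrative data: a random cost matrix with uniform marginals and n = m.
rng = np.random.default_rng(1)
n = m = 5
C = rng.random((n, m))
a = np.ones(n) / n
b = np.ones(m) / m
P, value = solve_linear_ot(C, a, b)
# LP solvers typically return a vertex of Pi(a, b): at most n + m - 1 nonzero
# entries, and in this uniform square case a plan supported on a permutation.
print(np.count_nonzero(P > 1e-10), "nonzero entries, OT value =", value)
```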

The quantity defined in problem (LinOT) is commonly referred to as the Wasserstein distance when C represents a pairwise distance matrix. A key feature of this formulation is that the objective is linear in P, in contrast with the “quadratic” nature of the Gromov-Wasserstein problem. We introduce below a deliberately general version of this quadratic formulation, which will be specified in more detail later.

Let L = (L_{ijkl}) be a 4D tensor with (i, j), (k, l) ∈ [[n]] × [[m]].

The GW problem also aims to align the two distributions, but it does so by minimizing the quadratic cost Σ_{ijkl} L_{ijkl} P_{ij} P_{kl}. By introducing the tensor-matrix product L ⊗ P, defined as the matrix

(L ⊗ P)_{ij} ≜ Σ_{kl} L_{ijkl} P_{kl},

the objective minimized by GW can be written compactly as ⟨L ⊗ P, P⟩. We note

GW(L, a, b) ≜ min_{P ∈ Π(a, b)} ⟨L ⊗ P, P⟩.   (QuadOT)
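To make the notation concrete, here is a small numpy sketch (function names and random data are mine, for illustration only) of the tensor-matrix product L ⊗ P and of the quadratic objective ⟨L ⊗ P, P⟩, checked against the direct four-index sum:

```python
import numpy as np

def tensor_matrix_product(L, P):
    """Compute (L ⊗ P)_ij = sum_kl L_ijkl P_kl with einsum."""
    return np.einsum("ijkl,kl->ij", L, P)

def gw_objective(L, P):
    """Quadratic objective <L ⊗ P, P> = sum_ijkl L_ijkl P_ij P_kl."""
    return float(np.sum(tensor_matrix_product(L, P) * P))

# Illustrative check on random data: the compact form agrees with the
# direct four-index sum.
rng = np.random.default_rng(2)
n, m = 3, 4
L = rng.random((n, m, n, m))
P = np.outer(np.ones(n) / n, np.ones(m) / m)      # a valid coupling
direct = float(np.einsum("ijkl,ij,kl->", L, P, P))
assert np.isclose(gw_objective(L, P), direct)
```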

As announced, problem (QuadOT) is quadratic in P, which makes both the optimization and the theoretical analysis significantly more involved. In practice, the tensor L is typically constructed as follows: given two "intra" cost matrices C ∈ R^{n×n} and C̄ ∈ R^{m×m}, which encode pairwise similarities within each space, together with a function L : R × R → R designed to measure how comparable two similarities are, one defines L as

L_{ijkl} ≜ L(C_{ik}, C̄_{jl}).   (1.3)

A standard example is the squared-loss setting, where L(a, b) = (a − b)² and C and C̄ are the matrices of squared pairwise distances within each distribution. In what follows, we say that L is symmetric if, for all (i, j, k, l), one has L_{ijkl} = L_{klij}, meaning that swapping i with k and j with l leaves the tensor unchanged.
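The following sketch (helper name and random point clouds are mine, purely illustrative) builds the squared-loss tensor of equation (1.3) by broadcasting and verifies the symmetry property L_{ijkl} = L_{klij} just defined:

```python
import numpy as np

def build_squared_loss_tensor(C, C_bar):
    """L_ijkl = (C_ik - C̄_jl)^2, i.e. equation (1.3) with the squared loss."""
    # Broadcast C over (j, l) and C̄ over (i, k); result has shape (n, m, n, m).
    return (C[:, None, :, None] - C_bar[None, :, None, :]) ** 2

# Hypothetical data: squared pairwise distances within two point clouds.
rng = np.random.default_rng(3)
X = rng.random((4, 2))                                     # n = 4 points in R^2
Y = rng.random((5, 2))                                     # m = 5 points in R^2
C = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)         # n x n intra costs
C_bar = ((Y[:, None, :] - Y[None, :, :]) ** 2).sum(-1)     # m x m intra costs

L = build_squared_loss_tensor(C, C_bar)
# Since C and C̄ are symmetric, L is symmetric in the sense L_ijkl = L_klij.
assert np.allclose(L, L.transpose(2, 3, 0, 1))
```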

We will also need the notion of the support of P, defined as the set of indices corresponding to the nonzero entries of the coupling:

supp(P) ≜ { (i, j) ∈ [[n]] × [[m]] : P_{ij} > 0 }.   (1.4)

Finally, two general definitions. For a convex set C, an extreme point of C is a point that cannot be written as a nontrivial convex combination of other points in C. Given a graph G = (V, E), a cycle is a sequence of vertices (u_1, …, u_k) such that each consecutive pair (u_t, u_{t+1}) is connected by an edge in E, it starts and ends at the same vertex (u_k = u_1), and all other vertices are distinct.
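The cycle definition can be illustrated on the bipartite graph induced by the support of a coupling, whose vertices are the n rows and m columns and whose edges are the pairs in supp(P). The sketch below (a simple union-find; the function name is mine) tests whether that graph contains a cycle:

```python
import numpy as np

def support_has_cycle(P, tol=1e-12):
    """Test whether the bipartite graph induced by supp(P) contains a cycle.

    Vertices are the n rows and the m columns; every (i, j) in supp(P) is an
    edge. Union-find argument: adding an edge whose two endpoints are already
    connected closes a cycle.
    """
    n, m = P.shape
    parent = list(range(n + m))

    def find(u):
        while parent[u] != u:
            parent[u] = parent[parent[u]]   # path compression
            u = parent[u]
        return u

    for i, j in np.argwhere(P > tol):
        ri, rj = find(i), find(n + j)       # column j is the vertex n + j
        if ri == rj:
            return True                     # endpoints already connected: cycle
        parent[ri] = rj
    return False

# The independent coupling has full support, hence cycles; a coupling
# supported on a permutation is a matching, hence acyclic.
a = b = np.ones(3) / 3
print(support_has_cycle(np.outer(a, b)))    # True
print(support_has_cycle(np.eye(3) / 3))     # False
```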

2 Some important properties of linear OT

The fundamental properties of linear OT that we aim to investigate for GW in this note are the sparsity and monotonicity of optimal transport plans, as well as the “tightness” of the coupling relaxation. We detail these three properties below and provide proofs for each.

This is one of the most fundamental properties of linear OT, sometimes referred to as the shortening principle. To illustrate, consider the following simple example: suppose that (i, j) and (i′, j′) are in supp(P) and that P is optimal. This means that the pairs (i, j) and (i′, j′) are matched because doing so incurs minimal cost. Intuitively, switching the matches to (i, j′) and (i′, j) should result in a higher cost; otherwise, P would not be optimal. Formally, this can be seen by considering a matrix Q ∈ R^{n×m} that is identical to P except at these four indices:

Q_{ij} = P_{ij} − ε,   Q_{i′j′} = P_{i′j′} − ε,   Q_{ij′} = P_{ij′} + ε,   Q_{i′j} = P_{i′j} + ε,

where ε = min{P_{ij}, P_{i′j′}} > 0. It is then straightforward to verify that Q ∈ Π(a, b), since the marginals remain unchanged and all entries are nonnegative by the choice of ε.
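To see the exchange argument numerically, the sketch below (function name and toy cost matrix are mine, for illustration) applies the ε-swap to a coupling, checks that the marginals are preserved, and returns the resulting change in cost, which is nonnegative when P is optimal:

```python
import numpy as np

def exchange_cost_change(C, P, i, j, ip, jp):
    """Apply the epsilon-swap of the shortening argument and compare costs.

    Given (i, j) and (ip, jp) in supp(P), move eps = min(P_ij, P_ipjp) of mass
    onto the crossed pairs (i, jp) and (ip, j). The marginals are unchanged,
    so Q stays in Pi(a, b); if P is optimal the cost cannot decrease.
    """
    eps = min(P[i, j], P[ip, jp])
    Q = P.copy()
    Q[i, j] -= eps
    Q[ip, jp] -= eps
    Q[i, jp] += eps
    Q[ip, j] += eps
    assert np.allclose(Q.sum(axis=1), P.sum(axis=1))   # row marginals preserved
    assert np.allclose(Q.sum(axis=0), P.sum(axis=0))   # column marginals preserved
    return np.sum(C * Q) - np.sum(C * P)               # >= 0 when P is optimal

# Toy cost for which the identity coupling is optimal: every exchange
# strictly increases the cost.
C = np.array([[0.0, 1.0, 1.0],
              [1.0, 0.0, 1.0],
              [1.0, 1.0, 0.0]])
P = np.eye(3) / 3
print(exchange_cost_change(C, P, 0, 0, 1, 1))          # 2/3 > 0
```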

