Decision-Focused Optimal Transport
We propose a fundamental metric for measuring the distance between two distributions. This metric, referred to as the decision-focused (DF) divergence, is tailored to stochastic linear optimization problems in which the objective coefficients are random and may follow two distinct distributions. Traditional metrics such as KL divergence and Wasserstein distance are not well-suited for quantifying the resulting cost discrepancy, because changes in the coefficient distribution do not necessarily change the optimizer of the underlying linear program. Instead, the impact on the objective value depends on how the two distributions are coupled (aligned). Motivated by optimal transport, we introduce decision-focused distances under several settings, including the optimistic DF distance, the robust DF distance, and their entropy-regularized variants. We establish connections between the proposed DF distance and classical distributional metrics, and we develop efficient computational methods for calculating the DF distance. We further derive sample complexity guarantees for estimating these distances and show that DF distance estimation avoids the curse of dimensionality that arises in Wasserstein distance estimation. The proposed DF distance provides a foundation for a broad range of applications; as an illustrative example, we study the interpolation between two distributions. Numerical studies, including a toy newsvendor problem and a real-world medical testing dataset, demonstrate the practical value of the proposed DF distance.
💡 Research Summary
The paper introduces a novel metric called the Decision‑Focused (DF) divergence for quantifying the impact of changes in the distribution of objective‑coefficient vectors on stochastic linear optimization problems. Traditional distributional distances such as total variation, KL divergence, or Wasserstein distance compare probability measures without regard to how those differences affect the optimizer or the resulting objective value. In many operations‑research settings, the decision‑relevant quantity is not the geometric shift of the distribution but the induced change in the optimal solution set and the associated cost. The authors therefore propose to measure distance through the lens of downstream optimization by explicitly considering couplings (transport plans) between the two distributions.
Two primary variants are defined: the optimistic DF distance, which takes the coupling that minimizes the expected decision regret, and the robust DF distance, which takes the coupling that maximizes it. Both are formalized as expectations of the difference between optimal objective values under paired coefficient realizations drawn from a joint distribution π∈Π(P,Q). Entropy‑regularized versions add a KL penalty on π to improve computational stability.
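For finitely supported empirical distributions, both variants reduce to assignment problems over a ground cost built from gaps in optimal LP values. The sketch below is a minimal illustration under assumed conventions (uniform equal‑size samples, ground cost |h(c) − h(c′)| with h(c) = min_{x∈X} cᵀx, and a toy feasible region); the paper's exact formulation may differ in these details.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

rng = np.random.default_rng(0)

# Hypothetical feasible region: the unit square, represented by its
# extreme points (an LP optimum is always attained at a vertex).
V = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])

def lp_value(c):
    """Optimal value of min_{x in conv(V)} c.x, evaluated over vertices."""
    return (V @ c).min()

# Empirical samples of the random cost vector from distributions P and Q.
n = 200
cP = rng.normal(0.0, 1.0, size=(n, 2))
cQ = rng.normal(0.5, 1.0, size=(n, 2))

hP = np.array([lp_value(c) for c in cP])
hQ = np.array([lp_value(c) for c in cQ])

# Ground cost: absolute gap between paired optimal objective values.
D = np.abs(hP[:, None] - hQ[None, :])

# For uniform empirical marginals of equal size, an optimal coupling is a
# permutation, so an assignment solver computes the OT problem exactly.
r, c_opt = linear_sum_assignment(D)                  # minimizing coupling
optimistic_df = D[r, c_opt].mean()
r, c_rob = linear_sum_assignment(D, maximize=True)   # maximizing coupling
robust_df = D[r, c_rob].mean()

print(optimistic_df, robust_df)  # optimistic <= robust by construction
```

Any other coupling of the two samples yields an expected regret between these two extremes, which is what makes the optimistic/robust pair an interval of decision-relevant discrepancy.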
The paper establishes several theoretical properties. First, Lipschitz‑type bounds relate DF distances to classic Wasserstein distances (W₁, W₂), showing that DF is bounded above by constants times Wasserstein metrics; a matching uniform lower bound cannot hold in general, since distributional shifts that leave the optimizer unchanged yield near‑zero DF despite large Wasserstein distance. Second, sandwich inequalities connect DF to total variation and KL, guaranteeing that DF does not blow up when supports are disjoint. Third, the entropy‑regularized DF distances are bounded by the unregularized ones with explicit dependence on the regularization parameter.
From a computational standpoint, the authors show that evaluating DF can be reduced to a W₂‑type optimal transport problem when the feasible region of the linear program is a polyhedron with finitely many extreme points. By introducing dual variables for each extreme point, the joint optimization over π becomes a quadratic program that can be solved with standard OT solvers (e.g., Sinkhorn iterations) after appropriate reformulation. Importantly, the sample complexity of estimating DF distances is O(1/√n) and independent of the ambient dimension, because the downstream feasible set has a finite number of extreme points. This contrasts sharply with the curse of dimensionality that plagues Wasserstein estimation (rate n^{-1/d}).
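The entropy-regularized variant replaces the linear program over couplings with a strictly convex problem that Sinkhorn matrix scaling solves quickly. A minimal sketch, assuming uniform marginals and a precomputed ground-cost matrix D; the regularization strength `eps` and iteration count are illustrative choices, not values from the paper:

```python
import numpy as np

def sinkhorn_cost(D, eps=0.1, iters=1000):
    """Entropy-regularized OT cost between the uniform marginals of a
    ground-cost matrix D, computed via Sinkhorn matrix scaling."""
    n, m = D.shape
    a = np.full(n, 1.0 / n)           # uniform row marginal
    b = np.full(m, 1.0 / m)           # uniform column marginal
    K = np.exp(-D / eps)              # Gibbs kernel
    v = np.ones(m)
    for _ in range(iters):
        u = a / (K @ v)               # rescale rows toward marginal a
        v = b / (K.T @ u)             # rescale columns toward marginal b
    pi = u[:, None] * K * v[None, :]  # approximate optimal coupling
    return float((pi * D).sum()), pi

# Demo on a random one-dimensional ground cost.
rng = np.random.default_rng(1)
D = np.abs(rng.normal(size=(50, 1)) - rng.normal(size=(1, 50)))
cost, pi = sinkhorn_cost(D)
print(cost)
```

Because the ground cost here is built from scalar LP values h(c) rather than the ambient coefficient vectors, the statistical difficulty of the estimation problem does not grow with the dimension of c, which is the intuition behind the dimension-free O(1/√n) rate.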
The authors illustrate the utility of DF through two experiments. In a newsvendor setting, they demonstrate that many distributional shifts leave the optimal order quantity unchanged, yielding near‑zero DF divergence, whereas KL or Wasserstein distances remain large, indicating that traditional metrics overestimate decision risk. In a real‑world medical example, bone‑mineral‑density (BMD) distributions for two age groups (ages 40 and 50) are coupled in three ways: optimistic (minimal individual shift), independent (no persistence), and robust (maximal shift). Using DF‑based McCann interpolation to estimate the 45‑year‑old distribution, they show that the optimistic coupling predicts a small expected treatment loss, the independent coupling overestimates risk, and the robust coupling yields the worst‑case loss. These results confirm that DF captures the decision‑relevant discrepancy that classical distances miss.
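The newsvendor phenomenon is easy to reproduce: a shift confined to the demand distribution's upper tail changes the Wasserstein distance but not the critical fractile, and hence not the optimal order quantity. A hypothetical toy instance (the cost parameters and distributions below are illustrative, not the paper's experimental setup):

```python
import numpy as np

def optimal_order(pmf, support, cu=1.0, co=1.0):
    """Newsvendor optimum: smallest q with F(q) >= cu / (cu + co),
    where cu/co are per-unit underage/overage costs."""
    F = np.cumsum(pmf)
    return support[np.searchsorted(F, cu / (cu + co))]

support = np.arange(11)        # demand takes values in {0, ..., 10}
p = np.full(11, 1 / 11)        # uniform baseline demand distribution

# Shifted distribution: move the mass at demand 10 down to demand 9.
# The change sits entirely above the critical fractile (0.5 here).
p_shift = p.copy()
p_shift[9] += p_shift[10]
p_shift[10] = 0.0

# One-dimensional W1 distance = L1 gap between the two CDFs.
w1 = np.abs(np.cumsum(p) - np.cumsum(p_shift)).sum()

print(optimal_order(p, support), optimal_order(p_shift, support), w1)
# Both distributions give the same order quantity, yet w1 > 0:
# the decision-relevant discrepancy is zero while Wasserstein is not.
```

This is precisely the regime where the DF distance reports (near) zero while classical metrics report a material gap.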
Overall, the paper provides a rigorous, computationally tractable framework for measuring distributional change in a decision‑focused manner. It bridges optimal transport theory with stochastic optimization, offers dimension‑free statistical guarantees, and opens avenues for applications such as robust optimization, decision‑aware learning, uncertainty set construction, and clustering where the ultimate goal is to preserve decision quality rather than merely match distributions.