Brenier Isotonic Regression
Isotonic regression (IR) is a shape-constrained regression method that keeps a univariate fitted curve non-decreasing, with numerous applications including single-index models and probability calibration. When it comes to multi-output regression, classical IR is no longer applicable because monotonicity does not readily extend. We consider a novel multi-output regression problem in which the regression function is *cyclically monotone*. Roughly speaking, a cyclically monotone function is the gradient of some convex potential. While enforcing cyclic monotonicity directly is challenging, we leverage the fact that Kantorovich's optimal transport (OT) always yields a cyclically monotone coupling as an optimal solution. This perspective naturally allows us to interpret the regression function and the convex potential as a link function in generalized linear models and Brenier's potential in OT, respectively; hence we call this IR extension *Brenier isotonic regression*. We demonstrate experiments with probability calibration and generalized linear models. In particular, the proposed method robustly outperforms many well-known baselines in probability calibration.
💡 Research Summary
The paper introduces Brenier Isotonic Regression (BrenierIR), a novel extension of isotonic regression (IR) to multi‑output settings. Classical IR enforces a monotone (non‑decreasing) relationship between a scalar input and a scalar output, which can be solved efficiently by the Pool Adjacent Violators (PAV) algorithm. However, when both inputs and outputs are vectors, the notion of monotonicity does not generalize straightforwardly; coordinate‑wise monotonicity fails to capture the structure of common multi‑class models such as softmax or multinomial GLMs.
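For concreteness, the PAV algorithm mentioned above can be sketched in a few lines. The following minimal, unweighted NumPy version (the function name and interface are ours, not from the paper) projects a sequence onto the non-decreasing cone by repeatedly pooling adjacent violating blocks:

```python
import numpy as np

def pav(y):
    """Pool Adjacent Violators: L2-project a sequence onto the cone of
    non-decreasing sequences. A minimal, unweighted sketch (the function
    name and interface are ours, not from the paper)."""
    blocks = []  # each block is [mean, count]
    for v in np.asarray(y, dtype=float):
        blocks.append([v, 1])
        # Merge adjacent blocks while they violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            m2, c2 = blocks.pop()
            m1, c1 = blocks.pop()
            blocks.append([(m1 * c1 + m2 * c2) / (c1 + c2), c1 + c2])
    return np.concatenate([np.full(c, m) for m, c in blocks])

# The violating pair (3, 2) is pooled to its mean 2.5.
print(pav([1.0, 3.0, 2.0, 4.0]))
```

Each pooled block ends up carrying the mean of the observations it absorbed, which is exactly the least-squares projection onto the monotone cone.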
To overcome this, the authors adopt the concept of cyclic monotonicity. A mapping φ:ℝ^d→ℝ^d is cyclically monotone if its graph satisfies a family of inequalities that are equivalent to being the sub‑gradient of a convex potential Φ (Rockafellar, 1966). In optimal transport (OT) theory, Brenier’s theorem (1991) guarantees that the optimal transport map for the quadratic cost is exactly the gradient of a convex function. Consequently, any optimal coupling of two discrete measures under the squared‑Euclidean cost is automatically cyclically monotone.
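To make Rockafellar's inequalities concrete, here is a brute-force checker of cyclic monotonicity on a finite graph, together with the gradient of a convex potential, which passes by construction. This is purely our illustration (exponential in n, so toy-scale only); none of the names come from the paper:

```python
import itertools
import numpy as np

def is_cyclically_monotone(x, y, tol=1e-9):
    """Brute-force check of Rockafellar's inequalities on the finite
    graph {(x_i, y_i)}: for every cycle i_1, ..., i_m, i_1,
        sum_k <y_{i_k}, x_{i_{k+1}} - x_{i_k}> <= 0.
    Exhaustive over all cycles, hence only feasible for small n."""
    n = len(x)
    for m in range(2, n + 1):
        for cyc in itertools.permutations(range(n), m):
            s = sum(np.dot(y[cyc[k]], x[cyc[(k + 1) % m]] - x[cyc[k]])
                    for k in range(m))
            if s > tol:
                return False
    return True

# The gradient of the convex potential Phi(x) = ||x||^4 / 4 is
# phi(x) = ||x||^2 x; its graph passes the check by Rockafellar's theorem.
rng = np.random.default_rng(0)
pts = rng.normal(size=(5, 2))
grads = np.sum(pts**2, axis=1, keepdims=True) * pts
print(is_cyclically_monotone(pts, grads))   # True
print(is_cyclically_monotone(pts, -grads))  # False: -phi is not monotone
```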
The authors formulate the Cyclically Monotone Isotonic Regression (CMIR) problem as
min_{b_{y_i} ∈ Δ^{d-1}} Σ_i ‖y_i − b_{y_i}‖²  subject to  ∃ cyclically monotone φ with b_{y_i} = φ(z_i).
Directly optimizing over φ is intractable, so they embed an inner OT problem. They introduce latent vectors {u_j} (interpreted as vector quantiles of the outputs) and define the cost matrix C_{ij} = ‖z_i − u_j‖². Solving the discrete Kantorovich OT yields an optimal coupling P* ∈ B(n,n), the (1/n-scaled) Birkhoff polytope of couplings with uniform marginals. The barycentric map T_{P*}(z_i) = n Σ_j P*_{ij} u_j is then used as the predicted output b_{y_i}. The outer objective minimizes the squared error between the true labels and these predictions. The overall bi-level program is:
min_{u_1,…,u_n ∈ Δ^{d-1}} (1/n) ‖Y − n P U‖²  subject to  P ∈ argmin_{P ∈ B(n,n)} ⟨C, P⟩.
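As a sketch of the inner step: with uniform marginals of equal size n, an optimal coupling in the Birkhoff polytope can be taken to be a permutation matrix (a vertex of the polytope), so the exact LP reduces to an assignment problem. The helper name `barycentric_map` below is ours, not the paper's:

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def barycentric_map(z, u):
    """One exact inner OT solve (a sketch). With uniform marginals of
    equal size n, an optimal coupling is a permutation matrix, so
    b_{y_i} = n * (P* u)_i = u_{sigma(i)} for the optimal assignment sigma."""
    # Cost matrix C_ij = ||z_i - u_j||^2
    C = ((z[:, None, :] - u[None, :, :]) ** 2).sum(axis=-1)
    rows, cols = linear_sum_assignment(C)  # rows come back sorted
    return u[cols]

# In 1-D with sorted inputs, the map simply sorts the quantile vectors,
# mirroring the classical isotonic structure.
z = np.array([[0.0], [1.0], [2.0]])
u = np.array([[5.0], [1.0], [3.0]])
print(barycentric_map(z, u).ravel())  # [1. 3. 5.]
```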
Key theoretical contributions include:
- Equivalence in 1-D – When the inputs are sorted, the optimal coupling reduces to a permutation, and the feasible set {b_{y_i}} = n P u coincides exactly with the set of non-decreasing sequences. Hence BrenierIR collapses to classical IR, confirming consistency.
- Cyclic monotonicity guarantee – By Proposition 3 (Villani, 2008), the optimal OT coupling is always c-cyclically monotone; thus the barycentric map automatically satisfies the CMIR constraint without explicit monotonicity checks.
- Existence of a convex potential – When the source distribution has a density, Brenier's theorem ensures a unique convex Φ with ∇Φ = T*, providing a clear link to GLM canonical link functions (φ = ∇Φ).
- Computational tractability – The inner OT problem can be solved exactly via linear programming in O(n³) time or approximated with Sinkhorn iterations for larger n. The outer variables {u_j} are updated using finite-difference estimates of the gradient of the outer loss, enabling an end-to-end implementation with standard scientific Python tools.
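The Sinkhorn approximation mentioned in the last point can be sketched as follows; the regularization strength `eps` and iteration count are illustrative choices of ours, not values from the paper:

```python
import numpy as np

def sinkhorn(C, eps=0.1, iters=1000):
    """Entropic approximation of the inner OT problem (a sketch).
    Returns a coupling with uniform marginals 1/n that approaches the
    exact LP solution as eps -> 0."""
    n, m = C.shape
    K = np.exp(-C / eps)                 # Gibbs kernel
    a, b = np.full(n, 1 / n), np.full(m, 1 / m)
    v = np.ones(m)
    for _ in range(iters):               # alternate marginal projections
        u = a / (K @ v)
        v = b / (K.T @ u)
    return u[:, None] * K * v[None, :]
```

Each O(n²) iteration rescales rows and columns toward the uniform marginals, trading the O(n³) exact LP for a dense but differentiable approximation.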
Empirically, BrenierIR is evaluated on two fronts:
- Probability calibration – Multi-class classifiers (e.g., ResNet on CIFAR-10, EfficientNet on ImageNet) produce softmax probabilities that are often miscalibrated. Applying BrenierIR to the (probability, label) pairs yields lower negative log-likelihood and Expected Calibration Error (ECE) than baselines such as temperature scaling, Dirichlet calibration, and one-vs-rest isotonic regression. Notably, BrenierIR requires no hyper-parameter tuning (no temperature or regularization search), and its OT-induced structure appears to regularize the calibration map.
- Single-index models – On synthetic data where the true link function is a non-linear convex gradient, BrenierIR recovers the link more accurately than parametric approaches (e.g., neural networks with fixed basis, polynomial approximations). The mean squared error improves consistently, demonstrating that the non-parametric convex-potential representation is effective for learning unknown link functions.
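For reference, the ECE metric used in these calibration comparisons is commonly computed with equal-width top-confidence bins. The sketch below follows that standard recipe; the bin count and binning scheme are our assumptions, not details taken from the paper:

```python
import numpy as np

def expected_calibration_error(probs, labels, n_bins=15):
    """ECE with equal-width top-confidence bins (a common recipe; the
    bin count and scheme are assumptions, not from the paper).
    probs: (n, d) predicted class probabilities; labels: (n,) true classes."""
    conf = probs.max(axis=1)                  # top-class confidence
    correct = (probs.argmax(axis=1) == labels).astype(float)
    edges = np.linspace(0.0, 1.0, n_bins + 1)
    ece = 0.0
    for lo, hi in zip(edges[:-1], edges[1:]):
        mask = (conf > lo) & (conf <= hi)
        if mask.any():
            # Gap between accuracy and confidence, weighted by bin mass.
            ece += mask.mean() * abs(correct[mask].mean() - conf[mask].mean())
    return ece
```

A perfectly calibrated classifier has accuracy equal to confidence in every bin, giving ECE 0; overconfident softmax outputs inflate the per-bin gaps.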
The paper concludes that BrenierIR provides a principled, mathematically grounded framework for multi‑output monotone regression. By leveraging the deep connection between convex analysis and optimal transport, it sidesteps the difficulties of directly enforcing cyclic monotonicity while preserving the desirable properties of isotonic regression (shape preservation, non‑parametric flexibility). Future directions suggested include scaling the OT inner loop with entropic regularization, extending to other costs (e.g., Mahalanobis), and exploring stochastic gradient estimators for the outer problem to handle massive datasets. Overall, BrenierIR opens a new avenue for shape‑constrained learning in high‑dimensional output spaces.