A Geometry-Aware Efficient Algorithm for Compositional Entropic Risk Minimization

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

This paper studies optimization for a family of problems termed $\textbf{compositional entropic risk minimization}$, in which the loss of each example is formulated as a Log-Expectation-Exponential (Log-E-Exp) function. The Log-E-Exp formulation serves as an abstraction of the Log-Sum-Exponential (LogSumExp) function in regimes where the explicit summation inside the logarithm ranges over an enormous number of items and is therefore expensive to evaluate. While entropic risk objectives of this form arise in many machine-learning problems, existing optimization algorithms suffer from several fundamental limitations, including non-convergence, numerical instability, and slow convergence rates. To address these limitations, we propose a geometry-aware stochastic algorithm, termed $\textbf{SCENT}$, for the dual formulation of entropic risk minimization cast as a min–min optimization problem. The key to our design is a $\textbf{stochastic proximal mirror descent (SPMD)}$ update for the dual variable, equipped with a Bregman divergence induced by a negative exponential function that faithfully captures the geometry of the objective. Our main contributions are threefold: (i) we establish an $O(1/\sqrt{T})$ convergence rate of the proposed SCENT algorithm for convex problems; (ii) we theoretically characterize the advantages of SPMD over the standard SGD update for optimizing the dual variable; and (iii) we demonstrate the empirical effectiveness of SCENT on extreme classification, partial AUC maximization, contrastive learning, and distributionally robust optimization, where it consistently outperforms existing baselines.
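To make the negative-exponential geometry concrete, here is a minimal sketch of a mirror-descent step on a single dual variable under the mirror map $h(\nu) = e^{-\nu}$ (which induces a Bregman divergence of the negative-exponential type mentioned above). The function name `spmd_step`, the step size `eta`, and the positivity floor are illustrative assumptions, not the authors' exact SCENT update:

```python
import math

def spmd_step(nu, grad, eta):
    """One mirror-descent step on the dual variable nu under the mirror
    map h(nu) = exp(-nu), so h'(nu) = -exp(-nu). The mirror update
        h'(nu_new) = h'(nu) - eta * grad
    solves in closed form to nu_new = -log(exp(-nu) + eta * grad)."""
    z = math.exp(-nu) + eta * grad
    # The step is well defined only while z stays positive; a small floor
    # keeps the iterate finite (a heuristic safeguard, not from the paper).
    z = max(z, 1e-12)
    return -math.log(z)

# Toy deterministic run on f(nu) = nu + exp(x - nu) - 1 with x = 2.0,
# whose gradient is 1 - exp(x - nu) and whose minimizer is nu* = x.
nu = 0.0
for _ in range(100):
    grad = 1.0 - math.exp(2.0 - nu)
    nu = spmd_step(nu, grad, eta=0.1)
# nu converges geometrically to the minimizer 2.0.
```

Note that in the transformed variable $u = e^{-\nu}$ the update is linear, which is one intuition for why this geometry tames the exponential nonlinearity that destabilizes plain SGD on $\nu$.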


💡 Research Summary

The paper tackles a broad class of machine‑learning problems whose loss can be written as a Log‑Expectation‑Exponential (Log‑E‑Exp) function, a generalization of the Log‑Sum‑Exp that becomes intractable when the inner sum involves a massive number of terms (e.g., extreme‑class classification, partial AUC maximization, contrastive representation learning, and KL‑regularized distributionally robust optimization). Existing stochastic methods—biased mini‑batch approximations, alternating SGD on the min‑min dual, or compositional stochastic gradient schemes—suffer from biased gradients, numerical overflow due to the exponential, and sub‑optimal convergence rates, especially for convex objectives.
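The overflow issue is easy to see in code. The standard max-shift trick makes LogSumExp stable, but it requires touching every term inside the logarithm, which is exactly what the Log-E-Exp setting precludes; mini-batch substitutes for the full sum are then biased. A minimal illustration (standard technique, not code from the paper):

```python
import math

def logsumexp(xs):
    """Numerically stable log(sum_i exp(x_i)) via the max-shift trick:
    log sum_i exp(x_i) = m + log sum_i exp(x_i - m), with m = max(x_i)."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

# Naive evaluation overflows: math.exp(1000.0) raises OverflowError,
# while the shifted form is exact.
print(logsumexp([1000.0, 1000.0]))  # 1000 + log(2) ≈ 1000.693
```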

The authors reformulate the problem via a classic dual representation dating back to Ben‑Tal and Teboulle (1986), yielding a min‑min objective:
$$\min_{w\in\mathcal{W},\,\nu\in\mathbb{R}^n} F(w,\nu) \;=\; \frac{1}{n}\sum_{i=1}^{n}\Big(\nu_i + \mathbb{E}_{\zeta\sim P_i}\big[e^{\ell(w;\zeta)-\nu_i}\big] - 1\Big),$$

where $\nu_i$ is a per-sample dual variable and $\ell(w;\zeta)$ denotes the inner loss. This is the standard exponential dual $\log \mathbb{E}[e^{X}] = \min_{\nu}\{\nu + \mathbb{E}[e^{X-\nu}] - 1\}$ applied per sample; at the optimum $\nu_i^* = \log \mathbb{E}_{\zeta\sim P_i}[e^{\ell(w;\zeta)}]$, the original Log-E-Exp objective is recovered.
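As a sanity check, the inner minimization over each dual variable has a closed-form solution and recovers the Log-E-Exp value exactly. A small numerical verification (illustrative Python, not code from the paper; `dual_objective` is a hypothetical helper name):

```python
import math
import random

def log_mean_exp(xs):
    # Stable log((1/n) * sum_i exp(x_i)) via the max-shift trick.
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs) / len(xs))

def dual_objective(nu, xs):
    # nu + E[exp(x - nu)] - 1: the inner objective of the min-min form,
    # with the expectation replaced by an empirical average over xs.
    return nu + sum(math.exp(x - nu) for x in xs) / len(xs) - 1.0

random.seed(0)
xs = [random.gauss(0.0, 1.0) for _ in range(1000)]
nu_star = log_mean_exp(xs)  # closed-form minimizer of the dual objective
# At the minimizer, the dual objective equals the Log-E-Exp value itself,
# and any other nu gives a strictly larger value.
```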

