Efficient Quadratic Corrections for Frank-Wolfe Algorithms
We develop a Frank-Wolfe algorithm with corrective steps, generalizing previous algorithms including blended conditional gradients, blended pairwise conditional gradients, and fully-corrective Frank-Wolfe. For this, we prove tight convergence guarantees together with an optimal face identification property. Furthermore, we propose two highly efficient corrective steps for convex quadratic objectives based on linear optimization or linear system solving, akin to Wolfe’s minimum-norm point, and show that they converge in finite time under suitable conditions. Beyond optimization problems that are directly quadratic, we revisit two algorithms, split conditional gradient and second-order conditional gradient sliding, which can leverage quadratic corrections to accelerate their quadratic subproblems. We demonstrate improved convergence rates for the first and broader applicability for the second, which may be of independent interest. Finally, we show substantial computational speedups for Frank-Wolfe-based algorithms with quadratic corrections across the considered problem classes.
💡 Research Summary
The paper introduces a unifying framework called Corrective Frank‑Wolfe (CFW) that generalizes several existing Frank‑Wolfe (FW) variants—such as Away‑step FW (AFW), Blended Conditional Gradients (BCG), Blended Pairwise Conditional Gradients (BPCG), and Fully‑Corrective FW (FCFW). In CFW each iteration computes an “away” vertex a_t and a “local FW” vertex s_t from the current active set S_t, as well as the global FW vertex v_t via a linear minimization oracle (LMO). The algorithm compares the local pairwise gap ⟨∇f(x_t), a_t−s_t⟩ with the global FW gap ⟨∇f(x_t), x_t−v_t⟩; if the former is larger, a corrective step is taken, otherwise a standard FW step is performed.
A corrective step must be either (i) a “drop” step, which removes a vertex from S_t without increasing the objective, or (ii) a “descent” step, which guarantees a decrease of at least half the local pairwise gap. This simple criterion subsumes the more complex KKT‑based corrections of FCFW while preserving convergence guarantees.
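The step-selection rule above can be sketched as follows. This is a minimal illustration of the CFW template, not the paper’s implementation; names such as `lmo` and `cfw_step_choice` are my own.

```python
import numpy as np

def cfw_step_choice(grad, x_t, active_set, lmo):
    """Select between a corrective step and a standard FW step,
    following the CFW criterion: compare the local pairwise gap
    on the active set S_t with the global Frank-Wolfe gap."""
    # Away vertex a_t: maximizes <grad, a> over the active set S_t.
    a_t = max(active_set, key=lambda v: grad @ v)
    # Local FW vertex s_t: minimizes <grad, s> over S_t.
    s_t = min(active_set, key=lambda v: grad @ v)
    # Global FW vertex v_t from the linear minimization oracle.
    v_t = lmo(grad)
    local_gap = grad @ (a_t - s_t)    # local pairwise gap
    global_gap = grad @ (x_t - v_t)   # Frank-Wolfe gap
    if local_gap >= global_gap:
        return "corrective", (a_t, s_t)  # correct within conv(S_t)
    return "fw", v_t                     # standard FW step toward v_t
```

On a toy simplex instance, an active set whose internal pairwise gap dominates the FW gap triggers the corrective branch, while a singleton active set (zero local gap) falls through to the standard FW step.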
Theoretical analysis assumes L‑smoothness and (c,θ)-sharpness of the objective. For general convex smooth functions CFW achieves the classic O(1/T) rate. When the feasible set X is a polytope and f is (c,½)-sharp, the algorithm enjoys linear convergence with a rate depending on the pyramidal width δ of X. For weaker sharpness exponents θ<½, a sublinear rate O(T^{−1/(1−2θ)}) is proved. Moreover, under strict complementarity CFW identifies the optimal face F* of the polytope in finite time, after which the problem effectively reduces to the lower‑dimensional face.
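For intuition, the linear rate in the (c,½)-sharp polytope case has the familiar away-step/pyramidal-width form. The expression below is illustrative, borrowed from standard analyses of away-step FW; the paper’s exact constants may differ:

```latex
f(x_{t+1}) - f^\star \;\le\; (1 - \rho)\,\bigl(f(x_t) - f^\star\bigr),
\qquad \rho \;\propto\; \frac{\mu}{L}\cdot\frac{\delta^2}{D^2},
```

where D is the diameter of X, L the smoothness constant, δ the pyramidal width, and μ the quadratic-growth constant implied by (c,½)-sharpness (μ ∝ 1/c²).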
The most original contribution concerns quadratic objectives f(x)=½xᵀAx+bᵀx+c. The authors propose two highly efficient corrective steps tailored to this setting:
- QC‑LP (Quadratic Correction via Linear Programming) – The quadratic problem is relaxed from minimizing over conv(S) to minimizing over the affine hull aff(S). Fixing an anchor vertex w∈S, the optimality conditions become a linear system: ⟨AVλ+b, v−w⟩=0 for all v∈S∖{w}, together with ∑_{v∈S}λ_v=1 and λ≥0. This system can be solved as a linear program, yielding a new barycentric representation λ that respects the convex‑hull constraints.
- QC‑MNP (Quadratic Correction via Minimum‑Norm Point) – Inspired by Wolfe’s Minimum‑Norm Point algorithm, this variant solves the same affine‑hull relaxation by directly finding the minimizer over the affine subspace spanned by S and then projecting it back onto conv(S). It requires only solving a linear system (making AVλ+b orthogonal to aff(S)) and is particularly cheap when A is positive definite.
Both corrections are shown to converge in finite time for positive‑definite A, and a hybrid scheme that mixes QC‑LP, QC‑MNP, and local pairwise steps (as in BPCG) is proposed for practical robustness.
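The shared core of both corrections, minimizing the quadratic over aff(S), amounts to one KKT linear system. The sketch below is a minimal illustration of that idea under the summary’s notation (V has the active vertices as columns), not the paper’s algorithm; in particular, the clipping fallback is a hypothetical placeholder for the real drop/projection logic.

```python
import numpy as np

def quadratic_correction(A, b, V):
    """Minimize 1/2 x^T A x + b^T x over aff(S), where x = V @ lam
    and sum(lam) = 1, by solving the equality-constrained KKT system.
    Returns barycentric coordinates lam over the active vertices."""
    k = V.shape[1]                 # number of active vertices
    G = V.T @ A @ V                # reduced Hessian on the active set
    ones = np.ones(k)
    # KKT system: [G 1; 1^T 0] [lam; nu] = [-V^T b; 1]
    M = np.block([[G, ones[:, None]],
                  [ones[None, :], np.zeros((1, 1))]])
    rhs = np.concatenate([-V.T @ b, [1.0]])
    sol = np.linalg.solve(M, rhs)
    lam = sol[:k]
    if np.all(lam >= -1e-12):
        return lam                 # affine minimizer already in conv(S)
    # Otherwise the affine minimizer leaves the convex hull; a real
    # implementation would drop vertices or take a pairwise step.
    # Clipping + renormalizing is only a crude stand-in here.
    lam = np.clip(lam, 0.0, None)
    return lam / lam.sum()
```

For a positive-definite A the system is nonsingular whenever the active vertices are affinely independent, which is why a single linear solve suffices per correction.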
The paper further integrates these quadratic corrections into two recent algorithms:
- Split Conditional Gradient (SCG) – SCG alternates between updates on intersecting convex sets, solving at each step a quadratic subproblem with an increasing penalty term. Applying QC‑LP/QC‑MNP solves this subproblem much faster, and the authors improve SCG’s convergence from O(log T/√T) to the optimal O(1/√T) for the underlying nonsmooth problem.
- Second‑Order Conditional Gradient Sliding (SOCGS) – SOCGS combines an inexact Newton step with Conditional Gradient Sliding, achieving very fast rates for smooth problems. Its bottleneck is the quadratic subproblem; the proposed corrections replace expensive interior‑point or iterative solvers with a single linear‑system solve, dramatically reducing runtime. Additionally, the authors prove global linear convergence of SOCGS for generalized self‑concordant functions, removing the need for global Lipschitz smoothness or strong convexity.
Extensive experiments on a variety of quadratic‑structured tasks—sparse regression, entanglement detection, projections onto the Birkhoff polytope, and tensor completion—demonstrate that CFW equipped with QC‑LP or QC‑MNP (or their hybrid) consistently outperforms the baseline FW variants. Speed‑ups range from 2× to over 5× while achieving comparable or better solution quality. The corrections also accelerate ALM and SCG, and enable SOCGS to solve its subproblems efficiently.
In summary, the paper provides a clean, theoretically grounded framework for corrective steps in Frank‑Wolfe methods, introduces practical quadratic‑correction techniques that require only linear programming or linear‑system solves, and shows how these tools improve both the theory and practice of advanced FW‑based algorithms.