Adaptive Conditional Gradient Sliding: Projection-Free and Line-Search-Free Acceleration

Notice: This research summary and analysis were automatically generated using AI technology. For full accuracy, please refer to the original arXiv source.

We study convex optimization problems over a compact convex set where projections are expensive but a linear minimization oracle (LMO) is available. We propose the adaptive conditional gradient sliding method (AdCGS), a projection-free and line-search-free method that retains Nesterov’s acceleration with adaptive stepsizes based on local Lipschitz estimates. AdCGS combines an accelerated outer scheme with an LMO-based inner routine. It reuses gradients across multiple LMO calls to reduce gradient evaluations, while controlling the subproblem inexactness via a prescribed accuracy level coupled with adaptive stepsizes. We prove accelerated convergence rates for convex objective functions matching those of projection-based accelerated methods, while requiring no projection oracle. For strongly convex objective functions, we further establish linear convergence without additional geometric assumptions on the constraint set, such as polytopes or strongly convex sets. Experiments on constrained $\ell_p$ regression, logistic regression with real-world datasets, and least-squares problems demonstrate improvements over both projection-free and projection-based baselines.


💡 Research Summary

The paper addresses convex optimization over a compact set P where projections are costly but a linear minimization oracle (LMO) is cheap. The authors propose Adaptive Conditional Gradient Sliding (AdCGS), a method that combines the accelerated outer scheme of Nesterov with an inner conditional gradient (Frank‑Wolfe) routine, while eliminating both projection and line‑search operations.

Key ingredients

  1. Local Lipschitz estimation – At each outer iteration k the algorithm computes a local smoothness constant Lₖ from the two most recent iterates (equations (3)–(4)). This removes the need for a global Lipschitz constant and yields a fully line‑search‑free stepsize ηₖ.
  2. Inexact inner solves – The inner subproblem (2) is a strongly convex quadratic plus a linear term. Instead of solving it to optimality, the algorithm stops the Frank‑Wolfe loop when the Frank‑Wolfe gap falls below a prescribed tolerance δₖ. The sequence {δₖ} is chosen to decay with the outer iteration count, thereby controlling the accumulated error Sₖ while keeping the number of LMO calls modest.
  3. Accelerated updates – Parameters βₖ (momentum weight) and τₖ (Nesterov’s scaling) are set in a simple deterministic way (βₖ=β for k≥2, τₖ=k/2). The updates for yₖ and xₖ follow the classic Nesterov formulas, guaranteeing acceleration when the inner subproblem is solved with sufficient accuracy.
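The inner routine of item 2 can be sketched as a Frank‑Wolfe loop with the FW‑gap stopping rule. The precise subproblem form min_{u∈P} ⟨g,u⟩ + (1/2η)‖u−y‖² and the closed‑form line search below are plausible instantiations chosen for illustration, not taken verbatim from the paper:

```python
import numpy as np

def inner_cg(grad, y, eta, delta, lmo, max_iters=100_000):
    """Frank-Wolfe loop for the prox-type subproblem
        min_{u in P}  <grad, u> + (1/(2*eta)) * ||u - y||^2,
    stopping once the Frank-Wolfe gap drops below delta.
    `lmo(c)` must return argmin_{v in P} <c, v>."""
    u = y.copy()
    for _ in range(max_iters):
        g = grad + (u - y) / eta      # gradient of the quadratic model at u
        v = lmo(g)                    # a single LMO call per inner iteration
        gap = g @ (u - v)             # FW gap: certifies delta-suboptimality
        if gap <= delta:
            break
        # exact line search is closed-form because the model is quadratic
        step = min(1.0, gap * eta / max(np.dot(u - v, u - v), 1e-18))
        u = u + step * (v - u)
    return u
```

Since the model is convex, the FW gap upper-bounds the suboptimality of u, so stopping at gap ≤ δₖ certifies that the subproblem is solved to the prescribed accuracy.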

Algorithmic flow (Algorithm 1)

  • Initialize x₀=y₀=z₀∈P.
  • For each outer iteration k:
    • Call the inner CG routine with gradient ∇f(x_{k‑1}), previous inner point y_{k‑1}, stepsize ηₖ, and tolerance δₖ. The CG routine repeatedly queries the LMO (line 13) and stops when the FW‑gap ≤ δₖ, returning zₖ.
    • Update yₖ = (1−β)·y_{k−1} + β·zₖ and xₖ = (τₖ·x_{k−1} + zₖ)/(1+τₖ).
    • Compute the local Lipschitz estimate Lₖ using (3) for k=1 and (4) for k≥2, then set η_{k+1} according to a rule that depends only on Lₖ (no line‑search).
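Putting the flow together, a minimal sketch of the outer loop might look as follows. The stepsize rule η_{k+1} = (k+1)/(2Lₖ), the tolerance schedule δₖ = 1/(k+1)², and the default β are illustrative assumptions; only τₖ = k/2, the constant β, the local Lipschitz estimate, and the update formulas come from the summary above:

```python
import numpy as np

def adcgs(grad_f, lmo, x0, inner_cg, iters=100, beta=0.9, L0=1.0):
    """Sketch of the AdCGS outer loop.  `inner_cg(g, y, eta, delta, lmo)`
    approximately solves the prox-type subproblem via the LMO; the
    parameter rules marked 'assumed' below are placeholders."""
    x, y, L = x0.copy(), x0.copy(), L0
    for k in range(1, iters + 1):
        tau = k / 2.0                        # Nesterov scaling tau_k = k/2
        eta = (k + 1) / (2.0 * L)            # assumed stepsize rule from L_k
        delta = 1.0 / (k + 1) ** 2           # assumed decaying inner tolerance
        g = grad_f(x)                        # gradient reused by all LMO calls
        z = inner_cg(g, y, eta, delta, lmo)  # inexact inner solve
        y = (1.0 - beta) * y + beta * z      # momentum-averaged inner point
        x_new = (tau * x + z) / (1.0 + tau)  # Nesterov convex combination
        dx = np.linalg.norm(x_new - x)
        if dx > 1e-12:                       # local Lipschitz estimate, cf. (3)-(4)
            L = max(np.linalg.norm(grad_f(x_new) - g) / dx, 1e-12)
        x = x_new
    return x
```

Note that every LMO call inside one outer iteration reuses the single gradient ∇f(x_{k−1}), which is what keeps the gradient-evaluation count low.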

Theoretical contributions

  • One‑step inequality (Prop. 3.1): Relates the progress of the outer scheme to the inner inexactness, introducing error terms D_i and the cumulative error Sₖ = Σ η_{i+1}(δ_{i+1}+δ_i).
  • Accelerated convergence (Thm. 3.2): Shows that, under mild conditions on {βₖ, τₖ, ηₖ}, the sequence {xₖ} satisfies f(xₖ)−f(x*) ≤ (E+Sₖ)/Aₖ for a normalizing sequence Aₖ growing on the order of k². Whenever the tolerances {δₖ} keep the cumulative error Sₖ bounded, this yields the O(1/k²) rate of projection-based accelerated methods.
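To see how the tolerance schedule keeps the error term harmless, consider the illustrative choices ηᵢ ≍ i/L and δᵢ ≍ i⁻³ (assumptions for this sketch, not the paper's exact rules), with the sum taken over i < k:

```latex
S_k \;=\; \sum_{i<k} \eta_{i+1}\,(\delta_{i+1}+\delta_i)
\;\lesssim\; \frac{1}{L}\sum_{i=1}^{k} i \cdot i^{-3}
\;=\; \frac{1}{L}\sum_{i=1}^{k} \frac{1}{i^{2}}
\;\le\; \frac{\pi^{2}}{6L},
```

so Sₖ stays bounded, the numerator E+Sₖ remains O(1), and acceleration is preserved.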
