Fully First-Order Algorithms for Online Bilevel Optimization
In this work, we study non-convex-strongly-convex online bilevel optimization (OBO). Existing OBO algorithms are mainly based on hypergradient descent, which requires access to a Hessian-vector product (HVP) oracle and can incur high computational costs. By reformulating the original OBO problem as a single-level online problem with inequality constraints and constructing a sequence of Lagrangian functions, we eliminate the need for HVPs arising from implicit differentiation. Specifically, we propose a fully first-order algorithm for OBO, and provide theoretical guarantees showing that it achieves regret of $O(1 + V_T + H_{2,T})$. Furthermore, we develop an improved variant with an adaptive inner-iteration scheme, which removes the dependence on the drift variation of the inner-level optimal solution and achieves regret of $O(\sqrt{T} + V_T)$. This bound is advantageous when $V_T \ge O(\sqrt{T})$.
💡 Research Summary
The paper tackles the challenging setting of non‑convex‑strongly‑convex online bilevel optimization (OBO), where at each round a decision $x_t$ must track the time‑varying optimal solution of a bilevel problem. Traditional OBO methods rely on hypergradient descent, which requires Hessian‑vector products (HVPs) to differentiate through the implicit solution map $y^*_t(x)$ of the inner problem. This dependence makes them computationally heavy and sometimes infeasible when HVP queries are unavailable.
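For context, the implicit gradient these methods compute is the standard bilevel identity (a well-known formula from the bilevel literature, stated here for reference rather than quoted from the paper): with $\Phi_t(x) = f_t(x, y^*_t(x))$,

$$\nabla \Phi_t(x) = \nabla_x f_t\big(x, y^*_t(x)\big) - \nabla^2_{xy} g_t\big(x, y^*_t(x)\big)\,\big[\nabla^2_{yy} g_t\big(x, y^*_t(x)\big)\big]^{-1}\,\nabla_y f_t\big(x, y^*_t(x)\big),$$

where the inverse-Hessian term $[\nabla^2_{yy} g_t]^{-1}\nabla_y f_t$ is exactly what demands HVP queries (typically via conjugate gradient or Neumann-series approximations).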
The authors propose a fundamentally different approach: they reformulate the bilevel problem as a single‑level constrained optimization by introducing the inequality constraint $g_t(x,y)-g_t(x,y^*_t(x))\le 0$ and embedding it in a Lagrangian $L_t(x,y,\lambda_t)=f_t(x,y)+\lambda_t\big(g_t(x,y)-g_t(x,y^*_t(x))\big)$. By treating $\lambda_t$ as a penalty multiplier that grows over time, the minimizer of $L_t$ converges to the original bilevel solution as $\lambda_t\to\infty$. Crucially, because the inner minimizer $y^*_{\lambda_t,t}(x)$ satisfies $\nabla_y L_t=0$, the hypergradient can be expressed solely with first‑order information:

$$\nabla_x L_t\big(x, y^*_{\lambda_t,t}(x), \lambda_t\big) = \nabla_x f_t\big(x, y^*_{\lambda_t,t}(x)\big) + \lambda_t\Big(\nabla_x g_t\big(x, y^*_{\lambda_t,t}(x)\big) - \nabla_x g_t\big(x, y^*_t(x)\big)\Big).$$
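As a minimal illustration of this penalty idea (a toy quadratic instance with assumed objectives, step sizes, and helper names, not the paper's algorithm), the first-order hypergradient can be approximated using only gradient queries:

```python
import numpy as np

# Assumed toy round-t objectives (for illustration only):
#   outer  f_t(x, y) = 0.5*||x - y||^2
#   inner  g_t(x, y) = 0.5*||y - A x||^2, strongly convex in y with y*_t(x) = A x
A = np.array([[2.0, 0.0], [0.0, 0.5]])

def grad_x_f(x, y): return x - y
def grad_y_f(x, y): return y - x
def grad_x_g(x, y): return -A.T @ (y - A @ x)
def grad_y_g(x, y): return y - A @ x

def first_order_hypergrad(x, lam, inner_steps=200, eta=0.1):
    """Approximate the hypergradient with first-order oracles only.

    Gradient descent approximates the two inner minimizers
      y_star ~ argmin_y g_t(x, y)
      y_lam  ~ argmin_y f_t(x, y) + lam * g_t(x, y)
    and the estimate
      grad_x f(x, y_lam) + lam * (grad_x g(x, y_lam) - grad_x g(x, y_star))
    needs no Hessian-vector products.
    """
    y_star = np.zeros_like(x)
    y_lam = np.zeros_like(x)
    for _ in range(inner_steps):
        y_star -= eta * grad_y_g(x, y_star)
        # Scale the step by 1/lam so it matches the (1 + lam)-smooth objective.
        y_lam -= (eta / lam) * (grad_y_f(x, y_lam) + lam * grad_y_g(x, y_lam))
    return grad_x_f(x, y_lam) + lam * (grad_x_g(x, y_lam) - grad_x_g(x, y_star))

x = np.array([1.0, -1.0])
g_hat = first_order_hypergrad(x, lam=100.0)
# Exact hypergradient of f_t(x, y*_t(x)) with y*_t(x) = A x:
g_exact = (np.eye(2) - A).T @ (x - A @ x)
print(np.allclose(g_hat, g_exact, atol=1e-1))  # → True
```

On this instance the estimate equals $\tfrac{\lambda}{1+\lambda}$ times the exact hypergradient, so the bias shrinks as the penalty multiplier grows, mirroring the role of the increasing $\lambda_t$ sequence described above.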