PL conditions do not guarantee convergence of gradient descent-ascent dynamics

Reading time: 5 minutes
📝 Original Info

  • Title: PL conditions do not guarantee convergence of gradient descent-ascent dynamics
  • ArXiv ID: 2602.16517
  • Date: 2026-02-18
  • Authors: (Author information was not provided with the paper; please refer to the original for the author names.)

📝 Abstract

We give an example of a function satisfying a two-sided Polyak-Lojasiewicz condition but for which a gradient descent-ascent flow line fails to converge to the saddle point, circling around it instead.

📄 Full Content

A function $f : \mathbb{R}^d \to \mathbb{R}$ that is bounded from below is said to satisfy a Polyak-Łojasiewicz (PL) condition [8,14] if there exists $C < +\infty$ such that for all $z \in \mathbb{R}^d$,

$$f(z) - \inf f \leqslant C\,|\nabla f(z)|^2. \tag{1.1}$$

Under this condition, all gradient descent trajectories converge to a minimizer of $f$ at linear speed [5]. A natural question is whether analogous conditions can guarantee convergence for saddle-point problems. Min-max optimization problems of the form

$$\min_{x \in X} \max_{y \in Y} f(x, y)$$

arise in a variety of contexts, including robust optimization [1] and generative adversarial networks [4]. For definiteness, we set $X$ and $Y$ to be Euclidean spaces. A natural procedure for finding a saddle point is to iteratively take small steps in the direction of $(-\nabla_x f, \nabla_y f)$. In the regime of infinitesimally small steps, this amounts to studying the gradient descent-ascent (GDA) flow $(z(t))_{t \geqslant 0}$ solving

$$\partial_t z(t) = (-\nabla_x f, \nabla_y f)(z(t)). \tag{1.2}$$

If the functions $(f(\cdot, y))_{y \in Y}$ are uniformly strongly convex, and the functions $(f(x, \cdot))_{x \in X}$ are uniformly strongly concave, then one can show that the GDA flow converges to a saddle point of $f$ [2]. If $f$ is only convex-concave (without the “strongly” qualifier), then there are counter-examples to convergence of the GDA flow, such as the function $f : (x, y) \mapsto xy$, for which the flow circles around the origin. Yet under the sole convexity-concavity condition, other first-order algorithms such as the extragradient method do succeed in finding a saddle point of $f$ (assuming that one exists) [6].
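The contrast between these two regimes can be checked numerically. The following is a minimal sketch, not taken from the paper: it integrates the GDA flow with a classical Runge-Kutta (RK4) scheme for $f(x, y) = xy$ and for an illustrative strongly convex-concave function $f(x, y) = x^2/2 + xy - y^2/2$. The helper names, the second function, the initial point, and the step count are all assumptions chosen for the illustration.

```python
import math

def rk4_flow(field, z0, t_end, n_steps):
    """Integrate dz/dt = field(x, y) with the classical RK4 scheme; return z(t_end)."""
    x, y = z0
    h = t_end / n_steps
    for _ in range(n_steps):
        k1x, k1y = field(x, y)
        k2x, k2y = field(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
        k3x, k3y = field(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
        k4x, k4y = field(x + h * k3x, y + h * k3y)
        x += h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        y += h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6
    return x, y

def gda_bilinear(x, y):
    """GDA field (-df/dx, +df/dy) for f(x, y) = x*y."""
    return -y, x

def gda_strong(x, y):
    """GDA field for f(x, y) = x**2/2 + x*y - y**2/2 (illustrative choice, not from the paper)."""
    return -(x + y), x - y

z0 = (1.0, 0.0)          # hypothetical initial condition
t_end = 2 * math.pi      # one full turn of the bilinear flow
r_bilinear = math.hypot(*rk4_flow(gda_bilinear, z0, t_end, 2000))
r_strong = math.hypot(*rk4_flow(gda_strong, z0, t_end, 2000))

print(f"|z(t_end)| for f = x*y:            {r_bilinear:.6f}")
print(f"|z(t_end)| for the strong example: {r_strong:.6f}")
```

For $f = xy$ the flow is a pure rotation, so the distance to the origin stays constant; for the strongly convex-concave example one has $\frac{d}{dt}|z|^2 = -2|z|^2$, so the distance decays like $e^{-t}$. An accurate integrator is used on purpose: the explicit Euler discretization of the bilinear flow spirals outward, which would obscure the behavior of the continuous-time flow.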

Figure 1: The flow lines with a color scale from dark blue to yellow are the level lines of the function $f$ we build for Theorem 1.1 (the color scale indicates the magnitude of $v$). The value of $f$ is not shown and is prescribed along the two orange lines according to (3.2). These two lines are also the set of points at which the level line of $f$ is horizontal or vertical (i.e., where $\partial_x f = 0$ or $\partial_y f = 0$). The red trajectory is a gradient descent-ascent flow line on $f$. The ellipses are the level lines of the quadratic forms inside the two occurrences of the function $\varphi$ in (3.1), for the values 1/2 and 1. In particular, the level lines of $f$ are tangent to the vector field given in (3.4) on the region contained by every ellipse, and to the vector field given in (3.7) on the region that is outside of every ellipse.

There has been significant effort to weaken the strong convexity-concavity assumption, for instance by imposing strong convexity in only one variable, or by replacing strong convexity-concavity with PL conditions [3,7,9,12,18]. We focus on the latter direction. We say that $f$ satisfies a two-sided PL condition if the functions $(f(\cdot, y))_{y \in Y}$ and $(-f(x, \cdot))_{x \in X}$ satisfy a PL condition with a uniform constant. Under this condition, modified versions of the GDA flow were shown to converge to a saddle point in [3,18]. The modifications crucially impose a sufficiently large separation of timescales between the evolutions of the variables $x$ and $y$; see also [7,9,12] for related two-timescale approaches. Another positive result under the two-sided PL condition is that the GDA flow (1.2) itself converges to a saddle point if initialized sufficiently close to it (see Proposition 1.2 below).
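As a worked illustration of the two-sided PL condition (not taken from the paper), consider a hypothetical quadratic on $\mathbb{R}^2$ with an illustrative parameter $a > 0$. Completing the square in each variable separately gives the two gaps in closed form:

```latex
% Hypothetical example: a > 0 is an illustrative parameter, not a quantity from the paper.
f(x, y) = \frac{a}{2} x^2 + x y - \frac{a}{2} y^2,
\qquad \partial_x f(x, y) = a x + y,
\qquad \partial_y f(x, y) = x - a y.

% The minimum over x' is attained at x' = -y/a, the maximum over y' at y' = x/a:
f(x, y) - \min_{x'} f(x', y) = \frac{(a x + y)^2}{2a} = \frac{1}{2a}\,|\partial_x f(x, y)|^2,
\qquad
\max_{y'} f(x, y') - f(x, y) = \frac{(x - a y)^2}{2a} = \frac{1}{2a}\,|\partial_y f(x, y)|^2.
```

In this example both PL inequalities hold with equality, with the uniform constant $C = 1/(2a)$. This function is also strongly convex-concave, so it only illustrates the definition; the interest of the two-sided PL condition lies in functions that are not.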

In view of these results, one may expect that the two-sided PL condition in fact guarantees global convergence of the GDA flow to a saddle point (assuming one exists). The point of this paper is to show that this is not so.

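Theorem 1.1. There exists a function $f : [-1, 1]^2 \to \mathbb{R}$ with a unique critical point at the origin, and a constant $C < +\infty$ such that for every $x, y \in [-1, 1]$,

$$f(x, y) - \min_{x' \in [-1, 1]} f(x', y) \leqslant C\,|\partial_x f(x, y)|^2 \quad \text{and} \quad \max_{y' \in [-1, 1]} f(x, y') - f(x, y) \leqslant C\,|\partial_y f(x, y)|^2, \tag{1.3}$$

and yet, for every $z(0)$ in some open subset of $[-1, 1]^2$, the GDA flow given by (1.2) is periodic.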

It is immediate to verify that if a function $f$ satisfies (1.3) and admits a critical point, say at the origin (i.e. $\nabla f(0, 0) = 0$), then this critical point is a saddle point, or more precisely, for every $x, y \in [-1, 1]$, we have

$$f(0, y) \leqslant f(0, 0) \leqslant f(x, 0).$$

The function $f$ that we build to show Theorem 1.1 is displayed in Figure 1 (up to a rescaling of the variables to bring them back to $[-1, 1]^2$). For every $(x, y)$ in a neighborhood of the origin, this function is given by $f(x, y) = \frac{\gamma}{2} x^2 + xy - \frac{\gamma}{2} y^2$, for some $\gamma \simeq 0.2531$. It is straightforward to check that this implies the convergence of the GDA flow towards the saddle point if one initializes the flow sufficiently close to the origin. This is not specific to this example, as we clarify in Proposition 1.2 below.
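The local convergence claim can be checked on the quadratic itself. The sketch below assumes that the local form reads $f(x, y) = \frac{\gamma}{2} x^2 + xy - \frac{\gamma}{2} y^2$ on the whole plane (in the paper it only holds near the origin) and uses the rounded value $\gamma = 0.2531$. For this $f$, the GDA flow satisfies $\frac{d}{dt}|z|^2 = -2\gamma |z|^2$, so the distance to the origin should decay like $e^{-\gamma t}$. Function names, the initial point, and the step count are illustrative choices.

```python
import math

GAMMA = 0.2531  # rounded value quoted in the text

def gda_quadratic(x, y):
    """GDA field (-df/dx, +df/dy) for f = GAMMA/2 * x**2 + x*y - GAMMA/2 * y**2 (assumed form)."""
    return -(GAMMA * x + y), x - GAMMA * y

def rk4_step(field, x, y, h):
    """One classical RK4 step for dz/dt = field(x, y)."""
    k1x, k1y = field(x, y)
    k2x, k2y = field(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = field(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = field(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

def radius_after(z0, t_end, n_steps=5000):
    """Distance to the origin of the GDA flow started at z0, after time t_end."""
    x, y = z0
    h = t_end / n_steps
    for _ in range(n_steps):
        x, y = rk4_step(gda_quadratic, x, y, h)
    return math.hypot(x, y)

z0 = (0.1, 0.05)  # hypothetical starting point near the origin
r0 = math.hypot(*z0)
for t in (5.0, 10.0, 20.0):
    # Compare the simulated radius with the closed-form decay r0 * exp(-GAMMA * t).
    print(t, radius_after(z0, t), r0 * math.exp(-GAMMA * t))
```

The trajectory spirals inward: the cross term $xy$ produces the rotation, while the $\gamma$ terms produce the contraction. Since $\gamma$ is small, the contraction per turn is modest, which is consistent with the idea that a deformation away from the origin could cancel it.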

As we move away from the origin, we will progressively deform the function $f$ so that the GDA flow then admits an integral of motion (i.e. a quantity that is preserved along the flow). This quantity is the $L^4$ norm after a rotation by $\pi/8$ (see also the red trajectory in Figure 1).
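To see what such an integral of motion means in practice, here is a minimal sketch under stated assumptions. Write $H(x, y) = u^4 + w^4$, where $(u, w)$ is $(x, y)$ rotated by $\pi/8$ (the direction of rotation is an assumption). Along the GDA flow, $\frac{d}{dt} H = -\partial_x H\, \partial_x f + \partial_y H\, \partial_y f$, which vanishes whenever $\nabla f = v\,(\partial_y H, \partial_x H)$ for some scalar function $v$. The code takes $v = 1$, which changes only the speed along trajectories and is not the paper's actual $f$; all function names are hypothetical.

```python
import math

C8, S8 = math.cos(math.pi / 8), math.sin(math.pi / 8)

def rotated_l4(x, y):
    """H(x, y) = u**4 + w**4: fourth power of the L^4 norm after a rotation by pi/8."""
    u, w = C8 * x + S8 * y, -S8 * x + C8 * y
    return u**4 + w**4

def grad_rotated_l4(x, y):
    """Gradient (dH/dx, dH/dy), computed with the chain rule."""
    u, w = C8 * x + S8 * y, -S8 * x + C8 * y
    return 4 * (u**3 * C8 - w**3 * S8), 4 * (u**3 * S8 + w**3 * C8)

def conserving_field(x, y):
    """(-dH/dy, +dH/dx): GDA field of any f with grad f = v * (dH/dy, dH/dx), here with v = 1."""
    hx, hy = grad_rotated_l4(x, y)
    return -hy, hx

def rk4_step(field, x, y, h):
    """One classical RK4 step for dz/dt = field(x, y)."""
    k1x, k1y = field(x, y)
    k2x, k2y = field(x + 0.5 * h * k1x, y + 0.5 * h * k1y)
    k3x, k3y = field(x + 0.5 * h * k2x, y + 0.5 * h * k2y)
    k4x, k4y = field(x + h * k3x, y + h * k3y)
    return (x + h * (k1x + 2 * k2x + 2 * k3x + k4x) / 6,
            y + h * (k1y + 2 * k2y + 2 * k3y + k4y) / 6)

x, y = 0.5, 0.0                    # hypothetical starting point
h_start = rotated_l4(x, y)
max_drift = 0.0
for _ in range(20000):             # integrate up to t = 20 with step 0.001
    x, y = rk4_step(conserving_field, x, y, 0.001)
    max_drift = max(max_drift, abs(rotated_l4(x, y) - h_start))

# Slope of the line through the origin on which dH/dy vanishes.
slope = math.tan(math.pi / 8 - math.atan(math.tan(math.pi / 8) ** (1 / 3)))
print("max drift of H along the trajectory:", max_drift)
print("slope of the line dH/dy = 0:", slope)
```

The trajectory stays on a single level set of $H$, a closed curve around the origin, so it can never reach the saddle point. As a side observation from this sketch rather than a statement from the paper, the computed slope has absolute value close to the quoted $\gamma \simeq 0.2531$, which matches the line $y = -\gamma x$ on which $\partial_x f$ vanishes for the quadratic form near the origin.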

In the statement of Theorem 1.1, it is possible to replace the compact domain $[-1, 1]^2$ by $\mathbb{R}^2$ if one so wishes. To do so, it suffices to undo this deformation, so that outside of a sufficiently large bounded region, the function $f$ becomes again the quadratic form that we set it to be near the origin.

As announced, we also show for reference that the two-sided PL condition in (1.3) does guaran

This content is AI-processed based on open access ArXiv data.
