Implicit regularization of normalized gradient descent
How to find flat minima? We propose running normalized gradient descent, usually reserved for nonsmooth optimization, with sufficiently slowly diminishing step sizes. This induces implicit regularization towards flat minima if an appropriate Lyapunov function exists in the gradient dynamics. Our analysis shows that implicit regularization is intrinsically a question of nonsmooth analysis, for which we deploy the full power of variational analysis and stratification theory.
Research Summary
The paper investigates how normalized gradient descent (NGD), a classic algorithm originally proposed for nonsmooth optimization, can be harnessed to achieve implicit regularization toward flat minima when combined with a slowly diminishing step-size schedule. The authors begin by recalling that NGD updates the iterate as
x_{k+1} = x_k − α_k ∇f(x_k)/‖∇f(x_k)‖,
provided the gradient is non-zero, and that convergence is guaranteed under the classical Robbins–Monro conditions ∑_k α_k = ∞ and ∑_k α_k² < ∞. The novelty lies in interpreting NGD as a discretization of the differential inclusion ẋ ∈ −∂ᵇf(x), where ∂ᵇf is a set-valued "normalized sub-differential" defined to be the unit-norm direction of the gradient when it exists and the closed unit ball otherwise. This mapping is upper semicontinuous with non-empty compact convex values for locally Lipschitz functions, a property that enables existence of continuous-time trajectories and boundedness of maximal solutions.
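A minimal sketch of this update in Python, assuming the schedule α_k = c/(k+1)^q with q ∈ (1/2, 1] as one illustrative choice satisfying the Robbins–Monro conditions; the function name, defaults, and the zero-gradient handling are placeholders, not taken from the paper:

```python
import numpy as np

def normalized_gradient_descent(grad_f, x0, num_steps=10_000, c=1.0, q=0.75):
    """Normalized gradient descent with a slowly diminishing step size.

    The schedule alpha_k = c / (k + 1)**q with q in (1/2, 1] satisfies the
    Robbins-Monro conditions: sum(alpha_k) diverges while sum(alpha_k**2) converges.
    """
    x = np.asarray(x0, dtype=float)
    for k in range(num_steps):
        g = grad_f(x)
        norm = np.linalg.norm(g)
        if norm > 0:
            # Step along the unit-norm gradient direction.
            x = x - (c / (k + 1) ** q) * g / norm
        # If the gradient vanishes, any direction in the closed unit ball is
        # admissible for the set-valued dynamics; taking a zero step is one choice.
    return x

# Toy usage: a smooth quadratic with an easily evaluated gradient.
grad = lambda x: np.array([2.0 * x[0], 0.2 * x[1]])
x_final = normalized_gradient_descent(grad, x0=[3.0, -2.0])
```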
The core theoretical contribution is the introduction of a p-d-Lyapunov function g: ℝⁿ → ℝ. A function g is called a p-d-Lyapunov function for the discrete dynamics if for every iteration
g(x_{k+1}) − g(x_k) ≤ −ρ α_k^p,
with ρ > 0 and p ∈
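As a rough numerical illustration (not part of the paper's analysis), one can record the NGD iterates and step sizes, for instance from the sketch above, and test the per-iteration decrease condition for a candidate function g; the threshold ρ, exponent p, and function name below are placeholder assumptions:

```python
def check_pd_lyapunov_decrease(g, iterates, alphas, rho=0.1, p=1.0):
    """Check g(x_{k+1}) - g(x_k) <= -rho * alpha_k**p along a recorded
    trajectory; returns the fraction of iterations where it holds."""
    steps = len(iterates) - 1
    satisfied = sum(
        1 for k in range(steps)
        if g(iterates[k + 1]) - g(iterates[k]) <= -rho * alphas[k] ** p
    )
    return satisfied / steps if steps > 0 else 1.0
```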