Implicit regularization of normalized gradient descent

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

How to find flat minima? We propose running normalized gradient descent, usually reserved for nonsmooth optimization, with sufficiently slowly diminishing step sizes. This induces implicit regularization towards flat minima if an appropriate Lyapunov function exists in the gradient dynamics. Our analysis shows that implicit regularization is intrinsically a question of nonsmooth analysis, for which we deploy the full power of variational analysis and stratification theory.


💡 Research Summary

The paper investigates how normalized gradient descent (NGD), a classic algorithm originally proposed for nonsmooth optimization, can be harnessed to achieve implicit regularization toward flat minima when combined with a slowly diminishing step-size schedule. The authors begin by recalling that NGD updates the iterate as
  x_{k+1} = x_k − α_k ∇f(x_k)/‖∇f(x_k)‖,
provided the gradient is non-zero, and that convergence is guaranteed under the classical Robbins-Monro conditions ∑α_k = ∞ and ∑α_k² < ∞. The novelty lies in interpreting NGD as a discretization of the differential inclusion ẋ ∈ b∇f(x), where b∇f is a set-valued “normalized sub-differential” defined to be the unit-norm direction of the gradient when it exists and the closed unit ball otherwise. This mapping is upper-semicontinuous with non-empty compact convex values for locally Lipschitz functions, a property that enables existence of continuous-time trajectories and boundedness of maximal solutions.
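
The NGD update described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the authors' implementation: the function names, the test quadratic, and the schedule α_k = α_0/(k+1)^0.75 are assumptions chosen here because that schedule satisfies ∑α_k = ∞ and ∑α_k² < ∞.

```python
import math


def normalized_gradient_descent(grad, x0, num_steps, alpha0=1.0, decay=0.75):
    """Iterate x_{k+1} = x_k - alpha_k * grad(x_k) / ||grad(x_k)||.

    The schedule alpha_k = alpha0 / (k + 1) ** decay is an assumption made
    for illustration; for 0.5 < decay <= 1 the step sizes sum to infinity
    while their squares have a finite sum.
    """
    x = list(x0)
    for k in range(num_steps):
        g = grad(x)
        norm = math.sqrt(sum(gi * gi for gi in g))
        if norm == 0.0:
            # The update is only defined where the gradient is non-zero.
            break
        alpha = alpha0 / (k + 1) ** decay
        # Every step has length exactly alpha, whatever the gradient's size.
        x = [xi - alpha * gi / norm for xi, gi in zip(x, g)]
    return x


def quadratic_grad(x):
    # Gradient of the hypothetical test function f(x) = 0.5 * (x0^2 + 10 * x1^2).
    return [x[0], 10.0 * x[1]]


x_final = normalized_gradient_descent(quadratic_grad, [3.0, -2.0], num_steps=5000)
final_distance = math.hypot(x_final[0], x_final[1])  # distance to the minimizer (0, 0)
print(final_distance)
```

Because each step has length α_k regardless of the gradient's magnitude, the iterates cannot settle exactly at a minimizer; they hover around it at a distance that shrinks along with the step sizes.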

The core theoretical contribution is the introduction of a p-d-Lyapunov function g: ℝⁿ → ℝ. A function g is called a p-d-Lyapunov function for the discrete dynamics if for every iteration
  g(x_{k+1}) − g(x_k) ≤ −ω α_k^p,
with ω>0 and p∈

