Convergence of high-index saddle dynamics for degenerate saddle points on critical manifolds


The high-index saddle dynamics (HiSD) method provides a powerful framework for finding saddle points and constructing solution landscapes. While originally derived for nondegenerate critical points, HiSD has demonstrated empirical success in degenerate cases, where the Hessian matrix exhibits zero eigenvalues. However, the mathematical and numerical analysis of HiSD for degenerate saddle points remains unexplored. In this paper, utilizing Morse-Bott functions, we present a rigorous analysis of HiSD for computing degenerate saddle points on a critical manifold. We prove the local convergence of the continuous HiSD and establish the linear convergence rate of the discrete HiSD algorithm. Furthermore, we provide a theoretical explanation for the gradient alignment tendency, revealing that the gradient direction asymptotically aligns with a specific Hessian eigenvector. Our analysis also elucidates the flexibility in selecting the index for HiSD in the context of degenerate saddle points. We validate our analytical results through numerical experiments on neural-network loss landscapes and demonstrate that momentum-accelerated variants of HiSD achieve rapid convergence to degenerate saddle points.


💡 Research Summary

This paper provides a rigorous theoretical foundation for the High‑Index Saddle Dynamics (HiSD) method when applied to degenerate saddle points that lie on a critical manifold, a situation where the Hessian of the objective function possesses zero eigenvalues. By employing the framework of Morse‑Bott functions, the authors extend the classical non‑degenerate analysis of HiSD to settings where critical points form smooth submanifolds rather than isolated points.

The work begins with a concise review of Morse‑Bott theory, emphasizing that for any point on a critical submanifold the tangent space coincides with the kernel of the Hessian, while the Hessian is non‑degenerate on the normal space. Lemma 2.4 (the Morse‑Bott lemma) supplies a local normal form: near a point on the manifold the function is constant along the manifold and behaves like a non‑degenerate quadratic form in the normal directions, with s negative and (d‑m‑s) positive quadratic terms. This structure is the key to understanding how HiSD can be formulated without the usual non‑degeneracy assumption.
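Concretely, the normal form described above can be written as follows (notation mirrors the summary: d is the ambient dimension, m = dim M, s is the index; the coordinate choice is illustrative):

```latex
% Morse--Bott local normal form near a point p on the critical manifold M:
% constant along the m manifold directions, nondegenerate quadratic in the
% d - m normal directions, with s negative and (d - m - s) positive terms.
f(x) \;=\; f(p) \;-\; x_1^2 \;-\;\cdots\;-\; x_s^2
       \;+\; x_{s+1}^2 \;+\;\cdots\;+\; x_{d-m}^2 .
```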

HiSD is a coupled ODE system that evolves the state variable θ and a set of k orthonormal vectors {v₁,…,v_k} intended to approximate the eigenvectors associated with the k smallest eigenvalues of the Hessian. The dynamics performs gradient ascent in the subspace spanned by these vectors and gradient descent in its orthogonal complement. The authors prove that, for any k ranging from the true index s up to s + m (where m is the dimension of the zero‑eigenvalue subspace), the continuous HiSD flow is locally asymptotically stable with respect to the critical manifold M. In particular, the error θ − P_M(θ) lies in the image of the Hessian, which is orthogonal to the tangent space, guaranteeing exponential decay of the normal component.
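The coupled dynamics described above can be sketched numerically. The following is a minimal toy setup of my own (a diagonal quadratic objective, forward-Euler time stepping, illustrative step size, and deflated Rayleigh-quotient flows for the vᵢ), not the paper's implementation:

```python
import numpy as np

# Toy quadratic f(theta) = 0.5 * theta^T A theta, so grad = A @ theta and the
# Hessian is the constant matrix A.  Eigenvalues (-2, -1, 0, 1, 3) give an
# index-2 saddle with a one-dimensional zero-eigenvalue (manifold) direction.
A = np.diag([-2.0, -1.0, 0.0, 1.0, 3.0])
grad = lambda th: A @ th
hess = lambda th: A

def hisd_step(theta, V, dt=0.01):
    """One explicit-Euler step of index-k saddle dynamics (illustrative).

    theta : current state, shape (d,)
    V     : d x k orthonormal matrix approximating eigenvectors of the
            k smallest Hessian eigenvalues
    """
    d, k = V.shape
    # Ascend in span(V), descend in its orthogonal complement:
    theta = theta - dt * (np.eye(d) - 2.0 * V @ V.T) @ grad(theta)
    # Relax each v_i toward the i-th smallest eigenvector (deflated flow):
    H = hess(theta)
    for i in range(k):
        vi = V[:, i].copy()
        defl = np.eye(d) - np.outer(vi, vi) - 2.0 * V[:, :i] @ V[:, :i].T
        V[:, i] = vi - dt * defl @ (H @ vi)
    # Re-orthonormalize the directions via QR:
    V, _ = np.linalg.qr(V)
    return theta, V

rng = np.random.default_rng(0)
theta = 0.5 * rng.standard_normal(5)
V, _ = np.linalg.qr(rng.standard_normal((5, 2)))
for _ in range(2000):
    theta, V = hisd_step(theta, V)
print(np.linalg.norm(grad(theta)))  # gradient norm shrinks toward 0
```

Note that the zero-eigenvalue coordinate is left untouched by the flow, so the iterate converges to a point on the critical manifold rather than to a single isolated saddle, matching the stability statement above.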

The discrete algorithm (explicit Euler) is analyzed under the assumption that the eigen‑solver (EigenSol) provides exact eigenvectors. The authors derive a linear convergence rate of the form
ρ = β λ_min / (1 + β λ_min),
where β is the step size and λ_min is the smallest positive eigenvalue of the Hessian restricted to the normal space. They also show that the gradient direction aligns asymptotically with the eigenvector corresponding to the most negative eigenvalue, providing a theoretical explanation for the empirically observed “gradient alignment” phenomenon.
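The mechanism behind gradient alignment can be seen in a toy setting of my own construction (plain gradient descent on a diagonal quadratic saddle; not the paper's experiment): the error component along the most negative Hessian eigenvalue grows fastest, so the normalized gradient rotates toward that eigenvector.

```python
import numpy as np

# Diagonal Hessian: eigenvectors are the coordinate axes, and e_1 carries
# the most negative eigenvalue (-3).  Parameters are illustrative.
A = np.diag([-3.0, -1.0, 2.0])
x = np.array([0.1, 0.1, 0.1])   # start near the saddle at the origin
beta = 0.05
for _ in range(200):
    x = x - beta * (A @ x)      # gradient-descent step
g = A @ x
cosine = abs(g[0]) / np.linalg.norm(g)
print(cosine)  # close to 1: gradient has aligned with e_1
```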

To address the slow convergence typical of explicit schemes, the paper introduces momentum‑accelerated variants: a Heavy‑Ball version and a Nesterov‑type scheme. Both are shown to preserve the convergence guarantees while offering a constant‑factor speed‑up, as the momentum term effectively reduces the spectral radius of the iteration matrix.
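A hedged sketch of the Heavy‑Ball idea applied to the HiSD position update (my own toy setup: the unstable direction V is taken to be the exact Hessian eigenvector, and the step size and momentum coefficient are illustrative, not the paper's choices):

```python
import numpy as np

# Index-1 quadratic saddle at the origin; V spans the -2 eigendirection.
A = np.diag([-2.0, 1.0, 3.0])
V = np.eye(3)[:, :1]
beta, gamma = 0.05, 0.8   # step size and momentum coefficient (illustrative)

def reflected_grad(theta):
    """HiSD search direction: gradient with its sign flipped on span(V)."""
    d = len(theta)
    return (np.eye(d) - 2.0 * V @ V.T) @ (A @ theta)

theta = np.array([0.5, 0.5, 0.5])
velocity = np.zeros(3)
for _ in range(300):
    # Heavy-Ball: reuse a decaying fraction of the previous displacement.
    velocity = gamma * velocity - beta * reflected_grad(theta)
    theta = theta + velocity
print(np.linalg.norm(theta))  # approaches the saddle at the origin
```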

Numerical experiments focus on over‑parameterized neural‑network loss landscapes, which are known to exhibit large critical manifolds with many zero Hessian eigenvalues. The authors demonstrate that HiSD, with indices chosen anywhere between s and s + m, reliably converges to degenerate saddle points on these manifolds. Momentum‑accelerated HiSD reduces the number of iterations by roughly 30–70 % compared with the plain Euler version, and the gradient alignment effect is observed early in the iteration process.

In summary, the paper makes four major contributions: (1) it establishes local asymptotic stability of the continuous HiSD flow on Morse‑Bott critical manifolds; (2) it proves linear convergence of the discrete HiSD algorithm with explicit rates; (3) it provides a rigorous justification for gradient alignment and the flexibility of the index choice in degenerate settings; and (4) it validates the theory with extensive experiments on high‑dimensional machine‑learning models, showing that momentum‑accelerated HiSD achieves rapid and robust convergence to degenerate saddle points. This work significantly broadens the applicability of HiSD to realistic, high‑dimensional problems where degeneracy is the norm rather than the exception.

