Twice Epi-Differentiability of Spectral Functions and its applications
Second-order variational properties have been shown to play important theoretical and numerical roles for different classes of optimization problems. Among such properties, twice epi-differentiability has a special place because of its ubiquitous presence in various classes of extended-real-valued functions that are important for optimization. We provide a useful characterization of this property for spectral functions by demonstrating that a spectral function is twice epi-differentiable precisely when the symmetric function in its spectral representation is. Our approach allows us to bypass the rather restrictive convexity assumption used in many recent works on second-order variational properties of spectral functions. Using this theoretical tool, we obtain several applications, including the proto-differentiability of subgradient mappings and the directional differentiability of the proximal mapping of spectral functions. We finally use the established theory to study the twice epi-differentiability of leading eigenvalue functions and of practical regularization terms that have important applications in statistics and robust PCA.
💡 Research Summary
This paper investigates the second‑order variational property known as twice epi‑differentiability for spectral functions—functions that are invariant under orthogonal similarity transformations. A spectral function g : Sⁿ→ℝ∪{±∞} can always be written as the composition g(X)=θ(λ(X)), where λ(X)∈ℝⁿ is the vector of eigenvalues of X sorted in non‑increasing order and θ : ℝⁿ→ℝ∪{±∞} is a symmetric (permutation‑invariant) function. While earlier works established twice epi‑differentiability of g only when θ is convex, this study removes the convexity requirement and shows that the property of g is completely inherited from θ.
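To fix notation, the following minimal sketch (assuming NumPy; the helper names eigvals_desc, spectral_apply, and theta are illustrative, not from the paper) shows how a spectral function is evaluated as g(X)=θ(λ(X)) for a symmetric matrix X.

```python
import numpy as np

def eigvals_desc(X):
    """Eigenvalues of a symmetric matrix, sorted in non-increasing order (the map λ)."""
    return np.sort(np.linalg.eigvalsh(X))[::-1]

def spectral_apply(theta, X):
    """Evaluate the spectral function g(X) = θ(λ(X)) for symmetric X."""
    return theta(eigvals_desc(X))

# Example: θ(x) = sum of the two largest entries (a symmetric function),
# so g(X) is the sum of the two largest eigenvalues of X.
theta = lambda x: x[0] + x[1]

rng = np.random.default_rng(0)
A = rng.standard_normal((5, 5))
X = (A + A.T) / 2  # symmetrize
print(spectral_apply(theta, X))
```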
The authors first recall the first‑order expansion of eigenvalues (Proposition 2.1) and the known sub‑gradient formula for spectral functions (Proposition 2.2). They then define the second‑order difference quotient Δ²ₜ f( x, v )(w) and the second sub‑derivative d²f( x, v )(w). The central result (Theorem 3.1) proves that g is twice epi‑differentiable at X̄ for a sub‑gradient V∈∂g(X̄) if and only if θ is twice epi‑differentiable at λ(X̄) for the corresponding sub‑gradient v∈∂θ(λ(X̄)). Moreover, the second sub‑derivative of g can be expressed as d²g( X̄, V )(W)=d²θ( λ(X̄), v )( λ′( X̄; W) ), where λ′( X̄; W) denotes the directional derivative of the eigenvalue map.
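For reference, the standard definitions behind this notation (in the sense of Rockafellar and Wets) can be written as follows:

```latex
% Second-order difference quotient and second subderivative of f at x for v,
% matching the notation Δ²ₜ f(x, v)(w) and d²f(x, v)(w) used above.
\Delta_t^2 f(x, v)(w) \;=\; \frac{f(x + t w) - f(x) - t\,\langle v, w\rangle}{\tfrac{1}{2} t^2},
\qquad
\mathrm{d}^2 f(x, v)(w) \;=\; \liminf_{\substack{t \downarrow 0 \\ w' \to w}} \Delta_t^2 f(x, v)(w').
```

Twice epi-differentiability of f at x for v means that the quotients Δ²ₜ f(x, v) epi-converge as t↓0; the epi-limit is then the second sub-derivative d²f(x, v).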
A key technical difficulty in the non‑convex setting is that sub‑gradients of θ need not be ordered in the same way as the eigenvalues, which would break compatibility with the eigenvalue map. The authors overcome this by introducing a restricted permutation group Pₙˣ that only permutes entries within each block of equal eigenvalues, and by applying Fan’s inequality to align the vectors appropriately. This alignment ensures that the sub‑differential regularity condition required for the chain rule holds even without convexity.
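The alignment argument rests on Fan's trace inequality for symmetric matrices, whose standard statement is:

```latex
% Fan's inequality for symmetric matrices X, Y in S^n:
\langle X, Y \rangle \;=\; \operatorname{tr}(XY) \;\le\; \langle \lambda(X), \lambda(Y) \rangle,
% with equality if and only if X and Y admit a simultaneous ordered spectral
% decomposition, i.e. X = U \operatorname{Diag}(\lambda(X)) U^\top and
% Y = U \operatorname{Diag}(\lambda(Y)) U^\top for some orthogonal matrix U.
```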
From this inheritance principle, several important consequences are derived:
- Proto‑differentiability of sub‑gradient mappings – The proto‑derivative of ∂g at (X̄, V) exists and is obtained directly from the proto‑derivative of ∂θ. This gives a precise first‑order approximation of the sub‑gradient mapping, which is essential for sensitivity analysis.
- Directional differentiability of the proximal operator – For the proximal mapping prox_{γg}(X)=argmin_Y{ g(Y)+½γ⁻¹‖Y−X‖² }, the authors prove that it is directionally differentiable in every direction, and its derivative can be expressed via the derivative of the proximal operator of θ acting on the eigenvalues (a minimal numerical sketch of this spectral prox recipe appears after this list). This result enables the design of efficient Newton‑type algorithms for problems involving spectral regularizers.
- Generalized twice differentiability – Using the recent notion of "generalized twice differentiable" functions, the paper shows that spectral functions inherit this property from θ, providing a unified framework for establishing second‑order optimality conditions in non‑convex spectral optimization.
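As referenced in the proximal-operator item above, the proximal mapping of a spectral function can be computed spectrally: apply the proximal operator of the symmetric function θ to the eigenvalues and reassemble with the same eigenvectors. The sketch below illustrates this recipe under the assumption of a single-valued prox_theta (NumPy assumed; all names are illustrative).

```python
import numpy as np

def prox_spectral(prox_theta, X, gamma):
    """Prox of a spectral function g = θ∘λ at a symmetric matrix X:
    prox_{γg}(X) = U diag(prox_{γθ}(λ(X))) Uᵀ,
    where X = U diag(λ(X)) Uᵀ is an ordered eigendecomposition."""
    lam, U = np.linalg.eigh(X)        # eigenvalues in ascending order
    lam, U = lam[::-1], U[:, ::-1]    # reorder to non-increasing, matching λ(X)
    return U @ np.diag(prox_theta(lam, gamma)) @ U.T

# Example: θ(x) = ‖x‖₁, whose prox is entrywise soft-thresholding;
# the resulting g(X) is the sum of absolute eigenvalues of X.
def soft_threshold(lam, gamma):
    return np.sign(lam) * np.maximum(np.abs(lam) - gamma, 0.0)

A = np.random.default_rng(1).standard_normal((4, 4))
X = (A + A.T) / 2
print(prox_spectral(soft_threshold, X, gamma=0.5))
```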
The theory is illustrated on several practically relevant examples:
- Leading eigenvalue function λ₁(X) – By taking θ(x)=maxᵢ xᵢ (which equals x₁ when x is sorted in non‑increasing order), the authors compute the second sub‑derivative of λ₁, revealing its dependence on the eigenvalue gap λ₁−λ₂. This yields explicit curvature information useful for algorithms that target the dominant eigenvalue.
- MCP (Minimax Concave Penalty) on eigenvalues – The MCP is a non‑convex sparsity‑inducing penalty. The paper shows that the composite function g_MCP(X)=∑ᵢ MCP(λᵢ(X)) satisfies the twice epi‑differentiability property, providing the exact second‑order sub‑derivative needed for robust statistical estimators (a small numerical sketch of this regularizer follows this list).
- Regularizers used in robust PCA – Functions such as the sum of eigenvalues beyond a rank‑k truncation, ∑_{i>k} λᵢ(X), are treated via θ(x)=∑_{i>k} xᵢ (with x sorted in non‑increasing order). Despite being non‑convex, they meet the sub‑differential regularity condition, and thus their spectral counterparts inherit twice epi‑differentiability.
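To make the MCP example concrete, here is a minimal sketch (assuming NumPy; the parameter values λ=1 and γ=3 are illustrative) of the scalar MCP penalty and of the composite spectral regularizer g_MCP(X)=∑ᵢ MCP(λᵢ(X)).

```python
import numpy as np

def mcp(t, lam=1.0, gamma=3.0):
    """Scalar minimax concave penalty (MCP):
    MCP(t) = λ|t| - t²/(2γ) for |t| ≤ γλ, and γλ²/2 otherwise."""
    t = np.abs(t)
    return np.where(t <= gamma * lam,
                    lam * t - t**2 / (2.0 * gamma),
                    0.5 * gamma * lam**2)

def g_mcp(X, lam=1.0, gamma=3.0):
    """Composite spectral regularizer g_MCP(X) = Σ_i MCP(λ_i(X))."""
    eigs = np.linalg.eigvalsh(X)
    return float(np.sum(mcp(eigs, lam, gamma)))

A = np.random.default_rng(2).standard_normal((4, 4))
X = (A + A.T) / 2
print(g_mcp(X))
```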
Overall, the paper extends the second‑order variational analysis of spectral functions beyond the convex regime, offering a systematic tool for deriving curvature information, establishing second‑order optimality conditions, and designing Newton‑type algorithms for a broad class of non‑convex matrix optimization problems. Future directions suggested include exploring broader classes of symmetric functions θ, extending the results to infinite‑dimensional operators, and leveraging the theory to develop accelerated algorithms for large‑scale spectral regularization tasks.