From SGD to Spectra: A Theory of Neural Network Weight Dynamics

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Deep neural networks have revolutionized machine learning, yet their training dynamics remain poorly understood theoretically. We develop a continuous-time, matrix-valued stochastic differential equation (SDE) framework that rigorously connects the microscopic dynamics of SGD to the macroscopic evolution of singular-value spectra in weight matrices. We derive exact SDEs showing that squared singular values follow Dyson Brownian motion with eigenvalue repulsion, and we characterize the stationary distributions as gamma-type densities with power-law tails, providing the first theoretical explanation for the empirically observed ‘bulk+tail’ spectral structure of trained networks. Through controlled experiments on transformer and MLP architectures, we validate the theoretical predictions, demonstrating quantitative agreement between SDE-based forecasts and observed spectral evolution and laying a rigorous foundation for understanding why deep learning works.


💡 Research Summary

This paper introduces a rigorous continuous‑time, matrix‑valued stochastic differential equation (SDE) framework that bridges the microscopic updates of stochastic gradient descent (SGD) with the macroscopic evolution of singular‑value spectra in neural‑network weight matrices. The authors begin by recasting the discrete SGD step as the continuous‑time SDE
  dW = −η ∇_W L dt + √(ηD) dW̃,
where η is the learning rate, D an effective diffusion constant, and dW̃ an isotropic matrix‑valued Wiener process. Two regimes are examined.
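The continuous-time picture can be made concrete with an Euler–Maruyama discretization of this SDE. The sketch below uses a toy quadratic loss; the loss, matrix shape, and constants are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Euler-Maruyama discretization of the SDE above,
#   dW = -eta * grad_W L dt + sqrt(eta * D) dW~,
# on a toy quadratic loss L(W) = 0.5 * ||W - W_star||_F^2.
# Shapes, constants, and the loss are illustrative, not from the paper.
rng = np.random.default_rng(0)

n, m = 32, 64          # weight-matrix shape (hypothetical)
eta, D = 1e-2, 1e-3    # learning rate and effective diffusion constant
dt, steps = 1.0, 2000  # one unit of SDE time per SGD step

W_star = rng.standard_normal((n, m)) / np.sqrt(m)  # toy optimum
W = np.zeros((n, m))

for _ in range(steps):
    grad = W - W_star                         # gradient of the quadratic loss
    dB = rng.standard_normal((n, m))          # isotropic Wiener increment
    W += -eta * grad * dt + np.sqrt(eta * D * dt) * dB

# For this linear drift each entry is an Ornstein-Uhlenbeck process that
# settles around W_star with stationary variance ~ D/2, so the relative
# Frobenius error converges to a small noise floor rather than to zero.
rel_err = np.linalg.norm(W - W_star) / np.linalg.norm(W_star)
print(f"relative error at stationarity: {rel_err:.3f}")
```

The key qualitative point this illustrates is the balance the paper builds on: the drift term pulls W toward low loss while the √(ηD) noise keeps the weights fluctuating, and it is these persistent fluctuations that drive the spectral dynamics analyzed next.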

In the “negligible‑gradient” regime (early training), they perform an SVD of W and apply Itô calculus to each singular value σ_k. The resulting dynamics are captured by Theorem 3.1; in the standard random‑matrix form consistent with the isotropic noise model above (for W ∈ ℝ^{n×m}, n ≤ m), the singular‑value SDE reads

  dσ_k = √(ηD) dβ_k + (ηD/2) [ (m − n)/σ_k + Σ_{j≠k} ( 1/(σ_k − σ_j) + 1/(σ_k + σ_j) ) ] dt,

where the β_k are independent scalar Brownian motions. The 1/(σ_k − σ_j) terms generate the characteristic eigenvalue repulsion, so the squared singular values λ_k = σ_k² evolve as a Wishart‑type Dyson Brownian motion.

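This negligible-gradient regime is easy to probe numerically: evolve W by isotropic noise alone and track its singular values. The constants below are illustrative assumptions, not values from the paper.

```python
import numpy as np

# Pure-diffusion sketch of the negligible-gradient regime: drop the gradient
# term and evolve W by isotropic noise alone, tracking its singular values.
# Their squares should behave like a repulsive (Dyson/Wishart-type) particle
# system; shapes and constants here are illustrative, not from the paper.
rng = np.random.default_rng(1)

n, m = 8, 16
eta, D, dt, steps = 1e-2, 1.0, 1.0, 500

W = np.zeros((n, m))
history = []
for _ in range(steps):
    W += np.sqrt(eta * D * dt) * rng.standard_normal((n, m))
    history.append(np.linalg.svd(W, compute_uv=False))  # sorted descending

svals = np.asarray(history)           # shape (steps, n)
gaps = -np.diff(svals[-1])            # spacings between consecutive levels
fro2 = float((svals[-1] ** 2).sum())  # sum of squared singular values

# Two checks consistent with the diffusive picture: the total squared
# spectrum grows like n * m * eta * D * t, and the ordered singular values
# stay strictly separated (repulsion prevents level crossings).
print(f"sum sigma^2 = {fro2:.1f} (expected ~ {n * m * eta * D * steps:.0f})")
print(f"min level gap = {gaps.min():.3f}")
```

Repeating such runs across many seeds and comparing the empirical level spacings and spectral growth against the SDE's drift and diffusion terms is, in spirit, the kind of quantitative validation the paper reports on transformer and MLP weight matrices.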
