Predicting the outputs of finite deep neural networks trained with noisy gradients

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

💡 Research Summary

This paper addresses a gap in the theoretical understanding of deep neural networks (DNNs) trained with stochastic gradients. While recent works have shown that infinitely wide DNNs trained by gradient flow converge to Gaussian processes (GPs) – either the Neural Tangent Kernel (NTK) or the Neural Network Gaussian Process (NNGP) – these results either assume “lazy learning” (weights stay near initialization) or deterministic training, which do not reflect practical training regimes that involve weight decay, injected noise, and finite learning rates.

The authors propose a specific training protocol: full‑batch gradient descent with weight decay and additive white Gaussian noise (Langevin dynamics). In the limit of infinitesimal step size the dynamics become a continuous‑time Langevin equation whose stationary distribution is a Gibbs measure
(P(w)\propto\exp

Predicting the outputs of finite deep neural networks trained with noisy gradients

💡 Research Summary

Comments & Academic Discussion

Leave a Comment