Statistical Learning Analysis of Physics-Informed Neural Networks
We study the training and performance of physics-informed learning for initial and boundary value problems (IBVPs) with physics-informed neural networks (PINNs) from a statistical learning perspective. Specifically, we restrict ourselves to parameterizations with hard initial and boundary condition constraints and reformulate the problem of estimating PINN parameters as a statistical learning problem. From this perspective, the physics penalty on the IBVP residuals can be better understood not as a regularizing term but as an infinite source of indirect data, and the learning process as fitting the PINN distribution of residuals $p(y \mid x, t, w)\, q(x, t)$ to the true data-generating distribution $\delta(0)\, q(x, t)$ by minimizing the Kullback-Leibler divergence between the true and PINN distributions. Furthermore, this analysis shows that physics-informed learning with PINNs is a singular learning problem, and we employ singular learning theory tools, namely the so-called Local Learning Coefficient (Lau et al., 2025), to analyze the estimates of PINN parameters obtained via stochastic optimization for a heat equation IBVP. Finally, we discuss the implications of this analysis for the quantification of predictive uncertainty and the extrapolation capacity of PINNs.
💡 Research Summary
The paper presents a statistical-learning-theoretic reinterpretation of physics-informed neural networks (PINNs) for solving initial- and boundary-value problems (IBVPs). By restricting the network architecture to hard-constraint parameterizations that satisfy the initial and boundary conditions for any choice of parameters, the authors recast the PINN training objective as a conventional statistical estimation problem. The residual loss, defined as the mean squared PDE residual evaluated at a set of $n$ collocation points, is shown to be equivalent to minimizing the Kullback-Leibler (KL) divergence between the true data-generating distribution $q(x, t, y) = \delta(0)\, q(x, t)$ (where $y$ denotes the PDE residual) and the model-induced distribution $p(x, t, y \mid w) = p(y \mid x, t, w)\, q(x, t)$. In this view, the "physics penalty" is not a regularizer but an infinite source of indirect data: each collocation point provides a labeled pair $(x, t, y = 0)$.
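As a toy illustration of this setup (not the paper's implementation), the sketch below builds a hard-constraint ansatz for a 1-D heat equation $u_t = \kappa u_{xx}$ on $[0,1]$ and evaluates the mean squared residual at random collocation points. The small tanh network, the diffusivity $\kappa$, the initial condition $\sin(\pi x)$, and the finite-difference derivatives are all illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
KAPPA = 0.1  # assumed diffusivity (illustrative)

def net(x, t, w):
    """Tiny tanh network N(x, t; w) standing in for the PINN body."""
    W1, b1, W2 = w
    h = np.tanh(np.outer(x, W1[0]) + np.outer(t, W1[1]) + b1)
    return h @ W2

def u(x, t, w):
    """Hard-constraint ansatz: u(x, 0) = sin(pi x) and u(0, t) = u(1, t) = 0
    hold for every parameter value w, since the network term is multiplied
    by t * x * (1 - x)."""
    return np.sin(np.pi * x) + t * x * (1.0 - x) * net(x, t, w)

def residual_loss(w, n=256, eps=1e-4):
    """Mean squared PDE residual u_t - kappa * u_xx at n random collocation
    points, with derivatives approximated by central differences."""
    x = rng.uniform(eps, 1 - eps, n)
    t = rng.uniform(eps, 1 - eps, n)
    u_t = (u(x, t + eps, w) - u(x, t - eps, w)) / (2 * eps)
    u_xx = (u(x + eps, t, w) - 2 * u(x, t, w) + u(x - eps, t, w)) / eps**2
    r = u_t - KAPPA * u_xx  # each collocation point supplies the "label" y = 0
    return np.mean(r**2)

w0 = (rng.normal(size=(2, 8)), rng.normal(size=8), rng.normal(size=8))
print(residual_loss(w0))
```

Because the constraints are hard, minimizing this loss over $w$ is the only fitting step; no boundary or initial-condition penalty terms appear.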
Recognizing that deep neural networks are singular models—i.e., the parameter-to-distribution map is not one-to-one and the Fisher information matrix is not everywhere positive definite—the authors argue that PINNs inherit this singularity. Consequently, the loss landscape does not consist of isolated quadratic minima but of broad, flat regions that persist even as the number of collocation points grows. This explains empirical observations that increasing $n$ smooths the loss surface without eliminating the multiplicity of near-optimal solutions.
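A minimal numerical illustration of this kind of singularity (a toy model, not the paper's architecture): for $f(x) = a \tanh(bx)$, setting $a = 0$ makes the output insensitive to $b$, so the empirical Fisher information matrix loses rank along the $b$ direction.

```python
import numpy as np

def fisher(a, b, xs):
    """Empirical Fisher information for f(x) = a * tanh(b * x) under a
    Gaussian likelihood: F = mean over x of grad_w f(x; w) grad_w f(x; w)^T."""
    g_a = np.tanh(b * xs)                # df/da
    g_b = a * xs / np.cosh(b * xs) ** 2  # df/db, vanishes identically at a = 0
    J = np.stack([g_a, g_b], axis=1)
    return J.T @ J / len(xs)

xs = np.linspace(-2.0, 2.0, 101)
F = fisher(0.0, 1.3, xs)  # a = 0: the unit is "dead", b is unidentifiable
print(np.linalg.matrix_rank(F))  # rank 1 < 2: the Fisher matrix is singular
```

The set $\{a = 0\}$ is a whole line of parameters producing the identical (zero) function, the simplest instance of the flat, non-isolated minima described above.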
To quantify the flatness of these regions, the paper adopts the Local Learning Coefficient (LLC) from singular learning theory (SLT). The LLC $\lambda(w^\star)$ characterizes how the volume of the parameter-space region with loss within $\epsilon$ of a local minimum $w^\star$ scales as $\epsilon^{\lambda(w^\star)}$ (up to logarithmic factors) when $\epsilon \to 0$. A smaller $\lambda$ indicates a flatter region and higher parameter uncertainty. The authors estimate $\lambda$ by sampling from a tempered, localized posterior proportional to $\exp\{-n\beta L_n(w) - \gamma \|w - w^\star\|^2\}$, with inverse temperature $\beta = 1/\log n$ and a Gaussian localizing prior of variance $\gamma^{-1}$. Sampling from this posterior with the No-U-Turn Sampler (NUTS) and computing the posterior expectation of the loss yields the estimator $\hat\lambda_\gamma(w^\star) = n\beta \big( \mathbb{E}_{w \mid w^\star, \beta, \gamma}[L_n(w)] - L_n(w^\star) \big)$.
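This LLC estimator can be sketched in a few lines. The toy version below substitutes a random-walk Metropolis chain for NUTS and uses a simple quadratic loss whose learning coefficient is known in closed form ($d/2$ for a regular minimum in $d$ dimensions), so the output can be sanity-checked; the step size, chain length, and $\gamma$ are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(1)

def estimate_llc(L_n, w_star, n, n_steps=20000, step=0.05, gamma=1.0):
    """LLC estimator lambda_hat = n * beta * (E_post[L_n(w)] - L_n(w_star)),
    sampling the tempered, localized posterior
        exp(-n * beta * L_n(w) - gamma * ||w - w_star||^2)
    with a random-walk Metropolis chain (a stand-in for NUTS)."""
    beta = 1.0 / np.log(n)
    logp = lambda w: -n * beta * L_n(w) - gamma * np.sum((w - w_star) ** 2)
    w, lp, losses = w_star.copy(), logp(w_star), []
    for i in range(n_steps):
        prop = w + step * rng.normal(size=w.shape)
        lp_prop = logp(prop)
        if np.log(rng.uniform()) < lp_prop - lp:  # Metropolis accept/reject
            w, lp = prop, lp_prop
        if i > n_steps // 2:  # discard the first half as burn-in
            losses.append(L_n(w))
    return n * beta * (np.mean(losses) - L_n(w_star))

# Sanity check: for L_n(w) = ||w||^2 (a regular minimum in d = 2 dimensions),
# the theoretical learning coefficient is d/2 = 1.
L = lambda w: np.sum(w**2)
print(estimate_llc(L, np.zeros(2), n=1000))
```

The estimate comes out close to 1, with a small downward bias from the localizing term $\gamma \|w - w^\star\|^2$; for a singular minimum the same estimator would return a value below $d/2$, which is precisely the flatness signal the paper exploits.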