Do physics-informed neural networks (PINNs) need to be deep? Shallow PINNs using the Levenberg-Marquardt algorithm
This work investigates the use of shallow physics-informed neural networks (PINNs) for solving forward and inverse problems of nonlinear partial differential equations (PDEs). By reformulating PINNs as nonlinear systems, the Levenberg-Marquardt (LM) algorithm is employed to efficiently optimize the network parameters. Analytical expressions for the neural network derivatives with respect to the input variables are derived, enabling accurate and efficient computation of the Jacobian matrix required by LM. The proposed approach is tested on several benchmark problems, including the Burgers, Schrödinger, Allen-Cahn, and three-dimensional Bratu equations. Numerical results demonstrate that LM significantly outperforms BFGS in terms of convergence speed, accuracy, and final loss values, even when using shallow network architectures with only two hidden layers. These findings indicate that, for a wide class of PDEs, shallow PINNs combined with efficient second-order optimization methods can provide accurate and computationally efficient solutions for both forward and inverse problems.
💡 Research Summary
This paper revisits the prevailing assumption that deep neural networks are required for high‑accuracy physics‑informed neural networks (PINNs). The authors propose to treat PINNs not as a generic loss‑minimization problem but as a nonlinear system F(W)=0, where F(W) collects the pointwise PDE residuals f_i(W)=u_t+N(u) at a set of collocation points. By reformulating the training task in this way, the classical Levenberg‑Marquardt (LM) algorithm—originally designed for nonlinear least‑squares—can be applied directly.
The LM method updates parameters W through the linear system (JᵀJ+λI)ΔW = –Jᵀr, where J is the Jacobian of the residual vector F with respect to all network weights and biases, r = F(W) is the residual, and λ is a damping factor that interpolates between Gauss‑Newton and gradient‑descent steps. A major obstacle in using LM for PINNs is the efficient computation of J. The authors address this by deriving closed‑form analytical expressions for the network output ũ(x,t,W) and its first‑ and second‑order derivatives with respect to the inputs (t, x). The network architecture is a feed‑forward net with two hidden layers (sizes m₁ and m₂) and a single scalar output. Using chain‑rule expansions, they obtain explicit formulas for ∂ũ/∂W, ∂ũ_t/∂W, ∂ũ_x/∂W, and ∂ũ_xx/∂W. These formulas are vectorized, allowing the Jacobian to be assembled efficiently even for tens of thousands of collocation points.
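The damped update above can be sketched in a few lines. The following is a minimal illustration (not the paper's code) of one LM step applied to a toy scalar-parameter least-squares problem; in the PINN setting, `residual` would return the stacked PDE residuals at the collocation points and `jacobian` the analytically assembled matrix J. The damping factor λ is held fixed here for brevity, whereas a full LM implementation adapts it between iterations.

```python
import numpy as np

def lm_step(W, residual, jacobian, lam):
    """One damped Levenberg-Marquardt update: solve (JᵀJ + λI)ΔW = -Jᵀr."""
    r = residual(W)
    J = jacobian(W)
    A = J.T @ J + lam * np.eye(W.size)
    dW = np.linalg.solve(A, -J.T @ r)
    return W + dW

# Toy residual: fit exp(w·x) to data generated with w = 0.7.
# A PINN would instead stack pointwise PDE residuals here (illustrative only).
x = np.linspace(0.0, 1.0, 20)
y = np.exp(0.7 * x)
residual = lambda W: np.exp(W[0] * x) - y
jacobian = lambda W: (x * np.exp(W[0] * x)).reshape(-1, 1)

W = np.array([0.0])
for _ in range(10):
    W = lm_step(W, residual, jacobian, lam=1e-3)
```

Because the toy model can fit the data exactly, the fixed point of the iteration is the true parameter regardless of the damping value; λ only modulates the step length along the way.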
Boundary and initial conditions are enforced analytically by constructing the final solution as u = ũ·p + q, where p and q are known functions that satisfy the prescribed conditions. This embedding ensures that every weight influences both the solution and its derivatives, preserving the full Jacobian structure.
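A concrete instance of this embedding, for Dirichlet data u(0)=a, u(1)=b on the unit interval, uses a multiplier p that vanishes on the boundary and an interpolant q that matches the boundary values. The particular choices of p, q, and the stand-in network below are illustrative, not taken from the paper:

```python
import numpy as np

a, b = 1.0, 3.0
p = lambda x: x * (1.0 - x)          # vanishes at x = 0 and x = 1
q = lambda x: a + (b - a) * x        # satisfies q(0) = a, q(1) = b

u_net = lambda x: np.tanh(2.0 * x)   # stand-in for the trained network ũ
u = lambda x: p(x) * u_net(x) + q(x) # exact boundary values by construction
```

Since p annihilates the network contribution on the boundary, u satisfies the prescribed conditions for any weights, so no boundary penalty term is needed in the loss.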
The methodology is tested on four benchmark nonlinear PDEs: (1) the one‑dimensional Burgers equation, (2) the nonlinear Schrödinger equation, (3) the Allen‑Cahn equation, and (4) the three‑dimensional Bratu problem. For each PDE the authors solve both forward problems (computing the field) and inverse problems (identifying unknown coefficients such as viscosity or reaction rates). Collocation points range from 10⁴ to 10⁵, and the shallow network typically uses m₁=m₂≈50, resulting in a total parameter count of only a few thousand.
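For a two-hidden-layer tanh network like the one used here, the closed-form input derivative is a short chain-rule expression. The sketch below (small illustrative layer sizes, random weights, none taken from the paper) computes ∂ũ/∂x analytically and checks it against a central finite difference:

```python
import numpy as np

rng = np.random.default_rng(0)
m1, m2 = 5, 4                                  # small sizes for illustration
W1, b1 = rng.normal(size=(m1, 1)), rng.normal(size=m1)
W2, b2 = rng.normal(size=(m2, m1)), rng.normal(size=m2)
w3, b3 = rng.normal(size=m2), rng.normal()

def u(x):
    """Two-hidden-layer tanh network with scalar input and output."""
    z1 = np.tanh(W1[:, 0] * x + b1)
    z2 = np.tanh(W2 @ z1 + b2)
    return w3 @ z2 + b3

def du_dx(x):
    """Analytic ∂u/∂x via the chain rule; tanh'(s) = 1 - tanh(s)^2."""
    z1 = np.tanh(W1[:, 0] * x + b1)
    z2 = np.tanh(W2 @ z1 + b2)
    dz1 = (1.0 - z1**2) * W1[:, 0]             # ∂z1/∂x
    dz2 = (1.0 - z2**2) * (W2 @ dz1)           # ∂z2/∂x
    return w3 @ dz2

x0, h = 0.3, 1e-6
fd = (u(x0 + h) - u(x0 - h)) / (2.0 * h)       # finite-difference check
```

The same pattern extends to second derivatives and to derivatives with respect to the weights, which is what makes the vectorized Jacobian assembly described above tractable.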
Numerical results show that LM converges dramatically faster than limited‑memory BFGS (L‑BFGS). In forward simulations LM reaches a loss below 10⁻⁸ within 15–30 iterations, whereas L‑BFGS often requires several hundred iterations and stalls at loss levels of 10⁻⁴–10⁻⁵. In inverse settings LM recovers unknown parameters with relative errors below 0.5 %, outperforming L‑BFGS by a factor of two or more. Accuracy, measured by L₂ error of the reconstructed field, is consistently lower for LM across all four PDEs. Memory consumption is also reduced because the Jacobian is built analytically rather than via automatic differentiation.
A comparison with deeper networks (5–10 hidden layers, thousands of parameters) reveals that the shallow LM‑based PINNs achieve comparable or superior accuracy, indicating that depth is not a prerequisite for expressive power when a second‑order optimizer is employed. The authors argue that LM’s ability to take large, curvature‑aware steps early in training avoids the oscillations and slow convergence typical of first‑order methods such as Adam.
Key contributions of the paper are: (1) Recasting PINNs as a nonlinear least‑squares system, enabling the use of classical second‑order solvers; (2) Providing explicit analytical formulas for network derivatives and the Jacobian, bypassing the need for automatic differentiation; (3) Demonstrating that LM dramatically outperforms L‑BFGS and first‑order optimizers in both speed and final error; (4) Showing that shallow networks with only two hidden layers are sufficient for high‑fidelity solutions of a broad class of nonlinear PDEs, both forward and inverse.
The findings suggest a new design paradigm for PINNs: prioritize accurate Jacobian computation and second‑order optimization over network depth. This approach yields a computationally efficient, memory‑light framework that can handle high‑dimensional, nonlinear PDEs without resorting to deep architectures, potentially broadening the applicability of PINNs in scientific computing and engineering simulation.