Jacobian Regularization Stabilizes Long-Term Integration of Neural Differential Equations
Hybrid models and Neural Differential Equations (NDEs) are becoming increasingly important for modeling physical systems; however, they often encounter stability and accuracy issues during long-term integration. Training on unrolled trajectories is known to limit these divergences but quickly becomes too expensive because gradients must be computed through an iterative process. In this paper, we demonstrate that regularizing the Jacobian of the NDE model via its directional derivatives during training stabilizes long-term integration in the challenging context of short training rollouts. We design two regularizations: one for the case of known dynamics, where we can directly derive the directional derivatives of the dynamics, and one for the case of unknown dynamics, where they are approximated using finite differences. Both methods, while having a far lower cost than long rollouts during training, succeed in improving the stability of long-term simulations for several ordinary and partial differential equations, opening the door to training NDE methods for long-term integration of large-scale systems.
💡 Research Summary
This paper tackles a fundamental challenge in Neural Differential Equations (NDEs): the tendency of models trained on short rollout horizons to become unstable when integrated over long time spans. While training with long rollouts can mitigate this issue, the associated computational cost—stemming from back‑propagation through many solver steps—makes it impractical for high‑dimensional or expensive physical simulations. The authors propose a Jacobian‑based regularization framework that stabilizes long‑term integration while retaining the efficiency of short‑rollout training.
The theoretical foundation rests on the relationship between the Jacobian of the learned dynamics \(F_\theta\) and its Lipschitz constant. Using Grönwall’s lemma, the authors show that the trajectory error grows proportionally to \(\exp(L_{F_\theta} t)\). Consequently, even a small model error \(\epsilon_\theta\) can lead to divergent trajectories if \(L_{F_\theta}\) is large. By directly minimizing the discrepancy between the Jacobian of the true dynamics \(J_F\) and that of the learned model \(J_{F_\theta}\), the Lipschitz constant of \(F_\theta\) is forced to stay close to that of the true system (Proposition 2.3).
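To make this concrete, the standard Grönwall-type perturbation bound takes the following form (this is a textbook reconstruction consistent with the exponential growth stated above; the paper's precise constants may differ). Assuming \(\|F(x) - F_\theta(x)\| \le \epsilon_\theta\) everywhere and that \(F_\theta\) is \(L_{F_\theta}\)-Lipschitz:

```latex
% Gronwall-type bound on the deviation between the true trajectory x(t)
% and the learned trajectory \hat{x}(t) started from the same initial state
% (a reconstruction; the paper's exact statement may differ in constants):
\|x(t) - \hat{x}(t)\| \;\le\; \frac{\epsilon_\theta}{L_{F_\theta}}
  \left( e^{L_{F_\theta} t} - 1 \right)
```

The pointwise model error \(\epsilon_\theta\) is thus amplified exponentially in \(t\) at a rate set by the Lipschitz constant, which is why keeping \(L_{F_\theta}\) close to the true system's constant matters for long horizons.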
Computing full Jacobians is prohibitive, so the authors employ the Hutchinson trace estimator to reduce the problem to Jacobian-vector products (JVPs). JVPs are exactly the directional derivatives of the error function and can be obtained efficiently via forward-mode automatic differentiation. Two practical regularization losses are derived:
- Known-Dynamics (AD) Regularization – When the analytical form of the true dynamics \(F\) is available, the loss directly penalizes the L2 distance between the JVPs of \(F_\theta\) and \(F\) for a set of random directions sampled from a standard normal distribution. This yields an unbiased, low-variance estimate of the Jacobian norm difference.
- Unknown-Dynamics (FD) Regularization – When \(F\) is not accessible, the authors approximate directional derivatives using finite differences along the observed trajectory. The loss compares the change in the residual \((F_\theta - F)\) between successive time steps, normalized by the squared step size. In the limit \(\Delta t \to 0\), this expression converges to the same quantity as the AD-based loss, providing an unsupervised way to regularize the Jacobian without any knowledge of the true vector field.
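The two losses can be sketched in a few lines of NumPy. This is a minimal reconstruction from the description above, not the authors' code: the function names and normalization details are assumptions, and JVPs are taken with numerical central differences here, whereas the paper uses forward-mode automatic differentiation.

```python
import numpy as np

rng = np.random.default_rng(0)

def jvp_fd(f, x, v, eps=1e-6):
    """Approximate the Jacobian-vector product J_f(x) v via central differences.
    (Stands in for forward-mode AD used in the paper.)"""
    return (f(x + eps * v) - f(x - eps * v)) / (2 * eps)

def ad_jacobian_reg(f_theta, f_true, x, n_dirs=8):
    """Known-dynamics (AD-style) loss: Hutchinson-type estimate of
    ||J_{F_theta}(x) - J_F(x)||_F^2 using random Gaussian directions."""
    total = 0.0
    for _ in range(n_dirs):
        v = rng.standard_normal(x.shape)
        diff = jvp_fd(f_theta, x, v) - jvp_fd(f_true, x, v)
        total += np.sum(diff ** 2)
    return total / n_dirs

def fd_jacobian_reg(f_theta, traj, dt):
    """Unknown-dynamics (FD) loss: change of the residual (F_theta - F)
    between successive observed states, normalized by dt^2. The true
    vector field is approximated from data as F(x_k) ~ (x_{k+1} - x_k)/dt."""
    total = 0.0
    for k in range(len(traj) - 2):
        r_k  = f_theta(traj[k])     - (traj[k + 1] - traj[k])     / dt
        r_k1 = f_theta(traj[k + 1]) - (traj[k + 2] - traj[k + 1]) / dt
        total += np.sum((r_k1 - r_k) ** 2) / dt ** 2
    return total / (len(traj) - 2)
```

In training, either term would simply be added to the trajectory loss weighted by \(\lambda\), e.g. `loss = traj_loss + lam * reg`, matching the formulation described in the summary.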
Both regularizations are added to the standard trajectory loss with a weighting hyper-parameter \(\lambda\). Experiments are conducted on two ordinary differential equation benchmarks (a nonlinear oscillator and a logistic growth model) and two partial differential equation benchmarks (1-D Burgers and 2-D Navier-Stokes at low resolution). For each case, models trained with short rollouts (5–10 steps) and Jacobian regularization are compared against baseline models trained with long rollouts (≈100 steps) and against unregularized short-rollout models.
Results demonstrate that Jacobian-regularized models achieve long-term trajectory errors comparable to or better than the long-rollout baselines, while keeping the Lipschitz constant of the learned dynamics close to that of the true system. Energy spectra and conservation properties are preserved in the PDE experiments, indicating that the regularization does not merely damp the dynamics but enforces physically meaningful behavior. Computationally, the AD-based regularization adds roughly a 20% overhead due to extra JVP evaluations, whereas the FD-based version incurs negligible extra cost because it reuses already computed state differences.
The paper discusses trade-offs: overly strong regularization can suppress model expressivity, leading to a larger \(\epsilon_\theta\); thus careful tuning of \(\lambda\) is essential. The authors suggest future work on adaptive weighting schemes, direction sampling informed by physics (e.g., flow directions), and scaling the approach to large-scale climate or fluid-structure interaction models.
In summary, the work provides a principled, computationally efficient method to bridge the gap between short‑rollout training efficiency and long‑term integration stability in Neural Differential Equations, opening the door to practical deployment of NDEs in large‑scale scientific simulations.