Loss Landscape Geometry and the Learning of Symmetries: Or, What Influence Functions Reveal About Robust Generalization
We study how neural emulators of partial differential equation solution operators internalize physical symmetries by introducing an influence-based diagnostic that measures the propagation of parameter updates between symmetry-related states, defined as the metric-weighted overlap of loss gradients evaluated along group orbits. This quantity probes the local geometry of the learned loss landscape and goes beyond forward-pass equivariance tests by directly assessing whether learning dynamics couple physically equivalent configurations. Applying the diagnostic to autoregressive fluid-flow emulators, we show that orbit-wise gradient coherence is the mechanism by which learning generalizes over symmetry transformations, and that it indicates when training selects a symmetry-compatible basin. The result is a novel technique for evaluating whether surrogate models have internalized the symmetry properties of the known solution operator.
💡 Research Summary
The paper addresses a fundamental gap in the evaluation of neural network emulators for partial differential equation (PDE) solvers: existing tests focus on forward-pass equivariance, i.e., whether the model's output is unchanged under symmetry transformations of the input. Such tests, however, do not reveal whether the learning dynamics themselves respect the underlying physical symmetries. To fill this gap, the authors introduce an influence-function based diagnostic that quantifies the propagation of parameter updates between symmetry-related states. The core quantity is a metric-weighted overlap of loss gradients evaluated along group orbits, formally expressed as a Lie derivative of the loss along gradient vector fields weighted by the regularized neural-tangent-kernel (NTK) metric χ. In practice, for a training example x and its transformed counterpart g·x, the influence L_V C(g·x) = (∂_µ C(g·x)) χ^{µν} (−∂_ν C(x)) measures whether the update induced by x reduces (or increases) the loss on g·x. This "orbit-wise gradient coherence" serves as a direct probe of the local geometry of the learned loss landscape.
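The core quantity above can be sketched numerically. The following is a minimal toy illustration, not the paper's implementation: gradients are stand-in vectors for a hypothetical 3-parameter model, and the regularized NTK metric χ is approximated by the identity. With this convention, aligned orbit gradients yield a negative value (the update from x is expected to reduce the loss on g·x); the sign-normalization used for the reported influence values may differ in the paper.

```python
import numpy as np

def influence(grad_gx, grad_x, chi):
    """Metric-weighted overlap L_V C(gx) = (grad C(gx))^T chi (-grad C(x)).
    Negative => the update induced by x is expected to *reduce* the loss
    on the transformed sample gx; near zero => no cross-orbit coupling."""
    return float(grad_gx @ chi @ (-grad_x))

# Toy illustration (hypothetical values, 3 parameters):
chi = np.eye(3)                               # stand-in for the regularized NTK metric
g_x = np.array([1.0, 2.0, -1.0])              # loss gradient at x
g_gx_coherent = g_x.copy()                    # gradients aligned along the orbit
g_gx_orthogonal = np.array([2.0, -1.0, 0.0])  # orthogonal gradient: no coupling
```

Coherent orbit gradients give a strictly negative influence, while orthogonal gradients give zero regardless of their magnitude, which is the "large but incoherent gradients" failure mode discussed below.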
The authors evaluate the diagnostic on two families of PDE data: (1) two‑dimensional compressible Euler flows with three classes of Riemann‑type initial conditions, and (2) Navier‑Stokes flows generated from three distinct initial‑condition families (BB, Gaussian, Sine). Two neural architectures are compared: a conventional UNet (≈13 M parameters) and a Vision Transformer (ViT, ≈5 M parameters). Both models are trained autoregressively using the Adam optimizer (lr = 5 × 10⁻⁴, weight decay = 10⁻⁶) and a scaled mean‑squared‑error (SMSE) loss that normalizes each channel by its RMS. Training proceeds on two A100 GPUs with three random seeds, and results are reported with median and quantile ranges across seeds and mini‑batches.
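The scaled MSE loss described above can be sketched as follows. The exact normalization (dividing the per-channel MSE by the target channel's mean square) is an assumption about the paper's definition; the key property is that each channel is rescaled by its own RMS so that channels with different physical units contribute comparably.

```python
import numpy as np

def smse(pred, target, eps=1e-8):
    """Scaled MSE (sketch): per-channel MSE normalized by the target
    channel's mean square (RMS^2), averaged over channels.
    pred, target: arrays of shape (channels, H, W)."""
    mse = ((pred - target) ** 2).mean(axis=(1, 2))
    ms = (target ** 2).mean(axis=(1, 2)) + eps   # RMS^2 per channel
    return float((mse / ms).mean())
```

By construction the loss is invariant under a common rescaling of prediction and target, which is what makes per-channel errors comparable across, e.g., density and velocity fields.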
After training, the authors compute influence matrices for six test mini‑batches, standardizing them against the empirical variance of stochastic perturbations so that a value of one corresponds to pure noise. They then examine two complementary metrics: (a) forward‑pass equivariance error, i.e., the increase in SMSE when applying a symmetry transformation to the input, and (b) the magnitude of influence matrix entries, i.e., the degree of gradient alignment between original and transformed inputs.
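The standardization step can be sketched as below. This is a simplified reading of the procedure: raw influence entries are divided by the empirical standard deviation of influence values obtained under stochastic perturbations, so that a magnitude of about one corresponds to pure noise.

```python
import numpy as np

def standardize_influence(raw, noise_samples):
    """Standardize raw influence entries against the empirical std of
    influence values from stochastic perturbations (sketch): after this
    rescaling, |entry| ~ 1 is indistinguishable from noise."""
    noise_std = np.std(noise_samples) + 1e-12   # guard against zero std
    return np.asarray(raw, dtype=float) / noise_std
```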
Key findings emerge from experiments on the dihedral group D₄ (four rotations and reflections). Both UNet and ViT achieve low SMSE and high influence for the identity, 180° rotation, and certain reflection‑combined rotations, but they catastrophically fail on the remaining 90° rotations and pure reflections: SMSE inflates by up to four orders of magnitude and the corresponding influence values collapse toward zero. The authors attribute this to a pronounced directional bias in the Navier‑Stokes data (vorticity‑dominated structures) that is not invariant under those transformations. Importantly, the failure is not merely a forward‑pass issue; the loss landscape has converged to a basin where gradients for the problematic group elements are large but do not coherently accumulate across the orbit, resulting in near‑zero cross‑influence. This demonstrates that data‑induced anisotropies can be absorbed into the local geometry of the loss, effectively breaking symmetry at the level of learning dynamics.
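The D₄ orbit of a discretized 2D field is straightforward to enumerate, which is how such an evaluation could be set up in practice (a generic sketch, not the paper's code): the eight group elements are the four 90° rotations, each with and without a reflection.

```python
import numpy as np

def d4_orbit(field):
    """All 8 images of a square 2D field under the dihedral group D4:
    rotations by 0/90/180/270 degrees, each plain and reflected."""
    orbit = []
    for k in range(4):
        r = np.rot90(field, k)          # rotation by k * 90 degrees
        orbit.append(r)
        orbit.append(np.flip(r, axis=1))  # same rotation composed with a reflection
    return orbit
```

Evaluating forward-pass error and cross-influence for each of these eight elements is what separates the well-handled elements (identity, 180° rotation, certain reflected rotations) from the failing 90° rotations and pure reflections.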
In translation‑group experiments, UNet exhibits nearly uniform positive influence across all translations, indicating that updates are consistently constructive and that there is no type‑II gradient misalignment. ViT, by contrast, concentrates influence on a subset of translations, yielding stronger updates for privileged directions but weaker or absent influence for others. This highlights a trade‑off: strong architectural inductive bias (e.g., convolutional equivariance) enforces uniform coupling but can restrict the space of admissible update directions, potentially slowing convergence; a more flexible architecture can converge faster but may internalize only a partial symmetry structure.
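For the translation-group experiments, the orbit of a periodic field under cyclic shifts can be generated in the same spirit (again a generic sketch under the assumption of periodic boundary conditions):

```python
import numpy as np

def translation_orbit(field, step=1):
    """Cyclic translations of a periodic 2D field along the first axis:
    one orbit element per shift of `step` grid points."""
    n = field.shape[0]
    return [np.roll(field, s * step, axis=0) for s in range(n)]
```

Comparing influence entries across such an orbit is what distinguishes UNet's uniform coupling (a consequence of convolutional translation equivariance) from ViT's concentration on privileged shifts.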
The paper frames these observations in terms of “symmetry‑compatible basins” of the loss landscape. When orbit‑wise gradient coherence is high, the basin respects the physical symmetries, and the model generalizes robustly to symmetry‑transformed inputs. When coherence is low, the basin encodes a symmetry‑breaking geometry, leading to zero‑shot generalization gaps despite high pointwise accuracy on the training distribution.
Overall, the contribution is threefold: (1) an interpretability tool that leverages influence functions and the NTK metric to probe learning dynamics beyond forward‑pass behavior; (2) empirical evidence that symmetry learning can be understood as basin selection governed by gradient coherence; and (3) a practical diagnostic for scientific machine learning that can guide architecture and loss‑design choices toward models that truly internalize the symmetries of the underlying physical operators. The work bridges interpretability, generalization theory, and physics‑informed deep learning, offering a principled method to assess whether surrogate models have genuinely learned the symmetry properties of the solution operators they emulate.