Direct Learning of Calibration-Aware Uncertainty for Neural PDE Surrogates
Neural PDE surrogates are often deployed in data-limited or partially observed regimes where downstream decisions depend on calibrated uncertainty in addition to low prediction error. Existing approaches obtain uncertainty through ensemble replication, fixed stochastic noise such as dropout, or post hoc calibration. We propose cross-regularized uncertainty, which learns uncertainty parameters during training using gradients routed through a held-out regularization split. The predictor is optimized on the training split for fit, while low-dimensional uncertainty controls are optimized on the regularization split to reduce train-test mismatch, yielding regime-adaptive uncertainty without per-regime noise tuning. The framework can learn continuous noise levels at the output head, within hidden features, or within operator-specific components such as spectral modes. We instantiate the approach in Fourier Neural Operators and evaluate on APEBench sweeps over observed fraction and training-set size. Across these sweeps, the learned predictive distributions are better calibrated on held-out splits, and the resulting uncertainty fields concentrate in high-error regions in one-step spatial diagnostics.
💡 Research Summary
The paper tackles a critical shortcoming of current uncertainty quantification methods for neural PDE surrogates: they either rely on fixed stochastic mechanisms (e.g., dropout, ensembles) or on post‑hoc calibration, both of which struggle to adapt to regime‑specific data scarcity and observation sparsity. To address this, the authors introduce a Cross‑Regularized (XReg) learning framework that treats uncertainty parameters as first‑class learnable variables, updating them directly from gradients computed on a held‑out regularization set.
The model is decomposed into three groups of parameters: deterministic backbone weights θ, predictive‑noise parameters ψ (which capture aleatoric residual variance needed for fitting), and generalization‑noise parameters ρ (which act as an epistemic regularizer). Training proceeds in two alternating loops. In the “train” loop, (θ, ψ) are optimized on the main training data D_train using a Monte‑Carlo marginal likelihood loss L_train; ρ is frozen. In the “regularization” loop, the current (θ, ψ) are kept fixed while ρ is updated on a separate regularization set D_reg using a loss L_reg that measures the mismatch between the model’s predictive distribution and the held‑out targets. Two forms of L_reg are explored: a mixture negative‑log‑likelihood (averaging over S Monte‑Carlo samples) and a moment‑matched NLL that collapses the mixture to a single Gaussian. This asymmetry forces ρ to grow when the train‑reg gap is large (e.g., under high masking or low sample count) and shrink when the model already generalizes well.
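The alternating scheme can be illustrated in a deliberately tiny setting. The sketch below is ours, not the authors' implementation: it uses a scalar constant predictor θ with total predictive variance exp(2ψ) + exp(2ρ), hand-derived Gaussian NLL gradients, and plain gradient descent, and it omits the Monte Carlo sampling over injected noise. All function and variable names are illustrative. What it preserves is the gradient routing: (θ, ψ) see only D_train, ρ sees only D_reg, every k_reg-th step.

```python
import math
import random

def train_xreg(train_y, reg_y, steps=2000, k_reg=5, lr=0.05):
    """Toy cross-regularized training: alternate (theta, psi) updates on
    D_train with rho updates on D_reg, using analytic Gaussian NLL gradients."""
    theta = 0.0   # backbone "weights": here just a constant predictor
    psi = -1.0    # predictive (aleatoric) log-scale, fit on D_train
    rho = -2.0    # generalization (epistemic) log-scale, fit on D_reg
    for step in range(steps):
        v = math.exp(2 * psi) + math.exp(2 * rho)  # total predictive variance
        if step % k_reg == 0:
            # regularization loop: (theta, psi) frozen, update rho on D_reg.
            # d NLL / d v = 0.5/v - (y - theta)^2 / (2 v^2);  dv/drho = 2 e^{2 rho}
            g_v = sum(0.5 / v - (y - theta) ** 2 / (2 * v * v) for y in reg_y) / len(reg_y)
            rho -= lr * g_v * 2 * math.exp(2 * rho)
        else:
            # train loop: rho frozen, update (theta, psi) on D_train
            g_theta = sum(-(y - theta) / v for y in train_y) / len(train_y)
            g_v = sum(0.5 / v - (y - theta) ** 2 / (2 * v * v) for y in train_y) / len(train_y)
            theta -= lr * g_theta
            psi -= lr * g_v * 2 * math.exp(2 * psi)
    return theta, psi, rho

rng = random.Random(0)
train_y = [rng.gauss(0.0, 0.1) for _ in range(64)]  # clean training regime
reg_y = [rng.gauss(0.5, 0.3) for _ in range(64)]    # shifted held-out regime
theta, psi, rho = train_xreg(train_y, reg_y)
```

With a deliberate train-reg mismatch as above, ψ shrinks toward the small training-residual scale while ρ grows until the total variance covers the held-out residuals, reproducing the grow/shrink behavior described for the train-reg gap.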
Uncertainty injection can be placed at various points in the architecture. The authors instantiate three variants within a Fourier Neural Operator (FNO) backbone: (1) output‑head scaling where σ_pred and σ_gen are learned directly, (2) internal‑feature scaling where multiplicative Gaussian perturbations are applied to selected latent blocks, and (3) spectral‑mode scaling where selected Fourier channels receive noise. This flexibility avoids committing to a Bayesian weight posterior while still allowing a clear operational split between predictive (aleatoric) and generalization (epistemic) uncertainty.
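The internal-feature variant can be made concrete with a small snippet. This is a generic sketch of multiplicative Gaussian feature perturbation, h_i → h_i · (1 + σ ε_i) with σ = exp(ρ_l), not the authors' FNO code; the names are ours.

```python
import math
import random

def inject_feature_noise(h, log_sigma, rng):
    """Multiplicative Gaussian perturbation of a latent feature vector:
    h_i -> h_i * (1 + sigma * eps_i), eps_i ~ N(0, 1), sigma = exp(log_sigma)."""
    sigma = math.exp(log_sigma)
    return [x * (1.0 + sigma * rng.gauss(0.0, 1.0)) for x in h]

rng = random.Random(0)
h = [1.0, -2.0, 0.5, 3.0]  # a toy latent feature vector
samples = [inject_feature_noise(h, log_sigma=-1.0, rng=rng) for _ in range(4000)]

# The perturbation is unbiased (the sample mean recovers h) and its
# noise scale grows with |h_i|, so large activations get wider bands.
mean0 = sum(s[0] for s in samples) / len(samples)
mu3 = sum(s[3] for s in samples) / len(samples)
std3 = math.sqrt(sum((s[3] - mu3) ** 2 for s in samples) / len(samples))
```

The same multiplicative form applies in the spectral-mode variant, with the perturbation acting on selected Fourier coefficients instead of spatial features.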
Empirical evaluation is performed on the APEBench benchmark, sweeping both the observed‑fraction (0.4, 0.6, 0.8, 1.0) and the number of training trajectories. Metrics include one‑step negative log‑likelihood (NLL), mixture Expected Calibration Error (ECE), and spatial diagnostics that compare per‑pixel error to predicted uncertainty. Across all regimes, XReg consistently achieves lower ECE than MC dropout (fixed p = 0.1) and a 3‑member deep ensemble, while maintaining comparable or better NLL. The calibration advantage is most pronounced at the lowest observed fraction (0.4), confirming that the regularization‑driven ρ adapts to data scarcity. Visualizations show that high‑error regions are highlighted by elevated uncertainty fields, demonstrating spatially selective uncertainty allocation.
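The summary does not spell out the mixture-ECE computation, but a common regression-calibration diagnostic compares nominal central-interval coverage with empirical coverage. The sketch below implements that generic diagnostic for a single Gaussian predictive distribution, which is our simplification, not necessarily the paper's exact metric.

```python
import random
from statistics import NormalDist

def regression_ece(targets, means, stds,
                   levels=(0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9)):
    """Average |empirical - nominal| coverage over central prediction intervals."""
    nd = NormalDist()
    gaps = []
    for p in levels:
        # half-width of the central p-interval, in predicted-std units
        z = nd.inv_cdf(0.5 + p / 2.0)
        covered = sum(
            abs(y - m) <= z * s for y, m, s in zip(targets, means, stds)
        ) / len(targets)
        gaps.append(abs(covered - p))
    return sum(gaps) / len(gaps)

rng = random.Random(1)
means = [0.0] * 4000
true_std = 1.0
targets = [rng.gauss(0.0, true_std) for _ in means]
# well-calibrated predictions vs. overconfident ones (stds halved)
ece_calibrated = regression_ece(targets, means, [true_std] * len(means))
ece_overconfident = regression_ece(targets, means, [0.5 * true_std] * len(means))
```

A calibrated model keeps the gap near zero at every level, while an overconfident one under-covers at all levels, which is the failure mode that fixed-noise baselines exhibit under distribution shift.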
A further case study uses an Optimal Transport Neural Operator (OTNO) to predict pressure on a 3‑D car surface. With an extreme data‑limited split (30 training samples, 50 regularization samples), XReg again produces uncertainty bands that widen precisely around regions of large prediction error, and the learned internal‑layer ρ exhibits non‑uniform growth across layers, indicating that the model automatically discovers where regularization is most needed.
Computational overhead is modest: regularization updates are performed every k_reg steps (k_reg = 5 in the main experiments), leading to roughly a 20 % increase in training time compared to standard training. Ablation studies confirm that the benefit stems from the gradient routing rather than simply adding extra parameters.
In summary, the work proposes a novel paradigm where uncertainty is not a post‑hoc artifact but an integral part of the training objective, driven by a held‑out regularization loss. By learning both predictive and generalization noise scales directly from the train‑reg mismatch, the method yields regime‑adaptive, well‑calibrated uncertainty estimates for neural PDE surrogates, making them more reliable for downstream decision‑making under limited data conditions.