SIMSHIFT: A Benchmark for Adapting Neural Surrogates to Distribution Shifts
Neural surrogates for Partial Differential Equations (PDEs) often suffer significant performance degradation when evaluated on problem configurations outside their training distribution, such as new initial conditions or structural dimensions. While Unsupervised Domain Adaptation (UDA) techniques have been widely used in vision and language to generalize across domains without additional labeled data, their application to complex engineering simulations remains largely unexplored. In this work, we address this gap through two focused contributions. First, we introduce SIMSHIFT, a novel benchmark dataset and evaluation suite composed of four industrial simulation tasks spanning diverse processes and physics: hot rolling, sheet metal forming, electric motor design and heatsink design. Second, we extend established UDA methods to state-of-the-art neural surrogates and systematically evaluate them. Extensive experiments on SIMSHIFT highlight the challenges of out-of-distribution neural surrogate modeling, demonstrate the potential of UDA in simulation, and reveal open problems in achieving robust neural surrogates under distribution shifts in industrially relevant scenarios. Our codebase is available at https://github.com/psetinek/simshift
💡 Research Summary
The paper introduces SIMSHIFT, a comprehensive benchmark designed to evaluate the robustness of neural surrogate models for partial differential equations (PDEs) under distribution shifts that commonly occur in industrial engineering simulations. The authors identify a gap in the literature: while unsupervised domain adaptation (UDA) has been extensively studied in computer vision and natural language processing, its application to physics‑based simulations—especially those involving unstructured meshes and high‑dimensional output fields—remains largely unexplored. To address this, the work makes two principal contributions.
First, the authors curate four realistic industrial datasets: hot rolling, sheet metal forming, electric motor design, and heatsink design. These datasets are generated using a mix of commercial (Abaqus) and open‑source (HOTINT, OpenFOAM) solvers, covering both 2‑D and 3‑D problems, with thousands of mesh nodes and hundreds of thousands of samples. Each dataset includes a clear parametric description of the input space (e.g., slab thickness, roll gap, friction coefficient, fin geometry) and a set of output fields (plastic strain, stress components, temperature, velocity, pressure). The authors partition the input parameter space into non‑overlapping source and target domains and define three levels of shift difficulty—easy, medium, hard—by moving along the dominant input parameter. To quantify the true difficulty of each shift, they compute a Proxy A‑Distance (PAD) directly in the output‑field space, providing a more faithful measure of distributional divergence than raw input distances.
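The Proxy A-Distance mentioned above is typically estimated by training a classifier to distinguish source from target samples and converting its held-out error ε into PAD = 2(1 − 2ε): indistinguishable domains give PAD ≈ 0, perfectly separable ones give PAD ≈ 2. The snippet below is a minimal NumPy sketch, not the authors' implementation; it uses a nearest-centroid domain classifier as a lightweight stand-in for the linear classifier usually employed, and the function name `proxy_a_distance` is assumed for illustration.

```python
import numpy as np

def proxy_a_distance(source_feats, target_feats, seed=0):
    """Estimate PAD = 2 * (1 - 2 * eps) between two sample sets.

    A simple domain classifier (nearest centroid here) is trained to
    separate source (label 0) from target (label 1) samples; eps is its
    error on a held-out split.
    """
    rng = np.random.default_rng(seed)
    X = np.vstack([source_feats, target_feats])
    y = np.concatenate([np.zeros(len(source_feats)), np.ones(len(target_feats))])
    # Shuffle, then split half/half into train and held-out sets.
    idx = rng.permutation(len(X))
    X, y = X[idx], y[idx]
    split = len(X) // 2
    X_tr, y_tr, X_te, y_te = X[:split], y[:split], X[split:], y[split:]
    # Nearest-centroid domain classifier.
    c0 = X_tr[y_tr == 0].mean(axis=0)
    c1 = X_tr[y_tr == 1].mean(axis=0)
    pred = (np.linalg.norm(X_te - c1, axis=1)
            < np.linalg.norm(X_te - c0, axis=1)).astype(float)
    eps = np.mean(pred != y_te)
    return 2.0 * (1.0 - 2.0 * eps)
```

On clearly shifted distributions the classifier separates the domains and PAD approaches 2; on identical distributions its error hovers near 0.5 and PAD stays near 0.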
Second, the paper extends several state‑of‑the‑art UDA techniques to neural operator architectures (e.g., Fourier Neural Operator, Graph Neural Operator) that share a conditioning network ϕ. The training objective combines a reconstruction loss on source labels (L_recon) with a weighted domain‑alignment loss (L_DA) that encourages the latent representations of source and target inputs to match. The authors evaluate three families of UDA methods: importance weighting, statistical discrepancy minimization (MMD, CORAL), and adversarial domain alignment (DANN). Because target labels are unavailable, they also investigate unsupervised model‑selection strategies, including entropy minimization on target inputs, pseudo‑label cross‑validation, and the simple source‑validation loss. Their experiments reveal that the choice of model‑selection strategy can shift target performance by more than 10 % absolute error, underscoring its critical role.
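To make the combined objective concrete, the sketch below implements one of the discrepancy-based alignment terms (Deep-CORAL-style covariance matching) in plain NumPy. This is a hypothetical minimal version, not the SIMSHIFT codebase: the names `coral_loss`, `total_loss`, and the trade-off weight `lam` are assumptions, and the reconstruction term is a simple MSE on source predictions.

```python
import numpy as np

def coral_loss(h_src, h_tgt):
    """CORAL alignment penalty: squared Frobenius distance between the
    source and target feature covariances, normalized by 4 * d^2."""
    d = h_src.shape[1]

    def cov(h):
        hc = h - h.mean(axis=0, keepdims=True)
        return hc.T @ hc / (len(h) - 1)

    diff = cov(h_src) - cov(h_tgt)
    return np.sum(diff ** 2) / (4.0 * d * d)

def total_loss(pred_src, y_src, h_src, h_tgt, lam=1.0):
    """L = L_recon(source) + lam * L_DA, with MSE reconstruction and
    CORAL as the domain-alignment term (lam is a hypothetical weight)."""
    l_recon = np.mean((pred_src - y_src) ** 2)
    return l_recon + lam * coral_loss(h_src, h_tgt)
```

Here `h_src` and `h_tgt` stand for latent features of source and target batches; only the source batch contributes labels, so the alignment term is the sole gradient signal coming from the target domain. Swapping `coral_loss` for an MMD estimate or a DANN discriminator yields the other two method families.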
Experimental results show that vanilla neural surrogates experience modest error growth (≈5–7 %) on easy shifts but suffer severe degradation (≥30 % relative error) on hard shifts. Applying UDA consistently reduces target error, with the most pronounced gains (12–15 % relative improvement) on the hardest shifts. Nevertheless, absolute errors often remain above industrial tolerances, indicating that current UDA methods are not yet sufficient for high‑stakes engineering design. The study also demonstrates that naïvely selecting the checkpoint with lowest source loss leads to poor target performance, whereas unsupervised validation metrics yield more reliable selections.
Finally, the SIMSHIFT framework is released as an open‑source, modular benchmarking suite. It allows researchers to plug in new simulators, novel domain‑adaptation algorithms, and alternative model‑selection criteria with minimal effort. By providing both the datasets (hosted on Hugging Face) and the evaluation code, the authors aim to catalyze systematic research on robust neural surrogates for engineering applications, bridging the gap between machine‑learning advances and real‑world industrial deployment.