Minimum Distance Summaries for Robust Neural Posterior Estimation
Simulation-based inference (SBI) enables amortized Bayesian inference: a neural posterior estimator (NPE) is first trained on parameter–simulation pairs drawn from the prior, typically through low-dimensional summary statistics, and can then be reused for fast inference by querying it on new test observations. Because the NPE is estimated under the training data distribution, it is susceptible to misspecification when observations deviate from that distribution. Many robust SBI approaches address this by modifying NPE training or introducing error models, which couples robustness to the inference network and compromises amortization and modularity. We introduce minimum-distance summaries, a plug-in robust NPE method that adapts queried test-time summaries independently of the pretrained NPE. Leveraging the maximum mean discrepancy (MMD) as a distance between observed data and a summary-conditional predictive distribution, the adapted summary inherits strong robustness properties from the MMD. We demonstrate that the algorithm can be implemented efficiently with random Fourier feature approximations, yielding a lightweight, model-free test-time adaptation procedure. We provide theoretical guarantees for the robustness of our algorithm and empirically evaluate it on a range of synthetic and real-world tasks, demonstrating substantial robustness gains with minimal additional overhead.
💡 Research Summary
Simulation‑based inference (SBI) has become a cornerstone of modern Bayesian analysis for complex mechanistic simulators, largely because amortized neural posterior estimators (NPEs) can be trained offline on simulated (θ, x) pairs and then queried cheaply at test time. However, NPEs are trained under the prior‑predictive distribution m(x) and consequently are brittle when the true data‑generating process deviates from this distribution—a situation known as model misspecification or simulation gap. Existing robust SBI methods typically intertwine robustness with the inference network: they modify the training objective, require observed data during training, or augment the simulator with explicit error models. This coupling erodes the primary advantage of amortization—reusing a single pretrained NPE across many downstream tasks.
The paper introduces Minimum‑Distance Summaries (MDS), a plug‑in, test‑time adaptation technique that leaves the pretrained NPE qψ(θ | s) untouched and instead adapts the summary statistic s used to query the NPE. The adaptation is driven by minimizing a robust statistical divergence between the empirical distribution of the observed dataset $\tilde{x}_{1:N}$ and a summary‑conditional data distribution $P_{x\mid s}$. The authors choose the Maximum Mean Discrepancy (MMD) with a bounded characteristic kernel as this divergence because MMD is known to possess strong robustness properties under contamination and can be estimated directly from samples without density evaluation.
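To make the sample-based divergence concrete, here is a minimal sketch of an MMD estimate between two sample sets with a Gaussian RBF kernel (a bounded characteristic kernel). The function name `mmd2_biased` and the bandwidth choice are illustrative, not from the paper:

```python
import numpy as np

def mmd2_biased(X, Y, bandwidth=1.0):
    """Biased (V-statistic) estimate of squared MMD between sample sets X and Y
    using a Gaussian RBF kernel. No density evaluations are needed."""
    def gram(A, B):
        # Pairwise squared distances, then RBF kernel values.
        d2 = np.sum(A**2, 1)[:, None] + np.sum(B**2, 1)[None, :] - 2.0 * A @ B.T
        return np.exp(-d2 / (2.0 * bandwidth**2))
    return gram(X, X).mean() + gram(Y, Y).mean() - 2.0 * gram(X, Y).mean()

rng = np.random.default_rng(0)
X = rng.normal(0.0, 1.0, size=(200, 2))   # "observed" samples
Y = rng.normal(0.0, 1.0, size=(200, 2))   # samples from a matching distribution
Z = rng.normal(3.0, 1.0, size=(200, 2))   # samples from a shifted distribution
# Matching distributions give a smaller MMD than mismatched ones:
print(mmd2_biased(X, Y) < mmd2_biased(X, Z))
```

In the MDS setting, the second sample set would come from the summary-conditional distribution $P_{x\mid s}$, and the summary s would be adjusted to shrink this distance.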
To make the MMD computation tractable, the authors employ Random Fourier Features (RFF), which approximate shift‑invariant kernels with a finite‑dimensional linear map z(x). Under this approximation, the MMD reduces to the Euclidean distance between the mean feature embeddings of the two distributions:

$$\widehat{\mathrm{MMD}}\big(\tilde{x}_{1:N},\, P_{x\mid s}\big) \;\approx\; \Big\| \frac{1}{N}\sum_{i=1}^{N} z(\tilde{x}_i) \;-\; \mathbb{E}_{x\sim P_{x\mid s}}\big[z(x)\big] \Big\|_2,$$

where the expectation is in practice estimated from simulator samples, so the objective can be evaluated and minimized with simple vector operations.
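The RFF reduction can be sketched as follows. This is an illustrative implementation of the standard RFF construction for the Gaussian RBF kernel (frequencies drawn from a normal distribution, random phase offsets); the variable names and sampling setup are assumptions, not the paper's code:

```python
import numpy as np

def rff_map(X, W, b):
    """Random Fourier feature map z(x) such that k(x, y) ≈ z(x) @ z(y)
    for the Gaussian RBF kernel whose frequencies generated W."""
    D = W.shape[1]
    return np.sqrt(2.0 / D) * np.cos(X @ W + b)

rng = np.random.default_rng(1)
d, D, bandwidth = 2, 2000, 1.0
W = rng.normal(0.0, 1.0 / bandwidth, size=(d, D))  # RBF kernel frequencies
b = rng.uniform(0.0, 2.0 * np.pi, size=D)          # random phase offsets

X = rng.normal(0.0, 1.0, size=(500, d))  # stand-in for observed data
Y = rng.normal(1.0, 1.0, size=(500, d))  # stand-in for samples from P_{x|s}

# MMD collapses to a Euclidean distance between mean feature embeddings:
mmd_rff = np.linalg.norm(rff_map(X, W, b).mean(0) - rff_map(Y, W, b).mean(0))
```

Because each mean embedding is a single D-dimensional vector, the embedding of the observed data can be computed once and reused across optimization steps over the summary s.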