Data-driven Sequential Monte Carlo in Probabilistic Programming
Most Markov chain Monte Carlo (MCMC) and sequential Monte Carlo (SMC) algorithms in existing probabilistic programming systems suboptimally use only the model prior as the proposal distribution. In this work, we describe an approach for training a discriminative model, namely a neural network, to approximate the optimal proposal by using posterior estimates from previous runs of inference. We show an example that incorporates a data-driven proposal into a non-parametric model in the Anglican probabilistic programming system. Our results show that data-driven proposals can significantly improve inference performance, so that considerably fewer particles are needed to obtain a good posterior estimate.
💡 Research Summary
The paper addresses a fundamental inefficiency in many probabilistic programming systems: both Markov Chain Monte Carlo (MCMC) and Sequential Monte Carlo (SMC) algorithms typically use the model’s prior distribution as the proposal distribution. When the prior is far from the true posterior, especially in high‑dimensional or non‑parametric models, particles quickly degenerate, forcing the use of a very large number of particles to obtain accurate posterior estimates. To overcome this limitation, the authors propose a “data‑driven proposal” framework that learns a discriminative model—specifically a neural network—to approximate an optimal proposal distribution by re‑using posterior information from previous inference runs.
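The degeneracy problem can be made concrete with the effective sample size (ESS) of the importance weights. The following self-contained Python sketch (not from the paper; the model, observation value, and proposal parameters are all illustrative) compares a prior proposal against a proposal placed near the posterior for a simple 1D Gaussian model:

```python
import math
import random

def ess(log_weights):
    """Effective sample size from unnormalized log importance weights."""
    m = max(log_weights)
    w = [math.exp(lw - m) for lw in log_weights]
    return sum(w) ** 2 / sum(wi * wi for wi in w)

def log_norm(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

random.seed(0)
N = 10_000
y, obs_sigma = 5.0, 0.5        # observation far out in the tail of the prior N(0, 1)

# Proposal = prior N(0, 1): importance weight is just the likelihood.
prior_samples = [random.gauss(0.0, 1.0) for _ in range(N)]
ess_prior = ess([log_norm(y, x, obs_sigma) for x in prior_samples])

# Proposal placed near the true posterior (here N(4.0, 0.45), chosen by hand
# to mimic a learned proposal): weight = prior * likelihood / proposal.
prop_mu, prop_sigma = 4.0, 0.45
learned_samples = [random.gauss(prop_mu, prop_sigma) for _ in range(N)]
ess_learned = ess([
    log_norm(x, 0.0, 1.0) + log_norm(y, x, obs_sigma)
    - log_norm(x, prop_mu, prop_sigma)
    for x in learned_samples
])

print(ess_prior, ess_learned)  # the prior proposal wastes almost all of its particles
```

With the prior proposal nearly all weight concentrates on a handful of lucky samples, while the posterior-shaped proposal keeps the weights nearly uniform, which is exactly the gap the data-driven proposal is meant to close.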
The method proceeds in four stages. First, an initial SMC run is performed using the standard prior‑based proposal; the resulting particles and their importance weights constitute a rough approximation of the posterior. Second, the (state, weight) pairs are transformed into a supervised training set where the network’s input is the current state (including observed data) and the target is the normalized importance weight, effectively representing the posterior density at that point. Third, a neural network (implemented as a multilayer perceptron in the experiments) is trained to minimize the Kullback‑Leibler divergence between the true posterior (as approximated by the weighted particles) and the network’s output distribution. This training can be performed offline after the first run, and the cost is amortized over subsequent inference tasks. Fourth, the trained network is used as the proposal distribution for new SMC runs: particles are drawn from the network’s output rather than the prior, and then re‑weighted in the usual way. Because the proposal is already biased toward high‑posterior‑probability regions, particle degeneracy is dramatically reduced, and far fewer particles are needed to achieve the same level of accuracy.
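The four stages above can be sketched end to end. The paper trains a multilayer perceptron; in the minimal sketch below (entirely illustrative, with a hand-picked 1D Gaussian model) the proposal family is instead a single Gaussian, for which minimizing the KL divergence against the weighted-particle approximation reduces to weighted moment matching:

```python
import math
import random

random.seed(1)

def log_norm(x, mu, sigma):
    """Log density of N(mu, sigma^2) at x."""
    return -0.5 * ((x - mu) / sigma) ** 2 - math.log(sigma * math.sqrt(2 * math.pi))

# Stage 1: an initial importance-sampling run with the prior N(0, 1) as proposal.
y, obs_sigma = 2.0, 0.5
particles = [random.gauss(0.0, 1.0) for _ in range(20_000)]
log_w = [log_norm(y, x, obs_sigma) for x in particles]

# Stage 2: normalize the weights, giving (particle, weight) training pairs.
m = max(log_w)
w = [math.exp(lw - m) for lw in log_w]
total = sum(w)
w = [wi / total for wi in w]

# Stage 3: fit the proposal. For a Gaussian family, minimizing
# KL(posterior || q) is weighted moment matching (the paper uses an MLP here).
mu_q = sum(wi * xi for wi, xi in zip(w, particles))
sigma_q = math.sqrt(sum(wi * (xi - mu_q) ** 2 for wi, xi in zip(w, particles)))

# Stage 4: reuse q as the proposal in the next run and re-weight as usual.
new_particles = [random.gauss(mu_q, sigma_q) for _ in range(1_000)]
new_log_w = [
    log_norm(x, 0.0, 1.0) + log_norm(y, x, obs_sigma)
    - log_norm(x, mu_q, sigma_q)
    for x in new_particles
]
```

For this conjugate toy model the fitted moments land near the analytic posterior (mean 1.6, standard deviation about 0.45), so the stage-4 weights are close to uniform and far fewer particles suffice.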
The authors integrate this framework into Anglican, a functional probabilistic programming language that previously offered only prior‑based proposals. They demonstrate the approach on a non‑parametric Bayesian model: a Dirichlet Process Mixture Model (DPMM) for clustering. DPMMs are particularly challenging for prior‑based proposals because the number of clusters is unbounded and the posterior can be highly multimodal. In the experimental evaluation, the authors compare three particle budgets (500, 1000, 2000) using both the standard prior proposal and the learned data‑driven proposal. The results show that with the data‑driven proposal, a 500‑particle run achieves log‑likelihoods comparable to a 2000‑particle run with the prior proposal, and clustering accuracy improves by 5–7 %. Moreover, the overhead of training the neural network is modest: a single initial SMC run plus a brief training phase accounts for less than 5 % of total runtime, and subsequent runs incur virtually no extra cost.
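To see why the prior is a poor proposal for a DPMM, it helps to look at the Chinese Restaurant Process (CRP) that governs cluster assignments under the Dirichlet process prior. The sketch below is illustrative only (the paper's implementation lives in Anglican, a Clojure-based system): assignments drawn from the prior ignore the data entirely, and the number of clusters is unbounded, growing roughly as alpha * log(n):

```python
import random

def crp_assignments(n, alpha, rng):
    """Sample cluster assignments for n points from a CRP prior with concentration alpha."""
    counts = []        # counts[k] = number of points already in cluster k
    assignments = []
    for i in range(n):
        # New cluster with probability alpha / (i + alpha),
        # existing cluster k with probability counts[k] / (i + alpha).
        r = rng.random() * (i + alpha)
        if r < alpha:
            counts.append(1)
            assignments.append(len(counts) - 1)
        else:
            r -= alpha
            for k, c in enumerate(counts):
                if r < c:
                    counts[k] += 1
                    assignments.append(k)
                    break
                r -= c
            else:
                # Guard against floating-point edge cases: fall back to the last cluster.
                counts[-1] += 1
                assignments.append(len(counts) - 1)
    return assignments

rng = random.Random(2)
z = crp_assignments(100, alpha=1.0, rng=rng)
print(len(set(z)))  # expected number of clusters is roughly alpha * log(n)
```

Because the prior proposes assignments without looking at the observations, most proposed partitions receive negligible likelihood weight, which is exactly the degeneracy the learned, data-driven proposal mitigates.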
The paper also discusses limitations and future directions. If the neural network does not generalize well—e.g., because the training data does not cover the full posterior support—the learned proposal may become biased, potentially degrading inference quality. Regularization, early stopping, and diverse training data are therefore essential. The current implementation uses a feed‑forward network, which may be insufficient for models with temporal or relational structure; extending the framework to recurrent or graph neural networks is a natural next step. Additionally, the authors suggest exploring online learning schemes where the proposal is continuously updated as new data arrive, enabling truly adaptive inference in streaming settings.
In summary, this work introduces a practical, reusable mechanism for improving SMC inference in probabilistic programming by learning proposal distributions from previous inference runs. By leveraging neural networks to approximate the posterior, the approach reduces particle degeneracy, cuts computational cost, and broadens the applicability of probabilistic programming to complex, non‑parametric models. The integration with Anglican demonstrates that the method can be adopted with minimal changes to existing code bases, and the empirical results substantiate the claim that data‑driven proposals can dramatically enhance inference efficiency. Future research will likely focus on scaling the technique to larger models, richer neural architectures, and online learning scenarios.