Entropic Mirror Monte Carlo


Importance sampling is a Monte Carlo method which designs estimators of expectations under a target distribution using weighted samples from a proposal distribution. When the target distribution is complex, such as multimodal distributions in high-dimensional spaces, the efficiency of importance sampling critically depends on the choice of the proposal distribution. In this paper, we propose a novel adaptive scheme for the construction of efficient proposal distributions. Our algorithm promotes efficient exploration of the target distribution by combining global sampling mechanisms with a delayed weighting procedure. The proposed weighting mechanism plays a key role by enabling rapid resampling in regions where the proposal distribution is poorly adapted to the target. Our sampling algorithm is shown to be geometrically convergent under mild assumptions and is illustrated through various numerical experiments.


💡 Research Summary

The paper introduces Entropic Mirror Monte Carlo (EM2C), a novel adaptive importance‑sampling framework designed to efficiently approximate expectations under a complex target distribution π, especially when π is multimodal and high‑dimensional. Classical importance sampling suffers when the proposal µ poorly overlaps π, leading to exponential growth of the mean‑squared error with KL or Rényi divergences. Existing adaptive schemes (e.g., Cappé, Cornuet) improve the proposal iteratively but still struggle to scale with dimension and to explore distant modes.

EM2C builds on two ideas. First, Entropic Mirror Descent (EMD) defines a deterministic mapping
Fₑ(µ) ∝ (dπ/dµ)^ε · µ,
which contracts the KL divergence by a factor ρ = 1 − ε (ε∈(0,1]). This guarantees geometric convergence of KL(π‖µ_t) to zero if the mapping could be applied exactly. However, Fₑ is intractable: Monte‑Carlo approximations rely on weighted samples from µ, and when µ fails to cover some high‑probability regions of π, the approximation may miss those regions entirely, breaking convergence.
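On a finite state space the mirror map has the closed form Fₑ(µ) ∝ π^ε µ^(1−ε), so the contraction can be verified numerically. A minimal sketch in Python/NumPy (not the paper's code; the 10-state target and ε = 0.5 are illustrative choices):

```python
import numpy as np

def kl(p, q):
    """KL divergence KL(p || q) between discrete distributions."""
    return float(np.sum(p * np.log(p / q)))

def emd_step(pi, mu, eps):
    """Exact entropic mirror step: F_eps(mu) ∝ (pi/mu)^eps * mu = pi^eps * mu^(1-eps)."""
    unnorm = pi**eps * mu**(1.0 - eps)
    return unnorm / unnorm.sum()

rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(10))   # target on 10 states
mu = rng.dirichlet(np.ones(10))   # initial proposal
eps = 0.5
for _ in range(5):
    mu_next = emd_step(pi, mu, eps)
    # geometric contraction: KL(pi || mu_{t+1}) <= (1 - eps) * KL(pi || mu_t)
    assert kl(pi, mu_next) <= (1.0 - eps) * kl(pi, mu) + 1e-12
    mu = mu_next
```

The contraction follows from KL(π‖Fₑ(µ)) = (1 − ε) KL(π‖µ) + log Z with normalizer Z ≤ 1 by Hölder's inequality, which is exactly what the assertion checks.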

To remedy this, the authors introduce a Markov transition kernel Kπ that can move particles into unexplored regions. They define a mixed update
Fₑₘ(µ; λ, Kπ, ε) = λ Fₑ(µ) + (1 − λ) F_{Kπ}(µ),
where F_{Kπ}(µ) ∝ (dπ/dµ)^ε · µKπ and λ∈(0,1] balances contraction (first term) against exploration (second term). Lemma 1 shows the mixed mapping is a valid probability measure under mild absolute‑continuity assumptions, and Proposition 1 proves that if Kπ is π‑invariant, the KL contraction property is preserved: KL(π‖µ_{t+1}) ≤ (1 − ε) KL(π‖µ_t). Even when ε = 1, an appropriate λ‑schedule yields the same geometric decay.
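Proposition 1 can likewise be checked on a finite chain. The sketch below is one natural reading of the mixed update (applying the mirror step to both µ and µK before λ-mixing), using a random-walk Metropolis kernel, which is π-invariant by detailed balance; none of this is the paper's code:

```python
import numpy as np

def kl(p, q):
    return float(np.sum(p * np.log(p / q)))

def mirror(pi, nu, eps):
    u = pi**eps * nu**(1.0 - eps)
    return u / u.sum()

def metropolis_kernel(pi):
    """Random-walk Metropolis on {0,...,n-1}; pi-invariant by detailed balance."""
    n = len(pi)
    K = np.zeros((n, n))
    for x in range(n):
        for y in (x - 1, x + 1):
            if 0 <= y < n:
                K[x, y] = 0.5 * min(1.0, pi[y] / pi[x])
        K[x, x] = 1.0 - K[x].sum()
    return K

rng = np.random.default_rng(1)
pi = rng.dirichlet(np.ones(8))
mu = rng.dirichlet(np.ones(8))
K = metropolis_kernel(pi)
lam, eps = 0.5, 0.5
for _ in range(5):
    mu_next = lam * mirror(pi, mu, eps) + (1.0 - lam) * mirror(pi, mu @ K, eps)
    # data processing gives KL(pi || mu K) <= KL(pi || mu) when pi K = pi,
    # so the lambda-mixture inherits the (1 - eps) contraction
    assert kl(pi, mu_next) <= (1.0 - eps) * kl(pi, mu) + 1e-12
    mu = mu_next
```

The assertion combines convexity of KL in its second argument with the data-processing inequality, mirroring the argument sketched for Proposition 1.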

In practice the authors employ the Unadjusted Langevin Algorithm (ULA) as Kπ, which is not π‑invariant but possesses a unique invariant distribution π_γ that is close to π when the step size γ is small. Under standard log‑Sobolev (LSI) and L‑Lipschitz smoothness assumptions on log π, Theorem 2 establishes a total‑variation bound
‖π − µ_t‖_{TV} ≤ (1 − ε)^{t/2} · C₁ + C₂ √γ,
showing that the bias introduced by ULA is controlled and does not destroy the geometric contraction induced by the entropic mirror component.
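The O(√γ) bias is easy to see for a standard Gaussian target, where ∇log π(x) = −x and the ULA chain is an AR(1) process whose invariant variance is 1/(1 − γ/2) rather than 1. A quick illustrative check (the target and step size are assumptions, not the paper's experiments):

```python
import numpy as np

rng = np.random.default_rng(2)
gamma = 0.1
n_steps = 200_000
noise = rng.standard_normal(n_steps)
step = np.sqrt(2.0 * gamma)

# ULA for the standard Gaussian target: grad log pi(x) = -x,
# so each update is x <- (1 - gamma) * x + sqrt(2 * gamma) * xi
x = 0.0
samples = np.empty(n_steps)
for k in range(n_steps):
    x = (1.0 - gamma) * x + step * noise[k]
    samples[k] = x

# The chain's invariant law pi_gamma is N(0, 1 / (1 - gamma / 2)):
# close to the target N(0, 1), with an O(gamma) variance inflation.
print(samples[1000:].var())   # close to 1 / (1 - 0.05) ≈ 1.053, not 1.0
```

Shrinking γ shrinks the bias but slows mixing, which is the trade-off the C₂√γ term in Theorem 2 quantifies.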

Algorithm 1 (EM2C) implements the mixed update via particles. At iteration t, N samples X_i are drawn from the current proposal µ̃_t. Each X_i is propagated through the Markov kernel K to obtain Y_i. Importance weights are computed as ω_i ∝ (dπ/dµ̃_t)(X_i)^ε and ϖ_i ∝ (dπ/dµ̃_t)(Y_i)^ε. A resampling step draws new particles Z_i from the mixture λ_t ∑_j ω_j δ_{X_j} + (1 − λ_t) ∑_j ϖ_j δ_{Y_j}. Finally, the algorithm fits a parametric family {µ_θ} to the empirical distribution of the Z_i by minimizing KL(·‖µ_θ), yielding the updated proposal µ̃_{t+1}. The schedule λ_t can be chosen to emphasize exploration early (small λ) and contraction later (λ → 1).
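The loop above can be sketched compactly in Python with a Gaussian proposal family, for which minimizing the forward KL reduces to moment matching. The 1-D target N(3, 1), ULA kernel, constant λ, and all tuning values are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)

def log_pi(x):
    # hypothetical 1-D target N(3, 1), chosen only for illustration
    return -0.5 * (x - 3.0) ** 2

def grad_log_pi(x):
    return -(x - 3.0)

def em2c(n_iter=10, N=2000, eps=0.5, lam=0.5, gamma=0.1):
    m, s = 0.0, 1.0                                  # Gaussian proposal parameters
    for _ in range(n_iter):
        X = m + s * rng.standard_normal(N)           # draw from current proposal
        # one ULA move as the exploratory kernel K
        Y = X + gamma * grad_log_pi(X) + np.sqrt(2.0 * gamma) * rng.standard_normal(N)

        def weights(Z):
            # self-normalized tempered weights (dpi/dmu_t)(Z)^eps
            log_mu = -0.5 * ((Z - m) / s) ** 2 - np.log(s)
            lw = eps * (log_pi(Z) - log_mu)
            w = np.exp(lw - lw.max())
            return w / w.sum()

        w, v = weights(X), weights(Y)
        # resample from the lambda-mixture of the two weighted particle clouds
        pick_X = rng.random(N) < lam
        Z = np.where(pick_X,
                     rng.choice(X, size=N, p=w),
                     rng.choice(Y, size=N, p=v))
        m, s = Z.mean(), Z.std()                     # moment matching = min-KL Gaussian fit
    return m, s

m, s = em2c()   # proposal should concentrate near the target N(3, 1)
```

Even when the initial proposal N(0, 1) barely overlaps the target, the tempered weights and ULA moves pull the particle cloud toward the target's mode over the iterations.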

Theoretical contributions include: (i) a proof that the mixed mapping retains KL contraction under any π‑invariant kernel; (ii) a quantitative analysis of the bias when using a non‑invariant kernel (ULA) under LSI and smoothness conditions; (iii) a convergence guarantee in total variation that combines geometric decay with a controllable discretization error.

Empirical evaluation covers three settings: (a) a 2-D bimodal Gaussian mixture where the initial proposal captures only one mode, which EM2C rapidly discovers via the Langevin moves, converging to the correct mixture weights; (b) a 20-D multimodal Bernoulli–Gaussian mixture, where EM2C outperforms adaptive importance sampling (Cappé) and plain ULA in effective sample size (ESS) and mean-squared error; (c) a Bayesian logistic regression on a real dataset, demonstrating posterior estimates comparable to Hamiltonian Monte Carlo at roughly 30% lower computational cost.

Overall, EM2C offers a principled way to combine the strong contraction properties of entropic mirror descent with the global exploration capability of Markov kernels. It provides geometric convergence guarantees even when the exploratory kernel is biased, and its particle‑based implementation is compatible with both parametric and non‑parametric proposal families. The method is thus a promising tool for high‑dimensional, multimodal Bayesian inference, rare‑event simulation, and other contexts where traditional importance sampling or MCMC struggle.

