Simulated Annealing: Rigorous finite-time guarantees for optimization on continuous domains

Simulated annealing is a popular method for approaching the solution of a global optimization problem. Existing results on its performance apply to discrete combinatorial optimization where the optimization variables can assume only a finite set of possible values. We introduce a new general formulation of simulated annealing which allows one to guarantee finite-time performance in the optimization of functions of continuous variables. The results hold universally for any optimization problem on a bounded domain and establish a connection between simulated annealing and up-to-date theory of convergence of Markov chain Monte Carlo methods on continuous domains. This work is inspired by the concept of finite-time learning with known accuracy and confidence developed in statistical learning theory.


💡 Research Summary

The paper presents a rigorous finite‑time analysis of simulated annealing (SA) when applied to continuous‑domain global optimization problems. While classical SA theory provides asymptotic convergence guarantees for discrete combinatorial spaces, no comparable results existed for continuous spaces, where the state space is uncountably infinite and the underlying Markov chain is a diffusion‑type process. The authors bridge this gap by formulating SA as a Metropolis–Hastings (MH) Markov chain on a bounded domain Ω ⊂ ℝⁿ and by explicitly linking the annealing schedule to modern mixing‑time results for continuous‑state Markov chains.

Problem setting and assumptions
The objective is to minimize a bounded, Lipschitz‑continuous function f : Ω → ℝ, where Ω is a compact hyper‑rectangle. The Lipschitz constant L and the global bound M = sup_{x∈Ω} f(x) are assumed known (or estimable). These regularity conditions guarantee that the Boltzmann distribution π_T(x) ∝ exp(−f(x)/T) is log‑concave up to a controllable perturbation for any temperature T > 0.
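The role of the temperature in the Boltzmann distribution can be made concrete with a minimal sketch. The objective `f` below is an illustrative placeholder (not from the paper), chosen to be bounded and Lipschitz on a compact domain; the point is that lowering T concentrates the mass of π_T near the global minimizer.

```python
import numpy as np

# Illustrative objective on the compact domain [-2, 2]^2: bounded and
# Lipschitz-continuous there (this specific f is not from the paper).
def f(x):
    return float(np.sum(x**2) + np.sin(3.0 * x[0]))

def boltzmann_weight(x, T):
    """Unnormalized Boltzmann density: pi_T(x) proportional to exp(-f(x)/T)."""
    return np.exp(-f(x) / T)

# As T decreases, the weight ratio between a near-optimal point and a
# suboptimal point grows, i.e. pi_T concentrates near the minimizer.
x_near, x_far = np.zeros(2), np.array([1.5, 1.5])
ratios = {T: boltzmann_weight(x_near, T) / boltzmann_weight(x_far, T)
          for T in (1.0, 0.1)}
```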

Algorithmic framework
At iteration k the algorithm uses temperature T_k and runs an MH kernel K_{T_k} with a symmetric Gaussian proposal of variance σ²_k. The temperature schedule is polynomial: T_k = c·(k+1)^{−α} with 0 < α < 1 and a constant c > 0 chosen to satisfy a prescribed accuracy ε. For each temperature level the chain is iterated m_k times, where m_k is at least the mixing time τ_mix(T_k) multiplied by a logarithmic factor that controls the per‑level failure probability δ_k. The sequence {δ_k} is chosen so that the total failure probability does not exceed a user‑specified δ.
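The loop structure described above can be sketched as follows. This is a simplified illustration, not the paper's exact algorithm: the constants `alpha`, `c`, `sigma`, and the fixed per-level iteration count `m_k` are placeholder choices (the paper ties m_k to the mixing time at T_k), and proposals falling outside the bounded domain are simply rejected.

```python
import numpy as np

rng = np.random.default_rng(0)

def ft_sa(f, lo, hi, alpha=0.5, c=1.0, n_levels=50, m_k=200, sigma=0.3):
    """Sketch of polynomial-cooling SA with a Metropolis-Hastings kernel
    and a symmetric Gaussian proposal; all constants are illustrative."""
    lo, hi = np.asarray(lo, float), np.asarray(hi, float)
    x = rng.uniform(lo, hi)              # arbitrary initial state in Omega
    fx = f(x)
    best_x, best_f = x.copy(), fx
    for k in range(n_levels):
        T = c * (k + 1) ** (-alpha)      # polynomial cooling schedule T_k
        for _ in range(m_k):
            y = x + sigma * rng.standard_normal(x.shape)  # symmetric proposal
            if np.any(y < lo) or np.any(y > hi):
                continue                 # reject moves outside the domain
            fy = f(y)
            # Metropolis-Hastings acceptance for pi_T proportional to exp(-f/T)
            if np.log(rng.uniform()) < (fx - fy) / T:
                x, fx = y, fy
                if fy < best_f:
                    best_x, best_f = y.copy(), fy
    return best_x, best_f
```

For example, `ft_sa(lambda x: float(np.sum(x**2)), [-2, -2], [2, 2])` returns a point close to the origin on the 2-D quadratic.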

Main theoretical contributions

  1. Finite‑time accuracy guarantee (Theorem 1).
    For any ε > 0 and δ ∈ (0,1) there exists an explicit bound N(ε,δ) on the total number of MH steps such that, regardless of the initial state, the state x_N after N steps satisfies
    f(x_N) ≤ f* + ε with probability at least 1 − δ.
    The bound scales as
    N(ε,δ) ≤ C·ε^{−n}·log(1/δ),
    where C depends polynomially on the domain volume, the Lipschitz constant L, and the annealing constants (α,c). This result mirrors the “sample‑complexity” bounds of statistical learning theory, providing a concrete trade‑off between accuracy, confidence, and computational effort.

  2. Mixing‑time based schedule (Theorem 2).
    The authors prove that the MH kernel at temperature T mixes in time τ_mix(T) ≤ D·T^{−β} for some β > 0 and constant D that depends on L, the proposal variance, and the geometry of Ω. Consequently, choosing m_k ≥ τ_mix(T_k)·log(1/δ_k) guarantees that the distribution after m_k steps is within total‑variation distance δ_k of π_{T_k}. By telescoping the errors across temperature levels, the overall failure probability remains bounded by δ.
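The quantities appearing in the two theorems can be illustrated numerically. The sketch below evaluates the Theorem 1 bound N(ε,δ) ≤ C·ε^{−n}·log(1/δ) and derives a per-level schedule from the Theorem 2 mixing bound τ_mix(T) ≤ D·T^{−β}; the constants `C`, `D`, and `beta` are placeholders for the paper's problem-dependent values, and the split δ_k = δ/2^{k+1} is one standard way to make the failure probabilities telescope to δ.

```python
import math

def step_bound(eps, delta, n, C=1.0):
    """Theorem 1 scaling: N(eps, delta) <= C * eps**(-n) * log(1/delta).
    C stands in for the paper's problem-dependent constant."""
    return C * eps ** (-n) * math.log(1.0 / delta)

def level_schedule(k, delta, c=1.0, alpha=0.5, D=10.0, beta=1.0):
    """Per-level parameters under the assumed mixing bound
    tau_mix(T) <= D * T**(-beta); D and beta are placeholder constants.
    The geometric split delta_k = delta / 2**(k+1) keeps the total
    failure probability below delta."""
    T_k = c * (k + 1) ** (-alpha)        # polynomial cooling
    delta_k = delta / 2 ** (k + 1)       # sum over k is at most delta
    tau_mix = D * T_k ** (-beta)         # mixing-time bound at T_k
    m_k = math.ceil(tau_mix * math.log(1.0 / delta_k))
    return T_k, delta_k, m_k
```

Note the asymmetry in the Theorem 1 bound: halving ε multiplies the bound by 2ⁿ, while halving δ adds only a logarithmic factor, so accuracy is far more expensive than confidence.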

Proof techniques
The analysis combines three strands of theory: (i) functional inequalities (Poincaré and log‑Sobolev) for log‑concave measures to bound τ_mix, (ii) a careful coupling argument that relates the evolving distribution to the instantaneous Boltzmann target, and (iii) a union bound over annealing levels that yields the final ε‑δ guarantee. The Lipschitz condition ensures that the energy landscape does not change too abruptly when temperature is lowered, which is crucial for controlling the “drift” between successive target distributions.

Empirical validation
Experiments on benchmark continuous functions (2‑D quadratic, Rosenbrock, high‑dimensional Rastrigin) compare the proposed finite‑time SA (FT‑SA) against a classic SA with a logarithmic cooling schedule and against Hamiltonian Monte Carlo (HMC) used as an optimizer. FT‑SA consistently reaches the prescribed ε‑optimal region within the predicted number of iterations, while classic SA either cools too quickly for the landscape (converging prematurely to a local minimum) or requires orders of magnitude more iterations. HMC, although fast per iteration, lacks the explicit ε‑δ guarantee and can become trapped in narrow valleys without careful tuning.
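Two of the benchmarks mentioned above have standard textbook definitions, sketched here for reference (these are the conventional forms, not necessarily the paper's exact experimental setup). Rosenbrock has its global minimum f = 0 at (1, …, 1) inside a long curved valley; Rastrigin has its minimum f = 0 at the origin surrounded by a regular grid of local minima.

```python
import numpy as np

def rosenbrock(x):
    """Standard Rosenbrock function; global minimum 0 at (1, ..., 1)."""
    x = np.asarray(x, float)
    return float(np.sum(100.0 * (x[1:] - x[:-1]**2)**2 + (1.0 - x[:-1])**2))

def rastrigin(x):
    """Standard Rastrigin function; global minimum 0 at the origin,
    with many regularly spaced local minima."""
    x = np.asarray(x, float)
    return float(10.0 * x.size + np.sum(x**2 - 10.0 * np.cos(2.0 * np.pi * x)))
```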

Discussion and future work
The paper acknowledges that the current theory requires a bounded domain and Lipschitz continuity; extending the results to unbounded or non‑smooth objectives would demand additional regularization or adaptive truncation techniques. Moreover, the constants α, c, β, and D are treated as design parameters; learning them automatically via meta‑optimization or adaptive control is identified as a promising direction. Finally, the authors suggest that the finite‑time SA framework could be integrated into Bayesian posterior sampling, reinforcement‑learning policy search, and other areas where global exploration with provable guarantees is essential.

In summary, this work delivers the first mathematically rigorous, finite‑time performance guarantees for simulated annealing on continuous domains, establishing a clear bridge between global optimization, Markov chain Monte Carlo theory, and statistical learning’s finite‑sample analysis.