The Computational Complexity of Estimating Convergence Time

An important problem in the implementation of Markov Chain Monte Carlo algorithms is to determine the convergence time, or the number of iterations before the chain is close to stationarity. For many Markov chains used in practice this time is not known. Even in cases where the convergence time is known to be polynomial, the theoretical bounds are often too crude to be practical. Thus, practitioners like to carry out some form of statistical analysis in order to assess convergence. This has led to the development of a number of methods known as convergence diagnostics, which attempt to diagnose whether the Markov chain is far from stationarity. We study the problem of testing convergence in the following settings and prove that the problem is hard in a computational sense: First, given a Markov chain that mixes rapidly, it is hard for Statistical Zero Knowledge (SZK-hard) to distinguish whether, starting from a given state, the chain is close to stationarity by time t or far from stationarity at time ct for a constant c. We show this problem is in AM ∩ coAM. Second, given a Markov chain that mixes rapidly, it is coNP-hard to distinguish whether it is close to stationarity by time t or far from stationarity at time ct for a constant c. This problem is in coAM. Finally, it is PSPACE-complete to distinguish whether the Markov chain is close to stationarity by time t or far from being mixed at time ct for c at least 1.


💡 Research Summary

The paper investigates the computational difficulty of determining when a Markov chain Monte Carlo (MCMC) algorithm has effectively converged to its stationary distribution. While practitioners routinely employ convergence diagnostics such as Gelman‑Rubin statistics, effective sample size estimates, or autocorrelation analyses, the theoretical underpinnings of these methods are weak: there is no guarantee that a practical algorithm can reliably decide whether the chain is “close enough” to stationarity after a given number of steps.

To formalize the problem, the authors consider a rapidly mixing Markov chain \(P\) on a finite state space and a distinguished start state \(x\). For a fixed error parameter \(\varepsilon > 0\) and a constant \(c > 1\), they define two decision problems:

  1. Gap‑Mixing Decision (GMD) – Given \(t\), decide whether the total variation distance between the distribution of the chain after \(t\) steps, \(P^{t}(x,\cdot)\), and the stationary distribution \(\pi\) is at most \(\varepsilon\) (the “close” case) or at least \(2\varepsilon\) after \(ct\) steps (the “far” case).

  2. Generalized Gap‑Mixing (GGM) – The same as GMD but with the “far” case defined as distance at least \(\delta\) for some constant \(\delta\) (or simply “not mixed”) at time \(ct\).
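The quantities in these definitions can be made concrete on a toy chain. The sketch below (my own illustration, not a construction from the paper; the names `tv_distance`, `dist_after`, and `is_close_at` are hypothetical) computes the total variation distance \(d_{TV}(P^{t}(x,\cdot), \pi)\) for a small lazy random walk and checks the “close” side of GMD:

```python
import numpy as np

def tv_distance(p, q):
    """Total variation distance between two distributions on a finite set."""
    return 0.5 * np.abs(p - q).sum()

# Lazy random walk on a 4-cycle: stay put w.p. 1/2, otherwise step to a
# uniformly random neighbour. Its stationary distribution is uniform.
n = 4
P = np.zeros((n, n))
for i in range(n):
    P[i, i] = 0.5
    P[i, (i - 1) % n] += 0.25
    P[i, (i + 1) % n] += 0.25

pi = np.full(n, 1.0 / n)

def dist_after(t, start=0):
    """Distribution P^t(x, .) of the chain after t steps from a point mass."""
    mu = np.zeros(n)
    mu[start] = 1.0
    return mu @ np.linalg.matrix_power(P, t)

def is_close_at(t, eps, start=0):
    """The 'close' side of GMD: is d_TV(P^t(x,.), pi) <= eps?"""
    return tv_distance(dist_after(t, start), pi) <= eps
```

Because this chain is lazy and reversible, the distance to stationarity is nonincreasing in \(t\), which is what makes the close-by-time-\(t\) versus far-at-time-\(ct\) promise well posed.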

The paper’s main contributions are three hardness results for these decision problems, together with matching upper‑bound classifications.

Result 1 – SZK‑hardness and AM∩coAM containment.
The authors reduce the canonical SZK‑complete problem Statistical Difference to GMD. Given two efficiently samplable distributions \(D_{0}, D_{1}\), they construct a Markov chain whose transition matrix encodes the sampling procedures. The distance between \(P^{t}(x,\cdot)\) and \(\pi\) after \(t\) steps mirrors the statistical distance between \(D_{0}\) and \(D_{1}\). Consequently, any polynomial‑time algorithm that solves GMD would solve Statistical Difference, implying SZK ⊆ P. Conversely, they show GMD lies in AM∩coAM by designing an Arthur‑Merlin protocol where the prover supplies a short description of the transition matrix and the verifier estimates total variation distance via random sampling. This places the problem in a low‑level interactive‑proof class, but still suggests that a deterministic polynomial‑time solution is unlikely.
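The sampling step in such a verifier can be illustrated on a toy scale. The paper's actual protocol is more involved; the sketch below (my own, with the hypothetical name `empirical_tv`) merely shows how statistical distance between two samplable distributions is estimated from i.i.d. samples, assuming the domain is small enough for empirical frequencies to converge:

```python
import numpy as np

rng = np.random.default_rng(0)

def empirical_tv(sample_a, sample_b, domain_size):
    """Estimate d_TV(D0, D1) from i.i.d. samples via empirical frequencies.
    Only sensible when the domain is small relative to the sample size."""
    fa = np.bincount(sample_a, minlength=domain_size) / len(sample_a)
    fb = np.bincount(sample_b, minlength=domain_size) / len(sample_b)
    return 0.5 * np.abs(fa - fb).sum()

# Two toy samplable distributions on {0, 1, 2, 3}; true d_TV = 0.6.
d0 = np.array([0.7, 0.1, 0.1, 0.1])
d1 = np.array([0.1, 0.1, 0.1, 0.7])

s0 = rng.choice(4, size=20000, p=d0)
s1 = rng.choice(4, size=20000, p=d1)
est = empirical_tv(s0, s1, 4)   # close to 0.6
```

On exponentially large domains this naive estimator breaks down, which is one reason the problem sits in an interactive-proof class rather than in BPP outright.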

Result 2 – coNP‑hardness and coAM containment.
A reduction from UNSAT (the complement of SAT) is presented. For any Boolean formula \(\phi\), they build a Markov chain that mixes rapidly if \(\phi\) is unsatisfiable, but retains a “sticky” state that prevents mixing within \(ct\) steps if \(\phi\) is satisfiable. The construction uses gadgets that encode clause satisfaction as transition probabilities, ensuring that the total variation distance after \(ct\) steps is large precisely when a satisfying assignment exists. Hence, deciding the “far” case is at least as hard as coNP. The authors also give a coAM protocol: the verifier can be convinced that the chain is far from stationarity by a prover who supplies a witness assignment, while the complement case can be verified by sampling.
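The clause gadgets themselves are not reproduced here, but the mechanism they exploit can be shown generically: a small crossing probability between two blocks of states keeps the walk trapped near its start, so it is still far from uniform at moderate times. This is a standard bottleneck illustration of my own (names `bottleneck_chain`, `tv_from_start` are hypothetical), not the paper's construction:

```python
import numpy as np

def bottleneck_chain(beta):
    """Four states in two blocks {0,1} and {2,3}. Within its current block
    the walk moves to either block state w.p. (1-beta)/2; it crosses to
    each state of the other block only w.p. beta/2."""
    P = np.zeros((4, 4))
    for i in range(4):
        same = [0, 1] if i < 2 else [2, 3]
        other = [2, 3] if i < 2 else [0, 1]
        for j in same:
            P[i, j] = (1.0 - beta) / 2.0
        for j in other:
            P[i, j] = beta / 2.0
    return P

def tv_from_start(P, t, start=0):
    """d_TV between the t-step distribution from `start` and uniform
    (the matrix is doubly stochastic, so its stationary law is uniform)."""
    mu = np.zeros(P.shape[0])
    mu[start] = 1.0
    mu_t = mu @ np.linalg.matrix_power(P, t)
    return 0.5 * np.abs(mu_t - 1.0 / P.shape[0]).sum()
```

With a small `beta` the chain is still far from stationarity after 10 steps, while a large `beta` mixes almost immediately; a satisfying assignment in the reduction plays the role of the small crossing probability.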

Result 3 – PSPACE‑completeness for arbitrary constants \(c \ge 1\).
The strongest result shows that when the gap parameter \(c\) is allowed to be any constant ≥ 1, the decision problem becomes PSPACE‑complete. The reduction is from Quantified Boolean Formula (QBF), a canonical PSPACE‑complete problem. Variables and quantifier alternations are simulated by layers of the Markov chain; the transition probabilities are set so that the chain mixes within \(t\) steps iff the QBF evaluates to true, and fails to mix within \(ct\) steps otherwise. This establishes PSPACE‑hardness. Membership in PSPACE follows from a straightforward simulation of the chain for \(ct\) steps using only polynomial space, because the state space can be traversed implicitly.
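One ingredient of such simulations is that the \(t\)-step behaviour never requires \(t\) sequential multiplications: repeated squaring on the binary expansion of \(t\) needs only \(O(\log t)\) of them. The sketch below (my own; the paper's space-bounded argument handles implicit, exponentially large state spaces by reusing space across recursive computations, which this small explicit-matrix example does not capture) shows the squaring trick:

```python
import numpy as np

def matrix_power_binary(P, t):
    """Compute P**t with O(log t) matrix multiplications by walking the
    binary expansion of t, squaring the base at each bit position."""
    result = np.eye(P.shape[0])
    base = P.copy()
    while t > 0:
        if t & 1:            # this bit of t is set: fold in current power
            result = result @ base
        base = base @ base   # advance to the next power of two
        t >>= 1
    return result
```

So even when \(ct\) is exponential in the input size (written in binary), the number of matrix operations stays polynomial; the remaining work in the PSPACE argument is doing each operation without storing the exponentially large matrices explicitly.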

Together, these three results delineate a hierarchy of hardness: even under the optimistic assumption that the chain mixes rapidly (i.e., has polynomial mixing time), distinguishing a “good” convergence time from a “bad” one is SZK‑hard, coNP‑hard, or PSPACE‑complete depending on how the gap is parameterized. The upper‑bound inclusions (AM∩coAM, coAM, PSPACE) show that the problems are not beyond known interactive‑proof frameworks, but they remain intractable for deterministic polynomial‑time algorithms unless major complexity‑class collapses occur.

Implications for practice.
The findings explain why convergence diagnostics lack rigorous guarantees: they are attempting to solve a problem that is provably hard in the worst case. The paper suggests that reliable diagnostics must either (a) exploit additional structure of the specific chain (e.g., log‑concave target distributions, spectral gap bounds, coupling arguments) or (b) accept probabilistic or heuristic assurances without worst‑case guarantees. The authors also point to future research directions, such as identifying subclasses of Markov chains where the gap‑mixing decision becomes tractable, or developing average‑case analyses that bypass the worst‑case hardness demonstrated here.

In summary, the paper provides a comprehensive complexity‑theoretic foundation for the longstanding empirical difficulty of estimating MCMC convergence times, establishing that the problem is computationally hard across several well‑studied complexity classes.

