Sound Value Iteration for Simple Stochastic Games
Algorithmic analysis of Markov decision processes (MDP) and stochastic games (SG) in practice relies on value-iteration (VI) algorithms. Since the basic version of VI does not provide guarantees on the precision of the result, variants of VI have been proposed that offer such guarantees. In particular, sound value iteration (SVI) not only provides precise lower and upper bounds on the result, but also converges faster in the presence of probabilistic cycles. Unfortunately, it is neither applicable to SG, nor to MDP with end components. In this paper, we extend SVI and cover both cases. The technical challenge consists mainly in proper treatment of end components, which require different handling than in the literature. Moreover, we provide several optimizations of SVI. Finally, we also evaluate our prototype implementation experimentally to confirm its advantages on systems with probabilistic cycles.
💡 Research Summary
The paper addresses a fundamental limitation of classical value iteration (VI) used for the quantitative analysis of probabilistic systems such as Markov decision processes (MDPs) and stochastic games (SGs). While VI converges to the exact value in the limit, it provides no guarantee on the error of intermediate results, which hampers its use in formal verification, where precise bounds are required. Bounded value iteration (BVI) improves on this by maintaining both a lower and an upper approximation, but when end components (ECs) are present its upper bound may converge to a fixed point that is strictly larger than the true value, so the two bounds need not meet.
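The stuck upper bound is easy to reproduce. The following toy sketch (our own illustration, not code from the paper) runs BVI on a three-state MDP whose single EC is a self-loop: the lower bound reaches the true value 0.5, while the upper bound stays at the spurious fixed point 1.0.

```python
# Toy illustration (not the paper's algorithm): bounded value iteration (BVI)
# on a 3-state MDP. State "s" has two actions:
#   stay: self-loop with probability 1 (an end component)
#   go:   reach the target with probability 0.5, the sink otherwise
# The true maximal reachability value of "s" is 0.5, but the upper bound
# is trapped at the greater fixed point 1.0 because of the self-loop.

ACTIONS = {"s": {"stay": {"s": 1.0}, "go": {"target": 0.5, "sink": 0.5}}}

def bvi(iterations):
    lo = {"s": 0.0, "target": 1.0, "sink": 0.0}
    hi = {"s": 1.0, "target": 1.0, "sink": 0.0}
    for _ in range(iterations):
        for bound in (lo, hi):
            # Bellman update: best action value under the current bound
            bound["s"] = max(
                sum(p * bound[t] for t, p in dist.items())
                for dist in ACTIONS["s"].values()
            )
    return lo["s"], hi["s"]

lower, upper = bvi(100)
print(lower, upper)  # lower converges to 0.5; upper stays at 1.0
```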
Sound value iteration (SVI), introduced by Quatmann and Katoen (2018), overcomes this issue for MDPs without ECs. SVI computes, for each iteration k, the probability of reaching the target set F within k steps (P(◇≤k F)) and the probability of staying inside the “undetermined” set S? (states that are neither target nor sink) for k steps (P(□≤k S?)). From these two quantities, SVI derives, via a geometric‑series argument, a global lower bound ℓₖ and upper bound uₖ on the true reachability probabilities of the undetermined states: ℓₖ = minₛ Pₛ(◇≤k F) / (1 – Pₛ(□≤k S?)), uₖ = maxₛ Pₛ(◇≤k F) / (1 – Pₛ(□≤k S?)). Because the stay‑probability P(□≤k S?) decays quickly in the presence of probabilistic cycles, the bounds tighten dramatically after only a few iterations, often far faster than in BVI.
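As a concrete sketch of these formulas (our own toy chain and variable names, not the paper's), the loop below iterates the step-bounded reachability and stay probabilities on a two-state Markov chain and evaluates ℓₖ and uₖ; already at k = 1 the bounds enclose the true values ≈ 0.682 and ≈ 0.705.

```python
# Illustrative sketch (hypothetical example, not from the paper): the SVI
# bound computation on a small Markov chain with two undetermined states.
# Each state maps to its transition probabilities into "target", "sink",
# and back into S?.
P = {
    "s0": {"target": 0.4, "s1": 0.4, "sink": 0.2},
    "s1": {"target": 0.5, "s0": 0.3, "sink": 0.2},
}
UNDET = set(P)  # S?: states that are neither target nor sink

def svi_bounds(k):
    reach = {s: 0.0 for s in UNDET}  # P_s(reach F within i steps)
    stay = {s: 1.0 for s in UNDET}   # P_s(stay in S? for i steps)
    for _ in range(k):
        reach = {s: P[s].get("target", 0.0)
                    + sum(p * reach[t] for t, p in P[s].items() if t in UNDET)
                 for s in UNDET}
        stay = {s: sum(p * stay[t] for t, p in P[s].items() if t in UNDET)
                for s in UNDET}
    lo = min(reach[s] / (1 - stay[s]) for s in UNDET)
    hi = max(reach[s] / (1 - stay[s]) for s in UNDET)
    return lo, hi

print(svi_bounds(1))   # the bounds already enclose both true values
print(svi_bounds(10))  # the stay-probability decays geometrically
```

The true values here solve v₀ = 0.4 + 0.4·v₁ and v₁ = 0.5 + 0.3·v₀, and the interval [ℓₖ, uₖ] always contains both, shrinking toward them as the stay-probability vanishes.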
The main contributions of the paper are (1) extending SVI from MDPs to simple stochastic games (SGs) and (2) adapting SVI to handle models that contain ECs. Extending SVI to SGs is non‑trivial: the value of a state is defined as a saddle‑point (sup inf) over strategies of the two players, and optimal strategies may require memory. The authors show that the geometric‑series decomposition still holds when the “step‑bounded reachability” and “stay‑in‑undetermined” probabilities are computed with respect to optimal strategies of both players. They provide a rigorous proof that the resulting lower and upper sequences converge to the true game value.
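To make the sup inf reading concrete, here is a minimal sketch (a toy game of our own, using plain value iteration rather than the paper's SVI) in which a maximizer state and a minimizer state alternate, illustrating the saddle-point Bellman update:

```python
# Minimal sketch (our own toy game, not from the paper) of the Bellman
# update in a simple stochastic game: the maximizer picks its best action,
# the minimizer its worst, and probabilistic branching is averaged.
OWNER = {"m": max, "n": min}  # which player controls each state
ACTIONS = {
    "m": [{"target": 0.5, "n": 0.5}, {"n": 1.0}],
    "n": [{"sink": 0.5, "m": 0.5}, {"target": 0.3, "sink": 0.7}],
}

def game_vi(iterations):
    v = {"m": 0.0, "n": 0.0, "target": 1.0, "sink": 0.0}
    for _ in range(iterations):
        for s, opt in OWNER.items():
            # max for the maximizer's state, min for the minimizer's
            v[s] = opt(sum(p * v[t] for t, p in dist.items())
                       for dist in ACTIONS[s])
    return v

v = game_vi(200)
print(round(v["m"], 6), round(v["n"], 6))  # prints 0.65 0.3
```

The fixed point is v(n) = min(0.5·v(m), 0.3) = 0.3 and v(m) = 0.5 + 0.5·v(n) = 0.65, the game value under optimal play of both players.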
Handling ECs is more challenging. In MDPs, an EC can be collapsed into a single abstract state because all states in the EC share the same value. In SGs, however, states inside an EC may have different values, so collapsing is not sound. Existing work on BVI deals with ECs by a “deflate” operation: the upper bound inside an EC is reduced to the best possible exit value (the maximal expected value of leaving the EC in one step). Since SVI does not maintain an explicit global upper bound but rather computes bounds from the stay‑probabilities, the authors cannot apply deflate directly. Instead, they decompose each EC into smaller subsets that are guaranteed to have a uniform value under the current approximations. For each subset they distinguish four cases (maximizer/minimizer player, internal/external transition) and define appropriate update rules that simultaneously lower the upper bound and raise the lower bound. This intricate case analysis ensures that the lower sequence still converges, a property that does not follow automatically from the standard BVI convergence proof.
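For contrast, the deflate operation from the prior BVI work can be sketched in a few lines (our own simplified rendering with hypothetical names; the paper's point is precisely that SVI needs a different mechanism):

```python
# Sketch of the "deflate" idea from BVI (shown for contrast with the
# paper's approach): inside an end component, the upper bound of every
# state is capped by the best one-step value of *leaving* the component.
def deflate(ec, upper, exits):
    """ec: states of the end component; exits: per-state list of
    distributions that leave the EC with positive probability."""
    best_exit = max(
        sum(p * upper[t] for t, p in dist.items())
        for s in ec
        for dist in exits[s]
    )
    for s in ec:
        upper[s] = min(upper[s], best_exit)
    return upper

# An EC {a, b} whose only exit reaches the target with probability 0.5;
# deflate lowers both upper bounds from 1.0 to the exit value 0.5.
upper = {"a": 1.0, "b": 1.0, "target": 1.0, "sink": 0.0}
exits = {"a": [{"target": 0.5, "sink": 0.5}], "b": []}
print(deflate({"a", "b"}, upper, exits))
```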
A further technical contribution is a topological optimization of SVI. Rather than applying the bound computation to the whole state space, the algorithm restricts it to the currently reachable region of the underlying graph. This avoids global bottlenecks caused by states that are irrelevant for the current iteration, yielding larger per‑iteration improvements without increasing the asymptotic complexity.
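The restriction can be pictured as a plain graph search (our own minimal sketch; the actual algorithm interleaves this with the value updates): only states reachable from the initial state take part in the current iteration.

```python
# Minimal sketch (hypothetical example, not the paper's implementation):
# compute the region reachable from the initial state by BFS, so that
# per-iteration updates skip states that cannot influence the result yet.
from collections import deque

def reachable(graph, init):
    seen, queue = {init}, deque([init])
    while queue:
        s = queue.popleft()
        for t in graph.get(s, ()):
            if t not in seen:
                seen.add(t)
                queue.append(t)
    return seen

graph = {"s0": ["s1", "target"], "s1": ["sink"], "dead": ["s0"]}
print(sorted(reachable(graph, "s0")))  # "dead" is never visited or updated
```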
The authors implemented the extended SVI in a prototype tool built on top of the PRISM/Storm model‑checking frameworks. Experiments were conducted on a diverse benchmark suite, including random MDPs, stochastic games, and realistic protocol models. The evaluation measured the number of iterations required to achieve a precision of 10⁻⁶ and the total runtime. Results show that, especially on models with high probabilistic‑cycle density, SVI reaches the desired precision in one or two iterations, whereas BVI needs hundreds or thousands of iterations. Runtime is comparable or better, and memory consumption remains linear in the size of the model, matching that of classical VI.
In conclusion, the paper successfully generalises sound value iteration to the full class of simple stochastic games, even in the presence of end components, while preserving its hallmark fast convergence on cyclic probabilistic structures. The work bridges a gap between the theoretical guarantees of BVI and the practical efficiency of VI, and it opens avenues for future research such as integrating the approach with strategy synthesis, refining the handling of ECs with more sophisticated “deflate‑like” operations, and extending the method to richer objectives beyond reachability.