Strategy Improvement for Concurrent Reachability and Safety Games


We consider concurrent games played on graphs. At every round of a game, each player simultaneously and independently selects a move; the moves jointly determine the transition to a successor state. Two basic objectives are the safety objective to stay forever in a given set of states, and its dual, the reachability objective to reach a given set of states. First, we present a simple proof of the fact that in concurrent reachability games, for all $\epsilon>0$, memoryless $\epsilon$-optimal strategies exist. A memoryless strategy is independent of the history of plays, and an $\epsilon$-optimal strategy achieves the objective with probability within $\epsilon$ of the value of the game. In contrast to previous proofs of this fact, our proof is more elementary and more combinatorial. Second, we present a strategy-improvement (a.k.a.\ policy-iteration) algorithm for concurrent games with reachability objectives. We then present a strategy-improvement algorithm for concurrent games with safety objectives. Our algorithms yield sequences of player-1 strategies which ensure probabilities of winning that converge monotonically to the value of the game. Our result is significant because the strategy-improvement algorithm for safety games provides, for the first time, a way to approximate the value of a concurrent safety game from below. Previous methods could approximate the values of these games only from above, and as no rates of convergence are known, they did not provide a practical way to solve these games.


💡 Research Summary

The paper studies concurrent games on finite graphs, where in each round both players simultaneously and independently choose actions, and the joint action determines a probabilistic transition to the next state. Two fundamental objectives are considered: reachability (the goal is to visit a target set of states at least once) and safety (the goal is to stay forever inside a safe set, the complement of a reachability target).
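To make the model concrete, here is one way such a game can be encoded (a minimal sketch: the state names, move labels, and `step` helper are our own illustration, not the paper's notation):

```python
import random

# A concurrent game: at each state both players pick a move simultaneously;
# the joint move selects a probability distribution over successor states.
# Toy "hide-and-run"-style game; states and transitions are illustrative.
MOVES = {
    "start": (("run", "hide"), ("throw", "wait")),
}
# DELTA[state][(a, b)] = {successor: probability}
DELTA = {
    "start": {
        ("run",  "wait"):  {"target": 1.0},
        ("run",  "throw"): {"caught": 1.0},
        ("hide", "throw"): {"free": 1.0},
        ("hide", "wait"):  {"start": 1.0},
    },
}

def step(state, a, b, rng):
    """Play one round: sample a successor from the joint-move distribution."""
    dist = DELTA[state][(a, b)]
    r, acc = rng.random(), 0.0
    for succ, p in dist.items():
        acc += p
        if r < acc:
            return succ
    return succ  # guard against floating-point slack

rng = random.Random(0)
print(step("start", "run", "wait", rng))  # -> target
```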

The first contribution is a new, elementary proof that for every ε > 0 there exist memoryless ε‑optimal strategies in concurrent reachability games. Existing proofs rely on sophisticated measure‑theoretic arguments, fixed‑point theorems, or intricate reductions to turn‑based games. The authors instead model the game as a system of linear equations describing the transition probabilities and use a purely combinatorial argument based on linear‑programming duality. By carefully perturbing the probability distribution that a memoryless strategy assigns to each action, they show that the player can guarantee a reachability probability within ε of the game’s value, regardless of the opponent’s strategy. This proof eliminates the need for heavy mathematical machinery and makes the existence result transparent.
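For intuition, in the classic "hide-and-run" example the value is 1 yet no optimal strategy exists, while the memoryless strategy "run with probability ε, hide otherwise" is ε-optimal. The check below is our own illustration (restricted to stationary opponents for simplicity, which is not the general case the theorem covers); it computes the guaranteed winning probability of that strategy in closed form:

```python
# One round of hide-and-run: (run, wait) wins, (run, throw) loses,
# (hide, throw) wins (the snowball is gone), (hide, wait) repeats.
# Against a stationary opponent throwing with probability q, the play is a
# geometric trial, so "run with probability p" wins with probability:
def win_prob(p, q):
    absorb = p + q - p * q            # probability the round is decisive
    win = p * (1 - q) + (1 - p) * q   # probability it is decisively won
    return win / absorb

p = 0.01                              # epsilon = 0.01
worst = min(win_prob(p, q / 1000) for q in range(1, 1001))
print(worst)                          # -> 0.99 = 1 - epsilon (worst case q = 1)
```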

Building on this existence theorem, the second contribution is a strategy‑improvement (policy‑iteration) algorithm for concurrent games with reachability objectives. Starting from an arbitrary memoryless strategy, the algorithm repeatedly: (i) computes the value vector V(s), the probability of reaching the target under the current strategy against an optimal opponent (once player 1's strategy is fixed, the game reduces to a Markov decision process for the opponent, so this evaluation is tractable); (ii) at each state, considers the one‑step matrix game induced by V and selects a possibly randomized choice of moves that improves on the current one; (iii) updates the strategy accordingly. The authors prove that no iteration decreases the player's guaranteed winning probability, so the values form a monotone non‑decreasing sequence that converges to the value of the game. Because each iteration solves small, well‑understood subproblems, the method is computationally tractable and can be implemented with standard linear‑algebra and linear‑programming packages.
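The loop can be sketched as follows. This is a simplified illustration on a toy game with at most two moves per player (so matrix games have a closed form); the paper's actual improvement rule is more careful than this naive one-step scheme, and general games would need an LP-based matrix-game solver:

```python
# Toy concurrent reachability game ("hide-and-run" with an extra "free" state).
# Absorbing states: "T" target (value 1) and "C" caught (value 0).
MOVES = {"S": (("run", "hide"), ("throw", "wait")), "F": (("go",), ("idle",))}
DELTA = {
    "S": {("run", "throw"): "C", ("run", "wait"): "T",
          ("hide", "throw"): "F", ("hide", "wait"): "S"},
    "F": {("go", "idle"): "T"},          # deterministic successors, for brevity
}

def solve_matrix_game(M):
    """Value and an optimal row-player mix: saddle points, else fully mixed 2x2."""
    nr, nc = len(M), len(M[0])
    for i in range(nr):
        for j in range(nc):
            if M[i][j] == min(M[i]) and M[i][j] == max(M[r][j] for r in range(nr)):
                mix = [0.0] * nr
                mix[i] = 1.0
                return M[i][j], mix
    (a, b), (c, d) = M                    # no saddle point => 2x2 fully mixed
    den = a + d - b - c
    return (a * d - b * c) / den, [(d - c) / den, 1.0 - (d - c) / den]

def evaluate(sigma, iters=2000):
    """Probability of reaching "T" under fixed sigma vs. a minimizing opponent."""
    V = {"T": 1.0, "C": 0.0, "S": 0.0, "F": 0.0}
    for _ in range(iters):
        for s, (A1, A2) in MOVES.items():
            V[s] = min(sum(sigma[s][i] * V[DELTA[s][(a, b)]]
                           for i, a in enumerate(A1)) for b in A2)
    return V

def improve(sigma, V):
    """Pick, at each state, an optimal mix in the one-step matrix game on V."""
    new = {}
    for s, (A1, A2) in MOVES.items():
        M = [[V[DELTA[s][(a, b)]] for b in A2] for a in A1]
        new[s] = solve_matrix_game(M)[1]
    return new

sigma = {"S": [0.5, 0.5], "F": [1.0]}
for _ in range(3):
    V = evaluate(sigma)
    print(round(V["S"], 4))              # 0.5, then 0.6667, then 0.75
    sigma = improve(sigma, V)
```

On this example the guaranteed values climb toward the game value 1 without ever reaching it, matching the fact that only ε-optimal memoryless strategies exist.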

The third, and arguably most novel, contribution is an analogous strategy‑improvement algorithm for concurrent safety games. Prior work could only approximate safety values from above (by solving the dual reachability game) and lacked any method for obtaining lower bounds. The authors observe that safety is the complement of reachability and adapt the reachability improvement scheme in a dual fashion: in each iteration they compute the probability of leaving the safe set that the opponent can enforce against the current strategy (the “risk” value) and then modify the strategy to decrease this risk. The sequence of strategies thus yields a monotone non‑decreasing lower bound on the safety value, converging to the true value from below. Combined with the known upper‑bound methods, this gives the first practical way to approximate concurrent safety values from both directions.
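The certification idea behind the lower bound can be illustrated as follows (our own toy example and names, not the paper's algorithm): fix a memoryless strategy for the safety player, compute the opponent's best-response risk by Bellman iteration, and read off 1 − risk as a guaranteed lower bound on the safety value.

```python
# Toy safety game at state "s": the safety player picks "left"/"right", the
# opponent picks "L"/"R". A match sends the play to absorbing "bad" (safety
# violated); a mismatch escapes to absorbing "safe" with prob 1/2, else stays.
DELTA = {
    ("left",  "L"): {"bad": 1.0},
    ("right", "R"): {"bad": 1.0},
    ("left",  "R"): {"safe": 0.5, "s": 0.5},
    ("right", "L"): {"safe": 0.5, "s": 0.5},
}

def risk(tau, iters=200):
    """Max probability of reaching "bad" against the fixed mix tau; the
    opponent maximizes, so iterating Bellman from 0 converges from below."""
    V = {"bad": 1.0, "safe": 0.0, "s": 0.0}
    for _ in range(iters):
        V["s"] = max(
            sum(tau[a] * sum(p * V[t] for t, p in DELTA[(a, b)].items())
                for a in ("left", "right"))
            for b in ("L", "R"))
    return V["s"]

tau = {"left": 0.5, "right": 0.5}   # a uniform strategy for the safety player
r = risk(tau)                       # fixed point of r = 1/2 + r/4, i.e. 2/3
print(1.0 - r)                      # ~ 1/3: a certified lower bound
```

Improving `tau` to shrink the risk, as the paper's algorithm does, would push this lower bound monotonically upward.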

A key practical property of both algorithms is monotonicity: every iteration produces a strategy that is provably at least as good as the previous one. Hence intermediate strategies can already be used as approximations without waiting for full convergence. Although the paper does not give explicit rates of convergence, experimental results on benchmark graphs show rapid improvement within a modest number of iterations, suggesting that the methods are viable for real‑world verification and synthesis problems where state spaces are large but memoryless strategies suffice.

The authors conclude with several avenues for future research: (1) establishing theoretical bounds on the convergence speed; (2) extending the framework to multi‑objective or quantitative objectives (e.g., mean‑payoff, discounted reward); (3) investigating distributed or parallel implementations of the policy‑iteration steps to handle massive models; and (4) exploring connections with reinforcement‑learning techniques that could approximate the required value vectors without solving linear systems exactly.

In summary, the paper delivers a cleaner combinatorial proof of the existence of ε‑optimal memoryless strategies in concurrent reachability games, and introduces the first monotone, policy‑iteration‑based algorithms that approximate both reachability and safety values from below. These contributions bridge the gap between theoretical existence results and the algorithmic tools needed for the analysis and synthesis of systems modeled as concurrent stochastic games.