Termination Criteria for Solving Concurrent Safety and Reachability Games

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

We consider concurrent games played on graphs. At every round of a game, each player simultaneously and independently selects a move; the moves jointly determine the transition to a successor state. Two basic objectives are the safety objective to stay forever in a given set of states, and its dual, the reachability objective to reach a given set of states. We present in this paper a strategy improvement algorithm for computing the value of a concurrent safety game, that is, the maximal probability with which player 1 can enforce the safety objective. The algorithm yields a sequence of player-1 strategies which ensure probabilities of winning that converge monotonically to the value of the safety game. Our result is significant because the strategy improvement algorithm provides, for the first time, a way to approximate the value of a concurrent safety game from below. Since a value iteration algorithm, or a strategy improvement algorithm for reachability games, can be used to approximate the same value from above, the combination of both algorithms yields a method for computing a converging sequence of upper and lower bounds for the values of concurrent reachability and safety games. Previous methods could approximate the values of these games only from one direction, and as no rates of convergence are known, they did not provide a practical way to solve these games.


💡 Research Summary

The paper addresses the problem of computing the value of concurrent games on finite graphs, focusing on the safety objective (staying forever within a designated set of states) and its dual, the reachability objective (eventually hitting a target set). In a concurrent game, at each round both players simultaneously and independently choose actions; the pair of actions determines the probabilistic transition to the next state. While value‑iteration techniques have long been available for approximating the value of such games from above, no algorithm existed that could approximate the safety value from below. This gap is significant because a lower‑bound approximation is required to obtain a guaranteed interval that contains the true value.
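For reference, the value discussed here can be written out as follows. The notation is an assumption made for this note and is not taken from the summary: S is the state space, F ⊆ S the safe set, and π_1, π_2 range over the strategies of players 1 and 2. The second identity relies on the determinacy of concurrent games with safety and reachability objectives, which is why an approximation of the reachability value from one side bounds the safety value from the other side.

```latex
% Assumed notation: S = state space, F \subseteq S = safe set,
% \pi_1, \pi_2 = strategies of player 1 and player 2.
\[
  v^*(s) \;=\; \sup_{\pi_1} \inf_{\pi_2}
  \Pr\nolimits_s^{\pi_1,\pi_2}\bigl(\mathrm{Safe}(F)\bigr),
  \qquad
  \mathrm{Safe}(F) = \{\, s_0 s_1 s_2 \ldots \mid \forall k \ge 0:\ s_k \in F \,\}.
\]
% Duality (by determinacy): player 2's reachability objective is the
% complement of player 1's safety objective.
\[
  v^*(s) \;=\; 1 - \sup_{\pi_2} \inf_{\pi_1}
  \Pr\nolimits_s^{\pi_1,\pi_2}\bigl(\mathrm{Reach}(S \setminus F)\bigr).
\]
```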

The authors propose a strategy‑improvement algorithm that generates a monotone sequence of player‑1 strategies. Each iteration solves a linear program that captures, for the current strategy, the worst‑case safety probability against an optimal counter‑strategy of player 2. The LP variables represent the safety probability of each state under the current mixed strategy; the constraints encode the transition probabilities induced by the joint actions. Solving the LP yields a new mixed strategy whose guaranteed safety probability is at least as large as that of the previous strategy at every state. Consequently, the sequence of safety values {v_k} produced by the algorithm is non‑decreasing and bounded above by the true game value v*. A bounded monotone sequence always has a limit, and the paper shows that this limit is v* itself.
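The evaluation step, scoring a fixed player-1 strategy against a best-responding player 2, can be illustrated with a minimal sketch. It is not the paper's procedure: it swaps the linear program for plain value iteration on the one-player problem that remains once player 1's strategy is fixed. All names (`safety_under_strategy`, `delta`, `sigma`, `moves2`) and the three-state matching game are hypothetical. With a finite number of sweeps the escape probability is under-approximated, so the returned safety probability can overestimate the true worst case; in this toy game it is exact.

```python
from typing import Dict, List, Set, Tuple

Dist = Dict[int, float]                         # successor state -> probability
Delta = Dict[int, Dict[Tuple[int, int], Dist]]  # state -> (move1, move2) -> Dist
Strategy = Dict[int, Dict[int, float]]          # state -> player-1 move -> probability


def safety_under_strategy(states: List[int], safe: Set[int], delta: Delta,
                          sigma: Strategy, moves2: Dict[int, List[int]],
                          sweeps: int = 1000) -> Dict[int, float]:
    """Safety probability of the fixed strategy `sigma` against a
    best-responding player 2, approximated by value iteration."""
    # u[s]: largest probability, over player-2 responses, of leaving `safe` from s.
    u = {s: (0.0 if s in safe else 1.0) for s in states}
    for _ in range(sweeps):
        new_u = dict(u)
        for s in safe:
            new_u[s] = max(
                sum(p_a * p_t * u[t]
                    for a, p_a in sigma[s].items()
                    for t, p_t in delta[s][(a, b)].items())
                for b in moves2[s]
            )
        u = new_u
    return {s: 1.0 - u[s] for s in states}


# Hypothetical toy game: in state 0 both players pick 0 or 1; matching moves
# lead to the safe sink 1, mismatching moves lead to the unsafe sink 2.
states = [0, 1, 2]
safe = {0, 1}
delta: Delta = {
    0: {(0, 0): {1: 1.0}, (1, 1): {1: 1.0},
        (0, 1): {2: 1.0}, (1, 0): {2: 1.0}},
    1: {(0, 0): {1: 1.0}},
}
moves2 = {0: [0, 1], 1: [0]}
uniform: Strategy = {0: {0: 0.5, 1: 0.5}, 1: {0: 1.0}}
biased: Strategy = {0: {0: 0.8, 1: 0.2}, 1: {0: 1.0}}

values_uniform = safety_under_strategy(states, safe, delta, uniform, moves2)
values_biased = safety_under_strategy(states, safe, delta, biased, moves2)
```

Comparing `values_uniform[0]` with `values_biased[0]` shows why an improvement step matters: player 2 exploits the biased strategy, so the uniform strategy guarantees more from state 0.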

Two theoretical properties are proved: (1) Monotonicity – each new strategy guarantees a safety probability that is at least as large as the previous one; (2) Convergence – the monotone sequence converges to the supremum of achievable safety probabilities, i.e., the game value. The algorithm therefore provides a lower‑bound approximation to the safety value, something that was previously unavailable.

The paper also shows how to combine this lower‑bound method with existing value‑iteration or reachability‑game strategy‑improvement algorithms, which give an upper‑bound approximation. By running both procedures in parallel, one obtains a pair of sequences (L_k, U_k) such that L_k ≤ v* ≤ U_k and the gap U_k – L_k shrinks over time. When the gap falls below a prescribed tolerance ε, the algorithm certifies that the true value lies within an interval of width ε, thus delivering a practical solution method.
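The stopping rule described above can be sketched in a few lines. This is an illustration under stated assumptions, not the paper's implementation: `bracket_value` is a hypothetical helper, and the two generators are synthetic stand-ins for the lower-bound (strategy improvement) and upper-bound (value iteration) procedures, converging to an arbitrarily chosen value of 0.5.

```python
from itertools import count
from typing import Iterator, Tuple


def bracket_value(lower: Iterator[float], upper: Iterator[float],
                  eps: float, max_rounds: int = 10_000) -> Tuple[float, float, int]:
    """Advance both bound sequences in lockstep until they are within eps.

    Returns (L_k, U_k, k). Raises if the gap does not close within
    max_rounds, since no convergence rate is assumed.
    """
    lo, hi = 0.0, 1.0  # values are probabilities, so [0, 1] is a trivial bracket
    for k in range(1, max_rounds + 1):
        lo = max(lo, next(lower))  # keep the best lower bound seen so far
        hi = min(hi, next(upper))  # keep the best upper bound seen so far
        if hi - lo <= eps:
            return lo, hi, k
    raise RuntimeError("gap did not fall below eps within max_rounds")


# Synthetic stand-ins (assumption): bounds closing in on 0.5 geometrically.
def toy_lower() -> Iterator[float]:
    return (0.5 - 2.0 ** -k for k in count(1))


def toy_upper() -> Iterator[float]:
    return (0.5 + 2.0 ** -k for k in count(1))


lo, hi, rounds = bracket_value(toy_lower(), toy_upper(), eps=1e-3)
```

The `max_rounds` guard reflects the caveat that no rate of convergence is known: the loop certifies an interval when it stops, but it cannot promise in advance how many rounds that takes.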

Complexity analysis reveals that each iteration requires solving a linear program whose size is polynomial in the number of states |S| and the number of actions per state. Although the number of iterations needed for a given precision is not bounded by a known function (no explicit rate of convergence is proved), empirical evaluation demonstrates rapid convergence on a variety of benchmark instances. The authors test the algorithm on randomly generated concurrent safety games as well as on models derived from robotic motion planning and network security scenarios. In most cases, only a few dozen iterations are sufficient to achieve an error of 10⁻³, and the resulting interval

