Symmetric Strategy Improvement
Symmetry is inherent in the definition of most two-player zero-sum games, including parity, mean-payoff, and discounted-payoff games. It is therefore quite surprising that no symmetric analysis techniques for these games exist. We develop a novel symmetric strategy improvement algorithm where, in each iteration, the strategies of both players are improved simultaneously. We show that symmetric strategy improvement defies Friedmann’s traps, which shook the belief in the potential of classic strategy improvement to be polynomial.
💡 Research Summary
The paper addresses a fundamental limitation of classic strategy‑improvement (SI) methods for two‑player zero‑sum graph games such as parity, mean‑payoff, and discounted‑payoff games. Traditional SI algorithms improve only one player’s positional strategy per iteration: given a current Max strategy σ, they compute the optimal counter‑strategy τ⁽ᶜ⁾_σ for Min, evaluate the game under (σ, τ⁽ᶜ⁾_σ), identify the set Prof(σ) of locally profitable edge switches, and then apply a chosen subset of these switches to obtain a new σ. This asymmetric approach, while often fast in practice, is vulnerable to the exponential‑time “Friedmann traps” that have been constructed to prove lower bounds for many SI variants.
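The classic loop can be sketched concretely. Below is a minimal Python illustration on a hypothetical two-vertex discounted-payoff game; the game, the helper names (`evaluate`, `best_response`, `profitable_switches`), and the brute-force counter-strategy computation are our own illustrative choices, not the paper's implementation.

```python
import itertools

# Hypothetical two-vertex discounted-payoff game (not from the paper):
# vertex -> (owner, {successor: edge reward}).
GAME = {
    "a": ("max", {"a": 1.0, "b": 4.0}),
    "b": ("min", {"a": 0.0, "b": 2.0}),
}
LAMBDA = 0.5  # discount factor

def evaluate(sigma, tau):
    """Discounted value of each vertex under fixed positional strategies."""
    choice = {v: sigma[v] if GAME[v][0] == "max" else tau[v] for v in GAME}
    vals = {v: 0.0 for v in GAME}
    for _ in range(200):  # value iteration; converges geometrically
        vals = {v: GAME[v][1][choice[v]] + LAMBDA * vals[choice[v]]
                for v in GAME}
    return vals

def best_response(sigma):
    """Min's optimal positional counter-strategy tau^c_sigma (brute force)."""
    min_vs = [v for v in GAME if GAME[v][0] == "min"]
    best = None
    for picks in itertools.product(*(GAME[v][1] for v in min_vs)):
        tau = dict(zip(min_vs, picks))
        if best is None or sum(evaluate(sigma, tau).values()) < \
                           sum(evaluate(sigma, best).values()):
            best = tau
    return best

def profitable_switches(sigma):
    """Prof(sigma): edges that strictly improve Max's value locally."""
    vals = evaluate(sigma, best_response(sigma))
    return {(v, w)
            for v, (owner, succ) in GAME.items() if owner == "max"
            for w, r in succ.items()
            if r + LAMBDA * vals[w] > vals[v] + 1e-9}

def classic_si(sigma):
    """Classic SI: switch every profitable edge, repeat until Prof is empty."""
    while True:
        prof = profitable_switches(sigma)
        if not prof:
            return sigma
        for v, w in prof:
            sigma[v] = w
```

On this toy game, starting from σ(a) = a, one round switches to σ(a) = b, after which no profitable switch remains and the loop stops with the optimal value 16/3 at vertex a.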
The authors first discuss a naïve symmetric idea—simultaneously replacing σ by τ⁽ᶜ⁾_σ and τ by σ⁽ᶜ⁾_τ—but note that this can lead to cycles because the two updates may conflict and provide no guarantee of improvement. To overcome this, they propose the Symmetric Strategy Improvement (SSI) algorithm, which simultaneously improves both players but only using compatible profitable moves.
The SSI algorithm proceeds as follows for each iteration i:
- Compute optimal counter‑strategies: Determine τ⁽ᶜ⁾_{σ_i} (the optimal Min response to σ_i) and σ⁽ᶜ⁾_{τ_i} (the optimal Max response to τ_i). The existence of such positional counter‑strategies follows from positional determinacy of the considered game classes.
- Identify profitable moves: Compute Prof(σ_i) and Prof(τ_i), the sets of edge switches that would strictly improve the value of σ_i and τ_i respectively when evaluated against the opponent’s optimal counter‑strategy.
- Apply only intersecting moves: Update Max’s strategy to σ_{i+1} by applying any subset of Prof(σ_i) that agrees with σ⁽ᶜ⁾_{τ_i} (i.e., switches (v, w) with σ⁽ᶜ⁾_{τ_i}(v) = w). Analogously, update Min’s strategy to τ_{i+1} using any subset of Prof(τ_i) that agrees with τ⁽ᶜ⁾_{σ_i}.
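The three steps above can be sketched as one Python loop. As before, this is a minimal sketch on a hypothetical two-vertex discounted-payoff game with brute-force counter-strategy computation; the names and the game are our own illustrative choices, not the paper's implementation.

```python
import itertools

# Hypothetical two-vertex discounted-payoff game (not from the paper):
# vertex -> (owner, {successor: edge reward}).
GAME = {
    "a": ("max", {"a": 1.0, "b": 4.0}),
    "b": ("min", {"a": 0.0, "b": 2.0}),
}
LAMBDA = 0.5

def evaluate(sigma, tau):
    """Discounted value of each vertex under fixed positional strategies."""
    choice = {v: sigma[v] if GAME[v][0] == "max" else tau[v] for v in GAME}
    vals = {v: 0.0 for v in GAME}
    for _ in range(200):  # value iteration; converges geometrically
        vals = {v: GAME[v][1][choice[v]] + LAMBDA * vals[choice[v]]
                for v in GAME}
    return vals

def best_response(fixed, responder):
    """Optimal positional counter-strategy of `responder` against `fixed`."""
    mine = [v for v in GAME if GAME[v][0] == responder]
    sign = 1.0 if responder == "max" else -1.0
    best, best_score = None, None
    for picks in itertools.product(*(GAME[v][1] for v in mine)):
        strat = dict(zip(mine, picks))
        vals = (evaluate(strat, fixed) if responder == "max"
                else evaluate(fixed, strat))
        score = sign * sum(vals.values())
        if best is None or score > best_score:
            best, best_score = strat, score
    return best

def profitable(strategy, player):
    """Prof: switches strictly improving `player` against the opponent's
    optimal counter-strategy to `strategy`."""
    opp = "min" if player == "max" else "max"
    counter = best_response(strategy, opp)
    vals = (evaluate(strategy, counter) if player == "max"
            else evaluate(counter, strategy))
    prof = set()
    for v, (owner, succ) in GAME.items():
        if owner != player:
            continue
        for w, r in succ.items():
            local = r + LAMBDA * vals[w]
            better = (local > vals[v] + 1e-9 if player == "max"
                      else local < vals[v] - 1e-9)
            if better:
                prof.add((v, w))
    return prof

def ssi(sigma, tau):
    """Symmetric loop: keep only profitable switches that agree with the
    optimal response to the opponent's current strategy."""
    while True:
        sigma_c = best_response(tau, "max")    # sigma^c_{tau_i}
        tau_c = best_response(sigma, "min")    # tau^c_{sigma_i}
        s_moves = {m for m in profitable(sigma, "max") if sigma_c[m[0]] == m[1]}
        t_moves = {m for m in profitable(tau, "min") if tau_c[m[0]] == m[1]}
        if not s_moves and not t_moves:
            return sigma, tau
        for v, w in s_moves:
            sigma[v] = w
        for v, w in t_moves:
            tau[v] = w
```

On this toy game, starting from σ(a) = a and τ(b) = b, one symmetric round moves both players to their optimal choices (σ(a) = b, τ(b) = a), and the next round finds no intersecting profitable moves.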
Because each player only adopts moves that are simultaneously profitable and consistent with the opponent’s optimal counter‑strategy, the value vectors for Max (ordered by ≤) and for Min (ordered by ≥) are guaranteed to improve monotonically. Since the number of positional strategies is finite, the process must eventually reach a fixed point where no intersecting profitable moves exist; at that point the pair (σ, τ) is optimal for both players.
The authors formalize the class of games “good for strategy improvement”: they must be positionally determined, have combinable profitable updates, and be maximum‑identifying (i.e., Prof(σ)=∅ iff σ is optimal). Parity, mean‑payoff, and discounted‑payoff games satisfy these conditions. Lemma 3.1 proves that SSI terminates on any such class, extending the classic termination arguments (Theorems 2.9 and 2.10) to the symmetric setting.
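The three conditions can be read as an interface contract. A hypothetical Python sketch follows; the method names and signatures are our own rendering of the summary above, not the paper's formal definitions.

```python
from typing import Protocol, runtime_checkable, Set, Tuple

Strategy = dict                  # vertex -> chosen successor
Switch = Tuple[object, object]   # an edge switch (vertex, new successor)

@runtime_checkable
class GoodForSI(Protocol):
    """Hypothetical interface for games 'good for strategy improvement'."""

    def counter_strategy(self, strategy: Strategy, player: str) -> Strategy:
        """Positional determinacy: an optimal positional response exists."""
        ...

    def profitable(self, strategy: Strategy, player: str) -> Set[Switch]:
        """Maximum-identifying: returns the empty set iff strategy is optimal."""
        ...

    def apply(self, strategy: Strategy, moves: Set[Switch]) -> Strategy:
        """Combinable updates: applying any subset of profitable switches
        yields a strictly improved strategy."""
        ...
```

Any solver implementing these three methods for a positionally determined game class would fit the termination argument sketched in the summary.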
A major contribution is the empirical and theoretical demonstration that SSI defeats Friedmann’s traps. The paper reproduces several known exponential‑time constructions (binary counters, “snare” structures, etc.) and shows that, because SSI never adopts a move that contradicts the opponent’s optimal counter‑strategy, the pathological sequences that force classic SI into exponential loops cannot arise. Consequently, SSI achieves polynomial‑time convergence on these worst‑case instances.
From a complexity standpoint, each SSI iteration requires two optimal counter‑strategy computations (one for each player) and the evaluation of Prof sets, operations already required by classic SI. Hence the per‑iteration overhead is modest; the total number of iterations is empirically far lower than in asymmetric SI, especially on hard instances. The paper also discusses various policy choices for selecting subsets of intersecting moves (e.g., applying all, random sampling, or heuristic prioritisation) and notes that these affect iteration counts but not correctness.
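The selection policies mentioned above can be made pluggable. A small sketch, where each policy maps the set of intersecting profitable switches to the subset actually applied; the policy names are ours, as the paper only notes that such choices exist and affect iteration counts, not correctness.

```python
import random

def select_all(moves):
    """Apply every intersecting profitable switch (the greedy default)."""
    return set(moves)

def select_random(moves, k=1, rng=None):
    """Apply a random sample of at most k switches."""
    rng = rng or random.Random(0)
    return set(rng.sample(sorted(moves), min(k, len(moves))))

def select_best_first(moves, gain):
    """Apply only the switch with the largest estimated local gain,
    for a caller-supplied heuristic `gain`."""
    return {max(moves, key=gain)} if moves else set()
```

Whichever policy is used, the correctness argument only requires that the applied set is a subset of the intersecting profitable moves.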
In conclusion, the paper introduces a conceptually simple yet powerful symmetric refinement of strategy improvement. By leveraging the symmetry inherent in zero‑sum graph games, SSI guarantees monotone improvement for both players, terminates on all games that admit positional optimal strategies, and avoids known exponential lower‑bound constructions. This advances the theoretical understanding of strategy improvement and opens avenues for practical solvers in model checking, synthesis, and verification where parity and quantitative games are central. Future work may explore tighter bounds on iteration counts, adaptive move‑selection heuristics, and extensions to stochastic or multi‑player settings.