The complexity of interior point methods for solving discounted turn-based stochastic games

Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

We study the problem of solving discounted, two player, turn based, stochastic games (2TBSGs). Jurdzinski and Savani showed that 2TBSGs with deterministic transitions can be reduced to solving $P$-matrix linear complementarity problems (LCPs). We show that the same reduction works for general 2TBSGs. This implies that a number of interior point methods for solving $P$-matrix LCPs can be used to solve 2TBSGs. We consider two such algorithms. First, we consider the unified interior point method of Kojima, Megiddo, Noma, and Yoshise, which runs in time $O((1+\kappa)n^{3.5}L)$, where $\kappa$ is a parameter that depends on the $n \times n$ matrix $M$ defining the LCP, and $L$ is the number of bits in the representation of $M$. Second, we consider the interior point potential reduction algorithm of Kojima, Megiddo, and Ye, which runs in time $O(\frac{-\delta}{\theta}n^4\log \epsilon^{-1})$, where $\delta$ and $\theta$ are parameters that depend on $M$, and $\epsilon$ describes the quality of the solution. For 2TBSGs with $n$ states and discount factor $\gamma$ we prove that in the worst case $\kappa = \Theta(n/(1-\gamma)^2)$, $-\delta = \Theta(\sqrt{n}/(1-\gamma))$, and $1/\theta = \Theta(n/(1-\gamma)^2)$. The lower bounds for $\kappa$, $-\delta$, and $1/\theta$ are obtained using the same family of deterministic games.


💡 Research Summary

The paper investigates the computational complexity of solving discounted two‑player turn‑based stochastic games (2TBSGs) by leveraging interior‑point methods for P‑matrix linear complementarity problems (LCPs). The authors first extend the reduction originally presented by Jurdzinski and Savani for deterministic 2TBSGs to the general stochastic case. In this reduction each state‑action pair becomes a variable, and the game’s cost and transition data are encoded into a matrix M and a vector q, forming an LCP (M,q): the problem of finding vectors w, z ≥ 0 with w = Mz + q and wᵀz = 0. They provide an alternative proof that the resulting matrix M is always a P‑matrix, i.e., all its principal minors are positive, which guarantees that the LCP (M,q) has a unique solution for every q.
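The P‑matrix condition can be checked directly on small examples. The following is a minimal sketch, not taken from the paper: the function names `determinant` and `is_p_matrix` are illustrative, and the brute‑force enumeration of all principal minors takes exponential time, so it is only suitable for tiny matrices.

```python
from fractions import Fraction
from itertools import combinations


def determinant(matrix):
    """Exact determinant via Gaussian elimination over the rationals."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def is_p_matrix(matrix):
    """Return True if every principal minor of the square matrix is positive."""
    n = len(matrix)
    for size in range(1, n + 1):
        for idx in combinations(range(n), size):
            # Principal submatrix: same index set for rows and columns.
            sub = [[matrix[i][j] for j in idx] for i in idx]
            if determinant(sub) <= 0:
                return False
    return True
```

For instance, `is_p_matrix([[1, -3], [0, 1]])` holds even though the matrix is not symmetric, while `is_p_matrix([[1, 2], [2, 1]])` fails because the full determinant is −3.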

Having obtained a P‑matrix formulation, the paper studies two well‑known interior‑point algorithms:

  1. Unified interior‑point method (Kojima, Megiddo, Noma, Yoshise) – its running time is O((1+κ) n^{3.5} L), where n is the dimension of M, L is the bit‑length of the input, and κ is the smallest non‑negative number such that M is a P∗(κ)‑matrix. The authors prove that for any 2TBSG with n states and discount factor γ, κ = Θ( n / (1−γ)^2 ). Consequently the algorithm runs in O( n^{4.5} L / (1−γ)^2 ) time in the worst case.

  2. Potential‑reduction interior‑point method (Kojima, Megiddo, Ye) – its running time is O( (−δ / θ) n^4 log ε^{−1}), where ε is the desired accuracy, δ is the smallest eigenvalue of (M+Mᵀ)/2, and θ is the positive P‑matrix number of M (θ = min_{‖x‖=1} max_i x_i (M x)_i). The paper shows that for the matrices arising from 2TBSGs, −δ = Θ( √n / (1−γ) ) and 1/θ = Θ( n / (1−γ)^2 ). Hence the algorithm’s worst‑case complexity becomes O( n^{5.5} log ε^{−1} / (1−γ)^3 ).
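As a worked check, the two worst‑case running times follow by substituting the parameter bounds into the general running times. This is plain arithmetic on the stated bounds, not an additional result from the paper:

```latex
\begin{align*}
(1+\kappa)\, n^{3.5} L
  &= O\!\left(\frac{n}{(1-\gamma)^2} \cdot n^{3.5} L\right)
   = O\!\left(\frac{n^{4.5} L}{(1-\gamma)^2}\right), \\
\frac{-\delta}{\theta}\, n^{4} \log \epsilon^{-1}
  &= O\!\left(\frac{\sqrt{n}}{1-\gamma} \cdot \frac{n}{(1-\gamma)^2} \cdot n^{4} \log \epsilon^{-1}\right)
   = O\!\left(\frac{n^{5.5} \log \epsilon^{-1}}{(1-\gamma)^3}\right).
\end{align*}
```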

To demonstrate that these bounds are tight, the authors construct a family of deterministic games G_n with two actions per state. By carefully choosing the transition structure and the discount factor, they obtain matrices for which κ, −δ, and 1/θ are Ω( n / (1−γ)^2 ), Ω( √n / (1−γ) ), and Ω( n / (1−γ)^2 ), respectively. The same family of games yields all three lower bounds, so the upper bounds on κ, −δ, and 1/θ are tight up to constant factors.

The paper compares these interior‑point results with the classic value‑iteration algorithm, which solves discounted 2TBSGs in O( n m L / (1−γ) log 1/(1−γ) ) time when each state has two actions (m = 2n). The interior‑point methods incur higher powers of 1/(1−γ) (quadratic or cubic) and larger polynomial factors in n, indicating that, with the currently known analyses, they do not improve upon value iteration for the general case where γ is part of the input.
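For reference, value iteration on a discounted 2TBSG can be sketched as follows. This is a generic textbook version, not necessarily the variant whose running time is quoted above. The game encoding (an owner tag `"min"` or `"max"` per state, and actions given as a cost plus a successor distribution), the stopping tolerance, and the function name `value_iteration` are all assumptions made for illustration.

```python
def value_iteration(states, gamma, tol=1e-10, max_iters=100_000):
    """
    Approximate the values of a discounted turn-based stochastic game.

    states: list of (owner, actions). owner is "min" or "max" (hypothetical
    encoding); each action is (cost, {successor_index: probability}).
    gamma: discount factor in [0, 1).
    """
    values = [0.0] * len(states)
    for _ in range(max_iters):
        new_values = []
        for owner, actions in states:
            # One-step lookahead value of each action available in this state.
            candidates = [
                cost + gamma * sum(p * values[t] for t, p in dist.items())
                for cost, dist in actions
            ]
            best = min(candidates) if owner == "min" else max(candidates)
            new_values.append(best)
        diff = max(abs(a - b) for a, b in zip(new_values, values))
        values = new_values
        if diff < tol:  # the update is a gamma-contraction, so this terminates
            break
    return values


# A two-state example game (made up for illustration).
example_game = [
    ("min", [(1.0, {0: 1.0}), (0.0, {0: 0.5, 1: 0.5})]),
    ("max", [(2.0, {1: 1.0}), (0.0, {0: 1.0})]),
]
example_values = value_iteration(example_game, gamma=0.5)
# example_values is approximately [1.3333, 4.0]
```

In the example, the maximizer's self-loop is worth 2 / (1 − 0.5) = 4, and the minimizer prefers the stochastic action, whose value solves v = 0.25·v + 0.25·4, giving v = 4/3.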

Nevertheless, the authors argue that interior‑point techniques remain promising because they bring powerful tools from linear programming and complementarity theory. Since the worst‑case bounds on κ, δ, and θ are tight, any improvement would have to come from analyses that exploit the structure of 2TBSGs beyond these parameters, or from new interior‑point schemes tailored to that structure. The paper concludes by emphasizing the importance of further research on interior‑point methods for stochastic games, as they may ultimately provide polynomial‑time solutions for discounted 2TBSGs without fixing the discount factor.

