A mathematical programming based characterization of Nash equilibria of some constrained stochastic games
We consider two classes of constrained finite state-action stochastic games. First, we consider a two-player nonzero-sum single-controller constrained stochastic game with both average and discounted cost criteria. We consider the same type of constraints as in [1], i.e., player 1 has subscription-based constraints and player 2, who controls the transition probabilities, has realization-based constraints which can also depend on the strategies of player 1. Next, we consider an N-player nonzero-sum constrained stochastic game with independent state processes where each player has an average cost criterion, as discussed in [2]. We show that the stationary Nash equilibria of both classes of constrained games, which exist under strong Slater and irreducibility conditions [3], [2], are in one-to-one correspondence with the global minima of certain mathematical programs. In the single-controller game, if the constraints of player 2 do not depend on the strategies of player 1, then the mathematical program reduces to a non-convex quadratic program. In the two-player independent state processes stochastic game, if the constraints of a player do not depend on the strategies of the other player, then the mathematical program reduces to a non-convex quadratic program. Computational algorithms for finding global minima of non-convex quadratic programs exist [4], [5], and hence one can compute Nash equilibria of these constrained stochastic games. Our results generalize some existing results for zero-sum games [1], [6], [7].
Research Summary
The paper investigates two families of constrained finite state-action stochastic games and establishes a one-to-one correspondence between stationary Nash equilibria and the global minima of specially constructed mathematical programs. The first family is a two-player nonzero-sum game in which one player (player 2) alone controls the transition probabilities (the "single-controller" setting), studied under both the average and the discounted cost criterion. Player 1 faces subscription-based constraints, while player 2 is subject to realization-based constraints that may also depend on player 1's strategy. The second family consists of an N-player nonzero-sum game in which each player's state evolves independently according to his own Markov chain; each player minimizes an average cost subject to constraints that, in general, may depend on the strategies of the other players.
Two technical conditions are assumed: (i) a strong Slater condition guaranteeing that all constraints are strictly feasible, and (ii) irreducibility of the underlying Markov chains (ensuring a unique stationary distribution). Under these conditions a stationary Nash equilibrium is known to exist for both models [3], [2]. The core contribution is the construction of a Lagrangian-based mathematical program whose decision variables are the stationary strategies of all players together with their associated Lagrange multipliers. The objective function aggregates each player's expected cost plus a penalty term proportional to the multiplier times the amount by which the corresponding constraint is violated. Linear constraints enforce (a) the consistency of occupation measures with the transition dynamics (the "balance" equations) and (b) the original linear constraints on expected costs or resource usage.
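The balance equations mentioned above can be made concrete with a small sketch. The following Python snippet is illustrative only: the two-state, two-action transition law `p`, the strategy `f`, and the cost tables `c` and `d` are made-up numbers, not taken from the paper. It computes the stationary distribution of the chain induced by a stationary strategy, forms the occupation measure x(s, a) = pi(s) f(s, a), and checks that x satisfies the balance equations.

```python
# Toy data (all numbers are illustrative assumptions, not from the paper).
# p[s][a][t]: probability of moving from state s to state t under action a.
p = [
    [[0.9, 0.1], [0.2, 0.8]],
    [[0.5, 0.5], [0.3, 0.7]],
]
# f[s][a]: stationary strategy, probability of choosing action a in state s.
f = [[0.5, 0.5], [1.0, 0.0]]
# c[s][a]: one-step cost; d[s][a]: one-step cost entering a constraint.
c = [[1.0, 3.0], [2.0, 0.0]]
d = [[0.0, 1.0], [1.0, 0.0]]

S, A = range(2), range(2)


def induced_chain(p, f):
    """Transition matrix of the Markov chain induced by strategy f."""
    return [[sum(f[s][a] * p[s][a][t] for a in A) for t in S] for s in S]


def stationary_distribution(P, iters=500):
    """Power iteration on the lazy chain (P + I) / 2.

    The lazy chain has the same stationary distribution as P and is
    aperiodic, so the iteration converges for any irreducible P.
    """
    pi = [1.0 / len(P)] * len(P)
    for _ in range(iters):
        step = [sum(pi[s] * P[s][t] for s in S) for t in S]
        pi = [0.5 * pi[t] + 0.5 * step[t] for t in S]
    return pi


P = induced_chain(p, f)
pi = stationary_distribution(P)

# Occupation measure: long-run frequency of each state-action pair.
x = [[pi[s] * f[s][a] for a in A] for s in S]

# Balance equations: sum_a x(t, a) = sum_{s, a} x(s, a) p(t | s, a) for all t.
balance_residual = max(
    abs(sum(x[t]) - sum(x[s][a] * p[s][a][t] for s in S for a in A)) for t in S
)

# Average cost and constraint value are linear in the occupation measure.
average_cost = sum(x[s][a] * c[s][a] for s in S for a in A)
constraint_value = sum(x[s][a] * d[s][a] for s in S for a in A)
```

Because both the average cost and the constraint value are linear in `x`, a single player's constrained problem for fixed opponents becomes a linear program over occupation measures, which is the setting in which the Slater condition is applied.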
The main theorems show a one‑to‑one mapping: (a) any stationary Nash equilibrium together with suitable multipliers yields a feasible point that globally minimizes the program, and (b) any global minimizer of the program defines a stationary strategy profile that is a Nash equilibrium. The proof relies on the Karush‑Kuhn‑Tucker (KKT) conditions for each player’s individual constrained Markov decision problem and on the fact that, under the Slater and irreducibility assumptions, the KKT conditions are also sufficient for optimality.
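The idea that equilibria coincide with the global minima of a program can be illustrated on a much simpler object than the games treated in the paper. The sketch below uses a static, unconstrained 2x2 bimatrix cost game (an assumption made purely for illustration; the cost matrices `A` and `B` are made up) and a "Nash gap" objective in the spirit of the classical quadratic-programming formulation of bimatrix games. The gap is nonnegative everywhere and equals zero exactly at Nash equilibria, so its global minimizers are the equilibria. It is not the program constructed in the paper.

```python
# Illustrative 2x2 coordination game in cost form (both players minimize).
# A[i][j]: cost to player 1, B[i][j]: cost to player 2, when player 1
# plays row i and player 2 plays column j. Values are made up.
A = [[0.0, 1.0], [1.0, 0.0]]
B = [[0.0, 1.0], [1.0, 0.0]]


def nash_gap(x0, y0):
    """Sum over players of (own cost) - (best-response cost).

    x0 and y0 are the probabilities that player 1 plays row 0 and
    player 2 plays column 0. The gap is >= 0 and is zero exactly when
    (x, y) is a Nash equilibrium.
    """
    x = [x0, 1.0 - x0]
    y = [y0, 1.0 - y0]
    cost1 = sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))
    cost2 = sum(x[i] * B[i][j] * y[j] for i in range(2) for j in range(2))
    best1 = min(sum(A[i][j] * y[j] for j in range(2)) for i in range(2))
    best2 = min(sum(x[i] * B[i][j] for i in range(2)) for j in range(2))
    return (cost1 - best1) + (cost2 - best2)


# Brute-force global minimization over a grid of mixed strategies.
grid = [k / 20 for k in range(21)]
gaps = {(x0, y0): nash_gap(x0, y0) for x0 in grid for y0 in grid}
minimizers = [point for point, gap in gaps.items() if gap < 1e-12]
```

For this game the grid search recovers the two pure equilibria and the mixed equilibrium in which both players randomize uniformly. A grid search is only viable for toy instances; it stands in here for a proper global solver.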
A particularly useful specialization is examined. If player 2's constraints in the single-controller game are independent of player 1's strategy, the joint program collapses to a non-convex quadratic program (QP). Similarly, in the two-player independent-state game, if a player's constraints do not involve the strategies of the other player, the program again reduces to a non-convex QP. Although non-convex QPs are NP-hard in general, algorithms for computing their global minima exist [4], [5]; typical approaches include branch-and-bound schemes that use convex or semidefinite relaxations for bounding, and they are practical for problems of moderate size. Consequently, the paper offers a concrete computational pathway for obtaining Nash equilibria in these constrained stochastic games.
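Why a global method is needed can be seen on a two-variable toy problem. The sketch below minimizes an indefinite bilinear objective over the unit box; the objective and the constants `a` and `b` are assumptions chosen for illustration and are unrelated to the paper's programs. A single run of projected gradient descent stops at a local minimum that is not global, while restarting from a grid of initial points finds the global minimum. Multi-start is used here only as a simple stand-in for the global algorithms cited in the text.

```python
def objective(x, y, a=0.4, b=0.6):
    """Indefinite bilinear objective on the box [0, 1] x [0, 1]."""
    return (x - a) * (y - b)


def clip(v):
    """Project a scalar onto the interval [0, 1]."""
    return min(1.0, max(0.0, v))


def projected_gradient(x, y, a=0.4, b=0.6, lr=0.1, steps=500):
    """Projected gradient descent; converges to a local minimizer only."""
    for _ in range(steps):
        grad_x, grad_y = y - b, x - a
        x, y = clip(x - lr * grad_x), clip(y - lr * grad_y)
    return x, y


# A single start can end at a local minimum that is not global.
local_point = projected_gradient(0.1, 0.9)
local_value = objective(*local_point)

# Multi-start: run the local method from a grid of starting points
# and keep the best result found.
starts = [(i / 4, j / 4) for i in range(5) for j in range(5)]
candidates = [projected_gradient(x, y) for x, y in starts]
best_point = min(candidates, key=lambda point: objective(*point))
best_value = objective(*best_point)
```

In this example the local run ends at the vertex (0, 1), whereas the best multi-start result is the vertex (1, 0), which has a strictly lower objective value. Multi-start offers no optimality certificate in general, which is why branch-and-bound style methods are preferred when a guaranteed global minimum is required.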
The authors situate their results within the existing literature: the results generalize some existing results for zero-sum games [1], [6], [7].