arXiv:0809.1465v1 [cs.GT] 9 Sep 2008
Probabilistic Systems with LimSup and LimInf Objectives
Krishnendu Chatterjee (1) and Thomas A. Henzinger (1, 2)
(1) EECS, UC Berkeley, USA
(2) EPFL, Switzerland
{c_krish,tah}@eecs.berkeley.edu
Abstract. We give polynomial-time algorithms for computing the values of Markov decision processes (MDPs) with limsup and liminf objectives. A real-valued reward is assigned to each state, and the value of an infinite path in the MDP is the limsup (resp. liminf) of all rewards along the path. The value of an MDP is the maximal expected value of an infinite path that can be achieved by resolving the decisions of the MDP. Using our result on MDPs, we show that turn-based stochastic games with limsup and liminf objectives can be solved in NP ∩ coNP.
1 Introduction
A turn-based stochastic game is played on a finite graph with three types of states: in player-1 states, the first player chooses a successor state from a given set of outgoing edges; in player-2 states, the second player chooses a successor state from a given set of outgoing edges; and in probabilistic states, the successor state is chosen according to a given probability distribution. The game results in an infinite path through the graph. Every such path is assigned a real value, and the objective of player 1 is to resolve her choices so as to maximize the expected value of the resulting path, while the objective of player 2 is to minimize the expected value. If the function that assigns values to infinite paths is a Borel function (in the Cantor topology on infinite paths), then the game is determined [12]: the maximal expected value achievable by player 1 is equal to the minimal expected value achievable by player 2, and this common quantity is called the value of the game.
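In symbols (the notation here is our own assumption, not yet fixed by the text): write σ and τ for strategies of players 1 and 2, f for the Borel function assigning values to infinite paths, and E_s^{σ,τ}[f] for the expected value of f when the game starts in state s. Determinacy [12] then says that the two orders of optimization agree at every state s:

```latex
\[
  \mathrm{val}(s) \;=\; \sup_{\sigma} \inf_{\tau} \mathbb{E}_s^{\sigma,\tau}[f]
  \;=\; \inf_{\tau} \sup_{\sigma} \mathbb{E}_s^{\sigma,\tau}[f].
\]
```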
There are several canonical functions for assigning values to infinite paths. If each state is given a reward, then the max (resp. min) functions choose the maximum (resp. minimum) of the infinitely many rewards along a path; the limsup (resp. liminf) functions choose the limsup (resp. liminf) of the infinitely many rewards; and the limavg function chooses the long-run average of the rewards. For the Borel level-1 functions max and min, as well as for the Borel level-3 function limavg, computing the value of a game is known to be in NP ∩ coNP [10]. However, for the Borel level-2 functions limsup and liminf, only special cases have been considered so far. If there are no probabilistic states (in this case, the game is called deterministic), then the game value can be computed in polynomial time using value-iteration algorithms [1]; likewise, if all states are given reward 0 or 1 (in this case, limsup is a Büchi objective, and liminf is a coBüchi objective), then the game value can be decided in NP ∩ coNP [3]. In this paper, we show that the values of general turn-based stochastic games with limsup and liminf objectives can be computed in NP ∩ coNP.
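One way to see how the five path functions differ is to evaluate them on an ultimately periodic (lasso) path: a finite prefix followed by a cycle repeated forever. Rewards on the prefix occur only finitely often, so they can affect max and min but not limsup, liminf, or limavg, which depend on the cycle alone. The following is a minimal sketch assuming integer rewards listed in path order; the function name `lasso_values` is hypothetical.

```python
from fractions import Fraction
from typing import Sequence


def lasso_values(prefix: Sequence[int], cycle: Sequence[int]) -> dict:
    """Values of the infinite path prefix . cycle^omega under five path functions.

    Hypothetical helper: rewards in `prefix` are seen once, rewards in
    `cycle` are seen infinitely often.
    """
    if not cycle:
        raise ValueError("an infinite lasso path needs a non-empty cycle")
    everything = list(prefix) + list(cycle)
    return {
        "max": max(everything),       # Borel level 1: the prefix matters
        "min": min(everything),
        "limsup": max(cycle),         # Borel level 2: only the cycle matters
        "liminf": min(cycle),
        "limavg": Fraction(sum(cycle), len(cycle)),  # long-run average of the cycle
    }
```

On the prefix 5, 0 followed by the cycle 1, 3, 2 this yields max 5, min 0, limsup 3, liminf 1, and limavg 2. With 0/1 rewards, the prefix 1 followed by the cycle 0 has max 1 but limsup 0: reaching a reward-1 state once is not enough for the Büchi reading of limsup.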
It is known that pure memoryless strategies suffice for achieving the value of turn-based stochastic games with limsup and liminf objectives [9]. A strategy is pure if the player always chooses a unique successor state (rather than a probability distribution over successor states); a pure strategy is memoryless if at every state, the player always chooses the same successor state, regardless of the history. Hence a pure memoryless strategy for player 1 is a function from player-1 states to outgoing edges (and similarly for player 2). Since pure memoryless strategies are polynomial-size witnesses, our result will follow from polynomial-time algorithms for computing the values of Markov decision processes (MDPs) with limsup and liminf objectives: once a pure memoryless strategy of one player is fixed, the remaining decisions form an MDP for the other player, whose value can then be computed in polynomial time. We provide such algorithms.
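To make the witness-checking step concrete, the sketch below fixes a pure memoryless strategy in a small MDP and evaluates the resulting Markov chain. It relies on a standard fact about finite Markov chains (not a claim from this paper): with probability 1 a path eventually enters a bottom strongly connected component and visits each of its states infinitely often, so the limsup (resp. liminf) of the path is the maximum (resp. minimum) reward in the bottom component it reaches. The expected value from the remaining (transient) states then follows from an exact linear system. The functions `induced_chain`, `reachable`, `solve`, and `expected_lim`, the dictionary encoding, and the example MDP are all hypothetical. Comparing strategies by enumeration, as in the last lines, is exponential in general and is not the polynomial-time algorithm of this paper.

```python
from collections import deque
from fractions import Fraction


def induced_chain(prob, strategy):
    """Markov chain obtained by fixing a pure memoryless strategy.

    `prob` maps probabilistic states to {successor: probability};
    `strategy` maps each player state to its chosen successor.
    """
    chain = {s: dict(dist) for s, dist in prob.items()}
    chain.update({s: {t: Fraction(1)} for s, t in strategy.items()})
    return chain


def reachable(chain, s):
    """All states reachable from s (including s), by breadth-first search."""
    seen, todo = {s}, deque([s])
    while todo:
        for t in chain[todo.popleft()]:
            if t not in seen:
                seen.add(t)
                todo.append(t)
    return seen


def solve(a, b):
    """Gauss-Jordan elimination over Fractions; `a` must be nonsingular."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def expected_lim(chain, reward, agg):
    """Expected limsup (agg=max) or liminf (agg=min) from every state."""
    reach = {s: reachable(chain, s) for s in chain}
    # s lies in a bottom SCC iff every state it reaches can reach s back;
    # that SCC is then exactly reach[s].
    bottom = {s for s in chain if all(s in reach[t] for t in reach[s])}
    value = {s: agg(reward[t] for t in reach[s]) for s in bottom}
    transient = [s for s in chain if s not in bottom]
    idx = {s: i for i, s in enumerate(transient)}
    # x_s - sum_{t transient} P(s,t) x_t = sum_{t bottom} P(s,t) value_t
    a = [[Fraction(0)] * len(transient) for _ in transient]
    b = [Fraction(0)] * len(transient)
    for s in transient:
        i = idx[s]
        a[i][i] += 1
        for t, p in chain[s].items():
            if t in idx:
                a[i][idx[t]] -= p
            else:
                b[i] += p * value[t]
    x = solve(a, b)
    value.update({s: x[idx[s]] for s in transient})
    return value


# Hypothetical MDP: at player state "c", choose "safe" (reward 2 forever) or
# "gamble" (fair coin: a cycle alternating rewards 5 and 0, or a sink with reward 1).
reward = {"c": 0, "safe": 2, "gamble": 0, "hi": 5, "hi2": 0, "lo": 1}
prob = {
    "safe": {"safe": Fraction(1)},
    "gamble": {"hi": Fraction(1, 2), "lo": Fraction(1, 2)},
    "hi": {"hi2": Fraction(1)},
    "hi2": {"hi": Fraction(1)},
    "lo": {"lo": Fraction(1)},
}
for choice in ("safe", "gamble"):
    chain = induced_chain(prob, {"c": choice})
    print(choice, expected_lim(chain, reward, max)["c"],
          expected_lim(chain, reward, min)["c"])
# prints "safe 2 2" and then "gamble 3 1/2"
```

In this toy MDP the two objectives favor different pure memoryless strategies: gambling gives the larger expected limsup (3 versus 2), while playing safe gives the larger expected liminf (2 versus 1/2).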
An MDP is the special case of a turn-based stochastic game which contains no player-1 (or player-2) states. Using algorithms for solving MDPs with Büchi and coBüchi objectives, we give polynomial-time reductions from MDPs with limsup and liminf objectives to MDPs with max objectives. The solution of MDPs with max objectives is computable by linear programming, and the linear program for MDPs with max objectives is obtained by generalizing the linear program for MDPs with reachability objectives. This will conclude our argument.
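For reference, here is the standard linear program for maximal reachability probabilities in an MDP, the starting point that is generalized to max objectives; the generalization itself is not reconstructed here. The notation is our assumption: S_1 are the player-1 states, S_P the probabilistic states, E the edges, δ the transition function, and T the target set.

```latex
\[
\begin{aligned}
  \text{minimize}\quad & \textstyle\sum_{s \in S} x_s \\
  \text{subject to}\quad & x_s = 1 && s \in T,\\
  & x_s \ge x_t && s \in S_1 \setminus T,\ (s,t) \in E,\\
  & x_s \ge \textstyle\sum_{t \in S} \delta(s)(t)\, x_t && s \in S_P \setminus T,\\
  & x_s \ge 0 && s \in S.
\end{aligned}
\]
```

Every feasible x dominates the vector of maximal reachability probabilities (it is a nonnegative post-fixed point of the Bellman operator, whose least fixed point is that vector), and that vector is itself feasible, so it is the unique optimal solution.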
Related work. Games with limsup and liminf objectives have been widely studied in game theory; for example, Maitra and Sudderth [11] present several results about games with limsup and liminf objectives. In particular, they show the existence of values in limsup and liminf games that are more general than turn-based stochastic games, such as concurrent games, where the two players repeatedly choose their moves simultaneously and independently, and games with infinite state spaces. Gimbert and Zielonka have studied the strategy complexity of games with limsup and liminf objectives: the sufficiency of pure memoryless strategies for deterministic games was shown in [8], and for turn-based stochastic games, in [9]. Polynomial-time algorithms for MDPs with Büchi and coBüchi objec
…(Full text truncated)…