Probabilistic Systems with LimSup and LimInf Objectives

📝 Original Info

  • Title: Probabilistic Systems with LimSup and LimInf Objectives
  • ArXiv ID: 0809.1465
  • Date: 2008-09-10
  • Authors: Krishnendu Chatterjee, Thomas A. Henzinger

📝 Abstract

We give polynomial-time algorithms for computing the values of Markov decision processes (MDPs) with limsup and liminf objectives. A real-valued reward is assigned to each state, and the value of an infinite path in the MDP is the limsup (resp. liminf) of all rewards along the path. The value of an MDP is the maximal expected value of an infinite path that can be achieved by resolving the decisions of the MDP. Using our result on MDPs, we show that turn-based stochastic games with limsup and liminf objectives can be solved in NP ∩ coNP.
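To make the notion of a path value concrete, here is a minimal sketch under a simplifying assumption that is not part of the abstract: the infinite path is eventually periodic, that is, a finite prefix followed by a cycle that repeats forever. In that case only the rewards on the cycle occur infinitely often, so the limsup is the largest cycle reward and the liminf is the smallest. The function name `limsup_liminf` and the sample rewards are hypothetical.

```python
def limsup_liminf(prefix_rewards, cycle_rewards):
    """Return (limsup, liminf) of the reward sequence prefix . cycle^omega.

    Assumption: the path is eventually periodic. Rewards in the prefix are
    seen only finitely often, so they do not influence either limit.
    """
    if not cycle_rewards:
        raise ValueError("the repeated cycle must contain at least one reward")
    return max(cycle_rewards), min(cycle_rewards)


# A path that first sees rewards 10 and 3, then loops over 1, 4, 2 forever.
PATH_VALUES = limsup_liminf([10.0, 3.0], [1.0, 4.0, 2.0])
print(PATH_VALUES)
```

The high reward 10 in the prefix is ignored, which is the difference between a limsup objective and a max objective.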

📄 Full Content

arXiv:0809.1465v1 [cs.GT] 9 Sep 2008

Probabilistic Systems with LimSup and LimInf Objectives

Krishnendu Chatterjee¹ and Thomas A. Henzinger¹,²
¹ EECS, UC Berkeley, USA
² EPFL, Switzerland
{c_krish,tah}@eecs.berkeley.edu

Abstract. We give polynomial-time algorithms for computing the values of Markov decision processes (MDPs) with limsup and liminf objectives. A real-valued reward is assigned to each state, and the value of an infinite path in the MDP is the limsup (resp. liminf) of all rewards along the path. The value of an MDP is the maximal expected value of an infinite path that can be achieved by resolving the decisions of the MDP. Using our result on MDPs, we show that turn-based stochastic games with limsup and liminf objectives can be solved in NP ∩ coNP.

1 Introduction

A turn-based stochastic game is played on a finite graph with three types of states: in player-1 states, the first player chooses a successor state from a given set of outgoing edges; in player-2 states, the second player chooses a successor state from a given set of outgoing edges; and in probabilistic states, the successor state is chosen according to a given probability distribution. The game results in an infinite path through the graph. Every such path is assigned a real value, and the objective of player 1 is to resolve her choices so as to maximize the expected value of the resulting path, while the objective of player 2 is to minimize the expected value. If the function that assigns values to infinite paths is a Borel function (in the Cantor topology on infinite paths), then the game is determined [12]: the maximal expected value achievable by player 1 is equal to the minimal expected value achievable by player 2, and it is called the value of the game.

There are several canonical functions for assigning values to infinite paths. If each state is given a reward, then the max (resp. min) functions choose the maximum (resp.
minimum) of the infinitely many rewards along a path; the limsup (resp. liminf) functions choose the limsup (resp. liminf) of the infinitely many rewards; and the limavg function chooses the long-run average of the rewards. For the Borel level-1 functions max and min, as well as for the Borel level-3 function limavg, computing the value of a game is known to be in NP ∩ coNP [10]. However, for the Borel level-2 functions limsup and liminf, only special cases have been considered so far. If there are no probabilistic states (in this case, the game is called deterministic), then the game value can be computed in polynomial time using value-iteration algorithms [1]; likewise, if all states are given reward 0 or 1 (in this case, limsup is a Büchi objective, and liminf is a coBüchi objective), then the game value can be decided in NP ∩ coNP [3]. In this paper, we show that the values of general turn-based stochastic games with limsup and liminf objectives can be computed in NP ∩ coNP.

It is known that pure memoryless strategies suffice for achieving the value of turn-based stochastic games with limsup and liminf objectives [9]. A strategy is pure if the player always chooses a unique successor state (rather than a probability distribution of successor states); a pure strategy is memoryless if at every state, the player always chooses the same successor state. Hence a pure memoryless strategy for player 1 is a function from player-1 states to outgoing edges (and similarly for player 2). Since pure memoryless strategies offer polynomial witnesses, our result will follow from polynomial-time algorithms for computing the values of Markov decision processes (MDPs) with limsup and liminf objectives. We provide such algorithms. An MDP is the special case of a turn-based stochastic game which contains no player-1 (or player-2) states.
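The idea of a pure memoryless strategy as a function from states to successors can be illustrated on a deterministic game (no probabilistic states). The following is a minimal sketch, not the algorithm of the paper: it enumerates every pair of pure memoryless strategies, which takes exponential time, and is meant only to show what such a strategy looks like as a witness. Once both strategies are fixed, the play from a start state is a single path that eventually cycles, and its limsup is the largest reward on that cycle. All names (`lasso_limsup`, `memoryless_strategies`, `brute_force_value`) and the four-state example game are hypothetical.

```python
from itertools import product


def lasso_limsup(start, succ, reward):
    """Follow the unique path from `start` under the successor map `succ`
    until a state repeats; return the largest reward on the repeated cycle."""
    seen = {}   # state -> index of its first visit on the path
    path = []
    state = start
    while state not in seen:
        seen[state] = len(path)
        path.append(state)
        state = succ[state]
    cycle = path[seen[state]:]
    return max(reward[s] for s in cycle)


def memoryless_strategies(states, edges):
    """Enumerate pure memoryless strategies: one fixed successor per state."""
    states = list(states)
    for choice in product(*(edges[s] for s in states)):
        yield dict(zip(states, choice))


def brute_force_value(start, p1_states, p2_states, edges, reward):
    """Max over player-1 strategies of the min over player-2 strategies.

    Assumption: a deterministic game, where pure memoryless strategies
    suffice, so this max-min equals the limsup value at `start`.
    """
    best = float("-inf")
    for sigma in memoryless_strategies(p1_states, edges):
        worst = min(
            lasso_limsup(start, {**sigma, **pi}, reward)
            for pi in memoryless_strategies(p2_states, edges)
        )
        best = max(best, worst)
    return best


# Hypothetical game: player 1 owns a, c, d; player 2 owns b.
EDGES = {"a": ["b", "c"], "b": ["a", "d"], "c": ["c"], "d": ["d"]}
REWARD = {"a": 0, "b": 5, "c": 2, "d": 1}
GAME_VALUE = brute_force_value("a", ["a", "c", "d"], ["b"], EDGES, REWARD)
print(GAME_VALUE)
```

In this example, moving from a to b lets player 2 escape to d and hold the limsup at 1, so player 1 does better by moving to c and securing 2. Checking one fixed strategy pair is cheap; it is the search over all pairs that is expensive, which is why the strategies serve as polynomial witnesses rather than as an algorithm.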
Using algorithms for solving MDPs with Büchi and coBüchi objectives, we give polynomial-time reductions from MDPs with limsup and liminf objectives to MDPs with max objectives. The solution of MDPs with max objectives is computable by linear programming, and the linear program for MDPs with max objectives is obtained by generalizing the linear program for MDPs with reachability objectives. This will conclude our argument.

Related work. Games with limsup and liminf objectives have been widely studied in game theory; for example, Maitra and Sudderth [11] present several results about games with limsup and liminf objectives. In particular, they show the existence of values in limsup and liminf games that are more general than turn-based stochastic games, such as concurrent games, where the two players repeatedly choose their moves simultaneously and independently, and games with infinite state spaces. Gimbert and Zielonka have studied the strategy complexity of games with limsup and liminf objectives: the sufficiency of pure memoryless strategies for deterministic games was shown in [8], and for turn-based stochastic games, in [9]. Polynomial-time algorithms for MDPs with Büchi and coBüchi objec

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
