
Strategy Improvement for Concurrent Safety Games∗

Krishnendu Chatterjee§   Luca de Alfaro§   Thomas A. Henzinger†‡
§ CE, University of California, Santa Cruz, USA
† EECS, University of California, Berkeley, USA
‡ Computer and Communication Sciences, EPFL, Switzerland
{c_krish,tah}@eecs.berkeley.edu, luca@soe.ucsc.edu

Abstract

We consider concurrent games played on graphs. At every round of the game, each player simultaneously and independently selects a move; the moves jointly determine the transition to a successor state. Two basic objectives are the safety objective, "stay forever in a set F of states", and its dual, the reachability objective, "reach a set R of states". We present in this paper a strategy improvement algorithm for computing the value of a concurrent safety game, that is, the maximal probability with which player 1 can enforce the safety objective. The algorithm yields a sequence of player-1 strategies which ensure probabilities of winning that converge monotonically to the value of the safety game. The significance of the result is twofold. First, while strategy improvement algorithms were known for Markov decision processes and turn-based games, as well as for concurrent reachability games, this is the first strategy improvement algorithm for concurrent safety games. Second, and most importantly, the improvement algorithm provides a way to approximate the value of a concurrent safety game from below (the known value-iteration algorithms approximate the value from above). Thus, when used together with value-iteration algorithms, or with strategy improvement algorithms for reachability games, our algorithm leads to the first practical algorithm for computing converging upper and lower bounds for the value of reachability and safety games.

1 Introduction

We consider games played between two players on graphs.
At every round of the game, each of the two players selects a move; the moves of the players then determine the transition to the successor state. A play of the game gives rise to a path on the graph. We consider two basic goals for the players: reachability and safety. In the reachability goal, player 1 must reach a set of target states or, if randomization is needed to play the game, then player 1 must maximize the probability of reaching the target set. In the safety goal, player 1 must ensure that a set of target states is never left or, if randomization is required, then player 1 must ensure that the probability of leaving the target set is as low as possible. The two goals are dual, and the games are determined: the maximal probability with which player 1 can reach a target set is equal to one minus the maximal probability with which player 2 can confine the game in the complement set [18].

∗ This research was supported in part by the NSF grants CCR-0132780, CNS-0720884, and CCR-0225610, and by the Swiss National Science Foundation.

These games on graphs can be divided into two classes: turn-based and concurrent. In turn-based games, only one player has a choice of moves at each state; in concurrent games, at each state both players choose a move, simultaneously and independently, from a set of available moves. For turn-based games, the solution of games with reachability and safety goals has long been known. If the move played uniquely determines the successor state, the games can be solved in linear time in the size of the game graph.
If the move played determines a probability distribution over the successor states, the problem of deciding whether a safety or reachability game can be won with probability greater than p ∈ [0, 1] is in NP ∩ coNP [5], and the exact value of a game can be computed by strategy improvement algorithms [6]. These results all hinge on the fact that turn-based reachability and safety games can be optimally won with deterministic, memoryless strategies. Such strategies are functions from states to moves, so they are finite in number, and this guarantees the termination of the algorithms. The situation is different for the concurrent case, where randomization is needed even when the moves played by the players uniquely determine the successor state. The value of the game is defined, as usual, as the sup-inf value: the supremum, over all strategies of player 1, of the infimum, over all strategies of player 2, of the probability of achieving the safety or reachability goal. In concurrent reachability games, players are only guaranteed the existence of ε-optimal strategies, which ensure that the value of the game is achieved within a specified ε > 0 [17]; these strategies (which depend on ε) are memoryless, but in general need randomization [10]. However, for concurrent safety games memoryless optimal strategies exist [11]. Thus, these strategies are mappings from states to probability distributions over moves. While complexity results are available for the solution of concurrent reachability and safety games, practical algorithms for their solution, which can provide both a value and an estimated error, have so far been lacking. The question of whether the value of a concurrent reachability or safety game is at least p ∈ [0, 1] can be decided in PSPACE via a reduction to the theory of the real closed field [13].
This yields a binary-search algorithm to approximate the value. This approach is theoretical, however, and impractical owing to the complexity of the decision procedures for the theory of reals. Thus far, the only practical approaches to the solution of concurrent safety and reachability games have been via value iteration, and via strategy improvement for reachability games. In [11] it was shown how to construct a series of valuations that approximates from below, and converges to, the value of a reachability game; the same algorithm provides valuations converging from above to the value of a safety game. In [4], it was shown how to construct a series of strategies for reachability games that converge towards optimality. Neither scheme is guaranteed to terminate, not even strategy improvement, since in general only ε-optimal strategies are guaranteed to exist. Both of these approximation schemes lead to practical algorithms. The problem with both schemes, however, is that they provide only lower bounds for the value of reachability games, and only upper bounds for the value of safety games. As no bounds are available for the speed of convergence of these algorithms, the question of how to derive the matching bounds has so far been open. In this paper, we present the first strategy improvement algorithm for the solution of concurrent safety games. Given a safety goal for player 1, the algorithm computes a sequence of memoryless, randomized strategies π_1^0, π_1^1, π_1^2, ... for player 1 that converge towards optimality.
Although memoryless randomized optimal strategies exist for safety goals [11], the strategy improvement algorithm may not converge in finitely many iterations: indeed, optimal strategies may require moves to be played with irrational probabilities, while the strategies produced by the algorithm play moves with probabilities that are rational numbers. The main significance of the algorithm is that it provides a converging sequence of lower bounds for the value of a safety game, and dually, of upper bounds for the value of a reachability game. To obtain such bounds, it suffices to compute the value v_k(s) provided by π_1^k at a state s, for k > 0. Once π_1^k is fixed, the game is reduced to a Markov decision process, and the value v_k(s) of the safety game can be computed at all s, e.g., via linear programming [7, 3]. Thus, together with the value or strategy improvement algorithms of [11, 4], the algorithm presented in this paper provides the first practical way of computing converging lower and upper bounds for the values of concurrent reachability and safety games. We also present a detailed analysis of termination criteria for turn-based stochastic games, and obtain an improved upper bound for termination for turn-based stochastic games. The strategy improvement algorithm for reachability games of [4] is based on locally improving the strategy on the basis of the valuation it yields. This approach does not suffice for safety games: the sequence of strategies obtained would yield increasing values to player 1, but these values would not necessarily converge to the value of the game. In this paper, we introduce a novel, non-local improvement step, which augments the standard value-based improvement step. The non-local step involves the analysis of an appropriately-constructed turn-based game.
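The combined bounding scheme described above (lower bounds from a sequence of improving strategies, upper bounds from value iteration) amounts to a simple stopping rule. A hedged sketch in Python, where the two bound sequences are stand-ins for the actual algorithms:

```python
# Sketch of the bounding scheme: interleave a sequence of lower bounds
# (e.g., values of the strategies produced by strategy improvement) with a
# sequence of upper bounds (e.g., from value iteration) until the gap closes.
# `lower_bounds` and `upper_bounds` are hypothetical iterators standing in
# for the two algorithms; only the stopping rule is shown here.

def approximate_value(lower_bounds, upper_bounds, eps):
    """Return a pair (lo, hi) with hi - lo <= eps bracketing the game value."""
    for lo, hi in zip(lower_bounds, upper_bounds):
        if hi - lo <= eps:
            return lo, hi
    raise RuntimeError("bound sequences exhausted before converging")

# Toy usage: two mock sequences converging to a game value of 0.5.
los = iter([0.0, 0.25, 0.4, 0.49, 0.499])
his = iter([1.0, 0.75, 0.6, 0.51, 0.501])
print(approximate_value(los, his, 0.05))  # -> (0.49, 0.51)
```

Since neither bound sequence is guaranteed to reach the (possibly irrational) value exactly, termination is only guaranteed up to the requested precision eps.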
As value iteration for safety games converges from above, while our sequences of strategies yield values that converge from below, the proof of convergence for our algorithm cannot be derived from a connection with value iteration, as was the case for reachability games. Thus, we developed new proof techniques to show both the monotonicity of the strategy values produced by our algorithm, and their convergence to the value of the game.

2 Definitions

Notation. For a countable set A, a probability distribution on A is a function δ : A → [0, 1] such that Σ_{a∈A} δ(a) = 1. We denote the set of probability distributions on A by D(A). Given a distribution δ ∈ D(A), we denote by Supp(δ) = {x ∈ A | δ(x) > 0} the support set of δ.

Definition 1 (Concurrent games) A (two-player) concurrent game structure G = ⟨S, M, Γ_1, Γ_2, δ⟩ consists of the following components:
• A finite state space S and a finite set M of moves or actions.
• Two move assignments Γ_1, Γ_2 : S → 2^M \ ∅. For i ∈ {1, 2}, assignment Γ_i associates with each state s ∈ S a nonempty set Γ_i(s) ⊆ M of moves available to player i at state s.
• A probabilistic transition function δ : S × M × M → D(S) that gives the probability δ(s, a_1, a_2)(t) of a transition from s to t when player 1 chooses at state s move a_1 and player 2 chooses move a_2, for all s, t ∈ S and a_1 ∈ Γ_1(s), a_2 ∈ Γ_2(s).

We denote by |δ| the size of the transition function, i.e., |δ| = Σ_{s∈S, a∈Γ_1(s), b∈Γ_2(s), t∈S} |δ(s, a, b)(t)|, where |δ(s, a, b)(t)| is the number of bits required to specify the transition probability δ(s, a, b)(t). We denote by |G| the size of the game graph, where |G| = |δ| + |S|. At every state s ∈ S, player 1 chooses a move a_1 ∈ Γ_1(s), and simultaneously and independently player 2 chooses a move a_2 ∈ Γ_2(s).
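Definition 1 can be transcribed directly into code. The following is a minimal sketch (the class and field names are ours, not the paper's), including the sampling of one round of the game:

```python
import random

# A minimal encoding of a concurrent game structure <S, M, Gamma_1, Gamma_2, delta>:
# moves1[s] is Gamma_1(s), moves2[s] is Gamma_2(s), and delta[(s, a1, a2)] is a
# dict mapping each successor state t to the probability delta(s, a1, a2)(t).

class ConcurrentGame:
    def __init__(self, states, moves1, moves2, delta):
        self.states = states
        self.moves1 = moves1
        self.moves2 = moves2
        self.delta = delta

    def step(self, s, a1, a2, rng=random):
        # Sample the successor state t with probability delta(s, a1, a2)(t).
        dist = self.delta[(s, a1, a2)]
        return rng.choices(list(dist), weights=list(dist.values()))[0]

# A two-state toy game of ours: at state "s", identical moves keep the play
# in "s"; distinct moves lead to the absorbing state "t".
game = ConcurrentGame(
    states=["s", "t"],
    moves1={"s": ["a", "b"], "t": ["a"]},
    moves2={"s": ["a", "b"], "t": ["a"]},
    delta={("s", "a", "a"): {"s": 1.0}, ("s", "b", "b"): {"s": 1.0},
           ("s", "a", "b"): {"t": 1.0}, ("s", "b", "a"): {"t": 1.0},
           ("t", "a", "a"): {"t": 1.0}},
)
print(game.step("s", "a", "a"))  # -> 's' (identical moves stay in the safe state)
```

State "t" is absorbing in the sense defined next: every joint move at "t" returns to "t" with probability 1.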
The game then proceeds to the successor state t with probability δ(s, a_1, a_2)(t), for all t ∈ S. A state s is an absorbing state if for all a_1 ∈ Γ_1(s) and a_2 ∈ Γ_2(s), we have δ(s, a_1, a_2)(s) = 1. In other words, at an absorbing state s, for all choices of moves of the two players, the successor state is always s.

Definition 2 (Turn-based stochastic games) A turn-based stochastic game graph (2½-player game graph) G = ⟨(S, E), (S_1, S_2, S_R), δ⟩ consists of a finite directed graph (S, E), a partition (S_1, S_2, S_R) of the finite set S of states, and a probabilistic transition function δ : S_R → D(S), where D(S) denotes the set of probability distributions over the state space S. The states in S_1 are the player-1 states, where player 1 decides the successor state; the states in S_2 are the player-2 states, where player 2 decides the successor state; and the states in S_R are the random or probabilistic states, where the successor state is chosen according to the probabilistic transition function δ. We assume that for s ∈ S_R and t ∈ S, we have (s, t) ∈ E iff δ(s)(t) > 0, and we often write δ(s, t) for δ(s)(t). For technical convenience we assume that every state in the graph (S, E) has at least one outgoing edge. For a state s ∈ S, we write E(s) to denote the set {t ∈ S | (s, t) ∈ E} of possible successors. We denote by |δ| the size of the transition function, i.e., |δ| = Σ_{s∈S_R, t∈S} |δ(s)(t)|, where |δ(s)(t)| is the number of bits required to specify the transition probability δ(s)(t). We denote by |G| the size of the game graph, where |G| = |δ| + |S| + |E|.

Plays. A play ω of G is an infinite sequence ω = ⟨s_0, s_1, s_2, ...⟩ of states in S such that for all k ≥ 0, there are moves a_1^k ∈ Γ_1(s_k) and a_2^k ∈ Γ_2(s_k) with δ(s_k, a_1^k, a_2^k)(s_{k+1}) > 0. We denote by Ω the set of all plays, and by Ω_s the set of all plays ω = ⟨s_0, s_1, s_2, ...⟩ such that s_0 = s, that is, the set of plays starting from state s.

Selectors and strategies. A selector ξ for player i ∈ {1, 2} is a function ξ : S → D(M) such that for all states s ∈ S and moves a ∈ M, if ξ(s)(a) > 0, then a ∈ Γ_i(s). A selector ξ for player i at a state s is a distribution over moves such that if ξ(s)(a) > 0, then a ∈ Γ_i(s). We denote by Λ_i the set of all selectors for player i ∈ {1, 2}, and similarly, we denote by Λ_i(s) the set of all selectors for player i at a state s. The selector ξ is pure if for every state s ∈ S, there is a move a ∈ M such that ξ(s)(a) = 1. A strategy for player i ∈ {1, 2} is a function π : S⁺ → D(M) that associates with every finite, nonempty sequence of states, representing the history of the play so far, a selector for player i; that is, for all w ∈ S* and s ∈ S, we have Supp(π(w · s)) ⊆ Γ_i(s). The strategy π is pure if it always chooses a pure selector; that is, for all w ∈ S⁺, there is a move a ∈ M such that π(w)(a) = 1. A memoryless strategy is independent of the history of the play and depends only on the current state. Memoryless strategies correspond to selectors; we write ξ̄ for the memoryless strategy consisting in playing forever the selector ξ. A strategy is pure memoryless if it is both pure and memoryless. In a turn-based stochastic game, a strategy for player 1 is a function π_1 : S* · S_1 → D(S), such that for all w ∈ S* and for all s ∈ S_1 we have Supp(π_1(w · s)) ⊆ E(s).
Memoryless strategies and pure memoryless strategies are obtained as restrictions of strategies, as in the case of concurrent game graphs. The family of strategies for player 2 is defined analogously. We denote by Π_1 and Π_2 the sets of all strategies for player 1 and player 2, respectively. We denote by Π_i^M and Π_i^PM the sets of memoryless strategies and pure memoryless strategies for player i, respectively.

Destinations of moves and selectors. For all states s ∈ S and moves a_1 ∈ Γ_1(s) and a_2 ∈ Γ_2(s), we indicate by Dest(s, a_1, a_2) = Supp(δ(s, a_1, a_2)) the set of possible successors of s when the moves a_1 and a_2 are chosen. Given a state s, and selectors ξ_1 and ξ_2 for the two players, we denote by

Dest(s, ξ_1, ξ_2) = ∪_{a_1 ∈ Supp(ξ_1(s)), a_2 ∈ Supp(ξ_2(s))} Dest(s, a_1, a_2)

the set of possible successors of s with respect to the selectors ξ_1 and ξ_2.

Once a starting state s and strategies π_1 and π_2 for the two players are fixed, the game is reduced to an ordinary stochastic process. Hence, the probabilities of events are uniquely defined, where an event A ⊆ Ω_s is a measurable set of plays. For an event A ⊆ Ω_s, we denote by Pr_s^{π_1,π_2}(A) the probability that a play belongs to A when the game starts from s and the players follow the strategies π_1 and π_2. Similarly, for a measurable function f : Ω_s → ℝ, we denote by E_s^{π_1,π_2}(f) the expected value of f when the game starts from s and the players follow the strategies π_1 and π_2. For i ≥ 0, we denote by Θ_i : Ω → S the random variable denoting the i-th state along a play.

Valuations. A valuation is a mapping v : S → [0, 1] associating a real number v(s) ∈ [0, 1] with each state s. Given two valuations v, w : S → ℝ, we write v ≤ w when v(s) ≤ w(s) for all states s ∈ S.
For an event A, we denote by Pr^{π_1,π_2}(A) the valuation S → [0, 1] defined for all states s ∈ S by (Pr^{π_1,π_2}(A))(s) = Pr_s^{π_1,π_2}(A). Similarly, for a measurable function f : Ω_s → [0, 1], we denote by E^{π_1,π_2}(f) the valuation S → [0, 1] defined for all s ∈ S by (E^{π_1,π_2}(f))(s) = E_s^{π_1,π_2}(f).

Reachability and safety objectives. Given a set F ⊆ S of safe states, the objective of a safety game consists in never leaving F. Therefore, we define the set of winning plays as the set Safe(F) = {⟨s_0, s_1, s_2, ...⟩ ∈ Ω | s_k ∈ F for all k ≥ 0}. Given a subset T ⊆ S of target states, the objective of a reachability game consists in reaching T. Correspondingly, the set of winning plays is Reach(T) = {⟨s_0, s_1, s_2, ...⟩ ∈ Ω | s_k ∈ T for some k ≥ 0}, the set of plays that visit T. For all F ⊆ S and T ⊆ S, the sets Safe(F) and Reach(T) are measurable. An objective in general is a measurable set; in this paper we consider only reachability and safety objectives. For an objective Φ, the probability of satisfying Φ from a state s ∈ S under strategies π_1 and π_2 for players 1 and 2, respectively, is Pr_s^{π_1,π_2}(Φ). We define the value for player 1 of the game with objective Φ from the state s ∈ S as

⟨⟨1⟩⟩_val(Φ)(s) = sup_{π_1 ∈ Π_1} inf_{π_2 ∈ Π_2} Pr_s^{π_1,π_2}(Φ);

i.e., the value is the maximal probability with which player 1 can guarantee the satisfaction of Φ against all player-2 strategies. Given a player-1 strategy π_1, we use the notation

⟨⟨1⟩⟩_val^{π_1}(Φ)(s) = inf_{π_2 ∈ Π_2} Pr_s^{π_1,π_2}(Φ).

A strategy π_1 for player 1 is optimal for an objective Φ if for all states s ∈ S, we have ⟨⟨1⟩⟩_val^{π_1}(Φ)(s) = ⟨⟨1⟩⟩_val(Φ)(s). For ε > 0, a strategy π_1 for player 1 is ε-optimal if for all states s ∈ S, we have ⟨⟨1⟩⟩_val^{π_1}(Φ)(s) ≥ ⟨⟨1⟩⟩_val(Φ)(s) − ε.
The notions of value and optimal strategy for player 2 are defined analogously. Reachability and safety objectives are dual, i.e., we have Reach(T) = Ω \ Safe(S \ T). The quantitative determinacy result of [18] ensures that for all states s ∈ S, we have

⟨⟨1⟩⟩_val(Safe(F))(s) + ⟨⟨2⟩⟩_val(Reach(S \ F))(s) = 1.

Theorem 1 (Memoryless determinacy) For all concurrent game graphs G, for all F, T ⊆ S such that F = S \ T, the following assertions hold.
1. [14] Memoryless optimal strategies exist for safety objectives Safe(F).
2. [4, 13] For all ε > 0, memoryless ε-optimal strategies exist for reachability objectives Reach(T).
3. [5] If G is a turn-based stochastic game graph, then pure memoryless optimal strategies exist for reachability objectives Reach(T) and safety objectives Safe(F).

3 Markov Decision Processes

To develop our arguments, we need some facts about one-player versions of concurrent stochastic games, known as Markov decision processes (MDPs) [12, 2]. For i ∈ {1, 2}, a player-i MDP (for short, i-MDP) is a concurrent game where, for all states s ∈ S, we have |Γ_{3−i}(s)| = 1. Given a concurrent game G, if we fix a memoryless strategy corresponding to selector ξ_1 for player 1, the game is equivalent to a 2-MDP G_{ξ_1} with the transition function

δ_{ξ_1}(s, a_2)(t) = Σ_{a_1 ∈ Γ_1(s)} δ(s, a_1, a_2)(t) · ξ_1(s)(a_1),

for all s ∈ S and a_2 ∈ Γ_2(s). Similarly, if we fix selectors ξ_1 and ξ_2 for both players in a concurrent game G, we obtain a Markov chain, which we denote by G_{ξ_1,ξ_2}.

End components. In an MDP, the sets of states that play a role equivalent to the closed recurrent classes of Markov chains [16] are called "end components" [7, 8].
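The induced 2-MDP G_{ξ_1} above is straightforward to compute by marginalizing δ over the selector. A minimal sketch (the encodings are ours: delta maps (s, a1, a2) to a successor distribution, xi1 maps s to a distribution over player-1 moves):

```python
# Build the transition function of the 2-MDP G_xi1 induced by fixing the
# memoryless player-1 strategy given by the selector xi1:
#   delta_xi1(s, a2)(t) = sum over a1 of delta(s, a1, a2)(t) * xi1(s)(a1).

def induced_mdp(states, moves2, delta, xi1):
    d = {}
    for s in states:
        for a2 in moves2[s]:
            row = {}
            for a1, p1 in xi1[s].items():
                for t, p in delta[(s, a1, a2)].items():
                    row[t] = row.get(t, 0.0) + p1 * p
            d[(s, a2)] = row
    return d

# Toy game of ours: at "s", identical moves stay in "s", distinct moves go
# to the absorbing state "t". With the uniform selector at "s", each
# player-2 move reaches "s" and "t" with probability 1/2 each.
states = ["s", "t"]
moves2 = {"s": ["a", "b"], "t": ["a"]}
delta = {("s", "a", "a"): {"s": 1.0}, ("s", "b", "b"): {"s": 1.0},
         ("s", "a", "b"): {"t": 1.0}, ("s", "b", "a"): {"t": 1.0},
         ("t", "a", "a"): {"t": 1.0}}
xi1 = {"s": {"a": 0.5, "b": 0.5}, "t": {"a": 1.0}}
d = induced_mdp(states, moves2, delta, xi1)
print(d[("s", "a")])  # -> {'s': 0.5, 't': 0.5}
```

Fixing a selector for player 2 in d in the same way would yield the Markov chain G_{ξ_1,ξ_2}.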
Definition 3 (End components) An end component of an i-MDP G, for i ∈ {1, 2}, is a subset C ⊆ S of the states such that there is a selector ξ for player i so that C is a closed recurrent class of the Markov chain G_ξ.

It is not difficult to see that an equivalent characterization of an end component C is the following. For each state s ∈ C, there is a subset M_i(s) ⊆ Γ_i(s) of moves such that:
1. (closed) if a move in M_i(s) is chosen by player i at state s, then all successor states that are obtained with nonzero probability lie in C; and
2. (recurrent) the graph (C, E), where E consists of the transitions that occur with nonzero probability when moves in M_i(·) are chosen by player i, is strongly connected.

Given a play ω ∈ Ω, we denote by Inf(ω) the set of states that occur infinitely often along ω. Given a set F ⊆ 2^S of subsets of states, we denote by Inf(F) the event {ω | Inf(ω) ∈ F}. The following theorem states that in a 2-MDP, for every strategy of player 2, the set of states that are visited infinitely often is, with probability 1, an end component. Corollary 1 follows easily from Theorem 2.

Theorem 2 [8] For a player-1 selector ξ_1, let C be the set of end components of a 2-MDP G_{ξ_1}. For all player-2 strategies π_2 and all states s ∈ S, we have Pr_s^{ξ̄_1,π_2}(Inf(C)) = 1.

Corollary 1 For a player-1 selector ξ_1, let C be the set of end components of a 2-MDP G_{ξ_1}, and let Z = ∪_{C∈C} C be the set of states of all end components. For all player-2 strategies π_2 and all states s ∈ S, we have Pr_s^{ξ̄_1,π_2}(Reach(Z)) = 1.

MDPs with reachability objectives. Given a 2-MDP with a reachability objective Reach(T) for player 2, where T ⊆ S, the values can be obtained as the solution of a linear program [14].
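The linear program is written out next; as a solver-free alternative, the same values arise as the least fixed point of the Bellman operator and can be approximated by simple iteration. A hedged sketch (the state/move encoding is ours; delta maps (s, a2) to a successor distribution, as in an induced 2-MDP):

```python
# Approximate the player-2 reachability values of a 2-MDP by iterating the
# Bellman operator from below:
#   x(s) <- 1 on T,  x(s) <- max over a2 of sum_t delta(s, a2)(t) * x(t) otherwise.
# The iterates increase to the least fixed point, which is the value vector
# that the linear program characterizes.

def reach_values(states, moves2, delta, T, iters=1000):
    x = {s: (1.0 if s in T else 0.0) for s in states}
    for _ in range(iters):
        x = {s: 1.0 if s in T else
                max(sum(p * x[t] for t, p in delta[(s, a2)].items())
                    for a2 in moves2[s])
             for s in states}
    return x

# Toy MDP of ours: from "s" the single move reaches the target "t" with
# probability 1/2 and the absorbing non-target "u" with probability 1/2.
states = ["s", "t", "u"]
moves2 = {"s": ["a"], "t": ["a"], "u": ["a"]}
delta = {("s", "a"): {"t": 0.5, "u": 0.5},
         ("t", "a"): {"t": 1.0},
         ("u", "a"): {"u": 1.0}}
v = reach_values(states, moves2, delta, {"t"})
print(v["s"])  # -> 0.5
```

Unlike the linear program, the iteration gives only a convergent lower approximation in general; it is the same value-iteration idea discussed in the introduction.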
The linear program has a variable x(s) for every state s ∈ S; the objective function and the constraints are as follows:

min Σ_{s∈S} x(s) subject to
x(s) ≥ Σ_{t∈S} x(t) · δ(s, a_2)(t)   for all s ∈ S and a_2 ∈ Γ_2(s)
x(s) = 1   for all s ∈ T
0 ≤ x(s) ≤ 1   for all s ∈ S

The correctness of the above linear program for computing the values follows from [12, 14].

4 Strategy Improvement for Safety Games

In this section we present a strategy improvement algorithm for concurrent games with safety objectives. The algorithm will produce a sequence of selectors γ_0, γ_1, γ_2, ... for player 1, such that:
1. for all i ≥ 0, we have ⟨⟨1⟩⟩_val^{γ̄_i}(Safe(F)) ≤ ⟨⟨1⟩⟩_val^{γ̄_{i+1}}(Safe(F));
2. if there is i ≥ 0 such that γ_i = γ_{i+1}, then ⟨⟨1⟩⟩_val^{γ̄_i}(Safe(F)) = ⟨⟨1⟩⟩_val(Safe(F)); and
3. lim_{i→∞} ⟨⟨1⟩⟩_val^{γ̄_i}(Safe(F)) = ⟨⟨1⟩⟩_val(Safe(F)).

Condition 1 guarantees that the algorithm computes a sequence of monotonically improving selectors. Condition 2 guarantees that if a selector cannot be improved, then it is optimal. Condition 3 guarantees that the value guaranteed by the selectors converges to the value of the game, or equivalently, that for all ε > 0, there is a number i of iterations such that the memoryless player-1 strategy γ̄_i is ε-optimal. Note that for concurrent safety games, there may be no i ≥ 0 such that γ_i = γ_{i+1}; that is, the algorithm may fail to generate an optimal selector. This is because there are concurrent safety games whose values are irrational [11]. We start with some notation.

The Pre operator and optimal selectors.
Given a valuation v, and two selectors ξ_1 ∈ Λ_1 and ξ_2 ∈ Λ_2, we define the valuations Pre_{ξ_1,ξ_2}(v), Pre_{1:ξ_1}(v), and Pre_1(v) as follows, for all states s ∈ S:

Pre_{ξ_1,ξ_2}(v)(s) = Σ_{a,b∈M} Σ_{t∈S} v(t) · δ(s, a, b)(t) · ξ_1(s)(a) · ξ_2(s)(b)
Pre_{1:ξ_1}(v)(s) = inf_{ξ_2 ∈ Λ_2} Pre_{ξ_1,ξ_2}(v)(s)
Pre_1(v)(s) = sup_{ξ_1 ∈ Λ_1} inf_{ξ_2 ∈ Λ_2} Pre_{ξ_1,ξ_2}(v)(s)

Intuitively, Pre_1(v)(s) is the greatest expectation of v that player 1 can guarantee at a successor state of s. Also note that, given a valuation v, the computation of Pre_1(v) reduces to the solution of a zero-sum one-shot matrix game, and can be performed by linear programming. Similarly, Pre_{1:ξ_1}(v)(s) is the greatest expectation of v that player 1 can guarantee at a successor state of s by playing the selector ξ_1. Note that all of these operators on valuations are monotonic: for two valuations v, w, if v ≤ w, then for all selectors ξ_1 ∈ Λ_1 and ξ_2 ∈ Λ_2, we have Pre_{ξ_1,ξ_2}(v) ≤ Pre_{ξ_1,ξ_2}(w), Pre_{1:ξ_1}(v) ≤ Pre_{1:ξ_1}(w), and Pre_1(v) ≤ Pre_1(w). Given a valuation v and a state s, we define by

OptSel(v, s) = {ξ_1 ∈ Λ_1(s) | Pre_{1:ξ_1}(v)(s) = Pre_1(v)(s)}

the set of optimal selectors for v at state s. For an optimal selector ξ_1 ∈ OptSel(v, s), we define the set of counter-optimal actions as follows:

CountOpt(v, s, ξ_1) = {b ∈ Γ_2(s) | Pre_{ξ_1,b}(v)(s) = Pre_1(v)(s)}.

Observe that for ξ_1 ∈ OptSel(v, s), for all b ∈ Γ_2(s) \ CountOpt(v, s, ξ_1), we have Pre_{ξ_1,b}(v)(s) > Pre_1(v)(s). We define the set of optimal selector supports and counter-optimal action sets as follows:

OptSelCount(v, s) = {(A, B) ⊆ Γ_1(s) × Γ_2(s) | ∃ξ_1 ∈ Λ_1(s). ξ_1 ∈ OptSel(v, s) ∧ Supp(ξ_1) = A ∧ CountOpt(v, s, ξ_1) = B};

i.e., it consists of pairs (A, B) of actions of player 1 and player 2 such that there is an optimal selector ξ_1 with support A, and B is the set of counter-optimal actions to ξ_1.

Turn-based reduction. Given a concurrent game G = ⟨S, M, Γ_1, Γ_2, δ⟩ and a valuation v, we construct a turn-based stochastic game G_v = ⟨(S̄, Ē), (S̄_1, S̄_2, S̄_R), δ̄⟩ as follows:
1. The set of states is S̄ = S ∪ {(s, A, B) | s ∈ S, (A, B) ∈ OptSelCount(v, s)} ∪ {(s, A, b) | s ∈ S, (A, B) ∈ OptSelCount(v, s), b ∈ B}.
2. The state space partition is as follows: S̄_1 = S; S̄_2 = {(s, A, B) | s ∈ S, (A, B) ∈ OptSelCount(v, s)}; and S̄_R = S̄ \ (S̄_1 ∪ S̄_2).
3. The set of edges is Ē = {(s, (s, A, B)) | s ∈ S, (A, B) ∈ OptSelCount(v, s)} ∪ {((s, A, B), (s, A, b)) | b ∈ B} ∪ {((s, A, b), t) | t ∈ ∪_{a∈A} Dest(s, a, b)}.
4. The transition function δ̄ at every state in S̄_R is uniform over its successors.

Intuitively, the reduction is as follows. Given the valuation v, a state s is a player-1 state where player 1 can select a pair (A, B) (and move to state (s, A, B)) with A ⊆ Γ_1(s) and B ⊆ Γ_2(s) such that there is an optimal selector ξ_1 with support exactly A, and the set of counter-optimal actions to ξ_1 is the set B. From a player-2 state (s, A, B), player 2 can choose any action b from the set B, and move to state (s, A, b). A state (s, A, b) is a probabilistic state where all the states in ∪_{a∈A} Dest(s, a, b) are chosen uniformly at random. Given a set F ⊆ S, we denote F̄ = F ∪ {(s, A, B) ∈ S̄ | s ∈ F} ∪ {(s, A, b) ∈ S̄ | s ∈ F}. We refer to the above reduction as TB, i.e., (G_v, F̄) = TB(G, v, F).

Value class of a valuation.
Given a valuation v and a real r with 0 ≤ r ≤ 1, the value class U_r(v) of value r is the set of states with valuation r, i.e., U_r(v) = {s ∈ S | v(s) = r}.

Figure 1: A turn-based stochastic safety game (states s_0, s_1, s_2, s_3, s_5, s_6; the random states have edge probabilities 1/3 and 2/3).

4.1 The strategy improvement algorithm

Ordering of strategies. Let G be a concurrent game and F be the set of safe states. Let T = S \ F. Given a concurrent game graph G with a safety objective Safe(F), the set of almost-sure winning states is the set of states s such that the value at s is 1, i.e., W_1 = {s ∈ S | ⟨⟨1⟩⟩_val(Safe(F))(s) = 1}. An optimal strategy from W_1 is referred to as an almost-sure winning strategy. The set W_1 and an almost-sure winning strategy can be computed in linear time by the algorithm given in [9]. We assume without loss of generality that all states in W_1 ∪ T are absorbing. We define a preorder ≺ on the strategies for player 1 as follows: given two player-1 strategies π_1 and π'_1, let π_1 ≺ π'_1 if the following two conditions hold: (i) ⟨⟨1⟩⟩_val^{π_1}(Safe(F)) ≤ ⟨⟨1⟩⟩_val^{π'_1}(Safe(F)); and (ii) ⟨⟨1⟩⟩_val^{π_1}(Safe(F))(s) < ⟨⟨1⟩⟩_val^{π'_1}(Safe(F))(s) for some state s ∈ S. Furthermore, we write π_1 ⪯ π'_1 if either π_1 ≺ π'_1 or π_1 = π'_1. We first present an example showing that improvements based only on the Pre_1 operator are not sufficient for safety games, even in turn-based games, and then present our algorithm.

Example 1 Consider the turn-based stochastic game shown in Figure 1, where the □ states are player-1 states, the ◇ states are player-2 states, and the remaining states are random states with probabilities labeled on edges. The safety goal is to avoid the state s_6.
Consider a memoryless strategy π_1 for player 1 that chooses the successor s_0 → s_2, and the counter-strategy π_2 for player 2 that chooses s_1 → s_0. Given the strategies π_1 and π_2, the value at s_0, s_1, and s_2 is 1/3, and since all successors of s_0 have value 1/3, the value cannot be improved by Pre_1. However, note that if player 2 is restricted to choose only value-optimal selectors for the value 1/3, then player 1 can switch to the strategy s_0 → s_1 and ensure that the game stays in the value class 1/3 with probability 1. Hence switching to s_0 → s_1 would force player 2 to select a counter-strategy that switches to the strategy s_1 → s_3, and thus player 1 can obtain the value 2/3.

Informal description of Algorithm 1. We now present the strategy improvement algorithm (Algorithm 1) for computing the values for all states in S \ W_1. The algorithm iteratively improves player-1 strategies according to the preorder ≺. The algorithm starts with the random selector γ_0 = ξ_1^unif that plays at all states all actions uniformly at random. At iteration i + 1, the algorithm considers the memoryless player-1 strategy γ̄_i and computes the value ⟨⟨1⟩⟩_val^{γ̄_i}(Safe(F)). Observe that since γ̄_i is a memoryless strategy, the computation of ⟨⟨1⟩⟩_val^{γ̄_i}(Safe(F)) involves solving the 2-MDP G_{γ_i}. The valuation ⟨⟨1⟩⟩_val^{γ̄_i}(Safe(F)) is denoted v_i. For all states s such that Pre_1(v_i)(s) > v_i(s), the memoryless strategy at s is modified to a selector that is value-optimal for v_i. The algorithm then proceeds to the next iteration. If Pre_1(v_i) = v_i, then the algorithm constructs the game (G_{v_i}, F̄) = TB(G, v_i, F), and computes A_i as the set of almost-sure winning states in G_{v_i} for the objective Safe(F̄). Let U = (A_i ∩ S) \ W_1.
Algorithm 1: Safety Strategy-Improvement Algorithm

Input: a concurrent game structure G with safe set F.
Output: a strategy γ for player 1.

0. Compute W_1 = { s ∈ S | ⟨⟨1⟩⟩_val(Safe(F))(s) = 1 }.
1. Let γ_0 = ξ^{unif}_1 and i = 0.
2. Compute v_0 = ⟨⟨1⟩⟩^{γ_0}_val(Safe(F)).
3. do {
   3.1. Let I = { s ∈ S \ (W_1 ∪ T) | Pre_1(v_i)(s) > v_i(s) }.
   3.2. if I ≠ ∅, then
        3.2.1. Let ξ_1 be a player-1 selector such that for all states s ∈ I we have Pre_{1:ξ_1}(v_i)(s) = Pre_1(v_i)(s) > v_i(s).
        3.2.2. The player-1 selector γ_{i+1} is defined as follows: for each state s ∈ S, let γ_{i+1}(s) = γ_i(s) if s ∉ I, and γ_{i+1}(s) = ξ_1(s) if s ∈ I.
   3.3. else
        3.3.1. let (G_{v_i}, F) = TB(G, v_i, F);
        3.3.2. let A_i be the set of almost-sure winning states in G_{v_i} for Safe(F), and let π̄_1 be a pure memoryless almost-sure winning strategy from the set A_i;
        3.3.3. if ((A_i ∩ S) \ W_1 ≠ ∅), then
               3.3.3.1. let U = (A_i ∩ S) \ W_1;
               3.3.3.2. the player-1 selector γ_{i+1} is defined as follows: for each state s ∈ S, let γ_{i+1}(s) = γ_i(s) if s ∉ U, and γ_{i+1}(s) = ξ_1(s) if s ∈ U, where ξ_1(s) ∈ OptSel(v_i, s) is such that π̄_1(s) = (s, A, B) and B = OptSelCount(v_i, s, ξ_1(s)).
   3.4. Compute v_{i+1} = ⟨⟨1⟩⟩^{γ_{i+1}}_val(Safe(F)).
   3.5. Let i = i + 1.
   } until I = ∅ and (A_{i−1} ∩ S) \ W_1 = ∅.
4. return γ_i.

If U is non-empty, then a selector γ_{i+1} is obtained at U from a pure memoryless optimal strategy (i.e., an almost-sure winning strategy) in G_{v_i}, and the algorithm proceeds to iteration i + 1. If Pre_1(v_i) = v_i and U is empty, then the algorithm stops and returns the memoryless strategy γ_i for player 1. Unlike strategy improvement algorithms for turn-based games (see [6] for a survey), Algorithm 1 is not guaranteed to terminate, because the value of a safety game may not be rational.

Lemma 1 Let γ_i and γ_{i+1} be the player-1 selectors obtained at iterations i and i + 1 of Algorithm 1.
Let I = { s ∈ S \ (W_1 ∪ T) | Pre_1(v_i)(s) > v_i(s) }. Let v_i = ⟨⟨1⟩⟩^{γ_i}_val(Safe(F)) and v_{i+1} = ⟨⟨1⟩⟩^{γ_{i+1}}_val(Safe(F)). Then v_{i+1}(s) ≥ Pre_1(v_i)(s) for all states s ∈ S; and therefore v_{i+1}(s) ≥ v_i(s) for all states s ∈ S, and v_{i+1}(s) > v_i(s) for all states s ∈ I.

Proof. Consider the valuations v_i and v_{i+1} obtained at iterations i and i + 1, respectively, and let w_i be the valuation defined by w_i(s) = 1 − v_i(s) for all states s ∈ S. The counter-optimal strategy for player 2 to minimize v_{i+1} is obtained by maximizing the probability to reach T. Let

w_{i+1}(s) = w_i(s) if s ∈ S \ I;  w_{i+1}(s) = 1 − Pre_1(v_i)(s) < w_i(s) if s ∈ I.

In other words, w_{i+1} = 1 − Pre_1(v_i), and we also have w_{i+1} ≤ w_i. We now show that w_{i+1} is a feasible solution to the linear program for MDPs with the objective Reach(T), as described in Section 3. Since v_i = ⟨⟨1⟩⟩^{γ_i}_val(Safe(F)), it follows that for all states s ∈ S and all moves a_2 ∈ Γ_2(s), we have

w_i(s) ≥ Σ_{t∈S} w_i(t) · δ_{γ_i}(s, a_2)(t).

For all states s ∈ S \ I, we have γ_i(s) = γ_{i+1}(s) and w_{i+1}(s) = w_i(s), and since w_{i+1} ≤ w_i, it follows that for all states s ∈ S \ I and all moves a_2 ∈ Γ_2(s), we have

w_{i+1}(s) = w_i(s) ≥ Σ_{t∈S} w_{i+1}(t) · δ_{γ_{i+1}}(s, a_2)(t)   (for s ∈ S \ I).

Since for s ∈ I the selector γ_{i+1}(s) is obtained as an optimal selector for Pre_1(v_i)(s), it follows that for all states s ∈ I and all moves a_2 ∈ Γ_2(s), we have Pre_{γ_{i+1},a_2}(v_i)(s) ≥ Pre_1(v_i)(s); in other words, 1 − Pre_1(v_i)(s) ≥ 1 − Pre_{γ_{i+1},a_2}(v_i)(s). Hence for all states s ∈ I and all moves a_2 ∈ Γ_2(s), we have

w_{i+1}(s) ≥ Σ_{t∈S} w_i(t) · δ_{γ_{i+1}}(s, a_2)(t).
Since w_{i+1} ≤ w_i, for all states s ∈ I and all moves a_2 ∈ Γ_2(s), we have

w_{i+1}(s) ≥ Σ_{t∈S} w_{i+1}(t) · δ_{γ_{i+1}}(s, a_2)(t)   (for s ∈ I).

Hence it follows that w_{i+1} is a feasible solution to the linear program for MDPs with reachability objectives. Since the reachability valuation for player 2 for Reach(T) is the least solution (observe that the objective function of the linear program is a minimizing function), it follows that v_{i+1} ≥ 1 − w_{i+1} = Pre_1(v_i). Thus we obtain v_{i+1}(s) ≥ v_i(s) for all states s ∈ S, and v_{i+1}(s) > v_i(s) for all states s ∈ I.

Recall that by Example 1, improvement by step 3.2 alone is not sufficient to guarantee convergence to the optimal values. We now present a lemma about the turn-based reduction, and then show that step 3.3 also leads to an improvement. Finally, in Theorem 4 we show that if improvements by step 3.2 and step 3.3 are not possible, then the optimal value and an optimal strategy are obtained.

Lemma 2 Let G be a concurrent game with a set F of safe states. Let v be a valuation and consider (G_v, F) = TB(G, v, F). Let A be the set of almost-sure winning states in G_v for the objective Safe(F), and let π̄_1 be a pure memoryless almost-sure winning strategy from A in G_v. Consider a memoryless strategy π_1 in G for states in A ∩ S defined as follows: if π̄_1(s) = (s, A, B), then π_1(s) ∈ OptSel(v, s) is such that Supp(π_1(s)) = A and OptSelCount(v, s, π_1(s)) = B. Consider a pure memoryless strategy π_2 for player 2. If for all states s ∈ A ∩ S we have π_2(s) ∈ OptSelCount(v, s, π_1(s)), then for all s ∈ A ∩ S we have Pr^{π_1,π_2}_s(Safe(F)) = 1.

Proof. We analyze the Markov chain arising after the players fix the memoryless strategies π_1 and π_2.
Given the strategy π_2, consider the strategy π̄_2 as follows: if π̄_1(s) = (s, A, B) and π_2(s) = b ∈ OptSelCount(v, s, π_1(s)), then at state (s, A, B) choose the successor (s, A, b). Since π̄_1 is an almost-sure winning strategy for Safe(F), it follows that in the Markov chain obtained by fixing π̄_1 and π̄_2 in G_v, all closed connected recurrent sets of states that intersect with A are contained in A, and from all states of A the closed connected recurrent sets of states within A are reached with probability 1. It follows that in the Markov chain obtained by fixing π_1 and π_2 in G, all closed connected recurrent sets of states that intersect with A ∩ S are contained in A ∩ S, and from all states of A ∩ S the closed connected recurrent sets of states within A ∩ S are reached with probability 1. The desired result follows.

Lemma 3 Let γ_i and γ_{i+1} be the player-1 selectors obtained at iterations i and i + 1 of Algorithm 1. Let I = { s ∈ S \ (W_1 ∪ T) | Pre_1(v_i)(s) > v_i(s) } = ∅, and (A_i ∩ S) \ W_1 ≠ ∅. Let v_i = ⟨⟨1⟩⟩^{γ_i}_val(Safe(F)) and v_{i+1} = ⟨⟨1⟩⟩^{γ_{i+1}}_val(Safe(F)). Then v_{i+1}(s) ≥ v_i(s) for all states s ∈ S, and v_{i+1}(s) > v_i(s) for some state s ∈ (A_i ∩ S) \ W_1.

Proof. We first show that v_{i+1} ≥ v_i. Let U = (A_i ∩ S) \ W_1. Let w_i(s) = 1 − v_i(s) for all states s ∈ S. Since v_i = ⟨⟨1⟩⟩^{γ_i}_val(Safe(F)), it follows that for all states s ∈ S and all moves a_2 ∈ Γ_2(s), we have

w_i(s) ≥ Σ_{t∈S} w_i(t) · δ_{γ_i}(s, a_2)(t).

The selector ξ_1(s) chosen for γ_{i+1} at s ∈ U satisfies ξ_1(s) ∈ OptSel(v_i, s). It follows that for all states s ∈ S and all moves a_2 ∈ Γ_2(s), we have

w_i(s) ≥ Σ_{t∈S} w_i(t) · δ_{γ_{i+1}}(s, a_2)(t).

It follows that the maximal probability with which player 2 can reach T against the strategy γ_{i+1} is at most w_i.
It follows that v_i(s) ≤ v_{i+1}(s) for all states s ∈ S. We now argue that for some state s ∈ U we have v_{i+1}(s) > v_i(s). Given the strategy γ_{i+1}, consider a pure memoryless counter-optimal strategy π_2 for player 2 to reach T. Since the selectors γ_{i+1}(s) at states s ∈ U are obtained from the almost-sure winning strategy π̄_1 in the turn-based game G_{v_i} to satisfy Safe(F), it follows from Lemma 2 that if for every state s ∈ U the action π_2(s) ∈ OptSelCount(v_i, s, γ_{i+1}(s)), then from all states s ∈ U the game stays safe in F with probability 1. Since γ_{i+1} is a given strategy for player 1, and π_2 is counter-optimal against γ_{i+1}, this would imply that U ⊆ { s ∈ S | ⟨⟨1⟩⟩_val(Safe(F))(s) = 1 }. This would contradict that W_1 = { s ∈ S | ⟨⟨1⟩⟩_val(Safe(F))(s) = 1 } and U ∩ W_1 = ∅. It follows that for some state s* ∈ U we have π_2(s*) ∉ OptSelCount(v_i, s*, γ_{i+1}(s*)), and since γ_{i+1}(s*) ∈ OptSel(v_i, s*) we have

v_i(s*) < Σ_{t∈S} v_i(t) · δ_{γ_{i+1}}(s*, π_2(s*))(t);

in other words, we have

w_i(s*) > Σ_{t∈S} w_i(t) · δ_{γ_{i+1}}(s*, π_2(s*))(t).

Define a valuation z as follows: z(s) = w_i(s) for s ≠ s*, and z(s*) = Σ_{t∈S} w_i(t) · δ_{γ_{i+1}}(s*, π_2(s*))(t). Hence z < w_i, and given the strategy γ_{i+1} and the counter-optimal strategy π_2, the valuation z satisfies the inequalities of the linear program for reachability to T. It follows that the probability to reach T given γ_{i+1} is at most z. Since z < w_i, it follows that v_{i+1}(s) ≥ v_i(s) for all s ∈ S, and v_{i+1}(s*) > v_i(s*). This concludes the proof.

From Lemma 1 and Lemma 3 we obtain the following theorem, showing that the sequence of values we obtain is monotonically non-decreasing.

Theorem 3 (Monotonicity of values) For i ≥ 0, let γ_i and γ_{i+1} be the player-1 selectors obtained at iterations i and i + 1 of Algorithm 1.
If γ_i ≠ γ_{i+1}, then ⟨⟨1⟩⟩^{γ_i}_val(Safe(F)) < ⟨⟨1⟩⟩^{γ_{i+1}}_val(Safe(F)).

Theorem 4 (Optimality on termination) Let v_i be the valuation at iteration i of Algorithm 1 such that v_i = ⟨⟨1⟩⟩^{γ_i}_val(Safe(F)). If I = { s ∈ S \ (W_1 ∪ T) | Pre_1(v_i)(s) > v_i(s) } = ∅ and (A_i ∩ S) \ W_1 = ∅, then γ_i is an optimal strategy and v_i = ⟨⟨1⟩⟩_val(Safe(F)).

Proof. We show that for all memoryless strategies π_1 for player 1 we have ⟨⟨1⟩⟩^{π_1}_val(Safe(F)) ≤ v_i. Since memoryless optimal strategies exist for concurrent games with safety objectives (Theorem 1), the desired result follows. Let π̄_2 be a pure memoryless optimal strategy for player 2 in G_{v_i} for the objective complementary to Safe(F), where (G_{v_i}, F) = TB(G, v_i, F). Consider a memoryless strategy π_1 for player 1, and define a pure memoryless strategy π_2 for player 2 as follows.

1. If π_1(s) ∉ OptSel(v_i, s), then π_2(s) = b ∈ Γ_2(s) such that Pre_{π_1(s),b}(v_i)(s) < v_i(s) (such a b exists since π_1(s) ∉ OptSel(v_i, s)).

2. If π_1(s) ∈ OptSel(v_i, s), then let A = Supp(π_1(s)), and consider B such that B = OptSelCount(v_i, s, π_1(s)). Then we have π_2(s) = b such that π̄_2((s, A, B)) = (s, A, b).

Observe that by construction of π_2, for all s ∈ S \ (W_1 ∪ T), we have Pre_{π_1(s),π_2(s)}(v_i)(s) ≤ v_i(s). We first show that in the Markov chain obtained by fixing π_1 and π_2 in G, there is no closed connected recurrent set of states C such that C ⊆ S \ (W_1 ∪ T). Assume towards contradiction that C is a closed connected recurrent set of states in S \ (W_1 ∪ T). The following case analysis achieves the contradiction.

1. Suppose for every state s ∈ C we have π_1(s) ∈ OptSel(v_i, s).
Then consider the strategy π̄_1 in G_{v_i} such that for a state s ∈ C we have π̄_1(s) = (s, A, B), where A = Supp(π_1(s)) and B = OptSelCount(v_i, s, π_1(s)). Since C is a set of closed connected recurrent states, it follows by construction that for all states s ∈ C in the game G_{v_i} we have Pr^{π̄_1,π̄_2}_s(Safe(C̄)) = 1, where C̄ = C ∪ { (s, A, B) | s ∈ C } ∪ { (s, A, b) | s ∈ C }. It follows that for all s ∈ C in G_{v_i} we have Pr^{π̄_1,π̄_2}_s(Safe(F)) = 1. Since π̄_2 is an optimal strategy, it follows that C ⊆ (A_i ∩ S) \ W_1. This contradicts that (A_i ∩ S) \ W_1 = ∅.

2. Otherwise, for some state s* ∈ C we have π_1(s*) ∉ OptSel(v_i, s*). Let r = min { q | U_q(v_i) ∩ C ≠ ∅ }, i.e., r is the least value-class with non-empty intersection with C. Hence it follows that for all q < r, we have U_q(v_i) ∩ C = ∅. Observe that since for all s ∈ C we have Pre_{π_1(s),π_2(s)}(v_i)(s) ≤ v_i(s), it follows that for all s ∈ U_r(v_i) ∩ C either (a) Dest(s, π_1(s), π_2(s)) ⊆ U_r(v_i); or (b) Dest(s, π_1(s), π_2(s)) ∩ U_q(v_i) ≠ ∅ for some q < r. Since U_r(v_i) is the least value-class with non-empty intersection with C, it follows that for all s ∈ U_r(v_i) ∩ C we have Dest(s, π_1(s), π_2(s)) ⊆ U_r(v_i). It follows that C ⊆ U_r(v_i). Consider the state s* ∈ C such that π_1(s*) ∉ OptSel(v_i, s*). By the construction of π_2(s*), we have Pre_{π_1(s*),π_2(s*)}(v_i)(s*) < v_i(s*). Hence we must have Dest(s*, π_1(s*), π_2(s*)) ∩ U_q(v_i) ≠ ∅ for some q < r. Thus we have a contradiction.

It follows from the above that there is no closed connected recurrent set of states in S \ (W_1 ∪ T), and hence with probability 1 the game reaches W_1 ∪ T from all states in S \ (W_1 ∪ T). Hence the probability to satisfy Safe(F) is equal to the probability to reach W_1.
Since for all states s ∈ S \ (W_1 ∪ T) we have Pre_{π_1(s),π_2(s)}(v_i)(s) ≤ v_i(s), it follows that, given the strategies π_1 and π_2, the valuation v_i satisfies all the inequalities of the linear program for reaching W_1. It follows that the probability to reach W_1 from s is at most v_i(s). It follows that for all s ∈ S \ (W_1 ∪ T) we have ⟨⟨1⟩⟩^{π_1}_val(Safe(F))(s) ≤ v_i(s). The result follows.

Convergence. We first observe that since pure memoryless optimal strategies exist for turn-based stochastic games with safety objectives (Theorem 1), for turn-based stochastic games it suffices to iterate over pure memoryless selectors. Since the number of pure memoryless strategies is finite, it follows that for turn-based stochastic games Algorithm 1 always terminates and yields an optimal strategy. For concurrent games, we will use the result that for every ε > 0 there is a k-uniform memoryless strategy that achieves the value of a safety objective within ε. We first define k-uniform memoryless strategies. A selector ξ for player 1 is k-uniform if for all s ∈ S \ (T ∪ W_1) and all a ∈ Supp(ξ(s)) there exist i, j ∈ N such that 0 ≤ i ≤ j ≤ k and ξ(s)(a) = i/j, i.e., the moves in the support are played with probabilities that are multiples of 1/ℓ with ℓ ≤ k.

Lemma 4 For all concurrent game graphs G, for all safety objectives Safe(F) with F ⊆ S, and for all ε > 0, there exist k-uniform selectors ξ such that ξ is an ε-optimal strategy, for k = 2^{2^{O(n)}}/ε, where n = |S|.

Proof. (Sketch). For a rational r, using the results of [11], it can be shown that whether ⟨⟨1⟩⟩_val(Safe(F))(s) ≥ r can be expressed in the quantifier-free fragment of the theory of reals.
Then, using the formula in the theory of reals and Theorem 13.12 of [1], it can be shown that if there is a memoryless strategy π_1 that achieves value at least r, then there is a k-uniform memoryless strategy π^k_1 that achieves value at least r − ε, where k = 2^{2^{O(n)}}/ε, for n = |S|.

Strategy improvement with k-uniform selectors. We first argue that if we restrict Algorithm 1 such that every iteration yields a k-uniform selector, then the algorithm terminates. If we restrict to k-uniform selectors, then a concurrent game graph G can be converted to a turn-based stochastic game graph, where player 1 first chooses a k-uniform selector, then player 2 chooses an action, and then the transition is determined by the chosen k-uniform selector of player 1, the action of player 2, and the transition function δ of the game graph G. Then, by termination for turn-based stochastic games, it follows that the algorithm will terminate. Given k, let us denote by z^k_i the valuation of Algorithm 1 at iteration i where the selectors are restricted to be k-uniform, and by v_i the valuation of Algorithm 1 at iteration i. Since v_i is obtained without any restriction, it follows that for all k > 0 and all i ≥ 0 we have z^k_i ≤ v_i. From Lemma 4 it follows that for all ε > 0 there exist k > 0 and i ≥ 0 such that for all s we have z^k_i(s) ≥ ⟨⟨1⟩⟩_val(Safe(F))(s) − ε. This gives us the following result.

Theorem 5 (Convergence) Let v_i be the valuation obtained at iteration i of Algorithm 1. Then the following assertions hold.

1. For all ε > 0, there exists i such that for all s we have v_i(s) ≥ ⟨⟨1⟩⟩_val(Safe(F))(s) − ε.
2. lim_{i→∞} v_i = ⟨⟨1⟩⟩_val(Safe(F)).

Complexity. Algorithm 1 may not terminate in general. We briefly describe the complexity of every iteration.
Given a valuation v_i, the computation of Pre_1(v_i) involves the solution of matrix games with rewards v_i and can be performed in polynomial time using linear programming. Given v_i with Pre_1(v_i) = v_i, the sets OptSel(v_i, s) and OptSelCount(v_i, s) can be computed by enumerating the subsets of available actions at s and then using linear programming: for example, to check (A, B) ∈ OptSelCount(v_i, s) it suffices to check that there is a selector ξ_1 such that (a) ξ_1 is optimal, i.e., for all actions b ∈ Γ_2(s) we have Pre_{ξ_1,b}(v_i)(s) ≥ v_i(s); (b) for all a ∈ A we have ξ_1(a) > 0, and for all a ∉ A we have ξ_1(a) = 0; and (c) B is the set of counter-optimal actions, i.e., for b ∈ B we have Pre_{ξ_1,b}(v_i)(s) = v_i(s), and for b ∉ B we have Pre_{ξ_1,b}(v_i)(s) > v_i(s). All of the above can be solved by checking feasibility of a set of linear inequalities. Hence TB(G, v_i, F) can be computed in time polynomial in the size of G and v_i, and exponential in the number of moves. The set of almost-sure winning states in turn-based stochastic games with safety objectives can be computed in linear time [10].

5 Termination for Approximation and Turn-based Games

In this section we present termination criteria for strategy improvement algorithms for concurrent games for ε-approximation, and then present an improved termination condition for turn-based games.

Termination for concurrent games. A strategy improvement algorithm for reachability games was presented in [4]. We refer to the algorithm of [4] as the reachability strategy improvement algorithm. The reachability strategy improvement algorithm is simpler than Algorithm 1: it is similar to Algorithm 1, but in every iteration only Step 3.2 is executed (Step 3.3 need not be executed).
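For the turn-based special case, the Pre_1 operator involves no matrix-game solving: player-1 states maximize over successors, player-2 states minimize, and random states take the expectation. A minimal sketch, assuming a hypothetical encoding of the game graph as dictionaries (the names `succ`, `owner`, and `prob` are ours, not the paper's notation):

```python
def pre1_turn_based(v, succ, owner, prob):
    """One application of Pre_1 to valuation v in a turn-based game.

    owner[s] is '1', '2', or 'R'; succ[s] lists the successors of s;
    prob[s][t] is the transition probability at random states."""
    out = {}
    for s in succ:
        if owner[s] == '1':            # player 1 maximizes over successors
            out[s] = max(v[t] for t in succ[s])
        elif owner[s] == '2':          # player 2 minimizes over successors
            out[s] = min(v[t] for t in succ[s])
        else:                          # random state: expectation
            out[s] = sum(prob[s][t] * v[t] for t in succ[s])
    return out

# A three-state fragment: player-1 state s0 can move to s1 or s2.
succ = {"s0": ["s1", "s2"], "s1": ["s1"], "s2": ["s2"]}
owner = {"s0": "1", "s1": "2", "s2": "2"}
v = {"s0": 1/3, "s1": 1/3, "s2": 2/3}
assert pre1_turn_based(v, succ, owner, {})["s0"] == 2/3
```

An improvement step (Step 3.2) would then redirect the selector at every state where this one-step value exceeds the current valuation.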
Applying the reachability strategy improvement algorithm of [4] for player 2, for a reachability objective Reach(T), we obtain a sequence of valuations (u_i)_{i≥0} such that (a) u_{i+1} ≥ u_i; (b) if u_{i+1} = u_i, then u_i = ⟨⟨2⟩⟩_val(Reach(T)); and (c) lim_{i→∞} u_i = ⟨⟨2⟩⟩_val(Reach(T)). Given a concurrent game G with F ⊆ S and T = S \ F, we apply the reachability strategy improvement algorithm to obtain the sequence of valuations (u_i)_{i≥0} as above, and we apply Algorithm 1 to obtain a sequence of valuations (v_i)_{i≥0}. The termination criteria are as follows:

1. if for some i we have u_{i+1} = u_i, then u_i = ⟨⟨2⟩⟩_val(Reach(T)) and 1 − u_i = ⟨⟨1⟩⟩_val(Safe(F)), and we obtain the values of the game;

2. if for some i we have v_{i+1} = v_i, then 1 − v_i = ⟨⟨2⟩⟩_val(Reach(T)) and v_i = ⟨⟨1⟩⟩_val(Safe(F)), and we obtain the values of the game; and

3. for ε > 0, if for some i ≥ 0 we have u_i + v_i ≥ 1 − ε, then for all s ∈ S we have v_i(s) ≥ ⟨⟨1⟩⟩_val(Safe(F))(s) − ε and u_i(s) ≥ ⟨⟨2⟩⟩_val(Reach(T))(s) − ε (i.e., the algorithm can stop for ε-approximation).

Observe that since (u_i)_{i≥0} and (v_i)_{i≥0} are both monotonically non-decreasing and ⟨⟨1⟩⟩_val(Safe(F)) + ⟨⟨2⟩⟩_val(Reach(T)) = 1, it follows that if u_i + v_i ≥ 1 − ε, then for all j ≥ i we have u_i ≥ u_j − ε and v_i ≥ v_j − ε. This establishes that v_i ≥ ⟨⟨1⟩⟩_val(Safe(F)) − ε and u_i ≥ ⟨⟨2⟩⟩_val(Reach(T)) − ε; the correctness of the stopping criterion (3) for ε-approximation follows. We also note that instead of applying the reachability strategy improvement algorithm, a value-iteration algorithm can be applied for reachability games to obtain a sequence of valuations with properties similar to (u_i)_{i≥0}, and the above termination criteria can be applied.
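Stopping criterion (3) is a one-line check over the two monotone sequences; a sketch, where `u` and `v` are hypothetical dictionaries holding the current valuations produced by the two algorithms:

```python
def can_stop(u, v, eps):
    """Criterion (3): stop once u_i(s) + v_i(s) >= 1 - eps at every state,
    since then both valuations are within eps of the game values."""
    return all(u[s] + v[s] >= 1 - eps for s in u)

# Lower bound v for Safe(F) and lower bound u for Reach(T) at two states.
u = {"s0": 0.66, "s1": 0.33}
v = {"s0": 0.33, "s1": 0.66}
assert can_stop(u, v, eps=0.02)        # gaps of 0.01 are within eps
assert not can_stop(u, v, eps=0.005)   # but not within eps = 0.005
```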
Theorem 6 Let G be a concurrent game graph with a safety objective Safe(F). Algorithm 1 and the reachability strategy improvement algorithm for player 2 for the reachability objective Reach(S \ F) yield sequences of valuations (v_i)_{i≥0} and (u_i)_{i≥0}, respectively, such that (a) for all i ≥ 0, we have v_i ≤ ⟨⟨1⟩⟩_val(Safe(F)) ≤ 1 − u_i; and (b) lim_{i→∞} v_i = lim_{i→∞} (1 − u_i) = ⟨⟨1⟩⟩_val(Safe(F)).

Termination for turn-based games. For turn-based stochastic games, both Algorithm 1 and the reachability strategy improvement algorithm terminate. Each iteration of the reachability strategy improvement algorithm of [4] is computable in polynomial time, and here we present a termination guarantee for the reachability strategy improvement algorithm. To apply the reachability strategy improvement algorithm we assume the objective of player 1 to be a reachability objective Reach(T); the correctness of the algorithm relies on the notion of proper strategies. Let W_2 = { s ∈ S | ⟨⟨1⟩⟩_val(Reach(T))(s) = 0 }. The notion of proper strategies and its properties are as follows.

Definition 4 (Proper strategies and selectors) A player-1 strategy π_1 is proper if for all player-2 strategies π_2 and for all states s ∈ S \ (T ∪ W_2), we have Pr^{π_1,π_2}_s(Reach(T ∪ W_2)) = 1. A player-1 selector ξ_1 is proper if the memoryless player-1 strategy induced by ξ_1 is proper.

Lemma 5 ([4]) Given a selector ξ_1 for player 1, the memoryless player-1 strategy induced by ξ_1 is proper iff for every pure selector ξ_2 for player 2 and for all states s ∈ S, we have Pr^{ξ_1,ξ_2}_s(Reach(T ∪ W_2)) = 1.

The following result follows from the result of [4] specialized to the case of turn-based stochastic games.

Lemma 6 Let G be a turn-based stochastic game with reachability objective Reach(T) for player 1.
Let γ_0 be the initial selector, and γ_i be the selector obtained at iteration i of the reachability strategy improvement algorithm. If γ_0 is a pure, proper selector, then the following assertions hold:

1. for all i ≥ 0, the selector γ_i is pure and proper;

2. for all i ≥ 0, we have u_{i+1} ≥ u_i, where u_i = ⟨⟨1⟩⟩^{γ_i}_val(Reach(T)) and u_{i+1} = ⟨⟨1⟩⟩^{γ_{i+1}}_val(Reach(T)); and

3. if u_{i+1} = u_i, then u_i = ⟨⟨1⟩⟩_val(Reach(T)), and there exists i such that u_{i+1} = u_i.

The strategy improvement algorithm of Condon [6] works only for halting games, but the reachability strategy improvement algorithm works also for reachability games that are not halting, provided we start with a pure, proper selector. Hence, to use the reachability strategy improvement algorithm to compute values, we need to start with a pure, proper selector. We present a procedure to compute a pure, proper selector, and then present termination bounds (i.e., bounds on i such that u_{i+1} = u_i). The construction of the pure, proper selector is based on the notion of attractors, defined below.

Attractor strategy. Let A_0 = W_2 ∪ T, and for i ≥ 0 let

A_{i+1} = A_i ∪ { s ∈ S_1 ∪ S_R | E(s) ∩ A_i ≠ ∅ } ∪ { s ∈ S_2 | E(s) ⊆ A_i }.

Since for all s ∈ S \ W_2 we have ⟨⟨1⟩⟩_val(Reach(T))(s) > 0, it follows that from all states in S \ W_2 player 1 can ensure that T is reached with positive probability. It follows that for some i ≥ 0 we have A_i = S. The pure attractor selector ξ* is defined as follows: for a state s ∈ (A_{i+1} \ A_i) ∩ S_1 we have ξ*(s)(t) = 1 for some t ∈ E(s) ∩ A_i (such a t exists by construction). The pure memoryless strategy induced by ξ* ensures that, for all i ≥ 0, from A_{i+1} the game reaches A_i with positive probability. Hence there is no end component C contained in S \ (W_2 ∪ T) in the MDP G_{ξ*}.
It follows that ξ* is a pure selector that is proper, and the selector ξ* can be computed in O(|E|) time. This completes the reachability strategy improvement algorithm for turn-based stochastic games. We now present the termination bounds.

Termination bounds. We present termination bounds for binary turn-based stochastic games. A turn-based stochastic game is binary if for all s ∈ S_R we have |E(s)| ≤ 2, and for all s ∈ S_R with |E(s)| = 2 and all t ∈ E(s) we have δ(s)(t) = 1/2, i.e., every probabilistic state has at most two successors and the transition function δ is uniform.

Lemma 7 Let G be a binary Markov chain with |S| states with a reachability objective Reach(T). Then for all s ∈ S we have ⟨⟨1⟩⟩_val(Reach(T))(s) = p/q, with p, q ∈ N and p, q ≤ 4^{|S|−1}.

Proof. The result follows as a special case of Lemma 2 of [6]. Lemma 2 of [6] holds for halting turn-based stochastic games, and since a Markov chain reaches the set of closed connected recurrent states with probability 1 from all states, the result follows.

Lemma 8 Let G be a binary turn-based stochastic game with a reachability objective Reach(T). Then for all s ∈ S we have ⟨⟨1⟩⟩_val(Reach(T))(s) = p/q, with p, q ∈ N and p, q ≤ 4^{|S_R|−1}.

Proof. Since pure memoryless optimal strategies exist for both players (Theorem 1), we fix pure memoryless optimal strategies π_1 and π_2 for both players. The Markov chain G_{π_1,π_2} can then be reduced to an equivalent Markov chain with |S_R| states (since we fix deterministic successors for states in S_1 ∪ S_2, they can be collapsed to their successors). The result then follows from Lemma 7.
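Lemma 7 can be checked concretely by solving the reachability equations of a small binary Markov chain in exact rational arithmetic; the four-state chain below is an illustrative assumption, not taken from the paper:

```python
from fractions import Fraction

half = Fraction(1, 2)
# Binary Markov chain: s0 -> {t, s1} with probability 1/2 each;
# s1 -> {s0, sink} with probability 1/2 each; t and sink absorbing.
# Reachability equations for Reach({t}):
#   x0 = 1/2 + (1/2) * x1,   x1 = (1/2) * x0.
# Substituting: x0 = 1/2 + (1/4) * x0, so x0 = (1/2) / (3/4).
x0 = half / (1 - half * half)
x1 = half * x0
assert x0 == Fraction(2, 3) and x1 == Fraction(1, 3)
# The denominators respect Lemma 7's bound 4^(|S|-1) = 4^3 = 64.
assert x0.denominator <= 64 and x1.denominator <= 64
```

The bound on the denominators is what drives the termination argument that follows: each strict improvement must raise the sum of the values by at least one such rational step.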
From Lemma 8 it follows that at iteration i of the reachability strategy improvement algorithm, either the sum of the values increases by at least 1/4^{|S_R|−1}, or else u_{i+1} = u_i. Since the sum of the values of all states is at most |S|, it follows that the algorithm terminates in at most |S| · 4^{|S_R|−1} steps. Moreover, since the number of pure memoryless strategies is at most ∏_{s∈S_1} |E(s)|, the algorithm terminates in at most ∏_{s∈S_1} |E(s)| steps. It follows from the results of [19] that a turn-based stochastic game graph G can be reduced to an equivalent binary turn-based stochastic game graph G′ such that the sets of player-1 and player-2 states in G and G′ are the same, and the number of probabilistic states in G′ is O(|δ|), where |δ| is the size of the transition function of G. Thus we obtain the following result.

Theorem 7 Let G be a turn-based stochastic game with a reachability objective Reach(T). Then the reachability strategy improvement algorithm computes the values in time

O( min { ∏_{s∈S_1} |E(s)|, 2^{O(|δ|)} } · poly(|G|) ),

where poly is a polynomial function.

The results of [15] give an algorithm for turn-based stochastic games that works in time O(|S_R|! · poly(|G|)). The algorithm of [15] works only for turn-based stochastic games, and for general turn-based stochastic games its complexity is better. However, for turn-based stochastic games where the transition function at every state can be expressed in a constant number of bits, we have |δ| = O(|S_R|). In these cases the reachability strategy improvement algorithm (which works for both concurrent and turn-based stochastic games) runs in time 2^{O(|S_R|)} · poly(|G|), as compared to the time 2^{O(|S_R| · log(|S_R|))} · poly(|G|) of the algorithm of [15].
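The attractor construction used above to obtain a pure, proper selector is a simple fixpoint computation; a sketch, assuming hypothetical dictionaries for the state partition and the edge relation E:

```python
def attractor_selector(states1, states2, statesR, edges, target):
    """Compute the attractor A_0 <= A_1 <= ... to `target` (= W_2 union T)
    and a pure selector for the player-1 states that are attracted.

    edges[s] is the successor set E(s)."""
    A = set(target)
    selector = {}
    changed = True
    while changed:
        changed = False
        for s in states1 | states2 | statesR:
            if s in A:
                continue
            if s in states2:
                if edges[s] <= A:           # all successors already attracted
                    A.add(s); changed = True
            elif edges[s] & A:              # some successor attracted
                if s in states1:            # record the pure choice xi*(s)
                    selector[s] = next(iter(edges[s] & A))
                A.add(s); changed = True
    return A, selector

# Tiny example: player-1 state s0 can move directly into the target t.
A, sel = attractor_selector({"s0"}, set(), set(), {"s0": {"t"}}, {"t"})
assert A == {"s0", "t"} and sel["s0"] == "t"
```

Each pass adds at least one state until the fixpoint is reached, so with a worklist this runs in O(|E|) time, matching the bound stated above.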
References

[1] S. Basu, R. Pollack, and M.-F. Roy. Algorithms in Real Algebraic Geometry. Springer, 2003.
[2] D.P. Bertsekas. Dynamic Programming and Optimal Control. Athena Scientific, 1995. Volumes I and II.
[3] A. Bianco and L. de Alfaro. Model checking of probabilistic and nondeterministic systems. In FSTTCS 95: Foundations of Software Technology and Theoretical Computer Science, volume 1026 of Lecture Notes in Computer Science, pages 499–513. Springer-Verlag, 1995.
[4] K. Chatterjee, L. de Alfaro, and T.A. Henzinger. Strategy improvement in concurrent reachability games. In QEST'06. IEEE, 2006.
[5] A. Condon. The complexity of stochastic games. Information and Computation, 96(2):203–224, 1992.
[6] A. Condon. On algorithms for simple stochastic games. In Advances in Computational Complexity Theory, volume 13 of DIMACS Series in Discrete Mathematics and Theoretical Computer Science, pages 51–73. American Mathematical Society, 1993.
[7] C. Courcoubetis and M. Yannakakis. The complexity of probabilistic verification. Journal of the ACM, 42(4):857–907, 1995.
[8] L. de Alfaro. Formal Verification of Probabilistic Systems. PhD thesis, Stanford University, 1997. Technical Report STAN-CS-TR-98-1601.
[9] L. de Alfaro and T.A. Henzinger. Concurrent omega-regular games. In Proceedings of the 15th Annual Symposium on Logic in Computer Science, pages 141–154. IEEE Computer Society Press, 2000.
[10] L. de Alfaro, T.A. Henzinger, and O. Kupferman. Concurrent reachability games. Theoretical Computer Science, 386(3):188–217, 2007.
[11] L. de Alfaro and R. Majumdar. Quantitative solution of omega-regular games. Journal of Computer and System Sciences, 68:374–397, 2004.
[12] C. Derman. Finite State Markovian Decision Processes. Academic Press, 1970.
[13] K. Etessami and M. Yannakakis. Recursive concurrent stochastic games.
In ICALP 06: Automata, Languages, and Programming. Springer, 2006.
[14] J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1997.
[15] H. Gimbert and F. Horn. Simple stochastic games with few random vertices are easy to solve. In FoSSaCS'08 (to appear), 2008.
[16] J.G. Kemeny, J.L. Snell, and A.W. Knapp. Denumerable Markov Chains. D. Van Nostrand Company, 1966.
[17] P.R. Kumar and T.H. Shiau. Existence of value and randomized strategies in zero-sum discrete-time stochastic dynamic games. SIAM J. Control and Optimization, 19(5):617–634, 1981.
[18] D.A. Martin. The determinacy of Blackwell games. The Journal of Symbolic Logic, 63(4):1565–1581, 1998.
[19] U. Zwick and M.S. Paterson. The complexity of mean payoff games on graphs. Theoretical Computer Science, 158:343–359, 1996.
