Measuring and Synthesizing Systems in Probabilistic Environments

Krishnendu Chatterjee (1), Thomas A. Henzinger (1,2), Barbara Jobstmann (3), and Rohit Singh (4)
(1) Institute of Science and Technology Austria (IST Austria)
(2) École Polytechnique Fédérale de Lausanne (EPFL), Switzerland
(3) CNRS/Verimag, France
(4) Indian Institute of Technology (IIT), Bombay

Technical Report, April 14, 2011

Abstract. Often one has a preference order among the different systems that satisfy a given specification. Under a probabilistic assumption about the possible inputs, such a preference order is naturally expressed by a weighted automaton, which assigns to each word a value, such that a system is preferred if it generates a higher expected value. We solve the following optimal-synthesis problem: given an omega-regular specification, a Markov chain that describes the distribution of inputs, and a weighted automaton that measures how well a system satisfies the given specification under the given input assumption, synthesize a system that optimizes the measured value. For safety specifications and measures that are defined by mean-payoff automata, the optimal-synthesis problem amounts to finding a strategy in a Markov decision process (MDP) that is optimal for a long-run average reward objective, which can be done in polynomial time. For general omega-regular specifications, the solution rests on a new, polynomial-time algorithm for computing optimal strategies in MDPs with mean-payoff parity objectives. Our algorithm generates optimal strategies consisting of two memoryless strategies and a counter. This counter is in general not bounded. To obtain a finite-state system, we show how to construct an ε-optimal strategy with a bounded counter for any ε > 0.
Furthermore, we show how to decide in polynomial time if we can construct an optimal finite-state system (i.e., a system without a counter) for a given specification. We have implemented our approach and the underlying algorithms in a tool that takes qualitative and quantitative specifications and automatically constructs a system that satisfies the qualitative specification and optimizes the quantitative specification, if such a system exists. We present some experimental results showing optimal systems that were automatically generated in this way.

1 Introduction

Building correct and reliable programs is one of the key challenges in computer science. Automatic verification and synthesis aim to address this problem by defining correctness with respect to a formal specification, a mathematical description of the desired behavior of the system. In automatic verification, we ask if a given system satisfies a given specification [18, 43, 20]. The synthesis problem asks to automatically derive a system from a specification [17, 44, 41]. Traditionally, the verification and synthesis problems are studied with respect to Boolean specifications in an adversarial environment: the Boolean (or qualitative) specification maps each possible behavior of a system to true or false, indicating whether this behavior is desired or not. Analyzing a system in an adversarial environment corresponds to considering the system under the worst-case behavior of the environment. In this work we study the verification and synthesis problems for quantitative objectives in probabilistic environments, which corresponds to analyzing the system under the average-case behavior of its environment. Quantitative reasoning is traditionally used to measure quantitative properties of systems, such as performance or reliability (cf. [1, 34, 4, 38]).
Quantitative reasoning has also been shown useful in the classically Boolean contexts of verification and synthesis [6, 35]. In particular, by augmenting a Boolean specification with a quantitative specification, we can measure how "well" a system satisfies the specification. For example, among systems that respond to requests, we may prefer one system over another if it responds quicker, or it responds to more requests, or it issues fewer unrequested responses, etc. In synthesis, we can use such measures to guide the synthesis process towards deriving a system that is, in the desired sense, "optimal" among all systems that satisfy the specification [6]. There are many ways to define a quantitative measure that captures the "goodness" of a system with respect to the Boolean specification, and particular measures can be quite different, but there are two questions every such measure has to answer: (1) how to assign a quantitative value to one particular behavior of a system (measure along a behavior) and (2) how to aggregate the quantitative values that are assigned to the possible behaviors of the system (measure across behaviors). Recall the response property. Suppose there is a sequence of requests along a behavior and we are interested primarily in response time, i.e., the quicker the system responds, the better. As measure (1) along a particular behavior, we may be interested in an average or the supremum (i.e., worst case) of all response times, or in any other function that aggregates all response times along a behavior into a single real value. The choice of measure (2) across behaviors is independent: we may be interested in an average of all values assigned to individual behaviors, or in the supremum, or again, in some other function.
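As a small illustration (not from the paper's tool), the two along-a-behavior aggregations just mentioned can be computed from a finite sample of response times; the delay values below are hypothetical, chosen only for illustration:

```python
# Sketch: two ways to aggregate response times along one behavior.
delays = [1, 1, 3, 1, 2]  # hypothetical response times along one behavior

average = sum(delays) / len(delays)  # measure (1a): average response time
worst = max(delays)                  # measure (1b): supremum (worst case)

print(average, worst)  # -> 1.6 3
```

The across-behaviors choice (2) would then aggregate such per-behavior values, e.g., by averaging or taking the supremum over many behaviors.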
In this way, we can choose to measure the average (across behaviors) of average (along a behavior) response times, or the average of worst-case response times, or the worst case of average response times, or the worst case of worst-case response times, etc. Note that these are the same two choices that appear in weighted automata and max-plus algebras (cf. [29, 32, 21]). In previous work, we studied various measures (1) along a behavior. In particular, lexicographically ordered tuples of averages [6] and ratios [7] are of natural interest in certain contexts. Alur et al. [2] consider an automaton model with a quantitative measure (1) that is defined with respect to certain accumulation points along a behavior. However, in all of these cases, for measure (2) only the worst case (i.e., supremum) is considered. This comes naturally as an extension of Boolean thinking, where a system fails to satisfy a property if even a single behavior violates the property. But in this way, we cannot distinguish between two systems that have the same worst cases across behaviors, but in one system almost all possible behaviors exhibit the worst case, while in the other only very few behaviors do so. In contrast, in performance evaluation one usually considers the average case across different behaviors.

For instance, consider a resource controller for two clients. Clients send requests, and the controller grants the resource to one of them at a time. Suppose we prefer, again, systems where requests are granted "as quickly as possible." Every controller that avoids simultaneous grants will have a behavior along which at least one grant is delayed by one step, namely, the behavior along which both clients continuously send requests. The best the controller can do is to alternate between the clients.
However, if systems are measured with respect to the worst case across different behaviors, then a controller that always alternates between both clients, independent of the actual requests, is as good as a controller that tries to grant all requests immediately and only alternates when both clients request the resource at the same time. Clearly, if we wish to synthesize the preferred controller, we need to apply an average-case measure across behaviors.

In this paper, we present a measure (2) that averages across all possible behaviors of a system and solve the corresponding synthesis problem to derive an optimal system. In synthesis, the different possible behaviors of a system are caused by different input sequences. Therefore, in order to take a meaningful average across different behaviors, we need to assume a probability distribution over the possible input sequences. For example, if on input 0 a system has response time r_0, and on input 1 response time r_1, and input 0 is twice as likely as input 1, then the average response time is (2 r_0 + r_1)/3. The resulting synthesis problem is as follows: given a Boolean specification ϕ, a probabilistic input assumption µ, and a measure that assigns to each system M a value V_µ^ϕ(M) of how "well" M satisfies ϕ under µ, construct a system M such that V_µ^ϕ(M) ≥ V_µ^ϕ(M′) for all M′. We solve this problem for qualitative specifications that are given as ω-automata, input assumptions that are given as finite Markov chains, and quantitative specifications given as mean-payoff automata, which define a quantitative language by assigning values to behaviors. From the above three inputs we derive a measure that captures (1) an average along system behaviors as well as (2) an average across system behaviors; and thus we obtain a measure that induces a value for each system.
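The (2 r_0 + r_1)/3 computation above is simply an expectation over the input distribution. A minimal sketch, using hypothetical values for r_0 and r_1 (the probabilities 2/3 and 1/3 are the paper's illustrative assumption):

```python
# Expected response time under a probabilistic input assumption:
# input 0 occurs with probability 2/3, input 1 with probability 1/3.
dist = {0: 2 / 3, 1: 1 / 3}
response_time = {0: 1.0, 1: 4.0}  # hypothetical values for r0 and r1

# Expectation over inputs; equals (2*r0 + r1)/3 for this distribution.
expected = sum(p * response_time[i] for i, p in dist.items())
print(expected)  # -> 2.0
```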
To our knowledge this is the first solution of a synthesis problem for an average-case measure across system behaviors. Technically, the solution rests on a new, polynomial-time algorithm for computing optimal strategies in MDPs with mean-payoff parity objectives. In contrast to MDPs with mean-payoff objectives, where pure memoryless optimal strategies exist, optimal strategies for mean-payoff parity objectives in MDPs require infinite memory. It follows from our result that the infinite memory can be captured with a counter, and with this insight we develop the polynomial-time algorithm for solving MDPs with mean-payoff parity objectives. A careful analysis of the constructed strategies allows us to construct, for any ε > 0, a finite-state system that is within ε of the optimal value. Furthermore, we present a polynomial-time procedure to decide if there exists a finite-state system (a system without a counter) that achieves the optimal value for a mean-payoff parity specification. We show that for MDPs with mean-payoff parity objectives finite memory does not help, i.e., either the optimal strategy requires infinite memory or there exists a memoryless strategy that also achieves the optimal value. We give a linear program to check if there exists a memoryless strategy that is optimal.

Related works. Many formalisms for quantitative specifications have been considered in the literature [2, 8-11, 23, 24, 27, 28, 37]; most of these works (other than [2, 11, 23]) do not consider mean-payoff specifications, and none of these works focuses on how quantitative specifications can be used to obtain better implementations for the synthesis problem.
Furthermore, several notions of metrics for probabilistic systems and games have been proposed in the literature [25, 26]; these metrics provide a measure that indicates how close two systems are with respect to all temporal properties expressible in a logic, whereas our work uses a quantitative specification to compare systems with respect to the property of interest. Similar in spirit, but based on a completely different technique, is the work by Niebert et al. [39], who group behaviors into good and bad with respect to satisfying a given LTL specification and use a CTL*-like analysis specification to quantify over the good and bad behaviors. This measure of logical properties was used by Katz and Peled [35] to guide genetic algorithms to discover counterexamples and corrections for distributed protocols. Control and synthesis in the presence of uncertainty has been considered in several works such as [3, 19, 5]: in all these works, the framework consists of MDPs to model nondeterministic and probabilistic behavior, and the specification is a Boolean specification. In contrast to these works, where the probabilistic choices represent uncertainty, in our work the probabilistic choices represent a model for the environment assumption on the input sequences, which allows us to consider the system as a whole. Moreover, we consider quantitative objectives. Parr and Russell [40] also synthesize strategies for MDPs that optimize a quantitative objective. They optimize with respect to the expected discounted total reward, while we consider mean-payoff objectives. Furthermore, we allow the user (i) to provide additional qualitative (in particular liveness) constraints and (ii) to specify the qualitative and the quantitative constraints independently of the MDP. MDPs with mean-payoff objectives are well studied.
The books [30, 42] present a detailed analysis of this topic. We present a solution to a more general condition: the Boolean combination of mean-payoff and parity conditions on MDPs. We show that MDPs with mean-payoff parity objectives can be solved in polynomial time.

Structure of the paper. Section 2 gives the necessary theoretical background and fixes the notation. In Section 3 we introduce the problem of measuring systems with respect to quantitative specifications using several examples, define our new measure, and show how to compute the value of a system with respect to this measure. In Section 4 we show how to construct a system that satisfies a qualitative specification and optimizes a quantitative specification with respect to our new measure. In Section 5 we present experimental results, and we conclude in Section 6. This paper is an extended and improved version of [16] that includes new theoretical results, more examples, detailed proofs, and reports on an improved implementation. We present new theoretical results related to finite-state strategies for approximating the values in mean-payoff parity MDPs and a polynomial-time procedure to decide the existence of a memoryless strategy that achieves the optimal value.

2 Preliminaries

2.1 Alphabet, Words, and Languages

An alphabet Σ consists of a finite set of letters σ ∈ Σ. We often use letters representing assignments to a set of Boolean variables V. In this case we write Σ = 2^V, i.e., Σ is the set of all subsets of V, and a letter σ = {v_1, ..., v_n} ∈ 2^V encodes the unique assignment in which all variables in σ are set to true and all other variables are set to false. A word w over Σ is either a finite or infinite sequence of letters, i.e., w ∈ Σ* ∪ Σ^ω. Given a word w ∈ Σ^ω, we denote by w_i the letter at position i of w and by w^i the prefix of w of length i, i.e., w^i = w_1 w_2 ... w_i.
We denote by |w| the length of the word w, i.e., |w^i| = i and |w| = ∞ if w is infinite. A qualitative language L is a subset of Σ^ω. A quantitative language L [11] is a mapping from the set of words to the set of reals, i.e., L : Σ^ω → R. Note that the characteristic function of a qualitative language L is a quantitative language mapping words to 0 and 1. Given a qualitative language L, we use L also to denote its characteristic function.

2.2 Automata with Parity, Safety, and Mean-Payoff Objective

A (finite-state) automaton is a tuple A = (Σ, Q, q_0, ∆), where Σ is an alphabet, Q is a (finite) set of states, q_0 ∈ Q is an initial state, and ∆ : Q × Σ → Q is a transition function(5) that maps a state and a letter to a successor state. The run of A on a word w = w_1 w_2 ... is a sequence of states ρ = ρ_0 ρ_1 ... such that (i) ρ_0 = q_0 and (ii) for all 1 ≤ i ≤ |w|, ∆(ρ_{i-1}, w_i) = ρ_i. A parity automaton is a tuple A = ((Σ, Q, q_0, ∆), p), where (Σ, Q, q_0, ∆) is a finite-state automaton and p : Q → {0, 1, ..., d} is a priority function that maps every state to a natural number in [0, d] called its priority. A parity automaton A accepts a word w if the least priority of all states occurring infinitely often in the run ρ of A on w is even, i.e., min_{q ∈ Inf(ρ)} p(q) is even, where Inf(ρ) = {q | ∀i ∃j > i : ρ_j = q}. The language of A, denoted by L_A, is the set of all words accepted by A. A safety automaton is a parity automaton with only priorities 0 and 1, and no transitions from priority-1 to priority-0 states. A mean-payoff automaton is a tuple A = ((Σ, Q, q_0, ∆), r), where (Σ, Q, q_0, ∆) is a finite-state automaton and r : Q × Σ → N is a reward function that associates with each transition of the automaton a reward v ∈ N.
A mean-payoff automaton assigns to each word w the long-run average of the rewards, i.e., for a word w let ρ be the run of A on w; then we have

L_A(w) = (1/n) · Σ_{i=1}^{n} r(ρ_{i-1}, w_i) if w is finite with |w| = n, and
L_A(w) = lim inf_{n→∞} L_A(w^n) if w is infinite.

Note that L_A is a function assigning values to words.

(Footnote 5: Note that our automata are deterministic and complete, to simplify the presentation.)

Fig. 1. Mean-payoff automaton A. Fig. 2. Finite-state system M.

Example 1. Figure 1 shows a mean-payoff automaton A = ((Σ, Q, q_0, ∆), r) for words over the alphabet Σ = 2^{r,g} = {{}, {r}, {g}, {r,g}}, which are all possible assignments to the two Boolean variables r and g. E.g., the letter {r} means that variable r is true and all the other variables (in this case only g) are false. The automaton has two states q_0 and q_1, represented by circles. State q_0 is the initial state, which is indicated by the straight arrow from the left. Transitions are represented by directed arrows. They are labeled with (i) a conjunction of literals representing a set of letters and (ii), in parentheses, the reward obtained when following this transition. If a variable v appears in positive form in a label, then we can take this transition only with a letter that includes v. If the variable v appears in negated form (i.e., v̄), then this transition can only be taken with letters that do not include v. Note that transitions depend only on the signals that appear in their labels. E.g., the self-loop on state q_0 labeled with g(1) means that we can move from q_0 to q_0 with any letter that includes g, i.e., either with letter {g} or with letter {r,g}. The automaton assigns to each word in Σ^ω the average reward.
E.g., the run of A on the word ({r}{r}{r,g})^ω is (q_0 q_0 q_1)^ω, and the corresponding sequence of rewards is (0 0 1)^ω, with an average reward of (0 + 0 + 1)/3 = 1/3.

2.3 State Machines and Specifications

A (finite-)state machine (or system) with input signals I and output signals O is a tuple M = (Q, q_0, ∆, λ), where (Σ_I, Q, q_0, ∆) with Σ_I = 2^I is a (finite-state) automaton and λ : Q × Σ_I → Σ_O with Σ_O = 2^O is a labeling function that maps every transition in ∆ to an element in Σ_O. The sets Σ_I and Σ_O are called the input and the output alphabet of M, respectively. We denote the joint alphabet 2^{I ∪ O} by Σ. Given an input word w ∈ Σ_I^* ∪ Σ_I^ω, let ρ be the run of M on w; the outcome of M on w, denoted by O_M(w), is the word v ∈ Σ^* ∪ Σ^ω such that for all 1 ≤ i ≤ |w|, v_i = w_i ∪ λ(ρ_{i-1}, w_i). Note that O_M is the function mapping input words to outcomes. The language of M, denoted by L_M, is the set of outcomes of M on all infinite input words.

Example 2. Consider the system M depicted in Figure 2. System M has one Boolean input variable r and one Boolean output variable g. In every step, M reads the value of the variable r and sets the value of the variable g. More precisely, M sets g to false whenever either r is false in the current step or g was true in the previous step. The input alphabet of M is 2^{r} = {{}, {r}} and its output alphabet is 2^{g} = {{}, {g}}. Recall that all variables that are absent in a letter are set to false; e.g., the input letter {} means that the value of r is false, while {r} refers to r being true. We again label edges with conjunctions of literals. The conjunction on the left of the slash describes a set of input letters, i.e., a set of assignments to the input variables.
The conjunction on the right describes a single output letter, which corresponds to an assignment of the output variables. E.g., the transition from state q_1 to state q_0 labeled /ḡ means that if the system is in state q_1, then it moves to the state q_0 and sets the variable g to false for any input letter, because the conjunction over the input variables is empty. Consider the input word w = {r}{r}{}{r}. The outcome of M on w is the combined word {r,g}{r}{}{r,g}. The language of M consists of all the infinite words generated by arbitrarily concatenating the following three words: (i) w_1 = {}, (ii) w_2 = {r,g}{r}, and (iii) w_3 = {r,g}{}, i.e., L_M = (w_1 | w_2 | w_3)^ω.

We analyze state machines with respect to qualitative and quantitative specifications. Qualitative specifications are qualitative languages, i.e., subsets of Σ^ω or, equivalently, functions mapping words to 0 and 1. We consider ω-regular specifications given as safety or parity automata. Given a safety or parity automaton A and a state machine M, we say M satisfies L_A (written M |= L_A) if L_M ⊆ L_A or, equivalently, ∀w ∈ Σ_I^ω : L_A(O_M(w)) = 1. A quantitative specification is given by a quantitative language L, i.e., a function that assigns values to words. Given a state machine M, we use function composition to relate L and M, i.e., L ◦ O_M maps every input word w of M to the value assigned by L to the outcome of M on w. We consider quantitative specifications given by mean-payoff automata.

2.4 Markov Chains and Markov Decision Processes (MDPs)

A probability distribution over a finite set S is a function d : S → [0, 1] such that Σ_{s ∈ S} d(s) = 1. We denote the set of all probability distributions over S by D(S).
A Markov Decision Process (MDP) G = (S, s_0, E, S_1, S_P, δ) consists of a finite set of states S, an initial state s_0 ∈ S, a set of edges E ⊆ S × S, a partition (S_1, S_P) of the set S, and a probabilistic transition function δ : S_P → D(S). The states in S_1 are the player-1 states, where player 1 decides the successor state; and the states in S_P are the probabilistic states, where the successor state is chosen according to the probabilistic transition function δ. So, we can view an MDP as a game between two players: player 1 and a random player that plays according to δ. We assume that for s ∈ S_P and t ∈ S, we have (s, t) ∈ E iff δ(s)(t) > 0, and we often write δ(s, t) for δ(s)(t). For technical convenience we assume that every state has at least one outgoing edge. For a state s ∈ S, we write E(s) to denote the set {t ∈ S | (s, t) ∈ E} of possible successors. If S_1 = ∅, then G is called a Markov chain, and we omit the partition (S_1, S_P) from the definition. A Σ-labeled MDP (G, λ) is an MDP G with a labeling function λ : S → Σ assigning to each state of G a letter from Σ. We assume that labeled MDPs are deterministic and complete, i.e., (i) for all (s, s′), (s, s′′) ∈ E, λ(s′) = λ(s′′) implies s′ = s′′, and (ii) for all s ∈ S and σ ∈ Σ there exists s′ ∈ S such that (s, s′) ∈ E and λ(s′) = σ.

2.5 Plays and strategies

An infinite path, or a play, of the MDP G is an infinite sequence ω = s_0 s_1 s_2 ... of states such that (s_k, s_{k+1}) ∈ E for all k ∈ N. Note that we use ω only to denote plays, i.e., infinite sequences of states. We use v to refer to finite sequences of states. We write Ω for the set of all plays, and for a state s ∈ S, we write Ω_s ⊆ Ω for the set of plays starting at s.
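As a concrete (hypothetical) illustration of these definitions, the following sketch encodes a tiny MDP and samples a finite prefix of a play under a fixed pure memoryless strategy. The states, edges, and probabilities are invented for illustration and do not come from the paper:

```python
import random

# A tiny MDP: states "a", "b" belong to player 1; "p" is probabilistic.
player1_states = {"a", "b"}
delta = {"p": {"a": 0.5, "b": 0.5}}   # probabilistic transition function δ
strategy = {"a": "p", "b": "p"}       # pure memoryless strategy for player 1

def sample_play_prefix(start, steps, rng):
    """Sample a finite prefix of a play starting at `start`."""
    play = [start]
    for _ in range(steps):
        s = play[-1]
        if s in player1_states:
            play.append(strategy[s])  # player 1 chooses the successor
        else:
            succs, probs = zip(*delta[s].items())
            # the random player chooses according to δ
            play.append(rng.choices(succs, weights=probs)[0])
    return play

prefix = sample_play_prefix("a", 6, random.Random(0))
print(prefix)  # alternates between player-1 states and "p"
```

Restricting the MDP to this memoryless strategy yields exactly the Markov chain discussed in Section 2.6.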
A strategy for player 1 is a function π : S*S_1 → D(S) that assigns a probability distribution to every finite sequence v ∈ S*S_1 of states ending in a player-1 state. Player 1 follows π if she makes all her moves according to the distributions provided by π. A strategy must prescribe only available moves, i.e., for all v ∈ S*, s ∈ S_1, and t ∈ S, if π(vs)(t) > 0, then (s, t) ∈ E. We denote by Π the set of all strategies for player 1. Once a starting state s ∈ S and a strategy π ∈ Π are fixed, the outcome of the game is a random walk ω_s^π for which the probabilities of every event A ⊆ Ω, where an event is a measurable set of plays, are uniquely defined. For a state s ∈ S and an event A ⊆ Ω, we write µ_s^π(A) for the probability that a play belongs to A if the game starts from the state s and player 1 follows the strategy π. For a measurable function f : Ω → R, we denote by E_s^π[f] the expectation of the function f under the probability measure µ_s^π(·). Strategies that do not use randomization are called pure. A player-1 strategy π is pure if for all v ∈ S* and s ∈ S_1, there is a state t ∈ S such that π(vs)(t) = 1. A memoryless player-1 strategy depends only on the current state, i.e., for all v, v′ ∈ S* and for all s ∈ S_1 we have π(vs) = π(v′s). A memoryless strategy can be represented as a function π : S_1 → D(S). A pure memoryless strategy is a strategy that is both pure and memoryless. A pure memoryless strategy can be represented as a function π : S_1 → S. A pure finite-state strategy is a strategy that can be represented by a finite-state machine M = (Q, q_0, ∆, λ) with input alphabet Σ_I = S and output alphabet Σ_O = S. The states Q represent a set of memory locations, with q_0 as the initial memory content. The transition function ∆ : Q × S → Q describes how to update the memory while moving to the next state in the MDP.
The labeling function λ : Q × S → S defines the moves of player 1, i.e., for every memory location and state of the MDP, it provides a successor state in the MDP.

2.6 Resulting Markov chains, recurrence classes, unichain, and multichain

Given an MDP G and a pure memoryless or finite-state strategy π, if we restrict G to follow the actions suggested by π, we obtain a Markov chain. Given a Markov chain G = (S, s_0, E, δ), a state s ∈ S is called recurrent(6) if the expected number of visits to s is infinite; otherwise, the state s is called transient. A maximal set of recurrent states that is closed(7) under E is called a recurrence class. A Markov chain G is unichain if it has a single recurrence class; otherwise, G is called multichain.

2.7 Quantitative Objectives

A quantitative objective is given by a measurable function f : Ω → R. We consider several objectives based on priority and reward functions. Given a priority function p : S → {0, 1, ..., d}, we define the set of plays satisfying the parity objective as Ω_p = {ω ∈ Ω | min p(Inf(ω)) is even}. The parity objective parity_p is the characteristic function of Ω_p. Given a reward function r : S → N ∪ {⊥}, the mean-payoff objective mean_r for a play ω = s_1 s_2 s_3 ... is defined as mean_r(ω) = lim inf_{n→∞} (1/n) · Σ_{i=1}^{n} r(s_i) if r(s_i) ≠ ⊥ for all i > 0, and mean_r(ω) = ⊥ otherwise. Given a priority function p and a reward function r, the mean-payoff parity objective mp_{p,r} assigns the long-run average of the rewards if the parity objective is satisfied; otherwise it assigns ⊥. Formally, for a play ω we have mp_{p,r}(ω) = mean_r(ω) if parity_p(ω) = 1, and mp_{p,r}(ω) = ⊥ otherwise. For a reward function r : S → R, the max objective max_r assigns to a play the maximum reward that appears in the play.
Note that since S is finite, the number of different rewards appearing in a play is finite, and hence the maximum is well defined. Formally, for a play ω = s_1 s_2 s_3 ... we have max_r(ω) = max {r(s_i) | i ≥ 1}.

2.8 Values, optimal strategies, and almost-sure winning states

Given an MDP G, the value function V_G for an objective f is the function from the state space S to the set R of reals defined, for all states s ∈ S, by V_G(f)(s) = sup_{π ∈ Π} E_s^π[f]. In other words, the value V_G(f)(s) is the maximal expectation with which player 1 can achieve her objective f from state s. A strategy π is optimal from state s for objective f if V_G(f)(s) = E_s^π[f]. For parity objectives, mean-payoff objectives, and max objectives, pure memoryless optimal strategies exist in MDPs. Given an MDP G and a priority function p, we denote by W_G(parity_p) = {s ∈ S | V_G(parity_p)(s) = 1} the set of states with value 1. These states are called the almost-sure winning states for the player, and an optimal strategy from the almost-sure winning states is called an almost-sure winning strategy. The set W_G(parity_p) for an MDP G with priority function p can be computed in O(d · n^{3/2}) time, where n is the size of the MDP G and d is the number of priorities [13, 14].

(Footnote 6: We do not distinguish null and positive recurrent states, since we only consider finite Markov chains. Footnote 7: We use the usual definition of closed: given a set Y, a set X ⊆ Y is closed under a relation R ⊆ Y × Y if for all x ∈ X and all y ∈ Y, (x, y) ∈ R implies y ∈ X.)

Fig. 3. Automaton A_i. Fig. 4. System M_1. Fig. 5.
System M_2, preferring r_1.

For states in S \ W_G(parity_p), the parity objective is falsified with positive probability for all strategies, which implies that for all states in S \ W_G(parity_p) the value is less than 1 (i.e., V_G(parity_p)(s) < 1).

3 Measuring Systems

In this section, we start with an example to explain the problem and introduce our measure. Then we define the measure formally, and finally we show how to compute the value of a system with respect to the given measure.

Example 3. Recall the example from the introduction, where we consider a resource controller for two clients. Client i requests the resource by setting its request signal r_i. The resource is granted to Client i by raising the grant signal g_i. We require that the controller guarantees mutually exclusive access and that it is fair, i.e., a requesting client eventually gets access to the resource. Assume we prefer controllers that respond quickly. Figure 3 shows a specification that rewards a quick response to request r_i. The specification is given as a mean-payoff automaton that measures the average delay between a request r_i and a corresponding grant g_i. Recall that transitions are labeled with a conjunction of literals and a reward in parentheses. In particular, whenever a request is granted the reward is 1, while a delay of the grant results in reward 0. The automaton assigns to each word in (2^{r_i, g_i})^ω the average reward. For instance, the value of the word ({r_i}{r_i, g_i})^ω is (0 + 1)/2 = 1/2. We can take two copies of this specification, one for each client, and assign to each word in (2^{r_1, r_2, g_1, g_2})^ω the sum of the average rewards. E.g., the word ({r_1, g_2}{r_1, g_1})^ω gets an average reward of 1/2 with respect to the first client and reward 1 with respect to the second client, which sums up to a total reward of 3/2.
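These values can be checked mechanically. The sketch below encodes the transition structure of A_i as reconstructed from the prose (a pending request yields reward 0 until the grant arrives; this encoding of Figure 3 is our reading of the description, not taken from the paper's tool) and computes the average reward of a periodic word:

```python
def step(state, letter, i):
    """One transition of A_i (reconstructed from the description of Fig. 3).
    state: 'q0' (no pending request) or 'q1' (request pending).
    letter: a set of signal names, e.g. {'r1', 'g2'}."""
    r, g = f"r{i}" in letter, f"g{i}" in letter
    if state == "q0":
        if r and not g:
            return "q1", 0   # request delayed: reward 0, remember it
        return "q0", 1       # granted or no request: reward 1
    # q1: a request is pending
    if g:
        return "q0", 1       # grant arrives: reward 1
    return "q1", 0           # still waiting: reward 0

def value(period, i, repetitions=300):
    """Average reward of A_i on period^ω, approximated by a long prefix."""
    state, total, n = "q0", 0, 0
    for _ in range(repetitions):
        for letter in period:
            state, rew = step(state, letter, i)
            total, n = total + rew, n + 1
    return total / n

# The word ({r1,g2}{r1,g1})^ω from Example 3:
w = [{"r1", "g2"}, {"r1", "g1"}]
print(value(w, 1), value(w, 2))  # -> 0.5 1.0
```

The two per-client values sum to the total reward 3/2 reported in Example 3.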
Consider the systems M1 and M2 in Figures 4 and 5, respectively. Transitions are labeled with conjunctions of input and output literals separated by a slash. System M1 alternates between granting the resource to Clients 1 and 2. System M2 grants the resource to Client 2 if only Client 2 is sending requests. By default it grants the resource to Client 1. If both clients request, then the controller alternates between them. Both systems are correct with respect to the functional requirements described above: they are fair to both clients and guarantee that the resource is not accessed simultaneously.

Fig. 6. Product of M1 with Specifications A1 and A2.

One can argue, though, that System M2 is better than M1, because the delay between requests and grants is, for most input sequences, smaller than the delay in System M1. For instance, consider the input trace ({r2}{r1})^ω. The response of System M1 is ({g1}{g2})^ω. Looking at the product between the system M1 and the specifications A1 and A2, shown in Figure 6, we can see that this results in an average reward of 1. Similarly, Figure 7 shows the product of M2, A1, and A2. System M2 responds with ({g2}{g1})^ω and obtains a reward of 2. Now consider the sequence ({r1, r2})^ω, which is the worst input sequence the environment can provide. In both systems, this sequence leads to a reward of 1, which is the lowest possible reward. So M1 and M2 cannot be distinguished with respect to their worst-case behavior. In order to measure a system with respect to its average behavior, we aim to average over the rewards obtained for all possible input sequences.
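The two controllers, as described in the text, can be sketched as small input/output machines; the function names `run_M1` and `run_M2`, and the encoding of letters as sets of signal names, are illustrative assumptions.

```python
def run_M1(inputs):
    """System M1: alternates grants regardless of the requests."""
    out, turn = [], 1
    for _ in inputs:
        out.append({"g1"} if turn == 1 else {"g2"})
        turn = 3 - turn                  # alternate 1 <-> 2
    return out

def run_M2(inputs):
    """System M2 as described in the text: grant Client 2 only when it is
    the sole requester, alternate when both request, default to Client 1."""
    out, last = [], 2                    # last = 2: first contended grant goes to 1
    for req in inputs:
        if "r1" in req and "r2" in req:
            grant = 3 - last             # both request: alternate
        elif "r2" in req:
            grant = 2                    # only Client 2 requests
        else:
            grant = 1                    # default: grant Client 1
        out.append({"g%d" % grant})
        last = grant
    return out

trace = [{"r2"}, {"r1"}] * 3             # the input trace ({r2}{r1})^omega, truncated
print(run_M1(trace))                     # alternating {'g1'}, {'g2'}, ...
print(run_M2(trace))                     # alternating {'g2'}, {'g1'}, ...
```

On ({r2}{r1})^ω the sketch reproduces the responses stated above (M1 answers ({g1}{g2})^ω, M2 answers ({g2}{g1})^ω), and on ({r1, r2})^ω both machines alternate their grants.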
Since we have infinite sequences, one way to average is to take the limit of the averages over all finite prefixes. Note that this can only be done if we know the values of finite words with respect to the quantitative specification. For instance, for a finite-state machine M and a mean-payoff automaton A, we can define the average as

V^{L_A}_⊘(M) := lim_{n→∞} (1/|Σ_I|^n) · Σ_{w ∈ Σ_I^n} L_A(O_M(w)).

However, if we truly want to capture the average behavior, we need to know how often the different parts of the system are used. This corresponds to knowing how likely the different input sequences are. The measure above assumes that all input sequences are "equally likely". In order to define measures that take the behavior of the environment into account, we use a probability measure on input words. In particular, we consider the probability space (Σ_I^ω, F, µ) over Σ_I^ω, where F is the σ-algebra generated by the cylinder sets of Σ_I^ω (which are the sets of infinite words sharing a common prefix); in other words, we have the Cantor topology on Σ_I^ω. Here µ is a probability measure defined on (Σ_I^ω, F). We use finite labeled Markov chains to define the probability measure µ.

Example 4. Recall the controller of Example 3. Assume we know how often the clients send requests. We can represent such a behavior by assigning probabilities to the events in Σ = 2^{r1, r2}. Assume Client 1 sends requests with probability p1 and Client 2 sends them with probability p2 < p1, independently of what has happened before.

Fig. 7. Product of M2, A1, and A2.

Then we can build
a labeled Markov chain with four states S_p = {s0, s1, s2, s3}, each labeled with a letter in Σ, i.e., λ(s0) = {}, λ(s1) = {r2}, λ(s2) = {r1}, and λ(s3) = {r1, r2}, and the following transition probabilities: (i) δ(s_i)(s0) = (1 − p1) · (1 − p2), (ii) δ(s_i)(s1) = (1 − p1) · p2, (iii) δ(s_i)(s2) = p1 · (1 − p2), and (iv) δ(s_i)(s3) = p1 · p2, for all i ∈ {0, 1, 2, 3}.

Once we have a probability measure µ on the input sequences and the associated expectation measure E, we can define a satisfaction relation between systems and specifications, and a measure for a system with respect to a qualitative and a quantitative specification.

Definition 1 (Satisfaction). Given a state machine M with input alphabet Σ_I, a qualitative specification ϕ, and a probability measure µ on (Σ_I^ω, F), we say that M satisfies ϕ under µ (written M |=_µ ϕ) iff M satisfies ϕ with probability 1, i.e., E[ϕ ∘ O_M] = 1, where E is the expectation measure for µ.

Recall that we use a quantitative specification to describe how "good" a system is. Since we aim for a system that satisfies the given (qualitative) specification and is "good" in a given sense, we define the value of a machine with respect to a qualitative and a quantitative specification.

Definition 2 (Value). Given a state machine M, a qualitative specification ϕ, a quantitative specification ψ, and a probability measure µ on (Σ_I^ω, F), the value of M with respect to ϕ and ψ under µ is defined as the expectation of the function ψ ∘ O_M under the probability measure µ if M satisfies ϕ under µ, and ⊥ otherwise. Formally, let E be the expectation measure for µ; then

V^{ϕ,ψ}_µ(M) := E[ψ ∘ O_M] if M |=_µ ϕ, and ⊥ otherwise.

If ϕ is the set of all words, then we write V^ψ_µ(M).
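The labeled Markov chain of Example 4 can be sketched directly: because the requests are independent of the past, every state carries the same outgoing distribution. The function name `build_chain` is illustrative.

```python
# Sketch of the input chain of Example 4: four states, one per letter in
# 2^{r1, r2}; requests are independent of the past, so all rows are equal.

def build_chain(p1, p2):
    labels = {0: set(), 1: {"r2"}, 2: {"r1"}, 3: {"r1", "r2"}}
    row = {0: (1 - p1) * (1 - p2),   # neither client requests
           1: (1 - p1) * p2,         # only Client 2 requests
           2: p1 * (1 - p2),         # only Client 1 requests
           3: p1 * p2}               # both clients request
    delta = {s: dict(row) for s in labels}   # identical rows for all states
    return labels, delta

labels, delta = build_chain(p1=0.6, p2=0.3)
assert all(abs(sum(r.values()) - 1.0) < 1e-12 for r in delta.values())
```

Each row is a probability distribution; e.g., with p1 = 0.6 and p2 = 0.3, the probability of the letter {r1} is 0.6 · 0.7 = 0.42 from every state.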
Furthermore, we say M optimizes ψ under µ if V^ψ_µ(M) ≥ V^ψ_µ(M′) for all systems M′.

In Definition 2, we could also consider the traditional satisfaction relation, i.e., M |= ϕ. We have algorithms for both notions, but we focus on satisfaction under µ, since satisfaction with probability 1 is the natural correctness criterion if we are given a probabilistic environment assumption. Note that for safety specifications the two notions coincide, because we assume that the labeled Markov chain defining the input distribution is complete.^8 For parity specifications, the results in this section would change only slightly if we replaced M |=_µ ϕ by M |= ϕ. In particular, instead of analyzing a Markov chain with a parity objective, we would have to analyze an automaton with a parity objective. We discuss the alternative synthesis algorithm in the conclusions.

Lemma 1. Given a finite-state machine M, a safety or a parity automaton A, a mean-payoff automaton B, and a labeled Markov chain (G, λ_G) defining a probability measure µ on (Σ_I^ω, F), we can construct a Markov chain G′ = (S′, s′_0, E′, δ′), a reward function r′, and a priority function p′ such that

V^{L_A, L_B}_µ(M) = 2 · V_{G′}(mean_{r′})(s′_0) if A is a safety automaton, and 2 · V_{G′}(mp_{p′,r′})(s′_0) otherwise.

Proof. To build G′, we first build the product of M, A, and B (cf. Figure 6), which is a finite-state machine C = (Q, q0, ∆, λ) augmented with a (transition) reward function r : Q × Σ_I → N and a priority function p : Q → {0, ..., d}. Let G = (S, s0, E, δ); then we construct a Markov chain G′ = (S′, s′_0, E′ ∪ E′′, δ′), a reward function r′ : S′ → N, and a priority function p′ : S′ → {0, ...
, d} as follows: S′ = Q × S × {0, 1}, s′_0 = (q0, s0, 0), E′ = {((q, s, 0), (q, s′, 1)) | (s, s′) ∈ E}, E′′ = {((q, s, 1), (q′, s, 0)) | ∆(q, λ_G(s)) = q′}, and

δ′(t)(t′) = 1 if (t, t′) ∈ E′′; δ(s)(s′) if (t, t′) ∈ E′ with t = (q, s, 0) and t′ = (q, s′, 1); and 0 otherwise.

In G′ every transition of M × A is split into two parts: in the first part, G′ chooses the input value according to the distribution given by G. In the second part, G′ outputs the value from M corresponding to the chosen input. The reward given by A for this transition is assigned to the intermediate state, i.e., r′(s′) = r(q, λ_G(s)) if s′ = (q, s, 1), and otherwise r′(s′) = 0; the priorities are copied from A, i.e., p′((q, s, b)) = p(q). If A is a safety automaton, we overwrite the reward function r′ to map all states s′ ∈ S′ with priority 1 to ⊥, i.e., r′(s) = ⊥ if p′(s) = 1. This allows us to ignore the priority function and compute the system value based on the mean-payoff value.

^8 Recall that a Markov chain is complete if in every state there is an edge for every input value. Since every edge has a positive probability, every finite path also has a positive probability, and therefore a system violating a safety specification will have value ⊥. If the Markov chain is not complete (i.e., we are given an input distribution that assigns probability 0 to some finite input sequences), we require a simple preprocessing step that restricts our algorithms to the set of states satisfying the safety condition independently of the input assumption. This set can be computed in linear time by solving a safety game.

Note that we can also compute M |=_µ L_A and V^{L_B}_µ(M) separately by building two MCs: (1) G′ augmented with a priority function p′, and (2) G′′ augmented with
a reward function r′′. Then V^{L_A, L_B}_µ(M) = V_{G′}(mean_{r′})(s′_0) if V_{G′′}(parity_{p′′})(s′′_0) = 1, and otherwise V^{L_A, L_B}_µ(M) = ⊥. Even though the approach with two MCs has a better complexity, we constructed a single MC to show the similarity between verification and synthesis. ⊓⊔

Theorem 1. Given a finite-state machine M, a parity automaton A, a mean-payoff automaton B, and a labeled Markov chain (G, λ_G) defining a probability measure µ, we can compute the value V^{L_A, L_B}_µ(M) in polynomial time. Furthermore, if (G, λ_G) defines a uniform input distribution, then V^{L_B}_⊘(M) = V^{L_B}_µ(M).^9

Proof. Due to Lemma 1, we construct a Markov chain G′, a reward function r′, and a priority function p′ such that V^{L_A, L_B}_µ(M) = V_{G′}(mp_{p′,r′})(s′_0). Since G′ is a Markov chain, we can compute W_{G′}(parity_{p′}) and V_{G′}(mean_{r′})(s′_0) in polynomial time [13, 30], and V_{G′}(mp_{p′,r′})(s′_0) = V_{G′}(mean_{r′})(s′_0) if s′_0 ∈ W_{G′}(parity_{p′}), and ⊥ otherwise. Note that the value V_{G′}(mean_{r′})(s′_0) is the sum over all states s of the reward at s (i.e., r′(s)) times the long-run average frequency of being in s (the Cesàro limit of being at s [30]). ⊓⊔

Example 5. Recall the two systems M1 and M2 (Figures 4 and 5, respectively) and the specification A (cf. Figure 3) that rewards quick responses. The two systems are equivalent with respect to the worst-case behavior. Let us consider the average behavior: we build a Markov chain G_⊘ that assigns probability 1/4 to all events in 2^{r1, r2}. To measure M1, we take the product between G_⊘ and M1 × A (shown in Figure 6). The product looks like the automaton in Figure 6 with an intermediate state for each edge. This state is labeled with the reward of the edge. All transitions leading to intermediate states have probability 1/2; the other ones have probability 1.
So the expectation of being in a state is the same for all four main states (i.e., 1/8), and half of it in the eight intermediate states (i.e., 1/16). Four (intermediate) states have a reward of 1, four have a reward of 2. So we get a total reward of 4 · 1/16 + 4 · 2 · 1/16 = 3/4, and a system value of 1.5. This is expected when looking at Figure 6, because each state has two inputs resulting in a reward of 2 and two inputs with reward 1. For System M2, we obtain a Markov chain similar to Figure 7, but now the probabilities of the transitions corresponding to the self-loops on the initial state sum up to 3/4. So it is more likely to stay in the initial state than to leave it. The expectations of being in the states (q0, q0, q0), (q1, q0, q1), and (q2, q1, q0) are 2/3, 2/9, and 1/9, respectively, and their expected rewards are (2 + 2 + 2 + 1)/4 = 7/4, 3/2, and 3/2, respectively. So the total reward of System M2 is 2/3 · 7/4 + 2/9 · 3/2 + 1/9 · 3/2 = 5/3 ≈ 1.67, which is clearly better than the value of System M1 for specification A.

4 Synthesizing Optimal Systems

In this section, we show how to construct a system that satisfies a qualitative specification and optimizes a quantitative specification under a given probabilistic environment.^9 First, we reduce the problem to finding an optimal strategy in an MDP for a mean-payoff (parity) objective. Then, we show how to compute such a strategy using end components and a reduction to the max objective. In this part, we also show how to decide if the given specification can be implemented by a finite-state system that is optimal. In case the specification does not permit such an implementation, we show how to construct, for every ε > 0, a finite-state system that is ε-optimal.

^9 We can show that this measure is invariant under transformations of the computation tree.
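The value 5/3 computed in Example 5 can be checked numerically. The three-state chain below, over the main product states with the intermediate states folded in, is an assumption reconstructed to be consistent with the expectations 2/3, 2/9, and 1/9 stated in the example; the state names A, B, C stand for (q0, q0, q0), (q1, q0, q1), and (q2, q1, q0).

```python
# Numerical check of Example 5. The transition probabilities below are a
# reconstruction consistent with the stated expectations (an assumption).

P = {
    "A": {"A": 0.75, "B": 0.25},            # self-loops on the initial state: 3/4
    "B": {"A": 0.25, "B": 0.25, "C": 0.5},
    "C": {"A": 1.0},
}
reward = {"A": 7 / 4, "B": 3 / 2, "C": 3 / 2}   # expected rewards per state

# Power iteration for the stationary distribution pi = pi * P.
pi = {s: 1 / 3 for s in P}
for _ in range(10_000):
    pi = {t: sum(pi[s] * P[s].get(t, 0.0) for s in P) for t in P}

value = sum(pi[s] * reward[s] for s in P)
print(round(pi["A"], 4), round(pi["B"], 4), round(pi["C"], 4))  # 0.6667 0.2222 0.1111
print(round(value, 4))  # 1.6667
```

The iteration converges to the stationary distribution (2/3, 2/9, 1/9), and the resulting long-run reward is 5/3, matching the example.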
At the end of the section, we provide a linear program that computes the value function of an MDP with max objective, which shows that the value function of an MDP with mean-payoff parity objective can be computed in polynomial time.

4.1 Reduction to MDPs with mean-payoff (parity) objectives

Lemma 2. Given a safety (resp. parity) automaton A, a mean-payoff automaton B, and a labeled Markov chain (G, λ_G) defining a probability measure µ on (Σ_I^ω, F), we can construct a labeled MDP (G′, λ_{G′}) with G′ = (S′, s′_0, E′, S′_1, S′_P, δ′), a reward function r′, and a priority function p′ such that every pure strategy π that is optimal from state s′_0 for the objective mean_{r′} (resp. mp_{p′,r′}) and for which E^π_{s′_0}(mean_{r′}) ≠ ⊥ (resp. E^π_{s′_0}(mp_{p′,r′}) ≠ ⊥) corresponds to a state machine M that satisfies L_A under µ and optimizes L_B under µ.

The construction of G′ is very similar to the construction used in Lemma 1. Intuitively, G′ alternates between mimicking a move of G and mimicking a move of A × B × C, where C is an automaton with |Σ_O| states that pushes the output labels from transitions to states, i.e., the transition function δ_C of C is the largest transition function s.t. ∀s, s′, σ, σ′: δ_C(s, σ) = δ_C(s′, σ′) → σ = σ′. Priorities p′ are again copied from A, and rewards r′ from B. The labels for λ_{G′} are either taken from λ_G (in intermediate states) or they correspond to the transitions taken in C. Every pure strategy in G′ fixes one output value for every possible input sequence. The construction of the state machine depends on the structure of the strategy. For pure memoryless strategies, the construction is straightforward. At the end of this section, we discuss how to deal with other strategies.
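For the pure memoryless case, the straightforward construction of a state machine from a strategy on G′ can be sketched as follows. The data layout is an illustrative assumption, not from the paper: `env_succ` maps (machine state, input letter) to the player-1 state reached after the probabilistic input move, and `strategy` maps each player-1 state to the (output letter, successor) it chooses.

```python
# Sketch: extracting a Mealy-style machine from a pure memoryless strategy
# on the alternating MDP G'. Names env_succ/strategy are illustrative.

def strategy_to_machine(env_succ, strategy, s0):
    trans = {}                       # (state, input) -> (output, next state)
    seen, frontier = {s0}, [s0]
    while frontier:
        s = frontier.pop()
        for (t, sigma), q in env_succ.items():
            if t != s:
                continue
            out, s2 = strategy[q]    # the strategy's fixed output choice
            trans[(s, sigma)] = (out, s2)
            if s2 not in seen:
                seen.add(s2)
                frontier.append(s2)
    return seen, trans

# Toy instance: a one-state machine that echoes its one-bit input.
env_succ = {("m0", 0): "q_in0", ("m0", 1): "q_in1"}
strategy = {"q_in0": (0, "m0"), "q_in1": (1, "m0")}
states, trans = strategy_to_machine(env_succ, strategy, "m0")
print(trans[("m0", 1)])  # (1, 'm0')
```

Because the strategy is memoryless, the machine's state space is just the set of reachable MDP states, which is why the memoryless case is straightforward.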
The following theorem follows from Lemma 2 and the fact that MDPs with mean-payoff objective have pure memoryless optimal strategies, which can be computed in polynomial time (cf. [30]).

Theorem 2. Given a safety automaton A, a mean-payoff automaton B, and a labeled Markov chain (G, λ_G) defining a probability measure µ, we can construct a finite-state machine M (if one exists) in polynomial time that satisfies^10 L_A under µ and optimizes L_B under µ.

^10 Recall that for safety specifications M |=_µ L_A and M |= L_A coincide.

4.2 MDPs with mean-payoff parity objectives

It follows from Lemma 2 that if the qualitative specification is a parity automaton, along with the Markov chain for the probabilistic input assumption and a mean-payoff automaton for the quantitative specification, then the solution reduces to solving MDPs with mean-payoff parity objective. In the following we provide an algorithmic solution of MDPs with mean-payoff parity objective. We first present a few basic results on MDPs.

End components in MDPs [22, 19] play a role equivalent to closed recurrent sets in Markov chains. Given an MDP G = (S, s0, E, S_1, S_P, δ), a set U ⊆ S of states is an end component [22, 19] if U is δ-closed (i.e., for all s ∈ U ∩ S_P we have E(s) ⊆ U) and the sub-game of G restricted to U (denoted G↾U) is strongly connected. We denote by E(G) the set of end components of an MDP G. The following lemma states that, given any strategy (memoryless or not), with probability 1 the set of states visited infinitely often along a play is an end component. This lemma allows us to derive conclusions on the (infinite) set of plays in an MDP by analyzing the (finite) set of end components in the MDP.

Lemma 3 ([22, 19]). Given an MDP G, for all states s ∈ S and all strategies π ∈ Π, we have µ^π_s({ω | Inf(ω) ∈ E(G)}) = 1.
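The two defining conditions of an end component, δ-closedness for probabilistic states and strong connectivity of the restriction, can be checked directly; the dict/set representation below is an illustrative assumption.

```python
# Sketch: checking whether U is an end component, per the definition above:
# (1) probabilistic states in U keep all their successors in U, and
# (2) the sub-game restricted to U is strongly connected.

def is_end_component(U, edges, prob_states):
    U = set(U)
    # (1) delta-closed for probabilistic states
    for s in U & set(prob_states):
        if any(t not in U for (u, t) in edges if u == s):
            return False
    # (2) strong connectivity: every state reaches every other state inside U
    # (plain DFS per state; fine for small examples, use Tarjan for big ones)
    adj = {s: [t for (u, t) in edges if u == s and t in U] for s in U}
    def reach(src):
        seen, stack = {src}, [src]
        while stack:
            for t in adj[stack.pop()]:
                if t not in seen:
                    seen.add(t)
                    stack.append(t)
        return seen
    return all(reach(s) == U for s in U)

edges = {("a", "b"), ("b", "a"), ("b", "c"), ("c", "c")}
print(is_end_component({"a", "b"}, edges, prob_states={"a"}))  # True
print(is_end_component({"a", "b"}, edges, prob_states={"b"}))  # False: b can leave U
```

In the second call the probabilistic state b has a successor c outside U, so δ-closedness fails, exactly as the definition requires.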
For an end component U ∈ E(G), consider the memoryless strategy π_U that plays in any state s ∈ U ∩ S_1 all edges in E(s) ∩ U uniformly at random. Given the strategy π_U, the end component U is a closed connected recurrent set in the Markov chain obtained by fixing π_U.

Lemma 4. Given an MDP G and an end component U ∈ E(G), the strategy π_U ensures that for all states s ∈ U, we have µ^{π_U}_s({ω | Inf(ω) = U}) = 1.

It follows that the strategy π_U ensures that from any starting state s, any other state t is reached in finite time with probability 1. From Lemma 4 we can conclude that in an MDP the value for mean-payoff parity objectives can be obtained by computing values for end components and then applying the maximal expectation to reach the values of the end components.

Lemma 5. Consider an MDP G with state space S, a priority function p, and a reward function r such that (a) G is an end component (i.e., S is an end component) and (b) the minimum priority in S is even. Then the value for the mean-payoff parity objective coincides with the value for the mean-payoff objective for all states, i.e., for all states s we have V_G(mp_{p,r})(s) = V_G(mean_r)(s).

Proof. We consider two pure memoryless strategies, one for the mean-payoff objective and one for reaching the minimum-priority states, and combine them to produce the value for the mean-payoff parity objective. Consider a pure memoryless optimal strategy π_m for the mean-payoff objective, and let π_S be a pure memoryless strategy for the stochastic shortest path to reach the states with the minimum priority (which is even).
Observe that under the strategy π_S we obtain a Markov chain such that every closed recurrent set in the Markov chain contains states with the minimum priority, and hence from any state s a state with the minimum priority (which is even) is reached in finite time with probability 1. The mean-payoff value for all states s ∈ S is the same: if we fix the memoryless strategy π_u that chooses all successors uniformly at random, then we get a Markov chain with the whole set S as a closed recurrent set, and hence from all states s ∈ S any state t ∈ S is reached in finite time with probability 1; therefore the mean-payoff value at s is at least the mean-payoff value at t. It follows that for all s, t ∈ S we have V(mean_r)(s) = V(mean_r)(t); let us denote the uniform value by v*. The strategy π_m is a pure memoryless strategy, and once it is fixed we obtain a Markov chain in which the limit of the average frequency of the states exists; since π_m is optimal, it follows that for all states s ∈ S we have

lim_{n→∞} (1/n) · Σ_{i=1}^n E^{π_m}_s[r(θ_i)] = v*,

where θ_i is the random variable for the i-th state of a path. Hence the strategy π_m ensures that for any ε > 0, there exists j(ε) ∈ N such that if π_m is played for any ℓ ≥ j(ε) steps, then the expected average of the rewards for ℓ steps is within ε of the mean-payoff value of the MDP, i.e., for all s ∈ S and all ℓ ≥ j(ε) we have

(1/ℓ) · Σ_{i=0}^ℓ E^{π_m}_s[r(θ_i)] ≥ v* − ε.

Let β be the maximum absolute value of the rewards. The optimal strategy for the mean-payoff parity objective is played in rounds, and the strategy for round i is as follows:

1. Stage 1. First play the strategy π_S until the minimum-priority state is reached.
2. Stage 2. Let ε_i = 1/i. If the game was in the first stage of this (i-th) round for k_i steps, then play the strategy π_m for ℓ_i steps such that ℓ_i ≥ max{j(ε_i), i · k_i · β}.
This ensures that the expected average of the rewards in round i is at least

ℓ_i · (v* − ε_i) / (k_i + ℓ_i) = ((ℓ_i + k_i) · v* − k_i · v* − ℓ_i · ε_i) / (k_i + ℓ_i)
  ≥ v* − (ℓ_i · ε_i + k_i · v*) / (ℓ_i + k_i)
  ≥ v* − (ℓ_i · ε_i + k_i · β) / (ℓ_i + k_i)   (since v* ≤ β)
  ≥ v* − (ℓ_i · ε_i + k_i · β) / ℓ_i = v* − ε_i − k_i · β / ℓ_i
  ≥ v* − 1/i − 1/i = v* − 2/i.

Then the strategy proceeds to round i + 1. The strategy ensures that there are infinitely many rounds, and hence the minimum priority that is visited infinitely often with probability 1 is the minimum priority of the end component (which is even). This ensures that the parity objective is satisfied with probability 1. The above strategy ensures that the value for the mean-payoff parity objective is lim inf_{i→∞} (v* − 2/i) = v*. This completes the proof. ⊓⊔

Lemma 5 shows that in an end component, if the minimum priority is even, then the values for the mean-payoff parity and mean-payoff objectives coincide if we consider the sub-game restricted to the end component. The strategy constructed in Lemma 5 requires infinite memory; in the following lemma we show that for all ε > 0, an ε-approximation can be achieved with finite-memory strategies.

Lemma 6. Consider an MDP G with state space S, a priority function p, and a reward function r such that (a) G is an end component (i.e., S is an end component) and (b) the minimum priority in S is even. Then for all ε > 0 there is a finite-memory strategy π_ε for which the mean-payoff parity objective value for all states is within ε of the value for the mean-payoff objective, i.e., for all states s we have E^{π_ε}_s[mp_{p,r}] ≥ V_G(mean_r)(s) − ε.

Proof. The proof of the result is similar to the proof of Lemma 5; the key difference is that the Stage 1 and Stage 2 strategies will each be played for a fixed number of steps per round, depending on ε > 0, and will not vary across rounds.
Fix ε > 0; we show how to construct a finite-memory strategy that achieves a 2·ε-approximation. As ε > 0 is arbitrary, the desired result follows. As in Lemma 5, we consider two pure memoryless strategies, one for the mean-payoff objective and one for reaching the minimum-priority states, and combine them to produce the approximation of the value for the mean-payoff parity objective. Consider a pure memoryless optimal strategy π_m for the mean-payoff objective, and let π_S be a pure memoryless strategy for the stochastic shortest path to reach the states with the minimum priority (which is even). As in Lemma 5, we observe that under the strategy π_S we obtain a Markov chain such that every closed recurrent set in the Markov chain contains states with the minimum priority, and hence from any state s a state with the minimum priority (which is even) is reached in finite time with probability 1. Let n be the number of states of the end component, and let η be the minimum positive transition probability in the end component. The strategy π_S ensures that from every state s there is a path to a minimum-even-priority state in the graph of the Markov chain, and the path is of length at most n. Hence the strategy π_S ensures that from all states s a minimum-priority state is reached within n steps with probability at least η^n (we will refer to this as Property 1 later in the proof). As shown in Lemma 5, the mean-payoff value for all states s ∈ S is the same: for all s, t ∈ S we have V(mean_r)(s) = V(mean_r)(t); let us denote the uniform value by v*.
The strategy π_m is a pure memoryless strategy, and once it is fixed we obtain a Markov chain in which the limit of the average frequency of the states exists; since π_m is optimal, it follows that for all states s ∈ S we have

lim_{n→∞} (1/n) · Σ_{i=1}^n E^{π_m}_s[r(θ_i)] = v*,

where θ_i is the random variable for the i-th state of a path. Hence the strategy π_m ensures that for any ε_1 > 0, there exists j(ε_1) ∈ N such that if π_m is played for any ℓ ≥ j(ε_1) steps, then the expected average of the rewards for ℓ steps is within ε_1 of the mean-payoff value of the MDP, i.e., for all s ∈ S and all ℓ ≥ j(ε_1) we have

(1/ℓ) · Σ_{i=0}^ℓ E^{π_m}_s[r(θ_i)] ≥ v* − ε_1.

Let β be the maximum absolute value of the rewards. The finite-memory 2·ε-optimal strategy for the mean-payoff parity objective is played in rounds, but in contrast to Lemma 5, the same strategy is played in every round. The strategy for a round is as follows:

1. Stage 1. First play the strategy π_S for n steps.
2. Stage 2. Play the strategy π_m for ℓ steps such that ℓ ≥ max{j(ε), (1/ε) · n · β}.

This ensures that the expected average of the rewards in a round is at least

ℓ · (v* − ε) / (n + ℓ) = ((ℓ + n) · v* − n · v* − ℓ · ε) / (n + ℓ)
  ≥ v* − (ℓ · ε + n · v*) / (ℓ + n)
  ≥ v* − (ℓ · ε + n · β) / (ℓ + n)   (since v* ≤ β)
  ≥ v* − (ℓ · ε + n · β) / ℓ = v* − ε − n · β / ℓ
  ≥ v* − ε − ε = v* − 2 · ε.

Then the strategy proceeds to the next round. The above strategy is a finite-memory strategy, as it only needs to remember the number n for the first stage and the number ℓ for the second stage. The above strategy ensures that the value for the mean-payoff objective is at least v* − 2·ε. To complete the proof that the strategy is a 2·ε-optimal strategy, we need to show that the parity objective is satisfied with probability 1. We call a round a success if a minimum even-priority state is visited.
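The round structure of this finite-memory strategy can be sketched as a counter-based controller that switches between the two memoryless policies; passing π_S and π_m as plain functions is an illustrative assumption.

```python
# Sketch of the finite-memory strategy of Lemma 6: a bounded counter cycles
# through Stage 1 (play pi_S for n steps) and Stage 2 (play pi_m for ell
# steps), with the same n and ell in every round.

class RoundStrategy:
    def __init__(self, pi_S, pi_m, n, ell):
        self.pi_S, self.pi_m = pi_S, pi_m    # memoryless policies: state -> action
        self.n, self.ell = n, ell
        self.counter = 0                     # position within the current round

    def act(self, state):
        in_stage_1 = self.counter < self.n
        self.counter = (self.counter + 1) % (self.n + self.ell)
        return (self.pi_S if in_stage_1 else self.pi_m)(state)

# Which policy is consulted at each step (here n = 2, ell = 3):
strat = RoundStrategy(pi_S=lambda s: "S", pi_m=lambda s: "m", n=2, ell=3)
print([strat.act(None) for _ in range(10)])
# ['S', 'S', 'm', 'm', 'm', 'S', 'S', 'm', 'm', 'm']
```

The counter is bounded by n + ℓ, which is exactly why the strategy needs only finite memory, in contrast to the growing rounds of Lemma 5.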
Hence we need to argue that with probability 1 there are infinitely many success rounds. Every round is a success with probability at least α = η^n > 0 (as by Property 1 the strategy π_S ensures that a minimum-priority state is visited with probability at least α within n steps). For round i, the probability that none of the k rounds after round i is a success is at most (1 − α)^k, and lim_{k→∞} (1 − α)^k = 0. Since the countable union of measure-zero events has measure zero, it follows that for any round i, the probability that there is no success round after round i is zero. It follows that the probability that there are infinitely many success rounds is 1, i.e., the parity objective is satisfied with probability 1. This completes the proof. ⊓⊔

In the following we show that if a system can achieve the optimal value with a pure finite-state strategy, then it can achieve the optimal value also with a pure memoryless strategy.

Lemma 7. Consider an MDP G = (S, s0, E, S_1, S_P, δ), a priority function p, and a reward function r such that (a) S is an end component and (b) the minimum priority in S is even. If there exists an optimal pure finite-state strategy π, then there exists an optimal pure memoryless strategy π′.

Proof. Let M be the Markov chain obtained by fixing the strategy in G to π, i.e., M is the synchronous product of G and a finite-state system describing π. Since the mean-payoff parity objective is prefix-independent and S is an end component (i.e., all states can reach each other with probability 1), all recurrence classes in M have the same mean-payoff parity value. Therefore we can construct a finite-state strategy π̂ such that the Markov chain M̂ obtained by fixing the strategy in G to π̂ has a single recurrence class. Let Ĉ be the single recurrence class of M̂, and let Ĉ|_G be the set of states of G that appear in Ĉ. We know that min(p(Ĉ|_G)) is even.
Let C_1, ..., C_k be the component recurrence classes that arise if we fix an optimal pure memoryless strategy for the mean-payoff objective in G restricted to Ĉ|_G. Since π̂ is an optimal strategy, Ĉ and its component recurrence classes C_1, ..., C_k have the same mean-payoff value. Otherwise, assume there exists some C_i that has a higher value; then an infinite-state strategy that alternates between playing a strategy that ensures C_i and a strategy to reach the minimal priority (cf. proof of Lemma 5) would achieve a higher mean-payoff parity value, which contradicts the assumption that π̂ is an optimal strategy. Similarly, if some C_i has a lower value, then removing C_i would again result in a better strategy. If there is a recurrence class C_i such that min(p(C_i)) is odd, then we can ignore C_i in Ĉ without changing the value. Finally, assume there are two component recurrence classes C_1 and C_2 such that min(p(C_1)) and min(p(C_2)) are even; then we can ignore one of them without changing the payoff value. From these properties, it follows that if there exists an optimal finite-state strategy π, then there exists a recurrence class C_i such that the minimal priority is even and the mean-payoff value is the same as the mean-payoff value of π. The desired pure memoryless strategy π′ enforces the recurrence class C_i by playing a strategy to stay within C_i for states in C_i, and for all states outside of C_i it plays a pure memoryless almost-sure winning strategy to reach C_i.

4.3 Algorithm based on linear programming

Computing best end-component values. We first compute a set S* such that every end component U with min(p(U)) even is a subset of S*.
We also compute a function f* : S* → R+ that assigns to every state s ∈ S* the value for the mean-payoff parity objective that can be obtained by visiting only states of an end component that contains s. The computation of S* and f* is as follows:

1. S*_0 is the set of maximal end components with priority 0, and for a state s ∈ S*_0 the function f* assigns the mean-payoff value when the sub-game is restricted to S*_0 (by Lemma 5 we know that if we restrict the game to the end components, then the mean-payoff values and mean-payoff parity values coincide);
2. for i ≥ 0, let S*_{2i} be the set of maximal end components consisting of states with priority 2i or more and containing at least one state with priority 2i; f* assigns the mean-payoff value of the MDP restricted to the set of end components S*_{2i}.

The set S* = ⋃_{i=0}^{⌊d/2⌋} S*_{2i}. This procedure gives the values under the end-component consideration. In the following, we show how to check whether an end component has a pure memoryless strategy that achieves the optimal value.

Checking an end component for a memoryless strategy. Let U ⊆ S* be a maximal end component with a minimal even priority, as computed in the previous section. Without loss of generality we assume that the MDP is bipartite, i.e., player-1 states and probabilistic states strictly alternate along every path. Let E_1 = E ∩ (S_1 × S_P) be the set of player-1 edges, i.e., the set of edges starting from a player-1 state. The mean-payoff value of an end component can be computed using the following linear program for MDPs with unichain strategies (cf. [42, 22]):

maximize Σ_{(s,t) ∈ E_1} x_(s,t) · (r(s) + r(t))   (1)

subject to

Σ_{(s,t) ∈ E_1} x_(s,t) = 1   (2)

∀s ∈ S_1: Σ_{t ∈ S_P, (s,t) ∈ E} x_(s,t) = Σ_{(s′,t′) ∈ E_1} x_(s′,t′) · δ(t′, s)   (3)

The program has one variable x_(s,t) for every outgoing edge of a player-1 state.
Intuitively, x_(s,t) represents the frequency of being in state s and choosing the edge to state t. Note that all states s, t such that x_(s,t) > 0 belong to a recurrence class. In order to check if there exists an optimal pure memoryless strategy in U, we call a modified version of the linear program above for every even priority d. In particular, we add the following additional constraints:

  ∀s ∈ S1, ∀t ∈ SP with (s,t) ∈ E:  x_(s,t) = 0 if p(s) < d or p(t) < d   (4)

This requires that in the resulting recurrence class no priority smaller than d is visited. To ensure that the resulting recurrence class includes at least one state with priority d, we add the following term to the objective function (Eqn. 1):

  Σ_{(s,t) ∈ E1 s.t. p(s)=d or p(t)=d} x_(s,t)   (5)

Finally, let v be the mean-payoff value for U obtained by solving the linear program with Eqn. 1 to 3. If there exists an even priority d such that the modified linear program (Eqn. 1 to 5) has a value strictly greater than v, then there exists a pure memoryless strategy in U that achieves the optimal value. If the value of the linear program is strictly greater than v, then there exists a witness priority d and a corresponding edge (s,t) ∈ E1 such that x_(s,t) in Eqn. 5 has a positive value. In order to compute the maximal reachability expectation we present the following reduction.

Transformation to MDPs with max objective. Given an MDP G = (S, s0, E, S1, SP, δ) with a positive reward function r: S → R+ and a priority function p: S → {0, ..., d}, let S* and f* be the output of the above procedure.
We construct an MDP Ḡ = (S̄, s̄0, Ē, S̄1, S̄P, δ̄) with a reward function r̄ as follows: S̄ = S ∪ Ŝ* (i.e., the set of states consists of the state space S and a copy Ŝ* of S*); Ē = E ∪ {(s, ŝ) | s ∈ S* ∩ S1 and ŝ is the copy of s in Ŝ*} ∪ {(ŝ, ŝ) | ŝ ∈ Ŝ*} (i.e., along with the edges E, every player-1 state s in S* has an edge to its copy ŝ in Ŝ*, and all states in Ŝ* are absorbing states); S̄1 = S1 ∪ Ŝ*; and r̄(s) = 0 for all s ∈ S and r̄(ŝ) = f*(s), where ŝ is the copy of s. We refer to this construction as the max conversion. The relationship between V_G(mpp,r) and V_Ḡ(max r̄) can be established as follows.

1. Consider a strategy π in G. If an end component U is visited infinitely often and min(p(U)) is odd, then the payoff is ⊥; and if min(p(U)) is even, then the maximal payoff achievable for the mean-payoff parity objective is upper bounded by the payoff of the mean-payoff objective (which is assigned by f*). It follows that for all s ∈ S we have V_G(mpp,r)(s) ≤ V_Ḡ(max r̄)(s).
2. Let π̄ be a pure memoryless optimal strategy for the objective max r̄ in Ḡ. We fix a strategy π in G as follows: if at a state s ∈ S* the strategy π̄ chooses the edge (s, ŝ), then in G, on reaching s, the strategy π plays according to the strategy of a winning end component that ensures the mean-payoff value (as shown in Lemma 5); otherwise π follows π̄. It follows that for all s ∈ S we have V_G(mpp,r)(s) ≥ V_Ḡ(max r̄)(s).

It follows that for all s ∈ S we have V_G(mpp,r)(s) = V_Ḡ(max r̄)(s). In order to solve Ḡ with the objective max r̄, we set up the following linear program and solve it with a standard LP solver (e.g., [33]).

Linear programming for the max objective in Ḡ. The following linear program characterizes the value function V_Ḡ(max r̄).
Observe that we have already restricted ourselves to the almost-sure winning states W_G(parity_p), and below we assume W_G(parity_p) = S. For all s ∈ S̄ we have a variable x_s, and the objective function is min Σ_{s ∈ S̄} x_s. The set of linear constraints is as follows:

  x_s ≥ 0                          ∀s ∈ S̄;
  x_s = r̄(s)                       ∀s ∈ Ŝ*;
  x_s ≥ x_t                        ∀s ∈ S̄1, (s,t) ∈ Ē;
  x_s = Σ_{t ∈ S̄} δ̄(s)(t) · x_t    ∀s ∈ S̄P.

The correctness proof of the above linear program characterizing the value function V_Ḡ(max r̄) follows by extending the result for reachability objectives [30]. The key property that can be used to prove the correctness of the above claim is as follows: if a pure memoryless optimal strategy is fixed, then from all states in S, the set Ŝ* of absorbing states is reached with probability 1. This property can be proved as follows: since r is a positive reward function, it follows that for all s ∈ S we have V_G(mpp,r)(s) > 0. Moreover, for all states s ∈ S we have V_Ḡ(max r̄)(s) = V_G(mpp,r)(s) > 0. Observe that for all s ∈ S we have r̄(s) = 0. Hence, if we fix a pure memoryless optimal strategy π in Ḡ, then in the Markov chain Ḡ_π there is no closed recurrent set C such that C ⊆ S. It follows that for all states s ∈ S, in the Markov chain Ḡ_π, the set Ŝ* is reached with probability 1. Using the above fact and the correctness of linear programming for reachability objectives, the correctness proof of the above linear program for the objective max r̄ in Ḡ can be obtained. This shows that the value function V_G(mpp,r) for MDPs with reward function r can be computed in polynomial time. We can search for a pure memoryless strategy that achieves the optimal value by slightly modifying the presented procedure. First, we check for each end component if a pure memoryless strategy with optimal value exists.
Then, in the transformation to the MDP with max objective, we create copy states only for states in end components that have optimal pure memoryless strategies. For all states for which the values obtained from the two different transformations to the MDP with max objective coincide, a pure memoryless strategy that achieves the optimal value exists. This gives us the following lemma.

Lemma 8. Given an MDP with a mean-payoff parity objective, the value function for the mean-payoff parity objective can be computed in polynomial time. We can decide in polynomial time if there exists a pure memoryless (or finite-state) strategy that achieves the optimal value.

Note that, in general, the optimal strategies constructed for mean-payoff parity objectives require memory, but the memory requirement is captured by a counter (which can be represented by a state machine with state space N). The optimal strategy as described in Lemma 5 plays two memoryless strategies, and each strategy is played a number of steps which can be stored in a counter. Using Lemma 6, we can fix the size of the counter for any ε > 0 and obtain a finite-state strategy that is ε-optimal. Lemma 7 and the procedure above allow us to check in polynomial time if there exists a pure memoryless strategy that achieves the optimal value. This result is quite surprising because the related problem of computing the optimal pure memoryless strategy, i.e., the strategy that is optimal with respect to all pure memoryless strategies, is NP-complete; the upper bound follows from Theorem 1 and the fact that emptiness of parity automata can be checked in polynomial time [36]; the lower bound follows from a reduction of the directed subgraph homeomorphism problem [31]. Lemma 2 and Lemma 8 yield the following theorem.

Theorem 3.
Given a Parity specification A, a Mean-payoff specification B, and a labeled Markov chain (G, λ) defining a probability measure µ on (Σ_I^ω, F), we can construct in polynomial time a state machine M (if one exists) that satisfies L_A under µ and optimizes L_B under µ. We can decide in polynomial time if M can be implemented by a finite-state machine. If M requires infinite memory, then for all ε > 0, we can construct a finite-state machine M′ that satisfies L_A under µ and optimizes L_B under µ within ε.

5 Experimental Results

In this section we illustrate which types of systems we can construct using qualitative and quantitative specifications under probabilistic environment assumptions. We have implemented the approach as part of Quasy, our quantitative synthesis tool [15]. Our tool takes qualitative and quantitative specifications and automatically constructs a system that satisfies the qualitative specification and optimizes the quantitative specification, if such a system exists. The user can choose between a system that satisfies and optimizes the specifications (a) under all possible environment behaviors or (b) under the most-likely environment behaviors given as a probability distribution on the possible input sequences. We are interested in the latter functionality, i.e., in systems that are optimal for the average-case behavior of the environment. In this case, a specification consists of (i) a safety or a parity automaton A, (ii) a mean-payoff automaton B, and (iii) an environment assumption µ, given as a set of probability distributions d_s over input letters for each state s of B. Our implementation first builds the product of A and B. Then, it constructs the corresponding MDP G. If A is a safety specification, our implementation computes an optimal pure memoryless strategy using policy iteration for multi-chain MDPs [30].
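The safety case just mentioned amounts to optimizing the long-run average reward of the induced Markov chain. As a rough illustration (not Quasy's actual implementation, which uses policy iteration for multi-chain MDPs [30]), the following sketch brute-forces all pure memoryless strategies of a hypothetical two-state MDP and scores each by the long-run average reward of the chain it induces:

```python
# Sketch: brute-force substitute for policy iteration on a tiny MDP.
# States, actions, rewards, and probabilities are hypothetical.
from itertools import product

r = {'s0': 1.0, 's1': 4.0}                      # state rewards
acts = {'s0': ['stay', 'go'], 's1': ['stay']}   # available actions
# delta[(state, action)] -> successor distribution
delta = {('s0', 'stay'): {'s0': 1.0},
         ('s0', 'go'):   {'s0': 0.5, 's1': 0.5},
         ('s1', 'stay'): {'s0': 0.2, 's1': 0.8}}

def long_run_average(policy, steps=20000):
    """Average reward of the chain induced by a memoryless policy,
    approximated by iterating the state distribution (unichain assumed)."""
    dist = {'s0': 1.0, 's1': 0.0}
    total = 0.0
    for _ in range(steps):
        total += sum(dist[s] * r[s] for s in dist)
        nxt = {s: 0.0 for s in dist}
        for s, p in dist.items():
            for t, q in delta[(s, policy[s])].items():
                nxt[t] += p * q
        dist = nxt
    return total / steps

states = list(acts)
best = max((dict(zip(states, choice)) for choice in product(*acts.values())),
           key=long_run_average)
print(best)  # the policy choosing 'go' in s0 wins (value 22/7 vs 1.0)
```

For realistic state spaces one would of course use policy iteration or the linear programs of Section 4.3 instead of enumeration; the sketch only shows what is being optimized.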
Finally, if the value of the strategy is different from ⊥, then it converts the strategy to a finite-state machine M which satisfies L_A (under µ) and is optimal for B under µ. In the case of parity specifications, we implemented the algorithm described in Section 4.2. Our implementation then produces two Mealy machines M1 and M2 as output: (i) M1 is optimal with respect to the mean-payoff objective and (ii) M2 almost-surely satisfies the parity objective. The actual system corresponds to a combination of the two Mealy machines that switches from one Mealy machine to the other based on inputs from the environment and a counter, as explained in Section 4.2. More precisely, if we use the strategy used in the proof of Lemma 5, we obtain an optimal but infinite-state system, because the size of the counter cannot be bounded. If we aim for a finite-state system, we can use the strategy suggested in the proof of Lemma 6, leading to a finite-state system with an ε-optimal value. Furthermore, Lemma 7 and the corresponding linear program in Section 4.3 allow us to check if there exists an optimal pure finite-state strategy. In this case, we can return a single Mealy machine.

5.1 Priority-driven Controller.

In our first experiment, we took as the quantitative specification B the product of the specifications A1 and A2 from Example 3 (Figure 3), where we sum the weights on the edges. The qualitative specification is a safety automaton A ensuring mutually exclusive grants. We assumed the constant probabilities P({r1 = 1}) = 0.4 and P({r2 = 1}) = 0.3 for the events r1 = 1 and r2 = 1, respectively. The optimal machine constructed by the tool is shown in Figure 8. Note that its behavior

[Figure: two-state Mealy machine with states q0 and q1; transitions are labeled with inputs over the requests r1, r2 and outputs over the grants g1, g2.]

Fig. 8.
Optimal Mealy machine for the 2-client specification without response constraints and the safety automaton G2

Table 1. Results for 2 to 7 clients without response constraints

  Clients   States in A×B   States in G   States in M   Value of M   Time in s
     2            4               13            2          1.854        0.50
     3            8               35            4          2.368        0.81
     4           16               97            8          2.520        1.64
     5           32              275           16          2.534        3.43
     6           64              793           32          2.534       15.89
     7          128             2315           64          2.534       34.28

does not depend on the state, i.e., states q0 and q1 are simulation equivalent and can be merged. Since our tool does not minimize state machines yet, we obtain a system with two states. This system behaves like a priority-driven scheduler. It always grants the resource to the client that is more likely to send requests, if she is requesting it. Otherwise, the resource is granted to the other client. Intuitively, this is optimal because Client 1 is more likely to send requests, and so missing a request from Client 2 is better than missing a request from Client 1.

5.2 Fair Controller.

In the second experiment, we added response constraints to the safety specification. The constraints are given as safety automata that require that every request is granted within two steps. We added one automaton C_i for each client i, and the final qualitative specification was A × C1 × C2. The optimal machine the tool constructs is System M2 of Example 3 (Figure 5). System M2 follows the request sent, if only a single request is sent. If both clients request simultaneously, it alternates between g1 and g2. If none of the clients is requesting, it grants g1. Recall that systems M1 and M2 from Example 3 exhibit the same worst-case behavior, so a synthesis approach based on optimizing the worst-case behavior would not be able to construct M2.

Table 2.
Results for 2 to 4 clients with response constraints

  Clients   States in A×B   States in G   States in M   Value of M
     2            3               11            3          1.850
     3           34              156           16          2.329
     4          125              557          125          2.366

5.3 General Controllers.

We reran both experiments for several clients. Again, the quantitative specification was the product of the A_i's. We used a skewed probability distribution with P({r_n = 1}) = 0.3 and P({r_i = 1}) = P({r_{i+1} = 1}) + 0.1 for 1 ≤ i ≤ 6, and the qualitative specification required mutual exclusion. Table 1 shows in the first three columns the number of clients, the size of the specification (A × B), and the size of the corresponding MDP (G). Columns 4 and 5 show the size and the value of the resulting machine (M), respectively. The last column shows the time needed to construct the system. The runs took between half a second and half a minute. The systems generated in this experiment have an intrinsic priority: they grant requests in order of probabilities, from largest to smallest. Table 2 shows the results when adding response constraints that require that every request has to be granted within the next n steps, where n is the number of clients. This experiment leads to quite intelligent systems which prioritize the most probable input request, but slowly shift the priority cyclically to the next request variable, resulting in servicing any request within n steps when there are n clients. Note that these systems are (as expected) quite a bit larger than the corresponding priority-driven controllers.

6 Conclusions and Future Work

In this paper we showed how to measure and synthesize systems under probabilistic environment assumptions with respect to qualitative and quantitative specifications. We considered the satisfaction of the qualitative specification with probability 1 (M |=_µ ϕ).
Alternatively, we could have considered the satisfaction of the qualitative specification with certainty (M |= ϕ). For safety specifications the two notions coincide; however, they are different for parity specifications. The notion of satisfying the parity specification with certainty while optimizing the mean-payoff specification can be handled similarly to the solution of mean-payoff parity games [12], by replacing the solution of mean-payoff games with the solution of MDPs with mean-payoff objectives. However, since solving MDPs with parity specifications for certainty is equivalent to solving two-player parity games, and no polynomial-time algorithm is known for parity games, the algorithmic solution for the satisfaction of the qualitative specification with certainty is computationally expensive compared to the polynomial-time algorithm for MDPs with mean-payoff parity objectives. Moreover, under a probabilistic assumption, satisfaction with probability 1 is the natural notion.

We have implemented our algorithm in the tool Quasy, a quantitative synthesis tool for constructing worst-case and average-case optimal systems with respect to a qualitative and a quantitative specification. We can check if an optimal finite-state system exists and then construct either an optimal or an ε-optimal system, depending on the outcome of the check. In our future work, we will explore different directions to improve the performance of Quasy. In particular, a recent paper by Wimmer et al. [45] presents an efficient technique for solving MDPs with mean-payoff objectives based on combining symbolic and explicit computation. We will investigate if symbolic and explicit computations can be combined for MDPs with mean-payoff parity objectives as well.

References

1. L. de Alfaro. Temporal logics for the specification of performance and reliability.
In STACS'97, pages 165–176, London, UK, 1997. Springer-Verlag.
2. R. Alur, A. Degorre, O. Maler, and G. Weiss. On omega-languages defined by mean-payoff conditions. In FOSSACS, Lecture Notes in Computer Science, pages 333–347. Springer, 2009.
3. C. Baier, M. Größer, M. Leucker, B. Bollig, and F. Ciesinski. Controller synthesis for probabilistic systems. In IFIP TCS, pages 493–506, 2004.
4. C. Baier and J.-P. Katoen. Principles of Model Checking (Representation and Mind Series). The MIT Press, 2008.
5. A. Bianco and L. de Alfaro. Model checking of probabilistic and nondeterministic systems. In FSTTCS 95, pages 499–513. Springer-Verlag, 1995.
6. R. Bloem, K. Chatterjee, T.A. Henzinger, and B. Jobstmann. Better quality in synthesis through quantitative objectives. In CAV. Springer, 2009.
7. R. Bloem, K. Greimel, T.A. Henzinger, and B. Jobstmann. Synthesizing robust systems. In FMCAD'09, 2009.
8. A. Chakrabarti, K. Chatterjee, T.A. Henzinger, O. Kupferman, and R. Majumdar. Verifying quantitative properties using bound functions. In CHARME. Springer, 2005.
9. A. Chakrabarti, L. de Alfaro, T.A. Henzinger, and M. Stoelinga. Resource interfaces. In EMSOFT, LNCS 2855, pages 117–133. Springer, 2003.
10. K. Chatterjee, L. de Alfaro, M. Faella, T.A. Henzinger, R. Majumdar, and M. Stoelinga. Compositional quantitative reasoning. In QEST, pages 179–188. IEEE Computer Society Press, 2006.
11. K. Chatterjee, L. Doyen, and T.A. Henzinger. Quantitative languages. In Computer Science Logic (CSL), pages 385–400, 2008.
12. K. Chatterjee, T.A. Henzinger, and M. Jurdziński. Mean-payoff parity games. In LICS, pages 178–187, 2005.
13. K. Chatterjee, M. Jurdziński, and T.A. Henzinger. Simple stochastic parity games. In CSL'03, volume 2803 of LNCS, pages 100–113. Springer, 2003.
14. K. Chatterjee, M. Jurdziński, and T.A. Henzinger. Quantitative stochastic parity games. In SODA'04, pages 121–130.
SIAM, 2004.
15. K. Chatterjee, T.A. Henzinger, B. Jobstmann, and R. Singh. Quasy: Quantitative synthesis tool. In TACAS, 2011. Accepted for publication.
16. K. Chatterjee, T.A. Henzinger, B. Jobstmann, and R. Singh. Measuring and synthesizing systems in probabilistic environments. In CAV, pages 380–395, 2010.
17. A. Church. Logic, arithmetic and automata. In Proceedings International Mathematical Congress, 1962.
18. E.M. Clarke and E.A. Emerson. Design and synthesis of synchronization skeletons using branching-time temporal logic. In Logic of Programs, pages 52–71, 1981.
19. C. Courcoubetis and M. Yannakakis. Markov decision processes and regular events. In ICALP 90, volume 443 of Lecture Notes in Computer Science, pages 336–349. Springer-Verlag, 1990.
20. P. Cousot and R. Cousot. Abstract interpretation: A unified lattice model for static analysis of programs by construction or approximation of fixpoints. In POPL, pages 238–252, 1977.
21. R.A. Cuninghame-Green. Minimax algebra. In Lecture Notes in Economics and Mathematical Systems, volume 166. Springer-Verlag, 1979.
22. L. de Alfaro. Formal Verification of Probabilistic Systems. PhD thesis, Stanford University, 1997.
23. L. de Alfaro. Stochastic transition systems. In CONCUR 98, pages 423–438, 1998.
24. L. de Alfaro, T.A. Henzinger, and R. Majumdar. Discounting the future in systems theory. In ICALP'03, pages 1022–1037, 2003.
25. L. de Alfaro, R. Majumdar, V. Raman, and M. Stoelinga. Game relations and metrics. In LICS, pages 99–108. IEEE Computer Society Press, 2007.
26. J. Desharnais, V. Gupta, R. Jagadeesan, and P. Panangaden. Metrics for labelled Markov systems. In CONCUR 99: Concurrency Theory, pages 258–273. Springer, 1999.
27. M. Droste and P. Gastin. Weighted automata and weighted logics. Theoretical Computer Science, 380:69–86, 2007.
28.
M. Droste, W. Kuich, and G. Rahonis. Multi-valued MSO logics over words and trees. Fundamenta Informaticae, 84:305–327, 2008.
29. M. Droste, W. Kuich, and H. Vogler. Handbook of Weighted Automata. Springer Publishing Company, Incorporated, 2009.
30. J. Filar and K. Vrieze. Competitive Markov Decision Processes. Springer-Verlag, 1996.
31. S. Fortune, J.E. Hopcroft, and J. Wyllie. The directed subgraph homeomorphism problem. Theor. Comput. Sci., pages 111–121, 1980.
32. S. Gaubert. Methods and applications of (max, +) linear algebra. In STACS'97, pages 261–282. Springer-Verlag, 1997.
33. GLPK (GNU Linear Programming Kit). http://www.gnu.org/software/glpk/.
34. B.R. Haverkort. Performance of Computer Communication Systems: A Model-Based Approach. John Wiley & Sons, Inc., New York, NY, USA, 1998.
35. G. Katz and D. Peled. Code mutation in verification and automatic code correction. In TACAS 2010, 2010. To appear.
36. V. King, O. Kupferman, and M.Y. Vardi. On the complexity of parity word automata. In Foundations of Software Science and Computation Structures, pages 276–286, 2001.
37. O. Kupferman and Y. Lustig. Lattice automata. In VMCAI, LNCS 4349, pages 199–213. Springer, 2007.
38. M. Kwiatkowska, G. Norman, and D. Parker. PRISM: Probabilistic model checking for performance and reliability analysis. ACM SIGMETRICS Perform. Evaluation Review, 2009.
39. P. Niebert, D. Peled, and A. Pnueli. Discriminative model checking. In CAV, 2008.
40. R. Parr and S. Russell. Reinforcement learning with hierarchies of machines. In Advances in Neural Information Processing Systems 10, pages 1043–1049. MIT Press, 1997.
41. A. Pnueli and R. Rosner. On the synthesis of a reactive module. In Proc. Symposium on Principles of Programming Languages (POPL'89), pages 179–190, 1989.
42. M.L. Puterman. Markov Decision Processes. John Wiley and Sons, 1994.
43. J.-P. Queille and J. Sifakis. Specification and verification of concurrent systems in CESAR. In Symposium on Programming, pages 337–351, 1982.
44. P.J.G. Ramadge and W.M. Wonham. The control of discrete event systems. Proceedings of the IEEE, 77:81–98, 1989.
45. R. Wimmer, B. Braitling, B. Becker, E.M. Hahn, P. Crouzen, H. Hermanns, A. Dhama, and O. Theel. Symblicit calculation of long-run averages for concurrent probabilistic systems. In QEST, 2010.