Optimal minimax strategy in a dice game


Authors: Fabian Crocce, Ernesto Mordecki

October 26, 2018

Abstract. Each of two players, by turns, rolls a die several times, accumulating the successive scores until he decides to stop, or he rolls an ace. When stopping, the accumulated turn score is added to the player's account and the die is given to his opponent. If he rolls an ace, the die is given to the opponent without adding any point. In this paper we formulate this game in the framework of competitive Markov decision processes (also known as stochastic games), show that the game has a value, provide an algorithm to compute the optimal minimax strategy, and present results of this algorithm in three different variants of the game.

Keywords: Competitive Markov processes, Stochastic games, dice games, minimax strategy.

AMS MSC: 60J10, 60G40, 91A15

1 Introduction

Consider a two-player dice game in which players accumulate points by turns with the following rules. The player who reaches a certain fixed number of points wins the game. In his turn, each player rolls the die several times until deciding to stop or rolling an ace. If he decides to stop, the accumulated successive scores are added to his account; if he rolls an ace, no additional points are obtained. As a first approach to finding optimal strategies for this game, Roters [5] studied the optimal stopping problem corresponding to the maximisation of the expected score in one turn. The optimal solution is a good way of minimising the number of turns required to reach the objective. Later, Haigh & Roters [3] found the strategy that minimises the expected number of turns required to reach the target. This second strategy improves on the one obtained in [5], but neither of them takes into account the number of points of the opponent, which is clearly relevant in order to win the game.
Affiliations: Facultad de Ciencias, Centro de Matemática. Iguá 4225, CP 11400 Montevideo. E-mail: fabian@cmat.edu.uy (F. Crocce), mordecki@cmat.edu.uy (E. Mordecki).

In this paper we formulate this game in the framework of competitive Markov decision processes (also known as stochastic games), show that the game has a value, provide an algorithm to compute the optimal minimax strategy, and present results of this algorithm in three different variants of the game. The concept of "stochastic games" was introduced in 1953 by Shapley in [6]. In the book by Filar and Vrieze [2], the authors provide a general and modern comprehensive approach to this theory, departing from the theory of controlled Markov processes (which can be considered "solitaire" stochastic games), and call the type of games we are interested in competitive Markov decision processes, a denomination that we find more accurate than the more usual denomination stochastic game. During the preparation of this paper we found the related article by Neller and Presser [4], in which the authors, following a heuristic approach, formulate the Bellman equation of the problem (which is a consequence of our results) and compute the optimal strategy for a variant of this game. It must be noted that the theory of Filar and Vrieze [2], which we follow, provides the solution of the problem in the set of all possible strategies, including non-stationary and randomised strategies, i.e. the set of behaviour strategies.

In Section 2 we present the theory of competitive Markov decision processes, especially in the transient case, and we conclude the section with the formulation of the theorem we need to solve our dice game. A proof of this theorem can essentially be found in [2].
In Section 3 we determine the state space of our game, the corresponding action spaces for each player, the payoff function of the game, and the Markov transitions depending on each state of the process and the actions of the players. In Section 4 we present two related games: in the first, in order to win, the player has to reach the target exactly (if the target is exceeded, he gives the die to his opponent without changing his accumulated score); in the second variant the players aim to maximise the difference between their scores. In Section 5 we present the conclusions.

2 Competitive Markov decision processes

A competitive Markov decision process (also known as a stochastic game) is the mathematical model of a sequential game in which two players take actions considering the state of a certain Markov process. Both actions determine an immediate payoff for each player and the probability distribution of the following state of the game. Our interest is centred on two-player, finite-state, finite-action, zero-sum games. To define them formally we need the following ingredients:

(S) States: a finite set S of the possible states of the game.

(A) Actions: for each state s ∈ S we consider finite sets A_s and B_s whose elements are the possible actions for the players; at each step both players take their actions simultaneously and independently.

(P) Payoffs: for each state s ∈ S a function r_s : A_s × B_s → R determines the amount that player two has to pay to player one, depending on the actions taken by both players.

(TP) Transition probabilities: for each (s, a, b) ∈ S × A_s × B_s, P_{s,a,b} is a probability distribution on S, which determines the following state of the game. We denote by S_t the state of the game at time t = 0, 1, . . .; the initial state is fixed (S_0 = s_0).
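For a fixed state s, the payoff r_s is an |A_s| × |B_s| matrix, so each state carries a zero-sum matrix game. As a small illustration of ours (not taken from the paper), the minimax value of such a matrix game admits a short closed form in the 2 × 2 case:

```python
def matrix_game_value_2x2(M):
    """Minimax value of a 2x2 zero-sum matrix game M = [[a, b], [c, d]].

    The row player maximises and the column player minimises, matching the
    payoff convention above (player two pays player one).
    """
    (a, b), (c, d) = M
    maximin = max(min(a, b), min(c, d))   # best guaranteed row payoff
    minimax = min(max(a, c), max(b, d))   # best guaranteed column payoff
    if maximin == minimax:                # a pure saddle point exists
        return maximin
    # Otherwise both players mix, and the standard closed form applies.
    return (a * d - b * c) / (a + d - b - c)

print(matrix_game_value_2x2([[1, -1], [-1, 1]]))  # matching pennies: 0.0
```

In the dice game studied below, every state gives at most one of the players a real choice, so each matrix degenerates to a single row or column and its value reduces to a plain maximum or minimum.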
At each step t, players choose actions A_t and B_t in A_{S_t} and B_{S_t} respectively, which determine that player two has to pay to player one an amount r_{S_t}(A_t, B_t), and the probability distribution of S_{t+1} will be P_{S_t, A_t, B_t}. The random variable

W = Σ_{t=0}^∞ r_{S_t}(A_t, B_t)

is the total amount that player two pays to player one (it could be negative). Note that W depends on the way in which the players take their actions. The objective of player one is to maximise the expected value V of the accumulated payoff W, while player two has the objective of minimising it. In principle V could be infinite. Often, in economic applications, the accumulated payoff is W_β = Σ_{t=0}^∞ β^t r_{S_t}(A_t, B_t), called "the discounted sum", where 0 < β < 1 represents the devaluation of money. This discount factor β ensures that W_β is finite with probability one and that its expected value exists. In the case of transient stochastic games, the situation considered in this paper, the sum defining W is finite a.s. due to the fact that the process always reaches a final state s_f ∈ S in a (not necessarily bounded) finite number of steps. In the definition of a transient stochastic game, additional conditions are required in order for V to be finite. Before the formal definition of transient stochastic games, we introduce the concept of behaviour strategy.

2.1 Strategies

Consider the set K defined by

K = {(s, a, b) : s ∈ S, a ∈ A_s, b ∈ B_s}.

We define, for each t = 0, 1, . . ., the sets H_t of admissible histories up to time t by

H_t = S if t = 0, and H_t = K × · · · × K (t times) × S otherwise.

Definition 1 (Behaviour strategy). Given a stochastic game, a behaviour strategy for player one (two) in the game is a function π which associates to each history h = (s_0, a_0, b_0, . . . , s) ∈ ∪_{t=0}^∞ H_t a probability distribution π(·|h) on A_s (respectively ϕ(·|h) on B_s). In the context of a stochastic game, we denote by Π (Φ) the set of all behaviour strategies for player one (two).

Note that the previous definition is in agreement with the intuitive idea that a player can choose his action based on the history of the game. There are two relevant subclasses of strategies, pure and stationary, introduced below.

Definition 2 (Pure strategy). A behaviour strategy π is said to be pure if for each history h there exists an action a_h such that π(a_h|h) = 1. We could say that a pure strategy chooses the action to be taken in a deterministic way.

Definition 3 (Stationary strategy). A behaviour strategy π is said to be stationary if the probability distribution π(·|h) depends only on s, the last state of the history. In this case we use the notation π(·|s).

2.2 Probabilistic framework

We now construct the probability space in which the optimisation procedure takes place. Consider the product space Ω = (S × ∪_{s∈S} A_s × ∪_{s∈S} B_s)^N equipped with the product σ-algebra F, defined as the minimal σ-algebra containing the cylinder sets of Ω. Given ω = (s_0, a_0, b_0, s_1, a_1, b_1, . . .) ∈ Ω, a sequence of states and actions in the product space, the coordinate processes {S_t}_{t=0,1,...}, {A_t}_{t=0,1,...}, {B_t}_{t=0,1,...} are defined by S_t(ω) = s_t, A_t(ω) = a_t, B_t(ω) = b_t. In this framework, given behaviour strategies π, ϕ for players one and two and an initial state s ∈ S, it is possible to introduce a probability P_{s,π,ϕ} such that, for the random vector H_t = (S_0, A_0, B_0, S_1, A_1, B_1, . . . , S_t) and the finite sequence of states and actions h_t = (s_0, a_0, b_0, . . . , s_t), the following assertions hold:

• the game starts in the state s, i.e., P_{s,π,ϕ}(S_0 = s) = 1;

• with probability 1, the random vector H_t takes its values in the set H_t of admissible histories;

• the probability distribution of the actions chosen by the players at time t depends on H_t, according to

P_{s,π,ϕ}(A_t = a_t | H_t = h_t) = π(a_t|h_t),
P_{s,π,ϕ}(B_t = b_t | H_t = h_t) = ϕ(b_t|h_t),
P_{s,π,ϕ}(A_t = a_t, B_t = b_t | H_t = h_t) = π(a_t|h_t) ϕ(b_t|h_t);

• the distribution of S_{t+1} depends only on S_t, A_t, B_t, through the transition probabilities (TP) of the game:

P_{s,π,ϕ}(S_{t+1} = s_{t+1} | H_t = h_t, A_t = a_t, B_t = b_t) = P_{s_t,a_t,b_t}(s_{t+1}).

We denote by E_{s,π,ϕ} the expected value in the probability space (Ω, F, P_{s,π,ϕ}).

2.3 Transient stochastic games

Definition 4 (Transient stochastic game). A stochastic game is transient when there exists a final state s_f ∈ S such that:

(1) r_{s_f}(a, b) = 0 for all a ∈ A_{s_f}, b ∈ B_{s_f};

(2) P_{s_f,a,b}(s_f) = 1 for all a ∈ A_{s_f}, b ∈ B_{s_f};

(3) for every pair of strategies (π, ϕ) of players one and two respectively, and for every initial state s,

Σ_{t=0}^∞ P_{s,π,ϕ}(S_t ≠ s_f) < ∞.

Conditions (1) and (2) ensure that, once the game falls into the final state s_f, it never changes state again and the gain of both players is zero. The third condition ensures that the game finishes with probability one.

Definition 5 (Value of a pair of strategies). Given strategies π ∈ Π, ϕ ∈ Φ for players one and two in a transient stochastic game, the value of the strategies is the function V_{π,ϕ} : S → R defined by

V_{π,ϕ}(s) = Σ_{t=0}^∞ E_{s,π,ϕ}(r_{S_t}(A_t, B_t)).

Definition 6 (Optimal strategy). A behaviour strategy π* for player one in a transient stochastic game is said to be optimal if

inf_{ϕ∈Φ} V_{π*,ϕ}(s) = sup_{π∈Π} inf_{ϕ∈Φ} V_{π,ϕ}(s) for all s ∈ S.

Analogously, a behaviour strategy ϕ* for player two is said to be optimal if

sup_{π∈Π} V_{π,ϕ*}(s) = inf_{ϕ∈Φ} sup_{π∈Π} V_{π,ϕ}(s) for all s ∈ S.

We now formulate the result used to solve our dice game.

Theorem 2.1 (Value and optimal strategies). Given a transient stochastic game, the following identity holds:

sup_π inf_ϕ V_{π,ϕ}(s) = inf_ϕ sup_π V_{π,ϕ}(s) for all s ∈ S.    (2.1)

The vector defined in (2.1), denoted by (v(s))_{s∈S}, is called the value of the game. This value is the unique solution of the system

x(s) = [ r_s(a, b) + Σ_{s'∈S} P_{s,a,b}(s') x(s') ]*_{a∈A_s, b∈B_s},

where [·]* represents the value of the matrix game (in the minimax sense) obtained by considering rows a ∈ A_s and columns b ∈ B_s. Moreover, the stationary strategies π and ϕ for players one and two such that π(·|s) and ϕ(·|s) are optimal strategies of the matrix game

[ r_s(a, b) + Σ_{s'∈S} P_{s,a,b}(s') v(s') ]_{a∈A_s, b∈B_s}

for every state s ∈ S are optimal strategies in the transient stochastic game.

Proof. This theorem is essentially a particular case of Theorem 4.2.6 in [2]. A detailed proof can be found in [1].

Remark 2.1. The previous theorem ensures the existence of optimal strategies for both players; in particular, they lie in the subclass of stationary strategies. The proof of this theorem considers the map U defined by

U v(s) = [ r_s(a, b) + Σ_{s'∈S} P_{s,a,b}(s') v(s') ]*_{a∈A_s, b∈B_s},    (2.2)

which is an n-step contraction. We use the map U to implement a numerical method that finds its unique fixed point, which is the value of the game.

3 The dice game

In this section we describe the states, actions, payoffs, and transition probabilities (defined in Section 2) corresponding to our dice game, and present the numerical results, showing the optimal strategy for a player.
This strategy, optimal in the class of behaviour strategies, ensures that a player wins with probability at least 1/2, independently of the opponent's strategy. The optimal strategy is pure and stationary, and consists of a simple rule indicating whether to roll or to stop, depending on the scores of the player and his opponent.

3.1 Modelling the dice game

To solve the dice game (compute optimal strategies), we model it as a transient stochastic game. We have to specify the set of states, the possible actions, the payoffs and the transition probabilities.

(S) States: during the dice game there are four varying quantities: the player j who has the die (j = 1, 2), the accumulated score α of player one, the accumulated score β of player two, and the turn score τ of player j. So we consider states (j, α, β, τ). We also need to consider two special states: an initial state s_0 and a final state s_f.

Table 1: Possible actions for each player depending on the state of the game.

state                              | player one  | player two
s_0, s_f                           | wait        | wait
(1, α, β, 0)                       | roll        | wait
(1, α, β, τ), 0 < τ < 200 − α      | roll, stop  | wait
(1, α, β, τ), τ ≥ 200 − α          | stop        | wait
(2, α, β, 0)                       | wait        | roll
(2, α, β, τ), 0 < τ < 200 − β      | wait        | roll, stop
(2, α, β, τ), τ ≥ 200 − β          | wait        | stop

If the score of either of the players is greater than or equal to 200, the game is over, and hence it is in the state s_f. Because of that, the states (j, α, β, τ) only make sense if α < 200 and β < 200. The same holds if τ is large enough to reach 200 by stopping. So the finite set S of possible states is

S = {s_0, s_f} ∪ S_1 ∪ S_2,

where S_1 is the set of states of player one,

S_1 = {(1, α, β, τ) : 0 ≤ α ≤ 199, 0 ≤ β ≤ 199, 0 ≤ τ ≤ 205 − α},

and S_2 is the set of states of player two,

S_2 = {(2, α, β, τ) : 0 ≤ α ≤ 199, 0 ≤ β ≤ 199, 0 ≤ τ ≤ 205 − β}.
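As a sanity check (ours, not in the paper), the size of S_1 follows directly from these constraints:

```python
TARGET = 200

# Count the states (1, alpha, beta, tau) in S1:
# 0 <= alpha <= 199, 0 <= beta <= 199, 0 <= tau <= 205 - alpha.
count_S1 = sum(TARGET * (205 - alpha + 1) for alpha in range(TARGET))
print(count_S1)  # 4260000; S2 has the same size by symmetry
```

So there are about 4.26 million states per player (plus the two special states), of the same order as the "about 4 000 000" decision states cited in Section 3.2.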
(A) Actions: we have to specify the set of actions per state for each player. The possible actions in this game are to roll and to stop. We add an extra action, to wait, which represents that it is not the player's turn. Some constraints are needed to ensure the transience of the stochastic game: in the states (1, α, β, 0) with α < 200 it does not make sense for player one to take the action to stop, because there is nothing to lose. If our model permitted taking the action to stop with 0 points in the turn (τ = 0), it is easy to see that there would exist a pair of strategies making the game infinite. The same happens if the action to roll is possible when stopping is enough to win. Table 1 shows the set of possible actions per state.

(P) Payoffs: because we want to maximise the probability of winning, we define the payoff function in such a way that maximising the probability of winning is equivalent to maximising the expected value V of the payoffs accumulated along the game. The model of transient stochastic games allows us to define a payoff for each pair (state, action), but in this case it is enough to define the payoff depending only on the state, as follows:

r_s = 1 if s = (1, α, β, τ) with α + τ ≥ 200, and r_s = 0 otherwise.

(TP) Transition probabilities: to represent the transition probabilities graphically, we draw each state as a node labelled with the two accumulated scores and the turn score. The dynamics of the game and the semantics of the states determine the transition probabilities between states. Figure 1 presents the transition probabilities from a state (1, α, β, τ) with α + τ < 200, depending on the player's decision.
[Figure 1: Transition probabilities from a state (1, α, β, τ) with α + τ < 200, depending on the player's decision. Rolling leads with probability 1/6 to (2, α, β, 0) (an ace) and with probability 1/6 each to (1, α, β, τ + k) for k = 2, . . . , 6; stopping leads with probability 1 to (2, α + τ, β, 0).]

Figure 1 shows that, when the decision is to roll, the probability distribution over the states is associated with the results of rolling a die; in particular, the probability of losing the turn is 1/6. In the winning states of player one (i.e. (1, α, β, τ) with α + τ ≥ 200) the transition is, with probability one, to the final state s_f. Transitions for player two are completely symmetric. As shown in Figure 2, in the special states s_0 and s_f the transitions do not depend on the actions taken by the players; indeed, they have no options there.

[Figure 2: Transition probabilities from the initial and final states. From s_0 the game moves with probability 1/2 to (1, 0, 0, 0) and with probability 1/2 to (2, 0, 0, 0); the final state s_f is absorbing.]

We now verify that the stochastic game defined above is transient, i.e. that s_f satisfies conditions (1), (2) and (3) of Definition 4. Conditions (1) and (2) are trivially fulfilled; it remains to verify that

Σ_{t=0}^∞ P_{s,π,ϕ}(S_t ≠ s_f) < ∞

for every initial state s and every pair of strategies π, ϕ. Due to the fact that at the beginning of a turn the only option is to roll, and that in a state in which the accumulated score is enough to win the player has to stop, the game cannot continue indefinitely. For example, if a 6 is rolled 70 times consecutively, it is impossible to avoid reaching the final state. Denoting by γ the probability of rolling a 6 seventy times, i.e. γ = (1/6)^70 > 0, it is easy to see that γ is a lower bound for the probability of S_t = s_f for t ≥ 70. By a similar argument it follows that, for n = 0, 1, . . .,

P(S_t ≠ s_f) < (1 − γ)^n if 70n ≤ t < 70(n + 1).

Then

Σ_{t=0}^∞ P_{s,π,ϕ}(S_t ≠ s_f) < 70 Σ_{n=0}^∞ (1 − γ)^n < ∞,

and our model is transient.

3.2 Numerical results

In this section the result of Theorem 2.1 is applied to the particular case of the transient stochastic game defined above. Rewriting the definition of the map U defined in equation (2.2), we obtain:

U v(s) =
  (1/2) v(1, 0, 0, 0) + (1/2) v(2, 0, 0, 0)        if s = s_0,
  v(1, α, β, 0)_roll                               if s = (1, α, β, 0),
  max{v(1, α, β, τ)_stop, v(1, α, β, τ)_roll}      if s = (1, α, β, τ) with α + τ < 200,
  1                                                if s = (1, α, β, τ) with α + τ ≥ 200,
  v(2, α, β, 0)_roll                               if s = (2, α, β, 0),
  min{v(2, α, β, τ)_stop, v(2, α, β, τ)_roll}      if s = (2, α, β, τ) with β + τ < 200,
  0                                                otherwise,

where

v(1, α, β, τ)_roll = (1/6) v(2, α, β, 0) + (1/6) Σ_{k=2}^6 v(1, α, β, τ + k),
v(1, α, β, τ)_stop = v(2, α + τ, β, 0),
v(2, α, β, τ)_roll = (1/6) v(1, α, β, 0) + (1/6) Σ_{k=2}^6 v(2, α, β, τ + k),
v(2, α, β, τ)_stop = v(1, α, β + τ, 0).

Note that in the equations above we have replaced the value of the matrix game in equation (2.2) by a maximum in the states in which player one has to take the decision, and by a minimum when it is player two who has to decide. In the states in which both players have only one choice, the value of the matrix game is the only entry of the matrix. Since there are no states in which both players have to decide simultaneously, the stationary strategy that emerges from the theorem turns out to be pure, i.e. each player takes an action with probability 1. To determine the complete solution it is necessary to specify which action should be taken in about 4 000 000 states.
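The fixed-point iteration for U described above can be sketched in a few lines of Python. To keep the run short, this sketch (ours; the variable names and the stopping tolerance are our own choices) uses a reduced target of 10 points instead of 200; the state constraints and the roll/stop values follow the formulas above.

```python
TARGET = 10  # reduced from 200 so the sketch runs in seconds

def all_states():
    """States (j, alpha, beta, tau) with the constraints of Section 3.1."""
    for j in (1, 2):
        for a in range(TARGET):
            for b in range(TARGET):
                own = a if j == 1 else b
                for t in range(TARGET + 6 - own):  # tau <= TARGET + 5 - own
                    yield (j, a, b, t)

def apply_U(v, s):
    """One application of U at state s: max for player one, min for player
    two; tau = 0 forces a roll, and winning states are absorbing."""
    j, a, b, t = s
    if j == 1:
        if a + t >= TARGET:
            return 1.0
        roll = (v[(2, a, b, 0)]
                + sum(v[(1, a, b, t + k)] for k in range(2, 7))) / 6
        return roll if t == 0 else max(roll, v[(2, a + t, b, 0)])
    else:
        if b + t >= TARGET:
            return 0.0
        roll = (v[(1, a, b, 0)]
                + sum(v[(2, a, b, t + k)] for k in range(2, 7))) / 6
        return roll if t == 0 else min(roll, v[(1, a, b + t, 0)])

v = {s: 0.0 for s in all_states()}
delta = 1.0
while delta > 1e-10:  # iterate U to its unique fixed point
    new_v = {s: apply_U(v, s) for s in v}
    delta = max(abs(new_v[s] - v[s]) for s in v)
    v = new_v

# v(s0) = (v(1,0,0,0) + v(2,0,0,0)) / 2, which equals 1/2 by symmetry.
print(round(v[(1, 0, 0, 0)], 4), round(v[(2, 0, 0, 0)], 4))
```

With the full target of 200, the same loop runs over the roughly 8.5 million states of S_1 ∪ S_2 and yields the strategy discussed below; the fact that v(1, 0, 0, 0) > 1/2 reflects the advantage of moving first.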
Figure 3 shows the optimal strategy for player one in some states. The complete solution can be found at www.cmat.edu.uy/cmat/docentes/fabian/documentos/optimalstrategy.pdf.

[Figure 3: Part of the optimal strategy for player one (the strategy for player two is symmetric). It includes states with opponent score β = 0, 150, 180 and 185. In the grey zone the optimal action is to roll and in the black zone it is to stop.]

Some observations about the solution:

• At the beginning of the game, when both players have low scores, the optimal action is to roll when τ < 20 and to stop otherwise, following the strategy found by Roters [5], which maximises the expected value of a turn score. The heuristic interpretation of this fact is: when far away from the target, it is optimal to approach it in steps as large as possible.

• As the opponent's score β becomes larger, the optimal strategy becomes riskier. This can be explained by the fact that there are fewer turns left to reach the target.

• For opponent scores greater than or equal to 187 (β ≥ 187), the graphic becomes completely grey, i.e. the optimal action is always to roll. In other words, when the opponent is close to winning, handing him the die is a bad idea.

• To compare the optimal strategy with the one found by Haigh & Roters [3], we simulated 10 000 000 games. Our simulation showed that in 52% of the games the winner was the player with the optimal strategy.

4 Two related games

4.1 Reaching the target exactly

It is interesting to explore how the optimal strategy changes when the game is modified. In this section we consider the same dice game, with the only difference being that the condition to win is to reach exactly 200 points. If the sum of the accumulated and turn scores is greater than 200, the turn finishes without capitalising any points.
The formulation of the game is quite similar; the difference appears when the accumulated score plus the turn score is greater than 194, a situation in which one roll of the die can exceed the target. As an example of this difference, Figure 4 shows the transition probabilities when the accumulated score is 180 and the turn score is 16 (180 + 16 > 194). Note that the probability of losing the turn is the probability of rolling a 1, 5 or 6.

[Figure 4: Example of transition probabilities in the variant of the game presented in Section 4.1. From the state (1, 180, β, 16), rolling leads with probability 3/6 to (2, 180, β, 0) and with probability 1/6 each to (1, 180, β, 18), (1, 180, β, 19) and (1, 180, β, 20).]

Figure 5 shows part of the optimal strategy for this variant of the game. The complete optimal strategy is available at www.cmat.edu.uy/cmat/docentes/fabian/documentos/optimalexactly.pdf.

[Figure 5: Part of the optimal strategy for player one (the strategy for player two is symmetric) for the game presented in Subsection 4.1. It includes states with opponent score β = 0, 150, 180 and 198. In the grey zone the optimal action is to roll and in the black zone it is to stop.]

Some remarks about the solution:

• As in the classical game, when the target is far, the strategy is similar to "stop if τ ≥ 20".

• Unlike the classical game, the optimal strategy in this case never becomes very risky. This is easy to understand, because the probability of winning in any given turn is less than or equal to 1/6, even when very close to the target.

• When α + τ = 194 there is a "roll zone" larger than usual, because 194 is the largest score at which there is no risk of losing points in one roll while it is still possible to win by rolling a 6.
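The modified roll transition of this variant is easy to state in code. The sketch below (our own names, mirroring the example of Figure 4) returns the successor distribution for to roll from a player-one state, where any face that overshoots 200 loses the turn just like an ace:

```python
from fractions import Fraction

TARGET = 200

def exact_roll_successors(alpha, beta, tau):
    """Successors of 'roll' from (1, alpha, beta, tau) in the exact-target
    variant: an ace or any face overshooting TARGET hands the die over."""
    lose_turn = Fraction(1, 6)  # rolling an ace
    dist = {}
    for k in range(2, 7):
        if alpha + tau + k > TARGET:
            lose_turn += Fraction(1, 6)  # overshoot counts as a lost turn
        else:
            dist[(1, alpha, beta, tau + k)] = Fraction(1, 6)
    dist[(2, alpha, beta, 0)] = lose_turn
    return dist

# The example of Figure 4: alpha = 180, tau = 16, so faces 5 and 6 overshoot.
d = exact_roll_successors(180, 0, 16)
print(d[(2, 180, 0, 0)])  # 1/2: the probability of rolling a 1, 5 or 6
```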
4.2 Maximising the expected difference

In the second variant of the game, the winner, upon reaching the target, obtains from the loser the difference between the target and the loser's score. Again, the model of the game is very similar to the classical model; the only difference is the payoff function:

r_s = 200 − β   if s = (1, α, β, τ) with α + τ ≥ 200,
r_s = α − 200   if s = (2, α, β, τ) with β + τ ≥ 200,
r_s = 0         otherwise.

Figure 6 shows the optimal strategy for some opponent scores. The complete optimal strategy can be found at www.cmat.edu.uy/cmat/docentes/fabian/documentos/optimalmaxdif.pdf. The main difference from the optimal strategy in the classical case is that when a player is close to winning (taking into account his current turn score), he takes the risk of rolling, and this feature is observed for any score of the opponent.

[Figure 6: Part of the optimal strategy for player one (the strategy for player two is symmetric) for the variant presented in Subsection 4.2. It includes states with opponent score β = 0, 150, 170 and 180. In the grey zone the optimal action is to roll and in the black zone it is to stop.]

5 Conclusion

In this paper we model a dice game in the framework of competitive Markov decision processes (also known as stochastic games) in order to obtain optimal strategies for a player. Our main results are the proof of the existence of a value and of an optimal minimax strategy for the game, and the proposal of an algorithm to find this strategy. We base our results on the theory of transient stochastic games exposed by Filar and Vrieze in [2]. Previous mathematical treatments of this problem include the solution of the optimal stopping problem for a player who wants to maximise the expected number of points in a single turn (see Roters [5]) and the minimisation of the expected number of turns required to reach a target (see Haigh and Roters [3]).
Another previous contribution was made by Neller and Presser [4], who found the optimal strategy in the set of stationary pure strategies, departing from a Bellman equation. We also provide an algorithm to compute this optimal strategy explicitly (it coincides with the optimal strategy in the larger class of behaviour strategies) and show how this algorithm works in three different variants of the game.

Acknowledgements

This work was partially supported by the Antel-Fundaciba agreement "Análisis de Algoritmos de Codificación y Cifrado".

References

[1] Crocce, F. (2009). Juegos estocásticos transitorios y aplicaciones. Master thesis, PEDECIBA, Montevideo, Uruguay.

[2] Filar, J. & Vrieze, K. (1997). Competitive Markov Decision Processes. Springer-Verlag, New York.

[3] Haigh, J. & Roters, M. (2000). Optimal Strategy in a Dice Game. Journal of Applied Probability 37, 1110–1116.

[4] Neller, T. & Presser, C. (2004). Optimal Play of the Dice Game Pig. The UMAP Journal 25.1, 25–47.

[5] Roters, M. (1998). Optimal Stopping in a Dice Game. Journal of Applied Probability 35, 229–235.

[6] Shapley, L.S. (1953). Stochastic games. Proc. of the Nat. Acad. Sciences 39, 1095–1100.
