An adjusted payoff-based procedure for normal form games

Mario Bravo
Universidad de Santiago de Chile
Departamento de Matemática y Ciencia de la Computación
mario.bravo.g@usach.cl

Abstract

We study a simple adaptive model in the framework of an N-player normal form game. The model consists of a repeated game where the players only know their own action space and their own payoff scored at each stage, not those of the other agents. Each player, in order to update her mixed action, computes the average vector payoff she has obtained by using the number of times she has played each pure action. The resulting stochastic process is analyzed via the ODE method from stochastic approximation theory. We are interested in the convergence of the process to rest points of the related continuous dynamics. Results concerning almost sure convergence and convergence with positive probability are obtained and applied to a traffic game. We also provide some examples where convergence occurs with probability zero.

Keywords: Normal form games, Learning, Adaptive dynamics, Stochastic approximation
MSC2010 Subject classification: Primary: 91A26, 91A10; Secondary: 62L20, 93E35

1 Introduction

This paper studies an adaptive model for an N-player repeated game. We consider boundedly rational players that adapt using simple behavioral rules based on past experience. The decision that a player can make at each stage hinges on the amount of information available, and there are several approaches depending on how much information agents can gather over time. Fictitious play (see Brown [8], Fudenberg and Levine [15]) is one of the most widely studied procedures. Players adapt their behavior by performing best responses to the opponents' average past play over time. In this case, each player needs to know her own payoff function and to receive complete information about the other players' moves.
A less restrictive framework is one in which each player is informed of all the possible payoffs she could have obtained by using alternative moves. The exponential procedure (Freund and Schapire [14]) is one example of this kind of adaptive process. Note that, in this case, a player does not necessarily observe her payoff function. We are interested in a less informative context here. Players do not anticipate opponents' behavior, and we assume that they have no information on the structure of the game. This means that agents have only their own action space and past realized payoffs with which to react to the environment. We assume that players are given a rule of behavior (a decision rule) which depends on a state variable. The state variable is updated by a possibly time-dependent rule (an updating rule) based on the history of play and current observations. A model widely studied in this framework is the cumulative reinforcement learning procedure, where players keep a perception vector (the state variable) in which each coordinate represents how a move performs. The updating rule adds the payoff received to the component of the previous perception vector corresponding to the move actually played, keeping the components of the unused moves unaltered. The decision rule is given by the normalization of this perception vector, assuming that payoffs are positive. Several results for the convergence (and nonconvergence) of players' mixed actions have been obtained (see Beggs [2], Börgers and Sarin [6], Laslier et al. [20], as well as a normalized version by Posch [26] for the 2-player game framework and Erev and Roth [13] for experimental results). In Cominetti et al. [10], the authors study a model in the same spirit, mainly using a Logit decision rule (which allows nonpositive payoffs) in the N-player case.
Players update the perception vector by performing an average between the new payoff received and the previous perception. Conditions are given to ensure convergence to a Nash equilibrium of a perturbed version of the game. A similar model is studied by Leslie and Collins [21], where results concerning 2-player games are obtained. Another approach using this information framework is proposed by Hart and Mas-Colell [17], where the analysis focuses on the convergence of the empirical frequency of play instead of the long-term behavior of the mixed action. Using techniques based on consistent procedures (see Hart and Mas-Colell [16]), it is shown that, for all games, the set of correlated equilibria is attained.

We consider here a particular updating rule where players maintain a perception vector that is updated, on the coordinate corresponding to the action played, by computing the average between the previous perception and the payoff received, using the number of times that each action has been played. It is natural to consider this variant: the actions that have been played most often in the past are the ones for which the player should have the most accurate perception, so it is sensible for the player to put less weight on the most recent payoff when updating her perception of this action. The resulting process turns out to be a variation of that explored by Cominetti et al. [10], but in our case players use more information on the history of play. Using the tools provided by stochastic approximation theory (see, e.g., Benaïm [3], Benveniste et al. [4], Kushner and Yin [19]), the asymptotic behavior of the process can be analyzed by studying a related continuous dynamics.
We are interested in the case where players use the Logit decision rule, and our aim is to find general conditions that lead to convergence, almost surely or with positive probability, to an attractor of the associated ODE. This case is particularly interesting because the rest points of the ODE are the Nash equilibria of a related game.

This paper is organized as follows. Section 2 describes the fundamental theory underpinning the stochastic approximation. Section 3 precisely defines our model in the framework of an infinitely repeated N-player normal form game. In Section 4, we restate our algorithm so that it fits the stochastic approximation setting and we provide a general almost sure convergence result. In Section 5.1 we treat the case of the Logit rule in detail. We start by finding an explicit condition, derived from Section 4, that ensures almost sure convergence. This condition requires the smoothing parameters associated with the Logit rule to be sufficiently small. It is worth noting that, by this point, we have proved that the results obtained for the process studied by Cominetti et al. [10] also hold in our setting. Given this, we compare these two processes in terms of the path-wise rate of convergence. Later, under a weaker assumption, we study convergence to attractors with positive probability. We apply this result to a particular traffic game on a simple network (studied as an application in [10]), showing that convergence with positive probability holds under a much weaker assumption than in the general case. Finally, we provide some examples where convergence is lost.

2 Preliminaries

This section recalls some basic features of stochastic approximation theory, following the approach in Benaïm [3].
The aim is to study the following discrete process in R^d,

    z_{n+1} − z_n = γ_{n+1} [ H(z_n) + V_{n+1} ],     (2.1)

where (γ_n)_n is a nonnegative step-size sequence, H : R^d → R^d is a continuous function and (V_n)_n is a (deterministic or random) noise term. Let us denote by L(z_n) the limit set of the sequence (z_n)_n, i.e., the set of points z such that lim_{l→+∞} z_{n_l} = z for some sequence n_l → +∞. The connection between the asymptotic behavior of the discrete process (2.1) and that of the continuous dynamics

    ż = H(z)     (2.2)

is obtained as follows. Given ε > 0, T > 0, a set Z ⊆ R^d and two points x, y ∈ Z, we say that there is an (ε, T)-chain in Z between x and y if there exist k solutions of (2.2), {x_1, …, x_k}, and times {t_1, …, t_k}, all greater than T, such that

(1) x_i([0, t_i]) ⊆ Z for all i ∈ {1, …, k},
(2) ‖x_i(t_i) − x_{i+1}(0)‖ < ε for all i ∈ {1, …, k−1},
(3) ‖x_1(0) − x‖ < ε and ‖x_k(t_k) − y‖ < ε.

Definition 2.1. A set D ⊆ R^d is Internally Chain Transitive (ICT) for the dynamics (2.2) if it is compact and, for all ε > 0, T > 0 and x, y ∈ D, there exists an (ε, T)-chain in D between x and y.

This definition is derived from the notion of Internally Chain Recurrent sets introduced by Conley [11]. Roughly speaking, on an ICT set we can link any two points by a chain of solutions of the dynamics (2.2), allowing small perturbations. ICT sets are compact, invariant and attractor-free. In Benaïm [3] the following general theorem is proved.

Theorem 2.2. Consider the discrete process (2.1). Assume that H is a Lipschitz function and that

(a) the sequence (γ_n)_n is deterministic, γ_n ≥ 0, Σ_n γ_n = +∞ and γ_n → 0,
(b) sup_{n∈N} ‖z_n‖ < +∞, and
(c) for any T > 0,

    lim_{n→+∞} sup { ‖ Σ_{i=n}^{k−1} γ_{i+1} V_{i+1} ‖ ; k ∈ { n+1, …, m( Σ_{j=1}^{n} γ_j + T ) } } = 0,

where m(t) is the largest integer l such that t ≥ Σ_{j=1}^{l} γ_j.

Then L(z_n) is an ICT set for the dynamics (2.2).

Remark 2.3. In the case where the noise (V_n)_n in (2.1) is a martingale difference sequence with respect to some filtration on a probability space, we say that (2.1) is a Robbins–Monro [27] algorithm. In this framework if, for instance, sup_n E(‖V_n‖²) < +∞ and (γ_n)_n ∈ ℓ²(N), then assumption (c) in Theorem 2.2 holds with probability one (see Benaïm [3, Proposition 4.2]). Moreover, this result is still valid if the noise can be decomposed into a martingale difference process plus a random variable that converges almost surely to zero.

3 The model

An N-player normal form game is introduced as follows. Let A = {1, 2, …, N} be the set of players. For every i ∈ A, let S^i be the finite action set of player i and let ∆^i = { z ∈ R^{|S^i|} ; z_s ≥ 0, Σ_s z_s = 1 } denote her mixed action set. S = Π_{i∈A} S^i is the set of action profiles and ∆ = Π_{i∈A} ∆^i is the set of mixed action profiles. We write (s, s^{−i}) ∈ S for the action profile where player i uses s ∈ S^i and her opponents use the profile s^{−i} ∈ Π_{j≠i} S^j, and we adopt the same notation when a mixed action profile is involved. The payoff function of each player i ∈ A is denoted by G^i : S → R and its multilinear extension by G^i : ∆ → R. The game is repeated infinitely, and we assume that players are not informed about the structure of the game, i.e., neither the number of players (nor their strategies) nor the payoff functions are known. At stage n ∈ N, each player i selects an action s^i_n ∈ S^i using the mixed action σ^i_n ∈ ∆^i. Then she obtains her own payoff g^i_n = G^i(s^i_n, s^{−i}_n), and this is the only information she receives.
For every n ∈ N and each player i, we assume that the mixed action at stage n, σ^i_n ∈ ∆^i, is determined as a function of a previous perception vector x^i_{n−1} ∈ R^{|S^i|}, i.e., σ^i_n = σ^i(x^i_{n−1}) with σ^i : R^{|S^i|} → ∆^i. The state space for the perception vector profiles x = (x^1, …, x^N) ∈ Π_{i∈A} R^{|S^i|} is denoted X. We also assume that, for every i ∈ A, the function σ^i : R^{|S^i|} → ∆^i is continuous, and that for all s ∈ S^i and x^i ∈ R^{|S^i|},

    σ^{is}(x^i) > 0.     (A)

We will refer to the function σ : X → ∆ with σ(x) = (σ^1(x^1), …, σ^N(x^N)) as the decision rule of the players. At the end of stage n, each player i uses the value g^i_n and x^i_{n−1} to obtain the new perception vector x^i_n, and so on. The manner in which x_n is updated is called the updating rule of the players. Cominetti et al. [10] study the following updating rule:

    x^{is}_{n+1} = (1 − γ_{n+1}) x^{is}_n + γ_{n+1} g^i_{n+1},  if s = s^i_{n+1},
    x^{is}_{n+1} = x^{is}_n,  otherwise,     (3.1)

where we assume that γ_n = 1/n (see the discussion after Proposition 5.6 for an explanation of this choice). In this paper we consider a variation of (3.1). Players will use more information by taking into account the number of times their actions have been played. Explicitly, we define the adjusted process (APD) by

    x^{is}_{n+1} = (1 − 1/(θ^{is}_n + 1)) x^{is}_n + (1/(θ^{is}_n + 1)) g^i_{n+1},  if s = s^i_{n+1},
    x^{is}_{n+1} = x^{is}_n,  otherwise,     (APD)

where θ^{is}_n denotes the number of times action s has been used by player i ∈ A up to time n. Given the particular structure in (APD), x_n can be assumed to lie within a compact subset of X for all n ∈ N. Note that the new variable is simply an average between the previous one and the new payoff scored. We also notice that (A) implies that the decision rule can be assumed to be component-wise bounded away from zero.
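As a concrete illustration, the update (APD) for a single player can be sketched in a few lines of code. Only the update rule itself comes from the text; the two-action player and the payoff values below are hypothetical.

```python
def apd_step(x, theta, s, payoff):
    # One (APD) update: the coordinate of the action just played moves to the
    # running average of the payoffs that action has earned so far; all other
    # coordinates are left unchanged.
    theta[s] += 1
    k = theta[s]
    x[s] = (1 - 1.0 / k) * x[s] + (1.0 / k) * payoff
    return x, theta

# Hypothetical two-action player: action 0 earns payoffs 1.0 and then 3.0.
x, theta = [0.0, 0.0], [0, 0]
x, theta = apd_step(x, theta, 0, 1.0)
x, theta = apd_step(x, theta, 0, 3.0)
# x[0] is now the average payoff of action 0; x[1] has never been updated.
```

Because the weight on the new payoff is 1/θ^{is}_{n+1}, the perception x^{is}_n is exactly the empirical average of the payoffs scored on action s, which is the "average vector payoff" of the abstract.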
As usual, we denote by F_n the σ-algebra generated by the history up to time n, F_n = σ((s_m, g_m) ; 1 ≤ m ≤ n), where s_m = (s^1_m, …, s^N_m) and g_m = (g^1_m, …, g^N_m).

4 Asymptotic analysis

If we want to analyze (APD) using the tools described in Section 2, the main difficulty is that we have a stochastic algorithm in discrete time where the step size is random and, moreover, depends on the coordinate of the vector being updated. Thus, in order to study the asymptotic properties of our adaptive process, let us restate the updating scheme (APD) in the following manner:

    x^{is}_{n+1} − x^{is}_n = (1/(θ^{is}_n + 1)) (g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}}
                            = (1/((n+1) λ^{is}_n)) [ (g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}} + b^{is}_{n+1} ],

with b^{is}_{n+1} = O(1/n), where λ^{is}_n = θ^{is}_n / n is the empirical frequency of action s for player i up to time n and 1{C} stands for the indicator function of the set C.

Remark 4.1. Note that the previous decomposition is not well defined when θ^{is}_n = 0, but Lemma 4.3 shows that it is almost surely valid for large n, for all i and s ∈ S^i.

Standard computations involving averages show that

    λ^{is}_{n+1} − λ^{is}_n = (1/(n+1)) ( 1{s = s^i_{n+1}} − λ^{is}_n ).

Then we can express (APD) differently by introducing the empirical frequency of play.
The new form is the (up to a vanishing term) martingale difference scheme

    x^{is}_{n+1} − x^{is}_n = (1/(n+1)) [ (σ^{is}(x^i_n)/λ^{is}_n) ( G^i(s, σ^{−i}(x_n)) − x^{is}_n ) + U^{is}_{n+1} ],
    λ^{is}_{n+1} − λ^{is}_n = (1/(n+1)) [ σ^{is}(x^i_n) − λ^{is}_n + M^{is}_{n+1} ],     (4.1)

where the noise terms are, explicitly,

    U^{is}_{n+1} = (1/λ^{is}_n)(g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}} − (σ^{is}(x^i_n)/λ^{is}_n)( G^i(s, σ^{−i}(x_n)) − x^{is}_n ) + b^{is}_{n+1}
               = (1/λ^{is}_n)(g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}} − E[ (1/λ^{is}_n)(g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}} | F_n ] + b^{is}_{n+1},

    M^{is}_{n+1} = 1{s = s^i_{n+1}} − σ^{is}(x^i_n)
               = 1{s = s^i_{n+1}} − λ^{is}_n − E( 1{s = s^i_{n+1}} − λ^{is}_n | F_n ).     (4.2)

From now on, we denote by V_n = (U_n, M_n) the noise term associated with our process. The scheme (4.1) will allow us to deal with the random (and player-dependent) character of the step size in (APD). Now, in the spirit of Theorem 2.2, the asymptotic behavior of (4.1) is related to the continuous dynamics

    ẋ^{is}_t = (σ^{is}(x^i_t)/λ^{is}_t) ( G^i(s, σ^{−i}(x_t)) − x^{is}_t ) = Ψ^{is}_x(x_t, λ_t),
    λ̇^{is}_t = σ^{is}(x^i_t) − λ^{is}_t = Ψ^{is}_λ(x_t, λ_t),     (4.3)

with Ψ_x : X × ∆ → Π_{i∈A} R^{|S^i|} and Ψ_λ : X × ∆ → Π_{i∈A} ∆^i_0, where ∆^i_0 stands for the tangent space to ∆^i, i.e., ∆^i_0 = { z ∈ R^{|S^i|} ; Σ_{s∈S^i} z_s = 0 }. Let us denote by Ψ the function defined by Ψ(x, λ) = (Ψ_x(x, λ), Ψ_λ(x, λ)).

For the sake of completeness, let us write the process (3.1) as

    x^{is}_{n+1} − x^{is}_n = (1/(n+1)) [ σ^{is}(x^i_n) ( G^i(s, σ^{−i}(x_n)) − x^{is}_n ) + Ũ^{is}_{n+1} ],     (4.4)

with the noise term given by

    Ũ^{is}_{n+1} = (g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}} − σ^{is}(x^i_n) ( G^i(s, σ^{−i}(x_n)) − x^{is}_n )
               = (g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}} − E( (g^i_{n+1} − x^{is}_n) 1{s = s^i_{n+1}} | F_n ).     (4.5)

Therefore, the corresponding continuous dynamics is given by

    ẋ^{is}_t = σ^{is}(x^i_t) ( G^i(s, σ^{−i}(x_t)) − x^{is}_t ) = Φ^{is}(x_t),     (4.6)

where Φ : X → Π_{i∈A} R^{|S^i|}.

Remark 4.2. Observe that the following simple fact holds:

    (x, σ(x)) ∈ X × ∆ is a rest point of (4.3)  ⇔  x ∈ X is a rest point of (4.6).

In what follows, we show that asymptotic properties similar to those of (4.4) can be obtained for our process. This means that explicit conditions can be found to ensure that the process (4.1) converges almost surely to a global attractor for the dynamics (4.3). Recall that we have assumed that, for every n ∈ N and i ∈ A, the mixed action σ^i_n ∈ ∆^i is component-wise bounded away from zero. The purpose of the next simple lemma is to verify that the same holds, almost surely, for the empirical frequencies of play.

Lemma 4.3. For n ≥ 1, let σ_n be a probability distribution over a finite set T and let i_{n+1} be an element of T drawn with law σ_n, where (σ_n)_n is adapted to the natural filtration generated by the history. For all j ∈ T, set

    λ^j_n = (1/n) Σ_{p=1}^{n} 1{i_p = j}.

Assume that there exists σ̄ > 0 such that σ^j_n ≥ σ̄ for all n and j. Then, almost surely,

    lim inf_{n→+∞} λ^j_n ≥ σ̄, for every j ∈ T.

Proof. Fix j ∈ T and let F_k be the σ-algebra generated by the history {i_1, …, i_k} up to time k. Then we have E(1{i_k = j} | F_{k−1}) = σ^j_{k−1} ≥ σ̄. On the other hand, the random process (φ^j_n)_n given by

    φ^j_n = Σ_{k=1}^{n} (1/k) ( 1{i_k = j} − E(1{i_k = j} | F_{k−1}) )

is a martingale with sup_{n∈N} E((φ^j_n)²) ≤ C · Σ_{p≥1} 1/p² < +∞ for some constant C. Hence (φ^j_n)_n converges almost surely. Now Kronecker's lemma (see, e.g., Shiryaev [30, Lemma IV.3.2]) gives

    lim_{n→+∞} (1/n) Σ_{k=1}^{n} ( 1{i_k = j} − E(1{i_k = j} | F_{k−1}) ) = 0,

so that (1/n) Σ_{k=1}^{n} ( 1{i_k = j} − E(1{i_k = j} | F_{k−1}) ) ≤ λ^j_n − σ̄.
Taking the lim inf, we conclude.

Proposition 4.4. The process (4.1) converges almost surely to an ICT set for the continuous dynamics (4.3).

Proof. We only have to show that our process satisfies the hypotheses of Theorem 2.2. The assumptions concerning the regularity of the functions involved, the step-size sequence and the boundedness of the process (x_n, λ_n)_n hold immediately. According to (4.2), M_n is almost surely bounded and can be written as a martingale difference scheme plus a vanishing term. Observe that E(U_{n+1} | F_n) = 0 and that |U^{is}_{n+1}| ≤ C/λ^{is}_n for some constant C. Then Lemma 4.3 implies that U_n is almost surely bounded. In view of Remark 2.3, assumption (c) of Theorem 2.2 holds for the noise term V_n = (U_n, M_n), and the conclusion follows.

Let us define the function F : X → Π_i R^{|S^i|} by

    F^{is}(x) = G^i(s, σ^{−i}(x)).     (4.7)

Cominetti et al. [10] show that, if the function F is contracting for the infinity norm, then the process (4.4) converges almost surely to the unique rest point of the dynamics (4.6). The following result shows that the same holds for the process (4.1) under a slightly stronger assumption on the decision rule σ.

Proposition 4.5. Assume that F is contracting for the infinity norm and that, for every i ∈ A, the function σ^i is Lipschitz for the infinity norm. Then there exists a unique rest point (x*, σ(x*)) ∈ X × ∆ of (4.3). Furthermore, the set {(x*, σ(x*))} is a global attractor and the process (4.1) converges almost surely to (x*, σ(x*)).

Proof. According to Remark 4.2, (x*, σ(x*)) ∈ X × ∆ is a rest point of (4.3) if and only if F(x*) = x*; hence existence and uniqueness follow from the fact that F is contracting. Let 0 ≤ L < 1 and K_i be the Lipschitz constants associated with the functions F and σ^i, i ∈ A, respectively.
We want to find a suitable strict Lyapunov function, i.e., a function V that decreases along solution paths and verifies V^{−1}({0}) = {(x*, λ*)}, with λ* = σ(x*). Let V : X × ∆ → R_+ be defined by

    V(x, λ) = max { ‖x − x*‖_∞ , (1/ζ) ‖λ − λ*‖_∞ },

where ζ > 0 will be chosen later. The function V is the maximum of a finite number of smooth functions; therefore it is absolutely continuous and its derivative is, almost everywhere, the derivative of a function attaining the maximum. We distinguish two cases.

Case 1: V(x_t, λ_t) = ‖x_t − x*‖_∞. Let i ∈ A and s ∈ S^i be such that V(x_t, λ_t) = |x^{is}_t − x^{is}_*|, and assume first that x^{is}_t − x^{is}_* ≥ 0. Then, for almost all t ∈ R,

    (d/dt) V(x_t, λ_t) = (d/dt) (x^{is}_t − x^{is}_*) = (σ^{is}(x^i_t)/λ^{is}_t) [ F^{is}(x_t) − F^{is}(x*) + x^{is}_* − x^{is}_t ]
                       ≤ −ξ (1 − L) ‖x_t − x*‖_∞ = −ξ (1 − L) V(x_t, λ_t),

for some ξ > 0 such that σ^{is}(x) ≥ ξ for every i ∈ A and s ∈ S^i. If x^{is}_t − x^{is}_* < 0, the computations are analogous.

Case 2: V(x_t, λ_t) = (1/ζ) ‖λ_t − λ*‖_∞. Let j ∈ A and r ∈ S^j be such that V(x_t, λ_t) = (1/ζ) |λ^{jr}_t − λ^{jr}_*|, and assume that λ^{jr}_t − λ^{jr}_* ≥ 0. Then, for almost all t ∈ R,

    (d/dt) V(x_t, λ_t) = (1/ζ) [ σ^{jr}(x^j_t) − σ^{jr}(x^j_*) + λ^{jr}_* − λ^{jr}_t ]
                       ≤ −(1/ζ) ‖λ_t − λ*‖_∞ + (1/ζ) |σ^{jr}(x^j_t) − σ^{jr}(x^j_*)|
                       ≤ −V(x_t, λ_t) + (max_i K_i / ζ) ‖x_t − x*‖_∞
                       ≤ −(1 − max_i K_i / ζ) V(x_t, λ_t),

and we take ζ > 0 sufficiently large so that max_i K_i / ζ < 1. Again, if λ^{jr}_t − λ^{jr}_* < 0, the computations are the same.

Hence (d/dt) V(x_t, λ_t) ≤ −K V(x_t, λ_t) for some K > 0, so V decreases exponentially fast along the solution paths of the dynamics, and V(x, λ) = 0 if and only if (x, λ) = (x*, λ*). Therefore the set {(x*, λ*)} is a global attractor, which is the unique ICT set for (4.3) (see [3, Corollary 5.4]). Proposition 4.4 finishes the proof.
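Proposition 4.5 rests on F being a contraction for the infinity norm, in which case the rest point x* = F(x*) can in principle be located by plain Banach fixed-point iteration. The sketch below illustrates this with an assumed toy map F of Lipschitz constant 1/2 (not the game-theoretic F of (4.7)); the tolerance and iteration cap are arbitrary choices.

```python
def fixed_point(F, x0, tol=1e-10, max_iter=1000):
    # Banach fixed-point iteration: when F is contracting for the infinity
    # norm, the sequence x_{k+1} = F(x_k) converges to the unique x* with
    # F(x*) = x* -- the rest point tracked by the stochastic process.
    x = x0
    for _ in range(max_iter):
        x_new = F(x)
        if max(abs(a - b) for a, b in zip(x_new, x)) < tol:
            return x_new
        x = x_new
    return x

# Hypothetical contracting map (Lipschitz constant 1/2 for the sup norm);
# its unique fixed point is (4/3, 2/3).
F = lambda x: (0.5 * x[1] + 1.0, 0.5 * x[0])
x_star = fixed_point(F, (0.0, 0.0))
```

The contraction constant L < 1 is what drives both the uniqueness of x* here and the exponential decay of the Lyapunov function in the proof above.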
5 Logit rule

The Logit rule is widely used in the field of discrete choice models as well as in game theory. For instance, a model of learning in games where the logit function is used is given by the logit-response dynamics [5, 1]. In that model, the aim is to study the stochastically stable states of the induced process, along with equilibrium selection issues (see [22] for a payoff-based implementation of this dynamics and related models). Explicitly, the decision rule σ : X → ∆ is given by

    σ^{is}(x^i) = exp(β_i x^{is}) / Σ_{r∈S^i} exp(β_i x^{ir}),     (5.1)

for every i ∈ A and s ∈ S^i, where β_i > 0 is called the smoothing parameter of player i. In view of Remark 4.2, the following result shows that the rest points of the dynamics (4.3) correspond to Nash equilibria of an entropy-perturbed version of the original game (see Cominetti et al. [10]).

Lemma 5.1. Under the Logit decision rule (5.1), if x ∈ X is a rest point of the dynamics (4.6), then σ(x) is a Nash equilibrium of the game where the action set of each player i is ∆^i and her payoff G^i : ∆ → R is given by

    G^i(π) = Σ_{s∈S^i} π^{is} G^i(s, π^{−i}) − (1/β_i) Σ_{s∈S^i} π^{is} ( ln(π^{is}) − 1 ).     (5.2)

5.1 Almost sure convergence

We want to apply Proposition 4.5 within this framework. For that purpose, let us introduce the maximum unilateral deviation payoff that a single player can experience,

    η = max_{i∈A, s∈S^i, (r_1,r_2)∈S̃^{−i}} | G^i(s, r_1) − G^i(s, r_2) |,     (5.3)

where S̃^{−i} = { (r_1, r_2) ∈ S^{−i} × S^{−i} ; r^k_1 ≠ r^k_2 for exactly one k }. The following proposition ensures that, if the smoothing parameters are sufficiently small, the unique attractor is attained with probability one. From now on, we set α = max_{i∈A} Σ_{j≠i} β_j.

Proposition 5.2. If 2ηα < 1, the discrete process (4.1) converges almost surely to the unique rest point (x*, σ(x*)) of the dynamics (4.3).

Proof. We know from Cominetti et al.
[10, Proposition 5] that, if 2ηα < 1, the function F (defined in (4.7)) is contracting for the infinity norm. Observe also that, for every i ∈ A, the function σ^i is Lipschitz for the infinity norm, since it is a smooth function defined on a compact set. Therefore, Proposition 4.5 applies.

Rate of convergence

Up to this point, we were able to reproduce some of the theoretical results of the original model (4.4) regarding its almost sure convergence to global attractors. Now we want to justify the inclusion of a counter of the previous actions in terms of the rate of convergence, when both learning processes (4.1) and (4.4) converge almost surely to (x*, λ*) and x*, respectively, and the step size γ_n = 1/n is considered. This rate of convergence is closely linked to the largest real part among the eigenvalues of the Jacobian matrix of the functions Ψ = (Ψ_x, Ψ_λ) and Φ at the respective rest points. Let us denote by ρ(B) the maximum real part of the eigenvalues of a matrix B ∈ R^{k×k}, i.e.,

    ρ(B) = max { Re(µ_j) ; j = 1, …, k, where µ_j ∈ C is an eigenvalue of B }.

We say that a matrix B is stable if ρ(B) < 0.

Lemma 5.3. Assume that 2ηα < 1. Let (x*, λ*) and x* be the unique rest points of the dynamics (4.3) and (4.6), respectively. Then

    −1 ≤ ρ(∇Ψ(x*, λ*)) < −1/2 ≤ −N / Σ_{k∈A} |S^k| ≤ ρ(∇Φ(x*)) < 0.     (5.4)

Proof. Straightforward computations concerning the function Ψ (see (4.3)) show that

    ∂Ψ^{is}_λ/∂x^{jr} (x*, λ*) = 0  and  ∂Ψ^{is}_λ/∂λ^{jr} (x*, λ*) = −1{is = jr},

for every i, j ∈ A and (s, r) ∈ S^i × S^j. Therefore, the matrix ∇Ψ(x*, λ*) has the block structure

    ∇Ψ(x*, λ*) = [ ∇_x Ψ_x(x*, λ*)   0 ]
                 [        L         −I ],     (5.5)

where I stands for the identity matrix and ∇_x Ψ_x(x*, λ*) denotes the Jacobian matrix of Ψ_x with respect to x at (x*, λ*).
Notice that the relevant eigenvalues of this matrix are given by its upper-left block, because of the zero block and the identity matrix on the right side of (5.5). Observe also that ∂Ψ^{is}_x/∂x^{is}(x*, λ*) = −1, i.e., the matrix ∇_x Ψ_x(x*, λ*) has diagonal terms equal to −1. On the other hand, we know that every eigenvalue of a complex matrix B = (B_{pq}) lies within at least one of the Gershgorin discs D_p(B) = { z ∈ C ; |z − B_{pp}| ≤ R_p }, where R_p = Σ_{q≠p} |B_{pq}|. Given the specific form of the matrix ∇_x Ψ_x(x*, λ*), we can estimate the position of its eigenvalues. In our case,

    R_{is} = Σ_{j∈A, j≠i} Σ_{r∈S^j} | ∂Ψ^{is}_x/∂x^{jr} (x*, λ*) |,

since ∂Ψ^{is}_x/∂x^{jr}(x*, λ*) = 0 if i = j and r ≠ s. This follows from the fact that F^{is}(x) (defined in (4.7)) is independent of the vector x^i. Explicitly, for i ≠ j,

    ∂Ψ^{is}_x/∂x^{jr} (x*, λ*) = β_j σ^{jr}_* [ G^i(s, r, σ^{−(i,j)}_*) − G^i(s, σ^{−i}_*) ],

where

    G^i(s, r, σ^{−(i,j)}_*) = Σ_{a∈S^{−i}, a^j = r} G^i(s, a) Π_{k≠i, k≠j} σ^{k a^k}_*.

So that

    R_{is} = Σ_{j≠i} β_j Σ_{r∈S^j} σ^{jr}_* | G^i(s, r, σ^{−(i,j)}_*) − G^i(s, σ^{−i}_*) | ≤ ηα.

Then all the eigenvalues of the matrix ∇_x Ψ_x(x*, λ*) are contained in the complex disc

    { z ∈ C ; |z + 1| ≤ ηα } ⊇ ∪_{i∈A, s∈S^i} D_{is}( ∇_x Ψ_x(x*, λ*) ),     (5.6)

which implies that ρ(∇Ψ(x*, λ*)) < −1/2. Analogous computations involving the function Φ show that

    D_{is}(∇Φ(x*)) ⊆ { z ∈ C ; |z + σ^{is}_*| ≤ σ^{is}_* ηα },

for every i ∈ A and s ∈ S^i. Since −σ^{is}_* + σ^{is}_* ηα < 0, we get ρ(∇Φ(x*)) < 0. It is clear that −1 ≤ ρ(∇Ψ(x*, λ*)). The inequality −N / Σ_k |S^k| ≤ ρ(∇Φ(x*)) follows since the trace of the matrix ∇Φ(x*) is equal to −N.

Remark 5.4. Notice that 1/2 = N / Σ_k |S^k| if and only if |S^k| = 2 for all k ∈ A.

The following reduced version of Chen [9, Theorem 3.1.1] will be useful.

Theorem 5.5.
Consider the discrete process given by (2.1). Assume that the following hold.

(a) For every n ∈ N, γ_n > 0, lim_{n→+∞} γ_n = 0, Σ_n γ_n = +∞ and

    lim_{n→+∞} (γ_n − γ_{n+1}) / (γ_{n+1} γ_n) = γ ≥ 0.

(b) z_n → z_0 almost surely.

(c) There exists δ ∈ (0, 1] such that

(c.1) on a path such that z_n → z_0, the noise V_n can be decomposed into V_n = V'_n + V''_n, where Σ_{n≥1} γ^{1−δ}_n V'_{n+1} < +∞ and V''_n = O(γ^δ_n);

(c.2) the function H is locally bounded and differentiable at z_0, with H(z) = H(z − z_0) + r(z), where H denotes the derivative of H at z_0, r(z_0) = 0 and r(z) = o(‖z − z_0‖) as z → z_0;

(c.3) the matrix H is stable and, furthermore, H + δγ I is also stable.

Then, almost surely, ε_n (z_n − z_0) → 0 as n → +∞, for any sequence ε_n = o((1/γ_n)^δ).

The previous result allows us to show that our algorithm is faster. This means that, under the common hypothesis 2ηα < 1 (which ensures almost sure convergence for both processes), employing the adjusted process (4.1) helps the players adapt their behavior faster than with the original process (4.4).

Proposition 5.6. Assume that 2ηα < 1 and let (x*, λ*) ∈ X × ∆ and x* ∈ X be the unique rest points of the dynamics (4.3) and (4.6), respectively. Then the following estimates hold:

(i) for almost all trajectories of (4.4), ε_n (x_n − x*) → 0 as n → +∞, for every sequence ε_n = o(n^{|ρ(∇Φ(x*))|});

(ii) for almost all trajectories of (4.1), ε_n ((x_n, λ_n) − (x*, λ*)) → 0 as n → +∞, for every sequence ε_n = o(n^{1/2}).

Proof. Recall that V_n = (U_n, M_n) and Ũ_n are the noise terms associated with (4.1) and (4.4), respectively (see (4.2) and (4.5)). We observe that, for both processes, hypotheses (a) and (b) in Theorem 5.5 are immediately satisfied, since γ_n = 1/n (with γ = 1) and since Proposition 5.2 applies. Let us verify that condition (c) holds.
(i) Fix δ ∈ (0, |ρ(∇Φ(x*))|). The random process (Ũ_n)_n is almost surely bounded and satisfies E(Ũ_{n+1} | F_n) = 0. Therefore, Z_n = Σ_{k=1}^{n} (1/k)^{1−δ} Ũ_{k+1} is a martingale with sup_n E(‖Z_n‖²) ≤ C Σ_{k=1}^{+∞} (1/k)^{2(1−δ)} < +∞ for some constant C, and thus convergent (since δ < 1/2). To conclude, observe that the function Φ is smooth and that the matrix ∇Φ(x*) + δI is stable.

(ii) Fix δ ∈ (0, 1/2). We repeat the argument by noting that V_n = Ṽ_n + b̃_n, where b̃_n = O(1/n) and E(Ṽ_{n+1} | F_n) = 0. To finish, we use the fact that the matrix ∇Ψ(x*, λ*) + δI is stable, since inequality (5.4) holds.

Two important comments are in order. First, as before, let C_n denote the upper-left block of the matrix E(V^T_{n+1} V_{n+1} | F_n), whose entries are

    C^{is,jr}_n = 0, if i ≠ j;

    C^{is,jr}_n = −(σ^{is}(x^i_n)/λ^{is}_n)( G^i(s, σ^{−i}(x_n)) − x^{is}_n ) · (σ^{ir}(x^i_n)/λ^{ir}_n)( G^i(r, σ^{−i}(x_n)) − x^{ir}_n ) + O(1/n), if i = j and s ≠ r;

    C^{is,is}_n = (σ^{is}_n/(λ^{is}_n)²) E( (G^i(s, s^{−i}_{n+1}) − x^{is}_n)² | F_n ) − σ^{is}_n ( G^i(s, σ^{−i}_n) − x^{is}_n )² + O(1/n), otherwise.

Given that the vector of probabilities σ_n converges, C_n converges almost surely to a deterministic matrix C (which is diagonal, since C^{is,jr}_n → 0 when i = j and s ≠ r). Moreover, C is positive definite, since

    C^{is,is}_n = (σ^{is}_n/(λ^{is}_n)²) E( (G^i(s, s^{−i}_{n+1}))² | F_n ) − σ^{is}_n ( G^i(s, σ^{−i}_n) )² + σ^{is}_n (1 − σ^{is}_n) x^{is}_n ( x^{is}_n − G^i(s, σ^{−i}_n) ) + O(1/n),

and therefore

    C^{is,is} = (1/σ^{is}_*) [ Σ_{s^{−i}∈S^{−i}} (G^i(s, s^{−i}))² σ^{−i}_*(s^{−i}) − σ^{is}_* ( G^i(s, σ^{−i}_*) )² ]
              > (1/σ^{is}_*) [ Σ_{s^{−i}∈S^{−i}} (G^i(s, s^{−i}))² σ^{−i}_*(s^{−i}) − ( G^i(s, σ^{−i}_*) )² ] ≥ 0,

which follows from the fact that 0 < σ^{is}_* < 1 and the convexity of the map x ↦ x².
Hence we can conclude that $\mathbb{E}(V_{n+1}^T V_{n+1} \mid \mathcal{F}_n)$ converges to a positive definite deterministic matrix, and that $\sqrt{n}\big((x_n,\lambda_n) - (x_*,\lambda_*)\big)$ converges in distribution to a normal random variable (see e.g. [9, Theorem 3.3.2]). For (4.4), considering the continuous function $C(x) = \mathbb{E}(\tilde U_{n+1}^T \tilde U_{n+1} \mid x_n = x)$ and slightly modifying the proof of [12, Theorem 2.2.12], it can be shown that $n^{|\rho(\nabla\Phi(x_*))|}(x_n - x_*)$ converges almost surely to a finite random variable if $0 < |\rho(\nabla\Phi(x_*))| < 1/2$. For instance, for the game defined by (5.7), we have $|\rho(\nabla\Phi(x_*))| \approx 0.3$. Figure 1 depicts the results of a numerical experiment on this particular example, with $2\eta\alpha = 0.8$.

Second, observe that a better rate can be achieved for (4.4) if the step size is given by $\gamma_n = a/n$ with $a > \big(2|\rho(\nabla\Phi(x_*))|\big)^{-1}$. This leads to the rate $o(n^{-\delta})$ for all $\delta \in (0,1/2)$. However, it is somewhat unrealistic to assume that the players have this information in advance. Nevertheless, we always have $|\rho(\nabla\Phi(x_*))| < |\rho(\nabla\Psi(x_*,\lambda_*))|$, and thus the scheme (4.1) reaches at least the same path-wise rate of convergence under the hypotheses of Proposition 5.6, regardless of the step size considered.

$$G = \begin{pmatrix} (0,0) & (1,0) & (0,1)\\ (0,1) & (0,0) & (1,0)\\ (1,0) & (0,1) & (0,0) \end{pmatrix} \qquad (5.7)$$

5.2 Convergence with positive probability

We use the estimates given by Lemma 5.3 to extend the range of parameters for which general convergence results can be obtained for the process (4.1). We start by showing that there exists a unique rest point of (4.3), which is stable if $1 \le 2\eta\alpha < 2$. Let $Y \subseteq X\times\Delta$ be the set of rest points of (4.3), and let $B(A)$ denote the basin of attraction of an attractor $A$.

Proposition 5.7. Assume that $1 \le 2\eta\alpha < 2$. Then there exists a unique rest point $(x_*,\lambda_*)$ for the dynamics (4.3), which is an attractor.

Proof. Let $(x_*,\lambda_*) \in Y$.
If $1 \le 2\eta\alpha < 2$, equation (5.6) shows that the matrix $\nabla\Psi(x_*,\lambda_*)$ is stable. To prove that $\{(x_*,\lambda_*)\}$ is an attractor, take
$$V(x,\lambda) = \big((x,\lambda) - (x_*,\lambda_*)\big)^T D \big((x,\lambda) - (x_*,\lambda_*)\big)$$
as a (local) Lyapunov function, where, for instance, $D$ is the positive definite solution of the Lyapunov equation $\nabla\Psi(x_*,\lambda_*)^T D + D\,\nabla\Psi(x_*,\lambda_*) = -I$. Given that basins of attraction cannot overlap, $Y$ is finite, since $X\times\Delta$ is compact and $\Psi$ is regular. Finally, $Y$ reduces to one point: by the Poincaré–Hopf Theorem (see e.g. Milnor [23, Chapter 6]), it is impossible for finitely many equilibria to all be stable unless there is exactly one.

Figure 1: $\|(x_n,\lambda_n) - (x_*,\lambda_*)\|^2$ versus $\|x_n - x_*\|^2$.

The following definition is crucial to ensure convergence with positive probability of the process $(x_n,\lambda_n)_n$ to a given (not necessarily global) attractor.

Definition 5.8. Let $(z_n)_n$ be a discrete stochastic process with state space $Z$. A point $z \in Z$ is attainable by $(z_n)_n$ if for each $m \in \mathbb{N}$ and every open neighborhood $U$ of $z$,
$$\mathbb{P}(\exists\, n \ge m,\ z_n \in U) > 0.$$

The following lemma relies on the particular form of the updating rule (APD) considered in this work.

Lemma 5.9. Fix $\lambda = (\lambda^1,\dots,\lambda^N) \in \Delta$. Set $x^i \in \mathbb{R}^{|S^i|}$ such that $x^{is} = G^i(s,\lambda^{-i})$ for all $s \in S^i$, and set $x = (x^1,\dots,x^N) \in X$. Then $(x,\lambda) \in X\times\Delta$ is attainable by the process $(x_n,\lambda_n)_n$. In particular, any rest point of (4.3) is attainable.

Proof. The fact that $\sigma_n^{is} \ge \xi > 0$ for every $i \in A$, $s \in S^i$ and $n \in \mathbb{N}$ implies that any finite sequence generated by (4.1) has positive probability.
The updating rule (APD) can be expressed, almost surely for $n$ sufficiently large, as
$$x_{n+1}^{is} = \frac{1}{\theta_n^{is}}\Big[g^i_{\upsilon^{is}(\theta_n^{is})} + g^i_{\upsilon^{is}(\theta_n^{is}-1)} + \cdots + g^i_{\upsilon^{is}(1)} + x_0^{is}\Big] + O\Big(\frac1n\Big), \qquad (5.8)$$
where $\upsilon^{is}(k) = \inf\{q \ge 1,\ \theta_q^{is} = k\}$, i.e., the stage at which player $i$ plays $s \in S^i$ for the $k$-th time. Observe that we can assume $m = 0$ in the definition of attainability. Let $\zeta_n^{s}$ be the number of times that the action profile $s \in S$ has been played up to time $n$. Hence, for every $i \in A$ and $s \in S^i$, (5.8) implies that
$$x_{n+1}^{is} = \sum_{r\in S^{-i}} G^i(s,r)\,\frac{\zeta_n^{(s,r)}}{\theta_n^{is}} + b_n, \qquad b_n = O\big((\theta_n^{is})^{-1}\big).$$
Observe that $\theta_n^{is} \to +\infty$ almost surely, by the conditional Borel–Cantelli lemma.

Fix $\varepsilon > 0$ and let $n$ be an integer such that $k_s^i = n\tilde k_s^i \in \mathbb{N}$, where, for every $i \in A$ and $s \in S^i$, $\tilde k_s^i$ denotes a rational number satisfying $|\lambda^{is} - \tilde k_s^i| < \varepsilon$. For a profile $s \in S$, define the positive integers $n_s = \prod_{i\in A} k^i_{s_i}$ and $\bar n = \sum_{s\in S} n_s$. Now take the sequence generated by (4.1) consisting of $l \in \mathbb{N}$ blocks of size $\bar n$, where within each block each $s \in S$ is played exactly $n_s$ times, regardless of the order of play. Fix $i \in A$ and $r \in S^{-i}$, so that, by construction,
$$\frac{\zeta_{l\bar n}^{(s,r)}}{\theta_{l\bar n}^{is}} = \frac{k_s^i \prod_{j\ne i} k^j_{r_j}}{k_s^i \sum_{u\in S^{-i}} \prod_{j\ne i} k^j_{u_j}} = \prod_{j\ne i} \lambda^{j r_j} + \tilde b_\varepsilon,$$
where $\tilde b_\varepsilon \to 0$ as $\varepsilon \to 0$. Finally, given $\varepsilon_0 > 0$, take $l$ large and $\varepsilon$ small to obtain $\|(x_{l\bar n+1}, \lambda_{l\bar n+1}) - (x,\lambda)\| < \varepsilon_0$.

Recall that $L(z_n)$ is the limit set of the sequence $(z_n)_n$. The following result is the goal of this subsection.

Proposition 5.10. If an attractor $A$ for the dynamics (4.3) satisfies $B(A) \cap Y \ne \emptyset$, then $\mathbb{P}\big(L(x_n,\lambda_n) \subseteq A\big) > 0$. In particular, under the Logit decision rule (5.1), if $1 \le 2\eta\alpha < 2$, then $Y$ reduces to one point $(x_*,\lambda_*)$ and $\mathbb{P}\big((x_n,\lambda_n) \to (x_*,\lambda_*)\big) > 0$.
Before providing the proof, we briefly introduce the following concepts. Let $\phi$ be the semi-flow induced by the differential equation (4.3), and let $Y(t)$ be the continuous-time affine process associated with the discrete process $(x_n,\lambda_n)_n$, i.e.,
$$Y(\tau_n + u) = (x_n,\lambda_n) + u\,\frac{(x_{n+1},\lambda_{n+1}) - (x_n,\lambda_n)}{\tau_{n+1} - \tau_n}, \qquad (5.9)$$
for all $n \in \mathbb{N}$ and $u \in [0, \frac{1}{n+1})$, where $\tau_n = \sum_{m=1}^n \frac1m$. Let $(\mathcal{F}_t)_{t\ge0}$ be the associated natural filtration. The following technical lemma is now needed. We omit the proof, since it follows strictly the lines of Benaïm [3, Proposition 4.1], along with the explicit computations provided in the proof of Schreiber [29, Theorem 2.6].

Lemma 5.11. For all $T > 0$ and $\delta > 0$,
$$\mathbb{P}\Big(\sup_{u\ge t}\ \sup_{0\le h\le T} \big\|Y(u+h) - \phi_h(Y(u))\big\| \ge \delta \ \Big|\ \mathcal{F}_t\Big) \le C(\delta,T)\exp(-ct),$$
for some positive constants $c$ and $C(\delta,T)$, when $t \ge 0$ is large enough.

Proof of Proposition 5.10. In view of Proposition 5.7 and Lemmas 5.9 and 5.11, the result follows directly from Benaïm [3, Theorem 7.3]. Note that for Lemma 5.9, and for the first part of the statement of Proposition 5.10, we have only assumed condition (A) on the decision rule $\sigma$.

The following example shows that the first part of Proposition 5.10 is interesting in its own right. Consider the 2-player zero-sum game defined by the payoff matrix
$$G = \begin{pmatrix} 0 & -1\\ 1 & 0 \end{pmatrix}. \qquad (5.10)$$
Let $(x_*,\sigma(x_*))$, with $\sigma^1(x_*^1) = \sigma^2(x_*^2) = \big(\frac{1}{1+e^\beta}, \frac{e^\beta}{1+e^\beta}\big)$ and $x_*^1 = x_*^2 = \big(-\frac{e^\beta}{1+e^\beta}, \frac{1}{1+e^\beta}\big)$, be the unique rest point of (4.3). In this case, every eigenvalue of $\nabla\Psi(x_*,\sigma(x_*))$ is equal to $-1$. Then $\mathbb{P}\big((x_n,\lambda_n) \to (x_*,\sigma(x_*))\big) > 0$ for all $\beta > 0$.

Remark 5.12. Observe that, for any zero-sum game, there exists a unique equilibrium rest point. The proof is exactly as in [18, Theorem 3.2], since if $(x,\lambda)$ is a rest point of (4.3), then $\lambda$ is the unique rest point of the perturbed best response dynamics studied there.
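The closed-form rest point for the zero-sum example (5.10) can be checked numerically. The following is a minimal sketch (our code, not from the paper): it verifies that the stated pair satisfies the rest-point conditions $x^{is} = G^i(s,\sigma^{-i})$ together with $\sigma = \mathrm{logit}_\beta(x)$.

```python
import numpy as np

# Sketch: check the rest point claimed for the zero-sum game (5.10) under the
# Logit rule sigma^i(x^i) proportional to exp(beta * x^i).  The payoff matrix G
# and the closed-form expressions for (x*, sigma(x*)) are taken from the text;
# the helper function below is our own illustrative implementation.

def logit(x, beta):
    w = np.exp(beta * (x - x.max()))   # shifted for numerical stability
    return w / w.sum()

beta = 2.0
G1 = np.array([[0.0, -1.0], [1.0, 0.0]])   # player 1's payoffs
G2 = -G1.T                                  # zero-sum: here G2 equals G1

sigma_star = np.array([1.0, np.exp(beta)]) / (1.0 + np.exp(beta))
x_star = np.array([-np.exp(beta), 1.0]) / (1.0 + np.exp(beta))

# Rest-point conditions: x^{is} = G^i(s, sigma^{-i}) and sigma = logit(x).
assert np.allclose(G1 @ sigma_star, x_star)
assert np.allclose(G2 @ sigma_star, x_star)
assert np.allclose(logit(x_star, beta), sigma_star)
```

Since the ratio $\sigma^{i2}/\sigma^{i1} = e^{\beta(x^{i2}_*-x^{i1}_*)} = e^{\beta}$, the fixed-point equation holds for every $\beta > 0$, in line with the example.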
A traffic game

The (almost sure or positive probability) convergence-to-attractor results obtained under the Logit decision rule are valid under the strong assumption $2\eta\alpha < 2$. In fact, this condition becomes very difficult to satisfy as the number of players increases. Moreover, nonconvergence can occur for some games (see Section 5.3 for details) if the parameter $\eta\alpha$ is large. In this part, we discuss the interesting application developed in Cominetti et al. [10, Section 3], and we show that a result in the spirit of Proposition 5.10 can be obtained under a much weaker condition.

Consider a network whose topology consists of a set of parallel routes. Each route $r \in \mathcal{R}$ in the network is characterized by an increasing sequence of values $c_1^r \le \cdots \le c_N^r$, where $c_u^r$ represents the average travel time when $r$ carries a load of $u$ users. The traffic game is defined as follows. The action set is common to all players, i.e., $S^i = \mathcal{R}$ for every $i \in A$, with $\mathcal{R}$ the set of available routes. The payoff to player $i$ when the action profile $r \in \mathcal{R}^N$ is played (i.e., when the network is loaded with the configuration $r$) is $G^i(r) = -c^{r_i}_u$, where $u$ is the number of users choosing route $r_i$ under $r$; that is, minus her travel time. This traffic game is a potential game in the sense that there exists a function $\Lambda : [0,1]^{N\times|\mathcal{R}|} \to \mathbb{R}$ such that
$$\frac{\partial\Lambda}{\partial\lambda^{is}}(\lambda) = G^i(s,\lambda^{-i}), \quad \text{for every } \lambda \in \Delta.$$
Explicitly, the function $\Lambda$ is given by
$$\Lambda(\pi) = -\mathbb{E}_\pi\Big[\sum_{r\in\mathcal{R}}\sum_{u=1}^{U_r} c_u^r\Big], \qquad (5.11)$$
where the expectation is taken with respect to the random variables $U_r = \sum_{i\in A} X^{ir}$, with $X^{ir}$ independent Bernoulli variables such that $\mathbb{P}(X^{ir}=1) = \pi^{ir}$. It is also shown that the second derivatives of $\Lambda$ vanish, except for
$$\frac{\partial^2\Lambda}{\partial\pi^{jr}\,\partial\pi^{ir}}(\pi) = \mathbb{E}_\pi\big[c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big] \in [-\eta, 0], \qquad i \ne j, \qquad (5.12)$$
where $U^r_{ij} = \sum_{k\ne i,j} X^{kr}$.
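The potential property $\partial\Lambda/\partial\pi^{is} = G^i(s,\pi^{-i})$ behind (5.11)–(5.12) can be checked by brute force on a small instance. The sketch below is ours (the 3-player, 2-route cost values are purely illustrative): it computes $\Lambda$ exactly by enumerating the independent Bernoulli variables $X^{ir}$, and compares a finite-difference partial derivative with the expected payoff $-\mathbb{E}\big[c^s_{1+\sum_{j\ne i} X^{js}}\big]$.

```python
import itertools
import numpy as np

# Sketch: verify (5.11)'s potential property on a tiny parallel network.
# U_r = sum_i X^{ir} with X^{ir} independent Bernoulli(pi^{ir}), as in the text;
# the concrete instance (N=3 players, 2 routes, increasing costs c) is ours.

N, R = 3, 2
c = np.array([[1.0, 2.0, 4.0],    # route 0: c^0_1 <= c^0_2 <= c^0_3
              [1.5, 3.0, 5.0]])   # route 1

def Lambda(pi):
    """Exact Lambda(pi) = -sum_r E[ sum_{u=1}^{U_r} c^r_u ]; routes decouple."""
    total = 0.0
    for r in range(R):
        for x in itertools.product([0, 1], repeat=N):   # realizations of X^{.r}
            p = np.prod([pi[i, r] if x[i] else 1 - pi[i, r] for i in range(N)])
            total += p * c[r, :sum(x)].sum()
    return -total

def G(i, s, pi):
    """G^i(s, pi^{-i}) = -E[ c^s_{1 + load of the other players} ]."""
    others = [j for j in range(N) if j != i]
    total = 0.0
    for x in itertools.product([0, 1], repeat=N - 1):
        p = np.prod([pi[j, s] if xj else 1 - pi[j, s] for j, xj in zip(others, x)])
        total += p * c[s, sum(x)]      # c^s_{1+load}, with 0-based indexing
    return -total

rng = np.random.default_rng(0)
pi = rng.dirichlet(np.ones(R), size=N)   # one mixed action per player

# Finite-difference check at (i, s) = (0, 1): Lambda is affine in pi[i, s].
h, i, s = 1e-6, 0, 1
pert = pi.copy(); pert[i, s] += h
assert abs((Lambda(pert) - Lambda(pi)) / h - G(i, s, pi)) < 1e-4
```

Because $\Lambda$ is affine in each coordinate $\pi^{is}$, the forward difference is exact up to rounding, which is why a loose tolerance suffices.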
Notice that this notion does not correspond to the standard notion of a potential game of Monderer and Shapley [24]. We suppose that the smoothing parameters are identical across players, i.e., $\beta_i = \beta$ for every $i \in A$. Note that, in this framework, $\eta$ (defined in (5.3)) translates to
$$\eta = \max\{\eta_u^r\,;\ r \in \mathcal{R},\ 2 \le u \le N\} = \max\{c_u^r - c_{u-1}^r\,;\ r \in \mathcal{R},\ 2 \le u \le N\}. \qquad (5.13)$$
Cominetti et al. [10] obtain the following result.

Proposition 5.13. If $\eta\beta < 1$, then (4.6) has a unique rest point $x_* \in X$, which is symmetric in the sense that $x_* = (\hat x, \dots, \hat x)$. Furthermore, $\{x_*\}$ is an attractor for (4.6).

Remark 5.14. The strong requirement on the smoothing parameter needed to ensure uniqueness of the equilibrium (also present for the model in [10]) can make the prediction of the model very different from the set of Nash equilibria of the stage game. For instance, this is the case in the two-player congestion game with two links represented by the matrix
$$G = \begin{pmatrix} -2 & -1\\ -1 & -3 \end{pmatrix},$$
where there are two strict equilibria (one player on each route) and one symmetric mixed Nash equilibrium $\hat\sigma = (2/3, 1/3)$. In this particular case, we can check numerically that if $\beta \le \beta_* = 0.99$, then there exists a unique equilibrium point $(\sigma_*, x_*)$ of our model. Observe that, naturally, this range is larger than the one derived from our general result, $\beta < \eta^{-1} = 0.5$. At the value $\beta_*$, we have $\sigma^1_* = \sigma^2_* = (0.5709, 0.4290)$, which is far from the Nash equilibrium $\hat\sigma$.

The previous proposition provides a much weaker condition for the existence and uniqueness of a rest point of (4.6). Observe also that, although its second part yields the existence of an attractor, no convergence result is obtained for the discrete process (4.4). The next result shows that, under the assumption $\eta\beta < 1$, an additional result can be obtained for (4.1).

Proposition 5.15.
If $\eta\beta < 1$, then (4.3) has a unique rest point $(x_*,\lambda_*) \in X\times\Delta$, which is symmetric in the sense that $x_* = (\hat x,\dots,\hat x)$ and $\lambda_* = (\hat\lambda,\dots,\hat\lambda) = \sigma(x_*)$. Furthermore, $\{(x_*,\lambda_*)\}$ is an attractor for (4.3), and $\mathbb{P}\big((x_n,\lambda_n)\to(x_*,\lambda_*)\big) > 0$.

Proof. The existence and uniqueness of the symmetric rest point of (4.3) follow from Remark 4.2 and Proposition 5.13. The rest of the proof shows that the matrix $\nabla\Psi(x_*,\lambda_*)$ is stable; hence $\{(x_*,\lambda_*)\}$ is an attractor for (4.3) and Proposition 5.10 applies.

Recall that $J_\beta = \nabla_x\Psi_x(x_*,\lambda_*)$ is the upper-left block of the matrix $\nabla\Psi(x_*,\lambda_*)$ (see (5.5)). Observe that, from the definition of $\Psi_x$, the fact that $\sigma^i$ depends only on $x^i$, and (5.11), the entries of $J_\beta$ are given by
$$J_\beta^{is,jr} = \sum_{k\in A}\sum_{r'\in\mathcal{R}} \frac{\partial^2\Lambda}{\partial\pi^{kr'}\partial\pi^{is}}(\lambda_*)\,\frac{\partial\sigma^{kr'}}{\partial x^{jr}}(x_*) - \mathbf{1}_{\{is=jr\}} = \sum_{r'\in\mathcal{R}} \frac{\partial^2\Lambda}{\partial\pi^{jr'}\partial\pi^{is}}(\lambda_*)\,\frac{\partial\sigma^{jr'}}{\partial x^{jr}}(x_*) - \mathbf{1}_{\{is=jr\}}$$
$$= \beta\,\lambda_*^{jr}(1-\lambda_*^{jr})\,\mathbb{E}_{\lambda_*}\big[c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big]\,\mathbf{1}_{\{s=r,\ i\ne j\}} - \mathbf{1}_{\{is=jr\}}. \qquad (5.14)$$
Since $\lambda_*$ is symmetric ($\lambda^{ir} = \lambda^{jr}$ for all $i,j \in A$), $J_\beta$ is a symmetric matrix. Let us show that $J_\beta$ is negative definite, by adapting the trick used in Cominetti et al. [10, Proposition 12]. Take $h \in \mathbb{R}^{N|\mathcal{R}|}\setminus\{0\}$; then, from (5.14),
$$h^T J_\beta h = \sum_{r\in\mathcal{R}}\Big[\beta\sum_{i\ne j} h^{ir}\sqrt{\lambda_*^{ir}(1-\lambda_*^{ir})}\; h^{jr}\sqrt{\lambda_*^{jr}(1-\lambda_*^{jr})}\;\mathbb{E}_{\lambda_*}\big[c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big] - \sum_i (h^{ir})^2\Big].$$
For every $i \in A$ and $r \in \mathcal{R}$, put $v^{ir} = h^{ir}\sqrt{\tfrac{1-\lambda_*^{ir}}{\lambda_*^{ir}}}$, $Z^{ir} = v^{ir} X^{ir}$, and set $\eta_0^r = \eta_1^r = 0$. Therefore,
$$h^T J_\beta h = \sum_{r\in\mathcal{R}}\Big[\beta\sum_{i\ne j} v^{ir}v^{jr}\lambda_*^{ir}\lambda_*^{jr}\,\mathbb{E}_{\lambda_*}\big[c^r_{U^r_{ij}+1} - c^r_{U^r_{ij}+2}\big] - \sum_i \frac{\lambda_*^{ir}(v^{ir})^2}{1-\lambda_*^{ir}}\Big]$$
$$= \sum_{r\in\mathcal{R}} \mathbb{E}_{\lambda_*}\Big[\beta\sum_{i\ne j} Z^{ir}Z^{jr}\big(c^r_{U_r-1} - c^r_{U_r}\big) - \sum_i \frac{(Z^{ir})^2}{1-\lambda_*^{ir}}\Big] \le \sum_{r\in\mathcal{R}} \mathbb{E}_{\lambda_*}\Big[-\eta^r_{U_r}\beta\Big(\sum_i Z^{ir}\Big)^2 + \big(\eta^r_{U_r}\beta - 1\big)\sum_i (Z^{ir})^2\Big] < 0,$$
where the last inequality follows by observing that $\eta^r_{U_r} \le \eta$ and $\eta\beta < 1$.
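For the two-route example of Remark 5.14, the symmetric rest point can be recovered numerically: at a symmetric rest point, $\lambda$ solves $\lambda = \mathrm{softmax}(\beta\,G(\cdot,\lambda))$. The sketch below (our own damped fixed-point iteration, not the paper's code) uses the congestion matrix of the remark with $\beta_* = 0.99$.

```python
import numpy as np

# Sketch: recover the symmetric rest point of Remark 5.14 by damped fixed-point
# iteration on lambda = softmax(beta * G @ lambda).  G is the two-route
# congestion game of the remark; beta = 0.99 as in the text.

G = np.array([[-2.0, -1.0],
              [-1.0, -3.0]])
beta = 0.99

def softmax(v):
    w = np.exp(v - v.max())
    return w / w.sum()

lam = np.array([0.5, 0.5])
for _ in range(2000):
    lam = 0.5 * lam + 0.5 * softmax(beta * (G @ lam))  # damping aids convergence

x_star = G @ lam
print(np.round(lam, 4))   # close to the (0.5709, 0.4290) reported in the text
```

The undamped map is already a contraction near the fixed point here, so the damping only speeds things up; the computed value agrees with the remark to about three decimal places.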
5.3 Nonconvergence

In order to give an idea of the behavior of the stochastic process defined by (4.1) when $\beta$ becomes large (we assume $\beta_i = \beta$ for all $i \in A$), we provide a small class of games which underlines the relevance of the hypotheses considered throughout this paper. Consider a 2-player symmetric game, i.e., the action set $S = S^1 = S^2$ is common to both players and the payoffs satisfy $G^1 = (G^2)^T$. Let us assume that $G^1$ has constant sum by row, that is, $\sum_r G^1(s,r) = k \in \mathbb{R}$ for every $s \in S$. It is easy to check that for this kind of game there exists a rest point of (4.3) of the form $(x,\sigma(x)) \in X\times\Delta$ with $x^i = (k/|S|,\dots,k/|S|)$ for $i \in \{1,2\}$, so that $\sigma(x)$ is uniform. We also assume that $\sum_s G^1(s,s) \ne k$. Games satisfying the preceding conditions are the good (resp. bad) Rock–Scissors–Paper game
$$\begin{pmatrix} 0 & a & -b\\ -b & 0 & a\\ a & -b & 0 \end{pmatrix},$$
where $0 < b < a$ (resp. $0 < a < b$), and the game (5.7). The (strong) hypotheses above ensure that at least one rest point of (4.3) does not depend on the parameter $\beta$. In the following, we easily show that if $\beta$ is sufficiently large, then the rest point $(x,\sigma(x))$ becomes linearly unstable. Later, we prove that this implies $\mathbb{P}\big((x_n,\lambda_n)\to(x,\sigma(x))\big) = 0$.

Lemma 5.16. If $\beta > 0$ is sufficiently large, then there exists an eigenvalue $\mu$ of $\nabla\Psi(x,\sigma(x))$ such that $\mathrm{Re}(\mu) > 0$.

Proof. Again, let $J_\beta = \nabla_x\Psi_x(x,\sigma(x))$ be the upper-left block of the Jacobian matrix of $\Psi$ evaluated at $(x,\sigma(x))$, which is the only relevant part. The precise expression for the entries of $J_\beta$ is
$$J_\beta^{is,jr} = \frac{\partial\Psi_x^{is}}{\partial x^{jr}}(x,\sigma(x)) = \begin{cases} -1 & \text{if } i=j \text{ and } s=r,\\ 0 & \text{if } i=j \text{ and } s\ne r,\\ \beta\,\dfrac{1}{|S|}\Big(G^i(s,r) - \dfrac{k}{|S|}\Big) & \text{otherwise}, \end{cases} \qquad i,j \in \{1,2\}.$$
Thus $J_\beta$ has the block form
$$J_\beta = \begin{pmatrix} -I & \bar J_\beta\\ \bar J_\beta & -I \end{pmatrix}, \qquad \bar J_\beta \in \mathbb{R}^{|S|\times|S|},$$
and we can decompose $J_\beta$ as $J_\beta = \beta\mathcal{J} - I$, where, with $\bar J = \bar J_\beta/\beta$,
$$\mathcal{J} = \begin{pmatrix} 0 & \bar J\\ \bar J & 0 \end{pmatrix}.$$
Let $\mu_1,\dots$
, $\mu_{|S|} \in \mathbb{C}$ be the eigenvalues of $\bar J$ (counting multiplicity). Since we have assumed that $\sum_s G^1(s,s) \ne k$, the trace of $\bar J$ is not zero. Therefore, there exists some eigenvalue $\mu_k$, $k \in \{1,\dots,|S|\}$, with nonzero real part. If $v$ is an eigenvector associated with $\mu_k$, then $\mu_k$ is also an eigenvalue of $\mathcal{J}$ with corresponding eigenvector $u = (v,v) \in \mathbb{R}^{|S|}\times\mathbb{R}^{|S|}$, since
$$\mathcal{J}u = \begin{pmatrix} 0 & \bar J\\ \bar J & 0 \end{pmatrix}\begin{pmatrix} v\\ v \end{pmatrix} = \begin{pmatrix} \bar J v\\ \bar J v \end{pmatrix} = \mu_k u.$$
If $\mathrm{Re}(\mu_k) > 0$, the proof is finished. If $\mathrm{Re}(\mu_l) \le 0$ for all $l \in \{1,\dots,|S|\}$, then $\sum_l \mathrm{Re}(\mu_l) < 0$. But the trace of $\mathcal{J}$ is zero, and therefore there exists an eigenvalue $\bar\mu$ of $\mathcal{J}$ (which is not an eigenvalue of $\bar J$) such that $\mathrm{Re}(\bar\mu) > 0$. Finally, observe that
$$\det(J_\beta - \mu I) = \det\big(\beta\mathcal{J} - (1+\mu)I\big) = \beta^{2|S|}\det\Big(\mathcal{J} - \frac{1+\mu}{\beta}\,I\Big), \qquad (5.15)$$
and it is straightforward from (5.15) that $\mu$ is an eigenvalue of the matrix $J_\beta$ if and only if $\bar\mu = (1+\mu)/\beta$ is an eigenvalue of $\mathcal{J}$. Then $\mu = \beta\bar\mu - 1$, whose real part is strictly positive for $\beta$ sufficiently large.

Proposition 5.17. For $\beta > 0$ large enough, there exists at least one rest point $(x,\sigma(x)) \in X\times\Delta$ of (4.3) such that $\mathbb{P}\big((x_n,\lambda_n)\to(x,\sigma(x))\big) = 0$.

Proof. We can directly apply Brandière and Duflo [7, Theorem 1]. The hypotheses of that theorem concerning the continuous dynamics and the step size of the discrete process (4.1) are immediately satisfied. The only condition left to verify is that the noise projects powerfully enough onto a repulsive direction at $(x,\sigma(x))$. Explicitly, it is sufficient to prove that
$$\liminf_{n\to+\infty}\ \mathbb{E}\big(\|V^{pr}_{n+1}\|^2 \mid \mathcal{F}_n\big) > 0 \quad \text{a.s. on the event } \Gamma_{x,\sigma(x)} = \{(x_n,\lambda_n)\to(x,\sigma(x))\}, \qquad (5.16)$$
since the noise term $V_n = (U_n, M_n)$ is almost surely bounded. Here, the superscript $pr$ stands for the projection onto the repulsive subspace, spanned by the eigenvectors associated with the eigenvalues with positive real part.
Fix $i \in \{1,2\}$, take $\beta$ large enough so that there is an eigenvalue $\mu$ of $\nabla\Psi(x,\sigma(x))$ with $\mathrm{Re}(\mu) > 0$, and let $v$ be a corresponding (possibly generalized) eigenvector. The vector $v$ has the form $v = (v^1, v^2)$. Note that, necessarily, $v^2 \ne 0$: if $v^2 = 0$, then $v^1$ is a vector of ones, which is an eigenvector of the upper-left block of $\nabla\Psi(x,\sigma(x))$ with associated eigenvalue $-1$. Hence
$$\mathbb{E}\big(\|V^{pr}_{n+1}\|^2 \mid \mathcal{F}_n\big) \ge \mathbb{E}\big(\|\langle V_{n+1}, v\rangle\, v\|^2 \mid \mathcal{F}_n\big) \ge c\,\mathbb{E}\big((M^{jr}_{n+1})^2 \mid \mathcal{F}_n\big),$$
with $j = -i$, for some $r \in S$ and $c > 0$. In view of (4.2),
$$\mathbb{E}\big((M^{jr}_{n+1})^2 \mid \mathcal{F}_n\big) = \mathbb{E}\Big(\big(\mathbf{1}_{\{s^j_{n+1}=r\}} - \sigma^{jr}(x^j_n)\big)^2 \,\Big|\, \mathcal{F}_n\Big) + O\Big(\frac1n\Big) = \sigma^{jr}(x^j_n)\big(1-\sigma^{jr}(x^j_n)\big) + O\Big(\frac1n\Big).$$
To conclude, take $\liminf_n$ in the previous expression on the event $\Gamma_{x,\sigma(x)}$ to obtain (5.16), since $\sigma^{is}$ is bounded away from zero for every $i \in \{1,2\}$ and $s \in S$.

As observed by Pemantle [25], nonconvergence results like the previous proposition are not very interesting if the set of unstable points is too large. The most useful consequences arise when this set is finite, as in our example (5.7); moreover, it is easy to check that $(x,\sigma(x))$ is the unique rest point of (4.3) for all $\beta > 0$. The previous result shows that, for large $\beta$, $(x,\sigma(x))$ has probability zero of being the limit of the process, while for small $\beta$ it is almost surely the limit. More precisely, we have $\rho(\nabla\Psi(x,\sigma(x))) > 0$ if $\beta > 3$. Note that, since in this particular case the equilibrium point is known, we can show that $(x,\sigma(x))$ is stable if $2\eta\alpha = 2\beta < 6$, i.e., if $\beta < 3$. Therefore, using Proposition 5.10, we can fully characterize the behavior of the process in this case (except for $\beta = 3$). Simulations suggest that, when $\beta$ is large, there is a cycle that attracts the trajectories and that the empirical frequencies of play still converge to $\sigma(x)$ (see Figure 2).
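To illustrate, here is a bare-bones simulation of the payoff-based updating on the game (5.7). Following the description around (5.8), each player keeps a running average payoff per action and samples her action through the Logit rule; this sketch is ours and omits the adjustment variable of (4.1) (frequencies are tracked only empirically), so it is an illustration of the small-$\beta$/large-$\beta$ contrast, not the exact scheme.

```python
import numpy as np

# Sketch of payoff-based averaging on the game (5.7): players know only their
# own realized payoff, keep per-action running averages as in (5.8), and choose
# actions via the Logit rule.  Simplified relative to (4.1); illustrative only.

G1 = np.array([[0, 1, 0], [0, 0, 1], [1, 0, 0]], dtype=float)  # player 1, (5.7)
G2 = np.array([[0, 0, 1], [1, 0, 0], [0, 1, 0]], dtype=float)  # player 2

def run(beta, T=5000, seed=1):
    rng = np.random.default_rng(seed)
    x = [np.full(3, 1/3), np.full(3, 1/3)]     # average-payoff estimates
    counts = [np.ones(3), np.ones(3)]          # theta^{is}: times s was played
    for _ in range(T):
        sig = []
        for xi in x:
            w = np.exp(beta * xi - (beta * xi).max())
            sig.append(w / w.sum())
        a = [rng.choice(3, p=sig[i]) for i in range(2)]
        g = (G1[a[0], a[1]], G2[a[0], a[1]])   # each sees only her own payoff
        for i in range(2):
            counts[i][a[i]] += 1
            x[i][a[i]] += (g[i] - x[i][a[i]]) / counts[i][a[i]]  # running mean
    return x, [c / c.sum() for c in counts]

x_small, freq_small = run(beta=1.0)   # stable regime (beta < 3) in the text
x_large, freq_large = run(beta=4.0)   # unstable regime; cf. Figure 2
```

No convergence claim is asserted here; the point is only that the two regimes can be compared empirically, as in the simulations reported around Figure 2.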
Figure 2: The mixed action $\sigma^1_n$ of Player 1 when $\beta = 4$.

Finally, note that the same analysis does not work for a general class of games (for instance zero-sum games, as shown by the game given by equation (5.10)). Nevertheless, a similar analysis can be applied to cases where the game has a unique equilibrium which is known to be unstable. See, for instance, [28, Chapter 9], where this type of study is applied to some of the most well-known dynamics.

Acknowledgements

I am deeply indebted to Sylvain Sorin for bringing this problem to my attention, and also to Michel Benaïm, Roberto Cominetti, and Mathieu Faure for very helpful discussions and comments. The development of this project was partially funded by Fondecyt grant No. 3130732, the Núcleo Milenio Información y Coordinación en Redes ICM/FIC RC130003, and the Complex Engineering Systems Institute (ICM: P-05-004-F, CONICYT: FBO16).

References

[1] C. Alós-Ferrer and N. Netzer, The logit-response dynamics, Games Econ. Behav. 68 (2010), 413–427.

[2] A. W. Beggs, On the convergence of reinforcement learning, J. Econom. Theory 122 (2005), 1–36.

[3] M. Benaïm, Dynamics of stochastic approximation algorithms, Séminaire de Probabilités XXXIII, Lecture Notes in Math., vol. 1709, Springer-Verlag, Berlin, 1999, pp. 1–68.

[4] A. Benveniste, M. Métivier, and P. Priouret, Adaptive algorithms and stochastic approximations, Springer-Verlag, Berlin, 1990.

[5] L. Blume, The statistical mechanics of strategic interaction, Games Econ. Behav. 5 (1993), 387–424.

[6] T. Börgers and R. Sarin, Learning through reinforcement and replicator dynamics, J. Econom. Theory 77 (1997), 1–14.

[7] O. Brandière and M. Duflo, Les algorithmes stochastiques contournent-ils les pièges?, Ann. Inst. H. Poincaré Probab. Statist.
32 (1996), 395–427.

[8] G. W. Brown, Iterative solution of games by fictitious play, Activity Analysis of Production and Allocation, John Wiley & Sons, New York, NY, 1951, pp. 374–376.

[9] H. F. Chen, Stochastic approximation and its applications, Kluwer Academic Publishers, Dordrecht, 2002.

[10] R. Cominetti, E. Melo, and S. Sorin, A payoff-based learning procedure and its application to traffic games, Games Econ. Behav. 70 (2010), 71–83.

[11] C. Conley, Isolated invariant sets and the Morse index, American Mathematical Society, Providence, RI, 1978.

[12] M. Duflo, Random iterative models, Applications of Mathematics (New York), vol. 34, Springer-Verlag, Berlin, 1997.

[13] I. Erev and A. E. Roth, Predicting how people play games: Reinforcement learning in experimental games with unique, mixed strategy equilibria, Amer. Econ. Rev. 88 (1998), 848–881.

[14] Y. Freund and R. E. Schapire, Adaptive game playing using multiplicative weights, Games Econ. Behav. 29 (1999), 79–103.

[15] D. Fudenberg and D. K. Levine, The theory of learning in games, MIT Press, Cambridge, MA, 1998.

[16] S. Hart and A. Mas-Colell, A simple adaptive procedure leading to correlated equilibrium, Econometrica 68 (2000), 1127–1150.

[17] S. Hart and A. Mas-Colell, A reinforcement procedure leading to correlated equilibrium, Economics Essays: A Festschrift for Werner Hildenbrand, Springer, Berlin, 2001, pp. 181–200.

[18] J. Hofbauer and E. Hopkins, Learning in perturbed asymmetric games, Games Econ. Behav. 52 (2005), 133–152.

[19] H. J. Kushner and G. Yin, Stochastic approximation and recursive algorithms and applications, Springer-Verlag, New York, 2003.

[20] J. F. Laslier, R. Topol, and B. Walliser, A behavioral learning process in games, Games Econ. Behav. 37 (2001), 340–366.

[21] D. S. Leslie and E. J. Collins, Individual Q-learning in normal form games, SIAM J. Control Optim. 44 (2005), 495–514.
[22] J. R. Marden and J. S. Shamma, Revisiting log-linear learning: Asynchrony, completeness and payoff-based implementation, Games Econ. Behav. 75 (2012), 788–808.

[23] J. W. Milnor, Topology from the differentiable viewpoint, Princeton University Press, Princeton, NJ, 1997.

[24] D. Monderer and L. S. Shapley, Potential games, Games Econ. Behav. 14 (1996), 124–143.

[25] R. Pemantle, Nonconvergence to unstable points in urn models and stochastic approximations, Ann. Probab. 18 (1990), 698–712.

[26] M. Posch, Cycling in a stochastic learning algorithm for normal form games, J. Evol. Econ. 7 (1997), 193–207.

[27] H. Robbins and S. Monro, A stochastic approximation method, Ann. Math. Statistics 22 (1951), 400–407.

[28] W. H. Sandholm, Population games and evolutionary dynamics, MIT Press, Cambridge, MA, 2010.

[29] S. J. Schreiber, Urn models, replicator processes, and random genetic drift, SIAM J. Appl. Math. 61 (2001), 2148–2167.

[30] A. N. Shiryaev, Probability, Springer-Verlag, New York, 1996.