Global Convergence for Replicator Dynamics of Repeated Snowdrift Games

Pouria Ramazi and Ming Cao

Abstract—To understand the emergence and sustainment of cooperative behavior in interacting collectives, we perform global convergence analysis for replicator dynamics of a large, well-mixed population of individuals playing a repeated snowdrift game with four typical strategies: always cooperate (ALLC), tit-for-tat (TFT), suspicious tit-for-tat (STFT) and always defect (ALLD). The dynamical model is a three-dimensional ODE system parameterized by the payoffs of the base game. Instead of routine searches for evolutionarily stable strategies and sets, we expand our analysis to determining the asymptotic behavior of solution trajectories starting from any initial state, and in particular show that for the full range of payoffs, every trajectory of the system converges to an equilibrium point. What enables us to achieve such comprehensive results is studying the dynamics of two ratios of the state variables, each of which either monotonically increases or decreases in the half-spaces separated by their corresponding planes. The convergence results highlight three findings that are of particular importance for understanding the cooperation mechanisms among self-interested agents playing repeated snowdrift games. First, the inclusion of TFT- and STFT-players, the two types of conditional strategy players in the game, increases the share of cooperators in the overall population compared to the situation when the population consists of only ALLC- and ALLD-players. This confirms findings in biology and sociology that reciprocity may promote cooperation in social collective actions, such as reducing traffic jams and division of labor, where each individual may gain more by playing the opposite of what her opponent chooses.
Second, surprisingly enough, regardless of the payoffs, there always exists a set of initial conditions under which ALLC-players do not vanish in the long run, which does not hold for any of the other three types of players. So an ALLC-player, although perceived as the one that can be easily taken advantage of in snowdrift games, has certain endurance in the long run. Third, the parametric framework makes it possible to actually control the final population shares, a challenging topic in population dynamics, by tuning the payoffs of the base game.

I. INTRODUCTION

Game theory provides a framework for studying various control problems such as robust control, distributed control, and optimization for traffic systems, communication networks and multi-agent systems in general; in this context, the different types of games that have been modeled and analyzed in the literature include potential games [1]–[5], stochastic games [6]–[8], constrained games [9], repeated games [10], [11], matrix games [12], networked games [13], and others [14]–[20]. More recently, evolutionary game theory has gained more attention since it is a powerful tool for understanding the evolution of cooperation among selfish individuals as reported by biologists, sociologists, economists, etc. [21]–[26]. Researchers have found that network topology [27], phenotypic interactions [28], [29], punishment [30], population heterogeneity [31], as well as other components in game setups can all affect the success of cooperators in the face of defectors.

(The work was supported in part by the European Research Council (ERC-StG-307207) and the Netherlands Organization for Scientific Research (NWO-vidi-14134). P. Ramazi is with the Statistical and Mathematical Sciences Department, University of Alberta, Canada, and M. Cao is with ENTEG, Faculty of Science and Engineering, University of Groningen, The Netherlands; p.ramazi@gmail.com, m.cao@rug.nl.)
One stimulating mechanism for the evolution of cooperation that is generally believed to promote cooperation, especially in human societies [32], is direct reciprocity [33]. This mechanism is captured by repeated games, where individuals play a base game repeatedly and can base their action in each round on that of the opponent in the previous round, resulting in reactive strategies. Perhaps the most typical reactive strategy is the simple yet successful tit-for-tat (TFT) strategy, where the player starts with cooperation, and then cooperates if the opponent cooperated and defects if the opponent defected in the last round. A more defective version of the strategy is suspicious tit-for-tat (STFT), which is the same as TFT except that the player starts with defection. In addition to these conditional strategies, there are two unconditional ones, which are the two extreme strategies in repeated 2-strategy games: always-cooperate (ALLC) and always-defect (ALLD). While much research has been carried out to investigate the performance of different reactive strategies under the prisoner's dilemma game, the cornerstone of game theory [34]–[39], less has been devoted to the anti-coordination snowdrift game [40]–[42], despite the fact that the snowdrift game captures many behavioral patterns that cannot be well modeled by the prisoner's dilemma game [43]. Moreover, the existing results on the snowdrift game are mainly experimental or simulation based. For example, in [41], based on human experiments, the authors postulate that iterated snowdrift games can explain high levels of cooperation among non-relative humans. However, few mathematical statements have been constructed to support such claims [44]–[47]. The performance of different reactive strategies also remains an open problem.
Usually the strategies are compared using 2-strategy games, e.g., the two famous competitions conducted by Axelrod [48], [49], where, strikingly, the simple TFT was placed first in both (note that although TFT is known to be successful mostly in the repeated prisoner's dilemma, it has also been reported to be successful in the repeated snowdrift game [41], [50]). The situation would be different if more than two strategies could be played in the game. Then the best strategy can be decided by natural selection, which is captured by evolutionary dynamics such as the well-known replicator dynamics [51]–[54]. Due to nonlinearity, however, the replicator dynamics may exhibit quite complex behaviors, as the dynamically equivalent Lotka-Volterra equations do [55]. Indeed, except for a few cases [56], [57], the analysis is restricted to only those modeled by planar dynamical systems [58]. This makes the performance investigation of more than three reactive strategies generally challenging under the replicator dynamics. However, the assumption of having just a small number of available strategies may not be realistic or representative for many natural phenomena, particularly those involving a wide range of mutations. A research line has consequently been established to study evolutionary outcomes of repeated games with a large or possibly infinite number of reactive strategies by limiting the analysis to finding evolutionarily stable strategies and sets, which are known to be asymptotically stable under many evolutionary dynamics such as the replicator dynamics [24]. For example, the repeated prisoner's dilemma is shown to have no pure strategies that are evolutionarily stable or that can form an evolutionarily stable set [59], [60].
Although revealing the (non)existence of stable sets under the evolutionary dynamics, these works neglect other possible long-run behaviors, such as a saddle point as the simplest example. Thus, a considerable portion of equilibrium states that can be favored by natural selection remains concealed. Moreover, having many available reactive strategies is not always a reasonable assumption, especially when complex strategies are costly or uncommon [61]. So there is a need for exhaustive asymptotic analysis of evolutionary dynamics with typical and simple reactive strategies. The convergence of large populations playing evolutionary games is of general interest and has applications in control theory [46], [62]–[64]. We address both of the above issues in this paper. Taking the snowdrift game as the base game, we study the evolution of a large population of individuals playing the four strategies just mentioned, ALLC, TFT, STFT and ALLD, under the replicator dynamics. We consider a completely parameterized payoff matrix with an arbitrary number of repetitions of the base game and reveal all asymptotic outcomes of the resulting 3-dimensional dynamics. What enables us to expand our analysis beyond the routine search for evolutionarily stable sets is studying the dynamics of two ratios of the state variables. By dividing the simplex into four sections, in each of which each ratio either monotonically increases or decreases, we show that every trajectory of the system converges to an equilibrium point, excluding the possibility of limit cycles or chaotic behaviors. This approach can be applied to general replicator dynamics with more than three strategies where one or more ratios of the state variables monotonically increase or decrease in some part of the simplex.
Our analyses shed light on the social dilemma in the snowdrift game, that is, why selfish individuals cooperate while they would earn more by defecting against their cooperative opponents. This is done by showing, first of all, that even in the presence of the very defective strategy ALLD, for some range of payoffs and initial population portions, the population evolves to the state where all mutually cooperate. In other words, natural selection disfavors individuals playing ALLD and instead chooses those playing more cooperative strategies such as TFT and even ALLC. Secondly, the convergence results postulate that among the four types of players, ALLCs are surprisingly the best in terms of survival and appearance in the long run, explaining why selfish individuals may repeatedly cooperate in a snowdrift social context. As a second contribution, due to the parametric framework we provide, our convergence analysis can be used to actually control the final state of the replicator dynamics. By tuning the parameters, one can control the final population portions of individuals playing the reactive strategies. This is possible when a central agency has control over the payoff matrix, e.g., tax regulations made by the government. Moreover, for populations initially having four co-existing types of players, by comparing those final states in which one or two types of players die out to those with all four, it becomes clear how adding a third or fourth strategy can change the final population state. These results lead to addressing the crucial question of how to control the portions of different types of individuals in a decision-making population, which finds fascinating applications in repeated snowdrift games, ranging from trading commodities to division of labor. The rest of the paper is organized as follows. In Section II, we describe the replicator dynamics for repeated snowdrift games with the above four reactive strategies.
In Section III, we provide the global convergence results and discuss their implications for the success of the strategies. We end with concluding remarks in Section IV.

II. PROBLEM FORMULATION

We consider an infinitely large, well-mixed population of individuals that play repeated games over time. Each game has two players with two pure strategies: one is to cooperate, denoted by C, and the other to defect, denoted by D. The payoffs of the game, described by the following payoff matrix, are symmetric for both players:

\[
\begin{array}{c|cc}
 & C & D \\ \hline
C & R & S \\
D & T & P
\end{array} \tag{1}
\]

where R, S, T and P are real numbers, sometimes called in the literature the reward, sucker's payoff, temptation and punishment, respectively. We call this two-player, symmetric game the base game and denote it by G. When the payoffs of the game satisfy

\[
T > R > S > P, \tag{2}
\]

the game is called a snowdrift game (also known as the hawk-dove or chicken game). The game has two Nash equilibria in pure strategies, both of which correspond to the situation when the two players play opposite strategies; for this reason such a game is also called an anti-coordination game, often used to study how players may contribute to the accomplishment of a common task. In this study, we are particularly interested in the case in which individuals play the game repeatedly over time and adjust their strategies according to what their opponents have played in the past. Formally, a repeated game, denoted by G_m, m ≥ 2, with reactive strategies is constructed from the base game G by repeating it for m rounds and limiting a player's choice of strategy in the current round to be based on the opponent's choice in the previous round.
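The anti-coordination structure of the base game can be checked directly. The following sketch uses illustrative payoff values (an assumption: R = 3, S = 1, T = 5, P = 0, chosen only to satisfy T > R > S > P) and confirms that the only pure Nash equilibria are the two opposite-strategy profiles.

```python
# A minimal check that the base snowdrift game has exactly the two
# anti-coordination pure Nash equilibria (C, D) and (D, C).
# The payoffs R=3, S=1, T=5, P=0 are illustrative assumptions with T > R > S > P.
R, S, T, P = 3, 1, 5, 0
payoff = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}

def is_nash(a, b):
    """(a, b) is a pure Nash equilibrium iff neither player gains by deviating."""
    no_dev_row = all(payoff[(a, b)] >= payoff[(a2, b)] for a2 in 'CD')
    no_dev_col = all(payoff[(b, a)] >= payoff[(b2, a)] for b2 in 'CD')  # symmetric game
    return no_dev_row and no_dev_col

equilibria = [(a, b) for a in 'CD' for b in 'CD' if is_nash(a, b)]
print(equilibria)  # the two anti-coordination profiles
```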
In fact, a reactive strategy s can always be represented by the triple (p, q, r), where p is the probability of cooperating in the first round, and q (respectively r) is the probability of cooperating if the opponent has cooperated (respectively defected) in the previous round. We consider the following strategies:
• always-cooperate (ALLC), (1, 1, 1): always cooperates;
• tit-for-tat (TFT), (1, 1, 0): cooperates in the first round, and then chooses what the opponent did in the previous round;
• suspicious tit-for-tat (STFT), (0, 1, 0): defects in the first round, and then chooses what the opponent did in the previous round;
• always-defect (ALLD), (0, 0, 0): always defects.

When two players play the repeated game G_m, the payoffs for the reactive strategies can be calculated every m rounds, leading to the payoff matrix A := [a_{ij}] defined by

\[
A = \begin{bmatrix}
mR & mR & S+(m-1)R & mS \\
mR & mR & \lceil\tfrac{m}{2}\rceil S + \lfloor\tfrac{m}{2}\rfloor T & S+(m-1)P \\
T+(m-1)R & \lceil\tfrac{m}{2}\rceil T + \lfloor\tfrac{m}{2}\rfloor S & mP & mP \\
mT & T+(m-1)P & mP & mP
\end{bmatrix},
\]

where the rows and columns are ordered as ALLC, TFT, STFT, ALLD. To illustrate how the matrix A is obtained, we take the match between TFT and ALLD as an example. In round one, the TFT-player cooperates and the ALLD-player defects, so their payoffs according to (1) are S and T, respectively. From round two on, both the TFT- and ALLD-players defect and hence receive P. So over time the payoffs for the TFT-player are S, P, P, ..., while those for the ALLD-player are T, P, P, .... Summing up the payoffs over the m rounds, one obtains the entries a_{24} and a_{42} of A. Hence, the repeated game G_m can be taken as a normal, symmetric two-player game with the payoff matrix A and the pure-strategy set {ALLC, TFT, STFT, ALLD}. Restricting the base game to be played m rounds with the same opponent is an assumption that holds in many natural systems and real-life scenarios.
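The entries of A can be reproduced by direct simulation: play every pair of reactive strategies for m rounds and accumulate the base-game payoffs. The sketch below does this for the illustrative payoffs R = 3, S = 1, T = 5, P = 0 and m = 3 (an assumption satisfying T > R > S > P).

```python
# Reproducing the repeated-game payoff matrix A by simulating the four
# reactive strategies; R=3, S=1, T=5, P=0 and m=3 are illustrative assumptions.
R, S, T, P = 3, 1, 5, 0
m = 3
base = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}
# Each deterministic strategy: (first move, reply to opponent's C, reply to D).
strategies = {'ALLC': ('C', 'C', 'C'), 'TFT':  ('C', 'C', 'D'),
              'STFT': ('D', 'C', 'D'), 'ALLD': ('D', 'D', 'D')}

def repeated_payoff(s1, s2):
    """Accumulated payoff of s1 against s2 over m rounds of the base game."""
    a, b = s1[0], s2[0]
    total = 0
    for _ in range(m):
        total += base[(a, b)]
        # each player reacts to the opponent's previous move
        a, b = (s1[1] if b == 'C' else s1[2]), (s2[1] if a == 'C' else s2[2])
    return total

names = ['ALLC', 'TFT', 'STFT', 'ALLD']
A = [[repeated_payoff(strategies[i], strategies[j]) for j in names] for i in names]
print(A[1][3], A[2][1])  # TFT vs ALLD: S+(m-1)P = 1; STFT vs TFT: 2T+S = 11
```

With m = 3, the simulated entries match the closed-form matrix, e.g. the TFT-versus-ALLD entry equals S + (m−1)P as derived in the text.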
Birds in the same flock migrating to winter quarters interact with each other during periods of their migration; students in the same project group collaborate with each other during the semester; and tenants in the same apartment meet each other during their rental period. Such interactions take place repeatedly with the same individuals for a certain amount of time. Having clarified how a pair of individuals play games with each other, we now describe the evolutionary dynamics of the whole population. Towards this end, we introduce the replicator dynamics, a standard model from evolutionary game theory [23], [24]. Let 0 ≤ x_i(t) ≤ 1, i = 1, 2, 3, 4, denote the population shares at time t of those individuals playing the pure strategies ALLC, TFT, STFT and ALLD, respectively. Since the four types of players constitute the whole population, it follows that for all t, \(\sum_{i=1}^{4} x_i = 1\). Define the population vector x := (x_1, x_2, x_3, x_4)^⊤. Then x ∈ Δ, where Δ is the simplex in R^4 defined by

\[
\Delta := \Big\{ z \in \mathbb{R}^4 \;\Big|\; z_i \ge 0,\ i = 1,\dots,4,\ \sum_{i=1}^{4} z_i = 1 \Big\}. \tag{3}
\]

We use the unit vectors at the vertices of the simplex, p_1 = (1,0,0,0)^⊤, p_2 = (0,1,0,0)^⊤, p_3 = (0,0,1,0)^⊤ and p_4 = (0,0,0,1)^⊤, to represent the population vectors corresponding to all ALLC-players, all TFT-players, all STFT-players and all ALLD-players, respectively. Then the evolutions of x_i, i = 1,...,4, are described by the replicator dynamics [23], [24]

\[
\dot{x}_i = [\,u(p_i, x) - u(x, x)\,]\,x_i, \tag{4}
\]

where u(·,·) is the utility function defined by u(x, y) = x^⊤ A y for x, y ∈ Δ, determining the fitness of a player.
In essence, (4) indicates that in an evolutionary process, the reproduction rate of the strategy-i players is proportional to the difference between the fitness u(p_i, x) of strategy-i players and the average population fitness u(x, x). This is a consequence of the fact that the more payoff an individual acquires when playing against its opponents, compared to the average payoff of the whole population, the more new offspring it proportionally produces. Since u(p_i, x) is the expected payoff of an i-playing individual against a randomly chosen other individual in the population, the dynamics can be interpreted as follows [24]. Over a continuous course of time, an individual in the population (say a TFT-player) randomly meets another (say an ALLD-player), plays the base game with her opponent for m rounds, earns an accumulated payoff according to the payoff matrix A (here, S + (m−1)P), and reproduces offspring playing her same strategy (here, TFT-players) at a rate equal to her payoff. Indeed, there are two time scales of fast and slow dynamics: the time it takes for two players to play the repeated game is much shorter than, and in fact negligible compared to, the reproduction time. The dynamics can also be seen as the mean-dynamic approximation of the following process taking place over a discrete sequence of time [23]. At each time step, i) every individual plays the base game for m rounds with every other individual in the population and earns the average of the resulting payoffs, and ii) a random individual updates her reactive strategy according to the pairwise proportional imitation update rule; that is, she randomly chooses another individual, say j, and if her payoff is less than that of individual j, imitates his strategy with a probability proportional to the payoff difference, and otherwise sticks to her own current strategy.
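A trajectory of (4) can be approximated numerically. The sketch below integrates the replicator dynamics by forward Euler, again with the illustrative payoffs R = 3, S = 1, T = 5, P = 0 and m = 3 (assumptions, chosen only to satisfy the snowdrift condition).

```python
import numpy as np

# Forward-Euler sketch of the replicator dynamics (4); the payoffs
# R=3, S=1, T=5, P=0 and m=3 are illustrative assumptions with T > R > S > P.
R, S, T, P = 3.0, 1.0, 5.0, 0.0
m = 3
# Repeated-game payoff matrix A (rows/columns: ALLC, TFT, STFT, ALLD);
# ceil(3/2) = 2 and floor(3/2) = 1 are hard-coded for m = 3.
A = np.array([[m*R, m*R, S + (m-1)*R, m*S],
              [m*R, m*R, 2*S + T, S + (m-1)*P],
              [T + (m-1)*R, 2*T + S, m*P, m*P],
              [m*T, T + (m-1)*P, m*P, m*P]])

def step(x, dt=1e-3):
    f = A @ x                         # u(p_i, x) for each pure strategy i
    return x + dt * x * (f - x @ f)   # x @ f is the average fitness u(x, x)

x = np.full(4, 0.25)                  # start at the barycenter of the simplex
for _ in range(200_000):
    x = step(x)
print(np.round(x, 3), round(x.sum(), 6))  # shares evolve but still sum to 1
```

The update preserves the simplex up to floating-point error, consistent with the invariance of Δ discussed next.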
Since u(·,·) is continuously differentiable on R^4 × R^4, (4) has a unique solution for any x(0) ∈ Δ [65, Theorem 7.1.1]. It is easy to check that the solution indeed satisfies the constraints 0 ≤ x_i(t) ≤ 1, i = 1,...,4, for all t. Moreover, it can be verified that for any t, if x(t) ∈ Δ, then \(\sum_{i=1}^{4} \dot{x}_i(t) = 0\). Hence, \(\sum_{i=1}^{4} x_i = 1\) holds for all t given x(0) ∈ Δ. Therefore, Δ is invariant under (4), and hence the dynamical system (4) is well defined on Δ. We perform global convergence analysis of the replicator dynamics (4). More specifically, for any given initial condition x(0) ∈ Δ, we aim to determine the limit state of x(t) under (4).

III. GLOBAL CONVERGENCE RESULT

The main results of this paper are presented in this section. First we find the equilibrium points of the system. Then, for the convergence results, we divide the analysis into several parts using the notion of a face, defined as follows. A face of the simplex is the convex hull of a non-empty subset H of {p_1, p_2, p_3, p_4}, and is denoted by Δ(H). For simplicity, we remove the braces when H is represented by its members. For example, the face Δ(p_1, p_3, p_4) is the convex hull of H = {p_1, p_3, p_4}. When H is proper, Δ(H) is called a boundary face. Following convention, the boundary of a set S, denoted by bd(S), is the set of points p such that every neighborhood of p includes at least one point in S and one point not in S, and the interior of S, denoted by int(S), is the greatest open subset of S. The following result enables us to analyze the evolution of a trajectory starting from bd(Δ) separately from that starting from int(Δ).

Lemma 1: Each face of Δ is invariant under the replicator dynamics (4).

Proof: Δ was already shown to be invariant in Section II. So it remains to prove the lemma for the boundary faces.
This can be done based on the observation that if x_i(0) = 0 for some i = 1,...,4, then x_i(t) = 0 for all t ∈ R. ∎

We start by analyzing the boundary of the simplex, which consists of the four planar faces Δ(p_1,p_2,p_3), Δ(p_1,p_2,p_4), Δ(p_1,p_3,p_4) and Δ(p_2,p_3,p_4). Because of Lemma 1, we can analyze the dynamics (4) on each of these faces separately. In turn, the boundary of each planar face consists of three one-dimensional faces, known as the edges of the simplex. For example, the boundary of the face Δ(p_1,p_2,p_3) consists of the edges Δ(p_1,p_2), Δ(p_1,p_3) and Δ(p_2,p_3). Each edge is also invariant in view of Lemma 1. Therefore, we study separately trajectories starting from an edge and those starting from the interior of a planar face. Then we proceed to the interior of the simplex. To simplify the analysis, we carry out on the matrix A some operations that preserve the dynamics (4). Subtracting mR from the entries of the first and second columns, and mP from the entries of the third and fourth columns of A, we acquire the matrix

\[
A' := [a'_{ij}] = \begin{bmatrix}
0 & 0 & S+(m-1)R-mP & m(S-P) \\
0 & 0 & \lceil\tfrac{m}{2}\rceil S + \lfloor\tfrac{m}{2}\rfloor T - mP & S-P \\
T-R & \lceil\tfrac{m}{2}\rceil T + \lfloor\tfrac{m}{2}\rfloor S - mR & 0 & 0 \\
m(T-R) & T+(m-1)P-mR & 0 & 0
\end{bmatrix}. \tag{5}
\]

In view of Lemma 7 in Appendix A, the dynamics (4) are unchanged with A' in place of A. Since A' is more structured, with zero blocks on the diagonal, in what follows we focus on A' instead of A.

A. Equilibrium points

To determine the equilibria of the system, we first look for those on the boundary of the simplex, and then for those in the interior.

1) Boundary equilibrium points: Let Δ^o and Δ^oo denote the sets of equilibrium points of the replicator dynamics (4) that belong to Δ and bd(Δ), respectively.
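That such column operations leave the dynamics unchanged can be sanity-checked numerically: a constant added to a whole column of the payoff matrix cancels in u(p_i, x) − u(x, x). The sketch below uses an arbitrary random payoff matrix and arbitrary shift constants (all values are illustrative).

```python
import numpy as np

# Column shifts cancel in u(p_i, x) - u(x, x): a sketch with an arbitrary
# random 4x4 payoff matrix and arbitrary constants (all values illustrative).
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4))
A2 = A.copy()
A2[:, :2] -= 1.7   # mimic subtracting one constant from the first two columns
A2[:, 2:] -= 0.4   # ...and another constant from the last two columns

def field(M, x):
    """Replicator vector field x_i (u(p_i, x) - u(x, x)) for payoff matrix M."""
    f = M @ x
    return x * (f - x @ f)

x = rng.dirichlet(np.ones(4))   # a random point in the simplex
print(np.allclose(field(A, x), field(A2, x)))  # True: the field is unchanged
```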
Depending on the payoffs, Δ^oo is a combination of the unit vectors p_1, p_2, p_3, p_4, the vectors

\[
x^{14} = \Big( \tfrac{S-P}{S-P+T-R},\ 0,\ 0,\ \tfrac{T-R}{S-P+T-R} \Big)^{\!\top}, \qquad
x^{23} = \Big( 0,\ \tfrac{\lceil\frac{m}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T - mP}{m(T+S-P-R)},\ \tfrac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S - mR}{m(T+S-P-R)},\ 0 \Big)^{\!\top},
\]
\[
x^{13} = \Big( \tfrac{S+(m-1)R-mP}{T+S+(m-2)R-mP},\ 0,\ \tfrac{T-R}{T+S+(m-2)R-mP},\ 0 \Big)^{\!\top}, \qquad
x^{24} = \Big( 0,\ \tfrac{S-P}{T+S+(m-2)P-mR},\ 0,\ \tfrac{T+(m-1)P-mR}{T+S+(m-2)P-mR} \Big)^{\!\top},
\]

and the sets

\[
X^{12} = \{\alpha p_1 + (1-\alpha)p_2 : \alpha \in [0,1]\}, \qquad
X^{34} = \{\alpha p_3 + (1-\alpha)p_4 : \alpha \in [0,1]\},
\]
\[
X^{123} = \{ x \in \mathrm{int}(\Delta(p_1,p_2,p_3)) \mid a'_{31}x_1 + a'_{32}x_2 - a'_{13}x_3 = 0 \},
\]

where the a'_{ij} are the entries of A' defined in (5). Here, the superscript ij in x^{ij} (resp. X^{ij}) simply means that x^{ij} (resp. X^{ij}) belongs to the edge Δ(p_i, p_j). The following proposition determines Δ^oo.

Proposition 1: Assume (2) holds. It follows that
1) if \(S < R < \frac{T+(m-1)P}{m}\), then Δ^oo = X^{12} ∪ {x^{13}, x^{14}, x^{23}, x^{24}} ∪ X^{34};
2) if \(\frac{T+(m-1)P}{m} \le R < \frac{T+S}{2}\), or if m = 2n+1, n ≥ 1, and \(\frac{T+S}{2} < R < \frac{(n+1)T+nS}{2n+1}\), then Δ^oo = X^{12} ∪ {x^{13}, x^{14}, x^{23}} ∪ X^{34};
3) if m = 2n+1, n ≥ 1, and \(R = \frac{T+S}{2}\), then Δ^oo = X^{12} ∪ {x^{13}, x^{14}, x^{23}} ∪ X^{34} ∪ X^{123};
4) if m = 2n, n ≥ 1, and \(R = \frac{nT+(n-1)S}{2n-1}\), then Δ^oo = X^{12} ∪ {x^{13}, x^{14}} ∪ X^{34} ∪ X^{123};
5) if \(\max\big\{ \frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1}, \frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m} \big\} < R < T\), or if m = 2n, n ≥ 1, and \(\frac{T+S}{2} \le R < \frac{nT+(n-1)S}{2n-1}\), then Δ^oo = X^{12} ∪ {x^{13}, x^{14}} ∪ X^{34}.

For the proof, we take a closer look at the payoff matrix A'. The ordering of the magnitudes of the entries in each column of A', clarified in the following lemma, proves useful both for determining the equilibria and for the asymptotic behavior of the replicator dynamics (4).

Lemma 2: Assume (2) holds.
Consider the payoff matrix A' and denote the maximum positive, positive, negative and minimum negative entries of each column by '++', '+', '−' and '−−', respectively. Then A' has the following sign structure:

1) \(\begin{bmatrix} 0 & 0 & + & ++ \\ 0 & 0 & ++ & + \\ + & ++ & 0 & 0 \\ ++ & + & 0 & 0 \end{bmatrix}\) when \(S < R < \frac{T+(m-1)P}{m}\);

2) \(\begin{bmatrix} 0 & 0 & + & ++ \\ 0 & 0 & ++ & + \\ + & ++ & 0 & 0 \\ ++ & 0,- & 0 & 0 \end{bmatrix}\) when \(\frac{T+(m-1)P}{m} \le R < \frac{T+S}{2}\);

3) \(\begin{bmatrix} 0 & 0 & ++ & ++ \\ 0 & 0 & ++ & + \\ + & ++ & 0 & 0 \\ ++ & - & 0 & 0 \end{bmatrix}\) when m = 2n+1, n ≥ 1, and \(R = \frac{T+S}{2}\);

4) \(\begin{bmatrix} 0 & 0 & ++ & ++ \\ 0 & 0 & + & + \\ + & ++,0 & 0 & 0 \\ ++ & - & 0 & 0 \end{bmatrix}\) when m = 2n+1, n ≥ 1, and \(\frac{T+S}{2} < R \le \frac{(n+1)T+nS}{2n+1}\);

5) \(\begin{bmatrix} 0 & 0 & + & ++ \\ 0 & 0 & ++ & + \\ + & 0,- & 0 & 0 \\ ++ & -- & 0 & 0 \end{bmatrix}\) when m = 2n, n ≥ 1, and \(\frac{T+S}{2} \le R < \frac{nT+(n-1)S}{2n-1}\);

6) \(\begin{bmatrix} 0 & 0 & ++ & ++ \\ 0 & 0 & ++ & + \\ + & - & 0 & 0 \\ ++ & -- & 0 & 0 \end{bmatrix}\) when m = 2n, n ≥ 1, and \(R = \frac{nT+(n-1)S}{2n-1}\);

7) \(\begin{bmatrix} 0 & 0 & ++ & ++ \\ 0 & 0 & + & + \\ + & - & 0 & 0 \\ ++ & -- & 0 & 0 \end{bmatrix}\) when \(\max\big\{ \frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1}, \frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m} \big\} < R < T\).

Here, when an entry takes both 0 and one other sign (separated by a comma), 0 applies when the equality sign of the corresponding condition on R holds, and otherwise the other sign is valid.

Proof: The signs of the entries of A' are determined by (2). First note that T > R implies a'_{31} > 0. On the other hand, since m ≥ 2, we have a'_{41} > a'_{31} > 0. Hence, since the first and second entries of the first column of A' are zero, a'_{41} and a'_{31} are denoted by '++' and '+', respectively. Similarly, S > P implies a'_{14} > a'_{24} > 0, and hence a'_{14} and a'_{24} are denoted by '++' and '+', respectively. Since T, S > P implies \(\lceil\frac{m}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T > \lceil\frac{m}{2}\rceil P + \lfloor\frac{m}{2}\rfloor P = mP\), it follows that a'_{23} > 0. Additionally, R > P and S > P imply (m−1)R + S > mP, which implies a'_{13} > 0. Similarly, T, S > P yields

\[
T + \lceil\tfrac{m-2}{2}\rceil T + \lfloor\tfrac{m-2}{2}\rfloor S + S > T + \lceil\tfrac{m-2}{2}\rceil P + \lfloor\tfrac{m-2}{2}\rfloor P + P
\;\Rightarrow\;
\lceil\tfrac{m}{2}\rceil T + \lfloor\tfrac{m}{2}\rfloor S > T + (m-2)P + P = T + (m-1)P.
\]

Hence, a'_{32} > a'_{42}.
It remains to determine the signs of a'_{42} and a'_{32}, and the ordering of a'_{13} and a'_{23}. Since m ≥ 2, division by m−1 is valid, and hence the following hold:

\[
a'_{42} > 0 \iff R < \frac{T+(m-1)P}{m}, \tag{6}
\]
\[
a'_{32} > 0 \iff R < \frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m}, \tag{7}
\]
\[
a'_{23} > a'_{13} \iff R < \frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1}. \tag{8}
\]

The average of the m numbers T, P, ..., P (with m−1 copies of P) is less than both the average of the m numbers T, ..., T (\(\lceil\frac{m}{2}\rceil\) copies), S, ..., S (\(\lfloor\frac{m}{2}\rfloor\) copies) and the average of the m−1 numbers T, ..., T (\(\lfloor\frac{m}{2}\rfloor\) copies), S, ..., S (\(\lceil\frac{m-2}{2}\rceil\) copies). Thus,

\[
\frac{T+(m-1)P}{m} < \frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m},\ \frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1}.
\]

Hence, when (6) holds, so do (7) and (8). This proves the first case of the lemma. Now we compare \(\frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1}\) and \(\frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m}\). In general, it holds that

\[
\frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1} =
\begin{cases} \frac{(n-1)S+nT}{2n-1} & m = 2n \\[2pt] \frac{nS+nT}{2n} & m = 2n+1 \end{cases}, \qquad
\frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m} =
\begin{cases} \frac{nT+nS}{2n} & m = 2n \\[2pt] \frac{(n+1)T+nS}{2n+1} & m = 2n+1 \end{cases}.
\]

Since T > S, we obtain

\[
\frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m} < \frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1} \quad (m = 2n), \qquad
\frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1} < \frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m} \quad (m = 2n+1). \tag{9}
\]

Hence,

\[
\min\Big\{ \frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m},\ \frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1} \Big\} = \frac{T+S}{2}. \tag{10}
\]

The above equation results in cases 2) and 3) of the lemma. The remaining cases can be verified similarly using (9) and (10). ∎

The boundary of Δ is the union of the boundary faces Δ(p_1,p_2,p_3), Δ(p_1,p_2,p_4), Δ(p_1,p_3,p_4) and Δ(p_2,p_3,p_4). So in order to find the equilibria on bd(Δ), we can investigate each face separately. The interior equilibria of each face are determined in the following proposition, whose proof follows from the convergence results and methods in [66].

Proposition 2: Assume (2) holds. The interiors of the faces Δ(p_1,p_2,p_4), Δ(p_1,p_3,p_4) and Δ(p_2,p_3,p_4) do not contain an equilibrium point of the dynamics (4).
If m = 2n+1, n ≥ 1, and \(R = \frac{T+S}{2}\), or m = 2n, n ≥ 1, and \(R = \frac{nT+(n-1)S}{2n-1}\), then the interior of the face Δ(p_1,p_2,p_3) contains the continuum of equilibrium points X^{123}, and contains no other equilibrium. For all other values of m and the payoffs, the interior of Δ(p_1,p_2,p_3) does not contain an equilibrium point.

Now we prove Proposition 1.

Proof of Proposition 1: In view of Proposition 2, there is no equilibrium point in the interior of any of Δ(p_1,p_2,p_3), Δ(p_1,p_2,p_4), Δ(p_1,p_3,p_4) and Δ(p_2,p_3,p_4), except in Cases 3) and 4), where X^{123} appears. Hence, all remaining boundary equilibrium points are located on the six edges of the simplex. The edges Δ(p_1,p_2) = X^{12} and Δ(p_3,p_4) = X^{34} are always continua of equilibrium points. The vertices p_1, p_2, p_3, p_4 are also always equilibrium points, but they are included in X^{12} and X^{34}. Hence, the remaining equilibrium points can be determined by investigating the dynamics in the interiors of the remaining four edges, leading to the conclusion (see [55]). ∎

The local stability of the equilibrium points generally depends on the payoffs in A, and can be determined based on the convergence results in this section. However, the following result guarantees the asymptotic stability of x^{14} for all payoffs satisfying (2).

Proposition 3: Assume (2) holds. Then x^{14} is asymptotically stable.

Proof: The proof follows from Proposition 7 and Lemma 8 in Appendix B. ∎

2) Interior equilibrium point: The dynamics (4) may or may not possess an interior equilibrium, depending on the payoff matrix A.
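As a numerical illustration, the sketch below checks that the boundary state x^14 of Proposition 3 and the closed-form interior equilibrium given in the next proposition are both rest points of (4) written with A'. The payoffs R = 2.5, S = 1, T = 5, P = 0 and m = 3 are assumptions chosen to fall in the first case of that proposition (S < R < (T+S)/2).

```python
import numpy as np

# Illustrative payoffs (an assumption): R=2.5, S=1, T=5, P=0, m=3 satisfy
# S < R < (T+S)/2, so an interior equilibrium should exist.
R, S, T, P, m = 2.5, 1.0, 5.0, 0.0, 3
# A' of (5); ceil(3/2) = 2 and floor(3/2) = 1 are hard-coded for m = 3.
Ap = np.array([[0, 0, S + (m-1)*R - m*P, m*(S-P)],
               [0, 0, 2*S + T - m*P, S - P],
               [T - R, 2*T + S - m*R, 0, 0],
               [m*(T-R), T + (m-1)*P - m*R, 0, 0]])

def field(x):
    """Replicator vector field (4) evaluated with A'."""
    f = Ap @ x
    return x * (f - x @ f)

# Boundary equilibrium x^14 (mixed ALLC/ALLD) of Proposition 3.
x14 = np.array([(S-P)/(S-P+T-R), 0.0, 0.0, (T-R)/(S-P+T-R)])

# Interior equilibrium from its closed-form expression (normalizer equals r).
d1 = Ap[0,2]*Ap[1,3] - Ap[0,3]*Ap[1,2]
d2 = Ap[2,0]*Ap[3,1] - Ap[2,1]*Ap[3,0]
num = np.array([(Ap[3,1]-Ap[2,1])*d1, (Ap[2,0]-Ap[3,0])*d1,
                (Ap[1,3]-Ap[0,3])*d2, (Ap[0,2]-Ap[1,2])*d2])
x_int = num / num.sum()

print(np.allclose(field(x14), 0), np.allclose(field(x_int), 0))
```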
As shown in the following proposition, if the dynamics have an interior equilibrium, it is unique and equal to

\[
x^{\mathrm{int}} = \frac{1}{r}
\begin{pmatrix}
(a'_{42}-a'_{32})(a'_{13}a'_{24}-a'_{14}a'_{23}) \\
(a'_{31}-a'_{41})(a'_{13}a'_{24}-a'_{14}a'_{23}) \\
(a'_{24}-a'_{14})(a'_{31}a'_{42}-a'_{32}a'_{41}) \\
(a'_{13}-a'_{23})(a'_{31}a'_{42}-a'_{32}a'_{41})
\end{pmatrix},
\]

where the a'_{ij} are the entries of A' in (5), and

\[
r = (a'_{13}a'_{24}-a'_{14}a'_{23})(a'_{31}-a'_{41}+a'_{42}-a'_{32}) + (a'_{31}a'_{42}-a'_{32}a'_{41})(a'_{13}-a'_{23}+a'_{24}-a'_{14}) > 0. \tag{11}
\]

The positiveness of r can be derived from (2). Define the following constants based on the entries a'_{ij} of A':

\[
b_1 = -\frac{a'_{13}-a'_{23}}{a'_{14}-a'_{24}} = \frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T - (m-1)R}{(m-1)(S-P)}, \qquad
b_2 = -\frac{a'_{42}-a'_{32}}{a'_{41}-a'_{31}} = \frac{\lceil\frac{m-2}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S - (m-1)P}{(m-1)(T-R)}.
\]

Proposition 4: Assume (2) holds. It follows that
1) if \(S < R < \frac{T+S}{2}\), or if m = 2n, n ≥ 1, and \(\frac{T+S}{2} \le R < \frac{nT+(n-1)S}{2n-1}\), then the dynamics (4) possess exactly one interior equilibrium point x^{int}, which is a hyperbolic saddle with two negative eigenvalues; additionally, for all initial conditions on the open line segment L^{int} = { x ∈ int(Δ) | x_1 = b_2 x_2, x_4 = b_1 x_3 }, the solution trajectory converges to x^{int};
2) otherwise, the dynamics have no interior equilibrium point.

For the proof, we study the evolution of the ratios x_1/x_2 and x_4/x_3, which, due to the block anti-diagonal structure of the payoff matrix A', are crucial in determining the asymptotic behavior of the replicator dynamics, as explained in the following lemma.

Lemma 3: Let x(0) ∈ int(Δ). Then \(\frac{d}{dt}\frac{x_1}{x_2}\) is greater than (resp. equal to, resp. less than) 0 if and only if \(\frac{x_4}{x_3}\) is greater than (resp. equal to, resp. less than) b_1. Similarly, \(\frac{d}{dt}\frac{x_4}{x_3}\) is greater than (resp. equal to, resp. less than) 0 if and only if \(\frac{x_1}{x_2}\) is greater than (resp. equal to, resp. less than) b_2.
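Lemma 3 can be illustrated numerically before the proof: at random interior states, the sign of the time derivative of each ratio should agree with the sign of the other ratio minus the corresponding threshold. The sketch below uses the assumed payoffs R = 2.5, S = 1, T = 5, P = 0 and m = 3.

```python
import numpy as np

# Numerical illustration of Lemma 3 with illustrative payoffs (an assumption:
# R=2.5, S=1, T=5, P=0, m=3); a'_ij below are the entries of A' in (5).
R, S, T, P, m = 2.5, 1.0, 5.0, 0.0, 3
a13, a14, a23, a24 = S + (m-1)*R - m*P, m*(S-P), 2*S + T - m*P, S - P
a31, a32, a41, a42 = T - R, 2*T + S - m*R, m*(T-R), T + (m-1)*P - m*R
b1 = -(a13 - a23) / (a14 - a24)
b2 = -(a42 - a32) / (a41 - a31)

rng = np.random.default_rng(1)
ok = True
for _ in range(1000):
    x = rng.dirichlet(np.ones(4))                         # random interior state
    d12 = ((a13 - a23) * x[2] + (a14 - a24) * x[3]) * x[0] / x[1]   # as in (12)
    d43 = ((a41 - a31) * x[0] + (a42 - a32) * x[1]) * x[3] / x[2]
    ok &= np.sign(d12) == np.sign(x[3] / x[2] - b1)
    ok &= np.sign(d43) == np.sign(x[0] / x[1] - b2)
print(ok)  # True
```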
Proof: In view of Lemma 1, $x(0) \in \mathrm{int}(\Delta)$ implies $x(t) \in \mathrm{int}(\Delta)$ for all $t$. Hence, $0 < x_i(t) < 1$, $i = 1,\dots,4$, for all $t$. So the ratio $\frac{x_i}{x_j}(t)$, $i,j = 1,\dots,4$, is well defined, and its time derivative can be calculated using [24, Eq. 3.6] as
\[
\frac{d}{dt}\frac{x_i}{x_j} = [u(p_i,x) - u(p_j,x)]\frac{x_i}{x_j}.
\]
Considering the payoff matrix $A'$ and setting $i=1$, $j=2$ and $i=4$, $j=3$ yields the two equations
\[
\frac{d}{dt}\frac{x_1}{x_2} = \Big[\underbrace{(a'_{13}-a'_{23})}_{a'_3}x_3 + \underbrace{(a'_{14}-a'_{24})}_{a'_4}x_4\Big]\frac{x_1}{x_2},
\qquad
\frac{d}{dt}\frac{x_4}{x_3} = \Big[\underbrace{(a'_{41}-a'_{31})}_{a'_1}x_1 + \underbrace{(a'_{42}-a'_{32})}_{a'_2}x_2\Big]\frac{x_4}{x_3}. \tag{12}
\]
In view of Lemma 2, $a'_1, a'_4 > 0$. Hence, because of (12),
\[
\frac{d}{dt}\frac{x_1}{x_2} > 0 \iff a'_3x_3 + a'_4x_4 > 0 \overset{a'_4>0}{\iff} \frac{x_4}{x_3} > -\frac{a'_3}{a'_4} = b_1,
\qquad
\frac{d}{dt}\frac{x_4}{x_3} > 0 \iff a'_1x_1 + a'_2x_2 > 0 \overset{a'_1>0}{\iff} \frac{x_1}{x_2} > -\frac{a'_2}{a'_1} = b_2.
\]
This proves the "greater than" cases; the "equal to" and "less than" cases can be proven similarly.

Determining the signs of $b_1$ and $b_2$ will prove useful and is clarified in the following lemma.

Lemma 4: It holds that $b_2 > 0$. Moreover, $b_1 > 0$ (resp. $b_1 = 0$, resp. $b_1 < 0$) if and only if $a'_{13} < a'_{23}$ (resp. $a'_{13} = a'_{23}$, resp. $a'_{13} > a'_{23}$), where the $a'_{ij}$ are the entries of $A'$ in (5).

Proof: In view of Lemma 2, $a'_{32} > a'_{42}$ and $a'_{41} > a'_{31}$. Hence, $b_2 > 0$ regardless of the payoffs in $A'$. Moreover, the inequality $a'_{14} > a'_{24}$ also always holds, which leads to the proof.

Now we proceed to the proof of Proposition 4.

Proof of Proposition 4: Consider Case 1). In view of Lemmas 4 and 2, $b_1, b_2 > 0$. Then each of the following two sets defines a plane in the simplex:
\[
P_1 = \left\{x \in \Delta \,\middle|\, \frac{x_4}{x_3} = b_1\right\}, \qquad P_2 = \left\{x \in \Delta \,\middle|\, \frac{x_1}{x_2} = b_2\right\}.
\]
In view of Lemma 3, on each side of the plane $P_1$ (resp. $P_2$), the quantity $x_1/x_2$ (resp. $x_4/x_3$) either increases or decreases.
Hence, if an interior equilibrium point exists, it must lie in the interior of the intersection of the two planes $P_1$ and $P_2$, which is the open line segment $L_{\mathrm{int}}$. According to Lemma 3, $L_{\mathrm{int}}$ is invariant under the replicator dynamics (4). The dynamics of $x_2$ on $L_{\mathrm{int}}$ can be expressed as
\[
\dot{x}_2 = k(fx_2 - g)(rx_2 - s)x_2 \tag{13}
\]
where
\[
k = \frac{a'_{13}-a'_{23}+a'_{24}-a'_{14}}{(a'_{41}-a'_{31})^2} > 0,\quad
f = a'_{32}-a'_{42}+a'_{41}-a'_{31} > 0,\quad
g = a'_{41}-a'_{31} > 0,\quad
s = (a'_{13}a'_{24}-a'_{14}a'_{23})(a'_{31}-a'_{41}) > 0,
\]
and $r$ is defined in (11). The equilibrium points of (13) are $x_2^* \in \{0, \frac{s}{r}, \frac{g}{f}\}$, which are easily proven to be unstable, stable and unstable, respectively. On the other hand, $x_2^* = 0$ and $x_2^* = \frac{g}{f}$ correspond to equilibrium points on the boundary of $\Delta$. Hence, for any initial condition on $L_{\mathrm{int}}$, the trajectory $x(t)$ converges to the point $x^* \in L_{\mathrm{int}}$ with $x_2^* = \frac{s}{r}$. Using the constraints $\sum_{i=1}^4 x_i^* = 1$ and $x^* \in L_{\mathrm{int}}$, we obtain $x^* = x_{\mathrm{int}}$. Hence, $x_{\mathrm{int}}$ is an interior equilibrium, and for all $x(0) \in L_{\mathrm{int}}$, $x(t) \to x_{\mathrm{int}}$.

Now the eigenvalues of $x_{\mathrm{int}}$ are determined. Consider the replicator dynamics (4). Replace the vector $x$ by $\hat{x} = (x_1,\ x_2,\ x_3,\ 1-x_1-x_2-x_3)^\top$ and eliminate the differential equation for $\dot{x}_4$ to obtain a third-order system. The characteristic equation of the corresponding Jacobian matrix at $x_{\mathrm{int}}$ is $\lambda^3 + a\lambda^2 + b\lambda + c = 0$, where $a,b,c \in \mathbb{R}$. It can be verified that $c = ab$ and $a > 0 > b, c$. Hence, the eigenvalues of $x_{\mathrm{int}}$ are $-a, \pm\sqrt{-b}$, which completes the proof of this case.

Now consider Case 2), where $a'_{13} \ge a'_{23}$. Then $b_1 \le 0$ in view of Lemma 4. Hence, $P_1$ does not intersect $\Delta$, implying that the ratio $x_4/x_3$ is always greater than $b_1$. Hence, in view of Lemma 3, $x_1/x_2$ monotonically increases in $\mathrm{int}(\Delta)$, and there is no interior equilibrium point in this case.

B. Trajectories starting on an edge

Due to invariance, the convergence analysis of the dynamics (4) on an edge $\Delta(p_k, p_j)$, $k,j \in \{1,2,3,4\}$, $k \ne j$, can be reduced to the analysis of the following two-dimensional replicator dynamics:
\[
\dot{x}_i = [(p^i)^\top \hat{A}\hat{x} - \hat{x}^\top \hat{A}\hat{x}]\,x_i, \quad i = k, j,
\]
where $\hat{x} = (x_k,\ x_j)^\top$ and $\hat{A}_{kj} = \begin{pmatrix} a_{kk} & a_{kj} \\ a_{jk} & a_{jj} \end{pmatrix}$. See [24, Section 3.1.4], [55] or [66] for the analysis of these dynamics.

C. Trajectories starting in the interior of a planar face

We limit this section to the following convergence result, which can be proven using the findings in [55], [66], [67].

Proposition 5: If $x(0)$ belongs to one of the faces $\Delta(p_1,p_2,p_3)$, $\Delta(p_1,p_2,p_4)$, $\Delta(p_1,p_3,p_4)$ or $\Delta(p_2,p_3,p_4)$, then $x(t)$ converges to a point in that face as $t \to \infty$.

D. Trajectories starting in the interior of the simplex

1) Dynamics in the four sections made by the two ratios: When $b_1$ and $b_2$ are positive, the ratios $x_1/x_2$ and $x_4/x_3$ divide the simplex into the following four zones:
\[
D_{14} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \frac{x_1}{x_2} > b_2,\ \frac{x_4}{x_3} > b_1\right\}, \qquad
D_{23} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \frac{x_1}{x_2} < b_2,\ \frac{x_4}{x_3} < b_1\right\},
\]
\[
Y_{14} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \frac{x_1}{x_2} > b_2,\ \frac{x_4}{x_3} < b_1\right\}, \qquad
Y_{23} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \frac{x_1}{x_2} < b_2,\ \frac{x_4}{x_3} > b_1\right\}.
\]
We investigate interior trajectories of the simplex by studying the dynamics in the above zones, starting with $D_{14}$ and $D_{23}$.

Lemma 5: $D_{14}$ and $D_{23}$ are positively invariant under (4).

Proof: First, $D_{14}$ is shown to be positively invariant. Assume the contrary, i.e., a trajectory $x(t)$ starts from some point in $D_{14}$ at $t = t_0$ but does not belong to $D_{14}$ at some time $t^* > t_0$. By continuity of the trajectory, there exists some time $t_1 \in (t_0, t^*)$ at which the trajectory intersects the boundary of $D_{14}$. Hence, at least one of the following holds:
\[
\frac{x_1}{x_2}(t_1) = b_2, \qquad \frac{x_4}{x_3}(t_1) = b_1.
\]
Without loss of generality, assume the first case holds. Then $\frac{x_1}{x_2}(t_1) < \frac{x_1}{x_2}(t_0)$.
Hence, $\frac{d}{dt}\frac{x_1}{x_2}$ must be negative at some time $t_2 \in (t_0, t_1)$, and so, by the continuity of the time derivative of $x_1/x_2$, $\frac{d}{dt}\frac{x_1}{x_2}$ is zero at some time $t_3 \in (t_0, t_2)$. Hence, in view of Lemma 3, $\frac{x_4}{x_3}(t_3) = b_1$. This implies that the trajectory has intersected the boundary of $D_{14}$ at some time earlier than $t_1$, a contradiction. Hence, if a trajectory starts in $D_{14}$ at some time $t = t_0$, it remains there afterwards. The positive invariance of $D_{23}$ can be shown similarly.

Proposition 6: Consider a trajectory $x(t)$ of the dynamics (4) that passes through $x^0$ at some time $t_0$. If $x^0 \in D_{14}$, then one of the following cases happens:
\[
\lim_{t\to\infty} x(t) = x_{14} \quad \text{or} \quad \lim_{t\to\infty} x(t) = x^* \in X_{12} \cap \Delta^{NE}.
\]
If $x^0 \in D_{23}$, then $\lim_{t\to\infty} x(t) = x^* \in (\{x_{23}\} \cup X_{12}) \cap \Delta^{NE}$.

Proof: Consider the case $x^0 \in D_{14}$. In view of Lemma 5, $x(t) \in D_{14}$ for all $t \ge t_0$, so both inequalities $\frac{x_1}{x_2}(t) > b_2$ and $\frac{x_4}{x_3}(t) > b_1$ hold for all $t \ge t_0$. Hence, in view of Lemma 3, both ratios $x_4/x_3$ and $x_1/x_2$ monotonically increase with time, and therefore each converges to either a constant or $\infty$. If one of the ratios, e.g., $x_1/x_2$, converges to a constant, that constant must be strictly positive; this follows from the fact that $\frac{x_1}{x_2}(t_0) > 0$ and that $x_1/x_2$ monotonically increases. In general, one of the following cases may occur:

1) $x_1/x_2 \to \alpha > 0$ and $x_4/x_3 \to \beta > 0$. Thus, $x$ converges to the line segment
\[
L_{\alpha,\beta} = \{x \in \Delta \mid x_1 = \alpha x_2,\ x_4 = \beta x_3\}.
\]
In view of Theorem 5 in Appendix D, $x \to L_{\alpha,\beta} \cap \Delta^o$. In what follows, it is shown that $\mathrm{int}(L_{\alpha,\beta}) \cap \Delta^o = \emptyset$. First note that $\alpha > b_2$. This can be proven by contradiction: assume $\alpha \le b_2$. Since $x(t_0) \in D_{14}$, it holds that $\frac{x_1}{x_2}(t_0) > b_2 \ge \alpha$. Then, by continuity of the trajectory, there exists some time $t_1 > t_0$ such that $\frac{x_1}{x_2}(t_1) = b_2$.
Hence, $x(t_1) \notin D_{14}$, which contradicts the invariance of $D_{14}$. So $\alpha > b_2$. Now note that $\mathrm{int}(L_{\alpha,\beta}) \subseteq \mathrm{int}(\Delta)$. On the other hand, in view of Lemma 4, the only interior equilibrium of the system (if any exists) belongs to the plane $\{x \in \Delta \mid \frac{x_1}{x_2} = b_2\}$. However, as discussed above, $x_1/x_2 \to \alpha > b_2$, so $\mathrm{int}(L_{\alpha,\beta})$ contains no equilibrium point, i.e., $\mathrm{int}(L_{\alpha,\beta}) \cap \Delta^o = \emptyset$. Thus, $x \to \mathrm{bd}(L_{\alpha,\beta}) \cap \Delta^o$. The boundary of $L_{\alpha,\beta}$ consists of the following two points, each of which is an equilibrium:
\[
x^\alpha = \left(\tfrac{\alpha}{1+\alpha},\ \tfrac{1}{1+\alpha},\ 0,\ 0\right)^\top \in X_{12}, \qquad
x^\beta = \left(0,\ 0,\ \tfrac{1}{1+\beta},\ \tfrac{\beta}{1+\beta}\right)^\top \in X_{34}.
\]
According to Lemma 9 in Appendix C, if $x$ converges to a point, that point must belong to $\Delta^{NE}$. However, $x^\beta \notin \Delta^{NE}$ in view of Lemma 10 in Appendix C. Hence, $x \not\to x^\beta$, implying $x \to x^\alpha$. On the other hand, $x^\alpha \in X_{12}$ and $x^\alpha$ must belong to $\Delta^{NE}$. Hence, $x \to x^* \in X_{12} \cap \Delta^{NE}$.

2) $x_1/x_2 \to \alpha > 0$ and $x_4/x_3 \to \infty$. Hence, $x$ converges to the line segment
\[
L_{\alpha,\infty} = \{x \in \Delta \mid x_1 = \alpha x_2,\ x_3 = 0\}.
\]
By Theorem 5, $x$ converges to an equilibrium or a continuum of equilibria on $L_{\alpha,\infty}$. On the other hand, $L_{\alpha,\infty}$ lies on the face $\Delta(p_1,p_2,p_4)$, and in view of Proposition 2, no interior equilibrium exists on this face. Hence, $x$ converges to the intersection of $L_{\alpha,\infty}$ with the boundary of $\Delta(p_1,p_2,p_4)$, which is $\{x^\alpha, p_4\}$. However, $p_4 \notin \Delta^{NE}$, and hence $x \not\to p_4$ in view of Lemma 9. Hence, $x \to x^\alpha$, and, as in the previous case, $x \to x^* \in X_{12} \cap \Delta^{NE}$.

3) $x_1/x_2 \to \infty$ and $x_4/x_3 \to \beta > 0$. Similar to the previous case, it can be shown that $x \to x^\beta$ or $x \to p_1$. However, neither $x^\beta$ nor $p_1$ belongs to $\Delta^{NE}$. Hence, this case never happens.

4) $x_1/x_2 \to \infty$ and $x_4/x_3 \to \infty$. Hence, $x$ converges to the line segment
\[
L_{\infty,\infty} = \{x \in \Delta \mid x_2 = 0,\ x_3 = 0\} = \Delta(p_1, p_4).
\]
By Theorem 5, $x \to \Delta(p_1,p_4) \cap \Delta^o = \{p_1, x_{14}, p_4\}$. On the other hand, $p_1, p_4 \notin \Delta^{NE}$. Hence, $x \to x_{14}$ in view of Lemma 9.
Summarizing the above four cases completes the proof for $x^0 \in D_{14}$.

Now let $x^0 \in D_{23}$. Following the procedure for $x^0 \in D_{14}$, it can be shown that both ratios $x_4/x_3$ and $x_1/x_2$ converge either to a positive constant or to $0$. In general, one of the following cases may occur:

1*) $x_1/x_2 \to \alpha > 0$ and $x_4/x_3 \to \beta > 0$. As for $x^0 \in D_{14}$, this case results in $x \to x^* \in X_{12} \cap \Delta^{NE}$.

2*) $x_1/x_2 \to \alpha > 0$ and $x_4/x_3 \to 0$. Hence, $x$ converges to the line segment
\[
L_{\alpha,0} = \{x \in \Delta \mid x_1 = \alpha x_2,\ x_4 = 0\}.
\]
In view of Theorem 5, $x \to L_{\alpha,0} \cap \Delta^o$. Clearly $L_{\alpha,0} \subseteq \Delta(p_1,p_2,p_3)$. On the other hand, according to Proposition 2, $\mathrm{int}(\Delta(p_1,p_2,p_3)) \cap \Delta^o$ is either empty or equal to $X_{123}$. In view of Theorem 1, the second case happens only when $m = 2n+1$, $n \ge 1$ and $R = \frac{T+S}{2}$, or $m = 2n$, $n \ge 1$ and $R = \frac{nT+(n-1)S}{2n-1}$. However, for both of these values of $R$, it can be verified that $b_1 < 0$. Hence, $D_{23} = \emptyset$, which contradicts the assumption $x^0 \in D_{23}$. Hence, $\mathrm{int}(\Delta(p_1,p_2,p_3)) \cap \Delta^o = \emptyset$. So $\mathrm{int}(L_{\alpha,0}) \cap \Delta^o = \emptyset$ and $x \to \mathrm{bd}(L_{\alpha,0})$. Thus, $x \to \{x^\alpha, p_3\}$. However, $p_3 \notin \Delta^{NE}$, and hence $x \not\to p_3$ in view of Lemma 9. Hence, $x \to x^\alpha$, resulting in $x \to x^* \in X_{12} \cap \Delta^{NE}$.

3*) $x_1/x_2 \to 0$ and $x_4/x_3 \to \beta > 0$. Hence, $x$ converges to the line segment
\[
L_{0,\beta} = \{x \in \Delta \mid x_1 = 0,\ x_4 = \beta x_3\}.
\]
Similar to the previous case, it can be shown that $x \to \{x^\beta, p_2\}$. Hence, in view of Lemma 9, $x \to \{x^\beta, p_2\} \cap \Delta^{NE}$. So $x \to \{p_2\} \cap \Delta^{NE}$, since $x^\beta \notin \Delta^{NE}$. On the other hand, $p_2 \in X_{12}$. Hence, $x \to x^* \in X_{12} \cap \Delta^{NE}$.

4*) $x_1/x_2 \to 0$ and $x_4/x_3 \to 0$. Hence, $x$ converges to the line segment
\[
L_{0,0} = \{x \in \Delta \mid x_1 = 0,\ x_4 = 0\} = \Delta(p_2, p_3).
\]
By Theorem 5, $x \to \Delta(p_2,p_3) \cap \Delta^o = \{p_2, x_{23}, p_3\}$. On the other hand, $p_3 \notin \Delta^{NE}$. Hence, $x \to \{x_{23}, p_2\} \cap \Delta^{NE}$ in view of Lemma 9.
Since $p_2 \in X_{12}$, it can be concluded that $x \to x^* \in (X_{12} \cup \{x_{23}\}) \cap \Delta^{NE}$. Summarizing the above cases completes the proof for $x^0 \in D_{23}$.

Lemma 6: Consider a trajectory $x(t)$ of the dynamics (4) that passes through $x^0$ at some time $t_0$. If $x^0 \in Y_{14}$, then either $x(t)$ leaves $Y_{14}$ after some finite time, or $\lim_{t\to\infty} x(t) = x_{\mathrm{int}}$, or $\lim_{t\to\infty} x(t) = x^* \in X_{12} \cap \Delta^{NE}$. If $x^0 \in Y_{23}$, then either $x(t)$ leaves $Y_{23}$ after some finite time, or $\lim_{t\to\infty} x(t) = x_{\mathrm{int}}$, or $\lim_{t\to\infty} x(t) = x^* \in (X_{12} \cup X_{123}) \cap \Delta^{NE}$.

Proof: Consider the case $x^0 \in Y_{14}$. If $x$ leaves $Y_{14}$ after some finite time, the conclusion follows directly, so let $Y_{14}$ be invariant. Then the inequalities in the definition of $Y_{14}$ hold for all $t \ge t_0$. Hence, in view of Lemma 3, $x_1/x_2$ monotonically decreases and hence converges to a constant $\alpha \ge b_2$, and $x_4/x_3$ monotonically increases and hence converges to a constant $\beta \le b_1$ as $t \to \infty$. Hence, $x(t)$ converges to the line segment $L_{\alpha,\beta} = \{x \in \Delta \mid x_1 = \alpha x_2,\ x_4 = \beta x_3\}$. So, based on Theorem 5 in Appendix D, $x(t)$ converges to $L_{\alpha,\beta} \cap \Delta^o$. On the other hand, $\Delta^o$ includes at most one interior equilibrium point $x_{\mathrm{int}}$ according to Lemma 5. Hence, either $x(t) \to x_{\mathrm{int}}$ or $x(t) \to L_{\alpha,\beta} \cap \Delta^{oo}$. The first case leads to the conclusion directly, so consider the second case. First note that $\alpha > 0$, since $b_2 > 0$ in view of Lemmas 4 and 2. Moreover, $\beta > 0$, since $x_4/x_3$ monotonically increases from $\frac{x_4}{x_3}(0) > 0$ to $\beta$. Hence, $\alpha, \beta > 0$. So on the set $L_{\alpha,\beta} \cap \mathrm{bd}(\Delta)$, either $x_1 = x_2 = 0$ or $x_3 = x_4 = 0$ holds. Then $L_{\alpha,\beta} \cap \mathrm{bd}(\Delta)$ equals a point $x^* \in X_{12} \cup X_{34}$. On the other hand, $\Delta^{oo} \subseteq \mathrm{bd}(\Delta)$. Hence, since $X_{12} \cup X_{34} \subseteq \Delta^{oo}$, it holds that $L_{\alpha,\beta} \cap \Delta^{oo} = \{x^*\}$ with $x^* \in X_{12} \cup X_{34}$. Thus, in view of Lemma 9 in Appendix C, $x(t) \to x^* \in (X_{12} \cup X_{34}) \cap \Delta^{NE}$. On the other hand, $X_{34} \cap \Delta^{NE} = \emptyset$ according to Lemma 10 in Appendix C.
Hence, $x(t) \to x^* \in X_{12} \cap \Delta^{NE}$, which completes the proof of this part.

Now consider the case when $x^0 \in Y_{23}$ and $Y_{23}$ is invariant (otherwise, the result is trivial). Hence, in view of Lemma 3, $x_1/x_2$ monotonically increases and hence converges to a constant $\alpha \le b_2$, and $x_4/x_3$ monotonically decreases and hence converges to a constant $\beta \ge b_1$ as $t \to \infty$. So, as in the previous case, either $x(t) \to x_{\mathrm{int}}$ or $x(t) \to L_{\alpha,\beta} \cap \Delta^{oo}$. Again, the first case leads to the conclusion directly, so consider the second. It must be true that $\alpha > 0$, since $x_1/x_2$ monotonically increases from $\frac{x_1}{x_2}(0) > 0$ to $\alpha$. If $\beta$ is also positive, then the same situation as for $x^0 \in Y_{14}$ takes place, which makes the result trivial. So let $\beta = 0$. Then $L_{\alpha,\beta} = \{x \in \Delta \mid x_1 = \alpha x_2,\ x_4 = 0\}$. Hence, in view of Theorem 1, $L_{\alpha,\beta} \cap \Delta^{oo} = \{x^*\}$ with $x^* \in \{x_{13}\} \cup X_{12} \cup X_{123}$. So, in view of Lemmas 9 and 11 in Appendix C, $x(t) \to x^* \in (X_{12} \cup X_{123}) \cap \Delta^{NE}$, which completes the proof.

2) Global results: We proceed to the global convergence analysis. As one would expect, the convergence results depend on the payoffs and, to some extent, also on $m$. We provide the results from small to large $R$ via the following four theorems.

Theorem 1: Assume (2) holds. Let $x(0) \in \mathrm{int}(\Delta)$. Denote the two-dimensional stable manifold of $x_{\mathrm{int}}$ by $W^s(x_{\mathrm{int}})$. If $S < R < \frac{T+S}{2}$, then
1) $x(0) \in W^s(x_{\mathrm{int}}) \Rightarrow \lim_{t\to\infty} x(t) = x_{\mathrm{int}}$;
2) $x(0) \notin W^s(x_{\mathrm{int}}) \Rightarrow \lim_{t\to\infty} x(t) = x_{14}$ or $x_{23}$;
3) $x_{14}$ and $x_{23}$ are asymptotically stable, and their basins of attraction are separated by $W^s(x_{\mathrm{int}})$;
4) $x(0) \in D_{14} \Rightarrow \lim_{t\to\infty} x(t) = x_{14}$;
5) $x(0) \in D_{23} \Rightarrow \lim_{t\to\infty} x(t) = x_{23}$.

Proof: Case 1) of the theorem is a direct result of Theorem 4. Now we proceed to Case 2). According to Lemmas 4 and 2, $b_1, b_2 > 0$.
Hence, the interior of the simplex can be written as
\[
\mathrm{int}(\Delta) = D_{14} \cup D_{23} \cup Y_{14} \cup Y_{23} \cup L_{\mathrm{int}} \cup \hat{P}_{11} \cup \hat{P}_{12} \cup \hat{P}_{21} \cup \hat{P}_{22} \tag{14}
\]
where $L_{\mathrm{int}}$ is defined in Theorem 4 and
\[
\hat{P}_{11} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \tfrac{x_4}{x_3} = b_1,\ \tfrac{x_1}{x_2} > b_2\right\}, \qquad
\hat{P}_{12} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \tfrac{x_4}{x_3} = b_1,\ \tfrac{x_1}{x_2} < b_2\right\}, \tag{15}
\]
\[
\hat{P}_{21} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \tfrac{x_1}{x_2} = b_2,\ \tfrac{x_4}{x_3} > b_1\right\}, \qquad
\hat{P}_{22} = \left\{x \in \mathrm{int}(\Delta) \,\middle|\, \tfrac{x_1}{x_2} = b_2,\ \tfrac{x_4}{x_3} < b_1\right\}.
\]
Hence, $x(0)$ belongs to one of the sets on the right-hand side of (14). If $x(0) \in D_{14}$, then in view of Proposition 6, $x(t)$ converges either to $x_{14}$ or to a point in $X_{12} \cap \Delta^{NE}$. However, in view of Lemma 10 in Appendix C, $X_{12} \cap \Delta^{NE} = \emptyset$. Hence, $x(t) \to x_{14}$, which proves Case 4). Case 5) can be shown similarly.

Now consider the case $x(0) \in Y_{14}$. In view of Lemma 6, if $x(t)$ remains in $Y_{14}$, it converges to a point in $X_{12} \cap \Delta^{NE}$. However, in view of Lemma 10, $X_{12} \cap \Delta^{NE} = \emptyset$, which implies that $x(t)$ leaves $Y_{14}$ after some finite time. Hence, $x(t)$ enters one of the sets $L_{\mathrm{int}}$, $\hat{P}_{11}$ or $\hat{P}_{22}$ at some time $t_1 > 0$. If $x(t_1) \in L_{\mathrm{int}}$, then $x(t) \to x_{\mathrm{int}}$ in view of Proposition 4. If $x(t_1) \in \hat{P}_{11}$, then $x(t)$ enters $D_{14}$ after $t = t_1$, since $x_1/x_2 > b_2$ on $\hat{P}_{11}$ and hence, in view of Lemma 3, $x_4/x_3$ increases on $\hat{P}_{11}$. So $x(t) \to x_{14}$ in view of Case 4). Similarly, it can be shown that if $x(t_1) \in \hat{P}_{22}$, then $x(t) \to x_{23}$. Hence, if $x(0) \in Y_{14}$, then $x(t)$ converges to one of the points $x_{14}$, $x_{23}$ or $x_{\mathrm{int}}$. The same can be shown for $x(0) \in Y_{23}$, since $X_{123} \not\subseteq \Delta^{oo}$ when $R < \frac{T+S}{2}$. Moreover, the cases when $x(0)$ belongs to one of the sets $L_{\mathrm{int}}$, $\hat{P}_{11}$, $\hat{P}_{12}$, $\hat{P}_{21}$ or $\hat{P}_{22}$ are already covered by the arguments for $Y_{14}$ and $Y_{23}$. Hence, $x(t)$ converges to one of $x_{14}$, $x_{23}$ or $x_{\mathrm{int}}$. On the other hand, $x(t) \to x_{\mathrm{int}}$ only for $x(0) \in W^s(x_{\mathrm{int}})$. Hence, Case 2) is proven.

Both $x_{14}$ and $x_{23}$ are asymptotically stable thanks to Proposition 7 and Lemma 8.
Denote their corresponding basins of attraction by $B_{14}$ and $B_{23}$. Clearly, $B_{14}$ and $B_{23}$ are disjoint. Define $\hat{B}_{14} := \mathrm{bd}(B_{14}) \cap \mathrm{int}(\Delta)$ and $\hat{B}_{23} := \mathrm{bd}(B_{23}) \cap \mathrm{int}(\Delta)$. Consider a point $x^* \in \hat{B}_{14}$. The solution $x(t)$ with initial condition $x^*$ converges to one of $x_{14}$, $x_{23}$ or $x_{\mathrm{int}}$, as shown above. However, $x(t) \not\to x_{14}$, since $x^* \notin B_{14}$. Moreover, $x^* \notin B_{23}$, since $x^* \in \mathrm{bd}(B_{14})$, $B_{14} \cap B_{23} = \emptyset$ and $B_{23}$ is open. Hence, $x(t) \to x_{\mathrm{int}}$ for every initial condition on $\hat{B}_{14}$; the same can be shown for $\hat{B}_{23}$. Now both $\hat{B}_{14}$ and $\hat{B}_{23}$ are two-dimensional invariant manifolds, and for any initial condition located on them, $x(t) \to x_{\mathrm{int}}$. On the other hand, $x_{\mathrm{int}}$ is hyperbolic in view of Theorem 4, and hence $W^s(x_{\mathrm{int}})$ is the unique two-dimensional invariant manifold passing through $x_{\mathrm{int}}$. Hence, $\hat{B}_{14}$ and $\hat{B}_{23}$ coincide and are equal to $W^s(x_{\mathrm{int}})$. This proves Case 3) and hence the whole theorem.

An example of the two-dimensional stable manifold mentioned in Theorem 1 is shown in Figure 1.

[Fig. 1. An example of the two-dimensional stable manifold mentioned in Theorem 1 for payoff values $T = 6$, $R = 4$, $S = 3$, $P = 2$ and the number of repetitions $m = 8$. The cyan points are samples of the stable manifold $W^s(x_{\mathrm{int}})$.]

For intermediate values of $R$, the convergence results depend on whether $m$ is odd or even. Therefore, two separate theorems are dedicated to these values.

Theorem 2: Assume (2) holds. Let $x(0) \in \mathrm{int}(\Delta)$ and assume $m = 2n$, $n \ge 1$. It follows that
1) if $R = \frac{T+S}{2}$, then $\lim_{t\to\infty} x(t) = x^* \in \{x_{14}, x_{\mathrm{int}}, p_2\}$;
2) if $\frac{T+S}{2} < R < \frac{nT+(n-1)S}{2n-1}$, then $\lim_{t\to\infty} x(t) = x^* \in \{x_{14}, x_{\mathrm{int}}\} \cup (X_{12} \cap \Delta^{NE})$;
3) if $R = \frac{nT+(n-1)S}{2n-1}$, then $\lim_{t\to\infty} x(t) = x^* \in \{x_{14}\} \cup X_{123} \cup (X_{12} \cap \Delta^{NE})$.
Proof: In view of Lemma 4, $b_1, b_2 > 0$ in Cases 1) and 2). Hence, by following the same steps as in the proof of Theorem 1, it can be shown that $x(t) \to x^* \in \{x_{14}, x_{23}, x_{\mathrm{int}}\} \cup (X_{12} \cap \Delta^{NE})$. However, $x_{23} \notin \Delta^o$ in view of Theorem 1, and hence $x(t) \not\to x_{23}$. Then, in view of Lemma 9, Case 2) is proven. Moreover, the fact that $X_{12} \cap \Delta^{NE} = \{p_2\}$ for $R = \frac{T+S}{2}$ proves Case 1). For Case 3), $b_2 > 0$ but $b_1 = 0$ in view of Lemma 4. Hence, $\mathrm{int}(\Delta)$ can be written as $\mathrm{int}(\Delta) = D_{14} \cup Y_{14} \cup \hat{P}_{11}$, where $\hat{P}_{11}$ is defined in (15). Then, similar to the proof of Theorem 1, we arrive at the conclusion.

Theorem 3: Assume (2) holds. Let $x(0) \in \mathrm{int}(\Delta)$ and assume $m = 2n+1$, $n \ge 1$. It follows that
1) if $R = \frac{T+S}{2}$, then $\lim_{t\to\infty} x(t) = x^* \in \{x_{14}, x_{23}\} \cup X_{123}$;
2) if $\frac{T+S}{2} < R \le \frac{(n+1)T+nS}{2n+1}$, then $\lim_{t\to\infty} x(t) = x_{14}$.

Proof: In view of Lemma 4, $b_2 > 0 \ge b_1$ in all cases. Hence, by following the same steps as in the proof of Case 3) of Theorem 2, it can be shown that $x(t) \to x^* \in \{x_{14}, x_{23}\} \cup (X_{12} \cap \Delta^{NE}) \cup X_{123}$, where $X_{123}$ shows up only in Case 1), according to Proposition 1. Then, according to Lemma 10 in Appendix C, $X_{12} \cap \Delta^{NE} = \emptyset$, which proves Case 1) and Case 2) except for when $R$ equals $\frac{(n+1)T+nS}{2n+1}$. When the equality holds, $X_{12} \cap \Delta^{NE} = \{p_2\}$ in view of Lemma 11 in Appendix C. However, in view of Lemma 3, $b_1 \le 0$ implies that $x_1/x_2$ monotonically increases. Hence, $\frac{x_1}{x_2}(t) > \frac{x_1}{x_2}(0)$ for all $t > 0$. On the other hand, $\frac{x_1}{x_2}(0) > 0$, since $x(0) \in \mathrm{int}(\Delta)$, whereas $x_1/x_2 = 0$ at $p_2$. Hence, $x(t) \not\to p_2$, which completes the proof.

Theorem 4: Assume (2) holds. Let $x(0) \in \mathrm{int}(\Delta)$. If
\[
\max\left\{\frac{\lceil\frac{m-2}{2}\rceil S + \lfloor\frac{m}{2}\rfloor T}{m-1},\ \frac{\lceil\frac{m}{2}\rceil T + \lfloor\frac{m}{2}\rfloor S}{m}\right\} < R < T,
\]
then $\lim_{t\to\infty} x(t) = x^* \in \{x_{14}\} \cup (X_{12} \cap \Delta^{NE})$.

Proof: The proof is similar to that of Case 2) in Theorem 3. Note that each equilibrium on $X_{34}$ acts as an $\alpha$-limit point in the case of Theorem 4.
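The dichotomy of Theorem 1 can be illustrated numerically. The sketch below is not code from the paper: it assumes condition (2) is the snowdrift ordering $T > R > S > P$, rebuilds the $m$-round payoff matrix by playing the four strategies out, and integrates the replicator dynamics (4) with forward Euler from a point of $D_{14}$, using the Figure 1 payoffs $T=6$, $R=4$, $S=3$, $P=2$, $m=8$, for which $S < R < \frac{T+S}{2}$; by Case 4) of Theorem 1, the trajectory should approach $x_{14} = \left(\frac{S-P}{S-P+T-R}, 0, 0, \frac{T-R}{S-P+T-R}\right)$.

```python
# Hedged numerical illustration of Theorem 1, Case 4) -- assumptions:
# condition (2) is T > R > S > P, and the repeated-game payoff matrix is
# obtained by playing out ALLC, TFT, STFT, ALLD over m rounds.
import numpy as np

def repeated_payoff_matrix(T, R, S, P, m):
    base = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}
    first = lambda s: 'C' if s in (0, 1) else 'D'
    react = lambda s, prev: 'C' if s == 0 else 'D' if s == 3 else prev
    A = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            a, b = first(i), first(j)
            for _ in range(m):
                A[i, j] += base[(a, b)]
                a, b = react(i, b), react(j, a)
    return A

T, R, S, P, m = 6, 4, 3, 2, 8
A = repeated_payoff_matrix(T, R, S, P, m) / m       # per-round payoffs (time rescaling only)

x = np.array([0.4, 0.1, 0.1, 0.4])                  # x1/x2 = 4 > b2 and x4/x3 = 4 > b1: in D14
dt = 0.01
for _ in range(200_000):                            # forward-Euler integration of (4)
    u = A @ x
    x = x + dt * x * (u - x @ u)
    x = np.clip(x, 0.0, None); x /= x.sum()         # guard against round-off drift

x14 = np.array([S - P, 0, 0, T - R], dtype=float) / (S - P + T - R)
print(np.round(x, 4))                               # ≈ [0.3333, 0, 0, 0.6667] = x14
```

Starting instead from a point of $D_{23}$ (e.g., with the conditional players dominant) produces convergence to $x_{23}$ on the edge $\Delta(p_2,p_3)$, in line with Case 5).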
The integration of the convergence results for initial conditions in the interior of the simplex with those for initial conditions on its boundary yields the following corollary.

Corollary 1: Assume (2) holds. For any initial condition $x(0) \in \Delta$, the solution $x(t)$ of the replicator dynamics (4) converges to a point in $\Delta$ as time goes to infinity.

Therefore, no limit cycle or strange attractor can take place in the dynamics, and we always have convergence to a point.

E. Discussion

Now that we know the asymptotic behavior of the replicator dynamics (4) for the full range of payoffs, we can proceed to the interpretation of the results in terms of the individuals playing the four types of strategies. We use two performance measures to compare the population at different states $x$. The first is the average population payoff $x^\top A x$. The second is the average number of times cooperation is played in the population, which we call the average cooperation level and denote by
\[
x_C := \sum_{i,j \in \{1,\dots,4\}} \frac{x_i x_j C_{ij}}{2m}
\]
where $C_{ij}$ is the number of times cooperation is played in the $m$ rounds when two individuals playing the strategies corresponding to indices $i$ and $j$ are matched to play the repeated game $G^m$. As an illustration, $C_{11} = 2m$, as both matched ALLC-players cooperate in all $m$ rounds, and $C_{14} = C_{41} = m$, as only the ALLC-player cooperates when matched with an ALLD-player. Moreover, the average cooperation level at $x_{14}$ is $\frac{S-P}{S-P+T-R}$, since only the ALLC-players cooperate, and it reaches $1$ at any state in $X_{12}$, since both ALLC- and TFT-players cooperate.

[Fig. 2. Average population payoff at the equilibria as a function of $R$. Other parameters are set to $T = 3$, $S = 1$, $P = 0$ and $m = 6$.]

Now consider a population where the portions of individuals playing ALLC, TFT, STFT and ALLD are all nonzero.
For small values of $R$, i.e., less than the average of $T$ and $S$, the population almost always converges to one of the following states: i) $x_{14}$, a mixed population of ALLC- and ALLD-players, or ii) $x_{23}$, a mixed population of TFT- and STFT-players. Both states are evolutionarily (and hence asymptotically) stable (see Appendix B). Therefore, evolutionary forces select against any mutant population at these two states. Moreover, for a zero-measure set of initial states, the population converges to $x_{\mathrm{int}}$, where all four types of players are present. However, $x_{\mathrm{int}}$ is unstable, and small perturbations can lead the population to one of $x_{14}$ and $x_{23}$. Between the two, $x_{23}$ has a higher average population payoff, since
\[
(x_{23})^\top A x_{23} - (x_{14})^\top A x_{14} =
\begin{cases}
\dfrac{m(S-T)^2}{4(T-R+S-P)} > 0 & m = 2n,\ n \ge 1,\\[6pt]
\dfrac{(m^2-1)(S-T)^2}{4m(T-R+S-P)} > 0 & m = 2n+1,\ n \ge 1
\end{cases}
\]
(see Figure 2), as well as a higher cooperation level, since
\[
(x_{23})_C - (x_{14})_C =
\begin{cases}
\dfrac{T-S}{2(T-R+S-P)} > 0 & m = 2n,\ n \ge 1,\\[6pt]
\dfrac{(m-1)(T-S)}{2m(T-R+S-P)} > 0 & m = 2n+1,\ n \ge 1.
\end{cases}
\]
Now, if the base game is repeated an even number of times, then as $R$ increases, the state $x_{23}$ moves towards $p_2$, where only TFT-players are present. When $R$ equals the average of $T$ and $S$, $x_{23}$ coincides with $p_2$, and hence the STFT-players die out (except for those zero-measure initial conditions that lead to $x_{\mathrm{int}}$). As $R$ increases further, the single equilibrium state $p_2$ expands into the continuum of equilibria $X_{12} \cap \Delta^{NE}$. Therefore, the population converges either to $x_{14}$, where ALLC- and ALLD-players coexist, or to a state where ALLC- and TFT-players coexist. As one would expect, any equilibrium $x^\alpha \in X_{12} \cap \Delta^{NE}$ outperforms $x_{14}$ in terms of both average population payoff and average cooperation level, as
\[
(x^\alpha)^\top A x^\alpha - (x_{14})^\top A x_{14} = \frac{m(T-R)(R-S)}{T-R+S-P} > 0,
\]
and $x^\alpha$ has the highest possible average cooperation level, $(x^\alpha)_C = 1$.
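The two comparison formulas above can be checked mechanically. The following sketch (not from the paper; it assumes condition (2) is $T > R > S > P$, builds $A$ and the cooperation counts $C_{ij}$ by playing the strategies out, and takes $x_{23}$ to be the mixed equilibrium of the $2\times 2$ restriction to the edge $\Delta(p_2,p_3)$) reproduces the even-$m$ payoff gap $\frac{m(S-T)^2}{4(T-R+S-P)}$ and cooperation gap $\frac{T-S}{2(T-R+S-P)}$ for $T=6$, $R=4$, $S=3$, $P=2$, $m=8$:

```python
# Hedged check of the x23-vs-x14 comparisons for even m.
# Assumptions: condition (2) is T > R > S > P; x23 is the mixed equilibrium
# of the TFT/STFT edge game; strategy order is ALLC, TFT, STFT, ALLD.
import numpy as np

def play_out(T, R, S, P, m):
    base = {('C', 'C'): R, ('C', 'D'): S, ('D', 'C'): T, ('D', 'D'): P}
    first = lambda s: 'C' if s in (0, 1) else 'D'
    react = lambda s, prev: 'C' if s == 0 else 'D' if s == 3 else prev
    A = np.zeros((4, 4)); C = np.zeros((4, 4))
    for i in range(4):
        for j in range(4):
            a, b = first(i), first(j)
            for _ in range(m):
                A[i, j] += base[(a, b)]
                C[i, j] += (a == 'C') + (b == 'C')   # cooperations by both players this round
                a, b = react(i, b), react(j, a)
    return A, C

T, R, S, P, m = 6, 4, 3, 2, 8                        # even m, and S < R < (T+S)/2
A, C = play_out(T, R, S, P, m)

x14 = np.array([S - P, 0, 0, T - R], dtype=float) / (S - P + T - R)
# Mixed equilibrium share of TFT on the edge Delta(p2, p3).
a, b, c, d = A[1, 1], A[1, 2], A[2, 1], A[2, 2]
p = (d - b) / ((a - c) + (d - b))
x23 = np.array([0, p, 1 - p, 0])

pay  = lambda x: x @ A @ x                           # average population payoff
coop = lambda x: x @ C @ x / (2 * m)                 # average cooperation level x_C

print(pay(x23) - pay(x14))    # m*(S-T)**2 / (4*(T-R+S-P)) = 6.0
print(coop(x23) - coop(x14))  # (T-S) / (2*(T-R+S-P)) = 0.5
```

The counts also confirm the illustrations in the text: $C_{11} = 2m$, $C_{14} = m$, and the cooperation level at $x_{14}$ equals the ALLC share $\frac{S-P}{S-P+T-R}$.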
At the same time, $x_{\mathrm{int}}$ moves towards the face $\Delta(p_1,p_2,p_3)$, and when $R$ equals $\frac{nT+(n-1)S}{2n-1}$, $x_{\mathrm{int}}$ lies on $X_{123}$, where ALLC-, TFT- and STFT-players coexist.

If the base game is repeated an odd number of times, STFT-players survive for a greater range of $R$. This time, for $R$ equal to $\frac{T+S}{2}$, $x_{\mathrm{int}}$ lies on $X_{123}$. Then suddenly, with a small increment in $R$, the set $X_{123}$ disappears, and no population converges to $x_{23}$. Therefore, starting from any initial condition, the population converges to the polymorphic population of ALLC- and ALLD-players $x_{14}$. This is the only situation where both conditional strategies TFT and STFT are wiped out of the population, and it persists up to when $R$ equals $\frac{(n+1)T+nS}{2n+1}$. It can be verified that both the average population payoff and the cooperation level at $x_{14}$ monotonically increase in $R$. Therefore, as expected, increasing $R$ results in a more profitable and cooperative long-term population.

When $R$ increases further, the behavior of the system is almost the same for both odd and even $m$. The population converges either to $x_{14}$, where ALLC- and ALLD-players coexist, or to an equilibrium on $X_{12} \cap \Delta^{NE}$, where ALLC- and TFT-players coexist. So the suspiciousness of the STFT-players wipes them out of the population. Moreover, as $R$ increases, $x_{14}$ gets closer to $p_1$, where all individuals are ALLC-players.

In general, STFT can perhaps be considered the worst strategy in terms of survival, especially for $R > \frac{T+S}{2}$. Conversely, regardless of the payoffs, there always exists a set of initial conditions for which ALLC-players show up in the long run. Moreover, in addition to $x_{14}$, all the limit states in $X_{12} \cap \Delta^{NE}$ (except for $p_2$) have a nonzero portion of ALLC-players. This, surprisingly, makes the simple unconditional ALLC strategy perhaps the most robust in terms of survival and appearance in the long run.
This may explain the existence of individuals who unconditionally cooperate in real-life scenarios that can be captured by repeated snowdrift games.

Interestingly, $x_{14}$ is always an evolutionarily (and asymptotically) stable state of the system, regardless of the payoffs. This state consists of $\frac{S-P}{S-P+T-R}$ ALLC-players, who can be considered cooperators, and $\frac{T-R}{S-P+T-R}$ ALLD-players, who can be considered defectors. On the other hand, the unique evolutionarily stable state of the base game consists of $\frac{S-P}{S-P+T-R}$ C-players, i.e., cooperators, and $\frac{T-R}{S-P+T-R}$ D-players, i.e., defectors. Thus, the repetition of the base game and the introduction of the two conditional strategies TFT and STFT do not eliminate or even change this evolutionarily stable mixture of cooperators and defectors, but add some new, more cooperative final states, such as those on $X_{12}$. Moreover, since $(x_{14})_C = \frac{S-P}{S-P+T-R}$ and $(x^\alpha)_C = 1$ for any $x^\alpha \in X_{12}$, adding enough TFT-players to a population of ALLC- and ALLD-players can dramatically increase the average level of cooperation, provided $R$ is large enough. The claim does not change when STFT-players are also present in the population. More specifically, if $R$ is greater than the lower bound provided in Theorem 4, and the initial portion $x_2(0)$ of TFT-players is large enough that $x(0)$ belongs to the basin of attraction of $X_{12}$, then the population state converges to a point on $X_{12}$ with a higher average population payoff and cooperation level. The convergence analysis also reveals how the average cooperation level changes as $R$ increases. In particular, in the presence of the four types of players, increments in $R$ make it more probable that the final population becomes completely cooperative.

IV. CONCLUDING REMARKS

Our analysis highlights repetition as a mechanism that promotes cooperation among selfish individuals in snowdrift social dilemmas.
Unlike the trend of research on repeated games that allows for a wide range of complicated and uncommon reactive strategies, we have limited them to four typical ones. This provides a more realistic setup for human societies [61]. On the other hand, we have modelled the evolution of the players' population portions by replicator dynamics, which well approximate the behavior of well-mixed large populations governed by the proportional imitation update rule [23]. Given all this, we have shown that, for large well-mixed populations of imitative individuals who play snowdrift games, repeating the game and introducing the conditional strategy TFT promote cooperation. However, this is not because TFT-players become long-term dominant, as often reported for the repeated prisoner's dilemma, but because they lead to more cooperative final population states, which are also more profitable. This promotion of cooperation is preserved even if some of the TFT-players start their interactions suspiciously and defect initially, that is, if there are also some STFT-players in the population. Indeed, for low values of the reward $R$, such players survive, yet for high rewards they become extinct. Finally, those who always cooperate regardless of their opponents' moves have a high chance of survival, which may explain the observation of such behaviors in real life.

Our main technical contribution is the study of the ratios of the state variables of the replicator dynamics and the proof of their monotonicity over time. This technique has the potential to be applied to other replicator dynamics whose payoff matrix has repeated entries. We finally emphasize that all payoffs and even the number of repetitions are parameters, which not only gives rise to the above outcomes but, more importantly, provides a parametric framework to control the final average population payoff and cooperation level.
This can be done simply by tuning the parameters according to the convergence results in Theorems 2 to 4.

APPENDIX

A. Lemma 7

Lemma 7: The replicator dynamics (4) are invariant under the addition of a constant to all of the entries of a column of the payoff matrix $A$.

Proof: See [24, Section 3.1.2].

B. Evolutionary Stability: Proof of Proposition 3

A state $x \in \Delta$ is said to be an evolutionarily stable state (strategy) (ESS) of $A$ if it satisfies the following two conditions [23, pp. 81]:
\[
x^\top A x \ge y^\top A x \quad \forall y \in \Delta, \tag{16}
\]
\[
[x^\top A x = y^\top A x \ \text{and}\ y \ne x] \;\Rightarrow\; x^\top A y > y^\top A y. \tag{17}
\]
The set of all evolutionarily stable states is denoted by $\Delta^{ESS}$.

Lemma 8 (Proposition 3.10 in [24]): Every $x \in \Delta^{ESS}$ is asymptotically stable under the replicator dynamics (4).

Proposition 7: $x_{14} \in \Delta^{ESS}$. Moreover, $x_{23} \in \Delta^{ESS}$ if $R < \frac{T+S}{2}$.

Proof: The result for $x_{14}$ is proven in the following; that for $x_{23}$ can be shown similarly. Consider
\[
A x_{14} = \frac{1}{a'_{14}+a'_{41}}\left(a'_{14}a'_{41},\ a'_{24}a'_{41},\ a'_{14}a'_{31},\ a'_{14}a'_{41}\right)^\top.
\]
In view of Lemma 2, $a'_{41} > a'_{31} \ge 0$ and $a'_{14} > a'_{24} \ge 0$. Hence, $a'_{14}a'_{41} > a'_{24}a'_{41},\ a'_{14}a'_{31}$, implying that the maximum element of $A x_{14}$ is $a'_{14}a'_{41}$. Hence, any $y \in \Delta$ satisfying $y_2 = y_3 = 0$ maximizes $y^\top A x_{14}$. So $x_{14}^\top A x_{14}$ equals the maximum of $y^\top A x_{14}$, which implies that (16) is in force. On the other hand, if $x_{14}^\top A x_{14} = y^\top A x_{14}$ for some $y \in \Delta$, then $y$ maximizes $y^\top A x_{14}$. Such a $y$ satisfies $y_2 = y_3 = 0$, which results in
\[
x_{14}^\top A y = \frac{a'^{\,2}_{14} y_4 + a'^{\,2}_{41} y_1}{a'_{14}+a'_{41}}, \qquad y^\top A y = (a'_{14}+a'_{41})\,y_1 y_4. \tag{18}
\]
On the other hand,
\[
[y_4(a'_{14}+a'_{41}) - a'_{41}]^2 \ge 0
\iff a'^{\,2}_{14} y_4 + a'^{\,2}_{41}(1-y_4) \ge (a'_{14}+a'_{41})^2 y_4(1-y_4)
\iff \frac{a'^{\,2}_{14} y_4 + a'^{\,2}_{41} y_1}{a'_{14}+a'_{41}} \ge (a'_{14}+a'_{41}) y_1 y_4.
\]
Hence, in view of (18), $x_{14}^\top A y \ge y^\top A y$.
However, equality holds only when
$$\big[\, y_4 (a'_{14} + a'_{41}) - a'_{41} \,\big]^2 = 0 \;\Rightarrow\; y_4 = \frac{a'_{41}}{a'_{14} + a'_{41}},\quad y_1 = \frac{a'_{14}}{a'_{14} + a'_{41}} \;\Rightarrow\; y = x_{14}.$$
Hence, $x_{14}^\top A y > y^\top A y$ for all such $y \ne x_{14}$. So (17) holds, implying $x_{14} \in \Delta^{ESS}$.

C. Nash equilibria and their relation to convergence points

Call a trajectory $x(t)$ an interior trajectory if $x(0) \in \mathrm{int}(\Delta)$. When investigating the final state (convergence point) of an interior trajectory, several equilibrium points often show up as possible candidates. In what follows, a known game-theoretical result is reviewed to narrow down the candidates. Define $\Delta^{NE}$, the subset of strategies (states) that are in Nash equilibrium with themselves [24, Section 1.5.2], by
$$\Delta^{NE} = \big\{\, x \in \Delta \;\big|\; x^\top A x \ge y^\top A x \ \ \forall y \in \Delta \,\big\}.$$

Lemma 9 ([24, Proposition 3.5]): If an interior trajectory $x(t)$ converges to a point $x^*$, then $x^* \in \Delta^{NE}$.

Similar to Lemma 7, it can be easily verified that $\Delta^{NE}$ is invariant under the addition of a constant to all of the entries of a column of the payoff matrix $A$. Hence, in the derivations below we replace $A$ in the definition of $\Delta^{NE}$ with the simpler-structured payoff matrix $A'$. The following lemma reveals those points of $X_{12}$ and $X_{34}$ that belong to $\Delta^{NE}$.

Lemma 10: Assume (2) holds. Then $X_{34} \cap \Delta^{NE} = \emptyset$. Moreover,
• if $S < R < \frac{T+S}{2}$, or $m = 2n+1$, $n \ge 1$, and $\frac{T+S}{2} \le R < \frac{(n+1)T + nS}{2n+1}$, then $X_{12} \cap \Delta^{NE} = \emptyset$;
• if $m = 2n+1$, $n \ge 1$, and $R = \frac{(n+1)T + nS}{2n+1}$, or $m = 2n$, $n \ge 1$, and $R = \frac{T+S}{2}$, then $X_{12} \cap \Delta^{NE} = \{p_2\}$;
• if $m = 2n+1$, $n \ge 1$, and $\frac{(n+1)T + nS}{2n+1} < R < T$, or $m = 2n$, $n \ge 1$, and $\frac{T+S}{2} < R < T$, then
$$X_{12} \cap \Delta^{NE} = \left\{ \alpha p_1 + (1-\alpha) p_2 \;\middle|\; \alpha \in \left[\, 0,\ \min\left\{ \frac{mR - \lceil \frac{m}{2} \rceil T - \lfloor \frac{m}{2} \rfloor S}{T - R},\ \frac{mR - T - (m-1)P}{m(T - R)},\ 1 \right\} \right] \right\}.$$

Proof: Let $x \in X_{34}$. Then
$$A' x = \big(\, a'_{13} x_3 + a'_{14} x_4,\; a'_{23} x_3 + a'_{24} x_4,\; 0,\; 0 \,\big)^\top.$$
So based on the definition of $\Delta^{NE}$,
$$x \in \Delta^{NE} \;\iff\; a'_{13} x_3 + a'_{14} x_4 \le 0 \ \text{and}\ a'_{23} x_3 + a'_{24} x_4 \le 0.$$
However, in view of Lemma 2, $a'_{13}, a'_{14}, a'_{23}, a'_{24} > 0$. Hence, because $x_3 + x_4 = 1$ and $x_3, x_4 \ge 0$, it can be concluded that $x \notin \Delta^{NE}$. Now let $x \in X_{12}$. Then
$$A' x = \big(\, 0,\; 0,\; a'_{31} x_1 + a'_{32} x_2,\; a'_{41} x_1 + a'_{42} x_2 \,\big)^\top.$$
Then based on the definition of $\Delta^{NE}$, we have
$$x \in \Delta^{NE} \;\iff\; a'_{31} x_1 + a'_{32} x_2 \le 0 \ \text{and}\ a'_{41} x_1 + a'_{42} x_2 \le 0.$$
Moreover, $a'_{41}, a'_{31} > 0$ in view of Lemma 2. So
$$x \in \Delta^{NE} \;\iff\; x_1 + \frac{a'_{32}}{a'_{31}} x_2 \le 0 \ \text{and}\ x_1 + \frac{a'_{42}}{a'_{41}} x_2 \le 0 \;\iff\; 0 \le x_1 \le \min\left\{ -\frac{a'_{32}}{a'_{31}},\, -\frac{a'_{42}}{a'_{41}} \right\},\ x_1 \le 1.$$
Hence, if $\min\left\{ -\frac{a'_{32}}{a'_{31}},\, -\frac{a'_{42}}{a'_{41}} \right\} < 0$, then $x \notin \Delta^{NE}$. Otherwise, $x = \alpha p_1 + (1-\alpha) p_2$, where $\alpha \in \left[\, 0,\ \min\left\{ -\frac{a'_{32}}{a'_{31}},\, -\frac{a'_{42}}{a'_{41}},\, 1 \right\} \right]$. Substituting the values of $a'_{ij}$ from $A'$ into the above expression completes the proof.

The following lemma reveals those singleton boundary equilibria that belong to $\Delta^{NE}$.

Lemma 11: $x_{13}, x_{24} \notin \Delta^{NE}$ and $x_{14} \in \Delta^{NE}$. Moreover, if $S < R < \frac{T+S}{2}$, or $m = 2n+1$, $n \ge 1$, and $R = \frac{T+S}{2}$, then $x_{23} \in \Delta^{NE}$; otherwise, $x_{23} \notin \Delta^{NE}$.

Proof: The sign structure of $A' x_{13}$ is of the form $(+\ +\ +\ +)^\top$. Hence, $(p_4)^\top A' x_{13} > x_{13}^\top A' x_{13}$, so $x_{13} \notin \Delta^{NE}$ by definition. Similarly, $x_{24} \notin \Delta^{NE}$ can be shown. Now the result for $x_{23}$ is proven; that for $x_{14}$ can be done similarly. Define
$$z := A' x_{23} = \big(\, a'_{13} x_3,\; a'_{23} x_3,\; a'_{32} x_2,\; a'_{42} x_2 \,\big)^\top.$$
Let $S < R < \frac{T+S}{2}$, or $m = 2n+1$, $n \ge 1$, and $R = \frac{T+S}{2}$. In view of Lemma 2, $a'_{32} > a'_{42}$ and hence $z_3 > z_4$. Similarly, $z_2 \ge z_1$. Moreover, it can be verified that $z_2 = z_3$. Hence, $z_2 = z_3 = \max_{i \in \{1,\dots,4\}} z_i$, so any $x \in \Delta(p_2, p_3)$ maximizes $x^\top z = x^\top A' x_{23}$ over $\Delta$. Since $x_{23} \in \Delta(p_2, p_3)$, it holds that $x_{23}^\top A' x_{23} \ge y^\top A' x_{23}$ for all $y \in \Delta$. Hence, $x_{23} \in \Delta^{NE}$.
For all other payoffs, either $x_{23} \notin \Delta$ or $z_1 > z_2$. The first case clearly implies $x_{23} \notin \Delta^{NE}$. In the second case, $(p_1)^\top A' x_{23} > x_{23}^\top A' x_{23}$, which rules out $x_{23}$ from $\Delta^{NE}$.

D. Convergence to a line segment implies convergence to a (continuum set of) stationary point(s)

In the analysis of Section III-D1, we often face the situation where we know that a trajectory converges to a line segment. However, for the completeness of our convergence results, we need to know whether the omega limit set of the trajectory is the whole line segment or just part of it. For this purpose, we use a theorem showing that if a trajectory converges to a line segment, it converges to an equilibrium point or a continuum of equilibria on that line segment. Consider a function $y: \mathbb{R} \to \mathbb{R}^n$ and a set $\mathcal{S} \subseteq \mathbb{R}^n$. The expression $y(t) \to \mathcal{S}$ means that for any $\epsilon > 0$, there exists some $M > 0$ such that $t > M \Rightarrow \inf_{s \in \mathcal{S}} \| y(t) - s \| < \epsilon$, where $\|\cdot\|$ denotes an arbitrary norm on $\mathbb{R}^n$.

Theorem 5 (reformulation of Corollary 1 in [68]): Consider the $C^r$, $r \ge 1$, vector field $\dot{y} = f(y)$, $y \in \mathbb{R}^n$. If $y(t)$ converges to a compact simple open curve $\mathcal{L}$, then $y(t)$ converges to an equilibrium point or a continuum of equilibrium points on $\mathcal{L}$.

ACKNOWLEDGMENTS

We would like to thank Dr. Hildeberto Jardón-Kojakhmetov for his technical discussions.

REFERENCES

[1] J. R. Marden and M. Effros, "The price of selfishness in network coding," IEEE Transactions on Information Theory, vol. 58, no. 4, pp. 2349–2361, 2012.
[2] A. Cortés and S. Martinez, "Self-triggered best-response dynamics for continuous games," IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 1115–1120, 2015.
[3] N. Li and J. R. Marden, "Designing games for distributed optimization," IEEE Journal of Selected Topics in Signal Processing, vol. 7, no. 2, pp. 230–242, 2013.
[4] J. R. Marden and T.
Roughgarden, "Generalized efficiency bounds in distributed resource allocation," IEEE Transactions on Automatic Control, vol. 59, no. 3, pp. 571–584, 2014.
[5] N. Li and J. R. Marden, "Decoupling coupled constraints through utility design," IEEE Transactions on Automatic Control, vol. 59, no. 8, pp. 2289–2294, 2014.
[6] E. Altman and Y. Hayel, "Markov decision evolutionary games," IEEE Transactions on Automatic Control, vol. 55, no. 7, pp. 1560–1569, 2010.
[7] H. S. Chang, S. Marcus et al., "Two-person zero-sum Markov games: receding horizon approach," IEEE Transactions on Automatic Control, vol. 48, no. 11, pp. 1951–1961, 2003.
[8] P. Wiecek, E. Altman, and Y. Hayel, "Stochastic state dependent population games in wireless communication," IEEE Transactions on Automatic Control, vol. 56, no. 3, pp. 492–505, 2011.
[9] E. Altman and E. Solan, "Constrained games: the impact of the attitude to adversary's constraints," IEEE Transactions on Automatic Control, vol. 54, no. 10, pp. 2435–2440, 2009.
[10] J. R. Marden, G. Arslan, and J. S. Shamma, "Joint strategy fictitious play with inertia for potential games," IEEE Transactions on Automatic Control, vol. 54, no. 2, pp. 208–220, 2009.
[11] J. S. Shamma and G. Arslan, "Dynamic fictitious play, dynamic gradient play, and distributed convergence to Nash equilibria," IEEE Transactions on Automatic Control, vol. 50, no. 3, pp. 312–327, 2005.
[12] S. D. Bopardikar, A. Borri, J. P. Hespanha, M. Prandini, and M. D. Di Benedetto, "Randomized sampling for large zero-sum games," Automatica, vol. 49, no. 5, pp. 1184–1194, 2013.
[13] P. Guo, Y. Wang, and H. Li, "Algebraic formulation and strategy optimization for a class of evolutionary networked games via semi-tensor product method," Automatica, vol. 49, no. 11, pp. 3384–3389, 2013.
[14] P. Frihauf, M. Krstic, and T. Başar, "Nash equilibrium seeking in non-cooperative games," IEEE Transactions on Automatic Control, vol.
57, no. 5, pp. 1192–1207, 2012.
[15] M. S. Stanković, K. H. Johansson, and D. M. Stipanović, "Distributed seeking of Nash equilibria with applications to mobile sensor networks," IEEE Transactions on Automatic Control, vol. 57, no. 4, pp. 904–919, 2012.
[16] K. G. Vamvoudakis, J. P. Hespanha, B. Sinopoli, and Y. Mo, "Detection in adversarial environments," IEEE Transactions on Automatic Control, vol. 59, no. 12, pp. 3209–3223, 2014.
[17] J. R. Marden, "State based potential games," Automatica, vol. 48, no. 12, pp. 3075–3088, 2012.
[18] T. Mylvaganam, M. Sassano, and A. Astolfi, "Constructive-Nash equilibria for nonzero-sum differential games," IEEE Transactions on Automatic Control, vol. 60, no. 4, pp. 950–965, 2015.
[19] B. Gharesifard and J. Cortés, "Evolution of players' misperceptions in hypergames under perfect observations," IEEE Transactions on Automatic Control, vol. 57, no. 7, pp. 1627–1640, 2012.
[20] P. Ramazi and M. Cao, "Asynchronous decision-making dynamics under best-response update rule in finite heterogeneous populations," IEEE Transactions on Automatic Control, 2017.
[21] P. van den Berg, L. Molleman, and F. J. Weissing, "Focus on the success of others leads to selfish behavior," Proceedings of the National Academy of Sciences, vol. 112, no. 9, pp. 2912–2917, 2015.
[22] M. A. Nowak, Evolutionary Dynamics: Exploring the Equations of Life. Harvard University Press, 2006.
[23] W. H. Sandholm, Population Games and Evolutionary Dynamics. MIT Press, 2010.
[24] J. W. Weibull, Evolutionary Game Theory. MIT Press, 1997.
[25] P. Ramazi, J. Hessel, and M. Cao, "How feeling betrayed affects cooperation," PLoS ONE, vol. 10, no. 4, p. e0122205, 2015.
[26] P. Ramazi, J. Riehl, and M. Cao, "Networks of conforming or nonconforming individuals tend to reach satisfactory decisions," Proceedings of the National Academy of Sciences, vol. 113, no. 46, pp. 12985–12990, 2016.
[27] D. G. Rand, M. A.
Nowak, J. H. Fowler, and N. A. Christakis, "Static network structure can stabilize human cooperation," Proceedings of the National Academy of Sciences, vol. 111, no. 48, pp. 17093–17098, 2014.
[28] V. A. Jansen and M. Van Baalen, "Altruism through beard chromodynamics," Nature, vol. 440, no. 7084, pp. 663–666, 2006.
[29] P. Ramazi, M. Cao, and F. J. Weissing, "Evolutionary dynamics of homophily and heterophily," Scientific Reports, vol. 6, 2016.
[30] P. van den Berg, L. Molleman, and F. J. Weissing, "The social costs of punishment," Behavioral and Brain Sciences, vol. 35, no. 01, pp. 42–43, 2012.
[31] P. Ramazi and M. Cao, "Analysis and control of strategic interactions in finite heterogeneous populations under best-response update rule," in Decision and Control (CDC), 2015 IEEE 54th Annual Conference on. IEEE, 2015, pp. 4537–4542.
[32] M. Van Veelen, J. García, D. G. Rand, and M. A. Nowak, "Direct reciprocity in structured populations," Proceedings of the National Academy of Sciences, vol. 109, no. 25, pp. 9929–9934, 2012.
[33] R. L. Trivers, "The evolution of reciprocal altruism," Quarterly Review of Biology, pp. 35–57, 1971.
[34] M. Nowak and K. Sigmund, "The evolution of stochastic strategies in the prisoner's dilemma," Acta Applicandae Mathematicae, vol. 20, no. 3, pp. 247–265, 1990.
[35] C. A. Ioannou, "Asymptotic behavior of strategies in the repeated prisoner's dilemma game in the presence of errors," Artificial Intelligence Research, vol. 3, no. 4, p. 28, 2014.
[36] C. Hilbe, M. A. Nowak, and K. Sigmund, "Evolution of extortion in iterated prisoner's dilemma games," Proceedings of the National Academy of Sciences, vol. 110, no. 17, pp. 6913–6918, 2013.
[37] J. Grujić, B. Eke, A. Cabrales, J. A. Cuesta, and A. Sánchez, "Three is a crowd in iterated prisoner's dilemmas: experimental evidence on reciprocal behavior," Scientific Reports, vol. 2, 2012.
[38] J.
Lorberbaum, "No strategy is evolutionarily stable in the repeated prisoner's dilemma," Journal of Theoretical Biology, vol. 168, no. 2, pp. 117–130, 1994.
[39] L. A. Imhof, D. Fudenberg, and M. A. Nowak, "Evolutionary cycles of cooperation and defection," Proceedings of the National Academy of Sciences of the United States of America, vol. 102, no. 31, pp. 10797–10800, 2005.
[40] H. Qi, S. Ma, N. Jia, and G. Wang, "Experiments on individual strategy updating in iterated snowdrift game under random rematching," Journal of Theoretical Biology, vol. 368, pp. 1–12, 2015.
[41] R. Kümmerli, C. Colliard, N. Fiechter, B. Petitpierre, F. Russier, and L. Keller, "Human cooperation in social dilemmas: comparing the snowdrift game with the prisoner's dilemma," Proceedings of the Royal Society of London B: Biological Sciences, vol. 274, no. 1628, pp. 2965–2970, 2007.
[42] C. Wang, B. Wu, M. Cao, and G. Xie, "Modified snowdrift games for multi-robot water polo matches," in Proc. of the 24th Chinese Control and Decision Conference (CCDC), 2012, pp. 164–169.
[43] C. Hauert and M. Doebeli, "Spatial structure often inhibits the evolution of cooperation in the snowdrift game," Nature, vol. 428, no. 6983, pp. 643–646, 2004.
[44] M. Doebeli, C. Hauert, and T. Killingback, "The evolutionary origin of cooperators and defectors," Science, vol. 306, no. 5697, pp. 859–862, 2004.
[45] C. Hauert, F. Michor, M. A. Nowak, and M. Doebeli, "Synergy and discounting of cooperation in social dilemmas," Journal of Theoretical Biology, vol. 239, no. 2, pp. 195–202, 2006.
[46] D. Madeo and C. Mocenni, "Game interactions and dynamics on networked populations," IEEE Transactions on Automatic Control, vol. 60, no. 7, pp. 1801–1810, 2014.
[47] J. Qin, Y. Chen, W. Fu, Y. Kang, and M. Perc, "Neighborhood diversity promotes cooperation in social dilemmas," IEEE Access, vol. 6, pp. 5003–5009, 2017.
[48] R.
Axelrod, "Effective choice in the prisoner's dilemma," Journal of Conflict Resolution, vol. 24, no. 1, pp. 3–25, 1980.
[49] ——, "More effective choice in the prisoner's dilemma," Journal of Conflict Resolution, vol. 24, no. 3, pp. 379–403, 1980.
[50] F. Dubois and L.-A. Giraldeau, "The forager's dilemma: Food sharing and food defense as risk-sensitive foraging options," The American Naturalist, vol. 162, no. 6, pp. 768–779, 2003.
[51] N. Ben Khalifa, R. El-Azouzi, and Y. Hayel, "Delayed evolutionary game dynamics with non-uniform interactions in two communities," in Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on. IEEE, 2014, pp. 3809–3814.
[52] V. S. Borkar and P. R. Kumar, "Dynamic Cesaro-Wardrop equilibration in networks," IEEE Transactions on Automatic Control, vol. 48, no. 3, pp. 382–396, 2003.
[53] I. Brunetti, Y. Hayel, and E. Altman, "State policy couple dynamics in evolutionary games," in American Control Conference (ACC), 2015. IEEE, 2015, pp. 1758–1763.
[54] B. Drighes, W. Krichene, and A. Bayen, "Stability of Nash equilibria in the congestion game under replicator dynamics," in Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on. IEEE, 2014, pp. 1923–1929.
[55] J. Hofbauer and K. Sigmund, Evolutionary Games and Population Dynamics. Cambridge University Press, 1998.
[56] O. Diekmann and S. van Gils, "On the cyclic replicator equation and the dynamics of semelparous populations," SIAM Journal on Applied Dynamical Systems, vol. 8, no. 3, pp. 1160–1189, 2009.
[57] E. Zeeman and M. Zeeman, "From local to global behavior in competitive Lotka-Volterra systems," Transactions of the American Mathematical Society, pp. 713–734, 2003.
[58] I. M. Bomze, "Lotka-Volterra equation and replicator dynamics: new issues in classification," Biological Cybernetics, vol. 72, no. 5, pp. 447–453, 1995.
[59] R. Selten and P.
Hammerstein, "Gaps in Harley's argument on evolutionarily stable learning rules and in the logic of 'tit for tat'," Behavioral and Brain Sciences, vol. 7, no. 1, pp. 115–116, 1984.
[60] J. García and M. van Veelen, "In and out of equilibrium I: Evolution of strategies in repeated games with discounting," Journal of Economic Theory, vol. 161, pp. 161–189, 2016.
[61] L. Samuelson and J. M. Swinkels, "Evolutionary stability and lexicographic preferences," Games and Economic Behavior, vol. 44, no. 2, pp. 332–342, 2003.
[62] G. Theodorakopoulos, J.-Y. Le Boudec, and J. S. Baras, "Selfish response to epidemic propagation," IEEE Transactions on Automatic Control, vol. 58, no. 2, pp. 363–376, 2012.
[63] G. Obando, A. Pantoja, and N. Quijano, "Building temperature control based on population dynamics," IEEE Transactions on Control Systems Technology, vol. 22, no. 1, pp. 404–412, 2013.
[64] P. Wiecek, E. Altman, and Y. Hayel, "Stochastic state dependent population games in wireless communication," IEEE Transactions on Automatic Control, vol. 56, no. 3, pp. 492–505, 2010.
[65] S. Wiggins, Introduction to Applied Nonlinear Dynamical Systems and Chaos. Springer Science & Business Media, 2003, vol. 2.
[66] P. Ramazi and M. Cao, "Stability analysis for replicator dynamics of evolutionary snowdrift games," in Decision and Control (CDC), 2014 IEEE 53rd Annual Conference on. IEEE, 2014, pp. 4515–4520.
[67] I. M. Bomze, "Lotka-Volterra equation and replicator dynamics: a two-dimensional classification," Biological Cybernetics, vol. 48, no. 3, pp. 201–211, 1983.
[68] P. Ramazi, H. Jardón-Kojakhmetov, and M. Cao, "Limit sets of trajectories converging to curves," Applied Mathematics Letters, under review.

Pouria Ramazi received the B.S. degree in electrical engineering in 2010 from the University of Tehran, Iran, the M.S. degree in systems, control and robotics in 2012 from the Royal Institute of Technology, Sweden, and the Ph.D.
degree in systems and control in 2017 from the University of Groningen, the Netherlands. He is currently a joint Postdoctoral Research Associate with the Departments of Mathematical and Statistical Sciences and Computing Science of the University of Alberta.

Ming Cao is currently a professor of systems and control with the Engineering and Technology Institute (ENTEG) at the University of Groningen, the Netherlands, where he started as a tenure-track assistant professor in 2008. He received the Bachelor degree in 1999 and the Master degree in 2002 from Tsinghua University, Beijing, China, and the PhD degree in 2007 from Yale University, New Haven, CT, USA, all in electrical engineering. From September 2007 to August 2008, he was a postdoctoral research associate with the Department of Mechanical and Aerospace Engineering at Princeton University, Princeton, NJ, USA. He worked as a research intern during the summer of 2006 with the Mathematical Sciences Department at the IBM T. J. Watson Research Center, NY, USA. He is the 2017 and inaugural recipient of the Manfred Thoma Medal from the International Federation of Automatic Control (IFAC) and the 2016 recipient of the European Control Award sponsored by the European Control Association (EUCA). He is an associate editor for IEEE Transactions on Automatic Control, IEEE Transactions on Circuits and Systems, and Systems & Control Letters, and for the Conference Editorial Board of the IEEE Control Systems Society. He is also a member of the IFAC Technical Committee on Networked Systems. His main research interest is in autonomous agents and multi-agent systems, mobile sensor networks, and complex networks.