How cooperation emerges and persists in a population of selfish agents is a fundamental question in evolutionary game theory. Our research shows that Collective Strategies with Master-Slave Mechanism (CSMSM) defeat Tit-for-Tat and other well-known strategies in the spatial iterated prisoner's dilemma (IPD). A CSMSM identifies kin members by means of a handshaking mechanism. If the opponent is identified as non-kin, a CSMSM always defects. When two CSMSMs meet, they take master and slave roles: the master defects and the slave cooperates in order to maximize the master's payoff. CSMSM outperforms non-collective strategies in spatial IPD even if there is only a small cluster of CSMSMs in the population. The existence and performance of CSMSM in the spatial iterated prisoner's dilemma suggest that cooperation first appears and persists in a group of collective agents.
The Prisoner's Dilemma is a two-player non-zero-sum game in which two players try to maximize their payoffs by cooperating with or betraying the other player. In the classical version of prisoner's dilemma, each player chooses between two strategies, Cooperate (C) and Defect (D). Their payoffs can be represented by the matrix shown in Figure 1.
In the payoff matrix, R, S, T, and P denote Reward for mutual cooperation, Sucker’s payoff, Temptation to defect, and Punishment for mutual defection respectively, and T > R > P > S. The constraint motivates each player to play non-cooperatively.
When both players are rational and make their choices independently, the theoretical outcome of the game is a Nash equilibrium in which both players choose to defect and each receives the ‘Punishment for mutual defection’. This outcome is worse for each player than the one they would have received if they had both cooperated [1,2]. In the Iterated Prisoner’s Dilemma (IPD), two players choose their moves repeatedly, and they remember their own previous moves and those of the opponent. The payoffs additionally satisfy R > (S + T)/2, which prevents any incentive to alternate between cooperation and defection. IPD is considered to be an ideal experimental platform for the evolution of cooperation among selfish individuals, and it has attracted wide interest since Robert Axelrod’s IPD tournaments and his book ‘The Evolution of Cooperation’ [3].
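The two payoff constraints and the one-shot dominance of defection can be checked with a short sketch. The numeric values T = 5, R = 3, P = 1, S = 0 are an assumption (the values commonly used in Axelrod-style tournaments); the text itself only requires the two inequalities. The names `is_valid_ipd`, `PAYOFF`, and `best_reply` are illustrative.

```python
# Illustrative payoff values (assumed, not taken from the paper);
# the text only requires T > R > P > S and R > (S + T) / 2.
T, R, P, S = 5, 3, 1, 0


def is_valid_ipd(t, r, p, s):
    """Return True if the payoffs satisfy both IPD constraints."""
    return t > r > p > s and 2 * r > s + t


# Payoff to a player for each (own move, opponent move) pair.
PAYOFF = {
    ("C", "C"): R,  # reward for mutual cooperation
    ("C", "D"): S,  # sucker's payoff
    ("D", "C"): T,  # temptation to defect
    ("D", "D"): P,  # punishment for mutual defection
}


def best_reply(opponent_move):
    """One-shot best response: the move with the higher payoff."""
    return max("CD", key=lambda move: PAYOFF[(move, opponent_move)])
```

Under these assumed values, `best_reply` returns "D" whatever the opponent plays, which is why mutual defection is the one-shot Nash equilibrium even though mutual cooperation pays each player more.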
If the precise length of an IPD is known to the players, the best strategy for both players is to defect in each move. This conclusion follows from backward induction: both players will choose to defect in the final iteration because the opponent will not be able to punish the player afterwards. Given mutual defection in the final iteration, the optimal strategy in the penultimate iteration is also defection for both players, and so on, back to the initial iteration. If the length of an IPD is infinite or unknown, mutual cooperation can also be an equilibrium.
Axelrod was the first to study efficient IPD strategies by means of competitions [4,5]. Tit-for-Tat (TFT) always cooperates in the first move and then mimics whatever the opponent did in the previous move. According to Axelrod, several characteristics make TFT successful: TFT is Nice, Retaliating and Forgiving. TFT is not a Nash equilibrium and there is always a sub-game perfect equilibrium that dominates TFT, according to the Folk Theorem in game theory [6,7]. On the other hand, whether or not TFT is the most efficient strategy in IPD is still unclear. Some strategies perform better than TFT in specific environments [8,9,10,11,12]. Therefore, researchers are attempting to develop novel strategies that can outperform TFT either in round-robin tournaments or in evolutionary dynamics.
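As a minimal sketch of the TFT rule described above, the following plays it against an unconditional defector. The payoff values and the names `tit_for_tat`, `always_defect`, and `play` are assumptions introduced for illustration; they are not taken from any tournament code.

```python
# Illustrative payoffs (assumed): T = 5, R = 3, P = 1, S = 0.
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}


def tit_for_tat(my_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def always_defect(my_history, opp_history):
    """Defect unconditionally."""
    return "D"


def play(strategy_a, strategy_b, rounds):
    """Play an IPD of fixed length and return both total scores."""
    hist_a, hist_b = [], []
    score_a = score_b = 0
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        score_a += PAYOFF[(a, b)]
        score_b += PAYOFF[(b, a)]
        hist_a.append(a)
        hist_b.append(b)
    return score_a, score_b
```

With these assumed payoffs, `play(tit_for_tat, always_defect, 10)` gives (9, 14): TFT is exploited once and then retaliates. Two TFT players score (30, 30) over the same ten rounds, which illustrates why TFT does well overall without ever beating its direct opponent.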
In recent IPD competitions, strategies have appeared with identification mechanisms. With a rule-based identification mechanism, a strategy called APavlov won competition four of the 2005 IPD competition [13]. Furthermore, many of the top listed strategies somehow explore the opponent by using simple mechanisms [14,15]. This shows that strategies that explore and then exploit the opponent can outperform any single non-group strategy in round-robin IPD tournaments.
A strategy with a simple identification mechanism, named ‘handshake’ [16], appeared in evolutionary IPD. This strategy defects at the first move and cooperates on the second move. If the opponent behaves in the same way, it will keep cooperating in the following moves. Otherwise, it will always defect. This ‘initial defect then cooperate’ can be seen as a password. Any strategy that knows this password (or behaves the same by chance) may evoke handshake’s cooperation while other actions trigger defection. By means of a mechanism like ‘handshake’, a group of strategies are able to recognize each other and then behave collectively [17,18].
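The ‘handshake’ rule described above can be sketched as follows. This is an illustrative reading of the description, not the original implementation from [16]; the function names and the helper `moves` are assumptions.

```python
def handshake(my_history, opp_history):
    """Defect, then cooperate; afterwards cooperate only if the
    opponent opened with the same 'defect then cooperate' password."""
    turn = len(my_history)
    if turn == 0:
        return "D"
    if turn == 1:
        return "C"
    return "C" if opp_history[:2] == ["D", "C"] else "D"


def tit_for_tat(my_history, opp_history):
    """Cooperate first, then copy the opponent's previous move."""
    return opp_history[-1] if opp_history else "C"


def moves(strategy_a, strategy_b, rounds):
    """Return the move sequences of both players as strings."""
    hist_a, hist_b = [], []
    for _ in range(rounds):
        a = strategy_a(hist_a, hist_b)
        b = strategy_b(hist_b, hist_a)
        hist_a.append(a)
        hist_b.append(b)
    return "".join(hist_a), "".join(hist_b)
```

Two handshake players produce "DCCCCC" each over six rounds, so they pay a one-round cost to recognize each other and then cooperate. Against TFT the password fails, and handshake plays "DCDDDD" while TFT plays "CDCDDD".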
In the 2004 IPD competition, a team from Southampton University led by Jennings introduced a group of strategies that outperformed all singleton strategies and won the top three positions. These strategies were designed to recognize each other through a predetermined sequence of 5-10 moves at the start. Once two Southampton players recognized each other, they would act as a ‘master’ or ‘slave’: a master would always defect while a slave would always cooperate, so that the master won the maximum points. If the opponent was recognized as not being a Southampton entry, the Southampton player would immediately defect to minimize the score of the opponent [19]. The Southampton strategies were designed to maximize the payoffs of a small number of masters so that they could win in a round-robin IPD tournament. The slaves performed poorly and generally received lower payoffs than their opponents. This master-slave scheme is ineffective in evolutionary dynamics because the slaves quickly die out and the masters then lose the advantage of exploiting the slaves.
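The payoff asymmetry behind this scheme can be made concrete with a small sketch. It models only the rounds after recognition and uses assumed payoff values (T = 5, R = 3, P = 1, S = 0); the function names are hypothetical and the recognition sequence itself is not modelled.

```python
# Illustrative payoffs (assumed, not taken from the competition).
T, R, P, S = 5, 3, 1, 0


def master_slave_scores(rounds):
    """Scores after recognition: the master defects every round
    against a slave that cooperates every round."""
    return rounds * T, rounds * S


def mutual_cooperation_scores(rounds):
    """Scores of two players that cooperate in every round."""
    return rounds * R, rounds * R


def pair_average(scores):
    """Average score of the two players in a pairing."""
    return sum(scores) / 2
```

Over 100 rounds under these assumptions, the master collects 500 points and the slave 0, against 300 each for two mutual cooperators. Because R > (S + T)/2, the master-slave pair averages less than a cooperating pair (250 versus 300): the scheme concentrates payoff in the master rather than raising the group total, which is consistent with slaves dying out under evolutionary dynamics.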
In thi