Evolution of cooperation with Q-learning: the impact of information perception

Notice: This research summary and analysis were automatically generated using AI technology. For accuracy, please refer to the original arXiv source.

The inherent complexity of human beings manifests in a remarkable diversity of responses to intricate environments, enabling us to approach problems from varied perspectives. However, in the study of cooperation, existing research within the reinforcement learning framework often assumes that individuals have access to identical information when making decisions, which contrasts with the reality that individuals frequently perceive information differently. In this study, we employ the Q-learning algorithm to explore the impact of information perception on the evolution of cooperation in a two-person Prisoner’s Dilemma game. We demonstrate that the evolutionary processes differ significantly across three distinct information perception scenarios, highlighting the critical role of information structure in the emergence of cooperation. Notably, the asymmetric information scenario reveals a complex dynamical process, including the emergence, breakdown, and reconstruction of cooperation, mirroring psychological shifts observed in human behavior. Our findings underscore the importance of information structure in fostering cooperation, offering new insights into the establishment of stable cooperative relationships among humans.


💡 Research Summary

This paper investigates how differences in information perception affect the evolution of cooperation using a reinforcement‑learning framework, specifically the Q‑learning algorithm, in a two‑player Prisoner’s Dilemma (PD) setting. The authors argue that most prior RL studies on cooperation assume symmetric information—both agents have access to the same type of data—whereas real humans often perceive information asymmetrically due to age, experience, culture, status, or other contextual factors. To address this gap, they construct three distinct information‑perception schemes: (I) “You+You,” where both agents observe the opponent’s previous action; (II) “Me+Me,” where both agents base their state on their own previous action; and (III) “You+Me,” an asymmetric configuration in which one agent observes the opponent’s action while the other uses its own past action.
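The three schemes above differ only in which previous action each agent uses as its Q-table state. A minimal sketch of that mapping (names and encoding are illustrative, not taken from the paper's code) might look like:

```python
C, D = 0, 1  # cooperate / defect, encoded as integers

def perceived_states(scheme, prev_a1, prev_a2):
    """Return the (state of agent 1, state of agent 2) pair for one round.

    Scheme "I"   ("You+You"): each agent observes the opponent's last action.
    Scheme "II"  ("Me+Me"):   each agent observes its own last action.
    Scheme "III" ("You+Me"):  agent 1 observes the opponent's last action,
                              agent 2 observes its own last action.
    """
    if scheme == "I":
        return prev_a2, prev_a1
    if scheme == "II":
        return prev_a1, prev_a2
    if scheme == "III":
        return prev_a2, prev_a2
    raise ValueError(f"unknown scheme: {scheme}")
```

In scheme III both agents happen to condition on agent 2's previous action, but for different reasons: agent 1 sees it as "the opponent's move", agent 2 as "my own move".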

The PD payoff matrix follows the standard ordering T > R > P > S with T + S < 2R, instantiated as R = 1.0, S = −b, T = 1 + b, P = 0, where b > 0 quantifies the dilemma strength. Each agent maintains a 2 × 2 Q‑table (states C or D, actions C or D) and updates it via the temporal‑difference rule Q(s,a) ← (1 − α)Q(s,a) + α[r + γ max_{a′} Q(s′,a′)], where α is the learning rate, γ the discount factor, r the payoff received this round, and s′ the next perceived state.
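The payoff matrix and update rule above can be sketched as follows. This is a minimal illustration, assuming an ε-greedy action rule and default hyperparameter values not specified in the summary:

```python
import random

def payoffs(a1, a2, b=0.2):
    """PD payoffs as in the paper: R = 1.0, S = -b, T = 1 + b, P = 0."""
    table = {(0, 0): (1.0, 1.0),   # both cooperate: (R, R)
             (0, 1): (-b, 1 + b),  # 1 cooperates, 2 defects: (S, T)
             (1, 0): (1 + b, -b),  # 1 defects, 2 cooperates: (T, S)
             (1, 1): (0.0, 0.0)}   # both defect: (P, P)
    return table[(a1, a2)]

class QAgent:
    """One agent with a 2x2 Q-table: rows are states (C/D), columns actions (C/D)."""

    def __init__(self, alpha=0.1, gamma=0.9, eps=0.05):
        self.Q = [[0.0, 0.0], [0.0, 0.0]]
        self.alpha, self.gamma, self.eps = alpha, gamma, eps

    def act(self, s):
        # epsilon-greedy: explore with probability eps, otherwise be greedy
        if random.random() < self.eps:
            return random.randrange(2)
        return 0 if self.Q[s][0] >= self.Q[s][1] else 1

    def update(self, s, a, r, s_next):
        # Q(s,a) <- (1 - alpha) Q(s,a) + alpha [r + gamma * max_a' Q(s',a')]
        td_target = r + self.gamma * max(self.Q[s_next])
        self.Q[s][a] = (1 - self.alpha) * self.Q[s][a] + self.alpha * td_target
```

A simulation loop would then pair two `QAgent`s, derive each agent's state from the previous round according to the chosen perception scheme, and apply `update` with the payoffs from `payoffs`.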

