Aspiration Learning in Coordination Games

Abstract

We consider the problem of distributed convergence to efficient outcomes in coordination games through dynamics based on aspiration learning. Under aspiration learning, a player continues to play an action as long as the rewards received exceed a specified aspiration level. Here, the aspiration level is a fading-memory average of past rewards, and these levels are also subject to occasional random perturbations. A player becomes dissatisfied whenever a received reward is less than the aspiration level, in which case the player experiments with a probability proportional to the degree of dissatisfaction. Our first contribution is the characterization of the asymptotic behavior of the induced Markov chain of the iterated process in terms of an equivalent finite-state Markov chain. We then characterize explicitly the behavior of the proposed aspiration learning in a generalized version of coordination games, examples of which include network formation and common-pool games. In particular, we show that in generic coordination games the frequency at which an efficient action profile is played can be made arbitrarily large. Although convergence to efficient outcomes is desirable, in several coordination games, such as common-pool games, the attainability of fair outcomes, i.e., sequences of play in which players experience highly rewarding returns with the same frequency, may also be of special interest. To this end, we demonstrate through analysis and simulations that aspiration learning also establishes fair outcomes in all symmetric coordination games, including common-pool games.


💡 Research Summary

The paper investigates how a simple, distributed learning rule known as aspiration learning can drive a large population of agents playing a coordination game toward efficient and, in symmetric settings, fair outcomes. In the proposed scheme, each agent maintains an aspiration level: a fading-memory average of past payoffs that is occasionally perturbed by small random noise. At each period the agent compares the received payoff to its current aspiration. If the payoff falls short, the agent becomes dissatisfied; the degree of dissatisfaction determines the probability of experimentation, i.e., of randomly selecting a new action (“lose-shift”). If the payoff meets or exceeds the aspiration, the agent repeats its current action (“win-stay”). The aspiration level is then updated as a convex combination of its previous value and the observed payoff, plus the occasional perturbation.
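
The update is simple enough to state in a few lines. Below is a minimal sketch of a single agent's step, assuming payoffs normalized to [0, 1]; the parameter names (beta, perturb_rate) and the exact form of the experimentation probability are illustrative placeholders rather than the paper's exact parametrization.

```python
import random

def aspiration_step(action, aspiration, payoff, actions,
                    beta=0.1, perturb_rate=0.01):
    """One step of aspiration learning for a single agent (illustrative sketch).

    beta         - step size of the fading-memory average (assumed name)
    perturb_rate - chance of a small random aspiration perturbation (assumed)
    """
    # Degree of dissatisfaction: how far the payoff fell short of the aspiration.
    dissatisfaction = max(0.0, aspiration - payoff)

    # "Lose-shift": experiment with probability proportional to dissatisfaction;
    # "win-stay": if the payoff meets the aspiration, keep the current action.
    if random.random() < min(1.0, dissatisfaction):
        action = random.choice(actions)

    # Fading-memory average: convex combination of the old aspiration and payoff.
    aspiration = (1.0 - beta) * aspiration + beta * payoff

    # Occasional small random perturbation of the aspiration level.
    if random.random() < perturb_rate:
        aspiration += random.uniform(-0.05, 0.05)

    return action, aspiration
```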

The authors first show that the resulting stochastic process is a Markov chain on a hybrid state space (discrete actions, continuous aspirations). By letting the experimentation probability ε become sufficiently small, they prove that this infinite-state chain is stochastically equivalent to a finite-state Markov chain that distinguishes only between “satisfied” and “dissatisfied” meta-states. This reduction enables a tractable analysis of the invariant distribution: as ε → 0, the unique stationary distribution concentrates arbitrarily large mass on the set Ā of payoff-dominant action profiles. Consequently, the long-run frequency with which an efficient profile is played can be made as close to one as desired by choosing a small enough step size in the aspiration update and a sufficiently low perturbation rate.
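
This finite-state reduction is what makes the long-run behavior computable: the invariant distribution of a finite chain is the normalized left eigenvector of its transition matrix at eigenvalue 1. The toy computation below uses made-up numbers rather than the paper's actual chain, but it illustrates the concentration effect: one near-absorbing “satisfied” meta-state collects almost all stationary mass as the perturbation scale eps shrinks.

```python
import numpy as np

# Toy 3-state chain over meta-states (illustrative numbers only):
# state 0 plays the role of "satisfied at an efficient profile in A-bar";
# states 1 and 2 are dissatisfied meta-states. The small eps entries stand
# in for the vanishing experimentation/perturbation rates.
eps = 1e-3
P = np.array([
    [1 - eps,  eps / 2,    eps / 2  ],
    [0.5,      0.5 - eps,  eps      ],
    [0.6,      eps,        0.4 - eps],
])

# Stationary distribution: left eigenvector of P for eigenvalue 1.
w, v = np.linalg.eig(P.T)
pi = np.real(v[:, np.argmax(np.real(w))])
pi /= pi.sum()
print(pi)  # as eps -> 0, nearly all mass sits on state 0
```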

The paper then introduces a generalized class of coordination games. A game belongs to this class if (a) there exists a non-empty set Ā of action profiles that payoff-dominates all other profiles; (b) any profile outside the union of the Nash equilibria and Ā admits at least one player with a better reply that does not hurt the others; and (c) any Nash equilibrium outside Ā can be reached from some other profile through a sequence of unilateral improvements. This definition subsumes the classic Stag-Hunt, network formation, and common-pool games.
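
Condition (a) is easy to test mechanically in small matrix games. The sketch below checks a simplified version of it (dominance by a single profile rather than by a set Ā) on a two-player Stag-Hunt; the payoff numbers are standard textbook values, not taken from the paper.

```python
import itertools
import numpy as np

# Stag-Hunt payoffs (row player, column player); actions: 0 = Stag, 1 = Hare.
U = np.array([[[4, 4], [0, 3]],
              [[3, 0], [3, 3]]])

def payoff_dominant_profiles(U):
    """Return the profiles whose payoff vector weakly dominates that of every
    other profile (a simplified, single-profile reading of condition (a))."""
    profiles = list(itertools.product(range(U.shape[0]), range(U.shape[1])))
    return [a for a in profiles
            if all(np.all(U[a] >= U[b]) for b in profiles if b != a)]

print(payoff_dominant_profiles(U))  # [(0, 0)]: (Stag, Stag) dominates
```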

Two main theorems are established for this class. The first theorem states that, under aspiration learning with sufficiently small ε and learning rate β, the stationary distribution of the induced Markov chain places arbitrarily large probability on Ā. Hence, efficient outcomes are selected with high frequency. The second theorem deals with symmetric coordination games, where all players have identical action sets and payoff functions. In such games, if Ā contains several equally efficient profiles, the stationary distribution becomes uniform over Ā. This uniformity yields a notion of fairness: each efficient profile is visited with the same long-run frequency, which is particularly relevant for common-pool scenarios where users should obtain equal access to a shared resource.

To validate the theory, the authors conduct simulations in two representative domains. In a network formation game, each node chooses a set of outgoing links; the payoff balances connectivity (a binary indicator of whether a directed path exists to every other node) against the cost of maintaining links. Aspiration learning drives the network to a “critical” connected topology that minimizes link cost while preserving full connectivity, confirming the efficiency claim. In a common‑pool game, multiple users compete for a single channel; successful transmission yields a payoff of 1, collision yields 0. The learning dynamics lead to a rotating schedule where each user experiences successful transmissions with equal long‑run frequency, illustrating the fairness result.
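
To make the common-pool experiment concrete, the following toy simulation applies the win-stay/lose-shift rule sketched earlier to users contending for a single channel. The parameter values, the uniform re-sampling of actions, and the perturbation scheme are illustrative assumptions rather than the paper's exact setup; by symmetry, the long-run success frequencies it reports should come out roughly equal, mirroring the rotating schedule described above.

```python
import random

def simulate_channel_game(n_users=2, steps=20000, beta=0.05, perturb_rate=0.01):
    """Toy common-pool (medium access) game under aspiration learning.

    Each user transmits (1) or waits (0); a lone transmitter earns payoff 1,
    while collisions and waiting earn 0. Parameters are assumed, not the paper's.
    """
    actions = [random.randint(0, 1) for _ in range(n_users)]
    aspirations = [0.5] * n_users
    successes = [0] * n_users
    for _ in range(steps):
        transmitters = [i for i, a in enumerate(actions) if a == 1]
        for i in range(n_users):
            payoff = 1.0 if transmitters == [i] else 0.0
            successes[i] += int(payoff)
            # Lose-shift: experiment with probability equal to the dissatisfaction.
            if random.random() < max(0.0, aspirations[i] - payoff):
                actions[i] = random.randint(0, 1)
            # Fading-memory aspiration update plus an occasional perturbation.
            aspirations[i] = (1.0 - beta) * aspirations[i] + beta * payoff
            if random.random() < perturb_rate:
                aspirations[i] += random.uniform(-0.05, 0.05)
    return [s / steps for s in successes]

print(simulate_channel_game())  # per-user success frequencies, roughly equal
```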

The paper’s contributions are threefold: (1) a rigorous Markov‑chain equivalence that reduces an infinite‑state learning process to a tractable finite‑state model; (2) explicit convergence results for a broad class of coordination games, guaranteeing that efficient profiles dominate the stationary distribution; and (3) a fairness guarantee for symmetric games, showing that aspiration learning naturally balances the long‑run play among multiple efficient outcomes. Compared with earlier works that focused on two‑player, two‑action games or required explicit best‑response computations, this study provides a scalable, fully distributed mechanism suitable for large‑scale wireless, networking, or resource‑allocation systems.

Future directions suggested include extending the analysis to time‑varying payoff structures, limited observation settings where agents cannot directly monitor others’ actions, and asymmetric games where fairness notions must be redefined. Overall, the work demonstrates that a simple “win‑stay, lose‑shift” rule augmented with fading‑memory aspirations and occasional perturbations can simultaneously achieve efficiency and fairness in complex multi‑agent coordination problems.

