Strategic Teaching and Learning in Games

It is known that there are uncoupled learning heuristics leading to Nash equilibrium in all finite games. Why should players use such learning heuristics and where could they come from? We show that there is no uncoupled learning heuristic leading to Nash equilibrium in all finite games that a player has an incentive to adopt, that would be evolutionarily stable or that could “learn itself”. Rather, a player has an incentive to strategically teach such a learning opponent in order to secure at least the Stackelberg leader payoff. The impossibility result remains intact when restricted to the classes of generic games, two-player games, potential games, games with strategic complements or 2x2 games, in which learning is known to be “nice”. More generally, it also applies to uncoupled learning heuristics leading to correlated equilibria, rationalizable outcomes, iterated admissible outcomes, or minimal curb sets. A possibility result restricted to “strategically trivial” games fails if some generic games outside this class are considered as well.

💡 Research Summary

The paper tackles a fundamental question in learning in games: even though uncoupled learning heuristics that converge to Nash equilibrium exist for all finite games, why should rational players actually adopt such heuristics, and where could they plausibly originate? The authors answer by showing a series of impossibility results that rule out any uncoupled learning rule that (i) provides a strategic incentive for a player to use it, (ii) is evolutionarily stable, or (iii) can be “learned” in a meta‑learning sense.

The core argument proceeds in three steps. First, the authors construct a class of “teaching” strategies. In a generic two‑player game that contains a Stackelberg‑type structure, a player can deliberately manipulate the opponent’s learning process by playing actions that steer the opponent’s uncoupled heuristic toward a predictable pattern. Once the opponent’s behavior is predictable, the teacher can exploit it and secure at least the Stackelberg leader payoff, which in such games exceeds the payoff obtained by simply following the uncoupled heuristic itself. Consequently, any uncoupled rule that guarantees convergence to Nash equilibrium cannot be a best response for a rational player; the player has a strict incentive to deviate and “teach” the opponent instead.
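The gap between the leader payoff and the equilibrium payoff can be made concrete with a small example. The sketch below is illustrative only: the 2 × 2 payoff matrices `A` and `B` and the helper names are assumptions, not taken from the paper, and the leader is restricted to pure commitments with ties broken by lowest index. In this hypothetical game the row player has a dominant action, so the unique pure Nash equilibrium gives the row player 2, while committing to the dominated action induces a different reply from the column player and yields 3.

```python
# Hypothetical 2x2 game (illustrative payoffs, not from the paper).
# A[i][j]: row player's payoff, B[i][j]: column player's payoff.
A = [[2, 4],
     [1, 3]]
B = [[1, 0],
     [0, 1]]


def best_response_col(B, i):
    """Column player's best reply to row action i (ties: lowest index)."""
    return max(range(len(B[i])), key=lambda j: B[i][j])


def pure_nash(A, B):
    """All pure-strategy Nash equilibria as (row, col) index pairs."""
    rows, cols = len(A), len(A[0])
    equilibria = []
    for i in range(rows):
        for j in range(cols):
            row_ok = all(A[i][j] >= A[k][j] for k in range(rows))
            col_ok = all(B[i][j] >= B[i][l] for l in range(cols))
            if row_ok and col_ok:
                equilibria.append((i, j))
    return equilibria


def stackelberg_pure(A, B):
    """Best pure commitment for the row player as leader.

    Returns (row action, follower's reply, leader payoff).
    """
    lead = max(range(len(A)), key=lambda i: A[i][best_response_col(B, i)])
    reply = best_response_col(B, lead)
    return lead, reply, A[lead][reply]


nash = pure_nash(A, B)
leader = stackelberg_pure(A, B)
nash_payoff_row = A[nash[0][0]][nash[0][1]]
```

Here `nash` is `[(0, 0)]` with a row payoff of 2, and `leader` is `(1, 1, 3)`. A teacher facing an opponent whose learning rule eventually best-responds to persistent play can aim for the second outcome by sticking to action 1, which is the intuition behind the teaching strategies described above.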

Second, the paper embeds this deviation into an evolutionary framework. Using replicator dynamics and Bayesian evolutionary models, the authors demonstrate that a population of agents employing an uncoupled heuristic is vulnerable to invasion by a small fraction of “teacher” agents. Because teachers earn higher expected payoffs, their frequency grows over time, eventually eliminating the uncoupled learners. Hence the uncoupled rule fails the criterion of evolutionary stability.
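The invasion logic can be illustrated with a discrete-time replicator update over two types, learners and teachers. This is a minimal sketch under stated assumptions: the type-level payoff matrix `M` is hypothetical (chosen so that teachers earn more than learners against either type), and the update rule is the textbook discrete replicator map, not necessarily the dynamic analyzed in the paper.

```python
# Hypothetical average payoffs between types (not from the paper).
# Index 0 = uncoupled learner, index 1 = teacher.
# M[a][b] is the payoff to type a when matched with type b.
M = [[2.0, 1.0],
     [3.0, 1.5]]


def replicator_step(x, M):
    """One discrete-time replicator update; x is the teacher share."""
    f_learner = M[0][0] * (1 - x) + M[0][1] * x
    f_teacher = M[1][0] * (1 - x) + M[1][1] * x
    average = (1 - x) * f_learner + x * f_teacher
    return x * f_teacher / average


def simulate(x0, M, steps):
    """Path of the teacher share starting from x0."""
    path = [x0]
    for _ in range(steps):
        path.append(replicator_step(path[-1], M))
    return path


# Start with 1% teachers in a population of learners.
path = simulate(0.01, M, 50)
```

With these assumed payoffs the teacher share rises at every step and exceeds 0.99 after 50 steps. If teachers did poorly against each other, for example with a lower `M[1][1]`, the same update could instead settle at a mixed population, so the takeover shown here depends on the chosen numbers.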

Third, the authors examine the meta‑learning problem: how a player might choose a learning rule in the first place. They model a higher‑level learning environment where agents evaluate different heuristics based on expected performance. The analysis shows that the “teaching” heuristic strictly dominates any uncoupled heuristic in expected payoff, so rational agents will not converge on the uncoupled rule through learning. In other words, the uncoupled heuristic is not learnable in a self‑referential sense.

To test the robustness of these negative results, the authors restrict attention to several well‑studied classes of games in which uncoupled learning is known to behave nicely: generic games, two‑player games, potential games, games with strategic complements, and 2 × 2 games. In each case they replicate the teaching construction and the evolutionary invasion argument, confirming that the impossibility persists even in these “nice” environments.

The paper does identify a narrow positive exception: in “strategically trivial” games—games where each player’s optimal action does not depend on the opponent’s choice—uncoupled learning can be both incentive compatible and evolutionarily stable. However, once any generic game outside this trivial class is introduced, the impossibility re‑emerges, showing that the positive result is fragile.
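The informal description above can be turned into a simple check for two-player games. The sketch below is an assumption-laden illustration: it reads "strategically trivial" exactly as this summary phrases it (each player's set of optimal actions is the same against every opponent action), which may differ from the paper's formal definition, and the example matrices are hypothetical.

```python
def best_set(payoffs):
    """Indices of the maximal entries in a payoff list."""
    top = max(payoffs)
    return {k for k, v in enumerate(payoffs) if v == top}


def is_strategically_trivial(A, B):
    """True if neither player's best replies depend on the opponent.

    A[i][j] is the row player's payoff, B[i][j] the column player's.
    Uses the summary's informal definition, not the paper's formal one.
    """
    rows, cols = len(A), len(A[0])
    # Row player's best replies to each column action.
    row_sets = [best_set([A[i][j] for i in range(rows)]) for j in range(cols)]
    # Column player's best replies to each row action.
    col_sets = [best_set(B[i]) for i in range(rows)]
    return (all(s == row_sets[0] for s in row_sets)
            and all(s == col_sets[0] for s in col_sets))


# Hypothetical games: in the first, both players have a dominant action;
# in the second, the column player's best reply depends on the row action.
trivial = is_strategically_trivial([[2, 4], [1, 3]], [[1, 0], [2, 0]])
nontrivial = is_strategically_trivial([[2, 4], [1, 3]], [[1, 0], [0, 1]])
```

Here `trivial` is `True` and `nontrivial` is `False`. In the second game there is something to teach, because the column player's reply can be steered by the row player's choice.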

Overall, the contribution is twofold. It clarifies that the existence of uncoupled learning heuristics converging to Nash equilibrium is a purely existential statement without behavioral justification. More importantly, it reveals that rational agents have a systematic incentive to act as teachers, thereby securing leader‑type payoffs that dominate the equilibrium outcomes produced by uncoupled learning. This insight challenges the conventional focus on convergence guarantees and suggests that future research on learning in games should incorporate strategic incentives, evolutionary dynamics, and meta‑learning considerations when proposing learning algorithms for real‑world strategic interaction.