No-Regret Learning in Extensive-Form Games with Imperfect Recall
Counterfactual Regret Minimization (CFR) is an efficient no-regret learning algorithm for decision problems modeled as extensive games. CFR’s regret bounds depend on the requirement of perfect recall: players always remember information that was revealed to them and the order in which it was revealed. In games without perfect recall, however, CFR’s guarantees do not apply. In this paper, we present the first regret bound for CFR when applied to a general class of games with imperfect recall. In addition, we show that CFR applied to any abstraction belonging to our general class results in a regret bound not just for the abstract game, but for the full game as well. We verify our theory and show how imperfect recall can be used to trade a small increase in regret for a significant reduction in memory in three domains: die-roll poker, phantom tic-tac-toe, and Bluff.
💡 Research Summary
The paper addresses a fundamental limitation of Counterfactual Regret Minimization (CFR), a leading no‑regret learning algorithm for extensive‑form games. Traditional regret guarantees for CFR rely on the perfect‑recall assumption, which requires that a player always remembers every piece of information it has observed and the order in which it was observed. In many practical settings—especially when memory is scarce or when abstractions are employed—this assumption is violated, leading to a gap in theoretical understanding.
To bridge this gap, the authors introduce a new class of games called structurally imperfect‑recall games. In this class, each abstract information set aggregates several concrete information sets from the original game, but all aggregated sets share the same set of available actions and the same counterfactual values for each action. This structural constraint ensures that the abstraction does not distort the regret calculations performed by CFR.
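To make the structural constraint concrete, here is a minimal sketch (all names and the data layout are hypothetical, and only the shared-action-set requirement is checked; the paper's full condition also constrains counterfactual values) of validating that an abstraction merges only compatible information sets:

```python
# Hypothetical sketch: check that an imperfect-recall abstraction merges
# only concrete information sets that expose the same set of actions.
# The paper's full structural condition also requires matching
# counterfactual values per action; that part is omitted here.

def valid_merge(abstraction):
    """abstraction: dict mapping an abstract infoset id to a list of
    (concrete_infoset_id, available_actions) pairs from the original game."""
    for abstract_id, members in abstraction.items():
        action_sets = {frozenset(actions) for _, actions in members}
        if len(action_sets) > 1:
            return False  # merged infosets disagree on available actions
    return True

# Two concrete infosets that differ only in a forgotten observation can
# be merged; infosets with different legal actions cannot.
ok = valid_merge({"A": [("i1", ["bet", "fold"]), ("i2", ["bet", "fold"])]})
bad = valid_merge({"B": [("i3", ["bet", "fold"]), ("i4", ["bet"])]})
```

In practice such a check would run once, when the abstraction is constructed, before handing the abstract game to CFR.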
Two main theoretical results are proved. First, when CFR is run directly on a structurally imperfect‑recall game, the cumulative regret after T iterations is bounded by $O(\sqrt{|\tilde{\mathcal{I}}|T})$, where $|\tilde{\mathcal{I}}|$ denotes the number of abstract information sets. Because $|\tilde{\mathcal{I}}|$ can be dramatically smaller than the number of information sets in the original game, the bound can be much tighter in practice. Second, any abstraction that satisfies the structural constraints inherits this regret bound for the full (unabstracted) game. In other words, the regret measured in the abstract game serves as a valid upper bound on the regret in the original game, providing a direct link between abstraction quality and strategic performance.
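The per-infoset update at the heart of CFR is regret matching: the current strategy puts probability on each action in proportion to its positive cumulative regret. A minimal, generic sketch (not the paper's exact implementation) also shows why memory scales with the number of information sets, since one cumulative-regret vector is stored per set:

```python
# Generic regret-matching sketch: CFR stores one cumulative-regret vector
# per information set, so memory scales with the number of (abstract)
# information sets. Illustrative only, not the paper's implementation.

def regret_matching(cum_regret):
    """Return a strategy proportional to positive cumulative regret."""
    positives = [max(r, 0.0) for r in cum_regret]
    total = sum(positives)
    if total > 0:
        return [p / total for p in positives]
    n = len(cum_regret)
    return [1.0 / n] * n  # uniform when no action has positive regret

# Actions with negative cumulative regret receive zero probability.
strategy = regret_matching([4.0, -2.0, 1.0])  # -> [0.8, 0.0, 0.2]
```

Merging information sets in an abstraction means fewer such vectors, which is exactly where the memory savings in the experiments below come from.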
The authors validate their theory experimentally on three domains:

- Die‑Roll Poker – By ignoring the exact die outcome (an imperfect‑recall abstraction), memory usage drops by more than 90 % while average regret increases by only about 20 % relative to the perfect‑recall baseline. Training time is reduced by a factor of three.
- Phantom Tic‑Tac‑Toe – Players receive only partial observations of the opponent’s moves. Merging information sets according to the structural rule yields a strategy whose win‑rate differs by less than 2 % from the optimal perfect‑recall strategy, demonstrating negligible performance loss.
- Bluff – A large‑scale dice‑bluffing game with a huge action space. Applying the structural abstraction cuts memory requirements by roughly 70 % while the average regret rises by only about 5 % compared with standard CFR.
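Since the $O(\sqrt{|\tilde{\mathcal{I}}|T})$ bound scales with the square root of the information-set count, the effect of shrinking $|\tilde{\mathcal{I}}|$ is easy to quantify. A back-of-the-envelope sketch with made-up sizes (purely illustrative, not figures from the paper):

```python
import math

def regret_bound_ratio(full_infosets, abstract_infosets):
    """Ratio of the O(sqrt(|I| * T)) regret bounds for the abstract vs.
    full game; the sqrt(T) factor cancels. Sizes here are hypothetical."""
    return math.sqrt(abstract_infosets / full_infosets)

# An abstraction keeping 10% of the infosets shrinks the bound by a
# factor of sqrt(0.1), roughly 0.316 -- alongside the 10x memory saving.
ratio = regret_bound_ratio(1_000_000, 100_000)
```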
These experiments confirm that the proposed regret bound is not merely a theoretical artifact; it translates into tangible savings in memory and computation while keeping regret growth modest. The paper also discusses broader implications: the framework enables CFR to be deployed on memory‑constrained platforms such as mobile devices, embedded systems, or cloud environments where cost per byte is a concern.
Finally, the authors outline future research directions. They suggest relaxing the structural constraints to cover a wider variety of abstractions, extending the analysis to multi‑agent settings where different players may employ different recall levels, and integrating deep neural network value approximators to handle continuous‑action or very large‑scale games.
In summary, this work delivers the first rigorous regret bound for CFR in a broad class of imperfect‑recall games, shows how abstracted games inherit this guarantee for the original game, and demonstrates practical benefits across three diverse benchmarks. It expands the applicability of CFR beyond the perfect‑recall regime, providing both a theoretical foundation and empirical evidence that memory‑efficient abstractions can be employed without sacrificing the core no‑regret property.