Robust approachability and regret minimization in games with partial monitoring
Approachability has become a standard tool in analyzing learning algorithms in the adversarial online learning setup. We develop a variant of approachability for games where the obtained reward is ambiguous: it is only known to belong to a set, rather than being a single vector. Using this variant we tackle the problem of approachability in games with partial monitoring and develop simple and efficient algorithms (i.e., with constant per-step complexity) for this setup. We finally consider external regret and internal regret in repeated games with partial monitoring and derive regret-minimizing strategies based on approachability theory.
💡 Research Summary
The paper introduces a new framework called “robust approachability” that extends Blackwell’s classical approachability theory to settings where the payoff received by the decision maker is not a single deterministic vector but a set of possible vectors, reflecting intrinsic ambiguity or partial observation. The authors first formalize this set‑valued payoff model, defining a linear (and later a more general concave‑convex) extension of the payoff function to mixed strategies. They then give a clean necessary and sufficient condition for a convex target set C to be robustly approachable: for every mixed strategy of Nature there must exist a mixed strategy of the player such that the resulting set‑valued payoff lies inside C.
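As a rough illustration of this condition, the sketch below checks it numerically in a toy setting that is an assumption of this example, not the paper's construction: payoffs are scalar intervals, each player has two actions, the target set C is an interval, and both sets of mixed strategies are discretized on a grid. The names `mixed_payoff`, `grid`, and `is_robustly_approachable` are hypothetical, and a grid check can only approximate the "for every mixed strategy of Nature" quantifier.

```python
# Toy check of the robust approachability condition (illustrative assumptions only).
# lo[i][j], hi[i][j]: endpoints of the payoff interval for player action i, Nature action j.

def mixed_payoff(x, y, lo, hi):
    """Linear extension of interval payoffs to mixed strategies x (player), y (Nature)."""
    low = sum(x[i] * y[j] * lo[i][j] for i in range(2) for j in range(2))
    high = sum(x[i] * y[j] * hi[i][j] for i in range(2) for j in range(2))
    return low, high

def grid(n):
    """Mixed strategies over two actions, discretized into n + 1 points."""
    return [(k / n, 1 - k / n) for k in range(n + 1)]

def is_robustly_approachable(lo, hi, target, n=50, tol=1e-9):
    """For every grid strategy y of Nature, look for a grid strategy x of the
    player whose whole payoff interval lies inside target = (a, b)."""
    a, b = target
    for y in grid(n):
        found = False
        for x in grid(n):
            low, high = mixed_payoff(x, y, lo, hi)
            if a - tol <= low and high <= b + tol:
                found = True
                break
        if not found:
            return False
    return True

# Hypothetical interval payoffs resembling a matching game with ambiguous rewards.
LO = [[0.0, 1.0], [1.0, 0.0]]
HI = [[0.2, 1.0], [1.0, 0.2]]

print(is_robustly_approachable(LO, HI, (0.4, 0.7)))
print(is_robustly_approachable(LO, HI, (0.9, 1.0)))
```

In this toy game the uniform strategy of the player keeps the payoff interval near [0.5, 0.6] whatever Nature does, so the first target passes the check; the second target fails because Nature's uniform strategy pins the lower endpoint at 0.5.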
Building on this condition, they propose an iterative algorithm that at each round computes the ℓ₂‑projection of the empirical average set‑valued payoff onto C and then solves a scalar zero‑sum minimax problem to select the next mixed action. Because the minimax problem reduces to a linear program, each iteration runs in time polynomial in the size of the action spaces; and when the projection onto C can be performed efficiently (e.g., when C is a ball or a polytope), the per‑step complexity is constant, in the sense that it does not grow with the number of rounds played.
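The project-then-minimax loop can be sketched in a simplified form. The code below is classical Blackwell approachability with single vector payoffs, not the set‑valued robust variant described above, and it rests on several assumptions made for this example: the player has exactly two actions (so the minimax step can be solved exactly by enumerating breakpoints instead of calling an LP solver), the target set is the nonpositive orthant, expected payoffs are averaged instead of sampled ones, and Nature is a simple deterministic adversary. All function names are hypothetical.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_minimax_2xm(g):
    """g[i][j]: scalar loss for player action i in {0, 1} and Nature action j.
    Returns x0 = P(action 0) minimizing max_j [x0*g[0][j] + (1-x0)*g[1][j]].
    The objective is convex piecewise linear in x0, so its minimum is at an
    endpoint or where two of the lines cross."""
    m = len(g[0])

    def worst(x0):
        return max(x0 * g[0][j] + (1 - x0) * g[1][j] for j in range(m))

    candidates = [0.0, 1.0]
    for j in range(m):
        for k in range(j + 1, m):
            slope_j = g[0][j] - g[1][j]
            slope_k = g[0][k] - g[1][k]
            if slope_j != slope_k:
                x0 = (g[1][k] - g[1][j]) / (slope_j - slope_k)
                if 0.0 <= x0 <= 1.0:
                    candidates.append(x0)
    return min(candidates, key=worst)

def approach_orthant(r, rounds):
    """r[i][j]: vector payoff. Returns the distance of the average payoff
    to the nonpositive orthant after the given number of rounds."""
    m, dim = len(r[0]), len(r[0][0])
    avg = [0.0] * dim
    for t in range(1, rounds + 1):
        # avg minus its projection onto the orthant is the positive part of avg.
        direction = [max(a, 0.0) for a in avg]
        g = [[dot(direction, r[i][j]) for j in range(m)] for i in range(2)]
        x0 = solve_minimax_2xm(g)

        def expected(j):
            return [x0 * r[0][j][c] + (1 - x0) * r[1][j][c] for c in range(dim)]

        # Assumed adversary: Nature pushes the payoff along the current direction.
        j_star = max(range(m), key=lambda j: dot(direction, expected(j)))
        payoff = expected(j_star)
        avg = [a + (p - a) / t for a, p in zip(avg, payoff)]
    return math.sqrt(sum(max(a, 0.0) ** 2 for a in avg))

# Hypothetical example: regret vectors of a 2x2 matching game with utility U.
U = [[1.0, -1.0], [-1.0, 1.0]]
REGRET_VECTORS = [[[U[k][j] - U[i][j] for k in range(2)] for j in range(2)]
                  for i in range(2)]

print(approach_orthant(REGRET_VECTORS, 400))
```

For an approachable target set, Blackwell's classical argument bounds this distance by a constant divided by √T, so the printed value should shrink as the number of rounds grows. With larger action sets the breakpoint enumeration would be replaced by a linear program, as the summary indicates.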
The core contribution is the application of robust approachability to games with partial monitoring. In such games the learner observes a random signal whose distribution depends on both players’ actions, but the signal does not directly reveal the reward. The authors model the signal as inducing a set‑valued payoff mapping and show that, under a “bi‑piecewise‑linear” structure (the signal is piecewise linear in each player’s action), the robust approachability algorithm can be applied directly, yielding constant‑time per‑round procedures with explicit convergence rates.
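To make the idea of a signal inducing a set‑valued payoff concrete, here is a deliberately simplified sketch. It assumes deterministic signals and pure actions only, which is much narrower than the random-signal model described above: Nature's actions that produce the same signal against every action of the learner cannot be told apart, so the learner only knows that its reward lies in the range spanned by that group. The function name `ambiguity_intervals` and the example matrices are hypothetical.

```python
from collections import defaultdict

def ambiguity_intervals(signals, rewards):
    """signals[i][j]: signal seen when the learner plays i and Nature plays j.
    rewards[i][j]: the (unobserved) scalar reward for the same pair.
    Returns {group of indistinguishable Nature actions: [(min, max) per learner action]}."""
    n_player, n_nature = len(signals), len(signals[0])

    # Group Nature's actions by the full column of signals they generate.
    classes = defaultdict(list)
    for j in range(n_nature):
        column = tuple(signals[i][j] for i in range(n_player))
        classes[column].append(j)

    result = {}
    for members in classes.values():
        result[tuple(members)] = [
            (min(rewards[i][j] for j in members),
             max(rewards[i][j] for j in members))
            for i in range(n_player)
        ]
    return result

# Hypothetical game: Nature's actions 0 and 1 always yield the same signals.
SIGNALS = [["a", "a", "b"],
           ["c", "c", "c"]]
REWARDS = [[0.0, 1.0, 0.5],
           [0.2, 0.4, 0.9]]

for group, intervals in ambiguity_intervals(SIGNALS, REWARDS).items():
    print(group, intervals)
```

In this example the learner facing Nature's action 0 or 1 only knows that its reward under action 0 lies somewhere in [0.0, 1.0], which is the kind of ambiguity the set‑valued payoff is meant to capture. Handling mixed actions and random signals requires reasoning about all of Nature's mixed actions that generate the same signal distributions, which this sketch does not attempt.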
Using this machinery, the paper derives regret‑minimizing strategies for both external and internal regret in repeated partial‑monitoring games. External regret is cast as approaching a convex set that captures the condition “average payoff is at least as good as the best fixed action in hindsight.” Internal regret is handled by defining a family of target sets for each possible deviation i→j and simultaneously ensuring approachability for all of them. The resulting algorithms achieve O(1/√T) convergence (up to logarithmic factors) with the same constant‑time per‑round cost, matching the best known rates (e.g., Lugosi et al., 2016) but with a far simpler and more transparent proof.
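For reference, the two regret notions can be written down directly in the full-monitoring case, where rewards are observed. The sketch below uses those standard full-information definitions as an assumption; under partial monitoring the rewards are not observed and the paper's definitions must account for that ambiguity, which this example does not do. The function names and the utility matrix are hypothetical.

```python
def external_regret(u, actions, outcomes):
    """Average gap between the best fixed action in hindsight and the realized payoff.
    u[i][j]: payoff of player action i against Nature action j."""
    rounds = len(actions)
    realized = sum(u[i][j] for i, j in zip(actions, outcomes))
    best_fixed = max(sum(u[k][j] for j in outcomes) for k in range(len(u)))
    return (best_fixed - realized) / rounds

def internal_regret(u, actions, outcomes):
    """Largest average gain from replacing action i by action k on exactly
    the rounds where i was played, over all pairs i != k (floored at 0)."""
    rounds = len(actions)
    worst = 0.0
    for i in range(len(u)):
        for k in range(len(u)):
            if i == k:
                continue
            gain = sum(u[k][j] - u[i][j]
                       for a, j in zip(actions, outcomes) if a == i)
            worst = max(worst, gain / rounds)
    return worst

# Hypothetical 2x2 game and a short history of play.
U_MATCH = [[1.0, 0.0], [0.0, 1.0]]
ACTIONS = [0, 0, 0, 1]
OUTCOMES = [0, 1, 1, 1]

print(external_regret(U_MATCH, ACTIONS, OUTCOMES))
print(internal_regret(U_MATCH, ACTIONS, OUTCOMES))
```

In this history the player earns 2 over four rounds while always playing action 1 would have earned 3, and swapping action 0 for action 1 on the rounds where 0 was played gains 1 in total, so both quantities come out to 0.25.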
The paper also discusses extensions beyond the bi‑piecewise‑linear case. For general signal structures, the authors show that robust approachability can still be used to reach polytope target sets efficiently, while arbitrary convex target sets are theoretically approachable but may incur higher computational overhead due to costly projections.
In summary, the authors provide: (1) a novel robust approachability theory for set‑valued payoffs; (2) concrete, constant‑complexity algorithms for partial‑monitoring games under a realistic signal model; (3) a unified treatment of external and internal regret minimization that improves upon existing exponential‑weight methods; and (4) a discussion of computational trade‑offs for more general settings. The work bridges a gap between geometric approachability concepts and practical online learning under uncertainty, offering both theoretical insight and algorithms whose constant per‑step cost suits large‑scale, real‑time decision systems.