No Internal Regret via Neighborhood Watch
We present an algorithm attaining O(\sqrt{T}) internal (and thus external) regret for finite games with partial monitoring under the local observability condition. This condition was recently shown by Bartók, Pál, and Szepesvári (2011) to imply an O(\sqrt{T}) rate for partial-monitoring games against an i.i.d. opponent, and those authors conjectured that the same holds for non-stochastic adversaries. We resolve the conjecture in the affirmative, completing the characterization of possible rates for finite partial-monitoring games, an open question posed by Cesa-Bianchi, Lugosi, and Stoltz (2006). Our regret guarantees also hold for the more general model of partial monitoring with random signals.
💡 Research Summary
The paper addresses a long‑standing open problem in the theory of partial‑monitoring games: achieving optimal regret rates against an adversarial (non‑stochastic) opponent. In a finite zero‑sum game the row player selects one of N actions, the column player selects one of M actions, and the row player observes only a signal from a known matrix H rather than the opponent’s action or the incurred loss L. The goal is to minimize both external regret (compared to the best fixed action in hindsight) and internal regret (compared to any unilateral deviation from one action to another).
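The two regret notions above can be made concrete with a toy computation. The following sketch (illustrative only, not from the paper; the loss table and action sequences are invented) measures external regret against the best fixed action in hindsight, and internal regret against the best single swap "every time I played a, play b instead":

```python
# Toy illustration: external vs. internal regret for a played sequence
# against a known loss table L[row_action][col_action].

def external_regret(L, plays, col_plays):
    """Cumulative loss of the played sequence minus that of the single
    best fixed row action in hindsight."""
    incurred = sum(L[i][j] for i, j in zip(plays, col_plays))
    best_fixed = min(sum(L[a][j] for j in col_plays) for a in range(len(L)))
    return incurred - best_fixed

def internal_regret(L, plays, col_plays):
    """Largest gain from replacing every play of one action a by some
    other action b, keeping all remaining rounds unchanged."""
    N = len(L)
    incurred = sum(L[i][j] for i, j in zip(plays, col_plays))
    best = incurred
    for a in range(N):
        cols_a = [j for i, j in zip(plays, col_plays) if i == a]
        rest = incurred - sum(L[a][j] for j in cols_a)
        for b in range(N):
            best = min(best, rest + sum(L[b][j] for j in cols_a))
    return incurred - best

# Matching-pennies-style 2x2 loss matrix; the row player is always wrong.
L = [[0, 1],
     [1, 0]]
plays     = [0, 0, 1, 1]   # row player's actions
col_plays = [1, 1, 0, 0]   # column player's actions

print(external_regret(L, plays, col_plays))  # -> 2
print(internal_regret(L, plays, col_plays))  # -> 2 (each single swap a->b saves 2)
```

Note that in partial monitoring the row player cannot evaluate these quantities online, since L and the column player's actions are unobserved; this is precisely the difficulty the signal matrix H must overcome.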
A key structural condition, called local observability, was introduced by Bartók, Pál, and Szepesvári (2011). It requires that for every pair of neighboring actions i and j (neighbors are defined via the adjacency of best‑response cells in the simplex of opponent mixed strategies) the loss difference ℓ_j − ℓ_i lies in the row space of the combined signal matrix S(i,j). Equivalently, there exists a vector v(i,j) such that ℓ_j − ℓ_i = S(i,j)ᵀ v(i,j). This condition guarantees that the row player can construct unbiased estimates of loss differences using only the observed signals.
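Why such a vector v(i,j) yields an unbiased estimate can be checked on a hypothetical toy instance (the matrices, losses, and sampling probabilities below are invented for illustration): play a ∈ {i, j} with probability p[a], observe the resulting signal s, and report v[(a, s)] / p[a]. Its exact expectation equals the loss difference for every opponent action m:

```python
# Hypothetical toy instance illustrating the local-observability estimator.
# S[a][s][m] = 1 iff playing a against opponent action m yields signal s.

l_i = [0.0, 1.0]
l_j = [1.0, 0.0]
d = [lj - li for lj, li in zip(l_j, l_i)]    # loss difference [1.0, -1.0]

S = {
    "i": [[1, 1], [0, 0]],   # action i's signal is uninformative
    "j": [[1, 0], [0, 1]],   # action j observes the opponent's action
}
# A witness vector v with  sum_{a,s} S[a][s][m] * v[(a,s)] = d[m]  for all m,
# i.e., d lies in the row space of the stacked signal matrix.
v = {("i", 0): 0.0, ("i", 1): 0.0, ("j", 0): 1.0, ("j", 1): -1.0}

p = {"i": 0.5, "j": 0.5}     # sampling probabilities over the pair

def expected_estimate(m):
    """Exact expectation of the estimator v[(a,s)]/p[a] when the
    (unseen) opponent action is m: the p[a] factors cancel, leaving
    sum over (a,s) of S[a][s][m] * v[(a,s)] = d[m]."""
    return sum(p[a] * S[a][s][m] * v[(a, s)] / p[a]
               for a in S for s in range(len(S[a])))

print([expected_estimate(m) for m in range(2)])  # -> [1.0, -1.0], equal to d
```

The importance-weighting by 1/p[a] is what makes the estimate unbiased regardless of how the pair (i, j) is randomized over.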
The authors introduce a two‑level algorithm called Neighborhood Watch. At the lower level, for each action i a local algorithm A_i runs on the neighbor set N_i. Each A_i receives only the signals that involve i and its neighbors, builds unbiased estimates b_t(i,j) of ℓ_j − ℓ_i using the vectors v(i,j), and feeds these estimates to a full‑information online convex optimization sub‑routine (e.g., Exponential Weights). The sub‑routine guarantees O(√T) external regret on the estimated losses, which translates into an O(√T) local internal‑regret bound for each neighboring pair (i, j).
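A minimal sketch of the full-information sub-routine mentioned above, assuming standard Exponential Weights (the learning rate and loss sequence here are illustrative, not the paper's tuning): weights decay exponentially in cumulative estimated loss, so the distribution concentrates on low-loss actions.

```python
import math

# Minimal Exponential Weights sketch over K actions.
def exp_weights(loss_seqs, eta):
    """loss_seqs: per-round lists of per-action loss estimates.
    Returns the sequence of distributions played (each chosen from
    cumulative losses of previous rounds only)."""
    K = len(loss_seqs[0])
    cum = [0.0] * K
    dists = []
    for losses in loss_seqs:
        w = [math.exp(-eta * c) for c in cum]
        Z = sum(w)
        dists.append([x / Z for x in w])
        cum = [c + l for c, l in zip(cum, losses)]
    return dists

# Two actions: action 0 always incurs loss 0, action 1 always incurs loss 1.
dists = exp_weights([[0.0, 1.0]] * 50, eta=0.5)
print(dists[0])                 # -> [0.5, 0.5], uniform before any feedback
print(round(dists[-1][0], 3))   # -> 1.0, mass concentrates on the better action
```

With the usual tuning η ≈ √(log K / T), this scheme has O(√T) external regret on whatever loss sequence it is fed, which is exactly what the local algorithms need on their estimated losses.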
At the upper level, the distributions q_t^i produced by the N local algorithms are assembled into a matrix Q_t whose i‑th row is q_t^i, and the overall play distribution p_t is chosen as a fixed point satisfying p_tᵀ = p_tᵀ Q_t, following the standard internal‑to‑external regret reduction. Sampling the action from p_t ties the local estimated-regret guarantees together into a bound on the global internal regret.
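The fixed-point step in this standard reduction can be sketched as computing the stationary distribution of the row-stochastic matrix Q (the matrix below and the iteration count are illustrative, not the paper's exact procedure):

```python
# Given local distributions q^i stacked as the rows of a row-stochastic
# matrix Q, compute p with p^T = p^T Q by power iteration (treating Q as
# a Markov transition matrix and iterating its row-vector update).

def stationary(Q, iters=200):
    n = len(Q)
    p = [1.0 / n] * n
    for _ in range(iters):
        p = [sum(p[i] * Q[i][j] for i in range(n)) for j in range(n)]
    return p

# Two actions; each row is a local algorithm's distribution q^i.
Q = [[0.9, 0.1],
     [0.5, 0.5]]
p = stationary(Q)
print([round(x, 3) for x in p])   # -> [0.833, 0.167], i.e., p = [5/6, 1/6]
```

At the fixed point, playing from p is distributionally the same as first drawing i from p and then playing from q^i, which is what lets each local algorithm's regret against deviations i → j be charged to the rounds on which i was (notionally) drawn.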