Blackwell Approachability and Low-Regret Learning are Equivalent

Reading time: 5 minutes
...

📝 Original Info

  • Title: Blackwell Approachability and Low-Regret Learning are Equivalent
  • ArXiv ID: 1011.1936
  • Date: 2010-11
  • Authors: Jacob Abernethy, Peter L. Bartlett, Elad Hazan

📝 Abstract

We consider the celebrated Blackwell Approachability Theorem for two-player games with vector payoffs. We show that Blackwell's result is equivalent, via efficient reductions, to the existence of "no-regret" algorithms for Online Linear Optimization. Indeed, we show that any algorithm for one such problem can be efficiently converted into an algorithm for the other. We provide a useful application of this reduction: the first efficient algorithm for calibrated forecasting.


📄 Full Content

A typical assumption in game theory, and indeed in most of economics, is that an agent's goal is to optimize a scalar-valued payoff function, such as a person's wealth. Such scalar-valued utility functions are the basis for much work in learning and statistics too, where one hopes to maximize prediction accuracy or minimize expected loss. Toward this end, a natural goal is to prove a guarantee on an algorithm's minimum expected payoff (or, dually, its maximum expected loss).

In 1956, David Blackwell posed an intriguing question: what guarantee can we hope to achieve when playing a two-player game with a vector-valued payoff, particularly when the opponent is potentially an adversary? For the case of scalar payoffs, as in a two-player zero-sum game, we already have a concise guarantee by way of von Neumann’s minimax theorem: either player has a fixed oblivious strategy that is effectively the “best possible”, in that this player could do no better even with knowledge of the opponent’s randomized strategy in advance. This result is equivalent to strong duality for linear programming.
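For concreteness, the minimax guarantee can be written as follows (the standard statement, in our notation rather than the paper's):

```latex
% Von Neumann's minimax theorem for a zero-sum game with payoff
% matrix A, where \Delta_n and \Delta_m are the probability simplices
% of mixed strategies for the two players:
\max_{x \in \Delta_n} \min_{y \in \Delta_m} x^{\top} A y
  \;=\;
\min_{y \in \Delta_m} \max_{x \in \Delta_n} x^{\top} A y .
```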

When our payoffs are non-scalar quantities, it does not make sense to ask “can we earn at least x?”. Instead, we would like to ask “can we guarantee that our vector payoff lies in some convex set S?”. In this case, the story is more difficult, and Blackwell observed that an oblivious strategy does not suffice; in short, we do not achieve “duality” for vector-payoff games. What Blackwell was able to prove is that this negative result applies only to one-shot games. In his celebrated Approachability Theorem [3], one can achieve a duality statement in the limit when the game is played repeatedly, where the player may learn from his opponent’s prior actions. Blackwell actually constructed an algorithm (that is, an adaptive strategy) with the guarantee that the average payoff vector “approaches” S, hence the name of the theorem.
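To make “approaches” precise, here is a sketch of the standard formalization (our notation; u(x, y) denotes the vector payoff when the player chooses x and the opponent chooses y):

```latex
% A convex set S is approachable if the player has an adaptive
% strategy x_1, x_2, ... such that, against every opponent sequence
% y_1, y_2, ..., the average payoff vector converges to S:
\operatorname{dist}\left( \frac{1}{T} \sum_{t=1}^{T} u(x_t, y_t),\; S \right)
  \longrightarrow 0
  \quad \text{as } T \to \infty .
```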

Blackwell Approachability has the flavor of learning in repeated games, a topic which has received much interest. In particular, there is a wealth of recent results on so-called no-regret learning algorithms for making repeated decisions given an arbitrary (and potentially adversarial) sequence of cost functions. The first no-regret algorithm for a “discrete action” setting was given in a seminal paper by James Hannan in 1956 [10]. That same year, David Blackwell pointed out [2] that his Approachability result leads, as a special case, to an algorithm with essentially the same low-regret guarantee proven by Hannan. Blackwell thus found an intriguing connection between repeated vector-payoff games and low-regret learning, a connection that we shall explore in greater detail in the present work. Indeed, we will show that the relationship goes much deeper than Blackwell had originally supposed. We prove that, in fact, Blackwell’s Approachability Theorem is equivalent, in a very strong sense, to no-regret learning for the particular setting of so-called “Online Linear Optimization”. Precisely, we show that any no-regret algorithm can be converted into an algorithm for Approachability and vice versa. This algorithmic equivalence is achieved via conic duality: if our goal is low-regret learning over a cone K, we can convert this into a problem of approachability of the dual cone K^0, and vice versa.
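As a rough illustration of one direction of this reduction, the sketch below uses online gradient descent (a standard no-regret algorithm for online linear optimization) to approach a convex cone. The helper names (halfspace_oracle, project_polar, payoff, opponent) are our own hypothetical callbacks, not interfaces from the paper; the oracle is assumed to return an action x with ⟨u(x, y), θ⟩ ≤ 0 for every opponent action y, which exists whenever the cone is approachable.

```python
import numpy as np

def approach_cone(halfspace_oracle, project_polar, payoff, opponent,
                  T, dim, eta=0.1):
    """Sketch: drive the average vector payoff toward a convex cone S
    by running online gradient descent over the polar cone of S.

    halfspace_oracle(theta) -> player action x with <u(x, y), theta> <= 0
                               for all opponent actions y (assumed).
    project_polar(v)        -> Euclidean projection of v onto the polar
                               cone of S intersected with the unit ball.
    """
    theta = np.zeros(dim)   # OLO iterate: the direction the player "blocks"
    avg = np.zeros(dim)     # running average of the vector payoffs
    for t in range(1, T + 1):
        x = halfspace_oracle(theta)    # respond to the current direction
        y = opponent(t)                # adversary moves arbitrarily
        u = np.asarray(payoff(x, y))   # this round's vector payoff
        avg += (u - avg) / t
        # The OLO algorithm sees the linear loss f_t(theta) = -<theta, u>,
        # whose gradient is -u; take one projected gradient step:
        theta = project_polar(theta + eta * u)
    # dist(avg, S) is bounded by (OGD regret)/T, i.e. O(1/sqrt(T)) with
    # eta ~ 1/sqrt(T), so the average payoff approaches S.
    return avg
```

The invariant that makes this work: since ⟨θ_t, u_t⟩ ≤ 0 at every round by the oracle's guarantee, the distance from the average payoff to the cone is upper-bounded by the no-regret algorithm's average regret, which vanishes as T grows.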

This equivalence provides a range of benefits, one of which is “calibrated forecasting”. The goal of a calibrated forecaster is to ensure that sequential probability predictions of repeated events are “unbiased” in the following sense: when the weatherman says “30% chance of rain”, it should actually rain on roughly three of every ten such days. The problem of calibrated forecasting was reduced to Blackwell’s Approachability Theorem by Foster [7], and a handful of other calibration techniques have been proposed, yet none provides efficiency guarantees on the strategy. Using a similar reduction from calibration to approachability, and by carefully constructing the reduction from approachability to online linear optimization, we achieve the first efficient calibration algorithm.
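One standard way to state the calibration requirement (our formalization, not necessarily the exact one used in the paper): for binary outcomes y_t ∈ {0, 1} and forecasts p_t drawn from a finite grid, let n_T(p) count the rounds with forecast p and let ρ_T(p) be the empirical frequency of rain on those rounds. Asymptotic calibration then asks that

```latex
% \ell_1 calibration error: forecasts should match empirical
% frequencies, weighted by how often each forecast is issued.
\sum_{p} \frac{n_T(p)}{T} \,\bigl| \rho_T(p) - p \bigr|
  \longrightarrow 0
  \quad \text{as } T \to \infty ,
\qquad \text{where } \rho_T(p) = \frac{1}{n_T(p)} \sum_{t : p_t = p} y_t .
```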

There is by now a vast literature on all three main topics of this paper: approachability, online learning, and calibration; see [4] for an excellent exposition. The relationship between the three areas is not as well understood.

Blackwell himself noted that approachability implies no-regret algorithms in the discrete setting. However, as we show here, the full power of approachability extends to the much more general framework of online linear optimization, which has only recently been explored (see [12] for a survey) and shown to yield the first efficient algorithms for a host of problems (e.g., [1,6]). Perhaps more significantly, we also prove the reverse direction: online linear optimization exactly captures the power of approachability, which was previously considered by many to be strictly stronger than regret minimization.

Calibration is a fundamental ...


Reference

This content is AI-processed based on open access ArXiv data.
