A Primal Condition for Approachability with Partial Monitoring

Shie Mannor, Vianney Perchet, Gilles Stoltz ∗
Technion; Université Paris-Diderot; CNRS / Ecole normale supérieure / HEC Paris
September 17, 2021

Abstract

In approachability with full monitoring there are two types of conditions that are known to be equivalent for convex sets: a primal and a dual condition. The primal one is of the form: a set $\mathcal{C}$ is approachable if and only if all containing half-spaces are approachable in the one-shot game. The dual condition is of the form: a convex set $\mathcal{C}$ is approachable if and only if it intersects all payoff sets of a certain form. We consider approachability in games with partial monitoring. In previous works (Perchet, 2011a, Mannor et al., 2011) we provided a dual characterization of approachable convex sets and we also exhibited efficient strategies in the case where $\mathcal{C}$ is a polytope. In this paper we provide primal conditions on a convex set to be approachable with partial monitoring. They depend on a modified reward function and lead to approachability strategies that are based on modified payoff functions and proceed by projections, similarly to Blackwell's (1956) strategy. This is in contrast with previously studied strategies in this context, which relied mostly on the signaling structure and aimed at estimating well the distributions of the signals received. Our results generalize classical results by Kohlberg (1975) (see also Mertens et al., 1994) and apply to games with arbitrary signaling structure as well as to arbitrary convex sets.

1. Introduction

Approachability theory dates back to the seminal paper of Blackwell (1956). In this paper Blackwell presented conditions under which a player can guarantee that the long-term average vector-valued reward is asymptotically close to some target set regardless of the opponent's actions. If this property holds, we say that the set is approachable.
∗ This work began after an interesting objection by and further discussions with Jean-François Mertens during the presentation of our earlier results (Mannor et al., 2011) on the dual characterization of approachability with partial monitoring at the conference Games Toulouse 2011. The material presented in this article was developed further with Sylvain Sorin back in Paris, whom we thank deeply for his advice and encouragements. Sadly, Jean-François Mertens, a close collaborator of Sylvain Sorin, passed away in the meantime. This contribution is in honor of both of these important contributors to the theory of repeated games.

In the full monitoring case studied in Blackwell (1956) there are two equivalent conditions for a convex set to be approachable. The first one, known as a primal condition (or later termed the "B" condition in honor of Blackwell), states that every half-space that contains the target set is also approachable. It turns out that whether a half-space is approachable is determined by the sign of the value of some associated zero-sum game. The second characterization, known as the dual condition, states that for every mixed action of the opponent, the player can guarantee that the one-shot vector-valued reward is inside the target set.

Approachability theory has found many applications in game theory, online learning, and related fields. Both primal and dual characterizations are of interest therein. Indeed, checking if the dual condition holds is formally simple, while a concrete approaching strategy naturally derives from the primal condition (it only requires solving a one-shot zero-sum game at every stage of the repeated vector-valued game).

Approachability theory has been applied to zero-sum repeated games with incomplete information and/or imperfect (or partial) monitoring.
The work of Kohlberg (1975) (see also Mertens et al., 1994) uses approachability to derive strategies for games with incomplete information. The general case of repeated vector-valued games with partial monitoring was studied only recently. A dual characterization of approachable convex sets with partial monitoring was presented by Perchet (2011a). However, it is not useful for deriving approaching strategies since it essentially requires running a calibration algorithm, which is known to be computationally hard. In a recent work (Mannor et al., 2011) we derived efficient strategies for approachability in games with partial monitoring in some cases, e.g., when the convex set to be approached is a polytope. However, these strategies are based on the dual condition, and not on any primal one: they thus do not shed light on the structure of the game.

In this paper we provide a primal condition for approachability in games with partial monitoring. It can be stated, as in Blackwell (1956), as a requirement that every half-space containing the target set is one-shot approachable. However, the reward function has to be modified in some cases for the condition to be sufficient. We also show how it leads to an efficient approachability strategy, at least in the case of approachable polytopes.

Outline. In Section 2 we define a model of partial monitoring and recall some of the basic results from approachability (both in terms of its primal and dual characterizations). In Section 3 we explain the current state of the art, recall the dual condition for approachability with partial monitoring, and outline our objectives. In Section 4 we provide results for approaching half-spaces as they have the simplest characterization of approachability: we show that the signaling structure has no impact on approachability of half-spaces, only the payoff structure does.
This is not the case anymore for approachability of more complicated convex sets, which is the focus of the subsequent sections. In Section 5 we discuss the case of a target set that is an orthant under additional properties on the payoff–signaling structure: we show that a natural primal condition holds. This condition, which we term the "upper-right-corner property", is the main technical contribution of the paper. We show that, basically, we can study approachability for a modified payoff function and that under some favorable conditions, a primal condition is easy to derive (which is the main conceptual contribution). As an intermezzo, we link our results to Kohlberg (1975) in Section 6 and show that repeated games with incomplete information can be analyzed using our approach for games with imperfect monitoring. In Section 7 we then analyze the case of a general signaling structure for the approachability of orthants and provide an efficient approaching strategy based on the exhibited primal conditions. Finally, we relax the shape of the target set from an orthant to a polytope in Section 8, and then to a general convex set in Section 9. Our generalizations show that the same primal condition holds in all cases. The generalization from orthants to polytopes is based on the observation that any polytope can be represented as an orthant in a space whose dimensionality is the number of linear inequalities describing the polytope, and on a modified reward function. The generalization to general convex sets uses support functions and lifting to derive similar results; we provide some background material on support functions in the appendix.

2. Model and preliminaries

We now define the model of interest and then recall some basic results from approachability theory for repeated vector-valued games (with full monitoring).
Model and notation

We consider a vector-valued game between two players, a decision maker (or player) and Nature, with respective finite action sets $\mathcal{I}$ and $\mathcal{J}$, whose cardinalities are referred to as $I$ and $J$. We denote by $d$ the dimension of the reward vector and equip $\mathbb{R}^d$ with the $\ell_2$-norm $\|\cdot\|_2$. The payoff function of the player is given by a mapping $r : \mathcal{I} \times \mathcal{J} \to \mathbb{R}^d$, which is multi-linearly extended to $\Delta(\mathcal{I}) \times \Delta(\mathcal{J})$, the set of product-distributions over $\mathcal{I} \times \mathcal{J}$.

At each round, the player and Nature simultaneously choose their actions $i_n \in \mathcal{I}$ and $j_n \in \mathcal{J}$, at random according to probability distributions denoted by $x_n \in \Delta(\mathcal{I})$ and $y_n \in \Delta(\mathcal{J})$. At the end of a round, the player does not observe Nature's action $j_n$ nor even the payoff $r_n := r(i_n, j_n)$ he obtains: he only gets to see some signal. More precisely, there is a finite set $\mathcal{H}$ of possible signals, and the signal $s_n \in \mathcal{H}$ that is shown to the player is drawn at random according to the distribution $H(i_n, j_n)$, where the mapping $H : \mathcal{I} \times \mathcal{J} \to \Delta(\mathcal{H})$ is known by both players.

The player is said to have full monitoring if $\mathcal{H} = \mathcal{J}$ and $H(i,j) = \mathrm{Full}(i,j) := \delta_j$, i.e., if the action of Nature is observed. We speak of a game in the dark when the signaling structure $H$ is not informative at all, i.e., when $\mathcal{H}$ is reduced to a single signal referred to as $\emptyset$; we denote this situation by $H = \mathrm{Dark}$.

Of major interest will be the maximal information mapping $\mathbf{H} : \Delta(\mathcal{J}) \to \Delta(\mathcal{H})^{\mathcal{I}}$, which is defined as follows. The image of each $j \in \mathcal{J}$ is the vector $\mathbf{H}(j) = \bigl( H(i,j) \bigr)_{i \in \mathcal{I}}$, and this definition is extended linearly onto $\Delta(\mathcal{J})$. An element of the image $\mathcal{F} = \mathbf{H}\bigl(\Delta(\mathcal{J})\bigr)$ of $\mathbf{H}$ is referred to as a flag. The notion of "flag" is key: the player only accesses the mixed actions $y$ of Nature through $\mathbf{H}$. Indeed, as is intuitive and as is made more formal at the end of the proof of Proposition 2, he could at best access or estimate the flag $\mathbf{H}(y)$ but not $y$ itself.
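To make the notion of flag concrete, the following Python sketch computes the flag of a mixed action of Nature in a toy game. The two-action sets, the signal name "none", and all function names are illustrative assumptions that do not come from the paper; the sketch only illustrates that in the dark all mixed actions of Nature share the same flag, whereas under full monitoring the flag determines the mixed action.

```python
from fractions import Fraction

# Hypothetical toy game: the player has actions T, B and Nature has actions L, R.
ACTIONS_I = ("T", "B")
ACTIONS_J = ("L", "R")

# A signaling structure maps each pair (i, j) to a distribution over signals,
# stored here as a dict {signal: probability}.
FULL = {(i, j): {j: Fraction(1)} for i in ACTIONS_I for j in ACTIONS_J}       # Nature's action is revealed
DARK = {(i, j): {"none": Fraction(1)} for i in ACTIONS_I for j in ACTIONS_J}  # one uninformative signal


def flag(signaling, y):
    """Return the flag of y: for each action i, the law of the signal when Nature plays y.

    y is a dict mapping each action j of Nature to its probability.
    """
    result = {}
    for i in ACTIONS_I:
        law = {}
        for j, weight in y.items():
            for signal, prob in signaling[(i, j)].items():
                law[signal] = law.get(signal, Fraction(0)) + weight * prob
        result[i] = law
    return result


y1 = {"L": Fraction(1, 4), "R": Fraction(3, 4)}
y2 = {"L": Fraction(2, 3), "R": Fraction(1, 3)}

# In the dark the two mixed actions cannot be told apart; with full monitoring they can.
same_flag_in_dark = flag(DARK, y1) == flag(DARK, y2)
same_flag_with_full = flag(FULL, y1) == flag(FULL, y2)
```

Exact rational arithmetic is used only so that the equality tests are not affected by rounding; floating-point numbers would work as well for exploration.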
For every $x \in \Delta(\mathcal{I})$ and $h \in \mathcal{F}$, the set of payoffs compatible with $h$ is
\[
m(x,h) = \bigl\{ r(x,y) : \ y \in \Delta(\mathcal{J}) \ \text{such that} \ \mathbf{H}(y) = h \bigr\}. \tag{1}
\]
The set $m(x,h)$ represents all the rewards that are statistically compatible with the flag $h$ (or, put differently, the set of all possible rewards that cannot be distinguished from one another given $h$). Note that with full monitoring, $\mathcal{F}$ can be identified with $\Delta(\mathcal{J})$ and one has $m(x,y) = \bigl\{ r(x,y) \bigr\}$ for all $y \in \Delta(\mathcal{J})$.

Finally, we denote by $M$ a uniform $\ell_2$-bound on $r$, that is, $M = \max_{i,j} \| r(i,j) \|_2$. Also, for every $n \in \mathbb{N}$ and sequence $(a_m)_{m \in \mathbb{N}}$, the average of the first $n$ elements is referred to as $\overline{a}_n = (1/n) \sum_{m=1}^n a_m$. The distance to a set $\mathcal{C}$ is denoted by $d_{\mathcal{C}}$.

A behavioral strategy $\sigma$ of the player is a mapping from the set of his finite histories $\cup_{n \in \mathbb{N}} (\mathcal{I} \times \mathcal{H})^n$ into $\Delta(\mathcal{I})$. Similarly, a strategy $\tau$ of Nature is a mapping from $\cup_{n \in \mathbb{N}} (\mathcal{I} \times \mathcal{H} \times \mathcal{J})^n$ into $\Delta(\mathcal{J})$. As usual, we denote by $\mathbb{P}_{\sigma,\tau}$ the probability induced by the pair $(\sigma,\tau)$ onto $(\mathcal{I} \times \mathcal{H} \times \mathcal{J})^{\mathbb{N}}$.

Definition and some properties of approachability

A set $\mathcal{C} \subseteq \mathbb{R}^d$ is $r$-approachable for the signaling structure $H$, or, in short, is $(r,H)$-approachable, if, for all $\varepsilon > 0$, there exists a strategy $\sigma_\varepsilon$ of the player and a natural number $N \in \mathbb{N}$ such that, for all strategies $\tau$ of Nature,
\[
\mathbb{P}_{\sigma_\varepsilon,\tau} \bigl\{ \exists\, n \geq N \ \text{s.t.} \ d_{\mathcal{C}}(\overline{r}_n) \geq \varepsilon \bigr\} \leq \varepsilon .
\]
We refer to the strategy $\sigma_\varepsilon$ as an $\varepsilon$-approachability strategy of $\mathcal{C}$. It is easy to show that the approachability of $\mathcal{C}$ implies the existence of a strategy ensuring that the sequence of the average vector-valued payoffs converges to the set $\mathcal{C}$ almost surely, uniformly with respect to the strategies of Nature. By analogy, such a strategy is called a 0-approachability strategy of $\mathcal{C}$.

Conversely, a set $\mathcal{C}$ is $r$-excludable for the signaling structure $H$ if, for some $\delta > 0$, the complement of the $\delta$-neighborhood of $\mathcal{C}$ is $r$-approachable by Nature for the signaling structure $H$.

Primal characterization.
We now discuss characterizations of approachability in the case of full monitoring. We will need the stronger notion of one-shot approachability (the notion of one-shot excludability is stated only for later purposes).

Definition 1. A set $\mathcal{C}$ is one-shot $r$-approachable if there exists $x \in \Delta(\mathcal{I})$ such that for all $y \in \Delta(\mathcal{J})$, one has $r(x,y) \in \mathcal{C}$. A set $\mathcal{C}$ is one-shot $r$-excludable if for some $\delta > 0$, the complement of the $\delta$-neighborhood of $\mathcal{C}$ is one-shot $r$-approachable by Nature.

Blackwell (1956) (see also Mertens et al., 1994) provided the following primal characterization of approachable convex$^1$ sets. A set that satisfies it is called a B-set.

Theorem 1. A convex set $\mathcal{C}$ is $(r,\mathrm{Full})$-approachable if and only if any containing half-space $\mathcal{C}_{\mathrm{hs}} \supseteq \mathcal{C}$ is one-shot $r$-approachable.

This characterization also leads to an approachability strategy, which we describe with a slight modification with respect to its most classical statement. We denote by $r'_t = r(x_t, j_t)$ the expected payoff obtained at round $t$, which is a quantity that is observed by the player. At stage $n$, if $\overline{r}'_n \notin \mathcal{C}$, let $\pi_{\mathcal{C}}(\overline{r}'_n)$ denote the projection of $\overline{r}'_n$ onto $\mathcal{C}$ and consider the containing half-space $\mathcal{C}_{\mathrm{hs},n+1}$ whose defining hyperplane is tangent to $\mathcal{C}$ at $\pi_{\mathcal{C}}(\overline{r}'_n)$. The strategy then consists of choosing the mixed action $x_{n+1} \in \Delta(\mathcal{I})$ associated with the one-shot approachability of $\mathcal{C}_{\mathrm{hs},n+1}$, as illustrated in Figure 1.

Figure 1: An illustration of Blackwell's approaching strategy. At stage $n+1$, when $\overline{r}'_n \notin \mathcal{C}$, the convex set $\mathcal{C}$ and the expected payoff $r'_{n+1}$ are in the half-space $\mathcal{C}_{\mathrm{hs},n+1}$ while $\overline{r}'_n$ lies in its complement.

If $\overline{r}'_n \in \mathcal{C}$, any choice $x_{n+1} \in \Delta(\mathcal{I})$ is suitable. The above strategy ensures that for all $y \in \Delta(\mathcal{J})$,
\[
\bigl\langle r(x_{n+1}, y) - \pi_{\mathcal{C}}(\overline{r}'_n), \ \overline{r}'_n - \pi_{\mathcal{C}}(\overline{r}'_n) \bigr\rangle \leq 0 ; \tag{2}
\]
which in turn ensures the convergence to $\mathcal{C}$ of the mixed payoffs at a rate independent of $d$, namely
\[
d_{\mathcal{C}} \left( \frac{1}{n} \sum_{t=1}^n r(x_t, j_t) \right) \leq \frac{2M}{\sqrt{n}} . \tag{3}
\]
The uniform convergence of $\overline{r}_n$ to $\mathcal{C}$ is deduced by martingale convergence theorems (e.g., the Hoeffding–Azuma inequality) from the above uniform convergence of $\overline{r}'_n$ to $\mathcal{C}$.

$^1$ This primal characterization was actually stated by Blackwell (1956) in a more general way, for all sets, not necessarily convex.

Dual characterization. In the specific case of closed convex sets, using von Neumann's min-max theorem, the primal characterization stated above can be transformed into the following dual characterization:
\[
\mathcal{C} \subseteq \mathbb{R}^d \ \text{is} \ (r,\mathrm{Full})\text{-approachable} \quad \Longleftrightarrow \quad \forall\, y \in \Delta(\mathcal{J}), \ \ \exists\, x \in \Delta(\mathcal{I}), \quad r(x,y) \in \mathcal{C} . \tag{4}
\]
This characterization might be simpler to formulate and to check, yet it does not provide an explicit approachability strategy.

3. Related literature and the objective of this paper

In this section we first recall the existing results on approachability with partial monitoring and then explain in a more technical way the objectives of the paper.

Results on approachability with partial monitoring

Concerning the primal characterization. Kohlberg (1975) (see also Mertens et al., 1994) studied specific frameworks (induced by repeated games with incomplete information, see Section 6) in which approachability depends mildly on the signaling structure. A property that we define in the sequel and call the upper-right-corner property holds between the payoff function $r$ and the signaling structure $H$.
Based on this property it is rather straightforward to show that the primal characterization for the $(r,H)$-approachability of orthants stated in Theorem 1 still holds. Section 6 provides more details on this matter.

Concerning the dual characterization. Perchet (2011a) provided the following dual characterization of approachable closed convex sets under partial monitoring:
\[
\mathcal{C} \subseteq \mathbb{R}^d \ \text{is} \ (r,H)\text{-approachable} \quad \Longleftrightarrow \quad \forall\, h \in \mathcal{F}, \ \ \exists\, x \in \Delta(\mathcal{I}), \quad m(x,h) \subseteq \mathcal{C} . \tag{5}
\]
It indeed generalizes Blackwell's dual characterization (4) with full monitoring, as $\mathcal{F}$ can be identified with $\Delta(\mathcal{J})$ in this case.

Based on (5), Perchet constructed the first $(r,H)$-approachability strategy of any closed convex set $\mathcal{C}$; it was based on calibrated forecasts of the vectors of $\mathcal{F}$. Because of this, the per-stage computational complexity of this strategy increases indefinitely and rates of convergence cannot be inferred. Moreover, the construction of this strategy is unhelpful to infer a generic primal characterization.

Mannor et al. (2011) tackled the issue of complexity and devised an efficient $(r,H)$-approachability strategy for the case where the target set is some polytope. This strategy has a fixed and bounded per-stage computational complexity. Moreover, its rates of convergence are independent of $d$: they are of the order of $n^{-1/5}$, where $n$ is the number of stages.

On the other hand, Perchet and Quincampoix (2011) unified the setups of approachability with full or partial monitoring and characterized approachable closed (convex) sets using some lifting to the Wasserstein space of probability measures on $\Delta(\mathcal{I}) \times \Delta(\mathcal{J})$.

Objectives and technical content of the paper

This paper focuses on the primal characterization of approachable closed convex sets with partial monitoring.
First, note that if a closed convex set is $(r,H)$-approachable, then it is also $(r,\mathrm{Full})$-approachable, and therefore, by (4), any containing half-space is necessarily one-shot $r$-approachable. The question is: when is the latter statement a sufficient condition for $(r,H)$-approachability? The difficulty, as noted already by Perchet (2011a) and recalled at the beginning of Section 5, is that since the notions of approachability with full or partial monitoring do not coincide, it can be that a closed convex set is not $(r,H)$-approachable while every containing half-space is one-shot $r$-approachable.

Some situations where the usual primal characterization is indeed sufficient are formed first, by the cases when the target set is a half-space (with no condition on the game), and second, by the cases when the target set is an orthant and the structure $(r,H)$ of the game satisfies the upper-right-corner property. This first series of results is detailed in Sections 4 and 5. Some light is then shed in Section 6 on the construction of Kohlberg (1975) for the case of repeated games with incomplete information.

The rest of the paper (Sections 7, 8, and 9) relies on no assumption on the structure $(r,H)$ of the game. It discusses a primal condition based on one-shot approachability of half-spaces with respect to a modified payoff function $\widetilde{r}_H$ that encompasses the links between the signaling structure $H$ and the original payoff function $r$. Depending on the geometry of the target closed convex set, this primal condition is stated in the original payoff space (for orthants, Section 7) or in some lifted space (for polytopes or general convex sets, see Sections 8 and 9). We explain in Example 1 why such a lifting seems inevitable.

We also illustrate how the exhibited primal condition leads to a new (and efficient) strategy for $(r,H)$-approachability in the case of target sets given by polytopes.
(Section 7 does it for orthants and the result extends to polytopes via Lemma 2.) This new strategy is based on sequences of (modified) payoffs, as in Kohlberg (1975), and is not only based on sequences of signals, as in Perchet (2011a), Mannor et al. (2011). The construction of this strategy also entails some non-linear approachability results (both in full and partial monitoring).

4. Primal approachability of half-spaces

We first focus on half-spaces, not only because they are the simplest convex sets, but because they are the cornerstones of the primal characterization of Blackwell (1956). The following proposition ties one-shot $r$-approachability with $(r,H)$-approachability of half-spaces.

Proposition 1. For all half-spaces $\mathcal{C}_{\mathrm{hs}}$, for all signaling structures $H$,
\[
\mathcal{C}_{\mathrm{hs}} \ \text{is} \ (r,H)\text{-approachable} \quad \Longleftrightarrow \quad \mathcal{C}_{\mathrm{hs}} \ \text{is one-shot} \ r\text{-approachable} .
\]

This result is a mere interpolation of two well-known results for the extremal cases where $H = \mathrm{Full}$ and $H = \mathrm{Dark}$. The former case corresponds to Blackwell's primal characterization. In the latter case, Nature could always play the same $y \in \Delta(\mathcal{J})$ at all rounds and the player cannot infer the value of this $y$. So he needs to have an action $x \in \Delta(\mathcal{I})$ such that $r(x,y)$ belongs to $\mathcal{C}_{\mathrm{hs}}$, no matter $y$.

Stated differently, the above proposition indicates that as far as half-spaces are concerned, the approachability is independent of the signaling structure.

Proof. Only the direct implication is to be proven, as the converse implication is immediate by the above discussion about games in the dark. We thus assume that $\mathcal{C}_{\mathrm{hs}}$ is $(r,H)$-approachable. Using the characterization (5) of $(r,H)$-approachable sets, one then has that
\[
\forall\, h \in \mathcal{F}, \ \ \exists\, x \in \Delta(\mathcal{I}), \quad m(x,h) \subseteq \mathcal{C}_{\mathrm{hs}} ,
\]
which implies that
\[
\forall\, y \in \Delta(\mathcal{J}), \ \ \exists\, x \in \Delta(\mathcal{I}), \quad r(x,y) \in \mathcal{C}_{\mathrm{hs}} .
\]
The implication holds because $r(x,y) \in m(x,h)$ when $h = \mathbf{H}(y)$.
Now, let $a \in \mathbb{R}^d$ and $b \in \mathbb{R}$ be such that $\mathcal{C}_{\mathrm{hs}} = \bigl\{ \omega \in \mathbb{R}^d : \ \langle \omega, a \rangle \leq b \bigr\}$. The above property can be further restated as
\[
\forall\, y \in \Delta(\mathcal{J}), \ \ \exists\, x \in \Delta(\mathcal{I}), \quad \langle r(x,y), a \rangle \leq b ,
\]
or equivalently,
\[
\max_{y \in \Delta(\mathcal{J})} \ \min_{x \in \Delta(\mathcal{I})} \ \langle r(x,y), a \rangle \leq b .
\]
By von Neumann's min-max theorem, we then have that
\[
\min_{x \in \Delta(\mathcal{I})} \ \max_{y \in \Delta(\mathcal{J})} \ \langle r(x,y), a \rangle \leq b ,
\]
that is,
\[
\exists\, x_0 \in \Delta(\mathcal{I}), \ \ \forall\, y \in \Delta(\mathcal{J}), \quad \langle r(x_0,y), a \rangle \leq b .
\]
This is exactly the one-shot approachability of $\mathcal{C}_{\mathrm{hs}}$.

Since the complement of any $\delta$-neighborhood of a half-space is also a half-space, we get the following additional equivalence, in view of the respective definitions of excludability and one-shot excludability.

Corollary 1. For all half-spaces $\mathcal{C}_{\mathrm{hs}}$, for all signaling structures $H$,
\[
\mathcal{C}_{\mathrm{hs}} \ \text{is} \ (r,H)\text{-excludable} \quad \Longleftrightarrow \quad \mathcal{C}_{\mathrm{hs}} \ \text{is one-shot} \ r\text{-excludable} .
\]

5. Primal approachability of orthants under the upper-right-corner property

This section is devoted to stating a primal characterization of $(r,H)$-approachable orthants, i.e., sets of the form
\[
\mathcal{C}_{\mathrm{orth}}(a) = \bigl\{ a - \omega : \ \omega \in (\mathbb{R}_+)^d \bigr\}
\]
for some $a \in \mathbb{R}^d$. Orthants are the key for extension to polytopes, because, as we will discuss later, up to some lifting in higher dimensions, every polytope can be seen as an orthant.

We start by indicating that the primal characterization stated in the previous section in terms of the original payoff function $r$ does not extend directly to general convex sets, not even to orthants, at least without an additional assumption. However, in this section we state such a sufficient assumption for its extension. We study the most general primal characterization in Section 7, which will involve a modified payoff function for the one-shot approachability of half-spaces.

Counter-example (adapted from Perchet, 2011a). We show that the equivalence of Proposition 1 does not hold in general if the convex set $\mathcal{C}$ at hand is not a half-space.
To do so, we exhibit a game and a set $\mathcal{C}$ which is $(r,\mathrm{Full})$-approachable but not $(r,\mathrm{Dark})$-approachable. We set $\mathcal{I} = \{T, B\}$ and $\mathcal{J} = \{L, R\}$, and the payoff function $r$ is given by the matrix
\[
\begin{array}{c|cc}
 & L & R \\ \hline
T & (0,0) & (1,-1) \\
B & (-1,1) & (0,0)
\end{array}
\]
We consider the set $\mathcal{C}_{\mathrm{orth}}\bigl((0,0)\bigr) = (\mathbb{R}_-)^2$. This set is $(r,\mathrm{Full})$-approachable as indicated by the dual characterization (4): for each $\alpha \in [0,1]$,
\[
r\bigl( \alpha T + (1-\alpha) B, \ \alpha L + (1-\alpha) R \bigr) = (0,0) \in \mathcal{C}_{\mathrm{orth}}\bigl((0,0)\bigr) .
\]
On the other hand, consider the signaling structure $H = \mathrm{Dark}$, for which the only flag is $\emptyset$. For all actions of the player, i.e., for all $\alpha \in [0,1]$, it holds that
\[
m\bigl( \alpha T + (1-\alpha) B, \ \emptyset \bigr) = \Bigl\{ r\bigl( \alpha T + (1-\alpha) B, \ y \bigr) : \ y \in \Delta(\mathcal{J}) \Bigr\} = \bigl\{ (\lambda, -\lambda) ; \ \lambda \in [\alpha - 1, \, \alpha] \bigr\} \not\subseteq \mathcal{C}_{\mathrm{orth}}\bigl((0,0)\bigr) .
\]
Therefore, the condition in the characterization (5) of approachable closed convex sets is not satisfied when playing in the dark, and the set is not $(r,\mathrm{Dark})$-approachable.

Upper-right-corner property. We define the upper-right corner function $R : \Delta(\mathcal{I}) \times \mathcal{F} \to \mathbb{R}^d$ of the compatible payoff function $m$ in a component-wise manner. We write the coordinates of $R$ as $R = (R_1, \ldots, R_d)$ and set, for all $k \in \{1, \ldots, d\}$,
\[
\forall\, x \in \Delta(\mathcal{I}), \ \ \forall\, h \in \mathcal{F}, \qquad R_k(x,h) = \max \Bigl\{ \omega_k : \ \omega = (\omega_1, \ldots, \omega_d) \in m(x,h) \Bigr\} .
\]
The construction of $R$ is illustrated in Figure 2. Note that the $\ell_2$-norm of $R$ is in general bounded by $M \sqrt{d}$.

The term "upper-right corner" comes from the fact that $R(x,h)$ is the (component-wise) smallest $a$ such that $m(x,h) \subseteq \mathcal{C}_{\mathrm{orth}}(a)$. Controlling the distance of $R(x,h)$ to the orthant entails controlling the distance of the whole set $m(x,h)$ to it. Thus, the point $R(x,h)$ is in some sense the worst-case payoff vector associated with $m(x,h)$. Of course, $R(x,h)$ is in general not a feasible payoff vector, i.e., $R(x,h) \notin m(x,h)$.
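The counter-example and the construction of the upper-right corner can be checked numerically. The Python sketch below is only an illustration under stated assumptions: the helper names are hypothetical, the checks are run on a finite grid of mixed actions rather than on all of $[0,1]$, and the upper-right corner is computed from the two extreme points $r(x,L)$ and $r(x,R)$ of the segment $m(x,\emptyset)$, which is legitimate because each coordinate of a point on a segment is maximized at an endpoint.

```python
from fractions import Fraction

# Payoffs of the counter-example: rows T, B for the player, columns L, R for Nature.
PAYOFF = {
    ("T", "L"): (0, 0), ("T", "R"): (1, -1),
    ("B", "L"): (-1, 1), ("B", "R"): (0, 0),
}


def mixed_payoff(alpha, beta):
    """r(alpha*T + (1 - alpha)*B, beta*L + (1 - beta)*R), computed by bilinearity."""
    weights = {
        ("T", "L"): alpha * beta, ("T", "R"): alpha * (1 - beta),
        ("B", "L"): (1 - alpha) * beta, ("B", "R"): (1 - alpha) * (1 - beta),
    }
    return tuple(sum(w * PAYOFF[key][k] for key, w in weights.items()) for k in range(2))


def dark_segment(alpha):
    """Endpoints of the compatible payoff set in the dark: payoffs against pure L and pure R."""
    return mixed_payoff(alpha, Fraction(1)), mixed_payoff(alpha, Fraction(0))


def upper_right_corner(alpha):
    """Component-wise maximum of the compatible payoffs when playing in the dark."""
    left, right = dark_segment(alpha)
    return tuple(max(left[k], right[k]) for k in range(2))


def in_nonpositive_orthant(point):
    return all(coordinate <= 0 for coordinate in point)


alphas = [Fraction(k, 10) for k in range(11)]

# Dual condition (4) with full monitoring: against y = (beta, 1 - beta), playing alpha = beta yields (0, 0).
full_monitoring_ok = all(mixed_payoff(a, a) == (0, 0) for a in alphas)

# In the dark, no mixed action on the grid keeps both endpoints inside the orthant.
dark_ok_somewhere = any(
    all(in_nonpositive_orthant(p) for p in dark_segment(a)) for a in alphas
)

corners = {a: upper_right_corner(a) for a in alphas}
```

On this grid the corner equals $(\alpha, 1-\alpha)$, whose coordinates sum to 1, whereas every point $(\lambda,-\lambda)$ of the compatible set has coordinates summing to 0; in this example the upper-right corner is therefore never a feasible payoff.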
We are interested in this section in the case where the upper-right corner is indeed a feasible payoff, an assumption that we call the upper-right-corner property.

Figure 2: Four illustrations of compatible payoff sets $m(x,h)$ and associated upper-right corners $R(x,h)$. In the two examples on the left, this upper-right corner does not belong to the set, while it does in the two on the right. (When it is so for all $x$ and $h$, the game is said to have the upper-right-corner property.)

Definition 2. The game $(r,H)$ with partial monitoring has the upper-right-corner property if
\[
\forall\, x \in \Delta(\mathcal{I}), \ \ \forall\, h \in \mathcal{F}, \qquad R(x,h) \in m(x,h) .
\]

Of course, games with full monitoring have the upper-right-corner property, as for them $\mathcal{F}$ can be identified with $\Delta(\mathcal{J})$ and $m$ can be identified with the function $\{r\}$ with values in the set of all singleton subsets of $\mathbb{R}^d$. By definition, in a game with the upper-right-corner property, the $\ell_2$-norm of $R$ is bounded by $M$.

Primal characterization under the upper-right-corner property. The following proposition was implicitly used by Kohlberg (1975).

Proposition 2. For all games $(r,H)$ with partial monitoring that have the upper-right-corner property, for all orthants $\mathcal{C}_{\mathrm{orth}}(a)$, where $a \in \mathbb{R}^d$,
\[
\mathcal{C}_{\mathrm{orth}}(a) \ \text{is} \ (r,H)\text{-approachable} \quad \Longleftrightarrow \quad \text{every half-space} \ \mathcal{C}_{\mathrm{hs}} \supseteq \mathcal{C}_{\mathrm{orth}}(a) \ \text{is one-shot} \ r\text{-approachable} .
\]

Stated differently, using Blackwell's primal characterization of approachability (Theorem 1), an orthant $\mathcal{C}_{\mathrm{orth}}(a)$ is $(r,H)$-approachable in a game $(r,H)$ satisfying the upper-right-corner property if and only if $\mathcal{C}_{\mathrm{orth}}(a)$ is $r$-approachable with full monitoring.

Proof. The direct implication is proved by applying Proposition 1 to any half-space $\mathcal{C}_{\mathrm{hs}} \supseteq \mathcal{C}_{\mathrm{orth}}(a)$, which is in particular $(r,H)$-approachable as soon as $\mathcal{C}_{\mathrm{orth}}(a)$ is.

The interesting implication is thus the converse one. So, we assume that every half-space $\mathcal{C}_{\mathrm{hs}} \supseteq \mathcal{C}_{\mathrm{orth}}(a)$ is one-shot $r$-approachable and, following Kohlberg's original proof and inspired by Blackwell's strategy in the case of full monitoring, we construct an $(r,H)$-approachability strategy of $\mathcal{C}_{\mathrm{orth}}(a)$.

Flags observed, mixed payoffs obtained. For simplicity, assume first that after stage $n$, the observation made by the player is not just the random signal $s_n$ but the entire vector of probability distributions over the signals $h_n = \mathbf{H}(y_n)$. (We will indicate below why this is not a restrictive assumption.)

We consider the surrogate payoff vector $R_n = R(x_n, h_n)$, which is a quantity thus observed by the player. When $\overline{R}_n$ does not already belong to $\mathcal{C}_{\mathrm{orth}}(a)$, since the latter set is convex, the half-space $\mathcal{C}_{\mathrm{hs},n}$ defined by
\[
\mathcal{C}_{\mathrm{hs},n} = \Bigl\{ \omega \in \mathbb{R}^d : \ \bigl\langle \omega - \pi_{\mathcal{C}_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr), \ \overline{R}_n - \pi_{\mathcal{C}_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr) \bigr\rangle \leq 0 \Bigr\}
\]
contains $\mathcal{C}_{\mathrm{orth}}(a)$, where we recall that $\pi_{\mathcal{C}_{\mathrm{orth}}(a)}$ is the orthogonal projection onto $\mathcal{C}_{\mathrm{orth}}(a)$. By assumption, $\mathcal{C}_{\mathrm{hs},n}$ is thus one-shot $r$-approachable. That is, there exists $x_{n+1} \in \Delta(\mathcal{I})$ such that
\[
\forall\, y \in \Delta(\mathcal{J}), \qquad \bigl\langle r(x_{n+1}, y) - \pi_{\mathcal{C}_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr), \ \overline{R}_n - \pi_{\mathcal{C}_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr) \bigr\rangle \leq 0 . \tag{6}
\]
By the upper-right-corner property, $R_{n+1} \in m(x_{n+1}, h_{n+1})$, which entails that there exists $y'_{n+1} \in \Delta(\mathcal{J})$ such that $\mathbf{H}\bigl(y'_{n+1}\bigr) = h_{n+1}$ and $R_{n+1} = r\bigl(x_{n+1}, y'_{n+1}\bigr)$.
As a consequence, $R_{n+1}$ belongs to $C_{\mathrm{hs},n}$ and the sequence $(R_n)$, with averages $\overline{R}_n = (1/n) \sum_{t=1}^n R_t$, satisfies the following condition, usually referred to as Blackwell's condition:
\[ \bigl\langle R_{n+1} - \pi_{C_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr), \ \overline{R}_n - \pi_{C_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr) \bigr\rangle \leq 0. \]
This condition is trivially satisfied when $\overline{R}_n$ already belongs to $C_{\mathrm{orth}}(a)$. Just as (2) leads to (3), this condition implies that $d_{C_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr) \leq 2M/\sqrt{n}$. Now, $(1/n) \sum_{t=1}^n r(x_t, y_t) \in (1/n) \sum_{t=1}^n m(x_t, h_t)$ and, as $R$ is the upper-right corner function, $(1/n) \sum_{t=1}^n m(x_t, h_t) \subseteq C_{\mathrm{orth}}\bigl(\overline{R}_n\bigr)$. That is, $(1/n) \sum_{t=1}^n r(x_t, y_t)$ is component-wise smaller than $\overline{R}_n$. Since the distance to the orthant $C_{\mathrm{orth}}(a)$ equals, for all $\omega \in \mathbb{R}^d$,
\[ d_{C_{\mathrm{orth}}(a)}(\omega) = \sqrt{ \sum_{k=1}^d \bigl( \max\{ \omega_k - a_k, \, 0 \} \bigr)^2 }, \tag{7} \]
we get that
\[ d_{C_{\mathrm{orth}}(a)}\!\left( \frac{1}{n} \sum_{t=1}^n r(x_t, y_t) \right) \leq d_{C_{\mathrm{orth}}(a)}\bigl(\overline{R}_n\bigr) \leq \frac{2M}{\sqrt{n}}. \]
Finally, by martingale convergence theorems (e.g., the Hoeffding–Azuma inequality), the sequence of the distances $d_{C_{\mathrm{orth}}(a)}\bigl(\overline{r}_n\bigr)$ converges uniformly to 0 with respect to strategies of Nature. The above arguments are illustrated in Figure 3.

Figure 3: Illustration of the guarantee (6), in the case when the sets of compatible payoffs $m(x,h)$ are all given by rectangles. (Labels in the figure: $a$, $\overline{R}_n$, $(1/n) \sum_{t=1}^n m(x_t, h_t)$, $C_{\mathrm{hs},n}$, $m(x_{n+1}, h)$, $m(x_{n+1}, h')$, $R(x_{n+1}, h)$, $R(x_{n+1}, h')$.)

Flags not observed, only random signals observed, pure payoffs. It remains to relax the assumption that the flags $h_n$ are observed, while only the signals $s_n$ drawn at random according to $H(i_n, j_n)$ are. A standard trick in the literature of partial monitoring (see Kohlberg, 1975, Mertens et al., 1994, Lugosi et al., 2008) solves the issue, together with martingale convergence theorems and the fact that the upper-right corner $R$ is a Lipschitz function for a well-chosen metric over sets (see Lemma 1 below). We briefly describe
this trick without working out the lengthy details. Time is divided into blocks indexed by $b = 1, 2, \ldots$ and with respective (large and increasing) lengths $L_b$. Another sequence of elements $\gamma_b \in (0,1)$ converging to 0 is needed. The same mixed distribution $x^{(b)}$ is played at all stages of block $b$ by the player; that is, $x_{bL+t} = x^{(b)}$ for all $1 \leq t \leq L_b$. This distribution is chosen by mixing a distribution $x^{(b)}_{\mathrm{orig.}}$ satisfying a constraint of the form of (6) with the uniform distribution. This is done with respective weights $1 - \gamma_b$ and $\gamma_b$. The distribution $x^{(b)}$ then puts a positive probability mass of at least $\gamma_b > 0$ on all actions. Doing so, an estimator of the average flag on block $b$ can be constructed. Its accuracy, as well as the price to pay for the mixing, depend on $\gamma_b$ and $L_b$. By such a price, we mean how much farther away we are from $C_{\mathrm{orth}}(a)$ because we did not play $x^{(b)}_{\mathrm{orig.}}$ but $x^{(b)}$ instead. Informally, each block now plays the role of a stage in the setting above when flags were observed. One can show that suitable values of $L_b$ and $\gamma_b$ lead to uniform convergence of the average payoffs, measured in terms of the mixed actions $x_t$, to $C_{\mathrm{orth}}(a)$. Also, similar martingale convergence arguments show that measuring payoffs in terms of the mixed actions $x_t$ or pure actions $i_t$ does not matter.
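The mixing step of the block construction can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name is hypothetical, exact fractions are used to avoid rounding, and the uniform distribution is normalized over the $N$ pure actions, so that in this sketch every action receives mass at least $\gamma_b / N$ (the constant depends on the normalization convention chosen for the mixture).

```python
from fractions import Fraction


def mix_with_uniform(x_orig, gamma):
    """Return (1 - gamma) * x_orig + gamma * uniform on the same action set.

    x_orig: list of probabilities summing to 1 (the distribution
            satisfying a constraint of the form of (6)).
    gamma:  exploration weight in (0, 1).
    """
    n = len(x_orig)
    return [(1 - gamma) * p + gamma * Fraction(1, n) for p in x_orig]


# Illustrative (not tuned) exploration weight on one block.
x_block = mix_with_uniform([Fraction(1), Fraction(0)], Fraction(1, 10))
print(x_block)       # every action now has positive mass
print(sum(x_block))  # still a probability distribution
```

Playing `x_block` at every stage of a block makes each action appear with positive probability, which is what allows the average flag on the block to be estimated from the random signals.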
As indicated, we omit the technical proof of these facts (it already appeared in all the given references) but notice, however, that rates of convergence are adversely affected by this trick.

Remark 1. The construction in the proof above shows that under the upper-right-corner property, it is necessary and sufficient to control the behavior of the upper-right corners $R_n$. This property was used only to show that $R_{n+1}$ is equal to some $r(x_{n+1}, y'_{n+1})$ and thus that the sequence $(R_n)$ satisfies Blackwell's condition. When the property is not satisfied anymore, the sequence of the upper-right corners may fail to satisfy this condition. For instance, in the counter-example at the beginning of the present section, the upper-right corners equal
\[ \forall \alpha \in [0,1], \qquad R\bigl( \alpha T + (1-\alpha) B, \, \emptyset \bigr) = (\alpha, \, 1-\alpha), \]
so that, for all strategies of the player, the average $\overline{R}_n$ of the upper-right corners equals $(\lambda, 1-\lambda)$ for some $\lambda \in [0,1]$. Thus, the distance of $\overline{R}_n$ to $C_{\mathrm{orth}}\bigl((0,0)\bigr)$ is always at least $1/\sqrt{2}$.

6. Intermezzo: Kohlberg's repeated games with incomplete information

We consider in this section a different, yet related framework, which is the main focus of Kohlberg (1975). We first describe a setting where $d$ games with partial monitoring are to be played simultaneously, and then establish the formal connection with Kohlberg's results.

Simultaneous games with partial monitoring. We consider $d$ such games, with common action sets $I$ and $J$ for the player and Nature and common set $\mathcal{H}$ of signals, but with possibly different payoff functions and signaling structures. We index these games by $g$. For each game $g \in \{1, \ldots, d\}$, the player's payoff function is denoted by $r^{(g)} : I \times J \to \mathbb{R}$ and the signaling structure is given by $H^{(g)} : I \times J \to \Delta(\mathcal{H})$, with associated maximal information mapping $\mathbf{H}^{(g)} : \Delta(J) \to \Delta(\mathcal{H})^I$. We put some restrictions on the strategies of the player and of Nature.
The player may only choose one action $x_t \in \Delta(I)$ at each stage $t$, the same for all games $g$. On the other hand, Nature can choose different mixed actions $y_t^{(g)}$ in each game $g$, but these need to be non-revealing, that is, they need to induce the same flags. More formally, they need to be picked in the following set, which we assume to be non-empty:
\[ \mathrm{NR} = \Bigl\{ \bigl( y^{(1)}, \ldots, y^{(d)} \bigr) \in \Delta(J)^d : \ \mathbf{H}^{(1)}\bigl(y^{(1)}\bigr) = \cdots = \mathbf{H}^{(d)}\bigl(y^{(d)}\bigr) \Bigr\}. \tag{8} \]
The above framework of simultaneous games can be embedded into an equivalent game that fits the model studied in the previous sections. Indeed, by linearity of each $\mathbf{H}^{(g)}$, the set $\mathrm{NR}$ of non-revealing actions is a polytope, thus it is the convex hull of its finite set of extremal points. We denote the cardinality of the latter by $K$ and we write its elements as
\[ \mathcal{K} = \Bigl\{ \bigl( y_1^{(g)} \bigr)_{1 \leq g \leq d}, \ \ldots, \ \bigl( y_K^{(g)} \bigr)_{1 \leq g \leq d} \Bigr\}. \]
Each $\bigl( y^{(g)} \bigr)_{1 \leq g \leq d} \in \mathrm{NR}$ can then be represented by an element of $\Delta(\mathcal{K})$. Conversely, each $z = (z_k)_{k \leq K} \in \Delta(\mathcal{K})$ induces the following element of $\mathrm{NR}$:
\[ Y(z) = \bigl( Y^{(g)}(z) \bigr)_{1 \leq g \leq d} = \sum_{k=1}^K z_k \bigl( y_k^{(g)} \bigr)_{1 \leq g \leq d}. \]
So, with no loss of generality, we can assume that $\mathcal{K}$ is the finite set of actions of Nature and that, given $z \in \Delta(\mathcal{K})$ and $x \in \Delta(I)$, the payoff in the game $g$ is $r^{(g)}\bigl( x, Y^{(g)}(z) \bigr)$. This defines naturally an auxiliary game with linear vector-valued payoff function $r : \Delta(I) \times \Delta(\mathcal{K}) \to \mathbb{R}^d$ and maximal information mapping $\mathbf{H} : \Delta(\mathcal{K}) \to \Delta(\mathcal{H})^I$ defined by
\[ r(x,z) = \Bigl( r^{(g)}\bigl( x, Y^{(g)}(z) \bigr) \Bigr)_{1 \leq g \leq d} \qquad \text{and} \qquad \mathbf{H}(z) = \mathbf{H}^{(g)}\bigl( Y^{(g)}(z) \bigr) \ \text{ for all } g. \]
(The definition of $\mathbf{H}$ is independent of $g$ by construction, as we restricted Nature to use non-revealing strategies.) This maximal information mapping $\mathbf{H}$ corresponds to an underlying signaling structure which we denote by $H : I \times \mathcal{K} \to \Delta(\mathcal{H})$.

The game $(r,H)$ constructed above satisfies the upper-right-corner property.
Indeed, for all $h \in \mathbf{H}\bigl( \Delta(\mathcal{K}) \bigr)$ and all $x \in \Delta(I)$,
\[ m(x,h) = \Bigl\{ \Bigl( r^{(1)}\bigl( x, Y^{(1)}(z) \bigr), \ldots, r^{(d)}\bigl( x, Y^{(d)}(z) \bigr) \Bigr) : \ z \in \Delta(\mathcal{K}) \ \text{s.t.} \ \mathbf{H}(z) = h \Bigr\} \]
\[ \qquad = \Bigl\{ \Bigl( r^{(1)}\bigl( x, y^{(1)} \bigr), \ldots, r^{(d)}\bigl( x, y^{(d)} \bigr) \Bigr) : \ \bigl( y^{(1)}, \ldots, y^{(d)} \bigr) \in \Delta(J)^d \ \text{s.t.} \ \mathbf{H}^{(1)}\bigl(y^{(1)}\bigr) = \cdots = \mathbf{H}^{(d)}\bigl(y^{(d)}\bigr) = h \Bigr\}. \]
Because of the separation of the variables in the constraint, the following set, given $h$,
\[ \Bigl\{ \bigl( y^{(1)}, \ldots, y^{(d)} \bigr) \in \Delta(J)^d : \ \mathbf{H}^{(1)}\bigl(y^{(1)}\bigr) = \cdots = \mathbf{H}^{(d)}\bigl(y^{(d)}\bigr) = h \Bigr\} \]
is a Cartesian product of subsets of $\Delta(J)$. Thus, its image $m(x,h)$ by the mapping
\[ \bigl( y^{(1)}, \ldots, y^{(d)} \bigr) \in \Delta(J)^d \ \longmapsto \ \Bigl( r^{(1)}\bigl( x, y^{(1)} \bigr), \ldots, r^{(d)}\bigl( x, y^{(d)} \bigr) \Bigr) \]
is also a Cartesian product of closed intervals of $\mathbb{R}$. In particular, the latter set contains its upper-right corner, that is, $R(x,h) \in m(x,h)$, as claimed.

We assume, with no loss of generality, that in these simultaneous games, Nature maximizes the payoffs and the player minimizes them. A question that naturally arises (and whose answer will be needed below) is to determine for which vectors $a = (a_1, \ldots, a_d) \in \mathbb{R}^d$ the player can simultaneously guarantee that his average payoff will be smaller than $a_g$ in the limit in each game $g \in \{1, \ldots, d\}$; that is, to determine which orthants $C_{\mathrm{orth}}(a)$ are $(r,H)$-approachable. By the exhibited upper-right-corner property, Proposition 2 shows that a necessary and sufficient condition for this is that all containing half-spaces of $C_{\mathrm{orth}}(a)$ be one-shot $r$-approachable. These half-spaces are parameterized by the convex distributions $q \in \Delta\bigl( \{1, \ldots, d\} \bigr)$ and are denoted by
\[ C^{(q)}_{\mathrm{hs}} = \bigl\{ \omega \in \mathbb{R}^d : \ \langle \omega, q \rangle \leq \langle a, q \rangle \bigr\}. \tag{9} \]
Stated equivalently, the orthant $C_{\mathrm{orth}}(a)$ is $(r,H)$-approachable if and only if the value of the zero-sum game with payoff function $(x,z) \in \Delta(I) \times \Delta(\mathcal{K}) \mapsto \langle r(x,z), q \rangle$ is smaller than $\langle a, q \rangle$ for all $q \in \Delta(\{1, \ldots, d\})$.
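The last equivalence can be probed numerically on toy instances. The Python sketch below is an illustration under strong assumptions that are not in the text: the player has exactly two pure actions, each game is given as a $2 \times K$ matrix whose columns stand for the extremal non-revealing actions of Nature, and only a user-supplied finite list of directions $q$ is tested (so a positive answer is a necessary check, not a proof of approachability). All function names are hypothetical. The value of the scalarized game is computed exactly: the worst-case payoff is a convex piecewise-linear function of the weight $p$ put on the first row, so its minimum is attained at $p \in \{0, 1\}$ or at a crossing point of two columns.

```python
from fractions import Fraction


def scalarize(games, q):
    """Matrix of <r(i, k), q>: entry (i, k) is sum_g q[g] * games[g][i][k]."""
    rows, cols = len(games[0]), len(games[0][0])
    return [[sum(q[g] * games[g][i][k] for g in range(len(games)))
             for k in range(cols)] for i in range(rows)]


def value_two_rows(matrix):
    """min over p in [0, 1] of max over columns k of
    p * matrix[0][k] + (1 - p) * matrix[1][k] (row player minimizes)."""
    top, bottom = matrix

    def worst_case(p):
        return max(p * t + (1 - p) * b for t, b in zip(top, bottom))

    candidates = [Fraction(0), Fraction(1)]
    for k in range(len(top)):
        for l in range(k + 1, len(top)):
            slope_gap = (top[k] - bottom[k]) - (top[l] - bottom[l])
            if slope_gap != 0:
                # Crossing point of the affine payoffs of columns k and l.
                p = Fraction(bottom[l] - bottom[k]) / Fraction(slope_gap)
                if 0 <= p <= 1:
                    candidates.append(p)
    return min(worst_case(p) for p in candidates)


def direction_ok(games, a, q):
    """Check the condition 'value of <r, q> is at most <a, q>' for one q."""
    return value_two_rows(scalarize(games, q)) <= sum(
        ag * qg for ag, qg in zip(a, q))


pennies = [[1, -1], [-1, 1]]
zeros = [[0, 0], [0, 0]]
q = [Fraction(1, 2), Fraction(1, 2)]
print(direction_ok([pennies, zeros], [0, 0], q))
print(direction_ok([pennies, zeros], [-1, 0], q))
```

For more than two pure actions of the player, the value would typically be obtained from a linear program instead; that extension is left out of this sketch.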
Kohlberg's model of repeated games with incomplete information. The setting of repeated games with incomplete information, introduced by Aumann and Maschler (1995), relies on the same finite family of games $\bigl( r^{(g)}, H^{(g)} \bigr)$, where $g \in \{1, \ldots, d\}$, as described above. They will however not be played simultaneously. Instead, a single game (state) $G \in \{1, \ldots, d\}$ is drawn according to some probability distribution $p \in \Delta\bigl( \{1, \ldots, d\} \bigr)$ known by both the player and Nature. Yet only Nature (and not the player) is informed of the true state $G$. A repeated game with partial monitoring then takes place between the player and Nature in the $G$-th game. Payoffs are evaluated in expectation with respect to the random draw of $G$ according to $p$.

For simplicity, we assume that all mappings $\mathbf{H}^{(g)}$ have the same range (see footnote 2) and, with no loss of generality, that $p$ has full support. Because of these two properties, the considered setting of repeated games with incomplete information can then be embedded, from the player's viewpoint, into the above-described setting of $d$ simultaneous games under the restriction that Nature resorts to non-revealing strategies. Indeed, from the player's viewpoint and because of the identical range of the $\mathbf{H}^{(g)}$, the mixed action used by Nature in the game $G$ can be thought of as the $G$-th component of some vector of mixed actions in the set $\mathrm{NR}$ defined in (8).

We use the notation defined above: as payoffs are evaluated in expectation, the payoff function is formed by the inner products $(x,z) \in \Delta(I) \times \Delta(\mathcal{K}) \mapsto \langle r(x,z), p \rangle$. We recall that Nature maximizes the payoff and that the player minimizes it. For each $q \in \Delta(\{1, \ldots, d\})$, we denote by $u(q)$ the value of the one-shot zero-sum game $\Gamma(q)$ with payoffs $(x,z) \in \Delta(I) \times \Delta(\mathcal{K}) \mapsto \langle r(x,z), q \rangle$.
We show that, as proved in the mentioned references, the value $U$ of this repeated game, as a function of the distribution $p$, may be larger than $u(p)$ and is given by $\mathrm{cav}[u](p)$, where $\mathrm{cav}[u]$ is the smallest concave function above $u$.

First, the so-called splitting lemma shows that $U$ is concave. Therefore, we have $U \geq \mathrm{cav}[u]$. (For the splitting lemma, see Aumann and Maschler, 1995 and also Mertens et al., 1994, Section V.1 or Sorin, 2002.)

The inequality of interest to us is the converse one. Using the concavity of the mapping $\mathrm{cav}[u]$, Kohlberg (1975, Corollary 2.4) proves that for all $p \in \Delta(\{1, \ldots, d\})$, there exists some $a_p \in \mathbb{R}^d$ such that
\[ \mathrm{cav}[u](p) = \langle a_p, p \rangle \qquad \text{and} \qquad \forall q \in \Delta(\{1, \ldots, d\}), \quad \mathrm{cav}[u](q) \leq \langle a_p, q \rangle. \]
In particular, $u(q) \leq \langle a_p, q \rangle$ for all $q \in \Delta(\{1, \ldots, d\})$. The equivalence stated after (9) shows that $C_{\mathrm{orth}}(a_p)$ is therefore $(r,H)$-approachable. Hence, no matter the strategy of Nature, the payoff in state $G$ is asymptotically smaller than the $G$-th component of $a_p$. (This is true for all realizations of $G$.) As a consequence, in expectation (with respect to the random choice of $G$), the payoff is smaller than $\langle a_p, p \rangle = \mathrm{cav}[u](p)$. This shows that $U(p) \leq \mathrm{cav}[u](p)$.

In conclusion, Kohlberg (1975) implicitly used the consequences of the upper-right-corner property detailed above when constructing an optimal strategy for the uninformed player. A close inspection reveals that Lemma 5.4 therein does not hold anymore in the more general framework without the upper-right-corner property (in particular, one might want to read it again with Remark 1 in mind).

Footnote 2: In full generality, when this is not the case, Nature may resort to strategies that reveal that the true state $G$ belongs to some strict subset of $\{1, \ldots, d\}$, and the player must adapt his strategy in correspondence with this knowledge, see Kohlberg (1975). But our assumption already captures the basic idea of the use of approachability in this framework and the alluded technical adaptations are beyond the scope of this paper.

7. Primal approachability of orthants in the general case

We noted that the primal characterization in terms of one-shot $r$-approachability of containing half-spaces stated in Proposition 2 did not extend to games $(r,H)$ without the upper-right-corner property. We show in this section that it holds true in the general case when one-shot approachability is with respect to the modified payoff function $\widetilde{r}_H : \Delta(I) \times \Delta(J) \to \mathbb{R}^d$ defined as follows:
\[ \forall x \in \Delta(I), \ \forall y \in \Delta(J), \qquad \widetilde{r}_H(x,y) = R\bigl( x, \mathbf{H}(y) \bigr). \]
The change of payoff function can be intuitively explained as follows. As noted in Section 5, when the target sets are given by orthants (and only because of this), the behavior of (averages of) sets of compatible payoffs is dictated by their upper-right corners. Now, the upper-right-corner property indicated that even when measuring payoffs with $r$, the worst-case payoffs were given by the upper-right corners and that it was thus necessary and sufficient to consider the latter. If this property does not hold, then evaluating actions with $\widetilde{r}_H$ enables and forces the consideration of these corners.

Of course, in the case of full monitoring, as follows from the comments after Definition 2, no modification takes place in the payoff function, that is, $\widetilde{r}_{\mathrm{Full}} = r$.

The main result of this section is the following primal characterization.
The rest of this section will then show how it leads to a new approachability strategy under partial monitoring, based on surrogate payoffs (upper-right corner payoffs) and not only on signals or on estimated flags, as previously done in the literature (e.g., in the references mentioned in the last part of the proof of Proposition 2).

Theorem 2. For all games $(r,H)$ with partial monitoring, for all orthants $C_{\mathrm{orth}}(a)$, where $a \in \mathbb{R}^d$,
\[ C_{\mathrm{orth}}(a) \text{ is } (r,H)\text{-approachable} \iff \text{every half-space } C_{\mathrm{hs}} \supset C_{\mathrm{orth}}(a) \text{ is one-shot } \widetilde{r}_H\text{-approachable}. \]

The proof of this theorem is as follows. The dual characterization (5) indicates that a necessary and sufficient condition of $(r,H)$-approachability for $C_{\mathrm{orth}}(a)$ is that for all $y \in \Delta(J)$, there exists $x \in \Delta(I)$ such that $m\bigl( x, \mathbf{H}(y) \bigr) \subseteq C_{\mathrm{orth}}(a)$. Since, by construction of $R$, the smallest orthant (in the sense of inclusion) in which $m\bigl( x, \mathbf{H}(y) \bigr)$ is contained is precisely $C_{\mathrm{orth}}\bigl( \widetilde{r}_H(x,y) \bigr)$, the necessary and sufficient condition can be restated as the requirement that for all $y \in \Delta(J)$, there exists $x \in \Delta(I)$ such that $\widetilde{r}_H(x,y) \in C_{\mathrm{orth}}(a)$. Now, this reformulated dual characterization of approachability in the context of orthants is seen to be equivalent to the following primal characterization, which concludes the proof of the theorem.

Proposition 3. For all games $(r,H)$ with partial monitoring, for all orthants $C_{\mathrm{orth}}(a)$, where $a \in \mathbb{R}^d$,
\[ \forall y \in \Delta(J), \ \exists x \in \Delta(I), \quad \widetilde{r}_H(x,y) \in C_{\mathrm{orth}}(a) \iff \text{every half-space } C_{\mathrm{hs}} \supset C_{\mathrm{orth}}(a) \text{ is one-shot } \widetilde{r}_H\text{-approachable}. \]

Before proving this proposition, we need to state some properties of the function $\widetilde{r}_H$. Given two points $a, a' \in \mathbb{R}^d$, the notation $a \preccurlyeq a'$ means that $a$ is component-wise smaller than $a'$ or, equivalently, that $a$ belongs to the orthant $C_{\mathrm{orth}}(a')$.

Lemma 1. The function $\widetilde{r}_H$ is Lipschitz continuous.
It is also convex in its first argument and concave in its second argument, in the sense that, for all $x, x' \in \Delta(I)$, all $y, y' \in \Delta(J)$, and all $\lambda \in [0,1]$,
\[ \widetilde{r}_H\bigl( \lambda x + (1-\lambda) x', \, y \bigr) \preccurlyeq \lambda \, \widetilde{r}_H(x,y) + (1-\lambda) \, \widetilde{r}_H(x',y) \]
and
\[ \lambda \, \widetilde{r}_H(x,y) + (1-\lambda) \, \widetilde{r}_H(x,y') \preccurlyeq \widetilde{r}_H\bigl( x, \, \lambda y + (1-\lambda) y' \bigr). \]

Proof. Convexity and concavity follow from the concavity and the convexity of $m$ for inclusion. Formally, it follows from the very definition (1) of $m$ and from the linearity of $r$ and $\mathbf{H}$ that, for all $x, x' \in \Delta(I)$, all $h, h' \in \mathcal{F}$, and $\lambda \in [0,1]$,
\[ m\bigl( \lambda x + (1-\lambda) x', \, h \bigr) \subseteq \lambda \, m(x,h) + (1-\lambda) \, m(x',h) \]
and
\[ \lambda \, m(x,h) + (1-\lambda) \, m(x,h') \subseteq m\bigl( x, \, \lambda h + (1-\lambda) h' \bigr). \]
The second part of the lemma follows by taking upper-right corners, which is a linear and non-decreasing operation (for the respective partial orders $\subseteq$ and $\preccurlyeq$).

As for the Lipschitz property of $\widetilde{r}_H$, it follows from a rewriting of $m\bigl( x, \mathbf{H}(y) \bigr)$ as
\[ m\bigl( x, \mathbf{H}(y) \bigr) = \sum_{b \in \mathcal{B}} \phi_b\bigl( \mathbf{H}(y) \bigr) \, r\bigl( x, \mathbf{H}^{-1}(b) \bigr), \]
where $\mathcal{B}$ is a finite subset of $\mathcal{F}$, the $\phi_b$ are Lipschitz functions $\mathcal{F} \to [0,1]$, and $\mathbf{H}^{-1}$ is the pre-image function of $\mathbf{H}$, which takes values in the set of compact subsets of $\Delta(J)$. This rewriting was proved in Mannor et al. (2011, Lemma 6.1 and Remark 6.1). We equip the set of compact subsets of the Euclidean ball with center $(0, \ldots, 0)$ and radius $M$, in which $m$ takes its values, with the Hausdorff distance. For this distance, $x \mapsto r\bigl( x, \mathbf{H}^{-1}(b) \bigr)$ is $M$-Lipschitz for each $b \in \mathcal{B}$. All in all, given the boundedness of the $\phi_b$ and of $r$, the mapping $(x,y) \mapsto m\bigl( x, \mathbf{H}(y) \bigr)$ is also Lipschitz continuous. Since taking the upper-right corner is a Lipschitz mapping as well (for the Hausdorff distance), we get, by composition, the desired Lipschitz property for $\widetilde{r}_H$.

We are now ready to prove Proposition 3.
(Note that it needs a proof and that it is not implied by the various results discussed in Section 5. Indeed, $(\widetilde{r}_H, H)$ is an auxiliary game which, by construction, has the upper-right-corner property, but $\widetilde{r}_H$ is not linear, while linearity of the payoff function was a crucial feature of the setting studied therein.)

Proof of Proposition 3. We start with the direct implication and consider some containing half-space $C_{\mathrm{hs}}$. The latter is parameterized by $\alpha \in \mathbb{R}^d$ and $\beta \in \mathbb{R}$, and equals
\[ C_{\mathrm{hs}} = \bigl\{ \omega \in \mathbb{R}^d : \ \langle \alpha, \omega \rangle \leq \beta \bigr\}. \]
Since $C_{\mathrm{hs}}$ contains the orthant $C_{\mathrm{orth}}(a)$, there are sequences $(\omega_n)$ in $C_{\mathrm{hs}}$ with components tending to $-\infty$. Therefore, we necessarily have that $\alpha \succcurlyeq 0$. The convexity/concavity of $\widetilde{r}_H$ in the sense of $\preccurlyeq$ thus entails that the function
\[ G_{\alpha,\beta} : (x,y) \ \longmapsto \ \bigl\langle \alpha, \, \widetilde{r}_H(x,y) \bigr\rangle - \beta \]
is also convex/concave. The continuity of $G_{\alpha,\beta}$ follows from the one of $\widetilde{r}_H$. The Sion–Fan lemma applies and guarantees that
\[ \min_{x \in \Delta(I)} \max_{y \in \Delta(J)} G_{\alpha,\beta}(x,y) = \max_{y \in \Delta(J)} \min_{x \in \Delta(I)} G_{\alpha,\beta}(x,y) \]
(the suprema and infima are all attained and are denoted by maxima and minima). Now, by assumption, for all $y \in \Delta(J)$, there exists $x \in \Delta(I)$ such that $\widetilde{r}_H(x,y) \in C_{\mathrm{orth}}(a) \subseteq C_{\mathrm{hs}}$. This means that the above max-min of $G_{\alpha,\beta}$ is non-positive. Putting all things together, we have proved that
\[ \min_{x \in \Delta(I)} \max_{y \in \Delta(J)} G_{\alpha,\beta}(x,y) \leq 0. \]
That is, there exists $x_0 \in \Delta(I)$, e.g., the element attaining the above minimum, such that for all $y \in \Delta(J)$, one has $G_{\alpha,\beta}(x_0, y) \leq 0$, or, equivalently, $\widetilde{r}_H(x_0, y) \in C_{\mathrm{hs}}$. This property is exactly the stated one-shot $\widetilde{r}_H$-approachability of $C_{\mathrm{hs}}$.

Conversely, assume that there exists some $y_0 \in \Delta(J)$ such that, for all $x \in \Delta(I)$, one has $\widetilde{r}_H(x, y_0) \notin C_{\mathrm{orth}}(a)$.
By continuity of $\widetilde{r}_H$ and closedness of $C_{\mathrm{orth}}(a)$, there exists some $\delta > 0$ such that $d_{C_{\mathrm{orth}}(a)}\bigl( \widetilde{r}_H(x, y_0) \bigr) \geq \delta$ for all $x \in \Delta(I)$. Now, as indicated around (7), the distance to $C_{\mathrm{orth}}(a)$ is non-decreasing for $\preccurlyeq$. In view of the convexity of $\widetilde{r}_H$ in its first argument, this shows that we also have $d_{C_{\mathrm{orth}}(a)}(z) \geq \delta$ for all elements $z$ of the convex hull $C_{\widetilde{r}_H, y_0}$ of the set $\bigl\{ \widetilde{r}_H(x, y_0) : x \in \Delta(I) \bigr\}$. That is, the convex sets $C_{\widetilde{r}_H, y_0}$, which is compact, and $C_{\mathrm{orth}}(a)$, which is closed, are disjoint and thus, by the Hahn–Banach theorem, are strictly separated by some hyperplane. One of the two half-spaces thus defined, namely, the one not containing $C_{\widetilde{r}_H, y_0}$, is not one-shot $\widetilde{r}_H$-approachable.

A new approachability strategy of an orthant under partial monitoring

Theorem 2 suggests an approachability strategy based on surrogate payoffs and not only on the information gained, i.e., based on the mapping $\widetilde{r}_H$ and not only on the signaling structure $H$ (and the estimated flags). The first approach was already considered by Kohlberg (1975) while other works, like Perchet (2011a) and Mannor et al. (2011), resorted to the second one. The considered strategy is an adaptation of Blackwell's strategy (which was recalled after the statement of Theorem 1): such an adaptation is possible as the latter strategy only relies on the one-shot approachability of half-spaces, which is satisfied here with the surrogate payoffs $\widetilde{r}_H$.

Description and convergence analysis of the strategy. As in the proof of Proposition 2, we assume initially that flags $h_t = \mathbf{H}(y_t)$ are observed at the end of each round $t$ and that mixed payoffs are to be controlled. The player then knows his mixed payoffs $\widetilde{r}_{H,t} := \widetilde{r}_H(x_t, y_t) = R\bigl( x_t, h_t \bigr)$ and aims at controlling his average payoffs, which we recall are denoted by $\overline{\widetilde{r}}_{H,n}$.
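The geometric quantities that this projection-based strategy manipulates are easy to compute for an orthant. The Python sketch below is a minimal illustration with hypothetical function names: the orthogonal projection onto $C_{\mathrm{orth}}(a)$ is the component-wise minimum with $a$, the distance is the one given in (7), and the half-space targeted at the next round is described by the normal vector (average minus its projection) and the corresponding offset. Floating-point arithmetic is used, so comparisons are subject to rounding.

```python
import math


def project_on_orthant(omega, a):
    """Orthogonal projection of omega onto {w : w_k <= a_k for all k}."""
    return [min(w, ak) for w, ak in zip(omega, a)]


def dist_to_orthant(omega, a):
    """Euclidean distance to the orthant, as in formula (7)."""
    return math.sqrt(sum(max(w - ak, 0.0) ** 2 for w, ak in zip(omega, a)))


def blackwell_halfspace(average, a):
    """Half-space {w : <w, normal> <= offset} containing the orthant and
    tangent to it at the projection of `average`.

    Returns None when `average` already lies in the orthant.
    """
    proj = project_on_orthant(average, a)
    normal = [r - p for r, p in zip(average, proj)]
    if all(c == 0 for c in normal):
        return None
    offset = sum(p * c for p, c in zip(proj, normal))
    return normal, offset


print(blackwell_halfspace([3.0, -1.0], [0.0, 0.0]))
print(dist_to_orthant([3.0, -1.0], [0.0, 0.0]))
```

The normal vector has non-negative components only, which is why the distance (7) is non-decreasing for the component-wise order.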
Similarly to what was done in the proof of Proposition 2, the one-shot $\widetilde{r}_H$-approachability of the containing half-spaces of $C_{\mathrm{orth}}(a)$ entails that for each round $n$, there exists $x_{n+1} \in \Delta(I)$ such that
\[ \forall y \in \Delta(J), \qquad \Bigl\langle \widetilde{r}_H(x_{n+1}, y) - \pi_{C_{\mathrm{orth}}(a)}\bigl( \overline{\widetilde{r}}_{H,n} \bigr), \ \overline{\widetilde{r}}_{H,n} - \pi_{C_{\mathrm{orth}}(a)}\bigl( \overline{\widetilde{r}}_{H,n} \bigr) \Bigr\rangle \leq 0. \]
The sequence $\bigl( \widetilde{r}_{H,n} \bigr)$ thus satisfies Blackwell's condition and as a result we get
\[ d_{C_{\mathrm{orth}}(a)}\!\left( \frac{1}{n} \sum_{t=1}^n \widetilde{r}_H(x_t, y_t) \right) \leq \frac{2M\sqrt{d}}{\sqrt{n}}. \]
(See again the proof of Proposition 2 for this derivation and keep in mind that in the present setting where the upper-right-corner property is not satisfied, $R$ is only bounded in $\ell^2$-norm by $M\sqrt{d}$.) Since
\[ r(x_t, y_t) \in m\bigl( x_t, \mathbf{H}(y_t) \bigr) \subseteq C_{\mathrm{orth}}\bigl( \widetilde{r}_H(x_t, y_t) \bigr), \]
we get $r(x_t, y_t) \preccurlyeq \widetilde{r}_H(x_t, y_t)$ and, in view again of (7),
\[ d_{C_{\mathrm{orth}}(a)}\!\left( \frac{1}{n} \sum_{t=1}^n r(x_t, y_t) \right) \leq d_{C_{\mathrm{orth}}(a)}\!\left( \frac{1}{n} \sum_{t=1}^n \widetilde{r}_H(x_t, y_t) \right) \leq \frac{2M\sqrt{d}}{\sqrt{n}}. \]
The same trick of playing i.i.d. in blocks as in the second part of the proof of Proposition 2, together with martingale convergence arguments, relaxes the assumptions of flags being observed and payoffs being evaluated with mixed actions, leading to the desired $(r,H)$-approachability strategy. (This is where we need the Lipschitzness properties stated in Lemma 1 and its proof.) A more careful study, which we omit here for the sake of brevity, shows that $(r,H)$-approachability takes place at a $n^{-1/5}$-rate.

What we proved in passing. We proved in a constructive way that when an orthant is $(r,H)$-approachable, it is also $\bigl( \widetilde{r}_H, \mathrm{Full} \bigr)$-approachable. Conversely, assume that the equivalent conditions in Theorem 2 are not satisfied, i.e., that the orthant at hand, $C_{\mathrm{orth}}(a)$, is not $(r,H)$-approachable.
Then, (the proof of) Proposition 3 indicates that there exists some $y_0 \in \Delta(J)$ such that the set $C_{\mathrm{orth}}(a)$ and the convex hull of $\bigl\{ \widetilde{r}_H(x, y_0), \ x \in \Delta(I) \bigr\}$ are strictly separated. This implies in particular that $C_{\mathrm{orth}}(a)$ is $\bigl( \widetilde{r}_H, \mathrm{Full} \bigr)$-excludable, and thus is not $\bigl( \widetilde{r}_H, \mathrm{Full} \bigr)$-approachable. Putting all things together, we have proved the following equivalence:
\[ C_{\mathrm{orth}}(a) \text{ is } (r,H)\text{-approachable} \iff C_{\mathrm{orth}}(a) \text{ is } \bigl( \widetilde{r}_H, \mathrm{Full} \bigr)\text{-approachable} \iff C_{\mathrm{orth}}(a) \text{ is not } \bigl( \widetilde{r}_H, \mathrm{Full} \bigr)\text{-excludable}. \]
Note that the $\bigl( \widetilde{r}_H, \mathrm{Full} \bigr)$-approachability is a form of non-linear approachability, by which we mean that the function $\widetilde{r}_H$ is not linear and yet, approachability is possible. This result could be generalized (but we omit the description of the extension for the sake of concision).

On the computational complexity of the above-described strategy. The strategy we have exhibited reduces to solving, at each stage, a program of the form
\[ \min_{x_{n+1} \in \Delta(I)} \ \max_{y \in \Delta(J)} \ \bigl\langle \widetilde{r}_H(x_{n+1}, y) - \beta, \ \alpha \bigr\rangle \]
for some vectors $\alpha, \beta \in \mathbb{R}^d$. At first sight, it cannot be written as a finite linear program as $\widetilde{r}_H$ is not a linear function of its arguments. However, as proved in Mannor et al. (2011, Section 7.1), the function $\widetilde{r}_H$ is actually piecewise linear; that is, there exist some finite liftings of $\Delta(I)$ and $\Delta(J)$ with respect to which $\widetilde{r}_H$ is linear. (These liftings only need to be computed once, before the game starts.) Moreover, the per-step computational complexity of our strategy is constant (in fact, it is polynomial in the sizes of these liftings; see Mannor et al., 2011 for more details).

8. Primal approachability of polytopes

Recall that a convex set $C_{\mathrm{polyt}}$ is a polytope if it is the intersection of a finite number of half-spaces $\bigl\{ \omega \in \mathbb{R}^d : \langle \omega, a_\ell \rangle \leq b_\ell \bigr\}$, for $a_\ell \in \mathbb{R}^d$, $b_\ell \in \mathbb{R}$, and $\ell$ ranging in some finite set $\mathcal{L}$.
That is,
\[ C_{\mathrm{polyt}} = \bigcap_{\ell \in \mathcal{L}} \bigl\{ \omega \in \mathbb{R}^d : \ \langle \omega, a_\ell \rangle \leq b_\ell \bigr\} = \Bigl\{ \omega \in \mathbb{R}^d : \ \max_{\ell \in \mathcal{L}} \bigl( \langle \omega, a_\ell \rangle - b_\ell \bigr) \leq 0 \Bigr\}. \tag{10} \]
The following lemma (which is a mere exercise of rewriting) states that an approachability problem of a polytope can be transformed into an approachability problem of some negative orthant. We denote by $(0)_{\mathcal{L}} = (0, \ldots, 0)$ the null vector of $\mathbb{R}^{\mathcal{L}}$. The negative orthant of $\mathbb{R}^{\mathcal{L}}$ is then denoted by $C_{\mathrm{orth}}\bigl( (0)_{\mathcal{L}} \bigr)$.

Lemma 2. The convex polytope $C_{\mathrm{polyt}}$ defined in (10) is $(r,H)$-approachable if and only if the negative orthant $C_{\mathrm{orth}}\bigl( (0)_{\mathcal{L}} \bigr)$ is $(s,H)$-approachable, where the vector-valued payoff function $s : \Delta(I) \times \Delta(J) \to \mathbb{R}^{\mathcal{L}}$ is defined as
\[ s(x,y) = \Bigl[ \bigl\langle r(x,y), a_\ell \bigr\rangle - b_\ell \Bigr]_{\ell \in \mathcal{L}}. \tag{11} \]

Proof. The result follows from the equivalence (see, e.g., property 3 in Appendix A.1 of Perchet, 2011b) of the distances to $C_{\mathrm{polyt}}$ given by $d_{C_{\mathrm{polyt}}}$ and $d_{C_{\mathrm{orth}}((0)_{\mathcal{L}})}\bigl( T(\,\cdot\,) \bigr)$, where $T : \mathbb{R}^d \to \mathbb{R}^{\mathcal{L}}$ is the linear transformation $\omega \mapsto \bigl[ \langle \omega, a_\ell \rangle - b_\ell \bigr]_{\ell \in \mathcal{L}}$.

Theorem 2 can then be rewritten, using Lemma 2 above, to provide the desired primal characterization of polytopes.

Corollary 2. Consider the convex polytope $C_{\mathrm{polyt}}$ given by (10), together with the payoff function $s$ defined in (11). Then,
\[ C_{\mathrm{polyt}} \text{ is } (r,H)\text{-approachable} \iff \text{every containing half-space of } C_{\mathrm{orth}}\bigl( (0)_{\mathcal{L}} \bigr) \text{ is one-shot } \widetilde{s}_H\text{-approachable}. \tag{12} \]

When $C_{\mathrm{polyt}}$ is indeed $(r,H)$-approachable, the results of the previous section provide an approachability strategy of $C_{\mathrm{orth}}\bigl( (0)_{\mathcal{L}} \bigr)$ based on the transformed payoffs $\widetilde{s}_H$. This strategy also approaches $C_{\mathrm{polyt}}$ in view of Lemma 2; however, it might not be representable in the original space $\mathbb{R}^d$, as demonstrated in the following (counter-)example.

Example 1. Consider on the one hand the polytope $C_{\mathrm{polyt}} = \bigl\{ \omega \in \mathbb{R} : \omega \in [-1, 1] \bigr\}$ and the associated linear transformation $T$ defined by $T(\omega) = (\omega - 1, \, -\omega - 1) \in \mathbb{R}^2$ for all $\omega \in \mathbb{R}$.
Consider on the other hand the following game. The sets of pure actions are $I = \{T, B\}$ and $J = \{L, R\}$, the signaling structure is $H = \mathrm{Dark}$ (with single signal denoted by $\emptyset$), and the payoff function $r$ is given by the matrix
\[ \begin{array}{c|cc} & L & R \\ \hline T & -1 & 2 \\ B & -2 & 1 \end{array} \]
We identify $\Delta(I)$ and $\Delta(J)$ with $[0,1]$.

We first discuss the dual condition (5) for $(r, \mathrm{Dark})$-approachability. For all $x \in [0,1]$, we have $m(x, \emptyset) = [-2 + x, \, 1 + x]$. Thus, no mixed action $x$ of the player is such that $m(x, \emptyset)$ is included in $C_{\mathrm{polyt}}$, which is therefore not $(r, \mathrm{Dark})$-approachable.

We now turn to the primal condition as stated by Corollary 2. We denote by $T = (T_1, T_2)$ the components of the linear transformation $T$. From the linearity of $T$, we deduce from the above-stated expression of $m$ (based on $r$) that the sets of compatible payoffs in terms of $s = T(r)$ are of the form $T\bigl( m(x, \emptyset) \bigr)$. Taking the maxima, we thus get, for all mixed actions $x \in [0,1]$ (and all $y \in [0,1]$ as the game is played in the dark),
\[ \widetilde{s}_{\mathrm{Dark}}(x,y) = \Bigl( \max T_1\bigl( [-2 + x, \, 1 + x] \bigr), \ \max T_2\bigl( [-2 + x, \, 1 + x] \bigr) \Bigr) = (x, \, 1 - x). \]
Again, the necessary and sufficient condition for $(r,H)$-approachability of $C_{\mathrm{polyt}}$ fails, as no containing half-space of the negative orthant but two of them is one-shot $\widetilde{s}_{\mathrm{Dark}}$-approachable. More precisely, these half-spaces are parameterized by $(p, 1-p)$ where $p \in [0,1]$ and correspond to the points $\bigl\{ (t_1, t_2) \in \mathbb{R}^2 : \ p \, t_1 + (1-p) \, t_2 \leq 0 \bigr\}$. Except for the case when $p = 0$ or $p = 1$, these half-spaces are strictly separated from the convex set $\bigl\{ (x, 1-x) : x \in [0,1] \bigr\}$.

The question now is whether we could have determined this by checking some primal condition in the original space $\mathbb{R}$. First, consider some containing half-space of $C_{\mathrm{polyt}}$, typically, either $(-\infty, 1]$ or $[-1, +\infty)$. Their transformations by $T$ into subsets of $\mathbb{R}^2$ are included respectively in $(-\infty, 0] \times \mathbb{R}$ or $\mathbb{R} \times (-\infty, 0]$.
These are precisely the only two half-spaces that were one-shot $\widetilde{s}_{\mathrm{Dark}}$-approachable (by resorting to one of the pure actions). Now, and more importantly, consider the containing half-space of the negative orthant in $\mathbb{R}^2$ parameterized by $p = 1/2$, that is,

$$\mathcal{C}_{\mathrm{hs}, 1/2} = \bigl\{ (t_1, t_2) \in \mathbb{R}^2 : t_1 + t_2 \le 0 \bigr\}.$$

As indicated above, it is not one-shot $\widetilde{s}_{\mathrm{Dark}}$-approachable. However, this half-space contains the whole original space, in the sense that $T(\mathbb{R}) \subset \mathcal{C}_{\mathrm{hs}, 1/2}$, as follows from a simple computation: for all $\omega \in \mathbb{R}$, we have $T_1(\omega) + T_2(\omega) = (\omega - 1) + (-\omega - 1) = -2 \le 0$. Therefore, there is no hope to prove, based even on general subsets of the original game with payoffs in $\mathbb{R}$, that the necessary and sufficient condition on the half-space $\mathcal{C}_{\mathrm{hs}, 1/2}$ in the transformed space $\mathbb{R}^2$ fails.

The fundamental reason why the primal characterization in the transformed space cannot be checked based on considerations in the original space is the following. In the absence of an upper-right-corner property, the range of $\widetilde{s}_{\mathrm{Dark}}$ is outside the range of $s$, but we can only access the latter based on the original space. The moral of this example is that we have to consider some hidden containing half-spaces of the polytope $\mathcal{C}_{\mathrm{polyt}}$ in order to establish some primal characterization: this is precisely what Condition (12) does.

9. Primal approachability of general convex sets

We consider in this section the primal approachability of general closed convex sets $\mathcal{C}$. In the case of polytopes, Lemma 2 was essentially indicating that only finitely many directions in $\mathbb{R}^d$ (the ones given by the $a_\ell$) need be considered. In the case of general convex sets, all directions are to be studied. We do so by resorting to support functions, which we define based on the unit Euclidean sphere $S = \bigl\{ \omega \in \mathbb{R}^d : \| \omega \| = 1 \bigr\}$. More formally, the support function $\phi_{\mathcal{C}} : S \to \mathbb{R} \cup \{ +\infty \}$ of a set $\mathcal{C} \subseteq \mathbb{R}^d$ is defined by

$$\forall s \in S, \qquad \phi_{\mathcal{C}}(s) = \sup \bigl\{ \langle c, s \rangle : c \in \mathcal{C} \bigr\}.$$
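As a small numerical illustration of the definition of the support function, here is a minimal sketch under assumptions that are not taken from the paper: the dimension is $d = 2$, the set is given by finitely many points (whose support function coincides with that of their convex hull), and the sphere $S$ is replaced by a uniform grid of directions on the unit circle. The helper names `support_function` and `unit_circle` are hypothetical.

```python
import numpy as np

def support_function(points, directions):
    """Evaluate phi_C(s) = max over c in C of <c, s> for a finite set C.

    points:     array of shape (n_points, d), the elements c of C
    directions: array of shape (n_dirs, d), unit vectors s of the sphere
    Returns an array of shape (n_dirs,) with one value per direction.
    """
    points = np.atleast_2d(points)
    directions = np.atleast_2d(directions)
    # Entry (i, j) of the product is <s_i, c_j>; the max over j is the supremum.
    return (directions @ points.T).max(axis=1)

def unit_circle(n):
    """Uniform grid of n unit vectors on the circle (illustrative stand-in for S)."""
    angles = np.linspace(0.0, 2.0 * np.pi, n, endpoint=False)
    return np.column_stack([np.cos(angles), np.sin(angles)])

# Assumed example set: the vertices of the square [-1, 1]^2.
square = np.array([[1.0, 1.0], [1.0, -1.0], [-1.0, 1.0], [-1.0, -1.0]])
S = unit_circle(8)
phi = support_function(square, S)
```

For this square, the support function has the closed form $\phi(s) = |s_1| + |s_2|$, which gives a simple way to check the sketch against the definition.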
We now construct a lifted setting in which one-shot approaching the containing half-spaces for some payoff function will be equivalent to $(r, H)$-approaching the original closed convex set $\mathcal{C}$. This setting is given by some set of integrable functions on $S$. We equip the latter with the (induced) Lebesgue measure, for which $S$ has a finite measure. That is, we consider the set $L^2(S)$ of Lebesgue square integrable functions $S \to \mathbb{R}$, equipped with the inner product

$$(f, g) \in L^2(S) \times L^2(S) \ \longmapsto \ \int_S f(s) \, g(s) \, \mathrm{d}s.$$

The orthant in $L^2(S)$ corresponding to $\mathcal{C} \subseteq \mathbb{R}^d$ is

$$\mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}}) = \bigl\{ f \in L^2(S) : f \le \phi_{\mathcal{C}} \bigr\}.$$

The description of the lifted setting is concluded by stating the considered payoff function $\Phi : \Delta(I) \times \Delta(J) \to L^2(S)$. It indicates, as in the previous sections, how to transform payoffs given the signaling structure $H$. Formally,

$$\forall x \in \Delta(I), \ \forall y \in \Delta(J), \qquad \Phi(x, y) = \phi_{m(x, H(y))}.$$

The square integrability of $\Phi(x, y)$ follows from its boundedness, which itself stems from the boundedness of $m\bigl(x, H(y)\bigr)$. (See Lemma 3 in the appendix, property 1, for a reminder of this well-known result and others on support functions.)

We are now ready to state the primal characterization of approachability with partial monitoring in the general case.

Theorem 3. For all games $(r, H)$, for all closed convex sets $\mathcal{C} \subset \mathbb{R}^d$,

$$\mathcal{C} \text{ is } (r, H)\text{-approachable} \iff \text{every half-space } \mathcal{C}_{\mathrm{hs}} \supset \mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}}) \text{ is one-shot } \Phi\text{-approachable}.$$

Proof. We first note that we can assume with no loss of generality that $\mathcal{C}$ is bounded, thus compact. Indeed, $\mathcal{C}$ is $(r, H)$-approachable if and only if its intersection $\mathcal{C} \cap r\bigl(\Delta(I \times J)\bigr)$ with the bounded convex set of feasible payoffs is approachable. This entails that $\phi_{\mathcal{C}} \in L^2(S)$. Now, the proof follows along the lines of the proof of Theorem 2.
In particular, we exploit the dual characterization (5), which indicates that for all $y \in \Delta(J)$, there exists $x \in \Delta(I)$ such that $m\bigl(x, H(y)\bigr) \subseteq \mathcal{C}$. It can be restated equivalently (see Lemma 3 in the appendix, property 3) as stating that for all $y \in \Delta(J)$, there exists $x \in \Delta(I)$ such that $\Phi(x, y) \le \phi_{\mathcal{C}}$. We thus only need to show that the stated primal characterization is equivalent to the latter condition.

We start with the direct implication (from the dual condition to the primal condition). As recalled in the proof of Lemma 1, the function $m$ is concave/convex, which, together with properties 3 and 4 of Lemma 3, shows that $\Phi$ is also convex/concave. Moreover, as proved at the end of the proof of Lemma 1, the function $(x, y) \mapsto m\bigl(x, H(y)\bigr)$ is a Lipschitz function, with Lipschitz constant denoted by $L_m$, when the set of compact subsets of the Euclidean ball of $\mathbb{R}^d$ with center $(0, \ldots, 0)$ and radius $M$ is equipped with the Hausdorff distance. This entails that $\Phi$ is also a Lipschitz function, with constant $L_m V$, where $V$ is the volume of $S$ for the induced Lebesgue measure. This is because a Hausdorff distance $\delta$ between two sets $D_1$ and $D_2$ translates to a Euclidean distance of at most $V \delta$ between $\phi_{D_1}$ and $\phi_{D_2}$. Indeed, we have, by definition of the Hausdorff distance,

$$D_1 \subseteq D_2 + B_\delta \qquad \text{and} \qquad D_2 \subseteq D_1 + B_\delta,$$

where $B_\delta$ is the Euclidean ball of $\mathbb{R}^d$ with center $(0, \ldots, 0)$ and radius $\delta$. Properties 4 and 1 of Lemma 3 respectively yield the inequalities

$$\bigl| \phi_{D_1} - \phi_{D_2} \bigr| = \max \bigl\{ \phi_{D_1} - \phi_{D_2}, \ \phi_{D_2} - \phi_{D_1} \bigr\} \le \phi_{B_\delta} \le \delta, \qquad \text{with, by integration,} \qquad \bigl\| \phi_{D_1} - \phi_{D_2} \bigr\| \le V \delta.$$

The above-stated properties of $\Phi$ imply that for all $\psi \in L^2(S)$ with $\psi \ge 0$, the game $(x, y) \mapsto \langle \psi, \Phi(x, y) \rangle$ has a value $v(\psi)$, and that this value is achieved: there exists some $x_\psi \in \Delta(I)$ such that

$$\max_{y \in \Delta(J)} \bigl\langle \psi, \Phi(x_\psi, y) \bigr\rangle = v(\psi) = \max_{y \in \Delta(J)} \ \min_{x \in \Delta(I)} \bigl\langle \psi, \Phi(x, y) \bigr\rangle.$$
Now, consider some half-space $\mathcal{C}_{\mathrm{hs}}$ containing $\mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})$. It is of the form

$$\mathcal{C}_{\mathrm{hs}} = \bigl\{ f \in L^2(S) : \langle \psi, f \rangle \le \beta \bigr\},$$

where necessarily, as can be shown by contradiction, $\psi \ge 0$. The dual condition is satisfied by assumption, that is, for all $y \in \Delta(J)$, there exists $x \in \Delta(I)$ such that $\Phi(x, y) \in \mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})$, and therefore, $\Phi(x, y) \in \mathcal{C}_{\mathrm{hs}}$. Thus, $v(\psi) \le \beta$, as can be seen from its expression as a max/min. The mixed action $x_\psi$ thus satisfies $\bigl\langle \psi, \Phi(x_\psi, y) \bigr\rangle \le \beta$ for all $y \in \Delta(J)$, which is exactly saying that $\Phi(x_\psi, y) \in \mathcal{C}_{\mathrm{hs}}$ for all $y \in \Delta(J)$. We have therefore proved the desired one-shot $\Phi$-approachability of $\mathcal{C}_{\mathrm{hs}}$.

Conversely, we assume that the dual condition is not satisfied, i.e., that there exists some $y_0 \in \Delta(J)$ such that for no $x \in \Delta(I)$ one has $\Phi(x, y_0) \in \mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})$. We consider the continuous, thus compact, image $\Phi\bigl(\Delta(I), y_0\bigr)$ of $\Delta(I)$ by $\Phi(\,\cdot\,, y_0)$. Its Euclidean distance to the closed set $\mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})$ is thus positive; we denote it by $\delta > 0$. Now, the distance of an element $f \in L^2(S)$ to $\mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})$ is given by

$$d_{\mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})}(f) = \int_S \bigl( f(s) - \phi_{\mathcal{C}}(s) \bigr)_+ \, \mathrm{d}s.$$

Since in addition $\Phi$ is convex in its first argument (as shown in the first part of this proof), we have that $d_{\mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})}(f) \ge \delta$ not only for all $f \in \Phi\bigl(\Delta(I), y_0\bigr)$ but also for all $f$ in the convex hull of $\Phi\bigl(\Delta(I), y_0\bigr)$. The latter set is pointwise bounded (by $M$, as follows from property 1 of Lemma 3) and is formed by equicontinuous functions (they all are $M$-Lipschitz continuous, as follows from property 2 of the same lemma). The Arzela–Ascoli theorem thus ensures that the closure of this set is compact for the supremum norm $\| \cdot \|_\infty$ over $S$. As by integration $\| \cdot \|_\infty \ge \| \cdot \| / V$, the closure of the convex hull of $\Phi\bigl(\Delta(I), y_0\bigr)$ and the set $\mathcal{C}_{\mathrm{orth}}(\phi_{\mathcal{C}})$ are still $\delta / V$-separated in $\| \cdot \|_\infty$-norm, and are thus disjoint.
Since the former set is a convex and compact set, and the latter is a closed convex set, the Hahn–Banach theorem entails that they are strictly separated by some hyperplane. In particular, one of the two thus-defined half-spaces is not one-shot $\Phi$-approachable.

The above result is a generalization of the polytopal case. In Section 8 we showed that when approaching a polytope, there are only finitely many directions (i.e., finitely many elements of the sphere $S$) of interest, namely, the directions corresponding to the defining hyperplanes. The results we obtained therein can in fact be obtained as a corollary of Theorem 3 when the latter is stated (and proved) with a different measure instead of the Lebesgue measure, given by the sum of the Dirac masses on the directions of the defining hyperplanes.

There are two ways to extend the primal characterization of approachability under partial monitoring from polytopes to general convex sets. The one we worked out above relies on the observation that with general convex sets, every direction might be relevant, as a general convex set is defined as the intersection of infinitely many half-spaces, one per element of $S$. Based on this, we introduced for general convex sets an infinite-dimensional lifting into the space of real-valued mappings on the whole set $S$. We also resorted to the uniform Lebesgue measure since all directions are equally important.

The other way of generalizing the results relies on the fact that a closed convex set $\mathcal{C} \subseteq \mathbb{R}^d$ is approachable if and only if all containing polytopes are approachable. By playing in blocks and approximating a given general convex set by a sequence of containing polytopes, one could have shown that $\mathcal{C}$ is $(r, H)$-approachable if and only if all containing polytopes satisfy the characterization of Corollary 2.
However, while this alternative way leads to a characterization, it is less intrinsic as there is no fixed lifted space to be considered. (The finite-dimensional lifted spaces depend on the approximating polytopes.) For the sake of elegance, we thus used the infinite-dimensional lifting described above.

Acknowledgements

Shie Mannor was partially supported by the Israel Science Foundation under grant no. 920/12. Vianney Perchet acknowledges support by "Agence Nationale de la Recherche," under grant JEUDY (ANR-10-BLAN 0112).

References

R.J. Aumann and M.B. Maschler. Repeated Games with Incomplete Information. MIT Press, 1995.

D. Blackwell. An analog of the minimax theorem for vector payoffs. Pacific Journal of Mathematics, 6:1–8, 1956.

E. Kohlberg. Optimal strategies in repeated games with incomplete information. International Journal of Game Theory, 4:7–24, 1975.

G. Lugosi, S. Mannor, and G. Stoltz. Strategies for prediction under imperfect monitoring. Mathematics of Operations Research, 33:513–528, 2008.

S. Mannor, V. Perchet, and G. Stoltz. Robust approachability and regret minimization in games with partial monitoring. An extended abstract was published in the Proceedings of COLT'11, 2011.

J.-F. Mertens, S. Sorin, and S. Zamir. Repeated games. Technical Report no. 9420, 9421, 9422, Université de Louvain-la-Neuve, 1994.

V. Perchet. Approachability of convex sets in games with partial monitoring. Journal of Optimization Theory and Applications, 149:665–677, 2011a.

V. Perchet. Internal regret with partial monitoring: Calibration-based optimal algorithms. Journal of Machine Learning Research, 12:1893–1921, 2011b.

V. Perchet and M. Quincampoix. On an unified framework for approachability in games with or without signals. 2011.

S. Sorin. A First Course on Zero-Sum Repeated Games. Number 37 in Mathématiques & Applications. Springer, 2002.
Appendix: A brief survey of some well-known properties of support functions

For the sake of completeness only, we summarize in the lemma below some simple and well-known properties of support functions.

Lemma 3. We consider a set $\mathcal{C} \subseteq \mathbb{R}^d$.

1. If $\mathcal{C}$ is bounded in Euclidean norm by $C$, then $\phi_{\mathcal{C}}$ is bounded in supremum norm by $C$ and in Euclidean norm by $V C$, where $V$ is the volume of $S$ under the (induced) Lebesgue measure.
2. If $\mathcal{C}$ is bounded in Euclidean norm by $C$, then $\phi_{\mathcal{C}}$ is a Lipschitz function, with Lipschitz constant $C$.
3. For all $\mathcal{C}' \subseteq \mathbb{R}^d$, if $\mathcal{C} \subseteq \mathcal{C}'$, then $\phi_{\mathcal{C}} \le \phi_{\mathcal{C}'}$. The converse implication holds if in addition $\mathcal{C}'$ is a closed convex set.
4. The function $\phi$ is linear, in the sense that for all $\gamma \ge 0$ and all $\mathcal{C}' \subseteq \mathbb{R}^d$, one has $\phi_{\gamma \mathcal{C} + \mathcal{C}'} = \gamma \, \phi_{\mathcal{C}} + \phi_{\mathcal{C}'}$.

Proof. Property 1 follows from the Cauchy–Schwarz inequality: for all $s \in S$,

$$\bigl| \phi_{\mathcal{C}}(s) \bigr| \le \sup_{c \in \mathcal{C}} \bigl| \langle c, s \rangle \bigr| \le \sup_{c \in \mathcal{C}} \| c \| \, \| s \| = \sup_{c \in \mathcal{C}} \| c \|,$$

as the elements $s \in S$ have unit norm. The bound in Euclidean norm follows by integration over $S$.

For property 2, we note that $s \in S \mapsto \langle c, s \rangle$ is a $\| c \|$-Lipschitz function (again, by the Cauchy–Schwarz inequality). Therefore $\phi_{\mathcal{C}}$ is the supremum of $C$-Lipschitz functions and as such is also a $C$-Lipschitz function.

The first part of property 3 holds by the definition of a supremum. To prove the converse implication, we use an argument by contradiction. We consider two sets $\mathcal{C}$ and $\mathcal{C}'$, where $\mathcal{C}'$ is closed and convex. We assume that $\mathcal{C}$ is not included in $\mathcal{C}'$ and show the existence of an $s \in S$ such that $\phi_{\mathcal{C}}(s) > \phi_{\mathcal{C}'}(s)$. The set $\mathcal{C} \setminus \mathcal{C}'$ is not empty; let $x$ be one of its elements. The convex sets $\{ x \}$, which is compact, and $\mathcal{C}'$, which is closed, are disjoint sets.
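The Hahn–Banach theorem ensures the existence of a strictly separating hyperplane between these convex sets, which we can write in the form $\bigl\{ \omega \in \mathbb{R}^d : \langle \omega, s \rangle = \beta \bigr\}$ for some $s \in S$ and $\beta \in \mathbb{R}$ such that

$$\phi_{\{x\}}(s) = \langle x, s \rangle > \beta \qquad \text{and} \qquad \forall c' \in \mathcal{C}', \quad \langle c', s \rangle < \beta.$$

This entails that $\phi_{\mathcal{C}'}(s) \le \beta < \phi_{\{x\}}(s) \le \phi_{\mathcal{C}}(s)$.

Finally, the last property is true because by definition

$$\gamma \mathcal{C} + \mathcal{C}' = \bigl\{ \gamma c + c' : c \in \mathcal{C}, \ c' \in \mathcal{C}' \bigr\}$$

and thus, for all $s \in S$,

$$\sup_{c'' \in \gamma \mathcal{C} + \mathcal{C}'} \langle c'', s \rangle = \sup_{c \in \mathcal{C}, \, c' \in \mathcal{C}'} \gamma \langle c, s \rangle + \langle c', s \rangle = \gamma \sup_{c \in \mathcal{C}} \langle c, s \rangle + \sup_{c' \in \mathcal{C}'} \langle c', s \rangle,$$

where we used the fact that $\gamma \ge 0$ in the last equality.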
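The four properties of Lemma 3 lend themselves to a quick numerical sanity check. The sketch below is only an illustration under assumptions of our own: $d = 2$, randomly drawn finite point sets in place of general sets, a grid of 360 directions in place of $S$, and hypothetical helper names (`support_function`, `minkowski_combination`). It checks the supremum-norm bound of property 1, the Lipschitz bound of property 2 on neighboring grid directions, the monotonicity part of property 3, and the additivity of property 4 for one value $\gamma \ge 0$; the converse part of property 3 is not tested.

```python
import numpy as np

def support_function(points, directions):
    """phi_C(s) = max over c in C of <c, s>, for a finite set C of points."""
    return (np.atleast_2d(directions) @ np.atleast_2d(points).T).max(axis=1)

def minkowski_combination(gamma, C, C_prime):
    """All points gamma * c + c' with c in C and c' in C' (finite sets)."""
    sums = gamma * C[:, None, :] + C_prime[None, :, :]
    return sums.reshape(-1, C.shape[1])

rng = np.random.default_rng(0)

# Grid of unit directions on the circle (stand-in for the sphere S, d = 2).
angles = np.linspace(0.0, 2.0 * np.pi, 360, endpoint=False)
S = np.column_stack([np.cos(angles), np.sin(angles)])

# Assumed test sets: C is finite, and C_prime is a superset of C.
C = rng.uniform(-1.0, 1.0, size=(20, 2))
C_prime = np.vstack([C, rng.uniform(-2.0, 2.0, size=(10, 2))])
gamma = 0.7
bound = np.linalg.norm(C, axis=1).max()  # C is bounded in Euclidean norm by this

phi_C = support_function(C, S)
phi_C_prime = support_function(C_prime, S)
tol = 1e-12

# Property 1: |phi_C| is bounded by the bound on C (supremum norm).
prop1 = np.all(np.abs(phi_C) <= bound + tol)
# Property 2: phi_C is Lipschitz with constant `bound` (checked on grid neighbors).
step = np.linalg.norm(np.diff(S, axis=0), axis=1)
prop2 = np.all(np.abs(np.diff(phi_C)) <= bound * step + tol)
# Property 3 (direct part): C included in C' implies phi_C <= phi_C'.
prop3 = np.all(phi_C <= phi_C_prime + tol)
# Property 4: phi of gamma*C + C' equals gamma*phi_C + phi_C'.
phi_sum = support_function(minkowski_combination(gamma, C, C_prime), S)
prop4 = np.allclose(phi_sum, gamma * phi_C + phi_C_prime)
```

Since each property holds for arbitrary sets, the four flags do not depend on the random seed; the check only guards against misreading the statements.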
