Scenario approach for minmax optimization with emphasis on the nonconvex case: positive results and caveats


Authors: Mishal Assif P K, Debasish Chatterjee, Ravi Banavar

MISHAL ASSIF P. K.
Department of Mechanical Engineering, IIT Bombay, Powai, Mumbai 400076, India
https://mishalassif.github.io

DEBASISH CHATTERJEE AND RAVI BANAVAR
Systems & Control Engineering, IIT Bombay, Powai, Mumbai 400076, India
http://www.sc.iitb.ac.in/~chatterjee
http://www.sc.iitb.ac.in/~banavar

Abstract. We treat the so-called scenario approach, a popular probabilistic approximation method for robust minmax optimization problems via independent and identically distributed (i.i.d.) sampling from the uncertainty set, from various perspectives. The scenario approach is well studied in the important case of convex robust optimization problems, and here we examine how the phenomenon of concentration of measure affects the i.i.d. sampling aspect of the scenario approach in high dimensions, and its relation with the optimal values. Moreover, we perform a detailed study of both the asymptotic behaviour (consistency) and the finite sample behaviour of the scenario approach in the more general setting of nonconvex minmax optimization problems. In the direction of the asymptotic behaviour of the scenario approach, we present an obstruction to consistency that arises when the decision set is noncompact. In the direction of finite sample guarantees, we establish a general methodology for extracting "probably approximately correct" type estimates for the finite sample behaviour of the scenario approach for a large class of nonconvex problems.

E-mail addresses: {mishal_assif, dchatter, banavar}@iitb.ac.in.
Key words and phrases: robust optimization, scenario approach, nonconvex programs.
This work was supported in part by scholarships from the Ministry of Human Resource & Development, Government of India. We thank Soumik Pal for helpful discussions.

§ 1. The problem, prior results, perspectives and prelude to our results

§ 1.1. Minmax optimization and scenario approximation. The minmax optimization problem is typically phrased as follows: Let $d$ be a positive integer and $(\Theta, \rho)$ be a metric space. Let $X$ be a nonempty subset of $\mathbb{R}^d$ and $(\Theta, \mathcal{B}(\Theta), \mathbb{P})$ be a probability space, where $\mathcal{B}(\Theta)$ is the Borel $\sigma$-algebra on $\Theta$ induced by the metric $\rho$. Let $f : X \times \Theta \to \mathbb{R}$ be a lower semicontinuous (l.s.c.) function.¹ We are interested in the following robust optimization problem:

(1.1)    $y^* := \inf_{x \in X} \sup_{\theta \in \Theta} f(x, \theta)$.

Here, on the one hand, $x$ plays the role of the decision variable, and $X$ is the set of variables from which a choice of one decision has to be made. On the other hand, $\theta$ plays the role of a parameter that affects the cost associated with each decision variable, and takes a fixed, albeit unknown, value in the set $\Theta$. In problem (1.1), in effect, we pick a decision variable that incurs the least cost assuming that the worst possible value of $\theta$ corresponding to each value of the decision variable is realised.

If $\Theta$ is an infinite set, then the minmax optimization problem (1.1) is an example of a semi-infinite optimization problem. Semi-infinite problems have been reported to be computationally intractable to solve in general [BTN98, BTN99, BTN01]. Nevertheless, such optimization problems are of great importance in engineering, and, consequently, there is a natural need to find computationally tractable tight approximations to the problem (1.1). The central object of study in this work is the following approximation to (1.1):

(1.2)    $y^*_m := \inf_{x \in X} \max_{i = 1, \ldots, m} f(x, \theta_i)$,

where $(\theta_i)_{i=1}^m$ is an independent and identically distributed (i.i.d.) sequence of elements sampled from $\Theta$. This approximation is also known as the scenario approximation to the minmax optimization problem (1.1) [CC05].
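As a concrete illustration (ours, not from the paper), the following minimal Python sketch computes the scenario approximation (1.2) for a toy one-dimensional instance: $X = [0,1]$, $\Theta = [-1,1]$ with the uniform distribution, and $f(x, \theta) = x\theta - \theta^2$, for which $\hat{f}(x) = x^2/4$ and hence $y^* = 0$. The grid over $X$ and the sample sizes are arbitrary choices made for the demonstration.

```python
import numpy as np

def f(x, theta):
    # Toy cost: for each theta, f is affine in x; sup over theta is x**2 / 4.
    return x * theta - theta**2

def scenario_value(m, x_grid, rng):
    """Scenario approximation y*_m of (1.2): draw m i.i.d. scenarios and
    minimise the sampled maximum \\hat f_m over a grid on X = [0, 1]."""
    thetas = rng.uniform(-1.0, 1.0, size=m)                       # i.i.d. sample from Theta
    fhat_m = np.max(f(x_grid[:, None], thetas[None, :]), axis=1)  # \hat f_m on the grid
    return float(fhat_m.min())                                    # y*_m = inf_x \hat f_m(x)

x_grid = np.linspace(0.0, 1.0, 201)
rng = np.random.default_rng(0)
for m in (10, 100, 10_000):
    # Each value is an approximation of y* = 0 from below, as in (1.4).
    print(m, scenario_value(m, x_grid, rng))
```

Consistent with (1.4), every printed value is at most $y^* = 0$, and the approximations improve as $m$ grows.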
We call each instance of the optimization problem (1.2) corresponding to the sample $(\theta_i)_{i=1}^m$ a scenario optimization problem. Observe that each scenario optimization problem is no longer semi-infinite since the inner maximum involves only finitely many variables. This makes the scenario optimization problem computationally more tractable, at least for moderate values of $m$, than otherwise; and this is an attractive feature of (1.2).

§ 1.2. Desirable properties of scenario approximation. Before proceeding further we record the two following definitions for future reference:

(1.3)    $\hat{f}(x) := \sup_{\theta \in \Theta} f(x, \theta)$, and $\hat{f}_m(x) := \max_{i = 1, \ldots, m} f(x, \theta_i)$.

We note that $\hat{f}_m$ involves an abuse of notation since $\hat{f}_m$ depends on the sample $(\theta_i)_{i=1}^m$ and not just on its size $m$; however, we suppress the explicit mention of this dependence in the interest of brevity. It follows immediately from the definitions that $\hat{f}_m(x) \le \hat{f}(x)$ for all $x \in X$ and $m \in \mathbb{N}^*$, which means that

(1.4)    $y^*_m = \inf_{x \in X} \hat{f}_m(x) \le \inf_{x \in X} \hat{f}(x) = y^*$.

In other words, the value of each scenario approximation (1.2) is always an approximation of $y^*$ from below. Questions naturally arise about the goodness of such approximations. One natural notion of goodness of the scenario approximation scheme is qualified in the form of:

(G1) Consistency: Recall that a numerical approximation procedure is said to be consistent if, intuitively, as the level of approximation is made "finer", the approximate solution it computes converges to the actual solution of the problem being approximated.

¹ Recall that a function $F : \Theta \to \mathbb{R}$ is lower semicontinuous (l.s.c.) if every sublevel set of $F$ is closed, i.e., $\{z \in \Theta \mid F(z) \le t\}$ is closed for all $t \in \mathbb{R}$.
Consistency is a very rudimentary property that most sound numerical approximation procedures are expected to possess, and we say that the scenario approach is consistent if

$\mathbb{P}^\infty\bigl(\bigl\{(\theta_i)_{i=1}^{+\infty} \ \big|\ \lim_{m \to +\infty} y^*_m = y^*\bigr\}\bigr) = 1$.

To wit, the scenario approach is consistent if for almost every (countable) sequence of samples from $\Theta$, the approximate solution computed by the scenario approach using only a finite initial segment of the sequence converges to the solution of the original problem as the length of the initial segment grows. We will study consistency of the scenario approach in greater detail in the later sections. In particular, we will establish an obstruction to consistency that arises when the set $X$ is noncompact: a condition under which the scenario approach is guaranteed to be inconsistent.

A second desirable property of the scenario approximation is a good quality of:

(G2) Finite sample behaviour: Observe that the condition of consistency is purely asymptotic; it gives us no information about the nature of the approximate solution computed after drawing only a finite number of samples. But in the real world, information regarding the finite sample behaviour of the scenario approach is crucial, and the behaviour of the scenario approach after drawing finitely many samples also warrants attention. In addition, it is desirable that this information is available to us a priori, before the approximation procedure is executed, so that the number of samples drawn can be determined based on the accuracy demanded by the application before executing the approximation scheme.

We start our study of finite sample behaviour by quantifying levels of approximation and bad samples associated with scenario approximations.
For $\epsilon > 0$ define the set of "bad" samples of size $m$ as those that give at least $\epsilon$-bad estimates of $y^*$, that is, those for which $y^*_m$ is at least $\epsilon$ away from $y^*$:

(1.5)    $B(m, \epsilon) := \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ \inf_{x \in X} \hat{f}_m(x) \le y^* - \epsilon\bigr\}$.

Defining

(1.6)    $\bar{B}(m, \epsilon) := \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ \text{there exists } x \in X \text{ such that } \hat{f}_m(x) \le y^* - \epsilon\bigr\} = \bigcup_{x \in X} \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ \hat{f}_m(x) \le y^* - \epsilon\bigr\}$,

we immediately get the sandwich relation

(1.7)    $\bar{B}(m, \epsilon) \subset B(m, \epsilon) \subset \bar{B}\bigl(m, \tfrac{\epsilon}{2}\bigr)$.

We find it easier to work with $\bar{B}(m, \epsilon)$ than $B(m, \epsilon)$, and since these two sets are sandwiched between each other, estimates for the probability of one naturally lead to estimates for the probability of the other. Note that there is nothing special about the factor $2$ in (1.7): the relation holds with any factor strictly greater than $1$; we chose $2$ for convenience. Since

$\bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ \hat{f}_m(x) \le y^* - \epsilon\bigr\} = \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ \max_{i=1,\ldots,m} f(x, \theta_i) \le y^* - \epsilon\bigr\} = \bigcap_{i=1}^m \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ f(x, \theta_i) \le y^* - \epsilon\bigr\}$,

the set defined in (1.6) is, in fact,

(1.8)    $\bar{B}(m, \epsilon) = \bigcup_{x \in X} \bigcap_{i=1}^m \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ f(x, \theta_i) \le y^* - \epsilon\bigr\}$.

Since we already have the obvious bound (1.4) that $y^*_m \le y^*$, if $y^*_m > y^* - \epsilon$, then we also get $|y^*_m - y^*| < \epsilon$. In other words, if a sample lies outside the bad set $B(m, \epsilon)$, then the difference between the approximate infimum and the actual infimum is less than $\epsilon$. Naturally, it is desirable that the bad set is as 'small' as possible. As mentioned earlier, in the real world a priori quantitative information regarding the finite sample behaviour of the scenario approach is crucial.
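The probability of the bad set (1.5) can also be probed empirically. The following sketch (our illustration, not the authors' code) estimates $\mathbb{P}^m(B(m, \epsilon))$ by Monte Carlo for a toy instance of (1.1): $f(x, \theta) = x\theta - \theta^2$ on $X = [0,1]$ with $\Theta = [-1,1]$ under the uniform distribution, for which $y^* = 0$. The grid on $X$ and the trial count are arbitrary choices.

```python
import numpy as np

def bad_prob(m, eps, trials, rng):
    """Monte Carlo estimate of P^m(B(m, eps)) for the toy problem
    f(x, theta) = x*theta - theta**2, X = [0, 1], Theta = [-1, 1] uniform,
    for which y* = 0: the fraction of size-m samples whose scenario value
    y*_m falls at or below y* - eps."""
    x_grid = np.linspace(0.0, 1.0, 101)
    bad = 0
    for _ in range(trials):
        thetas = rng.uniform(-1.0, 1.0, size=m)
        fhat_m = np.max(x_grid[:, None] * thetas - thetas**2, axis=1)
        bad += fhat_m.min() <= -eps            # is y*_m <= y* - eps ?
    return bad / trials

rng = np.random.default_rng(0)
# The estimated probability of the bad set shrinks as m grows.
print(bad_prob(5, 0.2, 2000, rng), bad_prob(50, 0.2, 2000, rng))
```

With only 5 scenarios a non-negligible fraction of samples is $\epsilon$-bad, while with 50 scenarios the bad event essentially never occurs in this instance.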
We are especially interested in results that provide an upper bound on the probability of occurrence of the bad set $B(m, \epsilon)$,

(1.9)    $\beta(m, \epsilon) := \mathbb{P}^m\bigl(B(m, \epsilon)\bigr)$,

before the approximation procedure begins. Such quantitative bounds provide a PAC ("probably approximately correct") type guarantee that the scenario approximation (1.2) computed using $m$ i.i.d. samples from $\Theta$ has an accuracy $\epsilon$ with probability at least $1 - \beta(m, \epsilon)$. Given an accuracy level $\epsilon$ and a confidence level $\beta$, from such a bound we may determine the number of samples required to ensure that the approximate minimum is at most $\epsilon$ away from the actual minimum with probability at least $1 - \beta$. In the subsequent sections we will prove such PAC type guarantees for a large class of nonconvex minmax optimization programs.

§ 1.3. A technical look at the sampling probability. Since the scenario approximation procedure involves i.i.d. sampling according to an arbitrary probability distribution, it is not reasonable to expect that the sampled maximum approximates the supremum. If the probability distribution has large "holes" in regions of $\Theta$ where the supremum is achieved, these regions are never explored, and consequently the sampled maximum does not approximate the supremum. We refer the reader to [Ram18, §1] for a more detailed discussion on this matter. A more meaningful notion of a supremum that we expect the sampled maximum to approximate is that of the essential supremum:

(1.10)    $\operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta) := \inf\bigl\{z \in \mathbb{R} \ \big|\ \mathbb{P}\bigl(\{\theta \in \Theta \mid f(x, \theta) \le z\}\bigr) = 1\bigr\}$.

While the supremum of a set of numbers is its least upper bound, the essential supremum is the least "almost" upper bound.
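The gap between the supremum and the essential supremum is easy to see numerically. In the following sketch (ours; the distribution with a "hole" is an assumption made purely for the demonstration) we take $\Theta = [0, 1]$, $f(\theta) = \theta$ with the decision variable suppressed, and $\mathbb{P}$ uniform on $[0, \tfrac{1}{2}]$ only, so that $\sup f = 1$ while $\operatorname{ess\,sup} f = \tfrac{1}{2}$:

```python
import numpy as np

rng = np.random.default_rng(0)

# Theta = [0, 1], f(theta) = theta. P is uniform on [0, 1/2]: it has a
# "hole" on ]1/2, 1], so P is not fully supported on Theta.
sup_f = 1.0        # least upper bound over Theta
ess_sup_f = 0.5    # least z with P(f <= z) = 1, as in (1.10)

samples = rng.uniform(0.0, 0.5, size=100_000)
sampled_max = samples.max()
# The sampled maximum approaches the essential supremum 0.5,
# and stays far from the supremum 1: the hole is never explored.
print(sampled_max)
```

However many samples are drawn, the sampled maximum converges to the essential supremum, not to the supremum.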
If the probability distribution has "holes" in certain regions, the essential supremum automatically avoids considering the value of the function in these regions, and therefore avoiding these regions while trying to approximate the essential supremum does not create any technical issues. However, the supremum possesses many nice properties that the essential supremum does not possess, and in order to avoid having to deal with the additional complications brought by the essential supremum, we would like to ensure that the supremum and the essential supremum are one and the same. Fortunately, it turns out that the assumption of lower semicontinuity of $f$ that we made at the start is sufficient for this, and to verify this statement, we start with a standard definition from measure theory.

Definition 1.1 ([Par67, Definition 2.1, p. 28]). The support of the measure $\mathbb{P}$ is

$\Theta_s := \bigl\{\theta \in \Theta \ \big|\ \mathbb{P}(N_\theta) > 0 \text{ for every open neighbourhood } N_\theta \text{ of } \theta\bigr\}$.

It can be shown that $\Theta_s$ is the smallest (w.r.t. set inclusion) closed subset of $\Theta$ that has $\mathbb{P}$-measure 1.

Lemma 1.2. Consider the problem (1.1) with its associated data. If $f : X \times \Theta \to \mathbb{R}$ is lower semicontinuous, then for each $x \in X$,

$\operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta) = \sup_{\theta \in \Theta_s} f(x, \theta)$.

A proof of this result is provided in Appendix A. Lemma 1.2 says that lower semicontinuity of $f$ ensures that the essential supremum is equal to the supremum over a certain subset of probability 1. For the sake of brevity of notation in the following discussion, we assume that $\Theta_s = \Theta$; all the results that we derive below carry over to the situation when $\Theta_s \subsetneq \Theta$.

Assumption 1.3. We stipulate that $\mathbb{P}$ is a fully supported probability measure, that is, $\Theta_s = \Theta$. This is equivalent to stipulating that $\mathbb{P}(U) > 0$ for all open subsets $U \subset \Theta$.
Assumption 1.3 ensures that it is not unreasonable to expect the sampled maximum to approximate the supremum, and this is the content of the next lemma:

Lemma 1.4. Consider the problem (1.1) with its associated data. If Assumption 1.3 holds, then

(1.11)    $\sup_{\theta \in \Theta} f(x, \theta) = \operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta)$.

All the assumptions made until this point will remain in force throughout the article, unless specifically mentioned otherwise; for the convenience of the reader, we collect them here.

◦ $X$ is a subset of $\mathbb{R}^d$ and $\Theta$ is a metric space.
◦ $f : X \times \Theta \to \mathbb{R}$ is a lower semicontinuous function.
◦ $\mathbb{P}$ is a fully supported probability measure on $\Theta$.

§ 1.4. Prior work and contributions. The scenario approach has been studied extensively in the literature in the particular case where $X$ is a convex set and $f(\cdot, \theta) : X \to \mathbb{R}$ is a convex function for each $\theta \in \Theta$. We will henceforth refer to such a problem as a random convex program. We review two recent representative results related to scenario approximations of random convex programs, one on consistency and the other on finite sample behaviour of the scenario approach, and compare the contributions of this article with the two results. We point out that both of these results rely crucially on the results established in [CC05, CC06, CG08].

The first result, from [Ram18], establishes the consistency of the scenario approach for random convex programs under an additional stipulation of an appropriate notion of coercivity on the class of functions $f(\cdot, \theta) : X \to \mathbb{R}$. Recall that if $\Xi$ is a metric space, a function $F : \Xi \to \mathbb{R}$ is weakly coercive if for each $t > 0$ there exists a compact set $C_t \subset \Xi$ such that the $t$-sublevel set of $F$ is contained in $C_t$, that is, $\{x \in \Xi \mid F(x) \le t\} \subset C_t$. In particular, if $\Xi$ is compact, every function $F$ is weakly coercive.

Theorem 1.5 ([Ram18, Theorem 14 on p. 154]).
Consider the problem (1.1) with its associated data. Suppose there exists an $\bar{N} \in \mathbb{N}^*$ such that

(1.12)    $\mathbb{P}^{\bar{N}}\bigl(\bigl\{(\theta_i)_{i=1}^{\bar{N}} \ \big|\ \hat{f}_{\bar{N}} \text{ is weakly coercive}\bigr\}\bigr) > 0$.

If, in addition, $f(\cdot, \theta) : X \to \mathbb{R}$ is convex for all $\theta \in \Theta$, then the scenario approach is consistent in the sense that

$\mathbb{P}^\infty\bigl(\bigl\{(\theta_i)_{i=1}^{+\infty} \ \big|\ \lim_{m \to +\infty} y^*_m = y^*\bigr\}\bigr) = 1$.

Under the additional assumption (1.12), Theorem 1.5 establishes the consistency of the scenario approach for random convex programs. When the set $X$ itself is compact, (1.12) holds trivially since any function on a compact set is weakly coercive. If $X$ is noncompact, (1.12) may fail to hold, and consequently the consistency of the scenario approach may also be jeopardized. We study this situation in detail, and the first main contribution of this article is the identification of an obstruction to consistency of the scenario approach when $X$ is noncompact, that is, a condition that guarantees that the scenario approach will not be consistent.

We now review the main result from [ESL15] that establishes finite sample guarantees for the scenario approach applied to random convex programs.

Theorem 1.6 ([ESL15, Theorem 14 on p. 5]). The tail probability for worst case violation is the function $p_w : X \times \left]0, +\infty\right[ \to [0, 1]$ defined by

(1.13)    $p_w(x, \epsilon) = \mathbb{P}\bigl(\bigl\{\theta \in \Theta \ \big|\ f(x, \theta) > \hat{f}(x) - \epsilon\bigr\}\bigr)$.

Moreover, let

(1.14)    $p_{w*}(\epsilon) = \inf_{x \in X} p_w(x, \epsilon)$.

A function $h : [0, 1] \to \left]0, +\infty\right[$ is called a uniform level set bound (ULB) of $p_{w*}$ if for every $\epsilon \in [0, 1]$,

(1.15)    $h(\epsilon) \ge \sup\bigl\{\delta > 0 \ \big|\ p_{w*}(\delta) \le \epsilon\bigr\}$.

Define

(1.16)    $N(\epsilon, \beta) := \min\Bigl\{N \in \mathbb{N}^* \ \Big|\ \sum_{i=0}^{d-1} \binom{N}{i} \epsilon^i (1 - \epsilon)^{N - i} \le \beta\Bigr\}$.

Given a ULB $h$ and numbers $\epsilon, \beta \in \left]0, 1\right]$, for all $N \ge N(\epsilon, \beta)$ we have

(1.17)    $\mathbb{P}^N\bigl(\bigl\{(\theta_i)_{i=1}^N \in \Theta^N \ \big|\ y^* - y^*_N \le h(\epsilon)\bigr\}\bigr) \ge 1 - \beta$.
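The threshold $N(\epsilon, \beta)$ of (1.16) is straightforward to compute numerically. The following sketch (ours; here `d` stands for the dimension of the decision variable, as in the convex theory) scans $N$ upwards until the binomial tail drops below $\beta$:

```python
import math

def sample_size(eps, beta, d):
    """Smallest N with sum_{i=0}^{d-1} C(N, i) * eps**i * (1-eps)**(N-i) <= beta,
    i.e. the quantity N(eps, beta) of (1.16)."""
    N = d
    while True:
        tail = sum(math.comb(N, i) * eps**i * (1 - eps)**(N - i) for i in range(d))
        if tail <= beta:
            return N
        N += 1

# Example: accuracy eps = 0.05 and confidence beta = 1e-3 for a few
# decision dimensions; the required sample size grows with d.
for d in (1, 2, 5, 10):
    print(d, sample_size(0.05, 1e-3, d))
```

This is exactly the a priori flavour of the guarantee: the sample size depends only on $(\epsilon, \beta, d)$, not on the scenarios actually drawn.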
Theorem 1.6 provides the guarantee that if $N(\epsilon, \beta)$ points are sampled in an i.i.d. fashion from $\Theta$ and the corresponding scenario approximation problem is solved to obtain an approximate minimum, then one can say with confidence $1 - \beta$ that the approximate minimum $y^*_{N(\epsilon, \beta)}$ is at most $h(\epsilon)$ away from the true minimum $y^*$. Note that the guarantee is a priori: one does not need any information related to the actual samples $(\theta_i)_{i=1}^m$ drawn in order to compute $N(\epsilon, \beta)$, and consequently, one can use Theorem 1.6 to determine the number of samples required to be drawn in order to obtain a solution of a given accuracy fixed at the beginning of the optimization procedure.

One of the crucial ingredients in the proof of Theorem 1.6 is a result from [CG08] which is valid only for random convex programs. In the light of recent extensions of [CG08] to the nonconvex case in [CGR18], one can extend some results of [ESL15], including Theorem 1.6, to nonconvex robust optimization problems. However, the results of [CGR18] in the nonconvex regime are of an a posteriori nature, meaning that the guarantees given depend on the sample $(\theta_i)_{i=1}^m$ drawn, and the extension of Theorem 1.6 to the nonconvex case via that route inherits this same a posteriori property. This means that one cannot determine, before the approximation procedure begins, the number of samples that gives an approximate solution of desired accuracy. However, once a sample $(\theta_i)_{i=1}^m$ is drawn and the corresponding scenario approximation is found, one can then find the accuracy of the computed approximate solution. In other words, one can only assess the quality of a scenario approximate solution after the solution is computed.
In contrast, the second main contribution of the present article is a methodology to establish a priori PAC type finite sample guarantees, similar to Theorem 1.6, that is applicable to scenario approximations of a large class of nonconvex minmax optimization problems.

§ 1.5. Numerical experiments in high dimensions. We devote this subsection to examining in detail, with the aid of numerical experiments, a simple minmax optimization problem and the quality of its scenario approximations. Recall that for a given vector $v \in \mathbb{R}^n$ the quantity $\|v\|_\infty$ denotes its infinity norm defined by $\|v\|_\infty = \max_{i=1,\ldots,n} |v_i|$. Consider the optimization problem:

(1.18)    $y^* = \inf_{x \in [0,1]} \sup_{\theta \in [-1,1]^n} \bigl(x \|\theta\|_\infty - \|\theta\|_\infty^2\bigr)$.

In the language of the problem (1.1), here we have chosen $X = [0, 1]$, $\Theta = [-1, 1]^n$, and the continuous function $f(x, \theta) = x \|\theta\|_\infty - \|\theta\|_\infty^2$. Observe that for each $\theta \in \Theta$, $f(\cdot, \theta)$ is a convex function on the convex set $X$. In other words, (1.18) is a random convex program. Moreover, the set $X$ is compact and therefore (1.18) satisfies all the conditions of Theorem 1.5, and the latter guarantees that the scenario approximations will almost surely converge to $y^*$.

To study the finite sample behaviour of scenario approximations of (1.18), we first compute the optimal value $y^*$. This can be done by observing that, with $t = \|\theta\|_\infty$ ranging over $[0, 1]$,

(1.19)    $\sup_{\theta \in [-1,1]^n} \bigl(x \|\theta\|_\infty - \|\theta\|_\infty^2\bigr) = \sup_{t \in [0,1]} \bigl(x t - t^2\bigr) = \frac{x^2}{4}$.

Consequently,

(1.20)    $y^* = \inf_{x \in [0,1]} \sup_{\theta \in [-1,1]^n} \bigl(x \|\theta\|_\infty - \|\theta\|_\infty^2\bigr) = \inf_{x \in [0,1]} \frac{x^2}{4} = 0$.

Since we have the optimal value $y^*$ in (1.18) and the scenario approximate solution $y^*_m$ can be computed numerically on a computer, we can compute the error associated with the scenario approximations of (1.18).
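This computation is particularly transparent for (1.18): each sampled cost $x \|\theta_i\|_\infty - \|\theta_i\|_\infty^2$ has nonnegative slope in $x$, so $\hat{f}_m$ is nondecreasing on $[0, 1]$ and its infimum is attained at $x = 0$; hence $y^*_m = -\min_i \|\theta_i\|_\infty^2$ and the error is $|y^*_m - y^*| = \min_i \|\theta_i\|_\infty^2$. The experiment can therefore be reproduced in a few lines (our sketch, with sample sizes smaller than those in the figures below):

```python
import numpy as np

def scenario_error(n, m, rng):
    """Error |y*_m - y*| for problem (1.18) under uniform sampling on [-1,1]^n.
    For a fixed sample, fhat_m(x) = max_i (x*t_i - t_i**2) with
    t_i = ||theta_i||_inf >= 0 is a maximum of lines with nonnegative slopes,
    hence nondecreasing on [0, 1]; its infimum is attained at x = 0, so
    y*_m = -min_i t_i**2 while y* = 0."""
    thetas = rng.uniform(-1.0, 1.0, size=(m, n))
    t = np.abs(thetas).max(axis=1)      # ||theta_i||_inf for each scenario
    return float((t**2).min())          # |y*_m - y*| = min_i ||theta_i||_inf^2

rng = np.random.default_rng(0)
# The error grows sharply with the dimension n at a fixed sample size m,
# because ||theta||_inf concentrates near 1 in high dimensions.
for n in (2, 20, 50):
    print(n, scenario_error(n, 10_000, rng))
```

Even with $10^4$ scenarios, the error is tiny for $n = 2$ but remains a sizeable fraction of the cost's range for $n = 20$ and $n = 50$, in line with the trends in the figures.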
In Figure 1 we present the results of our numerical experiments giving the error in the scenario approximation (1.2) of (1.18) and its variation with the dimension $n$ of the uncertainty set and the number $m$ of samples. We sampled independently from the uniform distribution on $\Theta = [-1, 1]^n$ to obtain these scenario approximations. The error shown in the figure for each value of $m$ and $n$ was computed by taking the average error of the scenario approximations over 25 sets of samples of length $m$ from $\Theta = [-1, 1]^n$.

Figure 1. Variation of the error in scenario approximations of the problem (1.18) with respect to the dimension ($n$) of the uncertainty set and the number ($m$) of i.i.d. samples drawn from the uncertainty set. The i.i.d. samples were drawn according to the uniform distribution on $\Theta = [-1, 1]^n$. The numbers reported here correspond to the average error of the scenario approximations over 25 sets of samples of length $m$ from $\Theta = [-1, 1]^n$.

Figure 1 follows the expected trend that the error decreases as the number of samples increases and as the dimension of the uncertainty set decreases. A closer look at the value of the error shows that even for a moderate dimension of $n = 20$ of the uncertainty set (see Figure 2), even after sampling as many as a million scenarios, one still gets an error as large as $0.25$. To put this in perspective, observe that the value of the cost $f(x, \theta)$ varies between $-1$ and $+1$ as $x$ and $\theta$ vary over $X$ and $\Theta$, respectively, which means that this is an error of about $12.5\%$. The results are much worse for higher dimensions; for instance, when the dimension of the uncertainty set is 50 and a million scenarios are drawn, the error in the scenario approximation is around $0.5$ in absolute units, which puts the relative error at around $25\%$.

We get even worse results if we consider the slightly modified problem

(1.21)    $y^* = \inf_{x \in [0,1]} \sup_{\theta \in \mathbb{R}^n} \bigl(x \|\theta\|_\infty - \|\theta\|_\infty^2\bigr)$,

where the uncertainty set is noncompact.

Figure 2. Variation of the error in scenario approximations of the problem (1.18) with respect to the number ($m$) of i.i.d. samples drawn from the uncertainty set when the dimension ($n$) of the uncertainty set is fixed at 20. The i.i.d. samples were drawn according to the uniform distribution on $\Theta = [-1, 1]^n$. The numbers reported here correspond to the average error of the scenario approximations over 25 sets of samples of length $m$ from $\Theta = [-1, 1]^n$.

The optimal value is $y^* = 0$ in this case as well; indeed, with $t = \|\theta\|_\infty$ ranging over $[0, +\infty[$,

(1.22)    $\sup_{\theta \in \mathbb{R}^n} \bigl(x \|\theta\|_\infty - \|\theta\|_\infty^2\bigr) = \sup_{t \ge 0} \bigl(x t - t^2\bigr) = \frac{x^2}{4}$,

and consequently,

(1.23)    $y^* = \inf_{x \in [0,1]} \sup_{\theta \in \mathbb{R}^n} \bigl(x \|\theta\|_\infty - \|\theta\|_\infty^2\bigr) = \inf_{x \in [0,1]} \frac{x^2}{4} = 0$.

In Figure 3 we present the results of our numerical experiments giving the error in the scenario approximation (1.2) of (1.21) and its variation with the dimension $n$ of the uncertainty set and the number $m$ of samples. We sampled independently from the Gaussian distribution with mean $0$ and covariance $I_n$ on $\Theta = \mathbb{R}^n$ to obtain these scenario approximations. The error shown in the figure for each value of $m$ and $n$ was computed by taking the average error of the scenario approximations over 25 sets of samples of length $m$ from $\Theta = \mathbb{R}^n$. We see that even for a moderate dimension of $n = 20$ of the uncertainty set (see Figure 4), even after sampling as many as a million scenarios, one still gets an error as large as $0.45$ in absolute units.
As expected, the results are much worse for higher dimensions: for instance, when the dimension of the uncertainty set is 50 and a million scenarios are drawn, the error in the scenario approximation is around $1.3$ in absolute units.

Figure 3. Variation of the error in scenario approximations of the problem (1.21) with respect to the dimension ($n$) of the uncertainty set and the number ($m$) of i.i.d. samples drawn from the uncertainty set. The i.i.d. samples were drawn according to the Gaussian distribution with mean $0$ and covariance $I_n$ on $\Theta = \mathbb{R}^n$. The numbers reported here correspond to the average error of the scenario approximations over 25 sets of samples of length $m$ from $\Theta = \mathbb{R}^n$.

Of course, the measuring stick in the scenario approximations of the example immediately above is the probability measure corresponding to the Gaussian employed for sampling, and the specific concentration properties of this measure naturally affect the outcome of the experiment. Whether these estimates are satisfactory or not is difficult to assess unilaterally and uniformly across the spectrum of robust minmax optimization problems, and such conclusions are best left to the judgment of the practitioners concerned.

The main culprit in the examples above is the fact that scenario approximations rely on i.i.d. samples, and i.i.d. samples of high dimensional random vectors tend to concentrate with high probability around certain regions of the space, leaving the rest of the space unexplored; this feature leads to a preference for certain (typically thin) regions of the sample space of the algorithm, and unless the optimizers are in these thin sets, the quality of approximation may be low.
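The concentration effect can be observed directly (our illustration, with arbitrary sample sizes): for the uniform distribution on $[-1,1]^n$ the sampled values of $\|\theta\|_\infty$ pile up near $1$ as $n$ grows, while for the Gaussian $\mathcal{N}(0, I_n)$ the samples live near a thin spherical shell of radius roughly $\sqrt{n}$:

```python
import numpy as np

rng = np.random.default_rng(0)
m = 100_000

for n in (2, 20, 100):
    u = rng.uniform(-1.0, 1.0, size=(m, n))
    g = rng.normal(size=(m, n))
    # Uniform on [-1,1]^n: ||theta||_inf concentrates near 1 as n grows,
    # so the region where ||theta||_inf is small is essentially never visited.
    inf_norms = np.abs(u).max(axis=1)
    # Gaussian N(0, I_n): ||theta||_2 concentrates around sqrt(n);
    # the relative spread of the norm shrinks like 1/sqrt(n).
    two_norms = np.linalg.norm(g, axis=1)
    print(n,
          inf_norms.mean(),                       # approaches 1
          (inf_norms < 0.5).mean(),               # fraction of "deep interior" samples
          two_norms.std() / two_norms.mean())     # relative spread of the 2-norm
```

In both cases the sampler effectively explores only a thin subset of the space, which is precisely the mechanism blamed in the discussion above.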
The preceding observations clearly point to the fact that there is still scope to develop general, computationally feasible, and tight approximation schemes for robust optimization problems, especially in high dimensions insofar as the optimal value is concerned; one such approximation method involving better sampling will be reported subsequently elsewhere.

Figure 4. Variation of the error in scenario approximations of the problem (1.21) with respect to the number ($m$) of i.i.d. samples drawn from the uncertainty set when the dimension ($n$) of the uncertainty set is fixed at 20. The i.i.d. samples were drawn according to the Gaussian distribution with mean $0$ and covariance $I_n$ on $\Theta = \mathbb{R}^n$. The numbers reported here correspond to the average error of the scenario approximations over 25 sets of samples of length $m$ from $\Theta = \mathbb{R}^n$.

§ 2. An obstruction to consistency

Consistency of the scenario approach is not guaranteed, in general, for all problems of the form (1.1). Even in the particular case of random convex programs, observe that the statement of Theorem 1.5 has the additional requirement of coercivity, which is not always satisfied if the set $X$ is not compact. We begin with a simple example that illustrates this effect.

Example 2.1. Let $X = \mathbb{R}$ and $\Theta = \mathbb{R}$. Assume that $\Theta$ is endowed with the standard Gaussian probability measure with mean 0 and variance 1. Consider, in the language of (1.1), the cost function

(2.1)    $f(x, \theta) := \begin{cases} 0 & \text{if } x \ge \theta, \\ x - \theta & \text{if } \theta - 1 \le x \le \theta, \\ -1 & \text{if } x \le \theta - 1. \end{cases}$

All the requirements of the problem (1.1) are satisfied by (2.1), in addition to Assumption 1.3. Yet we show that the scenario approach is inconsistent in this situation. For a given sample $(\theta_i)_{i=1}^m$ we define $\hat{\theta}_m := \min_{i=1,\ldots,m} \theta_i$.
One checks that

(2.2)    $\hat{f}(x) := \sup_{\theta \in \Theta} f(x, \theta) = 0$, and $\hat{f}_m(x) := \max_{i=1,\ldots,m} f(x, \theta_i) = f(x, \hat{\theta}_m)$,

which imply that

(2.3)    $y^* = \inf_{x \in X} \hat{f}(x) = 0$, and

(2.4)    $y^*_m = \inf_{x \in X} \hat{f}_m(x) = -1$ for all $m \in \mathbb{N}^*$.

This means that $\lim_{m \to +\infty} y^*_m = -1$ for any sequence of samples $(\theta_i)_{i=1}^{+\infty}$, which shows that consistency fails to hold.

It is clear from Example 2.1 that the set $X$ being noncompact can readily lead to inconsistency of the scenario approach. In this section we study this issue further and characterize one possible obstruction to the consistency of the scenario approach when the set of optimization variables $X$ is noncompact. We begin with the following definition, which will be needed in both the current and the next section:

Definition 2.2. The tail probability is the function $p : X \times \left]0, +\infty\right[ \to [0, 1]$ defined by

$p(x, \epsilon) := \mathbb{P}\bigl(\{\theta \in \Theta \mid f(x, \theta) > y^* - \epsilon\}\bigr)$.

For each $\epsilon > 0$, we define the infimum of $p(x, \epsilon)$ over all $x \in X$ by

$p_*(\epsilon) := \inf_{x \in X} p(x, \epsilon)$.

The following theorem is the key result of this section.

Theorem 2.3. Consider the problem (1.1) with its associated data, and suppose that Assumption 1.3 holds. If there exists some $\epsilon > 0$ satisfying $p_*(\epsilon) = 0$, then

$\mathbb{P}^\infty\bigl(\bigl\{(\theta_i)_{i=1}^{+\infty} \ \big|\ \lim_{m \to +\infty} y^*_m = y^*\bigr\}\bigr) = 0$.

Proof. Since $y^*_m$ is monotone nondecreasing in $m$, recalling the definition of $\bar{B}(m, \epsilon)$ from (1.8) we see that

$\bigl\{(\theta_i)_{i=1}^{+\infty} \ \big|\ \lim_{m \to +\infty} y^*_m \le y^* - \epsilon\bigr\} = \bigcap_{m \in \mathbb{N}} B(m, \epsilon) \supset \bigcap_{m \in \mathbb{N}} \bar{B}(m, \epsilon)$.

This means that

$\mathbb{P}^\infty\bigl(\bigl\{(\theta_i)_{i=1}^{+\infty} \ \big|\ \lim_{m \to +\infty} y^*_m \le y^* - \epsilon\bigr\}\bigr) \ge \mathbb{P}^\infty\Bigl(\bigcap_{m \in \mathbb{N}} \bar{B}(m, \epsilon)\Bigr) = \inf_{m \in \mathbb{N}} \mathbb{P}^m\bigl(\bar{B}(m, \epsilon)\bigr)$.
However, in view of (1.8) and the fact that the $\theta_i$'s are sampled independently,

$\mathbb{P}^m\bigl(\bar{B}(m, \epsilon)\bigr) = \mathbb{P}^m\Bigl(\bigcup_{x \in X} \bigcap_{i=1}^m \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ f(x, \theta_i) \le y^* - \epsilon\bigr\}\Bigr)$
$\ge \sup_{x \in X} \mathbb{P}^m\Bigl(\bigcap_{i=1}^m \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \ \big|\ f(x, \theta_i) \le y^* - \epsilon\bigr\}\Bigr)$
$= \sup_{x \in X} \Bigl(\mathbb{P}\bigl(\{\theta \in \Theta \mid f(x, \theta) \le y^* - \epsilon\}\bigr)\Bigr)^m$
$= \sup_{x \in X} \bigl(1 - p(x, \epsilon)\bigr)^m = 1$

by our assumption that $p_*(\epsilon) = \inf_{x \in X} p(x, \epsilon) = 0$. In other words,

$\mathbb{P}^\infty\bigl(\bigl\{(\theta_i)_{i=1}^{+\infty} \ \big|\ \lim_{m \to +\infty} y^*_m \le y^* - \epsilon\bigr\}\bigr) = 1$,

and the assertion follows. ∎

Remark 2.4. It is clear from the proof of Theorem 2.3 that the set $\{\theta \in \Theta \mid f(x, \theta) > y^* - \epsilon\}$ is precisely the set from which one needs to sample in order to get a solution with $\epsilon$ accuracy; if the sampled sequence $(\theta_i)_{i=1}^{+\infty}$ does not contain any element from the set $\{\theta \in \Theta \mid f(x, \theta) > y^* - \epsilon\}$ corresponding to any one value of $x$, then the approximate solution $y^*_m$ is going to be at least $\epsilon$ away from $y^*$. In the light of this fact, the condition $p_*(\epsilon) = 0$ amounts to saying that the sets from which one needs to sample in order to get an approximation of accuracy $\epsilon$ can be arbitrarily small regions of $\Theta$; consequently, and in retrospect, the above result appears natural.

Remark 2.5. The function $p_*(\cdot)$ is very similar to an object we have encountered before: cf. the function $p_{w*}(\cdot)$ defined in (1.14). These two functions $p_*$ and $p_{w*}$ are weakly related to each other. In general, one can say that $p_{w*}(\epsilon) \le p_*(\epsilon)$ for every $\epsilon > 0$. Indeed, observe that for each $x \in X$, since $y^* \le \hat{f}(x)$,

$\bigl\{\theta \in \Theta \ \big|\ f(x, \theta) > \hat{f}(x) - \epsilon\bigr\} \subset \{\theta \in \Theta \mid f(x, \theta) > y^* - \epsilon\}$.

This implies that $p_w(x, \epsilon) \le p(x, \epsilon)$ for each $x \in X$ and $\epsilon > 0$, which further implies that $p_{w*}(\epsilon) = \inf_{x \in X} p_w(x, \epsilon) \le \inf_{x \in X} p(x, \epsilon) = p_*(\epsilon)$.
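The inconsistency exhibited in Example 2.1 is easy to confirm numerically. In the sketch below (ours; the cost (2.1) is encoded as the clamped ramp `clip(x - theta, -1, 0)`, the reading of (2.1) under which (2.2)–(2.4) hold), the scenario value equals $-1$ no matter how many scenarios are drawn, while $y^* = 0$:

```python
import numpy as np

def f(x, theta):
    """Cost (2.1): 0 for x >= theta, the ramp x - theta on [theta - 1, theta],
    and -1 for x <= theta - 1; nonincreasing in theta, with values in [-1, 0]."""
    return np.clip(x - theta, -1.0, 0.0)

rng = np.random.default_rng(0)
for m in (1, 10, 1000):
    thetas = rng.normal(size=m)        # i.i.d. standard Gaussian scenarios
    theta_hat = thetas.min()           # \hat\theta_m = min_i theta_i
    # Since f is nonincreasing in theta, fhat_m(x) = f(x, theta_hat); its
    # infimum over x in R is -1, attained for any x <= theta_hat - 1:
    y_star_m = f(theta_hat - 1.0, theta_hat)
    print(m, y_star_m)                 # -1.0 for every m, while y* = 0
```

Every sample path gives $y^*_m = -1$, matching (2.4): the scenario values never approach $y^* = 0$.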
In subsequent sections (see Remark 3.4) we will see further evidence pointing to the fundamental nature of $p^*(\cdot)$ in relation to the scenario approach.

Example 2.1 (Continued). We check whether the type of obstruction introduced in Theorem 2.3 arises in the situation given in Example 2.1. Since we know that $y^* = 0$, if we take $\epsilon = 1$, we get
$$\{\theta \in \Theta \mid f(x, \theta) > y^* - \epsilon\} = \{\theta \in \mathbb{R} \mid f(x, \theta) > -1\} = \{\theta \in \mathbb{R} \mid \theta \le x - 1\},$$
which implies that²
$$p(x, 1) = \mathsf{P}\bigl(\{\theta \in \Theta \mid \theta \le x - 1\}\bigr) = \tfrac{1}{2}\operatorname{erfc}\Bigl(\frac{1 - x}{\sqrt{2}}\Bigr),$$
and therefore
$$p^*(1) = \inf_{x \in \mathbb{R}} \tfrac{1}{2}\operatorname{erfc}\Bigl(\frac{1 - x}{\sqrt{2}}\Bigr) = 0.$$
It is now evident that it is the obstruction pointed out by Theorem 2.3 that prevents consistency in this example as well.

The result of Theorem 2.3 is equally valid in the case where $X$ is compact. However, we started the discussion by claiming that the obstruction arises when $X$ is a noncompact set, and the next proposition affirms this statement: we show that when the set $X$ is compact, the obstruction presented in Theorem 2.3 cannot arise.

² Recall that the function $\mathbb{R} \ni z \mapsto \operatorname{erfc}(z) := \frac{2}{\sqrt{\pi}} \int_z^{+\infty} \exp(-t^2)\, \mathrm{d}t$ satisfies $\mathsf{P}(\{\theta \in \mathbb{R} \mid \theta \le -z\}) = \tfrac{1}{2}\operatorname{erfc}(z/\sqrt{2})$ when $\mathsf{P}$ is the standard Gaussian measure (normal with mean 0 and variance 1) on $\mathbb{R}$. Clearly, $\lim_{z \to +\infty} \operatorname{erfc}(z) = \inf_{z \in \mathbb{R}} \operatorname{erfc}(z) = 0$.

Proposition 2.6. Consider the problem (1.1) along with its associated data. If Assumption 1.3 holds, then for every $\epsilon > 0$ the map $p(\cdot, \epsilon) : X \to [0, 1]$ is a positive lower semicontinuous function; consequently, if $X$ is compact, then for each $\epsilon > 0$,
$$p^*(\epsilon) = \inf_{x \in X} p(x, \epsilon) = \min_{x \in X} p(x, \epsilon) > 0.$$
A proof of Proposition 2.6 is provided in Appendix A.

§ 3.
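The vanishing of $p^*(1)$ in the example above can be observed numerically. A minimal sketch, using the standard-library erfc and the standard-normal tail identity $\mathsf{P}(\theta \le x-1) = \tfrac12\operatorname{erfc}\bigl((1-x)/\sqrt2\bigr)$:

```python
import math

def p_x(x):
    """p(x, 1) = P({theta : theta <= x - 1}) under the standard Gaussian P,
    i.e. the Gaussian CDF evaluated at x - 1, expressed through erfc."""
    return 0.5 * math.erfc((1.0 - x) / math.sqrt(2.0))

# p(x, 1) is strictly positive for every fixed x ...
values = [p_x(x) for x in (0.0, -5.0, -10.0, -20.0)]
# ... but decays to 0 as x -> -infinity, so p*(1) = inf_x p(x, 1) = 0.
```

Each individual tail probability is positive, yet the infimum over the noncompact decision set is zero, which is exactly the obstruction of Theorem 2.3.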
Finite sample performance guarantees in the nonconvex setting

In this section, for a large class of nonconvex minmax problems, we prove a general positive result that gives an upper bound on the a priori probability of the bad set (1.8) for finite samples of the scenario approach. In other words, we establish a finite sample performance guarantee in a general nonconvex setting. Of course, in the presence of more detailed structure, we may be able to refine these preliminary estimates, and as an illustration of this scheme we then discuss several special cases of this result.

§ 3.1. General performance guarantees. The first order of business is making the word "nonconvexity" precise. The class of nonconvex functions is vast, and it appears that very little can be said about a priori estimates under the scenario method at this level of generality; indeed, it is natural to expect, at least in principle, that the greater the regularity of the functions under consideration, the tighter the bounds that should be obtainable. Physical considerations point us towards focussing our investigations on classes of functions that arise naturally in physical systems, e.g., trigonometric polynomials of finite bandwidth, smooth functions restricted to compact sets, etc. Our approach here follows standard principles of functional analysis and approximation theory via estimates involving covering numbers à la [CZ07]; the techniques exposed here are fairly general, and conform to the following simple steps:

Summary of our approach
(I) We find upper bounds on the covering number of the family of functions $K_f := \{f(x, \cdot) : \Theta \to \mathbb{R} \mid x \in X\}$ in the supremum norm topology defined below. This step provides us with a finite collection of representatives from the (possibly infinite dimensional) class of functions under consideration.
(II) The i.i.d property of the sampling in the scenario approach permits us to employ the bounds on the covering number found in the preceding step in standard probabilistic inequalities to arrive at bounds on the probability $\beta(m, \epsilon)$.

§ 3.1.1. Background. The class of nonconvex functions is vast, and we consider only a few reasonable classes of finite and infinite dimensional subsets of this class in the article at hand. The primary difficulty with infinite dimensionality of function classes is overcome in a standard way by the consideration of covering numbers. Recall that given a metric space $(M, d)$ and a subset $K \subset M$, a set $K' \subset K$ is called an $\epsilon$-cover of $K$ if for every element $a \in K$ there exists $a' \in K'$ such that $d(a, a') \le \epsilon$. We define the covering number $N(K, \epsilon)$ of $K$ to be the smallest number $n \in \mathbb{N}$ such that there exists an $\epsilon$-cover of $K$ of cardinality $n$. It is a standard result that $K \subset M$ is precompact if and only if the covering number $N(K, \epsilon)$ is finite for every $\epsilon > 0$. Recall also that if $\Theta$ is compact, the set $C(\Theta)$ of continuous real-valued functions on $\Theta$ is a metric space when endowed with the metric $d(g_1, g_2) := \|g_1 - g_2\|_u$ inherited from the supremum norm $\|g\|_u := \sup_{\theta \in \Theta} |g(\theta)|$.

§ 3.1.2. Main result. The following theorem is the key result of this section. Given the problem (1.1) and its associated notation, let $K_f$ denote the family of functions
$$K_f := \bigl\{f(x, \cdot) : \Theta \to \mathbb{R} \,\big|\, x \in X\bigr\}. \tag{3.1}$$

Theorem 3.1. Consider the problem (1.1) along with its associated data, and suppose that Assumption 1.3 holds. Let $K_f$ be as defined in (3.1), and recall from (1.9) that $\beta(m, \epsilon) = \mathsf{P}^m(B(m, \epsilon))$. If $\Theta$ is compact and the set of functions $K_f \subset C(\Theta)$ is precompact in $C(\Theta)$, then
$$\beta(m, \epsilon) \le N\bigl(K_f, \tfrac{\epsilon}{4}\bigr) \exp\Bigl(-m\, p^*\bigl(\tfrac{\epsilon}{4}\bigr)\Bigr). \tag{3.2}$$

Proof. Fix $\epsilon > 0$ and $m \in \mathbb{N}^*$.
By definition of $N(K_f, \tfrac{\epsilon}{4})$, there exists a subset $(x_i)_{i=1}^{N(K_f, \epsilon/4)}$ of $X$ such that for each $x \in X$ there exists $x_i$ satisfying $\sup_{\theta \in \Theta} |f(x, \theta) - f(x_i, \theta)| < \tfrac{\epsilon}{4}$. Recalling the definition of $\bar B(m, \epsilon)$ from (1.8), we see that
$$\bar B\bigl(m, \tfrac{\epsilon}{2}\bigr) = \bigcup_{x \in X} \bigcap_{i=1}^{m} \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \,\big|\, f(x, \theta_i) \le y^* - \tfrac{\epsilon}{2}\bigr\} \subset \bigcup_{j=1}^{N(K_f, \epsilon/4)} \bigcap_{i=1}^{m} \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \,\big|\, f(x_j, \theta_i) \le y^* - \tfrac{\epsilon}{4}\bigr\}.$$
This means, in view of the definition of $\beta(m, \epsilon)$ in (1.9),
$$\begin{aligned}
\beta(m, \epsilon) = \mathsf{P}^m(B(m, \epsilon)) &\le \mathsf{P}^m\bigl(\bar B\bigl(m, \tfrac{\epsilon}{2}\bigr)\bigr) \\
&\overset{(\dagger)}{\le} \sum_{j=1}^{N(K_f, \epsilon/4)} \mathsf{P}^m\Bigl(\bigcap_{i=1}^{m} \bigl\{(\theta_i)_{i=1}^m \in \Theta^m \,\big|\, f(x_j, \theta_i) \le y^* - \tfrac{\epsilon}{4}\bigr\}\Bigr) \\
&\le \sum_{j=1}^{N(K_f, \epsilon/4)} \mathsf{P}\bigl(\bigl\{\theta \in \Theta \,\big|\, f(x_j, \theta) \le y^* - \tfrac{\epsilon}{4}\bigr\}\bigr)^m \\
&\le \sum_{j=1}^{N(K_f, \epsilon/4)} \Bigl(1 - \mathsf{P}\bigl(\bigl\{\theta \in \Theta \,\big|\, f(x_j, \theta) > y^* - \tfrac{\epsilon}{4}\bigr\}\bigr)\Bigr)^m \\
&\overset{(\ddagger)}{\le} \sum_{j=1}^{N(K_f, \epsilon/4)} \exp\bigl(-m\, p\bigl(x_j, \tfrac{\epsilon}{4}\bigr)\bigr) \le N\bigl(K_f, \tfrac{\epsilon}{4}\bigr) \exp\Bigl(-m\, p^*\bigl(\tfrac{\epsilon}{4}\bigr)\Bigr),
\end{aligned}$$
as asserted, where we have employed the standard union bound in step (†) and the inequality $1 - z \le e^{-z}$ for $z \in [0, 1]$ in step (‡) above. ∎

Remark 3.2. In the absence of any further structure in the various sets that appear in the definition of $B(m, \epsilon)$ in the proof of Theorem 3.1, it appears that the standard union bound employed in step (†) above is a reasonable option. However, in certain specific cases it may be possible to refine this particular step further to arrive at tighter bounds.

Remark 3.3. Theorem 3.1 can be rewritten in the following way. Suppose we are given a desired accuracy level $\epsilon$ and confidence level $\beta$. We define
$$m(\epsilon, \beta) := \frac{1}{p^*(\epsilon/4)} \Bigl(\ln\bigl(\tfrac{1}{\beta}\bigr) + \ln N\bigl(K_f, \tfrac{\epsilon}{4}\bigr)\Bigr). \tag{3.3}$$
If the number $m$ of i.i.d samples (scenarios) drawn from $\Theta$ under $\mathsf{P}$ is at least $m(\epsilon, \beta)$, then the probability of occurrence of the bad set $B(m, \epsilon)$ is guaranteed to be at most $\beta$.
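The sample-size formula (3.3) of Remark 3.3 is elementary to evaluate once the two problem-dependent quantities are known; the covering number and $p^*(\epsilon/4)$ below are hypothetical placeholder values used purely for illustration.

```python
import math

def scenario_sample_size(covering_number, p_star, beta):
    """m(eps, beta) = (1 / p*(eps/4)) * (ln(1/beta) + ln N(K_f, eps/4)),
    rounded up to an integer number of scenarios; cf. (3.3)."""
    return math.ceil((math.log(1.0 / beta) + math.log(covering_number)) / p_star)

# Hypothetical values, for illustration only:
m = scenario_sample_size(covering_number=10**6, p_star=0.01, beta=1e-6)
# Both logarithms equal 6 ln 10, about 13.8, so m = ceil(27.63 / 0.01) = 2764.
```

Note that the covering number enters only logarithmically, while $p^*(\epsilon/4)$ enters as a reciprocal; this is the quantitative content of Remark 3.4 below.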
Let us compare this estimate with that of Theorem 1.6. We begin by noting that $N(\epsilon, \beta)$ defined in (1.16) can be rewritten explicitly as (see [CGP09, Theorem 1])
$$N(\epsilon, \beta) \ge \frac{2}{\epsilon}\Bigl(\ln\bigl(\tfrac{1}{\beta}\bigr) + d\Bigr). \tag{3.4}$$
With this in mind, we can rewrite the result of Theorem 1.6 in the language of its statement as follows: if the number $m$ of i.i.d samples drawn from $\Theta$ under $\mathsf{P}$ is at least $N(\epsilon, \beta)$, then the probability of occurrence of the bad set $B(m, h(\epsilon))$ is guaranteed to be at most $\beta$. Observe that $N(\epsilon, \beta)$ samples guarantee an accuracy of $h(\epsilon)$ as opposed to $\epsilon$. To remove this dependency on $h(\epsilon)$ and to obtain an explicit result, recall the definitions in (1.13), (1.14) and (1.15). If $h(\cdot)$ had a well-defined inverse $h^{-1}(\cdot)$, then sampling $N(h^{-1}(\epsilon), \beta)$ elements would guarantee an accuracy of $\epsilon$ for each $\epsilon$. However, such an inverse function may not always exist. Nevertheless, the right-hand side of (1.15) in the definition of $h(\cdot)$ is a pseudo-inverse of the function $p_w^*$. Indeed, if $p_w^*$ is monotonically increasing, then $\epsilon \mapsto \sup\bigl\{\delta > 0 \,\big|\, \inf_{x \in X} p_w(x, \delta) \le \epsilon\bigr\}$ is its inverse in the usual sense. Due to this fact, if we employ $p_w^*$ as the inverse of $h(\cdot)$, define
$$m_c(\epsilon, \beta) := \frac{2}{p_w^*(\epsilon)}\Bigl(\ln\bigl(\tfrac{1}{\beta}\bigr) + d\Bigr), \tag{3.5}$$
and draw $m_c(\epsilon, \beta)$ i.i.d. samples from $\Theta$, then Theorem 1.6 guarantees that the probability of occurrence of the bad set $B(m, \epsilon)$ is at most $\beta$.

Remark 3.4. Theorem 2.3, along with the estimate of Theorem 3.1 in the form given in (3.3), points to the fundamental nature of the function $p^*(\cdot)$. On the one hand, Theorem 2.3 says that if $p^*(\epsilon)$ is equal to zero, then the scenario approach is not consistent, and even as the number of i.i.d samples drawn approaches infinity, the scenario approximation remains at least $\epsilon$ away from the true minimum.
On the other hand, according to Theorem 3.1, even if $p^*(\epsilon)$ is nonzero, the number of samples that need to be drawn to guarantee an approximation of accuracy $\epsilon$ grows increasingly large as $p^*(\epsilon)$ goes to zero. This is reminiscent of the condition number of a matrix in linear algebra; recall that a square matrix is singular only if its condition number is infinite. However, even if the condition number is finite, it becomes increasingly harder to numerically compute the inverse of a matrix as its condition number increases, to the extent that a matrix with a very large condition number is practically singular from a numerical standpoint. In this sense, $p^*(\epsilon)$ is a measure of how well behaved the scenario approximations of a robust optimization problem are: as $p^*(\epsilon)$ decreases, the finite sample behaviour of the scenario approximations also deteriorates, and finally, when $p^*(\epsilon)$ becomes zero, the performance deteriorates so much that even consistency is lost.

§ 3.2. Scenario bounds for bandlimited trigonometric functions. Here we employ Theorem 3.1 of the previous section to derive bounds on the probability of the "bad set" $B(m, \epsilon)$ in the situation where $\Theta$ is an $n$-dimensional hypercube and the set of functions $K_f = \{f(x, \cdot) : \Theta \to \mathbb{R}\}_{x \in X} \subset C(\Theta)$ is a bounded subset of the linear subspace of trigonometric polynomials of bandwidth $M$. More precisely, our premise for this subsection is the following:

Assumption 3.5. In the context of the problem (1.1) and its associated data, we stipulate that:
(i) $\Theta = [-1, 1]^n \subset \mathbb{R}^n$;
(ii) for each $x \in X$, the function $f(x, \cdot) : \Theta \to \mathbb{R}$ is a trigonometric polynomial of bandwidth $M$; in other words,
$$f(x, \theta) = \sum_{k \in [-M, M]^n \cap \mathbb{Z}^n} b_k(x) \sin(2\pi \langle k, \theta \rangle) + c_k(x) \cos(2\pi \langle k, \theta \rangle);$$
(iii) $\sup_{x \in X} \|f(x, \cdot)\|^2 \le B < +\infty$.
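A member of the class described in Assumption 3.5 is cheap to materialize. The sketch below, with $n = 1$, $M = 2$, and arbitrary illustrative coefficients of our own choosing, evaluates such a polynomial and checks its squared $L^2$ norm over $\Theta = [-1, 1]$ on a uniform grid, which is exact for trigonometric polynomials.

```python
import math

def trig_poly(b, c, theta):
    """f(theta) = sum_{k=-M..M} b_k sin(2 pi k theta) + c_k cos(2 pi k theta);
    the coefficient for index k is stored at position k + M."""
    M = (len(b) - 1) // 2
    return sum(b[k + M] * math.sin(2 * math.pi * k * theta)
               + c[k + M] * math.cos(2 * math.pi * k * theta)
               for k in range(-M, M + 1))

# Arbitrary illustrative coefficients, nonzero only for k >= 0 (indices k = -2..2):
b = [0.0, 0.0, 0.0, -0.2, 0.1]
c = [0.0, 0.0, 0.5, 0.4, 0.0]
# Parseval over [-1, 1]: ||f||_2^2 = 2 c_0^2 + sum_{k>=1} (b_k^2 + c_k^2) = 0.71.
N = 512
l2_sq = sum(trig_poly(b, c, 2 * k / N - 1) ** 2 for k in range(N)) * (2 / N)
```

The squared norm computed on the grid matches the Parseval value, so this particular $f$ satisfies item (iii) with any $B \ge 0.71$.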
The set of trigonometric polynomials of bandwidth $M$ is a $(2M+1)^n$-dimensional subspace of $C(\Theta)$, and therefore any bounded subset of it is precompact. Consequently, Theorem 3.1 applies to this situation. In the following lemma, whose proof is deferred to Appendix B, we provide estimates of the covering number $N(K_f, \epsilon)$.

Lemma 3.6. Suppose that Assumption 3.5 holds. Let $K_f$ be as defined in (3.1) and define $D := (2M+1)^n$. Then
$$N(K_f, \epsilon) \le \frac{1}{D}\Bigl(\frac{\pi D^2}{2}\Bigr)^D \Bigl(\frac{\epsilon}{2B}\Bigr)^{-2D}.$$

Lemma 3.6 in conjunction with Theorem 3.1 gives us the following bound on the probability of the "bad set" $B(m, \epsilon)$ defined in (1.8).

Theorem 3.7. Consider the problem (1.1) along with its associated data and suppose that Assumption 1.3 holds. In addition, suppose that the family of functions $K_f$ defined in (3.1) satisfies Assumption 3.5, and define $D := (2M+1)^n$. Then, for $\beta(m, \epsilon)$ as defined in (1.9), we have
$$\beta(m, \epsilon) \le \frac{1}{D}\Bigl(\frac{\pi D^2}{2}\Bigr)^D \Bigl(\frac{\epsilon}{8B}\Bigr)^{-2D} \exp\Bigl(-m\, p^*\bigl(\tfrac{\epsilon}{4}\bigr)\Bigr).$$

Proof. The assertion follows readily after substituting the estimate for the covering number $N(K_f, \tfrac{\epsilon}{4})$ given in Lemma 3.6 into Theorem 3.1. ∎

Remark 3.8. As mentioned in Remark 3.3, we can rewrite the result of Theorem 3.7 in terms of the number of samples required to achieve a desired level of accuracy and confidence. As before, suppose we are given an accuracy level $\epsilon$ and a confidence level $\beta$. If we draw
$$m(\epsilon, \beta) := \frac{1}{p^*(\epsilon/4)}\Bigl(\ln\bigl(\tfrac{1}{\beta}\bigr) + 2D \ln\bigl(\tfrac{1}{\epsilon}\bigr) + (2D - 1)\ln D + D \ln\bigl(32\pi B^2\bigr)\Bigr) \tag{3.6}$$
i.i.d samples from $\Theta$ under $\mathsf{P}$, then the probability of occurrence of the bad set $B(m, \epsilon)$ is guaranteed to be less than $\beta$.

§ 3.3. Scenario bounds for smooth functions on the $n$-torus.
We apply Theorem 3.1 to the problem of determining bounds on the probability of the "bad set" $B(m, \epsilon)$ when the uncertainty set is an $n$-dimensional torus $\mathbb{T}^n$ and the cost function $f$ is smooth with respect to the uncertain parameters. Formally, we posit:

Assumption 3.9. In the context of the problem (1.1) and its associated data, we stipulate that:
(i) $\Theta = \mathbb{T}^n$;
(ii) there exists an integer $p > \tfrac{n}{2}$ such that for each $x \in X$, the function $f(x, \cdot) : \Theta \to \mathbb{R}$ is $p$-times continuously differentiable on $\Theta$;
(iii) $\sup_{x \in X} \|f(x, \cdot)\|^2 \le B < +\infty$;
(iv) $\sup_{x \in X} \sum_{i=1}^{n} \bigl\|\frac{\partial^p f(x, \cdot)}{\partial \theta_i^p}\bigr\|^2 \le B_d < +\infty$.

In the following lemma, whose proof is provided in Appendix B, we provide estimates of the covering number $N(K_f, \tfrac{\epsilon}{4})$ which show that under Assumption 3.9, Theorem 3.1 applies to $K_f$.

Lemma 3.10. Suppose that Assumption 3.9 holds. Let $K_f$ be as defined in (3.1) and define
$$D(\epsilon) := \biggl(2\Bigl\lceil \Bigl(\frac{12\sqrt{2 B_d}\, 2^n}{(2\pi)^p\, \epsilon}\Bigr)^{\frac{1}{p - n/2}} \Bigr\rceil + 1\biggr)^n.$$
Then
$$N\bigl(K_f, \tfrac{\epsilon}{4}\bigr) \le \frac{1}{D(\epsilon)}\Bigl(\frac{\pi D(\epsilon)^2}{2}\Bigr)^{D(\epsilon)} \Bigl(\frac{\epsilon}{24B}\Bigr)^{-2D(\epsilon)}.$$

Lemma 3.10 in conjunction with Theorem 3.1 gives us the following bound on the probability of the "bad set" $B(m, \epsilon)$.

Theorem 3.11. Consider the problem (1.1) along with its associated data and suppose that Assumption 1.3 holds. In addition, suppose that the family of functions $K_f$ defined in (3.1) satisfies Assumption 3.9. Then for $\beta(m, \epsilon)$ as defined in (1.9), we have
$$\beta(m, \epsilon) \le \frac{1}{D(\epsilon)}\Bigl(\frac{\pi D(\epsilon)^2}{2}\Bigr)^{D(\epsilon)} \Bigl(\frac{\epsilon}{24B}\Bigr)^{-2D(\epsilon)} \exp\Bigl(-m\, p^*\bigl(\tfrac{\epsilon}{4}\bigr)\Bigr), \tag{3.7}$$
where $D(\epsilon)$ is as defined in Lemma 3.10.

Proof. The result follows by substituting the estimate for the covering number $N(K_f, \tfrac{\epsilon}{4})$ given in Lemma 3.10 into Theorem 3.1. ∎

Remark 3.12. As mentioned in Remark 3.3, we can rewrite the result of Theorem 3.11 in terms of the number of samples required to achieve a desired level of accuracy and confidence.
As before, suppose we are given an accuracy level $\epsilon$ and a confidence level $\beta$. If we draw
$$m(\epsilon, \beta) := \frac{1}{p^*(\epsilon/4)}\Bigl(\ln\bigl(\tfrac{1}{\beta}\bigr) + 2D(\epsilon) \ln\bigl(\tfrac{1}{\epsilon}\bigr) + (2D(\epsilon) - 1)\ln D(\epsilon) + D(\epsilon) \ln\bigl(288\pi B^2\bigr)\Bigr)$$
i.i.d samples from $\Theta$ under $\mathsf{P}$, then the probability of occurrence of the bad set $B(m, \epsilon)$ is guaranteed to be less than $\beta$.

Appendix A. Proofs of Lemma 1.2 and Proposition 2.6

Proof of Lemma 1.2. We first show that $\operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta) \le \sup_{\theta \in \Theta_s} f(x, \theta)$. Indeed, since
$$\Theta_s \subset \Bigl\{\theta \in \Theta \,\Big|\, f(x, \theta) \le \sup_{\theta' \in \Theta_s} f(x, \theta')\Bigr\},$$
we have
$$\mathsf{P}\Bigl(\Bigl\{\theta \in \Theta \,\Big|\, f(x, \theta) \le \sup_{\theta' \in \Theta_s} f(x, \theta')\Bigr\}\Bigr) \ge \mathsf{P}(\Theta_s) = 1,$$
which implies that $\operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta) \le \sup_{\theta \in \Theta_s} f(x, \theta)$.

Now we demonstrate that $\operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta) + \delta \ge \sup_{\theta \in \Theta_s} f(x, \theta)$ for any $\delta > 0$. Indeed, by definition of the essential supremum,
$$\mathsf{P}\Bigl(\Bigl\{\theta \in \Theta \,\Big|\, f(x, \theta) \le \operatorname{ess\,sup}_{\theta' \in \Theta} f(x, \theta') + \delta\Bigr\}\Bigr) = 1.$$
Lower semicontinuity of $f$ shows that the set $\bigl\{\theta \in \Theta \,\big|\, f(x, \theta) \le \operatorname{ess\,sup}_{\theta' \in \Theta} f(x, \theta') + \delta\bigr\}$ is closed. Since $\Theta_s$ is the smallest closed set of probability 1,
$$\Theta_s \subset \Bigl\{\theta \in \Theta \,\Big|\, f(x, \theta) \le \operatorname{ess\,sup}_{\theta' \in \Theta} f(x, \theta') + \delta\Bigr\},$$
which in turn means that $\sup_{\theta \in \Theta_s} f(x, \theta) \le \operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta) + \delta$. Since the preceding statement is valid for every $\delta > 0$, we have $\operatorname{ess\,sup}_{\theta \in \Theta} f(x, \theta) \ge \sup_{\theta \in \Theta_s} f(x, \theta)$, yielding the assertion and completing the proof. ∎

Proof of Proposition 2.6. First, we show that for each fixed $x \in X$ the set $\{\theta \in \Theta \mid f(x, \theta) > y^* - \epsilon\} \subset \Theta$ is open and nonempty. Since $f$ is lower semicontinuous, this set is open because its complement is a sublevel set of a lower semicontinuous function with the variable $x$ held fixed. By definition of the supremum, there exists $\theta \in \Theta$ such that $f(x, \theta) > \hat f(x) - \epsilon$. Along with the fact that $\hat f(x) \ge y^*$ for all $x$, this implies that the set is also nonempty.
By Assumption 1.3 we see that the tail probability satisfies $p(x, \epsilon) = \mathsf{P}(\{\theta \in \Theta \mid f(x, \theta) > y^* - \epsilon\}) > 0$.

Second, we show that the tail probability $p(\cdot, \epsilon)$ is lower semicontinuous in $x$. To this end, fix $\epsilon > 0$ and observe that for each fixed $x \in X$,
$$p(x, \epsilon) = \mathsf{P}\bigl(f(x, \cdot) > y^* - \epsilon\bigr) = \mathsf{E}\bigl[\mathbb{1}_{\{f(x, \cdot) > y^* - \epsilon\}}\bigr].$$
Fix $x \in X$ and pick a sequence $(x_n)_{n=1}^{+\infty}$ in $X$ converging to $x$, i.e., $\lim_{n \to +\infty} x_n = x$. We claim that
$$\liminf_{n \to +\infty} \mathbb{1}_{\{f(x_n, \cdot) > y^* - \epsilon\}} \ge \mathbb{1}_{\{f(x, \cdot) > y^* - \epsilon\}}.$$
Fix $\theta \in \Theta$. If $f(x, \theta) \le y^* - \epsilon$, then the preceding inequality is obvious, so let us assume that $f(x, \theta) > y^* - \epsilon$. By lower semicontinuity of $f$,
$$\liminf_{n \to +\infty} f(x_n, \theta) \ge f(x, \theta) > y^* - \epsilon,$$
which means that for all $n$ sufficiently large, $f(x_n, \theta) > y^* - \epsilon$, and this in turn means
$$\liminf_{n \to +\infty} \mathbb{1}_{\{f(x_n, \cdot) > y^* - \epsilon\}}(\theta) = 1 = \mathbb{1}_{\{f(x, \cdot) > y^* - \epsilon\}}(\theta).$$
It follows that
$$\mathsf{E}\Bigl[\liminf_{n \to +\infty} \mathbb{1}_{\{f(x_n, \cdot) > y^* - \epsilon\}}\Bigr] \ge \mathsf{E}\bigl[\mathbb{1}_{\{f(x, \cdot) > y^* - \epsilon\}}\bigr].$$
By Fatou's lemma [DiB16, Lemma 8.1, Chapter IV] we have
$$\liminf_{n \to +\infty} \mathsf{E}\bigl[\mathbb{1}_{\{f(x_n, \cdot) > y^* - \epsilon\}}\bigr] \ge \mathsf{E}\Bigl[\liminf_{n \to +\infty} \mathbb{1}_{\{f(x_n, \cdot) > y^* - \epsilon\}}\Bigr],$$
and combining the inequalities we see that
$$\liminf_{n \to +\infty} \mathsf{E}\bigl[\mathbb{1}_{\{f(x_n, \cdot) > y^* - \epsilon\}}\bigr] \ge \mathsf{E}\bigl[\mathbb{1}_{\{f(x, \cdot) > y^* - \epsilon\}}\bigr],$$
thereby establishing that $\liminf_{n \to +\infty} p(x_n, \epsilon) \ge p(x, \epsilon)$. Since $(x_n)_{n=1}^{+\infty}$ and $x$ were arbitrary, lower semicontinuity of $p(\cdot, \epsilon)$ follows for each fixed $\epsilon$. This completes the proof of the first claim.

Finally, since $p(\cdot, \epsilon)$ is lower semicontinuous for each fixed $\epsilon$, it attains its minimum on any compact subset of $X$ by Weierstrass' theorem [DiB16, Theorem 7.1, Chapter II], which proves the second statement of the proposition. ∎

Appendix B. Proofs of Lemma 3.6 and Lemma 3.10

Proof of Lemma 3.6. We estimate the covering number $N(K_f, \epsilon)$ under the conditions of Assumption 3.5.
To start, we define the set of trigonometric polynomials of bandwidth $M$:
$$P_M := \Bigl\{p : \Theta \to \mathbb{R} \,\Big|\, p(\theta) = \sum_{k \in [-M, M]^n \cap \mathbb{Z}^n} a_k \sin(2\pi \langle k, \theta \rangle) + b_k \cos(2\pi \langle k, \theta \rangle)\Bigr\},$$
and define the $L^2$-ball of radius $B$ in $P_M$ by
$$P^B_M := \bigl\{p \in P_M \,\big|\, \|p\|_2 \le B\bigr\},$$
where the $L^2$-norm is defined in the standard way. The following technical lemma is needed to prove our estimate of the covering number $N(K_f, \epsilon)$:

Lemma B.1. Let $(M, \rho)$ be a metric space and let $U \subset M$ be a subset. Then, for each $\epsilon > 0$,
$$N(U, \epsilon) \le N\bigl(M, \tfrac{\epsilon}{2}\bigr).$$

Proof. Pick $U \subset M$ and let $(a_i)_{i=1}^{N(M, \epsilon/2)} \subset M$ be an $\tfrac{\epsilon}{2}$-cover of $M$. After reordering these points if necessary, suppose that for $i = 1, \ldots, p \le N(M, \tfrac{\epsilon}{2})$ there exists $b_i \in U$ such that $\rho(a_i, b_i) < \tfrac{\epsilon}{2}$, and that for each $i > p$ there exists no $b \in U$ such that $\rho(a_i, b) < \tfrac{\epsilon}{2}$. We claim that $(b_i)_{i=1}^{p} \subset U$ is an $\epsilon$-cover of $U$. Take an arbitrary element $b \in U$. By definition of a cover, there exists an $a_i$ with $i \le p$ such that $\rho(a_i, b) < \tfrac{\epsilon}{2}$. We also know that $\rho(a_i, b_i) \le \tfrac{\epsilon}{2}$. Then, by the triangle inequality,
$$\rho(b, b_i) \le \rho(b, a_i) + \rho(a_i, b_i) \le \epsilon.$$
Therefore, $(b_i)_{i=1}^{p} \subset U$ is an $\epsilon$-cover of $U$, and consequently $N(U, \epsilon) \le p \le N(M, \tfrac{\epsilon}{2})$. ∎

Continuing with the proof of Lemma 3.6, we see that the conditions of Assumption 3.5 are equivalent to saying that $K_f \subset P^B_M$, and we know [BG04, pg. 787] that
$$N\bigl(P^B_M, \tfrac{\epsilon}{2}\bigr) \le \frac{1}{D}\Bigl(\frac{\pi D^2}{2}\Bigr)^D \Bigl(\frac{\epsilon}{2B}\Bigr)^{-2D}.$$
These observations in conjunction with Lemma B.1 prove that
$$N(K_f, \epsilon) \le \frac{1}{D}\Bigl(\frac{\pi D^2}{2}\Bigr)^D \Bigl(\frac{\epsilon}{2B}\Bigr)^{-2D},$$
as asserted. ∎

The rest of this section is devoted to proving Lemma 3.10, which provides the key estimate of the covering number $N(K_f, \epsilon)$ when $K_f$, the family of functions defined in (3.1), satisfies Assumption 3.9.
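The covering constructions above can be made concrete on a toy space. The sketch below (entirely illustrative) builds a greedy $\epsilon$-cover of a random finite subset $U$ of $M = [0, 1]$ with the Euclidean metric, and checks the count against Lemma B.1: since ten intervals of radius $0.05$ cover $[0, 1]$, we have $N(U, 0.1) \le N(M, 0.05) = 10$.

```python
import random

def greedy_cover(points, eps):
    """Greedy epsilon-cover of a finite subset of the real line: scan in sorted
    order and open a new center whenever no existing center is within eps."""
    centers = []
    for p in sorted(points):
        if all(abs(p - q) > eps for q in centers):
            centers.append(p)
    return centers

random.seed(1)
U = [random.random() for _ in range(500)]   # a finite subset of M = [0, 1]
cover = greedy_cover(U, 0.1)
# Every point of U ends up within 0.1 of a center, and Lemma B.1 caps the count
# at N(M, 0.05) = 10, since the chosen centers are pairwise more than 0.1 apart.
```

The greedy centers are pairwise more than $\epsilon$ apart, so their number is also bounded below by the packing structure of $U$; this is the standard link between covering and packing numbers.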
We will need some preliminaries on Fourier analysis on tori in order to determine these estimates. We realize $\mathbb{T}^1$ as the quotient $\mathbb{R}/\mathbb{Z}$; this way we get on $\mathbb{T}^n$ a measure, henceforth denoted by $\mu$, induced by the Lebesgue measure on $\mathbb{R}$. If $L$ is the $\mathbb{R}$-vector space of measurable functions $g : \mathbb{T}^n \to \mathbb{R}$ such that $\int_{\mathbb{T}^n} |g(x)|^2 \, \mathrm{d}\mu < +\infty$, then by identifying functions in $L$ that differ on sets of $\mu$-measure 0, we get the space $L^2(\mathbb{T}^n)$ of square-integrable $\mathbb{R}$-valued functions on $\mathbb{T}^n$. It is a standard fact that $L^2(\mathbb{T}^n)$ is a Hilbert space when equipped with the inner product
$$L^2(\mathbb{T}^n) \times L^2(\mathbb{T}^n) \ni (g_1, g_2) \mapsto \langle g_1, g_2 \rangle := \int_{\mathbb{T}^n} g_1(x) g_2(x) \, \mathrm{d}\mu,$$
and the corresponding induced $L^2$-norm is $\|g\|_{L^2(\mathbb{T}^n)} := \sqrt{\langle g, g \rangle}$. For a positive integer $p$ we denote the set of $p$-times continuously differentiable functions on the torus $\mathbb{T}^n$ by $C^p(\mathbb{T}^n)$. For $\xi = (\xi_1, \ldots, \xi_n) \in \mathbb{Z}^n$ the $\xi$-th Fourier coefficient of $g \in L^2(\mathbb{T}^n)$ is defined by
$$\widehat{g}(\xi) := \int_{\mathbb{T}^n} g(x) e^{-2\pi i \langle \xi, x \rangle} \, \mathrm{d}\mu, \tag{B.1}$$
which permits us to represent $g$ via its Fourier series
$$g^\sharp(x) := \sum_{\xi \in \mathbb{Z}^n} \widehat{g}(\xi) e^{2\pi i \langle \xi, x \rangle},$$
where the convergence of the aforementioned sum is understood in the $L^2$-norm sense. It is well known [DM72] that when $g \in C^1(\mathbb{T}^n)$, its Fourier series converges pointwise and $g = g^\sharp$. Moreover, the following Plancherel identity is valid for all $g \in L^2(\mathbb{T}^n)$:
$$\|g\|^2_{L^2(\mathbb{T}^n)} = \sum_{\xi \in \mathbb{Z}^n} |\widehat{g}(\xi)|^2. \tag{B.2}$$
If $g \in L^2(\mathbb{T}^n) \cap C^1(\mathbb{T}^n)$, then the Fourier coefficients of $g$ are related to those of its partial derivative $\frac{\partial g}{\partial \theta_j}$ along the $j$-th direction by the formula
$$\widehat{\frac{\partial g}{\partial \theta_j}}(\xi) = 2\pi i \xi_j \, \widehat{g}(\xi). \tag{B.3}$$
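The objects in (B.1) and (B.2) are easy to check numerically on $\mathbb{T}^1$. A minimal sketch: Fourier coefficients are approximated by a uniform Riemann sum (which is exact, up to floating-point error, for trigonometric polynomials of small bandwidth), and the Plancherel identity is verified for $g(x) = \cos(2\pi \cdot 3x)$, whose only nonzero coefficients are $\widehat g(\pm 3) = \tfrac12$.

```python
import cmath
import math

def fourier_coeff(g, xi, n_grid=4096):
    """Riemann-sum approximation of g_hat(xi) = int_{T^1} g(x) e^{-2 pi i xi x} dmu,
    exact (up to float error) for trigonometric polynomials of small bandwidth."""
    return sum(g(k / n_grid) * cmath.exp(-2j * math.pi * xi * k / n_grid)
               for k in range(n_grid)) / n_grid

g = lambda x: math.cos(2 * math.pi * 3 * x)
coeffs = {xi: fourier_coeff(g, xi) for xi in range(-5, 6)}
# Plancherel (B.2): sum over xi of |g_hat(xi)|^2 equals ||g||^2 = 1/2 for this g.
plancherel = sum(abs(v) ** 2 for v in coeffs.values())
```

The same computation with $\xi$ restricted to a box $[-N, N]^n$ is precisely the truncation used in Lemma B.3 below, so this sketch doubles as a sanity check for that construction.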
If $p$ is a positive integer and $g \in L^2(\mathbb{T}^n) \cap C^p(\mathbb{T}^n)$, then applying (B.3) repeatedly $p$ times we get
$$\Bigl|\widehat{\frac{\partial^p g}{\partial \theta_j^p}}(\xi)\Bigr| = |2\pi \xi_j|^p \, |\widehat{g}(\xi)|. \tag{B.4}$$

We will also make use of the following result regarding covering numbers in our discussion in this section.

Lemma B.2. Let $(M, \rho)$ be a metric space and let $\varphi : M \to M$ be a map satisfying $\rho(a, \varphi(a)) \le \epsilon$ for all $a \in M$. If $U \subset M$ is a subset, then $N(U, 3\epsilon) \le N(\varphi(U), \epsilon)$.

Proof. Pick $U \subset M$ and let $(\varphi(a_i))_{i=1}^{N(\varphi(U), \epsilon)} \subset \varphi(U)$ be an $\epsilon$-cover of $\varphi(U)$. We demonstrate that $(a_i)_{i=1}^{N(\varphi(U), \epsilon)} \subset U$ is a $3\epsilon$-cover of $U$. To see this, let $a$ be an arbitrary element of $U$. Then for some $i$ we have
$$\rho(a, a_i) \le \rho(a, \varphi(a)) + \rho(\varphi(a), \varphi(a_i)) + \rho(\varphi(a_i), a_i) \le 3\epsilon.$$
The assertion follows. ∎

Recall that for $\xi \in \mathbb{Z}^n$ the infinity norm is defined as $\|\xi\|_\infty := \max_{i=1,\ldots,n} |\xi_i|$. For a multiindex $\xi$, we define $\xi_\infty$ to be any element of $\arg\max_{i=1,\ldots,n} |\xi_i|$. It is a standard fact that for a given $m \in \mathbb{N}^*$, the number of $\xi \in \mathbb{Z}^n$ with $\|\xi\|_\infty = m$ is given by
$$c(m) := (2m+1)^n - (2m-1)^n = \sum_{j=0}^{n} \binom{n}{j} 2^{n-j} m^{n-j} \bigl(1 - (-1)^j\bigr) = \sum_{j=0}^{\lceil n/2 \rceil} \binom{n}{2j+1} 2^{n-2j} m^{n-2j-1} \le 4^n m^{n-1}. \tag{B.5}$$
Recalling the definitions of $p$ and $n$ from Assumption 3.9, we will have occasion to employ the fact that
$$\sum_{m > N} \frac{1}{m^{2p-n+1}} \le \frac{2}{N^{2p-n}}. \tag{B.6}$$
To see why this is true, observe that
$$\sum_{m > N} \frac{N^{2p-n}}{m^{2p-n+1}} \le \frac{1}{N} \sum_{m \ge 0} \frac{N^{2p-n+1}}{(N+m)^{2p-n+1}} = \frac{1}{N} \sum_{m \ge 0} \frac{1}{\bigl(1 + \frac{m}{N}\bigr)^{2p-n+1}}. \tag{B.7}$$
The right-hand side of (B.7) is the upper Darboux sum of the function $x \mapsto \bigl(\frac{1}{1+x}\bigr)^{2p-n+1}$, and hence decreases with increasing $N$; in particular, if $N = 1$, it is equal to $\sum_{m \ge 1} \frac{1}{m^{2p-n+1}}$.
Consequently,
$$\sum_{m > N} \frac{N^{2p-n}}{m^{2p-n+1}} \le \sum_{m \ge 1} \frac{1}{m^{2p-n+1}} \overset{(\dagger)}{\le} \sum_{m \ge 1} \frac{1}{m^2} \le 2, \tag{B.8}$$
where the inequality in step (†) follows from the assumption that $2p - n \ge 1$. (B.6) now follows directly from (B.8).

We begin our proof of Lemma 3.10 with the following lemma.

Lemma B.3. Consider the problem (1.1), and suppose that the set of functions $K_f$ defined in (3.1) and the uncertainty set $\Theta$ satisfy Assumption 3.9. Then, for every $\epsilon > 0$, if $N \in \mathbb{Z}$ is picked such that
$$N \ge \Bigl(\frac{\sqrt{2 B_d}\, 2^n}{(2\pi)^p\, \epsilon}\Bigr)^{\frac{1}{p - n/2}},$$
then
$$\sup_{x \in X} \Bigl\|f(x, \cdot) - \sum_{\xi \in [-N, N]^n \cap \mathbb{Z}^n} \widehat{f(x, \cdot)}(\xi)\, e^{2\pi i \langle \xi, \cdot \rangle}\Bigr\|_u \le \epsilon.$$

Proof. For any $x \in X$ and $\theta \in \Theta$,
$$\Bigl|f(x, \theta) - \sum_{\xi \in [-N, N]^n \cap \mathbb{Z}^n} \widehat{f(x, \cdot)}(\xi)\, e^{2\pi i \langle \xi, \theta \rangle}\Bigr| = \Bigl|\sum_{\|\xi\|_\infty > N} \widehat{f(x, \cdot)}(\xi)\, e^{2\pi i \langle \xi, \theta \rangle}\Bigr| \le \sum_{\|\xi\|_\infty > N} \bigl|\widehat{f(x, \cdot)}(\xi)\bigr| = \sum_{\|\xi\|_\infty > N} \Bigl|\widehat{\frac{\partial^p f(x, \cdot)}{\partial \theta_{\xi_\infty}^p}}(\xi)\Bigr| \cdot \frac{1}{(2\pi \|\xi\|_\infty)^p},$$
where the last equality follows from (B.4). From the Schwarz inequality and Assumption 3.9 we get the estimate
$$\begin{aligned}
\sum_{\|\xi\|_\infty > N} \Bigl|\widehat{\frac{\partial^p f(x, \cdot)}{\partial \theta_{\xi_\infty}^p}}(\xi)\Bigr| \cdot \frac{1}{(2\pi \|\xi\|_\infty)^p}
&\le \sqrt{\sum_{\|\xi\|_\infty > N} \Bigl|\widehat{\frac{\partial^p f(x, \cdot)}{\partial \theta_{\xi_\infty}^p}}(\xi)\Bigr|^2} \cdot \sqrt{\sum_{\|\xi\|_\infty > N} \Bigl(\frac{1}{(2\pi \|\xi\|_\infty)^p}\Bigr)^2} \\
&\le \sqrt{\sum_{\|\xi\|_\infty > N} \sum_{1 \le j \le n} \Bigl|\widehat{\frac{\partial^p f(x, \cdot)}{\partial \theta_j^p}}(\xi)\Bigr|^2} \cdot \sqrt{\sum_{\|\xi\|_\infty > N} \Bigl(\frac{1}{(2\pi \|\xi\|_\infty)^p}\Bigr)^2} \\
&\le \sqrt{B_d} \cdot \sqrt{\sum_{\|\xi\|_\infty > N} \Bigl(\frac{1}{(2\pi \|\xi\|_\infty)^p}\Bigr)^2}.
\end{aligned} \tag{B.9}$$
To estimate the sum on the right-hand side of (B.9), we recall the definition of $c(m)$ in (B.5) and arrive at
$$\sqrt{\sum_{\|\xi\|_\infty > N} \Bigl(\frac{1}{\|\xi\|_\infty^p}\Bigr)^2} = \sqrt{\sum_{m > N} \frac{c(m)}{m^{2p}}} \le \sqrt{4^n \sum_{m > N} \frac{1}{m^{2p-n+1}}} \le \sqrt{\frac{4^n \cdot 2}{N^{2p-n}}}. \tag{B.10}$$
Putting all the above estimates together, we get
$$\sup_{x \in X} \Bigl\|f(x, \cdot) - \sum_{\xi \in [-N, N]^n \cap \mathbb{Z}^n} \widehat{f(x, \cdot)}(\xi)\, e^{2\pi i \langle \xi, \cdot \rangle}\Bigr\|_u \le \frac{\sqrt{2 B_d}\, 2^n}{(2\pi)^p} \cdot \frac{1}{N^{p - n/2}}. \tag{B.11}$$
Since $N \ge \bigl(\frac{\sqrt{2 B_d}\, 2^n}{(2\pi)^p \epsilon}\bigr)^{1/(p - n/2)}$ by hypothesis, (B.11) gives us
$$\sup_{x \in X} \Bigl\|f(x, \cdot) - \sum_{\xi \in [-N, N]^n \cap \mathbb{Z}^n} \widehat{f(x, \cdot)}(\xi)\, e^{2\pi i \langle \xi, \cdot \rangle}\Bigr\|_u \le \epsilon, \tag{B.12}$$
proving the assertion. ∎

We are finally ready for the proof of Lemma 3.10.

Proof of Lemma 3.10. Pick $\epsilon > 0$, and consider the family of functions
$$K_f^\epsilon := \Bigl\{\sum_{\xi \in [-N, N]^n \cap \mathbb{Z}^n} \widehat{f(x, \cdot)}(\xi)\, e^{2\pi i \langle \xi, \cdot \rangle} : \Theta \to \mathbb{R} \,\Big|\, x \in X\Bigr\},$$
where
$$N = \Bigl\lceil \Bigl(\frac{12 \sqrt{2 B_d}\, 2^n}{(2\pi)^p\, \epsilon}\Bigr)^{\frac{1}{p - n/2}} \Bigr\rceil,$$
with the various constants as defined in Lemma B.3 and the discussion preceding it. Lemma B.3 in conjunction with Lemma B.2 gives us
$$N\bigl(K_f, \tfrac{\epsilon}{4}\bigr) \le N\bigl(K_f^\epsilon, \tfrac{\epsilon}{12}\bigr). \tag{B.13}$$
Observe that by the Plancherel identity (B.2), for each $x \in X$ we have
$$\sum_{\xi \in [-N, N]^n \cap \mathbb{Z}^n} |\widehat{f(x, \cdot)}(\xi)|^2 \le \sum_{\xi \in \mathbb{Z}^n} |\widehat{f(x, \cdot)}(\xi)|^2 = \|f(x, \cdot)\|^2_{L^2(\mathbb{T}^n)} \le B.$$
This means that $K_f^\epsilon$ is a family of bandlimited trigonometric polynomials with bounded $L^2$-norm. Consequently, Lemma 3.6 applies to $K_f^\epsilon$, and we have the estimate
$$N\bigl(K_f, \tfrac{\epsilon}{4}\bigr) \le N\bigl(K_f^\epsilon, \tfrac{\epsilon}{12}\bigr) \le \frac{1}{D(\epsilon)}\Bigl(\frac{\pi D(\epsilon)^2}{2}\Bigr)^{D(\epsilon)} \Bigl(\frac{\epsilon}{24B}\Bigr)^{-2D(\epsilon)},$$
where $D(\epsilon) = \bigl(2\bigl\lceil\bigl(\frac{12\sqrt{2B_d}\,2^n}{(2\pi)^p \epsilon}\bigr)^{1/(p-n/2)}\bigr\rceil + 1\bigr)^n$. This proves the assertion, thereby completing our proof. ∎

References

[BG04] R. Bass and K. Gröchenig. Random sampling of multivariate trigonometric polynomials. SIAM Journal on Mathematical Analysis, 36(3):773–795, 2004.
[BTN98] A. Ben-Tal and A. Nemirovski.
Robust convex optimization. Mathematics of Operations Research, 23:769–805, 1998.
[BTN99] A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain linear programs. Operations Research Letters, 25:1–13, 1999.
[BTN01] A. Ben-Tal and A. Nemirovski. Robust solutions of uncertain conic and quadratic programs. Math. Program., pages 351–376, 2001.
[CC05] G. Calafiore and M. C. Campi. Uncertain convex programs: randomized solutions and confidence levels. Mathematical Programming, 102(1):25–46, 2005.
[CC06] G. Calafiore and M. C. Campi. The scenario approach to robust control design. IEEE Transactions on Automatic Control, 51(5):742–753, 2006.
[CG08] M. C. Campi and S. Garatti. Exact feasibility of randomized solutions of robust convex programs. SIAM Journal on Optimization, 19(3):1211–1230, 2008.
[CGP09] M. C. Campi, S. Garatti, and M. Prandini. The scenario approach for systems and control design. Annual Reviews in Control, 33(2):149–157, 2009.
[CGR18] M. C. Campi, S. Garatti, and F. Ramponi. A general scenario theory for nonconvex optimization and decision making. IEEE Transactions on Automatic Control, 63(12):4067–4078, 2018.
[CZ07] F. Cucker and D.-X. Zhou. Learning Theory: An Approximation Theoretic Viewpoint. Cambridge Monographs on Applied and Computational Mathematics. Cambridge University Press, 2007.
[DiB16] E. DiBenedetto. Real Analysis. Birkhäuser, Basel, 2nd edition, 2016.
[DM72] H. Dym and H. P. McKean. Fourier Series and Integrals. Springer-Verlag, New York, 1972.
[ESL15] P. M. Esfahani, T. Sutter, and J. Lygeros. Performance bounds for the scenario approach and an extension to a class of non-convex programs. IEEE Transactions on Automatic Control, 60(1):46–58, 2015.
[Par67] K. R. Parthasarathy. Probability Measures on Metric Spaces. Academic Press, 1967.
[Ram18] F. A. Ramponi. Consistency of the scenario approach.
SIAM Journal on Optimization, 28(1):135–162, 2018.
