
An Invariance Principle for Polytopes

PRAHLADH HARSHA, Tata Institute of Fundamental Research, Mumbai
ADAM KLIVANS, University of Texas, Austin
RAGHU MEKA, University of Texas, Austin

November 1, 2018

Abstract

Let X be randomly chosen from {-1,1}^n, and let Y be randomly chosen from the standard spherical Gaussian on R^n. For any (possibly unbounded) polytope P formed by the intersection of k halfspaces, we prove that

    | Pr[X ∈ P] − Pr[Y ∈ P] | ≤ log^{8/5} k · Δ,

where Δ is a parameter that is small for polytopes formed by the intersection of "regular" halfspaces (i.e., halfspaces with low influence). The novelty of our invariance principle is the polylogarithmic dependence on k. Previously, only bounds that were at least linear in k were known. The proof of the invariance principle is based on a generalization of the Lindeberg method for proving central limit theorems and could be of use elsewhere.

We give two important applications of our invariance principle, one from learning theory and the other from pseudorandomness:

1. A bound of log^{O(1)} k · ε^{1/6} on the Boolean noise sensitivity of intersections of k "regular" halfspaces (previous work gave bounds linear in k). This gives a corresponding agnostic learning algorithm for intersections of regular halfspaces.

2. A pseudorandom generator (PRG) for estimating the Gaussian volume of polytopes with k faces within error δ and seed-length O(log n · poly(log k, 1/δ)). We also obtain PRGs with similar parameters that fool polytopes formed by intersections of regular halfspaces over the hypercube. Using our PRG constructions, we obtain the first deterministic quasi-polynomial time algorithms for approximately counting the number of solutions to a broad class of integer programs, including dense covering problems and contingency tables.
1 Introduction: Invariance Principles in Theoretical Computer Science

An important theme in theoretical computer science over the last two decades has been the usefulness of translating a combinatorial problem over a discrete domain (e.g., {-1,1}^n) to a problem in continuous space. The notion of convex relaxation, for example, is now a standard technique in the design of algorithms for optimization problems. More recently, the study of analytic properties of Boolean functions (e.g., Fourier spectra and sensitivity) has been a fundamental tool for proving results in hardness of approximation [dW08, O'D08] and learning theory [Man94].

The influential work of Mossel, O'Donnell, and Oleszkiewicz [MOO05] proving the "Majority Is Stablest" conjecture has led to a rich collection of hardness results for constraint satisfaction problems, most notably for the Max-Cut problem. The crux of their work is an invariance principle relating the behavior of low-degree polynomials over the uniform measure on {-1,1}^n to their behavior with respect to Gaussians:

Theorem 1.1 (invariance principle for polynomials [MOO05]).¹ Let P be a multilinear polynomial such that ||P|| = 1. Then, for any t ∈ R,

    | Pr_{x ∈_u {-1,1}^n} [P(x) > t] − Pr_{x ← N^n} [P(x) > t] | ≤ τ.

Here N^n is the standard multivariate spherical Gaussian distribution on R^n; the parameter τ depends on the coefficients of P and is small if P is "regular" in the sense that the "influence" of each variable in P is small. Roughly speaking, the above invariance principle says that the cumulative distribution function (cdf) of a polynomial over {-1,1}^n is close to the cdf of the polynomial over N^n if the coefficients of the polynomial are sufficiently regular.
Additionally, the above invariance principle and its generalizations in [Mos08] have had a wealth of powerful applications in the following areas: hardness of approximation (see [DMR09, Aus07, AM09, Rag08, OW09, BK10]), hardness of learning (see [FGRW09]), social choice theory (see [Mos11]), testing (see [BO10]), graph products (see [DFR08]), and the analysis of Boolean functions (see [DHK+10]), among others. Invariance principles are now widely considered to be powerful tools in computational complexity theory. As such, it is important to continue to understand, quantitatively, how a function's cumulative distribution function changes when translating from one underlying distribution to another.

1.1 An Invariance Principle for Polytopes

The main result of this paper is an invariance principle for characteristic functions of polytopes. Recall that a polytope K is a (possibly unbounded) convex set in R^n formed by the intersection of some finite number of supporting halfspaces. We refer to K as a k-polytope if it is equal to the intersection of k halfspaces. To state our result we need the notion of regularity.

Definition 1.2 (regularity). A vector u ∈ R^n is ε-regular if Σ_i u_i^4 ≤ ε² ||u||₂². A matrix W ∈ R^{n×k} is ε-regular if every column of W is ε-regular. A polytope K = {x : W^T x ≤ θ} is ε-regular if W is ε-regular.²

We will require our polytopes to be sufficiently regular to apply our invariance principle. This is necessary even in the case of a single halfspace: the function 1·x₁ + 0·x₂ + ... + 0·x_n = x₁, where each x_i ∈ {-1,1}, will never converge to a Gaussian (this linear function is highly non-regular). Note that regularity does not depend on the threshold vector θ.
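As a concrete illustration (not part of the paper), Definition 1.2 is easy to check numerically. The sketch below, in Python with NumPy, contrasts a spread-out unit vector with the coordinate dictator x₁ mentioned above; the dimensions and the value ε = 0.2 are arbitrary choices for illustration.

```python
import numpy as np

def is_eps_regular(u, eps):
    """Definition 1.2: u is eps-regular if sum_i u_i^4 <= eps^2 * ||u||_2^2."""
    u = np.asarray(u, dtype=float)
    return bool(np.sum(u ** 4) <= eps ** 2 * np.sum(u ** 2))

def is_eps_regular_matrix(W, eps):
    """W (and hence the polytope K = {x : W^T x <= theta}) is eps-regular
    iff every column of W is; the threshold vector theta plays no role."""
    return all(is_eps_regular(W[:, j], eps) for j in range(W.shape[1]))

n = 100
u_spread = np.ones(n) / np.sqrt(n)   # sum_i u_i^4 = 1/n: regular once eps >= 1/sqrt(n)
u_dictator = np.zeros(n)
u_dictator[0] = 1.0                  # the function x_1 from the text: maximally non-regular
```

For unit vectors, as assumed for the columns of W later in the paper, the normalization of the right-hand side is immaterial.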
Our main theorem is as follows (see Theorem 3.1 for the exact statement):

¹ Similar invariance principles were also shown by Chatterjee [Cha05] and Rotar [Rot79].
² "Regular polytopes" have a different meaning in combinatorics, but for the purpose of this paper, we will abuse notation and say a polytope is ε-regular if it is formed by the intersection of ε-regular halfspaces as in Definition 1.2.

Theorem 1.3 (invariance principle for polytopes). For K an ε-regular k-polytope,

    | Pr_{x ∈_u {-1,1}^n} [x ∈ K] − Pr_{x ← N^n} [x ∈ K] | ≤ C log^{8/5} k · ε^{1/6}.

Our invariance principle also holds more generally for any product distribution that is hypercontractive and whose first four moments are appropriately bounded (often in this paper we focus on the special case of the uniform distribution on {-1,1}^n). The novelty of our theorem is the dependence of the error on k. Applying a recent result due to Mossel [Mos08], it is possible to obtain a statement similar to Theorem 1.3 with an error term that has a polynomial dependence on k. Achieving polylogarithmic dependence on k, however, is much harder, and we need to use some nontrivial results from the analysis of convex sets in Gaussian space. We remark that our result is optimal up to polylogarithmic factors: any invariance principle as above cannot have an error bound of o(ε · √(log k)) (see Section 5).

The case k = 1, a single halfspace, is equivalent to the classical Berry-Esséen theorem [Fel68], a fundamental theorem from probability and statistics giving a quantitative version of the Central Limit Theorem. We can therefore view our principle as a generalization of the Berry-Esséen theorem to polytopes.
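The conclusion of Theorem 1.3 can be observed empirically. The following Monte Carlo sanity check (ours, not the paper's; all dimensions, thresholds, and sample counts are arbitrary illustrative choices) compares the measure of a regular polytope under the uniform cube distribution and under the Gaussian. Normalized random Gaussian columns have Σ_i w_i^4 ≈ 3/n, so they are ε-regular for ε ≈ √(3/n).

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 200, 5
# k regular halfspaces: random unit columns, coefficients of size ~ 1/sqrt(n).
W = rng.standard_normal((n, k))
W /= np.linalg.norm(W, axis=0)
theta = np.full(k, 0.5)

def volume_estimate(sampler, trials=20000):
    """Estimate Pr[X in K(W, theta)] by sampling rows X from `sampler`."""
    X = sampler((trials, n))
    return float(np.mean(np.all(X @ W <= theta, axis=1)))

p_cube = volume_estimate(lambda s: rng.choice([-1.0, 1.0], size=s))   # X uniform on {-1,1}^n
p_gauss = volume_estimate(lambda s: rng.standard_normal(s))           # Y ~ N^n
```

With these parameters the two estimates agree to within sampling error, as the theorem predicts for regular polytopes.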
1.2 Applications of Our Invariance Principle

While we believe the statement of our main theorem is interesting in and of itself, we apply our invariance principle to obtain striking new results in various subfields of theoretical computer science:

• The Analysis of Boolean Functions: we give new bounds on the noise sensitivity of characteristic functions of polytopes.

• Learning Theory: we give the best known algorithms for (agnostically) learning intersections of halfspaces with respect to the uniform distribution on {-1,1}^n.

• Pseudorandomness: we build pseudorandom generators for polytopes and give the first deterministic algorithms for approximately counting the number of solutions to broad classes of integer programs.

We elaborate on these applications below. More generally, our main theorem gives new insight into the structure of integer points in polytopes (that is, solutions to integer programs). Understanding this structure is an important topic in computer science [BV08], optimization [Zie95], and combinatorics [BR07], and we believe our invariance principle will find many future applications.

1.3 Application: Bounding the Noise Sensitivity of Intersections of Halfspaces

The noise sensitivity of Boolean functions, introduced in the seminal works of Kahn, Kalai and Linial [KKL88] and Benjamini, Kalai and Schramm [BKS99], is an important notion in the analysis of Boolean functions. Roughly speaking, the noise sensitivity of a Boolean function f measures the probability, over a randomly chosen input x, that f changes sign if each bit of x is flipped independently with probability δ.
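This quantity is straightforward to estimate by sampling. The sketch below (an illustration of ours; the polytope, noise rate, and sample counts are arbitrary) estimates the noise sensitivity of the indicator of an intersection of random regular halfspaces.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, delta = 200, 4, 0.01
W = rng.standard_normal((n, k))
W /= np.linalg.norm(W, axis=0)       # regular unit-norm halfspace normals
theta = np.zeros(k)

def f(X):
    """+1 on rows inside the polytope K(W, theta), -1 outside."""
    return np.where(np.all(X @ W <= theta, axis=1), 1, -1)

def noise_sensitivity(f, delta, trials=20000):
    """Estimate Pr[f(X) != f(Z)], Z a delta-perturbation of X."""
    X = rng.choice([-1.0, 1.0], size=(trials, n))
    flip = rng.random((trials, n)) < delta       # flip each bit independently
    Z = np.where(flip, -X, X)
    return float(np.mean(f(X) != f(Z)))

ns = noise_sensitivity(f, delta)
```

For regular halfspaces as here, the estimate stays far below the trivial union bound over the k faces, in line with Theorem 1.4 below.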
Bounds on the noise sensitivity of Boolean functions have direct applications in hardness of approximation [Hås01, KKMO07], hardness amplification [O'D04], circuit complexity [LMN93], the theory of social choice [Kal05], and quantum complexity [Shi00]. Here, we focus on applications in learning theory, where it is known that bounds on the noise sensitivity of a class of Boolean functions yield learning algorithms that succeed in harsh noise models such as the agnostic model of learning [KKMS08].

A direct application of our invariance principle (Theorem 1.3) gives the following new bound on the noise sensitivity of intersections of regular halfspaces:

Theorem 1.4 (noise sensitivity of intersections of halfspaces). Let f be computed by the intersection of k ε-regular halfspaces. Then the Boolean noise sensitivity of f for noise rate ε is at most (log k)^{O(1)} · ε^{1/6}.

The current best bound for the noise sensitivity of an intersection of k arbitrary halfspaces is O(k√ε). This bound is obtained by starting with the √ε noise sensitivity bound for a single halfspace due to Peres [Per04] and applying a union bound over the k halfspaces. On the other hand, optimal bounds of Θ(√(log k) · √ε) for the related Gaussian noise sensitivity were obtained recently by Klivans, O'Donnell and Servedio [KOS08]. Our result is an important step towards improving noise sensitivity bounds for intersections of arbitrary (not necessarily regular) halfspaces. We believe that the right order for the Boolean noise sensitivity of an intersection of k halfspaces is Θ(√(log k) · √ε) as well.

1.4 Application: Learning Intersections of Halfspaces

We give new results for agnostically learning intersections of halfspaces with respect to the uniform distribution on {-1,1}^n.
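The route from noise-sensitivity bounds to learning, made precise in Section 2.1, can be caricatured in a few lines. The toy below (our illustration, not the paper's algorithm) fits a degree-1 polynomial to labeled samples of a single regular halfspace and outputs its sign; plain least squares here is a hedged stand-in for the L1 polynomial regression of [KKMS08], and the target, sample sizes, and degree are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(6)
n, m = 20, 4000
X = rng.choice([-1.0, 1.0], size=(m, n))
y = np.sign(X.sum(axis=1) + 0.5)     # target: a single regular halfspace (majority)

# Degree-1 feature expansion [1, x_1, ..., x_n]; least squares stands in
# for the L1 regression actually used by [KKMS08] (an assumption for brevity).
Phi = np.hstack([np.ones((m, 1)), X])
coef, *_ = np.linalg.lstsq(Phi, y, rcond=None)

def h(Xt):
    """Learned hypothesis: sign of the fitted low-degree polynomial."""
    return np.sign(np.hstack([np.ones((len(Xt), 1)), Xt]) @ coef)

Xtest = rng.choice([-1.0, 1.0], size=(4000, n))
acc = float(np.mean(h(Xtest) == np.sign(Xtest.sum(axis=1) + 0.5)))
```

For intersections of k halfspaces, the same pattern applies with degree (log k)^{O(1)}/poly(ε) features, which is what yields the quasi-polynomial running time below.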
Learning intersections of halfspaces (i.e., convex sets) is a fundamental challenge in learning theory. Distribution-free learning of even an intersection of two halfspaces remains a challenging open problem. A natural restriction of the problem is to assume the underlying distribution is uniform over {-1,1}^n (this can be seen to be more difficult than the case where the underlying distribution is Gaussian). Applying a result of Kalai et al. [KKMS08] and Klivans et al. [KOS04], Theorem 1.4 implies the following:

Theorem 1.5 (learning intersections of halfspaces). The concept class of intersections of k halfspaces is agnostically learnable with respect to the uniform distribution on {-1,1}^n in time n^{(log^{O(1)} k)} for any constant error parameter.

Agnostic learning corresponds to learning with adversarial noise (see Section 2.1 for a precise definition). In particular, intersections of {-1,1} halfspaces (oriented majorities) are ε-regular and fall into this class. The previous best algorithm for learning these concept classes, even in the easier PAC model, ran in time n^{O(k²)} ([KOS04, KKMS08]). The obvious remaining open problem here is to agnostically learn intersections of arbitrary (not necessarily regular) halfspaces with respect to the uniform distribution while preserving the quasipolynomial-time dependence on the number of halfspaces. Typically, handling the regular case is the first step towards such a result, and we have accomplished that here for the first time.

1.5 Application: Pseudorandomness for Polytopes

Our invariance principle also yields new results for several problems in derandomization. In particular, we give the first deterministic algorithms for approximately counting the number of solutions to broad classes of integer programs.
Recall the following definition of pseudorandom generators (PRGs):

Definition 1.6. Let µ be a distribution over R. A function G : {0,1}^r → {1,-1}^n is said to δ-fool a polytope K with respect to µ if the following holds:

    | Pr_{y ∈_u {0,1}^r} [G(y) ∈ K] − Pr_{X ← µ^n} [X ∈ K] | ≤ δ.

Combining our invariance principle with a PRG similar to a recent construction of Meka and Zuckerman [MZ10], we obtain the following pseudorandom generator:

Theorem 1.7 (PRGs for regular polytopes). For all δ ∈ (0,1), there exists an explicit PRG G : {0,1}^r → {1,-1}^n with r = O((log n log k)/ε) that δ-fools all polytopes formed by the intersection of k ε-regular halfspaces with respect to all proper and hypercontractive distributions µ, for ε = δ^5/((log^{8.1} k)(log(1/δ))). The constants above depend on the hypercontractivity constants of µ.

We define proper and hypercontractive distributions in the next section and remark that the uniform distribution over {-1,1}^n and the Gaussian distribution are examples of such distributions. This pseudorandom generator gives an algorithm for approximately counting the number of {-1,1}^n points in polytopes formed by the intersection of regular halfspaces. Put another way, given an integer program whose constraints are sufficiently regular, we give a quasi-polynomial time, deterministic algorithm for approximately counting the number of {-1,1}^n solutions:

Corollary 1.8 (approximate counting for regular integer programs). Let A be an integer program with n variables and k constraints where each constraint is an ε-regular halfspace (we define regularity precisely in Section 2). For ε = δ^5/((log^{8.1} k)(log(1/δ))), there exists a deterministic algorithm that runs in time exp(O(log n log k)/ε) for estimating the number of {-1,1}^n points satisfying A to within an additive ε·2^n.

Corollary 1.8 implies quasi-polynomial time, deterministic, approximate counting algorithms for a broad class of integer programs. For example, dense covering programs such as dense set-cover, and {0,1}-contingency tables correspond to polytopes formed by the intersection of ε-regular halfspaces. For these types of integer programs, we can deterministically approximate, to within an additive error ε, the fraction of the hypercube that consists of integer solutions in quasi-polynomial time (i.e., we obtain an additive approximation of the number of integer solutions to within ε·2^n).

While there has been much work on approximately counting solutions to integer programs using randomized algorithms, we are unaware of results giving deterministic algorithms for these tasks (even for the case of regular integer programs) that run in subexponential time in the number of constraints. Very recently, there has been work on deterministic approximate counting for the multidimensional knapsack problem and the contingency table problem [GKM10], but these algorithms still run in time exponential in the number of constraints. Another difference between our results and the algorithm of Gopalan et al. [GKM10] is that [GKM10] gives (stronger) relative-error guarantees, while here we give additive approximations. For a further discussion of related work see Section 1.8 and Section 7.2.

1.6 Additional Invariance Principles

As stated, our invariance principle applies to polytopes whose bounding hyperplanes have coefficients that are sufficiently regular.
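Stepping back to Definition 1.6 and Corollary 1.8 for a moment: the way a short-seed PRG becomes a deterministic counting algorithm is simply exhaustive enumeration of seeds. The toy harness below (ours; the polytope and dimensions are arbitrary, and neither candidate generator is the [MZ10]-style construction of Theorem 1.7) measures the fooling error of a candidate generator against the exact hypercube count, which is feasible here because n is tiny.

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(0)
n, k = 12, 3
W = rng.standard_normal((n, k))
W /= np.linalg.norm(W, axis=0)
theta = np.full(k, 0.4)

def frac_in_polytope(X):
    return float(np.mean(np.all(X @ W <= theta, axis=1)))

# Exact Pr[X in K] over the cube: enumerate all 2^n sign vectors (n is tiny).
cube = 1.0 - 2.0 * np.array(list(product([0, 1], repeat=n)))
p_true = frac_in_polytope(cube)

def fooling_error(G, r):
    """Definition 1.6 error of G: {0,1}^r -> {1,-1}^n, evaluated by
    enumerating all 2^r seeds -- the same enumeration that turns a
    short-seed PRG into a deterministic counting algorithm."""
    seeds = np.array(list(product([0, 1], repeat=r)))
    return abs(frac_in_polytope(G(seeds)) - p_true)

# Sanity check: the trivial "generator" with r = n bits is exact.
err_trivial = fooling_error(lambda y: 1.0 - 2.0 * y, n)

# A toy shorter generator (a random F2-linear map; just an illustration).
r = 8
A = rng.integers(0, 2, size=(r, n))
err_linear = fooling_error(lambda y: 1.0 - 2.0 * ((y @ A) % 2), r)
```

A real construction replaces the toy map with one whose seed length r = O((log n log k)/ε) provably achieves small error for all regular polytopes at once.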
In some cases, however, we can randomly rotate an arbitrary polytope so that all the bounding hyperplanes become regular. As such, after applying a suitable random transformation (which we derandomize), we can build PRGs for arbitrary polytopes if the underlying distribution is spherically symmetric (e.g., Gaussian):

Theorem 1.9 (PRGs for polytopes in Gaussian space). For a universal constant c > 0 and all δ > c log² k / n^{1/11}, there exists an explicit PRG G_N : {0,1}^r → R^n with r = O((log n)(log^{9.1} k)/δ^{5.1}) that δ-fools all k-polytopes with respect to N^n.

Additionally, we prove an invariance principle for polytopes with respect to the uniform distribution over the n-dimensional sphere S^{n-1}. This allows us to easily modify our PRG for polytopes in Gaussian space and build PRGs for intersections of spherical caps:

Theorem 1.10 (PRGs for intersections of spherical caps). For a universal constant c > 0 and all δ > c log² k / n^{1/11}, there exists an explicit PRG G_sp : {0,1}^r → S^{n-1} with r = O((log n)(log^{9.1} k)/δ^{5.1}) that δ-fools all k-polytopes with respect to the uniform distribution over S^{n-1}.

An immediate consequence of the above PRG construction is a polynomial-time derandomization of the Goemans-Williamson approximation algorithm for Max-Cut [GW95] and other similar hyperplane-based randomized rounding schemes. Observe that this is a black-box derandomization, as opposed to some earlier derandomizations of the Goemans-Williamson algorithm, which are instance-specific (e.g., [MH99]).

1.7 Proof Outline of the Main Theorem

In this section, we give a high-level outline of the proof of our invariance principle and contrast it with the techniques of Mossel et al. [MOO05] and Mossel [Mos08]. The proof proceeds in two steps.
Step One: As in [MOO05] and [Mos08], we first use the Lindeberg (or "replacement") method³ (see [PR89]) to prove an invariance principle for smooth functions. By this we mean proving that

    | E_{X ∈ {-1,1}^n} [Ψ(ℓ₁(X), ..., ℓ_k(X))] − E_{Y ∈ N^n} [Ψ(ℓ₁(Y), ..., ℓ_k(Y))] | ≤ γ,    (1.1)

where ℓ₁, ..., ℓ_k are linear functions (corresponding to the normals of the faces of the k-polytope) and Ψ is a smooth function. The value γ will depend on k, the coefficients of the ℓ_p's, and the derivatives of Ψ. The function Ψ is often called a "test" function and is smooth if there is a uniform bound on its fourth derivative. Notice here that Ψ maps R^k to R; in [MOO05], they were concerned with the value Ψ(Q(X)) for a low-degree polynomial Q and a univariate test function Ψ.

³ The Lindeberg method entails replacing each of the X_i's with Y_i's one step at a time and bounding the error in each step. This is more commonly referred to as the hybrid argument in the theoretical computer science literature, since the intermediate random variables are a hybrid of both X and Y.

At this point, we could take Ψ to be the k-wise product of a test function constructed by Mossel et al. to approximate the logical AND function. Further, Mossel provides a very general framework for obtaining multivariate test functions and gives bounds for the overall error incurred by the hybrid argument. Here we run into our first difficulty: the standard hybrid argument as used by Mossel et al. and Mossel results in a bad dependence on the coefficients of the ℓ_p's. In particular, the resulting error term is not small even for polytopes formed by the intersection of regular halfspaces. To solve this problem, we use a non-standard hybrid argument that groups the input variables into blocks.
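The blockwise replacement chain is easy to visualize numerically. The sketch below (our illustration; the test function is a generic smooth stand-in, not Bentkus's construction, and all parameters are arbitrary) builds the hybrids H_0 = X, ..., H_B = Y by swapping one random block of coordinates at a time and tracks E[ψ(W^T H_j)] along the chain.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 120, 3
W = rng.standard_normal((n, k))
W /= np.linalg.norm(W, axis=0)      # regular unit-norm columns

def psi(V):
    # A generic smooth test function of the k linear forms (a stand-in).
    return np.tanh(V + 0.5).sum(axis=1)

def hybrid_means(num_blocks, trials=20000):
    """E[psi(W^T H_j)] along the blockwise hybrid chain: H_0 = X uniform
    on the cube, H_num_blocks = Y Gaussian; H_j replaces j random blocks."""
    block_of = rng.permutation(n) % num_blocks   # random balanced partition of [n]
    X = rng.choice([-1.0, 1.0], size=(trials, n))
    Y = rng.standard_normal((trials, n))
    H = X.copy()
    means = [float(psi(H @ W).mean())]
    for b in range(num_blocks):
        H[:, block_of == b] = Y[:, block_of == b]  # swap one block of X_i's for Y_i's
        means.append(float(psi(H @ W).mean()))
    return means

ms = hybrid_means(8)
```

The proof bounds the drift incurred at each swap; the total drift, and in particular the gap between the two endpoints of the chain, is small for regular W.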
We observe that in the Lindeberg method it is irrelevant in which order we replace the X_i's with Y_i's; in fact, a random order would suffice. Further, we can group the X_i's into blocks and proceed blockwise with the hybrid argument. To implement this intuition, we partition [n] randomly into a set of blocks and replace all the X_i's within a block by the corresponding Y_i's, one block at a time. Proceeding in this fashion with a random partitioning has a "smoothing effect" on the coefficients of the linear functions, resulting in a much better bound on the error in terms of the coefficients. Roughly speaking, if ℓ_{pi} denotes the i'th coefficient of ℓ_p, then the standard hybrid arguments of [PR89], [MOO05], [Mos08] incur an error proportional to Σ_{i ∈ [n]} (max_{p ∈ [k]} |ℓ_{pi}|^4), which can be as large as Ω(k) even for regular functions ℓ_p. In contrast, our randomized-blockwise-hybrid argument only suffers an error of (log k) · max_{p ∈ [k]} Σ_i |ℓ_{pi}|^4, which is small for regular functions. It turns out that in the above analysis, we can choose the random partitioning into blocks in a Θ(log k)-wise independent manner, instead of uniformly at random, and this is crucial for our PRG constructions.

Step Two: Given the above invariance principle for smooth functions, we now aim to translate the closeness in expectation for smooth functions into closeness in cdf distance. Here the smoothness of the test function Ψ becomes important, and we run into our second problem: the natural choice of test function Ψ (the multivariate version of the test function from Mossel et al.) leads to an error bound on the order of k, rather than poly(log k).
To get around this problem, we first observe that in Mossel's proof of the multivariate invariance principle, as in our randomized-blockwise-hybrid argument, it suffices to bound the 'l₁-norm' of the fourth derivative, sup_{x ∈ R^k} Σ_{p,q,r,s ∈ [k]} |∂_p ∂_q ∂_r ∂_s Ψ(x)|, instead of uniformly bounding the fourth derivative, sup_{x ∈ R^k, p,q,r,s ∈ [k]} |∂_p ∂_q ∂_r ∂_s Ψ(x)|. Thus, it suffices to obtain a smooth approximation of the AND function for which the former quantity is small. Fortunately for us, we have uncovered a beautiful result due to Bentkus [Ben90], who constructs a smooth approximation of the AND function with precisely this property.

The final difficulty in translating closeness in expectation as in Equation 1.1 to closeness in cdf distance is to prove that Ψ differs from the characteristic function only on a set of small Gaussian measure. To this end, we show that it suffices to bound the Gaussian measure of l∞-neighborhoods around the boundary of k-polytopes. For an l∞-neighborhood of width λ, a union bound would imply a Gaussian measure on the order of kλ. At this point, however, we can apply a result due to Nazarov [Naz03] on the Gaussian surface area of k-polytopes to get the much better bound of √(log k)·λ. This result of Nazarov was used before by Klivans, O'Donnell and Servedio [KOS08] in the context of learning intersections of halfspaces with respect to Gaussian distributions.

We give an outline of the proofs of the applications of the invariance principle to noise sensitivity and PRGs in the corresponding sections.

1.8 Related Work

As mentioned earlier, the classical Berry-Esséen theorem [Fel71], a quantitative version of the Central Limit Theorem from probability, gives an invariance principle for the case of a single halfspace (i.e., k = 1).
More precisely, for any w ∈ R^n such that ||w|| = 1 and each coefficient of w is at most ε, the Berry-Esséen theorem states that

    | Pr_{x ∈ {-1,1}^n} [⟨w, x⟩ ≥ t] − Pr_{x ← N^n} [⟨w, x⟩ ≥ t] | ≤ O(ε).

Bentkus [Ben03] proves a multidimensional Berry-Esséen theorem for sums of vector-valued random variables, each with identity covariance matrix, whose error term depends on the Gaussian surface area of the test set. Although his paper deals with topics related to our work, his result seems to have no implications in our setting.

There is a long history of research on approximately counting the number of solutions to integer programs, especially with regard to contingency tables [JS97, CD03]. However, not much is known in terms of deterministic algorithms, and we believe that our deterministic quasi-polynomial time algorithms for dense covering problems and dense set-cover instances are the first results of their kind. Regarding contingency tables, Dyer [Dye03] gave a randomized relative-error approximation algorithm for counting solutions to contingency tables that runs in time exponential in the number of rows. In contrast, we obtain an algorithm that runs in quasi-polynomial time in the number of rows (however, we do not give a relative-error approximation).

Although not stated explicitly before, it is easy to see that the pseudorandom generator for small-space machines of Impagliazzo, Nisan and Wigderson [INW94] yields a deterministic algorithm for counting n × k contingency tables with additive error at most ε·2^n and run time 2^{O(log²(nk/ε))}. This is incomparable to our algorithm for contingency tables, which has run time 2^{(log n)·poly(log k, 1/ε)}. In our case, we obtain a polynomial-time, black-box derandomization for contingency tables with a constant number of rows (for ε = O(1)).
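Returning to the one-dimensional Berry-Esséen statement above, the O(ε) error is easy to witness numerically. In the sketch below (ours; the choice w_i = 1/√n, the threshold t, and the sample count are arbitrary), every coefficient equals ε = 1/√n, the cube-side tail is estimated by sampling, and the Gaussian tail is exact since ||w|| = 1.

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(0)
n, t = 100, 0.5
w = np.ones(n) / np.sqrt(n)          # ||w|| = 1, every |w_i| = eps = 0.1

X = rng.choice([-1.0, 1.0], size=(40000, n))
cube_tail = float(np.mean(X @ w >= t))
gauss_tail = 0.5 * (1.0 - erf(t / sqrt(2.0)))   # Pr[N(0,1) >= t], exact
err = abs(cube_tail - gauss_tail)
```

The observed gap is well within the O(ε) guarantee; Theorem 1.3 is the analogous statement with k halfspaces in place of one.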
For PRGs for intersections of halfspaces, recently Gopalan et al. [GOWZ10] and Diakonikolas, Kane and Nelson [DKN10] gave results incomparable to ours. Gopalan et al. give generators for arbitrary intersections of k halfspaces with seed length linear in k but logarithmic in 1/δ. Diakonikolas et al. show that bounded independence fools intersections of quadratic threshold functions and, in particular, obtain generators with seed length O((log n)·poly(k, 1/ε)) fooling intersections of k halfspaces. Due to the at least linear dependence on k, the results of the above works do not yield good algorithms for counting solutions to integer programs, as in this setting k is typically large (e.g., poly(n)).

1.9 Discussion and Future Work

One obvious weakness of our applications to noise sensitivity bounds (Theorem 1.4) and PRGs over the hypercube (Theorem 1.7) is the regularity requirement. Recent results on sensitivity bounds and PRGs for halfspaces and PTFs ([DHK+10, MZ10]) use certain regularity lemmas which allow one to "reduce" the problem for arbitrary functions to the regular case and then use invariance to handle the regular case. Unfortunately, applying the reductions to the regular case as in the above works leads to bounds that are at least linear in k, even when using our stronger bounds for the regular case. We (optimistically) believe that the above difficulty could be overcome and a better reduction to the regular case can be achieved.

2 Notation and Preliminaries

We use the following notation.

1. For W ∈ R^{n×k} and θ ∈ R^k, K(W, θ) denotes the polytope K(W, θ) = {x : W^T x ≤ θ}. We say a polytope K(W, θ) as above has k faces.

2. Unless stated otherwise, we work with the same polytope K(W, θ) and assume that the columns of the matrix W have norm one.
We often shorten K(W, θ) to K if W, θ are clear from context. We assume that k ≥ 2.

3. For A ∈ R^{m₁×m₂}, A^T denotes the transpose of A, and for p ∈ [m₂], A_p denotes the p'th column of A.

4. The all-ones vector in R^k is denoted by 1_k.

5. For u ∈ R^k, define the rectangle Rect(u) = (-∞, u₁] × (-∞, u₂] × ··· × (-∞, u_k]. Note that x ∈ K(W, θ) if and only if W^T x ∈ Rect(θ).

6. N^n (where N = N(0,1)) denotes the standard multivariate spherical Gaussian distribution over R^n with mean 0 and identity covariance matrix.

7. For a 4-times differentiable function ψ : R^k → R, let

    ||ψ^{(4)}||₁ = sup { Σ_{p,q,r,s ∈ [k]} |∂_p ∂_q ∂_r ∂_s ψ(a₁, ..., a_k)| : (a₁, ..., a_k) ∈ R^k }.

We call ψ a smooth function if the above quantity is finite.

8. We denote all universal constants by c, C, even when we have in mind different constants in the same equation. Also, if left unspecified, we write ||u|| for ||u||₂.

The main results of this paper are applicable to a large class of product distributions that satisfy the following two properties.

Definition 2.1 (proper distributions). A distribution µ over R is proper if for X ← µ, E[X] = 0, E[X²] = 1 and E[X³] = 0.

Definition 2.2 (hypercontractive distributions). A distribution µ over R is hypercontractive if there exists a constant c_µ such that the following holds. For any m, vector u ∈ R^m, and any q ≥ 2,

    ( E_{X ← µ^m} [|⟨u, X⟩|^q] )^{1/q} ≤ c_µ √q ( E_{X ← µ^m} [|⟨u, X⟩|²] )^{1/2}.

Two important examples of product distributions that are proper and hypercontractive are the uniform distribution over the hypercube {1,-1}^n and the multivariate spherical Gaussian N^n. We also use the following hypercontractivity inequality for degree-d multilinear polynomials over the hypercube (see [Jan97] for instance).
Lemma 2.3 ((2,q)-hypercontractivity). For any q ∈ [2, ∞) and any degree-d multilinear polynomial P : {1,-1}^n → R,

    ( E_{x ∈_u {1,-1}^n} [|P(x)|^q] )^{1/q} ≤ q^{d/2} ( E_{x ∈_u {1,-1}^n} [|P(x)|²] )^{1/2}.

We shall also use the following classical large-deviation inequality for Lipschitz functions in Gaussian space. For a function f : R^n → R, the Lipschitz constant of f is defined as ||f||_Lip = sup {|f(x) − f(y)| / ||x − y||₂ : x ≠ y ∈ R^n}.

Theorem 2.4 ([LT91]). For f : R^n → R with a bounded Lipschitz constant, µ(f) = E_{y ← N^n} [f(y)], and t > 0,

    Pr_{x ← N^n} [|f(x) − µ(f)| > t] ≤ 2 exp(−t²/2||f||²_Lip).

2.1 Agnostic Learning

Here we describe the agnostic framework of learning (a generalization of PAC learning) and describe how noise-sensitivity bounds translate into learning algorithms. First we define noise sensitivity:

Definition 2.5 (noise sensitivity). Let f be a Boolean function f : {1,-1}^n → {1,-1}. For any δ ∈ (0,1), let X be a random element of the hypercube {1,-1}^n and Z a δ-perturbation of X defined as follows: for each i independently, Z_i is set to X_i with probability 1 − δ and to −X_i with probability δ. The noise sensitivity of f for noise δ, denoted NS_δ(f), is then defined as

    NS_δ(f) = Pr[f(X) ≠ f(Z)].

Now we describe the learning model of focus in this paper, agnostic learning. In the agnostic learning framework [KSS94, Hau92], the learner receives labelled examples (x, y) drawn from a fixed distribution over example-label pairs.

Definition 2.6 (agnostic learning). Let D be any distribution on X × R and let C be a concept class of functions. Define opt = min_{f ∈ C} Pr_{(x,y)∼D} [f(x) ≠ y]. That is, opt is the error of the best-fitting concept in C with respect to D.
We say that an algorithm $A$ agnostically learns a concept class $\mathcal{C}$ over $\mathcal{D}$ if the following holds: for any $\mathcal{D}$ on $X \times \mathbb{R}$ with marginal distribution $\mathcal{D}_X$ on $X$, if $A$ is given random examples drawn from $\mathcal{D}$, then with high probability $A$ outputs a hypothesis $h$ such that $\Pr_{(x,y) \sim \mathcal{D}}[ h(x) \neq y ] \leq \mathrm{opt} + \delta$.

Note that when $\mathrm{opt} = 0$, this corresponds to the PAC model of learning. Successful agnostic learning corresponds to learning in the presence of "adversarial" noise.

The following lemma, considered folklore (see [KOS04]), shows that noise-stable functions are well approximated by low-degree polynomials.

Lemma 2.7. Let $\Pi = \Pi_1 \times \Pi_2 \times \cdots \times \Pi_n$ be a product distribution over $\{1,-1\}^n$, and let $f : \{1,-1\}^n \to \mathbb{R}$ be a function such that $\|f\| = 1$ and $\mathrm{NS}_\delta(f) \leq \alpha(\delta)$ for some increasing function $\alpha : [0, 1/2] \to [0,1]$. Then there exists a multilinear polynomial $p : \{1,-1\}^n \to \mathbb{R}$ of degree $\frac{1}{\alpha^{-1}(\delta/2.32)}$ such that $\mathbb{E}_{x \sim \Pi}\big[ (f - p)^2 \big] < \delta$.

The "$L_1$ Polynomial Regression Algorithm" due to Kalai et al. [KKMS08] shows that one can agnostically learn low-degree polynomials.

Theorem 2.8 ([KKMS08]). Fix a distribution $\mathcal{D}$ on $X \times \mathbb{R}$ with marginal $\mathcal{D}_X$ on $X$. Suppose that for any $f \in \mathcal{C}$, $\mathbb{E}_{x \sim \mathcal{D}_X}\big[ (f - p)^2 \big] < \delta^2$ for some degree-$d$ polynomial $p$. Then, with high probability, the $L_1$ Polynomial Regression Algorithm outputs a hypothesis $h$ such that $\Pr_{(x,y) \sim \mathcal{D}}[ h(x) \neq y ] \leq \mathrm{opt} + \delta$ in time $\mathrm{poly}(n^d/\delta)$.

3 Invariance Principle for Polytopes

Our main invariance principle for polytopes $K(W, \theta)$ is as follows:

Theorem 3.1 (invariance principle for polytopes). For any proper and hypercontractive distribution $\mu$ over $\mathbb{R}$ and any $\varepsilon$-regular $k$-polytope $K$,
\[ \Big| \Pr_{X \leftarrow \mu^n}[X \in K] - \Pr_{Y \leftarrow \mathcal{N}^n}[Y \in K] \Big| \leq C c_\mu^2 (\log^{8/5} k)\, (\varepsilon \log(1/\varepsilon))^{1/5}. \tag{3.1} \]

The proof of the theorem can be divided into three parts.

1.
We establish an invariance principle for smooth functions on polytopes (Theorem 3.2) using an extension of Lindeberg's method; Section 4 is devoted to proving this part.

2. We prove that for random variables $A, B$ over $\mathbb{R}^k$, closeness with respect to smooth functions and anti-concentration bounds for one of the variables imply closeness with respect to rectangles (Lemma 3.3). To do so, we use a result of Bentkus [Ben90] on smooth approximations for the $l_\infty$ norm.

3. We use a result of Nazarov [Naz03] on the Gaussian surface area of polytopes to bound the Gaussian measure of "$l_\infty$-neighborhoods" of polytopes in $\mathbb{R}^n$ (Lemma 3.4).

We begin by stating an invariance principle for smooth functions $\psi : \mathbb{R}^k \to \mathbb{R}$. The proof is involved, making use of the randomized-blockwise-hybrid argument alluded to in the introduction. For clarity we present the proof in the next section (Section 4).

Theorem 3.2 (invariance principle for smooth functions). For any proper and hypercontractive distribution $\mu$ over $\mathbb{R}$, any $\varepsilon$-regular $W$, and any smooth function $\psi : \mathbb{R}^k \to \mathbb{R}$,
\[ \Big| \mathbb{E}_{X \leftarrow \mu^n}\big[ \psi(W^T X) \big] - \mathbb{E}_{Y \leftarrow \mathcal{N}^n}\big[ \psi(W^T Y) \big] \Big| \leq C c_\mu^2 \|\psi^{(4)}\|_1 (\log^3 k)\, (\varepsilon \log(1/\varepsilon)). \]

The following lemma shows that for two random variables $A, B$ over $\mathbb{R}^k$, closeness with respect to smooth functions and anti-concentration bounds for the variable $B$ imply closeness with respect to rectangles. Note that to use the lemma we do not need anti-concentration bounds for the random variable $A$.

Lemma 3.3 (smooth approximation of AND). Let $A, B$ be two random variables over $\mathbb{R}^k$ satisfying the following conditions:

• There exists $\Delta \geq 0$ such that for all smooth functions $\psi : \mathbb{R}^k \to \mathbb{R}$, $| \mathbb{E}[\psi(A)] - \mathbb{E}[\psi(B)] | \leq \Delta \|\psi^{(4)}\|_1$.
• There exists a function $g_k : [0,1] \to [0,1]$ such that the following holds:
\[ \forall \lambda \in [0,1], \quad \sup_{\theta \in \mathbb{R}^k} \big\{ \Pr[ B \in \mathrm{Rect}(\theta + \lambda \mathbf{1}_k) \setminus \mathrm{Rect}(\theta) ] \big\} \leq g_k(\lambda). \]

Then, for all $\theta \in \mathbb{R}^k$ and $\lambda \in (0,1)$,
\[ | \Pr[A \in \mathrm{Rect}(\theta)] - \Pr[B \in \mathrm{Rect}(\theta)] | \leq \frac{C \Delta \log^3 k}{\lambda^4} + C g_k(\lambda). \]

Finally, we use the following anti-concentration bound that follows from Nazarov's estimate on the Gaussian surface area of polytopes [Naz03]:

Lemma 3.4 (anti-concentration bound for $l_\infty$-neighborhoods of rectangles). For $0 < \lambda < 1$, and $W \in \mathbb{R}^{n \times k}$ such that each of the columns has norm 1,
\[ \Pr_{x \leftarrow \mathcal{N}^n}\big[ W^T x \in \mathrm{Rect}(\theta) \setminus \mathrm{Rect}(\theta - \lambda \mathbf{1}_k) \big] = O(\lambda \sqrt{\log k}). \]

We first prove Theorem 3.1 using the above three results and then prove Lemmas 3.3 and 3.4 in Sections 3.1 and 3.2. Theorem 3.2 is then proved in Section 4.

Proof of Theorem 3.1. Let $X \leftarrow \mu^n$, $Y \leftarrow \mathcal{N}^n$ and let random variables $A = W^T X$, $B = W^T Y$. Then, by Lemma 3.4 and Theorem 3.2,
\[ \Pr[ B \in \mathrm{Rect}(\theta + \lambda \mathbf{1}_k) \setminus \mathrm{Rect}(\theta) ] \leq C \sqrt{\log k}\, \lambda, \qquad | \mathbb{E}[\psi(A)] - \mathbb{E}[\psi(B)] | \leq C c_\mu^2 (\log^3 k)\, \varepsilon \log(1/\varepsilon)\, \|\psi^{(4)}\|_1, \]
where $\psi : \mathbb{R}^k \to \mathbb{R}$ is any smooth function, $\theta \in \mathbb{R}^k$ and $\lambda \in (0,1)$. Therefore, by Lemma 3.3, for $\theta \in \mathbb{R}^k$,
\[ | \Pr[A \in \mathrm{Rect}(\theta)] - \Pr[B \in \mathrm{Rect}(\theta)] | \leq C (\log^6 k)\, \varepsilon \log(1/\varepsilon) / \lambda^4 + C \sqrt{\log k}\, \lambda. \]
The theorem now follows by setting $\lambda = (\log^{11/10} k)\, (\varepsilon \log(1/\varepsilon))^{1/5}$.

3.1 Smooth approximation of AND

We now prove Lemma 3.3. For this, we use the following result of Bentkus [Ben90] on smooth approximations for the $l_\infty$ norm.

Theorem 3.5 (Bentkus [Ben90]). For every $\alpha > 0$ and $0 < \lambda < 1$, there exists a function $\psi \equiv \psi_{\alpha,\lambda} : \mathbb{R}^k \to \mathbb{R}$ such that $\|\psi^{(4)}\|_1 \leq C \log^3 k / \lambda^4$ and
\[ \psi(a) = \begin{cases} 1 & \text{if } \|a\|_\infty \leq \alpha \\ 0 & \text{if } \|a\|_\infty > \alpha + \lambda \\ \in [0,1] & \text{otherwise.} \end{cases} \]

Corollary 3.6.
For all $u \in \mathbb{R}^k$, $0 < \lambda < 1$, and $T > \|u\|_\infty$, there exists a function $\psi \equiv \psi_{u,\lambda,T} : \mathbb{R}^k \to \mathbb{R}$ such that $\|\psi^{(4)}\|_1 \leq C \log^3 k / \lambda^4$ and
\[ \psi(a) = \begin{cases} 1 & \text{if } \forall l \in [k],\ -T + u_l \leq a_l \leq u_l \\ 0 & \text{if } \exists l \in [k],\ a_l > u_l + \lambda \\ \in [0,1] & \text{otherwise.} \end{cases} \]

Proof. Let $\psi_{T/2, \lambda}$ be the function from Theorem 3.5 with $\alpha = T/2$. Define $\psi \equiv \psi_{u,\lambda,T} : \mathbb{R}^k \to \mathbb{R}$ by
\[ \psi_{u,\lambda,T}(a_1, \ldots, a_k) = \psi_{T/2,\lambda}(a_1 + T/2 - u_1,\ a_2 + T/2 - u_2,\ \ldots,\ a_k + T/2 - u_k). \]
It is easy to check that $\psi$ satisfies the conditions of the corollary.

Proof of Lemma 3.3. Fix $\theta \in \mathbb{R}^k$, $0 < \lambda < 1$. Choose $T \in \mathbb{R}$ large enough so that $T > \|\theta\|_\infty$, $\Pr[\|A\|_\infty \geq T] < \Delta$ and $\Pr[\|B\|_\infty \geq T] < \Delta$. Then, by the choice of $T$,
\[ | \Pr[A \in \mathrm{Rect}(\theta)] - \Pr[A \in \mathrm{Rect}_{2T}(\theta)] | \leq \Delta, \qquad | \Pr[B \in \mathrm{Rect}(\theta)] - \Pr[B \in \mathrm{Rect}_{2T}(\theta)] | \leq \Delta, \tag{3.2} \]
where $\mathrm{Rect}_T(\theta) = [-T + \theta_1, \theta_1] \times [-T + \theta_2, \theta_2] \times \cdots \times [-T + \theta_k, \theta_k] \subseteq \mathbb{R}^k$.

Let $\psi : \mathbb{R}^k \to \mathbb{R}$ be the function obtained from applying Corollary 3.6 to $\theta, \lambda, 2T$. Observe that from the definition of $\psi$ in Corollary 3.6 and Equation (3.2), we have
\[ \Pr[A \in \mathrm{Rect}(\theta)] \leq \mathbb{E}[\psi(A)] + \Delta \leq \mathbb{E}[\psi(B)] + \Delta \|\psi^{(4)}\|_1 + \Delta. \]
Similarly,
\[ \mathbb{E}[\psi(B)] \leq \Pr[B \in \mathrm{Rect}(\theta + \lambda \mathbf{1}_k)] = \Pr[B \in \mathrm{Rect}(\theta)] + \Pr[B \in \mathrm{Rect}(\theta + \lambda \mathbf{1}_k) \setminus \mathrm{Rect}(\theta)] \leq \Pr[B \in \mathrm{Rect}(\theta)] + g_k(\lambda), \]
where the last inequality follows from the definition of $g_k$. Combining the above two equations we get
\[ \Pr[A \in \mathrm{Rect}(\theta)] \leq \Pr[B \in \mathrm{Rect}(\theta)] + 2\Delta \|\psi^{(4)}\|_1 + g_k(\lambda) \leq \Pr[B \in \mathrm{Rect}(\theta)] + \frac{C \Delta \log^3 k}{\lambda^4} + g_k(\lambda). \]
Proceeding similarly for the function $\psi_L : \mathbb{R}^k \to \mathbb{R}$ obtained by applying Corollary 3.6 to $\theta - \lambda \mathbf{1}_k, \lambda, 2T$, we get
\[ \Pr[A \in \mathrm{Rect}(\theta)] \geq \Pr[B \in \mathrm{Rect}(\theta)] - \frac{C \Delta \log^3 k}{\lambda^4} - g_k(\lambda). \]
Therefore, $| \Pr[A \in \mathrm{Rect}(\theta)] - \Pr[B \in \mathrm{Rect}(\theta)] | \leq C \Delta \log^3 k / \lambda^4 + g_k(\lambda)$.
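Bentkus's function $\psi_{\alpha,\lambda}$ from Theorem 3.5 is what drives the rectangle comparison above. Its actual construction is delicate; the following toy sketch only reproduces its boundary behavior — identically 1 on the inner region, 0 outside the $\lambda$-enlargement, and in $[0,1]$ in between — and makes no claim about the $\|\psi^{(4)}\|_1 \leq C \log^3 k/\lambda^4$ derivative bound. Function names are ours, not the paper's.

```python
import numpy as np

def smoothstep(s):
    """C^1 transition: 1 for s <= 0, 0 for s >= 1, monotone in between.
    (Bentkus's actual construction is far smoother; this is a toy.)"""
    s = np.clip(s, 0.0, 1.0)
    return 1.0 - (3 * s**2 - 2 * s**3)

def psi(a, alpha, lam):
    """Toy analogue of psi_{alpha,lam}: 1 when ||a||_inf <= alpha,
    0 when ||a||_inf > alpha + lam, and in [0,1] otherwise."""
    return smoothstep((np.max(np.abs(a)) - alpha) / lam)

print(psi(np.array([0.3, -0.2]), alpha=0.5, lam=0.5))  # inside: 1.0
print(psi(np.array([1.5, 0.0]), alpha=0.5, lam=0.5))   # outside: 0.0
```

Sandwiching the indicator of a rectangle between two such functions, as in the proof above, converts closeness in expectation over smooth test functions into closeness of rectangle probabilities.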
3.2 Anti-concentration bound for $l_\infty$-neighborhoods of rectangles

Lemma 3.4 follows straightforwardly from the following result of Nazarov [Naz03]. For a convex body $K \subseteq \mathbb{R}^n$ with boundary $\partial K$, let $\Gamma(K)$ denote the Gaussian surface area of $K$, defined by
\[ \Gamma(K) = \int_{y \in \partial K} e^{-\|y\|_2^2/2}\, d\sigma(y), \]
where $d\sigma(y)$ denotes the surface element at $y \in \partial K$.

Theorem 3.7 (Nazarov (see [KOS08, Theorem 20])). For a polytope $K$ with at most $k$ faces, $\Gamma(K) \leq C \sqrt{\log k}$.

Proof of Lemma 3.4. Consider an increasing (under set inclusion) family of polytopes $K_\rho$ for $0 \leq \rho \leq \lambda$ such that $K_0 = \{x : W^T x \in \mathrm{Rect}(\theta - \lambda \mathbf{1}_k)\}$ and $K_\lambda = \{x : W^T x \in \mathrm{Rect}(\theta)\}$. Then,
\[ \Pr_{x \leftarrow \mathcal{N}^n}\big[ W^T x \in \mathrm{Rect}(\theta) \setminus \mathrm{Rect}(\theta - \lambda \mathbf{1}_k) \big] = \int_{\rho = 0}^{\lambda} \Gamma(K_\rho)\, d\rho \leq C \sqrt{\log k}\, \lambda, \]
where the last inequality follows from Theorem 3.7.

4 Invariance Principle for Smooth Functions over Polytopes

We now prove Theorem 3.2. The proof of the theorem is based on the Lindeberg method for proving limit theorems with explicit error bounds. Let $t = \lceil 1/\varepsilon \rceil$ and let $\mathcal{H} = \{h : [n] \to [t]\}$ be a family of $(2 \log k)$-wise independent functions. That is, for all $I \subseteq [n]$, $|I| \leq 2 \log k$ and $b \in [t]^I$,
\[ \Pr_{h \in_u \mathcal{H}}[\forall i \in I,\ h(i) = b_i] = \frac{1}{t^{|I|}}. \]
We remark that to prove Theorem 3.2 we could take the hash family to be the set of all functions. However, we work with a $(2 \log k)$-wise independent family, as the analysis is no more complicated and we need to work with such hash families when constructing pseudorandom generators.

For $S \subseteq [n]$, let $W_S$ be the matrix formed by the rows of $W$ with indices in $S$ (thus $W_S^i$ is the $i$'th column of the submatrix whose rows are given by the indices in $S$). Define
\[ H(W) \stackrel{\mathrm{def}}{=} \sum_{i=1}^{t} \bigg( \mathbb{E}_h \bigg[ \sum_{p=1}^{k} \| W^p_{h^{-1}(i)} \|^{4 \log k} \bigg] \bigg)^{1/\log k}. \]

Theorem 3.2 follows immediately from the following two lemmas.

Lemma 4.1.
For $\varepsilon$-regular $W$, $H(W) \leq C \log k\, (\varepsilon \log(1/\varepsilon))$.

Lemma 4.2. For any smooth function $\psi : \mathbb{R}^k \to \mathbb{R}$,
\[ \Big| \mathbb{E}_{X \leftarrow \mu^n}\big[ \psi(W^T X) \big] - \mathbb{E}_{Y \leftarrow \mathcal{N}^n}\big[ \psi(W^T Y) \big] \Big| \leq 4 c_\mu^2 (\log^2 k)\, H(W)\, \|\psi^{(4)}\|_1. \]

Proof of Lemma 4.1. Fix $l \in [t]$, $p \in [k]$. For $i \in [n]$, let $X_i$ be the indicator random variable that is 1 if $h(i) = l$ and 0 otherwise. Then $\Pr[X_i = 1] = 1/t$ and the variables $X_1, \ldots, X_n$ are $(2 \log k)$-wise independent. Further,
\[ Z'_p \equiv \| W^p|_{h^{-1}(l)} \|^2 = \sum_{i=1}^{n} W_{ip}^2 X_i. \]
Let $Y_i$ be i.i.d. indicator random variables with $\Pr[Y_i = 1] = 1/t$ and let $Z_p = \sum_{i=1}^{n} W_{ip}^2 Y_i$. Observe that $Z'_p$ and $Z_p$ have identical $d$'th moments for $d \leq 2 \log k$. Moreover, by Hoeffding's inequality applied to $Z_p$, for any $\gamma > 0$,
\[ \Pr\Big[ \Big| Z_p - \frac{1}{t} \Big| \geq \gamma \Big] \leq 2 \exp\bigg( \frac{-2\gamma^2}{\sum_{i=1}^n W_{ip}^4} \bigg) \leq 2 \exp\Big( \frac{-2\gamma^2}{\varepsilon^2} \Big) = 2 \exp(-2 t^2 \gamma^2). \]
The above tail bound for $Z_p$ implies strong bounds on the moments of $Z_p$ by standard arguments. Setting $\gamma = \sqrt{2 \log k \log t}/t$ in the above equation, we get
\[ \Pr\bigg[ |Z_p| \geq \frac{\sqrt{3 \log k \log t}}{t} \bigg] \leq \frac{1}{t^{2 \log k}}. \]
Therefore, from the above equation and the fact that $Z_p \leq 1$,
\[ \mathbb{E}\big[ Z_p^{2 \log k} \big] \leq \frac{(3 \log k \log t)^{\log k}}{t^{2 \log k}} + \Pr\bigg[ |Z_p| \geq \frac{\sqrt{3 \log k \log t}}{t} \bigg] \leq \frac{(4 \log k \log t)^{\log k}}{t^{2 \log k}}. \]
Therefore,
\[ \mathbb{E}_{h \in_u \mathcal{H}}\big[ \| W^p_{h^{-1}(l)} \|^{4 \log k} \big] = \mathbb{E}\big[ (Z'_p)^{2 \log k} \big] = \mathbb{E}\big[ Z_p^{2 \log k} \big] \leq \frac{(4 \log k \log t)^{\log k}}{t^{2 \log k}}. \]
Therefore, from the definition of $H(W)$ and the above equation,
\[ H(W) = \sum_{i=1}^{t} \bigg( \sum_{p=1}^{k} \mathbb{E}_h\big[ \| W^p_{h^{-1}(i)} \|^{4 \log k} \big] \bigg)^{1/\log k} \leq t \cdot \frac{4 \log k \log t}{t^2} = 4 (\log k)(\varepsilon \log(1/\varepsilon)). \]

The proof of Lemma 4.2 uses a blockwise hybrid argument and careful applications of hypercontractivity, as sketched in the proof outline in the introduction.
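The quantity $H(W)$ can also be checked numerically. The sketch below estimates it by Monte Carlo, with fully random hash functions standing in for a $(2\log k)$-wise independent family (sufficient for the analysis above) and $\log$ taken base 2; all names and parameters are ours.

```python
import numpy as np

def H_param(W, t, num_hashes=200, seed=0):
    """Monte Carlo estimate of
        H(W) = sum_i ( E_h [ sum_p ||W^p_{h^-1(i)}||^{4 log k} ] )^{1/log k},
    using fully random h : [n] -> [t] as a stand-in for a (2 log k)-wise
    independent hash family."""
    n, k = W.shape
    L = max(int(np.log2(k)), 1)                  # integer stand-in for log k
    rng = np.random.default_rng(seed)
    hashes = rng.integers(0, t, size=(num_hashes, n))
    total = 0.0
    for i in range(t):
        inner = 0.0
        for h in hashes:
            block = W[h == i, :]                  # rows of W hashed to bucket i
            norms = np.linalg.norm(block, axis=0) # ||W^p_{h^-1(i)}|| per column p
            inner += np.sum(norms ** (4 * L))
        total += (inner / num_hashes) ** (1.0 / L)
    return total

# A regular example: every entry 1/4, so each of the 4 columns has unit norm.
W = np.full((16, 4), 0.25)
val = H_param(W, t=4)
print(val)
```

For regular $W$ such as this one, the estimate is a small constant, in line with the $C \log k\, (\varepsilon \log(1/\varepsilon))$ bound of Lemma 4.1.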
To gain some intuition for the advantage of our randomized blockwise hybrid argument over the standard Lindeberg method, it might be helpful to compare both arguments in the following cases:

Example 1: The bounding hyperplanes of $K$ are oriented majorities: $W \in \{1/\sqrt{n}, -1/\sqrt{n}\}^{n \times k}$. In this case, the standard Lindeberg method in conjunction with Bentkus's smoothing function and Nazarov's surface area bound as used in Lemmas 3.3, 3.4 can be adapted (without having to do a blockwise hybrid argument) to get a bound as in Theorem 3.1.

Example 2: The bounding hyperplanes of $K$ are oriented majorities on disjoint sets of variables: for $m = n/k$ and each $p \in [k]$, $W^p_i = 1/\sqrt{m}$ for $(p-1)m + 1 \leq i \leq pm$ and $W^p_i = 0$ otherwise. In this case, however, when $m \geq 1/\varepsilon^2$ (so each bounding hyperplane is still regular), it is easy to see that the standard Lindeberg method (even when used in conjunction with Lemmas 3.3, 3.4) leads to an error bound that is at least linear in $k$.

We use the following form of the standard Taylor series expansion (the interested reader can find more about the multivariate Taylor theorem on the Wikipedia page for "Taylor's Theorem"). For a smooth function $\psi : \mathbb{R}^k \to \mathbb{R}$, $x \in \mathbb{R}^k$ and $p_1, \ldots, p_r \in [k]$, let $\partial_{p_1, \ldots, p_r} \psi(x) = \partial_{p_1} \partial_{p_2} \cdots \partial_{p_r} \psi(x)$. For indices $p_1, \ldots, p_r \in [k]$, let $(p_1, \ldots, p_r)! = s_1! s_2! \cdots s_k!$, where, for $l \in [k]$, $s_l$ denotes the number of occurrences of $l$ in $(p_1, \ldots, p_r)$.

Fact 4.3 (multivariate Taylor's theorem). For any smooth function $\psi : \mathbb{R}^k \to \mathbb{R}$ and $x, y \in \mathbb{R}^k$,
\[ \psi(x + y) = \psi(x) + \sum_{p \in [k]} \partial_p \psi(x)\, y_p + \sum_{p,q \in [k]} \frac{1}{(p,q)!} \partial_{p,q} \psi(x)\, y_p y_q + \sum_{p,q,r \in [k]} \frac{1}{(p,q,r)!} \partial_{p,q,r} \psi(x)\, y_p y_q y_r + \mathrm{err}(x, y), \]
where $|\mathrm{err}(x,y)| \leq \|\psi^{(4)}\|_1 \cdot \max_{p \in [k]} |y_p|^4$.

Proof of Lemma 4.2.
Let $X \leftarrow \mu^n$ and $Y \leftarrow \mathcal{N}^n$. We first partition $[n]$ into blocks using a random hash function $h \in_u \mathcal{H}$ and then use a blockwise hybrid argument.

Fix a hash function $h \in \mathcal{H}$. View $X$ as $X^1, \ldots, X^t$, where each $X^l = X_{h^{-1}(l)}$ is chosen independently and uniformly from $\mu^{|h^{-1}(l)|}$. Similarly, view $Y$ as $Y^1, \ldots, Y^t$, where each $Y^l = Y_{h^{-1}(l)}$ is chosen independently and uniformly from $\mathcal{N}^{|h^{-1}(l)|}$. We prove the claim via a hybrid argument where we replace the blocks $X^1, \ldots, X^t$ with $Y^1, \ldots, Y^t$ one at a time. For $0 \leq i \leq t$, let $Z^i$ be the distribution with $Z^i_{h^{-1}(j)} = X^j$ for $i < j \leq t$ and $Z^i_{h^{-1}(j)} = Y^j$ for $1 \leq j \leq i$. Then $Z^0$ is distributed as $\mu^n$ and $Z^t$ is distributed as $\mathcal{N}^n$. For $l \in [t]$, let
\[ h(W, l) = \bigg( \sum_{p=1}^{k} \| W^p_{h^{-1}(l)} \|^{4 \log k} \bigg)^{1/\log k}. \]

Claim 4.4. For $1 \leq l \leq t$ and fixed $h \in \mathcal{H}$,
\[ \Big| \mathbb{E}_{X,Y}\big[ \psi(W^T Z^l) \big] - \mathbb{E}_{X,Y}\big[ \psi(W^T Z^{l-1}) \big] \Big| \leq C c_\mu^2 (\log^2 k)\, \|\psi^{(4)}\|_1\, h(W, l). \]

Proof. Without loss of generality, suppose that $h^{-1}(l) = \{1, \ldots, m\}$. Note that $Z^l, Z^{l-1}$ have the same random variables in positions $m+1, \ldots, n$. Let $Z^{l-1} = (X_1, \ldots, X_m, Z_{m+1}, \ldots, Z_n)$ and $Z^l = (Y_1, \ldots, Y_m, Z_{m+1}, \ldots, Z_n)$, where $(X_1, \ldots, X_m)$ is distributed according to $\mu^m$ and $(Y_1, \ldots, Y_m)$ according to $\mathcal{N}^m$. Note that $(Z_{m+1}, \ldots, Z_n)$ is independent of $(X_1, \ldots, X_m)$ and $(Y_1, \ldots, Y_m)$.

Let $W_1 \in \mathbb{R}^{m \times k}$ be the matrix formed by the first $m$ rows of $W$ and similarly let $W_2 \in \mathbb{R}^{(n-m) \times k}$ be the matrix formed by the last $n - m$ rows of $W$. Lastly, let $V = W_2^T (Z_{m+1}, \ldots, Z_n)$ and let $U$ be one of $X = (X_1, \ldots, X_m)$ or $Y = (Y_1, \ldots, Y_m)$. Now, by using a Taylor expansion of $\psi$ at $V$ as in Fact 4.3,
\[ \psi(W^T(U_1, \ldots, U_m, Z_{m+1}, \ldots, Z_n)) = \psi(W_1^T U + V) = \psi(V) + \sum_{p \in [k]} \partial_p \psi(V) \langle W_1^p, U \rangle + \sum_{p,q \in [k]} \frac{1}{(p,q)!} \partial_{p,q} \psi(V) \langle W_1^p, U \rangle \langle W_1^q, U \rangle + \sum_{p,q,r \in [k]} \frac{1}{(p,q,r)!} \partial_{p,q,r} \psi(V) \langle W_1^p, U \rangle \langle W_1^q, U \rangle \langle W_1^r, U \rangle + \mathrm{err}(V, W_1^T U). \tag{4.1} \]
Now, using the fact that $\|z\|_\infty \leq \|z\|_{\log k}$ for $z \in \mathbb{R}^k$,
\[ | \mathrm{err}(V, W_1^T U) | \leq \|\psi^{(4)}\|_1 \cdot \max_{p \in [k]} |\langle W_1^p, U \rangle|^4 \leq \|\psi^{(4)}\|_1 \bigg( \sum_{p=1}^{k} |\langle W_1^p, U \rangle|^{4 \log k} \bigg)^{1/\log k}. \tag{4.2} \]
Now, by hypercontractivity of $\mu$,
\[ \mathbb{E}_X \bigg[ \bigg( \sum_{p=1}^{k} |\langle W_1^p, X \rangle|^{4 \log k} \bigg)^{1/\log k} \bigg] \leq \bigg( \mathbb{E}_X \bigg[ \sum_{p=1}^{k} |\langle W_1^p, X \rangle|^{4 \log k} \bigg] \bigg)^{1/\log k} \quad \text{(by the power-mean inequality)} \]
\[ = \bigg( \sum_{p=1}^{k} \mathbb{E}_X \big[ |\langle W_1^p, X \rangle|^{4 \log k} \big] \bigg)^{1/\log k} \leq \bigg( \sum_{p=1}^{k} (c_\mu \log k)^{2 \log k} \| W_1^p \|^{4 \log k} \bigg)^{1/\log k} \quad \text{(by hypercontractivity of } \mu\text{)} \]
\[ \leq C c_\mu^2 (\log^2 k)\, h(W, l). \tag{4.3} \]
Similarly, by hypercontractivity of $\mathcal{N}$,
\[ \mathbb{E}_Y \bigg[ \bigg( \sum_{p=1}^{k} |\langle W_1^p, Y \rangle|^{4 \log k} \bigg)^{1/\log k} \bigg] \leq C (\log^2 k)\, h(W, l). \tag{4.4} \]
Since $\mu$ is proper, for any $u_1, u_2, u_3 \in \mathbb{R}^m$,
\[ \mathbb{E}\big[ \langle u_1, X \rangle \big] = \mathbb{E}\big[ \langle u_1, Y \rangle \big], \qquad \mathbb{E}\big[ \langle u_1, X \rangle \langle u_2, X \rangle \big] = \mathbb{E}\big[ \langle u_1, Y \rangle \langle u_2, Y \rangle \big], \]
\[ \mathbb{E}\big[ \langle u_1, X \rangle \langle u_2, X \rangle \langle u_3, X \rangle \big] = \mathbb{E}\big[ \langle u_1, Y \rangle \langle u_2, Y \rangle \langle u_3, Y \rangle \big]. \]
From the above equations, Equations (4.1), (4.2), (4.3), (4.4) and the fact that $X, Y, V$ are independent of one another, it follows that
\[ \Big| \mathbb{E}\big[ \psi(W^T Z^l) - \psi(W^T Z^{l-1}) \big] \Big| \leq C c_\mu^2 (\log^2 k)\, \|\psi^{(4)}\|_1\, h(W, l). \]

Lemma 4.2 now follows from the above claim, summing over $l = 1, \ldots, t$, and taking the expectation with respect to $h \in_u \mathcal{H}$.

5 Lower Bound for the Error

We now show that Theorem 3.1 is essentially optimal by showing that the error bound in any such result cannot be $o(\varepsilon \cdot \sqrt{\log k})$. We do so by constructing a $(1/\sqrt{\log k})$-regular $k$-polytope for which the error is $1 - o(1)$.
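The gap driving this lower bound is easy to observe numerically: for the $\ell_1$ ball of radius $r$ in $\mathbb{R}^r$ (the construction used below), no hypercube point lies inside, while a standard Gaussian vector lands inside with overwhelming probability. A quick Monte Carlo check (parameters are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
r, trials = 50, 5000

# Every hypercube point has ||x||_1 = r exactly, so none lie in the open ball.
X = rng.choice([-1.0, 1.0], size=(trials, r))
p_cube = np.mean(np.abs(X).sum(axis=1) < r)

# E||Y||_1 = r * sqrt(2/pi) < r, and ||.||_1 concentrates (cf. Theorem 2.4),
# so a Gaussian vector lands inside with probability 1 - exp(-Omega(r)).
Y = rng.standard_normal((trials, r))
p_gauss = np.mean(np.abs(Y).sum(axis=1) < r)

print(p_cube, p_gauss)   # the gap |p_cube - p_gauss| is 1 - o(1)
```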
Let $k = 2^r$ and let $K = \{x : \sum_{i=1}^{r} |x_i| < r\} \subseteq \mathbb{R}^r$ be the $\ell_1$-ball of dimension $r$ and radius $r$. We will use the following simple facts.

Fact 5.1. $K$ is a $(1/\sqrt{r})$-regular $k$-polytope.

Proof. Note that $K = \{x \in \mathbb{R}^r : \langle x, y \rangle < r,\ \forall y \in \{1,-1\}^r\}$. As the vectors $y \in \{1,-1\}^r$ are $(1/\sqrt{r})$-regular, the claim follows.

Fact 5.2. For $X \in_u \{1,-1\}^r$ and $Y \leftarrow \mathcal{N}^r$, $| \Pr[X \in K] - \Pr[Y \in K] | = 1 - o(1)$.

Proof. Clearly, $\Pr[X \in K] = 0$. We next show that $\Pr[Y \in K] = 1 - \exp(-\Omega(r))$. Observe that $Y \in K$ if and only if $\|Y\|_1 < r$. By linearity of expectation, $\mathbb{E}[\|Y\|_1] = r \cdot \mathbb{E}_{y \leftarrow N(0,1)}[|y|] = rc$, where $c = \sqrt{2/\pi}$ is a constant strictly less than 1. Note that by Cauchy–Schwarz, the $\ell_1$ norm has Lipschitz constant $\sqrt{r}$. Therefore, by Theorem 2.4,
\[ \Pr[\|Y\|_1 \geq r] \leq \Pr\big[ |\|Y\|_1 - cr| \geq (1-c) r \big] \leq 2 \exp\big( -(1-c)^2 r^2 / 2r \big) = \exp(-\Omega(r)). \]
Thus, $\Pr[Y \in K] = \Pr[\|Y\|_1 < r] = 1 - \exp(-\Omega(r))$. The claim now follows.

The above two claims show that any invariance principle as in Theorem 3.1 must incur an error of $\Omega(\varepsilon \sqrt{\log k})$, which matches the bound of Theorem 3.1 up to a polylogarithmic factor in $k$ and a polynomial factor in $\varepsilon$.

6 Noise Sensitivity of Intersections of Regular Halfspaces

We now describe how our invariance principle yields a bound on the average and noise sensitivity of intersections of regular halfspaces (see Definition 2.5 for the definition of (Boolean) noise sensitivity). Let $f_1, \ldots, f_k : \{1,-1\}^n \to \{1,-1\}$ be halfspaces with $f_p(x) = \mathrm{sign}(\langle W^p, x \rangle - \theta_p)$ and let $f_\wedge^k : \{1,-1\}^n \to \{1,-1\}$ be their intersection, $f_\wedge^k = f_1 \wedge f_2 \wedge \cdots \wedge f_k$.

Theorem 6.1. For $f_\wedge^k$ $\varepsilon$-regular,
\[ \mathrm{NS}_\delta(f_\wedge^k) \leq C (\log^{1.6}(k/\delta)) (\varepsilon^{1/6} + \delta^{1/2}). \]
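The quantity bounded by Theorem 6.1 is straightforward to estimate by sampling. The sketch below runs the experiment of Definition 2.5 for an intersection of random regular halfspaces; the function and variable names (and the convention that the intersection is the indicator of the polytope) are ours.

```python
import numpy as np

def ns_intersection(W, theta, delta, trials=20000, seed=0):
    """Monte Carlo estimate of NS_delta(f) from Definition 2.5, where
    f(x) = 1 iff <W^p, x> >= theta_p for every column p of W."""
    rng = np.random.default_rng(seed)
    n = W.shape[0]
    X = rng.choice([-1.0, 1.0], size=(trials, n))
    flips = rng.random((trials, n)) < delta        # delta-perturbation of X
    Z = np.where(flips, -X, X)
    fX = np.all(X @ W >= theta, axis=1)
    fZ = np.all(Z @ W >= theta, axis=1)
    return float(np.mean(fX != fZ))

rng = np.random.default_rng(7)
n, k = 200, 3
W = rng.choice([-1.0, 1.0], size=(n, k)) / np.sqrt(n)   # (1/sqrt(n))-regular columns
ns = ns_intersection(W, theta=np.zeros(k), delta=0.01)
print(ns)   # small: each constituent halfspace has noise sensitivity ~ sqrt(delta)
```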
We prove the theorem by first reducing bounding the noise sensitivity of $f_\wedge^k$ to bounding the Boolean volume of $l_\infty$-neighborhoods of polytopes. We then use our invariance principle, Theorem 3.1, to prove the required bounds on the Boolean volume of boundaries of polytopes. As mentioned before, the above theorem implies an $n^{\log^{O(1)} k}$-time algorithm for learning intersections of regular halfspaces in the agnostic model for any constant error rate.

We use the following tail bound, which follows from Pinelis's subgaussian tail estimates [Pin94].

Fact 6.2. There exist absolute constants $c_1, c_2 > 0$ such that for all $w \in \mathbb{R}^m$ and $t > 0$,
\[ \Pr_{x \in_u \{1,-1\}^m}\big[ |\langle w, x \rangle| > t \|w\| \big] \leq c_1 \exp(-c_2 t^2). \]

The following claim says that for $\varepsilon$-regular $W$, random $x \in_u \{1,-1\}^n$, and a $\delta$-perturbation $y$ of $x$, $W^T x$ is close to $W^T y$ in $l_\infty$ distance.

Claim 6.3. For $x \in \{1,-1\}^n$, let $y(x)$ be a random $\delta$-perturbation of $x$. Then,
\[ \Pr_{x \in_u \{1,-1\}^n,\, y(x)}\big[ \| W^T x - W^T y(x) \|_\infty \geq \lambda \big] \leq 2\delta, \]
where $\lambda = C \log(k/\delta)^{1/2} \delta^{1/2} + C \log(k/\delta)^{3/4} \varepsilon^{1/2}$.

Proof. Let $Y = (Y_1, \ldots, Y_n)$ be i.i.d. indicator variables with $\Pr[Y_i = 1] = \delta$, and let $S(Y) = \mathrm{support}(Y)$. Now, for $p \in [k]$, $\| W^p_{S(Y)} \|^2 = \sum_{i=1}^{n} W_{ip}^2 Y_i$ and $\mathbb{E}\big[ \| W^p_{S(Y)} \|^2 \big] = \delta$. Further, since $W$ is $\varepsilon$-regular, by Hoeffding's inequality, for all $\gamma > 0$,
\[ \Pr\big[ \big| \| W^p_{S(Y)} \|^2 - \delta \big| \geq \gamma \big] \leq 2 \exp\bigg( \frac{-2\gamma^2}{\sum_i W_{ip}^4} \bigg) \leq 2 \exp\Big( \frac{-2\gamma^2}{\varepsilon^2} \Big). \]
Thus, by a union bound,
\[ \Pr_Y\big[ \exists p \in [k],\ \| W^p_{S(Y)} \|^2 \geq \delta + 2 \sqrt{\log(k/\delta)}\, \varepsilon \big] \leq \delta. \tag{6.1} \]
Note that for a fixed $Y$ and sufficiently large $C$, by Fact 6.2 and a union bound,
\[ \Pr_{x \in_u \{1,-1\}^n}\big[ \exists p \in [k],\ |\langle W^p_{S(Y)}, x_{S(Y)} \rangle| \geq C \sqrt{\log(k/\delta)}\, \| W^p_{S(Y)} \| \big] \leq \delta. \]
From Equation (6.1) and the above equation, we get that for a sufficiently large constant $C$,
\[ \Pr_{x \in_u \{1,-1\}^n,\, Y}\big[ \exists p \in [k],\ |\langle W^p_{S(Y)}, x_{S(Y)} \rangle| \geq C \log(k/\delta)^{1/2} \delta^{1/2} + C \log(k/\delta)^{3/4} \varepsilon^{1/2} \big] \leq 2\delta. \tag{6.2} \]
Now, observe that for $x \in \{1,-1\}^n$, to generate a $\delta$-perturbation $y(x)$ of $x$, we can first generate a random $Y$ as above and flip the bits of $x$ in the support of $Y$. Thus, from Equation (6.2),
\[ \Pr_{x \in_u \{1,-1\}^n,\, Y}\big[ \exists p \in [k],\ |\langle W^p, x \rangle - \langle W^p, y(x) \rangle| \geq \lambda \big] = \Pr_{x \in_u \{1,-1\}^n,\, Y}\big[ \exists p \in [k],\ |\langle W^p_{S(Y)}, x_{S(Y)} \rangle| \geq \lambda \big] \leq 2\delta, \]
where $\lambda = C \log(k/\delta)^{1/2} \delta^{1/2} + C \log(k/\delta)^{3/4} \varepsilon^{1/2}$. Therefore,
\[ \Pr_{x \in_u \{1,-1\}^n,\, Y}\big[ \| W^T x - W^T y(x) \|_\infty \geq \lambda \big] \leq 2\delta. \]

The following claim can be seen as an anti-concentration bound for regular polytopes over the hypercube and may be of independent interest:

Claim 6.4. For $\varepsilon$-regular $W \in \mathbb{R}^{n \times k}$, $\theta \in \mathbb{R}^k$, and $0 < \lambda < 1$,
\[ \Pr_{x \in_u \{1,-1\}^n}\big[ W^T x \in \mathrm{Rect}(\theta + \lambda \mathbf{1}_k) \setminus \mathrm{Rect}(\theta - \lambda \mathbf{1}_k) \big] \leq C (\log^{1.6} k)(\varepsilon \log(1/\varepsilon))^{1/5} + \sqrt{\log k}\, \lambda. \]

Proof. Follows directly from Theorem 3.1 and Lemma 3.4.

We can now prove Theorem 6.1.

Proof of Theorem 6.1. Note that for $x, y \in \mathbb{R}^n$, $f_\wedge^k(x) \neq f_\wedge^k(y)$ implies that $W^T x \in \mathrm{Rect}(\theta + \gamma \mathbf{1}_k) \setminus \mathrm{Rect}(\theta - \gamma \mathbf{1}_k)$, where $\gamma = \| W^T x - W^T y \|_\infty$. Hence,
\[ \mathrm{NS}_\delta(f_\wedge^k) = \Pr_{x \in_u \{1,-1\}^n,\, Y}\big[ f_\wedge^k(x) \neq f_\wedge^k(y(x)) \big] \]
\[ \leq \Pr_{x \in_u \{1,-1\}^n,\, Y}\big[ f_\wedge^k(x) \neq f_\wedge^k(y(x)) \;\big|\; \| W^T x - W^T y(x) \|_\infty \leq \lambda \big] + 2\delta \quad \text{(Claim 6.3)} \]
\[ \leq \Pr_{x \in_u \{1,-1\}^n}\big[ W^T x \in \mathrm{Rect}(\theta + \lambda \mathbf{1}_k) \setminus \mathrm{Rect}(\theta - \lambda \mathbf{1}_k) \big] + 2\delta \]
\[ \leq C (\log^{1.6} k)(\varepsilon \log(1/\varepsilon))^{1/5} + \sqrt{\log k}\, \lambda + 2\delta. \quad \text{(Claim 6.4)} \]
The theorem now follows.

Applying Lemma 2.7 and Theorem 2.8 together with Theorem 6.1, we immediately obtain our main result for learning intersections of halfspaces, namely Theorem 1.5.
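A toy version of this learning pipeline is easy to sketch: fit a low-degree multilinear polynomial to noisy labels of an intersection of two majorities, then threshold it. We use an ordinary least-squares fit as a stand-in for the $L_1$ regression of [KKMS08]; all names, the target function, and the noise rate are illustrative.

```python
import numpy as np
from itertools import combinations

def features(X):
    """All multilinear monomials of degree <= 2 over {-1,1}^n."""
    n = X.shape[1]
    cols = [np.ones(len(X))] + [X[:, i] for i in range(n)]
    cols += [X[:, i] * X[:, j] for i, j in combinations(range(n), 2)]
    return np.column_stack(cols)

rng = np.random.default_rng(0)
n, m = 8, 2000
X = rng.choice([-1.0, 1.0], size=(m, n))
# Target: AND of two majorities on disjoint halves of the variables.
y = np.where((X[:, :4].sum(axis=1) >= 0) & (X[:, 4:].sum(axis=1) >= 0), 1.0, -1.0)
noisy = y * np.where(rng.random(m) < 0.05, -1.0, 1.0)   # 5% adversarial-style label noise

Phi = features(X)
coef, *_ = np.linalg.lstsq(Phi, noisy, rcond=None)      # L2 stand-in for L1 regression
h = np.sign(Phi @ coef)                                 # threshold the fitted polynomial
err = np.mean(h != y)
print(err)
```

Because the target is noise stable, most of its Fourier mass sits on low-degree monomials (Lemma 2.7), so the thresholded fit recovers the labels well beyond the noise rate.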
7 Pseudorandom Generators for Polytopes

We now prove our main theorems for constructing pseudorandom generators for polytopes with respect to a variety of distributions (Theorems 1.7, 1.9, and 1.10). The results in this section are based on a recent PRG construction due to Meka and Zuckerman [MZ10] for polynomial threshold functions using the invariance principle of Mossel et al. [MOO05]. A closer look at their construction reveals a general program for constructing PRGs from invariance principles. Given this observation, it is natural to ask whether our invariance principle can be used to construct PRGs for regular polytopes. Indeed it can, and we use the Meka–Zuckerman generator but with a different setting of its parameters. The analysis, however, is a little more complicated in our setting (even given our invariance principle) and requires a careful application of hypercontractivity.

7.1 Main Generator Construction

We begin by describing the construction of the PRG we use; it is a slightly modified version of the PRG used by [MZ10] to fool regular halfspaces (i.e., the case $k = 1$). Given $\delta \in (0,1)$, let $\varepsilon = \Omega(\delta^6 / \log^{9.6} k)$ be such that $\log^{1.6} k\, (\varepsilon \log(1/\varepsilon))^{1/5} = \delta$. Let $t = 1/\varepsilon$ and let $\mathcal{H} = \{h : [n] \to [t]\}$ be a $(2 \log k)$-wise independent family of hash functions. That is, for all $I \subseteq [n]$, $|I| \leq 2 \log k$ and $b \in [t]^I$,
\[ \Pr_{h \in_u \mathcal{H}}[\forall i \in I,\ h(i) = b_i] = \frac{1}{t^{|I|}}. \]
Efficient constructions of hash families $\mathcal{H}$ as above with $|\mathcal{H}| = O(n^{2 \log k})$ are known. To avoid some technical issues that can be overcome easily, we assume that every hash function $h \in \mathcal{H}$ is equi-distributed in the following sense: for all $j \in [t]$, $|\{i : h(i) = j\}| = n/t$.

Let $m = n/t$ and let $G_0 : \{0,1\}^s \to \{1,-1\}^m$ generate a $(4 \log k)$-wise independent distribution over $\{1,-1\}^m$.
That is, for all $I \subseteq [m]$, $|I| \leq 4 \log k$ and $b \in \{1,-1\}^I$,
\[ \Pr_{x = G_0(z),\, z \in_u \{0,1\}^s}[\forall i \in I,\ x_i = b_i] = \frac{1}{2^{|I|}}. \]
Efficient constructions of generators $G_0$ as above with $s = O(\log k \log n)$ are known [NN93].

Given a hash family and a generator $G_0$ as above, we consider the following generator. Define $G : \mathcal{H} \times (\{0,1\}^s)^t \to \{1,-1\}^n$ by $G(h, z^1, \ldots, z^t) = x$, where $x_{h^{-1}(i)} = G_0(z^i)$ for $i \in [t]$.

7.2 Pseudorandom Generators for Regular Polytopes

We now argue that the generator $G$ defined in the last section fools regular polytopes, and prove Theorem 1.7.

Proof of Theorem 1.7. The bound on the seed length of the generator $G$ follows from the construction. The following statement follows from an argument similar to that of the proof of Theorem 3.2: for any smooth function $\psi : \mathbb{R}^k \to \mathbb{R}$ and $\varepsilon$-regular $W$,
\[ \Big| \mathbb{E}_{y \in_u \{0,1\}^r}\big[ \psi(W^T G(y)) \big] - \mathbb{E}_{Y \leftarrow \mathcal{N}^n}\big[ \psi(W^T Y) \big] \Big| \leq C \log^3 k\, (\varepsilon \log(1/\varepsilon))\, \|\psi^{(4)}\|_1. \tag{7.1} \]
Indeed, observe that Lemma 4.1 holds for any $(2 \log k)$-wise independent family of hash functions, and that the proof of Lemma 4.2 relies only on two key properties of $X \leftarrow \mu^n$: (1) for a fixed hash function $h$, the blocks $X_{h^{-1}(1)}, X_{h^{-1}(2)}, \ldots, X_{h^{-1}(t)}$ are independent of one another; (2) for a fixed hash function $h$, the distribution of each block $X_{h^{-1}(j)}$ satisfies $(2, 2\log k)$-hypercontractivity for all $j \in [t]$. In other words, we used the property that for all $j \in [t]$ and $u \in \mathbb{R}^{|h^{-1}(j)|}$,
\[ \mathbb{E}\big[ |\langle u, X_{h^{-1}(j)} \rangle|^{4 \log k} \big] \leq (C \log k)^{2 \log k} \|u\|^{4 \log k}. \tag{7.2} \]
Note that $X$ generated according to the generator $G$ satisfies both of the above conditions: (1) for a fixed function $h$, the blocks are independent by definition, and (2) the hypercontractivity inequality (7.2) only involves the first $(4 \log k)$ moments of the distribution of $X_{h^{-1}(j)}$.
As a consequence, inequality (7.2) holds for any $(4 \log k)$-wise independent distribution over $\{1,-1\}^{|h^{-1}(j)|}$.

We can now move from closeness in expectation to closeness in cdf distance by an argument similar to the proof of Theorem 3.1, where we use Equation (7.1) instead of Theorem 3.2, to get
\[ \Big| \Pr_{y \in_u \{0,1\}^r}[G(y) \in K] - \Pr_{Y \leftarrow \mathcal{N}^n}[Y \in K] \Big| \leq \delta. \]
The theorem now follows from the above equation and Theorem 3.1.

7.2.1 Approximate Counting for Integer Programs

The PRG from Theorem 1.7, coupled with enumeration over all possible seeds, immediately implies a quasi-polynomial-time deterministic algorithm for approximately counting, within a small additive error, the number of solutions to "regular" $\{0,1\}$-integer programs. It turns out that "regular" integer programs correspond to a broad class of well-studied combinatorial problems. For example, we obtain deterministic, approximate counting algorithms for dense set cover problems and $\{0,1\}$-contingency tables. We obtain quasi-polynomial-time algorithms even when there are a polynomial number of constraints (or a polynomial number of rows in the contingency table setting). As far as we know, there is no prior work giving nontrivial deterministic algorithms for counting solutions to integer programs with many constraints. Here we discuss the case of dense set cover instances and remark that we get similar results for the special case of counting contingency tables.

Covering integer programs are a fundamental class of integer programs and can be formulated as follows:
\[ \min \sum_i X_i \quad \text{s.t.} \quad \sum_i a_{ij} X_i \geq c_j,\ j = 1, \ldots, k, \qquad X \in \{0,1\}^n, \tag{7.3} \]
where the coefficients of the constraints $a_{ij}$ and $c_j$ are all non-negative.
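The object being counted can be made concrete on a toy instance. The sketch below exactly counts, by brute-force enumeration, the feasible points of a small covering program of the form (7.3); the PRG-based algorithm approximates the corresponding fraction additively by enumerating generator seeds instead of all $2^n$ points. The instance and names are illustrative.

```python
import itertools
import numpy as np

def count_cover_solutions(A, c):
    """Exact count of X in {0,1}^n with A^T X >= c componentwise -- the
    quantity the seed-enumeration algorithm approximates additively."""
    n = A.shape[0]
    count = 0
    for bits in itertools.product([0, 1], repeat=n):
        X = np.array(bits)
        if np.all(A.T @ X >= c):
            count += 1
    return count

# Tiny set-cover instance: 4 sets over a universe of 2 elements.
A = np.array([[1, 1],      # A[i, j] = 1 iff element j belongs to set S_i
              [1, 0],
              [0, 1],
              [1, 1]])
c = np.array([1, 1])       # every element must be covered at least once
sols = count_cover_solutions(A, c)
print(sols, sols / 2**4)   # number and fraction of covering subfamilies
```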
An important special class of covering integer programs is set cover, which in turn is a generalization of many important problems in combinatorial optimization such as edge cover and multidimensional $\{0,1\}$-knapsack. In the standard set cover problem, the input is a family of sets $S_1, \ldots, S_n$ over a universe $U$ of size $k$ and an integer $t$. The goal is to find a subfamily of sets $\mathcal{C}$ such that $|\mathcal{C}| \leq t$ and the union of all the sets in $\mathcal{C}$ equals $U$. This corresponds to a covering program (as given below) with $k$ constraints and $n$ unknowns from $\{0,1\}$:
\[ \min \sum_{i=1}^{n} X_i \quad \text{s.t.} \quad \sum_{i : j \in S_i} X_i \geq 1,\ j \in U, \qquad X \in \{0,1\}^n. \tag{7.4} \]
Call an instance of set cover $\varepsilon$-dense if each element in $U$ appears in at least $1/\varepsilon^2$ of the different sets $S_i$. Clearly, all the linear constraints that appear in Equation (7.4) are $\varepsilon$-regular if the set cover instance is $\varepsilon$-dense. These constraints continue to be $\varepsilon$-regular even after translating from $\{0,1\}$ to $\{1,-1\}$ and appropriate normalization. Thus, using the generator from Theorem 1.7 and enumerating over all seeds to the generator, we have the following:

Theorem 7.1. There exists a deterministic algorithm that, given an instance of an $\varepsilon$-dense set covering problem with $k$ constraints and $n$ sets, approximates the number of solutions to within an additive error of at most $\delta 2^n$ in time $n^{\mathrm{poly}(\log k, 1/\delta)}$, as long as $\varepsilon \leq \delta^5 / (\log^{8.1} k)(\log(1/\delta))$.

We now elaborate on approximately counting the number of $\{0,1\}$-contingency tables. The problem of counting $\{0,1\}$-contingency tables is the following: given positive integers $n, k$ with $n > k$, and vectors $r = (r_1, \ldots, r_n) \in \mathbb{Z}^n$, $c = (c_1, \ldots, c_k) \in \mathbb{Z}^k$, we wish to count the number of solutions, $CT(r, c)$, to the following integer program, whose solutions are matrices $X \in \{0,1\}^{n \times k}$ with row and column sums given by $r, c$.
\[ \text{Find } X \in \{0,1\}^{n \times k} \quad \text{s.t.} \quad \sum_j X_{ij} = r_i,\ 1 \leq i \leq n, \qquad \sum_i X_{ij} = c_j,\ 1 \leq j \leq k. \]
Observe that, after translating from $\{0,1\}$ to $\{1,-1\}$ and appropriately normalizing, solutions to the above integer program correspond to points from $\{1,-1\}^{n \times k}$ that lie in an intersection of $2(n + k)$ halfspaces, each of which is $(1/\sqrt{k})$-regular (recall that the notion of regularity does not depend on the values of the $r_i$'s or $c_j$'s). Thus, as with dense instances of set cover, we can use Theorem 1.7 to count the number of $\{0,1\}$-contingency tables:

Theorem 7.2. There exists a deterministic algorithm that on input $r \in \mathbb{Z}^n$, $c \in \mathbb{Z}^k$, approximates $CT(r, c)/2^{nk}$, the fraction of $\{0,1\}$-contingency tables with sums $r, c$, to within additive error $\delta$, and runs in time $n^{\mathrm{poly}(\log k, 1/\delta)}$.

We remark that using results of Wolff [Wol07], who shows hypercontractivity for various discrete distributions, we can approximately count the number of solutions to dense set cover instances and contingency tables over most natural domains.

7.3 Pseudorandom Generators for Polytopes in Gaussian Space

We now prove Theorem 1.9. We use an idea of Ailon and Chazelle [AC09] and the invariance of the Gaussian measure under unitary rotations to obtain PRGs with respect to $\mathcal{N}^n$ for all polytopes. Similar ideas were used by Meka and Zuckerman to obtain PRGs for spherical caps (i.e., the case of one hyperplane). In our setting, we must prove that, with respect to a random rotation, all of the bounding hyperplanes become regular with high probability. Such a tail bound requires applying hypercontractivity.

Let $H \in \mathbb{R}^{n \times n}$ be the normalized Hadamard matrix with $H H^T = I_n$ and $H_{ij} \in \{1/\sqrt{n}, -1/\sqrt{n}\}$.
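Products with the normalized Hadamard matrix $H$ can be computed in $O(n \log n)$ time by the fast Walsh–Hadamard transform. The sketch below computes $v = H D(x) w$ for the worst-case (maximally irregular) unit vector $w = e_1$ and checks that the random rotation spreads its mass out; a uniformly random sign vector $x$ stands in for the limited-independence distribution that we derandomize below.

```python
import numpy as np

def fwht(v):
    """Normalized fast Walsh-Hadamard transform (len(v) a power of two).
    Computes H v for the n x n Hadamard matrix with H H^T = I_n."""
    v = np.asarray(v, dtype=float).copy()
    h = 1
    while h < len(v):
        for i in range(0, len(v), 2 * h):
            a = v[i:i + h].copy()
            b = v[i + h:i + 2 * h].copy()
            v[i:i + h] = a + b
            v[i + h:i + 2 * h] = a - b
        h *= 2
    return v / np.sqrt(len(v))

n = 1024
w = np.zeros(n); w[0] = 1.0                     # maximally irregular unit vector
rng = np.random.default_rng(0)
x = rng.choice([-1.0, 1.0], size=n)             # random signs: the diagonal of D(x)
v = fwht(x * w)                                  # v = H D(x) w
# All entries of v are +-1/sqrt(n), so ||v||_4^4 drops from 1 to exactly 1/n.
print(np.sum(v ** 4))
```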
Ailon and Chazelle show that for any w ∈ R^n and a random diagonal matrix D with uniformly random {1,−1} entries, the vector HDw is regular with high probability. We derandomize their observation using hypercontractivity. For a vector x ∈ R^n, let D(x) ∈ R^{n×n} be the diagonal matrix with diagonal entries x.

Lemma 7.3. There exists a constant C > 0 such that the following holds. For any w ∈ R^n with ||w|| = 1, any 0 < δ < 1, and any (C log(k/δ))-wise independent distribution D over {1,−1}^n,
\[
\Pr_{x \leftarrow \mathcal{D}}\left[ \|H D(x) w\|_4^4 \ge C \log^2(k/\delta)/n \right] \le \delta/k.
\]
Proof. Fix w ∈ R^n and a (C log(k/δ))-wise independent distribution D, for a constant C to be chosen later. Let the random variable Z = ||H D(x) w||_4^4 = \(\sum_i (\sum_l H_{il} x_l w_l)^4\) for x ← D. Note that x satisfies (2,q)-hypercontractivity for q ≤ C log(k/δ). Now,
\[
\begin{aligned}
\mathbb{E}\left[Z^2\right] &= \sum_{i,j} \mathbb{E}\left[ \Big(\sum_l H_{il} x_l w_l\Big)^4 \Big(\sum_{l'} H_{jl'} x_{l'} w_{l'}\Big)^4 \right] \\
&\le \sum_{i,j} \sqrt{ \mathbb{E}\Big[\Big(\sum_l H_{il} x_l w_l\Big)^8\Big] \cdot \mathbb{E}\Big[\Big(\sum_l H_{jl} x_l w_l\Big)^8\Big] } && \text{(Cauchy--Schwarz inequality)} \\
&\le \sum_{i,j} 8^4 \left( \mathbb{E}\Big[\Big(\sum_l H_{il} x_l w_l\Big)^2\Big] \right)^2 \left( \mathbb{E}\Big[\Big(\sum_l H_{jl} x_l w_l\Big)^2\Big] \right)^2 && \text{((2,8)-hypercontractivity)} \\
&= 8^4 \sum_{i,j} \frac{1}{n^4} = \frac{c}{n^2}.
\end{aligned}
\]
The last equality follows from the fact that E[x_i x_j] = 0 for i ≠ j and that each H_{ij}² = 1/n. Observe that Z is a degree-4 multilinear polynomial over x_1, ..., x_n. Therefore, by (2,q)-hypercontractivity, Lemma 2.3, applied to the random variable Z, for q ≤ C log(k/δ)/4,
\[
\mathbb{E}\left[|Z|^q\right] \le q^{2q} \left(\mathbb{E}\left[Z^2\right]\right)^{q/2} \le \frac{c^{q/2}\, q^{2q}}{n^q}.
\]
Hence, by Markov's inequality, for γ > 0,
\[
\Pr\left[|Z| > \gamma\right] = \Pr\left[|Z|^q > \gamma^q\right] \le \left( \frac{c^{1/2} q^2}{\gamma n} \right)^q.
\]
The lemma now follows by taking q = 2 log(k/δ) and γ = 2c^{1/2} q²/n.

Let G : {0,1}^r → {1,−1}^n be the generator from Theorem 1.7 for r = O((log n log k)/ε).
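The regularizing effect in Lemma 7.3 is easy to sanity-check numerically. The sketch below uses truly random signs as a stand-in for a (C log(k/δ))-wise independent distribution (truly random bits are in particular t-wise independent for every t); the dimension and trial count are arbitrary choices:

```python
import numpy as np

def hadamard(n):
    # Normalized Sylvester-Hadamard matrix (n a power of two), H @ H.T = I_n.
    H = np.array([[1.0]])
    while H.shape[0] < n:
        H = np.block([[H, H], [H, -H]])
    return H / np.sqrt(n)

rng = np.random.default_rng(0)
n = 256
H = hadamard(n)

# The least regular unit vector puts all its weight on one coordinate.
# After the sign flip and rotation, H D(x) e1 has entries +-1/sqrt(n),
# so ||H D(x) e1||_4^4 = n * (1/n)^2 = 1/n for every sign vector x.
e1 = np.zeros(n)
e1[0] = 1.0
x = rng.choice([-1.0, 1.0], size=n)
spiky = np.sum((H @ (x * e1)) ** 4)

# For a generic unit w, ||H D(x) w||_4^4 also stays O(1/n) with high
# probability -- far below the trivial bound ||w||_2^4 = 1 -- as
# Lemma 7.3 predicts.
w = rng.normal(size=n)
w /= np.linalg.norm(w)
Z = [np.sum((H @ (rng.choice([-1.0, 1.0], size=n) * w)) ** 4)
     for _ in range(200)]
print(spiky)   # exactly 1/n = 1/256
print(max(Z))  # on the order of 3/n, well below 1
```

Swapping the truly random x for the output of a small-seed (C log(k/δ))-wise independent generator is precisely the derandomization the lemma licenses.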
Let G_1 : {0,1}^{r_1} → {1,−1}^n generate a (C log(k/δ))-wise independent distribution, for the constant C as in Lemma 7.3. Generators G_1 as above with r_1 = O(log(k/δ) log n) are known. Define G_N : {0,1}^{r_1} × {0,1}^r → R^n as follows:
\[
G_{\mathcal{N}}(x, y) = D(G_1(x))\, H\, G(y).
\]
We claim that G_N δ-fools all polytopes with respect to N^n.

Proof of Theorem 1.9. Recall that ε = Ω(δ^{5.1}/log^{8.1} k) > 1/n^{0.51}. The seed length of G_N is r_1 + r = O(log n log k/ε). Fix W ∈ R^{n×n}. Observe that W^T G_N(x, y) = (H D(G_1(x)) W)^T G(y). Now, from Lemma 7.3 and a union bound, it follows that
\[
\Pr_{x \in_u \{0,1\}^{r_1}}\left[ H D(G_1(x)) W \text{ is not } \varepsilon\text{-regular} \right] \le \delta. \tag{7.5}
\]
Further, from the invariance of N^n with respect to unitary rotations, for any x ∈ {0,1}^{r_1},
\[
\Pr_{z \leftarrow \mathcal{N}^n}\left[ (H D(G_1(x)) W)^T z \in \mathrm{Rect}(\theta) \right] = \Pr_{z \leftarrow \mathcal{N}^n}\left[ W^T z \in \mathrm{Rect}(\theta) \right].
\]
Thus, from Theorem 1.7 applied to N, we get that for H D(G_1(x)) W ε-regular,
\[
\left| \Pr_{y \in_u \{0,1\}^{r}}\left[ (H D(G_1(x)) W)^T G(y) \in \mathrm{Rect}(\theta) \right] - \Pr_{z \leftarrow \mathcal{N}^n}\left[ W^T z \in \mathrm{Rect}(\theta) \right] \right| \le \delta. \tag{7.6}
\]
The theorem now follows from Equations (7.5) and (7.6).

7.4 Pseudorandom Generators for Intersections of Spherical Caps

Theorem 1.10 follows from Theorem 1.9 and the following new invariance principle for polytopes over S^{n−1}. The proof uses Nazarov's bound on Gaussian surface area and the large deviation bound from Theorem 2.4.

Lemma 7.4. For any polytope K with k faces,
\[
\left| \Pr_{X \in_u S^{n-1}}\left[X \in K\right] - \Pr_{Y \leftarrow \mathcal{N}^n}\left[ Y/\sqrt{n} \in K \right] \right| \le \frac{C \log n \log k}{\sqrt{n}}.
\]
Proof. Fix a polytope K(W, θ). Let X ∈_u S^{n−1} and Y ← N^n. Note that Y/||Y|| is uniformly distributed over S^{n−1}. Fix δ = c/n^{1/2} for a constant c to be chosen later. Observe that for Y ← N^n and u ∈ R^n with ||u|| = 1, ⟨u, Y⟩ is distributed as N.
Hence, for any u ∈ R^n with ||u|| = 1,
\[
\Pr\left[ |\langle u, Y \rangle| \ge \sqrt{\log(k/\delta)} \right] \le \delta/k.
\]
Therefore, by a union bound,
\[
\Pr\left[ \|W^T Y\|_\infty > \sqrt{\log(k/\delta)} \right] \le \delta.
\]
Further, by applying Theorem 2.4 to the Euclidean norm (which has Lipschitz constant 1) and the fact that E[||Y||] = Ω(√n), we get
\[
\Pr\left[ \|W^T Y\|_\infty / \|Y\| > C\sqrt{\log(k/\delta)}/\sqrt{n} \right] \le \Pr\left[ \|W^T Y\|_\infty > \sqrt{\log(k/\delta)} \right] + \Pr\left[ \|Y\|_2 < \sqrt{n}/C \right] \le \delta + 2\exp(-\Omega(n)) \le 2\delta,
\]
for a sufficiently large constant C and large n. Therefore, as Y/||Y|| is uniformly distributed over S^{n−1},
\[
\Pr\left[ \|W^T X\|_\infty > \sqrt{C \log(k/\delta)/n} \right] \le 2\delta.
\]
From the above two equations, it follows that to prove the theorem we can assume that ||θ||_∞ < √(C log(k/δ)/n).

Now, applying Theorem 2.4 (again to the Euclidean norm) and the above equation, it follows that
\[
\Pr\left[ \big|\, \|Y\| - \sqrt{n}\, \big| \cdot \|\theta\|_\infty \ge \sqrt{C \log(1/\delta) \log(k/\delta)/n} \right] \le 2\delta. \tag{7.7}
\]
Let λ = √(C log(1/δ) log(k/δ)/n). Then, since Y/||Y|| ∈_u S^{n−1},
\[
\begin{aligned}
\left| \Pr\left[X \in K\right] - \Pr\left[ Y/\sqrt{n} \in K \right] \right|
&= \left| \Pr\left[ W^T X \in \mathrm{Rect}(\theta) \right] - \Pr\left[ W^T Y/\sqrt{n} \in \mathrm{Rect}(\theta) \right] \right| \\
&= \left| \Pr\left[ W^T Y \in \|Y\|\, \mathrm{Rect}(\theta) \right] - \Pr\left[ W^T Y \in \sqrt{n}\, \mathrm{Rect}(\theta) \right] \right| \\
&\le \Pr\left[ \big|\, \|Y\| - \sqrt{n}\, \big|\, \|\theta\|_\infty \ge \lambda \right] + \Pr\left[ W^T Y \in \mathrm{Rect}(\sqrt{n}\theta + \lambda \mathbf{1}_k) \setminus \mathrm{Rect}(\sqrt{n}\theta - \lambda \mathbf{1}_k) \right] \\
&\le 2\delta + O\left(\lambda \sqrt{\log k}\right). && \text{(Equation (7.7), Lemma 3.4)}
\end{aligned}
\]
The lemma now follows by choosing δ = c/n^{1/2} for a sufficiently large constant c.

Proof of Theorem 1.10. Define G_sp : {0,1}^{r_1} × {0,1}^r → S^{n−1} by G_sp(x, y) = G_N(x, y)/√n. It follows from Theorem 1.9 and Lemma 7.4 that G_sp fools polytopes over S^{n−1} as in the theorem.

Acknowledgments

Thanks to Fedja Nazarov for helping us compute an integral. We had useful conversations with Carly Klivans, Ryan O'Donnell, Alistair Sinclair, Eric Vigoda, and David Zuckerman.
References

[AC09] Nir Ailon and Bernard Chazelle, The fast Johnson–Lindenstrauss transform and approximate nearest neighbors, SIAM J. Computing 39 (2009), no. 1, 302–322, (Preliminary version in 38th STOC, 2006).

[AM09] Per Austrin and Elchanan Mossel, Approximation resistant predicates from pairwise independence, Computational Complexity 18 (2009), no. 2, 249–271.

[Aus07] Per Austrin, Balanced max 2-sat might not be the hardest, Proc. 39th ACM Symp. on Theory of Computing (STOC), ACM, 2007, pp. 189–197.

[Ben90] Vidmantas K. Bentkus, Smooth approximations of the norm and differentiable functions with bounded support in Banach space l^k_∞, Lithuanian Mathematical Journal 30 (1990), no. 3, 223–230.

[Ben03] ———, On the dependence of the Berry–Esseen bound on dimension, Journal of Statistical Planning and Inference 113 (2003), no. 2, 385–402.

[BK10] Nikhil Bansal and Subhash Khot, Inapproximability of hypergraph vertex cover and applications to scheduling problems, Proc. 37th International Colloquium on Automata, Languages and Programming (ICALP), Part I (Samson Abramsky, Cyril Gavoille, Claude Kirchner, Friedhelm Meyer auf der Heide, and Paul G. Spirakis, eds.), LNCS, vol. 6198, Springer, 2010, pp. 250–261.

[BKS99] Itai Benjamini, Gil Kalai, and Oded Schramm, Noise sensitivity of Boolean functions and applications to percolation, Inst. Hautes Études Sci. Publ. Math. 90 (1999), no. 1, 5–43.

[BO10] Eric Blais and Ryan O'Donnell, Lower bounds for testing function isomorphism, Proc. 25th IEEE Conference on Computational Complexity, IEEE, 2010, pp. 235–246.

[BR07] Matthias Beck and Sinai Robins, Computing the continuous discretely: Integer-point enumeration in polyhedra, 1st ed., Undergraduate Texts in Mathematics, Springer, 2007.

[BV08] Alexander Barvinok and Ellen Veomett, The computational complexity of convex bodies, Surveys on Discrete and Computational Geometry: Twenty Years Later (Jacob E. Goodman, János Pach, and Richard Pollack, eds.), Contemporary Mathematics, vol. 453, AMS, 2008, pp. 117–137.

[CD03] Mary Cryan and Martin E. Dyer, A polynomial-time algorithm to approximately count contingency tables when the number of rows is constant, J. Computer and System Sciences 67 (2003), no. 2, 291–310.

[Cha05] Sourav Chatterjee, A simple invariance theorem, 2005.

[DFR08] Irit Dinur, Ehud Friedgut, and Oded Regev, Independent sets in graph powers are almost contained in juntas, Geometric and Functional Analysis 18 (2008), no. 1, 77–97.

[DHK+10] Ilias Diakonikolas, Prahladh Harsha, Adam Klivans, Raghu Meka, Prasad Raghavendra, Rocco Servedio, and Li-Yang Tan, Bounding the average sensitivity and noise sensitivity of polynomial threshold functions, Proc. 42nd ACM Symp. on Theory of Computing (STOC), ACM, 2010, pp. 533–542.

[DKN10] Ilias Diakonikolas, Daniel M. Kane, and Jelani Nelson, Bounded independence fools degree-2 threshold functions, Proc. 51st IEEE Symp. on Foundations of Comp. Science (FOCS), IEEE, 2010, pp. 11–20.

[DMR09] Irit Dinur, Elchanan Mossel, and Oded Regev, Conditional hardness for approximate coloring, SIAM J. Computing 39 (2009), no. 3, 843–873, (Preliminary version in 38th STOC, 2006).

[dW08] Ronald de Wolf, A brief introduction to Fourier analysis on the Boolean cube, Theory of Computing, Graduate Surveys 1 (2008), 1–20.

[Dye03] Martin E. Dyer, Approximate counting by dynamic programming, Proc. 35th ACM Symp. on Theory of Computing (STOC), ACM, 2003, pp. 693–699.

[Fel68] William Feller, An introduction to probability theory and its applications, volume 1, 3rd ed., Wiley, 1968.

[Fel71] ———, An introduction to probability theory and its applications, volume 2, 2nd ed., Wiley, 1971.

[FGRW09] Vitaly Feldman, Venkatesan Guruswami, Prasad Raghavendra, and Yi Wu, Agnostic learning of monomials by halfspaces is hard, Proc. 50th IEEE Symp. on Foundations of Comp. Science (FOCS), IEEE, 2009, pp. 385–394.

[GKM10] Parikshit Gopalan, Adam Klivans, and Raghu Meka, Polynomial-time approximation schemes for knapsack and related counting problems using branching programs, 2010.

[GOWZ10] Parikshit Gopalan, Ryan O'Donnell, Yi Wu, and David Zuckerman, Fooling functions of halfspaces under product distributions, Proc. 25th IEEE Conference on Computational Complexity, IEEE, 2010, pp. 223–234.

[GW95] Michel X. Goemans and David P. Williamson, Improved approximation algorithms for maximum cut and satisfiability problems using semidefinite programming, J. ACM 42 (1995), no. 6, 1115–1145, (Preliminary version in 26th STOC, 1994).

[Hås01] Johan Håstad, Some optimal inapproximability results, J. ACM 48 (2001), no. 4, 798–859, (Preliminary version in 29th STOC, 1997).

[Hau92] David Haussler, Decision theoretic generalizations of the PAC model for neural net and other learning applications, Inf. Comput. 100 (1992), no. 1, 78–150, (Preliminary version in 1st ALT, 1990).

[INW94] Russell Impagliazzo, Noam Nisan, and Avi Wigderson, Pseudorandomness for network algorithms, Proc. 26th ACM Symp. on Theory of Computing (STOC), ACM, 1994, pp. 356–364.

[Jan97] S. Janson, Gaussian Hilbert spaces, Cambridge Tracts in Mathematics, Cambridge University Press, 1997.

[JS97] Mark Jerrum and Alistair Sinclair, The Markov chain Monte Carlo method: An approach to approximate counting and integration, Approximation Algorithms for NP-hard Problems (Dorit S. Hochbaum, ed.), PWS Publishing Company, 1997.

[Kal05] Gil Kalai, Noise sensitivity and chaos in social choice theory, Tech. Report 399, Center for Rationality and Interactive Decision Theory, Hebrew University of Jerusalem, 2005.

[KKL88] Jeff Kahn, Gil Kalai, and Nathan Linial, The influence of variables on Boolean functions (extended abstract), Proc. 29th IEEE Symp. on Foundations of Comp. Science (FOCS), IEEE, 1988, pp. 68–80.

[KKMO07] Subhash Khot, Guy Kindler, Elchanan Mossel, and Ryan O'Donnell, Optimal inapproximability results for MAX-CUT and other 2-variable CSPs?, SIAM J. Computing 37 (2007), no. 1, 319–357, (Preliminary version in 45th FOCS, 2004).

[KKMS08] Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio, Agnostically learning halfspaces, SIAM J. Computing 37 (2008), no. 6, 1777–1805, (Preliminary version in 46th FOCS, 2005).

[KOS04] Adam R. Klivans, Ryan O'Donnell, and Rocco A. Servedio, Learning intersections and thresholds of halfspaces, J. Computer and System Sciences 68 (2004), no. 4, 808–840, (Preliminary version in 43rd FOCS, 2002).

[KOS08] ———, Learning geometric concepts via Gaussian surface area, Proc. 49th IEEE Symp. on Foundations of Comp. Science (FOCS), IEEE, 2008, pp. 541–550.

[KSS94] Michael J. Kearns, Robert E. Schapire, and Linda Sellie, Toward efficient agnostic learning, Machine Learning 17 (1994), no. 2–3, 115–141, (Preliminary version in 5th COLT, 1992).

[LMN93] Nathan Linial, Yishay Mansour, and Noam Nisan, Constant depth circuits, Fourier transform, and learnability, J. ACM 40 (1993), no. 3, 607–620, (Preliminary version in 30th FOCS, 1989).

[LT91] Michel Ledoux and Michel Talagrand, Probability in Banach spaces: Isoperimetry and processes, Springer, 1991.

[Man94] Yishay Mansour, Learning Boolean functions via the Fourier transform, Theoretical Advances in Neural Computation and Learning (Vwani P. Roychowdhury, Kai-Yeung Siu, and Alon Orlitsky, eds.), Kluwer Academic Publishers, 1994, pp. 391–424.

[MH99] Sanjeev Mahajan and Ramesh Hariharan, Derandomizing approximation algorithms based on semidefinite programming, SIAM J. Computing 28 (1999), no. 5, 1641–1663, (Preliminary version in 36th FOCS, 1995).

[MOO05] Elchanan Mossel, Ryan O'Donnell, and Krzysztof Oleszkiewicz, Noise stability of functions with low influences: invariance and optimality, Proc. 46th IEEE Symp. on Foundations of Comp. Science (FOCS), IEEE, 2005, pp. 21–30.

[Mos08] Elchanan Mossel, Gaussian bounds for noise correlation of functions and tight analysis of long codes, Proc. 49th IEEE Symp. on Foundations of Comp. Science (FOCS), IEEE, 2008, pp. 156–165.

[Mos11] ———, A quantitative Arrow theorem, 2011.

[MZ10] Raghu Meka and David Zuckerman, Pseudorandom generators for polynomial threshold functions, Proc. 42nd ACM Symp. on Theory of Computing (STOC), ACM, 2010, pp. 427–436.

[Naz03] Fedor Nazarov, On the maximal perimeter of a convex set in R^n with respect to a Gaussian measure, Geometric Aspects of Functional Analysis (Israel Seminar 2001–2002), Lecture Notes in Mathematics, vol. 1807/2003, Springer, 2003, pp. 169–187.

[NN93] Joseph Naor and Moni Naor, Small-bias probability spaces: Efficient constructions and applications, SIAM J. Computing 22 (1993), no. 4, 838–856, (Preliminary version in 22nd STOC, 1990).

[O'D04] Ryan O'Donnell, Hardness amplification within NP, J. Computer and System Sciences 69 (2004), no. 1, 68–94, (Preliminary version in 34th STOC, 2002).

[O'D08] ———, Some topics in analysis of Boolean functions, Proc. 40th ACM Symp. on Theory of Computing (STOC), ACM, 2008, pp. 569–578.

[OW09] Ryan O'Donnell and Yi Wu, Conditional hardness for satisfiable 3-CSPs, Proc. 41st ACM Symp. on Theory of Computing (STOC), ACM, 2009, pp. 493–502.

[Per04] Yuval Peres, Noise stability of weighted majority, 2004.

[Pin94] Iosif Pinelis, Extremal probabilistic problems and Hotelling's T² test under a symmetry condition, Ann. Statist. 22 (1994), no. 1, 357–368.

[PR89] Vygantas Paulauskas and Alfredas Račkauskas, Approximation theory in the central limit theorem: Exact results in Banach spaces, Kluwer Academic Publishers, 1989, (Translated from Russian).

[Rag08] Prasad Raghavendra, Optimal algorithms and inapproximability results for every CSP?, Proc. 40th ACM Symp. on Theory of Computing (STOC), ACM, 2008, pp. 245–254.

[Rot79] Vladimir Il'ich Rotar, Limit theorems for polylinear forms, Journal of Multivariate Analysis 9 (1979), no. 4, 511–530.

[Shi00] Yaoyun Shi, Lower bounds of quantum black-box complexity and degree of approximating polynomials by influence of Boolean variables, Inf. Process. Lett. 75 (2000), no. 1–2, 79–83.

[Wol07] Paweł Wolff, Hypercontractivity of simple random variables, Studia Math 180 (2007), no. 3, 219–236.

[Zie95] Günter M. Ziegler, Lectures on polytopes, Graduate Texts in Mathematics, vol. 152, Springer, 1995.