The Gaussian Surface Area and Noise Sensitivity of Degree-$d$ Polynomials
We provide asymptotically sharp bounds for the Gaussian surface area and the Gaussian noise sensitivity of polynomial threshold functions. In particular we show that if $f$ is a degree-$d$ polynomial threshold function, then its Gaussian sensitivity …
Authors: Daniel M. Kane
August 14, 2018

1 Introduction

We provide asymptotically sharp bounds for the Gaussian surface area and the Gaussian noise sensitivity of polynomial threshold functions. In particular we show that if $f$ is a degree-$d$ polynomial threshold function, then its Gaussian sensitivity at noise rate $\epsilon$ is less than some quantity asymptotic to $\frac{d\sqrt{2\epsilon}}{\pi}$ and the Gaussian surface area is at most $\frac{d}{\sqrt{2\pi}}$. Furthermore these bounds are asymptotically tight as $\epsilon \to 0$ and $f$ the threshold function of a product of $d$ distinct homogeneous linear functions.

The noise sensitivity and surface area are both of fundamental interest and useful in the analysis of agnostic learning algorithms (see [6]). In particular our results imply that the class of degree-$d$ polynomial threshold functions is agnostically learnable under the $n$-dimensional Gaussian distribution in time $n^{O(d^2/\epsilon^4)}$.

A number of other authors have attempted to prove bounds along these lines. [7] proves a bound on noise sensitivity in terms of surface area and we relate our bounds essentially by also proving the other direction of this inequality for boolean functions with smooth interface that switch signs a bounded number of times on any line through the origin. Our bounds are obtained via a simple computation in the case of $d = 1$. A bound of $\tilde{O}(\epsilon^{1/(2d)})$ noise sensitivity was recently proved by [3] and independently by [5] for multilinear polynomials.

There is also interest in related questions for points picked uniformly from vertices of the hypercube rather than with the Gaussian distribution. It is conjectured in [4] that the corresponding noise sensitivity in this case is also always $O(d\sqrt{\epsilon})$. The $d = 1$ case of this conjecture was proved by [8], improving upon a bound of $O(\epsilon^{1/4})$ of [2].
It is noted in [3] that such a result would imply a similar bound for the Gaussian case. Hence our results can be thought of as a first step toward proving this conjecture.

1.1 Basic Definitions

Given a function $f : \mathbb{R}^n \to \{-1, 1\}$ we define the Gaussian noise sensitivity at noise rate $\epsilon$ as
$$\mathrm{GNS}_\epsilon(f) := \Pr(f(X) \neq f(Z))$$
where $X$ is an $n$-dimensional Gaussian random variable, and $Z = (1-\epsilon)X + \sqrt{2\epsilon - \epsilon^2}\,Y$ for $Y$ an independent $n$-dimensional Gaussian.

This is closely related to the Gaussian surface area of $f^{-1}(1)$. In particular we define the Gaussian surface area of a set $A$ to be
$$\Gamma(A) := \liminf_{\delta \to 0} \frac{\mathrm{GaussianVolume}(A_\delta \setminus A)}{\delta},$$
where the Gaussian volume of a region $R$ is $\Pr(X \in R)$ for $X$ a Gaussian random variable, and where $A_\delta$ is the set of points $x$ so that $d(x, A) \le \delta$ (under the Euclidean metric). We note that if $A$ is an open region whose boundary is smooth away from codimension 2, then its Gaussian surface area is equal to
$$\int_{\partial A} \phi(x)\, d\sigma,$$
where $\phi(x)$ is the Gaussian density, and $d\sigma$ is the surface measure on $\partial A$. Furthermore if $A$ is such a region, then its Gaussian surface area is seen to be equal to
$$\lim_{\delta \to 0} \frac{\mathrm{GaussianVolume}((\partial A)_\delta)}{2\delta}.$$
For $f$ a boolean function, we define $\Gamma(f) := \Gamma(f^{-1}(1))$.

The concepts of noise sensitivity and surface area are related to each other by noting that the noise sensitivity is roughly the probability that $X$ is close enough to the boundary that wiggling it will push it over the boundary.

1.2 Statement of Results

We focus on proving two main results. We define $f$ to be a degree-$d$ polynomial threshold function if $f(x) = \mathrm{sgn}(p(x))$ for some degree-$d$ polynomial $p$. We prove the following Theorems about such functions:

Theorem 1. If $f$ is a degree-$d$ polynomial threshold function, then
$$\mathrm{GNS}_\epsilon(f) \le \frac{d \arcsin(\sqrt{2\epsilon - \epsilon^2})}{\pi} \sim \frac{d\sqrt{2\epsilon}}{\pi} = O(d\sqrt{\epsilon}).$$
Furthermore this bound is asymptotically tight as $\epsilon \to 0$ for the threshold function of any product of distinct linear functions.

Theorem 2. If $f$ is a degree-$d$ polynomial threshold function then
$$\Gamma(f) \le \frac{d}{\sqrt{2\pi}}.$$

Section 2 will be devoted to the proof of Theorem 1, Section 3 to the proof of Theorem 2, and Section 4 will provide some closing notes.

2 Proof of the Noise Sensitivity Bound

Proof of Theorem 1. We begin by letting $\theta = \arcsin(\sqrt{2\epsilon - \epsilon^2})$. We need to bound
$$p := \mathrm{GNS}_\epsilon(f) = \Pr(f(X) \neq f(\cos(\theta)X + \sin(\theta)Y)). \tag{1}$$
We note that the value of $p$ given in Equation 1 remains the same if $X$ and $Y$ are replaced by any $X'$ and $Y'$ that are i.i.d. Gaussian distributions. In particular we define $X_\phi = \cos(\phi)X + \sin(\phi)Y$. Note that $X_\phi$ and $X_{\phi+\pi/2}$ are i.i.d. Gaussians. Using these distributions we find that for any $\phi$, since
$$X_{\theta+\phi} = \cos(\theta+\phi)X + \sin(\theta+\phi)Y = \cos(\theta)\cos(\phi)X - \sin(\theta)\sin(\phi)X + \cos(\theta)\sin(\phi)Y + \sin(\theta)\cos(\phi)Y = \cos(\theta)X_\phi + \sin(\theta)X_{\phi+\pi/2},$$
we have
$$p = \Pr(f(X_\phi) \neq f(X_{\phi+\theta})).$$
Therefore we have for any integer $n$ that
$$np = \Pr(f(X_0) \neq f(X_\theta)) + \Pr(f(X_\theta) \neq f(X_{2\theta})) + \dots + \Pr(f(X_{(n-1)\theta}) \neq f(X_{n\theta})). \tag{2}$$
We define the random function $F : \mathbb{R} \to \{-1, 1\}$ by $F(\phi) = f(X_\phi)$. ($F$ depends on $X$ and $Y$ as well as $\phi$.) We note that the right hand side of Equation 2 is the expected number of indices $i$ with $F(i\theta) \neq F((i+1)\theta)$, which is at most the expected number of times that $F(\phi)$ changes signs on the interval $[0, n\theta]$. Therefore we have that
$$p \le \frac{\mathbb{E}[\text{number of times } F \text{ changes signs on } [0, n\theta]]}{n}. \tag{3}$$
We note that $F(\phi)$ is periodic in $\phi$ with period $2\pi$. Therefore the number of times $F$ changes sign on $[0, n\theta]$ is the number of times that $F$ changes sign on $[0, 2\pi)$ times $\frac{n\theta}{2\pi} + O(1)$. Applying this to Equation 3, we get that
$$p \le \frac{\mathbb{E}[\text{number of sign changes of } F \text{ on } [0, 2\pi)] \left(\frac{n\theta}{2\pi} + O(1)\right)}{n}.$$
Taking a limit as $n \to \infty$ yields
$$p \le \frac{\theta\, \mathbb{E}[\text{number of sign changes of } F \text{ on } [0, 2\pi)]}{2\pi}. \tag{4}$$
We now make use of the fact that $f$ is a degree-$d$ polynomial threshold function. In particular we will show that for any $X$ and $Y$, $F$ changes signs at most $2d$ times on $[0, 2\pi)$. We let $f = \mathrm{sgn}(g)$ for some degree-$d$ polynomial $g$. We note that the number of sign changes of $F$ is at most the number of zeroes of the function $g(\cos(\phi)X + \sin(\phi)Y)$ (unless this function is identically 0, which happens with probability 0 and can be ignored). It should be noted though that $g(\cos(\phi)X + \sin(\phi)Y) = 0$ if and only if $z = e^{i\phi}$ is a root of the degree-$2d$ polynomial
$$z^d\, g\!\left(\frac{z + z^{-1}}{2}X + \frac{z - z^{-1}}{2i}Y\right).$$
Therefore the expectation in Equation 4 is at most $2d$. Therefore we have
$$p \le \frac{2d\theta}{2\pi} = \frac{d\theta}{\pi}$$
as desired.

We also note the ways in which the above bound can fail to be tight. Firstly, there may be some probability that $F$ changes signs fewer than $2d$ times on a full circle. Secondly, the number of times $F$ changes signs may be more than the number of indices $n$ with $F(n\theta) \neq F((n+1)\theta)$ if sign changes are spaced more tightly than $\theta$. On the other hand it should be noted that if $f$ is the threshold function for a product of $d$ distinct homogeneous linear functions, the first case happens with probability 0, and the probability of the second case occurring will necessarily go to 0 as $\epsilon$ does. Therefore for such functions our bound is asymptotically correct as $\epsilon \to 0$.

3 Proof of the Gaussian Surface Area Bounds

We will first need to bound a slight variant of the noise sensitivity of a polynomial threshold function. We begin by proving the following Lemma:

Lemma 3. If $f$ is a degree-$d$ polynomial threshold function in $n$ dimensions, $\epsilon > 0$ and $X$ a random Gaussian variable, then
$$\Pr(f(X) \neq f(X(1+\epsilon))) \le d\epsilon\sqrt{\frac{n}{4\pi}}.$$

Proof.
First note that by first conditioning on the line that $X$ lies in we may reduce this problem to the case of a one dimensional distribution. Note that $f$ changes sign at most $d$ times along this line. We need to bound the probability that at least one of these sign changes is between $X$ and $(1+\epsilon)X$. It therefore suffices to prove that any one of these sign changes lies between $X$ and $(1+\epsilon)X$ with probability at most $\epsilon\sqrt{\frac{n}{4\pi}}$. Note that the probability that $X$ is on the correct side of the origin is $\frac{1}{2}$. Beyond that $|X|^2$ satisfies the $\chi^2$ distribution with $n$ degrees of freedom, namely
$$\frac{1}{2^{n/2}\Gamma(n/2)}\, x^{n/2-1} e^{-x/2}\, dx.$$
Letting $y = \log(x) = 2\log(|X|)$ we find that $y$ has distribution
$$\frac{1}{2^{n/2}\Gamma(n/2)}\, e^{ny/2} e^{-e^y/2}\, dy.$$
We want the probability that $y$ is within a particular range of size $2\log(1+\epsilon)$. This is at most $2\epsilon$ times the densest part of the density function. This is achieved when $ny - e^y$ is maximal, or when $y = \log(n)$. Then the density is
$$\frac{1}{2^{n/2}\Gamma(n/2)}\, n^{n/2} e^{-n/2} = \sqrt{\frac{n}{4\pi}}\; \frac{(n/2)^{n/2} e^{-n/2} \sqrt{2\pi(n/2)}}{(n/2)!} \le \sqrt{\frac{n}{4\pi}}.$$
Multiplying this by $d$, $2\epsilon$ and $\frac{1}{2}$ (the probability that $X$ is on the correct side of 0), we get our bound.

Notice also that this bound should be nearly sharp if the polynomial giving $f$ is a product of terms of the form $|X|^2 - r_i$ for $r_i$ approximately $n$ and spaced apart by factors of $(1+\epsilon)^2$.

We can now prove a bound on a quantity more relevant to Gaussian surface area:

Corollary 4. If $f$ is an $n$ dimensional, degree-$d$ polynomial threshold function, $\epsilon > 0$ and $X$ and $Y$ independent Gaussians, then
$$\Pr(f(X) \neq f(X + \epsilon Y)) \le \frac{d\epsilon}{\pi} + \frac{d\epsilon^2}{4}\sqrt{\frac{n}{\pi}}.$$

Proof. We let $r = \sqrt{1+\epsilon^2}$, $\theta = \arctan(\epsilon)$, and let $Z = \cos(\theta)X + \sin(\theta)Y$ be a normal random variable. Note that $X + \epsilon Y = rZ$. We then have that
$$\Pr(f(X) \neq f(X + \epsilon Y)) \le \Pr(f(X) \neq f(Z)) + \Pr(f(Z) \neq f(rZ)).$$
By Theorem 1 and Lemma 3 this is at most
$$\frac{d\theta}{\pi} + d(r-1)\sqrt{\frac{n}{4\pi}} \le \frac{d\epsilon}{\pi} + \frac{d\epsilon^2}{4}\sqrt{\frac{n}{\pi}}.$$

In particular, we relate this to Gaussian surface area by:

Lemma 5. If $f$ is a boolean function with $f^{-1}(1)$ open with smooth boundary and Gaussian area $S$, and if $X$ and $Y$ are independent Gaussians then,
$$\lim_{\epsilon \to 0} \frac{\Pr(f(X) = -1 \text{ and } f(X + \epsilon Y) = 1)}{\epsilon} = \frac{S}{\sqrt{2\pi}}. \tag{5}$$

Proof. First note that if $A = f^{-1}(1)$, then
$$\lim_{\epsilon \to 0} \frac{\mathrm{GaussianVolume}(A_\epsilon \setminus A)}{\epsilon} = S$$
rather than just the liminf being equal. Next note that since the probability that $|\epsilon Y| > \epsilon^{2/3}$ goes rapidly to 0 as $\epsilon \to 0$, we can throw away all cases where $X$ is not within $\epsilon^{2/3}$ of $\partial A$ from the left hand side of Equation 5. When $X$ is close to $\partial A$ and when $f(X) = -1$, we may approximate the probability that $f(X + \epsilon Y) = 1$ by the probability that the component of $\epsilon Y$ in the direction of the shortest path from $X$ to $A$ is more than $d(X, A)$. Since $\partial A$ is smooth, this approximation is accurate for $X$ close to $A$, and in particular for $X$ within $\epsilon^{2/3}$ should introduce an error of $O(\epsilon^{4/3})$, which can be ignored. Hence if $Z$ is a normalized one variable Gaussian, the numerator of the left hand side can be replaced by
$$\Pr(\epsilon Z \ge d(X, A) > 0) = \Pr\left(Z \ge \frac{d(X, A)}{\epsilon} > 0\right).$$
This is easily seen to be
$$\int_0^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2} \Pr(0 < d(X, A) \le \epsilon x)\, dx = \int_0^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, \mathrm{GVol}(A_{\epsilon x} \setminus A)\, dx = \int_0^\infty \frac{1}{\sqrt{2\pi}} e^{-x^2/2}\, S \epsilon x (1 + o(1))\, dx = \frac{S\epsilon}{\sqrt{2\pi}} + o(\epsilon).$$
Thus completing our proof.

Proof of Theorem 2. This follows immediately from Corollary 4 and Lemma 5 after noting that
$$\Pr(f(X) = -1,\ f(X + \epsilon Y) = 1) \sim \frac{1}{2}\Pr(f(X) \neq f(X + \epsilon Y)) \sim \frac{d\epsilon}{2\pi}.$$

4 Conclusion

We have shown nearly tight bounds on the Gaussian surface area and noise sensitivity of polynomial threshold functions. One might hope to generalize these results to work for other distributions, such as the uniform distribution on vertices of the hypercube.
Unfortunately, several aspects of this proof are difficult to generalize. Perhaps most significantly, we lose the symmetry that allowed us to prove our original result on noise sensitivity. Another difficulty would be in the relation between noise sensitivity and surface area. In our case the two are essentially equivalent quantities of study. On the other hand [6] defined a notion of surface area for the hypercube distribution and proved that for even linear threshold functions there could be a gap between noise sensitivity and surface area of as much as $\Theta(\sqrt{\log(n)})$.

References

[1] K. Ball, The Reverse Isoperimetric Problem for Gaussian Measure, Discrete and Computational Geometry, Vol. 10, pp. 411-420, 1993.
[2] I. Benjamini, G. Kalai and O. Schramm, Noise sensitivity of Boolean functions and applications to percolation, Inst. Hautes Etudes Sci. Publ. Math., Vol. 90, pp. 5-43, 2001.
[3] I. Diakonikolas, P. Raghavendra, R. Servedio and L.-Y. Tan, Average sensitivity and noise sensitivity of polynomial threshold functions, Manuscript, available at http://arxiv.org/abs/0909.5011, 2009.
[4] C. Gotsman and N. Linial, Spectral Properties of Threshold Functions, Combinatorica, Vol. 14 #1, pp. 35-50, 1994.
[5] P. Harsha, A. Klivans and R. Meka, Bounding the sensitivity of polynomial threshold functions, Manuscript, available at http://arxiv.org/abs/0909.5175, 2009.
[6] Adam R. Klivans, Ryan O'Donnell, Rocco A. Servedio, Learning Geometric Concepts via Gaussian Surface Area, FOCS 2008: 541-550.
[7] M. Ledoux, Semigroup proofs of the isoperimetric inequality in Euclidean and Gauss space, Bull. Sci. Math., Vol. 118, pp. 485-510, 1994.
[8] Yuval Peres, Noise Stability of Weighted Majority, Manuscript, available at http://arxiv.org/abs/math/0412377, 2004.
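As a numerical sanity check on Theorem 1, the following sketch estimates the Gaussian noise sensitivity of $f(x) = \mathrm{sgn}(x_1 x_2)$, a degree-2 threshold function of a product of two distinct homogeneous linear functions, and compares it with the bound $d\arcsin(\sqrt{2\epsilon-\epsilon^2})/\pi$. The function names (`theorem1_bound`, `estimate_gns`, `sign_of_product`), the sample count, and the choice $\epsilon = 0.1$ are illustrative assumptions and do not come from the paper. The closed form $2q(1-q)$ with $q = \theta/\pi$ relies on the classical fact that two standard Gaussians with correlation $\rho$ have opposite signs with probability $\arccos(\rho)/\pi$.

```python
import math
import random


def theorem1_bound(d, eps):
    """Right-hand side of Theorem 1: d * arcsin(sqrt(2*eps - eps^2)) / pi."""
    theta = math.asin(math.sqrt(2 * eps - eps * eps))
    return d * theta / math.pi


def estimate_gns(f, n, eps, samples, rng):
    """Monte Carlo estimate of Pr(f(X) != f(Z)) with
    Z = (1 - eps) * X + sqrt(2*eps - eps^2) * Y, for X, Y independent
    n-dimensional standard Gaussians (the definition in Section 1.1)."""
    s = math.sqrt(2 * eps - eps * eps)
    flips = 0
    for _ in range(samples):
        x = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = [(1 - eps) * xi + s * rng.gauss(0.0, 1.0) for xi in x]
        if f(x) != f(z):
            flips += 1
    return flips / samples


def sign_of_product(x):
    """Illustrative degree-2 threshold function sgn(x_1 * x_2)."""
    return 1 if x[0] * x[1] > 0 else -1


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is reproducible
    eps = 0.1
    estimate = estimate_gns(sign_of_product, 2, eps, 50000, rng)
    q = theorem1_bound(1, eps)  # per-coordinate sign-flip probability theta/pi
    print("Monte Carlo estimate:", estimate)
    print("closed form 2q(1-q): ", 2 * q * (1 - q))
    print("Theorem 1 bound (d=2):", theorem1_bound(2, eps))
```

For this $f$ the exact sensitivity $2q(1-q)$ falls short of the bound $2q$ by $2q^2$, which is of lower order as $\epsilon \to 0$, in line with the tightness claim of Theorem 1.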