Efficiently Testing Sparse GF(2) Polynomials

Ilias Diakonikolas, Homin K. Lee, Kevin Matulef, Rocco A. Servedio, and Andrew Wan
{ilias,homin,rocco,atw12}@cs.columbia.edu, matulef@mit.edu

Abstract. We give the first algorithm that is both query-efficient and time-efficient for testing whether an unknown function f : {0,1}^n → {0,1} is an s-sparse GF(2) polynomial versus ε-far from every such polynomial. Our algorithm makes poly(s, 1/ε) black-box queries to f and runs in time n · poly(s, 1/ε). The only previous algorithm for this testing problem [DLM+07] used poly(s, 1/ε) queries, but had running time exponential in s and super-polynomial in 1/ε.

Our approach significantly extends the "testing by implicit learning" methodology of [DLM+07]. The learning component of that earlier work was a brute-force exhaustive search over a concept class to find a hypothesis consistent with a sample of random examples. In this work, the learning component is a sophisticated exact learning algorithm for sparse GF(2) polynomials due to Schapire and Sellie [SS96]. A crucial element of this work, which enables us to simulate the membership queries required by [SS96], is an analysis establishing new properties of how sparse GF(2) polynomials simplify under certain restrictions of "low-influence" sets of variables.

1 Introduction

Background and motivation. Given black-box access to an unknown function f : {0,1}^n → {0,1}, a natural question to ask is whether the function has a particular form. Is it representable by a small decision tree, or a small circuit, or a sparse polynomial? In the field of computational learning theory, the standard approach to this problem is to assume that f belongs to a specific class C of functions of interest; the goal is then to identify or approximate f.
In contrast, in property testing nothing is assumed about the unknown function f, and the goal of the testing algorithm is to output "yes" with high probability if f ∈ C and "no" with high probability if f is ε-far from every g ∈ C. (Here the distance between two functions f, g is measured with respect to the uniform distribution on {0,1}^n, so f and g are ε-far if they disagree on more than an ε fraction of all inputs.) The complexity of a testing algorithm is measured both in terms of the number of black-box queries it makes to f (query complexity) and the time it takes to process the results of those queries (time complexity).

There are many connections between learning theory and testing, and a growing body of work relating the two fields (see [Ron07] and its references). Testing algorithms have been given for a range of different function classes such as linear functions over GF(2) (i.e. parities) [BLR93]; degree-d GF(2) polynomials [AKK+03]; Boolean literals, conjunctions, and s-term monotone DNF formulas [PRS02]; k-juntas (i.e. functions which depend on at most k variables) [FKR+04]; halfspaces [MORS07]; and more.

Recently, Diakonikolas et al. [DLM+07] gave a general technique, called "testing by implicit learning," which they used to test a variety of different function classes that were not previously known to be testable. Intuitively, these classes correspond to functions with "concise representations," such as s-term DNFs, size-s Boolean formulas, size-s Boolean circuits, and s-sparse polynomials over constant-size finite fields. For each of these classes, the testing algorithm of [DLM+07] makes only poly(s, 1/ε) queries (independent of n).

The main drawback of the [DLM+07] testing algorithm is its time complexity.
For each of the classes mentioned above, the algorithm's running time is 2^{ω(s)} as a function of s, and ω(poly(1/ε)) as a function of ε.[1] Thus, a natural question asked by [DLM+07] is whether any of these classes can be tested with both time complexity and query complexity poly(s, 1/ε).

Our result: efficiently testing sparse GF(2) polynomials. In this paper we focus on the class of s-sparse polynomials over GF(2). Polynomials over GF(2) (equivalently, parities of ANDs of input variables) are a simple and well-studied representation for Boolean functions. It is well known that every Boolean function has a unique representation as a multilinear polynomial over GF(2), so the sparsity (number of monomials) of this polynomial is a very natural measure of the complexity of f. Sparse GF(2) polynomials have been studied by many authors from a range of different perspectives, such as learning [BS90,FS92,SS96,Bsh97a,BM02], approximation and interpolation [Kar89,GKS90,RB91], the complexity of (approximate) counting [EK89,KL93,LVW93], and property testing [DLM+07].

The main result of this paper is a testing algorithm for s-sparse GF(2) polynomials that is both time-efficient and query-efficient:

Theorem 1. There is a poly(s, 1/ε)-query algorithm with the following performance guarantee: given parameters s, ε and black-box access to any f : {0,1}^n → {0,1}, it runs in time poly(s, 1/ε) and tests whether f is an s-sparse GF(2) polynomial versus ε-far from every s-sparse polynomial.

This answers the question of [DLM+07] by exhibiting an interesting and natural class of functions with "concise representations" that can be tested efficiently, both in terms of query complexity and running time. We obtain our main result by extending the "testing by implicit learning" approach of [DLM+07].
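To make the representation discussed above concrete, the following is a minimal illustrative sketch (our own encoding, not taken from the paper): an s-sparse GF(2) polynomial is an XOR of at most s monotone monomials, each monomial an AND of input variables.

```python
# Hypothetical helper for illustration: a sparse GF(2) polynomial stored
# as a list of monomials, each a frozenset of variable indices; the empty
# frozenset stands for the constant-1 monomial.
from typing import FrozenSet, List

class SparseGF2Poly:
    def __init__(self, monomials: List[FrozenSet[int]]):
        self.monomials = monomials

    @property
    def sparsity(self) -> int:
        # Number of monomials: the complexity measure used in the paper.
        return len(self.monomials)

    def __call__(self, x: List[int]) -> int:
        # Evaluate over GF(2): XOR together the ANDs of the monomials.
        value = 0
        for m in self.monomials:
            value ^= int(all(x[i] == 1 for i in m))
        return value

# p(x) = x0*x1 + x2 + 1 over GF(2): a 3-sparse polynomial.
p = SparseGF2Poly([frozenset({0, 1}), frozenset({2}), frozenset()])
print(p.sparsity)    # 3
print(p([1, 1, 0]))  # 1 XOR 0 XOR 1 = 0
```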
In that work the "implicit learning" step used a naive brute-force search for a consistent hypothesis; in this paper we employ a sophisticated proper learning algorithm due to Schapire and Sellie [SS96]. It is much more difficult to "implicitly" run the [SS96] algorithm than the brute-force search of [DLM+07]. One of the main technical contributions of this paper is a new structural theorem about how s-sparse GF(2) polynomials are affected by certain carefully chosen restrictions; this is an essential ingredient that enables us to use the [SS96] algorithm. We elaborate on this below.

Techniques. We begin with a brief review of the main ideas of [DLM+07]. The approach of [DLM+07] builds on the observation of Goldreich et al. [GGR98] that any proper learning algorithm for a function class C can be used as a testing algorithm for C. (Recall that a proper learning algorithm for C is one which outputs a hypothesis h that itself belongs to C.) The idea behind this observation is that if the function f being tested belongs to C, then a proper learning algorithm will succeed in constructing a hypothesis that is close to f, while if f is ε-far from every g ∈ C then any hypothesis h ∈ C that the learning algorithm outputs must necessarily be far from f. Thus any class C can be tested to accuracy ε using essentially the same number of queries that are required to properly learn the class to accuracy Θ(ε).

[1] We note that the algorithm also has a linear running time dependence on n, the number of input variables; this is in some sense inevitable, since the algorithm must set n bit values just to pose a black-box query to f. Our algorithm has running time linear in n for the same reason. For the rest of the paper we discuss the running time only as a function of s and ε.
The basic approach of [GGR98] did not yield query-efficient testing algorithms (with query complexity independent of n), since virtually every interesting class of functions over {0,1}^n requires Ω(log n) examples for proper learning. However, [DLM+07] showed that for many classes of functions defined by a size parameter s, it is possible to "implicitly" run a (very naive) proper learning algorithm over a number of variables that is independent of n, and thus obtain an overall query complexity independent of n.

More precisely, they first observed that for many classes C, every f ∈ C is "very close" to a function f′ ∈ C for which the number r of relevant variables is polynomial in s and independent of n; roughly speaking, the relevant variables for f′ are the variables that have high influence in f. (For example, if f is an s-sparse GF(2) polynomial, an easy argument shows that there is a function f′, obtained by discarding from f all monomials of degree more than log(s/τ), that is τ-close to f and depends on at most r = s log(s/τ) variables.) They then showed how, using ideas of Fischer et al. [FKR+04] for testing juntas, it is possible to construct a sample of uniform random examples over {0,1}^r which with high probability are all labeled according to f′. At this point, the proper learning algorithm employed by [DLM+07] was a naive brute-force search: the algorithm tried all possible functions in C over r (as opposed to n) variables, to see if any were consistent with the labeled sample. [DLM+07] thus obtained a testing algorithm with overall query complexity poly(s/ε), but whose running time was dominated by the brute-force search. For the class of s-sparse GF(2) polynomials, their algorithm used Õ(s^4/ε^2) queries but had running time at least 2^{ω(s)} · (1/ε)^{log log(1/ε)}.

Current approach.
The high-level idea of the current work is to employ a much more sophisticated, and efficient, proper learning algorithm than brute-force search. In particular, we would like to use a proper learning algorithm which, when applied to learn a function over only r variables, runs in time polynomial in r and in the size parameter s. For the class of s-sparse GF(2) polynomials, precisely such an algorithm was given by Schapire and Sellie [SS96]. Their algorithm, which we describe in Section 2.1, is computationally efficient and generates a hypothesis h which is an s-sparse GF(2) polynomial. But this power comes at a price: the algorithm requires access to a membership query oracle, i.e. a black-box oracle for the function being learned. Thus, in order to run the Schapire/Sellie algorithm in the "testing by implicit learning" framework, it is necessary to simulate membership queries to an approximating function f′ ∈ C which is close to f but depends on only r variables. This is significantly more challenging than generating uniform random examples labeled according to f′, which is all that is required in the original [DLM+07] approach.

To see why membership queries to f′ are more difficult to simulate than uniform random examples, recall that f and the f′ described above (obtained from f by discarding high-degree monomials) are τ-close. Intuitively this is extremely close, disagreeing only on a 1/m fraction of inputs for an m that is much larger than the number of random examples required for learning f′ via brute-force search (this number is "small," independent of n, because f′ depends on only r variables).
Thus in the [DLM+07] approach it suffices to use f, the function to which we actually have black-box access, rather than f′ to label the random examples used for learning f′; since f and f′ are so close, and the examples are uniformly random, with high probability all the labels will also be correct for f′. However, in the membership query scenario of the current paper, things are no longer that simple. For any given f′ which is close to f, one can no longer assume that the learning algorithm's queries to f′ are uniformly distributed and hence unlikely to hit the error region; indeed, it is possible that the learning algorithm's membership queries to f′ are clustered on the few inputs where f and f′ disagree.

In order to successfully simulate membership queries, we must somehow consistently answer queries according to a particular f′, even though we only have oracle access to f. Moreover this must be done implicitly, in a query-efficient way, since explicitly identifying even a single variable relevant to f′ requires at least Ω(log n) queries. This is the main technical challenge in the paper.

We meet this challenge by showing that for any s-sparse polynomial f, an approximating f′ can be obtained as a restriction of f, by setting certain carefully chosen subsets of variables to zero. Roughly speaking, this restriction is obtained by randomly partitioning all of the input variables into r subsets and zeroing out all subsets whose variables have small "collective influence" (more precisely, small variation in the sense of [FKR+04]). It is important that the restriction sets these variables to zero rather than to a random assignment; intuitively, this is because setting a variable to zero "kills" all monomials that contain the variable, whereas setting it to 1 does not.
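The zero-restriction point can be illustrated directly on the monomial representation (our own illustrative encoding, with each monomial a frozenset of variable indices): zeroing a variable deletes every monomial containing it, so the restriction of an s-sparse polynomial is again a polynomial with at most s monomials.

```python
def restrict_to_zero(monomials, zeroed):
    # Setting x_i = 0 kills every monomial containing i, so the restricted
    # polynomial is just the monomials disjoint from `zeroed`; sparsity can
    # only decrease. (Setting x_i = 1 would instead shorten monomials, which
    # can cause collisions and GF(2) cancellations -- harder to control.)
    return [m for m in monomials if m.isdisjoint(zeroed)]

# p = x0*x1 + x1*x2 + x3: zeroing x1 kills the first two monomials.
p = [frozenset({0, 1}), frozenset({1, 2}), frozenset({3})]
print(restrict_to_zero(p, {1}))  # [frozenset({3})]
```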
Our main technical theorem (Theorem 3, given in Section 3) shows that this f′ is indeed close to f and has at most one of its relevant variables in each of the surviving subsets. We moreover show that these relevant variables for f′ all have high influence in f (the converse is not true; examples can be given which show that not every variable that has "high influence" in f will in general become a relevant variable for f′). This property is important in enabling our simulation of membership queries. In addition to the crucial role that Theorem 3 plays in the completeness proof for our test, we feel that the new insights the theorem gives into how sparse polynomials "simplify" under (appropriately defined) random restrictions may be of independent interest.

Organization. In Section 4 we present our testing algorithm, Test-Sparse-Poly, along with a high-level description and sketch of correctness. In Section 2.1 we describe in detail the "learning component" of the algorithm. In Section 3 we state Theorem 3, which provides intuition behind the algorithm and serves as the main technical tool in the completeness proof. Due to space limitations, the proof of Theorem 3 is presented in Appendix A, while the completeness and soundness proofs are given in Appendices B and C, respectively (see full version available online).

2 Preliminaries and Background

GF(2) Polynomials: A GF(2) polynomial is a parity of monotone conjunctions (monomials). It is s-sparse if it contains at most s monomials (including the constant-1 monomial if it is present). The length of a monomial is the number of distinct variables that occur in it; over GF(2), this is simply its degree.

Notation: For i ∈ N*, denote [i] := {1, 2, ..., i}. It will be convenient to view the output range of a Boolean function f as {−1, 1} rather than {0, 1}, i.e.
f : {0,1}^n → {−1, 1}. We view the hypercube as a measure space endowed with the uniform product probability measure. For I ⊆ [n] we denote by {0,1}^I the set of all partial assignments to the coordinates in I. For w ∈ {0,1}^{[n]\I} and z ∈ {0,1}^I, we write w ⊔ z to denote the assignment in {0,1}^n whose i-th coordinate is w_i if i ∈ [n]\I and z_i if i ∈ I. Whenever an element z of {0,1}^I is chosen randomly (we denote z ∈_R {0,1}^I), it is chosen with respect to the uniform measure on {0,1}^I.

Influence, Variation and the Independence Test: Recall the classical notion of influence [KKL88]: the influence of the i-th coordinate on f : {0,1}^n → {−1,1} is Inf_i(f) := Pr_{x ∈_R {0,1}^n}[f(x) ≠ f(x^{⊕i})], where x^{⊕i} denotes x with the i-th bit flipped. The following generalization of influence, the variation of a subset of the coordinates of a Boolean function, plays an important role for us:

Definition 1 (variation, [FKR+04]). Let f : {0,1}^n → {−1,1}, and let I ⊆ [n]. We define the variation of f on I as Vr_f(I) := E_{w ∈_R {0,1}^{[n]\I}} [ V_{z ∈_R {0,1}^I} [f(w ⊔ z)] ], where V denotes the variance.

When I = {i} we will sometimes write Vr_f(i) instead of Vr_f({i}). It is easy to check that Vr_f(i) = Inf_i(f), so variation is indeed a generalization of influence. Intuitively, the variation is a measure of the ability of a set of variables to sway a function's output. The following two simple properties of the variation will be useful for the analysis of our testing algorithm:

Lemma 1 (monotonicity and sub-additivity, [FKR+04]). Let f : {0,1}^n → {−1,1} and A, B ⊆ [n]. Then Vr_f(A) ≤ Vr_f(A ∪ B) ≤ Vr_f(A) + Vr_f(B).

Lemma 2 (probability of detection, [FKR+04]). Let f : {0,1}^n → {−1,1} and I ⊆ [n].
If w ∈_R {0,1}^{[n]\I} and z_1, z_2 ∈_R {0,1}^I are chosen independently, then Pr[f(w ⊔ z_1) ≠ f(w ⊔ z_2)] = (1/2) Vr_f(I).

We now recall the independence test from [FKR+04], a simple two-query test used to determine whether a function f is independent of a given set I ⊆ [n] of coordinates.

Independence test: Given f : {0,1}^n → {−1,1} and I ⊆ [n], choose w ∈_R {0,1}^{[n]\I} and z_1, z_2 ∈_R {0,1}^I independently. Accept if f(w ⊔ z_1) = f(w ⊔ z_2) and reject if f(w ⊔ z_1) ≠ f(w ⊔ z_2).

Lemma 2 implies that the independence test rejects with probability exactly (1/2) Vr_f(I).

Random Partitions: Throughout the paper we will use the following notion of a random partition of the set [n] of input coordinates:

Definition 2. A random partition of [n] into r subsets {I_j}_{j=1}^r is constructed by independently assigning each i ∈ [n] to a randomly chosen I_j for some j ∈ [r].

We now define the notion of low- and high-variation subsets with respect to a partition of the set [n] and a parameter α > 0.

Definition 3. For f : {0,1}^n → {−1,1}, a partition of [n] into {I_j}_{j=1}^r and a parameter α > 0, define L(α) := {j ∈ [r] | Vr_f(I_j) < α} (the low-variation subsets) and H(α) := [r] \ L(α) (the high-variation subsets). For j ∈ [r] and i ∈ I_j, if Vr_f(i) ≥ α we say that the variable x_i is a high-variation element of I_j.

Finally, the notion of a well-structured subset will be important for us:

Definition 4. For f : {0,1}^n → {−1,1} and parameters α > ∆ > 0, we say that a subset I ⊆ [n] of coordinates is (α, ∆)-well-structured if there is an i ∈ I such that Vr_f(i) ≥ α and Vr_f(I \ {i}) ≤ ∆.

Note that since α > ∆, by monotonicity, the i ∈ I in the above definition is unique.
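The independence test, and the variation estimate it yields (via the identity of Lemma 2), can be sketched as follows; the function encoding and sample sizes here are illustrative choices of ours.

```python
import random

def independence_test(f, n, I):
    # One two-query run: fix w on [n] \ I, draw independent completions
    # z1, z2 on I, and reject iff f(w ⊔ z1) != f(w ⊔ z2).
    # By Lemma 2, Pr[reject] = Vr_f(I) / 2.
    I = set(I)
    w = [random.randint(0, 1) for _ in range(n)]
    x1 = [random.randint(0, 1) if i in I else w[i] for i in range(n)]
    x2 = [random.randint(0, 1) if i in I else w[i] for i in range(n)]
    return f(x1) != f(x2)

def estimate_variation(f, n, I, runs=4000):
    # Estimate Vr_f(I) as 2 * (fraction of rejecting runs).
    return 2 * sum(independence_test(f, n, I) for _ in range(runs)) / runs

random.seed(1)
f = lambda x: 1 - 2 * (x[0] ^ x[1])   # a +/-1-valued parity of x0, x1
print(estimate_variation(f, 3, {2}))  # 0.0: f is independent of x2
print(estimate_variation(f, 3, {0}))  # close to 1.0 = Inf_0(f)
```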
Hence, a well-structured subset contains a single high-influence coordinate, while the remaining coordinates have small total variation.

2.1 Background on Schapire and Sellie's algorithm

In [SS96] Schapire and Sellie gave an algorithm, which we refer to as LearnPoly, for exactly learning s-sparse GF(2) polynomials using membership queries (i.e. black-box queries) and equivalence queries. Their algorithm is proper; this means that every equivalence query the algorithm makes (including the final hypothesis of the algorithm) is an s-sparse polynomial. (We shall see that it is indeed crucial for our purposes that the algorithm is proper.) Recall that in an equivalence query the learning algorithm proposes a hypothesis h to the oracle: if h is logically equivalent to the target function being learned, then the response is "correct" and learning ends successfully; otherwise the response is "no" and the learner is given a counterexample x such that h(x) ≠ f(x).

Schapire and Sellie proved the following about their algorithm:

Theorem 2 ([SS96], Theorem 10). Algorithm LearnPoly is a proper exact learning algorithm for the class of s-sparse GF(2) polynomials over {0,1}^n. The algorithm runs in poly(n, s) time and makes at most poly(n, s) membership queries and at most ns + 2 equivalence queries.

We can also easily characterize the behavior of LearnPoly if it is run on a function f that is not an s-sparse polynomial. In this case, since the algorithm is proper, all of its equivalence queries have s-sparse polynomials as their hypotheses, and consequently no equivalence query will ever be answered "correct." So if the (ns + 2)-th equivalence query is not answered "correct," the algorithm may infer that the target function is not an s-sparse polynomial, and it returns "not s-sparse."
A well-known result due to Angluin [Ang88] says that in a Probably Approximately Correct (PAC) setting (where there is a distribution D over examples and the goal is to construct an ε-accurate hypothesis with respect to that distribution), equivalence queries can be straightforwardly simulated using random examples. This is done simply by drawing a sufficiently large sample of random examples for each equivalence query and evaluating both the hypothesis h and the target function f on each point in the sample. This either yields a counterexample (which simulates an equivalence query), or, if no counterexample is obtained, simple arguments show that for a large enough (O(log(1/δ)/ε)-size) sample, with probability 1 − δ the functions f and h must be ε-close under the distribution D, which is the success criterion for PAC learning. This directly gives the following corollary of Theorem 2:

Corollary 1. There is a uniform-distribution membership-query proper learning algorithm, which we call LearnPoly′(s, n, ε, δ), which makes Q(s, n, ε, δ) := poly(s, n, 1/ε, log(1/δ)) membership queries and runs in poly(Q) time to learn s-sparse polynomials over {0,1}^n to accuracy ε and confidence 1 − δ under the uniform distribution.

3 On restrictions which simplify sparse polynomials

This section presents Theorem 3, which gives the intuition behind our testing algorithm and lies at the heart of the completeness proof. We give the full proof of Theorem 3 in Appendix A (see the full version). Roughly speaking, the theorem says the following: consider any s-sparse GF(2) polynomial p. Suppose that its coordinates are randomly partitioned into r = poly(s) many subsets {I_j}_{j=1}^r. The first two statements say that w.h.p.
a randomly chosen "threshold value" α ≈ 1/poly(s) will have the property that no single coordinate i ∈ [n] or subset I_j, j ∈ [r], has Vr_p(i) or Vr_p(I_j) "too close" to α. Moreover, the high-variation subsets (w.r.t. α) are precisely those that contain a single high-variation element i (i.e. Vr_p(i) ≥ α), and in fact each such subset I_j is well-structured (part 3). Also, the number of such high-variation subsets is small (part 4). Finally, let p′ be the restriction of p obtained by setting all variables in the low-variation subsets to 0. Then p′ has a nice structure: it has at most one relevant variable per high-variation subset (part 5), and it is close to p (part 6).

Theorem 3. Let p : {0,1}^n → {−1,1} be an s-sparse polynomial. Fix τ ∈ (0,1) and ∆ such that ∆ ≤ ∆_0 := τ/(1600 s^3 log(8s^3/τ)) and ∆ = poly(τ/s). Let r := 4Cs/∆, for a suitably large constant C. Let {I_j}_{j=1}^r be a random partition of [n]. Choose α uniformly at random from the set A(τ, ∆) := { τ/(4s^2) + (8ℓ − 4)∆ : ℓ ∈ [K] }, where K is the largest integer such that 8K∆ ≤ τ/(4s^2). Then with probability at least 9/10 (over the choice of α and {I_j}_{j=1}^r), all of the following statements hold:

1. Every variable x_i, i ∈ [n], has Vr_p(i) ∉ [α − 4∆, α + 4∆].
2. Every subset I_j, j ∈ [r], has Vr_p(I_j) ∉ [α − 3∆, α + 4∆].
3. For every j ∈ H(α), I_j is (α, ∆)-well-structured.
4. |H(α)| ≤ s log(8s^3/τ).

Let p′ := p|_{0 ← ∪_{j∈L(α)} I_j} (the restriction obtained by fixing all variables in low-variation subsets to 0).

5. For every j ∈ H(α), p′ has at most one relevant variable in I_j (hence p′ is a |H(α)|-junta).
Algorithm Test-Sparse-Poly(f, s, ε)

Input: black-box access to f : {0,1}^n → {−1,1}; sparsity parameter s ≥ 1; error parameter ε > 0
Output: "yes" if f is an s-sparse GF(2) polynomial, "no" if f is ε-far from every s-sparse GF(2) polynomial

1. Let τ = Θ(ε), ∆ = Θ(poly(τ, 1/s)), r = Θ(s/∆), δ = Θ(poly(τ, 1/s)).[a]
2. Set {I_j}_{j=1}^r to be a random partition of [n].
3. Choose α uniformly at random from the set A(τ, ∆) := { τ/(4s^2) + (8ℓ − 4)∆ : 1 ≤ ℓ ≤ K }, where K is the largest integer such that 8K∆ ≤ τ/(4s^2).
4. For each subset I_1, ..., I_r, run the independence test M := (2/∆^2) ln(200r) times and let Ṽr_f(I_j) denote 2 × (fraction of the M runs on I_j that the test rejects). If any subset I_j has Ṽr_f(I_j) ∈ [α − 2∆, α + 3∆], then exit and return "no"; otherwise continue.
5. Let L̃(α) ⊆ [r] denote {j ∈ [r] : Ṽr_f(I_j) < α − 2∆} and let H̃(α) denote [r] \ L̃(α). Let f̃′ : {0,1}^n → {−1,1} denote the function f|_{0 ← ∪_{j∈L̃(α)} I_j}.
6. Draw a sample of m := (2/ε) ln 12 uniform random examples from {0,1}^n and evaluate both f̃′ and f on each of these examples. If f and f̃′ disagree on any of the m examples, then exit and return "no." If they agree on all examples, then continue.
7. Run the learning algorithm LearnPoly′(s, |H̃(α)|, ε/4, 1/100) from [SS96], using SimMQ(f, H̃(α), {I_j}_{j∈H̃(α)}, α, ∆, z, δ/Q(s, |H̃(α)|, ε/4, 1/100)) to simulate each membership query on a string z ∈ {0,1}^{|H̃(α)|} that LearnPoly′ makes. If LearnPoly′ returns "not s-sparse," then exit and return "no." Otherwise the algorithm terminates successfully; in this case return "yes."
[a] More precisely, we set τ = ε/600, ∆ = min{∆_0, (τ/8s^2)(δ/ln(2/δ))}, and r = 4Cs/∆ (for a suitable constant C from Theorem 3), where ∆_0 := τ/(1600 s^3 log(8s^3/τ)) and δ := 1/(100 s log(8s^3/τ) Q(s, s log(8s^3/τ), ε/4, 1/100)).

Fig. 1. The algorithm Test-Sparse-Poly.

6. The function p′ is τ-close to p.

Theorem 3 naturally suggests a testing algorithm, whereby we attempt to partition the coordinates of a function f into "high-variation" subsets and "low-variation" subsets, then zero out the variables in the low-variation subsets and implicitly learn the remaining function f′ on only poly(s, 1/ε) many variables. This is exactly the approach we take in the next section.

4 The testing algorithm Test-Sparse-Poly

In this section we present our main testing algorithm and give high-level sketches of the arguments establishing its completeness and soundness. The algorithm, which is called Test-Sparse-Poly, takes as input the values s, ε > 0 and black-box access to f : {0,1}^n → {−1,1}. It is presented in full in Figure 1.

Algorithm Set-High-Influence-Variable(f, I, α, ∆, b, δ)

Input: black-box access to f : {0,1}^n → {−1,1}; (α, ∆)-well-structured set I ⊆ [n]; bit b ∈ {0,1}; failure parameter δ
Output: assignment w ∈ {0,1}^I to the variables in I such that w_i = b with probability 1 − δ

1. Draw x uniformly from {0,1}^I. Define I_0 := {j ∈ I : x_j = 0} and I_1 := {j ∈ I : x_j = 1}.
2. Apply c = (2/α) ln(2/δ) iterations of the independence test to (f, I_0). If any of the c iterations reject, mark I_0. Do the same for (f, I_1).
3. If both or neither of I_0 and I_1 are marked, stop and output "fail."
4. If I_b is marked, then return the assignment w = x. Otherwise return the assignment w = x̄ (the bitwise negation of x).

Fig. 2. The subroutine Set-High-Influence-Variable.
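The logic of the Set-High-Influence-Variable subroutine in Figure 2 can be sketched as follows. This is a simplified stand-in of ours, not the paper's exact pseudocode: the independence test is inlined, and a return value of None plays the role of "fail."

```python
import math
import random

def set_high_influence_variable(f, n, I, alpha, delta, b):
    I = list(I)
    x = {i: random.randint(0, 1) for i in I}   # step 1: uniform x on I
    side0 = [i for i in I if x[i] == 0]        # I_0
    side1 = [i for i in I if x[i] == 1]        # I_1

    def marked(S):
        # Step 2: c = (2/alpha) ln(2/delta) runs of the independence test
        # on (f, S); S is marked if any run rejects (detects dependence).
        S = set(S)
        c = int((2 / alpha) * math.log(2 / delta)) + 1
        for _ in range(c):
            w = [random.randint(0, 1) for _ in range(n)]
            a1 = [random.randint(0, 1) if i in S else w[i] for i in range(n)]
            a2 = [random.randint(0, 1) if i in S else w[i] for i in range(n)]
            if f(a1) != f(a2):
                return True
        return False

    m0, m1 = marked(side0), marked(side1)
    if m0 == m1:
        return None                            # step 3: "fail"
    # Step 4: the marked side holds the high-variation variable; if that
    # side is I_b, the variable is already set to b, otherwise negate x.
    return x if (m1 if b == 1 else m0) else {i: 1 - x[i] for i in I}

random.seed(2)
f = lambda x: x[0]                 # only x0 has influence
w = None
while w is None:                   # retry the tiny "fail" probability
    w = set_high_influence_variable(f, 4, {0, 1, 2}, 0.5, 0.01, b=1)
print(w[0])                        # 1: the high-influence variable is set to 1
```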
Algorithm SimMQ(f, H, {I_j}_{j∈H}, α, ∆, z, δ)

Input: black-box access to f : {0,1}^n → {−1,1}; subset H ⊆ [r]; disjoint subsets {I_j}_{j∈H} of [n]; parameters α > ∆; string z ∈ {0,1}^{|H|}; failure probability δ
Output: bit b which, with probability 1 − δ, is the value of f′ on a random assignment x in which each high-variation variable i ∈ I_j (j ∈ H) is set according to z

1. For each j ∈ H, call Set-High-Influence-Variable(f, I_j, α, ∆, z_j, δ/|H|) and get back an assignment (call it w^j) to the variables in I_j.
2. Construct x ∈ {0,1}^n as follows: for each j ∈ H, set the variables in I_j according to w^j. This defines x_i for all i ∈ ∪_{j∈H} I_j. Set x_i = 0 for all other i ∈ [n].
3. Return b = f(x).

Fig. 3. The subroutine SimMQ.

The first thing Test-Sparse-Poly does (Step 2) is randomly partition the coordinates into r = Õ(s^4/τ) subsets. In Steps 3 and 4 the algorithm attempts to distinguish subsets that contain a high-influence variable from subsets that do not; this is done by using the independence test to estimate the variation of each subset (see Lemma 2). Once the high-variation and low-variation subsets have been identified, intuitively we would like to focus our attention on the high-influence variables. Thus, Step 5 of the algorithm defines a function f̃′ which "zeroes out" all of the variables in all low-variation subsets. Step 6 of Test-Sparse-Poly checks that f is close to f̃′.

The final step of Test-Sparse-Poly is to run the algorithm LearnPoly′ of [SS96] to learn a sparse polynomial, which we call f̃″, which is isomorphic to f̃′ but is defined only over the high-influence variables of f (recall that if f is indeed s-sparse, there is at most one such variable in each high-variation subset). The overall Test-Sparse-Poly algorithm accepts f if and only if LearnPoly′ successfully returns a final hypothesis (i.e.
does not halt and output "fail"). The membership queries that the [SS96] algorithm requires are simulated using the SimMQ procedure, which in turn uses a subroutine called Set-High-Influence-Variable.

The procedure Set-High-Influence-Variable (SHIV) is presented in Figure 2. The idea of this procedure is that when it is run on a well-structured subset of variables I, it returns an assignment in which the high-variation variable is set to the desired bit value. Intuitively, the executions of the independence test in the procedure are used to determine whether the high-variation variable i ∈ I is set to 0 or 1 under the assignment x. Depending on whether this setting agrees with the desired value, the algorithm either returns x or the bitwise negation of x (this is slightly different from Construct-Sample, the analogous subroutine in [DLM+07], which is content with a random x and thus never needs to negate coordinates).

Figure 3 gives the SimMQ procedure. When run on a function f and a collection {I_j}_{j∈H} of disjoint well-structured subsets of variables, SimMQ takes as input a string z of length |H| which specifies a desired setting for each high-variation variable in each I_j (j ∈ H). SimMQ constructs a random assignment x ∈ {0,1}^n such that the high-variation variable in each I_j (j ∈ H) is set in the desired way in x, and it returns the value f′(x).

4.1 Time and Query Complexity of Test-Sparse-Poly

As stated in Figure 1, the Test-Sparse-Poly algorithm runs LearnPoly′(s, |H̃(α)|, ε/4, 1/100), using SimMQ(f, H̃(α), {I_j}_{j∈H̃(α)}, α, ∆, z, 1/(100 Q(s, |H̃(α)|, ε/4, 1/100))) to simulate each membership query on an input string z ∈ {0,1}^{|H̃(α)|}. Thus the algorithm is being run over a domain of |H̃(α)| variables.
Since we certainly have $|\tilde{H}(\alpha)| \le r \le \mathrm{poly}(s, 1/\epsilon)$, Corollary 1 gives that LearnPoly′ makes at most $\mathrm{poly}(s, 1/\epsilon)$ many calls to SimMQ. From this point, by inspection of SimMQ, SHIV and Test-Sparse-Poly, it is straightforward to verify that Test-Sparse-Poly indeed makes $\mathrm{poly}(s, 1/\epsilon)$ many queries to $f$ and runs in time $\mathrm{poly}(s, 1/\epsilon)$ as claimed in Theorem 1. Thus, to prove Theorem 1 it remains only to establish correctness of the test.

4.2 Sketch of completeness

The main tool behind our completeness argument is Theorem 3. Suppose $f$ is indeed an $s$-sparse polynomial. Then Theorem 3 guarantees that a randomly chosen $\alpha$ will w.h.p. yield a "gap" such that subsets with a high-influence variable have variation above the gap, and subsets with no high-influence variable have variation below the gap. This means that the estimates of each subset's variation (obtained by the algorithm in Step 4) are accurate enough to effectively separate the high-variation subsets from the low-variation ones in Step 5. Thus, the function $\tilde{f}'$ defined by the algorithm will w.h.p. be equal to the function $p'$ from Theorem 3. Assuming that $f$ is an $s$-sparse polynomial (and that $\tilde{f}'$ is equal to $p'$), Theorem 3 additionally implies that the function $\tilde{f}'$ will be close to the original function (so Step 6 will pass), that $\tilde{f}'$ only depends on $\mathrm{poly}(s, 1/\epsilon)$ many variables, and that all of the subsets $I_j$ that "survive" into $\tilde{f}'$ are well-structured. As we show in Appendix B, this condition is sufficient to ensure that SimMQ can successfully simulate membership queries to $\tilde{f}''$. Thus, for $f$ an $s$-sparse polynomial, the LearnPoly′ algorithm can run successfully, and the test will accept.

4.3 Sketch of soundness

Here, we briefly argue that if Test-Sparse-Poly accepts $f$ with high probability, then $f$ must be close to some $s$-sparse polynomial (we give the full proof in Appendix C).
Note that if $f$ passes Step 4, then Test-Sparse-Poly must have obtained a partition of variables into "high-variation" subsets and "low-variation" subsets. If $f$ passes Step 6, then it must moreover be the case that $f$ is close to the function $\tilde{f}'$ obtained by zeroing out the low-variation subsets. In the last step, Test-Sparse-Poly attempts to run the LearnPoly′ algorithm using $\tilde{f}'$ and the high-variation subsets; in the course of doing this, it makes calls to SimMQ. Since $f$ could be an arbitrary function, we do not know whether each high-variation subset has at most one variable relevant to $\tilde{f}'$ (as would be the case, by Theorem 3, if $f$ were an $s$-sparse polynomial). However, we are able to show (Lemma 11) that, if with high probability all calls to the SimMQ routine are answered without its ever returning "fail," then $\tilde{f}'$ must be close to a junta $g$ whose relevant variables are the individual "highest-influence" variables in each of the high-variation subsets. Now, given that LearnPoly′ halts successfully, it must be the case that it constructs a final hypothesis $h$ that is itself an $s$-sparse polynomial and that agrees with many calls to SimMQ on random examples. Lemma 12 states that, in this event, $h$ must be close to $g$, hence close to $\tilde{f}'$, and hence close to $f$.

5 Conclusion and future directions

An obvious question raised by our work is whether similar methods can be used to efficiently test $s$-sparse polynomials over a general finite field $F$, with query and time complexity polynomial in $s$, $1/\epsilon$, and $|F|$. The basic algorithm of [DLM+07] uses $\tilde{O}((s|F|)^4/\epsilon^2)$ queries to test $s$-sparse polynomials over $F$, but has running time $2^{\omega(s|F|)} \cdot (1/\epsilon)^{\log\log(1/\epsilon)}$ (arising, as discussed in Section 1, from brute-force search for a consistent hypothesis).
One might hope to improve that algorithm by using techniques from the current paper. However, doing so requires an algorithm for properly learning $s$-sparse polynomials over general finite fields. To the best of our knowledge, the most efficient algorithm for doing this (given only black-box access to $f: F^n \to F$) is the algorithm of Bshouty [Bsh97b], which requires $m = s^{O(|F| \log |F|)} \log n$ queries and runs in $\mathrm{poly}(m, n)$ time. (Other learning algorithms are known which do not have this exponential dependence on $|F|$, but they either require evaluating the polynomial at complex roots of unity [Man95] or on inputs belonging to an extension field of $F$ [GKS90,Kar89].) It would be interesting to know whether there is a testing algorithm that simultaneously achieves a polynomial runtime (and hence query complexity) dependence on both the size parameter $s$ and the cardinality of the field $|F|$. Another goal for future work is to apply our methods to other classes beyond just polynomials. Is it possible to combine the "testing by implicit learning" approach of [DLM+07] with other membership-query-based learning algorithms, to achieve time- and query-efficient testers for other natural classes?

References

[AKK+03] N. Alon, T. Kaufman, M. Krivelevich, S. Litsyn, and D. Ron. Testing low-degree polynomials over GF(2). In Proc. RANDOM, pages 188–199, 2003.
[Ang88] D. Angluin. Queries and concept learning. Machine Learning, 2:319–342, 1988.
[BLR93] M. Blum, M. Luby, and R. Rubinfeld. Self-testing/correcting with applications to numerical problems. J. Comp. Sys. Sci., 47:549–595, 1993. Earlier version in STOC'90.
[BM02] N. Bshouty and Y. Mansour. Simple Learning Algorithms for Decision Trees and Multivariate Polynomials. SIAM J. Comput., 31(6):1909–1925, 2002.
[BS90] A. Blum and M. Singh. Learning functions of k terms.
In Proceedings of the 3rd Annual Workshop on Computational Learning Theory (COLT), pages 144–153, 1990.
[Bsh97a] N. Bshouty. On learning multivariate polynomials under the uniform distribution. Information Processing Letters, 61(3):303–309, 1997.
[Bsh97b] N. Bshouty. Simple learning algorithms using divide and conquer. Computational Complexity, 6:174–194, 1997.
[DLM+07] I. Diakonikolas, H. Lee, K. Matulef, K. Onak, R. Rubinfeld, R. Servedio, and A. Wan. Testing for concise representations. In Proc. 48th Ann. Symposium on Foundations of Computer Science (FOCS), pages 549–558, 2007.
[EK89] A. Ehrenfeucht and M. Karpinski. The computational complexity of (xor, and)-counting problems. Technical report, preprint, 1989.
[FKR+04] E. Fischer, G. Kindler, D. Ron, S. Safra, and A. Samorodnitsky. Testing juntas. Journal of Computer & System Sciences, 68:753–787, 2004.
[FS92] P. Fischer and H.U. Simon. On learning ring-sum expansions. SIAM Journal on Computing, 21(1):181–192, 1992.
[GGR98] O. Goldreich, S. Goldwasser, and D. Ron. Property testing and its connection to learning and approximation. Journal of the ACM, 45:653–750, 1998.
[GKS90] D. Grigoriev, M. Karpinski, and M. Singer. Fast parallel algorithms for sparse multivariate polynomial interpolation over finite fields. SIAM Journal on Computing, 19(6):1059–1063, 1990.
[Kar89] M. Karpinski. Boolean circuit complexity of algebraic interpolation problems. (TR-89-027), 1989.
[KKL88] J. Kahn, G. Kalai, and N. Linial. The influence of variables on boolean functions. In Proc. 29th FOCS, pages 68–80, 1988.
[KL93] M. Karpinski and M. Luby. Approximating the Number of Zeros of a GF[2] Polynomial. Journal of Algorithms, 14:280–287, 1993.
[LVW93] M. Luby, B. Velickovic, and A. Wigderson. Deterministic approximate counting of depth-2 circuits. In Proceedings of the 2nd ISTCS, pages 18–24, 1993.
[Man95] Y. Mansour.
Randomized interpolation and approximation of sparse polynomials. SIAM Journal on Computing, 24(2):357–368, 1995.
[MORS07] K. Matulef, R. O'Donnell, R. Rubinfeld, and R. Servedio. Testing Halfspaces. Technical Report 128, Electronic Colloquium on Computational Complexity, 2007.
[MR95] R. Motwani and P. Raghavan. Randomized Algorithms. Cambridge University Press, New York, NY, 1995.
[PRS02] M. Parnas, D. Ron, and A. Samorodnitsky. Testing basic boolean formulae. SIAM J. Disc. Math., 16:20–46, 2002.
[RB91] R. Roth and G. Benedek. Interpolation and approximation of sparse multivariate polynomials over GF(2). SIAM J. Comput., 20(2):291–314, 1991.
[Ron07] D. Ron. Property testing: A learning theory perspective. COLT 2007 Invited Talk, slides available at http://www.eng.tau.ac.il/~danar/Public-ppt/colt07.ppt, 2007.
[SS96] R. Schapire and L. Sellie. Learning sparse multivariate polynomials over a field with queries and counterexamples. J. Comput. & Syst. Sci., 52(2):201–213, 1996.

A Proof of Theorem 3

In Section A.1 we prove some useful preliminary lemmas about the variation of individual variables in sparse polynomials. In Section A.2 we extend this analysis to get high-probability statements about the variation of subsets $\{I_j\}_{j=1}^r$ in a random partition. We put the pieces together to finish the proof of Theorem 3 in Section A.3. Throughout this section the parameters $\tau$, $\Delta$, $r$ and $\alpha$ are all as defined in Theorem 3.

A.1 The influence of variables in $s$-sparse polynomials

We start with a simple lemma stating that only a small number of variables can have large variation:

Lemma 3. Let $p: \{0,1\}^n \to \{-1,1\}$ be an $s$-sparse polynomial. For any $\delta > 0$, there are at most $s \log(2s/\delta)$ many variables $x_i$ that have $\mathrm{Vr}_p(i) \ge \delta$.

Proof. Any variable $x_i$ with $\mathrm{Vr}_p(i) \ge \delta$ must occur in some term of length at most $\log(2s/\delta)$.
(Otherwise each occurrence of $x_i$ would contribute less than $\delta/s$ to the variation of the $i$-th coordinate, and since there are at most $s$ terms this would imply $\mathrm{Vr}_p(i) < s \cdot (\delta/s) = \delta$.) Since at most $s \log(2s/\delta)$ distinct variables can occur in terms of length at most $\log(2s/\delta)$, the lemma follows.

Lemma 4. With probability at least $96/100$ over the choice of $\alpha$, no variable $x_i$ has $\mathrm{Vr}_p(i) \in [\alpha - 4\Delta, \alpha + 4\Delta]$.

Proof. The uniform random variable $\alpha$ has support $A(\tau, \Delta)$ of size no less than $50\,s \log(8s^3/\tau)$. Each possible value of $\alpha$ defines the interval of variations $[\alpha - 4\Delta, \alpha + 4\Delta]$. Note that $\alpha - 4\Delta \ge \tau/(4s^2)$. In other words, the only variables which could lie in $[\alpha - 4\Delta, \alpha + 4\Delta]$ are those with variation at least $\tau/(4s^2)$. By Lemma 3 there are at most $k \stackrel{\mathrm{def}}{=} s \log(8s^3/\tau)$ such candidate variables. Since we have at least $50k$ intervals (two consecutive such intervals overlap at a single point) and at most $k$ candidate variables, by the pigeonhole principle, at least $48k$ intervals will be empty.

Lemma 3 is based on the observation that, in a sparse polynomial, a variable with "high" influence (variation) must occur in some "short" term. The following lemma is in some sense a quantitative converse: it states that a variable with "small" influence can only appear in "long" terms.

Lemma 5. Let $p: \{0,1\}^n \to \{-1,1\}$ be an $s$-sparse polynomial. Suppose that $i$ is such that $\mathrm{Vr}_p(i) < \tau/(s^2+s)$. Then the variable $x_i$ appears only in terms of length greater than $\log(s/\tau)$.

Proof. By contradiction. Assuming that $x_i$ appears in some term of length at most $\log(s/\tau)$, we will show that $\mathrm{Vr}_p(i) \ge \tau/(s^2+s)$. Let $T$ be a shortest term that $x_i$ appears in. The function $p$ can be uniquely decomposed as follows: $p(x_1, x_2, \ldots$
$, x_n) = x_i \cdot (T' + p_1) + p_2$, where $T = x_i \cdot T'$, the term $T'$ has length less than $\log(s/\tau)$ and does not depend on $x_i$, and $p_1, p_2$ are $s$-sparse polynomials that do not depend on $x_i$. Observe that since $T$ is a shortest term that contains $x_i$, the polynomial $p_1$ does not contain the constant term 1. Since $T'$ contains fewer than $\log(s/\tau)$ many variables, it evaluates to 1 on at least a $\tau/s$ fraction of all inputs. The partial assignment that sets all the variables in $T'$ to 1 induces an $s$-sparse polynomial $p_1'$ (the restriction of $p_1$ according to the partial assignment). Now observe that $p_1'$ still does not contain the constant term 1 (for since each term in $p_1$ is of length at least the length of $T'$, no term in $p_1$ is a subset of the variables in $T'$). We now recall the following (nontrivial) result of Karpinski and Luby [KL93]:

Claim ([KL93], Corollary 1). Let $g$ be an $s$-sparse multivariate GF(2) polynomial which does not contain the constant-1 term. Then $g(x) = 0$ for at least a $1/(s+1)$ fraction of all inputs.

Applying this corollary to the polynomial $p_1'$, we have that $p_1'$ is 0 on at least a $1/(s+1)$ fraction of its inputs. Therefore, the polynomial $T' + p_1$ is 1 on at least a $(\tau/s) \cdot 1/(s+1)$ fraction of all inputs in $\{0,1\}^n$; this in turn implies that $\mathrm{Vr}_p(i) \ge (\tau/s) \cdot 1/(s+1) = \tau/(s^2+s)$.

By a simple application of Lemma 5 we can show that setting low-variation variables to zero does not change the polynomial by much:

Lemma 6. Let $p: \{0,1\}^n \to \{-1,1\}$ be an $s$-sparse polynomial. Let $g$ be a function obtained from $p$ by setting to 0 some subset of variables all of which have $\mathrm{Vr}_p(i) < \tau/(2s^2)$. Then $g$ and $p$ are $\tau$-close.

Proof. Setting a variable to 0 removes all the terms that contain it from $p$. By Lemma 5, doing this only removes terms of length greater than $\log(s/\tau)$.
Removing one such term changes the function on at most a $\tau/s$ fraction of the inputs. Since there are at most $s$ terms in total, the lemma follows by a union bound.

A.2 Partitioning variables into random subsets

The following lemma is at the heart of Theorem 3. The lemma states that when we randomly partition the variables (coordinates) into subsets, (i) each subset gets at most one "high-influence" variable (the term "high-influence" here means relative to an appropriate threshold value $t \ll \alpha$), and (ii) the remaining (low-influence) variables (w.r.t. $t$) have a "very small" contribution to the subset's total variation. The first part of the lemma follows easily from a birthday-paradox type argument, since there are many more subsets than high-influence variables. As intuition for the second part, we note that in expectation, the total variation of each subset is very small. A more careful argument lets us argue that the total contribution of the low-influence variables in a given subset is unlikely to greatly exceed its expectation.

Lemma 7. Fix a value of $\alpha$ satisfying the first statement of Theorem 3. Let $t \stackrel{\mathrm{def}}{=} \Delta\tau/(4C's)$, where $C'$ is a suitably large constant. Then with probability $99/100$ over the random partition the following statements hold true:

– For every $j \in [r]$, $I_j$ contains at most one variable $x_i$ with $\mathrm{Vr}_p(i) > t$.
– Let $I_j^{\le t} \stackrel{\mathrm{def}}{=} \{i \in I_j \mid \mathrm{Vr}_p(i) \le t\}$. Then, for all $j \in [r]$, $\mathrm{Vr}_p(I_j^{\le t}) \le \Delta$.

Proof. We show that each statement of the lemma fails independently with probability at most $1/200$, from which the lemma follows. By Lemma 3 there are at most $b = s \log(2s/t)$ coordinates in $[n]$ with variation more than $t$. A standard argument yields that the probability that there exists a subset $I_j$ with more than one such variable is at most $b^2/r$.
It is easy to verify that this is less than $1/200$, as long as $C$ is large enough relative to $C'$. Therefore, with probability at least $199/200$, every subset contains at most one variable with variation greater than $t$. So the first statement fails with probability no more than $1/200$.

Now for the second statement. Consider a fixed subset $I_j$. We analyze the contribution of variables in $I_j^{\le t}$ to the total variation $\mathrm{Vr}_p(I_j)$. We will show that with high probability the contribution of these variables is at most $\Delta$. Let $S = \{i \in [n] \mid \mathrm{Vr}_p(i) \le t\}$ and renumber the coordinates such that $S = [k']$. Each variable $x_i$, $i \in S$, is contained in $I_j$ independently with probability $1/r$. Let $X_1, \ldots, X_{k'}$ be the corresponding independent Bernoulli random variables. Recall that, by sub-additivity, the variation of $I_j^{\le t}$ is upper bounded by $X = \sum_{i=1}^{k'} \mathrm{Vr}_p(i) \cdot X_i$. It thus suffices to upper bound the probability $\Pr[X > \Delta]$. Note that $\mathbf{E}[X] = \sum_{i=1}^{k'} \mathrm{Vr}_p(i) \cdot \mathbf{E}[X_i] = (1/r) \cdot \sum_{i=1}^{k'} \mathrm{Vr}_p(i) \le s/r$, since $\sum_{i=1}^{k'} \mathrm{Vr}_p(i) \le \sum_{i=1}^{n} \mathrm{Vr}_p(i) \le s$. The last inequality follows from the following simple fact (the proof of which is left for the reader).

Fact 4. Let $p: \{0,1\}^n \to \{-1,1\}$ be an $s$-sparse polynomial. Then $\sum_{i=1}^{n} \mathrm{Vr}_p(i) \le s$.

To finish the proof, we need the following version of the Chernoff bound:

Fact 5 ([MR95]). For $k' \in \mathbb{N}^*$, let $\alpha_1, \ldots, \alpha_{k'} \in [0,1]$ and let $X_1, \ldots, X_{k'}$ be independent Bernoulli trials. Let $X' = \sum_{i=1}^{k'} \alpha_i X_i$ and $\mu \stackrel{\mathrm{def}}{=} \mathbf{E}[X'] \ge 0$. Then for any $\gamma > 1$ we have $\Pr[X' > \gamma\mu] < \left(\frac{e^{\gamma-1}}{\gamma^\gamma}\right)^\mu$.

We apply the above bound to the $X_i$'s with $\alpha_i = \mathrm{Vr}_p(i)/t \in [0,1]$. (Recall that the coordinates in $S$ have variation at most $t$.) We have $\mu = \mathbf{E}[X'] = \mathbf{E}[X]/t \le s/(rt) = C's/(C\tau)$, and we are interested in the event $\{X > \Delta\} \equiv \{X' > \Delta/t\}$.
Note that $\Delta/t = 4C's/\tau$. Hence, $\gamma \ge 4C$ and the above bound implies that $\Pr[X > \Delta] < (e/(4C))^{4C's/\tau} < (1/(4C^4))^{C's/\tau}$. Therefore, for a fixed subset $I_j$, we have $\Pr[\mathrm{Vr}_p(I_j^{\le t}) > \Delta] < (1/(4C^4))^{C's/\tau}$. By a union bound over all $r$ subsets, we conclude that the second statement fails with probability at most $r \cdot (1/(4C^4))^{C's/\tau}$. This is less than $1/200$ as long as $C'$ is a large enough absolute constant (independent of $C$), which completes the proof.

Next we show that by "zeroing out" the variables in low-variation subsets, we are likely to "kill" all terms in $p$ that contain a low-influence variable.

Lemma 8. With probability at least $99/100$ over the random partition, every monomial of $p$ containing a variable with influence at most $\alpha$ has at least one of its variables in $\cup_{j \in L(\alpha)} I_j$.

Proof. By Lemma 3 there are at most $b = s \log(8s^3/\tau)$ variables with influence more than $\alpha$. Thus, no matter the partition, at most $b$ subsets from $\{I_j\}_{j=1}^r$ contain such variables. Fix a low-influence variable (influence at most $\alpha$) from every monomial containing such a variable. For each fixed variable, the probability that it ends up in the same subset as a high-influence variable is at most $b/r$. Union bounding over each of the (at most $s$) monomials, the failure probability of the lemma is upper bounded by $sb/r < 1/100$.

A.3 Proof of Theorem 3

Proof (Theorem 3). We prove each statement in turn. The first statement of the theorem is implied by Lemma 4. (Note that, as expected, the validity of this statement does not depend on the random partition.) We claim that statements 2–5 essentially follow from Lemma 7. (In contrast, the validity of these statements crucially depends on the random partition.) Let us first prove the third statement. We want to show that (w.h.p.
over the choice of $\alpha$ and $\{I_j\}_{j=1}^r$) for every $j \in H(\alpha)$: (i) there exists a unique $i_j \in I_j$ such that $\mathrm{Vr}_p(i_j) \ge \alpha$, and (ii) $\mathrm{Vr}_p(I_j \setminus \{i_j\}) \le \Delta$. Fix some $j \in H(\alpha)$. By Lemma 7, for a given value of $\alpha$ satisfying the first statement of the theorem, we have: (i') $I_j$ contains at most one variable $x_{i_j}$ with $\mathrm{Vr}_p(i_j) > t$, and (ii') $\mathrm{Vr}_p(I_j \setminus \{i_j\}) \le \Delta$. Since $t < \tau/(4s^2) < \alpha$ (with probability 1), (i') clearly implies that, if $I_j$ has a high-variation element (w.r.t. $\alpha$), then it is unique. In fact, we claim that $\mathrm{Vr}_p(i_j) \ge \alpha$. For otherwise, by sub-additivity of variation, we would have $\mathrm{Vr}_p(I_j) \le \mathrm{Vr}_p(I_j \setminus \{i_j\}) + \mathrm{Vr}_p(i_j) \le \Delta + \alpha - 4\Delta = \alpha - 3\Delta < \alpha$, which contradicts the assumption that $j \in H(\alpha)$. Note that we have used the fact that $\alpha$ satisfies the first statement of the theorem, that is, $\mathrm{Vr}_p(i_j) < \alpha \Rightarrow \mathrm{Vr}_p(i_j) < \alpha - 4\Delta$. Hence, for a "good" value of $\alpha$ (one satisfying the first statement of the theorem), the third statement is satisfied with probability at least $99/100$ over the random partition. By Lemma 4, a "good" value of $\alpha$ is chosen with probability $96/100$. By independence, the conclusions of Lemma 4 and Lemma 7 hold simultaneously with probability more than $9/10$.

We now establish the second statement. We assume as before that $\alpha$ is a "good" value. Consider a fixed subset $I_j$, $j \in [r]$. If $j \in H(\alpha)$ (i.e. $I_j$ is a high-variation subset) then, with probability at least $99/100$ (over the random partition), there exists $i_j \in I_j$ such that $\mathrm{Vr}_p(i_j) \ge \alpha + 4\Delta$. The monotonicity of variation yields $\mathrm{Vr}_p(I_j) \ge \mathrm{Vr}_p(i_j) \ge \alpha + 4\Delta$. If $j \in L(\alpha)$ then $I_j$ contains no high-variation variable, i.e. its maximum-variation element has variation at most $\alpha - 4\Delta$, and by the second part of Lemma 7 the remaining variables contribute at most $\Delta$ to its total variation.
Hence, by sub-additivity we have that $\mathrm{Vr}_p(I_j) \le \alpha - 3\Delta$. Since a "good" value of $\alpha$ is chosen with probability $96/100$, the desired statement follows.

The fourth statement follows from the aforementioned and the fact that there exist at most $s \log(8s^3/\tau)$ variables with variation at least $\alpha$ (as follows from Lemma 3, given that $\alpha > \tau/(4s^2)$).

Now for the fifth statement. Lemma 8 and monotonicity imply that the only variables that remain relevant in $p'$ are (some of) those with high influence (at least $\alpha$) in $p$, and, as argued above, each high-variation subset $I_j$ contains at most one such variable. By a union bound, the conclusion of Lemma 8 holds simultaneously with the conclusions of Lemma 4 and Lemma 7 with probability at least $9/10$.

The sixth statement (that $p$ and $p'$ are $\tau$-close) is a consequence of Lemma 6 (since $p'$ is obtained from $p$ by setting to 0 variables with variation less than $\alpha < \tau/(2s^2)$). This concludes the proof of Theorem 3.

B Completeness of the test

In this section we show that Test-Sparse-Poly is complete:

Theorem 6. Suppose $f$ is an $s$-sparse GF(2) polynomial. Then Test-Sparse-Poly accepts $f$ with probability at least $2/3$.

Proof. Fix $f$ to be an $s$-sparse GF(2) polynomial over $\{0,1\}^n$. By the choice of the $\Delta$ and $r$ parameters in Step 1 of Test-Sparse-Poly we may apply Theorem 3, so with failure probability at most $1/10$ over the choice of $\alpha$ and $I_1, \ldots, I_r$ in Steps 2 and 3, statements 1–6 of Theorem 3 all hold. We shall write $f'$ to denote $f|_{0 \leftarrow \cup_{j \in L(\alpha)} I_j}$. Note that at each successive stage of the proof we shall assume that the "failure probability" events do not occur, i.e. henceforth we shall assume that statements 1–6 all hold for $f$; we take a union bound over all failure probabilities at the end of the proof. Now consider the $M$ executions of the independence test for a given fixed $I_j$ in Step 4.
Lemma 2 gives that each run rejects with probability $\frac{1}{2}\mathrm{Vr}_f(I_j)$. A standard Hoeffding bound implies that for the algorithm's choice of $M = \frac{2}{\Delta^2}\ln(200r)$, the value $\widetilde{\mathrm{Vr}}_f(I_j)$ obtained in Step 4 is within $\pm\Delta$ of the true value $\mathrm{Vr}_f(I_j)$ with failure probability at most $\frac{1}{100r}$. A union bound over all $j \in [r]$ gives that with failure probability at most $1/100$, each $\widetilde{\mathrm{Vr}}_f(I_j)$ is within an additive $\pm\Delta$ of the true value $\mathrm{Vr}_f(I_j)$. This means that (by statement 2 of Theorem 3) every $I_j$ has $\widetilde{\mathrm{Vr}}_f(I_j) \notin [\alpha - 2\Delta, \alpha + 3\Delta]$, and hence in Step 5 of the test, the sets $\tilde{L}(\alpha)$ and $\tilde{H}(\alpha)$ are identical to $L(\alpha)$ and $H(\alpha)$ respectively, which in turn means that the function $\tilde{f}'$ defined in Step 5 is identical to $f'$ defined above.

We now turn to Step 6 of the test. By statement 6 of Theorem 3 we have that $f$ and $f'$ disagree on at most a $\tau$ fraction of inputs. A union bound over the $m$ random examples drawn in Step 6 implies that with failure probability at most $\tau m < 1/100$ the test proceeds to Step 7. By statement 3 of Theorem 3 we have that each $I_j$, $j \in \tilde{H}(\alpha) \equiv H(\alpha)$, contains precisely one high-variation element $i_j$ (i.e. one which satisfies $\mathrm{Vr}_f(i_j) \ge \alpha$), and these are all of the high-variation elements. Consider the set of these $|\tilde{H}(\alpha)|$ high-variation variables; statement 5 of Theorem 3 implies that these are the only variables which $f'$ can depend on (it is possible that it does not depend on some of these variables). Let us write $f''$ to denote the function $f'': \{0,1\}^{|\tilde{H}(\alpha)|} \to \{-1,1\}$ corresponding to $f'$ but whose input variables are these $|\tilde{H}(\alpha)|$ high-variation variables of $f$, one per $I_j$ for each $j \in \tilde{H}(\alpha)$. We thus have that $f''$ is isomorphic to $f'$ (obtained from $f'$ by discarding irrelevant variables).
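For intuition, the Step 4 estimation just described can be mimicked in a few lines of Python: $\mathrm{Vr}_f(I)$ is estimated as twice the rejection rate of $M$ runs of the independence test (rerandomize the coordinates in $I$ and check whether $f$ changes), and for a tiny sparse polynomial the estimate can be compared against the exact variation computed by enumeration. The particular polynomial, the small parameter values, and all function names below are illustrative choices of ours; the block also numerically checks Fact 4 ($\sum_i \mathrm{Vr}_p(i) \le s$) for this example.

```python
import random
from itertools import product

def p(x):
    """A 3-sparse GF(2) polynomial on 5 variables: x0 + x1*x2 + x2*x3*x4."""
    return x[0] ^ (x[1] & x[2]) ^ (x[2] & x[3] & x[4])

def exact_variation(f, n, I):
    """Vr_f(I) = 2*Pr[f(x) != f(y)], where y is x with coords in I rerandomized."""
    total = count = 0
    for x in product((0, 1), repeat=n):
        for z in product((0, 1), repeat=len(I)):
            y = list(x)
            for coord, bit in zip(I, z):
                y[coord] = bit
            count += f(list(x)) != f(y)
            total += 1
    return 2 * count / total

def estimated_variation(f, n, I, M):
    """Estimate Vr_f(I) from M independent runs of the independence test."""
    rejects = 0
    for _ in range(M):
        x = [random.randint(0, 1) for _ in range(n)]
        y = list(x)
        for coord in I:
            y[coord] = random.randint(0, 1)
        rejects += f(x) != f(y)
    return 2 * rejects / M

random.seed(0)
I = [1, 2]
exact = exact_variation(p, 5, I)                 # 0.75 for this p and I
approx = estimated_variation(p, 5, I, 20000)     # close to 0.75 w.h.p.
vr_sum = sum(exact_variation(p, 5, [i]) for i in range(5))  # Fact 4: <= s = 3
```

Here the empirical estimate concentrates around the exact value exactly as the Hoeffding argument predicts; for single coordinates, `exact_variation` reduces to the usual influence.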
The main idea behind the completeness proof is that in Step 7 of Test-Sparse-Poly, the learning algorithm LearnPoly′ is being run with target function $f''$. Since $f''$ is isomorphic to $f'$, which is an $s$-sparse polynomial (since it is a restriction of the $s$-sparse polynomial $f$), with high probability LearnPoly′ will run successfully and the test will accept. To show that this is what actually happens, we must show that with high probability each call to SimMQ which LearnPoly′ makes correctly simulates the corresponding membership query to $f''$. This is established by the following lemmas:

Lemma 9. Let $f, I, \alpha, \Delta$ be such that $I$ is $(\alpha, \Delta)$-well-structured with $\Delta \le \alpha\delta/(2\ln(2/\delta))$. Then with probability at least $1-\delta$, the output of SHIV($f$, $I$, $\alpha$, $\Delta$, $b$, $\delta$) is an assignment $w \in \{0,1\}^I$ which has $w_i = b$.

Proof. We assume that $I_b$ contains the high-variation variable $i$ (the other case being very similar). Recall that by Lemma 2, each run of the independence test on $I_b$ rejects with probability $\frac{1}{2}\mathrm{Vr}_f(I_b)$; by Lemma 1 (monotonicity) this is at least $\frac{1}{2}\mathrm{Vr}_f(i) \ge \alpha/2$. So the probability that $I_b$ is not marked even once after $c$ iterations of the independence test is at most $(1-\alpha/2)^c \le \delta/2$, by our choice of $c$. Similarly, the probability that $I_{1-b}$ is ever marked during $c$ iterations of the independence test is at most $c(\Delta/2) \le \delta/2$, by the condition of the lemma. Thus, the probability of failing at step 3 of SHIV is at most $\delta$, and since $i \in I_b$, the assignment $w$ sets variable $i$ correctly in step 4.

Lemma 10. With total failure probability at most $1/100$, each of the $Q(s, |\tilde{H}(\alpha)|, \epsilon/4, 1/100)$ calls to SimMQ($f$, $\tilde{H}(\alpha)$, $\{I_j\}_{j \in \tilde{H}(\alpha)}$, $\alpha$, $\Delta$, $z$, $1/(100\,Q(s, |\tilde{H}(\alpha)|, \epsilon/4, 1/100))$) that LearnPoly′ makes in Step 7 of Test-Sparse-Poly returns the correct value of $f''(z)$.

Proof.
Consider a single call to the procedure SimMQ($f$, $\tilde{H}(\alpha)$, $\{I_j\}_{j \in \tilde{H}(\alpha)}$, $\alpha$, $\Delta$, $z$, $1/(100\,Q(s, |\tilde{H}(\alpha)|, \epsilon/4, 1/100))$) made by LearnPoly′. We show that with failure probability at most $\delta' \stackrel{\mathrm{def}}{=} 1/(100\,Q(s, |\tilde{H}(\alpha)|, \epsilon/4, 1/100))$ this call returns the value $f''(z)$; the lemma then follows by a union bound over the $Q(s, |\tilde{H}(\alpha)|, \epsilon/4, 1/100)$ many calls to SimMQ. This call to SimMQ makes $|\tilde{H}(\alpha)|$ calls to SHIV($f$, $I_j$, $\alpha$, $\Delta$, $z_j$, $\delta'/|\tilde{H}(\alpha)|$), one for each $j \in \tilde{H}(\alpha)$. Consider any fixed $j \in \tilde{H}(\alpha)$. Statement 3 of Theorem 3 gives that $I_j$ ($j \in \tilde{H}(\alpha)$) is $(\alpha, \Delta)$-well-structured. Since $\alpha > \frac{\tau}{4s^2}$, it is easy to check that the condition of Lemma 9 holds where the role of "$\delta$" in that inequality is played by $\delta'/|\tilde{H}(\alpha)|$, so we may apply Lemma 9 and conclude that with failure probability at most $\delta'/|\tilde{H}(\alpha)|$ (recall that by statement 4 of Theorem 3 we have $|\tilde{H}(\alpha)| \le s\log(8s^3/\tau)$), SHIV returns an assignment to the variables in $I_j$ which sets the high-variation variable to $z_j$ as required. By a union bound, the overall failure probability that any $I_j$ ($j \in \tilde{H}(\alpha)$) has its high-variation variable not set according to $z$ is at most $\delta'$. Now statement 5 and the discussion preceding this lemma (the isomorphism between $f'$ and $f''$) give that SimMQ sets all of the variables that are relevant in $f'$ correctly according to $z$ in the assignment $x$ it constructs in Step 2. Since this assignment $x$ sets all variables in $\cup_{j \in \tilde{L}(\alpha)} I_j$ to 0, the bit $b = f(x)$ that is returned is the correct value of $f''(z)$, with failure probability at most $\delta'$ as required.

With Lemma 10 in hand, we have that with failure probability at most $1/100$, the execution of LearnPoly′($s$, $|\tilde{H}(\alpha)|$, $\epsilon/4$, $1/100$) in Step 7 of Test-Sparse-Poly correctly simulates all membership queries.
As a consequence, Corollary 1 gives that LearnPoly′($s$, $|\tilde{H}(\alpha)|$, $\epsilon/4$, $1/100$) returns "not $s$-sparse" with probability at most $1/100$. Summing all the failure probabilities over the entire execution of the algorithm, the overall probability that Test-Sparse-Poly does not output "yes" is at most $1/10$ (Theorem 3) $+\ 1/100$ (Step 4) $+\ 1/100$ (Step 6) $+\ 1/100$ (Lemma 10) $+\ 1/100$ (Corollary 1) $< 1/5$, and the completeness theorem is proved. (Theorem 6)

C Soundness of the Test

In this section we prove the soundness of Test-Sparse-Poly:

Theorem 7. If $f$ is $\epsilon$-far from every $s$-sparse polynomial, then Test-Sparse-Poly accepts with probability at most $1/3$.

Proof. To prove the soundness of the test, we start by assuming that the function $f$ has progressed to Step 5, so there are subsets $I_1, \ldots, I_r$ and $\tilde{H}(\alpha)$ satisfying $\widetilde{\mathrm{Vr}}_f(I_j) > \alpha + 2\Delta$ for all $j \in \tilde{H}(\alpha)$. As in the proof of completeness, we have that the actual variations of all subsets should be close to the estimates, i.e. that $\mathrm{Vr}_f(I_j) > \alpha + \Delta$ for all $j \in \tilde{H}(\alpha)$, except with probability at most $1/100$. We may then complete the proof in two parts by establishing the following:

– If $f$ and $\tilde{f}'$ are $\epsilon_a$-far, Step 6 will accept with probability at most $\delta_a$.
– If $\tilde{f}'$ is $\epsilon_b$-far from every $s$-sparse polynomial, Step 7 will accept with probability at most $\delta_b$.

Establishing these statements with $\epsilon_a = \epsilon_b = \epsilon/2$, $\delta_a = 1/12$ and $\delta_b = 1/6$ will allow us to complete the proof (and we may assume throughout the rest of the proof that $\mathrm{Vr}_f(I_j) > \alpha$ for each $j \in \tilde{H}(\alpha)$). The first statement follows immediately by our choice of $m = \frac{1}{\epsilon_a}\ln\frac{1}{\delta_a}$ with $\epsilon_a = \epsilon/2$ and $\delta_a = 1/12$ in Step 6. Our main task is to establish the second statement, which we do using Lemmas 11 and 12 stated below.
Intuitively, we would like to show that if LearnPoly′ outputs a hypothesis $h$ (which must be an $s$-sparse polynomial since LearnPoly′ is proper) with probability greater than $1/6$, then $\tilde{f}'$ is close to a junta isomorphic to $h$. To do this, we establish that if LearnPoly′ succeeds with high probability, then the last hypothesis on which an equivalence query is performed in LearnPoly′ is a function which is close to $\tilde{f}'$. Our proof uses two lemmas: Lemma 12 tells us that this holds if the high-variation subsets satisfy a certain structure, and Lemma 11 tells us that if LearnPoly′ succeeds with high probability then the subsets indeed satisfy this structure. We now state these lemmas formally and complete the proof of the theorem, deferring the proofs of the lemmas until later. Recall that the algorithm LearnPoly′ will make repeated calls to SimMQ, which in turn makes repeated calls to SHIV. Lemma 11 states that if, with probability greater than $\delta_2$, all of these calls to SHIV return without failure, then the subsets associated with $\tilde{H}(\alpha)$ have a special structure.

Lemma 11. Let $J \subset [n]$ be a subset of variables obtained by including the highest-variation element of $I_j$ for each $j \in \tilde{H}(\alpha)$ (breaking ties arbitrarily). Suppose that $k > 300|\tilde{H}(\alpha)|/\epsilon_2$ queries are made to SimMQ. Suppose moreover that $\Pr[$every call to SHIV that is made during these $k$ queries returns without outputting "fail"$]$ is greater than $\delta_2$ for $\delta_2 = 1/\Omega(k)$. Then the following both hold:

– Every subset $I_j$ for $j \in \tilde{H}(\alpha)$ satisfies $\mathrm{Vr}_f(I_j \setminus J) \le 2\epsilon_2/|\tilde{H}(\alpha)|$; and
– The function $\tilde{f}'$ is $\epsilon_2$-close to the junta $g: \{0,1\}^{|\tilde{H}(\alpha)|} \to \{-1,1\}$ defined as: $g(x) \stackrel{\mathrm{def}}{=} \mathrm{sign}(\mathbf{E}_z[\tilde{f}'((x \cap J) \sqcup z)])$.
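To make the definition of the junta $g$ concrete, the following brute-force Python sketch computes $g$ from $\tilde{f}'$ by exact enumeration (feasible only for tiny $n$). It reflects our reading of the notation: $x \cap J$ supplies the bits on the coordinates in $J$, and $\sqcup z$ fills the remaining coordinates with the averaged-out string $z$; the toy `f_prime` and all names are illustrative, not the paper's.

```python
from itertools import product

def junta_g(f_prime, n, J):
    """Compute g(x) = sign(E_z[f'((x ∩ J) ⊔ z)]) by exact enumeration.

    f_prime : function on {0,1}^n with values in {-1, +1}
    J       : list of the coordinates g is allowed to depend on
    Returns g as a dict keyed by the bits of x restricted to J.
    """
    others = [i for i in range(n) if i not in J]
    g = {}
    for xJ in product((0, 1), repeat=len(J)):
        total = 0
        for z in product((0, 1), repeat=len(others)):
            y = [0] * n
            for coord, bit in zip(J, xJ):
                y[coord] = bit            # bits of x on J
            for coord, bit in zip(others, z):
                y[coord] = bit            # averaged-out bits z off J
            total += f_prime(y)
        g[xJ] = 1 if total >= 0 else -1   # sign of the conditional mean
    return g

# toy f' = (-1)^(y0 XOR (y1 AND y2)): dominated by y0, with y1*y2 as noise
f_prime = lambda y: -1 if (y[0] ^ (y[1] & y[2])) else 1
g = junta_g(f_prime, 3, [0])
```

For this toy function the conditional mean given $x_0$ is $\pm 1/2$, so the resulting junta simply copies the sign determined by coordinate 0.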
Given that the subsets associated with $\widetilde{H}(\alpha)$ have this special structure, Lemma 12 tells us that the hypothesis output by LearnPoly′ should be close to the junta $g$.

Lemma 12. Define $Q_E$ as the maximum number of calls to SimMQ that will be made by LearnPoly′ in all of its equivalence queries. Suppose that for every $j \in \widetilde{H}(\alpha)$, it holds that $\mathrm{Vr}_f(I_j \setminus J) < 2\epsilon_2/|\widetilde{H}(\alpha)|$ with $\epsilon_2 < \frac{\alpha}{800 Q_E}$. Then the probability that LearnPoly′ outputs a hypothesis $h$ which is $\epsilon/4$-far from the junta $g$ is at most $\delta_3 = 1/100$.

We now show that Lemmas 11 and 12 suffice to prove the desired result. Suppose that LearnPoly′ accepts with probability at least $\delta_b = 1/6$. Assume LearnPoly′ makes at least $k$ queries to SimMQ (we address this in the next paragraph); then it follows from Lemma 11 that the bins associated with $\widetilde{H}(\alpha)$ satisfy the conditions of Lemma 12 and that $\tilde{f}'$ is $\epsilon_2$-close to the junta $g$. Now applying Lemma 12, we have that with failure probability at most $1/100$, LearnPoly′ outputs a hypothesis which is $\epsilon/4$-close to $g$. But then $\tilde{f}'$ must be $(\epsilon_2 + \epsilon/4)$-close to this hypothesis, which is an $s$-sparse polynomial.

We need to establish that LearnPoly′ indeed makes $k > 300|\widetilde{H}(\alpha)|/\epsilon_2$ SimMQ queries for an $\epsilon_2$ that satisfies the condition on $\epsilon_2$ in Lemma 12. (Note that if LearnPoly′ does not actually make this many queries, we can simply have it make artificial calls to SHIV to achieve this. An easy extension of our completeness proof handles this slight extension of the algorithm; we omit the details.) Since we need $\epsilon_2 < \alpha/(800 Q_E)$ and Theorem 2 gives us that $Q_E = (|\widetilde{H}(\alpha)|s + 2) \cdot \frac{4}{\epsilon}\ln 300(|\widetilde{H}(\alpha)|s + 2)$ (each equivalence query is simulated using $\frac{4}{\epsilon}\ln 300(|\widetilde{H}(\alpha)|s + 2)$ random examples), an easy computation shows that it suffices to take $k = \mathrm{poly}(s, 1/\epsilon)$, and the proof of Theorem 7 is complete.
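The "easy computation" for $k$ can be sketched numerically. The helper below is purely illustrative: the function name and the sample arguments are ours, not the paper's; it simply chains $k > 300|\widetilde{H}(\alpha)|/\epsilon_2$ with $\epsilon_2 < \alpha/(800 Q_E)$ and the expression for $Q_E$ from Theorem 2:

```python
import math

def sufficient_k(h, s, eps, alpha):
    """Illustrative sketch (not the paper's code): a value of k large enough
    that k > 300*h/eps2 holds for an eps2 meeting eps2 < alpha/(800*Q_E),
    where Q_E = (h*s + 2) * (4/eps) * ln(300*(h*s + 2)) as in Theorem 2."""
    q_e = (h * s + 2) * (4 / eps) * math.log(300 * (h * s + 2))
    eps2 = alpha / (800 * q_e)          # just at the Lemma 12 threshold
    return math.ceil(300 * h / eps2) + 1

# Sample (hypothetical) values h = |H~(alpha)|, s, eps, alpha:
print(sufficient_k(10, 5, 0.1, 0.1) > 0)  # True
```

Since $Q_E$ is polynomial in $s$ and $1/\epsilon$, so is the resulting $k$, which is all the argument needs.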
Before proving Lemma 12 and Lemma 11, we prove the following about the behavior of SHIV when it is called with parameters $\alpha, \Delta$ that do not quite match the real values $\alpha', \Delta'$ for which $I$ is $(\alpha', \Delta')$-well-structured:

Lemma 13. If $I$ is $(\alpha', \Delta')$-well-structured, then the probability that SHIV$(f, I, \alpha, \Delta, b, \delta)$ passes (i.e. does not output "fail") and sets the high-variation variable incorrectly is at most $(\delta/2)^{\alpha'/\alpha} \cdot (1/\alpha) \cdot \Delta' \cdot \ln(2/\delta)$.

Proof. The only way for SHIV to pass with an incorrect setting of the high-variation variable $i$ is if it fails to mark the subset containing $i$ for $c$ iterations of the independence test, and marks the other subset at least once. Since $\mathrm{Vr}(i) > \alpha'$ and $\mathrm{Vr}(I \setminus i) < \Delta'$, the probability of this occurring is at most $(1 - \alpha'/2)^c \cdot \Delta' \cdot c/2$. Since SHIV is called with failure parameter $\delta$, $c$ is set to $\frac{2}{\alpha}\ln\frac{2}{\delta}$.

We now give a proof of Lemma 12, followed by a proof of Lemma 11.

Proof (Lemma 12). By assumption each $\mathrm{Vr}_f(I_j \setminus J) \le 2\epsilon_2/|\widetilde{H}(\alpha)|$ and $\mathrm{Vr}_f(I_j) > \alpha$, so subadditivity of variation gives us that for each $j \in \widetilde{H}(\alpha)$, there exists an $i \in I_j$ such that $\mathrm{Vr}_f(i) > \alpha - 2\epsilon_2/|\widetilde{H}(\alpha)|$. Thus for every call to SHIV made by SimMQ, the conditions of Lemma 13 are satisfied with $\mathrm{Vr}_f(i) > \alpha - 2\epsilon_2/|\widetilde{H}(\alpha)|$ and $\mathrm{Vr}_f(I_j \setminus J) < 2\epsilon_2/|\widetilde{H}(\alpha)|$. We show that as long as $\epsilon_2 < \frac{\alpha}{800 Q_E}$, the probability that any particular query $z$ to SimMQ has a variable set incorrectly is at most $\delta_3/3Q_E$. Suppose SHIV has been called with failure probability $\delta_4$; then the probability given by Lemma 13 is at most
$$(\delta_4/2)^{1 - 2\epsilon_2/(\alpha \cdot |\widetilde{H}(\alpha)|)} \cdot \frac{1}{\alpha}\ln\frac{2}{\delta_4} \cdot \frac{2\epsilon_2}{|\widetilde{H}(\alpha)|}. \qquad (1)$$
We shall show that this is at most $\delta_3/(3|\widetilde{H}(\alpha)| Q_E) = 1/(300 Q_E |\widetilde{H}(\alpha)|)$.
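Numerically, the closed form stated in Lemma 13 dominates the raw probability from its proof, since $(1-x)^c \le e^{-xc}$ with $x = \alpha'/2$ and $c = \frac{2}{\alpha}\ln\frac{2}{\delta}$. The sketch below is our own illustration; the function names and sample parameters are hypothetical:

```python
import math

def raw_pass_probability(alpha_p, delta_p, alpha, delta):
    """From the proof of Lemma 13: the subset containing the high-variation
    variable i is unmarked in all c independence-test iterations, while the
    other subset is marked at least once."""
    c = (2 / alpha) * math.log(2 / delta)
    return (1 - alpha_p / 2) ** c * delta_p * c / 2

def lemma13_bound(alpha_p, delta_p, alpha, delta):
    """The closed-form bound stated in Lemma 13."""
    return (delta / 2) ** (alpha_p / alpha) * (1 / alpha) * delta_p * math.log(2 / delta)

# Sample (hypothetical) parameters alpha', Delta', alpha, delta:
args = (0.5, 0.01, 0.4, 0.01)
print(raw_pass_probability(*args) <= lemma13_bound(*args))  # True
```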
Taking $\epsilon_2 \le \alpha/800Q_E$ simplifies (1) to:
$$\frac{1}{300 Q_E |\widetilde{H}(\alpha)|} \cdot (\delta_4/2)^{1 - 2\epsilon_2/(\alpha \cdot |\widetilde{H}(\alpha)|)} \cdot \frac{3}{4}\ln\frac{2}{\delta_4},$$
which is at most $1/(300|\widetilde{H}(\alpha)| Q_E)$ as long as $(2/\delta_4)^{1 - 2\epsilon_2/(\alpha \cdot |\widetilde{H}(\alpha)|)} > \frac{3}{4}\ln\frac{2}{\delta_4}$, which certainly holds for our choice of $\epsilon_2$ and the setting of $\delta_4 = 1/(100 k |\widetilde{H}(\alpha)|)$. Each call to SimMQ uses $|\widetilde{H}(\alpha)|$ calls to SHIV, so a union bound gives that each random query to SimMQ returns an incorrect assignment with probability at most $1/300Q_E$. Now, since $\tilde{f}'$ and $g$ are $\epsilon_2$-close and $\epsilon_2$ satisfies $\epsilon_2 Q_E \le \delta_3/3$, in the uniform random samples used to simulate the final (accepting) equivalence query, LearnPoly′ will receive examples labeled correctly according to $g$ with probability at least $1 - 2\delta_3/3$. Finally, note that LearnPoly′ makes at most $|\widetilde{H}(\alpha)|s + 2$ equivalence queries and hence each query is simulated using $\frac{4}{\epsilon}\ln\frac{3(|\widetilde{H}(\alpha)|s+2)}{\delta_3}$ random examples (for a failure probability of $\frac{\delta_3}{|\widetilde{H}(\alpha)|s+2}$ for each equivalence query). Then LearnPoly′ will reject with probability at least $1 - \delta_3/3$ unless $g$ and $h$ are $\epsilon/4$-close. This concludes the proof of Lemma 12.

Proof (Lemma 11). We prove that if $\mathrm{Vr}_f(I_j \setminus J) > 2\epsilon_2/|\widetilde{H}(\alpha)|$ for some $j \in \widetilde{H}(\alpha)$, then the probability that all calls to SHIV return successfully is at most $\delta_2$. The closeness of $\tilde{f}'$ and $g$ follows easily by the subadditivity of variation and Proposition 3.2 of [FKR+04].

First, we prove a much weaker statement whose analysis and conclusion will be used to prove the proposition. We show in Proposition 1 that if the test accepts with high probability, then the variation from each variable in any subset is small. We use the bound on each variable's variation to obtain the concentration result in Proposition 2, and then complete the proof of Lemma 11.

Proposition 1.
Suppose that $k$ calls to SHIV are made with a particular subset $I$, and let $i$ be the variable with the highest variation in $I$. If $\mathrm{Vr}_f(j) > \epsilon_2/100|\widetilde{H}(\alpha)|$ for some $j \in I \setminus i$, then the probability that SHIV returns without outputting 'fail' for all $k$ calls is at most $\delta^* = e^{-k/18} + e^{-c}$.

Proof. Suppose that there exist $j, j' \in I$ with $\mathrm{Vr}_f(j) \ge \mathrm{Vr}_f(j') \ge \epsilon_2/100|\widetilde{H}(\alpha)|$. A standard Chernoff bound gives that except with probability at most $e^{-k/18}$, for at least $(1/3)k$ of the calls to SHIV, variables $j$ and $j'$ are in different partitions. In these cases, the probability SHIV does not output 'fail' is at most $2(1 - \epsilon_2/100|\widetilde{H}(\alpha)|)^c$, since for each of the $c$ runs of the independence test, one of the partitions must not be marked. The probability no call outputs 'fail' is at most $e^{-k/18} + 2(1 - \epsilon_2/100|\widetilde{H}(\alpha)|)^{ck/3}$. Our choice of $k > 300|\widetilde{H}(\alpha)|/\epsilon_2$ ensures that $(1/e)^{ck\epsilon_2/300|\widetilde{H}(\alpha)|} \le (1/e)^c$.

Since in our setting $|I_j|$ may depend on $n$, using the monotonicity of variation with the previous claim does not give a useful bound on $\mathrm{Vr}_f(I \setminus i)$. But we see from the proof that if the variation of each partition is not much less than $\mathrm{Vr}_f(I \setminus i)$ and $\mathrm{Vr}_f(I \setminus i) > 2\epsilon_2/|\widetilde{H}(\alpha)|$, then with enough calls to SHIV one of these calls should output "fail." Hence the lemma will be easily proven once we establish the following proposition:

Proposition 2. Suppose that $k$ calls to SHIV are made with a particular subset $I$ having $\mathrm{Vr}_f(I \setminus i) > 2\epsilon_2/|\widetilde{H}(\alpha)|$ and $\mathrm{Vr}_f(j) \le \epsilon_2/100|\widetilde{H}(\alpha)|$ for every $j \in I \setminus i$. Then with probability greater than $1 - \delta^{**} = 1 - e^{-k/18}$, at least $1/3$ of the $k$ calls to SHIV yield both $\mathrm{Vr}_f(I_1) > \eta\,\mathrm{Vr}_f(I \setminus i)/2$ and $\mathrm{Vr}_f(I_0) > \eta\,\mathrm{Vr}_f(I \setminus i)/2$, where $\eta = 1/e - 1/50$.

Proof.
We would like to show that a random partition of $I$ into two parts will result in parts each of which has variation not much less than the variation of $I \setminus i$. Choosing a partition is equivalent to choosing a random subset $I'$ of $I \setminus i$ and including $i$ in $I'$ or $I \setminus I'$ with equal probability. Thus it suffices to show that for random $I' \subseteq I \setminus i$, it is unlikely that $\mathrm{Vr}_f(I')$ is much smaller than $\mathrm{Vr}_f(I \setminus i)$. This does not hold for general $I$, but by bounding the variation of any particular variable in $I$, which we have done in Proposition 1, and computing the unique-variation (a technical tool introduced in [FKR+04]) of $I'$, we may obtain a deviation bound on $\mathrm{Vr}_f(I')$. The following statement follows from Lemma 3.4 of [FKR+04]:

Proposition 3 ([FKR+04]). Define the unique-variation of variable $j$ (with respect to $i$) as $\mathrm{Ur}_f(j) = \mathrm{Vr}_f([j] \setminus i) - \mathrm{Vr}_f([j-1] \setminus i)$. Then for any $I' \subseteq I \setminus i$,
$$\mathrm{Vr}_f(I') \ge \sum_{j \in I'} \mathrm{Ur}_f(j) = \sum_{j \in I'} \left( \mathrm{Vr}_f([j] \setminus i) - \mathrm{Vr}_f([j-1] \setminus i) \right).$$

Now $\mathrm{Vr}_f(I')$ is lower bounded by a sum of independent, non-negative random variables whose expectation is given by
$$\mathbf{E}\Big[\sum_{j \in I'} \mathrm{Ur}_f(j)\Big] = \sum_{j=1}^n \frac{1}{2}\,\mathrm{Ur}_f(j) = \mathrm{Vr}_f(I \setminus i)/2 \stackrel{\text{def}}{=} \mu.$$
To obtain a concentration property, we require a bound on each $\mathrm{Ur}_f(j) \le \mathrm{Vr}_f(j)$, which is precisely what we showed in the previous proposition. Note that $\mathrm{Ur}_f(i) = 0$, and recall that we have assumed that $\mu > \epsilon_2/|\widetilde{H}(\alpha)|$ and every $j \in I \setminus i$ satisfies $\mathrm{Vr}_f(j) < \mu/100$. Now we may use the bound from [FKR+04] in Proposition 3.5 with $\eta = 1/e - 2/100$ to obtain:
$$\Pr\Big[\sum_{j \in I'} \mathrm{Ur}_f(j) < \eta\mu\Big] < \exp\Big(\frac{100}{e}(\eta e - 1)\Big) \le 1/e^2.$$
Thus the probability that one of $I_0$ and $I_1$ has variation less than $\eta\mu$ is at most $1/2$.
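The constant $\eta = 1/e - 1/50$ is chosen so that the exponent in the tail bound above is exactly $-2$: here $\eta e - 1 = -e/50$, so $\frac{100}{e}\cdot(-\frac{e}{50}) = -2$, giving $1/e^2$. A quick numeric check (ours):

```python
import math

eta = 1 / math.e - 1 / 50                       # eta from Proposition 2
exponent = (100 / math.e) * (eta * math.e - 1)  # should equal -2
print(round(exponent, 6))                       # -2.0
```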
We expect that half of the $k$ calls to SHIV will result in $I_0$ and $I_1$ having variation at least $\eta\mu$, so a Chernoff bound completes the proof of the claim with $\delta^{**} \le e^{-k/18}$. This concludes the proof of Proposition 2.

Finally, we proceed to prove the lemma. Suppose that there exists some $I$ such that $\mathrm{Vr}_f(I \setminus i) > 2\epsilon_2/|\widetilde{H}(\alpha)|$. Now the probability that a particular call to SHIV with subset $I$ succeeds is:
$$\Pr[\text{marked}(I_0); \neg\text{marked}(I_1)] + \Pr[\text{marked}(I_1); \neg\text{marked}(I_0)].$$
By Propositions 1 and 2, if with probability at least $\delta^* + \delta^{**}$ none of the $k$ calls to SHIV return fail, then for $k/3$ runs of SHIV both $\mathrm{Vr}_f(I_1)$ and $\mathrm{Vr}_f(I_0)$ are at least $\eta\epsilon_2/|\widetilde{H}(\alpha)| > \epsilon_2/4|\widetilde{H}(\alpha)|$ and thus both probabilities are at most $(1 - \epsilon_2/4|\widetilde{H}(\alpha)|)^c$. As in the analysis of the first proposition, we may conclude that every subset $I$ which is called with SHIV at least $k$ times either satisfies $\mathrm{Vr}_f(I \setminus i) < 2\epsilon_2/|\widetilde{H}(\alpha)|$ or will cause the test to reject with probability at least $1 - \delta^{**} - 2\delta^*$. Recall that $\delta^* = e^{-c} + e^{-k/18}$; since SHIV is set to run with failure probability at most $1/|\widetilde{H}(\alpha)|k$, we have that $\delta_2$ is $1/\Omega(k)$. This concludes the proof of Lemma 11.
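The Chernoff step in Proposition 1 rests only on the fact that two fixed variables fall in different parts of a uniformly random bipartition with probability $1/2$; a Hoeffding bound then gives the $e^{-k/18}$ tail for seeing fewer than $k/3$ such splits among $k$ calls, since $e^{-2(1/6)^2 k} = e^{-k/18}$. A small Monte Carlo sketch of the $1/2$ (our illustration; the set size is arbitrary):

```python
import random

random.seed(0)  # fixed seed so the estimate is reproducible

def different_parts_frequency(trials=100_000, set_size=20):
    """Estimate the probability that two fixed elements of a set fall in
    different halves of a uniformly random bipartition."""
    hits = 0
    for _ in range(trials):
        sides = [random.random() < 0.5 for _ in range(set_size)]
        hits += sides[0] != sides[1]
    return hits / trials

freq = different_parts_frequency()
print(freq)  # empirically close to 0.5
```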