Communication Lower Bounds Using Dual Polynomials
Representations of Boolean functions by real polynomials play an important role in complexity theory. Typically, one is interested in the least degree of a polynomial p(x_1,...,x_n) that approximates or sign-represents a given Boolean function f(x_1,…
Authors: Alexander A. Sherstov
Alexander A. Sherstov
Univ. of Texas at Austin, Dept. of Comp. Sciences, sherstov@cs.utexas.edu
November 9, 2018

Invited survey for The Bulletin of the European Association for Theoretical Computer Science (EATCS).

Abstract

Representations of Boolean functions by real polynomials play an important role in complexity theory. Typically, one is interested in the least degree of a polynomial $p(x_1, \ldots, x_n)$ that approximates or sign-represents a given Boolean function $f(x_1, \ldots, x_n)$. This article surveys a new and growing body of work in communication complexity that centers around the dual objects, i.e., polynomials that certify the difficulty of approximating or sign-representing a given function. We provide a unified guide to the following results, complete with all the key proofs:

• Sherstov's Degree/Discrepancy Theorem, which translates lower bounds on the threshold degree of a Boolean function into upper bounds on the discrepancy of a related function;
• Two different methods for proving lower bounds on bounded-error communication based on the approximate degree: Sherstov's pattern matrix method and Shi and Zhu's block composition method;
• Extension of the pattern matrix method to the multiparty model, obtained by Lee and Shraibman and by Chattopadhyay and Ada, and the resulting improved lower bounds for disjointness;
• David and Pitassi's separation of NP and BPP in multiparty communication complexity for $k \leq (1 - \epsilon) \log n$ players.

1 Introduction

Representations of Boolean functions by real polynomials are of considerable importance in complexity theory. The ease or difficulty of representing a given Boolean function by polynomials from a given set often yields valuable insights into the structural complexity of that function.
We focus on two concrete representation schemes that involve polynomials. The first of these corresponds to threshold computation. For a Boolean function $f : \{0,1\}^n \to \{0,1\}$, its threshold degree $\deg_\pm(f)$ is the minimum degree of a polynomial $p(x_1, \ldots, x_n)$ such that $p(x)$ is positive if $f(x) = 1$ and negative otherwise. In other words, the threshold degree of $f$ is the least degree of a polynomial that represents $f$ in sign. Several authors have analyzed the threshold degree of common Boolean functions [MP88, BRS95, OS03]. The results of these investigations have found numerous applications to circuit complexity [ABFR94, BRS95, KP97, KP98] and computational learning theory [KS04, KOS04, KS07b].

The other representation scheme that we consider is approximation in the uniform norm. For a Boolean function $f : \{0,1\}^n \to \{0,1\}$ and a constant $\epsilon \in (0, 1/2)$, the $\epsilon$-approximate degree of $f$ is the least degree of a polynomial $p(x_1, \ldots, x_n)$ with $|f(x) - p(x)| \leq \epsilon$ for all $x \in \{0,1\}^n$. Note that this representation is strictly stronger than the first: no longer are we content with representing $f$ in sign, but rather we wish to closely approximate $f$ on every input. There is a considerable literature on the approximate degree of specific Boolean functions [NS92, Pat92, KLS96, BCWZ99, AS04, She08, Wol08]. This classical notion has been crucial to progress on a variety of questions, including quantum query complexity [BCWZ99, BBC+01, AS04], communication complexity [BW01, Raz03, BVW07] and computational learning theory [TT99, KS04, KKMS05, KS07a].

The approximate degree and threshold degree can be conveniently analyzed by means of a linear program.
In particular, whenever a given function $f$ cannot be approximated or sign-represented by polynomials of low degree, linear-programming duality implies the existence of a certain dual object to witness that fact. This dual object, which is a real function or a probability distribution, reveals useful new information about the structural complexity of $f$. The purpose of this article is to survey a very recent and growing body of work in communication complexity that revolves around the dual formulations of the approximate degree and threshold degree. Our ambition here is to provide a unified view of these diverse results, complete with all the key proofs, and thereby to encourage further inquiry into the potential of the dual approach. In the remainder of this section, we give an intuitive overview of our survey.

Degree/Discrepancy Theorem. The first result that we survey, in Section 3, is the author's Degree/Discrepancy Theorem [She07a]. This theorem and its proof technique are the foundation for much of the subsequent work surveyed in this article [She07b, Cha07, LS07, CA08, DP08]. Fix a Boolean function $f : \{0,1\}^n \to \{0,1\}$ and let $N$ be a given integer, $N \geq n$. In [She07a], we introduced the two-party communication problem of computing $f(x|_V)$, where the Boolean string $x \in \{0,1\}^N$ is Alice's input and the set $V \subset \{1, 2, \ldots, N\}$ of size $|V| = n$ is Bob's input. The symbol $x|_V$ stands for the projection of $x$ onto the indices in $V$, in other words, $x|_V = (x_{i_1}, x_{i_2}, \ldots, x_{i_n}) \in \{0,1\}^n$, where $i_1 < i_2 < \cdots < i_n$ are the elements of $V$. Intuitively, this problem models a situation when Alice and Bob's joint computation depends on only $n$ of the inputs $x_1, x_2, \ldots, x_N$. Alice knows the values of all the inputs $x_1, x_2, \ldots, x_N$ but does not know which $n$ of them are relevant.
Bob, on the other hand, knows which $n$ inputs are relevant but does not know their values.

We proved in [She07a] that the threshold degree $d$ of $f$ is a lower bound on the communication requirements of this problem. More precisely, the Degree/Discrepancy Theorem shows that this communication problem has discrepancy $\exp(-\Omega(d))$ as soon as $N \geq 11 n^2 / d$. This exponentially small discrepancy immediately gives an $\Omega(d)$ lower bound on communication in a variety of models (deterministic, nondeterministic, randomized, quantum with and without entanglement). Moreover, the resulting lower bounds on communication hold even if the desired error probability is vanishingly close to $1/2$.

The proof of the Degree/Discrepancy Theorem introduces a novel technique based on the dual formulation of the threshold degree. In fact, it appears to be the first use of the threshold degree (in its primal or dual form) to prove communication lower bounds. As an application, we exhibit in [She07a] the first $\mathrm{AC}^0$ circuit with exponentially small discrepancy, thereby separating $\mathrm{AC}^0$ from depth-2 majority circuits and solving an open problem of Krause and Pudlák [KP97, §6]. Independently of the author, Buhrman et al. [BVW07] exhibited another $\mathrm{AC}^0$ function with exponentially small discrepancy, using much different techniques.

Bounded-Error Communication. Next, we present two recent results on bounded-error communication complexity, due to Sherstov [She07b] and Shi and Zhu [SZ07]. These papers use the notion of approximate degree to contribute strong lower bounds for rather broad classes of functions, subsuming Razborov's breakthrough work on symmetric predicates [Raz03]. The lower bounds are valid not only in the randomized model, but also in the quantum model with and without prior entanglement.
The setting in which to view these two works is the generalized discrepancy method, a simple but very useful principle introduced by Klauck [Kla01] and reformulated in its current form by Razborov [Raz03]. Let $f(x, y)$ be a Boolean function whose quantum communication complexity is of interest. The method asks for a Boolean function $h(x, y)$ and a distribution $\mu$ on $(x, y)$-pairs such that:

(1) the functions $f$ and $h$ are highly correlated under $\mu$; and
(2) all low-cost protocols have negligible advantage in computing $h$ under $\mu$.

If such $h$ and $\mu$ indeed exist, it follows that no low-cost protocol can compute $f$ to high accuracy (or else it would be a good predictor for the hard function $h$ as well!). This method is in no way restricted to the quantum model but, rather, applies to any model of communication [She07b, §2.4]. The importance of the generalized discrepancy method is that it makes it possible, in theory, to prove lower bounds for functions such as disjointness, to which the traditional discrepancy method does not apply. In Section 4, we provide detailed historical background on the generalized discrepancy method and compile its quantitative versions for several models.

The hard part, of course, is finding $h$ and $\mu$. Except in rather restricted cases [Kla01, Thm. 4], it was not known how to do it. As a result, the generalized discrepancy method was of limited practical use. This difficulty was overcome independently by Sherstov [She07b] and Shi and Zhu [SZ07], who used the dual characterization of the approximate degree to obtain $h$ and $\mu$ for a broad range of problems. To our knowledge, the work in [She07b] and [SZ07] is the first use of the dual characterization of the approximate degree to prove communication lower bounds. The specifics of these two works are very different.
The construction of $h$ and $\mu$ in [She07b], which we called the pattern matrix method for lower bounds on bounded-error communication, is built around a new matrix-analytic technique (the pattern matrix) inspired by the author's Degree/Discrepancy Theorem. The construction in [SZ07], the block-composition method, is based on the idea of hardness amplification by composition. These two methods exhibit quite different behavior, e.g., the pattern matrix method further extends to the multiparty model. We present the two methods individually in Sections 5.1 and 5.2 and provide a detailed comparison of their strength and applicability in Section 5.3.

Extensions to the Multiparty Model. Both the Degree/Discrepancy Theorem [She07a] and the pattern matrix method [She07b] generalize to the multiparty number-on-the-forehead model. In the case of [She07a], this extension was formalized by Chattopadhyay [Cha07]. As before, let $f : \{0,1\}^n \to \{0,1\}$ be a given function. Recall that in the two-party case, there was a Boolean string $x \in \{0,1\}^N$ and a single set $V \subset \{1, 2, \ldots, N\}$. The $k$-party communication problem features a Boolean string $x \in \{0,1\}^{N^{k-1}}$ and sets $V_1, \ldots, V_{k-1} \subset \{1, 2, \ldots, N\}$. The $k$ inputs $x, V_1, \ldots, V_{k-1}$ are distributed among the $k$ parties as usual. The goal is to compute
\[ f(x|_{V_1, \ldots, V_{k-1}}) \stackrel{\mathrm{def}}{=} f\left( x_{i^1_1, \ldots, i^{k-1}_1}, \; \ldots, \; x_{i^1_n, \ldots, i^{k-1}_n} \right), \tag{1.1} \]
where $i^j_1 < i^j_2 < \cdots < i^j_n$ are the elements of $V_j$ (for $j = 1, 2, \ldots, k-1$). This way, again no party knows at once the Boolean string $x$ and the relevant bits in it. With this setup in place, it becomes relatively straightforward to bound the discrepancy by traversing the same line of reasoning as in [She07a].
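To make the projection operator concrete, the following Python sketch implements $x|_V$ and its multiparty analogue from (1.1). The names `project` and `project_multi`, the 1-based index convention, and the encoding of the $k$-party string $x$ as a dictionary keyed by $(k-1)$-tuples are assumptions made for this illustration; they are not notation from the surveyed papers.

```python
def project(x, V):
    """Two-party projection x|_V: the bits of x at the indices in V,
    listed in increasing index order (indices are 1-based)."""
    return tuple(x[i - 1] for i in sorted(V))


def project_multi(x, Vs):
    """Multiparty projection x|_{V_1,...,V_{k-1}} as in (1.1).

    x  : dict mapping (k-1)-tuples of 1-based indices to bits
    Vs : list of k-1 index sets, all of the same size n
    The t-th output bit is x[(i^1_t, ..., i^{k-1}_t)], where i^j_t is
    the t-th smallest element of V_j.
    """
    sorted_sets = [sorted(V) for V in Vs]
    n = len(sorted_sets[0])
    return tuple(x[tuple(S[t] for S in sorted_sets)] for t in range(n))


# Two-party example with N = 5 and n = 3.
alice_x = (1, 0, 1, 1, 0)
bob_V = {2, 4, 5}
two_party = project(alice_x, bob_V)

# Three-party example with N = 3: x has N^(k-1) = 9 bits.
N = 3
x3 = {(i, j): (i + j) % 2 for i in range(1, N + 1) for j in range(1, N + 1)}
three_party = project_multi(x3, [{1, 3}, {2, 3}])
```

With a single index set, `project_multi` reduces to `project` once $x$ is keyed by 1-tuples, which mirrors how the $k$-party problem specializes to the two-party one.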
The extension of the pattern matrix method [She07b] to the multiparty model uses a similar setup and was done by Lee and Shraibman [LS07] and independently by Chattopadhyay and Ada [CA08]. We present the proofs of these extensions in Section 6, placing them in close correspondence with the two-party case. These extensions do not subsume the two-party results, however (see Section 6 for details).

The authors of [LS07] and [CA08] gave important applications of their work to the $k$-party randomized communication complexity of disjointness, improving it from $\Omega(\frac{1}{k} \log n)$ to $n^{\Omega(1/k)} 2^{-O(2^k)}$. As a corollary, they separated the multiparty communication classes $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$ for $k = (1 - o(1)) \log_2 \log_2 n$ parties. They also obtained new results for Lovász-Schrijver proof systems, in light of the work due to Beame, Pitassi, and Segerlind [BPS07].

Separation of $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$. The separation of the classes $\mathrm{NP}^{cc}_k$ and $\mathrm{BPP}^{cc}_k$ in [LS07, CA08] for $k = (1 - o(1)) \log_2 \log_2 n$ parties was followed by another exciting development, due to David and Pitassi [DP08], who separated these classes for $k \leq (1 - \epsilon) \log_2 n$ parties. Here $\epsilon > 0$ is an arbitrary constant. Since the current barrier for explicit lower bounds on multiparty communication complexity is precisely $k = \log_2 n$, David and Pitassi's separation matches the state of the art. We present this work in Section 7. The powerful idea in this result was to redefine the projection operator $x|_{V_1, \ldots, V_{k-1}}$ in (1.1). Specifically, David and Pitassi observed that it suffices to define the projection operator at random, using the probabilistic method. This insight removed the key technical obstacle present in [LS07, CA08]. In a follow-up work by David, Pitassi, and Viola [DPV08], the probabilistic construction was derandomized to yield an explicit separation.

Other Related Work.
For completeness, we will mention several duality-based results in communication complexity that fall outside the scope of this survey. Recent work has seen other applications of dual polynomials [She07c, RS08], which are considerably more complicated and no longer correspond to the approximate degree or threshold degree. More broadly, several recent results feature other forms of duality [LS07b, LSŠ08], such as the duality of norms or semidefinite programming duality.

2 Preliminaries

This section reviews our notation and provides relevant technical background.

2.1 General Background

A Boolean function is a mapping $X \to \{0,1\}$, where $X$ is a finite set such as $X = \{0,1\}^n$ or $X = \{0,1\}^n \times \{0,1\}^n$. The notation $[n]$ stands for the set $\{1, 2, \ldots, n\}$. For integers $N, n$ with $N \geq n$, the symbol $\binom{[N]}{n}$ denotes the family of all size-$n$ subsets of $\{1, 2, \ldots, N\}$. For $x \in \{0,1\}^n$, we write $|x| = x_1 + \cdots + x_n$. For $x, y \in \{0,1\}^n$, the notation $x \wedge y$ refers as usual to the component-wise AND of $x$ and $y$. In particular, $|x \wedge y|$ stands for the number of positions where $x$ and $y$ both have a $1$. Throughout this manuscript, "log" refers to the logarithm to base 2.

For tensors $A, B : X_1 \times \cdots \times X_k \to \mathbb{R}$ (where $X_i$ is a finite set, $i = 1, 2, \ldots, k$), define
\[ \langle A, B \rangle = \sum_{(x_1, \ldots, x_k) \in X_1 \times \cdots \times X_k} A(x_1, \ldots, x_k) \, B(x_1, \ldots, x_k). \]
When $A$ and $B$ are vectors or matrices, this is the standard definition of inner product. The Hadamard product of $A$ and $B$ is the tensor $A \circ B : X_1 \times \cdots \times X_k \to \mathbb{R}$ given by $(A \circ B)(x_1, \ldots, x_k) = A(x_1, \ldots, x_k) \, B(x_1, \ldots, x_k)$.

The symbol $\mathbb{R}^{m \times n}$ refers to the family of all $m \times n$ matrices with real entries. The $(i, j)$th entry of a matrix $A$ is denoted by $A_{ij}$.
We frequently use "generic-entry" notation to specify a matrix succinctly: we write $A = [F(i, j)]_{i,j}$ to mean that the $(i, j)$th entry of $A$ is given by the expression $F(i, j)$. In most matrices that arise in this work, the exact ordering of the columns (and rows) is irrelevant. In such cases we describe a matrix by the notation $[F(i, j)]_{i \in I, j \in J}$, where $I$ and $J$ are some index sets.

Let $A \in \mathbb{R}^{m \times n}$. We use the following standard notation: $\|A\|_\infty = \max_{i,j} |A_{ij}|$ and $\|A\|_1 = \sum_{i,j} |A_{ij}|$. We denote the singular values of $A$ by $\sigma_1(A) \geq \sigma_2(A) \geq \cdots \geq \sigma_{\min\{m,n\}}(A) \geq 0$. Recall that the spectral norm of $A$ is given by
\[ \|A\| = \max_{x \in \mathbb{R}^n, \; \|x\| = 1} \|Ax\| = \sigma_1(A). \]
An excellent reference on matrix analysis is [HJ86].

We conclude with a review of the Fourier transform over $\mathbb{Z}_2^n$. Consider the vector space of functions $\{0,1\}^n \to \mathbb{R}$, equipped with the inner product
\[ \langle f, g \rangle = 2^{-n} \sum_{x \in \{0,1\}^n} f(x) g(x). \]
For $S \subseteq [n]$, define $\chi_S : \{0,1\}^n \to \{-1, +1\}$ by $\chi_S(x) = (-1)^{\sum_{i \in S} x_i}$. Then $\{\chi_S\}_{S \subseteq [n]}$ is an orthonormal basis for the inner product space in question. As a result, every function $f : \{0,1\}^n \to \mathbb{R}$ has a unique representation of the form
\[ f(x) = \sum_{S \subseteq [n]} \hat{f}(S) \, \chi_S(x), \]
where $\hat{f}(S) = \langle f, \chi_S \rangle$. The reals $\hat{f}(S)$ are called the Fourier coefficients of $f$. The following fact is immediate from the definition of $\hat{f}(S)$:

Proposition 2.1. Fix $f : \{0,1\}^n \to \mathbb{R}$. Then
\[ \max_{S \subseteq [n]} |\hat{f}(S)| \leq 2^{-n} \sum_{x \in \{0,1\}^n} |f(x)|. \]

2.2 Communication Complexity

This survey features several standard models of communication. In the case of two communicating parties, one considers a function $f : X \times Y \to \{0,1\}$, where $X$ and $Y$ are some finite sets. Alice receives an input $x \in X$, Bob receives $y \in Y$, and their objective is to predict $f(x, y)$ with good accuracy.
To this end, Alice and Bob share a communication channel (classical or quantum, depending on the model). Alice and Bob's communication protocol is said to have error $\epsilon$ if it outputs the correct answer $f(x, y)$ with probability at least $1 - \epsilon$ on every input. The cost of a given protocol is the maximum number of bits exchanged on any input. The two-party models of interest to us are the randomized model, the quantum model without prior entanglement, and the quantum model with prior entanglement. The least cost of an $\epsilon$-error protocol for $f$ in these models is denoted by $R_\epsilon(f)$, $Q_\epsilon(f)$, and $Q^*_\epsilon(f)$, respectively. It is standard practice to omit the subscript $\epsilon$ when the error parameter is $\epsilon = 1/3$. Recall that the error probability of a protocol can be decreased from $1/3$ to any other constant $\epsilon > 0$ at the expense of increasing the communication cost by a constant factor; we will use this fact in many proofs of this survey, often without explicitly mentioning it. Excellent references on these communication models are [KN97] and [Wol01].

A generalization of two-party communication is number-on-the-forehead multiparty communication. Here one considers a function $f : X_1 \times \cdots \times X_k \to \{0,1\}$ for some finite sets $X_1, \ldots, X_k$. There are $k$ players. A given input $(x_1, \ldots, x_k) \in X_1 \times \cdots \times X_k$ is distributed among the players by placing $x_i$ on the forehead of player $i$ (for $i = 1, \ldots, k$). In other words, player $i$ knows $x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k$ but not $x_i$. The players can communicate by writing bits on a shared blackboard, visible to all. They additionally have access to a shared source of random bits. Their goal is to devise a communication protocol that will allow them to accurately predict the value of $f$ on every input.
Analogous to the two-party case, the randomized communication complexity $R^k_\epsilon(f)$ is the least cost of an $\epsilon$-error communication protocol for $f$ in this model. The final section of this paper also considers the nondeterministic communication complexity $N^k(f)$, which is the minimum cost of a protocol for $f$ that always outputs the correct answer on the inputs $f^{-1}(0)$ and has error probability less than $1$ on each of the inputs $f^{-1}(1)$. Analogous to computational complexity, $\mathrm{BPP}^{cc}_k$ (respectively, $\mathrm{NP}^{cc}_k$) is the class of functions $f : (\{0,1\}^n)^k \to \{0,1\}$ with $R^k(f) \leq (\log n)^{O(1)}$ (respectively, $N^k(f) \leq (\log n)^{O(1)}$). See [KN97] for further details.

A crucial tool for proving communication lower bounds is the discrepancy method. Given a function $f : X \times Y \to \{0,1\}$ and a distribution $\mu$ on $X \times Y$, the discrepancy of $f$ with respect to $\mu$ is defined as
\[ \operatorname{disc}_\mu(f) = \max_{S \subseteq X, \; T \subseteq Y} \left| \sum_{x \in S} \sum_{y \in T} (-1)^{f(x,y)} \mu(x, y) \right|. \]
This definition generalizes to the multiparty case as follows. Fix $f : X_1 \times \cdots \times X_k \to \{0,1\}$ and a distribution $\mu$ on $X_1 \times \cdots \times X_k$. The discrepancy of $f$ with respect to $\mu$ is defined as
\[ \operatorname{disc}_\mu(f) = \max_{\phi_1, \ldots, \phi_k} \left| \sum_{(x_1, \ldots, x_k) \in X_1 \times \cdots \times X_k} \psi(x_1, \ldots, x_k) \prod_{i=1}^{k} \phi_i(x_1, \ldots, x_{i-1}, x_{i+1}, \ldots, x_k) \right|, \]
where $\psi(x_1, \ldots, x_k) = (-1)^{f(x_1, \ldots, x_k)} \mu(x_1, \ldots, x_k)$ and the maximum ranges over all functions $\phi_i : X_1 \times \cdots \times X_{i-1} \times X_{i+1} \times \cdots \times X_k \to \{0,1\}$, for $i = 1, 2, \ldots, k$. Note that for $k = 2$, this definition is identical to the one given previously for the two-party model. We put $\operatorname{disc}(f) = \min_\mu \operatorname{disc}_\mu(f)$. We identify a function $f : X_1 \times \cdots \times X_k \to \{0,1\}$ with its communication tensor $M(x_1, \ldots, x_k) = (-1)^{f(x_1, \ldots, x_k)}$ and speak of the discrepancy of $M$ and $f$ interchangeably (and likewise for other complexity measures, such as $R^k(f)$).

Discrepancy is difficult to analyze as defined. Typically, one uses the following well-known estimate, derived by repeated applications of the Cauchy-Schwarz inequality.

Theorem 2.2 ([BNS92, CT93, Raz00]). Fix $f : X_1 \times \cdots \times X_k \to \{0,1\}$ and a distribution $\mu$ on $X_1 \times \cdots \times X_k$. Put $\psi(x_1, \ldots, x_k) = (-1)^{f(x_1, \ldots, x_k)} \mu(x_1, \ldots, x_k)$. Then
\[ \left( \frac{\operatorname{disc}_\mu(f)}{|X_1| \cdots |X_k|} \right)^{2^{k-1}} \leq \mathop{\mathbf{E}}_{\substack{x_1^0 \in X_1 \\ x_1^1 \in X_1}} \cdots \mathop{\mathbf{E}}_{\substack{x_{k-1}^0 \in X_{k-1} \\ x_{k-1}^1 \in X_{k-1}}} \left| \mathop{\mathbf{E}}_{x_k \in X_k} \prod_{z \in \{0,1\}^{k-1}} \psi\left(x_1^{z_1}, \ldots, x_{k-1}^{z_{k-1}}, x_k\right) \right|. \]

In the case of $k = 2$ parties, there are other ways to estimate the discrepancy, e.g., using the spectral norm of a matrix.

For a function $f : X_1 \times \cdots \times X_k \to \{0,1\}$ and a distribution $\mu$ over $X_1 \times \cdots \times X_k$, let $D^{k,\mu}_\epsilon(f)$ denote the least cost of a deterministic protocol for $f$ whose probability of error with respect to $\mu$ is at most $\epsilon$. This quantity is known as the $\mu$-distributional complexity of $f$. Since a randomized protocol can be viewed as a probability distribution over deterministic protocols, we immediately have that $R^k_\epsilon(f) \geq \max_\mu D^{k,\mu}_\epsilon(f)$. We are now ready to state the discrepancy method.

Theorem 2.3 (Discrepancy method; see [KN97]). For every $f : X_1 \times \cdots \times X_k \to \{0,1\}$, every distribution $\mu$ on $X_1 \times \cdots \times X_k$, and every $\gamma \in (0, 1]$,
\[ R^k_{1/2 - \gamma/2}(f) \geq D^{k,\mu}_{1/2 - \gamma/2}(f) \geq \log_2 \frac{\gamma}{\operatorname{disc}_\mu(f)}. \]

In other words, a function with small discrepancy is hard to compute to any nontrivial advantage over random guessing (let alone compute it to high accuracy). In the case of $k = 2$ parties, discrepancy yields analogous lower bounds even in the quantum model, regardless of prior entanglement [Kre95, Kla01, LS07b].
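For very small instances, the two-party discrepancy and the bound of Theorem 2.3 can be evaluated directly. The sketch below is a brute-force illustration only: the function names `discrepancy` and `discrepancy_lower_bound` are hypothetical, and the choice of the inner product function on 2-bit inputs under the uniform distribution is an assumption made for the example, not a case treated in this section. For a fixed column set $T$, the best row set $S$ collects either all rows with positive partial sum or all rows with negative partial sum, so only the subsets $T$ need to be enumerated.

```python
from itertools import product
from math import log2


def discrepancy(f, mu, X, Y):
    """Exact disc_mu(f) for a two-party function, by enumerating column sets T."""
    best = 0.0
    for mask in product((False, True), repeat=len(Y)):
        T = [y for y, keep in zip(Y, mask) if keep]
        # Partial sum of (-1)^f(x,y) * mu(x,y) over y in T, for each row x.
        row_sums = [sum((-1) ** f(x, y) * mu(x, y) for y in T) for x in X]
        positive = sum(r for r in row_sums if r > 0)
        negative = -sum(r for r in row_sums if r < 0)
        best = max(best, positive, negative)
    return best


def discrepancy_lower_bound(gamma, disc):
    """Right-hand side of Theorem 2.3: log2(gamma / disc_mu(f))."""
    return log2(gamma / disc)


# Example (an assumption for illustration): inner product mod 2 on 2-bit inputs.
X = Y = list(product((0, 1), repeat=2))


def inner_product(x, y):
    return sum(a * b for a, b in zip(x, y)) % 2


def uniform(x, y):
    return 1.0 / (len(X) * len(Y))


disc_ip = discrepancy(inner_product, uniform, X, Y)
bound = discrepancy_lower_bound(1.0, disc_ip)
```

The enumeration takes time exponential in $|Y|$, which is why estimates such as Theorem 2.2 are used in place of the definition for instances of any real size.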
3 The Degree/Discrepancy Theorem

This section presents the author's Degree/Discrepancy Theorem, whose proof technique is the foundation for much of the subsequent work surveyed in this article [She07b, Cha07, LS07, CA08, DP08].

The original motivation behind this result came from circuit complexity. A natural and well-studied computational model is that of a polynomial-size circuit of majority gates. Research has shown that majority circuits of depth 2 and 3 already possess surprising computational power. Indeed, it is a long-standing open problem [KP97] to exhibit a Boolean function that cannot be computed by a depth-3 majority circuit of polynomial size.

Another extensively studied model is that of polynomial-size constant-depth circuits with AND, OR, NOT gates, denoted by $\mathrm{AC}^0$. Allender's classic result [All89] states that every function in $\mathrm{AC}^0$ can be computed by a depth-3 majority circuit of quasipolynomial size. Krause and Pudlák [KP97, §6] ask whether this simulation can be improved, i.e., whether every function in $\mathrm{AC}^0$ can be computed by a depth-2 majority circuit of quasipolynomial size. We recently gave a strong negative answer to this question:

Theorem 3.1 ([She07a]). There is a function $F : \{0,1\}^n \to \{0,1\}$, explicitly given and computable by an $\mathrm{AC}^0$ circuit of depth 3, whose computation requires a majority vote of $\exp(\Omega(n^{1/5}))$ threshold gates.

We proved Theorem 3.1 by exhibiting an $\mathrm{AC}^0$ function with exponentially small discrepancy. All previously known functions with exponentially small discrepancy (e.g., [GHR92, Nis93]) contained parity or majority as a subfunction and therefore could not be computed in $\mathrm{AC}^0$. Buhrman et al.
[BVW07] obtained, independently of the author and with much different techniques, another $\mathrm{AC}^0$ function with exponentially small discrepancy, thereby also answering Krause and Pudlák's question.

3.1 Bounding the Discrepancy via the Threshold Degree

To construct an $\mathrm{AC}^0$ function with small discrepancy, we developed in [She07a] a novel technique for generating low-discrepancy functions, which we now describe. This technique is not specialized in any way to $\mathrm{AC}^0$ but, rather, is based on the abstract notion of threshold degree. For a Boolean function $f : \{0,1\}^n \to \{0,1\}$, recall from Section 1 that its threshold degree $\deg_\pm(f)$ is the minimum degree of a polynomial $p(x_1, \ldots, x_n)$ with
\[ p(x) > 0 \Leftrightarrow f(x) = 1 \quad \text{and} \quad p(x) < 0 \Leftrightarrow f(x) = 0. \]
In many cases [MP88], it is straightforward to obtain strong lower bounds on the threshold degree. Since the threshold degree is a measure of the complexity of a given Boolean function, it is natural to wonder whether it can yield lower bounds on communication in a suitable setting. As we prove in [She07a], this intuition turns out to be correct for every $f$.

More precisely, fix a Boolean function $f : \{0,1\}^n \to \{0,1\}$ with threshold degree $d$. Let $N$ be a given integer, $N \geq n$. In [She07a], we introduced the two-party communication problem of computing $f(x|_V)$, where the Boolean string $x \in \{0,1\}^N$ is Alice's input and the set $V \subset \{1, 2, \ldots, N\}$ of size $|V| = n$ is Bob's input. The symbol $x|_V$ stands for the projection of $x$ onto the indices in $V$, in other words, $x|_V = (x_{i_1}, x_{i_2}, \ldots, x_{i_n}) \in \{0,1\}^n$, where $i_1 < i_2 < \cdots < i_n$ are the elements of $V$. Intuitively, this problem models a situation when Alice and Bob's joint computation depends on only $n$ of the inputs $x_1, x_2, \ldots, x_N$. Alice knows the values of all the inputs $x_1, x_2, \ldots, x_N$ but does not know which $n$ of them are relevant. Bob, on the other hand, knows which $n$ inputs are relevant but does not know their values. As one would hope, it turns out that $d$ is a lower bound on the communication requirements of this problem:

Theorem 3.2 (Degree/Discrepancy Theorem [She07a]). Let $f : \{0,1\}^n \to \{0,1\}$ be given with threshold degree $d \geq 1$. Let $N$ be a given integer, $N \geq n$. Define $F = [f(x|_V)]_{x,V}$, where the rows are indexed by $x \in \{0,1\}^N$ and columns by $V \in \binom{[N]}{n}$. Then
\[ \operatorname{disc}(F) \leq \left( \frac{4 e n^2}{N d} \right)^{d/2}. \]

To our knowledge, Theorem 3.2 is the first use of the threshold degree to prove communication lower bounds. Given a function $f$ with threshold degree $d$, Theorem 3.2 generates a communication problem with discrepancy at most $2^{-d}$ (by setting $N \geq 16 e n^2 / d$). This exponentially small discrepancy immediately gives an $\Omega(d)$ lower bound on communication in a variety of models (deterministic, nondeterministic, randomized, quantum with and without entanglement; see Section 2.2). Moreover, the resulting lower bounds on communication remain valid when Alice and Bob merely seek to predict the answer with nonnegligible advantage, a critical aspect for lower bounds against threshold circuits.

We will give a detailed proof of the Degree/Discrepancy Theorem in the next subsection. For now we will briefly sketch how we used it in [She07a] to prove the main result of that paper, Theorem 3.1 above, on the existence of an $\mathrm{AC}^0$ function that requires a depth-2 majority circuit of exponential size. Consider the function
\[ f(x) = \bigvee_{i=1}^{m} \bigwedge_{j=1}^{4m^2} x_{ij}, \]
for which Minsky and Papert [MP88] showed that $\deg_\pm(f) = m$. Since $f$ has high threshold degree, an application of Theorem 3.2 to $f$ yields a communication problem with low discrepancy. This communication problem itself can be viewed as an $\mathrm{AC}^0$ circuit of depth 3.
Recalling that its discrepancy is exponentially small, we conclude that it cannot be computed by a depth-2 majority circuit of subexponential size.

3.2 Proof of the Degree/Discrepancy Theorem

A key ingredient in our proof is the following dual characterization of the threshold degree, which is a classical result known in greater generality as Gordan's Transposition Theorem [Sch98, §7.8]:

Theorem 3.3. Let $f : \{0,1\}^n \to \{0,1\}$ be arbitrary, $d$ a nonnegative integer. Then exactly one of the following holds: (1) $f$ has threshold degree at most $d$; (2) there is a distribution $\mu$ over $\{0,1\}^n$ such that $\mathbf{E}_{x \sim \mu}[(-1)^{f(x)} \chi_S(x)] = 0$ for $|S| = 0, 1, \ldots, d$.

Theorem 3.3 follows from linear-programming duality. We will also make the following simple observation.

Observation 3.4. Let $\kappa(x)$ be a probability distribution on $\{0,1\}^r$. Fix $i_1, \ldots, i_r \in \{1, 2, \ldots, r\}$. Then
\[ \sum_{x \in \{0,1\}^r} \kappa(x_{i_1}, \ldots, x_{i_r}) \leq 2^{r - |\{i_1, \ldots, i_r\}|}, \]
where $|\{i_1, \ldots, i_r\}|$ denotes the number of distinct integers among $i_1, \ldots, i_r$.

We are now ready for the proof of the Degree/Discrepancy Theorem.

Theorem 3.2 (restated from Section 3.1). Let $f : \{0,1\}^n \to \{0,1\}$ be given with threshold degree $d \geq 1$. Let $N$ be a given integer, $N \geq n$. Define $F = [f(x|_V)]_{x,V}$, where the rows are indexed by $x \in \{0,1\}^N$ and columns by $V \in \binom{[N]}{n}$. Then
\[ \operatorname{disc}(F) \leq \left( \frac{4 e n^2}{N d} \right)^{d/2}. \]

Proof [She07a]. Let $\mu$ be a probability distribution over $\{0,1\}^n$ with respect to which $\mathbf{E}_{z \sim \mu}[(-1)^{f(z)} p(z)] = 0$ for every real-valued function $p$ of $d - 1$ or fewer of the variables $z_1, \ldots, z_n$. The existence of $\mu$ is assured by Theorem 3.3. We will analyze the discrepancy of $F$ with respect to the distribution
\[ \lambda(x, V) = 2^{-N+n} \binom{N}{n}^{-1} \mu(x|_V). \]
Define $\psi : \{0,1\}^n \to \mathbb{R}$ by $\psi(z) = (-1)^{f(z)} \mu(z)$.
By Theorem 2.2,
\[ \operatorname{disc}_\lambda(F)^2 \leq 4^n \mathop{\mathbf{E}}_{V, W} |\Gamma(V, W)|, \tag{3.1} \]
where we put $\Gamma(V, W) = \mathbf{E}_x[\psi(x|_V) \, \psi(x|_W)]$. To analyze this expression, we prove two key claims.

Claim 3.5. Assume that $|V \cap W| \leq d - 1$. Then $\Gamma(V, W) = 0$.

Proof. The claim is immediate from the fact that the Fourier transform of $\psi$ is supported on characters of order $d$ and higher. For completeness, we will now give a more detailed and elementary explanation. Assume for notational convenience that $V = \{1, 2, \ldots, n\}$. Then
\[
\begin{aligned}
\Gamma(V, W) &= \mathop{\mathbf{E}}_{x}\left[ \mu(x_1, \ldots, x_n) (-1)^{f(x_1, \ldots, x_n)} \psi(x|_W) \right] \\
&= \frac{1}{2^N} \sum_{x_1, \ldots, x_n} \mu(x_1, \ldots, x_n) (-1)^{f(x_1, \ldots, x_n)} \sum_{x_{n+1}, \ldots, x_N} \psi(x|_W) \\
&= \frac{1}{2^N} \mathop{\mathbf{E}}_{(x_1, \ldots, x_n) \sim \mu} \Big[ (-1)^{f(x_1, \ldots, x_n)} \cdot \underbrace{\sum_{x_{n+1}, \ldots, x_N} \psi(x|_W)}_{*} \Big].
\end{aligned}
\]
Since $|V \cap W| \leq d - 1$, the starred expression is a real-valued function of at most $d - 1$ of the variables $x_1, \ldots, x_n$. The claim follows by the definition of $\mu$.

Claim 3.6. Assume that $|V \cap W| = i$. Then $|\Gamma(V, W)| \leq 2^{i - 2n}$.

Proof. The claim is immediate from Observation 3.4. For completeness, we will give a more detailed explanation. For notational convenience, assume that
\[ V = \{1, 2, \ldots, n\}, \qquad W = \{1, 2, \ldots, i\} \cup \{n+1, n+2, \ldots, n + (n - i)\}. \]
We have:
\[
\begin{aligned}
|\Gamma(V, W)| &\leq \mathop{\mathbf{E}}_{x}\left[ |\psi(x|_V) \, \psi(x|_W)| \right] \\
&= \mathop{\mathbf{E}}_{x_1, \ldots, x_{2n-i}} \left[ \mu(x_1, \ldots, x_n) \, \mu(x_1, \ldots, x_i, x_{n+1}, \ldots, x_{2n-i}) \right] \\
&\leq \underbrace{\mathop{\mathbf{E}}_{x_1, \ldots, x_n} [\mu(x_1, \ldots, x_n)]}_{= 2^{-n}} \cdot \underbrace{\max_{x_1, \ldots, x_i} \mathop{\mathbf{E}}_{x_{n+1}, \ldots, x_{2n-i}} [\mu(x_1, \ldots, x_i, x_{n+1}, \ldots, x_{2n-i})]}_{\leq 2^{-(n-i)}}.
\end{aligned}
\]
The bounds $2^{-n}$ and $2^{-(n-i)}$ follow because $\mu$ is a probability distribution.

In view of Claims 3.5 and 3.6, inequality (3.1) simplifies to
\[ \operatorname{disc}_\lambda(F)^2 \leq \sum_{i=d}^{n} 2^i \, \mathbf{P}[|V \cap W| = i], \]
since every term with $|V \cap W| \leq d - 1$ vanishes and $4^n \cdot 2^{i - 2n} = 2^i$. This completes the proof of Theorem 3.2 after some routine calculations.
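The two claims and the final counting step can be checked numerically on a toy instance. The sketch below is an illustration under stated assumptions, not part of the proof in [She07a]: it takes $f$ to be parity on $n = 2$ bits (threshold degree $d = n$) with the uniform distribution as the dual witness $\mu$, and the names `psi`, `gamma`, `overlap_sum`, and `closed_form` are hypothetical. It also uses the standard hypergeometric formula $\mathbf{P}[|V \cap W| = i] = \binom{n}{i} \binom{N-n}{n-i} / \binom{N}{n}$ for independent uniform $V, W$.

```python
from itertools import combinations, product
from math import comb, e

n, N, d = 2, 4, 2  # parity on n bits has threshold degree d = n


def psi(z):
    """psi(z) = (-1)^f(z) * mu(z) for f = parity and mu uniform on {0,1}^n."""
    return (-1) ** (sum(z) % 2) * 2.0 ** -n


def gamma(V, W):
    """Gamma(V, W) = E_x[psi(x|_V) * psi(x|_W)] over uniform x in {0,1}^N."""
    total = sum(psi([x[i] for i in V]) * psi([x[i] for i in W])
                for x in product((0, 1), repeat=N))
    return total / 2 ** N


def check_claims():
    """Verify Claims 3.5 and 3.6 for every pair of size-n index sets."""
    subsets = list(combinations(range(N), n))  # 0-based, sorted index sets
    for V in subsets:
        for W in subsets:
            i = len(set(V) & set(W))
            g = abs(gamma(V, W))
            if i <= d - 1 and g > 1e-12:              # Claim 3.5
                return False
            if g > 2.0 ** (i - 2 * n) + 1e-12:        # Claim 3.6
                return False
    return True


def overlap_sum(n, N, d):
    """sum_{i=d}^{n} 2^i * P[|V ∩ W| = i] for independent uniform V, W."""
    return sum(2 ** i * comb(n, i) * comb(N - n, n - i)
               for i in range(d, n + 1)) / comb(N, n)


def closed_form(n, N, d):
    """The bound (4 e n^2 / (N d))^(d/2) of Theorem 3.2."""
    return (4 * e * n * n / (N * d)) ** (d / 2)


claims_ok = check_claims()
# Compare the square root of the sum with the closed form for larger N.
lhs = overlap_sum(4, 200, 2) ** 0.5
rhs = closed_form(4, 200, 2)
```

For the parameters $n = 4$, $d = 2$, $N = 200$, the quantity `lhs` is well below `rhs`, consistent with the remark that follows the proof that the bound of Theorem 3.2 is not tight.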
The discrepancy bound in Theorem 3.2 is not tight. In follow-up work (see Section 5.1), the author proved a substantially stronger bound using matrix-analytic techniques. However, that matrix-analytic approach does not seem to extend to the multiparty model, and as we will see later in Sections 6 and 7, all multiparty papers in this survey use adaptations of the analysis just presented.

4 The Generalized Discrepancy Method

As we saw in Section 2.2, the discrepancy method is particularly strong in that it gives communication lower bounds not only for bounded-error protocols but also for protocols with error vanishingly close to $\frac12$. Ironically, this strength of the discrepancy method is also its weakness. For example, the disjointness function $\mathrm{disj}(x,y)=\bigvee_{i=1}^{n}(x_i\wedge y_i)$ has a simple low-cost protocol with error $\frac12-\Omega\!\left(\frac1n\right)$. As a result, disjointness has high discrepancy, and no useful lower bounds can be obtained for it via the discrepancy method. Yet it is well-known that disjointness has bounded-error communication complexity $\Omega(n)$ in the randomized model [KS92, Raz92] and $\Omega(\sqrt n)$ in the quantum model [Raz03].

The remainder of this survey (Sections 5–7) is concerned with bounded-error communication. Crucial to this development is the generalized discrepancy method, an ingenious extension of the traditional discrepancy method that avoids the difficulty just cited. To our knowledge, this idea originated in a paper by Klauck [Kla01, Thm. 4] and was reformulated in its current form by Razborov [Raz03]. The development in [Kla01] and [Raz03] takes place in the quantum model of communication. However, the basic mathematical technique is in no way restricted to the quantum model, and we will focus here on a model-independent version of the generalized discrepancy method from [She07b, §2.4].
Specifically, consider an arbitrary communication model and let $f\colon X\times Y\to\{0,1\}$ be a given function whose communication complexity we wish to estimate. Suppose we can find a function $h\colon X\times Y\to\{0,1\}$ and a distribution $\mu$ on $X\times Y$ that satisfy the following two properties.

1. Correlation of $f$ and $h$. The functions $f$ and $h$ are well correlated under $\mu$:
\[ \mathbf{E}_{(x,y)\sim\mu}\big[(-1)^{f(x,y)+h(x,y)}\big]\ge\epsilon, \tag{4.1} \]
where $\epsilon>0$ is typically a constant.

2. Hardness of $h$. No low-cost protocol $\Pi$ in the given model of communication can compute $h$ to a substantial advantage under $\mu$. Formally, if $\Pi$ is a protocol in the given model with cost $C$, then
\[ \mathbf{E}_{(x,y)\sim\mu}\Big[(-1)^{h(x,y)}\,\mathbf{E}\big[(-1)^{\Pi(x,y)}\big]\Big]\le 2^{O(C)}\gamma, \tag{4.2} \]
where $\gamma=o(1)$. The inner expectation in (4.2) is over the internal operation of the protocol on the fixed input $(x,y)$.

If the above two conditions hold, we claim that any protocol in the given model that computes $f$ with error at most $\epsilon/3$ on each input must have cost $\Omega\!\left(\log\frac{\epsilon}{\gamma}\right)$. Indeed, let $\Pi$ be a protocol with $\mathbf{P}[\Pi(x,y)\ne f(x,y)]\le\epsilon/3$ for all $x,y$. Then standard manipulations reveal:
\[ \mathbf{E}_{(x,y)\sim\mu}\Big[(-1)^{h(x,y)}\,\mathbf{E}\big[(-1)^{\Pi(x,y)}\big]\Big]\ \ge\ \mathbf{E}_{(x,y)\sim\mu}\big[(-1)^{f(x,y)+h(x,y)}\big]-2\cdot\frac{\epsilon}{3}\ \overset{(4.1)}{\ge}\ \frac{\epsilon}{3}. \]
In view of (4.2), this shows that $\Pi$ must have cost $\Omega\!\left(\log\frac{\epsilon}{\gamma}\right)$.

The above framework from [She07b] is meant to emphasize the basic mathematical technique in question, which is independent of the communication model. Indeed, the communication model enters the picture only in (4.2). It is here that the analysis must exploit the particularities of the model. To place an upper bound on the advantage under $\mu$ in the quantum model with entanglement, one considers the quantity $\|K\|\sqrt{|X|\,|Y|}$, where $K=[(-1)^{h(x,y)}\mu(x,y)]_{x,y}$.
In the randomized model and the quantum model without entanglement, the quantity to estimate happens to be $\operatorname{disc}_\mu(h)$. (In fact, Linial and Shraibman [LS07b] recently showed that $\operatorname{disc}_\mu(h)$ also works in the quantum model with entanglement.)

For future reference, we now record a quantitative version of the generalized discrepancy method for the quantum model.

Theorem 4.1 ([She07b], implicit in [Raz03, SZ07]). Let $X,Y$ be finite sets and $f\colon X\times Y\to\{0,1\}$ a given function. Let $K=[K_{xy}]_{x\in X,\,y\in Y}$ be any real matrix with $\|K\|_1=1$. Then for each $\epsilon>0$,
\[ 4^{Q_\epsilon(f)}\ \ge\ 4^{Q^*_\epsilon(f)}\ \ge\ \frac{\langle F,K\rangle-2\epsilon}{3\,\|K\|\sqrt{|X|\,|Y|}}, \]
where $F=\big[(-1)^{f(x,y)}\big]_{x\in X,\,y\in Y}$.

Observe that Theorem 4.1 uses slightly more succinct notation (matrix vs. function; weighted sum vs. expectation) but is equivalent to the abstract formulation above.

So far, we have focused on two-party communication. This discussion extends essentially word-for-word to the multiparty model, with discrepancy serving once again as the natural measure of the advantage attainable by low-cost protocols. This extension was formalized by Lee and Shraibman [LS07, Thms. 6, 7] and independently by Chattopadhyay and Ada [CA08, Lem. 3.2], who proved (4.3) and (4.4) below, respectively:

Theorem 4.2 (cf. [LS07, CA08]). Fix $F\colon X_1\times\cdots\times X_k\to\{-1,+1\}$ and $\epsilon\in[0,1/2)$. Then
\[ 2^{R^k_\epsilon(F)}\ \ge\ (1-\epsilon)\max_{H,P}\left\{\frac{\langle H\circ P,F\rangle-\frac{\epsilon}{1-\epsilon}}{\operatorname{disc}_P(H)}\right\} \tag{4.3} \]
and
\[ 2^{R^k_\epsilon(F)}\ \ge\ \max_{H,P}\left\{\frac{\langle H\circ P,F\rangle-2\epsilon}{\operatorname{disc}_P(H)}\right\}, \tag{4.4} \]
where in both cases $H$ ranges over sign tensors and $P$ ranges over tensors with $P\ge 0$ and $\|P\|_1=1$.

Proof. Fix an optimal $\epsilon$-error protocol $\Pi$ for $F$. Define $\tilde F(x_1,\dots,x_k)=\mathbf{E}[(-1)^{\Pi(x_1,\dots,x_k)}]$, where the expectation is over any internal randomization in $\Pi$. Let $\delta\in(0,1]$ be a parameter to be fixed later.
Then
\[ 2^{R^k_\epsilon(F)}\operatorname{disc}_P(H)\ \ge\ \langle H\circ P,\tilde F\rangle\ =\ \delta\left\{\langle H\circ P,F\rangle+\Big\langle H\circ P,\frac1\delta\tilde F-F\Big\rangle\right\}\ \ge\ \delta\left\{\langle H\circ P,F\rangle-\frac1\delta\max\{|1-\delta-2\epsilon|,\,1-\delta\}\right\}, \]
where the first inequality restates the original discrepancy method (Theorem 2.3). Now (4.3) and (4.4) follow by setting $\delta=1-\epsilon$ and $\delta=1$, respectively.

The proof in [CA08] is similar to the one just given for the special case $\delta=1$. The proof in [LS07] is rather different and works by defining a suitable norm and passing to its dual. The norm-based approach was employed earlier by Linial and Shraibman [LS07b] and can be thought of as a purely analytic analogue of the generalized discrepancy method.

5 Two-Party Bounded-Error Communication

For a function $f\colon\{0,1\}^n\to\mathbb{R}$, recall from Section 1 that its $\epsilon$-approximate degree $\deg_\epsilon(f)$ is the least degree of a polynomial $p(x_1,\dots,x_n)$ with $|f(x)-p(x)|\le\epsilon$ for all $x\in\{0,1\}^n$. We move on to discuss two recent papers on bounded-error communication that use the notion of approximate degree to contribute strong lower bounds for rather broad classes of functions, subsuming Razborov's breakthrough work on symmetric predicates [Raz03]. These lower bounds are valid not only in the randomized model, but also in the quantum model (regardless of entanglement).

The setting in which to view these two works is Klauck and Razborov's generalized discrepancy method (see Sections 1 and 4). Let $F$ be a sign matrix whose bounded-error quantum communication complexity is of interest. The quantum version of this method (Theorem 4.1) states that to prove a communication lower bound for $F$, it suffices to exhibit a real matrix $K$ such that $\langle F,K\rangle$ is large but $\|K\|$ is small.
The importance of the generalized discrepancy method is that it makes it possible, in theory, to prove lower bounds for functions such as disjointness, to which the traditional discrepancy method (Theorem 2.3) does not apply.

The hard part, of course, is finding the matrix $K$. Except in rather restricted cases [Kla01, Thm. 4], it was not known how to do it. As a result, the generalized discrepancy method was of limited practical use. (In particular, Razborov's celebrated work [Raz03] did not use the generalized discrepancy method. Instead, he introduced a novel alternate technique that was restricted to symmetric functions.)

This difficulty was overcome independently by Sherstov [She07b] and Shi and Zhu [SZ07], who used the dual characterization of the approximate degree to obtain the matrix $K$ for a broad range of problems. To our knowledge, the work in [She07b] and [SZ07] is the first use of the dual characterization of the approximate degree to prove communication lower bounds.

The specifics of these two works are very different. The construction of $K$ in [She07b], which we called the pattern matrix method for lower bounds on bounded-error communication, is built around a new matrix-analytic technique (the pattern matrix) inspired by the author's Degree/Discrepancy Theorem. The construction of $K$ in [SZ07], the block-composition method, is based on the idea of hardness amplification by composition. What unites them is use of the dual characterization of the approximate degree, given by the following theorem.

Theorem 5.1 ([She07b, SZ07]). Fix $\epsilon\ge 0$. Let $f\colon\{0,1\}^n\to\mathbb{R}$ be given with $d=\deg_\epsilon(f)\ge 1$. Then there is a function $\psi\colon\{0,1\}^n\to\mathbb{R}$ such that:
\begin{align*}
&\hat\psi(S)=0\quad\text{for }|S|<d, \\
&\sum_{z\in\{0,1\}^n}|\psi(z)|=1, \\
&\sum_{z\in\{0,1\}^n}\psi(z)f(z)>\epsilon.
\end{align*}

Theorem 5.1 follows from linear-programming duality.
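To make the three conditions of Theorem 5.1 concrete, the following sketch verifies them by brute force for a candidate $\psi$. The example instance is our own assumption: $f$ is the $\pm1$-valued parity function on $n=3$ bits, whose $\epsilon$-approximate degree is $n$ for every $\epsilon<1$, and $\psi=2^{-n}f$. The helper names (`chi`, `fourier_coefficient`, `is_dual_witness`) are hypothetical, and the Fourier coefficient is normalized as $\hat\psi(S)=2^{-n}\sum_z\psi(z)\chi_S(z)$. Note that the existence of such a $\psi$ also certifies $\deg_\epsilon(f)\ge d$ directly: if $p$ had degree less than $d$ and $|f-p|\le\epsilon$ pointwise, then $\sum_z\psi(z)f(z)=\sum_z\psi(z)(f(z)-p(z))\le\epsilon$, a contradiction.

```python
from fractions import Fraction
from itertools import combinations, product

def chi(S, z):
    """Character chi_S(z) = (-1)^(sum of z_i over i in S)."""
    return (-1) ** sum(z[i] for i in S)

def fourier_coefficient(psi, S, n):
    """hat psi(S) = 2^-n * sum_z psi(z) chi_S(z)."""
    cube = product((0, 1), repeat=n)
    return sum(psi(z) * chi(S, z) for z in cube) / 2 ** n

def is_dual_witness(psi, f, n, d, eps):
    """Check the three conditions of Theorem 5.1 for psi against f."""
    cube = list(product((0, 1), repeat=n))
    low_order_vanish = all(
        fourier_coefficient(psi, S, n) == 0
        for size in range(d)
        for S in combinations(range(n), size)
    )
    unit_l1_norm = sum(abs(psi(z)) for z in cube) == 1
    correlated = sum(psi(z) * f(z) for z in cube) > eps
    return low_order_vanish and unit_l1_norm and correlated

# Assumed toy instance: +/-1-valued parity on 3 bits and psi = f / 2^n.
n = 3

def parity(z):
    return (-1) ** (sum(z) % 2)

def psi_parity(z):
    return Fraction(parity(z), 2 ** n)

def psi_uniform(z):
    # A non-example: the uniform weights have a nonzero constant coefficient.
    return Fraction(1, 2 ** n)
```

With these definitions, `is_dual_witness(psi_parity, parity, n, n, Fraction(2, 3))` checks the witness for $d=n$, while `psi_uniform` fails the first condition.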
We shall first cover the two papers individually in Sections 5.1 and 5.2 and then compare them in detail in Section 5.3.

5.1 The Pattern Matrix Method

The setting for this work resembles that of the Degree/Discrepancy Theorem in [She07a] (see Section 3). Let $N$ and $n$ be positive integers, where $n\le N/2$. For convenience, we will further assume that $n\mid N$. Fix an arbitrary function $f\colon\{0,1\}^n\to\{0,1\}$. Consider the communication problem of computing $f(x|_V)$, where the bit string $x\in\{0,1\}^N$ is Alice's input and the set $V\subset\{1,2,\dots,N\}$ with $|V|=n$ is Bob's input. As before, $x|_V$ denotes the projection of $x$ onto the indices in $V$, i.e., $x|_V=(x_{i_1},x_{i_2},\dots,x_{i_n})\in\{0,1\}^n$, where $i_1<i_2<\cdots<i_n$ are the elements of $V$.

The similarities with [She07a], however, do not extend beyond this point. Unlike that earlier work, we will actually study the easier communication problem in which Bob's input $V$ is restricted to a rather special form. Namely, we will only allow those sets $V$ that contain precisely one element from each block in the following partition of $\{1,2,\dots,N\}$:
\[ \left\{1,2,\dots,\frac Nn\right\}\cup\left\{\frac Nn+1,\dots,\frac{2N}{n}\right\}\cup\cdots\cup\left\{\frac{(n-1)N}{n}+1,\dots,N\right\}. \tag{5.1} \]
Even for this easier communication problem, we will prove a much stronger result than what would have been possible in the original setting with the methods of [She07a]. In particular, we will considerably improve the Degree/Discrepancy Theorem from [She07a] along the way. The main results of this work are as follows.

Theorem 5.2 ([She07b]). Any classical or quantum protocol, with or without prior entanglement, that computes $f(x|_V)$ with error probability at most $1/5$ on each input has communication cost at least
\[ \frac14\deg_{1/3}(f)\cdot\log\!\left(\frac{N}{2n}\right)-2. \]
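The restricted form of Bob's input is easy to experiment with. The sketch below is an illustration with hypothetical helper names and 0-based indices, not code from [She07b]. It enumerates the admissible sets $V$ for the partition (5.1), computes the projection $x|_V$, and checks by exhaustive search that $x|_V$ coincides, block by block, with an OR of ANDs of $x$ against the indicator vector $y$ of $V$.

```python
from itertools import product

def blocks(N, n):
    """Partition {0, ..., N-1} into n consecutive blocks of size N/n, as in (5.1)."""
    size = N // n
    return [list(range(b * size, (b + 1) * size)) for b in range(n)]

def restricted_sets(N, n):
    """Yield every set V with exactly one element per block, as an increasing tuple."""
    for choice in product(*blocks(N, n)):
        yield choice

def project(x, V):
    """The projection x|_V: the bits of x at the indices in V, in increasing order."""
    return tuple(x[i] for i in V)

def or_of_ands(x, V, N, n):
    """For each block, the OR over the block of (x_i AND y_i), y = indicator of V."""
    y = [1 if i in V else 0 for i in range(N)]
    return tuple(int(any(x[i] & y[i] for i in blk)) for blk in blocks(N, n))

def agrees_everywhere(N, n):
    """Check project(x, V) == or_of_ands(x, V) for all x and all admissible V."""
    return all(
        project(x, V) == or_of_ands(x, V, N, n)
        for x in product((0, 1), repeat=N)
        for V in restricted_sets(N, n)
    )
```

Because $V$ selects exactly one index in each block, the OR over a block has a single possibly nonzero term, namely the selected bit of $x$. This is why the restricted problem can be phrased as a composition of $f$ with small OR-of-AND gadgets.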
In view of the restricted form of Bob's inputs, we can restate Theorem 5.2 in terms of function composition. Setting $N=4n$ for concreteness, we have:

Corollary 5.3 ([She07b]). Let $f\colon\{0,1\}^n\to\{0,1\}$ be given. Define $F\colon\{0,1\}^{4n}\times\{0,1\}^{4n}\to\{0,1\}$ by
\begin{align*}
F(x,y)=f\big(&x_1y_1\vee x_2y_2\vee x_3y_3\vee x_4y_4,\\
&x_5y_5\vee x_6y_6\vee x_7y_7\vee x_8y_8,\\
&\ \ \vdots\\
&x_{4n-3}y_{4n-3}\vee x_{4n-2}y_{4n-2}\vee x_{4n-1}y_{4n-1}\vee x_{4n}y_{4n}\big),
\end{align*}
where $x_iy_i=(x_i\wedge y_i)$. Any classical or quantum protocol, with or without prior entanglement, that computes $F(x,y)$ with error probability at most $1/5$ on each input has cost at least $\frac14\deg_{1/3}(f)-2$.

We now turn to the proof. Let $\mathcal{V}(N,n)$ denote the set of Bob's inputs, i.e., the family of subsets $V\subseteq[N]$ that have exactly one element in each of the blocks of the partition (5.1). Clearly, $|\mathcal{V}(N,n)|=(N/n)^n$. We will be working with the following family of matrices.

Definition 5.4 (Pattern matrix [She07b]). For $\phi\colon\{0,1\}^n\to\mathbb{R}$, the $(N,n,\phi)$-pattern matrix is the real matrix $A$ given by
\[ A=\big[\phi(x|_V\oplus w)\big]_{x\in\{0,1\}^N,\ (V,w)\in\mathcal{V}(N,n)\times\{0,1\}^n}. \]
In words, $A$ is the matrix of size $2^N$ by $2^n(N/n)^n$ whose rows are indexed by strings $x\in\{0,1\}^N$, whose columns are indexed by pairs $(V,w)\in\mathcal{V}(N,n)\times\{0,1\}^n$, and whose entries are given by $A_{x,(V,w)}=\phi(x|_V\oplus w)$.

The logic behind the term "pattern matrix" is as follows: a mosaic arises from repetitions of a pattern in the same way that $A$ arises from applications of $\phi$ to various subsets of the variables.

Our intermediate goal will be to determine the spectral norm of any given pattern matrix $A$. Toward that end, we will actually end up determining every singular value of $A$ and its multiplicity. Our approach will be to represent $A$ as the sum of simpler matrices and analyze them instead.
For this to work, we need to be able to reconstruct the singular values of $A$ from those of the simpler matrices. Just when this can be done is the subject of the following lemma from [She07b].

Lemma 5.5 (Singular values of a matrix sum [She07b]). Let $A,B$ be real matrices with $AB^{\mathsf T}=0$ and $A^{\mathsf T}B=0$. Then the nonzero singular values of $A+B$, counting multiplicities, are $\sigma_1(A),\dots,\sigma_{\operatorname{rank}A}(A),\ \sigma_1(B),\dots,\sigma_{\operatorname{rank}B}(B)$.

We are ready to analyze the singular values of a pattern matrix.

Theorem 5.6 (Singular values of a pattern matrix [She07b]). Let $\phi\colon\{0,1\}^n\to\mathbb{R}$ be given. Let $A$ be the $(N,n,\phi)$-pattern matrix. Then the nonzero singular values of $A$, counting multiplicities, are:
\[ \bigcup_{S:\ \hat\phi(S)\ne 0}\left\{\sqrt{2^{N+n}\left(\frac Nn\right)^{n}}\cdot|\hat\phi(S)|\left(\frac nN\right)^{|S|/2},\ \text{repeated }\left(\frac Nn\right)^{|S|}\text{ times}\right\}. \]
In particular,
\[ \|A\|=\sqrt{2^{N+n}\left(\frac Nn\right)^{n}}\ \max_{S\subseteq[n]}\left\{|\hat\phi(S)|\left(\frac nN\right)^{|S|/2}\right\}. \]

Proof [She07b]. For each $S\subseteq[n]$, let $A_S$ be the $(N,n,\chi_S)$-pattern matrix. Then $A=\sum_{S\subseteq[n]}\hat\phi(S)A_S$. For any $S,T\subseteq[n]$ with $S\ne T$, a calculation reveals that $A_SA_T^{\mathsf T}=0$ and $A_S^{\mathsf T}A_T=0$. By Lemma 5.5, this means that the nonzero singular values of $A$ are the union of the nonzero singular values of all $\hat\phi(S)A_S$, counting multiplicities. Therefore, the proof will be complete once we show that the only nonzero singular value of $A_S^{\mathsf T}A_S$ is $2^{N+n}(N/n)^{n-|S|}$, with multiplicity $(N/n)^{|S|}$.

For this, it is convenient to write $A_S^{\mathsf T}A_S$ as the Kronecker product
\[ A_S^{\mathsf T}A_S=\big[\chi_S(w)\chi_S(w')\big]_{w,w'}\otimes\Bigg[\sum_{x\in\{0,1\}^N}\chi_S(x|_V)\,\chi_S(x|_{V'})\Bigg]_{V,V'}. \]
The first matrix in this factorization has rank 1 and entries $\pm1$, which means that its only nonzero singular value is $2^n$ with multiplicity 1. The other matrix, call it $M$, is permutation-similar to $2^N\operatorname{diag}(J,J,\dots,J)$, where $J$ is the all-ones square matrix of order $(N/n)^{n-|S|}$.
This means that the only nonzero singular value of $M$ is $2^N(N/n)^{n-|S|}$ with multiplicity $(N/n)^{|S|}$. It follows from elementary properties of the Kronecker product that the spectrum of $A_S^{\mathsf T}A_S$ is as desired.

We are now prepared to formulate and prove the pattern matrix method for lower bounds on bounded-error communication, which gives strong lower bounds for every pattern matrix generated by a Boolean function with high approximate degree. Theorem 5.2 and its corollary will fall out readily as consequences.

Theorem 5.7 (Pattern matrix method [She07b]). Let $F$ be the $(N,n,f)$-pattern matrix, where $f\colon\{0,1\}^n\to\{0,1\}$ is given. Put $d=\deg_{1/3}(f)$. Then
\[ Q_{1/5}(F)\ \ge\ Q^*_{1/5}(F)\ \ge\ \frac14\,d\log\!\left(\frac Nn\right)-2. \]

Proof [She07b]. Define $f^*\colon\{0,1\}^n\to\{-1,+1\}$ by $f^*(z)=(-1)^{f(z)}$. Then it is easy to verify that $\deg_{2/3}(f^*)=d$. By Theorem 5.1, there is a function $\psi\colon\{0,1\}^n\to\mathbb{R}$ such that:
\begin{align}
&\hat\psi(S)=0\quad\text{for }|S|<d, \tag{5.2}\\
&\sum_{z\in\{0,1\}^n}|\psi(z)|=1, \tag{5.3}\\
&\sum_{z\in\{0,1\}^n}\psi(z)f^*(z)>\frac23. \tag{5.4}
\end{align}
Let $M$ be the $(N,n,f^*)$-pattern matrix. Let $K$ be the $(N,n,2^{-N}(N/n)^{-n}\psi)$-pattern matrix. Immediate consequences of (5.3) and (5.4) are:
\[ \|K\|_1=1,\qquad\langle K,M\rangle>\frac23. \tag{5.5} \]
Our last task is to calculate $\|K\|$. By (5.3) and Proposition 2.1,
\[ \max_{S\subseteq[n]}|\hat\psi(S)|\le 2^{-n}. \tag{5.6} \]
Theorem 5.6 yields, in view of (5.2) and (5.6):
\[ \|K\|\le\left(\frac nN\right)^{d/2}\left(2^{N+n}\left(\frac Nn\right)^{n}\right)^{-1/2}. \tag{5.7} \]
The desired lower bounds on quantum communication now follow directly from (5.5) and (5.7) by the generalized discrepancy method (Theorem 4.1).

Remark 5.8. In the proof of Theorem 5.7, we bounded $\|K\|$ using the subtle calculations of the spectrum of a pattern matrix. Another possibility would be to bound $\|K\|$ precisely in the same way that we bounded the discrepancy in the Degree/Discrepancy Theorem (see Section 3).
This, however, would result in polynomially weaker lower bounds on communication.

Theorem 5.7 immediately implies Theorem 5.2 above and its corollary:

Proof of Theorem 5.2 [She07b]. The $\left(\left\lfloor\frac{N}{2n}\right\rfloor n,\ n,\ f\right)$-pattern matrix occurs as a submatrix of $[f(x|_V)]_{x\in\{0,1\}^N,\,V\in\mathcal{V}(N,n)}$.

Improved Degree/Discrepancy Theorem. We will mention a few more applications of this work. The first of these is an improved version of the author's Degree/Discrepancy Theorem (Theorem 3.2).

Theorem 5.9 ([She07b]). Let $F$ be the $(N,n,f)$-pattern matrix, where $f\colon\{0,1\}^n\to\{0,1\}$ has threshold degree $d$. Then $\operatorname{disc}(F)\le(n/N)^{d/2}$.

The proof is similar to the proof of the pattern matrix method. Theorem 5.9 improves considerably on the original Degree/Discrepancy Theorem. To illustrate, consider $f(x)=\bigvee_{i=1}^{m}\bigwedge_{j=1}^{m^2}x_{ij}$, a function on $n=m^3$ variables. Applying Theorem 5.9 to $f$ leads to an $\exp(-\Theta(n^{1/3}))$ upper bound on the discrepancy of $\mathsf{AC}^0$, improving on the previous bound of $\exp(-\Theta(n^{1/5}))$ from [She07a]. The $\exp(-\Theta(n^{1/3}))$ bound is also the bound obtained by Buhrman et al. [BVW07] independently of the author [She07a, She07b], using a different function and different techniques.

Razborov's Lower Bounds for Symmetric Functions. As another application, we are able to give an alternate proof of Razborov's breakthrough result on the quantum communication complexity of symmetric functions [Raz03]. Consider a communication problem in which Alice has a string $x\in\{0,1\}^n$, Bob has a string $y\in\{0,1\}^n$, and their objective is to compute
\[ D(|x\wedge y|) \]
for some predicate $D\colon\{0,1,\dots,n\}\to\{0,1\}$ fixed in advance.
This general setting encompasses several familiar functions, such as disjointness (determining if $x$ and $y$ intersect) and inner product modulo 2 (determining if $x$ and $y$ intersect in an odd number of positions). As it turns out, the hardness of this general communication problem depends on whether $D$ changes value close to the middle of the range $\{0,1,\dots,n\}$. Specifically, define $\ell_0(D)\in\{0,1,\dots,\lfloor n/2\rfloor\}$ and $\ell_1(D)\in\{0,1,\dots,\lceil n/2\rceil\}$ to be the smallest integers such that $D$ is constant in the range $[\ell_0(D),\,n-\ell_1(D)]$. Razborov established optimal lower bounds on the quantum communication complexity of every function of the form $D(|x\wedge y|)$:

Theorem 5.10 (Razborov [Raz03]). Let $D\colon\{0,1,\dots,n\}\to\{0,1\}$ be an arbitrary predicate. Put $f(x,y)=D(|x\wedge y|)$. Then
\[ Q_{1/3}(f)\ \ge\ Q^*_{1/3}(f)\ \ge\ \Omega\!\left(\sqrt{n\,\ell_0(D)}+\ell_1(D)\right). \]
In particular, disjointness has quantum communication complexity $\Omega(\sqrt n)$, regardless of entanglement. Prior to Razborov's result, the best lower bound [BW01, ASTS+03] for disjointness was only $\Omega(\log n)$.

In [She07b], we give a new proof of Razborov's Theorem 5.10 using a straightforward application of the pattern matrix method.

5.2 The Block Composition Method

Given functions $f\colon\{0,1\}^n\to\{0,1\}$ and $g\colon\{0,1\}^k\times\{0,1\}^k\to\{0,1\}$, let $f\circ g^n$ denote the composition of $f$ with $n$ independent copies of $g$. More formally, the function $f\circ g^n\colon\{0,1\}^{nk}\times\{0,1\}^{nk}\to\{0,1\}$ is given by
\[ (f\circ g^n)(x,y)=f(\dots,g(x^{(i)},y^{(i)}),\dots), \]
where $x=(\dots,x^{(i)},\dots)\in\{0,1\}^{nk}$ and $y=(\dots,y^{(i)},\dots)\in\{0,1\}^{nk}$.

This section presents Shi and Zhu's block composition method [SZ07], which gives a lower bound on the communication complexity of $f\circ g^n$ in terms of certain properties of $f$ and $g$.
The relevant property of $f$ is simply its approximate degree. The relevant property of $g$ is its spectral discrepancy, formalized next.

Definition 5.11 (Spectral discrepancy [SZ07]). Given $g\colon\{0,1\}^k\times\{0,1\}^k\to\{0,1\}$, its spectral discrepancy $\rho(g)$ is the least $\rho\ge 0$ for which there exist sets $A,B\subseteq\{0,1\}^k$ and a distribution $\mu$ on $A\times B$ such that
\begin{align}
\Big\|\big[\mu(x,y)(-1)^{g(x,y)}\big]_{x\in A,\,y\in B}\Big\| &\le \frac{\rho}{\sqrt{|A|\,|B|}}, \tag{5.8}\\
\Big\|\big[\mu(x,y)\big]_{x\in A,\,y\in B}\Big\| &\le \frac{1+\rho}{\sqrt{|A|\,|B|}}, \tag{5.9}
\end{align}
and
\[ \sum_{(x,y)\in A\times B}\mu(x,y)(-1)^{g(x,y)}=0. \tag{5.10} \]

In view of (5.8) alone, the spectral discrepancy $\rho(g)$ is an upper bound on the discrepancy $\operatorname{disc}(g)$. The key additional requirement (5.9) is satisfied, for example, by doubly stochastic matrices [HJ86, §8.7]: if $A=B$ and all row and column sums in $[\mu(x,y)]_{x\in A,\,y\in A}$ are $1/|A|$, then $\|[\mu(x,y)]_{x\in A,\,y\in A}\|=1/|A|$.

As an illustration, consider the familiar function inner product modulo 2, given by $\mathrm{ip}_k(x,y)=\bigoplus_{i=1}^{k}(x_i\wedge y_i)$.

Proposition 5.12 ([SZ07]). The function $\mathrm{ip}_k$ has $\rho(\mathrm{ip}_k)\le 1/\sqrt{2^k-1}$.

Proof [SZ07]. Take $\mu$ to be the uniform distribution over $A\times B$, where $A=\{0,1\}^k\setminus\{0^k\}$ and $B=\{0,1\}^k$.

We are prepared to state the general method.

Theorem 5.13 (Block composition method [SZ07]). Fix $f\colon\{0,1\}^n\to\{0,1\}$ and $g\colon\{0,1\}^k\times\{0,1\}^k\to\{0,1\}$. Put $d=\deg_{1/3}(f)$ and $\rho=\rho(g)$. If $\rho\le d/(2\mathrm{e}n)$, then
\[ Q(f\circ g^n)\ \ge\ Q^*(f\circ g^n)=\Omega(d). \]

Proof (adapted from [SZ07]). Fix sets $A,B\subseteq\{0,1\}^k$ and a distribution $\mu$ on $A\times B$ with respect to which $\rho=\rho(g)$ is achieved. Define $f^*\colon\{0,1\}^n\to\{-1,+1\}$ by $f^*(z)=(-1)^{f(z)}$. Then one readily verifies that $\deg_{2/3}(f^*)=d$.
By Theorem 5.1, there exists $\psi\colon\{0,1\}^n\to\mathbb{R}$ such that
\begin{align}
&\hat\psi(S)=0\quad\text{for }|S|<d, \tag{5.11}\\
&\sum_{z\in\{0,1\}^n}|\psi(z)|=1, \tag{5.12}\\
&\sum_{z\in\{0,1\}^n}\psi(z)f^*(z)>\frac23. \tag{5.13}
\end{align}
Define matrices
\begin{align*}
F&=\Big[f^*(\dots,g(x^{(i)},y^{(i)}),\dots)\Big]_{x,y},\\
K&=\Big[2^n\,\psi(\dots,g(x^{(i)},y^{(i)}),\dots)\prod_{i=1}^{n}\mu(x^{(i)},y^{(i)})\Big]_{x,y},
\end{align*}
where in both cases the row index $x=(\dots,x^{(i)},\dots)$ ranges over $A^n$ and the column index $y=(\dots,y^{(i)},\dots)$ ranges over $B^n$. In view of (5.10) and (5.13),
\[ \langle F,K\rangle>\frac23. \tag{5.14} \]
We proceed to bound $\|K\|$. Put
\[ M_S=\Big[\prod_{i\in S}(-1)^{g(x^{(i)},y^{(i)})}\cdot\prod_{i=1}^{n}\mu(x^{(i)},y^{(i)})\Big]_{x,y},\qquad S\subseteq[n]. \]
Then (5.8) and (5.9) imply, in view of the tensor structure of $M_S$, that
\[ \|M_S\|\le|A|^{-n/2}|B|^{-n/2}\rho^{|S|}(1+\rho)^{n-|S|}. \tag{5.15} \]
On the other hand,
\begin{align*}
\|K\| &\le \sum_{S\subseteq[n]}2^n|\hat\psi(S)|\,\|M_S\| \\
&= \sum_{|S|\ge d}2^n|\hat\psi(S)|\,\|M_S\| && \text{by (5.11)}\\
&\le \sum_{|S|\ge d}\|M_S\| && \text{by (5.12) and Proposition 2.1}\\
&\le |A|^{-n/2}|B|^{-n/2}\sum_{i=d}^{n}\binom ni\rho^{i}(1+\rho)^{n-i} && \text{by (5.15).}
\end{align*}
Since $\rho\le d/(2\mathrm{e}n)$, we further have
\[ \|K\|\le|A|^{-n/2}|B|^{-n/2}\,2^{-\Theta(d)}. \tag{5.16} \]
In view of (5.14) and (5.16), the desired lower bound on $Q^*(F)$ now follows by the generalized discrepancy method (Theorem 4.1).

Proposition 5.12 and Theorem 5.13 have the following consequence:

Theorem 5.14 ([SZ07]). Fix a function $f\colon\{0,1\}^n\to\{0,1\}$ and an integer $k\ge 2\log_2 n+5$. Then
\[ Q(f\circ\mathrm{ip}_k^n)\ \ge\ Q^*(f\circ\mathrm{ip}_k^n)\ \ge\ \Omega(\deg_{1/3}(f)). \]

For the disjointness function $\mathrm{disj}_k(x,y)=\bigvee_{i=1}^{k}(x_i\wedge y_i)$, Shi and Zhu prove that $\rho(\mathrm{disj}_k)=O(1/k)$. Unlike Proposition 5.12, this fact requires a nontrivial proof using Knuth's calculation of the eigenvalues of certain combinatorial matrices.
In conjunction with Theorem 5.13, this upper bound on $\rho(\mathrm{disj}_k)$ leads with some work to the following implication:

Theorem 5.15 ([SZ07]). Define $f\colon\{0,1\}^n\times\{0,1\}^n\to\{0,1\}$ by $f(x,y)=D(|x\wedge y|)$, where $D\colon\{0,1,\dots,n\}\to\{0,1\}$ is given. Then
\[ Q(f)\ \ge\ Q^*(f)\ \ge\ \Omega\!\left(n^{1/3}\ell_0(D)^{2/3}+\ell_1(D)\right). \]
The symbols $\ell_0(D)$ and $\ell_1(D)$ have their meaning from Section 5.1. Theorem 5.15 is of course a weaker version of Razborov's celebrated lower bounds for symmetric functions (Theorem 5.10), obtained with a different proof.

5.3 Pattern Matrix Method vs. Block Composition Method

To restate the block composition method,
\[ Q^*(f\circ g^n)\ \ge\ \Omega(\deg_{1/3}(f))\qquad\text{provided that}\qquad\rho(g)\le\frac{\deg_{1/3}(f)}{2\mathrm{e}n}. \]
The key player in this method is the quantity $\rho(g)$, which needs to be small. This poses two complications. First, the function $g$ will generally need to depend on many variables, from $k=\Theta(\log n)$ to $k=n^{\Theta(1)}$, which weakens the final lower bounds on communication (recall that $\rho(g)\ge 2^{-k}$ always). For example, the lower bounds obtained in [SZ07] for symmetric functions are polynomially weaker than Razborov's optimal lower bounds (see Theorems 5.15 and 5.10, respectively).

A second complication, as Shi and Zhu note, is that "estimating the quantity $\rho(g)$ is unfortunately difficult in general" [SZ07, §4.1]. For example, re-proving Razborov's lower bounds reduces to estimating $\rho(g)$ with $g$ being the disjointness function. Shi and Zhu accomplish this using Hahn matrices, an advanced tool that is also the centerpiece of Razborov's own proof (Razborov's use of Hahn matrices is somewhat more demanding).

These complications do not arise in the pattern matrix method.
For example, it implies (by setting $N=2n$ in Theorem 5.7) that
\[ Q^*(f\circ g^n)\ \ge\ \Omega(\deg_{1/3}(f)) \]
for any function $g\colon\{0,1\}^k\times\{0,1\}^k\to\{0,1\}$ such that the matrix $[g(x,y)]_{x,y}$ contains the following submatrix, up to permutations of rows and columns:
\[ \begin{bmatrix}1&0&1&0\\1&0&0&1\\0&1&1&0\\0&1&0&1\end{bmatrix}. \tag{5.17} \]
To illustrate, one can take $g$ to be
\[ g(x,y)=x_1y_1\vee x_2y_2\vee x_3y_3\vee x_4y_4, \]
or
\[ g(x,y)=x_1y_1y_2\ \vee\ \overline{x_1}\,y_1\overline{y_2}\ \vee\ x_2\,\overline{y_1}\,y_2\ \vee\ \overline{x_2}\,\overline{y_1}\,\overline{y_2}. \]
(In particular, the pattern matrix method subsumes Theorem 5.14.) To summarize, there is a simple function $g$ on only $k=2$ variables that works universally for all $f$. This means no technical conditions to check, such as $\rho(g)$, and no blow-up in the number of variables. As a result, in [She07b] we are able to re-prove Razborov's optimal lower bounds exactly. Moreover, the technical machinery involved is self-contained and disjoint from Razborov's proof.

We have just seen that the pattern matrix method gives strong lower bounds for many functions to which the block composition method does not apply. However, this does not settle the exact relationship between the scopes of applicability of the two methods. Several natural questions arise. If a function $g\colon\{0,1\}^k\times\{0,1\}^k\to\{0,1\}$ has spectral discrepancy $\rho(g)\le\frac{1}{2\mathrm{e}}$, does the matrix $[g(x,y)]_{x,y}$ contain (5.17) as a submatrix, up to permutations of rows and columns? An affirmative answer would mean that the pattern matrix method has a strictly greater scope of applicability; a negative answer would mean that the block composition method works in some situations where the pattern matrix method does not apply. If the answer is negative, what can be said for $\rho(g)=o(1)$ or $\rho(g)=n^{-\Theta(1)}$?

Another intriguing issue concerns multiparty communication.
As we will see in Section 6, the pattern matrix method extends readily to the multiparty model. This extension makes heavy use of the fact that the rows of a pattern matrix are applications of the same function to different subsets of the variables. In the general context of block composition (Section 5.2), it is unclear how to carry out this extension. It is inviting to explore a synthesis of the two methods in the multiparty model or another suitable context.

6 Extensions to the Multiparty Model

In this section, we present extensions of the Degree/Discrepancy Theorem and of the pattern matrix method to the multiparty model. We start with some notation. Fix a function $\phi\colon\{0,1\}^n\to\mathbb{R}$ and an integer $N$ with $n\mid N$. Define the $(k,N,n,\phi)$-pattern tensor as the $k$-argument function $A\colon\{0,1\}^{n(N/n)^{k-1}}\times[N/n]^n\times\cdots\times[N/n]^n\to\mathbb{R}$ given by $A(x,V_1,\dots,V_{k-1})=\phi(x|_{V_1,\dots,V_{k-1}})$, where
\[ x|_{V_1,\dots,V_{k-1}}\ \overset{\mathrm{def}}{=}\ \big(x_{1,V_1[1],\dots,V_{k-1}[1]},\ \dots,\ x_{n,V_1[n],\dots,V_{k-1}[n]}\big)\in\{0,1\}^n \]
and $V_j[i]$ denotes the $i$th element of the $n$-dimensional vector $V_j$. (Note that we index the string $x$ by viewing it as a $k$-dimensional array of $n\times(N/n)\times\cdots\times(N/n)=n(N/n)^{k-1}$ bits.) This definition generalizes the author's pattern matrices if one ignores the $\oplus$ operator (Section 5.1).

We are ready for the first result of this section, namely, an extension of the Degree/Discrepancy Theorem (Theorem 3.2) to the multiparty model. This extension was originally obtained by Chattopadhyay [Cha07, Lem. 2] for slightly different tensors and has since been revisited in one form or another: [LS07, Thm. 19], [CA08, Lem. 4.2]. The proofs of these several versions are quite similar and are in close correspondence with the original two-party case.

Theorem 6.1 ([Cha07, LS07, CA08]).
Let $f\colon\{0,1\}^n\to\{0,1\}$ be given with threshold degree $d\ge 1$. Let $N$ be a given integer, $n\mid N$. Let $F$ be the $(k,N,n,f)$-pattern tensor. If $N\ge 4\mathrm{e}n^2(k-1)2^{2^{k-1}}/d$, then $\operatorname{disc}(F)\le 2^{-d/2^{k-1}}$.

Proof (adapted from [Cha07, LS07, CA08]). As in the proof of the Degree/Discrepancy Theorem, let $\mu$ be a probability distribution over $\{0,1\}^n$ with respect to which $\mathbf{E}_{z\sim\mu}[(-1)^{f(z)}p(z)]=0$ for every real-valued function $p$ of $d-1$ or fewer of the variables $z_1,\dots,z_n$. The existence of $\mu$ is assured by Theorem 3.3. We will analyze the discrepancy of $F$ with respect to the distribution
\[ \lambda(x,V_1,\dots,V_{k-1})=2^{-n(N/n)^{k-1}+n}\left(\frac Nn\right)^{-n(k-1)}\mu(x|_{V_1,\dots,V_{k-1}}). \]
Define $\psi\colon\{0,1\}^n\to\mathbb{R}$ by $\psi(z)=(-1)^{f(z)}\mu(z)$. By Theorem 2.2,
\[ \operatorname{disc}_\lambda(F)^{2^{k-1}}\le 2^{n2^{k-1}}\,\mathbf{E}_V\,|\Gamma(V)|, \tag{6.1} \]
where we put $V=(V_1^0,V_1^1,\dots,V_{k-1}^0,V_{k-1}^1)$ and
\[ \Gamma(V)=\mathbf{E}_x\Bigg[\underbrace{\psi\big(x|_{V_1^0,V_2^0,\dots,V_{k-1}^0}\big)}_{(\dagger)}\ \underbrace{\prod_{z\in\{0,1\}^{k-1}\setminus\{0^{k-1}\}}\psi\big(x|_{V_1^{z_1},V_2^{z_2},\dots,V_{k-1}^{z_{k-1}}}\big)}_{(\ddagger)}\Bigg]. \]
For a fixed choice of $V$, define sets
\begin{align*}
A&=\big\{(i,V_1^0[i],\dots,V_{k-1}^0[i]) : i=1,2,\dots,n\big\},\\
B&=\big\{(i,V_1^{z_1}[i],\dots,V_{k-1}^{z_{k-1}}[i]) : i=1,2,\dots,n;\ z\in\{0,1\}^{k-1}\setminus\{0^{k-1}\}\big\}.
\end{align*}
Clearly, $A$ and $B$ are the sets of variables featured in the expressions $(\dagger)$ and $(\ddagger)$ above, respectively. To analyze $\Gamma(V)$, we prove two key claims analogous to those in the Degree/Discrepancy Theorem.

Claim 6.2. Assume that $|A\cap B|\le d-1$. Then $\Gamma(V)=0$.

Proof. Immediate from the fact that the Fourier transform of $\psi$ is supported on characters of order $d$ and higher.

Claim 6.3. Assume that $|A\cap B|=i$. Then $|\Gamma(V)|\le 2^{i2^{k-1}-n2^{k-1}}$.

Proof. Observation 3.4 shows that $|\Gamma(V)|\le 2^{-n2^{k-1}}\,2^{n2^{k-1}-|A\cup B|}$.
Furthermore, it is straightforward to verify that $|A \cup B| \ge n2^{k-1} - |A \cap B| \, 2^{k-1}$.

In view of Claims 6.2 and 6.3, inequality (6.1) simplifies to

$$\operatorname{disc}_\lambda(F)^{2^{k-1}} \le \sum_{i=d}^{n} 2^{i2^{k-1}} \, \mathbf{P}[|A \cap B| = i].$$

It remains to bound $\mathbf{P}[|A \cap B| = i]$. For a fixed element $a$, we have $\mathbf{P}[a \in B \mid a \in A] \le (k-1)n/N$ by the union bound. Moreover, given two distinct elements $a, a' \in A$, the corresponding events $a \in B$ and $a' \in B$ are independent. Therefore,

$$\mathbf{P}[|A \cap B| = i] \le \binom{n}{i} \left( \frac{(k-1)n}{N} \right)^{i},$$

which yields the desired bound on $\operatorname{disc}_\lambda(F)$.

Remark 6.4. Recall from Section 5.1 that the two-party Degree/Discrepancy Theorem was considerably improved in [She07b] using matrix-analytic techniques. Those techniques, however, do not extend to the multiparty model. As a result, Theorem 6.1 that we have just presented does not subsume the improved Degree/Discrepancy Theorem (Theorem 5.9).

We now present an adaptation of the pattern matrix method (Theorem 5.7) to the multiparty model, obtained by Lee and Shraibman [LS07] and independently by Chattopadhyay and Ada [CA08]. The proof is closely analogous to the two-party case. However, the spectral calculations for pattern matrices do not extend to the multiparty model, and one is forced to fall back on the less precise calculations introduced in the Degree/Discrepancy Theorem (Theorem 3.2). In particular, the result we are about to present does not subsume the two-party pattern matrix method.

Theorem 6.5 ([LS07, CA08]). Let $f : \{0,1\}^n \to \{0,1\}$ be given with $\deg_{1/3}(f) = d \ge 1$. Let $N$ be a given integer, $n \mid N$. Let $F$ be the $(k, N, n, f)$-pattern tensor. If $N \ge 4\mathrm{e}n^2(k-1)2^{2^{k-1}}/d$, then $R^k(F) \ge \Omega(d/2^k)$.

Proof (adapted from [LS07, CA08]). Define $f^* : \{0,1\}^n \to \{-1,+1\}$ by $f^*(z) = (-1)^{f(z)}$. Then it is easy to verify that $\deg_{2/3}(f^*) = d$.
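The step just asserted can be spelled out as follows. This is only a sketch: it assumes the usual convention that $\deg_\epsilon(g)$ is the least degree of a real polynomial $p$ with $\|g - p\|_\infty \le \epsilon$, and the polynomial name $p$ is introduced here purely for illustration.

```latex
\begin{align*}
f^*(z) &= (-1)^{f(z)} = 1 - 2f(z)
  && \text{since } f(z) \in \{0,1\}, \\
\|f^* - (1 - 2p)\|_\infty &= 2\,\|f - p\|_\infty
  && \text{for every real polynomial } p, \\
\|f - p\|_\infty \le \tfrac{1}{3}
  &\iff \|f^* - (1 - 2p)\|_\infty \le \tfrac{2}{3}.
\end{align*}
```

Since $p \mapsto 1 - 2p$ is a bijection on real polynomials that preserves degree, the two approximation problems have the same minimum degree, which gives $\deg_{2/3}(f^*) = \deg_{1/3}(f) = d$.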
By Theorem 5.1, there is a function $\psi : \{0,1\}^n \to \mathbb{R}$ such that:

$$\hat{\psi}(S) = 0 \text{ for } |S| < d, \qquad \sum_{z \in \{0,1\}^n} |\psi(z)| = 1, \qquad \sum_{z \in \{0,1\}^n} \psi(z) f^*(z) > \frac{2}{3}. \tag{6.2}$$

Fix a function $h : \{0,1\}^n \to \{-1,+1\}$ and a distribution $\mu$ on $\{0,1\}^n$ such that $\psi(z) \equiv h(z)\mu(z)$. Let $H$ be the $(k, N, n, h)$-pattern tensor. Let $P$ be the $(k, N, n, 2^{-n(N/n)^{k-1} + n}(N/n)^{-n(k-1)}\mu)$-pattern tensor. Then $P$ is a probability distribution. By (6.2),

$$\langle H \circ P, F^* \rangle > \frac{2}{3}, \tag{6.3}$$

where $F^*$ is the $(k, N, n, f^*)$-pattern tensor. As we saw in the proof of Theorem 6.1,

$$\operatorname{disc}_P(H) \le 2^{-d/2^{k-1}}. \tag{6.4}$$

The theorem now follows by the generalized discrepancy method (Theorem 4.2) in view of (6.3) and (6.4).

The authors of [LS07] and [CA08] gave important applications of their work to the $k$-party randomized communication complexity of disjointness, improving it from $\Omega(\frac{1}{k} \log n)$ to $n^{\Omega(1/k)} 2^{-O(2^k)}$. As a corollary, they separated the multiparty communication classes $\mathsf{NP}^{cc}_k$ and $\mathsf{BPP}^{cc}_k$ for $k = (1 - o(1)) \log_2 \log_2 n$ parties. They also obtained new results for Lovász-Schrijver proof systems, in light of the work due to Beame, Pitassi, and Segerlind [BPS07].

7 Separation of $\mathsf{NP}^{cc}_k$ and $\mathsf{BPP}^{cc}_k$

We conclude this survey with a separation of $\mathsf{NP}^{cc}_k$ and $\mathsf{BPP}^{cc}_k$ for $k = (1 - \epsilon) \log_2 n$ parties, due to David and Pitassi [DP08]. This is an exponential improvement over the previous separation in [LS07, CA08]. The crucial insight in this new work is to redefine the projection operator $x|_{V_1,\ldots,V_{k-1}}$ from Section 6 using the probabilistic method. This removes the key bottleneck in the previous analyses [LS07, CA08]. Unlike the previous work, however, this new approach no longer applies to disjointness.

We start with some notation. Fix integers $n, m$ with $n > m$.
Let $\psi : \{0,1\}^m \to \mathbb{R}$ be a given function with $\sum_{z \in \{0,1\}^m} |\psi(z)| = 1$. Let $d$ denote the least order of a nonzero Fourier coefficient of $\psi$. Fix a Boolean function $h : \{0,1\}^m \to \{-1,+1\}$ and a distribution $\mu$ on $\{0,1\}^m$ such that $\psi(z) \equiv h(z)\mu(z)$. For a mapping $\alpha : (\{0,1\}^n)^k \to \binom{[n]}{m}$, define a $(k+1)$-party communication problem $H_\alpha : (\{0,1\}^n)^{k+1} \to \{-1,+1\}$ by $H_\alpha(x, y_1, \ldots, y_k) = h(x|_{\alpha(y_1,\ldots,y_k)})$. Analogously, define a distribution $\lambda_\alpha$ on $(\{0,1\}^n)^{k+1}$ by $\lambda_\alpha(x, y_1, \ldots, y_k) = 2^{-(k+1)n + m} \mu(x|_{\alpha(y_1,\ldots,y_k)})$.

Theorem 7.1 ([DP08]). Assume that $n \ge 16\mathrm{e}m^2 2^k$. Then for a uniformly random choice of $\alpha : (\{0,1\}^n)^k \to \binom{[n]}{m}$,

$$\mathbf{E}_\alpha \left[ \operatorname{disc}_{\lambda_\alpha}(H_\alpha)^{2^k} \right] \le 2^{-n/2} + 2^{-d2^k + 1}.$$

Proof (adapted from [DP08]). By Theorem 2.2,

$$\operatorname{disc}_{\lambda_\alpha}(H_\alpha)^{2^k} \le 2^{m2^k} \, \mathbf{E}_Y \, |\Gamma(Y)|, \tag{7.1}$$

where we put $Y = (y_1^0, y_1^1, \ldots, y_k^0, y_k^1)$ and

$$\Gamma(Y) = \mathbf{E}_x \left[ \prod_{z \in \{0,1\}^k} \psi\left( x|_{\alpha(y_1^{z_1}, y_2^{z_2}, \ldots, y_k^{z_k})} \right) \right].$$

For a fixed choice of $Y$, we will use the shorthand $S_z = \alpha(y_1^{z_1}, \ldots, y_k^{z_k})$. To analyze $\Gamma(Y)$, we prove two key claims analogous to those in the Degree/Discrepancy Theorem and in Theorem 6.1.

Claim 7.2. Assume that $|\bigcup S_z| > m2^k - d2^{k-1}$. Then $\Gamma(Y) = 0$.

Proof. If $|\bigcup S_z| > m2^k - d2^{k-1}$, then some $S_z$ must feature more than $m - d$ elements that do not occur in $\bigcup_{u \ne z} S_u$. But this forces $\Gamma(Y) = 0$ since the Fourier transform of $\psi$ is supported on characters of order $d$ and higher.

Claim 7.3. For every $Y$, $|\Gamma(Y)| \le 2^{-|\bigcup S_z|}$.

Proof. Immediate from Observation 3.4.

In view of (7.1) and Claims 7.2 and 7.3, we have

$$\mathbf{E}_\alpha \left[ \operatorname{disc}_{\lambda_\alpha}(H_\alpha)^{2^k} \right] \le \sum_{i = d2^{k-1}}^{m2^k - m} 2^i \, \mathbf{P}_{Y,\alpha} \left[ \Big| \bigcup S_z \Big| = m2^k - i \right].$$

It remains to bound the probabilities in the last expression.
With probability at least $1 - k2^{-n}$ over the choice of $Y$, the strings $y_1^0, y_1^1, \ldots, y_k^0, y_k^1$ will all be distinct. Conditioning on this event, the fact that $\alpha$ is chosen uniformly at random means that the $2^k$ sets $S_z$ are distributed independently and uniformly over $\binom{[n]}{m}$. A calculation now reveals that

$$\mathbf{P}_{Y,\alpha} \left[ \Big| \bigcup S_z \Big| = m2^k - i \right] \le k2^{-n} + \binom{m2^k}{i} \left( \frac{m2^k}{n} \right)^{i} \le k2^{-n} + 8^{-i}.$$

We are ready to present the separation of $\mathsf{NP}^{cc}_k$ and $\mathsf{BPP}^{cc}_k$.

Theorem 7.4 (Separation of $\mathsf{NP}^{cc}_k$ and $\mathsf{BPP}^{cc}_k$ [DP08]). Let $k \le (1 - \epsilon) \log_2 n$, where $\epsilon > 0$ is a given constant. Then there exists a function $F_\alpha : (\{0,1\}^n)^{k+1} \to \{-1,+1\}$ with $N^{k+1}(F_\alpha) = O(\log n)$ but $R^{k+1}(F_\alpha) = n^{\Omega(1)}$.

Proof (adapted from [DP08]). Let $m = \lfloor n^\zeta \rfloor$ for a sufficiently small constant $\zeta = \zeta(\epsilon) > 0$. As usual, define $\mathrm{OR}_m : \{0,1\}^m \to \{-1,+1\}$ by $\mathrm{OR}_m(z) = 1 \Leftrightarrow z = 0^m$. It is known [NS92, Pat92] that $\deg_{1/3}(\mathrm{OR}_m) = \Theta(\sqrt{m})$. As a result, Theorem 5.1 guarantees the existence of a function $\psi : \{0,1\}^m \to \mathbb{R}$ such that:

$$\hat{\psi}(S) = 0 \text{ for } |S| < \Theta(\sqrt{m}), \qquad \sum_{z \in \{0,1\}^m} |\psi(z)| = 1, \qquad \sum_{z \in \{0,1\}^m} \psi(z) \, \mathrm{OR}_m(z) > \frac{1}{3}.$$

Fix a function $h : \{0,1\}^m \to \{-1,+1\}$ and a distribution $\mu$ on $\{0,1\}^m$ such that $\psi(z) \equiv h(z)\mu(z)$. For a mapping $\alpha : (\{0,1\}^n)^k \to \binom{[n]}{m}$, let $H_\alpha$ and $\lambda_\alpha$ be as defined at the beginning of this section. Then Theorem 7.1 shows the existence of $\alpha$ such that

$$\operatorname{disc}_{\lambda_\alpha}(H_\alpha) \le 2^{-\Omega(\sqrt{m})}.$$

Using the properties of $\psi$, one readily verifies that $\langle H_\alpha \circ \lambda_\alpha, F_\alpha \rangle > 1/3$, where $F_\alpha : (\{0,1\}^n)^{k+1} \to \{-1,+1\}$ is given by $F_\alpha(x, y_1, \ldots, y_k) = \mathrm{OR}_m(x|_{\alpha(y_1,\ldots,y_k)})$. By the generalized discrepancy method (Theorem 4.2),

$$R^{k+1}(F_\alpha) \ge \Omega(\sqrt{m}) = n^{\Omega(1)}.$$

On the other hand, $F_\alpha$ has nondeterministic complexity $O(\log n)$. Namely, Player 1 (who knows $y_1, \ldots, y_k$) nondeterministically selects an element $i \in \alpha(y_1, \ldots, y_k)$ and announces $i$. Player 2 (who knows $x$) then announces $x_i$ as the output of the protocol.

A recent follow-up result due to David, Pitassi, and Viola [DPV08] derandomizes the choice of $\alpha$ in Theorem 7.4, yielding an explicit separation of $\mathsf{NP}^{cc}_k$ and $\mathsf{BPP}^{cc}_k$ for $k \le (1 - \epsilon) \log_2 n$.

Acknowledgments

I would like to thank Anil Ada, Boaz Barak, Arkadev Chattopadhyay, Adam Klivans, Troy Lee, Yaoyun Shi, and Ronald de Wolf for their helpful feedback on a preliminary version of this survey.

References

[ABFR94] James Aspnes, Richard Beigel, Merrick L. Furst, and Steven Rudich. The expressive power of voting polynomials. Combinatorica, 14(2):135–148, 1994.
[All89] Eric Allender. A note on the power of threshold circuits. In Proc. of the 30th Symposium on Foundations of Computer Science (FOCS), pages 580–584, 1989.
[AS04] Scott Aaronson and Yaoyun Shi. Quantum lower bounds for the collision and the element distinctness problems. J. ACM, 51(4):595–605, 2004.
[ASTS+03] Andris Ambainis, Leonard J. Schulman, Amnon Ta-Shma, Umesh V. Vazirani, and Avi Wigderson. The quantum communication complexity of sampling. SIAM J. Comput., 32(6):1570–1585, 2003.
[BBC+01] Robert Beals, Harry Buhrman, Richard Cleve, Michele Mosca, and Ronald de Wolf. Quantum lower bounds by polynomials. J. ACM, 48(4):778–797, 2001.
[BCWZ99] Harry Buhrman, Richard Cleve, Ronald de Wolf, and Christof Zalka. Bounds for small-error and zero-error quantum algorithms. In Proc. of the 40th Symposium on Foundations of Computer Science (FOCS), pages 358–368, 1999.
[BNS92] László Babai, Noam Nisan, and Mario Szegedy. Multiparty protocols, pseudorandom generators for logspace, and time-space trade-offs. J. Comput. Syst. Sci., 45(2):204–232, 1992.
[BPS07] Paul Beame, Toniann Pitassi, and Nathan Segerlind.
Lower bounds for Lovász-Schrijver systems and beyond follow from multiparty communication complexity. SIAM J. Comput., 37(3):845–869, 2007.
[BRS95] Richard Beigel, Nick Reingold, and Daniel A. Spielman. PP is closed under intersection. J. Comput. Syst. Sci., 50(2):191–202, 1995.
[BVW07] Harry Buhrman, Nikolai K. Vereshchagin, and Ronald de Wolf. On computation and communication with small bias. In Proc. of the 22nd Conf. on Computational Complexity (CCC), pages 24–32, 2007.
[BW01] Harry Buhrman and Ronald de Wolf. Communication complexity lower bounds by polynomials. In Proc. of the 16th Conf. on Computational Complexity (CCC), pages 120–130, 2001.
[CA08] Arkadev Chattopadhyay and Anil Ada. Multiparty communication complexity of disjointness. Preprint at arXiv:0801.3624v3, February 2008. Preliminary version in ECCC Report TR08-002, January 2008.
[Cha07] Arkadev Chattopadhyay. Discrepancy and the power of bottom fan-in in depth-three circuits. In Proc. of the 48th Symposium on Foundations of Computer Science (FOCS), pages 449–458, 2007.
[CT93] Fan R. K. Chung and Prasad Tetali. Communication complexity and quasi randomness. SIAM J. Discrete Math., 6(1):110–123, 1993.
[DP08] Matei David and Toniann Pitassi. Separating NOF communication complexity classes RP and NP. ECCC Report TR08-014, February 2008.
[DPV08] Matei David, Toniann Pitassi, and Emanuele Viola. Improved separations between nondeterministic and randomized multiparty communication. Preprint, April 2008.
[GHR92] Mikael Goldmann, Johan Håstad, and Alexander A. Razborov. Majority gates vs. general weighted threshold gates. Computational Complexity, 2:277–300, 1992.
[HJ86] Roger A. Horn and Charles R. Johnson. Matrix analysis. Cambridge University Press, New York, 1986.
[KKMS05] Adam Tauman Kalai, Adam R. Klivans, Yishay Mansour, and Rocco A. Servedio. Agnostically learning halfspaces.
In Proc. of the 46th Symposium on Foundations of Computer Science (FOCS), pages 11–20, 2005.
[Kla01] Hartmut Klauck. Lower bounds for quantum communication complexity. In Proc. of the 42nd Symposium on Foundations of Computer Science (FOCS), pages 288–297, 2001.
[KLS96] Jeff Kahn, Nathan Linial, and Alex Samorodnitsky. Inclusion-exclusion: Exact and approximate. Combinatorica, 16(4):465–477, 1996.
[KN97] Eyal Kushilevitz and Noam Nisan. Communication complexity. Cambridge University Press, New York, 1997.
[KOS04] Adam R. Klivans, Ryan O'Donnell, and Rocco A. Servedio. Learning intersections and thresholds of halfspaces. J. Comput. Syst. Sci., 68(4):808–840, 2004.
[KP97] Matthias Krause and Pavel Pudlák. On the computational power of depth-2 circuits with threshold and modulo gates. Theor. Comput. Sci., 174(1-2):137–156, 1997.
[KP98] Matthias Krause and Pavel Pudlák. Computing Boolean functions by polynomials and threshold circuits. Comput. Complex., 7(4):346–370, 1998.
[Kre95] I. Kremer. Quantum communication. Master's thesis, Hebrew University, Computer Science Department, 1995.
[KS92] Bala Kalyanasundaram and Georg Schnitger. The probabilistic communication complexity of set intersection. SIAM J. Discrete Math., 5(4):545–557, 1992.
[KS04] Adam R. Klivans and Rocco A. Servedio. Learning DNF in time $2^{\tilde{O}(n^{1/3})}$. J. Comput. Syst. Sci., 68(2):303–318, 2004.
[KS07a] Adam R. Klivans and Alexander A. Sherstov. A lower bound for agnostically learning disjunctions. In Proc. of the 20th Conf. on Learning Theory (COLT), pages 409–423, 2007.
[KS07b] Adam R. Klivans and Alexander A. Sherstov. Unconditional lower bounds for learning intersections of halfspaces. Machine Learning, 69(2–3):97–114, 2007.
[LS07] Troy Lee and Adi Shraibman. Disjointness is hard in the multi-party number-on-the-forehead model. Preprint at arXiv:0712.4279, December 2007. To appear in Proc. of the 23rd Conf. on Computational Complexity (CCC), 2008.
[LS07b] Nati Linial and Adi Shraibman. Lower bounds in communication complexity based on factorization norms. In Proc. of the 39th Symposium on Theory of Computing (STOC), pages 699–708, 2007.
[LSŠ08] Troy Lee, Adi Shraibman, and Robert Špalek. A direct product theorem for discrepancy. In Proc. of the 23rd Conf. on Computational Complexity (CCC), 2008. To appear.
[MP88] Marvin L. Minsky and Seymour A. Papert. Perceptrons: expanded edition. MIT Press, Cambridge, Mass., 1988.
[Nis93] Noam Nisan. The communication complexity of threshold gates. In Combinatorics, Paul Erdős is Eighty, pages 301–315, 1993.
[NS92] Noam Nisan and Mario Szegedy. On the degree of Boolean functions as real polynomials. In Proc. of the 24th Symposium on Theory of Computing (STOC), pages 462–467, 1992. Final version in Computational Complexity, 4:301–313, 1994.
[OS03] Ryan O'Donnell and Rocco A. Servedio. New degree bounds for polynomial threshold functions. In Proc. of the 35th Symposium on Theory of Computing (STOC), pages 325–334, 2003.
[Pat92] Ramamohan Paturi. On the degree of polynomials that approximate symmetric Boolean functions. In Proc. of the 24th Symposium on Theory of Computing (STOC), pages 468–474, 1992.
[Raz92] Alexander A. Razborov. On the distributional complexity of disjointness. Theor. Comput. Sci., 106(2):385–390, 1992.
[Raz00] Ran Raz. The BNS-Chung criterion for multi-party communication complexity. Computational Complexity, 9(2):113–122, 2000.
[Raz03] Alexander A. Razborov. Quantum communication complexity of symmetric predicates. Izvestiya: Mathematics, 67(1):145–159, 2003.
[RS08] Alexander A. Razborov and Alexander A. Sherstov. The sign-rank of $\mathsf{AC}^0$. ECCC Report TR08-016, February 2008.
[Sch98] Alexander Schrijver.
Theory of linear and integer programming. John Wiley & Sons, Inc., New York, 1998.
[She07a] Alexander A. Sherstov. Separating $\mathsf{AC}^0$ from depth-2 majority circuits. In Proc. of the 39th Symposium on Theory of Computing (STOC), pages 294–301, June 2007.
[She07b] Alexander A. Sherstov. The pattern matrix method for lower bounds on quantum communication. Technical Report TR-07-46, The Univ. of Texas at Austin, Dept. of Computer Sciences, September 2007. To appear in Proc. of the 40th Symposium on Theory of Computing (STOC), 2008.
[She07c] Alexander A. Sherstov. Unbounded-error communication complexity of symmetric functions. Technical Report TR-07-53, The Univ. of Texas at Austin, Dept. of Computer Sciences, September 2007.
[She08] Alexander A. Sherstov. Approximate inclusion-exclusion for arbitrary symmetric functions. In Proc. of the 23rd Conf. on Computational Complexity (CCC), 2008. To appear.
[SZ07] Yaoyun Shi and Yufan Zhu. Quantum communication complexity of block-composed functions. Preprint at arXiv:0710.0095v1, 29 September 2007.
[TT99] Jun Tarui and Tatsuie Tsukiji. Learning DNF by approximating inclusion-exclusion formulae. In Proc. of the 14th Conf. on Computational Complexity (CCC), pages 215–221, 1999.
[Wol01] Ronald de Wolf. Quantum Computing and Communication Complexity. PhD thesis, University of Amsterdam, 2001.
[Wol08] Ronald de Wolf. A note on quantum algorithms and the minimal degree of $\epsilon$-error polynomials for symmetric functions. Preprint at arXiv:0802.1816, February 2008.