Exact tensor completion with sum-of-squares


Authors: Aaron Potechin, David Steurer

June 27, 2017

Abstract

We obtain the first polynomial-time algorithm for exact tensor completion that improves over the bound implied by reduction to matrix completion. The algorithm recovers an unknown 3-tensor with $r$ incoherent, orthogonal components in $\mathbb{R}^n$ from $r \cdot \tilde{O}(n^{1.5})$ randomly observed entries of the tensor. This bound improves over the previous best one of $r \cdot \tilde{O}(n^2)$ by reduction to exact matrix completion. Our bound also matches the best known results for the easier problem of approximate tensor completion (Barak & Moitra, 2015). Our algorithm and analysis extend seminal results for exact matrix completion (Candès & Recht, 2009) to the tensor setting via the sum-of-squares method. The main technical challenge is to show that a small number of randomly chosen monomials are enough to construct a degree-3 polynomial with precisely planted orthogonal global optima over the sphere and that this fact can be certified within the sum-of-squares proof system.

Keywords: tensor completion, sum-of-squares method, semidefinite programming, exact recovery, matrix polynomials, matrix norm bounds

∗ Institute for Advanced Study. Supported by the Simons Collaboration for Algorithms and Geometry and by the NSF under agreement No. CCF-1412958. Part of this work was done while at Cornell University.
† Institute for Advanced Study and Cornell University, dsteurer@cs.cornell.edu. Supported by a Microsoft Research Fellowship, an Alfred P. Sloan Fellowship, NSF awards (CCF-1408673, CCF-1412958, CCF-1350196), and the Simons Collaboration for Algorithms and Geometry.

Contents

1 Introduction
  1.1 Results
2 Techniques
3 Preliminaries
4 Tensor completion algorithm
  4.1 Simpler proofs via higher-degree sum-of-squares
  4.2 Higher-degree certificates imply exact recovery
  4.3 Constructing the certificate T
  4.4 Degree-4 certificates imply exact recovery
  4.5 Degree-4 certificates exist with high probability
5 Matrix norm bound techniques
  5.1 The trace moment method
  5.2 Partitioning by intersection pattern
  5.3 Bounding sums of products of tensor entries
  5.4 Counting intersection patterns
6 Trace Power Calculation for $\bar{R}_\Omega A \otimes (\bar{R}_\Omega A)^T$
  6.1 Structure of $\mathrm{tr}((Y_j Y_j^T)^q)$
  6.2 Bounds on $\|Y_1\|$
  6.3 Bounds on $\|Y_2\|$ and $\|Y_3\|$
  6.4 Bounds on $\|Y_4\|$
7 Iterative tensor bounds
8 Trace Power Calculation for $P' \bar{R}_\Omega A \otimes (P' \bar{R}_\Omega A)^T$
References
A Controlling the kernel of matrix representations
B Full Trace Power Calculation
  B.1 Term Structure
  B.2 Techniques
  B.3 Bounding the number of indices
  B.4 Choosing an ordering
  B.5 Counting intersection patterns and random partitioning
  B.6 Other Cross Terms

1 Introduction

A basic task in machine learning and signal processing is to infer missing data from a small number of observations about the data. An important example is matrix completion, which asks to recover an unknown low-rank matrix from a small number of observed entries. This problem has many interesting applications; one of the prominent original motivations was the Netflix Prize, which sought improved algorithms for predicting user ratings for movies from a small number of user-provided ratings. After an extensive research effort [CR09, CT10, KMO09, SS05], efficient algorithms with almost optimal, provable recovery guarantees have been obtained: in order to efficiently recover an unknown incoherent $n$-by-$n$ matrix of rank $r$ it is enough to observe $r \cdot \tilde{O}(n)$ random entries of the matrix [Gro11, Rec11]. One of the remaining challenges is to obtain algorithms for the more general and much less understood tensor completion problem, where the observations do not just consist of pairwise correlations but also higher-order ones.

Algorithms and analyses for matrix and tensor completion come in three flavors:

1. algorithms analyzed by statistical learning tools like Rademacher complexity [SS05, BM16],
2. iterative algorithms like alternating minimization [JNS13, Har14, HW14],
3. algorithms analyzed by constructing dual certificates for convex programming relaxations [CR09, Gro11, Rec11].

While each of these flavors has different benefits, typically only algorithms of the third flavor achieve exact recovery.
(The only exceptions to this rule we are aware of are a recent fast algorithm for matrix completion [JN15] and a recent analysis [GLM16] showing that the commonly used non-convex objective function for positive semidefinite matrix completion has no spurious local minima, so that stochastic gradient descent and other popular optimization methods can solve positive semidefinite matrix completion with arbitrary initialization.) For all other algorithms, the analysis exhibits a trade-off between reconstruction error and the required number of observations (even when there is no noise in the input).¹

In this work, we obtain the first algorithm for exact tensor completion that improves over the bounds implied by reduction to exact matrix completion. The algorithm recovers an unknown 3-tensor with $r$ incoherent, orthogonal components in $\mathbb{R}^n$ from $r \cdot \tilde{O}(n^{1.5})$ randomly observed entries of the tensor. The previous best bound for exact recovery is $r \cdot \tilde{O}(n^2)$, which is implied by reduction to exact matrix completion. (The reduction views a 3-tensor on $\mathbb{R}^n$ as an $n$-by-$n^2$ matrix. We can recover rank-$r$ matrices of this shape from $r \cdot \tilde{O}(n^2)$ samples, which is best possible.) Our bound also matches the best known results for the easier problem of approximate tensor completion [JO14, BS15, BM16] (the results of the last work also apply to a wider range of tensors and do not require orthogonality).

¹ We remark that this trade-off is a property of the analysis and not necessarily the algorithm. For example, some algorithms of the first flavor are based on the same convex programming relaxations as exact recovery algorithms. Also, for iterative algorithms, the trade-off between reconstruction error and number of samples comes from the requirement of the analysis that each iteration uses fresh samples. For these iterative algorithms, the number of samples depends only logarithmically on the desired accuracy, which means that these analyses imply exact recovery if the bit complexity of the entries is small.

A problem similar to matrix and tensor completion is matrix and tensor sensing. The goal is to recover an unknown low-rank matrix or tensor from a small number of linear measurements. An interesting phenomenon is that for carefully designed measurements (which actually happen to be rank 1) it is possible to efficiently recover a 3-tensor of rank $r$ with just $O(r^2 \cdot n)$ measurements [FS12], which is better than the best bounds for tensor completion when $r \ll n^{0.5}$. We conjecture that for tensor completion from random entries the bound we obtain is best possible among polynomial-time algorithms, up to logarithmic factors.

Sum-of-squares method. Our algorithm is based on the sum-of-squares method [Sho87, Par00, Las01], a very general and powerful meta-algorithm studied extensively in many scientific communities (see for example the survey [BS14]). In theoretical computer science, the main research focus has been on the capabilities of sum-of-squares for approximation problems [BBH+12], especially in the context of Khot's Unique Games Conjecture [Kho02]. More recently, sum-of-squares emerged as a general approach to inference problems that arise in machine learning and have defied other algorithmic techniques. This approach has led to improved algorithms for tensor decomposition [BKS15, GM15, HSSS16, MSS16], dictionary learning [BKS15, HM16], tensor principal component analysis [HSS15, RRS16, BGL16], and planted sparse vectors [BKS14, HSSS16].
An exciting direction is also to understand limitations of sum-of-squares for inference problems on concrete input distributions [MW15, HSS15, BHK+16].

An appealing feature of the sum-of-squares method is that its capabilities and limitations can be understood through the lens of a simple but surprisingly powerful and intuitive restricted proof system, called the sum-of-squares or Positivstellensatz proof system [GV01, Gri01a, Gri01b]. A conceptual contribution of this work is to show that seminal results for inference problems like compressed sensing and matrix completion have natural interpretations as identifiability proofs in this system. Furthermore, we show that this interpretation is helpful in order to analyze more challenging inference problems like tensor completion. A promising future direction is to find more examples of inference problems where this lens on inference algorithms and identifiability proofs yields stronger provable guarantees.

A technical contribution of our work is that we develop techniques to show that sum-of-squares achieves exact recovery. Most previous works only showed that sum-of-squares gives approximate solutions, which in some cases can be turned into exact solutions by invoking algorithms with local convergence guarantees [GM15, BKS14] or by solving successive sum-of-squares relaxations [MSS16].

1.1 Results

We say that a vector $v \in \mathbb{R}^n$ is $\mu$-incoherent with respect to the coordinate basis $e_1, \ldots, e_n$ if for every index $i \in [n]$,

$\langle e_i, v \rangle^2 \le \frac{\mu}{n} \cdot \|v\|^2$.   (1.1)

We say that a 3-tensor $X \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$ is orthogonal of rank $r$ if there are orthogonal vectors $\{u_i\}_{i\in[r]} \subseteq \mathbb{R}^n$, $\{v_i\}_{i\in[r]} \subseteq \mathbb{R}^n$, $\{w_i\}_{i\in[r]} \subseteq \mathbb{R}^n$ such that $X = \sum_{i=1}^r u_i \otimes v_i \otimes w_i$. We say that such a 3-tensor $X$ is $\mu$-incoherent if all of the vectors $u_i, v_i, w_i$ are $\mu$-incoherent.

Theorem 1.1 (main).
There exists a polynomial-time algorithm that, given at least $r \cdot \mu^{O(1)} \cdot \tilde{O}(n^{1.5})$ random entries of an unknown orthogonal $\mu$-incoherent 3-tensor $X \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$ of rank $r$, outputs all entries of $X$ with probability at least $1 - n^{-\omega(1)}$.

We note that the analysis also shows that the algorithm is robust to an inverse polynomial amount of noise in the input (resulting in an inverse polynomial amount of error in the output). We remark that the running time of the algorithm depends polynomially on the bit complexity of $X$.

2 Techniques

Let $\{u_i\}_{i\in[r]}, \{v_i\}_{i\in[r]}, \{w_i\}_{i\in[r]}$ be three orthonormal sets in $\mathbb{R}^n$. Consider a 3-tensor $X \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$ of the form $X = \sum_{i=1}^r \lambda_i \cdot u_i \otimes v_i \otimes w_i$ with $\lambda_1, \ldots, \lambda_r > 0$. Let $\Omega \subseteq [n]^3$ be a subset of the entries of $X$. Our goal is to efficiently reconstruct the unknown tensor $X$ from its restriction $X_\Omega$ to the entries in $\Omega$.

Ignoring computational efficiency, we first ask if this task is information-theoretically possible. More concretely, for a given set of observations $X_\Omega$, how can we rule out that there exists another rank-$r$ orthogonal 3-tensor $X' \ne X$ that would give rise to the same observations $X'_\Omega = X_\Omega$?² A priori it is not clear how an answer to this information-theoretic question could be related to the goal of obtaining an efficient algorithm. However, it turns out that the sum-of-squares framework allows us to systematically translate a uniqueness proof into an algorithm that efficiently finds the solution. (In addition, this solution also comes with a short certificate for uniqueness.³)

Uniqueness proof. Let $\Omega \subseteq [n]^3$ be a set of entries and let $X = \sum_{i=1}^r \lambda_i \cdot u_i \otimes v_i \otimes w_i$ be a 3-tensor with $\lambda_1, \ldots, \lambda_r > 0$.
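To make the setup concrete, here is a small numerical sketch (not from the paper; the variable names and sizes are our own illustration) that builds a random orthogonal 3-tensor from orthonormal components, measures the incoherence parameter $\mu$ of Eq. (1.1), and samples an observed entry set $\Omega$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 20, 3

# Orthonormal component sets {u_i}, {v_i}, {w_i}: columns of Q-factors.
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
W = np.linalg.qr(rng.standard_normal((n, r)))[0]

def coherence(M):
    """Smallest mu such that every unit column v of M satisfies
    <e_i, v>^2 <= (mu / n) * ||v||^2, as in Eq. (1.1)."""
    return n * (M ** 2).max()

lam = rng.uniform(0.5, 1.5, size=r)             # positive weights lambda_i
X = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)  # X = sum_i lam_i u_i (x) v_i (x) w_i

# A uniformly random observed set Omega of m entries, stored as a boolean mask.
m = 500
Omega = np.zeros(n ** 3, dtype=bool)
Omega[rng.choice(n ** 3, size=m, replace=False)] = True
Omega = Omega.reshape(n, n, n)
X_Omega = np.where(Omega, X, 0.0)               # the observations
```

By orthonormality, contracting $X$ against $u_i \otimes v_i \otimes w_i$ recovers $\lambda_i$, which is a convenient consistency check on the construction.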
It turns out that the following two conditions are enough to imply that $X_\Omega$ uniquely determines $X$. The first condition is that the vectors $\{(u_i \otimes v_i \otimes w_i)_\Omega\}$ are linearly independent. The second condition is that there exists a 3-linear form $T$ on $\mathbb{R}^n$ with the following properties:

1. in the monomial basis, $T$ is supported on $\Omega$, so that $T(x, y, z) = \sum_{(i,j,k)\in\Omega} T_{ijk} \cdot x_i y_j z_k$,

2. evaluated over unit vectors, the 3-form $T$ is exactly maximized at the points $(u_i, v_i, w_i)$, so that $T(u_1, v_1, w_1) = \cdots = T(u_r, v_r, w_r) = 1$ and $T(x, y, z) < 1$ for all unit vectors $(x, y, z) \notin \{(u_i, v_i, w_i) \mid i \in [r]\}$.

We show that these two deterministic conditions are satisfied with high probability if the vectors $\{u_i\}, \{v_i\}, \{w_i\}$ are incoherent and $\Omega$ is a random set of entries of size at least $r \cdot \tilde{O}(n^{1.5})$.

Let us sketch the proof that such a 3-linear form $T$ indeed implies uniqueness. Concretely, we claim that if we let $X'$ be a 3-tensor of the form $\sum_{i=1}^{r'} \lambda'_i \cdot u'_i \otimes v'_i \otimes w'_i$ for $\lambda'_1, \ldots, \lambda'_{r'} > 0$ and unit vectors $\{u'_i\}, \{v'_i\}, \{w'_i\}$ with $X'_\Omega = X_\Omega$ that minimizes $\sum_{i=1}^{r'} |\lambda'_i|$, then $X' = X$ must hold. We identify $T$ with an element of $\mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$ (the coefficient tensor of $T$ in the monomial basis). Let $X'$ be as before. We are to show that $X = X'$.

² We emphasize that we ask here about the uniqueness of $X$ for a fixed set of entries $\Omega$. This question differs from asking about the uniqueness for a random set of entries, which could be answered by suitably counting the number of low-rank 3-tensors.
³ This certificate is closely related to certificates in the form of dual solutions for convex programming relaxations that are used in the compressed sensing and matrix completion literature.
On the one hand, using that $T(x, y, z) \le 1$ for all unit vectors $x, y, z$,

$\langle T, X' \rangle = \sum_{i=1}^{r'} \lambda'_i \cdot T(u'_i, v'_i, w'_i) \le \sum_{i=1}^{r'} \lambda'_i$.

At the same time, using that $T$ is supported on $\Omega$ and the fact that $X_\Omega = X'_\Omega$,

$\langle T, X' \rangle = \langle T, X \rangle = \sum_{i=1}^{r} \lambda_i \cdot T(u_i, v_i, w_i) = \sum_{i=1}^{r} \lambda_i$.

Since $X'$ minimizes $\sum_{i=1}^{r'} \lambda'_i$, equality has to hold in the previous inequality. It follows that every point $(u'_i, v'_i, w'_i)$ is equal to one of the points $(u_j, v_j, w_j)$, because $T$ is uniquely maximized at the points $\{(u_i, v_i, w_i) \mid i \in [r]\}$. Since we assumed that $\{(u_i \otimes v_i \otimes w_i)_\Omega\}$ is linearly independent, we can conclude that $X = X'$.

When we show that such a 3-linear form $T$ exists, we will actually show something stronger, namely that the second property is not only true but also has a short certificate in the form of a "degree-4 sum-of-squares proof", which we describe next. This certificate also enables us to efficiently recover the missing tensor entries.

Uniqueness proof in the sum-of-squares system. A degree-4 sos certificate for the second property of $T$ is an $(n + n^2)$-by-$(n + n^2)$ positive semidefinite matrix $M$ (acting as a linear operator on $\mathbb{R}^n \oplus (\mathbb{R}^n \otimes \mathbb{R}^n)$) that represents the polynomial $\|x\|^2 + \|y\|^2 \cdot \|z\|^2 - 2T(x, y, z)$, i.e.,

$\langle (x, y \otimes z), M(x, y \otimes z) \rangle = \|x\|^2 + \|y\|^2 \cdot \|z\|^2 - 2T(x, y, z)$.   (2.1)

Furthermore, we require that the kernel of $M$ is precisely the span of the vectors $\{(u_i, v_i \otimes w_i) \mid i \in [r]\}$. Let us see that this matrix $M$ certifies that $T$ has the property that over unit vectors it is exactly maximized at the desired points $(u_i, v_i, w_i)$. Let $u, v, w$ be unit vectors such that $(u, v, w)$ is not a multiple of one of the vectors $(u_i, v_i, w_i)$.
Then by orthogonality, both $(u, v \otimes w)$ and $(-u, v \otimes w)$ have non-zero projection onto the orthogonal complement of the kernel of $M$. Therefore, the bounds

$0 < \langle (u, v \otimes w), M(u, v \otimes w) \rangle = 2 - 2T(u, v, w)$

and

$0 < \langle (-u, v \otimes w), M(-u, v \otimes w) \rangle = 2 + 2T(u, v, w)$

together give the desired conclusion that $|T(u, v, w)| < 1$.

Reconstruction algorithm based on the sum-of-squares system. The existence of a positive semidefinite matrix $M$ as above not only means that reconstruction of $X$ from $X_\Omega$ is possible information-theoretically but also efficiently. The sum-of-squares algorithm allows us to efficiently search over low-degree moments of objects called pseudo-distributions that generalize probability distributions over real vector spaces. Every pseudo-distribution $\mu$ defines pseudo-expectation values $\tilde{\mathbb{E}}_\mu f$ for all low-degree polynomial functions $f(x, y, z)$, which behave in many ways like expectation values under an actual probability distribution. In order to reconstruct $X$ from the observations $X_\Omega$, we use the sum-of-squares algorithm to efficiently find a pseudo-distribution $\mu$ that satisfies⁴

$\tilde{\mathbb{E}}_{\mu(x,y,z)} \|x\|^2 + \|y\|^2 \cdot \|z\|^2 \le 1$,   (2.2)

$\big( \tilde{\mathbb{E}}_{\mu(x,y,z)}\, x \otimes y \otimes z \big)_\Omega = X_\Omega$.   (2.3)

Note that the distribution over the vectors $(u_i, v_i, w_i)$ with probabilities $\lambda_i$ satisfies the above conditions. Our previous discussion about uniqueness shows that the existence of a positive semidefinite matrix $M$ as above implies that no other distribution satisfies the above conditions. It turns out that the matrix $M$ implies that this uniqueness holds even among pseudo-distributions, in the sense that any pseudo-distribution that satisfies Eqs. (2.2) and (2.3) must satisfy $\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x \otimes y \otimes z = X$, which means that the reconstruction is successful.⁵

When do such uniqueness certificates exist?
The above discussion shows that in order to achieve reconstruction it is enough to show that uniqueness certificates of the form above exist. We show that these certificates exist with high probability if we choose $\Omega$ to be a large enough random subset of entries (under suitable assumptions on $X$). Our existence proof is based on a randomized procedure to construct such a certificate, heavily inspired by similar constructions for matrix completion [Gro11, Rec11]. (We note that this construction uses the unknown tensor $X$ and is therefore not "constructive" in the context of the recovery problem.)

Before describing the construction, we make the requirements on the 3-linear form $T$ more concrete. We identify $T$ with the linear operator from $\mathbb{R}^n \otimes \mathbb{R}^n$ to $\mathbb{R}^n$ such that $T(x, y, z) = \langle x, T(y \otimes z) \rangle$. Furthermore, let $T_a$ be linear operators on $\mathbb{R}^n$ such that $T(x, y, z) = \sum_{a=1}^n x_a \cdot \langle y, T_a z \rangle$. Then, the following conditions on $T$ imply the existence of a uniqueness certificate $M$ (which also means that recovery succeeds):

1. every unknown entry $(i, j, k) \notin \Omega$ satisfies $\langle e_i, T(e_j \otimes e_k) \rangle = 0$,

2. every index $i \in [r]$ satisfies $u_i = T(v_i \otimes w_i)$,

3. the matrix $\sum_{a=1}^n T_a \otimes T_a^\intercal - \sum_{i=1}^r (v_i \otimes w_i)(v_i \otimes w_i)^\intercal$ has spectral norm at most $0.01$.

We note that the uniqueness certificates for matrix completion [Gro11, Rec11] have similar requirements.

⁴ The viewpoint in terms of pseudo-distributions is useful to see how the previous uniqueness proof relates to the algorithm. We can also describe the solutions to the constraints Eqs. (2.2) and (2.3) in terms of linearly constrained positive semidefinite matrices. See the alternative description of Algorithm 4.1.
⁵ The matrix $M$ can also be viewed as a solution to the dual of the convex optimization problem of finding a pseudo-distribution that satisfies conditions Eqs. (2.2) and (2.3).
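As a sanity check on Eq. (2.1), the fully observed case $\Omega = [n]^3$ can be worked out explicitly. Taking $T = \sum_i u_i \otimes v_i \otimes w_i$ and flattening it to an $n$-by-$n^2$ matrix, the block matrix $M = \begin{pmatrix} I & -T_{\mathrm{flat}} \\ -T_{\mathrm{flat}}^\intercal & I \end{pmatrix}$ represents $\|x\|^2 + \|y\|^2\cdot\|z\|^2 - 2T(x,y,z)$, is positive semidefinite, and vanishes on the planted directions $(u_i, v_i \otimes w_i)$. This is only an illustration of the trivial fully observed case (the variable names are our own); constructing $M$ for a sparse $\Omega$ is the actual technical content of the paper.

```python
import numpy as np

rng = np.random.default_rng(1)
n, r = 6, 2
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
W = np.linalg.qr(rng.standard_normal((n, r)))[0]

# Flatten T = sum_i u_i (x) v_i (x) w_i into an n-by-n^2 matrix Tflat,
# so that T(x, y, z) = x^T Tflat (y kron z).
Tflat = sum(np.outer(U[:, i], np.kron(V[:, i], W[:, i])) for i in range(r))

# Quadratic form <(x, y kron z), M (x, y kron z)>
#   = ||x||^2 + ||y||^2 ||z||^2 - 2 T(x, y, z)     (Eq. (2.1))
M = np.block([[np.eye(n), -Tflat],
              [-Tflat.T, np.eye(n * n)]])

eigs = np.linalg.eigvalsh(M)   # PSD: ||Tflat|| = 1 for orthonormal components
kernel_vec = np.concatenate([U[:, 0], np.kron(V[:, 0], W[:, 0])])

# Check the representation identity (2.1) at a random point.
x, y, z = rng.standard_normal((3, n))
point = np.concatenate([x, np.kron(y, z)])
lhs = point @ M @ point
rhs = x @ x + (y @ y) * (z @ z) - 2 * (x @ Tflat @ np.kron(y, z))
```

Since exactly $r$ singular values of $T_{\mathrm{flat}}$ equal $1$ here, the kernel of $M$ is exactly the span of the $(u_i, v_i \otimes w_i)$; controlling the kernel in the partially observed case is the subject of Appendix A.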
The key difference is that we need to control the spectral norm of an operator that depends quadratically on the constructed object $T$ (as opposed to a linear dependence in the matrix completion case). Combined with the fact that the construction of $T$ is iterative (about $\log n$ steps), the spectral norm bound unfortunately requires significant technical work. In particular, we cannot apply general matrix concentration inequalities and instead apply the trace moment method. (See Section 5.)

We also note that the fact that the above requirements allow us to construct the certificate $M$ is not immediate and requires some new ideas about matrix representations of polynomials, which might be useful elsewhere. (See Appendix A.) Finally, we note that the transformation applied to $T$ in order to obtain the matrix for the third condition above appears in many works about 3-tensors [HSS15, BM16], with the earliest appearance in a work on refutation algorithms for random 3-SAT instances (see [FO07]).

The iterative construction of the linear operator $T$ exactly follows the recipe from matrix completion [Gro11, Rec11]. Let $R_\Omega$ be the projection operator onto the linear space of operators $T$ that satisfy the first requirement. Let $P_T$ be the (affine) projection operator onto the affine linear space of operators $T$ that satisfy the second requirement. We start with $T^{(0)} = X$. At this point we satisfy the second condition. (Also, the matrix in the third condition is $0$.) In order to enforce the first condition we apply the operator $R_\Omega$. After this projection, the second condition is most likely no longer satisfied. To enforce the second condition, we apply the affine linear operator $P_T$ and obtain $T^{(1)} = P_T(R_\Omega X)$.
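The first projection step can be sketched numerically (an illustration with our own variable names, not the paper's full construction; in particular we omit the operator $P_T$, whose implementation is the nontrivial part): applying $R_\Omega$ to $T^{(0)} = X$ enforces the first requirement but visibly breaks the second one.

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 30, 2
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
W = np.linalg.qr(rng.standard_normal((n, r)))[0]
X = np.einsum('ai,bi,ci->abc', U, V, W)   # lambda_i = 1 for simplicity

# Random observed set Omega covering roughly 30% of the n^3 entries.
mask = np.zeros(n ** 3, dtype=bool)
mask[rng.choice(n ** 3, size=8000, replace=False)] = True
mask = mask.reshape(n, n, n)

def R_Omega(T):
    """Projection onto 3-tensors supported on Omega (first requirement)."""
    return np.where(mask, T, 0.0)

T0 = X              # satisfies the second requirement: T0(v_i kron w_i) = u_i
T1 = R_Omega(T0)    # now supported on Omega ...
# ... but the contractions of T1 against v_i kron w_i drift away from u_i:
contr = np.einsum('abc,bi,ci->ai', T1, V, W)
drift = max(np.linalg.norm(contr[:, i] - U[:, i]) for i in range(r))
```

In the actual construction, the affine projection $P_T$ would now restore the second requirement, and the two steps are alternated for about $\log n$ rounds with fresh randomness $\Omega$ in each round.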
The idea is to iterate this construction and show that after a logarithmic number of iterations both the first and second condition are satisfied up to an inverse polynomially small error (which we can correct in a direct way). The main challenge is to show that the iterates obtained in this way satisfy the desired spectral norm bound. (We note that for technical reasons the construction uses fresh randomness $\Omega$ for each iteration, as in the matrix completion case [Rec11, Gro11]. Since the number of iterations is logarithmic, the total number of required observations remains the same up to a logarithmic factor.)

3 Preliminaries

Unless explicitly stated otherwise, $O(\cdot)$-notation hides absolute multiplicative constants. Concretely, every occurrence of $O(x)$ is a placeholder for some function $f(x)$ that satisfies $\forall x \in \mathbb{R}.\ |f(x)| \le C|x|$ for some absolute constant $C > 0$. Similarly, $\Omega(x)$ is a placeholder for a function $g(x)$ that satisfies $\forall x \in \mathbb{R}.\ |g(x)| \ge |x|/C$ for some absolute constant $C > 0$.

Our algorithm is based on a generalization of probability distributions over $\mathbb{R}^n$. To define this generalization, we use the following notation for the formal expectation of a function $f$ on $\mathbb{R}^n$ with respect to a finitely supported function $\mu \colon \mathbb{R}^n \to \mathbb{R}$:

$\tilde{\mathbb{E}}_\mu f = \sum_{x \in \mathrm{support}(\mu)} \mu(x) \cdot f(x)$.

A degree-$d$ pseudo-distribution over $\mathbb{R}^n$ is a finitely supported function $\mu \colon \mathbb{R}^n \to \mathbb{R}$ such that $\tilde{\mathbb{E}}_\mu 1 = 1$ and $\tilde{\mathbb{E}}_\mu f^2 \ge 0$ for every polynomial $f$ of degree at most $d/2$.

A key algorithmic property of pseudo-distributions is that their low-degree moments have an efficient separation oracle. Concretely, the set of degree-$d$ moments $\tilde{\mathbb{E}}_\mu (1, x)^{\otimes d}$ such that $\mu$ is a degree-$d$ pseudo-distribution over $\mathbb{R}^n$ has an $n^{O(d)}$-time separation oracle.
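The positivity condition $\tilde{\mathbb{E}}_\mu f^2 \ge 0$ for all $f$ of degree at most $d/2$ is equivalent to positive semidefiniteness of the moment matrix $(\tilde{\mathbb{E}}_\mu\, x^\alpha x^\beta)$ indexed by monomials of degree at most $d/2$. A minimal numerical check (our own code, not from the paper) that an actual finitely supported probability distribution is in particular a degree-4 pseudo-distribution:

```python
import numpy as np
from itertools import combinations_with_replacement

def moment_matrix(points, weights, d):
    """Moment matrix (E_mu x^alpha x^beta) over monomials of degree <= d/2.
    mu is a degree-d pseudo-distribution iff the weights sum to 1
    and this matrix is positive semidefinite."""
    n = points.shape[1]
    monos = [m for k in range(d // 2 + 1)
             for m in combinations_with_replacement(range(n), k)]
    # One row per support point: monomial evaluations (1, x_1, ..., x^alpha).
    vecs = np.array([[np.prod([x[i] for i in m]) for m in monos]
                     for x in points])
    return sum(w * np.outer(v, v) for w, v in zip(weights, vecs))

# An actual probability distribution on R^2 (nonnegative weights summing
# to 1) is a degree-d pseudo-distribution for every d.
pts = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 2.0]])
wts = np.array([0.5, 0.3, 0.2])
M4 = moment_matrix(pts, wts, d=4)
```

Degree-$d$ pseudo-distributions that are not actual distributions (e.g. with some negative weights) can also pass this test; that relaxation is exactly the set the algorithm searches over.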
Therefore, standard convex optimization methods allow us to efficiently optimize linear functions over low-degree moments of pseudo-distributions (even subject to additional convex constraints that have efficient separation oracles) up to arbitrary numerical accuracy.

4 Tensor completion algorithm

In this section, we show that the following algorithm for tensor completion succeeds in recovering the unknown tensor from partial observations assuming the existence of a particular linear operator $T$. We will state conditions on the unknown tensor that imply that such a linear operator exists with high probability if the observed entries are chosen at random. We use essentially the same convex relaxation as in [BM16], but our analysis differs significantly.

Algorithm 4.1 (Exact tensor completion based on degree-4 sum-of-squares).

Input: locations $\Omega \subseteq [n]^3$ and partial observations $X_\Omega$ of an unknown 3-tensor $X \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$.

Operation: Find a degree-4 pseudo-distribution $\mu$ on $\mathbb{R}^n \oplus \mathbb{R}^n \oplus \mathbb{R}^n$ such that the third moment matches the observations, $\big(\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x \otimes y \otimes z\big)_\Omega = X_\Omega$, so as to minimize $\tilde{\mathbb{E}}_{\mu(x,y,z)} \|x\|^2 + \|y\|^2 \cdot \|z\|^2$. Output the 3-tensor $\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x \otimes y \otimes z \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$.

Alternative description: Output a minimum-trace, positive semidefinite matrix $Y$ acting on $\mathbb{R}^n \oplus (\mathbb{R}^n \otimes \mathbb{R}^n)$ with blocks $Y_{1,1}$, $Y_{1,2}$, and $Y_{2,2}$ such that $(Y_{1,2})_\Omega = X_\Omega$ matches the observations, and $Y_{2,2}$ satisfies the additional symmetry constraints that each entry $\langle e_j \otimes e_k, Y_{2,2}(e_{j'} \otimes e_{k'}) \rangle$ only depends on the index sets $\{j, j'\}$, $\{k, k'\}$.

Let $\{u_i\}, \{v_i\}, \{w_i\}$ be three orthonormal sets in $\mathbb{R}^n$, each of cardinality $r$. We reason about the recovery guarantees of the algorithm in terms of the following notion of certificate.

Definition 4.2.
We say that a linear operator $T$ from $\mathbb{R}^n \otimes \mathbb{R}^n$ to $\mathbb{R}^n$ is a degree-4 certificate for $\Omega$ and orthonormal sets $\{u_i\}, \{v_i\}, \{w_i\} \subseteq \mathbb{R}^n$ if the following conditions are satisfied:

1. the vectors $\{(u_i \otimes v_j \otimes w_k)_\Omega \mid (i, j, k) \in S\}$ are linearly independent, where $S \subseteq [n]^3$ is the set of triples with at least two identical indices from $[r]$,

2. every entry $(a, b, c) \notin \Omega$ satisfies $\langle e_a, T(e_b \otimes e_c) \rangle = 0$,

3. if we view $T$ as a 3-tensor in $(\mathbb{R}^n)^{\otimes 3}$ whose $(a, b, c)$ entry is $\langle e_a, T(e_b \otimes e_c) \rangle$, every index $i \in [r]$ satisfies $(u_i^\intercal \otimes v_i^\intercal \otimes \mathrm{Id}) T = w_i$, $(u_i^\intercal \otimes \mathrm{Id} \otimes w_i^\intercal) T = v_i$, and $(\mathrm{Id} \otimes v_i^\intercal \otimes w_i^\intercal) T = u_i$,

4. the following matrix has spectral norm at most $0.01$:

$\sum_{a=1}^n T_a \otimes T_a^\intercal - \sum_{i=1}^r (v_i \otimes w_i)(v_i \otimes w_i)^\intercal$,

where $\{T_a\}$ are matrices such that $\langle x, T(y \otimes z) \rangle = \sum_{a=1}^n x_a \cdot \langle y, T_a z \rangle$.

In Section 4.4, we prove that the existence of such certificates implies that the above algorithm successfully recovers the unknown tensor, as formalized by the following theorem.

Theorem 4.3. Let $X \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$ be any 3-tensor of the form $\sum_{i=1}^r \lambda_i \cdot u_i \otimes v_i \otimes w_i$ for $\lambda_1, \ldots, \lambda_r \in \mathbb{R}_+$. Let $\Omega \subseteq [n]^3$ be a subset of indices. Suppose there exists a degree-4 certificate in the sense of Definition 4.2. Then, given the observations $X_\Omega$, the above algorithm recovers the unknown tensor $X$ exactly.

In Section 4.5, we show that degree-4 certificates are likely to exist when $\Omega$ is a random set of appropriate size.

Theorem 4.4. Let $\{u_i\}, \{v_i\}, \{w_i\}$ be three orthonormal sets of $\mu$-incoherent vectors in $\mathbb{R}^n$, each of cardinality $r$. Let $\Omega \subseteq [n]^3$ be a random set of tensor entries of cardinality $m = r \cdot n^{1.5} (\mu \log n)^C$ for an absolute constant $C \ge 1$. Then, with probability $1 - n^{-\omega(1)}$, there exists a linear operator $T$ that satisfies the requirements of Definition 4.2.
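The alternative SDP description of Algorithm 4.1 can be sanity-checked against the intended solution (an illustration with our own variable names, not a proof): the matrix $Y = \sum_i \lambda_i (u_i, v_i \otimes w_i)(u_i, v_i \otimes w_i)^\intercal$ assembled from the true decomposition is positive semidefinite, its off-diagonal block $Y_{1,2}$ reproduces the tensor $X$ (hence matches the observations on any $\Omega$), its $Y_{2,2}$ block satisfies the symmetry constraints, and its trace is $2\sum_i \lambda_i$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, r = 5, 2
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
W = np.linalg.qr(rng.standard_normal((n, r)))[0]
lam = np.array([1.0, 0.5])

# Y = sum_i lam_i (u_i, v_i kron w_i)(u_i, v_i kron w_i)^T on R^n + R^(n^2).
cols = [np.concatenate([U[:, i], np.kron(V[:, i], W[:, i])]) for i in range(r)]
Y = sum(l * np.outer(c, c) for l, c in zip(lam, cols))

X = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)
Y12 = Y[:n, n:].reshape(n, n, n)      # block Y_{1,2}, reshaped as a 3-tensor
Y22 = Y[n:, n:].reshape(n, n, n, n)   # block Y_{2,2}, indices (j, k, j', k')
```

Feasibility of this $Y$ is only half the story: Theorem 4.3 establishes that, given a degree-4 certificate, it is also the unique minimum-trace solution.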
Taken together, the two theorems above imply our main result, Theorem 1.1.

4.1 Simpler proofs via higher-degree sum-of-squares

Unfortunately, the proof of Theorem 4.4 requires extremely technical spectral norm bounds for random matrices. It turns out that less technical norm bounds suffice if we use degree-6 sum-of-squares relaxations. For this more powerful algorithm, weaker certificates are enough to ensure exact recovery, and the proof that these weaker certificates exist with high probability is considerably easier than the proof that degree-4 certificates exist with high probability. In the following we describe this weaker notion of certificates and state their properties. In the subsequent sections we prove that the properties of these certificates are enough to imply our main result, Theorem 1.1.

Algorithm 4.5 (Exact tensor completion based on higher-degree sum-of-squares).

Input: locations $\Omega \subseteq [n]^3$ and partial observations $X_\Omega$ of an unknown 3-tensor $X \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$.

Operation: Find a degree-6 pseudo-distribution $\mu$ on $\mathbb{R}^n \oplus \mathbb{R}^n \oplus \mathbb{R}^n$ so as to minimize $\tilde{\mathbb{E}}_{\mu(x,y,z)} \|x\|^2 + \|z\|^2$ subject to the following constraints:

$\big(\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x \otimes y \otimes z\big)_\Omega = X_\Omega$,   (4.1)

$\tilde{\mathbb{E}}_{\mu(x,y,z)} (\|y\|^2 - 1) \cdot p(x, y, z) = 0$ for all $p(x, y, z) \in \mathbb{R}[x, y, z]_{\le 4}$.   (4.2)

Output the 3-tensor $\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x \otimes y \otimes z \in \mathbb{R}^n \otimes \mathbb{R}^n \otimes \mathbb{R}^n$.

Let $\{u_i\}, \{v_i\}, \{w_i\}$ be three orthonormal sets in $\mathbb{R}^n$, each of cardinality $r$. We reason about the recovery guarantees of the above algorithm in terms of the following notion of certificate. The main difference to degree-4 certificates (Definition 4.2) is that the spectral norm condition is replaced by a condition in terms of sum-of-squares representations.

Definition 4.6.
We say that a 3-tensor $T \in (\mathbb{R}^n)^{\otimes 3}$ is a higher-degree certificate for $\Omega$ and orthonormal sets $\{u_i\}, \{v_i\}, \{w_i\} \subseteq \mathbb{R}^n$ if the following conditions are satisfied:

1. the vectors $\{(u_i \otimes v_i \otimes w_i)_\Omega\}_{i \in [r]}$ are linearly independent,

2. every entry $(a, b, c) \notin \Omega$ satisfies $\langle T, (e_a \otimes e_b \otimes e_c) \rangle = 0$,

3. every index $i \in [r]$ satisfies $(u_i^\intercal \otimes v_i^\intercal \otimes \mathrm{Id}) T = w_i$, $(u_i^\intercal \otimes \mathrm{Id} \otimes w_i^\intercal) T = v_i$, and $(\mathrm{Id} \otimes v_i^\intercal \otimes w_i^\intercal) T = u_i$,

4. the following degree-4 polynomials in $\mathbb{R}[x, y, z]$ are sums of squares:

$\|x\|^2 + \|y\|^2 \cdot \|z\|^2 - 1/\varepsilon \cdot \langle T', x \otimes y \otimes z \rangle$,   (4.3)
$\|y\|^2 + \|x\|^2 \cdot \|z\|^2 - 1/\varepsilon \cdot \langle T', x \otimes y \otimes z \rangle$,   (4.4)
$\|z\|^2 + \|x\|^2 \cdot \|y\|^2 - 1/\varepsilon \cdot \langle T', x \otimes y \otimes z \rangle$,   (4.5)

where $T' = T - \sum_{i=1}^r u_i \otimes v_i \otimes w_i$ and $\varepsilon > 0$ is an absolute constant (say $\varepsilon = 10^{-6}$).

In the following sections we prove that higher-degree certificates imply that Algorithm 4.5 successfully recovers the desired tensor and that they exist with high probability for random $\Omega$ of appropriate size.

4.2 Higher-degree certificates imply exact recovery

Let $\{u_i\}, \{v_i\}, \{w_i\}$ be orthonormal bases in $\mathbb{R}^n$. We say that a degree-$\ell$ pseudo-distribution $\mu(x, y, z)$ satisfies the constraint $\|y\|^2 = 1$, denoted $\mu \models \{\|y\|^2 = 1\}$, if

$\tilde{\mathbb{E}}_{\mu(x,y,z)}\, p(x, y, z) \cdot (1 - \|y\|^2) = 0$ for all polynomials $p \in \mathbb{R}[x, y, z]_{\le \ell - 2}$.

We are to show that a higher-degree certificate in the sense of Definition 4.6 implies that Algorithm 4.5 reconstructs the partially observed tensor exactly. A key step of this proof is the following lemma about expectation values of higher-degree pseudo-distributions.

Lemma 4.7. Let $T \in (\mathbb{R}^n)^{\otimes 3}$ be a higher-degree certificate as in Definition 4.6 for the set $\Omega \subseteq [n]^3$ and the vectors $\{u_i\}_{i \in [r]}, \{v_i\}_{i \in [r]}, \{w_i\}_{i \in [r]}$.
Then, every degree-6 pseudo-distribution $\mu(x,y,z)$ with $\mu \models \{\|y\|^2 = 1\}$ satisfies
$\tilde{\mathbb{E}}_{\mu(x,y,z)}\, T(x,y,z) \le \tilde{\mathbb{E}}_{\mu(x,y,z)} \Bigl[\tfrac{\|x\|^2 + \|z\|^2}{2} - \tfrac{1}{100}\sum_{i=r+1}^n \bigl(\langle u_i,x\rangle^2 + \langle w_i,z\rangle^2\bigr) - \tfrac{1}{100}\sum_{i=1}^n \sum_{j\in[n]\setminus\{i\}} \langle v_i,y\rangle^2\cdot\bigl(\langle u_j,x\rangle^2 + \langle w_j,z\rangle^2\bigr)\Bigr].$   (4.6)

To prove this lemma it will be useful to introduce the sum-of-squares proof system. Before doing that, let us observe that the lemma indeed allows us to prove that Algorithm 4.5 works.

Theorem 4.8 (Higher-degree certificates imply exact recovery). Suppose there exists a higher-degree certificate $T$ in the sense of Definition 4.6 for the set $\Omega \subseteq [n]^3$ and the vectors $\{u_i\}_{i\in[r]}, \{v_i\}_{i\in[r]}, \{w_i\}_{i\in[r]}$. Then, Algorithm 4.5 recovers the partially observed tensor exactly. In other words, if $X = \sum_{i=1}^r \lambda_i\cdot u_i\otimes v_i\otimes w_i$ with $\lambda_1,\ldots,\lambda_r > 0$ and $\mu(x,y,z)$ is a degree-6 pseudo-distribution with $\mu \models \{\|y\|^2=1\}$ that minimizes $\tilde{\mathbb{E}}_{\mu(x,y,z)}\, \tfrac12(\|x\|^2+\|z\|^2)$ subject to $(\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x\otimes y\otimes z)_\Omega = X_\Omega$, then $\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x\otimes y\otimes z = X$.

Proof. Consider the distribution $\mu^*$ over vectors $(x,y,z)$ such that $(\sqrt{\lambda_i n}\cdot u_i,\; v_i,\; \sqrt{\lambda_i n}\cdot w_i)$ has probability $1/n$. By construction, $\mathbb{E}_{\mu^*(x,y,z)}\, x\otimes y\otimes z = X$. We have
$\tilde{\mathbb{E}}_{\mu(x,y,z)}\, T(x,y,z) = \mathbb{E}_{\mu^*(x,y,z)}\, T(x,y,z) = \sum_{i=1}^r \lambda_i = \mathbb{E}_{\mu^*(x,y,z)}\, \tfrac12(\|x\|^2+\|z\|^2).$
By Lemma 4.7 and the optimality of $\mu$, it follows that
$\tilde{\mathbb{E}}_{\mu(x,y,z)} \Bigl[\tfrac{1}{100}\sum_{i=r+1}^n \bigl(\langle u_i,x\rangle^2 + \langle w_i,z\rangle^2\bigr) + \tfrac{1}{100}\sum_{i=1}^n\sum_{j\in[n]\setminus\{i\}} \langle v_i,y\rangle^2\bigl(\langle u_j,x\rangle^2+\langle w_j,z\rangle^2\bigr)\Bigr] = 0.$
Since the summands on the left-hand side are squares, it follows that each summand has pseudo-expectation 0.
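As a quick numerical sanity check of the feasible distribution $\mu^*$ constructed above (a sketch with synthetic random orthonormal components; the sizes $n, r$ are hypothetical, and the remaining $1 - r/n$ probability mass can sit on a point with $x = z = 0$, which affects neither quantity computed below):

```python
import numpy as np

rng = np.random.default_rng(2)
n, r = 6, 3
lam = rng.random(r) + 0.1
U, V, W = (np.linalg.qr(rng.standard_normal((n, n)))[0][:, :r] for _ in range(3))

# X = sum_i lam_i u_i ⊗ v_i ⊗ w_i
X = np.einsum('i,ai,bi,ci->abc', lam, U, V, W)

# mu*: the atom (sqrt(lam_i n) u_i, v_i, sqrt(lam_i n) w_i) carries probability 1/n.
moment = np.zeros((n, n, n))
objective = 0.0
for i in range(r):
    x = np.sqrt(lam[i] * n) * U[:, i]
    y = V[:, i]
    z = np.sqrt(lam[i] * n) * W[:, i]
    moment += np.einsum('a,b,c->abc', x, y, z) / n
    objective += (x @ x + z @ z) / 2 / n

assert np.allclose(moment, X)            # E_{mu*} x ⊗ y ⊗ z = X
assert np.isclose(objective, lam.sum())  # E_{mu*} (1/2)(||x||^2 + ||z||^2) = sum_i lam_i
```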
It follows that $\tilde{\mathbb{E}}_\mu \langle u_i,x\rangle^2 = \tilde{\mathbb{E}}_\mu \langle v_i,y\rangle^2 = \tilde{\mathbb{E}}_\mu \langle w_i,z\rangle^2 = 0$ for all $i > r$, and $\tilde{\mathbb{E}}_\mu \langle v_i,y\rangle^2\langle u_j,x\rangle^2 = \tilde{\mathbb{E}}_\mu \langle v_i,y\rangle^2\langle w_j,z\rangle^2 = 0$ for all $i \ne j$. By the Cauchy–Schwarz inequality for pseudo-expectations, it follows that $\tilde{\mathbb{E}}_{\mu(x,y,z)} \langle x\otimes y\otimes z,\; u_i\otimes v_j\otimes w_k\rangle = 0$ unless $i = j = k \in [r]$. Consequently, $\tilde{\mathbb{E}}_{\mu(x,y,z)}\, x\otimes y\otimes z$ is a linear combination of the vectors $\{u_i\otimes v_i\otimes w_i \mid i\in[r]\}$. Finally, the linear independence of the vectors $\{(u_i\otimes v_i\otimes w_i)_\Omega \mid i\in[r]\}$ implies that $\tilde{\mathbb{E}}_\mu\, x\otimes y\otimes z = X$, as desired. ∎

It remains to prove Lemma 4.7. Here it is convenient to use formal notation for sum-of-squares proofs. We will work with polynomials $\mathbb{R}[x,y,z]$ and the polynomial equation $\mathcal{A} = \{\|y\|^2 = 1\}$. For $p \in \mathbb{R}[x,y,z]$, we say that there exists a degree-$\ell$ SOS proof that $\mathcal{A}$ implies $p \ge 0$, denoted $\mathcal{A} \vdash_\ell p \ge 0$, if there exists a polynomial $q \in \mathbb{R}[x,y,z]$ of degree at most $\ell-2$ such that $p + q\cdot(1-\|y\|^2)$ is a sum of squares of polynomials. This notion of proof allows us to reason about pseudo-distributions. In particular, if $\mathcal{A} \vdash_\ell p \ge 0$, then every degree-$\ell$ pseudo-distribution $\mu$ with $\mu \models \mathcal{A}$ satisfies $\tilde{\mathbb{E}}_\mu\, p \ge 0$.

We will change coordinates such that $u_i = v_i = w_i = e_i$ is the $i$-th coordinate vector for every $i \in [n]$. Then, the conditions on $T$ in Definition 4.6 imply that
$\langle T,\, x\otimes y\otimes z\rangle = \sum_{i=1}^r x_i y_i z_i + T'(x,y,z),$   (4.7)
where $T'$ is a 3-linear form with the property that $T'(x,x,x)$ does not contain squares (i.e., is multilinear). Furthermore, the conditions imply the following SOS proofs for $T'$:
1. $\emptyset \vdash_4 T'(x,y,z) \le \varepsilon\cdot\bigl(\|x\|^2 + \|y\|^2\cdot\|z\|^2\bigr)$,
2. $\emptyset \vdash_4 T'(x,y,z) \le \varepsilon\cdot\bigl(\|y\|^2 + \|x\|^2\cdot\|z\|^2\bigr)$,
3. $\emptyset \vdash_4 T'(x,y,z) \le \varepsilon\cdot\bigl(\|z\|^2 + \|x\|^2\cdot\|y\|^2\bigr)$.
The following lemma gives an upper bound on the first part in Eq. (4.7).

Lemma 4.9.
F or A  { k y k 2  1 } , t he following inequality has a degree-6 sum-of-squar es p roo f, A ⊢ 6 r Õ i  1 x i y i z i 6 1 2 k x k 2 + 1 2 k z k 2 − 1 4 n Õ i  r + 1 ( x 2 i + z 2 i ) − 1 8 Õ i , j y 2 i ·  x 2 j + z 2 j + y 2 j · ( k x k 2 + k z k 2 )  . (4.8) Proo f. W e bound the left-hand side in the lemma as follo ws, A ⊢ 6 r Õ i  1 x i y i z i 6 r Õ i  1 ( 1 2 x 2 i + 1 2 y 2 i z 2 i ) (4.9) 6 1 2 | | x | | 2 − 1 2 Õ i > r x 2 i + 1 2 n Õ i  1 y 2 i z 2 i . (4.10) W e can further bound Í i y 2 i z 2 i as follo ws, A ⊢ 6 n Õ i  1 y 2 i z 2 i  n Õ i  1 y 2 i ! · n Õ i  1 z 2 i ! − Õ i , j y 2 i · z 2 j (4.11)  n Õ i  1 z 2 i ! − Õ i , j y 2 i · z 2 j . (4.12) W e can pro v e a different bound on Í i y 2 i z 2 i as follo ws, A ⊢ 6 n Õ i  1 y 2 i z 2 i 6 1 2 k z k 2 + 1 2 n Õ i  1 y 4 i z 2 i (4.13) 11 6 1 2 k z k 2 + 1 2 n Õ i  1 y 4 i k z k 2 (4.14)  1 2 k z k 2 + 1 2 n Õ i  1 y 2 i ! · n Õ i  1 y 2 i k z k 2 ! − 1 2 Õ i , j y 2 i · y 2 j k z k 2 (4.15)  k z k 2 − 1 2 Õ i , j y 2 i · y 2 j k z k 2 . (4.16) By combining t hese three ineq ua lities, w e obtain t he inequality A ⊢ 6 r Õ i  1 x i y i z i 6 1 2 k x k 2 + 1 2 k z k 2 − 1 2 Õ i > r x 2 i − 1 2 Õ i , j y 2 i · z 2 j − 1 4 Õ i , j y 2 i · y 2 j k z k 2 . By symmetry betw een z and x , the same inequality holds with x and z exchang ed. Com- bining these symmetric inequalities, w e obtain the desired inequality A ⊢ 6 r Õ i  1 x i y i z i 6 1 2 k x k 2 + 1 2 k z k 2 − 1 4 Õ i > r ( x 2 i + z 2 i ) − 1 4 Õ i , j y 2 i · ( x 2 j + z 2 j ) − 1 8 Õ i , j y 2 i · y 2 j ( k x k 2 + k z k 2 ) . (4.17)  It remains t o bound the second part in Eq. (4. 7 ) , which t he follo wing le mma a chiev e s. Lemma 4.10. A ⊢ 6 T ′ ( x , y , z ) 6 3 ε 2 Í i Í j , i y 2 i  x 2 j + z 2 j + 1 2 y 2 j ( | | x | | 2 + | | z | | 2 )  Proo f. 
It is enough to show the following inequality for all $i \in [n]$,
$\mathcal{A} \vdash_6\; y_i^2\, T'(x,y,z) \le \varepsilon \sum_{j\ne i}\Bigl(\tfrac32\, y_i^2\,(x_j^2+z_j^2) + \tfrac12\, y_i^2 y_j^2\,(\|x\|^2+\|z\|^2)\Bigr).$
By symmetry it suffices to consider the case $i = 1$. Let $x' = x - x_1\cdot e_1$, $y' = y - y_1\cdot e_1$, and $z' = z - z_1\cdot e_1$. We observe that
$\mathcal{A} \vdash_4\; T'(x,y,z) = T'(x_1 e_1 + x',\, y_1 e_1 + y',\, z_1 e_1 + z') = T'(x_1 e_1, y', z') + T'(x', y_1 e_1, z') + T'(x', y', z_1 e_1) + T'(x', y', z').$
We now apply the following inequalities:
1. $\mathcal{A} \vdash_4\; T'(x_1 e_1, y', z') \le \frac{\varepsilon}{2}\bigl(x_1^2\|y'\|^2 + \|z'\|^2\bigr) \le \frac{\varepsilon}{2}\sum_{j\ne 1}\bigl(y_j^2\|x\|^2 + z_j^2\bigr)$
2. $\mathcal{A} \vdash_4\; T'(x', y_1 e_1, z') \le \frac{\varepsilon}{2}\bigl(\|x'\|^2 y_1^2 + \|z'\|^2\bigr) \le \frac{\varepsilon}{2}\sum_{j\ne 1}\bigl(x_j^2 + z_j^2\bigr)$
3. $\mathcal{A} \vdash_4\; T'(x', y', z_1 e_1) \le \frac{\varepsilon}{2}\bigl(z_1^2\|y'\|^2 + \|x'\|^2\bigr) \le \frac{\varepsilon}{2}\sum_{j\ne 1}\bigl(y_j^2\|z\|^2 + x_j^2\bigr)$
4. $\mathcal{A} \vdash_4\; T'(x', y', z') \le \frac{\varepsilon}{2}\bigl(\|x'\|^2\|y'\|^2 + \|z'\|^2\bigr) \le \frac{\varepsilon}{2}\sum_{j\ne 1}\bigl(x_j^2 + z_j^2\bigr)$
∎

We can now prove Lemma 4.7.

Proof of Lemma 4.7. Taken together, Lemmas 4.9 and 4.10 imply
$\mathcal{A} \vdash_6\; \langle T,\, x\otimes y\otimes z\rangle \le \tfrac12\|x\|^2 + \tfrac12\|z\|^2 - \bigl(\tfrac14 - O(\varepsilon)\bigr)\sum_{i=r+1}^n (x_i^2+z_i^2) - \bigl(\tfrac18 - O(\varepsilon)\bigr)\sum_{i\ne j} y_i^2\cdot\bigl(x_j^2+z_j^2+y_j^2\cdot(\|x\|^2+\|z\|^2)\bigr),$   (4.18)
where the absolute constant hidden by the $O(\cdot)$ notation is at most 10.
Therefore, for $\varepsilon < 1/100$ as we assumed in Definition 4.6, we get an SOS proof of the inequality,
$\mathcal{A} \vdash_6\; \langle T,\, x\otimes y\otimes z\rangle \le \tfrac12\|x\|^2 + \tfrac12\|z\|^2 - \tfrac18\sum_{i=r+1}^n (x_i^2+z_i^2) - \tfrac1{16}\sum_{i\ne j} y_i^2\cdot\bigl(x_j^2+z_j^2+y_j^2\cdot(\|x\|^2+\|z\|^2)\bigr).$   (4.19)
This SOS proof implies that every degree-6 pseudo-distribution $\mu(x,y,z)$ with $\mu \models \mathcal{A}$ satisfies the desired inequality,
$\tilde{\mathbb{E}}_{\mu(x,y,z)} \langle T,\, x\otimes y\otimes z\rangle \le \tilde{\mathbb{E}}_{\mu(x,y,z)}\Bigl[\tfrac12\|x\|^2 + \tfrac12\|z\|^2 - \tfrac18\sum_{i=r+1}^n (x_i^2+z_i^2) - \tfrac1{16}\sum_{i\ne j} y_i^2\cdot\bigl(x_j^2+z_j^2+y_j^2\cdot(\|x\|^2+\|z\|^2)\bigr)\Bigr].$   (4.20)
∎

4.3 Constructing the certificate T

In this section we give a procedure for constructing the certificate $T$. This construction is directly inspired by the construction of the dual certificate in [Gro11, Rec11] (sometimes called quantum golfing). We will then prove that $T$ satisfies all of the conditions for a higher-degree certificate for $\Omega$. In Section 4.5 we will show that $T$ also satisfies the conditions for a degree-4 certificate for $\Omega$.

Let $\{u_i\}, \{v_i\}, \{w_i\} \subseteq \mathbb{R}^n$ be three orthonormal bases, with all vectors $\mu$-incoherent. Let $X = \sum_{i=1}^r u_i\otimes v_i\otimes w_i$. Let $\Omega \subseteq [n]^3$ be chosen at random such that each element is included independently with probability $m/n^3$ (so that $|\Omega|$ is tightly concentrated around $m$). Let $P$ be the projector onto the span of the vectors $u_i\otimes v_j\otimes w_k$ such that an index in $[r]$ appears at least twice in $(i,j,k)$ (i.e., at least one of the conditions $i=j\in[r]$, $i=k\in[r]$, $j=k\in[r]$ is satisfied). Let $R_\Omega$ be the linear operator on $\mathbb{R}^n\otimes\mathbb{R}^n\otimes\mathbb{R}^n$ that sets all entries outside of $\Omega$ to 0 (so that $(R_\Omega[T])_\Omega = R_\Omega[T]$) and is scaled such that $\mathbb{E}_\Omega\, R_\Omega = \mathrm{Id}$. Let $\bar{R}_\Omega$ be $R_\Omega - \mathrm{Id}$. Our goal is to construct $T \in \mathbb{R}^n\otimes\mathbb{R}^n\otimes\mathbb{R}^n$ such that $P[T] = X$, $(T)_\Omega = T$, and the spectral norm condition in Definition 4.2 is satisfied.
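The normalization $\mathbb{E}_\Omega\, R_\Omega = \mathrm{Id}$ can be checked by simulation (a sketch with hypothetical tiny sizes; the tolerance is a loose Monte Carlo margin):

```python
import numpy as np

rng = np.random.default_rng(3)
n, m = 4, 20                 # hypothetical tiny sizes
p = m / n**3                 # per-entry inclusion probability
T = rng.standard_normal((n, n, n))

# R_Omega keeps the entries inside Omega, rescaled by n^3/m = 1/p, so that E[R_Omega] = Id.
N = 40000
Om = rng.random((N, n, n, n)) < p
avg = np.where(Om, T / p, 0.0).mean(axis=0)   # Monte Carlo estimate of E[R_Omega[T]]

assert np.abs(avg - T).max() < 0.3
```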
The idea for constructing $T$ is to start with $T = X$. Then, move to the closest point $T'$ that satisfies $R_\Omega[T'] = T'$. Then, move to the closest point $T''$ that satisfies $P[T''] = X$, and repeat. To implement this strategy, we define
$T^{(k)} = \sum_{j=0}^{k-1} (-1)^j\, R_{\Omega_{j+1}} (P\bar{R}_{\Omega_j})\cdots(P\bar{R}_{\Omega_1})[X],$   (4.21)
where $\Omega_1,\ldots,\Omega_k$ are i.i.d. samples from the same distribution as $\Omega$. By induction, we can show the following lemma about linear constraints that the constructed tensors $T^{(k)}$ satisfy.

Lemma 4.11. For every $k \ge 1$, the tensor $T^{(k)}$ satisfies $(T^{(k)})_\Omega = T^{(k)}$ and
$P[T^{(k)}] + (-1)^k\, P(P\bar{R}_{\Omega_k})\cdots(P\bar{R}_{\Omega_1})[X] = X.$

Here, $(-1)^k\, P(P\bar{R}_{\Omega_k})\cdots(P\bar{R}_{\Omega_1})[X]$ is an error term that decreases geometrically. In the parameter regime of Theorem 4.4, the norm of this term is $n^{-\omega(1)}$ for some $k = (\log n)^{O(1)}$. The following lemma shows that it is possible to correct such small errors. This lemma also implies that the linear independence condition in Definition 4.2 is satisfied with high probability. (Therefore, we can ignore this condition in the following.)

Lemma 4.12. Suppose $m \ge rn\mu\cdot(\log n)^{O(1)}$. Then, with probability $1 - n^{-\omega(1)}$ over the choice of $\Omega$, the following holds: for every $E \in \mathbb{R}^n\otimes\mathbb{R}^n\otimes\mathbb{R}^n$ with $P[E] = E$, there exists $Y$ with $(Y)_\Omega = Y$ such that $P[Y] = E$ and $\|Y\|_F \le O(1)\cdot\|E\|_F$.

Proof. Let $S \subseteq [n]^3$ be such that $P$ is the projector onto the span of the vectors $u_i\otimes v_j\otimes w_k$ with $(i,j,k) \in S$. By construction of $P$ we have $|S| \le 3rn$. In order to show the conclusion of the lemma, it is enough to show that the vectors $(u_i\otimes v_j\otimes w_k)_\Omega$ with $(i,j,k) \in S$ are well-conditioned, in the sense that the ratio of the largest and smallest singular values is $O(1)$. This fact follows from standard matrix concentration inequalities; see Lemma 4.14. ∎
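The iteration above can be sketched in a few lines (working in the coordinates where $u_i = v_i = w_i = e_i$; the sizes and inclusion probability are hypothetical). The identity of Lemma 4.11 and the support constraint hold exactly for every draw of the $\Omega_j$:

```python
import numpy as np

rng = np.random.default_rng(0)
n, r, k = 4, 2, 5
p = 0.5                              # per-entry inclusion probability (m / n^3)

# Coordinates chosen so that u_i = v_i = w_i = e_i; then X = sum_{i<r} e_i ⊗ e_i ⊗ e_i.
X = np.zeros((n, n, n))
for i in range(r):
    X[i, i, i] = 1.0

# P: coordinate projector onto triples (i, j, k) in which an index in [r] repeats.
i_, j_, k_ = np.indices((n, n, n))
S = ((i_ == j_) & (i_ < r)) | ((i_ == k_) & (i_ < r)) | ((j_ == k_) & (j_ < r))
P = lambda A: np.where(S, A, 0.0)
R = lambda Om, A: np.where(Om, A / p, 0.0)   # rescaled restriction, E[R_Omega] = Id
Rbar = lambda Om, A: R(Om, A) - A            # Rbar_Omega = R_Omega - Id

Omegas = [rng.random((n, n, n)) < p for _ in range(k)]

# T^(k) = sum_{j=0}^{k-1} (-1)^j R_{Omega_{j+1}} (P Rbar_{Omega_j}) ... (P Rbar_{Omega_1})[X]
T, cur = np.zeros((n, n, n)), X
for j in range(k):
    T += (-1) ** j * R(Omegas[j], cur)
    cur = P(Rbar(Omegas[j], cur))

# Lemma 4.11: P[T^(k)] + (-1)^k P (P Rbar_{Omega_k}) ... (P Rbar_{Omega_1})[X] = X, exactly.
assert np.allclose(P(T) + (-1) ** k * P(cur), X)

# T^(k) is supported on the union of the sampled sets.
support = np.any(Omegas, axis=0)
assert np.allclose(np.where(support, T, 0.0), T)
```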
The main technical challenge is to show that the construction satisfies the condition that the following degree-4 polynomials are sums of squares (where $T' = T - X$):
$\|x\|^2 + \|y\|^2\cdot\|z\|^2 - (1/\varepsilon)\cdot\langle T',\, x\otimes y\otimes z\rangle$,   (4.22)
$\|y\|^2 + \|x\|^2\cdot\|z\|^2 - (1/\varepsilon)\cdot\langle T',\, x\otimes y\otimes z\rangle$,   (4.23)
$\|z\|^2 + \|x\|^2\cdot\|y\|^2 - (1/\varepsilon)\cdot\langle T',\, x\otimes y\otimes z\rangle$.   (4.24)
We show how to prove the first statement; the other statements can be proved with symmetrical arguments. To prove the first statement, we decompose $T'$ into pieces of the form $(\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$, $P'(\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$ (where $P'$ is a part of $P$), or $E$. For each piece $A$, we prove a norm bound $\|\sum_a A_a \otimes A_a^\top\| \le B$. Since $\sum_a A_a \otimes A_a^\top$ represents the same polynomial as $A^\top A$, this proves that $B\|y\|^2\|z\|^2 - (y\otimes z)^\top A^\top A\, (y\otimes z)$ is a degree-4 sum of squares. Now note that
$(y\otimes z)^\top A^\top A\, (y\otimes z) - \sqrt{B}\, x^\top A(y\otimes z) - \sqrt{B}\, (y\otimes z)^\top A^\top x + B\|x\|^2$
is also a sum of squares. Combining these equations and scaling, we have that $\|x\|^2 + \|y\|^2\|z\|^2 - \frac{2}{\sqrt{B}}\, x^\top A(y\otimes z)$ is a degree-4 sum of squares. Thus, it is sufficient to prove norm bounds on $\|\sum_a A_a\otimes A_a^\top\|$. We have an appropriate bound in the case when $A = E$ because $E$ has very small Frobenius norm. For the cases when $A = (\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$ or $A = P'(\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$, we use the following theorem.

Theorem 4.13. Let $A = (\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$ or $P'(\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$, where $P'$ is a part of $P$. There is an absolute constant $C$ such that for any $\alpha \ge 1$ and $\beta > 0$,
$\Pr\Bigl[\bigl\|\textstyle\sum_a A_a\otimes A_a^\top\bigr\| \ge \alpha^{-(l+1)}\Bigr] < n^{-\beta}$
as long as $m \ge C\alpha\beta\,\mu^{3/2}\, r\, n^{1.5}\log(n)$ and $m \ge C\alpha\beta\,\mu^2\, r\, n\log(n)$.

Proof.
This theorem follows directly from combining Proposition 5.10, Theorem 6.1, Theorem 7.1, and Theorem 8.1. ∎

4.3.1 Final correction of error terms

In this section, we prove a spectral norm bound that allows us to correct the error terms that are left at the end of the construction. The proof uses the by now standard matrix Bernstein concentration inequality. Similar proofs appear in the matrix completion literature [Gro11, Rec11].

Let $\{u_i\}, \{v_i\}, \{w_i\} \subseteq \mathbb{R}^n$ be three orthonormal bases, with all vectors $\mu$-incoherent. Let $\Omega \subseteq [n]^3$ be $m$ entries sampled uniformly at random with replacement. (This sampling model is different from the one used in the rest of the proof. However, it is well known that the models are equivalent in terms of the final recovery problem.)

Lemma 4.14. Let $S \subseteq [n]^3$. Suppose $m = \mu|S|(\log n)^C$ for an absolute constant $C \ge 1$. Then with probability $1 - n^{-\omega(1)}$ over the choice of $\Omega$, the vectors $(u_i\otimes v_j\otimes w_k)_\Omega$ for $(i,j,k) \in S$ are well-conditioned, in the sense that the ratio between the largest and smallest singular values is at most 1.1.

Proof. For $s = (i,j,k) \in S$, let $y_s = u_i\otimes v_j\otimes w_k$. Let $\Omega = \{\omega_1,\ldots,\omega_m\}$, where $\omega_1,\ldots,\omega_m \in [n]^3$ are sampled uniformly at random with replacement. Let $A$ be the $S$-by-$S$ Gram matrix of the vectors $(y_s)_\Omega$. Then, $A$ is the sum of $m$ identically distributed rank-1 matrices $A_i$,
$A = \sum_{i=1}^m A_i$ with $(A_i)_{s,s'} = (y_s)_{\omega_i}\cdot(y_{s'})_{\omega_i}.$
Each $A_i$ has expectation $\mathbb{E}\, A_i = n^{-3}\,\mathrm{Id}$ and spectral norm at most $|S|\cdot\mu^3/n^3$. Standard matrix concentration inequalities [Tro12] show that $m \ge O(|S|\mu^2\log n)$ is enough to ensure that the sum is spectrally close to its expectation $(m/n^3)\,\mathrm{Id}$, in the sense that $0.99\, A \preceq (m/n^3)\,\mathrm{Id} \preceq 1.1\, A$. ∎

4.4 Degree-4 certificates imply exact recovery

In this section we prove Theorem 4.3.
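The proof below leans repeatedly on the fact that distinct matrices can represent the same polynomial. A small numerical illustration (a generic tensor $A$ with hypothetical sizes, standing in for the pieces of $T'$ from Section 4.3): the Gram matrix $A^\top A$ and $\sum_a A_a\otimes A_a^\top$ agree on all vectors of the form $y\otimes z$ yet differ as matrices, and any $B \ge \|\sum_a A_a\otimes A_a^\top\|$ certifies nonnegativity of the scaled polynomial from Section 4.3:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 5
A = rng.standard_normal((n, n, n))   # slices A_a = A[a, :, :]

Amat = A.reshape(n, n * n)           # rows indexed by a, columns by pairs (b, c)
G = Amat.T @ Amat                                            # Gram representation A^T A
M = np.einsum('abq,apc->bcpq', A, A).reshape(n * n, n * n)   # sum_a A_a ⊗ A_a^T representation

y, z = rng.standard_normal(n), rng.standard_normal(n)
v = np.kron(y, z)
assert np.isclose(v @ G @ v, v @ M @ v)   # equal on vectors of the form y ⊗ z ...
assert not np.allclose(G, M)              # ... but different as matrices

# B >= ||M|| makes ||x||^2 + ||y||^2 ||z||^2 - (2/sqrt(B)) x^T A (y ⊗ z) nonnegative.
B = np.linalg.norm(M, 2)
for _ in range(100):
    x, y, z = (rng.standard_normal(n) for _ in range(3))
    v = np.kron(y, z)
    assert x @ x + (y @ y) * (z @ z) - (2 / np.sqrt(B)) * (x @ Amat @ v) >= -1e-9
```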
We need the following technical lemma, which we prove in Appendix A.

Lemma 4.15. Let $R$ be a self-adjoint linear operator on $\mathbb{R}^n\otimes\mathbb{R}^n$. Suppose $\langle v_j\otimes w_k,\, R(v_i\otimes w_i)\rangle = 0$ for all indices $i,j,k \in [r]$ such that $i \in \{j,k\}$. Then, there exists a self-adjoint linear operator $R'$ on $\mathbb{R}^n\otimes\mathbb{R}^n$ such that $R'(v_i\otimes w_i) = 0$ for all $i \in [r]$, the spectral norm of $R'$ satisfies $\|R'\| \le 10\|R\|$, and $R'$ represents the same polynomial in $\mathbb{R}[y,z]$,
$\langle y\otimes z,\, R'(y\otimes z)\rangle = \langle y\otimes z,\, R(y\otimes z)\rangle.$

We can now prove that certificates in the sense of Definition 4.2 imply that our algorithm successfully recovers the unknown tensor.

Proof of Theorem 4.3. Let $T$ be a certificate in the sense of Definition 4.2. Our goal is to construct a positive semidefinite matrix $M$ on $\mathbb{R}^n \oplus (\mathbb{R}^n\otimes\mathbb{R}^n)$ that represents the following polynomial,
$\langle (x,\, y\otimes z),\, M(x,\, y\otimes z)\rangle = \|x\|^2 + \|y\|^2\cdot\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle.$
Let $T_a$ be matrices such that $\langle x,\, T(y\otimes z)\rangle = \sum_a x_a\cdot T_a(y,z)$. Since
$\|x\|^2 + \sum_{a=1}^n T_a(y,z)^2 - 2\langle x,\, T(y\otimes z)\rangle = \|x - T(y\otimes z)\|^2$
is a sum of squares of polynomials, it will be enough to find a positive semidefinite matrix that represents the polynomial $\|y\|^2\cdot\|z\|^2 - \sum_{a=1}^n T_a(y,z)^2$. (This step is a polynomial version of the Schur complement condition for positive semidefiniteness.)

Let $R$ be the following linear operator,
$R = \sum_{a=1}^n T_a \otimes T_a^\top - \sum_{i=1}^r (v_i\otimes w_i)(v_i\otimes w_i)^\top.$

Lemma 4.16. $R$ satisfies the requirement of Lemma 4.15.

Proof. Consider $\langle v_j\otimes w_k,\, R(v_j\otimes w_j)\rangle$. Since $v_j$ is repeated, the value of this expression will be the same if we replace $R$ by any $R_2$ which represents the same polynomial.
Thus, we can replace $R$ by
$R_2 = T^\top T - \sum_{i=1}^r (v_i\otimes w_i)(v_i\otimes w_i)^\top.$
We now observe that
$\langle v_j\otimes w_k,\, R_2(v_j\otimes w_j)\rangle = \langle v_j\otimes w_k,\; T^\top(u_j) - (v_j\otimes w_j)\rangle = 0.$
By a symmetrical proof, $\langle v_j\otimes w_k,\, R(v_k\otimes w_k)\rangle = 0$ as well. ∎

By Lemma 4.15, there exists a self-adjoint linear operator $R'$ that represents the same polynomial as $R$, has spectral norm $\|R'\| \le 10\|R\| \le 0.1$, and sends all vectors $v_i\otimes w_i$ to 0. Since $R'$ sends all vectors $v_i\otimes w_i$ to 0 and $\|R'\| \le 0.1$, the following matrix
$R'' = \sum_{i=1}^r (v_i\otimes w_i)(v_i\otimes w_i)^\top + R'$
has $r$ eigenvalues of value 1 (corresponding to the space spanned by the $v_i\otimes w_i$), and all other eigenvalues are at most 0.1 (because the nonzero eigenvalues of $R'$ have eigenvectors orthogonal to all $v_i\otimes w_i$). At the same time, $R''$ represents the following polynomial,
$\langle y\otimes z,\, R''(y\otimes z)\rangle = \sum_{a=1}^n T_a(y,z)^2.$
Let $P$ be a positive semidefinite matrix that represents the polynomial $\|x\|^2 + \sum_{a=1}^n T_a(y,z)^2 - 2\langle x,\, T(y\otimes z)\rangle$ (such a matrix exists because the polynomial is a sum of squares). We choose $M$ as follows,
$M = \begin{pmatrix} \mathrm{Id} & -T \\ -T^\top & T^\top T \end{pmatrix} + \begin{pmatrix} 0 & 0 \\ 0 & \mathrm{Id} - R'' \end{pmatrix}.$
Since $R'' \preceq \mathrm{Id}$, this matrix is positive semidefinite. Also, $M$ represents $\|x\|^2 + \|y\|^2\cdot\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle$. Since $u_i = T(v_i\otimes w_i)$ for all $i \in [r]$ and the kernel of $\mathrm{Id} - R''$ only contains the span of the $v_i\otimes w_i$, the kernel of $M$ is exactly the span of the vectors $(u_i,\, v_i\otimes w_i)$.

Next, we show that the above matrix $M$ implies that Algorithm 4.1 recovers the unknown tensor $X$. Recall that the algorithm, on input $X_\Omega$, finds a pseudo-distribution $\mu(x,y,z)$ so as to minimize $\tilde{\mathbb{E}}_\mu\, \|x\|^2 + \|y\|^2\cdot\|z\|^2$ such that $(\tilde{\mathbb{E}}_\mu\, x\otimes y\otimes z)_\Omega = X_\Omega$.
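The Schur-complement step can be illustrated numerically (a sketch with a generic map in place of $T$ and hypothetical sizes; this is only the first summand of $M$, before the $\mathrm{Id} - R''$ correction):

```python
import numpy as np

rng = np.random.default_rng(7)
n = 4
T = rng.standard_normal((n, n * n))   # a generic stand-in for the map T : R^n ⊗ R^n -> R^n

# First summand of M: [[Id, -T], [-T^T, T^T T]] = [Id, -T]^T [Id, -T], hence PSD.
Blk = np.block([[np.eye(n), -T], [-T.T, T.T @ T]])
assert np.linalg.eigvalsh(Blk).min() >= -1e-9

# It represents ||x - T(y ⊗ z)||^2 = ||x||^2 + sum_a T_a(y, z)^2 - 2 <x, T(y ⊗ z)>.
x, y, z = (rng.standard_normal(n) for _ in range(3))
w = np.concatenate([x, np.kron(y, z)])
assert np.isclose(w @ Blk @ w, np.sum((x - T @ np.kron(y, z)) ** 2))
```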
Since everything is scale-invariant, we may assume that $X = \sum_{i=1}^r \lambda_i\cdot u_i\otimes v_i\otimes w_i$ for $\lambda_1,\ldots,\lambda_r > 0$ with $\sum_i \lambda_i = 1$. Then, a valid pseudo-distribution would be the probability distribution over $(u_1,v_1,w_1),\ldots,(u_r,v_r,w_r)$ with probabilities $\lambda_1,\ldots,\lambda_r$. Let $\mu$ be the pseudo-distribution computed by the algorithm. By optimality of $\mu$, we know that the objective value satisfies
$\tilde{\mathbb{E}}_\mu\, \|x\|^2 + \|y\|^2\cdot\|z\|^2 \le \mathbb{E}_{i\sim\lambda}\bigl[\|u_i\|^2 + \|v_i\|^2\cdot\|w_i\|^2\bigr] = 2.$
Then, if we let $Y = \tilde{\mathbb{E}}_\mu\, (x,\, y\otimes z)(x,\, y\otimes z)^\top$,
$0 \le \langle M, Y\rangle = \tilde{\mathbb{E}}_{\mu(x,y,z)}\bigl[\|x\|^2 + \|y\|^2\cdot\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle\bigr] \le 2 - 2\,\tilde{\mathbb{E}}_{\mu(x,y,z)}\langle x,\, T(y\otimes z)\rangle = 2 - 2\,\mathbb{E}_{i\sim\lambda}\langle u_i,\, T(v_i\otimes w_i)\rangle = 0.$
The first step uses that $M$ and $Y$ are psd. The second step uses that $M$ represents the polynomial $\|x\|^2 + \|y\|^2\cdot\|z\|^2 - 2\langle x,\, T(y\otimes z)\rangle$. The third step uses that $\mu$ minimizes the objective function. The fourth step uses that the entries of $T$ are 0 outside of $\Omega$ and that $\mu$ matches the observations, $(\tilde{\mathbb{E}}_\mu\, x\otimes y\otimes z)_\Omega = X_\Omega$. The last step uses that $u_i = T(v_i\otimes w_i)$ for all $i \in [r]$.

We conclude that $\langle M, Y\rangle = 0$, which means that the range of $Y$ is contained in the kernel of $M$. Therefore, $Y = \sum_{i,j=1}^r \gamma_{i,j}\cdot(u_i,\, v_i\otimes w_i)(u_j,\, v_j\otimes w_j)^\top$ for scalars $\{\gamma_{i,j}\}$. We claim that the multipliers must satisfy $\gamma_{i,i} = \lambda_i$ and $\gamma_{i,j} = 0$ for all $i \ne j \in [r]$. Indeed, since $\mu$ matches the observations in $\Omega$,
$0 = \sum_{i,j=1}^r (\lambda_i\,\delta_{ij} - \gamma_{i,j})\cdot(u_i\otimes v_j\otimes w_j)_\Omega.$
Since the vectors $(u_i\otimes v_j\otimes w_j)_\Omega$ are linearly independent, we conclude that $\gamma_{i,j} = \lambda_i\cdot\delta_{ij}$, as desired. (This linear independence was one of the requirements of the certificate in Definition 4.2.) ∎
4.5 Degree-4 certificates exist with high probability

In this section we show that our certificate $T$ in fact satisfies the conditions for a degree-4 certificate, proving Theorem 4.4. We use the same construction as in Section 4.3. The main remaining technical challenge for Theorem 4.4 is to show that the construction satisfies the spectral norm condition of Definition 4.2. This spectral norm bound follows from the following theorem, for which we give a proof sketch in Appendix B.

Theorem 4.17. Let $A = (\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$ or $P(\bar{R}_{\Omega_l}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$, and let $B = (\bar{R}_{\Omega_{l'}}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$ or $P(\bar{R}_{\Omega_{l'}}P)\cdots(\bar{R}_{\Omega_1}P)(\bar{R}_{\Omega_0}X)$. There is an absolute constant $C$ such that for any $\alpha \ge 1$ and $\beta > 0$,
$\Pr\Bigl[\bigl\|\textstyle\sum_a A_a\otimes B_a^\top\bigr\| \ge \alpha^{-(l+l'+2)}\Bigr] < n^{-\beta}$
as long as $m \ge C\alpha\beta\,\mu^{3/2}\, r\, n^{1.5}\log(n)$ and $m \ge C\alpha\beta\,\mu^2\, r\, n\log(n)$.

Remark 4.18. If it were true in general that $\|\sum_a A_a\otimes B_a^\top\| \le \sqrt{\|\sum_a A_a\otimes A_a^\top\|}\cdot\sqrt{\|\sum_a B_a\otimes B_a^\top\|}$, then it would be sufficient to use Theorem 4.13 and we would not need to prove Theorem 4.17. Unfortunately, this is not true in general. That said, it may be possible to show that, even if we do not know directly that $\|\sum_a A_a\otimes B_a^\top\|$ is small, since $\|\sum_a A_a\otimes A_a^\top\|$ and $\|\sum_a B_a\otimes B_a^\top\|$ are both small there must be some alternative matrix representation of $\sum_a A_a\otimes B_a^\top$ which has small norm, and this is sufficient. We leave it as an open problem whether this can be done.

We now have all the ingredients to prove Theorem 4.4.

Proof of Theorem 4.4. Let $k = (\log n)^C$ for some absolute constant $C \ge 1$. Let $E = (-1)^k\, P(P\bar{R}_{\Omega_k})\cdots(P\bar{R}_{\Omega_1})[X]$. By Lemma 4.12 there exists $Y$ with $(Y)_\Omega = Y$ and $P[Y] = E$ such that $\|Y\|_F \le O(1)\,\|E\|_F$. We let $T = T^{(k)} + Y$.
This tensor satisfies the desired linear constraints $(T)_\Omega = T$ and $P[T] = X$. Since $E$ has the form of the matrices in Theorem 4.17, the bound in Theorem 4.17 implies $\|E\|_F \le 2^{-k}\cdot n^{10} \le n^{-C+10}$. (Here, we use that the norm in the conclusion of Theorem 4.17 is within a factor of $n^{10}$ of the Frobenius norm.)

We are to prove that the following matrix has spectral norm bounded by 0.01,
$\sum_{a=1}^n (T)_a\otimes(T)_a^\top - \sum_{a=1}^n X_a\otimes X_a^\top.$
We expand the sum according to the definition of $T^{(k)}$ in Eq. (4.21). Then, most terms that appear in the expansion have the form as in Theorem 4.17. Since those terms decrease geometrically, we can bound their contribution by 0.001 with probability $1 - n^{-\omega(1)}$. The terms that involve the error correction $Y$ are smaller than 0.001 because $Y$ has polynomially small norm $\|Y\|_F \le n^{-C+10}$. The only remaining terms are cross terms between $X$ and a tensor of the form as in Theorem 4.17. We can bound the total contribution of these terms by at most 0.001 using Theorem B.27. ∎

5 Matrix norm bound techniques

In this section, we describe the techniques that we will use to prove probabilistic norm bounds on matrices of the form $Y = \sum_a (\bar{R}_\Omega A)_a \otimes (\bar{R}_\Omega A)_a^\top$. We will prove these norm bounds using the trace moment method, which obtains probabilistic bounds on the norm of a matrix $Y$ from bounds on the expected value of $\mathrm{tr}((YY^\top)^q)$ for sufficiently large $q$. This will require analyzing $\mathrm{tr}((YY^\top)^q)$, which will take the form of a sum of products, where the terms in the product are either entries of $A$ or terms of the form $\bar{R}_\Omega(a,b,c)$, where $\bar{R}_\Omega(a,b,c) = \frac{n^3}{m} - 1$ if $(a,b,c) \in \Omega$ and $-1$ otherwise. To analyze $\mathrm{tr}((YY^\top)^q)$, we will group together products which have the same expected behavior on the $\bar{R}_\Omega(a,b,c)$ terms, forming smaller sums of products.
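(As a concrete illustration of the trace moment method, with a generic random matrix: the deterministic inequality underlying Proposition 5.2 below, $\|Y\| \le \mathrm{tr}((YY^\top)^q)^{1/2q}$, gives an upper bound that tightens as $q$ grows.)

```python
import numpy as np

rng = np.random.default_rng(6)
Y = rng.standard_normal((30, 30))
norm = np.linalg.norm(Y, 2)           # spectral norm ||Y||
G = Y @ Y.T

# tr((Y Y^T)^q)^(1/2q) = (sum_i lambda_i(G)^q)^(1/2q) upper-bounds ||Y|| for every q.
est = [np.trace(np.linalg.matrix_power(G, q)) ** (1 / (2 * q)) for q in (1, 2, 4, 8, 16)]

assert all(e >= norm - 1e-8 for e in est)   # always an upper bound
assert est[0] >= est[-1] >= norm - 1e-8     # monotone improvement in q
assert est[-1] <= 1.2 * norm                # within 30^(1/32) ≈ 1.11 of ||Y|| at q = 16
```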
For each of these sums, we can then use the same bound on the expected behavior of the $\bar{R}_\Omega(a,b,c)$ terms for each product in the sum. This allows us to move this bound outside of the sum, leaving us with a sum of products of entries of $A$. We will then bound the value of these sums by carefully choosing the order in which we sum over the indices.

In the remainder of this section and in the next two sections, we allow our tensors to have asymmetric dimensions. We account for this with the following definitions.

Definition 5.1. We define $n_1$ to be the dimension of the $u$ vectors, $n_2$ to be the dimension of the $v$ vectors, and $n_3$ to be the dimension of the $w$ vectors. We define $n_{\max} = \max\{n_1, n_2, n_3\}$.

5.1 The trace moment method

We use the trace moment method through the following proposition and corollary.

Proposition 5.2. For any random matrix $Y$, any integer $q \ge 1$, and any $\varepsilon > 0$,
$\Pr\Bigl[\|Y\| \ge \sqrt[2q]{\mathbb{E}[\mathrm{tr}((YY^\top)^q)]/\varepsilon}\Bigr] < \varepsilon.$

Proof. By Markov's inequality, for all integers $q \ge 1$ and all $\varepsilon > 0$,
$\Pr\bigl[\mathrm{tr}((YY^\top)^q) \ge \mathbb{E}[\mathrm{tr}((YY^\top)^q)]/\varepsilon\bigr] < \varepsilon.$
The result now follows from the observation that if $\|Y\| \ge \sqrt[2q]{\mathbb{E}[\mathrm{tr}((YY^\top)^q)]/\varepsilon}$ then $\mathrm{tr}((YY^\top)^q) \ge \mathbb{E}[\mathrm{tr}((YY^\top)^q)]/\varepsilon$. ∎

Corollary 5.3. For given $p \ge 1$, $r \ge 0$, $n > 0$, and $B > 0$, and a random matrix $Y$, if $\mathbb{E}\bigl[\mathrm{tr}((YY^\top)^q)\bigr] \le (q^p B)^{2q}\, n^r$ for all integers $q \ge 1$, then for all $\beta > 0$,
$\Pr\Bigl[\|Y\| \ge B e^p \Bigl(\tfrac{(r+\beta)\ln n}{2p} + 1\Bigr)^p\Bigr] < n^{-\beta}.$

Proof. We take $\varepsilon = n^{-\beta}$ and we choose $q$ to minimize
$\sqrt[2q]{(q^p B)^{2q}\, n^r/\varepsilon} = B\, q^p\, n^{\frac{r+\beta}{2q}}.$
Setting the derivative of this expression to 0, we obtain $\bigl(\tfrac{p}{q} - \tfrac{(r+\beta)\ln n}{2q^2}\bigr)\, B\, q^p\, n^{\frac{r+\beta}{2q}} = 0$, so we want $q = \tfrac{(r+\beta)\ln n}{2p}$. However, $q$ must be an integer, so we instead take $q = \bigl\lceil \tfrac{(r+\beta)\ln n}{2p} \bigr\rceil$.
With this $q$, we have that
$B\, q^p\, n^{\frac{r+\beta}{2q}} \le B\Bigl(\tfrac{(r+\beta)\ln n}{2p} + 1\Bigr)^p n^{\frac{p}{\ln n}} = B e^p\Bigl(\tfrac{(r+\beta)\ln n}{2p} + 1\Bigr)^p.$
Applying Proposition 5.2 with this $q$, we obtain that
$\Pr\Bigl[\|Y\| \ge B e^p\Bigl(\tfrac{(r+\beta)\ln n}{2p} + 1\Bigr)^p\Bigr] \le \Pr\Bigl[\|Y\| \ge \sqrt[2q]{\mathbb{E}[\mathrm{tr}((YY^\top)^q)]/\varepsilon}\Bigr] < n^{-\beta}.$
∎

5.2 Partitioning by intersection pattern

As discussed at the beginning of the section, $\mathbb{E}[\mathrm{tr}((YY^\top)^q)]$ will be a sum of products, where part of each product will be of the form $\prod_{i=1}^{2q'} \bar{R}_\Omega(a_i,b_i,c_i)$. Here, $q'$ may or may not be equal to $q$; in fact, we will often have $q' = 2q$ because each $Y$ contributes two terms of the form $\bar{R}_\Omega(a,b,c)$ to the product. To handle this part of the product, we partition the terms of our sum based on the intersection pattern of which triples $(a_i,b_i,c_i)$ are equal to each other. Fixing an intersection pattern determines the expected value of $\prod_{i=1}^{2q'} \bar{R}_\Omega(a_i,b_i,c_i)$.

Definition 5.4. We define an intersection pattern to be a set of equalities and inequalities satisfying the following conditions:
1. All of the equalities and inequalities are of the form $(a_{i_1},b_{i_1},c_{i_1}) = (a_{i_2},b_{i_2},c_{i_2})$ or $(a_{i_1},b_{i_1},c_{i_1}) \ne (a_{i_2},b_{i_2},c_{i_2})$, respectively.
2. For every $i_1, i_2$, either $(a_{i_1},b_{i_1},c_{i_1}) = (a_{i_2},b_{i_2},c_{i_2})$ is in the intersection pattern or $(a_{i_1},b_{i_1},c_{i_1}) \ne (a_{i_2},b_{i_2},c_{i_2})$ is in the intersection pattern.
3. All of the equalities and inequalities are consistent with each other; i.e., there exist values of $(a_1,b_1,c_1),\ldots,(a_{2q'},b_{2q'},c_{2q'})$ satisfying all of the equalities and inequalities in the intersection pattern.

Proposition 5.5. For a given $(a,b,c)$:
1. $\mathbb{E}\bigl[\bar{R}_\Omega(a,b,c)\bigr] = 0$.
2. For all $k \ge 1$, $\mathbb{E}\bigl[\bigl(\bar{R}_\Omega(a,b,c)\bigr)^k\bigr] \le \bigl(\tfrac{n_1 n_2 n_3}{m}\bigr)^{k-1}$.

Corollary 5.6.
For a given intersection pattern, if there is any triple $(a,b,c)$ which appears exactly once, then $\mathbb{E}\bigl[\prod_{i=1}^{2q'} \bar{R}_\Omega(a_i,b_i,c_i)\bigr] = 0$. Otherwise, letting $z$ be the number of distinct triples,
$\mathbb{E}\Bigl[\prod_{i=1}^{2q'} \bar{R}_\Omega(a_i,b_i,c_i)\Bigr] \le \Bigl(\tfrac{n_1 n_2 n_3}{m}\Bigr)^{2q'-z}.$

Proof. For a given intersection pattern, let $(a_{i_1},b_{i_1},c_{i_1}),\ldots,(a_{i_z},b_{i_z},c_{i_z})$ be the distinct triples and let $c_j$ be the number of times the triple $(a_{i_j},b_{i_j},c_{i_j})$ appears. We have that
$\mathbb{E}\Bigl[\prod_{i=1}^{2q'} \bar{R}_\Omega(a_i,b_i,c_i)\Bigr] = \prod_{j=1}^{z} \mathbb{E}\bigl[\bigl(\bar{R}_\Omega(a_{i_j},b_{i_j},c_{i_j})\bigr)^{c_j}\bigr].$
If $c_j = 1$ for any $j$, then this expression is 0. Otherwise,
$\prod_{j=1}^{z} \mathbb{E}\bigl[\bigl(\bar{R}_\Omega(a_{i_j},b_{i_j},c_{i_j})\bigr)^{c_j}\bigr] \le \prod_{j=1}^{z} \Bigl(\tfrac{n_1 n_2 n_3}{m}\Bigr)^{c_j - 1} = \Bigl(\tfrac{n_1 n_2 n_3}{m}\Bigr)^{(\sum_{j=1}^z c_j) - z} = \Bigl(\tfrac{n_1 n_2 n_3}{m}\Bigr)^{2q' - z}.$
∎

5.3 Bounding sums of products of tensor entries

In this subsection, we describe how to bound the sums of products of tensor entries we obtain for a given intersection pattern, after moving our bound on the expected value of the $\bar{R}_\Omega(a,b,c)$ terms outside the sum. We represent such a product with a hypergraph as follows.

Definition 5.7. Given a set of distinct indices and a set of tensor entries on those indices, let $H$ be the hypergraph with one vertex for each distinct index and one hyperedge for each tensor entry, where the hyperedge consists of all indices contained in the tensor entry. If the tensor entry appears to the $p$-th power, we take this hyperedge with multiplicity $p$.

With this definition in mind, we will first preprocess our products.
1. We will preprocess the tensor entries so that every entry appears to an even power, using the inequality $|ab| \le \tfrac12(a^2+b^2)$. This has the effect of taking two hyperedges of our choice in $H$ and replacing them with a doubled copy of one hyperedge or the other (we have to consider both possibilities).
Note that this step makes all of our terms positive and can only increase their magnitude, so the result will be an upper bound on our actual sum.
2. We will add the missing terms to our sum so that we sum over every possibility for the distinct indices (even the possibilities which make several of these indices equal and would put us in a different intersection pattern). Note that this can only increase our sum.

Remark 5.8. It is important that we first bound the expected value of the $\bar{R}_\Omega(a,b,c)$ terms and move this bound outside of our sum before adding the missing terms to the sum.

After preprocessing our products, our strategy will be as follows. We will sum over the indices, removing the corresponding vertices from $H$. As we do this, we will apply appropriate bounds on squared tensor entries, removing the corresponding doubled hyperedge from $H$. To obtain these bounds, we observe that we can bound the average square of our tensor entries in terms of the number of indices we are averaging over.

Definition 5.9. We say that an order-3 tensor $A$ of dimensions $n_1\times n_2\times n_3$ is $(B,r,\mu)$-bounded if the following bounds are true:
1. $\max_{a,b,c}\{A_{abc}^2\} \le Br$
2. $\max\bigl\{\max_{b,c}\bigl\{\tfrac{1}{n_1}\sum_a A_{abc}^2\bigr\},\; \max_{a,c}\bigl\{\tfrac{1}{n_2}\sum_b A_{abc}^2\bigr\},\; \max_{a,b}\bigl\{\tfrac{1}{n_3}\sum_c A_{abc}^2\bigr\}\bigr\} \le B/\mu$
3. $\max\bigl\{\max_{c}\bigl\{\tfrac{1}{n_1 n_2}\sum_{a,b} A_{abc}^2\bigr\},\; \max_{b}\bigl\{\tfrac{1}{n_1 n_3}\sum_{a,c} A_{abc}^2\bigr\},\; \max_{a}\bigl\{\tfrac{1}{n_2 n_3}\sum_{b,c} A_{abc}^2\bigr\}\bigr\} \le B/\mu^2$
4. $\tfrac{1}{n_1 n_2 n_3}\sum_{a,b,c} A_{abc}^2 \le B/\mu^3$

More generally, we say that a tensor $A$ is $(B,r,\mu)$-bounded if the following is true:
1. The maximum squared entry of $A$ is at most $Br$.
2. Every index which we average over decreases our upper bound by a factor of $\mu$.
3. If we are averaging over at least one index, then we can delete the factor of $r$ in our bound.
Since $r$ and $\mu$ will always be the same, we write $B$-bounded rather than $(B,r,\mu)$-bounded.

To give a sense of why these are the correct type of bounds to use, we now show that $X$ is $\frac{r\mu^3}{n_1n_2n_3}$-bounded. In Section 7, we will use an iterative argument to show that with high probability, similar bounds hold for all of the tensors $A$ we will be considering.

Proposition 5.10. $X$ is $\frac{r\mu^3}{n_1n_2n_3}$-bounded.

Proof. Recall that $X = \sum_{i=1}^r u_i \otimes v_i \otimes w_i$, where the vectors $\{u_i\}$ are orthonormal, the vectors $\{v_i\}$ are orthonormal, and the vectors $\{w_i\}$ are orthonormal. Also recall that for all $i,a,b,c$, we have $u^2_{ia} \le \frac{\mu}{n_1}$, $v^2_{ib} \le \frac{\mu}{n_2}$, and $w^2_{ic} \le \frac{\mu}{n_3}$. We now have the following bounds:

1. $\max_{a,b,c}\{X^2_{abc}\} = \max_{a,b,c}\Big\{\sum_{i=1}^{r}\sum_{i'=1}^{r} u_{ia}v_{ib}w_{ic}u_{i'a}v_{i'b}w_{i'c}\Big\} \le \frac{r^2\mu^3}{n_1n_2n_3}$

2. $\max_{b,c}\Big\{\frac{1}{n_1}\sum_a X^2_{abc}\Big\} = \frac{1}{n_1}\max_{b,c}\Big\{\sum_a\sum_{i=1}^{r}\sum_{i'=1}^{r} u_{ia}v_{ib}w_{ic}u_{i'a}v_{i'b}w_{i'c}\Big\} = \frac{1}{n_1}\max_{b,c}\Big\{\sum_{i=1}^{r}\sum_{i'=1}^{r}\Big(\sum_a u_{ia}u_{i'a}\Big)v_{ib}w_{ic}v_{i'b}w_{i'c}\Big\} = \frac{1}{n_1}\max_{b,c}\Big\{\sum_{i=1}^{r} v^2_{ib}w^2_{ic}\Big\} \le \frac{r\mu^2}{n_1n_2n_3}$

The other bounds where we sum over one index follow by symmetrical arguments.

3. $\max_{c}\Big\{\frac{1}{n_1n_2}\sum_{a,b} X^2_{abc}\Big\} = \frac{1}{n_1n_2}\max_{c}\Big\{\sum_{a,b}\sum_{i=1}^{r}\sum_{i'=1}^{r} u_{ia}v_{ib}w_{ic}u_{i'a}v_{i'b}w_{i'c}\Big\} = \frac{1}{n_1n_2}\max_{c}\Big\{\sum_{i=1}^{r}\sum_{i'=1}^{r}\Big(\sum_a u_{ia}u_{i'a}\Big)\Big(\sum_b v_{ib}v_{i'b}\Big)w_{ic}w_{i'c}\Big\} = \frac{1}{n_1n_2}\max_{c}\Big\{\sum_{i=1}^{r} w^2_{ic}\Big\} \le \frac{r\mu}{n_1n_2n_3}$

The other bounds where we sum over two indices follow by symmetrical arguments.

4. $\frac{1}{n_1n_2n_3}\sum_{a,b,c} X^2_{abc} = \frac{1}{n_1n_2n_3}\sum_{a,b,c}\sum_{i=1}^{r}\sum_{i'=1}^{r} u_{ia}v_{ib}w_{ic}u_{i'a}v_{i'b}w_{i'c} = \frac{1}{n_1n_2n_3}\sum_{i=1}^{r}\sum_{i'=1}^{r}\Big(\sum_a u_{ia}u_{i'a}\Big)\Big(\sum_b v_{ib}v_{i'b}\Big)\Big(\sum_c w_{ic}w_{i'c}\Big)$
$= \frac{1}{n_1n_2n_3}\sum_{i=1}^{r} 1 = \frac{r}{n_1n_2n_3}$ □

With these kinds of bounds in mind, we bound sums of products of tensor entries as follows. We note that we can always apply the entrywise bound for a squared tensor entry. However, to apply any of the other bounds, we must be able to sum over an index or indices where the only term in our product which depends on this index or indices is the squared tensor entry. This can be described in terms of the hypergraph $H$ as follows.

Definition 5.11. Given a hyperedge $e$ in $H$, define $b(e)$ to be the minimal $B$ such that the tensor entry corresponding to $e$ is $B$-bounded.

Definition 5.12. We say that a vertex is free in $H$ if it is contained in only one hyperedge and this hyperedge appears with multiplicity two.

We can apply our bounds in the following ways.

1. We can always choose a hyperedge $e$ of $H$, use the entrywise bound of $rb(e)$ on the corresponding squared tensor entry (note the extra factor of $r$), and reduce the multiplicity of $e$ by two.

2. If there is a free vertex incident with a doubled hyperedge $e$ in $H$, we can sum over all free vertices which are incident with $e$ using the corresponding bound, then delete these vertices and the doubled hyperedge $e$ from $H$. When we do this, we obtain a factor of
$$b(e)\Big(\frac{n_1}{\mu}\Big)^{\#\text{ of deleted }a\text{ vertices}}\Big(\frac{n_2}{\mu}\Big)^{\#\text{ of deleted }b\text{ vertices}}\Big(\frac{n_3}{\mu}\Big)^{\#\text{ of deleted }c\text{ vertices}}$$
The factors of $n_1,n_2,n_3$ appear because we are summing over these indices, and the factors of $\frac{1}{\mu}$ appear because each index we sum over reduces the bound on the average value by a factor of $\mu$.

If we apply these bounds repeatedly until there are no tensor entries/hyperedges left to bound, our final bound on a single sum of products of tensor entries will be $\Big(\prod_{e\in H}\sqrt{b(e)}\Big)$
$\Big(\frac{n_1}{\mu}\Big)^{\#\text{ of }a\text{ indices}}\Big(\frac{n_2}{\mu}\Big)^{\#\text{ of }b\text{ indices}}\Big(\frac{n_3}{\mu}\Big)^{\#\text{ of }c\text{ indices}}r^{\#\text{ of entrywise bounds used}}$.

To prove our final upper bound, we will argue that we can always apply these bounds in such a way that the number of times we need to use an entrywise bound is sufficiently small.

5.4 Counting intersection patterns

There will be one more factor in our final bound. This factor comes from the number of possible intersection patterns with a given number $z$ of distinct triples $(a,b,c)$.

Lemma 5.13. The total number of intersection patterns on $2q'$ triples with $z$ distinct triples $(a,b,c)$, such that every triple $(a,b,c)$ has multiplicity at least two, is at most $\binom{2q'}{z}z^{2q'-z} \le 2^{2q'}q'^{2q'-z}$.

Proof. To determine which triples $(a,b,c)$ are equal to each other, it is sufficient to decide which triples are distinct from all previous triples (there are $\binom{2q'}{z}$ choices for this) and, for each of the remaining $2q'-z$ triples, which of the $z$ distinct triples it is equal to (there are $z^{2q'-z}$ choices for this). □

6 Trace Power Calculation for $\bar R_\Omega A \otimes (\bar R_\Omega A)^T$

In this section, we implement the techniques described in Section 5 to probabilistically bound $\|\bar R_\Omega A \otimes (\bar R_\Omega A)^T\|$. In particular, we prove the following theorem.

Theorem 6.1. If $A$ is $B$-bounded, $C \ge 1$, and

1. $m \ge 10000C(2+\beta)^2 n_{max} r\mu^2 \ln n_{max}$

2. $m \ge 10000C(2+\beta)^2 r\sqrt{n_1}\max\{n_2,n_3\}\mu^{3/2}\ln n_{max} \ge 10000C(2+\beta)^2 r\sqrt{n_1n_2n_3}\,\mu^{3/2}\ln n_{max}$

3. $\mu r \le \min\{n_1,n_2,n_3\}$

then, defining $Y = \bar R_\Omega A \otimes (\bar R_\Omega A)^T$,
$$\Pr\Big[\|Y\| \ge \frac{Bn_1n_2n_3}{Cr\mu^3}\Big] < 4n_{max}^{-(\beta+1)}$$

Corollary 6.2. If $C \ge 1$ and

1. $m \ge 10000C(2+\beta)^2 n_{max} r\mu^2 \ln n_{max}$

2. $m \ge 10000C(2+\beta)^2 r\sqrt{n_1}\max\{n_2,n_3\}\mu^{3/2}\ln n_{max} \ge 10000C(2+\beta)^2 r\sqrt{n_1n_2n_3}\,\mu^{3/2}\ln n_{max}$

3.
$\mu r \le \min\{n_1,n_2,n_3\}$

then
$$\Pr\Big[\|\bar R_\Omega X \otimes (\bar R_\Omega X)^T\| \ge \frac{1}{C}\Big] < 4n_{max}^{-(\beta+1)}$$

Proof. This follows immediately from Theorem 6.1 and the fact that $X$ is $\frac{r\mu^3}{n_1n_2n_3}$-bounded. □

To prove Theorem 6.1, we break $Y$ up into four parts and then prove probabilistic norm bounds for each part.

Definition 6.3.

1. Define $(Y_1)_{bcb'c'} = Y_{bcb'c'}$ if $b = b'$, $c = c'$ and $0$ otherwise.

2. Define $(Y_2)_{bcb'c'} = Y_{bcb'c'}$ if $b = b'$, $c \ne c'$ and $0$ otherwise.

3. Define $(Y_3)_{bcb'c'} = Y_{bcb'c'}$ if $b \ne b'$, $c = c'$ and $0$ otherwise.

4. Define $(Y_4)_{bcb'c'} = Y_{bcb'c'}$ if $b \ne b'$, $c \ne c'$ and $0$ otherwise.

6.1 Structure of $\mathrm{tr}((Y_jY_j^T)^q)$

We have that $Y_{bcb'c'} = \sum_a \bar R_\Omega(a,b,c')\bar R_\Omega(a,b',c)A_{abc'}A_{ab'c}$. To see the structure of $(Y_jY_j^T)^q$, we now compute $Y_jY_j^T$:
$$(Y_jY_j^T)_{b_1c_1b_2c_2} = \sum_{a_1,a_2,b',c'} \bar R_\Omega(a_1,b_1,c')\bar R_\Omega(a_1,b',c_1)\bar R_\Omega(a_2,b_2,c')\bar R_\Omega(a_2,b',c_2)A_{a_1b_1c'}A_{a_1b'c_1}A_{a_2b_2c'}A_{a_2b'c_2}$$
where the sum is taken over $b',c'$ which satisfy the appropriate constraints. The $\bar R_\Omega$ terms will not be part of our hypergraph $H$ (as their expected behavior is determined by the intersection pattern). We can view the first two terms $A_{a_1b_1c'}$ and $A_{a_1b'c_1}$ as an hourglass with upper triangle $(b_1,a_1,c')$ and lower triangle $(c_1,a_1,b')$ (where the vertices in each triangle are listed from left to right). Similarly, we can view the last two terms $A_{a_2b_2c'}$ and $A_{a_2b'c_2}$ as an hourglass with upper triangle $(c',a_2,b_2)$ and lower triangle $(b',a_2,c_2)$.
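As a quick numerical sanity check of the index bookkeeping above, the entrywise definition of $Y$ and the four-factor expansion of $(Y_jY_j^T)$ can be compared on a small dense example. This is only a sketch: generic weights stand in for $\bar R_\Omega$, the constraints defining a particular $Y_j$ are dropped, and the dimensions and index choices are arbitrary.

```python
import numpy as np

rng = np.random.default_rng(0)
n1, n2, n3 = 3, 4, 5
A = rng.normal(size=(n1, n2, n3))
R = rng.normal(size=(n1, n2, n3))  # generic stand-in for the weights R_Omega-bar

# Y is an (n2*n3) x (n2*n3) matrix indexed by pairs (b,c), (b',c'):
#   Y_{(b,c),(b',c')} = sum_a R(a,b,c') R(a,b',c) A[a,b,c'] A[a,b',c]
Y = np.zeros((n2 * n3, n2 * n3))
for b in range(n2):
    for c in range(n3):
        for bp in range(n2):
            for cp in range(n3):
                Y[b * n3 + c, bp * n3 + cp] = sum(
                    R[a, b, cp] * R[a, bp, c] * A[a, b, cp] * A[a, bp, c]
                    for a in range(n1)
                )

# (Y Y^T)_{(b1,c1),(b2,c2)} should match the four-factor expansion over a1, a2, b', c'
YYT = Y @ Y.T
b1, c1, b2, c2 = 1, 2, 3, 0
expanded = sum(
    R[a1, b1, cp] * R[a1, bp, c1] * R[a2, b2, cp] * R[a2, bp, c2]
    * A[a1, b1, cp] * A[a1, bp, c1] * A[a2, b2, cp] * A[a2, bp, c2]
    for a1 in range(n1) for a2 in range(n1)
    for bp in range(n2) for cp in range(n3)
)
assert np.isclose(YYT[b1 * n3 + c1, b2 * n3 + c2], expanded)
```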
Thus, the hypergraph $H$ corresponding to $\mathrm{tr}((Y_jY_j^T)^q)$ will be $2q$ hourglasses glued together, where the top vertices of the hourglasses alternate between $b$ and $c'$ indices, the bottom vertices alternate between $c$ and $b'$ indices, and the middle vertices are the $a$ indices.

Remark 6.4. While there is no real difference between the $b$ and $b'$ indices, or between the $c$ and $c'$ indices, we will keep track of this to make it easier to see the structure of $H$.

As described in Section 5, we split up $\mathbb{E}\big[\mathrm{tr}((Y_jY_j^T)^q)\big]$ based on the intersection pattern of which of the $4q$ triples of the form $(a,b,c')$ or $(a,b',c)$ are equal to each other. We only need to consider patterns where each triple, and thus each hyperedge, appears at least twice, as otherwise the terms in the sum have expected value $0$. In all cases, letting $z$ be the number of distinct triples in a given intersection pattern, by Corollary 5.6 our bound on the expected value of the $\bar R_\Omega$ terms will be $\big(\frac{n_1n_2n_3}{m}\big)^{4q-z}$.

6.2 Bounds on $\|Y_1\|$

Consider $\mathbb{E}\big[\mathrm{tr}((Y_1Y_1^T)^q)\big]$. The constraints that $b' = b$ and $c' = c$ in every $Y$ force all of the $b$ and $b'$ indices to be equal and all of the $c$ and $c'$ indices to be equal, so our hypergraph $H$ consists of a single vertex $b$, a single vertex $c$, and two copies of the hyperedge $(a_i,b,c)$ for each $i \in [1,2q]$. For all intersection patterns, the number of distinct triples $z$ is equal to the number of distinct $a$ indices, which can be anywhere from $1$ to $2q$. We apply our bounds on $H$ as follows.

1. In our preprocessing step, when there are two hyperedges $e_1$ and $e_2$ which appear with odd multiplicity, we double one of these hyperedges or the other. Thus, we can assume that all hyperedges appear with even multiplicity.

2.
We apply an entrywise bound $2q-z$ times on hyperedges of multiplicity $\ge 4$, reducing the multiplicity by $2$ each time.

3. After applying these entrywise bounds, all of the distinct $a$ vertices will be free, and we can sum over these indices one by one.

Recall that the bound from the $\bar R_\Omega$ terms is $\big(\frac{n_1n_2n_3}{m}\big)^{4q-z}$ and our bound for the other terms is
$$\Big(\prod_{e\in H}\sqrt{b(e)}\Big)\Big(\frac{n_1}{\mu}\Big)^{\#\text{ of }a\text{ indices}}\Big(\frac{n_2}{\mu}\Big)^{\#\text{ of }b\text{ indices}}\Big(\frac{n_3}{\mu}\Big)^{\#\text{ of }c\text{ indices}}r^{\#\text{ of entrywise bounds used}}$$
where $b(e) = B$ for all our hyperedges. Summing over all $z \in [1,2q]$ and all intersection patterns using Lemma 5.13, our final bound is
$$2q\cdot2^{4q}\max_{z\in[1,2q]}\Big\{(2q)^{4q-z}\Big(\frac{n_1n_2n_3}{m}\Big)^{4q-z}B^{2q}\Big(\frac{n_1}{\mu}\Big)^{z}\Big(\frac{n_2}{\mu}\Big)\Big(\frac{n_3}{\mu}\Big)r^{2q-z}\Big\}$$
The inner expression will be maximized at either $z = 2q$ or $z = 1$, and we will always take $q$ to be between $\frac{\ln n_{max}}{2}$ and $\frac{n_{max}}{2}$, so our final bound on $\mathbb{E}\big[\mathrm{tr}((Y_1Y_1^T)^q)\big]$ is at most
$$(4q)^{4q}\max\Big\{\Big(\frac{n_{max}n_1^2n_2n_3B}{m\mu\ln n_{max}}\Big)^{2q}\Big(\frac{n_2}{\mu}\Big)\Big(\frac{n_3}{\mu}\Big),\ \Big(\frac{n_1^2n_2^2n_3^2rB}{m^2}\Big)^{2q}\frac{m}{r\mu^3}\Big\}$$
Since $m \ge 10000C(2+\beta)^2n_{max}r\mu^2\ln n_{max}$ and $m \ge 10000C(2+\beta)^2r\sqrt{n_1n_2n_3}\,\mu^{3/2}\ln n_{max}$, we have that
$$\mathbb{E}\big[\mathrm{tr}((Y_1Y_1^T)^q)\big] < (16q^2)^{2q}\Big(\frac{n_1n_2n_3B}{10000C(2+\beta)^2r\mu^3(\ln n_{max})^2}\Big)^{2q}n_{max}^3$$
(note that $m < n_{max}^3$, as otherwise the tensor completion problem is trivial).

We now recall Corollary 5.3, which says that for given $p \ge 1$, $r \ge 0$, $n > 0$, and $B > 0$, for a random matrix $Y$, if $\mathbb{E}\big[\mathrm{tr}((YY^T)^q)\big] \le (q^pB)^{2q}n^r$ for all integers $q \ge 1$, then for all $\beta > 0$,
$$\Pr\Big[\|Y\| \ge Be^p\Big(\frac{(r+\beta)}{2p}\ln n + 1\Big)^p\Big] < n^{-\beta}$$
Using Corollary 5.3 with the appropriate parameters, we can show that for all $\beta > 0$,
$$\Pr\Big[\|Y_1\| \ge \frac{16e^2Bn_1n_2n_3}{10000Cr\mu^3}\Big] < n_{max}^{-(\beta+1)}$$

6.3 Bounds on $\|Y_2\|$ and $\|Y_3\|$

Consider $\mathbb{E}\big[\mathrm{tr}((Y_2Y_2^T)^q)\big]$.
The constraint that $b' = b$ in every $Y$ forces all of the $b$ and $b'$ indices to be equal, so our hypergraph $H$ consists of a single vertex $b$ and $4q$ total hyperedges of the form $(a,b,c)$ or $(a,b,c')$. Ignoring the $b$ vertex (which is part of all the hyperedges), the $(a,c)$ and $(a,c')$ edges form a single connected component. We only need to consider intersection patterns where each triple $(a,b,c)$ or $(a,b,c')$ (and thus each edge $(a,c)$ or $(a,c')$) appears with multiplicity at least two. For a given intersection pattern, let $z$ be the number of distinct edges. We apply our bounds on $H$ as follows.

1. In our preprocessing step, when there are two edges $e_1$ and $e_2$ which appear with odd multiplicity, we double one of these edges or the other. Thus, we can assume that all edges appear with even multiplicity.

2. We apply an entrywise bound $2q-z$ times on edges of multiplicity $\ge 4$, reducing the multiplicity by $2$ each time.

3. After applying these entrywise bounds, all of our edges have multiplicity $2$. We now sum over a free $a$, $c$, or $c'$ vertex in $H$ whenever such a vertex exists. Otherwise, there must be a cycle, in which case we use the entrywise bound on one edge of the cycle and delete it.

Definition 6.5. Let $x$ be the number of times we delete an edge in a cycle using the entrywise bound.

Lemma 6.6. The total number of vertices in $H$ (excluding $b$) is $z + 1 - x$.

Proof. Observe that neither deleting a free vertex nor deleting an edge in a cycle can disconnect $H$. Also, except for the final edge, both of whose vertices will be free, every edge which has a free vertex has exactly one free vertex. Thus, we delete an edge in a cycle $x$ times, removing $0$ vertices each time; we delete an edge with one free vertex $z - x - 1$ times, removing $1$ vertex each time; and we delete the final edge once, removing the final two vertices.
This adds up to $z + 1 - x$ vertices in $H$. □

Recall that the bound from the $\bar R_\Omega$ terms is $\big(\frac{n_1n_2n_3}{m}\big)^{4q-z}$ and our bound for the other terms is
$$\Big(\prod_{e\in H}\sqrt{b(e)}\Big)\Big(\frac{n_1}{\mu}\Big)^{\#\text{ of }a\text{ indices}}\Big(\frac{n_2}{\mu}\Big)^{\#\text{ of }b\text{ indices}}\Big(\frac{n_3}{\mu}\Big)^{\#\text{ of }c\text{ or }c'\text{ indices}}r^{\#\text{ of entrywise bounds used}}$$
where $b(e) = B$ for all our hyperedges. Summing over all $z \in [1,2q]$ and all intersection patterns using Lemma 5.13, our final bound is
$$2q\cdot2^{4q}\max_{z\in[1,2q],\,x\in[0,z-1]}\Big\{(2q)^{4q-z}\Big(\frac{n_1n_2n_3}{m}\Big)^{4q-z}B^{2q}\Big(\frac{n_{max}}{\mu}\Big)^{z-1-x}\Big(\frac{n_1n_2n_3}{\mu^3}\Big)r^{2q-z+x}\Big\}$$
Since $\mu r \le n_{max}$, the inner expression will be maximized either when $z = 2q$ and $x = 0$ or when $z = 1$ and $x = 0$. Again, we will always take $q$ to be between $\frac{\ln n_{max}}{2}$ and $\frac{n_{max}}{2}$, so our final bound on $\mathbb{E}\big[\mathrm{tr}((Y_2Y_2^T)^q)\big]$ is at most
$$(4q)^{4q}\max\Big\{\Big(\frac{n_1n_2n_3n_{max}B}{m\mu\ln n_{max}}\Big)^{2q}\Big(\frac{n_1n_2n_3}{\mu^2}\Big),\ \Big(\frac{n_1^2n_2^2n_3^2rB}{m^2}\Big)^{2q}\frac{m}{r\mu^3}\Big\}$$
Since $m \ge 10000C(2+\beta)^2n_{max}r\mu^2\ln n_{max}$ and $m \ge 10000C(2+\beta)^2r\sqrt{n_1n_2n_3}\,\mu^{3/2}\ln n_{max}$, we have that
$$\mathbb{E}\big[\mathrm{tr}((Y_2Y_2^T)^q)\big] < (16q^2)^{2q}\Big(\frac{n_1n_2n_3B}{10000C(2+\beta)^2r\mu^3(\ln n_{max})^2}\Big)^{2q}n_{max}^3$$
Using Corollary 5.3 with the appropriate parameters (in fact the same ones as before), we can show that for all $\beta > 0$,
$$\Pr\Big[\|Y_2\| \ge \frac{16e^2Bn_1n_2n_3}{10000Cr\mu^3}\Big] < n_{max}^{-(\beta+1)}$$
By a symmetrical argument, we can obtain the same probabilistic bound on $\|Y_3\|$.

6.4 Bounds on $\|Y_4\|$

Consider $\mathbb{E}\big[\mathrm{tr}((Y_4Y_4^T)^q)\big]$. Our hypergraph $H$ consists of $2q$ hyperedges of the form $(b,a,c')$ or $(c',a,b)$ from the top triangles of the hourglasses and $2q$ hyperedges of the form $(c,a,b')$ or $(b',a,c)$ from the bottom triangles of the hourglasses.
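The counting bound of Lemma 5.13, which the sums over intersection patterns above rely on, can be checked by brute force for small parameters. The sketch below computes the exact number of patterns via the standard recurrence for set partitions into blocks of size at least two (the recurrence itself is a well-known identity assumed here, not taken from the paper) and compares it against $\binom{2q'}{z}z^{2q'-z}$:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def patterns(n, z):
    """Number of intersection patterns on n triples with z distinct triples,
    each of multiplicity >= 2: set partitions of [n] into z blocks of size >= 2."""
    if n == 0:
        return 1 if z == 0 else 0
    if z == 0 or n < 2 * z:
        return 0
    # element n joins one of the z blocks of a valid partition of n-1 elements,
    # or founds a new size-2 block with one of the other n-1 elements
    return z * patterns(n - 1, z) + (n - 1) * patterns(n - 2, z - 1)

assert patterns(4, 2) == 3   # {12|34}, {13|24}, {14|23}
assert patterns(6, 3) == 15  # perfect matchings of 6 elements: 6!/(2^3 * 3!)

# Lemma 5.13: patterns(2q', z) <= C(2q', z) * z^(2q' - z)
for n in range(2, 13, 2):
    for z in range(1, n // 2 + 1):
        assert patterns(n, z) <= comb(n, z) * z ** (n - z)
```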
We only need to consider intersection patterns where each triple (and thus each hyperedge) appears with multiplicity at least two. For a given intersection pattern, let $z$ be the number of distinct hyperedges. Ignoring the $a$ vertices for now, we can think of $H$ as a graph on the $b$, $b'$, $c$, and $c'$ vertices. Note that the $(b,c')$ and $(c',b)$ edges are part of a single connected component and the $(c,b')$ and $(b',c)$ edges are part of a single connected component (these connected components may or may not be the same). We apply our bounds on $H$ as follows.

1. In our preprocessing step, when there are two hyperedges $e_1$ and $e_2$ which appear with odd multiplicity, we double one of these hyperedges or the other. Thus, we can assume that all hyperedges appear with even multiplicity.

2. We apply an entrywise bound $2q-z$ times on hyperedges of multiplicity $\ge 4$, reducing the multiplicity by $2$ each time.

3. After applying these entrywise bounds, all of our hyperedges have multiplicity $2$. We now sum over a free $b$, $b'$, $c$, or $c'$ vertex in $H$ whenever such a vertex exists. Otherwise, there must be a cycle on the $(b,c')$ and $(b',c)$ parts of the hyperedges, in which case we use the entrywise bound on one hyperedge of the cycle and delete it.

Definition 6.7. Let $x$ be the number of times we delete a hyperedge in a cycle using the entrywise bound.

Lemma 6.8. Let $k$ be the number of connected components of $H$. The total number of $b$, $b'$, $c$, and $c'$ vertices in $H$ is $z + k - x \le z + 2 - x$.

Proof. The proof is similar to the proof of Lemma 6.6. Observe that neither deleting a free vertex nor deleting an edge in a cycle can disconnect a connected component of $H$. Also, except for the final edge of a connected component, both of whose vertices will be free, every edge which has a free vertex has exactly one free vertex.
Thus, we delete an edge in a cycle $x$ times, removing $0$ vertices each time; we delete an edge with one free vertex $z - x - k$ times, removing $1$ vertex each time; and we delete the final edge of a connected component $k$ times, removing the final $2k$ vertices. This adds up to $z + k - x$ vertices in $H$. For the inequality, recall that $H$ has at most $2$ connected components, one for the $(b,c')$ edges and one for the $(c,b')$ edges. □

Finally, we bound the number of distinct $a$ indices.

Proposition 6.9. The number of distinct $a$ indices is at most $\frac{z}{2}$.

Proof. Note that by the definition of $Y_4$, every $a$ index must be part of at least two distinct hyperedges. □

Recall that the bound from the $\bar R_\Omega$ terms is $\big(\frac{n_1n_2n_3}{m}\big)^{4q-z}$ and our bound for the other terms is
$$\Big(\prod_{e\in H}\sqrt{b(e)}\Big)\Big(\frac{n_1}{\mu}\Big)^{\#\text{ of }a\text{ indices}}\Big(\frac{n_2}{\mu}\Big)^{\#\text{ of }b\text{ or }b'\text{ indices}}\Big(\frac{n_3}{\mu}\Big)^{\#\text{ of }c\text{ or }c'\text{ indices}}r^{\#\text{ of entrywise bounds used}}$$
where $b(e) = B$ for all our hyperedges. Summing over all $z \in [2,2q]$ and all intersection patterns using Lemma 5.13, our final bound is
$$2q\cdot2^{4q}\max_{z\in[2,2q],\,x\in[0,z-2]}\Big\{(2q)^{4q-z}\Big(\frac{n_1n_2n_3}{m}\Big)^{4q-z}B^{2q}\Big(\frac{n_1}{\mu}\Big)^{z/2}\Big(\frac{\max\{n_2,n_3\}}{\mu}\Big)^{z-2-x}\Big(\frac{n_2^2n_3^2}{\mu^4}\Big)r^{2q-z+x}\Big\}$$
Since $\mu r \le \min\{n_1,n_2,n_3\}$, the inner expression will be maximized either when $z = 2q$ and $x = 0$ or when $z = 2$ and $x = 0$. Again, we will always take $q$ to be between $\frac{\ln n_{max}}{2}$ and $\frac{n_{max}}{2}$, so our final bound on $\mathbb{E}\big[\mathrm{tr}((Y_4Y_4^T)^q)\big]$ is at most
$(4q)^{4q}\max\Big\{\Big(\frac{n_1n_2n_3\sqrt{n_1}\max\{n_2,n_3\}B}{m\mu^{3/2}\ln n_{max}}\Big)^{2q}\Big(\frac{n_{max}^3}{\mu^2}\Big),\ \Big(\frac{n_1^2n_2^2n_3^2rB}{m^2}\Big)$
${}^{2q}\,\frac{m^2}{r^2\mu^5n_1}\Big\}$

Since $m \ge 10000C(2+\beta)^2r\sqrt{n_1}\max\{n_2,n_3\}\mu^{3/2}\ln n_{max}$, we have that
$$\mathbb{E}\big[\mathrm{tr}((Y_4Y_4^T)^q)\big] < (16q^2)^{2q}\Big(\frac{n_1n_2n_3B}{10000C(2+\beta)^2r\mu^3(\ln n_{max})^2}\Big)^{2q}n_{max}^6$$
Using Corollary 5.3 with the appropriate parameters, we can show that for all $\beta > 0$,
$$\Pr\Big[\|Y_4\| \ge \frac{16e^2Bn_1n_2n_3}{10000Cr\mu^3}\Big] < n_{max}^{-(\beta+1)}$$
Putting our four bounds together with a union bound, for all $\beta > 0$,
$$\Pr\Big[\|Y\| \ge \frac{Bn_1n_2n_3}{Cr\mu^3}\Big] \le \Pr\Big[\|Y\| \ge \frac{64e^2Bn_1n_2n_3}{10000Cr\mu^3}\Big] < 4n_{max}^{-(\beta+1)}$$
as needed. □

7 Iterative tensor bounds

In this section, we show that with high probability, applying the operator $P\bar R_\Omega$ to an order-3 tensor $A$ improves our bounds on it, where we assume that $\Omega$ is chosen independently of $A$.

Theorem 7.1. If $A$ is a $B$-bounded tensor, $C \ge 1$, $\beta > 0$, and

1. $m \ge 10000C(2+\beta)^2n_{max}r\mu^2\ln n_{max}$

2. $m \ge 10000C(2+\beta)^2r\sqrt{n_1}\max\{n_2,n_3\}\mu^{3/2}\ln n_{max} \ge 10000C(2+\beta)^2r\sqrt{n_1n_2n_3}\,\mu^{3/2}\ln n_{max}$

3. $\mu r \le \min\{n_1,n_2,n_3\}$

then
$$\Pr\Big[P\bar R_\Omega A \text{ is not } \tfrac{B}{C}\text{-bounded}\Big] < 100n_{max}^{-(\beta+1)}$$

Proof. We first consider how $P$ acts on a tensor.

Definition 7.2.

1. Define $P_{UV}$ to be the projection onto $\mathrm{span}\{u_i\otimes v_i\otimes w : i\in[1,r]\}$.

2. Define $P_{UW}$ to be the projection onto $\mathrm{span}\{u_i\otimes v\otimes w_i : i\in[1,r]\}$.

3. Define $P_{VW}$ to be the projection onto $\mathrm{span}\{u\otimes v_i\otimes w_i : i\in[1,r]\}$.

4. Define $P_{UVW}$ to be the projection onto $\mathrm{span}\{u_i\otimes v_i\otimes w_i : i\in[1,r]\}$.

Proposition 7.3. $P = P_{UV} + P_{UW} + P_{VW} - 2P_{UVW}$

With this in mind, we break up the tensor $W = P\bar R_\Omega A$ into four parts and then obtain probabilistic bounds for each part. Theorem 7.1 will then follow from the union bound and the inequality $(a+b+c-2d)^2 \le 5(a^2+b^2+c^2+2d^2)$.

Definition 7.4.

1. Define $W^{UV} = P_{UV}\bar R_\Omega A$.

2. Define $W^{UW} = P_{UW}\bar R_\Omega A$.

3.
Define $W^{VW} = P_{VW}\bar R_\Omega A$.

4. Define $W^{UVW} = P_{UVW}\bar R_\Omega A$.

To analyze these parts, we re-express $P_{UV}$, $P_{UW}$, $P_{VW}$, $P_{UVW}$ in terms of matrices $UV$, $UW$, $VW$, $UVW$.

Definition 7.5.

1. Define $UV_{aba'b'} = \sum_{i=1}^r u_{ia}v_{ib}u_{ia'}v_{ib'}$

2. Define $UW_{aca'c'} = \sum_{i=1}^r u_{ia}w_{ic}u_{ia'}w_{ic'}$

3. Define $VW_{bcb'c'} = \sum_{i=1}^r v_{ib}w_{ic}v_{ib'}w_{ic'}$

4. Define $UVW_{abca'b'c'} = \sum_{i=1}^r u_{ia}v_{ib}w_{ic}u_{ia'}v_{ib'}w_{ic'}$

Proposition 7.6.

1. $UV$ is $\frac{r\mu^4}{n_1^2n_2^2}$-bounded.

2. $UW$ is $\frac{r\mu^4}{n_1^2n_3^2}$-bounded.

3. $VW$ is $\frac{r\mu^4}{n_2^2n_3^2}$-bounded.

4. $UVW$ is $\frac{r\mu^6}{n_1^2n_2^2n_3^2}$-bounded.

Proof. These bounds can be proved in the same way as Proposition 5.10. □

Proposition 7.7.

1. $W^{UV}_{abc} = \sum_{a',b'} UV_{aba'b'}\bar R_\Omega(a',b',c)A_{a'b'c}$

2. $W^{UW}_{abc} = \sum_{a',c'} UW_{aca'c'}\bar R_\Omega(a',b,c')A_{a'bc'}$

3. $W^{VW}_{abc} = \sum_{b',c'} VW_{bcb'c'}\bar R_\Omega(a,b',c')A_{ab'c'}$

4. $W^{UVW}_{abc} = \sum_{a',b',c'} UVW_{abca'b'c'}\bar R_\Omega(a',b',c')A_{a'b'c'}$

Proposition 7.8.

1. $\big(W^{UV}_{abc}\big)^2 = \sum_{a'_1,b'_1,a'_2,b'_2} UV_{aba'_1b'_1}UV_{aba'_2b'_2}\bar R_\Omega(a'_1,b'_1,c)\bar R_\Omega(a'_2,b'_2,c)A_{a'_1b'_1c}A_{a'_2b'_2c}$

2. $\big(W^{UW}_{abc}\big)^2 = \sum_{a'_1,c'_1,a'_2,c'_2} UW_{aca'_1c'_1}UW_{aca'_2c'_2}\bar R_\Omega(a'_1,b,c'_1)\bar R_\Omega(a'_2,b,c'_2)A_{a'_1bc'_1}A_{a'_2bc'_2}$

3. $\big(W^{VW}_{abc}\big)^2 = \sum_{b'_1,c'_1,b'_2,c'_2} VW_{bcb'_1c'_1}VW_{bcb'_2c'_2}\bar R_\Omega(a,b'_1,c'_1)\bar R_\Omega(a,b'_2,c'_2)A_{ab'_1c'_1}A_{ab'_2c'_2}$

4.
$\big(W^{UVW}_{abc}\big)^2 = \sum_{a'_1,b'_1,c'_1,a'_2,b'_2,c'_2} UVW_{abca'_1b'_1c'_1}UVW_{abca'_2b'_2c'_2}\bar R_\Omega(a'_1,b'_1,c'_1)\bar R_\Omega(a'_2,b'_2,c'_2)A_{a'_1b'_1c'_1}A_{a'_2b'_2c'_2}$

We need to probabilistically bound the expressions $\sum_{\text{subset of }\{a,b,c\}}\big(W^{UV,UW,VW,\text{ or }UVW}_{a,b,c}\big)^2$. For each expression which we need to probabilistically bound, we can obtain this bound by analyzing the expected value of its $q$th power using the techniques in Section 5 and then using a result similar to Corollary 5.3. We begin by probabilistically bounding $\big(W^{UVW}_{abc}\big)^2$. As the remaining bounds will all be very similar, rather than giving a full proof of the remaining bounds, we will only describe the few differences and what effect they have.

Lemma 7.9. For all $a,b,c$ and all $\beta > 0$, if $m \ge 10000C(2+\beta)^2n_{max}r\mu^2\ln n_{max}$ then
$$\Pr\Big[\big(W^{UVW}_{abc}\big)^2 \ge \frac{32e^2Br\mu}{10000Cn_{max}}\Big] < n_{max}^{-(\beta+4)}$$

Proof. Similar to before, we partition our sum based on the intersection pattern of which $(a'_i,b'_i,c'_i)$ are equal. Letting $z$ be the number of distinct triples $(a'_i,b'_i,c'_i)$, the contribution from the $\bar R_\Omega(a'_i,b'_i,c'_i)$ terms will be at most a factor of $\big(\frac{n_1n_2n_3}{m}\big)^{2q-z}$. Recall that for a given intersection pattern, our bound on the remaining terms is
$$\Big(\prod_{e\in H}\sqrt{b(e)}\Big)\Big(\frac{n_1}{\mu}\Big)^{\#\text{ of }a\text{ or }a'\text{ indices}}\Big(\frac{n_2}{\mu}\Big)^{\#\text{ of }b\text{ or }b'\text{ indices}}\Big(\frac{n_3}{\mu}\Big)^{\#\text{ of }c\text{ or }c'\text{ indices}}r^{\#\text{ of entrywise bounds used}}$$

Remark 7.10. Here we will only be summing over $a'$, $b'$, and $c'$ indices, but for other expressions we will be summing over $a$, $b$, and $c$ indices as well.
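The entrywise formula for $W^{UVW}$ from Proposition 7.7, whose square is the quantity bounded in Lemma 7.9, can be verified numerically against a direct implementation of the projection $P_{UVW} = \sum_i g_ig_i^T$ with $g_i = u_i\otimes v_i\otimes w_i$. This is only a sketch: the random factors and dimensions are arbitrary, and generic weights stand in for $\bar R_\Omega$.

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3, r = 3, 4, 5, 2
U = np.linalg.qr(rng.normal(size=(n1, r)))[0]  # columns u_i, orthonormal
V = np.linalg.qr(rng.normal(size=(n2, r)))[0]
W = np.linalg.qr(rng.normal(size=(n3, r)))[0]
A = rng.normal(size=(n1, n2, n3))
Rbar = rng.normal(size=(n1, n2, n3))  # generic stand-in for the weights R_Omega-bar
RA = Rbar * A                          # the tensor R_Omega-bar applied to A, entrywise

# W^{UVW} = P_UVW(RA) with P_UVW = sum_i g_i g_i^T, g_i = u_i (x) v_i (x) w_i
g = [np.einsum('a,b,c->abc', U[:, i], V[:, i], W[:, i]) for i in range(r)]
W_uvw = sum(gi * np.tensordot(gi, RA, axes=3) for gi in g)

# the same entry via UVW_{abc a'b'c'} = sum_i u_ia v_ib w_ic u_ia' v_ib' w_ic'
a, b, c = 1, 2, 3
entry = sum(
    sum(U[a, i] * V[b, i] * W[c, i] * U[ap, i] * V[bp, i] * W[cp, i] for i in range(r))
    * Rbar[ap, bp, cp] * A[ap, bp, cp]
    for ap in range(n1) for bp in range(n2) for cp in range(n3)
)
assert np.isclose(W_uvw[a, b, c], entry)
```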
In our hypergraph $H$, we will have hyperedges $(a'_i,b'_i,c'_i)$ corresponding to the tensor entries $A_{a'_ib'_ic'_i}$, and we will have hyperedges $(a,b,c,a'_i,b'_i,c'_i)$ corresponding to the matrix entries $UVW_{abca'_ib'_ic'_i}$. We have that
$$\prod_{e\in H}\sqrt{b(e)} = B^q\Big(\frac{r\mu^6}{n_1^2n_2^2n_3^2}\Big)^q$$
We apply our techniques to $H$ as follows.

1. Recall that in our preprocessing step, we can take a pair of hyperedges $e_1,e_2$ and replace them with either a doubled copy of $e_1$ or a doubled copy of $e_2$. Using this, we ensure that every hyperedge appears with even multiplicity. Here, we start with hyperedges $(a',b',c')$ where every distinct $(a',b',c')$ has multiplicity at least two, and hyperedges $(a,b,c,a',b',c')$ where every distinct $(a,b,c,a',b',c')$ has multiplicity at least two ($a,b,c$ are the same for all of these hyperedges). Thus, in our preprocessing step, we can ensure that all of the hyperedges $(a',b',c')$ and $(a,b,c,a',b',c')$ occur with even multiplicity and every distinct hyperedge has multiplicity at least two.

2. We apply an entrywise bound $q$ times to the $(a,b,c,a',b',c')$ hyperedges.

3. We apply an entrywise bound $q-z$ times on hyperedges $(a',b',c')$ of multiplicity $\ge 4$, reducing the multiplicity by $2$ each time. After doing this, all our hyperedges will have multiplicity $2$. We now ignore the $c'$ vertices and consider the graph on the $a'$, $b'$ vertices. We then sum over a free $a'$ or $b'$ vertex in $H$ whenever such a vertex exists. Otherwise, there must be a cycle (which could be a duplicated edge if we have hyperedges $(a',b',c'_1)$ and $(a',b',c'_2)$), in which case we use the entrywise bound on one edge of the cycle and delete it.

Definition 7.11.
Let $x$ be the number of times we delete an edge in a cycle using the entrywise bound.

Lemma 7.12. Let $k$ be the number of connected components of $H$. The total number of $a'$ and $b'$ vertices in $H$ is $z + k - x \le 2z - 2x$.

Proof. The first part can be proved in exactly the same way as Lemma 6.8. For the inequality, we need to show that $k \le z - x$. To see this, note that there are at most $z$ distinct edges, and every time we delete an edge in a cycle, this removes one edge without reducing the number of connected components. After removing all cycles (and no other edges), we must have at least as many edges left as we have connected components, so $z - x \ge k$, as needed. □

Summing over all $z \in [1,2q]$ and all intersection patterns using Lemma 5.13, and noting that there are at most $z$ $a',b',c'$ indices but we must have two fewer $a'$ or $b'$ indices for each time we delete an edge in a cycle using an entrywise bound, our final bound on $\mathbb{E}\big[\big((W^{UVW}_{abc})^2\big)^q\big]$ is
$$2q\cdot2^{2q}\max_{z\in[1,2q],\,x\in[0,z-1]}\Big\{q^{2q-z}\Big(\frac{n_1n_2n_3}{m}\Big)^{2q-z}\Big(\frac{Br^2\mu^6}{n_1^2n_2^2n_3^2}\Big)^q\Big(\frac{n_1n_2n_3}{\mu^3}\Big)^z\Big(\frac{\mu}{\min\{n_1,n_2\}}\Big)^{2x}r^{q-z+x}\Big\}$$
Since $m \gg rq$ and $\mu r \le \min\{n_1,n_2,n_3\}$, the inner expression will be maximized when $z = q$ and $x = 0$. Again, we will take $q$ to be between $\frac{\ln n_{max}}{2}$ and $\frac{n_{max}}{2}$, so our final bound on $\mathbb{E}\big[\big((W^{UVW}_{abc})^2\big)^q\big]$ is at most
$$(2q)^{2q}\Big(\frac{2Br^2\mu^3}{m\ln n_{max}}\Big)^q n_{max}$$
Since $m \ge 10000C(2+\beta)^2rn_{max}\mu^2\ln n_{max}$, we have that for all $a,b,c$,
$$\mathbb{E}\Big[\big((W^{UVW}_{abc})^2\big)^q\Big] < q^{2q}\Big(\frac{8Br\mu}{10000C(2+\beta)^2n_{max}(\ln n_{max})^2}\Big)^q n_{max}$$
To obtain our final probabilistic bound, we adapt Corollary 5.3 for non-negative scalar expressions.

Corollary 7.13.
For given $p \ge 1$, $r \ge 0$, $n > 0$, and $B > 0$, for a non-negative scalar expression $Z$, if $\mathbb{E}[Z^q] \le (q^pB)^{2q}n^r$ for all integers $q \ge 1$, then for all $\beta > 0$,
$$\Pr\Big[|Z| \ge B^2e^{2p}\Big(\frac{(r+\beta)}{2p}\ln n + 1\Big)^{2p}\Big] < n^{-\beta}$$

Proof. This can be proved in the same way as Corollary 5.3, except that $|Z|$ takes the place of $\|YY^T\|$, which is why the bound of Corollary 5.3 is squared. □

Using Corollary 7.13 with the appropriate parameters, for all $a,b,c$,
$$\Pr\Big[\big(W^{UVW}_{abc}\big)^2 \ge \frac{32e^2Br\mu}{10000Cn_{max}}\Big] < n_{max}^{-(\beta+4)}$$
□

The remaining bounds can be proved in a similar way, though there are a few differences. We now consider the remaining bounds involving $W^{UVW}$. When we average over at least one coordinate, our analysis is as follows:

1. The $(a,b,c,a',b',c')$ hyperedges no longer all have the same $(a,b,c)$. In fact, since the intersection patterns only specify which $(a'_i,b'_i,c'_i)$ are equal to each other, we treat all of the different $a,b,c$ as distinct indices.

2. For each $(a,b,c)$, we begin with two $(a,b,c,a',b',c')$ hyperedges which have this $(a,b,c)$ (though their $(a',b',c')$ may be different). To handle this, in our preprocessing step we take each such pair of $(a,b,c,a',b',c')$ hyperedges and double one or the other.

3. When averaging over the $a$, $b$, or $c$ indices, we avoid using entrywise bounds for any of the doubled $(a,b,c,a',b',c')$ hyperedges.

4. The analysis of the $(a',b',c')$ hyperedges is exactly the same.

Taking $Z$ to be the appropriate expression (for example, $Z = \frac{1}{n_1}\sum_a\big(W^{UVW}_{abc}\big)^2$ if we are only averaging over the $a$ index), our bound on $\mathbb{E}[Z^q]$ is affected as follows:

1. Avoiding the entrywise bounds on the $(a,b,c,a',b',c')$ hyperedges reduces our bound on $\mathbb{E}[Z^q]$ by a factor of $r^q$.

2.
If we average over $a$, this gives us $q$ additional $a$ indices to sum over, increasing our bound on $\mathbb{E}[Z^q]$ by a factor of $\big(\frac{n_1}{\mu}\big)^q$, but this also gives us a factor of $\frac{1}{n_1^q}$, so the net effect is to reduce our bound on $\mathbb{E}[Z^q]$ by a factor of $\mu^q$. Similar logic applies to $b$ and $c$, so each index we average over (including the first) reduces our bound on $\mathbb{E}[Z^q]$ by a factor of $\mu^q$.

This implies that each index we average over (including the first) reduces our final bound by a factor of $\mu$, and averaging over at least one index reduces our final bound by a further factor of $r$, as needed.

At this point, we just need to consider the bounds involving $W^{UV}$, as the remaining cases are symmetric. When we analyze $Z = \big(W^{UV}_{abc}\big)^2$ rather than $\big(W^{UVW}_{abc}\big)^2$, our analysis differs as follows. Instead of having $\big(\frac{Br^2\mu^6}{n_1^2n_2^2n_3^2}\big)^q$ in our bound on $\mathbb{E}[Z^q]$ from the $(a,b,c,a',b',c')$ hyperedges, we will have $\big(\frac{Br^2\mu^4}{n_1^2n_2^2}\big)^q$ from the $(a,b,a',b')$ hyperedges, increasing our bound on $\mathbb{E}[Z^q]$ by a factor of $\big(\frac{n_3^2}{\mu^2}\big)^q$. However, this is partially counteracted by the fact that we are no longer summing over the $c'$ indices separately from the $c$ indices, because we always have that $c'_i = c_i$. This removes a factor of $\big(\frac{n_3}{\mu}\big)^z$ from our bound on $\mathbb{E}[Z^q]$. Thus, our bound on $\mathbb{E}[Z^q]$ is now
$$2q\cdot2^{2q}\max_{z\in[1,2q],\,x\in[0,z-1]}\Big\{q^{2q-z}\Big(\frac{n_1n_2n_3}{m}\Big)^{2q-z}\Big(\frac{Br^2\mu^4}{n_1^2n_2^2}\Big)^q\Big(\frac{n_1n_2}{\mu^2}\Big)^z\Big(\frac{\mu}{\min\{n_1,n_2\}}\Big)^{2x}r^{q-z+x}\Big\}$$
We check that it is still optimal to take $z = q$ and $x = 0$. Since $r\mu \le \min\{n_1,n_2,n_3\}$, it is always optimal to take $x = 0$. Now if we reduce $z$ by $1$, this gives us a factor of at most $q\frac{n_1n_2n_3}{m}\cdot\frac{r\mu^2}{n_1n_2} = \frac{qr\mu^2n_3}{m}$.
We will take $q \le 10(1+\beta)\ln n_{max}$, and we have that $m \ge 10000C(2+\beta)^2n_{max}r\mu^2\ln n_{max}$, so it is indeed still optimal to take $z = q$ and $x = 0$. Thus, the net effect of the differences is a factor of $\big(\frac{n_3}{\mu}\big)^q$ in our bound on $\mathbb{E}[Z^q]$, which gives us a factor of $\frac{n_3}{\mu}$ in our final bound. This gives us the following bound.

Lemma 7.14. For all $a,b,c$ and all $\beta > 0$, if $m \ge 10000C(2+\beta)^2n_{max}r\mu^2\ln n_{max}$ then
$$\Pr\Big[\big(W^{UV}_{abc}\big)^2 \ge \frac{32e^2Br}{10000C}\Big] < n_{max}^{-(\beta+4)}$$

Finally, we consider what happens if we average over one or more of the $a$, $b$, and $c$ indices. If we average over the $a$ indices or average over the $b$ indices, then instead of using entrywise bounds on the $(a,b,a',b')$ hyperedges, the index or indices we average over will create free vertices, allowing us to bound the $(a,b,a',b')$ hyperedges without using any entrywise bounds. We can now use the same reasoning as before. The final case is if we only average over the $c$ indices.

Lemma 7.15. For all $a,b$ and all $\beta > 0$, if $m \ge 10000C(2+\beta)^2n_{max}r\mu^2\ln n_{max}$ then
$$\Pr\Big[\frac{1}{n_3}\sum_c\big(W^{UV}_{abc}\big)^2 \ge \frac{32e^2B}{10000C\mu}\Big] < n_{max}^{-(\beta+4)}$$

Proof. In this case, our preprocessing ensures that all distinct hyperedges appear with multiplicity which is even and at least two. Now, instead of first bounding the $(a,b,a',b')$ hyperedges and then bounding the $(a',b',c')$ hyperedges, we will first bound the $(a',b',c')$ hyperedges using all of the distinct $c$ indices and then bound the $(a,b,a',b')$ hyperedges using the distinct $a',b'$ indices. Letting $y = z - (\#\text{ of distinct }c')$, we have the following bounds on the number of $a'$, $b'$, $c$ indices and the number of times we will use an entrywise bound:

1. There are at most $z$ $a'$ indices and at most $z$ $b'$ indices.

2.
There are at most $2z - 2x$ $a'$ and $b'$ indices, where $x$ is the number of times we use an entrywise bound on $(a,b,a',b')$ hyperedges because of an $(a',b')$ edge in a cycle.
3. There are $(z - y)$ $c$ indices.
4. The total number of times that we will use an entrywise bound is $2q - 2z + x + y$.

Taking $Z = \frac{1}{n_3}\sum_c \left(W^{UV}_{abc}\right)^2$, this gives us a bound of
$$2^q \cdot 2^{2q}\max_{z \in [1,2q],\, x,y \in [0,z-1]}\left\{ q^{2q-z}\left(\frac{n_1 n_2 n_3}{m}\right)^{2q-z}\left(\frac{B r \mu^4}{n_1^2 n_2^2 n_3}\right)^q\left(\frac{n_1 n_2 n_3}{\mu^3}\right)^z\left(\frac{\mu}{\min\{n_1,n_2\}}\right)^{2x}\left(\frac{\mu}{n_3}\right)^y r^{2q-2z+x+y}\right\}$$
on $E[Z^q]$. We check that it is optimal to take $z = q$, $x = 0$, and $y = 0$. Since $r\mu \le \min\{n_1,n_2,n_3\}$, it is always optimal to take $x = y = 0$. Now if we reduce $z$ by $1$, this gives us a factor of at most $q\,\frac{n_1 n_2 n_3}{m}\cdot\frac{r^2\mu^3}{n_1 n_2 n_3} = \frac{q r^2 \mu^3}{m}$. We will take $q \le 10(1+\beta)\ln n_{\max}$ and we have that $r\mu \le \min\{n_1,n_2,n_3\}$ and $m \ge 10000 C (2+\beta)^2 n_{\max} r \mu^2 \ln n_{\max}$, so it is indeed optimal to take $z = q$, $x = 0$, and $y = 0$. Comparing the resulting bound to our bound on $E\left[\left(W^{UV}_{abc}\right)^{2q}\right]$, it is smaller by a factor of $(r\mu)^q$, so our final bound is smaller by a factor of $r\mu$, as needed. □

We now have all of our needed probabilistic bounds. Theorem 7.1 follows from the inequality
$$W_{abc}^2 \le 5\left(\left(W^{UV}_{abc}\right)^2 + \left(W^{UW}_{abc}\right)^2 + \left(W^{VW}_{abc}\right)^2 + 2\left(W^{UVW}_{abc}\right)^2\right)$$
and union bounds. □

8  Trace Power Calculation for $P'\bar{R}_\Omega A \otimes (P'\bar{R}_\Omega A)^T$

In this section, we prove the following theorem.

Theorem 8.1. If $A$ is $B$-bounded, $C \ge 1$, and
1. $m \ge 10000 C (2+\beta)^2 n_{\max} r \mu^2 \ln n_{\max}$
2. $m \ge 10000 C (2+\beta)^2 r \sqrt{n_1}\max\{n_2,n_3\}\,\mu^{3/2}\ln n_{\max} \ge 10000 C (2+\beta)^2 r \sqrt{n_1 n_2 n_3}\,\mu^{3/2}\ln n_{\max}$
3. $\mu r \le \min\{n_1,n_2,n_3\}$
then
$$\Pr\left[\|Y\| > \frac{B n_1 n_2 n_3}{C r \mu^3}\right] < 4 n^{-(\beta+1)}$$
whenever $Y$ is any of the following:
1.
Y  P U V ¯ R Ω A ⊗ ( P U V ¯ R Ω A ) T 2. Y  P U W ¯ R Ω A ⊗ ( P U W ¯ R Ω A ) T 3. Y  P V W ¯ R Ω A ⊗ ( P V W ¯ R Ω A ) T 4. Y  P U V W ¯ R Ω A ⊗ ( P U V W ¯ R Ω A ) T Proo f. This can be prov ed using t h e techniques of Sections 5 and 6 with one additional trick. W e first consider the P U V W case and then describe the differences for t he other cases. In all of these cases, w e will show that t he bound w e obtain on E  t r (( Y Y T ) q )  is much less than the bound w e obtained for E  t r (( Y 4 Y T 4 ) q )  in section 6 , which w as 2 q · 2 4 q max z ∈ [ 1 , 2 q ] , x ∈[ 0 , z − 2 ] ( ( 2 q ) 4 q − z  n 1 n 2 n 3 m  4 q − z B 2 q  n 1 µ  z 2  max { n 2 , n 3 } µ  z − 2 − x n 2 2 n 2 3 µ 4 ! r 2 q − z + x ) Thus, for sim plicity , for the remainder of the section, w e will a bsorb constants, functions of onl y q , and logar ithms into an ˜ O . Doing this and taking z  2 q , x  0 , the abo v e bound becomes  ˜ O  n 1 n 2 n 3 √ n 1 max { n 2 , n 3 } B m µ 3 2   2 q n 2 m a x When Y  P U V W ¯ R Ω A ⊗ ( P U V W ¯ R Ω A ) T , the structure of t r  ( Y Y T ) q  is as follo ws. W e ha v e ( a ′ , b ′ , c ′ ) hyperedg es and w e ha v e hyperedg es ( a , b , c , a ′ , b ′ , c ′ ) which w e can view as an outer triangle ( a , b , c ) and an inner triangle ( a ′ , b ′ , c ′ ) . The outer triangles form hourg lasses as before while the inne r triangles sit inside the outer triangles. The ¯ R Ω terms only in v olv e the ( a ′ , b ′ , c ′ ) triples so our intersection patterns onl y de- scribe these indices. Thus, w e sum ov er all of the a , b , c indice s freely . W e n ow use the follo wing additional trick. W e decompo se each U V W a b c a ′ b ′ c ′ as Í r i  1 u i a v i b w i c u i a ′ v i b ′ w i c ′ . N o w obser v e that ev er y v ertex in the outer triangles appe ars in tw o h ypered ges. Whe n w e sum o v er that v ertex, w e get a ter m such as Í a u i 1 a u i 2 a . This is 0 unless i 1  i 2 and is 1 if i 1  i 2 . 
This in fact forces a global choice for $i$ among the $UVW$ terms, giving a single factor of $r$ for the choices for this global $i$. This also means that the vertices in the outer triangles give a factor of exactly $1$, so they can be ignored! For the remaining terms of $UVW$, we use the bounds $u_{ia'}^2 \le \frac{\mu}{n_1}$, $v_{ib'}^2 \le \frac{\mu}{n_2}$, and $w_{ic'}^2 \le \frac{\mu}{n_3}$, obtaining a factor of $\left(\frac{\mu^3}{n_1 n_2 n_3}\right)^{2q}$.

We now consider the contribution from summing over the $a', b', c'$ vertices, the contribution from the $\bar{R}_\Omega$ terms, and the contribution from the entries of $A$. Letting $z$ be the number of distinct triples $(a',b',c')$ in the given intersection pattern, the contribution from the $\bar{R}_\Omega$ terms will be $\left(\frac{n_1 n_2 n_3}{m}\right)^{4q-z}$. The contribution from the entries of $A$ from the $b(e)$ is $B^{2q}$. Letting $x$ be the number of times that we have to use an entrywise bound on a doubled edge because it is in a cycle, we have the following bounds on the number of indices and the number of times we use an entrywise bound.
1. The number of distinct $a$ indices, the number of distinct $b$ indices, and the number of distinct $c$ indices are all at most $z$.
2. The total number of distinct indices is at most $3z - x$.
3. The number of times we use an entrywise bound is $2q - z + x$.

Putting everything together, we obtain a bound of
$$\tilde{O}\left(\max_{z \in [1,2q],\, x \in [0,z-1]}\left\{ r\left(\frac{\mu^3 B}{n_1 n_2 n_3}\right)^{2q}\left(\frac{n_1 n_2 n_3}{m}\right)^{4q-z}\left(\frac{n_1 n_2 n_3}{\mu^3}\right)^z\left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^x r^{2q-z+x}\right\}\right)$$
for $E[\mathrm{tr}((YY^T)^q)]$. This is maximized when $z = 2q$ and $x = 0$, leaving us with a bound of $\left(\tilde{O}\left(\frac{n_1 n_2 n_3 B}{m}\right)\right)^{2q} r$, which is much less than the bound of
$$\left(\tilde{O}\left(\frac{n_1 n_2 n_3\sqrt{n_1}\max\{n_2,n_3\}\, B}{m\,\mu^{3/2}}\right)\right)^{2q} n_{\max}^2$$
which we had for $E[\mathrm{tr}((Y_4 Y_4^T)^q)]$, so we get a correspondingly smaller norm bound, as needed.
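The collapse of the outer-triangle sums used in the proof above rests on the orthonormality contraction $\sum_a u_{i_1 a} u_{i_2 a} = \delta_{i_1 i_2}$. A minimal numerical sketch (the random orthonormal components built via QR here are an illustrative assumption, not the paper's setup):

```python
import numpy as np

rng = np.random.default_rng(0)
n, r = 8, 3

# Columns of u are orthonormal vectors u_1, ..., u_r in R^n.
u, _ = np.linalg.qr(rng.standard_normal((n, r)))

# Summing an outer-triangle vertex over its two incident hyperedges
# produces sum_a u_{i1 a} u_{i2 a}, which is the Gram matrix entry (i1, i2):
# it equals 1 when i1 == i2 and 0 otherwise.
gram = u.T @ u
assert np.allclose(gram, np.eye(r))

# Consequently, in a product of many such contractions only one global
# component index i survives, contributing a single factor of r overall.
```

This is exactly why the outer-triangle vertices can be ignored once the global index $i$ is fixed.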
The analysis is the same for the $P^{UV}$, $P^{UW}$, and $P^{VW}$ cases except for the following differences, which increase the bound on $E[\mathrm{tr}((YY^T)^q)]$ but still leave it much less than what we had for $E[\mathrm{tr}((Y_4 Y_4^T)^q)]$.
1. In the $P^{VW}$ case there are now two global indices, one for the top of the outer hourglasses and one for the bottom of the outer hourglasses. This gives us a global factor of $r^2$ rather than $r$.
2. Since one of the outer indices is now merged with the corresponding inner index, instead of the $UVW$ terms giving us factors of $\left(\frac{\mu}{n_1}\right)^{2q}$, $\left(\frac{\mu}{n_2}\right)^{2q}$, and $\left(\frac{\mu}{n_3}\right)^{2q}$ for the inner indices, we will only have two of these factors. This increases our bound on $E[\mathrm{tr}((YY^T)^q)]$ by a factor of at most $\left(\frac{n_{\max}}{\mu}\right)^{2q}$.

Putting these differences together, our bound will now be $\left(\tilde{O}\left(\frac{n_1 n_2 n_3 n_{\max} B}{m\,\mu}\right)\right)^{2q} r^2$, which is still much less than the bound we had for $E[\mathrm{tr}((Y_4 Y_4^T)^q)]$. □

References

[BBH+12] Boaz Barak, Fernando G. S. L. Brandão, Aram Wettroth Harrow, Jonathan A. Kelner, David Steurer, and Yuan Zhou, Hypercontractivity, sum-of-squares proofs, and their applications, STOC, ACM, 2012, pp. 307–326.
[BGL16] Vijay V. S. P. Bhattiprolu, Venkatesan Guruswami, and Euiwoong Lee, Certifying random polynomials over the unit sphere via sum of squares hierarchy, CoRR abs/1605.00903 (2016).
[BHK+16] Boaz Barak, Samuel B. Hopkins, Jonathan A. Kelner, Pravesh Kothari, Ankur Moitra, and Aaron Potechin, A nearly tight sum-of-squares lower bound for the planted clique problem, CoRR abs/1604.03084 (2016).
[BKS14] Boaz Barak, Jonathan A. Kelner, and David Steurer, Rounding sum-of-squares relaxations, STOC, ACM, 2014, pp. 31–40.
[BKS15] Boaz Barak, Jonathan A. Kelner, and David Steurer, Dictionary learning and tensor decomposition via the sum-of-squares method, STOC, ACM, 2015, pp. 143–151.
[BM16] Boaz Barak and Ankur Moitra, Noisy tensor completion via the sum-of-squares hierarchy, COLT, JMLR Workshop and Conference Proceedings, vol. 49, JMLR.org, 2016, pp. 417–445.
[BS14] Boaz Barak and David Steurer, Sum-of-squares proofs and the quest toward optimal algorithms, Electronic Colloquium on Computational Complexity (ECCC) 21 (2014), 59.
[BS15] Srinadh Bhojanapalli and Sujay Sanghavi, A new sampling technique for tensors, CoRR abs/1502.05023 (2015).
[CR09] Emmanuel J. Candès and Benjamin Recht, Exact matrix completion via convex optimization, Foundations of Computational Mathematics 9 (2009), no. 6, 717–772.
[CT10] Emmanuel J. Candès and Terence Tao, The power of convex relaxation: near-optimal matrix completion, IEEE Trans. Information Theory 56 (2010), no. 5, 2053–2080.
[FO07] Uriel Feige and Eran Ofek, Easily refutable subformulas of large random 3CNF formulas, Theory of Computing 3 (2007), no. 1, 25–43.
[FS12] Michael A. Forbes and Amir Shpilka, On identity testing of tensors, low-rank recovery and compressed sensing, STOC, ACM, 2012, pp. 163–172.
[GLM16] Rong Ge, Jason D. Lee, and Tengyu Ma, Matrix completion has no spurious local minimum, CoRR abs/1605.07272 (2016).
[GM15] Rong Ge and Tengyu Ma, Decomposing overcomplete 3rd order tensors using sum-of-squares algorithms, APPROX-RANDOM, LIPIcs, vol. 40, Schloss Dagstuhl - Leibniz-Zentrum fuer Informatik, 2015, pp. 829–849.
[Gri01a] Dima Grigoriev, Complexity of Positivstellensatz proofs for the knapsack, Computational Complexity 10 (2001), no. 2, 139–154.
[Gri01b] Dima Grigoriev, Linear lower bound on degrees of Positivstellensatz calculus proofs for the parity, Theor. Comput. Sci. 259 (2001), no. 1-2, 613–622.
[Gro11] David Gross, Recovering low-rank matrices from few coefficients in any basis, IEEE Trans. Information Theory 57 (2011), no. 3, 1548–1566.
[GV01] Dima Grigoriev and Nicolai Vorobjov, Complexity of Null- and Positivstellensatz proofs, Ann. Pure Appl. Logic 113 (2001), no. 1-3, 153–160.
[Har14] Moritz Hardt, Understanding alternating minimization for matrix completion, FOCS, IEEE Computer Society, 2014, pp. 651–660.
[HM16] Elad Hazan and Tengyu Ma, A non-generative framework and convex relaxations for unsupervised learning, CoRR abs/1610.01132 (2016).
[HSS15] Samuel B. Hopkins, Jonathan Shi, and David Steurer, Tensor principal component analysis via sum-of-square proofs, COLT, JMLR Workshop and Conference Proceedings, vol. 40, JMLR.org, 2015, pp. 956–1006.
[HSSS16] Samuel B. Hopkins, Tselil Schramm, Jonathan Shi, and David Steurer, Fast spectral algorithms from sum-of-squares proofs: tensor decomposition and planted sparse vectors, STOC, ACM, 2016, pp. 178–191.
[HW14] Moritz Hardt and Mary Wootters, Fast matrix completion without the condition number, COLT, JMLR Workshop and Conference Proceedings, vol. 35, JMLR.org, 2014, pp. 638–678.
[JN15] Prateek Jain and Praneeth Netrapalli, Fast exact matrix completion with finite samples, COLT, JMLR Workshop and Conference Proceedings, vol. 40, JMLR.org, 2015, pp. 1007–1034.
[JNS13] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi, Low-rank matrix completion using alternating minimization, STOC, ACM, 2013, pp. 665–674.
[JO14] Prateek Jain and Sewoong Oh, Provable tensor factorization with missing data, NIPS, 2014, pp. 1431–1439.
[Kho02] Subhash Khot, On the power of unique 2-prover 1-round games, STOC, ACM, 2002, pp. 767–775.
[KMO09] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh, Matrix completion from noisy entries, NIPS, Curran Associates, Inc., 2009, pp. 952–960.
[Las01] Jean B. Lasserre, Global optimization with polynomials and the problem of moments, SIAM J.
Optim. 11 (2000/01), no. 3, 796–817.
[MSS16] Tengyu Ma, Jonathan Shi, and David Steurer, Polynomial-time tensor decompositions with sum-of-squares, CoRR abs/1610.01980 (2016).
[MW15] Tengyu Ma and Avi Wigderson, Sum-of-squares lower bounds for sparse PCA, NIPS, 2015, pp. 1612–1620.
[Par00] Pablo A. Parrilo, Structured semidefinite programs and semialgebraic geometry methods in robustness and optimization, Ph.D. thesis, California Institute of Technology, 2000.
[Rec11] Benjamin Recht, A simpler approach to matrix completion, Journal of Machine Learning Research 12 (2011), 3413–3430.
[RRS16] Prasad Raghavendra, Satish Rao, and Tselil Schramm, Strongly refuting random CSPs below the spectral threshold, CoRR abs/1605.00058 (2016).
[Sho87] N. Z. Shor, An approach to obtaining global extrema in polynomial problems of mathematical programming, Kibernetika (Kiev) (1987), no. 5, 102–106, 136.
[SS05] Nathan Srebro and Adi Shraibman, Rank, trace-norm and max-norm, COLT, Lecture Notes in Computer Science, vol. 3559, Springer, 2005, pp. 545–560.
[Tro12] Joel A. Tropp, User-friendly tail bounds for sums of random matrices, Foundations of Computational Mathematics 12 (2012), no. 4, 389–434.

A  Controlling the kernel of matrix representations

We prove the following lemma in this section, which was an ingredient of the proof of Theorem 4.3. Let $\{u_i\}, \{v_i\}, \{w_i\}$ be three orthonormal bases of $\mathbb{R}^n$.

Lemma (Restatement of Lemma 4.15). Let $R$ be a self-adjoint linear operator on $\mathbb{R}^n \otimes \mathbb{R}^n$. Suppose $\langle (v_j \otimes w_k), R(v_i \otimes w_i)\rangle = 0$ for all indices $i, j, k \in [r]$ such that $i \in \{j, k\}$.
Then, there exists a self-adjoint linear operator $R'$ on $\mathbb{R}^n \otimes \mathbb{R}^n$ such that $R'(v_i \otimes w_i) = 0$ for all $i \in [r]$, the spectral norm of $R'$ satisfies $\|R'\| \le 10\|R\|$, and $R'$ represents the same polynomial in $\mathbb{R}[y,z]$:
$$\langle (y \otimes z), R'(y \otimes z)\rangle = \langle (y \otimes z), R(y \otimes z)\rangle.$$

Proof. We write
$$R(v_i \otimes w_i) = \sum_{jk} c_{iijk}\, v_j \otimes w_k$$
Then the condition on the bilinear form of $R$ implies that for all $i, j, k$, $c_{iiik} = 0$ and $c_{iiji} = 0$. We now take $Z$ to be the following matrix:
$$Z = \sum_{i,\, j \ne k} c_{iijk}\left((v_j \otimes w_k)(v_i \otimes w_i)^T - (v_j \otimes w_i)(v_i \otimes w_k)^T - (v_i \otimes w_k)(v_j \otimes w_i)^T + (v_i \otimes w_i)(v_j \otimes w_k)^T\right)$$
$$+ \sum_{i \ne j} \frac{c_{iijj}}{2}\left((v_j \otimes w_j)(v_i \otimes w_i)^T - (v_j \otimes w_i)(v_i \otimes w_j)^T - (v_i \otimes w_j)(v_j \otimes w_i)^T + (v_i \otimes w_i)(v_j \otimes w_j)^T\right)$$
It can be verified directly that $Z$ represents the $0$ polynomial and has the same behavior on each of the $(v_i \otimes w_i)$ as $R$. The factor of $\frac{1}{2}$ in the second sum comes from the fact that $c_{jjii} = c_{iijj}$ and the fourth term for $c_{jjii}$ matches the first term for $c_{iijj}$.

We choose $R' = R - Z$. In order to show the bound $\|R'\| \le 10\|R\|$, it is enough to show that $\|Z\| \le 9\|R\|$.

We analyze the norm of $Z$ as follows. We break $Z$ into parts according to each type of term and analyze each part separately. Define $X$ to be the subspace spanned by the $(v_i \otimes w_i)$, define $P_X$ to be the projection onto $X$, and define $P_X^{\perp}$ to be the projection onto the subspace orthogonal to $X$. For the part $\sum_{ijk} c_{iijk}(v_j \otimes w_k)(v_i \otimes w_i)^T$, note that $\sum_{ijk} c_{iijk}(v_j \otimes w_k)(v_i \otimes w_i)^T = P_X^{\perp} R P_X$, so it has norm at most $\|R\|$. For the part $\sum_{ijk} c_{iijk}(v_j \otimes w_i)(v_i \otimes w_k)^T$, note that under a change of basis this is equivalent to a block-diagonal matrix with blocks $\sum_{jk} c_{iijk}\, v_j w_k^T$.
The norm of each such block is at most its Frobenius norm, which is the norm of $\sum_{jk} c_{iijk}(v_j \otimes w_k) = R(v_i \otimes w_i)$. Thus, this part also has norm at most $\|R\|$. Using similar arguments, we can bound the norm of the other parts by $\|R\|$ as well, obtaining that $\|Z\| \le 8\|R\|$. □

B  Full Trace Power Calculation

In this section, we analyze $\left\|\sum_a A_a \otimes B_a^T\right\|$ where $A = (\bar{R}_{\Omega_l} P_l)\cdots(\bar{R}_{\Omega_1} P_1)(\bar{R}_{\Omega_0} X)$ or $A = P_{l+1}(\bar{R}_{\Omega_l} P_l)\cdots(\bar{R}_{\Omega_1} P_1)(\bar{R}_{\Omega_0} X)$ for some projection operators $P_1, \dots, P_l, P_{l+1}$ and $B = (\bar{R}_{\Omega_{l'}} P'_{l'})\cdots(\bar{R}_{\Omega_1} P'_1)(\bar{R}_{\Omega_0} X)$ or $B = P'_{l'+1}(\bar{R}_{\Omega_{l'}} P'_{l'})\cdots(\bar{R}_{\Omega_1} P'_1)(\bar{R}_{\Omega_0} X)$ for some projection operators $P'_1, \dots, P'_{l'}, P'_{l'+1}$. In particular, we prove the following theorem using the trace power method.

Theorem B.1. There is an absolute constant $C$ such that for any $\alpha \ge 1$ and $\beta > 0$,
$$\Pr\left[\left\|\sum_a A_a \otimes B_a^T\right\| > \alpha^{-(l+l'+2)}\right] < n^{-\beta}$$
as long as
1. $r\mu \le \min\{n_1, n_2, n_3\}$
2. $m \ge C\alpha\beta\,\mu^{3/2} r\sqrt{n_1}\max\{n_2,n_3\}\log(\max\{n_1,n_2,n_3\})$
3. $m \ge C\alpha\beta\,\mu^2 r\max\{n_1,n_2,n_3\}\log(\max\{n_1,n_2,n_3\})$

Remark B.2. In this draft, we only sketch the case where we do not have projection operators in front. To handle the cases where there are projection operators in front, we can use the same ideas that are sketched out in Section 8.

B.1  Term Structure

When we expand out the sums in $\mathrm{tr}\left(\left(\left(\sum_a A_a \otimes B_a^T\right)\left(\sum_a A_a \otimes B_a^T\right)^T\right)^q\right)$, our terms will have the following structure. We label the indices so that each $\bar{R}_{\Omega_j}$ operator has its own indices $(a_{ij}, b_{ij}, c_{ij})$ or $(a'_{ij}, b'_{ij}, c'_{ij})$. Many of these indices will be equal.
1. For all $i \in [0, l]$ and all $j \in [1, 2q]$ we have indices $(a_{ij}, b_{ij}, c_{ij})$ and a corresponding term $\bar{R}_{\Omega_i}(a_{ij}, b_{ij}, c_{ij})$ in the product.
2.
For all $i \in [0, l']$ and all $j \in [1, 2q]$ we have indices $(a'_{ij}, b'_{ij}, c'_{ij})$ and a corresponding term $\bar{R}_{\Omega_i}(a'_{ij}, b'_{ij}, c'_{ij})$ in the product.
3. For all $j \in [1, 2q]$ we have a term $X_{a_{0j} b_{0j} c_{0j}}$ and a term $X_{a'_{0j} b'_{0j} c'_{0j}}$ in the product.
4. For all $i \in [1, l]$ and all $j \in [1, 2q]$ we have a term $P_i(a_{ij}, b_{ij}, c_{ij}, a_{(i-1)j}, b_{(i-1)j}, c_{(i-1)j})$ in the product.
5. For all $i \in [1, l']$ and all $j \in [1, 2q]$ we have a term $P'_i(a'_{ij}, b'_{ij}, c'_{ij}, a'_{(i-1)j}, b'_{(i-1)j}, c'_{(i-1)j})$ in the product.

We represent the terms in the product graphically as follows.

Definition B.3.
1. For all $i$, we represent the terms $\bar{R}_{\Omega_i}(a_{ij}, b_{ij}, c_{ij})$ and $\bar{R}_{\Omega_i}(a'_{ij}, b'_{ij}, c'_{ij})$ by triangles. We call these triangles $R_i$-triangles and $R'_i$-triangles respectively.
2. For all $i$ and $j$,
(a) If $P_i = P^{UV}$ then we represent $P_i(a_{ij}, b_{ij}, c_{ij}, a_{(i-1)j}, b_{(i-1)j}, c_{(i-1)j})$ by a hyperedge $(a_{ij}, b_{ij}, a_{(i-1)j}, b_{(i-1)j})$. We call this hyperedge a $UV$-hyperedge.
(b) If $P_i = P^{UW}$ then we represent $P_i(a_{ij}, b_{ij}, c_{ij}, a_{(i-1)j}, b_{(i-1)j}, c_{(i-1)j})$ by a hyperedge $(a_{ij}, c_{ij}, a_{(i-1)j}, c_{(i-1)j})$. We call this hyperedge a $UW$-hyperedge.
(c) If $P_i = P^{VW}$ then we represent $P_i(a_{ij}, b_{ij}, c_{ij}, a_{(i-1)j}, b_{(i-1)j}, c_{(i-1)j})$ by a hyperedge $(b_{ij}, c_{ij}, b_{(i-1)j}, c_{(i-1)j})$. We call this hyperedge a $VW$-hyperedge.
(d) If $P_i = P^{UVW}$ then we represent $P_i(a_{ij}, b_{ij}, c_{ij}, a_{(i-1)j}, b_{(i-1)j}, c_{(i-1)j})$ by a hyperedge $(a_{ij}, b_{ij}, c_{ij}, a_{(i-1)j}, b_{(i-1)j}, c_{(i-1)j})$. We call this hyperedge a $UVW$-hyperedge.
We represent the $P'_i$ terms by hyperedges in a similar manner.
3.
For all $j \in [1, 2q]$, we represent the term $X_{a_{0j} b_{0j} c_{0j}}$ with a hyperedge $(a_{0j}, b_{0j}, c_{0j})$ and we represent the term $X_{a'_{0j} b'_{0j} c'_{0j}}$ with a hyperedge $(a'_{0j}, b'_{0j}, c'_{0j})$. We call these hyperedges $X$-hyperedges.

We have the following equalities among the indices:
1. For all $j \in [1, 2q]$, $a_{lj} = a'_{l'j}$.
2. For all $j \in [1, 2q]$, if $j$ is even then $b_{lj} = b_{l(j+1)}$ and $c'_{l'j} = c'_{l'(j+1)}$.
3. For all $j \in [1, 2q]$, if $j$ is odd then $c_{lj} = c_{l(j+1)}$ and $b'_{l'j} = b'_{l'(j+1)}$.
4. For all $i \in [1, l]$ and all $j \in [1, 2q]$, if $P_i = P^{UV}$ then $c_{ij} = c_{(i-1)j}$, if $P_i = P^{UW}$ then $b_{ij} = b_{(i-1)j}$, and if $P_i = P^{VW}$ then $a_{ij} = a_{(i-1)j}$.
5. For all $i \in [1, l']$ and all $j \in [1, 2q]$, if $P'_i = P^{UV}$ then $c'_{ij} = c'_{(i-1)j}$, if $P'_i = P^{UW}$ then $b'_{ij} = b'_{(i-1)j}$, and if $P'_i = P^{VW}$ then $a'_{ij} = a'_{(i-1)j}$.

B.2  Techniques

In this section, we describe how to bound the expected value of
$$\mathrm{tr}\left(\left(\left(\sum_a A_a \otimes B_a\right)\left(\sum_a A_a \otimes B_a\right)^T\right)^q\right)$$
We first consider the $\bar{R}_{\Omega_i}$ terms, which for a given choice of the indices are as follows:
$$\left(\prod_{i=0}^{l}\prod_{j=1}^{2q}\bar{R}_{\Omega_i}(a_{ij}, b_{ij}, c_{ij})\right)\left(\prod_{i=0}^{l'}\prod_{j=1}^{2q}\bar{R}_{\Omega_i}(a'_{ij}, b'_{ij}, c'_{ij})\right)$$
For a given choice of the indices, the expected value of this part can be bounded as follows.

Definition B.4. For all $i$, let $z_i$ be the number of distinct $R_i$-triangles and let $z'_i$ be the number of distinct $R'_i$-triangles. If a triangle appears as both an $R_i$-triangle and as an $R'_i$-triangle then it contributes $\frac{1}{2}$ to both $z_i$ and $z'_i$ (so the total number of distinct triangles at level $i$ is $z_i + z'_i$).

Lemma B.5. For a given choice of the indices $\{a_{ij}, b_{ij}, c_{ij}\}$ and $\{a'_{ij}, b'_{ij}, c'_{ij}\}$:
1.
If any triangle appears exactly once at some level $i$ then
$$E\left[\left(\prod_{i=0}^{l}\prod_{j=1}^{2q}\bar{R}_{\Omega_i}(a_{ij}, b_{ij}, c_{ij})\right)\left(\prod_{i=0}^{l'}\prod_{j=1}^{2q}\bar{R}_{\Omega_i}(a'_{ij}, b'_{ij}, c'_{ij})\right)\right] = 0$$
2. If for all $i$, all of the triangles which appear at level $i$ appear at least twice, then
$$0 < E\left[\left(\prod_{i=0}^{l}\prod_{j=1}^{2q}\bar{R}_{\Omega_i}(a_{ij}, b_{ij}, c_{ij})\right)\left(\prod_{i=0}^{l'}\prod_{j=1}^{2q}\bar{R}_{\Omega_i}(a'_{ij}, b'_{ij}, c'_{ij})\right)\right] \le \left(\frac{n_1 n_2 n_3}{m}\right)^{\sum_{i=0}^{l}(2q - z_i) + \sum_{i=0}^{l'}(2q - z'_i)}$$

Proof. If there is any triangle $(a, b, c)$ which appears exactly once in level $i$ then $\bar{R}_{\Omega_i}(a, b, c)$ has expectation $0$ and is independent of every other term in the product, so the entire product has expected value $0$. Otherwise, note that for $k \ge 1$, $0 < E[(R_{\Omega_i}(a,b,c))^k] \le \left(\frac{n_1 n_2 n_3}{m}\right)^{k-1}$. Further note that $R_{\Omega_i}(a,b,c)$ terms with either different $i$ or different $(a,b,c)$ are independent of each other. Thus, using this bound, each copy of a triangle beyond the first gives us a factor of $\frac{n_1 n_2 n_3}{m}$. The total number of factors which we obtain is the total number of triangles minus the number of distinct triangles (where triangles at different levels are automatically distinct), and the result follows. □

We now note that this bound holds for all sets of indices that follow the same intersection pattern of which $R_i$-triangles and $R'_i$-triangles are equal to each other. Thus, we can group all terms which have the same intersection pattern together, using this bound on all of them. Each such intersection pattern forces additional equalities between the indices. After taking these equalities into account, we must sum over the remaining distinct indices. We now analyze what happens with the remaining terms of the product as we sum over these indices.
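The key moment bound in the proof of Lemma B.5 can be sanity-checked under a simple sampling model. Assume, purely for illustration (the paper's exact definition of $\bar{R}_\Omega$ is fixed earlier in the paper), that each entry is observed independently with probability $p = m/(n_1 n_2 n_3)$ and $R_\Omega(a,b,c) = \frac{1}{p}\mathbf{1}[(a,b,c)\text{ observed}]$; then $E[R_\Omega(a,b,c)^k] = p^{1-k} = \left(\frac{n_1 n_2 n_3}{m}\right)^{k-1}$ exactly:

```python
import numpy as np

# Assumed Bernoulli sampling model (illustration only): each entry is
# observed independently with probability p = m / (n1 * n2 * n3), and
# R(a, b, c) = (1 / p) * indicator[(a, b, c) observed].
n1 = n2 = n3 = 10
m = 200
p = m / (n1 * n2 * n3)

# Exact moments: E[R^k] = p * (1/p)^k = (n1 n2 n3 / m)^(k - 1).
for k in range(1, 5):
    exact_moment = p * (1.0 / p) ** k
    assert np.isclose(exact_moment, (n1 * n2 * n3 / m) ** (k - 1))

# Monte Carlo check of the second moment E[R^2] = n1 n2 n3 / m.
rng = np.random.default_rng(1)
samples = (rng.random(2_000_000) < p) / p
assert abs(np.mean(samples**2) - n1 * n2 * n3 / m) < 0.5
```

Under this model the $k = 1$ case also shows why each triangle must appear at least twice for a nonzero contribution once $R$ is centered.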
We begin by considering how well we can bound the sum of squared entries of $X$ if we sum over $0$, $1$, $2$, or all $3$ indices.

Lemma B.6.
1. $\max_{abc}\{X_{abc}^2\} \le \frac{r^2\mu^3}{n_1 n_2 n_3}$
2. $\max_{bc}\left\{\sum_a X_{abc}^2\right\} \le \frac{r\mu^2}{n_2 n_3}$
3. $\max_{c}\left\{\sum_{a,b} X_{abc}^2\right\} \le \frac{r\mu}{n_3}$
4. $\sum_{a,b,c} X_{abc}^2 = r$

Proof. For the first statement,
$$X_{abc}^2 = \sum_{i,j} u_{ia} v_{ib} w_{ic} u_{ja} v_{jb} w_{jc} \le r^2 \max_i\{u_{ia}^2 v_{ib}^2 w_{ic}^2\} \le \frac{r^2\mu^3}{n_1 n_2 n_3}$$
For the second statement,
$$\sum_a X_{abc}^2 = \sum_{i,j}\left(\sum_a u_{ia} u_{ja}\right) v_{ib} w_{ic} v_{jb} w_{jc} = \sum_{i,a} u_{ia}^2 v_{ib}^2 w_{ic}^2 \le \frac{\mu^2}{n_2 n_3}\sum_{a,i} u_{ia}^2 = \frac{r\mu^2}{n_2 n_3}$$
For the third statement,
$$\sum_{a,b} X_{abc}^2 = \sum_{i,j,b}\left(\sum_a u_{ia} u_{ja}\right) v_{ib} w_{ic} v_{jb} w_{jc} = \sum_{i,a,b} u_{ia}^2 v_{ib}^2 w_{ic}^2 \le \frac{\mu}{n_3}\sum_i\left(\sum_a u_{ia}^2\right)\left(\sum_b v_{ib}^2\right) = \frac{r\mu}{n_3}$$
The final statement can be proved in a similar way. □

Note that every index we sum over reduces the average value by a factor of $\mu$. Further note that if we do not sum over any indices, there is an extra factor of $r$ in our bound. Following similar logic, similar statements hold for the $P_i$ and $P'_i$ terms.

We utilize this as follows. We start with a hypergraph $H$ which represents the current terms in our product. We first preprocess our product using the inequality $|ab| \le \frac{x a^2}{2} + \frac{b^2}{2x}$ (carefully choosing each $a$, $b$, and $x$) to make all of our hyperedges have even multiplicity. Note that when doing this, we cannot fully control which doubled hyperedges we will have; if we apply this on hyperedges $e_1$ and $e_2$ we could end up with two copies of $e_1$ or two copies of $e_2$. Now if we have a hyperedge with multiplicity $4$ or more, we use the entrywise bound to reduce its multiplicity by $2$. For example, if our sum was $\sum_a X_{abc}^4$ then we would use the inequality
$$\sum_a X_{abc}^4 \le \max_{abc}\{X_{abc}^2\}\sum_a X_{abc}^2$$
to bound this sum.
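The bounds of Lemma B.6 are easy to verify numerically; the random orthonormal components and the empirical incoherence $\mu$ below are assumptions made only for this sketch:

```python
import numpy as np

rng = np.random.default_rng(2)
n1, n2, n3, r = 7, 8, 9, 3

# Orthonormal component vectors u_i, v_i, w_i (stored as columns).
u, _ = np.linalg.qr(rng.standard_normal((n1, r)))
v, _ = np.linalg.qr(rng.standard_normal((n2, r)))
w, _ = np.linalg.qr(rng.standard_normal((n3, r)))

# X_{abc} = sum_i u_{ia} v_{ib} w_{ic}
X = np.einsum('ai,bi,ci->abc', u, v, w)

# Smallest mu with u_{ia}^2 <= mu/n1, v_{ib}^2 <= mu/n2, w_{ic}^2 <= mu/n3.
mu = max(n1 * (u**2).max(), n2 * (v**2).max(), n3 * (w**2).max())

X2 = X**2
assert np.isclose(X2.sum(), r)                               # statement 4, exact
assert X2.max() <= r**2 * mu**3 / (n1 * n2 * n3) + 1e-9      # statement 1
assert X2.sum(axis=0).max() <= r * mu**2 / (n2 * n3) + 1e-9  # statement 2
assert X2.sum(axis=(0, 1)).max() <= r * mu / n3 + 1e-9       # statement 3
```

Statement 4 holds with equality because $\sum_{a,b,c} X_{abc}^2 = \sum_{i,j}\langle u_i,u_j\rangle\langle v_i,v_j\rangle\langle w_i,w_j\rangle = r$ for orthonormal components.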
Once every hyperedge appears with power $2$, we choose an ordering for how we will bound the hyperedges. For each hyperedge, we sum over all indices which are currently only incident with that hyperedge, take the appropriate bound, and then delete the hyperedge and these indices from our current hypergraph $H$. We account for all of this with the following definitions:

Definition B.7.
1. Define the base value of an $X$-hyperedge $e$ to be $v(e) = \sqrt{\frac{r\mu^3}{n_1 n_2 n_3}}$
2. Define the base value of a $UV$-hyperedge $e$ to be $v(e) = \sqrt{\frac{r\mu^4}{n_1^2 n_2^2}}$
3. Define the base value of a $UW$-hyperedge $e$ to be $v(e) = \sqrt{\frac{r\mu^4}{n_1^2 n_3^2}}$
4. Define the base value of a $VW$-hyperedge $e$ to be $v(e) = \sqrt{\frac{r\mu^4}{n_2^2 n_3^2}}$
5. Define the base value of a $UVW$-hyperedge $e$ to be $v(e) = \sqrt{\frac{r\mu^6}{n_1^2 n_2^2 n_3^2}}$

Definition B.8. We say that an index in our hypergraph $H$ is free if it is incident with at most one hyperedge.

For a given intersection pattern, assuming that every vertex is incident with at least one hyperedge after the preprocessing, our final bound will be
$$\left(\frac{n_1 n_2 n_3}{m}\right)^{\sum_{i=0}^{l}(2q-z_i)+\sum_{i=0}^{l'}(2q-z'_i)}\left(\prod_e v(e)\right)\left(\frac{n_1}{\mu}\right)^{\#\text{ of }a\text{ indices}}\left(\frac{n_2}{\mu}\right)^{\#\text{ of }b\text{ indices}}\left(\frac{n_3}{\mu}\right)^{\#\text{ of }c\text{ indices}} r^{\#\text{ of doubled hyperedges we bound with no free index}}$$
To see this, note that from the discussion above, when an index $a$, $b$, or $c$ is free and we sum over it, we obtain $n_1$, $n_2$, or $n_3$ terms respectively, but this also reduces the current bound we are using by a factor of $\mu$. This will happen precisely one time for every index which is incident to at least one edge. Thus, for this part the ordering doesn't really matter. However, there is an extra factor of $r$ whenever we bound a doubled hyperedge with no free index (including when this hyperedge has multiplicity $4$ or higher and we reduce its multiplicity by $2$). We want to avoid this extra factor of $r$ as much as possible.
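As a consistency check on Definition B.7 and the accounting above (the parameter values below are arbitrary illustrative assumptions): a doubled $X$-hyperedge whose three indices are all free contributes $v(e)^2\cdot\frac{n_1}{\mu}\cdot\frac{n_2}{\mu}\cdot\frac{n_3}{\mu} = r$, matching $\sum_{a,b,c}X_{abc}^2 = r$ from Lemma B.6, while with no free index the extra factor of $r$ recovers the entrywise bound:

```python
import math

# Arbitrary illustrative parameters (not taken from the paper).
n1, n2, n3, r, mu = 50, 60, 70, 4, 2.0

v_X = math.sqrt(r * mu**3 / (n1 * n2 * n3))    # base value of an X-hyperedge
v_UV = math.sqrt(r * mu**4 / (n1**2 * n2**2))  # base value of a UV-hyperedge

# Doubled X-hyperedge, all three indices free: matches sum_{a,b,c} X^2 = r.
assert math.isclose(v_X**2 * (n1 / mu) * (n2 / mu) * (n3 / mu), r)

# Doubled X-hyperedge, no free index: the extra factor of r matches the
# entrywise bound max X^2 <= r^2 mu^3 / (n1 n2 n3) from Lemma B.6.
assert math.isclose(v_X**2 * r, r**2 * mu**3 / (n1 * n2 * n3))

# Analogous check for a doubled UV-hyperedge with no free index (assuming
# the analogous entrywise bound r^2 mu^4 / (n1^2 n2^2) for P^{UV} entries).
assert math.isclose(v_UV**2 * r, r**2 * mu**4 / (n1**2 * n2**2))
```

This is why the ordering of the hyperedge deletions does not affect the $\mu$ accounting, only the placement of the extra factors of $r$.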
We describe how to do this in Subsection B.4.

Remark B.9. When summing over an index, we may not actually sum over all possibilities because this could create equalities between triangles which should not be equal according to the intersection pattern. However, adding in these missing terms can only increase the sum, so it is still an upper bound.

B.3  Bounding the number of indices

In this subsection, we describe bounds on the number of each type of index for a given intersection pattern. We then define a coefficient $\Delta$ which is the discrepancy between our bounds and the actual number of indices, and we reexpress our bound in terms of $\Delta$.

We make the following simplifying assumptions about our sums:
1. For all $i \in [0, l']$, we either have that $a'_{ij} = a_{ij}$ for all $j \in [1, 2q]$ or $a'_{ij} \ne a_{ij}$ for all $j \in [1, 2q]$.
2. For all $i \in [0, l']$, we either have that $b'_{ij} = b_{ij}$ for all $j \in [1, 2q]$ or $b'_{ij} \ne b_{ij}$ for all $j \in [1, 2q]$.
3. For all $i \in [0, l']$, we either have that $c'_{ij} = c_{ij}$ for all $j \in [1, 2q]$ or $c'_{ij} \ne c_{ij}$ for all $j \in [1, 2q]$.
Moreover, all of these choices are fixed beforehand. We justify this assumption with a random partitioning argument in Subsection B.5.

With this setup, we first bound the number of each type of index which appears.

Definition B.10. For all $i$,
1. We define $x_{ia}$ to be the number of distinct indices $a_{ij}$ which do not appear at a higher level, we define $x_{ib}$ to be the number of distinct indices $b_{ij}$ which do not appear at a higher level, and we define $x_{ic}$ to be the number of distinct indices $c_{ij}$ which do not appear at a higher level.
2.
We define $x'_{ia}$ to be the number of distinct indices $a'_{ij}$ which do not appear at a higher level, we define $x'_{ib}$ to be the number of distinct indices $b'_{ij}$ which do not appear at a higher level, and we define $x'_{ic}$ to be the number of distinct indices $c'_{ij}$ which do not appear at a higher level.

In the case where we have an equality $a'_{ij} = a_{ij}$ and this index does not appear at a higher level, we instead count it as $\frac{1}{2}$ for $x_{ia}$ and $\frac{1}{2}$ for $x'_{ia}$ (and similarly for $b$ and $c$).

Recall that we defined $z_i$ to be the number of distinct $R_i$-triangles and we defined $z'_i$ to be the number of distinct $R'_i$-triangles. $z_i$ and $z'_i$ give the following bounds on the coefficients.

Proposition B.11.
1. For all $i < l$, $x_{ia} \le z_i$, $x_{ib} \le z_i$, and $x_{ic} \le z_i$.
2. For all $i < l'$, $x'_{ia} \le z'_i$, $x'_{ib} \le z'_i$, and $x'_{ic} \le z'_i$.
3. If $l' \ne l$, $b'_{l'j} \ne b_{lj}$, or $c'_{l'j} \ne c_{lj}$:
(a) $x_{la} + x'_{l'a} \le \min\{z_l, z'_{l'}\}$
(b) $x_{lb} + x_{lc} \le z_l + 1$
(c) $x'_{l'b} + x'_{l'c} \le z'_{l'} + 1$
4. In the special case that $l' = l$, $b'_{lj} = b_{lj}$, and $c'_{lj} = c_{lj}$:
(a) $x_{la} + x'_{l'a} = z_l + z'_{l'}$
(b) $x_{lb} + x_{lc} = x'_{l'b} + x'_{l'c} = 1$

Proof. The first two statements and 3(a) follow from the observation that distinct vertices must be in distinct triangles. For 3(b), note that if we take the $(b, c)$ edges from each $R_l$-triangle, the resulting graph is connected. Thus, each distinct such edge (which must come from a distinct triangle) after the first edge can only add one new vertex, and the result follows. 3(c) can be proved analogously. For the fourth statement, note that in this case all of the $b_{lj}$ and $b'_{lj}$ indices are equal to a single index $b$ and all of the $c_{lj}$ and $c'_{lj}$ indices are equal to a single index $c$. Thus, the number of distinct $a_{lj}$ is equal to the number of distinct $R_l$- and $R'_l$-triangles.
□

With these bounds in mind, we define $x^{\max}$ coefficients which represent the maximum number of distinct indices we can expect (given the structure of $A$ and $B$ and the values $z_i$, $z'_i$) and $\Delta$ coefficients which describe the discrepancy between this maximum and the number of distinct indices which we actually have.

Definition B.12.
1. For all $i < l$,
(a) If $P_{i+1} = P^{UV}$ then we define $x^{\max}_{ia} = x^{\max}_{ib} = z_i$. We define $\Delta_{ia} = x^{\max}_{ia} - x_{ia}$, $\Delta_{ib} = x^{\max}_{ib} - x_{ib}$, and $\Delta_{ic} = 0$.
(b) If $P_{i+1} = P^{UW}$ then we define $x^{\max}_{ia} = x^{\max}_{ic} = z_i$. We define $\Delta_{ia} = x^{\max}_{ia} - x_{ia}$, $\Delta_{ic} = x^{\max}_{ic} - x_{ic}$, and $\Delta_{ib} = 0$.
(c) If $P_{i+1} = P^{VW}$ then we define $x^{\max}_{ib} = x^{\max}_{ic} = z_i$. We define $\Delta_{ib} = x^{\max}_{ib} - x_{ib}$, $\Delta_{ic} = x^{\max}_{ic} - x_{ic}$, and $\Delta_{ia} = 0$.
(d) If $P_{i+1} = P^{UVW}$ then we define $x^{\max}_{ia} = x^{\max}_{ib} = x^{\max}_{ic} = z_i$. We define $\Delta_{ia} = x^{\max}_{ia} - x_{ia}$, $\Delta_{ib} = x^{\max}_{ib} - x_{ib}$, and $\Delta_{ic} = x^{\max}_{ic} - x_{ic}$.
2. For all $i < l'$,
(a) If $P'_{i+1} = P^{UV}$ then we define $x'^{\max}_{ia} = x'^{\max}_{ib} = z'_i$. We define $\Delta'_{ia} = x'^{\max}_{ia} - x'_{ia}$, $\Delta'_{ib} = x'^{\max}_{ib} - x'_{ib}$, and $\Delta'_{ic} = 0$.
(b) If $P'_{i+1} = P^{UW}$ then we define $x'^{\max}_{ia} = x'^{\max}_{ic} = z'_i$. We define $\Delta'_{ia} = x'^{\max}_{ia} - x'_{ia}$, $\Delta'_{ic} = x'^{\max}_{ic} - x'_{ic}$, and $\Delta'_{ib} = 0$.
(c) If $P'_{i+1} = P^{VW}$ then we define $x'^{\max}_{ib} = x'^{\max}_{ic} = z'_i$. We define $\Delta'_{ib} = x'^{\max}_{ib} - x'_{ib}$, $\Delta'_{ic} = x'^{\max}_{ic} - x'_{ic}$, and $\Delta'_{ia} = 0$.
(d) If $P'_{i+1} = P^{UVW}$ then we define $x'^{\max}_{ia} = x'^{\max}_{ib} = x'^{\max}_{ic} = z'_i$. We define $\Delta'_{ia} = x'^{\max}_{ia} - x'_{ia}$, $\Delta'_{ib} = x'^{\max}_{ib} - x'_{ib}$, and $\Delta'_{ic} = x'^{\max}_{ic} - x'_{ic}$.
3.
If $l' \ne l$, $b'_{l'j} \ne b_{lj}$, or $c'_{l'j} \ne c_{lj}$, then we define $x^{max}_{ll'a} = \min\{z_l, z'_{l'}\}$, $x^{max}_{lbc} = z_l + 1$, and $x'^{max}_{l'bc} = z'_{l'} + 1$. In the special case where $l' = l$, $b'_{l'j} = b_{lj}$, and $c'_{l'j} = c_{lj}$, we define $x^{max}_{ll'a} = z_l + z'_l$ and $x^{max}_{lbc} = x'^{max}_{l'bc} = 1$. In both of these cases, we define $\Delta_{ll'a} = x^{max}_{ll'a} - x_{la} - x'_{l'a}$, $\Delta_{lbc} = x^{max}_{lbc} - x_{lb} - x_{lc}$, and $\Delta'_{l'bc} = x'^{max}_{l'bc} - x'_{l'b} - x'_{l'c}$.

Definition B.13. We define
$$\Delta = \Delta_{ll'a} + \Delta_{lbc} + \Delta'_{l'bc} + \sum_{i=0}^{l-1}(\Delta_{ia} + \Delta_{ib} + \Delta_{ic}) + \sum_{i=0}^{l'-1}(\Delta'_{ia} + \Delta'_{ib} + \Delta'_{ic})$$

We now re-express our bound in terms of $\Delta$.

Lemma B.14. For a given intersection pattern and choices for the equalities or inequalities between the $a_{ij}, b_{ij}, c_{ij}$ and $a'_{ij}, b'_{ij}, c'_{ij}$ indices, we can obtain a bound which is a product of
$$\frac{n_2 n_3}{\mu^2}\left(\frac{\mu}{\min\{n_1, n_2, n_3\}}\right)^{\Delta} r^{\#\text{ of doubled hyperedges we bound with no free index}\,-\,\left(\sum_{i=0}^{l}(2q - z_i) + \sum_{i=0}^{l'}(2q - z'_i)\right)}$$
and terms of the form $\frac{r\mu^{3/2}\sqrt{n_1}\max\{n_2,n_3\}}{m}$, $\frac{r\mu^2\max\{n_1,n_2,n_3\}}{m}$, or $\frac{r\mu^3}{m}$.

Proof. Recall that our bound was
$$\left(\frac{n_1 n_2 n_3}{m}\right)^{\sum_{i=0}^{l}(2q - z_i) + \sum_{i=0}^{l'}(2q - z'_i)} \prod_e v(e) \left(\frac{n_1}{\mu}\right)^{\#\text{ of }a\text{ indices}} \left(\frac{n_2}{\mu}\right)^{\#\text{ of }b\text{ indices}} \left(\frac{n_3}{\mu}\right)^{\#\text{ of }c\text{ indices}} r^{\#\text{ of doubled hyperedges we bound with no free index}}$$
For all $i < l$, we consider the part of this bound which comes from $P_{i+1}$ and the indices $a_i, b_i, c_i$ which do not appear at a higher level. Similarly, for all $i < l'$, we consider the part of this bound which comes from $P'_{i+1}$ and the indices $a'_i, b'_i, c'_i$ which do not appear at a higher level. Finally, we consider the part of this bound that comes from the $X$-hyperedges, the $R_l$-triangles, the $R'_{l'}$-triangles, and their indices.
1.
If $P_{i+1} = P_{UV}$ then we can decompose the corresponding terms into the following parts:
   (a) $\left(\frac{r\mu^4}{n_1^2 n_2^2}\right)^q$ from the hyperedges.
   (b) $\left(\frac{n_1 n_2}{\mu^2}\right)^q$ from the $q$ potential new $a$ and $b$ indices.
   (c) $\left(\frac{n_1 n_2 n_3}{m}\right)^q$ from the $q$ potential distinct triangles.
   (d) $\left(r \cdot \frac{\mu^2}{n_1 n_2} \cdot \frac{n_1 n_2 n_3}{m}\right)^{q - z_i} = \left(\frac{r\mu^2 n_3}{m}\right)^{q - z_i}$ from the actual number of distinct triangles, the corresponding reduced maximum number of potential new indices, and the factors of $r$ which we take from $r^{\#\text{ of doubled hyperedges we bound with no free index}}$.
   (e) $\left(\frac{\mu}{n_1}\right)^{\Delta_{ia}}\left(\frac{\mu}{n_2}\right)^{\Delta_{ib}}\left(\frac{\mu}{n_3}\right)^{\Delta_{ic}} \le \left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ia} + \Delta_{ib} + \Delta_{ic}}$ from the actual number of new indices which we have.
   Putting everything together, we obtain
   $$\left(\frac{r\mu^2 n_3}{m}\right)^{2q - z_i}\left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ia} + \Delta_{ib} + \Delta_{ic}}$$
   Similar arguments apply if $P_{i+1} = P_{UW}$ or $P_{VW}$.
2. If $P_{i+1} = P_{UVW}$ then we can decompose the corresponding terms into the following parts:
   (a) $\left(\frac{r\mu^6}{n_1^2 n_2^2 n_3^2}\right)^q$ from the hyperedges.
   (b) $\left(\frac{n_1 n_2 n_3}{\mu^3}\right)^q$ from the $q$ potential new $a$, $b$, and $c$ indices.
   (c) $\left(\frac{n_1 n_2 n_3}{m}\right)^q$ from the $q$ potential distinct triangles.
   (d) $\left(r \cdot \frac{\mu^3}{n_1 n_2 n_3} \cdot \frac{n_1 n_2 n_3}{m}\right)^{q - z_i} = \left(\frac{r\mu^3}{m}\right)^{q - z_i}$ from the actual number of distinct triangles, the corresponding reduced maximum number of potential new indices, and the factors of $r$ which we take from $r^{\#\text{ of doubled hyperedges we bound with no free index}}$.
   (e) $\left(\frac{\mu}{n_1}\right)^{\Delta_{ia}}\left(\frac{\mu}{n_2}\right)^{\Delta_{ib}}\left(\frac{\mu}{n_3}\right)^{\Delta_{ic}} \le \left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ia} + \Delta_{ib} + \Delta_{ic}}$ from the actual number of new indices which we have.
   Putting everything together, we obtain
   $$\left(\frac{r\mu^3}{m}\right)^{2q - z_i}\left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ia} + \Delta_{ib} + \Delta_{ic}}$$
   Similar arguments hold for the $P'$ terms.
3. If $l' \ne l$, $b'_{lj} \ne b_{lj}$, or $c'_{lj} \ne c_{lj}$, then our remaining terms are as follows:
   (a) $\left(\frac{r\mu^3}{n_1 n_2 n_3}\right)^{2q}$ from the hyperedges.
   (b) $\frac{n_2 n_3}{\mu^2}\left(\frac{n_1 (\max\{n_2, n_3\})^2}{\mu^3}\right)^q$ from the $q$ potential $a$ indices and $2q + 2$ potential $b$ or $c$ indices (which must include at least one $b$ index and at least one $c$ index).
   (c) $\left(\frac{n_1 n_2 n_3}{m}\right)^{2q}$ from the $2q$ potential distinct triangles.
   (d) $\left(r \cdot \frac{\mu^{3/2}}{\sqrt{n_1}\max\{n_2,n_3\}} \cdot \frac{n_1 n_2 n_3}{m}\right)^{2q - z_l - z'_{l'}} \le \left(\frac{r\mu^{3/2}\sqrt{n_1}\max\{n_2,n_3\}}{m}\right)^{2q - z_l - z'_{l'}}$ from the actual number of distinct triangles, the corresponding reduced maximum number of potential new indices, and the factors of $r$ which we take from $r^{\#\text{ of doubled hyperedges we bound with no free index}}$.
   (e) $\left(\frac{\mu}{n_1}\right)^{\Delta_{ll'a}}\left(\frac{\mu}{\max\{n_2,n_3\}}\right)^{\Delta_{lbc} + \Delta'_{l'bc}} \le \left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ll'a} + \Delta_{lbc} + \Delta'_{l'bc}}$ from the actual number of new indices which we have.
   Putting everything together, we obtain
   $$\frac{n_2 n_3}{\mu^2}\left(\frac{r\mu^{3/2}\sqrt{n_1}\max\{n_2,n_3\}}{m}\right)^{4q - z_l - z'_{l'}}\left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ll'a} + \Delta_{lbc} + \Delta'_{l'bc}}$$
4. In the special case that $l' = l$, $b'_{lj} = b_{lj}$, and $c'_{lj} = c_{lj}$, we have the same terms except that now there is only one $b$ and one $c$ index and there are $2q$ potential $a$ indices. Following similar logic, we obtain a bound of
$$\frac{n_2 n_3}{\mu^2}\left(\frac{r\mu^2 n_1}{m}\right)^{4q - z_l - z'_{l'}}\left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ll'a} + \Delta_{lbc} + \Delta'_{l'bc}}$$
$\square$

With this lemma in hand, to show our bound it is sufficient to show that we can choose an ordering on the hyperedges such that the number of times we bound a doubled hyperedge without a free index is at most $\Delta + \sum_{i=0}^{l}(2q - z_i) + \sum_{i=0}^{l'}(2q - z'_i)$.

B.4 Choosing an ordering

In this section, we describe how to choose a good ordering for bounding the hyperedges.

Lemma B.15.
For any structure for $A$ and $B$ (including equalities or inequalities between $a_{ij}, b_{ij}, c_{ij}$ and $a'_{ij}, b'_{ij}, c'_{ij}$) and any intersection pattern, there is a way to double the hyperedges using the inequality $|ab| \le \frac{x}{2}a^2 + \frac{1}{2x}b^2$ and then bound the doubled hyperedges one by one so that:
1. After doubling the hyperedges, every index is part of at least one hyperedge.
2. The number of times that we bound a doubled hyperedge without a free index is at most $\Delta + \sum_{i=0}^{l}(2q - z_i) + \sum_{i=0}^{l'}(2q - z'_i)$.

Proof. To double the $X$-hyperedges, we choose pairs of $X$-hyperedges corresponding to the same triangle. This guarantees us at least one doubled hyperedge for every triangle at level $0$. We double any remaining $X$-hyperedges arbitrarily.

We show by induction on $i$ that we cover all indices with these hyperedges. The base case $i = 0$ is already done. If we have already covered all indices at level $i - 1$, then consider the hyperedges corresponding to the projection operators $P_i$ and $P'_i$. All of these hyperedges go between a triangle at level $i - 1$ and a triangle at level $i$. We double pairs of these hyperedges which correspond to the same triangle at level $i$. This guarantees that for every triangle at level $i$, there is at least one doubled hyperedge corresponding to it. This hyperedge may not cover all three of the vertices of the triangle, but if it misses one, this vertex must be equal to a vertex at the level below, which was already covered by assumption. We double the remaining hyperedges corresponding to the projection operators $P_i$ and $P'_i$ arbitrarily.

When performing this doubling, whenever the two hyperedges $e_1$ and $e_2$ have the same base value, we use the inequality $|e_1 e_2| \le \frac{e_1^2 + e_2^2}{2}$.
In the rare case when they have different base values, we use the inequality $|e_1 e_2| \le \frac{v(e_2)}{2v(e_1)}e_1^2 + \frac{v(e_1)}{2v(e_2)}e_2^2$ to preserve the product of the base values. Note that by this construction, for every triangle at level $i \ge 1$, there is a doubled hyperedge corresponding to some $P_i$ or $P'_i$ which goes between this triangle and a lower triangle, but we don't know which one.

We now describe our ordering on the hyperedges. To find this ordering, we consider the following multi-graph.

Definition B.16. We define the multi-graph $G$ to have vertex set $V(G) = \cup_{ij}\{a_{ij}, b_{ij}, c_{ij}, a'_{ij}, b'_{ij}, c'_{ij}\}$ (with all equalities implied by the intersection pattern, the structure of the matrices $A$ and $B$, and the choices for equalities or inequalities between the primed and unprimed indices). We take the edges of $G$ as follows. For all $i < l$ and for each distinct triangle $(a_{ij}, b_{ij}, c_{ij})$ or $(a'_{ij}, b'_{ij}, c'_{ij})$, we take the elements which do not appear at a higher level. If this is true for two of the three elements (which will be the case most of the time), we take the corresponding edge. If this is true for all three elements, we choose two of them to take as an edge, making this choice so that we take the same type of edge for all triangles at that level. If this is only true for one element, we take a loop on that element. There are two cases for what happens with $i = l$:
1. If $l' \ne l$, $b'_{lj} \ne b_{lj}$, or $c'_{lj} \ne c_{lj}$, then for every triangle $(a_{lj}, b_{lj}, c_{lj})$ we take the edge $(b_{lj}, c_{lj})$. If $l' = l$ then for every triangle $(a'_{lj}, b'_{lj}, c'_{lj})$ we take the edge $(b'_{lj}, c'_{lj})$.
2. If $l' = l$, $b'_{lj} = b_{lj}$, and $c'_{lj} = c_{lj}$, then we take loops on every distinct element $a_{lj}$.

We analyze $\Delta$ in terms of this $G$.
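The two doubling steps above are instances of weighted AM-GM, and in the unequal-base-value case the weights are chosen so that the resulting base values multiply back to $v(e_1)v(e_2)$. A quick numerical sanity check of both facts (a sketch with random values, not code from the paper):

```python
import random

def doubled_bound(e1, e2, v1, v2):
    # weighted AM-GM: |e1*e2| <= (v2/(2*v1))*e1^2 + (v1/(2*v2))*e2^2
    return (v2 / (2 * v1)) * e1 ** 2 + (v1 / (2 * v2)) * e2 ** 2

random.seed(0)
for _ in range(1000):
    e1, e2 = random.uniform(-5, 5), random.uniform(-5, 5)
    v1, v2 = random.uniform(0.1, 5), random.uniform(0.1, 5)
    assert abs(e1 * e2) <= doubled_bound(e1, e2, v1, v2) + 1e-12
    # the weights preserve the product of base values:
    # (v2/(2*v1))*v1^2 + (v1/(2*v2))*v2^2 = v1*v2
    assert abs((v2 / (2 * v1)) * v1 ** 2 + (v1 / (2 * v2)) * v2 ** 2 - v1 * v2) < 1e-9
```

Setting $v_1 = v_2$ recovers the equal-base-value inequality $|e_1 e_2| \le \frac{e_1^2 + e_2^2}{2}$.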
If we have a fixed budget of edges and want to maximize the number of vertices, we want to have as many connected components as possible, and we want each connected component to have the minimal number of edges. We define weights on the connected components of $G$ measuring how far they are from satisfying these ideals.

Definition B.17. Given a connected component $C$ of $G$, we define $w_{edge}(C)$ to be the number of non-loop edges it contains, plus $1$, minus the number of vertices it contains.

Definition B.18. Given a connected component $C$ of $G$, we define $w_{triangle}(C)$ as follows:
1. If $C$ does not contain any $b_{lj}$, $c_{lj}$, $b'_{l'j}$, or $c'_{l'j}$, then we define $w_{triangle}(C)$ to be the number of distinct triangles whose corresponding edge in $G$ is in $C$, minus $1$.
2. If $C$ is the connected component containing $b_{lj}$ and $c_{lj}$ for all $j$, then we set $w_{triangle}(C) = 0$.
3. If $C$ is the connected component containing $b'_{l'j}$ and $c'_{l'j}$ for all $j$, then we set $w_{triangle}(C)$ to be the number of distinct $R_{l'}$-triangles $(a_{l'j}, b_{l'j}, c_{l'j})$ whose corresponding edge in $G$ is in $C$.
4. If $C$ is a connected component containing some $c'_{l'j}$ but no $b'_{l'j}$ (because all of the $b'_{l'j}$ appeared at a higher level), or vice versa, then we define $w_{triangle}(C)$ to be the number of distinct triangles whose corresponding edge in $G$ is in $C$, minus $1$.

Definition B.19. We say that a vertex $a_{ij}$ is bad if:
1. The projector $P_{i+1}$ involves the vertex $a_{ij}$ (i.e., we do not directly have the constraint $a_{(i+1)j} = a_{ij}$), and
2. $a_{ij}$ appears at a higher level.
We define badness similarly for the $a'$, $b$, $b'$, $c$, $c'$ indices. Note that we could have $a_i$ be bad while $a'_i$ is not bad, even if $a'_i = a_i$ (in fact, this equality must be true in this case).

Lemma B.20.
$$\Delta \ge \sum_C \left(w_{edge}(C) + w_{triangle}(C)\right) + \sum_{i < l:\ a_{ij},\, b_{ij},\,\text{or } c_{ij}\text{ is bad}} z_i + \sum_{i < l':\ a'_{ij},\, b'_{ij},\,\text{or } c'_{ij}\text{ is bad}} z'_i$$

Proof. As discussed above, every time a connected component contains an extra edge above what it needs to be connected, this reduces the number of indices we can have by $1$. Similarly, in the optimal case we have one connected component per triangle (with the exception of the $R_l$-triangles and perhaps the $R'_{l'}$-triangles), so every time a connected component contains an extra triangle (or rather the edge corresponding to that triangle), this reduces the number of connected components by $1$. For the remaining terms, note that if there are bad vertices, our previous bounds assumed that we would have new indices of that type, but we do not. The resulting difference in the bounds is the corresponding $z_i$ or $z'_i$. Note that this also works out in the special case that $a'_i = a_i$, $b'_i = b_i$, $c'_i = c_i$. Here we can view each $a$ index as being half $a_i$ and half $a'_i$, and similarly for the $b$ and $c$ indices. $\square$

With this lemma in hand, our strategy is as follows. We choose an ordering on the hyperedges so that each time we fail to have a free index, we can attribute it to one of the terms described above. We first preprocess our doubled hyperedges so that each hyperedge appears with multiplicity exactly $2$. This requires bounding $\sum_{i=0}^{l}(2q - z_i) + \sum_{i=0}^{l'}(2q - z'_i)$ doubled hyperedges with no free index. At this point, there is a one-to-one correspondence between our doubled hyperedges and edges of $G$. Note that this correspondence is somewhat strange; we only know that each edge in $G$ is part of the upper-level triangle for its corresponding hyperedge.

We now describe our procedure for ordering the hyperedges.

Definition B.21. We say that a vertex $v$ is an anchor for an edge $e$ of $G$ if either
1.
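The weight $w_{edge}$ of Definition B.17 is the cycle rank of a component: a tree has $w_{edge} = 0$, and each extra non-loop edge adds $1$. A small sketch computing it per component (the union-find helper and the example graphs are illustrative, not from the paper):

```python
def w_edge_per_component(vertices, edges):
    """w_edge(C) = (# non-loop edges of C) + 1 - (# vertices of C), per component.
    Loops are excluded from both connectivity and the edge count, matching
    Definition B.17's restriction to non-loop edges."""
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path halving
            v = parent[v]
        return v

    non_loop = [(u, v) for (u, v) in edges if u != v]
    for u, v in non_loop:
        parent[find(u)] = find(v)

    n_vertices, n_edges = {}, {}
    for v in vertices:
        r = find(v)
        n_vertices[r] = n_vertices.get(r, 0) + 1
    for u, _ in non_loop:
        r = find(u)
        n_edges[r] = n_edges.get(r, 0) + 1
    return {r: n_edges.get(r, 0) + 1 - n_vertices[r] for r in n_vertices}

# a tree component is "ideal" (w_edge = 0); each cycle-creating edge adds 1
assert list(w_edge_per_component([1, 2, 3], [(1, 2), (2, 3)]).values()) == [0]
assert list(w_edge_per_component([1, 2, 3], [(1, 2), (2, 3), (3, 1)]).values()) == [1]
```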
$\{v\} \cup e \subseteq \{a_{ij}, b_{ij}, c_{ij}\}$ for some $i$ and $j$, and $v$ appears at a higher level.
2. $\{v\} \cup e \subseteq \{a'_{ij}, b'_{ij}, c'_{ij}\}$ for some $i$ and $j$, and $v$ appears at a higher level.

Definition B.22. For an anchor vertex $v_{anchor}$, define $E_i(v_{anchor})$ to be the set of all edges at level $i$ which have $v_{anchor}$ as an anchor vertex.

Definition B.23. We say that a vertex $v$ or edge $e$ is uncovered if it is not incident with any hyperedges between its level and the level above, and covered otherwise. For a vertex $v$ which is not part of $G$ at level $i$, we say that $v$ is uncovered at level $i$ if there is no $j > 0$ such that $v$ is incident with a hyperedge between levels $i + j$ and $i + j + 1$.

Definition B.24. We say that a vertex $v$ is released at level $i$ if there are no hyperedges remaining between levels $i$ and $i - 1$ whose upper and lower triangles both contain $v$.

Our main recursive procedure is as follows. We consider a collection of connected components of the graph at level $i$ where everything is uncovered except possibly for one edge $e_r$. If an edge $e_r$ is covered and has anchor vertex $v_{anchor}$, then we assume that this collection contains all of $E_i(v_{anchor})$ and that $v_{anchor}$ is uncovered at level $i$.

We first consider the case when there are no bad vertices (we will consider the cases where we have bad vertices afterwards). If $C$ contains a cycle, we can delete an edge and its corresponding hyperedge to break the cycle, accounting for this by decreasing $w_{edge}(C)$. Otherwise, unless $C$ is just the single edge $e_r$, there must be a vertex $v$ and edge $e$ in $C$ such that $e \ne e_r$ and $e$ is the only edge incident with $v$. We now consider the hyperedge corresponding to $e$. If $v$ is part of this hyperedge, then we can delete $e$ and this hyperedge and continue. Otherwise, $v$ must be an anchor vertex for many edges at the level below.
Moreover, $v$ is uncovered at level $i - 1$. We now consider $E_{i-1}(v)$. If $E_{i-1}(v)$ and everything connected to it is uncovered except for the edge $e'_r$ which is the bottom edge of the hyperedge corresponding to $e$, then we can apply our procedure recursively on $E_{i-1}(v)$ and everything connected to it. Otherwise, $E_{i-1}(v)$ must be connected to $E_{i-1}(v'_{anchor})$ for some other anchor vertex $v'_{anchor}$ which has not yet been released at level $i$. Note that since there are no bad vertices, $E_{i-1}(v) \cap E_{i-1}(v'_{anchor}) = \emptyset$. Thus, there is a contribution of at least $1$ to $w_{triangle}$ of one of these connected components from the connection between $E_{i-1}(v)$ and $E_{i-1}(v'_{anchor})$. Using this contribution, we can delete $e$ and continue. After doing this, $v$ is released at level $i$.

Remark B.25. Whenever we have a connection between $E_{i-1}$ for two anchor vertices, we release one of them at level $i$ immediately after taking this connection into account. This ensures that we do not double count contributions to $w_{triangle}$.

If we are left with the single edge $e_r$, then there are several cases. Letting $v$ be the anchor vertex for $e_r$, if $v$ goes down to the level below, then consider the hyperedge corresponding to $e_r$ and let $e'_r$ be its bottom edge. Since we have deleted all edges in $E_i(v)$ except for $e_r$, either all of $E_{i-1}(v)$ except for $e'_r$ is uncovered, or $E_{i-1}(v)$ is connected to $E_{i-1}(v'_{anchor})$ for a different anchor vertex $v'_{anchor}$ which has not yet been released at level $i$. In the first case, we can apply our recursive procedure on $E_{i-1}(v)$ and all edges connected to it. In the second case, we instead delete $e_r$ as before and go back to the level above. Again, after doing this, $v$ is now released at level $i$.
If $v$ does not go down to the level below (or we are already at the bottom), then the hyperedge corresponding to $e_r$ contains $v$. Moreover, by our assumption $v$ is uncovered at level $i$. Thus, $v$ is a free index for $e_r$, so we can delete $e_r$ and go back to the level above. This procedure will succeed in the case that there are no bad vertices.

We now handle bad vertices by reducing to the case where there are no bad vertices. We first consider the case where we are below level $l'$ and we do not have that $a'_{ij} = a_{ij}$, $b'_{ij} = b_{ij}$, and $c'_{ij} = c_{ij}$; we will handle these cases separately. If the $a'_{ij}$ are bad vertices, this must be because of equalities $a'_{ij} = a_{ij}$. We handle this by replacing each $a'_{ij}$ with a new vertex and running our procedure on this altered graph. This will cause failures when we try to use $a'_{ij}$ or $a_{ij}$ as a free index. That said, once we've tried to use all but one of a set of equal vertices, the final one will succeed, so the number of additional failures is at most $z'_i$. We can account for this using the term $\sum_{i < l:\ a'_{ij},\, b'_{ij},\,\text{or } c'_{ij}\text{ is bad}} z'_i$. We handle bad $a_{ij}$, $b_{ij}$, $b'_{ij}$, $c_{ij}$, $c'_{ij}$ vertices in a similar manner.

In the case that $a'_{ij} = a_{ij}$, $b'_{ij} = b_{ij}$, and $c'_{ij} = c_{ij}$, if the $a'_{ij}$ and $b_{ij}$ are bad vertices, this must be because of the equalities $a'_{ij} = a_{ij}$ and $b'_{ij} = b_{ij}$. We handle this by creating a new vertex for each $a'_{ij}$, having the hyperedges between levels $i$ and $i + 1$ use the old vertices, and having the hyperedges at lower levels use the new vertices. We modify $G$ so that instead of loops at level $i$, the edges involve these new vertices. This makes it so that the only anchor vertices for edges at level $i$ are the vertices $b'_{(i+1)j}$. Since all edges of $G$ now have a unique anchor, the recursive procedure succeeds.
We can account for this with the terms $\sum_{i < l:\ a_{ij},\, b_{ij},\,\text{or } c_{ij}\text{ is bad}} z_i + \sum_{i < l':\ a'_{ij},\, b'_{ij},\,\text{or } c'_{ij}\text{ is bad}} z'_i$.

We consider level $l'$ separately. If the bottom of level $l'$ contains bad vertices, we cannot make these vertices distinct. However, if this happens then we have loops in $G$ for the bottom triangles at level $l'$. These triangles are distinct from the triangles on top at level $l'$. We handle this by using $w_{triangle}$ to delete edges from $G$ at this level so that each component contains at most one loop. When we run the procedure, we can use $w_{triangle}$ when $E_{i-1}(v)$ is connected to a loop, as well as when it is connected to $E_{i-1}(v'_{anchor})$ for some other anchor vertex $v'_{anchor}$ which has not been released at level $i$. This allows us to process each component of $G$ at level $i$ until we are left with either a covered edge or a single loop, both of which can be handled by our procedure. $\square$

B.5 Counting intersection patterns and random partitioning

There are two pieces left to add. First, all of our analysis so far was for a given intersection pattern. We must sum over all intersection patterns.

Lemma B.26. For all $i$, there are at most $\binom{2q}{z_i}(z_i)^{2q - z_i} \le 2^{2q}(2q)^{2q - z_i}$ choices for which $R_i$-triangles are equal to each other.

Proof. To specify a partition of the $2q$ $R_i$-triangles into $z_i$ parts, we specify which triangles are distinct from all previous triangles. There are $\binom{2q}{z_i}$ choices for which triangles these are. For the remaining triangles, we specify which previous triangle they are equal to. There are at most $(z_i)^{2q - z_i}$ choices for this. $\square$

In our bound, we can group this with the other factors corresponding to the $R_i$-triangles. Since we take $q$ to be $O(\log n)$, this is fine, as $m$ has a $\log(n)$ factor.

Second, we justify our assumption that:
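The quantity counted in Lemma B.26 is the number of set partitions of the $2q$ triangles into $z_i$ nonempty classes, i.e. the Stirling number of the second kind $S(2q, z_i)$, so the lemma asserts $S(2q, z) \le \binom{2q}{z} z^{2q - z}$. A quick check for small parameters (a sketch; the recurrence is the standard one, not taken from the paper):

```python
from math import comb
from functools import lru_cache

@lru_cache(maxsize=None)
def stirling2(n, k):
    # number of partitions of n labeled items into k nonempty unlabeled parts
    if n == k:
        return 1
    if k == 0 or k > n:
        return 0
    return k * stirling2(n - 1, k) + stirling2(n - 1, k - 1)

# the exact partition count is at most the lemma's bound C(2q, z) * z^(2q - z)
for q in range(1, 7):
    for z in range(1, 2 * q + 1):
        assert stirling2(2 * q, z) <= comb(2 * q, z) * z ** (2 * q - z)
```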
1. For all $i \in [0, l']$, we either have that $a'_{ij} = a_{ij}$ for all $j \in [1, 2q]$ or $a'_{ij} \ne a_{ij}$ for all $j \in [1, 2q]$.
2. For all $i \in [0, l']$, we either have that $b'_{ij} = b_{ij}$ for all $j \in [1, 2q]$ or $b'_{ij} \ne b_{ij}$ for all $j \in [1, 2q]$.
3. For all $i \in [0, l']$, we either have that $c'_{ij} = c_{ij}$ for all $j \in [1, 2q]$ or $c'_{ij} \ne c_{ij}$ for all $j \in [1, 2q]$.

To achieve this, instead of looking at the entire matrix $\sum_a A_a \otimes B_a^T$, we split it into parts based on the equalities/inequalities we're looking at. To obtain the case where indices $a$ and $a'$ are always equal, we just restrict ourselves in $\sum_a A_a \otimes B_a^T$ to the terms where this is the case. To obtain the case where indices $a$ and $a'$ are never equal, we choose a random partition $V, V^c$ of the indices and restrict ourselves in $\sum_a A_a \otimes B_a^T$ to the terms where $a \in V$ and $a' \in V^c$. If there are multiple indices that we wish to fork over, we apply this argument to each one (choosing the vertex partitions independently). This construction has the property that if we take the expectation over all the possible vertex partitions, we obtain a constant times the part of $\sum_a A_a \otimes B_a^T$ we are interested in. Using this, it can be shown that probabilistic norm bounds on these restricted matrices imply probabilistic norm bounds on the original matrix. For details, see Lemma 27 of "Bounds on the Norms of Uniform Low Degree Graph Matrices". From the above subsections, we have probabilistic norm bounds on the restricted matrices, and the result follows.

B.6 Other Cross Terms

In this subsection, we sketch how the argument differs when $B = X$ rather than $B = \bar{R}_{\Omega_0} X$ or $B = P'_0 \bar{R}_{\Omega_0} X$.

Theorem B.27. There is an absolute constant $C$ such that for any $\alpha \ge 1$ and $\beta \ge 0$,
$$\Pr\left[\left\|\sum_a A_a \otimes X_a^T\right\| \ge \alpha^{-(l+1)}\right] < n^{-\beta}$$
as long as:
1. $r\mu \le \min\{n_1, n_2, n_3\}$
2.
$m \ge C\alpha\beta\,\mu^{3/2} r \sqrt{n_1}\max\{n_2,n_3\}\log(\max\{n_1,n_2,n_3\})$
3. $m \ge C\alpha\beta\,\mu^2 r \max\{n_1,n_2,n_3\}\log(\max\{n_1,n_2,n_3\})$

Proof sketch: The terms from $X$ directly are
$$\prod_{j=1}^{2q} X_{a'_{0j} b'_{0j} c'_{0j}} = \prod_{j=1}^{2q}\left(\sum_{i_j} u_{i_j a'_{0j}} v_{i_j b'_{0j}} w_{i_j c'_{0j}}\right)$$
Note that the $R_\Omega$ factors are completely independent of the $b'_{0j}$ and $c'_{0j}$ indices. Thus, we can sum over the $b'_{0j}$ and $c'_{0j}$ indices first. When we do, this zeros out all terms except the ones where all of the $i_j$ are equal. Moreover, all of the $v$ and $w$ terms sum to $1$. The $u_{i_j}$ terms can be bounded by $\left(\frac{\mu}{n_1}\right)^q$.

We now compare the bound we had before with the bound we have here. For $\bar{R}_{\Omega_0} X$ we had the factors:
1. $\left(\frac{r\mu^3}{n_1 n_2 n_3}\right)^{2q}$ from the $X$-hyperedges.
2. $\frac{n_2 n_3}{\mu^2}\left(\frac{n_1(\max\{n_2,n_3\})^2}{\mu^3}\right)^q$ from the $q$ potential $a$ indices and $2q + 2$ potential $b$ or $c$ indices.
3. $\left(\frac{n_1 n_2 n_3}{m}\right)^{2q}$ from the $2q$ potential distinct triangles.
4. $\left(r \cdot \frac{\mu^{3/2}}{\sqrt{n_1}\max\{n_2,n_3\}} \cdot \frac{n_1 n_2 n_3}{m}\right)^{2q - z_l - z'_{l'}} \le \left(\frac{r\mu^{3/2}\sqrt{n_1}\max\{n_2,n_3\}}{m}\right)^{2q - z_l - z'_{l'}}$ from the actual number of distinct triangles, the corresponding reduced maximum number of potential new indices, and the factors of $r$ which we take from $r^{\#\text{ of doubled hyperedges we bound with no free index}}$.
5. $\left(\frac{\mu}{n_1}\right)^{\Delta_{ll'a}}\left(\frac{\mu}{\max\{n_2,n_3\}}\right)^{\Delta_{lbc} + \Delta'_{l'bc}} \le \left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ll'a} + \Delta_{lbc} + \Delta'_{l'bc}}$ from the actual number of new indices which we have.

We now have the following factors instead:
1. $\left(\frac{r\mu^3}{n_1 n_2 n_3}\right)^q r\left(\frac{\mu}{n_1}\right)^q$ from the $X$-hyperedges.
2. $\frac{\max\{n_2,n_3\}}{\mu}\left(\frac{n_1\max\{n_2,n_3\}}{\mu^2}\right)^q$ from the $q$ potential $a$ indices and $q + 1$ potential $b$ or $c$ indices.
3. $\left(\frac{n_1 n_2 n_3}{m}\right)^q$ from the $q$ potential distinct triangles.
4.
$\left(r \cdot \frac{\mu^2}{n_1\max\{n_2,n_3\}} \cdot \frac{n_1 n_2 n_3}{m}\right)^{q - z_l} \le \left(\frac{r\mu^2\max\{n_2,n_3\}}{m}\right)^{q - z_l}$ from the actual number of distinct triangles, the corresponding reduced maximum number of potential new indices, and the factors of $r$ which we take from $r^{\#\text{ of doubled hyperedges we bound with no free index}}$.
5. $\left(\frac{\mu}{n_1}\right)^{\Delta_{ll'a}}\left(\frac{\mu}{\max\{n_2,n_3\}}\right)^{\Delta_{lbc} + \Delta'_{l'bc}} \le \left(\frac{\mu}{\min\{n_1,n_2,n_3\}}\right)^{\Delta_{ll'a} + \Delta_{lbc} + \Delta'_{l'bc}}$ from the actual number of new indices which we have.

The difference is in the first three terms; grouping these terms together gives
$$\frac{r\max\{n_2,n_3\}}{\mu}\left(\frac{r\mu^2\max\{n_2,n_3\}}{m}\right)^q$$
By our assumption, $m \ge C r\mu^2\max\{n_2,n_3\}\log(n)^2$, so we are fine. $\square$
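The grouping step at the end of the proof sketch can be checked mechanically: the product of the three new factors collapses to $\frac{r\max\{n_2,n_3\}}{\mu}\left(\frac{r\mu^2\max\{n_2,n_3\}}{m}\right)^q$. A quick exact-arithmetic check with arbitrary (hypothetical) parameter values:

```python
from fractions import Fraction as F

def grouped_lhs(r, mu, n1, n2, n3, m, q):
    mx = max(n2, n3)
    hyper = (F(r) * mu ** 3 / (n1 * n2 * n3)) ** q * r * (F(mu) / n1) ** q
    indices = F(mx) / mu * (F(n1) * mx / mu ** 2) ** q
    triangles = (F(n1) * n2 * n3 / m) ** q
    return hyper * indices * triangles

def grouped_rhs(r, mu, n1, n2, n3, m, q):
    mx = max(n2, n3)
    return F(r) * mx / mu * (F(r) * mu ** 2 * mx / m) ** q

# the product of factors 1-3 equals the grouped expression exactly
for q in range(1, 5):
    assert grouped_lhs(3, 2, 5, 7, 11, 13, q) == grouped_rhs(3, 2, 5, 7, 11, 13, q)
```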
