Fast Exact Matrix Completion with Finite Samples
Authors: Prateek Jain, Praneeth Netrapalli
Abstract

Matrix completion is the problem of recovering a low-rank matrix by observing a small fraction of its entries. A series of recent works [Kes12, JNS13, Har14] have proposed fast non-convex optimization based iterative algorithms to solve this problem. However, the sample complexity in all these results is sub-optimal in its dependence on the rank, condition number and the desired accuracy. In this paper, we present a fast iterative algorithm that solves the matrix completion problem by observing $O(nr^5 \log^3 n)$ entries, which is independent of the condition number and the desired accuracy. The run time of our algorithm is $O(nr^7 \log^3 n \log(1/\epsilon))$, which is near linear in the dimension of the matrix. To the best of our knowledge, this is the first near linear time algorithm for exact matrix completion with finite sample complexity (i.e., independent of $\epsilon$). Our algorithm is based on a well known projected gradient descent method, where the projection is onto the (non-convex) set of low-rank matrices. There are two key ideas in our result: 1) our argument is based on an $\ell_\infty$ norm potential function (as opposed to the spectral norm) and provides a novel way to obtain perturbation bounds for it; 2) we prove and use a natural extension of the Davis-Kahan theorem to obtain perturbation bounds on the best low-rank approximation of matrices with good eigen-gap. Both of these ideas may be of independent interest.

* Microsoft Research, India. Email: prajain@microsoft.com
† Microsoft Research, Cambridge MA. Email: praneeth@microsoft.com (Part of the work done while a student at UT Austin and interning at MSR India.)

1 Introduction

In this paper, we study the problem of low-rank matrix completion (LRMC), where the goal is to recover a low-rank matrix by observing a tiny fraction of its entries.
That is, given $\{M_{ij} : (i,j) \in \Omega\}$, where $M \in \mathbb{R}^{n_1 \times n_2}$ is an unknown rank-$r$ matrix and $\Omega \subseteq [n_1] \times [n_2]$ is the set of observed indices, the goal is to recover $M$. An optimization version of the problem can be posed as follows:

$(\mathrm{LRMC}): \quad \min_X \|P_\Omega(X - M)\|_F^2, \quad \text{s.t. } \mathrm{rank}(X) \le r, \qquad (1)$

where $P_\Omega(A)$ is defined as:

$P_\Omega(A)_{ij} = \begin{cases} A_{ij}, & \text{if } (i,j) \in \Omega, \\ 0, & \text{otherwise.} \end{cases} \qquad (2)$

LRMC is by now a well studied problem with applications in several machine learning tasks such as collaborative filtering [BK07], link analysis [GL11], distance embedding [CR09], etc. Motivated by widespread applications, several practical algorithms have been proposed to solve the problem (heuristically) [RR13, HCD12]. On the theoretical front, the non-convex rank constraint implies NP-hardness in general [HMRW14]. However, under certain (by now) standard assumptions, a few algorithms have been shown to solve the problem efficiently. These approaches can be categorized into the following two broad groups:

a) The first approach relaxes the rank constraint in (1) to a trace norm constraint (sum of singular values of $X$) and then solves the resulting convex optimization problem [CR09]. [CT09, Rec09] showed that this approach has a near optimal sample complexity (i.e., number of observed entries of $M$) of $|\Omega| = O(rn \log^2 n)$, where we abbreviate $n = n_1 + n_2$. However, current iterative algorithms used to solve the trace-norm constrained optimization problem require $O(n^2)$ memory and $O(n^3)$ time per iteration, which is prohibitive for large-scale applications.

b) The second approach is based on an empirically popular iterative technique called Alternating Minimization (AltMin) that factorizes $X = UV^\top$, where $U$, $V$ have $r$ columns, and the algorithm alternately optimizes over $U$ and $V$ holding the other fixed. Recently, [Kes12, JNS13, Har14, HW14] showed convergence of variants of this algorithm.
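In code, the sampling operator $P_\Omega$ from (2) is simply an entrywise mask. A minimal numpy sketch (the function name `proj_omega` and the boolean-mask representation of $\Omega$ are our own illustration, not notation from the paper):

```python
import numpy as np

def proj_omega(A, omega_mask):
    """P_Omega(A): keep the entries (i, j) in Omega, zero out the rest.

    omega_mask is a boolean n1 x n2 array with omega_mask[i, j] = True
    iff entry (i, j) was observed.
    """
    return np.where(omega_mask, A, 0.0)

# Example: observe each entry of a rank-1 matrix independently with prob. p
rng = np.random.default_rng(0)
n1, n2, p = 6, 5, 0.5
M = np.outer(rng.standard_normal(n1), rng.standard_normal(n2))  # rank-1 matrix
mask = rng.random((n1, n2)) < p      # Assumption 2: uniform sampling
observed = proj_omega(M, mask)
assert np.allclose(observed[mask], M[mask])   # observed entries are kept
assert np.allclose(observed[~mask], 0.0)      # unobserved entries are zeroed
```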
The best known sample complexity results for AltMin are the incomparable bounds $|\Omega| = O\left(r\kappa^8 n \log\frac{n}{\epsilon}\right)$ and $|\Omega| = O\left(\mathrm{poly}(r)\,(\log\kappa)\, n \log\frac{n}{\epsilon}\right)$ due to [Kes12] and [HW14] respectively. Here, $\kappa = \sigma_1(M)/\sigma_r(M)$ is the condition number of $M$ and $\epsilon$ is the desired accuracy. The computational cost of these methods is $O(|\Omega| r + nr^3)$ per iteration, making these methods very fast as long as the condition number $\kappa$ is not too large.

Of the above two approaches, AltMin is known to be the most practical and runs in near linear time. However, its sample complexity as well as computational complexity depend on the condition number of $M$, which can be arbitrarily large. Moreover, for "exact" recovery of $M$, i.e., with error $\epsilon = 0$, the method requires infinitely many samples (or rather, observing the entire matrix). The dependence of the sample complexity on the desired accuracy $\epsilon$ arises due to the use of independent samples in each iteration, which in turn is necessitated by the fact that using the same samples in each iteration leads to complex dependencies among iterates which are hard to analyze. Nevertheless, practitioners have been using AltMin with the same samples in each iteration successfully in a wide range of applications.

Our results: In this paper, we address this issue by proposing a new algorithm called Stagewise-SVP (St-SVP) and showing that it solves the matrix completion problem exactly with a sample complexity $|\Omega| = O(nr^5 \log^3 n)$, which is independent of both the condition number and the desired accuracy, and a time complexity per iteration of $O(|\Omega| r^2)$, which is near linear in $n$. The basic block of our algorithm is a simple projected gradient descent step, first proposed by [JMD10] in the context of this problem. More precisely, given the $t$-th iterate $X_t$, [JMD10] proposed the following update rule, which they call singular value projection (SVP).
$(\mathrm{SVP}): \quad X_{t+1} = P_r\left(X_t + \frac{n_1 n_2}{|\Omega|} P_\Omega(M - X_t)\right), \qquad (3)$

where $P_r$ is the projection onto the set of rank-$r$ matrices and can be efficiently computed using the singular value decomposition (SVD). Note that the SVP step is just a projected gradient descent step where the projection is onto the (non-convex) set of low-rank matrices. [JMD10] showed that despite involving projections onto a non-convex set, SVP solves the related problem of low-rank matrix sensing, where instead of observing elements of the unknown matrix, we observe dense linear measurements of this matrix. However, their result does not extend to the matrix completion problem, and the correctness of SVP for matrix completion was left as an open question. Our preliminary result resolves this question by showing the correctness of SVP for the matrix completion problem, albeit with a sample complexity that depends on the condition number and the desired accuracy. We then develop a stage-wise variant of this algorithm, where in the $k$-th stage, we try to recover $P_k(M)$, thereby getting rid of the dependence on the condition number. Finally, in each stage, we use independent samples for $\log n$ iterations, but use the same samples for the remaining iterations, thereby eliminating the dependence of the sample complexity on $\epsilon$.

Our analysis relies on two key novel techniques that enable us to understand SVP-style projected gradient methods even though the projection is onto a non-convex set. First, we consider the $\ell_\infty$ norm of the error $X_t - M$ as our potential function, instead of its spectral norm that most existing analyses of matrix completion use. In general, bounds on the $\ell_\infty$ norm are much harder to obtain, as projection via SVD is optimal only in the spectral and Frobenius norms. We obtain $\ell_\infty$ norm bounds by writing down explicit eigenvector equations for the low-rank projection and using these to control the $\ell_\infty$ norm of the error.
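The SVP update (3) can be sketched directly: take a gradient step on the observed entries, then hard-threshold to rank $r$ via a truncated SVD. A minimal numpy sketch (function names are ours; the convergence of this loop on the small well-sampled instance below is an empirical observation, not a claim from the paper's theorems):

```python
import numpy as np

def hard_threshold_rank(A, r):
    """P_r(A): best rank-r approximation of A via truncated SVD."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return (U[:, :r] * s[:r]) @ Vt[:r]

def svp_step(X, M, mask, r):
    """One SVP update (3): X <- P_r(X + (n1 n2 / |Omega|) P_Omega(M - X)).

    Only the observed entries of M (where mask is True) are read.
    """
    n1, n2 = X.shape
    step = (n1 * n2 / mask.sum()) * np.where(mask, M - X, 0.0)
    return hard_threshold_rank(X + step, r)

rng = np.random.default_rng(1)
n, r, p = 60, 2, 0.6
M = rng.standard_normal((n, r)) @ rng.standard_normal((r, n))  # rank-2 target
mask = rng.random((n, n)) < p                                  # ~60% observed
X = np.zeros((n, n))
for _ in range(100):
    X = svp_step(X, M, mask, r)
# On this well-sampled instance the iterates contract toward M
assert np.linalg.norm(X - M) < 0.1 * np.linalg.norm(M)
```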
Second, in order to analyze the SVP updates with the same samples in each iteration, we prove and use a natural extension of the Davis-Kahan theorem. This extension bounds the perturbation in the best rank-$k$ approximation of a matrix (with large enough eigen-gap) due to any additive perturbation; despite this being a very natural extension of the Davis-Kahan theorem, to the best of our knowledge, it has not been considered before. We believe both of the above techniques should be of independent interest.

Paper Organization: We first present the problem setup, our main result and an overview of our techniques in the next section. We then present a "warm-up" result for the basic SVP method in Section 3. We then present our main algorithm (St-SVP) and its analysis in Section 4. We conclude the discussion in Section 5. The proofs of all the technical lemmas follow thereafter in the appendix.

Notation: We denote matrices with boldface capital letters ($M$) and vectors with boldface small letters ($x$). $m_i$ denotes the $i$-th column and $M_{ij}$ denotes the $(i,j)$-th entry of $M$. SVD and EVD stand for the singular value decomposition and eigenvalue decomposition respectively. $P_k(A)$ denotes the projection of $A$ onto the set of rank-$k$ matrices. That is, if $A = U\Sigma V^\top$ is the SVD of $A$, then $P_k(A) = U_k \Sigma_k V_k^\top$, where $U_k \in \mathbb{R}^{n_1 \times k}$ and $V_k \in \mathbb{R}^{n_2 \times k}$ are the $k$ left and right singular vectors respectively of $A$ corresponding to the $k$ largest singular values $\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_k$. $\|u\|_q$ denotes the $\ell_q$ norm of $u$. We denote the operator norm of $M$ by $\|M\|_2 = \max_{u : \|u\|_2 = 1} \|Mu\|_2$. In general, $\|\alpha\|_2$ denotes the $\ell_2$ norm of $\alpha$ if it is a vector and the operator norm of $\alpha$ if it is a matrix. $\|M\|_F$ denotes the Frobenius norm of $M$.

2 Our Results and Techniques

In this section, we first describe the problem setup and then present our results as well as the main techniques we use.
2.1 Problem Setup

Let $M$ be an $n_1 \times n_2$ matrix of rank $r$. Let $\Omega \subseteq [n_1] \times [n_2]$ be a subset of the indices. Recall that $P_\Omega(M)$ (as defined in (2)) is the projection of $M$ onto the indices in $\Omega$. Given $\Omega$, $P_\Omega(M)$ and $r$, the goal is to recover $M$. The problem is in general ill-posed, so we make the following standard assumptions on $M$ and $\Omega$ [CR09].

Assumption 1 (Incoherence). $M \in \mathbb{R}^{n_1 \times n_2}$ is a rank-$r$, $\mu$-incoherent matrix, i.e., $\max_i \|e_i^\top U^*\|_2 \le \frac{\mu\sqrt{r}}{\sqrt{n_1}}$ and $\max_j \|e_j^\top V^*\|_2 \le \frac{\mu\sqrt{r}}{\sqrt{n_2}}$, where $M = U^*\Sigma V^{*\top}$ is the singular value decomposition of $M$.

Assumption 2 (Uniform sampling). $\Omega$ is generated by sampling each element of $[n_1] \times [n_2]$ independently with probability $p$.

The incoherence assumption ensures that the mass of the matrix is well spread out, so that a small fraction of uniformly random observations gives enough information about the matrix. Both of the above assumptions are standard and are used by most of the existing results, for instance [CR09, CT09, KMO10, Rec09, Kes12]. A few exceptions include the works of [MJD09, CBSW14, BJ14].

2.2 Main Result

The following theorem is the main result of this paper.

Theorem 1. Suppose $M$ and $\Omega$ satisfy Assumptions 1 and 2 respectively. Also, let $\mathbb{E}[|\Omega|] \ge C\alpha\mu^4 r^5 n \log^3 n$, where $\alpha > 1$, $n := n_1 + n_2$ and $C > 0$ is a global constant. Then, the output $\widehat{M}$ of Algorithm 2 satisfies $\|\widehat{M} - M\|_F \le \epsilon$, with probability greater than $1 - n^{-10-\log\alpha}$. Moreover, the run time of Algorithm 2 is $O(|\Omega| r^2 \log(1/\epsilon))$.

Algorithm 2 is based on the projected gradient descent update (3) and proceeds in $r$ stages, where in the $k$-th stage, projections are performed onto the set of rank-$k$ matrices. See Section 4 for a detailed description and the underlying intuition behind our algorithm.
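Assumption 1 can be checked numerically: compute the SVD and the largest row norm of the singular-vector factors. A sketch (the function name `incoherence` is ours; it assumes $M$ has exact rank $r$):

```python
import numpy as np

def incoherence(M, r):
    """Smallest mu such that the rank-r matrix M is mu-incoherent (Assumption 1).

    mu is the larger of max_i ||e_i^T U*||_2 * sqrt(n1/r) and
    max_j ||e_j^T V*||_2 * sqrt(n2/r).
    """
    n1, n2 = M.shape
    U, s, Vt = np.linalg.svd(M, full_matrices=False)
    U, V = U[:, :r], Vt[:r].T
    mu_u = np.linalg.norm(U, axis=1).max() * np.sqrt(n1 / r)
    mu_v = np.linalg.norm(V, axis=1).max() * np.sqrt(n2 / r)
    return max(mu_u, mu_v)

n = 100
# All-ones rank-1 matrix: mass fully spread out, mu = 1 (best possible)
assert np.isclose(incoherence(np.ones((n, n)), 1), 1.0)
# Single-spike matrix: all mass on one entry, mu = sqrt(n) (worst possible)
E11 = np.zeros((n, n)); E11[0, 0] = 1.0
assert np.isclose(incoherence(E11, 1), np.sqrt(n))
```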
Table 1 compares our result to that for nuclear norm minimization, which is the only other polynomial time method with finite sample complexity guarantees (i.e., no dependence on the desired accuracy $\epsilon$). Note that St-SVP runs in time near linear in the ambient dimension of the matrix ($n$), whereas nuclear norm minimization runs in time cubic in the ambient dimension. However, the sample complexity of St-SVP is suboptimal in its dependence on the incoherence parameter $\mu$ and the rank $r$. We believe closing this gap between the sample complexity of St-SVP and that of nuclear norm minimization should be possible and leave it for future work.

Method                               Sample complexity            Computational complexity
Nuclear norm minimization [Rec09]    $O(\mu^2 r n \log^2 n)$      $O(n^3 \log\frac{1}{\epsilon})$
St-SVP (this paper)                  $O(\mu^4 r^5 n \log^3 n)$    $O(\mu^4 r^7 n \log^3 n \log(1/\epsilon))$

Table 1: Comparison of our result to that for nuclear norm minimization.

2.3 Overview of Techniques

In this section, we briefly present the key ideas and lemmas we use to prove Theorem 1. Our proof revolves around analyzing the basic SVP step (3):

$X_{t+1} = P_k\left(X_t + \frac{1}{p}P_\Omega(M - X_t)\right) = P_k(M + \widehat{H}),$

where $p$ is the sampling probability, $\widehat{H} := X_t - M - \frac{1}{p}P_\Omega(X_t - M) = E - \frac{1}{p}P_\Omega(E)$ and $E := X_t - M$ is the error matrix. Hence, $X_{t+1}$ is given by a rank-$k$ projection of $M + \widehat{H}$, which is a perturbation of the desired matrix $M$.

Bounding the $\ell_\infty$ norm of errors: As the SVP update is based on projection onto the set of rank-$k$ matrices, a natural potential function to analyze would be $\|E\|_2$ or $\|E\|_F$. However, such a potential function requires bounding norms of $E - \frac{1}{p}P_\Omega(E)$, which in turn would require us to show that $E$ is incoherent. This is the approach taken by papers on AltMin [Kes12, JNS13, Har14]. In contrast, in this paper, we consider $\|E\|_\infty$ as the potential function. So the goal is to show that $\left\|P_k\left(M + \widehat{H}\right) - M\right\|_\infty$ is much smaller than $\|E\|_\infty$.
Unfortunately, standard perturbation results such as the Davis-Kahan theorem provide bounds on spectral, Frobenius or other unitarily invariant norms and do not apply to the $\ell_\infty$ norm. In order to carry out this argument, we write the singular vectors of $M + \widehat{H}$ as solutions to eigenvector equations and then use these to write $X_{t+1}$ explicitly via a Taylor series expansion. We use this technique to prove the following more general lemma.

Lemma 1. Suppose $M \in \mathbb{R}^{n \times n}$ is a symmetric matrix satisfying Assumption 1. Let $\sigma_1 \ge \cdots \ge \sigma_r$ denote its singular values. Let $H \in \mathbb{R}^{n \times n}$ be a random symmetric matrix such that each $H_{ij}$ is independent with $\mathbb{E}[H_{ij}] = 0$ and $\mathbb{E}[|H_{ij}|^a] \le 1/n$ for $2 \le a \le \log n$. Then, for any $\alpha > 1$ and $|\beta| \le \frac{\sigma_k}{200\sqrt{\alpha}}$, we have:

$\|M - P_k(M + \beta H)\|_\infty \le \frac{\mu^2 r^2}{n}\left(\sigma_{k+1} + 15|\beta|\sqrt{\alpha}\log n\right),$

with probability greater than $1 - n^{-10-\log\alpha}$.

Proceeding in stages: If we applied Lemma 1 with $k = r$, we would require $|\beta|$ to be much smaller than $\sigma_r$. Now, $\beta$ can be thought of as $\beta \approx \sqrt{\frac{n}{p}}\|E\|_\infty$. If we start with $X_0 = 0$, we have $E = -M$, and so $\|E\|_\infty = \|M\|_\infty \le \frac{\sigma_1\mu^2 r}{n}$. To make $\beta \le \sigma_r$, we would need the sampling probability $p$ to be quadratic in the condition number $\kappa = \sigma_1/\sigma_r$. In order to overcome this issue, we perform SVP in $r$ stages, with the $k$-th stage performing projections onto the set of rank-$k$ matrices, while maintaining the invariant that at the end of the $(k-1)$-th stage, $\|E\|_\infty = O(\sigma_k/n)$. This lets us choose a $p$ independent of $\kappa$ while still ensuring $\beta \approx \sqrt{\frac{n}{p}}\|E\|_\infty \le \sigma_k$. Lemma 1 tells us that at the end of the $k$-th stage, the error $\|E\|_\infty$ is $O\left(\frac{\sigma_{k+1}}{n}\right)$, thereby establishing the invariant for the $(k+1)$-th stage.

Using the same samples: In order to reduce the error from $O\left(\frac{\sigma_k}{n}\right)$ to $O\left(\frac{\sigma_{k+1}}{n}\right)$, the $k$-th stage would require $O\left(\log\frac{\sigma_k}{\sigma_{k+1}}\right)$ iterations.
Since Lemma 1 requires the elements of $H$ to be independent, in order to apply it, we need to use fresh samples in each iteration. This means that the sample complexity increases with $\frac{\sigma_k}{\sigma_{k+1}}$, or the desired accuracy $\epsilon$ if $\epsilon < \sigma_{k+1}$. This problem is faced by all the existing analyses of iterative algorithms for matrix completion [Kes12, JNS13, Har14, HW14]. We tackle this issue by observing that when $M$ is ill-conditioned and $\|E\|_F$ is very small, we can show a decay in $\|E\|_F$ using the same samples for the SVP iterations:

Lemma 2. Let $M$ and $\Omega$ be as in Theorem 1, with $M$ being a symmetric matrix. Further, let $M$ be ill-conditioned in the sense that $\|M - P_k(M)\|_F < \frac{\sigma_k}{n^3}$, where $\sigma_1 \ge \cdots \ge \sigma_r$ are the singular values of $M$. Then, the following holds for all rank-$k$ matrices $X$ s.t. $\|X - P_k(M)\|_F < \frac{\sigma_k}{n^3}$ (w.p. $\ge 1 - n^{-10-\alpha}$):

$\|X^+ - P_k(M)\|_F \le \frac{1}{10}\|X - P_k(M)\|_F + \frac{1}{p}\|M - P_k(M)\|_F,$

where $X^+ := P_k\left(X - \frac{1}{p}P_\Omega(X - M)\right)$ denotes the rank-$k$ SVP update of $X$ and $p = \mathbb{E}[|\Omega|]/n^2 = \frac{C\alpha\mu^4 r^5 \log^3 n}{n}$ is the sampling probability.

The following lemma plays a crucial role in proving Lemma 2. It is a natural extension of the Davis-Kahan theorem for singular vector subspace perturbation.

Lemma 3. Suppose $A$ is a matrix such that $\sigma_{k+1}(A) \le \frac{1}{4}\sigma_k(A)$. Then, for any matrix $E$ such that $\|E\|_F < \frac{1}{4}\sigma_k(A)$, we have:

$\|P_k(A + E) - P_k(A)\|_F \le c\left(\sqrt{k}\|E\|_2 + \|E\|_F\right),$

for some absolute constant $c$.

In contrast to the Davis-Kahan theorem, which establishes a bound on the perturbation of the space of singular vectors, Lemma 3 establishes a bound on the perturbation of the best rank-$k$ approximation of a matrix $A$ with good eigen-gap, under small perturbations. This is a very natural quantity to consider when studying perturbations of low-rank approximations, and we believe it may find applications in other scenarios as well.
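Lemma 3 is easy to probe numerically: for a matrix with a clear gap at $k$, the best rank-$k$ approximation should move by at most on the order of $\sqrt{k}\|E\|_2 + \|E\|_F$. A quick check on one random instance (for this particular instance even $c = 1$ happens to suffice; the lemma only guarantees some absolute constant $c$):

```python
import numpy as np

rng = np.random.default_rng(2)
n, k = 80, 3

# Build A with a clear gap: sigma_{k+1} <= sigma_k / 4
Q1, _ = np.linalg.qr(rng.standard_normal((n, n)))
Q2, _ = np.linalg.qr(rng.standard_normal((n, n)))
s = np.concatenate([[100.0, 90.0, 80.0], np.full(n - k, 1.0)])
A = Q1 @ np.diag(s) @ Q2.T

def best_rank(A, k):
    """P_k(A): best rank-k approximation via truncated SVD."""
    U, sv, Vt = np.linalg.svd(A)
    return (U[:, :k] * sv[:k]) @ Vt[:k]

E = 0.1 * rng.standard_normal((n, n))  # small perturbation, ||E||_F << sigma_k / 4
lhs = np.linalg.norm(best_rank(A + E, k) - best_rank(A, k))
rhs = np.sqrt(k) * np.linalg.norm(E, 2) + np.linalg.norm(E)
assert lhs <= rhs  # consistent with Lemma 3 (with c = 1 on this instance)
```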
A final remark regarding Lemma 3: we suspect it might be possible to tighten the right hand side of the result to $c\min\left(\sqrt{k}\|E\|_2, \|E\|_F\right)$, but we have not been able to prove it.

Algorithm 1 SVP for matrix completion
1: Input: $\Omega$, $P_\Omega(M)$, $r$, $\epsilon$
2: $T \leftarrow \log\frac{(n_1+n_2)\|M\|_\infty}{\epsilon}$
3: Partition $\Omega$ randomly into $T$ subsets $\{\Omega_t : t \in [T]\}$
4: $X_0 \leftarrow 0$
5: for $t \leftarrow 1, \cdots, T$ do
6:   $X_t \leftarrow P_r\left(X_{t-1} - \frac{n_1 n_2}{|\Omega_t|}P_{\Omega_t}(X_{t-1} - M)\right)$
7: end for
8: Output: $X_T$

3 Singular Value Projection

Before we go on to prove Theorem 1, in this section we analyze the basic SVP algorithm (Algorithm 1), bounding its sample complexity and thereby resolving a question posed by Jain et al. [JMD10]. This analysis also serves as a warm-up exercise for our main result and brings out the key ideas in analyzing the $\ell_\infty$ norm potential function, while also highlighting some issues with Algorithm 1 that we will fix later on. As is clear from the pseudocode in Algorithm 1, SVP is a simple projected gradient descent method for solving the matrix completion problem. Note that Algorithm 1 first splits the set $\Omega$ into $T$ random subsets and updates the iterate $X_t$ using $\Omega_t$. This step is critical for the analysis as it ensures that $\Omega_t$ is independent of $X_{t-1}$, allowing for the use of standard tail bounds. The following theorem is our main result for Algorithm 1:

Theorem 2. Suppose $M$ and $\Omega$ satisfy Assumptions 1 and 2 respectively with $\mathbb{E}[|\Omega|] \ge C\alpha\mu^4\kappa^2 r^5 n (\log^2 n)\, T$, where $n = n_1 + n_2$, $\alpha > 1$, $\kappa = \frac{\sigma_1}{\sigma_r}$ with $\sigma_1 \ge \cdots \ge \sigma_r$ denoting the singular values of $M$, $T = \log\frac{100\mu^2 r\|M\|_2}{\epsilon}$ and $C > 0$ is a large enough global constant. Then, the output of Algorithm 1 satisfies (w.p. $\ge 1 - T\min(n_1,n_2)^{-10-\log\alpha}$): $\|X_T - M\|_F \le \epsilon$.

Proof. Using a standard dilation argument (Lemma 4), it suffices to prove the result for symmetric matrices. Let $p = \frac{\mathbb{E}[|\Omega_t|]}{n^2} = \frac{\mathbb{E}[|\Omega|]}{n^2 T}$ be the probability of sampling in each iteration.
Now, let $E = X_{t-1} - M$ and $\widehat{H} = E - \frac{1}{p}P_{\Omega_t}(E)$. Then, the SVP update (line 6 of Algorithm 1) is given by $X_t = P_r(M + \widehat{H})$. Since $\Omega_t$ is sampled uniformly at random, it is easy to check that $\mathbb{E}[\widehat{H}_{ij}] = 0$ and $\mathbb{E}\left[|\widehat{H}_{ij}|^s\right] \le \beta^s/n$, where $\beta = \frac{2\sqrt{n}\|E\|_\infty}{\sqrt{p}} \le \frac{2\mu^2 r\sigma_1}{\sqrt{np}}$ (Lemma 5). By our choice of $p$, we have $\beta < \frac{\sigma_r}{200\sqrt{\alpha}}$. Applying Lemma 1 with $k = r$, we have:

$\|X_t - M\|_\infty \le \frac{15\mu^2 r^2}{n}\beta\sqrt{\alpha}\log n \le \frac{1}{2}\|X_{t-1} - M\|_\infty,$

where the last inequality is obtained by selecting $C$ large enough. The theorem is immediate from this error decay in each step.

Algorithm 2 Stagewise SVP (St-SVP) for matrix completion
1: Input: $\Omega$, $P_\Omega(M)$, $\epsilon$, $r$
2: $T \leftarrow \log\frac{100\mu^2 r\|M\|_2}{\epsilon}$
3: Partition $\Omega$ into $r\log n$ subsets $\{\Omega_{k,t} : k \in [r], t \in [\log n]\}$ uniformly at random
4: $k \leftarrow 1$, $X_{1,0} \leftarrow 0$
5: for $k \leftarrow 1, \cdots, r$ do
6:   /* Stage k */
7:   for $t = 1, \cdots, \log n$ do
8:     $X_{k,t} \leftarrow \mathrm{PGD}(X_{k,t-1}, P_{\Omega_{k,t}}(M), \Omega_{k,t}, k)$  /* SVP step with re-sampling */  (Step I)
9:   end for
10:  if $\sigma_{k+1}\left(\mathrm{GD}\left(X_{k,\log n}, P_{\Omega_{k,\log n}}(M), \Omega_{k,\log n}\right)\right) > \frac{\sigma_k(X_{k,\log n})}{n^2}$ then
11:    $X_{k+1,0} \leftarrow X_{k,\log n}$  /* Initialize for next stage and continue */  (Step II)
12:    continue
13:  end if
14:  for $t = \log n + 1, \cdots, \log n + T$ do
15:    $X_{k,t} \leftarrow \mathrm{PGD}(X_{k,t-1}, P_\Omega(M), \Omega, k)$  /* SVP step without re-sampling */  (Step III)
16:  end for
17:  for $t = \log n + T + 1, \cdots, 2\log n + T$ do
18:    $X_{k,t} \leftarrow \mathrm{PGD}(X_{k,t-1}, P_{\Omega_{k,t}}(M), \Omega_{k,t}, k)$  /* SVP step with re-sampling */  (Step IV)
19:  end for
20:  $X_{k+1,0} \leftarrow X_{k,2\log n + T}$  /* Initialization for next stage */
21:  Output: $X_{k,t}$ if $\sigma_{k+1}\left(\mathrm{GD}(X_{k,t-1}, P_{\Omega_{k,t}}(M), \Omega_{k,t})\right) < \frac{\epsilon}{10\mu^2 r}$
22: end for

Sub-routine 3 Projected Gradient Descent (PGD)
1: Input: $X \in \mathbb{R}^{n_1 \times n_2}$, $P_\Omega(M)$, $\Omega$, $k$
2: Output: $X_{\mathrm{next}} \leftarrow P_k\left(X - \frac{n_1 n_2}{|\Omega|}P_\Omega(X - M)\right)$

Sub-routine 4 Gradient Descent (GD)
1: Input: $X \in \mathbb{R}^{n_1 \times n_2}$, $P_\Omega(M)$, $\Omega$
2: Output: $X_{\mathrm{next}} \leftarrow X - \frac{n_1 n_2}{|\Omega|}P_\Omega(X - M)$
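The stagewise structure of Algorithm 2 can be sketched in a few lines of numpy. This is an illustrative skeleton only: it keeps the "stage $k$ projects onto rank-$k$" idea and warm-starts each stage from the previous one, but uses a fixed number of inner iterations, one fresh sample set per stage, and no exit test; all names are ours, not the paper's:

```python
import numpy as np

def pgd(X, M, mask, k):
    """Sub-routine PGD: X <- P_k(X - (n1 n2 / |Omega|) P_Omega(X - M)).

    Only the entries of M where mask is True are read.
    """
    n1, n2 = X.shape
    G = X - (n1 * n2 / mask.sum()) * np.where(mask, X - M, 0.0)
    U, s, Vt = np.linalg.svd(G, full_matrices=False)
    return (U[:, :k] * s[:k]) @ Vt[:k]

def st_svp(M, masks, r, inner_iters=20):
    """Simplified stagewise SVP: stage k runs rank-k PGD steps,
    warm-starting from the previous stage's iterate."""
    X = np.zeros(M.shape)
    for k in range(1, r + 1):
        for _ in range(inner_iters):
            X = pgd(X, M, masks[k - 1], k)
    return X

rng = np.random.default_rng(3)
n, r, p = 50, 3, 0.7
# Ill-conditioned target: singular values spread over two orders of magnitude
U, _ = np.linalg.qr(rng.standard_normal((n, r)))
V, _ = np.linalg.qr(rng.standard_normal((n, r)))
M = U @ np.diag([100.0, 10.0, 1.0]) @ V.T
masks = [rng.random((n, n)) < p for _ in range(r)]  # one fresh sample set per stage
X = st_svp(M, masks, r)
assert np.linalg.norm(X - M) < 0.1 * np.linalg.norm(M)
```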
4 Stagewise-SVP

Theorem 2 is suboptimal in its sample complexity dependence on the rank, condition number and desired accuracy. In this section, we fix two of these issues, namely the dependence on the condition number and on the desired accuracy, by designing a stagewise version of Algorithm 1 and proving Theorem 1. Our algorithm, St-SVP (pseudocode presented in Algorithm 2), runs in $r$ stages, where in the $k$-th stage, the projection is onto the set of rank-$k$ matrices. In each stage, the goal is to obtain an approximation of $M$ up to an error of $\sigma_{k+1}$. In order to do this, we use the basic SVP updates, but in a very specific way, so as to avoid the dependence on the condition number and the desired accuracy.

• (Step I) Apply the SVP update with fresh samples for $\log n$ iterations: Run $\log n$ steps of the SVP update (3), with fresh samples in each iteration. Using fresh samples allows us to use Lemma 1, ensuring that the $\ell_\infty$ norm of the error between our estimate and $M$ decays to $\|X_{k,\log n} - M\|_\infty = O\left(\frac{1}{n}\left(\sigma_{k+1} + \frac{\sigma_k}{n^3}\right)\right)$.

• (Step II) Determine if $\sigma_{k+1} > \frac{\sigma_k}{n^3}$: Note that we can determine this by using the $(k+1)$-th singular value of the matrix obtained after the gradient step, i.e., $\sigma_{k+1}\left(X_{k,\log n} - \frac{1}{p}P_{\Omega_{k,\log n}}(X_{k,\log n} - M)\right)$. If true, the error is $\|X_{k,\log n} - M\|_\infty = O\left(\frac{\sigma_{k+1}}{n}\right)$, and so the algorithm proceeds to the $(k+1)$-th stage.

• (Step III) If not (i.e., $\sigma_{k+1} \le \frac{\sigma_k}{n^3}$), apply the SVP update for $T = \log\frac{1}{\epsilon}$ iterations with the same samples: If $\sigma_{k+1} \le \frac{\sigma_k}{n^3}$, we can use Lemma 2 to conclude that after $\log\frac{1}{\epsilon}$ iterations, the Frobenius norm of the error is $\|X_{k,\log n + T} - M\|_F = O(n\sigma_{k+1} + \epsilon)$.

• (Step IV) Apply the SVP update with fresh samples for $\log n$ iterations: To set up the invariant $\|X_{k+1,0} - M\|_\infty = O(\sigma_{k+1}/n)$ for the next stage, we wish to convert our Frobenius norm bound $\|X_{k,\log n + T} - M\|_F = O(n\sigma_{k+1})$ to an $\ell_\infty$ bound $\|X_{k,2\log n + T} - M\|_\infty = O\left(\frac{\sigma_{k+1}}{n}\right)$.
Since $\sigma_{k+1} < \frac{\sigma_k}{n^3}$, we can bound the initial Frobenius error by $O\left(\frac{1}{n}\left(\frac{1}{2^{\widehat{T}}}\sigma_k + \sigma_{k+1}\right)\right)$ for some $\widehat{T} = O\left(\log\frac{\sigma_k}{n^2\sigma_{k+1}}\right)$. As in Step I, after $\log n$ SVP updates with fresh samples, Lemma 1 lets us conclude that $\|X_{k,2\log n + T} - M\|_\infty = O\left(\frac{\sigma_{k+1}}{n}\right)$, setting up the invariant for the next stage.

4.1 Analysis of St-SVP (Proof of Theorem 1)

We now present a proof of Theorem 1.

Proof of Theorem 1. Just as in Theorem 2, it suffices to prove the result for the case when $M$ is symmetric. For every stage, we will establish the following invariant:

$\|X_{k,0} - M\|_\infty < \frac{4\mu^2 r^2}{n}\sigma_k. \qquad (4)$

We will use induction. (4) clearly holds for the base case $k = 1$. Now, supposing (4) holds for the $k$-th stage, we will prove that it holds for the $(k+1)$-th stage. The analysis follows the four-step outline of the previous section:

Step I: Here, we will show that for every iteration $t$, we have:

$\|X_{k,t} - M\|_\infty < \frac{4\mu^2 r^2}{n}\gamma_{k,t}, \quad \text{where } \gamma_{k,t} := \sigma_{k+1} + \frac{1}{2^{t-1}}\sigma_k. \qquad (5)$

(5) holds for $t = 0$ by our induction hypothesis (4) for the $k$-th stage. Supposing it true for iteration $t$, we will show it for iteration $t+1$. The $(t+1)$-th iterate is given by:

$X_{k,t+1} = P_k(M + \beta H), \quad \text{where } H = \frac{1}{\beta}\left(E - \frac{1}{p}P_{\Omega_{k,t}}(E)\right), \quad E = X_{k,t} - M, \qquad (6)$

$p = \frac{\mathbb{E}[|\Omega_{k,t}|]}{n^2} = \frac{C\alpha\mu^4 r^4 \log^2 n}{n}$, and $\beta = \frac{2\sqrt{n}\|E\|_\infty}{\sqrt{p}} \le \frac{8\mu^2 r^2\gamma_{k,t}}{\sqrt{np}}$. Our hypothesis on the sample size tells us that $\beta \le |\sigma_k|/(200\sqrt{\alpha})$, and Lemma 5 tells us that $H$ satisfies the hypothesis of Lemma 1. So we have:

$\|X_{k,t+1} - M\|_\infty < \frac{\mu^2 r^2}{n}\left(\sigma_{k+1} + 15\beta\sqrt{\alpha}\log n\right) < \frac{\mu^2 r^2}{n}\left(\sigma_{k+1} + \frac{1}{9}\gamma_{k,t}\right) \le \frac{10\mu^2 r^2}{9n}\gamma_{k,t+1}.$

This proves (5). Hence, after $\log n$ steps, we have:

$\|X_{k,\log n} - M\|_\infty < \frac{10\mu^2 r^2}{9n}\left(\frac{\sigma_k}{n^3} + \sigma_{k+1}\right). \qquad (7)$

Step II: Let $G := X_{k,\log n} - \frac{1}{p}P_{\Omega_{k,\log n}}(X_{k,\log n} - M) = M + \beta H$ be the gradient update, with notation as above.
A standard perturbation argument (Lemmas 7 and 8) tells us that:

$\|G - M\|_2 < 3\beta\sqrt{\alpha} \le \frac{24\sqrt{\alpha}\,\mu^2 r^2\gamma_{k,\log n}}{\sqrt{np}} < \frac{1}{100}\left(\frac{\sigma_k}{n^3} + \sigma_{k+1}\right).$

So if $\sigma_{k+1}(G) > \frac{\sigma_k(G)}{n^3}$, then we have $\sigma_{k+1} > \frac{9\sigma_k}{10n^3}$. Since we move on to the next stage with $X_{k+1,0} = X_{k,\log n}$, (7) tells us that:

$\|X_{k+1,0} - M\|_\infty = \|X_{k,\log n} - M\|_\infty \le \frac{10\mu^2 r^2}{9n}\left(\frac{\sigma_k}{n^3} + \sigma_{k+1}\right) \le \frac{2\mu^2 r^2}{n}(2\sigma_{k+1}),$

showing the invariant for the $(k+1)$-th stage.

Step III: On the other hand, if $\sigma_{k+1}(G) \le \frac{\sigma_k(G)}{n^3}$, then Lemmas 7 and 8 tell us that $\sigma_{k+1} \le \frac{11\sigma_k}{10n^3}$. So, using Lemma 2 with $T = \log\frac{1}{\epsilon}$ iterations, we obtain:

$\|X_{k,T+\log n} - P_k(M)\|_F \le \max\left(\epsilon, \frac{2}{p}\|M - P_k(M)\|_F\right). \qquad (8)$

If $\epsilon > \frac{2}{p}\|M - P_k(M)\|_F$, then we have:

$\|X_{k,T+\log n} - M\|_F \le \|X_{k,T+\log n} - P_k(M)\|_F + \|M - P_k(M)\|_F \le 2\epsilon.$

On the other hand, if $\epsilon \le \frac{2}{p}\|M - P_k(M)\|_F$, then we have:

$\|X_{k,T+\log n} - M\|_\infty \le \|X_{k,T+\log n} - P_k(M)\|_F + \|M - P_k(M)\|_\infty \le \frac{2}{p}\|M - P_k(M)\|_F + \frac{\mu^2 r^2\sigma_{k+1}}{n} \le \frac{2\mu^2 r^2}{n}\left(\frac{1}{2^{\log\frac{\sigma_k}{n^2\sigma_{k+1}}}}\sigma_k + \sigma_{k+1}\right). \qquad (9)$

Step IV: Using (9) and the "fresh samples" analysis of Step I (in particular (5)), we have:

$\|X_{k,T+2\log n} - M\|_\infty \le \frac{10\mu^2 r^2}{9n}\left(\frac{1}{2^{\log\frac{\sigma_k}{\sigma_{k+1}}}}\sigma_k + \sigma_{k+1}\right) \le \frac{2\mu^2 r^2}{n}(2\sigma_{k+1}),$

which establishes the invariant for the $(k+1)$-th stage. Combining the invariant (4) with the exit condition of Step III, we have $\|\widehat{M} - M\|_F \le \epsilon$, where $\widehat{M}$ is the output of the algorithm. There are $r$ stages, and in each stage we need $2\log n$ sets of samples of size $O(pn^2)$. Hence, the total sample complexity is $|\Omega| = O(\alpha\mu^4 r^5 n\log^3 n)$. Similarly, the total computational complexity is $O\left(\alpha\mu^4 r^7 n\log^3 n\log(\|M\|_F/\epsilon)\right)$.

5 Discussion and Conclusions

In this paper, we proposed a fast projected gradient descent based algorithm for solving the matrix completion problem.
The algorithm runs in time $O\left(nr^7\log^3 n\log(1/\epsilon)\right)$, with a sample complexity of $O(nr^5\log^3 n)$. To the best of our knowledge, this is the first near linear time algorithm for exact matrix completion with sample complexity independent of $\epsilon$ and of the condition number of $M$. The first key idea behind our result is to use the $\ell_\infty$ norm as a potential function, which entails bounding all the terms of an explicit Taylor series expansion. The second key idea is an extension of the Davis-Kahan theorem that provides a perturbation bound for the best rank-$k$ approximation of a matrix with good eigen-gap. We believe both of these techniques may find applications in other contexts. Designing an efficient algorithm with the information-theoretically optimal sample complexity $|\Omega| = O(nr\log n)$ is still open; our result is suboptimal by a factor of $r^4\log^2 n$ and the nuclear norm approach is suboptimal by a factor of $\log n$. Another interesting direction in this area is to design optimal algorithms that can handle sampling distributions that are widely observed in practice, such as the power law distribution [MJD09].

References

[Bha97] Rajendra Bhatia. Matrix Analysis. Springer, 1997.
[BJ14] Srinadh Bhojanapalli and Prateek Jain. Universal matrix completion. In ICML, 2014.
[BK07] Robert Bell and Yehuda Koren. Scalable collaborative filtering with jointly derived neighborhood interpolation weights. In ICDM, pages 43–52, 2007.
[CBSW14] Yudong Chen, Srinadh Bhojanapalli, Sujay Sanghavi, and Rachel Ward. Coherent matrix completion. In Proceedings of the 31st International Conference on Machine Learning, pages 674–682, 2014.
[CR09] Emmanuel J. Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717–772, December 2009.
[CT09] Emmanuel J. Candès and Terence Tao. The power of convex relaxation: Near-optimal matrix completion. IEEE Trans. Inform.
Theory, 56(5):2053–2080, 2010.
[EKYY13] László Erdős, Antti Knowles, Horng-Tzer Yau, and Jun Yin. Spectral statistics of Erdős–Rényi graphs I: Local semicircle law. The Annals of Probability, 41(3B):2279–2375, 2013.
[GL11] David F. Gleich and Lek-Heng Lim. Rank aggregation via nuclear norm minimization. In KDD, pages 60–68, 2011.
[Har14] Moritz Hardt. Understanding alternating minimization for matrix completion. In FOCS, 2014.
[HCD12] Cho-Jui Hsieh, Kai-Yang Chiang, and Inderjit S. Dhillon. Low rank modeling of signed networks. In KDD, pages 507–515, 2012.
[HMRW14] Moritz Hardt, Raghu Meka, Prasad Raghavendra, and Benjamin Weitz. Computational limits for matrix completion. In COLT, pages 703–725, 2014.
[HW14] Moritz Hardt and Mary Wootters. Fast matrix completion without the condition number. In COLT, 2014.
[JMD10] Prateek Jain, Raghu Meka, and Inderjit S. Dhillon. Guaranteed rank minimization via singular value projection. In NIPS, pages 937–945, 2010.
[JNS13] Prateek Jain, Praneeth Netrapalli, and Sujay Sanghavi. Low-rank matrix completion using alternating minimization. In Proceedings of the 45th Annual ACM Symposium on Theory of Computing, pages 665–674. ACM, 2013.
[Kes12] Raghunandan H. Keshavan. Efficient algorithms for collaborative filtering. PhD thesis, Stanford University, 2012.
[KMO10] Raghunandan H. Keshavan, Andrea Montanari, and Sewoong Oh. Matrix completion from a few entries. IEEE Transactions on Information Theory, 56(6):2980–2998, 2010.
[MJD09] Raghu Meka, Prateek Jain, and Inderjit S. Dhillon. Matrix completion from power-law distributed samples. In NIPS, 2009.
[Rec09] Benjamin Recht. A simple approach to matrix completion. JMLR, 2009.
[RR13] Benjamin Recht and Christopher Ré. Parallel stochastic gradient algorithms for large-scale matrix completion. Mathematical Programming Computation, 5(2):201–226, 2013.
[Tro12] Joel A. Tropp.
User-friendly tail bounds for sums of random matrices. Foundations of Computational Mathematics, 12(4):389–434, 2012.

A Preliminaries and Notation for Proofs

The following lemma shows that w.l.o.g. we can assume $M$ to be a symmetric matrix. A similar result is given in Section D of [Har14].

Lemma 4. Let $M \in \mathbb{R}^{n_1\times n_2}$ and $\Omega \subseteq [n_1]\times[n_2]$ satisfy Assumptions 1 and 2, respectively. Then, there exists a symmetric $\widetilde{M} \in \mathbb{R}^{n\times n}$, $n = n_1 + n_2$, s.t. $\widetilde{M}$ is of rank $2r$ and the incoherence of $\widetilde{M}$ is twice the incoherence of $M$. Moreover, there exists $\widetilde{\Omega} \subseteq [n]\times[n]$ satisfying Assumption 2 such that $P_{\widetilde{\Omega}}(\widetilde{M})$ is efficiently computable, and the output of an SVP update (3) with $P_\Omega(M)$ can also be obtained from the SVP update of $P_{\widetilde{\Omega}}(\widetilde{M})$.

Proof of Lemma 4. Define the following symmetric matrix from $M$ using a dilation technique:

$\widetilde{M} = \begin{pmatrix} 0 & M \\ M^\top & 0 \end{pmatrix}.$

Note that the rank of $\widetilde{M}$ is $2r$ and the incoherence of $\widetilde{M}$ is bounded by $\frac{n_1+n_2}{n_2}\mu$ (assuming $n_1 \le n_2$). Note that if $n_2 > n_1$, then we can split the columns of $M$ into blocks of size $n_1$ and apply the argument separately to each block. Now, we can split $\Omega$ to generate samples from $M$ and $M^\top$, and then augment redundant samples from the zero blocks above to obtain $\widetilde{\Omega} \subseteq [n]\times[n]$. Moreover, if we run the SVP update (3) with input $\widetilde{M}$, $\widetilde{X}$ and $\widetilde{\Omega}$, an easy calculation shows that the iterates satisfy:

$\widetilde{X}^+ = \begin{pmatrix} 0 & X^+ \\ X^{+\top} & 0 \end{pmatrix},$

where $X^+$ is the output of (3) with input $M$, $X$, and $\Omega$. That is, a convergence result for $\widetilde{X}^+$ would imply a convergence result for $X^+$ as well.

For the remaining sections, we assume (w.l.o.g.) that $M \in \mathbb{R}^{n\times n}$ is symmetric and $M = U^*\Sigma U^{*\top}$ is the eigenvalue decomposition (EVD) of $M$. Also, unless specified otherwise, $\sigma_i$ denotes the $i$-th eigenvalue of $M$.

B Proof of Lemma 1

Recall that we assume (w.l.o.g.) that $M \in \mathbb{R}^{n\times n}$ is symmetric and $M = U^*\Sigma U^{*\top}$ is the eigenvalue decomposition (EVD) of $M$.
Also, the goal is to bound $\|X^+ - M\|_\infty$, where $X^+ = P_k(M + \beta H)$ and $H$ satisfies the following definition:

Definition 1. $H$ is a symmetric matrix with each of its elements drawn independently, satisfying the following moment conditions:
$$\mathbb{E}[h_{ij}] = 0, \quad |h_{ij}| < 1, \quad \mathbb{E}\big[|h_{ij}|^k\big] \le \frac{1}{n}, \quad \text{for } i,j \in [n] \text{ and } 2 \le k \le 2\log n.$$

That is, we wish to understand $\|X^+ - M\|_\infty$ under the perturbation $H$. To this end, we first present a few lemmas that analyze how $H$ arises in the context of our St-SVP algorithm and bound certain key quantities related to $H$. We then present a few technical lemmas that are helpful for our proof of Lemma 1. The detailed proof of the lemma is given in Section B.3; see Section B.4 for proofs of the technical lemmas.

B.1 Results for $H$

Recall that the SVP update (3) is given by:
$$X^+ = P_k\Big(X - \frac{1}{p} P_\Omega(X - M)\Big) = P_k(M + H), \quad \text{where } H = E - \frac{1}{p} P_\Omega(E) \text{ and } E = X - M.$$
Our first lemma shows that matrices of the form $E - \frac{1}{p} P_\Omega(E)$, scaled appropriately, satisfy Definition 1, i.e., satisfy the assumption of Lemma 1.

Lemma 5. Let $A$ be a symmetric $n \times n$ matrix. Suppose $\Omega \subseteq [n] \times [n]$ is obtained by sampling each element with probability $p \in \big[\frac{1}{4n}, 0.5\big]$. Then the matrix
$$B := \frac{\sqrt{p}}{2\sqrt{n}\,\|A\|_\infty}\Big(A - \frac{1}{p} P_\Omega(A)\Big)$$
satisfies Definition 1.

We now present a critical lemma for our proof, which bounds $\|H^a u\|_\infty$ for $2 \le a \le \log n$. Note that the entries of $H^a$ can be dependent on each other, hence we cannot directly apply standard tail bounds. Our proof follows along very similar lines to Lemma 6.5 of [EKYY13]; see Appendix D for a detailed proof.

Lemma 6. Suppose $\widehat{H}$ satisfies Definition 1. Fix $1 \le a \le \log n$, and let $e_r$ denote the $r$-th standard basis vector. Then, for any fixed vector $u$, we have:
$$\big|\big\langle e_r, \widehat{H}^a u \big\rangle\big| \le (c \log n)^a \|u\|_\infty \quad \forall r \in [n],$$
with probability greater than $1 - n^{1 - 2\log\frac{c}{4}}$.
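For concreteness, one pass of the SVP update (3) can be sketched as follows. This is a minimal NumPy sketch, not the paper's St-SVP implementation: `P_k` truncates the EVD to the $k$ largest-magnitude eigenvalues, and the sampling setup and sizes are illustrative.

```python
import numpy as np

def P_k(A, k):
    """Best rank-k approximation of a symmetric matrix via its EVD."""
    vals, vecs = np.linalg.eigh(A)
    idx = np.argsort(-np.abs(vals))[:k]   # keep k largest-magnitude eigenvalues
    return (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T

def svp_update(X, M, mask, p, k):
    """X+ = P_k(X - (1/p) P_Omega(X - M)); mask is the 0/1 indicator of Omega."""
    return P_k(X - (mask / p) * (X - M), k)

rng = np.random.default_rng(1)
n, k, p = 100, 2, 0.5
U = np.linalg.qr(rng.standard_normal((n, k)))[0]
M = U @ np.diag([1.0, 0.8]) @ U.T            # symmetric rank-k ground truth
mask = (rng.random((n, n)) < p).astype(float)
mask = np.triu(mask) + np.triu(mask, 1).T    # symmetric sampling pattern

X = np.zeros((n, n))
for _ in range(50):
    X = svp_update(X, M, mask, p, k)
err = np.linalg.norm(X - M) / np.linalg.norm(M)
```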
Next, we bound $\|H\|_2$ using the matrix Bernstein inequality of [Tro12]; see Appendix B.4 for a proof.

Lemma 7. Suppose $H$ satisfies Definition 1. Then, w.p. $\ge 1 - 1/n^{10+\log\alpha}$, we have: $\|H\|_2 \le 3\sqrt{\alpha}$.

B.2 Technical Lemmas Useful for the Proof of Lemma 1

In this section, we present the technical lemmas used by our proof of Lemma 1. First, we recall the well-known Weyl perturbation inequality [Bha97]:

Lemma 8. Suppose $B = A + N$. Let $\lambda_1, \dots, \lambda_n$ and $\sigma_1, \dots, \sigma_n$ be the eigenvalues of $B$ and $A$, respectively. Then we have:
$$|\lambda_i - \sigma_i| \le \|N\|_2 \quad \forall i \in [n].$$

The lemma below bounds the $\ell_\infty$ norm of an appropriate incoherent matrix using its $\ell_2$ norm.

Lemma 9. Suppose $M$ is a symmetric matrix of size $n$ satisfying Assumption 1. For any symmetric matrix $B \in \mathbb{R}^{n\times n}$, we have:
$$\|M B M - M\|_\infty \le \frac{\mu^2 r}{n}\,\|M B M - M\|_2.$$

Next, we present a natural perturbation lemma that bounds the spectral-norm distance of $A$ to $A U \Lambda^{-1} U^\top A$, where $U\Lambda U^\top = P_k(A+E)$ and $E$ is a perturbation of $A$.

Lemma 10. Let $A \in \mathbb{R}^{n\times n}$ be a symmetric matrix with eigenvalues $\beta_1, \dots, \beta_n$, where $|\beta_1| \ge \cdots \ge |\beta_n|$. Let $W = A + E$ be a perturbation of $A$, where $E$ is a symmetric matrix with $\|E\|_2 < \frac{|\beta_k|}{2}$. Also, let $P_k(W) = U\Lambda U^\top$ be the eigenvalue decomposition of the best rank-$k$ approximation of $W$. Then $\Lambda^{-1}$ exists. Furthermore, we have:
$$\big\|A - A U \Lambda^{-1} U^\top A\big\|_2 \le |\beta_{k+1}| + 5\|E\|_2, \quad\text{and}\quad \big\|A U \Lambda^{-a} U^\top A\big\|_2 \le 4\left(\frac{|\beta_k|}{2}\right)^{-a+2} \quad \forall a \ge 2.$$

B.3 Detailed Proof of Lemma 1

We are now ready to present a proof of Lemma 1. Recall that $X^+ = P_k(M + \beta H)$; hence,
$$(M + \beta H)\, u_i = \lambda_i u_i, \quad \forall\, 1 \le i \le k, \tag{10}$$
where $(u_i, \lambda_i)$ is the $i$-th top eigenvector–eigenvalue pair (in terms of magnitude). Now, as $H$ satisfies the conditions of Definition 1, we can apply Lemma 7 to obtain:
$$|\beta|\,\|H\|_2 \le |\beta|\cdot 3\sqrt{\alpha} \le \frac{|\sigma_k|}{5}. \tag{11}$$
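Weyl's inequality (Lemma 8) is easy to sanity-check numerically; the sketch below (illustrative sizes) compares the eigenvalues of a symmetric matrix before and after a symmetric perturbation.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50

def sym(Z):
    return (Z + Z.T) / 2

A = sym(rng.standard_normal((n, n)))
N = 0.1 * sym(rng.standard_normal((n, n)))   # symmetric perturbation
B = A + N

sigma = np.sort(np.linalg.eigvalsh(A))       # eigenvalues of A
lam = np.sort(np.linalg.eigvalsh(B))         # eigenvalues of B = A + N
max_shift = np.max(np.abs(lam - sigma))      # max_i |lambda_i - sigma_i|
spec_norm_N = np.linalg.norm(N, 2)           # ||N||_2
```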
Using Lemma 8 and (11), we have:
$$|\lambda_i| \ge |\sigma_i| - |\beta|\,\|H\|_2 \ge \frac{4|\sigma_k|}{5} \quad \forall i \in [k]. \tag{12}$$
Using (10), we have:
$$\Big(I - \frac{\beta}{\lambda_i} H\Big) u_i = \frac{1}{\lambda_i} M u_i.$$
Moreover, using (12), $I - \frac{\beta}{\lambda_i} H$ is invertible. Hence, using a Taylor (Neumann) series expansion, we have:
$$u_i = \frac{1}{\lambda_i}\Big(I + \frac{\beta}{\lambda_i} H + \frac{\beta^2}{\lambda_i^2} H^2 + \cdots\Big) M u_i.$$
Letting $U\Lambda U^\top$ denote the eigenvalue decomposition (EVD) of $X^+$, we obtain:
$$X^+ = U\Lambda U^\top = \sum_{a,b \ge 0} \beta^{a+b}\, H^a\, M U \Lambda^{-(a+b+1)} U^\top M\, H^b.$$
Using the triangle inequality, we have:
$$\|X^+ - M\|_\infty \le \big\|M U \Lambda^{-1} U^\top M - M\big\|_\infty + \sum_{\substack{a,b\ge 0 \\ a+b \ge 1}} |\beta|^{a+b}\,\big\|H^a M U \Lambda^{-(a+b+1)} U^\top M H^b\big\|_\infty. \tag{13}$$
Using Lemma 9, we have the following bound for the first term above:
$$\big\|M U \Lambda^{-1} U^\top M - M\big\|_\infty \le \frac{\mu^2 r}{n}\big\|M U \Lambda^{-1} U^\top M - M\big\|_2. \tag{14}$$
Furthermore, using Lemma 10 we have:
$$\big\|M U \Lambda^{-1} U^\top M - M\big\|_2 \le |\sigma_{k+1}| + 5|\beta|\,\|H\|_2, \quad\text{and} \tag{15}$$
$$\big\|M U \Lambda^{-a} U^\top M\big\|_2 \le 4\left(\frac{|\sigma_k|}{2}\right)^{-a+2} \quad \forall a \ge 2. \tag{16}$$
Plugging (15) into (14) gives us:
$$\big\|M U \Lambda^{-1} U^\top M - M\big\|_\infty \le \frac{\mu^2 r}{n}\big(|\sigma_{k+1}| + 5|\beta|\,\|H\|_2\big). \tag{17}$$
Let $M = U^* \Sigma (U^*)^\top$ denote the EVD of $M$. We now bound the terms in the summation in (13) for $1 \le a+b < \log n$:
$$|\beta|^{a+b}\big\|H^a M U \Lambda^{-(a+b+1)} U^\top M H^b\big\|_\infty = |\beta|^{a+b}\max_{i,j}\big|e_i^\top H^a M U \Lambda^{-(a+b+1)} U^\top M H^b e_j\big|$$
$$\le |\beta|^{a+b}\max_i\big\|e_i^\top H^a U^*\big\|_2\,\big\|\Sigma (U^*)^\top U \Lambda^{-(a+b+1)} U^\top U^* \Sigma\big\|_2\,\max_j\big\|(U^*)^\top H^b e_j\big\|_2$$
$$\le |\beta|^{a+b}\,\sqrt{r}\,\max_i \big\|H^a u_i^*\big\|_\infty\,\big\|M U \Lambda^{-(a+b+1)} U^\top M\big\|_2\,\sqrt{r}\,\max_j\big\|H^b u_j^*\big\|_\infty$$
$$\overset{(\zeta_1)}{\le} \frac{\mu^2 r^2}{n}\,|\beta|^{a+b}\big(10\sqrt{\alpha}\log n\big)^{a+b}\,\big\|M U \Lambda^{-(a+b+1)} U^\top M\big\|_2 \overset{(\zeta_2)}{\le} \frac{\mu^2 r^2}{n}\,|\beta|^{a+b}\big(10\sqrt{\alpha}\log n\big)^{a+b}\cdot 4\left(\frac{2}{|\sigma_k|}\right)^{a+b-1}$$
$$\le \frac{\mu^2 r^2}{n}\left(\frac{80|\beta|\sqrt{\alpha}\log n}{|\sigma_k|}\right)^{a+b-1}\cdot 10|\beta|\sqrt{\alpha}\log n \le \frac{\mu^2 r^2}{n}\left(\frac{1}{20}\right)^{a+b-1}\cdot 10|\beta|\sqrt{\alpha}\log n, \tag{18}$$
where $(\zeta_1)$ follows from Lemma 6 and $(\zeta_2)$ follows from (16), and the last inequality uses the assumed upper bound on $|\beta|$.
For $a+b \ge \log n$, we have:
$$|\beta|^{a+b}\big\|H^a M U \Lambda^{-(a+b+1)} U^\top M H^b\big\|_\infty \le |\beta|^{a+b}\big\|H^a M U \Lambda^{-(a+b+1)} U^\top M H^b\big\|_2$$
$$\le |\beta|^{a+b}\,\|H\|_2^a\,\big\|M U \Lambda^{-(a+b+1)} U^\top M\big\|_2\,\|H\|_2^b \le |\beta|^{a+b}\,\|H\|_2^{a+b}\left(\frac{5}{4|\sigma_k|}\right)^{a+b-1}$$
$$\le \left(\frac{15|\beta|\sqrt{\alpha}}{4|\sigma_k|}\right)^{a+b-1}\cdot 3|\beta|\sqrt{\alpha} \le \frac{\mu^2 r^2}{n}\left(\frac{1}{20}\right)^{a+b-1} 10|\beta|\sqrt{\alpha}\log n, \tag{19}$$
where we used Lemma 10 to bound $\|M U \Lambda^{-(a+b+1)} U^\top M\|_2$ and Lemma 7 to bound $\|H\|_2$. The last inequality follows from $(1/2)^{a+b} \le 1/n \le \frac{\mu^2 r^2}{n}$, as $a+b > \log n$.

Plugging (17), (18) and (19) into (13) gives us:
$$\|X^+ - M\|_\infty \le \frac{\mu^2 r}{n}\big(|\sigma_{k+1}| + 5|\beta|\,\|H\|_2\big) + \frac{\mu^2 r^2}{n}\sum_{\substack{a,b\ge 0\\ a+b\ge 1}}\left(\frac{1}{20}\right)^{a+b}\big(10|\beta|\sqrt{\alpha}\log n\big) \le \frac{\mu^2 r^2}{n}\big(|\sigma_{k+1}| + 15|\beta|\sqrt{\alpha}\log n\big).$$
This proves the lemma.

B.4 Proofs of Technical Lemmas from Sections B.1 and B.2

Proof of Lemma 5. Since $\frac1p(P_\Omega(A))_{ij}$ is an unbiased estimate of $A_{ij}$, we see that $\mathbb{E}[B_{ij}] = 0$. For $k \ge 2$, we have:
$$\mathbb{E}\big[|B_{ij}|^k\big] = \left(\frac{\sqrt{p}\,|A_{ij}|}{2\sqrt{n}\,\|A\|_\infty}\right)^{k}\left(p\Big(\frac{1}{p}-1\Big)^{k} + (1-p)\right) \le \left(\frac{p}{2n}\right)^{\frac{k}{2}}\cdot\frac{2}{p^{k-1}} \le \frac{1}{n\,(np)^{\frac{k}{2}-1}} \le \frac{1}{n}.$$

Proof of Lemma 7. Note that $H = \sum_{ij} h_{ij}\, e_i e_j^\top = \sum_{i\le j} G_{ij}$, where $G_{ij} = \frac{h_{ij}}{\mathbb{1}\{i\ne j\}+1}\big(e_i e_j^\top + e_j e_i^\top\big)$. Now, $\mathbb{E}[G_{ij}] = 0$, $\max_{ij}\|G_{ij}\|_2 \le 2$, and
$$\Big\|\mathbb{E}\Big[\sum_{i\le j} G_{ij} G_{ij}^\top\Big]\Big\|_2 = \Big\|\mathbb{E}\Big[\sum_{ij} h_{ij}^2\, e_i e_i^\top\Big]\Big\|_2 = \max_i \sum_j \mathbb{E}\big[h_{ij}^2\big] \le 1.$$
The lemma now follows using the matrix Bernstein inequality (Lemma 16).

Proof of Lemma 9. Let $M = U^*\Sigma U^{*\top}$ be the eigenvalue decomposition of $M$. We have:
$$\|MBM - M\|_\infty = \max_{i,j}\big|e_i^\top(MBM - M)e_j\big| = \max_{i,j}\big|e_i^\top U^*\big(\Sigma U^{*\top} B U^*\Sigma - \Sigma\big)U^{*\top} e_j\big|$$
$$\le \max_i\big\|e_i^\top U^*\big\|_2\,\big\|\Sigma U^{*\top} B U^*\Sigma - \Sigma\big\|_2\,\max_j\big\|U^{*\top} e_j\big\|_2 \overset{(\zeta_1)}{\le} \frac{\mu^2 r}{n}\big\|U^*\big(\Sigma U^{*\top} B U^*\Sigma - \Sigma\big)U^{*\top}\big\|_2 = \frac{\mu^2 r}{n}\|MBM - M\|_2,$$
where $(\zeta_1)$ follows from the incoherence of $M$.

Proof of Lemma 10. Let $W = U\Lambda U^\top + \widetilde{U}\widetilde{\Lambda}\widetilde{U}^\top$ be the eigenvalue decomposition of $W$. Since $P_k(W) = U\Lambda U^\top$, we see that $|\lambda_k| \ge |\widetilde{\lambda}_i|$ for all $i$.
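The scaling in Lemma 5 can be checked empirically: for the matrix $B$ constructed below, the entries are centered and bounded by 1 in magnitude, as Definition 1 requires. This is a simulation sketch; the variable names (`mask`, `scale`) are ours.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p = 200, 0.3

Z = rng.standard_normal((n, n))
A = (Z + Z.T) / 2                             # symmetric test matrix
mask = (rng.random((n, n)) < p).astype(float)
mask = np.triu(mask) + np.triu(mask, 1).T     # symmetric Omega

# B = sqrt(p) / (2 sqrt(n) ||A||_inf) * (A - (1/p) P_Omega(A)), as in Lemma 5.
scale = np.sqrt(p) / (2 * np.sqrt(n) * np.abs(A).max())
B = scale * (A - mask * A / p)

max_entry = np.abs(B).max()                   # should be < 1
mean_entry = np.abs(B.mean())                 # should be close to 0
```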
From Lemma 8, we have:
$$|\lambda_i - \beta_i| \le \|E\|_2 \quad \forall i \in [k], \qquad\text{and}\qquad \big|\widetilde\lambda_i - \beta_{k+i}\big| \le \|E\|_2 \quad \forall i \in [n-k]. \tag{20}$$
Since $\|E\|_2 \le \frac{|\beta_k|}{2}$, we see that
$$|\lambda_k| \ge |\beta_k|/2 > 0. \tag{21}$$
Hence, we conclude that $\Lambda \in \mathbb{R}^{k\times k}$ is invertible, proving the first claim of the lemma. Using the eigenvalue decomposition of $W$, we have the following expansion:
$$A U \Lambda^{-1} U^\top A - A = \big(U\Lambda U^\top + \widetilde U\widetilde\Lambda\widetilde U^\top - E\big)\,U\Lambda^{-1}U^\top\,\big(U\Lambda U^\top + \widetilde U\widetilde\Lambda\widetilde U^\top - E\big) - A$$
$$= U\Lambda U^\top - UU^\top E - EUU^\top + EU\Lambda^{-1}U^\top E - U\Lambda U^\top - \widetilde U\widetilde\Lambda\widetilde U^\top + E$$
$$= -UU^\top E - EUU^\top + EU\Lambda^{-1}U^\top E - \widetilde U\widetilde\Lambda\widetilde U^\top + E. \tag{22}$$
Applying the triangle inequality and using $\|BC\|_2 \le \|B\|_2\|C\|_2$, we get:
$$\big\|A - AU\Lambda^{-1}U^\top A\big\|_2 \le 3\|E\|_2 + \frac{\|E\|_2^2}{|\lambda_k|} + \big|\widetilde\lambda_1\big|.$$
Using the above inequality with (20) and (21), we obtain:
$$\big\|A - AU\Lambda^{-1}U^\top A\big\|_2 \le |\beta_{k+1}| + 5\|E\|_2.$$
This proves the second claim of the lemma. Now, similar to (22), we have:
$$AU\Lambda^{-a}U^\top A = \big(U\Lambda U^\top + \widetilde U\widetilde\Lambda\widetilde U^\top - E\big)\,U\Lambda^{-a}U^\top\,\big(U\Lambda U^\top + \widetilde U\widetilde\Lambda\widetilde U^\top - E\big)$$
$$= U\Lambda^{-a+2}U^\top - U\Lambda^{-a+1}U^\top E - EU\Lambda^{-a+1}U^\top + EU\Lambda^{-a}U^\top E.$$
The last claim of the lemma follows by using the triangle inequality and (21) in the above equation.

C Proof of Lemma 2

We now present a proof of Lemma 2, which shows decrease in the Frobenius norm of the error matrix despite using the same samples in each iteration. To state our proof, we first introduce some notation and provide a few perturbation results that might be of independent interest. In the next subsection, we present a detailed proof of Lemma 2. Finally, in Section C.3, we present proofs of the technical lemmas given below.

C.1 Notations and Technical Lemmas

Recall that we assume (w.l.o.g.) that $M \in \mathbb{R}^{n\times n}$ is symmetric and $M = U^*\Sigma U^{*\top}$ is the eigenvalue decomposition (EVD) of $M$. To state our first supporting lemma, we introduce the concept of tangent spaces of matrices [Bha97].

Definition 2.
Let $A$ be a matrix with EVD (eigenvalue decomposition) $U^*\Sigma U^{*\top}$. The following space of matrices is called the tangent space of $A$:
$$T(A) := \Big\{\,U^*\Lambda_0 U^{*\top} + U^*\Lambda_1 U^\top + U\Lambda_2 U^{*\top}\,\Big\},$$
where $U \in \mathbb{R}^{n\times n}$, $U^\top U = I$, and $\Lambda_0, \Lambda_1, \Lambda_2$ are all diagonal matrices.

That is, if $A = U^*\Sigma U^{*\top}$ is the EVD of $A$, then any matrix $B$ can be decomposed into four mutually orthogonal terms as
$$B = U^*U^{*\top}BU^*U^{*\top} + U^*U^{*\top}BU^*_\perp U^{*\top}_\perp + U^*_\perp U^{*\top}_\perp B U^* U^{*\top} + U^*_\perp U^{*\top}_\perp B U^*_\perp U^{*\top}_\perp, \tag{23}$$
where $U^*_\perp$ is a basis of the orthogonal complement of $U^*$. The first three terms above lie in $T(A)$ and the last term lies in $T(A)^\perp$. We let $P_{T(A)}$ and $P_{T(A)^\perp}$ denote the projection operators onto $T(A)$ and $T(A)^\perp$, respectively.

Lemma 11. Let $A$ and $B$ be two symmetric matrices, and suppose further that $B$ is rank-$k$. Then, we have:
$$\big\|P_{T(A)^\perp}(B)\big\|_F \le \frac{\|A-B\|_F^2}{\sigma_k(B)}.$$

Next, we present a few technical lemmas related to the norm of quantities of the form $M - \frac1p P_\Omega(M)$:

Lemma 12. Let $M, \Omega$ be as given in Lemma 2 and let $p = |\Omega|/n^2$ be the sampling probability. Then, w.p. $\ge 1 - n^{-10-\alpha}$, for every $r\times r$ matrix $\widehat\Sigma$ we have:
$$\Big\|\Big(U^*\widehat\Sigma U^{*\top} - \frac{1}{p}P_\Omega\big(U^*\widehat\Sigma U^{*\top}\big)\Big)U^*\Big\|_F \le \frac{1}{40}\big\|\widehat\Sigma\big\|_F.$$

Lemma 13. Let $M, \Omega, p$ be as given in Lemma 2. Then, for every $i,j \in [r]$, we have (w.p. $\ge 1-n^{-10-\alpha}$):
$$\Big\|u_j^* u_i^{*\top} - \frac{1}{p}P_\Omega\big(u_j^* u_i^{*\top}\big)\Big\|_2 < \frac{1}{40\, r\sqrt{r}}.$$

Lemma 14. Let $M, \Omega, p$ be as given in Lemma 2. Then, for every $i,j\in[r]$ and $s \in [n]$, we have (w.p. $\ge 1-n^{-10-\alpha}$):
$$\Big|\big\langle u_i^*, u_j^*\big\rangle - \frac{1}{p}\sum_{(s,l)\in\Omega}(u_i^*)_l\,(u_j^*)_l\Big| < \frac{1}{40\, r\sqrt{r}}.$$

C.2 Detailed Proof of Lemma 2

Let $E := X - P_k(M)$, $H := E - \frac{1}{p}P_\Omega(E)$ and $G := X - \frac{1}{p}P_\Omega(X - M) = P_k(M) + H - \frac{1}{p}P_\Omega\big(M - P_k(M)\big)$. That is, $X^+ = P_k(G)$.
For simplicity, in this section we let $M = U^*\Sigma U^{*\top} + U^*_\perp \overline\Sigma\, U^{*\top}_\perp$ denote the eigenvalue decomposition (EVD) of $M$ with $P_k(M) = U^*\Sigma U^{*\top}$, and we let $\overline M := U^*_\perp \overline\Sigma\, U^{*\top}_\perp$. We also use the shorthand notation $T := T(P_k(M))$.

Representing $X$ in terms of its projection onto $T$ and its complement, we have:
$$X = U^*\Lambda_0 U^{*\top} + U^*\Lambda_1 U^{*\top}_\perp + U^*_\perp\Lambda_1^\top U^{*\top} + U^*_\perp \Lambda_3 U^{*\top}_\perp, \tag{24}$$
and we also conclude that:
$$\|\Sigma - \Lambda_0\|_F \le \|X - P_k(M)\|_F,\quad \|\Lambda_1\|_F \le \|X - P_k(M)\|_F, \quad\text{and}\quad \|\Lambda_3\|_F \le \frac{\|X - P_k(M)\|_F}{n^2},$$
where the last conclusion follows from Lemma 11 and the hypothesis that $\|X - P_k(M)\|_F < \frac{|\sigma_k|}{n^2}$. Using $\|E\|_F \le \frac{\sigma_k}{n^2}$, we have:
$$\|H\|_F \le \frac{2}{p}\|E\|_F \le \frac{2}{p}\cdot\frac{\sigma_k}{n^2} \le \frac{\sigma_k}{8},\quad\text{and}\quad \Big\|\frac{1}{p}P_\Omega(\overline M)\Big\|_F \le \frac{1}{p}\|\overline M\|_F \le \frac{1}{p}\cdot\frac{\sigma_k}{n^2} \le \frac{\sigma_k}{8},$$
where we used the hypothesis $\|M - P_k(M)\|_F < \frac{\sigma_k}{n^2}$ in the second inequality. The above bounds imply:
$$\Big\|P_T\Big(H - \frac{1}{p}P_\Omega(\overline M)\Big)\Big\|_F \le \Big\|H - \frac{1}{p}P_\Omega(\overline M)\Big\|_F \le \|H\|_F + \Big\|\frac{1}{p}P_\Omega(\overline M)\Big\|_F \le \frac{\sigma_k}{4}. \tag{25}$$
Similarly,
$$\Big\|P_{T^\perp}\Big(H - \frac{1}{p}P_\Omega(\overline M)\Big)\Big\|_2 \le \frac{\sigma_k}{4}. \tag{26}$$
Since $X^+ = P_k\big(P_k(M) + H - \frac{1}{p}P_\Omega(\overline M)\big)$, using Lemma 3 with (25) and (26), we have:
$$\|P_k(M) - X^+\|_F = \Big\|P_k\Big(P_k(M) + P_{T^\perp}\Big(H - \frac{1}{p}P_\Omega(\overline M)\Big)\Big) - P_k\Big(P_k(M) + H - \frac{1}{p}P_\Omega(\overline M)\Big)\Big\|_F \le c\,\Big\|P_T\Big(H - \frac{1}{p}P_\Omega(\overline M)\Big)\Big\|_F.$$
Now, using Claim 1, we have $\big\|P_T\big(H - \frac1p P_\Omega(\overline M)\big)\big\|_F < \frac{1}{10}\|P_k(M)-X\|_F + \frac{2}{p}\|\overline M\|_F$, which, along with the above equation, establishes the lemma. We now state and prove the claim bounding $\big\|P_T\big(H - \frac1p P_\Omega(\overline M)\big)\big\|_F$ that we used above to finish the proof.

Claim 1. Assume the notation defined in the section above. Then, we have:
$$\Big\|P_T\Big(H - \frac{1}{p}P_\Omega(\overline M)\Big)\Big\|_F < \frac{1}{10}\|P_k(M)-X\|_F + \frac{2}{p}\|\overline M\|_F.$$

Proof. We first bound $\|P_T(H)\|_F$. Recalling that $P_k(M) = U^*\Sigma U^{*\top}$ is the EVD of $P_k(M)$, we have: $\|P_T(H)\|_F < 2\|HU^*\|_F$.
Using (24), we have:
$$HU^* = \Big(U^*(\Sigma-\Lambda_0)U^{*\top} - \frac1p P_\Omega\big(U^*(\Sigma-\Lambda_0)U^{*\top}\big)\Big)U^* + \Big(U^*\Lambda_1 U^{*\top}_\perp - \frac1p P_\Omega\big(U^*\Lambda_1 U^{*\top}_\perp\big)\Big)U^*$$
$$\qquad + \Big(U^*_\perp\Lambda_2 U^{*\top} - \frac1p P_\Omega\big(U^*_\perp\Lambda_2 U^{*\top}\big)\Big)U^* + \Big(U^*_\perp\Lambda_3 U^{*\top}_\perp - \frac1p P_\Omega\big(U^*_\perp\Lambda_3 U^{*\top}_\perp\big)\Big)U^*, \tag{27}$$
where $\Lambda_2 := \Lambda_1^\top$.

Step I: To bound the first term in (27), we use Lemma 12 to obtain:
$$\Big\|\Big(U^*(\Sigma-\Lambda_0)U^{*\top} - \frac1p P_\Omega\big(U^*(\Sigma-\Lambda_0)U^{*\top}\big)\Big)U^*\Big\|_F \le \frac{1}{40}\|\Sigma-\Lambda_0\|_F \le \frac{1}{40}\|X - P_k(M)\|_F. \tag{28}$$

Step II: To bound the second term, we let $\overline U := U^*_\perp\Lambda_1^\top$ and write $\bar u_j$ for its $j$-th column, so that $U^*\Lambda_1 U^{*\top}_\perp = \sum_{j=1}^r u^*_j\bar u_j^\top$. We proceed as follows:
$$\Big\|\Big(U^*\Lambda_1 U^{*\top}_\perp - \frac1p P_\Omega\big(U^*\Lambda_1 U^{*\top}_\perp\big)\Big)u_i^*\Big\|_2 = \Big\|\sum_{j=1}^r\Big(u_j^*\bar u_j^\top - \frac1p P_\Omega\big(u_j^*\bar u_j^\top\big)\Big)u_i^*\Big\|_2$$
$$= \Big\|\sum_{j=1}^r\Big(u_j^*u_i^{*\top} - \frac1p P_\Omega\big(u_j^*u_i^{*\top}\big)\Big)\bar u_j\Big\|_2 \le \sum_{j=1}^r\Big\|u_j^*u_i^{*\top} - \frac1p P_\Omega\big(u_j^*u_i^{*\top}\big)\Big\|_2\,\|\bar u_j\|_2$$
$$\overset{(\zeta_1)}{\le} \frac{1}{40r\sqrt r}\sum_{j=1}^r\|\bar u_j\|_2 \le \frac{1}{40\sqrt r}\big\|\Lambda_1 U^{*\top}_\perp\big\|_F \le \frac{1}{40\sqrt r}\|P_k(M)-X\|_F,$$
where $(\zeta_1)$ follows from Lemma 13. This means that we can bound the second term as:
$$\Big\|\Big(U^*\Lambda_1 U^{*\top}_\perp - \frac1p P_\Omega\big(U^*\Lambda_1 U^{*\top}_\perp\big)\Big)U^*\Big\|_F \le \frac{1}{40}\|P_k(M)-X\|_F. \tag{29}$$

Step III: We now let $\overline U := U^*_\perp\Lambda_2$ and turn to bounding the third term in (27). We have:
$$\Big\|\Big(U^*_\perp\Lambda_2 U^{*\top} - \frac1p P_\Omega\big(U^*_\perp\Lambda_2 U^{*\top}\big)\Big)u_i^*\Big\|_2 = \Big\|\sum_{j=1}^r\Big(\langle u_j^*,u_i^*\rangle\,\bar u_j - \bar u_j\circ\widetilde{\langle u_j^*,u_i^*\rangle}_\Omega\Big)\Big\|_2$$
$$= \Big\|\sum_{j=1}^r \bar u_j\circ\Big(\langle u_j^*,u_i^*\rangle\,\mathbf{1} - \widetilde{\langle u_j^*,u_i^*\rangle}_\Omega\Big)\Big\|_2 \overset{(\zeta_1)}{\le} \frac{1}{40r\sqrt r}\sum_{j=1}^r\|\bar u_j\|_2 \le \frac{1}{40\sqrt r}\|U^*_\perp\Lambda_2\|_F \le \frac{1}{40\sqrt r}\|P_k(M)-X\|_F,$$
where $\mathbf{1}$ denotes the all-ones vector, $\circ$ denotes the entrywise product, and $\widetilde{\langle u_j^*,u_i^*\rangle}_\Omega$ denotes the vector whose $s$-th coordinate is given by $\frac1p\sum_{l:(s,l)\in\Omega}(u_j^*)_l\,(u_i^*)_l$. Note that $(\zeta_1)$ follows from Lemma 14. So, we again have:
$$\Big\|\Big(U^*_\perp\Lambda_2 U^{*\top} - \frac1p P_\Omega\big(U^*_\perp\Lambda_2 U^{*\top}\big)\Big)U^*\Big\|_F \le \frac{1}{40}\|P_k(M)-X\|_F. \tag{30}$$

Step IV: To bound the last term in (27), we use Lemma 11 to conclude:
$$\Big\|\Big(U^*_\perp\Lambda_3 U^{*\top}_\perp - \frac1p P_\Omega\big(U^*_\perp\Lambda_3 U^{*\top}_\perp\big)\Big)U^*\Big\|_F \le \Big\|U^*_\perp\Lambda_3 U^{*\top}_\perp - \frac1p P_\Omega\big(U^*_\perp\Lambda_3 U^{*\top}_\perp\big)\Big\|_F \le \frac2p\big\|U^*_\perp\Lambda_3 U^{*\top}_\perp\big\|_F$$
$$= \frac2p\,\|P_{T^\perp}(X)\|_F \le \frac2p\cdot\frac{\|P_k(M)-X\|_F^2}{\sigma_k(X)} \le \frac{1}{40n}\|P_k(M)-X\|_F. \tag{31}$$
Combining (28), (29), (30) and (31), we have:
$$\|P_T(H)\|_F \le \frac{1}{10}\|P_k(M)-X\|_F. \tag{32}$$
On the other hand, we trivially have:
$$\Big\|\frac1p P_\Omega(\overline M)\Big\|_F \le \frac2p\|\overline M\|_F. \tag{33}$$
The claim now follows by combining (32) and (33).

C.3 Proofs of Technical Lemmas from Section C.1

Proof of Lemma 11. Let $B = U\Lambda U^\top$ be the EVD of $B$. Then, we have:
$$\big\|P_{T(A)^\perp}(B)\big\|_F = \big\|U^*_\perp U^{*\top}_\perp B U^*_\perp U^{*\top}_\perp\big\|_F = \big\|U^{*\top}_\perp U\Lambda U^\top U^*_\perp\big\|_F = \big\|U^{*\top}_\perp U\Lambda\,\Lambda^{-1}\,\Lambda U^\top U^*_\perp\big\|_F$$
$$\le \big\|U^{*\top}_\perp U\Lambda\big\|_F\,\big\|\Lambda^{-1}\big\|_2\,\big\|\Lambda U^\top U^*_\perp\big\|_F \le \|A-B\|_F\,\big\|\Lambda^{-1}\big\|_2\,\|A-B\|_F \le \frac{\|A-B\|_F^2}{\sigma_k(B)}.$$
This completes the proof.

We will now prove Lemma 3, which is a natural extension of the Davis–Kahan theorem. In order to do so, we first recall the Davis–Kahan theorem:

Theorem 3 (Theorem VII.3.1 of [Bha97]). Let $A$ and $B$ be symmetric matrices. Let $S_1, S_2 \subseteq \mathbb{R}$ be subsets separated by $\nu$. Let $E = P_A(S_1)$ and $F = P_B(S_2)$ be orthonormal bases of the eigenvectors of $A$ with eigenvalues in $S_1$ and of the eigenvectors of $B$ with eigenvalues in $S_2$, respectively. Then, we have:
$$\|E^\top F\|_2 \le \frac{1}{\nu}\|A-B\|_2, \qquad \|E^\top F\|_F \le \frac{1}{\nu}\|A-B\|_F.$$

Proof of Lemma 3. Let $A = U^*\Sigma U^{*\top} + U^*_\perp\widehat\Sigma\, U^{*\top}_\perp$ be the EVD of $A$ with $P_k(A) = U^*\Sigma U^{*\top}$. Similarly, let $A + E = U\Lambda U^\top + U_\perp\widehat\Lambda\, U_\perp^\top$ denote the EVD of $A+E$ with $P_k(A+E) = U\Lambda U^\top$. Expanding $P_k(A+E)$ into components along $U^*$ and orthogonal to it, we have:
$$U\Lambda U^\top = U^*U^{*\top}\,U\Lambda U^\top\, U^*U^{*\top} + U^*_\perp U^{*\top}_\perp\,U\Lambda U^\top\,U^*U^{*\top} + U\Lambda U^\top\,U^*_\perp U^{*\top}_\perp.$$
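Lemma 11 can likewise be verified numerically: project a rank-$k$ matrix $B$ onto $T(A)^\perp$ via the last term of the decomposition (23) and compare against $\|A-B\|_F^2/\sigma_k(B)$. The test matrices below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(4)
n, k = 60, 4

# A: a symmetric rank-k matrix with column space U_star.
U_star = np.linalg.qr(rng.standard_normal((n, k)))[0]
A = U_star @ np.diag(1.0 + rng.random(k)) @ U_star.T

# B: a symmetric rank-k matrix close to A (best rank-k part of a perturbation of A).
E = 0.02 * rng.standard_normal((n, n))
E = (E + E.T) / 2
vals, vecs = np.linalg.eigh(A + E)
idx = np.argsort(-np.abs(vals))[:k]
B = (vecs[:, idx] * vals[idx]) @ vecs[:, idx].T

# P_{T(A)^perp}(B): the fourth term of decomposition (23).
P_perp = np.eye(n) - U_star @ U_star.T
lhs = np.linalg.norm(P_perp @ B @ P_perp)   # Frobenius norm of the projection
sigma_k_B = np.min(np.abs(vals[idx]))       # smallest-magnitude kept eigenvalue
rhs = np.linalg.norm(A - B) ** 2 / sigma_k_B
```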
Now,
$$\|P_k(A+E) - P_k(A)\|_F = \big\|U^*U^{*\top}U\Lambda U^\top U^*U^{*\top} + U^*_\perp U^{*\top}_\perp U\Lambda U^\top U^*U^{*\top} + U\Lambda U^\top U^*_\perp U^{*\top}_\perp - U^*\Sigma U^{*\top}\big\|_F$$
$$\le \big\|U^*U^{*\top}U\Lambda U^\top U^*U^{*\top} - U^*\Sigma U^{*\top}\big\|_F + \big\|U^*_\perp U^{*\top}_\perp U\Lambda U^\top U^*U^{*\top}\big\|_F + \big\|U\Lambda U^\top U^*_\perp U^{*\top}_\perp\big\|_F$$
$$\le \big\|U^*U^{*\top}U\Lambda U^\top U^*U^{*\top} - U^*\Sigma U^{*\top}\big\|_F + 2\big\|U\Lambda U^\top U^*_\perp U^{*\top}_\perp\big\|_F$$
$$\le \big\|U^*U^{*\top}\big(U\Lambda U^\top + U_\perp\widehat\Lambda U_\perp^\top\big)U^*U^{*\top} - U^*\Sigma U^{*\top}\big\|_F + \big\|U^*U^{*\top}U_\perp\widehat\Lambda U_\perp^\top U^*U^{*\top}\big\|_F + 2\big\|\Lambda U^\top U^*_\perp\big\|_F$$
$$= \big\|U^*U^{*\top}E\,U^*U^{*\top}\big\|_F + \big\|U^{*\top}U_\perp\widehat\Lambda U_\perp^\top U^*\big\|_F + 2\big\|\Lambda U^\top U^*_\perp\big\|_F \le \|E\|_F + \big\|U^{*\top}U_\perp\widehat\Lambda U_\perp^\top U^*\big\|_F + 2\big\|\Lambda U^\top U^*_\perp\big\|_F. \tag{34}$$
Before going on to bound the terms in (34), let us make some observations. We first use Lemma 8 to conclude that
$$\frac34|\sigma_i| \le |\lambda_i| \le \frac54|\sigma_i|, \qquad\text{and}\qquad \big|\widehat\lambda_{k+i}\big| \le \frac{|\sigma_k|}{2}.$$
Applying Theorem 3 with $S_1 = \big[-\frac{|\sigma_k|}{2}, \frac{|\sigma_k|}{2}\big]$ and $S_2 = \big(-\infty, -\frac{3|\sigma_i|}{4}\big] \cup \big[\frac{3|\sigma_i|}{4}, \infty\big)$, with separation parameter $\nu = \frac{|\sigma_i|}{4}$, we see that
$$\big\|u_i^\top U^*_\perp\big\|_2 \le \frac{4}{|\sigma_i|}\|E\|_2, \quad\text{and} \tag{35}$$
$$\big\|U_\perp^\top U^*\big\|_F \le \frac{4}{|\sigma_k|}\|E\|_F. \tag{36}$$
We are now ready to bound the last two terms on the right-hand side of (34). First, we have:
$$\big\|U^{*\top}U_\perp\widehat\Lambda U_\perp^\top U^*\big\|_F \le \big\|\widehat\Lambda\big\|_2\,\big\|U^{*\top}U_\perp\big\|_2\,\big\|U_\perp^\top U^*\big\|_F \le \big|\widehat\lambda_{k+1}\big|\,\big\|U_\perp^\top U^*\big\|_F^2 \le 2\|E\|_F,$$
where the last step follows from (36) and the assumption on $\|E\|_F$. For the other term, we have:
$$\big\|\Lambda U^\top U^*_\perp\big\|_F^2 = \sum_i \lambda_i^2\,\big\|u_i^\top U^*_\perp\big\|_2^2 \le \frac{25}{16}\sum_i \sigma_i^2\cdot\frac{16\|E\|_2^2}{\sigma_i^2} = 25k\|E\|_2^2,$$
where we used (35). Combining the above two inequalities with (34) proves the lemma.

Finally, we present proofs for Lemma 12, Lemma 13, and Lemma 14.

Proof of Lemma 12. Using Theorem 1 of [BJ14], the following holds for all $\widehat\Sigma$ (w.p. $\ge 1-n^{-10-\alpha}$):
$$\Big\|U^*\widehat\Sigma U^{*\top} - \frac1p P_\Omega\big(U^*\widehat\Sigma U^{*\top}\big)\Big\|_2 \le \frac{\mu^2 r}{\sqrt{np}}\big\|\widehat\Sigma\big\|_2 \le \frac{1}{\sqrt r\cdot C\cdot\alpha\log n}\big\|\widehat\Sigma\big\|_2.$$
The lemma now follows by using the assumed value of $p$ in the above bound, along with the fact that $\big(U^*\widehat\Sigma U^{*\top} - \frac1p P_\Omega\big(U^*\widehat\Sigma U^{*\top}\big)\big)U^*$ is a rank-$r$ matrix.

Proof of Lemma 13. Let $H = \frac{1}{\beta}\big(u_j^*u_i^{*\top} - \frac1p P_\Omega\big(u_j^* u_i^{*\top}\big)\big)$, where $\beta = \frac{2\mu^2 r}{\sqrt{np}}$. Then, by Lemma 5, $H$ satisfies the conditions of Definition 1. The lemma now follows by applying Lemma 7 with the value of $p$ given in the lemma statement.

Proof of Lemma 14. Let $\delta_{ij} = \mathbb{I}[(i,j)\in\Omega]$. Then,
$$\big\langle u_i^*, u_j^*\big\rangle - \frac1p\sum_{(s,l)\in\Omega}(u_i^*)_l\,(u_j^*)_l = \sum_l\Big(1 - \frac{\delta_{sl}}{p}\Big)(u_i^*)_l\,(u_j^*)_l = \sum_l B_l, \tag{37}$$
where $\mathbb{E}[B_l] = 0$, $|B_l| \le \frac{2\mu^2 r}{np}$, and $\sum_l\mathbb{E}[B_l^2] \le \frac{\mu^2 r}{np}$. The lemma follows by using the Bernstein inequality (given below) along with the sampling probability $p$ specified in the lemma.

Lemma 15 (Bernstein Inequality). Let $b_1,\dots,b_n$ be independent bounded random variables. Then the following holds for all $t > 0$:
$$\Pr\left(\sum_{i=1}^n b_i - \mathbb{E}\Big[\sum_{i=1}^n b_i\Big] \ge t\right) \le \exp\left(-\frac{t^2/2}{\sum_i\mathbb{E}[b_i^2] + t\max_i|b_i|/3}\right).$$

Lemma 16 (Matrix Bernstein Inequality (Theorem 1.4 of [Tro12])). Let $B_1,\dots,B_n \in \mathbb{R}^{n\times n}$ be independent bounded random matrices. Then the following holds for all $t > 0$:
$$\Pr\left(\Big\|\sum_{i=1}^n B_i - \mathbb{E}\Big[\sum_{i=1}^n B_i\Big]\Big\|_2 \ge t\right) \le n\exp\left(-\frac{t^2/2}{\sigma^2 + tR/3}\right),$$
where $\sigma^2 = \big\|\mathbb{E}\big[\sum_i B_i^2\big]\big\|_2$ and $R = \max_i\|B_i\|_2$.

D Proof of Lemma 6

We will prove the statement for $r = 1$; the lemma then follows by taking a union bound over all $r \in [n]$. To prove the lemma, we calculate a high-order moment of the random variable $\widehat X_a := \big\langle e_1, \widehat H^a u\big\rangle$ and then use Markov's inequality. We use the following notation, which is mostly consistent with Lemma 6.5 of [EKYY13]. We abbreviate $(i,j)$ as $\alpha$ and denote $\widehat h_{ij}$ by $\widehat h_\alpha$. We further let $B_{(i,j)(k,l)} := \delta_{jk}$. With this notation, we have:
$$\widehat X_a = \sum_{\substack{\alpha_1,\dots,\alpha_a \\ \alpha_1(1) = 1}} B_{\alpha_1\alpha_2}\cdots B_{\alpha_{a-1}\alpha_a}\,\widehat h_{\alpha_1}\cdots\widehat h_{\alpha_a}\,u_{\alpha_a(2)}.$$
We now split the matrix $\widehat H$ into two parts $H$ and $H'$, corresponding to the upper-triangular and lower-triangular parts of $\widehat H$. This means:
$$\widehat X_a = \sum_{\substack{\alpha_1,\dots,\alpha_a\\ \alpha_1(1)=1}} B_{\alpha_1\alpha_2}\cdots B_{\alpha_{a-1}\alpha_a}\big(h_{\alpha_1} + h'_{\alpha_1}\big)\cdots\big(h_{\alpha_a} + h'_{\alpha_a}\big)u_{\alpha_a(2)}. \tag{38}$$
The above summation has $2^a$ terms, of which we consider only
$$X_a := \sum_{\substack{\alpha_1,\dots,\alpha_a\\ \alpha_1(1)=1}} B_{\alpha_1\alpha_2}\cdots B_{\alpha_{a-1}\alpha_a}\,h_{\alpha_1}\cdots h_{\alpha_a}\,u_{\alpha_a(2)};$$
the resulting factor of $2^a$ does not change the result. Abbreviating $\boldsymbol\alpha := (\alpha_1,\dots,\alpha_a)$ and $\zeta_{\boldsymbol\alpha} := B_{\alpha_1\alpha_2}\cdots B_{\alpha_{a-1}\alpha_a}\,h_{\alpha_1}\cdots h_{\alpha_a}\,u_{\alpha_a(2)}$, we can write $X_a = \sum_{\boldsymbol\alpha}\zeta_{\boldsymbol\alpha}$, where the summation runs only over those $\boldsymbol\alpha$ such that $\alpha_1(1) = 1$. Calculating the $k$-th moment of $X_a$ for some even number $k$, we obtain:
$$\mathbb{E}\big[X_a^k\big] = \sum_{\boldsymbol\alpha^1,\dots,\boldsymbol\alpha^k}\mathbb{E}\big[\zeta_{\boldsymbol\alpha^1}\cdots\zeta_{\boldsymbol\alpha^k}\big]. \tag{39}$$
For each valid $\boldsymbol\alpha = (\boldsymbol\alpha^s) = (\alpha^s_l)$, we define the partition $\Gamma(\boldsymbol\alpha)$ of the index set $\{(s,l): s\in[k],\ l\in[a]\}$, where $(s,l)$ and $(s',l')$ are in the same equivalence class if and only if $\alpha^s_l = \alpha^{s'}_{l'}$. We first bound the contribution of all $\boldsymbol\alpha$ corresponding to a given partition $\Gamma$ in the summation (39), and then bound the total number of possible partitions $\Gamma$. Since each $h_\alpha$ is centered, we can conclude that any partition $\Gamma$ that has a non-zero contribution to the summation in (39) satisfies:

(*) each equivalence class of $\Gamma$ contains at least two elements.

We further bound the summation in (39) by taking absolute values of the summands:
$$\mathbb{E}\big[X_a^k\big] \le \sum_{\boldsymbol\alpha^1,\dots,\boldsymbol\alpha^k}\mathbb{E}\big[|\zeta_{\boldsymbol\alpha^1}|\cdots|\zeta_{\boldsymbol\alpha^k}|\big], \tag{40}$$
where the summation runs over $(\boldsymbol\alpha^1,\dots,\boldsymbol\alpha^k)$ corresponding to valid partitions $\Gamma$. Fixing one such partition $\Gamma$, we bound the contribution to (40) of all the terms $\boldsymbol\alpha$ such that $\Gamma(\boldsymbol\alpha) = \Gamma$. We denote by $G \equiv G(\Gamma)$ the graph constructed from $\Gamma$ as follows.
The vertex set $V(G)$ is given by the equivalence classes of $\Gamma$. For every $(s,l)$, we add an edge between the equivalence class of $(s,l)$ and the equivalence class of $(s,l+1)$. Each term in (40) can be bounded as follows:
$$\mathbb{E}\big[|\zeta_{\boldsymbol\alpha^1}|\cdots|\zeta_{\boldsymbol\alpha^k}|\big] \le \|u\|_\infty^k\left(\prod_{s=1}^k\prod_{l=1}^{a-1}B_{\alpha^s_l\alpha^s_{l+1}}\right)\mathbb{E}\left[\left|\prod_{s=1}^k\prod_{l=1}^a h_{\alpha^s_l}\right|\right] \le \|u\|_\infty^k\left(\prod_{s=1}^k\prod_{l=1}^{a-1}B_{\alpha^s_l\alpha^s_{l+1}}\right)\prod_{\gamma\in V(G)}\frac1n,$$
where the last step follows from property (*) above and Definition 1. Using the above, we can bound (40) as follows:
$$\mathbb{E}\big[X_a^k\big] \le \frac{\|u\|_\infty^k}{n^v}\sum_{\alpha_1,\dots,\alpha_v}\ \prod_{\{\gamma,\gamma'\}\in E(G)}B_{\alpha_\gamma\alpha_{\gamma'}},$$
where $v := |V(G)|$ denotes the number of vertices in $G$. Factorizing the above summation over the connected components of $G$, we obtain:
$$\mathbb{E}\big[X_a^k\big] \le \frac{\|u\|_\infty^k}{n^v}\prod_{j=1}^{l}\ \sum_{\alpha_1,\dots,\alpha_{v_j}}\ \prod_{\{\gamma,\gamma'\}\in E(G_j)}B_{\alpha_\gamma\alpha_{\gamma'}}, \tag{41}$$
where $l$ denotes the number of connected components of $G$, $G_j$ denotes the $j$-th component of $G$, and $v_j$ denotes the number of vertices in $G_j$. We now bound the terms corresponding to one connected component at a time. Pick a connected component $G_j$. Since $\alpha^s_1(1) = 1$ for every $s \in [k]$, we know that there exists a vertex $\alpha_\gamma \in G_j$ such that $\alpha_\gamma(1) = 1$. Pick one such vertex as a root vertex and create a spanning tree $T_j$ of $G_j$. We use the bound $B_{\alpha_\gamma\alpha_{\gamma'}} \le 1$ for every $\{\gamma,\gamma'\} \in E_j\setminus T_j$. The remaining summation $\sum_{\alpha_1,\dots,\alpha_{v_j}}\prod_{\{\gamma,\gamma'\}\in T_j}B_{\alpha_\gamma\alpha_{\gamma'}}$ can be computed bottom-up from the leaves to the root. Since
$$\sum_{\alpha_{\gamma'}}B_{\alpha_\gamma\alpha_{\gamma'}} = n \quad \forall\gamma,$$
we obtain
$$\sum_{\alpha_1,\dots,\alpha_{v_j}}\ \prod_{\{\gamma,\gamma'\}\in E(G_j)}B_{\alpha_\gamma\alpha_{\gamma'}} \le n^{v_j}.$$

[Figure 1: SVP vs. St-SVP: simulations on synthetic datasets. (a), (b): recovery error and run time of the two methods for varying rank. (c): run time required by St-SVP and SVP with varying condition number. (d): run time of both methods with varying matrix size.]
Plugging the above into (41) gives us
$$\mathbb{E}\big[X_a^k\big] \le \frac{\|u\|_\infty^k}{n^v}\,n^{\sum_j v_j} = \|u\|_\infty^k.$$
Noting that the number of partitions $\Gamma$ is at most $(ka)^{ka}$, we obtain the bound
$$\mathbb{E}\big[X_a^k\big] \le \big(\|u\|_\infty\,(ka)^a\big)^k.$$
Choosing $k = 2\big\lceil\frac{\log n}{a}\big\rceil$ and applying the $k$-th moment Markov inequality, we obtain
$$\Pr\big[|X_a| > (c\log n)^a\|u\|_\infty\big] \le \mathbb{E}\big[|X_a|^k\big]\left(\frac{1}{(c\log n)^a\|u\|_\infty}\right)^k \le \left(\frac{ka}{c\log n}\right)^{ka} \le n^{-2\log\frac{c}{2}}.$$
Going back to (38), we have:
$$\Pr\big[\big|\widehat X_a\big| > (c\log n)^a\|u\|_\infty\big] \le 2^a\,\Pr\Big[|X_a| > \Big(\frac{c}{2}\log n\Big)^a\|u\|_\infty\Big] \le 2^a\,\mathbb{E}\big[|X_a|^k\big]\left(\frac{1}{\big(\frac c2\log n\big)^a\|u\|_\infty}\right)^k \le 2^a\left(\frac{2ka}{c\log n}\right)^{ka} \le n^{-2\log\frac{c}{4}}.$$
Applying a union bound over $r \in [n]$ now gives the result.

E Empirical Results

In this section, we compare the performance of St-SVP with SVP on synthetic examples. We do not, however, include comparisons to other matrix completion methods such as nuclear norm minimization or alternating minimization; see [JMD10] for a comparison of SVP with those methods.

We implemented both methods in Matlab, and all results are averaged over 5 random trials. In each trial, we generate a random low-rank matrix and observe $|\Omega| = 5(n_1+n_2)\,r\log(n_1+n_2)$ entries from it uniformly at random. In the first experiment, we fix the matrix size ($n_1 = n_2 = 5000$) and generate random matrices of varying rank $r$. We choose the first singular value to be 1 and the remaining ones to be $1/r$, giving a condition number of $\kappa = r$. Figure 1 (a) & (b) show the recovery error and the run time of the two methods, where we define the recovery error as $\|\widehat M - M\|_2/\|M\|_2$. We see that St-SVP recovers the underlying matrix much more accurately than SVP; moreover, St-SVP is an order of magnitude faster. In the next experiment, we vary the condition number of the generated matrices. Interestingly, for small $\kappa$, both SVP and St-SVP recover the underlying matrix in similar time.
However, for larger $\kappa$, the running time of SVP increases significantly and is almost two orders of magnitude larger than that of St-SVP. Finally, we study the two methods with varying matrix sizes while keeping all the other parameters fixed ($r = 10$, with singular values $1$ and $1/r$ as above). Here again, St-SVP is much faster than SVP.
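A minimal NumPy re-creation of this synthetic setup (singular values $1$ and $1/r$, entries observed uniformly at random) is sketched below; it runs plain SVP rather than the full St-SVP schedule, and the sizes and iteration count are illustrative rather than those used in the experiments.

```python
import numpy as np

rng = np.random.default_rng(5)
n, r = 500, 5

# Ground truth: first singular value 1, the rest 1/r (condition number kappa = r).
U = np.linalg.qr(rng.standard_normal((n, r)))[0]
V = np.linalg.qr(rng.standard_normal((n, r)))[0]
sigma = np.array([1.0] + [1.0 / r] * (r - 1))
M = (U * sigma) @ V.T

# Observe roughly |Omega| = 5 (n1 + n2) r log(n1 + n2) entries, via i.i.d. sampling.
p = 5 * (2 * n) * r * np.log(2 * n) / n**2
mask = (rng.random((n, n)) < p).astype(float)

X = np.zeros((n, n))
for _ in range(60):
    G = X - (mask / p) * (X - M)            # gradient step on observed entries
    Uk, s, Vk = np.linalg.svd(G, full_matrices=False)
    X = (Uk[:, :r] * s[:r]) @ Vk[:r]        # projection onto rank-r matrices

recovery_error = np.linalg.norm(X - M, 2) / np.linalg.norm(M, 2)
```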