Sharpened Error Bounds for Random Sampling Based $\ell_2$ Regression


Authors: Shusen Wang

College of Computer Science & Technology, Zhejiang University
wss@zju.edu.cn

Abstract. Given a data matrix $X \in R^{n\times d}$ and a response vector $y \in R^{n}$ with $n > d$, it costs $O(nd^2)$ time and $O(nd)$ space to solve the least squares regression (LSR) problem. When $n$ and $d$ are both large, exactly solving the LSR problem is very expensive. When $n \gg d$, one feasible approach to speeding up LSR is to randomly embed $y$ and all columns of $X$ into a smaller subspace $R^c$; the induced LSR problem has the same number of columns but a much smaller number of rows, and it can be solved in $O(cd^2)$ time and $O(cd)$ space. We discuss in this paper two random sampling based methods for solving LSR more efficiently. Previous work showed that the leverage scores based sampling LSR achieves $1+\epsilon$ accuracy when $c \geq O(d\epsilon^{-2}\log d)$. In this paper we sharpen this error bound, showing that $c = O(d\log d + d\epsilon^{-1})$ is enough for achieving $1+\epsilon$ accuracy. We also show that when $c \geq O(\mu d\epsilon^{-2}\log d)$, the uniform sampling based LSR attains a $2+\epsilon$ bound with positive probability.

1 Introduction

Given $n$ data instances $x^{(1)}, \dots, x^{(n)}$, each of dimension $d$, and $n$ responses $y_1, \dots, y_n$, it is of interest to find a model $\beta \in R^d$ such that $y = X\beta$. If $n > d$, there will in general be no solution to this linear system, so we instead seek a model $\beta_{lsr}$ such that $y \approx X\beta_{lsr}$. This can be formulated as the least squares regression (LSR) problem:

$$\beta_{lsr} = \mathrm{argmin}_{\beta \in R^d} \|y - X\beta\|_2^2. \qquad (1)$$

When $n \geq d$, it takes $O(nd^2)$ time and $O(nd)$ space in general to compute $\beta_{lsr}$ using the Cholesky decomposition, the QR decomposition, or the singular value decomposition (SVD) [9].
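As a point of reference, the exact solve of (1) can be written in a few lines of NumPy. This is a minimal sketch on synthetic data; the names (`beta_true`) and the noise level are our own choices, and `np.linalg.lstsq` uses an SVD-based routine with the $O(nd^2)$ cost described above.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 10_000, 20

# A synthetic overconstrained system: y = X @ beta_true + noise.
X = rng.standard_normal((n, d))
beta_true = rng.standard_normal(d)
y = X @ beta_true + 0.1 * rng.standard_normal(n)

# Exact LSR solution beta_lsr = argmin ||y - X beta||_2^2;
# lstsq is SVD-based, costing O(n d^2) time and O(n d) space.
beta_lsr, *_ = np.linalg.lstsq(X, y, rcond=None)

print(np.linalg.norm(beta_lsr - beta_true))  # small, since the noise level is low
```
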
LSR is perhaps one of the most widely used methods in data processing; however, solving LSR for big data is very expensive in time and space. In big-data problems where $n$ and $d$ are both large, the $O(nd^2)$ time complexity and $O(nd)$ space complexity make LSR prohibitive, so it is of great interest to find efficient solutions to the LSR problem. Fortunately, when $n \gg d$, one can use a small portion of the $n$ instances instead of the full data to approximately compute $\beta_{lsr}$, and the computational cost can thereby be significantly reduced. Random sampling based methods [5, 11, 12] and random projection based methods [2, 6] have been applied to solve LSR more efficiently. Formally speaking, letting $S \in R^{c\times n}$ be a random sampling/projection matrix, we solve the following problem instead of (1):

$$\tilde{\beta}_S = \mathrm{argmin}_{\beta \in R^d} \|Sy - SX\beta\|_2^2. \qquad (2)$$

This problem can be solved in only $O(cd^2)$ time and $O(cd)$ space. If the random sampling/projection matrix $S$ is constructed using certain special techniques, then it is ensured theoretically that $\tilde{\beta}_S \approx \beta_{lsr}$ and that

$$\|y - X\tilde{\beta}_S\|_2^2 \leq (1+\epsilon)\,\|y - X\beta_{lsr}\|_2^2 \qquad (3)$$

holds with high probability. There are two criteria for evaluating random sampling/projection based LSR.

– Running Time. The total time complexity of constructing $S \in R^{c\times n}$ and computing $SX \in R^{c\times d}$.
– Dimension after Projection. Given an error parameter $\epsilon$, we hope that there exists a polynomial function $C(d, \epsilon)$ such that if $c > C(d, \epsilon)$, inequality (3) holds with high probability for all $X \in R^{n\times d}$ and $y \in R^n$. The smaller $C(d, \epsilon)$ is, the better, because the induced problem (2) can be solved in less time and space when $c$ is small.
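The sketch-and-solve scheme of (2) can be illustrated with the simplest choice of $S$: uniform row sampling. This is a hedged sketch, not the paper's analyzed construction; `sketched_lsr` is a hypothetical helper name, and the $\sqrt{n/c}$ rescaling (which does not change the minimizer) is the standard choice making $E[S^T S] = I_n$.

```python
import numpy as np

def sketched_lsr(X, y, c, rng):
    """Solve the induced problem (2) for a uniform row-sampling sketch S.

    S consists of c rescaled rows of I_n, so S @ X just selects (and
    rescales) c rows of X; S is never formed explicitly.
    """
    n = X.shape[0]
    idx = rng.choice(n, size=c, replace=False)      # uniform, without replacement
    scale = np.sqrt(n / c)                          # makes E[S^T S] = I_n
    SX, Sy = scale * X[idx], scale * y[idx]
    beta, *_ = np.linalg.lstsq(SX, Sy, rcond=None)  # O(c d^2) time, O(c d) space
    return beta

rng = np.random.default_rng(0)
n, d, c = 20_000, 10, 2_000
X = rng.standard_normal((n, d))
y = X @ rng.standard_normal(d) + rng.standard_normal(n)

beta_lsr, *_ = np.linalg.lstsq(X, y, rcond=None)
beta_s = sketched_lsr(X, y, c, rng)

err_opt = np.linalg.norm(y - X @ beta_lsr) ** 2
err_s = np.linalg.norm(y - X @ beta_s) ** 2
print(err_s / err_opt)  # close to 1 for a Gaussian (incoherent) X
```

For a Gaussian $X$ the leverage scores are nearly homogeneous, so uniform sampling already works well here; the worst-case behavior discussed later requires highly non-uniform leverage scores.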
1.1 Contributions

The leverage scores based sampling is an important random sampling technique widely studied and empirically evaluated in the literature [2, 4, 5, 8, 11, 12, 13, 15, 16]. When applied to accelerate LSR, error analyses of the leverage scores based sampling are available in the literature. It was shown in [4] that by using the leverage scores based sampling with replacement, when $c \geq O(d^2\epsilon^{-2})$, inequality (3) holds with high probability. Later on, [5] showed that by using the leverage scores based sampling without replacement, $c \geq O(d\epsilon^{-2}\log d)$ is sufficient to make (3) hold with high probability. In Theorem 1 we show that (3) holds with high probability when $c \geq O(d\log d + d\epsilon^{-1})$, using the same leverage scores based sampling without replacement. Our proof techniques are based on the previous work [5, 6], and our proof is self-contained.

Algorithm 1 The Leverage Scores Based Sampling (without Replacement).
1: Input: an $n \times d$ real matrix $X$, target dimension $c < n$.
2: (Exactly or approximately) compute the leverage scores of $X$: $l_1, \dots, l_n$;
3: Compute the sampling probabilities $p_i = \min\{1, c\,l_i/d\}$ for $i = 1$ to $n$;
4: Denote by $C$ the set containing the indices of the selected rows, initialized to $\emptyset$;
5: For each index $i \in [n]$, add $i$ to $C$ with probability $p_i$;
6: Compute the diagonal matrix $D = \mathrm{diag}(p_1^{-1}, \dots, p_n^{-1})$;
7: return $S \leftarrow$ the rows of $D$ indexed by $C$.

Though uniform sampling can have very bad worst-case performance when applied to the LSR problem [11], it is still the simplest and most efficient strategy for row sampling. And though uniform sampling is uniformly worse than leverage scores based sampling from an algorithmic perspective, this is not true from a statistical perspective [11].
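The sampling procedure of Algorithm 1 can be sketched directly from the thin SVD. This computes exact leverage scores at the full $O(nd^2)$ cost rather than via the fast approximations of [2, 3]; the function and variable names here are our own.

```python
import numpy as np

def leverage_score_sampling(X, c, rng):
    """Algorithm 1: leverage-score sampling without replacement.

    Returns the selected index set C and the corresponding scalings
    p_i^{-1}, i.e. the nonzero part of the rows of D = diag(1/p_1, ...,
    1/p_n) indexed by C.  The number of selected rows satisfies
    E[|C|] = sum_i p_i <= c.
    """
    n, d = X.shape
    U, _, _ = np.linalg.svd(X, full_matrices=False)  # thin SVD, O(n d^2)
    lev = np.sum(U ** 2, axis=1)                     # l_i = ||U[i, :]||_2^2, sums to d
    p = np.minimum(1.0, c * lev / d)                 # sampling probabilities
    keep = rng.random(n) < p                         # include row i with prob. p_i
    idx = np.flatnonzero(keep)
    return idx, 1.0 / p[idx]

rng = np.random.default_rng(0)
X = rng.standard_normal((5_000, 8))
idx, w = leverage_score_sampling(X, c=400, rng=rng)
print(len(idx))  # roughly c = 400 rows in expectation
```
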
Furthermore, when the leverage scores of $X$ are very homogeneous, uniform sampling has virtually the same performance as the leverage scores based sampling [11]. So uniform sampling is still worthy of study. We provide in Theorem 2 an error bound for the uniform sampling based LSR. We show that when $c > O(\mu d\epsilon^{-2}\log d)$, the uniform sampling based LSR attains a $2+\epsilon$ bound with positive probability. Here $\mu$ denotes the matrix coherence of $X$.

2 Preliminaries and Previous Work

For a matrix $X = [x_{ij}] \in R^{n\times d}$, we let $x^{(i)}$ be its $i$-th row, $x_j$ be its $j$-th column, $\|X\|_F = (\sum_{i,j} x_{ij}^2)^{1/2}$ be its Frobenius norm, and $\|X\|_2 = \max_{\|z\|_2=1} \|Xz\|_2$ be its spectral norm. We let $I_n$ be the $n \times n$ identity matrix and let $0$ be an all-zero matrix of appropriate size. We let the thin singular value decomposition of $X \in R^{n\times d}$ be

$$X = U_X \Sigma_X V_X^T = \sum_{i=1}^d \sigma_i(X)\, u_{X,i}\, v_{X,i}^T.$$

Here $U_X$, $\Sigma_X$, and $V_X$ are of sizes $n \times d$, $d \times d$, and $d \times d$, respectively, and the singular values $\sigma_1(X), \dots, \sigma_d(X)$ are in non-increasing order. We let $U_X^\perp$ be an $n \times (n-d)$ column orthogonal matrix such that $U_X^T U_X^\perp = 0$. The condition number of $X$ is defined by $\kappa(X) = \sigma_{\max}(X)/\sigma_{\min}(X)$. Based on the SVD, the (row) statistical leverage scores of $X \in R^{n\times d}$ are defined by

$$l_i = \|u_X^{(i)}\|_2^2, \quad i = 1, \dots, n,$$

where $u_X^{(i)}$ is the $i$-th row of $U_X$. It is obvious that $\sum_{i=1}^n l_i = d$. Exactly computing the $n$ leverage scores costs $O(nd^2)$ time, which is as expensive as exactly solving the LSR problem (1). Fortunately, if $X$ is a skinny matrix, the leverage scores can be computed highly efficiently to within arbitrary accuracy using the techniques of [2, 3]. There are many ways to construct the random sampling/projection matrix $S$, and below we describe some of them.

– Uniform Sampling.
The sampling matrix $S$ is constructed by sampling $c$ rows of the identity matrix $I_n$ uniformly at random. This method is the simplest and fastest, but in the worst case its performance is very bad [11].
– Leverage Scores Based Sampling. The sampling matrix $S$ is computed by Algorithm 1; $S$ has $c$ rows in expectation. This method was proposed in [4, 5].
– Subsampled Randomized Hadamard Transform (SRHT). The random projection matrix $S = \sqrt{n/c}\, RHD$ is called SRHT [1, 6, 14] if
  • $R \in R^{c\times n}$ is a subset of $c$ rows of the $n \times n$ identity matrix, where the rows are chosen uniformly at random and without replacement;
  • $H \in R^{n\times n}$ is a normalized Walsh–Hadamard matrix;
  • $D$ is an $n \times n$ random diagonal matrix with each diagonal entry independently chosen to be $+1$ or $-1$ with equal probability.
  SRHT is a fast version of the Johnson–Lindenstrauss transform. The performance of SRHT based LSR is analyzed in [6].
– Sparse Embedding Matrices. The sparse embedding matrix $S = \Phi D$ enables random projection to be performed in time only linear in the number of nonzero entries of $X$ [2]. The random linear map $S = \Phi D$ is defined as follows:
  • $h: [n] \mapsto [c]$ is a random map such that for each $i \in [n]$, $h(i) = t$ for $t \in [c]$ with probability $1/c$;
  • $\Phi \in \{0,1\}^{c\times n}$ is a $c \times n$ binary matrix with $\Phi_{h(i),i} = 1$ and all remaining entries $0$;
  • $D$ is the same as the matrix $D$ of SRHT.
  Sparse embedding matrices based LSR is guaranteed theoretically in [2].

3 Main Results

We provide in Theorem 1 an improved error bound for the leverage scores sampling based LSR.

Theorem 1 (The Leverage Scores Based Sampling). Use the leverage scores based sampling without replacement (Algorithm 1) to construct the $c \times n$ sampling matrix $S$, where $c \geq O(d\ln d + d\epsilon^{-1})$, and solve the approximate LSR problem (2) to obtain $\tilde{\beta}_S$. Then with probability at least $0.8$ the following inequalities hold:

$$\|y - X\tilde{\beta}_S\|_2^2 \leq (1+\epsilon)\,\|y - X\beta_{lsr}\|_2^2,$$
$$\|\beta_{lsr} - \tilde{\beta}_S\|_2^2 \leq \frac{\epsilon}{\sigma_{\min}^2(X)}\,\|y - X\beta_{lsr}\|_2^2 \leq \epsilon\,\kappa^2(X)\,(\gamma^{-2}-1)\,\|\beta_{lsr}\|_2^2,$$

where $\gamma$ is defined by $\gamma \leq \|U_X U_X^T y\|_2 / \|y\|_2 \leq 1$.

We show in Theorem 2 an error bound for the uniform sampling based LSR.

Theorem 2 (Uniform Sampling). Use uniform sampling without replacement to sample $c \geq 1000\,\mu d(\ln d + 7)$ rows of $X$ and compute the approximate LSR problem (2) to obtain $\tilde{\beta}_S$. Then with probability at least $0.05$ the following inequalities hold:

$$\|y - X\tilde{\beta}_S\|_2^2 \leq 2.2\,\|y - X\beta_{lsr}\|_2^2,$$
$$\|\beta_{lsr} - \tilde{\beta}_S\|_2^2 \leq \frac{1.2}{\sigma_{\min}^2(X)}\,\|y - X\beta_{lsr}\|_2^2 \leq 1.2\,\kappa^2(X)\,(\gamma^{-2}-1)\,\|\beta_{lsr}\|_2^2,$$

where $\gamma$ is defined by $\gamma \leq \|U_X U_X^T y\|_2 / \|y\|_2 \leq 1$.

Since computing $\|y - X\tilde{\beta}_S\|_2$ costs only $O(nd)$ time, one can repeat the procedure $t$ times and choose the solution attaining the minimal error $\|y - X\tilde{\beta}_S\|_2$. In this way, the error bounds hold with probability $1 - 0.95^t$, which can be made arbitrarily high.

4 Proof

In Section 4.1 we list some previous results that will be used in our proof. In Section 4.2 we prove Theorem 1, using the techniques in the proofs of Lemmas 1 and 2 of [5] and Lemmas 1 and 2 of [6]. For the sake of self-containedness, we repeat some of the proofs of [5] in our proof. In Section 4.3 we prove Theorem 2 using the techniques in [6, 7, 14].

4.1 Key Lemmas

Lemma 1 is a deterministic error bound for the sampling/projection based LSR, which will be used to prove both Theorem 1 and Theorem 2. The random matrix multiplication bounds in Lemma 2 will be used to prove Theorem 1. The matrix variable tail bounds in Lemma 3 will be used to prove Theorem 2.

Lemma 1 (Deterministic Error Bound, Lemmas 1 and 2 of [6]).
Suppose we are given an overconstrained least squares approximation problem with $X \in R^{n\times d}$ and $y \in R^n$. Let $\beta_{lsr}$ be defined in (1) and $\tilde{\beta}_S$ be defined in (2), and define $z_S \in R^d$ such that $U_X z_S = X(\beta_{lsr} - \tilde{\beta}_S)$. Then the following equality and inequalities hold deterministically:

$$\|y - X\tilde{\beta}_S\|_2^2 = \|y - X\beta_{lsr}\|_2^2 + \|U_X z_S\|_2^2,$$
$$\|\beta_{lsr} - \tilde{\beta}_S\|_2^2 \leq \frac{\|U_X z_S\|_2^2}{\sigma_{\min}^2(X)},$$
$$\|z_S\|_2 \leq \frac{\|U_X^T S^T S U_X^\perp U_X^{\perp T} y\|_2}{\sigma_{\min}^2(S U_X)}.$$

By further assuming that $\|U_X U_X^T y\|_2 \geq \gamma \|y\|_2$, it follows that

$$\|U_X^\perp U_X^{\perp T} y\|_2^2 \leq \sigma_{\max}^2(X)\,(\gamma^{-2}-1)\,\|\beta_{lsr}\|_2^2.$$

Proof. The equality and the first two inequalities follow from Lemma 1 of [6]. The last inequality follows from Lemma 2 of [6].

Lemma 2 (Theorem 7 of [5]). Suppose $X \in R^{n\times d}$, $Y \in R^{n\times p}$, and $c \leq n$, and let $S \in R^{c\times n}$ be the sampling matrix computed by Algorithm 1 taking $X$ and $c$ as input. Then

$$E\,\|X^T Y - X^T S^T S Y\|_F \leq \frac{1}{\sqrt{c}}\,\|X\|_F\,\|Y\|_F,$$
$$E\,\|X^T X - X^T S^T S X\|_F \leq O\Big(\sqrt{\frac{\log c}{c}}\Big)\,\|X\|_2\,\|X\|_F.$$

Lemma 3 (Theorem 2.2 of [14]). Let $\mathcal{W}$ be a finite set of positive semidefinite matrices of dimension $d$, and suppose that $\max_{W \in \mathcal{W}} \lambda_{\max}(W) \leq R$. Sample $W_1, \dots, W_c$ uniformly at random from $\mathcal{W}$ without replacement. Define $\xi_{\min} = c\,\lambda_{\min}(E\,W_1)$ and $\xi_{\max} = c\,\lambda_{\max}(E\,W_1)$. Then for any $\theta_1 \in (0,1]$ and $\theta_2 > 1$, the following inequalities hold:

$$P\Big(\lambda_{\min}\Big(\sum_{i=1}^c W_i\Big) \leq \theta_1 \xi_{\min}\Big) \leq d\,\Big(\frac{e^{\theta_1 - 1}}{\theta_1^{\theta_1}}\Big)^{\xi_{\min}/R},$$
$$P\Big(\lambda_{\max}\Big(\sum_{i=1}^c W_i\Big) \geq \theta_2 \xi_{\max}\Big) \leq d\,\Big(\frac{e^{\theta_2 - 1}}{\theta_2^{\theta_2}}\Big)^{\xi_{\max}/R}.$$

4.2 Proof of Theorem 1

Proof. We first bound the term $\sigma_{\min}^2(S U_X)$ as follows.
Applying a singular value inequality from [10], we have that for all $i \leq \mathrm{rank}(X)$,

$$\big|1 - \sigma_i^2(S U_X)\big| = \big|\sigma_i(U_X^T U_X) - \sigma_i(U_X^T S^T S U_X)\big| \leq \sigma_{\max}\big(U_X^T U_X - U_X^T S^T S U_X\big) = \big\|U_X^T U_X - U_X^T S^T S U_X\big\|_2.$$

Since the leverage scores of $X$ are also the leverage scores of $U_X$, it follows from Lemma 2 that

$$E\,\big\|U_X^T U_X - U_X^T S^T S U_X\big\|_2 \leq O\Big(\sqrt{\frac{\ln c}{c}}\Big)\,\|U_X\|_F\,\|U_X\|_2 = O\Big(\sqrt{\frac{d\ln c}{c}}\Big).$$

It then follows from Markov's inequality that the inequality

$$\big|1 - \sigma_i^2(S U_X)\big| \leq \delta_1^{-1}\,O\Big(\sqrt{\frac{d\ln c}{c}}\Big)$$

holds with probability at least $1 - \delta_1$. When

$$c \geq O\big(d\delta_1^{-2}\epsilon_1^{-2}\ln(d\delta_1^{-2}\epsilon_1^{-2})\big), \qquad (4)$$

the inequality

$$\sigma_{\min}^2(S U_X) \geq 1 - \epsilon_1 \qquad (5)$$

holds with probability at least $1 - \delta_1$. Now we bound the term $\|U_X^T S^T S U_X^\perp U_X^{\perp T} y\|_2$. Since $U_X^T U_X^\perp = 0$, we have that

$$\big\|U_X^T S^T S U_X^\perp U_X^{\perp T} y\big\|_2 = \big\|U_X^T\big(U_X^\perp U_X^{\perp T} y\big) - U_X^T S^T S\big(U_X^\perp U_X^{\perp T} y\big)\big\|_2.$$

Since the leverage scores of $X$ are also the leverage scores of $U_X$, it follows from Lemma 2 that

$$E\,\big\|U_X^T\big(U_X^\perp U_X^{\perp T} y\big) - U_X^T S^T S\big(U_X^\perp U_X^{\perp T} y\big)\big\|_2 \leq \frac{1}{\sqrt{c}}\,\|U_X\|_F\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2 = \sqrt{\frac{d}{c}}\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2.$$

It follows from Markov's inequality that the following inequality holds with probability at least $1 - \delta_2$:

$$\big\|U_X^T S^T S U_X^\perp U_X^{\perp T} y\big\|_2 \leq \delta_2^{-1}\sqrt{\frac{d}{c}}\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2. \qquad (6)$$

Thus when

$$c \geq d\,\delta_2^{-2}\,\epsilon_2^{-2}\,(1-\epsilon_1)^{-2}, \qquad (7)$$

it follows from (5), (6), and the union bound that the inequality

$$\frac{\big\|U_X^T S^T S U_X^\perp U_X^{\perp T} y\big\|_2}{\sigma_{\min}^2(S U_X)} \leq \epsilon_2\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2 \qquad (8)$$

holds with probability at least $1 - \delta_1 - \delta_2$. We let $\epsilon_1 = 0.5$, $\epsilon_2 = \sqrt{\epsilon}$, $\delta_1 = \delta_2 = 0.1$, and let $z_S$ be defined as in Lemma 1.
When $c \geq \max\{O(d\ln d),\, 400\,d\epsilon^{-1}\}$, it follows from (4), (7), (8), and Lemma 1 that with probability at least $0.8$ the following inequality holds:

$$\|z_S\|_2 \leq \sqrt{\epsilon}\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2.$$

Since $U_X^\perp U_X^{\perp T} y = y - X\beta_{lsr}$, the theorem follows directly from Lemma 1.

4.3 Proof of Theorem 2

Proof. We first follow some of the techniques of [7] to bound the two terms

$$\sigma_{\max}^2(S U_X) = \sigma_{\max}(U_X^T S^T S U_X), \qquad \sigma_{\min}^2(S U_X) = \sigma_{\min}(U_X^T S^T S U_X).$$

We let $u_i \in R^d$ be the $i$-th column of $U_X^T$, and let $W_1, \dots, W_c$ be $d \times d$ matrices sampled from $\{u_i u_i^T\}_{i=1}^n$ uniformly at random without replacement. Obviously, $\sigma_k(U_X^T S^T S U_X) = \sigma_k(\sum_{j=1}^c W_j)$. We accordingly define

$$R = \max_j \lambda_{\max}(W_j) = \max_i \|u_i\|_2^2 = \frac{d}{n}\,\mu,$$

where $\mu$ is the row matrix coherence of $X$, and define

$$\xi_{\min} = c\,\lambda_{\min}(E\,W_1) = \frac{c}{n}\,\lambda_{\min}(U_X^T U_X) = \frac{c}{n}, \qquad \xi_{\max} = c\,\lambda_{\max}(E\,W_1) = \frac{c}{n}\,\lambda_{\max}(U_X^T U_X) = \frac{c}{n}.$$

We then apply Lemma 3 and obtain the following inequalities:

$$P\Big(\lambda_{\min}\Big(\sum_{i=1}^c W_i\Big) \leq \theta_1 \frac{c}{n}\Big) \leq d\,\Big(\frac{e^{\theta_1 - 1}}{\theta_1^{\theta_1}}\Big)^{\frac{c}{d\mu}} \triangleq \delta_1,$$
$$P\Big(\lambda_{\max}\Big(\sum_{i=1}^c W_i\Big) \geq \theta_2 \frac{c}{n}\Big) \leq d\,\Big(\frac{e^{\theta_2 - 1}}{\theta_2^{\theta_2}}\Big)^{\frac{c}{d\mu}} \triangleq \delta_2,$$

where $\theta_1 \in (0,1)$, $\theta_2 > 1$, and $\delta_1, \delta_2 \in (0,1)$ are arbitrary real numbers. We set

$$c = \max\Big\{\frac{\mu d\ln(d/\delta_1)}{\theta_1\ln\theta_1 - \theta_1 + 1},\ \frac{\mu d\ln(d/\delta_2)}{\theta_2\ln\theta_2 - \theta_2 + 1}\Big\}; \qquad (9)$$

it then follows that with probability at least $1 - \delta_1 - \delta_2$, both of the following two inequalities hold:

$$\sigma_{\max}(S U_X) \leq \sqrt{\frac{\theta_2 c}{n}} \qquad \text{and} \qquad \sigma_{\min}^{-2}(S U_X) \leq \frac{n}{\theta_1 c}. \qquad (10)$$

Now we seek to bound the term $\|S U_X^\perp U_X^{\perp T} y\|_2$. Let $C$ be an index set of cardinality $c$ with each element chosen from $[n]$ uniformly at random without replacement, and let $y^\perp = U_X^\perp U_X^{\perp T} y$; then we have that

$$E\,\big\|S U_X^\perp U_X^{\perp T} y\big\|_2^2 = E\,\|S y^\perp\|_2^2 = E \sum_{j \in C} (y_j^\perp)^2 = \frac{c}{n}\,\|y^\perp\|_2^2.$$
Thus, by Markov's inequality, with probability at least $1 - \delta_3$ the following inequality holds:

$$\big\|S U_X^\perp U_X^{\perp T} y\big\|_2^2 \leq \frac{c}{n\delta_3}\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2^2. \qquad (11)$$

Finally, it follows from inequalities (10), (11), and Lemma 1 that

$$\|z_S\|_2 \leq \frac{\big\|U_X^T S^T S U_X^\perp U_X^{\perp T} y\big\|_2}{\sigma_{\min}^2(S U_X)} \leq \frac{\big\|U_X^T S^T\big\|_2\,\big\|S U_X^\perp U_X^{\perp T} y\big\|_2}{\sigma_{\min}^2(S U_X)} \leq \sqrt{\frac{\theta_2 c}{n}}\cdot\frac{n}{\theta_1 c}\cdot\sqrt{\frac{c}{n\delta_3}}\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2 = \frac{1}{\theta_1}\sqrt{\frac{\theta_2}{\delta_3}}\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2.$$

Here the first two inequalities hold deterministically, and the third inequality holds with probability at least $1 - \delta_1 - \delta_2 - \delta_3$. We set $\theta_1 = 1 - \epsilon$, $\theta_2 = 1 + \epsilon$, $\delta_1 = \delta_2 = \delta$, and $\delta_3 = 1 - 3\delta$. Since $\ln\theta \approx \theta - 1$ when $\theta$ is close to $1$, it follows from (9) that when $c > \mu d\epsilon^{-2}(\ln d - \ln\delta)$, the inequality

$$\|z_S\|_2^2 \leq \frac{1+\epsilon}{(1-\epsilon)^2(1-3\delta)}\,\big\|U_X^\perp U_X^{\perp T} y\big\|_2^2$$

holds with probability at least $\delta$. Setting $\theta_1 = 0.9556$, $\theta_2 = 1.045$, $\delta_1 = \delta_2 = 0.0015$, and $\delta_3 = 0.947$, we conclude that when $c = 1000\,\mu d(\ln d + 7)$, the inequality $\|z_S\|_2^2 \leq 1.2\,\|U_X^\perp U_X^{\perp T} y\|_2^2$ holds with probability at least $0.05$. Then the theorem follows directly from Lemma 1.

References

1. C. Boutsidis and A. Gittens. Improved matrix algorithms via the subsampled randomized Hadamard transform. SIAM Journal on Matrix Analysis and Applications, 34(3):1301–1340, 2013.
2. K. L. Clarkson and D. P. Woodruff. Low rank approximation and regression in input sparsity time. In Annual ACM Symposium on Theory of Computing (STOC), 2013.
3. P. Drineas, M. Magdon-Ismail, M. W. Mahoney, and D. P. Woodruff. Fast approximation of matrix coherence and statistical leverage. Journal of Machine Learning Research, 13:3441–3472, 2012.
4. P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Sampling algorithms for $\ell_2$ regression and applications.
In Proceedings of the Seventeenth Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), 2006.
5. P. Drineas, M. W. Mahoney, and S. Muthukrishnan. Relative-error CUR matrix decompositions. SIAM Journal on Matrix Analysis and Applications, 30(2):844–881, 2008.
6. P. Drineas, M. W. Mahoney, S. Muthukrishnan, and T. Sarlós. Faster least squares approximation. Numerische Mathematik, 117(2):219–249, 2011.
7. A. Gittens. The spectral norm error of the naive Nyström extension. arXiv preprint arXiv:1110.5305, 2011.
8. A. Gittens and M. W. Mahoney. Revisiting the Nyström method for improved large-scale machine learning. In International Conference on Machine Learning (ICML), 2013.
9. G. H. Golub and C. F. Van Loan. Matrix Computations. The Johns Hopkins University Press, 1996.
10. R. A. Horn and C. R. Johnson. Topics in Matrix Analysis. Cambridge University Press, Cambridge, 1991.
11. P. Ma, M. W. Mahoney, and B. Yu. A statistical perspective on algorithmic leveraging. In International Conference on Machine Learning (ICML), 2014.
12. M. W. Mahoney. Randomized algorithms for matrices and data. Foundations and Trends in Machine Learning, 3(2):123–224, 2011.
13. M. W. Mahoney and P. Drineas. CUR matrix decompositions for improved data analysis. Proceedings of the National Academy of Sciences, 106(3):697–702, 2009.
14. J. A. Tropp. Improved analysis of the subsampled randomized Hadamard transform. Advances in Adaptive Data Analysis, 3(01–02):115–126, 2011.
15. S. Wang and Z. Zhang. Improving CUR matrix decomposition and the Nyström approximation via adaptive sampling. Journal of Machine Learning Research, 14:2729–2769, 2013.
16. S. Wang and Z. Zhang. Efficient algorithms and error analysis for the modified Nyström method. In International Conference on Artificial Intelligence and Statistics (AISTATS), 2014.
