A Channel Coding Perspective of Recommendation Systems



S. T. Aditya, Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India. Email: staditya@ee.iitb.ac.in
Onkar Dabeer, School of Technology and Computer Science, Tata Institute of Fundamental Research, Mumbai, India. Email: onkar@tcs.tifr.res.in
Bikash Kumar Dey, Department of Electrical Engineering, Indian Institute of Technology Bombay, Mumbai, India. Email: bikash@ee.iitb.ac.in

Abstract: Motivated by recommendation systems, we consider the problem of estimating block constant binary matrices (of size $m \times n$) from sparse and noisy observations. The observations are obtained from the underlying block constant matrix after unknown row and column permutations, erasures, and errors. We derive upper and lower bounds on the achievable probability of error. For fixed erasure and error probability, we show that there exists a constant $C_1$ such that if the cluster sizes are less than $C_1 \ln(mn)$, then for any algorithm the probability of error approaches one as $m, n \to \infty$. On the other hand, we show that a simple polynomial time algorithm gives probability of error diminishing to zero provided the cluster sizes are greater than $C_2 \ln(mn)$ for a suitable constant $C_2$.

I. INTRODUCTION

Recommender systems are commonly used to suggest content (movies, books, etc.) that is relevant to a given buyer. The most common approach is to predict the rating that a potential buyer might assign to an item and use the predicted ratings to recommend items. The problem thus reduces to completion of the rating matrix based on a sparse set of observations. This problem has been popularized by the Netflix Prize [1]. A number of methods have been suggested to solve this problem; see for example [2], [3], [4] and references therein. Recently, several authors ([5], [6], [7]) have used the assumption of a low-rank rating matrix to propose provably good algorithms. For example, in [5], [6], a "compressed sensing" approach based on nuclear-norm minimization is proposed. It is shown in [6] that if the number of samples is larger than a lower bound (depending on the matrix size and rank), then with high probability, the proposed optimization problem exactly recovers the underlying low-rank matrix from the samples. In [7], the relationship between the "fit-error" and the prediction error is studied for large random matrices with bounded rank. An efficient algorithm for matrix completion is also proposed.

In this paper, we consider a different setup. We assume that there is an underlying "true" rating matrix, which has block constant structure. In other words, buyers (respectively items) are clustered into groups of similar buyers (respectively items), and similar buyers rate similar items by the same value. The observations are obtained from this underlying matrix (say $M$) as described below.

1) The rows and columns of $M$ are permuted with unknown permutations, that is, the clusters are not known.
2) Many entries of $M$ are erased by a memoryless erasure channel. This models the sparsity of the available ratings.
3) The non-erased entries are observed through a discrete memoryless channel (DMC). This channel models
   - the residual error in the block constant model, and
   - the "noisy" behavior of buyers who may rate the same item differently at different times.
One may also treat these two channels as a single effective DMC, but we prefer the above break-up for conceptual reasons. Our goal is to identify conditions on the cluster sizes under which the underlying matrix can be recovered with small probability of error. Our recommendation system model thus differs from that of [5], [6]; in particular, we do not seek completion of the observed matrix, but rather the recovery of the underlying $M$. As described above, our goal reduces to analyzing the error performance of the code of block-constant matrices over the channel described above. From a practical standpoint, it is desirable to consider the case when the parameters of the erasure channel and DMC are not known. However, in this paper, we consider the simpler case when these channel parameters are known. In particular, for simplicity, we consider the case when $M$ is an $m \times n$ matrix with entries in $\{0, 1\}$ and the DMC is a binary symmetric channel (BSC) with error probability $p$. The erasure probability is $\epsilon$. Our main results are of the following nature.

- If the "largest cluster size" (defined precisely in Section III) is less than $C_1 \ln(mn)$, then the probability of error approaches unity for any estimator of $M$ as $mn \to \infty$ (Corollary 2, Part 2).
- We analyze a simple algorithm, which clusters rows and columns first, and then estimates the cluster values. We show that if the "smallest cluster size" is greater than a constant multiple of $\ln(mn)$, then the probability of error for this algorithm, averaged over the rating matrices, approaches zero as $mn \to \infty$ (Theorem 3, Part 2). Combined with the previous result, this implies that $\ln(mn)$ is a sharp threshold for exact recovery asymptotically.
- If we consider the probability of error for a fixed rating matrix, then the algorithm needs the smallest cluster size to be larger than a constant multiple of $\sqrt{mn \ln(m) \ln(n)}$.

While we obtain the asymptotic results for fixed $p$ and $\epsilon$, the bounds we obtain in the process also apply to the case when $p$ and $\epsilon$ depend on $m$ and $n$.

The paper is organized as follows. In Section II, we describe our model. The main results are stated and proved in Section III. We conclude in Section IV.

II. OUR MODEL AND NOTATION

Suppose $X$ is the unknown $m \times n$ rating matrix with entries in $\{0, 1\}$, where $n$ is the number of buyers and $m$ is the number of items. Let $\mathcal{A} = \{A_i\}_{i=1}^{r}$ and $\mathcal{B} = \{B_j\}_{j=1}^{t}$ be partitions of $[1:m]$ and $[1:n]$ respectively. The sets $A_i \times B_j$ are the clusters in the matrix $X$. We call the $A_i$'s ($B_j$'s) the row (column) clusters. We denote the corresponding row and column cluster sizes by $m_i$ and $n_j$, and the number of row clusters and the number of column clusters by $r$ and $t$ respectively. (We note that the $A_i$'s (respectively $B_j$'s) need not consist of adjacent rows (respectively columns), and hence this notation is different from that in the introduction.) The entries of $X$ are passed through the cascade of a memoryless erasure channel with erasure probability $\epsilon$ and a memoryless BSC with error probability $p$. While the erasure channel models the missing ratings, the BSC models noisy behavior of the buyers. The output of the channel, i.e., the observed rating matrix, is denoted by $Y$; its entries are in $\{0, 1, e\}$, where $e$ denotes an erasure.
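As a concrete illustration of this observation model, here is a minimal simulation sketch. It is not from the paper: the parameter values are arbitrary, and the erasure symbol $e$ is represented by -1.

```python
import numpy as np

rng = np.random.default_rng(0)

def observe(X, eps, p, rng):
    """Pass the rating matrix X through a memoryless erasure channel
    (erasure probability eps) followed by a BSC with crossover
    probability p. Erasures are marked with -1 (standing in for 'e')."""
    Y = X.astype(int).copy()
    erased = rng.random(X.shape) < eps       # memoryless erasures
    flipped = rng.random(X.shape) < p        # memoryless BSC errors
    Y[~erased & flipped] ^= 1                # flip surviving entries w.p. p
    Y[erased] = -1                           # mark erased entries
    return Y

# Toy block constant matrix with 2 row clusters and 2 column clusters.
X = np.block([[np.ones((3, 4)), np.zeros((3, 5))],
              [np.zeros((2, 4)), np.ones((2, 5))]])
Y = observe(X, eps=0.5, p=0.1, rng=rng)
```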
We analyze the probability of error for a fixed rating matrix as well as the probability of error averaged over the rating matrices. We use the following probability law on the rating matrices. We assume that all row and column clusters have the same size $m_0$ and $n_0$ respectively, and that the $rt$ constant blocks (of size $m_0 n_0$) contain i.i.d. Bernoulli(1/2) random variables.

III. MAIN RESULTS

In Section III-A, we study the probability of error of the maximum likelihood decoder when the clusters $\mathcal{A}$, $\mathcal{B}$ are known. This result provides a lower bound on the cluster size that ensures diminishing probability of error. In Section III-B, we analyze the probability of error in identifying the clusters for a specific algorithm. These results are integrated in Section III-C to obtain conditions on the cluster sizes for the overall probability of error to diminish to zero.

A. Probability of Error When Clustering is Known

In this section, we study the probability of error of the maximum likelihood decoder for a given rating matrix $X$ when $\mathcal{A}$ and $\mathcal{B}$ are known. We denote this probability by $P_{e|\mathcal{A},\mathcal{B}}(X)$. We note that the ML decoder ignores the erasures, counts the number of 0's and 1's in each cluster $A_i \times B_j$, and takes a majority decision. Ties are resolved by tossing a fair coin. The following theorem provides simple upper and lower bounds on $P_{e|\mathcal{A},\mathcal{B}}$.

Theorem 1: Let $0 \le p \le 1/2$, and let
$$p_1 = \epsilon + 2(1-\epsilon)\sqrt{p(1-p)}, \qquad G(u) = 1 - \prod_{i=1,j=1}^{r,t}\left(1 - u^{m_i n_j}\right).$$
Then the probability of error of the ML decoder satisfies the following bounds:
$$G(\epsilon) \le P_{e|\mathcal{A},\mathcal{B}}(X) \le G(p_1). \tag{1}$$

Proof: We note that when $p = 0$, we make an error in a cluster iff all the entries in the cluster are erased. Since the erasures in different clusters are independent, it follows that $P_{e|\mathcal{A},\mathcal{B}}(X) = G(\epsilon)$ for $p = 0$. This gives the lower bound on $P_{e|\mathcal{A},\mathcal{B}}(X)$ for $p \ge 0$.

Next we prove the upper bound. Suppose in cluster $A_i \times B_j$ we have $s$ non-erased samples. Then the probability of a correct decision in this cluster is given by
$$\Pr(E_{i,j,s}^c) = \begin{cases} \displaystyle\sum_{q=0}^{\lfloor s/2 \rfloor} \binom{s}{q} p^q (1-p)^{s-q}, & \text{if } s \text{ is odd}, \\ \displaystyle\sum_{q=0}^{s/2-1} \binom{s}{q} p^q (1-p)^{s-q} + \frac{1}{2}\binom{s}{s/2} p^{s/2} (1-p)^{s/2}, & \text{if } s \text{ is even}. \end{cases} \tag{2}$$
Averaging over the number of non-erased samples, the probability of a correct decision in cluster $A_i \times B_j$ is given by
$$\Pr(E_{i,j}^c) = \sum_{s=0}^{m_i n_j} \binom{m_i n_j}{s} \epsilon^{m_i n_j - s} (1-\epsilon)^s \Pr(E_{i,j,s}^c). \tag{3}$$
Since the erasure channel and BSC are memoryless,
$$P_{e|\mathcal{A},\mathcal{B}}(X) = \Pr\left(\cup_{i=1,j=1}^{r,t} E_{i,j}\right) = 1 - \prod_{i=1,j=1}^{r,t} \Pr(E_{i,j}^c). \tag{4}$$
Equations (4), (3), and (2) specify the probability of error. The desired upper bound is obtained by deriving a lower bound on $\Pr(E_{i,j,s}^c)$. First we note that from (2),
$$1 - \Pr(E_{i,j,s}^c) \le \sum_{q=\lceil s/2 \rceil}^{s} \binom{s}{q} p^q (1-p)^{s-q}.$$
But for $0 \le p \le \frac{1}{2}$ and $q \ge \frac{s}{2}$, we have $p^q(1-p)^{s-q} \le p^{s/2}(1-p)^{s/2}$. Substituting this in the previous equation and using $\sum_{q} \binom{s}{q} \le 2^s$, we have
$$\Pr(E_{i,j,s}^c) \ge 1 - \left(2\sqrt{p(1-p)}\right)^s. \tag{5}$$
From Equations (3) and (5), we have $\Pr(E_{i,j}^c) \ge 1 - p_1^{m_i n_j}$, and so from (4), $P_{e|\mathcal{A},\mathcal{B}}(X) \le G(p_1)$. This completes the proof of the upper bound on $P_{e|\mathcal{A},\mathcal{B}}(X)$.
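For concreteness, the majority-vote ML decoder described above admits a short implementation. The sketch below reuses the -1 erasure convention of the earlier snippet and is only illustrative; it is not the authors' code.

```python
import numpy as np

def ml_decode(Y, row_clusters, col_clusters, rng):
    """Majority-vote ML decoder when the clustering (A, B) is known.
    Y marks erasures with -1; ties are broken by a fair coin toss."""
    Xhat = np.zeros(Y.shape, dtype=int)
    for A in row_clusters:               # e.g. A = [0, 1, 2] (row indices)
        for B in col_clusters:           # e.g. B = [3, 4]    (col indices)
            block = Y[np.ix_(A, B)]
            ones = np.count_nonzero(block == 1)
            zeros = np.count_nonzero(block == 0)   # erasures are ignored
            if ones != zeros:
                bit = int(ones > zeros)            # majority decision
            else:
                bit = int(rng.random() < 0.5)      # fair-coin tie break
            Xhat[np.ix_(A, B)] = bit
    return Xhat
```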
Let us define the smallest cluster size as
$$s_*(X) := \min_{i,j} m_i n_j, \tag{6}$$
and the largest cluster size as $s^*(X) := \max_{i,j} m_i n_j$. The following corollary gives simpler bounds on $P_{e|\mathcal{A},\mathcal{B}}(X)$.

Corollary 1: Let $N_X(s)$ be the number of clusters in $X$ with exactly $s$ elements, and let $s_*(X) \ge \frac{\ln(2)}{\ln(1/p_1)}$. Then
$$P_{e|\mathcal{A},\mathcal{B}}(X) \ge 1 - \exp\left(-\sum_{s=1}^{\infty} N_X(s)\,\epsilon^s\right), \qquad P_{e|\mathcal{A},\mathcal{B}}(X) \le 1 - \exp\left(-2\ln(2)\sum_{s=1}^{\infty} N_X(s)\,p_1^s\right). \tag{7}$$
In particular,
$$P_{e|\mathcal{A},\mathcal{B}}(X) \ge 1 - \exp\left(-\frac{mn\,\epsilon^{s^*(X)}}{s^*(X)}\right), \qquad P_{e|\mathcal{A},\mathcal{B}}(X) \le 1 - \exp\left(-2\ln(2)\,\frac{mn\,p_1^{s_*(X)}}{s_*(X)}\right). \tag{8}$$

Proof: The proof is based on upper and lower bounds for $G(u)$. We note that $(1-x) \le \exp(-x)$, and that for $x \in [0, 1/2]$, $1-x \ge \exp(-2\ln(2)\,x)$. Hence
$$\exp\left(-2\ln(2)\sum_{i=1,j=1}^{r,t} u^{m_i n_j}\right) \le \prod_{i=1,j=1}^{r,t}\left(1 - u^{m_i n_j}\right) \le \exp\left(-\sum_{i=1,j=1}^{r,t} u^{m_i n_j}\right),$$
where the first inequality holds for $u^{m_i n_j} \le \frac{1}{2}$. The sum in the exponent can be written in terms of the sizes of the clusters:
$$\sum_{i=1,j=1}^{r,t} u^{m_i n_j} = \sum_{s=1}^{\infty} N_X(s)\,u^s.$$
The bounds (7) now follow from Theorem 1 by noting that $p_1^{m_i n_j} \le 1/2$ for $s_*(X) \ge \ln(2)/\ln(1/p_1)$. To prove (8), we note that
$$\sum_{s=1}^{\infty} N_X(s)\,u^s \le rt\,u^{s_*(X)} \le \frac{mn}{s_*(X)}\,u^{s_*(X)}.$$
This gives the upper bound in (8). The lower bound in (8) follows similarly.

We are interested in studying the cluster sizes that guarantee correct decisions asymptotically. Though (7) is tighter than (8), the conditions arising out of (8) are cleaner and are stated below.

Corollary 2: Suppose we are given a sequence of rating matrices of increasing size, that is, $mn \to \infty$. Then the following are true.
1) If $s_*(X) \ge \frac{\ln(mn)}{\ln(1/p_1)}$, then $P_{e|\mathcal{A},\mathcal{B}}(X) \to 0$.
2) If $s^*(X) \le (1-\delta)\frac{\ln(mn)}{\ln(1/\epsilon)}$ for some $\delta > 0$, then $P_{e|\mathcal{A},\mathcal{B}}(X) \to 1$.

Proof: First consider Part 1. From (8), using $e^{-x} \ge 1-x$, we get
$$P_{e|\mathcal{A},\mathcal{B}}(X) \le 2\ln(2)\,\frac{mn\,p_1^{s_*(X)}}{s_*(X)}.$$
The RHS is a decreasing function of $s_*(X)$, and hence substituting the lower bound on $s_*(X)$ we get
$$P_{e|\mathcal{A},\mathcal{B}}(X) \le \frac{2\ln(2)\ln(1/p_1)}{\ln(mn)} \to 0.$$
For Part 2, we note that $1 - \exp\left(-mn\,\epsilon^{s^*(X)}/s^*(X)\right)$ is a decreasing function of $s^*(X)$, and hence substituting the upper bound, we have from (8)
$$P_{e|\mathcal{A},\mathcal{B}}(X) \ge 1 - \exp\left(-\frac{\ln(1/\epsilon)\,(mn)^{\delta}}{(1-\delta)\ln(mn)}\right).$$
But since $(mn)^{\delta}/\ln(mn) \to \infty$, we have $P_{e|\mathcal{A},\mathcal{B}} \to 1$.

B. Probability of Error in Clustering

Data mining researchers have developed several techniques for clustering data; see for example [8, Chapter 4]. In this section, we analyze a simple polynomial time clustering algorithm. The algorithm clusters rows and columns separately. To cluster rows, we compute the normalized Hamming distance between two rows over commonly sampled entries. For rows $i, j$, this distance is
$$d_{ij} = \frac{1}{n}\sum_{k=1}^{n} 1(Y_{ik} \ne e,\, Y_{jk} \ne e)\,1(Y_{ik} \ne Y_{jk}).$$
If this is less than a threshold $d_0$, then the two rows are declared to be in the same cluster; otherwise they are declared to be in different clusters. We apply this process to all pairs of rows and all pairs of columns. Let $I_{ij}$ be equal to 1 if rows $i$, $j$ belong to the same cluster and let it be 0 otherwise. The algorithm gives the estimate
$$\hat{I}_{ij} = \begin{cases} 1, & d_{ij} < d_0, \\ 0, & d_{ij} \ge d_0. \end{cases}$$
We are interested in the probability that we make an error in row clustering, averaged over the probability law on the rating matrices described in Section II:
$$\bar{P}_{e,rc} = \Pr\left(\hat{I}_{ij} \ne I_{ij} \text{ for some } i, j\right).$$
Once the rows are clustered, we can apply the same procedure to cluster the columns.
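A sketch of this row-clustering step is given below. The paper only specifies the pairwise decisions $\hat{I}_{ij}$; grouping them into clusters by a greedy pass over the rows is our own simplification, and the -1 erasure convention is carried over from the earlier snippets.

```python
import numpy as np

def cluster_rows(Y, d0):
    """Cluster the rows of Y by thresholding the normalized Hamming
    distance d_ij, computed over commonly sampled (non-erased) entries
    and normalized by the row length n. Returns a label per row."""
    m, n = Y.shape
    labels = np.full(m, -1, dtype=int)
    next_label = 0
    for i in range(m):
        if labels[i] >= 0:                # row already assigned
            continue
        labels[i] = next_label
        for j in range(i + 1, m):
            if labels[j] >= 0:
                continue
            common = (Y[i] != -1) & (Y[j] != -1)
            d_ij = np.count_nonzero(common & (Y[i] != Y[j])) / n
            if d_ij < d0:                 # declare rows i and j to be
                labels[j] = next_label    # in the same cluster
        next_label += 1
    return labels
```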
Below we analyze the error probability $\bar{P}_{e,rc}$; the probability of error in finding the column clusters has similar behavior.

Theorem 2: Suppose we are given a sequence of rating matrices with $n \to \infty$ and $t_n$ column clusters, such that $\limsup_{n\to\infty} m/n < \infty$. Let
$$\mu := 2p(1-p)(1-\epsilon)^2, \qquad \delta := (1-\epsilon)^2(1-2p)^2,$$
and choose $d_0 = \mu + \delta/3$. Then there exists a positive constant $C_0$ such that if $t_n > C_0 \ln(n)$, then $\bar{P}_{e,rc} \to 0$.

Proof: We start by considering the choice of the threshold. When $i$, $j$ are in the same cluster,
$$E[d_{ij} \mid I_{ij} = 1, X] = 2p(1-p)(1-\epsilon)^2 = \mu.$$
When $i$, $j$ are in different clusters, let $s_{ij}$ be the number of columns in which $i$, $j$ disagree. Then
$$E[d_{ij} \mid I_{ij} = 0, X] = \frac{(1-\epsilon)^2}{n}\left[\left(p^2 + (1-p)^2\right)s_{ij} + 2p(1-p)(n - s_{ij})\right] = \mu + \frac{s_{ij}}{n}\,\delta.$$
We choose
$$d_0 = \mu + \frac{\alpha_n}{n}\,\delta,$$
where $\alpha_n$ is chosen below to obtain diminishing probability of error.

First we bound the probability of error when $I_{ij} = 1$. We note that in this case $d_{ij}$ is the average of $n$ i.i.d. Bernoulli random variables with mean $\mu$. Hence
$$\Pr\left(\hat{I}_{ij} \ne 1 \mid I_{ij} = 1, X\right) = \Pr\left(d_{ij} - \mu \ge \frac{\alpha_n}{n}\,\delta \,\Big|\, I_{ij} = 1, X\right) \le \exp\left(-\frac{\delta^2 \alpha_n^2}{3\mu n}\right), \tag{9}$$
where in the last step we have used the Chernoff bound [9, Theorem 4.4, p. 64].

Next consider the case $I_{ij} = 0$. In this case, $d_{ij}$ is the average of $n - s_{ij}$ identically distributed Bernoulli random variables with mean $\mu$ and $s_{ij}$ identically distributed Bernoulli random variables with mean $\nu = (1-\epsilon)^2\left[p^2 + (1-p)^2\right]$, all the random variables being independent. So for $\theta < 0$ we have
$$\Pr\left(\hat{I}_{ij} \ne 0 \mid I_{ij} = 0, X\right) \le \frac{\left(1 - \mu + \mu e^{\theta}\right)^{n - s_{ij}}\left(1 - \nu + \nu e^{\theta}\right)^{s_{ij}}}{e^{n d_0 \theta}} \tag{10}$$
$$\le \exp\left(n\left(e^{\theta} - 1\right)\beta_{ij} - n d_0 \theta\right), \qquad \beta_{ij} = \mu + \delta\,\frac{s_{ij}}{n}, \tag{11}$$
where in (10) we have used the Chernoff bound and in (11) we have used the inequality $1 + x \le \exp(x)$. Choosing $\theta = \min(0, \ln(d_0/\beta_{ij}))$ (which is the optimal choice), we have
$$\Pr\left(\hat{I}_{ij} \ne 0 \mid I_{ij} = 0, X\right) \le \begin{cases} \exp\left(n(d_0 - \beta_{ij}) + n d_0 \ln\left(\frac{\beta_{ij}}{d_0}\right)\right), & \text{if } s_{ij} \ge \alpha_n, \\ 1, & \text{if } s_{ij} < \alpha_n. \end{cases} \tag{12}$$
Note that for $s_{ij} \ge \alpha_n$, we have $0 \le (\beta_{ij} - d_0)/d_0 \le 1$, and so
$$\ln\left(\frac{\beta_{ij}}{d_0}\right) \le \frac{\beta_{ij} - d_0}{d_0} - \frac{1}{6}\left(\frac{\beta_{ij} - d_0}{d_0}\right)^2.$$
Substituting in (12), if $s_{ij} \ge \alpha_n$, then
$$\Pr\left(\hat{I}_{ij} \ne 0 \mid I_{ij} = 0, X\right) \le \exp\left(-\frac{\delta^2(s_{ij} - \alpha_n)^2}{6(n\mu + \delta\alpha_n)}\right). \tag{13}$$
Taking expectations in (12) and using (13), we get
$$E\left[\Pr\left(\hat{I}_{ij} \ne 0 \mid I_{ij} = 0, X\right)\right] \le \Pr(s_{ij} \le \alpha_n) + E\left[\exp\left(-\frac{\delta^2(s_{ij} - \alpha_n)^2}{6(n\mu + \delta\alpha_n)}\right)\right] =: T_1 + T_2.$$
We note that $s_{ij} = n_0 X$, where $X$ is Binomial($t_n$, 1/2). Thus $E[s_{ij}] = n_0 t_n/2 = n/2$ and $\mathrm{var}\{s_{ij}\} = n n_0/4$, so if $n_0 = o(n)$, then $s_{ij}$ concentrates around its mean. Hence, to get a diminishing $T_1$, we choose $\alpha_n = n/3$. Then
$$T_1 = \Pr\left(s_{ij} \le \frac{n}{3}\right) = \Pr\left(X \le \frac{t_n}{3}\right) \le \Pr\left(|X - t_n/2| \ge \frac{t_n}{6}\right) \le 2\exp\left(-\frac{t_n}{54}\right), \tag{14}$$
where we have used the Chernoff bound [9, Corollary 4.6, p. 67]. Substituting for $\alpha_n$ in $T_2$, we see that for a suitable positive constant $c$,
$$T_2 = E\left[\exp\left(-\frac{c\,n_0(X - t_n/3)^2}{t_n}\right)\right] = \sum_{s=0}^{t_n}\binom{t_n}{s}2^{-t_n}\exp\left(-\frac{c\,n_0(s - t_n/3)^2}{t_n}\right)$$
$$= \sum_{|s - t_n/3| > t_n/9}\binom{t_n}{s}2^{-t_n}\exp\left(-\frac{c\,n_0(s - t_n/3)^2}{t_n}\right) + \sum_{|s - t_n/3| \le t_n/9}\binom{t_n}{s}2^{-t_n}\exp\left(-\frac{c\,n_0(s - t_n/3)^2}{t_n}\right)$$
$$\le \exp(-cn/81) + \sum_{|s - t_n/3| \le t_n/9} 2^{-t_n}\,2^{t_n h(s/t_n)} \le \exp(-cn/81) + t_n 2^{-t_n(1 - h(4/9))}, \tag{15}$$
where $h(\cdot)$ denotes the binary entropy function. From (14) and (15), it follows that
$$E\left[\Pr\left(\hat{I}_{ij} \ne 0 \mid I_{ij} = 0, X\right)\right] \le T_1 + T_2 \le n c_1 \exp(-c_2 t_n), \tag{16}$$
where $c_1$, $c_2$ are positive constants. Since there are only $m(m-1)/2$ pairs of rows, the desired result follows.
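To get a feel for these quantities, here is a quick numeric check of the threshold choice, using the purely illustrative values p = 0.1 and eps = 0.5 (not taken from the paper):

```python
p, eps = 0.1, 0.5                                 # illustrative values only
mu    = 2 * p * (1 - p) * (1 - eps) ** 2          # same-cluster mean: 0.045
nu    = (1 - eps) ** 2 * (p ** 2 + (1 - p) ** 2)  # disagreeing-column mean: 0.205
delta = (1 - eps) ** 2 * (1 - 2 * p) ** 2         # equals nu - mu = 0.16
d0    = mu + delta / 3                            # threshold: ~0.0983

# A typical pair of rows in different clusters disagrees in about
# s_ij ~ n/2 columns, so E[d_ij] ~ mu + delta/2 = 0.125, safely above d0,
# while same-cluster pairs concentrate around mu = 0.045, safely below.
assert mu < d0 < mu + delta / 2
```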
Remark: If we consider the probability of error in clustering for a fixed rating matrix, then to get diminishing probability of error asymptotically, we need $m_0 n_0 > C\sqrt{mn \ln(m)\ln(n)}$.

C. Estimation Under Unknown Clustering

In this section, we consider our full problem: estimation of the underlying rating matrix from noisy, sparse observations when the clustering is not known. Our result is the following.

Theorem 3: Consider the collection of block constant matrices with the probability law described in Section II. Let $m = \beta n$, with $\beta > 0$ fixed. Then there exist constants $C_i$, $1 \le i \le 4$, such that the following holds for $t > C_3\ln(n)$, $r > C_4\ln(m)$.
1) If $m_0 n_0 \le C_1 \ln(mn)$, then for any estimator of $X$, $\bar{P}_e \to 1$ as $n \to \infty$.
2) Consider an estimator which first clusters the rows and columns using the algorithm described in Section III-B and then applies ML decoding as in Section III-A, assuming that the clustering is correct. If $m_0 n_0 \ge C_2 \ln(mn)$, then for this algorithm $\bar{P}_e \to 0$ as $n \to \infty$.

Proof: When $\mathcal{A}$, $\mathcal{B}$ are known, then under our model all feasible rating matrices are equally likely. Hence the ML decoder gives the minimum probability of error, and so we have $\bar{P}_e \ge E[P_{e|\mathcal{A},\mathcal{B}}(X)]$. To prove Part 1), we next lower bound $E[P_{e|\mathcal{A},\mathcal{B}}(X)]$. Let $T$ be the event that $s^*(X) > m_0 n_0$. We note that $X \in T$ iff for some pair of row clusters all the $t$ column-cluster blocks have been generated equal, or for some pair of column clusters all the $r$ row-cluster blocks have been generated equal. Using the union bound, we get
$$\Pr(T) \le \binom{r}{2}2^{-t} + \binom{t}{2}2^{-r} \le m^2 2^{-t} + n^2 2^{-r}. \tag{17}$$
We choose $C_3$, $C_4$ to ensure that the above bound decays to zero, and hence $\Pr(T) \to 0$. Now, $E[P_{e|\mathcal{A},\mathcal{B}}(X)] \ge E[P_{e|\mathcal{A},\mathcal{B}}(X); T^c]$. But on the event $T^c$, $s^*(X) = m_0 n_0$, and from the lower bound in (8) we get
$$\bar{P}_e \ge E[P_{e|\mathcal{A},\mathcal{B}}(X)] \ge (1 - \Pr(T))\left(1 - \exp\left(-\frac{\ln(1/\epsilon)\,(mn)^{\delta}}{(1-\delta)\ln(mn)}\right)\right), \tag{18}$$
which $\to 1$ as $mn \to \infty$. This proves Part 1).

Next we prove Part 2). Let $D$ denote the event that the clustering is identified correctly. We note that the probability of error in estimating $X$, averaged over the probability law on the block constant matrices, satisfies
$$\bar{P}_e \le E\left[P_{e|\mathcal{A},\mathcal{B}}(X)\Pr(D) + \Pr(D^c)\right] \le E\left[P_{e|\mathcal{A},\mathcal{B}}(X)\right] + \left(\bar{P}_{e,rc} + \bar{P}_{e,cc}\right),$$
where $\bar{P}_{e,cc}$ is the probability of error in column clustering. The desired result follows from Part 1 of Corollary 2 and Theorem 2.

Remark: The above result states that for fixed $p$, $\epsilon$, the smallest cluster size that leads to zero error asymptotically is $O(\ln(mn)) = O(\ln(n))$. When $p = 0$, we can also apply the method of [6] to our model, and this yields a smallest cluster size of $O(n^{1/2}(\ln(n))^2)$, which is strictly worse than our result.

Remark: In [7], the focus is on rating matrices of rank $O(1)$ and $1 - \epsilon = c/n$, which leads to $O(n)$ observations. For our model, $O(1)$ rank corresponds to a cluster size of $\Theta(mn)$, and for $1 - \epsilon = c/n$, our algorithm can be seen to give zero error asymptotically for any fixed rating matrix.
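For illustration, the two-stage estimator analyzed in Theorem 3, Part 2, can be sketched by composing the earlier snippets. Here observe, cluster_rows, and ml_decode are the illustrative helpers defined above, and using a single threshold d0 for both rows and columns is a simplifying assumption.

```python
import numpy as np

def estimate(Y, d0, rng):
    """Two-stage estimator: cluster rows and columns by thresholding
    (Section III-B), then run majority-vote ML decoding assuming the
    estimated clustering is correct (Section III-A)."""
    row_labels = cluster_rows(Y, d0)
    col_labels = cluster_rows(Y.T, d0)   # same procedure on the columns
    row_clusters = [np.flatnonzero(row_labels == c)
                    for c in range(row_labels.max() + 1)]
    col_clusters = [np.flatnonzero(col_labels == c)
                    for c in range(col_labels.max() + 1)]
    return ml_decode(Y, row_clusters, col_clusters, rng)

# Example, reusing X, Y, rng from the first snippet:
# Xhat = estimate(Y, d0=0.0983, rng=rng)
```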
IV. CONCLUSION

We considered the problem of estimating a block constant rating matrix. The observed matrix is obtained through an unknown relabeling of the rows and columns of the underlying matrix, followed by an error and erasure channel. Our probability of error analysis showed that if the number of row clusters and the number of column clusters are $\Omega(\ln(m))$ and $\Omega(\ln(n))$ respectively, then the matrix can be clustered and estimated with vanishing probability of error if the cluster sizes are $\Omega(\ln(mn))$.

V. ACKNOWLEDGMENTS

The work of Onkar Dabeer was supported by the XI Plan Project from TIFR and the Homi Bhabha Fellowship. The work of Bikash Kumar Dey was supported by the Bharti Centre for Communication at IIT Bombay.

REFERENCES

[1] http://www.netflixprize.com/
[2] G. Adomavicius and A. Tuzhilin, "Toward the Next Generation of Recommender Systems: A Survey of the State-of-the-Art and Possible Extensions," IEEE Trans. Knowledge and Data Engineering, vol. 17, no. 6, pp. 734-749, June 2005.
[3] A. Felfernig, G. Friedrich, and L. Schmidt-Thieme, "Guest Editors' Introduction: Recommender Systems," IEEE Intelligent Systems, vol. 22, no. 3, pp. 18-21, May 2007.
[4] Y. Koren, "Factorization Meets the Neighborhood: a Multifaceted Collaborative Filtering Model," ACM Int. Conference on Knowledge Discovery and Data Mining (KDD'08), 2008.
[5] B. Recht, M. Fazel, and P. A. Parrilo, "Guaranteed minimum rank solutions to linear matrix equations via nuclear norm minimization," preprint (2007), submitted to SIAM Review.
[6] E. J. Candes and B. Recht, "Exact Matrix Completion via Convex Optimization," preprint (2008), available at http://www.acm.caltech.edu/emmanuel/papers/MatrixCompletion.pdf
[7] R. Keshavan, A. Montanari, and S. Oh, "Learning low rank matrices from O(n) entries," Allerton 2008.
[8] S. Chakrabarti, Mining the Web, Morgan Kaufmann Publishers, San Francisco, 2003.
[9] M. Mitzenmacher and E. Upfal, Probability and Computing: Randomized Algorithms and Probabilistic Analysis, Cambridge University Press, 2005.
