Top-N recommendations in the presence of sparsity: An NCD-based approach
Athanasios N. Nikolakopoulos^{a,b,*} and John D. Garofalakis^{a,b,**}

^a Department of Computer Engineering and Informatics, University of Patras, Panepistimioupoli, GR26500, Rio, Greece
^b Computer Technology Institute and Press "Diophantus", Panepistimioupoli, GR26504, Rio, Greece
E-mail: {nikolako,garofala}@ceid.upatras.gr

* Corresponding author. E-mail: nikolako@ceid.upatras.gr. Tel.: +302610997543
** E-mail: garofala@ceid.upatras.gr. Tel.: +302610997562

Abstract. Making recommendations in the presence of sparsity is known to present one of the most challenging problems faced by collaborative filtering methods. In this work we tackle this problem by exploiting the innately hierarchical structure of the item space, following an approach inspired by the theory of Decomposability. We view the item space as a Nearly Decomposable system and we define blocks of closely related elements and corresponding indirect proximity components. We study the theoretical properties of the decomposition and derive sufficient conditions that guarantee full item space coverage even in cold-start recommendation scenarios. A comprehensive set of experiments on the MovieLens and the Yahoo!R2Music datasets, using several widely applied performance metrics, supports our model's theoretically predicted properties and verifies that NCDREC outperforms several state-of-the-art algorithms in terms of recommendation accuracy, diversity and sparseness insensitivity.

Keywords: Recommender Systems, Collaborative Filtering, Sparsity, Decomposability, Markov Chain Models, Long-Tail Recommendation

1. Introduction

Recommender Systems (RS) are information filtering tools that have been widely adopted over the past decade by the majority of e-commerce sites in order to make intelligent personalized product suggestions to their customers [1,17,27]. RS technology enhances user experience and is known to increase user fidelity to the system [39]. Correspondingly, from an economic perspective, the utilization of recommender systems is known to assist in building bigger and more loyal customer bases, and to drive a significant increase in the volume of product sales [21,37,41].

The development of recommender systems is based, in a very fundamental sense, on a rather simple observation: people very often base their everyday decisions on advice and suggestions provided by the community. For example, when one wants to pick a new movie to watch, it is very common to take into consideration published reviews about the movie, or to ask friends for their opinion. Mimicking this behavior, recommender systems exploit the plethora of information produced by the interactions of a large community of users, and try to deliver personalized suggestions that help an active user cope with the overwhelming number of options in front of him.

Among the several different approaches to building recommender systems, Collaborative Filtering (CF) is widely regarded as one of the most successful [1,20,27,38,40]. CF methods model both users and items as sets of ratings, and focus on the sparse rating matrix that lies at the common core, trying either to estimate the missing values or to find promising cells to propose (see Figure 1).
In the majority of CF-related work, for reasons of mathematical convenience (as well as fitness with formal optimization methods), the recommendation task reduces to predicting the ratings for all the unseen user-item pairs (prediction-based methods [12,26,46]).

Fig. 1. Example Recommender System: users, items and ratings feed the recommender, which outputs rating predictions and a recommendation list.

Recently, however, many leading researchers have turned significant attention to ranking-based methods, which are believed to conform more naturally with how the recommender system will actually be used in practice [6,11,15,16,18,29,35,48].

Despite their success in many application settings, RS techniques suffer from a number of problems that remain to be resolved. One of the most important of these arises from the fact that the available data are often insufficient for identifying similar elements; it is commonly referred to as the Sparsity Problem. Sparsity imposes serious limitations on the quality of recommendations, and it is known to significantly decrease the diversity and the effectiveness of CF methods, especially in recommending unpopular items (the "long tail" problem). Unfortunately, sparsity is an intrinsic characteristic of recommender systems because, in the majority of realistic applications, users typically interact with only a small portion of the available items, and the problem is aggravated even more by the fact that new users with no ratings at all are regularly added to the system (the Cold-Start problem [7,36]).

Among the most promising approaches for dealing with limited coverage and sparsity are graph-based methods [12,14,18,48]. The methods of this family exploit transitive relations in the data, which makes them able to estimate the relationship between users and items that are not directly connected. Gori and Pucci [18] proposed ItemRank, a PageRank-inspired scoring algorithm that produces a personalized ranking vector using a random walk with restarts on an items' correlation graph induced by the ratings. Fouss et al. [14,15] create a graph model of the RS database and present a number of methods to compute node similarity measures, including the random-walk-related average Commute Time and average First Passage Time, as well as the pseudo-inverse of the graph's Laplacian. They compare their methods against other state-of-the-art graph-based approaches, such as a sophisticated node similarity measure that integrates indirect paths in the graph, based on the matrix-forest theorem [9], and a similarity measure based on the well-known Katz algorithm [23].

Here, we attack the sparsity problem from a different perspective. The fact that sparsity has been commonly observed in models of seemingly unrelated, naturally emerging systems suggests an even more fundamental cause behind this phenomenon. According to Herbert A. Simon, this inherent sparsity is intertwined with the structural organization and the evolutionary viability of these systems.
In his seminal work on the architecture of complexity [44], he argued that the majority of sparse hierarchically structured systems share the property of having a Nearly Completely Decomposable (NCD) architecture: they can be seen as comprised of a hierarchy of interconnected blocks, sub-blocks and so on, in such a way that elements within any particular such block relate much more vigorously with each other than do elements belonging to different blocks, and this property holds between any two levels of the hierarchy.

The analysis of decomposable systems was pioneered by Simon and Ando [45], who reported on state aggregation in linear models of economic systems, but the universality and versatility of Simon's idea have permitted the theory to be used in many complex problems from diverse disciplines, ranging from economics, cognitive theory and social sciences, to computer systems performance evaluation, data mining and information retrieval [8,10,22,30,31,33,49]. The criteria behind the decomposition vary with the goals of the study and the nature of the problem under consideration. For example, in the stochastic modeling literature, decomposability is usually found in the time domain and the blocks are defined to separate the short-term from the long-term temporal dynamics [10,49]. In other cases the decomposition is chosen to highlight known structural properties of the underlying space; for example, in the field of link analysis, many leading researchers have exploited the nearly decomposable structure of the Web, from a computational (faster extraction of the PageRank vector) as well as a qualitative (generalization of the random surfer teleportation model) perspective [8,22,33].

In this work^1, building on the intuition behind NCD, we decompose the item space into blocks, and we use these blocks to characterize the inter-item proximity at a macroscopic level. Central to our approach is the idea that blending together the direct with the indirect inter-item relations can help reduce the sensitivity to sparseness and improve the quality of recommendations. To this end, we propose NCDREC, a novel ranking-based recommendation method which:

– Provides a theoretical framework that enables the exploitation of the item space's innately decomposable structure in an efficient and scalable way.
– Produces recommendations that outperform several state-of-the-art methods in widely used metrics (Section 3.2), achieving high quality results even in the generally harder task of recommending long-tail items (Section 3.3).
– Displays low sensitivity to the problems caused by the sparsity of the underlying space and treats New Users more fairly; this is supported both by NCDREC's theoretical properties (Section 2.2.4) and our experimental findings (Section 3.4).

The rest of the paper is organized as follows. In Section 2, after briefly discussing the intuition behind the exploitation of Decomposability for recommendations, we formally introduce our model and study several of its interesting theoretical properties (Section 2.2). In Section 2.3 we present the NCDREC algorithm and discuss its storage and computational aspects. Our testing methodology and experimental results are presented in Section 3. Finally, Section 4 concludes this paper and outlines directions for future work.
^1 This work extends significantly our initial contribution [32], adding a detailed presentation of the NCDREC model enriched by thorough explanations and examples, as well as rigorous theoretical analysis of its constituent parts. Furthermore, in this paper we provide a more in-depth coverage of related literature, including thorough discussions of the competing state-of-the-art recommendation techniques as well as details regarding their implementation in our experiments.

2. NCDREC Framework

2.1. Exploiting Decomposability for Recommendations

In the method we propose in this work, we see the set of items as a decomposable space and, following the modeling approach of a recently proposed Web ranking framework [33,34], we use the decomposition to characterize macro-relations between the elements of the dataset that can hopefully refine and augment the underlying collaborative filtering approach and "fill in" some of the void left by the intrinsic sparsity of the data. The criteria behind the decomposition can vary with the particular aspects of the item space, the information available, etc. For example, if one wants to recommend hotels, the blocks may be defined to depict geographic information; in the movie recommendation problem, the blocks may correspond to the categorization of movies into genres, or other movie attributes. To give our framework maximum flexibility, we extend the notion to allow overlapping blocks; intuitively this seems to be particularly useful in many modeling approaches and recommendation problems.

Before we proceed to the rigorous definition of the NCDREC framework, we briefly outline our approach. First, we define a decomposition, $\mathcal{D}$, of the item space into blocks and we introduce the notion of $\mathcal{D}$-proximity, to characterize the implicit inter-level relations between the items. Then, we translate this proximity notion to suitably defined matrices that quantify these macroscopic inter-item relations under the prism of the chosen decomposition. These matrices need to be easily handleable in order for our method to be applicable in realistic scenarios. Furthermore, their contribution to the final model needs to be weighted carefully so as not to "overshadow" the pure collaborative filtering parts of the model. In achieving these, we follow an approach based on perturbing the standard CF parts, using suitably defined low-rank matrices. Finally, to fight the inevitably extreme and localized sparsity related to cold-start scenarios, we create a Markov chain-based subcomponent, designed to increase the percentage of the item space covered by the produced recommendations, and we study the conditions (in terms of theoretical properties of the proposed decomposition) under which full item space coverage is guaranteed.

2.2. NCDREC Model and Theoretical Properties

2.2.1. Notation

All vectors are represented by bold lower case letters and they are column vectors (e.g., ω). All matrices are represented by bold upper case letters (e.g., W). The $i$th row and $j$th column of matrix W are denoted $w_i^\intercal$ and $w_j$, respectively. The $ij$th element of matrix W is denoted $[W]_{ij}$. We use diag(ω) to denote the matrix having vector ω on its diagonal and zeros elsewhere. We use calligraphic letters to denote sets (e.g., $\mathcal{U}$, $\mathcal{V}$). Finally, the symbol $\triangleq$ is used in definition statements.
2.2.2. Definitions

Let $\mathcal{U} = \{u_1, \ldots, u_n\}$ be a set of users, $\mathcal{V} = \{v_1, \ldots, v_m\}$ a set of items and $\mathcal{R}$ a set of tuples

$$\mathcal{R} \triangleq \{t_{ij}\} = \{(u_i, v_j, r_{ij})\}, \tag{1}$$

where $r_{ij}$ is a nonnegative number referred to as the rating given by user $u_i$ to the item $v_j$. Each user in $\mathcal{U}$ is assumed to have rated at least one item; similarly, each item in $\mathcal{V}$ is assumed to have been rated by at least one user. We define an associated user-item rating matrix $R \in \mathbb{R}^{n \times m}$, whose $ij$th element equals $r_{ij}$ if $t_{ij} \in \mathcal{R}$, and zero otherwise. For each user $u_i$, we denote $\mathcal{R}_i$ the set of items rated by $u_i$ in $\mathcal{R}$, and we define a preference vector $\omega \triangleq [\omega_1, \ldots, \omega_m]$, whose nonzero elements contain the user's ratings that are included in $\mathcal{R}_i$, normalized to sum to one.

We consider an indexed family of non-empty sets

$$\mathcal{D} \triangleq \{\mathcal{D}_1, \ldots, \mathcal{D}_K\}, \tag{2}$$

that defines a $\mathcal{D}$-decomposition of the underlying space $\mathcal{V}$, such that $\mathcal{V} = \bigcup_{k=1}^{K} \mathcal{D}_k$. Each set $\mathcal{D}_k$ is referred to as a $\mathcal{D}$-block, and its elements are considered related according to some criterion. We define

$$\mathcal{D}_v \triangleq \bigcup_{v \in \mathcal{D}_k} \mathcal{D}_k \tag{3}$$

to be the proximal set of items of $v \in \mathcal{V}$, i.e. the union of the $\mathcal{D}$-blocks that contain $v$. We use $N_v$ to denote the number of different blocks in $\mathcal{D}_v$, and

$$n_{u_i}^{\ell} \triangleq |\{r_{ik} : (r_{ik} > 0) \wedge (v_k \in \mathcal{D}_\ell)\}| \tag{4}$$

for the number of items rated by user $u_i$ that belong to the $\mathcal{D}$-block $\mathcal{D}_\ell$. Every $\mathcal{D}$-decomposition is also associated with an undirected graph

$$\mathcal{G}_\mathcal{D} \triangleq (\mathcal{V}_\mathcal{D}, \mathcal{E}_\mathcal{D}). \tag{5}$$

Its vertices correspond to the $\mathcal{D}$-blocks, and an edge between two vertices exists whenever the intersection of these blocks is a non-empty set. This graph is referred to as the block coupling graph of the $\mathcal{D}$-decomposition. Finally, with every $\mathcal{D}$-decomposition we associate an Aggregation matrix $A_\mathcal{D} \in \mathbb{R}^{m \times K}$, whose $jk$th element is 1 if $v_j \in \mathcal{D}_k$, and zero otherwise.
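To make the definitions concrete, here is a minimal numpy sketch (ours, not part of the paper's implementation) that builds the aggregation matrix, the per-item block counts $N_v$, and the block coupling graph from an explicit list of blocks; the blocks used are the genre assignment of Example 1 below.

```python
import numpy as np

# A toy decomposition: m = 8 items, K = 3 overlapping blocks
# (the genre assignment of Example 1 below, 0-indexed).
m, K = 8, 3
blocks = [
    {0, 1, 3},         # D1 = {v1, v2, v4}
    {2, 3, 4, 7},      # D2 = {v3, v4, v5, v8}
    {1, 4, 5, 6, 7},   # D3 = {v2, v5, v6, v7, v8}
]

# Aggregation matrix A_D (m x K): [A_D]_jk = 1 iff item j belongs to block k.
A_D = np.zeros((m, K))
for k, block in enumerate(blocks):
    A_D[list(block), k] = 1.0

# N_v: number of blocks containing each item (the block count of D_v, Eq. 3).
N_v = A_D.sum(axis=1)

# Block coupling graph G_D (Eq. 5): edge (k, l) iff blocks k and l intersect.
edges = [(k, l) for k in range(K) for l in range(k + 1, K)
         if blocks[k] & blocks[l]]
print(N_v)    # [1. 2. 1. 2. 2. 1. 1. 2.] -> e.g. v4 lies in two blocks
print(edges)  # [(0, 1), (0, 2), (1, 2)] -> G_D is connected
```

Because the blocks overlap, every pair of blocks here shares at least one item, so $\mathcal{G}_\mathcal{D}$ is connected; this property becomes important in Section 2.2.5.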
2.2.3. Main Component

The pursuit of ranking-based recommendations grants us the flexibility of not caring about the exact recommendation scores; only the correct item ordering is needed. This allows us to manipulate the missing values of the rating matrix in an "informed" way, so as to introduce a preliminary ordering based on the user's expressed opinions about some items, and the way these items relate to the rest of the item space. The existence of such connections is rooted in the idea that a user's rating, except for expressing his direct opinion about a particular item, also gives a clue about his opinion regarding the proximal set of this item. So, "propagating" these opinions through the decomposition to the many related elements of the item space can hopefully refine the estimation of his preferences regarding the vast fraction of the item set for which he has not expressed opinions, and introduce an ordering between the zeros in the rating matrix that will hopefully relieve sparsity-related problems.

Having this in mind, we perturb the user-item rating matrix R with an NCD preferences matrix W that propagates the expressed user opinions about particular items to the proximal sets. The resulting matrix is given by:

$$G \triangleq R + \epsilon W, \tag{6}$$

where ε is a positive parameter, chosen small so as not to "eclipse" the actual ratings. The NCD preferences matrix is formally defined below.

NCD Preferences Matrix W. The NCD preferences matrix is defined to propagate each user's ratings to the many related elements (in the $\mathcal{D}$-decomposition sense) of the item space. Formally, matrix W is defined as follows:

$$W \triangleq Z X^\intercal, \tag{7}$$

where matrix X denotes the row-normalized version of $A_\mathcal{D}$, and the $ik$th element of matrix Z equals $(n_{u_i}^{k})^{-1}[R A_\mathcal{D}]_{ik}$ when $n_{u_i}^{k} > 0$, and zero otherwise.

The final recommendation vectors are produced by projecting the perturbed data onto an $f$-dimensional space. In particular, the final recommendation vectors are defined to be the rows of matrix

$$\Pi \triangleq U_f \Sigma_f V_f^\intercal, \tag{8}$$

where matrix $\Sigma_f \in \mathbb{R}^{f \times f}$ is a diagonal matrix containing the first $f$ singular values of G, and matrices $U_f \in \mathbb{R}^{n \times f}$ and $V_f \in \mathbb{R}^{m \times f}$ are orthonormal matrices containing the corresponding left and right singular vectors.

Remark 1. In fact, the recommendation vectors produced by Eq. (8) can be seen as arising from a low-dimensional eigenspace of an NCD-aware inter-item similarity matrix. We discuss this further in Appendix A.
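For small problems, Eqs. (6)–(8) can be sketched directly in a few lines of numpy/scipy (ours, purely illustrative; the actual implementation uses the modified Lanczos bidiagonalization of Section 2.3, which never forms G explicitly):

```python
import numpy as np
from scipy.sparse.linalg import svds

def main_component(R, A_D, eps=0.01, f=2):
    """Recommendation matrix Pi = U_f S_f V_f^T of G = R + eps * W (Eqs. 6-8)."""
    # X: row-normalized A_D;  Z: per-block rating averages,
    # [Z]_ik = [R A_D]_ik / n_ui^k  (zero when the user rated nothing in block k).
    X = A_D / A_D.sum(axis=1, keepdims=True)
    counts = (R > 0).astype(float) @ A_D            # n_ui^k: rated items per block
    Z = np.divide(R @ A_D, counts,
                  out=np.zeros_like(counts), where=counts > 0)
    G = R + eps * (Z @ X.T)                         # Eq. (6), with W = Z X^T (Eq. 7)
    U, s, Vt = svds(G, k=f)                         # truncated SVD
    return U @ np.diag(s) @ Vt                      # rows = recommendation vectors (Eq. 8)
```

Ranking a user's unseen items then amounts to sorting his row of Π; the value ε = 0.01 mirrors the setting used in the experiments (Section 3).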
2.2.4. ColdStart Component

In some cases the sparsity phenomenon becomes so intense and localized that the perturbation of the ratings through matrix W is not enough. Take for example newly emerging users in an existing recommender system. Naturally, because these users are new, the number of ratings they introduce in the RS is usually not sufficient for making reliable recommendations. If one takes into account only their direct interactions with the items, the recommendations to these newly added users are very likely to be restricted to small subsets of $\mathcal{V}$, leaving the majority of the item space uncovered.

To address this problem, which represents one of the continuing difficulties faced by recommender systems in operation [7], we create a COLDSTART subcomponent based on a discrete Markov chain model over the item space with transition probability matrix S, defined to bring together the direct as well as the decomposable structure of the underlying space. Matrix S is defined to consist of three components: a rank-one preference matrix $e\omega^\intercal$ that arises from the explicit ratings of the user as presented in the training set; a direct proximity matrix H, that depicts the direct inter-item relations; and an NCD proximity matrix D, that relates every item with its proximal sets. Concretely, matrix S is given by:

$$S \triangleq (1-\alpha)\, e\omega^\intercal + \alpha\,(\beta H + (1-\beta) D), \tag{9}$$

with α and β being positive real numbers for which α, β < 1 holds. Parameter α controls how frequently the Markov chain "restarts" to the preference vector ω, whereas parameter β weights the involvement of the Direct and the NCD Proximity matrices in the final Markov chain model. The personalized ranking vector for each newly added user is defined to be the stationary probability distribution of the Markov chain that corresponds to the stochastic matrix S, using the normalized ratings of the user as the initial distribution.

Direct Proximity Matrix H. The direct proximity matrix H is designed to capture the direct relations between the elements of $\mathcal{V}$. Generally, every such element will be associated with a discrete distribution $h_v = [h_1, h_2, \cdots, h_m]$ over $\mathcal{V}$, that reflects the correlation between these elements. In our case, we use the stochastic matrix defined as follows:

$$H \triangleq \mathrm{diag}(Ce)^{-1} C, \tag{10}$$

where C is an $m \times m$ matrix whose $ij$th element is defined to be $[C]_{ij} \triangleq r_i^\intercal r_j$ for $i \neq j$ and zero otherwise, and $e$ is a properly sized unit vector.

NCD Proximity Matrix D. The NCD proximity matrix D is created to depict the inter-level connections between the elements of the item space. In particular, each row of matrix D denotes a probability vector $d_v$ that distributes its mass evenly between the $N_v$ blocks of $\mathcal{D}_v$, and then uniformly to the included items of each block. Formally, matrix D is defined by:

$$D \triangleq XY, \tag{11}$$

where X, Y denote the row-normalized versions of $A_\mathcal{D}$ and $A_\mathcal{D}^\intercal$, respectively.

Lemma 1. Matrices H, D are well defined row stochastic matrices.

Proof. We will begin with matrix H. First, notice that for matrix H to be well defined it is necessary that diag(Ce) be invertible. But this is assured by our model's assumption that every item has been rated by at least one user. Indeed, when this assumption holds, every row of matrix C denotes a non-zero vector in $\mathbb{R}^m$; thus Ce denotes a vector of strictly positive elements, which makes the diagonal matrix diag(Ce) invertible, as needed.

For matrix D it suffices to show that for any $\mathcal{D}$-decomposition, every column and every row of the corresponding aggregation matrix $A_\mathcal{D}$ denote non-zero vectors in $\mathbb{R}^m$ and $\mathbb{R}^K$, respectively. The latter is ensured by the fact that NCD blocks are defined to be non-empty, whereas the former condition holds because the union of the $\mathcal{D}$-blocks denotes a cover of the item space. □

Example 1. To clarify the definition of the NCD matrices W, D, we give the following example. Consider a simple movie recommendation system consisting of an item space of 8 movies and a user space of 10 users, each having rated at least one movie. Let the set of ratings, $\mathcal{R}$, be the one presented below:

$$\mathcal{R} = \left\{\begin{array}{l}
(u_4,v_1,1),\ (u_7,v_1,4),\ (u_8,v_1,5),\ (u_{10},v_1,5),\ (u_5,v_2,5),\ (u_1,v_3,4),\\
(u_2,v_3,5),\ (u_8,v_3,2),\ (u_9,v_3,2),\ (u_{10},v_3,5),\ (u_3,v_4,2),\ (u_4,v_4,5),\\
(u_5,v_4,4),\ (u_9,v_4,1),\ (u_1,v_5,1),\ (u_5,v_5,5),\ (u_6,v_5,5),\ (u_7,v_5,3),\\
(u_3,v_6,3),\ (u_{10},v_6,5),\ (u_3,v_7,1),\ (u_3,v_8,5),\ (u_6,v_8,5),\ (u_8,v_8,5)
\end{array}\right\} \tag{12}$$

Assume also that the 8 movies of the item space belong to 3 genres as seen below:

        D_1   D_2   D_3   N_v
  v_1    X     −     −     1
  v_2    X     −     X     2
  v_3    −     X     −     1
  v_4    X     X     −     2
  v_5    −     X     X     2
  v_6    −     −     X     1
  v_7    −     −     X     1
  v_8    −     X     X     2        (13)

The corresponding aggregation matrix $A_\mathcal{D} \in \mathbb{R}^{8 \times 3}$ is

$$A_\mathcal{D} = \begin{pmatrix} 1 & 0 & 0 \\ 1 & 0 & 1 \\ 0 & 1 & 0 \\ 1 & 1 & 0 \\ 0 & 1 & 1 \\ 0 & 0 & 1 \\ 0 & 0 & 1 \\ 0 & 1 & 1 \end{pmatrix} \tag{14}$$

Following the definition of matrix W we get the matrix shown in Figure 2:

$$W = \begin{pmatrix}
0 & \tfrac{1}{2} & \tfrac{5}{2} & \tfrac{5}{4} & \tfrac{7}{4} & 1 & 1 & \tfrac{7}{4} \\
0 & 0 & 5 & \tfrac{5}{2} & \tfrac{5}{2} & 0 & 0 & \tfrac{5}{2} \\
2 & \tfrac{5}{2} & \tfrac{7}{2} & \tfrac{11}{4} & \tfrac{13}{4} & 3 & 3 & \tfrac{13}{4} \\
3 & \tfrac{3}{2} & 5 & 4 & \tfrac{5}{2} & 0 & 0 & \tfrac{5}{2} \\
\tfrac{9}{2} & \tfrac{19}{4} & \tfrac{9}{2} & \tfrac{9}{2} & \tfrac{19}{4} & 5 & 5 & \tfrac{19}{4} \\
0 & \tfrac{5}{2} & 5 & \tfrac{5}{2} & 5 & 5 & 5 & 5 \\
4 & \tfrac{7}{2} & 3 & \tfrac{7}{2} & 3 & 3 & 3 & 3 \\
5 & 5 & \tfrac{7}{2} & \tfrac{17}{4} & \tfrac{17}{4} & 5 & 5 & \tfrac{17}{4} \\
1 & \tfrac{1}{2} & \tfrac{3}{2} & \tfrac{5}{4} & \tfrac{3}{4} & 0 & 0 & \tfrac{3}{4} \\
5 & 5 & 5 & 5 & 5 & 5 & 5 & 5
\end{pmatrix}$$

Fig. 2. The matrix W that corresponds to Example 1.
For the factor matrices Z, X we have:

$$Z = \begin{pmatrix}
0 & \tfrac{5}{2} & 1 \\
0 & 5 & 0 \\
2 & \tfrac{7}{2} & 3 \\
3 & 5 & 0 \\
\tfrac{9}{2} & \tfrac{9}{2} & 5 \\
0 & 5 & 5 \\
4 & 3 & 3 \\
5 & \tfrac{7}{2} & 5 \\
1 & \tfrac{3}{2} & 0 \\
5 & 5 & 5
\end{pmatrix}, \qquad
X = \begin{pmatrix}
1 & 0 & 0 \\
\tfrac{1}{2} & 0 & \tfrac{1}{2} \\
0 & 1 & 0 \\
\tfrac{1}{2} & \tfrac{1}{2} & 0 \\
0 & \tfrac{1}{2} & \tfrac{1}{2} \\
0 & 0 & 1 \\
0 & 0 & 1 \\
0 & \tfrac{1}{2} & \tfrac{1}{2}
\end{pmatrix}$$

Similarly, Figure 3 gives the detailed computation of the inter-item NCD Proximity matrix D of the COLDSTART component:

$$D = \begin{pmatrix}
\tfrac{1}{3} & \tfrac{1}{3} & 0 & \tfrac{1}{3} & 0 & 0 & 0 & 0 \\
\tfrac{1}{6} & \tfrac{4}{15} & 0 & \tfrac{1}{6} & \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{1}{10} \\
0 & 0 & \tfrac{1}{4} & \tfrac{1}{4} & \tfrac{1}{4} & 0 & 0 & \tfrac{1}{4} \\
\tfrac{1}{6} & \tfrac{1}{6} & \tfrac{1}{8} & \tfrac{7}{24} & \tfrac{1}{8} & 0 & 0 & \tfrac{1}{8} \\
0 & \tfrac{1}{10} & \tfrac{1}{8} & \tfrac{1}{8} & \tfrac{9}{40} & \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{9}{40} \\
0 & \tfrac{1}{5} & 0 & 0 & \tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} \\
0 & \tfrac{1}{5} & 0 & 0 & \tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} & \tfrac{1}{5} \\
0 & \tfrac{1}{10} & \tfrac{1}{8} & \tfrac{1}{8} & \tfrac{9}{40} & \tfrac{1}{10} & \tfrac{1}{10} & \tfrac{9}{40}
\end{pmatrix}$$

Fig. 3. The matrix D that corresponds to Example 1, highlighting the computation of $[D]_{42}$ and $[D]_{85}$.
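Both matrices can be reproduced mechanically from Eqs. (7) and (11); the following numpy sketch (ours, purely illustrative) rebuilds W and D for Example 1 and checks the row-stochasticity of D asserted by Lemma 1:

```python
import numpy as np

# Example 1: ratings matrix R (10 users x 8 movies) and aggregation matrix A_D.
R = np.zeros((10, 8))
for (u, v, r) in [(4,1,1),(7,1,4),(8,1,5),(10,1,5),(5,2,5),(1,3,4),(2,3,5),
                  (8,3,2),(9,3,2),(10,3,5),(3,4,2),(4,4,5),(5,4,4),(9,4,1),
                  (1,5,1),(5,5,5),(6,5,5),(7,5,3),(3,6,3),(10,6,5),(3,7,1),
                  (3,8,5),(6,8,5),(8,8,5)]:
    R[u - 1, v - 1] = r
A_D = np.array([[1,0,0],[1,0,1],[0,1,0],[1,1,0],
                [0,1,1],[0,0,1],[0,0,1],[0,1,1]], dtype=float)

X = A_D / A_D.sum(axis=1, keepdims=True)        # row-normalized A_D
Y = A_D.T / A_D.T.sum(axis=1, keepdims=True)    # row-normalized A_D^T
counts = (R > 0).astype(float) @ A_D            # n_ui^k
Z = np.divide(R @ A_D, counts, out=np.zeros_like(counts), where=counts > 0)

W = Z @ X.T                                     # Eq. (7): matches Fig. 2
D = X @ Y                                       # Eq. (11): matches Fig. 3
assert np.allclose(D.sum(axis=1), 1.0)          # D is row stochastic (Lemma 1)
print(W[3, 1], D[3, 1], D[7, 4])                # [W]_42 = 3/2, [D]_42 = 1/6, [D]_85 = 9/40
```

The printed entries reproduce the highlighted values of Figures 2 and 3.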
2.2.5. Theoretical Properties of the ColdStart Subcomponent

Informally, the introduction of the NCD proximity matrix D helps the item space become more "connected", allowing the recommender to reach more items even for the set of newly added users. When the blocks are overlapping this effect becomes stronger, and in fact, item space coverage can be guaranteed under certain conditions.

Theorem 1 (Item Space Coverage). If the block coupling graph $\mathcal{G}_\mathcal{D}$ is connected, there exists a unique steady state distribution π of the Markov chain corresponding to matrix S that depends on the preference vector ω; however, irrespectively of any particular such vector, the support of this distribution includes every item of the underlying space.

Proof. Before we proceed to the actual proof, we give a small sketch: When $\mathcal{G}_\mathcal{D}$ is connected, the Markov chain induced by the stochastic matrix S consists of a single irreducible and aperiodic closed set of states that includes all the items. To prove the irreducibility part, we will show that the NCD proximity stochastic matrix corresponding to a connected block coupling graph ensures that, starting from any particular state of the chain, there is a positive probability of reaching every other state. For the aperiodicity part we will show that matrix D makes it possible for the Markov chain to return to any given state in consecutive time epochs. The above is true for every stochastic vector ω and for all positive real numbers α, β < 1.

Lemma 2. The connectivity of $\mathcal{G}_\mathcal{D}$ implies the irreducibility of the Markov chain with transition probability matrix D.

Proof. From the decomposition theorem of Markov chains we know that the state space $\mathcal{S}$ can be partitioned uniquely as

$$\mathcal{S} = \mathcal{T} \cup \mathcal{C}_1 \cup \mathcal{C}_2 \cup \cdots \tag{15}$$

where $\mathcal{T}$ is the set of transient states, and the $\mathcal{C}_i$ are irreducible closed sets of persistent states [19]. Furthermore, since $\mathcal{S}$ is finite, at least one state is persistent and all persistent states are non-null (see [19], page 225). Let $i$ be such a persistent state and $\mathcal{C}$ the irreducible closed set that contains it. We will prove that the connectivity of $\mathcal{G}_\mathcal{D}$ alone ensures that, starting from this state $i$, we can visit every other state of the Markov chain. In other words, the connectivity of $\mathcal{G}_\mathcal{D}$ implies that $\mathcal{T} = \emptyset$ and there exists only one irreducible closed set of persistent states.

Assume, for the sake of contradiction, that $\mathcal{G}_\mathcal{D}$ is connected and there exists a state $j$ outside the set $\mathcal{C}$. This, by definition, means that there exists no path that starts in state $i$ and ends in state $j$. Here we will show that when $\mathcal{G}_\mathcal{D}$ is connected, it is always possible to construct such a path. Let $v_i$ be the item corresponding to state $i$ and $v_j$ the item corresponding to state $j$, and let $\mathcal{D}_{v_i}$ be the proximal set of items of $v_i$. We must have one of the following cases:

$v_j \in \mathcal{D}_{v_i}$: In this case, the states are directly connected, and the probability $\Pr\{\text{next is } j \mid \text{we are in } i\}$ equals

$$[D]_{ij} = \sum_{\mathcal{D}_k \in \mathcal{D}_{v_i},\, v_j \in \mathcal{D}_k} \frac{1}{N_{v_i}|\mathcal{D}_k|}, \tag{16}$$

which can be seen by Eq. (11) together with the definitions of Section 2.2.2.

$v_j \notin \mathcal{D}_{v_i}$: In this case, the states are not directly connected. Let $\mathcal{D}_{v_j}$ be a $\mathcal{D}$-block that contains $v_j$, and $\mathcal{D}_{v_i}$ a $\mathcal{D}$-block that contains $v_i$; since $v_j$ lies outside the proximal set of $v_i$, these are necessarily different blocks. However, since $\mathcal{G}_\mathcal{D}$ is assumed connected, there exists a sequence of vertices corresponding to $\mathcal{D}$-blocks that forms a path in the block coupling graph between nodes $\mathcal{D}_{v_i}$ and $\mathcal{D}_{v_j}$. Let this sequence be the one below:

$$\mathcal{D}_{v_i}, \mathcal{D}_1, \mathcal{D}_2, \ldots, \mathcal{D}_n, \mathcal{D}_{v_j} \tag{17}$$

Then, choosing for every pair of consecutive blocks of the above sequence a state that corresponds to an item in their intersection (all these intersections are non-empty, by the definition of the block coupling graph), we get the sequence of states

$$i, t_1, t_2, \ldots, t_n, j \tag{18}$$

which corresponds to the sequence of items

$$v_i, v_{t_1}, v_{t_2}, \ldots, v_{t_n}, v_j \tag{19}$$

Notice that this choice, together with the definitions of the proximal sets and the block coupling graph, implies that every item after $v_i$ belongs to the proximal set of the item preceding it, i.e.

$$v_{t_1} \in \mathcal{D}_{v_i},\ v_{t_2} \in \mathcal{D}_{v_{t_1}},\ \ldots,\ v_j \in \mathcal{D}_{v_{t_n}} \tag{20}$$

Thus, the consecutive states in sequence (18) communicate, or

$$i \to t_1 \to t_2 \to \cdots \to t_n \to j \tag{21}$$

and there exists a positive probability path between states $i$ and $j$.

In conclusion, when $\mathcal{G}_\mathcal{D}$ is connected there will always be a path starting from state $i$ and ending in state $j$. But because state $i$ is persistent and belongs to the irreducible closed set of states $\mathcal{C}$, state $j$ belongs to the same irreducible closed set of states too. This contradicts our assumption. Thus, when $\mathcal{G}_\mathcal{D}$ is connected, every state belongs to a single irreducible closed set of states, $\mathcal{C}$. □

Now it remains to prove the aperiodicity property.

Lemma 3. The Markov chain induced by matrix D is aperiodic.

Proof. It is known that the period of a state $i$ is defined as the greatest common divisor of the epochs at which a return to the state is possible [19]. Thus, it suffices to show that we can return to any given state in consecutive time epochs. But this can be seen readily, because the diagonal elements of matrix D are, by definition, all greater than zero; thus, for any state and for every possible trajectory of the Markov chain of length $k$ there is another one of length $k+1$ with the same starting and ending state, which follows the self-loop as its final step. In other words, leaving any given state of the corresponding Markov chain, one can always return in consecutive time epochs, which makes the chain aperiodic. And the proof is complete. □

We have shown so far that the connectivity of $\mathcal{G}_\mathcal{D}$ is enough to ensure the irreducibility and aperiodicity of the Markov chain with transition probability matrix D. It remains now to prove that the same holds for the complete stochastic matrix S. This can be done using the following useful lemma, the proof of which can be found in Appendix B.
Lemma 4. If A is the transition matrix of an irreducible and aperiodic Markov chain with finite state space, and B the transition matrix of any Markov chain defined on the same state space, then the matrix $C = \kappa A + \lambda B$, where $\kappa, \lambda > 0$ such that $\kappa + \lambda = 1$, denotes the transition matrix of an irreducible and aperiodic Markov chain also.

Applying Lemma 4 twice, first to matrix

$$T = \beta H + (1-\beta) D \tag{22}$$

and then to matrix

$$S = (1-\alpha)\, e\omega^\intercal + \alpha T \tag{23}$$

gives us the irreducibility and the aperiodicity of matrix S. Taking into account the fact that the state space is finite, the resulting Markov chain becomes ergodic [19] and there exists a unique recommendation vector corresponding to its steady state probability distribution, which is given by

$$\pi = [\pi_1\ \pi_2\ \cdots\ \pi_m] = \big[\tfrac{1}{\mu_1}\ \tfrac{1}{\mu_2}\ \cdots\ \tfrac{1}{\mu_m}\big] \tag{24}$$

where $\mu_i$ is the mean recurrence time of state $i$. However, for ergodic states, by definition it holds that

$$1 \leq \mu_i < \infty. \tag{25}$$

Thus $\pi_i > 0$ for all $i$, and the support of the distribution that defines the recommendation vector includes every item of the underlying space. □

The above theorem suggests that, even for a user who has rated only one item, when the chosen decomposition enjoys the above property our recommender finds a way to assign preference probabilities to the complete item space. Note that the criterion for this to be true is not that restrictive. For example, for the MovieLens datasets, using as a criterion of decomposition the categorization of movies into genres, the block coupling graph is connected. This proves to be a very useful property in dealing with the cold-start problem, as we will see in the experimental evaluation presented in Section 3.4.

2.3. NCDREC Algorithm: Storage and Computational Issues

It is clear that for the majority of reasonable decompositions the number of blocks is much smaller than the cardinality of the item space, i.e. $K \ll m$; this makes matrices D and W extremely low-rank. Thus, if we take into account the inherent sparsity of the ratings matrix R and of the component matrices X, Y, Z, we see that the storage needs of NCDREC are in fact modest.

Furthermore, the fact that matrices G and S can be expressed as sums of sparse and low-rank components can also be exploited computationally, as we see in the NCDREC algorithm presented below. Our algorithm ensures that the computation of the recommendation vectors can be carried out without ever explicitly forming matrices G and S.

The computation of the singular triplets is based on a fast partial SVD method proposed by Baglama and Reichel in [5]. However, because their method presupposes the existence of the final matrix, we modified the partial Lanczos bidiagonalization iterative procedure to take advantage of the factorization of the NCD preferences matrix W into matrices X, Z. The detailed computation is presented in the NCD_PARTIALLBD procedure in Algorithm 1. For the computation of the newly added users' recommendations, we collect their preference vectors in an extremely sparse matrix Ω, and we compute their stationary distributions using a batch power method approach exploiting matrices X, Y.
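This batch power method amounts to iterating Eq. (9) on all the new users' preference vectors simultaneously; a minimal numpy sketch follows (ours; `tol` and `maxit` are unspecified convergence knobs, mirroring the COLDSTART procedure of Algorithm 1 below):

```python
import numpy as np

def coldstart_batch(Omega, H, X, Y, alpha=0.01, beta=0.75, tol=1e-8, maxit=500):
    """Stationary distributions of S (Eq. 9) for all new users at once.

    Each row of Omega is a new user's normalized preference vector omega.
    D = X @ Y is never formed: Pi @ D is computed as (Pi @ X) @ Y, so the
    extra cost per iteration is O(mK) per user instead of O(m^2)."""
    Pi = Omega.copy()
    for _ in range(maxit):
        Pi_next = (alpha * beta) * (Pi @ H) \
                  + (alpha * (1 - beta)) * ((Pi @ X) @ Y) \
                  + (1 - alpha) * Omega
        if np.linalg.norm(Pi_next - Pi) < tol:
            return Pi_next
        Pi = Pi_next
    return Pi
```

When $\mathcal{G}_\mathcal{D}$ is connected, every row of the returned matrix is strictly positive, in agreement with Theorem 1, even for users whose row of Ω has a single nonzero entry.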
Notice that the exploitation of the factorization of the NCD matrices in both procedures results in a significant drop in the number of floating point operations per iteration, since every dense Matrix×Vector (MV) multiplication is replaced by a sum of lower dimensional and sparse MVs, making the overall method significantly faster.

Algorithm 1 NCDREC Algorithm
Input: Matrices R ∈ R^{n×m}, H ∈ R^{m×m}, X ∈ R^{m×K}, Y ∈ R^{K×m}, Z ∈ R^{n×K}. Parameters α, β, f, ε.
Output: The matrix with recommendation vectors for every user, Π ∈ R^{n×m}.

Step 1: Find the newly added users and collect their preference vectors into matrix Ω.
Step 2: Compute Π_sparse using the COLDSTART procedure.
Step 3: Initialize vector p_1 to be a random unit length vector.
Step 4: Compute the modified Lanczos procedure up to step M, using NCD_PARTIALLBD with starting vector p_1.
Step 5: Compute the SVD of the bidiagonal matrix B to extract f < M approximate singular triplets:
        {ũ_j, σ_j, ṽ_j} ← {Q u_j^{(B)}, σ_j^{(B)}, P v_j^{(B)}}
Step 6: Orthogonalize against the approximate singular vectors to get a new starting vector p_1.
Step 7: Continue the Lanczos procedure for M more steps using the new starting vector.
Step 8: Check the convergence tolerance. If met, compute matrix Π_full = Ũ Σ Ṽ^⊺; else go to Step 4.
Step 9: Update Π_full, replacing the rows that correspond to new users with Π_sparse.
return Π_full

procedure NCD_PARTIALLBD(R, X, Z, p_1, ε)
    φ ← X^⊺ p_1;  q_1 ← R p_1 + ε Z φ;
    b_{1,1} ← ‖q_1‖_2;  u_1 ← q_1 / b_{1,1};
    for j = 1 to M do
        φ ← Z^⊺ q_j;  r ← R^⊺ q_j + ε X φ − b_{j,j} p_j;
        r ← r − [p_1 … p_j]([p_1 … p_j]^⊺ r);
        if j < M then
            b_{j,j+1} ← ‖r‖;  p_{j+1} ← r / b_{j,j+1};
            φ ← X^⊺ p_{j+1};  q_{j+1} ← R p_{j+1} + ε Z φ − b_{j,j+1} q_j;
            q_{j+1} ← q_{j+1} − [q_1 … q_j]([q_1 … q_j]^⊺ q_{j+1});
            b_{j+1,j+1} ← ‖q_{j+1}‖;  q_{j+1} ← q_{j+1} / b_{j+1,j+1};
        end if
    end for
end procedure

procedure COLDSTART(H, X, Y, Ω, α, β)
    Π ← Ω;  k ← 0;  r ← 1;
    while r > tol and k ≤ maxit do
        k ← k + 1;
        Π̂ ← αβ ΠH;
        Φ ← ΠX;
        Π̂ ← Π̂ + α(1−β) ΦY + (1−α) Ω;
        r ← ‖Π̂ − Π‖;  Π ← Π̂;
    end while
    return Π_sparse ← Π
end procedure

3. Experimental Evaluation

In order to evaluate the performance of NCDREC in recommending top-N lists of items, we ran a number of experiments using two real datasets: the Yahoo!R2Music dataset, which represents a real snapshot of the Yahoo! Music community's preferences for various songs, and the standard MovieLens (1M and 100K) datasets. These datasets also come with information that relates the items to genres; this was chosen as the criterion of decomposition behind the definition of matrices D and W. For further details about these datasets see http://webscope.sandbox.yahoo.com and http://grouplens.org/. A synopsis of their basic characteristics is presented in Table 1.

Table 1
Datasets

Dataset         #Users     #Items   #Ratings     Density
MovieLens100K   943        1,682    100,000      6.30%
MovieLens1M     6,040      3,883    1,000,209    4.26%
Yahoo!R2Music   1,823,179  136,736  717,872,016  0.29%

Exploiting meta-information is a very useful weapon in alleviating sparsity-related problems [13]. Thus, in order to provide fair comparisons, we test our method against recommendation methods that (a) can also take advantage of the categorization of items into genres, and (b) are known to show lower sensitivity to the problems of limited coverage and sparsity [13]. In particular, we ran NCDREC^2 against five state-of-the-art graph-based approaches: the node similarity algorithms L† and Katz; the random walk approaches First Passage Time (FP) and Commute Time (CT); and the Matrix Forest Algorithm (MFA).

^2 The perturbation parameter ε was set to 0.01, the number of latent factors was selected from the range 2 to 800, and the COLDSTART subcomponent parameters were chosen to be α = 0.01 and β = 0.75.

3.1. Competing Recommendation Methods

The data model used for all the competing methods is a graph representation of the recommender system database. Concretely, consider a weighted graph G with nodes corresponding to database elements and database links corresponding to edges. For example, in the MovieLens datasets each element of the people set, the movie set, and the movie_category set corresponds to a node of the graph, and each has_watched and belongs_to link is expressed as an edge [14,15]. Generally speaking, graph-based recommendation methods work by computing similarity measures between every element in the recommender database, and then using these measures to compute ranked lists of the items with respect to each user.

The pseudo-inverse of the graph's Laplacian (L†). This matrix contains the inner products of the node vectors in a Euclidean space where the nodes are exactly separated by the commute time distance [15]. For the computation of the L† matrix we used the formula:

$$L^\dagger = \Big(L - \tfrac{1}{n+m+K}\, ee^\intercal\Big)^{-1} + \tfrac{1}{n+m+K}\, ee^\intercal \tag{26}$$

where L is the Laplacian of the graph model of the recommender system, $n$ the number of users, $m$ the number of items, and $K$ the number of blocks (see [14] for details).

The MFA similarity matrix M. The MFA matrix contains elements that also provide similarity measures between nodes of the graph by integrating indirect paths, based on the matrix-forest theorem [9]. Matrix M was computed by

$$M = (I + L)^{-1} \tag{27}$$

where I is the identity matrix.

The Katz similarity matrix K. The Katz similarity matrix is computed by

$$K = \alpha A + \alpha^2 A^2 + \cdots = (I - \alpha A)^{-1} - I \tag{28}$$

where A is the adjacency matrix of the graph and α measures the attenuation in a link (see [23]).

Average First Passage Times. The Average First Passage Time scores are computed by iteratively solving the recurrence

$$\mathrm{FP}(k|k) = 0, \qquad \mathrm{FP}(k|i) = 1 + \sum_{j=1}^{n+m+K} p_{ij}\,\mathrm{FP}(k|j) \quad \text{for } i \neq k \tag{29}$$

where $p_{ij}$ is the conditional probability that a random walker in the graph G visits node $j$ next, given that he is currently in node $i$.

Average Commute Times. Finally, Average Commute Time scores can be obtained in terms of the Average First Passage Times by:

$$\mathrm{CT}(i,j) = \mathrm{FP}(i|j) + \mathrm{FP}(j|i) \tag{30}$$

For further details about the competing algorithms see [9,14,15,23] and the references therein.
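For small graphs, the three matrix-based baselines admit direct dense computation from the closed forms (26)–(28); the sketch below (ours, purely illustrative — the experiments involve far larger matrices, and the Katz parameter must stay below the reciprocal spectral radius of A for the series to converge):

```python
import numpy as np

def baseline_similarities(A, alpha=0.05):
    """L-dagger (Eq. 26), MFA (Eq. 27) and Katz (Eq. 28) similarity matrices
    from the adjacency matrix A of the user-item-category graph."""
    N = A.shape[0]                             # N = n + m + K nodes
    L = np.diag(A.sum(axis=1)) - A             # graph Laplacian
    J = np.ones((N, N)) / N                    # ee^T / (n + m + K)
    L_dag = np.linalg.inv(L - J) + J           # Eq. (26), assumes connected graph
    M = np.linalg.inv(np.eye(N) + L)           # Eq. (27), matrix-forest similarity
    # Eq. (28); requires alpha < 1 / spectral_radius(A) for convergence.
    K = np.linalg.inv(np.eye(N) - alpha * A) - np.eye(N)
    return L_dag, M, K
```

The O(N³) inversions here are exactly the cost referred to in Section 4 when comparing the baselines' scalability with NCDREC's.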
3.2. Quality of Recommendation

To evaluate the quality of our method in suggesting top-N items, we adopted the methodology used in [11]. In particular, we randomly sampled 1.4% of the ratings of the dataset in order to create a probe set P, and we used each item $v_j$ rated with 5 stars by user $u_i$ in P to form the test set T. Finally, for each item in T, we randomly selected another 1000 unrated items of the same user, ranked the resulting 1001-item lists using the different methods mentioned, and evaluated the quality of the recommendations.

For this evaluation, except for the standard Recall and Precision metrics [4,11], we also use a number of other well-known ranking measures which discount the utility of recommended items depending on their position in the recommendation list [6,42], namely the R-Score, the Normalized Discounted Cumulative Gain and the Mean Reciprocal Rank metrics. R-Score assumes that the value of recommendations declines exponentially fast, to yield for each user the following score:

$$\mathrm{R}(\alpha) = \sum_q \frac{\max(y_{\pi_q} - d,\, 0)}{2^{(q-1)/(\alpha-1)}} \tag{31}$$

where α is a half-life parameter which controls the exponential decline, $\pi_q$ is the index of the $q$th item in the recommendation ranking list π, $y$ is a vector of the relevance values for a sequence of items, and $d$ is a neutral "don't care" rating value. In Discounted Cumulative Gain the ranking positions are discounted logarithmically; it is defined as:

$$\mathrm{DCG@}k(y, \pi) = \sum_{q=1}^{k} \frac{2^{y_{\pi_q}} - 1}{\log_2(2+q)} \tag{32}$$

The Normalized Discounted Cumulative Gain can then be defined as:

$$\mathrm{NDCG@}k = \frac{\mathrm{DCG@}k(y, \pi)}{\mathrm{DCG@}k(y, \pi^\star)} \tag{33}$$

where $\pi^\star$ is the best possible ordering of the items with respect to the relevance scores (see [6] for details). Finally, Mean Reciprocal Rank (MRR) is the average of each user's reciprocal rank score, defined as follows:

$$\mathrm{RR} = \frac{1}{\min_q\{q : y_{\pi_q} > 0\}} \tag{34}$$

MRR decays more slowly than R-Score but faster than NDCG.
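A compact reference sketch (ours) of Eqs. (32)–(34), using 1-indexed ranks as in the text ($k$ is assumed not to exceed the list length):

```python
import numpy as np

def ndcg_at_k(y, pi, k):
    """NDCG@k (Eq. 33). y: relevance score per item; pi: ranked item indices."""
    dcg = sum((2.0 ** y[pi[q - 1]] - 1) / np.log2(2 + q)
              for q in range(1, k + 1))                     # Eq. (32)
    ideal = np.sort(np.asarray(y))[::-1]                    # best ordering pi*
    idcg = sum((2.0 ** ideal[q - 1] - 1) / np.log2(2 + q)
               for q in range(1, k + 1))
    return dcg / idcg if idcg > 0 else 0.0

def reciprocal_rank(y, pi):
    """RR (Eq. 34): inverse rank of the first relevant item in pi (0 if none)."""
    for q, item in enumerate(pi, start=1):
        if y[item] > 0:
            return 1.0 / q
    return 0.0
```

MRR is then the mean of `reciprocal_rank` over all test users.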
Fig. 4. Recommendation quality on MovieLens1M and Yahoo!R2Music datasets using Recall@N, Precision and NDCG@N metrics.

Figure 4 reports the performance of the algorithms on the Recall, Precision and NDCG metrics. In particular, we report the average Recall as a function of N (focusing on the range N = [1, ..., 20]), the Precision at a given Recall, and the NDCG@N, for the MovieLens1M (1st column) and Yahoo!R2Music (2nd column) datasets. As we can see, NCDREC outperforms all other methods, reaching for example at N = 10 a Recall around 0.53 on MovieLens and 0.45 on the sparser Yahoo!R2Music dataset. Similar behavior is observed for the Precision and the NDCG metrics as well.

Table 2 presents the results for the R-Score (with half-life parameters 5 and 10) and the MRR metrics. Again we see that NCDREC achieves the best results, with MFA and L† doing significantly better than the other graph-based approaches on the sparser dataset.

Table 2
Recommendation quality on MovieLens1M and Yahoo!R2Music datasets using R-Score and MRR metrics

              MovieLens1M              Yahoo!R2Music
          R(5)    R(10)   MRR      R(5)    R(10)   MRR
NCDREC    0.3997  0.5098  0.3008   0.3539  0.4587  0.2647
MFA       0.1217  0.1911  0.0887   0.2017  0.2875  0.1591
L†        0.1216  0.1914  0.0892   0.1965  0.2814  0.1546
FP        0.2054  0.2874  0.1524   0.1446  0.2241  0.0998
Katz      0.2187  0.3020  0.1642   0.1704  0.2529  0.1203
CT        0.2070  0.2896  0.1535   0.1465  0.2293  0.1019

Finally, for completeness, we also ran NCDREC on the standard MovieLens100K dataset using the publicly available 5 predefined splittings into training and test sets. Here, we use the Degree of Agreement metric (a variant of Somers' D statistic^3 that has been used by many authors for the performance evaluation of ranking-based recommendations on MovieLens100K) in order to allow direct comparisons with the different results to be found in the literature [15,16,18,28,51]. NCDREC obtained a macro-averaged DOA score of 92.25 and a micro-averaged DOA of 90.74, which are – to the best of our knowledge – the highest scores achieved thus far on this benchmark dataset.

^3 We give a detailed definition of the DOA metric in Section 3.4, where we also present other ranking stability metrics.

3.3. Long-Tail Recommendation

It is well known that the distribution of rated items in recommender systems is long-tailed, i.e. the majority of the ratings is concentrated on a few very popular items. Of course, recommending popular items is generally considered an easy task that adds very little utility to recommender systems. On the other hand, the task of recommending long-tail items adds novelty and serendipity for the users [11], and it is also known to increase the profits of e-commerce companies significantly [2,50]. The inherent sparsity of the data, however – which is magnified even more for long-tail items – presents a major challenge for most state-of-the-art collaborative filtering methods.

In order to evaluate NCDREC in recommending long-tail items, we adopt the methodology described in [11].
In particular, we order the items according to their popularity (measured in terms of number of ratings) and we further partition the test set T into two subsets, T_head and T_tail, that involve items originating from the short head and the long tail of the distribution, respectively. We discard the popular items and we evaluate NCDREC and the other algorithms on the T_tail test set, using the procedure explained in the previous section. Figure 5 and Table 3 report the results.

Fig. 5. Long-tail recommendation quality on MovieLens1M and Yahoo!R2Music datasets using Recall@N, Precision and NDCG@N metrics.

Table 3
Long-tail recommendation quality on MovieLens1M and Yahoo!R2Music datasets using R-Score and MRR metrics

              MovieLens1M              Yahoo!R2Music
          R(5)    R(10)   MRR      R(5)    R(10)   MRR
NCDREC    0.3279  0.4376  0.2395   0.3520  0.4322  0.2834
MFA       0.1660  0.2517  0.1188   0.2556  0.3530  0.1995
L†        0.1654  0.2507  0.1193   0.2492  0.3461  0.1939
FP        0.0183  0.0654  0.0221   0.0195  0.0684  0.0224
Katz      0.0275  0.0822  0.0267   0.0349  0.0939  0.0309
CT        0.0192  0.0675  0.0227   0.0215  0.0747  0.0249

We see that NCDREC again achieves the best results, managing to retain its performance in all metrics and for both datasets. Notice here the significant drop in quality of the random walk based methods, which were found to behave very well in the standard recommendation scenario. This finding indicates their bias towards recommending popular items. MFA and L†, on the other hand, do particularly well, exhibiting great ability in uncovering non-trivial relations between the items, especially in the sparser Yahoo!R2Music dataset.

3.4. Recommendations for Newly Emerging Users

One very common manifestation of sparsity faced by real recommender systems is the New-Users Problem. This problem refers to the difficulty of achieving reliable recommendations for newly emerging users in an existing recommender system, due to the de facto initial lack of personalized feedback. This problem can also be seen as an extreme and localized expression of sparsity that prohibits CF methods from uncovering meaningful relations between the set of new users and the rest of the RS database, and thus undermines the reliability of the produced recommendations.

To evaluate the performance of our method in coping with this problem we ran the following experiment. We randomly selected 100 users from the MovieLens1M dataset having rated 100 movies or more, and we randomly selected to include 4%, 6%, 8% and 10% of their ratings in new, artificially "sparsified" versions of the dataset. The idea is that the modified data represent "earlier snapshots" of the system, when these users were new and, as such, had rated fewer items. We ran NCDREC^4 against the other methods, and we compare the recommendation vectors with the ranking lists induced by the complete set of ratings, which we use as the reference ranking for each user.

^4 Note that the ranking lists for the set of newly added users were produced by the COLDSTART subcomponent.

For this comparison, except for the standard Spearman's ρ and Kendall's τ metrics [4,42], we also use two other well-known ranking measures, namely the Degree of Agreement (DOA) [15,16,18] and the Normalized Distance-based Performance Measure (NDPM) [42], outlined below. Table 4 contains all the necessary definitions.

Table 4
A summary of the notation used for the definition of the ranking stability metrics

Notation     Meaning
r^i          User u_i's reference ranking
π^i          Recommender System generated ranking
r^i_{v_j}    Ranking score of item v_j in user u_i's reference ranking
π^i_{v_j}    Ranking score of item v_j in user u_i's system-generated ranking
C            Number of concordant pairs
D            Number of discordant pairs
N            Total number of pairs
T_r          Number of tied pairs in the reference ranking
T_π          Number of tied pairs in the system ranking
X            Number of pairs where the reference ranking does not tie, but the RS's ranking ties (N − T_r − C − D)

Kendall's τ is an intuitive nonparametric rank correlation index that has been widely used in the literature. The τ of ranking lists $r^i$, $\pi^i$ is defined to be

$$\tau \triangleq \frac{C - D}{\sqrt{N - T_r}\,\sqrt{N - T_\pi}} \tag{35}$$

and takes the value 1 for a perfect match and −1 for reversed ordering.

Spearman's ρ is another widely used non-parametric measure of rank correlation. The ρ of ranking lists $r^i$, $\pi^i$ is defined to be

$$\rho \triangleq \frac{\frac{1}{m}\sum_{v_j}\big(r^i_{v_j} - \bar{r}^i\big)\big(\pi^i_{v_j} - \bar{\pi}^i\big)}{\sigma(r^i)\,\sigma(\pi^i)} \tag{36}$$

where $\bar{\cdot}$ and σ(·) denote the mean and standard deviation. The ρ takes values from −1 to 1: a ρ of 1 indicates perfect rank association, a ρ of zero indicates no association between the ranking lists, and a ρ of −1 indicates a perfect negative association of the rankings.

Degree of Agreement (DOA) is a performance index commonly used in the recommendation literature to evaluate the quality of ranking-based CF methods [15,16,18,51]. DOA is a variant of Somers' D statistic [43], defined as follows:

$$\mathrm{DOA}_i \triangleq \frac{\sum_{v_j \in \mathcal{T}_i \wedge v_k \in \mathcal{W}_i} \big[\pi^i_{v_j} > \pi^i_{v_k}\big]}{|\mathcal{T}_i| \cdot |\mathcal{L}_i \cup \mathcal{T}_i|} \tag{37}$$

where [S] equals 1 if statement S is true, and zero otherwise. Macro-averaged DOA (macro-DOA) is the average of all DOA_i, and micro-averaged DOA (micro-DOA) is the ratio between the aggregate number of item pairs in the correct order and the total number of item pairs checked (for further details see [15,16]).

Normalized Distance-based Performance Measure (NDPM). The NDPM of ranking lists $r^i$, $\pi^i$ is defined to be

$$\mathrm{NDPM} \triangleq \frac{D + 0.5X}{N - T_r} \tag{38}$$

The NDPM measure gives a perfect score of 0 to recommenders that correctly predict every preference relation asserted by the reference, while the worst score of 1 is assigned to recommendation vectors that contradict every preference relation in $r^i$ [42,47].
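For concreteness, a small pure-Python sketch (ours) of Eqs. (35) and (38), counting the pair statistics of Table 4 directly (it assumes the reference ranking does not tie every pair, so that N − T_r > 0):

```python
from itertools import combinations

def tau_and_ndpm(ref, sys):
    """Kendall's tau (Eq. 35) and NDPM (Eq. 38) for two score dicts
    mapping item -> ranking score (higher means preferred)."""
    C = D = T_r = T_s = X = 0
    items = list(ref)
    N = len(items) * (len(items) - 1) // 2          # total number of pairs
    for a, b in combinations(items, 2):
        dr = ref[a] - ref[b]
        ds = sys[a] - sys[b]
        if dr == 0:
            T_r += 1                                # tied in the reference
        if ds == 0:
            T_s += 1                                # tied in the system ranking
        if dr != 0 and ds == 0:
            X += 1                                  # reference orders it, system ties it
        elif dr * ds > 0:
            C += 1                                  # concordant
        elif dr != 0 and dr * ds < 0:
            D += 1                                  # discordant
    tau = (C - D) / ((N - T_r) ** 0.5 * (N - T_s) ** 0.5)
    ndpm = (D + 0.5 * X) / (N - T_r)
    return tau, ndpm
```

Note that C + D + X = N − T_r, matching the identity given for X in Table 4.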
High scores on the first three metrics (ρ, τ, DOA) and a low score on the last (NDPM) suggest that the two ranking lists are "close" [42], which means that the new users are more likely to receive recommendations closer to their tastes as described by their full set of ratings.

In Figure 6 we report the average scores on all four metrics for the set of newly added users. We see that NCDREC clearly outperforms every other method considered, achieving good results even when only 4% of each user's ratings were included. MFA and L† also do well, especially as the number of ratings increases. These results are in accordance with the intuition behind our approach and the theoretical properties of the COLDSTART subcomponent. We see that, even though new users' tastes are not yet clear, the exploitation of the NCD proximity captured by matrix D manages to "propagate" this scarce rating information to the many related elements of the item space, giving our method an advantage in uncovering new users' preferences. This leads to recommendation vectors exhibiting lower sensitivity to sparsity.

Fig. 6. Recommendation performance for the New Users problem: average Spearman's ρ, Kendall's τ, DOA and NDPM (the smaller the better) versus the percentage of included ratings (4%–10%).

4. Conclusions and Future Work

In this work we proposed NCDREC, a novel method that builds on the intuition behind Decomposability to provide an elegant and computationally efficient framework for generating recommendations. NCDREC exploits the innately hierarchical structure of the item space, introducing the notion of NCD proximity, which characterizes inter-level relations between the elements of the system and gives our model useful anti-sparsity theoretical properties.

One very interesting direction we are currently pursuing involves the generalization of the COLDSTART subcomponent exploiting the functional rankings family [3].
In particular, based on a recently proposed multidamping reformulation of these rankings [24,25], which allows intuitive and fruitful interpretations of the damping functions in terms of random surfing habits, one could try to capture the actual newly emerging users' behavior as they begin to explore the recommender system, and map it to suitable collections of personalized damping factors that could lead to even better recommendations. Another interesting research path that remains to be explored involves the introduction of more than one decomposition, based on different criteria, and the effect this has on the theoretical properties of the COLDSTART subcomponent. Notice that in NCDREC this generalization can be achieved readily, through the introduction of new low-rank proximity matrices D_1, W_1, D_2, W_2, ... and associated parameters, with no effect on the dimensionality of the model. In this work, we considered the single decomposition case.

Our experiments on the MovieLens and the Yahoo!R2Music datasets indicate that NCDREC outperforms several state-of-the-art graph-based algorithms – known for their anti-sparsity properties – in widely used performance metrics, while being at the same time by far the most economical one. Note here that the random-walk approaches, FP and CT, require handling a graph of (n + m + K) nodes and computing 2nm first passage time scores. Similarly, L†, Katz and MFA involve the inversion of an (n + m + K)-dimensional square matrix. In fact, only NCDREC involves matrices whose dimensions depend solely on the cardinality of the item space, which in most realistic applications increases slowly.

In conclusion, our findings suggest that NCDREC carries the potential of handling sparsity effectively, and of producing high quality results in standard, long-tail, as well as cold-start recommendation scenarios.

References

[1] Kamal Ali and Wijnand van Stam. TiVo: Making show recommendations using a distributed collaborative filtering architecture. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 394–401, New York, NY, USA, 2004. ACM.
[2] Chris Anderson. The Long Tail: Why the Future of Business is Selling Less of More. Hyperion, 2008.
[3] Ricardo Baeza-Yates, Paolo Boldi, and Carlos Castillo. Generic damping functions for propagating importance in link-based ranking. Internet Mathematics, 3(4):445–478, 2006.
[4] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Publishing Company, USA, 2nd edition, 2008.
[5] James Baglama and Lothar Reichel. Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM Journal on Scientific Computing, 27(1):19–42, 2005.
[6] Suhrid Balakrishnan and Sumit Chopra. Collaborative ranking.
In this work, we considered the single-decomposition case. Our experiments on the MovieLens and the Yahoo!R2Music datasets indicate that NCDREC outperforms several state-of-the-art graph-based algorithms, known for their antisparsity properties, on widely used performance metrics, while being at the same time by far the most economical one. Note here that the random-walk approaches, FP and CT, require handling a graph of (n + m + K) nodes and computing 2nm first-passage-time scores. Similarly, L†, Katz and MFA involve the inversion of an (n + m + K)-dimensional square matrix. In fact, only NCDREC involves matrices whose dimensions depend solely on the cardinality of the item space, which in most realistic applications grows slowly.

In conclusion, our findings suggest that NCDREC has the potential to handle sparsity effectively and to produce high-quality results in standard, long-tail, as well as cold-start recommendation scenarios.

References

[1] Kamal Ali and Wijnand van Stam. TiVo: Making show recommendations using a distributed collaborative filtering architecture. In Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '04, pages 394–401, New York, NY, USA, 2004. ACM.
[2] Chris Anderson. The Long Tail: Why the Future of Business Is Selling Less of More. Hyperion, 2008.
[3] Ricardo Baeza-Yates, Paolo Boldi, and Carlos Castillo. Generic damping functions for propagating importance in link-based ranking. Internet Mathematics, 3(4):445–478, 2006.
[4] Ricardo Baeza-Yates and Berthier Ribeiro-Neto. Modern Information Retrieval. Addison-Wesley Publishing Company, USA, 2nd edition, 2008.
[5] James Baglama and Lothar Reichel. Augmented implicitly restarted Lanczos bidiagonalization methods. SIAM Journal on Scientific Computing, 27(1):19–42, 2005.
[6] Suhrid Balakrishnan and Sumit Chopra. Collaborative ranking. In Proceedings of the Fifth ACM International Conference on Web Search and Data Mining, WSDM '12, pages 143–152, New York, NY, USA, 2012. ACM.
[7] J. Bobadilla, F. Ortega, A. Hernando, and A. Gutiérrez. Recommender systems survey. Knowledge-Based Systems, 46:109–132, July 2013.
[8] A. Cevahir, C. Aykanat, A. Turk, and B.B. Cambazoglu. Site-based partitioning and repartitioning techniques for parallel PageRank computation. IEEE Transactions on Parallel and Distributed Systems, 22(5):786–802, 2011.
[9] Pavel Chebotarev and Elena Shamis. The matrix-forest theorem and measuring relations in small social groups. Automation and Remote Control, 58(9):1505–1514, 1997.
[10] Pierre-Jacques Courtois. Decomposability: Queueing and Computer System Applications. ACM Monograph Series. Academic Press, 1977.
[11] Paolo Cremonesi, Yehuda Koren, and Roberto Turrin. Performance of recommender algorithms on top-n recommendation tasks. In Proceedings of the Fourth ACM Conference on Recommender Systems, RecSys '10, pages 39–46. ACM, 2010.
[12] Christian Desrosiers and George Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors, Recommender Systems Handbook, pages 107–144. Springer US, 2011.
[13] Christian Desrosiers and George Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors, Recommender Systems Handbook, pages 107–144. Springer, 2011.
[14] François Fouss, Kevin Francoisse, Luh Yen, Alain Pirotte, and Marco Saerens. An experimental investigation of kernels on graphs for collaborative recommendation and semi-supervised classification. Neural Networks, 31:53–72, July 2012.
[15] François Fouss, Alain Pirotte, Jean-Michel Renders, and Marco Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation. IEEE Transactions on Knowledge and Data Engineering, 19(3):355–369, March 2007.
[16] Antonino Freno, Edmondo Trentin, and Marco Gori. Scalable pseudo-likelihood estimation in hybrid random fields. In Proceedings of the 15th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '09, pages 319–328. ACM, 2009.
[17] David Goldberg, David Nichols, Brian M. Oki, and Douglas Terry. Using collaborative filtering to weave an information tapestry. Communications of the ACM, 35(12):61–70, December 1992.
[18] Marco Gori and Augusto Pucci. ItemRank: A random-walk based scoring algorithm for recommender engines. In Proceedings of the 20th International Joint Conference on Artificial Intelligence, IJCAI '07, pages 2766–2771, San Francisco, CA, USA, 2007. Morgan Kaufmann Publishers Inc.
[19] Geoffrey Grimmett and David Stirzaker. Probability and Random Processes. Oxford University Press, 2001.
[20] Jonathan L. Herlocker, Joseph A. Konstan, Al Borchers, and John Riedl. An algorithmic framework for performing collaborative filtering. In Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '99, pages 230–237, New York, NY, USA, 1999. ACM.
[21] Oliver Hinz and Jochen Eckert. The impact of search and recommendation systems on sales in electronic commerce. Business & Information Systems Engineering, 2(2):67–77, 2010.
[22] Sepandar D. Kamvar, Taher H. Haveliwala, Christopher D. Manning, and Gene H. Golub. Exploiting the block structure of the web for computing PageRank. Stanford University Technical Report, 2003.
[23] Leo Katz. A new status index derived from sociometric analysis. Psychometrika, 18(1):39–43, 1953.
[24] Giorgios Kollias and Efstratios Gallopoulos. Multidamping simulation framework for link-based ranking. In Web Information Retrieval and Linear Algebra Algorithms, 2007.
[25] Giorgios Kollias, Efstratios Gallopoulos, and Ananth Grama. Surfing the network for ranking by multidamping. IEEE Transactions on Knowledge and Data Engineering, 26(9):2323–2336, 2014.
[26] Yehuda Koren and Robert Bell. Advances in collaborative filtering. In Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors, Recommender Systems Handbook, pages 145–186. Springer US, 2011.
[27] G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collaborative filtering. IEEE Internet Computing, 7(1):76–80, January 2003.
[28] Qi Liu, Enhong Chen, Hui Xiong, Chris H. Q. Ding, and Jian Chen. Enhancing collaborative filtering by user interest expansion via personalized ranking. IEEE Transactions on Systems, Man, and Cybernetics, Part B, 42(1):218–233, 2012.
[29] Benjamin M. Marlin and Richard S. Zemel. Collaborative prediction and ranking with non-random missing data. In Proceedings of the Third ACM Conference on Recommender Systems, RecSys '09, pages 5–12. ACM, 2009.
[30] Carl D. Meyer and Charles D. Wessell. Stochastic data clustering. SIAM Journal on Matrix Analysis and Applications, 33(4):1214–1236, 2012.
[31] Carl Dean Meyer, Shaina Race, and Kevin Valakuzhy. Determining the number of clusters via iterative consensus clustering. In SDM, pages 94–102, 2013.
[32] A.N. Nikolakopoulos and J.D. Garofalakis. NCDREC: A decomposability inspired framework for top-n recommendation. In 2014 IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), volume 1, pages 183–190, August 2014.
[33] Athanasios N. Nikolakopoulos and John D. Garofalakis. NCDawareRank: A novel ranking method that exploits the decomposable structure of the web. In Proceedings of the Sixth ACM International Conference on Web Search and Data Mining, WSDM '13, pages 143–152. ACM, 2013.
[34] Athanasios N. Nikolakopoulos and John D. Garofalakis. Random surfing without teleportation. arXiv preprint arXiv:1506.00092, 2015.
[35] Athanasios N. Nikolakopoulos, Maria Kalantzi, and John D. Garofalakis. On the use of Lanczos vectors for efficient latent factor-based top-n recommendation. In Proceedings of the 4th International Conference on Web Intelligence, Mining and Semantics (WIMS14), WIMS '14, pages 28:1–28:6, New York, NY, USA, 2014. ACM.
[36] Athanasios N. Nikolakopoulos, Marianna A. Kouneli, and John D. Garofalakis. Hierarchical itemspace rank: Exploiting hierarchy to alleviate sparsity in ranking-based recommendation. Neurocomputing, 163:126–136, 2015.
[37] Bhavik Pathak, Robert Garfinkel, Ram D. Gopal, Rajkumar Venkatesan, and Fang Yin. Empirical analysis of the impact of recommender systems on sales. Journal of Management Information Systems, 27(2):159–188, 2010.
[38] Paul Resnick, Neophytos Iacovou, Mitesh Suchak, Peter Bergstrom, and John Riedl. GroupLens: An open architecture for collaborative filtering of netnews. In Proceedings of the 1994 ACM Conference on Computer Supported Cooperative Work, CSCW '94, pages 175–186, New York, NY, USA, 1994. ACM.
[39] Francesco Ricci, Lior Rokach, and Bracha Shapira. Introduction to Recommender Systems Handbook. Springer, 2011.
[40] Badrul Sarwar, George Karypis, Joseph Konstan, and John Riedl. Item-based collaborative filtering recommendation algorithms. In Proceedings of the 10th International Conference on World Wide Web, WWW '01, pages 285–295, New York, NY, USA, 2001. ACM.
[41] J. Ben Schafer, Joseph Konstan, and John Riedl. Recommender systems in e-commerce. In Proceedings of the 1st ACM Conference on Electronic Commerce, EC '99, pages 158–166, New York, NY, USA, 1999. ACM.
[42] Guy Shani and Asela Gunawardana. Evaluating recommendation systems. In Francesco Ricci, Lior Rokach, Bracha Shapira, and Paul B. Kantor, editors, Recommender Systems Handbook, pages 257–297. Springer US, 2011.
[43] Sidney Siegel and N. John Castellan Jr. Nonparametric Statistics for the Behavioral Sciences. McGraw-Hill Book Company, 1988.
[44] Herbert A. Simon. The Sciences of the Artificial (3rd ed.). MIT Press, 1996.
[45] Herbert A. Simon and Albert Ando. Aggregation of variables in dynamic systems. Econometrica: Journal of the Econometric Society, pages 111–138, 1961.
[46] Manolis Vozalis and Konstantinos G. Margaritis. On the enhancement of collaborative filtering by demographic data. Web Intelligence and Agent Systems, 4(2):117–138, April 2006.
[47] Y. Y. Yao. Measuring retrieval effectiveness based on user preference of documents. Journal of the American Society for Information Science, 46(2):133–145, 1995.
[48] Hilmi Yildirim and Mukkai S. Krishnamoorthy. A random walk method for alleviating the sparsity problem in collaborative filtering. In Proceedings of the 2008 ACM Conference on Recommender Systems, RecSys '08, pages 131–138. ACM, 2008.
[49] G. George Yin and Qing Zhang. Continuous-time Markov Chains and Applications: A Two-time-scale Approach, volume 37. Springer, 2013.
[50] Hongzhi Yin, Bin Cui, Jing Li, Junjie Yao, and Chen Chen. Challenging the long tail recommendation. Proceedings of the VLDB Endowment, 5(9):896–907, 2012.
[51] Liyan Zhang, Kai Zhang, and Chunping Li. A topical PageRank based algorithm for recommender systems. In Proceedings of the 31st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval, SIGIR '08. ACM, 2008.

Appendix

A. Theoretical Discussion of NCDREC's Main Component

Let us consider the singular value decomposition of matrix G,

\[ G = U \Sigma V^\top \tag{39} \]

Multiplying from the right with V, and using the fact that its columns form an orthonormal set of vectors, we get

\[ GV = U \Sigma \tag{40} \]

Multiplying from the right with the diagonal matrix \(\mathrm{Diag}(\underbrace{1, \dots, 1}_{f}, 0, \dots, 0)\) gives

\[ G \begin{bmatrix} V_f & 0 \end{bmatrix} = U \begin{bmatrix} \Sigma_f & 0 \\ 0 & 0 \end{bmatrix} \tag{41} \]

and finally, discarding the zero columns, we get

\[ G V_f = U_f \Sigma_f \tag{42} \]

Now, plugging this into Eq. (8), we see that the recommendation vector for user u_i, π_i⊺, is given by

\[ \pi_i^\top = g_{u_i}^\top V_f V_f^\top \tag{43} \]

Notice that V_f contains the orthonormal set of eigenvectors of the m × m symmetric positive semidefinite matrix

\[ G^\top G = (R + \epsilon W)^\top (R + \epsilon W) = R^\top R + \epsilon (R^\top W + W^\top R) + \epsilon^2 W^\top W \tag{44} \]

Thus, the recommendation vectors produced by the main component of NCDREC can be seen as arising from a low-dimensional eigenspace of the NCD-perturbed inter-item similarity matrix of Eq. (44).
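As a numerical sanity check of this derivation (added for illustration; the sizes, random data, and rank-f truncation below are arbitrary assumptions), the identities of Eqs. (42)–(44) can be verified in a few lines of numpy:

```python
# Illustrative check of the Appendix A identities (not the authors' code).
import numpy as np

rng = np.random.default_rng(0)
n, m, f = 8, 6, 3                        # hypothetical sizes and truncation rank
R, W = rng.random((n, m)), rng.random((n, m))
eps = 0.1
G = R + eps * W                          # the NCD-perturbed matrix of Eq. (44)

U, s, Vt = np.linalg.svd(G, full_matrices=False)
Uf, Sf, Vf = U[:, :f], np.diag(s[:f]), Vt[:f, :].T

assert np.allclose(G @ Vf, Uf @ Sf)      # Eq. (42): G V_f = U_f Sigma_f

g_i = G[0]                               # the vector g_{u_i} associated with a user u_i
pi_i = g_i @ Vf @ Vf.T                   # Eq. (43): projection onto span(V_f)

GtG = G.T @ G                            # Eq. (44): V_f holds eigenvectors of G^T G,
v0 = Vf[:, 0]                            # with eigenvalues the squared singular values
assert np.allclose(GtG @ v0, (s[0] ** 2) * v0)
print(pi_i)
```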
B. Proof of Lemma 4

Lemma 4. If A is the transition matrix of an irreducible and aperiodic Markov chain with finite state space, and B is the transition matrix of any Markov chain defined on the same state space, then the matrix C = κA + λB, where κ, λ > 0 such that κ + λ = 1, is also the transition matrix of an irreducible and aperiodic Markov chain.

Proof. It is easy to see that for κ, λ > 0 such that κ + λ = 1, matrix C is a valid transition probability matrix. Furthermore, since A is irreducible, there exists a positive-probability path between any two given states of the corresponding Markov chain. The same path remains valid for the Markov chain that corresponds to matrix C whenever κ > 0, because every transition that has positive probability under A also has positive probability under C. The same holds for the aperiodicity property, since the addition of the stochastic matrix B cannot eliminate any of the possible path lengths that allow a return to a given state of the Markov chain corresponding to matrix A. Thus, the irreducibility and aperiodicity of A, together with the requirement κ > 0, imply these properties for the final matrix C, as needed. □
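For intuition, the lemma can also be checked numerically. In the hedged sketch below (hand-picked toy matrices of our own, not from the paper), B is periodic and reducible, yet C = κA + λB with κ > 0 is primitive, i.e., irreducible and aperiodic:

```python
# Illustrative check of Lemma 4: mixing an irreducible, aperiodic chain A
# with an arbitrary chain B (here periodic and reducible) preserves both properties.
import numpy as np

A = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.3, 0.5],
              [0.4, 0.0, 0.6]])           # irreducible and aperiodic
B = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 0.0],
              [0.0, 0.0, 1.0]])           # periodic on {0, 1}; state 2 is absorbing

kappa, lam = 0.3, 0.7                     # kappa > 0 is exactly what the lemma needs
C = kappa * A + lam * B
assert np.allclose(C.sum(axis=1), 1.0)    # C is still row-stochastic

# For finite chains, irreducibility + aperiodicity is equivalent to some power
# of C being entrywise positive (primitivity).
assert (np.linalg.matrix_power(C, 8) > 0).all()
print(np.linalg.matrix_power(C, 50)[0])   # rows converge to the stationary distribution
```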