Latent Feature Based FM Model For Rating Prediction

Xudong Liu, Bin Zhang
National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences
Beijing, 100190, China
{lxdwinwin, sjlmustc}@gmail.com

Ting Zhang
University of Science and Technology of China
Hefei, 230026, China
ting.zhang.email@gmail.com

Chang Liu
Alibaba Group
Beijing, 100022, China
qingsong.lc@alibaba-inc.com

ABSTRACT

Rating prediction is a basic problem in recommender systems, and one of the most widely used methods is Factorization Machines (FM). However, traditional matrix factorization methods fail to utilize the benefit of implicit feedback, which has been proved to be important in the rating prediction problem. In this work, we consider a specific situation, movie rating prediction, where we assume that a user's watching history has a big influence on his/her rating behavior on an item. We introduce two models, Latent Dirichlet Allocation (LDA) and word2vec, both of which achieve state-of-the-art results in training latent features. Based on that, we propose two feature-based models. One is the Topic-based FM Model, which provides implicit feedback to the matrix factorization; the other is the Vector-based FM Model, which exploits the order information of a user's watching history, resulting in better performance. Empirical results on three datasets demonstrate that our method performs better than the baseline model and confirm that the Vector-based FM Model usually works better as it contains the order information.

Keywords

Rating Prediction, Factorization Machines, LDA, Word2vec

1. INTRODUCTION

The collaborative filtering problem has gained significant attention in the machine learning field since the Netflix Prize. In this challenge, one of the most widely used approaches is the latent factor model, which has proven to work well.
To state the problem more formally, we introduce several notations: we have a set of users, $U = \{U_1, U_2, \cdots, U_N\}$, a set of items, $I = \{I_1, I_2, \cdots, I_M\}$, and the rating scores, which can be viewed as a sparse matrix $R \in \mathbb{R}^{N \times M}$, where the element $r_{ui}$ is the score rated by user $U_u$ on item $I_i$. The goal of the problem is to reasonably predict the missing elements in the sparse matrix.

Many methods have been designed to address that problem. Here, we mainly focus on matrix factorization [9] as it achieves state-of-the-art results in the dyadic prediction problem. However, matrix factorization [5] fails to utilize the benefit of implicit feedback [3], which plays an important role in recommender systems. In order to provide implicit feedback to the matrix factorization, the SVD++ [4] model was proposed, but it takes much longer time and larger memory in the training process. Factorization Machines (FM) can be regarded as a classification or regression model combined with feature engineering. With different features, FM can mimic different factorization models like matrix factorization and specialized models such as SVD++. In this work, we propose two latent feature based FM models, both of which can capture the implicit feedback of a user or an item. One is called the Topic-based FM Model, and the other is the Vector-based FM Model.
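To make the "FM can mimic matrix factorization" claim concrete, here is a minimal NumPy sketch (all variable names and values are illustrative, not from the paper): a standard second-order FM evaluated on one-hot user/item indicator features, where the pairwise interaction term reduces exactly to the dot product $p_u \cdot q_i$ of biased matrix factorization.

```python
import numpy as np

# A minimal sketch of a second-order Factorization Machine prediction,
# assuming one-hot user/item indicator features (hypothetical setup).
rng = np.random.default_rng(0)
n_users, n_items, k = 4, 5, 3

w0 = 0.0                                        # global bias (mu)
w = rng.normal(0, 0.1, n_users + n_items)       # per-feature biases (b_u, b_i)
V = rng.normal(0, 0.1, (n_users + n_items, k))  # per-feature latent vectors

def fm_predict(x):
    """Standard FM: w0 + <w, x> + sum_{i<j} <V_i, V_j> x_i x_j."""
    linear = w0 + w @ x
    # O(kd) pairwise trick: 0.5 * sum_f ((V^T x)_f^2 - ((V^2)^T x^2)_f)
    interact = 0.5 * np.sum((x @ V) ** 2 - (x ** 2) @ (V ** 2))
    return linear + interact

# One-hot encoding of (user u, item i): exactly two active features.
u, i = 1, 2
x = np.zeros(n_users + n_items)
x[u] = 1.0
x[n_users + i] = 1.0

# With only two active features, the interaction term equals V_u . V_i,
# i.e. the p_u . q_i term of biased matrix factorization.
print(fm_predict(x))
```

Appending extra columns to `x` (e.g. implicit-feedback features, as in SVD++) changes only the feature encoding, not the model code, which is exactly the flexibility the text describes.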
The topic model is a typical statistical model in the natural language processing (NLP) area and is also used in machine learning. One of the most classic models is Latent Dirichlet Allocation (LDA) [1], which is a generative probabilistic model for collections of discrete data such as text corpora. It assumes that each document can be expressed by several topics, and each topic is generated by some words. As a result, we can express a document using a latent topic factor. Besides that, LDA can also be applied to a rating prediction problem. Consider a specific situation, where we want to predict how a user will rate a movie based on the user's watching history, which, to some degree, can indicate the user's interest in this movie. Thinking of a user's watching history as a "document" and each movie as a "word" in this "document", we can see that a user's interest can be similarly obtained by several latent topics, which are generated by the "words" that belong to the "document". In other words, the user's interests in movies can be drawn from those latent topics. Therefore, we can use the latent topics as features to train the FM model, which we call the Topic-based FM Model. The other FM model we propose is the Vector-based FM Model, which is built on word2vec [8]. It provides an efficient implementation of the continuous bag-of-words (CBOW) and skip-gram architectures for computing vector representations of words [7]. Though it is a simple neural network model, it works quite well in practice. The main goal of word2vec is to introduce techniques that can be used for learning high-quality word vectors from huge data sets with billions of words and a big vocabulary with millions of words. Similar to the LDA model, the word2vec model can also train a latent vector from a vocabulary constructed from the training data.
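The history-as-document view can be sketched as follows. The paper does not name an LDA implementation, so this example uses scikit-learn's `LatentDirichletAllocation` as one possible choice; the watching histories and topic count are hypothetical.

```python
import numpy as np
from sklearn.decomposition import LatentDirichletAllocation

# Hypothetical watching histories: each user is a "document" whose
# "words" are the movie IDs the user has watched.
histories = {
    "u1": ["m1", "m2", "m3"],
    "u2": ["m2", "m3", "m4"],
    "u3": ["m5", "m6"],
}

# Build a user-by-movie count matrix (the bag-of-words view of the corpus).
movies = sorted({m for h in histories.values() for m in h})
col = {m: j for j, m in enumerate(movies)}
X = np.zeros((len(histories), len(movies)))
for row, h in enumerate(histories.values()):
    for m in h:
        X[row, col[m]] += 1

# Fit LDA; theta[u] is the user's latent topic vector, later fed to FM.
K_U = 2  # number of user topics (illustrative)
lda = LatentDirichletAllocation(n_components=K_U, random_state=0)
theta = lda.fit_transform(X)   # shape: (n_users, K_U)
print(theta.shape)
```

The same construction applied to each item's history (the users who watched it) yields the item-side topic vectors.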
The difference from LDA is that this latent vector represents the word itself instead of the document. In our problem, a user is regarded as a "document", and the user's watching history can be viewed as a sequence of "words". Thus the word2vec model can be used here to generate a latent vector for each item. Following the same fashion, we use those latent vectors as features to train the FM model, resulting in the Vector-based FM Model.

The rest of this paper is organized as follows. In Section 2, we provide a more detailed description of the Topic-based FM Model and the Vector-based FM Model. Section 3 shows the experimental evaluation and analysis of our method on three large-scale collaborative filtering datasets, which demonstrates that our method outperforms state-of-the-art latent factor approaches. Finally, we conclude in Section 4.

2. LATENT FEATURE BASED FM MODEL IN RATING PREDICTION

In this section, we will introduce the two latent feature models mentioned above into the rating prediction problem. These latent features may bring some implicit feedback or some latent characteristics of a user or an item. In the following part, we will explain how the latent features work in the FM model.

2.1 Topic-based FM Model

The Topic-based FM Model is similar to a previous work, the M3F model [6], which generates latent factors from a user's history information and an item's history information. However, in the M3F model, the latent factors must be retrained every time, which is time-consuming. Our work, on the contrary, does not need to train the latent factors every time: we generate the latent factors once and update them only when necessary, leading to a simpler algorithm. Next we will explain how the topic model works in the FM model and show the detailed algorithm. First we introduce the M3F model to make the notation clear here and below.
The M3F model takes three steps to obtain the parameters of a user and an item using Gibbs sampling. It first samples the hyperparameters, then samples the topics, and finally the user parameters and item parameters. For more details about the three steps, you may refer to the paper [6]. In general, the M3F model introduced two methods to predict the missing elements in the rating matrix. One is the M3F-TIB model, where the predicted score rated by user $U_u$ on item $I_i$ is obtained by the following formula,

$$\hat{r}_{ui}(\vec{\theta}_u, \vec{\theta}_i) = p_u \cdot q_i + \sum_{k=1}^{K_U} \theta_{uk} w_{uk} + \sum_{l=1}^{K_I} \theta_{il} w_{il}. \quad (1)$$

Here $p_u$ and $q_i$ are the latent vectors for the user $U_u$ and the item $I_i$ respectively, and $p_u \cdot q_i$ denotes the dot product between the two vectors. $\vec{\theta}_u = [\theta_{u1}, \theta_{u2}, \cdots, \theta_{uK_U}]^T$ is the latent topic for the corresponding user and $w_u = [w_{u1}, w_{u2}, \cdots, w_{uK_U}]^T$ is the weight vector for that latent topic. Similarly, $\vec{\theta}_i$ and $w_i$ are the latent topic for item $I_i$ and the weight vector for that latent topic. The other one is the M3F-TIF model, whose prediction formula is given as follows,

$$\hat{r}_{ui}(\vec{\theta}_u, \vec{\theta}_i) = p_u \cdot q_i + \sum_{k=1}^{K_U} \sum_{l=1}^{K_I} \theta_{uk} \theta_{il} \, e_{uk} \cdot e_{il}, \quad (2)$$

where $e_{uk}$ and $e_{il}$ are the topic-indexed vectors for $\theta_{uk}$ and $\theta_{il}$ respectively. They provide the weight for the user-item cross term $\theta_{uk} \theta_{il}$ via the dot product.

In our formulation, we solve the problem by combining the two existing methods mentioned above. First, we train the user's latent factor based on the user's history information. Second, we train the item's latent factor using the item's history information, which records the users who have watched this item. After we get the user's topics and the item's topics, we define our prediction formula based on FM as follows,

$$\hat{r}_{ui}(\vec{\theta}_u, \vec{\theta}_i) = \mu + b_u + b_i + p_u \cdot q_i + \sum_{k=1}^{K_U} \theta_{uk} w_{uk} + \sum_{l=1}^{K_I} \theta_{il} w_{il} + \sum_{k=1}^{K_U} \sum_{l=1}^{K_I} \theta_{uk} \theta_{il} \, e_{uk} \cdot e_{il}. \quad (3)$$
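The combined prediction above (bias terms, MF dot product, the linear topic terms of TIB, and the topic-cross term of TIF) can be evaluated with a few matrix products. The following NumPy sketch uses illustrative dimensions and randomly generated parameters in place of trained ones; the double sum is computed as the bilinear form $\vec{\theta}_u^T (E_u E_i^T) \vec{\theta}_i$.

```python
import numpy as np

# Illustrative dimensions and randomly generated stand-ins for trained values.
rng = np.random.default_rng(0)
K_U, K_I, d = 3, 4, 8                  # user topics, item topics, MF dimension

mu, b_u, b_i = 3.5, 0.1, -0.2          # global, user, and item biases
p_u = rng.normal(0, 0.1, d)            # user latent vector
q_i = rng.normal(0, 0.1, d)            # item latent vector
theta_u = rng.dirichlet(np.ones(K_U))  # user topic distribution (from LDA)
theta_i = rng.dirichlet(np.ones(K_I))  # item topic distribution (from LDA)
w_u = rng.normal(0, 0.1, K_U)          # weights for user topics
w_i = rng.normal(0, 0.1, K_I)          # weights for item topics
E_u = rng.normal(0, 0.1, (K_U, d))     # topic-indexed vectors e_{uk}
E_i = rng.normal(0, 0.1, (K_I, d))     # topic-indexed vectors e_{il}

def predict(mu, b_u, b_i, p_u, q_i, theta_u, theta_i, w_u, w_i, E_u, E_i):
    """mu + b_u + b_i + p_u.q_i + sum_k theta_uk w_uk + sum_l theta_il w_il
       + sum_{k,l} theta_uk theta_il (e_uk . e_il)."""
    cross = theta_u @ (E_u @ E_i.T) @ theta_i   # double sum as a bilinear form
    return (mu + b_u + b_i + p_u @ q_i
            + theta_u @ w_u + theta_i @ w_i + cross)

r_hat = predict(mu, b_u, b_i, p_u, q_i, theta_u, theta_i, w_u, w_i, E_u, E_i)
print(float(r_hat))
```

Dropping the `cross` term recovers a TIB-style prediction, and dropping the two linear topic terms recovers a TIF-style one, which is the combination the text describes.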