Spectral Geometric Matrix Completion
Authors: Amit Boyarski, Sanketh Vedula, Alex Bronstein
Preprint. Pages: 1–26.
Amit Boyarski* (amitboy@cs.technion.ac.il), Sanketh Vedula* (sanketh@cs.technion.ac.il), Alex Bronstein (bron@cs.technion.ac.il) — Technion, Israel.

Abstract. Deep Matrix Factorization (DMF) is an emerging approach to the problem of matrix completion. Recent works have established that gradient descent applied to a DMF model induces an implicit regularization on the rank of the recovered matrix. In this work we interpret the DMF model through the lens of spectral geometry. This allows us to incorporate explicit regularization without breaking the DMF structure, thus enjoying the best of both worlds. In particular, we focus on matrix completion problems with underlying geometric or topological relations between the rows and/or columns. Such relations are prevalent in matrix completion problems that arise in many applications, such as recommender systems and drug-target interaction. Our contributions enable DMF models to exploit these relations and make them competitive on real benchmarks, while exhibiting one of the first successful applications of deep linear networks.

Keywords: matrix completion, deep matrix factorization, deep linear networks, graph signal processing, spectral geometry, recommendation system, drug-target interaction.

1. Introduction

Matrix completion deals with the recovery of missing values of a matrix from a subset of its entries,

  find X  s.t.  X ∘ S = M ∘ S.   (1)

Here X stands for the unknown matrix, M ∈ R^{m×n} for the ground-truth matrix, S is a binary mask representing the input support, and ∘ denotes the Hadamard product. Since problem (1) is ill-posed, it is common to assume that M belongs to some low-dimensional subspace. Under this assumption, the matrix completion problem can be cast via the least-squares variant,

  min_X rank(X) + (µ/2) ‖(X − M) ∘ S‖²_F.   (2)

Relaxing the intractable rank penalty to its convex envelope, namely the nuclear norm, leads to a convex problem whose solution coincides with (2) under some technical conditions (Candès and Recht, 2009). Another way to enforce low rank is by explicitly parametrizing X in factorized form, X = X₁X₂; the rank is then upper-bounded by the minimal dimension of X₁, X₂. Further developing this idea, X can be parametrized as a product of several matrices, X = ∏_{i=1}^N Xᵢ, a model we denote as deep matrix factorization (DMF). Following the nomenclature used by Arora et al. (2019), a factorization with N > 2 is called deep, while for N ≤ 2 it is called shallow. Nevertheless, to simplify notation, we shall refer to any such model as DMF while explicitly specifying N.

Gunasekar et al. (2017) and Arora et al. (2019) investigated the minimization of overparametrized DMF models using gradient descent, and came to the following conclusion (which we formally state in Section 2): whereas in some restrictive settings minimizing DMF using gradient descent is equivalent to nuclear norm minimization (i.e., the convex relaxation of (2)), in general these two models produce different results, with the former enforcing a stronger regularization on the rank of X. This regularization gets stronger as N (the depth) increases. In light of these results, we shall henceforth refer by "DMF" to the aforementioned model coupled with the specific algorithm used for its minimization, namely, gradient descent.

In this work we focus on matrix completion problems with underlying geometric or topological relations between the rows and/or columns. Such relations are prevalent in matrix completion problems that arise in applications such as recommender systems and drug-target interaction.
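As a concrete illustration of the model discussed above, the following sketch completes a masked matrix with an N-layer linear factorization trained by plain gradient descent. The function name, hyper-parameters and the identity-scaled initialization are ours, chosen only for illustration; no explicit rank constraint is imposed.

```python
import numpy as np

def dmf_complete(M, S, depth=2, lr=0.05, iters=3000, init_scale=0.3):
    """Sketch of matrix completion via deep matrix factorization.

    Minimizes ||(X_1...X_N - M) * S||_F^2 by gradient descent on the
    factors; low rank is not enforced explicitly, but emerges from the
    implicit regularization of the overparametrized model.
    """
    m, n = M.shape
    # Full-size, identity-scaled factors: any rank reduction in the
    # product is due to the training dynamics alone.
    sizes = [(m, m)] * (depth - 1) + [(m, n)]
    Xs = [init_scale * np.eye(a, b) for a, b in sizes]

    def chain(mats, k):
        P = np.eye(k)
        for A in mats:
            P = P @ A
        return P

    for _ in range(iters):
        X = chain(Xs, m)
        G = (X - M) * S  # residual on observed entries (gradient up to a constant)
        for i in range(depth):
            L = chain(Xs[:i], m)                       # factors to the left
            R = chain(Xs[i + 1:], Xs[i].shape[1])      # factors to the right
            Xs[i] = Xs[i] - lr * L.T @ G @ R.T
    return chain(Xs, m)
```

On a small random low-rank matrix with a Bernoulli mask, the product fits the observed entries closely while its effective rank stays low.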
A common way of representing these relations is in the form of a graph. For example, in the Netflix problem (Candès and Recht, 2009) the rows correspond to users, the columns correspond to movies, and the matrix elements represent the ratings given by the users to the movies. The goal is to recover the full matrix of ratings from an incomplete matrix. In this setting there might exist side information on the users and movies which can be used to construct two graphs representing relations between users and relations between movies, and the matrix X can be viewed as a signal on the product of these graphs. A useful prior on X can be, for example, modeling it as a smooth or band-limited signal on this graph, encouraging similar movies to be assigned similar ratings by similar users, and vice versa. This kind of geometric structure is generally overlooked by purely algebraic quantities such as rank, and becomes invaluable in the data-poor regime, where the theorems governing reconstruction guarantees (e.g., Candès and Recht (2009)) do not hold. Our work leverages the recent advances in DMF theory to marry the two concepts: a framework for matrix completion that is explicitly motivated by geometric considerations, while implicitly promoting low rank via its DMF structure.

Contributions. Our contributions are as follows:

• We propose geometrically inspired DMF models for matrix completion and study their dynamics.

• We successfully apply those models to matrix completion problems in recommendation systems and drug-target interaction, outperforming various complicated methods with only a few lines of code (Figure 6). This serves as an example of the power of deep linear networks, being one of their first successful applications to real problems.
• Our findings challenge the quality of the side information available in recommendation systems datasets, and the ability of contemporary methods to utilize it in a meaningful and efficient way.

2. Preliminaries

Spectral graph theory. Let G = (V, E, Ω) be a (weighted) graph specified by its vertex set V and edge set E, with its adjacency matrix denoted by Ω. Given a function x ∈ R^{|V|} on the vertices, we define the following quadratic form (also known as the Dirichlet energy) measuring the variability of the function x on the graph,

  x^⊤ L x = Σ_{(a,b)∈E} ω_{a,b} (x(a) − x(b))².   (3)

The matrix L is called the (combinatorial) graph Laplacian, and is given by L = D − Ω, where D = diag(Ω1) is the degree matrix, with 1 denoting the vector of all ones. L is symmetric and positive semi-definite and therefore admits a spectral decomposition L = ΦΛΦ^⊤. Since L1 = 0, λ₁ = 0 is always an eigenvalue of L. The graph Laplacian is a discrete generalization of the continuous Laplace–Beltrami operator, and therefore has similar properties. One can think of the eigenpairs (φᵢ, λᵢ) as the graph analogues of "harmonic" and "frequency". Structural information about the graph is encoded in the spectrum of the Laplacian. For example, the number of connected components in the graph is given by the multiplicity of the zero eigenvalue, and the second eigenvalue (counting multiple eigenvalues separately) is a measure of the connectivity of the graph (Spielman, 2009). A function x = Σ_{i=1}^{|V|} αᵢ φᵢ on the vertices of the graph whose coefficients αᵢ are small for large i demonstrates a "smooth" behaviour on the graph, in the sense that the function values on nearby nodes will be similar. A standard approach to promoting such smooth functions on graphs is to use the Dirichlet energy (3) to regularize some loss term.
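To make the definitions above concrete, here is a minimal sketch (function names are ours) of the combinatorial Laplacian L = D − Ω and the Dirichlet energy (3):

```python
import numpy as np

def graph_laplacian(W):
    """Combinatorial graph Laplacian L = D - W, with D = diag(W 1)."""
    return np.diag(W.sum(axis=1)) - W

def dirichlet_energy(W, x):
    """x^T L x = sum over edges (a,b) of w_ab * (x(a) - x(b))^2."""
    return float(x @ graph_laplacian(W) @ x)

# Path graph on three vertices: 0 -- 1 -- 2, unit edge weights.
W = np.array([[0., 1., 0.],
              [1., 0., 1.],
              [0., 1., 0.]])
L = graph_laplacian(W)
print(dirichlet_energy(W, np.ones(3)))            # constant signal: energy 0
print(dirichlet_energy(W, np.array([1., 2., 3.])))  # (1-2)^2 + (2-3)^2 = 2
```

As stated in the text, L annihilates the constant vector (L1 = 0), so smooth signals have small energy while oscillatory ones have large energy.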
For example, this approach gives rise to the popular bilateral and non-local means filters (Singer et al., 2009; Gadde et al., 2013). We call x a k-bandlimited signal on the graph G if αᵢ = 0 for all i > k.

Product graphs and functional maps. Let G₁ = (V₁, E₁, Ω₁), G₂ = (V₂, E₂, Ω₂) be two graphs, with L₁ = ΦΛ₁Φ^⊤, L₂ = ΨΛ₂Ψ^⊤ being their corresponding graph Laplacians. The bases Φ, Ψ can be used to represent functions on these graphs. We define the Cartesian product of G₁ and G₂, denoted by G₁ □ G₂, as the graph with vertex set V₁ × V₂, on which two nodes (u, u′), (v, v′) are adjacent if either u = v and (u′, v′) ∈ E₂, or u′ = v′ and (u, v) ∈ E₁. The Laplacian of G₁ □ G₂ is given by the tensor sum of L₁ and L₂,

  L_{G₁□G₂} = L₁ ⊕ L₂ = L₁ ⊗ I + I ⊗ L₂,   (4)

and its eigenvalues are given by the Cartesian sum of the eigenvalues of L₁, L₂, i.e., all combinations λ₁ + λ₂ where λ₁ is an eigenvalue of L₁ and λ₂ is an eigenvalue of L₂.

Let X be a function defined on G₁ □ G₂. Then it can be represented using the bases Φ, Ψ of the individual Laplacians, C = Φ^⊤ X Ψ. In the shape processing community, such a C is called a functional map (Ovsjanikov et al., 2012), as it is used to map between the functional spaces of G₁ and G₂. For example, given two functions, x = Φα on G₁ and y = Ψβ on G₂, one can use C to map between their representations α and β, i.e., α = Φ^⊤ x = CΨ^⊤ y = Cβ. We shall henceforth interchangeably switch between the terms "signal on the product graph" and "functional map". We will call a functional map smooth if it maps close points on one graph to close points on the other. A simple way to construct a smooth map is via a linear combination of eigenvectors of L_{G₁□G₂} corresponding to small eigenvalues ("low frequencies").
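The tensor-sum construction (4) and its eigenvalue property can be checked numerically; the small sketch below (with illustrative path-graph Laplacians) verifies that the spectrum of the product graph is the Cartesian sum of the two spectra:

```python
import numpy as np

def product_laplacian(L1, L2):
    """Laplacian of the Cartesian product graph: L1 (+) L2 = L1 x I + I x L2."""
    I1, I2 = np.eye(L1.shape[0]), np.eye(L2.shape[0])
    return np.kron(L1, I2) + np.kron(I1, L2)

# Path graphs on 2 and 3 vertices.
L1 = np.array([[1., -1.],
               [-1., 1.]])                      # eigenvalues {0, 2}
L2 = np.array([[1., -1., 0.],
               [-1., 2., -1.],
               [0., -1., 1.]])                  # eigenvalues {0, 1, 3}
L = product_laplacian(L1, L2)

# The product spectrum is every sum lambda_1 + lambda_2.
sums = sorted(a + b for a in np.linalg.eigvalsh(L1)
                    for b in np.linalg.eigvalsh(L2))
print(np.allclose(sorted(np.linalg.eigvalsh(L)), sums))  # True
```

This also illustrates the remark that follows in the text: sorting the product eigenvalues generally interleaves the two factor spectra, rather than following lexicographic order.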
Notice that while the singular vectors of L_{G₁□G₂} are outer products of the columns of Φ and Ψ, their ordering with respect to the eigenvalues of L_{G₁□G₂} might be different from their lexicographic order.

Implicit regularization of DMF. Let X ∈ R^{m×n} be a matrix parametrized as a product of N matrices, X = ∏_{i=1}^N Xᵢ (which can be interpreted as N linear layers of a neural network), and let ℓ(X) be an analytic loss function. We are interested in the following optimization problem,

  min_{X₁,…,X_N} ℓ(∏_{i=1}^N Xᵢ).   (5)

Without loss of generality, we will assume that m < n. Arora et al. (2018, 2019) analyzed the evolution of the singular values and singular vectors of X throughout the gradient flow Ẋ(t) = −∇ℓ(X(t)), i.e., gradient descent with an infinitesimal step size, with balanced initialization,

  X_{i+1}(0)^⊤ X_{i+1}(0) = Xᵢ(0) Xᵢ(0)^⊤,  ∀i = 1…N.   (6)

As a first step, we state that X(t) admits an analytic singular value decomposition.

Lemma 1 (Lemma 1 in Arora et al. (2019)). The product matrix X(t) can be expressed as

  X(t) = U(t) S(t) V^⊤(t),   (7)

where U(t) ∈ R^{m×m}, S(t) ∈ R^{m×m} and V(t) ∈ R^{n×m} are analytic functions of t; and for every t, the matrices U(t) and V(t) have orthonormal columns, while S(t) is diagonal (elements on its diagonal may be negative and may appear in any order).

The diagonal elements of S(t), which we denote by σ₁(t), …, σ_m(t), are signed singular values of X(t); the columns of U(t) and V(t), denoted u₁(t), …, u_m(t) and v₁(t), …, v_m(t), are the corresponding left and right singular vectors (respectively). Using the above lemma, Arora et al. (2019) characterized the evolution of the singular values as follows:

Theorem 2 (Theorem 3 in Arora et al. (2019)).
The signed singular values of the product matrix X(t) evolve by

  σ̇_r(t) = −N · (σ_r²(t))^{1−1/N} · ⟨∇ℓ(X(t)), u_r(t) v_r^⊤(t)⟩,  r = 1, …, m.   (8)

If the matrix factorization is non-degenerate, i.e., has depth N ≥ 2, the singular values need not be signed (we may assume σ_r(t) ≥ 0 for all t).

The above theorem implies that the evolution rates of the singular values depend on their magnitude exponentiated by 2 − 2/N. Ignoring the term ⟨∇ℓ(X(t)), u_r(t)v_r^⊤(t)⟩, as N increases the evolution rate of the large singular values is enhanced while the evolution rate of the small ones is dampened. The increasing gap between the evolution rates of the large and small singular values induces an implicit regularization on the effective rank of X(t). However, the evolution of the singular values also depends on the gradient of the loss function via the term ⟨∇ℓ(X(t)), u_r(t)v_r^⊤(t)⟩, making the choice of the loss consequential. While exact analysis of (8) is hard in the absence of a perfect characterization of this term, one can still leverage these dynamics by empirically exploring different loss functions.

3. Spectral geometric matrix completion

We assume that we are given a set of samples from the unknown matrix M ∈ R^{m×n}, encoded as a binary mask S, and two graphs G_r, G_c, encoding relations between the rows and the columns, respectively. Denote the Laplacians of these graphs and their spectral decompositions by L_r = ΦΛ_rΦ^⊤, L_c = ΨΛ_cΨ^⊤. We denote the Cartesian product between G_c and G_r by G ≡ G_c □ G_r, and will henceforth refer to it as our reference graph. Our approach relies on a minimization problem of the form

  min_X E_data(X) + µ E_dir(X)  s.t.  rank(X) ≤ r,   (9)

with E_data denoting a data term of the form

  E_data(X) = ‖(X − M) ∘ S‖²_F,   (10)

and E_dir the Dirichlet energy of X on G, given by (see (4))¹

  E_dir(X) = tr(X^⊤ L_r X) + tr(X L_c X^⊤).   (11)

To that end, we parametrize X via a matrix product X = AZB^⊤, and discard the rank constraint,

  min_{A,Z,B} E_data(AZB^⊤) + µ E_dir(AZB^⊤).   (12)

Since (12) is now a DMF model of the form (5), according to Theorem 2 the discarded rank constraint will be captured by the implicit regularization induced by gradient descent, even if the factors are full-size matrices. To emphasize this point, in most of our experiments we used A of size m × m, Z of size m × n and B of size n × n.

To interpret this matrix factorization geometrically, we interpret Z as a signal living on a latent product graph G′. Via the linear transformation AZB^⊤ this signal is transported onto the reference graph G, where it is assumed to be both low-rank and smooth (see Figure 1). Notice that the latent graph is used only for the purpose of illustrating the geometric interpretation, and there is no need to find it explicitly. Nevertheless, it is possible to promote particular properties of it via spectral constraints that can sometimes improve the performance. We demonstrate these extensions in the sequel.

To give a concrete example, suppose X is a band-limited signal on a 2D Euclidean grid G; then there is some low-rank signal Z that can be made smooth on G via an appropriate ordering of its rows and columns², i.e., X = Π₁ZΠ₂^⊤. By smooth we mean that it has low Dirichlet energy (3), where L is the discrete 2D Euclidean Laplacian, i.e., tr(X^⊤LX) = Σ_{i,j} (x_{i+1,j} − x_{i,j})² + (x_{i,j+1} − x_{i,j})².

1. Note that it is possible to weigh the two terms differently, as we do in some of our experiments.
2.
On a side-note, that is exactly the goal of the well-known and closely related seriation problem (Recanati, 2018).

Figure 1: An illustration of the geometric interpretation of (12). A low-rank signal Z that lives on a latent product graph G′ is transported onto the reference product graph G. The transported signal AZB^⊤ will be smooth on the target graph due to the Dirichlet energy.

For later reference, let us rewrite (12) in the spectral domain. We will denote the Laplacians of the latent graph factors comprising G′ by L′_r, L′_c and their eigenbases by Φ′, Ψ′. Using those eigenbases and the eigenbases of the reference Laplacians L_r, L_c, we can write

  Z = Φ′ C Ψ′^⊤,   (13)
  A = Φ P Φ′^⊤,   (14)
  B = Ψ Q Ψ′^⊤.   (15)

Under this reparametrization we get

  AZB^⊤ = Φ P C Q^⊤ Ψ^⊤.   (16)

With some abuse of notation, (12) becomes

  min_{P,C,Q} E_data(PCQ^⊤) + µ E_dir(PCQ^⊤),   (17)

with

  E_data(PCQ^⊤) = ‖(Φ P C Q^⊤ Ψ^⊤ − M) ∘ S‖²_F,   (18)

and

  E_dir(PCQ^⊤) = tr(Q C^⊤ P^⊤ Λ_r P C Q^⊤) + tr(P C Q^⊤ Λ_c Q C^⊤ P^⊤).   (19)

3.1. Extensions

Additional regularization via spectral filtering. We propose a stronger explicit regularization by demanding that both X and Z be smooth on their respective graphs. Since we do not know the Laplacian of G′, we smooth Z via spectral filtering, i.e., through direct manipulation of its spectral representation C. To that end, we pass C through a bank of pre-chosen spectral filters {F_p}_{p∈P}, {G_q}_{q∈Q}, i.e., diagonal positive semi-definite matrices, and transport the filtered signals to G according to

  X_{p,q} = Φ P F_p C G_q^⊤ Q^⊤ Ψ^⊤,  p ∈ P, q ∈ Q.   (20)

In particular, we use the following filters,

  F_p = diag(1_p),  G_q = diag(1_q),   (21)

where 1_k = (1 … 1 0 … 0)^⊤ denotes a vector with k ones followed by zeros.
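A sketch of the filters (21) and the filtered transport (20); the function names are ours, and since G_q is diagonal, G_q^⊤ = G_q:

```python
import numpy as np

def lowpass(k, size):
    """F = diag(1_k): keep the k lowest-frequency spectral coefficients."""
    d = np.zeros(size)
    d[:k] = 1.0
    return np.diag(d)

def filtered_map(Phi, Psi, P, C, Q, p, q):
    """X_{p,q} = Phi P F_p C G_q Q^T Psi^T, as in (20)."""
    F_p = lowpass(p, C.shape[0])
    G_q = lowpass(q, C.shape[1])
    return Phi @ P @ F_p @ C @ G_q @ Q.T @ Psi.T

# With identity bases and layers, filtering simply crops C to its
# leading p x q block of spectral coefficients.
C = 1.0 + np.arange(9, dtype=float).reshape(3, 3)
I = np.eye(3)
X11 = filtered_map(I, I, I, C, I, 1, 1)
print(X11)  # only C[0, 0] survives
```

With p = q large enough the filter is the identity and (20) reduces to the unfiltered transport (16); smaller p, q discard high-frequency content of C before it is transported to G.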
For these manipulations to take effect, we replace E_data in (17) with the following loss function,

  E_z(P, C, Q) = Σ_{p∈P, q∈Q} ‖(X_{p,q} − M) ∘ S‖²_F.   (22)

Despite the fact that we used separable filters in (20), these filters are coupled through the loss (22). This results in an overall inseparable spectral filter that still retains a DMF structure, since (20) is a 5-layer DMF with two fixed layers. While Theorem 2 does not cover the case of a multi-layer DMF in which only a subset of the layers are trainable, our empirical evaluations suggest that the implicit rank regularization is still in place. This additional regularization allows us to obtain decent reconstruction errors even when the number of measurements is extremely small, as we show in Section 5.1.

Regularization of the individual layers. Another extension we explore is imposing further regularization on the individual layers. For example, one could ask that L′_r and A^⊤L_rA be jointly diagonalized by Φ′. Using (13)–(14) we get

  Λ′_r = Φ′^⊤ A^⊤ L_r A Φ′ = P^⊤ Φ^⊤ L_r Φ P = P^⊤ Λ_r P.   (23)

Thus, we can approximately enforce this constraint with the following penalty term,

  E_r^diag(P) = ‖off(P^⊤ Λ_r P)‖²_F,   (24)

where off(·) denotes the off-diagonal elements. A similar treatment of the columns graph gives

  E_c^diag(Q) = ‖off(Q^⊤ Λ_c Q)‖²_F.   (25)

While these penalty terms are not a function of the product matrix, their inclusion did not harm the implicit regularization.

Figure 2: In these experiments we generated band-limited (low-rank and smooth) matrices using the synthetic Netflix graphs (see Figure 11) to test the dependence of SGMC and DMF on the rank of the underlying matrix and on the number of training samples. Left: reconstruction error (on the test set) vs. the rank of the ground-truth matrix.
As the rank increases, the reconstruction error increases, but it increases more slowly for SGMC than for DMF. For the training set we used 15% of the points, chosen at random (the same training set for all experiments); µ was set to 0.001. Middle: reconstruction error (on the test set) vs. density of the sampling set, in % of the number of matrix elements, for a random rank-10 matrix of size 150 × 200. As we increase the number of samples, the gap between DMF and SGMC shrinks. Still, even when using 30% of the samples, SGMC performs better for the same number of iterations. For all the experiments we set µ = 0.01, lr = 0.001, maxiter = 3 × 10⁶. Right: effective rank (Roy and Vetterli, 2007) vs. training set density, for a random rank-10 matrix. Even in extremely data-poor regimes, SGMC was able to recover the effective rank of the ground-truth matrix, whereas DMF underestimates it.

4. Experimental study on synthetic data

The goal of this section is to compare our approach with vanilla DMF on a simple example of a community-structured graph. We exhaustively compare the following distinct methods:

• Deep matrix factorization (DMF):

  min_{P,C,Q} ‖(P C Q^⊤ − M) ∘ S‖²_F,   (26)

• Spectral geometric matrix completion (SGMC): the proposed approach defined by the optimization problem (17).

• Functional maps (FM, SGMC1): this method is like SGMC with a single layer, i.e., we optimize only for C, while P and Q are set to identity.

Since SGMC uses additional information, it is expected to perform better than DMF. However, proper utilization of the graph information is not trivial (as is evident from Table 1), and we conduct this set of controlled experiments to attest to it. We use the graphs taken from the synthetic Netflix dataset. Synthetic Netflix is a small synthetic dataset constructed by Kalofolias et al. (2014) and Monti et al. (2017), in which the user and item graphs have strong community structure. See Figure 11 in Appendix A for a visualization of the user/item graphs. It is useful for conducting controlled experiments to understand the behavior of geometry-exploiting algorithms. In all our tests we use a randomly generated band-limited matrix on the product graph G_c □ G_r. For the complete details please refer to the captions of the relevant figures.

Figure 3: In this experiment, we study the robustness of SGMC in the presence of noisy graphs. We perturbed the edges of the graphs by adding random Gaussian noise with zero mean and tunable standard deviation to the adjacency matrix. We discarded the edges that became negative as a result of the noise, and symmetrized the adjacency matrix. SGMC1/SGMC2/SGMC3 stand for SGMC with 1 layer (training only C), 2 layers (training C, P) and 3 layers (C, P, Q). Left: with clean graphs all SGMC methods perform well. As the noise increases, the regularization induced by the depth kicks in and there is a clear advantage for SGMC3. For large noise, SGMC3 and DMF achieve practically the same performance. Middle & right: eigenvalues of L_r, L_c for different noise levels. Even for moderately large amounts of noise, the structure of the lower part of the spectrum is preserved, and the effect on the low-frequency (smooth) signal remains small.

Performance evaluation. To evaluate the performance of the algorithms in this section, we report the root mean squared error,

  RMSE(X, S) = √( ‖(X − M) ∘ S‖²_F / Σ_{i,j} S_{i,j} ),   (27)

computed on the complement of the training set. Here X is the recovered matrix and S is the binary mask representing the support of the set on which the RMSE is computed. We explore the following aspects:

Sampling density.
We investigate the effect of the number of samples on the reconstruction error and the effective rank of the recovered matrix (Roy and Vetterli, 2007). We demonstrate that in the data-poor regime the implicit regularization of DMF is too strong, resulting in poor recovery, compared to the superior performance achieved by incorporating geometric regularization through SGMC. These experiments are summarized in Figure 2.

Initialization. In all of our experiments we initialize with the balanced initialization (6), with scaled identity matrices 10^{−α} I. We explore the effect of initialization in Figure 12 (in Appendix A).

Rank of the underlying matrix. We explore the effect of the rank of the underlying matrix, showing that as the rank increases it becomes harder for both SGMC and DMF to recover the matrix. A remarkable property of SGMC is that it is able to obtain a decent approximation of the effective rank of the matrix even with an extremely low number of samples. These experiments are summarized in Figure 2.

Noisy graphs. We study the effect of noisy graphs on the performance of SGMC. Figure 3 demonstrates that SGMC is able to utilize graphs with substantial amounts of noise before its performance drops to the level of vanilla DMF (which does not rely on any knowledge of the row/column graphs).

Figure 4: In these experiments, we plot the dynamics of the singular values of the product matrix X(t) during the gradient descent iterations. We show singular value convergence at different sampling densities (left to right: 1%, 5%, and 10%) for SGMC and DMF. We use the synthetic Netflix graphs, on which we generate a random rank-10 matrix of size 150 × 200. In accordance with Figure 2, we see that SGMC is able to recover the rank even in a very data-poor regime, whereas DMF demands significantly higher sample complexity.

Dynamics.
Figure 4 shows the dynamics of the singular values of X during the optimization. We visually verify that they behave according to (8), and that the convergence rate of the relevant singular values is much faster in SGMC than in DMF.

Code. An interactive Jupyter notebook is available here.

5. Results on recommender systems datasets

We demonstrate the effectiveness of our approach on the following datasets: Synthetic Netflix, Flixster, Douban, Movielens-100K (ML-100K) and Movielens-1M (ML-1M), as referenced in Table 1. The datasets include user ratings for items (such as movies) and additional features. For all the datasets we use the user and item graphs taken from Monti et al. (2017). The ML-1M dataset was taken from Berg et al. (2017), for which we constructed 10-nearest-neighbor graphs for users/items from the features, and used a Gaussian kernel with σ = 1 for the edge weights. See Table 4 in Appendix A for a summary of the dataset statistics. For all the datasets, we report the results for the same test splits as those of Monti et al. (2017) and Berg et al. (2017). The compared methods are referenced in Table 1.

Proposed baselines. We report the results obtained using the methods discussed above, with the addition of the following method:

• SGMC-Z: a variant of SGMC that uses (22) as a data term. For this method we chose maximal values p_max, q_max (which can be larger than m, n) and a skip determining the spectral resolution, denoted by p_skip, q_skip. We use p = 1 + k·p_skip, q = 1 + k·q_skip, k ∈ N.

In addition, we add the diagonalization terms (24), (25), weighted by ρ_r, ρ_c respectively, to the SGMC/SGMC-Z methods. The optimization is carried out using gradient descent with a fixed step size (i.e., fixed learning rate), which is provided for each experiment alongside all the other hyper-parameters in Table 5.

Initialization.
All our methods are deterministic and did not require multiple runs to account for initialization. We always initialize the matrices P, C, Q with 10^{−α} I. In Figure 12 we report results on the synthetic Netflix and ML-100K datasets for different values of α. We noticed that for SGMC and SGMC-Z it is best to use 10^{−α} = 1. According to Gunasekar et al. (2017) and Li et al. (2017), DMF requires a large α to decrease the generalization error. We used DMF with 10^{−α} = 0.01 for Synthetic Netflix and 10^{−α} = 1 for the real-world datasets, in accordance with Figure 12 and our experimentation. In the cases where only one of the bases was available, such as in the Douban and Flixster-user-only benchmarks, we set the basis corresponding to the absent graph to identity.

Stopping condition. Our stopping condition for the gradient descent iterations is based on a validation set. We use 95% of the available entries for training (i.e., to construct the mask S) and the remaining 5% for validation. The 95/5 split was chosen at random. We stop the iterations when the RMSE (27), evaluated on the validation set, does not change by more than tol = 0.000001 between two consecutive iterations, |RMSE_k − RMSE_{k−1}| < tol. Since we did not apply any optimization to the choice of the validation set, we also report the best RMSE achieved on the test set via early stopping. In this regard, the number of iterations is yet another hyper-parameter that has to be tuned for best performance.

5.1. Cold start analysis

[Figure: comparison of test RMSE in the presence of cold-start users on the ML-100K dataset. The x-axis corresponds to the number of cold-start users, N_c = 50, 100, …, 500. Red, blue and green correspond to the DMF, SGMC and SGMC-Z methods, respectively, as shown in the legend. Different marker shapes indicate different numbers of maximum ratings (N_r = {1, 5, 10}) available per cold-start user.]
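The validation split and stopping rule described above can be sketched as follows (helper names are ours; the masked RMSE is eq. (27)):

```python
import numpy as np

def masked_rmse(X, M, S):
    """RMSE over the entries selected by the binary mask S, as in (27)."""
    return float(np.sqrt(np.sum(((X - M) * S) ** 2) / S.sum()))

def split_train_val(S, val_frac=0.05, seed=0):
    """Move a random val_frac of the observed entries into a validation mask."""
    rng = np.random.default_rng(seed)
    obs = np.flatnonzero(S)
    val = rng.choice(obs, size=max(1, int(round(val_frac * obs.size))),
                     replace=False)
    S_val = np.zeros_like(S)
    S_val.flat[val] = 1
    return S - S_val, S_val

def should_stop(rmse_history, tol=1e-6):
    """Stop when the validation RMSE changed by less than tol in the last step."""
    return len(rmse_history) >= 2 and abs(rmse_history[-1] - rmse_history[-2]) < tol
```

During training, one would evaluate `masked_rmse` on the validation mask after each iteration, append it to a history list, and break out of the gradient descent loop once `should_stop` returns True.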
A particularly interesting scenario in the context of recommender systems is the presence of cold-start users, referring to users who have not rated enough movies yet. We perform an analysis of the performance of our method in the presence of such cold-start users on the ML-100K dataset. In order to generate a dataset consisting of N_c cold-start users, we sort the users according to the number of ratings provided by each user, and retain at most N_r ratings (chosen randomly) of the bottom N_c users (i.e., the users who provided the least ratings).

Model                             | Synthetic Netflix | Flixster      | Douban | ML-100K
MC (Candès and Recht, 2009)       | –                 | 1.533         | 0.845  | 0.973
GMC (Kalofolias et al., 2014)     | 0.3693            | –             | –      | 0.996
GRALS (Rao et al., 2015)          | 0.0114            | 1.313 / 1.245 | 0.833  | 0.945
RGCNN (Monti et al., 2017)        | 0.0053 a          | 1.179 / 0.926 | 0.801  | 0.929
GC-MC (Berg et al., 2017)         | –                 | 0.941 / 0.917 | 0.734  | 0.910 b
FM (ours)                         | 0.0064            | 3.32          | 3.15   | 1.10
DMF (Arora et al., 2019), (ours)  | 0.0468 d          | 1.06          | 0.732  | 0.918 c / 0.922
SGMC (ours)                       | 0.0021            | 0.971 / 0.900 | 0.731  | 0.912
SGMC-Z (ours)                     | 0.0036            | 0.957 / 0.888 | 0.733  | 0.907 c / 0.913

a. This number corresponds to the inseparable version of MGCNN.
b. This number corresponds to GC-MC.
c. Early stopping.
d. Initialization with 0.01 I.

Table 1: RMSE test-set scores for runs on Synthetic Netflix (Monti et al., 2017), Flixster (Jamali and Ester, 2010), Douban (Ma et al., 2011), and Movielens-100K (Harper and Konstan, 2016). For Flixster, we show results for both user/item graphs (right number) and user graph only (left number). Baseline numbers are taken from (Monti et al., 2017; Berg et al., 2017).
W e choose the v al- ues N c = { 50 , 100 , 150 , 200 , 300 , 400 , 500 } and N r = { 1 , 5 , 10 } , and run our algorithms: DMF, SGMC and SGMC-Z, with the same hyperparameter settings used for obtaining T able 1 . W e use the official ML-100K test set for ev aluation. Similar to b efore, we use 5% of the training samples as a v alidation set used for determining the stopping condition. The results presented in the inline figure suggest that the SGMC and SGMC-Z outp erform DMF significantly , indicating the imp ortance of the geometry as data b ecomes scarcer. As exp ected, w e can see that the p erformance drops as the n um b er of ratings per user decreases. F urthermore, w e can observ e that SGMC-Z consistently outp erforms SGMC by a small margin. W e note that SGMC-Z, ev en in the presence of N c = 500 cold start users with N r = 5 ratings, is still able to outp erform the full data p erformance of Mon ti et al. ( 2017 ), demonstrating the strength of geometry and implicit lo w-rank induced by SGMC-Z. Scalabilit y . All the exp erimen ts presented in the pap er were conducted on a mac hine consisting of 64GB CPU memory , on an NVIDIA GTX 2080Ti GPU. Most of our large-scale exp erimen ts take upto 10-30 min utes of time un til con v ergence, therefore, are rather quick. In this work w e fo cused on the conceptual idea of solving matrix completion via the framework of deep matrix factorization by incorp orating geometric regularization, pa ying little atten tion to the issue of scalability . There are t w o main computational b ottlenec ks to our approach: The spatial v ersion ( 12 ) requires the computation of the matrix pro duct AZ B > in each gradien t iteration, and the sp ectral version requires also the eigenv alue decomp osition of L r , L c . These limitations apply to other graph neural netw orks as well ( Hu et al. , 2020 ), and w e b eliev e that they can b e at least partially addressed b y ad-ho c solutions. 5.2. 
5.2. Discussion

A few remarkable observations can be extracted from Table 1. First, on the Douban and ML-100K datasets, vanilla DMF shows competitive performance with all the other methods. This suggests that the geometric information is not very useful for these datasets. Second, the proposed SGMC algorithms outperform the other methods, despite their simple and fully linear architecture. This suggests that the other geometric methods do not exploit the geometry properly, a fact obscured by their cumbersome architectures. Third, while some of the experiments reported in Table 1 showed only slight margins in favor of SGMC/SGMC-Z compared to DMF, the results in the Synthetic Netflix column, those reported on Synthetic Movielens-100K (Table 3 in Appendix A), and those reported in Figure 2 suggest that when the geometric model is accurate, our methods demonstrate superior results.

Table 2 in Appendix A presents the results on Movielens-1M. First, we can deduce that the vanilla DMF model is able to match the performance of complex alternatives. Furthermore, using graphs produces slight improvements over the DMF baseline and overall provides competitive performance compared to heavily engineered methods. On Synthetic Netflix, we notice that by using SGMC we outperform Monti et al. (2017) by a significant margin, reducing the test RMSE by half. Additionally, it can be observed that DMF performs poorly on both synthetic datasets compared to SGMC/SGMC-Z, raising a question as to the quality of the graphs provided with those datasets on which DMF performed comparably. A compelling argument for this behaviour is given by Table 4 in Appendix A.
We can see that in the real datasets we tested on, the number of available samples is way below the density required by DMF to achieve good performance, in accordance with our findings in Section 4. With high-quality graphs, we would have expected SGMC to outperform DMF by a large margin.

6. Results on drug-target interaction

In this section we demonstrate the effectiveness of our approach on the problem of predicting drug-target interaction (DTI). The task is to find effective interactions between chemical compounds (drugs) and amino-acid sequences/proteins (targets). This is traditionally done through wet-lab experiments, which are costly and laborious, and lead to a high attrition rate. One possible way to improve this procedure is to predict interaction probabilities through a computational model. To that end, DTI can be interpreted as a matrix completion problem where the rows correspond to different drugs and the columns correspond to different targets. Each entry in the matrix corresponds to the probability of interaction between a drug and a target. We assume that we are given two graphs encoding similarities between drugs and similarities between targets. The similarity between two drugs is measured by the number of shared substructures within their chemical structures. The similarity between targets is given by their genomic sequence similarity. These similarity measures constitute a standard similarity score that is common in the DTI prediction task. For more information on the problem and the construction of the graphs, we refer to Mongia and Majumdar (2020) and references therein.

Datasets. We use three benchmark datasets introduced in Yamanishi et al. (2008), covering three different classes of proteins: enzymes (Es), ion channels (ICs), and G protein-coupled receptors (GPCRs). The data was collected from the public databases KEGG BRITE (Kanehisa et al.
, 2006), BRENDA (Schomburg et al., 2004), SuperTarget (Günther et al., 2007) and DrugBank (Wishart et al., 2008), and is publicly available³. The data from each of these databases is formatted as an adjacency matrix between drugs and targets, encoding the interaction as 1 if the drug-target pair is known to interact and 0 otherwise.

Baselines. We validated our proposed method by comparing it with three recent methods proposed in the literature: MGRNNM (Mongia and Majumdar, 2020), GRMF (Ezzat et al., 2016), and CMF (Zheng et al., 2013). For all the baselines we ran the publicly available code⁴ on the aforementioned datasets using the same graphs and the same train-test splits.

Evaluation protocol. Similarly to Mongia and Majumdar (2020), we performed 5 runs (with different random seeds) of 10-fold cross-validation for each of the algorithms under three cross-validation settings (CVS):

• CVS1/Pair prediction: random drug-target pairs are chosen for the test set. This is the conventional setting for validation and evaluation.

• CVS2/Drug prediction: complete drug profiles are left out of the training set, i.e., some rows are absent. This tests the algorithm's ability to predict interactions for novel drugs for which no interaction information is available.

• CVS3/Target prediction: complete target profiles are left out of the training set, i.e., some columns are absent. This tests the algorithm's ability to predict interactions for novel targets.

Out of the 10 folds, one was left out for testing, whereas the remaining 9 folds were used as the training set. To evaluate performance we measure the area under the ROC curve (AUC), the area under the precision-recall curve (AUPR), and RMSE. In biological drug discovery, AUPR is of more significance since it penalizes high-ranked false positive interactions much more than AUC.
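The three metrics can be computed with standard tooling; the following is an illustrative sketch (the function name is ours), with AUPR approximated by average precision, as is customary with scikit-learn:

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

def dti_metrics(y_true: np.ndarray, y_score: np.ndarray) -> dict:
    """AUC, AUPR and RMSE on held-out drug-target pairs.
    `y_true` holds binary interaction labels, `y_score` the predicted
    interaction probabilities for the same pairs."""
    return {
        "AUC": roc_auc_score(y_true, y_score),
        "AUPR": average_precision_score(y_true, y_score),
        "RMSE": float(np.sqrt(np.mean((y_true - y_score) ** 2))),
    }
```

Since the labels are binary and heavily skewed towards non-interactions, AUPR is the metric most sensitive to false positives among the top-ranked predictions.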
Those pairs (the top-ranked predicted interactions) would be biologically validated later in the drug-discovery process. The results are summarised in Table 6 in Appendix B.

Discussion. Table 6 clearly shows that SGMC mostly outperforms the other methods on all metrics, without applying any task-specific optimization of the loss function or other hyper-parameters. In particular, the RMSE criterion, which is the one optimized by all the methods, is significantly lower for SGMC compared to the other matrix factorization algorithms. This serves as further reinforcement of the strength of the implicit regularization in SGMC compared to the nuclear norm (Mongia and Majumdar, 2020) and explicit low-rank matrix factorization methods (Ezzat et al., 2016).

3. http://web.kuicr.kyoto-u.ac.jp/supp/yoshi/drugtarget/
4. https://github.com/aanchalMongia/MGRNNMforDTI

7. Related work

Geometric matrix completion. There is a vast literature on classical approaches to matrix completion, and covering it is beyond the scope of this paper. In recent years, the advent of deep learning platforms equipped with efficient automatic differentiation tools has allowed the exploration of sophisticated models that incorporate intricate regularizations. Some of these contemporary approaches to matrix completion fall under the umbrella term of geometric deep learning, which generalizes standard (Euclidean) deep learning to domains such as graphs and manifolds. For example, graph convolutional neural networks (GCNNs) follow the architecture of standard CNNs, but replace the Euclidean convolution operator with linear filters constructed using the graph Laplacian. We distinguish between inductive approaches to matrix completion, which work directly on the user and item features to predict the rating matrix (e.g., Berg et al.
(2017)), and transductive approaches, which make use of side information to construct graphs encoding relations between rows/columns (Kovnatsky et al., 2014; Kalofolias et al., 2014; Monti et al., 2017). More recently, it has been demonstrated that some graph CNN architectures can be greatly simplified and still perform competitively on several graph analysis tasks (Wu et al., 2019). Such simple techniques have the advantage of being easier to analyze and reproduce. One of the simplest notable approaches is deep linear networks, networks comprising only linear layers. While these networks are still mostly used for theoretical investigations, we note the recent results of Bell-Kligler et al. (2019), who successfully employed such a network for the task of blind image deblurring, and Richardson and Weiss (2020), who used it for image-to-image translation. Jing et al. (2020) showed that overparametrized linear layers can be used for implicit rank minimization within a generative model.

A closely related field dealing with the reconstruction of signals defined on graphs is graph signal processing. In this field the problem is attacked by extending results from harmonic analysis to problems defined on graphs. For example, Puy and Pérez (2018); Puy et al. (2018) have developed a random sampling strategy that provides reconstruction guarantees for bandlimited signals on graphs. The reconstruction is performed via minimizing an l2 data term with Dirichlet regularization (3). Random sampling schemes and reconstruction guarantees for bandlimited signals on product graphs were developed in Ortiz-Jiménez et al. (2018); Varma and Kovacevic (2018). While these results are extremely useful in designing sampling strategies for bandlimited signals on graphs, they are of less use when we are given the samples upfront and have no ability to control the sampling process.
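For concreteness, this kind of Dirichlet-regularized reconstruction reduces to a single linear solve. A minimal dense sketch (our illustration, loosely following the regularizer in (3), not the code of Puy et al.):

```python
import numpy as np

def dirichlet_reconstruct(L: np.ndarray, mask: np.ndarray,
                          y: np.ndarray, mu: float) -> np.ndarray:
    """Recover a smooth graph signal from the samples indicated by `mask`
    by minimizing ||x[mask] - y[mask]||^2 + mu * x^T L x.
    Setting the gradient to zero gives the linear system
    (S + mu * L) x = S y, with S = diag(mask)."""
    S = np.diag(mask.astype(float))
    return np.linalg.solve(S + mu * L, S @ y)
```

On a connected graph with at least one observed vertex, S + µL is positive definite, so the solution is unique; for large graphs one would replace the dense solve with a conjugate-gradient iteration.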
Nevertheless, their analysis sheds light on the success of spectral regularization in reconstruction problems on graphs, and we intend to integrate these ideas with our approach in the future.

Product manifold filter & ZoomOut. The inspiration for our paper stems from techniques for finding shape correspondence, in particular the functional maps framework and its variants (Ovsjanikov et al., 2012, 2016). Most notable are the work of Litany et al. (2017), who combined functional maps with joint diagonalization to solve partial shape matching problems, and the product manifold filter (PMF) (Vestner et al., 2017a,b) and ZoomOut (Melzi et al., 2019), two greedy algorithms for correspondence refinement by gradual introduction of high frequencies.

8. Conclusion

In this work we have proposed a simple spectral technique for matrix completion, building upon recent practical and theoretical results in geometry processing and deep linear networks. We have shown, through extensive experimentation on real and synthetic datasets across domains, that combining the implicit regularization of DMF with explicit, and possibly noisy, geometric priors can be extremely useful in data-poor regimes. Our work is a step towards building interpretable models that are grounded in theory, and proves that such simple models need not only be considered for theoretical study. Through a proper lens, they can be made useful.

Figure 6: A few lines of code.

Acknowledgments

We thank Angshul Majumdar for useful discussions, and for providing datasets and code for the matrix completion methods used in drug-target interaction prediction. This research was supported by ERC StG RAPID and ERC CoG EARS.

References

Sanjeev Arora, Nadav Cohen, and Elad Hazan. On the optimization of deep networks: Implicit acceleration by overparameterization, 2018.
Sanjeev Arora, Nadav Cohen, Wei Hu, and Yuping Luo. Implicit regularization in deep matrix factorization. arXiv preprint arXiv:1905.13655, 2019.

Sefi Bell-Kligler, Assaf Shocher, and Michal Irani. Blind super-resolution kernel estimation using an internal-GAN. In Advances in Neural Information Processing Systems 32, pages 284–293, 2019.

Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017.

Emmanuel J Candès and Benjamin Recht. Exact matrix completion via convex optimization. Foundations of Computational Mathematics, 9(6):717, 2009.

Gintare Karolina Dziugaite and Daniel M. Roy. Neural network matrix factorization. CoRR, abs/1511.06443, 2015.

Ali Ezzat, Peilin Zhao, Min Wu, Xiao-Li Li, and Chee-Keong Kwoh. Drug-target interaction prediction with graph regularized matrix factorization. IEEE/ACM Transactions on Computational Biology and Bioinformatics, 14(3):646–656, 2016.

Akshay Gadde, Sunil K Narang, and Antonio Ortega. Bilateral filter: Graph spectral interpretation and extensions. In 2013 IEEE International Conference on Image Processing, pages 1222–1226. IEEE, 2013.

Suriya Gunasekar, Blake E Woodworth, Srinadh Bhojanapalli, Behnam Neyshabur, and Nati Srebro. Implicit regularization in matrix factorization. In Advances in Neural Information Processing Systems, pages 6151–6159, 2017.

Stefan Günther, Michael Kuhn, Mathias Dunkel, Monica Campillos, Christian Senger, Evangelia Petsalaki, Jessica Ahmed, Eduardo Garcia Urdiales, Andreas Gewiess, Lars Juhl Jensen, et al. SuperTarget and Matador: resources for exploring drug-target relationships. Nucleic Acids Research, 36(suppl_1):D919–D922, 2007.

F Maxwell Harper and Joseph A Konstan. The MovieLens datasets: History and context. ACM Transactions on Interactive Intelligent Systems (TiiS), 5(4):19, 2016.
URL https://grouplens.org/datasets/movielens/.

Weihua Hu, Matthias Fey, Marinka Zitnik, Yuxiao Dong, Hongyu Ren, Bowen Liu, Michele Catasta, and Jure Leskovec. Open Graph Benchmark: Datasets for machine learning on graphs. arXiv preprint arXiv:2005.00687, 2020.

Mohsen Jamali and Martin Ester. A matrix factorization technique with trust propagation for recommendation in social networks. In Proceedings of the Fourth ACM Conference on Recommender Systems, pages 135–142. ACM, 2010.

Li Jing, Jure Zbontar, et al. Implicit rank-minimizing autoencoder. Advances in Neural Information Processing Systems, 33, 2020.

Vassilis Kalofolias, Xavier Bresson, Michael Bronstein, and Pierre Vandergheynst. Matrix completion on graphs. arXiv preprint arXiv:1408.1717, 2014.

Minoru Kanehisa, Susumu Goto, Masahiro Hattori, Kiyoko F Aoki-Kinoshita, Masumi Itoh, Shuichi Kawashima, Toshiaki Katayama, Michihiro Araki, and Mika Hirakawa. From genomics to chemical genomics: new developments in KEGG. Nucleic Acids Research, 34(suppl_1):D354–D357, 2006.

Yehuda Koren, Robert Bell, and Chris Volinsky. Matrix factorization techniques for recommender systems. Computer, 42(8):30–37, August 2009. ISSN 0018-9162.

Artiom Kovnatsky, Michael M. Bronstein, Xavier Bresson, and Pierre Vandergheynst. Functional correspondence by matrix completion, 2014.

Joonseok Lee, Seungyeon Kim, Guy Lebanon, Yoram Singer, and Samy Bengio. LLORMA: Local low-rank matrix approximation. Journal of Machine Learning Research, 17(15):1–24, 2016.

Yuanzhi Li, Tengyu Ma, and Hongyang Zhang. Algorithmic regularization in over-parameterized matrix sensing and neural networks with quadratic activations. arXiv preprint arXiv:1712.09203, 2017.

Or Litany, Emanuele Rodolà, Alexander M Bronstein, and Michael M Bronstein. Fully spectral partial shape matching.
In Computer Graphics Forum, volume 36, pages 247–258. Wiley Online Library, 2017.

Hao Ma, Dengyong Zhou, Chao Liu, Michael R Lyu, and Irwin King. Recommender systems with social regularization. In Proceedings of the Fourth ACM International Conference on Web Search and Data Mining, pages 287–296. ACM, 2011.

Simone Melzi, Jing Ren, Emanuele Rodola, Maks Ovsjanikov, and Peter Wonka. ZoomOut: Spectral upsampling for efficient shape correspondence. arXiv preprint arXiv:1904.07865, 2019.

Aanchal Mongia and Angshul Majumdar. Drug-target interaction prediction using multi graph regularized nuclear norm minimization. PLOS ONE, 15(1):e0226484, 2020.

Federico Monti, Michael Bronstein, and Xavier Bresson. Geometric matrix completion with recurrent multi-graph neural networks. In Advances in Neural Information Processing Systems, pages 3697–3707, 2017.

Guillermo Ortiz-Jiménez, Mario Coutino, Sundeep Prabhakar Chepuri, and Geert Leus. Sampling and reconstruction of signals on product graphs. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 713–717. IEEE, 2018.

Maks Ovsjanikov, Mirela Ben-Chen, Justin Solomon, Adrian Butscher, and Leonidas Guibas. Functional maps: a flexible representation of maps between shapes. ACM Transactions on Graphics (TOG), 31(4):30, 2012.

Maks Ovsjanikov, Etienne Corman, Michael Bronstein, Emanuele Rodolà, Mirela Ben-Chen, Leonidas Guibas, Frederic Chazal, and Alex Bronstein. Computing and processing correspondences with functional maps. In SIGGRAPH ASIA 2016 Courses, page 9. ACM, 2016.

Gilles Puy and Patrick Pérez. Structured sampling and fast reconstruction of smooth graph signals. Information and Inference: A Journal of the IMA, 7(4):657–688, 2018.

Gilles Puy, Nicolas Tremblay, Rémi Gribonval, and Pierre Vandergheynst. Random sampling of bandlimited signals on graphs.
Applied and Computational Harmonic Analysis, 44(2):446–475, 2018.

Nikhil Rao, Hsiang-Fu Yu, Pradeep K Ravikumar, and Inderjit S Dhillon. Collaborative filtering with graph information: Consistency and scalable methods. In Advances in Neural Information Processing Systems, pages 2107–2115, 2015.

Antoine Recanati. Relaxations of the Seriation Problem and Applications to de novo Genome Assembly. PhD thesis, 2018.

Eitan Richardson and Yair Weiss. The surprising effectiveness of linear unsupervised image-to-image translation. arXiv preprint arXiv:2007.12568, 2020.

Olivier Roy and Martin Vetterli. The effective rank: A measure of effective dimensionality. In 2007 15th European Signal Processing Conference, pages 606–610. IEEE, 2007.

Ruslan Salakhutdinov and Andriy Mnih. Probabilistic matrix factorization. In Proceedings of the 20th International Conference on Neural Information Processing Systems, NIPS'07, pages 1257–1264, USA, 2007. Curran Associates Inc. ISBN 978-1-60560-352-0.

Ruslan Salakhutdinov, Andriy Mnih, and Geoffrey Hinton. Restricted Boltzmann machines for collaborative filtering. In Proceedings of the 24th International Conference on Machine Learning, ICML '07, pages 791–798, New York, NY, USA, 2007. ACM. ISBN 978-1-59593-793-3.

Ida Schomburg, Antje Chang, Christian Ebeling, Marion Gremse, Christian Heldt, Gregor Huhn, and Dietmar Schomburg. BRENDA, the enzyme database: updates and major new developments. Nucleic Acids Research, 32(suppl_1):D431–D433, 2004.

Suvash Sedhain, Aditya Krishna Menon, Scott Sanner, and Lexing Xie. AutoRec: Autoencoders meet collaborative filtering. In Proceedings of the 24th International Conference on World Wide Web, WWW '15 Companion, pages 111–112, New York, NY, USA, 2015. ACM. ISBN 978-1-4503-3473-0.

Amit Singer, Yoel Shkolnisky, and Boaz Nadler.
Diffusion interpretation of nonlocal neighborhood filters for signal denoising. SIAM Journal on Imaging Sciences, 2(1):118–139, 2009.

Daniel Spielman. Spectral graph theory. Lecture Notes, Yale University, pages 740–0776, 2009.

Rohan A Varma and Jelena Kovacevic. Sampling theory for graph signals on product graphs. In 2018 IEEE Global Conference on Signal and Information Processing (GlobalSIP), pages 768–772. IEEE, 2018.

Matthias Vestner, Zorah Lähner, Amit Boyarski, Or Litany, Ron Slossberg, Tal Remez, Emanuele Rodola, Alex Bronstein, Michael Bronstein, Ron Kimmel, et al. Efficient deformable shape correspondence via kernel matching. In 2017 International Conference on 3D Vision (3DV), pages 517–526. IEEE, 2017a.

Matthias Vestner, Roee Litman, Emanuele Rodolà, Alex Bronstein, and Daniel Cremers. Product manifold filter: Non-rigid shape correspondence via kernel density estimation in the product space. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 3327–3336, 2017b.

David S Wishart, Craig Knox, An Chi Guo, Dean Cheng, Savita Shrivastava, Dan Tzur, Bijaya Gautam, and Murtaza Hassanali. DrugBank: a knowledgebase for drugs, drug actions and drug targets. Nucleic Acids Research, 36(suppl_1):D901–D906, 2008.

Felix Wu, Tianyi Zhang, Amauri Holanda de Souza Jr, Christopher Fifty, Tao Yu, and Kilian Q Weinberger. Simplifying graph convolutional networks. arXiv preprint arXiv:1902.07153, 2019.

Yoshihiro Yamanishi, Michihiro Araki, Alex Gutteridge, Wataru Honda, and Minoru Kanehisa. Prediction of drug-target interaction networks from the integration of chemical and genomic spaces. Bioinformatics, 24(13):i232–i240, 2008.

Xiaodong Zheng, Hao Ding, Hiroshi Mamitsuka, and Shanfeng Zhu. Collaborative matrix factorization with multiple similarities for predicting drug-target interactions.
In Proceedings of the 19th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1025–1033, 2013.

Yin Zheng, Bangsheng Tang, Wenkui Ding, and Hanning Zhou. A neural autoregressive approach to collaborative filtering. In Maria Florina Balcan and Kilian Q. Weinberger, editors, Proceedings of The 33rd International Conference on Machine Learning, volume 48 of Proceedings of Machine Learning Research, pages 764–773, New York, New York, USA, 20–22 Jun 2016. PMLR.

Appendix A. Recommendation systems

Ablation study. We study the effects of the different hyper-parameters of the algorithms on the final reconstruction of the matrix. We perform an ablation study on the effects of ρ, µ, p_max, q_max on DMF, SGMC and SGMC-Z. The results are summarized in Figures 7, 8, 9. It is interesting to note that in the case of DMF and SGMC, overparametrizing C, Q, P consistently improves the performance (see Figure 9), but this only holds up to a certain point, beyond which the overparametrization does not seem to affect the reconstruction error. Notice that in Table 5, µ_r, µ_c control the Dirichlet energy of rows and columns, while ρ_r, ρ_c govern the weights of the row/column diagonalization energy.

Synthetic MovieLens-100K. While the experiments reported in Table 1 showed slight margins in favor of methods using geometry, we further experimented with a synthetic model generated from the ML-100K dataset. The purpose of this experiment is to investigate whether the results are due to the DMF model or due to the geometry as incorporated by SGMC/SGMC-Z. The synthetic model was generated by projecting M on the first 50 eigenvectors of L_r, L_c, and then matching the ratings histogram with that of the original ML-100K dataset. This nonlinear operation increased the rank of the matrix from 50 to about 400.
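The two steps of this construction, band-limited projection onto the leading Laplacian eigenvectors and histogram matching, can be sketched as follows (our illustration; the exact matching procedure used for the synthetic dataset may differ):

```python
import numpy as np

def bandlimited_projection(M, Phi_r, Phi_c):
    """Project M onto the span of the leading row/column Laplacian
    eigenvectors: M -> Phi_r Phi_r^T M Phi_c Phi_c^T.
    Phi_r, Phi_c are assumed to have orthonormal columns."""
    return Phi_r @ (Phi_r.T @ M @ Phi_c) @ Phi_c.T

def match_histogram(X, reference):
    """Map the values of X onto the empirical distribution of
    `reference` by rank (standard histogram matching)."""
    flat = X.ravel()
    ranks = np.argsort(np.argsort(flat)) / (flat.size - 1)
    matched = np.quantile(np.asarray(reference).ravel(), ranks)
    return matched.reshape(X.shape)
```

Because the matching step is applied entrywise by rank, it is nonlinear and generically breaks the band-limited (low-rank) structure, which is consistent with the rank increase reported above.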
See Figure 10 in Appendix A for a visualization of the full matrix, the singular value distribution, and the users/items graphs. The test set and training set were generated randomly and are the same size as those of the original dataset. The results reported in Table 3 and those in the Synthetic Netflix column of Table 1 clearly indicate that SGMC/SGMC-Z outperform DMF, suggesting that when the geometric model is accurate it is possible to use it to improve the results.

Model                                 ML-1M
PMF (Salakhutdinov and Mnih, 2007)    0.883
I-RBM (Salakhutdinov et al., 2007)    0.854
BiasMF (Koren et al., 2009)           0.845
NNMF (Dziugaite and Roy, 2015)        0.843
LLORMA-Local (Lee et al., 2016)       0.833
I-AUTOREC (Sedhain et al., 2015)      0.831
CF-NADE (Zheng et al., 2016)          0.829
GC-MC (Berg et al., 2017)             0.832
DMF (Arora et al., 2019), (ours)      0.843
SGMC (ours)                           0.839

Table 2: Comparison of test RMSE scores on the Movielens-1M dataset. Baseline scores are taken from (Zheng et al., 2016; Berg et al., 2017).

Model    Synthetic ML-100K
DMF      0.9147
SGMC     0.5006
SGMC-Z   0.4777

Table 3: Comparison of the average RMSE of the DMF, SGMC and SGMC-Z baselines calculated on 5 randomly generated Synthetic Movielens-100K datasets.

Figure 7: Ablating ρ_r = ρ_c and µ_r = µ_c of SGMC on the ML-100K dataset. The rest of the parameters were set to the ones reported in Table 5. The green X denotes the baseline from Table 1.

Dataset             Users   Items   Features      Ratings     Density   Rating levels
Flixster            3,000   3,000   Users/Items   26,173      0.0029    0.5, 1, ..., 5
Douban              3,000   3,000   Users         136,891     0.0152    1, 2, ..., 5
MovieLens-100K      943     1,682   Users/Items   100,000     0.0630    1, 2, ..., 5
MovieLens-1M        6,040   3,706   Users/Items   1,000,209   0.0447    1, 2, ..., 5
Synthetic Netflix   150     200     Users/Items   4,500       0.15      1 ... 5 (a)
Synthetic ML-100K   943     1,682   Users/Items   100,000     0.0630    1, 2, ..., 5

(a) The ratings are not integer-valued.

Table 4: Number of users, items and ratings for the Flixster, Douban, Movielens-100K, Movielens-1M, Synthetic Netflix and Synthetic Movielens-100K datasets used in our experiments, and their respective rating density and rating levels.

Figure 8: Ablating ρ_r, ρ_c and µ_r, µ_c of SGMC-Z on the ML-100K dataset. The rest of the parameters were set to the ones reported in Table 5.

Figure 9: Effect of overparametrization: SGMC (left) and DMF (right). The x-axis indicates the values of p_max, q_max, and the y-axis presents the RMSE. The green X denotes the baseline from Table 1.

Figure 10: Synthetic Movielens-100K. Top left: full matrix. Top right: singular values of the full matrix. Bottom left & right: items & users graphs. Both graphs are constructed using 10 nearest neighbors.

Figure 11: Synthetic Netflix. Top left: full matrix. Top right: singular values of the full matrix. Bottom left & right: items & users graphs. Taken from (Monti et al., 2017).

Figure 12: Reconstruction error (on the test set) vs. scale of initialization. For each method we initialized P, Q with αI. SGMC consistently outperforms DMF for any initialization.

Dataset             Method   p_max/q_max   p_skip/q_skip   µ_r/µ_c           ρ_r/ρ_c           Trainable variables   Learning rate
Synthetic Netflix   DMF      200/200       −/−             −/−               −/−               P, C, Q               5 × 10^-5
                    FM       200/200       1/1             0.4/0.4           −/−               C                     5 × 10^-4
                    SGMC     20/20         −/−             0.001/0.001       0.1/−             P, C                  5 × 10^-3
                    SGMC-Z   500/500       3/1             0.4/0.4           0.1/0.1           P, C                  5 × 10^-5
Flixster            DMF      3000/3000     −/−             −/−               −/−               P, C, Q               1 × 10^-4
                    SGMC     3000/3000     −/−             0.0001/0.0001     0.0001/0.0001     P, C, Q               1 × 10^-4
                    SGMC-Z   200/200       2/2             0.0025/0.0025     −/−               P, C                  5 × 10^-6
Flixster            SGMC     3000/3000     −/−             0.0001/−          0.0001/−          P, C, Q               5 × 10^-5
(users only)        SGMC-Z   200/200       20/20           0.0025/−          0.001/−           P, C, Q               5 × 10^-7
Douban              DMF      3000/3000     −/−             −/−               −/−               P, C, Q               6 × 10^-6
                    SGMC     2500/2500     −/−             0.001/−           0.001/−           P, C, Q               2 × 10^-6
                    SGMC-Z   1000/1000     50/1000         0.011/0           0.004/0           P, C, Q               2 × 10^-6
ML-100K             DMF      2000/2000     −/−             −/−               −/−               P, C, Q               5 × 10^-5
                    SGMC     4000/4000     −/−             0.0003/0.0003     0.0001/0.0001     P, C, Q               5 × 10^-5
                    SGMC-Z   3200/3200     30/35           0.03/0.03         0.2/0.2           P, C, Q               3 × 10^-7
ML-1M               DMF      7000/7000     −/−             −/−               −/−               P, C, Q               1 × 10^-5
                    SGMC     7000/7000     −/−             0.0001/0.0001     −/−               P, C                  8 × 10^-5
Synthetic ML-100K   DMF      8000/8000     −/−             −/−               −/−               P, C, Q               9 × 10^-5
                    SGMC     600/600       −/−             0.001/0.001      0.009/0.009        P, C, Q               2 × 10^-5
                    SGMC-Z   500/500       3/1             0.001/0.001      0.009/0.009        P, C                  5 × 10^-6

Table 5: Hyper-parameter settings for the algorithms DMF, SGMC and SGMC-Z reported in Tables 1, 2, 3.

Appendix B.
Drug-target interaction

Es            MGRNNM            GRMF              CMF               SGMC
CVS1  AUC     0.9940 ± 0.0019   0.9900 ± 0.0017   0.8443 ± 0.0178   0.9967 ± 0.0016
      AUPR    0.9559 ± 0.0059   0.9295 ± 0.0081   0.6733 ± 0.0238   0.9729 ± 0.0033
      RMSE    0.0441 ± 0.0008   0.0476 ± 0.0008   0.2045 ± 0.0079   0.0432 ± 0.0007
CVS2  AUC     0.9333 ± 0.0229   0.9582 ± 0.0138   0.9183 ± 0.1068   0.9656 ± 0.0168
      AUPR    0.8350 ± 0.0364   0.8553 ± 0.0284   0.3406 ± 0.0726   0.8565 ± 0.0285
      RMSE    0.0776 ± 0.0065   0.0827 ± 0.0066   0.5427 ± 0.0628   0.0541 ± 0.0049
CVS3  AUC     0.9709 ± 0.0109   0.9674 ± 0.0159   0.8525 ± 0.0188   0.9760 ± 0.1009
      AUPR    0.0909 ± 0.0280   0.9011 ± 0.0292   0.1958 ± 0.0636   0.9134 ± 0.0373
      RMSE    0.0959 ± 0.0039   0.0925 ± 0.0036   0.1842 ± 0.0370   0.0512 ± 0.0037

GPCRs         MGRNNM            GRMF              CMF               SGMC
CVS1  AUC     0.9770 ± 0.0068   0.9765 ± 0.0061   0.9129 ± 0.0114   0.9831 ± 0.0065
      AUPR    0.7995 ± 0.023    0.8000 ± 0.0028   0.7306 ± 0.0164   0.8691 ± 0.02
      RMSE    0.1136 ± 0.0027   0.1139 ± 0.0026   0.7306 ± 0.0164   0.0954 ± 0.0026
CVS2  AUC     0.9664 ± 0.0087   0.9705 ± 0.0091   0.9601 ± 0.0153   0.9750 ± 0.0009
      AUPR    0.8936 ± 0.0187   0.8892 ± 0.0188   0.8754 ± 0.0364   0.8836 ± 0.0199
      RMSE    0.1440 ± 0.0070   0.1476 ± 0.0065   0.1381 ± 0.0151   0.1009 ± 0.0060
CVS3  AUC     0.8762 ± 0.0258   0.9297 ± 0.0170   0.7843 ± 0.0701   0.9299 ± 0.0258
      AUPR    0.6866 ± 0.0658   0.7149 ± 0.0493   0.2256 ± 0.1021   0.7232 ± 0.0566
      RMSE    0.1495 ± 0.0150   0.1499 ± 0.0173   1.5743 ± 0.2302   0.1179 ± 0.0099

ICs           MGRNNM            GRMF              CMF               SGMC
CVS1  AUC     0.9947 ± 0.0013   0.9922 ± 0.0015   0.8745 ± 0.0134   0.9964 ± 0.001
      AUPR    0.9584 ± 0.0038   0.9527 ± 0.0043   0.8172 ± 0.0259   0.9784 ± 0.0023
      RMSE    0.0874 ± 0.0047   0.0780 ± 0.0021   0.2487 ± 0.0085   0.0710 ± 0.0015
CVS2  AUC     0.0971 ± 0.0142   0.9689 ± 0.0138   0.9229 ± 0.0184   0.9714 ± 0.0156
      AUPR    0.9026 ± 0.0326   0.9014 ± 0.0314   0.6426 ± 0.0632   0.9044 ± 0.0308
      RMSE    0.1780 ± 0.0118   0.1548 ± 0.0122   0.4632 ± 0.1578   0.0948 ± 0.0117
CVS3  AUC     0.9547 ± 0.0188   0.9703 ± 0.0115   0.7781 ± 0.0344   0.9731 ± 0.0116
      AUPR    0.9030 ± 0.0341   0.9147 ± 0.0304   0.2198 ± 0.0580   0.9196 ± 0.0264
      RMSE    0.0901 ± 0.0084   0.1520 ± 0.0045   0.3598 ± 0.0615   0.0911 ± 0.0068

Table 6: Results obtained on three drug-target interaction datasets: enzymes (Es), ion channels (ICs), and G protein-coupled receptors (GPCRs). Each entry presents the mean and standard deviation across 5 runs (with different random seeds) of 10-fold cross-validation. Descriptions of the evaluated baselines are reported in the text. Colored in green are the cases where SGMC ranks first, and in red the cases where it ranks second or third.