Coupled Graphs and Tensor Factorization for Recommender Systems and Community Detection

Vassilis N. Ioannidis, Student Member, IEEE, Ahmed S. Zamzam, Student Member, IEEE, Georgios B. Giannakis, Fellow, IEEE, and Nicholas D. Sidiropoulos, Fellow, IEEE

Abstract—Joint analysis of data from multiple information repositories facilitates uncovering the underlying structure in heterogeneous datasets. Single and coupled matrix-tensor factorization (CMTF) has been widely used in this context for imputation-based recommendation from ratings, social network, and other user-item data. When this side information is in the form of item-item correlation matrices or graphs, existing CMTF algorithms may fall short. Alleviating current limitations, we introduce a novel model coined coupled graph-tensor factorization (CGTF) that judiciously accounts for graph-related side information. The CGTF model has the potential to overcome practical challenges, such as missing slabs from the tensor and/or missing rows/columns from the correlation matrices. A novel alternating direction method of multipliers (ADMM) is also developed that recovers the nonnegative factors of CGTF. Our algorithm enjoys closed-form updates that result in reduced computational complexity and allow for convergence claims. A novel direction is further explored by employing the interpretable factors to detect graph communities having the tensor as side information. The resulting community detection approach is successful even when some links in the graphs are missing. Results with real datasets corroborate the merits of the proposed methods relative to state-of-the-art competing factorization techniques in providing recommendations and detecting communities.

Index Terms—Tensor-matrix factorization, tensor-graph imputation, graph data, recommender systems, community detection.
1 INTRODUCTION

Multi-relational data emerge in applications as diverse as social networks, recommender systems, biomedical imaging, computer vision, and communication networks, and are typically modeled using high-order tensors [3]. However, in many real settings only a subset of the data is observed due to application-specific restrictions. For example, in recommender systems the ratings of new users are missing; in social applications individuals may be reluctant to share personal information due to privacy concerns; and brain data may contain misses due to inadequate spatial resolution. In this context, a task of paramount importance is to infer unavailable entries given the available data. Inference of unavailable tensor data can certainly benefit from side information that may be available in the form of correlations, social interactions, or biological relations, all of which can be captured by a graph [4]. In recommender systems, for instance, one may exploit available user-user interactions over a social network to impute the missing ratings, and also extrapolate (that is, predict) profitable recommendations to new customers.

• V. N. Ioannidis, A. S. Zamzam, and G. B. Giannakis are with the ECE Dept. and Digital Tech. Center, Univ. of Minnesota, Minneapolis, MN 55455, USA. E-mails: {ioann006, ahmedz, georgios}@umn.edu
• N. D. Sidiropoulos is with the ECE Dept., Univ. of Virginia, Charlottesville, VA 22903, USA. E-mail: nikos@virginia.edu
The work of V. N. Ioannidis and G. B. Giannakis was supported by NSF grants 1442686, 1514056, and 1711471. The work of A. S. Zamzam and N. D. Sidiropoulos was partially supported by NSF grant CIF-1525194. V. N. Ioannidis is also supported by the Doctoral Dissertation Fellowship from the University of Minnesota. Preliminary results of this work were presented in [1], [2]. A summary of differences is included in the supplementary material.
In addition to graph-aided inference of tensor data, benefits can be effected in the opposite direction, with tensor data employed to improve graph inference tasks such as community detection (CD). CD amounts to finding clusters of vertices densely connected within each cluster and scarcely connected across clusters [5]. A major challenge emerges when some links in the graph are missing due to privacy or observation constraints. In a social network, for example, not all users will provide their social connections. Additional data organized in a tensor can be utilized to improve CD performance and cope with the missing links of the graph. The present paper develops a novel approach to inference with incomplete data by jointly leveraging tensor factorization and associated graphs.

1.1 Related work

Matrix factorization (MF) techniques have been employed for matrix completion with documented success in user-item recommender systems [6]. MF-based techniques assume that the ratings matrix is of low rank, and hence can be modeled by a reduced number of factors. Although the two-relation recommendation model has wide applicability, multi-relation data motivate the use of high-dimensional tensor models. Scalable algorithms for nonnegative tensor factorization (TF) have been pursued [7], but they do not consider further structure on the tensor modes or any other form of side information. Side information in the form of matrices sharing factors with a data tensor has been investigated in the so-termed coupled matrix-tensor factorization (CMTF) [8], [9], [10]. Typically, CMTF adopts a low-rank model for the tensor to recover the missing entries. Misses in both the side information and the tensor were handled in [8], [9], but not with the use of graph adjacency matrices. Using a Bayesian approach, inference relying on tensor factorization with low-rank covariance regularization was reported in [11].
Albeit interesting, this approach assumes that the similarity matrices are fully observable, which is not the case in several applications, e.g., social networks.

1.2 Our Contributions

Alleviating the limitations of existing approaches, this paper introduces a novel factorization model coined coupled graph and tensor factorization (CGTF) to account for the graph structure of the side information. The CGTF factors are estimated via a novel algorithm based on the alternating direction method of multipliers (ADMM) to infer missing entries in both the matrices and the tensor. The CGTF is subsequently explored to detect communities in the partially observed coupled graphs. Specifically, the contribution of this paper is fourfold.

C1. A novel model is introduced to link multiple repositories of information bearing data and their correlations in the form of high-order tensors and graphs. The proposed approach can overcome practical challenges, such as missing slabs from the tensor and/or missing rows/columns from the correlation matrices (graph links), known as the cold start problem.

C2. A novel ADMM algorithm is developed that features convergence guarantees and low computational complexity by using closed-form updates. Our accelerated ADMM solver leverages data sparsity [9] and can easily incorporate other types of constraints on the latent factors.

C3. The proposed approach is applied to recommender systems and markedly improves rating prediction performance. The results on two real datasets corroborate that the novel method is successful in providing accurate recommendations as well as recovering missing links in graphs.

C4. Finally, the proposed coupled factorization approach enables detection of communities on graphs by using the recovered factors. Experiments testify to the ability of CGTF to exploit the tensor data for CD even when graph links are missing, e.g., in the cold start problem.
The novel contribution of this work concerning CD is in the coupling between tensor and graph data. Nodes in the recovered communities have similar graph connections and tensor data. Different from traditional CD methods [12], [13], [14], [15], [16], [17], [18] that find communities given only the graph, our CGTF finds communities from the coupled tensor and graph data.

The rest of this paper is organized as follows. Sec. II describes the model and the problem formulation. Sec. III introduces the novel algorithm, and Sec. IV deals with the application of CGTF to community detection. Sec. V demonstrates the effectiveness of the proposed approach on real and synthetic data. Finally, Sec. VI offers closing remarks.

Fig. 1: Illustration of the SNMF model (2) for G_1 with d_1 = 1. White cells correspond to small-value entries. The rows and columns of G_1 have been reorganized to place nodes of the same community one after the other.

Throughout, lower- and upper-case boldface letters denote vectors and matrices, respectively. Tensors are denoted by underlined upper-case boldface symbols. For a matrix $X$, $X^\top$, $X^{-1}$, $\mathrm{Tr}(X)$, and $\mathrm{diag}(X)$ denote its transpose, inverse, trace, and diagonal, respectively. The Khatri-Rao and Hadamard products of two matrices $X$ and $Y$ are denoted by $X \odot Y$ and $X * Y$, respectively. The operator $\mathrm{vec}(\cdot)$ denotes vectorization.

2 COUPLED FACTORIZATION MODEL

Consider a tensor $\underline{X}$ of order $N$ and size $I_1 \times I_2 \times \cdots \times I_N$. An entry of $\underline{X}$ is denoted by $[\underline{X}]_{(i_1, i_2, \ldots, i_N)}$, where index $i_n$ refers to the $n$-th mode of the tensor. The focus of this paper is on tensors with positive entries, which appear in diverse applications such as recommender systems, finance, and biology. The mode-$k$ matricization of $\underline{X}$ is denoted by the matrix $X_k$, which arranges the mode-$k$ one-dimensional fibers as columns of the resulting matrix; see [3] for details.
Without loss of generality, consider 3-way tensors $\underline{X} \in \mathbb{R}_+^{I_1 \times I_2 \times I_3}$. In many real settings, tensors have low rank and hence can be expressed via the well-known parallel factor (PARAFAC) decomposition [3], which models a rank-$R$ tensor as

$$[\underline{X}]_{(i_1,i_2,i_3)} = \sum_{r=1}^{R} [A_1]_{(i_1,r)} [A_2]_{(i_2,r)} [A_3]_{(i_3,r)} + [\underline{E}]_{(i_1,i_2,i_3)}$$

where $\{A_n \in \mathbb{R}_+^{I_n \times R}\}_{n=1}^{3}$ represent the low-rank factor matrices corresponding to the three modes of the tensor, and $\underline{E} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ captures model mismatch. The PARAFAC model is written in tensor-matrix form as

$$\underline{X} = [A_1, A_2, A_3] + \underline{E} \qquad (1)$$

where $[A_1, A_2, A_3]$ denotes the tensor formed by the sum of outer products of the corresponding columns of these matrices. Oftentimes, only a subset of the entries of $\underline{X}$ is observable due to application-specific constraints, such as privacy in social network applications, experimental error in the data collection process, or missing ratings in recommender systems. Hence, we write $\underline{X} = \underline{X}_A + \underline{X}_M$, where $\underline{X}_A$ contains the available tensor entries and is zero otherwise, while $\underline{X}_M$ holds the missing values and zeros elsewhere.

The tensor entries are also related through a set of per-mode graph adjacency or similarity matrices $\{G_n \in \mathbb{R}_+^{I_n \times I_n}\}_{n=1}^{3}$. The $(i, i')$-th entry of $G_n$ reflects the similarity between the $i$-th and $i'$-th data items of the $n$-th tensor mode, and thus $G_n$ captures the connectivity of the corresponding mode-$n$ graph. This prior information for the tensor entries is well motivated since network data are available across numerous disciplines, including sociology, biology, neuroscience, and engineering. In these domains, subsets of entries (here, graph nodes) form communities in the sense that they exhibit dense intra-connections and sparse inter-connections, which are captured by $G_n$. Such connections are common in, e.g., social networks [19], where friends tend to form dense clusters.
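The rank-$R$ PARAFAC synthesis in (1) can be sketched in a few lines of numpy; the function and variable names below are ours, not from the authors' code.

```python
import numpy as np

def parafac_reconstruct(A1, A2, A3):
    """Return the I1 x I2 x I3 tensor sum_r A1[:, r] (outer) A2[:, r] (outer) A3[:, r]."""
    return np.einsum('ir,jr,kr->ijk', A1, A2, A3)

# Toy factors with nonnegative entries, as assumed by the model.
rng = np.random.default_rng(0)
R = 4
A1, A2, A3 = (rng.uniform(size=(I, R)) for I in (6, 5, 3))
X = parafac_reconstruct(A1, A2, A3)
print(X.shape)  # (6, 5, 3)
```

Each entry of `X` equals the triple sum over $r$ in the displayed equation (noise-free case), so a low-rank tensor of any size is generated from just the three thin factor matrices.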
We will model this graph-induced side information on tensor data using a symmetric nonnegative matrix factorization (SNMF) model [16], which can efficiently provide identifiable factors and recover graph clusters. Specifically, we advocate the following diagonally-scaled SNMF model

$$G_n = A_n \,\mathrm{diag}(d_n)\, A_n^\top + V_n, \qquad n = 1, 2, 3 \qquad (2)$$

where $\{V_n \in \mathbb{R}^{I_n \times I_n}\}_n$ capture modeling errors; $\{d_n \in \mathbb{R}_+^{R}\}_n$ weight the factor matrices; and $\{A_n \in \mathbb{R}^{I_n \times R}\}_n$ denote factor matrices of rank $R < I_n$ that readily reveal the communities in the graphs corresponding to $\{G_n\}_n$ [16], [20]. Recovering the community of the $i$-th node in the $n$-th graph is straightforward: select the largest entry in the $i$-th row of $A_n$ [16], [20]; see Fig. 1. Unfortunately, the topologies of $\{G_n\}$ may contain missing entries, which can be attributed to privacy concerns in social networks or to down-sampling of massive networks. Hence, the graph matrices are modeled as $G_n = G_n^A + G_n^M$, where $G_n^A$ contains the available links and $G_n^M$ holds the unavailable ones.

The factors $\{A_n\}_n$ are shared between the tensor and the graph of each corresponding mode, which justifies naming the proposed model coupled graph tensor factorization. Whereas classical CMTF approaches model the side information as $A_n B_n^\top$, the novel CGTF captures the graph structure by employing $A_n \mathrm{diag}(d_n) A_n^\top$. The diagonal loading matrices endow the model with the ability to adjust the relative weight between the tensor and the side-information matrices. The novel CGTF model is depicted in Fig. 2.

Problem statement. Given $\underline{X}_A$ and $\{G_n^A\}_{n=1}^{3}$, our goal is to estimate $\underline{X}_M$ and $\{A_n, d_n, G_n^M\}_{n=1}^{3}$ by employing the CGTF model in (1) and (2). As a byproduct, the recovered $\{A_n, d_n\}_{n=1}^{3}$ will be utilized to detect communities.
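The diagonally-scaled SNMF model (2) and the row-argmax community rule can be sketched as follows; the matrix `A` and vector `d` below are toy illustrative values, not data from the paper.

```python
import numpy as np

def snmf_graph(A, d):
    """Noise-free part of model (2): G_n = A_n diag(d_n) A_n^T."""
    return A @ np.diag(d) @ A.T

def community_of(A):
    """Assign node i to the community of the largest entry in row i of A_n."""
    return A.argmax(axis=1)

# Two planted communities: rows 0-2 load on factor 0, rows 3-5 on factor 1.
A = np.array([[0.9, 0.1], [0.8, 0.2], [0.7, 0.0],
              [0.1, 0.9], [0.0, 0.8], [0.2, 0.7]])
G = snmf_graph(A, np.array([1.0, 1.0]))
print(community_of(A))  # [0 0 0 1 1 1]
```

By construction `G` is symmetric and has larger entries between nodes of the same planted community, which is exactly the block pattern illustrated in Fig. 1.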
3 COUPLED GRAPH TENSOR FACTORIZATION

Given (1) and (2), this section develops a novel algorithm to infer the latent factor matrices and hence estimate $\underline{X}_M$ and $\{G_n^M\}_{n=1}^{3}$.

Fig. 2: Illustration of the tensor and graphs that partake in the CGTF model. The heat maps suggest that $\{G_n\}_{n=1}^{3}$ exhibit community structure.

To this end, consider the optimization task

$$\begin{aligned}
\min_{\underline{X}_M, \{A_n, d_n, G_n^M\}_{n=1}^{3}} \;& \| \underline{X} - [A_1, A_2, A_3] \|_F^2 + \mu \sum_{n=1}^{3} \| G_n - A_n \mathrm{diag}(d_n) A_n^\top \|_F^2 \\
\text{s.t.} \;& A_n \ge 0, \; d_n \ge 0, \\
& \underline{X} = \underline{X}_A + \underline{X}_M, \quad G_n = G_n^A + G_n^M, \\
& P_\Omega(\underline{X}_M) = 0, \quad P_{\Omega_n}(G_n^M) = 0, \quad n = 1, 2, 3
\end{aligned} \qquad (3)$$

where $\mu > 0$ tunes the relative importance of the fit between the tensor and the graph-induced side information. The first term accounts for the LS fitting error of the PARAFAC model (1), and the second sum of LS costs accounts for the SNMF model (2). The nonnegativity constraints stem from prior knowledge on the factor and diagonal matrices. The equality constraints force $\underline{X}$ and $\{G_n\}_{n=1}^{3}$ to equal $\underline{X}_A$ and $\{G_n^A\}_{n=1}^{3}$ at the observed entries and the optimization variables $\underline{X}_M$ and $\{G_n^M\}_{n=1}^{3}$ otherwise. The operators $P_\Omega$ and $P_{\Omega_n}$ set the optimization variables to zero at the observed entries. The optimization problem in (3) is non-convex due to the trilinear terms $[A_1, A_2, A_3]$ and $A_n \mathrm{diag}(d_n) A_n^\top$. The next section develops an efficient solver for (3) based on the ADMM [7].

Remark 1. In some applications, a graph $G_n$ may not be available for one or more modes $n$ of the tensor. In that case, before solving (3) one may remove the corresponding fitting term $\| G_n - A_n \mathrm{diag}(d_n) A_n^\top \|_F^2$ and the related graph constraints. As a byproduct, our novel framework may utilize the recovered factor $A_n$ to obtain a similarity matrix $G_n$ via (2).
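The objective of (3) can be evaluated on the observed entries only via 0/1 masks, since fitting the available data while carrying $\underline{X}_M$ and $G_n^M$ as free variables is equivalent to masking the residual; this is a minimal sketch under that reading, with all helper names ours rather than the paper's.

```python
import numpy as np

def cgtf_cost(X, W, factors, ds, graphs, graph_masks, mu):
    """CGTF cost (3): masked PARAFAC fit plus mu times masked SNMF fits."""
    A1, A2, A3 = factors
    Xhat = np.einsum('ir,jr,kr->ijk', A1, A2, A3)   # PARAFAC synthesis (1)
    cost = np.sum((W * (X - Xhat)) ** 2)            # tensor LS fit, observed entries
    for A, d, G, Wn in zip(factors, ds, graphs, graph_masks):
        Ghat = A @ np.diag(d) @ A.T                 # SNMF model (2)
        cost += mu * np.sum((Wn * (G - Ghat)) ** 2)
    return cost
```

On noise-free data generated exactly by (1) and (2), this cost is zero at the true factors, which is a useful sanity check for any solver of (3).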
3.1 ADMM for CGTF

First, notice that the optimization problem (3) is non-convex even with respect to each $A_n$ separately, due to the product of factor matrices in the SNMF model. This poses an additional challenge to any ADMM algorithm that iteratively pursues per-block minimizers of the augmented Lagrangian. Hence, we introduce the auxiliary variables $\{\bar{A}_n\}_n$ and rewrite the SNMF cost as

$$\| G_n - A_n \mathrm{diag}(d_n) \bar{A}_n^\top \|_F^2. \qquad (4)$$

Furthermore, to handle the nonnegativity constraints, we introduce

$$g(M) = \begin{cases} 0, & \text{if } M \ge 0 \\ \infty, & \text{otherwise} \end{cases} \qquad (5)$$

and the auxiliary variables $\{\tilde{A}_n, \tilde{d}_n\}_n$. Next, we rewrite (3) in the equivalent form

$$\begin{aligned}
\min_{\underline{X}_M, \{A_n, \bar{A}_n, \tilde{A}_n, d_n, \tilde{d}_n, G_n^M\}_{n=1}^{3}} \;& \| \underline{X} - [A_1, A_2, A_3] \|_F^2 + \mu \sum_{n=1}^{3} \| G_n - A_n \mathrm{diag}(d_n) \bar{A}_n^\top \|_F^2 \\
&+ \sum_{n=1}^{3} g(\tilde{A}_n) + \sum_{n=1}^{3} g(\tilde{d}_n) \\
\text{s.t.} \;& A_n = \bar{A}_n, \quad A_n = \tilde{A}_n, \quad d_n = \tilde{d}_n, \\
& \underline{X} = \underline{X}_A + \underline{X}_M, \quad G_n = G_n^A + G_n^M, \\
& P_\Omega(\underline{X}_M) = 0, \quad P_{\Omega_n}(G_n^M) = 0, \quad n = 1, 2, 3.
\end{aligned} \qquad (6)$$

Even though (6) is still non-convex in all the variables jointly, it is convex with respect to each block variable separately. Towards deriving an ADMM solver, we introduce the dual variables $\{Y_{\bar{A}_n} \in \mathbb{R}^{I_n \times R}, Y_{\tilde{A}_n} \in \mathbb{R}^{I_n \times R}, y_{\tilde{d}_n} \in \mathbb{R}^{R}\}_n$ and the penalty parameters $\{\rho_{\bar{A}_n} > 0, \rho_{\tilde{A}_n} > 0, \rho_{\tilde{d}_n} > 0\}_n$. The augmented Lagrangian is given in (7), where $f(\cdot)$ represents the cost function in (6) and we collect all factor variables in $\Phi := (\{A_n, \bar{A}_n, \tilde{A}_n, d_n, \tilde{d}_n\}_{n=1}^{3})$. For ease of notation, no ADMM superscripts will be used in the following equations, and for brevity only the ADMM updates for $n = 1$ will be presented.

The update for $A_1$ is obtained by taking the derivative of $L$ in (7) with respect to (w.r.t.)
$A_1$ and equating it to zero, which yields

$$\hat{A}_1 \left( M_1^\top M_1 + \mu D_1 \bar{A}_1^\top \bar{A}_1 D_1 + (\rho_{\tilde{A}_1} + \rho_{\bar{A}_1}) I_R \right) = X_1 M_1 + \mu G_1 \bar{A}_1 D_1 + \rho_{\bar{A}_1} \bar{A}_1 + \rho_{\tilde{A}_1} \tilde{A}_1 - Y_{\tilde{A}_1} - Y_{\bar{A}_1} \qquad (8a)$$

where $M_1 := A_3 \odot A_2$ and $D_1 := \mathrm{diag}(d_1)$. The update for $d_1$ is obtained likewise as

$$\left( \mu (\bar{A}_1 \odot A_1)^\top (\bar{A}_1 \odot A_1) + \rho_{\tilde{d}_1} I_R \right) \hat{d}_1 = \mu (\bar{A}_1 \odot A_1)^\top g_1 + \rho_{\tilde{d}_1} \tilde{d}_1 - y_{\tilde{d}_1} \qquad (8b)$$

where $g_n := \mathrm{vec}(G_n)$. Accordingly, the update for $\bar{A}_1$ is given by

$$\hat{\bar{A}}_1 \left( \mu D_1 A_1^\top A_1 D_1 + \rho_{\bar{A}_1} I_R \right) = \mu G_1^\top A_1 D_1 + \rho_{\bar{A}_1} A_1 + Y_{\bar{A}_1}. \qquad (8c)$$

The auxiliary variables $\tilde{A}_1$ and $\tilde{d}_1$ are updated by projecting onto the nonnegative orthant as

$$\hat{\tilde{A}}_1 = \left[ A_1 + \tfrac{1}{\rho_{\tilde{A}_1}} Y_{\tilde{A}_1} \right]_+, \qquad \hat{\tilde{d}}_1 = \left[ d_1 + \tfrac{1}{\rho_{\tilde{d}_1}} y_{\tilde{d}_1} \right]_+. \qquad (8d)$$

Using the estimated factors $\{\hat{A}_n\}_n$, the missing tensor entries are updated as

$$\hat{\underline{X}}_M = P_\Omega\left( [\hat{A}_1, \hat{A}_2, \hat{A}_3] \right). \qquad (8e)$$

Similarly, the missing entries of $G_1$ are obtained as

$$\hat{G}_1^M = P_{\Omega_1}\left( \hat{A}_1 \mathrm{diag}(\hat{d}_1) \hat{\bar{A}}_1^\top \right). \qquad (8f)$$

Finally, the updates for the Lagrange multipliers are

$$\begin{aligned}
Y_{\bar{A}_1} &\leftarrow Y_{\bar{A}_1} + \rho_{\bar{A}_1} (A_1 - \bar{A}_1) \\
Y_{\tilde{A}_1} &\leftarrow Y_{\tilde{A}_1} + \rho_{\tilde{A}_1} (A_1 - \tilde{A}_1) \\
y_{\tilde{d}_1} &\leftarrow y_{\tilde{d}_1} + \rho_{\tilde{d}_1} (d_1 - \tilde{d}_1).
\end{aligned} \qquad (8g)$$

The steps of our CGTF algorithm are listed in Algorithm 1. Since (6) is a non-convex problem, a judicious initialization of $\{A_n\}_n$ is required. Towards that end, we adopt an efficient algorithm for SNMF [21] to initialize the factor matrices using only the available elements of the corresponding graphs $\{G_n^A\}$, while $\{d_n\}$ are initialized as all-ones vectors. Since SNMF is unique under certain conditions, the initialization is likely to be a good one [21]. The ADMM algorithm stops when the primal residuals and the dual feasibility residuals are sufficiently small. Even though $\{\tilde{A}_n, \tilde{d}_n\}_n$ are by construction nonnegative, $\{A_n, \bar{A}_n, d_n\}_n$ are not necessarily nonnegative, but they become so upon convergence.
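The closed-form update (8a) amounts to solving an $R \times R$ linear system in which the unknown multiplies from the left; a minimal numpy sketch follows, with variable names ours rather than from the authors' implementation.

```python
import numpy as np

def update_A1(X1, M1, G1, A_bar, A_til, d, Y_til, Y_bar, rho_til, rho_bar, mu):
    """Solve the update (8a): A1_hat @ lhs = rhs, with lhs of size R x R."""
    R = M1.shape[1]
    D = np.diag(d)
    lhs = M1.T @ M1 + mu * D @ A_bar.T @ A_bar @ D + (rho_til + rho_bar) * np.eye(R)
    rhs = X1 @ M1 + mu * G1 @ A_bar @ D + rho_bar * A_bar + rho_til * A_til - Y_til - Y_bar
    # lhs is symmetric positive definite for rho > 0, so a direct solve suffices:
    # A1_hat = rhs @ inv(lhs) = solve(lhs, rhs^T)^T.
    return np.linalg.solve(lhs, rhs.T).T
```

Here `X1` stands for the mode-1 unfolding and `M1` for the Khatri-Rao product $A_3 \odot A_2$; the updates (8b) and (8c) follow the same small-linear-system pattern, which is what keeps the per-iteration complexity low.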
The advantage of introducing the auxiliary variables is threefold. First, by employing $\bar{A}_n$, we bypass solving the non-convex SNMF subproblem, which would require a costly iterative algorithm per factor update. Second, by introducing $\{\tilde{A}_n, \tilde{d}_n\}$, we avoid solving a constrained optimization problem, resulting in a more computationally affordable update compared to constrained least-squares based algorithms. In a nutshell, our novel reformulation allows for closed-form updates per step of the ADMM solver. Lastly, the closed-form updates allow us to establish convergence to a stationary point of (6) in Sec. 3.2. The augmented Lagrangian in (7) is

$$\begin{aligned}
L\left( \underline{X}_M, \Phi, \{G_n^M, Y_{\bar{A}_n}, Y_{\tilde{A}_n}, y_{\tilde{d}_n}\}_{n=1}^{3} \right) := \;& f\left( \underline{X}_M, \Phi, \{G_n^M\}_{n=1}^{3} \right) + \sum_{n=1}^{3} \Big[ \mathrm{Tr}\left( Y_{\bar{A}_n}^\top (A_n - \bar{A}_n) \right) + \frac{\rho_{\bar{A}_n}}{2} \| A_n - \bar{A}_n \|_F^2 \\
&+ \mathrm{Tr}\left( Y_{\tilde{A}_n}^\top (A_n - \tilde{A}_n) \right) + \frac{\rho_{\tilde{A}_n}}{2} \| A_n - \tilde{A}_n \|_F^2 + y_{\tilde{d}_n}^\top (d_n - \tilde{d}_n) + \frac{\rho_{\tilde{d}_n}}{2} \| d_n - \tilde{d}_n \|_F^2 \Big].
\end{aligned} \qquad (7)$$

Remark 2. The era of data science brings opportunities for adversaries aiming to corrupt the data; e.g., recommendation data may be corrupted by malicious users providing fake ratings, or social networks may contain spamming users. The CGTF model can be extended to account for anomalies in the graph links and the tensor data. Specifically, consider the robust CGTF (R-CGTF) model $\underline{X} = [A_1, A_2, A_3] + \underline{O} + \underline{E}$ and $G_n = A_n \mathrm{diag}(d_n) A_n^\top + O_n + V_n$ for the tensor and the graph matrices, respectively. The variables $\underline{O} \in \mathbb{R}^{I_1 \times I_2 \times I_3}$ and $\{O_n \in \mathbb{R}^{I_n \times I_n}\}_n$ model the anomalies in the tensor and graphs; since such anomalies occur infrequently, most entries of $\underline{O}$ and $\{O_n\}_n$ are zero. The optimization (3) and the ADMM solver can be readily modified to obtain sparse estimates of $\underline{O}$ and $\{O_n\}_n$ as well; see, e.g., [22] and [20].

3.2 Convergence

Here, convergence of Algorithm 1 is examined when all the measurements are available, i.e., $\{G_n^A = G_n\}_n$ and $\underline{X}_A = \underline{X}$; the extension to the case with misses is straightforward [23].
A point $\Phi := (\{A_n, \bar{A}_n, \tilde{A}_n, d_n, \tilde{d}_n\}_{n=1}^{3})$ satisfies the Karush-Kuhn-Tucker (KKT) conditions for problem (6) if there exist dual variables $\Psi := (\{Y_{\bar{A}_n}, Y_{\tilde{A}_n}, y_{\tilde{d}_n}\}_{n=1}^{3})$ such that

$$\begin{aligned}
& (X_n - A_n M_n^\top) M_n + \mu (G_n - A_n D_n \bar{A}_n^\top) \bar{A}_n D_n - Y_{\tilde{A}_n} - Y_{\bar{A}_n} = 0 \\
& \mu (\bar{A}_n \odot A_n)^\top \left( g_n - (\bar{A}_n \odot A_n) d_n \right) - y_{\tilde{d}_n} = 0 \\
& \mu (G_n - \bar{A}_n D_n A_n^\top) A_n D_n - Y_{\bar{A}_n} = 0 \\
& A_n - \bar{A}_n = 0, \quad A_n - \tilde{A}_n = 0, \quad d_n - \tilde{d}_n = 0 \\
& Y_{\tilde{A}_n} \le 0, \; \tilde{A}_n \ge 0, \quad y_{\tilde{d}_n} \le 0, \; \tilde{d}_n \ge 0 \\
& Y_{\tilde{A}_n} * \tilde{A}_n = 0, \quad y_{\tilde{d}_n} * \tilde{d}_n = 0, \quad n = 1, 2, 3.
\end{aligned} \qquad (9)$$

Proposition 1. Let $\{\Phi^l, \Psi^l\}_l$ be a sequence generated by Algorithm 1. If the sequence of dual variables $\{\Psi^l\}_l$ is bounded and satisfies

$$\sum_{l=0}^{\infty} \| \Psi^{l+1} - \Psi^l \|_F^2 < \infty \qquad (10)$$

then any accumulation point of $\{\Phi^l\}_l$ satisfies the KKT conditions of (6). Hence, any accumulation point of $\{\{A_n^l, d_n^l\}_{n=1}^{3}\}_l$ satisfies the KKT conditions of problem (3).

Proof: See Sec. 7.

Proposition 1 suggests that upon convergence of the dual variables $\{\Psi^l\}_l$, the sequence $\{\Phi^l\}_l$ reaches a KKT point. Note that the closed-form updates of Algorithm 1 are instrumental in establishing the convergence claim. Empirical convergence is demonstrated with numerical tests in Sec. 5.

4 COMMUNITY DETECTION VIA CGTF

A task of major practical importance in network science is the identification of groups of vertices, or communities, that are more densely connected to each other than to the rest of the nodes in the network. Community detection unveils the structure of the network and facilitates a number of applications. For example, clustering web clients improves the performance of web services, identifying communities among customers leads to accurate recommendations, and grouping proteins based on their dependencies enables the development of targeted drugs [5].
This section exemplifies how the novel CGTF can recover the communities in graphs even when some links are missing, which is known as the cold start problem.

Algorithm 1 ADMM for CGTF
Input: $\underline{X}_A$ and $\{G_n^A\}_{n=1}^{3}$
1: Initialization: SNMF for $\{A_n\}_n$ using [21].
2: while iterates have not converged do
3:   Update $\hat{A}_n$ using (8a).
4:   Update $\hat{d}_n$ using (8b).
5:   Update $\hat{\bar{A}}_n$ using (8c).
6:   Update $\{\hat{\tilde{A}}_n, \hat{\tilde{d}}_n\}$ using (8d).
7:   Update $\hat{\underline{X}}_M$ using (8e).
8:   Update $\hat{G}_n^M$ using (8f).
9:   Update the Lagrange multipliers using (8g).
10: end while
Output: $\hat{\underline{X}}_M$, $\{\hat{A}_n, \hat{d}_n, \hat{G}_n^M\}_n$

Community detection methods aim to learn, for each node $i \in \{1, \ldots, I_n\}$ of $G_n$, a mapping to a cluster assignment $\alpha_{n,i} \in \{1, \ldots, C_n\}$, where $C_n$ is the number of communities in the $n$-th graph. Collecting all the nodal assignments, one seeks an $I_n \times 1$ vector $\alpha_n := [\alpha_{n,1}, \ldots, \alpha_{n,I_n}]^\top$. If $C_n$ is not known a priori, the recovered factor $A_n$ can be directly utilized to provide a community assignment. First, we scale $A_n$ to account for the weighting vector: $C_n := A_n \mathrm{diag}(\sqrt{d_n})$. The largest entry in each row of $C_n$ indicates the clustering assignment [16]. Specifically, we estimate the community assignment of node $i$ as

$$\hat{\alpha}_{n,i} = \arg\max_{r = 1, \ldots, R} [C_n]_{(i,r)} \qquad (11)$$

and $\hat{\alpha}_n := [\hat{\alpha}_{n,1}, \ldots, \hat{\alpha}_{n,I_n}]^\top$ is the estimated assignment vector. Hence, in lieu of prior information about $C_n$, we implicitly assume that $C_n = R$ for $n = 1, 2, 3$. Oftentimes, $C_n$ is available in CD problems. If $C_n \ne R$, one cannot directly apply (11) to recover the communities. In such a case, we regard $C_n$ as a representation of $G_n$ in a latent space of lower dimension and apply the celebrated k-means algorithm [24] to obtain

$$\hat{\alpha}_n = k\text{-means}(C_n, C_n). \qquad (12)$$

The community assignment procedure is summarized in Algorithm 2. Note that the discussed method amounts to a hard community assignment, in the sense that each node is assigned to exactly one community.
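The assignment step above can be sketched as follows: rule (11) when the community count matches $R$, and otherwise clustering the rows of $C_n$ as in (12). The tiny Lloyd iteration below merely stands in for the k-means of [24], and all names are ours.

```python
import numpy as np

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd iteration: stand-in for the k-means used in (12)."""
    rng = np.random.default_rng(seed)
    centers = points[rng.choice(len(points), k, replace=False)]
    for _ in range(iters):
        labels = ((points[:, None] - centers[None]) ** 2).sum(-1).argmin(1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = points[labels == j].mean(axis=0)
    return labels

def assign_communities(A, d, n_comms=None):
    C = A * np.sqrt(d)                    # C_n = A_n diag(sqrt(d_n)) by broadcasting
    if n_comms is None or n_comms == A.shape[1]:
        return C.argmax(axis=1)           # hard assignment, rule (11)
    return kmeans(C, n_comms)             # latent-space clustering, rule (12)
```

With `n_comms=None` the function reproduces the row-argmax rule of Fig. 1; otherwise it treats the rows of $C_n$ as low-dimensional node embeddings and clusters them.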
Nonetheless, the factors can be utilized to perform soft community assignment, where one node may belong to more than one community. If the rows of $C_n$ are normalized to sum to 1, $[C_n]_{(i,r)}$ can be interpreted as the probability that the $i$-th node belongs to the $r$-th community.

4.1 Community detection evaluation

For a graph of $I$ nodes with adjacency matrix $G$, we define the cover set $S := \{\mathcal{C}_c\}_{c=1}^{C}$, where $\mathcal{C}_c$ contains all the nodes that belong to community $c$ as captured by the assignment vector $\alpha$, i.e., $\mathcal{C}_c := \{i \,|\, \alpha_i = c\}$. The estimated cover set $\hat{S}$ is defined analogously using $\hat{\alpha}$ from Algorithm 2.

Algorithm 2 Community detection via CGTF
Input: $\underline{X}_A$, $\{G_n^A\}_{n=1}^{3}$, and $\{C_n\}_{n=1}^{3}$
1: Initialization: Algorithm 1 for $\{A_n, d_n\}_{n=1}^{3}$.
2: for $n = 1, 2, 3$ do
3:   $C_n := A_n \mathrm{diag}(\sqrt{d_n})$
4:   if $C_n = R$ then
5:     Compute $\hat{\alpha}_n$ using (11)
6:   else
7:     Compute $\hat{\alpha}_n$ using (12)
Output: $\{\hat{\alpha}_n\}_n$

For networks with ground-truth communities, we employ the normalized mutual information (NMI) metric [25] to evaluate the recovered communities $\hat{\alpha}$. The NMI takes values between 0 and 1 and is defined as

$$\mathrm{NMI} := \frac{2\, I(S, \hat{S})}{H(S) + H(\hat{S})} \qquad (13)$$

where $H$ denotes the entropy ($|\mathcal{C}|$ is the cardinality of $\mathcal{C}$)

$$H(S) := -\sum_{c=1}^{C} \frac{|\mathcal{C}_c|}{I} \log \frac{|\mathcal{C}_c|}{I} \qquad (14)$$

and $I(S, \hat{S})$ stands for the mutual information (MI) between $S$ and $\hat{S}$, defined as

$$I(S, \hat{S}) := \sum_{c=1}^{C} \sum_{c'=1}^{\hat{C}} \frac{|\hat{\mathcal{C}}_{c'} \cap \mathcal{C}_c|}{I} \log \frac{I\, |\hat{\mathcal{C}}_{c'} \cap \mathcal{C}_c|}{|\hat{\mathcal{C}}_{c'}|\, |\mathcal{C}_c|}. \qquad (15)$$

Whereas the MI encodes how similar two community cover sets are, the entropy measures the level of uncertainty in each cover set individually; see, e.g., [5]. For successful clustering algorithms, the resulting NMI is close to 1, and otherwise close to 0.
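Definitions (13)-(15) can be transcribed directly from the assignment vectors; the sketch below is meant for checking small cases, not as an optimized implementation.

```python
import numpy as np
from collections import Counter

def nmi(alpha, alpha_hat):
    """NMI (13) from two label vectors, via entropy (14) and MI (15)."""
    I = len(alpha)
    def entropy(labels):
        p = np.array([cnt / I for cnt in Counter(labels).values()])
        return -(p * np.log(p)).sum()
    mi = 0.0
    for c, nc in Counter(alpha).items():            # loop over communities C_c
        for c2, nc2 in Counter(alpha_hat).items():  # loop over estimates C_hat_c'
            joint = sum(a == c and b == c2 for a, b in zip(alpha, alpha_hat))
            if joint:
                mi += joint / I * np.log(I * joint / (nc * nc2))
    return 2 * mi / (entropy(alpha) + entropy(alpha_hat))

print(round(nmi([0, 0, 1, 1], [1, 1, 0, 0]), 3))  # 1.0 (permuted labels)
```

Note that NMI is invariant to relabeling the communities, which is why the permuted labeling in the example still scores 1.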
On the other hand, to evaluate the quality of a recovered community $\hat{\mathcal{C}}$ even without ground-truth community labels, the conductance $\pi(\hat{\mathcal{C}})$ is traditionally employed [26]

$$\pi(\hat{\mathcal{C}}) := \frac{\sum_{i \in \hat{\mathcal{C}}} \sum_{i' \notin \hat{\mathcal{C}}} [G]_{(i,i')}}{\min\left( \mathrm{vol}(\hat{\mathcal{C}}),\, \mathrm{vol}(\hat{\mathcal{C}}^c) \right)} \qquad (16)$$

where

$$\mathrm{vol}(\hat{\mathcal{C}}) := \sum_{i \in \hat{\mathcal{C}}} \sum_{i'=1}^{I} [G]_{(i,i')} \qquad (17)$$

and the set $\hat{\mathcal{C}}^c$ contains all nodes of the graph not in $\hat{\mathcal{C}}$. For successful CD, the connections among nodes in $\hat{\mathcal{C}}$ are dense while the connections to the rest of the graph are sparse, which leads to small values of $\pi(\hat{\mathcal{C}})$.

A metric that summarizes the conductance across the communities $\{\hat{\mathcal{C}}_c\}_c$ of $\hat{S}$ is the so-termed coverage

$$\chi(\hat{S}, \alpha) := \frac{1}{I} \sum_{\hat{\mathcal{C}}_c \in \hat{S} :\, \pi(\hat{\mathcal{C}}_c) < \alpha} |\hat{\mathcal{C}}_c| \qquad (18)$$

where $\alpha \in [0, 1]$ is a suitable threshold. The coverage gives the portion of nodes that belong to communities with conductance less than $\alpha$; since low conductance corresponds to more cohesive communities, large coverage for small thresholds is desirable.

5 EXPERIMENTAL EVALUATION

This section evaluates the performance of the proposed CGTF on synthetic and real data. The approaches compared include the CANDECOMP/PARAFAC Weighted OPTimization (PARAFAC) algorithm [10]; the nonnegative tensor factorization (NTF) implemented as in [27]; and the CMTF [8]. The algorithms were initialized using the proposed SNMF scheme, which enhances the performance of all methods. Unless stated otherwise, the following parameters were selected for CGTF: $\{\rho_{\bar{A}_n} = 100, \rho_{\tilde{A}_n} = 100, \rho_{\tilde{d}_n} = 100\}_n$ and $\mu = 1.1$.

5.1 Tensor Imputation

Synthetic tensor data $\underline{X} \in \mathbb{R}^{350 \times 350 \times 30}$ with $R = 4$ were generated according to the PARAFAC model (1), where the true factors $\{A_n\}_{n=1}^{3}$ are drawn from a uniform distribution. The matrices $\{G_n\}_{n=1}^{3}$ were generated using the SNMF model (2). To evaluate the performance of the various factorization algorithms, the entries of $\underline{X}$ were corrupted with i.i.d. Gaussian noise. Fig.
3 depicts the normalized mean-squared error

$$\mathrm{NMSE} := \frac{\sum_{i_3=1}^{I_3} \| \hat{\underline{X}}(:,:,i_3) - \underline{X}(:,:,i_3) \|_F^2}{\sum_{i_3=1}^{I_3} \| \underline{X}(:,:,i_3) \|_F^2}$$

against the signal-to-noise ratio (SNR) of the tensor data. The novel CGTF exploits the graph adjacency matrices and achieves superior performance relative to the competing methods.

Fig. 3: Tensor imputation performance based on NMSE (NMSE versus SNR in dB for CGTF, CMTF, PARAFAC, and NTF).

Furthermore, the convergence of the proposed approach is evaluated. Fig. 4 testifies to the theoretical convergence results established in Prop. 1.

Fig. 4: Convergence of the ADMM iterates $\{\|\Phi^l - \Phi^{l-1}\|_F^2, \|\Psi^l - \Psi^{l-1}\|_F^2\}_l$ and $\{\|A_n^l - \tilde{A}_n^l\|_F^2, \|A_n^l - \bar{A}_n^l\|_F^2, \|d_n^l - \tilde{d}_n^l\|_F^2\}_l$.

5.2 Community detection

To evaluate the performance of CGTF in detecting communities, we employed the Lancichinetti-Fortunato-Radicchi (LFR) benchmark [28], which generates graphs with ground-truth communities. LFR graphs capture properties of real-world networks, such as heterogeneity in the distributions of node degrees and of community sizes. First, we generated 3 LFR networks $\{G_n\}_{n=1}^{3}$ with $I_1 = 100$, $I_2 = 300$, and $I_3 = 500$ nodes, correspondingly comprising $C_1 = 5$, $C_2 = 3$, and $C_3 = 4$ communities; see Fig. 5.¹ We recover the factors $\{A_n\}_n$ of $\{G_n\}_n$ using SNMF, and construct $\underline{X}$ using (1). Next, we observe noisy versions of the tensor data and the corresponding graph adjacency matrices; for $G_1$ we observe only 10% of its entries, and $R = 5$. Fig. 6a shows the NMI performance of CGTF and SNMF [16] as the SNR for $G_2$ and $G_3$ increases. The proposed approach successfully recovers robust community assignments. Furthermore, Fig.

1. The ADMM implementation of the proposed CGTF method can be found at https://github.com/bioannidis/Coupled tensors graphs
6b depicts the NMI performance of the algorithms with 90% of the entries of $G_1$ missing. As expected, SNMF cannot recover the community assignments of the nodes in this partially observed $G_1$. On the other hand, the novel CGTF exploits the tensor data, copes with the missing links, and provides reliable estimates of $\alpha_1$.

5.3 Activities of users at different locations

To assess the potential of our approach in providing accurate recommendations, we further tested a real recommendation dataset comprising a three-way tensor that indicates the frequency of a user performing an activity at a certain location [29]. It contains information about 164 users, 168 locations, and 5 activities. A binary tensor $\underline{X}$ is constructed to represent the links between users, their locations, and the corresponding activities. In other words, $\underline{X}(i_1, i_2, i_3)$ equals 1 if user $i_1$ visited location $i_2$ and performed activity $i_3$; otherwise, it is 0. Additionally, similarity matrices between the users and between the activities are provided. The similarity value between two locations is defined by the inner product of the corresponding feature vectors. The dataset is missing social network information for 28 users and feature vectors for 32 locations. The parameters of CGTF were $\{\rho_{\bar{A}_n} = 100, \rho_{\tilde{A}_n} = 100, \rho_{\tilde{d}_n} = 100\}_n$ and $\mu = 10^{-4}$, and for all approaches $R = 5$.

Table 1 lists the NMSE for various percentages of missing tensor data. The CGTF model judiciously exploits the structure of the available graph information, which enables our efficient ADMM solver to outperform competing alternatives and leads to improved recommendations.

TABLE 1: NMSE for different ratios of missing data.

Missing | NTF   | PARAFAC | CMTF | CGTF
40%     | 0.995 | 1.016   | 0.98 | 0.46
50%     | 0.99  | 0.96    | 0.99 | 0.68

In order to assess the recommendation quality of the proposed approach, we varied the threshold for detecting an activity (edge) in the tensor (graphs). Per threshold value,
we then obtained the probabilities of detection and false alarm. Fig. 8 depicts the receiver operating characteristic (ROC) for the tensor entries; as expected, the novel CGTF outperforms the alternatives. Moreover, Fig. 7 shows the ROC for discovering concealed links in the user graph with only 10% of the graph entries observed, when the factors are initialized either using SNMF or randomly. In both cases, CGTF performs successful edge identification and yields accurate link predictions. The performance gap between CGTF and CMTF becomes more pronounced when the factors are initialized randomly, which suggests that initialization is crucial for reaching a good stationary point.

5.3.1 Community detection

Furthermore, CD is pursued for the user and location graphs with 70% of the tensor entries missing and no misses in the graphs. We compare our CD performance against the following baselines: Potts [13], NewmanF [12], SP [14], AFG [15], and SNMF [16].² In lieu of ground-truth communities, we evaluate the CD performance by the maximum conductance-coverage curve. This curve is plotted by varying α from 0 to 1 (cf. (18)) and reporting the corresponding coverage value on the x-axis and the maximum conductance on the y-axis. Low values of conductance for large values of coverage correspond to more cohesive communities; hence, a smaller area under the curve (AUC) implies better performance; see Sec. 4.1. Fig. 9 reports the coverage scores relative to the maximum conductance (α) for the user graph (left) and the location graph (right). The proposed CGTF achieves higher coverage scores for smaller conductance and outperforms the competing approaches. CGTF achieves the smallest AUC value on the user graph and one of the smallest on the location graph. Hence, the factors obtained by our coupled approach indeed improve CD performance.

5.4 Posts of users in a social network

We also tested the performance of CGTF on the Digg dataset.
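Before turning to Digg, it is worth summarizing how the ROC curves used above (Figs. 7 and 8) and in the following experiments are traced: a detection threshold is swept over the reconstructed entries, and the probabilities of detection and false alarm are recorded per threshold. A minimal sketch follows; `scores` and `truth` are hypothetical placeholders standing in for reconstructed and ground-truth tensor (or graph) entries, not the actual data:

```python
import numpy as np

def roc_points(scores, truth, thresholds):
    """Probability of detection (TPR) and false alarm (FPR) per threshold.
    `scores` holds reconstructed entries; `truth` is the 0/1 ground truth."""
    truth = truth.astype(bool).ravel()
    scores = scores.ravel()
    tpr, fpr = [], []
    for t in thresholds:
        detect = scores >= t                     # declare an edge/activity above t
        tp = np.sum(detect & truth)
        fp = np.sum(detect & ~truth)
        tpr.append(tp / max(truth.sum(), 1))     # detection probability
        fpr.append(fp / max((~truth).sum(), 1))  # false-alarm probability
    return np.array(fpr), np.array(tpr)

# Illustrative example: well-separated scores give the ideal ROC corner.
truth = np.array([0, 0, 1, 1])
scores = np.array([0.1, 0.2, 0.8, 0.9])
fpr, tpr = roc_points(scores, truth, thresholds=[0.5])
assert fpr[0] == 0.0 and tpr[0] == 1.0
```

The same thresholding applies identically to imputed tensor entries and to predicted graph adjacencies.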
Digg is a social network that allows users to submit, digg, and comment on news stories. In [30], the data were collected from a large subset of users and stories. The dataset includes stories and users along with their time-stamped actions with respect to stories, as well as the social network of the users. In addition, a set of keywords is assigned to each story. After discretizing time into 20 intervals over 3 days, we construct a tensor whose $(i, j, k)$ entry stores the number of comments that user $i$ wrote on story $j$ during the $k$-th time interval. Also, a story-story graph is constructed, where two stories are connected only if they share more than two keywords. The original tensor containing all users and stories includes a large number of inactive users and unpopular stories.

2. We use the Matlab implementations provided by the authors.

Fig. 5: LFR clustered graphs; $G_1$ left, $G_2$ middle, and $G_3$ right.

Fig. 6: Community detection performance based on NMI: (a) $G_2$ (dashed) and $G_3$ (solid); (b) $G_1$.

Fig. 7: ROC curves for $G_1$ using SNMF initialization of $\mathbf{A}_1$ (left) and random initialization (right), with 40% misses in $\mathbf{X}$ and 90% misses in $G_1$ and $G_2$.

In order to assess the performance of the proposed method, the data are subsampled so that the 175 most active users and the 800 most popular stories are kept. Hence, the size of the tensor in this experiment is $I_1 = 175$ users, $I_2 = 800$ stories, and $I_3 = 20$ time intervals.
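The comment-count tensor and the keyword-based story graph described above can be sketched as follows; the tensor entries and keyword sets are random or illustrative placeholders rather than the actual Digg data:

```python
import numpy as np

rng = np.random.default_rng(0)

# X[i, j, k]: number of comments user i wrote on story j during the k-th time
# interval. Shapes match the experiment (175 users, 800 stories, 20 intervals);
# the sparse random counts are placeholders only.
X = rng.poisson(0.01, size=(175, 800, 20)).astype(float)

# Story-story graph: two stories are linked only if they share more than two
# keywords. The keyword sets below are illustrative.
story_keywords = [
    {"politics", "election", "senate", "vote"},
    {"politics", "election", "vote", "debate"},
    {"sports", "soccer", "league"},
]
n = len(story_keywords)
adj = np.zeros((n, n))
for i in range(n):
    for j in range(i + 1, n):
        if len(story_keywords[i] & story_keywords[j]) > 2:
            adj[i, j] = adj[j, i] = 1

# Stories 0 and 1 share three keywords and are connected; story 2 is isolated.
assert adj[0, 1] == 1 and adj[0, 2] == 0
```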
In addition, the side information comprises two graphs that represent the users' social network and the similarities of the stories. The tensor and the two graphs are fused jointly as in (3) with $R = 10$. Then, the proposed ADMM-based algorithm is employed to obtain the latent factors of the CGTF model. As there is no graph on the third mode (time intervals), the term $\| \mathbf{G}_3 - \mathbf{A}_3 \mathrm{diag}(\mathbf{d}_3) \mathbf{A}_3^\top \|_F^2$ is not included in (3). We assume that 40% of the tensor entries, as well as 30% of the links in the user-user and story-story graphs, are missing. In Figs. 10 and 11, the ROC is presented for the tensor and the graphs. The proposed approach outperforms the competing approaches in completing the missing tensor entries as well as in predicting the missing links in the graphs, and leads to accurate recommendations for previously unseen data.

Fig. 8: ROC for 40% (left) and 50% (right) missing tensor entries.

Fig. 9: Community detection performance based on coverage for the user graph (left) and the location graph (right).

5.4.1 Community detection under missing links

In this experiment we assume that 40% of the tensor entries and 50% of the graph links are missing. The goal here is to examine whether CGTF recovers the communities in the graphs even with hidden graph links. Fig. 12 reports the coverage scores relative to the maximum conductance for the user graph (left) and the story graph (right).
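In the absence of ground-truth communities, the conductance of a candidate community can be computed directly from the adjacency matrix. A minimal sketch under a standard definition (cut weight over the smaller volume); the exact normalization used in Sec. 4.1 of the paper may differ:

```python
import numpy as np

def conductance(A, S):
    """Conductance of node set S in a graph with adjacency A:
    cut(S, complement) / min(vol(S), vol(complement)),
    where vol sums the degrees of the nodes in the set."""
    S = np.asarray(S, dtype=bool)
    cut = A[np.ix_(S, ~S)].sum()      # total weight of edges leaving S
    vol_S = A[S, :].sum()             # degree volume of S
    vol_Sc = A[~S, :].sum()           # degree volume of the complement
    return cut / min(vol_S, vol_Sc)

# Two triangles joined by a single edge: cutting along that edge gives a
# cohesive community with low conductance.
A = np.zeros((6, 6))
for i, j in [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]:
    A[i, j] = A[j, i] = 1
S = [True, True, True, False, False, False]
assert np.isclose(conductance(A, S), 1 / 7)
```

Low conductance at high coverage (fraction of nodes assigned to such cohesive sets) is precisely what the conductance-coverage curves of Figs. 9 and 12 reward.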
Competing approaches that utilize only the partially observed graphs cannot recover crisp graph communities.³ On the other hand, our novel CGTF judiciously utilizes the partially observed graphs and tensor, and attains superior performance. The advantage of the proposed framework in community detection is more evident in this experiment (compare Fig. 12 with Fig. 9).

3. AFG did not provide meaningful results and was not included.

Fig. 10: ROC for 40% missing tensor entries.

Fig. 11: ROC for link prediction in the users' social network $G_1$ (left) and the story graph adjacency $G_2$ (right).

Fig. 12: Community detection performance based on coverage for the user graph (left) and the story graph (right).

Fig. 13: Runtime comparisons relative to CGTF.

5.5 Runtime comparisons

The scalability of CGTF is reflected in the relative runtime comparisons listed in Fig. 13, for recovering the tensor entries of the Activities and Digg datasets in Fig. 8 and Fig. 10, respectively. All experiments were run on a machine with an i7-4790 @ 3.60 GHz CPU and 32 GB of RAM. We used the Matlab implementations provided by the authors of the compared algorithms. The bars in Fig. 13 indicate the runtime of the algorithms relative to CGTF's runtime.
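The relative-runtime normalization of Fig. 13 simply divides each method's wall-clock time by CGTF's. A generic sketch; the lambda callables below are stand-ins for the actual solver invocations, which are not part of this snippet:

```python
import time

def relative_runtimes(algorithms, reference="CGTF"):
    """Time each algorithm and report runtimes relative to the reference method,
    as in Fig. 13. `algorithms` maps names to zero-argument callables."""
    runtimes = {}
    for name, fn in algorithms.items():
        start = time.perf_counter()
        fn()
        runtimes[name] = time.perf_counter() - start
    ref = runtimes[reference]
    return {name: t / ref for name, t in runtimes.items()}

# Illustrative stand-ins for the compared factorization solvers.
rel = relative_runtimes({"CGTF": lambda: sum(range(10**5)),
                         "PARAFAC": lambda: sum(range(10**5))})
assert rel["CGTF"] == 1.0
```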
Evidently, our efficient yet effective CGTF implementation is almost as fast as PARAFAC, while achieving superior tensor imputation performance (see Figs. 8 and 10).

6 CONCLUSIONS AND FUTURE WORK

This paper investigates the inference of unavailable entries in tensors and graphs based on a novel CGTF model. An efficient algorithm is developed to identify the factor matrices and recover the missing entries. The ADMM solver features closed-form updates and is amenable to parallel and accelerated implementation. In addition, the proposed method can overcome the so-called cold-start problem, where the tensor has missing slabs or the similarity matrices are incomplete. The novel algorithm makes accurate predictions of the missing values and can be used in many real-world settings, especially in recommender systems. A novel direction is further explored by employing the interpretable factors of CGTF to detect communities of nodes in the graphs, having the tensor as side information. Through numerical tests with synthetic as well as real data, the novel algorithm was observed to perform markedly better than existing alternatives, yielding accurate recommendations as well as effective identification of communities.

Our future research agenda will focus on two directions. Today's era of data deluge has grown the interest in robust methods that can handle anomalies in collections of high-dimensional data. Toward this end, we aim at a robust CGTF that handles anomalies in the tensor and graph data. Furthermore, in many scenarios, prior information on the tensor and graph data can be accounted for to improve imputation performance. CGTF may incorporate such knowledge by introducing a probabilistic prior for certain graphs, e.g., stochastic block models [31].
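The vanishing successive differences established in Prop. 1 (and observed empirically in Fig. 4) can be illustrated on a toy ADMM problem before the formal proof. This is not the CGTF solver itself; it is a minimal instance of the same primal-update/projection/dual-ascent pattern, assuming a simple nonnegatively constrained least-squares objective:

```python
import numpy as np

# Toy ADMM for min_x ||x - b||^2 subject to x = z, z >= 0, used only to
# illustrate the vanishing successive-difference criterion of Prop. 1;
# the actual CGTF updates are given in (8)/(28).
b = np.array([1.0, -2.0, 3.0])
rho = 1.0
x = np.zeros(3)
z = np.zeros(3)
y = np.zeros(3)
diffs = []
for _ in range(50):
    x_prev = x
    x = (2 * b + rho * z - y) / (2 + rho)   # minimize the augmented Lagrangian over x
    z = np.maximum(x + y / rho, 0)          # minimize over z: projection onto z >= 0
    y = y + rho * (x - z)                   # dual ascent step, cf. (28f)
    diffs.append(np.linalg.norm(x - x_prev) ** 2)

# Successive squared differences vanish, mirroring the behavior in Fig. 4,
# and x approaches the projection max(b, 0).
assert diffs[-1] < 1e-10
```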
7 PROOF OF PROPOSITION 1

In what follows, we omit the terms $\mathbf{X}_M, \{ \mathbf{G}_{M_n} \}_{n=1}^3$, although the proof can easily be modified to accommodate misses in the graphs and in the tensor. First, we claim that

$\Phi^{l+1} - \Phi^l \to 0, \quad \Psi^{l+1} - \Psi^l \to 0.$  (19)

Observe that the Lagrangian $\mathcal{L}(\Phi, \Psi)$ is bounded from below, which follows because

$\mathcal{L}(\Phi, \Psi) := f(\Phi) + \sum_{n=1}^{3} \Big[ \frac{\rho_{\bar{A}_n}}{2} \big\| \mathbf{A}_n - \bar{\mathbf{A}}_n + \frac{\mathbf{Y}_{\bar{A}_n}}{\rho_{\bar{A}_n}} \big\|_F^2 - \frac{1}{2 \rho_{\bar{A}_n}} \| \mathbf{Y}_{\bar{A}_n} \|_F^2 + \frac{\rho_{\tilde{A}_n}}{2} \big\| \mathbf{A}_n - \tilde{\mathbf{A}}_n + \frac{\mathbf{Y}_{\tilde{A}_n}}{\rho_{\tilde{A}_n}} \big\|_F^2 - \frac{1}{2 \rho_{\tilde{A}_n}} \| \mathbf{Y}_{\tilde{A}_n} \|_F^2 + \frac{\rho_{\tilde{d}_n}}{2} \big\| \mathbf{d}_n - \tilde{\mathbf{d}}_n + \frac{\mathbf{y}_{\tilde{d}_n}}{\rho_{\tilde{d}_n}} \big\|_2^2 - \frac{1}{2 \rho_{\tilde{d}_n}} \| \mathbf{y}_{\tilde{d}_n} \|_2^2 \Big]$

and $\Psi$ is bounded. Owing to the appropriate reformulation (6), $\mathcal{L}(\cdot)$ is strongly convex w.r.t. each variable $\mathbf{V} \in \{ \mathbf{A}_n, \bar{\mathbf{A}}_n, \tilde{\mathbf{A}}_n, \mathbf{d}_n, \tilde{\mathbf{d}}_n \}_{n=1}^3$ separately. As a result, it holds for $\mathbf{V}$ that

$\mathcal{L}(\mathbf{V} + \delta \mathbf{V}) - \mathcal{L}(\mathbf{V}) \ge \partial_{\mathbf{V}} \mathcal{L}(\mathbf{V})^\top \delta \mathbf{V} + \rho \| \delta \mathbf{V} \|_F^2$  (20)

where $\rho$ is a properly selected parameter, while all variables except $\mathbf{V}$ are kept fixed. Moreover, if $\mathbf{V}^* := \arg\min_{\mathbf{V}} \mathcal{L}(\mathbf{V})$, it follows that $\partial_{\mathbf{V}} \mathcal{L}(\mathbf{V}^*)^\top \delta \mathbf{V} \ge 0$. Hence, for $\delta \mathbf{V} = \mathbf{V}^l - \mathbf{V}^{l+1}$, and since $\mathbf{V}^{l+1} := \arg\min_{\mathbf{V}} \mathcal{L}(\mathbf{V})$ at the $l$-th iteration, it follows from (20) that

$\mathcal{L}(\mathbf{V}^l) - \mathcal{L}(\mathbf{V}^{l+1}) \ge \rho \| \mathbf{V}^l - \mathbf{V}^{l+1} \|_F^2.$  (21)

Specializing (21) to each variable in $\Phi$ yields, for $n = 1, 2, 3$,

$\mathcal{L}(\mathbf{A}_n^l) - \mathcal{L}(\mathbf{A}_n^{l+1}) \ge \frac{\rho_{\bar{A}_n} + \rho_{\tilde{A}_n}}{2} \| \mathbf{A}_n^l - \mathbf{A}_n^{l+1} \|_F^2$  (22a)
$\mathcal{L}(\bar{\mathbf{A}}_n^l) - \mathcal{L}(\bar{\mathbf{A}}_n^{l+1}) \ge \frac{\rho_{\bar{A}_n}}{2} \| \bar{\mathbf{A}}_n^l - \bar{\mathbf{A}}_n^{l+1} \|_F^2$  (22b)
$\mathcal{L}(\tilde{\mathbf{A}}_n^l) - \mathcal{L}(\tilde{\mathbf{A}}_n^{l+1}) \ge \frac{\rho_{\tilde{A}_n}}{2} \| \tilde{\mathbf{A}}_n^l - \tilde{\mathbf{A}}_n^{l+1} \|_F^2$  (22c)
$\mathcal{L}(\mathbf{d}_n^l) - \mathcal{L}(\mathbf{d}_n^{l+1}) \ge \frac{\rho_{\tilde{d}_n}}{2} \| \mathbf{d}_n^l - \mathbf{d}_n^{l+1} \|_F^2$  (22d)
$\mathcal{L}(\tilde{\mathbf{d}}_n^l) - \mathcal{L}(\tilde{\mathbf{d}}_n^{l+1}) \ge \frac{\rho_{\tilde{d}_n}}{2} \| \tilde{\mathbf{d}}_n^l - \tilde{\mathbf{d}}_n^{l+1} \|_F^2.$  (22e)

It then follows for $R := \min \{ \rho_{\bar{A}_n}, \rho_{\tilde{A}_n}, \rho_{\tilde{d}_n} \}_n$ that

$\mathcal{L}(\Phi^l, \Psi^l) - \mathcal{L}(\Phi^{l+1}, \Psi^l) \ge R \| \Phi^l - \Phi^{l+1} \|_F^2.$
(23)

On the other hand, it holds for the dual variables that

$\mathcal{L}(\mathbf{Y}_{\bar{A}_n}^l) - \mathcal{L}(\mathbf{Y}_{\bar{A}_n}^{l+1}) = \mathrm{Tr}\big( (\mathbf{Y}_{\bar{A}_n}^l - \mathbf{Y}_{\bar{A}_n}^{l+1})^\top (\mathbf{A}_n^l - \bar{\mathbf{A}}_n^l) \big) = - \frac{1}{\rho_{\bar{A}_n}} \| \mathbf{Y}_{\bar{A}_n}^l - \mathbf{Y}_{\bar{A}_n}^{l+1} \|_F^2$  (24a)

where the last equality follows from (8g), and similarly

$\mathcal{L}(\mathbf{Y}_{\tilde{A}_n}^l) - \mathcal{L}(\mathbf{Y}_{\tilde{A}_n}^{l+1}) = - \frac{1}{\rho_{\tilde{A}_n}} \| \mathbf{Y}_{\tilde{A}_n}^l - \mathbf{Y}_{\tilde{A}_n}^{l+1} \|_F^2$  (24b)
$\mathcal{L}(\mathbf{y}_{\tilde{d}_n}^l) - \mathcal{L}(\mathbf{y}_{\tilde{d}_n}^{l+1}) = - \frac{1}{\rho_{\tilde{d}_n}} \| \mathbf{y}_{\tilde{d}_n}^l - \mathbf{y}_{\tilde{d}_n}^{l+1} \|_F^2.$  (24c)

Hence, we find that

$\mathcal{L}(\Phi^{l+1}, \Psi^l) - \mathcal{L}(\Phi^{l+1}, \Psi^{l+1}) \ge - \frac{1}{R} \| \Psi^l - \Psi^{l+1} \|_F^2$  (25)

and upon combining (25) and (23), we arrive at

$\mathcal{L}(\Phi^l, \Psi^l) - \mathcal{L}(\Phi^{l+1}, \Psi^{l+1}) \ge R \| \Phi^l - \Phi^{l+1} \|_F^2 - \frac{1}{R} \| \Psi^l - \Psi^{l+1} \|_F^2.$  (26)

Since $\mathcal{L}(\cdot)$ is bounded, we have

$\sum_{l=0}^{\infty} \Big( R \| \Phi^l - \Phi^{l+1} \|_F^2 - \frac{1}{R} \| \Psi^l - \Psi^{l+1} \|_F^2 \Big) < \infty$  (27)

and after applying (10), we establish that $\Phi^{l+1} - \Phi^l \to 0$ and $\Psi^{l+1} - \Psi^l \to 0$.

Next, we rewrite the ADMM updates in (8) as

$(\mathbf{A}_n^{l+1} - \mathbf{A}_n^l)\big( \mathbf{M}_n^\top \mathbf{M}_n + \mu \mathbf{D}_n^l \bar{\mathbf{A}}_n^{l\,\top} \bar{\mathbf{A}}_n^l \mathbf{D}_n^l + (\rho_{\tilde{A}_n} + \rho_{\bar{A}_n}) \mathbf{I} \big) = (\mathbf{X}_n - \mathbf{A}_n^l \mathbf{M}_n^\top) \mathbf{M}_n + \mu (\mathbf{G}_n - \mathbf{A}_n^l \mathbf{D}_n^l \bar{\mathbf{A}}_n^{l\,\top}) \bar{\mathbf{A}}_n^l \mathbf{D}_n^l - \rho_{\tilde{A}_n} (\mathbf{A}_n^l - \tilde{\mathbf{A}}_n^l) - \rho_{\bar{A}_n} (\mathbf{A}_n^l - \bar{\mathbf{A}}_n^l) - \mathbf{Y}_{\tilde{A}_n}^l - \mathbf{Y}_{\bar{A}_n}^l$  (28a)

$(\mathbf{d}_n^{l+1} - \mathbf{d}_n^l)^\top \big( \mu (\bar{\mathbf{A}}_n^l \odot \mathbf{A}_n^l)^\top (\bar{\mathbf{A}}_n^l \odot \mathbf{A}_n^l) + \rho_{\tilde{d}_n} \mathbf{I} \big) = \mu (\mathbf{g}_n - (\bar{\mathbf{A}}_n^l \odot \mathbf{A}_n^l) \mathbf{d}_n^l)^\top (\bar{\mathbf{A}}_n^l \odot \mathbf{A}_n^l) - \rho_{\tilde{d}_n} (\mathbf{d}_n^l - \tilde{\mathbf{d}}_n^l)^\top - \mathbf{y}_{\tilde{d}_n}^{l\,\top}$  (28b)

$(\bar{\mathbf{A}}_n^{l+1} - \bar{\mathbf{A}}_n^l)\big( \mu \mathbf{D}_n^l \mathbf{A}_n^{l\,\top} \mathbf{A}_n^l \mathbf{D}_n^l + \rho_{\bar{A}_n} \mathbf{I} \big) = \mu (\mathbf{G}_n - \bar{\mathbf{A}}_n^l \mathbf{D}_n^l \mathbf{A}_n^{l\,\top}) \mathbf{A}_n^l \mathbf{D}_n^l + \rho_{\bar{A}_n} (\mathbf{A}_n^l - \bar{\mathbf{A}}_n^l) + \mathbf{Y}_{\bar{A}_n}^l$  (28c)

$\tilde{\mathbf{A}}_n^{l+1} - \tilde{\mathbf{A}}_n^l = \Big[ \mathbf{A}_n^l + \frac{1}{\rho_{\tilde{A}_n}} \mathbf{Y}_{\tilde{A}_n}^l \Big]_+ - \tilde{\mathbf{A}}_n^l$  (28d)

$\tilde{\mathbf{d}}_n^{l+1} - \tilde{\mathbf{d}}_n^l = \Big[ \mathbf{d}_n^l + \frac{1}{\rho_{\tilde{d}_n}} \mathbf{y}_{\tilde{d}_n}^l \Big]_+ - \tilde{\mathbf{d}}_n^l$  (28e)

and for the dual updates

$\mathbf{Y}_{\bar{A}_n}^{l+1} - \mathbf{Y}_{\bar{A}_n}^l = \rho_{\bar{A}_n} (\mathbf{A}_n^l - \bar{\mathbf{A}}_n^l), \quad \mathbf{Y}_{\tilde{A}_n}^{l+1} - \mathbf{Y}_{\tilde{A}_n}^l = \rho_{\tilde{A}_n} (\mathbf{A}_n^l - \tilde{\mathbf{A}}_n^l), \quad \mathbf{y}_{\tilde{d}_n}^{l+1} - \mathbf{y}_{\tilde{d}_n}^l = \rho_{\tilde{d}_n} (\mathbf{d}_n^l - \tilde{\mathbf{d}}_n^l).$  (28f)

Next, we leverage (19) and establish that the left-hand side of each equation in (28) is equal to 0.
Hence, from (28f) we deduce that $\mathbf{A}_n^l - \bar{\mathbf{A}}_n^l \to 0$, $\mathbf{A}_n^l - \tilde{\mathbf{A}}_n^l \to 0$, and $\mathbf{d}_n^l - \tilde{\mathbf{d}}_n^l \to 0$. So far, we have proved that the KKT conditions (9) relating to the primal variables $\Phi$ are satisfied. The variables $\tilde{\mathbf{A}}_n$ and $\tilde{\mathbf{d}}_n$ are nonnegative by construction. For the dual variables, notice from (28d) that if $[\mathbf{A}_n^l]_{(i_n, r)} = [\tilde{\mathbf{A}}_n^l]_{(i_n, r)} = 0$, then $\big[ [\mathbf{A}_n^l + \frac{1}{\rho_{\tilde{A}_n}} \mathbf{Y}_{\tilde{A}_n}^l]_+ \big]_{(i_n, r)} = 0$, which implies that $[\mathbf{Y}_{\tilde{A}_n}^l]_{(i_n, r)} \le 0$; else, if $[\mathbf{A}_n^l]_{(i_n, r)} = [\tilde{\mathbf{A}}_n^l]_{(i_n, r)} > 0$, then $[\mathbf{Y}_{\tilde{A}_n}^l]_{(i_n, r)} = 0$. The same argument applies to $\mathbf{y}_{\tilde{d}_n}^l$, and hence we have established satisfaction of the last KKT conditions concerning $\mathbf{Y}_{\tilde{A}_n}^l$ and $\mathbf{y}_{\tilde{d}_n}^l$.

REFERENCES

[1] V. N. Ioannidis, A. S. Zamzam, G. B. Giannakis, and N. D. Sidiropoulos, “Imputation of coupled tensors and graphs,” in Global Conf. Sig. Inf. Process., Anaheim, CA, Nov. 2018.
[2] A. S. Zamzam, V. N. Ioannidis, and N. D. Sidiropoulos, “Coupled graph tensor factorization,” in Proc. Asilomar Conf. Sig., Syst., Comput., Pacific Grove, CA, Nov. 2016, pp. 1755–1759.
[3] N. D. Sidiropoulos, L. D. Lathauwer, X. Fu, K. Huang, E. E. Papalexakis, and C. Faloutsos, “Tensor decomposition for signal processing and machine learning,” IEEE Trans. Sig. Process., vol. 65, no. 13, pp. 3551–3582, Jul. 2017.
[4] V. N. Ioannidis, M. Ma, A. Nikolakopoulos, G. B. Giannakis, and D. Romero, “Kernel-based inference of functions on graphs,” in Adaptive Learning Methods for Nonlinear System Modeling, D. Comminiello and J. Principe, Eds. Elsevier, 2018.
[5] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3-5, pp. 75–174, 2010.
[6] Y. Koren, R. Bell, and C. Volinsky, “Matrix factorization techniques for recommender systems,” Computer, no. 8, pp. 30–37, Aug. 2009.
[7] A. P. Liavas and N. D. Sidiropoulos, “Parallel algorithms for constrained tensor factorization via alternating direction method of multipliers,” IEEE Trans. Sig.
Process., vol. 63, no. 20, pp. 5450–5463, Oct. 2015.
[8] B. Ermiş, E. Acar, and A. T. Cemgil, “Link prediction in heterogeneous data via generalized coupled tensor factorization,” Data Mining and Knowledge Discovery, vol. 29, no. 1, pp. 203–236, 2015.
[9] E. E. Papalexakis, C. Faloutsos, T. M. Mitchell, P. P. Talukdar, N. D. Sidiropoulos, and B. Murphy, “Turbo-SMT: Accelerating coupled sparse matrix-tensor factorizations by 200x,” in SDM. SIAM, 2014, pp. 118–126.
[10] E. Acar, D. M. Dunlavy, T. G. Kolda, and M. Mørup, “Scalable tensor factorizations with missing data,” in SDM. SIAM, 2010, pp. 701–712.
[11] J. A. Bazerque, G. Mateos, and G. B. Giannakis, “Rank regularization and Bayesian inference for tensor completion and extrapolation,” IEEE Trans. Sig. Process., vol. 61, no. 22, pp. 5689–5703, Nov. 2013.
[12] L. Danon, A. Díaz-Guilera, and A. Arenas, “The effect of size heterogeneity on community identification in complex networks,” Journal of Statistical Mechanics: Theory and Experiment, vol. 2006, no. 11, 2006.
[13] P. Ronhovde and Z. Nussinov, “Local resolution-limit-free Potts model for community detection,” Physical Review E, vol. 81, no. 4, 2010.
[14] J. P. Hespanha, “An efficient Matlab algorithm for graph partitioning,” Tech. Report, 2004.
[15] A. Arenas, A. Fernandez, and S. Gomez, “Analysis of the structure of complex networks at different resolution levels,” New Journal of Physics, vol. 10, no. 5, 2008.
[16] D. Kuang, H. Park, and C. H. Ding, “Symmetric nonnegative matrix factorization for graph clustering,” in SDM, vol. 12. SIAM, 2012, pp. 106–117.
[17] F. Huang, U. Niranjan, M. U. Hakeem, and A. Anandkumar, “Online tensor methods for learning latent variable models,” J. Mach. Learn. Res., vol. 16, no. 1, pp. 2797–2835, 2015.
[18] A. Anandkumar, R. Ge, D. Hsu, and S. M. Kakade, “A tensor approach to learning mixed membership community models,” J. Mach. Learn. Res., vol. 15, no. 1, pp.
2239–2312, 2014.
[19] F. Sheikholeslami and G. B. Giannakis, “Overlapping community detection via constrained PARAFAC: A divide and conquer approach,” in IEEE Int. Conf. on Data Mining (ICDM), New Orleans, LA, Nov. 2017, pp. 127–136.
[20] B. Baingana and G. B. Giannakis, “Joint community and anomaly tracking in dynamic networks,” IEEE Trans. Sig. Process., vol. 64, no. 8, pp. 2013–2025, Sep. 2016.
[21] K. Huang and N. Sidiropoulos, “Putting nonnegative matrix factorization to the test: A tutorial derivation of pertinent Cramer-Rao bounds and performance benchmarking,” IEEE Sig. Process. Mag., vol. 31, no. 3, pp. 76–86, Apr. 2014.
[22] M. Mardani, G. Mateos, and G. B. Giannakis, “Dynamic anomalography: Tracking network anomalies via sparsity and low rank,” IEEE J. Sel. Topics Sig. Process., vol. 7, no. 1, pp. 50–66, Feb. 2013.
[23] Y. Xu, W. Yin, Z. Wen, and Y. Zhang, “An alternating direction algorithm for matrix completion with nonnegative factors,” Frontiers of Mathematics in China, vol. 7, no. 2, pp. 365–384, 2012.
[24] J. A. Hartigan and M. A. Wong, “Algorithm AS 136: A k-means clustering algorithm,” Journal of the Royal Statistical Society, Series C (Applied Statistics), vol. 28, no. 1, pp. 100–108, 1979.
[25] L. Danon, A. Diaz-Guilera, J. Duch, and A. Arenas, “Comparing community structure identification,” Journal of Stat. Mech.: Theory and Experiment, vol. 2005, no. 09, p. P09008, Sep. 2005.
[26] B. Bollobás, Modern Graph Theory. Springer Science & Business Media, 2013, vol. 184.
[27] C. A. Andersson and R. Bro, “The N-way toolbox for MATLAB,” Chemometrics and Intelligent Laboratory Systems, vol. 52, no. 1, pp. 1–4, 2000.
[28] A. Lancichinetti, S. Fortunato, and F. Radicchi, “Benchmark graphs for testing community detection algorithms,” Phys. Rev. E, vol. 78, p. 046110, Oct. 2008.
[29] V. W. Zheng, B. Cao, Y. Zheng, X. Xie, and Q.
Yang, “Collaborative filtering meets mobile recommendation: A user-centered approach,” in AAAI, vol. 10, 2010, pp. 236–241.
[30] Y.-R. Lin, J. Sun, P. Castro, R. Konuru, H. Sundaram, and A. Kelliher, “MetaFac: Community discovery via relational hypergraph factorization,” in Proc. of the 15th ACM SIGKDD Intern. Conf. on Knowledge Disc. and Data Mining. ACM, 2009, pp. 527–536.
[31] E. M. Airoldi, D. M. Blei, S. E. Fienberg, and E. P. Xing, “Mixed membership stochastic blockmodels,” J. Mach. Learn. Res., vol. 9, pp. 1981–2014, 2008.

Vassilis N. Ioannidis (S'16) received his Diploma in Electrical and Computer Engineering from the National Technical University of Athens, Greece, in 2015, and the M.Sc. degree in Electrical Engineering from the University of Minnesota (UMN), Twin Cities, Minneapolis, MN, USA, in 2017. Currently, he is working towards the Ph.D. degree in the Department of Electrical and Computer Engineering at the University of Minnesota, Twin Cities. Vassilis received the Doctoral Dissertation Fellowship (2019) from the University of Minnesota. He also received Student Travel Awards from the IEEE Signal Processing Society (2017, 2018) and from the IEEE (2018). From 2014 to 2015, he worked as a middleware consultant for Oracle in Athens, Greece, and received a Performance Excellence award. His research interests include machine learning, big data analytics, and network science.

Ahmed S. Zamzam (S'14) is a Ph.D. candidate at the Department of Electrical and Computer Engineering at the University of Minnesota, where he is also affiliated with the Signal and Tensor Analytics Research (STAR) group under the supervision of Professor N. D. Sidiropoulos. Previously, he earned his B.Sc. at Cairo University in 2013. In 2015, he received the M.Sc. from Nile University. Ahmed received the Louis John Schnell Fellowship (2015), and the Doctoral Dissertation Fellowship (2018) from the University of Minnesota.
He also received Student Travel Awards from the IEEE Signal Processing Society (2017), the IEEE Power and Energy Society (2018), and the Council of Graduate Students at the University of Minnesota (2016, 2018). His research interests include control and optimization of smart grids, large-scale complex energy systems, grid data analytics, and machine learning.

G. B. Giannakis (Fellow'97) received his Diploma in Electrical Engr. from the Ntl. Tech. Univ. of Athens, Greece, 1981. From 1982 to 1986 he was with the Univ. of Southern California (USC), where he received his M.Sc. in Electrical Engineering, 1983, M.Sc. in Mathematics, 1986, and Ph.D. in Electrical Engr., 1986. He was a faculty member with the University of Virginia from 1987 to 1998, and since 1999 he has been a professor with the Univ. of Minnesota, where he holds an ADC Endowed Chair, a University of Minnesota McKnight Presidential Chair in ECE, and serves as director of the Digital Technology Center. His general interests span the areas of statistical learning, communications, and networking, subjects on which he has published more than 450 journal papers, 750 conference papers, 25 book chapters, two edited books and two research monographs (h-index 140). Current research focuses on Data Science, and Network Science with applications to the Internet of Things, social, brain, and power networks with renewables. He is the (co-)inventor of 32 issued patents, and the (co-)recipient of 9 best journal paper awards from the IEEE Signal Processing (SP) and Communications Societies, including the G. Marconi Prize Paper Award in Wireless Communications. He also received Technical Achievement Awards from the SP Society (2000) and from EURASIP (2005), a Young Faculty Teaching Award, the G. W. Taylor Award for Distinguished Research from the University of Minnesota, and the IEEE Fourier Technical Field Award (inaugural recipient in 2015).
He is a Fellow of EURASIP, and has served the IEEE in a number of posts, including that of a Distinguished Lecturer for the IEEE-SPS.

Nicholas D. Sidiropoulos (F'09) earned the Diploma in Electrical Engineering from the Aristotle University of Thessaloniki, Greece, and M.S. and Ph.D. degrees in Electrical Engineering from the University of Maryland at College Park, in 1988, 1990, and 1992, respectively. He has served on the faculty of the University of Virginia, the University of Minnesota, and the Technical University of Crete, Greece, prior to his current appointment as Louis T. Rader Professor and Chair of ECE at UVA. From 2015 to 2017 he was an ADC Chair Professor at the University of Minnesota. His research interests are in signal processing, communications, optimization, tensor decomposition, and factor analysis, with applications in machine learning and communications. He received the NSF/CAREER award in 1998, the IEEE Signal Processing Society (SPS) Best Paper Award in 2001, 2007, and 2011, served as IEEE SPS Distinguished Lecturer (2008-2009), and currently serves as Vice President - Membership of IEEE SPS. He received the 2010 IEEE Signal Processing Society Meritorious Service Award, and the 2013 Distinguished Alumni Award from the University of Maryland, Dept. of ECE. He is a Fellow of IEEE (2009) and a Fellow of EURASIP (2014).