Semi-supervised Graph Embedding Approach to Dynamic Link Prediction
We propose a simple discrete time semi-supervised graph embedding approach to link prediction in dynamic networks. The learned embedding reflects information from both the temporal and cross-sectional network structures, which is performed by definin…
Authors: Ryohei Hisano
Ryohei Hisano ∗

Abstract

We propose a simple discrete-time semi-supervised graph embedding approach to link prediction in dynamic networks. The learned embedding reflects information from both the temporal and cross-sectional network structures, which is achieved by defining the loss function as a weighted sum of the supervised loss from past dynamics and the unsupervised loss of predicting the neighborhood context in the current network. Our model is also capable of learning different embeddings for formation and dissolution dynamics. These key aspects contribute to the predictive performance of our model, and we provide experiments with three real-world dynamic networks showing that our method is comparable to state-of-the-art methods in link formation prediction and outperforms state-of-the-art baseline methods in link dissolution prediction.

1 Introduction

One of the central tasks concerning network data is the problem of link prediction. Link prediction can be roughly divided into two types: static link prediction and temporal link prediction. Static link prediction is concerned with the problem of predicting the overall structure of a network. The goal is to predict missing links in partially observed network data, that is, links that are absent from the dataset but that should in fact exist. Example applications include knowledge graph completion, predicting relationships among participants in social networking services and protein-protein interactions. We refer to [1, 2, 3] for excellent reviews of the field.

In a temporal link prediction problem, the goal is to predict the future network state given previous linkage patterns [4, 5, 6]. Example applications include recommender systems where users and products are modeled as a bipartite graph and user purchases are modeled as linkages over time.
The goal here is to predict future purchase patterns of users from past purchase patterns.

In this paper, we focus on a slight variation of the temporal link prediction problem. Given a sequence of network snapshots from time 1 to time t, our problem is to predict the transition of a network from time t to time t + 1. A transition of a network can be summarized using two networks, a link formation network and a link dissolution network. We choose to predict the transition of a network instead of the network at the next time step for three main reasons.

Firstly, by predicting only the network at the next time step, one cannot distinguish whether the prediction of link formation is successful, whether the prediction of link dissolution is successful, or whether the network simply did not change much between time steps, in which case using the network information from the last time step might suffice for prediction. We want to avoid this redundancy by focusing on predicting the transition. Secondly, different forces might govern link formation and link dissolution. Our hope is that by modeling these forces separately we might obtain better predictive accuracy. Thirdly, predicting link dissolution is important in its own right. For instance, in the financial crisis of 2008, many banks were reported to have dissolved their relationships with poorly performing firms while forming new links with better performing firms. Being able to predict the formation and dissolution dynamics of a network separately in this setting is an important issue in risk management. This is true even in social networks, where important link dissolutions might prevent the spread of good or bad influences in a community [7].

∗ Social ICT Research Center, Graduate School of Information Science and Technology, The University of Tokyo, email: em072010@yahoo.co.jp
Our modelling approach is a variant of semi-supervised graph embedding [8]. The supervised part consists of a complex-valued latent feature bilinear model [9] where past link formation and link dissolution information plays the role of target values in the training data. The unsupervised part consists of a graph embedding predicting the neighborhood context in the current network [10]. The same complex-valued vectors are used in both tasks, and the weighted sum of these two losses is the total loss in our model. Semi-supervised graph embedding [8] was originally intended for use in node classification, but we extend the idea to learning complex-valued vectors capable of predicting the transition of a network.

To gain a better understanding of our model, we suggest the following intuitive interpretation (refer to Fig 1 for an overview of our approach). While the temporal information concerning past link formation and link dissolution networks provides a direct target signal for which nodes were more likely to form or dissolve links with each other, these networks are usually much sparser than the current network. Thus, by using only the past network information we may not have enough information to learn the complex-valued vector bilinear model sufficiently. On the other hand, the current network can be seen as providing a different dimension, such as a spatial dimension in spatiotemporal modeling, which is independent of the temporal information. Our strategy is to leverage this extra dimension to enhance the model learned from our supervised task. Thus the power of graph embedding to effectively learn a distributional context capable of predicting nearby nodes is used in our model to force nearby nodes in the network to have similar complex-valued vectors [10].
We show that our semi-supervised approach gives better predictive performance than using a supervised or an unsupervised approach alone.

The main contributions of this paper are as follows.

• We propose a simple and scalable discrete-time semi-supervised graph embedding approach to dynamic link prediction capable of incorporating both temporal and cross-sectional network structures.
• Our model is one of the few approaches capable of learning different embeddings for the formation and dissolution processes.
• Experiments with three real-world datasets show significant empirical improvements, especially when predicting link dissolution.

The rest of the paper is organized as follows. We present our proposed model in Section 2. Our training methodology is presented in Section 3. We give empirical results in Section 4, followed by related work and conclusions in Sections 5 and 6.

Figure 1: Overview of our semi-supervised graph embedding approach to dynamic link prediction. (Diagram labels: past Formation Networks and Dissolution Networks, Supervised Learning; Current Network, Graph Embedding; Prediction of the next Formation Network and Dissolution Network.)

2 Proposed Method

We refer to our link prediction method as SemiGraph, which has the objective functions in Eq. (9) and Eq. (10) for link formation and link dissolution, respectively. Predictions are made using Eq. (13) and Eq. (14).

2.1 Notations

We now give a brief explanation of our notation and definitions for some terminology. Consider a sequence of directed networks defined as a set of adjacency matrices $\mathcal{G} = \{G_1, G_2, \ldots, G_t\}$, where $G^{ij}_t$ equals 1 if the link $i \to j$ exists at time t and equals 0 otherwise.
Let $V$ denote the set of nodes in the union of the snapshots $G_1 \cup G_2 \cup \cdots \cup G_t$, and let $|V|$ denote the number of nodes in this union. The goal of this paper is to predict the transition of the network from $G_t$ to $G_{t+1}$ using the information up to $G_t$.

We define three kinds of network. The current network is the network state just before prediction. With the above definitions, this is simply $G_t$. The past formation networks are defined by concatenating all the link formation adjacency matrices until time t. The adjacency matrix describing the link formation network at time t is defined as

$$F^{ij}_t = \begin{cases} 1 & \text{if } G^{ij}_t - G^{ij}_{t-1} = 1 \\ 0 & \text{otherwise.} \end{cases}$$

The past dissolution networks are defined similarly, where the adjacency matrix describing the link dissolution network at time t is defined as

$$D^{ij}_t = \begin{cases} 1 & \text{if } G^{ij}_t - G^{ij}_{t-1} = -1 \\ 0 & \text{otherwise.} \end{cases}$$

2.2 Learning from past formation and dissolution networks

We start with the supervised part, which consists of learning a complex-valued vector bilinear model with past link formation and link dissolution information playing the role of target values in the training data. The complex-valued matrices of node representations (each in $\mathbb{C}^{|V| \times d}$, where $|V|$ denotes the number of nodes in the network and $d$ the dimension of the learned representations) are learned separately for link formation and link dissolution. These are learned in an identical manner, so we focus on the link formation case. Formally, let $\{(i, j)\}$ denote the set of links in the past formation networks. The set of past formation networks is restricted to the link formation networks in a time window $F_t, F_{t-1}, \ldots, F_{t-p}$.
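As a minimal sketch of the definitions above (an illustration, not the author's implementation), the formation and dissolution matrices and the time window of past links could be computed as follows. The function names `transition_networks` and `past_links`, the list-of-lists matrix layout and the toy snapshots are all assumptions made for this example.

```python
def transition_networks(prev, curr):
    """Return (F, D) for two 0/1 adjacency matrices given as lists of lists.

    F[i][j] = 1 when the link i -> j is absent in prev and present in curr.
    D[i][j] = 1 when the link i -> j is present in prev and absent in curr.
    """
    n = len(curr)
    F = [[1 if curr[i][j] - prev[i][j] == 1 else 0 for j in range(n)] for i in range(n)]
    D = [[1 if curr[i][j] - prev[i][j] == -1 else 0 for j in range(n)] for i in range(n)]
    return F, D


def past_links(snapshots, p):
    """Collect formed and dissolved (i, j) pairs over the window F_t, ..., F_{t-p}."""
    formed, dissolved = set(), set()
    # A transition is indexed by its later snapshot; keep the last p + 1 of them.
    start = max(1, len(snapshots) - 1 - p)
    for t in range(start, len(snapshots)):
        F, D = transition_networks(snapshots[t - 1], snapshots[t])
        n = len(F)
        formed |= {(i, j) for i in range(n) for j in range(n) if F[i][j]}
        dissolved |= {(i, j) for i in range(n) for j in range(n) if D[i][j]}
    return formed, dissolved


# Hypothetical three-node snapshots G_1, G_2, G_3.
G1 = [[0, 1, 0], [0, 0, 0], [1, 0, 0]]
G2 = [[0, 0, 1], [0, 0, 0], [1, 0, 0]]
G3 = [[0, 0, 1], [1, 0, 0], [0, 0, 0]]

F3, D3 = transition_networks(G2, G3)
formed, dissolved = past_links([G1, G2, G3], p=1)
```

Here `formed` and `dissolved` play the role of the positive pairs for the supervised formation and dissolution tasks; setting `p=0` keeps only the most recent transition.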
The objective (a log-likelihood) can be written as

$$\sum_{(i,j)} \log p(j \mid i) = \sum_{(i,j)} \Big( \mathrm{Re}\big(v_{fi}^{T} W_f \bar{v}_{fj}\big) - \log \sum_{j' \in N_e} \exp\big(\mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj'})\big) \Big), \tag{1}$$

where $N_e$ is the set of all nodes that did not form links with $i$ in the past formation networks, $W_f$ is a diagonal complex-valued matrix defining the scaling of the basis, $v_{fi}$ is the complex vector representation for node $i$ with dimension $d$, $\bar{v}$ denotes the conjugate of $v$ (i.e. $\bar{v} = \mathrm{Re}(v) - \mathrm{i}\,\mathrm{Im}(v)$) and $\mathrm{Re}(\cdot)$ is a function keeping only the real part of a complex value. We use complex-valued rather than real-valued vectors so that symmetric as well as antisymmetric relations can be taken into account with linear space and time complexity, by using the Hermitian dot product [9]

$$\langle u, v \rangle = u^{T} \bar{v}, \tag{2}$$

where $u$ and $v$ are complex-valued vectors. The Hermitian dot product has the nice property that $\langle u, v \rangle$ does not necessarily equal $\langle v, u \rangle$, making it possible to consider antisymmetric relations [9]. We also restrict each diagonal element of $W_f$ and $W_d$ to have an absolute value of 1 to make the model identifiable.

It is often intractable to directly optimize Eq. (1) due to the normalization constant, and we use negative sampling to address this issue [11]. Formally, given a triple $(i, j, \gamma_f)$, where $i$ and $j$ are nodes (we assume that $i \neq j$) and $\gamma_f$ is a binary label indicating whether the node pair exists in the past link formation networks ($\gamma_f = 1$ when the link exists in the formation networks and $\gamma_f = -1$ otherwise), we minimize the cross-entropy loss of classifying the pair $(i, j)$ with the binary label $\gamma_f$, which is equivalent to maximizing

$$\mathbb{I}(\gamma_f = 1) \log \sigma\big(\mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj})\big) + \mathbb{I}(\gamma_f = -1) \log \sigma\big(-\mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj})\big), \tag{3}$$

where $\mathbb{I}(\cdot)$ is an indicator function that outputs 1 when the argument is true and 0 otherwise, and $\sigma$ is the sigmoid function defined as $\sigma(x) = 1 / (1 + e^{-x})$.
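To make the role of the complex-valued diagonal matrix concrete, the following sketch evaluates the bilinear score and the per-triple term of Eq. (3) in plain Python. It is only an illustration: the helper names, the toy vectors and the angles are hypothetical, and the conjugate is assumed to act on the second vector.

```python
import math


def score(v_i, w, v_j):
    """Re(v_i^T W conj(v_j)) for a diagonal W stored as the list w (assumed layout)."""
    return sum(a * d * b.conjugate() for a, d, b in zip(v_i, w, v_j)).real


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def log_likelihood_term(v_i, w, v_j, gamma):
    """Eq. (3) for one triple (i, j, gamma), with gamma in {+1, -1}."""
    return math.log(sigmoid(gamma * score(v_i, w, v_j)))


# Hypothetical 2-dimensional embeddings of two nodes a and b.
v_a = [1 + 2j, 0.5 - 1j]
v_b = [-1 + 1j, 2 + 0.5j]

# Unit-modulus diagonal entries, as required for identifiability.
w = [complex(math.cos(0.3), math.sin(0.3)), complex(math.cos(-1.1), math.sin(-1.1))]

s_ab = score(v_a, w, v_b)  # score for the directed pair a -> b
s_ba = score(v_b, w, v_a)  # score for the directed pair b -> a

w_real = [1 + 0j, 1 + 0j]  # real diagonal: symmetric scores
w_imag = [1j, 1j]          # purely imaginary diagonal: antisymmetric scores
```

With a real diagonal the score is symmetric in (i, j), with a purely imaginary diagonal it is antisymmetric, and a general unit-modulus diagonal mixes the two, which is why directed links can receive different scores in each direction.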
Therefore, the supervised loss with negative sampling can be written more succinctly as

$$L_{fs} = \mathbb{E}_{(i,j,\gamma_f)} \log \sigma\big(\gamma_f \, \mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj})\big). \tag{4}$$

The supervised loss for past dissolution networks is defined in an identical manner, resulting in

$$L_{ds} = \mathbb{E}_{(i,j,\gamma_d)} \log \sigma\big(\gamma_d \, \mathrm{Re}(v_{di}^{T} W_d \bar{v}_{dj})\big). \tag{5}$$

2.3 Graph Embedding from the Current Network

The unsupervised part of our model consists of a graph embedding defined by the current network. In previous work, a Skip-gram model [11] has been used to learn the embedding, and we adhere to this approach. Given pairs of an instance and its context (i.e. $(i, c)$), the objective can be written as

$$\sum_{(i,c)} \log p(c \mid i) = \sum_{(i,c)} \Big( \mathrm{Re}\big(v_{fi}^{T} \bar{u}_{fc}\big) - \log \sum_{c' \in N_e} \exp\big(\mathrm{Re}(v_{fi}^{T} \bar{u}_{fc'})\big) \Big), \tag{6}$$

where $v_{fi}$ is the complex vector representation for node $i$ as used in Eq. (1) and $u_{fc}$ is a parameter of the Skip-gram model. A context for each node is generated by performing a truncated random walk (i.e. DeepWalk) starting from the instance node [10]. Although other types of walk besides the simple random walk (such as a breadth-first walk) are possible [12], preliminary experiments showed that the difference is marginal, and we use the simple DeepWalk in this paper. As with Eq. (1), Eq. (6) is intractable due to the normalization constant, and we again resort to negative sampling, resulting in

$$L_{fu} = \mathbb{E}_{(i,c,\gamma_c)} \log \sigma\big(\gamma_c \, \mathrm{Re}(v_{fi}^{T} \bar{u}_{fc})\big). \tag{7}$$

The unsupervised loss for link dissolution is developed in an identical manner, resulting in

$$L_{du} = \mathbb{E}_{(i,c,\gamma_c)} \log \sigma\big(\gamma_c \, \mathrm{Re}(v_{di}^{T} \bar{u}_{dc})\big). \tag{8}$$

2.4 Semi-supervised Graph Embedding Approach

Given the loss functions defined in the previous sections, the loss functions for our framework can be expressed as

$$L_f = L_{fs} + \lambda_f L_{fu} \tag{9}$$

for learning link formation and

$$L_d = L_{ds} + \lambda_d L_{du} \tag{10}$$

for learning link dissolution.
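The combination in Eqs. (4), (7) and (9) can be sketched as a sample average over supervised and context triples. This is a rough illustration under stated assumptions rather than the author's code: the function names, the toy embeddings and triples, and the placement of the conjugate in the context score are all hypothetical. The function returns the quantity exactly as written in the equations (a mean log-sigmoid), so a minimizer would work with its negative.

```python
import cmath
import math


def log_sigmoid(x):
    # Numerically stable log(1 / (1 + exp(-x))).
    return -math.log1p(math.exp(-x)) if x >= 0 else x - math.log1p(math.exp(x))


def bilinear(v_i, w, v_j):
    """Re(v_i^T W conj(v_j)) with the diagonal of W given as the list w."""
    return sum(a * d * b.conjugate() for a, d, b in zip(v_i, w, v_j)).real


def context_score(v_i, u_c):
    """Re(v_i^T conj(u_c)); the conjugate placement is an assumption."""
    return sum(a * b.conjugate() for a, b in zip(v_i, u_c)).real


def semi_supervised_objective(v, u, w, sup_triples, ctx_triples, lam):
    """Sample estimate of L_fs + lam * L_fu, following Eqs. (4), (7) and (9)."""
    l_s = sum(log_sigmoid(g * bilinear(v[i], w, v[j])) for i, j, g in sup_triples)
    l_u = sum(log_sigmoid(g * context_score(v[i], u[c])) for i, c, g in ctx_triples)
    return l_s / len(sup_triples) + lam * l_u / len(ctx_triples)


# Hypothetical embeddings for three nodes with d = 2.
v = [[1 + 1j, 0.5j], [0.2 - 0.3j, 1 + 0j], [-1 + 0j, 0.4 + 0.4j]]
u = [[0.3 + 0j, -0.2j], [1 - 1j, 0.1 + 0j], [0.5j, 0.7 + 0.2j]]
w = [cmath.exp(0.3j), cmath.exp(-1.1j)]  # unit-modulus diagonal of W_f

sup_triples = [(0, 1, 1), (0, 2, -1)]             # (i, j, gamma_f) from past formation
ctx_triples = [(0, 1, 1), (1, 2, 1), (2, 0, -1)]  # (i, c, gamma_c) from random walks

value = semi_supervised_objective(v, u, w, sup_triples, ctx_triples, lam=0.05)
```

Because the same `v` enters both terms, the context term acts as a regularizer on the supervised embeddings, with `lam` controlling its weight.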
The $L_{fs}$ and $L_{ds}$ terms are the supervised losses for predicting past formation and dissolution networks, respectively, and $L_{fu}$ and $L_{du}$ are the unsupervised losses for predicting the graph context from the current network. The loss function is similar in spirit to graph-based semi-supervised learning [13, 14], except that, as in [8], graph embedding is used instead of the graph Laplacian.

2.5 Prediction

Prediction is made by using the learned complex-valued vectors and matrices $v_f$, $v_d$, $W_f$ and $W_d$. A straightforward approach is to predict

$$p(G^{ij}_{t+1} = 1 \mid G^{ij}_t = 0) = \sigma\big(\mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj})\big) \tag{11}$$

for link formation and

$$p(G^{ij}_{t+1} = 0 \mid G^{ij}_t = 1) = \sigma\big(\mathrm{Re}(v_{di}^{T} W_d \bar{v}_{dj})\big) \tag{12}$$

for link dissolution. Although this simple prediction works quite well in practice, the predictive performance can be further improved by combining the predictions as

$$p(G^{ij}_{t+1} = 1 \mid G^{ij}_t = 0) = \sigma\big(\mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj}) + \mathrm{Re}(v_{di}^{T} W_d \bar{v}_{dj})\big) \tag{13}$$

for link formation and

$$p(G^{ij}_{t+1} = 0 \mid G^{ij}_t = 1) = \sigma\big(\mathrm{Re}(v_{di}^{T} W_d \bar{v}_{dj}) + \mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj})\big) \tag{14}$$

for link dissolution. The intuition behind this prediction is that link formation and link dissolution are likely to be driven by a rewiring process: the more likely a node is to form new links, the more likely it is to dissolve an existing link at the same time. Subtracting the two effects, as in

$$p(G^{ij}_{t+1} = 1 \mid G^{ij}_t = 0) = \sigma\big(\mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj}) - \mathrm{Re}(v_{di}^{T} W_d \bar{v}_{dj})\big) \tag{15}$$

for link formation and

$$p(G^{ij}_{t+1} = 0 \mid G^{ij}_t = 1) = \sigma\big(\mathrm{Re}(v_{di}^{T} W_d \bar{v}_{dj}) - \mathrm{Re}(v_{fi}^{T} W_f \bar{v}_{fj})\big) \tag{16}$$

for link dissolution, is also reasonable (i.e. a growing network where the more likely a node is to form links, the less likely it is to lose a link). However, in our experiments Eqs. (13) and (14) outperform this alternative, so we use Eqs. (13) and (14) in our experiments.
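The combined prediction of Eqs. (13) and (14) can be sketched as follows. The sketch assumes embeddings that have already been trained; the function names, the toy network and the toy parameter values are hypothetical. Note that Eqs. (13) and (14) evaluate the same sum of scores, so the two tasks differ only in which pairs are scored: currently absent links for formation and currently present links for dissolution.

```python
import cmath
import math


def re_bilinear(v_i, w, v_j):
    """Re(v_i^T W conj(v_j)) with the diagonal of W given as the list w."""
    return sum(a * d * b.conjugate() for a, d, b in zip(v_i, w, v_j)).real


def sigmoid(x):
    # Stable for large negative and positive inputs.
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    e = math.exp(x)
    return e / (1.0 + e)


def transition_probabilities(G, v_f, w_f, v_d, w_d):
    """Score every ordered pair i != j with the combined rule of Eqs. (13)/(14).

    Returns (form, diss): form maps currently absent links to a formation
    probability, diss maps currently present links to a dissolution probability.
    """
    form, diss = {}, {}
    n = len(G)
    for i in range(n):
        for j in range(n):
            if i == j:
                continue
            s_f = re_bilinear(v_f[i], w_f, v_f[j])
            s_d = re_bilinear(v_d[i], w_d, v_d[j])
            p = sigmoid(s_f + s_d)
            if G[i][j] == 0:
                form[(i, j)] = p
            else:
                diss[(i, j)] = p
    return form, diss


# Hypothetical current network G_t and learned parameters (d = 2).
G = [[0, 1, 0], [0, 0, 1], [1, 0, 0]]
v_f = [[1 + 1j, 0.5j], [0.2 - 0.3j, 1 + 0j], [-1 + 0j, 0.4 + 0.4j]]
v_d = [[0.1 + 0j, -0.6j], [0.9 + 0.2j, 0.3j], [0.5 - 0.5j, -0.2 + 0j]]
w_f = [cmath.exp(0.3j), cmath.exp(-1.1j)]
w_d = [cmath.exp(2.0j), cmath.exp(0.7j)]

form, diss = transition_probabilities(G, v_f, w_f, v_d, w_d)
```

Sorting `form` and `diss` by value gives the ranked candidate lists; replacing `s_f + s_d` with `s_f` or `s_d` alone recovers Eqs. (11) and (12).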
3 Training

We use stochastic gradient descent to train our model [15]. We first sample a node and perform a DeepWalk [10] to sample the context nodes from the network. We then draw negative samples from the current network, past formation networks and past dissolution networks. Equipped with these positive and negative samples, we take a gradient step with learning rate $\eta_1$ for $v_f$, $v_d$, $u_f$ and $u_d$. Each diagonal element of $W_f$ and $W_d$ is learned in a different manner. As noted before, to make the model identifiable we restrict each diagonal element of $W_f$ and $W_d$ to have an absolute value of 1. Thus each diagonal element of $W_f$ can be rewritten as

$$W_f(k, k) = \cos(\theta_k) + \mathrm{i} \sin(\theta_k) \tag{17}$$

for $k = 1, 2, \ldots, d$. We take a gradient step with learning rate $\eta_2$ in $\theta_k$ instead. All the off-diagonal elements are set to 0.

4 Experiments

Our empirical investigations are based on three real-world networks: a world trade network, an interfirm buyer-seller network and bipartite customs data between Japan and the US (Japan to US exports only).

4.1 Data

We next give a brief outline of the data used.

• WorldTrade is a network of world trade relationships among 50 countries from 1981 to 2000 [16]. We define two countries to be linked if the trading volume was above the 90th percentile for all trade in a given year.
• FirmNetwork is an interfirm buyer-seller network for Japan from 2003 to 2012. We use a subset of this dataset, restricting our attention to firms in Hokkaido in the northern part of Japan [17].
• Customs is a bipartite network dataset that records the names of exporters and consignees of trade from Japan to the US. The data was obtained from the US customs office and covers the period from January 2003 to December 2014. We focus on firms that had more than 500 transactions during the time period, which results in 431 Japanese firms and 603 US firms.
To adjust for seasonal effects, we aggregate the network data on a yearly basis, resulting in snapshots of 12 networks. Two firms are linked if there was a trade relation more than once a year.

The basic statistics for each dataset are reported in Table 1.

Dataset      Num Nodes   Num Edges   Num Unique Edges   Ave Form   Ave Diss   Snapshots
WorldTrade   50          6620        477                16.7       16.7       20
Firm         690         13108       1995               118.9      126.3      10
Customs      1043        7825        1488               113.9      126        12

Table 1: Statistics for datasets. Num Edges denotes the total number of interactions, Num Unique Edges denotes the number of distinct interactions, Ave Form denotes the average number of formed edges, Ave Diss denotes the average number of dissolved edges and Snapshots denotes the number of discrete time points observed in our datasets.

4.2 Evaluation Criteria

Given a training network $G_{1:t}$, we predict the transition from time t to time t + 1, which consists of a link formation network (i.e. $F_{t+1}$) and a link dissolution network (i.e. $D_{t+1}$) as shown in Fig 1. For link prediction accuracy, we use the area under the receiver operating characteristic curve (AUC), where the value is calculated for both link dissolution networks and link formation networks. The AUC has the nice property that it is not influenced by the distribution of classes, making it suitable in our setting where classes (e.g. formed or not formed, dissolved or not dissolved) are highly imbalanced [18]. Higher AUC values indicate better link prediction performance.

4.3 Baseline Methods

We compare our prediction algorithm with the following baselines.

• Adamic-Adar (AA): scores are calculated as the weighted variation of common neighbors [19] using the current network only.
• Preferential attachment (PA): scores are calculated as the product of the degree of each node from the current network.
• Last time of linkage (LL): scores are calculated by ranking pairs in ascending order according to the last time of linkage [20].

We also compute AA-all and PA-all, which are computed over the union of all networks until the current network. The graph heuristic approaches presented here are simple but have been shown to be surprisingly hard to beat in practice, making them good baselines for comparison [1, 20, 21]. In particular, LL has been shown to often be among the best heuristic measures for link prediction [20, 21]. When predicting link dissolution, we use the complementary score method as in [22, 23]. We also compare our model with unsupervised graph embedding and with a supervised approach (i.e. our model without the graph embedding term) to clarify the improvement due to semi-supervised learning.

Throughout all of the experiments, we set $d = 3$, the number of walks to five, $\lambda_f = \lambda_d = 0.05$, $\eta_1 = 0.05$, $\eta_2 = 5 \times 10^{-6}$ and $p = t - 1$ (i.e. using all past information). The learning rate is decreased linearly with the number of nodes that have been used for training up to that point.

4.4 Experimental Results

Results for the link formation prediction task are presented in Table 2. We make the following observations. For the Firm and Customs datasets, our proposed method is the best, but for the WorldTrade dataset, PA-all shows slightly better performance than our method. Nonetheless, for all the networks studied here, our proposed method is among the top performing methods. We observe that the state-of-the-art baseline methods work quite well, especially when using the union of past networks. For the bipartite Customs dataset, AA and AA-all perform almost the same as random selection because we do not have enough linkage information to calculate common neighbors. Our method also shows significant improvements over graph embedding and supervised learning.
In this experiment, supervised learning is outperformed by our method by around 15 % - 18 %, while graph embedding is outperformed by more than 40 %, suggesting the added value of our semi-supervised approach.

Dataset      WorldTrade   Firm    Customs
AA           0.647        0.615   0.5
PA           0.761        0.709   0.517
AA-all       0.643        0.689   0.5
PA-all       0.885        0.787   0.748
LastTime     0.762        0.778   0.834
Supervised   0.703        0.717   0.764
GraphEmb     0.588        0.581   0.606
SemiGraph    0.835        0.828   0.842

Table 2: AUC for link formation prediction.

Results for link dissolution prediction are presented in Table 3. We make the following observations. For all the experiments our method performs better than the state-of-the-art baseline methods. It is worth noting that our method outperforms the other methods quite significantly for the Firm dataset, whereas other unsupervised approaches show almost no signs of predictability. In this experiment, supervised learning is outperformed by our method by around 7 % - 13 %, suggesting again the added value of our semi-supervised approach. The graph embedding approach shows almost no sign of predictability in predicting link dissolution. We also observe that when predicting link dissolution, adding past information does not necessarily increase the predictive performance. For the Customs dataset, using the complementary score does not necessarily improve predictability, and a better AUC score can be obtained by using the normal PA score.

To see how an increase in past information affects the performance of our proposed model, we report results on predicting the transition of a network for the years 2005 to 2012 for the Firm dataset. Because we only have ten snapshots of the network, the prediction in 2005 is based on only one past transition and the last network before prediction.
We observe that for link formation prediction, almost all the methods including our proposed method show improved accuracy with an increase in past information. Our method is among the best performing methods, with a performance comparable to PA-all. Comparing our performance with supervised learning (our method without graph embedding), we clearly see the benefit of our semi-supervised approach. For link dissolution, we observe that our method performs better than the baseline methods. Although supervised learning sometimes performs slightly better than our method, overall we observe the added value of our semi-supervised approach. Although the trend is less clear than for link formation prediction, we also observe that our method shows improved accuracy with an increase in past information.

Dataset      WorldTrade   Firm    Customs
AA           0.638        0.522   0.496
PA           0.711        0.504   0.325
AA-all       0.642        0.488   0.49
PA-all       0.629        0.458   0.467
LastTime     0.596        0.529   0.671
Supervised   0.651        0.674   0.620
GraphEmb     0.486        0.514   0.395
SemiGraph    0.737        0.725   0.684

Table 3: AUC for link dissolution prediction.

Figure 2: AUC for link formation and link dissolution prediction for the Firm dataset. (a) Link formation. (b) Link dissolution. (Both panels plot AUC against Year for SemiGraph, AA.All, PA.All, LastTime, Supervised and GraphEmb.)

4.5 Parameter Sensitivity

To evaluate how changes to the parametrization affect the final predictive performance, we report the effect of varying the number of dimensions and $\lambda$ (we set $\lambda := \lambda_f = \lambda_d$). Other parameters are held fixed as before. Figure 3(a) shows the effect of varying the number of dimensions, and shows that while the performance does not vary greatly, the optimum seems to be three. Figure 3(b) examines the effect of varying $\lambda$.
This shows a clear improvement compared to supervised learning (i.e. $\lambda = 0$), where the optimum value seems to be around 0.05. Beyond that, the performance gradually deteriorates as $\lambda$ increases. These experiments show that although the usefulness of our model depends on several parameters, its performance is not overly sensitive to their choice.

Figure 3: Parameter sensitivity. (a) Stability over dimension d. (b) Stability over λ. (Both panels plot AUC for formation and dissolution prediction.)

5 Related Work

5.1 Link Prediction

The static link prediction problem has been extensively studied in the literature [1]. Among the many proposed approaches, graph-based heuristics are the most popular due to their simplicity and high performance on a variety of practical problems [19]. In the dynamic setting, [20, 21] examined extensions of existing static graph-based heuristic measures for temporal link prediction. They showed that extremely simple graph-based heuristic measures such as last time of linkage work surprisingly well in practice.

5.2 Link Dissolution Prediction

Previous research focusing on predicting link dissolution is much less common than research on link formation prediction. Recent research includes [24], which studied unfollowing behavior on Twitter, [25], which studied unfriending behavior on Facebook, and [22, 23], which studied link dissolution on Wikipedia. In all of these previous studies, it was shown that predicting link dissolution is harder than predicting link formation. Compared to these approaches, where information additional to network information is required to perform prediction, our approach is versatile in the sense that we only need snapshots of network information.
5.3 Other Related Approaches

From a supervised learning perspective, our approach can be seen as a descendant of the latent feature or matrix factorization approach to link prediction [18, 26]. The main differences are 1) learning past link formation and dissolution dynamics directly as well as separately, 2) using complex-valued vectors to make it possible to take into account symmetric as well as antisymmetric relations with linear space and time complexity and 3) the unsupervised graph embedding part proposed in this paper. Bayesian extensions of latent feature models also exist [27], with some studies allowing for an infinite number of latent features [28].

Semi-supervised approaches to dynamic link prediction have also previously been explored. In [29, 30], Link Propagation was proposed, where a kernel-based semi-supervised approach to link prediction is performed by constructing a kernel that compares node pairs and constrains the values in the adjacency matrix to vary smoothly according to the kernel. Our approach is arguably simpler than theirs, as the effectiveness of their method depends on the choice of kernel, which has to be pre-specified.

A popular approach to temporal link prediction is based on extensions of static latent space models [31, 32] and mixed membership stochastic block models [33, 34] to a temporal setting. The main idea is to model longitudinal network data as smooth trajectories in a latent space. In social networks, several models extending the exponential random graph models to a dynamic setting have been proposed [35, 36]. Along these lines, [36] is a nice extension of the exponential random graph models that enables different modeling for link formation and link dissolution dynamics. A model similar to the exponential random graph model was also proposed for statistical relational learning [37].
How ev er, these approaches are generally computationally expensive which limits scalabil- it y . Other studies concerning temporal net works include [16], which prop osed a longitudinal mixed effect mo del capable of learning laten t represen tations that ev olves in a simple auto-regressive manner, [38] where a v ector autoregressiv e mo del was used for link prediction in dynamic graphs and [6] which prop osed a tensor–based method to predict perio dic temporal data with multiple patterns. 6 Conclusions W e hav e proposed SemiGr aph , a simple discrete–time semi–sup ervised graph em b edding approach to link prediction in dynamic netw orks. Our mo del is capable of learning differen t em b eddings for b oth formation and dissolution dynamics. T o show the effectiveness of our approach, w e fo cused on predicting the tr ansition of a netw ork, including b oth link formation prediction and link dissolution prediction. W e ha ve sho wed that our metho d outp erforms previous state of the art baseline metho ds in predicting link dissolution and is comparable to state of the art metho ds in predicting link formation through exp eriments using a v ariet y of real–world netw orks. References [1] D. Lib en-No well and J. Klein b erg, “The link prediction problem for so cial net works,” in Pr o c e e dings of the Twelfth International Confer enc e on In- formation and Know le dge Management , CIKM ’03, (New Y ork, NY, USA), pp. 556–559, ACM, 2003. [2] L. Geto or and C. P . Diehl, “Link mining: A survey ,” SIGKDD Explor. Newsl. , vol. 7, pp. 3–12, Dec. 2005. 12 [3] A. Clauset, C. Mo ore, and M. E. J. Newman, “Hierarc hical structure and the prediction of missing links in netw orks,” Natur e , vol. 453, pp. 98–101, 2008. [4] P . Sark ar, S. M. Siddiqi, and G. J. Gordon, “A laten t space approach to dy- namic em bedding of co-o ccurrence data,” in Pr o c e e dings of the Eleventh In- ternational Confer enc e on A rtificial Intel ligenc e and Statistics (AIST A TS 2007) (M. 
Meila and X. Shen, eds.), 2007. [5] M. A. Hasan, V. Chao ji, S. Salem, and M. Zaki, “Link prediction using sup ervised learning,” in In Pr o c. of SDM 06 workshop on Link Analysis, Counterterr orism and Se curity , 2006. [6] D. M. Dunlavy , T. G. Kolda, and E. Acar, “T emp oral link prediction using matrix and tensor factorizations,” ACM T r ans. Know l. Disc ov. Data , v ol. 5, pp. 10:1–10:27, F eb. 2011. [7] N. A. A. Christakis and J. H. H. F owler, “The Spread of Obesity in a Large So cial Net w ork o ver 32 Y ears,” New England Journal of Me dicine , vol. 357, pp. 370–379, July 2007. [8] Z. Y ang, W. W. Cohen, and R. Salakh utdinov, “Revisiting semi-supervised learning with graph embeddings,” in Pr o c e e dings of the 33nd International Confer enc e on Machine L e arning, ICML 2016, New Y ork City, NY, USA, June 19-24, 2016 , pp. 40–48, 2016. [9] T. T rouillon, J. W elbl, S. Riedel, ´ E. Gaussier, and G. Bouc hard, “Complex em b eddings for simple link prediction,” pp. 1–2, 2016. [10] B. P erozzi, R. Al-Rfou, and S. Skiena, “Deepw alk: Online learning of social represen tations,” in Pr o c e e dings of the 20th ACM S IGKDD International Confer enc e on Know le dge Disc overy and Data Mining , KDD ’14, (New Y ork, NY, USA), pp. 701–710, A CM, 2014. [11] T. Mikolo v, I. Sutsk ever, K. Chen, G. S. Corrado, and J. Dean, “Dis- tributed representations of words and phrases and their compositional- it y ,” in A dvanc es in Neur al Information Pr o c essing Systems 26 (C. J. C. Burges, L. Bottou, M. W elling, Z. Ghahramani, and K. Q. W einberger, eds.), pp. 3111–3119, Curran Asso ciates, Inc., 2013. [12] J. T ang, M. Qu, M. W ang, M. Zhang, J. Y an, and Q. Mei, “Line: Large- scale information netw ork em b edding,” in Pr o c e e dings of the 24th Interna- tional Confer enc e on World Wide Web , WWW ’15, (New Y ork, NY, USA), pp. 1067–1077, ACM, 2015. [13] D. Zhou, O. Bousquet, T. N. Lal, J. W eston, and B. 
Schölkopf, “Learning with local and global consistency,” in Advances in Neural Information Processing Systems 16, pp. 321–328, MIT Press, 2004.
[14] X. Zhu, Z. Ghahramani, and J. Lafferty, “Semi-supervised learning using gaussian fields and harmonic functions,” in ICML, pp. 912–919, 2003.
[15] L. Bottou, “Large-scale machine learning with stochastic gradient descent,” in Proceedings of the 19th International Conference on Computational Statistics (COMPSTAT’2010) (Y. Lechevallier and G. Saporta, eds.), (Paris, France), pp. 177–187, Springer, August 2010.
[16] A. H. Westveld and P. D. Hoff, “A mixed effects model for longitudinal relational and network data, with applications to international trade and conflict,” The Annals of Applied Statistics, vol. 5, pp. 843–872, 06 2011.
[17] R. Hisano, T. Watanabe, T. Mizuno, T. Ohnishi, and D. Sornette, “The gradual evolution of buyer-seller networks and their role in aggregate fluctuations,” CARF Working paper, vol. CARF-F-389, no. http://www.carf.e.u-tokyo.ac.jp/workingpaper/F389.html, 2015.
[18] A. K. Menon and C. Elkan, “Link prediction via matrix factorization,” in Proceedings of the 2011 European Conference on Machine Learning and Knowledge Discovery in Databases - Volume Part II, ECML PKDD’11, (Berlin, Heidelberg), pp. 437–452, Springer-Verlag, 2011.
[19] L. A. Adamic and E. Adar, “Friends and neighbors on the web,” Social Networks, vol. 25, no. 3, pp. 211–230, 2003.
[20] T. Tylenda, R. Angelova, and S. Bedathur, “Towards time-aware link prediction in evolving social networks,” in Proceedings of the 3rd Workshop on Social Network Mining and Analysis, SNA-KDD ’09, (New York, NY, USA), pp. 9:1–9:10, ACM, 2009.
[21] P. Sarkar, D. Chakrabarti, and M. Jordan, “Nonparametric link prediction in large scale dynamic networks,” Electron. J. Statist., vol. 8, no. 2, pp. 2022–2065, 2014.
[22] J. Preusse, J. Kunegis, M.
Thimm, T. Gottron, and S. Staab, “Structural Dynamics of Knowledge Networks,” in ICWSM’13: Proceedings of the 7th International AAAI Conference on Weblogs and Social Media, 2013.
[23] J. Preusse, J. Kunegis, M. Thimm, and S. Sizov, “DecLiNe – models for decay of links in networks,” 2014.
[24] H. Kwak, S. B. Moon, and W. Lee, “More of a receiver than a giver: Why do people unfollow in twitter?,” in ICWSM (J. G. Breslin, N. B. Ellison, J. G. Shanahan, and Z. Tufekci, eds.), The AAAI Press, 2012.
[25] Y. Yang, N. V. Chawla, P. Basu, B. Prabhala, and T. L. Porta, “Link prediction in human mobility networks,” in Advances in Social Networks Analysis and Mining (ASONAM), 2013 IEEE/ACM International Conference on, pp. 380–387, Aug 2013.
[26] M. Kolar, L. Song, A. Ahmed, and E. P. Xing, “Estimating time-varying networks,” Ann. Appl. Stat., vol. 4, pp. 94–123, 03 2010.
[27] P. D. Hoff, “Bilinear Mixed-Effects Models for Dyadic Data,” Journal of the American Statistical Association, vol. 100, pp. 286–295, Mar. 2005.
[28] K. Miller, M. I. Jordan, and T. L. Griffiths, “Nonparametric latent feature models for link prediction,” in Advances in Neural Information Processing Systems 22 (Y. Bengio, D. Schuurmans, J. D. Lafferty, C. K. I. Williams, and A. Culotta, eds.), pp. 1276–1284, Curran Associates, Inc., 2009.
[29] H. Kashima, T. Kato, Y. Yamanishi, M. Sugiyama, and K. Tsuda, “Link propagation: A fast semi-supervised learning algorithm for link prediction.”
[30] R. Raymond and H. Kashima, “Fast and Scalable Algorithms for Semi-supervised Link Prediction on Static and Dynamic Graphs,” in Machine Learning and Knowledge Discovery in Databases (J. Balcázar, F. Bonchi, A. Gionis, and M. Sebag, eds.), vol. 6323 of Lecture Notes in Computer Science, ch. 9, pp. 131–147, Berlin, Heidelberg: Springer Berlin / Heidelberg, 2010.
[31] P. Sarkar and A. W.
Moore, “Dynamic social network analysis using latent space models,” SIGKDD Explor. Newsl., vol. 7, pp. 31–40, Dec. 2005.
[32] D. K. Sewell and Y. Chen, “Latent space models for dynamic networks,” Journal of the American Statistical Association, vol. 110, no. 512, pp. 1646–1657, 2015.
[33] W. Fu, L. Song, and E. P. Xing, “Dynamic mixed membership blockmodel for evolving networks,” in Proceedings of the 26th Annual International Conference on Machine Learning, ICML ’09, (New York, NY, USA), pp. 329–336, ACM, 2009.
[34] E. P. Xing, W. Fu, and L. Song, “A state-space mixed membership blockmodel for dynamic network tomography,” Annals of Applied Statistics, vol. 4, no. 2, pp. 535–566, 2010.
[35] F. Guo, S. Hanneke, W. Fu, and E. P. Xing, “Recovering temporally rewiring networks: A model-based approach,” in Proceedings of the 24th International Conference on Machine Learning, ICML ’07, (New York, NY, USA), pp. 321–328, ACM, 2007.
[36] P. N. Krivitsky and M. S. Handcock, “A separable model for dynamic networks,” Journal of the Royal Statistical Society: Series B (Statistical Methodology), vol. 76, no. 1, pp. 29–46, 2014.
[37] B. Taskar, M. fai Wong, P. Abbeel, and D. Koller, “Link prediction in relational data,” in Neural Information Processing Systems, 2003.
[38] E. Richard, N. Baskiotis, T. Evgeniou, and N. Vayatis, “Link discovery using graph feature tracking,” in Advances in Neural Information Processing Systems 23 (J. D. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. S. Zemel, and A. Culotta, eds.), pp. 1966–1974, Curran Associates, Inc., 2010.