COEVOLVE: A Joint Point Process Model for Information Diffusion and Network Co-evolution

Mehrdad Farajtabar¹, Yichen Wang¹, Manuel Gomez Rodriguez², Shuang Li¹, Hongyuan Zha¹, and Le Song¹

¹ Georgia Institute of Technology, {mehrdad,yichen.wang,sli370}@gatech.edu, {zha,lsong}@cc.gatech.edu
² Max Planck Institute for Software Systems, manuelgr@mpi-sws.org

Abstract

Information diffusion in online social networks is affected by the underlying network topology, but it also has the power to change it. Online users are constantly creating new links when exposed to new information sources, and in turn these links alter the way information spreads. However, these two highly intertwined stochastic processes, information diffusion and network evolution, have been predominantly studied separately, ignoring their co-evolutionary dynamics. We propose a temporal point process model, Coevolve, for such joint dynamics, allowing the intensity of one process to be modulated by that of the other. This model allows us to efficiently simulate interleaved diffusion and network events, and generate traces obeying common diffusion and network patterns observed in real-world networks. Furthermore, we also develop a convex optimization framework to learn the parameters of the model from historical diffusion and network evolution traces. We experimented with both synthetic data and data gathered from Twitter, and show that our model provides a good fit to the data as well as more accurate predictions than alternatives.

1 Introduction

Online social networks, such as Twitter or Weibo, have become large information networks where people share, discuss and search for information of personal interest as well as breaking news [1].
In this context, users often forward to their followers information they are exposed to via their followees, triggering the emergence of information cascades that travel through the network [2], and constantly create new links to information sources, triggering changes in the network itself over time. Importantly, recent empirical studies with Twitter data have shown that both information diffusion and network evolution are coupled and network changes are often triggered by information diffusion [3, 4, 5].

While there have been many recent works on modeling information diffusion [6, 7, 8, 2, 9] and network evolution [10, 11, 12], most of them treat these two stochastic processes independently and separately, ignoring the influence one may have on the other over time. Thus, to better understand information diffusion and network evolution, there is an urgent need for joint probabilistic models of the two processes, which are largely nonexistent to date.

[Figure 1: Illustration of how information diffusion and network structure processes interact. Diagram labels: diffusion network, information diffusion process, link creation process; arrows: drive, support, alter.]

In this paper, we propose a probabilistic generative model, Coevolve, for the joint dynamics of information diffusion and network evolution. Our model is based on the framework of temporal point processes, which explicitly characterizes the continuous time interval between events, and it consists of two interwoven and interdependent components, as shown in Figure 1:

I. Information diffusion process. We design an “identity revealing” multivariate Hawkes process [13] to capture the mutual excitation behavior of retweeting events, where the intensity of such events in a user is boosted by previous events from her time-varying set of followees.
Although Hawkes processes have been used for information diffusion before [14, 15, 16, 17, 18, 19, 20, 21], the key innovation of our approach is to explicitly model the excitation due to a particular source node, hence revealing the identity of the source. Such a design reflects the reality that information sources are explicitly acknowledged, and it also allows a particular information source to acquire new links at a rate according to her “informativeness”.

II. Network evolution process. We model link creation as an “information driven” survival process, and couple the intensity of this process with retweeting events. Although survival processes have been used for link creation before [22, 23], the key innovation in our model is to incorporate retweeting events as the driving force for such processes. Since our model has captured the source identity of each retweeting event, new links will be targeted toward information sources, with an intensity proportional to their degree of excitation and each source's influence.

Our model is designed in such a way that it allows the two processes, information diffusion and network evolution, to unfold simultaneously on the same time scale and exercise bidirectional influence on each other, allowing sophisticated coevolutionary dynamics to be generated, as illustrated in Figure 2. Importantly, the flexibility of our model does not prevent us from efficiently simulating diffusion and link events from the model and learning its parameters from real-world data:

• Efficient simulation. We design a scalable sampling procedure that exploits the sparsity of the generated networks. Its complexity is $O(nd \log m)$, where $n$ is the number of events, $m$ is the number of users and $d$ is the maximum number of followees per user.

• Convex parameters learning.
We show that the model parameters that maximize the joint likelihood of observed diffusion and link creation events can be efficiently found via convex optimization.

Then, we experiment with our model and show that it can produce coevolutionary dynamics of information diffusion and network evolution, and generate retweet and link events that obey common information diffusion patterns (e.g., cascade structure, size and depth), static network patterns (e.g., node degree) and temporal network patterns (e.g., shrinking diameter) described in related literature [24, 12, 25]. Finally, we show that, by modeling the coevolutionary dynamics, our model provides significantly more accurate link and diffusion event predictions than alternatives on a large-scale Twitter dataset [3].

The remainder of this article is organized as follows. We first build sufficient background on the temporal point processes framework in Section 2. Then, we introduce our joint model of information diffusion and network structure co-evolution in Section 3. Sections 4 and 5 are devoted to answering two essential questions: how can we generate data from the model? and how can we efficiently learn the model parameters from historical event data? Any generative model should be able to answer the above questions. In Sections 6, 7, and 8 we perform an empirical investigation of the properties of the model, we evaluate the accuracy of the parameter estimation on synthetic data, and we evaluate the performance of the proposed model on a real-world dataset, respectively. Section 9 reviews the related work and Section 10 discusses some extensions to the proposed model. Finally, the paper is concluded in Section 11.

[Figure 2: Illustration of information diffusion and network structure co-evolution: David's tweet at 1:00 pm about a paper is retweeted by Sophie and Christine respectively at 1:10 pm and 1:15 pm to reach out to Jacob. Jacob retweets about this paper at 1:20 pm and 1:35 pm and then finds David a good source of information and decides to follow him directly at 1:45 pm. Therefore, a new path of information to him (and his downstream followers) is created. As a consequence, a subsequent tweet by David about a car at 2:00 pm directly reaches out to Jacob without the need for Sophie and Christine to retweet.]

2 Background on Temporal Point Processes

A temporal point process is a random process whose realization consists of a list of discrete events localized in time, $\{t_i\}$ with $t_i \in \mathbb{R}^+$ and $i \in \mathbb{Z}^+$. Many different types of data produced in online social networks can be represented as temporal point processes, such as the times of retweets and link creations. A temporal point process can be equivalently represented as a counting process, $N(t)$, which records the number of events before time $t$. Let the history $\mathcal{H}(t)$ be the list of times of events $\{t_1, t_2, \ldots, t_n\}$ up to but not including time $t$. Then, the number of observed events in a small time window $[t, t + dt)$ of length $dt$ is

$$dN(t) = \sum_{t_i \in \mathcal{H}(t)} \delta(t - t_i)\, dt, \tag{1}$$

and hence $N(t) = \int_0^t dN(s)$, where $\delta(t)$ is a Dirac delta function. More generally, given a function $f(t)$, we can define the convolution with respect to $dN(t)$ as

$$f(t) \star dN(t) := \int_0^t f(t - \tau)\, dN(\tau) = \sum_{t_i \in \mathcal{H}(t)} f(t - t_i). \tag{2}$$

The point process representation of temporal data is fundamentally different from the discrete time representation typically used in social network analysis.
It directly models the time interval between events as random variables, avoids the need to pick a time window to aggregate events, and allows temporal events to be modeled in a fine-grained fashion. Moreover, it has remarkably rich theoretical support [26].

An important way to characterize temporal point processes is via the conditional intensity function, a stochastic model for the time of the next event given all the times of previous events. Formally, the conditional intensity function $\lambda^*(t)$ (intensity, for short) is the conditional probability of observing an event in a small window $[t, t + dt)$ given the history $\mathcal{H}(t)$, i.e.,

$$\lambda^*(t)\, dt := P\{\text{event in } [t, t + dt) \mid \mathcal{H}(t)\} = E[dN(t) \mid \mathcal{H}(t)], \tag{3}$$

where one typically assumes that only one event can happen in a small window of size $dt$ and thus $dN(t) \in \{0, 1\}$. Then, given the observation until time $t$ and a time $t' > t$, we can also characterize the conditional probability that no event happens until $t'$ as

$$S^*(t') = \exp\left( -\int_t^{t'} \lambda^*(\tau)\, d\tau \right), \tag{4}$$

the (conditional) probability density function that an event occurs at time $t'$ as

$$f^*(t') = \lambda^*(t')\, S^*(t'), \tag{5}$$

and the (conditional) cumulative distribution function, which accounts for the probability that an event happens before time $t'$:

$$F^*(t') = 1 - S^*(t') = \int_t^{t'} f^*(\tau)\, d\tau. \tag{6}$$

Figure 3 illustrates these quantities.

[Figure 3: Illustration of the conditional density function, the conditional cumulative distribution function and the survival function. Labels: $f^*(t)$; $S^*(t)$, the probability that the event survives after $t$ (survival function); $F^*(t)$, the probability that the event occurs before $t$ (cdf); $f^*(t)\,dt$, the probability that the event occurs in $[t, t + dt]$.]

Moreover, we can express the log-likelihood of a list of events $\{t_1, t_2, \ldots, t_n\}$ in an observation window $[0, T)$ as

$$\mathcal{L} = \sum_{i=1}^{n} \log \lambda^*(t_i) - \int_0^T \lambda^*(\tau)\, d\tau, \qquad T > t_n. \tag{7}$$

This simple log-likelihood will later enable us to learn the parameters of our model from observed data.

Finally, the functional form of the intensity $\lambda^*(t)$ is often designed to capture the phenomena of interest. Some useful functional forms we will use are [26]:

(i) Poisson process. The intensity is assumed to be independent of the history $\mathcal{H}(t)$, but it can be a nonnegative time-varying function, i.e.,

$$\lambda^*(t) = g(t) > 0. \tag{8}$$

(ii) Hawkes process. The intensity is history dependent and models a mutual excitation between events, i.e.,

$$\lambda^*(t) = \mu + \alpha\, \kappa_\omega(t) \star dN(t) = \mu + \alpha \sum_{t_i \in \mathcal{H}(t)} \kappa_\omega(t - t_i), \tag{9}$$

where

$$\kappa_\omega(t) := \exp(-\omega t)\, \mathbb{I}[t > 0] \tag{10}$$

is an exponential triggering kernel and $\mu > 0$ is a baseline intensity independent of the history. Here, the occurrence of each historical event increases the intensity by a certain amount determined by the kernel and the weight $\alpha > 0$, making the intensity history dependent and a stochastic process by itself. In our work, we focus on the exponential kernel; however, other functional forms, such as the log-logistic function, are possible, and the general properties of our model do not depend on this particular choice.

(iii) Survival process. There is only one event for an instantiation of the process, i.e.,

$$\lambda^*(t) = (1 - N(t))\, g(t), \tag{11}$$

where $g(t) > 0$ and the term $(1 - N(t))$ makes sure $\lambda^*(t)$ is 0 if an event already happened before $t$.

Figure 4 illustrates these processes. The interested reader should refer to [26] for more details on the framework of temporal point processes.

[Figure 4: Three types of point processes with a typical realization. Panels: (a) Poisson process, (b) Hawkes process, (c) survival process.]
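To make Equations (7), (9) and (10) concrete, the following is a minimal Python sketch that evaluates a univariate Hawkes intensity with the exponential kernel and the corresponding log-likelihood. The function names and the closed-form expression for the integral term are our own illustration, not code from the paper; the integral uses the standard identity $\int_0^T \lambda^*(\tau)\,d\tau = \mu T + (\alpha/\omega) \sum_i (1 - e^{-\omega (T - t_i)})$, which holds for this particular kernel.

```python
import math

def hawkes_intensity(t, history, mu, alpha, omega):
    """Eq. (9) with the kernel of Eq. (10): only events strictly before t excite."""
    return mu + alpha * sum(math.exp(-omega * (t - ti)) for ti in history if ti < t)

def hawkes_log_likelihood(events, T, mu, alpha, omega):
    """Eq. (7) on the window [0, T) for the intensity of Eq. (9).

    `events` is a sorted list of event times, all smaller than T.
    """
    # Sum of log-intensities, each evaluated just before its own event.
    log_term = sum(
        math.log(hawkes_intensity(ti, events, mu, alpha, omega)) for ti in events
    )
    # Closed-form integral of the intensity over [0, T) for the exponential kernel.
    integral = mu * T + (alpha / omega) * sum(
        1.0 - math.exp(-omega * (T - ti)) for ti in events
    )
    return log_term - integral
```

Setting `alpha = 0` recovers a homogeneous Poisson process with rate `mu`, for which the log-likelihood reduces to $n \log \mu - \mu T$; this gives a simple sanity check of the sketch.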
3 Generative Model of Information Diffusion and Network Evolution

In this section, we use the above background on temporal point processes to formulate Coevolve, our probabilistic model for the joint dynamics of information diffusion and network evolution.

3.1 Event Representation

We model the generation of two types of events: tweet/retweet events, $e^r$, and link creation events, $e^l$. Instead of just the time $t$, we record each event as a triplet, as illustrated in Figure 5(a):

$$e^r \text{ or } e^l := (\underbrace{u}_{\text{destination}},\ \overbrace{s}^{\text{source}},\ \underbrace{t}_{\text{time}}). \tag{12}$$

For a retweet event, the triplet means that the destination node $u$ retweets at time $t$ a tweet originally posted by source node $s$. Recording the source node $s$ reflects the real-world scenario that information sources are explicitly acknowledged. Note that the occurrence of event $e^r$ does not mean that $u$ is directly retweeting from or is connected to $s$. This event can happen when $u$ is retweeting a message by another node $u'$ where the original information source $s$ is acknowledged. Node $u$ will pass on the same source acknowledgement to its followers (e.g., “I agree @a @b @c @s”). Original tweets posted by node $u$ are allowed in this notation. In this case, the event will simply be $e^r = (u, u, t)$. Given a list of retweet events up to but not including time $t$, the history $\mathcal{H}^r_{us}(t)$ of retweets by $u$ due to source $s$ is

$$\mathcal{H}^r_{us}(t) = \{ e^r_i = (u_i, s_i, t_i) \mid u_i = u \text{ and } s_i = s \}. \tag{13}$$

The entire history of retweet events is denoted as

$$\mathcal{H}^r(t) := \cup_{u, s \in [m]} \mathcal{H}^r_{us}(t). \tag{14}$$

For a link creation event, the triplet means that destination node $u$ creates at time $t$ a link to source node $s$, i.e., from time $t$ on, node $u$ starts following node $s$. To ease the exposition, we restrict ourselves to the case where links cannot be deleted and thus each (directed) link is created only once.
However, our model can be easily augmented to consider multiple link creations and deletions per node pair, as discussed in Section 10. We denote the link creation history as $\mathcal{H}^l(t)$.

[Figure 5: Events as point and counting processes. Panel (a), event representation, shows a trace of events generated by a tweet from David followed by new links Jacob creates to follow David and Sophie. Panel (b), point and counting processes, shows the associated points in time and the counting process realization: “identity revealing” tweet/retweet processes $N(t) \in \{0\} \cup \mathbb{Z}^+$ and “information driven” link creation processes $A(t) \in \{0, 1\}$.]

3.2 Joint Model with Two Interwoven Components

Given $m$ users, we use two sets of counting processes to record the generated events, one for information diffusion and another for network evolution. More specifically,

I. Retweet events are recorded using a matrix $\mathbf{N}(t)$ of size $m \times m$ for each fixed time point $t$. The $(u, s)$-th entry in the matrix, $N_{us}(t) \in \{0\} \cup \mathbb{Z}^+$, counts the number of retweets of $u$ due to source $s$ up to time $t$. These counting processes are “identity revealing”, since they keep track of the source node that triggers each retweet. The matrix $\mathbf{N}(t)$ is typically less sparse than $\mathbf{A}(t)$, since $N_{us}(t)$ can be nonzero even when node $u$ does not directly follow $s$.
We also let $d\mathbf{N}(t) := (dN_{us}(t))_{u,s \in [m]}$.

II. Link events are recorded using an adjacency matrix $\mathbf{A}(t)$ of size $m \times m$ for each fixed time point $t$. The $(u, s)$-th entry in the matrix, $A_{us}(t) \in \{0, 1\}$, indicates whether $u$ is directly following $s$. Therefore, $A_{us}(t) = 1$ means the directed link has been created before $t$. For simplicity of exposition, we do not allow self-links. The matrix $\mathbf{A}(t)$ is typically sparse, but the number of nonzero entries can change over time. We also define $d\mathbf{A}(t) := (dA_{us}(t))_{u,s \in [m]}$.

Then, the interwoven information diffusion and network evolution processes can be characterized using their respective intensities

$$E[d\mathbf{N}(t) \mid \mathcal{H}^r(t) \cup \mathcal{H}^l(t)] = \boldsymbol{\Gamma}^*(t)\, dt \tag{15}$$

$$E[d\mathbf{A}(t) \mid \mathcal{H}^r(t) \cup \mathcal{H}^l(t)] = \boldsymbol{\Lambda}^*(t)\, dt, \tag{16}$$

where

$$\boldsymbol{\Gamma}^*(t) = (\gamma^*_{us}(t))_{u,s \in [m]} \tag{17}$$

$$\boldsymbol{\Lambda}^*(t) = (\lambda^*_{us}(t))_{u,s \in [m]}. \tag{18}$$

The sign $*$ means that the intensity matrices will depend on the joint history, $\mathcal{H}^r(t) \cup \mathcal{H}^l(t)$, and hence their evolution will be coupled. By this coupling, we make: (i) the counting processes for link creation “information driven” and (ii) the evolution of the linking structure change the information diffusion process. In the next two sections, we will specify the details of these two intensity matrices.
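As an informal illustration of the event representation in Equation (12) and of the two counting matrices, the sketch below rebuilds $\mathbf{N}(t)$ and $\mathbf{A}(t)$ from lists of event triplets. The `Event` tuple, the function name `counting_state`, the integer user ids and the toy trace (loosely modeled on the example in Figure 5, with times in minutes after 1:00 pm) are all assumptions made for this example.

```python
from collections import namedtuple

# (destination, source, time), following Eq. (12); the name Event is our own.
Event = namedtuple("Event", ["u", "s", "t"])

def counting_state(retweets, links, t, m):
    """Return (N, A) at time t as m x m nested lists.

    N[u][s] counts retweets of u due to source s strictly before t;
    A[u][s] is 1 if u created a link to s strictly before t (no self-links).
    """
    N = [[0] * m for _ in range(m)]
    A = [[0] * m for _ in range(m)]
    for e in retweets:
        if e.t < t:
            N[e.u][e.s] += 1
    for e in links:
        if e.t < t and e.u != e.s:
            A[e.u][e.s] = 1
    return N, A

# Hypothetical user ids for the five users in the running example.
D, S, C, J, B = range(5)
retweets = [Event(D, D, 0), Event(J, D, 35), Event(B, B, 180),
            Event(J, B, 190), Event(J, J, 240)]
links = [Event(J, D, 45), Event(J, S, 265)]

N, A = counting_state(retweets, links, t=200, m=5)
```

At `t = 200` in this toy trace, Jacob has one retweet due to David and one due to Bob, and follows David but not yet Sophie. Note that `N[J][B]` is nonzero although `A[J][B]` is zero, which is the sense in which $\mathbf{N}(t)$ can be less sparse than $\mathbf{A}(t)$.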
[Figure 6: The breakdown of conditional intensity functions for 1) the information diffusion process of Jacob retweeting posts originated from David, $N_{JD}(t)$; 2) the information diffusion process of David tweeting on his own initiative, $N_{DD}(t)$; 3) the link creation process of Jacob following David, $A_{JD}(t)$. Panels: (a) link creation process, (b) social network, (c) information diffusion process.]

3.3 Information Diffusion Process

We model the intensity, $\boldsymbol{\Gamma}^*(t)$, for retweeting events using a multivariate Hawkes process [13]:

$$\gamma^*_{us}(t) = \mathbb{I}[u = s]\, \eta_u + \mathbb{I}[u \neq s]\, \beta_s \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_1}(t) \star \big( A_{uv}(t)\, dN_{vs}(t) \big), \tag{19}$$

where $\mathbb{I}[\cdot]$ is the indicator function and $\mathcal{F}_u(t) := \{ v \in [m] : A_{uv}(t) = 1 \}$ is the current set of followees of $u$. The term $\eta_u > 0$ is the intensity of original tweets by a user $u$ on his own initiative, becoming the source of a cascade, and the term $\beta_s \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_1}(t) \star ( A_{uv}(t)\, dN_{vs}(t) )$ models the propagation of peer influence over the network, where the triggering kernel $\kappa_{\omega_1}(t)$ models the decay of peer influence over time.
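The retweet intensity of Equation (19) can be sketched in a few lines of Python. This is only one possible reading of the formula: we assume that a retweet by followee $v$ at time $t_i$ excites $u$ only if the link from $u$ to $v$ already existed at $t_i$ (the factor $A_{uv}$ inside the convolution), and that links are never deleted. The function name, the `(v, source, time)` tuple layout and the `link_time` dictionary are hypothetical choices for this example.

```python
import math

def retweet_intensity(u, s, t, retweets, link_time, eta, beta, omega1):
    """Sketch of gamma*_us(t) from Eq. (19).

    retweets  : list of (v, source, time) triplets
    link_time : dict mapping (follower, followee) -> link creation time
    eta, beta : per-user parameters indexed by user id
    """
    if u == s:
        return eta[u]  # original tweets: constant base rate eta_u
    excitation = 0.0
    for v, src, ti in retweets:
        if src != s or ti >= t:
            continue  # only past events attributed to source s matter
        created = link_time.get((u, v))
        # Assumption: the event counts only if u already followed v at time ti.
        if created is not None and created < ti:
            excitation += math.exp(-omega1 * (t - ti))
    return beta[s] * excitation

# Toy usage with hypothetical ids: Jacob (2) follows Sophie (1) from time 0.5,
# and Sophie retweets David (0) at time 1.0.
D, S, J = 0, 1, 2
events = [(D, D, 0.0), (S, D, 1.0), (S, D, 3.0)]
follows = {(J, S): 0.5, (S, D): 0.0}
eta = [0.2, 0.2, 0.2]
beta = [0.5, 0.5, 0.5]
gamma_JD = retweet_intensity(J, D, 2.0, events, follows, eta, beta, omega1=1.0)
```

In this toy trace Jacob does not follow David, yet `gamma_JD` is positive because Sophie's retweet carries David's identity as the source.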
Note that the retweeting intensity matrix $\boldsymbol{\Gamma}^*(t)$ is by itself a stochastic process that depends on the time-varying network topology, the non-zero entries in $\mathbf{A}(t)$, whose growth is controlled by the network evolution process in Section 3.4. Hence the model design captures the influence of the network topology and each source's influence, $\beta_s$, on the information diffusion process. More specifically, to compute $\gamma^*_{us}(t)$, one first finds the current set $\mathcal{F}_u(t)$ of followees of $u$, and then aggregates the retweets of these followees that are due to source $s$. Note that these followees may or may not directly follow source $s$. Then, the more frequently node $u$ is exposed to retweets of tweets originated from source $s$ via her followees, the more likely she will also retweet a tweet originated from source $s$. Once node $u$ retweets due to source $s$, the corresponding $N_{us}(t)$ will be incremented, and this in turn will increase the likelihood of triggering retweets due to source $s$ among the followers of $u$. Thus, the source does not simply broadcast the message to nodes directly following her but her influence propagates through the network even to those nodes that do not directly follow her. Finally, this information diffusion model allows a node to repeatedly generate events in a cascade, and is very different from the independent cascade or linear threshold models [27], which allow at most one event per node per cascade.

3.4 Network Evolution Process

In our model, each user is exposed to information through a time-varying set of neighbors. By doing so, information diffusion affects network evolution, increasing the practical application of our model to real-world network datasets. The particular definition of exposure (e.g., a retweet's neighbor) depends on the type of historical information that is available.
Remarkably, the flexibility of our model allows for different types of diffusion events, which we can broadly classify into two categories.

In the first category, events correspond to the times when an information cascade hits a person, for example, through a retweet from one of her neighbors, but she does not explicitly like or forward the associated post. Here, we model the intensity, $\boldsymbol{\Lambda}^*(t)$, for link creation using a combination of survival and Hawkes processes:

$$\lambda^*_{us}(t) = (1 - A_{us}(t)) \Big( \mu_u + \alpha_u \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_2}(t) \star dN_{vs}(t) \Big), \tag{20}$$

where the term $1 - A_{us}(t)$ effectively ensures a link is created only once, and after that, the corresponding intensity is set to zero. The term $\mu_u > 0$ denotes a baseline intensity, which models when a node $u$ decides to follow a source $s$ spontaneously on her own initiative. The term $\alpha_u \kappa_{\omega_2}(t) \star dN_{vs}(t)$ corresponds to the retweets by node $v$ (a followee of node $u$) which are originated from source $s$. The triggering kernel $\kappa_{\omega_2}(t)$ models the decay of interest over time.

In the second category, the person decides to explicitly like or forward the associated post and influencing events correspond to the times when she does so. In this case, we model the intensity, $\boldsymbol{\Lambda}^*(t)$, for link creation as:

$$\lambda^*_{us}(t) = (1 - A_{us}(t)) \big( \mu_u + \alpha_u\, \kappa_{\omega_2}(t) \star dN_{us}(t) \big), \tag{21}$$

where the terms $1 - A_{us}(t)$, $\mu_u > 0$, and the decaying kernel $\kappa_{\omega_2}(t)$ play the same role as the corresponding ones in Equation (20). The term $\alpha_u \kappa_{\omega_2}(t) \star dN_{us}(t)$ corresponds to the retweets of node $u$ due to tweets originally published by source $s$. The higher the corresponding retweet intensity, the more likely $u$ will find information by source $s$ useful and will create a direct link to $s$.
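The two link creation intensities can be sketched side by side. The code below is an illustration under our own assumptions rather than the authors' implementation: the function name, the `own_retweets` switch between Equation (21) and Equation (20), and the data layout are hypothetical, Equation (20) is taken literally (all past retweets due to $s$ by the current followees of $u$ count), and a self-pair $u = s$ is given zero intensity because self-links are not allowed.

```python
import math

def link_intensity(u, s, t, retweets, link_time, mu, alpha, omega2,
                   own_retweets=True):
    """Sketch of lambda*_us(t): Eq. (21) if own_retweets, else Eq. (20).

    retweets  : list of (v, source, time) triplets
    link_time : dict mapping (follower, followee) -> link creation time
    """
    if u == s:
        return 0.0  # no self-links
    created = link_time.get((u, s))
    if created is not None and created < t:
        return 0.0  # survival term: (1 - A_us(t)) = 0 once the link exists
    excitation = 0.0
    for v, src, ti in retweets:
        if src != s or ti >= t:
            continue
        if own_retweets:
            counts = (v == u)  # Eq. (21): u's own retweets due to s
        else:
            followed = link_time.get((u, v))
            counts = followed is not None and followed < t  # Eq. (20): v in F_u(t)
        if counts:
            excitation += math.exp(-omega2 * (t - ti))
    return mu[u] + alpha[u] * excitation

# Toy usage with hypothetical ids: David (0), Sophie (1), Jacob (2).
D, S, J = 0, 1, 2
events = [(J, D, 1.0), (J, D, 2.0), (S, D, 1.5)]
follows = {(J, S): 0.0}
mu = [0.1, 0.1, 0.1]
alpha = [0.5, 0.5, 0.5]
lam_own = link_intensity(J, D, 3.0, events, follows, mu, alpha, 1.0)
lam_exposure = link_intensity(J, D, 3.0, events, follows, mu, alpha, 1.0,
                              own_retweets=False)
```

Once the pair `(J, D)` is added to `follows` with a creation time before `t`, both variants return zero, mirroring the survival term in the two equations.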
In both cases, the link creation intensity $\boldsymbol{\Lambda}^*(t)$ is also a stochastic process by itself, which depends on the retweet events, be it the retweets by the neighbors of node $u$ or the retweets by node $u$ herself, respectively. Therefore, it captures the influence of retweets on the link creation, and closes the loop of mutual influence between information diffusion and network topology. Figure 6 illustrates these two interdependent intensities.

Intuitively, in the latter category, information diffusion events are more prone to trigger new connections, because they involve the target and source nodes in an explicit interaction; however, they are also less frequent. Therefore, it is mostly suitable for large event datasets, such as the ones we generate in our synthetic experiments. In contrast, in the former category, information diffusion events are less likely to inspire new links but are found in abundance. Therefore, it is more suitable for smaller datasets, such as the ones we use in our real-world experiments. Consequently, in our synthetic experiments we used the latter and in our real-world experiments we used the former. More generally, the choice of exposure event should be made based on the type and amount of available historical information.

Finally, note that creating a link is more than just adding a path or allowing information sources to take shortcuts during diffusion. The network evolution makes fundamental changes to the diffusion dynamics and stationary distribution of the diffusion process in Section 3.3. As shown in [18], given a fixed network structure $\mathbf{A}$, the expected retweet intensity $\boldsymbol{\mu}_s(t)$ at time $t$ due to source $s$ will depend on the network structure in a nonlinear fashion, i.e.,

$$\boldsymbol{\mu}_s(t) := E[\boldsymbol{\Gamma}^*_{\cdot s}(t)] = \Big( e^{(\mathbf{A} - \omega_1 \mathbf{I}) t} + \omega_1 (\mathbf{A} - \omega_1 \mathbf{I})^{-1} \big( e^{(\mathbf{A} - \omega_1 \mathbf{I}) t} - \mathbf{I} \big) \Big)\, \boldsymbol{\eta}_s, \tag{22}$$

where $\boldsymbol{\eta}_s \in \mathbb{R}^m$ has a single nonzero entry with value $\eta_s$ and $e^{(\mathbf{A} - \omega_1 \mathbf{I}) t}$ is the matrix exponential. When $t \to \infty$, the stationary intensity $\bar{\boldsymbol{\mu}}_s = (\mathbf{I} - \mathbf{A}/\omega_1)^{-1} \boldsymbol{\eta}_s$ is also nonlinearly related to the network structure. Thus, given two network structures $\mathbf{A}(t)$ and $\mathbf{A}(t')$ at two points in time, which differ by a few edges, the effect of these edges on the information diffusion is not just an additive relation. Depending on how these newly created edges modify the eigen-structure of the sparse matrix $\mathbf{A}(t)$, their effect on the information diffusion dynamics can be very significant.

[Figure 7: Ogata's algorithm vs our simulation algorithm in simulating $U$ interdependent point processes characterized by intensity functions $\lambda_1(t), \ldots, \lambda_U(t)$. Panel (a) illustrates Ogata's algorithm, which first takes a sample from the process with intensity equal to the sum of individual intensities and then assigns it to the proper dimension proportionally to its contribution to the sum of intensities. Panel (b) illustrates our proposed algorithm, which first draws a sample from each dimension independently and then takes the minimum time among them.]

4 Efficient Simulation of Coevolutionary Dynamics

We could simulate samples (link creations, tweets and retweets) from our model by adapting Ogata's thinning algorithm [28], originally designed for multidimensional Hawkes processes. However, a naive implementation of Ogata's algorithm would scale poorly, i.e., for each sample, we would need to re-evaluate $\boldsymbol{\Gamma}^*(t)$ and $\boldsymbol{\Lambda}^*(t)$.
Thus, to draw $n$ sample events, we would need to perform $O(m^2 n^2)$ operations, where $m$ is the number of nodes. Figure 7(a) schematically demonstrates the main steps of Ogata's algorithm. Please refer to Appendix A for further details.

Here, we design a sampling procedure that is especially well-fitted for the structure of our model. The algorithm is based on the following key idea: if we consider each intensity function in $\boldsymbol{\Gamma}^*(t)$ and $\boldsymbol{\Lambda}^*(t)$ as a separate point process and draw a sample from each, the minimum among all these samples is a valid sample for the multidimensional point process.

As the results of this section are general and can be applied to simulate any multi-dimensional point process model, we abuse the notation a little bit and represent $U$ (possibly inter-dependent) point processes by $U$ intensity functions $\lambda^*_1, \ldots, \lambda^*_U$. In the specific case of simulating coevolutionary dynamics we have $U = m^2 + m(m - 1)$, where the first and second terms are the number of information diffusion and link creation processes, respectively. Figure 7 illustrates the way in which both algorithms differ. The new algorithm has the following steps:

1. Initialization: Simulate each dimension separately and find their next sampled event time.
2. Minimization: Take the minimum among all the sampled times and declare it as the next event of the multidimensional process.
3. Update: Recalculate the intensities of the dimensions that are affected by this approved sample and re-sample only their next event. Then go to step 2.

Algorithm 1 Simulation Algorithm for Coevolve
  Initialization:
    Initialize the priority queue $Q$
    for $\forall u, s \in [m]$ do
      Sample next link event $e^l_{us}$ from $A_{us}$ (Algorithm 3)
      $Q$.insert($e^l_{us}$)
      Sample next retweet event $e^r_{us}$ from $N_{us}$ (Algorithm 3)
      $Q$.insert($e^r_{us}$)
    end for
  General Subroutine:
    $t \leftarrow 0$
    while $t < T$ do
      $e \leftarrow Q$.extract_min()
      if $e = (u, s, t')$ is a retweet event then
        Update the history $\mathcal{H}^r_{us}(t') = \mathcal{H}^r_{us}(t) \cup \{e\}$
        for $\forall v$ s.t. $u \to v$ (i.e., $v$ follows $u$) do
          Update event intensity: $\gamma_{vs}(t') = \gamma_{vs}(t'^{-}) + \beta$
          Sample retweet event $e^r_{vs}$ from $\gamma_{vs}$ (Algorithm 3)
          $Q$.update_key($e^r_{vs}$)
          if NOT $s \to v$ then
            Update link intensity: $\lambda^*_{vs}(t') = \lambda^*_{vs}(t'^{-}) + \alpha$
            Sample link event $e^l_{vs}$ from $\lambda_{vs}$ (Algorithm 3)
            $Q$.update_key($e^l_{vs}$)
          end if
        end for
      else
        Update the history $\mathcal{H}^l_{us}(t') = \mathcal{H}^l_{us}(t) \cup \{e\}$
        $\lambda^*_{us}(t) \leftarrow 0 \ \ \forall t > t'$
      end if
      $t \leftarrow t'$
    end while

To prove that the new algorithm generates samples from the same distribution as Ogata's algorithm does, we need the following lemma. It justifies step 2 of the above outline.

Lemma 1 Assume we have $U$ independent non-homogeneous Poisson processes with intensity $\lambda^*_1(\tau), \ldots, \lambda^*_U(\tau)$. Take random variable $\tau_u$ equal to the time of process $u$'s first event after time $t$. Define $\tau_{\min} = \min_{1 \le u \le U} \{\tau_u\}$ and $u_{\min} = \operatorname{argmin}_{1 \le u \le U} \{\tau_u\}$. Then,
(a) $\tau_{\min}$ is the first event after time $t$ of the Poisson process with intensity $\lambda^*_{\mathrm{sum}}(\tau)$. In other words, $\tau_{\min}$ has the same distribution as the next event ($t'$) in Ogata's algorithm.
(b) $u_{\min}$ follows the conditional distribution $P(u_{\min} = u \mid \tau_{\min} = x) = \frac{\lambda^*_u(x)}{\lambda^*_{\mathrm{sum}}(x)}$, i.e., the dimension firing the event comes from the same distribution as the one in Ogata's algorithm.

Footnote 1 (used in the proof below): If a random variable $X$ is exponentially distributed with parameter $r$, then $f_X(x) = r \exp(-r x)$ is its probability density function and $F_X(x) = 1 - \exp(-r x)$ is its cumulative distribution function.

Proof (a) The waiting time of the first event of a dimension $u$ is an exponentially distributed (see footnote 1) random variable
[29]; i.e., $\tau_u - t \sim \mathrm{Exponential}\left(\int_t^{t+\tau_u} \lambda^*_u(\tau)\, d\tau\right)$. We have:

$$
\begin{aligned}
P(\tau_{\min} \le x \mid x > t) &= 1 - P(\tau_{\min} > x \mid x > t) \\
&= 1 - P(\min(\tau_1, \ldots, \tau_U) > x \mid x > t) \\
&= 1 - P(\tau_1 > x, \ldots, \tau_U > x \mid x > t) \\
&= 1 - \prod_{u=1}^{U} P(\tau_u > x \mid x > t) \\
&= 1 - \prod_{u=1}^{U} \exp\left(-\int_t^{t+x} \lambda^*_u(\tau)\, d\tau\right) \\
&= 1 - \exp\left(-\int_t^{t+x} \lambda^*_{sum}(\tau)\, d\tau\right). \qquad (23)
\end{aligned}
$$

Therefore, $\tau_{\min} - t$ is exponentially distributed with parameter $\int_t^{\tau_{\min}} \lambda^*_{sum}(\tau)\, d\tau$, which can be seen as the first event of a non-homogeneous Poisson process with intensity $\lambda^*_{sum}(\tau)$ after time $t$.

(b) To find the distribution of $u_{\min}$ we have

$$
P(u_{\min} = u \mid \tau_{\min} = x) = \lambda^*_u(x) \exp\left(-\int_t^{t+x} \lambda^*_u(\tau)\, d\tau\right) \prod_{v \neq u} \exp\left(-\int_t^{t+x} \lambda^*_v(\tau)\, d\tau\right) = \lambda^*_u(x) \prod_{v} \exp\left(-\int_t^{t+x} \lambda^*_v(\tau)\, d\tau\right). \qquad (24)
$$

After normalization over $u$ we get $P(u_{\min} = u \mid \tau_{\min} = x) = \frac{\lambda^*_u(x)}{\lambda^*_{sum}(x)}$.

Given the above lemma, we can now prove that the distribution of the samples generated by the proposed algorithm is identical to the one generated by Ogata's method.

Theorem 2 The sequences of samples from Ogata's algorithm and our proposed algorithm follow the same distribution.

Proof Using the chain rule, the probability of observing $\mathcal{H}_T = \{(t_1, u_1), \ldots, (t_n, u_n)\}$ is written as:

$$
P\{(t_1, u_1), \ldots, (t_n, u_n)\} = \prod_{i=1}^{n} P\{(t_i, u_i) \mid (t_{i-1}, u_{i-1}), \ldots, (t_1, u_1)\} = \prod_{i=1}^{n} P\{(t_i, u_i) \mid \mathcal{H}_{t_i}\} \qquad (25)
$$

By fixing the history up to some time, say $t_i$, all dimensions of the multivariate Hawkes process become independent of each other (until the next event happens). Therefore, the above lemma can be applied to show that the next sample time from Ogata's algorithm and the proposed one come from the same distribution, i.e., for every $i$, $P\{(t_i, u_i) \mid \mathcal{H}_{t_i}\}$ is the same for both algorithms. Thus, the product of the individual terms is also equal for both, which proves the theorem.

This new algorithm is especially suitable for the structure of our inter-coupled processes. Since social and information networks are typically sparse, every time we sample a new node (or link) event from the model, only a small number of intensity functions in the local neighborhood of the node (or the link) will change. This number is of $O(d)$, where $d$ is the maximum number of followers/followees per node. As a consequence, we can reuse most of the individual samples for the next overall sample. Moreover, we can find which intensity function has the minimum sample time in $O(\log m)$ operations using a heap priority queue. The heap data structure maintains the minimum and finds it in logarithmic time with respect to the number of elements therein. Therefore, we have reduced an $O(nm)$ factor in the original algorithm to $O(d \log m)$.

Algorithm 2 Efficient Intensity Computation
  Global Variables:
    Last time of intensity computation: $t$
    Last value of intensity computation: $I$
  Initialization:
    $t \leftarrow 0$
    $I \leftarrow \mu$
  function get_intensity($t'$)
    $I' \leftarrow (I - \mu) \exp(-\omega (t' - t)) + \mu$
    $t \leftarrow t'$
    $I \leftarrow I'$
    return $I$
  end function

Algorithm 3 1-D next event sampling
  Input: Current time: $t$
  Output: Next event time: $s$
  $s \leftarrow t$
  $\hat{\lambda} \leftarrow \lambda^*(s)$ (Algorithm 2)
  while $s < T$ do
    $g \sim \mathrm{Exponential}(\hat{\lambda})$
    $s \leftarrow s + g$
    $\bar{\lambda} \leftarrow \lambda^*(s)$ (Algorithm 2)
    Rejection test:
    $d \sim \mathrm{Uniform}(0, 1)$
    if $d \times \hat{\lambda} < \bar{\lambda}$ then
      return $s$
    else
      $\hat{\lambda} \leftarrow \bar{\lambda}$
    end if
  end while
  return $s$

Finally, we exploit the properties of the exponential function to update individual intensities for each new sample in $O(1)$. For simplicity, consider a Hawkes process with intensity $\lambda^*(t) = \mu + \sum_{t_i \in \mathcal{H}_t} \alpha \omega \exp(-\omega (t - t_i))$. Note that both link creation and information diffusion processes have this structure. Now, let $t_i < t_{i+1}$ be two arbitrary times with no event occurring between them; we have

$$
\lambda^*(t_{i+1}) = (\lambda^*(t_i) - \mu) \exp(-\omega (t_{i+1} - t_i)) + \mu.
$$
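As an illustration of the recursion above, the following minimal sketch compares the constant-time update with a brute-force sum over all past events. It is not the authors' implementation: the class and function names, and the generic `jump` parameter (the amount added to the intensity by each event), are assumptions introduced only for this example.

```python
import math


def intensity_direct(t, events, mu, jump, omega):
    """Brute-force intensity: baseline plus the decayed jump of every event before t."""
    return mu + sum(jump * math.exp(-omega * (t - ti)) for ti in events if ti < t)


class RecursiveIntensity:
    """Constant-time intensity tracking for an exponential kernel (hypothetical helper)."""

    def __init__(self, mu, jump, omega):
        self.mu, self.jump, self.omega = mu, jump, omega
        self.t = 0.0       # last time the intensity was computed
        self.value = mu    # last computed intensity

    def advance(self, t_new):
        """Decay the excess over the baseline from self.t to t_new (no event in between)."""
        decay = math.exp(-self.omega * (t_new - self.t))
        self.value = (self.value - self.mu) * decay + self.mu
        self.t = t_new
        return self.value

    def add_event(self, t_event):
        """Advance to the event time, then add the jump contributed by the new event."""
        self.advance(t_event)
        self.value += self.jump
```

A typical use interleaves `add_event` calls for accepted events with `advance` calls for query times; each call costs the same regardless of how many events came before.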
The recursion in Equation (26) can be readily generalized to the multivariate case too. Therefore, we can compute the current intensity without explicitly iterating over all previous events. As a result, we can change an $O(n)$ factor in the original algorithm to $O(1)$. Furthermore, the exponential kernel also facilitates finding the upper bound of the intensity, since it always lies at the beginning of one of the processes taken into consideration. Algorithm 2 summarizes the procedure to compute intensities with exponential kernels, and Algorithm 3 shows the procedure to sample the next event in each dimension making use of the special property of exponential kernel functions.

The simulation algorithm is shown in Algorithm 1. By using this algorithm we reduce the complexity from $O(n^2 m^2)$ to $O(n d \log m)$, where $d$ is the maximum number of followees per node. That means our algorithm scales logarithmically with the number of nodes and linearly with the number of edges at any point in time during the simulation. Moreover, events for new links, tweets and retweets are generated in a temporally intertwined and interleaving fashion, since every new retweet event will modify the intensity for link creation and vice versa.

5 Efficient Parameter Estimation from Coevolutionary Events

In this section, we first show that learning the parameters of our proposed model reduces to solving a convex optimization problem and then develop an efficient, parameter-free Minorization-Maximization algorithm to solve such a problem.

5.1 Concave Parameter Learning Problem

Given a collection of retweet events $\mathcal{E} = \{e^r_i\}$ and link creation events $\mathcal{A} = \{e^l_i\}$ recorded within a time window $[0, T)$, we can easily estimate the parameters needed in our model using maximum likelihood estimation. To this aim, we compute the joint log-likelihood $\mathcal{L}$ of these events using Equation (7), i.e.
$$
\mathcal{L}(\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}) = \underbrace{\sum_{e^r_i \in \mathcal{E}} \log\left(\gamma^*_{u_i s_i}(t_i)\right) - \sum_{u,s \in [m]} \int_0^T \gamma^*_{us}(\tau)\, d\tau}_{\text{tweet / retweet}} + \underbrace{\sum_{e^l_i \in \mathcal{A}} \log\left(\lambda^*_{u_i s_i}(t_i)\right) - \sum_{u,s \in [m]} \int_0^T \lambda^*_{us}(\tau)\, d\tau}_{\text{links}}. \qquad (27)
$$

For the terms corresponding to retweets, the log term sums only over the actual observed events, while the integral term sums over all possible combinations of destination and source pairs, even if there is no event between a particular pair of destination and source. For such pairs with no observed events, the corresponding counting processes have essentially survived the observation window $[0, T)$, and the term $-\int_0^T \gamma^*_{us}(\tau)\, d\tau$ simply corresponds to the log survival probability. The terms corresponding to links have a similar structure.

Once we have an expression for the joint log-likelihood of the retweet and link creation events, the parameter learning problem can then be formulated as follows:

$$
\begin{aligned}
\underset{\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}}{\text{minimize}} \quad & -\mathcal{L}(\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}) \\
\text{subject to} \quad & \mu_u \ge 0, \; \alpha_u \ge 0, \; \eta_u \ge 0, \; \beta_s \ge 0 \quad \forall u, s \in [m]. \qquad (28)
\end{aligned}
$$

Theorem 3 The optimization problem defined by Equation (28) is jointly convex.

Proof We expand the likelihood by substituting the intensity functions into Equation (27):

$$
\begin{aligned}
\mathcal{L} = {} & \sum_{e^r_i \in \mathcal{E}} \log\left( \mathbb{I}[u_i = s_i]\, \eta_{u_i} + \mathbb{I}[u_i \neq s_i]\, \beta_{s_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_1}(t) \star (A_{u_i v}(t)\, dN_{v s_i}(t)) \right)\Big|_{t = t_i} \right) \\
& - \sum_{u,s \in [m]} \left( \mathbb{I}[u = s]\, \eta_u \int_0^T dt + \mathbb{I}[u \neq s]\, \beta_s \sum_{v \in \mathcal{F}_u(t)} \int_0^T \kappa_{\omega_1}(t) \star (A_{uv}(t)\, dN_{vs}(t))\, dt \right) \\
& + \sum_{e^l_i \in \mathcal{A}} \log\left( \mu_{u_i} + \alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\Big|_{t = t_i} \right) \\
& - \sum_{u,s \in [m]} \left( \mu_u \int_0^T (1 - A_{us}(t))\, dt + \alpha_u \int_0^T (1 - A_{us}(t)) \left( \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_2}(t) \star dN_{vs}(t) \right) dt \right) \qquad (29)
\end{aligned}
$$

If we stack all parameters in a vector $x = (\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\})$, one can easily notice that the log-likelihood $\mathcal{L}$ can be written as $\sum_j \log(a_j^\top x) - \sum_k b_k^\top x$, which is clearly a concave function with respect to $x$ [30], and thus $-\mathcal{L}$ is convex. Moreover, the constraints are linear inequalities and thus the domain is a convex set. This completes the proof of convexity of the optimization problem.

Algorithm 4 MM-type parameter learning for Coevolve
  Input: Set of retweet events $\mathcal{E} = \{e^r_i\}$ and link creation events $\mathcal{A} = \{e^l_i\}$ observed in time window $[0, T)$
  Output: Learned parameters $\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}$
  Initialization:
  for $u \leftarrow 1$ to $m$ do
    Initialize $\mu_u$ and $\alpha_u$ randomly
  end for
  for $u \leftarrow 1$ to $m$ do
    $\eta_u = \frac{\sum_{e^r_i \in \mathcal{E}} \mathbb{I}[u = u_i = s_i]}{T}$
  end for
  for $s \leftarrow 1$ to $m$ do
    $\beta_s = \frac{\sum_{e^r_i \in \mathcal{E}} \mathbb{I}[s = s_i \neq u_i]}{\sum_{u \in [m]} \mathbb{I}[u \neq s] \sum_{v \in \mathcal{F}_u(t)} \int_0^T \kappa_{\omega_1}(t) \star (A_{uv}(t)\, dN_{vs}(t))\, dt}$
  end for
  while not converged do
    for $i \leftarrow 1$ to $n_l$ do
      $\nu_{i1} = \frac{\mu_{u_i}}{\mu_{u_i} + \alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} (\kappa_{\omega_2}(t) \star dN_{vs}(t))|_{t = t_i}}$
      $\nu_{i2} = \frac{\alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} (\kappa_{\omega_2}(t) \star dN_{vs}(t))|_{t = t_i}}{\mu_{u_i} + \alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} (\kappa_{\omega_2}(t) \star dN_{vs}(t))|_{t = t_i}}$
    end for
    for $u \leftarrow 1$ to $m$ do
      $\mu_u = \frac{\sum_{e^l_i \in \mathcal{A}} \mathbb{I}[u = u_i]\, \nu_{i1}}{\sum_{s \in [m]} \int_0^T (1 - A_{us}(t))\, dt}$
      $\alpha_u = \frac{\sum_{e^l_i \in \mathcal{A}} \mathbb{I}[u = u_i]\, \nu_{i2}}{\sum_{s \in [m]} \int_0^T (1 - A_{us}(t)) (\kappa_{\omega_2}(t) \star dN_{us}(t))\, dt}$
    end for
  end while

Notably, the optimization problem decomposes into $m$ independent problems, one per node $u$, and can be readily parallelized.

5.2 Efficient Minorization-Maximization Algorithm

Since the optimization problem is jointly convex with respect to all the parameters, one can simply take any convex optimization method to learn the parameters. However, these methods usually require hyperparameters like step size or initialization, which may significantly influence the convergence.
Instead, the structure of our problem allows us to develop an efficient algorithm inspired by previous work [16, 17], which leverages Minorization-Maximization (MM) [31] and is parameter free and insensitive to initialization.

Our algorithm utilizes Jensen's inequality to provide a lower bound for the second log-sum term in the log-likelihood given by Equation (27). More specifically, consider a set of arbitrary auxiliary variables $\nu_{ij}$, where $1 \le i \le n_l$, $j = 1, 2$ and $n_l$ is the number of link events, i.e., $n_l = |\mathcal{A}|$. Further, assume these variables satisfy

$$
\forall\, 1 \le i \le n_l: \quad \nu_{i1}, \nu_{i2} \ge 0, \quad \nu_{i1} + \nu_{i2} = 1 \qquad (30)
$$

Then, we can lower bound the logarithm in Equation (29) using Jensen's inequality as follows:

$$
\begin{aligned}
\log\left( \mu_{u_i} + \alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\Big|_{t = t_i} \right)
&= \log\left( \nu_{i1} \frac{\mu_{u_i}}{\nu_{i1}} + \nu_{i2} \frac{\alpha_{u_i}}{\nu_{i2}} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\Big|_{t = t_i} \right) \\
&\ge \nu_{i1} \log\left( \frac{\mu_{u_i}}{\nu_{i1}} \right) + \nu_{i2} \log\left( \frac{\alpha_{u_i}}{\nu_{i2}} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\Big|_{t = t_i} \right) \\
&= \nu_{i1} \log(\mu_{u_i}) + \nu_{i2} \log(\alpha_{u_i}) + \nu_{i2} \log\left( \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\Big|_{t = t_i} \right) \\
&\quad - \nu_{i1} \log(\nu_{i1}) - \nu_{i2} \log(\nu_{i2}). \qquad (31)
\end{aligned}
$$

Now, we can lower bound the log-likelihood given by Equation (29) as:

$$
\begin{aligned}
\mathcal{L} \ge \mathcal{L}' = {} & \sum_{e^r_i \in \mathcal{E}} \mathbb{I}[u_i = s_i] \log(\eta_{u_i}) + \sum_{e^r_i \in \mathcal{E}} \mathbb{I}[u_i \neq s_i] \log(\beta_{s_i}) \\
& + \sum_{e^r_i \in \mathcal{E}} \mathbb{I}[u_i \neq s_i] \log\left( \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_1}(t) \star (A_{u_i v}(t)\, dN_{v s_i}(t)) \right)\Big|_{t = t_i} \right) \\
& - \sum_{u,s \in [m]} \left( \mathbb{I}[u = s]\, \eta_u T + \mathbb{I}[u \neq s]\, \beta_s \sum_{v \in \mathcal{F}_u(t)} \int_0^T \kappa_{\omega_1}(t) \star (A_{uv}(t)\, dN_{vs}(t))\, dt \right) \\
& + \sum_{e^l_i \in \mathcal{A}} \left( \nu_{i1} \log(\mu_{u_i}) + \nu_{i2} \log(\alpha_{u_i}) + \nu_{i2} \log\left( \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\Big|_{t = t_i} \right) \right) \\
& - \sum_{e^l_i \in \mathcal{A}} \left( \nu_{i1} \log(\nu_{i1}) + \nu_{i2} \log(\nu_{i2}) \right) \\
& - \sum_{u,s \in [m]} \left( \mu_u \int_0^T (1 - A_{us}(t))\, dt + \alpha_u \int_0^T (1 - A_{us}(t)) (\kappa_{\omega_2}(t) \star dN_{us}(t))\, dt \right) \qquad (32)
\end{aligned}
$$

By taking the gradient of the lower bound with respect to the parameters, we can find the closed-form updates that optimize the lower bound:

$$
\eta_u = \frac{\sum_{e^r_i \in \mathcal{E}} \mathbb{I}[u = u_i = s_i]}{T} \qquad (33)
$$

$$
\beta_s = \frac{\sum_{e^r_i \in \mathcal{E}} \mathbb{I}[s = s_i \neq u_i]}{\sum_{u \in [m]} \mathbb{I}[u \neq s] \sum_{v \in \mathcal{F}_u(t)} \int_0^T \kappa_{\omega_1}(t) \star (A_{uv}(t)\, dN_{vs}(t))\, dt} \qquad (34)
$$

$$
\mu_u = \frac{\sum_{e^l_i \in \mathcal{A}} \mathbb{I}[u = u_i]\, \nu_{i1}}{\sum_{s \in [m]} \int_0^T (1 - A_{us}(t))\, dt} \qquad (35)
$$

$$
\alpha_u = \frac{\sum_{e^l_i \in \mathcal{A}} \mathbb{I}[u = u_i]\, \nu_{i2}}{\sum_{s \in [m]} \int_0^T (1 - A_{us}(t)) (\kappa_{\omega_2}(t) \star dN_{us}(t))\, dt}. \qquad (36)
$$

Finally, although the lower bound is valid for every choice of $\nu_{ij}$ satisfying Equation (30), by maximizing the lower bound with respect to the auxiliary variables we can make sure that the lower bound is tight:

$$
\begin{aligned}
\underset{\{\nu_{ij}\}}{\text{maximize}} \quad & \mathcal{L}'(\{\mu_u\}, \{\alpha_u\}, \{\eta_u\}, \{\beta_s\}, \{\nu_{ij}\}) \\
\text{subject to} \quad & \nu_{i1} + \nu_{i2} = 1 \quad \forall i: 1 \le i \le n_l \\
& \nu_{i1}, \nu_{i2} \ge 0 \quad \forall i: 1 \le i \le n_l. \qquad (37)
\end{aligned}
$$

Fortunately, the above constrained optimization problem can be solved easily via Lagrange multipliers, which leads to closed-form updates:

$$
\nu_{i1} = \frac{\mu_{u_i}}{\mu_{u_i} + \alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\big|_{t = t_i}} \qquad (38)
$$

$$
\nu_{i2} = \frac{\alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\big|_{t = t_i}}{\mu_{u_i} + \alpha_{u_i} \sum_{v \in \mathcal{F}_{u_i}(t_i)} \left( \kappa_{\omega_2}(t) \star dN_{vs}(t) \right)\big|_{t = t_i}}. \qquad (39)
$$

Algorithm 4 summarizes the learning procedure. It is guaranteed to converge to a global optimum [31, 16].

6 Properties of Simulated Co-evolution, Networks and Cascades

In this section, we perform an empirical investigation of the properties of the networks and information cascades generated by our model. In particular, we show that our model can generate co-evolutionary retweet and link dynamics and a wide spectrum of static and temporal network patterns and information cascades.
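Before turning to the simulations, the link-related updates of Section 5.2 (Equations (35), (36), (38) and (39)) can be made concrete for a single node. The sketch below is only an illustration under simplifying assumptions, not the authors' code: the excitation terms evaluated at the link events are assumed to be precomputed in a list `x`, and the two denominators of Equations (35) and (36) are assumed to be precomputed scalars, here called `B` and `C`. All of these names are hypothetical.

```python
import math


def mm_fit(x, B, C, mu=1.0, alpha=1.0, iters=500):
    """MM iterations for maximizing sum_i log(mu + alpha * x_i) - mu * B - alpha * C.

    x: excitation term evaluated at each link event of the node (assumed precomputed).
    B, C: integral terms multiplying mu and alpha (assumed precomputed, positive).
    Returns the final mu, alpha and the objective value after every iteration.
    """
    history = []
    for _ in range(iters):
        # Tighten the bound: auxiliary variables as in Equations (38) and (39).
        nu1 = [mu / (mu + alpha * xi) for xi in x]
        nu2 = [1.0 - a for a in nu1]
        # Maximize the bound in closed form, as in Equations (35) and (36).
        mu = sum(nu1) / B
        alpha = sum(nu2) / C
        objective = sum(math.log(mu + alpha * xi) for xi in x) - mu * B - alpha * C
        history.append(objective)
    return mu, alpha, history
```

No step size appears anywhere in the loop, which is the practical appeal of the MM scheme; the recorded objective values should never decrease from one iteration to the next.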
6.1 Simulation Settings

Throughout this section, if not said otherwise, we simulate the evolution of an 8,000-node network as well as the propagation of information over the network by sampling from our model using Algorithm 1. We set the exogenous intensities of the link and diffusion events to $\mu_u = \mu = 4 \times 10^{-6}$ and $\eta_u = \eta = 1.5$ respectively, and the triggering kernel parameter to $\omega_1 = \omega_2 = 1$. The parameter $\mu$ determines the independent growth of the network; roughly speaking, the expected number of links each user establishes spontaneously before time $T$ is $\mu T$. Whenever we investigate a static property, we choose the same sparsity level of 0.001.

6.2 Retweet and Link Coevolution

Figures 8(a,b) visualize the retweet and link events, aggregated across different sources, and the corresponding intensities for one node and one realization, picked at random. Here, it is already apparent that retweets and link creations are clustered in time and often follow each other. Further, Figure 8(c) shows the cross-covariance of the retweet and link creation intensities, computed across multiple realizations, for the same node, i.e., if $f(t)$ and $g(t)$ are two intensities, the cross-covariance is a function $h(\tau) = \int f(t + \tau)\, g(t)\, dt$. It can be seen that the cross-covariance has its peak around 0, i.e., retweets and link creations are highly correlated and co-evolve over time. For ease of exposition, we illustrated co-evolution using one node; however, we found consistent results across nodes.

6.3 Degree Distribution

Empirical studies have shown that the degree distribution of online social networks and microblogging sites follows a power law [10, 1], and argued that it is a consequence of the rich-get-richer phenomenon. The degree distribution of a network is a power law if the expected number of nodes $m_d$ with degree $d$ is given by $m_d \propto d^{-\gamma}$, where $\gamma > 0$.
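To make the relation $m_d \propto d^{-\gamma}$ concrete, the following sketch estimates $\gamma$ from a table of degree counts by fitting a straight line in log-log space. This is a deliberately simple illustration and an assumption of this example, not the fitting procedure behind the power-law fits shown in the figures; the function name and the input format are hypothetical, and least squares on a log-log histogram is known to be a crude estimator on noisy data.

```python
import math


def fit_power_law_exponent(degree_counts):
    """Estimate gamma in m_d ~ d^(-gamma) from a mapping {degree d: number of nodes m_d}.

    Fits log(m_d) = c - gamma * log(d) by ordinary least squares and returns gamma.
    Entries with non-positive degree or count are skipped because of the logarithm.
    """
    points = [(math.log(d), math.log(m)) for d, m in degree_counts.items() if d > 0 and m > 0]
    n = len(points)
    mean_x = sum(px for px, _ in points) / n
    mean_y = sum(py for _, py in points) / n
    sxy = sum((px - mean_x) * (py - mean_y) for px, py in points)
    sxx = sum((px - mean_x) ** 2 for px, _ in points)
    return -sxy / sxx
```

On counts that follow an exact power law the slope recovers the exponent; on simulated or real degree sequences, the tail is usually noisy and a maximum likelihood estimator is generally preferable.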
Intuitively, the higher the values of the parameters $\alpha$ and $\beta$, the more closely the resulting degree distribution follows a power law. This is because the network grows more locally. Interestingly, the lower their values, the closer the distribution is to that of an Erdos-Renyi random graph [32], because the edges are added almost uniformly and independently without influence from the local structure. Figure 9 confirms this intuition by showing the degree distribution for different values of $\beta$ and $\alpha$.

Figure 8: Coevolutionary dynamics for synthetic data. a) Spike trains of link and retweet events. b) Link and retweet intensities. c) Cross covariance of link and retweet intensities. (Axes: event occurrence time in (a) and (b); lag against cross covariance in (c).)

Figure 9: Degree distributions when network sparsity level reaches 0.001 for different $\beta$ ($\alpha$) values and fixed $\alpha = 0.1$ ($\beta = 0.1$). Top row: (a) $\beta = 0$, (b) $\beta = 0.001$, (c) $\beta = 0.1$, (d) $\beta = 0.8$. Bottom row: (a) $\alpha = 0$, (b) $\alpha = 0.05$, (c) $\alpha = 0.1$, (d) $\alpha = 0.2$. Each panel shows the data together with a power-law fit and a Poisson fit on logarithmic axes.

6.4 Small (shrinking) Diameter

There is empirical evidence that online social networks and microblogging sites exhibit a relatively small diameter, which shrinks (or flattens) as the network grows [33, 10, 24]. Figures 10(a-b) show the diameter of the largest connected component (LCC) against the sparsity of the network over time for different values of $\alpha$ and $\beta$.
Although at the beginning there is a short increase in the diameter due to the merging of small connected components, the diameter decreases as the network evolves. Moreover, larger values of $\alpha$ or $\beta$ lead to higher levels of local growth in the network and, as a consequence, slower shrinkage. Here, nodes arrive to the network when they follow (or are followed by) a node in the largest connected component.

6.5 Clustering Coefficient

Triadic closure [34, 11, 35] has been often presented as a plausible link creation mechanism. However, different social networks and microblogging sites present different levels of triadic closure [36]. Importantly, our method is able to generate networks with different levels of triadic closure, as shown by Figure 10(c-d), where we plot the clustering coefficient [37], which is proportional to the frequency of triadic closure, for different values of $\alpha$ and $\beta$.

Figure 10: Diameter and clustering coefficient for network sparsity 0.001. Panels (a) and (b) show the diameter against sparsity over time for fixed $\alpha = 0.1$, and for fixed $\beta = 0.1$ respectively. Panels (c) and (d) show the clustering coefficient (CC) against $\beta$ and $\alpha$, respectively. (Panel titles: (a) Diameter, $\alpha = 0.1$; (b) Diameter, $\beta = 0.1$; (c) CC, $\alpha = 0.1$; (d) CC, $\beta = 0.1$.)

6.6 Network Visualization

Figure 11 visualizes several snapshots of the largest connected component (LCC) of two 300-node networks for two particular realizations of our model, under two different values of $\beta$. In both cases, we used $\mu = 2 \times 10^{-4}$, $\alpha = 1$, and $\eta = 1.5$. The top two rows correspond to $\beta = 0$ and represent one end of the spectrum, i.e., an Erdos-Renyi random network. Here, the network evolves uniformly.
The bottom two rows correspond to $\beta = 0.8$ and represent the other end, i.e., scale-free networks. Here, the network evolves locally, and clusters emerge naturally as a consequence of the local growth. They are depicted using a combination of force-directed and Fruchterman-Reingold layouts with Gephi². Moreover, the figure also shows the retweet events (from others as source) for two nodes, A and B, on the bottom row. These two nodes arrive almost at the same time and establish links to two other nodes. However, node A's followees are more central; therefore, A is exposed to more retweets. Thus, node A performs more retweets than B does. This again shows how information diffusion is affected by network structure. Overall, this figure clearly illustrates that by careful choice of parameters we can generate networks with very different structures.

Figure 12 illustrates the spike trains (tweet, retweet, and link events) for the first 140 nodes of a network simulated with a similar set of parameters as above, and Figure 13 shows three snapshots of the network at different times. First, consider node 6 in the network. After she joins the network, a few nodes begin to follow her. Then, when she starts to tweet, her tweets are retweeted many times by others (red spikes in the figure) and these retweets subsequently boost the number of nodes that link to her (magenta spikes). This clearly illustrates the scenario in which information diffusion triggers changes in the network structure. Second, consider nodes 46 and 68 and compare their associated events over time. After some time, node 46 becomes much more active than node 68. To understand why, note that soon after time 137, node 46 followed node 130, which is a very central node (i.e., following a lot of people), while node 68 did not.
This clearly illustrates the scenario in which network evolution triggers changes in the dynamics of information diffusion.

6.7 Cascade Patterns

Our model can produce the most commonly occurring cascade structures as well as heavy-tailed cascade size and depth distributions, as observed in historical Twitter data reported in [25]. Figure 14 summarizes the results, which provide empirical evidence that the higher the $\alpha$ ($\beta$) value, the shallower and wider the cascades.

² http://gephi.github.io/

Figure 11: Evolution of two networks: one with $\beta = 0$ (1st and 2nd rows) and another one with $\beta = 0.8$ (3rd and 4th rows), and spike trains of nodes A and B (5th row). (Snapshots are shown at t = 5, 20, 35, 50, 65 and 80.)

Figure 12: Coevolutionary dynamics of events for the network shown in Figure 13. Information Diffusion → Network Evolution: When node 6 joins the network, a few nodes follow her and retweet her posts. Her tweets being propagated (shown in red) turn her into a valuable source of information. Therefore, those retweets are followed by links created to her (shown in magenta). Network Evolution → Information Diffusion: Nodes 46 and 68 both have almost the same number of followees. However, as soon as node 46 connects to node 130 (which is a central node and retweets very much), her activity dramatically increases compared to node 68. (Axes: time against node; legend: tweet, retweet, link, retweet from source 6, link to source 6, followee of node 46, followee of node 68.)

Figure 13: Network structure in which events from Figure 12 take place, at different times (t = 125, t = 137, t = 150).
7 Experiments on Model Estimation and Prediction on Synthetic Data

In this section, we first show that our model estimation method can accurately recover the true model parameters from historical link and diffusion events data and then demonstrate that our model can accurately predict the network evolution and information diffusion over time, significantly outperforming two state-of-the-art methods [4, 3, 5] at predicting new links, and a baseline Hawkes process that does not consider network evolution at predicting new events.

7.1 Experimental Setup

Throughout this section, we experiment with our model considering $m = 400$ nodes. We set the model parameters for each node in the network by drawing samples from $\mu \sim U(0, 0.0004)$, $\alpha \sim U(0, 0.1)$, $\eta \sim U(0, 1.5)$ and $\beta \sim U(0, 0.1)$. We then sample up to 60,000 link and information diffusion events from our model using Algorithm 1 and average over 8 different simulation runs.

7.2 Model Estimation

We evaluate the accuracy of our model estimation procedure via three measures: (i) the relative mean absolute error (i.e., $\mathbb{E}[|x - \hat{x}| / x]$, MAE) between the true parameters ($x$) and the estimated parameters ($\hat{x}$), (ii) Kendall's rank correlation coefficient between each estimated parameter and its true value, and (iii) test log-likelihood. Figure 15 shows that as we feed more events into the estimation procedure, the estimation becomes more accurate.

7.3 Link Prediction

We use our model to predict the identity of the source for each test link event, given the historical events before the time of the prediction, and compare its performance with two state-of-the-art methods, which we denote as TRF [3] and WENG [4]. TRF measures the probability of creating a link from a source at a given time by simply computing the proportion of new links created from the source over all total created links up to the given time.
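A literal reading of this one-sentence description of TRF can be sketched as follows. This is an assumption-laden illustration rather than the implementation used in [3]: the event format, a list of (time, source) pairs, and the function name are hypothetical, and the original method may include details that the description above does not mention.

```python
from collections import Counter


def trf_scores(link_events, t):
    """Score each source by its share of all links created up to time t.

    link_events: iterable of (time, source) pairs, one per observed link creation.
    Returns a dict mapping each source seen so far to its proportion of links;
    the dict is empty when no link has been created by time t.
    """
    counts = Counter(source for time, source in link_events if time <= t)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {source: count / total for source, count in counts.items()}
```

Sources that have never received a link before the prediction time get no score at all under this reading, which is one reason such a baseline can struggle with newly popular sources.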
WENG considers several link creation strategies and makes a prediction by combining these strategies. Here, we evaluate the performance by computing the probability of all potential links using our model, TRF and WENG and then compute (i) the average rank of all true (test) events (AvgRank) and (ii) the success probability that the true (test) events rank among the top-1 potential events at each test time (Top-1). Figure 16 summarizes the results, where we trained our model with an increasing number of events. Our model outperforms both TRF and WENG by a significant margin.

Figure 14: Distribution of cascade structure, size and depth for different $\alpha$ ($\beta$) values and fixed $\beta = 0.2$ ($\alpha = 0.8$). (Panels (a)-(c): legend values 0, 0.1 and 0.8; panels (d)-(f): legend values 0.05, 0.1 and 0.2. Axes: cascade size or cascade depth against percentage.)

Figure 15: Performance of model estimation for a 400-node synthetic network. (a) Relative MAE, (b) Rank correlation, (c) Test log-likelihood. (Each panel is plotted against the number of events; panels (a) and (b) show one curve per parameter $\eta$, $\mu$, $\alpha$, $\beta$.)

7.4 Activity Prediction

We use our model to predict the identity of the node that generates each test diffusion event, given the historical events before the time of the prediction, and compare its performance with a baseline consisting of a Hawkes process without network evolution. For the Hawkes baseline, we take a snapshot of the network right before the prediction time, and use all historical retweeting events to fit the model.
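The two ranking measures introduced in Section 7.3, AvgRank and Top-1, can be computed as in the sketch below. The input format (one dictionary of candidate scores per test event, together with the true candidate) and the function names are assumptions made for this illustration; ties are resolved in favor of the true candidate, which is one possible convention among several.

```python
def rank_of_true(scores, true_item):
    """1-based rank of true_item when candidates are sorted by decreasing score.

    Candidates with exactly the same score as true_item do not push it down.
    """
    true_score = scores[true_item]
    return 1 + sum(1 for value in scores.values() if value > true_score)


def avg_rank_and_top1(predictions):
    """Compute (AvgRank, Top-1) over a list of (scores, true_item) pairs.

    AvgRank is the mean rank of the true candidate; Top-1 is the fraction of
    test events for which the true candidate is ranked first.
    """
    ranks = [rank_of_true(scores, true_item) for scores, true_item in predictions]
    avg_rank = sum(ranks) / len(ranks)
    top1 = sum(1 for r in ranks if r == 1) / len(ranks)
    return avg_rank, top1
```

Lower AvgRank and higher Top-1 indicate better predictions, and the same two functions apply whether the candidates are potential sources (link prediction) or potential nodes (activity prediction).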
Here, we evaluate the performance via the same two measures as in the link prediction task and summarize the results in Figure 16 against an increasing number of training events. The results show that, by modeling the network evolution, our model performs significantly better than the baseline.

8 Experiments on Coevolution and Prediction on Real Data

In this section, we validate our model using a large Twitter dataset containing nearly 550,000 tweet, retweet and link events from more than 280,000 users [3]. We will show that our model can capture the co-evolutionary dynamics and, by doing so, it predicts retweet and link creation events more accurately than several alternatives.

Figure 16: Prediction performance for a 400-node synthetic network by means of average rank (AR) and success probability that the true (test) events rank among the top-1 events (Top-1). (a) Links: AR, (b) Links: Top-1, (c) Activity: AR, (d) Activity: Top-1. (Panels (a) and (b) compare COEVOLVE, TRF and WENG; panels (c) and (d) compare COEVOLVE and HAWKES; all are plotted against the number of events.)

8.1 Dataset Description & Experimental Setup

We use a dataset that contains both link events as well as tweets/retweets from millions of Twitter users [3]. In particular, the dataset contains data from three sets of users in 20 days; nearly 8 million tweet, retweet, and link events by more than 6.5 million users. The first set of users (8,779 users) are source nodes $s$, for whom all their tweet times were collected. The second set of users (77,200 users) are the followers of the first set of users, for whom all their retweet times (and source identities) were collected. The third set of users (6,546,650 users) are the users that start following at least one user in the first set during the recording period, for whom all the link times were collected.
In our experiments, we focus on all events (and users) during a 10-day period (Sep 21 2012 to Sep 30 2012) and use the information before Sep 21 to construct the initial social network (original links between users). We model the co-evolution in the second 10-day period using our framework. More specifically, in the coevolution modeling, we have 5,567 users in the first layer who post 221,201 tweets. In the second layer, 101,465 retweets are generated by the whole 77,200 users in that interval. And in the third layer we have 198,518 users who create 219,134 links to 1,978 users (out of 5,567) in the first layer.

We split events into a training set (covering 85% of the retweet and link events) and a test set (covering the remaining 15%) according to time, i.e., all events in the training set occur earlier than those in the test set. We then use our model estimation procedure to fit the parameters from an increasing proportion of events from the training data.

8.2 Retweet and Link Coevolution

Figure 17 visualizes the retweet and link events, aggregated across different targets, and the corresponding intensities given by our trained model for four source nodes, picked at random. Here, it is already apparent that retweets (of a source's posts) and link creations (to that source) are clustered in time and often follow each other, and our fitted model intensities successfully track such behavior. Further, Figure 18 compares the cross-covariance between the empirical retweet and link creation intensities and between the retweet and link creation intensities given by our trained model, computed across multiple realizations, for the same nodes. For all nodes, the similarity between both cross-covariances is striking and both have their peak around 0, i.e., retweets and link creations are highly correlated and co-evolve over time.
For ease of exposition, as in Section 6, we illustrated co-evolution using four nodes; however, we found consistent results across nodes.

To further verify that our model can capture the coevolution, we compute the average value of the empirical cross covariance function, denoted by $m_{cc}$, per user. Intuitively, one could expect that our model estimation method should assign higher $\alpha$ and/or $\beta$ values to users with high $m_{cc}$. Figure 19 confirms this intuition on 1,000 users, picked at random. Whenever a user has a high $\alpha$ and/or $\beta$ value, she exhibits a high cross covariance between her created links and retweets.

Figure 17: Link and retweet behavior of 4 typical users in the real-world dataset. Panels (a,c,e,g) show the spike trains of link and retweet events and Panels (b,d,f,h) show the estimated link and retweet intensities.

Figure 18: Empirical and simulated cross covariance of link and retweet intensities for 4 typical users.
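The cross-covariance used in Figures 8(c) and 18, defined in Section 6.2 as $h(\tau) = \int f(t + \tau)\, g(t)\, dt$, can be approximated on a uniform time grid as sketched below. The discretization, the function name and the choice to drop terms that fall outside the observation window are assumptions of this example; following the definition in the text, the signals are not mean-centered.

```python
def cross_covariance(f, g, max_lag, dt=1.0):
    """Approximate h(k * dt) = integral of f(t + k * dt) * g(t) dt for integer lags k.

    f, g: equally long sequences of intensity values sampled every dt time units.
    Returns a dict mapping each lag k in [-max_lag, max_lag] to the Riemann-sum value.
    Terms whose shifted index falls outside the sampled window are dropped.
    """
    n = len(f)
    result = {}
    for k in range(-max_lag, max_lag + 1):
        total = 0.0
        for t in range(n):
            if 0 <= t + k < n:
                total += f[t + k] * g[t]
        result[k] = dt * total
    return result
```

A peak at lag 0 indicates that the two intensities rise and fall together, while a peak at a positive lag would indicate that `f` tends to follow `g` with a delay.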
8.3 Link prediction

We use our model to predict the identity of the source for each test link event, given the historical (link and retweet) events before the time of the prediction, and compare its performance with the same two state-of-the-art methods as in the synthetic experiments, TRF [3] and WENG [4]. We evaluate the performance by computing the probability of all potential links using different methods, and then compute (i) the average rank of all true (test) events (AvgRank) and (ii) the success probability (SP) that the true (test) events rank among the top-1 potential events at each test time (Top-1). We summarize the results in Figure 20(a-b), where we consider an increasing number of training retweet/tweet events. Our model outperforms TRF and WENG consistently. For example, for $8 \cdot 10^4$ training events, our model achieves an SP 2.5 times larger than TRF and WENG.

Figure 19: Empirical cross covariance and learned model parameters for 1,000 users, picked at random. (Axes: index of users against parameters; curves: $m_{cc}$, $\alpha$, $\beta$.)

Figure 20: Prediction performance in the Twitter dataset by means of average rank (AR) and success probability that the true (test) events rank among the top-1 events (Top-1). (a) Links: AR, (b) Links: Top-1, (c) Activity: AR, (d) Activity: Top-1. (Panels (a) and (b) compare COEVOLVE, TRF and WENG; panels (c) and (d) compare COEVOLVE and HAWKES; all are plotted against the number of events.)

8.4 Activity prediction

We use our model to predict the identity of the node that generates each test diffusion event, given the historical events before the time of the prediction, and compare its performance with a baseline consisting of a Hawkes process without network evolution. For the Hawkes baseline, we take a snapshot of the network
right before the prediction time, and use all historical retweeting events to fit the model. Here, we evaluate the performance via the same two measures as in the link prediction task and summarize the results in Figure 20(c-d) against an increasing number of training events. The results show that, by modeling the co-evolutionary dynamics, our model performs significantly better than the baseline.

8.5 Model Checking

Given any two consecutive event times generated by a Hawkes process, $t_i$ and $t_{i+1}$, the time-change theorem [38] states that the intensity integrals $\int_{t_i}^{t_{i+1}} \lambda(t)\,dt$ should conform to the unit-rate exponential distribution. Figure 21 presents the quantiles of the intensity integrals, computed using intensities with the parameters estimated from the real Twitter data, against the quantiles of the unit-rate exponential distribution. The points approximately lie on the same line, giving empirical evidence that a Hawkes process is the right model to capture the real dynamics.

9 Related Work

In this section, we survey related work on modeling temporal networks, followed by a subsection on co-evolution dynamics. Next, we review the literature on information diffusion models. Finally, we conclude the section with the works that are most closely related to ours and were developed for almost the same goal.

Temporal Networks. Much effort has been devoted to modeling the evolution of social networks [39, 40, 41, 42, 43]. Among the methods proposed for characterizing link creation, triadic closure [34] is a simple but powerful principle to model the evolution based on shared friends. Modeling timing and rich features of social interactions has been attracting increasing interest in the social network modeling community [44]. However, most of these models use timing information as discrete indices.
The dynamics of the resulting time-discretized model can be quite sensitive to the chosen discretization time steps: too coarse a discretization will miss important dynamic features of the process, and too fine a discretization will increase the computational and inference costs of the algorithms. In contrast, the events we try to model tend to be asynchronous with a number of different time scales. [45] used rule-based methods to model the evolution of the graph over time. [46] analyzed community structure over time and [47] studied the interaction of the friendship graph among group members and group growth. Recently, [48] used a Cox-intensity Poisson model with exponential random graphs to model friendship dynamics. [49] extended this model to the temporal sequence of interactions that take place in the social network, but with insufficient model flexibility and limited scalability. Modeling temporal dynamics of interactions in this way provides new opportunities for identifying network topology at multiple scales [50] and for early detection of popular resources [51, 52]. However, these works largely fail to model the interdependency between events generated by different users, which is one of the focuses of our proposed framework. Most of this line of work is summarized in a recent survey [53], with a short section devoted to point process based approaches.

[Figure 21: Quantile plots of the intensity integrals from the real link and retweet event times. Each panel plots the quantiles of the observed time samples against the quantiles of the exponential distribution. Panels: (a) Link process; (b) Retweet process.]

Co-evolution Dynamics. In machine learning and several other communities, both the dynamics on the network and the dynamics of the network have been extensively studied, and combining the two is a natural next step.
For example, [54] claimed that content generation in social networks is influenced not just by users' personal features like age and gender, but also by their social network structure. Furthermore, research has been done to address co-evolution problems, for example, in the complex network literature, under the name of adaptive systems [55, 56, 57]. The main premise is that the evolution of the topology depends on the dynamics of the nodes in the network, and a feedback loop can be created between the two, which allows a dynamical exchange of information. It has been shown that adaptive networks are capable of self-organizing towards dynamically critical states, like phase transitions, through the interplay between the two processes on different time scales [58]. In a different context, epidemiologists have found that nodes may rewire their links to try to avoid contact with the infected ones [59, 60]. Co-evolutionary models have also been developed for collective opinion formation, investigating whether the co-evolutionary dynamics will eventually lead to consensus or fragmentation of the population [61]. However, this line of research tends to be less data-driven. Moreover, although the general nonlinear dynamic-system based methods usually address co-evolutionary phenomena that are macroscopic in nature, they lack the inference power of statistical generative models, which are better suited to teasing out microscopic details from the data. Finally, we would also like to mention a different line of research exemplified by the actor-oriented models developed by [62], where a continuous-time Markov chain on the space of directed networks is specified by local node-centric probabilistic link change rules, and MCMC and the method of moments are used for parameter estimation. In contrast, the Hawkes processes we use are generally non-Markovian and make use of event history far into the past.

Information Diffusion.
The presence of timing information in event data and the ability to model such information bring up the interesting question of how to use the learned model for time-sensitive inference or decision making. Furthermore, the development of online social networks has attracted a lot of empirical studies of the online influence patterns of online communities [63, 64, 65, 66], microblogs [67, 68] and so on. However, these works usually consider only relatively simple models for the influence, which may not be very predictive. Among more mathematically oriented works, discrete-time diffusion models have been fitted to information cascades (a special case of asynchronous event data) from social networks [69, 70] and used for decision making, such as identifying influencers [63], maximizing information spread [27, 71], and marketing planning [72, 73, 74, 75]. Several recent experimental comparisons on both synthetic and real-world data showed that continuous-time models yield significant improvements in settings such as recovering hidden diffusion network topologies from cascade data [76, 7, 77], predicting the timings of future events [78, 79], and finding the source of information cascades [9]. Besides this, point process modeling of activity in networks is becoming increasingly popular [80, 81, 82]. These time-sensitive modeling and decision making problems can usually be framed as optimization problems and are usually difficult to solve. This brings up interesting optimization problems, such as efficient submodular function optimization with provable guarantees [83, 27], sampling methods [84, 85] for inference and prediction, and the convex framework proposed in [18] to make decisions that shape the activity toward a variety of objectives. Furthermore, the high-dimensional nature of modern event data makes the evaluation of the objective function of the optimization problem even more expensive.
Therefore, more accurate models and more sophisticated algorithms need to be designed to tackle the challenges posed by modern event data applications.

The work most closely related to ours is the empirical study of information diffusion and network evolution [55, 86, 4, 3, 5]. Among them, [4] was the first to show experimental evidence that information diffusion influences network evolution in microblogging sites both at system-wide and individual levels. In particular, they studied Yahoo! Meme, a social micro-blogging site similar to Twitter, which was active between 2009 and 2012, and showed that the likelihood that a user $u$ starts following a user $s$ increases with the number of messages from $s$ seen by $u$. [3] investigated the temporal and statistical characteristics of retweet-driven connections within the Twitter network and then identified the number of retweets as a key factor to infer such connections. [5] showed that the Twitter network can be characterized by steady rates of change, interrupted by sudden bursts of new connections, triggered by retweet cascades. They also developed a method to predict which retweets are more likely to trigger these bursts. Finally, [87] utilized a multivariate Hawkes process to establish a connection between temporal properties of activities and the structure of the network. In contrast to our work, they studied static properties, e.g., community structure, and inferred the latent clusters using the observed activities.

However, there are fundamental differences between the above-mentioned studies and our work. First, they only characterize the effect that information diffusion has on the network dynamics, but not the bidirectional influence. In contrast, our probabilistic generative model takes into account the bidirectional influence between information diffusion and network dynamics.
Second, previous studies are mostly empirical and only make binary predictions on link creation events. For example, the works of [4, 3] predict whether a new link will be created based on the number of retweets, and [5] predicts whether a burst of new links will occur based on the number of retweets and users' similarity. In contrast, our model is able to learn parameters from real-world data, and predict the precise timing of both diffusion and new link events.

10 Extensions

The basic model presented in Section 3 is just a showcase of the potential of point processes in modeling networks and processes over them. In this section, we extend our model in a variety of ways. More specifically, we explain how the model can be augmented to support link removal, node birth and death, and connection specific parameters. We did not perform experiments with these extensions because our real-world dataset does not contain information regarding link removal or node birth and death. Curating a comprehensive dataset that can be used in modeling all these aspects of networks is left as interesting future work.

10.1 Link deletion

We can generalize our model to support link deletion by introducing an intensity matrix $\boldsymbol{\Xi}^*(t) = (\xi^*_{us}(t))_{u,s \in [m]}$ and modeling each individual intensity as a survival process. Let $\boldsymbol{A}^+(t)$ denote the previously defined counting matrix $\boldsymbol{A}(t)$, which indicates the existence of an edge at time $t$. Then, we introduce a new counting matrix $\boldsymbol{A}^-(t) = (A^-_{us}(t))_{u,s \in [m]}$, which indicates the lack of an edge at time $t$, and we define it via its intensity function as

$$\mathbb{E}[d\boldsymbol{A}^-(t) \mid \mathcal{H}^r(t) \cup \mathcal{H}^l(t)] = \boldsymbol{\Xi}^*(t)\,dt. \tag{40}$$

Then, we define the intensity as

$$\xi^*_{us}(t) = A^+_{us}(t)\Big(\zeta_u + \nu_s \sum_{v \in \mathcal{F}_u} \kappa_{\omega_3}(t) \star dA^-_{vs}(t)\Big), \tag{41}$$

where the term $A^+_{us}(t)$ guarantees that the link has positive intensity to be removed only if it already exists, just like the term $1 - A_{us}(t)$ in Equation (21), the parameter $\zeta_u$ is the base rate of link deletion, and $\nu_s \sum_{v \in \mathcal{F}_u} \kappa_{\omega_3}(t) \star dA^-_{vs}(t)$ is the increase in link deletion intensity due to a growing number of followees of $u$ who decided to unfollow $s$. This is an excitation term due to deleted links to source $s$: if $s$ is unfollowed by some followees of $u$, then $u$ may also find that $s$ is not a good source of information.

Given a pair of nodes $(u, s)$, the process starts with $A^+_{us}(t) = 0$. Whenever a link is created, this process ends and a removal process $A^-_{us}(t)$ starts. Similarly, when the removal process fires, the connection is removed and a new link creation process is instantiated. These two processes interleave until the end.

10.2 Node birth and death

We can augment our model to allow the number of nodes $m(t)$ to change over time:

$$m(t) = m_b(t) - m_d(t), \tag{42}$$

where $m_b(t)$ and $m_d(t)$ are counting processes modeling the numbers of nodes that have joined and left the network until time $t$, respectively. The way we construct $m_b(t)$ and $m_d(t)$ guarantees that $m(t)$ is always non-negative. The birth process, $m_b(t)$, is characterized by a conditional intensity function $\phi^*(t)$:

$$\mathbb{E}[dm_b(t) \mid \mathcal{H}^r(t) \cup \mathcal{H}^l(t)] = \phi^*(t)\,dt, \tag{43}$$

where

$$\phi^*(t) = \epsilon + \theta \sum_{u,s \in [m(t)]} \kappa_{\omega_4}(t) \star dN_{us}(t). \tag{44}$$

Here, $\epsilon$ is the constant rate of arrival and $\theta \sum_{u,s \in [m(t)]} \kappa_{\omega_4}(t) \star dN_{us}(t)$ is the increase in the rate of node arrival due to the increased activity of nodes. Intuitively, the higher the overall activity in the existing network, the larger the number of new users.

The construction of the death process, $m_d(t)$, is more involved.
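Before turning to the death process, the two intensities introduced so far in this section, Equations (41) and (44), can be made concrete with a minimal sketch. It assumes an exponential triggering kernel $\kappa_\omega(t) = \exp(-\omega t)$ for $t \geq 0$, under which the convolution of the kernel with a counting process reduces to a sum of decaying exponentials over past events. The function names, the argument layout, and the name `base_rate` for the constant arrival rate are hypothetical and only serve the illustration.

```python
import math


def exp_kernel_sum(event_times, t, omega):
    """(kappa_omega * dN)(t) for an assumed exponential kernel:
    sum of exp(-omega * (t - t_i)) over past events t_i < t."""
    return sum(math.exp(-omega * (t - t_i)) for t_i in event_times if t_i < t)


def link_deletion_intensity(t, link_exists, zeta_u, nu_s,
                            unfollow_times_per_followee, omega3):
    """Equation (41): intensity of u deleting her link to s at time t.

    unfollow_times_per_followee holds, for every followee v of u, the list
    of times at which v unfollowed s (the jumps of A^-_{vs}).
    """
    if not link_exists:
        # A^+_{us}(t) = 0: a link that does not exist cannot be removed.
        return 0.0
    excitation = sum(exp_kernel_sum(times, t, omega3)
                     for times in unfollow_times_per_followee)
    return zeta_u + nu_s * excitation


def node_birth_intensity(t, base_rate, theta, all_retweet_times, omega4):
    """Equation (44): constant arrival rate plus excitation from the
    retweet events of all existing user pairs (pooled in one list)."""
    return base_rate + theta * exp_kernel_sum(all_retweet_times, t, omega4)
```

In this sketch every evaluation rescans the event lists; with an exponential kernel the sums could instead be updated recursively from one event to the next, which is a standard way to make such evaluations cheaper.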
Every time a new user joins the network, we start a survival process that controls whether she leaves the network. Thus, we can stack all these survival processes in a vector, $\boldsymbol{l}(t) = (l_u(t))_{u \in [m]}$, characterized by a multidimensional conditional intensity function $\boldsymbol{\sigma}^*(t) = (\sigma^*_u(t))_{u \in [m_b(t)]}$:

$$\mathbb{E}[d\boldsymbol{l}(t) \mid \mathcal{H}^r(t) \cup \mathcal{H}^l(t)] = \boldsymbol{\sigma}^*(t)\,dt. \tag{45}$$

Intuitively, we expect the nodes with lower activity to be more likely to leave the network and thus their conditional intensity functions to adopt the following form:

$$\sigma^*_u(t) = (1 - l_u(t)) \Bigg( \sum_{j=1}^{J} \pi_j g_j(t) + \Big( h(t) - \sum_{s \in [m(t)]} \kappa_{\omega_5}(t) \star dN_{us}(t) \Big)_+ \Bigg), \tag{46}$$

where the term $(1 - l_u(t))$ ensures that a node is deleted only once, $\sum_{j=1}^{J} \pi_j g_j(t)$ is the history-independent typical rate of death, shared across nodes, which we represent by a grid of known temporal kernels, $\{g_j(t)\}$, with unknown coefficients, $\{\pi_j\}$, and the second term captures the effect of activity on the probability of leaving the network. More specifically, if a node is not active, we assume this second term is upper bounded by $h(t)$; the more active she becomes, the larger the term $\sum_{s \in [m(t)]} \kappa_{\omega_5}(t) \star dN_{us}(t)$ and the lower her probability of leaving the network. The hinge function $(\cdot)_+$ guarantees that the intensity is always non-negative. Then, given the individual death processes, the total death process is

$$m_d(t) = \sum_{u=1}^{m_b(t)} l_u(t), \tag{47}$$

which completes the modeling of the time-varying number of nodes.

10.3 Incorporating features

One can simply enrich the model by taking into account the longitudinal or static information of the networked data, e.g., by conditioning the intensity on additional external features, such as node attributes or edge types.
Let us assume each user $u$ comes with a $K$-dimensional feature vector $\boldsymbol{x}_u$ including properties such as her age, job, location, number of followers, number of tweets, etc. Then, we can augment the information diffusion intensity as follows. We introduce a $K$-dimensional parameter vector $\boldsymbol{\eta}_u$ in which each dimension reflects the contribution of the corresponding element in the feature vector to the intensity, and replace the baseline rate $\eta_u$ by $\boldsymbol{\eta}_u^\top \boldsymbol{x}_u$. Similarly, we introduce a $K$-dimensional vector $\boldsymbol{\beta}_s$ where each dimension has a corresponding element in the feature vector $\boldsymbol{x}_s$, and substitute $\beta_s$ by $\boldsymbol{\beta}_s^\top \boldsymbol{x}_s$. Therefore, one can rewrite the original information diffusion intensity given by Equation (19) as:

$$\gamma^*_{us}(t) = \mathbb{I}[u = s]\, \boldsymbol{\eta}_u^\top \boldsymbol{x}_u + \mathbb{I}[u \neq s]\, \boldsymbol{\beta}_s^\top \boldsymbol{x}_s \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_1}(t) \star (A_{uv}(t)\, dN_{vs}(t)). \tag{48}$$

Similarly, we can parameterize the coefficients of the link creation intensity by $K$-dimensional vectors and write the counterpart of Equation (20), incorporating features of the node for computing the intensity:

$$\lambda^*_{us}(t) = (1 - A_{us}(t)) \Big( \boldsymbol{\mu}_u^\top \boldsymbol{x}_u + \boldsymbol{\alpha}_u^\top \boldsymbol{x}_u \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_2}(t) \star dN_{vs}(t) \Big). \tag{49}$$

All the results on convexity for parameter learning and on efficient simulation techniques remain valid in this case too. As long as the features contribute to the intensity linearly, the log-likelihood is concave and we can simulate the model as efficiently as the original model.

10.4 Connection specific parameters

Up to this point, the parameters of the link creation and removal, node birth and death, and information diffusion intensities depend on one end point of the interactions. For example, $\beta_s$ and $\eta_u$ in the information diffusion intensity given by Equation (19) only depend on the source and the actor, respectively. However, proceeding with this example, parameters can be made connection specific, i.e.
, Equation (19) can be restated as

$$\gamma^*_{us}(t) = \mathbb{I}[u = s]\, \eta_{us} + \mathbb{I}[u \neq s]\, \beta_{us} \sum_{v \in \mathcal{F}_u(t)} \kappa_{\omega_1}(t) \star (A_{uv}(t)\, dN_{vs}(t)), \tag{50}$$

where $\eta_{us}$ is the base intensity of $u$ retweeting a tweet originated by $s$ and $\beta_{us}$ is the coefficient of excitement of $u$ to retweet $s$ when one of her followees retweets something from $s$. Given enough computational resources and large amounts of historical data, one can take into account more complex scenarios and larger and more flexible models. For example, the middle user, say $v$, who is along the path of diffusion and forwards the tweet originated from $s$ to $u$, can also be taken into consideration, i.e., by defining $\beta_{svu}$ as the amount of increase in the intensity of user $u$ retweeting from $s$ when user $v$ has just retweeted a post from $s$. All desirable properties of the simulation algorithm and the parameter estimation method still hold.

11 Conclusion and Future Works

In this work, we proposed a joint continuous-time model of information diffusion and network evolution, which can capture the co-evolutionary dynamics, can mimic the most common static and temporal network patterns observed in real-world networks and information diffusion data, and can predict the network evolution and information diffusion more accurately than previous state-of-the-art methods. Using point processes to model intertwined events in information and social networks opens up many interesting avenues for future work. Our current model is just a showcase of a rich set of possibilities offered by a point process framework, which have rarely been explored before in large-scale social network modeling. There are quite a few directions that remain as future work and are very interesting to explore.
For example:

• A large and diverse range of point processes can also be used in the framework to augment the current model without changing the efficiency of simulation and the convexity of parameter estimation.

• We can incorporate features from the previous state of the diffusion or network structure. For example, one can model information overload by adding a nonlinear transfer function on top of the diffusion intensity, or model peer pressure by adding a nonlinear transfer function depending on the number of neighbors.

• There are situations in which the processes naturally evolve on different time scales. For example, link dynamics are meaningful on the scale of days, whereas the resolution at which information propagation occurs is usually hours or even minutes. Developing an efficient mechanism to account for heterogeneity in time resolution would improve the model's ability to predict.

• We may augment the framework to allow time-varying parameters. The simulation would not be affected and the estimation of time-varying interactions can still be carried out via a convex optimization problem [17].

• Alternatively, one can use different triggering kernels for the Hawkes processes and learn them to capture finer details of temporal dynamics.

Acknowledgement. The authors would like to thank Demetris Antoniades and Constantine Dovrolis for providing them with the dataset. The research was supported in part by NSF/NIH BIGDATA 1R01GM108341, ONR N00014-15-1-2340, NSF IIS-1218749, NSF CAREER IIS-1350983.

A Ogata's Algorithm

In this section, we revisit Ogata's algorithm in more detail. Consider a $U$-dimensional point process in which each dimension $u$ is characterized by a conditional intensity function $\lambda^*_u(t)$. Ogata's algorithm starts with summing the intensities, $\lambda^*_{sum}(\tau) = \sum_{u=1}^{U} \lambda^*_u(\tau)$.
Then, assuming we have simulated up to time $t$, the next sample time, $t'$, is the first event drawn from the non-homogeneous Poisson process with intensity $\lambda^*_{sum}(\tau)$ which begins at time $t$. Here, the algorithm exploits that, given a fixed history, the Hawkes process is a non-homogeneous Poisson process, which runs until the next event happens. Then, the new event results in an update of the intensities and a new non-homogeneous Poisson process starts.

It can be shown that the waiting time of a non-homogeneous Poisson process is an exponentially distributed random variable with rate equal to the integral of the intensity [29], i.e., $s \sim Exponential\big(\int_t^{t+s} \lambda^*_{sum}(\tau)\,d\tau\big)$, in the sense that the integrated intensity over the waiting time follows a unit-rate exponential distribution. Thus, the next sample time can be computed as

$$t' = \underbrace{t}_{\text{current time}} + \underbrace{s}_{\text{waiting time for the first event}}. \tag{51}$$

Sampling from a non-homogeneous Poisson process is not straightforward; therefore, Ogata's algorithm uses rejection sampling with a homogeneous Poisson process as the proposal distribution. In more detail, given $\hat{\lambda} = \max_{t \leq \tau \leq T} \lambda^*_{sum}(\tau)$, $t'$ is the time of the first event of a homogeneous Poisson process with rate $\hat{\lambda}$. Then, we accept the sample time with probability $\lambda^*_{sum}(t') / \hat{\lambda}$. Finally, the dimension firing the event is determined by sampling proportionally to the contribution of the intensity of that user to the total intensity, i.e., $\lambda^*_u(t') / \lambda^*_{sum}(t')$ for $1 \leq u \leq U$. This procedure is iterated until we reach the end of the simulation time $T$. Algorithm 5 presents the complete procedure.

Ogata's algorithm scales poorly with the dimension of the process because, after each sample, we need to re-evaluate the affected intensities and find the upper bound. As a consequence, a naive implementation that draws $n$ samples requires $O(U n^2)$ time, where $U$ is the number of dimensions.
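The procedure described above can be sketched in a few lines. This is a minimal illustration rather than the implementation used in the experiments: it assumes a plain multivariate Hawkes process with constant base rates `mu`, an excitation matrix `alpha` and an exponential kernel with decay `omega`, none of which are fixed by this appendix. Under that assumption the total intensity only decays between events, so its value at the current time can serve as the upper bound $\hat{\lambda}$.

```python
import math
import random


def intensities(t, history, mu, alpha, omega):
    """lambda*_u(t) = mu[u] + sum over past events (t_i, v) of
    alpha[u][v] * exp(-omega * (t - t_i)); assumed exponential kernel."""
    lam = list(mu)
    for t_i, v in history:
        decay = math.exp(-omega * (t - t_i))
        for u in range(len(mu)):
            lam[u] += alpha[u][v] * decay
    return lam


def ogata_thinning(mu, alpha, omega, T, rng=None):
    """Simulate events (time, dimension) on [0, T) by rejection sampling."""
    rng = rng or random.Random(0)
    history, t = [], 0.0
    while True:
        # Upper bound: with a decaying kernel and no new events, the total
        # intensity on [t, T] is largest at the current time t.
        lam_hat = sum(intensities(t, history, mu, alpha, omega))
        # Sampling next event time from the homogeneous proposal process.
        t_new = t + rng.expovariate(lam_hat)
        if t_new >= T:
            break
        lam = intensities(t_new, history, mu, alpha, omega)
        lam_bar = sum(lam)
        t = t_new
        # Rejection test: keep the candidate with probability lam_bar / lam_hat.
        if rng.random() * lam_hat > lam_bar:
            continue
        # Attribution test: pick dimension u with probability lam[u] / lam_bar.
        d = rng.random() * lam_bar
        acc = 0.0
        for u, lam_u in enumerate(lam):
            acc += lam_u
            if acc >= d:
                history.append((t_new, u))
                break
    return history
```

For instance, `ogata_thinning([0.5, 0.5], [[0.2, 0.1], [0.1, 0.2]], 1.0, 20.0)` returns a list of `(time, dimension)` pairs. Every call to `intensities` in this sketch scans the whole history for every dimension, so drawing $n$ samples costs $O(U n^2)$, matching the complexity stated above.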
This is because for each sample we need to find the new summation of intensities, which involves $O(U)$ individual ones, each taking $O(n)$ time to accumulate over the history. In our social networks application, we have $m^2 - m$ point processes for link creation and $m^2$ for retweeting, i.e., $U = O(m^2)$. Therefore, Ogata's algorithm takes $O(m^2 n^2)$ time.

Algorithm 5 Ogata's Algorithm
    Input: $U$-dimensional Hawkes process $\{\lambda^*_u(t)\}_{u=1 \ldots U}$, due time $T$
    Output: set of events $\mathcal{H} = \{(t_1, u_1), \ldots, (t_n, u_n)\}$
    $t \leftarrow 0$
    $i \leftarrow 0$
    while $t < T$ do
        (Sampling next event time)
        $\lambda^*_{sum}(\tau) \leftarrow \sum_{u=1}^{U} \lambda^*_u(\tau)$
        $\hat{\lambda} \leftarrow \max_{t \leq \tau \leq T} \lambda^*_{sum}(\tau)$
        $s \sim Exponential(\hat{\lambda})$
        $t' \leftarrow t + s$
        if $t' \geq T$ then
            break
        end if
        (Rejection test)
        $\bar{\lambda} \leftarrow \lambda^*_{sum}(t')$
        $d \sim Uniform(0, 1)$
        if $d \times \hat{\lambda} > \bar{\lambda}$ then
            $t \leftarrow t'$
            continue with the next iteration of the while loop
        end if
        (Attribution test)
        $S \leftarrow 0$
        $d \sim Uniform(0, 1)$
        for $u \leftarrow 1$ to $U$ do
            $S \leftarrow S + \lambda^*_u(t')$
            if $S \geq d \times \bar{\lambda}$ then
                $i \leftarrow i + 1$
                $u_i \leftarrow u$
                $t_i \leftarrow t'$
                $t \leftarrow t'$
                given the new event just sampled, update the intensity functions $\lambda^*_u(\tau)$
                exit the for loop
            end if
        end for
    end while

References

[1] Haewoon Kwak, Changhyun Lee, Hosung Park, and Sue Moon. What is Twitter, a social network or a news media? In Proceedings of the 19th International Conference on World Wide Web, pages 591–600, New York, NY, USA, 2010. ACM.
[2] Justin Cheng, Lada Adamic, P Alex Dow, Jon Michael Kleinberg, and Jure Leskovec. Can cascades be predicted? In Proceedings of the 23rd international conference on World wide web, pages 925–936, 2014.
[3] Demetris Antoniades and Constantine Dovrolis. Co-evolutionary dynamics in social networks: A case study of twitter. arXiv preprint arXiv:1309.6001, 2013.
[4] Lilian Weng, Jacob Ratkiewicz, Nicola Perra, Bruno Gonçalves, Carlos Castillo, Francesco Bonchi, Rossano Schifanella, Filippo Menczer, and Alessandro Flammini.
The role of information diffusion in the evolution of social networks. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 356–364. ACM, 2013.
[5] Seth A Myers and Jure Leskovec. The bursty dynamics of the twitter information network. In 23rd International Conference on the World Wide Web, pages 913–924, 2014.
[6] Manuel Gomez-Rodriguez, Jure Leskovec, and Andreas Krause. Inferring networks of diffusion and influence. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 1019–1028. ACM, 2010.
[7] Manuel Gomez-Rodriguez, David Balduzzi, and Bernhard Schölkopf. Uncovering the temporal dynamics of diffusion networks. In Proceedings of the International Conference on Machine Learning, 2011.
[8] Nan Du, Le Song, Manuel Gomez-Rodriguez, and Hongyuan Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems 26, 2013.
[9] Mehrdad Farajtabar, Manuel Gomez-Rodriguez, Nan Du, Mohammad Zamani, Hongyuan Zha, and Le Song. Back to the past: Source identification in diffusion networks from partially observed cascades. In Proceedings of the 18th International Conference on Artificial Intelligence and Statistics (AISTATS), 2015.
[10] Deepayan Chakrabarti, Yiping Zhan, and Christos Faloutsos. R-mat: A recursive model for graph mining. Computer Science Department, page 541, 2004.
[11] Jure Leskovec, Lars Backstrom, Ravi Kumar, and Andrew Tomkins. Microscopic evolution of social networks. In Proceedings of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 462–470. ACM, 2008.
[12] Jure Leskovec, Deepayan Chakrabarti, Jon Kleinberg, Christos Faloutsos, and Zoubin Ghahramani. Kronecker graphs: An approach to modeling networks.
Journal of Machine Learning Research, 11(Feb):985–1042, 2010.
[13] Thomas Josef Liniger. Multivariate Hawkes Processes. PhD thesis, Swiss Federal Institute of Technology Zurich, 2009.
[14] Charles Blundell, Jeff Beck, and Katherine A Heller. Modelling reciprocating relationships with Hawkes processes. In NIPS, 2012.
[15] Tomoharu Iwata, Amar Shah, and Zoubin Ghahramani. Discovering latent influence in online social activities via shared cascade Poisson processes. In Proceedings of the 19th ACM SIGKDD international conference on Knowledge discovery and data mining, pages 266–274. ACM, 2013.
[16] Ke Zhou, Hongyuan Zha, and Le Song. Learning social infectivity in sparse low-rank networks using multi-dimensional Hawkes processes. In Artificial Intelligence and Statistics (AISTATS), 2013.
[17] Ke Zhou, Hongyuan Zha, and Le Song. Learning triggering kernels for multi-dimensional Hawkes processes. In International Conference on Machine Learning (ICML), 2013.
[18] Mehrdad Farajtabar, Nan Du, Manuel Gomez-Rodriguez, Isabel Valera, Hongyuan Zha, and Le Song. Shaping social activity by incentivizing users. In Advances in Neural Information Processing Systems (NIPS), 2014.
[19] Scott W Linderman and Ryan P Adams. Discovering latent network structure in point process data. In International Conference on Machine Learning (ICML), 2014.
[20] Nan Du, Mehrdad Farajtabar, Amr Ahmed, Alexander J Smola, and Le Song. Dirichlet-Hawkes processes with applications to clustering continuous-time document streams. In KDD. ACM, 2015.
[21] Isabel Valera and Manuel Gomez-Rodriguez. Modeling adoption of competing products and conventions in social media. IEEE International Conference on Data Mining, 2015.
[22] David Hunter, Padhraic Smyth, Duy Q Vu, and Arthur U Asuncion. Dynamic egocentric models for citation networks.
In Proceedings of the 28th International Conference on Machine Learning, pages 857–864, 2011.
[23] Duy Q Vu, David Hunter, Padhraic Smyth, and Arthur U Asuncion. Continuous-time regression models for longitudinal networks. In Advances in Neural Information Processing Systems, pages 2492–2500, 2011.
[24] Jure Leskovec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the eleventh ACM SIGKDD international conference on Knowledge discovery in data mining, pages 177–187. ACM, 2005.
[25] Sharad Goel, Duncan J Watts, and Daniel G Goldstein. The structure of online diffusion networks. In Proceedings of the 13th ACM conference on electronic commerce, pages 623–638, 2012.
[26] Odd Aalen, Ornulf Borgan, and Hakon Gjessing. Survival and event history analysis: a process point of view. Springer, 2008.
[27] David Kempe, Jon Kleinberg, and Éva Tardos. Maximizing the spread of influence through a social network. In SIGKDD, pages 137–146. ACM, 2003.
[28] Yosihiko Ogata. On Lewis' simulation method for point processes. Information Theory, IEEE Transactions on, 27(1):23–31, 1981.
[29] Sheldon M. Ross. Introduction to Probability Models, Tenth Edition. Academic Press, Inc., 2011.
[30] Stephen Boyd and Lieven Vandenberghe. Convex Optimization. Cambridge University Press, Cambridge, England, 2004.
[31] David R Hunter and Kenneth Lange. A tutorial on MM algorithms. The American Statistician, 58(1):30–37, 2004.
[32] Paul Erdős and A Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci, 5:17–61, 1960.
[33] Lars Backstrom, Paolo Boldi, Marco Rosa, Johan Ugander, and Sebastiano Vigna. Four degrees of separation. In Proceedings of the 4th Annual ACM Web Science Conference, pages 33–42, 2012.
[34] Mark Granovetter. The strength of weak ties.
American journal of sociology, pages 1360–1380, 1973.
[35] Daniel Mauricio Romero and Jon Kleinberg. The directed closure process in hybrid social-information networks, with an analysis of link formation on twitter. In ICWSM, 2010.
[36] Johan Ugander, Lars Backstrom, and Jon Kleinberg. Subgraph frequencies: Mapping the empirical and extremal geography of large graph collections. In Proceedings of the 22nd international conference on World Wide Web, pages 1307–1318. International World Wide Web Conferences Steering Committee, 2013.
[37] Duncan J Watts and Steven H Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, June 1998.
[38] Daryl J Daley and David Vere-Jones. An introduction to the theory of point processes: volume II: general theory and structure. Springer Science & Business Media, 2007.
[39] Tuan Q Phan and Edoardo M Airoldi. A natural experiment of social network formation and dynamics. Proceedings of the National Academy of Sciences, 112(21):6595–6600, 2015.
[40] Patrick Doreian and Frans Stokman. Evolution of social networks. Routledge, 2013.
[41] Eric Wang, Jorge Silva, Rebecca Willett, and Lawrence Carin. Time-evolving modeling of social networks. In Acoustics, Speech and Signal Processing (ICASSP), 2011 IEEE International Conference on, pages 2184–2187. IEEE, 2011.
[42] Mark Newman. Networks: an introduction. Oxford University Press, 2010.
[43] Alain Barrat, Marc Barthelemy, and Alessandro Vespignani. Dynamical processes on complex networks. Cambridge University Press, 2008.
[44] Anna Goldenberg, Alice X Zheng, Stephen E Fienberg, and Edoardo M Airoldi. A survey of statistical network models. Foundations and Trends® in Machine Learning, 2(2):129–233, 2010.
[45] David R Heise. Modeling event structures*. Journal of Mathematical Sociology, 14(2-3):139–169, 1989.
[46] Michelle Girvan and Mark EJ Newman.
Comm unity structure in so cial and biological net works. Pr o- c e e dings of the national ac ademy of scienc es , 99(12):7821–7826, 2002. [47] J. Lesko vec, L. Backstrom, and J. Klein b erg. Meme-tracking and the dynamics of the news cycle. In Pr o c e e dings of the 15th A CM SIGKDD international c onfer enc e on Know le dge disc overy and data mining , pages 497–506. A CM, 2009. [48] T om AB Snijders and SR Luchini. Statistical metho ds for netw ork dynamics. In Pr o c e e dings of the XLIII Scientiﬁc Me eting, Italian Statistic al So ciety . CLEUP , 2006. [49] Ulrik Brandes, J ¨ urgen Lerner, and T om AB Snijders. Net works ev olving step by step: Statistical analysis of dy adic even t data. In So cial Network Analysis and Mining, 2009. ASONAM’09. International Confer enc e on A dvanc es in , pages 200–205. IEEE, 2009. [50] Rumi Ghosh and Kristina Lerman. The role of dynamic in teractions in multi-scale analysis of netw ork structure. CoRR , 2012. [51] T ad Hogg and Kristina Lerman. So cial dynamics of digg. EPJ Data Scienc e , 1(1):1–26, 2012. [52] Kristina Lerman, Aram Galsty an, Greg V er Steeg, and T ad Hogg. So cial mechanics: An empirically grounded science of so cial media. In Fifth International AAAI Confer enc e on Weblo gs and So cial Me dia , 2011. [53] Petter Holme. Mo dern temporal net work theory: A colloquium. arXiv pr eprint arXiv:1508.01303 , 2015. [54] Prasanta Bhattachary a, T uan Q Phan, and Edoardo M Airoldi. Analyzing the co-evolution of net work structure and con tent generation in online so cial netw orks. ECIS 2015 Complete d R ese ar ch Pap ers , page 18, 2015. [55] Thilo Gross and Bernd Blasius. Adaptive coevolutionary netw orks: a review. Journal of The R oyal So ciety Interfac e , 5(20):259–271, 2008. [56] Thilo Gross and Hiroki Sa yama. A daptive networks . Springer, 2009. [57] Hiroki Say ama, Irene P estov, Jeﬀrey Schmidt, Benjamin James Bush, Ch un W ong, Junichi Y amanoi, and Thilo Gross. 
Mo deling complex systems with adaptive netw orks. Computers & Mathematics with Applic ations , 65(10):1645–1664, 2013. 34 [58] Stefan Bornholdt and Thimo Rohlf. T op ological evolution of dynamical netw orks: Global criticality from lo cal dynamics. Physic al R eview L etters , 84(26):6114, 2000. [59] Thilo Gross, Carlos J Dommar DLima, and Bernd Blasius. Epidemic dynamics on an adaptiv e netw ork. Physic al r eview letters , 96(20):208701, 2006. [60] Dami´ an H Zanette and Sebasti´ an Risau-Gusm´ an. Infection spreading in a p opulation with evolving con tacts. Journal of biolo gic al physics , 34(1-2):135–148, 2008. [61] Gerd Zsc haler, Gesa A B¨ ohme, Mic hael Seißinger, Cristi´ an Huepe, and Thilo Gross. Early fragmen tation in the adaptive voter mo del on directed netw orks. Physic al R eview E , 85(4):046107, 2012. [62] T om AB Snijders. Siena: Statistical mo deling of longitudinal net work data. In Encyclop e dia of So cial Network Analysis and Mining , pages 1718–1725. Springer, 2014. [63] Nitin Agarw al, Huan Liu, Lei T ang, and Philip S Y u. Identifying the inﬂuen tial bloggers in a communit y . In Pr o c e e dings of the 2008 international c onfer enc e on web se ar ch and data mining , pages 207–218. ACM, 2008. [64] Daniel Gruhl, Ramanathan Guha, Da vid Lib en-Now ell, and Andrew T omkins. Information diﬀusion through blogspace. In Pr o c e e dings of the 13th international c onfer enc e on World Wide Web , pages 491–501. ACM, 2004. [65] T ao Sun, W ei Chen, Zhenming Liu, Y a jun W ang, Xiaorui Sun, Ming Zhang, and Chin-Y ew Lin. P ar- ticipation maximization based on social inﬂuence in online discussion forums. In Pr o c e e dings of the International AAAI Confer enc e on Weblo gs and So cial Me dia , 2011. [66] F ang jian Guo, Charles Blundell, Hanna W allach, Katherine Heller, and UCL Gatsby Unit. The ba yesian ec ho c hamber: Mo deling so cial inﬂuence via linguistic accommo dation. 
In Pr o c e e dings of the Eighte enth International Confer enc e on Artiﬁcial Intel ligenc e and Statistics , pages 315–323, 2015. [67] Jianshu W eng, Ee-P eng Lim, Jing Jiang, and Qi He. Twitterrank: ﬁnding topic-sensitiv e inﬂuential t witterers. In Pr o c e e dings of the thir d ACM international c onfer enc e on Web se ar ch and data mining , pages 261–270. ACM, 2010. [68] Eytan Bakshy , Jak e M. Hofman, Winter A. Mason, and Duncan J. W atts. Every one’s an inﬂuencer: Quan tifying inﬂuence on twitter. In WSDM , pages 65–74, 2011. [69] Kazumi Saito, Ryohei Nak ano, and Masahiro Kimura. Prediction of information diﬀusion probabilities for indep enden t cascade mo del. In Know le dge-b ase d intel ligent information and engine ering systems , pages 67–75. Springer, 2008. [70] Amit Goy al, F rancesco Bonchi, and Laks VS Lakshmanan. Learning inﬂuence probabilities in so cial net works. In Pr o c e e dings of the thir d ACM international c onfer enc e on Web se ar ch and data mining , pages 241–250. ACM, 2010. [71] M.G. Ro driguez and B. Sc h¨ olkopf. Inﬂuence maximization in contin uous time diﬀusion net works. In Pr o c e e dings of the International Confer enc e on Machine L e arning , 2012. [72] Matthew Ric hardson and P edro Domingos. Mining knowledge-sharing sites for viral marketing. In Pr o c e e dings of the eighth ACM SIGKDD international c onfer enc e on Know le dge disc overy and data mining , pages 61–70. A CM, 2002. [73] Pedro Domingos and Matt Ric hardson. Mining the netw ork v alue of customers. In Pr o c e e dings of the seventh A CM SIGKDD international c onfer enc e on Know le dge disc overy and data mining , pages 57–66. A CM, 2001. 35 [74] Smriti Bhagat, Amit Goy al, and Laks VS Lakshmanan. Maximizing pro duct adoption in social netw orks. In Pr o c e e dings of the ﬁfth ACM international c onfer enc e on Web se ar ch and data mining , pages 603–612. A CM, 2012. [75] Rushi Bhatt, Vineet Chao ji, and Ra jesh Parekh. 
Predicting pro duct adoption in large-scale so cial net works. In Pr o c e e dings of the 19th ACM international c onfer enc e on Information and know le dge management , pages 1039–1048. A CM, 2010. [76] Nan Du, Le Song, Ming Y uan, and Alex J Smola. Learning netw orks of heterogeneous inﬂuence. In A dvanc es in Neur al Information Pr o c essing Systems , pages 2780–2788, 2012. [77] Shuang-Hong Y ang and Hongyuan Zha. Mixture of mutually exciting processes for viral diﬀusion. In Pr o c e e dings of the 30th International Confer enc e on Machine L e arning (ICML-13) , pages 1–9, 2013. [78] Nan Du, Le Song, Hyenkyun W o o, and Hongyuan Zha. Unco ver topic-sensitive information diﬀusion net works. In Pr o c e e dings of the sixte enth international c onfer enc e on artiﬁcial intel ligenc e and statistics , pages 229–237, 2013. [79] Manuel Gomez Rodriguez, Jure Lesk ov ec, and Bernhard Sc h¨ olkopf. Mo deling information propagation with surviv al theory . arXiv pr eprint arXiv:1305.3616 , 2013. [80] W enzhao Lian, Ricardo Henao, Vina yak Rao, Joseph Lucas, and Lawrence Carin. A multitask point pro cess predictiv e mo del. In Pr o c e e dings of the 32nd International Confer enc e on Machine L e arning (ICML-15) , pages 2030–2038, 2015. [81] Ankur P P arikh, Asela Guna wardana, and Christopher Meek. Conjoin t modeling of temp oral dep en- dencies in even t streams. In UAI Bayesian Mo del ling Applic ations Workshop . Citeseer, 2012. [82] Eric C Hall and Reb ecca M Willett. T racking dynamic p oin t pro cesses on netw orks. arXiv pr eprint arXiv:1409.0031 , 2014. [83] Amit Go yal, F rancesco Bonc hi, Laks VS Lakshmanan, and Suresh V enk atasubramanian. Approximation analysis of inﬂuence spread in so cial netw orks. arXiv pr eprint arXiv:1008.2005 , 2010. [84] W. Lian, V. A. Rao, B. Eriksson, and L. Carin. Mo deling correlated arriv al even ts with latent semi- mark ov pro cesses. In Pr o c e e dings of the International Confer enc e on Machine L e arning , 2014. 
[85] Asela Gunawardana, Christopher Meek, and Puyang Xu. A model for temporal dependencies in event streams. In Advances in Neural Information Processing Systems, pages 1962–1970, 2011.
[86] Philipp Singer, Claudia Wagner, and Markus Strohmaier. Factors influencing the co-evolution of social and content networks in online social media. In Modeling and Mining Ubiquitous Social Media, pages 40–59. Springer, 2012.
[87] Long Tran, Mehrdad Farajtabar, Le Song, and Hongyuan Zha. Netcodec: Community detection from individual activities. In SDM, 2015.
