The Bursty Dynamics of the Twitter Information Network
In online social media systems users are not only posting, consuming, and resharing content, but also creating new and destroying existing connections in the underlying social network. While each of these two types of dynamics has individually been s…
Authors: Seth A. Myers, Jure Leskovec
Seth Myers
Stanford University
samyers@stanford.edu

Jure Leskovec
Stanford University
jure@cs.stanford.edu

ABSTRACT
In online social media systems users are not only posting, consuming, and resharing content, but also creating new and destroying existing connections in the underlying social network. While each of these two types of dynamics has individually been studied in the past, much less is known about the connection between the two. How does user information posting and seeking behavior interact with the evolution of the underlying social network structure? Here, we study ways in which network structure reacts to users posting and sharing content. We examine the complete dynamics of the Twitter information network, where users post and reshare information while they also create and destroy connections. We find that the dynamics of network structure can be characterized by steady rates of change, interrupted by sudden bursts. Information diffusion in the form of cascades of post re-sharing often creates such sudden bursts of new connections, which significantly change users’ local network structure. These bursts transform users’ networks of followers to become structurally more cohesive as well as more homogeneous in terms of follower interests. We also explore the effect of the information content on the dynamics of the network and find evidence that the appearance of new topics and real-world events can lead to significant changes in edge creations and deletions. Lastly, we develop a model that quantifies the dynamics of the network and the occurrence of these bursts as a function of the information spreading through the network. The model can successfully predict which information diffusion events will lead to bursts in network dynamics.
Categories and Subject Descriptors: H.2.8 [Database Management]: Database applications—Data mining
General Terms: Algorithms; Experimentation.
Keywords: Network dynamics, Networks of diffusion, Twitter.

1. INTRODUCTION
Online social networking and social media sites have become a ubiquitous mechanism for sharing and seeking information. In these sites, users form a network of connections by linking to friends, celebrities, organizations, and news outlets. By creating such follower connections to others, users subscribe to the content that others post. Thus, as users choose who to connect to, they are also (implicitly) choosing to which information they will have access.

Such behavior leads to two interesting types of dynamics. First is the dynamics of creation and destruction of connections of the underlying social networks of follower relationships, while the second is the dynamics of information flow in these networks, where users produce posts that others then consume as well as reshare to their own sets of followers. Both of these processes are by now relatively well studied and understood. For example, the dynamics of networks which evolve by creation of new links [3, 18, 22, 23] and by the destruction of unwanted ones [20, 33] have been examined.
Similarly, the study of the dynamics of creation, consumption, and resharing of information in online networks has led to a rich body of work. Examples include predicting what content will become reshared and popular [8, 13, 19, 29], recommending items to others [21], quantifying the influence of users on the content consumed by others [4, 7, 15], and studying the propagation of pieces of information across large networks [10, 12, 17, 24].

However, the interaction between these two types of dynamics is much less understood. For example, it is possible that the network could react and reconfigure itself due to the flows of information along its edges. In particular, as users in networks create and delete edges, they control the content to which they are exposed. Thus, as the information is shared from user to user and flows through the network, users might react to it by breaking old connections and also creating new ones. For example, if the information shared by a user is offensive or not interesting, a follower might decide to drop a connection. On the other hand, if a user posts a piece of content that gets reshared through the network, others might get exposed to it and find it interesting. As a result they might decide to connect to the original poster and directly get access to the information she is posting. In both of these cases, the sharing of content affected how users are connected to each other in the network.

It is thus important to consider the question of the interaction between the two dynamic processes: the process of users posting and sharing information, and the process of network evolution. When do information sharing events cause changes to the network dynamics? How do these changes affect the network of a user as well as future information to which she is exposed? Can information-driven network changes be detected and predicted?
These open questions pose a challenge because establishing the connection between the dynamics of information sharing and the dynamics of network evolution also requires understanding of how information spreads in networks. Explicit traces of information sharing and network evolution have been traditionally hard to obtain. Additionally, as large-scale information sharing events are relatively rare [11], it might be hard to quantify fine-grained effects of information diffusion on the underlying network dynamics. Without a richer understanding of this question, however, it is difficult to reason about networks, the mechanisms of how content spreads through them, and the connection between information and networks.

Present work: Information causes bursts in network evolution. Here, we study the dynamics of a large social network and how it is affected by users sharing information and content. We examine the dynamics of the Twitter follower network, where the graph is changing as users create new edges and destroy old ones. We study the complete dynamics of a subgraph of 13.1 million English-speaking users. Within this subgraph of a fixed set of Twitter users, 1.2 billion tweets were posted, 112.3 million new connections were formed, and 39.2 million existing ones were deleted.

Bursts of edge creations and deletions. We discover that the Twitter network is highly dynamic with about 9% of all connections changing in a month. For example, an average user with 100 followers gains 10% more followers, while also losing about 3% of their existing followers in a given month, and overall the network is slowly densifying [23]. There is a constant background “flux” of edge creation and deletion events. However, this flux gets interrupted when there is a large information cascade spreading through the network.
In particular, we find that as information gets shared through the network, it can cause abrupt changes or bursts in the dynamics of the underlying network structure.

We discover that such information cascades result in two phenomena. First, users in a coordinated way drop their connections to the information source (we refer to this as the unfollow burst). And second, many other users almost simultaneously create new connections to the information source (we refer to this as the follow burst). Such sudden bursts in network activity can have a significant impact on a user’s network structure. We find that the similarity between a user and her followers (measured by textual similarity of users’ posts) increases sharply during such bursts. For the follow bursts, the increase is caused by others discovering a user with similar interests through information diffusion and then connecting to her. For the unfollow bursts, less similar existing followers unfollow the user, which then also results in an increased similarity of the user’s followers. Additionally, the density of connections between the user’s followers also increases during a burst. In the same manner that new followers discover the user, they also discover each other. Overall, the bursts increase the coherence of the local network by both increasing the similarity of the connected users as well as the density of the underlying network structure.

While bursts in network dynamics are created by users resharing information, we also examine the content of tweets that cause bursts of new followers. Using the “Occupy Wall Street” movement as a case study, we find evidence that external real-world events have the power to connect similar users. As news of an event diffuses through the network, it appears that users interested in the event connect to each other to learn more about it.

Modeling and predicting bursts. The Twitter network dynamics consist of a constant flux of edge creations and deletions that occasionally gets interrupted by a sudden burst in network activity. The interesting question then is whether we can model (as well as predict) whether a piece of information that gets reshared through the network will result in a burst of new followers to a given user. We develop a model that quantifies the occurrence of bursts in the dynamics of the network as a function of information diffusion.

Our model is based on the intuition that bursts of new followers occur when a user is discovered by other highly similar users who then connect to her. In this case the diffusion of information facilitates the exposure of similar other users to the target user and gives others an opportunity to link to her. Our model quantifies the similarity of potential new followers exposed to the user’s post. The model compares the similarity of potential new followers with that of others who regularly get exposed to the user’s posts, and using it as a signal predicts whether a new follower burst will occur.

With our model, it is possible to make predictions about the future evolution of the network, to identify users who are about to gain many more followers, and to predict the effect of a new information diffusion event on the local network properties.

The rest of the paper is structured as follows. In Section 2 we describe our dataset and empirically study the connection between information cascades and network evolution. Section 3 then proposes a model capable of predicting whether a piece of information will result in a burst of new followers. We briefly review related work in Section 4 and conclude in Section 5.

2. ANALYSIS OF BURSTS
We begin our investigations with an empirical analysis of the dynamics of the Twitter follower graph.
We study how the formation of new edges (follows) and the removal of existing edges (unfollows) can be modeled as an effect of the tweets that propagate over the Twitter network.

2.1 Dataset description
Our dataset consists of a subset of the Twitter follower graph during the month of November 2011. We focus on English-speaking users that tweeted at least once during the month. Overall, this gives us a subgraph of about 13.1 million nodes (users) with 1.7 billion follower edges. Moreover, for every edge in the network we also obtained the exact timestamp of its creation/deletion, which allows us to investigate fine-grained network dynamics that might be a result of information flows.

Users of Twitter also create and reshare posts by retweeting them. Such resharing behavior results in information cascades, as a single post can propagate between a large number of users in the network. Thus, for every user in this network, we also analyze her complete tweeting history and reconstruct the information flows. In total, the users of our subgraph posted 1.2 billion tweets and retweeted each other 116.3 million times.

2.2 Twitter graph is highly dynamic
Examining the evolution of the Twitter follower graph we find that the network is highly dynamic. Amongst our 13 million users we identified 112.3 million new follows, as well as 39.2 million unfollows. In relation to the edges that existed at the beginning of the month, nearly 7% new edges were added, and 2.3% of edges got removed. Thus, even though we are observing a fixed subpopulation of Twitter users, 9% of the edges change in a given month. This shows that the Twitter graph is highly dynamic and thus should not be thought of as an “only-growing” network (a network that evolves mostly by users only adding edges [22]). In contrast, in Twitter we see approximately 1 edge deletion for every 3 edge creations.
Thus, about one quarter of all network evolution events are in fact edge deletions, which means the network structure is highly fluid and dynamic.

To better illustrate the amount of dynamics in the Twitter network, Figure 1 plots the average monthly activity as a function of a user’s indegree (i.e., follower count) at the beginning of the month. We plot the average number of new follows, unfollows, tweets, and retweets a user receives as a function of her indegree. We observe that the average number of new follows and unfollows strictly increases with the degree of the user. Note that even users with only 100 followers gain on average 10 more followers during the month, while losing about 3 of them. This high churn rate in users’ followers remains consistent, even for users with high indegree. There is a “background” churn of followers, where a user is constantly gaining and losing followers.

[Figure 1 plot: Count versus Indegree on logarithmic axes; series: New Follows, Unfollows, Num. of Retweets, Num. of Tweets.]
Figure 1: The number of new follows gained, follows lost, the number of retweets, and the number of tweets all scale with the indegree (number of followers) of a user. In a given month a user of degree 100 tends to gain 10 and lose 3 followers.

[Figure 2 diagram: users k, j, and i, with edges labeled Tweet and Retweet.]
Figure 2: Information diffusion and network dynamics. User i is following j, who follows k. When user k posts a tweet that j subsequently retweets (left), then user i gets exposed to the tweet, and as a result decides to follow k (right).

Lastly, we also note that the distribution of new followers that Twitter users receive is heavily skewed, and as a result the follow/unfollow dynamics are heterogeneously distributed as well. In fact, the top 20% of users with the highest indegree receive 59.4% of all the follows and unfollows in a given month.
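The churn percentages quoted in Section 2.2 can be checked from the dataset totals. The short sketch below is only a sanity check. It assumes that the 1.7 billion edge count from Section 2.1 is the start-of-month baseline, which the text implies but does not state outright. The variable names are illustrative.

```python
# Totals quoted in Sections 2.1 and 2.2, in millions.
# Assumption: 1.7 billion edges is the number present at the start of the month.
edges_at_start = 1700.0
new_follows = 112.3
unfollows = 39.2

added_pct = 100 * new_follows / edges_at_start        # "nearly 7%"
removed_pct = 100 * unfollows / edges_at_start        # "2.3%"
churn_pct = added_pct + removed_pct                   # "9% of the edges change"
creations_per_deletion = new_follows / unfollows      # "1 deletion for every 3 creations"
deletion_share = unfollows / (new_follows + unfollows)  # "about one quarter"

print(f"added {added_pct:.1f}%, removed {removed_pct:.1f}%, churn {churn_pct:.1f}%")
print(f"{creations_per_deletion:.1f} creations per deletion, "
      f"deletions are {100 * deletion_share:.1f}% of all edge events")
```

Under this assumption the rounded figures (6.6%, 2.3%, 8.9%, roughly 2.9 creations per deletion, and a deletion share of about 26%) agree with the values reported in the text.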
2.3 Information diffusion and follows/unfollows
Having observed the highly dynamic nature of the Twitter follower graph, we next focus on examining information diffusion mechanisms that might cause a user to follow someone, and also the mechanisms that may cause a user to unfollow one of her existing connections.

Figure 2 illustrates an example of a process by which a new follow edge is created when a user discovers another user through a retweet [1, 32]. Consider user i following user j that follows user k. User i might enjoy k’s tweets, but i does not know about k and is not following her. If j happens to retweet k’s post, then i gets exposed to it and thus learns about k’s existence. As a result of such exposure i might decide to follow k. Thus, as the tweet propagates through a network, users might decide to follow the tweet originator k, and this way the newly created links will point up the information diffusion cascade. In fact, in our dataset 21% of all new follows are formed by users who recently saw a retweet of the user they are newly following.

[Figure 3 plots: (a) New Follow Count versus Retweet Count, titled “New follows vs. retweets”; (b) Unfollow Count versus Tweet Count, titled “Lost follows vs. tweets”.]
Figure 3: (a) The number of retweets a user acquires against the number of new followers they gain, for users of fixed indegree of 1000-2000. Even when conditioning on the indegree of the user there is a relationship between the number of retweets and the number of new followers gained. (b) Number of unfollows as a function of tweeting activity. Tweeting about 10 times in a month minimizes the number of followers lost.

The process that causes unfollows is somewhat different and more local. Here, current followers of user k can decide to drop their connections.
For example, posting offensive tweets or suddenly increasing the frequency of tweets causes users to lose followers. This was explained by the fact that a follower who sees a particular user’s tweets dominating her timeline becomes annoyed, and then unfollows the user [20, 33]. With these intuitions in mind, we shall now investigate whether such phenomena indeed occur in the Twitter graph.

First, we examine how a user’s activity changes with her indegree. In Figure 1 we also plot the number of tweets and retweets per user as a function of her indegree. As was the case with the dynamics of follows and unfollows, the information posting activity also scales with a user’s indegree. This could partially be explained by the fact that active users who tweet frequently tend to have more followers, and thus high indegree users get retweeted more often.

However, even if we condition on a user’s indegree, we still observe a strong relationship between users’ information posting activity and the dynamics of network edges. Figure 3(a) plots the number of new followers as a function of the number of retweets of a user’s posts. Here we only consider users with indegree between 1000 and 2000. Even without the variation in indegree, there is a clear relationship between information diffusion and network dynamics. The more retweets a user receives, the more often new users are exposed to her tweets, and the more opportunities they have to follow her.

Similarly, Figure 3(b) shows the number of unfollows a user receives as a function of her tweeting activity. Interestingly, we observe a non-monotonic relationship where users who do not tweet enough, or users who tweet too much, tend to lose more followers. We find that for users of degree 1000-2000, tweeting about 10 times in a month minimizes the number of followers they lose.
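The conditioning behind Figure 3 (restrict to a narrow indegree band, then average an outcome per activity level) can be illustrated with a small sketch. This is not the authors' code. The function name, the tuple layout, and the choice of one logarithmic bin per decade are assumptions made for illustration, and the records in the example are synthetic.

```python
import math
from collections import defaultdict

def mean_outcome_by_activity(users, lo=1000, hi=2000):
    """Average an outcome per log10 activity bin for users in an indegree band.

    users: iterable of (indegree, activity_count, outcome_count) tuples, e.g.
    (followers at start of month, tweets posted, unfollows received).
    Returns {bin: mean outcome}; bin b holds activity in [10**b, 10**(b+1)),
    and bin -1 holds users with zero activity.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for indegree, activity, outcome in users:
        if not (lo <= indegree <= hi):
            continue  # condition on indegree, as in Figure 3
        b = -1 if activity == 0 else int(math.floor(math.log10(activity)))
        sums[b] += outcome
        counts[b] += 1
    return {b: sums[b] / counts[b] for b in sorted(counts)}

# Synthetic records, for illustration only.
example = [
    (1500, 0, 65), (1500, 1, 60), (1500, 2, 50),
    (1200, 10, 40), (1800, 12, 44), (1900, 500, 70),
    (500, 10, 5), (5000, 10, 500),  # outside the 1000-2000 band, ignored
]
print(mean_outcome_by_activity(example))
```

Fixing the indegree band removes the dominant effect of audience size, so any remaining trend across activity bins reflects posting behavior rather than the number of followers.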
As a cautionary note, this is not to say information flow drives all of the network dynamics. As we have shown in the previous section, users also experience a steady flow of follow/unfollow events. Thus, we can think of the graph as being in a steady state flux, and information flow then causes perturbations to this steady state.

2.4 Detecting bursts in the flow of followers
To gain intuition about temporal dynamics of Twitter user activity, in Figure 4 we plot the arrivals of new follows and unfollows per hour for several high degree users as a function of time. Also plotted against these arrival rates is how many times per hour the user is retweeted, as well as when they themselves tweet. Each plot represents a different individual user as they gain and lose followers, tweet, and are retweeted over the hours of the month. In all the plots we notice the presence of fluctuations with 24 hour periodicity. These fluctuations correspond to the daily activity cycles on Twitter and represent the steady state of the Twitter network.

We also observe interesting deviations. For example, in Figure 4(a), around hour 110 the user receives a large number of retweets, which is later followed by a large number of new followers. This is an example of an information diffusion event causing a perturbation in the new follower arrival rate. However, retweets do not always lead to new follows. For example, in Figure 4(b) a burst in the number of retweets occurs around hour 30, but this causes only a negligible increase in the new follower arrival rate. And to demonstrate that even users with no activity still gain and lose followers, Figure 4(c) shows a user who consistently receives new follows and unfollows at a constant rate, even though she does nothing.

Detecting follow perturbations. Next we focus our analysis on two cases when steady follower arrival and departure rates change suddenly and abruptly.
These abrupt changes are often the response to information diffusion. Our aim is to understand why some information diffusion events cause network changes while others do not. In order to do so, we first develop a method for identifying perturbations to the steady arrival of new follows and unfollows. To be more specific, we aim to identify periods of time in which a user receives more than a given threshold of follows/unfollows compared to what was expected historically.

The biggest challenge associated with identifying these perturbations, or bursts as they will henceforth be called, is the periodic fluctuations in the arrival rate across the hours of the day. To remove this periodicity, we initially employed traditional methods such as Fourier Transforms, but the abundance of noise in the arrivals as well as the bursts themselves proved to be problematic. Instead, we proceed as follows.

We treat the arrival of new follows and unfollows over the course of the month of each user as an independent time series: Let $x = \{x_1, x_2, \ldots, x_n\}$ be the number of new follows a user receives for each hour of the month. (For the sake of clarity, we will describe our method using only follows, but the exact same analysis is applied to all users’ unfollows independently.) We are interested in intervals of time in which the number of follows increases significantly more than expected, given the hour of day. Let $t_i$ represent the $i$th hour of the month, and let $f(t_i)$ be the difference between actual new follows and expected follows during $t_i$:

$$f(t_i) = x_i - E[x \mid h(t_i)] = x_i - \frac{\sum_{j:\ |t_i - t_j| \le 48,\ h(t_i) = h(t_j)} x_j \cdot w(t_i - t_j)}{\sum_{j:\ |t_i - t_j| \le 48,\ h(t_i) = h(t_j)} w(t_i - t_j)}$$

where $h(t)$ returns the hour of day in which time $t$ occurs, and $w(t)$ is an exponentially decaying weight function whose parameters are set using maximum likelihood.
Effectively, this is locally weighted regression, but only points at 24 hour periods are used to calculate the expected average.

The function $f(t_i)$ now represents how many new followers a user received compared to the expected amount for hour $t_i$. When $f(t_i)$ remains close to 0, this is considered to be the steady state behavior that most high degree users typically demonstrate. However, we consider a burst to occur at time $t_i$ if $f(t_i)$ is greater than two standard deviations.

It is worth noting that we also experimented with an alternative method for removing the periodicity in the follower dynamics. We employed the method proposed by Szabo and Huberman [29] where time is not counted in actual seconds but in the number of posts made on the entire site. While the method removed some periodicity, we still found follow activity to be correlated with the hour of day. We explain this by pointing out that most high degree users do have periodicity in the arrival of new followers, but this periodicity is out of sync with other users. For example, a user in San Francisco who is followed by primarily local users would likely have a different hour of the day when they receive the most new follows compared to a user in England. Therefore, for our dataset it is necessary to remove periodicity in a way that is independent for each user.

Burst co-occurrences. With a mechanism for detecting bursts, we now focus on instances in which bursts occur in close proximity to information diffusion events. Changes in the arrival of new follows to a given user often coincide with large retweet cascades generated by a tweet posted by the user. We apply the normalization process to a user’s arrival of new follows, unfollows, tweets, and retweets, all independently.
We call a retweet-follow burst any time a (2 standard deviation) burst in retweets occurs in one hour, and then a burst in follows occurs within the next hour. Additionally, tweeting too much (or more than the user normally does) can annoy a user’s followers and thus cause unfollows. Therefore, we also focus on tweet-unfollow bursts: a burst in tweets occurs within an hour of a burst in unfollows. As we shall show next, these two types of bursts are not only common, but they can explain many changes in the user’s local network.

2.5 User’s ego-network during a burst
Through the process described above, we examine all users with at least 2000 followers and identify 2.1 million instances where a retweet burst was immediately followed by a follow burst. We focus on high degree users because for low degree users, the arrival of new follows or unfollows is not frequent enough, so detecting sudden changes is not reliable. Our analysis focuses on the ego-network of a user: the subgraph composed of a user’s followers (excluding the user herself) and all the follower relationships between them. Through examining the properties of users’ ego-networks before and after the occurrence of the bursts, we show that bursts contribute to network evolution by advancing it in abrupt intervals.

The similarity of follower tweets. The first question we ask is how similar a user is to her followers, and how this similarity changes during a burst. More specifically, for a pair of users we want to quantify how similar their interests are in different types of information. To do this, we measure the textual similarity of their tweets. The more similar the tweets of users, the more similar the information that diffuses through them. For each user, we aggregate every tweet she posted during the month into a single “document.” We define the user tweet similarity as the cosine similarity of the TF-IDF weighted word vectors between the two users’ aggregated tweet documents. Although simple, this method provides a robust measure of similarity between a pair of users. We note that the aggregated tweet documents also contain retweets, but retweets only account for a small fraction of all tweets. Thus, a user’s tweet document is largely unaffected by any retweets they might have made.

[Figure 4 plots: Arrivals per Hour versus Time (hours); series: unfollows, follows, retweets, tweets; panels: (a) user with d_in = 266,842, (b) user with d_in = 218,045, (c) user with d_in = 112,988.]
Figure 4: The arrival of follows, unfollows, tweets, and retweets as a function of time for several high-degree users (d_in refers to the user’s indegree). Each plot represents a different individual user as she gains and loses followers, tweets, and is retweeted. (a) Around hour 110 the user receives a large number of retweets, which causes a large number of new followers. (b) A burst in retweets at hour 30 causes only a negligible increase in the new followers. (c) The user consistently receives new follows and unfollows without tweeting or being retweeted.

Using the tweet similarity of a user and their followers before and after a burst occurs, we investigate whether users’ followers become more or less similar in the content of their tweets. For each burst, we measured the average follower similarity at 24 hour intervals for multiple days before and after the burst. We need to average this metric across all users and bursts, so we normalize each measurement by its value exactly at the time of the burst. Thus, the metric is comparable across all users.
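The user tweet similarity and the burst-relative normalization described above can be sketched in a few lines. This is an illustration rather than the authors' pipeline. Whitespace tokenization and the smoothed IDF formula are assumptions, since the paper does not specify its tokenizer or TF-IDF variant. The function names and the example documents are hypothetical.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """docs: {user: aggregated tweet text}. Returns {user: {term: weight}}.

    Assumptions: lowercase whitespace tokens, raw term counts, and a smoothed
    IDF of log((1 + N) / (1 + df)) + 1.
    """
    tokens = {u: text.lower().split() for u, text in docs.items()}
    n_docs = len(tokens)
    df = Counter()
    for words in tokens.values():
        df.update(set(words))  # document frequency counts each term once per user
    vectors = {}
    for u, words in tokens.items():
        tf = Counter(words)
        vectors[u] = {
            t: c * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
            for t, c in tf.items()
        }
    return vectors

def cosine(a, b):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def relative_to_burst(values, burst_index):
    """Divide a daily series by its value at the burst, so the burst day equals 1."""
    return [v / values[burst_index] for v in values]

# Hypothetical aggregated tweet documents.
docs = {
    "a": "occupy wall street protest",
    "b": "occupy wall street march",
    "c": "football match tonight",
}
vecs = tfidf_vectors(docs)
print(cosine(vecs["a"], vecs["b"]), cosine(vecs["a"], vecs["c"]))
print(relative_to_burst([0.20, 0.25, 0.30], burst_index=1))
```

In practice a library implementation such as scikit-learn's `TfidfVectorizer` would likely replace the hand-written weighting. The dictionary version is used here only to keep the sketch free of dependencies.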
Figures 5(a), 5(b) show the results averaged across all bursts of the respective types. If the y-axis is 1, then the metric for that period of time had the same value as at the time of the burst. For both types of bursts, we observe a statistically significant increase in the user similarity, but for the retweet-follow bursts there is an abrupt increase.

Overall, we find that the follower tweet similarity increases abruptly during a retweet-follow burst and, to a lesser extent, during tweet-unfollow bursts. On average the follower tweet similarity increases over the course of the month, which implies that most users’ followers become more congruent to them. However, this rate of increase speeds up by 25.5% during a retweet-follow burst, shown in Table 1.

The reason for this acceleration of change is the nature of new followers gained during a burst versus the new followers that are not gained through information diffusion. New followers that a user gains through being retweeted have a 76.6% higher tweet similarity than new followers that never were exposed to a retweet. Additionally, the new followers gained through retweets are 109.5% more similar than pre-existing followers. This means that, during a retweet-follow burst, a user gains followers more similar to her than she would normally gain, and these followers are even more similar than the followers she already has.

On the other hand, for the case of tweet-unfollow bursts, the rate of increase of a user’s follower tweet similarity speeds up by only 11% (see Table 1). This increase, however, is caused by current followers with less similar tweets, who then unfollow the user. Indeed, the tweet similarity of a user who unfollows is 36.1% lower compared to a follower that endures the entire month.
To rule out spurious causes of the increase in tweet similarity, we conducted a separate randomized experiment where we ran the above analysis again, but this time for each actual follow/unfollow, we randomized the recipient of the action but preserved the source user. In other words, if the data contains the event “user A follows user B”, we would replace user B with another randomly selected user in the network. Here, the tweet similarity between users and their followers decreased significantly, which means that the increased similarity observed above is not spurious.

We conclude that during the retweet-follow and tweet-unfollow bursts, the similarity of interests of a user’s followers increases, and so the user’s network becomes more homogeneous. This means that bursts cause a sudden “jump” in the network’s evolution toward bringing similar users together and pushing dissimilar users farther apart.

Follower tweet coherence. To test the idea that followers become more related to one another (not just to the user) during a burst, we measure the similarity in tweets between the followers of a user. Using the same method of TF-IDF cosine similarity of tweet content, we measured the similarity across all pairs of followers of a given user in the days succeeding and preceding the burst. These measurements were normalized by their value during the burst, just as they were in the previous section. We refer to this as follower tweet coherence and plot the results in Figures 5(c) and 5(d). We observe that before the burst, the follower coherence steadily increases. But just as the follower tweet similarity suddenly increases during the burst, so does the follower tweet coherence. Both types of bursts cause the followers’ tweets to become more aligned with each other.

Connected components amongst followers. We also examine structural changes to a user’s local network neighborhood before and after a burst.
In particular, we study the number of weakly connected components (WCCs) of the ego-network during a burst. If the number of connected components is high, then the subgraph of followers is fragmented. This would indicate that a user's followers do not belong to a single cohesive community and tend not to follow each other.

We discover that bursts cause a large increase in the number of connected components in the network. Relative to the change in a user's ego-network WCCs over the entire month, a retweet-follow burst causes the arrival of WCCs to increase by 17.4 times during the burst, while a tweet-unfollow burst increases the arrival by 4.0 times (see Table 1). Furthermore, Figures 5(e) and 5(f) show the relative number of WCCs in the days preceding and succeeding bursts. In both cases, the number of connected components in the follower ego-network increases for several days after the burst. Thus, we conclude that bursts cause an influx of new followers from unaffiliated communities into a user's follower ego-network.

Followers following each other. Last, we analyze the follower ego-network edge density.
For a given set of followers, the metric represents what fraction of potential following relationships actually exist. If this value is low, then a user's followers tend not to follow each other, whereas a high value would indicate a well-connected ego-network. In general, the ego-edge density for followers decreases, largely because users' ego-networks grow in the number of nodes over time.

[Figure 5, panels (a), (b): Follower Tweet Similarity (Follower TF-IDF Similarity); panels (c), (d): Follower Tweet Coherence; panels (e), (f): Follower Weakly Connected Components; panels (g), (h): Follower Edge Density. The panels are arranged in two columns, one for tweet-unfollow bursts and one for retweet-follow bursts; the x-axis of every panel is days relative to the burst, from -4 to 4.]

Figure 5: Metrics of a user's follower ego-network around the time of a retweet-follow and a tweet-unfollow burst. For 4 days preceding and succeeding a burst, we plot the average value relative to the metric at the moment of the burst. For example, if at day = -2 the metric is 1.02, that means that on average 48 hours before the burst the metric was 2% greater than it was on the day of the burst. Overall, bursts tend to increase the coherence of the local network in both the interests of the connected users and the number of mutual connections between them.
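The two structural metrics used in this section, the number of weakly connected components among a user's followers and the follower edge density, can be sketched as follows. This is a minimal illustration under stated assumptions: the follower ego-network is given as a list of follower ids plus directed (follower, followee) pairs among them, self-loops are ignored, and density is taken as the fraction of the n(n-1) possible directed edges that exist. The function names are hypothetical.

```python
def weakly_connected_components(nodes, edges):
    """Count weakly connected components with union-find (edge direction ignored)."""
    parent = {n: n for n in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for u, v in edges:
        if u in parent and v in parent:
            parent[find(u)] = find(v)
    return len({find(n) for n in nodes})


def edge_density(nodes, edges):
    """Fraction of the n*(n-1) possible directed follow edges that are present."""
    node_set = set(nodes)
    n = len(node_set)
    if n < 2:
        return 0.0
    present = {(u, v) for u, v in edges
               if u in node_set and v in node_set and u != v}
    return len(present) / (n * (n - 1))


followers = ["a", "b", "c", "d"]
follows = [("a", "b"), ("b", "a"), ("c", "d")]
before = (weakly_connected_components(followers, follows),
          edge_density(followers, follows))
# A new follower "e" from an unaffiliated community adds a component
# and lowers the density, mirroring the effect described in the text.
after = (weakly_connected_components(followers + ["e"], follows),
         edge_density(followers + ["e"], follows))
```

In this toy example the arrival of a single unconnected follower raises the component count from 2 to 3 and lowers the density from 0.25 to 0.15.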
We find that during both retweet-follow bursts and tweet-unfollow bursts, the density decreases significantly faster than when no bursts occur. As shown in Table 1, the density decreases about 42 times faster for retweet-follow bursts and about 14.7 times faster for tweet-unfollow bursts when compared to the period of no bursts.

Metric             Δ during retweet-follow burst   Δ during tweet-unfollow burst
Tweet similarity   25.5%                           11.8%
Components         1737%                           399.8%
Edge density       -4199%                          -1467%

Table 1: How much does the rate of change of each metric accelerate/decelerate during a burst? Tweet similarity slowly increases over time, but this increase is accelerated by 25.5% on average when a user is undergoing a retweet-follow burst. The number of connected components also sharply increases. Edge density is slowly decreasing, but during a retweet-follow burst this rate of decrease is 4,199% faster.

Even more interesting is the change in ego-network density in the days around the burst. Figures 5(g) and 5(h) plot the relative change in density for the days before and after a burst. Before either type of burst, there is a steady decrease in density. For the days after the burst, however, we observe interesting behavior. For the retweet-follow burst, the density remains nearly constant, while for the tweet-unfollow burst, the density actually increases, effectively as fast as it decreased before the burst.

We explain the two observations as follows: during either type of burst, there is a large change in the population of users in the ego-network, which results in a decrease in edge density. In some sense, the burst is a shock to the structure of the ego-network. In the days after the burst, however, the ego-network begins to densify as followers begin to connect to each other. After a burst, tweet similarity of the ego-network increases (Figures 5(a), 5(b)).
As this happens, the followers themselves become more related to each other. With highly related users close to each other in the network, they are more likely to discover and follow each other, thereby increasing the density of the ego-network.

2.6 Tweet content and bursts

Now that we have a better understanding of the network effects of bursts, we address the content of the information causing the bursts. We ask whether certain types of content are more likely to cause a burst in new followers.

To study this question, we iterated across all instances of retweet bursts and extracted the text of the tweet creating the burst. For each token that occurred in these tweets, we measured whether the presence of the token increased or decreased the probability that the tweet would cause a burst in new followers. We filtered out tokens that were present in fewer than 10 tweets. We then identified all tokens that violated the null hypothesis of having no effect on new follow burst probability, using the Pearson $\chi^2$ test at the 95% confidence level. Note that all these tweets caused a burst of retweets, but only a fraction of them led to a burst in new followers. These tokens thus had a statistically significant effect on whether or not the tweet would cause a burst in new followers. We then ranked these tokens by the ratio

\[
R(tok_i) = \frac{\Pr(\text{new follower burst occurs} \mid tok_i \text{ in tweet})}{\Pr(\text{new follower burst occurs} \mid tok_i \text{ not in tweet})}
\]

for all tokens $tok_i$. This ratio quantifies how much the presence of a particular token within a tweet increases the chances of a follower burst.

Token            Prob. Ratio
officer          2.9082
officers         2.5901
#n30             2.5655
#occupyphilly    2.5599
LAPD             2.4942
solidarity       2.4847
eviction         2.4675
riot             2.2935
protestors       2.1301
@occupyla        2.1290
police           2.0845
#nov30           2.0406
arrest           2.0179
cops             1.9983
#ows             1.5879
protesters       1.5278

Table 2: Many of the top 100 tokens that had the largest relative increase in the probability of causing a follower burst were associated with the Occupy Wall Street movement.

All tweets in our dataset were created in November of 2011, at which time the "Occupy Wall Street" movement was less than two months old. Occupy Wall Street was a protest movement against income inequality, and on several occasions the protests led to conflicts with police. Of the top 100 tokens with the largest probability increase ratio, at least 16 are associated with this movement. These 16 tokens are listed in Table 2. Additionally, a tweet containing the top token "officer" is almost three times more likely to cause a new follow burst than a tweet that does not contain the token.

On the other hand, we find that there are two types of events that generally cause large unfollow bursts. As expected, the first is when tweets include content that is either offensive (such as obscenities) or spam. For example, tokens such as "free", "sale", and "download" all increase the probability of an unfollow burst. Interestingly, the second is when sports stars change teams. In our data we observed several instances in which a professional sportsman announced that he was switching teams, which in turn led to a huge shift in his follower base: supporters of his old team would unfollow, while supporters of the new team would create new follower links.

3. MODELING FOLLOWER BURSTS

So far, we have seen how information diffusion events such as large retweet cascades can cause sudden bursts in network dynamics.
To better understand the effect of bursts, we develop a model that not only explains these observations but can predict them as well. Our goal is therefore to develop a model that, given a set of retweet bursts, predicts which bursts will cause a spike of new followers. The model we present here does not consider the content of the tweets but still performs well in practice. The process that inspires our model is not applicable to bursts in tweets leading to bursts in unfollows.

Most new follows are the result of a triadic closure: if user $i$ creates a new follow edge to another user $k$, then often there exists at least one "intermediate" follower $j$ (as illustrated in Figure 2 (right)). In fact, across all new follows in our dataset, the average directed shortest path between the users just before the creation of the new follow edge is $2.036 \pm 0.007$. (If all new follows were the result of a triadic closure, the shortest path before the follow edge creation would be 2.) In this light, we can safely assume that all of a user's new follows come from users who already follow one of her followers (but do not already follow her directly). We refer to this set as a user's 2-hop neighborhood, and we call the user's current set of followers the 1-hop neighborhood. Now, in order to predict the occurrence of follow bursts, we need to model the probability of a user transitioning from the 2-hop neighborhood to the 1-hop neighborhood during a large information diffusion event.

[Figure 6 diagram labels: rarely retweet / often retweet (1-hop groups $L_1(i)$, $R_1(i)$); more compatible / less compatible and likely / unlikely to follow during bursts (2-hop groups $L_2(i)$, $R_2(i)$); central user $i$.]

Figure 6: An example of a user $i$ who would have a high probability of experiencing a burst. There is a subset of users in $N_2(i)$ who have a high tweet similarity with $i$ and thus would be more compatible followers if they were exposed to her tweets, but the users in $N_1(i)$ that they follow do not retweet, so this exposure has not happened. The burst would occur when this subset finally sees a retweet during a retweet burst and they subsequently follow $i$. On the other hand, the other users in $N_2(i)$ have most likely already seen $i$'s tweets, and so they are less likely to form a new follow during a burst.

A retweet-follow burst is defined as a sudden increase in new follows to a user. These new follows would not normally occur during steady-state behavior, but instead represent a set of potential followers discovering the user for the first time, thanks to an information diffusion event. Therefore, the key to predicting a retweet-follow burst is identifying a set of similar users who would not normally be exposed to the user but who respond to the burst in retweets.

Figure 6 illustrates the process. Consider the case in which user $i$ is experiencing a retweet burst because her tweet is being propagated through the network. There is a set $L_2(i)$ of 2-hop neighbors of $i$ that have very compatible interests with $i$. However, users in $L_2(i)$ only have access to $i$ through a set of 1-hop neighbors $L_1(i)$ who rarely retweet $i$. Thus, users in $L_2(i)$ are unlikely to know about $i$, as they have not been exposed to $i$. Users in $L_2(i)$ differ from other 2-hop neighbors (e.g., $R_2(i)$) that either have incompatible interests with $i$ or have already been exposed to $i$'s tweets but have chosen not to follow her. A burst of new follows then happens when users in $L_2(i)$ are exposed to $i$'s content for the first time (via a retweet cascade), and as a result they are likely to follow user $i$.
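The 1-hop and 2-hop neighborhoods defined above can be sketched in a few lines. This is only an illustration: it assumes the follow graph is available as a dictionary mapping each user to the set of users who follow her, and the name `followers_of` and the helper `hop_neighborhoods` are hypothetical.

```python
def hop_neighborhoods(followers_of, i):
    """Return (N1, N2) for user i.

    N1: users who follow i directly.
    N2: users who follow at least one member of N1 but do not follow i.
    """
    n1 = set(followers_of.get(i, ()))
    n2 = set()
    for j in n1:
        n2.update(followers_of.get(j, ()))
    n2 -= n1          # drop users who already follow i
    n2.discard(i)     # i herself may follow one of her followers
    return n1, n2


followers_of = {
    "i": {"j1", "j2"},
    "j1": {"k1", "j2"},
    "j2": {"k2", "i"},
}
n1, n2 = hop_neighborhoods(followers_of, "i")
```

Here `n1` is `{"j1", "j2"}` and `n2` is `{"k1", "k2"}`: `j2` is excluded from the 2-hop set because she already follows `i`, and `i` is excluded even though she follows `j2`.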
Next, we use this intuition to develop the follower burst prediction model, but first we must quantify what it means for two users to have compatible interests.

Tweet similarity drives probability of new follow. As discussed in the previous section, new follows tend to increase the average tweet similarity between a user and her followers, and this is especially the case during a retweet-follow burst. We take the tweet similarity between two users as a simple but effective measure of how compatible their interests are, and thus how compatible they would be as followers of each other. Thus, we expect the probability of user $i$ following user $k$ to be high if the newly created edge $i \to k$ would increase the overall tweet similarity of $k$'s followers.

[Figure 7 plot: x-axis Tweet Similarity (0 to 30); y-axis Distribution (log scale, $10^{-6}$ to $10^{0}$); legend $d_{in}$ = 4,170, $d_{in}$ = 4,344, $d_{in}$ = 6,075, $d_{in}$ = 2,582.]

Figure 7: The distribution of follower tweet similarity for several different users. The number of followers for each user is listed in the legend. Even for users with a comparable number of followers, the variability in the distribution of follower tweet similarity is significant.

Increasing the average follower tweet similarity does not have the same effect on all users. For example, Twitter user @cnnbrk (CNN Breaking News) might be followed by a wide range of users who simply want to stay informed about news events. On the other hand, followers of @espn (ESPN News) will likely have a narrower interest in sports. Therefore, a new follower of @cnnbrk can have a lower absolute tweet similarity and still increase the average tweet similarity of @cnnbrk's followers, whereas a new follower would have to be very interested in sports in order to increase the average tweet similarity of @espn's followers.
Figure 7 confirms this intuition by plotting the distribution of tweet similarities of followers for several different users. Notice the high variability in the distribution of follower tweet similarity across individual users. In order to reliably model the probability of a new follow, we now account for this variability.

Comparing similarity across all users. We take advantage of the fact that the distribution of follower tweet similarity follows a log-normal distribution for each user. As illustrated in Figure 7, the tweet similarity $S(i, j)$ between user $i$ and her follower $j$ follows a log-normal distribution:

\[
\ln[S(i, j)] \sim \mathcal{N}(\mu_i, \sigma_i^2).
\]

And so it follows for all users $i$ that

\[
Y_{ij} \equiv \frac{\ln[S(i, j)] - \mu_i}{\sigma_i} \sim \mathcal{N}(0, 1)
\]

with

\[
\mu_i = \frac{1}{|N_1(i)|} \sum_{k \in N_1(i)} \ln[S(i, k)], \qquad
\sigma_i^2 = \frac{1}{|N_1(i)|} \sum_{k \in N_1(i)} \left( \ln[S(i, k)] - \mu_i \right)^2,
\]

where the set $N_1(i)$ is the current 1-hop neighborhood of user $i$.

[Figure 8 plot: x-axis $Y_{ij}$ (-2.5 to 1.5); y-axis Prob. $j$ follows $i$ (0 to 0.0009); series: Fitted model, Empirical.]

Figure 8: The normalized log-tweet similarity between two users $i$ and $j$ plotted against the probability that $j$ will follow $i$, given that $i$ recently tweeted.

The value $Y_{ij}$ quantifies where in the distribution of follower tweet similarity user $j$ would be if she chose to follow user $i$. If $Y_{ij} = 0$, then $j$'s similarity to $i$ is equal to the average similarity of $i$'s followers. More generally, the sign and the magnitude of $Y_{ij}$ quantify how much more or less similar $j$ is compared to an "average" follower of $i$. This way $Y_{ij}$ is normalized and comparable across all users.

Finding the probability of a new follow as a function of $Y_{ij}$ is now a matter of empirical observation. Figure 8 plots $Y_{ij}$ against the average probability of $j$ following $i$ within 3 days of $i$ tweeting.
We observe (and a likelihood test confirms) that $P(j \text{ follows } i \mid Y_{ij})$ is an exponential function of $Y_{ij}$. Plotted along with the empirical observation of $P(j \to i \mid Y_{ij})$ is the fitted curve:

\[
\hat{P}_{j,i} \equiv C \cdot \exp(\alpha \cdot Y_{ij}) = C \cdot \left( \frac{S(i, j)}{\exp(\mu_i)} \right)^{\alpha / \sigma_i} \qquad (1)
\]

where $C = 3.32 \times 10^{-4}$ and $\alpha = 0.6445$ were set using maximum likelihood.

The above result says that the probability of a user moving from the 2-hop neighborhood to the 1-hop neighborhood (i.e., forming a new follow edge) increases exponentially with the normalized tweet similarity $Y_{ij}$. Moreover, the result also explains why a user's average follower tweet similarity tends to increase over time: a potential new follower $j$ who is more similar to $i$ than $i$'s current followers is almost an order of magnitude more likely to follow $i$ than someone who is dissimilar.

Predicting bursts. With $\hat{P}_{j,i}$ given by Equation (1), a possible way to predict the total number of new follows a user receives during an information diffusion event would be to compute the expected number of new follows:

\[
\sum_{j \in N_2(i)} \hat{P}_{j,i},
\]

where $N_2(i)$ is the 2-hop neighborhood of user $i$. This quantity, however, is a poor predictor of follow bursts. In fact, the larger it is, the less likely a given retweet burst will produce a new follow burst. The reason is that $\hat{P}_{j,i}$ is the probability of a new follow that is not conditioned on the occurrence of a retweet burst. It quantifies the arrival of new follows during steady-state, non-burst behavior. If, for example, $\hat{P}_{j,i}$ is high and a retweet burst occurs, there is a high chance that user $j$ has already made the decision whether or not to follow $i$. A retweet-follow burst, on the other hand, is the interruption of steady-state network dynamics. A retweet-follow burst occurs when users in $N_2(i)$ with high tweet compatibility to $i$ are exposed to $i$'s tweet for the first time.
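The normalization $Y_{ij}$ and the fitted curve of Equation (1) can be sketched as follows. The constants are the fitted values reported in the text; everything else (function names, the list-of-similarities input, and the requirement that similarities be strictly positive so that the logarithm is defined) is an assumption made for illustration.

```python
import math

C = 3.32e-4      # fitted constant reported in the text
ALPHA = 0.6445   # fitted exponent reported in the text


def log_similarity_stats(follower_sims):
    """Mean and standard deviation of ln S(i, k) over i's current followers.

    Assumes every similarity is strictly positive.
    """
    logs = [math.log(s) for s in follower_sims]
    mu = sum(logs) / len(logs)
    var = sum((x - mu) ** 2 for x in logs) / len(logs)
    return mu, math.sqrt(var)


def normalized_similarity(s_ij, mu_i, sigma_i):
    """Y_ij: where candidate j falls in i's follower-similarity distribution."""
    return (math.log(s_ij) - mu_i) / sigma_i


def follow_probability(y_ij, c=C, alpha=ALPHA):
    """Steady-state new-follow probability from Equation (1)."""
    return c * math.exp(alpha * y_ij)


# Toy follower similarities chosen so that mu = 0 and sigma = 1.
mu, sigma = log_similarity_stats([math.exp(-1.0), math.exp(1.0)])
s_candidate = math.exp(1.0)
y = normalized_similarity(s_candidate, mu, sigma)
p = follow_probability(y)
# Second form of Equation (1): C * (S / exp(mu)) ** (alpha / sigma).
p_alt = C * (s_candidate / math.exp(mu)) ** (ALPHA / sigma)
```

Both forms of Equation (1) agree up to floating-point error, and a candidate with $Y_{ij} = 0$ gets probability exactly $C$.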
A retweet-follow burst occurs when a sudden burst in retweets reaches a set of potential followers that are more compatible with the user than the typical users who are usually reached. The more compatible the set of users reached during the retweet burst, and the less compatible the set of users that are normally reached, the more likely a burst is to occur. Let $N_{RT}(i, [t, t + \Delta t))$ be the set of users in the 2-hop neighborhood who follow someone that retweeted user $i$ during the time interval $[t, t + \Delta t)$. In other words, users in $N_{RT}(i, [t, t + \Delta t))$ have just been exposed to a retweet of $i$'s tweet. Now, for a given retweet burst that occurs between times $t_0$ and $t_1$, we compute:

\[
P(\text{follow burst for } i \mid \text{retweet burst} \in [t_0, t_1)) \sim
\frac{\sum_{j \in N_{RT}(i, [t_0, t_1))} \hat{P}_{j,i}}{\sum_{j \in N_2(i)} \hat{P}_{j,i}}. \qquad (2)
\]

The above expression simply quantifies the relative fraction of new-follow probability $\hat{P}_{j,i}$ among $i$'s 2-hop neighbors that got exposed to $i$'s retweet.

Now, we can test how well Equation (2) predicts the occurrence of follow bursts as a means of validating our analysis. In addition to validation, predicting the appearance of new follow bursts for a given information diffusion event is potentially very useful. As gaining more followers is an objective of many Twitter users, Equation (2) can identify where undiscovered potential followers are in the network, and which of the user's current followers need to retweet her in order for her to obtain new followers.

3.1 Testing the model

To quantify how successful the model is at predicting when a retweet burst will cause a follow burst, we devised a simple experiment. We randomly selected 400K retweet bursts, 21% of which were followed immediately by a new follow burst for the user. We made sure that these samples did not overlap with the ones used to fit Equation (1).
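Equation (2) reduces to a ratio of two sums over the steady-state probabilities $\hat{P}_{j,i}$, which the following sketch makes concrete. It assumes the probabilities for user $i$ are already available as a dictionary keyed by 2-hop neighbor; the names `p_hat`, `exposed`, and `burst_score` are hypothetical, and returning 0 when the denominator is zero is a choice made here, not something stated in the text.

```python
def burst_score(p_hat, exposed, two_hop):
    """Share of total new-follow probability held by the exposed 2-hop neighbors.

    p_hat:   dict mapping each 2-hop neighbor j to its steady-state probability
    exposed: users who follow someone that retweeted i during the burst
    two_hop: the 2-hop neighborhood of i
    """
    total = sum(p_hat[j] for j in two_hop)
    if total == 0:
        return 0.0
    # Only exposed users that belong to the 2-hop neighborhood count.
    return sum(p_hat[j] for j in exposed if j in two_hop) / total


p_hat = {"a": 0.001, "b": 0.0002, "c": 0.0002, "d": 0.0006}
two_hop = set(p_hat)
score_compatible = burst_score(p_hat, {"a"}, two_hop)        # one highly compatible user
score_incompatible = burst_score(p_hat, {"b", "c"}, two_hop)  # two less compatible users
```

In this toy example, exposing the single most compatible neighbor yields a score of 0.5, while exposing two less compatible neighbors yields 0.2, so the first retweet burst would be ranked as more likely to trigger a follow burst.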
Then Equation (2) was used to rank the retweet bursts in order of most likely to be succeeded by a new follow burst. The highest ranked burst was considered our "first guess" to be a retweet-follow burst, the second highest ranked burst was our second guess, and so on. This sequence produced a precision-recall curve, and we calculated the area under the precision-recall curve (AUC) as a measure of the performance of the model. If the model ranks all retweet-follow bursts as most likely, then AUC = 1.

Baselines. We compared the model's performance against a series of baselines. For each baseline, we used a different property of the retweet burst or a property of the user who is experiencing the burst. Each baseline provides a method of ranking the most likely new follow bursts. For each baseline as well as for our model we compute the area under the precision-recall curve (AUC). We consider the following baselines:

• Number of retweet exposures: If a user follows someone who retweeted the user as part of the retweet burst, then they have been exposed to the user's tweets. The retweet bursts are ranked according to the number of such 2-hop neighbors. The more 2-hop neighbors that are exposed to the user's tweets, the more opportunities for new follows to occur. As such, this is expected to be the most powerful baseline.
• Number of retweets: The retweet bursts are ranked according to the raw number of retweets the user received during the burst.
• Number of followers: The retweet bursts of users with the largest number of followers before the burst are ranked first.
• Random: The retweet bursts are sorted randomly.

Ranking Method                 AUC
Our Model                      0.518611
Number of retweet exposures    0.3818
Number of retweets             0.3340
Number of followers            0.2213
Random                         0.2118

Table 3: Predicting which retweet bursts will cause a subsequent burst in new followers. Each retweet burst is ranked according to its assigned probability of becoming a retweet-follow burst, and the area under the precision-recall curve of the resulting list is calculated. Our model outperforms all the baselines by a significant margin.

3.2 Results and observations

The results of the experiment are shown in Table 3. Our model outperforms each of the baselines by a significant margin, with an AUC score of 0.519 compared to the best baseline score of 0.382. Ranking based on the number of retweet exposures performed the best out of the baselines, while using the number of followers the user has before the burst did only marginally better than random.

User visibility is less important. The poor performance of the follower baseline is surprising. In [28], for example, the authors found that a user's follower count is highly predictive of future follows. This may be true in terms of the raw number of new follows a user receives; as we have shown, the number of new follows scales proportionally with increasing follower count. However, in terms of predicting bursts in new follow arrivals, the current follower count does not perform well. In some sense, highly retweeted high-degree users are less primed to experience a new follower burst. This is because a large number of potentially compatible followers has already been exposed to the user's tweets. This implies that lower-degree users are in fact more susceptible to new follower bursts. In other words, anybody could potentially experience a burst in new follow arrivals if the circumstances permit.

Retweets are not enough. Using retweets to predict follow bursts is more useful than using the indegree of a user. It is not enough to have a high degree; rather, a user must also participate in information diffusion events in order to experience a new follow burst.
The improvement of the retweet baseline is also partially due to the fact that the intensity of retweet bursts tends to correlate with the subsequent change in new follow arrivals. Figure 9 illustrates the phenomenon by correlating the magnitude of the retweet burst (measured in standard deviations from the expected number of retweets) with the magnitude of the follower burst. We observe that the more intense the retweet burst a user experiences, the more intense the resulting follower burst.

[Figure 9 plot: x-axis Standard Dev. of Retweet (0 to 7); y-axis Standard Dev. of New Follow (0 to 2.2).]

Figure 9: We iterated across all high-degree users, across every hour of the month in which a single retweet occurred. The number of standard deviations from the expected value for retweets is plotted against the standard deviations for new follows during the next hour.

Despite these correlations, the number of retweets experienced during the burst is still not as successful an indicator as the number of retweet exposures. In other words, while being retweeted frequently might lead to a burst in new follows, it is far more effective to be retweeted by users with large followings themselves.

The main take-away from the success of our model is what conditions are most ideal for the occurrence of a retweet-follow burst. A user is most likely to experience a burst when her tweets are exposed to a large set of users who are compatible followers (in terms of tweet similarity) but are only discovering her for the first time. Users who are retweeted regularly tend not to show bursty behavior but instead enjoy a steady-state arrival of new follows. This also means that the users who are most poised for new follow bursts are ones who have a large portion of nearby similar users that are yet to be discovered.

4. RELATED WORK

There have been many works that focus exclusively on the dynamics of networks.
Some of these works have modeled various aspects of network evolution over time [2, 18, 22]. More recently, research has focused on predicting local changes in the network, such as the addition of specific edges between nodes [3, 14]. In [20, 33], the authors predicted the deletion of edges between users. In the prediction of both edge creation and deletion, many features were found to be useful, including information diffusion-based features.

Similarly, the diffusion of content through social networks and media continues to be an active field of research [10, 11, 24]. Many works have focused on the different factors that affect the spread of information, including the network structure [16, 27, 31], influence between members of the social network [4, 5, 7], out-of-network influences such as other forms of media [9, 26], competing pieces of information [15, 25], and the nature of the content itself [6, 13, 19, 30].

Recently, there have been several works that address the effect of information diffusion on network dynamics, and we consider these works to be most closely related to our own. In [32], the authors used a mixture of different edge creation null models to find that information diffusion motivates about 12% of all new edges formed. Also, [28] used autoregression to show how various network and information diffusion properties were correlated in time. In [1], the authors predicted instances of retweet events leading to the formation of new edges. Our work, however, focuses on bursts of new follows and unfollows. We observe that the arrival of new follows as well as unfollows is steady for many users, even those with no tweeting behavior. Moreover, we identify that the interplay between information diffusion and network dynamics lies in the sudden interruption of these steady arrivals.

5. CONCLUSION

In this paper we explored how bursts in network evolution can be a result of information spreading through the network. We observed burst-like behavior in the network dynamics, as users receive sudden influxes of edge creation or deletion events. We established that such bursts are caused by large information diffusion events. Lastly, we developed a model that not only predicts with a high level of accuracy whether a diffusion event will trigger a spike in graph dynamics, but also provides insight into what causes the occurrence of spikes.

Potential avenues of future work include a more detailed study of unfollow spikes as well as examining which aspects of user compatibility are important in creating bursts of new followers. Further analysis of the content and the effect of different topics on network dynamics has the potential to lead to new insights. Additionally, extending the model to incorporate the content of each tweet would also be interesting.

Acknowledgements. This research has been supported in part by National Science Foundation grants CNS-1010921, IIS-1016909, IIS-1149837, IIS-1159679, as well as DARPA SMISC, Albert Yu & Mary Bechmann Foundation, Boeing, Allyes, Samsung, Intel, Alfred P. Sloan Fellowship and the Microsoft Faculty Fellowship. We would also like to thank Twitter, Inc. for giving us access to the data.

6. REFERENCES

[1] D. Antoniades and C. Dovrolis. Co-evolutionary dynamics in social networks: A case study of Twitter. http://arxiv.org/abs/1309.6001, 2013.
[2] L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD, 2006.
[3] L. Backstrom and J. Leskovec. Supervised random walks: predicting and recommending links in social networks. In WSDM, 2011.
[4] E. Bakshy, J. M. Hofman, W. A. Mason, and D. J. Watts. Everyone's an influencer: quantifying influence on twitter. In WSDM, 2011.
[5] E. Bakshy, B. Karrer, and L. A. Adamic. Social influence and the diffusion of user-created content. In CEC, 2009.
[6] J. Berger and K. Milkman. Social transmission, emotion, and the virality of online content. http://dx.doi.org/10.2139/ssrn.1528077, 2009.
[7] M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi. Measuring user influence in twitter: The million follower fallacy. In ICWSM, 2010.
[8] J. Cheng, L. Adamic, D. Alex, J. Kleinberg, and J. Leskovec. Can cascades be predicted? In WWW, 2014.
[9] R. Crane and D. Sornette. Robust dynamic classes revealed by measuring the response function of a social system. PNAS, 2008.
[10] P. A. Dow, L. A. Adamic, and A. Friggeri. The anatomy of large facebook cascades. In ICWSM, 2013.
[11] S. Goel, D. J. Watts, and D. G. Goldstein. The structure of online diffusion networks. In EC, 2012.
[12] D. Gruhl, R. V. Guha, D. Liben-Nowell, and A. Tomkins. Information diffusion through blogspace. In WWW, 2004.
[13] L. Hong, O. Dan, and B. D. Davison. Predicting popular messages in twitter. In WWW, 2011.
[14] C. Hutto, S. Yardi, and E. Gilbert. A longitudinal study of follow predictors on twitter. In CHI, 2013.
[15] S. Judd, M. Kearns, and Y. Vorobeychik. Behavioral dynamics and influence in networked coloring and consensus. PNAS, 2010.
[16] D. Kempe, J. M. Kleinberg, and E. Tardos. Maximizing the spread of influence through a social network. In KDD, 2003.
[17] R. Kumar, M. Mahdian, and M. McGlohon. Dynamics of conversations. In KDD, 2010.
[18] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In P. S. Yu, J. Han, and C. Faloutsos, editors, Link Mining: Models, Algorithms, and Applications, pages 337–357. Springer New York, 2010.
[19] N. Kunegis, T. Gottron, J. Kunegis, and A. C. Alhadi. Bad news travel fast: A content-based analysis of interestingness on twitter. In WebSci, 2011.
[20] H. Kwak, S. Moon, and W. Lee. More of a receiver than a giver: Why do people unfollow in twitter? In ICWSM, 2012.
[21] J. Leskovec, L. Adamic, and B. Huberman. The dynamics of viral marketing. ACM Transactions on the Web, 1(1), May 2007.
[22] J. Leskovec, L. Backstrom, R. Kumar, and A. Tomkins. Microscopic evolution of social networks. In KDD, 2008.
[23] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In KDD, 2005.
[24] J. Leskovec, M. McGlohon, C. Faloutsos, N. Glance, and M. Hurst. Cascading behavior in large blog graphs. In SDM, 2007.
[25] S. A. Myers and J. Leskovec. Clash of the contagions: Cooperation and competition in information diffusion. In ICDM, 2012.
[26] S. A. Myers, C. Zhu, and J. Leskovec. Information diffusion and external influence in networks. In KDD, 2012.
[27] D. M. Romero, C. Tan, and J. Kleinberg. On the interplay between social and topical structure. In ICWSM, 2013.
[28] P. Singer, C. Wagner, and M. Strohmaier. Factors influencing the co-evolution of social and content networks in online social media. In Modeling and Mining Ubiquitous Social Media. 2012.
[29] G. Szabo and B. A. Huberman. Predicting the popularity of online content. Communications of the ACM, 53(8):80–88, 2010.
[30] O. Tsur and A. Rappoport. What's in a hashtag?: content based prediction of the spread of ideas in microblogging communities. In WSDM, 2012.
[31] J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural diversity in social contagion. PNAS, 2012.
[32] L. Weng, J. Ratkiewicz, N. Perra, B. Gonçalves, C. Castillo, F. Bonchi, R. Schifanella, F. Menczer, and A. Flammini. The role of information diffusion in the evolution of social networks. In KDD, 2013.
[33] B. Xu, Y. Huang, H. Kwak, and N. Contractor. Structures of broken ties: exploring unfollow behavior on twitter. In CSCW, 2013.