Smart broadcasting: Do you want to be seen?

Mohammad Reza Karimi∗¹, Erfan Tavakoli∗¹, Mehrdad Farajtabar², Le Song², and Manuel Gomez-Rodriguez³

¹ Sharif University, mkarimi@ce.sharif.edu, erfan.tavakoli71@gmail.com
² Georgia Institute of Technology, mehrdad@gatech.edu, lsong@cc.gatech.edu
³ Max Planck Institute for Software Systems, manuelgr@mpi-sws.org

Abstract

Many users in online social networks are constantly trying to gain attention from their followers by broadcasting posts to them. These broadcasters are likely to gain greater attention if their posts remain visible for a longer period of time among their followers' most recent feeds. When, then, should they post? In this paper, we study the problem of smart broadcasting using the framework of temporal point processes, where we model users' feeds and posts as discrete events occurring in continuous time. Under such a continuous-time model, choosing a broadcasting strategy for a user becomes a problem of designing the conditional intensity of her posting events. We derive a novel formula which links this conditional intensity with the "visibility" of the user in her followers' feeds. Furthermore, by exploiting this formula, we develop an efficient convex optimization framework for the "when-to-post" problem. Our method can find broadcasting strategies that reach a desired "visibility" level with provable guarantees. We experimented with data gathered from Twitter, and show that our framework can consistently make broadcasters' posts more visible than alternatives.

1 Introduction

The popularization of social media and online social networking has empowered political parties, small and large corporations, celebrities, as well as ordinary people, with a platform to build, reach and broadcast information to their own audience.
For example, political leaders use social media to present their character and personalize their message in hopes of tapping younger voters¹; corporations increasingly rely on social media for a variety of tasks, from brand awareness to marketing and customer service [7]; celebrities leverage social media to bring awareness to themselves and strengthen their fans' loyalty²; and ordinary people post about their lives and express their opinions to gain recognition from a mix of close friends and acquaintances³. However, social media users often follow hundreds of broadcasters, and they often receive information at a rate far higher than their cognitive abilities to process it [13]. This also means that many broadcasters share a sizable portion of their followers, and they are constantly competing for attention from these followers. In this context, these followers' attention becomes a scarce commodity of great value [8], and broadcasters would like to consume a good share of it so that their posted contents are noticed and possibly liked or shared. As a consequence, there are myriads of articles and blog entries about the best times to broadcast information in social media and social networking, as well as data analytics tools to find these times⁴ ⁵. However, the best time to post on social media depends on a variety of factors, often specific to the broadcaster in question, such as their followers' daily and weekly behavior patterns, their location or timezone, and the number of broadcasters and volume of information competing for their attention in these followers' feeds (be it in the form of a Twitter user's timeline, a Facebook user's wall or an Instagram user's feed). Therefore, the problem of finding the best times to broadcast messages and elicit attention (be it views, likes or shares), in short, the when-to-post problem, requires careful reasoning and smart algorithms, which have been largely nonexistent until very recently [19].

∗ Authors contributed equally. This work was done during the authors' internships at the Max Planck Institute for Software Systems.
¹ http://www.nytimes.com/2012/10/08/technology/campaigns-use-social-media-to-lure-younger-voters.html
² http://www.wsj.com/articles/what-celebrities-can-teach-companies-about-social-media-1444788220
³ http://www.pewinternet.org/topics/social-networking/
⁴ http://www.huffingtonpost.com/catriona-pollard/the-best-times-to-post-on_b_6990376.html
⁵ http://blog.klout.com/2015/07/whens-the-best-time-to-post-on-social/

In this paper, we develop a novel framework for the when-to-post problem, where we measure the gained attention or visibility of a broadcaster as the time during which at least one post from her is among the $k$ most recent stories received in her followers' feeds. A desirable property of this time-based visibility measure is that it is easy to estimate from real data. In order to measure the visibility achieved by a particular deployed broadcasting strategy, one only needs to use a separate held-out set of the followers' feeds, independently of the broadcasted content. This is in contrast to other measures based on, e.g., the number of likes or shares caused by a broadcasting strategy.
These latter measures are difficult to estimate from real data and often require actual interventions, since they depend on other confounding factors such as the follower's reaction to the post content [6], whose effect is difficult to model accurately [5].

More specifically, we model users' feeds and posts as discrete events occurring in continuous time using the framework of temporal point processes. Our model explicitly characterizes the continuous time interval between posts by means of conditional intensity functions [1]. Under such a continuous-time model, choosing a strategy for a broadcaster becomes a problem of designing the conditional intensity of her posting events. We derive a novel formula which links the conditional intensity of an arbitrary broadcaster with her visibility in her followers' feeds. Interestingly, we can show that the average visibility is concave in the space of (piecewise) smooth intensity functions. Based on this result, we propose a convex optimization framework to address a diverse range of visibility shaping tasks given budget constraints. Our framework allows us to conduct fine-grained control of a broadcaster's visibility across her followers. For instance, our framework can steer the visibility in such a way that some time intervals are favored over others, e.g., times when the broadcaster's followers are online. In addition to the novel framework, we develop an efficient gradient-based optimization algorithm, which allows us to find optimal broadcast intensities for a variety of visibility shaping tasks in a matter of milliseconds. Finally, we experimented on a large real-world dataset gathered from Twitter, and show that our framework can consistently make broadcasters' posts more visible than alternatives.

Related work. The work most closely related to ours is by Spasojevic et al. [19], who introduced the when-to-post problem.
In their work, they first perform an empirical study on the best times to post in Twitter and Facebook by analyzing more than a billion messages and responses. Then, they design several heuristics to (independently) pinpoint the times that elicited the greatest number of responses in a training set and then show that these times also lead to more responses in a held-out set. In our work, we measure attention by means of visibility, a measure that is not confounded with the message content and can be accurately evaluated on a held-out set, and then develop a convex optimization framework to design complete broadcasting strategies that are provably optimal.

There has been an increasing number of empirical studies on understanding attention and information overload on social and information networks [2, 14, 16, 13]. The common theme is to investigate whether there is a limit on the number of ties (e.g., friends, followees or phone contacts) people can maintain, how people distribute attention across them, and how attention influences the propagation of information. In contrast, in this work, we focus on optimizing a social media user's broadcasting strategy to capture the greatest attention from their followers.

Our work also relates to the influence maximization problem, extensively studied in recent years [18, 15, 4, 10], which aims to find a set of nodes in a social network whose initial adoption of a certain idea or product can trigger the largest expected number of follow-ups. In this line of work, the goal is to find these influential users, not the best times for these users to broadcast their messages, which is our goal here. Only very recently, Farajtabar et al. [11] have developed a convex optimization framework to find broadcasting strategies; however, their focus is on steering the overall activity in the network to a certain state by incentivizing a few influential users. In contrast, we focus on maximizing visibility as measured on a broadcaster's audience's feeds.

Finally, the framework of temporal point processes, which our work builds upon, has been increasingly used to model a wide range of phenomena in social media and social networking sites, from social influence [11] and network evolution [12] to opinion dynamics [9] and product competition [20].

2 Background on Point Processes

A temporal point process is a stochastic process whose realization consists of a list of discrete events localized in time, $\{t_i\}$ with $t_i \in \mathbb{R}^+$ and $i \in \mathbb{Z}^+$. Many different types of data produced in online social networks can be represented as temporal point processes, such as the times of tweets, retweets or likes in Twitter. A temporal point process can be equivalently represented as a counting process, $N(t)$, which records the number of events before time $t$. Then, in an infinitesimally small time window $dt$ around time $t$, the number of observed events is

$$dN(t) = \sum_{t_i \in \mathcal{H}(t)} \delta(t - t_i)\, dt, \tag{1}$$

and hence $N(t) = \int_0^t dN(s)$, where $\delta(t)$ is a Dirac delta function. It is often assumed that only one event can happen in a small window of size $dt$, and hence $dN(t) \in \{0, 1\}$.

An important way to characterize temporal point processes is via the intensity function, the stochastic model for the time of the next event given all the times of previous events. The intensity function $\lambda(t)$ (intensity, for short) is the probability per unit of time of observing an event in a small window $[t, t + dt)$, i.e.,

$$\lambda(t)\, dt = \mathbb{P}\{\text{event in } [t, t + dt)\}. \tag{2}$$

Based on the intensity, one can obtain the expected number of events in the windows $[t, t + dt)$ and $[0, t)$, respectively, as

$$\mathbb{E}[dN(t)] = \lambda(t)\, dt \quad \text{and} \quad \mathbb{E}[N(t)] = \int_0^t \lambda(\tau)\, d\tau. \tag{3}$$

There is a wide variety of functional forms for the intensity $\lambda(t)$ in the growing literature on social activity modeling using point processes, which are often designed to capture the phenomena of interest. For example, retweets have been modeled using multidimensional Hawkes processes [11, 22], new network links have been predicted using survival processes [21, 12], and daily and weekly variations in message broadcasting intensities have been captured using inhomogeneous Poisson processes [17]. In this work, since we are interested in optimizing message broadcasting intensities, we use inhomogeneous Poisson processes, whose intensity is a time-varying function $\lambda(t) = g(t) \geq 0$.

3 From Intensities to Visibility

In this section, we present our model for the posting times of broadcasters and the feed story arrival times of followers using point processes parameterized by intensity functions. Based on these models, we then define our visibility measure, and derive a novel link between the visibility measure and the intensity functions of a broadcaster and her followers.

Representation of broadcast and feed. Given a directed social network $\mathcal{G} = (\mathcal{V}, \mathcal{E})$ with $m = |\mathcal{V}|$ users, we assume that each user can be both a broadcaster and a follower. Then, we use two sets of counting processes to model each user's activity, the first set for the user's broadcasting activity, and the second set for the user's feed activity.
More specifically, we represent the broadcasting times of the users as a set of counting processes denoted by a vector $\mathbf{N}(t)$, in which the $u$-th dimension, $N_u(t) \in \{0\} \cup \mathbb{Z}^+$, counts the number of messages user $u$ broadcast up to but not including time $t$. Then, we can characterize the message rate of these users using their corresponding intensities

$$\mathbb{E}[d\mathbf{N}(t)] = \boldsymbol{\lambda}(t)\, dt. \tag{4}$$

Furthermore, given the adjacency matrix $A \in \{0, 1\}^{m \times m}$ corresponding to the social network $\mathcal{G}$, where $A_{uv} = 1$ indicates that $v$ follows $u$, and $A_{uv} = 0$ otherwise, we can represent the feed story arrival times of the users as a sum of the set of broadcasting counting processes. That is,

$$\mathbf{M}(t) = A^T \mathbf{N}(t), \tag{5}$$

which essentially aggregates, for each user, the counting processes of the broadcasters followed by this user. Then, we can characterize the feed rates using intensity functions

$$\mathbb{E}[d\mathbf{M}(t)] = \boldsymbol{\gamma}(t)\, dt, \tag{6}$$

where $\boldsymbol{\gamma}(t) := A^T \boldsymbol{\lambda}(t) = (\gamma_1, \dots, \gamma_m)^T$. Finally, from the perspective of a pair of broadcaster (or user) $u$ and her follower $v$, it is useful to define the feed rate of $v$ due to other broadcasters (or users) followed by $v$ as

$$\gamma_{v \setminus u}(t) := \gamma_v(t) - \lambda_u(t), \tag{7}$$

where we assume $\gamma_{v \setminus u}(t) := 0$ if $v$ does not follow $u$, i.e., $A_{uv} = 0$.

Definition of Visibility. Consider a broadcaster $u$ and her follower $v$, and note that $v$ may follow many broadcasters other than $u$. Thus, at any time $t$, user $v$ may see stories originating from multiple broadcasters. We can model the times and origins of all the stories present in $v$'s current feed as a first-in-first-out (FIFO) queue⁶ of pairs

$$\mathcal{H}_v(t) := \left\{ (t^{(i)}, u^{(i)}) : t > t^{(1)} > \dots > t^{(I-1)} > t^{(I)},\ u^{(i)} \in \mathcal{N}^-(v) \right\},$$

where $\cdot^{(i)}$ denotes the $i$-th element in the queue, $t^{(i)}$ is the time when $v$ receives a story from broadcaster $u^{(i)}$, $\mathcal{N}^-(v)$ denotes the set of broadcasters followed by $v$, and $I$ is the length of the queue.
The length $I$ accounts for the fact that online social platforms typically set a maximum number of stories that can be displayed in the feed, e.g., currently Twitter has $I = 20$. The FIFO queue models the fact that when a new story arrives, the oldest story, $(t^{(I)}, u^{(I)})$, at the bottom of the feed is removed, the remaining stories are shifted down by one slot, i.e., $i + 1 \leftarrow i,\ \forall i = 1, \dots, I - 1$, and the newly arrived story is placed at the beginning of the queue as $t^{(1)}$ and appears at the top of the feed. For simplicity, we assume that the queue is always full at the time of modeling.

In the list $\mathcal{H}_v(t)$, we keep track of the rank $r_{uv}(t)$ of the most recent story posted by the broadcaster $u$ among all the stories received by user $v$ by time $t$, i.e.,

$$r_{uv}(t) = \min\left\{ i : u^{(i)} = u \right\}. \tag{8}$$

Then, given an observation time window $[0, T]$ and a deterministic sequence of broadcasting events, we can define the deterministic visibility of broadcaster $u$ at $k$ with respect to follower $v$ as

$$\mathcal{T}_{uv}(k) := \int_0^T \mathbb{I}[r_{uv}(t) \leq k]\, dt, \tag{9}$$

which is the amount of time during which at least one story from broadcaster $u$ is among the $k$ most recent stories in user $v$'s feed.

⁶ In this work, we assume the social network sorts stories in each user's feed in inverse chronological order.

Since the sequence of broadcasting events is generated from stochastic processes, we consider the expected value of $\mathcal{T}_{uv}(k)$ instead. If we first denote the probability that at least one story from broadcaster $u$ is among the $k$ most recent stories in follower $v$'s feed as

$$f_{uv}(t, k) = \mathbb{P}\{ r_{uv}(t) \leq k \}, \tag{10}$$

then the expected (or average) visibility $V_{uv}(k)$ can be defined as

$$V_{uv}(k) := \mathbb{E}[\mathcal{T}_{uv}(k)] = \int_0^T f_{uv}(t, k)\, dt, \tag{11}$$

provided the integral is well defined. In some scenarios, one may want to favor some periods of time (e.g., times in which the follower is online). Such a preference can be encoded by means of a time significance function $s(t) \geq 0$, considering $f_{uv}(t, k)\, s(t)$ instead of just $f_{uv}(t, k)$.

Note that the visibility $V_{uv}(k)$ is defined for a pair of broadcaster $u$ and her follower $v$ given $k$. We will focus our later exposition on a particular pair $u$ and $v$, omit the subscript $\cdot_{uv}$, and simply use notation such as $f_k(t)$ and $V(k)$. However, we note that the computation of the visibility for a pair of users $u$ and $v$ may depend on the broadcast and feed intensities of all users in the network.

Computation of Visibility. In this section, we derive an expression for the average visibility, given by Eq. 11, using the broadcaster posting and follower feed representation, given by Eqs. 4-7. This link is crucial for the convex visibility shaping framework in Section 5.

Given a broadcaster $u$ with $\lambda_u(t) = \lambda(t)$ and her follower $v$ with $\gamma_v(t) = \gamma(t)$ and $\gamma_{v \setminus u}(t) = \mu(t)$, we first compute the probability $f_1(t)$ that at least one message from the broadcaster is among the $k = 1$ most recent ones received by $v$ at time $t$. By definition, $f_1(t)$ satisfies the following equation:

$$f_1(t + dt) = \underbrace{f_1(t)\,(1 - \mu(t)\, dt)}_{\text{1. Remains the most recent}} + \underbrace{(1 - f_1(t))\, \lambda(t)\, dt}_{\text{2. Becomes the most recent}}, \tag{12}$$

where each term models one of the two possible situations:

1. The most recent message received by follower $v$ by time $t$ was posted by broadcaster $u$ (w.p. $f_1(t)$) and none of the other broadcasters that $v$ follows posts a message in $[t, t + dt]$ (w.p. $1 - \mu(t)\, dt$).
2. The most recent message received by follower $v$ by time $t$ was posted by a different broadcaster (w.p. $1 - f_1(t)$) and broadcaster $u$ posts a message in $[t, t + dt]$ (w.p. $\lambda(t)\, dt$), which becomes the most recent one.
Then, by rearranging terms and letting $dt \to 0$, one finds that the probability satisfies the following differential equation:

$$f_1'(t) = -(\mu(t) + \lambda(t))\, f_1(t) + \lambda(t). \tag{13}$$

We can proceed with the induction step for $f_k(t)$ with $k > 1$. In particular, by definition, $f_k(t)$ satisfies the following equation:

$$f_k(t + dt) = \underbrace{f_{k-1}(t)}_{\text{1. Was among } k-1} + \underbrace{(f_k(t) - f_{k-1}(t))\,(1 - \mu(t)\, dt)}_{\text{2. Remains on the } k\text{-th position}} + \underbrace{(1 - f_k(t))\, \lambda(t)\, dt}_{\text{3. Becomes the most recent}}, \tag{14}$$

where each term models one of the three possible situations:

1. The last message posted by broadcaster $u$ by time $t$ is among the most recent $k - 1$ ones received by follower $v$ (w.p. $f_{k-1}(t)$) and, independently of whether a message is posted by any other broadcaster or not, this message will remain among the most recent $k$ at $t + dt$.
2. The last message posted by broadcaster $u$ by time $t$ is the $k$-th one (w.p. $f_k(t) - f_{k-1}(t)$) and none of the other broadcasters followed by $v$ posts a message in $[t, t + dt]$ (w.p. $1 - \mu(t)\, dt$).
3. The last $k$ messages received by follower $v$ by time $t$ were posted by other broadcasters (w.p. $1 - f_k(t)$) and broadcaster $u$ posts a message in $[t, t + dt]$ (w.p. $\lambda(t)\, dt$), becoming the most recent one.

[Figure 1 diagram omitted. Labels: broadcasters $w_1, w_2, \dots, w_m$ and $u$; follower $v$; processes $M_1(t), M_2(t), \dots, M_m(t)$, $M(t)$, $N(t)$; Visibility.]

Figure 1: The visibility shaping problem. A social media user $u$ broadcasts $N_u(t)$ messages at a rate $\lambda_u(t)$. Her messages accumulate in each of her followers' feeds, each of which receives $M_v(t)$ messages at a rate $\gamma_v(t) = \lambda_u(t) + \gamma_{v \setminus u}(t)$, where $\gamma_{v \setminus u}(t)$ denotes the message rate due to other broadcasters $v$ follows. For each follower, the average visibility of user $u$'s messages is defined as the time during which a post from user $u$ is among the last $k$ stories the follower received. In the visibility shaping problem, the goal is to optimize $\lambda_u(t)$ to steer visibility.

By rearranging terms and letting $dt \to 0$, we uncover a recursive relationship between $f_k(t)$ and $f_{k-1}(t)$, by means of the following differential equation:

$$f_k'(t) = -(\mu(t) + \lambda(t))\, f_k(t) + \mu(t)\, f_{k-1}(t) + \lambda(t). \tag{15}$$

Perhaps surprisingly, we can find a closed-form expression for $f_k(t)$, given by the following lemma (proven in Appendix A):

Lemma 1. Consider a broadcaster with message intensity $\lambda(t)$ and one of her followers with feed message intensity due to other broadcasters $\mu(t)$. The probability $f_k(t)$ that at least one message from the broadcaster is among the $k$ most recent ones received by the follower at time $t$ can be uniquely computed as

$$f_k(t) = \frac{\int_0^t \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\, dx}\, \Gamma\!\left[k, \int_\tau^t \mu(x)\, dx\right] d\tau}{(k - 1)!}, \tag{16}$$

given the boundary conditions $f_1(0) = \dots = f_k(0) = 0$ and the incomplete Gamma function defined as $\Gamma[k, x] = \int_x^\infty \tau^{k-1} e^{-\tau}\, d\tau$.

4 On the Concavity of Visibility

Now that we have a formula that allows us to compute the average visibility given arbitrary intensities for the broadcasters, we show that, remarkably, the average visibility is concave in the space of smooth intensity functions. Moreover, we also show that the average visibility is concave with respect to the parameters of piecewise constant functions, which we use in our experiments.

Smooth intensity functions. In this section, we assume that the message intensity of the broadcaster belongs to the space $\mathcal{H}$ of all smooth functions. Before we proceed, we need the following definition:

Definition 2. Given the space $\mathcal{H}$ of all smooth functions, a functional $J : \mathcal{H} \to \mathbb{R}$ is concave if for every $g_1, g_2 \in \mathcal{H}$ and $0 < \alpha < 1$:

$$J[\alpha g_1 + (1 - \alpha) g_2] \geq \alpha J[g_1] + (1 - \alpha) J[g_2]. \tag{17}$$

A functional $J$ is convex if $-J$ is concave.
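As a quick numerical sanity check of Lemma 1, the sketch below evaluates Eq. 16 by quadrature and compares it with a forward-Euler integration of Eqs. 13 and 15. It is an illustration rather than part of the paper's method: it assumes constant intensities $\lambda$ and $\mu$, and the function names (`upper_gamma_ratio`, `f_closed_form`, `f_ode`), the step counts and the convention $f_0 := 0$ are choices made here for the example.

```python
import math


def upper_gamma_ratio(k, x):
    """Gamma[k, x] / (k - 1)! for integer k >= 1.

    Uses the finite sum exp(-x) * sum_{i < k} x^i / i!, valid for integer k.
    """
    term, total = 1.0, 1.0
    for i in range(1, k):
        term *= x / i
        total += term
    return math.exp(-x) * total


def f_closed_form(k, t, lam, mu, n=2000):
    """Eq. 16 for constant rates lam and mu, integrated with the midpoint rule."""
    h = t / n
    acc = 0.0
    for j in range(n):
        tau = (j + 0.5) * h
        s = t - tau  # for constant rates, both inner integrals reduce to rate * (t - tau)
        acc += lam * math.exp(-lam * s) * upper_gamma_ratio(k, mu * s)
    return acc * h


def f_ode(k, t, lam, mu, n=20000):
    """Eqs. 13 and 15 integrated with forward Euler, starting from f_j(0) = 0.

    f[0] is held at 0 so that the k = 1 case of Eq. 15 reduces to Eq. 13.
    """
    h = t / n
    f = [0.0] * (k + 1)
    for _ in range(n):
        new = f[:]
        for j in range(1, k + 1):
            new[j] = f[j] + h * (-(mu + lam) * f[j] + mu * f[j - 1] + lam)
        f = new
    return f[k]


if __name__ == "__main__":
    for k in (1, 2, 5):
        print(k, f_closed_form(k, 3.0, 0.5, 2.0), f_ode(k, 3.0, 0.5, 2.0))
```

For constant rates and $k = 1$, both routines should approach $\frac{\lambda}{\lambda + \mu}\left(1 - e^{-(\lambda + \mu) t}\right)$, which is the solution of Eq. 13 with $f_1(0) = 0$.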
It readily follows that the probability $f_k(t)$, given by Eq. 16, is a functional with $\lambda(\cdot)$ as input. Moreover, the following two theorems, proven in Appendices B and C, establish the concavity of $f_k(t)$ and $V(k)$ with respect to $\lambda(\cdot)$.

Theorem 3. Consider a broadcaster with message intensity $\lambda(t)$ and one of her followers with feed message intensity due to other broadcasters $\mu(t)$. The probability $f_k(t)$ that at least one message from the broadcaster is among the $k$ most recent ones received by the follower at time $t$, given by Eq. 16, is concave with respect to $\lambda(\cdot)$.

Theorem 4. Consider a broadcaster with message intensity $\lambda(t)$ and one of her followers with feed message intensity due to other broadcasters $\mu(t)$. The visibility $V(k)$, given by Eq. 11, is concave with respect to $\lambda(\cdot)$.

Given the above results, one could think of finding the optimal (general) message intensity $\lambda(t)$ that maximizes (a function of) the average visibilities across a broadcaster's followers. However, in practical applications, this may be inefficient and undesirable. Instead, one may focus on a simpler parametrized family of intensities, such as piecewise constant intensity functions, which are easier to optimize and to fit using real data. To this aim, we next prove that the average visibility is also concave in the parameters defining piecewise constant intensity functions.

Piecewise constant intensity functions. In this section, we assume that the message intensity $\lambda(t)$ of the broadcaster belongs to the space of piecewise constant functions $\lambda : [0, T] \to \mathbb{R}$, denoted by $\mathcal{G}$, which we parametrize as follows:

$$\lambda(t) = \sum_{m=1}^{M} a_m\, \mathbb{I}(\tau_{m-1} \leq t < \tau_m), \tag{18}$$

where $a_m \geq 0$, $M$ is the number of pieces, $\tau_m - \tau_{m-1} = T / M = \Delta$ and $\tau_0 = 0$.
As the reader may have noticed, the results from the previous section are not readily usable, since Lemma 1 requires the intensity functions to be smooth. However, we now show that, for every function $\lambda(t) \in \mathcal{G}$, there is a sequence of smooth functions $\lambda_n(t) \in \mathcal{H}$ such that $\lim_{n \to \infty} \lambda_n(t) = \lambda(t)$, and this will be sufficient to prove concavity. Before we proceed, we need the following definition:

Definition 5. A functional $J : \mathcal{H} \to \mathbb{R}$ is said to be continuous at $\lambda(\cdot) \in \mathcal{H}$ if for every $\epsilon > 0$, there is a $\delta > 0$ such that

$$|J[\lambda] - J[\hat{\lambda}]| < \epsilon \tag{19}$$

provided that $\|\lambda - \hat{\lambda}\| < \delta$, where $\|\cdot\|$ is a norm in $\mathcal{H}$.

It readily follows that the probability $f_k$ is a continuous functional on $\mathcal{H}$. Moreover, we need the following lemma (proven in Appendix D) to prove the concavity:

Lemma 6. For every $\lambda(t) \in \mathcal{G}$, there is a sequence of smooth functions $\lambda_n(t) \in \mathcal{H}$ where $\lim_{n \to \infty} \lambda_n(t) = \lambda(t)$.

Using Lemma 6, for any $\lambda(t) \in \mathcal{G}$, it follows that

$$f_k(\lambda(\cdot)) = \lim_{n \to \infty} f_k(\lambda_n(\cdot)), \tag{20}$$

where $\lambda_n(t)$ is a sequence of smooth functions such that $\lim_{n \to \infty} \lambda_n(t) = \lambda(t)$. As a consequence, we can establish the concavity of $f_k(t)$ and $V(k)$ with respect to $a_1, \dots, a_M$ with the following theorem (proven in Appendix E):

Theorem 7. $f_k$ and $V(k)$ are concave functionals in the space of piecewise constant functions $\mathcal{G}$.

Corollary 8. If we represent $\lambda \in \mathcal{G}$ using Eq. 18, $f_k(t)$ and $V(k)$ are concave with respect to $a_1, \dots, a_M$.

Algorithm 1: Projected Gradient Descent for Visibility Shaping
  Initialize $c$;
  repeat
    1. Project $c$ into the polytope $c \geq 0$, $c^T \mathbf{1} \leq C$;
    2. Find the gradient $g(c)$;
    3. Update $c$ using the gradient $g(c)$;
  until convergence;

5 Convex Visibility Shaping Framework

Given the concavity of the average visibility, we now propose a convex optimization framework for a variety of visibility shaping tasks.
In all these tasks, our goal is to find the optimal message intensity $\lambda_u(t)$ for broadcaster $u$ that maximizes a particular nondecreasing concave utility function $U(\mathbf{V}_u(k))$ of the average visibility of broadcaster $u$ in all her followers within a time window $[0, T]$, i.e.,

$$\begin{aligned} \underset{\lambda_u(t)}{\text{maximize}} \quad & U(\mathbf{V}_u(k)) \\ \text{subject to} \quad & \lambda_u(t) \geq 0 \quad \forall t \in [0, T], \\ & \int_0^T \lambda_u(t)\, dt \leq C, \end{aligned} \tag{21}$$

where $\mathbf{V}_u(k) = (V_{uv}(k))_{v \in \mathcal{N}^+(u)}$, $\mathcal{N}^+(u)$ denotes broadcaster $u$'s followers, $V_{uv}(k)$ denotes the average visibility in follower $v$, the first constraint ensures that the intensity function remains nonnegative, and the second limits the average number of messages broadcast within $[0, T]$ to be no more than $C$. We next discuss two instances of the general framework, which achieve different goals (their constraints remain the same and are hence omitted). More generally, the flexibility of our framework allows the use of any nondecreasing concave utility function.

Average Visibility Maximization (AVM). The goal here is to maximize the sum of the visibility for all the broadcaster's followers, i.e.,

$$\underset{\lambda_u(t)}{\text{maximize}} \quad \sum_{v \in \mathcal{N}^+(u)} V_{uv}(k). \tag{22}$$

Minimax Visibility Maximization (MVM). Suppose our goal is instead to keep the visibility in the $n$ followers with the smallest visibility value above a certain minimum level or, alternatively, to make the average visibility across the $n$ followers with the smallest visibility as high as possible. Then, we can perform the following minimax visibility maximization task:

$$\underset{\lambda_u(t)}{\text{maximize}} \quad \sum_{i=1}^{n} V_{uv[i]}(k), \tag{23}$$

where $V_{uv[i]}(k)$ denotes the average visibility in the follower with the $i$-th smallest visibility among all the broadcaster's followers.

6 Scalable Algorithm

To solve the visibility shaping problems defined above, we need to be able to (efficiently) evaluate the probability function $f_k$ and the visibility $V(k)$. However, a direct evaluation by means of Eqs. 16 and 11 seems difficult. Here, we present an alternative representation of the probability function $f_k$ and the visibility $V(k)$ for piecewise constant intensity functions, which allows us to compute these quantities very efficiently. Based on this result, we present an efficient gradient-based algorithm to find the optimum intensity.

Assume the broadcaster's message intensity $\lambda(t)$ and the follower's feed message intensity due to other broadcasters $\mu(t)$ adopt the following form:

$$\lambda(t) = \sum_{m=1}^{M} c_m\, \mathbb{I}(\tau_{m-1} \leq t < \tau_m) \quad \text{and} \quad \mu(t) = \sum_{m=1}^{M} b_m\, \mathbb{I}(\tau_{m-1} \leq t < \tau_m).$$

Then, each piece $m$ in the above intensities satisfies the recurrence relation given by Eq. 15, which we rewrite as

$$f_k'(t) + (b_m + c_m)\, f_k(t) = c_m + b_m\, f_{k-1}(t),$$

and one can easily prove by induction that, in general, the solution of the above differential equation for each time interval $\tau_{m-1} \leq t < \tau_m$, with $t$ measured from the beginning of the interval, is given by

$$f_k(t) = e^{-(b_m + c_m) t} \left( \alpha_{k-1,k}\, t^{k-1} + \dots + \alpha_{0,k} \right) + \beta_k, \tag{24}$$

where $\alpha_{i,k} = \frac{b_m^i}{i!} (h_{k-i} - \beta_{k-i})$, $\beta_i = 1 - \left( \frac{b_m}{b_m + c_m} \right)^i$, and $h_i$ is the probability $f_i(\tau_{m-1})$ at the beginning of the time interval. Such a representation allows for an efficient evaluation of $f_k(t)$.

Next, we also need to compute the integral of $f_k(t)$ to efficiently compute the visibility $V(k)$. Without loss of generality, we represent the time for each piece in a normalized time window $[0, 1]$. Then, the integral of $f_k(t)$ can be written as follows:

$$V(k) = \int_0^1 f_k(t)\, dt = \beta_k + \sum_{i=0}^{k-1} \alpha_{i,k} \int_0^1 e^{-(b_m + c_m) t}\, t^i\, dt = \beta_k + \sum_{i=0}^{k-1} \frac{\alpha_{i,k}}{(b_m + c_m)^{i+1}} \left[ i! - \Gamma(i + 1, b_m + c_m) \right], \tag{25}$$

where the last term is efficiently computable since, for integer values of $n$, the incomplete Gamma function is $\Gamma(n, x) = (n - 1)!\, e^{-x} \sum_{i=0}^{n-1} \frac{x^i}{i!}$.

Given Eq. 25, we can now easily compute the gradient of the visibility $V(k)$, which we can then use to design an efficient gradient-based algorithm. For brevity, we just show the gradient for $k = 1$. Let $c = (c_1, \dots, c_M)$, and let $y = (y_0, \dots, y_{M-1}, y_M)$ be the values of $f_k(t)$ at the boundaries of the time intervals, i.e., $y_m = f_k(\tau_m)$. Then,

$$\frac{\partial V(1)}{\partial c_i} = \frac{1}{(b_i + c_i)^2} \left[ -\frac{\partial y_i}{\partial c_i} (b_i + c_i) + (y_i - y_{i-1}) + b_i \right] + \sum_{m=i+1}^{M} \frac{1}{b_m + c_m} \left[ \frac{\partial y_{m-1}}{\partial c_i} - \frac{\partial y_m}{\partial c_i} \right],$$

where we can easily compute $\partial y_j / \partial c_m$ recursively as

$$\frac{\partial y_j}{\partial c_m} = \begin{cases} \dfrac{b_m}{(b_m + c_m)^2} - \left( y_{m-1} - \dfrac{c_m}{b_m + c_m} + \dfrac{b_m}{(b_m + c_m)^2} \right) e^{-(b_m + c_m)}, & \text{if } j = m, \\[2ex] e^{-(b_j + c_j)}\, \dfrac{\partial y_{j-1}}{\partial c_m}, & \text{if } j > m. \end{cases}$$

Once we have an efficient way to compute the visibility $V(k)$ and its gradient, we can readily design a projected gradient descent algorithm to find the optimal message intensity $\lambda_u(t)$ in the visibility shaping problems described in Section 5. Note that, since our optimization problems are convex, every local optimum is a global optimum and convergence is guaranteed. Moreover, for the projection step, we solve a quadratic program, minimizing the distance to the feasible polytope. Algorithm 1 summarizes the overall algorithm.

7 Experiments

Dataset description and experimental setup. We use data gathered from Twitter as reported in previous work [3], which comprises the following three types of information: profiles of 52 million users, 1.9 billion directed follow links among these users, and 1.7 billion public tweets posted by the collected users. The follow link information is based on a snapshot taken at the time of data collection, in September 2009.
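The projected gradient scheme of Algorithm 1 (Section 6) can be sketched in a few lines for a single follower and $k = 1$. This is a simplified stand-in under stated assumptions, not the implementation used in the paper: the gradient is approximated by finite differences rather than the analytic expression above, the projection uses a sort-based Euclidean projection onto $\{c \geq 0, \sum_m c_m \leq C / \Delta\}$ in place of a generic quadratic program, the step size and iteration count are fixed arbitrarily, and every function name is hypothetical.

```python
import math


def visibility_k1(c, b, delta=1.0, y0=0.0):
    """V(1) summed over all pieces for piecewise-constant rates c (own) and b (others).

    On each piece, Eq. 13 gives f_1(t) = beta + (y - beta) * exp(-(b_m + c_m) * t),
    with beta = c_m / (b_m + c_m) and y the value of f_1 at the start of the piece.
    """
    y, total = y0, 0.0
    for cm, bm in zip(c, b):
        a = bm + cm
        if a <= 0.0:  # nobody posts on this piece, so f_1 stays constant
            total += y * delta
            continue
        beta = cm / a
        decay = math.exp(-a * delta)
        total += beta * delta + (y - beta) * (1.0 - decay) / a
        y = beta + (y - beta) * decay
    return total


def project(c, cap):
    """Euclidean projection onto {c >= 0, sum(c) <= cap}, assuming cap > 0."""
    clipped = [max(x, 0.0) for x in c]
    if sum(clipped) <= cap:
        return clipped
    # Budget is violated: project onto the simplex {c >= 0, sum(c) = cap}.
    cumulative, theta = 0.0, 0.0
    for j, uj in enumerate(sorted(c, reverse=True), start=1):
        cumulative += uj
        shift = (cumulative - cap) / j
        if uj - shift > 0:
            theta = shift
    return [max(x - theta, 0.0) for x in c]


def gradient(c, b, delta, eps=1e-6):
    """Finite-difference gradient of visibility_k1 with respect to c."""
    grad = []
    for i in range(len(c)):
        up, down = c[:], c[:]
        up[i] += eps
        down[i] = max(down[i] - eps, 0.0)  # stay inside the feasible region
        diff = visibility_k1(up, b, delta) - visibility_k1(down, b, delta)
        grad.append(diff / (up[i] - down[i]))
    return grad


def maximize_visibility(b, budget, delta=1.0, step=0.5, iters=500):
    """Projected gradient ascent on V(1); returns the best feasible iterate seen."""
    cap = budget / delta  # integral of lambda equals delta * sum(c)
    c = [cap / len(b)] * len(b)  # start from the uniform strategy
    best_c, best_v = c, visibility_k1(c, b, delta)
    for _ in range(iters):
        g = gradient(c, b, delta)
        c = project([ci + step * gi for ci, gi in zip(c, g)], cap)
        v = visibility_k1(c, b, delta)
        if v > best_v:
            best_c, best_v = c, v
    return best_c, best_v


if __name__ == "__main__":
    rates, value = maximize_visibility([5.0, 1.0, 0.2, 3.0], budget=2.0)
    print(rates, value)
```

Because the objective is maximized, the update moves along the gradient (ascent on the concave objective), which is equivalent to descent on its negative. Keeping the best iterate guards against the fixed step size overshooting.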
9 0 6 12 18 24 time 0.0 0.3 0.6 0.9 intensity top-1-nosig top-20-nosig wall top-1-sig top-20-sig 0 6 12 18 24 time 0 1 2 3 intensity top-1-nosig top-20-nosig wall top-1-sig top-20-sig 0 6 12 18 24 time 0 1 2 3 intensity top-1-nosig top-20-nosig wall top-1-sig top-20-sig 0 6 12 18 24 time 0.0 0.3 0.6 0.9 intensity top-1-nosig top-20-nosig wall top-1-sig top-20-sig 0 6 12 18 24 time 0.000 0.006 0.012 0.018 f_1 significance top-1-nosig top-1-sig top-20-nosig top-20-sig 0.00 0.07 0.14 0.21 f_20 0 6 12 18 24 time 0.0 0.1 0.2 0.3 f_1 significance top-1-nosig top-1-sig top-20-nosig top-20-sig 0.0 0.3 0.6 0.9 f_20 0 6 12 18 24 time 0.0000 0.0015 0.0030 0.0045 f_1 significance top-1-nosig top-1-sig top-20-nosig top-20-sig 0.00 0.04 0.08 0.12 f_20 0 6 12 18 24 time 0.000 0.002 0.004 0.006 f_1 significance top-1-nosig top-1-sig top-20-nosig top-20-sig 0.00 0.04 0.08 0.12 f_20 0 5 10 15 20 day 0.00 0.06 0.12 0.18 visibility(1) top-1-actual top-1-nosig top-20-actual top-20-nosig 0.0 0.3 0.6 0.9 visibility(20) 0 5 10 15 20 day 0.0 0.2 0.4 0.6 visibility(1) top-1-actual top-1-nosig top-20-actual top-20-nosig 0.0 1.5 3.0 4.5 visibility(20) 0 5 10 15 20 day 0.00 0.02 0.04 0.06 visibility(1) top-1-actual top-1-nosig top-20-actual top-20-nosig 0.00 0.25 0.50 0.75 visibility(20) 0 5 10 15 20 day 0.00 0.07 0.14 0.21 visibility(1) top-1-actual top-1-nosig top-20-actual top-20-nosig 0.0 0.6 1.2 1.8 visibility(20) Figure 2: Intensities and top- k probabilities and visibilities. W e fo cus on four broadcasters (one per column) and solve the A VM problem for one of their follow ers, pic ked at random. The ﬁrst row shows the follow er’s timeline intensit y ( µ ( t ), in brown) ﬁtted using ev ents from the training set, and the optimized intensities, as giv en b y our framework, that maximize visibilit y for k =1, 20 on the training set with and without signiﬁcance ( λ ∗ ( t ), in solid and dashed y ellow and blue, respectively). 
The second row shows the top-$k$ probability for the optimized intensities with and without significance for $k = 1, 20$ ($f_k^*(t)$, in solid and dashed yellow and blue) as well as the follower's significance ($s(t)$, in brown). The third row compares the average visibility achieved by the optimized intensities without significance for $k = 1, 20$ ($V^*(k)$, in yellow and blue) to the average visibility achieved by the broadcaster's posting activity ($V(k)$, in green and purple) on a held-out set.

Here, we focus on the tweets published during a six and a half month period, from February 2, 2009 to August 13, 2009. In particular, we sample 10,000 users uniformly at random as broadcasters and record all the tweets they posted. Moreover, for each of these broadcasters, we track down all their followers and record all the tweets they posted, and we reconstruct their true timelines by collecting all the tweets published by the people they follow. In our experiments, we use the first three and a half month period, from February 2 to May 13, to fit the piecewise constant intensities of the followers' timelines and the followers' significance, which we use in our convex visibility shaping framework. Here, the follower's significance is the probability that she is online, estimated as a piecewise (hourly) constant probability from the tweets and retweets the follower posted: if a follower tweeted or retweeted in an hour, we assume she was online during that hour. Then, we use the last three month period, from May 14 to August 13, to evaluate our framework. We refer to the former period as the training set and the latter as the test set. We experiment both with $T = 24$ hours ($M = 24$, $\Delta = 1$ hour) and $T = 7$ days ($M = 24 \times 7$, $\Delta = 1$ hour), and set the budget $C$ to be equal to the average number of tweets per $T$ the broadcaster posted in the training period.

Evaluation schemes.
Throughout this section, we use three different evaluation schemes, with an increasing resemblance to a real-world scenario:

Theoretical objective: We compute the theoretical value of the utility using the broadcaster intensity under study, be it the (optimal) intensity given by our convex visibility shaping framework, the intensity given by an alternative baseline, or the broadcaster's (true) fitted intensity.

Simulated objective: We simulate events both from the broadcaster intensity under study and from each of the followers' fitted timeline intensities. Then, we estimate empirically the overall utility based on the simulated events. We perform 100 independent simulation runs and report the average and standard error (or standard deviation) of the utility.

Held-out data: We simulate events from the broadcaster intensity under study, interleave these generated events on the true followers' timelines recorded as test set, and compute the corresponding utility. We perform 10 independent simulation runs and report the average and standard error (or standard deviation) of the utility.

Intensities, top-$k$ probabilities and visibilities. We pay attention to four broadcasters, picked at random, and solve the average visibility maximization task for one of their followers, also picked at random. Our goal here is to shed light on the influence that the follower's timeline intensity and significance have on the optimized broadcaster's intensity as well as its corresponding visibility and top-$k$ probability for different values of $k$.
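The simulated-objective scheme described above can be sketched for a single follower as follows. This is only an illustrative sketch under stated assumptions: intensities are piecewise constant on unit-length intervals, significance is ignored, the feed is sorted in reverse chronological order, and the helper names `sample_events`, `visible_time` and `simulated_visibility` are hypothetical rather than part of our code base.

```python
import random
from collections import deque


def sample_events(rates, rng):
    """Sample a Poisson process whose rate is rates[m] on the interval [m, m + 1)."""
    events = []
    for m, rate in enumerate(rates):
        if rate <= 0:
            continue  # no events in an interval with zero intensity
        # Exponential gaps; restarting at each boundary is valid by memorylessness.
        t = m + rng.expovariate(rate)
        while t < m + 1:
            events.append(t)
            t += rng.expovariate(rate)
    return events


def visible_time(own, others, k, horizon):
    """Time in [0, horizon] with at least one own post among the k latest stories."""
    merged = sorted([(t, True) for t in own] + [(t, False) for t in others])
    recent = deque(maxlen=k)  # flags of the k most recent stories in the feed
    total, last_t = 0.0, 0.0
    for t, is_own in merged:
        if any(recent):
            total += t - last_t
        recent.append(is_own)
        last_t = t
    if any(recent):
        total += horizon - last_t
    return total


def simulated_visibility(c, b, k, runs=100, seed=0):
    """Average visible time and its standard error over independent runs."""
    rng = random.Random(seed)
    horizon = len(c)
    values = [
        visible_time(sample_events(c, rng), sample_events(b, rng), k, horizon)
        for _ in range(runs)
    ]
    mean = sum(values) / runs
    variance = sum((v - mean) ** 2 for v in values) / (runs - 1)
    return mean, (variance / runs) ** 0.5
```

For the held-out scheme, the same `visible_time` helper could be applied with `others` replaced by the follower's recorded timeline instead of simulated events.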
Figure 2 summarizes the results, which show that (i) including the significance in the visibility definition shifts the optimized intensities away from the times in which the followers are not online (first row); (ii) the optimized intensities typically achieve a higher average visibility than the one achieved by the broadcaster's true posting activity on a held-out set (third row); and (iii) the optimized intensities are more concentrated in time for $k = 1$ (first row) and achieve a higher average visibility and top-$k$ probability for $k = 20$ (second and third row).

Solution quality. In this section, we perform a large-scale evaluation of our framework across all 10,000 broadcasters in terms of the three evaluation schemes described above and compare its performance against several baselines. Here, we consider the definition of visibility that incorporates significance since, as argued previously, it may lead to more effective broadcasting strategies.⁷ In the average visibility maximization task, we compare our framework with three heuristics, in which the broadcaster distributes the available budget uniformly at random (RAVM), proportionally to $\sum_{i=1}^{n} \mu_i(t)$ (IAVM) and proportionally to $\sum_{i=1}^{n} s_i(t) \mu_i(t)$ (PAVM), respectively. In the minimax visibility maximization task, we also compare with three heuristics. The first two heuristics are similar to two of the ones just mentioned for AVM, i.e., the broadcaster distributes the available budget uniformly at random (RMVM) and proportionally to $\sum_{i=1}^{n} \mu_i(t)$ (IMVM). In the third heuristic, the broadcaster distributes its budget following a greedy procedure: at each iteration $k$, it first finds the user with the least visibility given $\lambda^{(k-1)}(t)$ and then solves the average visibility maximization for that user given a budget of $C/n$. Finally, it outputs the intensity $\lambda(t) = \sum_{k=1}^{n} \lambda^{(k)}(t)$.
The greedy procedure starts with $\lambda^{(0)}(t) = C/M$. Additionally, for the held-out comparison, we also compute the actual average intensity that the broadcaster achieved in reality.

Figure 3 summarizes the results by means of a box plot, which shows the utilities achieved by our framework and the heuristics normalized with respect to the utility achieved by the broadcasters' fitted true intensity (by the posts during the test set for the third evaluation scheme). That means, if $y = 1$, the optimized intensity achieves the same utility as the broadcaster's recorded posts. For the average visibility maximization task, the intensities provided by our method achieve 1.5× higher theoretical objective and 1.3× higher utility on a held-out set, on average (black dashed line), than the broadcaster's fitted intensities. In contrast, the alternatives fail to provide any gain, i.e., $y \leq 1$ for half of the broadcasters. Finally, for the minimax visibility maximization task, which is significantly harder, the intensities provided by our method achieve 1.6× higher theoretical objective and 1.4× higher average utility on a held-out set, on average (black dashed line), than the broadcaster's fitted intensities. In this case, although our method outperforms the baselines by large margins in terms of theoretical and simulated objectives, the baselines achieved almost the same average utility on the held-out set. The theoretical and simulated objectives are almost equal in all cases, as one may have expected.

⁷ We obtain qualitatively similar results if we omit the significance in the definition of visibility. Actually, in such case, our framework beats the baselines by a greater margin.
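The three budget-allocation heuristics used as AVM baselines can be sketched as below. This is a hedged illustration, not our experimental code: it assumes unit-length intervals so that the budget constraint reads $\sum_m c_m = C$, it interprets "uniformly at random" as drawing uniform random weights and rescaling them to the budget (one of several possible readings), and the function names `ravm`, `iavm` and `pavm` are hypothetical.

```python
import random


def proportional_allocation(weights, budget):
    """Split the budget across intervals proportionally to non-negative weights."""
    total = sum(weights)
    if total <= 0:
        # Degenerate case: fall back to a flat allocation.
        return [budget / len(weights)] * len(weights)
    return [budget * w / total for w in weights]


def ravm(num_intervals, budget, rng):
    """Random baseline: uniform random weights rescaled to the budget (assumed reading)."""
    weights = [rng.random() for _ in range(num_intervals)]
    return proportional_allocation(weights, budget)


def iavm(mu, budget):
    """Allocate proportionally to the summed follower wall intensities.

    mu[i][m] is the wall intensity of follower i on interval m.
    """
    num_intervals = len(mu[0])
    weights = [sum(wall[m] for wall in mu) for m in range(num_intervals)]
    return proportional_allocation(weights, budget)


def pavm(mu, s, budget):
    """Allocate proportionally to the significance-weighted wall intensities.

    s[i][m] is the significance of follower i on interval m.
    """
    num_intervals = len(mu[0])
    weights = [
        sum(sig[m] * wall[m] for sig, wall in zip(s, mu))
        for m in range(num_intervals)
    ]
    return proportional_allocation(weights, budget)
```

Each function returns one intensity value per interval, which can then be scored with any of the three evaluation schemes.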
[Figure 3 panels. Rows: Theoretical, Simulated, Real Held-out. Left column (Average Visibility): RAVM, AVM, PAVM, IAVM. Right column (Minimax Visibility): RMVM, MVM, PMVM, IMVM. Vertical axis range: 0.5 to 1.5.]

Figure 3: Visibility shaping for 10,000 broadcasters. The left (right) column corresponds to AVM (MVM), evaluated using the theoretical objective (first row), the simulated objective (second row) and the held-out data (third row). The red (blue) dashed line shows the median (mean) objective popularity and the box limits correspond to the 25%–75% percentiles.

Solution quality vs. # of followers. Figure 4(a) shows the average visibilities achieved by our optimized intensities for the AVM task, normalized by the average visibility that the corresponding broadcasters' fitted intensities achieve, against the number of followers for the same 10,000 broadcasters as above. Independently of the number of followers, we find that the intensities provided by our method consistently outperform the broadcaster's fitted intensities.

Visibility vs. $k$. Figure 4(b) shows the average visibility achieved by our optimized intensities for the AVM task against $k$ for the four broadcasters from Figure 2.

Scalability. Figure 4(c) shows that our convex optimization framework easily scales to broadcasters with thousands of followers. For example, given a broadcaster with 2,000 followers, our algorithm takes ∼250 milliseconds to find the optimal intensity for the average visibility maximization using a single machine with 64 cores and 1.5 TB RAM.

[Figure 4 panels. (a) Followers: Visibility Gain (Theory) vs. #followers. (b) k: Visibility Gain (Theory) vs. k. (c) Running time: time (milliseconds) vs. #followers.]

Figure 4: Panels show the average visibility against (a) # of followers and (b) $k$. Panel (c) plots running time.

8 Conclusions

In this paper, we developed a novel framework to solve the when-to-post problem, in which we model users' feeds and posts as discrete events occurring in continuous time. Under such a continuous-time model, choosing a strategy for a broadcaster becomes a problem of designing the conditional intensity of her posting events. The key technical idea that enables our framework is a novel formula which links the conditional intensity of an arbitrary broadcaster with her visibility in her followers' feeds, defined as the time that at least one post from her is among the most recent $k$ received stories in her followers' feed. In addition to the framework, we develop an efficient gradient-based optimization algorithm, which allows us to find optimal broadcast intensities for a variety of visibility shaping tasks in a matter of seconds. Experiments on large real-world data gathered from Twitter revealed that our framework can consistently make broadcasters' posts more visible than alternatives.

Our work also opens many interesting avenues for future work. For example, we assume that the social network sorts stories in each user's feed in reverse chronological order. While this is a realistic assumption for some social networks (e.g., Twitter), there are other social networks (e.g., Facebook) where the feed is curated algorithmically. It would be very interesting to augment our framework to such cases. In this work, we model users' intensities using inhomogeneous Poisson processes, whose intensities are history independent and deterministic. Extending our framework to point processes with stochastic and history dependent intensity functions, such as Hawkes processes, would most likely provide more effective broadcasting strategies.
In this work, we validate our framework on two visibility shaping tasks, average visibility maximization and minimax visibility maximization; however, there are many other useful tasks one may think of, such as visibility homogenization. Finally, it would be very interesting to investigate the scenario in which there are several smart broadcasters using our algorithm.

References

[1] O. Aalen, O. Borgan, and H. K. Gjessing. Survival and event history analysis: a process point of view. Springer, 2008.
[2] L. Backstrom, E. Bakshy, J. M. Kleinberg, T. M. Lento, and I. Rosenn. Center of attention: How facebook users allocate attention across friends. ICWSM, 2011.
[3] M. Cha, H. Haddadi, F. Benevenuto, and P. K. Gummadi. Measuring User Influence in Twitter: The Million Follower Fallacy. ICWSM, 2010.
[4] W. Chen, C. Wang, and Y. Wang. Scalable influence maximization for prevalent viral marketing in large-scale social networks. In Proceedings of the 16th ACM SIGKDD international conference on Knowledge discovery and data mining, 2010.
[5] J. Cheng, L. Adamic, P. Dow, J. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proceedings of the 23rd international conference on World wide web, 2014.
[6] T. Chenhao, L. Lee, and B. Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on twitter. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, 2014.
[7] E. Constantinides. Foundations of social media marketing. Procedia-Social and behavioral sciences, 148:40–57, 2014.
[8] M. B. Crawford. The world beyond your head: On becoming an individual in an age of distraction. Macmillan, 2015.
[9] A. De, I. Valera, N. Ganguly, S. Bhattacharya, and M. Gomez-Rodriguez. Modeling opinion dynamics in diffusion networks. arXiv preprint arXiv:1506.05474, 2015.
[10] N. Du, L. Song, M.
Gomez-Rodriguez, and H. Zha. Scalable influence estimation in continuous-time diffusion networks. In Advances in Neural Information Processing Systems, 2013.
[11] M. Farajtabar, N. Du, M. Gomez-Rodriguez, I. Valera, H. Zha, and L. Song. Shaping social activity by incentivizing users. In NIPS, 2014.
[12] M. Farajtabar, Y. Wang, M. Gomez-Rodriguez, S. Li, H. Zha, and L. Song. Coevolve: A joint point process model for information diffusion and network co-evolution. In Advances in Neural Information Processing Systems, pages 1945–1953, 2015.
[13] M. Gomez-Rodriguez, K. Gummadi, and B. Schölkopf. Quantifying information overload in social media and its impact on social contagions. In 8th International AAAI Conference on Weblogs and Social Media, 2014.
[14] N. Hodas and K. Lerman. How visibility and divided attention constrain social contagion. SocialCom, 2012.
[15] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the ninth ACM SIGKDD international conference on Knowledge discovery and data mining, 2003.
[16] G. Miritello, R. Lara, M. Cebrian, and E. Moro. Limited communication capacity unveils strategies for human interaction. Scientific reports, 3, 2013.
[17] N. Navaroli and P. Smyth. Modeling response time in digital human communication. In Ninth International AAAI Conference on Web and Social Media, 2015.
[18] M. Richardson and P. Domingos. Mining knowledge-sharing sites for viral marketing. In Proceedings of the eighth ACM SIGKDD international conference on Knowledge discovery and data mining, 2002.
[19] N. Spasojevic, Z. Li, A. Rao, and P. Bhattacharyya. When-to-post on social networks. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 2127–2136. ACM, 2015.
[20] I. Valera and M. Gomez-Rodriguez.
Modeling adoption and usage of competing products. In IEEE International Conference on Data Mining, 2015.
[21] D. Vu, D. Hunter, P. Smyth, and A. Asuncion. Continuous-time regression models for longitudinal networks. In Advances in Neural Information Processing Systems, 2011.
[22] Q. Zhao, M. Erdogdu, H. He, A. Rajaraman, and J. Leskovec. Seismic: A self-exciting point process model for predicting tweet popularity. In 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2015.

A Proof of Lemma 1

We will prove this lemma by induction on $k$. For the case $k = 1$, $f_1(t)$ satisfies a first-order linear differential equation,

$$f_1'(t) = -(\mu(t) + \lambda(t)) f_1(t) + \lambda(t), \tag{26}$$

whose unique solution is

$$f_1(t) = \int_0^t \lambda(\tau)\, e^{-\int_\tau^t (\lambda(x) + \mu(x))\,dx}\, d\tau, \tag{27}$$

as long as we assume $f_1(0) = 0$. Then, using that $\Gamma[1, \int_\tau^t \mu(x)\,dx] = e^{-\int_\tau^t \mu(x)\,dx}$, we can rewrite the solution as

$$f_1(t) = \int_0^t \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx}\, \Gamma\!\left[1, \int_\tau^t \mu(x)\,dx\right] d\tau,$$

which proves the lemma for $k = 1$.

Now, in the inductive step we assume the hypothesis is true for $1, 2, \ldots, k - 1$ and we prove it for $k$. We start by rewriting the differential equation given by Equation 15 as

$$f_k'(t) + (\mu(t) + \lambda(t)) f_k(t) = \lambda(t) + \mu(t) f_{k-1}(t), \tag{28}$$

where, by assumption, $f_{k-1}(t)$ is unique and known. Then, as long as $f_k(0) = 0$, the above differential equation has a unique solution and thus we only need to find $f_k(t)$ that satisfies it. To do so, we rewrite the right hand side of the differential equation using the inductive hypothesis as

$$\lambda(t) + \mu(t)\, \frac{\int_0^t \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx}\, \Gamma\!\left[k - 1, \int_\tau^t \mu(x)\,dx\right] d\tau}{(k - 2)!}$$
which, using $\Gamma[k - 1, x] = \frac{1}{k - 1}\left(\Gamma[k, x] + \frac{\partial \Gamma[k, x]}{\partial x}\right)$, can be expressed as

$$\lambda(t) + \mu(t)\, \frac{\int_0^t \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx} \left( \Gamma\!\left[k, \int_\tau^t \mu(x)\,dx\right] + \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial \int_\tau^t \mu(x)\,dx} \right) d\tau}{(k - 1)!}. \tag{29}$$

Next, we hypothesize that

$$f_k(t) = \frac{\int_0^t \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx}\, \Gamma\!\left[k, \int_\tau^t \mu(x)\,dx\right] d\tau}{(k - 1)!}, \tag{30}$$

and rewrite Eq. 29 as

$$\lambda(t) + \mu(t) f_k(t) + \mu(t)\, \frac{\int_0^t \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx}\, \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial \int_\tau^t \mu(x)\,dx}\, d\tau}{(k - 1)!}. \tag{31}$$

Then, by the chain rule and the fundamental theorem of calculus,

$$\frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial \int_\tau^t \mu(x)\,dx} = \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial t} \times \frac{\partial t}{\partial \int_\tau^t \mu(x)\,dx} = \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial t} \times \frac{1}{\mu(t)},$$

and thus Eq. 31 becomes

$$\lambda(t) + \mu(t) f_k(t) + \frac{\int_0^t \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx}\, \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial t}\, d\tau}{(k - 1)!}. \tag{32}$$

Finally, using that for differentiable functions $g$ and $h$, $g h' = (g h)' - g' h$, we have that

$$\int_0^t \left( \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx} \right) \left( \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial t} \right) d\tau = \underbrace{\int_0^t \frac{\partial \left( \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx}\, \Gamma[k, \int_\tau^t \mu(x)\,dx] \right)}{\partial t}\, d\tau}_{(k - 1)!\,(f_k'(t) - \lambda(t))} \; \underbrace{-\int_0^t \left( \frac{\partial\, \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx}}{\partial t} \right) \Gamma\!\left[k, \int_\tau^t \mu(x)\,dx\right] d\tau}_{(k - 1)!\,\lambda(t) f_k(t)}$$

and then we can rewrite Eq. 32 as

$$\lambda(t) + \mu(t) f_k(t) + \frac{(k - 1)!\,(f_k'(t) - \lambda(t)) + (k - 1)!\,\lambda(t) f_k(t)}{(k - 1)!},$$

which simplifies to

$$f_k'(t) + (\mu(t) + \lambda(t)) f_k(t).$$

This shows that the hypothesized solution for $f_k(t)$ in Eq. 30 satisfies Eq. 28; hence, it is the unique solution for $f_k(t)$.

B Proof of Theorem 3

From Lemma 1, we know that

$$f_k(t) = \frac{\int_0^t \left( \lambda(\tau)\, e^{-\int_\tau^t \lambda(x)\,dx} \right) \Gamma\!\left[k, \int_\tau^t \mu(x)\,dx\right] d\tau}{(k - 1)!}.$$

Using integration by parts, we can rewrite the above expression as

$$f_k(t) = 1 - \frac{e^{-\int_0^t \lambda(x)\,dx}\, \Gamma\!\left[k, \int_0^t \mu(x)\,dx\right]}{(k - 1)!} - \frac{\int_0^t \left( e^{-\int_\tau^t \lambda(x)\,dx} \right) \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial \tau}\, d\tau}{(k - 1)!}$$
Lemma 9 tells us that $e^{-\int_0^t \lambda(x)\,dx}$ and $e^{-\int_\tau^t \lambda(x)\,dx}$ are convex with respect to $\lambda(\cdot)$. Moreover, using Lemma 10 and the fact that $\frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial \tau} > 0$, it follows that the function $\int_0^t \left( e^{-\int_\tau^t \lambda(x)\,dx} \right) \frac{\partial \Gamma[k, \int_\tau^t \mu(x)\,dx]}{\partial \tau}\, d\tau$ is convex. Finally, given that $\Gamma[k, \int_0^t \mu(x)\,dx] > 0$, we can conclude that $f_k(t)$ is concave with respect to $\lambda(\cdot)$.

Lemma 9. The functional $J[\lambda] = e^{-\int_a^t \lambda(x)\,dx}$ is convex with respect to $\lambda(\cdot)$ for any constant $a \leq t$.

Proof. We simply verify that $J[\lambda]$ satisfies the definition of convexity, as given by Eq. 17:

$$J[\alpha \lambda_1 + (1 - \alpha) \lambda_2] = e^{-\int_a^t \left( \alpha \lambda_1(x) + (1 - \alpha) \lambda_2(x) \right) dx} \leq \alpha\, e^{-\int_a^t \lambda_1(x)\,dx} + (1 - \alpha)\, e^{-\int_a^t \lambda_2(x)\,dx} = \alpha J[\lambda_1] + (1 - \alpha) J[\lambda_2],$$

where the inequality follows from the weighted arithmetic-geometric mean inequality, i.e., $\theta x + (1 - \theta) y \geq x^{\theta} y^{1 - \theta}$ for all positive $x$, $y$, and $0 < \theta < 1$, applied with $x = e^{-\int_a^t \lambda_1(x)\,dx}$, $y = e^{-\int_a^t \lambda_2(x)\,dx}$ and $\theta = \alpha$.

Lemma 10. If the functional $J_t[\lambda(\cdot)]$ is convex with respect to $\lambda(\cdot)$, then, given any arbitrary function $g(x) \geq 0$, the functional $L[\lambda] = \int_0^t J_\tau[\lambda(\cdot)]\, g(\tau)\, d\tau$ is also convex with respect to $\lambda(\cdot)$.

Proof. We verify that the functional $L[\lambda] = \int_0^t J_\tau[\lambda(\cdot)]\, g(\tau)\, d\tau$ satisfies the definition of convexity, as given by Eq. 17:

$$L[\alpha \lambda_1 + (1 - \alpha) \lambda_2] = \int_0^t J_\tau[\alpha \lambda_1 + (1 - \alpha) \lambda_2]\, g(\tau)\, d\tau \leq \alpha \int_0^t J_\tau[\lambda_1]\, g(\tau)\, d\tau + (1 - \alpha) \int_0^t J_\tau[\lambda_2]\, g(\tau)\, d\tau = \alpha L[\lambda_1] + (1 - \alpha) L[\lambda_2],$$

where the inequality holds using that, given any two arbitrary functions $h_1$ and $h_2$ such that $h_1(x) \geq h_2(x) \geq 0$ for all $x \in D$, then $\int_D h_1(x) g(x)\,dx \geq \int_D h_2(x) g(x)\,dx$ given $g(x) \geq 0$ for all $x \in D$.

C Proof of Theorem 4

Theorem 3 proves the concavity of $f_k(t)$ with respect to $\lambda(\cdot)$. Therefore, $1 - f_k(t)$ is convex and, using Lemma 10, it holds that $\int_0^T (1 - f_k(t))\, s(t)\, dt = \int_0^T s(t)\, dt - \int_0^T f_k(t)\, s(t)\, dt$ is also convex.
Then, since $\int_0^T s(t)\, dt$ is constant, $\int_0^T f_k(t)\, s(t)\, dt$ is concave with respect to $\lambda(\cdot)$ and the proof is complete.

D Proof of Lemma 6

Each piecewise continuous function can be represented as a summation of a number of Heaviside step functions. The count is equal to the number of discontinuity points. However, each Heaviside function itself is the limit of smooth tanh functions. Therefore, the piecewise continuous function will be the limit of a finite summation of smooth tanh functions.

E Proof of Theorem 7

Consider two piecewise constant functions $\lambda(\cdot), \lambda'(\cdot) \in G$. According to Lemma 6, there exist sequences of smooth functions such that $\lim_{n \to \infty} \lambda_n = \lambda$ and $\lim_{n \to \infty} \lambda'_n = \lambda'$. Because of the concavity of $f_k$ in $H$, we know that for $0 < \alpha < 1$:

$$f_k[\alpha \lambda_n(\cdot) + (1 - \alpha) \lambda'_n(\cdot)] \geq \alpha f_k[\lambda_n(\cdot)] + (1 - \alpha) f_k[\lambda'_n(\cdot)].$$

Taking the limit and using the continuity of $f_k$, we get:

$$f_k[\alpha \lambda(\cdot) + (1 - \alpha) \lambda'(\cdot)] \geq \alpha f_k[\lambda(\cdot)] + (1 - \alpha) f_k[\lambda'(\cdot)]. \tag{33}$$

Together with the convexity of the space $G$, this proves the theorem.
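As a numerical sanity check of Lemma 1 (Appendix A), one can compare the closed form in Eq. 30 against a direct integration of the differential equation in Eq. 28. The sketch below makes several assumptions that are not part of the proofs: the intensities $\lambda$ and $\mu$ are constant, $f_0 \equiv 0$ so that Eq. 28 reduces to Eq. 26 for $k = 1$, and the incomplete gamma function is evaluated through the standard identity $\Gamma[k, x] / (k - 1)! = e^{-x} \sum_{j=0}^{k-1} x^j / j!$ for integer $k$. The function names are hypothetical.

```python
import math


def top_k_closed_form(lam, mu, k, t, n=2000):
    """Evaluate Eq. 30 for constant lam and mu with Simpson's rule (n even)."""

    def integrand(u):
        # u = t - tau; Gamma[k, mu*u] / (k-1)! = exp(-mu*u) * sum_{j<k} (mu*u)^j / j!
        tail = sum((mu * u) ** j / math.factorial(j) for j in range(k))
        return lam * math.exp(-(lam + mu) * u) * tail

    h = t / n
    total = integrand(0.0) + integrand(t)
    for i in range(1, n):
        total += (4 if i % 2 else 2) * integrand(i * h)
    return total * h / 3


def top_k_ode(lam, mu, k, t, n=2000):
    """Integrate Eq. 28 for f_1..f_k with classical RK4, f_j(0) = 0 and f_0 = 0."""

    def rhs(f):
        out = []
        for j in range(k):
            previous = f[j - 1] if j > 0 else 0.0  # f_0 is taken to be zero
            out.append(-(lam + mu) * f[j] + lam + mu * previous)
        return out

    f = [0.0] * k
    h = t / n
    for _ in range(n):
        k1 = rhs(f)
        k2 = rhs([x + 0.5 * h * d for x, d in zip(f, k1)])
        k3 = rhs([x + 0.5 * h * d for x, d in zip(f, k2)])
        k4 = rhs([x + h * d for x, d in zip(f, k3)])
        f = [
            x + h * (a + 2 * b + 2 * c + d) / 6
            for x, a, b, c, d in zip(f, k1, k2, k3, k4)
        ]
    return f[k - 1]
```

If the two functions agree up to discretization error for several values of $k$, this supports (but of course does not prove) that the closed form solves the differential equation.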
