Adversarial Influence Maximization

We consider the problem of influence maximization in fixed networks for contagion models in an adversarial setting. The goal is to select an optimal set of nodes to seed the influence process, such that the number of influenced nodes at the conclusion of the campaign is as large as possible. We formulate the problem as a repeated game between a player and adversary, where the adversary specifies the edges along which the contagion may spread, and the player chooses sets of nodes to influence in an online fashion. We establish upper and lower bounds on the minimax pseudo-regret in both undirected and directed networks.

Authors: Justin Khim, Varun Jog, Po-Ling Loh

January 19, 2019

1 Introduction

Many data sets in contemporary scientific applications possess some underlying network structure [32]. Popular examples include data collected from social media websites such as Facebook and Twitter [1, 28], or electrocortical recordings gathered from a network of firing neurons [35]. An important application of network science arises in marketing, where researchers have studied the importance of word-of-mouth advertising for decades [23]. More recently, methods have been proposed by marketing researchers to quantify the importance of word-of-mouth marketing in online social networks in both theory and practice [40, 37]. Subsequent empirical studies suggest that word-of-mouth marketing has a significant effect in online social networks [3, 34]. At the same time, computer scientists have analyzed the problem of viral marketing from an optimization-theoretic perspective [15, 27, 11], where the goal is to select an optimal set of influencers to encourage product adoption in an online social network.
This has led to rigorous theoretical guarantees that hold for stochastic models of word-of-mouth advertising inspired by physics and epidemiology, and the scope of the spread is quantified using a notion known as influence [24]. In social networks, edges represent potential interactions between individuals, and the problem of influence maximization corresponds to identifying subsets of individuals on which to impress an idea so that information spreads as widely as possible subject to an advertising budget. Formally, the influence of a subset of nodes is defined as the expected number of influenced individuals in a network at the conclusion of a spread, starting from an initial configuration where only the specified nodes are influenced. Even when the influence function is assumed to be computable for any subset using a black-box method in unit time, it is not clear whether influence maximization may be performed (exactly or approximately) in polynomial time, since searching over all subsets of k nodes is exponential in the number of nodes. Accordingly, the body of work in theoretical computer science has mostly focused on specific spreading models that give rise to nice properties such as submodularity, implying that a greedy algorithm for influence maximization leads to a constant-factor approximation of the optimal set [24, 25, 8]. Other related work includes predicting when knowledge becomes viral; limiting the spread of information through carefully positioned interventions [14, 16]; or competitive settings of influence maximization, e.g. competing for votes or market share [7, 19, 18].

∗ Department of Statistics, University of Pennsylvania, Philadelphia, PA 19104.
† Department of Electrical & Computer Engineering, University of Wisconsin, Madison, WI 53706.
‡ Department of Statistics, University of Wisconsin, Madison, WI 53706.
A significant shortcoming in the analysis of stochastic spreading models is the fact that the parameters characterizing the spread of influence are generally assumed to be known, allowing for approximate evaluation of the influence function (either by analytic methods or simulation). However, such an assumption is not always practical. In the case of independent cascade models or linear threshold models, where parameters correspond to edge weights in the network, one might even question a scientist's prior knowledge of the precise network structure. To address these issues, some authors have studied the interesting question of accurately learning the influence function itself in a stochastic spreading model based on observing multiple rounds of infection [29, 26, 21]. Another approach involves a notion of "robust influence maximization," where the parameters are only specified to lie in fixed confidence sets, and the goal is to obtain a set of source vertices that approximately maximizes the true influence function, possibly in a worst-case sense [12, 20]. Robust influence maximization methods may also be model-dependent, meaning that a robust algorithm designed for the independent cascade model may lead to a severely non-optimal solution if the influence spread actually follows the linear threshold model. Indeed, the parameters describing different models, as well as the nature of the uncertainties permitted in them, may be completely different. Further, it is unclear that popular models of influence are good approximations of real-world behavior [17, 22]. In this paper, we take a rather different approach toward the problem of unknown spreading parameters that also avoids assumptions about a particular spreading mechanism.
As discussed in more detail in Section 2, we only assume knowledge of an underlying fixed graph representing the paths along which an influence may spread, where the case of no prior knowledge corresponds to a complete graph. We formulate the influence maximization problem as an online game, where a "player" must make sequential decisions about the next seed set to choose based on observing the behavior of the spread in previous "rounds" of the game. Here, a round represents a particular instance of an influence process initialized from the specified seed nodes and run from beginning to end. We allow an "adversary" to choose the path of influence on each round in a completely arbitrary manner, as long as the process may only spread along edges of the graph—in particular, this setting subsumes the stochastic models usually adopted in the influence maximization literature, while allowing for much more general spreading mechanisms (e.g., information does not necessarily propagate in an i.i.d. manner over all rounds of the game). Note that the adversary's strategy may be so arbitrary as to be "unlearnable." Thus, instead of simply trying to maximize the aggregate number of influenced vertices across all rounds, we seek to develop player strategies that bound the "regret" of the player, defined as the difference between the total number of vertices influenced using the player's strategy and the number of vertices that would have been influenced if the player had adopted the best constant choice of source set in hindsight. Such notions are taken from the literature on multi-armed bandits and online learning theory [5, 10], and adapted to the present setting. Our main contribution is to derive upper and lower bounds on the pseudo-regret for various adversarial and player strategies.
We study both directed and undirected networks, where in the latter setting, contagion is allowed to spread in both directions when an edge is chosen by the adversary. Furthermore, we derive lower bounds for the minimax pseudo-regret when the underlying network is a complete graph, where the supremum is taken over all adversarial strategies and the infimum is taken over all player strategies. Our upper and lower bounds match up to constant factors in the case of directed networks. Notably, the bounds also agree with the usual rate for pseudo-regret in multi-armed bandits, showing that no new information is gained by the player by exploiting network structure. On the other hand, a gap exists between our upper and lower bounds for undirected networks, leaving open the possibility that the player may leverage the additional information from the network to incur less regret. Additionally, the constant factor in the upper bound may be slightly improved, providing further evidence that graph structure may be exploited. Finally, we demonstrate how to extend our upper bounds to the setting where the player is allowed to choose multiple source vertices on each round. The proposed multi-source player strategy augments the source set sequentially using the single-source strategies as a subroutine, and is based on a general online greedy algorithm proposed by Streeter and Golovin [36].

The remainder of our paper is organized as follows: In Section 2, we provide some important background on online learning theory and formally define the adversarial spreading model and notions of regret to be studied in our paper. In Section 3, we present upper and lower bounds for pseudo-regret in the adversarial setting. We conclude the paper with a selection of open research questions in Section 4.
All proofs, as well as a more technical discussion of related work, are contained in the appendices.

Notation. For a set A, let $2^A$ denote the power set of A. When we want to specify that we are taking the expectation with respect to a particular distribution p of some random variable X, we write $\mathbb{E}_{X \sim p}$. In particular, we often write $\mathbb{E}_{S \sim p}$ to mean the expectation taken over the player's actions for a fixed set of adversarial actions, which is the same as the conditional expectation with respect to the adversary's actions. Similarly, we write $\mathbb{E}_{\mathcal{A}}$ to indicate the conditional expectation with respect to a fixed set of player actions.

2 Background and preliminaries

We begin by formally defining the repeated game between the player and adversary and the types of strategies we will analyze in our paper. Next, we introduce the notions of regret we will study, and then connect our setting to related work in the learning theory literature.

2.1 Adversarial repeated games

Consider a fixed graph G = (V, E) on n vertices, which may be directed or undirected. The adversarial influence maximization problem may be described as follows: Repeatedly over T rounds, the player selects an influence seed set $S_t \subseteq V$, with $|S_t| = k$, for t = 1, ..., T. At the same time, the adversary designates a subset of edges $A_t \subseteq E$ to be "open." A node is considered to be influenced at time t if and only if it is an element of $S_t$ or is reachable from $S_t$ via a path of open edges. Note that in the context of influence spreading, the open edges correspond to ties over which influence propagates in that round—importantly, influence only has an opportunity to be transmitted between individuals that interact in the network, but may not necessarily spread over a particular connection on a specific round.
In the case when G is an undirected graph, designating an edge to be open allows an influence campaign to spread in both directions. Furthermore, in the directed case, edges may exist in both directions between a given pair of nodes, in which case the adversary may designate both, one, or neither of the edges to be open. For an open edge set $A \subseteq E$ and influence seed set $S \subseteq V$, we define $f(A, S)$ to be the fraction of vertices in the graph lying in the influenced set.

To connect our model to the canonical setting of influence maximization, note that [25] proposed a very general class of influence models called triggering models, which include the independent cascade and the linear threshold models as special cases. At the beginning of the influence campaign, each node chooses a random "triggering" subset of neighbors according to a particular rule, and the incoming edges from those neighbors are designated to be "active." A vertex becomes influenced during the course of the process if and only if a path of active edges exists connecting that vertex to a vertex in the seed set. Thus, triggering models correspond to a special case of our framework, in which the edge sets are chosen in an i.i.d. manner from round to round, and the probability distribution over the edges is determined by the probability rule through which edges are assigned to be active (e.g., according to the linear threshold or independent cascade models).

Next, we describe the classes of strategies $\mathcal{A} = \{A_t\}$ and $\mathcal{S} = \{S_t\}$ available to the adversary and player. We assume that the adversary is oblivious of the player's actions; i.e., at time t = 0, the adversary must decide on the (possibly random) strategy $\mathcal{A}$. We use $\mathbb{A}$ to denote the set of oblivious adversary strategies and $\mathbb{A}_d$ to denote the set of deterministic adversary strategies.
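To make the objective concrete, $f(A, S)$ can be computed by a graph search over the open edges. The sketch below is our own illustration, not code from the paper; the function name and the convention that vertices are labeled 0, ..., n−1 are assumptions for the example.

```python
from collections import deque

def influenced_fraction(n, open_edges, seeds, directed=False):
    """Fraction of the n vertices that are influenced: a vertex counts if it
    is a seed or reachable from a seed along a path of open edges (BFS)."""
    adj = {v: [] for v in range(n)}
    for u, v in open_edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # undirected: an open edge spreads both ways
    influenced = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in influenced:
                influenced.add(w)
                queue.append(w)
    return len(influenced) / n

# Path 0-1-2-3 with the edge (2, 3) closed: seeding {0} reaches {0, 1, 2}.
print(influenced_fraction(4, [(0, 1), (1, 2)], {0}))  # 0.75
```

In the directed case, the same routine applies but edges are traversed only in their given orientation.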
Turning to the classes of player strategies, we allow the player to choose his or her action at time t based on the feedback provided in response to the joint actions made by the player and adversary on preceding time steps. Although the player knows the edge set E of the underlying graph, we assume that the player only observes the status of edges (i, j) such that either i or j is in the reach of $S_t$ (in the undirected case), and the player observes the status of every edge (i, j) such that i is in the reach of $S_t$ (in the directed case). In other words, whereas the player cannot observe the subset of all edges that would have propagated influence in the network, he or she will know which edges transmitted influence if reached by the influence cascade initialized using his or her seed set. Formally, we write $I(A_t, S_t)$ to denote the set of edges with status known to the player (i.e., all edges in the subgraph induced by $A_t$ belonging to connected components containing nodes in $S_t$), and we denote $I_t = (I(A_1, S_1), \ldots, I(A_t, S_t))$.

If $A_t$ is chosen via a stochastic model such as the independent cascade model with discrete time steps for influence campaign t, our setup technically allows the player knowledge of the status of an edge between two vertices u and v if both were actually influenced by some other vertex w. Realistically, we would not want the status of edge (u, v) to be returned as feedback, and we could enforce this by positing a model of how each influence campaign proceeds. However, this distinction does not affect our results or algorithms, and so we do not further restrict the feedback $I(A_t, S_t)$. The player can only make decisions based on the feedback observed in previous rounds, so any allowable player strategy $\{S_t\}$ has the property that $S_t$ is a function of $I_{t-1}$ (possibly with additional randomization).
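As a sanity check on the undirected feedback model, the sketch below (our own, with hypothetical names) computes the per-round feedback as the set of graph edges with at least one endpoint in the reach of the seed set, together with each such edge's open/closed status. This follows the edge-status description above; the paper's formal definition phrases the same feedback in terms of components of the open subgraph.

```python
def observed_feedback(n, graph_edges, open_edges, seeds):
    """Undirected feedback: the player learns the open/closed status of
    every edge (i, j) in E such that i or j is in the reach of the seeds."""
    open_set = {frozenset(e) for e in open_edges}
    # Reach of the seed set along open edges (depth-first search).
    adj = {v: [] for v in range(n)}
    for u, v in open_edges:
        adj[u].append(v)
        adj[v].append(u)
    reach = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in reach:
                reach.add(w)
                stack.append(w)
    # Status of every graph edge touching the reach; other edges stay hidden.
    return {(u, v): frozenset((u, v)) in open_set
            for u, v in graph_edges if u in reach or v in reach}
```

For example, on the path 0-1-2-3 with only (0, 1) open and seed {0}, the player learns that (0, 1) was open and (1, 2) was closed, but learns nothing about (2, 3).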
We denote the class of all player strategies by $\mathbb{P}$, and denote the subclass of all deterministic player strategies by $\mathbb{P}_d$, meaning that $S_t$ is a deterministic function of $I_{t-1}$. Note that strategies $\mathcal{S} \in \mathbb{P}_d$ may still be random, due to possible randomization of the adversary, but conditioned on $I_{t-1}$, the choice of $S_t$ is deterministic.

2.2 Minimax regret

The player wishes to devise a strategy that maximizes the aggregate number of influenced nodes up to time T. Using the notation from the previous section, we define the regret of the player to be

$$R_T(\mathcal{A}, \mathcal{S}) = \sum_{t=1}^{T} f(A_t, S^*) - \sum_{t=1}^{T} f(A_t, S_t), \qquad (1)$$

where

$$S^* = \arg\max_{S : |S| = k} \sum_{t=1}^{T} f(A_t, S)$$

is the optimal fixed set that the player would have chosen in hindsight with full knowledge of the adversary's strategy. Note that the regret $R_T(\mathcal{A}, \mathcal{S})$ may be a random quantity due to randomness in both the adversary's and player's strategies. Accordingly, we will seek to control the pseudo-regret

$$\overline{R}_T(\mathcal{A}, \mathcal{S}) := \max_{S : |S| = k} \left\{ \mathbb{E}_{\mathcal{A}, \mathcal{S}} \left[ \sum_{t=1}^{T} f(A_t, S) - \sum_{t=1}^{T} f(A_t, S_t) \right] \right\}, \qquad (2)$$

where the expectation in equation (2) is taken with respect to potential randomization in both $\mathcal{A}$ and $\mathcal{S}$. As in the standard learning theory literature [9], recall that the expected regret and pseudo-regret are generally related via the inequality $\overline{R}_T(\mathcal{A}, \mathcal{S}) \le \mathbb{E}[R_T(\mathcal{A}, \mathcal{S})]$, although if $\mathcal{A} \in \mathbb{A}_d$, we have $\overline{R}_T(\mathcal{A}, \mathcal{S}) = \mathbb{E}[R_T(\mathcal{A}, \mathcal{S})]$. Our interest in the pseudo-regret rather than the expected regret is purely motivated by the fact that the former quantity is often easier to bound than the latter, and this simplification is common in the literature on bandits.
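On small instances, the regret in equation (1) can be evaluated directly by brute force over all k-subsets. The sketch below is our own illustration of the definition (undirected case, hypothetical names), not an algorithm from the paper.

```python
from collections import deque
from itertools import combinations

def influenced_fraction(n, open_edges, seeds):
    """f(A, S): fraction of vertices reachable from the seeds via open edges."""
    adj = {v: [] for v in range(n)}
    for u, v in open_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = set(seeds), deque(seeds)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) / n

def regret(n, k, adversary_edges, player_seeds):
    """Realized regret: cumulative influence of the best fixed k-set in
    hindsight minus the player's cumulative influence (brute force)."""
    best = max(sum(influenced_fraction(n, A, set(S)) for A in adversary_edges)
               for S in combinations(range(n), k))
    achieved = sum(influenced_fraction(n, A, S)
                   for A, S in zip(adversary_edges, player_seeds))
    return best - achieved
```

For instance, on three vertices with the adversary opening edge (0, 1) in both of two rounds, a player who repeatedly seeds vertex 2 incurs regret 4/3 − 2/3 = 2/3 against the hindsight-optimal seed {0}.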
Finally, we introduce the scaled regret

$$R^{\alpha}_T(\mathcal{A}, \mathcal{S}) = \alpha \sum_{t=1}^{T} f(A_t, S^*) - \sum_{t=1}^{T} f(A_t, S_t), \qquad (3)$$

and the analogous quantity

$$\overline{R}^{\alpha}_T(\mathcal{A}, \mathcal{S}) = \max_{S : |S| = k} \left\{ \mathbb{E}_{\mathcal{A}, \mathcal{S}} \left[ \alpha \sum_{t=1}^{T} f(A_t, S) - \sum_{t=1}^{T} f(A_t, S_t) \right] \right\}.$$

Note that α = 1 corresponds to the unscaled version. Our interest in the expression (3) is again for theoretical purposes, since we may obtain convenient upper bounds on the scaled pseudo-regret in the case α = 1 − 1/e using an online greedy algorithm. Note that when k > 1, the benchmark greedy algorithms used for influence maximization in the stochastic spreading setting are also only guaranteed to achieve a (1 − 1/e)-approximation of the truth, so in some sense, the scaled regret (3) only requires the player to perform comparably well in relation to the appropriately scaled optimal strategy.

3 Main results

In this section, we provide upper and lower bounds for the pseudo-regret. Specifically, we focus on the quantity

$$\inf_{\mathcal{S} \in \mathbb{P}} \sup_{\mathcal{A} \in \mathbb{A}} \overline{R}^{\alpha}_T(\mathcal{A}, \mathcal{S}),$$

where the supremum is taken over the class of adversarial strategies, and the infimum is taken over the class of player strategies based on the feedback model we have described. In other words, we wish to characterize the hardness of the influence maximization problem in terms of the player's best possible strategy measured with respect to the worst-case game. A rough outline of our approach is as follows: We establish upper bounds by presenting particular strategies for the player that ensure an appropriately bounded regret under all adversarial strategies. For lower bounds, the general technique is to provide an ensemble of possible actions for the adversary that are difficult for the player to distinguish in the influence maximization problem, which forces the player to incur a certain level of regret.
3.1 Undirected graphs

We begin by deriving regret upper bounds for undirected graphs. We initially restrict our attention to the case k = 1. The proposed player strategy for k > 1, and the corresponding regret bounds, build upon the results in the single-source setting.

3.1.1 Upper bounds for a single source

Consider a randomized player strategy that selects $S_t = \{i\}$ with probability $p_{i,t}$. The paper [9] suggests a method based on the Online Stochastic Mirror Descent (OSMD) algorithm, which is specified by loss estimates $\{\hat{\ell}_{i,t}\}$ and learning rates $\{\eta_t\}$, as well as a Legendre function F. Here, we comment on the losses, and in order to avoid excessive technicalities, we defer additional details of the OSMD algorithm to the appendix.

The most basic loss estimate, which follows from standard bandit theory and ignores all information about the graph, is

$$\hat{\ell}^{\mathrm{node}}_{i,t} = \frac{\ell_{i,t}}{p_{i,t}} \mathbb{1}\{S_t = \{i\}\}, \qquad (4)$$

where $\ell_{i,t} = 1 - f(A_t, \{i\})$ is the loss incurred if the player were to choose $S_t = \{i\}$. Importantly, $\hat{\ell}^{\mathrm{node}}_{i,t}$ is always computable for any choice the player makes at time t and is an unbiased estimate of $\ell_{i,t}$.

On the other hand, if $S_t = \{i\}$ and another node j is influenced (i.e., in the connected component formed by the open edges of $A_t$), the player also knows the loss that would have been incurred if $S_t = \{j\}$, since $f(A_t, \{i\}) = f(A_t, \{j\})$. This motivates an alternative loss estimate that is nonzero even when $S_t \neq \{i\}$. In particular, we may express

$$\ell_{i,t} = \frac{1}{n} \sum_{j \neq i} \ell^{t}_{i,j},$$

where $\ell^{t}_{i,j}$ is the indicator that i and j are in different connected components formed by the open edges of $A_t$. We then define

$$\hat{\ell}^{\mathrm{sym}}_{i,t} = \frac{1}{n} \sum_{j \neq i} \frac{\ell^{t}_{i,j} Z_{ij}}{p_{i,t} + p_{j,t}},$$

where $Z_{ij} = \mathbb{1}\{S_t \cap \{i, j\} \neq \emptyset\}$. Note that $\hat{\ell}^{\mathrm{sym}}_{i,t}$ is also an unbiased estimate for $\ell_{i,t}$.
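The symmetric loss estimate can be written out directly. The sketch below (our own function names, undirected case with k = 1) computes $\hat{\ell}^{\mathrm{sym}}_{i,t}$ for every node i, given the round's open edges, the seeded node s, and the sampling probabilities p; its unbiasedness over s ~ p can be checked numerically on small graphs.

```python
def symmetric_loss_estimates(n, open_edges, s, p):
    """Symmetric loss estimate for each node i when the player seeded node s,
    sampled with probabilities p (undirected graph, single source)."""
    # Connected components of the open edges via union-find.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in open_edges:
        parent[find(u)] = find(v)
    estimates = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j == i:
                continue
            ell_ij = 1.0 if find(i) != find(j) else 0.0  # different components
            z_ij = 1.0 if s in (i, j) else 0.0           # Z_ij = 1{S_t meets {i,j}}
            total += ell_ij * z_ij / (p[i] + p[j])
        estimates.append(total / n)
    return estimates
```

On three vertices with edge (0, 1) open and a uniform sampling distribution, averaging the estimate over the seed choice recovers the true losses ℓ_0 = 1/3 and ℓ_2 = 2/3, consistent with unbiasedness.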
The estimator $\hat{\ell}^{\mathrm{sym}}_{i,t}$ is always computable by the player, since the value of $\ell^{t}_{i,j}$ is known to the player whenever $Z_{ij}$ is nonzero (i.e., whenever the seed lies in $\{i, j\}$). We call $\hat{\ell}^{\mathrm{sym}}_{i,t}$ the symmetric loss. Now, we state the following regret bounds:

Theorem 1 (Symmetric loss, OSMD). Suppose the player uses the strategy $\mathcal{S}^{\mathrm{sym}}_{\mathrm{OSMD}}$ corresponding to OSMD with the symmetric loss $\hat{\ell}^{\mathrm{sym}}$ and appropriate parameters. Then the pseudo-regret satisfies the bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}^{\mathrm{sym}}_{\mathrm{OSMD}}) \le 2^{1/4} \sqrt{Tn}.$$

Remark 1. It is instructive to compare the result of Theorem 1 with analogous regret bounds for generic multi-armed bandits. When the OSMD algorithm is run with the loss estimates (4), standard analysis establishes an upper bound of $2^{3/2} \sqrt{Tn}$. Thus, using the symmetric loss, which leverages the graphical nature of the problem, produces slight gains.

3.1.2 Lower bounds

We now establish lower bounds for the pseudo-regret in the case k = 1. This furnishes a better understanding of the hardness of the adversarial influence maximization problem. The general approach for deriving lower bounds is to produce a strategy for the adversary that forces the player to incur a certain level of regret regardless of which strategy is chosen. The intrinsic difficulty of online influence maximization may vary widely depending on the topology of the underlying graph, and methods for deriving lower bounds may also differ accordingly. In the case of a complete graph, we have the following result:

Theorem 2. Suppose $G = K_n$ is the complete graph on $n \ge 3$ vertices. Then the pseudo-regret satisfies the lower bound

$$\frac{2}{243} \sqrt{T} \le \inf_{\mathcal{S} \in \mathbb{P}} \sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}).$$

Remark 2. Clearly, a gap exists between the lower bound derived in Theorem 2 and the upper bound appearing in Theorem 1. It is unclear which bound, if any, provides the proper minimax rate.
However, note that if the lower bound were tight, it would imply that the proportion of vertices that the player misses by picking suboptimal source sets is constant, meaning the number of additional vertices the optimal source vertex influences is linear in the size of the graph. This differs substantially from the pseudo-regret of order $\sqrt{n}$ known to be minimax optimal for the standard multi-armed bandit problem (and arises, for instance, in the case of directed graphs, as discussed in the next section).

3.1.3 Upper bounds for multiple sources

We now turn to the case k > 1, where the player chooses multiple source vertices at each time step. As discussed in Section 2, we are interested in bounding the scaled pseudo-regret $\overline{R}^{\alpha}_T(\mathcal{A}, \mathcal{S})$ with α = 1 − 1/e, since it is difficult to maximize the influence even in an offline setting, and the greedy algorithm is only guaranteed to provide a (1 − 1/e)-approximation of the truth. Our proposed player strategy is based on an online greedy adaptation of the strategy used in the single-source setting, and the full details are given in the appendix. We then have the following result concerning the scaled pseudo-regret:

Theorem 3 (Symmetric loss, multiple sources). Suppose k > 1 and the player uses the strategy $\mathcal{S}^{\mathrm{sym},k}_{\mathrm{OSMD}}$ corresponding to the Online Greedy Algorithm with single-source strategy $\mathcal{S}^{\mathrm{sym}}_{\mathrm{OSMD}}$. Then the scaled pseudo-regret satisfies the bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}^{(1 - 1/e)}_T(\mathcal{A}, \mathcal{S}^{\mathrm{sym},k}_{\mathrm{OSMD}}) \le 2^{1/4} k \sqrt{Tn}.$$

Comparing Theorem 3 to Theorem 1, we see an additional factor of k in the pseudo-regret upper bound. Similar results may be derived when alternative single-source strategies are used as subroutines in the Online Greedy Algorithm.

3.2 Directed graphs

We now derive upper and lower bounds for the pseudo-regret in the case of directed graphs, when k = 1.
3.2.1 Upper bounds

The symmetric loss does not have a clear analog in the case of directed graphs. However, we may still use the node loss estimate for multi-armed bandit problems, given by equation (4). This leads to the following upper bound:

Theorem 4. Suppose the player uses the strategy $\mathcal{S}^{\mathrm{node}}_{\mathrm{OSMD}}$ corresponding to OSMD with the node loss $\hat{\ell}^{\mathrm{node}}$ and appropriate parameters. Then the pseudo-regret satisfies the bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}^{\mathrm{node}}_{\mathrm{OSMD}}) \le 2^{3/2} \sqrt{Tn}.$$

Remark 3. In the case k > 1, we may again use the Online Greedy Algorithm used in Section 3.1.3 to obtain a player strategy composed of parallel runs of a single-source strategy. If the player uses the single-source strategy $\mathcal{S}^{\mathrm{node}}_{\mathrm{OSMD}}$, we may obtain the scaled pseudo-regret bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}^{(1 - 1/e)}_T(\mathcal{A}, \mathcal{S}^{\mathrm{node},k}_{\mathrm{OSMD}}) \le 2^{3/2} k \sqrt{Tn}.$$

3.2.2 Lower bounds

Finally, we provide a lower bound for the directed complete graph on n vertices. (This refers to the case where all edges are present and bidirectional.) We have the following result:

Theorem 5. Suppose G is the directed complete graph on n vertices. Then the pseudo-regret satisfies the lower bound

$$\frac{1}{48\sqrt{6}} \sqrt{Tn} \le \inf_{\mathcal{S} \in \mathbb{P}} \sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}).$$

Notably, the lower bound in Theorem 5 matches the upper bound in Theorem 4, up to constant factors. Thus, the minimax pseudo-regret for the influence maximization problem is $\Theta(\sqrt{Tn})$ in the case of directed graphs. In the case of undirected graphs, however (cf. Theorem 2), we only obtained a pseudo-regret lower bound of $\Omega(\sqrt{T})$. This is due to the fact that in undirected graphs, one may learn about the loss of other nodes at time t besides the loss at $S_t$. In contrast, it is possible to construct adversarial strategies for directed graphs that do not provide information regarding the loss incurred by choosing a source vertex other than $S_t$.
Finally, we remark that a different choice of G might affect the lower bound, since influence maximization is easier for some graph topologies than others. However, Theorem 5 shows that the case of the complete graph is always guaranteed to incur a pseudo-regret that matches the general upper bound in Theorem 4, implying that this is the minimax optimal rate for any class of graphs containing the complete graph.

4 Discussion

We have proposed and analyzed player strategies that control the pseudo-regret uniformly across all possible oblivious adversarial strategies. For the problem of single-source influence maximization in complete networks, we have also derived minimax lower bounds that establish the fundamental hardness of the online influence maximization problem. In particular, our lower and upper bounds match up to constant factors in the case of directed complete graphs, implying that our proposed player strategy is in some sense optimal.

Our work inspires a number of interesting questions for future study. An important open question concerns closing the gap between upper and lower bounds on the minimax pseudo-regret in the case of undirected graphs, to determine whether the feedback available in the influence maximization setting actually makes the online game easier than a standard bandit setting. Furthermore, our lower bounds only hold in the case of complete graphs and single-source influence maximization, and it would be worthwhile to obtain lower bounds that hold for other network topologies and seed sets containing multiple nodes. Our results only address a small subset of problems that may be posed and answered concerning a bandit theory of adversarial influence maximization with edge-level feedback.

References

[1] L. A. Adamic and E. Adar. Friends and neighbors on the Web. Social Networks, 25(3):211–230, 2003.

[2] N. Alon, N. Cesa-Bianchi, C.
Gentile, S. Mannor, Y. Mansour, and O. Shamir. Nonstochastic multi-armed bandits with graph-structured feedback. SIAM Journal on Computing, 46(6):1785–1826, 2017.

[3] S. Aral and D. Walker. Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Science, 57(9):1623–1639, 2011.

[4] J.-Y. Audibert, S. Bubeck, and G. Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31–45, 2013.

[5] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

[6] G. Bartók, D. P. Foster, D. Pál, A. Rakhlin, and C. Szepesvári. Partial monitoring—Classification, regret bounds, and algorithms. Mathematics of Operations Research, 39(4):967–997, 2014.

[7] S. Bharathi, D. Kempe, and M. Salek. Competitive influence maximization in social networks. Internet and Network Economics, pages 306–311, 2007.

[8] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 946–957. SIAM, 2014.

[9] S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandits. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.

[10] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, 2006.

[11] W. Chen, L. V. Lakshmanan, and C. Castillo. Information and influence propagation in social networks. Synthesis Lectures on Data Management, 5(4):1–177, 2013.

[12] W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou. Robust influence maximization. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[13] W. Chen, Y. Wang, Y. Yuan, and Q.
Wang. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 17(50):1–33, 2016.

[14] J. Cheng, L. Adamic, P. A. Dow, J. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proceedings of the 23rd International Conference on WWW, pages 925–936. ACM, 2014.

[15] P. Domingos and M. Richardson. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57–66. ACM, 2001.

[16] K. Drakopoulos, A. Ozdaglar, and J. N. Tsitsiklis. When is a network epidemic hard to eliminate? Mathematics of Operations Research, 42(1):1–14, 2016.

[17] S. Goel, D. J. Watts, and D. G. Goldstein. The structure of online diffusion networks. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 623–638. ACM, 2012.

[18] M. Grabisch, A. Mandel, A. Rusinowska, and E. Tanimura. Strategic influence in social networks. Mathematics of Operations Research, 2017.

[19] X. He and D. Kempe. Price of anarchy for the N-player competitive cascade game with submodular activation functions. In International Conference on Web and Internet Economics, pages 232–248. Springer, 2013.

[20] X. He and D. Kempe. Robust influence maximization. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[21] X. He, K. Xu, D. Kempe, and Y. Liu. Learning influence functions from incomplete observations. In Advances in Neural Information Processing Systems, pages 2073–2081, 2016.

[22] L. Hu, B. Wilder, A. Yadav, E. Rice, and M. Tambe. Activating the "breakfast club": Modeling influence spread in natural-world social networks. arXiv preprint arXiv:1710.00364, 2017.

[23] D. Katz and R. L. Kahn. The Social Psychology of Organizations, volume 2.
Wiley, New York, 1978.

[24] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pages 137–146, New York, NY, USA, 2003. ACM.

[25] D. Kempe, J. Kleinberg, and É. Tardos. Influential nodes in a diffusion model for social networks. In Automata, Languages and Programming, pages 1127–1138. Springer, 2005.

[26] S. Lei, S. Maniu, L. Mo, R. Cheng, and P. Senellart. Online influence maximization. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pages 645–654, New York, NY, USA, 2015. ACM.

[27] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.

[28] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.

[29] H. Narasimhan, D. C. Parkes, and Y. Singer. Learnability of influence in networks. In Advances in Neural Information Processing Systems, pages 3186–3194, 2015.

[30] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.

[31] G. Neu and G. Bartók. An efficient algorithm for learning with semi-bandit feedback. In International Conference on Algorithmic Learning Theory, pages 234–248. Springer, 2013.

[32] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.

[33] J. Olkhovskaya, G. Neu, and G. Lugosi. Online influence maximization with local observations. arXiv preprint arXiv:1805.11022, 2018.

[34] S. Seiler, S. Yao, and W. Wang.
Does online word of mouth increase demand? (And how?) Evidence from a natural experiment. Marketing Science, 2017.

[35] O. Sporns. The human connectome: A complex network. Annals of the New York Academy of Sciences, 1224(1):109–125, 2011.

[36] M. Streeter and D. Golovin. An online algorithm for maximizing submodular functions. Technical report, pages 1–35, 2007.

[37] M. Trusov, R. E. Bucklin, and K. Pauwels. Effects of word-of-mouth versus traditional marketing: findings from an internet social networking site. Journal of Marketing, 73(5):90–102, 2009.

[38] S. Vaswani, L. Lakshmanan, and M. Schmidt. Influence maximization with bandits. arXiv preprint arXiv:1503.00024, pages 1–12, 2015.

[39] Q. Wang and W. Chen. Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. In Advances in Neural Information Processing Systems, pages 1161–1171, 2017.

[40] D. J. Watts and P. S. Dodds. Influentials, networks, and public opinion formation. Journal of Consumer Research, 34(4):441–458, 2007.

[41] Z. Wen, B. Kveton, and M. Valko. Online influence maximization under independent cascade model with semi-bandit feedback. Advances in Neural Information Processing Systems, 2017.

A Related work

Here, we comment more thoroughly on important relationships between our problem setting and various online games existing in the learning theory literature. A key difference between the graph contagion setting and the standard multi-armed bandit setting is that in the latter case, the only information available to the player on each round is the reward obtained as a consequence of his or her actions.
On the other hand, slightly more information is available to the player in our setting, since the player may often deduce additional information about which vertices would have been influenced for a different choice of source vertices, based on observing the scope of the influence process for a particular choice of source vertices. As a concrete example, the player knows that exactly the same set of nodes would have been influenced if he or she had chosen to influence a different seed node in the same connected component of the subgraph induced by the influenced nodes and adversarially chosen edges.

Online games with partial monitoring [6] or graph-based feedback [2] generalize the bandit setting to repeated games in which the player may observe feedback corresponding to various subsets of other actions, in addition to or instead of the feedback corresponding to his or her own actions. Although such games resemble our problem setting, the possible actions available to the player in our case correspond to subsets of nodes of size $k$, leading to a rather complicated feedback graph that is additionally affected by the adversary's actions. Another online game with a similar flavor is the combinatorial prediction setting [4], where the player is allowed to pull a subset of arms on each round, and observes a loss equal to the sum of losses of the pulled arms in the case of bandit feedback, or a subvector of losses corresponding to the pulled arms in the case of semi-bandit feedback [31]. Our problem may be cast as a type of combinatorial prediction game with a feedback graph that varies from round to round and is unknown to the player.
Note that the combinatorial game with edge semi-bandit feedback has been studied recently in the influence maximization literature [13, 38, 41, 39, 33], but these results only apply to stochastic adversaries, rather than the more general non-stochastic framework we study in this paper. Edge semi-bandit feedback refers to the fact that in a directed graph, the player receives feedback about the transmission status of different subsets of edges, corresponding to the outgoing edges from the nodes he or she chooses to seed on each round.

B Proofs

We now outline the proofs of our main results.

B.1 Upper bounds for adversarial models

In this section, we prove our upper bounds. To this end, we describe the OSMD algorithm, which generates a sequence of probability distributions $\{p_t\}$ to be employed by the player on successive rounds. Let $\Delta_n \subseteq \mathbb{R}^n$ denote the probability simplex.

Online Stochastic Mirror Descent (OSMD) with loss estimates $\{\hat{\ell}_{i,t}\}$

Given: A Legendre function $F$ defined on $\mathbb{R}^n$, with associated Bregman divergence
$$D_F(p, q) = F(p) - F(q) - (p - q)^T \nabla F(q),$$
and a learning rate $\eta > 0$.

Output: A stochastic player strategy $\{S_t\}$.

Let $p_1 \in \arg\min_{p \in \Delta_n} F(p)$. For each round $t = 1, \ldots, T$:

(1) Draw a vertex $S_t$ from the distribution $p_t$.
(2) Compute the vector of loss estimates $\hat{\ell}_t = \{\hat{\ell}_{i,t}\}$.
(3) Set $w_{t+1} = \nabla F^*\left(\nabla F(p_t) - \eta \hat{\ell}_t\right)$, where $F^*$ is the convex conjugate of $F$.
(4) Compute the new distribution $p_{t+1} = \arg\min_{p \in \Delta_n} D_F(p, w_{t+1})$.

In general, the OSMD algorithm is defined with respect to a compact, convex set $K \subseteq \mathbb{R}^n$. The updates are characterized by noisy estimates of the gradient of the loss function, which we may conveniently define to be $\hat{\ell}_t$ in the present scenario. For more details and generalizations, we refer the reader to [9].
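To make steps (3) and (4) concrete, here is a minimal numerical sketch of one OSMD update for the particular potential used later, $F_\psi(p) = -2\sum_i \sqrt{p_i}$ (so $\nabla F(p)_i = -p_i^{-1/2}$ and $\nabla F^*(\theta)_i = \theta_i^{-2}$). The bisection-based Bregman projection and all function names are our own illustration, not the authors' implementation.

```python
import numpy as np

def bregman_project(w, iters=200):
    """Bregman projection of w onto the simplex for F(p) = -2 * sum(sqrt(p)).

    Stationarity of the projection gives p_i = (w_i**-0.5 - lam)**-2 for a
    Lagrange multiplier lam, which we locate by bisection so p sums to one.
    """
    theta = w ** -0.5
    lo, hi = theta.min() - 1e8, theta.min() - 1e-12
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum((theta - lam) ** -2.0) < 1.0:
            lo = lam          # total mass too small: move lam up
        else:
            hi = lam          # total mass too large: move lam down
    p = (theta - 0.5 * (lo + hi)) ** -2.0
    return p / p.sum()        # tiny renormalization for numerical safety

def osmd_step(p, loss_est, eta):
    """One round of OSMD (steps (3)-(4)) with the q = 2 potential."""
    grad = -p ** -0.5                 # grad F(p), componentwise
    theta = grad - eta * loss_est     # mirror step in the dual space
    w = theta ** -2.0                 # w = grad F*(theta), the inverse map
    return bregman_project(w)
```

For nonnegative loss estimates the dual point stays in the domain of $\nabla F^*$, and coordinates with larger estimated loss receive less probability on the next round.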
We will use the following result:

Proposition 1 (Theorem 5.10 of [9]). Let the loss functions $\{\ell_{i,t}\}$ be nonnegative and bounded by 1. The strategy $S$ corresponding to the OSMD algorithm with loss estimates $\hat{\ell}$, learning rate $\eta > 0$, and Legendre function $F_\psi$, where $\psi$ is a 0-potential, satisfies the pseudo-regret bound
$$\sup_{A \in \mathcal{A}} R_T(A, S) \le \frac{\sup_{p \in \Delta_n} F_\psi(p) - F_\psi(p_1)}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \sum_{i=1}^n \mathbb{E}\left[\frac{\hat{\ell}_{i,t}^2}{(\psi^{-1})'(p_{i,t})}\right].$$

We formally define 0-potentials and the associated Legendre functions in Appendix C. In our analysis, we take $\psi(x) = \frac{1}{x^2}$, yielding the Legendre function $F_\psi(x) = -2\sum_{i=1}^n x_i^{1/2}$. The pseudo-regret bound in Proposition 1 may then be analyzed and bounded accordingly in various settings of interest. Details for the proof of Theorem 1 are also provided in Appendix C.

B.2 Lower bounds for adversarial models

We now turn to establishing the lower bounds. The proofs of Theorems 2 and 5 are based on the same general strategy, which is summarized in the following proposition. To unify our results with standard bandit notation [9], we use the shorthand $X_{i,t} = f(A_t, \{i\})$ to denote the reward obtained at time $t$ when the player chooses $S_t = \{i\}$. Then
$$R_T(A, S) = \max_{1 \le i \le n} \mathbb{E}_{A,S} \sum_{t=1}^T (X_{i,t} - X_{S_t,t}).$$

Proposition 2. Consider a deterministic player strategy $S \in \mathcal{P}_d$. Let $A_0, A_1, \ldots, A_n$ be stochastic adversarial strategies such that for each $A_i$, the set of edges played at time $t$ is independent of the past actions of the adversary. Let $P_0, P_1, \ldots, P_n$ denote the corresponding measures on the feedback $I^T$, allowing for possible randomization only in the strategy of the adversary. Let $\mathbb{E}_i$ denote the expectation with respect to $P_i$. Suppose
$$r \le \min_{j \ne i} \mathbb{E}_i[X_{i,t} - X_{j,t}], \quad \forall 1 \le t \le T, \quad (5)$$
and
$$\sum_{i=1}^n KL(P_0, P_i) \le D. \quad (6)$$
Then
$$rT\left(\frac{n-1}{n} - \sqrt{\frac{D}{2n}}\right) \le \frac{1}{n}\sum_{i=1}^n \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}). \quad (7)$$
In particular, if the bounds (5) and (6) hold uniformly for all choices of $S \in \mathcal{P}_d$, then
$$rT\left(\frac{n-1}{n} - \sqrt{\frac{D}{2n}}\right) \le \inf_{S \in \mathcal{P}} \sup_{A \in \mathcal{A}} R_T(A, S). \quad (8)$$

Remark 4. We remark briefly on the roles of the strategies $\{A_i\}$ appearing in Proposition 2. In practice, the strategies are chosen to be similar, except that selecting $i$ as the source node is slightly more advantageous when the adversary uses strategy $A_i$. The strategy $A_0$ is a baseline strategy that treats all nodes identically. Thus, the lower bound provided by Proposition 2 is the product of the cost of an incorrect choice of the source vertex, given by $r$, and a factor that determines how easy it is to distinguish the adversary strategies from each other, which depends on $D$.

Proof of Proposition 2. We follow the method used in the proof of Theorem 3.5 in [9]. We first show how to obtain the bound (8) from the set of uniform bounds (7). Note that for any $S \in \mathcal{P}$, we have
$$\sup_{A \in \mathcal{A}} R_T(A, S) = \sup_{A \in \mathcal{A}} \max_{1 \le i \le n} \mathbb{E}_{A,S} \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge \max_{1 \le j \le n} \max_{1 \le i \le n} \mathbb{E}_S \mathbb{E}_{A_j} \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) = \max_{1 \le j \le n} \max_{1 \le i \le n} \mathbb{E}_S \mathbb{E}_j \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge \max_{1 \le i \le n} \mathbb{E}_S \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge \mathbb{E}_S\left[\frac{1}{n}\sum_{i=1}^n \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t})\right],$$
where we have used the fact that the maximum is at least as large as the average in the final inequality. Since any player strategy in $\mathcal{P}$ lies in the convex hull of deterministic player strategies, a uniform bound (7) over $\mathcal{P}_d$ implies that inequality (8) holds as well.

We now turn to the proof of inequality (7). The idea is to show that on average, the player incurs a certain loss whenever the wrong source vertex is played, and this event must happen sufficiently often.
We first write
$$\mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) = \sum_{t=1}^T \mathbb{E}_i\left[\sum_{j \ne i} (X_{i,t} - X_{j,t}) \mathbf{1}\{S_t = \{j\}\}\right] = \sum_{j \ne i} \sum_{t=1}^T \mathbb{E}_i[X_{i,t} - X_{j,t}] \, \mathbb{E}_i\left[\mathbf{1}\{S_t = \{j\}\}\right].$$
In the last equality, we have used the assumption that the adversary's action at each time is independent of the past to conclude that the difference in rewards $X_{i,t} - X_{j,t}$ (which depends on the adversary's action at time $t$) is independent of the indicator $\mathbf{1}\{S_t = \{j\}\}$ (which depends on the sequence of feedback received up to time $t-1$). Using the bound (5), it follows that
$$\mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge r \sum_{j \ne i} \mathbb{E}_i[T_j],$$
where $T_i = |\{t : S_t = \{i\}\}|$ denotes the number of times vertex $i$ is selected as the source. Now let $U_T$ denote a vertex drawn according to the distribution $q_T = (q_{1,T}, \ldots, q_{n,T})$, where $q_{i,T} = \frac{T_i}{T}$. The derivations above imply that
$$\mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge rT \sum_{j \ne i} P_i\{U_T = j\} = rT\left(1 - P_i\{U_T = i\}\right),$$
so
$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge rT\left(1 - \frac{1}{n}\sum_{i=1}^n P_i\{U_T = i\}\right). \quad (9)$$
By Pinsker's inequality,
$$P_i\{U_T = i\} \le P_0\{U_T = i\} + \sqrt{\tfrac{1}{2} KL(P'_0, P'_i)},$$
where $P'_i$ denotes the distribution of $U_T$ under the adversarial strategy $A_i$. By Jensen's inequality, we therefore have
$$\frac{1}{n}\sum_{i=1}^n P_i\{U_T = i\} \le \frac{1}{n} + \frac{1}{n}\sum_{i=1}^n \sqrt{\tfrac{1}{2} KL(P'_0, P'_i)} \le \frac{1}{n} + \sqrt{\frac{1}{2n}\sum_{i=1}^n KL(P'_0, P'_i)}. \quad (10)$$
Finally, the chain rule for KL divergence implies that
$$KL(P'_0, P'_i) = KL(P_0, P_i) + \sum_{I^T} P_0\{I^T\} \, KL\left(P'_0\{\cdot \mid I^T\}, P'_i\{\cdot \mid I^T\}\right). \quad (11)$$
Note that conditional on $I^T$, the distribution of $U_T$ is the same under $P'_0$ and $P'_i$, since the player uses a deterministic strategy. Thus, equation (11) implies that
$$\sum_{i=1}^n KL(P'_0, P'_i) = \sum_{i=1}^n KL(P_0, P_i) \le D. \quad (12)$$
Combining inequalities (9), (10), and (12), we arrive at the desired result (7).
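The Pinsker step used above can be sanity-checked numerically. The snippet below (our own illustration, not part of the proof) verifies that the total variation distance between two Bernoulli measures is bounded by $\sqrt{KL/2}$, which is the form of the inequality applied to $P'_0$ and $P'_i$.

```python
import math

def kl_bern(p, q):
    """KL divergence KL(Ber(p) || Ber(q)) in nats, for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Pinsker's inequality: TV(P, Q) <= sqrt(KL(P || Q) / 2).  For Bernoulli
# measures, TV(Ber(p), Ber(q)) = |p - q|.
for p in (0.1, 0.3, 0.5, 0.9):
    for q in (0.2, 0.5, 0.8):
        tv = abs(p - q)
        assert tv <= math.sqrt(kl_bern(q, p) / 2.0) + 1e-12
```

The case $p = 0.5$, $q = 0.2$ is nearly tight (TV $= 0.3$ versus a bound of about $0.31$), which is why Pinsker's inequality is the standard tool at this point in bandit lower-bound arguments.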
To prove Theorems 2 and 5, it thus remains to find an appropriate set of strategies $\{A_0, A_1, \ldots, A_n\}$ and verify the bounds (5) and (6). Details for the proofs are provided in Appendix E.1.

C Additional online upper bound proofs

In this Appendix, we provide proofs for the pseudo-regret of player strategies based on the OSMD algorithm. We begin with some preliminaries.

C.1 Preliminaries

We first describe the function $F_\psi$. Recall that a continuous function $F : \mathcal{D} \to \mathbb{R}$ is a Legendre function if $F$ is strictly convex, $F$ has continuous first partial derivatives on $\mathcal{D}$, and $\lim_{x \to \partial\mathcal{D}} \|\nabla F(x)\| = \infty$. The analysis in this paper concerns a very specific type of Legendre function associated with a 0-potential, as described in the following definition:

Definition 1. A function $\psi : (-\infty, a) \to \mathbb{R}_+$ is called a 0-potential if it is convex, continuously differentiable, and satisfies the following conditions:
$$\lim_{x \to -\infty} \psi(x) = 0, \quad \lim_{x \to a} \psi(x) = \infty, \quad \psi' > 0, \quad \int_0^1 |\psi^{-1}(s)| \, ds < \infty.$$
We additionally define the associated function $F_\psi$ on $(0, \infty)^n$ by
$$F_\psi(x) = \sum_{i=1}^n \int_0^{x_i} \psi^{-1}(s) \, ds.$$

In particular, we will consider the 0-potential $\psi(x) = (-x)^{-q}$. Then $\psi^{-1}(x) = -x^{-1/q}$, so
$$F_\psi(x) = -\frac{q}{q-1} \sum_{i=1}^n x_i^{\frac{q-1}{q}}.$$
Specifically, we will consider the case $q = 2$ (the same analysis could be performed with respect to $q > 1$, and then the final bound could be optimized over $q$). To employ Proposition 1, we need to bound two summands. The following simple lemma bounds the first term:

Lemma 1. When $\psi(x) = \frac{1}{x^2}$, we have the bound
$$F_\psi(p) - F_\psi(p_1) \le 2\sqrt{n}, \quad \forall p \in \Delta_n.$$

Proof. Since $F_\psi(p) \le 0$ and $\|p_1\|_1 = 1$, Hölder's inequality implies that
$$F_\psi(p) - F_\psi(p_1) \le 2\sum_{i=1}^n p_{1,i}^{1/2} \le 2n^{1/2}.$$
This completes the proof of the lemma.
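Lemma 1 is easy to check numerically. In the sketch below (our own illustration) we use the fact that $p_1$, the minimizer of $F_\psi$ over the simplex, is the uniform distribution, so $F_\psi(p_1) = -2\sqrt{n}$, and verify the bound on random points of the simplex.

```python
import numpy as np

rng = np.random.default_rng(0)

def F_psi(x):
    """Legendre function for the 0-potential psi(x) = (-x)^(-2):
    F_psi(x) = -2 * sum_i sqrt(x_i)."""
    return -2.0 * np.sum(np.sqrt(x))

n = 50
p1 = np.full(n, 1.0 / n)              # uniform = argmin of F_psi over simplex
for _ in range(1000):
    p = rng.dirichlet(np.ones(n))     # random point of the simplex
    # Lemma 1: F_psi(p) - F_psi(p1) <= 2 * sqrt(n)
    assert F_psi(p) - F_psi(p1) <= 2.0 * np.sqrt(n) + 1e-9
```

Since $F_\psi \le 0$ on the simplex, the gap is in fact at most $2\sqrt{n} - 2$, attained when $p$ is a point mass.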
All that remains is to analyze the loss-specific term appearing in Proposition 1 and choose $\eta$ appropriately.

C.2 Proof of Theorem 1

We first prove the following lemma:

Lemma 2. We have the inequality
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})}\right] \le \sqrt{2n}, \quad \forall 1 \le t \le T. \quad (13)$$

Proof. Let $\mathcal{F}_t$ denote the sigma-field of all actions up to time $t$. We have
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})} \,\middle|\, \mathcal{F}_{t-1}\right] \overset{(a)}{=} 2\sum_{i=1}^n p_{i,t}^{3/2}\, \mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right] \overset{(b)}{\le} 2\left(\sum_{i=1}^n p_{i,t}\right)^{1/2} \left(\sum_{i=1}^n \left(p_{i,t}\, \mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right]\right)^2\right)^{1/2} = 2\left(\sum_{i=1}^n \left(p_{i,t}\, \mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right]\right)^2\right)^{1/2}, \quad (14)$$
where we have used the facts that $(\psi^{-1})'(x) = \frac{1}{2}x^{-3/2}$ and $p_t$ is measurable with respect to $\mathcal{F}_{t-1}$ to establish (a), and applied Hölder's inequality to obtain (b).

We now inspect the conditional expectation more closely. We have
$$\mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right] = \mathbb{E}\left[\left(\frac{1}{n}\sum_{j \ne i} \frac{\ell^t_{i,j}}{p_{i,t} + p_{j,t}} Z_{ij}\right)^2 \,\middle|\, \mathcal{F}_{t-1}\right] = \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i}\sum_{k \ne i} \frac{\ell^t_{i,j}\,\ell^t_{i,k}}{(p_{i,t} + p_{j,t})(p_{i,t} + p_{k,t})} Z_{ij} Z_{ik} \,\middle|\, \mathcal{F}_{t-1}\right] = \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i}\sum_{k \ne i} \frac{\ell^t_{i,j}\,\ell^t_{i,k}}{(p_{i,t} + p_{j,t})(p_{i,t} + p_{k,t})} Z_i \,\middle|\, \mathcal{F}_{t-1}\right] + \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i} \frac{(\ell^t_{i,j})^2}{(p_{i,t} + p_{j,t})^2} Z_j \,\middle|\, \mathcal{F}_{t-1}\right],$$
where the third equality is due to the fact that $Z_{ij} Z_{ik}$ is 1 only when $i$ is the source vertex or $j = k$ is the source vertex.
Using the fact that $\ell^t_{i,j}$ is bounded by 1, we then obtain
$$\mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right] \le \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i}\sum_{k \ne i} \frac{Z_i}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})} \,\middle|\, \mathcal{F}_{t-1}\right] + \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i} \frac{Z_j}{(p_{i,t}+p_{j,t})^2} \,\middle|\, \mathcal{F}_{t-1}\right] \le \frac{1}{n^2}\sum_{j \ne i}\sum_{k \ne i} \frac{p_{i,t}}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})} + \frac{1}{n^2}\sum_{j \ne i} \frac{p_{j,t}}{(p_{i,t}+p_{j,t})^2} \le \frac{1}{n^2}\sum_{j \ne i}\sum_{k \ne i} \frac{p_{i,t}+p_{j,t}}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})} = \frac{1}{n^2}\sum_{j \ne i}\sum_{k \ne i} \frac{1}{p_{i,t}+p_{k,t}} \le \frac{1}{n}\sum_{k=1}^n \frac{1}{p_{i,t}+p_{k,t}}.$$
Combining this result with the bound (14), we have
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})} \,\middle|\, \mathcal{F}_{t-1}\right] \le 2\left(\sum_{i=1}^n \left(\frac{p_{i,t}}{n}\sum_{k=1}^n \frac{1}{p_{i,t}+p_{k,t}}\right)^2\right)^{1/2} = \frac{2}{n}\left(\sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \frac{p_{i,t}^2}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})}\right)^{1/2} \le \frac{2}{n}\left(n \sum_{i=1}^n \sum_{j=1}^n \frac{p_{i,t}}{p_{i,t}+p_{j,t}}\right)^{1/2}.$$
Now, we have the useful identity
$$\sum_{i=1}^n \sum_{k=1}^n \frac{a_i}{a_i + a_k} = \frac{n^2}{2} \quad (15)$$
for any positive sequence $\{a_i\}_{i=1}^n$. This may be seen via the following algebraic manipulations:
$$\sum_{i=1}^n \sum_{k=1}^n \frac{a_i}{a_i + a_k} = \sum_{i=1}^n \frac{a_i}{a_i + a_i} + \sum_{i=1}^n\sum_{k \ne i} \frac{a_i}{a_i + a_k} = \frac{n}{2} + \frac{1}{2}\sum_{i=1}^n\sum_{k \ne i} \frac{a_i}{a_i + a_k} + \frac{1}{2}\sum_{i=1}^n\sum_{k \ne i} \frac{a_k}{a_i + a_k} = \frac{n}{2} + \frac{1}{2}\sum_{i=1}^n\sum_{k \ne i} \frac{a_i + a_k}{a_i + a_k} = \frac{n}{2} + \frac{n(n-1)}{2} = \frac{n^2}{2}.$$
Appealing to equation (15), we may replace the double sum by $\frac{n^2}{2}$ and simplify the bound:
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})} \,\middle|\, \mathcal{F}_{t-1}\right] \le \frac{2}{n}\left(\frac{n^3}{2}\right)^{1/2} = \sqrt{2n}.$$
Taking an additional expectation and using the tower property, we arrive at the desired inequality.

Combining Lemmas 1 and 2 with Proposition 1, we then have
$$\sup_{A \in \mathcal{A}} R_T(A, S^{sym}_{OSMD}) \le \frac{2\sqrt{n}}{\eta} + \eta T \sqrt{\frac{n}{2}}.$$
Optimizing over $\eta$, we take $\eta = 2^{3/4} T^{-1/2}$, which establishes the desired bound.

D Adversarial influence maximization with multiple sources

In this Appendix, we prove results concerning multiple influence sources.
First, we need to give the precise algorithmic details of the online greedy algorithm. We assume the player is allowed to choose source vertices sequentially at time $t$ and observes the corresponding edge feedback immediately after each selection. The algorithm, inspired by [36], is outlined below:

Online Greedy Algorithm

Given: A single-source player strategy $S^1$.

Output: A $k$-source player strategy $S^k = \{S_t\}_{1 \le t \le T}$.

For each $t = 1, \ldots, T$, choose $S_t = \{v_{1,t}, \ldots, v_{k,t}\}$ sequentially, as follows:

(1) Select $v_{1,t}$ according to the single-source strategy $S^1$.
(2) For each $i > 1$, select $v_{i,t}$ according to the single-source strategy $S^1$, based on the edge feedback $I(A_t, \{v_{1,t}, \ldots, v_{i,t}\}) \setminus I(A_t, \{v_{1,t}, \ldots, v_{i-1,t}\})$.

In other words, the Online Greedy Algorithm runs the player's strategy for single-source selection $k$ times in parallel, with losses computed marginally for each successively chosen vertex. The "greedy" component of the algorithm corresponds to the fact that the player selects the set of $i$th source vertices in the best possible way based on the information available (i.e., according to the single-source strategy that is designed to incur a small pseudo-regret). Note that the feedback $I(A_t, \{v_{1,t}, \ldots, v_{i,t}\}) \setminus I(A_t, \{v_{1,t}, \ldots, v_{i-1,t}\})$ is indeed computable by the player when choosing the $i$th vertex at round $t$, since the player has already observed $I(A_t, \{v_{1,t}, \ldots, v_{i-1,t}\})$ after the first $i-1$ source nodes are selected.

Fix an adversarial strategy $A$, and define the functions $f_t(S_t) = f(A_t, S_t)$ and $F(S) = \sum_{t=1}^T f_t(S_t)$. Thus, $F(S)$ is the total reward for strategy $S = \{S_t\}$. In the stochastic setting, when $T = 1$, many influence-maximization analyses exploit the submodularity of $f_t$ under certain stochastic assumptions on $A_t$.
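The marginal-feedback bookkeeping of one round of the Online Greedy Algorithm can be sketched as follows. The reachability routine stands in for the contagion process, and the `FixedChoice` stub stands in for the single-source strategy $S^1$; both are our own hypothetical illustrations, not the authors' implementation.

```python
from collections import deque

def influenced(edges, seeds):
    r"""Vertices reachable from `seeds` along the adversary's open edges,
    i.e. I(A_t, S) for an undirected contagion."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = set(seeds), deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

class FixedChoice:
    """Toy single-source 'strategy' that always proposes the same vertex
    (a stand-in for the OSMD-based strategy S^1; illustration only)."""
    def __init__(self, v):
        self.v = v
        self.feedback = None
    def select(self):
        return self.v
    def update(self, v, marginal):
        self.feedback = marginal  # a real strategy would update losses here

def online_greedy_round(strategies, edges):
    r"""One round t: each of the k single-source strategies picks a seed and
    is fed only the marginal influence of its own choice, namely
    I(A_t, {v_1..v_i}) minus I(A_t, {v_1..v_{i-1}})."""
    chosen, prev = [], set()
    for strat in strategies:          # i = 1, ..., k, sequentially
        v = strat.select()
        chosen.append(v)
        cur = influenced(edges, chosen)
        strat.update(v, cur - prev)   # marginal feedback for the i-th seed
        prev = cur
    return chosen, prev
```

Running $k$ copies of the single-source strategy on these marginal sets is exactly what lets the analysis charge each copy its own single-source pseudo-regret.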
In the bandit setting, we wish to establish an analogous result for $F$, in order to establish regret bounds when the player chooses source vertices according to a greedy algorithm. Since $S \in (2^V)^T$, the function $F$ is not technically a set function. However, we may identify each player strategy $S$ with an element $S^* \in 2^{(V^T)}$, and define $F^*(S^*) = F(S)$. Here,
$$V^T := \left\{ v^T = \left(v(1), v(2), \ldots, v(T)\right) \;\middle|\; v(i) \in V, \text{ for } 1 \le i \le T \right\},$$
and $S^* = \{u^T_1, \ldots, u^T_k\} \in 2^{(V^T)}$ corresponds to the strategy that selects the source nodes $\{u_1(t), \ldots, u_k(t)\}$ in round $t$.

In more detail, let $S^{(i)}_t = \{s_t(1), \ldots, s_t(i)\} \subseteq V$ denote the set of the first $i$ seed vertices in round $t$, where $S^{(0)}_t = \emptyset$. Then we can write
$$f_t(S_t) = \sum_{i=1}^k f_t\left(S^{(i)}_t\right) - f_t\left(S^{(i-1)}_t\right).$$
One can then write the total reward as
$$F(S) = \sum_{t=1}^T \sum_{i=1}^k f_t\left(S^{(i)}_t\right) - f_t\left(S^{(i-1)}_t\right) = \sum_{i=1}^k \sum_{t=1}^T f_t\left(S^{(i)}_t\right) - f_t\left(S^{(i-1)}_t\right).$$
If we define
$$f^*_i(S^*) = F^*\left(\{u^T_1, \ldots, u^T_i\}\right) - F^*\left(\{u^T_1, \ldots, u^T_{i-1}\}\right) = \sum_{t=1}^T f_t\left(\{u_j(t) : j \le i\}\right) - f_t\left(\{u_j(t) : j \le i-1\}\right)$$
and $F^*(S^*) = \sum_{i=1}^k f^*_i(S^*)$, then we indeed get the desired equality $F(S) = F^*(S^*)$, while also switching our summation for submodularity to be over the $i$th vertices as opposed to the $t$th round.

We first show that $F^*$ is a monotone, submodular function:

Lemma 3. The function $f_t(S_t) = f(A_t, S_t)$ is monotone and submodular, for every fixed $A_t$.

Proof. It is trivial to see that $f_t$ is monotone, so we focus on proving submodularity. Our goal is to show that for a fixed $A_t$, and for any $S_t \subseteq S'_t$ and $u \in V \setminus S'_t$, we have
$$f_t(S'_t \cup \{u\}) - f_t(S'_t) \le f_t(S_t \cup \{u\}) - f_t(S_t). \quad (16)$$
Let $Z_{S_t,v}$ denote the indicator of an open path between a source node $s \in S_t$ and $v \in V$, where by convention, $Z_{S_t,v} = 1$ if $v \in S_t$. Note that $f_t(S_t) = \frac{1}{n}\sum_{v \in V} Z_{S_t,v}$. We will show that for $v \notin S'_t \cup \{u\}$, we have
$$Z_{S'_t \cup \{u\},v} - Z_{S'_t,v} \le Z_{S_t \cup \{u\},v} - Z_{S_t,v}. \quad (17)$$
Summing over $v \in (S'_t \cup \{u\})^c$, using $Z_{S_t \cup \{u\},v} - Z_{S_t,v} \ge 0$ for $v \in S'_t \setminus S_t$, and dividing by $n$ will yield the desired inequality (16).

We have three cases to consider: In the first case, an open path exists from some $s \in S'_t$ to $v$. Then the left side of inequality (17) is equal to 0, while the right-hand side is at least 0 by monotonicity. In the second case, an open path does not exist from any $s \in S'_t$ to $v$, but an open path exists from $u$ to $v$. Then both sides of inequality (17) are equal to 1. Finally, if no open path exists from any $s \in S'_t \cup \{u\}$ to $v$, then both sides of inequality (17) are equal to 0. This completes the proof.

Proposition 3. The function $F^*$ is monotone and submodular.

Proof. The properties are essentially immediate from Lemma 3. Let $P$ and $Q$ be elements of $(2^V)^T$ such that $P^* \subseteq Q^*$. Then
$$F^*(P^*) = \sum_{t=1}^T f_t(P_t) \le \sum_{t=1}^T f_t(Q_t) = F^*(Q^*),$$
proving monotonicity. Similarly, if $S \in (2^V)^T$, we have
$$F^*(S^* \cup Q^*) - F^*(Q^*) = \sum_{t=1}^T \left(f_t(S_t \cup Q_t) - f_t(Q_t)\right) \le \sum_{t=1}^T \left(f_t(S_t \cup P_t) - f_t(P_t)\right) = F^*(S^* \cup P^*) - F^*(P^*),$$
proving submodularity.

By the standard greedy approximation ([24, 30]), we then have
$$\left(1 - \frac{1}{e}\right) \max_{|S^*| \le K} F^*(S^*) \le F^*(G^*),$$
where $G^*$ is a set of cardinality $K \ge 1$ constructed via a sequential greedy algorithm. However, this result is not immediately applicable to the online bandit setting, since we do not have direct access to $F^*$.
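The coverage structure behind Lemma 3 can be spot-checked by brute force on a small graph: for a fixed set of open edges, the normalized reachability function satisfies both monotonicity and the diminishing-returns inequality (16). The check below is our own illustration, not part of the proof.

```python
from itertools import combinations

def f_t(edges, n, seeds):
    """Normalized influence f_t(S) = |I(A_t, S)| / n for fixed open edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) / n

n = 6
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]   # one fixed adversarial choice A_t
verts = set(range(n))
for S in map(set, combinations(verts, 2)):
    for Sp in map(set, combinations(verts, 3)):
        if not S <= Sp:
            continue
        for u in verts - Sp:
            # diminishing returns: f(S' + u) - f(S') <= f(S + u) - f(S)
            assert f_t(edges, n, Sp | {u}) - f_t(edges, n, Sp) \
                   <= f_t(edges, n, S | {u}) - f_t(edges, n, S) + 1e-12
            # monotonicity: S subset of S' implies f(S) <= f(S')
            assert f_t(edges, n, Sp) >= f_t(edges, n, S) - 1e-12
```

Coverage-type functions of this form are the classical example of monotone submodular functions, which is what Proposition 3 then lifts to $F^*$.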
Thus, we can only hope to obtain an approximate greedy maximizer $\tilde{G}^*$, and we wish to derive theoretical guarantees for $F^*(\tilde{G}^*)$. Our result relies on the following general proposition:

Proposition 4 (Theorem 6 from [36]). Let $f : 2^V \to \mathbb{R}$ be a monotone, submodular function such that $f(\emptyset) = 0$. Consider a set $D \subseteq V$ and a sequence of error tolerances $\{\epsilon_i\}$, and suppose $\{G^{\epsilon}_i\}$ is constructed in an approximately greedy manner, such that $G^{\epsilon}_0 = \emptyset$ and $G^{\epsilon}_i = G^{\epsilon}_{i-1} \cup \{g_i\}$, where
$$\max_{d \in D} f(G^{\epsilon}_{i-1} \cup \{d\}) - f(G^{\epsilon}_{i-1}) \le f(G^{\epsilon}_{i-1} \cup \{g_i\}) - f(G^{\epsilon}_{i-1}) + \epsilon_i.$$
Then for any $K \ge 1$, we have
$$\left(1 - \frac{1}{e}\right) \max_{S^* \in D_K} f(S^*) - f(G^{\epsilon}_K) \le \sum_{i=1}^K \epsilon_i,$$
where $D_K$ consists of subsets of $D$ containing at most $K$ elements.

Proposition 4 ensures that for submodular functions, successive errors $\{\epsilon_i\}$ in a sequential greedy algorithm only accumulate additively. The proof is provided in [36], but we include a proof in Appendix D.2 for completeness.

D.1 Proof of Theorem 3

Suppose $A \in \mathcal{A}$. We will apply Proposition 4 with $f = \mathbb{E}_A F^*$, $V = V^T$, and $K = k$. Note that $\mathbb{E}_A F^*$ inherits monotonicity and submodularity from $F^*$. Also let $D = \{(v, \ldots, v) : v \in V\} \subseteq V^T$ denote the diagonal of $V^T$. For a (non-random) $k$-source strategy $S^*$ with $|S^*_t| = k$ for all $t$, we use the notation $S^* = \{S^*_1, \ldots, S^*_k\}$, where $S^*_i$ corresponds to the set of $i$th vertices chosen during the $T$ rounds. Proposition 4 immediately gives
$$\left(1 - \frac{1}{e}\right) \max_{S^* \in D} \mathbb{E}_A F^*(S^*) - \mathbb{E}_A F^*(G^{\epsilon}_k) \le \sum_{i=1}^k \max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right],$$
where the sets $\{G^{\epsilon}_i\}$ are chosen in an approximately greedy manner, and the $\epsilon_i$ are upper bounded by the regret for the $i$th instance of the single-source algorithm.
In particular, we consider $\{G^{\epsilon}_i\}$ to be the choice of $i$th vertices $S^*_i$ corresponding to the player's choice under the strategy $S^1$. We now take an expectation with respect to possible randomization in the player's strategy, to obtain
$$R^{(1-1/e)}_T(A, S) \le \sum_{i=1}^k \mathbb{E}_S\left[\max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right]\right] \overset{(a)}{=} \sum_{i=1}^k \mathbb{E}_{S[1:i]}\left[\max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right]\right] \overset{(b)}{=} \sum_{i=1}^k \mathbb{E}_{S[1:i-1]}\left[\mathbb{E}_{S_i} \max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right]\right].$$
Here, $\mathbb{E}_{S[1:i]}$ denotes the expectation with respect to the first $i$ vertices played, and the equality in (a) holds because the set of $i$th vertices played depends only on the sets of the first $i$ vertices played. The equality in (b) holds because the set $G^{\epsilon}_{i-1}$, and hence the choice of $d_i$, does not depend on the selection of $i$th vertices. Furthermore, the inner expression is simply the pseudo-regret of strategy $S^1$. By Theorem 1, this is bounded by $2^{5/4}\sqrt{Tn}$. Summing up, we obtain the desired result.

D.2 Proof of Proposition 4

We begin with two supporting lemmas:

Lemma 4. For any $P \subseteq V$ and $Q \subseteq D$, we have
$$f(P \cup Q) \le f(P) + |Q| \max_{v \in D} \left[f(P \cup \{v\}) - f(P)\right].$$

Proof. We proceed by induction on $|Q|$. The case $|Q| = 1$ is immediate. Now suppose the statement is true for all $|Q| \le k$, where $k \ge 1$. Let $c \in D$, and suppose $Q \subseteq D$ has cardinality $k$. Then
$$f(P \cup (Q \cup \{c\})) \overset{(a)}{\le} f(P \cup \{c\}) + |Q| \max_{d \in D} \left[f((P \cup \{c\}) \cup \{d\}) - f(P \cup \{c\})\right] \overset{(b)}{\le} f(P) + \max_{d \in D} \left[f(P \cup \{d\}) - f(P)\right] + |Q| \max_{d \in D} \left[f(P \cup \{d\}) - f(P)\right] = f(P) + |Q \cup \{c\}| \max_{d \in D} \left[f(P \cup \{d\}) - f(P)\right],$$
where (a) follows from the induction hypothesis and (b) follows from the induction hypothesis and submodularity.
This completes the induction and proves the lemma.

Lemma 5. Let $\delta_i := f(G^{\epsilon}_i) - f(G^{\epsilon}_{i-1})$. For any $Q \subseteq D$, we have
$$f(Q) \le f(G^{\epsilon}_{i-1}) + |Q|(\delta_i + \epsilon_i).$$

Proof. Using Lemma 4 and monotonicity of $f$, we have
$$f(Q) \le f(G^{\epsilon}_{i-1} \cup Q) \le f(G^{\epsilon}_{i-1}) + |Q| \max_{d \in D} \left[f(G^{\epsilon}_{i-1} \cup \{d\}) - f(G^{\epsilon}_{i-1})\right] \le f(G^{\epsilon}_{i-1}) + |Q| \left(f(G^{\epsilon}_i) - f(G^{\epsilon}_{i-1}) + \epsilon_i\right) = f(G^{\epsilon}_{i-1}) + |Q|(\delta_i + \epsilon_i),$$
completing the proof.

We now define $\Delta_i := \max_{S^* \in D_K} f(S^*) - f(G^{\epsilon}_{i-1})$. By Lemma 5, we have
$$\max_{S^* \in D_K} f(S^*) \le f(G^{\epsilon}_{i-1}) + K(\delta_i + \epsilon_i).$$
Subtracting $f(G^{\epsilon}_{i-1})$, we obtain $\Delta_i \le K(\delta_i + \epsilon_i) = K(\Delta_i - \Delta_{i+1} + \epsilon_i)$, so
$$\Delta_{i+1} \le \Delta_i \left(1 - \frac{1}{K}\right) + \epsilon_i.$$
Applying this inequality recursively, we see that
$$\Delta_{K+1} \le \Delta_1 \prod_{i=1}^K \left(1 - \frac{1}{K}\right) + \sum_{i=1}^K \epsilon_i = \Delta_1 \left(1 - \frac{1}{K}\right)^K + \sum_{i=1}^K \epsilon_i \le \frac{\Delta_1}{e} + \sum_{i=1}^K \epsilon_i.$$
Rearranging and using the fact that $f(\emptyset) = 0$ completes the proof.

E Additional online lower bound proofs

The main goal of this Appendix is to prove Theorems 2 and 5. Some of the computations are rather lengthy and are therefore included in Appendix E.2.

E.1 Proofs of theorems

We first present the main components of the proofs, followed by detailed calculations involving the Kullback–Leibler divergence.

E.1.1 Proof of Theorem 2

Let the adversarial strategies $\{A_i\}$ be defined as follows: For each strategy, the adversary chooses a random subset of vertices, and opens all edges between vertices in the subset. For $A_i$, with $1 \le i \le n$, the adversary includes vertex $i$ with probability $\frac{c}{n}$, and includes each other vertex with probability $\frac{c}{n}(1-\delta)$, where $\delta \in (0, 1/2)$ is a small constant. Finally, for $A_0$, the adversary includes all vertices independently with probability $\frac{c}{n}(1-\delta)$. Successive actions of the adversary are i.i.d. across time steps.
We now derive the following lemmas, which will be used in Proposition 2:

Lemma 6. For any $i \ne j$ and $1 \le t \le T$, we have
$$\mathbb{E}_i[X_{i,t} - X_{j,t}] = \frac{(n-2)c^2}{n^3}(1-\delta)\delta.$$

Proof. Let $C_t$ be the clique chosen by the adversary at time $t$. Note that if $i, j \in C_t$ or $i, j \notin C_t$, the difference in rewards is 0. Thus, the only cases of interest in computing the expectation are when exactly one of $i$ or $j$ is in $C_t$. Then
$$\mathbb{E}_i[X_{i,t} - X_{j,t}] = \frac{1}{n}\,\mathbb{E}_i\left[(|C_t| - 1)\left(\mathbf{1}_{i \in C_t}\mathbf{1}_{j \notin C_t} - \mathbf{1}_{i \notin C_t}\mathbf{1}_{j \in C_t}\right)\right] = \frac{1}{n}\left(\frac{c}{n}(n-2)(1-\delta)\right)\left(\frac{c}{n}\right)\left[1 - \frac{c}{n}(1-\delta)\right] - \frac{1}{n}\left(\frac{c}{n}(n-2)(1-\delta)\right)\left[1 - \frac{c}{n}\right]\left[\frac{c}{n}(1-\delta)\right] = \frac{1}{n}(n-2)\left(\frac{c}{n}\right)^2(1-\delta)\left(\left[1 - \frac{c}{n}(1-\delta)\right] - \left[1 - \frac{c}{n}\right](1-\delta)\right) = \frac{(n-2)c^2}{n^3}(1-\delta)\delta,$$
where the second equality uses the fact that $\frac{c}{n}(n-2)(1-\delta)$ other vertices are expected to be in $C_t$.

Lemma 7. Let $S \in \mathcal{P}_d$ be a deterministic player strategy, and let $T_i = |\{t : S_t = \{i\}\}|$. Then we have the upper bound
$$\sum_{i=1}^n KL(P_0, P_i) \le \frac{c(c+1)}{n-c} T \delta^2.$$

The proof of Lemma 7 is provided in Appendix E.2.1. Thus, by Proposition 2, we have
$$\inf_{S \in \mathcal{P}} \sup_{A \in \mathcal{A}} R_T(A, S) \ge T \frac{(n-2)c^2}{n^3}(1-\delta)\delta \left(\frac{n-1}{n} - \delta\sqrt{\frac{T}{2n}}\sqrt{\frac{c(c+1)}{n-c}}\right) \ge \frac{T}{6}\left(\frac{c}{n}\right)^2 \left(\frac{n-1}{n}\delta - \delta^2\sqrt{\frac{T}{2n}}\sqrt{\frac{c(c+1)}{n-c}}\right),$$
where the second inequality uses the facts that $n \ge 3$ and $\delta < 1/2$. Finally, we optimize over $\delta$ and $c$. Since we have a quadratic in $\delta$, we take
$$\delta = \frac{n-1}{2n}\sqrt{\frac{2n}{T}}\sqrt{\frac{n-c}{c(c+1)}},$$
yielding
$$\inf_{S \in \mathcal{P}} \sup_{A \in \mathcal{A}} R_T(A, S) \ge \frac{T}{6}\left(\frac{c}{n}\right)^2 \cdot \frac{1}{4}\left(\frac{n-1}{n}\right)^2 \sqrt{\frac{2n}{T}}\sqrt{\frac{n-c}{c(c+1)}} = \frac{1}{12\sqrt{2}}\sqrt{T}\left(\frac{c}{n}\right)^2 \left(\frac{n-1}{n}\right)^2 \sqrt{\frac{n(n-c)}{c(c+1)}} \ge \frac{1}{27\sqrt{3}}\sqrt{T}\left(\frac{c}{n^2}\right)\sqrt{n(n-c)},$$
where the second inequality uses the bounds $\frac{n-1}{n} \ge \frac{2}{3}$ when $n \ge 3$, and $\frac{c}{c+1} \ge \frac{2}{3}$ when $c \ge 2$. The final expression is optimized at $c = \frac{2n}{3}$, yielding the desired lower bound.
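The algebraic simplification in the proof of Lemma 6 can be replayed in exact rational arithmetic. The check below (our own illustration; helper names are hypothetical) assembles the expectation from the two cases of the case analysis and compares it against the closed form $\frac{(n-2)c^2}{n^3}(1-\delta)\delta$.

```python
from fractions import Fraction as Fr

def gap(n, c, delta):
    """E_i[X_{i,t} - X_{j,t}] assembled from the case analysis in Lemma 6."""
    p_i = Fr(c, n)                  # P(vertex i is in the clique) under A_i
    p_o = Fr(c, n) * (1 - delta)    # inclusion probability of any other vertex
    m = (n - 2) * p_o               # E[# further clique members besides i, j]
    # i in C_t, j not: gain (|C_t| - 1)/n; i not, j in: lose the same amount
    return Fr(1, n) * m * (p_i * (1 - p_o) - (1 - p_i) * p_o)

for n in (3, 5, 10, 47):
    for c in (1, 2, n - 1):
        for delta in (Fr(1, 4), Fr(1, 3)):
            closed_form = Fr(n - 2, n ** 3) * c * c * (1 - delta) * delta
            assert gap(n, c, delta) == closed_form
```

The cancellation inside the bracket, $p_i(1 - p_o) - (1 - p_i)p_o = p_i - p_o = \frac{c}{n}\delta$, is exactly the $\delta$ factor that drives the lower bound.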
Note that for this choice of $c$, we indeed have $\delta < 1/2$ when $T \ge 2$.

E.1.2  Proof of Theorem 5

Let the adversarial strategies $\{\mathcal{A}_i\}$ be defined as follows: For each strategy, the adversary independently designates every vertex to be a source, a sink, or neither. The adversary then opens directed edges from all source vertices to all sink vertices. For $\mathcal{A}_i$, with $1 \le i \le n$, the adversary designates vertex $i$ to be a source vertex with probability $\frac{c}{n}$, and each other vertex to be a source vertex with probability $\frac{c}{n}(1-\delta)$. All vertices are designated to be sink vertices with probability $\frac{d}{n}$. Finally, for $\mathcal{A}_0$, the adversary designates all vertices to be source vertices with probability $\frac{c}{n}(1-\delta)$, and sink vertices with probability $\frac{d}{n}$. Successive actions of the adversary are i.i.d. across time steps.

We now derive the following lemmas, which will be used in Proposition 2.

Lemma 8. For any $i \ne j$ and $1 \le t \le T$, we have $\mathbb{E}_i[X_{i,t} - X_{j,t}] = \frac{(n-1)cd}{n^3}\delta$.

Proof. We compute the expectation of each term separately. Let $\mathcal{B}_t$ and $\mathcal{C}_t$ denote the sets of source and sink vertices at time $t$, respectively. Note that $X_{i,t} = \frac{1}{n}$ if $i \notin \mathcal{B}_t$; otherwise, $X_{i,t} = \frac{1 + |\mathcal{C}_t|}{n}$. Hence,
\[
n \, \mathbb{E}_i[X_{i,t}] = \mathbb{E}\left[ \mathbf{1}\{i \notin \mathcal{B}_t\} + (1 + |\mathcal{C}_t|) \mathbf{1}\{i \in \mathcal{B}_t\} \right] = \left( 1 - \frac{c}{n} \right) + \left( 1 + \frac{(n-1)d}{n} \right) \frac{c}{n} = 1 + \frac{(n-1)cd}{n^2}.
\]
The computation for $X_{j,t}$ is similar:
\[
n \, \mathbb{E}_i[X_{j,t}] = \mathbb{E}\left[ \mathbf{1}\{j \notin \mathcal{B}_t\} + (1 + |\mathcal{C}_t|) \mathbf{1}\{j \in \mathcal{B}_t\} \right] = \left( 1 - \frac{c}{n}(1-\delta) \right) + \left( 1 + \frac{(n-1)d}{n} \right) \frac{c}{n}(1-\delta) = 1 + \frac{(n-1)cd}{n^2}(1-\delta).
\]
Taking the difference between these expectations proves the lemma.

Lemma 9. Let $S \in \mathcal{P}_d$ be a deterministic player strategy, and let $T_i = |\{t : S_t = \{i\}\}|$. Then we have the upper bound
\[
\sum_{i=1}^n KL(P_0, P_i) \le \frac{c(n-d)}{n(n-c-d)} T \delta^2.
\]
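Lemma 8 can likewise be checked by exact enumeration over the three possible states of each vertex. The sketch below is not part of the proof, and the parameter values are arbitrary (chosen so that $\frac{c}{n} + \frac{d}{n} \le 1$):

```python
import itertools

# Exact enumeration check of Lemma 8: each vertex is independently a source
# (state 1), a sink (state 2), or neither (state 0); a seeded source influences
# itself plus every sink, while any other seed influences only itself.
n, c, d, delta = 5, 1, 2, 0.3
i, j = 0, 1
src = [c * (1 - delta) / n] * n
src[i] = c / n                   # vertex i is slightly favored as a source
snk = d / n
gap = 0.0
for states in itertools.product([0, 1, 2], repeat=n):
    prob = 1.0
    for v in range(n):
        prob *= {1: src[v], 2: snk, 0: 1 - src[v] - snk}[states[v]]
    sinks = states.count(2)
    x_i = (1 + sinks) / n if states[i] == 1 else 1 / n
    x_j = (1 + sinks) / n if states[j] == 1 else 1 / n
    gap += prob * (x_i - x_j)
# Lemma 8 predicts E_i[X_{i,t} - X_{j,t}] = (n-1) c d delta / n^3
assert abs(gap - (n - 1) * c * d * delta / n**3) < 1e-12
```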
Essentially, the Kullback-Leibler divergence is of order $\frac{1}{n}$, because playing a suboptimal vertex provides no information about which vertex is optimal. This is unlike the case of the undirected graph, where the optimal vertex is always more likely to be contained in the feedback that the player receives, and the KL divergence does not decay with $n$.

The proof of Lemma 9 is provided in Appendix E.2.2. By Proposition 2, we then have
\[
\inf_{S \in \mathcal{P}} \sup_{\mathcal{A}} R_T(\mathcal{A}, S) \ge \frac{(n-1)cd}{n^3} \delta T \left( \frac{n-1}{n} - \delta \sqrt{\frac{T}{2n}} \sqrt{\frac{c(n-d)}{n(n-c-d)}} \right).
\]
Finally, we optimize over $\delta$, $c$, and $d$. We take
\[
\delta = \frac{1}{2} \left( \frac{n-1}{n} \right) \sqrt{\frac{2n}{T}} \sqrt{\frac{n(n-c-d)}{c(n-d)}},
\]
to obtain
\begin{align*}
\inf_{S \in \mathcal{P}} \sup_{\mathcal{A}} R_T(\mathcal{A}, S) &\ge \frac{(n-1)cd}{4n^3} \left( \frac{n-1}{n} \right)^2 T \sqrt{\frac{2n}{T}} \sqrt{\frac{n(n-c-d)}{c(n-d)}} \\
&= \frac{1}{2\sqrt{2}} \sqrt{nT} \left( \frac{n-1}{n} \right)^3 \frac{cd}{n^2} \sqrt{\frac{1 - c/n - d/n}{(c/n)(1 - d/n)}} \\
&\ge \frac{1}{16\sqrt{2}} \sqrt{nT} \cdot \frac{cd}{n^2} \sqrt{\frac{1 - c/n - d/n}{(c/n)(1 - d/n)}},
\end{align*}
where the last inequality uses the bound $\frac{n-1}{n} \ge \frac{1}{2}$. Finally, using the fact that the function
\[
f(x, y) = xy \sqrt{\frac{1 - x - y}{x(1 - y)}}
\]
achieves its maximum value of $\frac{1}{3\sqrt{3}}$ at $(x, y) = \left( \frac{1}{6}, \frac{2}{3} \right)$, we obtain the bound
\[
\inf_{S \in \mathcal{P}} \sup_{\mathcal{A}} R_T(\mathcal{A}, S) \ge \frac{1}{48\sqrt{6}} \sqrt{Tn}
\]
when $c = \frac{n}{6}$ and $d = \frac{2n}{3}$.

E.2  Proofs of KL bounds

In this Appendix, we derive the required upper bounds on the KL divergence between adversarial strategies. We begin by proving a useful technical lemma. Recall that $P_i$ denotes the distribution of the edge feedback $I^T$ under strategy $\mathcal{A}_i$, and $S \in \mathcal{P}_d$ is a fixed deterministic player strategy. Also recall that $T_i = |\{t : S_t = \{i\}\}|$ denotes the number of times vertex $i$ is chosen by the player. Let $P_i^t$ denote the distribution of the edge feedback $I^t$ under strategy $\mathcal{A}_i$, so $P_i = P_i^T$.
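The claimed maximizer of $f(x, y)$ can be confirmed numerically; this is a verification sketch, not part of the proof:

```python
import math

# f(x, y) = x y sqrt((1 - x - y) / (x (1 - y))) on the region x, y > 0, x + y < 1
def f(x, y):
    return x * y * math.sqrt((1 - x - y) / (x * (1 - y)))

fmax = 1 / (3 * math.sqrt(3))
assert abs(f(1 / 6, 2 / 3) - fmax) < 1e-12   # value at the claimed maximizer
# a coarse grid search over the feasible region never exceeds the claimed maximum
best = max(f(a / 200, b / 200)
           for a in range(1, 200) for b in range(1, 200) if a + b < 200)
assert best <= fmax + 1e-9
```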
For each pair of nodes $i$ and $v$ and any $1 \le t \le T$, define the function $KL_i^t(v)$ to be the KL divergence between the edge feedback distributions, conditioned on any $I^{t-1}$ such that $S_t = \{v\}$:
\[
KL_i^t(v) = KL\left( P_0^t\{\cdot \mid I^{t-1}\}, \, P_i^t\{\cdot \mid I^{t-1}\} \right).
\]
Note that $KL_i^t(v)$ is indeed a well-defined function of $v$, since conditioned on $I^{t-1}$, the player's action $S_t$ is deterministic. Hence, the randomness in $I^t$ is purely due to the stochastic action of the adversary at time $t$.

Lemma 10. If $KL_i^t(v)$ is independent of $t$, we have
\[
KL(P_0, P_i) = KL_i(i) \, \mathbb{E}_0[T_i] + \sum_{j \ne i} KL_i(j) \, \mathbb{E}_0[T_j]. \tag{18}
\]
If in addition $KL_i(i)$ is independent of $i$, for $1 \le i \le n$, and $KL_i(j)$ is constant for all pairs $i \ne j$, we have
\[
\sum_{i=1}^n KL(P_0, P_i) = KL_i(i) \, T + KL_i(j) (n-1) T. \tag{19}
\]

Proof. Note that equation (19) follows immediately from equation (18) by summing over $i$ and using the fact that $\sum_{i=1}^n \mathbb{E}_0[T_i] = T$. To derive equation (18), we use the chain rule for KL divergence:
\begin{align*}
KL(P_0, P_i) &= \sum_{t=1}^T \sum_{I^{t-1}} P_0\{I^{t-1}\} \, KL\left( P_0^t\{\cdot \mid I^{t-1}\}, \, P_i^t\{\cdot \mid I^{t-1}\} \right) \\
&= \sum_{t=1}^T \sum_{v=1}^n \sum_{I^{t-1} : S_t = \{v\}} P_0\{I^{t-1}\} \, KL_i^t(v) \\
&\stackrel{(a)}{=} \sum_{t=1}^T \sum_{v=1}^n P_0\{S_t = \{v\}\} \, KL_i(v) \\
&= \sum_{t=1}^T P_0\{S_t = \{i\}\} \, KL_i(i) + \sum_{t=1}^T \sum_{j \ne i} P_0\{S_t = \{j\}\} \, KL_i(j),
\end{align*}
where equality (a) uses the assumption that $KL_i^t(v)$ is independent of $t$. Now we simply recognize that
\[
\mathbb{E}_0[T_i] = \mathbb{E}_0\left[ \sum_{t=1}^T \mathbf{1}\{S_t = \{i\}\} \right] = \sum_{t=1}^T P_0\{S_t = \{i\}\}
\]
to obtain the desired equality.

E.2.1  Proof of Lemma 7

Note that $KL_i^t(v)$ is independent of $t$, since the adversary's actions are i.i.d. across time steps. Furthermore, $KL_i(i)$ is clearly independent of $i$ and $KL_i(j)$ is constant for all pairs $i \ne j$, so equation (19) of Lemma 10 holds. We first compute an upper bound for $KL_i(i)$.
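For a non-adaptive deterministic strategy, equation (18) reduces to additivity of KL divergence over independent steps, which can be illustrated with toy per-step feedback distributions. All numbers below are invented for illustration and are not from the paper:

```python
import itertools
import math

# Per-step feedback distributions under A_0 and A_i, indexed by the vertex
# played at that step; the two adversaries agree when vertex 2 is played.
A0 = {1: [0.7, 0.3], 2: [0.5, 0.5]}
Ai = {1: [0.6, 0.4], 2: [0.5, 0.5]}
schedule = [1, 1, 2, 1, 2]   # a fixed (non-adaptive) player strategy

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# KL between the joint feedback distributions over all T steps
joint = 0.0
for outcomes in itertools.product([0, 1], repeat=len(schedule)):
    p0 = math.prod(A0[s][o] for s, o in zip(schedule, outcomes))
    pi = math.prod(Ai[s][o] for s, o in zip(schedule, outcomes))
    joint += p0 * math.log(p0 / pi)
# Equation (18) for this special case: KL(P_0, P_i) = sum_v KL_i(v) * T_v
decomposed = sum(kl(A0[s], Ai[s]) for s in schedule)
assert abs(joint - decomposed) < 1e-12
```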
Let $X$ denote the size of the connected component containing $i$ on a particular time step, based on the edges played by the adversary. Then $KL_i(i) = KL(P_0(X), P_i(X))$, where we abuse notation slightly and write $P_i(X)$ to denote the distribution of $X$ under adversarial strategy $\mathcal{A}_i$. Also let $Y$ be the indicator variable for the event that $i$ is in the clique selected by the adversary. By the chain rule for KL divergence,
\[
KL(P_0(X), P_i(X)) \le KL(P_0(X, Y), P_i(X, Y)).
\]
We will derive an upper bound for the latter quantity. In particular, the range of $(X, Y)$ is $\{(1, 0), (1, 1)\} \cup \{(m, 1) : 2 \le m \le n\}$. This leads to the following expression for $KL(P_0(X, Y), P_i(X, Y))$:
\begin{align*}
& \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1} \log\left( \frac{\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1}}{\frac{c}{n} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1}} \right) \\
&\qquad + \sum_{m=2}^n \binom{n-1}{m-1} \frac{c}{n}(1-\delta) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m} \log\left( \frac{\frac{c}{n}(1-\delta) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m}}{\frac{c}{n} \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m}} \right) \\
&= \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \sum_{m=1}^n \binom{n-1}{m-1} \frac{c}{n}(1-\delta) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m} \log(1-\delta) \\
&= \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \frac{c}{n}(1-\delta) \log(1-\delta).
\end{align*}
Applying the inequality $\log(1 + x) \le x$ twice, we then obtain
\[
KL(P_0(X), P_i(X)) \le \left( 1 - \frac{c}{n}(1-\delta) \right) \frac{\frac{c\delta}{n}}{1 - \frac{c}{n}} - \frac{c}{n}(1-\delta)\delta = \frac{c\delta}{n} \left( \frac{n - c(1-\delta)}{n - c} - (1-\delta) \right) = \frac{c\delta^2}{n - c}. \tag{20}
\]
The computation for $KL_i(j)$ is similar. Let $X$ denote the size of the connected component containing $j$, and let $\mathcal{C}$ denote the clique chosen by the adversary. Define the random variable
\[
Y = \begin{cases} 0, & \text{if } j \notin \mathcal{C}, \\ 1, & \text{if } j \in \mathcal{C} \text{ and } i \notin \mathcal{C}, \\ 2, & \text{if } i, j \in \mathcal{C}. \end{cases}
\]
Again, it suffices to obtain a bound on $KL(P_0(X, Y), P_i(X, Y))$.
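The closed-form expression for $KL(P_0(X, Y), P_i(X, Y))$ and the bound (20) can be verified by enumerating the $(X, Y)$ distribution directly. The parameter values below are illustrative, and this check is not part of the proof:

```python
import math

# Enumerate the (X, Y) distribution for KL_i(i) in the undirected clique model
# and compare against the closed form and the bound (20).
n, c, delta = 8, 3, 0.3
q, p = c * (1 - delta) / n, c / n    # clique-membership probs: generic vertex vs. vertex i
P0 = {(1, 0): 1 - q}                 # i not in the clique
Pi = {(1, 0): 1 - p}
for m in range(1, n + 1):            # i in the clique, component size m
    tail = math.comb(n - 1, m - 1) * q ** (m - 1) * (1 - q) ** (n - m)
    P0[(m, 1)] = q * tail
    Pi[(m, 1)] = p * tail
kl = sum(P0[k] * math.log(P0[k] / Pi[k]) for k in P0)
closed = (1 - q) * math.log((1 - q) / (1 - p)) + q * math.log(1 - delta)
assert abs(kl - closed) < 1e-12
assert kl <= c * delta**2 / (n - c)   # the bound in equation (20)
```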
The range of $(X, Y)$ is $\{(1, 0), (1, 1)\} \cup \{(m, 1) : 2 \le m \le n-1\} \cup \{(m, 2) : 2 \le m \le n\}$. Further note that $P_0(1, 0) = P_i(1, 0)$, so we may ignore this term when computing the KL divergence. We then have the following expression for $KL(P_0(X, Y), P_i(X, Y))$:
\begin{align*}
& \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1} \log\left( \frac{\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1}}{\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n} \right) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-2}} \right) \\
&\qquad + \sum_{m=2}^{n-1} \binom{n-2}{m-1} \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m-1} \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) \\
&\qquad + \sum_{m=2}^{n} \binom{n-2}{m-2} \left( \frac{c}{n}(1-\delta) \right)^2 \left( \frac{c}{n}(1-\delta) \right)^{m-2} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m} \log\left( \frac{\frac{c}{n}(1-\delta)}{\frac{c}{n}} \right).
\end{align*}
In the first two terms, each log ratio reduces to $\log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right)$, and the associated probabilities sum to $\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)$; in the third term, each log ratio equals $\log(1-\delta)$, and the associated probabilities sum to $\left( \frac{c}{n}(1-\delta) \right)^2$. Hence the expression equals
\[
\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \left( \frac{c}{n}(1-\delta) \right)^2 \log(1-\delta).
\]
We once again use the inequality $\log(1 + x) \le x$ to obtain
\[
KL(P_0(X), P_i(X)) \le \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right) \frac{\frac{c\delta}{n}}{1 - \frac{c}{n}} - \left( \frac{c}{n}(1-\delta) \right)^2 \delta = \left( \frac{c}{n} \right)^2 (1-\delta)\delta \left( \frac{n - c(1-\delta)}{n-c} - (1-\delta) \right) = \left( \frac{c}{n} \right)^2 (1-\delta) \delta^2 \frac{n}{n-c}. \tag{21}
\]

Combining inequalities (20) and (21) with equation (19) of Lemma 10, we obtain the bound
\[
\sum_{i=1}^n KL(P_0, P_i) \le \frac{c}{n-c} \delta^2 T + \left( \frac{c}{n} \right)^2 (1-\delta) \delta^2 \frac{n}{n-c} (n-1) T \le \frac{c(c+1)}{n-c} T \delta^2,
\]
completing the proof.

E.2.2  Proof of Lemma 9

Note that $KL_i^t(v)$ is independent of $t$, since the adversary's actions are i.i.d. across time steps. Furthermore, $KL_i(i)$ is clearly independent of $i$ and $KL_i(j)$ is constant for all pairs $i \ne j$, so equation (19) of Lemma 10 holds. Note that when $S_t = \{j\}$, the distribution of the feedback $I_t$ is the same under $P_0^t\{\cdot \mid I^{t-1}\}$ and $P_i^t\{\cdot \mid I^{t-1}\}$, since vertex $i$ is chosen to be a sink vertex with the same probability $\frac{d}{n}$ under both $\mathcal{A}_0$ and $\mathcal{A}_i$. Hence, $KL_i(j) = 0$.

To compute $KL_i(i)$, let $X$ denote the size of the influenced component containing $i$ when $S_t = \{i\}$, and define the random variable
\[
Y = \begin{cases} 0, & \text{if } i \text{ is a sink vertex}, \\ 1, & \text{if } i \text{ is a source vertex}, \\ 2, & \text{otherwise}. \end{cases}
\]
As in the proof of Lemma 7, we will upper-bound $KL(P_0(X, Y), P_i(X, Y))$, leading to an upper bound on $KL(P_0(X), P_i(X))$. The range of $(X, Y)$ is $\{(1, 0), (1, 2)\} \cup \{(m, 1) : 1 \le m \le n\}$, where $(1, 1)$ occurs when $i$ is a source and no vertex is a sink. We then have the following expression for $KL(P_0(X, Y), P_i(X, Y))$:
\begin{align*}
& \frac{d}{n} \log\left( \frac{d/n}{d/n} \right) + \left( 1 - \frac{c}{n}(1-\delta) - \frac{d}{n} \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta) - \frac{d}{n}}{1 - \frac{c}{n} - \frac{d}{n}} \right) \\
&\qquad + \sum_{m=1}^n \binom{n-1}{m-1} \frac{c}{n}(1-\delta) \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m} \log\left( \frac{\frac{c}{n}(1-\delta) \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m}}{\frac{c}{n} \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m}} \right) \\
&= \frac{n - c - d + c\delta}{n} \log\left( \frac{n - c - d + c\delta}{n - c - d} \right) + \frac{c}{n}(1-\delta) \log(1-\delta) \sum_{m=1}^n \binom{n-1}{m-1} \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m} \\
&= \frac{n - c - d + c\delta}{n} \log\left( \frac{n - c - d + c\delta}{n - c - d} \right) + \frac{c}{n}(1-\delta) \log(1-\delta),
\end{align*}
since the first term vanishes and the binomial probabilities sum to 1. Using the inequality $\log(1 + x) \le x$, we then have
\[
KL(P_0(X), P_i(X)) \le \frac{c\delta(n - c - d + c\delta)}{n(n - c - d)} - \frac{c}{n}(1-\delta)\delta = \frac{c(n-d)}{n(n-c-d)} \delta^2.
\]
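The per-step bound for $KL_i(i)$ in the directed model can also be checked by enumerating the $(X, Y)$ distribution. As before, the parameter values are illustrative (with $\frac{c}{n} + \frac{d}{n} \le 1$), and this sketch is not part of the proof:

```python
import math

# Enumerate the (X, Y) distribution for KL_i(i) in the directed source/sink
# model and check the per-step version of Lemma 9's bound.
n, c, d, delta = 6, 1, 2, 0.3
q, p = c * (1 - delta) / n, c / n    # source probs: generic vertex vs. vertex i
s = d / n                            # sink probability
P0 = {(1, 0): s, (1, 2): 1 - q - s}  # i is a sink / neither: component size 1
Pi = {(1, 0): s, (1, 2): 1 - p - s}
for m in range(1, n + 1):            # i is a source influencing m - 1 sinks
    tail = math.comb(n - 1, m - 1) * s ** (m - 1) * (1 - s) ** (n - m)
    P0[(m, 1)] = q * tail
    Pi[(m, 1)] = p * tail
kl = sum(P0[k] * math.log(P0[k] / Pi[k]) for k in P0)
# per-step bound: KL_i(i) <= c (n - d) delta^2 / (n (n - c - d))
assert kl <= c * (n - d) * delta**2 / (n * (n - c - d))
```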
Applying Lemma 10 completes the proof.
