Adversarial Influence Maximization

We consider the problem of influence maximization in fixed networks for contagion models in an adversarial setting. The goal is to select an optimal set of nodes to seed the influence process, such that the number of influenced nodes at the conclusion of the campaign is as large as possible. We formulate the problem as a repeated game between a player and adversary, where the adversary specifies the edges along which the contagion may spread, and the player chooses sets of nodes to influence in an online fashion. We establish upper and lower bounds on the minimax pseudo-regret in both undirected and directed networks.

Authors: Justin Khim, Varun Jog, Po-Ling Loh

January 19, 2019

1 Introduction

Many data sets in contemporary scientific applications possess some underlying network structure [32]. Popular examples include data collected from social media websites such as Facebook and Twitter [1, 28], or electrocortical recordings gathered from a network of firing neurons [35]. An important application of network science arises in marketing, where researchers have studied the importance of word-of-mouth advertising for decades [23]. More recently, methods have been proposed by marketing researchers to quantify the importance of word-of-mouth marketing in online social networks in both theory and practice [40, 37]. Subsequent empirical studies suggest that word-of-mouth marketing has a significant effect in online social networks [3, 34]. At the same time, computer scientists have analyzed the problem of viral marketing from an optimization-theoretic perspective [15, 27, 11], where the goal is to select an optimal set of influencers to encourage product adoption in an online social network.
This has led to rigorous theoretical guarantees that hold for stochastic models of word-of-mouth advertising inspired by physics and epidemiology, and the scope of the spread is quantified using a notion known as influence [24]. In social networks, edges represent potential interactions between individuals, and the problem of influence maximization corresponds to identifying subsets of individuals on which to impress an idea so that information spreads as widely as possible subject to an advertising budget. Formally, the influence of a subset of nodes is defined as the expected number of influenced individuals in a network at the conclusion of a spread, starting from an initial configuration where only the specified nodes are influenced. Even when the influence function is assumed to be computable for any subset using a black-box method in unit time, it is not clear whether influence maximization may be performed (exactly or approximately) in polynomial time, since searching over all subsets of k nodes is exponential in the number of nodes. Accordingly, the body of work in theoretical computer science has mostly focused on specific spreading models that give rise to nice properties such as submodularity, implying that a greedy algorithm for influence maximization leads to a constant-factor approximation of the optimal set [24, 25, 8]. Other related work includes predicting when knowledge becomes viral; limiting the spread of information through carefully positioned interventions [14, 16]; or competitive settings of influence maximization, e.g. competing for votes or market share [7, 19, 18].

∗ Department of Statistics, University of Pennsylvania, Philadelphia, PA 19104.
† Department of Electrical & Computer Engineering, University of Wisconsin, Madison, WI 53706.
‡ Department of Statistics, University of Wisconsin, Madison, WI 53706.
A significant shortcoming in the analysis of stochastic spreading models is the fact that the parameters characterizing the spread of influence are generally assumed to be known, allowing for approximate evaluation of the influence function (either by analytic methods or simulation). However, such an assumption is not always practical. In the case of independent cascade models or linear threshold models, where parameters correspond to edge weights in the network, one might even question a scientist's prior knowledge of the precise network structure. To address these issues, some authors have studied the interesting question of accurately learning the influence function itself in a stochastic spreading model based on observing multiple rounds of infection [29, 26, 21]. Another approach involves a notion of "robust influence maximization," where the parameters are only specified to lie in fixed confidence sets, and the goal is to obtain a set of source vertices that approximately maximizes the true influence function, possibly in a worst-case sense [12, 20]. Robust influence maximization methods may also be model-dependent, meaning that a robust algorithm designed for the independent cascade model may lead to a severely non-optimal solution if the influence spread actually follows the linear threshold model. Indeed, the parameters describing different models, as well as the nature of the uncertainties permitted in them, may be completely different. Further, it is unclear that popular models of influence are good approximations of real-world behavior [17, 22]. In this paper, we take a rather different approach toward the problem of unknown spreading parameters that also avoids assumptions about a particular spreading mechanism.
As discussed in more detail in Section 2, we only assume knowledge of an underlying fixed graph representing the paths along which an influence may spread, where the case of no prior knowledge corresponds to a complete graph. We formulate the influence maximization problem as an online game, where a "player" must make sequential decisions about the next seed set to choose based on observing the behavior of the spread in previous "rounds" of the game. Here, a round represents a particular instance of an influence process initialized from the specified seed nodes and run from beginning to end. We allow an "adversary" to choose the path of influence on each round in a completely arbitrary manner, as long as the process may only spread along edges of the graph—in particular, this setting subsumes the stochastic models usually adopted in the influence maximization literature, while allowing for much more general spreading mechanisms (e.g., information does not necessarily propagate in an i.i.d. manner over all rounds of the game). Note that the adversary's strategy may be so arbitrary as to be "unlearnable." Thus, instead of simply trying to maximize the aggregate number of influenced vertices across all rounds, we seek to develop player strategies that bound the "regret" of the player, defined as the difference between the total number of vertices influenced using the player's strategy and the number of vertices that would have been influenced if the player had adopted the best constant choice of source set in hindsight. Such notions are taken from the literature on multi-armed bandits and online learning theory [5, 10], and adapted to the present setting. Our main contribution is to derive upper and lower bounds on the pseudo-regret for various adversarial and player strategies.
We study both directed and undirected networks, where in the latter setting, contagion is allowed to spread in both directions when an edge is chosen by the adversary. Furthermore, we derive lower bounds for the minimax pseudo-regret when the underlying network is a complete graph, where the supremum is taken over all adversarial strategies and the infimum is taken over all player strategies. Our upper and lower bounds match up to constant factors in the case of directed networks. Notably, the bounds also agree with the usual rate for pseudo-regret in multi-armed bandits, showing that no new information is gained by the player by exploiting network structure. On the other hand, a gap exists between our upper and lower bounds for undirected networks, leaving open the possibility that the player may leverage the additional information from the network to incur less regret. Additionally, the constant factor in the upper bound may be slightly improved, providing further evidence that graph structure may be exploited. Finally, we demonstrate how to extend our upper bounds to the setting where the player is allowed to choose multiple source vertices on each round. The proposed multi-source player strategy augments the source set sequentially using the single-source strategies as a subroutine, and is based on a general online greedy algorithm proposed by Streeter and Golovin [36].

The remainder of our paper is organized as follows: In Section 2, we provide some important background on online learning theory and formally define the adversarial spreading model and notions of regret to be studied in our paper. In Section 3, we present upper and lower bounds for pseudo-regret in the adversarial setting. We conclude the paper with a selection of open research questions in Section 4.
All proofs, as well as a more technical discussion of related work, are contained in the appendices.

Notation. For a set A, let $2^A$ denote the power set of A. When we want to specify that we are taking the expectation with respect to a particular distribution p of some random variable X, we write $\mathbb{E}_{X \sim p}$. In particular, we often write $\mathbb{E}_{S \sim p}$ to mean the expectation taken over the player's actions for a fixed set of adversarial actions, which is the same as the conditional expectation with respect to the adversary's actions. Similarly, we write $\mathbb{E}_{\mathcal{A}}$ to indicate the conditional expectation with respect to a fixed set of player actions.

2 Background and preliminaries

We begin by formally defining the repeated game between the player and adversary and the types of strategies we will analyze in our paper. Next, we introduce the notions of regret we will study, and then connect our setting to related work in the learning theory literature.

2.1 Adversarial repeated games

Consider a fixed graph G = (V, E) on n vertices, which may be directed or undirected. The adversarial influence maximization problem may be described as follows: Repeatedly over T rounds, the player selects an influence seed set $S_t \subseteq V$, with $|S_t| = k$, for t = 1, ..., T. At the same time, the adversary designates a subset of edges $A_t \subseteq E$ to be "open." A node is considered to be influenced at time t if and only if it is an element of $S_t$ or is reachable from $S_t$ via a path of open edges. Note that in the context of influence spreading, the open edges correspond to ties over which influence propagates in that round—importantly, influence only has an opportunity to be transmitted between individuals that interact in the network, but may not necessarily spread over a particular connection on a specific round.
In the case when G is an undirected graph, designating an edge to be open allows an influence campaign to spread in both directions. Furthermore, in the directed case, edges may exist in both directions between a given pair of nodes, in which case the adversary may designate both, one, or neither of the edges to be open. For an open edge set $A \subseteq E$ and influence seed set $S \subseteq V$, we define $f(A, S)$ to be the fraction of vertices in the graph lying in the influenced set.

To connect our model to the canonical setting of influence maximization, note that [25] proposed a very general class of influence models called triggering models, which include the independent cascade and the linear threshold models as special cases. At the beginning of the influence campaign, each node chooses a random "triggering" subset of neighbors according to a particular rule, and the incoming edges from those neighbors are designated to be "active." A vertex becomes influenced during the course of the process if and only if a path of active edges exists connecting that vertex to a vertex in the seed set. Thus, triggering models correspond to a special case of our framework, in which the edge sets are chosen in an i.i.d. manner from round to round, and the probability distribution over the edges is determined by the probability rule through which edges are assigned to be active (e.g., according to the linear threshold or independent cascade models).

Next, we describe the classes of strategies $\mathcal{A} = \{A_t\}$ and $\mathcal{S} = \{S_t\}$ available to the adversary and player. We assume that the adversary is oblivious of the player's actions; i.e., at time t = 0, the adversary must decide on the (possibly random) strategy $\mathcal{A}$. We use $\mathbb{A}$ to denote the set of oblivious adversary strategies and $\mathbb{A}_d$ to denote the set of deterministic adversary strategies.
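To make the objective concrete, $f(A, S)$ can be computed by a graph search over the open edges. The sketch below is our own illustration, not code from the paper; the function name and the convention that vertices are labeled 0, ..., n−1 are assumptions for the example.

```python
from collections import deque

def influenced_fraction(n, open_edges, seeds, directed=False):
    """Fraction of the n vertices that are influenced: a vertex counts if it
    is a seed or reachable from a seed along a path of open edges (BFS)."""
    adj = {v: [] for v in range(n)}
    for u, v in open_edges:
        adj[u].append(v)
        if not directed:
            adj[v].append(u)  # undirected: an open edge spreads both ways
    influenced = set(seeds)
    queue = deque(seeds)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in influenced:
                influenced.add(w)
                queue.append(w)
    return len(influenced) / n

# Path 0-1-2-3 with the edge (2, 3) closed: seeding {0} reaches {0, 1, 2}.
print(influenced_fraction(4, [(0, 1), (1, 2)], {0}))  # 0.75
```

In the directed case, the same routine applies but edges are traversed only in their given orientation.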
Turning to the classes of player strategies, we allow the player to choose his or her action at time t based on the feedback provided in response to the joint actions made by the player and adversary on preceding time steps. Although the player knows the edge set E of the underlying graph, we assume that the player only observes the status of edges (i, j) such that either i or j is in the reach of $S_t$ (in the undirected case), and the player observes the status of every edge (i, j) such that i is in the reach of $S_t$ (in the directed case). In other words, whereas the player cannot observe the subset of all edges that would have propagated influence in the network, he or she will know which edges transmitted influence if reached by the influence cascade initialized using his or her seed set. Formally, we write $I(A_t, S_t)$ to denote the set of edges with status known to the player (i.e., all edges in the subgraph induced by $A_t$ belonging to connected components containing nodes in $S_t$), and we denote $I_t = (I(A_1, S_1), \ldots, I(A_t, S_t))$.

If $A_t$ is chosen via a stochastic model such as the independent cascade model with discrete time steps for influence campaign t, our setup technically allows the player knowledge of the status of an edge between two vertices u and v if both were actually influenced by some other vertex w. Realistically, we would not want the status of edge (u, v) to be returned as feedback, and we could enforce this by positing a model of how each influence campaign proceeds. However, this distinction does not affect our results or algorithms, and so we do not further restrict the feedback $I(A_t, S_t)$. The player can only make decisions based on the feedback observed in previous rounds, so any allowable player strategy $\{S_t\}$ has the property that $S_t$ is a function of $I_{t-1}$ (possibly with additional randomization).
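As a sanity check on the undirected feedback model, the sketch below (our own, with hypothetical names) computes the per-round feedback as the set of graph edges with at least one endpoint in the reach of the seed set, together with each such edge's open/closed status. This follows the edge-status description above; the paper's formal definition phrases the same feedback in terms of components of the open subgraph.

```python
def observed_feedback(n, graph_edges, open_edges, seeds):
    """Undirected feedback: the player learns the open/closed status of
    every edge (i, j) in E such that i or j is in the reach of the seeds."""
    open_set = {frozenset(e) for e in open_edges}
    # Reach of the seed set along open edges (depth-first search).
    adj = {v: [] for v in range(n)}
    for u, v in open_edges:
        adj[u].append(v)
        adj[v].append(u)
    reach = set(seeds)
    stack = list(seeds)
    while stack:
        u = stack.pop()
        for w in adj[u]:
            if w not in reach:
                reach.add(w)
                stack.append(w)
    # Status of every graph edge touching the reach; other edges stay hidden.
    return {(u, v): frozenset((u, v)) in open_set
            for u, v in graph_edges if u in reach or v in reach}
```

For example, on the path 0-1-2-3 with only (0, 1) open and seed {0}, the player learns that (0, 1) was open and (1, 2) was closed, but learns nothing about (2, 3).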
We denote the class of all player strategies by $\mathbb{P}$, and denote the subclass of all deterministic player strategies by $\mathbb{P}_d$, meaning that $S_t$ is a deterministic function of $I_{t-1}$. Note that strategies $\mathcal{S} \in \mathbb{P}_d$ may still be random, due to possible randomization of the adversary, but conditioned on $I_{t-1}$, the choice of $S_t$ is deterministic.

2.2 Minimax regret

The player wishes to devise a strategy that maximizes the aggregate number of influenced nodes up to time T. Using the notation from the previous section, we define the regret of the player to be

$$R_T(\mathcal{A}, \mathcal{S}) = \sum_{t=1}^{T} f(A_t, S^*) - \sum_{t=1}^{T} f(A_t, S_t), \qquad (1)$$

where

$$S^* = \arg\max_{S : |S| = k} \sum_{t=1}^{T} f(A_t, S)$$

is the optimal fixed set that the player would have chosen in hindsight with full knowledge of the adversary's strategy. Note that the regret $R_T(\mathcal{A}, \mathcal{S})$ may be a random quantity due to randomness in both the adversary's and player's strategies. Accordingly, we will seek to control the pseudo-regret

$$\overline{R}_T(\mathcal{A}, \mathcal{S}) := \max_{S : |S| = k} \left\{ \mathbb{E}_{\mathcal{A}, \mathcal{S}} \left[ \sum_{t=1}^{T} f(A_t, S) - \sum_{t=1}^{T} f(A_t, S_t) \right] \right\}, \qquad (2)$$

where the expectation in equation (2) is taken with respect to potential randomization in both $\mathcal{A}$ and $\mathcal{S}$. As in the standard learning theory literature [9], recall that the expected regret and pseudo-regret are generally related via the inequality $\overline{R}_T(\mathcal{A}, \mathcal{S}) \le \mathbb{E}[R_T(\mathcal{A}, \mathcal{S})]$, although if $\mathcal{A} \in \mathbb{A}_d$, we have $\overline{R}_T(\mathcal{A}, \mathcal{S}) = \mathbb{E}[R_T(\mathcal{A}, \mathcal{S})]$. Our interest in the pseudo-regret rather than the expected regret is purely motivated by the fact that the former quantity is often easier to bound than the latter, and this simplification is common in the literature on bandits.
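On small instances, the regret in equation (1) can be evaluated directly by brute force over all k-subsets. The sketch below is our own illustration of the definition (undirected case, hypothetical names), not an algorithm from the paper.

```python
from collections import deque
from itertools import combinations

def influenced_fraction(n, open_edges, seeds):
    """f(A, S): fraction of vertices reachable from the seeds via open edges."""
    adj = {v: [] for v in range(n)}
    for u, v in open_edges:
        adj[u].append(v)
        adj[v].append(u)
    seen, queue = set(seeds), deque(seeds)
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in seen:
                seen.add(w)
                queue.append(w)
    return len(seen) / n

def regret(n, k, adversary_edges, player_seeds):
    """Realized regret: cumulative influence of the best fixed k-set in
    hindsight minus the player's cumulative influence (brute force)."""
    best = max(sum(influenced_fraction(n, A, set(S)) for A in adversary_edges)
               for S in combinations(range(n), k))
    achieved = sum(influenced_fraction(n, A, S)
                   for A, S in zip(adversary_edges, player_seeds))
    return best - achieved
```

For instance, on three vertices with the adversary opening edge (0, 1) in both of two rounds, a player who repeatedly seeds vertex 2 incurs regret 4/3 − 2/3 = 2/3 against the hindsight-optimal seed {0}.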
Finally, we introduce the scaled regret

$$R^{\alpha}_T(\mathcal{A}, \mathcal{S}) = \alpha \sum_{t=1}^{T} f(A_t, S^*) - \sum_{t=1}^{T} f(A_t, S_t), \qquad (3)$$

and the analogous quantity

$$\overline{R}^{\alpha}_T(\mathcal{A}, \mathcal{S}) = \max_{S : |S| = k} \left\{ \mathbb{E}_{\mathcal{A}, \mathcal{S}} \left[ \alpha \sum_{t=1}^{T} f(A_t, S) - \sum_{t=1}^{T} f(A_t, S_t) \right] \right\}.$$

Note that α = 1 corresponds to the unscaled version. Our interest in the expression (3) is again for theoretical purposes, since we may obtain convenient upper bounds on the scaled pseudo-regret in the case α = 1 − 1/e using an online greedy algorithm. Note that when k > 1, the benchmark greedy algorithms used for influence maximization in the stochastic spreading setting are also only guaranteed to achieve a (1 − 1/e)-approximation of the truth, so in some sense, the scaled regret (3) only requires the player to perform comparably well in relation to the appropriately scaled optimal strategy.

3 Main results

In this section, we provide upper and lower bounds for the pseudo-regret. Specifically, we focus on the quantity

$$\inf_{\mathcal{S} \in \mathbb{P}} \sup_{\mathcal{A} \in \mathbb{A}} \overline{R}^{\alpha}_T(\mathcal{A}, \mathcal{S}),$$

where the supremum is taken over the class of adversarial strategies, and the infimum is taken over the class of player strategies based on the feedback model we have described. In other words, we wish to characterize the hardness of the influence maximization problem in terms of the player's best possible strategy measured with respect to the worst-case game. A rough outline of our approach is as follows: We establish upper bounds by presenting particular strategies for the player that ensure an appropriately bounded regret under all adversarial strategies. For lower bounds, the general technique is to provide an ensemble of possible actions for the adversary that are difficult for the player to distinguish in the influence maximization problem, which forces the player to incur a certain level of regret.
3.1 Undirected graphs

We begin by deriving regret upper bounds for undirected graphs. We initially restrict our attention to the case k = 1. The proposed player strategy for k > 1, and the corresponding regret bounds, build upon the results in the single-source setting.

3.1.1 Upper bounds for a single source

Consider a randomized player strategy that selects $S_t = \{i\}$ with probability $p_{i,t}$. The paper [9] suggests a method based on the Online Stochastic Mirror Descent (OSMD) algorithm, which is specified by loss estimates $\{\hat{\ell}_{i,t}\}$ and learning rates $\{\eta_t\}$, as well as a Legendre function F. Here, we comment on the losses, and in order to avoid excessive technicalities, we defer additional details of the OSMD algorithm to the appendix.

The most basic loss estimate, which follows from standard bandit theory and ignores all information about the graph, is

$$\hat{\ell}^{\mathrm{node}}_{i,t} = \frac{\ell_{i,t}}{p_{i,t}} \mathbb{1}\{S_t = \{i\}\}, \qquad (4)$$

where $\ell_{i,t} = 1 - f(A_t, \{i\})$ is the loss incurred if the player were to choose $S_t = \{i\}$. Importantly, $\hat{\ell}^{\mathrm{node}}_{i,t}$ is always computable for any choice the player makes at time t and is an unbiased estimate of $\ell_{i,t}$.

On the other hand, if $S_t = \{i\}$ and another node j is influenced (i.e., in the connected component formed by the open edges of $A_t$), the player also knows the loss that would have been incurred if $S_t = \{j\}$, since $f(A_t, \{i\}) = f(A_t, \{j\})$. This motivates an alternative loss estimate that is nonzero even when $S_t \neq \{i\}$. In particular, we may express

$$\ell_{i,t} = \frac{1}{n} \sum_{j \neq i} \ell^{t}_{i,j},$$

where $\ell^{t}_{i,j}$ is the indicator that i and j are in different connected components formed by the open edges of $A_t$. We then define

$$\hat{\ell}^{\mathrm{sym}}_{i,t} = \frac{1}{n} \sum_{j \neq i} \frac{\ell^{t}_{i,j} Z_{ij}}{p_{i,t} + p_{j,t}},$$

where $Z_{ij} = \mathbb{1}\{S_t \cap \{i, j\} \neq \emptyset\}$. Note that $\hat{\ell}^{\mathrm{sym}}_{i,t}$ is also an unbiased estimate for $\ell_{i,t}$.
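The symmetric loss estimate can be written out directly. The sketch below (our own function names, undirected case with k = 1) computes $\hat{\ell}^{\mathrm{sym}}_{i,t}$ for every node i, given the round's open edges, the seeded node s, and the sampling probabilities p; its unbiasedness over s ~ p can be checked numerically on small graphs.

```python
def symmetric_loss_estimates(n, open_edges, s, p):
    """Symmetric loss estimate for each node i when the player seeded node s,
    sampled with probabilities p (undirected graph, single source)."""
    # Connected components of the open edges via union-find.
    parent = list(range(n))
    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x
    for u, v in open_edges:
        parent[find(u)] = find(v)
    estimates = []
    for i in range(n):
        total = 0.0
        for j in range(n):
            if j == i:
                continue
            ell_ij = 1.0 if find(i) != find(j) else 0.0  # different components
            z_ij = 1.0 if s in (i, j) else 0.0           # Z_ij = 1{S_t meets {i,j}}
            total += ell_ij * z_ij / (p[i] + p[j])
        estimates.append(total / n)
    return estimates
```

On three vertices with edge (0, 1) open and a uniform sampling distribution, averaging the estimate over the seed choice recovers the true losses ℓ_0 = 1/3 and ℓ_2 = 2/3, consistent with unbiasedness.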
The estimator $\hat{\ell}^{\mathrm{sym}}_{i,t}$ is always computable by the player, since the value of $\ell^{t}_{i,j}$ is known to the player whenever $Z_{ij}$ is nonzero (i.e., whenever the seed lies in $\{i, j\}$). We call $\hat{\ell}^{\mathrm{sym}}_{i,t}$ the symmetric loss. Now, we state the following regret bounds:

Theorem 1 (Symmetric loss, OSMD). Suppose the player uses the strategy $\mathcal{S}^{\mathrm{sym}}_{\mathrm{OSMD}}$ corresponding to OSMD with the symmetric loss $\hat{\ell}^{\mathrm{sym}}$ and appropriate parameters. Then the pseudo-regret satisfies the bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}^{\mathrm{sym}}_{\mathrm{OSMD}}) \le 2^{1/4} \sqrt{Tn}.$$

Remark 1. It is instructive to compare the result of Theorem 1 with analogous regret bounds for generic multi-armed bandits. When the OSMD algorithm is run with the loss estimates (4), standard analysis establishes an upper bound of $2^{3/2} \sqrt{Tn}$. Thus, using the symmetric loss, which leverages the graphical nature of the problem, produces slight gains.

3.1.2 Lower bounds

We now establish lower bounds for the pseudo-regret in the case k = 1. This furnishes a better understanding of the hardness of the adversarial influence maximization problem. The general approach for deriving lower bounds is to produce a strategy for the adversary that forces the player to incur a certain level of regret regardless of which strategy is chosen. The intrinsic difficulty of online influence maximization may vary widely depending on the topology of the underlying graph, and methods for deriving lower bounds may also differ accordingly. In the case of a complete graph, we have the following result:

Theorem 2. Suppose $G = K_n$ is the complete graph on $n \ge 3$ vertices. Then the pseudo-regret satisfies the lower bound

$$\frac{2}{243} \sqrt{T} \le \inf_{\mathcal{S} \in \mathbb{P}} \sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}).$$

Remark 2. Clearly, a gap exists between the lower bound derived in Theorem 2 and the upper bound appearing in Theorem 1. It is unclear which bound, if any, provides the proper minimax rate.
However, note that if the lower bound were tight, it would imply that the proportion of vertices that the player misses by picking suboptimal source sets is constant, meaning the number of additional vertices the optimal source vertex influences is linear in the size of the graph. This differs substantially from the pseudo-regret of order $\sqrt{n}$ known to be minimax optimal for the standard multi-armed bandit problem (and arises, for instance, in the case of directed graphs, as discussed in the next section).

3.1.3 Upper bounds for multiple sources

We now turn to the case k > 1, where the player chooses multiple source vertices at each time step. As discussed in Section 2, we are interested in bounding the scaled pseudo-regret $\overline{R}^{\alpha}_T(\mathcal{A}, \mathcal{S})$ with α = 1 − 1/e, since it is difficult to maximize the influence even in an offline setting, and the greedy algorithm is only guaranteed to provide a (1 − 1/e)-approximation of the truth. Our proposed player strategy is based on an online greedy adaptation of the strategy used in the single-source setting, and the full details are given in the appendix. We then have the following result concerning the scaled pseudo-regret:

Theorem 3 (Symmetric loss, multiple sources). Suppose k > 1 and the player uses the strategy $\mathcal{S}^{\mathrm{sym},k}_{\mathrm{OSMD}}$ corresponding to the Online Greedy Algorithm with single-source strategy $\mathcal{S}^{\mathrm{sym}}_{\mathrm{OSMD}}$. Then the scaled pseudo-regret satisfies the bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}^{(1 - 1/e)}_T(\mathcal{A}, \mathcal{S}^{\mathrm{sym},k}_{\mathrm{OSMD}}) \le 2^{1/4} k \sqrt{Tn}.$$

Comparing Theorem 3 to Theorem 1, we see an additional factor of k in the pseudo-regret upper bound. Similar results may be derived when alternative single-source strategies are used as subroutines in the Online Greedy Algorithm.

3.2 Directed graphs

We now derive upper and lower bounds for the pseudo-regret in the case of directed graphs, when k = 1.
3.2.1 Upper bounds

The symmetric loss does not have a clear analog in the case of directed graphs. However, we may still use the node loss estimate for multi-armed bandit problems, given by equation (4). This leads to the following upper bound:

Theorem 4. Suppose the player uses the strategy $\mathcal{S}^{\mathrm{node}}_{\mathrm{OSMD}}$ corresponding to OSMD with the node loss $\hat{\ell}^{\mathrm{node}}$ and appropriate parameters. Then the pseudo-regret satisfies the bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}^{\mathrm{node}}_{\mathrm{OSMD}}) \le 2^{3/2} \sqrt{Tn}.$$

Remark 3. In the case k > 1, we may again use the Online Greedy Algorithm used in Section 3.1.3 to obtain a player strategy composed of parallel runs of a single-source strategy. If the player uses the single-source strategy $\mathcal{S}^{\mathrm{node}}_{\mathrm{OSMD}}$, we may obtain the scaled pseudo-regret bound

$$\sup_{\mathcal{A} \in \mathbb{A}} \overline{R}^{(1 - 1/e)}_T(\mathcal{A}, \mathcal{S}^{\mathrm{node},k}_{\mathrm{OSMD}}) \le 2^{3/2} k \sqrt{Tn}.$$

3.2.2 Lower bounds

Finally, we provide a lower bound for the directed complete graph on n vertices. (This refers to the case where all edges are present and bidirectional.) We have the following result:

Theorem 5. Suppose G is the directed complete graph on n vertices. Then the pseudo-regret satisfies the lower bound

$$\frac{1}{48\sqrt{6}} \sqrt{Tn} \le \inf_{\mathcal{S} \in \mathbb{P}} \sup_{\mathcal{A} \in \mathbb{A}} \overline{R}_T(\mathcal{A}, \mathcal{S}).$$

Notably, the lower bound in Theorem 5 matches the upper bound in Theorem 4, up to constant factors. Thus, the minimax pseudo-regret for the influence maximization problem is $\Theta(\sqrt{Tn})$ in the case of directed graphs. In the case of undirected graphs, however (cf. Theorem 2), we only obtained a pseudo-regret lower bound of $\Omega(\sqrt{T})$. This is due to the fact that in undirected graphs, one may learn about the loss of other nodes at time t besides the loss at $S_t$. In contrast, it is possible to construct adversarial strategies for directed graphs that do not provide information regarding the loss incurred by choosing a source vertex other than $S_t$.
Finally, we remark that a different choice of G might affect the lower bound, since influence maximization is easier for some graph topologies than others. However, Theorem 5 shows that the case of the complete graph is always guaranteed to incur a pseudo-regret that matches the general upper bound in Theorem 4, implying that this is the minimax optimal rate for any class of graphs containing the complete graph.

4 Discussion

We have proposed and analyzed player strategies that control the pseudo-regret uniformly across all possible oblivious adversarial strategies. For the problem of single-source influence maximization in complete networks, we have also derived minimax lower bounds that establish the fundamental hardness of the online influence maximization problem. In particular, our lower and upper bounds match up to constant factors in the case of directed complete graphs, implying that our proposed player strategy is in some sense optimal.

Our work inspires a number of interesting questions for future study. An important open question concerns closing the gap between upper and lower bounds on the minimax pseudo-regret in the case of undirected graphs, to determine whether the feedback available in the influence maximization setting actually makes the online game easier than a standard bandit setting. Furthermore, our lower bounds only hold in the case of complete graphs and single-source influence maximization, and it would be worthwhile to obtain lower bounds that hold for other network topologies and seed sets containing multiple nodes. Our results only address a small subset of problems that may be posed and answered concerning a bandit theory of adversarial influence maximization with edge-level feedback.

References

[1] L. A. Adamic and E. Adar. Friends and neighbors on the Web. Social Networks, 25(3):211–230, 2003.

[2] N. Alon, N. Cesa-Bianchi, C.
Gentile, S. Mannor, Y. Mansour, and O. Shamir. Nonstochastic multi-armed bandits with graph-structured feedback. SIAM Journal on Computing, 46(6):1785–1826, 2017.

[3] S. Aral and D. Walker. Creating social contagion through viral product design: A randomized trial of peer influence in networks. Management Science, 57(9):1623–1639, 2011.

[4] J.-Y. Audibert, S. Bubeck, and G. Lugosi. Regret in online combinatorial optimization. Mathematics of Operations Research, 39(1):31–45, 2013.

[5] P. Auer, N. Cesa-Bianchi, Y. Freund, and R. E. Schapire. The nonstochastic multiarmed bandit problem. SIAM Journal on Computing, 32(1):48–77, 2002.

[6] G. Bartók, D. P. Foster, D. Pál, A. Rakhlin, and C. Szepesvári. Partial monitoring—Classification, regret bounds, and algorithms. Mathematics of Operations Research, 39(4):967–997, 2014.

[7] S. Bharathi, D. Kempe, and M. Salek. Competitive influence maximization in social networks. Internet and Network Economics, pages 306–311, 2007.

[8] C. Borgs, M. Brautbar, J. Chayes, and B. Lucier. Maximizing social influence in nearly optimal time. In Proceedings of the Twenty-Fifth Annual ACM-SIAM Symposium on Discrete Algorithms, pages 946–957. SIAM, 2014.

[9] S. Bubeck and N. Cesa-Bianchi. Regret analysis of stochastic and nonstochastic multi-armed bandits. Foundations and Trends in Machine Learning, 5(1):1–122, 2012.

[10] N. Cesa-Bianchi and G. Lugosi. Prediction, Learning, and Games. Cambridge University Press, New York, NY, 2006.

[11] W. Chen, L. V. Lakshmanan, and C. Castillo. Information and influence propagation in social networks. Synthesis Lectures on Data Management, 5(4):1–177, 2013.

[12] W. Chen, T. Lin, Z. Tan, M. Zhao, and X. Zhou. Robust influence maximization. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[13] W. Chen, Y. Wang, Y. Yuan, and Q.
Wang. Combinatorial multi-armed bandit and its extension to probabilistically triggered arms. Journal of Machine Learning Research, 17(50):1–33, 2016.

[14] J. Cheng, L. Adamic, P. A. Dow, J. Kleinberg, and J. Leskovec. Can cascades be predicted? In Proceedings of the 23rd International Conference on WWW, pages 925–936. ACM, 2014.

[15] P. Domingos and M. Richardson. Mining the network value of customers. In Proceedings of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 57–66. ACM, 2001.

[16] K. Drakopoulos, A. Ozdaglar, and J. N. Tsitsiklis. When is a network epidemic hard to eliminate? Mathematics of Operations Research, 42(1):1–14, 2016.

[17] S. Goel, D. J. Watts, and D. G. Goldstein. The structure of online diffusion networks. In Proceedings of the 13th ACM Conference on Electronic Commerce, pages 623–638. ACM, 2012.

[18] M. Grabisch, A. Mandel, A. Rusinowska, and E. Tanimura. Strategic influence in social networks. Mathematics of Operations Research, 2017.

[19] X. He and D. Kempe. Price of anarchy for the N-player competitive cascade game with submodular activation functions. In International Conference on Web and Internet Economics, pages 232–248. Springer, 2013.

[20] X. He and D. Kempe. Robust influence maximization. Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2016.

[21] X. He, K. Xu, D. Kempe, and Y. Liu. Learning influence functions from incomplete observations. In Advances in Neural Information Processing Systems, pages 2073–2081, 2016.

[22] L. Hu, B. Wilder, A. Yadav, E. Rice, and M. Tambe. Activating the "breakfast club": Modeling influence spread in natural-world social networks. arXiv preprint arXiv:1710.00364, 2017.

[23] D. Katz and R. L. Kahn. The Social Psychology of Organizations, volume 2.
Wiley, New York, 1978.

[24] D. Kempe, J. Kleinberg, and É. Tardos. Maximizing the spread of influence through a social network. In Proceedings of the Ninth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '03, pages 137–146, New York, NY, USA, 2003. ACM.

[25] D. Kempe, J. Kleinberg, and É. Tardos. Influential nodes in a diffusion model for social networks. In Automata, Languages and Programming, pages 1127–1138. Springer, 2005.

[26] S. Lei, S. Maniu, L. Mo, R. Cheng, and P. Senellart. Online influence maximization. In Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD '15, pages 645–654, New York, NY, USA, 2015. ACM.

[27] J. Leskovec, L. A. Adamic, and B. A. Huberman. The dynamics of viral marketing. ACM Transactions on the Web (TWEB), 1(1):5, 2007.

[28] D. Liben-Nowell and J. Kleinberg. The link-prediction problem for social networks. Journal of the American Society for Information Science and Technology, 58(7):1019–1031, 2007.

[29] H. Narasimhan, D. C. Parkes, and Y. Singer. Learnability of influence in networks. In Advances in Neural Information Processing Systems, pages 3186–3194, 2015.

[30] G. L. Nemhauser and L. A. Wolsey. Best algorithms for approximating the maximum of a submodular set function. Mathematics of Operations Research, 3(3):177–188, 1978.

[31] G. Neu and G. Bartók. An efficient algorithm for learning with semi-bandit feedback. In International Conference on Algorithmic Learning Theory, pages 234–248. Springer, 2013.

[32] M. E. J. Newman. The structure and function of complex networks. SIAM Review, 45(2):167–256, 2003.

[33] J. Olkhovskaya, G. Neu, and G. Lugosi. Online influence maximization with local observations. arXiv preprint arXiv:1805.11022, 2018.

[34] S. Seiler, S. Yao, and W. Wang.
Does online word of mouth increase demand? (And how?) Evidence from a natural experiment. Marketing Science, 2017.

[35] O. Sporns. The human connectome: A complex network. Annals of the New York Academy of Sciences, 1224(1):109–125, 2011.

[36] M. Streeter and D. Golovin. An online algorithm for maximizing submodular functions. Technical report, pages 1–35, 2007.

[37] M. Trusov, R. E. Bucklin, and K. Pauwels. Effects of word-of-mouth versus traditional marketing: findings from an internet social networking site. Journal of Marketing, 73(5):90–102, 2009.

[38] S. Vaswani, L. Lakshmanan, and M. Schmidt. Influence maximization with bandits. arXiv preprint arXiv:1503.00024, pages 1–12, 2015.

[39] Q. Wang and W. Chen. Improving regret bounds for combinatorial semi-bandits with probabilistically triggered arms and its applications. In Advances in Neural Information Processing Systems, pages 1161–1171, 2017.

[40] D. J. Watts and P. S. Dodds. Influentials, networks, and public opinion formation. Journal of Consumer Research, 34(4):441–458, 2007.

[41] Z. Wen, B. Kveton, and M. Valko. Online influence maximization under independent cascade model with semi-bandit feedback. Advances in Neural Information Processing Systems, 2017.

A Related work

Here, we comment more thoroughly on important relationships between our problem setting and various online games existing in the learning theory literature. A key difference between the graph contagion setting and the standard multi-armed bandit setting is that in the latter case, the only information available to the player on each round is the reward obtained as a consequence of his or her actions.
On the other hand, slightly more information is available to the player in our setting, since the player may often deduce additional information about which vertices would have been influenced for a different choice of source vertices, based on observing the scope of the influence process for a particular choice of source vertices. As a concrete example, the player knows that exactly the same set of nodes would have been influenced if he or she had chosen to influence a different seed node in the same connected component of the subgraph induced by the influenced nodes and adversarially chosen edges.

Online games with partial monitoring [6] or graph-based feedback [2] generalize the bandit setting to repeated games in which the player may observe feedback corresponding to various subsets of other actions, in addition to or instead of the feedback corresponding to his or her own actions. Although such games resemble our problem setting, the possible actions available to the player in our case correspond to subsets of nodes of size $k$, leading to a rather complicated feedback graph that is additionally affected by the adversary's actions. Another online game with a similar flavor is the combinatorial prediction setting [4], where the player is allowed to pull a subset of arms on each round, and observes a loss equal to the sum of losses of the pulled arms in the case of bandit feedback, or a subvector of losses corresponding to the pulled arms in the case of semi-bandit feedback [31]. Our problem may be cast as a type of combinatorial prediction game with a feedback graph that varies from round to round and is unknown to the player.
Note that the combinatorial game with edge semi-bandit feedback has been studied recently in the influence maximization literature [13, 38, 41, 39, 33], but these results only apply to stochastic adversaries, rather than the more general non-stochastic framework we study in this paper. Edge semi-bandit feedback refers to the fact that in a directed graph, the player receives feedback about the transmission status of different subsets of edges, corresponding to the outgoing edges from the nodes he or she chooses to seed on each round.

B Proofs

We now outline the proofs of our main results.

B.1 Upper bounds for adversarial models

In this section, we prove our upper bounds. To this end, we describe the OSMD algorithm, which generates a sequence of probability distributions $\{p_t\}$ to be employed by the player on successive rounds. Let $\Delta_n \subseteq \mathbb{R}^n$ denote the probability simplex.

Online Stochastic Mirror Descent (OSMD) with loss estimates $\{\hat{\ell}_{i,t}\}$

Given: A Legendre function $F$ defined on $\mathbb{R}^n$, with associated Bregman divergence
$$D_F(p, q) = F(p) - F(q) - (p - q)^T \nabla F(q),$$
and a learning rate $\eta > 0$.

Output: A stochastic player strategy $\{S_t\}$.

Let $p_1 \in \arg\min_{p \in \Delta_n} F(p)$. For each round $t = 1, \ldots, T$:

(1) Draw a vertex $S_t$ from the distribution $p_t$.
(2) Compute the vector of loss estimates $\hat{\ell}_t = \{\hat{\ell}_{i,t}\}$.
(3) Set $w_{t+1} = \nabla F^*\left(\nabla F(p_t) - \eta \hat{\ell}_t\right)$, where $F^*$ is the convex conjugate of $F$.
(4) Compute the new distribution $p_{t+1} = \arg\min_{p \in \Delta_n} D_F(p, w_{t+1})$.

In general, the OSMD algorithm is defined with respect to a compact, convex set $K \subseteq \mathbb{R}^n$. The updates are characterized by noisy estimates of the gradient of the loss function, which we may conveniently define to be $\hat{\ell}_t$ in the present scenario. For more details and generalizations, we refer the reader to [9].
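To make steps (3) and (4) concrete, here is a minimal numerical sketch of one OSMD update for the particular potential used later, $F_\psi(p) = -2\sum_i \sqrt{p_i}$ (so $\nabla F(p)_i = -p_i^{-1/2}$ and $\nabla F^*(\theta)_i = \theta_i^{-2}$). The bisection-based Bregman projection and all function names are our own illustration, not the authors' implementation.

```python
import numpy as np

def bregman_project(w, iters=200):
    """Bregman projection of w onto the simplex for F(p) = -2 * sum(sqrt(p)).

    Stationarity of the projection gives p_i = (w_i**-0.5 - lam)**-2 for a
    Lagrange multiplier lam, which we locate by bisection so p sums to one.
    """
    theta = w ** -0.5
    lo, hi = theta.min() - 1e8, theta.min() - 1e-12
    for _ in range(iters):
        lam = 0.5 * (lo + hi)
        if np.sum((theta - lam) ** -2.0) < 1.0:
            lo = lam          # total mass too small: move lam up
        else:
            hi = lam          # total mass too large: move lam down
    p = (theta - 0.5 * (lo + hi)) ** -2.0
    return p / p.sum()        # tiny renormalization for numerical safety

def osmd_step(p, loss_est, eta):
    """One round of OSMD (steps (3)-(4)) with the q = 2 potential."""
    grad = -p ** -0.5                 # grad F(p), componentwise
    theta = grad - eta * loss_est     # mirror step in the dual space
    w = theta ** -2.0                 # w = grad F*(theta), the inverse map
    return bregman_project(w)
```

For nonnegative loss estimates the dual point stays in the domain of $\nabla F^*$, and coordinates with larger estimated loss receive less probability on the next round.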
We will use the following result:

Proposition 1 (Theorem 5.10 of [9]). Let the loss functions $\{\ell_{i,t}\}$ be nonnegative and bounded by 1. The strategy $S$ corresponding to the OSMD algorithm with loss estimates $\hat{\ell}$, learning rate $\eta > 0$, and Legendre function $F_\psi$, where $\psi$ is a 0-potential, satisfies the pseudo-regret bound
$$\sup_{A \in \mathcal{A}} R_T(A, S) \le \frac{\sup_{p \in \Delta_n} F_\psi(p) - F_\psi(p_1)}{\eta} + \frac{\eta}{2} \sum_{t=1}^T \sum_{i=1}^n \mathbb{E}\left[\frac{\hat{\ell}_{i,t}^2}{(\psi^{-1})'(p_{i,t})}\right].$$

We formally define 0-potentials and the associated Legendre functions in Appendix C. In our analysis, we take $\psi(x) = \frac{1}{x^2}$, yielding the Legendre function $F_\psi(x) = -2\sum_{i=1}^n x_i^{1/2}$. The pseudo-regret bound in Proposition 1 may then be analyzed and bounded accordingly in various settings of interest. Details for the proof of Theorem 1 are also provided in Appendix C.

B.2 Lower bounds for adversarial models

We now turn to establishing the lower bounds. The proofs of Theorems 2 and 5 are based on the same general strategy, which is summarized in the following proposition. To unify our results with standard bandit notation [9], we use the shorthand $X_{i,t} = f(A_t, \{i\})$ to denote the reward obtained at time $t$ when the player chooses $S_t = \{i\}$. Then
$$R_T(A, S) = \max_{1 \le i \le n} \mathbb{E}_{A,S} \sum_{t=1}^T (X_{i,t} - X_{S_t,t}).$$

Proposition 2. Consider a deterministic player strategy $S \in \mathcal{P}_d$. Let $A_0, A_1, \ldots, A_n$ be stochastic adversarial strategies such that for each $A_i$, the set of edges played at time $t$ is independent of the past actions of the adversary. Let $P_0, P_1, \ldots, P_n$ denote the corresponding measures on the feedback $I^T$, allowing for possible randomization only in the strategy of the adversary. Let $\mathbb{E}_i$ denote the expectation with respect to $P_i$. Suppose
$$r \le \min_{j \ne i} \mathbb{E}_i[X_{i,t} - X_{j,t}], \quad \forall 1 \le t \le T, \quad (5)$$
and
$$\sum_{i=1}^n KL(P_0, P_i) \le D. \quad (6)$$
Then
$$rT\left(\frac{n-1}{n} - \sqrt{\frac{D}{2n}}\right) \le \frac{1}{n}\sum_{i=1}^n \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}). \quad (7)$$
In particular, if the bounds (5) and (6) hold uniformly for all choices of $S \in \mathcal{P}_d$, then
$$rT\left(\frac{n-1}{n} - \sqrt{\frac{D}{2n}}\right) \le \inf_{S \in \mathcal{P}} \sup_{A \in \mathcal{A}} R_T(A, S). \quad (8)$$

Remark 4. We remark briefly on the roles of the strategies $\{A_i\}$ appearing in Proposition 2. In practice, the strategies are chosen to be similar, except that selecting $i$ as the source node is slightly more advantageous when the adversary uses strategy $A_i$. The strategy $A_0$ is a baseline strategy that treats all nodes identically. Thus, the lower bound provided by Proposition 2 is the product of the cost of an incorrect choice of the source vertex, given by $r$, and a factor that determines how easy it is to distinguish the adversary strategies from each other, which depends on $D$.

Proof of Proposition 2. We follow the method used in the proof of Theorem 3.5 in [9]. We first show how to obtain the bound (8) from the set of uniform bounds (7). Note that for any $S \in \mathcal{P}$, we have
$$\sup_{A \in \mathcal{A}} R_T(A, S) = \sup_{A \in \mathcal{A}} \max_{1 \le i \le n} \mathbb{E}_{A,S} \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge \max_{1 \le j \le n} \max_{1 \le i \le n} \mathbb{E}_S \mathbb{E}_{A_j} \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) = \max_{1 \le j \le n} \max_{1 \le i \le n} \mathbb{E}_S \mathbb{E}_j \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge \max_{1 \le i \le n} \mathbb{E}_S \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge \mathbb{E}_S\left[\frac{1}{n}\sum_{i=1}^n \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t})\right],$$
where we have used the fact that the maximum is at least as large as the average in the final inequality. Since any player strategy in $\mathcal{P}$ lies in the convex hull of deterministic player strategies, a uniform bound (7) over $\mathcal{P}_d$ implies that inequality (8) holds as well.

We now turn to the proof of inequality (7). The idea is to show that on average, the player incurs a certain loss whenever the wrong source vertex is played, and this event must happen sufficiently often.
We first write
$$\mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) = \sum_{t=1}^T \mathbb{E}_i\left[\sum_{j \ne i} (X_{i,t} - X_{j,t}) \mathbf{1}\{S_t = \{j\}\}\right] = \sum_{j \ne i} \sum_{t=1}^T \mathbb{E}_i[X_{i,t} - X_{j,t}] \, \mathbb{E}_i\left[\mathbf{1}\{S_t = \{j\}\}\right].$$
In the last equality, we have used the assumption that the adversary's action at each time is independent of the past to conclude that the difference in rewards $X_{i,t} - X_{j,t}$ (which depends on the adversary's action at time $t$) is independent of the indicator $\mathbf{1}\{S_t = \{j\}\}$ (which depends on the sequence of feedback received up to time $t-1$). Using the bound (5), it follows that
$$\mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge r \sum_{j \ne i} \mathbb{E}_i[T_j],$$
where $T_i = |\{t : S_t = \{i\}\}|$ denotes the number of times vertex $i$ is selected as the source. Now let $U_T$ denote a vertex drawn according to the distribution $q_T = (q_{1,T}, \ldots, q_{n,T})$, where $q_{i,T} = \frac{T_i}{T}$. The derivations above imply that
$$\mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge rT \sum_{j \ne i} P_i\{U_T = j\} = rT\left(1 - P_i\{U_T = i\}\right),$$
so
$$\frac{1}{n}\sum_{i=1}^n \mathbb{E}_i \sum_{t=1}^T (X_{i,t} - X_{S_t,t}) \ge rT\left(1 - \frac{1}{n}\sum_{i=1}^n P_i\{U_T = i\}\right). \quad (9)$$
By Pinsker's inequality,
$$P_i\{U_T = i\} \le P_0\{U_T = i\} + \sqrt{\tfrac{1}{2} KL(P'_0, P'_i)},$$
where $P'_i$ denotes the distribution of $U_T$ under the adversarial strategy $A_i$. By Jensen's inequality, we therefore have
$$\frac{1}{n}\sum_{i=1}^n P_i\{U_T = i\} \le \frac{1}{n} + \frac{1}{n}\sum_{i=1}^n \sqrt{\tfrac{1}{2} KL(P'_0, P'_i)} \le \frac{1}{n} + \sqrt{\frac{1}{2n}\sum_{i=1}^n KL(P'_0, P'_i)}. \quad (10)$$
Finally, the chain rule for KL divergence implies that
$$KL(P'_0, P'_i) = KL(P_0, P_i) + \sum_{I^T} P_0\{I^T\} \, KL\left(P'_0\{\cdot \mid I^T\}, P'_i\{\cdot \mid I^T\}\right). \quad (11)$$
Note that conditional on $I^T$, the distribution of $U_T$ is the same under $P'_0$ and $P'_i$, since the player uses a deterministic strategy. Thus, equation (11) implies that
$$\sum_{i=1}^n KL(P'_0, P'_i) = \sum_{i=1}^n KL(P_0, P_i) \le D. \quad (12)$$
Combining inequalities (9), (10), and (12), we arrive at the desired result (7).
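The Pinsker step used above can be sanity-checked numerically. The snippet below (our own illustration, not part of the proof) verifies that the total variation distance between two Bernoulli measures is bounded by $\sqrt{KL/2}$, which is the form of the inequality applied to $P'_0$ and $P'_i$.

```python
import math

def kl_bern(p, q):
    """KL divergence KL(Ber(p) || Ber(q)) in nats, for p, q in (0, 1)."""
    return p * math.log(p / q) + (1 - p) * math.log((1 - p) / (1 - q))

# Pinsker's inequality: TV(P, Q) <= sqrt(KL(P || Q) / 2).  For Bernoulli
# measures, TV(Ber(p), Ber(q)) = |p - q|.
for p in (0.1, 0.3, 0.5, 0.9):
    for q in (0.2, 0.5, 0.8):
        tv = abs(p - q)
        assert tv <= math.sqrt(kl_bern(q, p) / 2.0) + 1e-12
```

The case $p = 0.5$, $q = 0.2$ is nearly tight (TV $= 0.3$ versus a bound of about $0.31$), which is why Pinsker's inequality is the standard tool at this point in bandit lower-bound arguments.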
To prove Theorems 2 and 5, it thus remains to find an appropriate set of strategies $\{A_0, A_1, \ldots, A_n\}$ and verify the bounds (5) and (6). Details for the proofs are provided in Appendix E.1.

C Additional online upper bound proofs

In this Appendix, we provide proofs for the pseudo-regret of player strategies based on the OSMD algorithm. We begin with some preliminaries.

C.1 Preliminaries

We first describe the function $F_\psi$. Recall that a continuous function $F : \mathcal{D} \to \mathbb{R}$ is a Legendre function if $F$ is strictly convex, $F$ has continuous first partial derivatives on $\mathcal{D}$, and $\lim_{x \to \partial\mathcal{D}} \|\nabla F(x)\| = \infty$. The analysis in this paper concerns a very specific type of Legendre function associated with a 0-potential, as described in the following definition:

Definition 1. A function $\psi : (-\infty, a) \to \mathbb{R}_+$ is called a 0-potential if it is convex, continuously differentiable, and satisfies the following conditions:
$$\lim_{x \to -\infty} \psi(x) = 0, \quad \lim_{x \to a} \psi(x) = \infty, \quad \psi' > 0, \quad \int_0^1 |\psi^{-1}(s)| \, ds < \infty.$$
We additionally define the associated function $F_\psi$ on $(0, \infty)^n$ by
$$F_\psi(x) = \sum_{i=1}^n \int_0^{x_i} \psi^{-1}(s) \, ds.$$

In particular, we will consider the 0-potential $\psi(x) = (-x)^{-q}$. Then $\psi^{-1}(x) = -x^{-1/q}$, so
$$F_\psi(x) = -\frac{q}{q-1} \sum_{i=1}^n x_i^{\frac{q-1}{q}}.$$
Specifically, we will consider the case $q = 2$ (the same analysis could be performed with respect to $q > 1$, and then the final bound could be optimized over $q$). To employ Proposition 1, we need to bound two summands. The following simple lemma bounds the first term:

Lemma 1. When $\psi(x) = \frac{1}{x^2}$, we have the bound
$$F_\psi(p) - F_\psi(p_1) \le 2\sqrt{n}, \quad \forall p \in \Delta_n.$$

Proof. Since $F_\psi(p) \le 0$ and $\|p_1\|_1 = 1$, Hölder's inequality implies that
$$F_\psi(p) - F_\psi(p_1) \le 2\sum_{i=1}^n p_{1,i}^{1/2} \le 2n^{1/2}.$$
This completes the proof of the lemma.
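Lemma 1 is easy to check numerically. In the sketch below (our own illustration) we use the fact that $p_1$, the minimizer of $F_\psi$ over the simplex, is the uniform distribution, so $F_\psi(p_1) = -2\sqrt{n}$, and verify the bound on random points of the simplex.

```python
import numpy as np

rng = np.random.default_rng(0)

def F_psi(x):
    """Legendre function for the 0-potential psi(x) = (-x)^(-2):
    F_psi(x) = -2 * sum_i sqrt(x_i)."""
    return -2.0 * np.sum(np.sqrt(x))

n = 50
p1 = np.full(n, 1.0 / n)              # uniform = argmin of F_psi over simplex
for _ in range(1000):
    p = rng.dirichlet(np.ones(n))     # random point of the simplex
    # Lemma 1: F_psi(p) - F_psi(p1) <= 2 * sqrt(n)
    assert F_psi(p) - F_psi(p1) <= 2.0 * np.sqrt(n) + 1e-9
```

Since $F_\psi \le 0$ on the simplex, the gap is in fact at most $2\sqrt{n} - 2$, attained when $p$ is a point mass.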
All that remains is to analyze the loss-specific term appearing in Proposition 1 and choose $\eta$ appropriately.

C.2 Proof of Theorem 1

We first prove the following lemma:

Lemma 2. We have the inequality
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})}\right] \le \sqrt{2n}, \quad \forall 1 \le t \le T. \quad (13)$$

Proof. Let $\mathcal{F}_t$ denote the sigma-field of all actions up to time $t$. We have
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})} \,\middle|\, \mathcal{F}_{t-1}\right] \overset{(a)}{=} 2\sum_{i=1}^n p_{i,t}^{3/2}\, \mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right] \overset{(b)}{\le} 2\left(\sum_{i=1}^n p_{i,t}\right)^{1/2} \left(\sum_{i=1}^n \left(p_{i,t}\, \mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right]\right)^2\right)^{1/2} = 2\left(\sum_{i=1}^n \left(p_{i,t}\, \mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right]\right)^2\right)^{1/2}, \quad (14)$$
where we have used the facts that $(\psi^{-1})'(x) = \frac{1}{2}x^{-3/2}$ and $p_t$ is measurable with respect to $\mathcal{F}_{t-1}$ to establish (a), and applied Hölder's inequality to obtain (b).

We now inspect the conditional expectation more closely. We have
$$\mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right] = \mathbb{E}\left[\left(\frac{1}{n}\sum_{j \ne i} \frac{\ell^t_{i,j}}{p_{i,t} + p_{j,t}} Z_{ij}\right)^2 \,\middle|\, \mathcal{F}_{t-1}\right] = \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i}\sum_{k \ne i} \frac{\ell^t_{i,j}\,\ell^t_{i,k}}{(p_{i,t} + p_{j,t})(p_{i,t} + p_{k,t})} Z_{ij} Z_{ik} \,\middle|\, \mathcal{F}_{t-1}\right] = \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i}\sum_{k \ne i} \frac{\ell^t_{i,j}\,\ell^t_{i,k}}{(p_{i,t} + p_{j,t})(p_{i,t} + p_{k,t})} Z_i \,\middle|\, \mathcal{F}_{t-1}\right] + \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i} \frac{(\ell^t_{i,j})^2}{(p_{i,t} + p_{j,t})^2} Z_j \,\middle|\, \mathcal{F}_{t-1}\right],$$
where the third equality is due to the fact that $Z_{ij} Z_{ik}$ is 1 only when $i$ is the source vertex or $j = k$ is the source vertex.
Using the fact that $\ell^t_{i,j}$ is bounded by 1, we then obtain
$$\mathbb{E}\left[(\hat{\ell}^{sym}_{i,t})^2 \mid \mathcal{F}_{t-1}\right] \le \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i}\sum_{k \ne i} \frac{Z_i}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})} \,\middle|\, \mathcal{F}_{t-1}\right] + \frac{1}{n^2}\, \mathbb{E}\left[\sum_{j \ne i} \frac{Z_j}{(p_{i,t}+p_{j,t})^2} \,\middle|\, \mathcal{F}_{t-1}\right] \le \frac{1}{n^2}\sum_{j \ne i}\sum_{k \ne i} \frac{p_{i,t}}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})} + \frac{1}{n^2}\sum_{j \ne i} \frac{p_{j,t}}{(p_{i,t}+p_{j,t})^2} \le \frac{1}{n^2}\sum_{j \ne i}\sum_{k \ne i} \frac{p_{i,t}+p_{j,t}}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})} = \frac{1}{n^2}\sum_{j \ne i}\sum_{k \ne i} \frac{1}{p_{i,t}+p_{k,t}} \le \frac{1}{n}\sum_{k=1}^n \frac{1}{p_{i,t}+p_{k,t}}.$$
Combining this result with the bound (14), we have
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})} \,\middle|\, \mathcal{F}_{t-1}\right] \le 2\left(\sum_{i=1}^n \left(\frac{p_{i,t}}{n}\sum_{k=1}^n \frac{1}{p_{i,t}+p_{k,t}}\right)^2\right)^{1/2} = \frac{2}{n}\left(\sum_{i=1}^n \sum_{j=1}^n \sum_{k=1}^n \frac{p_{i,t}^2}{(p_{i,t}+p_{j,t})(p_{i,t}+p_{k,t})}\right)^{1/2} \le \frac{2}{n}\left(n \sum_{i=1}^n \sum_{j=1}^n \frac{p_{i,t}}{p_{i,t}+p_{j,t}}\right)^{1/2}.$$
Now, we have the useful identity
$$\sum_{i=1}^n \sum_{k=1}^n \frac{a_i}{a_i + a_k} = \frac{n^2}{2} \quad (15)$$
for any positive sequence $\{a_i\}_{i=1}^n$. This may be seen via the following algebraic manipulations:
$$\sum_{i=1}^n \sum_{k=1}^n \frac{a_i}{a_i + a_k} = \sum_{i=1}^n \frac{a_i}{a_i + a_i} + \sum_{i=1}^n\sum_{k \ne i} \frac{a_i}{a_i + a_k} = \frac{n}{2} + \frac{1}{2}\sum_{i=1}^n\sum_{k \ne i} \frac{a_i}{a_i + a_k} + \frac{1}{2}\sum_{i=1}^n\sum_{k \ne i} \frac{a_k}{a_i + a_k} = \frac{n}{2} + \frac{1}{2}\sum_{i=1}^n\sum_{k \ne i} \frac{a_i + a_k}{a_i + a_k} = \frac{n}{2} + \frac{n(n-1)}{2} = \frac{n^2}{2}.$$
Appealing to equation (15), we may replace the double sum by $\frac{n^2}{2}$ and simplify the bound:
$$\sum_{i=1}^n \mathbb{E}\left[\frac{(\hat{\ell}^{sym}_{i,t})^2}{(\psi^{-1})'(p_{i,t})} \,\middle|\, \mathcal{F}_{t-1}\right] \le \frac{2}{n}\left(\frac{n^3}{2}\right)^{1/2} = \sqrt{2n}.$$
Taking an additional expectation and using the tower property, we arrive at the desired inequality.

Combining Lemmas 1 and 2 with Proposition 1, we then have
$$\sup_{A \in \mathcal{A}} R_T(A, S^{sym}_{OSMD}) \le \frac{2\sqrt{n}}{\eta} + \eta T \sqrt{\frac{n}{2}}.$$
Optimizing over $\eta$, we take $\eta = 2^{3/4} T^{-1/2}$, which establishes the desired bound.

D Adversarial influence maximization with multiple sources

In this Appendix, we prove results concerning multiple influence sources.
First, we need to give the precise algorithmic details of the online greedy algorithm. We assume the player is allowed to choose source vertices sequentially at time $t$ and observes the corresponding edge feedback immediately after each selection. The algorithm, inspired by [36], is outlined below:

Online Greedy Algorithm

Given: A single-source player strategy $S^1$.

Output: A $k$-source player strategy $S^k = \{S_t\}_{1 \le t \le T}$.

For each $t = 1, \ldots, T$, choose $S_t = \{v_{1,t}, \ldots, v_{k,t}\}$ sequentially, as follows:

(1) Select $v_{1,t}$ according to the single-source strategy $S^1$.
(2) For each $i > 1$, select $v_{i,t}$ according to the single-source strategy $S^1$, based on the edge feedback $I(A_t, \{v_{1,t}, \ldots, v_{i,t}\}) \setminus I(A_t, \{v_{1,t}, \ldots, v_{i-1,t}\})$.

In other words, the Online Greedy Algorithm runs the player's strategy for single-source selection $k$ times in parallel, with losses computed marginally for each successively chosen vertex. The "greedy" component of the algorithm corresponds to the fact that the player selects the set of $i$th source vertices in the best possible way based on the information available (i.e., according to the single-source strategy that is designed to incur a small pseudo-regret). Note that the feedback $I(A_t, \{v_{1,t}, \ldots, v_{i,t}\}) \setminus I(A_t, \{v_{1,t}, \ldots, v_{i-1,t}\})$ is indeed computable by the player when choosing the $i$th vertex at round $t$, since the player has already observed $I(A_t, \{v_{1,t}, \ldots, v_{i-1,t}\})$ after the first $i-1$ source nodes are selected.

Fix an adversarial strategy $A$, and define the functions $f_t(S_t) = f(A_t, S_t)$ and $F(S) = \sum_{t=1}^T f_t(S_t)$. Thus, $F(S)$ is the total reward for strategy $S = \{S_t\}$. In the stochastic setting, when $T = 1$, many influence-maximization analyses exploit the submodularity of $f_t$ under certain stochastic assumptions on $A_t$.
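The marginal-feedback bookkeeping of one round of the Online Greedy Algorithm can be sketched as follows. The reachability routine stands in for the contagion process, and the `FixedChoice` stub stands in for the single-source strategy $S^1$; both are our own hypothetical illustrations, not the authors' implementation.

```python
from collections import deque

def influenced(edges, seeds):
    r"""Vertices reachable from `seeds` along the adversary's open edges,
    i.e. I(A_t, S) for an undirected contagion."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, queue = set(seeds), deque(seeds)
    while queue:
        u = queue.popleft()
        for v in adj.get(u, ()):
            if v not in seen:
                seen.add(v)
                queue.append(v)
    return seen

class FixedChoice:
    """Toy single-source 'strategy' that always proposes the same vertex
    (a stand-in for the OSMD-based strategy S^1; illustration only)."""
    def __init__(self, v):
        self.v = v
        self.feedback = None
    def select(self):
        return self.v
    def update(self, v, marginal):
        self.feedback = marginal  # a real strategy would update losses here

def online_greedy_round(strategies, edges):
    r"""One round t: each of the k single-source strategies picks a seed and
    is fed only the marginal influence of its own choice, namely
    I(A_t, {v_1..v_i}) minus I(A_t, {v_1..v_{i-1}})."""
    chosen, prev = [], set()
    for strat in strategies:          # i = 1, ..., k, sequentially
        v = strat.select()
        chosen.append(v)
        cur = influenced(edges, chosen)
        strat.update(v, cur - prev)   # marginal feedback for the i-th seed
        prev = cur
    return chosen, prev
```

Running $k$ copies of the single-source strategy on these marginal sets is exactly what lets the analysis charge each copy its own single-source pseudo-regret.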
In the bandit setting, we wish to establish an analogous result for $F$, in order to establish regret bounds when the player chooses source vertices according to a greedy algorithm. Since $S \in (2^V)^T$, the function $F$ is not technically a set function. However, we may identify each player strategy $S$ with an element $S^* \in 2^{(V^T)}$, and define $F^*(S^*) = F(S)$. Here,
$$V^T := \left\{ v^T = \left(v(1), v(2), \ldots, v(T)\right) \;\middle|\; v(i) \in V, \text{ for } 1 \le i \le T \right\},$$
and $S^* = \{u^T_1, \ldots, u^T_k\} \in 2^{(V^T)}$ corresponds to the strategy that selects the source nodes $\{u_1(t), \ldots, u_k(t)\}$ in round $t$.

In more detail, let $S^{(i)}_t = \{s_t(1), \ldots, s_t(i)\} \subseteq V$ denote the set of the first $i$ seed vertices in round $t$, where $S^{(0)}_t = \emptyset$. Then we can write
$$f_t(S_t) = \sum_{i=1}^k f_t\left(S^{(i)}_t\right) - f_t\left(S^{(i-1)}_t\right).$$
One can then write the total reward as
$$F(S) = \sum_{t=1}^T \sum_{i=1}^k f_t\left(S^{(i)}_t\right) - f_t\left(S^{(i-1)}_t\right) = \sum_{i=1}^k \sum_{t=1}^T f_t\left(S^{(i)}_t\right) - f_t\left(S^{(i-1)}_t\right).$$
If we define
$$f^*_i(S^*) = F^*\left(\{u^T_1, \ldots, u^T_i\}\right) - F^*\left(\{u^T_1, \ldots, u^T_{i-1}\}\right) = \sum_{t=1}^T f_t\left(\{u_j(t) : j \le i\}\right) - f_t\left(\{u_j(t) : j \le i-1\}\right)$$
and $F^*(S^*) = \sum_{i=1}^k f^*_i(S^*)$, then we indeed get the desired equality $F(S) = F^*(S^*)$, while also switching our summation for submodularity to be over the $i$th vertices as opposed to the $t$th round.

We first show that $F^*$ is a monotone, submodular function:

Lemma 3. The function $f_t(S_t) = f(A_t, S_t)$ is monotone and submodular, for every fixed $A_t$.

Proof. It is trivial to see that $f_t$ is monotone, so we focus on proving submodularity. Our goal is to show that for a fixed $A_t$, and for any $S_t \subseteq S'_t$ and $u \in V \setminus S'_t$, we have
$$f_t(S'_t \cup \{u\}) - f_t(S'_t) \le f_t(S_t \cup \{u\}) - f_t(S_t). \quad (16)$$
Let $Z_{S_t,v}$ denote the indicator of an open path between a source node $s \in S_t$ and $v \in V$, where by convention, $Z_{S_t,v} = 1$ if $v \in S_t$. Note that $f_t(S_t) = \frac{1}{n}\sum_{v \in V} Z_{S_t,v}$. We will show that for $v \notin S'_t \cup \{u\}$, we have
$$Z_{S'_t \cup \{u\},v} - Z_{S'_t,v} \le Z_{S_t \cup \{u\},v} - Z_{S_t,v}. \quad (17)$$
Summing over $v \in (S'_t \cup \{u\})^c$, using $Z_{S_t \cup \{u\},v} - Z_{S_t,v} \ge 0$ for $v \in S'_t \setminus S_t$, and dividing by $n$ will yield the desired inequality (16).

We have three cases to consider: In the first case, an open path exists from some $s \in S'_t$ to $v$. Then the left side of inequality (17) is equal to 0, while the right-hand side is at least 0 by monotonicity. In the second case, an open path does not exist from any $s \in S'_t$ to $v$, but an open path exists from $u$ to $v$. Then both sides of inequality (17) are equal to 1. Finally, if no open path exists from any $s \in S'_t \cup \{u\}$ to $v$, then both sides of inequality (17) are equal to 0. This completes the proof.

Proposition 3. The function $F^*$ is monotone and submodular.

Proof. The properties are essentially immediate from Lemma 3. Let $P$ and $Q$ be elements of $(2^V)^T$ such that $P^* \subseteq Q^*$. Then
$$F^*(P^*) = \sum_{t=1}^T f_t(P_t) \le \sum_{t=1}^T f_t(Q_t) = F^*(Q^*),$$
proving monotonicity. Similarly, if $S \in (2^V)^T$, we have
$$F^*(S^* \cup Q^*) - F^*(Q^*) = \sum_{t=1}^T \left(f_t(S_t \cup Q_t) - f_t(Q_t)\right) \le \sum_{t=1}^T \left(f_t(S_t \cup P_t) - f_t(P_t)\right) = F^*(S^* \cup P^*) - F^*(P^*),$$
proving submodularity.

By the standard greedy approximation ([24, 30]), we then have
$$\left(1 - \frac{1}{e}\right) \max_{|S^*| \le K} F^*(S^*) \le F^*(G^*),$$
where $G^*$ is a set of cardinality $K \ge 1$ constructed via a sequential greedy algorithm. However, this result is not immediately applicable to the online bandit setting, since we do not have direct access to $F^*$.
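The coverage structure behind Lemma 3 can be spot-checked by brute force on a small graph: for a fixed set of open edges, the normalized reachability function satisfies both monotonicity and the diminishing-returns inequality (16). The check below is our own illustration, not part of the proof.

```python
from itertools import combinations

def f_t(edges, n, seeds):
    """Normalized influence f_t(S) = |I(A_t, S)| / n for fixed open edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    seen, stack = set(seeds), list(seeds)
    while stack:
        u = stack.pop()
        for w in adj.get(u, ()):
            if w not in seen:
                seen.add(w)
                stack.append(w)
    return len(seen) / n

n = 6
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]   # one fixed adversarial choice A_t
verts = set(range(n))
for S in map(set, combinations(verts, 2)):
    for Sp in map(set, combinations(verts, 3)):
        if not S <= Sp:
            continue
        for u in verts - Sp:
            # diminishing returns: f(S' + u) - f(S') <= f(S + u) - f(S)
            assert f_t(edges, n, Sp | {u}) - f_t(edges, n, Sp) \
                   <= f_t(edges, n, S | {u}) - f_t(edges, n, S) + 1e-12
            # monotonicity: S subset of S' implies f(S) <= f(S')
            assert f_t(edges, n, Sp) >= f_t(edges, n, S) - 1e-12
```

Coverage-type functions of this form are the classical example of monotone submodular functions, which is what Proposition 3 then lifts to $F^*$.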
Thus, we can only hope to obtain an approximate greedy maximizer $\tilde{G}^*$, and we wish to derive theoretical guarantees for $F^*(\tilde{G}^*)$. Our result relies on the following general proposition:

Proposition 4 (Theorem 6 from [36]). Let $f : 2^V \to \mathbb{R}$ be a monotone, submodular function such that $f(\emptyset) = 0$. Consider a set $D \subseteq V$ and a sequence of error tolerances $\{\epsilon_i\}$, and suppose $\{G^{\epsilon}_i\}$ is constructed in an approximately greedy manner, such that $G^{\epsilon}_0 = \emptyset$ and $G^{\epsilon}_i = G^{\epsilon}_{i-1} \cup \{g_i\}$, where
$$\max_{d \in D} f(G^{\epsilon}_{i-1} \cup \{d\}) - f(G^{\epsilon}_{i-1}) \le f(G^{\epsilon}_{i-1} \cup \{g_i\}) - f(G^{\epsilon}_{i-1}) + \epsilon_i.$$
Then for any $K \ge 1$, we have
$$\left(1 - \frac{1}{e}\right) \max_{S^* \in D_K} f(S^*) - f(G^{\epsilon}_K) \le \sum_{i=1}^K \epsilon_i,$$
where $D_K$ consists of subsets of $D$ containing at most $K$ elements.

Proposition 4 ensures that for submodular functions, successive errors $\{\epsilon_i\}$ in a sequential greedy algorithm only accumulate additively. The proof is provided in [36], but we include a proof in Appendix D.2 for completeness.

D.1 Proof of Theorem 3

Suppose $A \in \mathcal{A}$. We will apply Proposition 4 with $f = \mathbb{E}_A F^*$, $V = V^T$, and $K = k$. Note that $\mathbb{E}_A F^*$ inherits monotonicity and submodularity from $F^*$. Also let $D = \{(v, \ldots, v) : v \in V\} \subseteq V^T$ denote the diagonal of $V^T$. For a (non-random) $k$-source strategy $S^*$ with $|S^*_t| = k$ for all $t$, we use the notation $S^* = \{S^*_1, \ldots, S^*_k\}$, where $S^*_i$ corresponds to the set of $i$th vertices chosen during the $T$ rounds. Proposition 4 immediately gives
$$\left(1 - \frac{1}{e}\right) \max_{S^* \in D} \mathbb{E}_A F^*(S^*) - \mathbb{E}_A F^*(G^{\epsilon}_k) \le \sum_{i=1}^k \max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right],$$
where the sets $\{G^{\epsilon}_i\}$ are chosen in an approximately greedy manner, and the $\epsilon_i$ are upper bounded by the regret for the $i$th instance of the single-source algorithm.
In particular, we consider $\{G^{\epsilon}_i\}$ to be the choice of $i$th vertices $S^*_i$ corresponding to the player's choice under the strategy $S^1$. We now take an expectation with respect to possible randomization in the player's strategy, to obtain
$$R^{(1-1/e)}_T(A, S) \le \sum_{i=1}^k \mathbb{E}_S\left[\max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right]\right] \overset{(a)}{=} \sum_{i=1}^k \mathbb{E}_{S[1:i]}\left[\max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right]\right] \overset{(b)}{=} \sum_{i=1}^k \mathbb{E}_{S[1:i-1]}\left[\mathbb{E}_{S_i} \max_{d_i \in D} \mathbb{E}_A\left[F^*(G^{\epsilon}_{i-1} \cup \{d_i\}) - F^*(G^{\epsilon}_{i-1} \cup \{g_i\})\right]\right].$$
Here, $\mathbb{E}_{S[1:i]}$ denotes the expectation with respect to the first $i$ vertices played, and the equality in (a) holds because the set of $i$th vertices played depends only on the sets of the first $i$ vertices played. The equality in (b) holds because the set $G^{\epsilon}_{i-1}$, and hence the choice of $d_i$, does not depend on the selection of $i$th vertices. Furthermore, the inner expression is simply the pseudo-regret of strategy $S^1$. By Theorem 1, this is bounded by $2^{5/4}\sqrt{Tn}$. Summing up, we obtain the desired result.

D.2 Proof of Proposition 4

We begin with two supporting lemmas:

Lemma 4. For any $P \subseteq V$ and $Q \subseteq D$, we have
$$f(P \cup Q) \le f(P) + |Q| \max_{v \in D} \left[f(P \cup \{v\}) - f(P)\right].$$

Proof. We proceed by induction on $|Q|$. The case $|Q| = 1$ is immediate. Now suppose the statement is true for all $|Q| \le k$, where $k \ge 1$. Let $c \in D$, and suppose $Q \subseteq D$ has cardinality $k$. Then
$$f(P \cup (Q \cup \{c\})) \overset{(a)}{\le} f(P \cup \{c\}) + |Q| \max_{d \in D} \left[f((P \cup \{c\}) \cup \{d\}) - f(P \cup \{c\})\right] \overset{(b)}{\le} f(P) + \max_{d \in D} \left[f(P \cup \{d\}) - f(P)\right] + |Q| \max_{d \in D} \left[f(P \cup \{d\}) - f(P)\right] = f(P) + |Q \cup \{c\}| \max_{d \in D} \left[f(P \cup \{d\}) - f(P)\right],$$
where (a) follows from the induction hypothesis and (b) follows from the induction hypothesis and submodularity.
This completes the induction and proves the lemma.

Lemma 5. Let $\delta_i := f(G^{\epsilon}_i) - f(G^{\epsilon}_{i-1})$. For any $Q \subseteq D$, we have
$$f(Q) \le f(G^{\epsilon}_{i-1}) + |Q|(\delta_i + \epsilon_i).$$

Proof. Using Lemma 4 and monotonicity of $f$, we have
$$f(Q) \le f(G^{\epsilon}_{i-1} \cup Q) \le f(G^{\epsilon}_{i-1}) + |Q| \max_{d \in D} \left[f(G^{\epsilon}_{i-1} \cup \{d\}) - f(G^{\epsilon}_{i-1})\right] \le f(G^{\epsilon}_{i-1}) + |Q| \left(f(G^{\epsilon}_i) - f(G^{\epsilon}_{i-1}) + \epsilon_i\right) = f(G^{\epsilon}_{i-1}) + |Q|(\delta_i + \epsilon_i),$$
completing the proof.

We now define $\Delta_i := \max_{S^* \in D_K} f(S^*) - f(G^{\epsilon}_{i-1})$. By Lemma 5, we have
$$\max_{S^* \in D_K} f(S^*) \le f(G^{\epsilon}_{i-1}) + K(\delta_i + \epsilon_i).$$
Subtracting $f(G^{\epsilon}_{i-1})$, we obtain $\Delta_i \le K(\delta_i + \epsilon_i) = K(\Delta_i - \Delta_{i+1} + \epsilon_i)$, so
$$\Delta_{i+1} \le \Delta_i \left(1 - \frac{1}{K}\right) + \epsilon_i.$$
Applying this inequality recursively, we see that
$$\Delta_{K+1} \le \Delta_1 \prod_{i=1}^K \left(1 - \frac{1}{K}\right) + \sum_{i=1}^K \epsilon_i = \Delta_1 \left(1 - \frac{1}{K}\right)^K + \sum_{i=1}^K \epsilon_i \le \frac{\Delta_1}{e} + \sum_{i=1}^K \epsilon_i.$$
Rearranging and using the fact that $f(\emptyset) = 0$ completes the proof.

E Additional online lower bound proofs

The main goal of this Appendix is to prove Theorems 2 and 5. Some of the computations are rather lengthy and are therefore included in Appendix E.2.

E.1 Proofs of theorems

We first present the main components of the proofs, followed by detailed calculations involving the Kullback–Leibler divergence.

E.1.1 Proof of Theorem 2

Let the adversarial strategies $\{A_i\}$ be defined as follows: For each strategy, the adversary chooses a random subset of vertices, and opens all edges between vertices in the subset. For $A_i$, with $1 \le i \le n$, the adversary includes vertex $i$ with probability $\frac{c}{n}$, and includes each other vertex with probability $\frac{c}{n}(1-\delta)$, where $\delta \in (0, 1/2)$ is a small constant. Finally, for $A_0$, the adversary includes all vertices independently with probability $\frac{c}{n}(1-\delta)$. Successive actions of the adversary are i.i.d. across time steps.
We now derive the following lemmas, which will be used in Proposition 2:

Lemma 6. For any $i \ne j$ and $1 \le t \le T$, we have
$$\mathbb{E}_i[X_{i,t} - X_{j,t}] = \frac{(n-2)c^2}{n^3}(1-\delta)\delta.$$

Proof. Let $C_t$ be the clique chosen by the adversary at time $t$. Note that if $i, j \in C_t$ or $i, j \notin C_t$, the difference in rewards is 0. Thus, the only cases of interest in computing the expectation are when exactly one of $i$ or $j$ is in $C_t$. Then
$$\mathbb{E}_i[X_{i,t} - X_{j,t}] = \frac{1}{n}\,\mathbb{E}_i\left[(|C_t| - 1)\left(\mathbf{1}_{i \in C_t}\mathbf{1}_{j \notin C_t} - \mathbf{1}_{i \notin C_t}\mathbf{1}_{j \in C_t}\right)\right] = \frac{1}{n}\left(\frac{c}{n}(n-2)(1-\delta)\right)\left(\frac{c}{n}\right)\left[1 - \frac{c}{n}(1-\delta)\right] - \frac{1}{n}\left(\frac{c}{n}(n-2)(1-\delta)\right)\left[1 - \frac{c}{n}\right]\left[\frac{c}{n}(1-\delta)\right] = \frac{1}{n}(n-2)\left(\frac{c}{n}\right)^2(1-\delta)\left(\left[1 - \frac{c}{n}(1-\delta)\right] - \left[1 - \frac{c}{n}\right](1-\delta)\right) = \frac{(n-2)c^2}{n^3}(1-\delta)\delta,$$
where the second equality uses the fact that $\frac{c}{n}(n-2)(1-\delta)$ other vertices are expected to be in $C_t$.

Lemma 7. Let $S \in \mathcal{P}_d$ be a deterministic player strategy, and let $T_i = |\{t : S_t = \{i\}\}|$. Then we have the upper bound
$$\sum_{i=1}^n KL(P_0, P_i) \le \frac{c(c+1)}{n-c} T \delta^2.$$

The proof of Lemma 7 is provided in Appendix E.2.1. Thus, by Proposition 2, we have
$$\inf_{S \in \mathcal{P}} \sup_{A \in \mathcal{A}} R_T(A, S) \ge T \frac{(n-2)c^2}{n^3}(1-\delta)\delta \left(\frac{n-1}{n} - \delta\sqrt{\frac{T}{2n}}\sqrt{\frac{c(c+1)}{n-c}}\right) \ge \frac{T}{6}\left(\frac{c}{n}\right)^2 \left(\frac{n-1}{n}\delta - \delta^2\sqrt{\frac{T}{2n}}\sqrt{\frac{c(c+1)}{n-c}}\right),$$
where the second inequality uses the facts that $n \ge 3$ and $\delta < 1/2$. Finally, we optimize over $\delta$ and $c$. Since we have a quadratic in $\delta$, we take
$$\delta = \frac{n-1}{2n}\sqrt{\frac{2n}{T}}\sqrt{\frac{n-c}{c(c+1)}},$$
yielding
$$\inf_{S \in \mathcal{P}} \sup_{A \in \mathcal{A}} R_T(A, S) \ge \frac{T}{6}\left(\frac{c}{n}\right)^2 \cdot \frac{1}{4}\left(\frac{n-1}{n}\right)^2 \sqrt{\frac{2n}{T}}\sqrt{\frac{n-c}{c(c+1)}} = \frac{1}{12\sqrt{2}}\sqrt{T}\left(\frac{c}{n}\right)^2 \left(\frac{n-1}{n}\right)^2 \sqrt{\frac{n(n-c)}{c(c+1)}} \ge \frac{1}{27\sqrt{3}}\sqrt{T}\left(\frac{c}{n^2}\right)\sqrt{n(n-c)},$$
where the second inequality uses the bounds $\frac{n-1}{n} \ge \frac{2}{3}$ when $n \ge 3$, and $\frac{c}{c+1} \ge \frac{2}{3}$ when $c \ge 2$. The final expression is optimized at $c = \frac{2n}{3}$, yielding the desired lower bound.
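The algebraic simplification in the proof of Lemma 6 can be replayed in exact rational arithmetic. The check below (our own illustration; helper names are hypothetical) assembles the expectation from the two cases of the case analysis and compares it against the closed form $\frac{(n-2)c^2}{n^3}(1-\delta)\delta$.

```python
from fractions import Fraction as Fr

def gap(n, c, delta):
    """E_i[X_{i,t} - X_{j,t}] assembled from the case analysis in Lemma 6."""
    p_i = Fr(c, n)                  # P(vertex i is in the clique) under A_i
    p_o = Fr(c, n) * (1 - delta)    # inclusion probability of any other vertex
    m = (n - 2) * p_o               # E[# further clique members besides i, j]
    # i in C_t, j not: gain (|C_t| - 1)/n; i not, j in: lose the same amount
    return Fr(1, n) * m * (p_i * (1 - p_o) - (1 - p_i) * p_o)

for n in (3, 5, 10, 47):
    for c in (1, 2, n - 1):
        for delta in (Fr(1, 4), Fr(1, 3)):
            closed_form = Fr(n - 2, n ** 3) * c * c * (1 - delta) * delta
            assert gap(n, c, delta) == closed_form
```

The cancellation inside the bracket, $p_i(1 - p_o) - (1 - p_i)p_o = p_i - p_o = \frac{c}{n}\delta$, is exactly the $\delta$ factor that drives the lower bound.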
Note that for this choice of $c$, we indeed have $\delta < 1/2$ when $T \ge 2$.

E.1.2  Proof of Theorem 5

Let the adversarial strategies $\{\mathcal{A}_i\}$ be defined as follows: For each strategy, the adversary independently designates every vertex to be a source, a sink, or neither. The adversary then opens directed edges from all source vertices to all sink vertices. For $\mathcal{A}_i$, with $1 \le i \le n$, the adversary designates vertex $i$ to be a source vertex with probability $\frac{c}{n}$, and each other vertex to be a source vertex with probability $\frac{c}{n}(1-\delta)$. All vertices are designated to be sink vertices with probability $\frac{d}{n}$. Finally, for $\mathcal{A}_0$, the adversary designates all vertices to be source vertices with probability $\frac{c}{n}(1-\delta)$, and sink vertices with probability $\frac{d}{n}$. Successive actions of the adversary are i.i.d. across time steps.

We now derive the following lemmas, which will be used in Proposition 2.

Lemma 8. For any $i \ne j$ and $1 \le t \le T$, we have $\mathbb{E}_i[X_{i,t} - X_{j,t}] = \frac{(n-1)cd}{n^3}\delta$.

Proof. We compute the expectation of each term separately. Let $\mathcal{B}_t$ and $\mathcal{C}_t$ denote the sets of source and sink vertices at time $t$, respectively. Note that $X_{i,t} = \frac{1}{n}$ if $i \notin \mathcal{B}_t$; otherwise, $X_{i,t} = \frac{1 + |\mathcal{C}_t|}{n}$. Hence,
\[
n \, \mathbb{E}_i[X_{i,t}] = \mathbb{E}\left[ \mathbf{1}\{i \notin \mathcal{B}_t\} + (1 + |\mathcal{C}_t|) \mathbf{1}\{i \in \mathcal{B}_t\} \right] = \left( 1 - \frac{c}{n} \right) + \left( 1 + \frac{(n-1)d}{n} \right) \frac{c}{n} = 1 + \frac{(n-1)cd}{n^2}.
\]
The computation for $X_{j,t}$ is similar:
\[
n \, \mathbb{E}_i[X_{j,t}] = \mathbb{E}\left[ \mathbf{1}\{j \notin \mathcal{B}_t\} + (1 + |\mathcal{C}_t|) \mathbf{1}\{j \in \mathcal{B}_t\} \right] = \left( 1 - \frac{c}{n}(1-\delta) \right) + \left( 1 + \frac{(n-1)d}{n} \right) \frac{c}{n}(1-\delta) = 1 + \frac{(n-1)cd}{n^2}(1-\delta).
\]
Taking the difference between these expectations proves the lemma.

Lemma 9. Let $S \in \mathcal{P}_d$ be a deterministic player strategy, and let $T_i = |\{t : S_t = \{i\}\}|$. Then we have the upper bound
\[
\sum_{i=1}^n KL(P_0, P_i) \le \frac{c(n-d)}{n(n-c-d)} T \delta^2.
\]
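Lemma 8 can likewise be checked by exact enumeration over the three possible states of each vertex. The sketch below is not part of the proof, and the parameter values are arbitrary (chosen so that $\frac{c}{n} + \frac{d}{n} \le 1$):

```python
import itertools

# Exact enumeration check of Lemma 8: each vertex is independently a source
# (state 1), a sink (state 2), or neither (state 0); a seeded source influences
# itself plus every sink, while any other seed influences only itself.
n, c, d, delta = 5, 1, 2, 0.3
i, j = 0, 1
src = [c * (1 - delta) / n] * n
src[i] = c / n                   # vertex i is slightly favored as a source
snk = d / n
gap = 0.0
for states in itertools.product([0, 1, 2], repeat=n):
    prob = 1.0
    for v in range(n):
        prob *= {1: src[v], 2: snk, 0: 1 - src[v] - snk}[states[v]]
    sinks = states.count(2)
    x_i = (1 + sinks) / n if states[i] == 1 else 1 / n
    x_j = (1 + sinks) / n if states[j] == 1 else 1 / n
    gap += prob * (x_i - x_j)
# Lemma 8 predicts E_i[X_{i,t} - X_{j,t}] = (n-1) c d delta / n^3
assert abs(gap - (n - 1) * c * d * delta / n**3) < 1e-12
```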
Essentially, the Kullback-Leibler divergence is of order $\frac{1}{n}$, because playing a suboptimal vertex provides no information about which vertex is optimal. This is unlike the case of the undirected graph, where the optimal vertex is always more likely to be contained in the feedback that the player receives, and the KL divergence does not decay with $n$.

The proof of Lemma 9 is provided in Appendix E.2.2. By Proposition 2, we then have
\[
\inf_{S \in \mathcal{P}} \sup_{\mathcal{A}} R_T(\mathcal{A}, S) \ge \frac{(n-1)cd}{n^3} \delta T \left( \frac{n-1}{n} - \delta \sqrt{\frac{T}{2n}} \sqrt{\frac{c(n-d)}{n(n-c-d)}} \right).
\]
Finally, we optimize over $\delta$, $c$, and $d$. We take
\[
\delta = \frac{1}{2} \left( \frac{n-1}{n} \right) \sqrt{\frac{2n}{T}} \sqrt{\frac{n(n-c-d)}{c(n-d)}},
\]
to obtain
\begin{align*}
\inf_{S \in \mathcal{P}} \sup_{\mathcal{A}} R_T(\mathcal{A}, S) &\ge \frac{(n-1)cd}{4n^3} \left( \frac{n-1}{n} \right)^2 T \sqrt{\frac{2n}{T}} \sqrt{\frac{n(n-c-d)}{c(n-d)}} \\
&= \frac{1}{2\sqrt{2}} \sqrt{nT} \left( \frac{n-1}{n} \right)^3 \frac{cd}{n^2} \sqrt{\frac{1 - c/n - d/n}{(c/n)(1 - d/n)}} \\
&\ge \frac{1}{16\sqrt{2}} \sqrt{nT} \cdot \frac{cd}{n^2} \sqrt{\frac{1 - c/n - d/n}{(c/n)(1 - d/n)}},
\end{align*}
where the last inequality uses the bound $\frac{n-1}{n} \ge \frac{1}{2}$. Finally, using the fact that the function
\[
f(x, y) = xy \sqrt{\frac{1 - x - y}{x(1 - y)}}
\]
achieves its maximum value of $\frac{1}{3\sqrt{3}}$ at $(x, y) = \left( \frac{1}{6}, \frac{2}{3} \right)$, we obtain the bound
\[
\inf_{S \in \mathcal{P}} \sup_{\mathcal{A}} R_T(\mathcal{A}, S) \ge \frac{1}{48\sqrt{6}} \sqrt{Tn}
\]
when $c = \frac{n}{6}$ and $d = \frac{2n}{3}$.

E.2  Proofs of KL bounds

In this Appendix, we derive the required upper bounds on the KL divergence between adversarial strategies. We begin by proving a useful technical lemma. Recall that $P_i$ denotes the distribution of the edge feedback $I^T$ under strategy $\mathcal{A}_i$, and $S \in \mathcal{P}_d$ is a fixed deterministic player strategy. Also recall that $T_i = |\{t : S_t = \{i\}\}|$ denotes the number of times vertex $i$ is chosen by the player. Let $P_i^t$ denote the distribution of the edge feedback $I^t$ under strategy $\mathcal{A}_i$, so $P_i = P_i^T$.
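The claimed maximizer of $f(x, y)$ can be confirmed numerically; this is a verification sketch, not part of the proof:

```python
import math

# f(x, y) = x y sqrt((1 - x - y) / (x (1 - y))) on the region x, y > 0, x + y < 1
def f(x, y):
    return x * y * math.sqrt((1 - x - y) / (x * (1 - y)))

fmax = 1 / (3 * math.sqrt(3))
assert abs(f(1 / 6, 2 / 3) - fmax) < 1e-12   # value at the claimed maximizer
# a coarse grid search over the feasible region never exceeds the claimed maximum
best = max(f(a / 200, b / 200)
           for a in range(1, 200) for b in range(1, 200) if a + b < 200)
assert best <= fmax + 1e-9
```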
For each pair of nodes $i$ and $v$ and any $1 \le t \le T$, define the function $KL_i^t(v)$ to be the KL divergence between the edge feedback distributions, conditioned on any $I^{t-1}$ such that $S_t = \{v\}$:
\[
KL_i^t(v) = KL\left( P_0^t\{\cdot \mid I^{t-1}\}, \, P_i^t\{\cdot \mid I^{t-1}\} \right).
\]
Note that $KL_i^t(v)$ is indeed a well-defined function of $v$, since conditioned on $I^{t-1}$, the player's action $S_t$ is deterministic. Hence, the randomness in $I^t$ is purely due to the stochastic action of the adversary at time $t$.

Lemma 10. If $KL_i^t(v)$ is independent of $t$, we have
\[
KL(P_0, P_i) = KL_i(i) \, \mathbb{E}_0[T_i] + \sum_{j \ne i} KL_i(j) \, \mathbb{E}_0[T_j]. \tag{18}
\]
If in addition $KL_i(i)$ is independent of $i$, for $1 \le i \le n$, and $KL_i(j)$ is constant for all pairs $i \ne j$, we have
\[
\sum_{i=1}^n KL(P_0, P_i) = KL_i(i) \, T + KL_i(j) (n-1) T. \tag{19}
\]

Proof. Note that equation (19) follows immediately from equation (18) by summing over $i$ and using the fact that $\sum_{i=1}^n \mathbb{E}_0[T_i] = T$. To derive equation (18), we use the chain rule for KL divergence:
\begin{align*}
KL(P_0, P_i) &= \sum_{t=1}^T \sum_{I^{t-1}} P_0\{I^{t-1}\} \, KL\left( P_0^t\{\cdot \mid I^{t-1}\}, \, P_i^t\{\cdot \mid I^{t-1}\} \right) \\
&= \sum_{t=1}^T \sum_{v=1}^n \sum_{I^{t-1} : S_t = \{v\}} P_0\{I^{t-1}\} \, KL_i^t(v) \\
&\stackrel{(a)}{=} \sum_{t=1}^T \sum_{v=1}^n P_0\{S_t = \{v\}\} \, KL_i(v) \\
&= \sum_{t=1}^T P_0\{S_t = \{i\}\} \, KL_i(i) + \sum_{t=1}^T \sum_{j \ne i} P_0\{S_t = \{j\}\} \, KL_i(j),
\end{align*}
where equality (a) uses the assumption that $KL_i^t(v)$ is independent of $t$. Now we simply recognize that
\[
\mathbb{E}_0[T_i] = \mathbb{E}_0\left[ \sum_{t=1}^T \mathbf{1}\{S_t = \{i\}\} \right] = \sum_{t=1}^T P_0\{S_t = \{i\}\}
\]
to obtain the desired equality.

E.2.1  Proof of Lemma 7

Note that $KL_i^t(v)$ is independent of $t$, since the adversary's actions are i.i.d. across time steps. Furthermore, $KL_i(i)$ is clearly independent of $i$ and $KL_i(j)$ is constant for all pairs $i \ne j$, so equation (19) of Lemma 10 holds. We first compute an upper bound for $KL_i(i)$.
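For a non-adaptive deterministic strategy, equation (18) reduces to additivity of KL divergence over independent steps, which can be illustrated with toy per-step feedback distributions. All numbers below are invented for illustration and are not from the paper:

```python
import itertools
import math

# Per-step feedback distributions under A_0 and A_i, indexed by the vertex
# played at that step; the two adversaries agree when vertex 2 is played.
A0 = {1: [0.7, 0.3], 2: [0.5, 0.5]}
Ai = {1: [0.6, 0.4], 2: [0.5, 0.5]}
schedule = [1, 1, 2, 1, 2]   # a fixed (non-adaptive) player strategy

def kl(p, q):
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q))

# KL between the joint feedback distributions over all T steps
joint = 0.0
for outcomes in itertools.product([0, 1], repeat=len(schedule)):
    p0 = math.prod(A0[s][o] for s, o in zip(schedule, outcomes))
    pi = math.prod(Ai[s][o] for s, o in zip(schedule, outcomes))
    joint += p0 * math.log(p0 / pi)
# Equation (18) for this special case: KL(P_0, P_i) = sum_v KL_i(v) * T_v
decomposed = sum(kl(A0[s], Ai[s]) for s in schedule)
assert abs(joint - decomposed) < 1e-12
```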
Let $X$ denote the size of the connected component containing $i$ on a particular time step, based on the edges played by the adversary. Then $KL_i(i) = KL(P_0(X), P_i(X))$, where we abuse notation slightly and write $P_i(X)$ to denote the distribution of $X$ under adversarial strategy $\mathcal{A}_i$. Also let $Y$ be the indicator variable for the event that $i$ is in the clique selected by the adversary. By the chain rule for KL divergence,
\[
KL(P_0(X), P_i(X)) \le KL(P_0(X, Y), P_i(X, Y)).
\]
We will derive an upper bound for the latter quantity. In particular, the range of $(X, Y)$ is $\{(1, 0), (1, 1)\} \cup \{(m, 1) : 2 \le m \le n\}$. This leads to the following expression for $KL(P_0(X, Y), P_i(X, Y))$:
\begin{align*}
& \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1} \log\left( \frac{\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1}}{\frac{c}{n} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1}} \right) \\
&\qquad + \sum_{m=2}^n \binom{n-1}{m-1} \frac{c}{n}(1-\delta) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m} \log\left( \frac{\frac{c}{n}(1-\delta) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m}}{\frac{c}{n} \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m}} \right) \\
&= \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \sum_{m=1}^n \binom{n-1}{m-1} \frac{c}{n}(1-\delta) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m} \log(1-\delta) \\
&= \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \frac{c}{n}(1-\delta) \log(1-\delta).
\end{align*}
Applying the inequality $\log(1 + x) \le x$ twice, we then obtain
\[
KL(P_0(X), P_i(X)) \le \left( 1 - \frac{c}{n}(1-\delta) \right) \frac{\frac{c\delta}{n}}{1 - \frac{c}{n}} - \frac{c}{n}(1-\delta)\delta = \frac{c\delta}{n} \left( \frac{n - c(1-\delta)}{n - c} - (1-\delta) \right) = \frac{c\delta^2}{n - c}. \tag{20}
\]
The computation for $KL_i(j)$ is similar. Let $X$ denote the size of the connected component containing $j$, and let $\mathcal{C}$ denote the clique chosen by the adversary. Define the random variable
\[
Y = \begin{cases} 0, & \text{if } j \notin \mathcal{C}, \\ 1, & \text{if } j \in \mathcal{C} \text{ and } i \notin \mathcal{C}, \\ 2, & \text{if } i, j \in \mathcal{C}. \end{cases}
\]
Again, it suffices to obtain a bound on $KL(P_0(X, Y), P_i(X, Y))$.
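The closed-form expression for $KL(P_0(X, Y), P_i(X, Y))$ and the bound (20) can be verified by enumerating the $(X, Y)$ distribution directly. The parameter values below are illustrative, and this check is not part of the proof:

```python
import math

# Enumerate the (X, Y) distribution for KL_i(i) in the undirected clique model
# and compare against the closed form and the bound (20).
n, c, delta = 8, 3, 0.3
q, p = c * (1 - delta) / n, c / n    # clique-membership probs: generic vertex vs. vertex i
P0 = {(1, 0): 1 - q}                 # i not in the clique
Pi = {(1, 0): 1 - p}
for m in range(1, n + 1):            # i in the clique, component size m
    tail = math.comb(n - 1, m - 1) * q ** (m - 1) * (1 - q) ** (n - m)
    P0[(m, 1)] = q * tail
    Pi[(m, 1)] = p * tail
kl = sum(P0[k] * math.log(P0[k] / Pi[k]) for k in P0)
closed = (1 - q) * math.log((1 - q) / (1 - p)) + q * math.log(1 - delta)
assert abs(kl - closed) < 1e-12
assert kl <= c * delta**2 / (n - c)   # the bound in equation (20)
```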
The range of $(X, Y)$ is $\{(1, 0), (1, 1)\} \cup \{(m, 1) : 2 \le m \le n-1\} \cup \{(m, 2) : 2 \le m \le n\}$. Further note that $P_0(1, 0) = P_i(1, 0)$, so we may ignore this term when computing the KL divergence. We then have the following expression for $KL(P_0(X, Y), P_i(X, Y))$:
\begin{align*}
& \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1} \log\left( \frac{\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-1}}{\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n} \right) \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-2}} \right) \\
&\qquad + \sum_{m=2}^{n-1} \binom{n-2}{m-1} \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right) \left( \frac{c}{n}(1-\delta) \right)^{m-1} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m-1} \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) \\
&\qquad + \sum_{m=2}^{n} \binom{n-2}{m-2} \left( \frac{c}{n}(1-\delta) \right)^2 \left( \frac{c}{n}(1-\delta) \right)^{m-2} \left( 1 - \frac{c}{n}(1-\delta) \right)^{n-m} \log\left( \frac{\frac{c}{n}(1-\delta)}{\frac{c}{n}} \right).
\end{align*}
In the first two terms, each log ratio reduces to $\log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right)$, and the associated probabilities sum to $\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right)$; in the third term, each log ratio equals $\log(1-\delta)$, and the associated probabilities sum to $\left( \frac{c}{n}(1-\delta) \right)^2$. Hence the expression equals
\[
\frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta)}{1 - \frac{c}{n}} \right) + \left( \frac{c}{n}(1-\delta) \right)^2 \log(1-\delta).
\]
We once again use the inequality $\log(1 + x) \le x$ to obtain
\[
KL(P_0(X), P_i(X)) \le \frac{c}{n}(1-\delta) \left( 1 - \frac{c}{n}(1-\delta) \right) \frac{\frac{c\delta}{n}}{1 - \frac{c}{n}} - \left( \frac{c}{n}(1-\delta) \right)^2 \delta = \left( \frac{c}{n} \right)^2 (1-\delta)\delta \left( \frac{n - c(1-\delta)}{n-c} - (1-\delta) \right) = \left( \frac{c}{n} \right)^2 (1-\delta) \delta^2 \frac{n}{n-c}. \tag{21}
\]

Combining inequalities (20) and (21) with equation (19) of Lemma 10, we obtain the bound
\[
\sum_{i=1}^n KL(P_0, P_i) \le \frac{c}{n-c} \delta^2 T + \left( \frac{c}{n} \right)^2 (1-\delta) \delta^2 \frac{n}{n-c} (n-1) T \le \frac{c(c+1)}{n-c} T \delta^2,
\]
completing the proof.

E.2.2  Proof of Lemma 9

Note that $KL_i^t(v)$ is independent of $t$, since the adversary's actions are i.i.d. across time steps. Furthermore, $KL_i(i)$ is clearly independent of $i$ and $KL_i(j)$ is constant for all pairs $i \ne j$, so equation (19) of Lemma 10 holds. Note that when $S_t = \{j\}$, the distribution of the feedback $I_t$ is the same under $P_0^t\{\cdot \mid I^{t-1}\}$ and $P_i^t\{\cdot \mid I^{t-1}\}$, since vertex $i$ is chosen to be a sink vertex with the same probability $\frac{d}{n}$ under both $\mathcal{A}_0$ and $\mathcal{A}_i$. Hence, $KL_i(j) = 0$.

To compute $KL_i(i)$, let $X$ denote the size of the influenced component containing $i$ when $S_t = \{i\}$, and define the random variable
\[
Y = \begin{cases} 0, & \text{if } i \text{ is a sink vertex}, \\ 1, & \text{if } i \text{ is a source vertex}, \\ 2, & \text{otherwise}. \end{cases}
\]
As in the proof of Lemma 7, we will upper-bound $KL(P_0(X, Y), P_i(X, Y))$, leading to an upper bound on $KL(P_0(X), P_i(X))$. The range of $(X, Y)$ is $\{(1, 0), (1, 2)\} \cup \{(m, 1) : 1 \le m \le n\}$, where $(1, 1)$ occurs when $i$ is a source and no vertex is a sink. We then have the following expression for $KL(P_0(X, Y), P_i(X, Y))$:
\begin{align*}
& \frac{d}{n} \log\left( \frac{d/n}{d/n} \right) + \left( 1 - \frac{c}{n}(1-\delta) - \frac{d}{n} \right) \log\left( \frac{1 - \frac{c}{n}(1-\delta) - \frac{d}{n}}{1 - \frac{c}{n} - \frac{d}{n}} \right) \\
&\qquad + \sum_{m=1}^n \binom{n-1}{m-1} \frac{c}{n}(1-\delta) \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m} \log\left( \frac{\frac{c}{n}(1-\delta) \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m}}{\frac{c}{n} \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m}} \right) \\
&= \frac{n - c - d + c\delta}{n} \log\left( \frac{n - c - d + c\delta}{n - c - d} \right) + \frac{c}{n}(1-\delta) \log(1-\delta) \sum_{m=1}^n \binom{n-1}{m-1} \left( \frac{d}{n} \right)^{m-1} \left( 1 - \frac{d}{n} \right)^{n-m} \\
&= \frac{n - c - d + c\delta}{n} \log\left( \frac{n - c - d + c\delta}{n - c - d} \right) + \frac{c}{n}(1-\delta) \log(1-\delta),
\end{align*}
since the first term vanishes and the binomial probabilities sum to 1. Using the inequality $\log(1 + x) \le x$, we then have
\[
KL(P_0(X), P_i(X)) \le \frac{c\delta(n - c - d + c\delta)}{n(n - c - d)} - \frac{c}{n}(1-\delta)\delta = \frac{c(n-d)}{n(n-c-d)} \delta^2.
\]
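The per-step bound for $KL_i(i)$ in the directed model can also be checked by enumerating the $(X, Y)$ distribution. As before, the parameter values are illustrative (with $\frac{c}{n} + \frac{d}{n} \le 1$), and this sketch is not part of the proof:

```python
import math

# Enumerate the (X, Y) distribution for KL_i(i) in the directed source/sink
# model and check the per-step version of Lemma 9's bound.
n, c, d, delta = 6, 1, 2, 0.3
q, p = c * (1 - delta) / n, c / n    # source probs: generic vertex vs. vertex i
s = d / n                            # sink probability
P0 = {(1, 0): s, (1, 2): 1 - q - s}  # i is a sink / neither: component size 1
Pi = {(1, 0): s, (1, 2): 1 - p - s}
for m in range(1, n + 1):            # i is a source influencing m - 1 sinks
    tail = math.comb(n - 1, m - 1) * s ** (m - 1) * (1 - s) ** (n - m)
    P0[(m, 1)] = q * tail
    Pi[(m, 1)] = p * tail
kl = sum(P0[k] * math.log(P0[k] / Pi[k]) for k in P0)
# per-step bound: KL_i(i) <= c (n - d) delta^2 / (n (n - c - d))
assert kl <= c * (n - d) * delta**2 / (n * (n - c - d))
```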
Applying Lemma 10 completes the proof.
