Kronecker Graphs: An Approach to Modeling Networks
Jure Leskovec (JURE@CS.STANFORD.EDU)
Computer Science Department, Stanford University

Deepayan Chakrabarti (DEEPAY@YAHOO-INC.COM)
Yahoo! Research

Jon Kleinberg (KLEINBER@CS.CORNELL.EDU)
Computer Science Department, Cornell University

Christos Faloutsos (CHRISTOS@CS.CMU.EDU)
Computer Science Department, Carnegie Mellon University

Zoubin Ghahramani (ZOUBIN@ENG.CAM.AC.UK)
Department of Engineering, University of Cambridge
Machine Learning Department, Carnegie Mellon University

Abstract

How can we generate realistic networks? In addition, how can we do so with a mathematically tractable model that allows for rigorous analysis of network properties? Real networks exhibit a long list of surprising properties: heavy tails for the in- and out-degree distribution, heavy tails for the eigenvalues and eigenvectors, small diameters, and densification and shrinking diameters over time. Current network models and generators either fail to match several of the above properties, are complicated to analyze mathematically, or both. Here we propose a generative model for networks that is both mathematically tractable and can generate networks that have all the above mentioned structural properties. Our main idea is to use a non-standard matrix operation, the Kronecker product, to generate graphs which we refer to as "Kronecker graphs". First, we show that Kronecker graphs naturally obey common network properties. In fact, we rigorously prove that they do so. We also provide empirical evidence showing that Kronecker graphs can effectively model the structure of real networks.
We then present KRONFIT, a fast and scalable algorithm for fitting the Kronecker graph generation model to large real networks. A naive approach to fitting would take super-exponential time. In contrast, KRONFIT takes linear time, by exploiting the structure of Kronecker matrix multiplication and by using statistical simulation techniques. Experiments on a wide range of large real and synthetic networks show that KRONFIT finds accurate parameters that very well mimic the properties of target networks. In fact, using just four parameters we can accurately model several aspects of global network structure. Once fitted, the model parameters can be used to gain insights about the network structure, and the resulting synthetic graphs can be used for null-models, anonymization, extrapolations, and graph summarization.

Keywords: Kronecker graphs, Network analysis, Network models, Social networks, Graph generators, Graph mining, Network evolution

1. Introduction

What do real graphs look like? How do they evolve over time? How can we generate synthetic, but realistic looking, time-evolving graphs? Recently, network analysis has been attracting much interest, with an emphasis on finding patterns and abnormalities in social networks, computer networks, e-mail interactions, gene regulatory networks, and many more. Most of the work focuses on static snapshots of graphs, where fascinating "laws" have been discovered, including small diameters and heavy-tailed degree distributions. In parallel with discoveries of such structural "laws" there has been effort to find mechanisms and models of network formation that generate networks with such structures. So, a good realistic network generation model is important for at least two reasons.
The first is that it can generate graphs for extrapolations, hypothesis testing, "what-if" scenarios, and simulations, when real graphs are difficult or impossible to collect. For example, how well will a given protocol run on the Internet five years from now? Accurate network models can produce more realistic models for the future Internet, on which simulations can be run. The second reason is more subtle. It forces us to think about network properties that generative models should obey to be realistic.

In this paper we introduce Kronecker graphs, a generative network model which obeys all the main static network patterns that have appeared in the literature (Faloutsos et al., 1999; Albert et al., 1999; Chakrabarti et al., 2004; Farkas et al., 2001; Mihail and Papadimitriou, 2002; Watts and Strogatz, 1998). Our model also obeys recently discovered temporal evolution patterns (Leskovec et al., 2005b, 2007a). And, contrary to other models that match this combination of network properties (as for example, (Bu and Towsley, 2002; Klemm and Eguíluz, 2002; Vázquez, 2003; Leskovec et al., 2005b; Zheleva et al., 2009)), Kronecker graphs also lead to tractable analysis and rigorous proofs. Furthermore, the Kronecker graphs generative process also has a nice natural interpretation and justification.

Our model is based on a matrix operation, the Kronecker product. There are several known theorems on Kronecker products. They correspond exactly to a significant portion of what we want to prove: heavy-tailed distributions for in-degree, out-degree, eigenvalues, and eigenvectors. We also demonstrate how Kronecker graphs can match the behavior of several real networks (social networks, citations, web, internet, and others). While Kronecker products have been studied by the algebraic combinatorics community (see, e.g.
, (Chow, 1997; Imrich, 1998; Imrich and Klavžar, 2000; Hammack, 2009)), the present work is the first to employ this operation in the design of network models to match real data.

Then we also make a step further and tackle the following problem: Given a large real network, we want to generate a synthetic graph, so that the resulting synthetic graph matches the properties of the real network as well as possible. Ideally we would like:

(a) A graph generation model that naturally produces networks where many properties that are also found in real networks naturally emerge.

(b) The model parameter estimation should be fast and scalable, so that we can handle networks with millions of nodes.

(c) The resulting set of parameters should generate realistic-looking networks that match the statistical properties of the target, real networks.

In general the problem of modeling network structure presents several conceptual and engineering challenges: Which generative model should we choose, among the many in the literature? How do we measure the goodness of the fit? (Least squares don't work well for power laws, for subtle reasons!) If we use likelihood, how do we estimate it faster than in time quadratic in the number of nodes? How do we solve the node correspondence problem, i.e., which node of the real network corresponds to what node of the synthetic one?

To answer the above questions we present KRONFIT, a fast and scalable algorithm for fitting Kronecker graphs by using the maximum likelihood principle. When calculating the likelihood there are two challenges: First, one needs to solve the node correspondence problem by matching the nodes of the real and the synthetic network.
Essentially, one has to consider all mappings of nodes of the network to the rows and columns of the graph adjacency matrix. This becomes intractable for graphs with more than tens of nodes. Even when given the "true" node correspondences, just evaluating the likelihood is still prohibitively expensive for the large graphs that we consider, as one needs to evaluate the probability of each possible edge. We present solutions to both of these problems: We develop a Metropolis sampling algorithm for sampling node correspondences, and approximate the likelihood to obtain a linear time algorithm for Kronecker graph model parameter estimation that scales to large networks with millions of nodes and edges. KRONFIT gives orders of magnitude speed-ups against older methods (20 minutes on a commodity PC, versus 2 days on a 50-machine cluster).

Our extensive experiments on synthetic and real networks show that Kronecker graphs can efficiently model statistical properties of networks, like degree distribution and diameter, while using only four parameters.

Once the model is fitted to the real network, there are several benefits and applications:

(a) Network structure: the parameters give us insight into the global structure of the network itself.

(b) Null-model: when working with network data we would often like to assess the significance or the extent to which a certain network property is expressed. We can use the Kronecker graph as an accurate null-model.

(c) Simulations: given an algorithm working on a graph, we would like to evaluate how its performance depends on various properties of the network. Using our model one can generate graphs that exhibit various combinations of such properties, and then evaluate the algorithm.

(d) Extrapolations: we can use the model to generate a larger graph, to help us understand what the network will look like in the future.
(e) Sampling: conversely, we can also generate a smaller graph, which may be useful for running simulation experiments (e.g., simulating routing algorithms in computer networks, or virus/worm propagation algorithms), when these algorithms may be too slow to run on large graphs.

(f) Graph similarity: to compare the similarity of the structure of different networks (even of different sizes) one can use the differences in estimated parameters as a similarity measure.

(g) Graph visualization and compression: we can compress the graph, by storing just the model parameters and the deviations between the real and the synthetic graph. Similarly, for visualization purposes one can use the structure of the parameter matrix to visualize the backbone of the network, and then display the edges that deviate from the backbone structure.

(h) Anonymization: suppose that the real graph cannot be publicized, like, e.g., a corporate e-mail network or customer-product sales in a recommendation system. Yet, we would like to share our network. Our work gives ways to generate such a realistic, 'similar' network.

The current paper builds on our previous work on Kronecker graphs (Leskovec et al., 2005a; Leskovec and Faloutsos, 2007) and is organized as follows: Section 2 briefly surveys the related literature. In Section 3 we introduce the Kronecker graph model, and give formal statements about the properties of networks it generates. We investigate the model using simulation in Section 4 and continue by introducing KRONFIT, the Kronecker graphs parameter estimation algorithm, in Section 5. We present experimental results on a wide range of real and synthetic networks in Section 6. We close with discussion and conclusions in Sections 7 and 8.

2. Relation to previous work on network modeling

Networks across a wide range of domains present surprising regularities, such as power laws, small diameters, communities, and so on. We use these patterns as sanity checks; that is, our synthetic graphs should match those properties of the real target graph.

Most of the related work in this field has concentrated on two aspects: properties and patterns found in real-world networks, and then ways to find models to build understanding about the emergence of these properties. First, we will discuss the commonly found patterns in (static and temporally evolving) graphs, and finally, the state of the art in graph generation methods.

2.1 Graph Patterns

Here we briefly introduce the network patterns (also referred to as properties or statistics) that we will later use to compare the similarity between the real networks and their synthetic counterparts produced by the Kronecker graphs model. While many patterns have been discovered, two of the principal ones are heavy-tailed degree distributions and small diameters.

Degree distribution: The degree distribution of a graph is a power law if the number of nodes N_d with degree d is given by N_d ∝ d^{-γ} (γ > 0), where γ is called the power law exponent. Power laws have been found in the Internet (Faloutsos et al., 1999), the Web (Kleinberg et al., 1999; Broder et al., 2000), citation graphs (Redner, 1998), online social networks (Chakrabarti et al., 2004) and many others.

Small diameter: Most real-world graphs exhibit relatively small diameter (the "small-world" phenomenon, or "six degrees of separation" (Milgram, 1967)): A graph has diameter D if every pair of nodes can be connected by a path of length at most D edges. The diameter D is susceptible to outliers.
Thus, a more robust measure of the pairwise distances between nodes in a graph is the integer effective diameter (Tauro et al., 2001), which is the minimum number of links (steps/hops) in which some fraction (or quantile q, say q = 0.9) of all connected pairs of nodes can reach each other. Here we make use of the effective diameter, which we define as follows (Leskovec et al., 2005b). For each natural number h, let g(h) denote the fraction of connected node pairs whose shortest connecting path has length at most h, i.e., at most h hops away. We then consider a function defined over all positive real numbers x by linearly interpolating between the points (h, g(h)) and (h + 1, g(h + 1)) for each x, where h = ⌊x⌋, and we define the effective diameter of the network to be the value x at which the function g(x) achieves the value 0.9. The effective diameter has been found to be small for large real-world graphs, like the Internet, Web, and online social networks (Albert and Barabási, 2002; Milgram, 1967; Leskovec et al., 2005b).

Hop-plot: It extends the notion of diameter by plotting the number of reachable pairs g(h) within h hops, as a function of the number of hops h (Palmer et al., 2002). It gives us a sense of how quickly nodes' neighborhoods expand with the number of hops.

Scree plot: This is a plot of the eigenvalues (or singular values) of the graph adjacency matrix, versus their rank, using a logarithmic scale. The scree plot is also often found to approximately obey a power law (Chakrabarti et al., 2004; Farkas et al., 2001). Moreover, this pattern was also found analytically for random power law graphs (Mihail and Papadimitriou, 2002; Chung et al., 2003).
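To make the interpolated definition concrete, here is a minimal stdlib-only sketch (our own illustration, not code from the paper) that computes g(h) by breadth-first search over an adjacency-list graph and interpolates to the quantile q = 0.9:

```python
from collections import deque

def hop_counts(adj):
    """BFS from every node; return {h: number of ordered connected
    node pairs whose shortest path is exactly h hops}, for h >= 1."""
    counts = {}
    for s in adj:
        dist = {s: 0}
        q = deque([s])
        while q:
            u = q.popleft()
            for v in adj[u]:
                if v not in dist:
                    dist[v] = dist[u] + 1
                    q.append(v)
        for d in dist.values():
            if d > 0:
                counts[d] = counts.get(d, 0) + 1
    return counts

def effective_diameter(adj, q=0.9):
    """Smallest interpolated x with g(x) = q, where g(h) is the fraction
    of connected ordered pairs reachable within at most h hops."""
    counts = hop_counts(adj)
    total = sum(counts.values())
    g, cum = {0: 0.0}, 0
    for h in sorted(counts):
        cum += counts[h]
        g[h] = cum / total
    hs = sorted(g)
    for lo, hi in zip(hs, hs[1:]):
        if g[hi] >= q:  # interpolate between (lo, g(lo)) and (hi, g(hi))
            return lo + (q - g[lo]) / (g[hi] - g[lo]) * (hi - lo)
    return float(hs[-1])

# 4-node path 0-1-2-3: g(1) = 6/12, g(2) = 10/12, g(3) = 1
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
```

On the 4-node path the routine returns approximately 2.4: 90% of connected pairs are within 2.4 interpolated hops.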
Network values: The distribution of eigenvector components (indicators of "network value") associated with the largest eigenvalue of the graph adjacency matrix has also been found to be skewed (Chakrabarti et al., 2004).

Node triangle participation: Edges in real-world networks, and especially in social networks, tend to cluster (Watts and Strogatz, 1998) and form triads of connected nodes. Node triangle participation is a measure of transitivity in networks. It counts the number of triangles a node participates in, i.e., the number of connections between the neighbors of a node. The plot of the number of triangles ∆ versus the number of nodes that participate in ∆ triangles has also been found to be skewed (Tsourakakis, 2008).

Densification power law: The relation between the number of edges E(t) and the number of nodes N(t) in an evolving network at time t obeys the densification power law (DPL), which states that E(t) ∝ N(t)^a. The densification exponent a is typically greater than 1, implying that the average degree of a node in the network is increasing over time (as the network gains more nodes and edges). This means that real networks tend to sprout many more edges than nodes, and thus densify as they grow (Leskovec et al., 2005b, 2007a).

Shrinking diameter: The effective diameter of graphs tends to shrink or stabilize as the number of nodes in a network grows over time (Leskovec et al., 2005b, 2007a). This is somewhat counterintuitive, since from common experience one would expect that as the volume of the object (a graph) grows, the size (i.e., the diameter) would also grow. But for real networks this does not hold, as the diameter shrinks and then seems to stabilize as the network grows.
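The densification exponent a can be read off as the slope of log E(t) versus log N(t). A tiny stdlib sketch with hypothetical snapshot values (assumed here purely for illustration, generated to obey the DPL exactly):

```python
import math

# Hypothetical growth snapshots (N(t), E(t)) obeying E(t) = N(t)**1.2
snapshots = [(10, 10 ** 1.2), (100, 100 ** 1.2), (1000, 1000 ** 1.2)]

# Slope of log E vs. log N between consecutive snapshots recovers a
slopes = [
    (math.log(e2) - math.log(e1)) / (math.log(n2) - math.log(n1))
    for (n1, e1), (n2, e2) in zip(snapshots, snapshots[1:])
]
```

Since a > 1 here, the average degree E(t)/N(t) = N(t)^{a-1} grows with each snapshot, which is exactly the densification behavior described above.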
2.2 Generative models of network structure

The earliest probabilistic generative model for graphs was the Erdős-Rényi (Erdős and Rényi, 1960) random graph model, where each pair of nodes has an identical, independent probability of being joined by an edge. The study of this model has led to a rich mathematical theory. However, as the model was not developed to model real-world networks, it produces graphs that fail to match real networks in a number of respects (for example, it does not produce heavy-tailed degree distributions).

The vast majority of recent network models involve some form of preferential attachment (Barabási and Albert, 1999; Albert and Barabási, 2002; Winick and Jamin, 2002; Kleinberg et al., 1999; Kumar et al., 1999; Flaxman et al., 2007) that employs a simple rule: a new node joins the graph at each time step, and then creates a connection to an existing node u with probability proportional to the degree of the node u. This leads to the "rich get richer" phenomenon and to power law tails in the degree distribution. However, the diameter in this model grows slowly with the number of nodes N, which violates the "shrinking diameter" property mentioned above.

There are also many variations of the preferential attachment model, all somehow employing the "rich get richer" type mechanism, e.g., the "copying model" (Kumar et al., 2000), the "winner does not take all" model (Pennock et al., 2002), the "forest fire" model (Leskovec et al., 2005b), the "random surfer model" (Blum et al., 2006), etc.

A different family of network methods strives for small diameter and local clustering in networks. Examples of such models include the small-world model (Watts and Strogatz, 1998) and the Waxman generator (Waxman, 1988).
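The attachment rule just described can be sketched in a few lines (a minimal illustration, not any specific published generator; the repeated-endpoint list makes degree-proportional sampling a constant-time choice):

```python
import random

def preferential_attachment(n, seed=0):
    """Grow an undirected graph: each new node links to one existing
    node chosen with probability proportional to its current degree."""
    rng = random.Random(seed)
    edges = [(0, 1)]     # start from a single edge
    chances = [0, 1]     # each node appears once per unit of degree
    for new in range(2, n):
        u = rng.choice(chances)   # degree-proportional choice
        edges.append((new, u))
        chances += [new, u]       # both endpoints gained one degree
    return edges
```

Nodes that already have high degree appear more often in `chances`, so they attract new links more often: the "rich get richer" mechanism.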
Another family of models shows that heavy tails emerge if nodes try to optimize their connectivity under resource constraints (Carlson and Doyle, 1999; Fabrikant et al., 2002).

In summary, most current models focus on modeling only one (static) network property, and neglect the others. In addition, it is usually hard to analytically analyze properties of the network model. On the other hand, the Kronecker graph model we describe in the next section addresses these issues, as it matches multiple properties of real networks at the same time, while being analytically tractable and lending itself to rigorous analysis.

2.3 Parameter estimation of network models

Until recently relatively little effort was made to fit the above network models to real data. One of the difficulties is that most of the above models usually define a mechanism or a principle by which a network is constructed, and thus parameter estimation is either trivial or almost impossible.

Most work in estimating network models comes from the areas of social sciences, statistics and social network analysis, where the exponential random graphs, also known as the p* model, were introduced (Wasserman and Pattison, 1996). The model essentially defines a log-linear model over all possible graphs G, p(G|θ) ∝ exp(θ^T s(G)), where G is a graph, and s is a set of functions that can be viewed as summary statistics for the structural features of the network. The p* model usually focuses on "local" structural features of networks (like, e.g., characteristics of nodes that determine the presence of an edge, link reciprocity, etc.). While exponential random graphs have been very useful for modeling small networks, and individual nodes and edges, our goal here is different in the sense that we aim to accurately model the structure of the network as a whole.
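For a toy instance of the log-linear form p(G|θ) ∝ exp(θ^T s(G)), take s(G) to be two illustrative summary statistics, the edge count and the triangle count (our own choice for this sketch; the p* literature uses many other statistics):

```python
import math
from itertools import combinations

def pstar_score(edges, n, theta):
    """Unnormalized exponential-random-graph score exp(theta . s(G)),
    with s(G) = (number of edges, number of triangles)."""
    es = {frozenset(e) for e in edges}
    tri = sum(
        1 for a, b, c in combinations(range(n), 3)
        if {frozenset((a, b)), frozenset((b, c)), frozenset((a, c))} <= es
    )
    s = (len(es), tri)
    return math.exp(sum(t * x for t, x in zip(theta, s)))
```

Comparing the scores of two graphs under the same θ gives their relative probability, since the (intractable) normalizing constant over all graphs cancels; that normalizing constant is one reason estimation becomes expensive for large networks.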
Moreover, we aim to model and estimate parameters of networks with millions of nodes, while even for graphs of small size (> 100 nodes) the number of model parameters in exponential random graphs usually becomes too large, and estimation prohibitively expensive, both in terms of computational time and memory.

Regardless of the particular choice of a network model, a common theme when estimating the likelihood P(G) of a graph G under some model is the challenge of finding the correspondence between the nodes of the true network and its synthetic counterpart. The node correspondence problem results in the factorially many possible matchings of nodes. One can think of the correspondence problem as a test of graph isomorphism. Two isomorphic graphs G and G' with differently assigned node IDs should have the same likelihood, P(G) = P(G'), so we aim to find an accurate mapping between the nodes of the two graphs.

An ordering or a permutation defines the mapping of nodes in one network to nodes in the other network. For example, Butts (Butts, 2005) used permutation sampling to determine similarity between two graph adjacency matrices, while Bezáková et al.
(Bezáková et al., 2006) used permutations for graph model selection. Recently, an approach for estimating parameters of the "copying" model was introduced (Wiuf et al., 2006); however, the authors also note that the class of "copying" models may not be rich enough to accurately model real networks. As we show later, the Kronecker graph model seems to have the necessary expressive power to mimic real networks well.

Symbol              Description
G                   Real network
N                   Number of nodes in G
E                   Number of edges in G
K                   Kronecker graph (synthetic estimate of G)
K_1                 Initiator of a Kronecker graph
N_1                 Number of nodes in initiator K_1
E_1                 Number of edges in K_1 (the expected number of edges in P_1, E_1 = Σ θ_ij)
G ⊗ H               Kronecker product of adjacency matrices of graphs G and H
K_1^[k] = K_k = K   k-th Kronecker power of K_1
K_1[i,j]            Entry at row i and column j of K_1
Θ = P_1             Stochastic Kronecker initiator
P_1^[k] = P_k = P   k-th Kronecker power of P_1
θ_ij = P_1[i,j]     Entry at row i and column j of P_1
p_ij = P_k[i,j]     Probability of an edge (i,j) in P_k, i.e., entry at row i and column j of P_k
K = R(P)            Realization of a Stochastic Kronecker graph
l(Θ)                Log-likelihood. Log-prob. that Θ generated real graph G, log P(G|Θ)
Θ̂                   Parameters at maximum likelihood, Θ̂ = argmax_Θ P(G|Θ)
σ                   Permutation that maps node IDs of G to those of P
a                   Densification power law exponent, E(t) ∝ N(t)^a
D                   Diameter of a graph
N_c                 Number of nodes in the largest weakly connected component of a graph
ω                   Proportion of times SwapNodes permutation proposal distribution is used

Table 1: Table of symbols.

3. Kronecker graph model

The Kronecker graph model we propose here is based on a recursive construction.
Defining the recursion properly is somewhat subtle, as a number of standard, related graph construction methods fail to produce graphs that densify according to the patterns observed in real networks, and they also produce graphs whose diameters increase. To produce densifying graphs with constant/shrinking diameter, and thereby match the qualitative behavior of a real network, we develop a procedure that is best described in terms of the Kronecker product of matrices.

3.1 Main idea

The main intuition behind the model is to create self-similar graphs, recursively. We begin with an initiator graph K_1, with N_1 nodes and E_1 edges, and by recursion we produce successively larger graphs K_2, K_3, ..., such that the k-th graph K_k is on N_k = N_1^k nodes. If we want these graphs to exhibit a version of the densification power law (Leskovec et al., 2005b), then K_k should have E_k = E_1^k edges. This is a property that requires some care in order to get right, as standard recursive constructions (for example, the traditional Cartesian product or the construction of (Barabási et al., 2001)) do not yield graphs satisfying the densification power law.

It turns out that the Kronecker product of two matrices is the right tool for this goal. The Kronecker product is defined as follows:

Definition 1 (Kronecker product of matrices) Given two matrices A = [a_{i,j}] and B of sizes n × m and n' × m' respectively, the Kronecker product matrix C of dimensions (n · n') × (m · m') is given by

    C = A ⊗ B =
        | a_{1,1} B   a_{1,2} B   ...   a_{1,m} B |
        | a_{2,1} B   a_{2,2} B   ...   a_{2,m} B |
        |    ...         ...      ...      ...    |
        | a_{n,1} B   a_{n,2} B   ...   a_{n,m} B |        (1)

We then define the Kronecker product of two graphs simply as the Kronecker product of their corresponding adjacency matrices.
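Equation (1) is the same operation that numpy's `np.kron` computes, so the definition can be tried directly (a small illustration with arbitrary example matrices):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 1],
              [1, 0]])

# Block (i, j) of C equals a_{i,j} * B, exactly as in Equation (1)
C = np.kron(A, B)
```

Here C has shape (2·2) × (2·2) = 4 × 4, and, for instance, its top-right 2 × 2 block equals 2·B.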
Definition 2 (Kronecker product of graphs (Weichsel, 1962)) If G and H are graphs with adjacency matrices A(G) and A(H) respectively, then the Kronecker product G ⊗ H is defined as the graph with adjacency matrix A(G) ⊗ A(H).

Observation 1 (Edges in Kronecker-multiplied graphs)

    Edge (X_{ij}, X_{kl}) ∈ G ⊗ H  iff  (X_i, X_k) ∈ G and (X_j, X_l) ∈ H,

where X_{ij} and X_{kl} are nodes in G ⊗ H, and X_i, X_j, X_k and X_l are the corresponding nodes in G and H, as in Figure 1.

The last observation is crucial, and deserves elaboration. Basically, each node in G ⊗ H can be represented as an ordered pair X_{ij}, with i a node of G and j a node of H, and with an edge joining X_{ij} and X_{kl} precisely when (X_i, X_k) is an edge of G and (X_j, X_l) is an edge of H. This is a direct consequence of the hierarchical nature of the Kronecker product. Figure 1(a-c) further illustrates this by showing the recursive construction of G ⊗ H, when G = H is a 3-node chain. Consider node X_{1,2} in Figure 1(c): It belongs to the H graph that replaced node X_1 (see Figure 1(b)), and in fact is the X_2 node (i.e., the center) within this small H-graph.

We propose to produce a growing sequence of matrices by iterating the Kronecker product:

Definition 3 (Kronecker power) The k-th power of K_1 is defined as the matrix K_1^[k] (abbreviated to K_k), such that:

    K_1^[k] = K_k = K_1 ⊗ K_1 ⊗ ... ⊗ K_1  (k times) = K_{k-1} ⊗ K_1

[Figure 1: Example of Kronecker multiplication. Top: a "3-chain" initiator graph and its Kronecker product with itself; each of the X_i nodes gets expanded into 3 nodes, which are then linked using Observation 1. Bottom row: the corresponding adjacency matrices. See Figure 2 for adjacency matrices of K_3 and K_4.]

Definition 4 (Kronecker graph) A Kronecker graph of order k is defined by the adjacency matrix K_1^[k], where K_1 is the Kronecker initiator adjacency matrix.
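Iterating Definition 3 and counting matrix entries shows the densification property N_k = N_1^k and E_k = E_1^k directly. A short numpy sketch, using the "3-chain with self-loops" initiator shown in Figure 1(d):

```python
import numpy as np

def kronecker_power(K1, k):
    """K_k = K_1 (x) K_1 (x) ... (x) K_1, with k factors (Definition 3)."""
    K = K1.copy()
    for _ in range(k - 1):
        K = np.kron(K, K1)
    return K

K1 = np.array([[1, 1, 0],     # Figure 1(d): 3-chain plus self-loops
               [1, 1, 1],
               [0, 1, 1]])
K3 = kronecker_power(K1, 3)
```

With N_1 = 3 nodes and E_1 = 7 non-zero entries (counting self-loops), K_3 has 3^3 = 27 nodes and 7^3 = 343 non-zero entries, matching the densification power law.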
The self-similar nature of the Kronecker graph product is clear: To produce K_k from K_{k-1}, we "expand" (replace) each node of K_{k-1} by converting it into a copy of K_1, and we join these copies together according to the adjacencies in K_{k-1} (see Figure 1). This process is very natural: one can imagine it as positing that communities within the graph grow recursively, with nodes in the community recursively getting expanded into miniature copies of the community. Nodes in the sub-community then link among themselves and also to nodes from other communities.

Note that there are many different names to refer to the Kronecker product of graphs. Other names for the Kronecker product are tensor product, categorical product, direct product, cardinal product, relational product, conjunction, weak direct product or just product, and even Cartesian product (Imrich and Klavžar, 2000).

3.2 Analysis of Kronecker graphs

We shall now discuss the properties of Kronecker graphs, specifically, their degree distributions, diameters, eigenvalues, eigenvectors, and time-evolution. Our ability to prove analytical results about all of these properties is a major advantage of Kronecker graphs over other network models.

[Figure 2: Adjacency matrices of K_3 (27 × 27) and K_4 (81 × 81), the 3rd and 4th Kronecker powers of the K_1 matrix defined in Figure 1. Dots represent non-zero matrix entries, and white space represents zeros. Notice the recursive self-similar structure of the adjacency matrix.]

3.2.1 Degree distribution

The next few theorems prove that several distributions of interest are multinomial for our Kronecker graph model.
This is important, because a careful choice of the initiator graph K_1 makes the resulting multinomial distribution behave like a power law or Discrete Gaussian Exponential (DGX) distribution (Bi et al., 2001; Clauset et al., 2007).

Theorem 5 (Multinomial degree distribution) Kronecker graphs have multinomial degree distributions, for both in- and out-degrees.

Proof Let the initiator K_1 have the degree sequence d_1, d_2, ..., d_{N_1}. Kronecker multiplication of a node with degree d expands it into N_1 nodes, with the corresponding degrees being d × d_1, d × d_2, ..., d × d_{N_1}. After Kronecker powering, the degree of each node in graph K_k is of the form d_{i_1} × d_{i_2} × ... × d_{i_k}, with i_1, i_2, ..., i_k ∈ (1 ... N_1), and there is one node for each ordered combination. This gives us the multinomial distribution on the degrees of K_k. So, graph K_k will have a multinomial degree distribution where the "events" (degrees) of the distribution will be combinations of degree products, d_1^{i_1} d_2^{i_2} ... d_{N_1}^{i_{N_1}} (where Σ_{j=1}^{N_1} i_j = k), and the event (degree) probabilities will be proportional to the multinomial coefficients (k choose i_1, i_2, ..., i_{N_1}). Note also that this is equivalent to noticing that the degrees of nodes in K_k can be expressed as the k-th Kronecker power of the vector (d_1, d_2, ..., d_{N_1}).

[Figure 3: Two examples of Kronecker initiators on 4 nodes and the self-similar adjacency matrices they produce.]

3.2.2 Spectral properties

Next we analyze the spectral properties of the adjacency matrix of a Kronecker graph. We show that both the distribution of eigenvalues and the distribution of component values of eigenvectors of the graph adjacency matrix follow multinomial distributions.
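The closing remark of Theorem 5's proof — that the degrees of K_k are the k-th Kronecker power of the initiator's degree vector — is easy to verify numerically (a sketch using the Figure 1 initiator):

```python
import numpy as np

K1 = np.array([[1, 1, 0],
               [1, 1, 1],
               [0, 1, 1]])
d1 = K1.sum(axis=1)      # initiator degree sequence: (2, 3, 2)
K2 = np.kron(K1, K1)
d2 = K2.sum(axis=1)      # degree sequence of K_2

# Row sums of a Kronecker product are the Kronecker product of the row
# sums, so node (i, j) of K_2 has degree d1[i] * d1[j].
```

The resulting degrees of K_2 are exactly the products d1[i]·d1[j], one per ordered pair of initiator nodes, as the proof states.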
Theorem 6 (Multinomial eigenvalue distribution) The Kronecker graph K_k has a multinomial distribution for its eigenvalues.

Proof Let K_1 have the eigenvalues λ_1, λ_2, ..., λ_{N_1}. By properties of the Kronecker multiplication (Loan, 2000; Langville and Stewart, 2004), the eigenvalues of K_k are the k-th Kronecker power of the vector of eigenvalues of the initiator matrix, (λ_1, λ_2, ..., λ_{N_1})^{[k]}. As in Theorem 5, the eigenvalue distribution is a multinomial.

A similar argument using properties of Kronecker matrix multiplication shows the following.

Theorem 7 (Multinomial eigenvector distribution) The components of each eigenvector of the Kronecker graph K_k follow a multinomial distribution.

Proof Let K_1 have the eigenvectors v⃗_1, v⃗_2, ..., v⃗_{N_1}. By properties of the Kronecker multiplication (Loan, 2000; Langville and Stewart, 2004), the eigenvectors of K_k are given by the k-th Kronecker power of the vector (v⃗_1, v⃗_2, ..., v⃗_{N_1}), which gives a multinomial distribution for the components of each eigenvector in K_k.

We have just covered several of the static graph patterns. Notice that the proofs were direct consequences of the Kronecker multiplication properties.

3.2.3 CONNECTIVITY OF KRONECKER GRAPHS

We now present a series of results on the connectivity of Kronecker graphs. We show, maybe a bit surprisingly, that even if a Kronecker initiator graph is connected, its Kronecker power can in fact be disconnected.

Lemma 8 If at least one of G and H is a disconnected graph, then G ⊗ H is also disconnected.

Proof Without loss of generality we can assume that G has two connected components, while H is connected.
Figure 4(a) illustrates the corresponding adjacency matrix for G. Using the notation from Observation 1, let graph G have nodes X_1, ..., X_n, where nodes {X_1, ..., X_r} and {X_{r+1}, ..., X_n} form the two connected components. Now, note that (X_{ij}, X_{kl}) ∉ G ⊗ H for i ∈ {1, ..., r}, k ∈ {r+1, ..., n}, and all j, l. This follows directly from Observation 1, as (X_i, X_k) are not edges in G. Thus, G ⊗ H must have at least two connected components.

Actually it turns out that both G and H can be connected while G ⊗ H is disconnected. The following theorem analyzes this case.

Theorem 9 If both G and H are connected but bipartite, then G ⊗ H is disconnected, and each of the two connected components is again bipartite.

Proof Again without loss of generality let G be bipartite with two partitions A = {X_1, ..., X_r} and B = {X_{r+1}, ..., X_n}, where edges exist only between the partitions and no edges exist inside a partition: (X_i, X_k) ∉ G for i, k ∈ A or i, k ∈ B. Similarly, let H also be bipartite with two partitions C = {X_1, ..., X_s} and D = {X_{s+1}, ..., X_m}. Figures 4(b) and 4(c) illustrate the structure of the corresponding adjacency matrices.

Now, there will be two connected components in G ⊗ H: the 1st component will be composed of nodes {X_{ij}} ∈ G ⊗ H where (i ∈ A, j ∈ D) or (i ∈ B, j ∈ C), and similarly the 2nd component will be composed of nodes {X_{ij}} where (i ∈ A, j ∈ C) or (i ∈ B, j ∈ D). Basically, there exist edges between node sets (A, D) and (B, C), and similarly between (A, C) and (B, D), but not across the sets. To see this we have to analyze the cases using Observation 1. For example, in G ⊗ H there exist edges between nodes (A, C) and (B, D), as there exist edges (i, k) ∈ G for i ∈ A, k ∈ B, and (j, l) ∈ H for j ∈ C and l ∈ D.
A similar argument holds for nodes (A, D) and (B, C). However, there are no edges across the two sets, e.g., nodes from (A, D) do not link to (A, C), as there are no edges between nodes in A (since G is bipartite). See Figures 4(d) and 4(e) for a visual proof.

Note that bipartite graphs are triangle-free and have no self-loops. Stars, chains, trees and cycles of even length are all examples of bipartite graphs. In order to ensure that K_k is connected, for the remainder of the paper we focus on initiator graphs K_1 with self-loops on all of the vertices.

Figure 4: Graph adjacency matrices. Dark parts represent connected (filled with ones) and white parts represent empty (filled with zeros) parts of the adjacency matrix. (a) When G is disconnected, Kronecker multiplication with any matrix H will result in G ⊗ H being disconnected. (b) Adjacency matrix of a connected bipartite graph G with node partitions A and B. (c) Adjacency matrix of a connected bipartite graph H with node partitions C and D. (d) Kronecker product of the two bipartite graphs G and H. (e) After rearranging the adjacency matrix of G ⊗ H we clearly see that the resulting graph is disconnected.

3.2.4 TEMPORAL PROPERTIES OF KRONECKER GRAPHS

We continue with the analysis of temporal patterns of evolution of Kronecker graphs: the densification power law, and shrinking/stabilizing diameter (Leskovec et al., 2005b, 2007a).

Theorem 10 (Densification power law) Kronecker graphs follow the densification power law (DPL) with densification exponent a = log(E_1)/log(N_1).
Proof Since the k-th Kronecker power K_k has N_k = N_1^k nodes and E_k = E_1^k edges, it satisfies E_k = N_k^a, where a = log(E_1)/log(N_1). The crucial point is that this exponent a is independent of k, and hence the sequence of Kronecker powers follows an exact version of the densification power law.

We now show how the Kronecker product also preserves the property of constant diameter, a crucial ingredient for matching the diameter properties of many real-world network datasets. In order to establish this, we will assume that the initiator graph K_1 has a self-loop on every node. Otherwise, its Kronecker powers may be disconnected.

Lemma 11 If G and H each have diameter at most D and each has a self-loop on every node, then the Kronecker graph G ⊗ H also has diameter at most D.

Proof Each node in G ⊗ H can be represented as an ordered pair (v, w), with v a node of G and w a node of H, and with an edge joining (v, w) and (x, y) precisely when (v, x) is an edge of G and (w, y) is an edge of H. (Note that this is exactly Observation 1.) Now, for an arbitrary pair of nodes (v, w) and (v′, w′), we must show that there is a path of length at most D connecting them. Since G has diameter at most D, there is a path v = v_1, v_2, ..., v_r = v′, where r ≤ D. If r < D, we can convert this into a path v = v_1, v_2, ..., v_D = v′ of length exactly D, by simply repeating v′ at the end D − r times. By an analogous argument, we have a path w = w_1, w_2, ..., w_D = w′. Now by the definition of the Kronecker product, there is an edge joining (v_i, w_i) and (v_{i+1}, w_{i+1}) for all 1 ≤ i ≤ D − 1, and so (v, w) = (v_1, w_1), (v_2, w_2), ...
, (v_D, w_D) = (v′, w′) is a path of length D connecting (v, w) to (v′, w′), as required.

Theorem 12 If K_1 has diameter D and a self-loop on every node, then for every k, the graph K_k also has diameter D.

Proof This follows directly from the previous lemma, combined with induction on k.

As defined in Section 2 we also consider the effective diameter D*. We defined the q-effective diameter as the minimum D* such that, for a q fraction of the reachable node pairs, the path length is at most D*. The q-effective diameter is a more robust quantity than the diameter, the latter being prone to the effects of degenerate structures in the graph (e.g., very long chains). However, the q-effective diameter and diameter tend to exhibit qualitatively similar behavior. For reporting results in subsequent sections, we will generally consider the q-effective diameter with q = 0.9, and refer to this simply as the effective diameter.

Theorem 13 (Effective diameter) If K_1 has diameter D and a self-loop on every node, then for every q, the q-effective diameter of K_k converges to D (from below) as k increases.

Proof To prove this, it is sufficient to show that for two randomly selected nodes of K_k, the probability that their distance is D converges to 1 as k goes to infinity. We establish this as follows.

Each node in K_k can be represented as an ordered sequence of k nodes from K_1, and we can view the random selection of a node in K_k as a sequence of k independent random node selections from K_1. Suppose that v = (v_1, ..., v_k) and w = (w_1, ..., w_k) are two such randomly selected nodes from K_k. Now, if x and y are two nodes in K_1 at distance D (such a pair (x, y) exists since K_1 has diameter D), then with probability 1 − (1 − 1/N_1^2)^k there is some index j for which {v_j, w_j} = {x, y}.
If there is such an index, then the distance between v and w is D. As the expression 1 − (1 − 1/N_1^2)^k converges to 1 as k increases, it follows that the q-effective diameter converges to D.

Figure 5: The "staircase" effect. (a) Kronecker initiator K_1; (b) degree distribution of K_6 (the 6th Kronecker power of K_1); (c) network value of K_6. Notice the non-smoothness of the curves.

3.3 Stochastic Kronecker graphs

While the Kronecker power construction discussed so far yields graphs with a range of desired properties, its discrete nature produces "staircase effects" in the degrees and spectral quantities, simply because individual values have large multiplicities. For example, the degree distribution, the distribution of eigenvalues of the graph adjacency matrix, and the distribution of the principal eigenvector components (i.e., the "network" value) are all impacted by this. These quantities are multinomially distributed, which leads to individual values with large multiplicities. Figure 5 illustrates the staircase effect.

Here we propose a stochastic version of Kronecker graphs that eliminates this effect. There are many possible ways one could introduce stochasticity into the Kronecker graph model. Before introducing the proposed model, we consider two simple ways of adding randomness to Kronecker graphs and describe why they do not work.

Probably the simplest (but wrong) idea is to generate a large deterministic Kronecker graph K_k, and then uniformly at random flip some edges, i.e.
, uniformly at random select entries of the graph adjacency matrix and flip them (1 → 0, 0 → 1). However, this will not work, as it would essentially superimpose an Erdős–Rényi random graph, which would, for example, corrupt the degree distribution: real networks usually have heavy-tailed degree distributions, while random graphs have binomial degree distributions.

A second idea could be to allow a weighted initiator matrix, i.e., the entries of K_1 are not restricted to values {0, 1} but rather can be any non-negative real number. Using such a K_1 one would generate K_k and then threshold the K_k matrix to obtain a binary adjacency matrix K, i.e., for a chosen value of ε set K[i, j] = 1 if K_k[i, j] > ε, else K[i, j] = 0. This also would not work, as the mechanism would selectively remove edges, and thus the low-degree nodes, which would have low-weight edges, would get isolated first.

Now we define the Stochastic Kronecker graph model, which overcomes the above issues. A more natural way to introduce stochasticity to Kronecker graphs is to relax the assumption that entries of the initiator matrix take only binary values. Instead, we allow entries of the initiator to take values on the interval [0, 1]. This means that each entry of the initiator matrix now encodes the probability of that particular edge appearing. We then Kronecker-power such an initiator matrix to obtain a large stochastic adjacency matrix, where again each entry of the large matrix gives the probability of that particular edge appearing in a big graph. Such a stochastic adjacency matrix defines a probability distribution over all graphs.
To obtain a graph we simply sample an instance from this distribution by sampling individual edges, where each edge appears independently with probability given by the corresponding entry of the large stochastic adjacency matrix. More formally, we define:

Definition 14 (Stochastic Kronecker graph) Let P_1 be an N_1 × N_1 probability matrix: the value θ_ij ∈ P_1 denotes the probability that edge (i, j) is present, θ_ij ∈ [0, 1]. Then the k-th Kronecker power P_1^[k] = P_k, where each entry p_uv ∈ P_k encodes the probability of an edge (u, v). To obtain a graph, an instance (or realization) K = R(P_k), we include edge (u, v) in K with probability p_uv, p_uv ∈ P_k.

First, note that the sum of the entries of P_1, Σ_ij θ_ij, can be greater than 1. Second, notice that in principle it takes O(N_1^{2k}) time to generate an instance K of a Stochastic Kronecker graph from the probability matrix P_k. This means the time to get a realization K is quadratic in the size of P_k, as one has to flip a coin for each possible edge in the graph. Later we show how to generate Stochastic Kronecker graphs much faster, in time linear in the expected number of edges in P_k.

3.3.1 PROBABILITY OF AN EDGE

For the size of graphs we aim to model and generate here, taking P_1 (or K_1) and then explicitly performing the Kronecker product of the initiator matrix is infeasible. The reason for this is that P_1 is usually dense, so P_k is also dense and one cannot explicitly store it in memory to directly iterate the Kronecker product. However, due to the structure of Kronecker multiplication one can easily compute the probability of an edge in P_k.
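A minimal sketch of this computation, reading off one entry of P_1 per Kronecker level from the N_1-ary digits of the node indices (0-based indices are used here for convenience; the paper's notation below is 1-based, and the 2 × 2 initiator values are assumed examples):

```python
def edge_probability(P1, u, v, k):
    """Probability of edge (u, v) in the k-th Kronecker power of P1,
    computed in O(k) time without materializing the large matrix.
    u, v are 0-based node indices in [0, N1**k)."""
    N1 = len(P1)
    p = 1.0
    for _ in range(k):
        p *= P1[u % N1][v % N1]  # entry of P1 selected at this level
        u //= N1
        v //= N1
    return p

# Assumed 2x2 example initiator.
P1 = [[0.9, 0.5],
      [0.5, 0.1]]
# Node 1 has digit sequence (0, 1), node 2 has (1, 0), so the
# probability is P1[1][0] * P1[0][1] = 0.5 * 0.5 = 0.25.
p = edge_probability(P1, u=1, v=2, k=2)
```

The product is the same regardless of the order in which levels are visited, so iterating over the least-significant digits first is equivalent to the top-down recursive descent.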
The probability p_uv of an edge (u, v) occurring in the k-th Kronecker power P = P_k can be calculated in O(k) time as follows:

  p_uv = Π_{i=0}^{k−1} P_1[ ⌊(u−1)/N_1^i⌋ (mod N_1) + 1 , ⌊(v−1)/N_1^i⌋ (mod N_1) + 1 ]    (2)

The equation imitates recursive descent into the matrix P, where at every level i the appropriate entry of P_1 is chosen. Since P has N_1^k rows and columns, it takes O(k log N_1) to evaluate the equation. Refer to Figure 6 for an illustration of the recursive structure of P.

3.4 Additional properties of Kronecker graphs

Stochastic Kronecker graphs with an initiator matrix of size N_1 = 2 were studied by Mahdian and Xu (2007). The authors showed a phase transition for the emergence of the giant component and another phase transition for connectivity, and proved that such graphs have constant diameters beyond the connectivity threshold, but are not searchable using a decentralized algorithm (Kleinberg, 1999).

A general overview of the Kronecker product is given in (Imrich and Klavžar, 2000), and properties of Kronecker graphs related to graph minors, planarity, cut vertices and cut edges have been explored in (Bottreau and Metivier, 1998). Moreover, recently Tsourakakis (2008) gave a closed-form expression for the number of triangles in a Kronecker graph that depends on the eigenvalues of the initiator graph K_1.

Figure 6: Stochastic Kronecker initiator P_1 and the corresponding 2nd Kronecker power P_2. (a) 2 × 2 Stochastic Kronecker initiator P_1; (b) probability matrix P_2 = P_1 ⊗ P_1; (c) alternative view of P_2 = P_1 ⊗ P_1. Notice the recursive nature of the Kronecker product, with edge probabilities in P_2 simply being products of entries of P_1.
3.5 Two interpretations of Kronecker graphs

Next, we present two natural interpretations of the generative process behind Kronecker graphs that go beyond the purely mathematical construction of Kronecker graphs as introduced so far.

We already mentioned the first interpretation when we first defined Kronecker graphs. The intuition is that networks are hierarchically organized into communities (clusters). Communities then grow recursively, creating miniature copies of themselves. Figure 1 depicts the process of the recursive community expansion. In fact, several researchers have argued that real networks are hierarchically organized (Ravasz et al., 2002; Ravasz and Barabási, 2003), and algorithms to extract the network hierarchical structure have also been developed (Sales-Pardo et al., 2007; Clauset et al., 2008). Moreover, web graphs (Dill et al., 2002; Dorogovtsev et al., 2002; Crovella and Bestavros, 1997) and biological networks (Ravasz and Barabási, 2003) in particular were found to be self-similar and "fractal".

The second intuition comes from viewing every node of P_k as being described by an ordered sequence of k nodes from P_1. (This is similar to Observation 1 and the proof of Theorem 13.) Let's label the nodes of the initiator matrix P_1 as u_1, ..., u_{N_1}, and the nodes of P_k as v_1, ..., v_{N_1^k}. Then every node v_i of P_k is described by a sequence (v_i(1), ..., v_i(k)) of node labels of P_1, where v_i(l) ∈ {u_1, ..., u_{N_1}}. Similarly, consider also a second node v_j with the label sequence (v_j(1), ..., v_j(k)). Then the probability p_e of an edge (v_i, v_j) in P_k is exactly:

  p_e(v_i, v_j) = P_k[v_i, v_j] = Π_{l=1}^{k} P_1[v_i(l), v_j(l)]

(Note that this is exactly Equation 2.)
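Under this view the same probability can be computed directly from the label sequences; a minimal sketch (the label sequences and the 2 × 2 initiator are assumed examples):

```python
def edge_prob_from_attributes(P1, seq_i, seq_j):
    """Probability of an edge between nodes described by attribute
    sequences (v_i(1),...,v_i(k)) and (v_j(1),...,v_j(k)): the product
    of the entries of P1 selected coordinate-wise by the two sequences."""
    p = 1.0
    for a, b in zip(seq_i, seq_j):
        p *= P1[a][b]
    return p

# Assumed 2x2 example initiator; large diagonal entries model homophily.
P1 = [[0.9, 0.5],
      [0.5, 0.1]]
# Nodes agreeing on every attribute link with high probability:
assert edge_prob_from_attributes(P1, (0, 0), (0, 0)) == 0.9 * 0.9
# Nodes disagreeing on both attributes pick up off-diagonal entries:
assert edge_prob_from_attributes(P1, (0, 1), (1, 0)) == 0.5 * 0.5
```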
Now one can look at the description sequence of node v_i as a k-dimensional vector of attribute values (v_i(1), ..., v_i(k)). Then p_e(v_i, v_j) is exactly the coordinate-wise product of the appropriate entries of P_1, where the node description sequences select which entries of P_1 to multiply. Thus, the P_1 matrix can be thought of as an attribute similarity matrix, i.e., it encodes the probability of linking given that two nodes agree/disagree on an attribute value. The probability of an edge is then simply a product of individual attribute similarities over the k N_1-valued attributes that describe each of the two nodes.

This gives us a very natural interpretation of Stochastic Kronecker graphs: each node is described by a sequence of categorical attribute values or features, and the probability of two nodes linking depends on the product of individual attribute similarities. This way Kronecker graphs can effectively model homophily (nodes with similar attribute values are more likely to link) by P_1 having high-value entries on the diagonal, or heterophily (nodes that differ are more likely to link) by P_1 having high entries off the diagonal.

Figure 6 shows an example. Let's label the nodes of P_1 as u_1, u_2, as in Figure 6(a). Then every node of P_k is described by an ordered sequence of k binary attributes. For example, Figure 6(b) shows an instance for k = 2 where node v_2 of P_2 is described by (u_1, u_2), and similarly v_3 by (u_2, u_1).
Then, as shown in Figure 6(b), the probability of edge p_e(v_2, v_3) = b · c, which is exactly P_1[u_2, u_1] · P_1[u_1, u_2] = b · c, i.e., the product of entries of P_1, where the corresponding elements of the descriptions of nodes v_2 and v_3 act as selectors of which entries of P_1 to multiply.

Figure 6(c) further illustrates the recursive nature of Kronecker graphs. One can see the Kronecker product as a recursive descent into the big adjacency matrix, where at each stage one of the entries or blocks is chosen. For example, to get to entry (v_2, v_3) one first needs to descend into quadrant b, followed by quadrant c. This intuition will help us in Section 3.6 to devise a fast algorithm for generating Kronecker graphs.

However, there are also two notes to make here. First, using a single initiator P_1 we are implicitly assuming that there is one single and universal attribute similarity matrix that holds across all k N_1-ary attributes. One can easily relax this assumption by taking a different initiator matrix for each attribute (initiator matrices can even be of different sizes, as attributes are of different arity), and then Kronecker-multiplying them to obtain a large network. Here each initiator matrix plays the role of the attribute similarity matrix for that particular attribute. For simplicity and convenience we will work with a single initiator matrix, but all our methods can be trivially extended to handle multiple initiator matrices. Moreover, as we will see later in Section 6, even a single 2 × 2 initiator matrix seems to be enough to capture large-scale statistical properties of real-world networks.

The second assumption is harder to relax.
When describing every node v_i by a sequence of attribute values we are implicitly assuming that the values of all attributes are uniformly distributed (have the same proportions), and that every node has a unique combination of attribute values. So, all possible combinations of attribute values are taken. For example, node v_1 in a large matrix P_k has attribute sequence (u_1, u_1, ..., u_1), and v_{N_1} has (u_1, u_1, ..., u_1, u_{N_1}), while the "last" node v_{N_1^k} has attribute values (u_{N_1}, u_{N_1}, ..., u_{N_1}). One can think of this as counting in an N_1-ary number system, where node attribute descriptions range from 0 (i.e., the "leftmost" node with attribute description (u_1, u_1, ..., u_1)) to N_1^k − 1 (i.e., the "rightmost" node with attribute description (u_{N_1}, u_{N_1}, ..., u_{N_1})).

A simple way to relax the above assumption is to take a larger initiator matrix with a smaller number of parameters than the number of entries. This means that multiple entries of P_1 will share the same value (parameter). For example, if attribute u_1 takes one value 66% of the time and the other value 33% of the time, then one can model this by taking a 3 × 3 initiator matrix with only four parameters. Adopting the naming convention of Figure 6, this means that parameter a now occupies a 2 × 2 block, which then also makes b and c occupy 2 × 1 and 1 × 2 blocks, and d a single cell. This way one gets a four-parameter model with an uneven feature value distribution.

We note that the view of Kronecker graphs where every node is described by a set of features and the initiator matrix encodes the probability of linking given the attribute values of two nodes somewhat resembles the Random dot product graph model (Young and Scheinerman, 2007; Nickel, 2008).
The important difference is that here we multiply individual linking probabilities, while in Random dot product graphs one takes the sum of individual probabilities, which seems somewhat less natural.

3.6 Fast generation of Stochastic Kronecker graphs

The intuition for fast generation of Stochastic Kronecker graphs comes from the recursive nature of the Kronecker product and is closely related to the R-MAT graph generator (Chakrabarti et al., 2004). Generating a Stochastic Kronecker graph K on N nodes naively takes O(N^2) time. Here we present a linear-time O(E) algorithm, where E is the (expected) number of edges in K.

Figure 6(c) shows the recursive nature of the Kronecker product. To "arrive" at a particular edge (v_i, v_j) of P_k one has to make a sequence of k (in our case k = 2) decisions among the entries of P_1, multiply the chosen entries of P_1, and then place the edge (v_i, v_j) with the obtained probability. Instead of flipping O(N^2) = O(N_1^{2k}) biased coins to determine the edges, we can place E edges by directly simulating the recursion of the Kronecker product. Basically, we recursively choose sub-regions of matrix K with probability proportional to θ_ij, θ_ij ∈ P_1, until in k steps we descend to a single cell of the big adjacency matrix K and place an edge. For example, for (v_2, v_3) in Figure 6(c) we first have to choose b, followed by c.

The occurrence of each individual edge of P_k follows a Bernoulli distribution, as the edge occurrences are independent. By the Central Limit Theorem (Petrov, 1995) the number of edges in P_k tends to a normal distribution with mean (Σ_{i,j=1}^{N_1} θ_ij)^k = E_1^k, where θ_ij ∈ P_1. So, given a stochastic initiator matrix P_1, we first sample the expected number of edges E in P_k.
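A minimal sketch of the resulting generator (a simplified version: here the edge count E is deterministically rounded rather than sampled, duplicate edges are resampled rather than reinserted, and the 2 × 2 initiator values are assumed examples):

```python
import random

def sample_kronecker_edges(P1, k, rng=random.Random(0)):
    """Place E = E1^k edges by k-step recursive descent: at each step pick
    an entry (i, j) of P1 with probability theta_ij / E1."""
    N1 = len(P1)
    E1 = sum(sum(row) for row in P1)
    cells = [(i, j, P1[i][j] / E1) for i in range(N1) for j in range(N1)]
    E = round(E1 ** k)              # expected number of edges
    edges = set()
    while len(edges) < E:
        u = v = 0
        for _ in range(k):          # descend k levels into the big matrix
            r, acc = rng.random(), 0.0
            for i, j, w in cells:
                acc += w
                if r <= acc:
                    break
            u, v = u * N1 + i, v * N1 + j
        edges.add((u, v))           # duplicates: resample until E distinct
    return edges

P1 = [[0.8, 0.6],
      [0.6, 0.2]]                   # assumed initiator, E1 = 2.2
edges = sample_kronecker_edges(P1, k=5)  # graph on 2^5 = 32 nodes
```

The work per edge is O(k N_1^2), so the total cost is linear in the number of edges placed rather than quadratic in the number of nodes.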
Then we place E edges in a graph K by applying the recursive descent for k steps, where at each step we choose entry (i, j) with probability θ_ij/E_1, where E_1 = Σ_ij θ_ij. Since we add E = E_1^k edges, the probability that edge (v_i, v_j) appears in K is exactly P_k[v_i, v_j]. This basically means that in Stochastic Kronecker graphs the initiator matrix encodes both the total number of edges in a graph and their structure. Σ θ_ij encodes the number of edges in the graph, while the proportions (ratios) of the values θ_ij define how many edges each part of the graph adjacency matrix will contain.

In practice it can happen that more than one edge lands in the same (v_i, v_j) entry of the big adjacency matrix K. If an edge lands in an already occupied cell we insert it again. Even though the values of P_1 are usually skewed, the adjacency matrices of real networks are so sparse that this is not really a problem in practice. Empirically we note that around 1% of edges collide.

3.7 Observations and connections

Next, we describe several observations about the properties of Kronecker graphs and make connections to other network models.

• Bipartite graphs: Kronecker graphs can naturally model bipartite graphs. Instead of starting with a square N_1 × N_1 initiator matrix, one can choose an arbitrary N_1 × M_1 initiator matrix, where rows define the "left" side and columns the "right" side of the bipartite graph. Kronecker multiplication will then generate bipartite graphs with partition sizes N_1^k and M_1^k.

• Graph distributions: P_k defines a distribution over all graphs, as it encodes the probability of each of the N_1^{2k} possible edges appearing in a graph by using an exponentially smaller number of parameters (just N_1^2). As we will later see, even a very small number of parameters, e.g.
, 4 (a 2 × 2 initiator matrix) or 9 (a 3 × 3 initiator), is enough to accurately model the structure of large networks.

• Extension of the Erdős–Rényi random graph model: Stochastic Kronecker graphs represent an extension of Erdős–Rényi (Erdős and Rényi, 1960) random graphs. If one takes P_1 = [θ_ij], where every θ_ij = p, then we obtain exactly the Erdős–Rényi model of random graphs G_{n,p}, where every edge appears independently with probability p.

• Relation to the R-MAT model: The recursive nature of Stochastic Kronecker graphs makes them related to the R-MAT generator (Chakrabarti et al., 2004). The difference between the two models is that in R-MAT one needs to separately specify the number of edges, while in Stochastic Kronecker graphs the initiator matrix P_1 also encodes the number of edges in the graph. Section 3.6 built on this similarity to devise a fast algorithm for generating Stochastic Kronecker graphs.

• Densification: As with deterministic Kronecker graphs, the number of nodes in a Stochastic Kronecker graph grows as N_1^k, and the expected number of edges grows as (Σ_ij θ_ij)^k. This means one would want to choose the values θ_ij of the initiator matrix P_1 so that Σ_ij θ_ij > N_1 in order for the resulting network to densify.

4. Simulations of Kronecker graphs

Next we perform a set of simulation experiments to demonstrate the ability of Kronecker graphs to match the patterns of real-world networks. We will tackle the problem of estimating the Kronecker graph model from real data, i.e., finding the most likely initiator P_1, in the next section. Instead, here we present simulation experiments using Kronecker graphs to explore the parameter space, and to compare properties of Kronecker graphs to those found in large real networks.

4.1 Comparison to real graphs

We observe two kinds of graph patterns: "static" and "temporal.
" As mentioned earlier, common static patterns include the degree distribution, the scree plot (eigenvalues of the graph adjacency matrix vs. rank) and the distribution of components of the principal eigenvector of the graph adjacency matrix. Temporal patterns include the diameter over time, and the densification power law. For the diameter computation, we use the effective diameter as defined in Section 2.

For the purpose of this section consider the following setting. Given a real graph G, we want to find a Kronecker initiator that produces a qualitatively similar graph. In principle one could try choosing each of the N_1^2 parameters for the matrix P_1 separately. However, we reduce the number of parameters from N_1^2 to just two: α and β. Let K_1 be the initiator matrix (binary, deterministic). Then we create the corresponding stochastic initiator matrix P_1 by replacing each "1" and "0" of K_1 with α and β respectively (β ≤ α). The resulting probability matrices maintain, with some random noise, the self-similar structure of the Kronecker graphs in the previous section (which, for clarity, we call deterministic Kronecker graphs). We defer the discussion of how to automatically estimate P_1 from data G to the next section.

The datasets we use here are:

• CIT-HEP-TH: This is a citation graph for High-Energy Physics Theory research papers from the pre-print archive ArXiv, with a total of N = 29,555 papers and E = 352,807 citations (Gehrke et al., 2003). We follow its evolution from January 1993 to April 2003, with one data-point per month.

• AS-ROUTEVIEWS: We also analyze a static dataset consisting of a single snapshot of connectivity among Internet Autonomous Systems (RouteViews, 1997) from January 2000, with N = 6,474 and E = 26,467.
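The α/β parameterization described above can be sketched as follows (the 4-node chain-with-self-loops initiator and the particular α, β values are assumed examples):

```python
def make_stochastic_initiator(K1, alpha, beta):
    """Replace each 1 of the binary initiator with alpha and each 0 with
    beta (beta <= alpha), giving the stochastic initiator P1."""
    return [[alpha if x else beta for x in row] for row in K1]

# Assumed example: 4-node chain with self-loops on every node.
K1 = [[1, 1, 0, 0],
      [1, 1, 1, 0],
      [0, 1, 1, 1],
      [0, 0, 1, 1]]
P1 = make_stochastic_initiator(K1, alpha=0.54, beta=0.05)
```

This reduces the search over N_1^2 entries to a search over the two scalars α and β while preserving the self-similar placement of large and small entries.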
Results are shown in Figure 7 for the CIT-HEP-TH graph, which evolves over time. We show the plots of two static and two temporal patterns. We see that the deterministic Kronecker model already to some degree captures the qualitative structure of the degree and eigenvalue distributions, as well as the temporal patterns represented by the densification power law and the stabilizing diameter. However, the deterministic nature of this model results in “staircase” behavior, as shown in the scree plot for the deterministic Kronecker graph in Figure 7 (column (b), second row). We see that the Stochastic Kronecker graphs smooth out these distributions, further matching the qualitative structure of the real data, and they also match the shrinking-before-stabilization trend of the diameters of real graphs.

Similarly, Figure 8 shows plots for the static patterns in the Autonomous Systems (AS-ROUTEVIEWS) graph. Recall that we analyze a single, static network snapshot in this case. In addition to the degree distribution and scree plot, we also show two typical plots (Chakrabarti et al., 2004): the distribution of network values (principal eigenvector components, sorted, versus rank) and the hop-plot (the number of reachable pairs g(h) within h hops or less, as a function of the number of hops h). Notice that, again, the Stochastic Kronecker graph matches well the properties of the real graph.

4.2 Parameter space of Kronecker graphs

Last we present simulation experiments that investigate the parameter space of Stochastic Kronecker graphs.

First, in Figure 9 we show the ability of Kronecker graphs to generate networks with increasing, constant and decreasing/stabilizing effective diameter.
We start with a 4-node chain initiator graph (shown in the top row of Figure 3), setting each “1” of K_1 to α and each “0” to β = 0 to obtain P_1, which we then use to generate a growing sequence of graphs. We plot the effective diameter of each R(P_k) as we generate a sequence of growing graphs R(P_2), R(P_3), ..., R(P_10). R(P_10) has exactly 1,048,576 nodes. Notice that the Stochastic Kronecker graph is a very flexible model. When the generated graph is very sparse (low value of α) we obtain graphs with slowly increasing effective diameter (Figure 9(a)). For intermediate values of α we get graphs with constant diameter (Figure 9(b)), which in our case also slowly densify, with densification exponent a = 1.05. Last, we see an example of a graph with shrinking/stabilizing effective diameter. Here we set α = 0.54, which results in a densification exponent of a = 1.2. Note that these observations do not contradict Theorem 11; indeed, these simulations agree well with the analysis of Mahdian and Xu (2007).

Next, we examine the parameter space of Stochastic Kronecker graphs where we choose a star on 4 nodes as the initiator graph, parameterized with α and β as before. The initiator graph and the structure of the corresponding (deterministic) Kronecker graph adjacency matrix are shown in the top row of Figure 3.
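The counts quoted above are easy to check from the growth laws N = N_1^k and E[edges] = (Σ_ij θ_ij)^k. A small sketch; note that the "10 ones" assumed for the chain initiator (4 self-loops plus 6 directed chain edges) is our illustrative guess, not a value taken from the text:

```python
import math

N1 = 4            # 4-node chain initiator
k = 10
nodes = N1 ** k
print(nodes)      # 1048576, matching R(P_10) above

# Expected edge count grows as (sum of theta_ij)^k, so the implied
# densification exponent is a = log(sum theta) / log(N1).
# Hypothetical: a chain with self-loops has 10 ones, each set to alpha.
alpha = 0.54
theta_sum = 10 * alpha
a = math.log(theta_sum) / math.log(N1)
print(round(a, 2))
```

Under this guessed initiator structure the implied exponent comes out close to the a = 1.2 reported above for α = 0.54, but the exact value depends on how self-loops are handled in K_1.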
Figure 7: Citation network (CIT-HEP-TH): Patterns from the real graph (top row), the deterministic Kronecker graph with K_1 being a star graph on 4 nodes (center + 3 satellites) (middle row), and the Stochastic Kronecker graph (α = 0.41, β = 0.11; bottom row). Static patterns: (a) is the PDF of degrees in the graph (log-log scale), and (b) the distribution of eigenvalues (log-log scale). Temporal patterns: (c) gives the effective diameter over time (linear-linear scale), and (d) is the number of edges versus the number of nodes over time (log-log scale). Notice that the Stochastic Kronecker graph qualitatively matches all the patterns very well.

Figure 10(a) shows the sharp transition in the fraction of the number of nodes that belong to the largest weakly connected component as we fix β = 0.15 and slowly increase α. Such phase transitions on the size of the largest connected component also occur in Erdős-Rényi random graphs.
Figure 10(b) further explores this by plotting the fraction of nodes in the largest connected component (N_c/N) over the full parameter space. Notice the sharp transition between disconnected (white area) and connected graphs (dark). Last, Figure 10(c) shows the effective diameter over the parameter space (α, β) for the 4-node star initiator graph. Notice that when the parameter values are small, the effective diameter is small, since the graph is disconnected and not many pairs of nodes can be reached. The shape of the transition between low and high diameter closely follows the shape of the emergence of the connected component. Similarly, when the parameter values are large, the graph is very dense, and the diameter is small. There is a narrow band in parameter space where we get graphs with interesting diameters.

Figure 8: Autonomous systems (AS-ROUTEVIEWS): Real (top) versus Kronecker (bottom). Columns (a) and (b) show the degree distribution and the scree plot, as before. Columns (c) and (d) show two more static patterns, the “network value” distribution and the “hop-plot” (see text). Notice that, again, the Stochastic Kronecker graph matches well the properties of the real graph.
Figure 9: Effective diameter over time for a 4-node chain initiator graph: (a) increasing diameter (α = 0.38, β = 0), (b) constant diameter (α = 0.43, β = 0), and (c) decreasing diameter (α = 0.54, β = 0). After each consecutive Kronecker power we measure the effective diameter. We use different settings of the α parameter, α = 0.38, 0.43, 0.54, and β = 0, respectively.

5. Kronecker graph model estimation

In previous sections we investigated various properties of networks generated by the (Stochastic) Kronecker graph model. Many of these properties were also observed in real networks. Moreover, we also gave closed-form expressions (parametric forms) for the values of these statistical network properties, which allow us to calculate a property (e.g., diameter, eigenvalue spectrum) of a network directly from just the initiator matrix. So in principle, one could invert these equations and go directly from a property (e.g., the shape of the degree distribution) to the values of the initiator matrix.

Figure 10: Fraction of nodes in the largest weakly connected component (N_c/N) and the effective diameter for the 4-node star initiator graph. (a) We fix β = 0.15 and vary α. (b) We vary both α and β.
(c) Effective diameter of the network: if the network is disconnected or very dense, path lengths are short; the diameter is large when the network is barely connected.

However, in previous sections we did not say anything about how various network properties of a Kronecker graph correlate and interdepend. For example, it could be the case that two network properties are mutually exclusive: perhaps one could only match the network diameter but not the degree distribution, or vice versa. However, as we show later, this is not the case.

Now we turn our attention to automatically estimating the Kronecker initiator graph. The setting is that we are given a real network G and would like to find a Stochastic Kronecker initiator P_1 that produces a synthetic Kronecker graph K that is “similar” to G. One way to measure similarity is to compare statistical network properties, like diameter and degree distribution, of graphs G and K.

Comparing statistical properties already suggests a very direct approach to this problem: one could first identify the set of network properties (statistics) to match, then define a quality-of-fit metric and somehow optimize over it. For example, one could use the KL divergence (Kullback and Leibler, 1951), or the sum of squared differences between the degree distribution of the real network G and its synthetic counterpart K. Moreover, as we are interested in matching several such statistics between the networks, one would have to meaningfully combine these individual error metrics into a global error metric. So, one would have to specify what kind of properties he or she cares about and then combine them accordingly. This would be a hard task, as the patterns of interest have very different magnitudes and scales.
Moreover, as new network patterns are discovered, the error functions would have to be changed and the models re-estimated. And even then it is not clear how to define the optimization procedure to maximize the quality of fit and how to perform optimization over the parameter space.

Our approach here is different. Instead of committing to a set of network properties ahead of time, we try to directly match the adjacency matrices of the real network G and its synthetic counterpart K. The idea is that if the adjacency matrices are similar then the global statistical properties (statistics computed over K and G) will also match. Moreover, by directly working with the graph itself (and not summary statistics), we do not commit to any particular set of network statistics (network properties/patterns), and as new statistical properties of networks are discovered our models and estimated parameters will still hold.

5.1 Preliminaries

Stochastic graph models induce probability distributions over graphs. A generative model assigns a probability P(G) to every graph G. P(G) is the likelihood that a given model (with a given set of parameters) generates the graph G. We concentrate on the Stochastic Kronecker graph model, and consider fitting it to a real graph G, our data. We use the maximum likelihood approach, i.e., we aim to find parameter values, the initiator P_1, that maximize P(G) under the Stochastic Kronecker graph model.

This presents several challenges:

• Model selection: a graph is a single structure, and not a set of items drawn independently and identically distributed (i.i.d.) from some distribution. So one cannot split it into independent training and test sets. The fitted parameters will thus be best to generate a particular instance of a graph.
Also, overfitting could be an issue, since a more complex model generally fits better.

• Node correspondence: The second challenge is the node correspondence or node labeling problem. The graph G has a set of N nodes, and each node has a unique label (index, ID). Labels do not carry any particular meaning; they just uniquely denote or identify the nodes. One can think of this as the graph being first generated and then the labels (node IDs) being randomly assigned. This means that two isomorphic graphs that have different node labels should have the same likelihood. A permutation σ is sufficient to describe the node correspondences, as it maps labels (IDs) to nodes of the graph. To compute the likelihood P(G) one has to consider all node correspondences, P(G) = Σ_σ P(G | σ) P(σ), where the sum is over all N! permutations σ of N nodes. Calculating this super-exponential sum explicitly is infeasible for any graph with more than a handful of nodes. Intuitively, one can think of this summation as a kind of graph isomorphism test, where we are searching for the best correspondence (mapping) between the nodes of G and P.

• Likelihood estimation: Even if we assume one can efficiently solve the node correspondence problem, calculating P(G | σ) naively takes O(N^2) time, as one has to evaluate the probability of each of the N^2 possible edges in the graph adjacency matrix. Again, for graphs of the size we want to model here, approaches with quadratic complexity are infeasible.

To develop our solution we use sampling to avoid the super-exponential sum over the node correspondences. By exploiting the structure of the Kronecker matrix multiplication we develop an algorithm to evaluate P(G | σ) in linear time O(E). Since real graphs are sparse, i.e.
, the number of edges is roughly of the same order as the number of nodes, this makes fitting Kronecker graphs to large networks feasible.

5.2 Problem formulation

Suppose we are given a graph G on N = N_1^k nodes (for some positive integer k), and an N_1 × N_1 Stochastic Kronecker graph initiator matrix P_1. Here P_1 is a parameter matrix, a set of parameters that we aim to estimate. For now also assume that N_1, the size of the initiator matrix, is given. Later we will show how to select it automatically. Next, using P_1 we create a Stochastic Kronecker graph probability matrix P_k, where every entry p_uv of P_k contains the probability that node u links to node v. We then evaluate the probability that G is a realization of P_k. The task is to find the P_1 that has the highest probability of realizing (generating) G. Formally, we are solving:

  arg max_{P_1} P(G | P_1)    (3)

To keep the notation simpler we use the standard symbol Θ to denote the parameter matrix P_1 that we are trying to estimate. We denote the entries of Θ = P_1 = [θ_ij], and similarly we denote P = P_k = [p_ij]. Note that here we slightly simplified the notation: we use Θ to refer to P_1, and θ_ij are elements of Θ. Similarly, p_ij are elements of P (≡ P_k). Moreover, we denote K = R(P), i.e., K is a realization of the Stochastic Kronecker graph sampled from the probabilistic adjacency matrix P.

As noted before, the node IDs are assigned arbitrarily and carry no significant information, which means that we have to consider all the mappings of nodes from G to rows and columns of the stochastic adjacency matrix P. A priori all labelings are equally likely. A permutation σ of the set {1, ..., N} defines this mapping of nodes from G to the stochastic adjacency matrix P.
To evaluate the likelihood of G one needs to consider all possible mappings of the N nodes of G to rows (columns) of P. For convenience we work with the log-likelihood l(Θ), and solve Θ̂ = arg max_Θ l(Θ), where l(Θ) is defined as:

  l(Θ) = log P(G | Θ) = log Σ_σ P(G | Θ, σ) P(σ | Θ) = log Σ_σ P(G | Θ, σ) P(σ)    (4)

The likelihood that a given initiator matrix Θ and permutation σ gave rise to the real graph G, P(G | Θ, σ), is calculated naturally as follows. First, using Θ we create the Stochastic Kronecker graph adjacency matrix P = P_k = Θ^[k]. The permutation σ defines the mapping of nodes of G to the rows and columns of the stochastic adjacency matrix P. (See Figure 11 for an illustration.)

We then model edges as independent Bernoulli random variables parameterized by the parameter matrix Θ. So, each entry p_uv of P gives exactly the probability of edge (u, v) appearing. We then define the likelihood:

  P(G | P, σ) = Π_{(u,v) ∈ G} P[σ_u, σ_v] · Π_{(u,v) ∉ G} (1 − P[σ_u, σ_v]),    (5)

where we denote by σ_i the i-th element of the permutation σ, and P[i, j] is the element at row i and column j of matrix P = Θ^[k].

The likelihood is defined very naturally. We traverse the entries of the adjacency matrix G and, based on whether a particular edge appeared in G or not, we take the probability of that edge occurring (or not) as given by P, and multiply these probabilities. As one has to touch all the entries of the stochastic adjacency matrix P, evaluating Equation 5 takes O(N^2) time.

We further illustrate the process of estimating the Stochastic Kronecker initiator matrix Θ in Figure 11. We search over initiator matrices Θ to find the one that maximizes the likelihood P(G | Θ).
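For intuition, Equations 4 and 5 can be evaluated exactly for a tiny graph by brute force over all N! permutations (NumPy assumed; the Θ and G values are illustrative). The permutation sum makes the likelihood invariant to node relabeling, at factorial cost:

```python
import itertools
import math
import numpy as np

def kron_power(theta, k):
    # Theta^[k]: the k-fold Kronecker power of the initiator
    P = np.array(theta, float)
    for _ in range(k - 1):
        P = np.kron(P, np.array(theta, float))
    return P

def likelihood(G, theta, k):
    """Naive P(G | Theta) of Eq. 4: average P(G | Theta, sigma) of Eq. 5
    over all N! permutations (uniform prior P(sigma) = 1/N!).
    Feasible only for a handful of nodes."""
    P = kron_power(theta, k)
    N = len(G)
    total = 0.0
    for sigma in itertools.permutations(range(N)):
        p = 1.0
        for u in range(N):
            for v in range(N):
                q = P[sigma[u], sigma[v]]
                p *= q if G[u][v] else 1.0 - q
        total += p
    return total / math.factorial(N)

theta = [[0.9, 0.5], [0.5, 0.1]]
G = [[1, 1, 1, 0],
     [1, 0, 0, 0],
     [1, 0, 0, 0],
     [0, 0, 0, 0]]
# Relabeling the nodes of G leaves the likelihood unchanged:
perm = [3, 1, 2, 0]
Gp = [[G[perm[u]][perm[v]] for v in range(4)] for u in range(4)]
print(abs(likelihood(G, theta, 2) - likelihood(Gp, theta, 2)) < 1e-15)
```

Even at N = 4 this already evaluates 24 permutations of a 16-entry matrix; the O(N! N^2) cost noted below is why the rest of this section develops sampling and linear-time evaluation.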
To estimate P(G | Θ) we are given a concrete Θ, and we use Kronecker multiplication to create the probabilistic adjacency matrix Θ^[k], which is of the same size as the real network G. Then we evaluate the likelihood by traversing the corresponding entries of G and Θ^[k].

Figure 11: Kronecker parameter estimation as an optimization problem. We search over the initiator matrices Θ (≡ P_1). Using Kronecker multiplication we create the probabilistic adjacency matrix Θ^[k], which is of the same size as the real network G. We then evaluate the likelihood by simultaneously traversing and multiplying entries of G and Θ^[k] (see Eq. 5). As shown by the figure, the permutation σ plays an important role, as permuting rows and columns of G could make it look more similar to Θ^[k] and thus increase the likelihood.

Equation 5 basically traverses the adjacency matrix of G, and maps every entry (u, v) of G to a corresponding entry (σ_u, σ_v) of P. Then, in case the edge (u, v) exists in G (i.e., G[u, v] = 1), the likelihood of that particular edge existing is P[σ_u, σ_v]; similarly, in case the edge (u, v) does not exist, the likelihood is simply 1 − P[σ_u, σ_v]. This also demonstrates the importance of the permutation σ: permuting rows and columns of G could make the adjacency matrix look more “similar” to Θ^[k], and would increase the likelihood.

So far we showed how to assess the quality (likelihood) of a particular Θ. So, naively one could perform some kind of grid search to find the best Θ. However, this is very inefficient. A better way is to compute the gradient of the log-likelihood ∂/∂Θ l(Θ), and then use the gradient to update the current estimate of Θ and move towards a solution of higher likelihood.
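Before stating the full procedure, the gradient-ascent idea can be illustrated on the Erdős-Rényi special case noted earlier, where all θ_ij equal a single value p (so the permutation plays no role and the log-likelihood is a scalar function l(p) = E log p + (M − E) log(1 − p) over M potential edges). This toy sketch, with illustrative counts and learning rate, recovers the MLE p = E/M:

```python
def fit_p(E, M, lr=1e-6, iters=500, p0=0.5):
    """Gradient ascent (the skeleton of the KronFit update) on the
    log-likelihood l(p) = E*log(p) + (M - E)*log(1 - p) of an
    Erdos-Renyi graph with E observed edges out of M possible ones."""
    p = p0
    for _ in range(iters):
        grad = E / p - (M - E) / (1.0 - p)   # d l / d p
        p += lr * grad                       # ascent step
    return p

# 300 edges among 10,000 possible pairs: the MLE is p = E/M = 0.03.
print(round(fit_p(300, 10000), 4))  # -> 0.03
```

In the full problem the single scalar p is replaced by the N_1 × N_1 matrix Θ, and evaluating the gradient itself requires the permutation machinery developed below.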
Algorithm 1 gives an outline of the optimization procedure.

Algorithm 1: KRONFIT algorithm.
  input: size of parameter matrix N_1, graph G on N = N_1^k nodes, and learning rate λ
  output: MLE parameters Θ̂ (N_1 × N_1 probability matrix)
  1  initialize Θ̂_1
  2  while not converged do
  3      evaluate gradient: ∂/∂Θ̂_t l(Θ̂_t)
  4      update parameter estimates: Θ̂_{t+1} = Θ̂_t + λ ∂/∂Θ̂_t l(Θ̂_t)
  5  end
  6  return Θ̂ = Θ̂_t

However, there are several difficulties with this algorithm. First, we are assuming that gradient descent type optimization will find a good solution, i.e., that the problem does not have (too many) local minima. Second, we are summing over exponentially many permutations in Equation 4. Third, evaluating Equation 5 as written takes O(N^2) time and needs to be evaluated N! times. So, given a concrete Θ, just naively calculating the likelihood takes O(N! N^2) time, and then one also has to optimize over Θ.

Observation 2 The complexity of naively calculating the likelihood P(G | Θ) of the graph G is O(N! N^2), where N is the number of nodes in G.

Next, we show that all this can be done in linear time.

5.3 Summing over the node labelings

To maximize Equation 3 using Algorithm 1 we need to obtain the gradient of the log-likelihood ∂/∂Θ l(Θ). We can write:

  ∂/∂Θ l(Θ) = [ Σ_σ ∂/∂Θ P(G | σ, Θ) P(σ) ] / [ Σ_σ′ P(G | σ′, Θ) P(σ′) ]
            = Σ_σ [ ∂ log P(G | σ, Θ) / ∂Θ ] · P(G | σ, Θ) P(σ) / P(G | Θ)
            = Σ_σ [ ∂ log P(G | σ, Θ) / ∂Θ ] · P(σ | G, Θ)    (6)

Note that we are still summing over all N! permutations σ, so calculating Eq. 6 is computationally intractable for graphs with more than a handful of nodes.
However, the equation has a nice form which allows for the use of simulation techniques to avoid the summation over super-exponentially many node correspondences. Thus, we simulate draws from the permutation distribution P(σ | G, Θ), and then evaluate the quantities at the sampled permutations to obtain the expected values of the log-likelihood and gradient. Algorithm 2 gives the details.

Algorithm 2: Calculating log-likelihood and gradient.
  input: parameter matrix Θ, and graph G
  output: log-likelihood l(Θ), and gradient ∂/∂Θ l(Θ)
  1  for t := 1 to T do
  2      σ_t := SamplePermutation(G, Θ)
  3      l_t = log P(G | σ^(t), Θ)
  4      grad_t := ∂/∂Θ log P(G | σ^(t), Θ)
  5  end
  6  return l(Θ) = (1/T) Σ_t l_t, and ∂/∂Θ l(Θ) = (1/T) Σ_t grad_t

Note that we can also permute the rows and columns of the parameter matrix Θ to obtain equivalent estimates. Therefore Θ is not strictly identifiable, exactly because of these permutations. Since the space of permutations on N nodes is very large (it grows as N!), the permutation sampling algorithm will explore only a small fraction of the space of all permutations and may converge to one of the global maxima of the parameter space (but may not explore all N_1! of them). As we empirically show later, our results are not sensitive to this, and multiple restarts result in equivalent (but often permuted) parameter estimates.

5.3.1 Sampling permutations

Next, we describe the Metropolis algorithm to simulate draws from the permutation distribution P(σ | G, Θ), which is given by

  P(σ | G, Θ) = P(σ, G, Θ) / Σ_τ P(τ, G, Θ) = P(σ, G, Θ) / Z,

where Z is the normalizing constant that is hard to compute, since it involves the sum over N! elements.
However, if we compute the likelihood ratio between permutations σ and σ′ (Equation 7), the normalizing constants nicely cancel out:

  P(σ′ | G, Θ) / P(σ | G, Θ) = Π_{(u,v) ∈ G} P[σ′_u, σ′_v] / P[σ_u, σ_v] · Π_{(u,v) ∉ G} (1 − P[σ′_u, σ′_v]) / (1 − P[σ_u, σ_v])    (7)

  = Π_{(u,v) ∈ G, (σ_u,σ_v) ≠ (σ′_u,σ′_v)} P[σ′_u, σ′_v] / P[σ_u, σ_v] · Π_{(u,v) ∉ G, (σ_u,σ_v) ≠ (σ′_u,σ′_v)} (1 − P[σ′_u, σ′_v]) / (1 − P[σ_u, σ_v])    (8)

This immediately suggests the use of a Metropolis sampling algorithm (Gamerman, 1997) to simulate draws from the permutation distribution, since Metropolis is based solely on such ratios (where the normalizing constants cancel out). In particular, suppose that in the Metropolis algorithm (Algorithm 3) we consider a move from permutation σ to a new permutation σ′. The probability of accepting the move to σ′ is given by Equation 7 (if P(σ′ | G, Θ)/P(σ | G, Θ) ≤ 1) or 1 otherwise.

Now we have to devise a way to sample permutations σ from the proposal distribution. One way to do this would be to simply generate a random permutation σ′ and then check the acceptance condition. This would be very inefficient, as we expect the distribution P(σ | G, Θ) to be heavily skewed, i.e., there will be a relatively small number of good permutations (node mappings). Even more so, as the degree distributions in real networks are skewed, there will be many bad permutations with low likelihood, and few good ones that do a good job of matching nodes of high degree.

To make the sampling process “smoother”, i.e., to sample permutations that are not that different (and thus are not randomly jumping across the permutation space), we design a Markov chain. The idea is to stay in the high-likelihood part of the permutation space longer. We do this by making samples dependent, i.e.
, given σ′ we want to generate the next candidate permutation σ′′ and then evaluate the likelihood ratio. When designing the Markov chain step one has to be careful that the proposal distribution satisfies the detailed balance condition: π(σ′) P(σ′ | σ′′) = π(σ′′) P(σ′′ | σ′), where P(σ′ | σ′′) is the transition probability of obtaining permutation σ′ from σ′′, and π(σ′) is the stationary distribution.

In Algorithm 3 we use a simple proposal where, given permutation σ′, we generate σ′′ by swapping elements at two positions of σ′ chosen uniformly at random. We refer to this proposal as SwapNodes. While this is simple and clearly satisfies the detailed balance condition, it is also inefficient in that most of the time low-degree nodes will get swapped (a direct consequence of heavy-tailed degree distributions).

Algorithm 3: SamplePermutation(G, Θ): Metropolis sampling of the node permutation.
  input: Kronecker initiator matrix Θ and a graph G on N nodes
  output: permutation σ^(i) ∼ P(σ | G, Θ)
  1   σ^(0) := (1, ..., N)
  2   i = 1
  3   repeat
  4       draw j and k uniformly from (1, ..., N)
  5       σ^(i) := SwapNodes(σ^(i−1), j, k)
  6       draw u from U(0, 1)
  7       if u > P(σ^(i) | G, Θ) / P(σ^(i−1) | G, Θ) then
  8           σ^(i) := σ^(i−1)
  9       end
  10      i = i + 1
  11  until σ^(i) ∼ P(σ | G, Θ)
  12  return σ^(i)
  Here U(0, 1) is the uniform distribution on [0, 1], and σ′ := SwapNodes(σ, j, k) is the permutation σ′ obtained from σ by swapping the elements at positions j and k.
This has two consequences: (a) we will converge to good permutations (accurate mappings of high-degree nodes) only slowly, and (b) once we reach a good permutation, very few proposals will be accepted, as most proposed permutations σ′ will swap low-degree nodes (since they form the majority of nodes).

A possibly more efficient way would be to swap elements of σ biased by the corresponding node degrees, so that high-degree nodes would get swapped more often. However, doing this directly does not satisfy the detailed balance condition. A way of sampling labels biased by node degrees that at the same time satisfies the detailed balance condition is the following: we pick an edge in G uniformly at random and swap the labels of the nodes at the edge endpoints. Notice this is biased towards swapping labels of nodes with high degrees, simply because they have more edges. The detailed balance condition holds because edges are sampled uniformly at random. We refer to this proposal as SwapEdgeEndpoints.

However, the issue with this proposal is that if the graph G is disconnected, we will only be swapping labels of nodes that belong to the same connected component. This means that some parts of the permutation space will never get visited. To overcome this problem, we execute SwapNodes with some probability ω and SwapEdgeEndpoints with probability 1 − ω.

To summarize, we consider the following two permutation proposal distributions:

• σ′′ = SwapNodes(σ′): we obtain σ′′ by taking σ′, uniformly at random selecting a pair of elements, and swapping their positions.

• σ′′ = SwapEdgeEndpoints(σ′): we obtain σ′′ from σ′ by first sampling an edge (j, k) from G uniformly at random; then we take σ′ and swap the labels at positions j and k.
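The sampler with both proposals can be sketched compactly. This is an unoptimized illustration (NumPy assumed; the toy Θ, graph, and step count are ours): for clarity it materializes P and recomputes the full O(N^2) likelihood at every step, rather than using the two-rows-and-columns update described next:

```python
import random
import numpy as np

def kron_power(theta, k):
    # Theta^[k]: the k-fold Kronecker power of the initiator
    P = np.array(theta, float)
    for _ in range(k - 1):
        P = np.kron(P, np.array(theta, float))
    return P

def log_lik(G, P, sigma):
    # log P(G | Theta, sigma), Eq. 5 (naive O(N^2) version for clarity)
    ll = 0.0
    N = len(G)
    for u in range(N):
        for v in range(N):
            q = P[sigma[u], sigma[v]]
            ll += np.log(q) if G[u][v] else np.log(1.0 - q)
    return ll

def sample_permutation(G, theta, k, steps=200, omega=0.5, seed=0):
    """Metropolis sketch of Algorithm 3: propose SwapNodes with
    probability omega, SwapEdgeEndpoints with probability 1 - omega."""
    rng = random.Random(seed)
    P = kron_power(theta, k)
    N = len(G)
    edges = [(u, v) for u in range(N) for v in range(N) if G[u][v]]
    sigma = list(range(N))
    ll = log_lik(G, P, sigma)
    for _ in range(steps):
        if rng.random() < omega or not edges:
            j, m = rng.randrange(N), rng.randrange(N)   # SwapNodes
        else:
            j, m = rng.choice(edges)                    # SwapEdgeEndpoints
        cand = list(sigma)
        cand[j], cand[m] = cand[m], cand[j]
        cand_ll = log_lik(G, P, cand)
        # accept with probability min(1, P(sigma''|G,Theta)/P(sigma'|G,Theta))
        if rng.random() < np.exp(min(0.0, cand_ll - ll)):
            sigma, ll = cand, cand_ll
    return sigma

theta = [[0.9, 0.6], [0.6, 0.2]]
G = (kron_power(theta, 3) > 0.4).astype(int).tolist()  # toy "observed" graph
sigma = sample_permutation(G, theta, k=3, steps=100)
print(sorted(sigma) == list(range(8)))                 # a valid permutation
```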
5.3.2 Speeding up the likelihood ratio calculation

We further speed up the algorithm by using the following observation. As written, Equation 7 takes O(N^2) to evaluate, since we have to consider N^2 possible edges. However, notice that permutations σ and σ′ differ only at two positions, i.e., the elements at positions j and k are swapped; σ and σ′ map all nodes except these two to the same locations. This means those elements of Equation 7 cancel out. Thus, to update the likelihood we only need to traverse two rows and columns of matrix P, namely rows and columns j and k, since everywhere else the mapping of the nodes to the adjacency matrix is the same for both permutations. This gives Equation 8, where the products now range only over the two rows/columns of P where σ and σ′ differ.

The graphs we are working with here are too large to allow us to explicitly create and store the stochastic adjacency matrix P by Kronecker-powering the initiator matrix Θ. Every time the probability P[i, j] of edge (i, j) is needed, Equation 2 is evaluated, which takes O(k). So a single iteration of Algorithm 3 takes O(kN).

Observation 3 Sampling a permutation σ from P(σ | G, Θ) takes O(kN).

This gives us an improvement over the O(N!) complexity of summing over all the permutations. So far we have shown how to obtain a permutation, but we still need to evaluate the likelihood and find the gradients that will guide us towards a good initiator matrix. The problem here is that naively evaluating the network likelihood (gradient) as written in Equation 6 takes O(N^2) time. This is exactly what we investigate next: how to calculate the likelihood in linear time.

5.4 Efficiently approximating likelihood and gradient

We just showed how to efficiently sample node permutations.
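The O(k) evaluation of a single entry of P mentioned above follows from the Kronecker structure: writing u and v in base N_1, each digit pair selects one initiator entry, and p_uv is the product of the k selected entries. A sketch (0-indexed nodes; the θ values are illustrative):

```python
def edge_prob(theta, u, v, k):
    """p_uv of P = Theta^[k] in O(k), without materializing P.

    The i-th base-N1 digits of u and v index the initiator entry
    contributed by the i-th Kronecker factor; commutativity of the
    product lets us peel the digits off in any order.
    """
    n1 = len(theta)
    p = 1.0
    for _ in range(k):
        p *= theta[u % n1][v % n1]
        u //= n1
        v //= n1
    return p

theta = [[0.9, 0.5], [0.5, 0.1]]
# Agrees with explicitly Kronecker-powering the initiator:
print(edge_prob(theta, 0, 3, 2))  # 0.5 * 0.5 = 0.25
```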
Now, given a permutation, we show how to efficiently evaluate the likelihood and its gradient. As with evaluating the likelihood ratio, naively calculating the log-likelihood l(Θ) or its gradient ∂/∂Θ l(Θ) takes time quadratic in the number of nodes. Next, we show how to compute this in linear time O(E).

We begin with the observation that real graphs are sparse, which means that the number of edges is not quadratic but rather almost linear in the number of nodes, E ≪ N^2. This means that the majority of entries of the graph adjacency matrix are zero, i.e., most of the edges are not present. We exploit this fact. The idea is to first calculate the likelihood (gradient) of an empty graph, i.e., a graph with zero edges, and then correct for the edges that actually appear in G.

To naively calculate the likelihood for an empty graph one needs to evaluate every cell of the graph adjacency matrix. We consider a Taylor approximation to the likelihood, and exploit the structure of matrix P to devise a constant-time algorithm. First, consider the second-order Taylor approximation to the log-likelihood of an edge that succeeds with probability x but does not appear in the graph:

  log(1 − x) ≈ −x − (1/2) x^2

Calculating l_e(Θ), the log-likelihood of an empty graph, then becomes:

  l_e(Θ) = Σ_{i=1}^{N} Σ_{j=1}^{N} log(1 − p_ij) ≈ −( Σ_{i=1}^{N_1} Σ_{j=1}^{N_1} θ_ij )^k − (1/2) ( Σ_{i=1}^{N_1} Σ_{j=1}^{N_1} θ_ij^2 )^k    (9)

Notice that while the first pair of sums ranges over N elements, the last pairs only range over N_1 elements (N_1 = N^{1/k}). Equation 9 holds due to the recursive structure of the matrix P generated by the Kronecker product. We substitute log(1 − p_ij) with its Taylor approximation, which gives a sum over the elements of P and their squares.
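The quality of this approximation is easy to check numerically for small k, where the full matrix P can still be materialized (NumPy assumed; the initiator values are illustrative, chosen small so the second-order expansion is tight):

```python
import numpy as np

def empty_loglik_exact(theta, k):
    # Naive O(N^2) evaluation: Kronecker-power Theta and touch every cell.
    P = np.array(theta, float)
    for _ in range(k - 1):
        P = np.kron(P, np.array(theta, float))
    return np.log(1.0 - P).sum()

def empty_loglik_taylor(theta, k):
    # Eq. 9, constant time in N: sum p_ij = (sum theta_ij)^k and
    # sum p_ij^2 = (sum theta_ij^2)^k by the Kronecker structure.
    t = np.array(theta, float)
    return -(t.sum() ** k) - 0.5 * ((t ** 2).sum() ** k)

theta = [[0.15, 0.08], [0.08, 0.02]]
for k in (3, 5, 8):
    print(round(empty_loglik_exact(theta, k), 6),
          round(empty_loglik_taylor(theta, k), 6))
```

For these values the two agree to many decimal places; as the text notes next, for larger θ_ij the third-order error term (1/3)(Σ θ_ij^3)^k can dominate and a higher-order expansion is needed.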
Next, we note that the sum of the elements of P forms a multinomial series, and thus Σ_{i,j} p_ij = (Σ_{i,j} θ_ij)^k, where θ_ij denotes an element of Θ, and p_ij an element of Θ^[k].

Calculating the log-likelihood of G now takes O(E): first, we approximate the likelihood of an empty graph in constant time, and then account for the edges that are actually present in G, i.e., we subtract the "no-edge" likelihood and add the "edge" likelihoods:

l(Θ) = l_e(Θ) + Σ_{(u,v)∈G} [ −log(1 − P[σ_u, σ_v]) + log(P[σ_u, σ_v]) ]

We note that when using the second order Taylor approximation to the log-likelihood of an empty graph, the error term of the approximation is ⅓(Σ_{i,j} θ_ij³)^k, which can diverge for large k. For typical values of the initiator matrix P_1 (that we present in Section 6.5), one needs about a fourth or fifth order Taylor approximation for the error to actually go to zero as k approaches infinity, i.e., one needs Σ_{i,j} θ_ij^{n+1} < 1, where n is the order of the Taylor approximation employed.

5.5 Calculating the gradient

Calculation of the gradient of the log-likelihood follows exactly the same pattern as described above. First, using the Taylor approximation, we calculate the gradient as if the graph G had no edges. Then we correct the gradient for the edges that are present in G. As in the previous section, we speed up the gradient calculations by exploiting the fact that two consecutive permutations σ and σ′ differ only at two positions; thus, given the gradient from the previous step, one only needs to account for the swap of the two rows and columns of the gradient matrix ∂P/∂Θ to update the gradients of the individual parameters.

5.6 Determining the size of the initiator matrix

The question we answer next is how to determine the right number of parameters, i.e., what is the right size of the matrix Θ?
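Putting the pieces together, the O(E) likelihood evaluation can be sketched as below. This is an illustration, not the paper's implementation: node ids are assumed already relabeled by the current permutation σ, and the exact log terms are used for the per-edge corrections:

```python
import numpy as np

def kron_loglik(theta, k, edges):
    """O(E) log-likelihood: Taylor-approximated empty-graph term
    (Equation 9), then for each present edge (u, v) subtract its
    "no-edge" term and add its "edge" term."""
    n1 = theta.shape[0]

    def p(i, j):  # P[i, j] of the k-th Kronecker power, O(k)
        prob = 1.0
        for _ in range(k):
            prob *= theta[i % n1, j % n1]
            i, j = i // n1, j // n1
        return prob

    # empty-graph approximation: -(sum theta)^k - 0.5 (sum theta^2)^k
    ll = -(theta.sum() ** k) - 0.5 * ((theta ** 2).sum() ** k)
    for u, v in edges:
        puv = p(u, v)
        ll += -np.log(1.0 - puv) + np.log(puv)
    return ll
```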
This is a classical question of model selection, where there is a tradeoff between the complexity of the model and the quality of the fit. A bigger model with more parameters usually fits better, but it is also more likely to overfit the data.

To find the appropriate value of N_1, the size of the matrix Θ, i.e., to choose the right tradeoff between the complexity of the model and the quality of the fit, we propose to use the Bayesian Information Criterion (BIC) (Schwarz, 1978). Stochastic Kronecker graphs model the presence of edges with independent Bernoulli random variables, where the canonical number of parameters is N_1^{2k}, which is a function of the lower-dimensional parameter Θ. This is then a curved exponential family (Efron, 1975), and BIC naturally applies:

BIC(N_1) = −l(Θ̂_{N_1}) + ½ N_1² log(N²)

where Θ̂_{N_1} are the maximum likelihood parameters of the model with an N_1 × N_1 parameter matrix, and N is the number of nodes in G. Note that one could also add a term to the above formula to account for the multiple global maxima of the likelihood space, but as N_1 is small the additional term would make no real difference. As an alternative to BIC one could also consider the Minimum Description Length (MDL) principle (Rissanen, 1978), where the model is scored by the quality of the fit plus the size of the description that encodes the model and its parameters.

6. Experiments on real and synthetic data

Next we describe our experiments on a range of real and synthetic networks. We divide the experiments into several subsections. First we examine the convergence and mixing of the Markov chain of our permutation sampling scheme.
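As a sketch, the BIC formula above transcribes directly into code (the helper name is ours); lower scores indicate the preferred initiator size:

```python
import numpy as np

def bic_score(loglik, n1, num_nodes):
    """BIC for an n1 x n1 Kronecker initiator fitted to a graph on
    `num_nodes` nodes: negative maximized log-likelihood plus a
    complexity penalty of (n1^2 / 2) * log(N^2). Lower is better."""
    return -loglik + 0.5 * n1 ** 2 * np.log(float(num_nodes) ** 2)
```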
Then we consider estimating the parameters of synthetic Kronecker graphs to see whether KRONFIT is able to recover the parameters used to generate the network. Last, we consider fitting Stochastic Kronecker graphs to large real-world networks.

6.1 Permutation sampling

In our experiments we considered both synthetic and real graphs. Unless mentioned otherwise, all synthetic Kronecker graphs were generated using P*_1 = [0.8, 0.6; 0.5, 0.3] and k = 14, which gives a graph G on N = 16,384 nodes and E = 115,741 edges. We chose this particular P*_1 as it resembles the typical initiator for the real networks analyzed later in this section.

6.1.1 CONVERGENCE OF THE LOG-LIKELIHOOD AND THE GRADIENT

First, we examine the convergence of Metropolis permutation sampling, where permutations are sampled sequentially: a new permutation is obtained by modifying the previous one, which creates a Markov chain. We want to assess the convergence and mixing of this chain. We aim to determine how many permutations one needs to draw to reliably estimate the likelihood and the gradient, and also how long it takes until the samples converge to the stationary distribution.

For the experiment we generated a synthetic Stochastic Kronecker graph using P*_1 as defined above. Then, starting with a random permutation, we run Algorithm 3 and measure how the likelihood and the gradients converge to their true values. In this particular case, we first generated a Stochastic Kronecker graph G as described above, but then calculated the likelihood and the parameter gradients for Θ′ = [0.8, 0.75; 0.45, 0.3].
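The Metropolis chain described here can be sketched as follows. This is a simplified illustration of Algorithm 3, not the paper's implementation: it recomputes the full log-likelihood on every proposal instead of using the O(k) ratio update, and `loglik` is any user-supplied function:

```python
import math
import random

def metropolis_permutations(loglik, n, num_samples, seed=0):
    """Metropolis sampling over node permutations: propose swapping
    two random node labels and accept with probability
    min(1, P(sigma') / P(sigma))."""
    rng = random.Random(seed)
    sigma = list(range(n))
    cur = loglik(sigma)
    trace = []
    for _ in range(num_samples):
        j, k = rng.randrange(n), rng.randrange(n)
        sigma[j], sigma[k] = sigma[k], sigma[j]      # propose a swap
        new = loglik(sigma)
        if new >= cur or rng.random() < math.exp(new - cur):
            cur = new                                # accept
        else:
            sigma[j], sigma[k] = sigma[k], sigma[j]  # reject: undo swap
        trace.append(cur)
    return sigma, trace
```

The returned trace of log-likelihood values is what is averaged over buckets and plotted in the convergence experiments below.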
We average the likelihoods and gradients over buckets of 1,000 consecutive samples and plot how the log-likelihood calculated over the sampled permutations approaches the true log-likelihood (which we can compute since G is a Stochastic Kronecker graph).

First, we present experiments that aim to answer how many samples (i.e., permutations) one needs to draw to obtain a reliable estimate of the gradient (see Equation 6). Figure 12(a) shows how the estimated log-likelihood approaches the true likelihood. Notice that the estimated values quickly converge to their true values, i.e., Metropolis sampling quickly moves towards "good" permutations. Similarly, Figure 12(b) plots the convergence of the gradients. Notice that θ_11 and θ_22 of Θ′ and P*_1 match, so the gradients of these two parameters should converge to zero, and indeed they do. On the other hand, θ_12 and θ_21 differ between Θ′ and P*_1. Notice that the gradient for one is positive, as the parameter θ_12 of Θ′ should be decreased, and similarly for θ_21 the gradient is negative, as the parameter value should be increased to match P*_1. In summary, this shows that the log-likelihood and the gradients converge to their true values rather quickly.
Figure 12: Convergence of the log-likelihood and of the components of the gradient towards their true values for Metropolis permutation sampling (Algorithm 3) with the number of samples. (a) Log-likelihood. (b) Gradient. (c) Acceptance ratio. (d) Autocorrelation.

In Figures 12(c) and (d) we also investigate the properties of the Markov Chain Monte Carlo sampling procedure and assess convergence and mixing criteria. First, we plot the fraction of accepted proposals. It stabilizes at around 15%, which is quite close to the rule of thumb of 25%. Second, Figure 12(d) plots the autocorrelation of the log-likelihood as a function of the lag. The autocorrelation r_k of a signal X is a function of the lag k, where r_k is defined as the correlation of the signal X at time t with X at time t + k, i.e., the correlation of the signal with itself at lag k. High autocorrelations within chains indicate slow mixing and, usually, slow convergence. On the other hand, fast decay of the autocorrelation implies better mixing, and thus one needs fewer samples to accurately estimate the gradient or the likelihood. Notice the rather fast autocorrelation decay.

All in all, these experiments show that one needs to sample on the order of tens of thousands of permutations for the estimates to converge. We also verified that the variance of the estimates is sufficiently small.
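The autocorrelation diagnostic of Figure 12(d) can be computed directly from the chain of sampled log-likelihoods; a minimal sketch (the function name is ours):

```python
import numpy as np

def autocorrelation(x, max_lag):
    """Autocorrelation r_k of a signal x for lags k = 0..max_lag:
    the correlation of x[t] with x[t + k], normalized so r_0 = 1."""
    x = np.asarray(x, dtype=float)
    x = x - x.mean()
    var = (x * x).sum()
    return np.array([(x[:len(x) - k] * x[k:]).sum() / var
                     for k in range(max_lag + 1)])
```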
In our experiments we start with a random permutation and use a long burn-in time. Then, when performing optimization, we use the permutation from the previous step to initialize the permutation at the current step of gradient descent. Intuitively, small changes in the parameter space Θ also mean small changes in P(σ | G, Θ).

Figure 13: Convergence of the log-likelihood for Metropolis permutation sampling (Algorithm 3) for different choices of ω, which interpolates between the SwapNodes (ω = 1) and SwapEdgeEndpoints (ω = 0) permutation proposal distributions. Notice the fastest convergence of the log-likelihood for ω = 0.6.

6.1.2 DIFFERENT PROPOSAL DISTRIBUTIONS

In Section 5.3.1 we defined two permutation sampling strategies: SwapNodes, where we pick two nodes uniformly at random and swap their labels (node ids), and SwapEdgeEndpoints, where we pick a random edge in the graph and then swap the labels of the edge endpoints. We also discussed that one can interpolate between the two strategies by executing SwapNodes with probability ω and SwapEdgeEndpoints with probability 1 − ω.

So, given a Stochastic Kronecker graph G on N = 16,384 nodes and E = 115,741 edges generated from P*_1 = [0.8, 0.7; 0.5, 0.3], we evaluate the likelihood of Θ′ = [0.8, 0.75; 0.45, 0.3]. As we sample permutations, we observe how the estimated likelihood converges to the true likelihood. Moreover, we also vary the parameter ω that interpolates between the two permutation proposal distributions. The quicker the convergence towards the true log-likelihood, the better the proposal distribution.
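The mixed proposal distribution can be sketched as below (an illustration with names of our choosing; `edges` is the graph's edge list, and `sigma` is modified in place):

```python
import random

def propose_swap(sigma, edges, omega, rng):
    """With probability `omega` do SwapNodes (swap the labels of two
    uniformly random nodes); otherwise do SwapEdgeEndpoints (pick a
    random edge and swap the labels of its endpoints). Returns the
    pair of positions that were swapped."""
    if rng.random() < omega or not edges:
        j = rng.randrange(len(sigma))
        k = rng.randrange(len(sigma))
    else:
        j, k = rng.choice(edges)
    sigma[j], sigma[k] = sigma[k], sigma[j]
    return j, k
```

SwapEdgeEndpoints biases proposals towards positions that actually carry edges, which is why mixing the two strategies (ω ≈ 0.6 below) mixes faster than either extreme.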
Figure 13 plots the convergence of the log-likelihood with the number of sampled permutations. We plot the average over non-overlapping buckets of 1,000 consecutive permutations. Faster convergence implies a better permutation proposal distribution. When we use only SwapNodes (ω = 1) or only SwapEdgeEndpoints (ω = 0), convergence is rather slow. We obtain the best convergence for ω around 0.6.

Similarly, Figure 14(a) plots the autocorrelation as a function of the lag k for different choices of ω. Faster autocorrelation decay means better mixing of the Markov chain. Again, notice that we get the best mixing for ω ≈ 0.6. (Notice the logarithmic y-axis.)

Last, we diagnose how long the sampling procedure must be run before the generated samples can be considered to be drawn (approximately) from the stationary distribution. We call this the burn-in time of the chain. There are various procedures for assessing convergence. Here we adopt the approach of Gelman et al. (2003), which is based on running multiple Markov chains, each from a different starting point, and then comparing the variance within each chain and between the chains.

Figure 14: (a) Autocorrelation plot of the log-likelihood for different choices of the parameter ω. Notice we get the best mixing with ω ≈ 0.6. (b) The potential scale reduction, which compares the variance within and across independent Markov chains, for different values of the parameter ω.
The sooner the within- and between-chain variances become equal, the shorter the burn-in time, i.e., the sooner the samples are drawn from the stationary distribution. Let l be the quantity being simulated with J different chains, and let l_j^(k) denote the k-th sample of the j-th chain, where j = 1, …, J and k = 1, …, K. More specifically, in our case we run J separate permutation sampling chains, so we first sample a permutation σ_j^(k) and then calculate the corresponding log-likelihood l_j^(k).

First, we compute the between-chain variance σ̂²_B and the within-chain variance σ̂²_W, where the between-chain variance is obtained by

σ̂²_B = (K / (J − 1)) Σ_{j=1}^{J} (l̄_{·j} − l̄_{··})²,  where  l̄_{·j} = (1/K) Σ_{k=1}^{K} l_j^(k)  and  l̄_{··} = (1/J) Σ_{j=1}^{J} l̄_{·j}.

Similarly, the within-chain variance is defined by

σ̂²_W = (1 / (J(K − 1))) Σ_{j=1}^{J} Σ_{k=1}^{K} (l_j^(k) − l̄_{·j})².

Then, the marginal posterior variance of l̂ is calculated using

σ̂² = ((K − 1)/K) σ̂²_W + (1/K) σ̂²_B.

Figure 15: (a) Distribution of the log-likelihood of permutations sampled uniformly at random, and (b) when sampled from P(σ | Θ, G). Notice that the space of good permutations is rather small, but our sampling quickly finds permutations of high likelihood. (c) Convergence of the log-likelihood for 10 runs of gradient descent, each from a different random starting point.
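The within/between-chain variances above, together with the Gelman–Rubin potential scale reduction √R̂ = (σ̂² / σ̂²_W)^{1/2}, can be sketched as follows (the function name is ours; the rows of `chains` are the J chains of length K):

```python
import numpy as np

def potential_scale_reduction(chains):
    """Gelman-Rubin potential scale reduction sqrt(R-hat) for J chains
    of length K. Values near 1 indicate the chains have (approximately)
    reached the stationary distribution."""
    chains = np.asarray(chains, dtype=float)
    J, K = chains.shape
    chain_means = chains.mean(axis=1)            # l-bar_{.j}
    grand_mean = chain_means.mean()              # l-bar_{..}
    var_b = K / (J - 1) * ((chain_means - grand_mean) ** 2).sum()
    var_w = ((chains - chain_means[:, None]) ** 2).sum() / (J * (K - 1))
    var_hat = (K - 1) / K * var_w + var_b / K    # marginal posterior variance
    return np.sqrt(var_hat / var_w)
```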
Finally, we estimate the potential scale reduction (Gelman et al., 2003) of l by

√R̂ = (σ̂² / σ̂²_W)^{1/2}.

Note that as the length of the chain K → ∞, √R̂ converges to 1 from above. The recommendation of Gelman et al. (2003) for convergence assessment is that the potential scale reduction be below 1.2.

Figure 14(b) gives the Gelman–Rubin–Brooks plot, where we plot the potential scale reduction √R̂ over increasing chain length K for different choices of the parameter ω. Notice that the potential scale reduction quickly decays towards 1. As in Figure 14(a), the extreme values of ω give slow decay, while we obtain the fastest potential scale reduction when ω ≈ 0.6.

6.1.3 PROPERTIES OF THE PERMUTATION SPACE

Next we explore the properties of the permutation space. We would like to quantify what fraction of permutations are "good" (have high likelihood), and how quickly they are discovered. For the experiment we took a real network G (the AS-ROUTEVIEWS network) and the MLE parameters Θ̂ for it that we estimated beforehand (l(Θ̂) ≈ −150,000). The network G has 6,474 nodes, which means the space of all permutations has ≈ 10^22,000 elements.

First, we sampled 1 billion (10^9) permutations σ_i uniformly at random, i.e., P(σ_i) = 1/(6,474!), and for each evaluated its log-likelihood l(σ_i | Θ) = log P(Θ | G, σ_i). We ordered the permutations in decreasing log-likelihood and plotted l(σ_i | Θ) vs. rank. Figure 15(a) gives the plot. Notice that very few random permutations are very bad (i.e., give low likelihood); similarly, few permutations are very good, while most of them lie somewhere in between. Notice that the best "random" permutation has a log-likelihood of ≈ −320,000, which is far below the true likelihood l(Θ̂) ≈ −150,000.
This suggests that only a very small fraction of all permutations give good node labelings.

On the other hand, we also repeated the same experiment, but now using permutations sampled from the permutation distribution σ_i ∼ P(σ | Θ, G) via our Metropolis sampling scheme. Figure 15(b) gives the plot. Notice the radical difference: now l(σ_i | Θ) very quickly converges to the true likelihood of ≈ −150,000. This suggests that while the number of "good" permutations (accurate node mappings) is rather small, our sampling procedure quickly converges to the "good" part of the permutation space, where node mappings are accurate, and spends most of its time there.

6.2 Properties of the optimization space

In maximizing the likelihood we use a stochastic approximation to the gradient. This adds variance to the gradient and makes efficient optimization techniques, like conjugate gradient, highly unstable. Thus we use gradient descent, which is slower but easier to control. First, we make the following observation:

Observation 4 Given a real graph G, finding the maximum likelihood Stochastic Kronecker initiator matrix Θ̂,

Θ̂ = arg max_Θ P(G | Θ),

is a non-convex optimization problem.

Proof By definition, permutations of the Kronecker graph initiator matrix Θ all have the same log-likelihood. This means that we have several global maxima that correspond to permutations of the parameter matrix Θ, and between them the log-likelihood drops. This means that the optimization problem is non-convex.

The above observation does not seem promising for estimating Θ̂ using gradient descent, as it is prone to getting stuck in local optima.
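The multiple global maxima in the proof can be checked numerically: applying the same permutation to the rows and columns of the initiator merely relabels the nodes of the Kronecker power, so both initiators define the same graph distribution. A small sketch of our own for N_1 = 2, k = 2:

```python
import numpy as np

# Permuting the initiator's rows and columns permutes the rows and
# columns of its Kronecker power, hence equal likelihood: multiple
# equivalent global maxima of l(Theta).
theta = np.array([[0.8, 0.6], [0.5, 0.3]])
perm = [1, 0]                         # swap the two initiator indices
theta_p = theta[np.ix_(perm, perm)]   # permuted initiator

P = np.kron(theta, theta)             # k = 2 Kronecker power
P_p = np.kron(theta_p, theta_p)

# node relabeling induced on the 4 nodes (apply perm to base-2 digits)
node_perm = [perm[i // 2] * 2 + perm[i % 2] for i in range(4)]
assert np.allclose(P_p, P[np.ix_(node_perm, node_perm)])
```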
To test for this behavior, we ran the following experiment: we generated 100 synthetic Kronecker graphs on 16,384 (2^14) nodes and 1.4 million edges on average, each with a randomly chosen 2 × 2 parameter matrix Θ*. For each of the 100 graphs we ran a single trial of gradient descent starting from a random parameter matrix Θ′ and tried to recover Θ*. In 98% of the cases the gradient descent converged to the true parameters. Many times the algorithm converged to a different global maximum, i.e., Θ̂ was a permuted version of the original parameter matrix Θ*. Moreover, the median number of gradient descent iterations was only 52. This suggests a surprisingly nice structure of our optimization space: it seems to behave like a convex optimization problem with many equivalent global maxima. Moreover, this experiment is also a good sanity check, as it shows that given a Kronecker graph we can recover and identify the parameters that were used to generate it.

Furthermore, Figure 15(c) plots the log-likelihood l(Θ_t) of the current parameter estimate Θ_t over the iterations t of the stochastic gradient descent. We plot the log-likelihood for 10 different runs of gradient descent, each starting from a different random set of parameters Θ_0. Notice that in all runs gradient descent always converges towards the optimum, and none of the runs gets stuck in a local maximum.

6.3 Convergence of the graph properties

We approached the problem of estimating the Stochastic Kronecker initiator matrix Θ by defining the likelihood over the individual entries of the graph adjacency matrix.
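The outer optimization loop is plain (stochastic) gradient ascent on the log-likelihood; a hypothetical sketch, with `grad_loglik` standing in for the permutation-sampled gradient estimate of Section 5.4:

```python
def kronfit_sketch(grad_loglik, theta0, lr=1e-5, iters=100):
    """Hypothetical outer loop: gradient ascent on l(Theta), where
    `grad_loglik(theta)` returns a (possibly stochastic) estimate of
    the gradient of the log-likelihood."""
    theta = [row[:] for row in theta0]
    for _ in range(iters):
        g = grad_loglik(theta)
        for i in range(len(theta)):
            for j in range(len(theta)):
                theta[i][j] += lr * g[i][j]  # ascend the likelihood
    return theta
```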
However, what we would really like is to be given a real graph G and then generate a synthetic graph K that has similar properties as the real G. By properties we mean network statistics that can be computed from the graph, e.g., diameter, degree distribution, clustering coefficient, etc. A priori it is not clear that our approach, which tries to match individual entries of the graph adjacency matrix, will also be able to reproduce these global network statistics. However, as we show next, this concern is unfounded.

To get some understanding of the convergence of gradient descent in terms of the network properties, we performed the following experiment. After every step t of stochastic gradient descent, we compare the true graph G with the synthetic Kronecker graph K_t generated using the current parameter estimates Θ̂_t. Figure 16(a) gives the convergence of the log-likelihood, and (b) gives the absolute error in parameter values (Σ |θ̂_ij − θ*_ij|, where θ̂_ij ∈ Θ̂_t and θ*_ij ∈ Θ*).

Figure 16: Convergence of graph properties with the number of iterations of gradient descent using the synthetic dataset. We start with a random choice of parameters, and with the steps of gradient descent the Kronecker graph matches the network properties of the target graph better and better. (a) Log-likelihood. (b) Average absolute parameter error. (c) Effective diameter. (d) Largest singular value.
Similarly, Figure 16(c) plots the effective diameter, and (d) gives the largest singular value of the adjacency matrix of K_t as it converges to the largest singular value of G.

Note how, as the iterations of gradient descent progress, the properties of the graph K_t quickly converge to those of G, even though we are not directly optimizing the similarity in network properties: the log-likelihood increases, the absolute error of the parameters decreases, and the diameter and the largest singular value of K_t both converge to those of G. This is a nice result, as it shows that through maximizing the likelihood the resulting graphs also become more and more similar in their structural properties (even though we are not directly optimizing over them).

Figure 17: Autonomous Systems (AS-ROUTEVIEWS): overlaid patterns of the real graph and the fitted Kronecker graph. Notice that the fitted Kronecker graph matches the patterns of the real graph while using only four parameters (2 × 2 initiator matrix). (a) Degree distribution. (b) Hop plot. (c) Scree plot. (d) "Network" value.

6.4 Fitting to real-world networks

Next, we present experiments on fitting the Kronecker graph model to real-world networks. Given a real network G, we aim to discover the most likely parameters Θ̂ that would ideally generate a synthetic graph K having similar properties as the real G.
This assumes that Kronecker graphs are a good model of the network structure, and that KRONFIT is able to find good parameters. In the previous section we showed that KRONFIT can efficiently recover the parameters. Now we examine how well Kronecker graphs can model the structure of real networks.

We consider several different networks: a graph of connectivity among Internet Autonomous Systems (AS-ROUTEVIEWS) with N = 6,474 and E = 26,467; a who-trusts-whom type social network from Epinions (Richardson et al., 2003) (EPINIONS) with N = 75,879 and E = 508,960; and many others. The largest network we consider for fitting is the FLICKR photo-sharing online social network with 584,207 nodes and 3,555,115 edges.

For the purposes of this section we take a real network G, find the parameters Θ̂ using KRONFIT, generate a synthetic graph K using Θ̂, and then compare G and K by comparing the properties introduced in Section 2. In all experiments we started from a random point (a random initiator matrix) and ran gradient descent for 100 steps. At each step we estimate the likelihood and the gradient based on 510,000 sampled permutations, of which we discard the first 10,000 samples to allow the chain to burn in.

6.4.1 FITTING TO THE AUTONOMOUS SYSTEMS NETWORK

First, we focus on the Autonomous Systems network obtained from the University of Oregon Route Views project (RouteViews, 1997). Given the AS network G, we run KRONFIT to obtain the parameter estimates Θ̂. Using Θ̂ we then generate a synthetic Kronecker graph K and compare the properties of G and K.

Figure 17 shows the properties of AS-ROUTEVIEWS and compares them with the properties of a synthetic Kronecker graph generated using the fitted parameters Θ̂ of size 2 × 2.
Notice that the properties of both graphs match really well. The estimated parameters are Θ̂ = [0.987, 0.571; 0.571, 0.049].

Figure 17(a) compares the degree distributions of the AS-ROUTEVIEWS network and its synthetic Kronecker estimate. In this and all other plots we use exponential binning, which is a standard procedure to de-noise the data when plotting on log–log scales. Notice a very close match in degree distribution between the real graph and its synthetic counterpart.

Figure 17(b) plots the cumulative number of pairs of nodes g(h) that can be reached in ≤ h hops. The hop plot gives a sense of the distribution of shortest path lengths in the network and of the network diameter. Last, Figures 17(c) and (d) plot the spectral properties of the graph adjacency matrix. Figure 17(c) plots the largest singular values vs. rank, and (d) plots the components of the left singular vector (the network value) vs. rank. Again notice the good agreement with the real graph while using only four parameters.

Moreover, on all plots the error bars of two standard deviations show the variance of the graph properties across different realizations R(Θ̂^[k]). To obtain the error bars we took the same Θ̂ and generated 50 realizations of the Kronecker graph. As on most of the plots the error bars are so small as to be practically invisible, this shows that the variance of the network properties when generating a Stochastic Kronecker graph is indeed very small.

Also notice that AS-ROUTEVIEWS is an undirected graph, and that the fitted parameter matrix Θ̂ is in fact symmetric. This means that without a priori biasing the fitting towards undirected graphs, the recovered parameters obey this aspect of the network.
Fitting the AS-ROUTEVIEWS graph, starting from a random set of parameters and performing gradient descent for 100 iterations with half a million sampled permutations per iteration, took less than 10 minutes on a standard desktop PC. This is a significant speedup over (Bezáková et al., 2006), where using a similar permutation sampling approach to calculate the likelihood of a preferential attachment model on a similar AS-ROUTEVIEWS graph took about two days on a cluster of 50 machines.

6.4.2 CHOICE OF THE INITIATOR MATRIX SIZE N_1

As mentioned earlier, for finding the optimal number of parameters, i.e., selecting the size of the initiator matrix, the BIC criterion naturally applies to Kronecker graphs.

N_1 | l(Θ̂) | N_1^k | E_1^k | |{deg(u) > 0}| | BIC score
2 | −152,499 | 8,192 | 25,023 | 5,675 | 152,506
3 | −127,066 | 6,561 | 28,790 | 5,683 | 127,083
4 | −153,260 | 16,384 | 24,925 | 8,222 | 153,290
5 | −149,949 | 15,625 | 29,111 | 9,822 | 149,996
6 | −128,241 | 7,776 | 26,557 | 6,623 | 128,309
AS-ROUTEVIEWS | | 6,474 | 26,467 | |

Table 2: Log-likelihood at the MLE for different choices of the initiator matrix size N_1 for the AS-ROUTEVIEWS graph. Notice that the log-likelihood l(Θ̂) generally increases with the model complexity N_1. Also notice the effect of zero-padding: for N_1 = 4 and N_1 = 5, the constraint that the number of nodes be an integer power of N_1 decreases the log-likelihood. However, the column |{deg(u) > 0}| gives the number of non-isolated nodes in the network, which is much less than N_1^k and is in fact very close to the true number of nodes in AS-ROUTEVIEWS. Using the BIC scores we see that N_1 = 3 and N_1 = 6 are the best choices for the size of the initiator matrix.

Figure 23(b) shows BIC
scores for the following experiment: we generated a Kronecker graph with N = 2,187 and E = 8,736 using N_1 = 3 (9 parameters) and k = 7. For each 1 ≤ N_1 ≤ 9 we find the MLE parameters using gradient descent and calculate the BIC score; the model with the lowest score is chosen. As Figure 23(b) shows, we recovered the true model: the BIC score is lowest for the model with the true number of parameters, N_1 = 3.

Intuitively, we expect a more complex model with more parameters to fit the data better, and thus we expect larger N_1 to generally give a better likelihood. On the other hand, the fit also depends on the size of the graph G. Kronecker graphs can only generate graphs on N_1^k nodes, while real graphs do not necessarily have N_1^k nodes (for some, preferably small, integers N_1 and k). To solve this problem we choose k so that N_1^{k−1} < N(G) ≤ N_1^k, and then augment G by adding N_1^k − N isolated nodes. Equivalently, we pad the adjacency matrix of G with zeros until it is of the appropriate size, N_1^k × N_1^k. While this solves the problem of requiring an integer power for the number of nodes, it also makes the fitting problem harder: when N ≪ N_1^k we are basically fitting G plus a large number of isolated nodes.

Table 2 shows the results of fitting Kronecker graphs to AS-ROUTEVIEWS while varying the size of the initiator matrix N_1. First, notice that in general larger N_1 results in a higher log-likelihood l(Θ̂) at the MLE. Similarly, notice (column N_1^k) that while AS-ROUTEVIEWS has 6,474 nodes, the Kronecker estimates have up to 16,384 nodes (16,384 = 4^7, the first integer power of 4 greater than 6,474). However, we also show the number of non-zero degree (non-isolated) nodes in the Kronecker graph (column |{deg(u) > 0}|).
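Choosing k and the amount of zero-padding is mechanical; a sketch (the function name is ours):

```python
def pad_to_power(num_nodes, n1):
    """Smallest k with n1**k >= num_nodes, and the number of isolated
    nodes to add, i.e., zero-pad the adjacency matrix to n1**k."""
    k = 1
    while n1 ** k < num_nodes:
        k += 1
    return k, n1 ** k - num_nodes
```

For example, for N_1 = 2 and the 6,474-node AS-ROUTEVIEWS graph this gives k = 13 and 1,718 isolated padding nodes, matching the numbers reported in this section.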
Notice that the number of non-isolated nodes corresponds well to the number of nodes in the AS-ROUTEVIEWS network. This shows that KRONFIT is actually fitting the graph well: it successfully fits the structure of the graph plus a number of isolated nodes. Last, the column E_1^k gives the number of edges in the corresponding Kronecker graph, which is close to the true number of edges of the AS-ROUTEVIEWS graph.

Last, comparing the log-likelihood at the MLE and the BIC score in Table 2, we notice that the log-likelihood heavily dominates the BIC score. This means that the size of the initiator matrix (number of parameters) is so small that overfitting is not a concern. Thus we can simply choose the initiator matrix that maximizes the likelihood.

Figure 18: 3-by-3 Stochastic Kronecker graphs: given a Stochastic Kronecker graph K generated from N_1 = 3 (red curve), we fit a Kronecker graph K′ with N′_1 = 2 (green) and K″ with N″_1 = 3 (blue). Not surprisingly, K″ fits the properties of K perfectly, as the model is of the same complexity. On the other hand, K′ has only 4 parameters (instead of 9 as in K and K″) and still fits well. (a) Degree distribution. (b) Hop plot. (c) Scree plot. (d) "Network" value.
A simple calculation shows that one would need to take initiator matrices with thousands of entries before the model-complexity part of the BIC score would start to play a significant role.

We further examine the sensitivity of the choice of the initiator size with the following experiment: we generate a Stochastic Kronecker graph K on 9 parameters (N1 = 3), and then fit a Kronecker graph K′ with a smaller number of parameters (4 instead of 9, N′1 = 2), and also a Kronecker graph K′′ of the same complexity as K (N′′1 = 3). Figure 18 plots the properties of all three graphs. Not surprisingly, K′′ (blue) fits the properties of K (red) perfectly, as the initiator is of the same size. On the other hand, K′ (green) is a simpler model with only 4 parameters (instead of 9 as in K and K′′) and still generally fits well: the hop plot and degree distribution match well, while the spectral properties of the graph adjacency matrix, especially the scree plot, are not matched that well. This shows that nothing drastic happens and that even a somewhat too simple model still fits the data well. In general, we observe empirically that increasing the size of the initiator matrix does not yield radically better fits for the degree distribution and hop plot. On the other hand, there is usually an improvement in the scree plot and the plot of network values when one increases the initiator size.

Table 3: Parameter estimates of the three temporal snapshots of the AS-ROUTEVIEWS network. Notice that the estimates stay remarkably stable over time.

  Snapshot at time   N       E        l(Θ̂)       Estimates at MLE, Θ̂
  T1                 2,048   8,794    −40,535    [0.981, 0.633; 0.633, 0.048]
  T2                 4,088   15,711   −82,675    [0.934, 0.623; 0.622, 0.044]
  T3                 6,474   26,467   −152,499   [0.987, 0.571; 0.571, 0.049]

6.4.3 Network Parameters over Time

Next we briefly examine the evolution of the Kronecker initiator for a temporally evolving graph. The idea is that, given parameter estimates of a real graph G_t at time t, we can forecast the future structure of the graph G_{t+x} at time t + x; i.e., using parameters obtained from G_t we can generate a larger synthetic graph K that will be similar to G_{t+x}.

As we have information about the evolution of the AS-ROUTEVIEWS network, we estimated parameters for three snapshots of the network, taken when it had about 2^k nodes. Table 3 gives the results of the fitting for the three temporal snapshots of the AS-ROUTEVIEWS network. Notice that the parameter estimates Θ̂ remain remarkably stable over time. This means that Kronecker graphs can be used to estimate the structure of networks in the future: parameters estimated from historic data can extrapolate the graph structure into the future.

Figure 19 explores this further. It overlays the graph properties of the real AS-ROUTEVIEWS network at time T3 and the synthetic graphs for which we used the parameters obtained on historic snapshots of AS-ROUTEVIEWS at times T1 and T2. The agreement is good, which demonstrates that Kronecker graphs can forecast the structure of the network in the future.

Moreover, this experiment also shows that parameter estimates do not suffer much from the zero-padding of the graph adjacency matrix (i.e., adding isolated nodes to make G have N1^k nodes). Snapshots of AS-ROUTEVIEWS at T1 and T2 have close to 2^k nodes, while we had to add 26% (1,718) isolated nodes to the network at T3 to make the number of nodes a power of 2. Regardless of this, we see that the parameter estimates Θ̂ remain basically constant over time, independent of the number of isolated nodes added.
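The forecasting procedure can be sketched in a few lines of NumPy: take the initiator estimated on an early snapshot (here the T1 estimate from Table 3), raise it to a Kronecker power, and sample one Bernoulli realization. The naive entry-by-entry sampling below is O(N²) and only meant as an illustration for small k:

```python
import numpy as np

rng = np.random.default_rng(0)

# Initiator estimated on snapshot T1 of AS-ROUTEVIEWS (Table 3)
theta = np.array([[0.981, 0.633],
                  [0.633, 0.048]])

# The k-th Kronecker power gives edge probabilities for a 2**k-node
# graph; choosing k larger than the fitted one forecasts a grown network.
k = 8
P = theta.copy()
for _ in range(k - 1):
    P = np.kron(P, theta)

# One realization of the Stochastic Kronecker graph
A = (rng.random(P.shape) < P).astype(int)
print(P.shape, A.sum())  # 256 x 256 nodes; P.sum() is about 770 expected edges
```

For graphs of realistic size one would use the fast edge-by-edge sampling discussed earlier in the paper rather than materializing P.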
This means that the estimated parameters are not biased too much by zero-padding the adjacency matrix of G.

Figure 19: Autonomous systems network over time (AS-ROUTEVIEWS): Overlaid patterns of the real AS-ROUTEVIEWS network at time T3 and Kronecker graphs with parameters estimated from AS-ROUTEVIEWS at times T1 and T2. Panels: (a) degree, (b) hop plot, (c) scree plot, (d) "network" value. Notice the good fits, which means that parameters estimated on historic snapshots can be used to estimate the graph in the future.

6.5 Fitting to other large real-world networks

Last, we present results of fitting Stochastic Kronecker graphs to 20 large real-world networks: large online social networks (EPINIONS, FLICKR and DELICIOUS), web and blog graphs (WEB-NOTREDAME, BLOG-NAT05-6M, BLOG-NAT06ALL), internet and peer-to-peer networks (AS-NEWMAN, GNUTELLA-25, GNUTELLA-30), collaboration networks of co-authorships from DBLP (CA-DBLP) and various areas of physics (CA-HEP-TH, CA-HEP-PH, CA-GR-QC), physics citation networks (CIT-HEP-PH, CIT-HEP-TH), an email network (EMAIL-INSIDE), a protein interaction network (BIO-PROTEINS), and a bipartite affiliation network (authors-to-papers, ATP-GR-QC). Refer to Table 5 in the appendix for the description and basic properties of these networks. They are available for download at http://snap.stanford.edu.

For each dataset we started gradient descent from a random point (random initiator matrix) and ran it for 100 steps. At each step we estimate the likelihood and the gradient based on 510,000 sampled permutations, where we discard the first 10,000 samples to allow the chain to burn in. Table 4 gives the estimated parameters, the corresponding log-likelihoods and the wall-clock times. All experiments were carried out on a standard desktop computer.

Notice that the estimated initiator matrices Θ̂ seem to have an almost universal structure: a large value in the top left entry, a very small value in the bottom right corner, and intermediate values in the other two corners. We further discuss the implications of such structure of the Kronecker initiator matrix for the global network structure in the next section.

Last, Figures 20 and 21 show overlays of various network properties of the real and the estimated synthetic networks. In addition to the network properties we plotted in Figure 18, we also separately plot the in- and out-degree distributions (as both networks are directed), and plot the node triangle participation in panel (c), where we plot the number of triangles a node participates in versus the number of such nodes. (Again, the error bars show the variance of network properties over different realizations R(Θ̂^[k]) of a Stochastic Kronecker graph.)

Table 4: Results of parameter estimation for 20 different networks. Table 5 gives the description and basic properties of the above network datasets. Networks are available for download at http://snap.stanford.edu.

  Network          N        E          Estimated MLE parameters Θ̂      l(Θ̂)         Time
  AS-ROUTEVIEWS    6,474    26,467     [0.987, 0.571; 0.571, 0.049]    −152,499      8m15s
  ATP-GR-QC        19,177   26,169     [0.902, 0.253; 0.221, 0.582]    −242,493      7m40s
  BIO-PROTEINS     4,626    29,602     [0.847, 0.641; 0.641, 0.072]    −185,130      43m41s
  EMAIL-INSIDE     986      32,128     [0.999, 0.772; 0.772, 0.257]    −107,283      1h07m
  CA-GR-QC         5,242    28,980     [0.999, 0.245; 0.245, 0.691]    −160,902      14m02s
  AS-NEWMAN        22,963   96,872     [0.954, 0.594; 0.594, 0.019]    −593,747      28m48s
  BLOG-NAT05-6M    31,600   271,377    [0.999, 0.569; 0.502, 0.221]    −1,994,943    47m20s
  BLOG-NAT06ALL    32,443   318,815    [0.999, 0.578; 0.517, 0.221]    −2,289,009    52m31s
  CA-HEP-PH        12,008   237,010    [0.999, 0.437; 0.437, 0.484]    −1,272,629    1h22m
  CA-HEP-TH        9,877    51,971     [0.999, 0.271; 0.271, 0.587]    −343,614      21m17s
  CIT-HEP-PH       30,567   348,721    [0.994, 0.439; 0.355, 0.526]    −2,607,159    51m26s
  CIT-HEP-TH       27,770   352,807    [0.990, 0.440; 0.347, 0.538]    −2,507,167    15m23s
  EPINIONS         75,879   508,837    [0.999, 0.532; 0.480, 0.129]    −3,817,121    45m39s
  GNUTELLA-25      22,687   54,705     [0.746, 0.496; 0.654, 0.183]    −530,199      16m22s
  GNUTELLA-30      36,682   88,328     [0.753, 0.489; 0.632, 0.178]    −919,235      14m20s
  DELICIOUS        205,282  436,735    [0.999, 0.327; 0.348, 0.391]    −4,579,001    27m51s
  ANSWERS          598,314  1,834,200  [0.994, 0.384; 0.414, 0.249]    −20,508,982   2h35m
  CA-DBLP          425,957  2,696,489  [0.999, 0.307; 0.307, 0.574]    −26,813,878   3h01m
  FLICKR           584,207  3,555,115  [0.999, 0.474; 0.485, 0.144]    −32,043,787   4h26m
  WEB-NOTREDAME    325,729  1,497,134  [0.999, 0.414; 0.453, 0.229]    −14,588,217   2h59m

Notice that for both networks and in all cases the properties of the real network and the synthetic Kronecker graph coincide really well. Using Stochastic Kronecker graphs with just 4 parameters we match the scree plot, degree distributions, triangle participation, hop plot and network values.

Given the previous experiments on the Autonomous Systems graph, we only present the results for the simplest model, with initiator size N1 = 2. Empirically we also observe that N1 = 2 gives surprisingly good fits, and the estimation procedure is the most robust and converges the fastest. Using larger initiator matrices N1 > 2 generally helps improve the likelihood, but not dramatically. In terms of matching the network properties we also get a slight improvement by making the model more complex. Figure 22 gives the percent improvement in log-likelihood as we make the model more complex. We use the log-likelihood of a 2 × 2 model as a baseline and estimate the log-likelihood at the MLE for larger initiator matrices. Again, models with more parameters tend to fit better. However, sometimes, due to the zero-padding of the graph adjacency matrix, they actually have lower log-likelihood (as for example seen in Table 2).
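A quick sanity check on these fits uses a standard property of the model: the expected number of (directed) edges of a Stochastic Kronecker graph, self-loops included, equals the k-th power of the sum of the initiator entries, since the entries of P = Θ^[k] sum to (Σ θ_ij)^k. Applied to the AS-ROUTEVIEWS estimate from Table 4:

```python
# Expected edges of a Stochastic Kronecker graph: (sum of initiator)**k.
theta_sum = 0.987 + 0.571 + 0.571 + 0.049   # AS-ROUTEVIEWS MLE (Table 4)
k = 13                                      # 2**13 = 8192 >= 6,474 nodes
expected_edges = theta_sum ** k
print(round(expected_edges))  # roughly 2.5e4, near the observed 26,467 edges
```

The count is in expectation over realizations and includes self-loops, so it should only be read as an order-of-magnitude check against the observed edge count.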
Figure 20: Blog network (BLOG-NAT06ALL): Overlaid patterns of the real network and the estimated Kronecker graph using 4 parameters (2 × 2 initiator matrix). Panels: (a) in-degree, (b) out-degree, (c) triangle participation, (d) hop plot, (e) scree plot, (f) "network" value. Notice that the Kronecker graph matches all properties of the real network.

6.6 Scalability

Last, we also empirically evaluate the scalability of KRONFIT. The experiment confirms that the KRONFIT runtime scales linearly with the number of edges E in a graph G. More precisely, we performed the following experiment: we generated a sequence of increasingly larger synthetic graphs on N nodes and 8N edges, and measured the time of one iteration of gradient descent, i.e., sampling 1 million permutations and evaluating the gradients. We started with a graph on 1,000 nodes, and finished with a graph on 8 million nodes and 64 million edges. Figure 23(a) shows that KRONFIT scales linearly with the size of the network. We plot wall-clock time versus the size of the graph; the dashed line gives a linear fit to the data points.

7. Discussion

Here we discuss several of the desirable properties of the proposed Kronecker graphs.
Generality: Stochastic Kronecker graphs include several other generators as special cases. For θ_ij = c, we obtain the classical Erdős–Rényi random graph model. For θ_ij ∈ {0, 1}, we obtain a deterministic Kronecker graph. Setting the K1 matrix to a 2 × 2 matrix, we obtain the R-MAT generator (Chakrabarti et al., 2004). In contrast to Kronecker graphs, R-MAT cannot extrapolate into the future, since it needs to know the number of edges to insert; thus it is incapable of obeying the densification power law.

Figure 21: EPINIONS who-trusts-whom social network: Overlaid patterns of the real network and the fitted Kronecker graph using only 4 parameters (2 × 2 initiator matrix). Panels: (a) in-degree, (b) out-degree, (c) triangle participation, (d) hop plot, (e) scree plot, (f) "network" value. Again, the synthetic Kronecker graph matches all the properties of the real network.

Phase transition phenomena: Erdős–Rényi graphs exhibit phase transitions (Erdős and Rényi, 1960). Several researchers argue that real systems are "at the edge of chaos" or phase transition (Bak, 1996; Sole and Goodwin, 2000).
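The first special case listed under Generality is easy to verify numerically: a constant initiator yields a uniform edge-probability matrix, i.e., every edge is present independently with probability c^k, which is exactly the Erdős–Rényi G(n, p) model:

```python
import numpy as np

# With a constant initiator theta_ij = c, the k-th Kronecker power is
# uniform: every entry equals c**k, recovering Erdos-Renyi G(2**k, c**k).
c, k = 0.7, 3
theta = np.full((2, 2), c)
P = theta
for _ in range(k - 1):
    P = np.kron(P, theta)
assert np.allclose(P, c ** k)
print(P.shape)  # (8, 8): an 8-node graph with uniform edge probability
```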
Stochastic Kronecker graphs also exhibit phase transitions (Mahdian and Xu, 2007): one for the emergence of the giant component, and another for connectivity.

Implications for the structure of large real networks: Empirically we found that a 2 × 2 initiator (N1 = 2) fits the properties of real-world networks well. Moreover, given a 2 × 2 initiator matrix, one can view it as a recursive expansion of two groups into sub-groups. We introduced this recursive view of Kronecker graphs back in Section 3. One can then interpret the diagonal values of Θ as the proportion of edges inside each of the groups, and the off-diagonal values as the fraction of edges connecting the groups. Figure 24 illustrates the setting for two groups.

For example, as shown in Figure 24, large a, d and small b, c would imply that the network is composed of hierarchically nested communities, where there are many edges inside each community and few edges crossing between them (Leskovec, 2009). One could think of this structure as a kind of organizational or university hierarchy, where one expects the most friendships between people within the same lab, a bit fewer between people in the same department, fewer still across different departments, and the fewest friendships formed between people from different schools of the university.

However, the parameter estimates for the wide range of networks presented in Table 4 suggest a very different picture of network structure. Notice that for most networks a ≫ b > c ≫ d; moreover, a ≈ 1, b ≈ c ≈ 0.6 and d ≈ 0.2. We empirically observed that the same structure of the initiator matrix Θ̂ also holds when fitting 3 × 3 or 4 × 4 models: the top left element is always the largest, and then the values on the diagonal decay faster than those off the diagonal (Leskovec, 2009).

Figure 22: Percent improvement in log-likelihood over the 2 × 2 model as we increase the model complexity (size of the initiator matrix, 3 × 3 through 6 × 6) for the networks Answers, Blog-nat06all, cit-hep-ph, CA-DBLP, Delicious, Epinions and Flickr. In general, larger initiator matrices that have more degrees of freedom help improve the fit of the model.

This suggests a network structure also known as core-periphery (Borgatti and Everett, 2000; Holme, 2005), the jellyfish (Tauro et al., 2001; Siganos et al., 2006), or the octopus (Chung and Lu, 2006) structure, as illustrated in Figure 24(c). All of these basically say that the network is composed of a densely linked network core and a periphery. In our case this would imply the following structure of the initiator matrix: the core is modeled by parameter a and the periphery by d. Most edges are inside the core (large a), and very few connect nodes of the periphery (small d). Then there are many more edges between the core and the periphery than inside the periphery (b, c > d) (Leskovec, 2009). This is exactly what we see as well.

And in the spirit of Kronecker graphs, the structure repeats recursively: the core again has a dense core and a periphery, and similarly the periphery itself has a core and a periphery. This suggests an "onion"-like nested core-periphery (Leskovec et al., 2008a,b) network structure, as illustrated in Figure 24(c), where the network is composed of denser and denser layers as one moves towards the center of the network. We also observe a similar structure of the Kronecker initiator when fitting a 3 × 3 or 4 × 4 initiator matrix.
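The two-group reading of the initiator can be made quantitative. Since P = Θ ⊗ Θ^[k−1], the expected number of edges in the top-level block (i, j) is θ_ij (Σ θ_uv)^(k−1), so the fraction of edges in each block is simply θ_ij divided by the sum of the initiator entries, independently of k. The sketch below uses the "typical" values quoted above (a ≈ 1, b ≈ c ≈ 0.6, d ≈ 0.2):

```python
# Top-level block structure implied by a 2x2 initiator [a b; c d]:
# expected edges between groups i and j are theta[i][j]*(a+b+c+d)**(k-1),
# so the *fraction* of edges per block is theta[i][j] / (a+b+c+d).
a, b, c, d = 1.0, 0.6, 0.6, 0.2   # typical estimates quoted in the text
total = a + b + c + d
fractions = {(i, j): v / total
             for (i, j), v in {(0, 0): a, (0, 1): b,
                               (1, 0): c, (1, 1): d}.items()}
print(fractions)
# Core-core edges dominate and periphery-periphery edges are rare,
# consistent with the core-periphery reading.
```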
The diagonal elements have large but decreasing values, with the off-diagonal elements following the same decreasing pattern.

Figure 23: (a) Scalability: processor time to sample 1 million gradients as the graph grows. Notice that the algorithm scales linearly with the graph size. (b) Model selection: BIC score as a function of the initiator size N1; the true N1 = 3 attains the lowest score.

Figure 24: A 2 × 2 Kronecker initiator matrix (a) can be thought of as two communities where there are a and d edges inside each of the communities and b and c edges crossing the two communities, as illustrated in (b). Each community can then be recursively divided using the same pattern. (c) The onion-like core-periphery structure, where the network gets denser and denser as we move towards the center of the network.

One implication of this is that networks do not break nicely into hierarchically organized sets of communities that lend themselves to graph partitioning and community detection algorithms. On the contrary, it suggests that large networks can be decomposed into a densely linked core with many small periphery pieces hanging off the core. This is in accordance with our recent results (Leskovec et al., 2008a,b), which make a similar observation (but based on a completely different methodology, namely graph partitioning) about the clustering and community structure of large real-world networks.

8. Conclusion

In conclusion, the main contribution of this work is a family of models of network structure that uses a non-traditional matrix operation, the Kronecker product. The resulting graphs (a) have all the static properties (heavy-tailed degree distribution, small diameter, etc.) and (b) all the temporal properties (densification, shrinking diameter) that are found in real networks; in addition, (c) we can formally prove all of these properties. Several of the proofs are extremely simple, thanks to the rich theory of Kronecker multiplication. We also provide proofs about the diameter and effective diameter, and we show that Stochastic Kronecker graphs can mimic real graphs well.

Moreover, we also presented KRONFIT, a fast, scalable algorithm to estimate the Stochastic Kronecker initiator, which can then be used to create a synthetic graph that mimics the properties of a given real network. In contrast to earlier work, our work has the following novelties: (a) it is among the few that estimates the parameters of the chosen generator in a principled way, (b) it is among the few that has a concrete measure of goodness of fit (namely, the likelihood), (c) it avoids the quadratic complexity of computing the likelihood by exploiting the properties of Kronecker graphs, and (d) it avoids the factorial explosion of the node correspondence problem by using Metropolis sampling. The resulting algorithm matches well all the known properties of real graphs, as we show with the Epinions graph and the AS graph; it scales linearly in the number of edges; and it is orders of magnitude faster than earlier graph-fitting attempts: 20 minutes on a commodity PC, versus 2 days on a cluster of 50 workstations (Bezáková et al., 2006).
The benefits of fitting a Kronecker graph model to a real graph are several:

• Extrapolation: Once we have the Kronecker generator Θ for a given real matrix G (such that G is mimicked by Θ^[k]), a larger version of G can be generated by Θ^[k+1].
• Null model: When analyzing a real network G, one often needs to assess the significance of an observation. Θ^[k], which mimics G, can be used as an accurate model of G.
• Network structure: The estimated parameters give insight into the global network and community structure of the network.
• Forecasting: As we demonstrated, one can obtain Θ from a graph G_t at time t such that G is mimicked by Θ^[k]. Then Θ can be used to model the structure of G_{t+x} in the future.
• Sampling: Similarly, if we want a realistic sample of the real graph, we could use a smaller exponent in the Kronecker exponentiation, like Θ^[k−1].
• Anonymization: Since Θ^[k] mimics G, we can publish Θ^[k] without revealing information about the nodes of the real graph G.

Future work could include extensions of Kronecker graphs to evolving networks. We envision formulating a dynamic Bayesian network with first-order Markov dependencies, where the parameter matrix at time t depends on the graph G_t at the current time t and the parameter matrix at time t − 1. Given a series of network snapshots, one would then aim to estimate initiator matrices at individual time steps and the parameters of the model governing the evolution of the initiator matrix. We expect that, based on the evolution of the initiator matrix, one would gain greater insight into the evolution of large networks.

A second direction for future work is to explore connections between Kronecker graphs and Random Dot Product graphs (Young and Scheinerman, 2007; Nickel, 2008).
This also nicely connects with the "attribute view" of Kronecker graphs described in Section 3.5. It would be interesting to design methods to estimate the individual node attribute values as well as the attribute-attribute similarity matrix (i.e., the initiator matrix). As for some networks node attributes are already given, one could then try to infer "hidden" or missing node attribute values and in this way gain insight into individual nodes as well as individual edge formations. Moreover, this would be interesting because one could further evaluate how realistic the "attribute view" of Kronecker graphs is.

Last, we also mention possible extensions of Kronecker graphs for modeling weighted and labeled networks. Currently, Stochastic Kronecker graphs use a Bernoulli edge generation model, i.e., an entry of the big matrix P encodes the parameter of a Bernoulli coin. In a similar spirit, one could consider the entries of P to encode parameters of different edge-generative processes. For example, to generate networks with weights on edges, an entry of P could encode the parameter of an exponential distribution; or, in the case of labeled networks, one could use several initiator matrices in parallel and in this way encode the parameters of a multinomial distribution over different node attribute values.

References

Network data. http://www-personal.umich.edu/˜mejn/netdata, July 16, 2007.

R. Albert and A.-L. Barabási. Statistical mechanics of complex networks. Reviews of Modern Physics, 74(1):47–97, 2002.

R. Albert, H. Jeong, and A.-L. Barabási. Diameter of the world-wide web. Nature, 401:130–131, September 1999.

L. Backstrom, D. Huttenlocher, J. Kleinberg, and X. Lan. Group formation in large social networks: membership, growth, and evolution. In KDD '06: Proceedings of the 12th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 44–54, 2006.
P. Bak. How Nature Works: The Science of Self-Organized Criticality. Springer, September 1996.

A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.

A.-L. Barabási, E. Ravasz, and T. Vicsek. Deterministic scale-free networks. Physica A, 299:559–564, 2001.

I. Bezáková, A. Kalai, and R. Santhanam. Graph model selection using maximum likelihood. In ICML '06: Proceedings of the 23rd International Conference on Machine Learning, pages 105–112, 2006.

Z. Bi, C. Faloutsos, and F. Korn. The DGX distribution for mining massive, skewed data. In KDD '01: Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 17–26, 2001.

A. Blum, H. Chan, and M. Rwebangira. A random-surfer web-graph model. In ANALCO '06: Proceedings of the 3rd Workshop on Analytic Algorithmics and Combinatorics, 2006.

S. P. Borgatti and M. G. Everett. Models of core/periphery structures. Social Networks, 21(4):375–395, 2000.

A. Bottreau and Y. Metivier. Some remarks on the Kronecker product of graphs. Information Processing Letters, 68(2):55–61, 1998.

A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web: experiments and models. In WWW '00: Proceedings of the 9th International Conference on World Wide Web, 2000.

T. Bu and D. F. Towsley. On distinguishing between internet power law topology generators. In INFOCOM, 2002.

C. T. Butts. Permutation models for relational data. Technical Report MBS 05-02, University of California, Irvine, 2005.

J. M. Carlson and J. Doyle. Highly optimized tolerance: a mechanism for power laws in designed systems. Physical Review E, 60(2):1412–1427, 1999.

D. Chakrabarti, Y. Zhan, and C. Faloutsos. R-MAT: A recursive model for graph mining. In SDM '04: SIAM Conference on Data Mining, 2004.

T. Chow. The Q-spectrum and spanning trees of tensor products of bipartite graphs. Proc. Amer. Math. Soc., 125:3155–3161, 1997.

F. R. K. Chung and L. Lu. Complex Graphs and Networks, volume 107 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 2006.

F. R. K. Chung, L. Lu, and V. Vu. Eigenvalues of random power law graphs. Annals of Combinatorics, 7:21–33, 2003.

A. Clauset, C. R. Shalizi, and M. E. J. Newman. Power-law distributions in empirical data. ArXiv, arXiv:0706.1062, June 2007.

A. Clauset, C. Moore, and M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks. Nature, 453(7191):98–101, 2008.

V. Colizza, A. Flammini, M. A. Serrano, and A. Vespignani. Characterization and modeling of protein-protein interaction networks. Physica A: Statistical Mechanics and its Applications, 352:1–27, 2005.

M. E. Crovella and A. Bestavros. Self-similarity in World Wide Web traffic: evidence and possible causes. IEEE/ACM Transactions on Networking, 5(6):835–846, 1997.

S. Dill, R. Kumar, K. S. McCurley, S. Rajagopalan, D. Sivakumar, and A. Tomkins. Self-similarity in the web. ACM Transactions on Internet Technology, 2(3):205–223, 2002.

S. N. Dorogovtsev, A. V. Goltsev, and J. F. F. Mendes. Pseudofractal scale-free web. Physical Review E, 65(6):066122, June 2002.

B. Efron. Defining the curvature of a statistical problem (with applications to second order efficiency). Ann. Statist., 3(6):1189–1242, 1975.

P. Erdős and A. Rényi. On the evolution of random graphs. Publication of the Mathematical Institute of the Hungarian Academy of Science, 5:17–67, 1960.

A. Fabrikant, E. Koutsoupias, and C. H. Papadimitriou. Heuristically optimized trade-offs: A new paradigm for power laws in the internet. In ICALP '02: Proceedings of the 29th International Colloquium on Automata, Languages, and Programming, volume 2380, 2002.

M. Faloutsos, P. Faloutsos, and C. Faloutsos. On power-law relationships of the internet topology. In SIGCOMM '99: Proceedings of the Conference on Applications, Technologies, Architectures, and Protocols for Computer Communication, pages 251–262, 1999.

I. Farkas, I. Derényi, A.-L. Barabási, and T. Vicsek. Spectra of "real-world" graphs: beyond the semicircle law. Physical Review E, 64(026704), 2001.

A. D. Flaxman, A. M. Frieze, and J. Vera. A geometric preferential attachment model of networks II. In WAW '07: Proceedings of the 5th Workshop On Algorithms And Models For The Web-Graph, pages 41–55, 2007.

D. Gamerman. Markov Chain Monte Carlo: Stochastic Simulation for Bayesian Inference. Chapman & Hall, London, 1997.

J. Gehrke, P. Ginsparg, and J. M. Kleinberg. Overview of the 2003 KDD Cup. SIGKDD Explorations, 5(2):149–151, 2003.

A. Gelman, J. B. Carlin, H. S. Stern, and D. B. Rubin. Bayesian Data Analysis, Second Edition. Chapman & Hall, London, July 2003.

R. H. Hammack. Proof of a conjecture concerning the direct product of bipartite graphs. European Journal of Combinatorics, 30(5):1114–1118, 2009.

P. Holme. Core-periphery organization of complex networks. Physical Review E, 72:046111, 2005.

W. Imrich. Factoring cardinal product graphs in polynomial time. Discrete Mathematics, 192(1-3):119–144, 1998.

W. Imrich and S. Klavžar. Product Graphs: Structure and Recognition. Wiley, 2000.
ISB N 1-4020- 0489- 3. J. M. Kleinber g. The small-w orld phenomenon: an algorithmic perspecti ve. T echnical Report 99-17 76, Cornell Compu ter S ci ence Depa rtment, 1 999. J. M. Kleinber g, S. R. Kumar , P . Ragha van, S. Rajagopala n, and A . T omkins. The web as a gr aph: Measureme nts, models and method s. In COCOON ’99: Pr oceedings of the Intern ational C o n- fer ence on Combinator ics and Computin g , 199 9. 54 K RO N E C K E R G R A P H S : A N A P P ROA CH T O M O D E L I N G N E T W O R K S K. Klemm and V . M. Egu ´ ıluz. Highly clustered scale-free networks . Phys. Rev . E , 65(3):0361 23, Feb 2002. S. Kullback and R. A . Leibler . O n informati on and sufficie ncy . Annals of Mathematical Sta tistics , 22(1): 79–86, 1951. R. Kumar , P . Ragha va n, S. Rajagopal an, D. Siv akumar , A. T omkins, and E. Upfal. S to chastic models for the web graph. In FOCS ’00: Pr oceeding s of the 41st Annua l Symposiu m on F ound ations of Computer Scien ce , page 57, 2000. R. Ku m a r , J. Nov ak, and A. T omkins. Structure and e volu tion of onlin e social netwo rks. In KD D ’06: Pr oceedi ngs of the 12th ACM SIGKDD internatio nal confer ence on Knowledge disco very and data mining , pages 611–6 17, 2006. S. R. Kumar , P . R a ghav an, S. R a jagopalan, and A. T omkin s. Extracting lar ge-scale kno wledge bases from the web . In Pr oceedings of the 25th V LDB C o nfer ence , Edinb ur gh, Scotland , 1999 . A. N. L a ngville and W . J. Ste wart. The Kroneck er produ ct and stoch astic automata netwo rks. J ournal of Compu tation and A p plied Math ematics , 16 7:429–447 , 2004. J. Les ko vec. Networks, communiti es and kroneck er products . In CN I K M ’09: C omplex Network s in Infor m a tion and Knowledge Mana gement , 2009. J. Leskov ec and C. F aloutsos. Scalable mode ling of real gr aphs using kronecke r multiplication . In ICML ’07: Pr oceedings of the 24th Internatio nal Confer enc e on Machin e Lear ning , 200 7. J. Lesko vec, D. Chakrabarti, J. M. 
Klein berg, and C. Fa loutso s. Realistic, mathematically tr actable graph generation and e volut ion, using krone cker multiplicati on. In PKDD ’05: Pr ocee dings of the 9th Eur opean Con fer ence on Prin ciples and Pract ice of Knowledg e Disco very in Databas es , pages 133–14 5, 2005a . J. Lesko vec, J. M . Kleinber g, and C. Faloutsos . Graphs over time: densificatio n laws, shrinking diameter s and possible expl anations. In KDD ’05: Pr oceedi ng of the 11th ACM SIGKDD inter - nation al confer ence on Knowledge disco very in data mining , page s 177– 187, 2005b. J. L es kov ec, J. M. Kleinber g, and C . Faloutsos. Graph ev olutio n: Densification and shrinking diameter s. ACM T ransactio ns on Knowledg e Disco very fr o m Data (TKD D ) , 1(1):2, 2007a. J. Lesk ovec, M. McGlohon, C. Faloutsos , N. G la nce, and M. Hurst. Cascading beha vior in lar ge blog graph s. In SDM ’07: Pr oceed ings of the SIAM Confer ence on Data Mining , 2007b. J. Lesko vec, K. J. Lang, A. Dasgupta, and M. W . Mahoney . Statistical properties of community structu re in lar ge so cial and information netwo rks. In W WW ’08: Pr oceedings of the 17th Inter - nation al Confer ence on W orld W ide W eb , 2008a. J. Leskov ec, K. J. Lang , A. Dasgupta, and M. W . Mah oney . Community structure in lar ge networ ks: Natural clu ster siz es and the absen ce of large well-de fi ne d clust ers. ArXiv , arXi v:0810.135 5 , Oct 2008b . C. F . V an Loan. T h e ubiquitous Kronec ker prod uct. J ournal of Computat ion and Applied Mathe- matics , 123:85 –100, 2000. 55 L E S K OV E C , C H A K R A BA RT I , K L E I N B E R G , F A L O U T S O S , A N D G H A H R A M A N I M. Mahdian and Y . Xu. Stochastic kronecke r graphs. In W A W ’07: Pr oceed ings of the 5th W orkshop On Algorithms And Models F or T h e W eb -Graph , pages 179–1 86, 2007. M. Mihail and C. H. Papa dimitriou. On the eigen value power la w . In RAN D OM , pages 254–2 62, 2002. S. Milgram. The small-world proble m . 
Psycholo gy T oday , 2: 60–67, 1967 . C. L. M. Nickel. Random dot product graphs: A model for social networks. P h . D. The sis, Dep t. o f Applied Mathematics and Statistics , Johns Hopkins Uni versity , 20 08. C. R . Palmer , P . B. Gibbons, and C. Faloutsos. A n f: a fast and scalable tool for data mining in massi ve graph s. In KDD ’02: Pr oceedings of the 8th A C M SIGKDD intern ational confe ren ce on Knowledg e disc ove ry and data mining , pages 81–90, 2002. D. M. Pennock , G. W . Flake, S. Lawren ce, E . J. Glov er , and C. L. Giles. Wi nners don’ t take all: Characte rizing the competitio n for links on the Web . Pr oceedin gs of the National Academy of Scienc es , 99(8) :5207–5211, 2002. V . V . Petro v . Limit Theor ems of Pr obability Theory . Oxford Uni versity Press, 1995. E. Rav asz and A .-L. Barab ´ asi. Hierarchical or ganiza tion in co mplex networks. P h ysical Rev iew E , 67(2): 026112, 2003. E. Rav asz, A. L. Somera, D. A. Mongru, Z. N . Oltv ai, and A.-L. Barab ´ asi. Hierarchical o rgani zation of modula rity in metabolic networ ks. Scienc e , 297(55 86):1551–1 5 5 5 , 2002. S. Redner . How po pular is your paper? an empiric al study of the citation distri bution . Eur opean Physical J ournal B , 4:131–134, 1998. M. R i chardson, R. Agr awal, and P . Doming os. Trust management for the semantic web . In ISWC , 2003. M. Ripeanu, I. Foster , and A. Iamnitchi . Mapping the gnute lla network: Properti es of lar ge-scale peer -to-peer systems and implicati ons for system desi gn. IEEE Internet Computing Jo urnal , 6 (1):50 –57, Jan/Feb 2002 . J. Rissanen . Modelling by the shortest data descript ion. Automatic a , 14:465 –471, 1978 . RouteV ie ws. U n ive rsity of O r egon Route V ie ws Project. Online data and reports . http: //www.rou t e v i e w s . o r g , 1997. M. S a les-Pardo, R. Guimera, A. A. Moreira, and L. A . Amaral. Extracti ng the hierarc hical or- ganiza tion of complex systems. 
Pr oceedings of the National A c ademy of Scienc es , 104(39): 15224 –15229, September 2007. G. Schwarz . Estimating the dimens ion of a model. The Annals of Statistics , 6(2):461– 464, 1978. G. S ig anos, S. L. T auro, and M. Falout sos. Jellyfish: A conceptual model for th e as in ternet topol- ogy . Jou rnal of Communicat ions an d Networks , 8:3 39–350, 2006 . 56 K RO N E C K E R G R A P H S : A N A P P ROA CH T O M O D E L I N G N E T W O R K S R. Sole and B. Goo dwin. Signs of Life : How Comple xity P ervades Biology . Perseus Books Gro up, New Y ork, NY , 20 00. S. L. T auro, C. Palmer , G. Siganos , and M. F aloutsos. A simple con ceptual model for the interne t topolo gy . In GLOBEC O M ’01: Global T eleco mm un ications C o nfer ence , v olume 3, pages 1667 – 1671, 2001 . C. E . Tsour akakis. Fast counting of triangles in lar ge real networks , without counting: Algorithms and laws. In ICDM ’08 : IEEE Interna tional Confer ence on D a ta Mining , 2008. A. V ´ azqu ez. Gro wing netw ork w i th local rules: Preferen tial attachmen t, cluste ring hierarchy , and deg ree corr elations. Phys. Rev . E , 67(5):0561 04, May 2 003. S. W as serman an d P . Pattison . Logit models and logistic r egression s for soc ial netw orks. P s ychome- trika , 60:40 1–425, 1996. D. J. W atts an d S . H. Strogatz. Collecti ve dynamics of ’ small-world’ network s. Natur e , 393:440– 442, 1998. B. M. W axman . Routing of multipo int connectio ns. IEEE Jou rnal on Selected Ar eas in Communi- cation s , 6(9):1 617–1 622, December 1988. P . M. W eichsel. The kronecke r produc t of gr aphs. In American Math ematical Society , v olume 13, pages 37–52, 1962 . J. W inick and S . J amin. Inet-3 .0: Internet Top ology G en erator . T echni cal Report CSE-TR-456-02, Uni versity of Michigan , Ann Arbor , 2002. C. Wi uf, M. Brameie r , O. Hagbe rg, and M. P . Stumpf. A likelih ood app roach to analysis of network data. 
Pr oceedings of the National Academy of Sciences , 103(20):7 566–7570, 2006. Stephen J. Y oung and Edward R . S c heinerman. Random dot prod uct graph models for s ocial net- works . In W A W ’07: Pr oceedin gs of the 5th W orkshop On Algorithms And Models F or The W eb-Gra ph , pa ges 138 –149, 20 07. E. Z he lev a, H. S h arara, and L. Getoor . Co-e volu tion of socia l and af filiation networks . In KDD , pages 1007–1 016, 2009. Ap pendix A. T able of netw orks T able 5 lists all the network datasets that were used in this paper . W e also compute d some of the structu ral netw ork properti es. M o st of the netwo rks a re av ailabl e for do w n load from h ttp://sna p.stanford.edu . 57 L E S K OV E C , C H A K R A BA RT I , K L E I N B E R G , F A L O U T S O S , A N D G H A H R A M A N I Network N E N c N c / N ¯ C D ¯ D Description Social netwo r k s A N S W E R S 598,31 4 1,834,200 488,484 0.82 0.11 22 5.72 Y aho o! Answers social netw ork (Lesk ovec et al., 2008 a ) D E L I C I O U S 205,282 4 36,735 147,567 0.72 0.3 24 6.28 del.icio.u s social network (Leskov ec et al., 2 008a ) E M A I L - I N S I D E 986 32,128 986 1.00 0.45 7 2.6 European research organization email netw ork (Lesk ovec et al., 2007a) E P I N I O N S 75,879 50 8,837 75,87 7 1.00 0.26 15 4.27 Who-trusts-whom graph of epinions.com (Richardson et al., 2003) F L I C K R 584,207 3,555,115 404,733 0.69 0.4 18 5.42 F lickr photo sharing social network (Kumar et al., 20 06 ) Information (citation) networks B L O G - NAT 0 5 - 6 M 31,600 271,37 7 29,150 0.92 0.24 10 3.4 Blog-to-blog citation network (6 months of data) (Lesko vec et al., 2007b) B L O G - NAT 0 6 A L L 32,443 31 8,815 32,38 4 1.00 0.2 18 3.94 Bl o g-to-blog citation ne t w ork (1 year of data) ( Lesk ovec et al., 20 07b ) C I T - H E P - P H 30,567 34 8,721 34,40 1 1.13 0.3 14 4.33 Ci tation ne t w ork of ArXiv hep-th papers (Gehrk e et al., 2003) C I T - H E P - T H 27,770 35 2,807 27,40 0 0.99 0.33 15 4.2 Citations network of 
ArXi v hep-ph papers (Gehrke et al., 2003) Collaboration netwo rks C A - D B L P 425,957 2,696,489 317,080 0.74 0.73 23 6.75 DBLP co-authorship network (Backstrom et al., 2006) C A - G R - Q C 5,242 28,980 4,158 0.79 0.66 17 6.1 Co-authorship network in gr-qc category of ArXi v (Leskov ec et al. , 200 5b ) C A - H E P - P H 12,008 23 7,010 11,20 4 0.93 0.69 13 4.71 Co-authorship netwo r k in hep-ph category of ArXiv (Lesk ovec et al., 2005b) C A - H E P - T H 9,877 51,971 8,638 0.87 0.58 18 5.96 Co-authorship network in hep-th category of ArXiv (Lesk ovec et al., 2005b) W eb grap hs W E B - N O T R E DA M E 325,729 1,497,134 325,729 1.00 0.47 46 7.22 W eb g r a ph of Uni versity of Notre Dame (Albert et al., 1999) Internet networks A S - N E W M A N 22,963 96,872 22,963 1.00 0.35 11 3.83 AS graph from Newma n (ne w, July 16, 2007) A S - R O U T E V I E W S 6,474 26,467 6,474 1.00 0.4 9 3 . 7 2 AS from Oregon Route V iew (Lesko vec et al., 2005b) G N U T E L L A - 2 5 22,687 54,705 22,663 1.00 0.01 11 5.57 Gnutella P2P network on 3/25 2000 (Ripeanu et al., 2002) G N U T E L L A - 3 0 36,682 88,328 36,646 1.00 0.01 11 5.75 Gnutella P2P network on 3/30 2000 (Ripeanu et al., 2002) Bi-partite networks A T P - G R - Q C 19,177 26,169 14 ,832 0.77 0 35 11.08 Affiliation network of gr-qc category in ArXiv (Lesk ovec et al., 2007b) Biological networks B I O - P R OT E I N S 4,626 29,602 4,626 1.00 0.12 12 4.24 Y east p r o tei n interaction netw ork (Colizza et a l ., 2 005 ) T able 5: Network datas ets w e analyzed. Statistics of networ ks we consider: number of nodes N ; number of edges E , number of nodes in lar gest connected component N c , fraction of nodes in lar gest connected compone nt N c / N , av erage cluster ing coefficie nt ¯ C ; diameter D , an d a verag e path leng th ¯ D . Networks are a vail able for do wnload at ht tp://snap .stanford.edu . 58
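The statistics reported in Table 5 can be computed directly from an edge list. The sketch below (a minimal pure-Python illustration, using a toy graph rather than one of the paper's datasets) computes N, E, N_c, N_c/N, the average clustering coefficient, and the diameter and average path length of the largest connected component:

```python
from collections import deque

def graph_stats(edges):
    """Return (N, E, N_c, N_c/N, avg clustering, diameter, avg path length)
    for an undirected graph given as a list of (u, v) edges."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    N = len(adj)
    E = len({frozenset(e) for e in edges})  # deduplicate undirected edges

    # Connected components via BFS; keep the largest one.
    seen, components = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, q = {s}, deque([s])
        seen.add(s)
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in seen:
                    seen.add(w); comp.add(w); q.append(w)
        components.append(comp)
    cc = max(components, key=len)

    # Local clustering: fraction of a node's neighbor pairs that are linked.
    def clustering(u):
        nbrs, k = adj[u], len(adj[u])
        if k < 2:
            return 0.0
        links = sum(1 for a in nbrs for b in adj[a] if b in nbrs) / 2
        return 2 * links / (k * (k - 1))
    C = sum(clustering(u) for u in adj) / N

    # BFS from every node of the largest component gives its diameter
    # and average shortest-path length.
    total, pairs, diam = 0, 0, 0
    for s in cc:
        dist = {s: 0}; q = deque([s])
        while q:
            u = q.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1; q.append(w)
        diam = max(diam, max(dist.values()))
        total += sum(dist.values()); pairs += len(dist) - 1
    return N, E, len(cc), len(cc) / N, C, diam, total / pairs

# Toy graph: a triangle with a pendant node, plus a separate edge.
print(graph_stats([(0, 1), (1, 2), (0, 2), (2, 3), (4, 5)]))
```

Exact all-pairs BFS is quadratic in the component size, so for networks of the scale in Table 5 the path-length statistics are in practice estimated by sampling BFS sources or by approximate schemes such as ANF (Palmer et al., 2002).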