PAFit: an R Package for the Non-Parametric Estimation of Preferential Attachment and Node Fitness in Temporal Complex Networks

Many real-world systems are profitably described as complex networks that grow over time. Preferential attachment and node fitness are two simple growth mechanisms that not only explain certain structural properties commonly observed in real-world sy…

Authors: Thong Pham, Paul Sheridan, Hidetoshi Shimodaira

PAFit: an R Package for the Non-Parametric Estimation of Preferential   Attachment and Node Fitness in Temporal Complex Networks
P AFit : an R P ac k age for the Non-P arametric Estimation of Preferen tial A ttac hmen t and No de Fitness in T emp oral Complex Net w orks Thong Pham RIKEN Cen ter for AIP P aul Sheridan Hirosaki Univ ersity Hidetoshi Shimo daira Ky oto Universit y RIKEN Cen ter for AIP Abstract Man y real-w orld systems are profitably describ ed as complex net w orks that gro w o v er time. Preferen tial attac hmen t and no de fitness are tw o simple growth mechanisms that not only explain certain structural properties commonly observed in real-worl d systems, but are also tied to a num b er of applications in mo deling and inference. While there are statistical pac k ages for estimating v arious parame tric forms of the preferen tial attachmen t function, there is no such pack age implementing non-parametric estimation procedures. The non-parametric approach to the estimation of the preferential attachmen t function allo ws for comparativ ely finer-grained in v estigations of the ‘rich-get -ric her’ phenomenon that could lead to nov el insights in the search to explain certain nonstandard structural properties observed in real-world net works. This pap er introduces the R pack age P AFit , whic h implements non-parametric pro cedures for estimating the preferential attachmen t function and no de fitnesses in a gro wing netw ork, as well as a num b er of functions for generating complex net works from these t wo mec hanisms. The main computational part of the pack age is implemented in C++ with OpenMP to ensure scalabilit y to large-scale net w orks. In this pap er, we first in troduce the main funct ionalities of P AFit through sim ulated examples, and then use the pac k age to analyze a collaboration net work betw een scien tists in the field of complex netw orks. The results indicate the joint presence of ‘rich- get-ric her’ and ‘fit-get-ric her’ phenomena in the collab oration netw ork. The estimated attac hmen t function is observ ed to b e near-linear, which w e in terpret as meaning that the c hance an author gets a new collab orator is proportional to their current n umber of collaborators. F urthermore, the estimated author fitnesses reveal a host of familiar faces from the complex netw orks comm unity among the field’s topmost fittest netw ork scien tists. Keywor ds : temporal net works , dynamic netw orks, preferen tial attachmen t, fitness, ric h-get- ric her, fit-get-richer, R , C++ , Rcpp , Op enMP . 1. In tro duction Since the end of the last century , complex netw orks hav e b een increasingly used in mo deling man y temporal relati ons found in div erse field s ( Dorogovtsev and Mendes 2003 ; Caldarelli 2007 ; Newman 2010 ). Some notable examples include collaboration net works b et ween au- thors in a scien tific field ( Newman 2001 ), connection net works b etw een computers on the In ternet ( Barab´ asi et al. 2000 ), and sexual relation netw orks b et ween members of a com- 2 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness m unity ( Liljeros et al. 2001 ). The primary motiv ation for using complex net works as a simplified represen tation of real-w orld systems is that they shed light on the b eha viors of complex systems through the study of underlying patterns of connections. Although this is an ov er-simplification for systems dep ending heavily on domain-specific details, this approac h nev ertheless offers a first view of a system’s top ological prop erties, and can b e used to guide subsequen t in-depth analyses. Among the most important real-w orld net work structural prop erties is degree distribution. Degree distribution lets us understand the prop ortion of highly and lowly connected nodes in a netw ork. Since the highly connected no des are key components of a net work, this under- standing in turn sheds ligh t on the answ ers of imp ortant practical questions, including ho w to prev ent the spreading of rumors ( Neko vee et al. 2007 ), how to stop a virus outbreak ( P astor- Satorras and V espignani 2001 ), and ho w to guard against cyb ernetic attac ks ( Alb ert et al. 2000 ). The degree distributions of man y real-world net w orks ha ve been found to b e hea vy-tailed ( Al- b ert and Barab´ asi 1999 ). The b est-kno wn heavy- tailed distribution in net w ork science is the p o wer-la w, whic h is a distribution where the n umber of nodes in a net w ork with degree k is prop ortional to k − γ with γ > 1. Besides the p o wer-la w, there is emerging evidence that real-world net work degree distributions ha ve other heavy-tailed forms, including the log- normal ( Redner 2005 ), exponential ( Dunne et al. 2002 ), stretched exp onen tial ( Newman et al. 2002 ), and p o wer-la w with exp onen tial cut-off ( Clauset et al. 2009 ). All of these hea vy-tailed distributions differ from the ligh t-tailed binomial degree distribution, whic h is c haracteristic of netw orks pro duced by the classical Erd ¨ os-R ´ en yi (ER) random graph mo del ( Erd ¨ os and R ´ en yi 1959 ). This observ ation prompted net work scien tists to searc h for new mo deling ingredients capable of explaining hea vy-tailed degree distributions. What they found is that tem p oral complex net w ork mo dels incorp orating gro wth mec hanisms offer a p o werful mo deling framew ork for achieving this end. T emp oral complex netw ork mo dels, or temporal net work models for short, are probabilistic generativ e mo dels of real-w orld netw orks that c hange with time. In its most common form, a temp oral net work model assumes that a netw ork grows gradu ally from some initial state b y the addition of new no des and edges o v er a large num b er of discrete time-steps. Some well-kno wn basic models in the field of complex net works are the Barab´ asi-Alb ert (BA) mo del ( Alb ert and Barab´ asi 1999 ) and the Bianconi-Barab´ asi (BB) mo del ( Bianconni and Barab´ asi 2001 ). More complex growth models that are used in the field include exponential random graph mo dels ( Ripley et al. 2018 ; Krivitsky and Handcock 2018b ) and d ynamic stochasti c block mo dels ( Matias and Miele 2016 ). Gro wth mec hanisms, whic h gov ern ho w a no de acquir es new edges in the gro wth pro cess, are the most important elemen ts that distinguish differen t temp oral netw ork mo dels. This pap er focuses on estimating tw o in terpretable gro wth mec hanisms: preferen tial attac h- men t (P A) and no de fitness. In the P A mechanism, the probabilit y P i ( t ) that a no de v i acquires a new edge at time t is prop ortional to a p ositiv e function, A k i ( t ) , of its curren t degree k i ( t ). The function A k is called the attachmen t function. The name ‘preferen tial attac hment’ stems from the motiv ation for the mec hanism: if A k is an increasing functi on, a highly connected no de will acquire more edges than a lowly-connected no de, which is an app ealing prop erty in man y real-w orld situations. F rom now, w e will sa y that P A exists if A k is an increasing function. The opposite of P A, whic h w e call anti-P A, occurs when A k is Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 3 a decreasing function. Note, how ev er, that the meaning we use here differs from the original meaning of the term ‘preferen tial attac hmen t’ used in the BA mo del, whic h means only the linear case of A k = k . This linear form in fact has b een long known in other fields with v arious names suc h as ‘rich-get -ric her’ ( Simon 1955 ) and ‘cumulativ e adv antage’ ( Price 1976 ). When A k assumes the log-linear form of k α , with α called the attac hment exp onen t, w e ha ve the generalized BA model ( Krapivsky et al. 2001 ). In contrast with the P A mechanism, in the fitness mec hanism the probab ilit y P i ( t ) that a no de v i acquires a new edge dep ends only on the p ositiv e num b er η i . The quantit y η i is called the node fitness, or just fitness, of v i and can b e in terpreted as its in trinsic attractiv eness. The fitness mec hanism offers a simple wa y to express the v ariance in edge-acquisition potential b et ween nodes of the same degree. F or example, t wo early-career scien tists with roughly the same num b er of collaborators at some p oin t in time ma y acquire different n um b ers of collab orators in the future based on their intrinsic fitnesses. The P A and no de fitness mec hanisms combine to produce a wide range of degree distributions. In their com bined form, the probability P i ( t ) is prop ortional to the pro duct of A k i ( t ) and η i : P i ( t ) ∝ A k i ( t ) × η i . (1) Based on the functional form of A k and the distribution of η i ’s, the model defined b y Equa- tion 1 can pro duce netw orks with v arious degree distributions ( Bianconni and Barab´ asi 2001 ; Caldarelli et al. 2002 ; Borgs et al. 2007 ; Kong et al. 2008 ). In Section 2 , we will discuss the relation of Equation 1 with existing statistical mo dels. Equation 1 has a num b er of applications. Based on the functional forms of A k and η i , w e can test for the presence of one and/or the other of the ‘ric h-get-richer’ and ‘fit-get-richer ’ phenom- ena in a temp oral net work ( Pham et al. 2016 ). These t wo mec hanisms ha ve b een adv anced to explain another phenomenon called the ‘generalized friendship paradox’ ( F eld 1991 ; Eom and Jo 2014 ; Momeni and Rabbat 2015 ). They are also used in inference problems in biological net works ( Sheridan et al. 2010 ; Guetz and Holmes 2011 ), the W orld Wide W eb ( Kong et al. 2008 ), In ternet top ology graphs ( Bez´ ak o v´ a et al. 2006 ), and citation netw orks ( W ang et al. 2013 ; Sinatra et al. 2016 ; Ronda-Pup o and Pham 2018 ). Finally , w e can classify real-w orld temp oral net work data based on the estimated attac hmen t exponent of A k ( Kunegis et al. 2013 ). While there are existing R pac k ages that estimate P A in a gro wing netw ork, including the pac k ages tergm ( Krivitsky and Handco ck 2018b ) and RSiena ( Ripley et al. 2018 ). These pac k ages, ho wev er, rely on parametric metho ds to estimated the A k function. This means that one has to assume a functional form for A k , rather than learning it from observed data without constraint. Non-parametric estimation of A k allo ws for a finer inspection of the ‘ric h-get-richer’ phenomenon ( Pham et al. 2015 , 2016 ), and such metho ds hav e b een used to pro vide clues to explain irregularities observ ed in real-w orld degree distributions ( Sheridan and Ono dera 2018 ). This pap er in tro duces the R pack age P AFit ( Pham et al. 2018 ), whic h fills the gap with an implemen tation of the standard P A and node fitness non-parametric estimation procedures. In particular, w e implement Jeong’s met ho d ( Jeong et al. 2003 ), Newman’s metho d ( Newman 2001 ) and the P AFit method ( Pham et al. 2015 , 2016 ) in the pack age. The first t w o are heuris- tic metho ds that are widely used to estimate non-parametrically the attachmen t function A k in isolation, while the last one is a statistical metho d that can non-parametrically estimate 4 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness either A k (or η i ) in isolation or simultaneously estimate the t wo mechani sms. Although using P AFit is advisable in almost every circumstance, Jeong’s metho d and Newman’s metho d are still widely used and migh t still b e appropriate in certai n situations. Therefore, the inclu- sion of the tw o heuristic metho ds in th e pack age is warran ted. W e discuss their strengths and shortcomings in Section 2 when we provide an o verview of the metho dology and related statistical mo dels. The pack age also implements a v ariet y of functions to sim ulate temp oral netw orks from the P A and node fitness mec hanisms, as w ell as fu nctions to pl ot the estim ated results and underlying uncertain ties. W e review P AFit ’s main functions in Section 3 . Before demonstrating their usages on three simulated examples in Section 5 , we discuss ho w P AFit relates with existing net work analysis pac k ages in Section 4 . W e pro vide a systematic simulat ion to asses the results of the non-parametric join t estimation in Section 6 , before sho wing a complete end- to-end w ork-flo w analyzing a collab oration netw ork of scien tists from the field of complex net works in Section 7 . Finally , concluding remarks are given in Section 8 . 2. Mathematical bac kground Here we review the standard methods for estimating the attachmen t function A k and no de fitnesses η i in a temp oral net work. In Section 2.1 , w e state the net work gro wth mo del used in the pack age as w ell as discuss its relation with exiting statistical mo dels. W e review the estimation of A k in isolation in Section 2.2 , then the estimation of the η i ’s in isolation in Section 2.3 , and finally the joint estimation of A k and the η i ’s in Section 2.4 . 2.1. Net w ork mo del First w e describe the General T emp oral (GT) mo del ( Pham et al. 2016 ) used in P AFit . The mo del is a generalization of man y w ell-kno wn temp oral netw ork models in the complex net work field. Starting from some giv en initial graph G 0 , the GT mo del generates a temporal netw ork sequen tially as follo ws: at time-step t ≥ 1, the net work G t is obtained by adding new edges and new nodes to G t − 1 . The n umber of new edges and new nodes added at time t is denot ed as m ( t ) and n ( t ), resp ectiv ely . The mo del assumes that the parameters go verning the distributions of G 0 , m ( t ) and n ( t ) do not inv olv e A k and the η i ’s. Note that the term k i ( t ) is defined as the degree of node v i at the onset of time-step t . On top of these structur al preconditions, the GT mo del assumes that the probabilit y that a no de v i with degree k i ( t ) receiv es a new edge at time t is giv en b y the form ula of Equation 1 . The GT mo del includes a handful of well-kno wn gro wing net work mo dels, based on P A and no de fitness as sp ecial cases; see T able 1 for a summary . Unlik e the BA or BB models, the GT mo del allows for the emergence of new edges b etw een old nodes and can handle b oth undirect and directed netw orks. W e refer readers to Supplement ary Information Section S2.2 in Pham et al. ( 2016 ) for the definition of the mo del in the case of undirected netw orks. The GT model is related to mo dels used in the R pac k ages RSiena and tergm . But w e defer a discussion of these pack ages to Section 4 . Lo oking beyond the field of complex netw orks, the GT mo del bears some similarities to the contagious P oisson process ( Coleman 1964 ; Allison 1980 ) and the conditional frailt y mo del ( Kelly and Lim 2000 ; Box-Steffensmeier and De Bo ef 2006 ). In the con tagious P ois- Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 5 T emp oral netw ork mo del A ttachmen t function No de fitness Gro wing ER mo del ( Callaw ay et al. 2001 ) A k = 1 η i = 1 BA mo del A k = k η i = 1 Caldarelli mo del A k = 1 F ree BB mo del A k = k F ree T able 1: Some we ll-kno wn special cases of the GT mo del as defined by Equation 1 . son pro cess, the initial prop ensit y of each node plays a similar role to that of node fitness and the rate of enforcement represen ts the P A mec hanism. In the conditional frailt y mo del, while the frailt y of eac h no de describ es the heterogeneity among no des and thu s is similar to no de fitness, the ev ent-based baseline hazard rate has the same effect as the non-parametric function A k . 2.2. A ttac hment function estimation The methods for estimating the attac hment function A k in isolation assume a simplified v ersion of Equation 1 , in whic h the η i are assumed to b e 1. Thus the probability P i ( t ) in Equation 1 dep ends only on A k . Perhaps the most frequen tly-encountered parametric v ersion of this mo del is the log-linear form A k = k α with attac hmen t exp onen t α . Net work scientists are particularly interested in estimating α , since the asymptotic degree distribution of the net work corresponds to simple regions of α . If α is less than unity (the sub-linear case), then the degree distribution is a stretched exp onen tial, while in the sup er-linear case of α > 1, one no de will ev entually get all the incoming new edges ( Krapivsky et al. 2001 ). It is only the linear case of α = 1 that gives rise to a p o wer-la w distribution. Concerning the ab o ve mo del, there are three main metho ds for estimating A k : Jeong’s metho d ( Jeong et al. 2003 ), Newman’s metho d ( Newman 2001 ), and P AFit ( Pham et al. 2015 ). Jeong’s metho d basically makes a histogram of the num b er of new edges n k connected to a no de with degree k , then divides n k b y the num b er of no des with degree k in the system to get A k ( Jeong et al. 2003 ). Jeong’s metho d has the merit of b eing simple, but estimates obtained using the metho d are sub ject to high v ariance and low accuracy ( Pham et al. 2015 ). By contrast, Newman’s method combines a series of histograms for low er v ariance and higher accuracy ( Newman 2001 ). Note that in P AFit w e implemen ted a corrected version of New- man’s original metho d ( Pham et al. 2015 ). The main drawbac k of Newman’s metho d is that the mathematical assumption b ehind its deriv ation holds only when α = 1, th us the metho d amoun ts to an appro ximation when α 6 = 1 ( Pham et al. 2015 ). The final method is P AFit ( Pham et al. 2015 ). It iteratively maximizes an ob jectiv e function that is a combination of the log-likelihoo d of the mo del with a regularization term for A k b y a Minorization-Maximization (MM) algorithm ( Hunter and Lange 2000 ). There is a h yp er- parameter, called r , in the method that con trols the strength of the regularization. P AFit c ho oses r automatically by cross-v alidation ( Pham et al. 2016 ). W e defer the details to Section 2.4 . The metho d is not only able to recov er A k accurately , but also can estimate the standard deviation of the estimated A k for each k ( Pham et al. 2015 ). Its main dra wbac k is that it migh t b e slow, since it is an iterative algorithm. 2.3. No de fitness estimation 6 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness When we consider only no de fitnesses, there are t wo generative mo dels in the literature with differen t assumptions regarding the functional form of A k in Equation 1 . While the Caldarelli mo del ( Caldarelli et al. 2002 ) assumes that A k is 1 for all k , the BB mo del ( Bianconni and Barab´ asi 2001 ) assumes that A k = k . Both models ha v e b een sho wn to generate net works with v arious heavy-tailed distributions ( Borgs et al. 2007 ; Kong et al. 2008 ). No de fitnesses in both models can b e estimated by v arian ts of the P AFit method prop osed in Pham et al. ( 2016 ), by either setting A k = k for the BB mo del or A k = 1 for the Caldarelli mo del. These estimation metho ds use MM algorithms that maximize the corresp onding log- lik eliho o d functions with a regularization term that regularizes the distribution of the η i ’s. More sp ecifically , the inv erse v ariance of this distribution is con trolled by a h yp er-parameter, called s , whic h is chosen automatically b y cross-v alidation. W e defer a more detail discussion to the next section. W e note that node fitnesses in the BB mo del can also be estimated b y the method in Kong et al. ( 2008 ). But since P AFit has b een sho wn to outperform this metho d ( Pham et al. 2016 ), we did not include it in the pac k age. 2.4. Join t estimation of the attac hmen t function and no de fitnesses Finally , by using the full model in Equation 1 the metho d P AFit in Pham et al. ( 2016 ) can join tly estimate A k and η i . W e note this full mo del includes all the temp oral netw ork mo dels sho wn in T able 1 . F or a more complete table, see T able 1 in Pham et al. ( 2016 ). The ob jectiv e function of P AFit is a com bination of the log-lik eliho o d of the full model defined b y Equation 1 and t wo regularization terms: one for A k and one for η i . While w e refer readers to Supplemen tary Information Section S2.3 in Pham et al. ( 2016 ) for a complete presen tation, w e will sketc h here the log-lik eliho o d function for the case of directed netw orks. Assume the set of observ ed snapshots is { G t } T t =0 . Let A = [ A 0 A 1 · · · A K − 1 ] > b e the vector of the P A function and η = [ η 1 η 2 · · · η N ] be the v ector of no de fitnes ses. Here K is the maxim um degree app earing in the gro wth pro cess and N is the total n umber of no des at the end of the process. Let z i ( t ) be the num b er of new edges connected to no de v i at time-step t . Equation 1 implies that { z i ( t ) } N i =1 follo ws a m ultinomial distribution with parameters { π i ( t ) } N i =1 , where π i ( t ) = A k i ( t ) η i P N j =1 A k j ( t ) η j . (2) Here w e use the conv en tion k j ( t ) = − 1 for a node that did not exist at time-step t and A − 1 = 0. Using Equation 2 , one can write the likelihoo d of eac h snapshot G 1 , · · · , G T . The log-lik eliho o d function of the temp oral netw ork { G t } T t =0 is then the sum of the log-likelihoo d of eac h snapshot and is equiv alen t to: l ( A , η ) = T X t =1 N X i =1 z i ( t ) log A k i ( t ) + T X t =1 N X i =1 z i ( t ) log η i − T X t =1 N X i =1 z i ( t ) log N X j =1 A k j ( t ) η j + C, (3) with C being the logarithm of the pro duct of the probability mass functions of G 0 , m ( t ), and n ( t ). Since the GT mo del, as stated in Section 2.1 , assumes that the parameters go verning the distributions of G 0 , m ( t ) and n ( t ) do not inv olve A k and η i , we can treat C as a constant. The regularization term for A k is defined b y r eg A = − r K − 2 X k =1 w k (log A k +1 + log A k − 1 − 2 log A k ) 2 , (4) Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 7 with r ≥ 0, w k = P T t =1 m k ( t ) and m k ( t ) the num b er of edges that connect to a degree k no de at time-step t . This term controls the shap e of A k . When r is large, A k b ecomes more linear on a log-scale. In limiting as r approaches infinit y , we effectiv ely assume that A k = k α ( Pham et al. 2016 ). Th us this cov ers the case of α = 1 in the BA mo del and the BB mo del, and the case of α = 0 in the growing ER and the Caldarelli mo del. The regularization term for the no de fitnesses is defined by r eg F = N X i =1 (( s − 1) log η i − sη i ) , (5) with s > 0. This term is the sum of the logarithms of Gamma distribution densities with mean 1 and v ariance 1 /s . The regularization is equiv alent to placing such Gamma distribu- tions as priors independently for each node fitness η i . The larger the v alue of s , the more tigh tly concen trated the v alues of η i b ecome. If s is infinitely large, then all η i will take the same v alue. This is equiv alent to estimating the attachmen t function in isolation. T o conclude: join t estimation with the ab ov e regularization terms also compasses the t w o cases of estimating eit her A k or η i in isolation. In particular, we maximi ze the follo wing ob jective function: J ( A , η ) = l ( A , η ) + r eg A + reg F , with an MM algorithm. A t each iteration, the algorithm replaces the ob jectiv e function with an easier-to-maximize surrogate function and this surrogate function is maximized instead. The surrogate function is c hosen in suc h a wa y that the ob jective function v alue is guaran teed to b e nondecreasing o ver iterations. W e refer the readers to Hunter and Lange ( 2004 ) for the definition of a surrogate function and the tec hniques used to deriv e them. F or a surrogate function, the v ariables are often separable, i.e., the partial deriv ative of one v ariable do es not in volv e the others, and th us the maximizat ion at each iteration, i.e., finding the v ariables b y setting all the partial deriv ativ es to zero, can b e parallelized. While we refer readers to Supplemen tary Information Section S2.4 of Pham et al. ( 2016 ) for a detailed discussion, the essence of the MM algorithms in P AFit is to linearize the term log P N j =1 A k j ( t ) η j in E quation 3 and to apply Jensen’s inequality to make the v ariables in Equation 4 separable. As men tioned in the tw o previous sections, the v alues of r and s are automatically selected b y cross-v alidation. In partic ular, the dataset is divid ed in to a learning part { G t } T ∗ 0 and a testing part { G t } T T ∗ , where T ∗ is the smallest p ositiv e n um b er suc h th at the ratio of the n umber of new edges in the learning part, i.e., P T ∗ t =1 P N i =1 z i ( t ), to that of the whole dataset, i.e., P T t =1 P N i =1 z i ( t ), is at least p = 0 . 75 (the default v alue). F or eac h com bination of r and s , w e use the learning data to get the estimated v alue of A and η and plug these estimated v alues in to Equation 3 to calculate the log-lik eliho o d of the testing data. The com bination of r and s that maximize this log-likelihoo d is then c hosen. The metho d then re-estimates A and η using the whole dataset with the c hosen com bination of r and s . 3. P ac k age o v erview The P AFit pac k age provides functions to sim ulate v arious temporal netw ork models, gather essen tial netw ork statistics from ra w input data, and use these summarized statistics in the estimation of A k and the η i v alues. The hea vy computational parts of the pac k age are im- plemen ted in C++ through the use of the Rcpp pack age ( Eddelbuettel and F ran¸ cois 2011 ; 8 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Eddelbuettel 2013 ; Ed delbuettel and Balam uta 2017 ). F urthermore, multi-core mac hine users can enjoy a hassle-free speed up through Op enMP parallelization mechanisms implemen ted in the co de. Apart from the main functions, the pac k age also includes a real-world collaboration net work dataset b et ween scientists in the field of complex net w orks. T able 2 summarizes the main functions in the pac k age. In what follo ws, w e will review the main pac k age functions one b y one. Firstly , most well-kno wn temp oral net w ork mo dels based on P A and no de fitness mec ha- nisms can b e easily sim ulated using the pac k age. P AFit implemen ts generate_BA for the BA mo del, generate_ER for the growing ER model, generate_BB for the BB model, and generate_fit_only for the Caldarelli model. These functions hav e many customizable op- tions. F or example, the n um b er of new edges at eac h time-step is a tunable stochastic v ariable; see T able 3 for descriptions of the parameters. They are actually wrappers of the more p o wer- ful generate_net function, whic h simulate s net works with more flexible attachmen t function and no de fitness settings. Eac h temp oral net w ork mo del generation function outputs a PAFi t_net ob ject, whic h is a list with four fields: type , fitness , PA , and graph . The type field is a string indicating the t yp e of netw ork: "directed" or "undirected" . This field is "directed" for the net works generated b y the simulation functions. The fitness and PA fields con tain the true no de fitnesses and P A funct ion, respectively . The graph field con tains the generated temp oral net work in a three-column matrix format. Eac h ro w of this matrix is of the form (id_1 id_2 time_stamp) . While id_1 and id_2 are IDs of the source no de and the destination node, resp ectiv ely , time_stamp is the birth ti me of the edge. This is the so-called edge-list format in whic h ra w temp oral netw orks are stored in many on-line rep ositories ( Kunegis 2013 ; Lesk ov ec and Krevl 2014 ). W e will discuss ho w to use functions pro vided by P AFit to con v ert this edge-list format to formats used in other netw ork analysis pack ages in the next section. One can apply the function plot directly to a PAFit_net ob ject to visualize its conten ts. The second functionalit y of P AFit is implemented in get_stati stics . With its core part implemen ted in C++ , thi s function efficien tly collects all temp oral netw ork summary statistics that are needed in the subsequen t estimation of P A and node fitnesses. The input netw ork is assumed to be stored as a PAFit_net ob ject. One can use the function graph_from_file to read an edge-list graph from a text file in to a PAFit_ne t ob ject, or con v ert an edge-list matrix to this class by the function as.PAFit_net . The edge-list matrix is assumed to b e in the same format as P AFit simulat ed graphs, i.e., eac h row is of the form (id_1 id_2 time_stamp) . The no de IDs are required to b e integers greater than − 1, but need not to b e contiguous. Note that (id -1 t) describes a node id that appeared at time t without any edge. There are no assumptions on the v alues or data t yp es of time_stamp , other than that their c hronological order is the same as what the R function order returns. Examples of timestamps that satisfy this requiremen t are the in teger v ector 1:T , the format ‘yyyy-mm-dd’, and the POSIX time. The get_statistics function automatically handles b oth directed and undirected netw orks. It retu rns a list con taining man y statisti cs that can be used to c haracterize the netw ork gro wth process. Notable fields are m_tk containing the n umber of new edges that connect to a degree- k no de at time-step t , and node_degree cont aining the degree sequence, i.e., the degree of eac h node at each time-step. The most imp ortan t functionality of P AFit relates to the estimation of the attac hment func- Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 9 F unction Main input Output generate_ER net work parameters net work from the growing ER mo del generate_BA net work parameters net work from the generalized directed BA mo del generate_BB net work parameters net work from the BB mo del generate_fit_only net work parameters net work from the Caldarelli mo del generate_net net work parameters net work from the GT mo del get_statistics PAFit_net ob ject con taining the netw ork PAFit_data ob ject con taining summary statistics Jeong PAFit_net ob ject and PAFit_data ob ject estimated P A function b y Jeong’s metho d Newman PAFit_net ob ject and PAFit_data ob ject estimated P A function b y Newman’s metho d only_A_estimate PAFit_net ob ject and PAFit_data ob ject estimated P A function b y P AFit only_F_estimate PAFit_net ob ject and PAFit_data ob ject estimated no de fitnesses b y P AFit joint_estimate PAFit_net ob ject and PAFit_data ob ject estimated P A function and no de fitnesses by P AFit to_networkDynamic PAFit_net ob ject networkDynamic ob ject from_networkDynamic networkDynamic ob ject PAFit_net ob ject to_igraph PAFit_net ob ject igraph ob ject from_igraph igraph ob ject PAFit_net ob ject graph_to_file PAFit_net ob ject text file in either edge-list format or gml graph_from_file text file in edge-list format or gml a PAFit_net ob ject as.PAFit_net edge-list matrix PAFit_net ob ject test_linear_PA degree vector Linear_PA_test_result ob ject T able 2: Summary of the main functions in the P AFit pack age. 10 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness P arameter (default v alue) Description N ( 1000 ) total n umber of no des in the netw ork num_seed ( 2 ) initial graph is a circle with num_seed no des multiple_node ( 1 ) n umber of new nodes added at each time-step m ( 1 ) n umber of edges of a new no de alpha ( 1 ) attac hment exp onent α when we assume A k = k α mode_f ( "gamma" ) distribution of no de fitnesses: gamma, log-normal or p ow er-law s ( 10 ) distribution of no de fitnesses has mean 1 and v ariance 1 /s T able 3: Main parameters in net work generating functions in the P AFit pac k age. tion and no de fitnesses of a temporal netw ork. This is implemented through v arious metho ds. There are three usages: estimation of the attac hment functi on in isolation, estimation of no de fitnesses in isolation, and the join t estimation of the attachme n t function and no de fitnesses. The functions for estimating the attac hment function in isolation are: Jeong for Jeong’s metho d, N ewman for Newman’s metho d, and only_A_estimate for the P AFit method in Pham et al. ( 2015 ). F or estimation of node fitnesses in isolation, only_F_estimate implemen ts a v ariant of the P AFit method in Pham et al. ( 2016 ). F or the join t estimation of the at tac hmen t function and no de fitnesses, we implemen t the full version of the P AFit metho d ( Pham et al. 2016 ) in joint_estimate . The input of these functions is the output ob ject of the function get_statistics . The output ob jects of these functions contain the estimation result s as we ll as some additional information p ertaining to the estimation pro cess. In T able 4 , w e sho w the input parameters of joint_estimate , the most imp ortant function in P AFit . This function takes the temp oral net work net_object and the summarized statistics net_stat as the main inputs. There are three parameters that control th e estimation pro cess: p , stop_cond , and mode_r eg_A . The parameter p specifies the ratio of the num b er of new edges in the learning data to that of the full data in the cross-v alidation step. F ollo wing Pham et al. ( 2016 ), its default v alue is set at 0 . 75. The parameter stop_cond sp ecifies the thr eshold  : the iterativ e algorithm will con tinue un til the relativ e difference in the ob jectiv e function J ( A , η ) b et ween t wo successiv e iterations falls b elo w this threshold ( Pham et al. 2016 ; Zhou et al. 2011 ). The defau lt v alue  = 10 − 8 is set follo wing Pham et al. ( 2016 ). The parameter mode_reg_A sp ecifies the regularization term for A k . The default v alue mode_reg_A = 0 corresp onds to the regularization term in Equation 4 ( Pham et al. 2016 ). When mode_reg_A = 1 , the following regularization term is used: − r K − 1 X k =2 w k  log A k +1 − log A k log ( k + 1) − log k − log A k − log A k − 1 log k − log ( k − 1)  2 . (6) Although this regularization term will enforce exactly the form A k = k α , it is significan tly slo wer to optimize this regularization term while the improv emen t o v er Equation 4 is little. Finally , although one can roughly assess whether P A exists in the netw ork b y visu al insp ection of the estimated P A function, Handco c k and Jones ( 2004 ) pro vide a metho d to test whether the linear P A-only case, i.e., A k = k and η i = 1, is consisten t with a given degree v ector. W e implemented th is metho d in the function test_linear_PA . This function c ho oses the b est fitted distribution to a giv en degree vector among a set of distributions by comparing the Ak aike Information Criterion (AIC) ( Ak aike 1974 ) or th e Ba yesian Information Crite- Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 11 P arameter Default v alue net_object no default v alue net_stat get_statistics(net_object) p 0 . 75 stop_cond 10 − 8 mode_reg_A 0 T able 4: P arameters of the joint_estimate function and their default v alues. rion (BIC) ( Raftery 1995 ). The set of distributions are Y ule, W aring, P oisson, geometric, and negativ e binomial. The linear P A-only case corresp onds to Y ule or W aring ( Y ule 1925 ; Irwin 1963 ). 4. Related net work pac k ages Since netw ork analysis has b een an imp ortan t field for a long time, v arious asp ects of it hav e b een implemen ted in a large n um b er of soft ware pack ages. T o our b est effort, w e ha v e con- firmed that the non-parametric joint estimation of P A and fitness mec hanisms in a growing net work is not implemen ted elsewhere. Restricting the discussion to pack ages in R , there are some notable implementations of related statistical net w ork mo dels. F or example, sto c hastic blo c k mo dels in the pack ages igraph ( Csardi and Nepusz 2006 ), sna ( Butts 2016 ), blo c kmo d- els ( INRA and Leger 2015 ) and dynsbm ( Matias and Miele 2016 , 2018 ); exp onential random graph mo dels in the pac k ages ergm ( Hun ter et al. 2008 ; Handco c k et al. 2018 ), tergm ( Krivit- sky and Handcock 2018b ), hergm ( Sch wein b erger and Luna 2018 ), b tergm ( Leifeld et al. 2018 ), and RSiena ( Ripley et al. 2018 ); and latent space mo dels in the pack age latentnet ( Krivitsky and Handco c k 2008 , 2018a ). The dynsbm pack age estimates a dynamic stochastic block mo del in whic h no des are assumed to b elong to some laten t groups which can v ary with time, and the edge weigh t b etw een t w o no des at an y time follows some parametric distribution. The pac k age can deal with b oth discrete and contin uously weigh ted edges. The igraph pac k age con tains the functions sample_pa and sample_growing which are the equiv alents of generate_BA and generate_ER in P AFit , respectively . Although igraph also generates netw orks from many other mechanisms, it do es not con tain any function for esti- mating the P A function and/or no de fitnesses. It does con tain man y functionali ties for dealing with sto c hastic blo c k mo dels and v arious other netw ork mo dels. Some of the ab o v e pac k ages are included in the extensiv e meta-pac k age statnet ( Handcock et al. 2008 , 2016 ). In statnet , pac k ages that deal with temp oral netw orks are: netw orkDy- namic ( Butts et al. 2016 ), tsna ( Bender-deMoll and Morris 2016 ), and tergm . The net work- Dynamic pack age provides the networkDynamic class to store dynamic netw orks and v arious functions t o manipulate them. The tsna pack age calculates many temp oral statistics of a dynamic netw ork stored in a networkDynamic ob ject. The closest pac k ages to P AFit that estimate P A in a temp oral net w ork are tergm and RSiena , whic h implemen t sophisticated contin uous-time and discrete-time Marko v mo dels. Regarding P A, all the implement ed options in tergm and RSiena p ertain to the parametric estimation of A k , in con trast to the non-parametric estimation methods implemen ted in P AFit . Although 12 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness it migh t b e theoretically p ossible to describe a non-p arametric A k function in tergm and RSiena , they contai n no regularization terms for the join t estimation of the non-parametric P A function and no de fitnesses. Joint estimation without regularization terms is very likely unable to reco ver the true parameters, since the num b er of parameters is t ypically high. On the other hand, P AFit is specifically designed for estimating A k non-parametrically with no de fitnesses, since it has t wo regularization terms in Equations 4 and 5 , together with the cross-v alidation step for selecting suitable regularization parameters. P AFit provides functionalities to comm unicate with existing net w ork analysis pac k ages. Using to_networkDynamic and from_networkDynami c , one can con vert a PAFit_net ob ject to a net workDynamic ’s networkDynami c ob ject and vice v ersa. The functions to_igraph and from_igraph do the same for igraph ’s igraph ob jects. The extensive functions of statnet and igraph pac k ages can then b e used. One can also output the graph stored in a PAFit_net ob ject to the univ ersal gml format b y the function graph_to_file , or read from a gml file b y the function graph_from_file . 5. P ac k age usage Here w e sh o w three usages of P AFit : the estimation of the attac hment function A k in i solation in Section 5.1 , the estimation of no de fitnesses η i in isolation in Section 5.2 , and the joint estimation of A k and the η i v alues in Section 5.3 . 5.1. A ttac hment function estimation First w e generate a net work from a directed v ersion of the BA mo del, called Price’s mo del ( Price 1976 ). F rom the initial graph with t wo nodes and one edge, one new no de with m = 5 new edges is added at eac h time-step until the num b er of no des is N = 1000. R> set.seed(1) R> library("PAFit") R> sim_net_1 <- generate_B A(N = 1000, m = 5) Recall that A k is linear in the BA mo del, i.e., the attac hment exp onen t α is equal to 1, and the no de fitnesses are uniform. One can observ e the emergence of h ubs in this net w ork by visualizing the generated graph at v arious time-steps by the function plot . The follo wing script plots the netw ork snapshot at time t = 1 in Figure 1a and its corresp onding degree distribution in Figure 1d : R> plot(sim_net_1, slice = 1, arrowhead. cex = 3, vertex.cex = 3) R> plot(sim_net_1, slice = 1, plot = "degree", cex = 3, cex.axis = 2, + cex.lab = 2) Note that if the net w ork is directed, as it is in this example, the option plot = "degree" will plot the in-degree distribution. In the same w a y , w e plot netw ork snapshots at time t = 10 and t = 100 in Figures 1b and 1c and their corresp onding degree distributions in Figures 1e and 1f . The next step is to use the function get_statistics to get the summary statistics for the temp oral netw ork: Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 13 (a) Snapshot at t = 1. (b) Snapshot at t = 10. (c) Snapshot at t = 100. Degree + 1 Frequency 1 2 1 2 ● ● (d) Degree distribution at t = 1. Degree + 1 Frequency 1 2 5 10 20 1 2 5 ● ● ● ● ● (e) Degree distribution at t = 10. Degree + 1 Frequency 1 10 100 1 10 ● ● ● ● ● ● ● ● ● ● ● ● (f ) Degree distribution at t = 100. Figure 1: Net work snapshots and their corresp onding in-degree distributions at time-steps t = 1, 10, and 100. The temp oral netw ork, sim_net_1 , is generated from Price’s mo del with total n umber of no des N = 1000. R> stats_1 <- get_statisti cs(sim_net_1) With stats_1 containing al l the needed summary statistics, w e then apply the three methods of estimating the attachmen t function in isolation: R> result_Jeong <- Jeong(sim_net_1, stats_1) R> result_Newman <- Newman(sim_net_1, stats_1) R> result_PA_only <- only_A_estimate(sim_net_1, stats_1) Let us explain result_PA_onl y in more detail. Information on the estimated results as w ell as the estimation pro cess can b e viewed by in v oking summary : R> summary(result_PA_only) Estimation results by the PAFit method. Mode: Only the attachment function was estimated. Selected r parameter: 0.1 14 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Estimated attachment exponent: 1.001139 Attachment exponent ± 2 s.d.: (0.9908913, 1.011387) ----------------------- -------------------- Additional information: Number of bins: 50 Number of iterations: 63 Stopping condition: 1e-08 As stated in Section 2 , the P AFit method first fi nds the r parameter, whic h regularizes the P A function, by cross-v alidation, and then estimates A k using the chosen r . The estimated func- tion can b e accessed via $estimate _result$k and $estimate_r esult$A of result_PA_only . F rom this estimated function, the attac hment exp onent α (when w e assume A k = k α ) and its standard deviation are also estimated. Here ˆ α is 1 . 001 ± 0 . 01 as w e can see from the output of summary . These v alues can be accessed via $estimate_result$alpha and $estimate_result$ci . The output also reveals that P AFit applies binning with 50 bins by default. In this pro cedure, w e divide the range of k into bins consisting of consecutive degrees, and assume that all k in a bin ha ve the same v alue of A k . Binning is an imp ortan t regularization technique that significan tly stabilizes the estimation of the attac hment function ( Pham et al. 2015 ). In this example, the center of each bin is stored in the field $center_k of stats_1 . Since the cen ter of a bin is also the P A v alue corresp onding to that bin in the linear P A case, w e can plot the estimated attach men t funct ion together with the true attac hmen t function using the follo wing script, which pro duces the plot of Figure 2a . The options min_A and max_A sp ecify the minim um and maximum v alues in the vertical axis of the plot, resp ectiv ely . R> plot(result_PA_only,stats_1, min_A = 1, max_A = 2000, + cex = 3, cex.axis = 2, cex.lab = 2) R> lines(stats_1$center_k, stats_1$ce nter_k, col = "red") The estimation results of Jeong’s metho d and Newman’s metho d can b e plotted in a similar w ay , and are shown in Figures 2b and 2c , resp ectiv ely . Ov erall, Newman’s method and P AFit estimate the attac hmen t function A k ab out equally w ell, while Jeong’s metho d is found to underestimate the function and also exhibits high v ari- ance. This can also b e observed in the estimated attachmen t exp onen t of the three metho ds: Newman’s method and P AFit recov er the true α , while Jeong’s metho d underestimates it. Note that in P AFit we also obtain the interv al of the estimated A k ± 2 s.d. (ligh tblue region in Figure 2a ), which are una v ailable in the other t w o metho ds. This is a significan t adv antage of P AFit ov er the other tw o metho ds since it allo ws the user to quanti fy uncertain ties in the result. 5.2. No de fitnesses estimation Here w e estimate no de fitnesses from a BB mo del generated netw ork with the assumption that A k = k . T o demonstrate the functionality of the pac k age, we generate a BB net work with a nonstandard setting: R> sim_net_2 <- generate_B B(N = 1000, num_seed = 100, multiple_node = 100, + m = 15, s = 10) Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 15 Degree k Attachment function 1 10 100 1000 1 10 100 1000 (a) P AFit ( ˆ α = 1 . 001 ± 0 . 01) Degree k Attachment function 1 10 100 1000 1 10 100 1000 (b) Jeong’s metho d ( ˆ α = 0 . 96 ± 0 . 07) Degree k Attachment function 1 10 100 1000 1 10 100 1000 (c) Newman’s metho d ( ˆ α = 1 . 01 ± 0 . 02) Figure 2: Estimating the attac hment function in isolation from sim_net_1 . The true attach- men t function is A k = k α with attac hment exponent α = 1. W e also show the estimated α and the in terv al of the estimated α ± 2 s.d. provided by each metho d. This net work gro ws from a seed netw ork with N 0 = 100 no des where the no des form a line graph. The v alue of N 0 can be sp ecified by num_seed . At each time-step w e add n = 100 new no des where each no de has m = 15 new edges. The v alues of n and m can be sp ecified via multiple_node and m , respectively . The total n umber of no des in the final net work is N = 1000. Finall y , the distribution from which w e generate node fitnesses is the Gamma distribution with mean 1 and inv erse v ariance s = 10. Next w e get the netw ork summary statistics and then apply the estimation function: R> stats_2 <- get_statisti cs(sim_net_2) R> result_fit_only <- only_F_estimate(sim_net_2, stats_2) R> plot(result_fit_only, stats_2, plot = "f", + cex = 2, cex.axis = 1.5, cex.lab = 1.5) The final line of the snipp et generates the distribution of estimated no de fitnesses sho wn in Figure 3a . The function only_F_estimate estimates no de fitnesses under the assumption that A k = k b y default. But one also can estimate node fitnesses in the Caldarelli mo del, i.e., A k = 1 for all k , with the option model_A = "Constant" . The function only_F_estimate w orks b y first finding the estimated v alue ˆ s of s b y cross-v alidation, and then using ˆ s in the subsequent estimation of no de fitnesses. The summar y information of the estimati on result can b e view ed b y in voking summary : R> summary(result_fit_only) Estimation results by the PAFit method. Mode: Only node fitnesses were estimated. Selected s parameter: 8 ----------------------- -------------------- Additional information: 16 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Fitness Density 0.26 1.00 1.94 0 1 2 3 (a) Distribution of estimated fitnesses. T r ue fitness Estimated fitness 0.2 0.5 1 2 0.2 0.5 1 2 (b) Estimated fitnesses versus true fitnesses. Figure 3: Estimating node fitnesses in isolation from sim_net_2 , whic h is generated with attac hment function A k = k . The true node fitnesses are sampled from a Gamma distribution with mean 1 and in v erse v ariance 10. The attachmen t function in the estimation method is fixed at A k = k . In panel b, we only plot no des for which the num b er of acquired new edges is at least 5. Number of bins: 50 Number of iterations: 19 Stopping condition: 0.00000001 The method slightl y under-estimated s . W e can c heck whether the no de fitnesses were esti- mated w ell by plotting the estimated fitnesses v ersus the true fitnesses b y running the follo wing command: R> plot(result_fit_only, stats_2, true_f = sim_net_2$fitness, + plot = "true_f", cex = 2, cex.axis = 1.5, cex.lab = 1.5) This will pro duce the plot of Figure 3b . It turns out that the estimated node fitnesses agree prett y w ell with the true no de fitnesses. W e note that the light blue band around the ˆ η i v alues depicts the interv als of ˆ η i ± 2 s.d.. The upp er and lo w er v alues can b e accessed via $estimate_result$upper_ f and $estimate_result$lower_f of result_fit_on ly , resp ec- tiv ely . 5.3. Join t estimation of the attac hmen t function and no de fitnesses Here we sho w how to estimate the attac hmen t function and node fitnesses sim ultaneously . W e need to assume in Section 5.1 the equalit y of all η i for the estimation of A k in isolation, and in Section 5.2 a sp ecific functional form of A k for the estimation of η i in isolation. Such assumptions b ecome unnecessary when w e perform join t estimation, since the appropriate Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 17 functional forms will be automatically enforced through the regularization parameters r and s , whic h will b e chosen by cross-v alidation. W e recommend the joint estimation pro cedure as the standard estimation pro cedure in this pac k age, unless there is a sp ecific reason to justify the one or the other of these assumptions. This time w e generate a net work in which the attachmen t function is A k = k α with α = 0 . 5 and the Gamma distribution of no de fitnesses has mean 1 and v ariance 1 /s with s = 10: R> sim_net_3 <- generate_n et(N = 1000, num_seed = 100, multiple_node = 100, + m = 15, s = 10, alpha = 0.5) W e then apply joint_estimation : R> stats_3 <- get_statisti cs(sim_net_3) R> result_PAFit <- joint_estimate(sim_net_3, stats_3) R> summary(result_PAFit) Estimation results by the PAFit method. Mode: Both the attachment function and node fitness were estimated. Selected r parameter: 10 Selected s parameter: 18.75 Estimated attachment exponent: 0.5168941 Attachment exponent ± 2 s.d.: (0.5097277, 0.5240605 ) ----------------------- -------------------- Additional information: Number of bins: 50 Number of iterations: 596 Stopping condition: 0.00000001 W e can plot the estimated attachmen t function as in Figure 4a , and the distri bution of the ˆ η i ’s as in Figure 4b with the follo wing code: R> plot(result_PAFit, stats_3, min_A = 1, max_A = 40, + cex = 3, cex.axis = 2, cex.lab = 2) R> lines(stats_3$center_k, stats_3$ce nter_k^0.5, col = "red") R> plot(result_PAFit, stats_3, plot = "f", + cex = 3, cex.axis = 2, cex.lab = 2) Concerning the estimated v alues, while s is sligh tly o ver-estimated by ˆ s = 18 . 75, ˆ α = 0 . 52 ± 0 . 01 is a go o d estimate of α . W e can also plot the estimated fitnesses v ersus the true fitnesses as in Figure 4c with the follo wing co de: R> plot(result_PAFit, stats_3, true_f = sim_net_3$fitness, plot = "true_f", + cex = 3, cex.axis = 2, cex.lab = 2) Since the mean of ˆ η i ’s is normalized to 1, the o ver-estimation of s leads to o ver-e stimation of lo w-v alue fitnesses and under-estimation of high-v alue fitness, as can be seen in the plot of Figure 4c . 18 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Degree k Attachment function 1 10 100 1 10 (a) Estimated attachmen t func- tion ( ˆ α = 0 . 52 ± 0 . 01). Fitness Density 0.43 1.00 1.89 0 1 2 3 4 (b) Distribution of estimated no de fitnesses. T rue fitness Estimated fitness 0.1 0.2 0.5 1 2 0.1 0.2 0.5 1 2 (c) Estimated and true η i v alues. Figure 4: Join t estimation of the attac hment function and no de fitnesses from sim_net_3 . The red line in panel a is the true attachmen t function A k = k 0 . 5 . The true no de fitnesses are sampled from a Gamma distribution with mean 1 and inv erse v ariance s = 10. W e show how joint estimation impro ves on estimating either no de fitnesses in isolation (Fig- ures 5a and 5b ) or the P A function in isolation (Figure 5c ). F or estimating no de fitnesses in isolation, t wo cases are sho wn: the result when we assume the BB mo del in whic h A k = k (Figure 5a ) and the result when w e assume the Caldarelli mo del in which A k = 1 (Figure 5b ). In either case, the estimated no de fitnesses are visually worse than those of the joint es- timation in Figure 4c . Similarly , estimating the P A function in isolation apparently led to o verestimation of the P A func tion in t he region of l arge k . T o conclude, estimating either no de fitnesses or the P A function in isolation would lik ely b e w orse than the join t estimation, if the underlying assumptions ab out the true no de fitnesses and the true P A function are wrong. T rue fitness Estimated fitness 0.1 1 0.1 1 (a) Estimated and true η i v alues when assuming A k = k . ˆ s = 3 . 2. T rue fitness Estimated fitness 0.1 1 0.1 1 (b) Estimated and true η i v alues when assuming A k = 1. ˆ s = 4. Degree k Attachment function 1 10 100 1 10 (c) Estimated P A function in iso- lation ( ˆ α = 0 . 60 ± 0 . 05). Figure 5: Estimation of no de fitnesses and the P A function in isolation from sim_net_3 . The red line in panel c is the true attachme n t function A k = k 0 . 5 . The true no de fitnesses are sampled from a Gamma distribution with mean 1 and in verse v ariance s = 10. Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 19 6. Sim ulation Study In this section, w e presen t the results of a simu lation study that we conducted to assess the p erformance of the joint_estimation function. W e assume the functional form A k = k α for the attachm en t function. T o cov er the spectrum of P A and anti-P A phenomena, w e choose four v alues for α : − 0 . 5, 0, 0 . 5, and 1. W e sample no de fitnesses from a Gamma distribution with mean 1 and v ariance 1 /s . Three v alues for s we chose are: 5, 20, and 80. While small v alues of s lead to widely v aried no de fitnesses, large v alues of s leads to highly concentrat ed no de fitnesses. F or eac h com bination of α and s , w e generated M = 50 net works, and estimated A k and s for eac h net work using the joint_estimation function. W e then fit the form ˆ A k = k α to ˆ A k in order to estimate α . W e then compared the means of the estimation results of α and s with the true v alues. Eac h simulated net work has a total of 1000 nodes, where the initial graph has 200 no des and 50 new nodes are added at eac h time-step for a total of 10 time-steps. Eac h new node has 50 new edges. The results are sho wn in Figure 6 . The attac hment exp onen t α was est imated reasonably w ell across all combi nations of α and s . Except for the cases in whic h the attach men t function gro ws fast ( α = 1) or the case in which node fitnesses ha ve high v ariance ( s = 5), the estimated v alues of s w ere also acceptable. The case of s = 5 resulted in a slight ov er-estimation, which is p erhaps attributable to high no de fitnesses v ariance. W e also notice that s w as slightly o ver-estimated when α = 1, whic h may be caused by the fast gro wing rate of the P A function. One also notices that the interv als of ˆ s ± 2 s.d. are m uc h larger than those for ˆ α . The ab o ve observ ations imply that it is muc h harder to estimate s than α . 7. Analysis of a collab oration net w ork b et w een scien tists In this section, w e demonstrate the complete work-flo w for the join t estimation of A k and η i on a collab oration net work b et ween scientists from the field of complex netw orks. In this net work, nodes represen t scien tists and an undirected edge exists b et ween them if, and only if, they hav e coauthored a paper. The degree of a no de represen ts the num b er of collab orators of a scien tist, since m ultiple edges are not considered. The temporal net w ork is stored in coauthor.net , and the names of the scien tists are store d in coauthor.author_id . The net work without timestamps was compiled by Mark Newman from the bibliographies of tw o review articles on complex net works ( Newman 2006 ). P aul Sheridan, the second author of the present work, augmented the dataset with time stamps. More information on the dataset can b e found in the pack age reference manual. The first step in the analysis is to conv ert the edge-list matrix coauthor.net to a PAFit_net ob ject, and get the summary statistics using the function get_statistics . R> set.seed(1) R> true_net <- as.PAFit_ne t(coauthor.net, type = "undirected") R> net_stats <- get_statis tics(true_net) R> summary(net_stats) Contains summary statistics for the temporal network. Type of network: undirected 20 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● T r ue Estimate 5 20 80 124 −0.5 0.0 0.5 1.0 α s Figure 6: Sim ulation result. F or eac h combin ation of α and s , we generated 50 net works using attac hment function A k = k α and a Gamma distribution with mean 1 and inv erse v ariance s for no de fitnesses. Each red p oint indicates the a v erage of the esimated α and the selected s o ver 100 sim ulations for the corresp onding com bination of α and s . At each red p oin t, the horizon tal/verticalcal bar indicates the interv al of the estimated α /selected s ± 2 s.d., resp ectiv ely . Number of nodes in the final network: 1498 Number of edges in the final network: 5698 Number of new nodes: 1358 Number of new edges: 1255 Number of time-steps: 145 Maximum degree: 37 Number of bins: 38 The temp oral netw ork grew in 145 time-steps from an initial netw ork at September 2000, to Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 21 a final state at September 2007. The resolution of those time-steps is mon thly . The final net work has 1498 scien tists with 5698 collaborations among them. One can plot the degree distribution of the final snapshot as follows: R> plot(true_net,plot = "degree") This will pro duce the plot of Figure 7 . Degree + 1 Frequency 2 5 10 20 1 10 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Figure 7: Degree distribution of the final snapshot of the collab oration netw ork. Before any estimation of the P A function and no de fitnesses is carried out, one can test whether the linear P A-only case is consistent with the observ ed degree distribution of the collab oration netw ork: R> test_linear_PA_result <- test_linear_PA(net_stats$final_deg) R> print(test_linear_PA_result) This will generate T able 5 . In this case, since the Y ule and W aring distributions are not the b est mo dels, one can conclude that the linear P A-only case is inconsistent with the observ ed degree v ector. T o further in vestigate the P A function and no de fitnesses, w e in vok e the joint_estimate function for joint estimation: 22 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Mo del Log-lik eliho o d AIC BIC nb -2415.08 4840.16 4866.72 pois -2468.63 4945.27 4966.51 waring -2557.49 5124.99 5151.55 geom -2898.64 5811.27 5848.45 yule -2959.97 5929.95 5956.51 T able 5: The result of applying the function test_linear_PA to the observ ed degree vector of the collab oration netw ork. This function calculates the AIC and BIC of fiv e mo dels: Y ule ( yule ), W aring ( waring ), P oisson ( pois ), geometric ( geom ), and negativ e binomial ( nb ) when fitting them to the observed degree ve ctor. R> full_result <- joint_estimate(true_net, net_stats) R> summary(full_result) Estimation results by the PAFit method. Mode: Both the attachment function and node fitness were estimated. Selected r parameter: 10 Selected s parameter: 45 Estimated attachment exponent: 0.9951764 Attachment exponent ± 2 s.d.: (0.9715202, 1.018833) ----------------------- -------------------- Additional information: Number of bins: 38 Number of iterations: 607 Stopping condition: 0.00000001 W e can visualize the estimated attac hment function and the distribution of estimated no de fitnesses by: R> plot(full_result, net_stats, line = "TRUE", + cex = 2, cex.axis = 1.5, min_A = 1, max_A = 1000, cex.lab = 1.5) R> plot(full_result, net_stats, plot = "f", + cex = 2, cex.axis = 1.5, cex.lab = 1.5) This snipp et will sequen tially generate the plots of Figure s 8a and 8b . When other options are set at their default v alues, the option line = "TRUE" will plot the function ˆ A k = k ˆ α , whic h is a straigh t line on a double logarithmic scale. The best fit mo del when we p erformed join t estimation is close to the BB mo del. In Figure 8a , the estimated A k is an increasing function with ˆ α = 1 . 00 ± 0 . 05. W e tak e this as evidence in fav or of the presence of linear P A in the col lab oration net work. Let us take a concrete example: a netw ork scien tist with tw ent y collab orators has roughly twice the c hance to get a new collaborator compared with someone who only has ten collaborators, assuming they ha ve the same fitness. F or comparison’s sake, we also plot the estimation results of A k in isolation using Jeong’s metho d, Newman’s metho d, and P AFit in Figures 8c , 8d , and 8e , resp ectiv ely: R> result_Jeong <- Jeong(true_net, net_stats) R> result_Newman <- Newman(true_net, net_stats) Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 23 Degree k Attachment function 1 2 5 10 20 1 10 100 1000 (a) Estimated A k with joint estimation by P AFit ( ˆ α = 1 . 00 ± 0 . 05). Fitness Density 0.77 1.00 2.31 0 1 2 3 4 (b) Histogram of estimated no de fitnesses. Degree k Attachment function 1 2 5 1 10 100 1000 (c) Jeong’s metho d ( ˆ α = 0 . 77 ± 0 . 80). Degree k Attachment function 1 2 5 10 20 1 10 100 1000 (d) Newman’s metho d ( ˆ α = 1 . 39 ± 0 . 56). Degree k Attachment function 1 2 5 10 20 1 10 100 1000 (e) P AFit ( ˆ α = 1 . 05 ± 0 . 07). Figure 8: Estimation of the attac hment function and no de fitnesses for the netw ork scientist collab oration net work. Panels a and b sho w the joint estimation result, while panels c, d, and e sho w the results when we estimated the P A function in isolation. R> result_onlyA <- only_A_estimate(true_net, net_stats) R> plot(result_Jeong, net_stats, line = "TRUE", min_A = 1, max_A = 1000, + cex = 3, cex.axis = 2, cex.lab = 2) R> plot(result_Newman, net_stats, line = "TRUE", min_A = 1, max_A = 1000, + cex = 3, cex.axis = 2, cex.lab = 2) R> plot(result_onlyA, net_stats, line = "TRUE", min_A = 1, max_A = 1000, + cex = 3, cex.axis = 2, cex.lab = 2) The options min_A = 1 and max_A = 1000 specify the range of the v ertical axis and are needed for making the plots comparable. The high v ariance of ˆ α from either Jeong’s metho d or Newman’s metho d would render qual- 24 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness itativ e assessments of the P A function inconclusive, if one relied only on those metho ds: one could not confiden tly ascertain whether the P A function is sub-linear, linear, or sup er-linear in nature. W e notice that the estimated A k obtained from the joint estimation resem bles that of Figure 8e , when w e estimate it in isolation. The reason is that estimated no de fitnesses in Figure 8b are highly concen trated around the mean. Thus their distribution is not very far from the case when all the fitnesses are 1. Nev ertheless, w e observ e that the estimated A k from the joint estimation is reduced when compared with that of Figure 8e . This is exp ected since in the join t estimation, a p ortion of the connection probabilit y in Equation 1 is explain ed b y node fitness. Although the distribution in the plot of Figure 8b is concentrated around its mean, we notice that its righ t tail is rather long, which is a sign that this tail con tains interesting information. W e can extract the information from this region by finding the topmost ‘fittest’ net work scien tists. This can b e done as follo ws: R> sorted_fit <- sort(full_result$estimate_result$f, decreasing = TRUE) R> top_scientist <- coauthor.author_id[names(sorted_fit), ] R> print(cbind(sorted_fit[1:10], top_scientist[1:10, 2])) This snipp et will produce the results show in T able 6 . The table sho ws the ten net work scien tists that we found to ha v e the highest ability to attract new collaborators from the field. An yone acquaint ed with the field will recognized a num b er of familiar faces. F or example, at Rank Estimated fitness Name 1 1.42 BARABASI, A 2 1.35 NEWMAN, M 3 1.26 JEONG, H 4 1.25 LA TORA, V 5 1.24 ALON, U 6 1.23 OL TV AI, Z 7 1.23 YOUNG, M 8 1.22 W ANG, B 9 1.21 SOLE, R 10 1.21 BOCCALETTI, S T able 6: T en topmost ‘fittest’ netw ork scientists in the field of complex net w orks. the top of the list is none other than Alb ert-L´ aszl´ o Barab´ asi, who intr o duced the BA mo del. Num b er tw o and num b er three are Mark Newman and Haw o ong Jeong, who resp ectiv ely are the authors of the ep on ymously named Newman’s metho d and Jeong’s metho d. 8. Conclusion W e in tro duced the R pack age P AFit , whic h pro vides a comprehensiv e framew ork for the non-parametric estimation of P A and no de fitness mec hanisms in the growt h of temp oral complex net works. In summary , P AFit implemen ts functions to simul ate v arious te mp oral net work mo dels based on these t wo mechanisms, gathers summary statistics from real-w orld temp oral net work datasets, and estimates non-parametrically the attachmen t function and Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 25 no de fitnesses. W e provided a n um b er of simu lated examples, as well as a complete analysis of a real-w orld collab oration netw ork. Ac kno wledgmen ts This w ork was supported in part b y gran ts from the Japan So ciet y for the Promotion of Science KAKENHI [JP16J03918 to T.P . and 16H01547 to H.S.]. References Ak aike H (1974). “A New Lo ok at the Statistical Mo del Identification.” IEEE T r ansactions on A utomatic Contr ol , 19 (6), 716–723. doi:10.1109/TAC.1974.1100705 . Alb ert R, Barab´ asi A (1999). “Emergence of Scaling in Random Net works.” Scienc e , 286 , 509–512. doi:10.1126 /science.286.5439.509 . Alb ert R, Jeong H, Barab´ asi A (2000). “Error and A ttack T olerance of Complex Net works.” Natur e , 406 (6794), 378–382. doi:10.1038/35019019 . Allison PD (1980). “Estimation and T esting for a Marko v Mo del of Reinforcemen t.” So cio- lo gic al Metho ds & R ese ar ch , 8 (4), 434–453. doi:10.1177/004912 418000800405 . Barab´ asi AL, Albert R, Jeong H (2000). “Scale-F ree Characteristics of Random Netw orks: The T op ology of The W orld-Wide W eb.” Physic a A: Statistic al Me chanics and its Applic ations , 281 , 69 – 77. doi:10.1016/S0378- 4371(00)00018- 2 . Bender-deMoll S, Morris M (2016). tsna : T o ols for T emp or al So cial Network A nalysis . R pac k age version 0.2.0, URL https://CRAN.R- project.org/package=tsna . Bez´ ak ov´ a I, Kalai A, San thanam R (2006). “Graph Mo del Selection Using Maxim um Lik e- liho od.” In Pr o c e e dings of the 23r d International Confer enc e on Machine L e arning , ICML ’06, pp. 105–112. A CM, New Y ork, NY, USA. doi:10.1145/1143844.1143858 . Bianconni G, Barab´ asi A (2001). “Comp etition and Multiscaling in Ev olving Net works.” Eur ophysics L etters , 54 , 436. doi:10.1209 /epl/i2001- 00260- 6 . Borgs C, Cha y es J, Dask alakis C, Ro c h S (2007). “First to Mark et is not Everyth ing: an Analysis of Preferential Attac hmen t with Fitness.” In Pr o c e e dings of the thirty-ninth annual A CM symp osium on The ory of c omputing . doi:10.1145/1250790.125 0812 . Bo x-Steffensmeier JM, De Bo ef S (2006). “Rep eated Ev en ts Surviv al Mo dels: the Conditional F railty Mo del.” Statistics in Me dicine , 25 (20), 3518–3533. doi:10.1002/sim.2434 . Butts CT (2016). sna : T o ols for So cial Network A nalysis . R pack age v ersion 2.4, URL https://CRAN.R- project.org/package= sna . Butts CT, Leslie-Cook A, Krivitsky PN, Bender-deMoll S (2016). networkDynamic : Dy- namic Extensions for Network Obje cts . R pack age v ersion 0.9.0, URL https://CRAN. R- project.org/package=networkDynami c . 26 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Caldarelli G (2007). Sc ale-F r e e Networks . Oxford Universiy Press. Caldarelli G, Capo cci A, De Los Rios P , Mu ˜ noz MA (2002). “Scale-F ree Net works from V arying V ertex Intri nsic Fitness.” Physic al R eview L etters , 89 , 258702. doi:10.1103/ PhysRevLett.89.258702 . Calla wa y DS, Hopcroft JE, Klein b erg JM, Newman MEJ, Strogatz SH (2001). “Are Randomly Gro wn Graphs Really Random?” Physic al R eview E , 64 , 041902. doi:10.1103/PhysRevE. 64.041902 . Clauset A, Shalizi CR, Newman MEJ (2009). “Po wer-La w Distributions in Empirical Data.” SIAM R eview , 51 (4), 661–703. doi:10.1137/070 710111 . Coleman JS (1964). Intr o duction to Mathematic al So ciolo gy . F ree Press of Glenco e London. Csardi G, Nepusz T (2006). “The igraph Softw are Pac k age for Complex Net work Research.” InterJournal , Complex Systems , 1695. URL http://igraph.org . Dorogo vtsev SN, Mendes JFF (2003). Evolution of Networks: F r om Biolo gic al Nets to the Internet and WWW . Oxford Universit y Press, Inc., New Y ork, NY, USA. Dunne JA, Williams RJ, Martinez ND (2002). “F o o d-W eb Structur e and Net w ork Theory: the Role of Connectance and Size.” Pr o c e e dings of the National A c ademy of Scienc es , 99 (20), 12917–12922. doi:10.1073 /pnas.192407699 . Eddelbuettel D (2013). Se amless R and C++ Inte gr ation with R cpp . Springer-V erlag, New Y ork. Eddelbuettel D, Balamuta JJ (2017). “Extending R with C++ : a Brief Introduction to Rcpp . ” Pe erJ Pr eprints , 5 , e3188v1. doi:10.7287/p eerj.preprints.3188v1 . Eddelbuettel D, F ran¸ cois R (2011). “ Rcpp : Seamless R and C++ Integration.” Journal of Statistic al Softwar e, A rticles , 40 (8), 1–18. doi:10.18637/jss.v04 0.i08 . Eom YH, Jo HH (2014). “Generalized F riendship Parado x in Complex Netw orks: the Case of Scien tific Collab oration.” Scientific R ep orts , 4 . doi:10.1038/srep04603 . Erd ¨ os P , R´ enyi A (1959). “On Random Graphs.” Public ationes Mathematic ae Debr e c en , 6 , 290–297. F eld SL (1991). “Why Y our F riends Hav e More F riends Than Y ou Do.” Americ an Journal of So ciolo gy , 96 (6), 1464–1477. doi:10.1086/229693 . Guetz AN, Holmes SP (2011). “Adapt iv e Imp ortance Sampling for Netw ork Growth Models.” A nnals of Op er ations R ese ar ch , 189 (1), 187–203. doi:10.1007/s10479 - 010- 0685- 2 . Handco c k MS, Hunter DR, Butts CT, Goo dreau SM, Krivitsky PN, Bender-deMoll S, Morris M (2016). statnet : Softwar e T o ols for the Statistic al A nalysis of Network Data . The Statnet Pro ject ( http://www.statnet.org ). R pack age v ersion 2016.9, URL https:// CRAN.R- project.org/package=statnet . Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 27 Handco c k MS, Hun ter DR, Butts CT, Go o dreau SM, Krivitsky PN, Morris M (2018). er gm : Fit, Simulate and Diagnose Exp onential-F amily Mo dels for Networks . The Statnet Pro ject ( http://www.statnet.org ). R pack age version 3.9.4, URL https://CRAN.R- project. org/package=ergm . Handco c k MS, Hunter DR, Butts CT, Goo dreau SM, Morris M (2008). “ statnet : Soft w are T o ols for the Represen tation, Visualization, Analysis and Sim ulation of Netw ork Data.” Journal of Statistic al Softwar e, A rticles , 24 (1), 1–11. doi:10.18637/jss.v024.i01 . Handco c k MS, Jones JH (2004). “Lik eliho o d-Based Inference for Sto c hastic Models of Sexual Net work F ormation.” The or etic al Population Biolo gy , 65 (4), 413 – 422. doi:10.1016/j. tpb.2003.09.006 . Demography in the 21st Century . Hun ter D, Lange K (2000). “Quanti le Regression via an MM Algorithm.” Journal of Compu- tational and Gr aphic al Statistics , pp. 60–77. doi:10.2307/1390613 . Hun ter D, Lange K (2004). “A T utorial on MM Algorithms.” The A meric an Statistician , 58 , 30–37. doi:10.1198 /0003130042836 . Hun ter DR, Handco c k MS, Butts CT, Go o dreau SM, Morris M (2008). “ ergm : a Pac k age to Fit, Sim ulate and Diagnose Exp onen tial-F amily Mo dels for Net works.” Journal of Statistic al Softwar e, Articles , 24 (3), 1–29. doi:10.1863 7/jss.v024.i03 . INRA, Leger JB (2015). blo ckmo dels : L atent and Sto chastic Blo ck Mo del Estimation by a ‘V- EM’ A lgorithm . R pack age v ersion 1.1.1, URL https://CRAN.R- project.org/package= blockmodels . Irwin JO (1963). “The Place of Mathematics in Medical and Biological Statistics.” Journal of the R oyal Statistic al So ciety A , 126 (1), 1–45. doi:10.2307/2982445 . Jeong H, N´ eda Z, Barab´ asi A (2003). “Measuring Preferen tial A ttachmen t in Evolving Net- w orks.” Eur ophysics L etters , 61 (61), 567–572. doi:10.1209/epl/ i2003- 00166- 9 . Kelly PJ, Lim LL Y (2000). “Surviv al Analysis for Recurren t Ev ent Data: an Application to Childho od Infectious Diseases.” Statistics in Me dicine , 19 (1), 13–33. doi:10.1002/(SICI) 1097- 0258(20000115)19:1< 13::AID- SIM279> 3.0.CO;2- 5 . Kong J, Sarshar N, Royc howdh ury V (2008). “Exp erience v ersus T alent Shap es the Structure of the W eb.” Pr o c e e dings of the National A c ademy of Scienc es of the USA , 37 , 105. doi: 10.1073/pnas.0805921105 . Krapivsky P , Rodgers G, Redner S (2001). “Organization of Gro wing Net works.” Physic al R eview E , p. 066123. doi:10.1103/PhysRevE .63.066123 . Krivitsky P , Handco ck M (2008). “Fitti ng Latent Cluster Mo dels for Net works with laten tnet .” Journal of Statistic al Softwar e, A rticles , 24 (5), 1–23. doi:10.18637/jss.v024.i05 . Krivitsky PN, Handcock MS (2018a). latentnet : L atent Position and Cluster Mo dels for Statistic al Networks . The Statnet Pro ject ( http://www.statnet.org ). R pac k age v ersion 2.9.0, URL https://CRAN.R- project.org/package=la tentnet . 28 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Krivitsky PN, Handco ck MS (2018b). ter gm : Fit, Simulate and Diagnose Mo dels for Net- work Evolution Base d on Exp onential-F amily R andom Gr aph Mo dels . The Statnet Pro ject ( http://www.statnet.org ). R pack age version 3.5.2, URL https://CRAN.R- project. org/package=tergm . Kunegis J (2013). “K ONECT – The Kobl enz Netw ork Collection.” konect.uni-k oblenz.de. URL http://konect.uni- koblenz.de/ . Kunegis J, Blattner M, Moser C (2013). “Preferen tial Attac hment in Online Netw orks: Mea- suremen t and Explanations.” In Pr o c e e dings of the 5th A nnual A CM Web Scienc e Con- fer enc e , W ebSci ’13, pp. 205–214. ACM, New Y ork, NY, USA. doi:10.1145/2464464. 2464514 . Leifeld P , Cranmer S, Desmarais B (2018). “T emp oral Exp onential Random Graph Mo d- els with btergm : Estimation and Bootstrap Confidence Interv als.” Journal of Statistic al Softwar e, Articles , 83 (6), 1–36. doi:10.1863 7/jss.v083.i06 . Lesk ov ec J, Krevl A (2014). “SNAP Datasets: Stanford Large Netw ork Dataset Collection.” http://snap.stanford.ed u/data . Liljeros F, Edl ing CR, Amaral LAN, Stanley HE, Aberg Y (2001). “The W eb of Human Sexual Con tacts.” Natur e , 411 (6840), 907–908. doi:10.1038/35082140 . Matias C, Miele V (2016). “Statistical Clustering of T emp oral Netw orks Through a Dynamic Sto c hastic Blo c k Mo del.” Journal of the R oyal Statistic al So ciety B , 79 (4), 1119–1141. doi:10.1111/rssb.12200 . Matias C, Miele V (2018). dynsbm : Dynamic Sto chastic Blo ck Mo dels . R pac k age version 0.5, URL https://CRAN.R- project.org/package=dyn sbm . Momeni N, Rabbat MG (2015). Me asuring the Gener alize d F riendship Par adox in Networks with Quality-Dep endent Conne ctivity , pp. 45–55. Springer-V erlag In ternational Publishing, Cham. doi:10.100 7/978- 3- 319- 16112- 9_5 . Nek ov ee M, Moreno Y, Bianconi G, Marsili M (2007). “Theory of Rumour Spreading in Complex So cial Netw orks.” Physic a A: Statistic al Me chanics and its Applic ations , 374 (1), 457 – 470. doi:10.1016/j.p hysa.2006.07.017 . Newman M (2001). “Clu stering and Preferen tial Att ac hmen t in Gro wing Net w orks.” Physic al R eview E , 64 (2), 025102. doi:10.1103/PhysR evE.64.025102 . Newman M (2006). “Finding Commu nit y Structure in Net works Using the Eigen vectors of Matrices.” Physic al R eview E , 74 , 036104. doi:10.1103/PhysRevE.74.036104 . Newman M (2010). Networks: an Intr o duction . Oxford Univ ersity Press, Inc., New Y ork, NY, USA. Newman M, F orrest S , Balthrop J (2002). “Email Net works and the Spread of Computer Viruses.” Physic al R eview E , 66 , 035101. doi:10.1103/PhysRevE.66.035101 . P astor-Satorras R, V espignani A (2001). “Epidemic Spreading in Scale-F ree Netw orks.” Phys- ic al R eview L etters , 86 , 3200–3203. doi:10.1103/PhysRevLett .86.3200 . Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 29 Pham T, Sheridan P , Shimo daira H (2015). “P AFit: a Statistical Metho d for Measuring Preferen tial A ttac hmen t in T emp oral Complex Netw orks.” PLOS ONE , 10 (9), e0137796. doi:10.1371/journal.pon e.0137796 . Pham T, Sheridan P , Shi mo daira H (2016). “Join t Estimation of Preferen tial A ttac hment and No de Fitness in Growing Complex Net work s.” Scientific R ep orts , 6 . doi:10.1038/ srep32558 . Pham T, Sheridan P , Shimo daira H (2018). P AFit : Gener ative Me chanism Estimation in T emp or al Complex Networks . R pac k age version 1.0.0.6, URL https://CRAN.R- project. org/package=PAFit . Price DdS (1976). “A General Theory of Bibliometric and other Cum ulative Adv antage Pro cesses.” Journal of the A meric an So ciety for Information Scienc e , 27 , 292–306. doi: 10.1002/asi.4630270505 . Raftery AE (1995). “Ba yesian Mo del Selection in So cial Research .” So ciolo gic al Metho dolo gy , 25 , 111–163. ISSN 00811750, 14679531. doi:10.2307/271063 . Redner S (2005). “Citation Statist ics from 110 Y ears of Ph ysical Review.” Physics T o day , 58 (6), 49–54. doi:10.1063/1.1996475 . Ripley RM, Snijders T AB, B´ oda Z, V ¨ or ¨ os A, Preciado P (2018). “Man ual for Siena V ersion 4.0.” T e chnic al r ep ort , Oxford: Universit y of Oxford, Departmen t of Statistics; Nuffield College. R pac k age v ersion 1.2-12. https://www.cran.r- project.org/web /packages/ RSiena/ . Ronda-Pup o GA, Pham T (2018). “The Evolutions of the Ric h Get Richer and the Fit Get Ric her Phenomena in Scholarly Net works: the Case of the Strategic Managemen t Journal.” Scientometrics , 116 (1), 363–383. doi:10.1007/s11192- 018- 2761- 3 . Sc hw einberger M, Luna P (2018). “ hergm : Hierarc hical Exp onential-F amily Random Graph Mo dels.” Journal of Statistic al Softwar e, Articles , 85 (1), 1–39. doi:10.18637/jss.v085. i01 . Sheridan P , Kamimura T, Shimo daira H (2010). “A Scale-F ree Structure Prior for Graphical Mo dels with Applications in F unctional Genomics.” PL oS ONE , 5 (11). doi:10.1371/ journal.pone.0013580 . Sheridan P , Onodera T (2018). “A Preferen tial Attac hment P aradox: How Preferen tial At- tac hment Com bines with Growth to Produce Net works with Log-Normal In-Degree Distri- butions.” Scientific R ep orts , 8 (1), 2811. doi:10.1038/s41598 - 018- 21133- 2 . Simon HA (1955). “On a Class of Skew Distribution F unctions.” Biometrika , 4 2 (3-4), 425–440. doi:10.1093/biomet/42.3 - 4.425 . Sinatra R, W ang D, Deville P , Song C, Barab´ asi AL (2016). “Quantifying the Ev olution of Individual Scientific Impact.” Scienc e , 354 (6312). doi:10.1126/science.aa f5239 . W ang D, Song C, Barab´ asi AL (2013). “Quantifying Long-T erm Scientific Impact.” Scienc e , 342 (6154), 127–132. doi:10.1126/science.1237825 . 30 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Y ule GU (1925). “A Mathematical Theory of Evolution, Based on the Conclusions of Dr. J.C. Willis, F.R.S.” Philosophic al T r ansactions of the R oyal So ciety of L ondon B: Biolo gic al Scienc es , 213 (402-410), 21–87. doi:10.1098/rstb.1925.00 02 . Zhou H, Alexander D, Lange K (2011). “A Quasi-Newton Acceleration for High-Dimensional Optimization Algorithms.” Statistics and Computing , 21 , 261–273. doi:10.1007/ s11222- 009- 9166- 3 . Affiliation: Thong Pham Mathematical Statistics T eam RIKEN Cen ter for Adv anced Intelligence Pro ject, T oky o, Japan E-mail: thong.pha m@riken.jp

Original Paper

Loading high-quality paper...

Comments & Academic Discussion

Loading comments...

Leave a Comment