PAFit: an R Package for the Non-Parametric Estimation of Preferential Attachment and Node Fitness in Temporal Complex Networks

P AFit : an R P ac k age for the Non-P arametric Estimation of Preferen tial A ttac hmen t and No de Fitness in T emp oral Complex Net w orks Thong Pham RIKEN Cen ter for AIP P aul Sheridan Hirosaki Univ ersity Hidetoshi Shimo daira Ky oto Universit y RIKEN Cen ter for AIP Abstract Man y real-w orld systems are proﬁtably describ ed as complex net w orks that gro w o v er time. Preferen tial attac hmen t and no de ﬁtness are tw o simple growth mechanisms that not only explain certain structural properties commonly observed in real-worl d systems, but are also tied to a num b er of applications in mo deling and inference. While there are statistical pac k ages for estimating v arious parame tric forms of the preferen tial attachmen t function, there is no such pack age implementing non-parametric estimation procedures. The non-parametric approach to the estimation of the preferential attachmen t function allo ws for comparativ ely ﬁner-grained in v estigations of the ‘rich-get -ric her’ phenomenon that could lead to nov el insights in the search to explain certain nonstandard structural properties observed in real-world net works. This pap er introduces the R pack age P AFit , whic h implements non-parametric pro cedures for estimating the preferential attachmen t function and no de ﬁtnesses in a gro wing netw ork, as well as a num b er of functions for generating complex net works from these t wo mec hanisms. The main computational part of the pack age is implemented in C++ with OpenMP to ensure scalabilit y to large-scale net w orks. In this pap er, we ﬁrst in troduce the main funct ionalities of P AFit through sim ulated examples, and then use the pac k age to analyze a collaboration net work betw een scien tists in the ﬁeld of complex netw orks. The results indicate the joint presence of ‘rich- get-ric her’ and ‘ﬁt-get-ric her’ phenomena in the collab oration netw ork. The estimated attac hmen t function is observ ed to b e near-linear, which w e in terpret as meaning that the c hance an author gets a new collab orator is proportional to their current n umber of collaborators. F urthermore, the estimated author ﬁtnesses reveal a host of familiar faces from the complex netw orks comm unity among the ﬁeld’s topmost ﬁttest netw ork scien tists. Keywor ds : temporal net works , dynamic netw orks, preferen tial attachmen t, ﬁtness, ric h-get- ric her, ﬁt-get-richer, R , C++ , Rcpp , Op enMP . 1. In tro duction Since the end of the last century , complex netw orks hav e b een increasingly used in mo deling man y temporal relati ons found in div erse ﬁeld s ( Dorogovtsev and Mendes 2003 ; Caldarelli 2007 ; Newman 2010 ). Some notable examples include collaboration net works b et ween au- thors in a scien tiﬁc ﬁeld ( Newman 2001 ), connection net works b etw een computers on the In ternet ( Barab´ asi et al. 2000 ), and sexual relation netw orks b et ween members of a com- 2 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness m unity ( Liljeros et al. 2001 ). The primary motiv ation for using complex net works as a simpliﬁed represen tation of real-w orld systems is that they shed light on the b eha viors of complex systems through the study of underlying patterns of connections. Although this is an ov er-simpliﬁcation for systems dep ending heavily on domain-speciﬁc details, this approac h nev ertheless oﬀers a ﬁrst view of a system’s top ological prop erties, and can b e used to guide subsequen t in-depth analyses. Among the most important real-w orld net work structural prop erties is degree distribution. Degree distribution lets us understand the prop ortion of highly and lowly connected nodes in a netw ork. Since the highly connected no des are key components of a net work, this under- standing in turn sheds ligh t on the answ ers of imp ortant practical questions, including ho w to prev ent the spreading of rumors ( Neko vee et al. 2007 ), how to stop a virus outbreak ( P astor- Satorras and V espignani 2001 ), and ho w to guard against cyb ernetic attac ks ( Alb ert et al. 2000 ). The degree distributions of man y real-world net w orks ha ve been found to b e hea vy-tailed ( Al- b ert and Barab´ asi 1999 ). The b est-kno wn heavy- tailed distribution in net w ork science is the p o wer-la w, whic h is a distribution where the n umber of nodes in a net w ork with degree k is prop ortional to k − γ with γ > 1. Besides the p o wer-la w, there is emerging evidence that real-world net work degree distributions ha ve other heavy-tailed forms, including the log- normal ( Redner 2005 ), exponential ( Dunne et al. 2002 ), stretched exp onen tial ( Newman et al. 2002 ), and p o wer-la w with exp onen tial cut-oﬀ ( Clauset et al. 2009 ). All of these hea vy-tailed distributions diﬀer from the ligh t-tailed binomial degree distribution, whic h is c haracteristic of netw orks pro duced by the classical Erd ¨ os-R ´ en yi (ER) random graph mo del ( Erd ¨ os and R ´ en yi 1959 ). This observ ation prompted net work scien tists to searc h for new mo deling ingredients capable of explaining hea vy-tailed degree distributions. What they found is that tem p oral complex net w ork mo dels incorp orating gro wth mec hanisms oﬀer a p o werful mo deling framew ork for achieving this end. T emp oral complex netw ork mo dels, or temporal net work models for short, are probabilistic generativ e mo dels of real-w orld netw orks that c hange with time. In its most common form, a temp oral net work model assumes that a netw ork grows gradu ally from some initial state b y the addition of new no des and edges o v er a large num b er of discrete time-steps. Some well-kno wn basic models in the ﬁeld of complex net works are the Barab´ asi-Alb ert (BA) mo del ( Alb ert and Barab´ asi 1999 ) and the Bianconi-Barab´ asi (BB) mo del ( Bianconni and Barab´ asi 2001 ). More complex growth models that are used in the ﬁeld include exponential random graph mo dels ( Ripley et al. 2018 ; Krivitsky and Handcock 2018b ) and d ynamic stochasti c block mo dels ( Matias and Miele 2016 ). Gro wth mec hanisms, whic h gov ern ho w a no de acquir es new edges in the gro wth pro cess, are the most important elemen ts that distinguish diﬀeren t temp oral netw ork mo dels. This pap er focuses on estimating tw o in terpretable gro wth mec hanisms: preferen tial attac h- men t (P A) and no de ﬁtness. In the P A mechanism, the probabilit y P i ( t ) that a no de v i acquires a new edge at time t is prop ortional to a p ositiv e function, A k i ( t ) , of its curren t degree k i ( t ). The function A k is called the attachmen t function. The name ‘preferen tial attac hment’ stems from the motiv ation for the mec hanism: if A k is an increasing functi on, a highly connected no de will acquire more edges than a lowly-connected no de, which is an app ealing prop erty in man y real-w orld situations. F rom now, w e will sa y that P A exists if A k is an increasing function. The opposite of P A, whic h w e call anti-P A, occurs when A k is Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 3 a decreasing function. Note, how ev er, that the meaning we use here diﬀers from the original meaning of the term ‘preferen tial attac hmen t’ used in the BA mo del, whic h means only the linear case of A k = k . This linear form in fact has b een long known in other ﬁelds with v arious names suc h as ‘rich-get -ric her’ ( Simon 1955 ) and ‘cumulativ e adv antage’ ( Price 1976 ). When A k assumes the log-linear form of k α , with α called the attac hment exp onen t, w e ha ve the generalized BA model ( Krapivsky et al. 2001 ). In contrast with the P A mechanism, in the ﬁtness mec hanism the probab ilit y P i ( t ) that a no de v i acquires a new edge dep ends only on the p ositiv e num b er η i . The quantit y η i is called the node ﬁtness, or just ﬁtness, of v i and can b e in terpreted as its in trinsic attractiv eness. The ﬁtness mec hanism oﬀers a simple wa y to express the v ariance in edge-acquisition potential b et ween nodes of the same degree. F or example, t wo early-career scien tists with roughly the same num b er of collaborators at some p oin t in time ma y acquire diﬀerent n um b ers of collab orators in the future based on their intrinsic ﬁtnesses. The P A and no de ﬁtness mec hanisms combine to produce a wide range of degree distributions. In their com bined form, the probability P i ( t ) is prop ortional to the pro duct of A k i ( t ) and η i : P i ( t ) ∝ A k i ( t ) × η i . (1) Based on the functional form of A k and the distribution of η i ’s, the model deﬁned b y Equa- tion 1 can pro duce netw orks with v arious degree distributions ( Bianconni and Barab´ asi 2001 ; Caldarelli et al. 2002 ; Borgs et al. 2007 ; Kong et al. 2008 ). In Section 2 , we will discuss the relation of Equation 1 with existing statistical mo dels. Equation 1 has a num b er of applications. Based on the functional forms of A k and η i , w e can test for the presence of one and/or the other of the ‘ric h-get-richer’ and ‘ﬁt-get-richer ’ phenom- ena in a temp oral net work ( Pham et al. 2016 ). These t wo mec hanisms ha ve b een adv anced to explain another phenomenon called the ‘generalized friendship paradox’ ( F eld 1991 ; Eom and Jo 2014 ; Momeni and Rabbat 2015 ). They are also used in inference problems in biological net works ( Sheridan et al. 2010 ; Guetz and Holmes 2011 ), the W orld Wide W eb ( Kong et al. 2008 ), In ternet top ology graphs ( Bez´ ak o v´ a et al. 2006 ), and citation netw orks ( W ang et al. 2013 ; Sinatra et al. 2016 ; Ronda-Pup o and Pham 2018 ). Finally , w e can classify real-w orld temp oral net work data based on the estimated attac hmen t exponent of A k ( Kunegis et al. 2013 ). While there are existing R pac k ages that estimate P A in a gro wing netw ork, including the pac k ages tergm ( Krivitsky and Handco ck 2018b ) and RSiena ( Ripley et al. 2018 ). These pac k ages, ho wev er, rely on parametric metho ds to estimated the A k function. This means that one has to assume a functional form for A k , rather than learning it from observed data without constraint. Non-parametric estimation of A k allo ws for a ﬁner inspection of the ‘ric h-get-richer’ phenomenon ( Pham et al. 2015 , 2016 ), and such metho ds hav e b een used to pro vide clues to explain irregularities observ ed in real-w orld degree distributions ( Sheridan and Ono dera 2018 ). This pap er in tro duces the R pack age P AFit ( Pham et al. 2018 ), whic h ﬁlls the gap with an implemen tation of the standard P A and node ﬁtness non-parametric estimation procedures. In particular, w e implement Jeong’s met ho d ( Jeong et al. 2003 ), Newman’s metho d ( Newman 2001 ) and the P AFit method ( Pham et al. 2015 , 2016 ) in the pack age. The ﬁrst t w o are heuris- tic metho ds that are widely used to estimate non-parametrically the attachmen t function A k in isolation, while the last one is a statistical metho d that can non-parametrically estimate 4 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness either A k (or η i ) in isolation or simultaneously estimate the t wo mechani sms. Although using P AFit is advisable in almost every circumstance, Jeong’s metho d and Newman’s metho d are still widely used and migh t still b e appropriate in certai n situations. Therefore, the inclu- sion of the tw o heuristic metho ds in th e pack age is warran ted. W e discuss their strengths and shortcomings in Section 2 when we provide an o verview of the metho dology and related statistical mo dels. The pack age also implements a v ariet y of functions to sim ulate temp oral netw orks from the P A and node ﬁtness mec hanisms, as w ell as fu nctions to pl ot the estim ated results and underlying uncertain ties. W e review P AFit ’s main functions in Section 3 . Before demonstrating their usages on three simulated examples in Section 5 , we discuss ho w P AFit relates with existing net work analysis pac k ages in Section 4 . W e pro vide a systematic simulat ion to asses the results of the non-parametric join t estimation in Section 6 , before sho wing a complete end- to-end w ork-ﬂo w analyzing a collab oration netw ork of scien tists from the ﬁeld of complex net works in Section 7 . Finally , concluding remarks are given in Section 8 . 2. Mathematical bac kground Here we review the standard methods for estimating the attachmen t function A k and no de ﬁtnesses η i in a temp oral net work. In Section 2.1 , w e state the net work gro wth mo del used in the pack age as w ell as discuss its relation with exiting statistical mo dels. W e review the estimation of A k in isolation in Section 2.2 , then the estimation of the η i ’s in isolation in Section 2.3 , and ﬁnally the joint estimation of A k and the η i ’s in Section 2.4 . 2.1. Net w ork mo del First w e describe the General T emp oral (GT) mo del ( Pham et al. 2016 ) used in P AFit . The mo del is a generalization of man y w ell-kno wn temp oral netw ork models in the complex net work ﬁeld. Starting from some giv en initial graph G 0 , the GT mo del generates a temporal netw ork sequen tially as follo ws: at time-step t ≥ 1, the net work G t is obtained by adding new edges and new nodes to G t − 1 . The n umber of new edges and new nodes added at time t is denot ed as m ( t ) and n ( t ), resp ectiv ely . The mo del assumes that the parameters go verning the distributions of G 0 , m ( t ) and n ( t ) do not inv olv e A k and the η i ’s. Note that the term k i ( t ) is deﬁned as the degree of node v i at the onset of time-step t . On top of these structur al preconditions, the GT mo del assumes that the probabilit y that a no de v i with degree k i ( t ) receiv es a new edge at time t is giv en b y the form ula of Equation 1 . The GT mo del includes a handful of well-kno wn gro wing net work mo dels, based on P A and no de ﬁtness as sp ecial cases; see T able 1 for a summary . Unlik e the BA or BB models, the GT mo del allows for the emergence of new edges b etw een old nodes and can handle b oth undirect and directed netw orks. W e refer readers to Supplement ary Information Section S2.2 in Pham et al. ( 2016 ) for the deﬁnition of the mo del in the case of undirected netw orks. The GT model is related to mo dels used in the R pac k ages RSiena and tergm . But w e defer a discussion of these pack ages to Section 4 . Lo oking beyond the ﬁeld of complex netw orks, the GT mo del bears some similarities to the contagious P oisson process ( Coleman 1964 ; Allison 1980 ) and the conditional frailt y mo del ( Kelly and Lim 2000 ; Box-Steﬀensmeier and De Bo ef 2006 ). In the con tagious P ois- Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 5 T emp oral netw ork mo del A ttachmen t function No de ﬁtness Gro wing ER mo del ( Callaw ay et al. 2001 ) A k = 1 η i = 1 BA mo del A k = k η i = 1 Caldarelli mo del A k = 1 F ree BB mo del A k = k F ree T able 1: Some we ll-kno wn special cases of the GT mo del as deﬁned by Equation 1 . son pro cess, the initial prop ensit y of each node plays a similar role to that of node ﬁtness and the rate of enforcement represen ts the P A mec hanism. In the conditional frailt y mo del, while the frailt y of eac h no de describ es the heterogeneity among no des and thu s is similar to no de ﬁtness, the ev ent-based baseline hazard rate has the same eﬀect as the non-parametric function A k . 2.2. A ttac hment function estimation The methods for estimating the attac hment function A k in isolation assume a simpliﬁed v ersion of Equation 1 , in whic h the η i are assumed to b e 1. Thus the probability P i ( t ) in Equation 1 dep ends only on A k . Perhaps the most frequen tly-encountered parametric v ersion of this mo del is the log-linear form A k = k α with attac hmen t exp onen t α . Net work scientists are particularly interested in estimating α , since the asymptotic degree distribution of the net work corresponds to simple regions of α . If α is less than unity (the sub-linear case), then the degree distribution is a stretched exp onen tial, while in the sup er-linear case of α > 1, one no de will ev entually get all the incoming new edges ( Krapivsky et al. 2001 ). It is only the linear case of α = 1 that gives rise to a p o wer-la w distribution. Concerning the ab o ve mo del, there are three main metho ds for estimating A k : Jeong’s metho d ( Jeong et al. 2003 ), Newman’s metho d ( Newman 2001 ), and P AFit ( Pham et al. 2015 ). Jeong’s metho d basically makes a histogram of the num b er of new edges n k connected to a no de with degree k , then divides n k b y the num b er of no des with degree k in the system to get A k ( Jeong et al. 2003 ). Jeong’s metho d has the merit of b eing simple, but estimates obtained using the metho d are sub ject to high v ariance and low accuracy ( Pham et al. 2015 ). By contrast, Newman’s method combines a series of histograms for low er v ariance and higher accuracy ( Newman 2001 ). Note that in P AFit w e implemen ted a corrected version of New- man’s original metho d ( Pham et al. 2015 ). The main drawbac k of Newman’s metho d is that the mathematical assumption b ehind its deriv ation holds only when α = 1, th us the metho d amoun ts to an appro ximation when α 6 = 1 ( Pham et al. 2015 ). The ﬁnal method is P AFit ( Pham et al. 2015 ). It iteratively maximizes an ob jectiv e function that is a combination of the log-likelihoo d of the mo del with a regularization term for A k b y a Minorization-Maximization (MM) algorithm ( Hunter and Lange 2000 ). There is a h yp er- parameter, called r , in the method that con trols the strength of the regularization. P AFit c ho oses r automatically by cross-v alidation ( Pham et al. 2016 ). W e defer the details to Section 2.4 . The metho d is not only able to recov er A k accurately , but also can estimate the standard deviation of the estimated A k for each k ( Pham et al. 2015 ). Its main dra wbac k is that it migh t b e slow, since it is an iterative algorithm. 2.3. No de ﬁtness estimation 6 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness When we consider only no de ﬁtnesses, there are t wo generative mo dels in the literature with diﬀeren t assumptions regarding the functional form of A k in Equation 1 . While the Caldarelli mo del ( Caldarelli et al. 2002 ) assumes that A k is 1 for all k , the BB mo del ( Bianconni and Barab´ asi 2001 ) assumes that A k = k . Both models ha v e b een sho wn to generate net works with v arious heavy-tailed distributions ( Borgs et al. 2007 ; Kong et al. 2008 ). No de ﬁtnesses in both models can b e estimated by v arian ts of the P AFit method prop osed in Pham et al. ( 2016 ), by either setting A k = k for the BB mo del or A k = 1 for the Caldarelli mo del. These estimation metho ds use MM algorithms that maximize the corresp onding log- lik eliho o d functions with a regularization term that regularizes the distribution of the η i ’s. More sp eciﬁcally , the inv erse v ariance of this distribution is con trolled by a h yp er-parameter, called s , whic h is chosen automatically b y cross-v alidation. W e defer a more detail discussion to the next section. W e note that node ﬁtnesses in the BB mo del can also be estimated b y the method in Kong et al. ( 2008 ). But since P AFit has b een sho wn to outperform this metho d ( Pham et al. 2016 ), we did not include it in the pac k age. 2.4. Join t estimation of the attac hmen t function and no de ﬁtnesses Finally , by using the full model in Equation 1 the metho d P AFit in Pham et al. ( 2016 ) can join tly estimate A k and η i . W e note this full mo del includes all the temp oral netw ork mo dels sho wn in T able 1 . F or a more complete table, see T able 1 in Pham et al. ( 2016 ). The ob jectiv e function of P AFit is a com bination of the log-lik eliho o d of the full model deﬁned b y Equation 1 and t wo regularization terms: one for A k and one for η i . While w e refer readers to Supplemen tary Information Section S2.3 in Pham et al. ( 2016 ) for a complete presen tation, w e will sketc h here the log-lik eliho o d function for the case of directed netw orks. Assume the set of observ ed snapshots is { G t } T t =0 . Let A = [ A 0 A 1 · · · A K − 1 ] > b e the vector of the P A function and η = [ η 1 η 2 · · · η N ] be the v ector of no de ﬁtnes ses. Here K is the maxim um degree app earing in the gro wth pro cess and N is the total n umber of no des at the end of the process. Let z i ( t ) be the num b er of new edges connected to no de v i at time-step t . Equation 1 implies that { z i ( t ) } N i =1 follo ws a m ultinomial distribution with parameters { π i ( t ) } N i =1 , where π i ( t ) = A k i ( t ) η i P N j =1 A k j ( t ) η j . (2) Here w e use the conv en tion k j ( t ) = − 1 for a node that did not exist at time-step t and A − 1 = 0. Using Equation 2 , one can write the likelihoo d of eac h snapshot G 1 , · · · , G T . The log-lik eliho o d function of the temp oral netw ork { G t } T t =0 is then the sum of the log-likelihoo d of eac h snapshot and is equiv alen t to: l ( A , η ) = T X t =1 N X i =1 z i ( t ) log A k i ( t ) + T X t =1 N X i =1 z i ( t ) log η i − T X t =1 N X i =1 z i ( t ) log N X j =1 A k j ( t ) η j + C, (3) with C being the logarithm of the pro duct of the probability mass functions of G 0 , m ( t ), and n ( t ). Since the GT mo del, as stated in Section 2.1 , assumes that the parameters go verning the distributions of G 0 , m ( t ) and n ( t ) do not inv olve A k and η i , we can treat C as a constant. The regularization term for A k is deﬁned b y r eg A = − r K − 2 X k =1 w k (log A k +1 + log A k − 1 − 2 log A k ) 2 , (4) Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 7 with r ≥ 0, w k = P T t =1 m k ( t ) and m k ( t ) the num b er of edges that connect to a degree k no de at time-step t . This term controls the shap e of A k . When r is large, A k b ecomes more linear on a log-scale. In limiting as r approaches inﬁnit y , we eﬀectiv ely assume that A k = k α ( Pham et al. 2016 ). Th us this cov ers the case of α = 1 in the BA mo del and the BB mo del, and the case of α = 0 in the growing ER and the Caldarelli mo del. The regularization term for the no de ﬁtnesses is deﬁned by r eg F = N X i =1 (( s − 1) log η i − sη i ) , (5) with s > 0. This term is the sum of the logarithms of Gamma distribution densities with mean 1 and v ariance 1 /s . The regularization is equiv alent to placing such Gamma distribu- tions as priors independently for each node ﬁtness η i . The larger the v alue of s , the more tigh tly concen trated the v alues of η i b ecome. If s is inﬁnitely large, then all η i will take the same v alue. This is equiv alent to estimating the attachmen t function in isolation. T o conclude: join t estimation with the ab ov e regularization terms also compasses the t w o cases of estimating eit her A k or η i in isolation. In particular, we maximi ze the follo wing ob jective function: J ( A , η ) = l ( A , η ) + r eg A + reg F , with an MM algorithm. A t each iteration, the algorithm replaces the ob jectiv e function with an easier-to-maximize surrogate function and this surrogate function is maximized instead. The surrogate function is c hosen in suc h a wa y that the ob jective function v alue is guaran teed to b e nondecreasing o ver iterations. W e refer the readers to Hunter and Lange ( 2004 ) for the deﬁnition of a surrogate function and the tec hniques used to deriv e them. F or a surrogate function, the v ariables are often separable, i.e., the partial deriv ative of one v ariable do es not in volv e the others, and th us the maximizat ion at each iteration, i.e., ﬁnding the v ariables b y setting all the partial deriv ativ es to zero, can b e parallelized. While we refer readers to Supplemen tary Information Section S2.4 of Pham et al. ( 2016 ) for a detailed discussion, the essence of the MM algorithms in P AFit is to linearize the term log P N j =1 A k j ( t ) η j in E quation 3 and to apply Jensen’s inequality to make the v ariables in Equation 4 separable. As men tioned in the tw o previous sections, the v alues of r and s are automatically selected b y cross-v alidation. In partic ular, the dataset is divid ed in to a learning part { G t } T ∗ 0 and a testing part { G t } T T ∗ , where T ∗ is the smallest p ositiv e n um b er suc h th at the ratio of the n umber of new edges in the learning part, i.e., P T ∗ t =1 P N i =1 z i ( t ), to that of the whole dataset, i.e., P T t =1 P N i =1 z i ( t ), is at least p = 0 . 75 (the default v alue). F or eac h com bination of r and s , w e use the learning data to get the estimated v alue of A and η and plug these estimated v alues in to Equation 3 to calculate the log-lik eliho o d of the testing data. The com bination of r and s that maximize this log-likelihoo d is then c hosen. The metho d then re-estimates A and η using the whole dataset with the c hosen com bination of r and s . 3. P ac k age o v erview The P AFit pac k age provides functions to sim ulate v arious temporal netw ork models, gather essen tial netw ork statistics from ra w input data, and use these summarized statistics in the estimation of A k and the η i v alues. The hea vy computational parts of the pac k age are im- plemen ted in C++ through the use of the Rcpp pack age ( Eddelbuettel and F ran¸ cois 2011 ; 8 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Eddelbuettel 2013 ; Ed delbuettel and Balam uta 2017 ). F urthermore, multi-core mac hine users can enjoy a hassle-free speed up through Op enMP parallelization mechanisms implemen ted in the co de. Apart from the main functions, the pac k age also includes a real-world collaboration net work dataset b et ween scientists in the ﬁeld of complex net w orks. T able 2 summarizes the main functions in the pac k age. In what follo ws, w e will review the main pac k age functions one b y one. Firstly , most well-kno wn temp oral net w ork mo dels based on P A and no de ﬁtness mec ha- nisms can b e easily sim ulated using the pac k age. P AFit implemen ts generate_BA for the BA mo del, generate_ER for the growing ER model, generate_BB for the BB model, and generate_fit_only for the Caldarelli model. These functions hav e many customizable op- tions. F or example, the n um b er of new edges at eac h time-step is a tunable stochastic v ariable; see T able 3 for descriptions of the parameters. They are actually wrappers of the more p o wer- ful generate_net function, whic h simulate s net works with more ﬂexible attachmen t function and no de ﬁtness settings. Eac h temp oral net w ork mo del generation function outputs a PAFi t_net ob ject, whic h is a list with four ﬁelds: type , fitness , PA , and graph . The type ﬁeld is a string indicating the t yp e of netw ork: "directed" or "undirected" . This ﬁeld is "directed" for the net works generated b y the simulation functions. The fitness and PA ﬁelds con tain the true no de ﬁtnesses and P A funct ion, respectively . The graph ﬁeld con tains the generated temp oral net work in a three-column matrix format. Eac h ro w of this matrix is of the form (id_1 id_2 time_stamp) . While id_1 and id_2 are IDs of the source no de and the destination node, resp ectiv ely , time_stamp is the birth ti me of the edge. This is the so-called edge-list format in whic h ra w temp oral netw orks are stored in many on-line rep ositories ( Kunegis 2013 ; Lesk ov ec and Krevl 2014 ). W e will discuss ho w to use functions pro vided by P AFit to con v ert this edge-list format to formats used in other netw ork analysis pack ages in the next section. One can apply the function plot directly to a PAFit_net ob ject to visualize its conten ts. The second functionalit y of P AFit is implemented in get_stati stics . With its core part implemen ted in C++ , thi s function eﬃcien tly collects all temp oral netw ork summary statistics that are needed in the subsequen t estimation of P A and node ﬁtnesses. The input netw ork is assumed to be stored as a PAFit_net ob ject. One can use the function graph_from_file to read an edge-list graph from a text ﬁle in to a PAFit_ne t ob ject, or con v ert an edge-list matrix to this class by the function as.PAFit_net . The edge-list matrix is assumed to b e in the same format as P AFit simulat ed graphs, i.e., eac h row is of the form (id_1 id_2 time_stamp) . The no de IDs are required to b e integers greater than − 1, but need not to b e contiguous. Note that (id -1 t) describes a node id that appeared at time t without any edge. There are no assumptions on the v alues or data t yp es of time_stamp , other than that their c hronological order is the same as what the R function order returns. Examples of timestamps that satisfy this requiremen t are the in teger v ector 1:T , the format ‘yyyy-mm-dd’, and the POSIX time. The get_statistics function automatically handles b oth directed and undirected netw orks. It retu rns a list con taining man y statisti cs that can be used to c haracterize the netw ork gro wth process. Notable ﬁelds are m_tk containing the n umber of new edges that connect to a degree- k no de at time-step t , and node_degree cont aining the degree sequence, i.e., the degree of eac h node at each time-step. The most imp ortan t functionality of P AFit relates to the estimation of the attac hment func- Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 9 F unction Main input Output generate_ER net work parameters net work from the growing ER mo del generate_BA net work parameters net work from the generalized directed BA mo del generate_BB net work parameters net work from the BB mo del generate_fit_only net work parameters net work from the Caldarelli mo del generate_net net work parameters net work from the GT mo del get_statistics PAFit_net ob ject con taining the netw ork PAFit_data ob ject con taining summary statistics Jeong PAFit_net ob ject and PAFit_data ob ject estimated P A function b y Jeong’s metho d Newman PAFit_net ob ject and PAFit_data ob ject estimated P A function b y Newman’s metho d only_A_estimate PAFit_net ob ject and PAFit_data ob ject estimated P A function b y P AFit only_F_estimate PAFit_net ob ject and PAFit_data ob ject estimated no de ﬁtnesses b y P AFit joint_estimate PAFit_net ob ject and PAFit_data ob ject estimated P A function and no de ﬁtnesses by P AFit to_networkDynamic PAFit_net ob ject networkDynamic ob ject from_networkDynamic networkDynamic ob ject PAFit_net ob ject to_igraph PAFit_net ob ject igraph ob ject from_igraph igraph ob ject PAFit_net ob ject graph_to_file PAFit_net ob ject text ﬁle in either edge-list format or gml graph_from_file text ﬁle in edge-list format or gml a PAFit_net ob ject as.PAFit_net edge-list matrix PAFit_net ob ject test_linear_PA degree vector Linear_PA_test_result ob ject T able 2: Summary of the main functions in the P AFit pack age. 10 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness P arameter (default v alue) Description N ( 1000 ) total n umber of no des in the netw ork num_seed ( 2 ) initial graph is a circle with num_seed no des multiple_node ( 1 ) n umber of new nodes added at each time-step m ( 1 ) n umber of edges of a new no de alpha ( 1 ) attac hment exp onent α when we assume A k = k α mode_f ( "gamma" ) distribution of no de ﬁtnesses: gamma, log-normal or p ow er-law s ( 10 ) distribution of no de ﬁtnesses has mean 1 and v ariance 1 /s T able 3: Main parameters in net work generating functions in the P AFit pac k age. tion and no de ﬁtnesses of a temporal netw ork. This is implemented through v arious metho ds. There are three usages: estimation of the attac hment functi on in isolation, estimation of no de ﬁtnesses in isolation, and the join t estimation of the attachme n t function and no de ﬁtnesses. The functions for estimating the attac hment function in isolation are: Jeong for Jeong’s metho d, N ewman for Newman’s metho d, and only_A_estimate for the P AFit method in Pham et al. ( 2015 ). F or estimation of node ﬁtnesses in isolation, only_F_estimate implemen ts a v ariant of the P AFit method in Pham et al. ( 2016 ). F or the join t estimation of the at tac hmen t function and no de ﬁtnesses, we implemen t the full version of the P AFit metho d ( Pham et al. 2016 ) in joint_estimate . The input of these functions is the output ob ject of the function get_statistics . The output ob jects of these functions contain the estimation result s as we ll as some additional information p ertaining to the estimation pro cess. In T able 4 , w e sho w the input parameters of joint_estimate , the most imp ortant function in P AFit . This function takes the temp oral net work net_object and the summarized statistics net_stat as the main inputs. There are three parameters that control th e estimation pro cess: p , stop_cond , and mode_r eg_A . The parameter p speciﬁes the ratio of the num b er of new edges in the learning data to that of the full data in the cross-v alidation step. F ollo wing Pham et al. ( 2016 ), its default v alue is set at 0 . 75. The parameter stop_cond sp eciﬁes the thr eshold  : the iterativ e algorithm will con tinue un til the relativ e diﬀerence in the ob jectiv e function J ( A , η ) b et ween t wo successiv e iterations falls b elo w this threshold ( Pham et al. 2016 ; Zhou et al. 2011 ). The defau lt v alue  = 10 − 8 is set follo wing Pham et al. ( 2016 ). The parameter mode_reg_A sp eciﬁes the regularization term for A k . The default v alue mode_reg_A = 0 corresp onds to the regularization term in Equation 4 ( Pham et al. 2016 ). When mode_reg_A = 1 , the following regularization term is used: − r K − 1 X k =2 w k  log A k +1 − log A k log ( k + 1) − log k − log A k − log A k − 1 log k − log ( k − 1)  2 . (6) Although this regularization term will enforce exactly the form A k = k α , it is signiﬁcan tly slo wer to optimize this regularization term while the improv emen t o v er Equation 4 is little. Finally , although one can roughly assess whether P A exists in the netw ork b y visu al insp ection of the estimated P A function, Handco c k and Jones ( 2004 ) pro vide a metho d to test whether the linear P A-only case, i.e., A k = k and η i = 1, is consisten t with a given degree v ector. W e implemented th is metho d in the function test_linear_PA . This function c ho oses the b est ﬁtted distribution to a giv en degree vector among a set of distributions by comparing the Ak aike Information Criterion (AIC) ( Ak aike 1974 ) or th e Ba yesian Information Crite- Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 11 P arameter Default v alue net_object no default v alue net_stat get_statistics(net_object) p 0 . 75 stop_cond 10 − 8 mode_reg_A 0 T able 4: P arameters of the joint_estimate function and their default v alues. rion (BIC) ( Raftery 1995 ). The set of distributions are Y ule, W aring, P oisson, geometric, and negativ e binomial. The linear P A-only case corresp onds to Y ule or W aring ( Y ule 1925 ; Irwin 1963 ). 4. Related net work pac k ages Since netw ork analysis has b een an imp ortan t ﬁeld for a long time, v arious asp ects of it hav e b een implemen ted in a large n um b er of soft ware pack ages. T o our b est eﬀort, w e ha v e con- ﬁrmed that the non-parametric joint estimation of P A and ﬁtness mec hanisms in a growing net work is not implemen ted elsewhere. Restricting the discussion to pack ages in R , there are some notable implementations of related statistical net w ork mo dels. F or example, sto c hastic blo c k mo dels in the pack ages igraph ( Csardi and Nepusz 2006 ), sna ( Butts 2016 ), blo c kmo d- els ( INRA and Leger 2015 ) and dynsbm ( Matias and Miele 2016 , 2018 ); exp onential random graph mo dels in the pac k ages ergm ( Hun ter et al. 2008 ; Handco c k et al. 2018 ), tergm ( Krivit- sky and Handcock 2018b ), hergm ( Sch wein b erger and Luna 2018 ), b tergm ( Leifeld et al. 2018 ), and RSiena ( Ripley et al. 2018 ); and latent space mo dels in the pack age latentnet ( Krivitsky and Handco c k 2008 , 2018a ). The dynsbm pack age estimates a dynamic stochastic block mo del in whic h no des are assumed to b elong to some laten t groups which can v ary with time, and the edge weigh t b etw een t w o no des at an y time follows some parametric distribution. The pac k age can deal with b oth discrete and contin uously weigh ted edges. The igraph pac k age con tains the functions sample_pa and sample_growing which are the equiv alents of generate_BA and generate_ER in P AFit , respectively . Although igraph also generates netw orks from many other mechanisms, it do es not con tain any function for esti- mating the P A function and/or no de ﬁtnesses. It does con tain man y functionali ties for dealing with sto c hastic blo c k mo dels and v arious other netw ork mo dels. Some of the ab o v e pac k ages are included in the extensiv e meta-pac k age statnet ( Handcock et al. 2008 , 2016 ). In statnet , pac k ages that deal with temp oral netw orks are: netw orkDy- namic ( Butts et al. 2016 ), tsna ( Bender-deMoll and Morris 2016 ), and tergm . The net work- Dynamic pack age provides the networkDynamic class to store dynamic netw orks and v arious functions t o manipulate them. The tsna pack age calculates many temp oral statistics of a dynamic netw ork stored in a networkDynamic ob ject. The closest pac k ages to P AFit that estimate P A in a temp oral net w ork are tergm and RSiena , whic h implemen t sophisticated contin uous-time and discrete-time Marko v mo dels. Regarding P A, all the implement ed options in tergm and RSiena p ertain to the parametric estimation of A k , in con trast to the non-parametric estimation methods implemen ted in P AFit . Although 12 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness it migh t b e theoretically p ossible to describe a non-p arametric A k function in tergm and RSiena , they contai n no regularization terms for the join t estimation of the non-parametric P A function and no de ﬁtnesses. Joint estimation without regularization terms is very likely unable to reco ver the true parameters, since the num b er of parameters is t ypically high. On the other hand, P AFit is speciﬁcally designed for estimating A k non-parametrically with no de ﬁtnesses, since it has t wo regularization terms in Equations 4 and 5 , together with the cross-v alidation step for selecting suitable regularization parameters. P AFit provides functionalities to comm unicate with existing net w ork analysis pac k ages. Using to_networkDynamic and from_networkDynami c , one can con vert a PAFit_net ob ject to a net workDynamic ’s networkDynami c ob ject and vice v ersa. The functions to_igraph and from_igraph do the same for igraph ’s igraph ob jects. The extensive functions of statnet and igraph pac k ages can then b e used. One can also output the graph stored in a PAFit_net ob ject to the univ ersal gml format b y the function graph_to_file , or read from a gml ﬁle b y the function graph_from_file . 5. P ac k age usage Here w e sh o w three usages of P AFit : the estimation of the attac hment function A k in i solation in Section 5.1 , the estimation of no de ﬁtnesses η i in isolation in Section 5.2 , and the joint estimation of A k and the η i v alues in Section 5.3 . 5.1. A ttac hment function estimation First w e generate a net work from a directed v ersion of the BA mo del, called Price’s mo del ( Price 1976 ). F rom the initial graph with t wo nodes and one edge, one new no de with m = 5 new edges is added at eac h time-step until the num b er of no des is N = 1000. R> set.seed(1) R> library("PAFit") R> sim_net_1 <- generate_B A(N = 1000, m = 5) Recall that A k is linear in the BA mo del, i.e., the attac hment exp onen t α is equal to 1, and the no de ﬁtnesses are uniform. One can observ e the emergence of h ubs in this net w ork by visualizing the generated graph at v arious time-steps by the function plot . The follo wing script plots the netw ork snapshot at time t = 1 in Figure 1a and its corresp onding degree distribution in Figure 1d : R> plot(sim_net_1, slice = 1, arrowhead. cex = 3, vertex.cex = 3) R> plot(sim_net_1, slice = 1, plot = "degree", cex = 3, cex.axis = 2, + cex.lab = 2) Note that if the net w ork is directed, as it is in this example, the option plot = "degree" will plot the in-degree distribution. In the same w a y , w e plot netw ork snapshots at time t = 10 and t = 100 in Figures 1b and 1c and their corresp onding degree distributions in Figures 1e and 1f . The next step is to use the function get_statistics to get the summary statistics for the temp oral netw ork: Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 13 (a) Snapshot at t = 1. (b) Snapshot at t = 10. (c) Snapshot at t = 100. Degree + 1 Frequency 1 2 1 2 ● ● (d) Degree distribution at t = 1. Degree + 1 Frequency 1 2 5 10 20 1 2 5 ● ● ● ● ● (e) Degree distribution at t = 10. Degree + 1 Frequency 1 10 100 1 10 ● ● ● ● ● ● ● ● ● ● ● ● (f ) Degree distribution at t = 100. Figure 1: Net work snapshots and their corresp onding in-degree distributions at time-steps t = 1, 10, and 100. The temp oral netw ork, sim_net_1 , is generated from Price’s mo del with total n umber of no des N = 1000. R> stats_1 <- get_statisti cs(sim_net_1) With stats_1 containing al l the needed summary statistics, w e then apply the three methods of estimating the attachmen t function in isolation: R> result_Jeong <- Jeong(sim_net_1, stats_1) R> result_Newman <- Newman(sim_net_1, stats_1) R> result_PA_only <- only_A_estimate(sim_net_1, stats_1) Let us explain result_PA_onl y in more detail. Information on the estimated results as w ell as the estimation pro cess can b e viewed by in v oking summary : R> summary(result_PA_only) Estimation results by the PAFit method. Mode: Only the attachment function was estimated. Selected r parameter: 0.1 14 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Estimated attachment exponent: 1.001139 Attachment exponent ± 2 s.d.: (0.9908913, 1.011387) ----------------------- -------------------- Additional information: Number of bins: 50 Number of iterations: 63 Stopping condition: 1e-08 As stated in Section 2 , the P AFit method ﬁrst ﬁ nds the r parameter, whic h regularizes the P A function, by cross-v alidation, and then estimates A k using the chosen r . The estimated func- tion can b e accessed via $estimate _result$k and $estimate_r esult$A of result_PA_only . F rom this estimated function, the attac hment exp onent α (when w e assume A k = k α ) and its standard deviation are also estimated. Here ˆ α is 1 . 001 ± 0 . 01 as w e can see from the output of summary . These v alues can be accessed via $estimate_result$alpha and $estimate_result$ci . The output also reveals that P AFit applies binning with 50 bins by default. In this pro cedure, w e divide the range of k into bins consisting of consecutive degrees, and assume that all k in a bin ha ve the same v alue of A k . Binning is an imp ortan t regularization technique that signiﬁcan tly stabilizes the estimation of the attac hment function ( Pham et al. 2015 ). In this example, the center of each bin is stored in the ﬁeld $center_k of stats_1 . Since the cen ter of a bin is also the P A v alue corresp onding to that bin in the linear P A case, w e can plot the estimated attach men t funct ion together with the true attac hmen t function using the follo wing script, which pro duces the plot of Figure 2a . The options min_A and max_A sp ecify the minim um and maximum v alues in the vertical axis of the plot, resp ectiv ely . R> plot(result_PA_only,stats_1, min_A = 1, max_A = 2000, + cex = 3, cex.axis = 2, cex.lab = 2) R> lines(stats_1$center_k, stats_1$ce nter_k, col = "red") The estimation results of Jeong’s metho d and Newman’s metho d can b e plotted in a similar w ay , and are shown in Figures 2b and 2c , resp ectiv ely . Ov erall, Newman’s method and P AFit estimate the attac hmen t function A k ab out equally w ell, while Jeong’s metho d is found to underestimate the function and also exhibits high v ari- ance. This can also b e observed in the estimated attachmen t exp onen t of the three metho ds: Newman’s method and P AFit recov er the true α , while Jeong’s metho d underestimates it. Note that in P AFit we also obtain the interv al of the estimated A k ± 2 s.d. (ligh tblue region in Figure 2a ), which are una v ailable in the other t w o metho ds. This is a signiﬁcan t adv antage of P AFit ov er the other tw o metho ds since it allo ws the user to quanti fy uncertain ties in the result. 5.2. No de ﬁtnesses estimation Here w e estimate no de ﬁtnesses from a BB mo del generated netw ork with the assumption that A k = k . T o demonstrate the functionality of the pac k age, we generate a BB net work with a nonstandard setting: R> sim_net_2 <- generate_B B(N = 1000, num_seed = 100, multiple_node = 100, + m = 15, s = 10) Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 15 Degree k Attachment function 1 10 100 1000 1 10 100 1000 (a) P AFit ( ˆ α = 1 . 001 ± 0 . 01) Degree k Attachment function 1 10 100 1000 1 10 100 1000 (b) Jeong’s metho d ( ˆ α = 0 . 96 ± 0 . 07) Degree k Attachment function 1 10 100 1000 1 10 100 1000 (c) Newman’s metho d ( ˆ α = 1 . 01 ± 0 . 02) Figure 2: Estimating the attac hment function in isolation from sim_net_1 . The true attach- men t function is A k = k α with attac hment exponent α = 1. W e also show the estimated α and the in terv al of the estimated α ± 2 s.d. provided by each metho d. This net work gro ws from a seed netw ork with N 0 = 100 no des where the no des form a line graph. The v alue of N 0 can be sp eciﬁed by num_seed . At each time-step w e add n = 100 new no des where each no de has m = 15 new edges. The v alues of n and m can be sp eciﬁed via multiple_node and m , respectively . The total n umber of no des in the ﬁnal net work is N = 1000. Finall y , the distribution from which w e generate node ﬁtnesses is the Gamma distribution with mean 1 and inv erse v ariance s = 10. Next w e get the netw ork summary statistics and then apply the estimation function: R> stats_2 <- get_statisti cs(sim_net_2) R> result_fit_only <- only_F_estimate(sim_net_2, stats_2) R> plot(result_fit_only, stats_2, plot = "f", + cex = 2, cex.axis = 1.5, cex.lab = 1.5) The ﬁnal line of the snipp et generates the distribution of estimated no de ﬁtnesses sho wn in Figure 3a . The function only_F_estimate estimates no de ﬁtnesses under the assumption that A k = k b y default. But one also can estimate node ﬁtnesses in the Caldarelli mo del, i.e., A k = 1 for all k , with the option model_A = "Constant" . The function only_F_estimate w orks b y ﬁrst ﬁnding the estimated v alue ˆ s of s b y cross-v alidation, and then using ˆ s in the subsequent estimation of no de ﬁtnesses. The summar y information of the estimati on result can b e view ed b y in voking summary : R> summary(result_fit_only) Estimation results by the PAFit method. Mode: Only node fitnesses were estimated. Selected s parameter: 8 ----------------------- -------------------- Additional information: 16 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Fitness Density 0.26 1.00 1.94 0 1 2 3 (a) Distribution of estimated ﬁtnesses. T r ue fitness Estimated fitness 0.2 0.5 1 2 0.2 0.5 1 2 (b) Estimated ﬁtnesses versus true ﬁtnesses. Figure 3: Estimating node ﬁtnesses in isolation from sim_net_2 , whic h is generated with attac hment function A k = k . The true node ﬁtnesses are sampled from a Gamma distribution with mean 1 and in v erse v ariance 10. The attachmen t function in the estimation method is ﬁxed at A k = k . In panel b, we only plot no des for which the num b er of acquired new edges is at least 5. Number of bins: 50 Number of iterations: 19 Stopping condition: 0.00000001 The method slightl y under-estimated s . W e can c heck whether the no de ﬁtnesses were esti- mated w ell by plotting the estimated ﬁtnesses v ersus the true ﬁtnesses b y running the follo wing command: R> plot(result_fit_only, stats_2, true_f = sim_net_2$fitness, + plot = "true_f", cex = 2, cex.axis = 1.5, cex.lab = 1.5) This will pro duce the plot of Figure 3b . It turns out that the estimated node ﬁtnesses agree prett y w ell with the true no de ﬁtnesses. W e note that the light blue band around the ˆ η i v alues depicts the interv als of ˆ η i ± 2 s.d.. The upp er and lo w er v alues can b e accessed via $estimate_result$upper_ f and $estimate_result$lower_f of result_fit_on ly , resp ec- tiv ely . 5.3. Join t estimation of the attac hmen t function and no de ﬁtnesses Here we sho w how to estimate the attac hmen t function and node ﬁtnesses sim ultaneously . W e need to assume in Section 5.1 the equalit y of all η i for the estimation of A k in isolation, and in Section 5.2 a sp eciﬁc functional form of A k for the estimation of η i in isolation. Such assumptions b ecome unnecessary when w e perform join t estimation, since the appropriate Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 17 functional forms will be automatically enforced through the regularization parameters r and s , whic h will b e chosen by cross-v alidation. W e recommend the joint estimation pro cedure as the standard estimation pro cedure in this pac k age, unless there is a sp eciﬁc reason to justify the one or the other of these assumptions. This time w e generate a net work in which the attachmen t function is A k = k α with α = 0 . 5 and the Gamma distribution of no de ﬁtnesses has mean 1 and v ariance 1 /s with s = 10: R> sim_net_3 <- generate_n et(N = 1000, num_seed = 100, multiple_node = 100, + m = 15, s = 10, alpha = 0.5) W e then apply joint_estimation : R> stats_3 <- get_statisti cs(sim_net_3) R> result_PAFit <- joint_estimate(sim_net_3, stats_3) R> summary(result_PAFit) Estimation results by the PAFit method. Mode: Both the attachment function and node fitness were estimated. Selected r parameter: 10 Selected s parameter: 18.75 Estimated attachment exponent: 0.5168941 Attachment exponent ± 2 s.d.: (0.5097277, 0.5240605 ) ----------------------- -------------------- Additional information: Number of bins: 50 Number of iterations: 596 Stopping condition: 0.00000001 W e can plot the estimated attachmen t function as in Figure 4a , and the distri bution of the ˆ η i ’s as in Figure 4b with the follo wing code: R> plot(result_PAFit, stats_3, min_A = 1, max_A = 40, + cex = 3, cex.axis = 2, cex.lab = 2) R> lines(stats_3$center_k, stats_3$ce nter_k^0.5, col = "red") R> plot(result_PAFit, stats_3, plot = "f", + cex = 3, cex.axis = 2, cex.lab = 2) Concerning the estimated v alues, while s is sligh tly o ver-estimated by ˆ s = 18 . 75, ˆ α = 0 . 52 ± 0 . 01 is a go o d estimate of α . W e can also plot the estimated ﬁtnesses v ersus the true ﬁtnesses as in Figure 4c with the follo wing co de: R> plot(result_PAFit, stats_3, true_f = sim_net_3$fitness, plot = "true_f", + cex = 3, cex.axis = 2, cex.lab = 2) Since the mean of ˆ η i ’s is normalized to 1, the o ver-estimation of s leads to o ver-e stimation of lo w-v alue ﬁtnesses and under-estimation of high-v alue ﬁtness, as can be seen in the plot of Figure 4c . 18 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Degree k Attachment function 1 10 100 1 10 (a) Estimated attachmen t func- tion ( ˆ α = 0 . 52 ± 0 . 01). Fitness Density 0.43 1.00 1.89 0 1 2 3 4 (b) Distribution of estimated no de ﬁtnesses. T rue fitness Estimated fitness 0.1 0.2 0.5 1 2 0.1 0.2 0.5 1 2 (c) Estimated and true η i v alues. Figure 4: Join t estimation of the attac hment function and no de ﬁtnesses from sim_net_3 . The red line in panel a is the true attachmen t function A k = k 0 . 5 . The true no de ﬁtnesses are sampled from a Gamma distribution with mean 1 and inv erse v ariance s = 10. W e show how joint estimation impro ves on estimating either no de ﬁtnesses in isolation (Fig- ures 5a and 5b ) or the P A function in isolation (Figure 5c ). F or estimating no de ﬁtnesses in isolation, t wo cases are sho wn: the result when we assume the BB mo del in whic h A k = k (Figure 5a ) and the result when w e assume the Caldarelli mo del in which A k = 1 (Figure 5b ). In either case, the estimated no de ﬁtnesses are visually worse than those of the joint es- timation in Figure 4c . Similarly , estimating the P A function in isolation apparently led to o verestimation of the P A func tion in t he region of l arge k . T o conclude, estimating either no de ﬁtnesses or the P A function in isolation would lik ely b e w orse than the join t estimation, if the underlying assumptions ab out the true no de ﬁtnesses and the true P A function are wrong. T rue fitness Estimated fitness 0.1 1 0.1 1 (a) Estimated and true η i v alues when assuming A k = k . ˆ s = 3 . 2. T rue fitness Estimated fitness 0.1 1 0.1 1 (b) Estimated and true η i v alues when assuming A k = 1. ˆ s = 4. Degree k Attachment function 1 10 100 1 10 (c) Estimated P A function in iso- lation ( ˆ α = 0 . 60 ± 0 . 05). Figure 5: Estimation of no de ﬁtnesses and the P A function in isolation from sim_net_3 . The red line in panel c is the true attachme n t function A k = k 0 . 5 . The true no de ﬁtnesses are sampled from a Gamma distribution with mean 1 and in verse v ariance s = 10. Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 19 6. Sim ulation Study In this section, w e presen t the results of a simu lation study that we conducted to assess the p erformance of the joint_estimation function. W e assume the functional form A k = k α for the attachm en t function. T o cov er the spectrum of P A and anti-P A phenomena, w e choose four v alues for α : − 0 . 5, 0, 0 . 5, and 1. W e sample no de ﬁtnesses from a Gamma distribution with mean 1 and v ariance 1 /s . Three v alues for s we chose are: 5, 20, and 80. While small v alues of s lead to widely v aried no de ﬁtnesses, large v alues of s leads to highly concentrat ed no de ﬁtnesses. F or eac h com bination of α and s , w e generated M = 50 net works, and estimated A k and s for eac h net work using the joint_estimation function. W e then ﬁt the form ˆ A k = k α to ˆ A k in order to estimate α . W e then compared the means of the estimation results of α and s with the true v alues. Eac h simulated net work has a total of 1000 nodes, where the initial graph has 200 no des and 50 new nodes are added at eac h time-step for a total of 10 time-steps. Eac h new node has 50 new edges. The results are sho wn in Figure 6 . The attac hment exp onen t α was est imated reasonably w ell across all combi nations of α and s . Except for the cases in whic h the attach men t function gro ws fast ( α = 1) or the case in which node ﬁtnesses ha ve high v ariance ( s = 5), the estimated v alues of s w ere also acceptable. The case of s = 5 resulted in a slight ov er-estimation, which is p erhaps attributable to high no de ﬁtnesses v ariance. W e also notice that s w as slightly o ver-estimated when α = 1, whic h may be caused by the fast gro wing rate of the P A function. One also notices that the interv als of ˆ s ± 2 s.d. are m uc h larger than those for ˆ α . The ab o ve observ ations imply that it is muc h harder to estimate s than α . 7. Analysis of a collab oration net w ork b et w een scien tists In this section, w e demonstrate the complete work-ﬂo w for the join t estimation of A k and η i on a collab oration net work b et ween scientists from the ﬁeld of complex netw orks. In this net work, nodes represen t scien tists and an undirected edge exists b et ween them if, and only if, they hav e coauthored a paper. The degree of a no de represen ts the num b er of collab orators of a scien tist, since m ultiple edges are not considered. The temporal net w ork is stored in coauthor.net , and the names of the scien tists are store d in coauthor.author_id . The net work without timestamps was compiled by Mark Newman from the bibliographies of tw o review articles on complex net works ( Newman 2006 ). P aul Sheridan, the second author of the present work, augmented the dataset with time stamps. More information on the dataset can b e found in the pack age reference manual. The ﬁrst step in the analysis is to conv ert the edge-list matrix coauthor.net to a PAFit_net ob ject, and get the summary statistics using the function get_statistics . R> set.seed(1) R> true_net <- as.PAFit_ne t(coauthor.net, type = "undirected") R> net_stats <- get_statis tics(true_net) R> summary(net_stats) Contains summary statistics for the temporal network. Type of network: undirected 20 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● T r ue Estimate 5 20 80 124 −0.5 0.0 0.5 1.0 α s Figure 6: Sim ulation result. F or eac h combin ation of α and s , we generated 50 net works using attac hment function A k = k α and a Gamma distribution with mean 1 and inv erse v ariance s for no de ﬁtnesses. Each red p oint indicates the a v erage of the esimated α and the selected s o ver 100 sim ulations for the corresp onding com bination of α and s . At each red p oin t, the horizon tal/verticalcal bar indicates the interv al of the estimated α /selected s ± 2 s.d., resp ectiv ely . Number of nodes in the final network: 1498 Number of edges in the final network: 5698 Number of new nodes: 1358 Number of new edges: 1255 Number of time-steps: 145 Maximum degree: 37 Number of bins: 38 The temp oral netw ork grew in 145 time-steps from an initial netw ork at September 2000, to Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 21 a ﬁnal state at September 2007. The resolution of those time-steps is mon thly . The ﬁnal net work has 1498 scien tists with 5698 collaborations among them. One can plot the degree distribution of the ﬁnal snapshot as follows: R> plot(true_net,plot = "degree") This will pro duce the plot of Figure 7 . Degree + 1 Frequency 2 5 10 20 1 10 100 ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● ● Figure 7: Degree distribution of the ﬁnal snapshot of the collab oration netw ork. Before any estimation of the P A function and no de ﬁtnesses is carried out, one can test whether the linear P A-only case is consistent with the observ ed degree distribution of the collab oration netw ork: R> test_linear_PA_result <- test_linear_PA(net_stats$final_deg) R> print(test_linear_PA_result) This will generate T able 5 . In this case, since the Y ule and W aring distributions are not the b est mo dels, one can conclude that the linear P A-only case is inconsistent with the observ ed degree v ector. T o further in vestigate the P A function and no de ﬁtnesses, w e in vok e the joint_estimate function for joint estimation: 22 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Mo del Log-lik eliho o d AIC BIC nb -2415.08 4840.16 4866.72 pois -2468.63 4945.27 4966.51 waring -2557.49 5124.99 5151.55 geom -2898.64 5811.27 5848.45 yule -2959.97 5929.95 5956.51 T able 5: The result of applying the function test_linear_PA to the observ ed degree vector of the collab oration netw ork. This function calculates the AIC and BIC of ﬁv e mo dels: Y ule ( yule ), W aring ( waring ), P oisson ( pois ), geometric ( geom ), and negativ e binomial ( nb ) when ﬁtting them to the observed degree ve ctor. R> full_result <- joint_estimate(true_net, net_stats) R> summary(full_result) Estimation results by the PAFit method. Mode: Both the attachment function and node fitness were estimated. Selected r parameter: 10 Selected s parameter: 45 Estimated attachment exponent: 0.9951764 Attachment exponent ± 2 s.d.: (0.9715202, 1.018833) ----------------------- -------------------- Additional information: Number of bins: 38 Number of iterations: 607 Stopping condition: 0.00000001 W e can visualize the estimated attac hment function and the distribution of estimated no de ﬁtnesses by: R> plot(full_result, net_stats, line = "TRUE", + cex = 2, cex.axis = 1.5, min_A = 1, max_A = 1000, cex.lab = 1.5) R> plot(full_result, net_stats, plot = "f", + cex = 2, cex.axis = 1.5, cex.lab = 1.5) This snipp et will sequen tially generate the plots of Figure s 8a and 8b . When other options are set at their default v alues, the option line = "TRUE" will plot the function ˆ A k = k ˆ α , whic h is a straigh t line on a double logarithmic scale. The best ﬁt mo del when we p erformed join t estimation is close to the BB mo del. In Figure 8a , the estimated A k is an increasing function with ˆ α = 1 . 00 ± 0 . 05. W e tak e this as evidence in fav or of the presence of linear P A in the col lab oration net work. Let us take a concrete example: a netw ork scien tist with tw ent y collab orators has roughly twice the c hance to get a new collaborator compared with someone who only has ten collaborators, assuming they ha ve the same ﬁtness. F or comparison’s sake, we also plot the estimation results of A k in isolation using Jeong’s metho d, Newman’s metho d, and P AFit in Figures 8c , 8d , and 8e , resp ectiv ely: R> result_Jeong <- Jeong(true_net, net_stats) R> result_Newman <- Newman(true_net, net_stats) Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 23 Degree k Attachment function 1 2 5 10 20 1 10 100 1000 (a) Estimated A k with joint estimation by P AFit ( ˆ α = 1 . 00 ± 0 . 05). Fitness Density 0.77 1.00 2.31 0 1 2 3 4 (b) Histogram of estimated no de ﬁtnesses. Degree k Attachment function 1 2 5 1 10 100 1000 (c) Jeong’s metho d ( ˆ α = 0 . 77 ± 0 . 80). Degree k Attachment function 1 2 5 10 20 1 10 100 1000 (d) Newman’s metho d ( ˆ α = 1 . 39 ± 0 . 56). Degree k Attachment function 1 2 5 10 20 1 10 100 1000 (e) P AFit ( ˆ α = 1 . 05 ± 0 . 07). Figure 8: Estimation of the attac hment function and no de ﬁtnesses for the netw ork scientist collab oration net work. Panels a and b sho w the joint estimation result, while panels c, d, and e sho w the results when we estimated the P A function in isolation. R> result_onlyA <- only_A_estimate(true_net, net_stats) R> plot(result_Jeong, net_stats, line = "TRUE", min_A = 1, max_A = 1000, + cex = 3, cex.axis = 2, cex.lab = 2) R> plot(result_Newman, net_stats, line = "TRUE", min_A = 1, max_A = 1000, + cex = 3, cex.axis = 2, cex.lab = 2) R> plot(result_onlyA, net_stats, line = "TRUE", min_A = 1, max_A = 1000, + cex = 3, cex.axis = 2, cex.lab = 2) The options min_A = 1 and max_A = 1000 specify the range of the v ertical axis and are needed for making the plots comparable. The high v ariance of ˆ α from either Jeong’s metho d or Newman’s metho d would render qual- 24 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness itativ e assessments of the P A function inconclusive, if one relied only on those metho ds: one could not conﬁden tly ascertain whether the P A function is sub-linear, linear, or sup er-linear in nature. W e notice that the estimated A k obtained from the joint estimation resem bles that of Figure 8e , when w e estimate it in isolation. The reason is that estimated no de ﬁtnesses in Figure 8b are highly concen trated around the mean. Thus their distribution is not very far from the case when all the ﬁtnesses are 1. Nev ertheless, w e observ e that the estimated A k from the joint estimation is reduced when compared with that of Figure 8e . This is exp ected since in the join t estimation, a p ortion of the connection probabilit y in Equation 1 is explain ed b y node ﬁtness. Although the distribution in the plot of Figure 8b is concentrated around its mean, we notice that its righ t tail is rather long, which is a sign that this tail con tains interesting information. W e can extract the information from this region by ﬁnding the topmost ‘ﬁttest’ net work scien tists. This can b e done as follo ws: R> sorted_fit <- sort(full_result$estimate_result$f, decreasing = TRUE) R> top_scientist <- coauthor.author_id[names(sorted_fit), ] R> print(cbind(sorted_fit[1:10], top_scientist[1:10, 2])) This snipp et will produce the results show in T able 6 . The table sho ws the ten net work scien tists that we found to ha v e the highest ability to attract new collaborators from the ﬁeld. An yone acquaint ed with the ﬁeld will recognized a num b er of familiar faces. F or example, at Rank Estimated ﬁtness Name 1 1.42 BARABASI, A 2 1.35 NEWMAN, M 3 1.26 JEONG, H 4 1.25 LA TORA, V 5 1.24 ALON, U 6 1.23 OL TV AI, Z 7 1.23 YOUNG, M 8 1.22 W ANG, B 9 1.21 SOLE, R 10 1.21 BOCCALETTI, S T able 6: T en topmost ‘ﬁttest’ netw ork scientists in the ﬁeld of complex net w orks. the top of the list is none other than Alb ert-L´ aszl´ o Barab´ asi, who intr o duced the BA mo del. Num b er tw o and num b er three are Mark Newman and Haw o ong Jeong, who resp ectiv ely are the authors of the ep on ymously named Newman’s metho d and Jeong’s metho d. 8. Conclusion W e in tro duced the R pack age P AFit , whic h pro vides a comprehensiv e framew ork for the non-parametric estimation of P A and no de ﬁtness mec hanisms in the growt h of temp oral complex net works. In summary , P AFit implemen ts functions to simul ate v arious te mp oral net work mo dels based on these t wo mechanisms, gathers summary statistics from real-w orld temp oral net work datasets, and estimates non-parametrically the attachmen t function and Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 25 no de ﬁtnesses. W e provided a n um b er of simu lated examples, as well as a complete analysis of a real-w orld collab oration netw ork. Ac kno wledgmen ts This w ork was supported in part b y gran ts from the Japan So ciet y for the Promotion of Science KAKENHI [JP16J03918 to T.P . and 16H01547 to H.S.]. References Ak aike H (1974). “A New Lo ok at the Statistical Mo del Identiﬁcation.” IEEE T r ansactions on A utomatic Contr ol , 19 (6), 716–723. doi:10.1109/TAC.1974.1100705 . Alb ert R, Barab´ asi A (1999). “Emergence of Scaling in Random Net works.” Scienc e , 286 , 509–512. doi:10.1126 /science.286.5439.509 . Alb ert R, Jeong H, Barab´ asi A (2000). “Error and A ttack T olerance of Complex Net works.” Natur e , 406 (6794), 378–382. doi:10.1038/35019019 . Allison PD (1980). “Estimation and T esting for a Marko v Mo del of Reinforcemen t.” So cio- lo gic al Metho ds & R ese ar ch , 8 (4), 434–453. doi:10.1177/004912 418000800405 . Barab´ asi AL, Albert R, Jeong H (2000). “Scale-F ree Characteristics of Random Netw orks: The T op ology of The W orld-Wide W eb.” Physic a A: Statistic al Me chanics and its Applic ations , 281 , 69 – 77. doi:10.1016/S0378- 4371(00)00018- 2 . Bender-deMoll S, Morris M (2016). tsna : T o ols for T emp or al So cial Network A nalysis . R pac k age version 0.2.0, URL https://CRAN.R- project.org/package=tsna . Bez´ ak ov´ a I, Kalai A, San thanam R (2006). “Graph Mo del Selection Using Maxim um Lik e- liho od.” In Pr o c e e dings of the 23r d International Confer enc e on Machine L e arning , ICML ’06, pp. 105–112. A CM, New Y ork, NY, USA. doi:10.1145/1143844.1143858 . Bianconni G, Barab´ asi A (2001). “Comp etition and Multiscaling in Ev olving Net works.” Eur ophysics L etters , 54 , 436. doi:10.1209 /epl/i2001- 00260- 6 . Borgs C, Cha y es J, Dask alakis C, Ro c h S (2007). “First to Mark et is not Everyth ing: an Analysis of Preferential Attac hmen t with Fitness.” In Pr o c e e dings of the thirty-ninth annual A CM symp osium on The ory of c omputing . doi:10.1145/1250790.125 0812 . Bo x-Steﬀensmeier JM, De Bo ef S (2006). “Rep eated Ev en ts Surviv al Mo dels: the Conditional F railty Mo del.” Statistics in Me dicine , 25 (20), 3518–3533. doi:10.1002/sim.2434 . Butts CT (2016). sna : T o ols for So cial Network A nalysis . R pack age v ersion 2.4, URL https://CRAN.R- project.org/package= sna . Butts CT, Leslie-Cook A, Krivitsky PN, Bender-deMoll S (2016). networkDynamic : Dy- namic Extensions for Network Obje cts . R pack age v ersion 0.9.0, URL https://CRAN. R- project.org/package=networkDynami c . 26 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Caldarelli G (2007). Sc ale-F r e e Networks . Oxford Universiy Press. Caldarelli G, Capo cci A, De Los Rios P , Mu ˜ noz MA (2002). “Scale-F ree Net works from V arying V ertex Intri nsic Fitness.” Physic al R eview L etters , 89 , 258702. doi:10.1103/ PhysRevLett.89.258702 . Calla wa y DS, Hopcroft JE, Klein b erg JM, Newman MEJ, Strogatz SH (2001). “Are Randomly Gro wn Graphs Really Random?” Physic al R eview E , 64 , 041902. doi:10.1103/PhysRevE. 64.041902 . Clauset A, Shalizi CR, Newman MEJ (2009). “Po wer-La w Distributions in Empirical Data.” SIAM R eview , 51 (4), 661–703. doi:10.1137/070 710111 . Coleman JS (1964). Intr o duction to Mathematic al So ciolo gy . F ree Press of Glenco e London. Csardi G, Nepusz T (2006). “The igraph Softw are Pac k age for Complex Net work Research.” InterJournal , Complex Systems , 1695. URL http://igraph.org . Dorogo vtsev SN, Mendes JFF (2003). Evolution of Networks: F r om Biolo gic al Nets to the Internet and WWW . Oxford Universit y Press, Inc., New Y ork, NY, USA. Dunne JA, Williams RJ, Martinez ND (2002). “F o o d-W eb Structur e and Net w ork Theory: the Role of Connectance and Size.” Pr o c e e dings of the National A c ademy of Scienc es , 99 (20), 12917–12922. doi:10.1073 /pnas.192407699 . Eddelbuettel D (2013). Se amless R and C++ Inte gr ation with R cpp . Springer-V erlag, New Y ork. Eddelbuettel D, Balamuta JJ (2017). “Extending R with C++ : a Brief Introduction to Rcpp . ” Pe erJ Pr eprints , 5 , e3188v1. doi:10.7287/p eerj.preprints.3188v1 . Eddelbuettel D, F ran¸ cois R (2011). “ Rcpp : Seamless R and C++ Integration.” Journal of Statistic al Softwar e, A rticles , 40 (8), 1–18. doi:10.18637/jss.v04 0.i08 . Eom YH, Jo HH (2014). “Generalized F riendship Parado x in Complex Netw orks: the Case of Scien tiﬁc Collab oration.” Scientiﬁc R ep orts , 4 . doi:10.1038/srep04603 . Erd ¨ os P , R´ enyi A (1959). “On Random Graphs.” Public ationes Mathematic ae Debr e c en , 6 , 290–297. F eld SL (1991). “Why Y our F riends Hav e More F riends Than Y ou Do.” Americ an Journal of So ciolo gy , 96 (6), 1464–1477. doi:10.1086/229693 . Guetz AN, Holmes SP (2011). “Adapt iv e Imp ortance Sampling for Netw ork Growth Models.” A nnals of Op er ations R ese ar ch , 189 (1), 187–203. doi:10.1007/s10479 - 010- 0685- 2 . Handco c k MS, Hunter DR, Butts CT, Goo dreau SM, Krivitsky PN, Bender-deMoll S, Morris M (2016). statnet : Softwar e T o ols for the Statistic al A nalysis of Network Data . The Statnet Pro ject ( http://www.statnet.org ). R pack age v ersion 2016.9, URL https:// CRAN.R- project.org/package=statnet . Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 27 Handco c k MS, Hun ter DR, Butts CT, Go o dreau SM, Krivitsky PN, Morris M (2018). er gm : Fit, Simulate and Diagnose Exp onential-F amily Mo dels for Networks . The Statnet Pro ject ( http://www.statnet.org ). R pack age version 3.9.4, URL https://CRAN.R- project. org/package=ergm . Handco c k MS, Hunter DR, Butts CT, Goo dreau SM, Morris M (2008). “ statnet : Soft w are T o ols for the Represen tation, Visualization, Analysis and Sim ulation of Netw ork Data.” Journal of Statistic al Softwar e, A rticles , 24 (1), 1–11. doi:10.18637/jss.v024.i01 . Handco c k MS, Jones JH (2004). “Lik eliho o d-Based Inference for Sto c hastic Models of Sexual Net work F ormation.” The or etic al Population Biolo gy , 65 (4), 413 – 422. doi:10.1016/j. tpb.2003.09.006 . Demography in the 21st Century . Hun ter D, Lange K (2000). “Quanti le Regression via an MM Algorithm.” Journal of Compu- tational and Gr aphic al Statistics , pp. 60–77. doi:10.2307/1390613 . Hun ter D, Lange K (2004). “A T utorial on MM Algorithms.” The A meric an Statistician , 58 , 30–37. doi:10.1198 /0003130042836 . Hun ter DR, Handco c k MS, Butts CT, Go o dreau SM, Morris M (2008). “ ergm : a Pac k age to Fit, Sim ulate and Diagnose Exp onen tial-F amily Mo dels for Net works.” Journal of Statistic al Softwar e, Articles , 24 (3), 1–29. doi:10.1863 7/jss.v024.i03 . INRA, Leger JB (2015). blo ckmo dels : L atent and Sto chastic Blo ck Mo del Estimation by a ‘V- EM’ A lgorithm . R pack age v ersion 1.1.1, URL https://CRAN.R- project.org/package= blockmodels . Irwin JO (1963). “The Place of Mathematics in Medical and Biological Statistics.” Journal of the R oyal Statistic al So ciety A , 126 (1), 1–45. doi:10.2307/2982445 . Jeong H, N´ eda Z, Barab´ asi A (2003). “Measuring Preferen tial A ttachmen t in Evolving Net- w orks.” Eur ophysics L etters , 61 (61), 567–572. doi:10.1209/epl/ i2003- 00166- 9 . Kelly PJ, Lim LL Y (2000). “Surviv al Analysis for Recurren t Ev ent Data: an Application to Childho od Infectious Diseases.” Statistics in Me dicine , 19 (1), 13–33. doi:10.1002/(SICI) 1097- 0258(20000115)19:1< 13::AID- SIM279> 3.0.CO;2- 5 . Kong J, Sarshar N, Royc howdh ury V (2008). “Exp erience v ersus T alent Shap es the Structure of the W eb.” Pr o c e e dings of the National A c ademy of Scienc es of the USA , 37 , 105. doi: 10.1073/pnas.0805921105 . Krapivsky P , Rodgers G, Redner S (2001). “Organization of Gro wing Net works.” Physic al R eview E , p. 066123. doi:10.1103/PhysRevE .63.066123 . Krivitsky P , Handco ck M (2008). “Fitti ng Latent Cluster Mo dels for Net works with laten tnet .” Journal of Statistic al Softwar e, A rticles , 24 (5), 1–23. doi:10.18637/jss.v024.i05 . Krivitsky PN, Handcock MS (2018a). latentnet : L atent Position and Cluster Mo dels for Statistic al Networks . The Statnet Pro ject ( http://www.statnet.org ). R pac k age v ersion 2.9.0, URL https://CRAN.R- project.org/package=la tentnet . 28 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Krivitsky PN, Handco ck MS (2018b). ter gm : Fit, Simulate and Diagnose Mo dels for Net- work Evolution Base d on Exp onential-F amily R andom Gr aph Mo dels . The Statnet Pro ject ( http://www.statnet.org ). R pack age version 3.5.2, URL https://CRAN.R- project. org/package=tergm . Kunegis J (2013). “K ONECT – The Kobl enz Netw ork Collection.” konect.uni-k oblenz.de. URL http://konect.uni- koblenz.de/ . Kunegis J, Blattner M, Moser C (2013). “Preferen tial Attac hment in Online Netw orks: Mea- suremen t and Explanations.” In Pr o c e e dings of the 5th A nnual A CM Web Scienc e Con- fer enc e , W ebSci ’13, pp. 205–214. ACM, New Y ork, NY, USA. doi:10.1145/2464464. 2464514 . Leifeld P , Cranmer S, Desmarais B (2018). “T emp oral Exp onential Random Graph Mo d- els with btergm : Estimation and Bootstrap Conﬁdence Interv als.” Journal of Statistic al Softwar e, Articles , 83 (6), 1–36. doi:10.1863 7/jss.v083.i06 . Lesk ov ec J, Krevl A (2014). “SNAP Datasets: Stanford Large Netw ork Dataset Collection.” http://snap.stanford.ed u/data . Liljeros F, Edl ing CR, Amaral LAN, Stanley HE, Aberg Y (2001). “The W eb of Human Sexual Con tacts.” Natur e , 411 (6840), 907–908. doi:10.1038/35082140 . Matias C, Miele V (2016). “Statistical Clustering of T emp oral Netw orks Through a Dynamic Sto c hastic Blo c k Mo del.” Journal of the R oyal Statistic al So ciety B , 79 (4), 1119–1141. doi:10.1111/rssb.12200 . Matias C, Miele V (2018). dynsbm : Dynamic Sto chastic Blo ck Mo dels . R pac k age version 0.5, URL https://CRAN.R- project.org/package=dyn sbm . Momeni N, Rabbat MG (2015). Me asuring the Gener alize d F riendship Par adox in Networks with Quality-Dep endent Conne ctivity , pp. 45–55. Springer-V erlag In ternational Publishing, Cham. doi:10.100 7/978- 3- 319- 16112- 9_5 . Nek ov ee M, Moreno Y, Bianconi G, Marsili M (2007). “Theory of Rumour Spreading in Complex So cial Netw orks.” Physic a A: Statistic al Me chanics and its Applic ations , 374 (1), 457 – 470. doi:10.1016/j.p hysa.2006.07.017 . Newman M (2001). “Clu stering and Preferen tial Att ac hmen t in Gro wing Net w orks.” Physic al R eview E , 64 (2), 025102. doi:10.1103/PhysR evE.64.025102 . Newman M (2006). “Finding Commu nit y Structure in Net works Using the Eigen vectors of Matrices.” Physic al R eview E , 74 , 036104. doi:10.1103/PhysRevE.74.036104 . Newman M (2010). Networks: an Intr o duction . Oxford Univ ersity Press, Inc., New Y ork, NY, USA. Newman M, F orrest S , Balthrop J (2002). “Email Net works and the Spread of Computer Viruses.” Physic al R eview E , 66 , 035101. doi:10.1103/PhysRevE.66.035101 . P astor-Satorras R, V espignani A (2001). “Epidemic Spreading in Scale-F ree Netw orks.” Phys- ic al R eview L etters , 86 , 3200–3203. doi:10.1103/PhysRevLett .86.3200 . Thong Pham, P aul Sheridan, Hidetoshi Shimo daira 29 Pham T, Sheridan P , Shimo daira H (2015). “P AFit: a Statistical Metho d for Measuring Preferen tial A ttac hmen t in T emp oral Complex Netw orks.” PLOS ONE , 10 (9), e0137796. doi:10.1371/journal.pon e.0137796 . Pham T, Sheridan P , Shi mo daira H (2016). “Join t Estimation of Preferen tial A ttac hment and No de Fitness in Growing Complex Net work s.” Scientiﬁc R ep orts , 6 . doi:10.1038/ srep32558 . Pham T, Sheridan P , Shimo daira H (2018). P AFit : Gener ative Me chanism Estimation in T emp or al Complex Networks . R pac k age version 1.0.0.6, URL https://CRAN.R- project. org/package=PAFit . Price DdS (1976). “A General Theory of Bibliometric and other Cum ulative Adv antage Pro cesses.” Journal of the A meric an So ciety for Information Scienc e , 27 , 292–306. doi: 10.1002/asi.4630270505 . Raftery AE (1995). “Ba yesian Mo del Selection in So cial Research .” So ciolo gic al Metho dolo gy , 25 , 111–163. ISSN 00811750, 14679531. doi:10.2307/271063 . Redner S (2005). “Citation Statist ics from 110 Y ears of Ph ysical Review.” Physics T o day , 58 (6), 49–54. doi:10.1063/1.1996475 . Ripley RM, Snijders T AB, B´ oda Z, V ¨ or ¨ os A, Preciado P (2018). “Man ual for Siena V ersion 4.0.” T e chnic al r ep ort , Oxford: Universit y of Oxford, Departmen t of Statistics; Nuﬃeld College. R pac k age v ersion 1.2-12. https://www.cran.r- project.org/web /packages/ RSiena/ . Ronda-Pup o GA, Pham T (2018). “The Evolutions of the Ric h Get Richer and the Fit Get Ric her Phenomena in Scholarly Net works: the Case of the Strategic Managemen t Journal.” Scientometrics , 116 (1), 363–383. doi:10.1007/s11192- 018- 2761- 3 . Sc hw einberger M, Luna P (2018). “ hergm : Hierarc hical Exp onential-F amily Random Graph Mo dels.” Journal of Statistic al Softwar e, Articles , 85 (1), 1–39. doi:10.18637/jss.v085. i01 . Sheridan P , Kamimura T, Shimo daira H (2010). “A Scale-F ree Structure Prior for Graphical Mo dels with Applications in F unctional Genomics.” PL oS ONE , 5 (11). doi:10.1371/ journal.pone.0013580 . Sheridan P , Onodera T (2018). “A Preferen tial Attac hment P aradox: How Preferen tial At- tac hment Com bines with Growth to Produce Net works with Log-Normal In-Degree Distri- butions.” Scientiﬁc R ep orts , 8 (1), 2811. doi:10.1038/s41598 - 018- 21133- 2 . Simon HA (1955). “On a Class of Skew Distribution F unctions.” Biometrika , 4 2 (3-4), 425–440. doi:10.1093/biomet/42.3 - 4.425 . Sinatra R, W ang D, Deville P , Song C, Barab´ asi AL (2016). “Quantifying the Ev olution of Individual Scientiﬁc Impact.” Scienc e , 354 (6312). doi:10.1126/science.aa f5239 . W ang D, Song C, Barab´ asi AL (2013). “Quantifying Long-T erm Scientiﬁc Impact.” Scienc e , 342 (6154), 127–132. doi:10.1126/science.1237825 . 30 P AFit : Non-P arametric Estimation of Preferen tial A ttachmen t and No de Fitness Y ule GU (1925). “A Mathematical Theory of Evolution, Based on the Conclusions of Dr. J.C. Willis, F.R.S.” Philosophic al T r ansactions of the R oyal So ciety of L ondon B: Biolo gic al Scienc es , 213 (402-410), 21–87. doi:10.1098/rstb.1925.00 02 . Zhou H, Alexander D, Lange K (2011). “A Quasi-Newton Acceleration for High-Dimensional Optimization Algorithms.” Statistics and Computing , 21 , 261–273. doi:10.1007/ s11222- 009- 9166- 3 . Aﬃliation: Thong Pham Mathematical Statistics T eam RIKEN Cen ter for Adv anced Intelligence Pro ject, T oky o, Japan E-mail: thong.pha m@riken.jp

PAFit: an R Package for the Non-Parametric Estimation of Preferential Attachment and Node Fitness in Temporal Complex Networks

Original Paper

Comments & Academic Discussion

Leave a Comment

Original Paper

Related Papers

Comments & Academic Discussion

Leave a Comment