A Preferential Attachment Paradox: How Preferential Attachment Combines with Growth to Produce Networks with Log-normal In-degree Distributions

Paul Sheridan 1,* and Taku Onodera 2

1 Hirosaki University, Department of Active Life Promotion Science, Hirosaki, 036-8562, Japan
2 The University of Tokyo, Institute of Medical Science, Human Genome Center, Tokyo, 108-8639, Japan
* paul.sheridan.stats@gmail.com

ABSTRACT

Every network scientist knows that preferential attachment combines with growth to produce networks with power-law in-degree distributions. How, then, is it possible for the network of American Physical Society journal collection citations to enjoy a log-normal citation distribution when it was found to have grown in accordance with preferential attachment? This anomalous result, which we exalt as the preferential attachment paradox, has remained unexplained since the physicist Sidney Redner first brought it to light over a decade ago. Here we propose a resolution. The chief source of the mischief, we contend, lies in Redner having relied on a measurement procedure bereft of the accuracy required to distinguish preferential attachment from another form of attachment that is consistent with a log-normal in-degree distribution. There was a high-accuracy measurement procedure in use at the time, but it would have been difficult to use it to shed light on the paradox, due to the presence of a design flaw that induces systematic error. In recent years the design flaw has been recognised and corrected. We show that bringing the newly corrected measurement procedure to bear on the data leads to a resolution of the paradox.

Introduction

The physicist Sidney Redner reported a rather curious anomaly in a decade-old study on the citation statistics of the American Physical Society (APS) journal collection [1].
Redner effectively discovered that while the APS citation network had grown in accordance with a process commonly known as preferential attachment, the corresponding citation distribution closely follows a log-normal distribution on a double logarithmic scale. The network scientist will recognise preferential attachment as a process whereby the nodes of a network acquire new connections in proportion to the number of connections they already have. What makes his observations so puzzling is that growing network models based on preferential attachment have long been known to generate networks with power-law, as opposed to log-normal, in-degree distributions [2-6]. This anomaly, or paradox, as Redner referred to it, may be called for convenience the preferential attachment paradox.

In this paper we propose a resolution to the paradox. But we first take pains to reproduce the anomalous findings that Redner reported. In so doing we confirm that the APS citation distribution closely follows a log-normal distribution on a double logarithmic scale, and, moreover, that the associated APS citation network had grown in accordance with preferential attachment. Only then do we venture to resolve the paradox.

The resolution we propose requires that two main obstacles be overcome. The first is to recognise that whether preferential attachment is observed in a growing network or not depends on the choice of measurement procedure. This insight will lead us to conclude that the preferential attachment observed by Redner amounts to an artefact of the procedure he used to measure the process over coarse time resolutions [7]. And, what is more, when we perform the measurements at comparatively fine time resolutions, the outcomes are found to be reconcilable with a form of attachment that is consistent with a log-normal citation distribution.
The second obstacle is a purely technical matter related to the accurate measurement of preferential attachment at fine time resolutions. The fact that the paradox has remained unresolved until now is explained in part by the presence of a design flaw in the standard fine time resolution measurement procedure [8]. The flaw, for which a correction has only recently been suggested [9], has been an obstacle to progress, because it functions to distort the measurements taken using the procedure, much as a crooked ruler distorts the true lengths of the objects that it measures. But once these obstacles to measuring preferential attachment have been recognised, we will see how a little detective work is all that stands in the way of a definitive resolution to the paradox.

The Preferential Attachment Paradox

In the previous section we outlined the preferential attachment paradox. Informally it is this: Growing network models based on growth and preferential attachment are known to generate networks with power-law in-degree distributions. So how can preferential attachment combine with growth to generate networks with log-normal in-degree distributions? In particular, how did preferential attachment give rise to a log-normal citation distribution in the growth of the APS citation network?

In this section we formulate the paradox in technical terms and illustrate it on the APS journal collection citation network. The illustration is in point of fact a careful reproduction of those anomalous results reported by Redner in the very same paper in which he first called attention to the paradox [1]. For the sake of review they are: 1) the APS citation distribution closely follows a log-normal distribution on a double logarithmic scale, and 2) the associated APS citation network had grown in accordance with preferential attachment. We reproduce these results successively below.
But we begin with an overview of the APS journal collection citation data.

The APS Journal Collection Citation Data

The APS ranks among the world's foremost learned societies for physicists. The society publishes a dozen research journals that span virtually all fields of modern physics. Its journal collection citation data from July 1893 through December 2009 is freely available for download upon request at the society website [10]. The dataset comprises just over 450,000 timestamped articles and 4,500,000 intra-APS journal citations. But in keeping with Redner, whose analysis we aim to reproduce, we restrict our attention to only those articles from July 1893 up to and including June 2003. According to our tally this 110 year stretch of data covers precisely 347,038 articles and 3,063,726 citations. The mean number of citations is 8.8, which agrees with Redner's reported value.

The scrupulous reader may object that Redner reports 353,268 articles and 3,110,839 citations over the same time period. In other words, there is about a 1% shortfall on our part in both instances. Roughly 40% of the missing citations are accounted for by the fact that we filtered 12,425 duplicate citations and 115 self-citations in the course of processing the data. The remaining shortfalls are perhaps attributable to vigorous data cleansing efforts on the part of APS technicians over the years.

Any bibliographic dataset is readily conceptualised as a type of network called a citation network. The nodes of a citation network represent articles in such a manner that a node Y is connected to a node X if the article corresponding to X is cited by the article corresponding to Y in its references. The said connection, if it exists, is conferred with an orientation so as to point like an arrow from the citing article Y to the cited article X. Multiple connections from Y to X (i.e. duplicate citations) and self-loops from Y to Y (i.e.
self-citations) are prohibited as a matter of convenience. Thus a citation network, at least in this paper, will be recognised by network aficionados as a simple directed network representation of bibliographic data. And when we speak of the APS citation network without qualification, we mean precisely this kind of representation of the APS citation data from July 1893 through June 2003 as related above.

The APS Citation Network Citation Distribution

The distribution of node degrees in a network is one of the most important network properties, and a defining characteristic of network structure. Many readers will already be familiar with the notion that the in-degree $k$ of a node in a directed network is the number of incoming connections it shares with other nodes, and moreover that the in-degree distribution, $P(k)$, is an associated function which gives the proportion of nodes in the network with in-degree $k$. In the context of citation networks, a usual goal of the network scientist is to characterise the distribution of incoming citations. Network scientists give the name citation distribution to the in-degree distribution $P(k)$ of a citation network, which is at once seen to give the proportion of papers in the network cited $k$ times.

Network scientists have long explored fitting citation distributions by a variety of different functional forms; see Radicchi et al. [11] for a brief review. Suffice it to say here that the question of which functional form, if any, best characterises citation distributions remains a subject of ongoing research. Redner appealed to the log-normal to describe the APS citation distribution in his study of citation statistics from the first 110 years of the APS journal collection [1].
Strictly speaking, he found that visual inspection reveals that the (complementary) cumulative in-degree distribution $C(k) = \sum_{i \ge k} P(i)$ of the APS citation distribution $P_{\mathrm{APS}}(k)$ is well-fitted by the so-called log-normal form

$$L(k; \beta_0, \beta_1, \beta_2) = \beta_0 \exp\left[-\beta_1 \log(k) - \beta_2 \log^2(k)\right]$$

over a substantial range of incoming citations when $\beta_0 = 0.15$, $\beta_1 = 0.40$, and $\beta_2 = 0.16$. In the context of a citation network, we refer to $C(k)$, which gives the proportion of papers cited at least $k$ times, as a cumulative citation distribution. Redner concluded on the basis of the above outcome that the form of $P_{\mathrm{APS}}(k)$ is inconsistent with a power-law for reasons described in Supplementary Note 1.

The remainder of this section is devoted to showing that $P_{\mathrm{APS}}(k)$ is better described by a discretisation of the log-normal distribution

$$\log\mathcal{N}(k; \mu, \sigma) = \frac{1}{(k+1)\sqrt{2\pi\sigma^2}} \times \exp\left[-\frac{(\log(k+1) - \mu)^2}{2\sigma^2}\right] \tag{1}$$

for $k \ge 0$ with location parameter $\mu$ and scale parameter $\sigma > 0$, than by either a corresponding discretised power-law $(k+1)^{-\gamma}$ or a discretised exponential distribution $\lambda e^{-\lambda(k+1)}$ with rate parameter $\lambda > 0$. In addition, we investigate the extent to which the log-normal can be said to plausibly model $P_{\mathrm{APS}}(k)$ in absolute terms. We use $k+1$ instead of $k$ in Eq. (1) on the one hand so as to include papers with zero citations, and on the other because doing so dovetails with the modelling framework that we will develop in a later section. The exponential distribution we include in our analysis as a token light-tailed alternative to the heavy-tailed log-normal and power-law distributions.

[Figure 1 plot: APS cumulative citation distribution, $C_{\mathrm{APS}}(k)$, against number of citations, $k+1$, on double logarithmic axes. Legend: APS citation data; power-law; power-law tail; log-normal; log-normal tail; exponential.]

Figure 1.
The APS cumulative citation distribution $C(k)$ for all publications dating from July 1893 up to and including June 2003. The log-normal cumulative distribution (green) fits the observed data better than either a power-law cumulative (red) or exponential cumulative distribution (blue). This holds true regardless of whether the data is fit over its full domain (solid lines) or merely in the tail region of $k$ (dashed lines). The cutoff values defining the tail regions (dashed lines) are calculated using the maximum likelihood estimation method of Clauset et al. [12] described in the main text.

We used the poweRlaw R package v. 0.60.3 [13] to fit our trio of functional forms to the APS citation distribution. The package implements the maximum likelihood estimation methods and goodness-of-fit tests, based on the Kolmogorov-Smirnov (KS) test and likelihood ratios, for fitting heavy-tailed distributions to observed data described in Clauset et al. [12]. Their approach may be summed up as follows: Candidate functional forms are separately fitted to an observed in-degree distribution over the domain $k + 1 \ge k_{\min}$ by maximum likelihood, conditional on the choice of lower cutoff $k_{\min}$. An optimal $k_{\min}$ is estimated for each candidate using a goodness-of-fit based testing approach. Alternatively, the value of $k_{\min}$ may be set to 1 to fit the data over its full domain. The goodness-of-fit of each functional form is assessed using a KS based hypothesis testing approach.

The APS cumulative citation distribution is plotted in Fig. 1 on a double logarithmic scale together with a medley of fitted functional form cumulatives. The cumulative distributions are plotted in Fig. 1 solely on aesthetic grounds. That said, a cursory visual inspection will satisfy even the most quantitatively minded reader that the log-normal better describes the entire APS citation distribution ($k_{\min} = 1$) than either a power-law or exponential distribution.
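The quantities compared in Fig. 1 can be sketched in a few lines of Python. This is only an illustration, not the poweRlaw fitting procedure described above: the function names are hypothetical, the logarithms are assumed to be natural, and the default parameter values are simply the ones Redner reported for his log-normal form.

```python
import math
from collections import Counter


def cumulative_citation_distribution(in_degrees):
    """Return {k: C(k)}, where C(k) is the fraction of papers cited at least k times.

    Only the values of k that actually occur in the data are returned.
    """
    n = len(in_degrees)
    counts = Counter(in_degrees)
    ccdf = {}
    remaining = n  # number of papers with in-degree >= current k
    for k in sorted(counts):
        ccdf[k] = remaining / n
        remaining -= counts[k]
    return ccdf


def redner_form(k, b0=0.15, b1=0.40, b2=0.16):
    """L(k; b0, b1, b2) = b0 * exp(-b1*log(k) - b2*log(k)^2) for k >= 1.

    Assumption: log denotes the natural logarithm.
    """
    x = math.log(k)
    return b0 * math.exp(-b1 * x - b2 * x * x)


def lognormal_density(k, mu, sigma):
    """Eq. (1): the log-normal density evaluated at k + 1, for k >= 0."""
    x = math.log(k + 1)
    norm = (k + 1) * math.sqrt(2 * math.pi * sigma ** 2)
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / norm
```

Plotting the output of `cumulative_citation_distribution` against `redner_form` on double logarithmic axes would give a picture of the same kind as Fig. 1, although the fitted curves in the figure come from maximum likelihood estimation rather than from Redner's reported parameters.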
And just like that our modest claim is proved correct.

This is by no means to say that the log-normal plausibly describes the APS citation distribution. On the contrary, we found that the goodness-of-fit of the log-normal to the data ($H_0$ = log-normal with $\mu = 1.41$, $\sigma = 1.27$, $k_{\min} = 1$: KS = 0.01 & P = 0.00) was poor insofar as rigorous statistical testing is concerned. Clauset et al. [12] recommend a significance level of 0.10 as a conservative choice for pronouncing a given functional form a plausible fit to the data, for what it is worth. Thus the hypothesis that the log-normal plausibly fits the APS citation distribution over its full domain is to be rejected.

But proponents of the log-normal will be heartened to learn that this unhappy state of affairs is entirely reversed once we confine our attention to either the body or tail region of the distribution. Let us take the APS citation distribution body and tail to correspond to the regions $0 \le k \le 150$ and $k \ge 20$, respectively. Visual inspection of the Fig. 1 plot alone is enough to conclude that the body is plausibly fit by the log-normal distribution. We visually selected $k \le 150$ as a conservative choice of cutoff point because the poweRlaw R package cannot be used to assess the goodness-of-fit of a log-normal to the body of a distribution. On the other hand, we found that the APS citation distribution tail is plausibly fit by the log-normal distribution at significance level 0.10 ($H_0$ = log-normal with $\mu = -1.00$, $\sigma = 1.76$, $k_{\min} = 20$: KS = 0.00 & P = 0.33). Note the cutoff $k_{\min} = 20$ is the minimum $k$ yielding a plausible fit of the log-normal to the data at significance level 0.10. For this reason, we use $k \ge 20$ to define the tail of the distribution. The same, however, cannot be said for a power-law ($H_0$ = power-law with $\gamma = 2.87$, $k_{\min} = 44$: KS = 0.01 & P = 0.01). The plot of Fig. 1 serves to visually reinforce these conclusions.

Let us conclude by taking stock of our findings.
The log-normal distribution, we found, provides an incontrovertibly better fit to the APS citation distribution over its full domain than does a power-law. Moreover, the log-normal looks to fit the APS citation distribution pretty nearly as judged by visual inspection, but this is not supported by rigorous statistical testing. This is because the log-normal undershoots the target in the tail of the distribution. We may nevertheless speak informally of the APS citation distribution as "closely" following a log-normal in certain non-technical contexts. That said, we found the log-normal does provide a good fit (in the technical sense) to the data when confined to either the body ($0 \le k \le 150$) or tail ($k \ge 20$) of the distribution. We are careful to be precise about which region of the distribution we mean in technical contexts. In light of these considerations, the reader will do well to keep the informal and technical senses of the log-normal providing a close fit to the APS citation distribution in mind.

The APS Citation Network Attachment Rate

A growing network represents bibliographic data over time in a manner conducive to the quantification of preferential attachment. Before considering how to represent bibliographic data as a growing network it should be understood that what we mean by bibliographic data is a collection of intra-referencing articles complete with timestamps of the form YYYY-MM-DD. While timestamps proved superfluous to the construction of the APS citation network, to a growing network representation of the APS bibliographic data they are essential. This is because a growing network is formally defined as a nested sequence of networks, $\mathcal{G} = \{G_t\}_{t=1}^{T}$, that begins with an initial network, $G_1$, with $n_1 > 0$ nodes and $m'_1 \ge 0$ edges and ends with a final network, $G_T = G$.
Nesting means that the network $G_t$ at time-step $t$ for $t > 1$ is obtained by augmenting $G_{t-1}$ with $n_t \ge 0$ nodes that form $m_t \ge 0$ connections with the nodes in $G_{t-1}$ and $m'_t \ge 0$ connections among the nodes newly added (see Figure S1 for a graphical depiction of this modelling scheme). A bibliographic dataset is represented as a growing network by specifying a mapping from article timestamps to sequence time-steps that preserves chronological order up to a desired level of time resolution. Articles are mapped to nodes and references to directed edges within this framework in the obvious fashion.

A few examples of growing network representations will serve to make their workings more comprehensible. Table S1 summarises the examples here described. The APS citation network is a growing network in the trivial sense that all article timestamps from 1893-07-01 to 2003-06-30 are mapped to a single time-step. The network in this case consists of $n_1 = 347,083$ nodes and $m'_1 = 3,063,726$ edges. It is sometimes convenient to qualify the APS citation network as being minimally resolved to emphasise its growing network nature. What may be called the maximally resolved APS citation network falls at the opposite end of the time resolution spectrum. In this case there are as many time-steps as there are articles so that the sequence is grown by $n_t = 1$ node with $m_t \ge 0$ edges at each time-step $t$. The value of $m'_t$ is equal to 0 for all $t$ since self-citations are prohibited. Identically timestamped articles are discriminated according to the lexicographical ordering of their unique article IDs in a slight abuse of the representation. It is easy to imagine in a similar vein daily, monthly, and yearly resolved APS citation networks lying between these two extremes. For example, yearly resolution means that all articles published in the same calendar year are mapped to nodes in the same time-step.
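The mapping from timestamps to time-steps can be sketched as follows. This is a minimal illustration under stated assumptions rather than the procedure actually used on the APS data: the function names and the resolution labels are hypothetical, timestamps are assumed to be zero-padded YYYY-MM-DD strings, and time-steps are numbered consecutively over the periods in which at least one article appears.

```python
from collections import Counter
from datetime import date


def period_key(timestamp, resolution):
    """Map a YYYY-MM-DD timestamp to a sortable key at the given resolution."""
    d = date.fromisoformat(timestamp)
    if resolution == "yearly":
        return (d.year,)
    if resolution == "monthly":
        return (d.year, d.month)
    if resolution == "daily":
        return (d.year, d.month, d.day)
    raise ValueError(f"unknown resolution: {resolution}")


def assign_time_steps(timestamps, resolution):
    """Return {article_id: t} with 1-based time-steps in chronological order.

    timestamps: dict mapping article ID -> 'YYYY-MM-DD' string.
    """
    if resolution == "minimal":
        # Every article falls into the single time-step t = 1.
        return {a: 1 for a in timestamps}
    if resolution == "maximal":
        # One article per time-step; ties are broken by article ID.
        ordered = sorted(timestamps, key=lambda a: (timestamps[a], a))
        return {a: t for t, a in enumerate(ordered, start=1)}
    keys = sorted({period_key(ts, resolution) for ts in timestamps.values()})
    step_of = {key: t for t, key in enumerate(keys, start=1)}
    return {a: step_of[period_key(ts, resolution)] for a, ts in timestamps.items()}


def step_counts(steps, citations):
    """Count n_t (nodes), m_t (edges to earlier steps) and m'_t (edges within a step).

    citations: iterable of (citing_id, cited_id) pairs.
    """
    n = Counter(steps.values())
    m, m_prime = Counter(), Counter()
    for citing, cited in citations:
        t_citing, t_cited = steps[citing], steps[cited]
        if t_cited < t_citing:
            m[t_citing] += 1
        elif t_cited == t_citing:
            m_prime[t_citing] += 1
        # Citations pointing forward in time are skipped in this sketch.
    return n, m, m_prime
```

Under the "maximal" setting each time-step holds a single article, so with self-citations removed `step_counts` returns no within-step edges, in line with $m'_t = 0$ above.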
For a given time-step $t$, $n_t$ is the number of articles published in the corresponding year, $m_t$ the number of citations to articles from previous years, and $m'_t$ the number of citations to articles in the same year. Note that journal issue print date timestamps are used to construct the APS citation network at daily resolution. Figure S2 shows a conceptual depiction of the time resolutions here described.

The growing networks we have described here will prove key to resolving the preferential attachment paradox in a later section. But in the present subsection, our focus is squarely on a collection of bi-epochally resolved APS citation networks. In principle, bi-epochal resolution describes the scenario when a partition of the article timestamps of a bibliographic dataset into two non-overlapping intervals, labelled $T_1$ and $T_2$ hereafter, is used to define a growing network representation comprised of two time-steps. In practice, the time intervals do not always cover the entire data. Table S2 summarises our reconstruction of four bi-epochally resolved growing network representations of the APS citation data that Redner submitted to analysis in his original study [1].

In order to characterise preferential attachment, network scientists measure the rates at which articles with $k$ citations are cited by new articles. This they achieve by observing the process of citation formation over time as viewed through the prism of this or that growing network representation. Loosely speaking, the attachment rate $\hat{A}(k)$ of a growing network is defined as the likelihood that an edge from among the $m_t$ added at time-step $t > 1$ connects to a node of in-degree $k$. Jeong et al.
[7] proposed to measure the attachment rate of a bi-epochally resolved growing network, $\mathcal{G} = \{G_1, G_2\}$, as

$$\hat{A}(k) \overset{\mathrm{def}}{=} \frac{n_1}{m_2} \times \frac{m_2(k)}{n_1(k)}, \tag{2}$$

where $m_2(k)$ is the number of edges from $G_2$ that connect to an in-degree $k$ node in $G_1$, of which there are assumed to be $n_1(k)$ in number. The factor $n_1 / m_2$ serves as a mathematically convenient constant of normalisation. The domain is taken to be $\{k \mid m_2(k) / n_1(k) \ne 0\}$ under the convention that $0/0 = 0$. Mark Newman took steps to generalise Jeong's measure to certain arbitrarily time resolved growing networks [8]. The details of Newman's measure are deferred to a later section.

[Figure 2 plot: APS attachment rate, $\hat{A}(k)$, against number of citations, $k+1$, on linear axes, with an inset covering the region up to about 150. Legend: $T_1$ = 1990-99 & $T_2$ = 2000; $T_1$ = 1980-99 & $T_2$ = 2000; $T_1$ = 1970-99 & $T_2$ = 2000; $T_1$ = 1893-99 & $T_2$ = 2000.]

Figure 2. The attachment rates for various bi-epochally resolved APS citation networks appear linear to the naked eye over a wide range of $k$, especially in the region from $k = 0$ to about 150 (inset). Four different measurements of $\hat{A}(k)$ may be distinguished by colour in the plot. Each individual measurement is determined by recording how the publications in time interval $T_2$ = 2000 cite the publications in time intervals $T_1$ = 1990-99 (blue), $T_1$ = 1980-99 (green), $T_1$ = 1970-99 (red), and $T_1$ = 1893-99 (black), respectively. The data have been averaged over a range of $k \pm 0.025k$. See Table S2 and the surrounding text for details.

The attachment rate of a growing network is sometimes said to be "preferential" when the trend line of the measured $\hat{A}(k)$ is found to be an increasing function of $k$. But in this paper, we apply the term "preferential" to measured attachment rates in a more restricted sense. Namely: if $\hat{A}(k)$ increases linearly in $k$, then the attachment rate is said to be preferential.
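Jeong's measure in Eq. (2) translates directly into code. The sketch below is illustrative only: the function name and the input layout (a dictionary of in-degrees for the nodes of $G_1$ and a list of the $G_1$ nodes hit by the $m_2$ new edges) are assumptions, and the averaging over $k \pm 0.025k$ used for Fig. 2 is left out.

```python
from collections import Counter


def jeong_attachment_rate(in_degree_g1, new_edge_targets):
    """Evaluate Eq. (2): A_hat(k) = (n_1 / m_2) * (m_2(k) / n_1(k)).

    in_degree_g1:     dict mapping each of the n_1 nodes of G_1 to its in-degree in G_1.
    new_edge_targets: list with one entry per edge from G_2 into G_1, giving the
                      G_1 node that the edge points to (so its length is m_2).
    Returns {k: A_hat(k)} restricted to the values of k with m_2(k) > 0.
    """
    n1 = len(in_degree_g1)
    m2 = len(new_edge_targets)
    if m2 == 0:
        return {}  # no new edges, so the domain of A_hat is empty
    n1_k = Counter(in_degree_g1.values())                     # n_1(k)
    m2_k = Counter(in_degree_g1[v] for v in new_edge_targets)  # m_2(k)
    return {k: (n1 / m2) * (m2_k[k] / n1_k[k]) for k in sorted(m2_k)}
```

With this normalisation, edges that land on the nodes of $G_1$ uniformly at random give $\hat{A}(k)$ close to 1 for every $k$, so departures from a flat profile indicate a dependence on in-degree.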
In this idealised case the in-degree distribution of the resulting network is bound to follow a power-law under certain regularity conditions [2, 3, 5].

Redner found the attachment rates for bi-epochally resolved growing network representations of various APS citation data subsets to be nearly linear functions of $k$, with the agreement being especially pronounced for $k$ less than 150 [1]. In Fig. 2 we reproduce his experimental measurements in all but a few extraneous details that are discussed in Supplementary Note 2. The attachment rate $\hat{A}(k)$, as defined by Jeong's measure, is plotted for bi-epochally resolved growing network representations of four different subsets of the APS citation data. In each case, the measured $\hat{A}(k)$ is observed to approximately follow a straight line. Thus, the presence of preferential attachment is apparently confirmed in each growing network. Redner extrapolates from these bi-epochal outcomes that preferential attachment accounts for the formation of citations in the maximally resolved APS citation network. Let us provisionally accept this conclusion with the understanding that it will be overturned in due course.

A Digression on Growing Network Models

The bridge between preferential attachment and network in-degree distribution is the growing network model. In this subsection we describe a general growing network modelling scheme that includes a number of important growing network models as special cases. Table 1 summarises the particular growing network models described in detail below.

We define a growing network model as a growing network subject to the following constraint: each edge from among the $m_t$ edges added at time-step $t > 1$ connects to a given node of in-degree $k$ from $G_{t-1}$ with probability proportional to the attachment function $A(k)$, a time-independent function of $k$ that governs the formation of new connections.
In particular, the probability that a said edge connects to some in-degree $k$ node from $G_{t-1}$ is given by

$$\pi_t(k) \propto n_{t-1}(k) \times A(k), \tag{3}$$

where $n_{t-1}(k)$ stands for the number of in-degree $k$ nodes in $G_{t-1}$ for $t > 1$. It is the form of $A(k)$ together with any structural constraints imposed on the values of $T$, $n_t$, $m_t$, and $m'_t$ that defines a growing network model. The attachment rate $\hat{A}(k)$ of a growing network may be rightly regarded as a realisation of an attachment function, $A(k)$, as defined by a compatible growing network model. Note that we allow for multiple edges to occur between nodes in the above formulation of a growing network model as a matter of mathematical convenience.

Table 1. An assortment of growing network models relevant to the present work.

Model Name          Attach. Fn.    Eq.   Deg. Dist.    Ref.
Price's model       preferential   (4)   power-law     [2]
Jeong's model       preferential   (4)   unknown       [7]
Callaway's model    uniform        (5)   exponential   [14, 15]
Krapivsky's model   log-linear     (6)   diverse       [5, 6]
Redner's model      nonlinear      (7)   log-normal    [1]

In Price's model [2], or rather, a mild generalisation thereof, the attachment function takes the linear form

$$A(k) \propto k + 1, \tag{4}$$

where the unit offset acts as a kind of initial attractiveness, ensuring that zero in-degree nodes stand a fighting chance of acquiring new connections. The form of Eq. (4) makes precise what we mean by a preferential attachment function in this paper. The model definition is completed by taking $n_t = 1$ and assuming the mean value $m$ of the $m_t$'s is constant over time as $t$ becomes large. The average in-degree distribution of networks generated in this manner is known to follow a power-law tail with scaling exponent $\gamma = 2 + 1/m$ in the limit of large $T$ [16].

What we will call Jeong's model is the growing network model analog of the bi-epochally resolved growing network construction from the previous subsection.
It consists of two networks $G_1$ and $G_2$, i.e., $T = 2$. The preferential attachment function, as defined in Eq. (4), governs the formation of connections between $m_2$ of the $n_2$ nodes in $G_2$ with the $n_1$ nodes of $G_1$ at time-step $t = 2$.

Callaway's model, formulated here in the language of Price's model without loss of substance, is the random recursive tree defined by substituting the uniform attachment function

$$A(k) \propto 1 \tag{5}$$

into Price's model. Callaway's model has been shown to generate networks with exponential in-degree distributions [14, 15], and its other properties have been examined at length in the classic literature [14, 17, 18].

An important generalisation of Price's model, defined by the log-linear attachment function

$$A(k) \propto (k + 1)^{\alpha} \tag{6}$$

for attachment exponent $\alpha > 0$, was analysed by Krapivsky, Redner, and Leyvraz [5, 6]. They fittingly named their model "the growing network model," but we refer to it as Krapivsky's model in this paper. We have slightly redefined the attachment function from the original $k^{\alpha} + 1$ for mathematical convenience. Price's model corresponds to the special case when $\alpha = 1$. For $0 < \alpha < 1$ the resulting in-degree distribution takes the form of a stretched exponential function [6]. For $\alpha > 1$ all nodes connect to a handful of large hubs. Meanwhile the limiting case of $\alpha = 0$ corresponds to Callaway's model. Note that we show the APS citation distribution fitted to the stretched exponential function predicted by Krapivsky's model in Figure S3.

Finally, Redner writes in passing that the growing network model obtained by substituting the nonlinear attachment function

$$A(k) \propto \frac{k + 1}{1 + \beta \log(k + 1)} \tag{7}$$

with $\beta > 0$ into Price's model generates networks with log-normal in-degree distributions [1]. In Supplementary Note 3, we show that Redner's model, as we will call it, generates networks with in-degree distributions that asymptotically follow the log-normal distribution.
The proof is adapted from an outline that was kindly supplied to the authors by Redner via email.

The Preferential Attachment Paradox Illustrated on the APS Citation Data

It is only in virtue of the preceding digression on growing network models that it has at last become possible to cast the preferential attachment paradox in a reasonably technical light. The paradoxical argument runs as follows:

Premise 1. A preferential attachment rate gives rise to networks with power-law in-degree distributions. Recall that we have defined a preferential rate of attachment to mean that a growing network model attachment function $A(k)$ increases linearly with $k$.

Premise 2. Measurement suggests a preferential rate of attachment for the maximally resolved APS citation growing network. In other words, the observed attachment rate $\hat{A}(k)$ is approximately a linear function of $k$.

Premise 3. The observed APS citation network in-degree distribution is not well-described by a power-law.

Conclusion. That the APS citation network has a power-law in-degree distribution follows from a naive application of Premises 1 and 2.

Paradox. The stated conclusion is in direct contradiction with Premise 3. In fact, measurement suggests that the APS citation network in-degree distribution is better described by a log-normal distribution than by a power-law.

The conclusion is trivially seen to follow from the premises. Thus it must be the case that some or another premise is either incoherent or outright false. Redner followed this line of reasoning to its contradictory conclusion for the APS citation network. But it is worth noting that the argument applies to any growing network featuring preferential attachment that culminates in a network with a log-normal in-degree distribution.
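To make Premise 1 and the competing attachment functions concrete, the growing network modelling scheme can be simulated directly. The following is a rough sketch under simplifying assumptions that are not part of the source: the initial network is a single node with no edges, exactly $m$ edges are added per time-step (rather than a random $m_t$ with mean $m$), and all function names and default parameter values are hypothetical.

```python
import math
import random


def price_attachment(k):
    """Eq. (4): A(k) proportional to k + 1."""
    return k + 1.0


def callaway_attachment(k):
    """Eq. (5): A(k) proportional to 1."""
    return 1.0


def krapivsky_attachment(k, alpha=0.8):
    """Eq. (6): A(k) proportional to (k + 1)^alpha; alpha=0.8 is an arbitrary example."""
    return (k + 1.0) ** alpha


def redner_attachment(k, beta=1.0):
    """Eq. (7): A(k) proportional to (k + 1) / (1 + beta*log(k + 1)); beta=1.0 is arbitrary."""
    return (k + 1.0) / (1.0 + beta * math.log(k + 1.0))


def grow_network(T, m, attachment, seed=0):
    """Grow a network for T time-steps with n_t = 1 and m new edges per step.

    Each new edge picks a node v of G_{t-1} with probability proportional to
    A(in-degree of v), which yields Eq. (3) when summed over nodes of equal
    in-degree. Multiple edges between a pair of nodes are allowed.
    Returns the list of in-degrees of the final network.
    """
    rng = random.Random(seed)
    in_degree = [0]  # assumed initial network G_1: one node, no edges
    for _ in range(2, T + 1):
        weights = [attachment(k) for k in in_degree]
        targets = rng.choices(range(len(in_degree)), weights=weights, k=m)
        for v in targets:
            in_degree[v] += 1
        in_degree.append(0)  # the newly added node starts with in-degree 0
    return in_degree
```

Comparing the in-degree histograms returned for `price_attachment` and `redner_attachment` at large `T` is one informal way to see the two tail behaviours side by side. The loop recomputes all weights at every step, so the sketch is only practical for modest network sizes.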
The Preferential Attachment Paradox Resolved

We have seen that a network cannot enjoy a log-normal in-degree distribution and have grown in accordance with preferential attachment without apparently contradicting network theory. Yet, we are committed to the view that the APS citation network is endowed with exactly these properties. In this section we will see that a critical examination of the premises underlying the argument leads to a ready explanation of the paradox.

First of all, it may be outright denied that the APS citation distribution is log-normally distributed over its full domain; for to maintain otherwise would be to disregard blindly the statistical testing outcomes presented in the subsection on modelling the APS citation network citation distribution. This line of objection, while technically correct, does not present an interesting challenge to the argument. The APS citation distribution being well-described by the log-normal in the body of $k$ (i.e. $0 \le k \le 150$) turns out to be enough to resolve the paradox. In fact, we will see that the extent to which the log-normal falls short of the APS citation distribution in the extreme tail region of $k$ (i.e. $k \ge 150$) is explained by an equal and opposite departure from an ideal in the APS citation network attachment rate. The upshot is that the log-normality assumption may be accepted without prejudice to the argument.

The second premise holds that the maximally resolved APS citation network grew in accordance with a preferential rate of attachment. According to Redner's argument, evidence in support of this claim is found in the linear character of the bi-epochally resolved APS citation network attachment rates from Fig. 2. These results, it will be remembered, were extrapolated to the maximally resolved APS citation network as a whole. Redner's conclusion rests on the assumption that the bi-epochally resolved attachment rates are in fact linear. But in Fig.
3(A) the very same attachment rates are shown plotted on a double logarithmic scale. Visual inspection reveals that the log-transformed attachment rates do not strictly adhere to straight line relationships. The linear scale plot of Fig. 2 must therefore conceal the nonlinearities made apparent in the log-log plot, since a proportional relationship plots as a straight line on a double logarithmic scale as well. This shows how the plotting of attachment rates on a linear scale can be misleading. Thus Redner's extrapolation is thrown into jeopardy, and, as a result, his argument for the truth of the second premise collapses.

The question is whether an explanation of the paradox follows from the manner by which the argument for the second premise fails to apply to the maximally resolved APS citation network. In the remainder of this section we venture to answer the question in the affirmative.

This brings us to the connection between the empirical world of growing networks and the theoretical world of growing network models. In particular, the measuring of a preferential rate of attachment is asserted to be a necessary and sufficient condition for concluding that a growing network is well-modelled by Price's model. This test for Price's model, which at first sight might seem unobjectionable, is revealed to be misleading as soon as efforts are made to formulate it carefully. The trouble stems from implicitly assuming that a preferential rate of attachment is intelligible outside the context of a growing network model. However, preferential attachment is always conditional on a growing network model through not only the laws of edge formation, as defined by the preferential attachment function of Eq. (4), but also the specification of model-specific structural constraints. Consequently, the most one can hope to say of a given growing network, even in principle, is that it exhibits preferential attachment with respect to this or that particular growing network model.
In other words, a preferential rate of attachment is necessary (but not sufficient) for concluding that Price's model describes a growing network. In fact, a growing network, $G$, is obliged to satisfy four conditions in order to comply with Price's model. First, $G$'s initial network $G_1$ should be small relative to its final network $G$, i.e., $n_1 \ll N$. Practical experience suggests to us that $N \ge 1000 \times \sqrt{n_1}$ serves as a good rule of thumb, but this by no means rests on a sound theoretical foundation. Second, $G$ must grow by a single node at each time-step, i.e., $n_t = 1$ for $t > 1$. Third, the number of edges $m_t$ added at time-step $t$ must come from a distribution with a fixed mean and finite variance. Fourth, the formation of connections must be governed by the linear attachment function of Eq. (4), i.e., preferential attachment must prevail in the growth of the network. The test for Price's model that is assumed in the second premise ignores all but the last of these conditions.

Figure 3. [Panels A and B, both on double logarithmic axes. Horizontal axis: Number of Citations, $k + 1$. Vertical axis: APS Attachment Rate, $\hat{A}(k)$.] (A) Nonlinear tendencies in the attachment rates for various bi-epochally resolved APS citation networks are made apparent on a double logarithmic scale. Four different measurements of $\hat{A}(k)$ may be distinguished by colour in the plot. Each individual measurement is determined by recording how the publications in time interval $T_2 = 2000$ cite the publications in time intervals $T_1$ = 1990–99 (blue), $T_1$ = 1980–99 (green), $T_1$ = 1970–99 (red), and $T_1$ = 1893–99 (black), respectively. The data have been averaged over a range of $k \pm 0.025k$. See Table S2 and the surrounding text for details. (B) The attachment rate for the maximally resolved APS citation network is best fit by Redner's model. Shown is the attachment rate for the maximally resolved APS citation network as calculated by Newman's measure (black), the estimated log-linear attachment function of Krapivsky's model (red; AIC = −4174), and the estimated nonlinear attachment function of Redner's model (green; AIC = −15056). The best model (the smallest AIC value) is Redner's model. Price's model is not included in the model comparison because it is a special case of Krapivsky's model. The data have been averaged over a range of $k \pm 0.025k$.

Let us reinterpret the Fig. 3(A) attachment rates in the light of these new revelations. The first thing to note is that a casual inspection of Table S2 reveals the corresponding bi-epochally resolved APS citation networks to be in blatant violation of the structural constraints of Price's model. They are, however, consistent by definition with the structural constraints of Jeong's model. The second thing is that there is a noticeable tendency toward log-linearity in the attachment rates as the window of articles cited by the $T_2 = 2000$ articles narrows from $T_1$ = 1893–99 (black) through $T_1$ = 1970–99 (red) and $T_1$ = 1980–99 (green) to $T_1$ = 1990–99 (blue). In the last case, a log-linear fit is certainly not out of the question. It is interesting that we came up with a value of $\hat{\alpha} \approx 0.90$ for the corresponding attachment rate exponent. This value is close to $\hat{\alpha} = 1$, which is the mark of a preferential rate of attachment. So there is even a case to be made for the attachment rate plotted in blue being not only log-linear ($\hat{\alpha} = 0.90$), but also approximately linear ($\hat{\alpha} \approx 1.00$). The same, however, cannot be reasonably maintained of the other attachment rates.
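The exponent estimate discussed above can be illustrated with a minimal sketch. It assumes the log-linear form $A(k) = c\,k^{\alpha}$ and ordinary least squares on log-transformed values; the fitting procedure behind $\hat{\alpha} \approx 0.90$ is not described in this section, so both choices, the function names, and the synthetic data are assumptions made purely for illustration. The sketch also compares the free-exponent fit with a fit that pins $\alpha = 1$ (the linear case), using the least-squares form of AIC, $n \ln(\mathrm{RSS}/n) + 2p$.

```python
import numpy as np


def fit_log_linear(k, rate):
    """Fit log(rate) = log(c) + alpha * log(k) by ordinary least squares."""
    x, y = np.log(k), np.log(rate)
    alpha, log_c = np.polyfit(x, y, 1)  # slope first, then intercept
    rss = float(np.sum((y - (log_c + alpha * x)) ** 2))
    return float(alpha), float(np.exp(log_c)), rss


def fit_linear(k, rate):
    """Same fit with the exponent pinned at alpha = 1 (only c is free)."""
    x, y = np.log(k), np.log(rate)
    log_c = float(np.mean(y - x))  # least-squares intercept when the slope is 1
    rss = float(np.sum((y - (log_c + x)) ** 2))
    return float(np.exp(log_c)), rss


def aic_least_squares(rss, n, n_params):
    """AIC for Gaussian residuals, up to an additive constant."""
    return n * np.log(rss / n) + 2 * n_params


# Synthetic attachment rate (hypothetical data): alpha = 0.9 with mild noise.
rng = np.random.default_rng(0)
k = np.arange(1, 201, dtype=float)  # shift to k + 1 if k = 0 occurs in real data
rate = 0.5 * k**0.9 * np.exp(rng.normal(0.0, 0.05, size=k.size))

alpha_hat, c_hat, rss_free = fit_log_linear(k, rate)
_, rss_pinned = fit_linear(k, rate)
aic_free = aic_least_squares(rss_free, k.size, n_params=2)
aic_pinned = aic_least_squares(rss_pinned, k.size, n_params=1)
print(f"alpha_hat = {alpha_hat:.3f}")
print(f"AIC (alpha free) = {aic_free:.1f}, AIC (alpha = 1) = {aic_pinned:.1f}")
```

A smaller AIC for the free-exponent fit indicates that pinning $\alpha = 1$ is not supported by the data at hand. On real attachment rates the outcome will depend on the noise level and on the range of $k$ retained, which is why a value such as 0.90 can look nearly linear to the eye.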
The point is that Jeong's measure, as applied by Redner to the APS citation data, may be too crude an instrument to permit the drawing of subtle distinctions in regard to attachment rate functional form. All this suggests that it is necessary to take seriously the misspecification of the structural constraints of Price's model in order to characterise APS attachment rate functional form.

Fortunately, Mark Newman devised a way to measure attachment rates relative to quite a broad class of growing network models [8]. Newman's measure is defined according to

$$\hat{A}(k) \overset{\mathrm{def}}{=} \frac{Z}{W(k)} \sum_{t>1} w_t(k)\,\frac{m_t(k)}{n_{t-1}(k)}, \tag{8}$$

with weights $w_t(k) = m_t \times [n_{t-1}(k) \neq 0]$ that have sum $W(k) = \sum_{t>1} w_t(k)$ ($[P]$ denotes the Iverson bracket for a given proposition $P$; $[P] = 1$ if $P$ is true and 0 otherwise) and degree-independent normalising constant $Z = \sum_{t>1} n_{t-1}/m_{t-1}$; the symbol $n_t(k)$ is used to denote the number of in-degree $k$ nodes in $G_t$. Newman's measure is consistent with the structural constraints of Price's model, because, in contrast with Jeong's measure, it assumes a time resolution consistent with the model.

There are several further points regarding the measure that warrant discussion. First, Newman committed a slight error in his original formulation of the measure, the consequence of which was to introduce a waterfall effect in the large $k$ region of measured attachment rates. The measure defined by Eq. (8) incorporates the correction proposed by Pham et al. [9] to eliminate this artefact (see Fig. S4 for a dramatic illustration of the said waterfall effect). Second, it is a pleasant exercise to verify that Eq. (8) reduces to Jeong's measure of Eq. (2) in the special case of Jeong's model. Third, Newman's measure assumes that the constant of proportionality implicit to Eq. (3) grows in proportion to the time-step $t$. This assumption holds true for Price's model which is defined by Eq.
(4) with constant $m_t$ on average [19, 20]. By contrast, the measure is necessarily approximate in the cases of Krapivsky's model (unless $\alpha = 0$ or 1) and Redner's model.

Figure 3(B) shows what happens when Newman's measure is brought to bear on the maximally resolved APS citation network. The results are striking. Visual inspection makes plain that the nonlinear attachment function from Redner's model provides a better fit to the measured attachment rate than does the log-linear attachment function from Krapivsky's model. The outcome of a model comparison, in which we used the AIC criterion to select the best model, lends numerical support to this conclusion. The AIC score is −4174 for Krapivsky's model and −15056 for Redner's model. It follows that Redner's model compares favorably to that of Price, since the latter forms a special case of Krapivsky's model. The resolution to the paradox is now obvious: the APS citation distribution closely follows a log-normal distribution because the underlying network's growth is closely described by a growing network model (i.e. Redner's model) that predicts just such an outcome.

It is instructive, as an afterthought, to extend our model comparison to the daily, monthly, and yearly resolved APS citation data. Table 2 shows that the AIC and BIC criteria select Redner's model over Krapivsky's model in all instances with the single exception of the yearly resolution case. This is interesting because it highlights a tendency toward log-linearity in the APS attachment rate as the time resolution decreases. Fig. 4 conveys the effect graphically. In Panel A, the measured attachment rates are plotted for the daily, monthly, and yearly resolved data. Panel B shows the same attachment rates overlaid with segmented linear regression lines of best fit that we calculated using the R package earth 4.4.7 [21].
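Returning briefly to the measure itself, Eq. (8) can be read operationally. The following is a minimal sketch and not the authors' implementation: it assumes that $m_t(k)$ counts the edges added at time-step $t$ that land on nodes of in-degree $k$ (with degrees taken before the step), and it treats the degree-independent constant $Z$ as a caller-supplied scale factor rather than computing it. The function name `newman_attachment_rate` and the input layout are hypothetical.

```python
from collections import Counter, defaultdict


def newman_attachment_rate(initial_in_degrees, steps, Z=1.0):
    """Estimate A(k) as (Z / W(k)) * sum_t w_t(k) * m_t(k) / n_{t-1}(k).

    initial_in_degrees: dict mapping node -> in-degree in the initial network.
    steps: iterable of (new_nodes, targets) pairs, one per time-step; targets
        lists the existing nodes that receive an edge at that step.
    Z: degree-independent scale factor (assumed to be supplied by the caller).
    """
    in_degree = dict(initial_in_degrees)
    weighted_sum = defaultdict(float)  # sum_t w_t(k) * m_t(k) / n_{t-1}(k)
    weight_total = defaultdict(float)  # W(k) = sum_t w_t(k)

    for new_nodes, targets in steps:
        # n_{t-1}(k): number of in-degree-k nodes before this step.
        n_prev = Counter(in_degree.values())
        # m_t(k): new edges landing on in-degree-k nodes (assumed reading).
        m_tk = Counter(in_degree[v] for v in targets)
        m_t = len(targets)

        # Only degrees with n_{t-1}(k) != 0 appear in n_prev, which is
        # exactly the Iverson-bracket condition in the weights.
        for k, n_k in n_prev.items():
            weight_total[k] += m_t
            weighted_sum[k] += m_t * m_tk.get(k, 0) / n_k

        # Apply the step: targets gain one citation, new nodes start at zero.
        for v in targets:
            in_degree[v] += 1
        for u in new_nodes:
            in_degree.setdefault(u, 0)

    return {
        k: Z * weighted_sum[k] / weight_total[k]
        for k in sorted(weight_total)
        if weight_total[k] > 0
    }


# Toy example: node "b" keeps attracting the new citations.
toy_steps = [(["c"], ["b"]), (["d"], ["b"])]
toy_rates = newman_attachment_rate({"a": 0, "b": 1}, toy_steps)
print(toy_rates)
```

Recomputing the degree counts at every step keeps the sketch short but costs time proportional to the network size per step; an incrementally updated counter would be the natural refinement for data on the scale of the APS collection.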
We defined a log-linearity score heuristic for an attachment rate as the common logarithm of the horizontal component of the longest log-linear segment. Thus the higher the score, the more “log-linear” the attachment rate. As expected, attachment rate log-linearity increases with decreasing time resolution, so that the yearly resolved attachment rate is the most log-linear. Panel C shows the plotted Redner's model and Krapivsky's model attachment functions of best fit. The lesson is that we can expect crudely time resolved data to exhibit a bias toward log-linearity in measured attachment rates.

Table 2. Model comparison results for growing network representations of the APS citation data at various time resolutions. Shown are AIC and BIC values for the fit of the log-linear attachment function of Krapivsky's model and the nonlinear one of Redner's model to the maximally, daily, monthly, and yearly resolved APS citation data attachment rate, respectively. The best model (the smallest AIC/BIC value) for each level of resolution is marked with an asterisk. Redner's model best describes the data at the three highest levels of resolution (maximal, daily, and monthly). Krapivsky's model best describes the data at the lowest level of resolution (yearly).

Resolution  Model       Attach. Fn.  Eq.  AIC      BIC
Maximal     Krapivsky   Log-linear   (6)  -4,174   -4,294
Maximal     Redner*     Nonlinear    (7)  -15,056  -12,429
Daily       Krapivsky   Log-linear   (6)  -7,262   -7,252
Daily       Redner*     Nonlinear    (7)  -12,434  -12,423
Monthly     Krapivsky   Log-linear   (6)  -6,548   -6,538
Monthly     Redner*     Nonlinear    (7)  -7,716   -7,706
Yearly      Krapivsky*  Log-linear   (6)  -4,207   -4,198
Yearly      Redner      Nonlinear    (7)  -3,887   -3,878

All that remains is to tie up a few loose ends. First, we have asserted that confining ourselves to the range $0 \le k \le 150$ would be sufficient to explain the paradox. For justification, observe how the maximally resolved APS citation network attachment rate plotted in Fig.
3(B) overshoots the Redner's model attachment function for about $k \ge 150$, and the APS citation distribution plotted in Fig. 1 overshoots the log-normal distribution for about $k \ge 150$. These effects are two sides of the same coin: the attachment rate overshooting the predicted attachment function (i.e. $k \ge 150$ nodes acquiring citations at a higher rate than expected) automatically leads to the citation distribution overshooting the log-normal distribution (i.e. $k \ge 150$ nodes are more highly connected than predicted by the log-normal). Thus the lack of agreement between theory and observation can be understood within the modelling framework we have presented, and does not detract from our arguments. Second, the maximally resolved APS citation data, on which we rely to explain the paradox, is consistent with the constant $m_t$ on average assumption on which the models we have considered here depend (i.e. $m_t = 1$ for all $t$). However, it is important to point out that this assumption is violated for coarser time resolutions. For example, the number of APS articles has grown exponentially over time with a doubling time of about 6.5 years, so the number of citations added per time-step increases over time at, for instance, yearly resolution.
And lastly, the matter of whether the attachment rate remains constant over time in the case of the APS citation data merits some consideration, since this is an assumption of our models. To test the assumption, we partitioned the data from 1901 to 2000 into four non-overlapping time windows (i.e. 1901–74, 1974–88, 1988–95, 1995–2000) and estimated the attachment exponent separately in each case using Newman's method at maximal time resolution. The time windows were selected such that the articles are equally distributed among them. The corresponding estimates for $\alpha$ (i.e. 0.97, 0.94, 1.05, and 1.06, respectively) lend credence to the notion that the constant attachment rate assumption holds at least approximately true.

Figure 4. Overview of the measured attachment rates for growing network representations of the APS journal collection data at daily, monthly, and yearly time resolutions. [Three rows of panels (A, B, C), each with Yearly, Monthly, and Daily Resolution columns on double logarithmic axes. Horizontal axis: Number of Citations, $k + 1$. Vertical axis: APS Attachment Rate, $\hat{A}(k)$. Linearity scores in row B: yearly 2.64, monthly 1.79, daily 1.29. AIC values in row C: yearly, Krapivsky's model −4207 and Redner's model −3887; monthly, Krapivsky's model −6548 and Redner's model −7716; daily, Krapivsky's model −7263 and Redner's model −12434.] (A) The attachment rates plotted in isolation. Each $\hat{A}(k)$ was estimated using Newman's measure from APS journal publications dating from July 1893 through 2003 inclusive. The data have been averaged over a range of $k \pm 0.025k$. (B) The same attachment rates fitted using the segmented linear regression technique discussed in the main text. A larger linearity score reflects a stronger log-linear tendency in a measured attachment rate. The yearly attachment rate is the most log-linear by this score. (C) The attachment rates are fitted by the estimated log-linear attachment function of Krapivsky's model (red), and the estimated nonlinear attachment function of Redner's model (green). The best model (the smallest AIC value) is Redner's model in the case of daily and monthly time resolution, but Krapivsky's model in the case of yearly time resolution. Price's model is not included in the model comparison because it is a special case of Krapivsky's model.

Discussion

The main purpose of this paper has been to resolve the preferential attachment paradox. Our proposed resolution highlights various pitfalls that the working network scientist would do well to avoid when measuring preferential attachment. First, we have called attention to the basic fact that an attachment rate is always measured relative to this or that growing network model. Granted, this observation is not particularly important as regards the brute assessment of whether or not real-world network attachment rates increase on average with node degree. Recall that this is one way to define preferential attachment. The measurement procedures that Mark Newman [8] and Jeong et al. [7] proposed in the early 2000s have proved adequate for confirming this form of preferential attachment for numerous instances as summarised in other sources [9, 20]. The same holds true of more recent measurement procedures [9, 19, 22, 23]. But the situation is completely different for the characterisation of attachment rate functional form. In the present work we have seen that the APS citation network attachment rate is better modelled by a nonlinear function under maximal time resolution, and a log-linear function under yearly resolution. This serves as a cautionary tale when it comes to making model-free statements about attachment rate functional form. Second, we have taken pains to state the importance of using the corrected version of Newman's method [9] when assessing attachment rate functional form at fine time resolutions. Third, the importance of plotting attachment rates on a double logarithmic scale cannot be overstated in light of the striking contrast between the plots of Figs. 2 and 3(A).

On a different note, we would be remiss not to comment on the conspicuous lack of statistical formalism employed in the analysis of attachment rate data. The contrast in technical sophistication between the manners in which degree distributions and attachment rates are characterised in the literature is striking. Analysing the APS citation distribution was straightforward thanks to the statistical formalism of Clauset et al. [12] as implemented in the poweRlaw R package [13]. More generally, the standardisation of fitting power-laws and other heavy-tailed forms to observed degree distributions was a direct outcome of Clauset et al. [12]. No comparable formalism exists for attachment rate analysis to our knowledge, although important strides in the modelling of citation dynamics are found in the work of Eom and Fortunato [24], and Golosovsky and Solomon [25, 26].
This is an intolerable state of affairs seeing that attachment rate and degree distribution are a package deal in so far as growing networks are concerned. An easy-to-use statistical toolkit is needed for fitting and comparing established growing network model attachment functions to observed attachment rates. Fortunately, it should be possible to adapt the maximum likelihood estimation methods and goodness-of-fit tests described in Clauset et al. [12] to this purpose. Implementing the proposed methodology in Python and R would go a long way to streamline the analysis of attachment rate data in academic publications.

Lastly, there is a pressing need for a review paper on the measurement of the chief processes describing how complex networks change over time. The measuring of preferential attachment in growing networks, which has so preoccupied our thinking in the present work, is part of a larger enterprise to measure nothing short of all conjectured network evolutionary processes. Preferential attachment is one of many such processes to have been conjectured, including node fitness [27], node duplication coupled with edge rewiring [28], homophily [29], topological distance [8], and node birth/death processes [4]. At least three good reviews have been written on generative network models [30–32], but none on the subject of measuring the processes they embody in real-world networks. It is high time that a survey of the methodological landscape and a critical exposition of real-world findings in this area be undertaken.

References

1. Redner, S. Citation Statistics from 110 Years of Physical Review. Phys. Today 58, 49–54 (2005). URL http://dx.doi.org/10.1063/1.1996475. DOI 10.1063/1.1996475.
2. de Solla Price, D. J. A general theory of bibliometric and other cumulative advantage processes. J. Am. Soc. for Inf. Sci. 27, 292–306 (1976).
3. Barabási, A.-L. & Albert, R. Emergence of Scaling in Random Networks. Science 286, 509–512 (1999). URL http://dx.doi.org/10.1126/science.286.5439.509. DOI 10.1126/science.286.5439.509.
4. Dorogovtsev, S. N., Mendes, J. F. F. & Samukhin, A. N. Structure of Growing Networks with Preferential Linking. Phys. Rev. Lett. 85, 4633–4636 (2000). URL http://dx.doi.org/10.1103/physrevlett.85.4633. DOI 10.1103/physrevlett.85.4633.
5. Krapivsky, P. L., Redner, S. & Leyvraz, F. Connectivity of Growing Random Networks. Phys. Rev. Lett. 85, 4629–4632 (2000). URL http://dx.doi.org/10.1103/physrevlett.85.4629. DOI 10.1103/physrevlett.85.4629.
6. Krapivsky, P. L. & Redner, S. Organization of growing random networks. Phys. Rev. E 63, 066123 (2001). URL http://dx.doi.org/10.1103/physreve.63.066123. DOI 10.1103/physreve.63.066123.
7. Jeong, H., Néda, Z. & Barabási, A. L. Measuring preferential attachment in evolving networks. Europhys. Lett. 61, 567–572 (2003). URL http://dx.doi.org/10.1209/epl/i2003-00166-9. DOI 10.1209/epl/i2003-00166-9.
8. Newman, M. E. J. Clustering and preferential attachment in growing networks. Phys. Rev. E 64 (2001). URL http://arxiv.org/abs/cond-mat/0104209. cond-mat/0104209.
9. Pham, T., Sheridan, P. & Shimodaira, H. PAFit: A statistical method for measuring preferential attachment in temporal complex networks. PLoS ONE 10, e0137796 (2015). URL http://dx.doi.org/10.1371%2Fjournal.pone.0137796. DOI 10.1371/journal.pone.0137796.
10. APS Journals. APS Data Sets for Research. http://journals.aps.org/datasets (2017). [Online; accessed 1-September-2017].
11. Radicchi, F., Fortunato, S. & Vespignani, A. Citation networks. In Scharnhorst, A., Börner, K. & van den Besselaar, P. (eds.) Models of Science Dynamics: Encounters Between Complexity Theory and Information Sciences, 233–257 (Springer Berlin Heidelberg, Berlin, Heidelberg, 2012). URL http://dx.doi.org/10.1007/978-3-642-23068-4_7. DOI 10.1007/978-3-642-23068-4_7.
12. Clauset, A., Shalizi, C. R. & Newman, M. E. J. Power-Law Distributions in Empirical Data. SIAM Rev. 51, 661–703 (2009).
13. Gillespie, C. S. Fitting heavy tailed distributions: The poweRlaw package. J. Stat. Softw. 64, 1–16 (2015). URL http://www.jstatsoft.org/v64/i02/.
14. Na, H. S. & Rapoport, A. Distribution of nodes of a tree by degree. Math. Biosci. 6, 313–329 (1970). URL http://www.sciencedirect.com/science/article/pii/0025556470900714. DOI 10.1016/0025-5564(70)90071-4.
15. Callaway, D. S., Hopcroft, J. E., Kleinberg, J. M., Newman, M. E. J. & Strogatz, S. H. Are randomly grown graphs really random? Phys. Rev. E 64, 041902 (2001). URL http://link.aps.org/doi/10.1103/PhysRevE.64.041902. DOI 10.1103/PhysRevE.64.041902.
16. Newman, M. Networks: An Introduction (Oxford University Press, Inc., New York, NY, USA, 2010).
17. Moon, J. W. The distance between nodes in recursive trees, 125–132. London Mathematical Society Lecture Note Series (Cambridge University Press, 1974).
18. Meir, A. & Moon, J. On the altitude of nodes in random trees. Can. J. Math. 997–1015 (1978). URL http://dx.doi.org/10.4153/CJM-1978-085-0. DOI 10.4153/CJM-1978-085-0.
19. Massen, C. & Jonathan, P. Preferential attachment during the evolution of a potential energy landscape. The J. Chem. Phys. 127, 114306 (2007).
20. Sheridan, P., Yagahara, Y. & Shimodaira, H. Measuring preferential attachment in growing networks with missing-timelines using Markov chain Monte Carlo. Phys. A: Stat. Mech. its Appl. 391, 5031–5040 (2012). URL http://EconPapers.repec.org/RePEc:eee:phsmap:v:391:y:2012:i:20:p:5031-5040.
21. Milborrow, S. Derived from mda:mars by Hastie, T. & Tibshirani, R. earth: Multivariate Adaptive Regression Splines (2011). URL http://CRAN.R-project.org/package=earth.
22. Gómez, V., Kappen, H. J. & Kaltenbrunner, A. Modeling the structure and evolution of discussion cascades. In Proceedings of the 22nd ACM Conference on Hypertext and Hypermedia, HT '11, 181–190 (ACM, New York, NY, USA, 2011). URL http://doi.acm.org/10.1145/1995966.1995992. DOI 10.1145/1995966.1995992.
23. Kunegis, J., Blattner, M. & Moser, C. Preferential attachment in online networks: Measurement and explanations. In WebSci'13 (France, 2013).
24. Eom, Y.-H. & Fortunato, S. Characterizing and Modeling Citation Dynamics. PLoS ONE 6, e24926 (2011). URL http://dx.doi.org/10.1371/journal.pone.0024926. DOI 10.1371/journal.pone.0024926.
25. Golosovsky, M. & Solomon, S. Stochastic dynamical model of a growing citation network based on a self-exciting point process. Phys. Rev. Lett. 109, 098701 (2012). URL https://link.aps.org/doi/10.1103/PhysRevLett.109.098701. DOI 10.1103/PhysRevLett.109.098701.
26. Golosovsky, M. & Solomon, S. Growing complex network of citations of scientific papers: Modeling and measurements. Phys. Rev. E 95, 012324 (2017). URL https://link.aps.org/doi/10.1103/PhysRevE.95.012324. DOI 10.1103/PhysRevE.95.012324.
27. Bianconi, G. & Barabási, A. Competition and multiscaling in evolving networks. Europhys. Lett. 54, 436 (2001).
28. Pastor-Satorras, R., Smith, E. & Solé, R. V. Evolving protein interaction networks through gene duplication. J. Theor. Biol. 222, 199–210 (2003). URL http://www.sciencedirect.com/science/article/pii/S0022519303000286. DOI http://dx.doi.org/10.1016/S0022-5193(03)00028-6.
29. McPherson, M., Smith-Lovin, L. & Cook, J. M. Birds of a Feather: Homophily in Social Networks. Annu. Rev. Sociol. 27, 415–444 (2001). URL http://dx.doi.org/10.1146/annurev.soc.27.1.415. DOI 10.1146/annurev.soc.27.1.415.
30. Albert, R. & Barabási, A.-L. Statistical mechanics of complex networks. Rev. Mod. Phys. 74, 47–97 (2002). URL http://link.aps.org/doi/10.1103/RevModPhys.74.47. DOI 10.1103/RevModPhys.74.47.
31. Boccaletti, S., Latora, V., Moreno, Y., Chavez, M. & Hwang, D. Complex networks: Structure and dynamics. Phys. Reports 424, 175–308 (2006). URL http://dx.doi.org/10.1016/j.physrep.2005.10.009. DOI 10.1016/j.physrep.2005.10.009.
32. Holme, P. Modern temporal network theory: a colloquium. The Eur. Phys. J. B 88, 1–30 (2015). URL http://dx.doi.org/10.1140/epjb/e2015-60657-4. DOI 10.1140/epjb/e2015-60657-4.

Acknowledgements

We kindly thank Sidney Redner for supplying us with a sketch of the proof which appears in Supplementary Note 3 and Thong Pham for some helpful discussions.

Author contributions statement

P.S. and T.O. conceived the analysis, P.S. conducted the analysis. P.S. and T.O. wrote the manuscript.

Competing interests

The authors declare that they have no competing interests.
