Uncovering disassortativity in large scale-free networks
Mixing patterns in large self-organizing networks, such as the Internet, the World Wide Web, social and biological networks, are often characterized by degree-degree dependencies between neighbouring nodes. In this paper we propose a new way of measuring degree-degree dependencies. One of the problems with the commonly used assortativity coefficient is that in disassortative networks its magnitude decreases with the network size. We mathematically explain this phenomenon and validate the results on synthetic graphs and real-world network data. As an alternative, we suggest using rank correlation measures such as Spearman’s rho. Our experiments convincingly show that Spearman’s rho produces consistent values in graphs of different sizes but similar structure, and it is able to reveal strong (positive or negative) dependencies in large graphs. In particular, we discover much stronger negative degree-degree dependencies in Web graphs than was previously thought. Rank correlations allow us to compare the assortativity of networks of different sizes, which is impossible with the assortativity coefficient due to its genuine dependence on the network size. We conclude that rank correlations provide a suitable and informative method for uncovering network mixing patterns.
💡 Research Summary
The paper addresses a fundamental problem in the quantitative analysis of large complex networks: the measurement of degree‑degree correlations, commonly referred to as assortativity. The standard assortativity coefficient, introduced by Newman, is the Pearson correlation between the degrees of nodes at either end of an edge. While this metric works reasonably well for small or moderately sized graphs, the authors demonstrate both theoretically and empirically that it suffers from a severe size‑dependence in disassortative (negative‑correlation) scale‑free networks.
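For reference, the Pearson form of the coefficient can be written out explicitly. The notation below follows the usual statement of Newman's formula for an undirected graph and is not taken from this summary: M is the number of edges, and j_i and k_i are the degrees of the two endpoints of edge i.

```latex
\[
r \;=\;
\frac{\dfrac{1}{M}\sum_{i=1}^{M} j_i k_i \;-\; \left[\dfrac{1}{M}\sum_{i=1}^{M} \tfrac{1}{2}\,(j_i + k_i)\right]^{2}}
     {\dfrac{1}{M}\sum_{i=1}^{M} \tfrac{1}{2}\left(j_i^{2} + k_i^{2}\right) \;-\; \left[\dfrac{1}{M}\sum_{i=1}^{M} \tfrac{1}{2}\,(j_i + k_i)\right]^{2}}
\]
```

The denominator is the variance of the degree found at a randomly chosen end of a randomly chosen edge. This is the term whose growth with network size drives the effect discussed in the paper.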
In the first part of the work, the authors derive the scaling behavior of the Pearson‑based assortativity coefficient for random graphs generated by the configuration model and by preferential‑attachment processes. Because the degree distribution follows a power law with exponent γ∈(2,3) in most real‑world networks, the second and third moments of the degree distribution diverge with the number of vertices N. The variance term that appears in the denominator of the Pearson correlation therefore grows roughly as N^{(3‑γ)/(γ‑1)}. Consequently, even if the underlying structural tendency (e.g., high‑degree nodes preferentially linking to low‑degree nodes) remains unchanged, the measured coefficient shrinks toward zero as N increases. This explains why many published studies report only weak negative assortativity for massive Internet or Web graphs, despite visual evidence of strong hub‑to‑leaf wiring.
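An exponent of this form can be recovered from a standard cutoff heuristic. The sketch below assumes N degrees drawn independently from a density proportional to k^{-γ}; it is a back-of-the-envelope argument and not necessarily the derivation used in the paper.

```latex
\[
N \int_{k_{\max}}^{\infty} k^{-\gamma}\,dk \sim 1
\;\Longrightarrow\;
k_{\max} \sim N^{1/(\gamma-1)},
\qquad
\langle k^{2} \rangle \sim \int_{1}^{k_{\max}} k^{2-\gamma}\,dk
\sim k_{\max}^{\,3-\gamma}
= N^{(3-\gamma)/(\gamma-1)}
\quad (2 < \gamma < 3).
\]
```

The first relation says that about one of the N nodes is expected to have degree above k_max. The second truncates the otherwise divergent second-moment integral at that cutoff.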
To overcome this limitation, the authors propose using rank‑based correlation measures, specifically Spearman’s ρ (and, to a lesser extent, Kendall’s τ). Spearman’s ρ replaces raw degree values with their ranks before computing the Pearson correlation, which sharply reduces the influence of extreme degree values. Rank correlations are therefore far less sensitive to the heavy tail of the degree distribution and, crucially, yield consistent values across N for graphs that share the same underlying mixing pattern.
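The two measures can be compared on an edge list with a few lines of standard-library Python. This is a minimal sketch and not the authors' code: the function names are made up for illustration, each undirected edge is counted in both directions, and ties are resolved by average ranks, which is one common convention and an assumption here.

```python
from collections import Counter
from math import sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)  # ZeroDivisionError if either sequence is constant


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based, ranks are 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def degree_pairs(edges):
    """Degrees at both ends of every edge, each undirected edge taken in both directions."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    xs, ys = [], []
    for u, v in edges:
        xs += [deg[u], deg[v]]
        ys += [deg[v], deg[u]]
    return xs, ys


def pearson_assortativity(edges):
    """Assortativity coefficient: Pearson correlation of end-point degrees."""
    xs, ys = degree_pairs(edges)
    return pearson(xs, ys)


def spearman_assortativity(edges):
    """Spearman's rho: the same correlation computed on ranks of the degrees."""
    xs, ys = degree_pairs(edges)
    return pearson(average_ranks(xs), average_ranks(ys))


# A star with four leaves: every edge joins the hub to a degree-1 node.
star = [(0, leaf) for leaf in range(1, 5)]
print(pearson_assortativity(star), spearman_assortativity(star))
```

On a star graph both measures are at their minimum, because the only degree pair present is (hub, leaf). The measures start to differ once the degree sequence contains a few values that are far larger than the rest.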
The experimental section validates these claims on four fronts. First, synthetic graphs with controlled exponents (γ ranging from 2.1 to 3.5) and sizes (10³ to 10⁶ nodes) are generated. Spearman’s ρ remains essentially constant across orders of magnitude in N, while the traditional assortativity coefficient decays toward zero at a rate consistent with the theoretical analysis. Second, a suite of real‑world networks is examined, including several large Web crawls (Stanford Web, Common Crawl), the autonomous‑system (AS) level Internet topology, and representative social and biological networks. For the Web graphs, previously reported Pearson assortativity values hover around –0.05, suggesting only mild disassortativity. In contrast, Spearman’s ρ reveals values between –0.30 and –0.45, indicating a much stronger tendency for high‑degree pages to connect to low‑degree pages. Similar patterns are observed in the AS network (moderate negative ρ) and in social networks (positive ρ, reflecting homophily among high‑degree individuals). Third, the authors explicitly compare pairs of networks that differ dramatically in size but are believed to share the same generative mechanism; Spearman’s ρ yields nearly identical numbers, whereas the Pearson coefficient diverges. Fourth, robustness checks confirm that the results are not artifacts of sampling, degree‑preserving randomization, or the choice between Spearman’s ρ and Kendall’s τ.
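The qualitative contrast between the two measures can be reproduced on a deterministic toy graph. The construction below is an illustrative assumption and not one of the paper's experiments: one star whose hub has about √n leaves, plus n disjoint three-node paths. Every edge joins a degree-1 node to a node of higher degree, so the graph is strongly disassortative at every size, while the hub degree grows with n.

```python
from math import isqrt, sqrt


def pearson(xs, ys):
    """Plain Pearson correlation of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def average_ranks(values):
    """Ranks 1..n; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def toy_degree_pairs(n):
    """End-point degree pairs of the toy graph (hypothetical construction).

    One star with m = isqrt(n) leaves contributes m edges of type (m, 1);
    n three-node paths contribute 2n edges of type (2, 1).
    """
    m = isqrt(n)
    one_way = [(m, 1)] * m + [(2, 1)] * (2 * n)
    both_ways = one_way + [(b, a) for a, b in one_way]  # count each edge twice
    xs = [a for a, _ in both_ways]
    ys = [b for _, b in both_ways]
    return xs, ys


def compare(n):
    """Return (Pearson r, Spearman rho) for the toy graph of size parameter n."""
    xs, ys = toy_degree_pairs(n)
    return pearson(xs, ys), pearson(average_ranks(xs), average_ranks(ys))


for n in (100, 10_000):
    r, rho = compare(n)
    print(f"n={n:>6}  Pearson r={r:+.3f}  Spearman rho={rho:+.3f}")
```

In this construction the Pearson value moves from roughly -0.25 at n = 100 to roughly -0.02 at n = 10,000, while Spearman's ρ stays below -0.97 at both sizes. The single growing hub inflates the variance in the Pearson denominator but occupies only a handful of ranks.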
The discussion emphasizes the practical implications of adopting rank‑based measures. Because ρ is scale‑independent, researchers can now meaningfully compare assortativity across datasets collected at different times, from different domains, or at different resolutions. This opens the door to longitudinal studies of network evolution, cross‑domain meta‑analyses, and more reliable validation of generative models that aim to reproduce observed mixing patterns. Moreover, rank correlations are computationally cheap (O(E log E) for sorting) and integrate seamlessly with existing network‑analysis toolkits.
In conclusion, the paper makes three key contributions: (1) a rigorous mathematical explanation of why the Pearson assortativity coefficient diminishes with network size in heavy‑tailed, disassortative graphs; (2) a compelling empirical demonstration that Spearman’s ρ provides a stable, size‑independent alternative; and (3) a re‑evaluation of the mixing patterns of several canonical large‑scale networks, revealing stronger degree‑degree dependencies than previously recognized. The authors argue convincingly that rank‑based correlation measures should replace the traditional assortativity coefficient as the standard metric for quantifying mixing patterns in complex networks. Future work is suggested on extending rank‑based analysis to temporal networks, multilayer structures, and to the development of hypothesis‑testing frameworks that exploit the distributional properties of ρ in network ensembles.