Mean clustering coefficients: the role of isolated nodes and leafs on clustering measures for small-world networks
Many networks exhibit the small-world property of the neighborhood connectivity being higher than in comparable random networks. However, the standard measure of local neighborhood clustering is typically not defined if a node has one or no neighbors. In such cases, local clustering has traditionally been set to zero and this value influenced the global clustering coefficient. Such a procedure leads to underestimation of the neighborhood clustering in sparse networks. We propose to include $\theta$ as the proportion of leafs and isolated nodes to estimate the contribution of these cases and provide a formula for estimating a clustering coefficient excluding these cases from the Watts and Strogatz (1998 Nature 393 440-2) definition of the clustering coefficient. Excluding leafs and isolated nodes leads to values which are up to 140% higher than the traditional values for the observed networks indicating that neighborhood connectivity is normally underestimated. We find that the definition of the clustering coefficient has a major effect when comparing different networks. For metabolic networks of 43 organisms, relations changed for 58% of the comparisons when a different definition was applied. We also show that the definition influences small-world features and that the classification can change from non-small-world to small-world network. We discuss the use of an alternative measure, disconnectedness D, which is less influenced by leafs and isolated nodes.
💡 Research Summary
The paper addresses a fundamental bias in the widely used clustering coefficient for complex networks. The traditional local clustering coefficient C_i = Γ_i / (k_i·(k_i − 1)), where k_i is the degree of node i and Γ_i counts the links among its neighbors (the normalization implies that each undirected link is counted twice), is undefined for nodes with degree k_i ≤ 1, and the common practice is to assign C_i = 0 (or occasionally 1). When these zeros enter the global averages, either the simple mean C₁ = (1/N)∑C_i or the normalized version C₂ = ∑Γ_i / ∑(k_i·(k_i − 1)), the resulting global clustering coefficient is systematically lowered in sparse networks that contain many leaf nodes (degree = 1) or isolated nodes (degree = 0).
To correct this, the author introduces θ, the proportion of nodes with degree ≤ 1, and defines a new global clustering coefficient C′ = C₁ / (1 − θ). This formulation simply rescales the traditional mean by the factor f = 1/(1 − θ), which can be substantial when θ is large. The paper derives this relationship analytically and demonstrates that the increase factor f can reach up to 2.42 for the networks examined.
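The rescaling follows because nodes with degree ≤ 1 contribute zero to the sum in C₁, so averaging over only the N·(1 − θ) remaining nodes divides the same sum by a smaller count. A minimal Python sketch of this relationship is shown below. It assumes an undirected graph stored as a dictionary of neighbor sets; the function names and the toy graph are illustrative and not taken from the paper.

```python
from itertools import combinations

def local_clustering(adj):
    """Return {node: C_i} for nodes with degree >= 2.

    Nodes with degree <= 1 are omitted because C_i is undefined for them.
    Assumes an undirected graph: adj[u] is the set of neighbors of u.
    """
    coeffs = {}
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue
        # Count undirected links among the neighbors of this node.
        links = sum(1 for u, v in combinations(nbrs, 2) if v in adj[u])
        coeffs[node] = 2.0 * links / (k * (k - 1))
    return coeffs

def clustering_summary(adj):
    """Return (C1, theta, C_prime) for the graph."""
    n = len(adj)
    coeffs = local_clustering(adj)
    total = sum(coeffs.values())
    # theta: proportion of leaf and isolated nodes.
    theta = sum(1 for nbrs in adj.values() if len(nbrs) <= 1) / n
    # C1: traditional mean, undefined cases counted as zero.
    c1 = total / n
    # C_prime: mean over nodes where C_i is defined.
    c_prime = total / len(coeffs) if coeffs else float("nan")
    return c1, theta, c_prime

# Toy graph (hypothetical): triangle a-b-c, leaf d attached to a, isolated node e.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
c1, theta, c_prime = clustering_summary(adj)
print(c1, theta, c_prime, c1 / (1 - theta))
```

In this toy graph two of the five nodes are a leaf or isolated, so θ = 0.4, and the directly computed C′ agrees with C₁ / (1 − θ).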
Empirical analysis covers a diverse set of real‑world networks: the C. elegans neuronal network, metabolic networks of 43 organisms, the yeast protein‑protein interaction (PPI) network, the German Autobahn system, the western US power grid, and the World‑Wide‑Web. For each network the author reports N (number of nodes), edge density d, θ, and the factor of increase from C₁ to C′. θ ranges from 0.02 in the neuronal network to 0.59 in the web, leading to C′ values that are 1.02‑ to 2.42‑times larger than the traditional C₁. In the yeast PPI network, C₁ = 14.4 % and C₂ = 8.4 % increase to C′ = 18.7 %, a 30 % rise over C₁ and more than double C₂.
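The reported figures can be checked against the rescaling formula with a few lines of arithmetic. The sketch below uses only numbers quoted in this summary; the helper names are hypothetical, and the "implied" θ values are back-calculated from the formula rather than taken from the paper.

```python
def increase_factor(theta):
    """Factor f = 1 / (1 - theta) by which C' exceeds C1."""
    return 1.0 / (1.0 - theta)

def implied_theta(c1, c_prime):
    """Solve C' = C1 / (1 - theta) for theta."""
    return 1.0 - c1 / c_prime

# Neuronal network: theta = 0.02 gives a factor close to the reported 1.02.
f_neuronal = increase_factor(0.02)

# Yeast PPI network: C1 = 14.4 %, C' = 18.7 % (values from the summary).
yeast_ratio = 18.7 / 14.4
yeast_theta = implied_theta(14.4, 18.7)

# A factor of 2.42 corresponds to a theta slightly below the rounded 0.59.
web_theta = implied_theta(1.0, 2.42)

print(round(f_neuronal, 2), round(yeast_ratio, 2),
      round(yeast_theta, 2), round(web_theta, 3))
```

Under the formula, the yeast figures imply that roughly 23 % of the nodes in that network are leaves or isolated nodes, an inference from the quoted values rather than a number stated in the summary.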
The impact on network comparison is striking. By examining all 903 pairwise comparisons among the 43 metabolic networks, the author finds that 58 % of the ordering relations (which network has higher clustering) are reversed when using C′ instead of C₁. Using C₂ the reversal rate rises to 76 %, and switching between C₁ and C₂ alone changes 77 % of the relations. This demonstrates that the choice of clustering definition can dramatically alter conclusions about relative network organization.
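One way to count such changes is to compare, for every pair of networks, whether the sign of the difference in clustering agrees under both definitions. The following Python sketch illustrates the idea with made-up values. The function name, the toy data, and the treatment of ties (a tie counts as its own relation) are assumptions, since the summary does not state how the paper handles them.

```python
from itertools import combinations
from math import comb

def changed_relations(values_a, values_b):
    """Count network pairs whose ordering differs between two measures.

    values_a[i] and values_b[i] are two clustering measures for network i.
    Returns (number of changed pairs, total number of pairs).
    """
    def sign(x):
        return (x > 0) - (x < 0)

    pairs = list(combinations(range(len(values_a)), 2))
    changed = sum(
        1
        for i, j in pairs
        if sign(values_a[i] - values_a[j]) != sign(values_b[i] - values_b[j])
    )
    return changed, len(pairs)

# Hypothetical values for three networks under two definitions.
c1_values = [0.10, 0.20, 0.30]
c_prime_values = [0.25, 0.22, 0.40]
changed, total = changed_relations(c1_values, c_prime_values)
print(changed, total)

# 43 networks give 43 * 42 / 2 = 903 unordered pairs.
print(comb(43, 2))
```

Because each network has its own θ, the rescaling factor differs from network to network, which is why the ordering can change even though C′ is a simple multiple of C₁ for any single network.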
Small‑worldness, quantified as σ = (C/C_random) / (L/L_random), is also affected. Since C′ is larger when θ is larger, σ typically increases for real networks because θ in the original networks exceeds θ in the corresponding Erdős‑Rényi random graphs. The author confirms this by generating random graphs with the same size and edge count, showing that θ_random is consistently lower. Conversely, synthetic small‑world networks generated by rewiring a lattice (the classic Watts‑Strogatz method) exhibit lower θ than their random counterparts, leading to a decrease in σ when C′ is used. An alternative “inverse rewiring” model that starts from a random graph and deliberately creates more isolated nodes reproduces the higher θ observed in real metabolic networks, further supporting the role of leaf and isolated nodes in shaping small‑world metrics.
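The direction of the change in σ can be made explicit by substituting C′ = C₁ / (1 − θ) for both the real network and its random benchmark. The short derivation below is a sketch that assumes the path-length ratio is left unchanged and that the same rescaling is applied to the random graph; σ′ and the subscripted symbols are notation introduced here.

```latex
\begin{align}
\sigma' &= \frac{C' / C'_{\mathrm{random}}}{L / L_{\mathrm{random}}}
         = \frac{C_1 / (1-\theta)}{C_{1,\mathrm{random}} / (1-\theta_{\mathrm{random}})}
           \cdot \frac{L_{\mathrm{random}}}{L} \\
        &= \sigma \cdot \frac{1-\theta_{\mathrm{random}}}{1-\theta}
\end{align}
```

Here σ is the value obtained with C₁. The factor exceeds 1 exactly when θ > θ_random, which matches the increase described for real networks and the decrease described for rewired lattices.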
Finally, the paper proposes an alternative metric, disconnectedness D, which measures the fraction of node pairs that are not connected by any path. D is less sensitive to leaf and isolated nodes and can serve as a complementary indicator of network cohesion.
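As an illustration of the one-sentence description above, D can be computed from connected-component sizes, since two nodes are joined by a path exactly when they share a component. The Python sketch below assumes an undirected graph and unordered node pairs; the paper's exact normalization may differ, and the function names and toy graph are hypothetical.

```python
from collections import deque

def component_sizes(adj):
    """Return the sizes of the connected components via breadth-first search."""
    seen = set()
    sizes = []
    for start in adj:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        size = 0
        while queue:
            node = queue.popleft()
            size += 1
            for nbr in adj[node]:
                if nbr not in seen:
                    seen.add(nbr)
                    queue.append(nbr)
        sizes.append(size)
    return sizes

def disconnectedness(adj):
    """Fraction of unordered node pairs with no connecting path."""
    n = len(adj)
    if n < 2:
        return 0.0
    total_pairs = n * (n - 1) // 2
    # Pairs inside the same component are connected by some path.
    connected_pairs = sum(s * (s - 1) // 2 for s in component_sizes(adj))
    return 1.0 - connected_pairs / total_pairs

# Toy graph (hypothetical): triangle a-b-c, leaf d attached to a, isolated node e.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
    "e": set(),
}
d_value = disconnectedness(adj)
print(d_value)
```

In this toy graph the leaf d does not raise D because it is still reachable from the triangle; only the isolated node e contributes unreachable pairs.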
In summary, the study reveals that the conventional clustering coefficient conflates genuine neighborhood connectivity with the prevalence of leaf and isolated nodes, leading to systematic underestimation of clustering in sparse networks. By explicitly accounting for the proportion of such nodes (θ) and using the adjusted coefficient C′, researchers obtain a more faithful representation of local cohesion, avoid misleading comparisons, and achieve more reliable assessments of small‑world properties. The work has practical implications for a wide range of fields—biology, transportation, power systems, and the internet—where network sparsity is common and accurate quantification of clustering is essential.