Communities and bottlenecks: Trees and treelike networks have high modularity
Much effort has gone into understanding the modular nature of complex networks. Communities, also known as clusters or modules, are typically considered to be densely interconnected groups of nodes that are only sparsely connected to other groups in the network. Discovering high quality communities is a difficult and important problem in a number of areas. The most popular approach is the objective function known as modularity, used both to discover communities and to measure their strength. To understand the modular structure of networks it is then crucial to know how such functions evaluate different topologies, what features they account for, and what implicit assumptions they may make. We show that trees and treelike networks can have unexpectedly and often arbitrarily high values of modularity. This is surprising since trees are maximally sparse connected graphs and are not typically considered to possess modular structure, yet the nonlocal null model used by modularity assigns low probabilities, and thus high significance, to the densities of these sparse tree communities. We further study the practical performance of popular methods on model trees and on a genealogical data set and find that the discovered communities also have very high modularity, often approaching its maximum value. Statistical tests reveal the communities in trees to be significant, in contrast with known results for partitions of sparse, random graphs.
Research Summary
The paper investigates a counter‑intuitive property of the widely used community‑quality function modularity when applied to trees and treelike networks. Modularity measures how densely connected the nodes within a community are compared with a null model that preserves the degree sequence but randomises edges. Because the null model predicts an extremely low expected number of edges between low‑degree nodes, any actual edge in a sparse structure appears highly significant. Consequently, sub‑trees—despite being maximally sparse—receive a large positive contribution to the modularity sum, and the overall modularity Q can approach its theoretical maximum of 1.
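For reference, the standard Newman-Girvan definition of modularity can be written in two equivalent forms. The notation is an assumption introduced here and is not taken from the summary: A_ij is the adjacency matrix, k_i is the degree of node i, c_i is the community of node i, m is the number of edges, l_c is the number of edges inside community c, and d_c is the total degree of the nodes in c. The term k_i k_j / 2m is the expected number of edges between i and j under the degree-preserving null model.

```latex
Q \;=\; \frac{1}{2m}\sum_{i,j}\left[A_{ij} - \frac{k_i k_j}{2m}\right]\delta(c_i, c_j)
  \;=\; \sum_{c}\left[\frac{l_c}{m} - \left(\frac{d_c}{2m}\right)^{2}\right]
```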
The authors first present a formal analysis. Using perfect binary trees and uniformly branching trees as analytic models, they derive expressions for Q as a function of tree depth and branching factor. Each community c contributes a term (l_c/m) – (d_c/2m)^2, where l_c is the number of edges inside c, d_c is the total degree of its nodes, and m is the number of edges in the network. When a tree is partitioned into its natural sub-trees, almost every edge falls inside a community, so the l_c/m terms sum to nearly 1, while each penalty (d_c/2m)^2 is small because no single community holds more than a small share of the total degree. The total Q can therefore be made arbitrarily close to 1 by increasing depth or branching. This demonstrates that high modularity does not necessarily imply the presence of dense, “real” communities; it can be an artefact of the null model’s assumptions.
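The effect can be checked numerically with a minimal sketch. It assumes a perfect binary tree stored in heap order (node i has parent i // 2) and a partition that cuts the tree at a chosen depth: every sub-tree rooted at that depth is one community, and the nodes above the cut form one more. The function names and the specific partition are illustrative choices, not the authors' construction.

```python
from collections import Counter


def modularity(edges, community):
    """Q = sum_c [ l_c/m - (d_c/2m)^2 ] for an undirected, unweighted graph."""
    m = len(edges)
    intra = Counter()   # l_c: edges with both endpoints in community c
    degree = Counter()  # d_c: total degree of the nodes in community c
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] += 1
        degree[cv] += 1
        if cu == cv:
            intra[cu] += 1
    return sum(intra[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


def binary_tree_modularity(depth, cut):
    """Modularity of a perfect binary tree split into sub-trees rooted at depth `cut`.

    Hypothetical helper: nodes are numbered 1..n in heap order, so the
    depth of node i is i.bit_length() - 1 and its ancestor j levels up is i >> j.
    """
    n = 2 ** (depth + 1) - 1
    edges = [(i // 2, i) for i in range(2, n + 1)]
    community = {}
    for i in range(1, n + 1):
        d = i.bit_length() - 1
        # Nodes above the cut share label 0; the rest are labelled by
        # their ancestor at depth `cut`.
        community[i] = 0 if d < cut else i >> (d - cut)
    return modularity(edges, community)


if __name__ == "__main__":
    for depth, cut in [(6, 3), (10, 5), (16, 8)]:
        print(depth, cut, round(binary_tree_modularity(depth, cut), 4))
```

Cutting at roughly half the depth keeps both the number of cut edges and each community's share of the total degree small, which is why Q grows toward 1 as the tree gets deeper.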
To validate the theory, the authors run extensive experiments on synthetic trees and on a real genealogical dataset (British aristocratic lineages). They apply three state‑of‑the‑art community‑detection algorithms—Louvain, Leiden, and Infomap—and find that all methods recover partitions that align with the natural hierarchical divisions of the trees. The resulting modularity scores are remarkably high, ranging from 0.78 to 0.96, often within a few percent of the theoretical maximum. In contrast, when the same algorithms are run on sparse Erdős‑Rényi graphs with comparable average degree, the modularity values are lower and, more importantly, statistically insignificant.
Statistical significance is assessed by computing Z-scores against ensembles of random graphs that preserve the degree sequence. Tree-based partitions achieve Z-scores between 5 and 10, indicating that the observed modularity is far beyond what would be expected by chance. In sparse random graphs, by contrast, high Q values typically correspond to Z-scores near zero. The modularity of trees is therefore genuinely significant under the standard null model.
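A Z-score of this kind can be sketched as follows. The helper assumes the modularity values of the null ensemble have already been computed; how the ensemble is generated and partitioned is not specified in the summary, so `q_null` is a placeholder input and the function name is hypothetical.

```python
from statistics import mean, stdev


def z_score(q_observed, q_null):
    """Standard score of an observed modularity against null-model values.

    q_null: modularity values measured on randomized graphs (assumed to be
    supplied by the caller; at least two values are needed).
    """
    mu = mean(q_null)
    sigma = stdev(q_null)  # sample standard deviation
    if sigma == 0:
        raise ValueError("null ensemble has zero variance")
    return (q_observed - mu) / sigma
```

A value well above zero means the observed Q lies many standard deviations above the null ensemble's mean.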
The paper discusses the implications of these findings. First, modularity’s reliance on a global null model makes it biased toward detecting “communities” in any network that contains treelike substructures, potentially leading to over‑interpretation of hierarchical or genealogical data. Second, researchers should complement modularity with additional diagnostics—such as internal edge density, community size distribution, or alternative null models that better capture hierarchical constraints—when evaluating community structure in sparse networks. Third, the results suggest that high modularity in real‑world treelike systems (e.g., phylogenetic trees, file‑system hierarchies) may be meaningful, but the interpretation must be grounded in domain knowledge.
Finally, the authors propose future directions: (i) designing null models that explicitly account for hierarchical constraints, (ii) developing composite quality functions that combine modularity with local density measures, and (iii) applying the analysis to a broader range of empirical treelike networks to distinguish genuine functional modules from artefacts of the modularity metric. In sum, the study reveals that trees—despite being maximally sparse—can achieve arbitrarily high modularity, thereby cautioning against a naïve reliance on modularity alone for community detection and encouraging more nuanced, topology‑aware evaluation methods.