Biological network comparison using graphlet degree distribution

Notice: This research summary and analysis were automatically generated using AI technology. For authoritative details, please refer to the original arXiv source.

Analogous to biological sequence comparison, comparing cellular networks is an important problem that could provide insight into biological understanding and therapeutics. For technical reasons, comparing large networks is computationally infeasible, and thus heuristics such as the degree distribution have been sought. It is easy to demonstrate that two networks are different by simply showing a short list of properties in which they differ. It is much harder to show that two networks are similar, as it requires demonstrating their similarity in all of their exponentially many properties. Clearly, it is computationally prohibitive to analyze all network properties, but the larger the number of constraints we impose in determining network similarity, the more likely it is that the networks will truly be similar. We introduce a new systematic measure of a network’s local structure that imposes a large number of similarity constraints on networks being compared. In particular, we generalize the degree distribution, which measures the number of nodes ‘touching’ k edges, into distributions measuring the number of nodes ‘touching’ k graphlets, where graphlets are small connected non-isomorphic subgraphs of a large network. Our new measure of network local structure consists of 73 graphlet degree distributions (GDDs) of graphlets with 2-5 nodes, but it is easily extendible to a greater number of constraints (i.e., graphlets). Furthermore, we show a way to combine the 73 GDDs into a network ‘agreement’ measure. Based on this new network agreement measure, we show that almost all of the 14 eukaryotic PPI networks, including human, are better modeled by geometric random graphs than by Erdős-Rényi, random scale-free, or Barabási-Albert scale-free networks.
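The abstract's headline result is that geometric random graphs fit most PPI networks best. As a rough illustration of what that model is, here is a minimal sketch of a geometric random graph: points are scattered uniformly in a unit cube and every pair closer than a fixed radius is joined. The function name `geometric_random_graph`, the default of three dimensions, and the parameter values in the usage lines are illustrative assumptions, not taken from the paper.

```python
import math
import random

def geometric_random_graph(n, radius, dim=3, seed=None):
    """Hypothetical helper: n points drawn uniformly from the unit cube [0, 1]^dim,
    with an edge between every pair whose Euclidean distance is at most `radius`.
    Returns (points, adj), where adj maps node -> set of neighbours."""
    rng = random.Random(seed)
    points = [tuple(rng.random() for _ in range(dim)) for _ in range(n)]
    adj = {i: set() for i in range(n)}
    # O(n^2) pairwise check: fine for a sketch, too slow for very large n
    for i in range(n):
        for j in range(i + 1, n):
            if math.dist(points[i], points[j]) <= radius:
                adj[i].add(j)
                adj[j].add(i)
    return points, adj

# Hypothetical parameters; in practice the radius would be tuned so that the
# model's edge count is close to that of the network being modelled.
points, adj = geometric_random_graph(n=200, radius=0.2, seed=42)
n_edges = sum(len(nbrs) for nbrs in adj.values()) // 2
```

Storing the graph as an adjacency-set dictionary keeps neighbour lookups O(1), which is what graphlet counting needs.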

💡 Research Summary

The paper addresses the challenging problem of comparing large biological networks, where exhaustive analysis of all possible structural properties is computationally infeasible. Traditional heuristics such as the degree distribution capture only a single aspect of network topology (the number of edges incident to each node) and therefore provide a weak constraint when assessing similarity. To impose many more constraints without prohibitive cost, the paper introduces graphlet degree distributions (GDDs). A graphlet is a small, connected, induced subgraph of a larger network; the study considers all 30 non-isomorphic graphlets with 2 to 5 nodes. Because a node can sit in topologically different positions within the same graphlet (for example, at an end or in the middle of a three-node path), the nodes of these 30 graphlets fall into 73 distinct automorphism orbits. For each orbit, the “graphlet degree” of a node is the number of graphlets that touch the node at that orbit, and the distribution of this count over all nodes is the graphlet degree distribution for that orbit. The ordinary degree distribution is the special case of the single orbit of the two-node graphlet (an edge). Consequently, a network is characterized by a vector of 73 distributions, each encoding a different local wiring pattern.
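To make "graphlet degree" concrete, the sketch below computes it for the four orbits of the 2- and 3-node graphlets only, and then builds the corresponding distributions. It assumes the paper's orbit numbering: orbit 0 is an endpoint of an edge, orbits 1 and 2 are an end and the middle of an induced three-node path, and orbit 3 is a triangle node. Covering all 73 orbits would require enumerating the 4- and 5-node graphlets as well, which is beyond a short example. The function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

def orbit_counts(adj):
    """Graphlet degrees for orbits 0-3 (2- and 3-node graphlets only).

    adj maps each node to the set of its neighbours (undirected, no self-loops).
    Returns {node: [orbit0, orbit1, orbit2, orbit3]}.
    """
    counts = {}
    for v, nbrs in adj.items():
        o0 = len(nbrs)  # orbit 0: edges touching v (the ordinary degree)
        # orbit 3: triangles containing v = adjacent pairs of neighbours
        o3 = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
        # orbit 2: v is the middle of an induced path = non-adjacent neighbour pairs
        o2 = o0 * (o0 - 1) // 2 - o3
        # orbit 1: v is an end of an induced path v-u-w, so w must not touch v
        o1 = sum(1 for u in nbrs for w in adj[u] if w != v and w not in nbrs)
        counts[v] = [o0, o1, o2, o3]
    return counts

def graphlet_degree_distributions(counts, n_orbits=4):
    """For each orbit j, map k -> number of nodes touching orbit j exactly k times (k >= 1)."""
    return [Counter(c[j] for c in counts.values() if c[j] > 0) for j in range(n_orbits)]

# Toy network: triangle 0-1-2 with a pendant node 3 attached to node 2
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
counts = orbit_counts(adj)
# counts[3] == [1, 2, 0, 0]: node 3 is an end of the induced paths 3-2-0 and 3-2-1
gdds = graphlet_degree_distributions(counts)
```

The first distribution, `gdds[0]`, is exactly the ordinary degree distribution, which is the sense in which GDDs generalize it.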

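To compare two networks G and H, the paper proceeds orbit by orbit. For orbit j, let $d_G^j(k)$ be the number of nodes in G that touch orbit j exactly k times. Each distribution is first scaled by $1/k$, which reduces the weight of large graphlet degrees, and then normalized so that its values sum to one, giving $N_G^j(k)$. The distance for orbit j is $D^j(G,H) = \frac{1}{\sqrt{2}} \big( \sum_{k \ge 1} [N_G^j(k) - N_H^j(k)]^2 \big)^{1/2}$, where the factor $1/\sqrt{2}$ keeps the distance in [0, 1], and the orbit agreement is $A^j(G,H) = 1 - D^j(G,H)$. The 73 orbit agreements are then aggregated into a single “agreement” score by taking their arithmetic or geometric mean. The agreement score lies in the interval [0, 1], where 1 means the two networks have identical GDDs.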
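The agreement computation can be sketched as follows. This is a minimal sketch of the paper's scheme as we understand it: scale each $d(k)$ by $1/k$, normalize it, take the Euclidean distance between the two normalized distributions divided by $\sqrt{2}$, subtract that distance from one, and average the per-orbit agreements. GDDs are assumed to be dictionaries mapping k to a node count, one per orbit, in the same orbit order for both networks. The function names and toy data are hypothetical.

```python
import math
from collections import Counter

def normalized_gdd(gdd):
    """Scale d(k) by 1/k, then normalize so the values sum to one (k >= 1 only)."""
    scaled = {k: count / k for k, count in gdd.items() if k > 0}
    total = sum(scaled.values())
    return {k: s / total for k, s in scaled.items()} if total > 0 else {}

def orbit_agreement(gdd_g, gdd_h):
    """One minus the Euclidean distance between normalized GDDs, divided by sqrt(2)."""
    ng, nh = normalized_gdd(gdd_g), normalized_gdd(gdd_h)
    sq = sum((ng.get(k, 0.0) - nh.get(k, 0.0)) ** 2 for k in set(ng) | set(nh))
    return 1.0 - math.sqrt(sq) / math.sqrt(2)

def gdd_agreement(gdds_g, gdds_h, mean="arithmetic"):
    """Combine per-orbit agreements (one GDD per orbit, same order in both lists)."""
    if len(gdds_g) != len(gdds_h):
        raise ValueError("both networks need one GDD per orbit")
    agreements = [orbit_agreement(g, h) for g, h in zip(gdds_g, gdds_h)]
    if mean == "arithmetic":
        return sum(agreements) / len(agreements)
    if mean == "geometric":
        return math.prod(agreements) ** (1.0 / len(agreements))
    raise ValueError("mean must be 'arithmetic' or 'geometric'")

# Toy GDDs for two orbits only (a real comparison would use all 73)
gdds_g = [Counter({1: 2, 2: 2}), Counter({1: 5})]
gdds_h = [Counter({1: 1}), Counter({2: 5})]
a_arith = gdd_agreement(gdds_g, gdds_h)
a_geo = gdd_agreement(gdds_g, gdds_h, mean="geometric")
```

In this toy case the second orbit has disjoint support in the two networks, so its agreement is 0. That leaves the arithmetic mean at 1/3 and drives the geometric mean to 0. This illustrates that the geometric mean is much harsher on a single badly matching orbit.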
