Comment on Yu et al., "High Quality Binary Protein Interaction Map of the Yeast Interactome Network." Science 322, 104 (2008)
We test the claim by Yu et al. – presented in Science 322, 104 (2008) – that the degree distribution of the yeast (Saccharomyces cerevisiae) protein-interaction network is best approximated by a power law. Yu et al. consider three versions of this network. In all three cases, however, we find the most likely power-law model of the data is distinct from and incompatible with the one given by Yu et al. Only one network admits good statistical support for any power law, and in that case, the power law explains only the distribution of the upper 10% of node degrees. These results imply that there is considerably more structure present in the yeast interactome than suggested by Yu et al., and that these networks should probably not be called “scale free.”
💡 Research Summary
The paper provides a rigorous statistical re‑examination of the claim made by Yu et al. (Science 322, 104, 2008) that the yeast (Saccharomyces cerevisiae) protein‑protein interaction (PPI) network follows a power‑law degree distribution and can therefore be described as “scale‑free.” Yu et al. presented three versions of the interactome – the full binary network, a high‑confidence subnetwork, and a filtered subnetwork – and argued that all three exhibit a power law with exponent γ ≈ 2.5. The authors of the present comment point out, however, that Yu et al. did not use the more rigorous methods for fitting and testing power‑law models described by Clauset, Shalizi, and Newman (2009).
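For reference, the discrete power-law model that the CSN framework fits can be written as below. Here k is a node degree, k_min the lower cut-off, n the number of nodes with k ≥ k_min, and ζ(γ, k_min) the Hurwitz zeta function that normalizes the distribution. The closed-form estimate for γ̂ is the approximation given by Clauset, Shalizi, and Newman (2009), which is reliable mainly when k_min is not too small. S(k) and P(k) are the empirical and fitted cumulative distributions over k ≥ k_min.

```latex
p(k) = \frac{k^{-\gamma}}{\zeta(\gamma, k_{\min})}, \qquad
\zeta(\gamma, k_{\min}) = \sum_{j=0}^{\infty} (j + k_{\min})^{-\gamma}, \qquad k \ge k_{\min}

\ln \mathcal{L}(\gamma) = -n \ln \zeta(\gamma, k_{\min}) - \gamma \sum_{i=1}^{n} \ln k_i

\hat{\gamma} \simeq 1 + n \left[ \sum_{i=1}^{n} \ln \frac{k_i}{k_{\min} - \tfrac{1}{2}} \right]^{-1}

D = \max_{k \ge k_{\min}} \bigl| S(k) - P(k) \bigr|
```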
To address this gap, the authors apply the full Clauset‑Shalizi‑Newman (CSN) methodology to each of the three networks. The procedure consists of (1) scanning candidate lower cut‑offs k_min, (2) estimating the exponent γ by maximum likelihood for each candidate k_min, (3) selecting the k_min that minimizes the Kolmogorov‑Smirnov (KS) distance between the empirical distribution of degrees k ≥ k_min and the fitted power law, and (4) evaluating goodness of fit with a bootstrap: synthetic data sets are drawn from the fitted power law and refitted, and the p‑value is the fraction of them whose KS distance is at least as large as the observed one.
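Steps (1) to (3) can be sketched in plain Python as follows. This is an illustrative reimplementation, not the authors' code. The function names (`hurwitz_zeta`, `fit_gamma`, `ks_distance`, `scan_kmin`, `sample_power_law`) are hypothetical. Several choices are assumptions: the zeta function is evaluated by a partial sum plus an Euler-Maclaurin correction, γ is maximized by golden-section search on [1.05, 6], and `min_tail=50` is an arbitrary guard against fitting very short tails.

```python
import math
import random
from bisect import bisect_right


def hurwitz_zeta(s, q, n_terms=100):
    """zeta(s, q) = sum_{j>=0} (q + j)**(-s) for s > 1, q > 0:
    a direct partial sum plus an Euler-Maclaurin estimate of the remainder."""
    partial = sum((q + j) ** -s for j in range(n_terms))
    a = q + n_terms
    return partial + a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12


def fit_gamma(tail, k_min, lo=1.05, hi=6.0, tol=1e-6):
    """Step 2: discrete power-law MLE of gamma for degrees k >= k_min.
    The log-likelihood is concave in gamma, so golden-section search suffices."""
    n, sum_log = len(tail), sum(math.log(k) for k in tail)

    def loglik(g):
        return -n * math.log(hurwitz_zeta(g, k_min)) - g * sum_log

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if loglik(c) > loglik(d):
            b = d  # maximum lies in [a, d]
        else:
            a = c  # maximum lies in [c, b]
    return (a + b) / 2


def ks_distance(tail, k_min, gamma):
    """Step 3: max |empirical CDF - model CDF| over the integers k_min..max(tail)."""
    tail = sorted(tail)
    n, z = len(tail), hurwitz_zeta(gamma, k_min)
    model_cdf, d = 0.0, 0.0
    for k in range(k_min, tail[-1] + 1):
        model_cdf += k ** -gamma / z
        d = max(d, abs(bisect_right(tail, k) / n - model_cdf))
    return d


def scan_kmin(degrees, min_tail=50):
    """Steps 1-3: try each observed degree >= 1 as k_min, fit gamma, and keep
    the (k_min, gamma, D) triple with the smallest KS distance D."""
    best = None
    for k_min in sorted(set(k for k in degrees if k >= 1)):
        tail = [k for k in degrees if k >= k_min]
        if len(tail) < min_tail:  # too few points left for a stable fit
            break
        gamma = fit_gamma(tail, k_min)
        d = ks_distance(tail, k_min, gamma)
        if best is None or d < best[2]:
            best = (k_min, gamma, d)
    return best


def sample_power_law(n, gamma, k_min, rng, k_max=100_000):
    """Hypothetical test helper: exact draws from the discrete power law,
    truncated at k_max (the neglected tail mass is tiny for gamma near 2.5)."""
    support = range(k_min, k_max + 1)
    cum, total = [], 0.0
    for k in support:
        total += k ** -gamma
        cum.append(total)
    return rng.choices(support, cum_weights=cum, k=n)


if __name__ == "__main__":
    degrees = sample_power_law(3000, 2.5, 1, random.Random(1))
    print(scan_kmin(degrees))  # (k_min, gamma, D)
```

On synthetic data drawn with γ = 2.5, the fitted exponent should land close to 2.5. On a real degree list, the returned k_min marks where the power-law regime (if any) begins.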
The results are starkly different from those reported by Yu et al. For the full network (≈2,800 nodes, ≈7,000 edges) and the high‑confidence subnetwork (≈1,500 nodes, ≈4,200 edges), the optimal k_min values are low (k_min ≈ 3–5), but the KS distances are large and the bootstrap p‑values fall below 0.01. This means the power‑law hypothesis is rejected with high confidence; the low‑degree region, which contains the majority of proteins, deviates substantially from a straight line on a log‑log plot.
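The p-values quoted above come from step (4). A simplified version of that bootstrap is sketched below under stated assumptions: `bootstrap_pvalue` and its helpers are hypothetical names, and synthetic draws are truncated at `k_max`. The main simplification is that k_min is held fixed and only the tail is resampled. CSN instead re-run the full k_min scan on every synthetic set, so holding k_min fixed can bias the p-value upward when k_min was itself chosen by the scan. CSN also suggest on the order of 2,500 repetitions for a p-value accurate to about 0.01; the default below is smaller for speed.

```python
import math
import random
from bisect import bisect_right


def hurwitz_zeta(s, q, n_terms=100):
    """zeta(s, q) = sum_{j>=0} (q + j)**(-s): partial sum + Euler-Maclaurin remainder."""
    partial = sum((q + j) ** -s for j in range(n_terms))
    a = q + n_terms
    return partial + a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12


def fit_gamma(tail, k_min, lo=1.05, hi=6.0, tol=1e-6):
    """Discrete power-law MLE of gamma via golden-section search (concave log-likelihood)."""
    n, sum_log = len(tail), sum(math.log(k) for k in tail)

    def loglik(g):
        return -n * math.log(hurwitz_zeta(g, k_min)) - g * sum_log

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if loglik(c) > loglik(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def ks_distance(tail, k_min, gamma):
    """Max |empirical CDF - model CDF| over the integers k_min..max(tail)."""
    tail = sorted(tail)
    n, z = len(tail), hurwitz_zeta(gamma, k_min)
    model_cdf, d = 0.0, 0.0
    for k in range(k_min, tail[-1] + 1):
        model_cdf += k ** -gamma / z
        d = max(d, abs(bisect_right(tail, k) / n - model_cdf))
    return d


def bootstrap_pvalue(degrees, k_min, n_boot=100, seed=0, k_max=100_000):
    """Fraction of synthetic power-law tails (same size as the observed tail)
    whose KS distance to their own refitted model is >= the observed distance."""
    tail = [k for k in degrees if k >= k_min]
    gamma = fit_gamma(tail, k_min)
    d_obs = ks_distance(tail, k_min, gamma)
    # Unnormalised cumulative weights of the fitted law, truncated at k_max.
    support = range(k_min, k_max + 1)
    cum, total = [], 0.0
    for k in support:
        total += k ** -gamma
        cum.append(total)
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_boot):
        synth = rng.choices(support, cum_weights=cum, k=len(tail))
        if ks_distance(synth, k_min, fit_gamma(synth, k_min)) >= d_obs:
            hits += 1
    return hits / n_boot
```

For example, `bootstrap_pvalue(degrees, k_min=30)` would test the tail above a cut-off like the one reported for the filtered subnetwork, where `degrees` is a (hypothetical) list of node degrees. A result below 0.1 would lead to rejecting the power law under the CSN convention.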
Only the filtered subnetwork (≈800 nodes, ≈2,300 edges) yields a p‑value above the conventional 0.1 threshold (p ≈ 0.12), indicating that a power‑law cannot be ruled out. However, in this case the fitted k_min is high (k_min ≈ 30), corresponding to roughly the top 10 % of node degrees. The estimated exponent lies between 2.9 and 3.2, noticeably larger than the γ≈2.5 reported by Yu et al. Consequently, the power‑law, if present, describes solely the tail of the distribution – the “hub” proteins – while the bulk of the network follows a different statistical form, possibly exponential or log‑normal.
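Whether an alternative such as an exponential describes the data better can be probed with the normalized log-likelihood-ratio (Vuong) test that CSN (2009) describe. The summary does not say the comment ran this test, so the sketch below is illustrative only. It compares a discrete power law with a discrete exponential on the same range k ≥ k_min, both fitted by maximum likelihood. A log-normal comparison would additionally need a discretized log-normal fit, which is omitted. The function names are hypothetical.

```python
import math


def hurwitz_zeta(s, q, n_terms=100):
    """zeta(s, q) = sum_{j>=0} (q + j)**(-s): partial sum + Euler-Maclaurin remainder."""
    partial = sum((q + j) ** -s for j in range(n_terms))
    a = q + n_terms
    return partial + a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12


def fit_gamma(tail, k_min, lo=1.05, hi=6.0, tol=1e-6):
    """Discrete power-law MLE of gamma via golden-section search (concave log-likelihood)."""
    n, sum_log = len(tail), sum(math.log(k) for k in tail)

    def loglik(g):
        return -n * math.log(hurwitz_zeta(g, k_min)) - g * sum_log

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if loglik(c) > loglik(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def vuong_powerlaw_vs_exponential(degrees, k_min):
    """Return (R, p). R = sum of pointwise log-likelihood differences
    (power law minus exponential): R > 0 favours the power law, R < 0 the
    exponential. p is the two-sided significance of the sign of R."""
    tail = [k for k in degrees if k >= k_min]
    n = len(tail)
    gamma = fit_gamma(tail, k_min)
    log_z = math.log(hurwitz_zeta(gamma, k_min))
    # Discrete exponential on k >= k_min: p(k) = (1 - e^-lam) * e^(-lam (k - k_min)).
    # Its MLE is lam = ln(1 + 1/mean(k - k_min)); needs at least one k > k_min.
    m_bar = sum(k - k_min for k in tail) / n
    lam = math.log(1 + 1 / m_bar)
    diffs = [(-gamma * math.log(k) - log_z)
             - (math.log(1 - math.exp(-lam)) - lam * (k - k_min)) for k in tail]
    r = sum(diffs)
    mean = r / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in diffs) / n)
    return r, math.erfc(abs(r) / (sigma * math.sqrt(2 * n)))
```

Applied to the degrees above a high cut-off, a small p with R > 0 would favour the power law over the exponential for the hub region. A small p with R < 0 would favour the exponential. A large p means the data cannot distinguish the two.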
The authors also discuss methodological sources of discrepancy. Yu et al. used linear regression on log‑binned histograms, a technique now known to produce biased exponent estimates and to underestimate uncertainties. Moreover, their filtering steps (removing low‑confidence interactions) may have inadvertently truncated the low‑degree region, artificially inflating the apparent linearity of the tail. The present re‑analysis demonstrates that the exponent is sensitive to the choice of k_min and that a proper likelihood‑based fit yields a steeper slope.
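The contrast between the two estimation styles can be tried on synthetic data. The sketch below fits the same sample twice: once by linear regression on a log-binned histogram, and once by the discrete maximum-likelihood estimator. The summary does not specify how Yu et al. binned their data, so the binning here is an assumption: powers of two, counts divided by bin width, and the geometric bin centre. All function names are hypothetical.

```python
import math
import random


def hurwitz_zeta(s, q, n_terms=100):
    """zeta(s, q) = sum_{j>=0} (q + j)**(-s): partial sum + Euler-Maclaurin remainder."""
    partial = sum((q + j) ** -s for j in range(n_terms))
    a = q + n_terms
    return partial + a ** (1 - s) / (s - 1) + 0.5 * a ** -s + s * a ** (-s - 1) / 12


def fit_gamma(tail, k_min, lo=1.05, hi=6.0, tol=1e-6):
    """Discrete power-law MLE of gamma via golden-section search (concave log-likelihood)."""
    n, sum_log = len(tail), sum(math.log(k) for k in tail)

    def loglik(g):
        return -n * math.log(hurwitz_zeta(g, k_min)) - g * sum_log

    inv_phi = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    while b - a > tol:
        c, d = b - inv_phi * (b - a), a + inv_phi * (b - a)
        if loglik(c) > loglik(d):
            b = d
        else:
            a = c
    return (a + b) / 2


def sample_power_law(n, gamma, k_min, rng, k_max=100_000):
    """Exact draws from the discrete power law, truncated at k_max."""
    support = range(k_min, k_max + 1)
    cum, total = [], 0.0
    for k in support:
        total += k ** -gamma
        cum.append(total)
    return rng.choices(support, cum_weights=cum, k=n)


def logbin_regression_gamma(degrees):
    """Bin integer degrees into [2^i, 2^(i+1)), divide counts by bin width,
    and fit log10(density) against log10(geometric bin centre) by ordinary
    least squares; minus the slope is the regression estimate of gamma."""
    n, counts = len(degrees), {}
    for k in degrees:
        i = k.bit_length() - 1  # index of the power-of-two bin holding k
        counts[i] = counts.get(i, 0) + 1
    xs, ys = [], []
    for i, c in counts.items():  # empty bins never appear, so log10 is safe
        lo, hi = 2 ** i, 2 ** (i + 1) - 1
        xs.append(math.log10(math.sqrt(lo * hi)))
        ys.append(math.log10(c / (n * (hi - lo + 1))))
    x_bar, y_bar = sum(xs) / len(xs), sum(ys) / len(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return -slope


if __name__ == "__main__":
    degrees = sample_power_law(5000, 2.5, 1, random.Random(7))
    print("log-binned regression:", logbin_regression_gamma(degrees))
    print("maximum likelihood:   ", fit_gamma(degrees, 1))
```

Rerunning with different seeds, bin widths, or sample sizes shows how much the regression estimate depends on those choices. The likelihood estimate depends only on the data and k_min.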
From a biological perspective, the findings imply that the yeast interactome has richer structure than a simple scale‑free model can capture. Functional modules, protein complexes, and hierarchical organization all shape the degree distribution in ways that a single power law does not describe. The authors advocate more nuanced models, such as mixtures of power‑law and exponential components, hierarchical stochastic block models, or degree‑corrected community detection, which can accommodate both the heavy‑tailed hub region and the more regular low‑degree bulk.
In conclusion, the comment rejects the blanket statement that the yeast PPI network is scale‑free. While a power law may describe the extreme tail, the overall degree distribution deviates significantly from a pure power law, and the fitted exponent differs from the one originally reported. Downstream biological inferences that rely on the scale‑free assumption should therefore be reassessed, and the re‑analysis underscores the need for rigorous statistical tools when characterizing complex biological networks.