Node similarity within subgraphs of protein interaction networks


📝 Original Info

  • Title: Node similarity within subgraphs of protein interaction networks
  • ArXiv ID: 0707.2076
  • Date: 2009-11-13
  • Authors: Orion Penner, Vishal Sood, Gabriel Musso, Kim Baskerville, Peter Grassberger, Maya Paczuski

📝 Abstract

We propose a biologically motivated quantity, twinness, to evaluate local similarity between nodes in a network. The twinness of a pair of nodes is the number of connected, labeled subgraphs of size n in which the two nodes possess identical neighbours. The graph animal algorithm is used to estimate twinness for each pair of nodes (for subgraph sizes n=4 to n=12) in four different protein interaction networks (PINs). These include an Escherichia coli PIN and three Saccharomyces cerevisiae PINs -- each obtained using state-of-the-art high throughput methods. In almost all cases, the average twinness of node pairs is vastly higher than expected from a null model obtained by switching links. For all n, we observe a difference in the ratio of type A twins (which are unlinked pairs) to type B twins (which are linked pairs) distinguishing the prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is expected due to gene duplication, and whole genome duplication paralogues in S. cerevisiae have been reported to co-cluster into the same complexes. Indeed, we find that these paralogous proteins are over-represented as twins compared to pairs chosen at random. These results indicate that twinness can detect ancestral relationships from currently available PIN data.
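To make the definition of twinness concrete, here is a minimal brute-force sketch in Python. It is not the graph animal algorithm used in the paper, which estimates twinness by sampling. It instead enumerates every node subset of size n exhaustively, so it is only practical for very small graphs. Two assumptions are made that the abstract does not spell out: subgraphs are taken to be induced (all links among the chosen nodes are included), and "identical neighbours" is read as equal neighbour sets inside the subgraph once the two nodes themselves are excluded. The function names are illustrative only.

```python
from collections import defaultdict
from itertools import combinations


def build_adjacency(edges):
    """Return an undirected adjacency map {node: set(neighbours)}."""
    adj = defaultdict(set)
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)
    return adj


def is_connected(nodes, adj):
    """Check that the subgraph induced on `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x] & nodes:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def twinness(edges, n):
    """Count, for each node pair, the connected induced subgraphs of size n
    in which both nodes have the same neighbours (assumed reading)."""
    adj = build_adjacency(edges)
    counts = defaultdict(int)
    for subset in combinations(sorted(adj), n):
        if not is_connected(subset, adj):
            continue
        members = set(subset)
        for u, v in combinations(subset, 2):
            # Neighbours inside the subgraph, ignoring the pair itself.
            nu = (adj[u] & members) - {v}
            nv = (adj[v] & members) - {u}
            if nu == nv:
                counts[(u, v)] += 1
    return dict(counts)


def twin_type(u, v, adj):
    """Type A twins are unlinked pairs, type B twins are linked pairs."""
    return "B" if v in adj[u] else "A"


if __name__ == "__main__":
    # A 4-cycle: opposite corners share both neighbours and are unlinked.
    ring = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "a")]
    print(twinness(ring, 4))  # {('a', 'c'): 1, ('b', 'd'): 1}
    print(twin_type("a", "c", build_adjacency(ring)))  # A
```

In the 4-cycle both twin pairs are type A, while every pair in a triangle is a type B twin for n = 3. The cost of exhaustive enumeration grows combinatorially with n, which is why a sampling method is needed at the network and subgraph sizes the paper studies.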


📄 Full Content

arXiv:0707.2076v2 [q-bio.MN] 17 Aug 2007

Node Similarity Within Subgraphs of Protein Interaction Networks

Orion Penner (1), Vishal Sood (1, 2), Gabriel Musso (3), Kim Baskerville (4), Peter Grassberger (1, 2), and Maya Paczuski (1)

(1) Complexity Science Group, University of Calgary, Calgary, Alberta T2N 1N4, Canada
(2) Institute for Biocomplexity and Informatics, University of Calgary, Calgary, Alberta T2N 1N4, Canada
(3) Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario M5S 3E1, Canada
(4) Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada

(Dated: November 13, 2018)

We propose a biologically motivated quantity, twinness, to evaluate local similarity between nodes in a network. The twinness of a pair of nodes is the number of connected, labeled subgraphs of size n in which the two nodes possess identical neighbours. The graph animal algorithm is used to estimate twinness for each pair of nodes (for subgraph sizes n = 4 to n = 12) in four different protein interaction networks (PINs). These include an Escherichia coli PIN and three Saccharomyces cerevisiae PINs – each obtained using state-of-the-art high throughput methods. In almost all cases, the average twinness of node pairs is vastly higher than expected from a null model obtained by switching links. For all n, we observe a difference in the ratio of type A twins (which are unlinked pairs) to type B twins (which are linked pairs) distinguishing the prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is expected due to gene duplication, and whole genome duplication paralogues in S. cerevisiae have been reported to co-cluster into the same complexes. Indeed, we find that these paralogous proteins are over-represented as twins compared to pairs chosen at random. These results indicate that twinness can detect ancestral relationships from currently available PIN data.

PACS numbers: 87.14.Ee, 02.70.Uu, 87.10.+e, 89.75.Fb, 89.75.Hc

I. INTRODUCTION

Proteins constitute the machinery that carries out cellular processes by forming stable or transitory complexes with each other – organized perhaps into a web of overlapping modules. Information about this complex system can be condensed into a protein interaction network (PIN), which is a graph where nodes are proteins and links are measured or inferred pairwise binding interactions in a cell.

Major efforts over the years devoted to resolving protein interactions have employed both small-scale and large-scale techniques. High throughput methods, such as yeast two hybrid (Y2H) and tandem affinity purification (TAP), have recently generated vast amounts of protein interaction data [1, 2, 3], allowing PINs from different organisms, experiments, research teams, etc. to be compared.

A basic statistical feature of any network is its degree distribution, P(k), for the number of links, k, connected to a node. In this respect, a variety of networks have been shown [4] to deviate decisively from a random graph, where the degree distribution is Poisson. In fact, early work suggested that degree distributions for PINs were power-law or scale-free [5]. However, as demonstrated in Fig. 1, degree distributions for recently obtained PINs are neither power-law nor particularly stable across different state-of-the-art constructions for the same organism – here the budding yeast S. cerevisiae. Note that all of the data sets studied here are based on the TAP-MS technique, except for Batada et al., which is a compilation of data obtained from a number of different techniques.

Despite the empirical inconsistency presented by PIN degree distributions, similar local structures can stand out when each network is compared to a null model where links are switched while retaining the original degree sequence [6, 7]. Subgraphs that are significantly over-abundant are referred to as motifs, while subgraphs that are significantly under-represented are referred to as anti-motifs [6].
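The link-switching null model mentioned above can be sketched as follows. This is a minimal illustration of a standard degree-preserving edge swap, assuming a simple undirected graph given as a list of node pairs. It is not taken from the paper, and the function names, the `seed` parameter, and the retry limit are hypothetical choices made for this example. Each accepted switch replaces links (a, b) and (c, d) with (a, d) and (c, b), which leaves every node's degree unchanged. Proposals that would create a self-link or a duplicate link are rejected.

```python
import random
from collections import Counter


def degree_sequence(edges):
    """Return {node: degree} for an undirected edge list."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return dict(deg)


def switch_links(edges, n_switches, seed=0, max_tries_factor=100):
    """Randomize a simple undirected graph by repeated link switching
    while keeping the degree of every node fixed."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    if len(edge_list) < 2:
        return edge_list
    edge_set = {frozenset(e) for e in edge_list}
    done = 0
    tries = 0
    max_tries = max_tries_factor * max(n_switches, 1)
    while done < n_switches and tries < max_tries:
        tries += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        # Randomly orient the second link so both rewirings are possible.
        if rng.random() < 0.5:
            c, d = d, c
        # Reject switches that would produce a self-link.
        if len({a, b, c, d}) < 4:
            continue
        # Reject switches that would duplicate an existing link.
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue
        edge_set.discard(frozenset((a, b)))
        edge_set.discard(frozenset((c, d)))
        edge_set.add(frozenset((a, d)))
        edge_set.add(frozenset((c, b)))
        edge_list[i] = (a, d)
        edge_list[j] = (c, b)
        done += 1
    return edge_list


if __name__ == "__main__":
    # Toy network: a ring of eight nodes plus two chords.
    toy = [(k, (k + 1) % 8) for k in range(8)] + [(0, 4), (2, 6)]
    randomized = switch_links(toy, n_switches=50, seed=1)
    print(degree_sequence(randomized) == degree_sequence(toy))
```

Counting a subgraph, or a quantity such as twinness, in many such randomized copies and comparing with the count in the original network is the usual way to judge whether it is over- or under-represented relative to the degree sequence alone.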
It has been reported that proteins within motifs are more conserved than other proteins [8]. In PINs, dense subgraphs containing 3 or 4 nodes are motifs, while tree-like subgraphs are anti-motifs [9].

Complementary to the search for motifs, graph clustering algorithms have been applied to identify components or complexes in PINs. By construction, these components tend to contain a high density of links but are weakly connected to the rest of the network (see e.g. Refs. [10, 11]). Complexes identified in this manner can contain up to 100 or more proteins, with on-going debate [11] as to their biological significance. However, biological processes such as signal transduction, cell-fate regulation, transcription, and translation typically involve a few tens of proteins. In previous work [10], mesoscale (5-25) protein clusters have also been identified using graph clustering algorithms. These clusters were matched with groups of proteins known to form complex macromolecular structures, or modules of proteins that participate i

…(Full text truncated)…

Reference

This content is AI-processed based on ArXiv data.
