
Node similarity within subgraphs of protein interaction networks

February 23, 2026

Reading time: 6 minutes


📝 Original Info

Title: Node similarity within subgraphs of protein interaction networks
ArXiv ID: 0707.2076
Date: 2009-11-13
Authors: Orion Penner, Vishal Sood, Gabriel Musso, Kim Baskerville, Peter Grassberger, Maya Paczuski

📝 Abstract

We propose a biologically motivated quantity, twinness, to evaluate local similarity between nodes in a network. The twinness of a pair of nodes is the number of connected, labeled subgraphs of size n in which the two nodes possess identical neighbours. The graph animal algorithm is used to estimate twinness for each pair of nodes (for subgraph sizes n=4 to n=12) in four different protein interaction networks (PINs). These include an Escherichia coli PIN and three Saccharomyces cerevisiae PINs -- each obtained using state-of-the-art high throughput methods. In almost all cases, the average twinness of node pairs is vastly higher than expected from a null model obtained by switching links. For all n, we observe a difference in the ratio of type A twins (which are unlinked pairs) to type B twins (which are linked pairs) distinguishing the prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is expected due to gene duplication, and whole genome duplication paralogues in S. cerevisiae have been reported to co-cluster into the same complexes. Indeed, we find that these paralogous proteins are over-represented as twins compared to pairs chosen at random. These results indicate that twinness can detect ancestral relationships from currently available PIN data.
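To make the definition of twinness concrete, here is a minimal brute-force sketch in Python. It is not the graph animal algorithm used in the paper, which estimates the counts by sampling. It simply enumerates every node subset of size n on a toy graph. It assumes that subgraphs are node-induced and that two nodes count as twins when their neighbour sets inside the subgraph agree after excluding each other; both conventions are assumptions of this sketch, as are the names `twinness`, `twin_type`, and `example_adj`.

```python
from itertools import combinations


def is_connected(nodes, adj):
    """Return True if the subgraph induced on `nodes` is connected."""
    nodes = set(nodes)
    start = next(iter(nodes))
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x] & nodes:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen == nodes


def twinness(adj, n):
    """Count, for each node pair, the connected induced subgraphs of
    size n in which the pair has identical neighbours (assumed
    convention: each node of the pair is excluded from the other's set)."""
    counts = {}
    for subset in combinations(sorted(adj), n):
        if not is_connected(subset, adj):
            continue
        members = set(subset)
        for u, v in combinations(subset, 2):
            if (adj[u] & members) - {v} == (adj[v] & members) - {u}:
                counts[(u, v)] = counts.get((u, v), 0) + 1
    return counts


def twin_type(adj, u, v):
    """Type A twins are unlinked pairs, type B twins are linked pairs."""
    return "B" if v in adj[u] else "A"


# Hypothetical toy network: a 4-cycle a-c-b-d plus a pendant node e on c.
example_adj = {
    "a": {"c", "d"},
    "b": {"c", "d"},
    "c": {"a", "b", "e"},
    "d": {"a", "b"},
    "e": {"c"},
}

print(twinness(example_adj, 4))
# {('a', 'b'): 2, ('c', 'd'): 1, ('a', 'e'): 1, ('b', 'e'): 1}
```

Exhaustive enumeration grows combinatorially with network size and n, so it is only practical for toy inputs like this one; for real PINs and n up to 12 a sampling method is needed.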


📄 Full Content

arXiv:0707.2076v2 [q-bio.MN] 17 Aug 2007

Node Similarity Within Subgraphs of Protein Interaction Networks

Orion Penner,1 Vishal Sood,1,2 Gabriel Musso,3 Kim Baskerville,4 Peter Grassberger,1,2 and Maya Paczuski1

1 Complexity Science Group, University of Calgary, Calgary, Alberta T2N 1N4, Canada
2 Institute for Biocomplexity and Informatics, University of Calgary, Calgary, Alberta T2N 1N4, Canada
3 Department of Medical Genetics and Microbiology, University of Toronto, Toronto, Ontario M5S 3E1, Canada
4 Perimeter Institute for Theoretical Physics, Waterloo, Ontario N2L 2Y5, Canada

(Dated: November 13, 2018)

We propose a biologically motivated quantity, twinness, to evaluate local similarity between nodes in a network. The twinness of a pair of nodes is the number of connected, labeled subgraphs of size n in which the two nodes possess identical neighbours. The graph animal algorithm is used to estimate twinness for each pair of nodes (for subgraph sizes n = 4 to n = 12) in four different protein interaction networks (PINs). These include an Escherichia coli PIN and three Saccharomyces cerevisiae PINs – each obtained using state-of-the-art high throughput methods. In almost all cases, the average twinness of node pairs is vastly higher than expected from a null model obtained by switching links. For all n, we observe a difference in the ratio of type A twins (which are unlinked pairs) to type B twins (which are linked pairs) distinguishing the prokaryote E. coli from the eukaryote S. cerevisiae. Interaction similarity is expected due to gene duplication, and whole genome duplication paralogues in S. cerevisiae have been reported to co-cluster into the same complexes. Indeed, we find that these paralogous proteins are over-represented as twins compared to pairs chosen at random. These results indicate that twinness can detect ancestral relationships from currently available PIN data.

PACS numbers: 87.14.Ee, 02.70.Uu, 87.10.+e, 89.75.Fb, 89.75.Hc

I. INTRODUCTION

Proteins constitute the machinery that carries out cellular processes by forming stable or transitory complexes with each other – organized perhaps into a web of overlapping modules. Information about this complex system can be condensed into a protein interaction network (PIN), which is a graph where nodes are proteins and links are measured or inferred pairwise binding interactions in a cell. Major efforts over the years devoted to resolving protein interactions have employed both small-scale and large-scale techniques. High throughput methods, such as yeast two hybrid (Y2H) and tandem affinity purification (TAP), have recently generated vast amounts of protein interaction data [1, 2, 3], allowing PINs from different organisms, experiments, research teams etc. to be compared.

A basic statistical feature of any network is its degree distribution, P(k), for the number of links, k, connected to a node. In this respect, a variety of networks have been shown [4] to deviate decisively from a random graph, where the degree distribution is Poisson. In fact, early work suggested that degree distributions for PINs were power-law or scale-free [5]. However, as demonstrated in Fig. 1, degree distributions for recently obtained PINs are neither power-law nor particularly stable across different state-of-the-art constructions for the same organism – here the budding yeast S. cerevisiae. Note that all of the data sets studied here are based on the TAP-MS technique, except for Batada et al., which is a compilation of data obtained from a number of different techniques.

Despite the empirical inconsistency presented by PIN degree distributions, similar local structures can stand out when each network is compared to a null model where links are switched while retaining the original degree sequence [6, 7]. Subgraphs that are significantly over-abundant are referred to as motifs, while subgraphs that are significantly under-represented are referred to as anti-motifs [6].
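The link-switching null model mentioned above can be sketched in a few lines of Python. This is a generic degree-preserving edge swap written for illustration, not the authors' implementation: the function names, the rejection rules for self-loops and duplicate links, and the `n_swaps` and `seed` parameters are all assumptions of the sketch.

```python
import random


def degree_sequence(edges):
    """Map each node to its number of links."""
    deg = {}
    for a, b in edges:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg


def switch_links(edges, n_swaps, seed=0):
    """Randomize an undirected simple graph by repeatedly replacing two
    links a-b and c-d with a-d and c-b, which keeps every degree fixed.
    Swaps that would create a self-loop or a duplicate link are skipped."""
    rng = random.Random(seed)
    edge_list = [tuple(e) for e in edges]
    edge_set = {frozenset(e) for e in edge_list}
    done, attempts = 0, 0
    max_attempts = 100 * n_swaps  # guard against graphs with no valid swap
    while done < n_swaps and attempts < max_attempts:
        attempts += 1
        i, j = rng.sample(range(len(edge_list)), 2)
        a, b = edge_list[i]
        c, d = edge_list[j]
        if rng.random() < 0.5:  # pick one of the two possible rewirings
            c, d = d, c
        if len({a, b, c, d}) < 4:
            continue  # the two links share a node
        if frozenset((a, d)) in edge_set or frozenset((c, b)) in edge_set:
            continue  # a new link would duplicate an existing one
        edge_set -= {frozenset((a, b)), frozenset((c, d))}
        edge_set |= {frozenset((a, d)), frozenset((c, b))}
        edge_list[i] = (a, d)
        edge_list[j] = (c, b)
        done += 1
    return edge_list


# Hypothetical toy input: a ring of 8 nodes, every node has degree 2.
ring = [(i, (i + 1) % 8) for i in range(8)]
shuffled = switch_links(ring, n_swaps=20, seed=1)
print(degree_sequence(shuffled) == degree_sequence(ring))
```

Counting a subgraph, or a quantity such as twinness, in many such switched networks gives the null expectation against which the count in the real network can be compared.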
It has been reported that proteins within motifs are more conserved than other proteins [8]. In PINs, dense subgraphs containing 3 or 4 nodes are motifs, while tree-like subgraphs are anti-motifs [9].

Complementary to the search for motifs, graph clustering algorithms have been applied to identify components or complexes in PINs. By construction, these components tend to contain a high density of links but are weakly connected to the rest of the network (see e.g. Refs. [10, 11]). Complexes identified in this manner can contain up to 100 or more proteins, with on-going debate [11] as to their biological significance. However, biological processes such as signal transduction, cell-fate regulation, transcription, and translation typically involve a few tens of proteins. In previous work [10] mesoscale (5-25) protein clusters have also been identified using graph clustering algorithms. These clusters were matched with groups of proteins known to form complex macromolecular structures, or modules of proteins that participate i

…(Full text truncated)…

📄 Read Full PDF on ArXiv

Reference

This content is AI-processed based on ArXiv data.


