From Local Measurements to Network Spectral Properties: Beyond Degree Distributions

It is well-known that the behavior of many dynamical processes running on networks is intimately related to the eigenvalue spectrum of the network. In this paper, we address the problem of inferring global information regarding the eigenvalue spectru…

Authors: Victor M. Preciado, Ali Jadbabaie

Research in complex networks has important applications in today's massive networked systems, including the Internet, the World-Wide Web (WWW), as well as social, biological and chemical networks [1]. The availability of massive databases and reliable tools for data analysis provides a powerful framework to explore structural properties of large-scale networks [2]. In many real-world cases, it is impossible to efficiently retrieve and/or store the exact structure of a complex network due to, for example, a prohibitively large network size or privacy/security concerns. On the other hand, it is often possible to gather a great deal of information by examining local samples of the graph topology, such as the degree distribution, which are usually easy to collect. In this context, a challenging problem is to find relationships between the set of local samples of the network structure and its global functionality.

Many dynamical processes on networks, such as random walks [3], virus/rumor spreading [4], [5], and synchronization of oscillators [6], [7], are interesting to study in the context of large-scale complex networks. The behavior of many of these processes is intimately related to the eigenvalue spectrum of the underlying graph structure [8], [9]. For example, the speed of spreading of a virus is directly related to the spectral radius of the adjacency matrix of the network [4]. Hence, spectral graph theory provides us with a framework to study the relationship between the network structure and its dynamical behavior.

Network motifs are small subgraphs that are present in a network with a much higher frequency than in random networks with the same degree sequence [10]. There is both empirical and theoretical evidence showing that these subgraphs play a key role in the network's function and organization [11].
One of the main objectives of this paper is to explicitly relate global properties of a given network to the presence of certain small subgraphs that can be counted via local measurements. We focus our attention on global properties related to the network's eigenvalues, in particular, the so-called spectral moments. Since the spectral properties are known to have a direct influence on the network's dynamical behavior, our result builds a bridge between local network measurements (i.e., the presence of small subgraphs) and global dynamical behavior (via the spectral moments). It is also worth pointing out that, although a set of spectral moments is not enough to completely describe the spectral distribution of a network, it allows us to extract a great deal of information. For example, Popescu and Bertsimas provide in [12] an optimization framework for computing optimal bounds on the properties of a distribution from moment constraints. More generally, there is a variety of techniques that can be applied to extract spectral information from a truncated sequence of spectral moments [8].

Based on our results, we also propose a novel decentralized algorithm to efficiently aggregate a set of local network measurements into global spectral moments. Our work is related to [13], where a fully distributed algorithm is proposed to compute the full set of eigenvalues and eigenvectors of a matrix representing the network topology. In contrast, our approach is computationally much cheaper, since it does not require a complete eigenvalue decomposition. Furthermore, our approach also provides a clearer view of the role of certain subgraphs in the network's dynamical behavior.

The rest of this paper is organized as follows. In the next section, we review graph-theoretical terminology and introduce definitions needed in our derivations. In Section III, we derive explicit relationships between the moments of the eigenvalue spectrum and local network measurements.
Based on these expressions, we introduce a distributed algorithm to compute these moments in Section IV. We conclude the paper by mentioning some future work.

Let $G = (V, E)$ denote a simple undirected graph (with no self-loops) on $n$ nodes, where $V(G) = \{v_1, \ldots, v_n\}$ is the set of nodes and $E(G)$ is the set of edges. If $\{v_i, v_j\} \in E(G)$, we call nodes $v_i$ and $v_j$ adjacent (or neighbors), which we denote by $v_i \sim v_j$. The set of all nodes adjacent to a node $v \in V(G)$ constitutes the neighborhood of node $v$, defined by $N_v = \{w \in V(G) : \{v, w\} \in E(G)\}$, and the number of those neighbors is called the degree of node $v$, denoted by $\deg v$ or $d_v$. We define a walk of length $k$ from $v_0$ to $v_k$ to be an ordered sequence of nodes $(v_0, v_1, \ldots, v_k)$ such that $v_i \sim v_{i+1}$ for $i = 0, 1, \ldots, k-1$. If $v_0 = v_k$, then the walk is closed. A closed walk with no repeated nodes (with the exception of the first and last nodes) is called a cycle. Triangles and quadrangles are cycles of length three and four, respectively.

We say that a graph $G$ is connected if there exists a walk between every pair of nodes. Let $d(v, w)$ denote the distance between two nodes $v$ and $w$, i.e., the minimum length of a walk from $v$ to $w$. We define the diameter of a graph, denoted by $\mathrm{diam}(G)$, as the maximum distance between any pair of nodes in $G$. We say that $v$ and $w$ are $k$-th order neighbors if $d(v, w) = k$, and define the $k$-th order neighborhood of a node $v$ as the set of nodes within a distance $k$ from $v$, i.e., $N_v^k = \{w \in V(G) : d(v, w) \le k\}$. A $k$-th order neighborhood induces a subgraph $G_v^k \subseteq G$ with node-set $N_v^k$ and edge-set $E_v^k$ defined as the subset of edges of $E(G)$ that connect two nodes in $N_v^k$.

Graphs can be algebraically represented via the adjacency matrix. The adjacency matrix of an undirected graph $G$, denoted by $A_G = [a_{ij}]$, is an $n \times n$ symmetric matrix defined entry-wise as $a_{ij} = 1$ if nodes $v_i$ and $v_j$ are adjacent, and $a_{ij} = 0$ otherwise¹.

In this section, we derive an explicit relationship between the spectral moments of the adjacency matrix and the presence of certain subgraphs in $G$. We say that a graph $H$ is embedded in $G$ if $H$ is isomorphic² to a subgraph in $G$.
The embedding frequency of $H$ in $G$, denoted by $F(H, G)$, is the number of different subgraphs of $G$ to which $H$ is isomorphic. The term network motif is used to designate those subgraphs of $G$ that occur with embedding frequencies far higher than in random networks with the same degree sequence [10]. Theoretical and experimental evidence shows that some of these motifs carry significant information about the network's function and organization [11]. In this section, we derive an explicit expression for the spectral moments as a linear combination of the embedding frequencies of certain subgraphs. Our results provide a direct relationship between the presence of network motifs and global properties of the network, in particular, spectral moments. In the coming subsections, we first provide a theoretical foundation for our analysis. Second, we derive explicit expressions for the spectral moments in terms of network metrics, such as the degree sequence or the number of triangles in the graph. In the next section, we propose a decentralized algorithm to compute the spectral moments of a network based on decentralized subgraph counting.

Algebraic graph theory provides us with the tools to relate topological properties of a graph to its spectral properties. In particular, we are interested in studying the spectral moments of the adjacency matrix of a given graph. We denote by $\{\lambda_i\}_{i=1}^{n}$ the set of eigenvalues of $A_G$ and define the $k$-th spectral moment of the adjacency matrix as
$$m_k(A_G) = \frac{1}{n} \sum_{i=1}^{n} \lambda_i^k.$$
From algebraic graph theory, we have the following result relating the $k$-th spectral moment of a graph $G$ to the number of closed walks of length $k$ in $G$ [14]:

Lemma 1: Let $G$ be a simple graph. The $k$-th spectral moment of the adjacency matrix of $G$ can be written as
$$m_k(A_G) = \frac{1}{n} \left| \Psi_G^{(k)} \right|, \tag{2}$$
where $\Psi_G^{(k)}$ is the set of all possible closed walks of length $k$ in $G$³.

Corollary 2: Let $G$ be a simple graph. Denote by $E_G$ and $\Delta_G$ the number of edges and triangles in $G$, respectively.
Then,
$$m_1(A_G) = 0, \qquad m_2(A_G) = \frac{2 E_G}{n}, \qquad m_3(A_G) = \frac{6 \Delta_G}{n}.$$

In the following, we build on the above Lemma to derive an expression for $m_k(A_G)$, for any $k$, as a linear combination of the embedding frequencies of certain subgraphs. Although the first 3 moments have very simple expressions in Corollary 2, the larger the $k$, the more involved those expressions become. In what follows, we describe a systematic procedure to derive these expressions efficiently.

First, we need to introduce some nomenclature. Given a walk of length $k$ in $G$, $w = (v_0, v_1, \ldots, v_k)$, we denote its node-set as $V(w) = \{v_0, \ldots, v_k\}$ and its edge-set as $E(w) = \{\{v_0, v_1\}, \ldots, \{v_{k-1}, v_k\}\}$. Hence, we define the underlying simple graph of a walk $w$ as $H(w) = (V(w), E(w))$. We say that a simple subgraph $h \subseteq G$ spans the walk $w$ if $h$ is isomorphic to $H(w)$. Notice how different walks can share the same underlying simple graph. We also denote by $I(h)$ the unlabeled simple graph isomorphic to a given graph $h$. Applying the function $I$ to the underlying graph of a given walk $w$, we obtain an unlabeled graph $g = I(H(w))$. Notice how different walks, not even sharing the same edge-set, can share the same unlabeled underlying simple graph. For example, any closed triangular walk in $G$ has the triangle as its unlabeled underlying graph. Furthermore, we define the set of unlabeled graphs associated with closed walks of length $k$ as $\mathcal{I}_k = \{I(H(w)) : w \in \Psi_G^{(k)}\}$. In what follows, we derive an expression for the $k$-th spectral moment as a linear combination of the embedding frequencies of the (unlabeled) graphs in $\mathcal{I}_k$. Note that the mapping $H$ : […]. This observation shall be important in the next section, where we design a decentralized algorithm to compute spectral moments. Based on the above, we derive the following result regarding the spectral moments of $G$:

Theorem 3: Given a simple graph $G$, its $k$-th spectral moment can be written as
$$m_k(A_G) = \frac{1}{n} \sum_{g \in \mathcal{I}_k} \omega_g^{(k)} F(g, G), \tag{4}$$
where $\omega_g^{(k)}$ is the number of all possible closed walks of length $k$ in $g$ whose underlying unlabeled graph is isomorphic to $g$.

Proof: Consider the composition function $I \circ H$, which maps each closed walk $w \in \Psi_G^{(k)}$ to its unlabeled underlying graph and induces a partition of $\Psi_G^{(k)}$ as follows.
Given an unlabeled graph $g \in \mathcal{I}_k$, we define a partition subset $P_g = \{w \in \Psi_G^{(k)} : I(H(w)) = g\}$, so that $|\Psi_G^{(k)}| = \sum_{g \in \mathcal{I}_k} |P_g|$. Note that, for a particular (labeled) graph $h \in \mathcal{H}_k$, the number of closed walks of length $k$ such that their underlying simple graph is $h$, i.e., $|R_h|$, is a quantity that depends exclusively on the topology of the unlabeled version of $h$, i.e., $g = I(h)$. We define this quantity as $\omega_g^{(k)} = |R_h|$. Hence, we can rewrite the above as
$$\left| \Psi_G^{(k)} \right| = \sum_{g \in \mathcal{I}_k} \; \sum_{h \in \mathcal{H}_k : I(h) = g} |R_h| = \sum_{g \in \mathcal{I}_k} \omega_g^{(k)} F(g, G),$$
where we have used the fact that $|\{h \in \mathcal{H}_k : I(h) = g\}| = F(g, G)$ in the last equality. Therefore, substituting the above in (2), we obtain the expression for the spectral moments in the statement of the theorem.

The expression on the right-hand side of (4) is a linear combination of the embedding frequencies of the set of subgraphs $g \in \mathcal{I}_k$. The coefficients $\omega_g^{(k)}$ in this linear combination are defined as the number of closed walks of length $k$ with unlabeled underlying graph $g$. In what follows, we describe an algorithm to determine the set $\mathcal{I}_k$ and compute $\omega_g^{(k)}$.

In order to compute the first $K$ spectral moments, we have to study the sets of closed walks $\Psi_G^{(k)}$, for $k \le K$. For each particular value of $k$, the elements of $\mathcal{I}_k$ are unlabeled connected graphs with at most $k$ edges. Also, the maximum number of nodes visited by walks in $\Psi_G^{(k)}$ is equal to $k$. We denote by $\mathcal{G}_k$ the set of all (unlabeled) nonisomorphic connected graphs with at most $k$ nodes and at most $k$ edges⁴; hence, $\mathcal{I}_k \subseteq \mathcal{G}_k$. For example, we illustrate all the graphs in $\mathcal{G}_4$ (except the isolated-node graph) in Fig. 1.

In order to compute the $k$-th moment via (4), we need to compute $\omega_g^{(k)}$ for all $g \in \mathcal{I}_k$, where $\omega_g^{(k)}$ can be interpreted as the number of closed walks of length $k$ in $g$ that visit all the nodes and edges of $g$ (hence, whose underlying simple graph is $g$). The computational complexity of computing $\omega_g^{(k)}$ is the same as that of counting the number of Eulerian walks in an undirected (multi)graph. This counting problem is known to be #P-complete [15].
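For small subgraphs, the coefficient described above (the number of closed walks of length k in g that traverse every node and edge of g) can be obtained by direct enumeration. The following is a minimal Python sketch of such a brute-force count, not the authors' implementation; the function name `omega` and the edge-list input format are assumptions made for illustration, and g is assumed to be connected with at least one edge.

```python
def omega(edges, k):
    """Count closed walks of length k on the small graph given by `edges`
    that traverse every edge (and therefore every node) at least once."""
    edge_set = {frozenset(e) for e in edges}
    nodes = sorted({v for e in edges for v in e})
    adj = {v: [] for v in nodes}
    for a, b in edges:
        adj[a].append(b)
        adj[b].append(a)

    def extend(current, start, steps_left, used):
        # `used` is the set of edges traversed so far by this partial walk.
        if steps_left == 0:
            # Keep the walk only if it is closed and covers all edges of g.
            return int(current == start and used == edge_set)
        total = 0
        for nxt in adj[current]:
            step = frozenset((current, nxt))
            total += extend(nxt, start, steps_left - 1, used | {step})
        return total

    # Walks are rooted: each starting node is counted separately.
    return sum(extend(v, v, k, frozenset()) for v in nodes)


if __name__ == "__main__":
    wedge = [(0, 1), (1, 2)]
    four_cycle = [(0, 1), (1, 2), (2, 3), (3, 0)]
    print(omega([(0, 1)], 4), omega(wedge, 4), omega(four_cycle, 4))
```

The enumeration visits every walk of length k, so its cost grows exponentially with k; this is acceptable only because the subgraphs of interest are small. A graph on which no closed walk of length k covers all edges, such as a triangle with k = 4, yields a count of zero.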
Since this counting problem is #P-complete, no closed-form expression for $\omega_g^{(k)}$ is available in general, and it has to be computed via a brute-force combinatorial exploration over all the possible closed walks of length $k$ visiting all the nodes and edges of the subgraph $g$. This exploration can be performed in reasonable time for subgraphs $g$ of small and medium size, which are the ones we are interested in. Although computationally expensive, this computation is done once and for all for a particular $g$. In other words, once $\omega_g^{(k)}$ is computed, the coefficients for the spectral moment in (4) are known for any given $G$. For convenience, we provide the coefficients $\omega_g^{(k)}$ for all the graphs in $\mathcal{G}_4$ in Fig. 1. Using this table as an example, we also observe that those graphs $t \in \mathcal{G}_k$ such that $t \notin \mathcal{I}_k$ have $\omega_t^{(k)} = 0$ ($\omega_t^{(k)}$ being zero indicates that there is no walk in $t$ satisfying the conditions to be part of the set $\mathcal{I}_k$). For convenience, we also provide a list of the graphs in $\cup_{k=5,6,7} \mathcal{I}_k$ and the associated coefficients $\omega_g^{(k)}$ in Fig. 2 (where we have left blank those cells where $\omega_g^{(k)} = 0$).

In the following paragraphs, we illustrate the usage of Theorem 3 to derive explicit expressions for the first 5 spectral moments of a given graph $G$. From the subgraphs and coefficients in Fig. 1, we can compute the 4th spectral moment via (4) as follows:
$$m_4(A_G) = \frac{1}{n} \left[ 2 E_G + 4 \Lambda_G + 8 \Phi_G \right], \tag{5}$$
where $E_G$ is the number of edges, $\Lambda_G$ is the number of wedge-graphs⁵ in $G$, and $\Phi_G$ is the number of 4-cycles in $G$ (see subgraphs and coefficients in Fig. 1). Thus, in order to compute the 4th spectral moment, we must be able to count the number of wedge-graphs and 4-cycles in $G$. Although the number of edges and cycles in a graph are common network metrics, the number of wedges is not. It is then convenient to rewrite the number of wedges in terms of more familiar network metrics. In fact, we can rewrite the number of wedges in terms of the degree sequence of the graph as follows:
$$\Lambda_G = \sum_{v=1}^{n} \binom{\deg v}{2} = \frac{1}{2} \left( W_2 - W_1 \right),$$
where $W_r = \sum_{v=1}^{n} (\deg v)^r$ is the $r$-th power-sum of the degree sequence $\{\deg v\}_{v=1}^{n}$.
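The relation between the 4th moment and the counts of edges, wedges and 4-cycles can be checked numerically on small graphs. The sketch below is an illustration under stated assumptions rather than code from the paper: the graph is a dictionary mapping each node to the set of its neighbors, the helper names (`ring`, `fourth_moment_from_counts`, `fourth_moment_from_walks`) are hypothetical, and the coefficients 2, 4 and 8 are the closed-walk counts of length 4 on an edge, a wedge and a 4-cycle.

```python
from itertools import combinations


def ring(n):
    """Adjacency sets of the n-ring graph (n >= 3)."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def fourth_moment_from_counts(adj):
    """m_4 = (2 E + 4 Lambda + 8 Phi) / n from subgraph counts."""
    n = len(adj)
    w1 = sum(len(nb) for nb in adj.values())       # W_1: sum of degrees
    w2 = sum(len(nb) ** 2 for nb in adj.values())  # W_2: sum of squared degrees
    edges = w1 // 2
    wedges = (w2 - w1) // 2                        # equals sum_v C(deg v, 2)
    # A 4-cycle has two pairs of opposite nodes; a pair (a, b) with c common
    # neighbors is opposite in C(c, 2) cycles, so every cycle is seen twice.
    doubled = 0
    for a, b in combinations(adj, 2):
        c = len(adj[a] & adj[b])
        doubled += c * (c - 1) // 2
    four_cycles = doubled // 2
    return (2 * edges + 4 * wedges + 8 * four_cycles) / n


def fourth_moment_from_walks(adj):
    """m_4 as (number of closed walks of length 4) / n, using
    trace(A^4) = sum_{u,v} ((A^2)_{uv})^2 and (A^2)_{uv} = |N(u) & N(v)|."""
    total = sum(len(adj[u] & adj[v]) ** 2 for u in adj for v in adj)
    return total / len(adj)


if __name__ == "__main__":
    for n in (4, 6, 10):
        g = ring(n)
        print(n, fourth_moment_from_counts(g), fourth_moment_from_walks(g))
```

Both functions should agree on any simple graph; the second one counts closed walks directly, so it serves as an independent check of the subgraph-based expression.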
Since $E_G = \frac{1}{2} \sum_{v=1}^{n} \deg v = \frac{1}{2} W_1$ in a simple graph, we can write (5) in terms of power-sums and the number of 4-cycles as
$$m_4(A_G) = \frac{1}{n} \left[ 2 W_2 - W_1 + 8 \Phi_G \right].$$
⁵ A wedge graph is isomorphic to a chain graph of length 2.

We illustrate this result in the following example.

Example 4: Consider the $n$-ring graph $R_n$ (without self-loops). We know that the eigenvalues of the adjacency matrix of the ring graph $A_{R_n}$ are $\{2 \cos(2\pi i / n)\}_{i=0}^{n-1}$. Hence, the 4th moment is given by the summation $m_4(A_{R_n}) = \frac{1}{n} \sum_{i=0}^{n-1} \left( 2 \cos \frac{2\pi i}{n} \right)^4$, which (after some computations) can be found to be equal to 6 for $n \notin \{2, 4\}$. We can apply (5) to easily reach the same result without performing an eigenvalue decomposition, as follows. In the ring graph, we have $E_{R_n} = n$ and $\Lambda_{R_n} = n$, with $\Phi_{R_n} = 0$ for $n \neq 4$ and $\Phi_{R_4} = 1$. Hence, we obtain the same value for $m_4(A_{R_n})$ directly from (5); for $n = 4$, the single 4-cycle raises the moment to 8.

From the subgraphs in Fig. 2, and the row of coefficients $\omega_g^{(5)}$, we have the following expression for the 5th spectral moment via (4):
$$m_5(A_G) = \frac{1}{n} \left[ 30 \Delta_G + 10 \Upsilon_G + 10 \Phi_G^{(5)} \right],$$
where $\Delta_G$ and $\Phi_G^{(5)}$ are the number of triangles and 5-cycles in $G$, respectively. Also, we represent by $\Upsilon_G$ the number of subgraphs isomorphic to the sixth subgraph in the top row of Fig. 2, counting from the left. Analyzing the structure of this subgraph, we have that $\Upsilon_G = \sum_{i=1}^{n} t_i (d_i - 2)$, where $t_i$ is the number of triangles in $G$ touching node $i$. Hence, defining the clustering-degree correlation coefficient as $C_{\Delta d} = \sum_{i=1}^{n} t_i d_i$, and using $\sum_{i} t_i = 3 \Delta_G$, we can write $\Upsilon_G = C_{\Delta d} - 6 \Delta_G$, and the 5th spectral moment as:
$$m_5(A_G) = \frac{1}{n} \left[ 10 \Phi_G^{(5)} - 30 \Delta_G + 10\, C_{\Delta d} \right].$$
Notice that, apart from triangles and 5-cycles, the clustering-degree correlation influences the 5th moment; hence, nontrivial variations of the local clustering with the degree, as reported in [16], are relevant in the computation of the 5th spectral moment.

Remark 5: The main advantage of (4) in Theorem 3 may not be apparent in graphs with simple, regular structure. For these graphs, an explicit eigenvalue decomposition is available and there may be no need to look for alternative ways to compute spectral moments.
On the other hand, in the case of large-scale complex networks, the structure of the network is usually very intricate, in many cases not even known exactly, and an explicit eigenvalue decomposition can be very challenging, if not impossible. It is in these cases that the alternative approach proposed in this paper is most useful. As we show in the next section, the spectral moments can be efficiently computed via a distributed approach from aggregation of local samples of the graph topology, without knowing the complete structure of the network.

According to (4), the $k$-th spectral moment is equal to a linear combination of the embedding frequencies $F(g, G)$ for $g \in \mathcal{I}_k$. On the other hand, computing the embedding frequencies in large-scale networks can be challenging in a centralized framework, since the computational cost of counting subgraphs grows rapidly with the network size. In this section, we introduce an efficient decentralized approach to compute the embedding frequencies of subgraphs from local samples of the network topology. Throughout this section, we assume that there is an agent in each node $v \in V(G)$ that is able to access its $r$-th order neighborhood, $G_v^r$. A naive approach to compute the embedding frequency of a particular subgraph $g$ would be to compute the embedding frequency of $g$ in each neighborhood, $F(g, G_v^r)$, and sum them up. This approach does not work because a particular subgraph $g \subseteq G$ can lie in the intersection of multiple neighborhoods; hence, that subgraph would be counted multiple times. In what follows, we propose a decentralized counting procedure that allows us to know how many times a particular subgraph is counted. First, we need to introduce several definitions:

Definition 6: We say that a graphical property $P_G$ is locally measurable within a radius $r$ around a node $v$ if $P_G$ is a function of $N_v^r$, i.e., $P_G = f(N_v^r)$.
Definition 7: We say that a subgraph $h \subseteq G$ is locally countable within a radius $r$ around a node $v$ if $h \subseteq G_v^r$.

For example, both edges and triangles touching a node $v$ are locally countable within a radius 1. Also, the number of quadrangles touching $v$ is locally countable within a radius 2. Furthermore, a wedge is locally countable within a radius 1 only by the node at the center of the wedge, but it is not countable by the nodes at the extremes of the wedge. On the other hand, the wedge is locally countable within a radius 2 by all the nodes in the wedge. In these examples, we observe that not all the nodes of a subgraph $h \subseteq G$ need to be able to locally count the subgraph. In particular, if the radius $r$ is smaller than $\lceil \mathrm{diam}(h)/2 \rceil$, none of the nodes in $h$ is able to count the subgraph locally. On the contrary, if the radius $r$ is greater than or equal to the diameter of the subgraph, all the nodes in $h$ are able to locally count it. In the middle range, $\lceil \mathrm{diam}(h)/2 \rceil \le r \le \mathrm{diam}(h)$, some nodes are able to locally count $h$ and some are not.

Definition 8: For a given value of $r$, the set of detector nodes of a given subgraph $h \subseteq G$ is defined as
$$D_h^{(r)} = \{ v \in V(h) : h \subseteq G_v^r \}.$$

Note that, although there can be other nodes $u \notin V(h)$ able to locally count $h$, we do not include them in the set of detector nodes $D_h^{(r)}$. Also note that, given an unlabeled graph $g$, the size of $D_{h_i}^{(r)}$ for all the subgraphs $h_i \subseteq G$ isomorphic to $g$ depends exclusively on the structure of $g$. In other words, we have that $|D_{h_i}^{(r)}| = D_g^{(r)}$ for all $i = 1, \ldots, F(g, G)$. For a given $g$, the value of $D_g^{(r)}$ can be algorithmically computed as follows: […]. For convenience, we include the values of $D_g^{(r)}$ for $g \in \cup_{k=4,5,6} \mathcal{I}_k$ for radius $r = 1$, 2, and 3 in Fig. 4.

After these preliminary results, we now describe a novel algorithm to compute $F(g, G)$ in a distributed manner. Let us consider a particular node $v \in V(G)$.
We define the local embedding frequency of subgraph $g$ within a neighborhood of radius $r$ around node $v$, denoted by $H(g, N_v^r)$, as the number of different subgraphs $l_i \subseteq N_v^r$ isomorphic to $g$ such that $v \in V(l_i)$. Our distributed algorithm is based on the following result:

Theorem 9: Let $G$ be a simple graph. The spectral moments of the adjacency matrix of $G$ can be written as
$$m_k(A_G) = \frac{1}{n} \sum_{v \in V(G)} \sum_{g \in \mathcal{I}_k} \frac{\omega_g^{(k)}}{D_g^{(r)}} H(g, N_v^r). \tag{9}$$

Proof: Note that the local embedding frequency only counts subgraphs $l_i$ touching node $v$; hence, $H(g, N_v^r) \le F(g, N_v^r)$. Furthermore, we have that
$$F(g, G) = \frac{1}{D_g^{(r)}} \sum_{v \in V(G)} H(g, N_v^r), \tag{10}$$
since every graph $h \subseteq G$ isomorphic to $g$ is counted $D_g^{(r)}$ times in the above summation. Therefore, substituting (10) in (4), we obtain the statement of the Theorem.

Based on Theorem 9, it is straightforward to compute the $k$-th spectral moment in a distributed manner as follows. First, define the following local variables:
$$\mu_k^r(v) = \sum_{g \in \mathcal{I}_k} \frac{\omega_g^{(k)}}{D_g^{(r)}} H(g, N_v^r).$$
Note that $\mu_k^r(v)$ is locally measurable within a radius $r$ around each node $v$; namely, each agent is able to compute $\mu_k^r(v)$ locally, for all $v \in V(G)$. Thus, from (9), we have that the $k$-th spectral moment, $m_k(A_G) = \frac{1}{n} \sum_{v \in V(G)} \mu_k^r(v)$, can be computed via a simple distributed averaging of $\mu_k^r(v)$.

Remark 10: The maximum order of the spectral moment that can be computed via this distributed approach depends on the radius $r$ of the neighborhood that each agent can reach. In particular, in order to compute the $k$-th moment, we should be able to detect all the graphs $g \in \mathcal{I}_k$. It is easy to prove that the $k$-ring, with diameter $\lfloor k/2 \rfloor$, is the graph in $\mathcal{I}_k$ with the maximum diameter. As we mentioned before, in order for a particular subgraph $h$ to be locally countable, the radius $r$ must be greater than or equal to $\lceil \mathrm{diam}(h)/2 \rceil$. Hence, for a particular $r$, we can compute moments in a distributed manner up to an order $k_{\max} = 2r + 1$.

In this paper, we have derived explicit relationships between spectral properties of a network and the presence of certain subgraphs. In particular, we are able to express the spectral moments as a linear combination of the embedding frequencies of these subgraphs.
Since the spectral properties are known to have a direct influence on the dynamics of the network, our result builds a bridge between local network measurements (i.e., the presence of small subgraphs) and global dynamical behavior (via the spectral moments). Furthermore, based on our result, we have also developed a novel decentralized algorithm to efficiently aggregate a set of local network measurements into global spectral moments. Our approach is based on an efficient decentralized procedure to compute the embedding frequencies of subgraphs from local samples of the network topology.

Future work involves extending our methodology to directed graphs and graphs with self-loops, such as those appearing in transcription networks. Also, we are interested in developing techniques to extract explicit information regarding the dynamical behavior of a network from a sequence of spectral moments. Furthermore, it would be interesting to find a fully decentralized algorithm to iteratively modify the structure of a network of agents in order to control its dynamical behavior. We find particularly interesting the case in which individual agents have knowledge of their local network structure only (i.e., myopic information), while they try to collectively aggregate these local pieces of information to find the most beneficial modification of the network structure.

¹ For simple graphs with no self-loops, $a_{ii} = 0$ for all $i$.
² An isomorphism of graphs $G$ and $H$ is a bijection between the vertex sets $V(G)$ and $V(H)$, $f : V(G) \to V(H)$, such that any two vertices $u$ and $v$ of $G$ are adjacent in $G$ if and only if $f(u)$ and $f(v)$ are adjacent in $H$.
³ We denote by $|Z|$ the cardinality of a set $Z$.
⁴ This set can be easily determined using the command ListGraphs included in the package Combinatorica included in Mathematica.
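As an illustration of the distributed computation in Theorem 9, the following Python sketch evaluates the local variable for k = 4 at every node and then averages it. It is a sketch under stated assumptions, not the authors' algorithm: the radius is taken as r = 2, so that every node of an edge, a wedge or a 4-cycle can locally count it (detector counts 2, 3 and 4); the coefficients 2, 4 and 8 are the closed-walk counts on those three subgraphs; the graph is a dictionary of neighbor sets; the function names are hypothetical; and a plain average stands in for the distributed averaging step.

```python
from itertools import combinations


def local_mu4(adj, v):
    """Local variable mu_4(v) for radius r = 2, built only from nodes within
    distance 2 of v: a weighted count of the edges, wedges and 4-cycles
    that contain v."""
    nbrs = adj[v]
    d = len(nbrs)
    h_edge = d
    # Wedges centered at v, plus wedges in which v is an end node.
    h_wedge = d * (d - 1) // 2 + sum(len(adj[u]) - 1 for u in nbrs)
    # 4-cycles through v: two neighbors a, b of v plus a fourth node,
    # different from v, adjacent to both a and b.
    h_c4 = sum(len(adj[a] & adj[b]) - 1 for a, b in combinations(nbrs, 2))
    # Each count is weighted by (closed-walk coefficient) / (detector count).
    return 2 * h_edge / 2 + 4 * h_wedge / 3 + 8 * h_c4 / 4


def distributed_m4(adj):
    """Average of the local variables; stands in for distributed averaging."""
    return sum(local_mu4(adj, v) for v in adj) / len(adj)


if __name__ == "__main__":
    ring6 = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
    print(distributed_m4(ring6))
```

Dividing each local count by the number of detector nodes is what removes the multiple counting: an edge is reported by its 2 end nodes, a wedge by its 3 nodes and a 4-cycle by its 4 nodes, so each subgraph contributes exactly once to the average.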
