Tree decompositions and social graphs

Recent work has established that large informatics graphs such as social and information networks have non-trivial tree-like structure when viewed at moderate size scales. Here, we present results from the first detailed empirical evaluation of the u…

Authors: Aaron B. Adcock, Blair D. Sullivan, Michael W. Mahoney

Aaron B. Adcock∗   Blair D. Sullivan†   Michael W. Mahoney‡

Abstract

Recent work has established that large informatics graphs such as social and information networks have non-trivial tree-like structure when viewed at moderate size scales. Here, we present results from the first detailed empirical evaluation of the use of tree decomposition (TD) heuristics for structure identification and extraction in social graphs. Although TDs have historically been used in structural graph theory and scientific computing, we show that, even with existing TD heuristics developed for those very different areas, TD methods can identify interesting structure in a wide range of realistic informatics graphs. Our main contributions are the following: we show that TD methods can identify structures that correlate strongly with the core-periphery structure of realistic networks, even when using simple greedy heuristics; we show that the peripheral bags of these TDs correlate well with low-conductance communities (when they exist) found using local spectral computations; and we show that several types of large-scale “ground-truth” communities, defined by demographic metadata on the nodes of the network, are well-localized in the large-scale and/or peripheral structures of the TDs. Our other main contributions are the following: we provide detailed empirical results for TD heuristics on toy and synthetic networks to establish a baseline to understand better the behavior of the heuristics on more complex real-world networks; and we prove a theorem providing formal justification for the intuition that the only two impediments to low-distortion hyperbolic embedding are high tree-width and long geodesic cycles. Our results suggest future directions for improved TD heuristics that are more appropriate for realistic social graphs.
∗ Department of Electrical Engineering, Stanford University, Stanford, CA 94305. Email: aadcock@stanford.edu
† Department of Computer Science, North Carolina State University, Raleigh, NC 27695. Email: blair_sullivan@ncsu.edu
‡ International Computer Science Institute and Department of Statistics, University of California at Berkeley, Berkeley, CA 94720. Email: mmahoney@stat.berkeley.edu

1 Introduction

Understanding the properties of realistic informatics graphs such as large social and information networks and developing algorithmic and statistical tools to analyze such graphs is of continuing interest, and recent work has focused on identifying and exploiting what may be termed tree-like structure in these real-world graphs. Since an undirected graph is a tree if any two vertices are connected by exactly one path, or equivalently if the graph is connected but has no cycles, real-world graphs are clearly not trees in any naïve sense of the word. For example, realistic social graphs have non-zero clustering coefficient, indicating an abundance of cycles of length three. There are, however, more sophisticated notions that can be used to characterize the manner in which a graph may be viewed as tree-like. These are of interest since, e.g., graphs that are trees have many nice algorithmic and statistical properties, and the hope is that graphs that are tree-like inherit some of these nice properties. In particular, $\delta$-hyperbolicity is a notion from geometric group theory that quantifies a way in which a graph is tree-like in terms of its distance or metric properties. Alternatively, tree decompositions (TDs) are tools from structural graph theory that quantify a way in which a graph is tree-like in terms of its cut or partitioning properties.

Although TDs and $\delta$-hyperbolicity capture very different ways in which a general graph can be tree-like, recent empirical work (described in more detail below) has shown interesting connections between them. In particular, for realistic social and information networks, these two notions of tree-likeness capture very similar structural properties, at least when the graphs are viewed at large size scales; and this structure is closely related to what may be termed the nested core-periphery or $k$-core structure of these networks. Recent work has also shown that computing $\delta$-hyperbolicity exactly is extremely expensive, that hyperbolicity is quite brittle and difficult to work with for realistic social graphs, and that common methods to approximate $\delta$ provide only a very rough guide to its extremal value and associated graph properties.

Motivated by this, as well as by the large body of work in linear algebra and scientific computing on practical methods for computing TDs, in this paper we present results from the first detailed empirical evaluation of the use of TD heuristics for structure identification and extraction in social graphs. A TD (defined more precisely below) is a specialized mapping of an arbitrary input graph $G$ into a tree $H$, where the nodes of $H$ (called bags) consist of overlapping subsets of vertices of $G$. Quantities such as the treewidth, which is based on the size of the bag in $H$ that contains the largest number of vertices from $G$, can be used to characterize how tree-like $G$ is. A single bag that contains every vertex from $G$ is a legitimate but trivial TD (since the width is as large as possible for a graph of the given size). Thus, one usually focuses on finding “better” TDs, where better typically means minimizing the width.
The problems of finding the treewidth of $G$ and of finding an optimal TD of $G$ are both NP-hard, and thus most effort has focused on developing heuristics, e.g., by constructing the TD iteratively by greedily choosing vertices of $G$ that minimize quantities such as the degree or fill. Since we are interested in applying TDs on realistic graphs, it is these heuristics (to be described in more detail below) that we will use in this paper.

Our goals are to describe the behavior of TD heuristics on real-world and synthetic social graphs and to use these TD tools to identify and extract useful structure from these graphs. In particular, (in Section 6) we show the following.

We first show (in Section 6.1) that TD methods can identify large-scale structures in realistic networks that correlate strongly with the recently-described core-periphery structure of these networks, even when using simple greedy TD heuristics. We do this by relating the global “core-periphery” structure of these networks, as captured using the $k$-core decomposition, with what we call the “central-perimeter” structure, which is a measure of the centrality or eccentricity of each bag in the TD. We also describe how small-scale structures such as the internal bag structure of TDs of these networks reflect (depending on the density and other properties of these networks) their clustering coefficient and other related clustering properties of the original networks.

We next show (in Section 6.2) that the peripheral bags of these TDs correlate well with low-conductance communities/clusters (when they exist) found using local spectral computations, in the sense that these low-conductance (i.e., good-conductance) communities/clusters occupy a small number of peripheral bags in the TDs.
In particular, this shows that in graphs for which the so-called Network Community Profile (NCP) Plot is upward-sloping (as, e.g., described in [1], and indicating the presence of good small and absence of good large clusters), the small-scale “dips” in the NCP are localized in clusters that are on a peripheral branch in the TD.

We finally consider (in Section 6.3) how several types of large-scale “ground-truth” communities/clusters, as defined by demographic metadata on the nodes of the network (and that are not good-conductance clusters), are localized in the TDs. In particular, we look at two social network graphs consisting of friendship edges between students at a university, we use metadata associated with graduation year and residence hall information, and we show that clusters defined by these metadata are well-localized in the large-scale central and/or small-scale peripheral structures of the TDs.

A significant challenge in applying existing TD heuristics, which have been developed for very different applications in scientific computing and numerical linear algebra, is that it can be difficult to determine whether one is observing a “real” property of the networks or an artifact of the particular TD heuristic that was used to examine the network. Thus, to establish a baseline and to determine their behavior in idealized settings, we have first applied several existing TD heuristics to a range of toy and synthetic data. (See Section 4 and Section 5, respectively.) The toy data consist of a binary tree, a lattice, a cycle, a clique, and a dense random graph, i.e., graphs for which optimal TDs are known. The synthetic data consist of Erdős-Rényi and power law random graph models, which help us understand the effect of noise/randomness on the TDs. (Other random graph models exhibit similar properties, when their parameters are set to correspondingly sparse values.)
For these graphs, we place a particular emphasis on the properties of the TDs as the density parameters (i.e., the connection probability for the Erdős-Rényi graphs and the power law parameter for the power law graphs) are varied from very sparse to extremely sparse, and we are interested in how this relates to the large-scale core-periphery structure.

Our detailed empirical results for TD heuristics on toy and synthetic networks are important for understanding the behavior of these heuristics on more complex real-world networks; but our results on synthetic and real-world networks also suggest future directions for the development of TD heuristics that are more appropriate for social graph data. Existing TD heuristics focus on producing minimum-width TDs, which are of interest in more traditional graph theory and linear algebra applications, but they are not well-optimized for finding structures of interest in social graphs. Although it is beyond the scope of this paper, the development of TD heuristics that are more appropriate for social graph applications (e.g., understanding how the bag structure of those TDs relates to the output of recently-developed local spectral methods that find good small clusters in large networks) is an important question raised by our results.

The remainder of this paper is organized as follows. In Section 2, we present definitions from graph theory, a detailed discussion of tree decompositions and the algorithms for their construction, and a brief discussion of other prior related work. Section 3 details the datasets we make use of throughout the paper. The subsequent three sections provide our main empirical results. In particular, in Section 4, we consider several TD heuristics applied to toy graphs; and in Section 5, we consider TD heuristics applied to synthetic random graphs.
Then, in Section 6, we describe the results of applying TDs to a carefully-chosen suite of real-world social graphs. In Section 7, we prove a theoretical result connecting treewidth and treelength with the (very different) notion of $\delta$-hyperbolicity, under an assumption on the length of the longest geodesic cycle in the graph. Finally, in Section 8, we provide a brief discussion of results and conclusion.

2 Background and Related Work

In this section, we will review relevant graph theory, TD ideas, and computational methods, as well as relevant related work.

2.1 Preliminaries on Graph Theory

Let $G = (V, E)$ be a graph with vertex set $V$ and edge set $E \subseteq V \times V$. We often refer to graphs as networks and vertices as nodes, and we will model social and information networks by undirected graphs. We note that TDs are themselves graphs (constructed from other input graphs). The degree of a vertex $v$, denoted $d(v)$, is defined as the number of vertices that are adjacent to $v$ (or the sum of the weights of adjacent edges, if the graph is weighted). The average degree is denoted $\bar{d}$. A graph is called connected if there exists a path between any two vertices. A graph is called a tree if it is connected and has no cycles. A vertex in a tree is called a leaf if it has degree 1. A graph $H = (S, F)$ is a subgraph of $G$ if $S \subseteq V$, $F \subseteq E$. An induced subgraph of $G$ on a set of vertices $S \subseteq V$ is the graph $G[S] := (S, (S \times S) \cap E)$. Unless otherwise specified, our analyses will always consider the giant component, i.e., the largest connected subgraph of $G$.

The diameter of a graph is the maximum distance between any two vertices, and the eccentricity of a vertex is the maximum distance between that vertex and any other vertex in the graph. Note that the maximum eccentricity of a graph is equal to the diameter.
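As an illustrative aside, the distance-based quantities just defined can be computed with breadth-first search. The following is a minimal sketch, assuming a connected, unweighted, undirected graph stored as an adjacency dictionary; the function names and the four-vertex path are hypothetical and are not taken from the software used in this paper.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every vertex reachable from it."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def eccentricities(adj):
    """Eccentricity of every vertex (assumes the graph is connected)."""
    return {v: max(bfs_distances(adj, v).values()) for v in adj}


# Hypothetical example: the path a - b - c - d.
path = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
ecc = eccentricities(path)
diameter = max(ecc.values())  # the maximum eccentricity is the diameter
```

Running one search per vertex costs $O(|V|(|V| + |E|))$ time, which is adequate for small examples but would likely need sampling or approximation on very large networks.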
The clustering coefficient of a vertex is the ratio of the number of edges present among its neighbors to the maximum possible number of such edges; when we refer to the clustering coefficient of a network, we use the average of the clustering coefficient of all its vertices. A cut is a partitioning of a network’s vertex set into two pieces. The volume of a cut is the sum over vertices in the smaller piece of the number of incident edges, and the surface area of a cut is the number of edges with one end-point in each piece. In this case, the conductance of a cut, one of the most important measures for assessing the quality of a cut, is the surface area divided by the volume (that is, we will be following the conventions used in previous work [1, 2]).

Finally, we will refer to the “core-periphery” structure of a network. Following prior work [1, 3, 2], we use the $k$-core decomposition to identify these core nodes. The $k$-core of a network $G$ is the maximal induced subgraph $H \subseteq G$ such that every node in $H$ has degree at least $k$. The $k$-core has the advantage of being easily computable in $O(|V| + |E|)$ time [4, 5, 6].

2.2 Preliminaries on Tree Decompositions

TDs are combinatorial objects that describe specialized mappings of cuts in a network to nodes of a tree. Although originally introduced in the context of structural graph theory (the proof of the Graph Minors Theorem [7]), TDs have gained attention in the broader community due to their use in efficient algorithms for certain NP-hard problems. In particular, there are polynomial-time algorithms for solving many such problems on all graphs that have TDs whose width (defined below) is bounded from above by a constant [8, 9]. These algorithms have been applied to problems in constraint satisfaction, computational biology, linear algebra, probabilistic networks, and machine learning [10, 11, 12, 13, 14, 15, 16, 17, 18].

Definition 1.
A tree decomposition (TD) of a graph $G = (V, E)$ is a pair $(X = \{X_i : i \in I\}, T = (I, F))$, with each $X_i \subseteq V$, and $T$ a tree with the following properties:
1. $\bigcup_{i \in I} X_i = V$,
2. For all $(v, w) \in E$, $\exists\, i \in I$ with $v, w \in X_i$, and
3. For all $v \in V$, $\{i \in I : v \in X_i\}$ forms a connected subtree of $T$.
The $X_i$ are called the bags of the tree decomposition.

The third condition of the definition is a continuity requirement that allows the TD to be used in dynamic programming algorithms for many NP-hard problems.¹ It is equivalent to requiring that for all $i, j, k \in I$, if $j$ is on the path from $i$ to $k$ in $T$, then $X_i \cap X_k \subseteq X_j$.² The quality of a tree decomposition is often measured in terms of its largest bag size.

¹ Alternatively, the bags and edges of the TD form separators (cuts) in the graph. The set of vertices contained in any bag, or intersection of two adjacent bags, forms a separator in $G$. This structural property is important as it allows TDs to be thought of as a method of organizing cuts in a network. This is also related to how the treewidth of a network is used to measure how tree-like a network is. Intuitively, a tree has a treewidth of 1 because the graph can be separated by the removal of a single edge (or vertex) in the network, whereas a cycle requires two edges to be cut and thus has a treewidth of 2. TDs with large widths require larger numbers of vertices to separate a network into two disconnected pieces.

² A related aspect of the definition of a TD is the overlapping nature of the bags of a TD. Vertices in the graph will appear in many bags in the TD. This is particularly true of high degree or high $k$-core nodes [3].

Definition 2. Let $\mathcal{T} = (\{X_i\}, T = (I, F))$ be a tree decomposition of a graph $G$.
The width of $\mathcal{T}$ is defined to be $\max_{i \in I} |X_i| - 1$, and the treewidth of $G$, denoted $tw(G)$, is the minimum width over all valid tree decompositions of $G$. A tree decomposition whose width is equal to the treewidth is often referred to as optimal.

By this definition, trees have the minimum possible treewidth of 1 (their bags contain the edges of the original tree and thus have size 2); but, in contrast to $\delta$-hyperbolicity (see Section 2.4), an $n$-vertex clique is the least tree-like graph (attaining the maximum treewidth of $n - 1$). In fact, the only valid TDs of a clique have all vertices in a single bag. Since TDs remain valid under taking subgraphs (once you delete any vertices no longer present), if $W$ is any complete subgraph of $G$, then every TD of $G$ has some bag that contains all the vertices of $W$ [19].

Two other canonical examples (to which we will return in detail below) are the cycle and the grid, which have vastly differing treewidths. All cycles (regardless of the number of vertices) have treewidth 2 (see Figure 5 below). The $n \times n$ planar grid, on the other hand, has treewidth $n$, and thus it is not tree-like by this measure. Grids are particularly noteworthy in the discussion of TDs due to a result (described in more detail below) showing that they are essentially the only obstruction to having bounded treewidth.³

Finding a TD for a given graph whose width is minimal (equal to the treewidth) is an NP-hard problem [21, 12]. Most methods (including those discussed here) for constructing TDs were designed to minimize width, as most prior work focused on using these structures to reduce computational cost for an algorithm/application.⁴ Also, although treewidth is a graph invariant, TDs of a network are not unique, even under the condition of having minimum width. See, e.g., Figure 5 below, which shows several distinct minimum width TDs of a cycle.
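To make Definitions 1 and 2 concrete, the following minimal Python sketch checks the three TD conditions and computes the width of a candidate decomposition. It is an illustration only: the function names and the data layout (a dictionary of bags plus a list of tree edges) are assumptions made here, and the sketch assumes the supplied `tree_edges` already form a tree rather than verifying it.

```python
from collections import deque


def is_tree_decomposition(vertices, edges, bags, tree_edges):
    """Check conditions 1-3 of Definition 1.

    bags: dict mapping a tree node id to a set of graph vertices.
    tree_edges: pairs of tree node ids, assumed (not checked) to form a tree.
    """
    # Condition 1: the union of the bags is the vertex set.
    if set().union(*bags.values()) != set(vertices):
        return False
    # Condition 2: every graph edge lies inside at least one bag.
    for v, w in edges:
        if not any(v in bag and w in bag for bag in bags.values()):
            return False
    # Condition 3: the bags containing each vertex form a connected subtree.
    tree_adj = {i: set() for i in bags}
    for i, j in tree_edges:
        tree_adj[i].add(j)
        tree_adj[j].add(i)
    for v in vertices:
        holders = {i for i, bag in bags.items() if v in bag}
        start = next(iter(holders))
        seen, queue = {start}, deque([start])
        while queue:
            i = queue.popleft()
            for j in tree_adj[i] & holders:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != holders:
            return False
    return True


def width(bags):
    """Width of a decomposition: largest bag size minus one."""
    return max(len(bag) for bag in bags.values()) - 1


# Hypothetical example: the 5-cycle 0-1-2-3-4-0.
cycle_vertices = range(5)
cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]
good_bags = {"A": {0, 1, 2}, "B": {0, 2, 3}, "C": {0, 3, 4}}
good_tree = [("A", "B"), ("B", "C")]
# Placing each edge in its own bag along a path breaks condition 3 for vertex 0.
bad_bags = {1: {0, 1}, 2: {1, 2}, 3: {2, 3}, 4: {3, 4}, 5: {4, 0}}
bad_tree = [(1, 2), (2, 3), (3, 4), (4, 5)]
```

In this example the first decomposition is valid and has width 2, matching the treewidth of a cycle stated above, while the second fails the connected-subtree condition because vertex 0 appears only in the two end bags of the path.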
Finally, although it is not standard, we will abuse the term width to apply it directly to a bag of a tree decomposition (in which case, it takes the value of the cardinality of the set minus one), so that we can talk about the maximum width (which is equivalent to the usual definition of width), and median width of a decomposition (which is the median of the widths of the bags). We will use the term center to refer to the bag (or bags) associated with the node(s) of minimum eccentricity in the tree underlying a TD, and the term perimeter for bags associated with nodes of relatively high eccentricity. We do this to help provide a framework for discussing the connection in many social and information networks between the core (resp. periphery) of the network and the central (resp. perimeter) bags of its TD computed with certain heuristics. Note that by definition, a tree will have at most two bags at its center.

³ In particular, in the so-called Grid Minor Theorem, Robertson and Seymour showed that every graph of treewidth at least $k$ contains an $f(k) \times f(k)$ grid as a graph minor, for some integer-valued function $f$. The original estimate of the function $f$ gave an exponential relationship between the treewidth and the grid size, and although several results greatly improved the relationship, the question of whether or not it held for any polynomial function $f$ remained open for over 25 years. Recently, Chekuri and Chuzhoy proved that there is a universal constant $\delta > 0$ so that all graphs of treewidth at least $k$ have a grid-minor of size $\Omega(k^{\delta}) \times \Omega(k^{\delta})$ [20], resolving this conjecture.

⁴ In the context of understanding the intermediate-scale structure of real networks and improving inference (e.g., link prediction, overlapping community detection, etc.), there are likely more appropriate objective functions, although their general identification and development is left as future work.

2.3 Constructing Tree Decompositions

Here, we give a brief overview of existing algorithms for constructing TDs; more comprehensive surveys can be found in [12, 22]. The algorithms for finding low-width TDs are generally divided into two classes: “theoretical” and “computational.” The former category includes, for example, the linear-time algorithm of Bodlaender [23], which checks if a TD of width at most $k$ exists (for a fixed constant $k$), and the approximation algorithms of Amir [24]. These are generally considered (practically) intractable due to very large hidden constants and complexity of implementation; e.g., Bodlaender’s algorithm was shown by Röhrig [25] to have too high a computational cost even when $k = 4$. The approximation algorithms of Amir have been tested on graphs with up to several hundred vertices, but they require hours of running time even at this size scale. There has also been work on exact algorithms, the most computational of which is perhaps the QuickTree algorithm of Shoikhet and Geiger, which was tested on graphs with up to about 100 vertices and treewidth 11 [26]. Thus, in practice, most computational work requires the use of heuristic approaches (i.e., those which offer no worst-case guarantee on their maximum deviation from optimality). Since we are interested in applying TDs to real network data, we will focus on these “practical” algorithms in the remainder of this paper. We used INDDGO [27, 28], an open source software suite for computing TDs and numerous graph and TD parameters.

2.3.1 Chordal Graph Decomposition

A common method for constructing TDs is based on algorithms for decomposing chordal graphs.

Definition 3. A graph $G$ is chordal if it has no induced cycles of length greater than three (equivalently, every cycle in $G$ with length at least four has a chord).

Chordal graphs are characterized by the existence of an ordering $\pi = (v_1, \ldots, v_n)$ of their vertices so that for each $v_i$, the set of its neighbors $v_j$ with $j > i$ forms a clique. This is a perfect elimination ordering, and it gives a straightforward construction for a TD (also called the clique-tree) of a chordal graph, with bags consisting of the sets of higher-indexed neighbors of each vertex.

For a general graph $G$, one common approach for finding TDs is to first find a chordal graph $H$ containing $G$, then use the associated TD (since, as mentioned earlier, TDs remain valid for all subgraphs on the same vertex set). The typical approach is via triangulation, a process that uses a permutation of the vertex set (called the elimination ordering) to guide the addition of edges, which are referred to as fill edges. Chordal completions are not unique. For example, the complete graph formed on the vertices of $G$ is chordal and contains $G$ (although it is a trivial or the “worst” triangulation, in the sense that it has the most fill edges and largest possible clique subgraph among all triangulations). We will use the notation $G^+_\pi$ to denote the triangulation of $G$ using ordering $\pi$. An outline of the process is given in Algorithm 1.

The process for finding a TD $T_\pi$ using an elimination order $\pi$ and Gavril’s algorithm [29] for decomposing chordal graphs is given in Algorithm 2. We may refer to the width of an ordering, by which we mean the treewidth of the chordal graph $G^+_\pi$. The literature includes several slight variants on Gavril’s construction routine (such as Algorithm 2 in [22]), but the overall process and width of the TD produced is the same for each.

Perhaps surprisingly, there always exists some elimination ordering which produces an optimal TD (one of minimum width), and this may co-occur with high fill. The following theorem (see [22]) presents the connections between treewidth, triangulations, and elimination orderings.

Theorem 1.
[22] Let $G = (V, E)$ be a graph, and let $k \le n$ be a non-negative integer. Then the following are equivalent.
1. $G$ has treewidth at most $k$.
2. $G$ has a triangulation $H$ s.t. any complete subgraph of $H$ (clique) has at most $k + 1$ vertices.
3. There is an elimination ordering $\pi$, such that $G^+_\pi$ does not contain any clique on $k + 2$ vertices as a subgraph.
4. There is an elimination ordering $\pi$, such that no vertex $v \in V$ has more than $k$ neighbors in $G^+_\pi$ which occur later in $\pi$.

Algorithm 1 Triangulate a graph $G$ into a chordal graph $G^+_\pi$
Input: Graph $G = (V, E)$, and $\pi = (v_1, \ldots, v_n)$, a permutation of $V$
Output: Chordal graph $G^+_\pi \supseteq G$, for which $\pi$ is a perfect elimination ordering
  Initialize $G^+_\pi = (V', E')$ with $V' = V$ and $E' = E$
  for $i = 1$ to $n$ do
    Let $N_i = \{ v_j \mid j > i \text{ and } (v_i, v_j) \in E' \}$
    for $\{x, y\} \subseteq N_i$ do
      if $x \neq y$ and $(x, y) \notin E'$ then
        $E' = E' \cup \{(x, y)\}$
      end if
    end for
  end for
  return $G^+_\pi$

Algorithm 2 Construct a TD $T_\pi$ of a graph $G$ using elimination ordering $\pi$ and Gavril’s algorithm
Input: Graph $G = (V, E)$, $\pi$ a permutation of $V$
Output: TD $T_\pi = (X, (I, F))$ with $(I, F)$ a tree, and bags $X = \{X_i\}$, $X_i \subseteq V$
  Initialize $T = (X, (I, F))$ with $X = I = F = \emptyset$, $n = |V|$
  Create an empty $n$-long array $t[\,]$
  Use Algorithm 1 to create a triangulation $G^+_\pi$ using $\pi$
  Let $k = 1$, $I = \{1\}$, $X_1 = \{\pi_n\}$, $t[\pi_n] = 1$
  for $i = n - 1$ to $1$ do
    Find $B_i = \{\text{neighbors of } \pi_i \text{ in } G^+_\pi\} \cap \{\pi_{i+1}, \ldots, \pi_n\}$
    Find $m = \pi_j \in B_i$ such that $j \le l$ for all $\pi_l \in B_i$
    if $B_i = X_{t[m]}$ then
      $X_{t[m]} = X_{t[m]} \cup \{\pi_i\}$; $t[\pi_i] = t[m]$
    else
      $k = k + 1$
      $I = I \cup \{k\}$; $X_k = B_i \cup \{\pi_i\}$
      $F = F \cup \{(k, t[m])\}$; $t[\pi_i] = k$
    end if
  end for
  return $T_\pi = (X, (I, F))$

Thus, if one can produce a “good” elimination ordering (i.e., one with a small maximum clique), it is easy to construct a low-width TD, and such an ordering always exists if the treewidth is bounded. The intuition behind fill-reducing orderings to minimize width follows from the idea that in order to produce a large clique that wasn’t already in the network, one “should” have to add many fill edges.

2.3.2 Ordering Heuristics

Here, we describe the landscape of heuristics for creating elimination orderings, focusing on those used in our empirical evaluations. A more detailed analysis of heuristics as well as theoretical connections between chordal graphs and TDs is available [22]. The space of all possible elimination orderings has size $n!$ for a graph on $n$ vertices, making it impractical to search using brute force techniques. One possibility for exploring the space is to apply a stochastic local search approach like simulated annealing, but since this is relatively slow, it is not common in practice.

The first class of specialized methods are known as triangulation recognition heuristics, which include lexicographic breadth-first-search (lex-m) and maximum cardinality search (mcs) [22, 30, 31, 32, 33, 34, 35]. These methods are guaranteed to provide a perfect elimination ordering for chordal graphs, so many believed they would produce low-fill and/or low-width orderings for more general graphs.
In [22], the authors report good results with respect to width when using these methods on graphs which are already chordal or have regular structures, but poor results compared to the greedy heuristics when even small amounts of randomness are added to the network. Further empirical evaluation in [27] supports these claims. Additionally, these heuristics are too computationally expensive to run on very large graphs.

A large set of additional heuristics uses the idea of splitting the graph (using a small separator), recursively decomposing the resulting pieces, and then “gluing” the solutions into a single TD [24, 36, 37, 38, 11, 39]. To quote Bodlaender and Koster [22], “they are significantly more complex, significantly slower, and often give bounds that are higher than those of simple algorithms.” We do, however, use a related approach that finds a set of nested graph partitions, but instead of decomposing the resulting pieces, it places the separators into an elimination ordering. This approach is called nested dissection [40, 41], and it is quite popular for computing fill-reducing orderings for sparse matrices in numerical linear algebra. The algorithm recursively finds a small vertex separator (bisector) in a graph, and it ensures that in the resulting elimination ordering, the vertices in the two components formed by the bisection all appear before the vertices in the separator. We use the “node nested dissection” algorithm implemented in METIS [42] (called through INDDGO), and we refer to this heuristic as metnnd. In METIS, the recursion is stopped when the components are smaller than a certain size, and some version of minimum degree ordering is then applied to the remaining pieces. The software “grows” each bisection using a greedy node-based strategy.
Since the algorithm is searching for bisections, there is a tunable “balance” condition (determining how close to 50/50 the split needs to be), although for all computations reported in this paper, we left the parameter at its default value.

Perhaps the most popular class of elimination ordering routines are greedy heuristics, named because they make greedy decisions to pick the subsequent node in the elimination ordering. There are innumerable variations, but the most common use two basic concepts: choosing a vertex to minimize fill (how many new edges will be added to the graph if a vertex is chosen to be next in the ordering); or choosing a vertex of minimum degree (low-degree vertices have small neighborhoods, which also limits the potential fill). When applied in their most rudimentary forms, these are the mindeg [43] and minfill orderings. Both of these indirectly limit the size of cliques produced in the final triangulation (although they were originally designed to minimize the number of fill edges added during the triangulation, a quantity which is not always correlated with the clique size). For additional heuristics combining these strategies and incorporating additional local information, see [22].

Even though keeping updated vertex degrees for mindeg during triangulation (greedy orderings make their decisions based on a partially triangulated graph at each step) is significantly less computationally intensive than computing current vertex fills for minfill, there have been efforts to reduce the complexity further. One such effort is the approximate minimum degree (amd) heuristic [44, 45]. This heuristic computes an upper bound on the degree of each node in each pass using techniques based on the quotient graph for matrix factorization, and it has been shown to be significantly faster and of similar quality (in terms of fill and width minimization).
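As a rough illustration of the greedy idea, the sketch below implements the rudimentary mindeg rule together with the triangulation step of Algorithm 1, and reports the width of the resulting ordering (item 4 of Theorem 1). It is a simplified sketch, not the INDDGO or amd implementation: the function name, the label-based tie-breaking, and the example 6-cycle are assumptions made here, and vertex labels are assumed to be mutually comparable (e.g., integers).

```python
def min_degree_ordering(adj):
    """Greedy mindeg elimination ordering.

    Returns (ordering, width of the ordering, number of fill edges).
    Ties are broken by vertex label, which is an arbitrary choice made here.
    """
    # Work on a copy; the input adjacency structure is left untouched.
    g = {v: set(nbrs) for v, nbrs in adj.items()}
    order, width, fill = [], 0, 0
    while g:
        # Pick a vertex of minimum degree in the partially triangulated graph.
        v = min(g, key=lambda u: (len(g[u]), u))
        nbrs = sorted(g.pop(v))
        # The remaining neighbors are exactly the later neighbors of v.
        width = max(width, len(nbrs))
        for x in nbrs:
            g[x].discard(v)
        # Eliminating v turns its remaining neighbors into a clique.
        for a in range(len(nbrs)):
            for b in range(a + 1, len(nbrs)):
                x, y = nbrs[a], nbrs[b]
                if y not in g[x]:
                    g[x].add(y)
                    g[y].add(x)
                    fill += 1
        order.append(v)
    return order, width, fill


# Hypothetical example: the 6-cycle 0-1-2-3-4-5-0.
six_cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
order, width, fill = min_degree_ordering(six_cycle)
```

On this cycle the sketch adds three fill edges and yields an ordering of width 2, which agrees with the treewidth of a cycle. A minfill variant would replace the selection key with the number of missing edges among the neighbors of each candidate vertex, at a higher cost per step.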
We use amd interchangeably with the traditional mindeg, especially on larger networks.

2.4 Additional Related Work

For completeness, we provide here a brief overview of the large body of additional related work. As already mentioned, TDs played an important role in the proof of the graph minor theorem [7], but they have also become popular in theoretical computer science, as many NP-hard optimization problems have a polynomial time algorithm for graphs with bounded treewidth [8]. In addition, bounding the treewidth of the underlying graph of probabilistic graphical models allows for fast inference computations [46]. Additional overviews of TDs and their uses in discrete optimization are available [47, 48, 49, 50]; and one can also learn more about the uses of these methods in numerical linear algebra and sparse matrix computations [51], as well as connections with triangulation methods: triangulation of minimum treewidth [52], empirical work on treewidth computations [53], the minimum degree heuristic and connections with triangulation [54], and a survey of triangulation methods [55]. Finally, the treewidth of random graphs for various parameter settings has been studied [56, 57].

A different notion of tree-likeness is provided by δ-hyperbolicity.⁵ Early, more mathematical work did not consider graphs and networks [60, 61], but more recent, more applied work has [62, 63, 64]. Computing δ exactly is very expensive [58], and sampling-based methods to approximate it provide only a very rough guide to its value and properties [3]. For many references on δ-hyperbolicity in network analysis, see [64] (and the more recent paper [65]) and references therein. There has been work on trying to relate hyperbolicity and TD-based ideas, often going beyond treewidth to consider other metrics such as treelength or chordality or the expansion properties of the graph [66, 67, 68, 69, 70, 71, 72, 73, 74].
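To give a sense of why exact computation of δ is expensive, here is a minimal brute-force sketch in Python. It assumes the common four-point definition (for every four nodes, the two largest of the three pairwise distance sums differ by at most 2δ); this section does not state which definition is intended, so both the definition and the division by two are assumptions made here. The function names are hypothetical.

```python
from collections import deque
from itertools import combinations


def all_pairs_distances(adj):
    """Shortest-path distances by BFS from every node (connected graph assumed)."""
    dist = {}
    for s in adj:
        d = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in d:
                    d[w] = d[u] + 1
                    queue.append(w)
        dist[s] = d
    return dist


def four_point_delta(adj):
    """Exact four-point delta by enumerating all 4-subsets: O(n^4) time."""
    dist = all_pairs_distances(adj)
    delta = 0.0
    for x, y, z, w in combinations(list(adj), 4):
        sums = sorted((dist[x][y] + dist[z][w],
                       dist[x][z] + dist[y][w],
                       dist[x][w] + dist[y][z]))
        # Half the gap between the two largest sums (assumed normalisation).
        delta = max(delta, (sums[2] - sums[1]) / 2)
    return delta


if __name__ == "__main__":
    path = {i: {j for j in (i - 1, i + 1) if 0 <= j < 6} for i in range(6)}
    square = {i: {(i - 1) % 4, (i + 1) % 4} for i in range(4)}
    print(four_point_delta(path), four_point_delta(square))
```

The loop over all 4-subsets is what makes this approach impractical beyond a few hundred nodes, which is consistent with the reliance on sampling for larger graphs.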
Recent work in network analysis and community structure analysis has pointed to some sort of “core-periphery” structure in many real networks [1, 75, 76, 3, 2]; and recently this has been related to the k-core decomposition (see, e.g., [3] and references therein). The k-core decomposition is of interest more generally, and additional references for k-core decompositions, including their use in visualization and in larger-scale applications, include [77, 78, 79, 80, 81, 82, 83]. Questions of well-connected or expander-like cores are of particular interest in applications having to do with diffusion processes, influential spreaders, and related questions of social contagion [84, 85].

There are a few other papers that have used TDs to investigate the structural properties of social and information networks: e.g., to look at the tree-likeness of internet latency and bandwidth [86]; to compare hyperbolicity and treewidth on internet networks [87]; and to examine the relationship between hyperbolicity, treewidth, and the core-periphery structure in a much wider range of social and information networks [3]. In particular, [87] concludes that the hyperbolicity is small in the networks they examined but the treewidth is relatively large, presumably due to a highly connected core; and [3] concludes that many real social and information networks do have a tree-like structure, with respect to both metric-based hyperbolicity and (in spite of the large treewidth) the cut-based TDs, that corresponds to the core-periphery structure of the network. Finally, very recently we became aware of [88] and [89].

3 Network Datasets

We have examined the empirical performance of existing TD heuristics on a broad set of real-world social and information networks as well as a large corpus of synthetic graphs.
The real-world networks have been chosen to be representative of a broad range of networks, as analyzed in prior work [1, 3, 2], and the synthetic graphs have been chosen to illustrate the behavior of TD methods in controlled settings. See Table 1 for a summary of the networks we have considered. The real-world graphs are connected, but we are interested in parameter values for the synthetic graphs which might cause the instances to be disconnected. In these cases, we work with the giant component, and the statistics in Table 1 are for this connected subgraph.

⁵ Our prior work focused on the use of δ-hyperbolicity [58, 3, 59]. It can be a useful tool for describing and analyzing real networks, even though it is expensive to compute, but aside from our theoretical result in Section 7 relating it to treewidth and treelength, it is not our focus in this paper.

Network         n_c       k_l   k_m   d̄      C̄              D
ER Random Graphs
ER(1.6)         3210      1     2     2.16   0.00           38
ER(1.8)         3617      1     2     2.28   9.30 × 10^-4   34
ER(2)           4001      1     2     2.39   9.11 × 10^-4   30
ER(4)           4879      1     3     4.05   8.96 × 10^-3   15
ER(8)           4998      1     5     8.04   1.59 × 10^-3   7
ER(16)          5000      4     11    16.1   3.13 × 10^-3   5
ER(32)          5000      7     23    32.1   6.39 × 10^-3   4
PL Random Graphs
PL(2.50)        4895      1     4     2.78   2.46 × 10^-3   18
PL(2.75)        4650      1     2     2.43   6.99 × 10^-4   22
PL(3.00)        4071      1     2     2.24   1.18 × 10^-3   29
SNAP Social Graphs
CA-GrQc         4158      1     43    6.46   .665           17
CA-AstroPh      17903     1     56    22.0   .669           14
as20000102      6474      1     12    3.88   .399           9
Gnutella09      8104      1     10    6.42   .0137          10
Email-Enron     33696     1     43    10.7   .708           13
FB Social Graphs
FB-Caltech      762       1     35    43.7   .426           6
FB-Haverford    1446      1     63    82.4   .327           6
FB-Lehigh       5073      1     62    78.2   .270           6
FB-Rice         4083      1     72    90.5   .300           6
FB-Stanford     11586     1     91    98.1   .252           9
Miscellaneous Graphs
PowerGrid       4941      1     5     2.67   0.107          46
Polblogs        1222      1     36    27.4   0.360          8
PlanarGrid      2500      2     2     3.92   0.00           98
road-TX         1379917   1     3     1.39   0.0209         1054
web-Stanford    281903    1     71    8.20   2.89 × 10^-3   674

Table 1: Statistics of analyzed networks: nodes in giant component n_c; the lowest k-core k_l; the maximum k-core k_m; average degree d̄ = 2E/N; average clustering coefficient C̄; and diameter D.

Erdős-Rényi (ER) graphs. Although ER graphs are often criticized for their inability to model pertinent properties of realistic networks, extremely sparse ER graphs have several structural inhomogeneities that are important for understanding tree-like structure in realistic networks [3].⁶ In particular, in the extremely sparse regime of 1/n < p < log(n)/n, ER graphs are (w.h.p.) not even fully connected; ER graphs in this regime have an upward-trending NCP (network community profile) [1]; with respect to their k-core structure, a shallow (but non-trivial) core-periphery structure emerges [3, 90]; and with respect to their metric properties (as measured with δ-hyperbolicity), graphs in this regime have non-trivial tree-like properties [3]. Following previous work [3], we set the target number of vertices to n = 5000, and we choose p = d/n for various values of d from d = 1.6 to d = 32. We denote these networks using ER(d). Table 1 clearly shows that, as a function of increasing d, i.e., increasing p, the size of the giant component increases to 5000, the number of edges increases dramatically, the clustering coefficient remains close to zero, the average degree d̄ increases, and the diameter decreases dramatically.

⁶ The same is true for many other less unrealistic random graph models, assuming their parameters are set to analogously sparse values (which they are often not).

Power Law (PL) graphs. We also considered the Chung-Lu model [91], an ER-like random graph model parameterized to have a power law degree distribution (in expectation) with power law (or heterogeneity) parameter γ, which we vary between 2 and 3.
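The two synthetic models just described can be sketched with one expected-degree sampler: with constant weights d it reduces to an ER graph with p = d/n, and with power law weights it gives a Chung-Lu style graph. This is a rough illustration rather than the generator used for Table 1. In particular, the weight formula in `power_law_weights`, the default average degree, and all function names are assumptions made here.

```python
import random
from collections import deque


def expected_degree_graph(weights, seed=0):
    """Sample a graph in which edge {i, j} appears independently with
    probability min(1, w_i * w_j / sum(w)).
    Constant weights w_i = d give an ER graph with p = d / n."""
    rng = random.Random(seed)
    n, total = len(weights), float(sum(weights))
    adj = {i: set() for i in range(n)}
    for i in range(n):
        for j in range(i + 1, n):
            if rng.random() < min(1.0, weights[i] * weights[j] / total):
                adj[i].add(j)
                adj[j].add(i)
    return adj


def giant_component(adj):
    """Return the subgraph induced by the largest connected component."""
    seen, best = set(), set()
    for s in adj:
        if s in seen:
            continue
        comp, queue = {s}, deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in comp:
                    comp.add(w)
                    queue.append(w)
        seen |= comp
        if len(comp) > len(best):
            best = comp
    return {v: adj[v] & best for v in best}


def power_law_weights(n, gamma, avg_degree=2.0):
    """Hypothetical weight choice: w_i proportional to (i + 1)^(-1/(gamma - 1)),
    rescaled so that the mean weight equals avg_degree."""
    raw = [(i + 1) ** (-1.0 / (gamma - 1.0)) for i in range(n)]
    scale = avg_degree * n / sum(raw)
    return [scale * r for r in raw]


if __name__ == "__main__":
    n = 400
    for d in (1.6, 8.0):
        g = giant_component(expected_degree_graph([d] * n, seed=1))
        avg = sum(len(nb) for nb in g.values()) / len(g)
        print(f"ER({d}): giant component {len(g)} nodes, average degree {avg:.2f}")
    pl = giant_component(expected_degree_graph(power_law_weights(n, 2.5), seed=1))
    print("PL(2.5): giant component", len(pl), "nodes")
```

The quadratic loop over node pairs keeps the sketch short; it is too slow for graphs much larger than a few thousand nodes.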
We denote these networks using PL(γ). We consider values of the degree heterogeneity parameter γ ∈ {2.50, 2.75, 3.00}. Table 1 shows that, as a function of decreasing γ, the size of the giant component increases, the average degree d̄ increases, and the diameter decreases. Although not shown in Table 1, as γ decreases, PL graphs also form a rather prominent, and moderately-deep, k-core structure [3]. These are all trends that parallel the behavior of ER as d increases.⁷

SNAP graphs. We selected various social/information networks that were used in the large-scale empirical analysis that first established the upward-sloping NCP and associated nested core-periphery for a broad range of realistic social and information graphs [1]. These are available at the SNAP website [92]. In particular, the networks we considered are CA-GrQc and CA-AstroPh (two collaboration networks); as20000102 (an autonomous system snapshot); Gnutella09 (a peer-to-peer network from Gnutella); Email-Enron (an email network from the Enron database); as well as the Stanford Web network web-Stanford and the Texas road network road-TX. These networks are very sparse, e.g., fewer than ca. 10 edges per node; and they exhibit substantial degree heterogeneity, moderately high clustering coefficients (except for Gnutella09, web-Stanford, and road-TX), and moderately small diameters. In addition, although not presented in Table 1 (and with the exception of road-TX), these graphs have a much stronger core-periphery structure, as measured by the k-core decomposition, than typical synthetic networks [3, 2, 1].

Facebook Networks. We selected several representative Facebook graphs out of ca. 100 Facebook graphs from various American universities collected in ca. 2005 [93]. These data sets range in size from around 700 vertices (FB-Caltech) to approximately 30,000 vertices (FB-Texas84).
In particular, we examine FB-Caltech, FB-Rice, FB-Haverford, FB-Lehigh, and FB-Stanford in this paper. These networks all arise via similar generative procedures, and thus there are strong similarities between them. There are a few distinctive networks, however, that are worth mentioning. In particular, several universities (FB-Caltech, FB-Rice, FB-UCSC) have a particularly strong resident housing system, and it is known that this manifests itself in structural properties of the graphs [93]. Below, we will use the meta-information associated with this housing system to provide “ground-truth” clusters/communities for comparison and evaluation.⁸ One important characteristic to observe from Table 1 is that these Facebook networks, while sparse, are much denser than any of the SNAP graphs we consider or that were considered previously [1].⁹

Miscellaneous Networks. We also selected a variety of real-world networks that, based on prior work [1, 3, 2], are known to have very different properties than the SNAP social graphs or the Facebook social graphs. In particular, we consider Polblogs, a political bloggers network [94] (a graph constructed from political blogs which are linked); the Western US power grid PowerGrid [95]; and a two-dimensional 50 × 50 planar grid PlanarGrid.

⁷ For sparse ER graphs, this happens since there are not enough edges for concentration of measure to occur, i.e., for empirical quantities such as the empirical degrees to be very close to their expected value. For PL graphs, an analogous lack of measure concentration occurs due to the exogenously-specified heterogeneity parameter γ.

⁸ See [2] for how this affects the NCP of these networks.
⁹ Among the differences caused by the much higher density of Facebook networks is that these networks have a much deeper k-core structure than the other real networks, and they tend to lack deep cuts, e.g., they lack even good very-imbalanced partitions such as those responsible for the upward-sloping NCP [1, 2].

4 Tree decompositions of toy networks

In this section, we will describe the results of using a variety of TD heuristics on a set of very simple “toy” networks, on which the optimal-width TDs are known. The five toy networks we consider are a binary tree (SmallBinary), a small section of the two-dimensional planar grid (SmallPlanar), a cycle (SmallCycle), a clique (SmallClique), and an Erdős-Rényi graph with an edge probability of p = 0.5 (SmallER). Each of these networks has 100 nodes, with two exceptions: SmallCycle has only 10 nodes, since the principal change from having a larger cycle is that the eccentricity of the decompositions becomes much larger, which simply makes it more difficult to visualize; and SmallBinary has 128 nodes to maintain symmetry. In Figure 1, we provide visualizations¹⁰ of each of the five networks.

These very simple network topologies illustrate in a controlled way the behavior of different TD heuristics in a range of settings. For example, while SmallBinary is a tree, the other graphs are not; the two-dimensional grid is quite different from a tree, as is SmallCycle (although, from the treewidth perspective, it is fairly close to a tree), and both have high-quality well-balanced partitions; and both SmallClique and SmallER are expanders (not constant degree expanders, but expanders in the sense that they don’t have any good partitions) and thus very non-tree-like (from the TD perspective), but each has important differences with respect to their respective TDs.
We will focus on which types of structures different heuristics tend to capture, as well as how different heuristics deal with nodes (not bags) which are associated with the core or periphery of the original network. Importantly, these toy networks have basic constructions, and they (mostly) have known optimal width TDs (e.g., SmallPlanar and SmallCycle have several known equivalent minimum width TDs), and the SmallER network serves to illustrate some of the effects of randomness on a TD. The insights we obtain here can be used to interpret the output of TD heuristics in much more complex synthetic and real networks.

4.1 TD properties of toy networks

In Figures 2, 3, and 4, we show visualizations of TDs produced by various heuristics (the greedy mindeg in Figure 2; the metnnd, nested node dissection via METIS, in Figure 3; and lexm in Figure 4) for each of these five toy networks. In these visualizations, the size of the bag corresponds with the bag’s width, and the coloring is based on the fraction of edges present in the induced subgraph of the bag. In particular, if the nodes in the bag form a clique in the original network, then the fraction of edges present is 1.0 and the bag is dark red; while if the nodes are completely disconnected in the original network, then the bag is dark blue.

From these figures, we see substantial differences between the TDs that different heuristics generate for these five toy networks. All heuristics give the same uninteresting results for SmallClique; but for all of the other networks, including SmallBinary, there are differences in the decompositions produced by the different heuristics. Consider SmallPlanar, SmallCycle, and SmallER. For both SmallPlanar and SmallER, mindeg and metnnd return TDs with several prominent branches, while lexm returns a path for the TD.
For mindeg, this is due to the tendency of the algorithm to pick low-degree nodes on the “outside” of the network and then work its way around the outside of the network. For metnnd, this is due to the tendency of the algorithm to cut the networks repeatedly into smaller pieces and then recursively “eat away” at these smaller pieces to form the TD. On the other hand, the lexm heuristic works to produce a minimal triangulation using lexicographic labelings along paths. This often results in a path-like TD, as the algorithm uses a breadth-first search through the network. For SmallCycle, metnnd returns a “branchy” TD, while both mindeg and lexm return path-like TDs.

¹⁰ These and other visualizations were created with the GraphViz command neato [96], with the help of [97, 98].

Figure 1: A set of small networks. Edges are colored by their length in the planar embedding. Panels: (a) SmallBinary; (b) 10 × 10 SmallPlanar; (c) SmallCycle; (d) 100 node SmallClique; (e) SmallER.

Figure 2: Greedy mindeg TDs of toy networks. Bags are colored by the fraction of possible edges present in the bag, with red being denser and blue being less dense. Panels: (a) SmallBinary; (b) SmallPlanar; (c) SmallCycle; (d) SmallClique; (e) SmallER.

Figure 3: metnnd (nested node dissection via METIS) TDs of toy networks. Bags are colored by the fraction of possible edges present in the bag, with red being denser and blue being less dense. Panels: (a) SmallBinary; (b) SmallPlanar; (c) SmallCycle; (d) SmallClique; (e) SmallER.

Figure 4: lexm TDs of toy networks. Bags are colored by the fraction of possible edges present in the bag, with red being denser and blue being less dense. Panels: (a) SmallBinary; (b) SmallPlanar; (c) SmallCycle; (d) SmallClique; (e) SmallER.

More quantitatively, in Tables 2 and 3, we provide basic statistics for TD heuristics applied to each of these networks. Table 2 shows a summary of our results for the (maximum) width of TDs produced by various heuristics (for SmallER, the width given is averaged over five different instantiations of the network).
We ignore the issue of tie-breaker choices (e.g., in mindeg, choosing among non-unique minimum degree nodes). On the whole, the heuristics do a good job of finding optimal width TDs on SmallBinary, SmallClique, and SmallCycle (with the exception of metnnd). The greedy heuristics have trouble finding the optimal width TD on SmallPlanar, while lexm and mcs both find an optimal decomposition on the grid. On SmallER, we observe that the greedy heuristics and metnnd outperform lexm and mcs; this is in agreement with previously-reported results [22].

Table 3 shows a summary of our results for the median width of TDs produced by various heuristics (as defined in Section 2.2). The median width is potentially more useful for revealing structure in realistic network data since, e.g., it can be used to see whether a TD is dominated by larger bags or by smaller bags.¹¹ If a network is dominated by bags of small size (such as SmallBinary, SmallCycle, SmallPlanar), depending on the internal structure of the bag, this can indicate several things. For example, the small bags could consist of tight clusters or cliques, indicating that the network has many tightly connected but small groups of nodes. Alternatively, if a small bag’s structure is mostly disconnected, this may indicate the bag is related to small cycles (an example is given below). For SmallCycle and SmallPlanar, the small bags are cyclical, while for SmallBinary the small bags all consist of 2-cliques. SmallClique and SmallER have large median widths (though this is trivial in the case of the clique). The 100-clique is both trivial and too large of a clique to be realistic, but SmallER has interesting bags. The results in Table 2 show the small median widths of SmallPlanar, SmallCycle and SmallBinary and the large median widths of SmallER and SmallClique.
Table 2 also demonstrates that, while there are differences in the widths of the TDs produced by the heuristics, these differences are reasonably small.¹²

Network       n     W_mindeg   W_minfill   W_nnd   W_mcs   W_lexm
SmallBinary   128   1          1           3       1       1
SmallPlanar   100   13         13          14      10      10
SmallCycle    10    3          3           3       3       3
SmallClique   100   100        100         100     100     100
SmallER       100   86         85          86      91      89

Table 2: TD heuristic maximum widths. The widths of SmallClique and SmallER are relatively large (they grow linearly with the network size), the width of the SmallPlanar network is of an intermediate size (it grows with the square root of the network size), and the widths of SmallBinary and SmallCycle are small (they stay constant with the size of the networks). The greedy heuristics find smaller width decompositions on SmallER, while lexm and mcs perform better on SmallPlanar.

¹¹ Using medians rather than eccentricity can result in different central bags. However, in most of the networks that we studied, the results were very similar. In particular, the biggest changes occurred in the FB networks where the median shifted towards the heavier end of the path-like TD. However, these bags were still a part of the thick trunk of the network and thus the results were very similar. In other networks, the median bag was very close to the central eccentric bag, and the main difference is that the median bag tended to have more whisker branches (a branch consisting of one or two bags of small width). This does not substantially change any of our analysis.

¹² We will see below that most real networks have small median width, with smallest bags dominated by cliques, intermediate bags dominated by cycles, and with large, connected, central bags which resemble bags of SmallER.
Network       n     W̃_mindeg   W̃_minfill   W̃_nnd   W̃_mcs   W̃_lexm
SmallBinary   128   1          1           1       1       1
SmallPlanar   100   5          5           5       10      8
SmallCycle    10    3          3           3       3       3
SmallClique   100   100        100         100     100     100
SmallER       100   52         51          49      85      80

Table 3: TD heuristic median widths. This quantity is much smaller than the corresponding widths in several of the networks (although it remains large with SmallER), indicating that these networks are dominated by bags which are much smaller than the largest bag in the network.

Figure 5: Example TDs. The tree and the clique have a standard optimal TD. The cycle has many possible minimum width TDs, though all place disconnected nodes in the bags. Panels: (a) A tree (left) and an optimal TD (right). (b) A clique (left) and an optimal TD (right). (c) An optimal TD of a cycle which is similar to the decomposition found by mindeg. The center node is placed in every bag of the decomposition. (d) An optimal TD of the cycle which is similar to the decomposition found by lexm. Note the cycle is flattened and the bags are formed across the decomposition. (e) An optimal TD of the cycle which is similar to the decomposition found by metnnd. The cycle is “pinched” in several places, forming central bags; the remaining pieces of the cycle can then be decomposed recursively (pinched in again) or using the methods in (c) and (d).

4.2 TDs on clique-like and cycle-like toy networks

An important aspect of TD heuristics is the difference between their behavior on (denser) clique-like graphs and (sparser) cycle-like graphs. In Figure 5, we illustrate this. First, for reference, in Figures 5a and 5b we give canonical minimum width TDs for a tree and a clique. To understand the difference between cycles and cliques, recall that there are many ways of producing a TD on a cycle (three of these are illustrated in Figures 5c, 5d, and 5e). One simple way is to produce a tree which is a path.
This can be done by taking a node v and placing it and its two neighbors in a bag at one end of the path. Then, keeping v in every bag, progress around the cycle sequentially, forming the next bag by including v and the two nodes of the next edge (see Figure 5c). Another method produces a path by “flattening” the cycle, and places each edge in a bag with the node from the other side (see Figure 5d). The metnnd heuristic “pinches” the cycle at a few points, and then produces branches from each of those points (see Figure 5e). There are many differences between these TD heuristics, but an important point is that the nodes in the cycle must be placed in bags with nodes they are not neighbors of in the original graph. Different TD heuristics are very different in terms of how they make this decision, and its effect can be seen in the TDs constructed by these heuristics.

Another important consideration is the interior structure of each bag that is produced by a TD heuristic. Recall that in SmallClique, the only valid TD (which does not contain unnecessary bags) is a single bag containing the entire network. Relatedly, if the network is a k-tree, formed by overlapping cliques (rather than overlapping edges, as in a normal or 2-tree), then the TD will have bags which consist of the individual cliques. Thus, with cliques, it is the local structure (local in the original graph, in the sense that it is driven by neighbors of a given node in the original graph, in contrast with what is going on in, e.g., SmallCycle) that drives the bag formation. With cycles, on the other hand, this local structure is partially “lost” in the bags of the TD. This is of interest since, as already mentioned, the interior structure of bags of different widths is important for understanding what is creating the properties of the TD.
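The path-shaped decomposition of a cycle described above (one node v kept in every bag, as in Figure 5c) can be written down directly, and doing so shows how little of the cycle's local structure survives inside each bag. The sketch below is a slight variant that starts from the first cycle edge rather than from v and its two neighbors. It builds the bags, checks the defining properties of a tree decomposition, and reports the fraction of possible edges present in each bag. All function names are hypothetical, and the value returned for a single-node bag is an arbitrary convention.

```python
from collections import deque
from itertools import combinations


def cycle_graph(n):
    """Cycle on nodes 0..n-1."""
    return {i: {(i - 1) % n, (i + 1) % n} for i in range(n)}


def cycle_path_decomposition(n, v=0):
    """Bags {v, a, b} for consecutive cycle edges (a, b), joined in a path."""
    others = [(v + k) % n for k in range(1, n)]  # cycle order with v removed
    bags = [{v, a, b} for a, b in zip(others, others[1:])]
    tree_edges = [(i, i + 1) for i in range(len(bags) - 1)]
    return bags, tree_edges


def is_tree_decomposition(adj, bags, tree_edges):
    """Check the three defining properties of a tree decomposition.
    Assumes tree_edges already forms a tree on the bag indices."""
    # 1. Every node appears in some bag.
    if set().union(*bags) != set(adj):
        return False
    # 2. Every edge is contained in some bag.
    for u in adj:
        for w in adj[u]:
            if not any(u in b and w in b for b in bags):
                return False
    # 3. The bags containing any given node form a connected subtree.
    nbrs = {i: set() for i in range(len(bags))}
    for i, j in tree_edges:
        nbrs[i].add(j)
        nbrs[j].add(i)
    for u in adj:
        holding = {i for i, b in enumerate(bags) if u in b}
        start = next(iter(holding))
        seen, queue = {start}, deque([start])
        while queue:
            i = queue.popleft()
            for j in nbrs[i] & holding:
                if j not in seen:
                    seen.add(j)
                    queue.append(j)
        if seen != holding:
            return False
    return True


def edge_density(adj, bag):
    """Fraction of possible edges present among the nodes of the bag."""
    pairs = list(combinations(bag, 2))
    if not pairs:
        return 1.0  # convention for single-node bags (an assumption)
    return sum(1 for a, b in pairs if b in adj[a]) / len(pairs)


if __name__ == "__main__":
    adj = cycle_graph(10)
    bags, tree_edges = cycle_path_decomposition(10)
    print("valid:", is_tree_decomposition(adj, bags, tree_edges))
    print("densities:", [round(edge_density(adj, b), 2) for b in bags])
```

On the 10-node cycle, every interior bag contains only one of its three possible edges, which is the sparse-bag signature that the text associates with cycles.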
As an example, we observe that, for all of the heuristics, the larger bags on SmallPlanar have many disconnected nodes and only a few edges. This is a signature of “cyclical” behavior; and, indeed, from the TD perspective, the grid “looks like” a set of small, regular, overlapping cycles. The structure of the TD is formed by the heuristic’s method of moving across the grid and closing cycles. This suggests a simple metric for measuring whether the interior of a bag is driven by cycles or by small, tightly connected clusters: measure the fraction of edges present in the bag, i.e., the edge density of the bag. (We will do this below, and this is why we color many of the visualizations by the density of the bag.)

4.3 TDs on well-partitionable and poorly-partitionable toy networks

SmallPlanar (for which there exist good well-balanced partitions) and SmallER (for which there do not exist good well-balanced partitions) also illustrate differences between the TD heuristics. For both graphs, the greedy heuristics and metnnd have significantly smaller median widths than maximum widths. This is indicative of heterogeneity in the network: there are nodes which are so entangled with other nodes that they must appear together in a large bag, but there are also nodes which are connected to only a small number of other nodes and only need to appear in a few very small bags. This can partially be explained by the tendency of the greedy heuristics to work from the “boundary” of the graph (e.g., boundary nodes have smaller degrees) and to pick points to “eat into” the graph. This is illustrated in Figure 6 for SmallPlanar. Using mindeg as an example, recall that the heuristic works by successively picking a minimum degree node in the network; thus, when applied to SmallPlanar, it will pick each of the corner vertices of the grid first.
This then forms small bags at each corner and, depending on whether it is picking non-unique nodes at random or in an ordered fashion, it will then proceed to work in from the periphery of the network. Indeed, in Figures 2 and 3, we see that the TDs for these heuristics have four major arms with small leaves containing nodes from the border of the grid. Figure 6 provides a visualization of where the nodes from these bags (one of the peripheral bags and one of the central bags in the TD) are in SmallPlanar. (See, in particular, Figures 6a and 6c for mindeg.)

The lexm and mcs heuristics, in contrast, find the minimum possible width for the grid, but the TDs that are produced (as illustrated in Figure 4 for lexm) are long, path-like trunks. This is due to the way that lexm picks a starting node and then works across the graph in the style of a breadth-first search. With SmallPlanar, it starts at one corner and then moves across the network to form a minimal triangulation. Although the (maximum) width is minimal, the median widths of these networks are relatively large, as most of the bags are roughly the same size (see Figures 6b and 6d for the results of lexm).

With SmallER (which is harder to visualize since it doesn’t embed well in two dimensions), the mindeg and metnnd algorithms also eat in from the “boundary” of the network, where here “boundary” means nodes with slightly smaller degrees or slightly better cuts (slightly smaller due to random fluctuations). As with the very different SmallPlanar, this produces several arms and then a few central bags. In SmallER, the greedy heuristics produce a better TD in terms of width than the lexm and mcs heuristics, both of which are search-based heuristics. These similarities and (substantial) differences between TD heuristics in SmallER (compared with SmallPlanar) are apparent in Figures 2, 3, and 4.

Figure 6: Representative bags from an arm in mindeg and an arm in lexm, as well as the associated central bags. In mindeg, each arm progresses from a different corner of SmallPlanar. However, when these bag lines converge, the central bags end up containing pieces of each line, as in Figure 6c. In lexm, the line proceeds diagonally across the grid from the lower left corner to the upper right in a regular manner, as in Figure 6d. This results in smaller central bags and produces a path decomposition. See the main text for more details. Panels: (a) mindeg bag from upper left arm of Figure 2b; (b) lexm bag from lower right of Figure 4b; (c) mindeg central bag in Figure 2b; (d) lexm central bag in Figure 4b.

4.4 Summary of TD results on toy networks

Overall, the greedy heuristics, e.g., mindeg or metnnd, seem to produce a better representation of the large-scale structure of SmallPlanar and SmallER than the lexm and mcs heuristics in two ways. In SmallER, the greedy heuristics find decompositions with both smaller maximum as well as smaller median widths. (Since most real networks have a randomized aspect to their generation, this indicates that greedy heuristics may be more useful.) On SmallPlanar, the median width is smaller and the greedy heuristics do a better job of “capturing” all four corners of the grid. In other words, the resulting tree decomposition has four branches, each of which is tied to a specific corner of the network, while lexm and mcs TDs capture two of the corner structures. (Although the maximum width is smaller with lexm and mcs, the ability to capture what is an obvious visual feature of a simple network is of potential interest.)

In the rest of the paper, we will be considering significantly larger and more complicated networks than these toy examples. With these larger networks, the metnnd and amd heuristics, as implemented using INDDGO [27], are the most scalable, compared with the basic greedy algorithms (mindeg or minfill).
The amd heuristic is closely related to the mindeg heuristic (recall that amd picks the minimum-degree node based on an easy-to-compute approximation of node degree), and it gives similar results to mindeg. The most consistent difference between the two heuristics seems to be the number of central/overlapping bags produced. Thus, we will often show results only for the amd heuristic as a matter of visual convenience.

5 Tree decompositions of synthetic networks

In this section, we will describe the results of using a variety of TD heuristics on a set of synthetic networks. We focus our attention on two simple classes of random graphs: the popular Erdős-Rényi (ER) random graphs (in Section 5.1); and a power law (PL) extension of the basic ER model (in Section 5.2). (We emphasize, though, that similar qualitative results also hold for many other random graph models in their extremely sparse regimes.) This will allow us to begin to understand how TDs behave in random graph models with a very simple random structure. Importantly, we will focus on extremely sparse graphs. For the ER model, this means values of the connection probability p that lead to the graph not even being fully connected (in which case we will consider the giant component), while for the PL model this means values of the degree heterogeneity parameter γ that are typically used to describe many realistic networks and that lead to analogously sparse graphs.

ER graphs are often presented as "strawmen," since they obviously do not provide a realistic model for many aspects of real-world networks (e.g., the heavy-tailed degree distributions and the non-zero clustering coefficient present in many real networks).
Indeed, the "vanilla ER" graphs that are often considered (e.g., ER graphs with densities that are sufficiently large that the graph is fully connected) are not tree-like, either by the metric notion of δ-hyperbolicity or by the cut-based notion of TDs. Recent work has shown, however, that with respect to their large-scale structure, extremely sparse ER networks do capture several subtle but ubiquitous properties of interest in realistic networks: first, the small-scale versus large-scale isoperimetric structure of the NCP [1, 2]; second, a size-resolved version of δ-hyperbolicity that is consistent with large-scale metric tree-likeness [3]; and third, a non-trivial core-periphery structure with respect to k-core decompositions [3]. (In particular, in the sparsest regime of the ER networks that we consider, ER(1.6), a very shallow core-periphery structure appears, whereas none exists at the higher densities.) Importantly, for all three of these properties, similar results were seen with other random graph models, such as PL random graphs in the regime of the degree heterogeneity parameter that is commonly used.

Prior work has also provided evidence that these extremely sparse random graphs have non-trivial tree-like structure (at least relative to much denser ER graphs) when viewed with respect to the cut-like notion of tree-likeness [3]. Here, we provide a much more detailed analysis of this phenomenon for TD heuristics applied to ER and PL graphs. We will be particularly interested in similarities between extremely sparse ER graphs and PL graphs with respect to the core-periphery structure (e.g., from k-core and related decompositions) of a network. Among other things, we show that this core-periphery structure is captured with the amd TD. Of particular interest is how the core-periphery structure relates to central (low eccentricity) or perimeter (high eccentricity) bags in the TD.
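The relationship between k-core numbers and bag eccentricity can be made concrete with a small sketch. The Python below is illustrative only and is not the INDDGO implementation: the function names (`core_numbers`, `tree_eccentricities`, `avg_core_by_eccentricity`), the six-node toy graph, and its hand-built TD are all assumptions introduced here, and the quadratic-time peeling loop is chosen for clarity rather than scale.

```python
from collections import defaultdict, deque


def core_numbers(adj):
    """k-core number of each node, by repeatedly peeling a minimum-degree node."""
    degree = {v: len(nbrs) for v, nbrs in adj.items()}
    remaining = set(adj)
    core, k = {}, 0
    while remaining:
        v = min(remaining, key=degree.__getitem__)
        k = max(k, degree[v])  # the core level never decreases while peeling
        core[v] = k
        remaining.remove(v)
        for u in adj[v]:
            if u in remaining:
                degree[u] -= 1
    return core


def tree_eccentricities(tree):
    """Eccentricity of each bag in the TD tree (one BFS per bag)."""
    ecc = {}
    for source in tree:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in tree[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        ecc[source] = max(dist.values())
    return ecc


def avg_core_by_eccentricity(bags, tree, core):
    """Mean over bags of the per-bag mean k-core number, grouped by bag eccentricity."""
    ecc = tree_eccentricities(tree)
    groups = defaultdict(list)
    for name, nodes in bags.items():
        groups[ecc[name]].append(sum(core[v] for v in nodes) / len(nodes))
    return {e: sum(vals) / len(vals) for e, vals in sorted(groups.items())}


# Hypothetical toy input: a triangle (0, 1, 2) with one whisker per corner.
adj = {0: {1, 2, 3}, 1: {0, 2, 4}, 2: {0, 1, 5}, 3: {0}, 4: {1}, 5: {2}}
# Hand-built TD: one central bag and three single-edge leaf bags.
bags = {"A": {0, 1, 2}, "B": {0, 3}, "C": {1, 4}, "D": {2, 5}}
tree = {"A": ["B", "C", "D"], "B": ["A"], "C": ["A"], "D": ["A"]}

core = core_numbers(adj)
profile = avg_core_by_eccentricity(bags, tree, core)
```

On this toy input the central bag (eccentricity 1) has a higher average k-core number than the leaf bags (eccentricity 2), which is the kind of downward trend with eccentricity that a core-periphery structure would be expected to produce.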
5.1 TDs of ER Networks

Here, we give a summary of the empirical results of an analysis of TDs on ER random graphs, with an emphasis on the behavior as the connectivity parameter p is varied. In the very sparse to extremely sparse regime, ER networks undergo non-trivial global structural changes as p is varied [99, 100]. In particular, for our subsequent results, there are three regimes of p that are of interest: if $p < 1/n$, then the largest connected component is $O(\log n)$ in size, and the small components are likely trees; if $1/n < p < \log n / n$, then the graph has a giant component (i.e., a constant fraction of the size of the network is connected), and the remaining small components of size $O(\log n)$ are likely trees; and if $p > \log n / n$, then almost surely the network is fully connected, the degrees are very near their expected value, and there are no good-conductance clusters (of any size). We are interested in these last two regimes, and we consider synthetic graphs (ER(1.6) through ER(32), i.e., values of p between 1.6/5000 and 32/5000, for graphs with n = 5000 nodes) that go from extremely sparse to somewhat denser. Table 1 provides basic statistics for these graphs.

(a) ER networks
Network    $N_{amd}$  $E_{amd}$  $W$    $\tilde{W}$  $\tilde{D}$
ER(1.6)    3127       44         79     1            1.0
ER(1.8)    3457       38         157    1            1.0
ER(2)      3760       38         235    2            0.67
ER(4)      3777       35         1093   3            0.40
ER(8)      2787       29         2208   8            0.20
ER(16)     1856       28         3142   17           0.12
ER(32)     1136       22         3863   33           0.06

(b) PL networks
Network    $N_{amd}$  $E_{amd}$  $W$    $\tilde{W}$  $\tilde{D}$
PL(2.5)    4672       32         219    1            1.0
PL(2.75)   4500       39         148    1            1.0
PL(3.0)    3974       36         96     1            1.0

Table 4: Basic amd TD statistics for ER and PL networks. $N_{amd}$ gives the number of bags in the TD, $E_{amd}$ gives the maximum eccentricity (diameter) of the TD, $W$ and $\tilde{W}$ are the maximum and median width of the TD, and $\tilde{D}$ is the median bag density.

5.1.1 Visualization and basic statistics

We start with Figure 7 and Table 4a, which show the basic features of the TDs of ER networks.
Figure 7 presents a visualization of part of the output of a TD with the amd heuristic, colored by density of bag subgraph, for the sparsest (ER(1.6)) and densest (ER(32)) networks in our ER suite. Results are similar to those of metnnd. Observe that there is a much greater heterogeneity in the density of bags for ER(1.6) than for ER(32). For the former, there are many small bags which are cliques; while for the latter, there are fewer small bags, and the bags are much sparser in general. This suggests (and we have verified by inspection) that the sparser ER(1.6) has greater structural heterogeneity than the denser ER(32).

A more detailed understanding of this can be obtained from the summary statistics in Table 4a. Several observations are worth making. First, the number of bags in the TD tends to decrease as the density p increases (with the exception of the sparsest regime, where the giant component is smaller). This is because most of the network is placed into one bag, and only a few bags are needed to take care of the remaining nodes. Second, the TD itself, viewed as a graph, has smaller diameter as the density p increases. Third, the maximum and median widths increase with the average degree of the network. Indeed, the width increases quickly with the average degree, with the largest bag (at the "center" of the TD) containing 77% of the nodes in the network in ER(32). Finally, the median density of the bags decreases dramatically as the density of the original graph increases. This initially counterintuitive phenomenon is easily explained: for extremely sparse ER, the TDs have many small bags, which only need small numbers of edges to have a reasonably high edge density. With the dense graphs, many nodes have to be placed in each bag, and this requires quadratically more edges per bag to achieve a similar density.
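The quadratic growth mentioned above is easy to check numerically. The following is a minimal sketch, assuming that bag density means the fraction of node pairs inside a bag that are joined by an edge (so a clique, including a single edge, has density 1.0). The helper names `bag_density` and `edges_for_density`, the toy graph, and the zero-density convention for one-node bags are assumptions made for illustration.

```python
def bag_density(bag, adj):
    """Fraction of node pairs inside the bag that are joined by an edge."""
    nodes = set(bag)
    n = len(nodes)
    if n < 2:
        return 0.0  # assumed convention: a bag with no node pairs has density 0
    # Each internal edge is seen from both endpoints, hence the division by 2.
    internal = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return internal / (n * (n - 1) / 2)


def edges_for_density(cardinality, density):
    """Number of internal edges a bag of this cardinality needs to reach `density`."""
    return density * cardinality * (cardinality - 1) / 2


# Hypothetical toy graph: a path 0-1-2 plus a separate edge 3-4.
adj = {0: {1}, 1: {0, 2}, 2: {1}, 3: {4}, 4: {3}}

single_edge = bag_density({3, 4}, adj)   # one edge, one possible pair
path_bag = bag_density({0, 1, 2}, adj)   # two edges, three possible pairs
```

Using the bag sizes quoted for Figure 7, `edges_for_density(80, 0.5)` gives 1580 edges, whereas `edges_for_density(3864, 0.5)` gives roughly 3.7 million, which illustrates why large bags in the denser graphs have low density even when they contain many edges.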
We would next like to look in more detail at the structure of the TDs generated on these different ER networks (e.g., what changes as we move from the central, large bags of the TD to the smaller, peripheral bags of the TD) as well as the internal structure of each bag. Recall, first, that, in a very sparse ER graph with expected degree greater than 2 log 2, but still sufficiently sparse, there are three different parts of the random network (two parts which may be viewed as core-like, one part which may be viewed as periphery-like) [90]. The core-like part of these graphs is biconnected, and it has an expander-like inner core (i.e., a set of nodes of "higher" degree), surrounded by an outer core which has long chains of nodes (forming sparse cycles). The third, peripheral, part of the network consists of tree "whiskers" that hang off the biconnected core. A similar structure has been observed empirically when looking at low-conductance clusters/communities in a wide range of large social and information networks [1] and also when looking at the Gromov hyperbolicity and k-core properties of these real-world networks [3]. This contrasts sharply with the denser ER graphs, which are much more regular in terms of their degree variability, core structure, etc. Our results (here on extremely sparse ER graphs and below on PL graphs and many real-world graphs) demonstrate that TD heuristics can reflect this core-periphery structure.

Figure 7: Visualization of ER(1.6) and ER(32) amd tree decompositions, colored by the density of the bag subgraph. Panels: (a) ER(1.6), where the largest bag in this figure contains 80 nodes; (b) ER(32), where the largest bag in this figure contains 3864 nodes. For visualization purposes, the two networks are not drawn to the same scale. The bags in ER(32) have widths that are approximately 50 times larger than those of ER(1.6). The blowups show the upper-left corner of the visualization in greater detail, including the color of some of the smaller bags that are in the peripheral part of each TD. In ER(1.6), many of the very small bags are red (meaning they contain a clique, the vast majority of which are simply a single edge). Slightly larger bags are light blue or yellow (indicating an edge density of ca. 0.25 (light blue) to ca. 0.75 (yellow)). In ER(32), all of the bags of the TD are dark blue, indicating that these bags are all very sparse, regardless of whether they are peripheral or central to the TD. The statistics in Table 4a confirm this.

5.1.2 Internal bag structure

Next, Figure 8 presents visualizations of three typical amd bags for ER(1.6) and ER(32), respectively. In each case, the three bags are the most central (lowest eccentricity) bag in the TD (which we call the central bag), a typical bag that is a leaf in the TD (a periphery bag), and a typical bag that is in between these two in the TD (an intermediate bag). The color-coding is by k-core number, with high core nodes being red and low core nodes being blue. Note that the central bag for ER(1.6) is disconnected and consists of almost all singletons, while the central bag for ER(32) is well-connected; and that the intermediate and peripheral bags for ER(1.6) are small, the latter consisting of only a single edge, while for ER(32) both the intermediate and the peripheral bag have non-trivial internal structure.

Figure 8: Bag subgraphs of an amd TD of ER(1.6) and ER(32) graphs, colored by the k-core number of the node (red is high k, blue is low k). Panels: (a) ER(1.6) central bag subgraph; (b) ER(1.6) intermediate bag subgraph; (c) ER(1.6) peripheral bag subgraph; (d) ER(32) central bag subgraph; (e) ER(32) intermediate bag subgraph; (f) ER(32) peripheral bag subgraph. The central bag is the largest bag in the TD and one of the bags of minimum eccentricity; the peripheral bag is a leaf in the TD graph, and it achieves the minimum width in the TD; the intermediate bag is in between these two extremes.

The increased density of ER(32) over ER(1.6) is the obvious cause of these differences, but it is worth considering what structures, produced by the increased density, affect the formation of the amd TD. Recall from the toy graphs that heuristic TDs of cycles produced bags which had disconnected nodes. There were several different ways of producing the decomposition, but any TD of small width on a cycle includes disconnected nodes in most bags. The more complex SmallPlanar has many small overlapping cycles. In that case, the heuristics have to put many nonadjacent nodes into a bag. Essentially, cycles force distant nodes into the same bag, and many overlapping cycles will force many distant nodes into the same bag.

This intuition suggests (and we have confirmed by inspection) that a bag with many disconnected nodes, as in the central bag of ER(1.6) shown in Figure 8a, is due to a large number of overlapping cyclical structures. The intermediate bags of ER(1.6) contain nodes from the long, overlapping cycles of the outer core (and as these cycles do not overlap as much in the periphery, these bags have fewer nodes), while the peripheral bags each contain a single edge, capturing the small trees on the periphery of the network (see also Figure 7). The coloring of the nodes indicates the core-periphery structure of the subgraph induced by the bags. In ER(1.6) there is only a 1-core (blue) and a 2-core (red); thus the red nodes in the central bags are all in the 2-core, while the peripheral trees are in the 1-core, which agrees with [90].
On the other hand, in ER(32), whose core-periphery structure spans from a 7-core (blue) to a 23-core (red), although almost all of the nodes (94%) are in the 23-core, the central bag contains a relatively tightly-connected mass of 77% of the nodes in the network. This begins to look more like SmallER, which is a very dense ER network. The intermediate bags contain sparser structures (with some of the disconnected nodes and edges that are indicative of cyclical structures); and, although the peripheral bags still contain the smallest structures, in ER(32) they no longer contain only a single edge. This indicates that even the sparsest regions contain cycles and other complicated structures (but very few triangles, which agrees with the small clustering coefficient of these networks).

5.1.3 Large-scale organization

To provide a more quantitative evaluation of these ideas and to characterize better the large-scale organization of these synthetic networks, consider Figures 9, 10, and 11. These figures plot bag cardinality histograms, average bag density versus bag cardinality (this is width + 1), and average k-core versus bag eccentricity for two ER networks (as well as a suite of PL and real-world networks). We will refer to other subfigures below, but for now consider only Figures 9a and 9e, Figures 10a and 10e, and Figures 11a and 11e for results on ER(1.6) and ER(32), respectively.

[Figure 9 panels: fraction of bags and cumulative fraction of bags (vertical axis) versus bag cardinality (horizontal axis) for (a) ER(1.6), (b) PL(2.5), (c) as20000102, (d) CA-GrQc, (e) ER(32), (f) PL(3.0), (g) FB-Lehigh, (h) PowerGrid.]

Figure 9: Bag cardinality histograms with cumulative fraction of bags for a representative set of networks. For all of the networks, there are many more small cardinality bags than large cardinality bags. This is consistent with a TD structure which has a few central bags which quickly taper and branch off into many small peripheral bags. As we will see in Figures 10 and 11, in networks with a strong core-periphery structure (e.g., not PowerGrid), these peripheral bags tend to have a low average k-core and high relative density. FB-Lehigh has the most large bags, due to the tendency of the FB networks to form long, path-like trunks in their TDs. PowerGrid has the smallest tapering effect; although the largest bags are still at the center of the decomposition, there is only a small change in size from the largest bags to the smallest, presumably since this network has the weakest core-periphery structure.
We saw in Figure 8a the central bag for ER(1.6), and we interpreted it in terms of the output of the amd TD as due to overlapping cycles; the histograms in Figure 9a show that for ER(1.6) (and the networks in the rest of Figure 9) there are only a very few such central bags. On the other hand, Figure 9 also shows that there are many very small bags in ER(1.6). Two features we noticed about these two types of bags are that there is a change in edge density between the large and small bags and that there is a change in the k-cores represented between the large and small bags. Figure 10, where the average edge density of a bag is plotted against the bag cardinality, and Figure 11, where the average k-core is plotted against the bag eccentricity, show two ways of measuring this. These figures show that small peripheral bags are dense (relative to their small size; in the extreme case, this could be a single edge), and they contain low k-core nodes, as indicated, e.g., by the downward slope of the plots in Figure 11a.

Many of the results for ER(32) are very different from those for ER(1.6). The histograms in Figure 9e show that this network has a much larger proportion of high-width bags than ER(1.6). The largely homogeneous core-periphery structure of dense ER networks should also be clear since the nodes, regardless of bag size, are mostly in the deepest core (k = 23). These trends can be seen by comparing the density of the smallest bags in ER(1.6) and ER(32) in Figures 10a and 10e. The flat plot of the average k-core in Figure 11e, which holds steady close to the value of the maximum k-core, indicates the lack of a core-periphery structure in the network.

Putting all of these results together, we can conclude that when it exists (e.g., in extremely sparse ER graphs), the core-periphery structure of ER networks is captured by the amd TD; and when the core-periphery structure does not exist (e.g., for ER graphs at other, even moderately sparse, values of p), the large width of the TD indicates that most of the network is in the largest bag, which is analogous to most of the nodes being in the core of the network.

[Figure 10 panels: average density (vertical axis) versus bag cardinality (horizontal axis) for (a) ER(1.6), (b) PL(2.5), (c) as20000102, (d) CA-GrQc, (e) ER(32), (f) PL(3.0), (g) FB-Lehigh, (h) PowerGrid.]

Figure 10: Average bag density versus bag cardinality plots for a representative set of networks. In the PL networks the small bags are dense (in the extreme case consisting of a single edge), like the ER(1.6) network; but the largest bags are larger and mostly connected, similar to the intermediate and central bags of ER(32). The real-world networks all show denser bags even at large size scales as compared to the synthetic networks, and this is due to the increased clustering present in these networks.

5.2 TDs of PL Networks

Here, we give a summary of results of an analysis of TDs on PL random graphs, with an emphasis on the behavior as the degree heterogeneity parameter γ is varied. Recall that Table 1 provides basic statistics for the PL graphs. PL graphs are a class of ER-like random graphs, except that degree heterogeneity is exogenously specified.
Previous work has shown that PL graphs have important similarities with extremely sparse ER graphs, when one is interested in small-scale versus large-scale tree-like structure [1, 3, 2]. In particular, the increased degree heterogeneity produces a large-scale core-periphery structure in the PL networks, similar to the extremely sparse ER networks, but these PL networks also have some of the characteristics of denser ER networks (e.g., the core is more strongly connected and the diameter of the network is smaller).

We start with Table 4b, which shows the basic features of the TDs of PL networks. PL(3.0) has the least amount of degree heterogeneity and has similar characteristics to ER(1.6), while the networks with lower degree exponents (PL(2.75), PL(2.5)) have characteristics similar to both the dense and sparse ER networks. Most notably, the maximum width increases (as it would if the density increased), while the median width and median bag density stay the same (low and high, respectively), as in ER(1.6). In the previous section, we saw that the low median width and high density were related to the presence of a core-periphery structure in the network. As we will see, this is also true of the PL networks, and the amd TDs are again able to capture this structure.

[Figure 11 panels: average k-core (vertical axis) versus bag eccentricity (horizontal axis) for (a) ER(1.6), (b) PL(2.5), (c) as20000102, (d) CA-GrQc, (e) ER(32), (f) PL(3.0), (g) FB-Lehigh, (h) PowerGrid.]

Figure 11: Average k-core versus bag eccentricity for a representative set of networks. The correlation between the core-periphery structure and the central-perimeter bags can be seen in a downward slope in these plots. Networks with no prominent core-periphery structure (ER(32) and PowerGrid, for two very different reasons) have a flat plot here; while networks with moderate core-periphery structure (the PL graphs and ER(1.6)) have a downward sloping line, but relatively shallow (i.e., not deep) cores. as20000102 and CA-GrQc both have prominent, deep core-periphery structures that reveal themselves in this plot. The dips that show up at small eccentricities in several of the synthetic networks and CA-GrQc are due to the many small "whiskers" (in the sense of [1]) that hang off of the core bag. FB-Lehigh also has a deep core-periphery structure (in the sense of k-core decompositions); but because of the long path-like nature of the TD and since most of the nodes are in the deepest cores, the plot is flat with larger downward dips as the bag eccentricity increases.

Among other things, we find that for the PL networks, for a given average degree, the presence of very high degree nodes that tend to link to each other means that the density of the high-width bags, i.e., the core of the network, is greater, making it more like the cores of the denser ER networks. On the other hand, the peripheries of these PL networks are still very sparse, and TD bags including them look more like the peripheries of the sparse ER networks.
The periphery results are reflected in the results presented in Table 4b, where we see that the median width is low and the median density is high (many bags with only a single edge within). The visualizations of the central, intermediate, and peripheral bags from TDs of PL networks in Figure 12 reflect this. In particular, the central bag for PL(2.5) looks somewhat like the intermediate ER(32) bags, while the central bag for PL(3.0) is much less well-connected; and the peripheral bags for both PL graphs look like the ER(1.6) bags. The TD reflects the core-periphery structure via the central-peripheral bags, as reflected in the downward slope of Figure 11b.

Figure 12: PL(2.5) and PL(3.0) bag subgraphs, colored by k-core number of the node. Panels: (a) PL(2.5) central bag subgraph; (b) PL(2.5) intermediate bag subgraph; (c) PL(2.5) peripheral bag subgraph; (d) PL(3.0) central bag subgraph; (e) PL(3.0) intermediate bag subgraph; (f) PL(3.0) peripheral bag subgraph.

As the power law exponent γ is increased, recall that the amount of degree heterogeneity in the resulting network is reduced, i.e., the number of high degree nodes specified by the power law degree distribution is decreased. As a necessary consequence of maintaining this distribution as the nodes are connected, the high degree nodes are likely to be connected to other high degree nodes. This causes a core-periphery structure to emerge (see [3] for empirical measurements of the relationship between γ and the k-core structure). For example, the core-periphery structure of PL(3.0) is shallower than that of PL(2.5), as seen in Figure 11. Similarly, the width of the PL(2.5) amd TD is larger than that of the PL(3.0) amd TD. In all cases these widths are less than that of the corresponding ER(2) network, whereas one might expect these networks to have larger widths because of the increased core-periphery structure.
This occurs because there are several factors to consider as γ is decreased. The core does become denser as more edges are added to the core, causing these nodes to become more difficult to separate; but most of those extra edges come from the outer regions of the expander-like core, thus shrinking the size of the core and increasing the size of the periphery. In other words, when only a few medium degree nodes are added, then there are still cyclical structures in the core, as is observed in ER(1.6), except smaller; but as higher degree nodes are added, the core becomes denser and begins to become larger, as this forces larger and larger pieces of the core to be placed in the same bag, as is observed in ER(32).

The TD results on ER and PL graphs demonstrate that TDs (in particular, with the amd heuristic) can capture the core-periphery structure of two common random network models. In both cases, to the extent that there was a core-periphery structure (which itself depended on the sparsity parameter p or the degree heterogeneity parameter γ), the central and peripheral bags in the TD from amd were correlated with this structure. The peripheral bags were smaller and much sparser than the central bags and contained nodes from the shallow (low) k-cores of the network. With the exception of the tree-like periphery of the extremely sparse ER and the PL networks, the structures observed in TDs of the random network models were largely driven by loosely connected core structures (e.g., overlapping loops in sparser regions and expander-like cores in the denser regions). This is consistent both with the results on the toy networks and with previous work involving the structure of ER networks [99, 90].
(The one exception to this is the denser ER(32), where the central bag contained 77% of the network; in this case, the network does not exhibit the core-periphery structure of the other networks looked at in this section.) We will see that we obtain similar results when applying the amd heuristic to real-world networks.

6 Tree decompositions of real-world networks

In this section, we will describe the results of using a variety of TD heuristics on a set of real-world networks. Our goal is to use the insights from the previous sections to evaluate the performance of existing TD heuristics on real social and information networks and to understand how those TDs can be used to obtain an improved understanding of the properties of these realistic networks. Our main results in this section are three-fold. First, in Section 6.1, we summarize results of a detailed empirical evaluation of the amd TD heuristic applied to our suite of realistic networks (see footnote 13). The main focus is to illustrate how these TDs capture previously-identified core-periphery structure, and also to illustrate how the internal structure of TD bags can be understood in terms of large-scale cycles and small-scale clustering in the original graph. Second, in Section 6.2, we evaluate the ability of amd to identify small-scale good-conductance communities such as those previously identified by the NCP with local spectral methods [1, 2]. We show connections between bags that are more peripheral in the TD and small good-conductance communities responsible for "dips" in the NCP. Third, in Section 6.3, we illustrate that TD heuristics can be used to identify certain other types of large-scale non-conductance-based "ground truth" communities. In particular, we will show connections between bags that are more central in the TD and large-scale community-like (by a "ground truth" metric but not by conductance quality) clusters.
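Since the second and third results are phrased in terms of conductance, a short reminder of how conductance is computed may help. The sketch below assumes the standard definition, the number of edges leaving a set S divided by the smaller of vol(S) and vol(V \ S), where vol is the sum of node degrees; the function name `conductance` and the seven-node toy graph are hypothetical, and this is not the local spectral procedure used to build the NCP.

```python
def conductance(adj, subset):
    """Cut edges leaving `subset`, divided by the smaller of the two side volumes."""
    inside = set(subset)
    cut = sum(1 for u in inside for v in adj[u] if v not in inside)
    vol_in = sum(len(adj[u]) for u in inside)
    vol_out = sum(len(adj[u]) for u in adj if u not in inside)
    smaller = min(vol_in, vol_out)
    if smaller == 0:
        raise ValueError("conductance is undefined when one side has zero volume")
    return cut / smaller


# Hypothetical toy graph: two triangles joined by the single edge 2-3,
# with a whisker node 6 hanging off node 5.
adj = {
    0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3},
    3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4, 6}, 6: {5},
}

triangle = conductance(adj, {0, 1, 2})  # one cut edge against a volume of 7
mixed = conductance(adj, {1, 2, 3})     # a set that straddles both triangles
```

Lower values indicate better-separated sets: here the triangle scores 1/7 while the straddling set scores 0.5, so the triangle is the more community-like of the two in the conductance sense.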
6.1 Results on identifying core-periphery structure

Here, we will describe the results of an empirical evaluation of the amd TD heuristic applied to our suite of real-world networks (those in Table 1). We will begin in Table 5 with a brief survey of all of our real networks, and we will then focus on four representative networks: as20000102, CA-GrQc, FB-Lehigh, and PowerGrid. The first three all exhibit some form of previously-recognized core-periphery structure [3], while PowerGrid is known to lack a strong core-periphery structure (basically since it is heavily tied to the underlying locally-Euclidean geometry of the Earth [3]). Note, though, that the Facebook networks are very core-heavy, in the sense that they have many nodes in deep cores, essentially because of their significantly higher average degree (see, e.g., [3] and Table 1). (Thus, informed by previous results on k-core decompositions and related tree-like techniques [3, 75, 1, 87], we expect to see evidence of the core-periphery structure in the TDs associated with as20000102, CA-GrQc, and, to a lesser extent, FB-Lehigh, but a lack of substantial core-periphery structure in the TD of PowerGrid.)

6.1.1 Overview of core-periphery results for all networks

In Table 5, we present the number of bags in the amd TD ($N_{amd}$), the maximum eccentricity (diameter) of the TD ($E_{amd}$), the maximum and median width of the TD ($W$ and $\tilde{W}$, respectively), and the median bag density ($\tilde{D}$).
These measurements provide us with an idea of how large the most connected part of the network is (maximum width); how numerous small bags are (median width), which is indicative of areas of the network that have small separators, in our case small peripheral regions of the network; and whether the small separators are more clique-like or consist of mostly disjoint nodes (median density), with disjoint nodes being indicative of cycles and clique-like structures being indicative of more meaningful communities. A large maximum width combined with a low median width is evidence for a deep core and a shallow periphery, and high median density is evidence for a periphery based on more community-like separators, rather than more disparate separators. These observations assume that high density bags are mostly small-width bags. This assumption is plausible, given that as the width w of a bag increases, the number of edges required to maintain a constant density increases like $w^2$; and in many cases we have confirmed this assumption indirectly or by direct observation. For example, see our discussion of bag density and bag width below, as well as Figure 10 for empirical evidence that high density bags are generally the smallest width bags.

Footnote 13: We should note that we ran these computations with many different TD heuristics. In most of this section, however, we only show results from (the most scalable) amd heuristic. This is simply for brevity. There were some differences from heuristic to heuristic, but we feel this one is representative of the type of behavior found.
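The five summary statistics just described can be computed directly from a TD once its bags and tree are available. The sketch below is one possible way to do so, not the procedure used to produce the reported tables: the names `td_summary`, `bag_density`, and `max_eccentricity`, the dictionary-based input format, and the five-node toy graph with its hand-built TD are assumptions for illustration. Width is taken to be bag cardinality minus one.

```python
from collections import deque
from statistics import median


def bag_density(nodes, adj):
    """Fraction of node pairs inside a bag that are joined by an edge."""
    n = len(nodes)
    if n < 2:
        return 0.0  # assumed convention for bags with no node pairs
    internal = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return internal / (n * (n - 1) / 2)


def max_eccentricity(tree):
    """Largest bag eccentricity in the TD tree, i.e., its diameter (BFS from every bag)."""
    best = 0
    for source in tree:
        dist = {source: 0}
        queue = deque([source])
        while queue:
            x = queue.popleft()
            for y in tree[x]:
                if y not in dist:
                    dist[y] = dist[x] + 1
                    queue.append(y)
        best = max(best, max(dist.values()))
    return best


def td_summary(bags, tree, adj):
    """Number of bags, maximum eccentricity, maximum/median width, median density."""
    widths = [len(nodes) - 1 for nodes in bags.values()]
    densities = [bag_density(nodes, adj) for nodes in bags.values()]
    return {
        "n_bags": len(bags),
        "max_eccentricity": max_eccentricity(tree),
        "max_width": max(widths),
        "median_width": median(widths),
        "median_density": median(densities),
    }


# Hypothetical toy input: the 4-cycle 0-1-2-3 with a whisker edge 0-4.
adj = {0: {1, 3, 4}, 1: {0, 2}, 2: {1, 3}, 3: {0, 2}, 4: {0}}
# Hand-built TD: two width-2 bags covering the cycle and one single-edge bag.
bags = {"X": {0, 1, 2}, "Y": {0, 2, 3}, "Z": {0, 4}}
tree = {"X": ["Y", "Z"], "Y": ["X"], "Z": ["X"]}

summary = td_summary(bags, tree, adj)
```

In this toy TD the two cycle bags each contain the nonadjacent pair (0, 2), so their density is 2/3, while the whisker bag is a single edge with density 1.0; this mirrors the distinction drawn above between separators made of disjoint nodes (cycles) and clique-like separators.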
Network        $N_{amd}$     $E_{amd}$   $W$     $\tilde{W}$   $\tilde{D}$
CA-GrQc        3014          39          222     2             1.0
CA-AstroPh     10708         78          3616    5             1.0
as20000102     6364          33          88      2             1.0
Gnutella       6475          33          1629    2             0.67
Email-Enron    26781         78          2237    3             1.0
FB-Caltech     395           30          357     18            0.53
FB-Haverford   516           56          891     37.5          0.38
FB-Lehigh      1919          151         2983    31            0.32
FB-Rice        1481          76          2553    31            0.37
FB-Stanford    4809          100         6674    16            0.38
PowerGrid      4666          59          21      2             0.67
Polblogs       899           49          294     6             0.57
road-TX        1.25 × 10^6   170         197     3             0.5
web-Stanford   2.10 × 10^5   500         1419    5             0.83

Table 5: Statistics for TDs of real networks. Notation is the same as in Table 4.

Networks based on an underlying Euclidean geometry (e.g., road-TX, PowerGrid) have low maximum widths and low median widths, which indicates that they do not have a strong core-periphery structure. While these networks have many small-width bags, which is indicative of tree-like sections of the network, the internal subgraphs have a low median density (e.g., as compared to certain ER networks). More social networks, such as Polblogs and the Facebook networks, all have higher average degrees and, consequently, higher maximum widths, with median widths that are much lower than the maximum widths. This is one indicator of a core-periphery structure. As the median widths are higher in these networks, as compared to the other real networks (although lower than the ER networks), the peripheral structure in these social graphs tends to be denser than in the other networks. Also, the median density, while low compared to the other real networks, is very high compared to the median density of the densest ER networks in Table 4a. Thus, although the periphery is more difficult to separate into small good-conductance community-like clusters than in some of the other real networks, e.g., CA-GrQc or CA-AstroPh, it is still formed from more community-like pieces than in similar ER (or PL) networks.
Observe also that the two web networks, Gnutella and web-Stanford, have high widths, like the Facebook networks, but lower median widths; and that they have higher median densities (especially when compared to ER(4), ER(8), and ER(16)). This indicates that these networks have a sparser, more tree-like periphery than other social networks, which is also consistent with previous results [1]. Importantly, and also consistent with previous results [1], the sparsest, most tree-like peripheries belong to the collaboration, email, and autonomous systems networks (CA-GrQc, CA-AstroPh, Email-Enron, as20000102). These networks all have low median widths, high median densities, and high maximum widths, indicating that they exhibit the cleanest core-periphery structure, also consistent with the upward-sloping NCPs [1, 2].

6.1.2 More details on core-periphery structure of four representative networks

We will now look at several representative networks in greater detail. Let us start by discussing Figures 9, 10, and 11 from Section 5. Figure 9 clearly shows that most of the bags in the TDs are small-width bags. In fact, proportionally, FB-Lehigh has the largest fraction of large bags, and yet 80% of the bags are below width 200 (in a graph with 5073 nodes, where the TD has a maximum-width bag of 2983). Since the bags and edges of TDs form separators in the network, this indicates that there are many relatively small separators. These are the largest in the Facebook networks, where the separators tend to have around 100 nodes, while in most other networks many of the separators have around 10 nodes. This is consistent with typical views of core-periphery structure, with a few more highly-connected nodes in the core and many less well-connected nodes in the periphery.
That is, in order to separate off most pieces of the periphery (where a “piece” is defined by the end of branches in the TD), only 10 or fewer nodes are needed for most of the social/information networks, while ca. 100 nodes are needed for the Facebook networks.

In Figure 10, we see the average edge density for bags of a given cardinality plotted against the bag cardinality, showing that small-width bags have high densities. An important distinguishing feature of the three representative real networks (that are not tied to an underlying Euclidean geometry) is that the curve has a heavier tail than in the synthetic networks. This indicates that separators, up to much larger size scales, are less disparate (e.g., are denser or clumpier) than in the synthetic networks. In PowerGrid, on the other hand, the underlying Euclidean geometry leads the density to fall off more quickly. It falls off similarly to the sparse ER network, except that the tail of the curve is shorter. In this case, only the smallest bags have tight separators.

In Figure 11, we consider the relationship between the core structure and low-eccentricity (central) bags, and we compare that with the relationship between the periphery structure and the high-eccentricity (perimeter) bags in the TD. Figures 11c and 11d show that for as20000102 and CA-GrQc there is a clear downward trend as the bag eccentricity is increased. This indicates that low-eccentricity bags contain more high k-core nodes on average and that the high-eccentricity bags contain more low k-core nodes on average. Figure 11g shows that FB-Lehigh, due to its greater density, has a mostly flat profile until the most extreme reaches of eccentricity are met, at which point some of the bags begin to contain nodes of a lower k-core.
Thus, the core-periphery structure is present in FB-Lehigh, but it is moderated by a very large core, which produces long path-like sets of nodes that in turn lead to large core bags and hence a much larger eccentricity. (This is typical of the results for most of the Facebook networks, which is consistent with their flat NCP [2].) Finally, PowerGrid, which is not expected to exhibit a correlation between k-core structure and bag eccentricity, has a flat profile.

To illustrate these findings, we present visualizations in Figures 13 and 14. Shown are a central or very deep core bag, a perimeter or very peripheral bag, and an intermediate bag, for each of our four networks. These figures show the community-like nature of typical bags for the three information networks, i.e., as20000102, CA-GrQc, and FB-Lehigh, as well as the more disparate separators of PowerGrid. The coloring of the visualizations in these figures is by k-core: the red nodes are in deep (high) k-cores while the blue nodes are in shallow (low) k-cores.

[Figure 13 panels: (a) as20000102 central bag; (b) as20000102 intermediate bag; (c) as20000102 perimeter bag; (d) CA-GrQc central bag; (e) CA-GrQc intermediate bag; (f) CA-GrQc perimeter bag.]
Figure 13: as20000102 and CA-GrQc amd bag subgraphs, colored by k-core number, with red indicating deep/high k-cores and blue indicating shallow/low k-cores.

[Figure 14 panels: (a) FB-Lehigh central bag; (b) FB-Lehigh intermediate bag; (c) FB-Lehigh perimeter bag; (d) PowerGrid central bag; (e) PowerGrid intermediate bag; (f) PowerGrid perimeter bag.]
Figure 14: FB-Lehigh and PowerGrid amd bag subgraphs, colored by k-core number, with red indicating deep/high k-cores and blue indicating shallow/low k-cores.

One final observation we would like to make is to address the “dips” in the average k-core
curves shown in Figure 11 (e.g., the dip in Figure 11d at a bag eccentricity of 21 or in Figure 11g throughout). These dips are due to what we will call “twigs,” where a twig is a small (low-width and short) branch off of a much larger (high-width and long) trunk-like structure of the TD. For example, in FB-Lehigh and in the other Facebook networks, the high average degree results not only in high widths, but in larger collections of bags of high width. These are arranged in a long path (a “trunk”) with many branches at either end. Along this main trunk, there are occasional twigs which contain peripheral nodes. Since the trunk is long, the average k-core at the point where the twig is attached is slightly lower, resulting in the dip in the curve. In Figure 15, we provide a visualization of the twigs responsible for three of these dips.

6.1.3 Summary of large-scale core-periphery and tree-like structure in real-world networks

These empirical observations suggest that many realistic social/information networks have a non-trivial core-periphery structure; and that in many cases this is caused by many small overlapping cluster-like or moderately clique-like structures. That is, there is local non-tree-like (combinatorial) structure that “fits together” into a global core-periphery structure that is tree-like (in a metric and/or cut sense) when viewed from large size scales. This is in sharp contrast with many models and intuitions. Most obviously, this is in contrast with the random networks (in particular, the not extremely sparse ER networks and to a lesser extent the PL networks, but also many other more popular random generative models), which have a locally tree-like, but globally loopy structure. Less obviously, this is also in sharp contrast with networks such as PowerGrid, PlanarGrid, and road-TX that are strongly tied to an underlying Euclidean geometry.
Said another way, many realistic social/information networks have a more tightly-connected core-like structure than is present in typical random networks, and they have peripheral and intermediate regions that are “clumpier” than these random networks. While these claims are perhaps intuitive, our empirical observations demonstrate that they can be meaningfully identified with TDs and interpreted as leading to large-scale cut-based tree-like structure. Interestingly, aside from the local clumpiness, the real-world social/information networks do have a core-periphery structure that is reminiscent of that which is also seen in extremely sparse ER graphs and PL graphs with greater degree heterogeneity. (This too is consistent with prior results suggesting that extreme sparsity coupled with randomness/noise is responsible for the dips in the NCP [1, 2].) It is also worth emphasizing that in most of the intermediate bags of the real social/information networks, there are still a small number of disconnected nodes. This indicates that there are still a small number of alternate paths, which are disparate from the clusters, to the rest of the nodes in the network. The most prominent exceptions to these general observations are networks that either do not have a strong core-periphery structure, e.g., PowerGrid, which is tied to a two-dimensional underlying Euclidean geometry, or networks that have a relatively low clustering coefficient, e.g., Gnutella09. In both of these cases (but for different reasons), the internal subgraphs of the intermediate and peripheral bags have a larger number of disconnected nodes than the other realistic networks.

[Figure 15 panels: (a) CA-GrQc twigs on central bag; (b) FB-Lehigh small twig on central trunk; (c) FB-Lehigh large branching twig on central trunk.]
Figure 15: Twigs on FB-Lehigh and CA-GrQc.
Bags are colored by density, with blue indicating low density and red indicating high density. A small twig and a larger, branching twig on the FB-Lehigh trunk are shown. These twigs, combined with the long, path-like trunk, cause dips in the k-core eccentricity plot. In CA-GrQc, the concentration of the twigs on one or two central bags causes only a single, large dip as compared to the multiple dips in FB-Lehigh. The synthetic networks in Figure 11 (PL(2.5), PL(3.0), and ER(1.6)) also have twigs similar to those of CA-GrQc.

6.2 Connections with good-conductance communities results

Here, we consider how the peripheral part of the tree-like core-periphery structure identified by TDs relates to low-conductance clusters/communities that were previously identified by the NCP method [1, 2]. To do so, observe that one way to determine whether a TD “captures” clustering/community structure is to see if those clusters/communities are well-localized in the TD. By “well-localized,” we mean here that the cluster/community is contained in a relatively small number of (contiguous) bags. We followed previous personalized PageRank (PPR) local spectral procedures [101] to generate a set of candidate clusters [1, 2]. Then, given a set of candidate clusters, we looked at how many bags in the TD contain at least one node from each cluster, i.e., we measured how well-localized the community is in the TD. As a crude threshold of whether a cluster/community is localized, we consider it to be localized if it is contained in fewer bags than there are nodes in the community (footnote 14). We apply this method using the amd heuristic.
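The localization threshold just described can be sketched as follows. This is a minimal illustration under stated assumptions: the function names `bags_touched` and `is_localized` and the dictionary-of-sets layout for `bags` are hypothetical, and the candidate community is simply given as a set of nodes rather than produced by a PPR procedure.

```python
def bags_touched(bags, community):
    """Number of bags containing at least one node of `community`."""
    community = set(community)
    return sum(1 for bag in bags.values() if community & set(bag))


def is_localized(bags, community):
    """Crude threshold from the text: fewer bags than community nodes."""
    return bags_touched(bags, community) < len(set(community))


# Toy input (hypothetical): a path a-b-c-d-e, whose natural TD has one
# bag per edge, and a 4-clique p, q, r, s hanging off a single node t.
path_bags = {0: {"a", "b"}, 1: {"b", "c"}, 2: {"c", "d"}, 3: {"d", "e"}}
clique_bags = {0: {"p", "q", "r", "s"}, 1: {"s", "t"}}

tree_like = is_localized(path_bags, {"a", "b", "c"})          # 3 bags, 3 nodes
clique_like = is_localized(clique_bags, {"p", "q", "r", "s"})  # 2 bags, 4 nodes
```

On this toy input the tree-like community sits exactly at the threshold (as many bags as nodes) and so is not counted as localized, while the clique-like community touches only two bags and is.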
Footnote 14: To understand this threshold, consider the following example: if a community of size n is a tree, e.g., whiskers in ER(1.6), then it will be contained in n bags in the (ideal) TD; if the community is a “clique whisker,” i.e., a clique connected to the rest of the network by only one edge, it will be contained in just one or two bags; and if the community contains deep core nodes which are connected to many nodes outside of the community, the community will be spread across many bags in the TD. Other measures of TD locality showed similar results.

Our results for several real-world and synthetic networks are presented in Figures 16–20. For each figure/subfigure, the horizontal axis represents community size in number of nodes (on log scale), and the vertical axis is either the conductance of the best community found using the PPR method (recall that a low conductance represents a better community) or the number of bags that contain members of the community (again, on log scale). In the bag plots, the red line represents the number of bags which contain a node from the community in the corresponding NCP plot, and the green dashed line represents the locality threshold. When a given community is localized by our definition, the red plot will be below the green threshold.

[Figure 16 panels: (a) ER(1.6) NCP plot; (b) ER(32) NCP plot; (c) ER(1.6) bag localization; (d) ER(32) bag localization. NCP plots show conductance against community size for the best community; bag localization plots show number of bags against community size for the best community and the threshold.]
Figure 16: ER(1.6) and ER(32) NCP plots and tree localization plots. The localization threshold is plotted in green.
[Figure 17 panels: (a) CA-GrQc NCP plot; (b) FB-Lehigh NCP plot; (c) CA-GrQc bag localization; (d) FB-Lehigh bag localization.]
Figure 17: CA-GrQc and FB-Lehigh NCP plots and tree localization plots. The localization threshold is plotted in green.

[Figure 18 panels: (a) as20000102 NCP plot; (b) Gnutella09 NCP plot; (c) as20000102 bag localization; (d) Gnutella09 bag localization.]
Figure 18: as20000102 and Gnutella09 NCP plots and tree localization plots. The localization threshold is plotted in green.

[Figure 19 panels: (a) Email-Enron NCP plot; (b) Polblogs NCP plot; (c) Email-Enron bag localization; (d) Polblogs bag localization.]
Figure 19: Email-Enron and Polblogs NCP plots and tree localization plots. The localization threshold is plotted in green.

In Figures 17–19, the NCP plots show conductance against community size for the best community, and the bag localization plots show number of bags against community size for the best community and the threshold.
[Figure 20 panels: (a) Planar NCP plot; (b) PowerGrid NCP plot; (c) Planar bag localization; (d) PowerGrid bag localization. NCP plots show conductance against community size for the best community; bag localization plots show number of bags against community size for the best community and the threshold.]
Figure 20: Planar and PowerGrid NCP plots and tree localization plots. The localization threshold is plotted in green.

As a reference, consider the extremely sparse and somewhat denser ER networks, which are shown in Figure 16. Since it is so sparse, ER(1.6) does have some very small good-conductance clusters. As shown in the figure, however, the small “communities” are contained in roughly the same number of bags as there are nodes in the community. This is expected, as these communities are largely peripheral tree-like whiskers in the network (Section 5). For larger communities, which include core nodes, the localization is slightly above the line defining our threshold. On the other hand, for the denser ER(32), there are no good-conductance clusters at any size, and bag localization is above the line defining the localization threshold, indicating that the localization is poor at all size scales.

For the small and intermediate-sized clusters in many of the real networks (including many of those from [1]), the smaller good-conductance clusters found using the PPR method are reasonably well-localized within the TD, while the larger poorer-conductance clusters are not. Consider, e.g., CA-GrQc in Figure 17. On the other hand, both large and small clusters found with the PPR method applied to the denser graphs from the Facebook100 set (i.e., those that do not have even small-cardinality good-conductance clusters [2]) are not well-localized in the TD.
Consider, e.g., FB-Lehigh in Figure 17. Figure 18 shows as20000102 and Gnutella09, whose NCP plots also do not yield small good-conductance clusters, and for which the outputs of the PPR method are not particularly well-localized in the TD. Figure 19 shows that Email-Enron does have some of its small good-conductance clusters well-localized, and it also shows that the output of the PPR algorithm applied to Polblogs leads to medium-to-large clusters with poor conductance values that are poorly localized in the TD.

Finally, although networks with an underlying Euclidean geometry are of less interest for social/information network applications, for completeness it is worth considering how these TD methods apply to them. Figure 20 presents results for Planar and PowerGrid. Both of these networks have downward-sloping NCP plots, which are different from those of the other social and information networks, reflecting the Euclidean geometry underlying these networks. In both cases, fairly uninteresting results are obtained, suggesting that the localization metric we propose is more interesting for realistic social graphs with non-trivial tree-like core-periphery structure.

Although our results demonstrate that good-conductance clusters/communities in several realistic social graphs are well-localized in TDs found with existing heuristics, it is not obvious how to address the reverse question of finding good-conductance communities from a TD. One could attempt to look at all or some large number of combinations of bags in the TD. Since one is usually interested in well-connected communities/clusters, the running intersection property of TDs could be used to restrict attention to connected subsets of a TD. There are, however, two obvious issues.
First, there does not exist an obvious analogue of the “sweep cut” used in the spectral partitioning method for finding the best community from a TD. Second, as a related practical matter, the presence of high-degree (or deep core) nodes in the intermediate and central bags of a TD causes bags to be poor-conductance communities. These nodes have many connections and increase the “surface area” of most cuts, even if there is only a small number of them in a cluster. We observed that, in the clusters we found using the PPR method, each cluster is typically well-represented by a set of small bags plus a couple of nodes in the larger bags. If we then attempt to form clusters by combining bags, we get all of the nodes in the larger bags, including deep core nodes. Additional methods of filtering nodes for the larger bags, such as ordering by node degree or k-core combined with a sweep cut, may improve these results.

6.3 Results on identifying ground-truth communities

Here, we consider other ways in which the output of TDs can be useful in identifying clusters/communities of interest to the domain analyst. In particular, we describe two examples from the demographic data associated with the Facebook100 dataset [93].

Consider, first, Figure 21a, where we show the amd TD of the FB-Haverford network, and where each bag is color-coded by the average graduation year of the constituent nodes. There is a large linear or trunk-like structure that dominates the large-scale structure of the TD. We observe that there is a strong overlap between the nodes that comprise successive bags in that trunk, and we note that this trunk-like structure is typical of most of the Facebook100 networks (but is not seen in most other social graphs we have considered).
Also, each end of the long trunk correlates strongly with graduation year, and there is a gradual change in the average graduation year of each bag as we move across the trunk. Thus, to the extent that one accepts graduation year as some sort of easily-quantifiable “ground truth” community, the large bags in the TD of this network seem to be capturing a legitimate ground-truth structure in the network. This fits well with prior results that report that in most of the Facebook networks graduation year is the best predictor of the existence of edges between two nodes [93].

Consider, next, Figure 21b. It is known that for a small number of the Facebook100 networks (e.g., FB-Caltech, FB-Rice, and FB-UCSC), residence hall rather than class year is the best edge predictor [93]. Thus, we considered the amd TD of (the students-only subset of) FB-Caltech. In this case, a single simple trunk-like structure is not dominant, but there are several relatively large peripheral branches, and many of the peripheral branches are dominated by a particular residence hall. In Figure 21b, the bags are colored by the fraction of students in residence hall 170 (chosen arbitrarily). These examples are of particular interest since good-conductance clusters do not exist in Facebook100 networks [2].

[Figure 21 panels: (a) amd TD of FB-Haverford, colored by graduation year (red = freshman, blue = alumni). The long, path-like trunk of this (and most other) Facebook networks is driven by the propensity of students to be friends with students of a similar graduation year. (b) amd TD of (the students-only subset of) FB-Caltech, colored by the fraction of students in residence hall 170 (blue = no nodes belong to residence, ..., red = all nodes belong to residence).]
Figure 21: amd TD of FB-Haverford and FB-Caltech. FB-Haverford is presented, rather than FB-Lehigh (which has similar large-scale TD structure), because its smaller eccentricity (56 rather than 151) makes it easier to visualize. For FB-Caltech, this is a graphical representation of data presented in Table 7 for the amd TD and residence hall 170.

By looking at bags where the concentration of a particular community's nodes is higher than the incidence of that community throughout the TD, we can form a very simple classification rule. In particular, given residence hall X, we collected all bags whose fraction of nodes which were listed as belonging to residence X was higher than the fraction of nodes belonging to that hall in the network (the incidence in the network is given in column F in Tables 6 and 7). We then kept the largest contiguous set of bags and used membership in this set as the classifier. Although this is an overly simple classifier, the goal in this section is simply to provide a baseline about how the residence communities are located in the TD.

We performed this procedure on the students-only restriction of FB-Caltech. Tables 6 and 7 provide a summary of the classification results using this method on the FB-Caltech network, with the (anonymized) listed residence hall for each student as the community. Table 6 shows the fraction of the “ground truth” community captured by the largest contiguous set of bags described above. This is analogous to the recall of classifying the community using this branch in the TD. Table 7 shows the fraction of the nodes in the union of all bags in this largest contiguous set which belong to the community. This is analogous to the precision of classifying the community using this branch.
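The classification rule and the two summary quantities can be sketched as below. This is a hedged illustration rather than the procedure's actual implementation: the function names and the data layout (`bags` as a dictionary of node sets, `tree` as a dictionary of neighboring bag ids, `members` as the set of nodes listed in the hall) are assumptions, and "largest" contiguous set is interpreted here as the connected set with the most bags.

```python
from collections import deque


def frequent_bags(bags, members, n_nodes):
    """Bags whose share of `members` exceeds the network-wide share F."""
    f = len(members) / n_nodes
    return {b for b, nodes in bags.items()
            if len(nodes & members) / len(nodes) > f}


def largest_contiguous(tree, selected):
    """Largest connected set of selected bags in the decomposition tree."""
    best, seen = set(), set()
    for start in selected:
        if start in seen:
            continue
        comp, queue = {start}, deque([start])
        while queue:
            cur = queue.popleft()
            for nxt in tree[cur]:
                if nxt in selected and nxt not in comp:
                    comp.add(nxt)
                    queue.append(nxt)
        seen |= comp
        if len(comp) > len(best):  # assumption: size measured in bags
            best = comp
    return best


def classify(bags, tree, members, n_nodes):
    """Return the chosen branch with its recall and precision analogues."""
    branch = largest_contiguous(tree, frequent_bags(bags, members, n_nodes))
    covered = set().union(*(bags[b] for b in branch)) if branch else set()
    hits = len(covered & members)
    recall = hits / len(members)
    precision = hits / len(covered) if covered else 0.0
    return branch, recall, precision


# Toy input (hypothetical): 8 nodes, a path-shaped TD with four bags.
bags = {0: {1, 2, 3}, 1: {2, 3, 4}, 2: {4, 5, 6}, 3: {6, 7, 8}}
tree = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
hall = {1, 2, 3, 7}  # network-wide fraction F = 0.5
branch, recall, precision = classify(bags, tree, hall, n_nodes=8)
```

In the toy input, bags 0 and 1 are the only frequent bags and are adjacent, so the branch covers nodes {1, 2, 3, 4}: three of the four hall members are captured, and three of the four covered nodes belong to the hall.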
Since FB-Caltech is very small, we can use a much larger variety of TD heuristics as classifiers than is possible for larger networks, and we present results for all of these TD classifiers. Although the communities do seem to be well-captured by the TDs, there are also many other nodes in the same bags as these communities (see Table 7). Although the only bags selected were bags where the residence hall in question was over-represented, combining these bags actually resulted in a lower concentration of residents than was present in the network for some residence halls (see Table 7, where the values are lower than F for a given residence hall). This occurs since the non-resident nodes in each of these bags are different, while the resident nodes are largely the same for each bag in the branch.

Hall   F      mindeg   minfill   lexm   mcs    amd    metnnd
None   .134   .270     .257      .270   .324   .284   .297
165    .066   .472     .528      .556   .528   .472   .861
166    .090   .736     .736      .925   .811   .642   .792
167    .134   .642     .566      .453   .585   .491   .566
168    .116   .746     .762      .952   .889   .825   .143
169    .136   .726     .712      .904   .877   .658   .740
170    .090   .725     .725      .739   .783   .855   .362
171    .136   .714     .673      .776   .857   .592   .429
172    .098   .630     .630      .534   .699   .548   .863

Table 6: Fraction of each FB-Caltech residence hall captured in the largest contiguous set of “frequent” bags. A frequent bag is a bag where the fraction of students who belong to the given residence hall is greater than the fraction of students who belong to that residence in the entire network. Column F gives the fraction of students who identified as being in the associated hall (i.e., the threshold for being a frequent bag for that residence hall). For comparison, this procedure was also performed for the nodes which did not have a residence hall listed.

Hall   F      mindeg   minfill   lexm   mcs    amd    metnnd
None   .134   .065     .064      .068   .071   .071   .100
165    .066   .055     .057      .052   .052   .059   .086
166    .090   .104     .110      .111   .111   .092   .129
167    .134   .102     .097      .092   .087   .088   .092
168    .116   .137     .140      .150   .147   .157   .043
169    .136   .150     .151      .147   .163   .138   .187
170    .090   .134     .139      .145   .140   .171   .103
171    .136   .105     .099      .100   .104   .084   .074
172    .098   .131     .129      .100   .137   .115   .199

Table 7: Fraction of the nodes contained in the largest contiguous set of frequent bags for a given residence hall which actually belong to the given residence hall. A frequent bag is a bag where the fraction of students who belong to the given residence hall is greater than the fraction of students who belong to that residence in the entire network. Column F gives the fraction of students who identified as being in the associated hall (i.e., the threshold for being a frequent bag for that residence hall). For comparison, this procedure was also performed for the nodes which did not have a residence hall listed.

In terms of heuristic performance, mindeg, minfill, and amd seem to have similar performance given a residence, although there seems to be a larger gap between amd and the other two heuristics. This is not surprising, as these are all greedy heuristics which work by reducing fill (or minimum degree, which is a proxy for fill) in each step. lexm and mcs also seem to behave similarly, and they have the best performance in terms of recall (Table 6). metnnd has a different profile from the other heuristics and seems to do the best in terms of precision (Table 7). These results are comparable to what can be obtained with other simple classification rules, and they suggest that TDs could be useful in these types of machine learning applications.

Overall, these results demonstrate that for these realistic social/information networks, several types of plausible “ground truth” communities are well-correlated with the large-scale structure identified by existing TD heuristics.
This is striking since these heuristics make local greedy decisions about how to form the TDs, and it suggests that improved results could be obtained in this application by considering TD heuristics designed for graphs with this type of structure.

7 More details on tree decomposition methods

In this section, we consider the question of whether TDs and their treewidths can be related to other parameters for tree-like structure, specifically the Gromov δ-hyperbolicity (footnote 15). It might appear that there is no relation between TDs and δ (since, e.g., treewidth and δ take on opposite extremal values on cliques and cycles), but there are in fact structural characterizations for when they align. We present here our new theoretical result relating TDs and δ-hyperbolicity. Although this result is a relatively straightforward extension of previous work [102], and although most of the rest of the paper can be understood without this result, we include it here for completeness: first, since motivating prior work in [3] demonstrates an empirical connection between the cut-based tree-like notion from TDs and the metric-based tree-like notion from δ-hyperbolicity; and second, since our results in Section 6 demonstrate the inadequacy of a naïve optimization of treewidth and the importance of large cycles for realistic social graphs.

7.1 Treewidth, Treelength, and Hyperbolicity

We start with the following definition, which provides another quality measure of a TD; this was first introduced by Dourisboure and Gavoille [68]. See also [103].

Definition 4. Let $\mathcal{T} = (\{X_i\}, T = (I, F))$ be a tree decomposition of a graph $G$. The length of $\mathcal{T}$ is defined to be $\max_{i \in I,\; x, y \in X_i} d_G(x, y)$, where $d_G(x, y)$ is the shortest path distance in $G$. Analogously to treewidth, the treelength of $G$, denoted $tl(G)$, is the minimum length achieved by any tree decomposition of $G$.
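As a concrete reading of Definition 4, the length of a given decomposition can be computed directly with breadth-first search. The sketch below makes several assumptions: the graph is unweighted and connected, it is stored as a dictionary of neighbor sets, and the names `bfs_distances` and `td_length` are hypothetical. It computes the length of one decomposition, not the treelength $tl(G)$, which would require minimizing over all decompositions.

```python
from collections import deque
from itertools import combinations


def bfs_distances(graph, source):
    """Shortest-path distances from `source` in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        cur = queue.popleft()
        for nxt in graph[cur]:
            if nxt not in dist:
                dist[nxt] = dist[cur] + 1
                queue.append(nxt)
    return dist


def td_length(graph, bags):
    """Largest distance in G between two nodes sharing a bag."""
    longest = 0
    cache = {}  # BFS results, computed lazily per source node
    for bag in bags.values():
        for x, y in combinations(bag, 2):
            if x not in cache:
                cache[x] = bfs_distances(graph, x)
            longest = max(longest, cache[x][y])  # assumes G is connected
    return longest


# Toy input (hypothetical): the 6-cycle 0-1-2-3-4-5-0 with the "fan"
# decomposition that places node 0 in every bag.
cycle = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
fan_bags = {0: {0, 1, 2}, 1: {0, 2, 3}, 2: {0, 3, 4}, 3: {0, 4, 5}}
length = td_length(cycle, fan_bags)
```

For this fan decomposition every bag has three nodes, yet nodes 0 and 3 share a bag and are at distance 3 in the cycle, so the length of the decomposition equals the diameter of the graph.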
It is straightforward to see that the treelength is at most the diameter of $G$. As with treewidth, finding a tree decomposition achieving minimum length (and in fact computing the treelength itself) is NP-hard [69]. Given this, one might ask whether treelength and treewidth can be simultaneously approximated. For general graphs, Dourisboure and Gavoille proved a negative result.

Theorem 2. [68] Any algorithm computing a tree decomposition approximating the treewidth (or the treelength) of an $n$-vertex graph by a factor $\alpha$ or less does not give an $\alpha$-approximation of the treelength (resp. the treewidth) unless $\alpha = \Omega(n^{1/5})$.

The specific examples used by [68] to prove their negative result are modifications of the 2-dimensional mesh (i.e., a lattice), which (due to long induced cycles) is not δ-hyperbolic for small values of δ. This suggests that the situation might be very different for “real-world” graphs, which have small diameter and which have non-trivial embedding properties into low-dimensional hyperbolic spaces. (This is an open area of research more generally.) Chepoi et al. [104] showed that if $tl(G) \le \lambda$, then $G$ is $\lambda$-hyperbolic, and that a δ-hyperbolic graph $G$ on $n$ vertices satisfies $tl(G) \le 17 + 12\delta + 8\delta \log_2 n$. Unfortunately, for many real networks of interest, this is not an improvement on the trivial bound of diameter, as their diameter alone will be less than $O(\log_2 n)$. We conjecture that under minimal additional conditions, a δ-hyperbolic graph with diameter $D$ has treelength at most a function of $\log_2 D$, a vast improvement on both known bounds.

We turn to the question of using additional structural properties to characterize the interplay between $\delta$, $tw(G)$, and $tl(G)$. The following theorem is our main result; this theorem follows from the work of Müller on atomic TDs [102], and its proof is in Section 7.2.
Footnote 15. As we mentioned in Section 2, this is not the main focus of our paper, but there has been recent theoretical and empirical interest in this and related questions; see, e.g., [66, 67, 68, 69, 70, 71, 72, 73, 74].

Theorem 3. [105] Say a subgraph $H$ of $G$ is geodesic if $d_H(u, v) = d_G(u, v)$ for all $u, v \in V(H)$. Let $\nu(G)$ be the length of a longest geodesic cycle in $G$. Then $\delta(G) \le tl(G) \le (tw(G) + 1) \cdot \nu(G)$. Further, this result is tight: there is a graph class $\mathcal{G}$ of unbounded treewidth and containing arbitrarily long geodesic cycles such that $\delta(G) = \Theta(tw(G) \cdot \nu(G))$ for every graph $G \in \mathcal{G}$.

In other words, if we can eliminate long distance-preserving cycles and obstructions to low treewidth (large grid minors), then $G$ will embed well in low-dimensional hyperbolic space.

7.2 Proof of Theorem 3

Before we can give the proof of Theorem 3, we need a few additional definitions. First, given a rooted tree $T$ and a node $s \in T$, define $T_s$ to be the subtree of $T$ with root $s$: $T_s := T[\{t \in T \mid s \text{ is an ancestor of } t\}]$. For a graph $G = (V, E)$ with tree decomposition $(\{X_i\}, T)$ where $T$ is rooted arbitrarily, for $s \in T$ define $G_s := G[\bigcup_{t \in T_s} X_t]$ to be the graph induced by those bags that are equal to or below $X_s$ in the decomposition. We will write $N(S)$ for the neighbors of a set $S$; more precisely, $N(S) = \{u \in V \mid (u, s) \in E \text{ for some } s \in S\} \setminus S$. Finally, for notational convenience, for $x \in V$ and $e \in E$, we will write $G - x$ for the graph $(V \setminus \{x\}, E)$ and $G - e$ for the graph $(V, E \setminus \{e\})$.

We now define a special type of tree decomposition (so-called atomic tree decompositions), and give a crucial property of all vertices that co-occur in one of its bags.

Definition 5. [atomic tree decomposition, as in [106]] Let $G$ be a graph on $n$ vertices. The fatness of a tree decomposition of $G$ is the $n$-tuple $(a_0, \ldots, a_n)$, where $a_h$ denotes the number of bags that have exactly $n - h$ vertices. A tree decomposition of lexicographically minimal fatness is called an atomic tree decomposition.

Proposition 1. [Lemma 3.9 in Müller [102]] Let $(\{X_i\}, T)$ be an atomic tree decomposition of a connected graph $G = (V, E)$. Then for any two distinct vertices $x, y$ that occur together in some bag $X_t$, either $(x, y) \in E$ or there exists a neighbor $s$ of $t$ in $T$ such that $\{x, y\} \subseteq V_s \cap V_t$.

We also need the following proposition, which follows from Lemmas 3.7 and 3.8 in Müller [102].

Proposition 2. Let $(\{X_i\}, T)$ be an atomic tree decomposition of a connected graph $G$, let $e = (s, t) \in E(T)$ be any edge, let $T_t$ be the connected component of $T - e$ rooted at $t$, and set $X = X_s \cap X_t$. Then there exists a connected component $C_t$ in $G_t \setminus X$ such that $N(C_t) = X$ and $X_t \subseteq C_t \cup X$.

Finally, we are ready to give a bound on treelength in terms of a graph's treewidth and its longest geodesic cycle. Our proof relies heavily on tools from [102].

Theorem 4. [105] For any graph $G = (V, E)$ it holds that $tl(G) \le \nu(G) \cdot (tw(G) + 1)$, where $\nu(G)$ is the length of the longest geodesic cycle in $G$.

Proof. We will prove a stronger statement, namely that any atomic tree decomposition of a two-connected graph has treelength at most $\nu(G) \cdot (tw(G) + 1)$. Let us first show how this proves the theorem for graphs that are not two-connected. Assume $G$ is not two-connected and $x \in V$ is a cut vertex ($G - x$ has at least two connected components). Let $H_1, \ldots, H_\ell$ be the connected components of $G - x$.
If we prove that the graphs $G[H_i \cup \{x\}]$, $1 \le i \le \ell$, have tree decompositions $\mathcal{T}_i$ with treelength bounded as in the statement of the theorem, then we can easily construct a tree decomposition for $G$ with the same property: we simply introduce a single new bag $V_x = \{x\}$ and connect it to an arbitrary bag containing $x$ in each of the individual tree decompositions $\mathcal{T}_i$ (since these graphs all contain the vertex $x$, such a bag must exist). Note that the treelength of this decomposition is simply $\max_{1 \le i \le \ell} tl(\mathcal{T}_i)$, since the bag $V_x$ we added contains only the vertex $x$ and thus cannot increase the treelength. Since we will show the statement for two-connected graphs in the following, we recursively decompose the graph $G$ over cut vertices until the remaining connected components are all two-connected and then construct a tree decomposition of $G$ as described above.

We may now assume $G$ is two-connected. Given an atomic tree decomposition $(\{X_i\}, T)$ of $G$, we show that for every two vertices $x, y$ that occur in a common bag $X := X_t$, $x$ and $y$ are connected by a path whose length depends only on $|X|$ and $\nu(G)$. To this end, let $\mathcal{C}_X$ be the collection of geodesic cycles in $G$ that have at least one vertex in $X$. We first show that if $G[\mathcal{C}_X]$ is connected and $X \subseteq V(\mathcal{C}_X)$, then every pair of vertices in $X$ is connected by a path of length at most $|X| \cdot \nu(G[\mathcal{C}_X])$. Consider $x, y \in X$. Start a breadth-first search (bfs) from $x$ that stops as soon as it reaches $y$. Let $L_1, L_2, \ldots, L_p$ be the layers of the bfs-tree, where $L_1 = \{x\}$ is the starting layer. We claim that for all $L_i$ with $L_i \cap X \neq \emptyset$, there is a $j$ such that $i < j \le i + \nu(G[\mathcal{C}_X])$ and $L_j \cap X \neq \emptyset$. Consider such an $L_i$, and denote by $X_l \subseteq X$ those vertices of $X$ that are contained in $\bigcup_{k=1}^{i} L_k$. Denote by $X_r = X \setminus X_l$ those vertices of $X$ that have not been visited until step $i$.
If there exists a geodesic cycle $C$ in $\mathcal{C}_X$ with vertices in both $X_l$ and $X_r$, we are done: the bfs will have seen all of $C$ in at most $\nu(G[\mathcal{C}_X])$ steps (and thus found $C \cap X_r$). Otherwise, since $\mathcal{C}_X$ is connected, there exist two geodesic cycles $C_l, C_r \in \mathcal{C}_X$ with $C_l \cap X_l \neq \emptyset$, $C_r \cap X_r \neq \emptyset$ and $C_r \cap C_l \neq \emptyset$. Since the bfs will visit all vertices of $C_r \cup C_l$ in at most $(|C_r| + |C_l|)/2 \le \nu(G[\mathcal{C}_X])$ steps, the claim follows. Therefore the number of layers satisfies $p \le \nu(G[\mathcal{C}_X]) \cdot |X_t|$, and thus the distance between $x$ and $y$ is bounded by $\nu(G[\mathcal{C}_X]) \cdot |X_t|$, as claimed.

Therefore, if we show that for every bag $X$, the set $\mathcal{C}_X$ of geodesic cycles touching $X$ induces a connected graph $G[V(\mathcal{C}_X)]$, we are done: then every vertex pair $x, y \in X$ is indeed connected by a path of length at most $\nu(G[V(\mathcal{C}_X)]) \cdot |X|$, which (by the definition of treewidth and the fact that $\mathcal{C}_X$ is a family of geodesic cycles) is bounded by $\nu(G) \cdot (tw(G) + 1)$.

We first prove that for any choice of $X := X_t$ and any pair of vertices $x, y \in X$, $x$ and $y$ lie on some cycle of $G$. By Proposition 1, the vertices $x, y$ are either connected by an edge (in which case we are done: $G$ is two-connected, so every edge lies on some cycle) or there exists some node $s \in N_T(t)$ such that $\{x, y\} \subseteq V_s \cap V_t$. In the latter case, we invoke Proposition 2: for $i \in \{s, t\}$ we can find connected components $H_i$ of $G_i \setminus X$ such that $N(H_i) = X$ and $V_i \subseteq H_i \cup X$. Therefore, there exist two $x$-$y$-paths, one inside $H_s$ and another in $H_t$; hence $x$ and $y$ lie on a cycle. Since the set of geodesic cycles forms a basis for the cycle space of a graph (see Theorem 3.1 of [107]), it follows that for every $t \in T$, $G[V(\mathcal{C}_{X_t})]$ is connected. The distance between any vertices in $X_t$ is thus bounded by $\nu(\mathcal{C}_{X_t}) \cdot |X_t|$, implying that $tl(G)$ is at most $\nu(G) \cdot (tw(G) + 1)$, as claimed.
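The notion of a geodesic cycle used throughout the proof above can be checked directly on small graphs. The following is a minimal sketch, not part of the paper, that tests whether a given cycle $C$ of an unweighted graph $G$ satisfies $d_C(u, v) = d_G(u, v)$ for all of its vertices. The names `grid_graph` and `is_geodesic_cycle` and the 2-by-3 grid example are illustrative assumptions, and the input list is assumed to be a genuine cycle of the graph, given in cyclic order.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` in an unweighted graph given as {vertex: [neighbors]}."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def grid_graph(rows, cols):
    """Adjacency dictionary of the rows-by-cols planar grid (illustrative helper)."""
    adj = {(r, c): [] for r in range(rows) for c in range(cols)}
    for r, c in adj:
        for dr, dc in ((0, 1), (1, 0)):
            nb = (r + dr, c + dc)
            if nb in adj:
                adj[(r, c)].append(nb)
                adj[nb].append((r, c))
    return adj


def is_geodesic_cycle(adj, cycle):
    """Check d_C(u, v) == d_G(u, v) for all u, v on `cycle`.

    `cycle` lists the vertices of a cycle of the graph in cyclic order; the
    distance along the cycle between positions i and j is min(|i-j|, k-|i-j|).
    """
    k = len(cycle)
    for i, u in enumerate(cycle):
        dist = bfs_distances(adj, u)
        for j, v in enumerate(cycle):
            gap = abs(i - j)
            if dist[v] != min(gap, k - gap):
                return False
    return True


ladder = grid_graph(2, 3)
unit_square = [(0, 0), (0, 1), (1, 1), (1, 0)]
outer_cycle = [(0, 0), (0, 1), (0, 2), (1, 2), (1, 1), (1, 0)]

square_is_geodesic = is_geodesic_cycle(ladder, unit_square)
outer_is_geodesic = is_geodesic_cycle(ladder, outer_cycle)
```

In this example the unit square is geodesic, while the outer 6-cycle is not: the grid edge between (0, 1) and (1, 1) is a shortcut of length 1 between two vertices that are at distance 3 along the cycle.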
Finally, we put all the pieces together and show why these bounds are tight.

Proof of Theorem 3. This follows directly from Theorem 4, the result of Chepoi et al. that hyperbolicity is at most the treelength [104], and the observation that for any non-negative integers $n$ and $k$, the $k$-subdivision of the $n \times n$ planar grid has treelength $n(k + 1)$, treewidth $n$, a longest geodesic cycle of length $4(k + 1)$, and hyperbolicity $(n - 1)(k + 1) - 1$.

8 Discussion and Conclusion

Clearly, there is a need to develop TD heuristics that are better-suited for the properties of realistic informatics graphs. This might involve making more sophisticated choices than greedily minimizing degree or fill, but it might also involve optimizing other parameters such as treelength (which has connections with $\delta$-hyperbolicity) or minimizing the width of bags that are not central (associated with the deep core). In addition, it would be interesting to use TDs to help to combine small local clusters found with other methods, e.g., local spectral methods, into larger overlapping clusters, in order to understand better what might be termed the “local to global” properties of realistic informatics graphs. Since these graphs are not well-described by simple low-dimensional structures or simple constant-degree expander-like structures, this coupling is particularly counterintuitive, but it is very important for applications such as the diffusion of information. Finally, given the connections between TDs and graphical models, it would be interesting to understand better the implications of our results for improved graphical modeling and/or for improved inference on realistic network data. We expect that this will be a particularly challenging but promising direction for future work on social (as well as non-social) graphs.

Acknowledgments. We would like to thank Felix Reidl for considerable help in simplifying the proof of Theorem 3.
We would also like to thank Mason Porter for helpful discussions and for providing several of the networks that we considered, as well as Dima Krioukov and his collaborators for providing us access to their code for generating networks based on their hyperbolic model. In addition, we would like to acknowledge financial support from the Air Force Office of Scientific Research, the Army Research Office, the Defense Advanced Research Projects Agency, the National Consortium for Data Science, and the National Science Foundation. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of any of the above funding agencies.

References

[1] J. Leskovec, K.J. Lang, A. Dasgupta, and M.W. Mahoney. Community structure in large networks: Natural cluster sizes and the absence of large well-defined clusters. Internet Mathematics, 6(1):29–123, 2009. Also available at:
[2] L. G. S. Jeub, P. Balachandran, M. A. Porter, P. J. Mucha, and M. W. Mahoney. Think locally, act locally: Detection of small, medium-sized, and large communities in large networks. Physical Review E, 91:012821, 2015.
[3] A. B. Adcock, B. D. Sullivan, and M. W. Mahoney. Tree-like structure in large social and information networks. In Proc. of the 2013 IEEE ICDM, pages 1–10, 2013.
[4] V. Batagelj and M. Zaversnik. Generalized cores. Technical report. Preprint: arXiv:cs.DS/0202039 (2002).
[5] V. Batagelj and M. Zaversnik. An O(m) algorithm for cores decomposition of networks. Technical report. Preprint: arXiv:cs.DS/0310049 (2003).
[6] V. Batagelj and M. Zaversnik. Fast algorithms for determining (generalized) core groups in social networks. Advances in Data Analysis and Classification, 5(2):129–145, 2011.
[7] N. Robertson and P. D. Seymour. Graph minors. II. Algorithmic aspects of tree-width. Journal of Algorithms, 7(3):309–322, 1986.
[8] S. Arnborg and A. Proskurowski. Linear time algorithms for NP-hard problems restricted to partial k-trees. Discrete Applied Mathematics, 23(1):11–24, 1989.
[9] M. W. Bern, E. L. Lawler, and A. L. Wong. Linear-time computation of optimal subgraphs of decomposable graphs. Journal of Algorithms, 8(2):216–235, 1987.
[10] A. M. C. A. Koster, S. P. M. van Hoesel, and A. W. J. Kolen. Solving partial constraint satisfaction problems with tree decomposition. Networks, pages 170–180, 2002.
[11] J. Lagergren. Efficient parallel algorithms for graphs of bounded tree-width. Journal of Algorithms, 20(1):20–44, 1996.
[12] I. V. Hicks, A. M. C. A. Koster, and E. Kolotoğlu. Branch and tree decomposition techniques for discrete optimization. TutORials in Operation Research: INFORMS–New Orleans, 2005.
[13] J. Zhao, R. L. Malmberg, and L. Cai. Rapid ab initio RNA folding including pseudoknots via graph tree decomposition. In Proceedings of the 6th International Workshop on Algorithms in Bioinformatics, pages 262–273, 2006.
[14] J. Zhao, D. Che, and L. Cai. Comparative pathway annotation with protein-DNA interaction and operon information via graph tree decomposition. In Pacific Symposium on Biocomputing, pages 496–507, 2007.
[15] C. Liu, Y. Song, B. Yan, Y. Xu, and L. Cai. Fast de novo peptide sequencing and spectral alignment via tree decomposition. In Pacific Symposium on Biocomputing, pages 255–266, 2006.
[16] S. L. Lauritzen and D. J. Spiegelhalter. Local computations with probabilities on graphical structures and their application to expert systems (with discussion). Journal of the Royal Statistical Society series B, 50:157–224, 1988.
[17] D. Karger and N. Srebro. Learning Markov networks: maximum bounded tree-width graphs. In Proceedings of the 12th ACM-SIAM Symposium on Discrete algorithms, pages 392–401, 2001.
[18] H. Chen. Quantified constraint satisfaction and bounded treewidth. In Proceedings of the 16th European Conference on Artificial Intelligence, pages 161–165, 2004.
[19] H. L. Bodlaender and R. H. Möhring. The pathwidth and treewidth of cographs. SIAM Journal on Discrete Mathematics, 6(2):181–188, 1993.
[20] C. Chekuri and J. Chuzhoy. Polynomial bounds for the grid-minor theorem. In Proceedings of the 46th Annual ACM Symposium on Theory of Computing, pages 60–69, 2014.
[21] P.D. Seymour and R. Thomas. Call routing and the ratcatcher. Combinatorica, 14(2):217–241, 1994.
[22] H. L. Bodlaender and A. M. C. A. Koster. Treewidth computations I. Upper bounds. Inf. Comput., 208(3):259–275, 2010.
[23] H. L. Bodlaender. A linear-time algorithm for finding tree-decompositions of small treewidth. SIAM Journal on Computing, 25(6):1305–1317, 1996.
[24] E. Amir. Approximation algorithms for treewidth. Algorithmica, 56(4):448–479, 2010.
[25] H. Röhrig. Tree decomposition: a feasibility study. Master's thesis, Universität des Saarlandes, Saarbrücken, Germany, 1998.
[26] K. Shoikhet and D. Geiger. A practical algorithm for finding optimal triangulations. In Proceedings of AAAI/IAAI, pages 185–190, 1997.
[27] C. Groër, B. D. Sullivan, and D. Weerapurage. INDDGO: Integrated network decomposition & dynamic programming for graph optimization. Technical Report ORNL/TM-2012/176, Oak Ridge National Laboratory, 2012.
[28] B. D. Sullivan et al. Integrated Network Decompositions and Dynamic programming for Graph Optimization (INDDGO), 2012, 2013. http://github.com/bdsullivan/inddgo.
[29] F. Gavril. The intersection graphs of subtrees in trees are exactly the chordal graphs. Journal of Combinatorial Theory, Series B, 16(1):47–56, 1974.
[30] D. J. Rose and R. E. Tarjan. Algorithmic aspects of vertex elimination. In Proceedings of the 7th Annual ACM Symposium on Theory of Computing, pages 245–254, 1975.
[31] A. Berry, J. R. S. Blair, and P. Heggernes. Maximum cardinality search for computing minimal triangulations. In Proceeding of the 28th International Workshop on Graph-Theoretic Concepts in Computer Science, pages 1–12, 2002.
[32] A. Berry, J. R. S. Blair, P. Heggernes, and B. W. Peyton. Maximum cardinality search for computing minimal triangulations of graphs. Algorithmica, 39(4):287–298, 2004.
[33] D. Rose, R. Tarjan, and G. Lueker. Algorithmic aspects of vertex elimination on graphs. SIAM Journal on Computing, 5:266–283, 1976.
[34] R. E. Tarjan and M. Yannakakis. Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 13:566–579, 1984.
[35] R. E. Tarjan and M. Yannakakis. Addendum: Simple linear-time algorithms to test chordality of graphs, test acyclicity of hypergraphs, and selectively reduce acyclic hypergraphs. SIAM Journal on Computing, 14(1):254–255, 1985.
[36] A. Becker and D. Geiger. A sufficiently fast algorithm for finding close to optimal clique trees. Artificial Intelligence, 125(1–2):3–17, 2001.
[37] H. L. Bodlaender, J. R. Gilbert, H. Hafsteinsson, and T. Kloks. Approximating treewidth, pathwidth, and minimum elimination tree height. Journal of Algorithms, 18:238–255, 1995.
[38] V. Bouchitté, D. Kratsch, H. Müller, and I. Todinca. On treewidth approximations. Discrete Appl. Math., 136(2-3):183–196, 2004.
[39] B. A. Reed. Finding approximate separators and computing tree width quickly. In Proceedings of the 24th Annual ACM Symposium on Theory of Computing, pages 221–228, 1992.
[40] J. A. George. Nested dissection of a regular finite element mesh. SIAM Journal of Numerical Analysis, 10:345–363, 1973.
[41] J. R. Gilbert and R. E. Tarjan. The analysis of a nested dissection algorithm. Numerische Mathematik, 50(4):377–404, 1986.
[42] G. Karypis and V. Kumar. A fast and high quality multilevel scheme for partitioning irregular graphs. SIAM Journal on Scientific Computing, 20:359–392, 1998.
[43] H. M. Markowitz. The elimination form of the inverse and its application to linear programming. Management Science, 3(3):255–269, 1957.
[44] P. R. Amestoy, T. A. Davis, and I. S. Duff. Algorithm 837: AMD, an approximate minimum degree ordering algorithm. ACM Transactions on Mathematical Software (TOMS), 30(3):381–388, 2004.
[45] P. R. Amestoy, T. A. Davis, and I. S. Duff. An approximate minimum degree ordering algorithm. SIAM Journal on Matrix Analysis and Applications, 17(4):886–905, 1996.
[46] D. Koller and N. Friedman. Probabilistic Graphical Models: Principles and Techniques. MIT Press, 2009.
[47] H. L. Bodlaender. A tourist guide through treewidth. Acta Cybernetica, 11:1–23, 1993.
[48] H. L. Bodlaender. Discovering treewidth. In Proceedings of the 31st international conference on Theory and Practice of Computer Science, pages 1–16, 2005.
[49] H. L. Bodlaender. Treewidth: Characterizations, applications, and computations. In Proceeding of the 32nd International Workshop on Graph-Theoretic Concepts in Computer Science, pages 1–14, 2006.
[50] H. L. Bodlaender and A. M. C. A. Koster. Combinatorial optimization on graphs of bounded treewidth. The Computer Journal, 51(3):255–269, 2007.
[51] J. R. S. Blair and B. Peyton. An introduction to chordal graphs and clique trees. In A. George, J. R. Gilbert, and J. W. H. Liu, editors, Graph Theory and Sparse Matrix Computation, The IMA Volumes in Mathematics and its Applications, Volume 56, pages 1–29. Springer-Verlag, 1993.
[52] E. Amir. Efficient approximation for triangulation of minimum treewidth. In Proceedings of the 17th Annual Conference on Uncertainty in Artificial Intelligence, pages 7–15, 2001.
[53] A. M. C. A. Koster, H. L. Bodlaender, and S. P. M. van Hoesel. Treewidth: Computational experiments. Electronic Notes in Discrete Mathematics, 8:54–57, 2001.
[54] A. Berry, P. Heggernes, and G. Simonet. The minimum degree heuristic and the minimal triangulation process. In H. L. Bodlaender, editor, Graph-Theoretic Concepts in Computer Science, Lecture Notes in Computer Science, pages 58–70. Springer, 2003.
[55] P. Heggernes. Minimal triangulations of graphs: A survey. Discrete Mathematics, 306(3):297–317, 2006.
[56] C. Wang, T. Liu, P. Cui, and K. Xu. A note on treewidth in random graphs. In Proceeding of the 5th International Conference on Combinatorial Optimization and Applications, pages 491–499, 2011.
[57] Y. Gao. Treewidth of Erdős-Rényi random graphs, random intersection graphs, and scale-free random graphs. Discrete Applied Mathematics, 160(4–5):566–578, 2012.
[58] A. B. Adcock, B. D. Sullivan, O. R. Hernandez, and M. W. Mahoney. Evaluating OpenMP tasking at scale for the computation of graph hyperbolicity. In Proc. of the 9th IWOMP, pages 71–83, 2013.
[59] A. B. Adcock. Characterizing, identifying, and using tree-like structure in social and information networks. PhD thesis, Stanford University, 2014.
[60] M. Gromov. Hyperbolic groups. In S. M. Gersten, editor, Essays in Group Theory, Math. Sci. Res. Inst. Publ., 8, pages 75–263. Springer, 1987.
[61] J. M. Alonso, T. Brady, D. Cooper, V. Ferlini, M. Lustig, M. Mihalik, H. Shapiro, and H. Short. Notes on word hyperbolic groups. In E. Ghys, A. Haefliger, and A. Verjovski, editors, Group Theory from a Geometrical Viewpoint, ICTP Trieste Italy, pages 3–63. World Scientific, 1991.
[62] E. A. Jonckheere, P. Lohsoonthorn, and F. Bonahon. Scaled Gromov hyperbolic graphs. Journal of Graph Theory, 57(2):157–180, 2008.
[63] E. A. Jonckheere, P. Lohsoonthorn, and F. Ariaei. Scaled Gromov four-point condition for network graph curvature computation. Internet Mathematics, 7(3):137–177, 2011.
[64] W. Chen, W. Fang, G. Hu, and M. W. Mahoney. On the hyperbolicity of small-world and tree-like random graphs. Internet Mathematics, 9(4):434–491, 2013. Also available at:
[65] K. Verbeek and S. Suri. Metric embedding, hyperbolic space, and social networks. In Proceedings of the 30th Annual Symposium on Computational Geometry, pages 501–510, 2014.
[66] G. Brinkmann, J. H. Koolen, and V. Moulton. On the hyperbolicity of chordal graphs. Annals of Combinatorics, 5(1):61–69, 2001.
[67] Y. Wu and C. Zhang. Hyperbolicity and chordality of a graph. The Electronic Journal of Combinatorics, 18(1):P43, 2011.
[68] Y. Dourisboure and C. Gavoille. Tree-decompositions with bags of small diameter. Discrete Mathematics, 307(16):2008–2029, 2007.
[69] D. Lokshtanov. On the complexity of computing treelength. In Proceedings of the 32nd international conference on Mathematical Foundations of Computer Science, pages 276–287, 2007.
[70] M. Grohe and D. Marx. On tree width, bramble size, and expansion. Journal of Combinatorial Theory Series B, 99(1):218–228, 2009.
[71] A. Kosowski, B. Li, N. Nisse, and K. Suchan. k-chordal graphs: From cops and robber to compact routing via treewidth. In Proceedings of the 39th international colloquium conference on Automata, Languages, and Programming, pages 610–622, 2012.
[72] F. F. Dragan. Tree-like structures in graphs: A metric point of view. In Proceeding of the 39th International Workshop on Graph-Theoretic Concepts in Computer Science, pages 1–4, 2013.
[73] M. Abu-Ata and F. F. Dragan. Metric tree-like structures in real-life networks: an empirical study. Networks, 67(1):49–68, 2016.
[74] M. M. Abu-Ata. Tree-Like Structure in Graphs and Embeddability to Trees. PhD thesis, Kent State University, 2014.
[75] Y. Shavitt and T. Tankel. Hyperbolic embedding of Internet graph for distance estimation and overlay construction. IEEE/ACM Transactions on Networking, 16(1):25–36, 2008.
[76] M. P. Rombach, M. A. Porter, J. H. Fowler, and P. J. Mucha. Core-periphery structure in networks. SIAM Journal on Applied Mathematics, 74(1):167–190, 2014.
[77] S. B. Seidman. Network structure and minimum degree. Social Networks, 5(3):269–287, 1983.
[78] J. Ignacio Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. Large scale networks fingerprinting and visualization using the k-core decomposition. In Annual Advances in Neural Information Processing Systems 18: Proceedings of the 2005 Conference, pages 41–50, 2006.
[79] J. Ignacio Alvarez-Hamelin, L. Dall'Asta, A. Barrat, and A. Vespignani. K-core decomposition of internet graphs: hierarchies, self-similarity and measurement biases. Networks and Heterogeneous Media, 3(2):371–393, 2008.
[80] J. Healy, J. Janssen, E. Milios, and W. Aiello. Characterization of graphs using degree cores. In WAW '08: Proceedings of the 6th Workshop on Algorithms and Models for the Web-Graph, pages 137–148, 2008.
[81] V. Batagelj and A. Mrvar. Pajek - analysis and visualization of large networks. In Proceedings of Graph Drawing, pages 477–478, 2001.
[82] J. Cheng, Y. Ke, S. Chu, and M. T. Ozsu. Efficient core decomposition in massive networks. In Proceedings of the 27th IEEE International Conference on Data Engineering, pages 51–62, 2011.
[83] P. Colomer-de Simon, A. Serrano, M. G. Beiro, J. Ignacio Alvarez-Hamelin, and M. Boguna. Deciphering the global organization of clustering in real complex networks. Scientific Reports, 3:2517, 2013.
[84] M. Kitsak, L. K. Gallos, S. Havlin, F. Liljeros, L. Muchnik, H. E. Stanley, and H. A. Makse. Identification of influential spreaders in complex networks. Nature Physics, 6(11):888–893, 2010.
[85] J. Ugander, L. Backstrom, C. Marlow, and J. Kleinberg. Structural diversity in social contagion. Proceedings of the National Academy of Sciences, 109(16):5962–5966, 2012.
[86] V. Ramasubramanian, D. Malkhi, F. Kuhn, M. Balakrishnan, A. Gupta, and A. Akella. On the treeness of internet latency and bandwidth. In Proceedings of the 2009 ACM SIGMETRICS International Conference on Measurement and modeling of computer systems, pages 61–72, 2009.
[87] F. de Montgolfier, M. Soto, and L. Viennot. Treewidth and hyperbolicity of the internet. In Proceedings of the 10th IEEE International Symposium on Network Computing and Applications (NCA), pages 25–32, 2011.
[88] T. Maehara, T. Akiba, Y. Iwata, and K. Kawarabayashi. Computing personalized PageRank quickly by exploiting graph structures. Proceedings of the VLDB Endowment, 7:1023–1034, 2014.
[89] B. Courcelle and M. Mosbah. Monadic second-order evaluations on tree-decomposable graphs. Theoretical Computer Science, 109(12):49–82, 1993.
[90] A. G. Percus, G. Istrate, B. Goncalves, R. Z. Sumi, and S. Boettcher. The peculiar phase structure of random graph bisection. Journal of Mathematical Physics, 49(12):125219, 2008.
[91] F.R.K. Chung and L. Lu. Complex Graphs and Networks, volume 107 of CBMS Regional Conference Series in Mathematics. American Mathematical Society, 2006.
[92] Supporting website. http://snap.stanford.edu/data/index.html.
[93] A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of Facebook networks. Physica A, 391:4165–4180, 2012.
[94] L.A. Adamic and N. Glance. The political blogosphere and the 2004 U.S. election: divided they blog. In LinkKDD '05: Proceedings of the 3rd International Workshop on Link Discovery, pages 36–43, 2005.
[95] D.J. Watts and S.H. Strogatz. Collective dynamics of small-world networks. Nature, 393:440–442, 1998.
[96] E. R. Gansner and S. C. North. An open graph visualization system and its applications to software engineering. Software: Practice and Experience, 30(11):1203–1233, 2000.
[97] T. A. Davis and Y. Hu. The University of Florida Sparse Matrix Collection. ACM Transactions on Mathematical Software (TOMS), 38(1):1:1–1:25, 2011.
[98] T. Malisiewicz. Open source code: Graphviz matlab magic. https://github.com/quantombone/graphviz_matlab_magic, May 2010.
[99] P. Erdős and A. Rényi. On the evolution of random graphs. Publ. Math. Inst. Hungar. Acad. Sci., 5:17–61, 1960.
[100] B. Bollobás. Random Graphs. Academic Press, London, 1985.
[101] R. Andersen, F.R.K. Chung, and K. Lang. Local graph partitioning using PageRank vectors. In FOCS '06: Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 475–486, 2006.
[102] R. Diestel and M. Müller. Connected tree-width. Technical report. Preprint: (2012).
[103] F. F. Dragan and I. Lomonosov. On compact and efficient routing in certain graph classes. Discrete Applied Mathematics, 155(11):1458–1470, 2007.
[104] V. Chepoi, F. Dragan, B. Estellon, M. Habib, and Y. Vaxès. Diameters, centers, and approximating trees of δ-hyperbolic geodesic spaces and graphs. In Proceedings of the 24th Annual Symposium on Computational Geometry, pages 59–68, 2008.
[105] F. Reidl and B. Sullivan. Personal communication, 2014.
[106] P. Bellenbaum and R. Diestel. Two short proofs concerning tree-decompositions. Combinatorics, Probability, and Computing, 11:541–547, 2002.
[107] A. Georgakopoulos and P. Sprussel. Geodesic topological cycles in locally finite graphs. The Electronic Journal of Combinatorics, 16(1):R144, 2009.
