Identifying networks with common organizational principles
ANATOL E. WEGNER*
University College London, Department of Statistical Science, Gower Street, London WC1E 6BT, UK
University of Oxford, Department of Statistics, 24-29 St. Giles', Oxford, OX1 3LB, UK
*Corresponding author: a.wegner@ucl.ac.uk

LUIS OSPINA-FORERO
University of Oxford, Department of Statistics, 24-29 St. Giles', Oxford, OX1 3LB, UK
luis.ospinaforero@linacre.ox.ac.uk

ROBERT E. GAUNT
The University of Manchester, School of Mathematics, Manchester M13 9PL, UK
University of Oxford, Department of Statistics, 24-29 St. Giles', Oxford, OX1 3LB, UK
robert.gaunt@manchester.ac.uk

CHARLOTTE M. DEANE
University of Oxford, Department of Statistics, 24-29 St. Giles', Oxford, OX1 3LB, UK
deane@stats.ox.ac.uk

AND GESINE REINERT
University of Oxford, Department of Statistics, 24-29 St. Giles', Oxford, OX1 3LB, UK
reinert@stats.ox.ac.uk

Many complex systems can be represented as networks, and the problem of network comparison is becoming increasingly relevant. There are many techniques for network comparison, from simply comparing network summary statistics to sophisticated but computationally costly alignment-based approaches. Yet it remains challenging to accurately cluster networks that are of a different size and density, but hypothesized to be structurally similar. In this paper, we address this problem by introducing a new network comparison methodology that is aimed at identifying common organizational principles in networks. The methodology is simple, intuitive and applicable in a wide variety of settings ranging from the functional classification of proteins to tracking the evolution of a world trade network.

Keywords: networks | network comparison | machine learning | earth mover's distance | network topology
1. Introduction

Many complex systems can be represented as networks, including friendships, the World Wide Web, global trade flows and protein-protein interactions [Newman, 2010]. The study of networks has been a very active area of research in recent years, and in particular, network comparison has become increasingly relevant, e.g. [Wilson and Zhu, 2008, Neyshabur et al., 2013, Ali et al., 2014, Yaveroglu et al., 2014]. Network comparison itself has many wide-ranging applications; for example, comparing protein-protein interaction networks could lead to increased understanding of underlying biological processes [Ali et al., 2014]. Network comparison can also be used to study the evolution of networks over time and for identifying sudden changes and shocks. Network comparison methods have attracted increasing attention in the field of machine learning, where they are mostly referred to as graph kernels, and have numerous applications in personalized medicine, e.g. [Borgwardt et al., 2007], computer vision and drug discovery, e.g. [Wale et al., 2008]. In the machine learning setting, the problem of interest is to obtain classifiers that can accurately predict the class membership of graphs. Methods for comparing networks range from comparison of summary statistics to sophisticated but computationally expensive alignment-based approaches [Neyshabur et al., 2013, Kuchaiev and Pržulj, 2011, Mamano and Hayes, 2017]. Real-world networks can be very large and are often inhomogeneous, which makes the problem of network comparison challenging, especially when networks differ significantly in terms of size and density. In this paper, we address this problem by introducing a new network comparison methodology that is aimed at comparing networks according to their common organizational principles.
The observation that the degree distribution of many real world networks is highly right skewed and in many cases approximately follows a power law has been very influential in the development of network science [Barabási and Albert, 1999]. Consequently, it has become widely accepted that the shape of the degree distribution (for example, binomial vs power law) is indicative of the generating mechanism underlying the network. In this paper, we formalize this idea by introducing a measure that captures the shape of distributions. The measure emerges from the requirement that a metric between forms of distributions should be invariant under rescalings and translations of the observables. Based on this measure, we then introduce a new network comparison methodology, which we call NetEmd. Although our methodology is applicable to almost any type of feature that can be associated to nodes or edges of a graph, we focus mainly on distributions of small connected subgraphs, also known as graphlets. Graphlets form the basis of many of the state of the art network comparison methods [Ali et al., 2014, Yaveroglu et al., 2014, Pržulj, 2007a], and hence using graphlet based features allows for a comparative assessment of the presented methodology. Moreover, certain graphlets, called network motifs [Milo et al., 2002], occur much more frequently in many real world networks than is expected on the basis of pure chance. Network motifs are considered to be basic building blocks of networks that contribute to the function of the network by performing modular tasks, and have therefore been conjectured to be favoured by natural selection. This is supported by the observation that network motifs are largely conserved within classes of networks [Milo et al., 2004, Wegner, 2014]. Our methodology provides an effective tool for comparing networks even when networks differ significantly in size and density, which is the case in most applications.
The methodology performs well on a wide variety of networks, ranging from chemical compounds having as few as 10 nodes to internet networks with tens of thousands of nodes. The method achieves state of the art performance even when it is based on rather restricted sets of inputs that can be computed efficiently, and hence scales favourably to networks with millions and even billions of nodes. The method also behaves well under network sub-sampling as described in Ali et al. [2016]. The methodology further meets the needs of researchers from a variety of fields, from the social sciences to the biological and life sciences, by being computationally efficient and simple to implement. We test the presented methodology in a large number of settings, starting with clustering synthetic and real world networks, where we find that the presented methodology outperforms state of the art graphlet-based network comparison methods in clustering networks of different sizes and densities. We then test the more fine grained properties of NetEmd using data sets that represent evolving networks at different points in time. Finally, we test whether NetEmd can predict functional categories of networks by exploring machine learning applications, and find that classifiers based on NetEmd outperform state-of-the-art graph classifiers on several benchmark data sets.

2. A measure for comparing shapes of distributions

Here we build on the idea that the information encapsulated in the shape of the degree distribution and other network properties reflects the topological organization of the network. From an abstract point of view we think of the shape of a distribution as a property that is invariant under linear deformations, i.e. translations and re-scalings of the axis. For example, a Gaussian distribution always has its characteristic bell curve shape regardless of its mean and standard deviation.
Consequently, we postulate that any metric that aims to capture the similarity of shapes should be invariant under linear transformations of its inputs. Based on these ideas we define the following measure between distributions p and q that are supported on ℝ and have non-zero, finite variances:

EMD*(p, q) = inf_{c ∈ ℝ} EMD(p̃(· + c), q̃(·)),   (2.1)

where EMD is the earth mover's distance and p̃ and q̃ are the distributions obtained by rescaling p and q to have variance 1. More precisely, p̃ is the distribution obtained from p by the transformation x → x/σ(p), where σ(p) is the standard deviation of p. Intuitively, EMD (also known as the first Wasserstein metric [Rubner et al., 1998]) can be thought of as the minimal work, i.e. mass times distance, needed to "transport" the mass of one distribution onto the other. For probability distributions p and q with support in ℝ and bounded absolute first moment, the EMD between p and q is given by

EMD(p, q) = ∫_{−∞}^{∞} |F(x) − G(x)| dx,

where F and G are the cumulative distribution functions of p and q respectively. In principle, EMD in Equation (2.1) can be replaced by almost any other probability metric d to obtain a corresponding metric d*. Here we choose EMD because it is well suited to comparing shapes, as shown by its many applications in the area of pattern recognition and image retrieval [Rubner et al., 1998]. Moreover, we found that EMD produces superior results to the classical L1 and Kolmogorov distances, especially for the highly irregular distributions that one frequently encounters in real world networks. For two networks G and G′ and a given network feature t, we define the corresponding NetEmd_t measure by:

NetEmd_t(G, G′) = EMD*(p_t(G), p_t(G′)),   (2.2)

where p_t(G) and p_t(G′) are the distributions of t on G and G′ respectively.
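To make Equations (2.1) and (2.2) concrete for empirical one-dimensional samples, EMD can be evaluated exactly as the area between the two empirical CDFs, and the infimum over translations c can be approximated by a grid search. The sketch below is our own illustration, not the authors' implementation: the function names and the coarse search over c are simplifications, and `std` uses the population convention.

```python
import numpy as np

def emd(x, y):
    """EMD between two 1-D empirical samples: the integral of |F(t) - G(t)| dt,
    computed exactly since both ECDFs are step functions."""
    grid = np.sort(np.concatenate([x, y]))
    F = np.searchsorted(np.sort(x), grid, side="right") / len(x)
    G = np.searchsorted(np.sort(y), grid, side="right") / len(y)
    # ECDFs are constant between consecutive points of the merged grid
    return np.sum(np.abs(F - G)[:-1] * np.diff(grid))

def emd_star(x, y, n_shifts=201):
    """Approximation of EMD* in Eq. (2.1): rescale both samples to unit
    variance, then minimise EMD over a grid of translations c centred on
    the difference of means (a coarse search, for illustration only)."""
    xs, ys = x / x.std(), y / y.std()
    center = ys.mean() - xs.mean()
    shifts = center + np.linspace(-3.0, 3.0, n_shifts)
    return min(emd(xs + c, ys) for c in shifts)
```

NetEmd_t of Equation (2.2) is then `emd_star` applied to the two feature distributions, e.g. the degree sequences of the two graphs.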
NetEmd_t can be shown to be a pseudometric between graphs for any feature t (see Sec. B); that is, it is non-negative, symmetric and satisfies the triangle inequality. Figure 1 gives examples where t is taken to be the degree, and p_t(G) is the degree distribution of G. Measures that are based on the comparison of multiple features can be expected to be more effective at identifying structural differences between networks than measures that are based on a single feature t, because for two networks to be considered similar they must show similarity across multiple features. Hence, for a given set T = {t_1, t_2, ..., t_m} of network features, we define the NetEmd measure corresponding to T simply as:

NetEmd_T(G, G′) = (1/m) Σ_{j=1}^{m} NetEmd_{t_j}(G, G′).   (2.3)

FIG. 1: Plots of rescaled and translated degree distributions for Barabási-Albert (BA) and Erdős-Rényi (ER) models with N nodes and average degree k: (a) BA N = 5,000, k = 100 vs BA N = 50,000, k = 10 (EMD* = 0.212); (b) ER N = 5,000, k = 100 vs ER N = 50,000, k = 10 (EMD* = 0.039); (c) BA N = 5,000, k = 100 vs ER N = 5,000, k = 100 (EMD* = 0.49). The EMD* distances between the degree distributions of two BA or ER models with quite different values of N and k are smaller than the EMD* distance between the degree distributions of a BA and an ER model when the number of nodes and average degree are equal.
Although NetEmd can in principle be based on any set T of network features to which one can associate distributions, we initially consider only features that are based on distributions of small connected subgraphs, also known as graphlets. Graphlets form the basis of many state of the art network comparison methods and hence allow for a comparative assessment of the proposed methodology. First, we consider graphlet degree distributions (GDDs) [Pržulj, 2007b] as our set of features. For a given graphlet m, the graphlet degree of a node is the number of graphlet-m induced subgraphs that are attached to the node. One can distinguish between the different positions the node can have in m, which correspond to the automorphism orbits of m; see Figure 2. For graphlets up to size 5 there are 73 such orbits. We initially take the set of 73 GDDs corresponding to graphlets up to size 5 to be the default set of inputs, for which we denote the metric as NetEmd_G5. Later we also explore alternative definitions of subgraph distributions based on ego networks, as well as the effect of varying the size of subgraphs considered in the input. Finally, we consider the eigenvalue spectra of the graph Laplacian and the normalized graph Laplacian as inputs.

FIG. 2: Graphlets on two to four nodes. The different shades in each graphlet represent different automorphism orbits, numbered from 0 to 14.

3. Results

In order to give a comparative assessment of NetEmd, we consider other graphlet based network comparison methods, namely GDDA [Pržulj, 2007b], GCD [Yaveroglu et al., 2014] and Netdis [Ali et al., 2014]. These represent the most effective alignment-free network comparison methodologies in the existing literature. While GDDA directly compares distributions of graphlets up to size 5 in a pairwise fashion, GCD is based on comparing rank correlations between graphlet degrees.
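The graphlet degrees these methods build on can be computed directly from the adjacency matrix for the simplest orbits. A minimal numpy sketch (our own illustration; the function name is hypothetical, and orbit numbering follows Figure 2: orbit 0 is the plain degree, orbit 3 is the triangle orbit):

```python
import numpy as np

def degree_and_triangle_degrees(A):
    """Two simple graphlet degrees from a symmetric 0/1 adjacency matrix A:
    the degree of each node (orbit 0) and its triangle degree (orbit 3),
    i.e. the number of triangles the node participates in."""
    A = np.asarray(A, dtype=float)
    degrees = A.sum(axis=1)
    # diag(A^3) counts closed walks of length 3 through each node;
    # each triangle containing a node contributes two such walks.
    triangles = np.diag(A @ A @ A) / 2
    return degrees, triangles
```

The distributions of these node-level counts over a graph are the inputs that NetEmd compares via EMD*; counting orbits of size-4 and size-5 graphlets requires dedicated enumeration algorithms rather than matrix powers.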
Here we consider both default settings of GCD [Yaveroglu et al., 2014], namely GCD11, which is based on a non-redundant subset of 11 graphlets up to size 4, and GCD73, which uses all graphlets up to size 5. Netdis differs from GDDA and GCD in that it is based on subgraph counts in ego-networks of nodes. Another important distinction is that Netdis first centers these raw counts by comparing them to the counts that could be expected under a particular null model, before computing the final statistics. In our analysis, we consider two null models: an Erdős-Rényi random graph and a duplication divergence graph [Vázquez et al., 2002], which has a scale-free degree distribution as well as a high clustering coefficient. We denote these two variants as Netdis_ER and Netdis_SF respectively.

3.1 Clustering synthetic and real world networks

We start with the classical setting of network comparison, where the task is to identify groups of structurally similar networks. The main challenge in this setting is to identify structurally similar networks even though they might differ substantially in terms of size and density. Given a set S = {G_1, G_2, ..., G_n} of networks consisting of disjoint classes C = {c_1, c_2, ..., c_m}, one would like a network comparison measure d to position networks from the same class closer to each other when compared to networks from other classes. Given a network G, this can be measured in terms of the empirical probability P(G) that d(G, G_1) < d(G, G_2), where G_1 is a randomly selected network from the same class as G (excluding itself), G_2 is a randomly selected network from outside the class of G, and d is the network comparison statistic. Consequently, the performance over the whole data set is measured in terms of the quantity

P = (1/|S|) Σ_{G ∈ S} P(G).
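Given a precomputed pairwise distance matrix and class labels, P can be computed directly by counting, for each network, how often a same-class network is closer than a different-class one. The sketch below is our own illustration (the function name is hypothetical; ties are counted as half, consistent with the AUC interpretation discussed next, and each class is assumed to contain at least two networks):

```python
import numpy as np

def performance_p(D, labels):
    """Empirical P: for each network i, the fraction of (same-class,
    different-class) pairs for which the same-class network is strictly
    closer under the distance matrix D, averaged over all networks."""
    labels = np.asarray(labels)
    n = len(labels)
    scores = []
    for i in range(n):
        same = [D[i, j] for j in range(n) if j != i and labels[j] == labels[i]]
        diff = [D[i, j] for j in range(n) if labels[j] != labels[i]]
        # ties count as half a win, as in the AUC interpretation
        wins = sum((s < d) + 0.5 * (s == d) for s in same for d in diff)
        scores.append(wins / (len(same) * len(diff)))
    return float(np.mean(scores))
```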
It can be shown that P is equivalent to the average area under the receiver operating characteristic curve of a classifier that, for a given network G, classifies the k nearest neighbours of G with respect to d as being similar to G. Hence, a measure that positions networks randomly has an expected P of 0.5, whereas P = 1 corresponds to perfect separation between classes. Other measures are discussed in the Appendix. Conclusions reached in this paper hold regardless of which performance measure one uses. We first test NetEmd on synthetic networks corresponding to realizations of eight random graph models, namely Erdős-Rényi random graphs [Erdős and Rényi, 1960], the Barabási-Albert preferential attachment model [Barabási and Albert, 1999], two duplication divergence models [Vázquez et al., 2002, Ispolatov et al., 2005], the geometric gene duplication model [Higham et al., 2008], 3D geometric random graphs [Penrose, 2003], the configuration model [Molloy and Reed, 1995], and Watts-Strogatz small world networks [Watts and Strogatz, 1998] (see Sec. F.1 in the Appendix for details). For synthetic networks we consider three experimental settings of increasing difficulty, starting with the task of clustering networks that have the same size N and average degree k according to generating mechanism, a task that is relevant in a model selection setting. For this we generate 16 data sets, which collectively we call RG1, corresponding to combinations of N ∈ {1250, 2500, 5000, 10000} and k ∈ {10, 20, 40, 80}, each containing 10 realizations per model, i.e. 80 networks. This is an easier problem than clustering networks of different sizes and densities, and in this setting we find that the P scores (see Table 3c) of the top performing measures tend to be within one standard deviation of each other. We find that NetEmd_G5 and GCD73 achieve the highest scores, followed by GCD11 and Netdis_SF.
Having established that NetEmd is able to differentiate networks according to generating mechanism, we move on to the task of clustering networks of different sizes and densities. For this we generate two data sets: RG2, in which the size N and average degree k are increased independently in linear steps to twice their initial value (N ∈ {2000, 3000, 4000} and k ∈ {20, 24, 28, 32, 36, 40}), and RG3, in which the size and average degree are increased independently in multiples of 2 to 8 times their initial value (N ∈ {1250, 2500, 5000, 10000} and k ∈ {10, 20, 40, 80}). In RG3, the number of nodes and average degrees of the networks both vary by one order of magnitude, and therefore clustering according to model type is challenging. Both RG2 and RG3 contain 10 realizations per model parameter, i.e. they contain 3 × 6 × 8 × 10 = 1440 and 4 × 4 × 8 × 10 = 1280 networks, respectively. Finally, we consider a data set consisting of networks from 10 different classes of real world networks (RWN), as well as a data set from [Ali et al., 2014] that consists of real world and synthetic networks from the larger collection compiled by Onnela et al. [Onnela et al., 2012]. We find that NetEmd_G5 outperforms all of the other three methods at clustering networks of different sizes and densities on all data sets. The difference can also be seen in the heatmaps of NetEmd_G5 and GCD73, the second best performing method for RG2, given in Figures 3a and 3b. While the heatmap of NetEmd_G5 shows eight clearly identifiable blocks on the diagonal corresponding to different generative models, the heatmap of GCD73 shows signs of off-diagonal mixing. The difference in performance becomes even more pronounced on more challenging data sets, i.e. on RG3 (see Fig. A.6 in the Appendix) and the Onnela et al. data set.

(a) Heatmap of NetEmd_G5 for RG2. (b) Heatmap of GCD73 for RG2.
Dataset        NetEmd_G5      Netdis_ER      Netdis_SF      GCD11          GCD73          GDDA
Synthetic networks:
RG1            0.997 ± 0.003  0.981 ± 0.013  0.986 ± 0.011  0.992 ± 0.012  0.996 ± 0.005  0.952 ± 0.056
RG2            0.988          0.897          0.919          0.976          0.976          0.956
RG3            0.925          0.790          0.800          0.872          0.861          0.812
Real world networks:
RWN            0.942          0.898          0.866          0.898          0.906          0.745
Onnela et al.  0.890          0.832          0.809          0.789          0.819          0.783

(c) P values for different network comparison measures on data sets of synthetic and real world networks.

FIG. 3: (a) and (b) show the heatmaps of pairwise distances on RG2 (N ∈ {2000, 3000, 4000} and k ∈ {20, 24, 28, 32, 36, 40}) according to NetEmd_G5 and GCD73, respectively. In the heatmaps, networks are ordered from top to bottom by model, average degree and node count. The heatmap of NetEmd_G5 shows eight clearly identifiable blocks on the diagonal corresponding to different generative models, while the heatmap of GCD73 shows signs of off-diagonal mixing. (c) P values for various comparison measures for data sets of synthetic and real world networks. For RG1 we calculated the value of P for each of the 16 sub-data sets; the table shows the average and standard deviation of the P values obtained over these 16 sub-data sets.

3.2 Time ordered networks

A network comparison measure should ideally not only be able to identify groups of similar networks but should also be able to capture structural similarity at a finer local scale. To study the behavior of NetEmd at a more local level, we consider data sets that represent a system measured at different points in time. Since such networks can be assumed to evolve gradually over time, they offer an ideal setting for testing the local properties of network comparison methodologies.
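Time orderings of this kind can be recovered from pairwise distances alone, by step-wise ranking heuristics. A minimal sketch of two such heuristics, corresponding to the greedy strategies used in Sec. 3.2 (our own illustration; function names are hypothetical):

```python
import numpy as np

def rank_by_nearest(D, start):
    """Greedy ranking: starting from a given network, repeatedly append
    the unranked network closest to the last added one."""
    n = D.shape[0]
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        nxt = min(remaining, key=lambda j: D[order[-1], j])
        order.append(nxt)
        remaining.remove(nxt)
    return order

def rank_by_average(D, start):
    """Greedy ranking: append the network with the smallest average
    distance to all networks ranked so far."""
    n = D.shape[0]
    order, remaining = [start], set(range(n)) - {start}
    while remaining:
        nxt = min(remaining, key=lambda j: np.mean([D[i, j] for i in order]))
        order.append(nxt)
        remaining.remove(nxt)
    return order
```

Running each heuristic from both the first and the last network yields four candidate rankings, which can then be scored against the true ordering with Kendall's τ.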
We consider two data sets, named AS-caida and AS-733 [Leskovec et al., 2005], that represent the topology of the internet at the level of autonomous systems, and a third data set that consists of bilateral trade flows between countries for the years 1962-2014 [Feenstra et al., 2005, Division, 2015]. Both edges and nodes are added and deleted over time in all three data sets. As was noted in [Leskovec et al., 2005], the time ranking in evolving networks is reflected to a certain degree in simple summary statistics. Hence, recovering the time ranking of evolving networks should be regarded as a test of consistency rather than an evaluation of performance. In order to minimize the dependence of our results on the algorithm that is used to rank networks, we consider four different ways of ranking networks based on their pairwise distances, as follows. We assume that either the first or last network in the time series is given. Rankings are then constructed in a step-wise fashion. At each step one either adds the network that is closest to the last added network (Algorithm 1), or adds the network that has the smallest average distance to all the networks in the ranking constructed so far (Algorithm 2). The performance of a measure in ranking networks is then measured in terms of Kendall's rank correlation coefficient τ between the true time ranking and the best ranking obtained by any of the four methods.

We find that NetEmd_G5 successfully recovers the time ordering for all three data sets, as can be seen in the time ordered heatmaps given in Figure 4, which all show clear groupings along the diagonal. The red regions in the two internet data sets correspond to outliers, which can also be identified as sudden jumps in summary statistics, e.g. the number of nodes. The two large clusters in the heatmap of world trade networks (Figure 4c) coincide with a change in the data gathering methodology in 1984 [Feenstra et al., 2005]. Although NetEmd_G5 comes second to Netdis_SF on AS-733 and to GCD11 on AS-caida, NetEmd_G5 has the highest overall score and is the only measure that achieves consistently high scores on all three data sets.

Dataset      NetEmd_G5  Netdis_ER  Netdis_SF  GCD11  GCD73  GDDA
AS-733       0.874      0.867      0.933      0.763  0.770  0.740
AS-caida     0.890      0.844      0.849      0.897  0.878  0.870
World Trade  0.821      0.666      0.388      0.380  0.567  0.649

(d) Kendall's τ between the true time ranking and rankings inferred from network comparison methodologies.

FIG. 4: (a), (b) & (c) Heatmaps of NetEmd_G5 for networks representing the internet at the level of autonomous systems (AS-733 and AS-caida) and for world trade networks. The date of measurement increases from left to right/top to bottom. NetEmd_G5 accurately captures the evolution over time in all three data sets by positioning networks that are close in time closer to each other, resulting in a clear signal along the diagonal. (d) Kendall's rank correlation coefficient between the true time ranking and rankings inferred from different network comparison measures.

3.3 NetEmd based on different sets of inputs

We examine the effect of reducing the size of graphlets considered in the input of NetEmd, which is also relevant from a computational point of view, since enumerating graphlets up to size 5 can be challenging for very large networks. We consider variants based on the graphlet degree distributions of graphlets up to size 3 and 4, which we denote as NetEmd_G3 and NetEmd_G4. We also consider NetEmd_DD, which is based only on the degree distribution, as a baseline. Results are given in Table 1.
We find that reducing the size of graphlets from 5 to 4 does not significantly decrease the performance of NetEmd, and actually produces better results on three data sets (RG3, RWN and Onnela et al.). Even when based only on graphlets up to size 3, i.e. just edges, 2-paths and triangles, NetEmd outperforms all other non-NetEmd methods that we tested on at least 6 out of 8 data sets. Given that the complexity of enumerating graphlets up to size s in a network on N nodes having maximum degree k_max is O(N k_max^{s−1}), NetEmd_G4 offers an optimal combination of performance and computational efficiency in most cases. The even less computationally costly NetEmd_G3 scales favourably even to networks of billions of edges, for which enumerating graphlets of size 4 can be computationally prohibitive. This opens the door to comparing very large networks which are outside the reach of current methods, while still retaining state of the art performance. Furthermore, the NetEmd measures perform well under sub-sampling of nodes [Ali et al., 2016] (see Appendix D), which can be leveraged to further improve computational efficiency. We find that in some cases restricting the set of inputs actually leads to an increase in the performance of NetEmd. This indicates that not all graphlet distributions are equally informative in all settings [Maugis et al., 2017]. Consequently, identifying (learning) which graphlet distributions contain the most pertinent information for a given task might lead to significant improvements in performance. Such generalizations can be incorporated into NetEmd in a straightforward manner, for instance by modifying the sum in Equation (2.3) to incorporate weights. NetEmd is ideally suited for such metric learning [Xing et al., 2003] type generalizations, since it constructs an individual distance for each graphlet distribution.
Moreover, such single feature NetEmd measures are in many cases highly informative even on their own. For instance NetEmd_DD, which only uses the degree distribution, individually outperforms the non-NetEmd measures we tested on more than half the data sets we considered. We also considered counts of graphlets up to size 4 in 1-step ego networks of nodes [Ali et al., 2014] as an alternative way of capturing subgraph distributions, for which we denote the measure as NetEmd_E4. Although NetEmd_E4 achieves consistently high scores, we find that variants based on graphlet degree distributions tend to perform better on most data sets. Finally, we consider spectral distributions of graphs as a possible alternative to graphlet based features. The spectra of various graph operators are closely related to topological properties of graphs [Chung, 1997, Mohar et al., 1991, Banerjee and Jost, 2008] and have been widely used to characterize and compare graphs [Wilson and Zhu, 2008, Gu et al., 2016]. We used the spectra of the graph Laplacian and the normalized graph Laplacian as inputs for NetEmd, for which we denote the measure as NetEmd_S. For a given graph the Laplacian is defined as L = D − A, where A is the adjacency matrix of the graph and D is the diagonal matrix whose diagonal entries are the node degrees. The normalized Laplacian L̂ is defined as L̂ = D^{−1/2} L D^{−1/2}. Given the eigenvalue distributions S(L) and S(L̂) of L and L̂, we define NetEmd_S to be (1/2)(NetEmd_{S(L)} + NetEmd_{S(L̂)}). We find that in general NetEmd_S performs better at clustering random graphs of different sizes and densities when compared to graphlet based network comparison measures. However, on the RWN and Onnela et al.
data sets, graphlet based NetEmd measures tend to perform better than the spectral variant, which can be attributed to the prevalence of network motifs in real world networks, giving graphlet based measures an advantage. The spectral variant is also outperformed on the time ordering data sets, which in turn might be a result of the sensitivity of graph spectra to small changes in the underlying graph [Wilson and Zhu, 2008].

Data set       NetEmd_G3      NetEmd_G4      NetEmd_E4      NetEmd_S       NetEmd_DD
RG1            0.989 ± 0.008  0.995 ± 0.005  0.993 ± 0.004  0.992 ± 0.007  0.957 ± 0.024
RG2            0.982          0.987          0.983          0.992          0.944
RG3            0.940          0.941          0.947          0.972          0.902
RWN            0.952          0.950          0.933          0.933          0.907
Onnela et al.  0.892          0.898          0.892          0.858          0.867
AS-733         0.808          0.874          0.922          0.855          0.928
AS-caida       0.898          0.892          0.820          0.780          0.821
World Trade    0.697          0.785          0.665          0.430          0.358

Table 1: Results for different variants of NetEmd based on distributions of graphlets up to size 3 and 4 (NetEmd_G3 and NetEmd_G4), counts of graphlets up to size 4 in 1-step ego networks of nodes (NetEmd_E4), eigenvalue spectra of Laplacian operators (NetEmd_S) and the degree distribution (NetEmd_DD). Values in bold indicate that a measure achieves the highest score among all measures considered in the manuscript. For RG1 we calculate the value of P for each of the 16 sub-data sets; the table shows the average and standard deviation of the P values obtained over these 16 sub-data sets.

3.4 Functional classification of networks

One of the primary motivations for studying the structure of networks is to identify topological features that can be related to the function of a network. In the context of network comparison this translates into the problem of finding metrics that can identify functionally similar networks based on their topological structure.
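The Laplacian and normalized Laplacian spectra used as inputs for NetEmd_S (Sec. 3.3) can be computed directly from the adjacency matrix. A minimal numpy sketch (our own illustration; dense matrices only, whereas large networks would require sparse eigensolvers):

```python
import numpy as np

def laplacian_spectra(A):
    """Eigenvalue spectra of the Laplacian L = D - A and the normalized
    Laplacian L_hat = D^{-1/2} L D^{-1/2}.  Rows of isolated nodes (degree
    zero) are handled by treating D^{-1/2} as 0 there."""
    A = np.asarray(A, dtype=float)
    deg = A.sum(axis=1)
    L = np.diag(deg) - A
    inv_sqrt = np.zeros_like(deg)
    nz = deg > 0
    inv_sqrt[nz] = 1.0 / np.sqrt(deg[nz])
    L_hat = inv_sqrt[:, None] * L * inv_sqrt[None, :]
    # eigvalsh exploits symmetry and returns eigenvalues in ascending order
    return np.linalg.eigvalsh(L), np.linalg.eigvalsh(L_hat)
```

The two eigenvalue distributions are then compared between graphs with EMD*, and NetEmd_S averages the two resulting distances.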
In order to test whether NetEmd can be used to identify functionally similar networks, we use several benchmarks from the machine learning literature, where graph similarity measures, called graph kernels, have been intensively studied over the past decade. In the context of machine learning the goal is to construct classifiers that can accurately predict the class membership of unknown graphs. We test NetEmd on benchmark data sets representing social networks [Yanardag and Vishwanathan, 2015] consisting of Reddit posts, scientific collaborations and ego networks in the Internet Movie Database (IMDB). The Reddit data sets Reddit-Binary, Reddit-Multi-5k and Reddit-Multi-12k consist of networks representing Reddit threads, where nodes correspond to users and two users are connected whenever one responded to the other's comments. While for the Reddit-Binary data set the task is to classify networks into discussion based and question/answer based communities, in the data sets Reddit-Multi-5k and Reddit-Multi-12k the task is to classify networks according to their subreddit categories. COLLAB is a data set consisting of ego-networks of scientists from the fields of High Energy Physics, Condensed Matter Physics and Astro Physics, and the task is to determine which of these fields a given researcher belongs to. Similarly, the data sets IMDB-Binary and IMDB-Multi represent collaborations between film actors derived from the IMDB, and the task is to classify ego-networks into different genres, i.e. action and romance in the case of IMDB-Binary, and comedy, action and Sci-Fi in the case of IMDB-Multi. We use C-support vector machine (C-SVM) [Cortes and Vapnik, 1995] classifiers with a Gaussian kernel K(G, G′) = exp(−α NetEmd(G, G′)²), where α is a free parameter to be learned during training.
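Turning a NetEmd distance matrix into such a kernel is a one-liner; the sketch below is our own illustration (the resulting matrix can then be passed to a C-SVM that accepts precomputed kernels, e.g. scikit-learn's `SVC(kernel="precomputed")`, which is an assumption about tooling rather than the paper's exact pipeline):

```python
import numpy as np

def gaussian_kernel(D, alpha):
    """Gaussian kernel K(G, G') = exp(-alpha * NetEmd(G, G')^2) built from
    a precomputed pairwise NetEmd distance matrix D."""
    return np.exp(-alpha * np.asarray(D) ** 2)
```

Note that a Gaussian kernel built from a pseudometric is not guaranteed to be positive semi-definite for every α, which is one reason α is treated as a free parameter selected by cross validation.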
Performance evaluation is carried out by 10-fold cross validation, where at each step of the validation 9 folds are used for training and 1 fold for evaluation. Free parameters of classifiers are learned via 10-fold cross validation on the training data only. Finally, every experiment is repeated 10 times, and average prediction accuracy and standard deviation are reported.

Kernel      Reddit-Binary   Reddit-Multi-5k   Reddit-Multi-12k   COLLAB         IMDB-Binary    IMDB-Multi
NetEmd_G5   92.67 ± 0.30    54.61 ± 0.18      48.09 ± 0.21       79.32 ± 0.27   66.99 ± 1.19   41.45 ± 0.70
NetEmd_S    88.59 ± 0.35    53.05 ± 0.34      44.45 ± 0.18       79.05 ± 0.20   71.68 ± 0.88   46.06 ± 0.50
DGK         78.04 ± 0.39    41.27 ± 0.18      32.22 ± 0.10       73.09 ± 0.25   66.96 ± 0.56   44.55 ± 0.52
GK          77.34 ± 0.18    41.01 ± 0.17      31.82 ± 0.08       72.84 ± 0.28   65.87 ± 0.98   43.89 ± 0.38
RF          88.7 ± 1.99     50.9 ± 2.07       42.7 ± 1.28        76.5 ± 1.68    72.4 ± 4.68    47.8 ± 3.55
PCSN        86.30 ± 1.58    49.10 ± 0.70      41.32 ± 0.42       72.60 ± 2.15   71.00 ± 2.29   45.23 ± 2.84

Table 2: 10-fold cross validation accuracies of Gaussian kernels based on NetEmd measures using the distributions of graphlets up to size 5 (NetEmd_G5) and Laplacian spectra (NetEmd_S), and other graph kernels, namely the deep graphlet kernel (DGK) [Yanardag and Vishwanathan, 2015] and the graphlet kernel (GK) [Shervashidze et al., 2009]. We also consider alternatives to support vector machine classifiers, namely the random forest classifiers (RF) introduced in [Barnett et al., 2016] and convolutional neural networks (PCSN) [Niepert et al., 2016]. Values in bold correspond to significantly higher scores, which are scores with t-test p-values less than 0.05 when compared to the highest score.

Table 2 gives classification accuracies obtained using NetEmd measures based on graphlets up to size five (NetEmd_G5) and spectra of Laplacian operators (NetEmd_S) on the data sets representing social networks.
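The evaluation protocol above can be sketched as follows for a precomputed NetEmd kernel; with kernel matrices the train/test sub-blocks have to be sliced explicitly in each fold. The distance matrix here is a synthetic stand-in, not one of the benchmark data sets, and the parameter search over C and α is omitted for brevity.

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.svm import SVC

def cv_accuracy(D, y, alpha=1.0, C=1.0, n_splits=10, seed=0):
    """Mean 10-fold cross-validated accuracy of a C-SVM with the
    Gaussian NetEmd kernel K = exp(-alpha * D^2)."""
    K = np.exp(-alpha * D ** 2)
    folds = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=seed)
    accs = []
    for tr, te in folds.split(np.zeros((len(y), 1)), y):
        # Train on the train/train block, score on the test/train block.
        clf = SVC(C=C, kernel="precomputed").fit(K[np.ix_(tr, tr)], y[tr])
        accs.append(clf.score(K[np.ix_(te, tr)], y[te]))
    return float(np.mean(accs))

# Synthetic distances: 10 networks per class, small within-class distances.
y = np.repeat([0, 1], 10)
D = np.where(y[:, None] == y[None, :], 0.1, 0.9)
np.fill_diagonal(D, 0.0)
print(cv_accuracy(D, y))
```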
We compare NetEmd based kernels to graphlet kernels [Shervashidze et al., 2009] and deep graphlet kernels [Yanardag and Vishwanathan, 2015], as well as two non-SVM classifiers, namely the random forest classifier introduced in [Barnett et al., 2016] and the convolutional neural network based classifier introduced in [Niepert et al., 2016]. On the Reddit data sets and the COLLAB data set, NetEmd_G5 significantly outperforms other state-of-the-art graph classifiers. On the other hand, we find that NetEmd_G5 performs poorly on the IMDB data sets. This can be traced back to the large number of complete graphs present in the IMDB data sets: 139 out of the 1000 graphs in IMDB-Binary and 789 out of the 1500 graphs in IMDB-Multi are complete graphs, which correspond to ego-networks of actors having acted only in a single film. By definition, NetEmd_G5 cannot distinguish between complete graphs of different sizes, since all graphlet degree distributions are concentrated on a single value in complete graphs. The spectral variant NetEmd_S is not affected by this, and we find that NetEmd_S is either on par with or outperforms the other non-NetEmd graph classifiers on all six data sets.

We also tested NetEmd on benchmark data sets representing chemical compounds and protein structures. Unlike the social network data sets, in these data sets nodes and edges are labeled to reflect domain specific knowledge such as atomic number, amino acid type and bond type. Although NetEmd, in contrast to the other graph kernels, does not rely on domain specific knowledge in the form of node or edge labels, we found that NetEmd outperforms many of the considered graph kernels, coming only second to the Weisfeiler-Lehman [Shervashidze et al., 2011] type kernels in terms of overall performance (see Appendix E).

4.
Discussion

Starting from basic principles, we have introduced a general network comparison methodology, NetEmd, that is aimed at capturing common generating processes in networks. We tested NetEmd in a large variety of experimental settings and found that NetEmd successfully identifies similar networks at multiple scales even when networks differ significantly in terms of size and density, generally outperforming other graphlet based network comparison measures. Even when based only on graphlets up to size 3 (i.e. edges, 2-paths and triangles), NetEmd has performance comparable to the state of the art, making NetEmd feasible even for networks containing billions of edges and nodes. By exploring machine learning applications we showed that NetEmd captures topological similarity in a way that relates to the function of networks and outperforms state-of-the-art graph classifiers on several graph classification benchmarks. Although we only considered variants of NetEmd that are based on distributions of graphlets and spectra of Laplacian operators in this paper, NetEmd can also be applied to other graph features in a straightforward manner. For instance, distributions of paths and centrality measures might capture larger scale properties of networks, and their inclusion into NetEmd might lead to a more refined measure.

Data availability
The source code for NetEmd is freely available at: www.opig.ox.ac.uk/resources

Acknowledgements
This work was in part supported by EPSRC grant EP/K032402/1 (A.W., G.R., C.D. and R.G.) and EPSRC grants EP/G037280/1 and EP/L016044/1 (C.D.). L.O. acknowledges the support of Colciencias through grant 568. R.G. acknowledges support from the COST Action CA15109 and is currently supported by a Dame Kathleen Ollerenshaw Research Fellowship. C.D. and G.R. acknowledge the support of the Alan Turing Institute (grant EP/N510129/1).
We thank Xiaochuan Xu and Martin O'Reilly for useful discussions.

A. Implementation

A.1 Graphlet distributions. In the main paper, both the graphlet degree distribution and graphlet counts in 1-step ego networks were used as inputs for NetEmd.

Graphlet degree distributions. The graphlet degree [Pržulj, 2007b] of a node specifies the number of graphlets (small induced subgraphs) of a certain type the node appears in, while distinguishing between the different positions the node can have in a graphlet. Different positions within a graphlet correspond to the orbits of the automorphism group of the graphlet. Among graphs on two to four nodes, there are 9 possible graphs and 15 possible orbits. Among graphs on two to five nodes there are 30 possible graphs and 73 possible orbits.

Graphlet distributions based on ego-networks. Another way of obtaining graphlet distributions is to consider graphlet counts in ego-networks [Ali et al., 2014]. The k-step ego-network of a node i is defined as the subgraph induced on all the nodes that can be reached from i (including i) in less than k steps. For a given k, the distribution of a graphlet m in a network G is then simply obtained by counting the occurrences of m as an induced subgraph in the k-step ego-networks of each individual node.

A.2 Step-wise implementation. In this paper, for integer valued network features such as graphlet based distributions, we base our implementation on the probability distribution p_t(G) that corresponds to the histogram of feature t with bin width 1. NetEmd can also be defined on the basis of discrete empirical distributions, i.e. distributions consisting of point masses (see Section C). Here we summarise the calculation of the NetEmd_T(G, G') distance between networks G and G' (with N and N' nodes respectively), based on the comparison of the set of local network features T = {t_1, ...
, t_m} of graphlet degrees corresponding to graphlets up to size k.

1. First one computes the graphlet degree sequences corresponding to graphlets up to size k for networks G and G'. This can be done efficiently using the algorithm ORCA [Hočevar and Demšar, 2014]. For the graphlet degree t_1, compute a histogram across all N nodes of G having bins of width 1 whose centres are at the respective values. This histogram is then normalized to have total mass 1. We then interpret the histogram as the (piecewise continuous) probability density function of a random variable; this probability density function is denoted by p_{t_1}(G). The standard deviation of p_{t_1}(G) is then computed and used to rescale the distribution so that it has variance 1. The rescaled distribution is denoted by p̂_{t_1}(G).

2. Repeat the above step for network G', and denote the resulting distribution by p̂_{t_1}(G'). Now compute

NetEmd*_{t_1}(G, G') = inf_{c∈ℝ} EMD( p̂_{t_1}(G)(· + c), p̂_{t_1}(G')(·) ).

In practice, this minimisation over c is computed using a suitable optimization algorithm. In our implementation we use the Brent-Dekker algorithm [Brent, 1971] with an error tolerance of 0.00001 and with the number of iterations upper bounded by 150.

3. Repeat the above two steps for the network features t_2, ..., t_m and compute

NetEmd_T(G, G') = (1/m) ∑_{j=1}^{m} NetEmd*_{t_j}(G, G').

A.3 Example: EMD* for Gaussian distributions. Suppose that p and q are N(µ_1, σ_1²) and N(µ_2, σ_2²) distributions, respectively. Then

EMD*(p, q) = inf_{c∈ℝ} EMD( p̃(· + c), q̃(·) ) = EMD( p̃(· − µ_1/σ_1 + µ_2/σ_2), q̃(·) ) = EMD( q̃(·), q̃(·) ) = 0.

Here we used that if X ∼ N(µ_1, σ_1²) and Y ∼ N(µ_2, σ_2²), then X/σ_1 − c ∼ N(µ_1/σ_1 − c, 1) and Y/σ_2 ∼ N(µ_2/σ_2, 1), and these two distributions are equal if c = µ_1/σ_1 − µ_2/σ_2.
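The three steps above can be condensed into a short sketch, using scipy's 1-d earth mover's distance and Brent minimisation in place of the ORCA-based pipeline; the two toy "feature distributions" below are hypothetical, and positive affine images give distance (numerically) zero, as in the Gaussian example above.

```python
import numpy as np
from scipy.optimize import minimize_scalar
from scipy.stats import wasserstein_distance

def netemd_t(features_G, features_Gp):
    """Average over features of EMD*: rescale each empirical distribution
    to variance 1, then minimise the 1-d EMD over all translations c
    (Brent's method, tolerance 1e-5 as in the implementation above)."""
    total = 0.0
    for x, y in zip(features_G, features_Gp):
        x = np.asarray(x, dtype=float)
        y = np.asarray(y, dtype=float)
        x, y = x / x.std(), y / y.std()   # rescale to variance 1
        total += minimize_scalar(lambda c: wasserstein_distance(x + c, y),
                                 method="brent", options={"xtol": 1e-5}).fun
    return total / len(features_G)

# Feature distributions related by a positive affine map are at distance ~0:
f_G  = [np.array([0., 1., 1., 2.]), np.array([1., 3., 5., 5.])]
f_Gp = [2.0 * f + 3.0 for f in f_G]
print(netemd_t(f_G, f_Gp))
```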
A.4 Spectral NetEmd. When using spectra of graph operators, which take real values instead of the integer values one has in the case of graphlet distributions, we use the empirical distribution consisting of point masses for computing NetEmd. For more details see Section C of this appendix.

A.5 Computational complexity. The computational complexity of graphlet based comparison methods is dominated by the complexity of enumerating graphlets. For a network of size N and maximum degree d, enumerating all connected graphlets up to size m has complexity O(N d^(m−1)), while counting all graphlets up to size m in all k-step ego-networks has complexity O(N d^(k+m−1)). Because most real world networks are sparse, graphlet enumeration algorithms tend to scale more favourably in practice than the worst case upper bounds given above. In the case of spectral measures, the most commonly used algorithms for computing the eigenvalue spectrum have complexity O(N³). Recent results show that the spectra of graph operators can be approximated efficiently in O(N²) time [Thüne, 2013]. Given the distribution of a feature t, computing EMD*_t(G, G') has complexity O(k(s + s') log(s + s')), where s and s' are the number of different values t takes in G and G' respectively, and k is the maximum number of function calls of the optimization algorithm used to align the distributions. For node based features such as motif distributions, the worst case complexity is O(k(N(G) + N(G')) log(N(G) + N(G'))), where N(G) is the number of nodes of G, since the number of different values t can take is bounded by the number of nodes.

B. Proof that NetEmd is a distance measure

We begin by stating a definition. A pseudometric on a set X is a non-negative real-valued function d : X × X → [0, ∞) such that, for all x, y, z ∈ X,

1. d(x, x) = 0;
2. d(x, y) = d(y, x) (symmetry);
3.
d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).

If Condition 1 is replaced by the condition that d(x, y) = 0 ⟺ x = y, then d defines a metric. Note that this requirement can only be satisfied by a network comparison measure that is based on a complete set of graph invariants, and hence network comparison measures in general will not satisfy this requirement.

PROPOSITION. Let M denote the space of all real-valued probability measures supported on ℝ with finite, non-zero variance. Then the EMD* distance between probability measures µ_X and µ_Y in M, defined by

EMD*(µ_X, µ_Y) = inf_{c∈ℝ} EMD( µ̃_X(·), µ̃_Y(· + c) ),

defines a pseudometric on the space of probability measures M.

PROOF. We first note that if µ_X ∈ M then µ̃_X(· + c) ∈ M for any c ∈ ℝ. Let us now verify that EMD* satisfies all properties of a pseudometric. Clearly, for any µ_X ∈ M, we have 0 ≤ EMD*(µ_X, µ_X) ≤ EMD( µ̃_X(·), µ̃_X(·) ) = 0, and so EMD*(µ_X, µ_X) = 0. Symmetry holds, since for any µ_X and µ_Y in M,

EMD*(µ_X, µ_Y) = inf_{c∈ℝ} EMD( µ̃_X(·), µ̃_Y(· + c) ) = inf_{c∈ℝ} EMD( µ̃_Y(· + c), µ̃_X(·) ) = inf_{c∈ℝ} EMD( µ̃_Y(·), µ̃_X(· + c) ) = EMD*(µ_Y, µ_X).

Finally, we verify that EMD* satisfies the triangle inequality. Suppose µ_X, µ_Y and µ_Z are probability measures from the space M; then so are µ̃_X(· + a) and µ̃_Y(· + b) for any a, b ∈ ℝ. Since EMD satisfies the triangle inequality, we have, for any a, b ∈ ℝ,

EMD( µ̃_X(· + a), µ̃_Y(· + b) ) ≤ EMD( µ̃_X(· + a), µ̃_Z(·) ) + EMD( µ̃_Y(· + b), µ̃_Z(·) ).
Since the above inequality holds for all a, b ∈ ℝ, we have that

EMD*(µ_X, µ_Y) = inf_{c∈ℝ} EMD( µ̃_X(· + c), µ̃_Y(·) )
= inf_{a,b∈ℝ} EMD( µ̃_X(· + a), µ̃_Y(· + b) )
≤ inf_{a,b∈ℝ} [ EMD( µ̃_X(· + a), µ̃_Z(·) ) + EMD( µ̃_Y(· + b), µ̃_Z(·) ) ]
= inf_{a∈ℝ} EMD( µ̃_X(· + a), µ̃_Z(·) ) + inf_{b∈ℝ} EMD( µ̃_Y(· + b), µ̃_Z(·) )
= EMD*(µ_X, µ_Z) + EMD*(µ_Y, µ_Z),

as required. We have thus verified that EMD* satisfies all properties of a pseudometric. □

C. Generalization of EMD* to point masses

Although in the case of graphlet based features we based our implementation of NetEmd on probability distribution functions that correspond to normalized histograms having bin width 1, NetEmd can also be based on empirical distributions consisting of collections of point masses located at the observed values. The definition of EMD* can be generalized to include distributions of zero variance, i.e. unit point masses. Mathematically, the distribution of a point mass at x_0 is given by the Dirac measure δ_{x_0}. Such distributions are frequently encountered in practice, since some graphlets do not occur in certain networks. First, we note that unit point masses are always mapped onto unit point masses under rescaling operations. Moreover, for a unit point mass δ_{x_0} we have that inf_{c∈ℝ} EMD( p̃(· + c), δ_{x_0} ) = inf_{c∈ℝ} EMD( p̃(· + c), δ_{k x_0} ) for all p ∈ M and k > 0. Consequently, EMD* can be generalized to include unit point masses in a consistent fashion by always rescaling them by 1:

EMD*(p, q) = inf_{c∈ℝ} EMD( p̂(· + c), q̂ ),

where p̂ = p̃ (as in Eq. 2.1) if p has non-zero variance, and p̂ = p if p has variance zero.

D.
Sub-sampling

NetEmd is well suited for the sub-sampling procedure from [Ali et al., 2016]. Following this procedure, we base the graphlet distributions used as an input of NetEmd on a sample of nodes rather than the whole network. Figure A.5 shows the P scores for variants of NetEmd on a set of synthetic networks and the Onnela et al. data set. We find that the performance of NetEmd is stable under sub-sampling and that in general using a sample of only 10% of the nodes produces results comparable to the case where all nodes are used.

FIG. A.5: The P values for different variants of NetEmd (NetEmd_G3, NetEmd_G4 and NetEmd_G5) under sub-sampling, as a function of the sampled proportion of nodes, for a) a set of 80 synthetic networks coming from eight different random graph models with 2500 nodes and average degree 20, and b) the Onnela et al. data set, showing the average and standard deviation over 50 experiments for each sampled proportion. Note that the performance of NetEmd under sub-sampling is remarkably stable and is close to optimal even when only 10% of nodes are sampled. For synthetic networks we find that the stability of NetEmd increases as the size of the graphlets used in the input is increased.

E. Results for data sets of chemical compounds and proteins

We also tested NetEmd on benchmark data sets representing chemical compounds (MUTAG, NCI1 and NCI109) and protein structures (ENZYMES and D&D). MUTAG [Debnath et al., 1991] is a data set of 188 chemical compounds that are labelled according to their mutagenic effect on Salmonella typhimurium.
NCI1 and NCI109 represent sets of chemical compounds which are labelled for their activity against non-small cell lung cancer and ovarian cancer cell lines, respectively [Wale et al., 2008]. Nodes and edges in MUTAG, NCI1 and NCI109 are labeled by atomic number and bond type, respectively. ENZYMES and D&D [Borgwardt et al., 2005] consist of networks representing protein structures at the level of tertiary structure and amino acids, respectively. While networks in ENZYMES are classified into six different enzyme classes, networks in D&D are classified according to whether or not they correspond to an enzyme. Nodes in ENZYMES are labelled according to structural element type, and according to amino acid type in D&D.

Classification accuracies obtained using NetEmd on the data sets of chemical compounds and protein structures are given in Table A.3, along with results for other graph kernels reported in [Shervashidze et al., 2011]. For a detailed description of these kernels we refer to [Shervashidze et al., 2011]

FIG. A.6: a) and b) show the heatmaps of pairwise distances on RG3 (N ∈ {1250, 2500, 5000, 10000} and k ∈ {10, 20, 40, 80}) according to NetEmd_G5 and the next best performing measure, GCD11, respectively. In the heatmaps, networks are ordered from top to bottom by model, average degree and node count. Although we observe some degree of off-diagonal mixing, the heatmap of NetEmd_G5 still shows 8 diagonal blocks corresponding to the different generative models, in contrast to the heatmap of GCD11.
Kernel             MUTAG          NCI1           NCI109         ENZYMES        D&D
NetEmd_G5          83.71 ± 1.16   78.59 ± 0.28   76.71 ± 0.34   46.55 ± 1.25   78.01 ± 0.38
NetEmd_S           83.30 ± 1.20   77.36 ± 0.38   76.14 ± 0.27   42.75 ± 0.78   76.74 ± 0.43
WL subtree         82.05 ± 0.36   82.19 ± 0.18   82.46 ± 0.24   52.22 ± 1.26   79.78 ± 0.36
WL edge            81.06 ± 1.95   84.37 ± 0.30   84.49 ± 0.20   53.17 ± 2.04   77.95 ± 0.70
WL shortest path   83.78 ± 1.46   84.55 ± 0.36   83.53 ± 0.30   59.05 ± 1.05   79.43 ± 0.55
Ramon & Gärtner    85.72 ± 0.49   61.86 ± 0.27   61.67 ± 0.21   13.35 ± 0.87   57.27 ± 0.07
p-random walk      79.19 ± 1.09   58.66 ± 0.28   58.36 ± 0.94   27.67 ± 0.95   66.64 ± 0.83
Random walk        80.72 ± 0.38   64.34 ± 0.27   63.51 ± 0.18   21.68 ± 0.94   71.70 ± 0.47
Graphlet count     75.61 ± 0.49   66.00 ± 0.07   66.59 ± 0.08   32.70 ± 1.20   78.59 ± 0.12
Shortest path      87.28 ± 0.55   73.47 ± 0.11   73.07 ± 0.11   41.68 ± 1.79   78.45 ± 0.26

Table A.3: 10-fold cross validation accuracies of Gaussian kernels based on NetEmd_G5 and NetEmd_S, and other kernels reported in [Shervashidze et al., 2011].

and the references therein. Note that, in contrast to all other kernels in Table A.3, NetEmd does not use any domain specific knowledge in the form of node or edge labels. Node and edge labels are highly informative for all five classification tasks, as shown in [Sugiyama and Borgwardt, 2015]. On MUTAG, NetEmd achieves an accuracy that is comparable to the Weisfeiler-Lehman (WL) shortest path kernel, but is outperformed by the shortest path kernel and the kernel by Ramon & Gärtner. While on NCI1, NCI109 and ENZYMES NetEmd is outperformed only by WL kernels, on D&D NetEmd achieves a classification accuracy that is comparable to the best performing kernels. Notably, on D&D NetEmd also outperforms the vector model by Dobson and Doig [Dobson and Doig, 2003] (classification accuracy: 76.86 ± 1.23), which is based on 52 physical and chemical features, without using domain specific knowledge, i.e. solely based on graph topology.
E.1 Implementation of C-SVMs. Following the procedure in [Shervashidze et al., 2011], we use 10-fold cross validation with a C-SVM [Cortes and Vapnik, 1995] to test classification performance. We use the python package scikit-learn [Pedregosa et al., 2011], which is built on the libsvm implementation [Chang and Lin, 2011]. The C value of the C-SVM and the α of the Gaussian kernel are tuned independently for each fold using training data from that fold only. Each experiment is repeated 10 times, and average prediction accuracies and their standard deviations are reported.

We also note that the Gaussian NetEmd kernel is not positive semidefinite (psd) for all values of α [Jayasumana et al., 2015]. The implication is that the C-SVM converges to a stationary point that is not always guaranteed to be a global optimum. Although there exist alternative algorithms [Luss and d'Aspremont, 2008] for training C-SVMs with indefinite kernels which might result in better classification accuracy, here we chose to use the standard libsvm algorithm in order to ensure a fair comparison between kernels. For a discussion of support vector machines with indefinite kernels see [Haasdonk, 2005].

F. Detailed description of data sets and models

F.1 Synthetic networks and random graph models. RG1 consists of 16 sub data sets corresponding to combinations of N ∈ {1250, 2500, 5000, 10000} and k ∈ {10, 20, 40, 80}, each containing 10 realizations per model, i.e. 80 networks each. In RG2 the size N and average degree k are increased independently in linear steps to twice their initial value (N ∈ {2000, 3000, 4000} and k ∈ {20, 24, 28, 32, 36, 40}), with 10 realizations per model-parameter combination, resulting in a data set of 3 × 6 × 8 × 10 = 1440 networks.
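A parameter grid of this kind can be assembled along the following lines; the sketch uses networkx and, for brevity, lets the ER model stand in for the eight models described in this section. It is purely illustrative, not the code used to build the RG data sets.

```python
import networkx as nx

def er_grid(sizes, degrees, reps=10, seed=0):
    """Grid of G(N, m) realizations with m = N*k/2 edges, i.e. average
    degree k, over all (N, k) combinations, as in the RG data sets."""
    graphs = []
    for N in sizes:
        for k in degrees:
            for r in range(reps):
                m = N * k // 2  # edge count giving average degree k
                graphs.append(nx.gnm_random_graph(N, m, seed=seed + r))
    return graphs

# An RG2-like grid for a single model: 3 sizes x 6 degrees x 10 realizations.
rg2_like = er_grid([2000, 3000, 4000], [20, 24, 28, 32, 36, 40])
print(len(rg2_like))  # 180
```

Repeating the same grid for each of the eight models would give the full 1440-network data set.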
In RG3 the size N and average degree k are increased independently in multiples of 2 to 8 times their initial value (N ∈ {1250, 2500, 5000, 10000} and k ∈ {10, 20, 40, 80}), again with 10 realizations per model-parameter combination, resulting in a data set of 4 × 4 × 8 × 10 = 1280 networks. The models are as follows.

F.1.1 The Erdős-Rényi model. We consider the Erdős-Rényi (ER) model [Erdős and Rényi, 1960] G(N, m), where N is the number of nodes and m is the number of edges. The edges are chosen uniformly at random without replacement from the N(N−1)/2 possible edges.

F.1.2 The configuration model. Given a graphical degree sequence, the configuration model creates a random graph that is drawn uniformly at random from the space of all graphs with the given degree sequence. The degree sequence of the configuration models used in the paper is taken to be the degree sequence of a duplication divergence model that has the desired average degree.

F.1.3 The Barabási-Albert preferential attachment model. In the Barabási-Albert model [Barabási and Albert, 1999] a network is generated starting from a small initial network to which nodes of degree m are added iteratively; the probability of connecting the new node to an existing node is proportional to the degree of the existing node.

F.1.4 Geometric random graphs. Geometric random graphs [Gilbert, 1961] are constructed under the assumption that the nodes in the network are embedded into a D-dimensional space, and the presence of an edge depends only on the distance between the nodes and a given threshold r. The model is constructed by placing N nodes uniformly at random in the D-dimensional cube [0, 1]^D. Then edges are placed between any pair of nodes for which the distance between them is less than or equal to the threshold r.
We use D = 3, take the distance to be the Euclidean distance, and set r to the threshold that results in a network with the desired average degree.

F.1.5 The geometric gene duplication model. The geometric gene duplication model is a geometric model [Higham et al., 2008] in which the nodes are distributed in 3-dimensional Euclidean space ℝ³ according to the following rule. Starting from a small initial set of nodes in three dimensions, at each step a randomly chosen node is selected and a new node is placed at random within a Euclidean distance d of this node. The process is repeated until the desired number of nodes is reached. Nodes within a certain distance r are then connected. We fix r to obtain the desired average degree.

F.1.6 The duplication divergence model of Vázquez et al. The duplication divergence model of Vázquez et al. [Vázquez et al., 2002] is defined by the following growing rules: (1) Duplication: a node v_i is randomly selected and duplicated (v_i') along with all of its interactions; an edge between v_i and v_i' is placed with probability p. (2) Divergence: for each pair of duplicated edges {(v_i, v_k); (v_i', v_k)}, one of the duplicated edges is selected uniformly at random and then deleted with probability q. This process is followed until the desired number of nodes is reached. In our case we fix p to be 0.05 and adjust q through a grid search to obtain a network that on average has the desired average degree.

F.1.7 The duplication divergence model of Ispolatov et al. The duplication divergence model of Ispolatov et al. [Ispolatov et al., 2005] starts with an initial network consisting of a single edge; at each step a random node is chosen for duplication, and the duplicate is connected to each of the neighbours of its parent with probability p. We adjust p to obtain networks that have on average the desired average degree.

F.1.8 The Watts-Strogatz model.
The Watts-Strogatz model [Watts and Strogatz, 1998] creates graphs that interpolate between regular graphs and ER graphs. The model starts with a ring of n nodes in which each node is connected to its k nearest neighbours in both directions of the ring. Each edge is rewired with probability p to a node which is selected uniformly at random. While k is adjusted to obtain networks having the desired average degree, we take p to be 0.05.

F.2 Real world data sets. Summary statistics of the data sets are given in Table A.4.

F.2.1 Real world networks from different classes (RWN). We compiled a data set consisting of 10 different classes of real world networks: social networks, metabolic networks, protein interaction networks, protein structure networks, food webs, autonomous systems networks of the internet, world trade networks, airline networks, peer-to-peer file sharing networks and scientific collaboration networks. Although in some instances larger versions of these data sets are available, we restrict the maximum number of networks in a given class to 20 by taking random samples of larger data sets, in order to avoid scores being dominated by larger network classes.

Data set              #Networks  N_min  Median(N)  N_max   E_min  Median(E)  E_max   d_min     Median(d)  d_max
RWN                   167        24     351        62586   76     2595       824617  7.55e-05  0.0163     0.625
Onnela et al.         151        30     918        11586   62     2436       232794  4.26e-5   0.0147     0.499
AS-caida              122        8020   22883      26475   18203  46290      53601   1.48e-4   1.78e-4    5.66e-4
AS-733                732        493    4180.5     6474    1234   8380.5     13895   6.63e-4   9.71e-4    1.01e-2
World Trade Networks  53         156    195        242     5132   7675       18083   0.333     0.515      0.625
Reddit-Binary         2000       6      304.5      3782    4      379        4071    5.69e-4   8.25e-3    0.286
Reddit-Multi-5k       4999       22     374        3648    21     422        4783    6.55e-4   6.03e-3    0.091
Reddit-Multi-12k      11929      2      280        3782    1      323        5171    5.69e-4   8.27e-3    1.0
COLLAB                5000       32     52         492     60     654.5      40120   0.029     0.424      1.0
IMDB-Binary           1000       12     17         136     26     65         1249    0.095     0.462      1.0
IMDB-Multi            1500       7      10         89      12     36         1467    0.127     1.0        1.0
MUTAG                 188        10     17.5       28      10     19         33      0.082     0.132      0.222
NCI1                  4110       3      27         111     2      29         119     0.0192    0.0855     0.667
NCI109                4127       4      26         111     3      29         119     0.0192    0.0862     0.5
ENZYMES               600        2      32         125     1      60         149     0.0182    0.130      1.0
D&D                   1178       30     241        5748    63     610.5      14267   8.64e-4   0.0207     0.2

Table A.4: Summary statistics of the data sets. N, E and d stand for the number of nodes, number of edges and edge density, respectively.

The class of social networks consists of networks from the Pajek data set, which can be found at http://vlado.fmf.uni-lj.si/pub/networks/data/default.htm (June 12th 2015) (Networks: 'bkfrat', 'bkham', 'bkoff', 'bktec', 'dolphins', 'kaptailS1', 'kaptailS2', 'kaptailT1', 'kaptailT2', 'karate', 'lesmis', 'prison'), and a sample of Facebook networks from [Traud et al., 2012] (Networks: 'Auburn71', 'Bucknell39', 'Caltech36', 'Duke14', 'Harvard1', 'JMU79', 'MU78', 'Maine59', 'Maryland58', 'Rice31', 'Rutgers89', 'Santa74', 'UC61', 'UC64', 'UCLA26', 'UPenn7', 'UVA16', 'Vassar85', 'WashU32', 'Yale4'). The class of metabolic networks consists of 20 networks taken from [Jeong et al., 2000] (Networks: 'AB', 'AG', 'AP', 'AT', 'BS', 'CE', 'CT', 'EF', 'HI', 'MG', 'MJ', 'ML', 'NG', 'OS', 'PA', 'PN', 'RP', 'TH', 'TM', 'YP').
The class of protein interaction networks consists of 6 networks from BIOGRID [Stark et al., 2006] (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens, Mus musculus and Saccharomyces cerevisiae; downloaded October 2015), 5 networks from HINT [Das and Yu, 2012] (Arabidopsis thaliana, Caenorhabditis elegans, Drosophila melanogaster, Homo sapiens and Mus musculus; version June 1 2014) and the protein interaction network of Escherichia coli by Rajagopala et al. [Rajagopala et al., 2014]. The class of protein structure networks consists of a sample of 20 networks from the data set D&D (Networks: 20, 119, 231, 279, 335, 354, 355, 369, 386, 462, 523, 529, 597, 748, 833, 866, 990, 1043, 1113, 1157). The class of food webs consists of 20 food webs from the Pajek data set http://vlado.fmf.uni-lj.si/pub/networks/data/default.htm (June 10th 2015) (Networks: 'ChesLower', 'ChesMiddle', 'ChesUpper', 'Chesapeake', 'CrystalC', 'CrystalD', 'Everglades', 'Florida', 'Michigan', 'Mondego', 'Narragan', 'StMarks', 'baydry', 'baywet', 'cypdry', 'cypwet', 'gramdry', 'gramwet', 'mangdry', 'mangwet'). The class of internet networks consists of 10 randomly chosen networks from AS-733 [Leskovec et al., 2005] (Networks: '1997/11/12', '1997/12/28', '1998/01/01', '1998/06/06', '1998/08/13', '1998/12/04', '1999/03/30', '1999/04/17', '1999/06/18', '1999/08/30') and 10 randomly chosen networks from AS-caida [Leskovec et al., 2005] (Networks: '2004/10/04', '2006/01/23', '2006/03/27', '2006/07/10', '2006/09/25', '2006/11/27', '2007/01/15', '2007/04/30', '2007/05/28', '2007/09/24'). Both data sets are from SNAP [Jure and Krevl, 2014] (June 1 2016). The class of world trade networks is a sample of 20 networks from the larger data set considered in [Feenstra et al., 2005, Division, 2015] (Networks: 1968, 1971, 1974, 1975, 1976, 1978, 1980, 1984, 1989, 1992, 1993, 1996, 1998, 2001, 2003, 2005, 2007, 2010, 2011, 2012).
The airline networks were derived from the data available at http://openflights.org/ (June 12 2015). For this we considered the 50 largest airlines from the database in terms of the number of destinations that the airline serves. For each airline a network is obtained by considering all airports that are serviced by the airline, two airports being connected whenever there is a direct flight between them. We then took a sample of 20 networks from this larger data set (Airline codes of the networks: 'AD', 'AF', 'AM', 'BA', 'DY', 'FL', 'FR', 'JJ', 'JL', 'MH', 'MU', 'NH', 'QF', 'SU', 'SV', 'U2', 'UA', 'US', 'VY', 'ZH'). The class of peer-to-peer networks consists of 9 networks of the Gnutella file sharing platform measured at different dates, which are available at [Jure and Krevl, 2014]. The class of scientific collaboration networks consists of 5 networks representing different scientific disciplines, which were obtained from [Jure and Krevl, 2014] (June 1 2015).

F.2.2 Onnela et al. data set. The Onnela et al. data set consists of all undirected and unweighted networks from the larger collection analysed in [Onnela et al., 2012]. A complete list of networks and class memberships can be found in the supplementary information of [Ali et al., 2014].

F.2.3 Time ordered data sets. The data sets AS-caida and AS-733 each represent the internet measured at the level of autonomous systems at various points in time. Both data sets were downloaded from [Jure and Krevl, 2014] (June 1 2015). The World Trade Networks data set is based on the data set of [Feenstra et al., 2005] for the years 1962-2000 and on UN COMTRADE [Division, 2015] for the years 2001-2015. Two countries are connected in the network whenever they import or export a commodity from each other within the given calendar year. The complete data set was downloaded from http://atlas.media.mit.edu/en/resources/data/ on July 12 2015.

F.2.4 Machine learning benchmarks.
A short description of the social network data sets was given in the main text. A more detailed description can be found in [Yanardag and Vishwanathan, 2015]. The social network data sets were downloaded from https://ls11-www.cs.tu-dortmund.de/staff/morris/graphkerneldatasets on September 2 2016. A short description of the chemical compound and protein structure data sets was given in Section E. A more detailed description of these data sets can be found in [Shervashidze et al., 2011]. These data sets were downloaded from https://www.bsse.ethz.ch/mlcb/research/machine-learning/graph-kernels.html on June 12 2016.

G. Performance evaluation via area under precision recall curve

The area under the precision recall curve (AUPRC) was used as a performance metric for network comparison measures by Yaveroglu et al. [Yaveroglu et al., 2014]. The AUPRC is based on a classifier that, for a given distance threshold ε, classifies a pair of networks as similar whenever d(G, G′) < ε. A pair satisfying d(G, G′) < ε is taken to be a true positive whenever G and G′ are from the same class. The AUPRC is then defined to be the area under the precision recall curve obtained by varying ε in small increments. However, AUPRC is problematic, especially in settings where one has more than two classes and where classes are separated at different scales. Figure A.7 gives three examples of metrics for a problem with three classes: (a) shows a metric d1 (AUPRC = 0.847) that clearly separates the three classes but nevertheless has a lower AUPRC than the metrics shown in (b) (AUPRC = 0.902), which confuses half of Class-1 with Class-2, and (c) (AUPRC = 0.896), which shows 2 rather than 3 classes. The colour scale in the figure represents the magnitude of the comparison between a pair of individuals according to the corresponding metric. Some of the problems of AUPRC are the following.
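To make the ε-threshold classifier underlying AUPRC concrete, the computation can be sketched in a few lines of plain NumPy. This is a minimal sketch, not the authors' code: the function name `auprc`, the toy distance matrix and the class labels are all illustrative, and the area is computed as average precision over the ranked pairs.

```python
import numpy as np

def auprc(dist, labels):
    """AUPRC of a pairwise threshold classifier: a pair (G, G') is predicted
    similar when d(G, G') < eps, and is a true positive when G and G' belong
    to the same class. Sweeping eps over the observed distances traces out
    the precision-recall curve; the area is taken as the average precision.
    `dist` is a symmetric (n, n) distance matrix, `labels` a length-n array."""
    labels = np.asarray(labels)
    i, j = np.triu_indices(len(labels), k=1)   # each unordered pair once
    d = dist[i, j]
    same = labels[i] == labels[j]              # ground truth per pair
    order = np.argsort(d)                      # increasing threshold eps
    tp = np.cumsum(same[order])                # true positives below eps
    precision = tp / np.arange(1, len(d) + 1)
    # average precision: mean of precision at the rank of each true positive
    return precision[same[order]].sum() / same.sum()

# Two classes whose within-class distances are all smaller than the
# between-class distances give a perfect score of 1.0.
D = np.array([[0., 1., 5., 5.],
              [1., 0., 5., 5.],
              [5., 5., 0., 1.],
              [5., 5., 1., 0.]])
print(auprc(D, [0, 0, 1, 1]))  # → 1.0
```

Note that the score depends only on the ordering of the pairwise distances, and that a single global ordering is used across all classes, which is exactly why classes separated at different scales can be penalised.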
First, AUPRC is based on a classifier that identifies pairs of similar networks and hence is only indirectly related to the problem of separating classes. Moreover, the classifier uses a single global threshold ε for all networks and classes, and hence implicitly assumes that all classes are separated on the same scale. The AUPRC further lacks a clear statistical interpretation, which complicates its use, especially when one has multiple classes and when the precision recall curves of different measures intersect. Despite these problems, we give AUPRC values for all measures considered in the main text in Table A.5 for the sake of completeness. Note that NetEmd measures achieve the highest AUPRC on all data sets.

[Fig. A.7: Heat maps of three measures in an example of 3 equally sized classes. (a) Metric d1 (AUPRC = 0.847) shows clear separation between the 3 classes. (b) d2 (AUPRC = 0.902) shows 3 classes with half of Class-1 positioned closer to Class-2. (c) d3 (AUPRC = 0.896) identifies 2 rather than 3 classes. Note that d1 has a lower AUPRC than d2 and d3 despite being best at identifying the 3 classes, whereas the P values for the metrics are d1 = 1.0, d2 = 0.887 and d3 = 0.869.]

              RG1              RG2      RG3      RWN      Onnela et al.
NetEmd G3     0.917 ± 0.039    0.869    0.702    0.800    0.756
NetEmd G4     0.959 ± 0.030    0.930    0.759    0.774    0.786
NetEmd G5     0.981 ± 0.018    0.957    0.766    0.722    0.757
NetEmd S      0.967 ± 0.015    0.958    0.833    0.702    0.672
NetEmd E4     0.966 ± 0.030    0.945    0.801    0.777    0.739
NetEmd DD     0.756 ± 0.044    0.708    0.516    0.655    0.612
Netdis ER     0.867 ± 0.044    0.579    0.396    0.607    0.621
Netdis SF     0.852 ± 0.028    0.657    0.437    0.522    0.592
GCD11         0.888 ± 0.084    0.709    0.478    0.713    0.693
GCD73         0.966 ± 0.052    0.858    0.571    0.736    0.743
GGDA          0.815 ± 0.176    0.740    0.481    0.500    0.625

Table A.5: AUPRC scores for the measures and data sets considered in the main text. NetEmd measures have the highest AUPRC score on all data sets. For RG1 we calculated the AUPRC score for each of the 16 sub-data sets; the table shows the average and standard deviation of the AUPRC values obtained over these 16 sub-data sets.

References

M. E. J. Newman. Networks: An Introduction. Oxford University Press, 2010.
R. C. Wilson and P. Zhu. A study of graph spectra for comparing graphs and trees. Pattern Recognition, 41(9):2833-2841, 2008.
B. Neyshabur, A. Khadem, S. Hashemifar, and S. S. Arab. NETAL: a new graph-based method for global alignment of protein-protein interaction networks. Bioinformatics, 27:1654-1662, 2013.
W. Ali, T. Rito, G. Reinert, F. Sun, and C. M. Deane. Alignment-free protein interaction network comparison. Bioinformatics, 30:i430-i437, 2014.
Ö. N. Yaveroglu, N. Malod-Dognin, D. Davis, Z. Levnajic, V. Janjic, R. Karapandza, A. Stojmirovic, and N. Przulj. Revealing the hidden language of complex networks. Sci. Rep., 4, 2014.
K. M. Borgwardt, H.-P. Kriegel, S. V. N. Vishwanathan, and N. N. Schraudolph. Graph kernels for disease outcome prediction from protein-protein interaction networks. In Pacific Symposium on Biocomputing, volume 12, pages 4-15, 2007.
N. Wale, I. A. Watson, and G. Karypis.
Comparison of descriptor spaces for chemical compound retrieval and classification. Knowl. Inf. Syst., 14(3):347-375, 2008.
O. Kuchaiev and N. Pržulj. Integrative network alignment reveals large regions of global network similarity in yeast and human. Bioinformatics, 27:2539-2561, 2011.
N. Mamano and W. B. Hayes. SANA: simulated annealing far outperforms many other search algorithms for biological network alignment. Bioinformatics, 2017.
A.-L. Barabási and R. Albert. Emergence of scaling in random networks. Science, 286(5439):509-512, 1999.
N. Pržulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):e177-e183, 2007a.
R. Milo, S. Shen-Orr, S. Itzkovitz, N. Kashtan, D. Chklovskii, and U. Alon. Network motifs: simple building blocks of complex networks. Science, 298(5594):824-827, 2002.
R. Milo, S. Itzkovitz, N. Kashtan, R. Levitt, S. Shen-Orr, I. Ayzenshtat, M. Sheffer, and U. Alon. Superfamilies of evolved and designed networks. Science, 303(5663):1538-1542, 2004.
A. E. Wegner. Subgraph covers: an information-theoretic approach to motif analysis in networks. Phys. Rev. X, 4:041026, 2014. doi: 10.1103/PhysRevX.4.041026.
W. Ali, A. E. Wegner, R. E. Gaunt, C. M. Deane, and G. Reinert. Comparison of large networks with sub-sampling strategies. Sci. Rep., 6, 2016.
Y. Rubner, C. Tomasi, and L. J. Guibas. A metric for distributions with applications to image databases. In IEEE International Conference on Computer Vision, pages 59-66, 1998.
N. Pržulj. Biological network comparison using graphlet degree distribution. Bioinformatics, 23(2):e177-e183, 2007b.
A. Vázquez, A. Flammini, A. Maritan, and A. Vespignani. Modeling of protein interaction networks. Complexus, 1(1):38-44, 2002.
P. Erdős and A. Rényi. On the evolution of random graphs. Publications of the Mathematical Institute of the Hungarian Academy of Sciences, 5:17-61, 1960.
I. Ispolatov, P. L. Krapivsky, and A. Yuryev. Duplication-divergence model of protein interaction network. Phys. Rev. E, 71(6):061911, 2005.
D. J. Higham, M. Rašajski, and N. Pržulj. Fitting a geometric graph to a protein-protein interaction network. Bioinformatics, 24(8):1093-1099, 2008.
M. Penrose. Random Geometric Graphs. Oxford University Press, 2003.
M. Molloy and B. Reed. A critical point for random graphs with a given degree sequence. Random Struct. Algor., 6(2-3):161-180, 1995.
D. J. Watts and S. H. Strogatz. Collective dynamics of 'small-world' networks. Nature, 393(6684):440-442, 1998.
J.-P. Onnela, D. J. Fenn, S. Reid, M. A. Porter, P. J. Mucha, M. D. Fricker, and N. S. Jones. Taxonomies of networks from community structure. Phys. Rev. E, 86(3):036104, 2012.
J. Leskovec, J. Kleinberg, and C. Faloutsos. Graphs over time: densification laws, shrinking diameters and possible explanations. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery in Data Mining, pages 177-187. ACM, 2005.
R. C. Feenstra, R. E. Lipsey, H. Deng, A. C. Ma, and H. Mo. World trade flows: 1962-2000. Technical report, National Bureau of Economic Research, 2005.
United Nations Statistics Division. United Nations Commodity Trade Statistics Database (UN COMTRADE). http://comtrade.un.org/, 2015.
P. G. Maugis, C. E. Priebe, S. C. Olhede, and P. J. Wolfe. Statistical inference for network samples using subgraph counts. ArXiv e-prints, January 2017.
E. P. Xing, A. Y. Ng, M. I. Jordan, and S. Russell. Distance metric learning with application to clustering with side-information. Adv. Neur. In., 15:505-512, 2003.
F. R. K. Chung. Spectral Graph Theory, volume 92. American Mathematical Soc., 1997.
B. Mohar, Y. Alavi, G. Chartrand, and O. R. Oellermann. The Laplacian spectrum of graphs. Graph Theory, Combinatorics, and Applications, 2(871-898):12, 1991.
A. Banerjee and J. Jost.
On the spectrum of the normalized graph Laplacian. Linear Algebra Appl., 428(11-12):3015-3022, 2008.
J. Gu, J. Jost, S. Liu, and P. F. Stadler. Spectral classes of regular, random, and empirical graphs. Linear Algebra Appl., 489:30-49, 2016.
P. Yanardag and S. V. N. Vishwanathan. Deep graph kernels. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1365-1374. ACM, 2015.
C. Cortes and V. Vapnik. Support-vector networks. Mach. Learn., 20(3):273-297, 1995.
N. Shervashidze, S. V. N. Vishwanathan, T. Petri, K. Mehlhorn, and K. M. Borgwardt. Efficient graphlet kernels for large graph comparison. In AISTATS, volume 5, pages 488-495, 2009.
I. Barnett, N. Malik, M. L. Kuijjer, P. J. Mucha, and J.-P. Onnela. Feature-based classification of networks. CoRR, abs/1610.05868, 2016.
M. Niepert, M. Ahmed, and K. Kutzkov. Learning convolutional neural networks for graphs. CoRR, abs/1605.05273, 2016.
N. Shervashidze, P. Schweitzer, E. J. van Leeuwen, K. Mehlhorn, and K. M. Borgwardt. Weisfeiler-Lehman graph kernels. J. Mach. Learn. Res., 12:2539-2561, 2011.
T. Hočevar and J. Demšar. A combinatorial approach to graphlet counting. Bioinformatics, 30(4):559-565, 2014.
R. P. Brent. An algorithm with guaranteed convergence for finding a zero of a function. Comput. J., 14(4):422-425, 1971.
M. Thüne. Eigenvalues of Matrices and Graphs. PhD thesis, University of Leipzig, 2013.
A. K. Debnath, R. L. Lopez de Compadre, G. Debnath, A. J. Shusterman, and C. Hansch. Structure-activity relationship of mutagenic aromatic and heteroaromatic nitro compounds. Correlation with molecular orbital energies and hydrophobicity. J. Med. Chem., 34(2):786-797, 1991.
K. M. Borgwardt, C. S. Ong, S. Schönauer, S. V. N. Vishwanathan, A. J. Smola, and H.-P. Kriegel. Protein function prediction via graph kernels.
Bioinformatics, 21(suppl 1):i47-i56, 2005.
M. Sugiyama and K. M. Borgwardt. Halting in random walk kernels. In Adv. Neu. In., pages 1639-1647, 2015.
P. D. Dobson and A. J. Doig. Distinguishing enzyme structures from non-enzymes without alignments. J. Mol. Biol., 330(4):771-783, 2003.
F. Pedregosa, G. Varoquaux, A. Gramfort, V. Michel, B. Thirion, O. Grisel, M. Blondel, P. Prettenhofer, R. Weiss, V. Dubourg, et al. Scikit-learn: machine learning in Python. Journal of Machine Learning Research, 12:2825-2830, 2011.
C.-C. Chang and C.-J. Lin. LIBSVM: a library for support vector machines. ACM Transactions on Intelligent Systems and Technology (TIST), 2(3):27, 2011.
S. Jayasumana, R. Hartley, M. Salzmann, H. Li, and M. Harandi. Kernel methods on Riemannian manifolds with Gaussian RBF kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 37(12):2464-2477, 2015.
R. Luss and A. d'Aspremont. Support vector machine classification with indefinite kernels. In Advances in Neural Information Processing Systems, pages 953-960, 2008.
B. Haasdonk. Feature space interpretation of SVMs with indefinite kernels. IEEE Transactions on Pattern Analysis and Machine Intelligence, 27(4):482-492, 2005.
E. N. Gilbert. Random plane networks. J. Soc. Ind. Appl. Math., 9(4):533-543, 1961.
A. L. Traud, P. J. Mucha, and M. A. Porter. Social structure of Facebook networks. Physica A, 391(16):4165-4180, 2012.
H. Jeong, B. Tombor, R. Albert, Z. N. Oltvai, and A.-L. Barabási. The large-scale organization of metabolic networks. Nature, 407(6804):651-654, 2000.
C. Stark, B.-J. Breitkreutz, T. Reguly, L. Boucher, A. Breitkreutz, and M. Tyers. BioGRID: a general repository for interaction datasets. Nucleic Acids Res., 34(suppl 1):D535-D539, 2006.
J. Das and H. Yu. HINT: high-quality protein interactomes and their applications in understanding human disease. BMC Syst. Biol., 6(1):92, 2012.
S. V. Rajagopala, P. Sikorski, A. Kumar, R. Mosca, J. Vlasblom, R. Arnold, J. Franca-Koh, S. B. Pakala, S. Phanse, A. Ceol, et al. The binary protein-protein interaction landscape of Escherichia coli. Nat. Biotechnol., 32(3):285-290, 2014.
L. Jure and A. Krevl. SNAP Datasets: Stanford large network dataset collection. http://snap.stanford.edu/data, June 2014.