A New Space for Comparing Graphs


Authors: Anshumali Shrivastava, Ping Li

Anshumali Shrivastava, Department of Computer Science, Computing and Information Science, Cornell University, Ithaca, NY 14853, USA. Email: anshu@cs.cornell.edu
Ping Li, Department of Statistics and Biostatistics, Department of Computer Science, Rutgers University, Piscataway, NJ 08854, USA. Email: pingli@stat.rutgers.edu

Abstract—Finding a new mathematical representation for graphs, one that allows direct comparison between different graph structures, is an open-ended research direction. Having such a representation is the first prerequisite for a variety of machine learning algorithms, such as classification and clustering, over graph datasets. In this paper, we propose a symmetric positive semidefinite matrix with the (i, j)-th entry equal to the covariance between the normalized vectors A^i e and A^j e (e being the vector of all ones) as a representation for a graph with adjacency matrix A. We show that the proposed matrix representation encodes the spectrum of the underlying adjacency matrix, and that it also contains information about the counts of small substructures present in the graph, such as triangles and small paths. In addition, we show that this matrix is a "graph invariant". All these properties make the proposed matrix a suitable object for representing graphs.

The representation, being a covariance matrix in a fixed-dimensional metric space, gives a mathematical embedding for graphs. This naturally leads to a measure of similarity on graph objects. We define the similarity between two given graphs as the Bhattacharyya similarity between their corresponding covariance matrix representations. As shown in our experimental study on the task of social network classification, such a similarity measure outperforms other widely used state-of-the-art methodologies. Our proposed method is also computationally efficient.
The computation of both the matrix representation and the similarity value can be performed in a number of operations linear in the number of edges. This makes our method scalable in practice. We believe our theoretical and empirical results provide evidence for studying truncated power iterations of the adjacency matrix to characterize social networks.

I. INTRODUCTION

The study of social networks is becoming increasingly popular. A whole new set of information about an individual is gained by analyzing the data derived from his/her social network. The personal social network of an individual, consisting only of neighbors and the connections between them, also known as the "ego network", has recently grabbed significant attention [20], [26]. This new view of the gigantic, incomprehensible social network as a collection of small, informative, overlapping ego networks generates a huge collection of graphs, which leads to a closer and more tractable investigation.

This enormous collection of ego networks, one centered at each user, opens doors for many interesting possibilities which were not explored before. For instance, consider the scientific collaboration ego network of an individual. It is known that collaboration follows different patterns across different fields [22]. Some scientific communities are more tightly linked among themselves compared to other fields having fewer dependencies among the collaborators. For instance, people working in experimental high energy physics are very much dependent on specialized labs worldwide (for example, CERN), and hence it is more likely that scientists in this field have a lot of collaboration among themselves. A collaboration network in such a scientific domain will exhibit a more densely connected structure compared to other fields where people prefer to work more independently.
The peculiarity in the collaboration network gets reflected in the ego network as well. For an individual belonging to a more tightly connected field, such as high energy physics, it is more likely that there is collaboration among the individual's coauthors. Thus, we can expect the collaboration ego network of an individual to contain information about the characteristics of his/her research. By utilizing this information, it should be possible to discriminate (classify) between scientists based on the ego networks of their collaborations. This information can be useful in many applications, for instance, in user-based recommendations [21], [11], recommending jobs [23], discovering new collaborations [4], and citation recommendations [12].

The focus of this paper is on social network classification or, equivalently, graph classification. The first prerequisite for classifying networks is having the "right" measure of similarity between different graph structures. Finding such a similarity measure is directly related to the problem of computing a meaningful mathematical embedding of network structures. In this work, we address this fundamental problem of finding an appropriate, tractable mathematical representation for graphs.

There are many theories that show the peculiarities of social networks [25], [2], [17]. For instance, it is known that the spectrum of the adjacency matrix of a real-world graph is very specific. In particular, it has been observed that scale-free graphs develop a triangle-like spectral density with a power-law tail, while small-world graphs have a complex spectral density consisting of several sharp peaks [9]. Despite such insight into social graph structures, finding a meaningful mathematical representation for these networks, where various graph structures can be directly compared or analyzed in a common space, is an understudied area.
Note that the eigenvalues of a graph, which characterize its spectrum, are not directly comparable. Moreover, the vector of eigenvalues as a feature vector does not live in a common space, because a larger graph will have more significant eigenvalues compared to a smaller graph.

Recently it was shown that representing graphs as a normalized frequency vector, by counting the number of occurrences of various small k-size subgraphs (k = 3 or 4), leads to an informative representation [24], [26]. It was shown that this representation naturally models known distinctive social network characteristics like "triadic closure". Computing the similarity between two graphs as the inner product between such frequency vector representations leads to state-of-the-art social network classification algorithms.

It is not clear that a histogram based only on counting small subgraphs sufficiently captures all the properties of a graph structure. Counting only small k-subgraphs (k = 3 or 4) loses information. It is also not very clear what size k provides the right tradeoff between computation and expressiveness. For instance, we observe (see Section VII) that k = 5 leads to an improvement over k = 4, but it comes with a significant computational cost. Although it is known that histograms based on counting subgraphs of size k can be reasonably approximated by sampling a few induced subgraphs of size k, counting subgraphs with k ≥ 5 is still computationally expensive, because it requires testing each sampled subgraph against the representative set of graphs for isomorphism (see Section VII). Finding other rich representations for graphs, which aptly capture their behavior and are also computationally inexpensive, is an important research direction.
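For intuition about how small-substructure counts relate to the adjacency matrix (a connection made precise later, in Theorem 3), note that the number of triangles can be read off the trace of A^3. The following NumPy sketch illustrates this on a toy graph of our own choosing (graph and variable names are ours, for illustration only):

```python
import numpy as np

# Toy undirected graph: a triangle {0, 1, 2} plus a pendant edge (2, 3).
edges = [(0, 1), (1, 2), (0, 2), (2, 3)]
n = 4
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

# A closed walk of length 3 exists only around a triangle; each triangle is
# traversed from 3 starting nodes in 2 directions, so trace(A^3) = 6 * #triangles.
num_triangles = int(round(np.trace(A @ A @ A) / 6))
print(num_triangles)  # -> 1
```

The same diagonal-versus-off-diagonal accounting of A^2 and A^3 entries is what drives the substructure counts appearing in Theorem 3.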
One challenge in meaningfully representing graphs in a common space is the basic requirement that isomorphic graphs should map to the same object. Features based on counting substructures, for example the frequency of subgraphs, satisfy this requirement by default, but ensuring this property is not trivial if we take a non-counting-based approach.

Our Contributions: We take an alternate route and characterize a graph based on the truncated power iteration of the corresponding adjacency matrix A, starting with the vector of all ones, denoted by e. Such a power iteration generates the vector A^i e in the i-th iteration. We argue that the covariance between vectors of the form A^i e and A^j e, for given i and j, is an informative feature for a given graph. We show that these covariances are "graph invariants". They also contain information about the spectrum of the adjacency matrix, which is an important characteristic of a random graph [5]. In addition, taking an altogether different view, it can be shown that these covariances are also related to the counts of small local structures in the given graph.

Instead of a histogram-based feature vector representation, we represent a graph as a symmetric positive semidefinite covariance matrix C_A whose (i, j)-th entry is the covariance between the vectors A^i e and A^j e. To the best of our knowledge, this is the first representation of its kind. We further compute the similarity between two given graphs as the standard Bhattacharyya similarity between the corresponding covariance matrix representations. Our proposal follows a simple procedure involving only matrix-vector multiplications and summations. The entire procedure can be computed in time linear in the number of edges, which makes our approach scalable in practice.
Similarity based on this new representation outperforms existing methods on the task of real social network classification. For example, similarity based on the histogram representation, obtained by counting the number of small subgraphs, performs poorly compared to the proposed measure. These encouraging results provide motivation for studying power iterations of the adjacency matrix for social network analysis.

In addition to the above contributions, this paper provides some interesting insights in the domain of collaboration networks. We show that it is possible to distinguish researchers working in different experimental physics sub-domains based only on the ego network of the researcher's scientific collaborations. To the best of our knowledge, this is the first work that explores the information contained in the ego network of scientific collaborations. The results presented could be of independent interest in themselves.

II. NOTATIONS AND RELATED CONCEPTS

The focus of this work is on undirected, unweighted and connected graphs. Any graph G = {V, E}, with |V| = n and |E| = m, is represented by an adjacency matrix A ∈ R^{n×n}, where A_{i,j} = 1 if and only if (i, j) ∈ E. For a matrix A, we use A_{(i),(:)} ∈ R^{1×n} to denote the i-th row of A, while A_{(:),(j)} ∈ R^{n×1} denotes its j-th column. We use e to denote the vector with all components equal to 1; the dimension of e will be implicit, depending on the operation. Vectors are by default column vectors (R^{n×1}). The transpose of a matrix A is denoted by A^T, defined as (A^T)_{i,j} = A_{j,i}. For a vector v, we use v(i) to denote its i-th component.

Two graphs G and H are isomorphic if there is a bijection between the vertex sets of G and H, f : V(G) → V(H), such that any two vertices u, v ∈ V(G) are adjacent in G if and only if f(u) and f(v) are adjacent in H.
Every permutation π : {1, 2, ..., n} → {1, 2, ..., n} is associated with a corresponding permutation matrix P. Left-multiplying a matrix A by P shuffles its rows according to π, while right multiplication with P shuffles its columns, i.e., PA can be obtained by shuffling the rows of A under π, and AP can be obtained by shuffling the columns of A under π. Given an adjacency matrix A, the graphs corresponding to the adjacency matrices A and PAP^T are isomorphic, i.e., they represent the same graph structure. A property of a graph which does not change under a reordering of its vertices is called a graph invariant.

For an adjacency matrix A, let λ_1 ≥ λ_2 ≥ ... ≥ λ_n be the eigenvalues and v_1, v_2, ..., v_n the corresponding eigenvectors. We denote the component-wise sums of the eigenvectors by s_1, s_2, ..., s_n, i.e., s_i is the component-wise sum of v_i.

A path p of length L is a sequence of L + 1 vertices {v_1, v_2, ..., v_{L+1}} such that there exists an edge between any two consecutive terms of the sequence, i.e., (v_i, v_{i+1}) ∈ E for all i ∈ {1, 2, ..., L}. An edge x belongs to a path p = {v_1, v_2, ..., v_{L+1}} if there exists i such that x = (v_i, v_{i+1}). In our analysis, we allow paths with repeated nodes, i.e., we will encounter paths where v_i = v_j for some i ≠ j. A path is called "simple" if there is no such repetition of nodes; formally, a simple path of length L is a path of length L such that v_i ≠ v_j whenever i ≠ j. Two paths p and q are different if there exists an edge e such that either (e ∈ p and e ∉ q) or (e ∈ q and e ∉ p) holds, i.e., there exists an edge contained in one of the paths but not in the other.
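The row/column shuffling behavior of permutation matrices can be checked directly. The sketch below (a random adjacency matrix and variable names of our own) verifies the identities stated above in NumPy:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
# A random symmetric 0/1 adjacency matrix with an empty diagonal.
A = np.triu((rng.random((n, n)) < 0.4).astype(float), 1)
A = A + A.T

# The permutation pi as a matrix: row i of P is the pi[i]-th standard basis vector.
pi = rng.permutation(n)
P = np.eye(n)[pi]

# Left multiplication shuffles rows; right multiplication by P^T shuffles columns.
assert np.array_equal(P @ A, A[pi, :])
assert np.array_equal(A @ P.T, A[:, pi])
# P A P^T relabels the vertices: entry (i, j) becomes A[pi[i], pi[j]].
assert np.array_equal(P @ A @ P.T, A[np.ix_(pi, pi)])
# P is orthogonal, so P^T = P^{-1} (a fact used in the proof of Theorem 1).
assert np.array_equal(P.T @ P, np.eye(n))
print("permutation identities hold")
```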
We denote the number of distinct "simple paths" of length L in a given graph by P_L, and the total number of triangles by ∆. For clarity, we will use [ ] to highlight scalar quantities, such as [e^T A e].

III. GRAPHS AS A POSITIVE SEMIDEFINITE MATRIX

A graph is fully described by its adjacency matrix. A good characterization of a matrix operator is a small history of its power iteration. The power iteration of a matrix A ∈ R^{n×n} on a given starting vector v ∈ R^{n×1} computes the normalized vector A^i v ∈ R^{n×1} in the i-th iteration. In one of the early results [16], it was shown that the characteristic polynomial of a matrix can be computed using the set of vectors generated from its truncated power iterations, i.e., {v, Av, A^2 v, ..., A^k v}. This set of vectors is more commonly known as the "k-order Krylov subspace" of the matrix A. The Krylov subspace leads to some of the fast linear algebraic algorithms for sparse matrices. In the web domain, power iterations are used in well-known algorithms including Pagerank and HITS [14]. It is also known [19] that a truncated power iteration of the data similarity matrix leads to an informative feature representation for clustering. Thus, the k-order Krylov subspace, for some appropriately chosen k, contains sufficient information to describe the associated matrix.

To represent graphs in a common mathematical space, it is a basic requirement that two isomorphic graphs should map to the same object. Although the k-order Krylov subspace characterizes the adjacency matrix, it cannot be directly used as a common representation for the associated graph, because it is sensitive to the reordering of nodes. Given a permutation matrix P, the k-order Krylov subspaces of A and PAP^T can be very different. In other words, the mapping M : A → {v, Av, A^2 v, ..., A^k v} is not a "graph invariant" mapping.
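To see concretely that the Krylov vectors are not graph invariants, consider a 4-node path graph and a relabeling of its vertices. This is a minimal sketch under our own choice of graph and permutation:

```python
import numpy as np

# Path graph 0-1-2-3; the degree sequence, read off A e, is [1, 2, 2, 1].
n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

# Relabel the vertices by swapping nodes 0 and 1.
pi = np.array([1, 0, 2, 3])
P = np.eye(n)[pi]
B = P @ A @ P.T              # adjacency matrix of the same graph, relabeled

e = np.ones(n)
Ae, Be = A @ e, B @ e        # first vectors of the two power iterations
# The raw Krylov vectors differ, so A -> {e, Ae, A^2 e, ...} is not graph invariant ...
assert not np.array_equal(Ae, Be)
# ... the relabeling merely permutes their entries: (P A P^T)^i e = P A^i e.
assert np.array_equal(Be, P @ Ae)
A2e = np.linalg.matrix_power(A, 2) @ e
B2e = np.linalg.matrix_power(B, 2) @ e
assert np.array_equal(B2e, P @ A2e)
print("Krylov vectors are permuted, not preserved")
```

That the entries are only shuffled, never changed in value, is what makes entry-order-independent statistics of these vectors attractive, as developed next.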
Note that A and PAP^T represent the same graph structure with a different ordering of the nodes, and hence are the same entity from the graph perspective, but not from the matrix perspective. It turns out that if we use v = e, the vector of all ones, then the covariances between the different vectors in the power iteration are "graph invariant" (see Theorem 1), i.e., their values do not change under a spurious reordering of the nodes.

We start by defining our covariance matrix representation for a given graph, and the algorithm to compute it. In later sections we will argue why such a representation is suitable for discriminating between graph structures. Given a graph with adjacency matrix A ∈ R^{n×n} and a fixed number k, we compute the first k terms of the power iteration, which generates normalized vectors of the form A^i e, i ∈ {1, 2, ..., k}. Since we start with e, we choose to normalize each vector to sum to n, for ease of analysis. After generating the k vectors, we compute the matrix C_A ∈ R^{k×k}, where C_A(i, j) = Cov( n A^i e / ||A^i e||_1 , n A^j e / ||A^j e||_1 ), as summarized in Algorithm 1.

Algorithm 1 CovarianceRepresentation(A, k)
Input: adjacency matrix A ∈ R^{n×n}; k, the number of power iterations.
Initialize x_0 = e ∈ R^{n×1}
for t = 1 to k do
    M_{(:),(t)} = n · A x_{t−1} / ||A x_{t−1}||_1 ;  x_t = M_{(:),(t)}
end for
μ = e ∈ R^{k×1}
C_A = (1/n) Σ_{i=1}^{n} (M_{(i),(:)} − μ)(M_{(i),(:)} − μ)^T
return C_A ∈ R^{k×k}

Algorithm 1 maps a given graph to a positive semidefinite matrix, which is a graph invariant.

Theorem 1: C_A is symmetric positive semidefinite. For any permutation matrix P we have C_A = C_{PAP^T}, i.e., C_A is a graph invariant.

Proof: C_A is the sample covariance matrix of M ∈ R^{n×k}, and hence is symmetric positive semidefinite. Using the identity P^T = P^{−1}, it is not difficult to show that for any permutation matrix P, (PAP^T)^k = P A^k P^T.
This, along with the fact that P^T e = e, yields

(PAP^T)^i e = P · A^i e.    (1)

Thus, C_{PAP^T}(i, j) = Cov(P · A^i e, P · A^j e). The proof follows from the fact that shuffling two vectors under the same permutation does not change the value of the covariance between them, i.e., Cov(x, y) = Cov(P · x, P · y), which implies C_A(i, j) = C_{PAP^T}(i, j) for all i, j.

Note that the converse of Theorem 1 is not true. We cannot hope for it, because then we would have solved the intractable Graph Isomorphism Problem using this tractable matrix representation. For example, consider the adjacency matrix of a regular graph. It has e as one of its eigenvectors, with eigenvalue equal to d, the constant degree of the regular graph. So we have A^i e = d^i e and Cov(d^i e, d^j e) = 0. Thus, all regular graphs are mapped to the same zero matrix. Perfectly regular graphs never occur in practice; there is always some variation in the degree distribution of real-world graphs. For non-regular graphs, i.e., when e is not an eigenvector of the adjacency matrix, we will show in Section IV that the proposed C_A representation is informative.

Alternate Motivation: Graphs as a Set of Vectors. There is an alternate way to motivate this representation and Theorem 1. At time t = 0, we start with a value of 1 on each of the nodes. At every time step t, we update the value on each node to the sum of the values, from time t − 1, on each of its neighbors. It is not difficult to show that under this process, node i holds the value (A^t e)(i) at time step t. These kinds of updates are key in many link analysis algorithms, including Hyper-text Induced Topic Search (HITS) [14]. Ignoring normalization, the sequence of numbers obtained over time by such a process on node i corresponds to row i of the matrix M. Eq. (1) simply tells us that reordering the nodes under any permutation does not affect the sequence of numbers generated on each node. Hence, we can associate a set of n vectors, the n rows of M ∈ R^{n×k}, with the graph G. This set of vectors does not change with a reordering of the nodes; the vectors just shuffle among themselves.

We are therefore looking for a mathematical representation that describes this set of n (k-dimensional) vectors. Probability distributions, in particular the Gaussian, are a natural way to model a set of vectors [15]. The idea is to find the maximum likelihood Gaussian distribution fitting the given set of vectors and use this distribution, a mathematical object, as the required representation. Note that this distribution is invariant under the ordering of the vectors, and hence we get Theorem 1. The central component of a multivariate Gaussian distribution is its covariance matrix, and this naturally motivates us to study the object C_A, the covariance matrix of the row vectors of M associated with the graph.

IV. MORE PROPERTIES OF MATRIX C_A

In this section, we argue that C_A encodes key features of the given graph, making it an informative representation. In particular, we show that C_A contains information about the spectral properties of A as well as the counts of small substructures present in the graph. We assume that the graph is not perfectly regular, i.e., e is not one of the eigenvectors of A. This is a reasonable assumption, because in real networks there are always fluctuations in the degree distribution.

We first start by showing connections between the matrix C_A and the spectral properties of A. See Section II for the notation, for example, λ_t and s_t.

Theorem 2:

C_A(i, j) = [ n ( Σ_{t=1}^{n} λ_t^{i+j} s_t^2 ) / ( ( Σ_{t=1}^{n} λ_t^i s_t^2 )( Σ_{t=1}^{n} λ_t^j s_t^2 ) ) ] − 1

Proof: The mean of the entries of the vector A^i e can be written as [e^T A^i e] / n.
With this observation, the covariance between the normalized vectors A^i e and A^j e (which is equal to C_A(i, j)) can be written as

Cov(A^i e, A^j e)
= (1/n) ( n A^i e / [e^T A^i e] − e )^T ( n A^j e / [e^T A^j e] − e )
= (1/n) ( n^2 [e^T A^{i+j} e] / ( [e^T A^i e][e^T A^j e] ) − n − n + e^T e )
= n [e^T A^{i+j} e] / ( [e^T A^i e][e^T A^j e] ) − 1

Thus, we have

C_A(i, j) = n [e^T A^{i+j} e] / ( [e^T A^i e][e^T A^j e] ) − 1    (2)

To compute [e^T A^i e], we use the fact that the vector A^i e can be written in terms of the eigenvalues and eigenvectors of A as

A^i e = [s_1 λ_1^i] v_1 + [s_2 λ_2^i] v_2 + ... + [s_n λ_n^i] v_n.    (3)

This follows from the representation of e in the eigenbasis of A, i.e., e = s_1 v_1 + s_2 v_2 + ... + s_n v_n. Using the eigenvector property A^i v_t = λ_t^i v_t, we have

[e^T A^i e] = Σ_{t=1}^{n} λ_t^i s_t [e^T v_t] = Σ_{t=1}^{n} λ_t^i s_t^2

Substituting this value for the terms [e^T A^i e] in Eq. (2) leads to the desired expression.

Remarks on Theorem 2: We can see that the different elements of the matrix C_A are ratios of polynomial expressions in λ_t and s_t. Given C_A, recovering the values of λ_t and s_t for all t boils down to solving a set of nonlinear polynomial equations of the form given in Theorem 2 for different values of i and j. For a given value of k, we obtain a set of k(k+1)/2 such equations. Although it may be hard to characterize the solution of this set of equations, we cannot expect many combinations of λ_t and s_t to satisfy all such equations for a reasonably large value of k(k+1)/2. Thus, C_A can be thought of as an almost lossless encoding of λ_t and s_t for all t.

It is known that there is a sharp concentration of the eigenvalues of the adjacency matrix A for random graphs [5]. The eigenvalues of the adjacency matrix of a random Erdos-Renyi graph follow Wigner's semicircle law [30], while for power-law graphs these eigenvalues obey a power law [5].
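The identity in Theorem 2 can be checked numerically. The sketch below implements Algorithm 1 in NumPy (function and variable names are ours) and compares the result against the spectral formula on a small non-regular graph of our own choosing:

```python
import numpy as np

def covariance_representation(A, k):
    """Algorithm 1: covariance matrix of the first k normalized power-iteration vectors."""
    n = A.shape[0]
    x = np.ones(n)
    M = np.empty((n, k))
    for t in range(k):
        x = A @ x
        x = n * x / np.abs(x).sum()   # normalize to sum n (entries are nonnegative)
        M[:, t] = x
    D = M - 1.0                       # each column averages to 1, so mu = e
    return D.T @ D / n                # entry (i, j) = Cov of columns i+1 and j+1

# Small non-regular example graph: a triangle {0, 1, 2} plus a pendant edge (2, 3).
n = 4
A = np.zeros((n, n))
for i, j in [(0, 1), (1, 2), (0, 2), (2, 3)]:
    A[i, j] = A[j, i] = 1.0

k = 3
C = covariance_representation(A, k)

# Theorem 2: C_A(i, j) = n (sum_t lam_t^{i+j} s_t^2) /
#   ((sum_t lam_t^i s_t^2)(sum_t lam_t^j s_t^2)) - 1,
# where s_t is the component-wise sum of the t-th eigenvector of A.
lam, V = np.linalg.eigh(A)
s2 = V.sum(axis=0) ** 2

def theorem2(i, j):
    return n * (lam ** (i + j) * s2).sum() / (
        (lam ** i * s2).sum() * (lam ** j * s2).sum()) - 1.0

for i in range(1, k + 1):
    for j in range(1, k + 1):
        assert np.isclose(C[i - 1, j - 1], theorem2(i, j))
print("Theorem 2 verified on this graph")
```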
These peculiar distributions of the eigenvalues are captured in the elements C_A(i, j), which are ratios of different polynomials in the λ_t. Hence we can expect the C_A representations of graphs having different spectra to be very different.

In Theorem 2, we showed that the representation C_A is tightly linked with the spectrum of the adjacency matrix A, which is an important characteristic of the given graph. It is further known that the counts of various small local substructures contained in the graph, such as the number of triangles, the number of small paths, etc., are also important features [26]. We next show that the matrix C_A is actually sensitive to these counts.

Theorem 3: Given the adjacency matrix A of an undirected graph with n nodes and m edges, we have

C_A(1, 2) = (n / (2m)) · ( 3∆ + P_3 + n Var(deg) + m (4m/n − 1) ) / ( P_2 + m ) − 1

where ∆ denotes the total number of triangles, P_3 is the total number of distinct simple paths of length 3, P_2 is the total number of distinct simple paths of length 2, and

Var(deg) = (1/n) Σ_{i=1}^{n} deg(i)^2 − ( (1/n) Σ_{i=1}^{n} deg(i) )^2

is the variance of the degrees.

Proof: From Eq. (2), we have

C_A(1, 2) = n [e^T A^3 e] / ( [e^T A e][e^T A^2 e] ) − 1    (4)

The term [e^T A e] is the sum of all elements of the adjacency matrix A, which is equal to twice the number of edges. So,

[e^T A e] = 2m.    (5)

We still need to quantify the terms [e^T A^2 e] and [e^T A^3 e]. This quantification is provided in the two lemmas below.

Lemma 1: [e^T A^2 e] = 2m + 2P_2.

Proof: We start with the simple observation that the value of A^2_{i,j} is equal to the number of paths of length 2 between i and j. Thus [e^T A^2 e], which is the sum of all the elements of A^2, counts every possible path of length 2 in the (undirected) graph twice. We also have to count paths of length 2 with repeated nodes, because undirected edges go both ways.
There are two possible types of paths of length 2, as shown in Figure 1: i) node-repeated paths of length 2, and ii) simple paths of length 2, having no node repetitions. A node-repeated path of length 2 has only one possibility: it must be a loop of length 2, which is just an edge, as shown in Figure 1(a). The total contribution of such node-repeated paths (or edges) to [e^T A^2 e] is 2m. By our notation, the total number of simple paths of length 2 (Figure 1(b)) in the given graph is P_2. The two sets of paths are disjoint. Thus, we have [e^T A^2 e] = 2m + 2P_2, as required.

Fig. 1. Possible types of paths of length 2; each of these two structures is counted twice in the expression [e^T A^2 e]. a) (Node-repeated paths): every edge leads to two paths, P → Q → P and Q → P → Q. b) (Simple paths): every simple path of length 2 is counted twice; here P → Q → R and R → Q → P are the two paths contributing to the term [e^T A^2 e].

Lemma 2: [e^T A^3 e] = 6∆ + 2P_3 + 2n Var(deg) + 2m (4m/n − 1), where

Var(deg) = (1/n) Σ_{i=1}^{n} deg(i)^2 − ( (1/n) Σ_{i=1}^{n} deg(i) )^2

Proof: Along the same lines as Lemma 1, A^3_{i,j} counts the number of different paths of length 3 between i and j. There are three different kinds of paths of length 3, as explained in Figure 2, which we need to consider. We can count the contribution from each of these types independently, as their contributions do not overlap, and so there is no double counting. Again, [e^T A^3 e] is twice the sum of the total number of all such paths.

Simple paths: Just as in Lemma 1, any simple path without node repetition (Figure 2(c)) will be counted twice in the term [e^T A^3 e]. Their total contribution to [e^T A^3 e] is 2P_3, where P_3 is the total number of simple paths of length 3.

Triangles: A triangle is the only possible loop of length 3 in the graph, and it is counted 6 times in the term [e^T A^3 e].
There are two orientations in which a triangle can be counted from each of the three participating nodes, causing a factor of 6. For instance, in Figure 2(b), from node P there are 2 loops of length 3 back to itself, P → R → Q → P and P → Q → R → P. There are 2 such loops for each of the contributing nodes Q and R as well. Thus, if ∆ denotes the number of different triangles in the graph, then this type of structure contributes 6∆ to the term [e^T A^3 e].

Node-repeated paths: A peculiar set of paths of length 3 is generated by each edge (i, j). In Figure 2(a), consider nodes P and Q; there are many paths of length 3 with repeated nodes between P and Q. To go from P to Q, we can choose any neighbor of Q, say V, and then there is a corresponding path P → Q → V → Q. We can also choose any neighbor of P, say R, and we have a path P → R → P → Q of length 3. Thus, given an edge (i, j), the total number of node-repeated paths of length 3 is

NodeRepeatedPath(i, j) = deg(i) + deg(j) − 1.

Note that the path P → Q → P → Q would otherwise be counted twice, and therefore we subtract 1. Thus, the total contribution of these kinds of paths to the term [e^T A^3 e] is

Σ_{(i,j) ∈ E} ( deg(i) + deg(j) − 1 ).

Fig. 2. All possible types of paths of length 3, from different structures contributing to the term [e^T A^3 e]. a) (Node-repeated paths): there are 6 paths of length 3 from P to Q (and vice versa) with repeated nodes, like P → Q → V → Q. The total number of such paths of length 3 due to the edge between i and j is equal to deg(i) + deg(j) − 1. b) (Triangles): a triangle is counted 6 times in the expression [e^T A^3 e], in two different orientations from each of its three nodes. c) (Simple paths): a simple path of length 3 with no node repetition is counted twice in [e^T A^3 e].
Since the graph is undirected, (i, j) ∈ E implies (j, i) ∈ E, so we do not have to use a factor of 2 as we did in the other cases. We have

Σ_{(i,j) ∈ E} ( deg(i) + deg(j) − 1 )
= Σ_{i=1}^{n} Σ_{j ∈ Ngh(i)} ( deg(i) + deg(j) − 1 )
= Σ_{i=1}^{n} ( deg(i)^2 − deg(i) ) + Σ_{i=1}^{n} Σ_{j ∈ Ngh(i)} deg(j)
= 2 Σ_{i=1}^{n} deg(i)^2 − Σ_{i=1}^{n} deg(i)

Adding the contributions of all possible types of paths and using Σ_{i=1}^{n} deg(i) = 2m yields Lemma 2 after some algebra.

Substituting the terms [e^T A^2 e] and [e^T A^3 e] from Lemmas 1 and 2 into Eq. (4) leads to the desired expression.

Remarks on Theorem 3: From its proof, it is clear that terms of the form [e^T A^t e], for small values of t like 2 or 3, are weighted combinations of the counts of small substructures, like triangles and small paths, along with global features like the degree variance. The key observation behind the proof is that A^t_{i,j} counts paths (with repeated nodes and edges) of length t, which in turn can be decomposed into disjoint structures over t + 1 nodes and counted separately. Extending this analysis to t > 3 involves dealing with more complicated, bigger patterns. For instance, while computing the term [e^T A^4 e], we will encounter counts of quadrilaterals along with more complex patterns. The representation C_A is informative in that it captures all such information and is sensitive to the counts of these different substructures present in the graph.

Empirical Evidence for Theorem 3: To empirically validate Theorem 3, we took the publicly available Twitter graphs (http://snap.stanford.edu/data/egonets-Twitter.html), which consist of around 950 ego networks of users on Twitter [20]. These graphs have around 130 nodes and 1700 edges on average. We computed the value of C_A(1, 2) for each graph (and the mean and standard error).
I n addition, fo r ea ch twitter graph, we also g enerated a correspon ding random graph with same n umber of no des and edg es. T o generate a ran dom graph, 1 http:/ /snap.stanfor d.edu/data/e gonets-T witter .html we start with the req uired num ber of no des and th en select two nodes at ran dom and add an edge between th em. The pro cess is repea ted until the graph has the same nu mber of edges as the twitter gr aph. W e then com pute the value of Σ A e (1 , 2) for all these ge nerated rand om graph s. The mean ( ± standard erro r , SE) value of Σ A e (1 , 2) fo r twitter graph s is 0.61 88 ± 0 .0099 , while for the rand om grap hs this value is 0 .0640 ± 0.0 033. The m ean ( ± SE) nu mber of triangle fo r twitter ego network is 14384 .16 ± 819.39, while that for rand om graphs is 4578. 89 ± 406. 54. It is known that social n etwork graphs h av e a hig h value of triadic closure pr obab ility compared to rand om graphs [8]. For any 3 rando mly chosen vertices A, B an d C in the grap h, triadic closure probab ility (common fr iendships induce new fr iendships) is a pr obability of having an edge AC condition al on the e vent that the graph alread y has edges AB and BC. Social network graphs hav e mor e triang les com pared to a rand om graph . Thus, Theorem 3 suggests that the value of Σ A e (1 , 2) would be high for a social network g raph co mpared to a r andom grap h with same numbe r o f nodes and edges. Combining Theo rems 2 and 3, we can infer that ou r propo sed representatio n C A encodes important informatio n to discriminate b etween d ifferent network structur es. Th eorem 1 tells us that this o bject is a g raph inv ar iant and a covariance matrix in a fixed d imensional space. Henc e C A is directly compara ble between different g raph stru ctures. V . 
V. SIMILARITY BETWEEN GRAPHS

Given a fixed $k$, we have a representation for graphs in a common mathematical space, the space of symmetric positive semidefinite matrices $S^{k \times k}$, whose mathematical properties are well understood. In particular, there are standard notions of similarity between such matrices. We define the similarity between two graphs, with adjacency matrices $A \in \mathbb{R}^{n_1 \times n_1}$ and $B \in \mathbb{R}^{n_2 \times n_2}$ respectively, as the Bhattacharyya similarity between the corresponding covariance matrices $C_A$ and $C_B$:

$$Sim(C_A, C_B) = e^{-Dist(C_A, C_B)} \qquad (6)$$

$$Dist(C_A, C_B) = \frac{1}{2} \log\left( \frac{\det(\Sigma)}{\sqrt{\det(C_A)\det(C_B)}} \right), \qquad \Sigma = \frac{C_A + C_B}{2}$$

Here, $\det()$ is the determinant. Note that $C_A \in \mathbb{R}^{k \times k}$ and $C_B \in \mathbb{R}^{k \times k}$ are computed using the same value of $k$. We summarize the procedure for computing the similarity between two graphs with adjacency matrices A and B in Algorithm 2.

Algorithm 2 ComputeSimilarity(A, B, k)
Input: Adjacency matrices $A \in \mathbb{R}^{n_1 \times n_1}$ and $B \in \mathbb{R}^{n_2 \times n_2}$; $k$, the number of power iterations.
  $C_A$ = CovarianceRepresentation(A, k)
  $C_B$ = CovarianceRepresentation(B, k)
  return $Sim(C_A, C_B)$ computed using Eq. (6)

Theorem 4: The similarity $Sim(C_A, C_B)$, defined between graphs with adjacency matrices A and B, is positive semidefinite and is a valid kernel.

This follows from the fact that the Bhattacharyya similarity is positive semidefinite. Thus, the similarity function defined in Eq. (6) is a valid kernel [13] and hence can be used directly in existing machine learning algorithms operating over kernels, such as SVM. We will see the performance of this kernel on the task of social network classification in Section VI.
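Algorithm 2 can be sketched in a few lines of NumPy. The exact normalization inside CovarianceRepresentation is an assumption here (each power-iteration vector $A^i e$ is scaled to unit norm before the $k \times k$ covariance is formed); the Bhattacharyya part follows Eq. (6) directly, using `slogdet` for numerical stability.

```python
import numpy as np

def covariance_representation(A, k):
    """Assumed sketch of Algorithm 1: normalized power-iteration
    vectors A e, A^2 e, ..., A^k e, centered, as a k x k covariance."""
    n = A.shape[0]
    X = np.empty((k, n))
    x = np.ones(n)
    for i in range(k):
        x = A @ x                      # sparse mat-vec in practice: O(m)
        x = x / np.linalg.norm(x)
        X[i] = x
    Xc = X - X.mean(axis=1, keepdims=True)
    return (Xc @ Xc.T) / n             # sum of n outer products: O(n k^2)

def compute_similarity(A, B, k=4):
    """Algorithm 2: Bhattacharyya similarity of Eq. (6) between C_A, C_B."""
    CA, CB = covariance_representation(A, k), covariance_representation(B, k)
    _, ld_s = np.linalg.slogdet((CA + CB) / 2.0)
    _, ld_a = np.linalg.slogdet(CA)
    _, ld_b = np.linalg.slogdet(CB)
    dist = 0.5 * (ld_s - 0.5 * (ld_a + ld_b))
    return float(np.exp(-dist))
```

By construction $C_A$ is symmetric positive semidefinite, and the similarity of a graph with itself is 1.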
Although $C_A$ is determined by the spectrum of the adjacency matrix A, we will see in Section VI-C that simply taking a feature vector of graph invariants, such as eigenvalues, and computing vector inner products is not the right way to compute similarity between graphs. It is crucial to account for the fact that we are working in the space of positive semidefinite covariance matrices; a similarity measure should exploit the mathematical structure of the space under consideration.

A. Range of Values for k

Our representation space, the space of symmetric positive semidefinite matrices $S^{k \times k}$, depends on the choice of $k$. In general, we only need to look at small values of $k$. It is known that the power iteration converges to the largest eigenvector of the matrix at a geometric rate of $\frac{\lambda_2}{\lambda_1}$, and hence the covariance between normalized $A^i e$ and $A^j e$ converges to a constant very quickly as $i$ and $j$ increase. Thus, large values of $k$ make the matrix singular and hurt the representation. We therefore want $k$ to be reasonably small to avoid singularity of the matrix $C_A$. The exact choice of $k$ depends on the dataset under consideration. We observe that $k = 4 \sim 6$ suffices in general.

B. Computation Complexity

For a chosen $k$, computing the set of vectors $\{Ae, A^2 e, A^3 e, ..., A^k e\}$ recursively, as done in Algorithm 1, has computation complexity $O(mk)$. Note that the number of nonzeros in matrix A is $2m$ and each operation inside the for-loop is a sparse matrix-vector multiplication, which has complexity $O(m)$. Computing $C_A$ requires the summation of $n$ outer products of vectors of dimension $k$, which has complexity $O(nk^2)$. The total complexity of Algorithm 1 is $O(mk + nk^2)$. Computing the similarity between two graphs, with adjacency matrices A and B, in addition requires computation of Eq.
(6), which involves computing determinants of $k \times k$ matrices. This operation has computational complexity $O(k^3)$. Let the numbers of nodes and edges in the two graphs be $(n_1, m_1)$ and $(n_2, m_2)$ respectively. Also, let $m = \max(m_1, m_2)$ and $n = \max(n_1, n_2)$. Computing similarity using Algorithm 2 requires $O(mk + nk^2 + k^3)$ computation time. As argued in Section V-A, the value of $k$ is always a small constant such as 4, 5, or 6. Thus, the total time complexity of computing the similarity between two graphs reduces to $O(m + n) = O(m)$ (as usually $m \geq n$). The most costly step is the matrix-vector multiplication, which can be easily parallelized, for example on GPUs, to obtain further speedups. This makes our proposal easily scalable in practice.

VI. SOCIAL NETWORK CLASSIFICATION

In this section, we demonstrate the usefulness of the proposed representation for graphs and the new similarity measure in some interesting graph classification tasks. We start by describing these tasks and the corresponding datasets.

TABLE I. GRAPH STATISTICS OF EGO-NETWORKS USED IN THE PAPER. THE "RANDOM" DATASETS CONSIST OF RANDOM ERDOS-RENYI GRAPHS (SEE SECTION VI-A FOR MORE DETAILS).

STATS | High Energy | Condensed Matter | Astro Physics | Twitter | Random
Number of Graphs | 1000 | 415 | 1000 | 973 | 973
Mean Number of Nodes | 131.95 | 73.87 | 87.40 | 137.57 | 137.57
Mean Number of Edges | 8644.53 | 410.20 | 1305.00 | 1709.20 | 1709.20
Mean Clustering Coefficient | 0.95 | 0.86 | 0.85 | 0.55 | 0.18

A. Tasks and Datasets

Finding publicly available datasets for graph classification tasks, with meaningful labels, is difficult in the domain of social networks. However, due to the increasing availability of many different network structures², we can create interesting and meaningful classification tasks.
We create two social network classification tasks from real networks.

1. Ego Network Classification in Scientific Collaboration (COLLAB): Different research fields have different collaboration patterns. For instance, researchers in experimental high energy physics depend on a few specialized labs worldwide (e.g., CERN). Because of this dependency on specialized labs, various research groups in such domains are tightly linked in terms of collaboration compared to other domains where more independent research is possible. It is an interesting task to classify the research area of an individual by taking into account the information contained in the structure of his/her ego collaboration network.

We used 3 public collaboration network datasets [18]: 1) the high energy physics collaboration network³, 2) the condensed matter physics collaboration network⁴, and 3) the astrophysics collaboration network⁵. These networks are generated from e-print arXiv and cover scientific collaborations between authors of papers submitted to the respective categories. If author i co-authored a paper with author j, the graph contains an undirected edge from i to j. If a paper is co-authored by p authors, this generates a completely connected subgraph on p nodes.

To generate meaningful ego networks from each of these huge collaboration networks, we select different users who have collaborated with more than 50 researchers and extract their ego networks. The ego network is the subgraph containing the selected node along with its neighbors and all the interconnections among them. We randomly choose 1000 such users from each of the high energy physics collaboration network and the astrophysics collaboration network. Since the condensed matter physics collaboration network only had 415 individuals with more than 50 neighbors, we take all 415 available ego networks.
In this way, we obtain 2415 undirected ego network structures. The basic statistics of these ego networks are summarized in Table I. We label each of the graphs according to which of the three collaboration networks it belongs to. Thus, our classification task is to take a researcher's ego collaboration network and determine whether he/she belongs to the high energy physics group, the condensed matter physics group, or the astrophysics group. This is a specific version of a more general problem that arises in social media: "how do audiences differ with respect to their social graph structure?" [1]. For better insight into performance, we break the problem class-wise into 4 different classification tasks: 1) classifying between high energy physics and condensed matter physics (COLLAB (HEP Vs CM)), 2) classifying between high energy physics and astrophysics (COLLAB (HEP Vs ASTRO)), 3) classifying between astrophysics and condensed matter physics (COLLAB (ASTRO Vs CM)), and 4) classifying among all three domains (COLLAB (Full)).

²http://snap.stanford.edu/data/
³http://snap.stanford.edu/data/ca-HepPh.html
⁴http://snap.stanford.edu/data/ca-CondMat.html
⁵http://snap.stanford.edu/data/ca-AstroPh.html

2. Social Network Classification (SOCIAL): It is known that social network graphs behave very differently from random Erdos-Renyi graphs [29]. In particular, a random Erdos-Renyi graph does not have the following two important properties observed in many real-world networks:

• They do not generate local clustering and triadic closures. Because they have a constant, random, and independent probability of two nodes being connected, Erdos-Renyi graphs have a low clustering coefficient.

• They do not account for the formation of hubs. Formally, the degree distribution of Erdos-Renyi random graphs converges to a Poisson distribution, rather than the power law observed in many real-world networks.
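The first bullet can be checked directly. The global clustering coefficient of a graph is 3 × (#triangles) / (#connected triples), where #triangles = trace(A³)/6 and #triples = Σᵢ deg(i)(deg(i)−1)/2. The NumPy sketch below is our own illustration (not from the paper); for an Erdos-Renyi graph the coefficient concentrates around the edge probability p, which is small for sparse graphs.

```python
import numpy as np

def global_clustering_coefficient(A):
    """3 * (#triangles) / (#connected triples) from the adjacency matrix."""
    deg = A.sum(axis=1)
    triangles = np.trace(A @ A @ A) / 6.0    # each triangle yields 6 closed 3-walks
    triples = np.sum(deg * (deg - 1)) / 2.0  # length-2 paths, counted by center
    return 3.0 * triangles / triples if triples > 0 else 0.0

# Erdos-Renyi G(n, p): the clustering coefficient concentrates around p
rng = np.random.default_rng(0)
n, p = 300, 0.05
U = np.triu((rng.random((n, n)) < p).astype(float), 1)
A = U + U.T
print(round(global_clustering_coefficient(A), 3))  # close to 0.05
```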
Thus, one reasonable task is to discriminate social network structures from random Erdos-Renyi graphs. We expect methodologies that capture properties like triadic closure and the degree distribution to perform well on this task.

We used the Twitter⁶ ego networks [20], a large public dataset of social ego networks, which contains around 950 ego networks of users from Twitter with a mean of around 130 nodes and 1700 edges per graph. Since we are interested only in the graph structure, these directed graphs were made undirected. We do not use any information other than the adjacency matrix of the graph structure. For each of the undirected Twitter graphs, we generated a corresponding random graph with the same number of nodes and edges. We start with the required number of nodes and then select two nodes at random and add an edge between them. This process is repeated until the graph has the same number of edges as the corresponding Twitter graph. We label these graphs according to whether they are a Twitter graph or a random graph. Thus, we have a binary classification task consisting of around 2000 graph structures. The basic statistics of this dataset are summarized in Table I.

⁶http://snap.stanford.edu/data/egonets-Twitter.html

B. Competing Methodologies

For a classification task it suffices to have a similarity measure (commonly known as a kernel) between two graphs that is positive semidefinite. Our evaluation consists of running standard kernel C-SVMs [3] for classification, based on the following six similarity measures.

The Proposed Similarity (PROPOSED): This is the proposed similarity measure. For the given two graphs, we compute the similarity between them using Algorithm 2. We show results for 3 fixed values of k = {4, 5, 6}.
4-Subgraph Frequency (SUBFREQ-4): Following [26], for each graph we first generate a feature vector of normalized frequencies of subgraphs of size four. It is known that the subgraph frequencies of arbitrarily large graphs can be accurately approximated by sampling a small number of induced subgraphs. In line with the recent work, we computed such a histogram by sampling 1000 random subgraphs over 4 nodes. We observe that 1000 is a stable sample size and increasing this number has almost no effect on the accuracy. This process generates a normalized histogram of dimension 11 for each graph, as there are 11 non-isomorphic graphs with 4 nodes (see [26] for more details). The similarity value between two graphs is the inner product between the corresponding 11-dimensional vectors.

5-Subgraph Frequency (SUBFREQ-5): The recent success of counting induced subgraphs of size 4 in the domain of social networks leads to a natural question: does counting all subgraphs of size 5 improve the accuracy over counting only subgraphs of size 4? To answer this, we also consider the histogram of normalized frequencies of subgraphs of size 5. As in the case of SUBFREQ-4, we sample 1000 random induced subgraphs of size 5 to generate a histogram representation. There are 34 non-isomorphic graphs on 5 nodes, so this procedure generates a vector of 34 dimensions, and the similarity between two graphs is the inner product between the corresponding 34-dimensional feature vectors. Even with sampling, this is an expensive task and takes significantly more time than SUBFREQ-4. The main reason is the increase in the number of isomorphic variants: matching a given sampled graph to one of the representative structures amounts to solving graph isomorphism over graphs of size 5, which is costly (see Section VII).
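A sketch of the sampling scheme for SUBFREQ-4 (a hypothetical helper, not the authors' code): sample 4-node subsets, take the induced subgraph, and bucket by isomorphism class. On 4 vertices the sorted degree sequence of the induced subgraph already determines its isomorphism class (all 11 classes have distinct degree sequences), so the matching step is trivial; on 5 vertices this shortcut fails and a genuine isomorphism test is needed, which is part of why SUBFREQ-5 is so much slower.

```python
import random
from collections import Counter

def subfreq4(adj, samples=1000, seed=0):
    """adj: dict mapping node -> set of neighbours.
    Returns normalized frequencies of induced 4-node subgraph classes,
    keyed by the sorted degree sequence of the induced subgraph
    (which uniquely identifies each of the 11 classes on 4 vertices)."""
    rng = random.Random(seed)
    nodes = list(adj)
    counts = Counter()
    for _ in range(samples):
        quad = rng.sample(nodes, 4)
        degs = tuple(sorted(sum(1 for v in quad if v != u and v in adj[u])
                            for u in quad))
        counts[degs] += 1
    return {cls: c / samples for cls, c in counts.items()}

# every induced 4-subgraph of the complete graph K6 is K4
k6 = {u: {v for v in range(6) if v != u} for u in range(6)}
print(subfreq4(k6, samples=200))  # {(3, 3, 3, 3): 1.0}
```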
3-Subgraph Frequency (SUBFREQ-3): To quantify the importance of size-4 subgraphs, we also compare with the histogram representation based on frequencies of subgraphs of size 3. There are 4 non-isomorphic graphs with 3 nodes, and hence here we generate a histogram of dimension 4. As counting subgraphs of size 3 is computationally cheap, we do not need sampling in this case. This simple representation is known to perform quite well in practice [24].

Random Walk Similarity (RW): Random walk similarity is one of the most widely used similarity measures over graphs [7], [27]. It is based on a simple idea: given a pair of graphs, perform random walks on both and count the number of matching walks. There is a rich literature on connections between this similarity and well-known similarity measures in different domains, such as Binet-Cauchy kernels for ARMA models [28], rational kernels [6], and r-convolution kernels [10]. The random walk similarity [28] between two graphs with adjacency matrices A and B is defined as

$$RWSim(A, B) = \frac{1}{n_1 n_2} e^T M e,$$

where $M$ is the solution of the Sylvester equation $M = e^{-\nu} A^T M B + ee^T$. This can be computed in closed form in $O(n^3)$ time. We use the standard recommendations for choosing the value of $\nu$.

Top-k Eigenvalues (EIGS): It is known that the eigenvalues of the adjacency matrix are among the most important graph invariants. Therefore it is worth considering the power of simply using the dominant eigenvalues. Note that we cannot take all eigenvalues because the total number of eigenvalues varies with the graph size. Instead, we take the top-k eigenvalues of the corresponding adjacency matrices and compute the normalized inner product between them. We show results for k = 5 (EIGS-5) and k = 10 (EIGS-10).
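The EIGS baseline can be sketched as follows; we assume "top-k" means the k algebraically largest eigenvalues and that the normalized inner product is the cosine between the two k-dimensional vectors (the text does not spell out these conventions).

```python
import numpy as np

def eigs_similarity(A, B, k=5):
    """Cosine similarity between the k largest eigenvalues of two
    symmetric adjacency matrices (assumed form of the EIGS baseline)."""
    ea = np.sort(np.linalg.eigvalsh(A))[::-1][:k]  # eigvalsh returns ascending
    eb = np.sort(np.linalg.eigvalsh(B))[::-1][:k]
    return float(ea @ eb / (np.linalg.norm(ea) * np.linalg.norm(eb)))
```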
C. Evaluations and Results

The evaluations consist of running kernel SVM on all the tasks using the six similarity measures described above, based on standard cross-validation estimation of classification accuracy. First, we split each dataset into 10 folds of identical size. We then combine 9 of these folds and again split them into 10 parts; we use the first 9 parts to train the kernel C-SVM [3] and use the 10th part as a validation set to find the best-performing value of $C$ from $\{10^{-7}, 10^{-6}, ..., 10^{7}\}$. With this fixed choice of $C$, we then train the C-SVM on all the 9 folds (from the initial 10) and predict on the 10th fold, which acts as an independent evaluation set. The procedure is repeated 10 times with each fold acting as the independent test set once. For each task, the procedure is repeated 10 times, randomizing over partitions. The mean classification accuracies and standard errors are shown in Table II. Since we have not tuned anything other than the "$C$" for SVM, the results are easily reproducible.

In these tasks, using our proposed representation and similarity measure outperforms all the competing state-of-the-art methods, mostly by a significant margin. This demonstrates that the covariance matrix representation captures sufficient information about the ego networks and is capable of discriminating between them. The accuracies for the three different values of $k$ are not much different from each other, except in some cases with $k = 6$. This is in line with the argument presented in Section V-A that large values of $k$ can hurt the representation. As long as $k$ is small and in the right range, slight variations in $k$ do not significantly change the performance. Ideally, $k$ can be tuned based on the dataset, but for easy replication of results we used 3 fixed choices of $k$.
We see that random walk similarity performs similarly to (and sometimes better than) SUBFREQ-3, which counts all the subgraphs of size 3. The performance of EIGS is very much like that of random walk similarity. As expected (e.g., from the recent work [26]), counting subgraphs of size 4 (SUBFREQ-4) always improves significantly over SUBFREQ-3. Interestingly, counting subgraphs of size 5 (SUBFREQ-5) improves significantly over SUBFREQ-4 on all tasks except the HEnP Vs CM task. This illustrates the sub-optimality of histograms obtained by counting very small graphs (k ≤ 4). Even with sampling, SUBFREQ-5 is an order of magnitude more expensive than the other methodologies.

TABLE II. PREDICTION ACCURACIES IN PERCENTAGE FOR THE PROPOSED AND THE STATE-OF-THE-ART SIMILARITY MEASURES ON DIFFERENT SOCIAL NETWORK CLASSIFICATION TASKS. THE REPORTED RESULTS ARE AVERAGED OVER 10 REPETITIONS OF 10-FOLD CROSS-VALIDATION. STANDARD ERRORS ARE INDICATED IN PARENTHESES. BEST RESULTS MARKED IN BOLD.

Methodology | COLLAB (HEnP Vs CM) | COLLAB (HEnP Vs ASTRO) | COLLAB (ASTRO Vs CM) | COLLAB (Full) | SOCIAL (Twitter Vs Random)
PROPOSED (k=4) | 98.06(0.05) | 87.70(0.13) | 89.29(0.18) | 82.94(0.16) | 99.18(0.03)
PROPOSED (k=5) | 98.22(0.06) | 87.47(0.04) | 89.26(0.17) | 83.56(0.12) | 99.43(0.02)
PROPOSED (k=6) | 97.51(0.04) | 82.07(0.06) | 89.65(0.09) | 82.87(0.11) | 99.48(0.03)
SUBFREQ-5 | 96.97(0.04) | 85.61(0.1) | 88.04(0.14) | 81.50(0.08) | 99.42(0.03)
SUBFREQ-4 | 97.16(0.05) | 82.78(0.06) | 86.93(0.12) | 78.55(0.08) | 98.30(0.08)
SUBFREQ-3 | 96.38(0.03) | 80.35(0.06) | 82.98(0.12) | 73.42(0.13) | 89.70(0.04)
RW | 96.12(0.07) | 80.43(0.14) | 85.68(0.03) | 75.64(0.09) | 90.23(0.06)
EIGS-5 | 94.85(0.18) | 77.69(0.24) | 83.16(0.47) | 72.02(0.25) | 90.74(0.22)
EIGS-10 | 96.92(0.21) | 78.15(0.17) | 84.60(0.27) | 72.93(0.19) | 92.71(0.15)
As shown in the next section, with increasing k we lose the computational tractability of counting induced k-subgraphs (even with sampling). Our covariance methodology consistently performs better than SUBFREQ-5, demonstrating the superiority of the $C_A$ representation. As argued in Section IV, the matrix $C_A$, even for k = 4 or 5, incorporates information regarding the counts of bigger, complex sub-structures in the graph. This, along with the information of the full spectrum of the adjacency matrix, leads to a sound representation which outperforms state-of-the-art similarity measures over graphs.

D. Why Simply Computing Graph Invariants is Not Enough

It can be seen that the vector representation of dominant eigenvalues performs very poorly compared to the proposed representation, even though Theorem 2 says that every element of the proposed matrix representation is a function of eigenvalues. It is not clear how to compare eigenvalues across graphs. For instance, two graphs of different sizes will usually have different numbers of eigenvalues. A vector consisting of a few dominant eigenvalues does not seem to be the right object for describing graphs, although most of the characteristics of a given graph can be inferred from it. A good analogy: the mean $\mu$ and variance $\sigma^2$ fully determine a Gaussian random variable, but to compute the distance between two Gaussian distributions, simply computing the Euclidean distance between the corresponding $(\mu, \sigma^2)$ does not work well. The proposed $C_A$ representation, a graph invariant, is a better object: being a covariance matrix, it is directly comparable, and standard similarity measures over $C_A$ perform quite well. Informativeness of features is necessary but not sufficient for learning; finding the right representation is the key, a classical problem in machine learning.
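The Gaussian analogy can be made concrete. For one-dimensional Gaussians, the Bhattacharyya distance has the closed form $D = \frac{(\mu_1 - \mu_2)^2}{4(\sigma_1^2 + \sigma_2^2)} + \frac{1}{2}\ln\frac{\sigma_1^2 + \sigma_2^2}{2\sigma_1\sigma_2}$. The sketch below (our own illustrative example, not from the paper) shows two perturbations of a narrow Gaussian that Euclidean distance on $(\mu, \sigma^2)$ considers identical even though, distributionally, they are drastically different.

```python
import math

def bhattacharyya_1d(mu1, var1, mu2, var2):
    """Closed-form Bhattacharyya distance between two 1-D Gaussians."""
    return ((mu1 - mu2) ** 2 / (4 * (var1 + var2))
            + 0.5 * math.log((var1 + var2) / (2 * math.sqrt(var1 * var2))))

def euclidean_params(mu1, var1, mu2, var2):
    """Naive Euclidean distance on the (mu, sigma^2) parameter vector."""
    return math.hypot(mu1 - mu2, var1 - var2)

p = (0.0, 0.01)            # a narrow Gaussian
q1 = (1.0, 0.01)           # same shape, mean shifted by 1
q2 = (0.0, 1.01)           # same mean, variance increased by 1

# parameter space cannot tell the two perturbations apart ...
print(euclidean_params(*p, *q1), euclidean_params(*p, *q2))  # 1.0 1.0
# ... but the distributions are very different: q1 barely overlaps p
print(round(bhattacharyya_1d(*p, *q1), 2))  # 12.5
print(round(bhattacharyya_1d(*p, *q2), 2))  # 0.81
```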
VII. RUNNING TIME COMPARISONS

To obtain an estimate of the computational requirements, we compare the time required to compute the similarity values between two given graphs using the different methodologies presented in Section VI-B. For both datasets, we record the CPU time taken to compute pairwise similarities between all possible pairs of graphs. Thus, for the COLLAB dataset this is the time taken to compute the similarity between 2415(2415-1)/2 pairs of networks, while for SOCIAL it is the time taken to compute the similarity between 1946(1946-1)/2 pairs of networks. The times taken by the different methodologies on these two datasets are summarized in Table III. All experiments were performed in MATLAB on an Intel(R) Xeon 3.2 GHz CPU machine with 72 GB of RAM.

EIGS, although it performs poorly in terms of accuracy, is the fastest of all the algorithms, because there are very fast linear algebraic methods for computing top-k eigenvalues. We can see that, except for SUBFREQ-5 and RW, all other methods are quite competitive in terms of run-time. It is not surprising that RW kernels are slower, because they are known to have cubic run-time complexity. From Section V-B, we know that the proposed methodology is linear in the number of edges. Also, there are very efficient ways of computing SUBFREQ-3 [24] from the adjacency list representation, which is what is used in the comparisons. Although computing a histogram based on counting all subgraphs of size 4 is much more costly than counting subgraphs of size 3, approximating the histogram by sampling is fairly efficient. For example, on the COLLAB dataset, approximating SUBFREQ-4 with 1000 samples is even more efficient than counting all subgraphs of size 3.
TABLE III. TIME (IN SEC) REQUIRED FOR COMPUTING ALL PAIRWISE SIMILARITIES OF THE TWO DATASETS.

Methodology | SOCIAL | COLLAB (Full)
Total Number of Graphs | 1946 | 2415
PROPOSED (k=4) | 177.20 | 260.56
PROPOSED (k=5) | 200.28 | 276.77
PROPOSED (k=6) | 207.20 | 286.87
SUBFREQ-5 (1000 Samp) | 5678.67 | 7433.41
SUBFREQ-4 (1000 Samp) | 193.39 | 265.77
SUBFREQ-3 (All) | 115.58 | 369.83
RW | 19669.24 | 25195.54
EIGS-5 | 36.84 | 26.03
EIGS-10 | 41.15 | 29.46

However, even with sampling, SUBFREQ-5 is an order of magnitude slower. To understand this, let us review the process of computing the histogram by counting subgraphs. There are 34 graph structures on 5 nodes, unique up to isomorphism. Each of these 34 structures has 5! = 120 isomorphic variants (one for every permutation). To compute a histogram over these 34 structures, we first sample an induced 5-subgraph from the given graph. The next step is to match this subgraph to one of the 34 structures, which requires determining which of the 34 graphs is isomorphic to the sampled subgraph. This is repeated for each of the 1000 samples. Thus every sampling step requires solving a graph isomorphism problem. SUBFREQ-4 has the same issue, but there are only 11 possible subgraphs and the number of isomorphic variants for each graph is only 4! = 24, which is still efficient. This scenario becomes intractable as we go beyond size 5 because of the combinatorially hard graph isomorphism problem.

SUBFREQ-5, although computationally very expensive, improves over SUBFREQ-4. The proposed similarity based on $C_A$ is almost as cheap as SUBFREQ-4 but performs better than even SUBFREQ-5. Counting-based approaches, although they capture information, quickly lose tractability once we start counting bigger substructures.
Power iteration of the adjacency matrix is a simple and computationally efficient way of capturing information about the underlying graph.

VIII. CONCLUSIONS

We embed graphs into a new mathematical space, the space of symmetric positive semidefinite matrices $S^{k \times k}$. We take an altogether different approach of characterizing graphs based on the covariance matrix of the vectors obtained from the power iteration of the adjacency matrix. Our analysis indicates that the proposed matrix representation $C_A$ contains most of the important characteristic information about the network structure. Since the $C_A$ representation is a covariance matrix in a fixed dimensional space, it naturally gives a measure of similarity (or distance) between different graphs. The overall procedure is simple and scalable in that it can be computed in time linear in the number of edges. Experimental evaluations demonstrate the superiority of the $C_A$ representation over other state-of-the-art methods in ego network classification tasks. Running time comparisons indicate that the proposed approach provides the right balance between expressiveness of representation and computational tractability. Finding tractable and meaningful representations of graphs is a fundamental problem; we believe our results will motivate the use of the new representation in analyzing real networks.

REFERENCES

[1] L. A. Adamic, J. Zhang, E. Bakshy, and M. S. Ackerman. Knowledge sharing and Yahoo Answers: everyone knows something. In WWW, pages 665–674. ACM, 2008.
[2] A. Broder, R. Kumar, F. Maghoul, P. Raghavan, S. Rajagopalan, R. Stata, A. Tomkins, and J. Wiener. Graph structure in the web. Computer Networks, 33(1), 2000.
[3] C.-C. Chang and C.-J. Lin. LIBSVM: A library for support vector machines.
ACM Transactions on Intelligent Systems and Technology, 2:27:1–27:27, 2011.
[4] H.-H. Chen, L. Gou, X. Zhang, and C. L. Giles. CollabSeer: a search engine for collaboration discovery. In JCDL, pages 231–240. ACM, 2011.
[5] F. Chung, L. Lu, and V. Vu. Spectra of random graphs with given expected degrees. PNAS, 100(11):6313–6318, 2003.
[6] C. Cortes, P. Haffner, and M. Mohri. Rational kernels. In NIPS, pages 601–608, 2002.
[7] A. Das Sarma, D. Nanongkai, G. Pandurangan, and P. Tetali. Distributed random walks. J. ACM.
[8] D. Easley. Networks, crowds, and markets: Reasoning about a highly connected world, 2012.
[9] I. J. Farkas, I. Derényi, A.-L. Barabási, and T. Vicsek. Spectra of real-world graphs: Beyond the semicircle law. Physical Review E, 64(2):026704, 2001.
[10] D. Haussler. Convolution kernels on discrete structures. Technical report, 1999.
[11] J. He and W. W. Chu. A social network-based recommender system (SNRS). Springer, 2010.
[12] Q. He, J. Pei, D. Kifer, P. Mitra, and L. Giles. Context-aware citation recommendation. In WWW. ACM, 2010.
[13] T. Hofmann, B. Schölkopf, and A. J. Smola. Kernel methods in machine learning. The Annals of Statistics, 2008.
[14] J. M. Kleinberg. Authoritative sources in a hyperlinked environment. Journal of the ACM, 46(5):604–632, 1999.
[15] R. Kondor and T. Jebara. A kernel between sets of vectors. In ICML, pages 361–368, 2003.
[16] A. N. Krylov. On the numerical solution of equations whose solution determines the frequency of small vibrations of material systems (in Russian, 1931).
[17] R. Kumar, J. Novak, and A. Tomkins. Structure and evolution of online social networks. In Link Mining: Models, Algorithms, and Applications, pages 337–357. Springer, 2010.
[18] J. Leskovec, J. Kleinberg, and C. Faloutsos. Graph evolution: Densification and shrinking diameters.
ACM Transactions on Knowledge Discovery from Data (TKDD), 1(1):2, 2007.
[19] F. Lin and W. W. Cohen. Power iteration clustering. In ICML, pages 655–662, 2010.
[20] J. McAuley and J. Leskovec. Learning to discover social circles in ego networks. In NIPS, pages 548–556, 2012.
[21] G. D. F. Morales, A. Gionis, and C. Lucchese. From chatter to headlines: harnessing the real-time web for personalized news recommendation. In WSDM, pages 153–162, 2012.
[22] M. E. Newman. The structure of scientific collaboration networks. PNAS, 98(2):404–409, 2001.
[23] I. Paparrizos, B. B. Cambazoglu, and A. Gionis. Machine learned job recommendation. In RecSys, 2011.
[24] N. Shervashidze, T. Petri, K. Mehlhorn, K. M. Borgwardt, and S. Vishwanathan. Efficient graphlet kernels for large graph comparison. In AISTATS, pages 488–495, 2009.
[25] S. H. Strogatz. Exploring complex networks. Nature, 410(6825):268–276, 2001.
[26] J. Ugander, L. Backstrom, and J. Kleinberg. Subgraph frequencies: mapping the empirical and extremal geography of large graph collections. In WWW, pages 1307–1318, 2013.
[27] S. Vishwanathan, N. N. Schraudolph, R. Kondor, and K. M. Borgwardt. Graph kernels. JMLR, 99:1201–1242, 2010.
[28] S. Vishwanathan and A. Smola. Binet-Cauchy kernels. 2004.
[29] D. J. Watts and S. H. Strogatz. Collective dynamics of small-world networks. Nature, 393(6684):440–442, 1998.
[30] E. P. Wigner. On the distribution of the roots of certain symmetric matrices. The Annals of Mathematics, 67(2):325–327, 1958.
