Multi-Level Anomaly Detection on Time-Varying Graph Data

Robert A. Bridges*,1, John Collins*, Erik M. Ferragut*, Jason Laska* and Blair D. Sullivan†,2
* Computational Science and Engineering Division, Oak Ridge National Laboratory, Oak Ridge, TN 37831
{bridgesra,ferragutem,laskaja}@ornl.gov, jparcoll@gmail.com
† Department of Computer Science, North Carolina State University, Raleigh, NC 27695
blair_sullivan@ncsu.edu

Abstract: This work presents a novel modeling and analysis framework for graph sequences which addresses the challenge of detecting and contextualizing anomalies in labeled, streaming graph data. We introduce a generalization of the BTER model of Seshadhri et al. by adding flexibility to community structure, and use this model to perform multi-scale graph anomaly detection. Specifically, probability models describing coarse subgraphs are built by aggregating probabilities at finer levels, and these closely related hierarchical models simultaneously detect deviations from expectation. This technique provides insight into a graph's structure and internal context that may shed light on a detected event. Additionally, this multi-scale analysis facilitates intuitive visualizations by allowing users to narrow focus from an anomalous graph to particular subgraphs or nodes causing the anomaly. For evaluation, two hierarchical anomaly detectors are tested against a baseline Gaussian method on a series of sampled graphs. We demonstrate that our graph statistics-based approach outperforms both a distribution-based detector and the baseline in a labeled setting with community structure, and it accurately detects anomalies in synthetic and real-world datasets at the node, subgraph, and graph levels.
To illustrate the accessibility of information made possible via this technique, the anomaly detector and an associated interactive visualization tool are tested on NCAA football data, where teams and conferences that moved within the league are identified with perfect recall, and precision greater than 0.786.

1 Lead author. Phone: (865) 241-0319
2 Supported in part by DARPA GRAPHS/SPAWAR Grant N66001-14-1-4063, the Gordon & Betty Moore Foundation, and the National Consortium for Data Science.
This manuscript has been authored in part by UT-Battelle, LLC under Contract No. DE-AC05-00OR22725 with the U.S. Department of Energy. The United States Government retains and the publisher, by accepting the article for publication, acknowledges that the United States Government retains a non-exclusive, paid-up, irrevocable, world-wide license to publish or reproduce the published form of this manuscript, or allow others to do so, for United States Government purposes. The Department of Energy will provide public access to these results of federally sponsored research in accordance with the DOE Public Access Plan (http://energy.gov/downloads/doe-public-access-plan). Any opinions, findings, and conclusions or recommendations expressed in this publication are those of the author(s) and do not necessarily reflect the views of DOE, DARPA, SSC Pacific, the Moore Foundation, or the NCDS.

I. INTRODUCTION

Social networks are playing an increasingly important role in today's society, yet extracting domain insights from their analysis and visualization remains challenging, in large part due to their transient nature and the inherent complexity of many graph algorithms. Many social graphs naturally have (i) labeled nodes representing individuals or entities, and (ii) an edge set that changes over time, creating a sequence or time-series of individual snapshots of the network. A key task in understanding this data is the ability to identify patterns and aberrations across snapshots, specifically in a way that can pinpoint areas of interest and provide context for results. Unfortunately, although this time-varying labeled scenario is also natural in many other domains (e.g., cyber-security), most existing techniques for anomaly detection are either limited to static graphs or unable to "zoom in" on the reason a graph is identified as non-standard. The importance of context in anomaly detection is easily exemplified in a cyber-security setting, where observing an unanticipated connection (edge) between an internal IP and an external host might warrant alarm; however, knowing that many similar IPs (i.e., nodes in a common community) regularly contact that host could save an unnecessary investigation.

Here we address the problem of identifying and contextualizing anomalies at multiple levels of granularity in the sequential graph setting. We give a novel method for anomaly detection in time-varying graph data, using hierarchically related distributions to detect related abnormalities at three increasingly fine levels of granularity (graph, subgraph, and node). Probabilistic multi-scale detection relies on comparison with an underlying graph model; we use an extension (described in Section III) of the recent BTER model [1] that enables improved prescription of community structure. To fit an instance of the model to observed graphs, we give methods for detecting communities and estimating parameters (see Section IV). Finally, to test a newly observed graph for anomalous structure, we compute hierarchically-related probabilities from the tuned model and their associated p-values using a Monte-Carlo simulation.
Our workflow is a streaming detection framework, where parameters are learned from previous observations, the detector is applied to new data, and then the parameters are updated to include the new graph in the observations. Section V defines the probability calculations for two new multi-scale detectors, as well as a baseline detector similar to that of [2] (which is limited to detecting anomalies at the graph level). It is important to note that performing anomaly detection using a graph's probability, as given by the model from which it was sampled, will often result in an inaccurate detector when node labels are used. This is a consequence of the likelihood of an unlabeled graph being shared by isomorphic copies distinguished by these labels and is discussed in Section V. We illustrate this phenomenon and provide empirical evidence that modeling a set of statistics indicative of node/subgraph interactions provides more accurate detection in two experiments described in Section VI. In Section VII, we apply our detector to NCAA Football data, establishing its accuracy in detecting variations in teams' schedules as imposed by changes in conference membership. This application naturally allows the detector to find abnormal interactions at the node (team), community (conference), and full graph (season) levels. Finally, we describe and show sample screenshots of applying our interactive anomaly visualization tool, which leverages the multi-scale analysis to enable users to easily focus their attention on the most critical changes in the data.

II. RELATED WORK

In this paper, we focus on identifying anomalous instances in a sequence of graphs with common node labels.
This problem is neither a special case nor an extension of finding anomalous parts of a single (static) graph (a much more commonly studied problem), since the availability of common node labels provides information not available in a single-graph or unlabeled graph ensemble problem, and new methods are required to fully exploit this information. There is limited work transforming a graph sequence to a single instance; e.g., Eberle et al. consider the disjoint union of subgraphs from each data instance as a single graph in [3]. For a survey on graph anomaly detection, we refer the readers to Akoglu et al. [4].

Common techniques for finding anomalies in graphs can broadly be categorized as using compression techniques or a form of hypothesis testing with graph statistics. For example, Eberle & Holder use a compression algorithm relying on minimum description length to detect repetitive subgraphs and identify slight deviations as anomalies [3]. Because this technique searches for subgraphs almost isomorphic to a found normative pattern, it is a much more rigid detection framework than ours. Hypothesis testing has a broader set of prior work, including papers of Miller et al. which use statistics based on the residual matrix [5], [6]. Much of this work is geared towards detecting abnormally dense communities seeded into an R-MAT graph. In [5], the techniques are extended to accommodate the dynamic graph setting and include methods for identifying highly connected regions. Our detectors are designed to identify anomalies caused not only by abnormal density, but also by changes in the interactions within or between communities. A more recent hypothesis testing approach of Neville & Moreno fits Gaussian distributions to three statistics [2].
Due to similarities with our workflow (using a p-value estimated by a Monte-Carlo simulation from a graph model to decide anomalies), we test our method against a baseline detector using similar Gaussian estimates, although we note that [2] focused on Kronecker graphs, not the GBTER model.

Written concurrently with this work is that of Peel & Clauset [7], which addresses the problem of change detection for time-varying network sequences. Like Peel & Clauset, we use a hierarchical generative graph model and Bayesian hypothesis testing. Our work differs in that it introduces a new graph model (Section III) and seeks related anomalies at different scales (as opposed to a "shock" that changes the overall graph structure).

To the authors' knowledge, using multiple related detectors that respect the structure of the graph is a new technique. By design, this analysis informs an interactive tool for exploring the nature of abnormalities in each graph, a relatively unstudied aspect of graph visualization. Wong et al. [8] present a multi-scale tool for exploring large graphs, informed by a clustering algorithm especially tuned to detecting star-burst patterns. In contrast, we detect dense regions as communities and integrate multi-scale visual analytics with anomaly scores.

This paper extends the general anomaly detection workflow of Ferragut et al. [9] to hierarchically analyze graph data. The general method estimates probability models from observations, and new data is declared anomalous if it has a sufficiently small p-value. More precisely, if a probability distribution $P$ is estimated from observed data $x_1, \ldots, x_{n-1}$, the p-value of new data, $x_n$, is

$\text{p-value}(x_n) = P(\{X : P(X) \le P(x_n)\}).$

Notice that in the stereotypical case where $P = N(0, 1)$, the standard normal, the definition above corresponds to the two-sided p-value.
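As a rough illustration of this definition, the sketch below estimates the p-value by sampling from a model and counting how often a sample is no more probable than the new observation. The function name `empirical_p_value`, the toy three-outcome `MODEL`, and the sample count are assumptions made for the example and are not taken from [9].

```python
import random


def empirical_p_value(prob_of, sampler, x_new, n_samples=10_000, seed=0):
    """Monte-Carlo estimate of P({X : P(X) <= P(x_new)}).

    prob_of: function returning the model probability of an outcome
    sampler: function taking a random.Random and returning one sampled outcome
    """
    rng = random.Random(seed)
    p_new = prob_of(x_new)
    # Count samples that are at most as probable as the new observation.
    hits = sum(1 for _ in range(n_samples) if prob_of(sampler(rng)) <= p_new)
    return hits / n_samples


# Hypothetical toy model: three outcomes with known probabilities.
MODEL = {"a": 0.70, "b": 0.25, "c": 0.05}


def sample_outcome(rng):
    """Draw one outcome from MODEL."""
    return rng.choices(list(MODEL), weights=list(MODEL.values()))[0]


p_rare = empirical_p_value(MODEL.get, sample_outcome, "c")    # close to 0.05
p_common = empirical_p_value(MODEL.get, sample_outcome, "a")  # every outcome qualifies
```

With a threshold such as 0.1, the rare outcome "c" would be flagged while "a" would not. The same pattern applies when the outcomes are graphs, subgraphs, or nodes and `prob_of` is one of the detectors defined later.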
Generally, a threshold $\alpha \in [0, 1]$ is set, and if p-value$(x_n) \le \alpha$, $x_n$ is identified as anomalous. For streaming data, model parameters are iteratively updated to include the new observation, $x_n$. Often, $\alpha$ is tuned in light of labeled results to find an acceptable balance of false vs. true positives. Analysis in [9] identifies operational benefits of the method, including a theorem allowing users to regulate the expected alert rate a priori by setting $\alpha$. We utilize the framework's accommodation of any probability model in order to apply it simultaneously at hierarchical levels.

III. THE GENERALIZED BTER MODEL (GBTER)

In order to perform probabilistic anomaly detection, we need a randomized generative graph model that enables computation of probabilities for various graph configurations while accurately modeling a graph's community structure and degree sequence. Significant prior work has been devoted to developing such models and validating the importance of capturing both these aspects of a real-world data set (e.g., [1], [10], [11], [12], [13]). A broad survey of graph models and common graph characteristics is given in [11].

More specifically, motivated by social and cyber settings, we require a generative model that can accommodate observed hierarchical structure. A natural candidate is a Stochastic Block Model, first introduced in [14], which defines community membership and generates intra-community edges with an Erdős-Rényi (ER) [15] model and inter-community edges with a probability that depends on the membership of their endpoints. This achieves flexible community membership and density, but the expected degree of each node is implicitly determined by the community structure and parameters. To improve adherence to degree distribution, one could use the Block Two-Level Erdős-Rényi (BTER) model of Seshadhri et al. [1], [13], but we found the implicitly determined community structure of the model to be too limiting for matching real-world data. BTER edge generation occurs in two steps, with an ER model (denoted here as ER$(n, p)$, in which $n$ nodes are fixed and edges occur independently with probability $p$) used for intra-community edges, followed by a Chung-Lu (CL) process [12] to match a specified expected degree distribution.

To address these challenges, we define and use a generalization of BTER that mimics its two-step edge generation process, but allows explicit prescription of the communities' size, membership, and approximate density. The remainder of this section describes this generalized version and compares it to the original BTER model.

The generalized block two-level Erdős-Rényi (GBTER) model takes as input (1) the expected degree of each node, (2) community assignments of the nodes, i.e., a partition of the vertex set into disjoint subsets, $\{C_j\}$, and (3) an edge probability $p_j$ for each community $C_j$. In the first stage of edge generation, intra-community edges are sampled from an Erdős-Rényi random graph model, ER$(|C_j|, p_j)$, for each community $C_j$. Note that the expected degree of a node within $C_j$ is $p_j(|C_j| - 1)$ after the first stage. In the second stage, we define the excess expected degree of a node $i$, denoted $\varepsilon_i$, to be the difference between the input expected degree $\lambda_i$ and the expected degree after stage one. Formally, $\varepsilon_i := \max(0, \lambda_i - p_j(|C_j| - 1))$ for node $i$ in community $C_j$. We then apply a Chung-Lu style model [12] on the excess expected degree sequence, $[\varepsilon_i]_{i \in V}$. Specifically, the probability of adding the edge $(i, j)$ is

$P(i, j \mid \varepsilon) = \dfrac{\varepsilon_i \varepsilon_j}{\sum_k \varepsilon_k}.$  (1)

Note that the second stage can generate both inter- and intra-community edges. It is necessary that the Chung-Lu inputs, $\{\varepsilon_i\}$, satisfy $\varepsilon_i \varepsilon_j \le \sum_k \varepsilon_k$ for Equation 1 to define a probability.
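The two-stage process can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `sample_gbter` and its argument layout are hypothetical, and the `min(1.0, ...)` clamp is an added safeguard for inputs that violate the condition on Equation 1.

```python
import itertools
import random


def sample_gbter(expected_degree, communities, density, seed=None):
    """Sample an undirected edge set from the two-stage GBTER process.

    expected_degree: dict mapping node -> lambda_i
    communities:     list of node lists forming a partition of the vertex set
    density:         list of p_j, one per community
    Returns a set of edges (i, j) with i < j.
    """
    rng = random.Random(seed)
    edges = set()
    excess = {}

    # Stage 1: ER(|C_j|, p_j) inside each community.
    for members, p in zip(communities, density):
        for i, j in itertools.combinations(sorted(members), 2):
            if rng.random() < p:
                edges.add((i, j))
        for i in members:
            # Excess expected degree: epsilon_i = max(0, lambda_i - p_j (|C_j| - 1)).
            excess[i] = max(0.0, expected_degree[i] - p * (len(members) - 1))

    # Stage 2: Chung-Lu style edges on the excess degrees (Equation 1).
    total = sum(excess.values())
    if total > 0:
        for i, j in itertools.combinations(sorted(excess), 2):
            # Clamp is an assumption, used only if eps_i * eps_j > sum_k eps_k.
            if rng.random() < min(1.0, excess[i] * excess[j] / total):
                edges.add((i, j))
    return edges


# Hypothetical example: two communities of four nodes, expected degree 4 each.
example_graph = sample_gbter(
    {i: 4.0 for i in range(8)}, [[0, 1, 2, 3], [4, 5, 6, 7]], [0.8, 0.8], seed=7
)
```

Because stage two only adds edges to those already present, a within-community pair ends up connected if either stage selects it, while a cross-community pair can only be created by stage two.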
A calculation shows that the expected degree of node $i$ is indeed $\lambda_i$ whenever $\lambda_i \ge p_j(|C_j| - 1)$ (i.e., the expected degree from the first-stage edges does not exceed the total expected degree of any node) and the CL model is well-defined.

To calculate the probability of edge $(i, j)$, we condition on whether $i$ and $j$ share a community. Recall that our communities partition the set of nodes, so each $i$ is in exactly one community. If $i, j$ are assigned to the same community, $C$, let $p$ denote the internal edge probability of $C$, and we see

$P(i, j \mid i, j \in C) = p + (1 - p)\dfrac{\varepsilon_i \varepsilon_j}{\sum_k \varepsilon_k}.$  (2)

If $i, j$ are assigned to different communities, the edge probability is as given in Equation 1.

GBTER differs from the original BTER model by allowing greater flexibility in the assignment of community membership, size, and internal edge density ($p$). As indicated in [13], the expected clustering coefficient for an ER$(n, p)$ graph is $p^3$. This implies that GBTER also allows pre-specification of each community's approximate clustering coefficient. Note that GBTER, as used in this work, assumes node labels, whereas BTER depends only on the number of nodes of each expected degree. This implies that edges in BTER do not occur independently (because they are conditioned on the community assignment of each node), while they are independent in GBTER. Consequently, calculating probabilities of graphs according to the BTER model is both complicated and expensive, inhibiting its use for anomaly detection.

IV. FITTING MODEL PARAMETERS

We now describe how to fit the GBTER model to a sequence of observed graphs with common node labels, using Bayesian techniques for learning the parameters and inferring the following model inputs: (1) the community assignments, (2) the within-community edge densities, and (3) the expected node degrees.
Once a specific instance of the model is deduced, probabilistic anomaly detectors are constructed, as detailed in Section V.

In this work, a partition of the vertex set into communities is learned using the Markov Clustering (MC) algorithm [16]. We chose MC as it is known to scale well and is easy to implement. To apply MC, a weighted graph is constructed from observed graphs. Specifically, for the experiments in Section VI, the weighted graph is constructed by counting the occurrences of each edge, and for the application in Section VII exponential weights are used to down-weight older observations of edges. In general, any method of partitioning nodes into communities that is acceptable for the application at hand will suffice. For a survey of community detection algorithms see [17]. We note that our method requires a partition of the nodes into communities but is blind to the algorithm used. This gives a lever for tuning between scalability and accuracy in applications. For example, communities inferred from context (e.g., grouping nodes by a known, common affiliation) can be used to obviate this step and may provide more insightful results in a real-world setting.

Given community assignments, the within-community edge densities are estimated. Each community, $C$, is modeled internally by an Erdős-Rényi random graph, ER$(|C|, p)$, and we seek to estimate $p$. Letting $k$ denote the number of edges within the subgraph $C$, it follows that $k \sim \mathrm{Binomial}\left(\binom{|C|}{2}, p\right)$. In order to use Bayesian inference, we assume $p \sim \mathrm{Beta}(\alpha, \beta)$, with prior parameters $\alpha > 0$, $\beta > 0$, and then use the maximum posterior likelihood estimation (MPLE). Specifically, $(p \mid k_1, \ldots, k_N) \sim \mathrm{Beta}(\hat{\alpha}, \hat{\beta})$ with posterior parameters

$\hat{\alpha} = \alpha + \sum_i k_i, \quad \hat{\beta} = \beta + N\binom{|C|}{2} - \sum_i k_i,$

where $k_i$ denotes the number of edges internal to $C$ observed in the $i$-th graph, $G_i$, for $i = 1, \ldots, N$.
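The conjugate update for a single community can be sketched as follows. The function name `fit_community_density` and the uniform default prior ($\alpha = \beta = 1$) are assumptions for illustration; the point estimate is the posterior mode, $(\hat{\alpha} - 1)/(\hat{\alpha} + \hat{\beta} - 2)$, a standard property of the Beta distribution when both posterior parameters are at least one and sum to more than two.

```python
from math import comb


def fit_community_density(internal_edge_counts, community_size, alpha=1.0, beta=1.0):
    """Posterior parameters and mode for a community's edge density p.

    internal_edge_counts: k_1, ..., k_N, internal edges seen in each observed graph
    community_size:       |C|
    alpha, beta:          Beta prior parameters (uniform prior by default, an assumption)
    """
    n_pairs = comb(community_size, 2)          # possible internal edges per graph
    n_graphs = len(internal_edge_counts)
    total_edges = sum(internal_edge_counts)

    alpha_hat = alpha + total_edges
    beta_hat = beta + n_graphs * n_pairs - total_edges

    if alpha_hat + beta_hat <= 2:
        raise ValueError("posterior mode undefined: need alpha_hat + beta_hat > 2")
    p_mode = (alpha_hat - 1) / (alpha_hat + beta_hat - 2)
    return alpha_hat, beta_hat, p_mode


# Hypothetical example: a 4-node community observed in three graphs.
alpha_hat, beta_hat, p_mode = fit_community_density([5, 4, 6], community_size=4)
```

With the uniform prior the mode coincides with the observed fraction of internal edges (15 of 18 possible pairs in the example); a more informative prior pulls the estimate toward the prior's own mode.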
MPLE gives $p := (\hat{\alpha} - 1)/(\hat{\alpha} + \hat{\beta} - 2)$, the mode of the posterior.

Lastly, the expected degree sequence must be estimated from the data. For a fixed node, we assume its degree, $d$, is Poisson distributed with expected degree $\lambda$, i.e., $d \sim \mathrm{Poisson}(\lambda)$. We use the conjugate prior, $\lambda \sim \mathrm{Gamma}(\alpha, \beta)$, with prior parameters $\alpha > 1$, $\beta > 1$. This yields the posterior distribution $(\lambda \mid d_1, \ldots, d_N) \sim \mathrm{Gamma}(\hat{\alpha}, \hat{\beta})$ with posterior parameters $\hat{\alpha} = \alpha + \sum_i d_i$ and $\hat{\beta} = \beta + N$, where $d_i$ denotes the observed degree of the node in $G_i$. For each node, MPLE gives its expected degree, $\lambda := (\hat{\alpha} - 1)/\hat{\beta}$, the mode of the posterior Gamma.

V. ANOMALY DETECTORS

Given an instance of a GBTER model, which defines a probability distribution on graphs, one can leverage the distribution to detect anomalies at the graph, subgraph, and node level. This section defines two multi-scale detectors, one which uses the GBTER distribution directly, and one which leverages statistics inherent to the GBTER model. The Multi-Scale Probability Detector naturally uses the graph probability as determined by the GBTER model for detection, which is then decomposed into probabilities of subgraphs and nodes for hierarchical information. Although intuitive, this detector suffers from a few limitations, discussed below, which inform construction of the Multi-Scale Statistics Detector. This second detector builds from the bottom up, defining the probability of a node based on the likelihood of its internal and external degree. Subgraph probabilities are determined by those of its member nodes, so multi-scale analysis is facilitated by both models.

[Fig. 1: ROC Curves from Synthetic Data Experiments. Panel columns: Graph Level, Community Level, Node Level; panel rows: Experiment 1, Experiment 2. Note: Gaussian Baseline only applicable to graph level detection.]
Lastly, a baseline method for detecting anomalous graphs by fitting Gaussian distributions to graph statistics is described. We note that the Gaussian Baseline is only used for identifying anomalous graphs, as it cannot discriminate anomalies at the subgraph or node level. Section VI gives results of testing the three methods on synthetic (seeded) data and Section VII on NCAA football data.

A. Multi-Scale Probability Detector

Our first anomaly detector uses the graph probability, as given by the GBTER model, for anomaly detection. Specifically, given a graph $G = (V, E)$ with vertices $V$ and edges $E$, the probability of $G$ is

$P(G) = \prod_{(i,j) \in E} P(i, j) \prod_{(i,j) \notin E} (1 - P(i, j)),$  (3)

where $P(i, j)$ is the probability of the edge $(i, j)$ under the GBTER model, as derived in Section III. In practice, given a graph $G$, we compute its probability using Equation 3, then use Monte-Carlo simulation to estimate its p-value.

In order to detect anomalies at different scales, the probability of a graph is decomposed into a product of subgraph probabilities. Specifically, we define the probability of node $i_0$ as

$P(i_0) := \prod_{j : (i_0, j) \in E} P(i_0, j) \prod_{j : (i_0, j) \notin E} (1 - P(i_0, j)).$

Since every node pair contributes one factor to each of its two endpoints, it follows that $P(G) = \prod_i P(i)^{1/2}$. Similarly, the probability of a subgraph $G' = (V', E')$ is $\prod P(i)^{1/2}$, with the product over $i \in V'$. Hence, given a partition of $V$ into communities, $\{C_i\}$, the probability of $G$ also breaks into a product of community probabilities, i.e., $P(G) = \prod_i P(C_i)$. This formulation allows anomaly detection of any fixed subgraph, in particular at the node, community, and graph level.

The probability of sampling a graph under a given generative model is an intuitive choice for anomaly detection. Upon further examination, however, this technique will yield poor results in models where the mode of the distribution varies depending on whether or not labels are regarded.
As an illustrative example, consider the ER model on three labeled nodes, $V = \{1, 2, 3\}$, with $p = 1/3$. The most probable unlabeled graph under this distribution has exactly one edge, and occurs with probability $\binom{3}{1}(1/3)(2/3)^2 = 4/9$. Now labeling nodes, there are three different but isomorphic graphs with one edge each, namely, with edge $(1, 2)$ or $(2, 3)$ or $(1, 3)$ only. But the probability of each of these one-edge graphs is $(1/3)(2/3)^2 = 4/27$, while the probability of the empty graph is $(2/3)^3 = 8/27$. Hence when labels are regarded, the mode of the distribution is the empty graph, not the one-edge graphs as in the unlabeled case; consequently, in this case the Multi-Scale Probability Model will view the expected one-edge graphs as more anomalous than the empty graph, even though the empty graph is the less likely outcome up to isomorphism.

Now consider the GBTER model used in the experiments of Section VI. Because the probability of a within-community edge is greater than $1/2$ and that of an inter-community edge is less than $1/2$ with the given parameters, the labeled-node mode of the distribution is the graph with every community as a clique and no other edges. Although this graph is unlikely to be sampled, the Multi-Scale Probability Model will regard it as the most "normal" possible graph. The conclusion of this reasoning is that using the graph's probability will produce unwarranted results, yet modeling characterizing statistics of the graph (e.g., inter- and intra-community node degrees) gives accurate detection capabilities.

TABLE I: Community assignments for GBTER Experiment

       C_1         C_2        C_3          C_4            ...  C_10
M_r    [0,1,2,3]   [4,5,6,7]  [8,9,10,11]  [12,13,14,15]  ...  [36,37,38,39]
M_a    [0,11,2,4]  [3,5,6,8]  [7,9,10,1]   [12,13,14,15]  ...  [36,37,38,39]

Note: The seeded-anomaly model $M_a$ is obtained from $M_r$ by switching the position of 2 nodes from each of the first 3 communities. Anomalous nodes shown in italicized red print, and anomalous communities are circled.
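The three-node ER example can be checked by brute-force enumeration. The sketch below lists all eight labeled graphs on $\{1, 2, 3\}$ with $p = 1/3$ using exact fractions; the variable names are illustrative only. On three nodes the edge count determines the isomorphism class, so grouping by edge count gives the unlabeled probabilities.

```python
from fractions import Fraction
from itertools import combinations, product

p = Fraction(1, 3)
pairs = list(combinations([1, 2, 3], 2))  # (1, 2), (1, 3), (2, 3)

labeled = {}   # edge set -> probability of that specific labeled graph
by_count = {}  # edge count -> total probability of the isomorphism class

for present in product([False, True], repeat=len(pairs)):
    edges = frozenset(e for e, on in zip(pairs, present) if on)
    prob = Fraction(1)
    for on in present:
        prob *= p if on else 1 - p  # independent edges, as in ER(3, 1/3)
    labeled[edges] = prob
    by_count[len(edges)] = by_count.get(len(edges), 0) + prob

labeled_mode = max(labeled, key=labeled.get)       # most probable labeled graph
unlabeled_mode = max(by_count, key=by_count.get)   # most probable edge count
```

The enumeration reproduces the figures in the text: each one-edge labeled graph has probability 4/27, the empty graph 8/27, and the one-edge class as a whole 4/9, so the labeled mode is the empty graph while the unlabeled mode is the one-edge class.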
The advantage of modeling statistics over the raw graph probability is exhibited in our empirical results, and it motivates the second detector.

B. Multi-Scale Statistics Detector

Our second detector is based on observing and modeling intra- and inter-community node degrees (after learning GBTER parameters). Fix a node $i_0 \in V$, and let $C$ denote $i_0$'s community, $p$ denote $C$'s intra-community edge probability, and $\lambda$ the expected degree of node $i_0$ (all as learned from fitting the GBTER model to our observations). We set $d_{in} := |\{(i_0, j) \in E : j \in C\}|$, the internal degree of $i_0$, and $d_{ex} := |\{(i_0, j) \in E : j \notin C\}|$, the external degree of $i_0$. Following the ER$(|C|, p)$ assumption, we assume $d_{in} \sim \mathrm{Binomial}(|C| - 1, p)$ and $d_{ex} \sim \mathrm{Poisson}(\varepsilon)$, where $\varepsilon = \max(0, \lambda - p(|C| - 1))$ is the excess expected degree of $i_0$ (see Section III). For the Multi-Scale Statistics anomaly detector, the probability of node $i_0$ is defined as the joint probability of its degrees. We assume the two degrees are independent and obtain

$P(i_0) := P(d_{in}, d_{ex}) = \binom{|C| - 1}{d_{in}} p^{d_{in}} (1 - p)^{|C| - 1 - d_{in}} \cdot \dfrac{e^{-\varepsilon} \varepsilon^{d_{ex}}}{d_{ex}!}.$

Given a subgraph $G' = (V', E')$ we set $P(G') := \prod_{i \in V'} P(i)$. Hence anomaly detection of any subgraph is made possible.

Note that since GBTER allows both internal and external edges to be created by the second stage of the process, the model above is only an approximation of GBTER: it attributes all excess degree to $d_{ex}$, so it understates the expected internal degree and overstates the expected external degree relative to GBTER. Additionally, as the range of a Poisson variable is unbounded, degrees exceeding $|V| - 1$ (an impossibility) are assigned positive probability by this model. To circumvent this possibility, the truncated Poisson can be used for sampling. In our experiments, the expected degree ($\lambda$) and expected excess degree ($\varepsilon$) are sufficiently smaller than $|V| - 1$ that $P(\deg(i) > |V| - 1)$ is negligible. Testing with and without the truncation exhibited similar results.
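The node and subgraph probabilities of the Multi-Scale Statistics Detector translate directly into code. The following is a minimal sketch of the formulas above without the optional Poisson truncation; the function names and the tuple layout passed to `subgraph_probability` are assumptions for illustration.

```python
from math import comb, exp, factorial, prod


def node_probability(d_in, d_ex, community_size, p, expected_degree):
    """Joint probability of a node's internal and external degree.

    d_in ~ Binomial(|C| - 1, p) and d_ex ~ Poisson(eps), assumed independent,
    with eps = max(0, lambda - p (|C| - 1)).
    """
    n = community_size - 1
    eps = max(0.0, expected_degree - p * n)
    binomial_term = comb(n, d_in) * p**d_in * (1 - p) ** (n - d_in)
    poisson_term = exp(-eps) * eps**d_ex / factorial(d_ex)
    return binomial_term * poisson_term


def subgraph_probability(node_records):
    """Product of node probabilities over a subgraph.

    node_records: iterable of (d_in, d_ex, community_size, p, expected_degree)
    """
    return prod(node_probability(*record) for record in node_records)


# Hypothetical node: community of 4 with p = 0.8, expected degree 6.
typical = node_probability(3, 4, 4, 0.8, 6.0)    # degrees near expectation
unusual = node_probability(0, 12, 4, 0.8, 6.0)   # no internal, many external edges
```

In this example the second node, which has lost all of its within-community edges and gained many external ones, receives a far smaller probability than the first. These probabilities are then converted to p-values by simulation before thresholding.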
To use either of the multi-scale detectors, we set thresholds at each level, and any node/subgraph/graph with p-value below the respective threshold is detected. The model parameters are updated upon receipt and detection of each graph.

C. Gaussian Baseline Detector

Our baseline method fits univariate Gaussian distributions to graph statistics and uses the product of the p-values for detection. From each observed graph three statistics are obtained: average node degree ($X_1$), average clustering coefficient ($X_2$), and the spectral norm ($X_3$). Calculating $X_1$ and $X_2$ from a given graph is straightforward. In order to calculate $X_3$, the GBTER model is used with parameters estimated as described above to produce the expected adjacency matrix $E(A)$, in which $E(A)_{i,j}$ gives the probability of an edge between nodes $i$ and $j$. The spectral norm is defined as the maximum modulus eigenvalue of the residual matrix $A - E(A)$. After computing the observed statistics, independent univariate Gaussian distributions ($N(\mu_i, \sigma_i)$) are fit to each of the three statistics. Lastly, given a newly observed graph, $G$, with statistics $x_1, x_2, x_3$, we assign

$\text{p-value}(G) := \prod_{i=1}^{3} P(X_i \le x_i \mid N(\mu_i, \sigma_i)).$

As before, p-values falling below a given threshold, $\alpha$, are labeled anomalous, and the three normal distributions are updated upon receipt of each new graph.

This follows the approach of Moreno and Neville [2], although their work is based on the Mixed Kronecker Product graph model and uses average geodesic distance instead of the spectral norm we employ for $X_3$. Since the average geodesic distance is undefined for disconnected graphs, we selected the spectral norm based on prior use in network hypothesis testing and strong results for similar tests involving Chung-Lu random graphs [6]. While we consider this baseline a natural adaptation of [2], the disparity in use between their application and ours inhibits direct comparison.
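The scoring step of the baseline can be sketched with the standard library alone. The sketch assumes the three statistics have already been computed for each observed graph (computing the spectral norm would need a linear algebra library and is left out), and the function names are hypothetical. It uses the lower-tail CDF for each statistic, as in the formula above.

```python
from statistics import NormalDist, mean, stdev


def fit_gaussians(history):
    """Fit one univariate Gaussian per statistic.

    history: list of (x1, x2, x3) tuples from previously observed graphs;
             at least two observations with nonzero spread are required.
    """
    return [NormalDist(mean(column), stdev(column)) for column in zip(*history)]


def baseline_p_value(models, stats):
    """Product over the statistics of P(X_i <= x_i) under the fitted Gaussians."""
    score = 1.0
    for model, x in zip(models, stats):
        score *= model.cdf(x)
    return score


# Hypothetical history of (average degree, clustering coefficient, spectral norm).
history = [(4.0, 0.25, 2.0), (6.0, 0.75, 4.0)]
models = fit_gaussians(history)
at_means = baseline_p_value(models, (5.0, 0.5, 3.0))   # each factor is 0.5
far_below = baseline_p_value(models, (1.0, 0.0, 0.5))  # all statistics unusually low
```

Because each factor is a one-sided lower-tail probability, this score is small when statistics fall well below their historical means; in a streaming setting the Gaussians would be refit after each new graph is scored.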
VI. SYNTHETIC GRAPH EXPERIMENT

In order to test the anomaly detection capabilities, two hidden GBTER models are used to generate labeled data: (1) a "regular" model, $M_r$, for sampling non-anomalous graphs, and (2) a seeded-anomaly model, $M_a$, with slightly perturbed inputs to generate anomalous graphs. To begin the experiment, 100 non-anomalous graphs are sampled from $M_r$, and the anomaly detectors are fit to the data, as described in Section V. To test the streaming anomaly detection, 500 graphs are iteratively generated and observed, with every fifth graph drawn from the seeded-anomaly model. Upon sampling a new graph, its p-value according to each anomaly detector is computed, and it is labeled as anomalous if it falls below a given threshold. Similarly, the hierarchical detectors label each node and community depending on its respective p-value. Lastly, each anomaly detector's GBTER parameters are updated to include observation of the new graph.

We conduct two experiments, both using networks of 40 nodes divided into ten equally-sized communities. For the "regular" model, each community is assigned a within-edge probability of $p = 0.8$, and the expected degrees of nodes vary in the range of five to eight according to a truncated power-law. To create the seeded-anomaly model for the first experiment, two nodes from each of the first three communities are interchanged, resulting in six (of 40) anomalous nodes and three (of ten) anomalous communities per anomalous graph (see Table I). For the second experiment, community assignments are held constant, but the within-community density ($p$) of the first four communities is changed from 0.8 to 0.4 in the seeded-anomaly model, and the expected degree of the nodes in these four communities is increased by two. This will decrease intra-community interaction and increase extra-community interaction for these four communities.
Altogether, the second experiment has four (of ten) anomalous communities and 16 (of 40) anomalous nodes per anomalous graph.

TABLE II: GBTER Experiment Results ($\alpha$ maximizing F1)

Method                α      F1     Precision  Recall
EXPERIMENT 1
Graph Level
  Graph Probability   0.020  0.742  0.678      0.820
  Graph Statistic     0.009  0.919  0.929      0.910
  Gaussian Baseline   0.029  0.526  0.418      0.710
Community Level
  Graph Probability   0.019  0.810  0.745      0.887
  Graph Statistic     0.009  0.830  0.840      0.820
Node Level
  Graph Probability   0.020  0.298  0.239      0.393
  Graph Statistic     0.017  0.547  0.453      0.690
EXPERIMENT 2
Graph Level
  Graph Probability   0.007  0.895  0.855      0.940
  Graph Statistic     0.011  0.922  0.904      0.940
  Gaussian Baseline   0.006  0.590  0.697      0.510
Community Level
  Graph Probability   0.062  0.436  0.390      0.495
  Graph Statistic     0.028  0.654  0.620      0.693
Node Level
  Graph Probability   0.053  0.436  0.368      0.533
  Graph Statistic     0.047  0.434  0.427      0.442

To evaluate the detectors' performance, the Receiver Operating Characteristic (ROC) curve and area under the ROC curve (AUC) are displayed in Figure 1. Recall that the Gaussian Baseline is only for graph level detection and thus does not contribute to the community or node level results. Table II includes Precision, Recall, and F1 for each detector at the threshold $\alpha$ maximizing its F1 score, where F1 is defined as the harmonic average of Precision, $P$, and Recall, $R$; specifically, $F1 := \mathrm{ave}(P^{-1}, R^{-1})^{-1} = 2PR/(P + R)$. In light of the ROC, AUC, Precision, Recall, and F1 scores, we see the Multi-Scale Statistics Model dominates the Multi-Scale Probability Model in most categories. For the full graph tests, the Gaussian Baseline is far inferior to the new models, with the Multi-Scale Statistics Detector as the clear winner. Further, the results at all levels provide evidence that the Multi-Scale Statistics Model is the superior method, as expected after the a priori analysis given in Section V-A.

VII. NCAA FOOTBALL DATA EXPERIMENT

To illustrate the insight given by multi-scale anomaly detection on real-world data, the Graph Statistics Model is applied to NCAA Football data [18]. For comparison, we also run the Gaussian Baseline Detector. Each season is represented as a graph with a node for each Division I team and an edge for each game played. Seasons 2008 and 2009 are used to fit parameters of the models initially, and the streaming detection is performed on years 2010-2012. That is, after fitting parameters on previously observed years, the detectors give p-values for the newly observed season. Then, the parameters are updated to include the newly observed data (and the detectors are applied to the next year). This dataset was chosen for two reasons: (1) NCAA conferences give a ground-truth community structure to the graph, and (2) conference membership was relatively constant in the 2008-2010 seasons but experienced changes in 2011 and 2012. Because teams play most of their schedule within their conference, these changes are reflected in a season's graph and community structure.

TABLE IV: Ten most anomalous conferences for each year displayed with p-value (pv) and number of membership changes (n). Blue entries are true-positives while red entries are false-positives with threshold $\alpha = 10^{-4}$.

2010      pv     n    2011      pv     n    2012      pv     n
ACC       0.000  0    MWC       0.000  3    WAC       0.000  5
WAC       0.001  0    PAC-10    0.000  2    Big-12    0.000  4
CUSA      0.090  0    Big-12    0.000  2    MWC       0.000  4
PAC-10    0.272  0    WAC       0.000  1    Big-East  0.000  2
Big-12    0.295  0    CUSA      0.000  0    SEC       0.000  2
MWC       0.433  0    SEC       0.178  0    MAC       0.000  2
Sun-Belt  0.455  0    MAC       0.211  0    Sun-Belt  0.000  1
MAC       0.551  0    ACC       0.287  0    PAC-10    0.000  0
SEC       0.639  0    Big-East  0.324  0    CUSA      0.002  0
Big-East  0.646  0    Big-10    0.513  1    ACC       0.965  0
Our expectation is that the 2010 graph should produce a relatively higher p-value (be less anomalous) than the next two years. Furthermore, we expect our multi-scale detector to pinpoint the conferences and teams that experienced change. For the experiment, the parameters are learned as discussed in Section V. Communities are detected using Markov clustering as before, but with exponential down-weighting of previous years' edges as in Section III. With appropriate configuration of the Markov clustering parameters, the communities identified match the actual conferences almost exactly, and we a posteriori refer to communities by the corresponding conference name for ease of discussion.

A. Football Data Results

Both the Gaussian Baseline Detector and the new Graph Statistics Detector accurately classified the full graphs (seasons), identifying 2010 as non-anomalous and 2011 and 2012 as anomalous. More specifically, the Gaussian Baseline reported scores of $13 \times 10^{-5}$ for 2010, numerical 0 for 2011, and $5.2 \times 10^{-10}$ for 2012; recall that this score is the product of three Gaussian p-values obtained from their CDFs. A threshold between $10^{-10}$ and $10^{-5}$ will give accurate classification. Our Graph Statistics Detector reported scores of 1.0 for 2010 and 0.0 for 2011 and 2012, indicating that no graph sampled in the Monte Carlo simulation was more probable than the 2010 graph, and none was less probable than the 2011 (or 2012) graph.

In addition to identifying which seasons are anomalous, our method detects the conferences from the graph structure and gives p-values for the conferences and individual teams that are causing the anomaly. Table IV ranks the most anomalous conferences detected each year by the Graph Statistics Detector. Each conference experiencing a change in membership is detected as maximally anomalous, with p-value = 0. Across all three years, there were a total of three false positives and no false negatives.
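The conference-level counts reported for this experiment translate directly into precision and recall. The sketch here is plain arithmetic rather than part of the detector: the function names are hypothetical, the true-positive count of 11 corresponds to the 11/11 recall reported for the conference level, and the second helper evaluates the harmonic-mean F1 of footnote 5 on one row of Table II as a consistency check.

```python
def precision_recall(tp, fp, fn):
    """Precision and recall from true-positive, false-positive, false-negative counts."""
    return tp / (tp + fp), tp / (tp + fn)


def f1_score(precision, recall):
    """Harmonic mean of precision and recall: 2PR / (P + R)."""
    return 2 * precision * recall / (precision + recall)


# Conference level, 2010-2012, counts as reported in the text:
# 11 true positives, 3 false positives, 0 false negatives.
conf_precision, conf_recall = precision_recall(tp=11, fp=3, fn=0)
conf_f1 = f1_score(conf_precision, conf_recall)

# Consistency check against Table II (Experiment 1, graph level,
# Graph Statistic row: Precision 0.929, Recall 0.910, F1 0.919).
table2_f1 = f1_score(0.929, 0.910)
```

With these counts the precision is 11/14 and the recall is 1, and the Table II row reproduces its listed F1 to three decimals.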
At the conference level, these counts give a precision of $11/14 \approx 0.786$ and perfect recall (11/11). The results for the Graph Statistics Detector at the node level are given in Table III, which details the ten most anomalous teams from each season, ordered from most to least anomalous, along with their p-values and ground-truth conference memberships for the previous and current season. We notice that a threshold of $\alpha = 10^{-6}$ gives perfect classification, identifying as anomalous exactly the teams that changed conferences. In short, this method not only tells which graphs are anomalous but can also pinpoint, with high accuracy, the nodes and communities causing the anomaly.

TABLE III: Ten most anomalous teams for years 2010, 2011, and 2012 as given by the Graph Statistics Detector. Threshold $\alpha = 10^{-6}$ gives perfect classification.

2010
Team           p-value   2009 Conf.  2010 Conf.
Wake Fst.      4.84e-4   ACC         ACC
Wash.          2.17e-3   PAC-10      PAC-10
San Jose St.   2.78e-3   WAC         WAC
Utah St.       3.05e-3   WAC         WAC
Tulsa          4.32e-3   CUSA        CUSA
Toledo         1.25e-2   MAC         MAC
San Diego St.  1.50e-2   MWC         MWC
Maryland       2.21e-2   ACC         ACC
N.C.           2.21e-2   ACC         ACC
N.C. St.       2.63e-2   ACC         ACC

2011
Team           p-value   2010 Conf.  2011 Conf.
Boise St.      2.81e-18  WAC         MWC
Utah           1.59e-12  MWC         PAC-10
BYU            6.33e-10  MWC         -
Colorado       1.70e-09  Big-12      PAC-10
Nebraska       3.53e-08  Big-12      Big-10
Wash. St.      1.29e-06  PAC-10      PAC-10
Wash.          2.28e-06  PAC-10      PAC-10
Arizona St.    3.28e-06  PAC-10      PAC-10
San Jose St.   2.26e-05  WAC         WAC
Utah St.       2.31e-06  WAC         WAC

2012
Team           p-value   2011 Conf.  2012 Conf.
Missouri       3.95e-16  Big-12      SEC
W. VA          7.70e-14  Big-East    Big-12
Texas A&M      4.03e-13  Big-12      SEC
Texas Chr.     7.66e-10  MWC         Big-12
Temple         1.28e-09  MAC         Big-East
Nevada         2.55e-07  WAC         MWC
Fresno St.     3.08e-07  WAC         MWC
Hawaii         4.19e-07  WAC         MWC
Texas          4.10e-05  Big-12      Big-12
Miss.          1.04e-04  SEC         SEC

B. Interactive Data Visualization

By design, the multi-scale detector allows users to focus attention on noteworthy communities and nodes, and it facilitates an interactive visualization tool for easily accessing the fine-grained structure of anomalous areas of the graph. Figure 2 illustrates the benefits of this approach in screenshots from a prototype visualization. Although the 2011 graph (Figure 2.a) consists of only ~130 nodes and has well-defined community structure, an unprocessed visualization provides little insight into the anomalous sections of the graph. Alternatively, coarsening the graph to display only “super”-nodes representing communities, with darker shades indicating increased anomalousness, makes the communities of interest obvious (Figure 2.b). In addition, our tool allows conference names and p-values to be displayed automatically, so that contextual information from the analysis and the domain is easily absorbed by a user. Conference nodes are clickable, and selecting one displays that conference's subgraph, again with nodes shaded to indicate the anomalousness of the teams they represent. This setup facilitates interactive exploration of anomalies and the contexts in which they occur. For example, clicking on the Mountain West Conference (MWC) node displays the graph in Figure 2.c, from which it is immediately apparent that, while Utah and Brigham Young were previously members of that community, they no longer participate in the MWC. The PAC-10 Conference subgraph (Figure 2.d) exhibits high density, but each node is very anomalous. This indicates that the interaction outside the conference has changed, and referencing the tables confirms that new teams, namely Utah and Colorado, are now in this conference. Altogether, the framework for multi-scale detection yields analytic results that are readily input into an interactive visualization.
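The coarsening step described above can be sketched in a few lines of plain Python. This is only an illustration under assumptions: the function names, the toy team identifiers, and the shading rule (grey level equal to one minus the p-value) are hypothetical and are not taken from the prototype tool, which the Fig. 2 caption says was produced with Matplotlib and NetworkX. The two conference p-values are the 2011 entries for MWC and SEC from Table IV.

```python
from collections import defaultdict


def coarsen(edges, community_of):
    """Collapse each community into one super-node.

    Returns {(c1, c2): number of team-level edges}, with c1 <= c2;
    within-community edges are counted under the key (c, c).
    """
    weights = defaultdict(int)
    for u, v in edges:
        cu, cv = sorted((community_of[u], community_of[v]))
        weights[(cu, cv)] += 1
    return dict(weights)


def shade(p_value):
    """Grey level in [0, 1]; 1.0 is darkest (assumed rule: 1 - p)."""
    return 1.0 - p_value


# Toy season graph: four hypothetical teams in two conferences.
community_of = {"t1": "MWC", "t2": "MWC", "t3": "SEC", "t4": "SEC"}
edges = [("t1", "t2"), ("t1", "t3"), ("t2", "t3"), ("t3", "t4")]
p_values = {"MWC": 0.000, "SEC": 0.178}  # 2011 values from Table IV

super_edges = coarsen(edges, community_of)
shades = {conf: shade(p) for conf, p in p_values.items()}
```

The resulting `super_edges` and `shades` dictionaries are the kind of input a drawing routine would need to render a coarsened view such as Figure 2.b; the same `shade` rule could be applied to team-level p-values for the per-conference views.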
Upon detection of an anomalous graph, users can now zoom into areas of interest and form and resolve hypotheses about how the anomaly occurred.

VIII. CONCLUSIONS AND FUTURE WORK

This work addresses the challenge of identifying anomalies in a sequence of graphs, with emphasis on understanding why and how a particular graph deviates from normal so as to put the anomalies in context. We developed and tested a framework for identifying anomalies in a time-series of graphs at three hierarchical levels of granularity. We introduce GBTER, a generalization of the BTER generative graph model that allows more accurate prescription and modeling of community structure, and build two hierarchical, streaming anomaly detectors, one based on the graph's probability and the other on statistics describing node and community interactions. Our a priori analysis predicts that the statistics-based detector will produce greater accuracy, and this is confirmed in tests on synthetic data where ground truth is known at the node, subgraph, and graph levels. Additionally, both detectors outperform a baseline detector that fits Gaussian distributions to observed statistics of the full graph. To illustrate the insight facilitated by the multi-scale detection capability, the superior multi-scale detector is applied to NCAA football data. In both the synthetic experiment and the application to NCAA data, the Multi-Scale Statistics Detector was able to accurately pinpoint anomalies at the node, subgraph, and graph levels, exhibiting the advantage of drilling into anomalous graphs to see exactly what has deviated from expectation. A preliminary visualization informed by the analytics is given for this example. We believe applying this method to other time-sampled social networks will enable discovery of underlying structure as well as anomalies and the context in which they occur.
Fig. 2: Screenshots of the interactive 2011 Season Graph visualization (produced with Matplotlib using the NetworkX [19] spring force-directed layout) illustrate discovery of anomalies at each level. Panels: (a) Unprocessed 2011 Season Graph; (b) Coarsened 2011 Season Graph; (c) 2011 Mountain West Conference Graph; (d) 2011 PAC-10 Conference Graph. Team and conference labels were applied in post-analysis, along with p-values in (b)-(d). Darker colors correspond to more anomalous communities/nodes.

While investigations of scalability are outside the scope of this work, we expect applications of this approach to involve larger data, so we discuss the bottlenecks in the current implementation. Firstly, this approach requires a partition of the nodes into communities but is agnostic to the method used. Hence, performance can be tuned through the choice of partitioning algorithm. As mentioned above, using communities known from context (e.g., assuming knowledge of the NCAA conferences each year) can obviate this step and provide groupings that are familiar to the operator. Secondly, estimating the p-values of a given distribution can be computationally expensive, especially if it requires sampling large graphs and calculating their probabilities. In general, importance sampling, in which one over-samples from a subset of the event space, can aid Monte Carlo simulations, although further research is required to optimize performance gains for our needs. Thirdly, the choice of probability models for the parameters could be changed to admit easier p-value computation. For example, multinomials become robust with abundant observations. In a specific application, flexibility in the modeling may yield increased performance with negligible effects on accuracy. Lastly, adapting the overall workflow to fit a specific application may admit performance gains.
For example, updating parameters less often (periodically, in a batch process) or discarding anomalous data from the update observations are options that have yet to be explored. In summary, while the current implementation is suitable only for small datasets, the approach has promising scalability and should be adaptable to high-volume and/or large-network settings.

REFERENCES

[1] C. Seshadhri, T. G. Kolda, and A. Pinar, “Community structure and scale-free collections of Erdős-Rényi graphs,” Physical Review E, vol. 85, no. 5, p. 056109, 2012.
[2] S. Moreno and J. Neville, “Network hypothesis testing using mixed Kronecker product graph models,” in IEEE 13th International Conference on Data Mining (ICDM). IEEE, 2013, pp. 1163–1168.
[3] W. Eberle and L. Holder, “Anomaly detection in data represented as graphs,” Intelligent Data Analysis, vol. 11, no. 6, pp. 663–689, 2007.
[4] L. Akoglu, H. Tong, and D. Koutra, “Graph based anomaly detection and description: a survey,” Data Mining and Knowledge Discovery, pp. 1–63, 2014.
[5] B. A. Miller, N. T. Bliss, P. J. Wolfe, and M. S. Beard, “Detection theory for graphs,” Lincoln Laboratory Journal, vol. 20, no. 1, 2013.
[6] B. A. Miller, L. H. Stephens, and N. T. Bliss, “Goodness-of-fit statistics for anomaly detection in Chung-Lu random graphs,” in International Conference on Acoustics, Speech and Signal Processing. IEEE, 2012, pp. 3265–3268.
[7] L. Peel and A. Clauset, “Detecting change points in the large-scale structure of evolving networks,” arXiv preprint, 2014.
[8] P. C. Wong, H. Foote, P. Mackey, G. Chin, H. Sofia, and J. Thomas, “A dynamic multiscale magnifying tool for exploring large sparse graphs,” Information Visualization, vol. 7, no. 2, pp. 105–117, 2008.
[9] E. M. Ferragut, J. Laska, and R. A. Bridges, “A new, principled approach to anomaly detection,” in International Conference on Machine Learning and Applications, vol. 2. IEEE, 2012, pp. 210–215.
[10] A.-L. Barabási and R. Albert, “Emergence of scaling in random networks,” Science, vol. 286, no. 5439, pp. 509–512, 1999.
[11] D. Chakrabarti and C. Faloutsos, “Graph mining: Laws, generators, and algorithms,” ACM Computing Surveys (CSUR), vol. 38, no. 1, p. 2, 2006.
[12] F. Chung and L. Lu, “The average distances in random graphs with given expected degrees,” Proceedings of the National Academy of Sciences, vol. 99, no. 25, pp. 15879–15882, 2002.
[13] T. G. Kolda, A. Pinar, T. Plantenga, and C. Seshadhri, “A scalable generative graph model with community structure,” arXiv preprint arXiv:1302.6636, 2013.
[14] P. W. Holland, K. B. Laskey, and S. Leinhardt, “Stochastic blockmodels: First steps,” Social Networks, vol. 5, no. 2, pp. 109–137, 1983.
[15] P. Erdős and A. Rényi, “On random graphs,” Publicationes Mathematicae Debrecen, vol. 6, pp. 290–297, 1959.
[16] S. M. Van Dongen, “Graph clustering by flow simulation,” Ph.D. dissertation, University of Utrecht, 2000.
[17] S. Fortunato, “Community detection in graphs,” Physics Reports, vol. 486, no. 3, pp. 75–174, 2010.
[18] S. R. LLC, “Sports Reference College Football Statistics & History,” http://www.sports-reference.com/cfb/, used with permission.
[19] A. A. Hagberg, D. A. Schult, and P. J. Swart, “Exploring network structure, dynamics, and function using NetworkX,” in Proceedings of the 7th Python in Science Conference (SciPy2008), Pasadena, CA USA, Aug. 2008, pp. 11–15.
