Graph-based Anomaly Detection and Description: A Survey
Authors: Leman Akoglu, Hanghang Tong, Danai Koutra
Abstract Detecting anomalies in data is a vital task, with numerous high-impact applications in areas such as security, finance, health care, and law enforcement. While numerous techniques have been developed in past years for spotting outliers and anomalies in unstructured collections of multi-dimensional points, with graph data becoming ubiquitous, techniques for structured graph data have been of focus recently. As objects in graphs have long-range correlations, a suite of novel technology has been developed for anomaly detection in graph data. This survey aims to provide a general, comprehensive, and structured overview of the state-of-the-art methods for anomaly detection in data represented as graphs. As a key contribution, we give a general framework for the algorithms categorized under various settings: unsupervised vs. (semi-)supervised approaches, for static vs. dynamic graphs, for attributed vs. plain graphs. We highlight the effectiveness, scalability, generality, and robustness aspects of the methods. What is more, we stress the importance of anomaly attribution and highlight the major techniques that facilitate digging out the root cause, or the ‘why’, of the detected anomalies for further analysis and sense-making. Finally, we present several real-world applications of graph-based anomaly detection in diverse domains, including financial, auction, computer traffic, and social networks. We conclude our survey with a discussion on open theoretical and practical challenges in the field.

Leman Akoglu
Department of Computer Science, Stony Brook University, Stony Brook, NY 11794. Tel.: +1-631-632-9801, Fax: +1-631-632-2303.
E-mail: leman@cs.stonybrook.edu

Hanghang Tong
Department of Computer Science, City College, City University of New York, New York, NY 10031 USA. E-mail: tong@cs.ccny.cuny.edu

Danai Koutra
Computer Science Department, Carnegie Mellon University, Pittsburgh, PA 15217 USA. E-mail: danai@cs.cmu.edu

Keywords anomaly detection · graph mining · network outlier detection · event detection · change detection · fraud detection · anomaly description · visual analytics

1 Introduction

When analyzing large and complex datasets, knowing what stands out in the data is often at least as important and interesting as, if not more than, learning about its general structure. The branch of data mining concerned with discovering rare occurrences in datasets is called anomaly detection. This problem domain has numerous high-impact applications in security, finance, health care, law enforcement, and many others.

Examples include detecting network intrusion or network failure [Ding et al., 2012, Idé and Kashima, 2004, Sun et al., 2008], credit card fraud [Bolton and Hand, 2001], calling card and telecommunications fraud [Cortes et al., 2002, Taniguchi et al., 1998], auto insurance fraud [Phua et al., 2004], health insurance claim errors [Kumar et al., 2010], accounting inefficiencies [McGlohon et al., 2009], email and Web spam [Castillo et al., 2007], opinion deception and review spam [Ott et al., 2012], auction fraud [Pandit et al., 2007], tax evasion [Abe et al., 2010, Wu et al., 2012], customer activity monitoring and user profiling [Fawcett and Provost, 1996, 1999], click fraud [Jansen, 2008, Kshetri, 2010], securities fraud [Neville et al., 2005], malicious cargo shipments [Das and Schneider, 2007, Eberle and Holder, 2007], malware/spyware detection [Invernizzi and Comparetti, 2012, Ma et al., 2009, Provos et al., 2007], false advertising [Lee et al., 2010], data-center monitoring [Li et al., 2011b], insider threat [Eberle and Holder, 2009],
image/video surveillance [Damnjanovic et al., 2008, Krausz and Herpers, 2010], and many others.

In addition to revealing suspicious behavior, anomaly detection is vital for spotting rare events, such as rare disease outbreaks or side effects in the medical domain, with vital applications in medical diagnosis. As “one person’s signal is another person’s noise”, yet another application of abnormality detection is data cleaning, i.e., the removal of erroneous values or noise from data as a pre-processing step to learning more accurate models of the data.

1.1 Outliers vs. Graph Anomalies

To tackle the abnormality detection problem, many techniques have been developed in the past decades, especially for spotting outliers and anomalies in unstructured collections of multi-dimensional data points. On the other hand, data objects cannot always be treated as points lying independently in a multi-dimensional space. Instead, they may exhibit inter-dependencies which should be accounted for during the anomaly detection process (see Figure 1). In fact, data instances in a wide range of disciplines, such as physics, biology, social sciences, and information systems, are inherently related to one another. Graphs provide a powerful machinery for effectively capturing these long-range correlations among inter-dependent data objects.

Fig. 1 (a) Point-based outlier detection (clouds of multi-dimensional points) vs. (b) Graph-based anomaly detection (inter-linked objects in a network).

To give an illustrative example, in reviewer-product review graph data, the extent to which a reviewer is fraudulent depends on what ratings s/he gave to which products, on how other reviewers rated the same products, and on how trustworthy those reviewers' ratings are, which in turn depends on what other products they rated, and so on.
As can be seen, due to these long-range correlations in real-world datasets, detecting abnormalities in graph data is a significantly different task from detecting outlying points in a multi-dimensional feature space. As a result, researchers have recently intensified their study of methods for anomaly detection in structured graph data.

Why Graphs? We highlight four main reasons that make graph-based approaches to anomaly detection vital and necessary:

– Inter-dependent nature of the data: As we briefly mentioned above, data objects are often related to each other and exhibit dependencies. In fact, most relational data can be thought of as inter-dependent, which makes it necessary to account for related objects when finding anomalies. Moreover, datasets of this type are abundant, including biological data such as the food web and protein-protein interaction (PPI) networks, terrorist networks, email and phone-call networks, blog networks, retail networks, and social networks, to name but a few.
– Powerful representation: Graphs naturally represent the inter-dependencies by the introduction of links (or edges) between the related objects. The multiple paths lying between these related objects effectively capture their long-range correlations. Moreover, a graph representation facilitates the representation of rich datasets, enabling the incorporation of node and edge attributes/types.
– Relational nature of problem domains: Anomalies themselves can be relational in nature. For example, in the fraud domain, one could imagine two types of scenarios: (1) opportunistic fraud that spreads by word-of-mouth (if one commits fraud, it is likely that his/her acquaintances will also do so), and (2) organized fraud that takes place by the close collaboration of a related group of subjects. Both of these scenarios point to a relational treatment of anomalies.
  Another example can be given in the performance monitoring domain, where the failure of a machine could cause the malfunction of the machines dependent on it. Similarly, the failure of a machine could be a good indicator of possible failures of other machines in close spatial proximity to it (e.g., due to an excessive increase of humidity in that particular region of a warehouse).
– Robust machinery: Finally, one could argue that graphs serve as more adversarially robust tools. For example, in fraud detection systems, behavioral clues such as log-in times and locations (e.g., IP addresses) can be easily altered or faked by advanced fraudsters. On the other hand, it may be reasonable to argue that the fraudsters could not have a global view of the entire network (e.g., money transfer, telecommunication, email, review network) that they are operating in. As such, it would be harder for a fraudster to fit in to this network as well as possible without knowing its entire characteristic structure and dynamic operations.

1.2 Challenges

We first discuss the very immediate challenge associated with our problem of interest. It stems from the fact that no unique definition for the problem of anomaly detection exists. The reason is that the general definition of an anomaly or an outlier is a vague one: the definition becomes meaningful only under a given context or application.

The very first definition of an outlier dates back to 1980, and is given by Douglas M. Hawkins [Hawkins, 1980]:

Definition 1 (Hawkins’ Definition of Outlier, 1980) “An outlier is an observation that differs so much from other observations as to arouse suspicion that it was generated by a different mechanism.”

As one notices, the above definition is quite general and thus makes the detection problem an open-ended one. As a result, the problem of anomaly detection has been defined in various ways in different contexts.
In other words, the problem has many definitions, often tailored to the specific application domain, and also goes by various names such as outlier, anomaly, outbreak, event, change, and fraud detection. In some applications, such as data cleaning, outliers are even called noise: “one man’s signal is another man’s noise”. Nevertheless, anomaly detection is one of the most evident problems in data mining with numerous applications, and the field of anomaly detection itself is well established.

Following the general definition of an outlier by Hawkins as given above, we provide a general definition for the graph anomaly detection problem as follows.

Definition 2 (General Graph Anomaly Detection Problem)
Given a (plain/attributed, static/dynamic) graph database,
Find the graph objects (nodes/edges/substructures) that are rare and that differ significantly from the majority of the reference objects in the graph.

For practical purposes, a record/point/graph-object is flagged as anomalous if its rarity/likelihood/outlierness score exceeds a user-defined or an estimated threshold. In other words, an anomaly is treated as a data object or a group of objects that is rare (e.g., a rare combination of categorical attribute values), isolated (e.g., far-away points in n-dimensional spaces), and/or surprising (e.g., data instances that do not fit well in our mental/statistical model, or need too many bits to describe under the Minimum Description Length principle [Rissanen, 1999]).

Next, we discuss the challenges associated with anomaly detection and attribution, which can be grouped into two: (1) data-specific, and (2) problem-specific challenges. We also specifically highlight the challenges associated with graph-based anomaly detection.
Data-specific challenges: Simply put, the challenges with respect to data are those of working with big data; namely the volume, velocity, and variety of massive, streaming, and complex datasets. The same challenges generalize to graph data as well.

Scale and Dynamics: With the advance of technology, it is much easier than it was in the past to collect and analyze very large datasets. As of today, Facebook (graph) consists of more than a billion users¹ (i.e., nodes), the Web (graph) contains more than 40 billion pages², and over 6 billion users own a cell phone³, which makes the telecommunication networks billion-scale graphs. Not only is the size of real data in tera- to peta-bytes, but also the rate at which it arrives is high. Facebook users generate billions of objects (e.g., posts, image/video uploads, etc.), billions of credit card transactions are performed every day, billions of click-through traces of Web users are generated each day, and so on. This kind of data generation can be thought of as streaming graph data.

Complexity: In addition to (graph) data size and dynamicity, the datasets are rich and complex in content, including for example user demographics, interests, roles, as well as different types of relations. As such, incorporating these additional information sources makes the graph representation a complex one, where nodes and edges can be typed and have a long list of attributes associated with them.

As a result, methods that can scale to very large graphs, update their estimations when the graph changes over time, and effectively incorporate all the available and useful data sources are essential for graph-based anomaly detection.

Problem-specific challenges: Additional challenges arise with respect to the anomaly detection task itself.
Lack and Noise of Labels: One main challenge is that the data often comes without any class labels, that is, the ground truth of which data instances are anomalous and non-anomalous does not exist. Importantly, the task of manual labeling is quite challenging given the size of the data. To make things worse, even if endless human power were available, due to the complexity of certain labeling tasks, the labels are expected to be noisy and of varying quality depending on the annotator. According to Nobel laureate Daniel Kahneman, “humans are incorrigibly inconsistent in making summary judgments of complex information” [Kahneman, 2011]. Surprisingly, they frequently give different answers when asked to evaluate the same information twice. For example, experienced radiologists who evaluate chest X-rays as normal and abnormal are found to contradict themselves 20% of the time when they see the same picture on separate occasions.

¹ http://newsroom.fb.com/Key-Facts
² http://www.worldwidewebsize.com/
³ http://huff.to/Rc2vbU

Due to challenges in obtaining labels, supervised machine learning algorithms are less attractive for the task of anomaly detection. It has been shown that humans perform at best as well as random in labeling a review as fake or not just by looking at its text [Ott et al., 2011], but can potentially do better by analyzing other relevant information such as the authors of the review. Likewise, a single transaction could be treated as anomalous only in relation to a history of previous transactions. These observations indicate that additional resources and information are needed to obtain human labels, which makes labels costly to acquire and makes the task harder and more time-consuming for the human annotators. What is more, the lack of true labels, i.e., ground truth data, also makes the evaluation of anomaly detection techniques challenging.
Class Imbalance and Asymmetric Error: The second challenge arises due to the unbalanced nature of the data; since anomalies are rare, only a very small fraction of the data is expected to be abnormal. Moreover, the cost of mislabeling a good data instance versus a bad instance may change depending on the application, and could further be hard to estimate beforehand. For example, mislabeling a cancer patient as healthy could have fatal consequences, while mislabeling an honest customer as a fraudster could cause loss of customer fidelity. If learning-based techniques are to be employed, these issues regarding class imbalance and asymmetric error costs should be carefully accounted for.

Novel Anomalies: The third point is the arms-race nature of the problem setting, especially in the fraud detection domain. The more the fraudsters understand the ways the detection algorithms work, the more they change their techniques so as to bypass the detection and fit in to the norm. As a result, not only should the algorithms be adaptive to changing and growing data over time, they should also be adaptive to, and be able to detect, novel anomalies in the face of adversaries.

“Explaining-away” the Anomalies: Additional challenges lie in explaining the anomalies in the post-detection phase. This involves digging out the root cause of an anomaly, telling a coherent story for the ‘why’ and ‘how’ of the anomaly, and/or presenting the results in a user-friendly form for further analysis. Most of the existing detection techniques, while doing a reasonably good job in spotting the anomalies, completely leave out this description or attribution phase and thus make it hard for humans to make sense of the outcome.

Graph-specific challenges: All of the above challenges associated with the anomaly detection problem generalize to graph data. Graph-based anomaly detection, on the other hand, has additional challenges as well.
Inter-dependent Objects: Firstly, the relational nature of the data makes it challenging to quantify the anomalousness of graph objects. While in traditional outlier detection the objects or data points are treated as independent and identically distributed (i.i.d.), the objects in graph data have long-range correlations. Thus, the “spreading activation” of anomalousness or “guilt by association” needs to be carefully accounted for.

Variety of Definitions: Secondly, the definitions of anomalies in graphs are much more diverse than in traditional outlier detection, given the rich representation of graphs. For example, novel types of anomalies related to graph substructures are of interest for many applications, e.g., money-laundering rings in trading networks.

Size of Search Space: The main challenge associated with more complex anomalies such as graph substructures is that the search space is huge, as in many graph theoretical problems associated with graph search. The enumeration of possible substructures is combinatorial, which makes finding the anomalies a much harder task. This search space is enlarged even more when the graphs are attributed, as the possibilities span both the graph structure and the attribute space. As a result, graph-based anomaly detection algorithms need to be designed not only for effectiveness but also for efficiency and scalability.

1.3 Previous Surveys and Our Contributions

There exist very comprehensive survey articles on anomaly and outlier detection in general that focus on points of multi-dimensional data instances. In particular, [Chandola et al., 2009] covers outlier detection techniques, [Zimek et al., 2012] focuses on outlier detection in high dimensions, and [Schubert et al., 2012] deals with local outlier detection techniques.
In addition, survey and special issue journal articles that address anomaly, event, and change detection include [Chandola et al., 2012, Margineantu et al., 2010, Radke et al., 2005]. Finally, due to the wide range of application domains, fraud detection has attracted many surveys [Edge and Sampaio, 2009, Flegel et al., 2010, Phua et al., 2010].

None of the previous surveys, however, discuss the anomaly detection problems in the particular context when one is confronted with large graph datasets. Further, they also do not focus, at least not directly, on graph-based detection techniques. Therefore, in this survey we aim to provide a comprehensive and structured overview of the state-of-the-art techniques for anomaly, event, and fraud detection in data represented as graphs. As such, our focus is notably different from, while being complementary to, the earlier surveys. Specifically, our contributions are listed as follows.

1. Different from previous surveys on anomaly and outlier detection, we focus on abnormality detection in (large) graph datasets, using graph-based techniques.
2. We comprehensively explore unsupervised techniques that exploit the graph structure, as well as (semi-)supervised methods that employ relational learning.
3. We put the abnormality (anomaly, event, fraud) detection methods under a unifying lens, and point out their connections, pros and cons (e.g., scalability, robustness, generality, etc.), and applications on diverse real-world tasks.
4. In addition to anomaly detection, we highlight the importance of explaining the detected anomalies and provide a survey of analysis tools and techniques for post-detection exploration and sense-making.

1.4 Overview and Organization

We present our survey in four major parts. A general outline and a list of topics we cover are given as follows.

I. Anomaly detection in static graphs (Section 2)
   (a) Anomalies in plain (unlabeled) graphs
   (b) Anomalies in attributed (node-/edge-labeled) graphs
II. Anomaly detection in dynamic graphs (Section 3)
   (a) Feature-based events
   (b) Decomposition-based events
   (c) Community- or clustering-based events
   (d) Window-based events
III. Graph-based anomaly description (Section 4)
   (a) Interpretation-friendly graph anomaly detection
   (b) Interactive graph querying and sense making
IV. Graph-based anomaly detection in real-world applications (Section 5)
   (a) Anomalies in telecom networks
   (b) Anomalies in auction networks
   (c) Anomalies in account networks
   (d) Anomalies in security networks
   (e) Anomalies in financial networks
   (f) Anomalies in opinion networks
   (g) Anomalies in the Web network
   (h) Anomalies in social networks
   (i) Anomalies in computer networks

The first part focuses on anomaly detection methods for static graph data, and is covered for both unlabeled (plain) and labeled (attributed) graphs. The second part focuses on change or event detection approaches for time-varying or dynamic graph data, based on edit distances and connectivity structure. The overview of the first two sections, along with the areas with open problems and challenges, is provided in Table 1. In the third part, we stress the importance of anomaly attribution in revealing the root cause of the detected anomalies and in presenting anomalies in a user-friendly form. We provide the state-of-the-art tools that could facilitate the post-analysis of detected anomalies for the crucial task of sense-making. Finally, in the fourth and last part we demonstrate graph-based anomaly detection techniques in action, where we discuss several real-world applications in diverse domains.

Table 1 Categorization of graph-based techniques in Section 2 and Section 3.
          Plain           Attributed
Static    [Section 2.1]   [Section 2.2]; Open
Dynamic   [Section 3.2]   [Section 3.2]; Many Open Challenges

We show the general outline of our survey in Figure 2, illustrating a sketch of the taxonomy. In the first two parts, namely static and dynamic graph anomalies, we focus on unsupervised techniques as well as (semi-)supervised approaches based on relational classification. Later in the third part, we focus on qualitative analysis techniques for the sense-making of spotted anomalies. Finally in part four, we present a long list of applications of graph-based anomaly detection in a wide range of networks, including finance, security, accounting, to name a few.

Fig. 2 Graph-anomaly detection: the outline of the survey. [Diagram not reproduced; its top-level blocks are Graph-based Detection (quantitative detection), covering anomaly detection in static graphs and in dynamic graphs; Graph-based Description (qualitative explanation); and Graph-based Anomaly Detection Applications.]

2 Anomaly Detection in Static Graphs

In this section, we will address anomaly detection in static snapshots of graphs. That is, the main task here is to spot anomalous network entities (e.g., nodes, edges, subgraphs) given the entire graph structure. We start with a very brief overview of outlier detection techniques in static clouds of data points and provide pointers for further reading. Next, we survey anomaly detection techniques for static graphs.
Overview: Outliers in Clouds of Data Points

Outlier detection deals with the problem of spotting outlying points in the (high-dimensional) feature space of data points. While not directly related, outlier detection techniques are employed in graph-based anomaly detection, for example after a graph-feature extraction step as we describe in this section. Thus it is beneficial to know of general outlier detection methods for spotting graph anomalies.

In outlier detection, some methods provide binary 0/1 classification of data points, i.e., outlier vs. non-outlier, while most methods try to assign what is called an outlierness score, which quantifies the level of outlierness of the objects and makes it possible to rank the objects accordingly. For an illustration, see Figure 1(a).

There are several different approaches to multi-dimensional outlier detection. The techniques can be classified into density-based [Breunig et al., 2000, Papadimitriou et al., 2003], distance-based [Aggarwal and Yu, 2001, Chaudhary et al., 2002, Ghoting et al., 2008, Knorr and Ng, 1998, Orair et al., 2010, Wang et al., 2011b], depth-based [Ruts and Rousseeuw, 1996], distribution-based [Saltenis, 2004], clustering-based [He et al., 2003, Lieto et al., 2008, Miller and Browning, 2003, Wang et al., 2012c], classification-based [Abe et al., 2006, Hempstalk et al., 2008, Janssens et al., 2009], information theory-based [Ando, 2007, Böhm et al., 2009, Smets and Vreeken, 2011], spectrum-based [Liu et al., 2013], and subspace-based [Keller et al., 2012, Kriegel et al., 2012, Müller et al., 2010, 2012] techniques. Moreover, there exist outlier detection techniques that can work with categorical features [Akoglu et al., 2012c, Das and Schneider, 2007, Smets and Vreeken, 2011], or a mixture of categorical and numerical features [Otey et al., 2006], in addition to one-class classification-based approaches [Janssens et al., 2009, Pauwels and Ambekar, 2011].
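To make the notion of an outlierness score concrete, the following minimal sketch scores each point by the distance to its k-th nearest neighbor, which is one simple member of the distance-based family, and then ranks the points by that score. It is an illustration only and does not reproduce any of the cited methods; the function name knn_outlierness, the parameter k, and the toy data are assumptions made for this example.

```python
import math


def knn_outlierness(points, k=2):
    """Score each point by the Euclidean distance to its k-th nearest neighbor.

    Larger scores mean the point lies farther from its neighbors, i.e. it is
    more outlying. Brute-force O(n^2) computation, for illustration only.
    """
    scores = []
    for i, p in enumerate(points):
        # Distances from p to every other point, smallest first.
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        scores.append(dists[k - 1])
    return scores


# Hypothetical toy data: four points near the origin and one far away.
points = [(0.0, 0.0), (0.0, 1.0), (1.0, 0.0), (1.0, 1.0), (8.0, 8.0)]
scores = knn_outlierness(points, k=2)

# Rank point indices from most to least outlying.
ranking = sorted(range(len(points)), key=lambda i: scores[i], reverse=True)
```

A binary outlier/non-outlier decision can be obtained from such scores by comparing them against a threshold, whereas the ranking keeps the full ordering for inspection.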
We refer the reader to a comprehensive survey on outlier detection for more discussion and details [Chandola et al., 2012], as well as a recent book by Aggarwal [2013] on outlier analysis with comprehensive details on these techniques.

Anomalies in Static Graph Data

We will study anomaly detection in graph data under two settings: 1) plain graphs, and 2) attributed graphs. An attributed graph is a graph where nodes and/or edges have features associated with them. For example, in a social network, users may have various interests, work/live at different locations, be of various education levels, etc., while the relational links may have various strengths, types, frequency, etc. A plain graph, on the other hand, consists of only nodes and edges among those nodes, i.e., the graph structure.

While the specific definition of the graph anomalies may vary, a general definition for the anomaly detection problem for static graphs can be stated as follows:

Definition 3 (Static-Graph Anomaly Detection Problem)
Given the snapshot of a (plain or attributed) graph database,
Find the nodes and/or edges and/or substructures that are “few and different” or deviate significantly from the patterns observed in the graph.

2.1 Anomalies in static plain graphs

For a given plain graph, the only information about it is its structure. This category of anomaly detection methods thus exploits the structure of the graph to find patterns and spot anomalies. These structural patterns can be grouped further into two categories: structure-based patterns and community-based patterns.

2.1.1 Structure based methods

We organize the structure-based approaches into two groups: feature-based and proximity-based. The first group exploits the graph structure to extract graph-centric features such as node degree and subgraph centrality, while the second group uses the graph structure to quantify the closeness of nodes in the graph to identify associations.
Feature-based approaches:

Main idea: This group of approaches uses the graph representation to extract structural graph-centric features that are sometimes used together with other features extracted from additional information sources for outlier detection in the constructed feature space. Essentially, these methods transform the graph anomaly detection problem to the well-known and understood outlier detection problem.

Graph-centric features: One could use the given graph structure to compute various measures associated with the nodes, dyads, triads, egonets, communities, as well as the global graph structure [Henderson et al., 2010]. These features have been used in several anomaly detection applications including Web spam [Becchetti et al., 2006] and network intrusion [Ding et al., 2012], as we will discuss in detail in Section 5.

The node-level features include (in/out) degrees, centrality measures such as eigenvector [Bonacich and Lloyd, 2001], closeness [Noh and Rieger, 2004], and betweenness [Freeman, 1977] centralities, local clustering coefficient [Watts and Strogatz, 1998], radius [Kang et al., 2011c], degree assortativity, and most recently, roles [Henderson et al., 2012]. The dyadic features include reciprocity [Akoglu et al., 2012a], edge betweenness, number of common neighbors, as well as several other local network overlap measures [Gupte and Eliassi-Rad, 2012, Liben-Nowell and Kleinberg, 2003]. [Akoglu et al., 2010] introduce egonet features such as the egonet's number of triangles, total weight, principal eigenvalue, etc., as well as their pairwise correlation patterns. [Henderson et al., 2011] enrich and extend the possible graph-based features by recursively aggregating existing features. The node-group-level features can be listed as compactness measures, such as density, modularity [Newman, 2006], and conductance [Andersen et al., 2006].
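As a rough illustration of the node-level and egonet features listed above, the sketch below computes, for an unweighted undirected graph, each node's degree, the number of triangles it participates in, the number of edges in its egonet (the node, its direct neighbors, and all edges among them), and its local clustering coefficient. The adjacency-dictionary representation, the function name egonet_features, and the toy graph are assumptions made for this example, not part of any cited method.

```python
from itertools import combinations


def egonet_features(adj):
    """Compute simple structural features for every node.

    adj maps each node to the set of its neighbors (undirected, no self-loops).
    Returns {node: {"degree", "triangles", "egonet_edges", "clustering"}}.
    """
    features = {}
    for u, nbrs in adj.items():
        degree = len(nbrs)
        # Every edge between two neighbors of u closes one triangle through u.
        triangles = sum(1 for v, w in combinations(nbrs, 2) if w in adj[v])
        # Egonet edges = edges from u to its neighbors + edges among neighbors.
        egonet_edges = degree + triangles
        possible_pairs = degree * (degree - 1) / 2
        clustering = triangles / possible_pairs if possible_pairs else 0.0
        features[u] = {
            "degree": degree,
            "triangles": triangles,
            "egonet_edges": egonet_edges,
            "clustering": clustering,
        }
    return features


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to a.
graph = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
feats = egonet_features(graph)
```

Each node then becomes a point in a small feature space, so that general outlier detection techniques can be applied to the resulting feature vectors.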
Finally , examples to global measures in- clude number of connected components, distribution of component sizes [Kang et al., 2010], principal eigenv alue, minimum spanning tree weight, av erage node degree, global clustering coefficient, to name b ut a few . Appr oaches: A feature-based anomaly detection technique called O D D B A L L is pro- posed by [Akoglu et al., 2010], which extracts egonet-based features and finds pat- terns that most of the egonets of the graph follow with respect to those features. As such, this method can spot anomalous egonets (and hence anomalous nodes), as those that do not follow the observ ed patterns. An e gonet is defined as the 1-step neighborhood around a node; including the node, its direct neighbors, and all the connections among these nodes (an e xample is sho wn on the right figure). More formally an egonet is the induced 1-step sub-graph for each node. Giv en the egonets, the main ques- tion and challenge is which features to look at, as there is a long list of possible graph-based measures that can be extracted as egonet features. The paper proposes a carefully chosen subset of features (e.g. number of triangles, total weight of edges, etc.) that are (1) observ ed to yield patterns across a wide range of real-world graphs, and (2) fast to compute and easy to interpret. The egonet features are then studied in pairs and sev eral patterns in the form of power -laws are observed among strongly related features (e.g. number of neighbors and number of triangles). For a gi ven egonet, its deviation from a particular pattern is computed based on its “distance” to the relev ant po wer-law distribution. Each egonet then receiv es a separate deviation, or outlierness, score with respect to each pattern. The multiple scores a node recei ves from v arious observed patterns brings up the question of how to combine them to obtain the final scores or final ranking. Sev eral 12 Leman Akoglu et al. 
works [Gao and Tan, 2006, Lazarevic and Kumar, 2005] have proposed solutions for how to unite multiple outlierness scores. This problem is addressed in works on outlier ensembles, as discussed in Aggarwal [2012], Zimek et al. [2014].

There are several advantages of analyzing the egonet features in pairs, rather than in union. First, this facilitates the visualization of the patterns and outliers in 2-d for post-analysis. Second, the low dimensionality of the feature space helps with interpretability of the results; that is, one can tell what type of anomaly a node exhibits based on its deviation from a particular pattern, or “law”. As an example, in [Kang et al., 2014], the authors propose a package for visualization of billion-scale graphs by focusing on correlation plots (node features in pairs), as well as the spy plot and distribution plots for various features. The visualization tool is carefully designed to make the outliers pronounced even by a simple inspection.

Later work by [Henderson et al., 2011] extends the feature base by recursively combining node-based (“local”) and egonet-based (neighborhood) features. A recursive feature is defined as some aggregate value (e.g. mean, min, max) computed over any existing feature value (including recursive ones) among a node’s neighbors. Intuitively, local and egonet features capture neighborhood information, whereas recursive features make it possible to go beyond the direct neighborhood and capture more “regional” or behavioral information. An iterative procedure with run time complexity linear in graph size is detailed in the paper to compute recursive features and prune highly correlated features on the go.

Proximity-based approaches:

Main idea: This group of techniques exploits the graph structure to measure closeness (or proximity) of objects in the graph.
These methods capture the simple auto-correlation between these objects, where close-by objects are considered to be likely to belong to the same class (e.g., malicious/benign or infected/healthy).

Approaches: Measuring the importance of the nodes in a graph is one of the most widely studied graph problems. PageRank [Brin and Page, 1998], which is based on random walks, is one of the most popular algorithms. A random walk on the (unweighted) graph jumps randomly from node to node. If currently present on a node $u$, a random walk in the next step jumps to one of its neighbors with equal probability $1/d_u$, where $d_u$ is the degree of node $u$. The stationary probability distribution of the random walk on the graph is then used to rank the nodes by their “importance”.

This walk is known to converge if the transition matrix, the entries of which denote the jump probabilities between neighboring nodes, is stochastic, aperiodic, and irreducible [Feller, 1968]. On an undirected graph, the stationary probability of a random walk at node $u$ is directly proportional to its degree $d_u$, and is independent of the starting node. On directed graphs, it is likely that the irreducibility condition, which states that there is a non-zero probability of going from any one node to any other, will not be met (e.g., in the presence of sink nodes and multiple strongly connected components). To resolve these issues, a random restart of the walk is performed with a certain probability $\alpha \in (0, 1)$, where the restart node is chosen at random (the complement $1 - \alpha$ is commonly known as the damping factor).

A widely used graph-closeness measure that is also based on random walks, but extends them with restarts to a particular node, is the Personalized PageRank (PPR) [Haveliwala, 2003].
Given a restart node $q$ and the parameter $\alpha$, consider the random walk with restart, starting at node $q$, such that at any step when currently present at a node $u$, it chooses any of its neighbors with equal probability $(1 - \alpha)/d_u$, and returns to the restart node $q$ with probability $\alpha$. The stationary probability at any node $v$ of the random walk with restart is defined as the PPR score of $v$ with respect to the restart node $q$. A more general version of this measure can be given for a set $Q$ of restart nodes, where their restart probabilities sum to $\alpha$. This type of PageRank computation is often referred to as the Biased PageRank. The stationary distribution of probabilities indicates the proximity (or closeness) of each node in the graph with respect to the (set of) restart node(s), and is higher for the nodes that have many, short, and highly weighted paths to the restart node(s).

Another graph proximity measure that quantifies the closeness of two nodes in the graph is SimRank [Jeh and Widom, 2002], which computes similarity of the structural context in which the graph objects occur, based on their relationships with other objects. It is often thought of as measuring how soon two random surfers starting from the two nodes are expected to meet each other by randomly walking “backwards” in the graph. Several variants of SimRank have also been proposed by [Antonellis et al., 2008, Chen and Giles, 2013, Zhao et al., 2009].

Finally, many link prediction approaches essentially quantify the similarity or closeness of pairs of nodes in the graph. Several such measures of varying computational complexity exist. The simple ones include the Jaccard proximity, which is the normalized number of common neighbors of the two nodes. Others include the total number of paths or node-disjoint paths. The slightly more complex Katz measure [Katz, 1953] counts all the paths between the two nodes, with weights that decay exponentially with the path length.
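As an illustration of the proximity measures discussed above, the following is a minimal sketch of the random walk with restart and of the Jaccard proximity on a small undirected, unweighted graph. It is not taken from any of the cited papers: the function names, the adjacency-set representation, the fixed number of power iterations, and the handling of nodes without neighbors are all assumptions made for this example.

```python
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def personalized_pagerank(graph: Graph, q: Hashable, alpha: float = 0.15,
                          iters: int = 100) -> Dict[Hashable, float]:
    """Approximate PPR scores with respect to restart node q by power iteration."""
    scores = {v: 0.0 for v in graph}
    scores[q] = 1.0  # the walk starts at the restart node
    for _ in range(iters):
        nxt = {v: 0.0 for v in graph}
        nxt[q] += alpha  # total mass is 1, so a fraction alpha of it restarts
        for u, nbrs in graph.items():
            if not nbrs:
                # Assumption: a node without neighbors sends its mass back to q.
                nxt[q] += (1.0 - alpha) * scores[u]
                continue
            share = (1.0 - alpha) * scores[u] / len(nbrs)  # (1 - alpha) / d_u each
            for v in nbrs:
                nxt[v] += share
        scores = nxt
    return scores


def jaccard(graph: Graph, u: Hashable, v: Hashable) -> float:
    """Number of common neighbors of u and v, normalized by the size of the union."""
    union = graph[u] | graph[v]
    return len(graph[u] & graph[v]) / len(union) if union else 0.0


# Toy graph: two triangles (a, b, c) and (d, e, f) joined by the edge c-d.
toy = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
ppr = personalized_pagerank(toy, "a")
print(sorted(ppr, key=ppr.get, reverse=True))  # nodes ordered by closeness to "a"
print(jaccard(toy, "a", "b"), jaccard(toy, "a", "f"))
```

On this toy graph, the nodes in the same triangle as the restart node receive higher PPR scores than the nodes in the other triangle, which is the behavior that the community-based methods later in this section rely on.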
For a well documented list and evaluation of these as well as other measures, we refer the reader to Liben-Nowell and Kleinberg [2003].

2.1.2 Community based methods

Main idea: The cluster or community-based methods for graph anomaly detection rely on finding densely connected groups of “close-by” nodes in the graph and spot nodes and/or edges that have connections across communities. In fact, the definition of anomaly under this setting can be thought of as finding “bridge” nodes/edges that do not directly belong to one particular community.

Approaches: Methods that exploit communities or proximity of nodes in the graph to spot (node) anomalies in bipartite graphs include [Sun et al., 2005]. Several real-world datasets can be represented as bipartite graphs in which the bridge nodes reveal interesting phenomena. Examples include publication networks: authors vs. (unusual) papers written by authors from different research communities; P2P networks: users vs. (cross-border) files; financial trading networks: stocks vs. (cross-sector) traders; and customer-product networks: users vs. (“cross-border”) products.

The two main problems addressed in [Sun et al., 2005] are (P1) how to find the community of a given node, which is also referred to as the “neighborhood” of a node, and (P2) how to quantify the extent to which the given node is a bridge node. For (P1), the authors use random-walk-with-restart-based Personalized PageRank (PPR) scores [Haveliwala, 2003] of all the nodes with respect to the given node, where those nodes with high PPR scores constitute the neighborhood of a node. Along similar lines, for (P2) the pairwise PPR scores among all the neighbors of the given node are aggregated by averaging to compute a so-called “normality” score of a node. Intuitively, nodes with low normality scores have neighbors with low pairwise proximity to one another.
This suggests that the neighbors lie in different, separate communities, which makes the given node resemble a bridging node across communities.

AUTOPART [Chakrabarti, 2004] is based on the notion that nodes with similar neighbors are clustered together, and the edges that do not belong to any structure constitute anomalies (e.g. cross-cluster bridge edges). Similarly, nodes that have many cross-connections to multiple different communities are considered not to belong to any particular cluster and thus also constitute anomalies. For finding communities in a graph, the algorithm re-organizes the rows and columns of the adjacency matrix into a few homogeneous blocks (of either low or high density). These blocks have the property of containing nodes that are more densely connected together than with the rest of the nodes in the graph, which is the underlying idea in clustering. [Chakrabarti, 2004] develops a parameter-free, iterative algorithm based on the Minimum Description Length principle [Rissanen, 1999] for rearranging the rows and columns, as well as for finding the best number of blocks or node groups automatically without requiring any user input.

Another method that aims to spot (node and edge) anomalies based on graph communities [Tong and Lin, 2011] relies on matrix factorization. Matrix factorization has been used to address several problems ranging from dimensionality reduction [Ambai et al., 2011, Nikulin and Huang, 2012] to (graph) clustering [Kuang et al., 2012, Wang et al., 2012b]. The factorization of a data matrix $A$ is often formulated as $A = X \times Y + R$, where $X$ and $Y$ are the low rank factors and $R$ denotes the residual matrix. In traditional non-negative matrix factorization (NMF), there exist additional constraints on the non-negativity of both $X$ and $Y$, which for example aids in determining the communities.
Different from this traditional approach, the main idea for finding anomalies is to waive these original constraints but instead enforce non-negativity constraints on the residual matrix for interpretability (hence the name NRMF). With the help of the non-negative residual matrix, the approach proves effective in spotting “strange” connections, such as port-scanning-like or DDoS-like activity, bridging connections, as well as bipartite-core structures.

The “bridge” nodes and/or edges can be seen as intrusive connectors and/or connections that cross the community boundaries in computer security. For example, [Ding et al., 2012] regards intrusion as entering a community to which one does not belong, and looks for communication that does not respect the community boundaries. Analysis shows that cut-vertices (vertices the removal of which disconnects the graph into components) correspond well with ground-truth traffic sources that attempted an intrusion by sending malicious or unwanted traffic. This work essentially shows one of the real-world applications in which community-based anomaly detection methods prove to be effective.

Other community-based network outlier detection methods directly focus on network clustering, and in the process, spot hubs and outliers as a by-product [Sun et al., 2010, Xu et al., 2007]. To find network clusters, SCAN [Xu et al., 2007] exploits the neighborhood of vertices; vertices sharing many neighbors are grouped into the same clusters. As such, vertices that are bridging many clusters are labeled as hubs, whereas those that cannot be assigned to any community are flagged as outliers. To overcome the issue of selecting the minimum similarity threshold parameter of [Xu et al., 2007], [Sun et al., 2010] proposes a novel clustering framework called GSKELETONCLU that also aims to find hubs and outliers as a by-product of the graph clustering.
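The neighbor-sharing idea behind clustering-based hub and outlier detection can be sketched as follows. The sketch assumes a cosine-style structural similarity over closed neighborhoods (a node together with its neighbors) and assumes that a cluster assignment is already available from some clustering step; the function names and the rule "at least two adjacent clusters makes a hub" are choices made for this example, not a faithful reimplementation of SCAN or GSKELETONCLU.

```python
import math
from typing import Dict, Hashable, Set

Graph = Dict[Hashable, Set[Hashable]]  # undirected graph as adjacency sets


def structural_similarity(graph: Graph, u: Hashable, v: Hashable) -> float:
    """Shared closed-neighborhood size, normalized by the geometric mean of sizes."""
    gu = graph[u] | {u}  # closed neighborhood: the node plus its neighbors
    gv = graph[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))


def label_unclustered(graph: Graph,
                      cluster_of: Dict[Hashable, int]) -> Dict[Hashable, str]:
    """Label every node without a cluster as a 'hub' or an 'outlier'."""
    labels = {}
    for v in graph:
        if v in cluster_of:
            continue  # clustered nodes are neither hubs nor outliers
        adjacent_clusters = {cluster_of[n] for n in graph[v] if n in cluster_of}
        # A node that bridges two or more clusters is a hub; otherwise an outlier.
        labels[v] = "hub" if len(adjacent_clusters) >= 2 else "outlier"
    return labels


# Toy graph: clusters {a, b, c} and {d, e, f}; h touches both, o touches only a.
toy = {
    "a": {"b", "c", "o"}, "b": {"a", "c"}, "c": {"a", "b", "h"},
    "d": {"e", "f", "h"}, "e": {"d", "f"}, "f": {"d", "e"},
    "h": {"c", "d"}, "o": {"a"},
}
clusters = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
print(label_unclustered(toy, clusters))
print(round(structural_similarity(toy, "e", "f"), 3))
```

A full clustering method would additionally need a similarity threshold (or a way to avoid one) to decide which nodes are grouped together, which is the parameter-selection issue mentioned above.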
2.2 Anomalies in static attributed graphs

For certain kinds of data, it is possible to have a richer graph representation, in which nodes and edges exhibit (non-unique) attributes. Examples of such graphs include social networks with user interests as attributes, transaction networks with time, location, and amount as attributes, cargo shipments with visited ports, financial information, and type of transported goods as attributes, and so on [footnote 4].

This category of anomaly detection methods on attributed graphs exploits the structure as well as the coherence of attributes of the graph to find patterns and spot anomalies. These methods can also be grouped into two: structure-based and community-based methods. In a nutshell, the structure-based methods exploit frequent substructure and subgraph patterns to spot deformations in these patterns, while community-based methods aim to spot what are called community outliers, which do not exhibit the same characteristics as the others in the same community.

2.2.1 Structure based methods

Main idea: Structure based approaches mainly aim to identify substructures in the graph that are rare structurally, i.e. connectivity-wise, as well as attribute-wise. As such, the inverse of frequent attributed subgraphs is sought. The differences from these normative substructures are quantified in various ways, as we describe below.

Approaches: One of the earliest works on attributed graph anomaly detection by [Noble and Cook, 2003] addresses two related problems: (P1) the problem of finding unusual substructures in a given graph, and (P2) the problem of finding the unusual subgraphs among a given set of subgraphs, in which nodes and edges contain (non-unique) attributes. The main insight for solving these problems is to look for structures that occur infrequently, which are roughly opposite to what are called the “best substructures”.
Intuitively, best substructures are those that occur frequently in the graph and thus can compress the graph well. An information-theoretic formulation based on the Minimum Description Length (MDL) principle [Rissanen, 1999] that trades off between compression quality and the size of such substructures (as the entire graph is the best compressor) is devised as an objective.

[Footnote 4: We will use the words ‘attribute’ and ‘feature’ interchangeably throughout the text.]

The main idea for detecting unusual substructures (P1) is to define a measure that is inversely related to the MDL-based measure defined for the best substructures and rank substructures by this new measure. Similarly, the main idea for finding the unusual subgraphs (P2) is to define a measure that penalizes those subgraphs containing few common (i.e. best) substructures, making them more anomalous.

The methods by [Noble and Cook, 2003] essentially build on frequent subgraphs with categorical attributes. On the other hand, datasets most often come with a mix of both numerical and categorical attributes, e.g. dollar amounts in transaction data and number of (e.g., Ping, SYN, etc.) requests in network log data. Treating each numerical value as a distinct attribute loses ordering and closeness information. To address this problem, [Davis et al., 2011] proposed discretizing the numerical attributes, where the majority “normal” values are assigned the same single categorical attribute, and all other values are assigned their “outlierness” score. Several discretization mechanisms, e.g. based on fitting probability density functions, k-NNs, outlier detection (in particular LOF [Breunig et al., 2000]), and clustering (CbLOF [He et al., 2003]), have been studied. We also include other discretization techniques that could apply under this setting such as SAX [Lin et al., 2003], MDL-binning [Kontkanen and Myllymäki, 2007], and minimum entropy discretization [Fayyad and Irani, 1993].
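To make the discretization idea concrete, the sketch below maps the majority of values to a single categorical label and replaces the remaining values by an outlierness score. The outlierness measure used here, a robust score based on the median and the median absolute deviation (MAD), is a stand-in chosen for brevity and is not one of the mechanisms studied in the cited work; the threshold, the label "normal", and the function name are likewise assumptions of this example.

```python
import statistics
from typing import List, Union


def discretize(values: List[float],
               threshold: float = 3.0) -> List[Union[str, float]]:
    """Map typical values to 'normal' and atypical ones to their outlierness score."""
    med = statistics.median(values)
    mad = statistics.median([abs(x - med) for x in values])
    scale = mad if mad > 0 else 1.0  # assumption: fall back to unit scale if MAD is 0
    out: List[Union[str, float]] = []
    for x in values:
        score = abs(x - med) / scale  # robust distance from the bulk of the data
        out.append("normal" if score <= threshold else round(score, 1))
    return out


# Hypothetical dollar amounts on the edges of a transaction graph.
amounts = [10, 12, 11, 9, 10, 11, 500]
print(discretize(amounts))
```

After this step, all typical amounts share one categorical attribute value, so frequent-substructure methods designed for categorical attributes can treat them as identical while the unusual amount remains distinguishable.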
Later work by [Eberle and Holder, 2007] follows a different insight than the previous work when looking for anomalies. Rather than focusing on infrequent substructures, they go after those substructures that are very similar to, though not the same as, a normative (i.e. best) substructure. A statement by the United Nations Office on Drugs and Crime corroborates this insight: “The more successful money-laundering apparatus is in imitating the patterns and behavior of legitimate transactions, the less the likelihood of it being exposed.”

Using the insight that an intruder would make at most a certain number of changes to blend in with the normal data instances and lower their chances of being detected glaringly, the work by [Eberle and Holder, 2007] formulates three types of anomalous cases based on modification, insertion, and deletion. They formulate various anomaly scores that use both (in)frequency and modification cost (the lower, the more anomalous). We note that the anomalies are assumed to consist of only one type of anomaly, which makes the approach prone to missing, e.g., a deletion followed by a modification.

Along similar lines, [Liu et al., 2005] use subgraphs of attributed graphs for detecting non-crashing software bugs. In this type of application domain, every execution of a software program is represented as an attributed graph called a behavior graph, where nodes denote functions (attributed with function names), and (directed) edges depict function calls or function transitions. Different from the methods discussed so far, the idea here is to train a classification model that takes as input positive and negative behavior graphs for correct and incorrect executions, respectively. First, (closed) frequent subgraphs are extracted from a set of behavior graphs, which are then used as features in training a classification model.

The pattern-based (e.g.
frequent substructures) anomaly detection techniques described above are interpretable and amenable to post-analysis by domain experts to reveal the root cause. Moreover, these methods are defined quite generally, such that they can be applied to various types of data and scenarios where the data can be represented as attributed (sub)graphs (like the software execution flow-graphs). On the other hand, this generality comes at the cost of high false positive rates, as not all rare occurrences can be attributed to anomalous cases. Furthermore, several user-specified thresholds, such as the amount-of-alteration threshold or the subgraph frequency threshold, make it hard for the user to trade off false positive and false negative rates.

2.2.2 Community based methods

Main idea: These approaches aim to identify those nodes in a graph, often called the community outliers, the attribute values of which deviate significantly from the other members of the specific communities that they belong to. For example, a smoker in a community consisting largely of non-smoking baseball players is a community outlier. As such, communities are analyzed based on both link and attribute similarities of the nodes they consist of. While some methods aim to detect outliers simultaneously with detecting the communities in the graph, some perform the outlier detection as a second step after performing the attributed graph clustering.

Approaches: [Gao et al., 2010a] differentiates graph-based community outlier detection from three closely related problems; namely, global outlier detection that only considers node attributes, structural outlier detection that only considers links (e.g. [Xu et al., 2007], as discussed in the previous section), and local outlier detection that only considers attribute values of direct neighbors.
While interesting in their own right, these three types of methods are prone to missing outliers that emerge only from the combination of these, namely outliers with respect to the attributes of other community members. [Gao et al., 2010a] develop a unified probabilistic model that simultaneously finds communities and spots community outliers. The unsupervised learning algorithm called CODA alternates between the two steps of parameter estimation (fixed cluster assignment), and inference for cluster assignments (fixed parameters). As is typical of such learning algorithms, a good initialization of clusters at the beginning is a crucial step for the algorithm to reach a good solution. Moreover, the convergence of the algorithm is not guaranteed. One way to find a good initialization is to employ a graph clustering algorithm to find a first-cut, good quality clustering based on only the link structure, which also helps with faster convergence.

Recently, [Müller et al., 2013] developed a node outlier ranking technique in attributed graphs called GOUTRANK. Different from [Gao et al., 2010a], their main insight into community outlier detection is the fact that the complex anomalies could be revealed in only a subset of relevant attributes (a.k.a. subspaces). This becomes more apparent especially in high dimensional feature spaces due to the curse of dimensionality [Beyer et al., 1999]. Roughly speaking, all objects appear to be sparse and dissimilar in high dimensions, or in other words, all the distances between pairs of objects look similar, causing all the objects to be equally (dis)similar to one another. In their work, they also consider quantifying the degree of deviation for each node outlier, which goes beyond binary detection. As such, they address two main challenges associated with community outlier detection in attributed graphs: the selection of subgraphs and subspaces, and the scoring of nodes in multiple subspace clusters.
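The notion of a community outlier can be illustrated with a deliberately simple two-step sketch: communities are assumed to be given, and each node is scored by how far its attribute value lies from the values of its fellow community members. This is neither CODA nor GOUTRANK; the single numeric attribute, the per-community standard score, and all names are assumptions made for this example.

```python
import statistics
from collections import defaultdict
from typing import Dict, Hashable


def community_outlier_scores(community_of: Dict[Hashable, int],
                             attribute: Dict[Hashable, float]
                             ) -> Dict[Hashable, float]:
    """Score each node by its attribute deviation within its own community."""
    members = defaultdict(list)
    for node, community in community_of.items():
        members[community].append(node)

    scores: Dict[Hashable, float] = {}
    for nodes in members.values():
        values = [attribute[n] for n in nodes]
        mean = statistics.mean(values)
        std = statistics.pstdev(values)  # population standard deviation
        for n in nodes:
            # A community with identical values has no outliers by this measure.
            scores[n] = abs(attribute[n] - mean) / std if std > 0 else 0.0
    return scores


# Community 0: nine non-smokers and one smoker; community 1: three smokers.
community_of = {f"p{i}": 0 for i in range(9)}
community_of.update({"smoker": 0, "s1": 1, "s2": 1, "s3": 1})
smokes = {n: 0.0 for n in community_of}
smokes.update({"smoker": 1.0, "s1": 1.0, "s2": 1.0, "s3": 1.0})

scores = community_outlier_scores(community_of, smokes)
print(max(scores, key=scores.get))  # node that deviates most from its community
```

The same attribute value (being a smoker) is unremarkable in the second community but stands out in the first, which is what distinguishes a community outlier from a global outlier that is judged against all nodes at once.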
Other related work mainly addresses the problem of attributed graph clustering without focusing on outlier detection, including [Akoglu et al., 2012b, Boden et al., 2012a,b, Günnemann et al., 2010, 2012]. These methods could form the basis for community outlier detection in a post-processing step, as opposed to integrated clustering and outlier detection in one algorithm as with the techniques discussed above. During post-processing, nodes that could not be assigned to a “large enough” community (e.g., singletons or micro-clusters) could be analyzed further, or nodes whose removal from a community increases a “fitness” score of that community can be flagged as abnormal.

2.2.3 Relational learning based methods

Main idea: This group consists of network-based collective classification algorithms, the main idea of which is to exploit the relationships between the objects to assign them into classes, where the number of classes is often two: anomalous and normal. Different from proximity-based approaches, which aim to quantify auto-correlations among graph objects, these algorithms are often more complex and thus can model and exploit more complex correlations between the graph objects.

Approaches: Classification is the problem of assigning class labels to, or in short labeling, data instances based on their observed attributes. Anomaly detection can be formulated as a classification problem when representative labeled data is available. For example, determining whether a Web page is spam or non-spam based upon the words that appear in it, and the identification of benign/malicious web pages, fraudulent/legitimate transactions, etc. can all be thought of as two-way classification problems. When the labeled data size is reasonably large, one can employ fully supervised classification, where the labeled data is used for model learning. When labeled data is scarce, but still available, one can employ semi-supervised classification, where the learning is done by simultaneously using labeled and unlabeled data.

In traditional statistical machine learning approaches, the instances are often assumed to be independent and identically distributed (i.i.d.), and the learning algorithms often ignore the dependencies among data instances. Relational classification, on the other hand, is the task of inferring the class labels of a network of objects simultaneously or collectively. The underlying assumption in relational classification is that the relationships between objects, such as the link between two Web pages, carry important information for classifying the objects. In many cases, there is a simple auto-correlation between the objects, where the linked objects are likely to have the same labels (e.g. spam pages link to other spam pages, infected people are linked to other infected people). In other cases, more complex correlations may be exhibited (e.g. fraudsters trade with honest people and not with other fraudsters).

There exists a large body of research on relational classification methods [Friedman et al., 1999, Jensen et al., 2004, Lu and Getoor, 2003, Macskassy and Provost, 2003, Neville and Jensen, 2000, 2003, Neville et al., 2003, Taskar et al., 2002]. Generally, these methods exploit one or more of the following inputs for a given node:

1. the class labels of the node’s neighbors,
2. the node attributes (features), and
3. the attributes of the node’s neighbors.

We note that although some methods described in this section are amenable to using only the first type of information, i.e. nodes’ class labels, and need not exploit node attributes, most methods are easily generalizable to incorporate node attribute information, if available.
Thus, we cover these methods in this section, which is devoted to anomaly detection in attributed graphs, and remark that some methods do apply to plain graphs as well.

Relational classification methods can be categorized into local and global methods [Sen et al., 2008]. The local algorithms build local predictive models for the class of a node in the network and use (often iterative) inference procedures to collectively classify the unlabeled objects. The second group of algorithms define a global formulation of class dependencies and use inference algorithms to solve for the assignments that would maximize the joint probability distribution.

The techniques for the local methods can differ in both the local models and the inference methods that they use. [Chakrabarti, 2007] use Naive Bayes models for the local attributes of the object and the class labels of the neighbor objects. They then use mean field relaxation labeling for the inference. [Neville and Jensen, 2000] also use a Naive Bayes model for the attributes, but they use an iterative classification algorithm (ICA) for inference. In later work, they investigate the use of relational dependency networks (RDNs), where the inference algorithm is based on Gibbs sampling [Neville and Jensen, 2003]. [Lu and Getoor, 2003] use logistic regression as a local model and ICA for inference, but they explore various ways of aggregation that can be used for the class labels of the related objects. For sparsely labeled networks, [Gallagher et al., 2008] propose ways to infer “ghost” edges based on graph closeness to improve classification performance.

As for the global methods, [Friedman et al., 1999] use probabilistic relational models (PRMs) as a (full joint) model and then use Loopy Belief Propagation (LBP) [Yedidia et al., 2003] for the inference. [Taskar et al., 2002] use relational Markov networks (RMNs) as a (full joint) model and also use LBP for inference.
[Macskassy and Provost, 2003] propose a simple baseline algorithm called the (probabilistic) weighted-vote relational network (wv-RN) classifier, which uses only the class labels of objects for classification; the class label of an object is inferred by iteratively taking a weighted average of the (potentially inferred) class labels of the related objects. Other global formulations are based on Markov logic networks (MLNs).

All in all, the relational inference algorithms mentioned above can be listed as

– Iterative Classification Algorithm (ICA)
– Gibbs Sampling
– Loopy Belief Propagation
– Weighted-Vote Relational Network Classifier

All these algorithms are fast, iterative, approximate inference algorithms, since exact inference is known to be NP-hard in arbitrary networks [Cooper, 1990]. Moreover, convergence is not guaranteed for any of them. Node ordering for updates (e.g., random, diversity-based) may alter the classification results. For local methods, additional challenges include feature construction and local classification. For feature construction, one has to decide whether to consider in-neighbors, out-neighbors, or both, which aggregation method to use for neighbor labels (e.g., max, mode, count), and which neighbors to consider (e.g., all, top-k most confidently labeled). With respect to local classification, one requires training data, and has to choose the classifier type (e.g., Naive Bayes, logistic regression, k-NN, SVM).

With respect to scalability, these methods mostly rely on message passing or information aggregation over neighbors and thus scale linearly with the number of edges in the graph. Recently, techniques to speed up inference for massive graphs, especially based on LBP, have been proposed by [Kang et al., 2011a, Koutra et al., 2011].
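The weighted-vote idea, in which an unlabeled node repeatedly takes the weighted average of its neighbors' current class estimates, can be sketched in a few lines. This is a minimal sketch under stated assumptions rather than the reference implementation of wv-RN: the two-class setting with scores interpreted as the probability of being anomalous, the initial value of 0.5 for unlabeled nodes, the synchronous updates, and the fixed iteration count are all choices made for this example.

```python
from typing import Dict, Hashable

WeightedGraph = Dict[Hashable, Dict[Hashable, float]]  # node -> {neighbor: weight}


def weighted_vote(graph: WeightedGraph, known: Dict[Hashable, float],
                  iters: int = 50) -> Dict[Hashable, float]:
    """Estimate P(anomalous) for every node from the labels of its neighbors."""
    # Labeled nodes keep their label; unlabeled nodes start undecided.
    prob = {v: known.get(v, 0.5) for v in graph}
    for _ in range(iters):
        nxt = dict(prob)
        for v, nbrs in graph.items():
            if v in known or not nbrs:
                continue  # labels are clamped; isolated nodes stay undecided
            total = sum(nbrs.values())
            # Weighted average of the neighbors' current estimates.
            nxt[v] = sum(w * prob[u] for u, w in nbrs.items()) / total
        prob = nxt  # synchronous update of all unlabeled nodes
    return prob


# Path graph: spam - x - y - ham, with unit edge weights.
toy = {
    "spam": {"x": 1.0},
    "x": {"spam": 1.0, "y": 1.0},
    "y": {"x": 1.0, "ham": 1.0},
    "ham": {"y": 1.0},
}
estimates = weighted_vote(toy, known={"spam": 1.0, "ham": 0.0})
print({v: round(p, 3) for v, p in estimates.items()})
```

Because the update only averages neighbor estimates, the sketch encodes the simple auto-correlation assumption that linked objects tend to share labels; it would not capture the more complex correlations (e.g. fraudsters trading mostly with honest users) that the richer models above are designed for.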
Before concluding this section on graph anomaly detection techniques in static graphs, we provide a summary and qualitative comparison of the detection algorithms presented in this section in Table 2.

Table 2 Qualitative and quantitative comparison of anomaly detection algorithms for static graphs. The first four columns refer to the type of graphs that an algorithm can be applied to (with or without weights on the edges, with or without attributes for the nodes); “Linear” holds true for those methods that have time complexity linear in the number of edges of the input graph (and more otherwise); “Parameter-free” methods correspond to those that do not expect any user-specified input parameters; “Output format” corresponds to the output type/format of the method (e.g. anomaly scores and their ranges, binary output/classification e.g. anomalous or not); and “Visualization” refers to the graphical means used (if any) to present the anomalous instances to the user (e.g., distribution plots, graph with the anomalous nodes/edges annotated).
| Algorithm | Weighted | Unweighted | Attributed | Plain | Linear | Parameter-free | Output format | Visualization |
|---|---|---|---|---|---|---|---|---|
| ODDBALL [Akoglu et al., 2010], [Henderson et al., 2011] | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | [0, ∞] node anomaly scores | pairwise feature scatter plots, egonets |
| [Sun et al., 2005] | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | [0, 1] node normality scores | score distribution |
| AUTOPART [Chakrabarti, 2004] | ✗ | ✓ | ✗ | ✓ | ✓ | ✓ | binary edge classification | adjacency matrix organized by node clusters |
| NNRMF [Tong and Lin, 2011] | ✓ | ✓ | ✗ | ✓ | ✓ | ✓ | binary edge/node classification | residual matrix |
| [Ding et al., 2012] | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | binary node classification | egonets |
| SCAN [Xu et al., 2007] | ✗ | ✓ | ✗ | ✓ | ✓ | ✗ | binary node classification | clustering with hub & outlier nodes |
| GSKELETONCLU [Sun et al., 2010] | ✓ | ✓ | ✗ | ✓ | ✗ | ✓ | binary node classification | clustering with hub & outlier nodes |
| SUBDUE [Noble and Cook, 2003] | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | substructure anomaly score ∈ ℕ | graph substructures |
| SUBDUE [Noble and Cook, 2003] | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | [0, 1] subgraph anomaly score | subgraphs |
| SUBDUE [Eberle and Holder, 2007] | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | [0, ∞] subgraph anomaly score | modified subgraphs |
| [Liu et al., 2005] | ✗ | ✓ | ✓ | ✗ | ✗ | ✗ | binary graph classification | graphs with traced-back crashing points |
| CODA [Gao et al., 2010a] | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | binary node classification | graph clustering with community outlier nodes |
| GOUTRANK [Müller et al., 2013] | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | [0, ∞] node anomaly scores | subspace clustering and outlier nodes |
3 Anomaly Detection in Dynamic Graphs

3.1 Overview: Event detection in time series of data points

In the literature, there is an abundance of work on event detection in data series: statistical quality control [Montgomery, 1997]; the famous auto-regressive moving average model used for predictions [Box and Jenkins, 1990]; a drift detection method [Gama et al., 2004]; a chart-based approach for monitoring temporal, medical data [Grigg et al., 2003]; change detection in categorical data [Bay and Pazzani, 1999]; StreamKrimp, an MDL-based algorithm [Leeuwen and Siebes, 2008]; and detection of disease outbreaks [Wong et al., 2005]. A nice tutorial that covers event detection in data series is [Neill and Wong, 2009], and a survey on outlier detection for temporal data is [Gupta et al., 2013].

3.2 Event detection in time series of graph data

This section provides an overview of the anomaly detection algorithms that have been proposed for dynamic or time-evolving graphs (i.e. sequences of static graphs), whose evolution and communities have been studied by several research groups [Backstrom et al., 2006, Leskovec et al., 2005]. In addition, [Aggarwal and Subbian, 2014] provides a comprehensive survey on evolutionary network analysis. The anomaly detection problem for dynamic graphs, which is the main focus of our survey, is also known as temporal anomalous pattern detection, event detection, or change-point detection, and is commonly defined as follows:

Definition 4 (Dynamic-Graph Anomaly Detection Problem) Given a sequence of (plain or attributed) graphs, Find (i) the timestamps that correspond to a change or event, as well as (ii) the top-k nodes, edges, or parts of the graphs that contribute most to the change (attribution).

Depending on the application domain, the requirements of the algorithms vary, but among the most commonly desired properties are:

– Scalability.
As instructed by the size and volume of the graphs that are produced daily , ideally , the algorithms should be linear or sub-linear on the size of the input graphs. In the dynamic setting, an additional, desired property is that the algorithm should be linear on the size of the update of the input graphs. – Sensitivity to structural and contextual chang es . The anomaly detection methods should be able to discern structural differences between the input graphs under comparison (e.g., missing/new edges, missing/ne w nodes, changes in the weights Graph-based Anomaly Detection and Description: A Survey 23 of the edges), as well as changes in other properties of the graphs, such as labels of the nodes or edges. – Importance-of-change awareness . The algorithms should be sensible to the type and extent of change. Changes in “important” nodes, edges or other graph at- tributes should result in greater anomaly scores, than changes in less important structures. A brief overvie w of the anomaly detection algorithms for time-ev olving graphs is giv en in [Bilgin and Y ener, 2008]. Howe ver , the ab undance of time-ev olving graphs in the recent years has led to increasing interest in them, and subsequently new re- search has been carried out in this area. In the following subsections, we classify the dynamic graph anomaly detection algorithms based on the type of “graph summary” or “footprint” they use, and the type of events they detect: (i) feature-based (e.g. nodes, edges, edge weights), (ii) decomposition-based, (iii) community or clustering- based, and (iv) windo w-based. 3.2.1 F eatur e-based Events Main idea: The key idea behind the feature-based methods is that similar graphs prob- ably share certain properties, such as de gree distrib ution, diameter , eigen values[Kang et al., 2011b] [W atts, 1999]. 
The general approach to detecting anomalous timestamps in the evolution of dynamic graphs can be summarized in the following steps:
– Extract a “good summary” from each snapshot of the input graph.
– Compare consecutive graphs using a distance (or, equivalently, similarity) function. A nice survey on similarity measures is given in [Cha, 2007].
– When the distance is greater than a manually or automatically defined threshold (or, conversely, the similarity is smaller than a threshold), characterize the corresponding snapshot as anomalous.

When it comes to comparing consecutive graphs, there is no definite answer about which graph features one should compare across the various timestamps. The novelty of each proposed algorithm lies in the “graph summary” it constructs, the distance/similarity function it uses, and the way it defines and chooses the threshold to flag an instance as an anomaly. The majority of feature-extraction-based algorithms derive just a similarity score between two input graphs, without doing attribution; in other words, the algorithms usually cannot detect the nodes or regions of the graphs that changed most.

Approaches: [Shoubridge et al., 2002] and [Bunke et al., 2006b] propose several “graph footprints” and metrics for monitoring communication networks:
• Maximum Common Subgraph (MCS) distance of the adjacency or the “2-hop” matrices (i.e., the square of the adjacency matrix),
• error correcting graph matching distance [Shoubridge et al., 1999], which refers to the number of edit operations needed to convert one graph to another, where the cost of each operation may vary,
• Graph Edit Distance (GED), which is a simplification of the previous distance, where only topological changes are allowed (i.e., no changes in edge weights),
• Hamming distance for the adjacency matrices of the graphs, which essentially counts the number of different entries in the matrices,
• variations of edge-weight distances,
• λ-distance of the adjacency, the “2-hop”, or Laplacian matrices, which is defined as the difference between the whole graph spectra, or between the top-k eigenvalues, of the respective matrices.

[Peabody, 2003] also proposes the λ-distance of the Normalized Laplacian matrices. At this point, it is worth mentioning that although we consider λ-distance a graph-feature-based anomaly detection technique, it can also be classified as a decomposition-based technique, since the eigenvalues of a matrix are extracted by decomposing it (SVD [Golub and Van Loan, 1996], PCA [Pearson, 1901], LSI [Deerwester et al., 1990], CUR [Drineas et al., 2006]).

[Shoubridge et al., 2002] and [Bunke et al., 2006b] use the metrics for tracking sudden changes in communication networks for performance monitoring. The best approaches, in terms of change awareness, are GED and MCS, both of which are NP-complete; the former, however, can be simplified for a given application, in which case it becomes linear in the number of nodes and edges of the graphs. In [Shoubridge et al., 2002], the graph symmetric difference and the difference in the vertex neighborhood subgraphs are proposed for change attribution. The authors in [Bunke et al., 2006b] also go beyond simple features, such as nodes, edges, and weights, and introduce more complex graph distance functions; the modality distance is defined as the Euclidean distance between the Perron vectors of the input graphs. Moreover, the authors propose the median graph distance; the median graph was first introduced by [Dickinson et al., 2002], and it is the graph that minimizes the sum of the edit distances to all the graphs in the sequence.
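The three-step recipe above, combined with two of the simpler distances from the list, can be sketched as follows. This is a minimal illustration, not the implementation of any cited paper: it assumes uniquely labeled nodes (so the node correspondence is known), undirected edges, and unit edit costs, and the function names, the toy snapshots, and the threshold value are all hypothetical.

```python
def hamming_distance(adj_a, adj_b):
    """Count entries that differ between two equally sized adjacency matrices."""
    return sum(x != y
               for row_a, row_b in zip(adj_a, adj_b)
               for x, y in zip(row_a, row_b))


def topological_ged(nodes_a, edges_a, nodes_b, edges_b):
    """Simplified GED (assumption: unit-cost node/edge insertions and deletions
    only, nodes matched by label), i.e. the size of the symmetric differences."""
    va, vb = set(nodes_a), set(nodes_b)
    ea = {frozenset(e) for e in edges_a}  # undirected edges
    eb = {frozenset(e) for e in edges_b}
    return len(va ^ vb) + len(ea ^ eb)


def flag_changes(snapshots, threshold):
    """Return the indices t whose distance to snapshot t-1 exceeds the threshold."""
    flagged = []
    for t in range(1, len(snapshots)):
        (n_prev, e_prev), (n_cur, e_cur) = snapshots[t - 1], snapshots[t]
        if topological_ged(n_prev, e_prev, n_cur, e_cur) > threshold:
            flagged.append(t)
    return flagged


# Toy sequence of (nodes, edges) snapshots; the last one rewires the graph.
snapshots = [
    ({1, 2, 3}, [(1, 2), (2, 3)]),
    ({1, 2, 3}, [(1, 2), (2, 3), (1, 3)]),
    ({1, 2, 3, 4, 5}, [(4, 5)]),
]
print(flag_changes(snapshots, threshold=3))
```

With set-based symmetric differences, the simplified GED runs in time linear in the number of nodes and edges, which matches the simplification mentioned in the text. Because the score is a single number per pair of snapshots, it does not by itself say which nodes or edges changed.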
Two variations of GED with simple and non-linear cost functions for the allowed operations, which also accommodate the weights of the input graphs, are given in [Kapsabelis et al., 2007] and are used for accurate monitoring of dynamic computer networks. More details about the graph edit distance can be found in the survey [Gao et al., 2010b].

In [Bunke et al., 2006a], the authors compute not only the distances between consecutive graph instances, but all the pairwise distances (GED), and then apply an offline multidimensional scaling (MDS) procedure; each graph is represented by a point in the 2-d plane, and the distances between the points reflect their structural distances. This way the authors provide a nice graphical representation of the changes that occur in a time-evolving graph; points that deviate from the mass of points correspond to anomalous timestamps or events.

[Gaston et al., 2006] detect abnormal changes in time-evolving communication graphs using the diameter distance, i.e., the difference in the graph diameter, which is defined as the greatest of the longest shortest paths over all vertices.

One of the early works in this category was conducted by [Pincombe, 2005]. The main idea of this work is to extract a single feature from each graph instance, and then, by using an appropriate metric, compare this feature in consecutive time ticks. Next, the resulting time series of the feature distances is modeled as an auto-regressive moving average (ARMA) process [Box and Jenkins, 1990], and the residuals (deviations from the model) are evaluated. The instances whose residuals exceed a threshold are considered anomalous. Briefly, ARMA is a model for describing time series by using two polynomials (the first for auto-regression, the second for moving average); it is widely used for predicting values in time series.
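The residual-thresholding idea can be illustrated with a deliberately simplified sketch. As an assumption made purely for brevity, a first-order auto-regressive model AR(1), fitted by least squares, stands in for the full ARMA model of [Pincombe, 2005]; the function names, the toy series, and the threshold factor `k` are hypothetical.

```python
from statistics import mean, pstdev


def ar1_fit(series):
    """Least-squares fit of x_t = c + phi * x_{t-1}; returns (c, phi)."""
    x_prev, x_next = series[:-1], series[1:]
    m_prev, m_next = mean(x_prev), mean(x_next)
    var = sum((p - m_prev) ** 2 for p in x_prev)
    cov = sum((p - m_prev) * (n - m_next) for p, n in zip(x_prev, x_next))
    phi = cov / var if var else 0.0
    return m_next - phi * m_prev, phi


def flag_residual_outliers(series, k=2.0):
    """Flag time ticks whose residual exceeds k standard deviations of all residuals."""
    c, phi = ar1_fit(series)
    residuals = [x - (c + phi * p) for p, x in zip(series[:-1], series[1:])]
    sigma = pstdev(residuals)
    # residuals[i] belongs to time tick i + 1 (the first tick has no prediction)
    return [i + 1 for i, r in enumerate(residuals) if sigma and abs(r) > k * sigma]


# Toy time series of consecutive-graph distances with one burst at tick 6.
distances = [1.0, 1.1, 0.9, 1.0, 1.05, 0.95, 6.0, 1.0, 1.1, 0.9, 1.0]
print(flag_residual_outliers(distances))
```

The model here is fitted on the full series, including the burst, which inflates the residual spread; an online variant would fit on past ticks only. The threshold is expressed in units of the residual standard deviation, which is one of several plausible choices.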
Among the 10 metrics that Pincombe used (weight, maximum common subgraph (MCS) weight/edge/vertex, graph/median edit, modality, diameter, entropy, and spectral distance), the MCS edge, MCS vertex, edit, median, and entropy metrics were able to detect the anomalies that were introduced in a time-evolving IP traffic dataset.

Recently, another work that detects anomalies in time series (not graph data) was introduced by [Zhu and Sastry, 2011]. Their approach uses a Generalized Likelihood Ratio (GLR) test based on a Kalman filter for estimating the parameters of an Auto-regressive Integrated Moving Average (ARIMA) model. The main insight remains the same: the detection of anomalies is based on the residuals of the filter, but in this case the residuals are monitored with the GLR test. Since this work is not used on graph data, we do not elaborate more here; however, it appears to be a nice alternative to the approach used in [Pincombe, 2005].

Along the same lines, the authors in [Papadimitriou et al., 2008] introduce five graph similarity functions for directed, time-evolving web graphs: vertex/edge overlap similarity, vertex ranking, vertex/edge vector similarity, sequence similarity, and signature similarity. Among these metrics, the one that performs best in terms of change detection in web graphs is the Signature Similarity (SS), which is based on the SimHash algorithm. This algorithm uses as features the nodes and edges of the input graphs, weighted appropriately by their PageRank.

[Berlingerio et al., 2012] use a graph similarity approach for discontinuity detection in daily instances of social networks. In a nutshell, their method, NETSIMILE, consists of three phases: (i) Feature Extraction. The focus is on local and egonet-based features (e.g., number of neighbors, clustering coefficient, average of neighbors’ degrees); (ii) Feature Aggregation. The node × features matrix of the first phase is converted to a single “signature” vector that consists of the median, mean, standard deviation, skewness, and kurtosis of each extracted feature over all the nodes in the graph; (iii) Comparison. The signature vectors are compared using the Canberra Distance, and a single similarity score is produced for consecutive timestamps of the graph sequence. The days that have a low similarity score with the surrounding days are characterized as anomalous.

Another recent work, [Koutra et al., 2013b], proposes a complex graph-feature-based similarity approach, DELTACON, for discontinuity detection, which enjoys several desired properties. The intuition behind the method is to compare the pairwise node affinities of consecutive snapshots of the graph sequence. These node affinities are computed in this work by a fast variant of Belief Propagation [Koutra et al., 2011]. The matrices of pairwise node affinities are then compared using the Matusita Distance (which is related to the Euclidean Distance), and the distance is finally transformed to a similarity. A faster algorithm that avoids computing all the pairwise similarity scores is also proposed; it is based on the idea of finding the similarity of all the graph nodes to non-overlapping groups of nodes (instead of to each node individually). Once the time series of the consecutive-graph similarities is obtained, Quality Control with Individual Moving Range [Montgomery, 1997] is used to spot the anomalous daily ENRON-graph instances.

In contrast to most of the previous works, which detect anomalous graph instances, the following algorithms spot anomalous nodes in a graph sequence. The key idea in [Akoglu and Faloutsos, 2008] is the following: A node is anomalous at some time frame if its “behavior” deviates from its past “normal behavior”.
The authors build the “behavior” of the nodes by extracting various egonet node features (e.g., weighted and unweighted in- and out-degree, number of neighbors, number of triangles) from each snapshot of the graph sequence, and create a correlation matrix of node “behaviors” at each time window using Pearson’s correlation coefficient. For each correlation matrix (one per time window), the principal eigenvector, which has one entry per node, is computed. By placing all the corresponding entries of the eigenvectors in a vector, the “eigen-behavior” vector of each node is obtained and compared against its typical “eigen-behavior”, which is found by averaging over the past time windows or by SVD. The similarity between the “behaviors” is evaluated using the Euclidean dot product. When the similarity between a node’s “behavior” and its past “behaviors” is low, the corresponding time window is reported as anomalous.

Last but not least, the work of [Rossi et al., 2012] builds on top of ROLX [Henderson et al., 2012], an NMF and MDL-based role extraction algorithm, to develop an algorithm that recursively extracts structural global and node features, and determines the nodes’ roles (e.g., centers of stars, bridge nodes) over time. The authors use the method for understanding and tracking the network dynamics and evolution, but propose comparing the obtained node feature vectors over time in order to detect anomalous patterns. Another similar approach that builds on top of ROLX, DBMM [Rossi et al., 2013], combines feature extraction, matrix decomposition, and a window-based analysis to model the node behavior in temporal graphs, predict future behaviors, and spot anomalies. First, the NMF and MDL-based role extraction algorithm computes the node group memberships. Then, by taking into account k previous time steps, a role transition model per node is generated.
The approach does not detect anomalous graph instances, but anomalous nodes per time step in decreasing order of anomalousness; the anomaly score of each node is defined as the difference between its estimated and true mixed membership.

3.2.2 Decomposition-based Events

Main idea: The decomposition-based approaches detect temporal anomalies by resorting to matrix or tensor decomposition of the time-evolving graphs, and interpreting appropriately selected eigenvectors, eigenvalues, or singular values. The methods can be divided into two categories based on the representation of the graphs: matrices vs. tensors.

Approaches: We will first discuss the matrix-oriented approaches. These include the λ-distance [Bunke et al., 2006b, Peabody, 2003, Shoubridge et al., 2002], and the algorithms proposed in [Akoglu and Faloutsos, 2008] and [Rossi et al., 2013] that were presented in Section 3.2.1. All of these approaches use graph features generated by SVD, eigenvalue decomposition, or NMF, and thus can also be classified as decomposition-based anomaly detection techniques.

An additional work that handles each graph in the sequence separately by its matrix representation is [Idé and Kashima, 2004] (also a window-based approach), which aims at monitoring multi-tier Web-based systems. Conceptually, the method first extracts the principal eigenvector from the adjacency matrix of each graph; this is referred to as the activity vector. Then, by applying SVD on the matrix that consists of the past activity vectors in a time window w, the typical activity vector is found, and the similarity between the current and typical activity vectors is computed as the cosine of the angle between them. The next step of the algorithm is to define the parameters of the von Mises-Fisher probability distribution [Fisher et al., 1993] of the anomaly metric, and the threshold for characterizing a graph as anomalous or normal; the latter is found using an online algorithm. It is worth mentioning that the activity vector, which has one entry per node, enables attribution, i.e., detection of the individual nodes that contributed most to the change in a particular graph instance. Based on [Idé and Kashima, 2004], the authors in [Ishibashi et al., 2010] detect uncommon traffic patterns in communication graphs. The novelty of their approach lies in the way the adjacency matrix of the network is created: instead of encoding the connectivity/communication patterns between the hosts, the cells hold the similarity between them, a property that is computed based on the overlap between their destination hosts.

SVD is not the only tool used by the decomposition-based detection algorithms. On the contrary, in the last decade several improvements on SVD have been proposed, including the CUR matrix approximation [Drineas et al., 2006], the Compact Matrix Decomposition (CMD) [Sun et al., 2008], and Colibri-S [Tong et al., 2008]. A pictorial comparison of the four methods is given in Fig. 3. Given a set of 2-d data points, SVD constructs an optimal subspace using all the data points (full circles); CUR samples data points allowing for duplicates and linear redundancy (full circles), and approximates the original points based on them. CMD improves on CUR by sampling without duplicates, while Colibri-S also guarantees that no linear redundancy exists in the sampled data points. Table 3 provides a qualitative comparison of the four approaches. Although SVD yields the optimal low-rank approximation in both the 2-norm and the Frobenius norm, it is inefficient in terms of time and space. Moreover, the singular vectors do not have an intuitive interpretation since they describe the data in a rotated space, and the SVD of a matrix cannot be readily updated for dynamic or streaming graphs.
CUR and CMD are much more efficient than SVD, and highly interpretable. Finally, Colibri-S is even more efficient in time and space, inherits the previous methods’ interpretability, and additionally provides efficient updates for dynamic graphs.

Fig. 3 Illustration of qualitative differences between matrix decompositions used for anomaly detection in dynamic graphs.

Table 3 Qualitative comparison of matrix decomposition methods: SVD, CUR, CMD, Colibri-S.
Property | SVD | CUR/CMD | Colibri-S
Quality | ✓ | ✓ | ✓
Efficiency | ✗ | ✓ | ✓
Interpretation | ✗ | ✓ | ✓
Dynamic Graphs | ✗ | ✗ | ✓

CMD [Sun et al., 2008] has been applied for anomaly detection in dynamic graphs: the low-rank approximations of the sparse input graphs are used as their summaries. The reconstruction error of each graph from its summary is tracked over time, and, if it changes significantly at some time tick, the corresponding graph is deemed anomalous.

Now we move on to the second category of decomposition-based event detection methods, which use tensors instead of matrices for the representation of the graphs. Streaming Tensor Analysis (STA) [Sun et al., 2006] is applied for anomaly detection to a computer network described by a source-destination-port graph. The authors introduce the tensor data structure, instead of a simple matrix, because they describe the networks with more entities than just source and destination. Similarly to [Sun et al., 2008], the main idea behind the proposed algorithm is to decompose the stream of tensors into projection matrices (one for each mode of the tensor), and incrementally update the latter matrices over time. If the incremental update leads at some point to a high reconstruction error, then the tensor of that time stamp is considered anomalous.

More recently, three more tensor-based approaches were proposed by [Koutra et al., 2012], [Papalexakis et al., 2012], and [Araujo et al., 2014].
The first work simply uses the PARAFAC tensor decomposition; the second develops a fast, sampling-based, parallelizable decomposition algorithm for sparse tensors; the third, COM2, relies on tensor decomposition (PARAFAC) to obtain scores for time-evolving communities, and then applies MDL to find the “important” communities and control their expansion (community size). In all three papers, for temporal anomaly detection, the first two dimensions of the tensors hold the information of the adjacency matrix, additional dimensions are used for attributes or extra entities, and the last dimension corresponds to time. The detection of outlier groups of nodes at specific time stamps consists of observing behavioral patterns in the factors of the decomposition that differ from the “usual” ones (e.g., a sudden increase in the interactions between nodes, or bursty or bot-like behavior).

3.2.3 Community- or Clustering-based Events

Main idea: The main idea of the community- or clustering-based approaches is, instead of monitoring the changes in the whole network, to monitor graph communities or clusters over time and report an event when there is a structural or contextual change in any of them.
Approaches: Being a building block for many applications, clustering, and the related, but not identical, problem of community detection, have been studied thoroughly in the data mining and theory communities: METIS [Karypis and Kumar, 1995], one of the first partitioning algorithms that were developed, followed by its parallel implementation ParMETIS [Karypis and Kumar, 1996]; frequent subgraph mining [Kuramochi and Karypis, 2001]; spectral clustering [Ng et al., 2001, Shi and Malik, 1997]; evolutionary clustering [Chakrabarti et al., 2006]; Newman’s algorithms for community detection in complex systems [Newman and Girvan, 2004], [Newman, 2004], [Newman, 2006]; co-clustering for concurrent clustering of the rows and the columns of the adjacency matrix of a graph [Chakrabarti, 2004, Dhillon et al., 2003], and its distributed variants [Papadimitriou and Sun, 2008]; dynamic community detection algorithms [Tantipathananandh and Berger-Wolf, 2009, 2011, Tantipathananandh et al., 2007]; and an empirical comparison of methods for network community detection [Leskovec et al., 2010].

GRAPHSCOPE [Sun et al., 2007a] is an MDL-based, parameter-free algorithm for discovering node partitions in streaming, directed, bipartite graphs, and monitoring their evolution over time in order to detect events or changes. The partitions consist of “similar” nodes in the sense that splitting a partition leads to a higher encoding cost of the adjacency matrix. The algorithm iteratively searches for the best source and destination partitions in each graph snapshot, until further partitioning does not lead to an additional decrease of the encoding cost. Then, “similar” snapshots are merged into a segment and compressed together; on the other hand, “dissimilar” consecutive snapshots lead to the creation of a new segment and the declaration of a change-point.
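The segment-or-split decision described above can be illustrated with a toy MDL-style sketch. This is not GRAPHSCOPE itself: as simplifying assumptions, the source and destination partitions are fixed and given (the actual algorithm searches for them), each block is encoded by its binary entropy, and a flat, hypothetical `model_bits` overhead stands in for the cost of describing a new segment. All names and values are illustrative.

```python
from math import log2


def entropy_bits(ones, cells):
    """Bits needed to encode `cells` binary entries of which `ones` are 1."""
    if cells == 0 or ones in (0, cells):
        return 0.0
    p = ones / cells
    return -cells * (p * log2(p) + (1 - p) * log2(1 - p))


def segment_cost(snapshots, src_part, dst_part):
    """Data cost of encoding all snapshots with one shared density per block."""
    src_sizes, dst_sizes = {}, {}
    for g in src_part.values():
        src_sizes[g] = src_sizes.get(g, 0) + 1
    for g in dst_part.values():
        dst_sizes[g] = dst_sizes.get(g, 0) + 1
    ones_per_block = {}
    for edges in snapshots:
        for u, v in edges:
            key = (src_part[u], dst_part[v])
            ones_per_block[key] = ones_per_block.get(key, 0) + 1
    # Blocks without any edge cost zero bits, so only non-empty blocks are summed.
    return sum(entropy_bits(ones, src_sizes[s] * dst_sizes[d] * len(snapshots))
               for (s, d), ones in ones_per_block.items())


def change_points(snapshots, src_part, dst_part, model_bits=8.0):
    """Merge a snapshot into the current segment unless a new segment is cheaper."""
    segment, changes = [snapshots[0]], []
    for t in range(1, len(snapshots)):
        g = snapshots[t]
        joint = segment_cost(segment + [g], src_part, dst_part)
        separate = (segment_cost(segment, src_part, dst_part)
                    + segment_cost([g], src_part, dst_part) + model_bits)
        if joint <= separate:
            segment.append(g)
        else:
            segment = [g]
            changes.append(t)
    return changes


# Toy bipartite snapshots (sets of (source, destination) edges) on 4 x 4 nodes.
part = {0: 0, 1: 0, 2: 1, 3: 1}
a = {(0, 0), (0, 1), (1, 0), (1, 1)}
b = {(2, 2), (2, 3), (3, 2), (3, 3)}
print(change_points([a, a, b, b], part, part))
```

In this toy sequence the third snapshot moves all edges to a different block, so pooling it with the earlier snapshots makes both blocks expensive to encode and a new segment is opened. The overhead term matters: without it, encoding snapshots separately is never more expensive than pooling them, so every snapshot would start its own segment.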
A closely related tensor and MDL-based approach is COM2 [Araujo et al., 2014], which tracks “important” communities over time, as described in Sec. 3.2.2. Another approach that also uses node partitioning in order to identify structural anomalies in streaming graphs is GOUTLIER [Aggarwal et al., 2011], where the focus is on undirected, unipartite graphs. A reservoir sampling method is applied to create several node partitions and develop a structural edge generation model per partition, which describes the likelihood fit of an edge. Each edge in the incoming graph is characterized by its composite likelihood fit, which is defined as its median likelihood fit across all node partitions. Then, the graph’s outlier score is represented by the geometric mean of all the composite edge likelihood fits, and the graph is reported as anomalous if its score is t standard deviations below the average outlier score of the graphs seen so far.

A slightly different approach from the ones described above is the Bayesian anomaly detection method presented in [Heard et al., 2010]. The authors focus on detecting anomalous regions in social networks using a two-stage Bayesian approach. At the first step of the method, the anomalousness of each edge is computed by modeling the interactions between each pair of nodes as a counting process. Also, at every graph instance, a p-value, based on the Bayesian learning of the count distributions, is calculated for every existing edge and used in order to decide whether it is anomalous or not. The algorithm treats the graph sequence as a stream; it detects changes in the new graphs based on the history (sequential analysis), but also updates the history in light of the new instance (retrospective analysis). This step bears similarities with the methodology followed in [Aggarwal et al., 2011].
However, the second and last step of the approach in [Heard et al., 2010] is different; it essentially applies clustering techniques on the small subgraph consisting of the anomalous nodes and edges of the first step, so that locally anomalous regions are discovered.

A probabilistic modeling approach to change-point detection proposed in [Peel and Clauset, 2014] uses the generalized hierarchical random graph (GHRG) model to capture the community structure of real-world networks. The GHRG model decomposes the nodes of the graph into a collection of nested groups, the relationships of which are represented by a dendrogram. This representation captures the community structure at all scales. The change-points are identified by significant changes in the parameters of the fitted model through a generalized likelihood ratio test.

Finally, [Gupta et al., 2012] introduce the novel problem of detecting nodes which, over time, behave differently from the rest of the community members; those nodes are called evolutionary community outliers. The approach, ECOUTLIER, consists of two parts: matching the time-evolving communities (which are detected in each graph instance by applying state-of-the-art techniques), and detecting the evolutionary community outliers. To solve the problem, an optimization framework that applies a coordinate descent algorithm is used to match the communities over time by appropriately weighting the contribution of the outlier nodes. It operates on pairs of consecutive timestamps of graphs, and returns a ranked list of community outliers.

3.2.4 Window-based Events

Main idea: The last category of time-evolving graph anomaly detection algorithms encompasses methods that are bound to a time window in order to spot anomalous patterns and behaviors in the input graph sequence.
Essentially, a number of previous instances are used to model the “normal” behavior, and the incoming graph is compared against those in order to characterize it as normal or anomalous.

Approaches: In [Priebe et al., 2005], the authors apply scan statistics (also known as “moving window analysis”) to detect graph snapshots that have unusually high connectivity compared to the past. In general, scan statistics are used for detecting clusters of events in time and space [Glaz, 2007, Kulldorff, 1997, Naus, 1982]. Essentially, a local statistic is computed for each time window, and the maximum statistic within each window is called the scan statistic; if the scan statistic exceeds a threshold, the corresponding time frame is deemed an outlier. In this work, the locality statistic used on the disjoint, weekly snapshots of the ENRON who-emails-whom graph is the number of edges in the k-step neighborhood of each node, where k = 0, 1, 2. This work is followed by a similar, scan-statistics-based approach in [Neil, 2011], where model-based locality statistics are computed in paths and stars, instead of k-step neighborhoods. The method aims at spotting anomalies in computer networks, and the considered shapes are motivated by hacker attacks seen in real networks.

More recently, [Mongiovi et al., 2013] tackled the problem of detecting contiguous regions in graphs that are anomalous over time by relating it to the NP-hard problem of finding the Heaviest Dynamic Subgraph (HDS). For each weighted graph in the input sequence, the anomalousness of each edge is computed as its statistical p-value using the empirical distribution of the edge weights; a lower p-value corresponds to higher anomalousness.
The proposed iterative algorithm, which approximately solves the HDS problem, alternates between the detection of the subgraph that maximizes the anomaly score for a given interval (spatial), and the detection of the time interval that maximizes the score for a given subgraph (temporal). The output of the method is the regions that are more anomalous than a user-defined threshold. An interesting connection is observed between this work and [Heard et al., 2010]; the approach in the latter paper can be used to compute the anomaly score of each edge, and then the algorithm in [Mongiovi et al., 2013] can be applied to detect regions that demonstrate anomalous behaviors.

As mentioned in Sec. 3.2.2, the method described in [Idé and Kashima, 2004] can also be considered window-based, as the current activity of each node is compared against its activity in the past w time ticks. Similarly, [Rossi et al., 2013] belongs to this category as well, since it models the role transitions of the nodes by taking into account the transitions from a number of previous time steps. In addition, the probabilistic graph model fitting approach by [Peel and Clauset, 2014] of Sec. 3.2.3 is also a window-based one, where the generalized likelihood ratio test is applied over a sliding window of fixed length w to detect whether any changes have occurred with respect to the fitted model.

3.3 Discussion

In the previous sections, we reviewed the works in the literature that deal with the problem of graph anomaly detection over time. No matter which type of events is detected, the notion of graph or subgraph/community/cluster similarity usually comes into play at some step of the algorithms. Although the material that follows is not specifically designed for graph anomaly detection, it is closely related to it, as it gives alternative ways of computing the similarity between graphs, or, equivalently, their adjacency matrices.
– Edit distance/graph isomorphism. One approach to graph comparison when the correspondence between the nodes is not known is graph isomorphism. The underlying idea is that two graphs are similar if they are isomorphic [Pelillo, 1999], or one is isomorphic to a subgraph of the other [Ullmann, 1976], [Chartrand et al., 1998], or they have isomorphic subgraphs. The drawback of this approach is that the exact versions of the algorithms are exponential and, thus, not readily applicable to graphs that continuously increase in size and volume. The graph edit distance [Bunke, 1999], which has been mentioned in Sec. 3.2.1, is a generalization of the graph isomorphism problem.
– Iterative methods. The assumption behind the iterative methods is that “two nodes are similar if their neighborhoods are also similar”. In each iteration, the nodes exchange similarity scores, and this process ends when convergence is achieved. Several successful algorithms belong to this category: the similarity flooding algorithm [Melnik et al., 2002] is applied to database schema matching; SimRank [Jeh and Widom, 2002] measures the self-similarity of a graph, i.e., it assesses the similarities between all pairs of nodes in one graph; the algorithm proposed by [Zager and Verghese, 2008] introduces the idea of coupling the similarity scores of nodes and edges in order to compute the similarity between two graphs when the node correspondence is unknown. [Bayati et al., 2013] develop two approximate sparse graph matching algorithms using message passing algorithms, and specifically Belief Propagation. Finally, [Koutra et al., 2013a] design an alternating projected gradient descent algorithm for efficiently aligning big bipartite graphs by exploiting the structural properties of the input graphs.
– Feature Extraction. A number of graph similarity functions, which have been used for graph clustering, classification, and applications other than change-point detection, have been proposed in the literature. The research directions in this category include: algebraic connectivity [Fiedler, 1973], [Wilson and Zhu, 2008], a spectral method that has been studied thoroughly; an SVM-based approach on global feature vectors [Li et al., 2011a]; social networks similarity [Macindoe and Richards, 2010], which is based on graph features that are of value from the social viewpoint; computing edge curvatures under heat kernel embedding [Elghawalby and Hancock, 2008]; comparison of the number of spanning trees [Kelmans, 1976]; fast random walk graph kernels for unlabeled [Kang et al., 2012] or labeled graphs [Kashima et al., 2003]; and graph kernels [Vishwanathan et al., 2010], which are used for computing the similarity between graphs (not nodes). We should note that graph kernels cannot do attribution, i.e., detect the nodes that contribute most to a change in the graph sequence.

As in Section 2, we close this section by comparing the dynamic-graph anomaly detection algorithms qualitatively, as well as quantitatively, in Table 4. Choosing one of the algorithms presented in the previous sections for an anomaly detection application is not an easy task, nor is there a unique appropriate algorithm; among the things that one should consider when choosing an algorithm are: the type of application (e.g., traffic, communication, computer network), the type of data at hand (e.g., weighted, unweighted, attributed), whether the correspondence between the nodes in consecutive graph snapshots is known or not, the time and parameter requirements, as well as the target of the application (detection of an anomalous graph instance, subgraph, or node). Table 4 can help narrow down the algorithms that can be applied in each case.
The reader should bear in mind that, in many cases, applying multiple change-point detection techniques is meaningful, as it contributes to the discovery of different types of anomalies.

Table 4 Qualitative and quantitative comparison of anomaly detection algorithms for dynamic graphs. The first four columns refer to the type of input graphs (with or w/o weights on the edges, with or w/o attributes (or labels) for the nodes); “Linear” holds true for those methods that have time complexity linear in the size of the input graphs (and false otherwise); “Parameter-free” methods correspond to those that do not expect any user-specified input parameters (+: parameter can be set, but is not required); “Output format” corresponds to the output type/format of the method (e.g., anomaly scores and their ranges); “Node corresp.” is true if the algorithm assumes that the correspondence between the nodes of the graph sequence is known; “Attribution” holds true if the algorithm spots nodes/edges/regions of the graph that are anomalous (and false if it detects anomalous graph instances); and “Visualization” refers to the graphical means used, if any, to present the anomalous instances to the user (e.g., distribution plots, graph with the anomalous nodes/edges annotated).

| Algorithm | Unweighted | Weighted | Plain | Attributed | Linear | Parameter-free | Output format | Node corresp. | Attribution | Visualization (plot over time) |
|---|---|---|---|---|---|---|---|---|---|---|
| MCS [Bunke et al., 2006b, Shoubridge et al., 2002] | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | [0, 1] | ✗ | ✗ | consec. graph dist. scores |
| HD [Bunke et al., 2006b, Shoubridge et al., 2002] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | [0, 1] | ✗ | ✗ | consec. graph dist. scores |
| ECGM [Bunke et al., 2006b, Shoubridge et al., 1999, 2002] | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | [0, ∞) | ✗ | ✗ | consec. graph dist. scores |
| GED [Bunke et al., 2006b, Shoubridge et al., 2002] | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | [0, #nodes + #edges] | ✗ | ✓ | spy plot of graph difference |
| λ-distance [Bunke et al., 2006b, Shoubridge et al., 2002] | ✓ | ✓ | ✓ | ✗ | ✗ | ✓+ | [0, ∞) | ✓ | ✗ | consec. graph dist. scores |
| GEDw [Kapsabelis et al., 2007] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | [0, ∞) | ✓ | ✗ | consec. GED scores |
| Diameter Distance [Gaston et al., 2006] | ✓ | ✓ | ✓ | ✗ | ✗ | ✓ | [0, ∞) | ✓ | ✗ | consec. graph diameter distance |
| MDS [Bunke et al., 2006a] | ✓ | ✗ | ✓ | ✗ | ✗ | ✗ | pairwise dist. | ✓ | ✗ | MDS + consec. GED scores |
| VEO [Papadimitriou et al., 2008] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓+ | [0, 1] | ✓ | ✗ | consec. graph sim. scores |
| Vertex Ranking [Papadimitriou et al., 2008] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓+ | [0, 1] | ✓ | ✗ | consec. graph sim. scores |
| Vertex/Edge Vector Sim. [Papadimitriou et al., 2008] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓+ | [0, 1] | ✓ | ✗ | consec. graph sim. scores |
| Sequence Sim. [Papadimitriou et al., 2008] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓+ | [0, 1] | ✓ | ✗ | consec. graph sim. scores |
| Signature Sim. [Papadimitriou et al., 2008] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓+ | [0, 1] | ✓ | ✗ | consec. graph sim. scores |
| [Akoglu and Faloutsos, 2008] | ✗ | ✓ | ✓ | ✗ | ✗ | ✓ | Z-scores | ✓ | ✓ | node Z-scores |
| NetSimile [Berlingerio et al., 2012] | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | [0, 1] | ✗ | ✗ | consec. graph sim. scores |
| DeltaCon [Koutra et al., 2013b] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓+ | [0, 1] | ✓ | ✗ | consec. graph sim. scores |
| Role-Dynamics [Rossi et al., 2012] | ✓ | ✓ | ✓ | ✓ | ✓ | ✓ | role memberships | ✓ | ✓ | role memberships |
| DBMM [Rossi et al., 2013] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | role memberships | ✓ | ✓ | role memberships |
| Eigenspace-based [Idé and Kashima, 2004] | ✓ | ✓ | ✓ | ✗ | ✗ | ✗ | dissim. score [0, 1] | ✓ | ✓ | sim. scores & activ. vector change |
| STA [Sun et al., 2006] | ✓ | ✓ | ✓ | ✓ | ✗ | ✓+ | reconstruction err. | ✓ | ✓ | reconstruction error |
| CMD [Sun et al., 2008] | ✓ | ✓ | ✓ | ✗ | ✗ | ✓+ | reconstruction err. (SSE) | ✓ | ✓ | reconstruction error |
| ParCube [Papalexakis et al., 2012] | ✓ | ✓ | ✓ | ✓ | ✗ | ✓ | factors | ✓ | ✓ | factors over time |
| GraphScope [Sun et al., 2007a] | ✓ | ✗ | ✓ | ✗ | ✗ | ✓ | reordered mat. spy plot | ✓ | ✓ | encoding cost over time |
| Com2 [Araujo et al., 2014] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | tensor decomp. | ✓ | ✓ | tensor decomp. over time |
| GOutlier [Aggarwal et al., 2011] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | likelihood [0, 1] | ✓ | ✓ | likelihood over time |
| Bayesian Approach [Heard et al., 2010] | ✓ | ✓ | ✓ | ✓ | ✗ | ✗ | p-values | ✓ | ✓ | predictive p-value |
| ECOutlier [Gupta et al., 2012] | ✓ | ✓ | ✓ | ✗ | ✓ | ✗ | community memberships | ✓ | ✓ | community memberships |
| Scan Statistics [Priebe et al., 2005] | ✓ | ✗ | ✓ | ✗ | ✓ | ✓ | scan statistics | ✓ | ✓ | scan stat. & vertex scores |
| Scan Statistics [Neil, 2011] | ✓ | ✓ | ✓ | ✗ | ✓ | ✓ | scores of regions | ✓ | ✓ | scan stat. |
| NetSpot [Mongiovi et al., 2013] | ✗ | ✓ | ✓ | ✗ | ✓ | ✗ | scores of regions | ✓ | ✓ | scores of regions |

Concluding Remarks: Static & Dynamic Graph Anomaly Detection

Evaluation. To finalize Sections 2 and 3, we discuss the evaluation methodologies of the graph-based anomaly detection approaches that have been employed in the literature thus far. Recall that ground truth data is often nonexistent in anomaly detection scenarios; thus, various methods in the literature have been evaluated in several different ways, which we describe next.

– Internal evaluation. This kind of evaluation mechanism uses the anomalousness scores of objects assigned by a given method to statistically quantify their extremity, e.g., by computing their p-values under the empirical distribution of scores of all objects. This evaluation is internal, since the scores are dependent on the specific method and can be as diverse as likelihoods, compression costs, distances, etc., and may not necessarily be directly tied to the external purpose of the anomaly detection.
– Qualitative evaluation. Unlike the previous approach, which is quantitative, qualitative evaluation employs informal procedures. One approach is to try to explain the detected anomalies through a story related to a real-world scenario. Another approach is to incorporate domain knowledge to exploit and make sense of the detected anomalies.
This latter methodology is often used in medicine, where the anomalies may help in knowledge discovery and with diagnosis.
– Synthetic graph generation. A commonly used mechanism is synthetic data generation. In graph-based anomaly detection, several methods create realistic graphs using graph generators such as preferential attachment [Barabási and Albert, 1999], Forest Fire [Leskovec et al., 2005], random-typing graphs [Akoglu and Faloutsos, 2009] (power-law graphs), and the Waxman (Internet AS topology graphs) [Medina et al., 2001] models. Often, anomalies of the kind of interest are injected directly into the synthetic graphs. Sometimes the graph structure is also modified by randomly rewiring edges or swapping node attributes. The methods are then evaluated by their precision and recall in recovering the created anomalies. Synthetic graphs also help with evaluating the behavior of the proposed methods, such as their accuracy and scalability under changing graph characteristics such as size and degree distribution.
– Anomaly injection. The injection of synthetic anomalies has been discussed above. This approach is similar, only this time the anomalies are injected into real-world graphs. One challenge in this version of anomaly injection is that evaluation based on precision and recall becomes tricky: it would be too harsh to count detected anomalies other than the injected ones as false positives, given that the original graph may also contain the same type of anomalies.
– Validation by external source. Another evaluation approach relies on multiple information sources that are consistent with each other in identifying the anomalies. In such a setting, one or more sources are used for the actual anomaly detection task. One then tries to validate or justify the detected anomalies using the remaining, unused information sources.
For example, one may use only the graph structure to detect opinion spam and find fake reviewers, and then use their temporal behavioral information, such as the number of reviews written in a day, to see if the detected reviewers also exhibit suspicious behavior.

Summary. Finally, to summarize, we create Table 5, which includes the various methods discussed thus far under different categorization schemes such as static and dynamic, and plain and attributed graphs. Interestingly, we were unable to find any examples of methods that aim to find anomalies in dynamically changing attributed graphs. We foresee that this would require novel definitions of anomalies in such a setting, as well as necessitate the identification of real-world scenarios in which such definitions come alive. Moreover, we notice that methods on static graphs strictly deal with either plain or attributed versions of graphs. It would be interesting to build methods that can work with both, i.e., methods that apply to plain graphs but can also use side (attribute) information if available. We classify those areas of research as open problems in our categorization, and point them out as possible avenues for future exploration.

Table 5 Categorization of graph-based techniques discussed in Sections 2 and 3.

Static, plain [Section 2.1]: AutoPart [Chakrabarti, 2004]; [Sun et al., 2005]; SCAN [Xu et al., 2007]; OddBall [Akoglu et al., 2010]; gSkeletonClu [Sun et al., 2010]; NrMF [Tong and Lin, 2011]; [Ding et al., 2012]; NetRay [Kang et al., 2014]
Static, attributed [Section 2.2]: Subdue [Noble and Cook, 2003]; [Liu et al., 2005]; Subdue [Eberle and Holder, 2007]; CODA [Gao et al., 2010a]; [Davis et al., 2011]; GOutRank [Müller et al., 2013]
Static, plain and attributed: Open
Dynamic, plain [Section 3.2]: [Shoubridge et al., 1999]; [Shoubridge et al., 2002]; [Dickinson et al., 2002]; Eigenspace-based [Idé and Kashima, 2004]; Scan Statistics [Pincombe, 2005]; Scan Stat. [Priebe et al., 2005]; [Bunke et al., 2006b]; MDS [Bunke et al., 2006a]; [Gaston et al., 2006]; GEDw [Kapsabelis et al., 2007]; GraphScope [Sun et al., 2007a]; [Papadimitriou et al., 2008]; [Akoglu and Faloutsos, 2008]; CMD [Sun et al., 2008]; [Ishibashi et al., 2010]; GOutlier [Aggarwal et al., 2011]; ECOutlier [Gupta et al., 2012]; NetSimile [Berlingerio et al., 2012]; DeltaCon [Koutra et al., 2013b]; NetSpot [Mongiovi et al., 2013]; DBMM [Rossi et al., 2013]
Dynamic, attributed [Section 3.2]: STA [Sun et al., 2006]; Bayesian App. [Heard et al., 2010]; Role-Dynamics [Rossi et al., 2012]; TensorSplat [Koutra et al., 2012]*; ParCube [Papalexakis et al., 2012]*; Com2 [Araujo et al., 2014]*; Many Open Challenges
*: not applied in attributed graphs, but it is possible to admit labels/attributes.

4 Graph-based Anomaly Description: Interpretation and Sense-making

As in many other real applications, the ground truth for graph anomalies either does not exist or is very difficult or costly to obtain. Consequently, end analysts often have to spend much post-processing time to validate the detection results. For example, according to a recent DARPA BAA (footnote 5), it is estimated that an intelligence agent can only perform 60 initial reviews on average for the so-called insider threat detection. This, coupled with the fact that many graph anomaly detection algorithms (including insider threat detection) still have a high false positive rate, makes it extremely challenging and time consuming to identify even one true positive in such applications. On the other hand, it is usually much more persuasive for an ordinary user if the detection algorithm can tell not only which instance is abnormal, but also why it looks so different from the majority of normal examples.
To address these issues, graph anomaly attribution has been attracting more and more research attention in recent years. In this section, we will review two main types of techniques. The first group aims to make the detection of each individual instance more ‘interpretable’, which is usually done by encoding the so-called interpretation-friendly properties into the traditional graph anomaly detection algorithms. For this category, we will mainly use the matrix-factorization based graph anomaly approach as an example. The second group tries to answer the following question: given a set of initial suspects (e.g., the top-ranked instances from a graph anomaly detection algorithm), how can we find and characterize the internal relationships among them so that we can better understand the root cause of such anomalies? For this category, we will introduce interactive graph querying and sense-making.

Definition 5 (Graph-based Anomaly Description Problem)
Given a set of anomalies of graph entities (nodes and edges),
– Interpret and explain the detection of the individual anomalies;
– Find and characterize the associations among the anomalies.

4.1 Interpretation-friendly Graph Anomaly Detection

Main idea: Here, we consider the first problem of how to make the detection of each individual instance (e.g., nodes, edges) more interpretable. The main idea is to encode the so-called interpretation-friendly properties into the traditional graph anomaly detection algorithms. We will present the matrix-based graph anomaly detection methods.

Approaches: Suppose we have a bipartite graph (e.g., an author-conference graph), and we represent it by its adjacency matrix A, with the rows being authors, the columns being conferences, and a non-zero element meaning that the corresponding author has published papers in the corresponding conference. In the matrix-based graph anomaly approaches, we start by factorizing the adjacency matrix as $A = XY' + R$.
In this factorization, the two low-rank matrices X and Y usually capture the ‘normality’ of the graphs (e.g., clusters, communities, etc.), while the residual R measures the deviation from such ‘normality’, and thus is often a good indicator of ‘anomaly’.

5 https://www.fbo.gov/utils/view?id=2f6289e99a0c04942bbd89ccf242fb4c

The different matrix-based graph anomaly detection approaches differ in the way they obtain these matrices. SVD/PCA is one of the most popular choices, where the columns of X and Y are the singular vectors (up to scaling by the singular values) of the original matrix A. While it is mathematically optimal in the sense that it minimizes the reconstruction error in both the L2 and the Frobenius norm, it is not necessarily good for interpretation. We give two examples below of how to make such matrices less abstract and therefore more interpretable/consumable for the end analysts.

First, note that the singular vectors are usually linear combinations of all the columns/rows of the original adjacency matrix, which are not always easy to interpret. More recently, the so-called example-based low-rank approximations have started to appear, such as CX/CUR [Drineas et al., 2006], CMD [Sun et al., 2007b], and Colibri [Tong et al., 2008]. All of these methods use the actual columns and rows of the adjacency matrix A to form X and Y. The benefit is that they provide an intuitive as well as sparse representation, since X and Y are directly sampled from the original adjacency matrix. The cost of this kind of decomposition is that the approximation is often sub-optimal compared to SVD. We refer the readers to Section 3.2.2 for the detailed description of these methods. Also see Fig. 3 for a pictorial comparison.

Another interpretation-friendly property that has been widely recognized in recent years is non-negativity, since negative values are usually hard to interpret.
Non-negative matrix factorization (NMF) methods [Lee and Seung, 2000], which restrict the entries in X and Y to be non-negative, have attracted a lot of research attention. By imposing such non-negativity constraints on the factorized matrices, NMF provides a more interpretable way to perform data mining tasks, e.g., clustering, community detection, etc. Note that although NMF has been studied largely in the context of such applications (e.g., clustering), we would expect that it is also beneficial for graph anomaly detection, since it helps improve the interpretation of graph normality.

In the context of graph anomalies, it is often the case that anomalies on graphs correspond to some actual behaviors/activities of certain nodes. For instance, we might flag an IP source as a suspicious port-scanner if it sends packets to a lot of destinations in an IP traffic network [Sun et al., 2007b]; an IP address might be under a DDoS (distributed denial-of-service) attack if it receives packets from many different sources [Sun et al., 2007b]; a person is flagged as ‘extremely multi-disciplinary’ if s/he publishes papers in many remotely related fields in an author-conference network [Akoglu et al., 2010]; in certain collusion-type fraud in financial transaction networks, a group of users always give good ratings to another group of users in order to artificially boost the reputation of the target group [Chau et al., 2006], etc. If we map such behaviors/activities (e.g., ‘sends/receives packets’, ‘publishes papers’, ‘gives good ratings’, etc.) to the language of matrix factorization, it suggests that the corresponding entries in the residual matrix R should be non-negative.
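To make the role of the residual concrete, the following minimal sketch factorizes a toy author-conference matrix as A = XY' + R using a rank-1 truncated SVD (computed by plain power iteration) and then inspects R. The toy matrix, the helper name `top_singular_triplet`, and the choice of rank 1 are illustrative assumptions and are not taken from the cited works. The point is only that an unconstrained SVD residual can be negative on entries that correspond to actual links.

```python
import math

def top_singular_triplet(A, iters=200):
    """Leading singular triplet (sigma, u, v) of A via power iteration on A'A."""
    n_rows, n_cols = len(A), len(A[0])
    v = [1.0 / math.sqrt(n_cols)] * n_cols  # deterministic start vector
    for _ in range(iters):
        Av = [sum(a * x for a, x in zip(row, v)) for row in A]
        w = [sum(A[i][j] * Av[i] for i in range(n_rows)) for j in range(n_cols)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = [sum(a * x for a, x in zip(row, v)) for row in A]
    sigma = math.sqrt(sum(x * x for x in Av))
    u = [x / sigma for x in Av]
    return sigma, u, v

# Hypothetical toy data: rows are authors, columns are conferences.
# Authors 0-2 form one community; author 3 also publishes in an
# otherwise unused conference (column 3).
A = [
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [1, 1, 1, 0],
    [0, 0, 1, 1],
]
n = len(A)

sigma, u, v = top_singular_triplet(A)
X = [[sigma * ui] for ui in u]  # n x 1 factor (left vector scaled by sigma)
Y = [[vj] for vj in v]          # n x 1 factor (right vector)

# Residual R = A - XY'
R = [[A[i][j] - X[i][0] * Y[j][0] for j in range(n)] for i in range(n)]

cells = [(i, j) for i in range(n) for j in range(n)]
largest = max(cells, key=lambda ij: abs(R[ij[0]][ij[1]]))
negative_on_links = [(i, j) for (i, j) in cells if A[i][j] == 1 and R[i][j] < 0]
print("largest residual at", largest)
print("links with negative residual:", negative_on_links)
```

In this toy example the largest residual falls on the unusual (author 3, conference 3) link, while several ordinary links receive small negative residuals, which is the effect that a non-negativity requirement on R is meant to rule out.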
In order to capture such activities, non-negative residual matrix factorization (NrMF) has been proposed in [Tong and Lin, 2011, 2012], which explicitly requires those elements in the residual matrix R to be non-negative if they correspond to actual links in the original graphs. This in turn adds greatly to the ease of interpretation. Fig. 4 presents a visual comparison between NrMF and the standard SVD on four typical graph anomalies [Tong and Lin, 2011, 2012].

Fig. 4 Visual comparison between NrMF and SVD. For each type of graph anomaly, the first column is the adjacency matrix of the original graph, and the second and third columns are the residual matrices by NrMF and by SVD, respectively.

Fig. 5 The main idea of interactive graph querying. Left: a set of detected abnormal nodes (red circles) from a given large graph (black). Right: the desired output, which shows a concise summarization of these abnormal nodes (e.g., how they are further grouped into a few clusters, how the abnormal nodes within each group are linked to each other, etc.).

For feature-based graph anomaly detection, visualization in the (sub)space of the features is also a very natural and powerful tool to improve the interpretation of graph anomalies [Akoglu et al., 2010, Kang et al., 2014]. By making the abnormal graph nodes ‘stand out’ in these low-dimensional plots, the end-user can gain an intuitive understanding of which graph feature(s) make them different from normal.

4.2 Finding the root cause of anomalies: Interactive Graph Querying

Main idea: Next, we consider the second problem of finding and characterizing the internal relationships among the anomalies so that we can better understand the root cause of such anomalies. We will introduce interactive graph querying. The main idea is to find a concise context where detected graph anomalies are linked to each other (see Fig. 5 for an illustration).
Note that while extremely useful in graph anomaly detection, these techniques themselves have much broader applicability.

Approaches: Connection subgraphs is one of the earliest works along this line. A connection subgraph is defined as a small subgraph of a large graph that best captures the relationship from a source node to a target node [Faloutsos et al., 2004]. The original method in [Faloutsos et al., 2004] is based on the so-called delivered current. By interpreting the graph as an electric network, applying +1 voltage to one query node and setting the other query node to 0 voltage, it aims to choose the subgraph which delivers maximum current between the query nodes. [Koren et al., 2006] propose a method based on cycle-free effective conductance for this problem, which considers only the top-k simple (i.e., cycle-free) paths from the source to the target. [Ramakrishnan et al., 2005] further apply the delivered current based method to multi-relational graphs.

Note that all these works deal with pairwise source-target queries. Center-Piece Subgraphs (CePS) [Tong and Faloutsos, 2006] generalizes this by considering the following setting: given Q query nodes in a social network (e.g., a set of top-ranked authors in a co-authorship network), find the node(s) and the resulting subgraph that have strong connections to all or most of the Q query nodes. This provides an intuitive tool to identify the potential root cause of graph anomaly detection results. For example, in the context of law enforcement, given a set of initial suspects, we may want to find other persons who have strong connections to all or most of the existing suspects, and who might be the criminal mastermind. The discovered path(s) in the resulting subgraph also provide an intuitive explanation of how/why the mastermind connects to the individual suspects.
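The electrical-network view behind connection subgraphs can be sketched as follows: hold the source at +1 volt and the target at 0 volts, solve for the remaining voltages (the voltage of each free node is the conductance-weighted average of its neighbors' voltages), and read off the current on every edge with Ohm's law. This is a simplified illustration under stated assumptions: the toy graph, the helper name `edge_currents`, and the use of Gauss-Seidel sweeps are our own choices, and the subgraph-selection step of [Faloutsos et al., 2004] is not reproduced here.

```python
def edge_currents(edges, source, target, sweeps=1000):
    """edges: dict {(u, v): conductance} describing an undirected graph.

    Returns (voltages, currents) with the source held at 1 V and the
    target at 0 V. Currents are keyed by the direction of flow.
    """
    nbrs = {}
    for (u, v), c in edges.items():
        nbrs.setdefault(u, {})[v] = c
        nbrs.setdefault(v, {})[u] = c

    volt = {node: 0.0 for node in nbrs}
    volt[source] = 1.0

    # Gauss-Seidel sweeps: every free node takes the conductance-weighted
    # average of its neighbors' voltages until the values settle.
    for _ in range(sweeps):
        for node in nbrs:
            if node == source or node == target:
                continue
            total = sum(nbrs[node].values())
            volt[node] = sum(c * volt[m] for m, c in nbrs[node].items()) / total

    # Ohm's law: current = conductance * voltage drop, from high to low voltage.
    currents = {}
    for (u, v), c in edges.items():
        flow = c * (volt[u] - volt[v])
        key = (u, v) if flow >= 0 else (v, u)
        currents[key] = abs(flow)
    return volt, currents

# Hypothetical toy graph: a short path s-a-t and a longer path s-b-c-t,
# all edges with unit conductance.
edges = {
    ("s", "a"): 1.0, ("a", "t"): 1.0,
    ("s", "b"): 1.0, ("b", "c"): 1.0, ("c", "t"): 1.0,
}
volt, currents = edge_currents(edges, "s", "t")
print(volt)
print(currents)
```

In this toy graph the two-hop path carries more current than the three-hop path, so a current-based criterion favors the shorter, more direct connection between the two query nodes.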
All the works we have introduced in this subsection so far assume, explicitly or implicitly, some specific connectivity structure among the query nodes. CePS provides a certain degree of flexibility by allowing the so-called k-SoftAnd, where we only require the center-piece nodes to have strong connections to k out of the Q query nodes. But the end users still need to specify the parameter k, which is not necessarily an easy task for applications like graph anomaly detection. To address this issue, Dot2Dot [Akoglu et al., 2013b, Chau et al., 2012] proposes to find the ‘right connections’; that is, given a set of query nodes (e.g., the top-k ranked nodes in graph anomaly detection), it groups them into one or more groups and, within each group, finds simple connections that characterize the relationship within that group. This problem itself is NP-hard, and the authors propose efficient parameter-free algorithms to find approximate solutions. In the example of a top-k ranking list from some graph anomaly detection algorithm, Dot2Dot not only automatically groups the detected anomalies into one or more groups, each of which could correspond to a specific type of anomaly, but also provides some explanation of why they belong to the same group and what the possible root cause is for that group of anomalies. Moreover, if there is a false positive node in the top-k ranking list (e.g., a node which is far away from all the other, true positive, nodes in the top-k ranking list), Dot2Dot can isolate it by automatically treating it as a group by itself.

5 Graph-based Anomaly Detection in Real-world Applications

Next we shift our focus to real-world fraud and spam scenarios. Several different techniques have been developed for fraud and spam detection in many real-world scenarios, including frequent pattern mining [Jindal et al., 2010], behavioral monitoring [Fawcett and Provost, 1999], supervised learning [Phua et al., 2004], and so on. In this section, we will motivate and focus on graph-based detection techniques for real-world applications and particularly highlight their advantages. However, the purpose of our survey is not to suggest the superiority of graph-based techniques over other detection methodologies. Rather, we introduce the available tools, focusing on those that exploit graphs. It is up to the application developers to carefully choose what tools suit their needs best, as different approaches may achieve different performance depending on the application. For a general survey on various fraud detection techniques, we refer the reader to [Bolton and Hand, 2002] and [Phua et al., 2010].

We highlight two main advantages of graph-based fraud detection techniques, as we discussed in Section 1: the relational nature of the problem domain and adversarial robustness. The former intuitively refers to the fact that fraud often occurs in two different ways: (i) by word of mouth, where the acquaintances of a fraudster can be considered more likely to also commit fraud, and (ii) by collaboration, where closely related parties come together to commit fraud. In both scenarios, the relational “closeness” can be exploited with graph-based detection techniques. The latter, robustness to adversaries, relates to the difficulty imposed on the attacker to break the detection method. One can think that the graph-based representation of the domain in which fraud is committed is fully available only to the system administrators. In other words, it is often the case that the fraudsters only have a limited view of the operational graph in which they act.
Therefore, it becomes harder for them to carefully cover their traces so as to “fit in” the global behavioral patterns of this graph.

In this part of the survey, we cover a wide range of applications including telecom fraud [Cortes et al., 2002], auction fraud [Pandit et al., 2007], accounting fraud [McGlohon et al., 2009], securities fraud [Neville et al., 2005], opinion spam [Dai et al., 2012, Wang et al., 2011a], trading fraud [Li et al., 2010], network intrusion [Ding et al., 2012, Idé and Kashima, 2004], and Web spam and malware detection [Becchetti et al., 2006, Benczúr et al., 2005, Castillo et al., 2007, Gyöngyi et al., 2004, Kang et al., 2011a, Krishnan and Raj, 2006, Wu et al., 2006].

5.1 Anomalies in telecommunication networks

While there are many types of telecommunications fraud, one of the most prevalent is known as subscription fraud. In this type of fraud, the fraudster often acquires an account using a false identity with the intention of using the service for free and not making any payments.

One of the earliest studies to demonstrate the effectiveness of graph-based methods in telecommunications fraud detection is by [Cortes et al., 2002], who mainly use linkage analysis together with temporal and calling volume information. In particular, they build and maintain subgraphs around each phone account, which they name the “communities of interest” (COI) of the account. The COI mainly contains the other phone accounts that are most related to the given account in terms of dynamically weighted measures that consider the call quantity and durations between these parties over time. Using these informative subgraphs, updated daily, two discriminative properties are observed. Firstly, fraudulent phone accounts are found to be linked; fraudsters either directly call each other or they call the same phone numbers, which puts them in close proximity in the COIs.
A second observation shows that it is possible to spot new fraudulent accounts by the similarity of their COIs to previously flagged fraudulent COIs. This is because fraudsters who have been detected and disconnected by the phone operator create new accounts and exhibit similar calling habits, which are effectively captured by their COIs.

These graph-based linking methods provide powerful machinery on top of previously used signature-based methods [Cortes and Pregibon, 2001, Cortes et al., 2000], where a few simple measures such as extensive late-night activity and long call durations have been taken as indicators of fraudulent behavior.

5.2 Anomalies in auction networks

Auction sites such as eBay, uBid, bidz, and Yahoo! Shopping are attractive targets for auction fraud, which constituted about 25% of the complaints to the Federal Internet Crime Complaint Center (IC3) in the U.S. in 2008 [FBI, 2009]. The majority of online auction fraud occurs as non-delivery fraud (∼33%), where the seller fails to deliver/ship the purchased goods to the buyer.

[Chau et al., 2006] developed one of the very first graph-based methods to spot fraudsters committing auction fraud and showed the effectiveness of their method on a large crawl of eBay data. The motivation to use graph-based methods in that domain is that solutions based on an individual’s features, such as age, geo-location, login times, session history, etc., are insufficient, since such features are “easy” to fake. As we discussed earlier in this section, as well as in Section 1.1, the intuition is that as the fraudsters have only a local view of the auction graph, it is “harder” for them to alter their behavior and still be able to “fit in” this graph at large without knowing all the patterns of interactions.

The analysis of the fraudsters’ behavior reveals that in order to game the feedback and reputation system, fraudsters create additional accounts or “roles” called accomplices. Thus, fraudsters exhibit two roles:
– accomplice: trades with honest users, looks legitimate
– fraudster: trades with accomplices to “sell” (cheap) items and receive good feedback to boost reputation, and occasionally commits fraud with honest users when reputation is high enough to convince them

Accomplices and fraudsters do not necessarily interact among themselves. Moreover, honest users trade among themselves as well as with accomplices, who also look like honest users. As such, there is quite a bit of heterophily among the labels of neighboring nodes: fraudsters are mostly linked to accomplices and occasionally to honest users, accomplices are linked to both fraudsters and honest users, serving as middle-men, and honest users are mostly linked to other honest users and accomplices.

Using the insights of these interaction characteristics, [Pandit et al., 2007] developed a relational classification model based on RMNs that can capture these complex correlations (in particular heterophily) among the node labels (honest, accomplice, fraudster), and used LBP for inference.

5.3 Anomalies in accounting networks

Accounting fraud detection involves the task of spotting high-risk accounts with suspicious transaction behavior. Many existing techniques for detection rely on (noisy) domain knowledge and rule-based signals, for example, based on a large number of returns, many late postings, round-dollar entries, etc.

Based on the insight that accounts closely related by their transaction relations would be more likely to have the same labels (risky vs. non-risky), [McGlohon et al., 2009] use relational classification to detect accounting fraud. Here, unlike the heterophily observation in [Chau et al., 2006], homophily (auto-correlation) of neighboring class labels is assumed. Similarly, an RMN representation is developed and LBP is used for inference.
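As a rough illustration of the inference step shared by the auction and accounting applications, the sketch below runs sum-product loopy belief propagation on a pairwise model with a homophily potential. It is a generic textbook-style LBP, not the implementation of [Pandit et al., 2007] or [McGlohon et al., 2009]; the function name `loopy_bp`, the potential values, and the toy chain of three accounts are assumptions made for the example. Supplying a compatibility matrix that favors unlike labels on neighboring nodes would express heterophily instead.

```python
def loopy_bp(edges, priors, psi, iters=50):
    """Sum-product loopy belief propagation on a pairwise model.

    edges:  list of undirected (u, v) pairs
    priors: dict node -> list of prior probabilities, one per class
    psi:    symmetric matrix; psi[a][b] is the compatibility of class a
            on one endpoint of an edge with class b on the other
    Returns dict node -> list of normalized beliefs.
    """
    k = len(psi)
    nbrs = {u: [] for u in priors}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)

    # msgs[(u, v)][b]: what u tells v about v being in class b.
    msgs = {(u, v): [1.0 / k] * k for u in nbrs for v in nbrs[u]}

    for _ in range(iters):
        new_msgs = {}
        for (u, v) in msgs:
            out = []
            for b in range(k):
                total = 0.0
                for a in range(k):
                    term = priors[u][a] * psi[a][b]
                    for w in nbrs[u]:
                        if w != v:  # exclude the message coming from v
                            term *= msgs[(w, u)][a]
                    total += term
                out.append(total)
            z = sum(out)
            new_msgs[(u, v)] = [x / z for x in out]
        msgs = new_msgs  # synchronous update of all messages

    beliefs = {}
    for u in nbrs:
        b = list(priors[u])
        for w in nbrs[u]:
            b = [b[c] * msgs[(w, u)][c] for c in range(k)]
        z = sum(b)
        beliefs[u] = [x / z for x in b]
    return beliefs

# Hypothetical example: class 0 = risky, class 1 = non-risky.
# Account 0 is flagged by a (noisy) rule; accounts 1 and 2 have flat priors.
priors = {0: [0.9, 0.1], 1: [0.5, 0.5], 2: [0.5, 0.5]}
edges = [(0, 1), (1, 2)]
homophily = [[0.8, 0.2], [0.2, 0.8]]

beliefs = loopy_bp(edges, priors, homophily)
print({node: round(b[0], 3) for node, b in beliefs.items()})
```

On this chain the suspicion attached to account 0 spreads to its neighbors and weakens with distance, which is the qualitative behavior the relational classifiers above rely on. On graphs with cycles the same updates are only approximate and are not guaranteed to converge.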
One of the representational powers of global joint models like RMNs, in addition to their ability to capture complex correlations, is the fact that they can integrate prior knowledge if available. In this particular application, the prior knowledge (probability) of accounts being risky translates to prior belief potentials in the RMN representation. In fact, [McGlohon et al., 2009] use the previously used (noisy) domain knowledge based on rule-based flags to estimate the prior beliefs. These beliefs are then propagated in the network, where some of them are corroborated and some may be discarded. Their results showed that through this type of graph-based validation, the detection (true positive) rate improved significantly over the rule-based methods for the same (small) false positive rate.

5.4 Anomalies in security networks

Relational learning has also been used in securities fraud detection, where the task is to spot securities brokers that are likely to commit fraud and other violations of securities regulations in the future. While previous methods used handcrafted rules based on information intrinsic to the brokers, such as the number and type of past violations, [Neville et al., 2005] exploited relational information such as social, professional, and organizational relationships (e.g., past co-worker) among the brokers. In fact, this is one of the applications where the likelihood of committing fraud is highly dependent on social phenomena: fraud is communicated and encouraged by word of mouth among people who wish to commit it, a pattern that relational methods are excellent at spotting.

In particular, [Neville et al., 2005] use a subgraph representation for each of the securities brokers.
Each subgraph includes, in addition to the target broker, various types of other objects (e.g., firms, disclosures), as well as links that represent relationships between these objects (e.g., employment links between a broker and a branch, filing links of disclosures on the broker), and attributes on these objects and links. They then learn relational probability trees [Neville et al., 2003], which exploit (aggregated) relational features of those subgraphs to model the distribution of the class labels, showing that the learned models rank brokers in a manner consistent with the subjective ratings of experienced examiners, and better than handcrafted rules.

5.5 Anomalies in opinion networks: deception and fake reviews

Review sites such as Yelp, TripAdvisor, Amazon, etc., are attractive targets for opinion spam. Opinion spam exhibits itself as hype or defame spam, where (often paid) fraudulent reviewers write fake reviews to untruthfully boost or damage a vendor’s reputation, respectively, and cause an unjust perception of the services among future customers.

This problem has been approached by three different methodologies, based on (i) behavioral analysis [Feng et al., 2012b, Jindal and Liu, 2008, Jindal et al., 2010, Xie et al., 2012], (ii) language stylometry analysis to spot deception [Feng et al., 2012a, Ott et al., 2011], and (iii) relational analysis and network effects to exploit connections among fraudulent reviewers [Akoglu et al., 2013a, Wang et al., 2011a, 2012a].

More specifically, with respect to (i) and (ii), [Jindal and Liu, 2008, Jindal et al., 2010] extract behavioral features such as review length, posting times, time order of reviews (whether first posted review or not), etc. in addition to rule-based mining to spot suspicious reviewers.
[Feng et al., 2012b] study the distributional patterns in rating behaviors, while [Xie et al., 2012] focus on temporal reviewing behaviors to detect fake review(er)s. As for language-based detection, [Ott et al., 2011] unearth the excessive usage of superlatives, self-referencing, misspellings, and agreement words in fake reviews as important clues.

With respect to graph-based detection (iii), [Wang et al., 2011a] developed a propagation algorithm to capture the relationships between reviewers, reviews, and stores (or products, services). The method defines a trustiness score for each reviewer, a reliability score for each store, and an honesty score for each review. These scores are defined in terms of one another: reviewer trustiness is a (non-linear) function of his/her reviews’ honesty scores, store reliability is a function of the trustiness of the reviewers writing reviews for it, and finally review honesty is a function of the reliability of the store it is written for as well as the trustiness of the other reviewers who have written reviews for the same store. The algorithm randomly initializes these scores, and updates them iteratively until some convergence criterion is reached. This is similar in design to the HITS algorithm by [Kleinberg, 1998], where the authoritativeness and hubness scores of Web pages, which are defined as linear functions of each other, are updated iteratively. On the other hand, the algorithm is not guaranteed to converge, and cannot exploit extra knowledge such as textual clues or behavioral information, but is complementary to these previous methods.

Most recently, [Akoglu et al., 2013a] exploited relational classification for opinion spam detection. In particular, they developed a relational model based on RMNs that can capture the correlations between reviewers and stores, and used LBP for inference.
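The mutually recursive scores described above for [Wang et al., 2011a] follow the same pattern as HITS, whose linear updates are easy to state. The sketch below implements plain HITS only; it is not the reviewer/review/store algorithm, whose non-linear update functions are not given here. The function name `hits` and the toy edge list are illustrative assumptions.

```python
def hits(edges, n_iters=50):
    """Plain HITS on a directed graph given as a list of (source, target) links.

    Authority of a page = sum of hub scores of the pages linking to it.
    Hub score of a page = sum of authority scores of the pages it links to.
    Both vectors are L2-normalized after every update.
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}

    for _ in range(n_iters):
        auth = {n: 0.0 for n in nodes}
        for u, v in edges:
            auth[v] += hub[u]
        norm = sum(x * x for x in auth.values()) ** 0.5 or 1.0
        auth = {n: x / norm for n, x in auth.items()}

        hub = {n: 0.0 for n in nodes}
        for u, v in edges:
            hub[u] += auth[v]
        norm = sum(x * x for x in hub.values()) ** 0.5 or 1.0
        hub = {n: x / norm for n, x in hub.items()}

    return hub, auth


# Hypothetical toy graph: "a" and "b" both link to "c"; "a" also links to "d".
hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

Because the HITS updates are linear, the iteration amounts to a power method and converges under mild conditions; replacing the sums with non-linear functions, as in the reviewer/review/store setting, removes that guarantee.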
One main difference from earlier network classification based methods is the signed nature of the opinion network, in which the reviewers are connected to stores (or products) with positive (+) or negative (−) links that capture the sentiment of their reviews (e.g., like/dislike). The signed links affect the label correlations: e.g., while a fraudulent reviewer is likely to link to a low-quality store with a + link (unjustly boosting its reputation), it is less likely for him/her to link to a high-quality store with a + link. This latter case does occur, however, when fraudulent users occasionally write truthful reviews to camouflage their otherwise fraudulent activities, which is accounted for in the RMN model.

5.6 Anomalies in financial trading networks

[Li et al., 2010] use graph-based substructures and their efficient detection to spot potentially fraudulent cases in trading networks. These cases consist of a group of traders that trade among each other in certain ways so as to manipulate the stock market. More specifically, the group of traders may perform transactions on a specific stock among themselves for some amount of time, during which the overall shares of the target stock in their trading accounts increase and they end up producing a large volume of transactions on this stock. After the stock price goes up, these traders start selling the acquired shares to the public, producing an excessive volume of transactions to traders other than themselves.

These two different behaviors of a group of traders within consecutive time windows are formulated in graph-based terms. In the former, in which excessive buying of the stock occurs, the in-link weights are expected to be quite high (these are called blackhole patterns), while in the latter selling stage the out-link weights greatly exceed the in-link weights (these are called volcano patterns).
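As a rough illustration of the two patterns just described, the sketch below compares the total weight flowing into and out of a candidate trader group within one time window. It is not the detection algorithm of [Li et al., 2010]: the edge convention (shares move from `src` to `dst`), the function names, and the `ratio` threshold are all assumptions made for this example, and the candidate group is taken as given rather than searched for.

```python
def boundary_weights(edges, group):
    """Total weight crossing into and out of `group`.

    edges : iterable of (src, dst, weight); shares move from src to dst (assumed)
    group : set of node ids forming the candidate trader group
    Edges with both endpoints inside the group are ignored.
    """
    w_in = w_out = 0.0
    for src, dst, weight in edges:
        if src not in group and dst in group:
            w_in += weight
        elif src in group and dst not in group:
            w_out += weight
    return w_in, w_out


def classify_group(edges, group, ratio=5.0):
    """Label a group by the imbalance of its boundary weights (hypothetical rule)."""
    w_in, w_out = boundary_weights(edges, group)
    if w_in > ratio * w_out:
        return "blackhole-like"   # accumulation: far more flowing in than out
    if w_out > ratio * w_in:
        return "volcano-like"     # sell-off: far more flowing out than in
    return "neither"


# Hypothetical trades in two consecutive windows; t1, t2 form the candidate group.
group = {"t1", "t2"}
window_1 = [("p1", "t1", 100), ("p2", "t2", 80), ("t1", "t2", 50), ("t1", "p1", 5)]
window_2 = [("t1", "p1", 120), ("t2", "p3", 90), ("p2", "t1", 10)]
print(classify_group(window_1, group), classify_group(window_2, group))
```

A group that looks blackhole-like in one window and volcano-like in the next matches the accumulate-then-sell behavior described above.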
These two fraudulent trading behaviors are formally defined and formulated in graph-theoretic terms, and efficient algorithms are developed to detect such patterns quickly in very large and dynamically changing financial trading networks.

5.7 Anomalies in the Web network: spam and malware

One suitable way to define Web spam is as any attempt to get an unjustifiably favorable relevance or importance score for some Web page, considering the page’s true value. One of the main techniques in combating spam and malware on the Web has been to use trust and distrust propagation over the graph structure. These techniques assume that a link between two pages on the Web signifies trust between them; i.e., a link from page i to page j is a conveyance of trust from page i to page j. Moreover, if the target page is known to be a spam page, then they consider the trust judgment of the source page as invalid, in which case the source page is penalized for trusting an untrustworthy page.

One of the earliest methods for improving the PageRank algorithm to combat Web spam is TrustRank [Gyöngyi et al., 2004], which employs the idea of propagating trust from a set of highly trusted seed sites. Initially, human experts select a list of seed sites that are well-known and trustworthy on the Web. Each of these seed sites is assigned an initial trust score. A biased PageRank is then used to propagate these trust scores to the descendants of these sites. The amount of trust decreases with distance from the seed set and with the number of outgoing links from a given site.

Anti-TrustRank [Krishnan and Raj, 2006] can be thought of as the dual of TrustRank: it starts from known bad pages and propagates distrust instead of trust. The intuition used in this work is that the pages pointing to spam pages are very likely to be spam pages themselves.
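The trust propagation described above can be sketched as a PageRank whose restart distribution is concentrated on the seed set. This is a minimal illustration under assumptions, not the published TrustRank or Anti-TrustRank procedure: the function names, the damping value, the handling of pages without out-links, and the toy graph are all choices made for this example.

```python
def biased_pagerank(out_links, seeds, damping=0.85, n_iters=50):
    """PageRank with restarts only to `seeds`.

    out_links : dict page -> list of pages it links to
    seeds     : set of pages receiving the restart mass
    A page splits its score equally among its out-links, so propagated scores
    shrink with distance from the seeds and with out-degree.
    """
    nodes = set(out_links) | {v for targets in out_links.values() for v in targets}
    restart = {n: (1.0 / len(seeds) if n in seeds else 0.0) for n in nodes}
    score = dict(restart)

    for _ in range(n_iters):
        # Mass held by pages without out-links is returned to the seeds (assumption).
        dangling = sum(score[n] for n in nodes if not out_links.get(n))
        nxt = {n: (1.0 - damping + damping * dangling) * restart[n] for n in nodes}
        for u in nodes:
            targets = out_links.get(u, [])
            for v in targets:
                nxt[v] += damping * score[u] / len(targets)
        score = nxt
    return score


def reverse_graph(out_links):
    """Flip every link, so scores flow from a page back to the pages pointing at it."""
    rev = {}
    for u, targets in out_links.items():
        rev.setdefault(u, [])
        for v in targets:
            rev.setdefault(v, []).append(u)
    return rev


# Hypothetical toy Web graph.
web = {"seed": ["a"], "a": ["b"], "b": [], "x": ["spam"], "spam": []}
trust = biased_pagerank(web, {"seed"})                    # trust from a good seed
distrust = biased_pagerank(reverse_graph(web), {"spam"})  # distrust from a bad seed
print(sorted(trust, key=trust.get, reverse=True))
print(sorted(distrust, key=distrust.get, reverse=True))
```

Running the same routine on the reversed graph with spam pages as seeds gives a distrust score in the spirit of Anti-TrustRank: page "x", which links to the spam seed, receives distrust, while pages unrelated to it receive none.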
Anti-Trust is propagated in the reverse direction along incoming links, starting from a seed set of spam pages.

[Wu et al., 2006] also point out several issues regarding TrustRank’s assumptions, such as the fact that it looks at outgoing links and divides the trust propagated to children by their count, which causes two equally trusted pages (but with different numbers of children) to propagate different trust scores to their children. Moreover, the children accumulate trust by simply summing the trust scores from their parents. Instead, they use different splitting and accumulation techniques. In addition, they employ both trust and distrust propagation, and finally assign a weighted score of the two.

One of the main challenges of the methods discussed so far is that they all expect a manually labeled seed set of good or bad pages. [Benczúr et al., 2005] propose a novel way to overcome this challenge and fully automate the process. Their idea is to look at the distribution of PageRank scores of neighbors for each node (i.e. Web page in the graph), which is expected to be power-law distributed given that the overall PageRank score distribution is power-law and given the self-similarity of the Web. To those nodes where the PageRank distribution of their neighbors deviates significantly from a power law, they assign a “penalty”. Similar to Anti-TrustRank, a new PageRank biased by the penalty scores gives the SpamRank scores.

Link-based spam detection [Becchetti et al., 2006] looks at different graph-based measures, which are then used as features to train classification models. The graph features include PageRank, TrustRank scores, degree, assortativity (i.e. degree correlation), fraction of reciprocal edges, average degree of neighbors, etc. These types of link-based features are complementary to other techniques that use content-based features, such as the number of words, number of hyperlinks, text redundancy, etc.
[Canali et al., 2011, Ntoulas et al., 2006], and content-free features, such as URL-based-only host and lexical features [Ma et al., 2009].

The work called ‘Know your neighbors’ [Castillo et al., 2007] makes use of various types of features in tandem to learn classifiers and, furthermore, uses the graph structure to “smooth” the classification results. The main idea is to extract features that are (a) link-based (edge-reciprocity, assortativity, TrustRank score, ratio of TrustRank to PageRank, radius, neighborhood growth rate for increasing number of hops, etc.); and (b) content-based (compression ratio, entropy of n-grams, etc.). Using these features they learn classifiers (decision trees) and then smooth the classifier scores using the graph structure. In particular, they use three different ways to exploit the Web graph: (i) clustering, where all the nodes in the same cluster are relabeled by the majority of the initial labeling, (ii) random-walk-with-restart, where restart probabilities are set to normalized spamicity scores from the classifier (similar to Anti-TrustRank), and (iii) stacking, where a set of extra features for each object is added to the classification over iterations by combining the predictions for the related objects in the graph (this indeed is an ensemble method).

5.8 Anomalies in social networks

Related to the previous section, another group of malware detection methods focuses on social malware in social networks such as Facebook. Such malware is also called socware. Socware consists of any posts appearing in one’s news feed in social media platforms such as Facebook and Twitter that (i) lead the user to malicious sites that compromise the user’s device, (ii) promise false rewards and make the user perform certain tasks (e.g.
filling out surveys) potentially for someone else’s benefit, (iii) make the user boost the reputation of certain pages by clicking or ‘liking’ them, (iv) make the user redistribute the post (e.g., by sharing/re-posting), and so on.

To combat socware, [Rahman et al., 2012] propose a classification framework that exploits “social-context-aware” features, such as message similarity of posts across different users who shared (or were made to share, as in (iv) above) a particular post, the size of the propagation of the post in the network, the total ‘like’ and comment counts of other network users on the post, etc., in addition to other content-based features. In another study, [Gao et al., 2012] perform online spam filtering on social networks using incremental clustering, based on features that also include network-level features such as the sender’s degree and the interaction history between users.

These methods rely on learning classifiers based on collective feature sets (including graph-based features). On the other hand, it would be interesting to see if unsupervised methods that directly focus on graph mining could help in identifying online socware, by studying the propagation-based dissemination of socware in the network.

5.9 Anomalies in computer networks: attacks and intrusion

Most graph-based network intrusion detection methods focus on the dynamically growing and changing nature of the network graph. In this graph, the nodes represent the agents in the network, such as ad/file/directory servers and client nodes, and edges represent their communications over the network (note that the edges may be weighted, capturing volume or frequency). The insight behind tracking the dynamic nature of the network graph is the assumption that the communication behavior of a compute node would change when under attack.
There exist two main challenges associated with tracking large communication networks and the necessity to consider their relational characteristics: (i) the large number of compute nodes makes it impractical to monitor them individually; moreover, the behavior of the nodes may depend on each other, and thus monitoring them in isolation would miss their correlations; (ii) the large number of edges makes it impractical to study the highly dynamic time series of communication volumes in tandem.

For these reasons, [Idé and Kashima, 2004] monitor what is called the “activity” vector of nodes. The activity score of a node is computed collectively; if a node links to many active nodes, its activity score is high. With this definition, the activity vector essentially becomes the principal eigenvector of the adjacency matrix that depicts the communication graph. They track this vector over time by measuring the change in its direction and magnitude, and develop online thresholding techniques to decide when to flag a change as a significant event. These events may correspond to network attacks as well as failures and other network configuration changes.

[Sun et al., 2008] exploit matrix decomposition to capture the norm of network activity. They employ a sparse and efficient (both in time and storage) method called Compact Matrix Decomposition to decompose the adjacency matrix of the network graph, and use the relative sum-square-error of reconstruction as a measure of change to track over time for newly arriving snapshots of the network graph. They observe that this new measure of change detects events that total volume monitoring misses.

Another graph-based method [Ding et al., 2012] considers analysis of network communities, as we discussed in Section 2.1. Simply put, the idea is to monitor cross-community communication behavior to spot network intrusion.
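Returning to the activity vector of [Idé and Kashima, 2004] described above, the sketch below computes a principal eigenvector by power iteration and measures how far its direction moves between two snapshots. It is an illustration under assumptions rather than the authors' procedure: the snapshot format (undirected weighted edges in a dict), the function names, and the toy snapshots are hypothetical, and the online thresholding step is not shown.

```python
def activity_vector(snapshot, nodes, n_iters=200):
    """Unit-length principal eigenvector of a symmetric, non-negative adjacency matrix.

    snapshot : dict {(u, v): weight} of undirected communication edges
    nodes    : list of all node ids
    Power iteration is applied to A + I; the identity shift leaves the
    eigenvectors unchanged and avoids oscillation on bipartite-like graphs.
    """
    x = {n: 1.0 for n in nodes}
    for _ in range(n_iters):
        y = dict(x)                       # the "+ I" part
        for (u, v), w in snapshot.items():
            y[u] += w * x[v]              # a node is active if its neighbors are
            y[v] += w * x[u]
        norm = sum(val * val for val in y.values()) ** 0.5
        x = {n: val / norm for n, val in y.items()}
    return x


def direction_change(x_prev, x_curr):
    """1 - cosine similarity between two unit-length activity vectors."""
    return 1.0 - sum(x_prev[n] * x_curr[n] for n in x_prev)


# Hypothetical snapshots: a server "s" with three clients; later, client "c3"
# suddenly exchanges heavy traffic with the other clients.
nodes = ["s", "c1", "c2", "c3"]
normal = {("s", "c1"): 1.0, ("s", "c2"): 1.0, ("s", "c3"): 1.0}
unusual = dict(normal)
unusual.update({("c3", "c1"): 10.0, ("c3", "c2"): 10.0})

base = activity_vector(normal, nodes)
print(direction_change(base, activity_vector(normal, nodes)))
print(direction_change(base, activity_vector(unusual, nodes)))
```

An unchanged snapshot yields a change score of essentially zero, while the second comparison yields a clearly larger value; in practice such a score would be compared against a threshold learned online.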
Intuitively, communications that cross community boundaries, considered as anti-social, are suspicious and can be treated as a signal of attack. The ROC curves show that methods based on this insight achieve over 90% detection accuracy, albeit with a somewhat high false alarm rate of about 50%, on ground-truth data with malicious attacks.

Finally, while not directly focusing on network intrusion, [Iliofotou et al., 2007, 2011] use graph-based network traffic representations, called traffic dispersion graphs, to analyze, monitor, visualize, and classify network traffic.

6 Conclusions and Open Challenges

Summary. In this survey, our aim has been to provide a comprehensive overview of graph-based techniques for anomaly, event, and fraud detection, as well as their use for post-analysis and sense-making in explaining the detected abnormalities. Following our taxonomy in Figure 2, we surveyed quantitative detection and qualitative explanation/attribution techniques as two main parts. The detection methods are further categorized into three groups: (i) anomaly detection in static graphs; (ii) event detection in dynamic graphs; and (iii) fraud detection in real-world scenarios. The first two groups (anomalies and events) consist of general abnormality definitions and their detection techniques, proposed mainly by the data mining community. The third group (fraud scenarios) consists of specialized techniques for particular fraud types as observed in the real world, which mostly involve (machine) learning approaches. Furthermore, the attribution techniques highlight graph-based tools for analysis, visualization, monitoring, exploration, and sense-making of the anomalies.

Conclusions. One of the main messages we aimed to convey has been the expressiveness of graphs in capturing real-world phenomena, which makes them a very powerful machinery for abnormality detection.
In particular, we emphasized that (i) data instances are often inter-dependent and exhibit long-range correlations, (ii) the anomaly detection problem is often relational in nature (e.g., opportunistic or organized fraud), and (iii) robust, hard-to-circumvent machinery is essential in the arms race with the attackers in fraud scenarios. Graphs prove to be effective in all these aspects.

Our aim, however, is not to claim the superiority of graph-based methods over other detection techniques. Rather, our goal is to highlight the advantages of graphs, and provide a comprehensive list of available algorithms and tools that exploit graphs to build anomaly detection solutions. We believe that those would prove complementary to other types of techniques and should most probably be used in tandem for better detection performance. In fact, it is at the discretion of the practitioners to decide what type of scenarios best describe the problems they are dealing with, as well as what tools best fit their needs.

Open Challenges. While there has been a tremendous amount of work in developing graph-based algorithms and machinery for abnormality detection and attribution, we believe there is still more work that needs to be done. In this part, we provide a discussion of open challenges, which we group in two parts: theoretical and practical research challenges.

Theoretical research challenges. While there has been a considerable amount of work on static graphs, there still remain problems in the study of dynamic graphs.

– Anomaly Detection on Attributed Dynamic Graphs. While static attributed graphs have been exploited in abnormality detection, there exist only a few works on spotting anomalies by exploiting dynamic attributed graphs (see Table 5).
It is certainly of interest to develop definitions and formulations for abnormalities in such settings, and explore and identify where they could find applications in the real world.
– The History/Trace of Dynamic Updates. While most techniques for dynamic graphs consider and work with edge/node updates, there exists no work that exploits the history of the updates. For example, imagine a Web page that had a link to a malicious Web page in the past which was later removed. While from the change point of view this is an edge removal, the existence of such a link in the history of the page should be taken into account in making future evaluations, rather than treating and committing to the change as a simple edge removal.
– Choosing the ‘Right’ Time Window/Granularity. Many algorithms for time-evolving graphs require a time window for feature extraction or computation of the normal graph/node activity; one of the open questions is how to choose this window in order to discover the different types of outliers in the graph sequences. Would it make sense to set it to a day, or a week, or a month, based on the respective periodicities that have been reported for human activities/botnet attacks etc.? Would another time granularity serve for detecting the existing anomalies? Or, would a combination of time granularities work best?

Moreover, while there has been considerable work from an algorithmic point of view in abnormality detection, there still remain problems from a systems perspective.

– Adversarial Robustness. Most methods in the data mining and machine learning community focus on detection performance while ignoring adversarial robustness. It is of high interest, from the practitioner’s point of view, to understand the adversarial robustness of a new algorithm; i.e.
how easy is it to break the algorithm, or what is the minimum amount of knowledge or computational power the attacker needs to have access to, in order to camouflage his/her bad activities.
– The Cost of Graph Anomaly Detection. Most methods ignore the cost aspects of information. These costs, on the other hand, may exhibit themselves in various forms and at varying levels, e.g., the cost of measurement and monitoring exerted on the system; the cost of being exposed to certain types of attacks exerted on the users; and the cost of getting around the algorithms exerted on the adversaries (which also relates to the above). These varying costs should be accounted for differently in algorithm development.
– Scalable Real-time Discontinuity Detection. One of the most important future challenges is to develop scalable approaches for real-time discontinuity detection, i.e., for streaming graphs. Specifically, research should focus on algorithms that are linear or, even better, sub-linear in the size of the input.

Practical research challenges. Challenges from the practitioner’s point of view, which could also be posed as research problems, include the following.

– Finding the X-factor. It is often hard to predict what would boost a detection algorithm’s performance the most; e.g., better priors or better and/or more (human) labels in learning algorithms, better parameter tuning, creating frameworks that combine multiple algorithms working in parallel or sequentially, choosing the appropriate algorithms for the framework, or simply having more data.
– Evaluation. Due to the challenges associated with collecting true labels, related to cost and annotator noise, ground truth data is often nonexistent. As such, various works employ different approaches, as was discussed in the Concluding Remarks after Sections 2 and 3, such as anomaly injections and qualitative analysis.
Thus far, there is no standard for evaluating (graph) anomaly detection methods.
– Graph Construction. Oftentimes the data does not naturally form a network, as it does in computer networks. Rather, it is up to the practitioners to build a network representation of their data in order to use graph-based techniques. In such cases, it is often hard to anticipate what source of data is best to use in graph construction.
– Anomaly Detection on Multi-Graphs. In contrast to the above, it may be the case that there is more than one network available, capturing different aspects of relations (e.g. a friendship network and a telecommunication network among the same individuals). While possibly beneficial, how to exploit all available networks and fuse clues from all these sources for anomaly detection remains an open area.
– Balance between Attribution and ‘Novelty’ Detection. By anomaly attribution, we essentially want to attribute the detected anomalies to known, human-understandable evidence (e.g., the known frauds, the known rule-based meta detectors, etc.). This might conflict with some anomaly detection tasks where the goal is to find ‘novel’ patterns beyond users’ current understanding. More research needs to be done on how to balance attribution against the ability of the detection algorithm to find ‘novelty’.
– Augmented Graph Anomaly Detection. When there is an explicit network representation, it may also be possible to introduce/remove latent edges, for example edges based on similarities (e.g. text, time-series, correlation) or domain knowledge (e.g. known irrelevant types of edges).
Acknowledgements This material is based upon work supported by the Army Research Office (ARO) under Cooperative Agreement Numbers W911NF-14-1-0029 and W911NF-09-2-0053, the Defense Advanced Research Projects Agency (DARPA) under Contract Numbers W911NF-11-C-0088, W911NF-11-C-0200 and W911NF-12-C-0028, the National Science Foundation (NSF) under Grants No. IIS-1217559 and IIS-1017415, by Region II University Transportation Center under the project number 49997-33-25, and the Stony Brook University Office of Vice President for Research. Any findings and conclusions expressed in this material are those of the author(s) and do not necessarily reflect the position or the policy of the U.S. Government and the other funding parties, and no official endorsement should be inferred. The U.S. Government is authorized to reproduce and distribute reprints for Government purposes notwithstanding any copyright notation hereon.

References

Naoki Abe, Bianca Zadrozny, and John Langford. Outlier detection by active learning. In Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, PA, pages 504–509, 2006.

Naoki Abe, Prem Melville, Cezar Pendus, Chandan K. Reddy, David L. Jensen, Vince P. Thomas, James J. Bennett, Gary F. Anderson, Brent R. Cooley, Melissa Kowalczyk, Mark Domick, and Timothy Gardinier. Optimizing debt collections using constrained reinforcement learning. In Proceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Washington, DC, pages 75–84. ACM, 2010.

Charu Aggarwal and Karthik Subbian. Evolutionary network analysis: A survey. ACM Computing Surveys, 2014.

Charu C. Aggarwal. Outlier ensembles. In ACM SIGKDD Explorations, 2012.

Charu C. Aggarwal. Outlier Analysis. Springer-Verlag New York Incorporated, 2013.

Charu C. Aggarwal and Philip S. Yu. Outlier detection for high dimensional data.
In Proceedings of the ACM International Conference on Management of Data (SIGMOD), Santa Barbara, CA, pages 37–46. ACM, 2001.

Charu C. Aggarwal, Yuchen Zhao, and Philip S. Yu. Outlier detection in graph streams. In Proceedings of the 27th International Conference on Data Engineering (ICDE), Hannover, Germany, pages 399–409, 2011.

Leman Akoglu and Christos Faloutsos. Event detection in time series of mobile communication graphs. Proceedings of Army Science Conference, (1), 2008.

Leman Akoglu and Christos Faloutsos. RTG: A recursive realistic graph generator using random typing. Data Mining and Knowledge Discovery, 19(2):194–209, 2009.

Leman Akoglu, Mary McGlohon, and Christos Faloutsos. OddBall: Spotting anomalies in weighted graphs. In Proceedings of the 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Hyderabad, India, pages 410–421, 2010.

Leman Akoglu, Pedro O. S. Vaz de Melo, and Christos Faloutsos. Quantifying reciprocity in large weighted communication networks. Proceedings of the 16th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Kuala Lumpur, Malaysia, 2012a.

Leman Akoglu, Hanghang Tong, Brendan Meeder, and Christos Faloutsos. Pics: Parameter-free identification of cohesive subgroups in large attributed graphs. In Proceedings of the 12th SIAM International Conference on Data Mining (SDM), Anaheim, CA, pages 439–450. SIAM / Omnipress, 2012b.

Leman Akoglu, Hanghang Tong, Jilles Vreeken, and Christos Faloutsos. Fast and reliable anomaly detection in categorical data. In Proceedings of the 21st ACM Conference on Information and Knowledge Management (CIKM), Maui, Hawaii, pages 415–424, 2012c.

Leman Akoglu, Rishi Chandy, and Christos Faloutsos. Opinion fraud detection in online reviews using network effects.
In Proceedings of the 7th International AAAI Conference on Weblogs and Social Media (ICWSM), 2013a.

Leman Akoglu, Jilles Vreeken, Hanghang Tong, Duen Horng Chau, Nikolaj Tatti, and Christos Faloutsos. Mining connection pathways for marked nodes in large graphs. In Proceedings of the 13th SIAM International Conference on Data Mining (SDM), Texas-Austin, TX, 2013b.

Mitsuru Ambai, Nugraha P. Utama, and Yuichi Yoshida. Dimensionality reduction for histogram features based on supervised non-negative matrix factorization. IEICE Transactions on Information and Systems, 94-D(10):1870–1879, 2011.

Reid Andersen, Fan Chung, and Kevin Lang. Local graph partitioning using pagerank vectors. In Proceedings of the 47th Annual IEEE Symposium on Foundations of Computer Science, pages 475–486. IEEE Computer Society, 2006.

Shin Ando. Clustering needles in a haystack: An information theoretic analysis of minority and outlier detection. In Proceedings of the 7th IEEE International Conference on Data Mining (ICDM), Omaha, NE, pages 13–22, 2007.

Ioannis Antonellis, Hector Garcia-Molina, and Chi-Chao Chang. Simrank++: Query rewriting through link analysis of the click graph. In Proceedings of the 34th International Conference on Very Large Data Bases (VLDB), Auckland, New Zealand, pages 408–421, 2008.

Miguel Araujo, Spiros Papadimitriou, Stephan Günnemann, Christos Faloutsos, Prithwish Basu, Ananthram Swami, Evangelos Papalexakis, and Danai Koutra. Com2: Fast automatic discovery of temporal (comet) communities. In Proceedings of the 18th Pacific-Asia Conference on Knowledge Discovery and Data Mining (PAKDD), Tainan, Taiwan, 2014.

Lars Backstrom, Dan Huttenlocher, Jon Kleinberg, and Xiangyang Lan. Group formation in large social networks: membership, growth, and evolution.
In Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, PA, pages 44–54. ACM, 2006.

Albert-László Barabási and Réka Albert. Emergence of scaling in random networks. Science, 286:509–512, 1999.

Stephen D. Bay and Michael J. Pazzani. Detecting change in categorical data: Mining contrast sets. In Proceedings of the 5th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Diego, CA, pages 302–306. ACM Press, 1999.

Mohsen Bayati, David F. Gleich, Amin Saberi, and Ying Wang. Message passing algorithms for sparse network alignment. ACM Transactions on Knowledge Discovery from Data, 7(1):3:1–3:31, March 2013.

Luca Becchetti, Carlos Castillo, Debora Donato, Stefano Leonardi, and Ricardo Baeza-Yates. Link-based characterization and detection of Web Spam. In Second International Workshop on Adversarial Information Retrieval on the Web (AIRWeb), August 2006.

András A. Benczúr, Károly Csalogány, Tamás Sarlós, and Máté Uher. Spamrank: fully automatic link spam detection. In Proceedings of the First International Workshop on Adversarial Information Retrieval on the Web, May 2005.

Michele Berlingerio, Danai Koutra, Tina Eliassi-Rad, and Christos Faloutsos. Netsimile: A scalable approach to size-independent network similarity. CoRR, abs/1209.2684, 2012.

Kevin Beyer, Jonathan Goldstein, Raghu Ramakrishnan, and Uri Shaft. When is "nearest neighbor" meaningful? In International Conference on Database Theory, pages 217–235, 1999.

Cemal Cagatay Bilgin and Bülent Yener. Dynamic network evolution: Models, clustering, anomaly detection. Survey, 2008.

Brigitte Boden, Stephan Günnemann, Holger Hoffmann, and Thomas Seidl.
Mining coherent subgraphs in multi-layer graphs with edge labels. In Proceedings of the 18th ACM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China , pages 1258–1266. A CM, 2012a. Brigitte Boden, Stephan G ¨ unnemann, and Thomas Seidl. Tracing clusters in ev olv- ing graphs with node attributes. In Pr oceedings of The 21st ACM Confer ence on Information and Knowledge Manag ement (CIKM 2012), Maui, USA , 2012b. Christian B ¨ ohm, Katrin Haegler , Nikola S. M ¨ uller , and Claudia Plant. CoCo: coding cost for parameter-free outlier detection. In Pr oceedings of the 15th ACM Inter- national Confer ence on Knowledg e Discovery and Data Mining (SIGKDD), P aris, F rance , pages 149–158. A CM, 2009. Richard J. Bolton and Da vid J. Hand. Unsupervised profiling methods for fraud detection. In Pr oceedings of Confer ence Cr edit Scoring and Cr edit Contr ol VII , pages 5–7, 2001. Richard J. Bolton and David J. Hand. Statistical fraud detection: A revie w . Statistical Science , 17, 2002. Phillip Bonacich and Paulette Lloyd. Eigen vector -like measures of centrality for asymmetric relations. Social Networks , 23(3):191–201, July 2001. George Edward Pelham Box and Gwilym Jenkins. T ime Series Analysis, F or ecasting and Contr ol . Holden-Day , Incorporated, 1990. Markus M. Breunig, Hans-Peter Kriegel, Raymond T . Ng, and J ¨ org Sander . Lof: Identifying density-based local outliers. In Proceedings of the ACM International Confer ence on Management of Data (SIGMOD), Dallas, TX , pages 93–104. A CM, 2000. Serge y Brin and Lawrence Page. The anatomy of a large-scale hypertextual web search engine. Computer Networks , 30(1-7):107–117, 1998. Horst Bunke. Error correcting graph matching: On the influence of the underlying cost function. IEEE T ransactions on P attern Analysis and Machine Intelligence , 21(9):917–922, 1999. Horst Bunke, Peter J. Dickinson, Andreas Humm, Christophe Irniger , and Miro Kraetzl. 
Computer network monitoring and abnormal ev ent detection using graph Graph-based Anomaly Detection and Description: A Survey 53 matching and multidimensional scaling. In Pr oceedings of 6th Industrial Confer- ence on Data Mining (ICDM) , pages 576–590, July 14 - 15 2006a. Horst Bunke, Peter J. Dickinson, Miro Kraetzl, and W alter D. W allis. A Graph- Theor etic Appr oach to Enterprise Network Dynamics (PCS) . Birkhauser , 2006b. Davide Canali, Marco Cova, Gio vanni V igna, and Christopher Kruegel. Prophiler: a fast filter for the large-scale detection of malicious web pages. In Pr oceedings of the 19th International Confer ence on W orld W ide W eb (WWW), Hyder abad, India , pages 197–206. A CM, 2011. Carlos Castillo, Debora Donato, Aristides Gionis, V anessa Murdock, and Fabrizio Silvestri. Know your neighbors: web spam detection using the web topology . In Pr oceedings of the 30th International Confer ence on Resear ch and Development in Information Retrieval (SIGIR), Amster dam , pages 423–430. A CM, 2007. Sung-Hyuk Cha. Comprehensive survey on distance / similarity measures between probability density functions. International Journal of Mathematical Models and Methods in Applied Sciences , 1(4):300–307, 2007. Deepayan Chakrabarti. Autopart: parameter-free graph partitioning and outlier de- tection. In Pr oceedings of the 8th Eur opean Confer ence on Principles and Prac- tice of Knowledge Discovery in Databases (PKDD), Pisa, Italy , pages 112–124. Springer-V erlag Ne w Y ork, Inc., 2004. Deepayan Chakrabarti, Ravi Kumar , and Andrew T omkins. Evolutionary clustering. In Pr oceedings of the 12th ACM SIGKDD international confer ence on Knowledge discovery and data mining , Proceedings of the 12th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, P A, pages 554–560. A CM, 2006. Soumen Chakrabarti. Dynamic personalized pagerank in entity-relation graphs. 
In Pr oceedings of the 16th International Conference on W orld W ide W eb (WWW), Alberta, Canada , pages 571–580, 2007. V arun Chandola, Arindam Banerjee, and V ipin K umar . Anomaly detection: A surv ey . A CM Computing Surveys , 41:15:1–15:58, 2009. V arun Chandola, Arindam Banerjee, and V ipin Kumar . Anomaly detection for dis- crete sequences: A survey . IEEE T ransactions on Knowledge and Data Engineer- ing , 24(5):823–839, 2012. Gary Chartrand, Grzegorz Kubicki, and Michelle Schulz. Graph similarity and dis- tance in graphs. Aequationes Mathematicae , 55(1-2):129–145, 1998. Duen Horng Chau, Shashank Pandit, and Christos Faloutsos. Detecting fraudulent personalities in networks of online auctioneers. In Proceedings of the 10th Eur o- pean Conference on Principles and Practice of Knowledge Discovery in Databases (PKDD), Berlin, Germany , pages 103–114, 2006. Duen Horng Chau, Leman Akoglu, Jilles Vreeken, Hanghang T ong, and Christos Faloutsos. T ourviz: interactiv e visualization of connection pathways in large graphs. In Proceedings of the 18th A CM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China , pages 1516–1519, 2012. Amitabh Chaudhary , Alexander S. Szalay , and Andrew W . Moore. V ery fast outlier detection in large multidimensional data sets. In Proceedings of the ACM SIGMOD W orkshop on Resear ch Issues in Data Mining and Knowledg e Discovery (DMKD), Madison, WI , 2002. 54 Leman Akoglu et al. Hung-Hsuan Chen and C. Lee Giles. ASCOS: an asymmetric network structure con- text similarity measure. In IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASON AM), Niagar a F alls, Canada , 2013. Gregory F . Cooper . The Computational Complexity of Probabilistic Inference Using Bayesian Belief Networks. Artificial Intelligence , 42(2-3):393–405, 1990. Corinna Cortes and Daryl Pregibon. Signature-based methods for data streams. Data Mining and Knowledge Discovery , 5(3):167–182, 2001. 
Corinna Cortes, Kathleen Fisher , Daryl Pregibon, and Anne Rogers. Hancock: a lan- guage for extracting signatures from data streams. In Pr oceedings of the 6th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Boston, MA , pages 9–17. A CM, 2000. Corinna Cortes, Daryl Pregibon, and Chris V olinsky . Communities of interest. Intel- ligent Data Analysis , 6(3):211–219, 2002. Hanbo Dai, Feida Zhu, Ee-Peng Lim, and HweeHwa Pang. Detecting anomalies in bipartite graphs with mutual dependency principles. In Pr oceedings of the 12th IEEE International Confer ence on Data Mining (ICDM), Brussels, Belgium , pages 171–180. IEEE Computer Society , 2012. Uros Damnjanovic, V irginia Fernandez Arguedas, Ebroul Izquierdo, and Jos ´ e M. Mart ´ ınez. Event detection and clustering for surveillance video summarization. In 9th International W orkshop on Image Analysis for Multimedia Interactive Ser- vices , pages 63–66. IEEE Computer Society , 2008. Kaustav Das and Jeff G. Schneider . Detecting anomalous records in categorical datasets. In Proceedings of the 13th A CM International Confer ence on Knowl- edge Discovery and Data Mining (SIGKDD), San Jose , CA , pages 220–229. ACM, 2007. Michael Davis, W eiru Liu, Paul Miller, and George Redpath. Detecting anomalies in graphs with numeric labels. In Pr oceedings of the 21st A CM Confer ence on In- formation and Knowledge Manag ement (CIKM), Glasgow , Scotland , pages 1197– 1202. A CM, 2011. Scott Deerwester , Susan T . Dumais, George W . Furnas, Thomas K. Landauer , and Richard Harshman. Indexing by latent semantic analysis. Journal of the American Society for Information Science , 41(6):391–407, 1990. Inderjit S. Dhillon, Subramanyam Mallela, and Dharmendra S. Modha. Information- theoretic co-clustering. In Pr oceedings of the 9th ACM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), W ashington, DC , pages 89–98. A CM, 2003. 
Peter Dickinson, Horst Bunke, Arek Dadej, and Miro Kraetzl. Median graphs and anomalous change detection in communication networks. In Information, Decision and Contr ol, 2002. F inal Pr ogram and Abstracts , pages 59–64, 2002. Qi Ding, Natallia Katenka, Paul Barford, Eric D. K olaczyk, and Mark Crovella. In- trusion as (anti)social communication: characterization and detection. In Pr oceed- ings of the 18th A CM International Confer ence on Knowledge Disco very and Data Mining (SIGKDD), Beijing, China , pages 886–894. A CM, 2012. Petros Drineas, Ravi Kannan, and Michael W . Mahoney . Fast monte carlo algo- rithms for matrices iii: Computing a compressed approximate matrix decomposi- tion. SIAM Journal on Computing , 36(1):184–206, 2006. Graph-based Anomaly Detection and Description: A Survey 55 W illiam Eberle and Lawrence B. Holder . Discovering structural anomalies in graph- based data. In Pr oceedings of the International W orkshop on Mining Graphs and Complex Structures at the 7th IEEE International Confer ence on Data Mining (ICDM), Omaha, NE , pages 393–398. IEEE Computer Society , 2007. W illiam Eberle and Lawrence B. Holder . Graph-based approaches to insider threat detection. In Pr oceedings of the 5th Annual Cyber Security and Information Intel- ligence Resear ch W orkshop (CSIIRW) , page 44. A CM, 2009. Michael Edward Edge and Pedro R. Falcone Sampaio. A surve y of signature based methods for financial fraud detection. Computers & Security , 28(6):381–394, 2009. Hew ayda Elghawalby and Edwin R. Hancock. Measuring graph similarity using spectral geometry . In Pr oceedings of the 5th international confer ence on Image Analysis and Recognition (ICIAR) , pages 517–526, 2008. Christos Faloutsos, Ke vin S. McCurley , and Andrew T omkins. F ast discovery of connection subgraphs. In Pr oceedings of the 10th A CM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Seattle, W A , pages 118– 127, 2004. T om Fawcett and Foster J. Provost. 
Combining data mining and machine learning for effecti ve user profiling. In Pr oceedings of the 2nd ACM International Con- fer ence on Knowledge Discovery and Data Mining (KDD), P ortland, OR , pages 8–13. AAAI Press, 1996. T om F awcett and Foster J. Prov ost. Activity monitoring: Noticing interesting changes in behavior . In Pr oceedings of the 5th ACM International Confer ence on Knowl- edge Discovery and Data Mining (SIGKDD), San Die go, CA , pages 53–62. A CM, 1999. Usama M. Fayyad and Keki B. Irani. Multi-interval discretization of continuous- valued attributes for classification learning. In Pr oceedings of the 5th Inter- national J oint Conference on Artificial Intelligence (IJCAI), Chambery , F rance , pages 1022–1029. Morgan Kaufmann, 1993. FBI. Online auction fraud, June 2009. W illiam Feller . An intr oduction to pr obability theory and its applications . W iley , 1968. Song Feng, Ritwik Banerjee, and Y ejin Choi. Syntactic stylometry for deception detection. In Pr oceedings of the 50th Annual Meeting of the Association for Com- putational Linguistics (A CL), Jeju Island, K or ea , 2012a. Song Feng, Longfei Xing, Anupam Gogar, and Y ejin Choi. Distributional footprints of deceptiv e product revie ws. In Pr oceedings of the 6th International AAAI Con- fer ence on W eblogs and Social Media (ICWSM) , 2012b. Miroslav Fiedler . Algebraic connectivity of graphs. Czec hoslovak Mathematical Journal , 23(98):298–305, 1973. N. I. Fisher, T . Lewis, and B. J. J. Embleton. Statistical analysis of spherical data . Cambridge Univ ersity Press, 1993. Ulrich Flegel, Julien V ayssire, and Gunter Bitz. A state of the art survey of fraud detection technology . In Insider Thr eats in Cyber Security , volume 49 of Advances in Information Security , pages 73–84. Springer , 2010. 56 Leman Akoglu et al. Linton C. Freeman. A set of measures of centrality based upon betweenness. So- ciometry , 40:35–41, 1977. Nir Friedman, Lise Getoor, Daphne Koller , and A vi Pfeffer . 
Learning probabilistic relational models. In Proceedings of the 11th International Joint Conference on Artificial Intelligence (IJCAI), Stoc kholm, Sweden , pages 1300–1309, 1999. Brian Gallagher , Hanghang T ong, Tina Eliassi-Rad, and Christos Faloutsos. Using ghost edges for classification in sparsely labeled networks. In Pr oceedings of the 14th ACM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), Las V egas, NV , pages 256–264. A CM, 2008. Joo Gama, Pedro Medas, Gladys Castillo, and Pedro Rodrigues. Learning with drift detection. In In SBIA Brazilian Symposium on Artificial Intelligence , pages 286– 295. Springer V erlag, 2004. Hongyu Gao, Y an Chen, Kathy Lee, Diana Palsetia, and Alok Choudhary . T owards Online Spam Filtering in Social Networks. In Pr oceedings of the 19th Annual Network & Distributed System Security Symposium , 2012. Jing Gao and Pang-Ning T an. Con verting output scores from outlier detection algo- rithms into probability estimates. In Pr oceedings of the 6th IEEE International Confer ence on Data Mining (ICDM), Hong Kong , China , pages 212–221, 2006. Jing Gao, Feng Liang, W ei Fan, Chi W ang, Y izhou Sun, and Jiawei Han. On com- munity outliers and their efficient detection in information networks. In Pr oceed- ings of the 16th A CM International Confer ence on Knowledge Disco very and Data Mining (SIGKDD), W ashington, DC , pages 813–822. A CM, 2010a. Xinbo Gao, Bing Xiao, Dacheng T ao, and Xuelong Li. A survey of graph edit dis- tance. Journal of P attern Analysis and Applications , 13(1):113–129, 2010b. Matthew E. Gaston, Miro Kraetzl, and W alter D. W allis. Using graph diameter for change detection in dynamic networks. In Australian Journal of Combinatorics , pages 299–311, 2006. Amol Ghoting, Sriniv asan Parthasarathy , and Matthe w Eric Otey . Fast mining of distance-based outliers in high-dimensional datasets. Data Mining and Knowledg e Discovery , 16(3):349–364, 2008. Joseph Glaz. Scan statistics. 
Encyclopedia of Statistics in Quality and Reliability , 2007. Gene H. Golub and Charles F . V an Loan. Matrix computations (3r d ed.) . Johns Hopkins Univ ersity Press, 1996. Olivia A. Grigg, V ern T . Farewell, and David J. Spiegelhalter . Use of risk-adjusted cusum and rspert charts for monitoring in medial contexts. Statistical Methods in Medical Resear ch , 2003. Stephan G ¨ unnemann, Ines F ¨ arber , Brigitte Boden, and Thomas Seidl. Subspace clus- tering meets dense subgraph mining: A synthesis of two paradigms. In Pr oceed- ings of the 10th IEEE International Confer ence on Data Mining (ICDM), Sydney , Austr alia , pages 845–850. IEEE Computer Society , 2010. Stephan G ¨ unnemann, Brigitte Boden, and Thomas Seidl. Finding density-based sub- space clusters in graphs with feature vectors. Data Mining and Knowledge Dis- covery , 25(2):243–269, 2012. Manish Gupta, Jing Gao, Y izhou Sun, and Jiawei Han. Integrating community match- ing and outlier detection for mining e volutionary community outliers. In Pr oceed- Graph-based Anomaly Detection and Description: A Survey 57 ings of the 18th A CM International Confer ence on Knowledge Disco very and Data Mining (SIGKDD), Beijing, China , pages 859–867. A CM, 2012. Manish Gupta, Jing Gao, Charu C. Aggarwal, and Jiawei Han. Outlier detection for temporal data: A surv ey . IEEE T ransactions on Knowledge and Data Engineering , 99(PrePrints):1, 2013. ISSN 1041-4347. Mangesh Gupte and T ina Eliassi-Rad. Measuring tie strength in implicit social net- works. In Pr oceedings of the ACM Conference on W eb Science, Evanston, IL , pages 109–118. A CM, 2012. Zolt ´ an Gy ¨ ongyi, Hector Garcia-Molina, and Jan Pedersen. Combating web spam with trustrank. In Pr oceedings of the 30th International Confer ence on V ery Lar ge Data Bases (VLDB), T or onto, Canada , pages 576–587, 2004. T aher H. Haveliw ala. T opic-sensiti ve pagerank: A context-sensiti ve ranking algo- rithm for web search. 
IEEE T ransactions on Knowledge and Data Engineering , 15(4):784–796, 2003. Douglas Hawkins. Identification of outliers. Chapman and Hall , 1980. Zengyou He, Xiaofei Xu, and Shengchun Deng. Discov ering cluster-based local outliers. P attern Recognition Letters , 24(9-10):1641–1650, 2003. Nicholas A. Heard, Da vid J. W eston, Kiriaki Platanioti, and Da vid J. Hand. Bayesian anomaly detection methods for social networks. Annals of Applied Statistics , 4: 645–662, January 2010. Kathryn Hempstalk, Eibe Frank, and Ian H. W itten. One-class classification by com- bining density and class probability estimation. In Proceedings of the Eur opean Confer ence on Machine Learning and Principles and Practice of Knowledge Dis- covery in Databases (ECML PKDD), Antwerp, Belgium . Springer , 2008. Keith Henderson, Tina Eliassi-Rad, Christos Faloutsos, Leman Akoglu, Lei Li, Koji Maruhashi, B. Aditya Prakash, and Hanghang T ong. Metricforensics: A multi-level approach for mining volatile graphs. In Pr oceedings of the 16th ACM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), W ashington, DC , pages 163–172, 2010. Keith Henderson, Brian Gallagher , Lei Li, Leman Akoglu, Tina Eliassi-Rad, Hang- hang T ong, and Christos Faloutsos. It’ s who you know: graph mining using recur- siv e structural features. In Pr oceedings of the 17th A CM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), San Die go, CA , pages 663– 671. A CM, 2011. Keith Henderson, Brian Gallagher, Tina Eliassi-Rad, Hanghang T ong, Sugato Basu, Leman Akoglu, Danai Koutra, Christos Faloutsos, and Lei Li. RolX: structural role extraction & mining in large graphs. In Pr oceedings of the 18th A CM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), Beijing, China , pages 1231–1239, 2012. Tsuyoshi Id ´ e and Hisashi Kashima. Eigenspace-based anomaly detection in computer systems. 
In Pr oceedings of the tenth ACM SIGKDD international conference on Knowledge discovery and data mining , Proceedings of the 10th A CM International Conference on Knowledge Discov ery and Data Mining (SIGKDD), Seattle, W A, pages 440–449. A CM, 2004. Marios Iliofotou, Prashanth Pappu, Michalis Faloutsos, Michael Mitzenmacher , Sumeet Singh, and George V arghese. Network monitoring using traf fic disper- 58 Leman Akoglu et al. sion graphs. In Pr oceedings of the 7th ACM SIGCOMM Conference on Internet Measur ement , pages 24–26. ACM, 2007. Marios Iliofotou, Hyunchul Kim, Michalis Faloutsos, Michael Mitzenmacher , Prashanth Pappu, and George V arghese. Graption: A graph-based P2P traffic clas- sification framework for the internet backbone. Computer Networks , 55(8):1909– 1920, 2011. Luca In vernizzi and Paolo Milani Comparetti. Evilseed: A guided approach to finding malicious web pages. In IEEE Symposium on Security and Privacy , pages 428– 442, 2012. Keisuk e Ishibashi, Tsuyoshi K ondoh, Shigeaki Harada, T atsuya Mori, Ryoichi Ka wa- hara, and Shoichiro Asano. Detecting anomalous traf fic using communication graphs. In T elecommunications: The Infrastructur e for the 21st Century (WTC), 2010 , pages 1–6, 2010. Bernard J. Jansen. Click fraud. IEEE Computer , 40(7):85–86, 2008. Jeroen H. M. Janssens, Ildik ´ o Flesch, and Eric O. Postma. Outlier detection with one-class classifiers from ML and KDD. In Pr oceedings of the 8th International Confer ence on Machine Learning and Applications (ICMLA), Miami Beach, FL , pages 147–153. IEEE Computer Society , 2009. Glen Jeh and Jennifer W idom. SimRank: a measure of structural-context similarity. In Pr oceedings of the 8th ACM International Confer ence on Knowledge Disco very and Data Mining (SIGKDD), Edmonton, Alberta , pages 538–543, 2002. David Jensen, Jennifer Neville, and Brian Gallagher . Why collectiv e inference im- prov es relational classification. 
In Pr oceedings of the 10th ACM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), Seattle, W A , pages 593–598, 2004. Nitin Jindal and Bing Liu. Opinion spam and analysis. In Pr oceeding of the 1st A CM International Confer ence on W eb Sear ch and Data Mining (WSDM) , pages 219–230, 2008. Nitin Jindal, Bing Liu, and Ee-Peng Lim. Finding unusual revie w patterns using unexpected rules. In Pr oceedings of the 19th A CM Conference on Information and Knowledge Management (CIKM), T or onto, Canada , pages 1549–1552. A CM, 2010. Daniel Kahneman. Thinking, fast and slow . Farrar , Straus and Giroux, 2011. U Kang, Mary McGlohon, Leman Akoglu, and Christos Faloutsos. Patterns on the connected components of terabyte-scale graphs. In Pr oceedings of the 10th IEEE International Conference on Data Mining (ICDM), Sydney , Australia , pages 875– 880, 2010. U Kang, Duen Horng Chau, and Christos Faloutsos. Mining large graphs: Algo- rithms, inference, and discov eries. In Pr oceedings of the 27th International Con- fer ence on Data Engineering (ICDE), Hannover , Germany , pages 243–254. IEEE Computer Society , 2011a. U Kang, Spiros Papadimitriou, Jimeng Sun, and Hanghang T ong. Centralities in large networks: Algorithms and observ ations. In Pr oceedings of the 11th SIAM Interna- tional Confer ence on Data Mining (SDM), Mesa, AZ , pages 119–130, 2011b. U Kang, Charalampos E. Tsourakakis, Ana Paula Appel, Christos Faloutsos, and Jure Leskov ec. Hadi: Mining radii of large graphs. ACM T ransactions on Knowledge Graph-based Anomaly Detection and Description: A Survey 59 Discovery fr om Data , 5:8:1–8:24, February 2011c. ISSN 1556-4681. U Kang, Hanghang T ong, and Jimeng Sun. Fast random walk graph kernel. In Pr oceedings of the 12th SIAM International Confer ence on Data Mining (SDM), Anaheim, CA , 2012. U Kang, Jay-Y oon Lee, Danai K outra, and Christos Faloutsos. Net-Ray: V isualizing and mining web-scale graphs. 
In Proceedings of the 18th P acific-Asia Conference on Knowledge Discovery and Data Mining (P AKDD), T ainan, T aiwan , 2014. Kelly M. Kapsabelis, Peter J. Dickinson, and Kutluyil Dogancay . In vestigation of graph edit distance cost functions for detection of netw ork anomalies. In Pr oceed- ings of the 13th Biennial Computational T echniques and Applications Confer ence, CT A C-2006 , volume 48 of ANZIAM J ournal , pages C436–C449, October 2007. George Karypis and V ipin K umar . Metis - unstructured graph partitioning and sparse matrix ordering system, version 2.0. T echnical report, 1995. George Karypis and V ipin Kumar . Parallel multile vel k-way partitioning scheme for irregular graphs. In Pr oceedings of the 1996 A CM/IEEE conference on Super com- puting (CDR OM) , Supercomputing ’96. IEEE Computer Society , 1996. Hisashi Kashima, Koji Tsuda, and Akihiro Inokuchi. Marginalized kernels between labeled graphs. In Pr oceedings of the T wentieth International Conference on Ma- chine Learning , pages 321–328. AAAI Press, 2003. Leo Katz. A new status index deri ved from sociometric analysis. Psychometrika , 18 (1):39–43, March 1953. Fabian Keller , Emmanuel M ¨ uller , and Klemens B ¨ ohm. Hics: High contrast subspaces for density-based outlier ranking. In Pr oceedings of the 28th International Con- fer ence on Data Engineering (ICDE), W ashington, DC , pages 1037–1048, 2012. Alexander K. Kelmans. Comparison of graphs by their number of spanning trees. Discr ete Mathematics , 16(3):241 – 261, 1976. Jon M. Kleinberg. Authoritative sources in a hyperlinked environment. In Pr oceed- ings of the 5th Annual ACM-SIAM Symposium on Discrete Algorithms (SOD A), San F rancisco, CA , pages 668–677, 1998. Edwin M. Knorr and Raymond T . Ng. Algorithms for mining distance-based outliers in large datasets. In Pr oceedings of the 24th International Conference on V ery Lar ge Data Bases (VLDB), New Y ork City , NY , pages 392–403, 1998. Petri K ontkanen and Petri Myllymki. 
Mdl histogram density estimation. Journal of Machine Learning Resear ch - Pr oceedings T rack , 2:219–226, 2007. Y ehuda K oren, Stephen C. North, and Chris V olinsky . Measuring and extracting proximity in networks. In Pr oceedings of the 12th ACM International Confer ence on Knowledge Discovery and Data Mining (SIGKDD), Philadelphia, P A , pages 245–255, 2006. Danai K outra, T ai-Y ou K e, U Kang, Duen Horng Chau, Hsing-Kuo Kenneth Pao, and Christos Faloutsos. Unifying guilt-by-association approaches: Theorems and fast algorithms. In Pr oceedings of the Eur opean Confer ence on Machine Learning and Principles and Practice of Knowledge Discovery in Databases (ECML PKDD), Athens, Gr eece , pages 245–260, 2011. Danai K outra, Ev angelos Papale xakis, and Christos Faloutsos. T ensorsplat: Spotting latent anomalies in time. In 16th P anhellenic Conference on Informatics (PCI) , 2012. 60 Leman Akoglu et al. Danai K outra, Hanghang T ong, and David Lubensky . Big-Align: F ast bipartite graph alignment. In Pr oceedings of the 13th IEEE International Confer ence on Data Mining (ICDM), Dallas, T exas , 2013a. Danai K outra, Joshua V ogelstein, and Christos Faloutsos. Deltacon: A principled massiv e-graph similarity function. In Pr oceedings of the 13th SIAM International Confer ence on Data Mining (SDM), T exas-A ustin, TX , 2013b. Barbara Krausz and Rainer Herpers. MetroSurv: detecting ev ents in subway stations. Multimedia T ools and Applications , 50(1):123–147, 2010. Hans-Peter Kriegel, Peer Kr ¨ oger , Erich Schubert, and Arthur Zimek. Outlier detec- tion in arbitrarily oriented subspaces. In Pr oceedings of the 12th IEEE Interna- tional Confer ence on Data Mining (ICDM), Brussels, Belgium , pages 379–388, 2012. V ijay Krishnan and Rashmi Raj. W eb spam detection with anti-trust rank. 
In Pr o- ceedings of the 2nd International W orkshop on Adversarial IR on the W eb at the 29th International Confer ence on Researc h and Development in Information Re- trieval (SIGIR), Seattle, W A , pages 37–40, 2006. Nir Kshetri. The economics of click fraud. IEEE Security & Privacy , 8(3):45–53, 2010. Da Kuang, Haesun Park, and Chris H. Q. Ding. Symmetric nonnegati ve matrix fac- torization for graph clustering. In Proceedings of the 12th SIAM International Confer ence on Data Mining (SDM), Anaheim, CA , pages 106–117, 2012. Martin Kulldorf f. A spatial scan statistic. Communications in Statistics: Theory and Methods , 26:1481–96, 1997. Mohit Kumar , Rayid Ghani, and Zhu-Song Mei. Data mining to predict and pre vent errors in health insurance claims processing. In Pr oceedings of the 16th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), W ashington, DC , pages 65–74. A CM, 2010. Michihiro Kuramochi and George Karypis. Frequent subgraph discovery . In Pr o- ceedings of the 2001 IEEE International Conference on Data Mining , Proceedings of the 1st IEEE International Conference on Data Mining (ICDM), San Jose, CA, pages 313–320, W ashington, DC, USA, 2001. IEEE Computer Society . Aleksandar Lazare vic and V ipin Kumar . Feature bagging for outlier detection. In Pr oceedings of the 11th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), Chicago, IL , pages 157–166, 2005. Daniel D. Lee and H. Sebastian Seung. Algorithms for non-negativ e matrix fac- torization. In Pr oceedings of the 14th Annual Conference on Neural Information Pr ocessing Systems (NIPS), Den ver , CO , pages 556–562, 2000. Kyumin Lee, James Cav erlee, and Stev e W ebb . Uncovering Social Spammers: Social Honeypots + Machine Learning. In Pr oceedings of the 33r d International Con- fer ence on Researc h and Development in Information Retrieval (SIGIR), Geneva, Switzerland , pages 435–442, 2010. Matthijs Leeuwen and Arno Siebes. 
Streamkrimp: Detecting change in data streams. In Pr oceedings of the 2008 Eur opean Confer ence on Machine Learning and Knowledge Discovery in Databases - P art I , pages 672–687. Springer-V erlag, 2008. Graph-based Anomaly Detection and Description: A Survey 61 Jure Leskov ec, Jon Kleinberg, and Christos Faloutsos. Graphs over time: densifica- tion laws, shrinking diameters and possible explanations. In Pr oceedings of the eleventh A CM SIGKDD international confer ence on Knowledge discovery in data mining , Proceedings of the 11th A CM International Conference on Knowledge Discov ery and Data Mining (SIGKDD), Chicago, IL, pages 177–187. A CM, 2005. Jure Lesk ovec, Ke vin J. Lang, and Michael Mahoney . Empirical comparison of algo- rithms for netw ork community detection. In Pr oceedings of the 19th International Confer ence on W orld W ide W eb (WWW), Raleigh, NC , pages 631–640, Ne w Y ork, NY , USA, 2010. A CM. Geng Li, Murat Semerci, Bulent Y ener , and Mohammed J. Zaki. Graph classifica- tion via topological and label attributes. In Pr oceedings of the 9th International W orkshop on Mining and Learning with Graphs (MLG), San Diego, USA , Aug 2011a. Lei Li, Chieh-Jan Mike Liang, Jie Liu, Suman Nath, Andreas T erzis, and Christos Faloutsos. Thermocast: A cyber -physical forecasting model for data centers. In Pr oceedings of the 17th ACM International Conference on Knowledge Discovery and Data Mining (SIGKDD), San Die go, CA . A CM, 2011b. Zhongmou Li, Hui Xiong, Y anchi Liu, and Aoying Zhou. Detecting blackhole and volcano patterns in directed networks. In Pr oceedings of the 10th IEEE Inter- national Confer ence on Data Mining (ICDM), Sydney , Austr alia , pages 294–303. IEEE Computer Society , 2010. David Liben-Nowell and Jon M. Kleinberg. The link prediction problem for social networks. In Pr oceedings of the 12th ACM Confer ence on Information and Knowl- edge Manag ement (CIKM), New Orleans, LA , pages 556–559, 2003. 