Graphs in machine learning: an introduction
Graphs are commonly used to characterise interactions between objects of interest. Because they are based on a straightforward formalism, they are used in many scientific fields, from computer science to historical sciences. In this paper, we give an introduction to some methods relying on graphs for learning. This includes both unsupervised and supervised methods. Unsupervised learning algorithms usually aim at visualising graphs in latent spaces and/or clustering the nodes. Both focus on extracting knowledge from graph topologies. While most existing techniques are only applicable to static graphs, where edges do not evolve through time, recent developments have shown that they could be extended to deal with evolving networks. In a supervised context, one generally aims at inferring labels or numerical values attached to nodes using both the graph and, when available, node characteristics. Balancing the two sources of information can be challenging, especially as they can disagree locally or globally. In both contexts, supervised and unsupervised, data can be relational (augmented with one or several global graphs) as described above, or graph-valued. In the latter case, each object of interest is given as a full graph (possibly complemented by other characteristics). Natural tasks then include graph clustering (producing clusters of graphs rather than clusters of nodes in a single graph), graph classification, etc.

1 Real networks

One of the first practical studies on graphs can be dated back to the original work of Moreno [51] in the 1930s. Since then, there has been a growing interest in graph analysis, associated with strong developments in the modelling and the processing of these data. Graphs are now used in many scientific fields.
In biology [54, 2, 7], for instance, metabolic networks can describe pathways of biochemical reactions [41], while in the social sciences networks are used to represent relational ties between actors [66, 56, 36, 34]. Other examples include power grids [71] and the web [75]. Recently, networks have also been considered in other areas such as geography [22] and history [59, 39]. In machine learning, networks are seen as powerful tools to model problems in order to extract information from data and for prediction purposes. This is the object of this paper. For more complete surveys, we refer to [28, 62, 49, 45]. In this section, we introduce notations and highlight properties shared by most real networks. In Section 2, we then consider methods aiming at extracting information from a unique network. We particularly focus on clustering methods, where the goal is to find clusters of vertices. Finally, in Section 3, techniques that take a series of networks into account, where each network is
Research Summary
The paper provides a comprehensive introductory survey of graph‑based methods in machine learning, covering both unsupervised and supervised settings as well as single‑graph and multiple‑graph scenarios. It begins with a historical overview, noting that graph analysis dates back to Moreno’s sociograms in the 1930s and has since spread to biology (metabolic pathways), social sciences (relationship networks), infrastructure (power grids), the Web, geography, and history. Basic graph terminology is clarified: a graph G = (V, E) may be directed, weighted, or labeled, and is typically represented by an N × N adjacency matrix X. Real‑world networks share three salient properties: sparsity (edges grow linearly with vertices), the presence of a giant connected component, and heterogeneous degree distributions together with the small‑world effect (short average path lengths).
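As a small illustration of the notation above (the toy edge list is invented for this sketch, not taken from the paper), the following builds an N × N adjacency matrix X for an undirected, unweighted graph and reads off the vertex degrees and edge density that the sparsity property refers to:

```python
import numpy as np

# Hypothetical toy graph with N = 5 vertices, given as an edge list.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4)]
N = 5

# Build the N x N adjacency matrix X (undirected, unweighted).
X = np.zeros((N, N), dtype=int)
for i, j in edges:
    X[i, j] = 1
    X[j, i] = 1

# Vertex degrees are the row sums of X.
degrees = X.sum(axis=1)
print(degrees)  # here [2, 2, 3, 2, 1]

# Density: fraction of the N(N-1) possible (directed) entries present.
# Sparse real networks have density decreasing as N grows.
density = X.sum() / (N * (N - 1))
print(density)  # here 0.5
```

For a weighted graph the entries of X would hold edge weights instead of 0/1, and for a directed graph X would no longer be symmetric.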
Section 2 focuses on unsupervised learning from a single graph, primarily graph clustering and community detection. Two major families of approaches are examined. The first relies on modularity maximization: the Girvan–Newman modularity score Q = ∑ₖ(eₖₖ − aₖ²) quantifies the excess of intra‑community edges over a random null model. Greedy algorithms such as Louvain iteratively merge communities to reach a local optimum, while more sophisticated heuristics aim for better solutions despite the NP‑hard nature of the problem. Degree‑corrected modularity variants address the known resolution limit. The second family is the latent‑position cluster model (LPCM). Each vertex i is assigned a latent Euclidean coordinate Z_i; edge probabilities depend on distances in this space. By assuming that latent positions are drawn from a Gaussian mixture, the model simultaneously yields a low‑dimensional embedding (useful for visualization) and a clustering of vertices. Parameter inference can be performed via maximum likelihood, MCMC, or variational Bayesian methods.
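The modularity score Q = ∑ₖ(eₖₖ − aₖ²) can be computed directly from the adjacency matrix. Below is a minimal sketch (the `modularity` helper and the two-triangle toy graph are illustrative, not from the paper): e_kk is the fraction of edges inside community k and a_k the fraction of edge endpoints attached to k.

```python
import numpy as np

def modularity(X, labels):
    """Girvan-Newman modularity Q = sum_k (e_kk - a_k^2) for an
    undirected graph with adjacency matrix X and community labels."""
    m = X.sum() / 2.0                      # total number of edges
    q = 0.0
    for k in np.unique(labels):
        mask = labels == k
        # Fraction of edges with both ends in community k.
        e_kk = X[np.ix_(mask, mask)].sum() / (2.0 * m)
        # Fraction of edge endpoints attached to community k.
        a_k = X[mask, :].sum() / (2.0 * m)
        q += e_kk - a_k ** 2
    return q

# Two triangles joined by a single bridge edge: a clear two-community graph.
X = np.zeros((6, 6), dtype=int)
for i, j in [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]:
    X[i, j] = X[j, i] = 1
labels = np.array([0, 0, 0, 1, 1, 1])
print(modularity(X, labels))  # 6/7 - 1/2, about 0.357
```

Greedy methods such as Louvain start from singleton communities and repeatedly apply the merge (or vertex move) that most increases this score, stopping at a local optimum.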
The paper then moves beyond pure community structures to the stochastic block model (SBM) and its extensions. In SBM, each vertex belongs to one of K latent groups, and the probability of an edge between i and j is given by a K × K matrix Π that captures arbitrary inter‑group interaction patterns. Because the group assignments Z are not conditionally independent given the observed graph, the posterior over Z is intractable and exact EM cannot be applied; instead, variational EM, variational Bayes EM, and Gibbs sampling are employed. Model selection (choosing K) is tackled with variational approximations to BIC/AIC or non‑parametric Bayesian schemes such as the Dirichlet process. The authors discuss numerous SBM extensions: weighted edges, overlapping communities, covariates, and dynamic SBMs that incorporate temporal evolution via hidden Markov models, linear dynamical systems, or continuous‑time Poisson processes. These dynamic models allow vertices and/or edges to appear and disappear over time, capturing realistic network evolution.
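The generative side of the SBM is simple to sketch (the values of K, N, and Π below are invented for illustration): draw a latent group Z_i for each vertex, then draw each undirected edge i–j independently with probability Π[Z_i, Z_j].

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative SBM with K = 2 latent groups and N = 6 vertices.
K, N = 2, 6
Z = rng.integers(0, K, size=N)          # latent group of each vertex
Pi = np.array([[0.9, 0.1],              # K x K connection-probability matrix:
               [0.1, 0.8]])             # here assortative (dense within groups)

# Each undirected edge i-j is Bernoulli with parameter Pi[Z_i, Z_j].
X = np.zeros((N, N), dtype=int)
for i in range(N):
    for j in range(i + 1, N):
        X[i, j] = X[j, i] = rng.random() < Pi[Z[i], Z[j]]
print(X)
```

Inference runs this model in reverse: only X is observed, and Z, Π (and K) must be recovered, which is where the variational and sampling schemes mentioned above come in. Note that a Π with large off-diagonal entries would encode disassortative patterns that plain community detection cannot express.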
Section 3 addresses the case where each data point is itself a graph (multiple‑graph setting). Applications include molecular graphs in chemistry and protein contact graphs in biology. Two complementary strategies are presented. Specialized algorithms adapt neural architectures to graph data: recursive neural networks and graph neural networks (GNNs) process vertices sequentially or via message‑passing, learning representations directly from graph topology. The second strategy builds graph‑level distances or kernels (e.g., graph edit distance, Weisfeiler–Lehman kernel, graphlet kernels) and then plugs them into conventional machine‑learning pipelines such as SVMs or kernel PCA. The authors note the computational difficulty of exact graph comparison (graph isomorphism has no known polynomial‑time algorithm, and graph edit distance is NP‑hard) and the resulting need for approximate or heuristic similarity measures.
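To make the kernel strategy concrete, here is a simplified sketch of Weisfeiler–Lehman label refinement (the helper names are illustrative; a full WL kernel additionally compresses labels to integers and compares the resulting histograms across two graphs via a dot product):

```python
from collections import Counter

def wl_iteration(adj, labels):
    """One WL refinement step: each vertex's new label combines its old
    label with the sorted multiset of its neighbours' labels."""
    return [(labels[v], tuple(sorted(labels[u] for u in adj[v])))
            for v in range(len(adj))]

def wl_histogram(adj, labels, h=2):
    """Counts of all labels seen over h WL iterations; two graphs can be
    compared by a kernel (e.g. dot product) on these histograms."""
    counts = Counter(labels)
    for _ in range(h):
        labels = wl_iteration(adj, labels)
        counts.update(labels)
    return counts

# A path graph 0-1-2 with uniform initial labels.
adj = {0: [1], 1: [0, 2], 2: [1]}
feats = wl_histogram(adj, ['a', 'a', 'a'], h=1)
print(feats)
```

After one iteration the two endpoints share a refined label while the middle vertex gets a distinct one, so the histogram already distinguishes a path from, say, a triangle; this cheap refinement is why WL kernels scale far better than exact isomorphism checks.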
In the concluding remarks, the paper highlights current challenges: scalability to massive, evolving networks; integration of heterogeneous information (node attributes, timestamps, multi‑relational edges); robust model selection; and interpretability of learned representations. Future research directions include deep representation learning for graphs, large‑scale dynamic network inference, and multimodal learning that fuses graphs with text, images, or other data modalities. Overall, the survey positions graph‑based machine learning as a versatile framework applicable across static and dynamic, single‑graph and multi‑graph domains, summarizing key algorithms, theoretical foundations, and practical considerations.