Clustering attributed graphs: models, measures and methods
Clustering a graph, i.e., assigning its nodes to groups, is an important operation whose best-known application is the discovery of communities in social networks. Graph clustering and community detection have traditionally focused on graphs without attributes, with the notable exception of edge weights. However, these models provide only a partial representation of real social systems, which are therefore often described using node attributes, representing features of the actors, and edge attributes, representing different kinds of relationships among them. We refer to these models as attributed graphs. Consequently, existing graph clustering methods have recently been extended to deal with node and edge attributes. This article is a literature survey on this topic, organizing and presenting recent research results in a uniform way, characterizing the main existing clustering methods and highlighting their conceptual differences. We also cover the important topic of clustering evaluation and identify current open problems.
💡 Research Summary
The paper provides a comprehensive survey of clustering methods for attributed graphs, i.e., graphs whose nodes and/or edges carry additional information beyond simple connectivity. It begins by motivating the need for such models: traditional community detection focuses only on structure, whereas real‑world systems often involve node attributes (age, gender, interests) and multiple types of relationships (friendship, co‑work, family). The authors categorize the literature into two main families: edge‑attributed graph clustering and node‑attributed graph clustering, and within each family they further subdivide the approaches based on how the attribute information is incorporated.
Edge‑attributed graph clustering deals with multi‑layer or multiplex networks where each layer represents a distinct edge type. The survey outlines four principal strategies: (1) Flattening – collapsing all layers into a single weighted graph by counting the number of layers that connect each pair of nodes; this enables the reuse of classic modularity‑based algorithms but discards layer‑specific importance. (2) Modularity extensions – defining a modularity function for each layer and combining them (e.g., weighted sum, multi‑objective optimization) to preserve layer‑wise community structure; this approach is more expressive but introduces extra parameters and higher computational cost. (3) Clique‑finding – detecting fully connected subgraphs within individual layers and using their overlap to form multi‑layer clusters; it guarantees high homogeneity but suffers from scalability issues in sparse large networks. (4) Emerging clusters – handling dynamic addition or removal of layers and tracking how clusters evolve over time; useful for real‑time social media analysis yet reliant on heuristic change‑detection thresholds. The authors discuss the assumptions behind each method (e.g., shared community structure across layers) and highlight the trade‑offs between simplicity, fidelity, and scalability.
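To make the flattening strategy concrete, here is a minimal sketch in Python. It assumes a multiplex network stored as a dictionary from layer name to a list of undirected edges; the function name `flatten_layers` and the toy layers are hypothetical illustrations, not taken from the survey. The weight of each node pair is the number of layers in which the pair is connected, as described above.

```python
from collections import Counter


def flatten_layers(layers):
    """Collapse a multiplex network into a single weighted edge map.

    `layers` maps a layer name to an iterable of undirected edges (u, v).
    The weight of a pair is the number of layers that connect it.
    """
    weights = Counter()
    for edges in layers.values():
        # Sort the endpoints so (u, v) and (v, u) are the same pair, and
        # deduplicate so a pair contributes at most 1 per layer.
        unique_pairs = {tuple(sorted(edge)) for edge in edges}
        weights.update(unique_pairs)
    return dict(weights)


# Hypothetical toy network with three relationship types.
layers = {
    "friendship": [("ann", "bob"), ("bob", "cat")],
    "co_work": [("bob", "ann"), ("cat", "dan")],
    "family": [("ann", "bob")],
}
flat = flatten_layers(layers)
```

The resulting weighted edge map could be passed to any algorithm that accepts edge weights. Note that the layer names are dropped in the output, which is exactly the loss of layer-specific information mentioned above.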
Node‑attributed graph clustering focuses on graphs where each vertex carries a feature vector. The survey groups the techniques into seven categories: (1) Data representation – concatenating adjacency and attribute matrices, or using tensor formulations, often preceded by normalization and dimensionality reduction. (2) Weight modification – adjusting edge weights according to attribute similarity (e.g., increasing weight for homophilic pairs). (3) Linear combination – forming a composite similarity measure that linearly blends structural distance and attribute distance, with a tunable balance parameter. (4) Walk‑based approaches – biasing random walks or label‑propagation processes toward attribute‑similar nodes, thereby influencing the stationary distribution used for clustering. (5) Statistical inference – employing probabilistic generative models (e.g., stochastic block models with covariates, mixed‑membership models) and estimating parameters via EM or variational Bayes; these capture complex attribute‑structure dependencies but are sensitive to model specification. (6) Subspace methods – identifying low‑dimensional subspaces of the attribute space that are most informative for community structure, then clustering within those subspaces; this is effective when only a subset of attributes drives community formation. (7) Other – recent deep‑learning embeddings (GCN, GraphSAGE) and hypergraph extensions are briefly mentioned.
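The linear combination strategy can be sketched as follows. The two component distances are illustrative assumptions, not choices prescribed by the survey: structural distance is taken as one minus the Jaccard similarity of closed neighbourhoods, and attribute distance as the fraction of categorical attributes on which two nodes differ. The balance parameter is named `alpha` here, and all function names and data are hypothetical.

```python
def structural_distance(adj, u, v):
    """1 - Jaccard similarity of closed neighbourhoods (illustrative choice)."""
    nu = adj[u] | {u}
    nv = adj[v] | {v}
    return 1.0 - len(nu & nv) / len(nu | nv)


def attribute_distance(attrs, u, v):
    """Fraction of categorical attributes on which u and v differ."""
    a, b = attrs[u], attrs[v]
    return sum(x != y for x, y in zip(a, b)) / len(a)


def combined_distance(adj, attrs, u, v, alpha=0.5):
    """Blend both distances; alpha=1 uses structure only, alpha=0 attributes only."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    return (alpha * structural_distance(adj, u, v)
            + (1.0 - alpha) * attribute_distance(attrs, u, v))


# Hypothetical toy graph: adjacency sets and two categorical attributes per node.
adj = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
attrs = {
    "a": ("f", "young"),
    "b": ("f", "old"),
    "c": ("m", "old"),
    "d": ("f", "young"),
}
d_ab = combined_distance(adj, attrs, "a", "b")  # same neighbourhood, one attribute differs
d_ad = combined_distance(adj, attrs, "a", "d")  # identical attributes, different neighbourhoods
```

A pairwise distance of this kind can then be fed to a standard distance-based clustering method. Choosing `alpha` remains the main difficulty, since the two component distances are not necessarily on comparable scales.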
The paper devotes a substantial section to practical aspects, especially evaluation. It distinguishes between quantitative metrics (NMI, ARI, precision/recall, modularity, attribute homogeneity scores) and qualitative assessments (visual inspection, domain‑expert validation, case‑study relevance). The authors stress that for attributed graphs a single metric rarely suffices; a composite evaluation that accounts for both structural cohesion and attribute consistency is essential. They also discuss applicability concerns such as data scale, attribute type (categorical vs. numeric), inter‑layer correlation, and algorithmic complexity, offering guidance on selecting methods based on these factors.
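As an illustration of one of the quantitative metrics listed above, the following sketch computes normalized mutual information (NMI) between a clustering and a reference labelling. It assumes the arithmetic-mean normalization, NMI = 2·I(A;B) / (H(A) + H(B)); other normalizations exist, and the function name `nmi` is a hypothetical helper rather than code from the survey.

```python
import math
from collections import Counter


def nmi(labels_a, labels_b):
    """Normalized mutual information with arithmetic-mean normalization."""
    n = len(labels_a)
    if n == 0 or n != len(labels_b):
        raise ValueError("label sequences must be non-empty and of equal length")

    count_a = Counter(labels_a)
    count_b = Counter(labels_b)
    joint = Counter(zip(labels_a, labels_b))

    def entropy(counts):
        return -sum((c / n) * math.log(c / n) for c in counts.values())

    h_a, h_b = entropy(count_a), entropy(count_b)
    # Mutual information from the joint and marginal label counts.
    mi = sum(
        (c / n) * math.log(c * n / (count_a[a] * count_b[b]))
        for (a, b), c in joint.items()
    )
    if h_a == 0.0 and h_b == 0.0:
        # Both labellings put every node in one cluster: treat as identical.
        return 1.0
    return 2.0 * mi / (h_a + h_b)


same_up_to_renaming = nmi([0, 0, 1, 1], [1, 1, 0, 0])
independent = nmi([0, 0, 1, 1], [0, 1, 0, 1])
```

NMI is invariant to cluster relabelling, so the first call scores a perfect match even though the label names differ. It measures agreement with a reference partition only, so on attributed graphs it would still need to be paired with a measure of attribute consistency.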
Finally, the survey identifies several open research problems: (i) developing models that capture non‑linear interactions among multiple attributes; (ii) extending clustering to dynamic, time‑evolving attributed graphs; (iii) designing scalable algorithms capable of handling millions of nodes and thousands of layers; (iv) establishing standardized benchmark datasets and unified evaluation frameworks that jointly consider structure and attributes; and (v) creating domain‑specific solutions for fields like healthcare, finance, and online social media where attribute semantics differ markedly.
Overall, the article serves as a valuable reference for researchers entering the field of attributed graph clustering, summarizing the state‑of‑the‑art, clarifying methodological differences, and outlining a roadmap for future advances.