Bridging Academia and Industry: A Comprehensive Benchmark for Attributed Graph Clustering
Attributed Graph Clustering (AGC) is a fundamental unsupervised task that integrates structural topology and node attributes to uncover latent patterns in graph-structured data. Despite its significance in industrial applications such as fraud detection and user segmentation, a wide chasm persists between academic research and real-world deployment. Current evaluation protocols suffer from small-scale, high-homophily citation datasets, non-scalable full-batch training paradigms, and a reliance on supervised metrics that fail to reflect performance in label-scarce environments. To bridge these gaps, we present PyAGC, a comprehensive, production-ready benchmark and library designed to stress-test AGC methods across diverse scales and structural properties. We unify existing methodologies into a modular Encode-Cluster-Optimize framework and, for the first time, provide memory-efficient, mini-batch implementations for a wide array of state-of-the-art AGC algorithms. Our benchmark curates 12 diverse datasets, ranging from 2.7K to 111M nodes, specifically incorporating industrial graphs with complex tabular features and low homophily. Furthermore, we advocate for a holistic evaluation protocol that mandates unsupervised structural metrics and efficiency profiling alongside traditional supervised metrics. Battle-tested in high-stakes industrial workflows at Ant Group, this benchmark offers the community a robust, reproducible, and scalable platform to advance AGC research towards realistic deployment. The code and resources are publicly available via GitHub (https://github.com/Cloudy1225/PyAGC), PyPI (https://pypi.org/project/pyagc), and Documentation (https://pyagc.readthedocs.io).
💡 Research Summary
The paper addresses a critical gap between academic research on Attributed Graph Clustering (AGC) and its deployment in real‑world industrial settings. While AGC has become a cornerstone for unsupervised tasks such as fraud ring detection, user segmentation, and community discovery, existing studies are limited to small, high‑homophily citation graphs, rely on full‑batch training that does not scale, and evaluate models primarily with supervised metrics (ACC, NMI, ARI) that are unsuitable when ground‑truth labels are scarce.
To bridge this divide, the authors introduce PyAGC, a production‑grade benchmark and library that systematically stresses AGC methods across a wide spectrum of graph sizes, structural properties, and feature modalities. The contributions are fourfold:
- Diverse Data Atlas – Twelve datasets are curated, ranging from 2.7K to 111M nodes. In addition to classic academic benchmarks (Cora, CiteSeer, PubMed), the collection includes industrial graphs such as HM, Pokec, and WebTopic, which exhibit low homophily, heterogeneous tabular attributes, and massive scale. Detailed statistics (node count, edge density, attribute dimension, homophily scores) are provided to enable reproducible comparisons.
- Scalable Mini‑Batch Implementations – The paper refactors a broad set of state‑of‑the‑art AGC algorithms (spectral filters, GNN‑based encoders, prototype‑based clustering, differentiable pooling, contrastive learning, etc.) into memory‑efficient mini‑batch versions. By formalizing the loss over sampled sub‑graphs, the authors demonstrate that models previously limited to a few hundred thousand nodes can now be trained on graphs with over 100M nodes within two hours on a single 32 GB V100 GPU.
- Encode‑Cluster‑Optimize (ECO) Framework – All AGC methods are unified under a three‑module taxonomy: Encode (E) for representation learning (parametric GNNs or non‑parametric spectral filters), Cluster (C) for projection (differentiable prototypes/pooling or discrete post‑hoc algorithms), and Optimize (O) for training strategy (decoupled pre‑training vs. joint end‑to‑end optimization). This modular view simplifies the design of new algorithms and ensures fair, interchangeable benchmarking.
- Holistic Evaluation Protocol – Beyond traditional supervised scores, the benchmark mandates reporting of unsupervised structural quality metrics such as Modularity and Conductance, as well as comprehensive efficiency profiling (training time, inference latency, peak memory). This protocol reflects the practical needs of industry, where labels are unavailable and operational constraints dominate.
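The label-free structural metrics named above are standard graph quantities and can be computed directly from a hard partition. The following is a minimal illustrative sketch (not the PyAGC implementation) for an undirected graph given as an edge list, using Newman modularity and mean per-cluster conductance:

```python
from collections import defaultdict

def modularity(edges, labels):
    """Newman modularity: sum_c [ e_c/m - (d_c / 2m)^2 ] over clusters c,
    where e_c = intra-cluster edges and d_c = total degree of cluster c."""
    m = len(edges)
    deg = defaultdict(int)    # node degree
    intra = defaultdict(int)  # edges with both endpoints in the same cluster
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
        if labels[u] == labels[v]:
            intra[labels[u]] += 1
    vol = defaultdict(int)    # summed degree per cluster
    for node, d in deg.items():
        vol[labels[node]] += d
    return sum(intra[c] / m - (vol[c] / (2 * m)) ** 2 for c in vol)

def mean_conductance(edges, labels):
    """Average over clusters of cut(S, S̄) / min(vol(S), vol(S̄));
    lower is better (fewer boundary edges relative to cluster volume)."""
    m = len(edges)
    cut = defaultdict(int)
    vol = defaultdict(int)
    for u, v in edges:
        vol[labels[u]] += 1
        vol[labels[v]] += 1
        if labels[u] != labels[v]:
            cut[labels[u]] += 1
            cut[labels[v]] += 1
    clusters = set(labels.values())
    return sum(cut[c] / min(vol[c], 2 * m - vol[c]) for c in clusters) / len(clusters)
```

On two triangles joined by a single bridge edge, splitting along the bridge yields modularity 5/14 ≈ 0.357 and mean conductance 1/7 ≈ 0.143, matching the intuition that the bridge is the only poorly-placed edge.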
The PyAGC library is released on GitHub, PyPI, and ReadTheDocs, featuring a clean API, interchangeable components, and extensive documentation. It has been battle‑tested in Ant Group’s high‑stakes workflows for fraud detection, anti‑money‑laundering, and user profiling, confirming that the benchmark addresses genuine scalability and robustness challenges.
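To make the "interchangeable components" idea concrete, here is a hedged sketch of what an Encode-Cluster pipeline with swappable modules might look like. This is an illustration of the ECO decomposition only; the class names and signatures below are invented for this sketch and are not PyAGC's actual API:

```python
from abc import ABC, abstractmethod

class Encoder(ABC):
    """'E' module: maps (graph, features) to node embeddings."""
    @abstractmethod
    def encode(self, graph, features): ...

class Clusterer(ABC):
    """'C' module: maps embeddings to k discrete cluster labels."""
    @abstractmethod
    def assign(self, embeddings, k): ...

class AGCPipeline:
    """Decoupled 'O' strategy: encode once, then cluster post hoc."""
    def __init__(self, encoder, clusterer):
        self.encoder = encoder
        self.clusterer = clusterer

    def fit_predict(self, graph, features, k):
        z = self.encoder.encode(graph, features)
        return self.clusterer.assign(z, k)

def dist2(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

class NeighborMeanEncoder(Encoder):
    """Toy non-parametric 'E': one step of neighbor feature smoothing."""
    def encode(self, graph, features):
        out = []
        for node, feat in enumerate(features):
            vals = [features[n] for n in graph.get(node, [])] + [feat]
            out.append([sum(col) / len(vals) for col in zip(*vals)])
        return out

class LloydClusterer(Clusterer):
    """Toy discrete 'C': a few Lloyd (k-means) steps, naive seeding."""
    def assign(self, embeddings, k):
        step = max(1, len(embeddings) // k)
        centers = [embeddings[i * step] for i in range(k)]  # spread-out seeds
        labels = [0] * len(embeddings)
        for _ in range(10):
            labels = [min(range(k), key=lambda c: dist2(z, centers[c]))
                      for z in embeddings]
            for c in range(k):
                members = [z for z, l in zip(embeddings, labels) if l == c]
                if members:
                    centers[c] = [sum(col) / len(members) for col in zip(*members)]
        return labels
```

The point of the decomposition is that swapping `NeighborMeanEncoder` for a GNN encoder, or `LloydClusterer` for a differentiable prototype head, changes one module without touching the rest of the pipeline.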
In summary, the paper delivers a robust, reproducible, and scalable platform that aligns academic AGC research with the realities of industrial deployment. By expanding datasets, enabling mini‑batch training, providing a unified methodological taxonomy, and insisting on unsupervised quality and efficiency metrics, PyAGC sets a new standard for future work in attributed graph clustering.
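The mini-batch training the summary highlights rests on computing losses over sampled sub-graphs rather than the full graph. As a generic illustration of that idea (uniform neighbor sampling; not PyAGC's actual sampler), each batch induces a small node set and edge set over which a clustering loss can be evaluated:

```python
import random

def sample_subgraph(adj, batch_nodes, fanout, num_hops=2, seed=None):
    """Sample up to `fanout` neighbors per node per hop, starting from
    `batch_nodes`. Returns the visited node set and sampled edges -- the
    unit over which a mini-batch clustering loss would be computed."""
    rng = random.Random(seed)
    nodes = set(batch_nodes)
    frontier = list(batch_nodes)
    edges = []
    for _ in range(num_hops):
        next_frontier = []
        for u in frontier:
            neigh = adj.get(u, [])
            picks = neigh if len(neigh) <= fanout else rng.sample(neigh, fanout)
            for v in picks:
                edges.append((u, v))
                if v not in nodes:
                    nodes.add(v)
                    next_frontier.append(v)
        frontier = next_frontier
    return nodes, edges
```

Because memory now scales with the batch size and fanout rather than the full graph, this style of sampling is what lets fixed-memory GPUs train on graphs far larger than device memory.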