HGC-Herd: Efficient Heterogeneous Graph Condensation via Representative Node Herding
Heterogeneous graph neural networks (HGNNs) have demonstrated strong capability in modeling complex semantics across multi-type nodes and relations. However, their scalability to large-scale graphs remains challenging due to structural redundancy and high-dimensional node features. Existing graph condensation approaches, such as GCond, are primarily developed for homogeneous graphs and rely on gradient matching, resulting in considerable computational, memory, and optimization overhead. We propose HGC-Herd, a training-free condensation framework that generates compact yet informative heterogeneous graphs while maintaining both semantic and structural fidelity. HGC-Herd integrates lightweight feature propagation to encode multi-hop relational context and employs a class-wise herding mechanism to identify representative nodes per class, producing balanced and discriminative subsets for downstream learning tasks. Extensive experiments on ACM, DBLP, and Freebase validate that HGC-Herd attains comparable or superior accuracy to full-graph training while markedly reducing both runtime and memory consumption. These results underscore its practical value for efficient and scalable heterogeneous graph representation learning.
💡 Research Summary
This paper introduces HGC-Herd, a novel and efficient framework for condensing heterogeneous graphs to address the scalability challenges of Heterogeneous Graph Neural Networks (HGNNs). HGNNs excel at modeling complex relationships in graphs with multiple node and edge types but suffer from high computational and memory overhead on large-scale graphs. Existing graph condensation methods, primarily designed for homogeneous graphs, rely on computationally expensive gradient-matching optimization.
HGC-Herd proposes a training-free alternative that generates a compact, informative synthetic graph while preserving semantic and structural fidelity. The framework operates in three key stages. First, Feature Propagation performs a one-time, lightweight aggregation of node features along pre-defined metapaths. This step enriches node representations with multi-hop relational context without the need for repeated aggregation during HGNN training. Second, Class-wise Prototype Construction computes the centroid (prototype) in the feature space for each class of the target node type. Third, and most crucially, Strategic Herding Selection is employed. For each class, nodes are greedily selected one by one such that the mean feature of the currently selected set best approximates the class prototype. This deterministic herding mechanism creates balanced and representative subsets (“herds”) for each class, effectively capturing the original data distribution.
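The prototype-construction and herding steps above can be sketched in a few lines of NumPy. This is an illustrative reconstruction based only on the description in this summary, not the authors' code: `herding_select` greedily adds the node whose inclusion brings the running mean of the selected set closest to the class prototype, and the one-line propagation helper stands in for the metapath aggregation step (the metapath adjacency `adj_meta` and all variable names are hypothetical).

```python
import numpy as np

def propagate(adj_meta, features):
    """One lightweight propagation step along a (hypothetical) metapath
    adjacency matrix: row-normalized neighbor averaging."""
    deg = adj_meta.sum(axis=1, keepdims=True)
    deg[deg == 0] = 1.0  # avoid division by zero for isolated nodes
    return (adj_meta / deg) @ features

def herding_select(features, budget):
    """Greedy herding within one class.

    features: (n, d) array of propagated features for nodes of one class.
    budget:   number of representative nodes to keep.
    Returns the indices of selected nodes, chosen one by one so that the
    mean feature of the selected set best approximates the class prototype.
    """
    prototype = features.mean(axis=0)          # class-wise prototype (centroid)
    selected = []
    running_sum = np.zeros_like(prototype)
    candidates = set(range(len(features)))
    for k in range(1, budget + 1):
        # Pick the candidate minimizing || (sum + x) / k - prototype ||.
        best_idx, best_dist = None, np.inf
        for i in candidates:
            dist = np.linalg.norm((running_sum + features[i]) / k - prototype)
            if dist < best_dist:
                best_idx, best_dist = i, dist
        selected.append(best_idx)
        running_sum += features[best_idx]
        candidates.remove(best_idx)
    return selected
```

Because the selection is deterministic and involves no gradients, condensing a class of n nodes costs only O(n · budget) distance evaluations, which is the source of the efficiency gains discussed below.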
The method is extensively evaluated on node classification across three heterogeneous graph benchmarks: ACM, DBLP, and Freebase. Results across various condensation ratios (from 1.2% to 9.6% of the original nodes) demonstrate that HGC-Herd significantly outperforms strong baselines, including random sampling, K-Center selection, graph coarsening, and the gradient-based GCond method. Remarkably, with only 1.2% of the data, HGC-Herd achieves accuracy close to training on the full graph (e.g., 91.88% vs. 93.11% on ACM). Furthermore, the training-free design leads to substantial efficiency gains: HGC-Herd reduces graph condensation time by 4-6 times compared to gradient-based methods and significantly lowers the overall runtime and memory consumption for end-to-end HGNN training on the condensed graph. Ablation studies confirm the necessity of both the feature propagation and the herding selection components.
In conclusion, HGC-Herd presents a practical, scalable, and effective solution for heterogeneous graph condensation. By eliminating costly bi-level optimization and leveraging prototype-guided herding, it enables efficient HGNN training with minimal performance loss, offering great potential for deploying sophisticated graph models in resource-constrained environments.