GDGB: A Benchmark for Generative Dynamic Text-Attributed Graph Learning


Dynamic Text-Attributed Graphs (DyTAGs), which intricately integrate structural, temporal, and textual attributes, are crucial for modeling complex real-world systems. However, most existing DyTAG datasets exhibit poor textual quality, which severely limits their utility for generative DyTAG tasks requiring semantically rich inputs. Additionally, prior work mainly focuses on discriminative tasks on DyTAGs, resulting in a lack of standardized task formulations and evaluation protocols tailored for DyTAG generation. To address these critical issues, we propose Generative DyTAG Benchmark (GDGB), which comprises eight meticulously curated DyTAG datasets with high-quality textual features for both nodes and edges, overcoming limitations of prior datasets. Building on GDGB, we define two novel DyTAG generation tasks: Transductive Dynamic Graph Generation (TDGG) and Inductive Dynamic Graph Generation (IDGG). TDGG transductively generates a target DyTAG based on the given source and destination node sets, while the more challenging IDGG introduces new node generation to inductively model the dynamic expansion of real-world graph data. To enable holistic evaluation, we design multifaceted metrics that assess the structural, temporal, and textual quality of the generated DyTAGs. We further propose GAG-General, an LLM-based multi-agent generative framework tailored for reproducible and robust benchmarking of DyTAG generation. Experimental results demonstrate that GDGB enables rigorous evaluation of TDGG and IDGG, with key insights revealing the critical interplay of structural and textual features in DyTAG generation. These findings establish GDGB as a foundational resource for advancing generative DyTAG research and unlocking further practical applications in DyTAG generation. The dataset and source code are available at https://github.com/Lucas-PJ/GDGB-ALGO.


💡 Research Summary

The paper addresses a critical gap in the emerging field of generative dynamic text‑attributed graph (DyTAG) learning: the lack of high‑quality, text‑rich benchmarks and standardized evaluation protocols. Existing dynamic graph datasets either omit textual attributes or provide only trivial, low‑semantic texts (e.g., usernames, email addresses), which hampers the development of generative models that need rich semantic inputs. To remedy this, the authors introduce GDGB (Generative DyTAG Benchmark), a collection of eight carefully curated DyTAG datasets spanning e‑commerce (Sephora), social networks (WeiboTech, WeiboDaily), Wikipedia‑based interactions (WikiRevision, WikiLife), movie collaborations (IMDB), and celebrity biographies. Each dataset supplies both node and edge texts that are substantially longer, more coherent, and higher‑rated by language‑model‑based quality metrics than those in the prior DTGB benchmark.

Based on GDGB, the authors define two novel generative tasks. Transductive Dynamic Graph Generation (TDGG) requires a model to reconstruct the temporal evolution of edges and their associated texts given a fixed set of source and destination nodes; all nodes are known beforehand, so the challenge lies in preserving temporal consistency and textual fidelity. Inductive Dynamic Graph Generation (IDGG) extends TDGG by allowing the emergence of new nodes during graph evolution, thereby modeling realistic scenarios where new users, items, or entities continuously appear. IDGG demands simultaneous node creation, edge formation, and text generation, making it a more demanding and practically relevant task.
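The transductive/inductive distinction above boils down to whether the generated graph may touch nodes outside the seed sets. A minimal sketch of that check, using hypothetical `DyTAG`/`Event` containers invented here for illustration (the paper's actual data format may differ):

```python
from dataclasses import dataclass, field

@dataclass
class Event:
    """One timestamped, text-attributed edge in a DyTAG."""
    src: int
    dst: int
    time: float
    text: str

@dataclass
class DyTAG:
    node_texts: dict          # node id -> textual attribute
    events: list = field(default_factory=list)

def is_transductive(generated: DyTAG, known_nodes: set) -> bool:
    """TDGG output must only use nodes given in advance;
    IDGG output may introduce new node ids beyond known_nodes."""
    touched = {e.src for e in generated.events} | {e.dst for e in generated.events}
    return touched <= known_nodes
```

Under this framing, a generator that emits an edge involving a node id not in the provided source/destination sets has left the TDGG setting and is performing IDGG-style node creation.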

To evaluate generated DyTAGs holistically, the authors propose a multi‑dimensional metric suite. Structural quality is measured via degree distribution and spectral Maximum Mean Discrepancy (MMD) between generated and ground‑truth graphs. Textual quality is assessed using perplexity and an LLM‑based rating (e.g., GPT‑4 scoring). A graph‑embedding metric (e.g., Graph2Vec or Node2Vec distance) captures overall semantic similarity at the graph level. This combination ensures that a model cannot excel in one aspect while neglecting the others, addressing a common shortcoming of prior work.
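The MMD-based structural metrics compare empirical statistics (e.g., degree sequences) of the generated graph against the ground truth. A minimal sketch of the standard biased MMD estimator with an RBF kernel, applied to toy degree sequences (the paper's exact kernel and bandwidth choices are not specified here, so `sigma = 1.0` is an assumption):

```python
import numpy as np

def gaussian_mmd(x: np.ndarray, y: np.ndarray, sigma: float = 1.0) -> float:
    """Squared MMD between two 1-D samples under an RBF kernel.
    Zero iff the empirical kernel mean embeddings coincide."""
    def k(a, b):
        d = a[:, None] - b[None, :]
        return np.exp(-d ** 2 / (2 * sigma ** 2))
    return k(x, x).mean() + k(y, y).mean() - 2 * k(x, y).mean()

# Degree sequences of a generated and a ground-truth graph (toy numbers)
gen_degrees = np.array([1, 2, 2, 3, 4], dtype=float)
ref_degrees = np.array([1, 2, 3, 3, 4], dtype=float)
print(gaussian_mmd(gen_degrees, ref_degrees))
```

The same estimator applies unchanged to spectral features (eigenvalues of the graph Laplacian) in place of degrees; a lower value indicates closer structural agreement.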

The paper also introduces GAG‑General, an LLM‑based multi‑agent generative framework that generalizes the earlier GAG system (originally designed for bipartite social graphs). In GAG‑General, separate agents specialize in node creation, edge linking, and text composition, sharing a common memory of temporal context via prompts. This architecture enables scalable, step‑wise generation of large dynamic graphs while maintaining coherence across modalities.
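The agent division of labor described above can be sketched as a step-wise loop over shared memory. This is a schematic only: the function names are hypothetical, and the LLM calls the real system would make are stubbed with trivial placeholders.

```python
# Hypothetical sketch of the described multi-agent loop; in GAG-General each
# agent would be an LLM prompted with the shared temporal context, not a stub.
def node_agent(memory):
    """Node creation: introduce a new node with a profile text (IDGG)."""
    nid = len(memory["node_texts"])
    memory["node_texts"][nid] = f"profile of node {nid}"
    return nid

def edge_agent(memory, step):
    """Edge linking: pick a source/destination pair for this step."""
    n = max(len(memory["node_texts"]), 1)
    return step % n, (step + 1) % n

def text_agent(memory, src, dst, step):
    """Text composition: produce the edge's textual attribute."""
    return f"interaction between {src} and {dst} at t={step}"

def generate(steps: int) -> dict:
    memory = {"node_texts": {}, "events": []}   # shared temporal context
    for t in range(steps):
        node_agent(memory)
        s, d = edge_agent(memory, t)
        memory["events"].append((s, d, t, text_agent(memory, s, d, t)))
    return memory
```

The key design point mirrored here is that all three agents read and write one shared memory, which is what keeps node texts, edge structure, and edge texts mutually coherent as the graph grows step by step.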

Experimental results compare GAG‑General against two recent feature‑supportive baselines: VRDAG (node‑text generation via variational autoencoders) and DG‑Gen (edge‑text generation via conditional distributions). On GDGB, incorporating high‑quality texts leads to significant reductions in structural MMD (15‑30% lower) and improvements in textual perplexity (over 20% better) for all models. Conversely, on the older DTGB datasets, the same textual inputs sometimes degrade performance, underscoring the importance of text quality. Ablation studies further reveal that the synergy between structural and textual signals is crucial: models that ignore text perform markedly worse on both structural and semantic metrics.

A case study on the Sephora dataset illustrates the practical impact. In TDGG, the model accurately reproduces the sequence of product reviews and timestamps for a fixed set of users and items. In IDGG, it successfully generates new user nodes, assigns realistic profile texts, and creates plausible review edges with coherent textual content, mirroring real‑world growth patterns in e‑commerce platforms.

In summary, the contributions are fourfold: (1) GDGB, the first generative DyTAG benchmark with eight high‑quality, text‑rich datasets; (2) two well‑defined generative tasks (TDGG and IDGG) that capture both transductive and inductive dynamics; (3) a comprehensive evaluation framework that jointly measures structural, temporal, and textual fidelity; and (4) GAG‑General, a versatile LLM‑driven multi‑agent system that sets strong baselines for both tasks. By providing data, tasks, metrics, and a reference implementation, the work establishes a solid foundation for future research in generative dynamic graph learning, with potential applications in recommendation systems, social media analysis, citation network simulation, and beyond.

