LLMTM: Benchmarking and Optimizing LLMs for Temporal Motif Analysis in Dynamic Graphs
📝 Abstract
The widespread application of Large Language Models (LLMs) has motivated growing interest in their capacity for processing dynamic graphs. Temporal motifs, elementary units and important local properties of dynamic graphs that can directly reflect anomalies and unique phenomena, are essential for understanding their evolutionary dynamics and structural features. However, leveraging LLMs for temporal motif analysis on dynamic graphs remains relatively unexplored. In this paper, we systematically study LLM performance on temporal motif-related tasks. Specifically, we propose a comprehensive benchmark, LLMTM (Large Language Models in Temporal Motifs), which includes six tailored tasks across nine temporal motif types. We then conduct extensive experiments to analyze the impact of different prompting techniques and LLMs (nine models: openPangu-7B, the DeepSeek-R1-Distill-Qwen series, Qwen2.5-32B-Instruct, QwQ-32B, GPT-4o-mini, DeepSeek-R1, and o3) on performance. Informed by our benchmark findings, we develop a tool-augmented LLM agent that leverages precisely engineered prompts to solve these tasks with high accuracy. Nevertheless, the agent’s high accuracy incurs a substantial cost. To address this trade-off, we propose a simple yet effective structure-aware dispatcher that considers both the dynamic graph’s structural properties and the LLM’s cognitive load to intelligently dispatch queries between standard LLM prompting and the more powerful agent. Our experiments demonstrate that the structure-aware dispatcher maintains high accuracy while reducing cost.
📄 Content
The success of Large Language Models (LLMs) has motivated exploration into their capabilities on complex structured data, such as web data (Mao et al. 2024). A key frontier is the application of LLMs to dynamic graphs, aiming to capture the evolution patterns of temporal graphs. Recent works study LLMs’ spatial-temporal understanding abilities on dynamic graphs, highlighting the immense potential of LLMs as a new paradigm for dynamic graph analysis (Zhang et al. 2024; Huang et al. 2025). Temporal motifs, elementary units reflecting important local properties of dynamic graphs (Paranjape et al. 2017; Liu et al. 2021), are typically defined as a set of nodes that interact in a specific temporal sequence within a short period of time. Temporal motifs therefore play a critical role in revealing the functionality and characterizing the key features of dynamic graphs (Seshadhri et al. 2013; Jha et al. 2014; Pinar et al. 2016; Bressan et al. 2017; Jain and Seshadhri 2018). Mining temporal motifs is thus essential for numerous real-world applications, such as fraud detection (Zhang et al. 2025), friendship prediction (Qiu et al. 2023), vendor identification (Liu et al. 2025), and knowledge graph reasoning (Wang et al. 2024b, 2023, 2022; Liu 2025; Liu and Shu 2025; Liu, Wang, and Tong 2025), among others.
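To make the definition above concrete, here is a minimal sketch (not the paper’s code) of the standard check behind it: a candidate set of timestamped edges forms a temporal motif instance only if its edges occur in increasing time order and all fall within a time window of length delta. The function name and edge encoding are illustrative assumptions.

```python
def is_temporal_motif(edges, delta):
    """Check whether timestamped edges (u, v, t) form a temporal motif
    instance: edges must occur in strictly increasing time order, with
    all timestamps inside a window of length delta."""
    times = [t for _, _, t in edges]
    ordered = all(t1 < t2 for t1, t2 in zip(times, times[1:]))
    within_window = times[-1] - times[0] <= delta
    return ordered and within_window

# A temporal triangle whose edges arrive in order within delta = 10
triangle = [("a", "b", 1), ("b", "c", 4), ("c", "a", 8)]
print(is_temporal_motif(triangle, delta=10))  # True

# Same edges, but out of temporal order: not a valid instance
shuffled = [("a", "b", 4), ("b", "c", 1), ("c", "a", 8)]
print(is_temporal_motif(shuffled, delta=10))  # False
```

This ordered-within-a-window condition is what distinguishes a temporal motif from its static counterpart: the same three edges form a static triangle regardless of timestamps, but only one arrival order within delta counts as this temporal motif.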
Traditional temporal motif detection methods are typically designed for specific motifs and cannot handle diverse motifs in a unified manner (Cai et al. 2024). Deep learning-based approaches, which often require supervised training, perform poorly on this task (see experiments in Appendix C.4). However, the capability of LLMs to solve temporal motif-related problems in dynamic graphs remains underexplored. Different from existing benchmarks (Table 1), this paper starts by exploring the following research question: RQ1: Can Large Language Models Solve Temporal Motif Problems on Dynamic Graphs?
Addressing this question is non-trivial and presents three key challenges:
• How to design a benchmark that can rigorously assess an LLM’s understanding and reasoning on temporal motifs?
• How to generate dynamic graph datasets with a balanced distribution of positive and negative motif instances for fair evaluation?
• How to formulate a prompting scheme that can precisely instruct LLMs to understand and process the complex spatio-temporal characteristics of temporal motifs?
The Benchmark. To address these challenges, we introduce LLMTM, a comprehensive benchmark for evaluating LLMs on temporal motif problems (Table 1). Unlike prior work that only considers incremental graph changes (Zhang et al. 2024), our benchmark uses a quadruplet representation, (u, v, t, op), to fully capture both edge appearance (add) and disappearance (delete). It comprises six tailored tasks organized into two levels of increasing complexity: (1) single-temporal motif recognition, and (2) multi-temporal motif identification (Figure 1). Furthermore, by analyzing the relationship between temporal motif frequency, time window size, and dynamic graph scale, we determined random dynamic graph generation settings that ensure our datasets are balanced.
Table 1: A comparison of related benchmarks by their core research areas. Prior work has focused on LLMs for static graphs (LLMGP), general dynamic graph problems (LLM4DyG), or temporal motifs without considering LLMs (TNM). In contrast, our LLMTM benchmark is the first to specifically evaluate the capabilities of LLMs on temporal motifs, thereby filling a critical research gap.
To comprehensively assess the capabilities of LLMs, we designed a well-defined prompting scheme (Appendix A.1) and conducted extensive experiments, evaluating four prompting techniques (zero/one-shot prompting and zero/one-shot chain-of-thought prompting (Wei et al. 2023)) and nine LLMs: the closed-source o3, DeepSeek-R1, and GPT-4o-mini, and the open-source openPangu-7B (Shi et al. 2025), DeepSeek-R1-Distill-Qwen-7B/14B/32B (DeepSeek-AI et al. 2025), Qwen2.5-32B-Instruct (Yang et al. 2024), and QwQ-32B (Qwen et al. 2025).
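The quadruplet representation (u, v, t, op) can be illustrated with a short sketch (an assumption about the encoding, not the benchmark’s implementation): replaying the quadruplets in timestamp order reconstructs the edge set present at any point, so both edge appearance and disappearance are captured.

```python
def replay(events):
    """Replay a dynamic graph given as quadruplets (u, v, t, op),
    where op is 'add' or 'delete', returning the set of directed
    edges present after applying all events in timestamp order."""
    edges = set()
    for u, v, t, op in sorted(events, key=lambda e: e[2]):
        if op == "add":
            edges.add((u, v))
        elif op == "delete":
            edges.discard((u, v))
    return edges

events = [(1, 2, 0, "add"), (2, 3, 1, "add"), (1, 2, 5, "delete")]
print(replay(events))  # {(2, 3)}
```

An increments-only representation, as in prior work, corresponds to dropping the 'delete' branch; the fourth field is what lets the benchmark express motifs whose semantics depend on edges vanishing.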
Our extensive experiments reveal a key limitation of LLMs. We observe that LLMs perform poorly on complex tasks such as “Motif Detection” and all Level 2 multi-motif tasks, primarily due to excessive cognitive load, i.e., the long-context reasoning required for LLMs to extract the dynamic graph and temporal motifs from natural language (Observations 3 & 5). This suggests that the reasoning depth of current LLMs remains shallow and that they are likely to fail on problems requiring complex, multi-step reasoning.
Tool learning with LLMs has emerged as a promising paradigm for augmenting their capabilities to tackle highly complex problems (Qu et al. 2025). We further consider: RQ2: How can agentic capability help to solve temporal motif problems on dynamic graphs? A Tool-Augmented LLM Agent. Motivated by this, we design a tool-augmented LLM agent, which leverages
This content is AI-processed based on ArXiv data.