CID-GraphRAG: Enhancing Multi-Turn Dialogue Systems through Dual-Pathway Retrieval of Conversation Flow and Context Semantics
We present CID-GraphRAG (Conversational Intent-Driven Graph Retrieval-Augmented Generation), a novel framework that addresses the limitations of existing dialogue systems in maintaining both contextual coherence and goal-oriented progression in multi-turn customer service conversations. Unlike traditional RAG systems that rely solely on semantic similarity or static knowledge graphs, CID-GraphRAG constructs intent transition graphs from goal-achieved historical dialogues and implements a dual-retrieval mechanism that balances intent-based graph traversal with semantic search. This approach enables the system to simultaneously leverage both conversational intent flow patterns and contextual semantics, significantly improving both retrieval and response quality. In extensive experiments on real-world customer service dialogues, we demonstrate that CID-GraphRAG significantly outperforms both semantic-based and intent-based baselines across automatic metrics, LLM-as-a-Judge evaluations, and human evaluations, with relative gains of 11.4% in BLEU, 4.9% in ROUGE, and 5.9% in METEOR. Most notably, CID-GraphRAG achieves a 57.9% improvement in response quality according to LLM-as-a-Judge evaluations. These results demonstrate that integrating intent transition structures with semantic retrieval creates a synergistic effect that neither approach achieves independently, establishing CID-GraphRAG as an effective framework for real-world multi-turn dialogue systems in customer service and other knowledge-intensive domains.
💡 Research Summary
CID‑GraphRAG (Conversational Intent‑Driven Graph Retrieval‑Augmented Generation) tackles a fundamental challenge in multi‑turn, task‑oriented dialogue: simultaneously preserving contextual coherence while steering the conversation toward a predefined goal. Existing Retrieval‑Augmented Generation (RAG) approaches either rely solely on semantic similarity, which captures the immediate meaning of the dialogue but ignores the underlying flow of intents, or they employ static knowledge graphs that enable structured reasoning but lack any representation of conversational dynamics. CID‑GraphRAG bridges this gap by constructing an intent transition graph from historically successful customer‑service interactions and by integrating this graph with a semantic similarity search in a dual‑pathway retrieval mechanism.
Methodology
The system operates in two phases. In the construction phase, a large corpus of goal‑achieved dialogues is processed by a large language model (Claude 3.7 Sonnet) to assign hierarchical intents: a primary intent that captures the broad function of a turn (e.g., “Service Scheduling”) and a secondary intent that refines the action (e.g., “Appointment Proposal”). These intents become nodes in a directed graph together with intent‑pair nodes (assistant‑user intent combinations) and conversation‑example nodes that anchor real dialogue snippets. Edges encode hierarchical, pairing, transition, and anchoring relations, thereby forming a network that reflects historically successful conversational trajectories.
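The construction phase described above can be sketched in plain Python. This is a minimal illustration, not the paper's implementation: the dialogue schema, field names (`primary`, `secondary`, `text`), and dictionary-based graph representation are assumptions made for clarity, standing in for the LLM-driven intent labeling and the full node/edge taxonomy.

```python
from collections import defaultdict

def build_intent_graph(dialogues):
    """Build a toy intent transition graph from goal-achieved dialogues.

    Each dialogue is a list of alternating assistant/user turns:
        {"role": "assistant"|"user", "primary": ..., "secondary": ..., "text": ...}
    Edge types mirror the paper's description: hierarchy (primary -> secondary
    intent), pairing of assistant and user intents into intent-pair nodes,
    transition (intent pair -> next intent pair, counted for frequency), and
    anchoring (intent pair -> real conversation example).
    """
    graph = {"nodes": set(), "edges": defaultdict(int), "examples": defaultdict(list)}
    for dialogue in dialogues:
        prev_pair = None
        # Walk the dialogue as (assistant turn, user turn) pairs.
        for a_turn, u_turn in zip(dialogue[::2], dialogue[1::2]):
            for turn in (a_turn, u_turn):
                graph["nodes"].add(("intent", turn["primary"]))
                graph["nodes"].add(("intent", turn["secondary"]))
                graph["edges"][("hierarchy", turn["primary"], turn["secondary"])] += 1
            pair = (a_turn["secondary"], u_turn["secondary"])  # intent-pair node
            graph["nodes"].add(("pair", pair))
            # Anchoring edge: attach the real dialogue snippet to its intent pair.
            graph["examples"][pair].append((a_turn["text"], u_turn["text"]))
            if prev_pair is not None:
                # Transition edge with a raw co-occurrence count.
                graph["edges"][("transition", prev_pair, pair)] += 1
            prev_pair = pair
    return graph
```

The transition counts accumulated here are what the retrieval stage later normalizes into the co-occurrence frequency used for intent-based scoring.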
During inference, the current user turn is again classified into primary and secondary intents. The intent-based pathway then pairs this intent with the preceding assistant intent and retrieves candidate next intents from the graph, scoring them by normalized co-occurrence frequency f′. Simultaneously, the semantic pathway embeds the current dialogue history and all historical examples linked to the candidate intents, computing the cosine similarity sim between the current and historical embeddings. A weighted sum S_i = α·f′ + (1−α)·sim produces a final relevance score for each candidate example. The top-k examples are fed to the LLM as few-shot prompts together with the identified user intent and explicit generation instructions, ensuring that the generated response is grounded both in the structural pattern of successful intent transitions and in the semantic context of the current conversation.
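The dual-pathway scoring step reduces to a short function. A hedged sketch follows: the candidate fields `count` (raw transition co-occurrence) and `vec` (precomputed embedding) are illustrative names, and a real system would obtain the embeddings from a learned encoder rather than toy vectors.

```python
import math

def cosine(u, v):
    """Cosine similarity between two dense vectors (0.0 for zero vectors)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def rank_candidates(query_vec, candidates, alpha=0.1, k=5):
    """Score candidates with S_i = alpha * f' + (1 - alpha) * sim.

    f' is the transition count normalized over all retrieved candidates;
    sim is the cosine similarity between the current dialogue embedding
    and each candidate example's embedding.
    """
    total = sum(c["count"] for c in candidates) or 1
    scored = []
    for c in candidates:
        f_norm = c["count"] / total           # intent pathway: normalized frequency
        sim = cosine(query_vec, c["vec"])     # semantic pathway: embedding similarity
        scored.append((alpha * f_norm + (1 - alpha) * sim, c))
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored[:k]                          # top-k examples for few-shot prompting
```

With the paper's best setting α = 0.1, the semantic term dominates, so a contextually close but structurally rare example can still outrank a frequent transition.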
Experimental Setup
The authors evaluate on a real‑world customer‑service dataset concerning vehicle‑sticker issues, comprising 268 dialogues (1,574 turns) with an average of 5.9 turns per dialogue. The data are split 80/10/10 for training, validation, and testing. Three evaluation modalities are employed: (1) automatic metrics (BLEU‑2/4, ROUGE‑1/2/L, METEOR, BERTScore), (2) LLM‑as‑a‑Judge using Claude 3.7 Sonnet to rate responses on relevance, helpfulness, style consistency, contextual appropriateness, and professionalism, and (3) human evaluation by five domain experts on the same five dimensions.
Baselines include (a) a direct LLM that generates without retrieval, (b) an intent‑only GraphRAG that retrieves examples purely by graph matching, and (c) a semantic‑only Conversation‑RAG that retrieves by embedding similarity. All systems use Claude 3.7 Sonnet as the underlying generator (temperature 0.0) with 5‑shot prompting for the retrieval‑augmented baselines.
Results
CID‑GraphRAG outperforms all baselines across the board. Automatic metrics show an 11.4 % gain in BLEU‑4, 4.9 % in ROUGE‑L, and 5.9 % in METEOR compared to the best baseline. In LLM‑as‑a‑Judge evaluations, CID‑GraphRAG achieves a 57.9 % higher win rate for response quality. Human judges also rate its outputs highest on average. Hyper‑parameter analysis reveals that a small intent weight (α = 0.1, i.e., 10 % intent, 90 % semantic) yields the best trade‑off; larger α values cause the system to over‑prioritize structural patterns at the expense of contextual relevance, confirming that intent information is a valuable but supplementary signal.
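The α trade-off can be illustrated with two hypothetical candidates scored by the paper's formula S = α·f′ + (1−α)·sim. The specific f′ and sim values below are invented for illustration only; they show how raising α shifts the winner from the contextually best example toward the structurally most frequent one.

```python
def score(alpha, f_norm, sim):
    """Combined relevance score S = alpha * f' + (1 - alpha) * sim."""
    return alpha * f_norm + (1 - alpha) * sim

# Hypothetical candidates (values chosen for illustration):
frequent = {"f": 0.9, "sim": 0.2}   # common transition, weak contextual fit
relevant = {"f": 0.1, "sim": 0.95}  # rare transition, strong contextual fit

for alpha in (0.1, 0.5, 0.9):
    s_f = score(alpha, frequent["f"], frequent["sim"])
    s_r = score(alpha, relevant["f"], relevant["sim"])
    winner = "frequent" if s_f > s_r else "relevant"
    print(f"alpha={alpha}: {winner} candidate wins ({s_f:.3f} vs {s_r:.3f})")
```

At α = 0.1 the contextually relevant example wins; by α = 0.5 the frequent-transition example already dominates, matching the paper's finding that intent frequency works best as a supplementary signal.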
Insights and Limitations
The study demonstrates that (i) intent transition graphs provide a structural scaffold that disambiguates utterances with similar lexical content but different conversational functions, and (ii) semantic embeddings retain the fine‑grained contextual fit needed for high‑quality responses. The weighted combination allows the system to reap the synergistic benefits of both signals. Limitations include reliance on LLM‑driven intent labeling, which may inherit model biases, and the fact that the graph is built only from goal‑achieved dialogues, potentially limiting coverage of rare or novel intent sequences. Future work could explore more robust, possibly supervised intent classifiers, continual graph updates, and scaling the approach to multiple domains.
Conclusion
CID‑GraphRAG introduces a novel dual‑pathway retrieval framework that unifies structural intent flow with semantic similarity, achieving state‑of‑the‑art performance in multi‑turn, goal‑oriented dialogue generation. The results validate the hypothesis that integrating intent transition structures with semantic retrieval yields a synergistic effect unattainable by either component alone, paving the way for more coherent, goal‑driven conversational AI systems.