TCM-DiffRAG: Personalized Syndrome Differentiation Reasoning Method for Traditional Chinese Medicine based on Knowledge Graph and Chain of Thought
Background: Retrieval-augmented generation (RAG) can empower large language models (LLMs) to generate more accurate, professional, and timely responses without fine-tuning. However, because traditional Chinese medicine (TCM) clinical diagnosis and treatment involve complex reasoning processes and substantial individual differences, conventional RAG methods often perform poorly in this domain. Objective: To address the limitations of conventional RAG approaches in TCM applications, this study aims to develop an improved RAG framework tailored to the characteristics of TCM reasoning. Methods: We developed TCM-DiffRAG, an innovative RAG framework that integrates knowledge graphs (KGs) with chain-of-thought (CoT) reasoning. TCM-DiffRAG was evaluated on three distinctive TCM test datasets. Results: The experimental results demonstrated that TCM-DiffRAG achieved significant performance improvements over native LLMs. For example, the qwen-plus model scored 0.927, 0.361, and 0.038 on the three datasets, which improved to 0.952, 0.788, and 0.356 with TCM-DiffRAG. The improvements were even more pronounced for non-Chinese LLMs. Additionally, TCM-DiffRAG outperformed directly supervised fine-tuned (SFT) LLMs and other benchmark RAG methods. Conclusions: TCM-DiffRAG shows that integrating structured TCM knowledge graphs with chain-of-thought reasoning substantially improves performance on individualized diagnostic tasks. The joint use of universal and personalized knowledge graphs enables effective alignment between general knowledge and clinical reasoning. These results highlight the potential of reasoning-aware RAG frameworks for advancing LLM applications in traditional Chinese medicine.
💡 Research Summary
The paper introduces TCM‑DiffRAG, a novel Retrieval‑Augmented Generation (RAG) framework specifically designed for the complex reasoning required in Traditional Chinese Medicine (TCM) diagnosis and treatment. Conventional RAG approaches, which typically perform a single round of document retrieval followed by generation, struggle in TCM because the discipline relies heavily on multi‑step logical inference, syndrome differentiation, and substantial inter‑school variability. To overcome these challenges, the authors combine two complementary techniques: a dual‑level knowledge graph (KG) and Chain‑of‑Thought (CoT) reasoning.
Knowledge Graph Construction
A “macro‑micro” KG is built from 580 classic TCM textbooks, case collections, and related literature. At the macro level, a document layout model extracts hierarchical headings (book → chapter → section) and constructs a tree‑like structure that serves as a semantic index. At the micro level, a large language model (LLM) extracts entities and relations from paragraph texts, producing subject‑predicate‑object triples. The macro headings act as hubs linking to the micro‑level triples, creating many‑to‑many mappings between documents and knowledge items. This approach sacrifices strict ontological rigor in favor of natural language indexing that aligns more closely with clinical text.
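The macro-micro structure above can be sketched as a small data model: macro nodes form the book → chapter → section tree that serves as a semantic index, while micro-level triples hang off the headings under which they were extracted. This is an illustrative sketch, not the authors' implementation; the node class, toy titles, and triple are hypothetical.

```python
# Illustrative sketch of a dual-level "macro-micro" KG: macro nodes are
# hierarchical headings; micro nodes are (subject, predicate, object)
# triples attached to the heading they were extracted under.

class MacroNode:
    """A heading in the book -> chapter -> section hierarchy."""
    def __init__(self, title, parent=None):
        self.title = title
        self.parent = parent
        self.children = []
        self.triples = []  # micro-level triples extracted under this heading
        if parent:
            parent.children.append(self)

    def path(self):
        """Return the heading path from the root, e.g. book > chapter > section."""
        node, parts = self, []
        while node:
            parts.append(node.title)
            node = node.parent
        return " > ".join(reversed(parts))

# Build a toy hierarchy (hypothetical titles)
book = MacroNode("Treatise on Cold Damage")
chapter = MacroNode("Taiyang Disease", parent=book)
section = MacroNode("Guizhi Decoction Patterns", parent=chapter)

# Micro level: triples act as leaves; the same triple may be linked from
# several headings, giving the many-to-many document-to-knowledge mapping.
section.triples.append(("Guizhi Decoction", "treats", "exterior deficiency syndrome"))

print(section.path())
print(section.triples[0])
```

Because the index is plain natural-language headings rather than a strict ontology, retrieval can match clinical text directly against heading paths and triple components.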
Personalized KG Transfer
Because different TCM schools and individual practitioners have distinct diagnostic and therapeutic preferences, a personalized KG is derived from the general KG using real clinical cases. The process begins by feeding question‑answer pairs from various schools into the Qwen2.5‑72B‑instruct model, prompting it to generate multi‑hop CoT reasoning chains. These chains are decomposed into structured triples (G_query). Vector embeddings (Alibaba Cloud text‑embedding‑v3) and cosine similarity retrieve the top‑k matching triples from the general KG (G_recall). The corresponding textbook passages (D_tuple) and the most relevant document snippets for the original question (D_snippets) are then collected.
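The retrieval step described above can be sketched in a few lines: given an embedding for a G_query triple, cosine similarity ranks the general-KG triples and the top-k become G_recall. The paper uses Alibaba Cloud text-embedding-v3 vectors; the 3-dimensional toy vectors and TCM triples below are stand-ins for illustration only.

```python
# Minimal sketch of top-k triple retrieval by cosine similarity, assuming
# embeddings are already computed (real vectors are high-dimensional;
# these 3-d toy vectors are hypothetical).
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query_vec, kg_items, k=2):
    """kg_items: list of (triple, embedding). Returns the k most similar triples."""
    ranked = sorted(kg_items, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [triple for triple, _ in ranked[:k]]

# Toy general-KG triples (G_recall candidates) with toy embeddings
general_kg = [
    (("wind-cold", "causes", "aversion to cold"), [0.9, 0.1, 0.0]),
    (("damp-heat", "causes", "jaundice"), [0.0, 0.8, 0.6]),
    (("wind-cold", "treated-by", "Mahuang Decoction"), [0.85, 0.2, 0.1]),
]

query = [0.88, 0.15, 0.05]  # embedding of one G_query triple
print(top_k(query, general_kg, k=2))
```

In the full pipeline, the triples returned here are then used to look up their source textbook passages (D_tuple) alongside snippet retrieval for the original question (D_snippets).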
CoT‑Driven Retrieval and Generation
The retrieved triples and texts are supplied as context to the same LLM, which generates a complete reasoning process from question to answer (C_i). New entities and relations extracted from this reasoning text are merged back into the KG, forming a personalized KG (G_personal) that captures school‑specific logic such as emphasis on certain syndromes or treatment formulas. This closed‑loop enables continual refinement of the KG as more cases are processed.
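The merge-back step of this closed loop amounts to a deduplicating union: triples extracted from the generated reasoning text C_i are added to the graph only if they are new, yielding G_personal. A minimal sketch, assuming triples are plain (subject, predicate, object) tuples and using hypothetical school-specific content:

```python
# Sketch of merging newly extracted triples into the personalized KG
# (G_personal): an order-preserving, deduplicating union.

def merge_into_personal(general_kg, new_triples):
    """Union general-KG triples with newly extracted ones, skipping duplicates."""
    personal = list(general_kg)
    seen = set(general_kg)
    for t in new_triples:
        if t not in seen:
            personal.append(t)
            seen.add(t)
    return personal

general = [("Guizhi Decoction", "treats", "exterior deficiency")]
extracted = [
    ("Guizhi Decoction", "treats", "exterior deficiency"),        # duplicate, dropped
    ("this school", "emphasizes", "harmonizing ying and wei"),    # school-specific logic, kept
]
print(merge_into_personal(general, extracted))
```

Running the merge after each processed case is what lets the personalized KG accumulate school-specific reasoning patterns over time.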
Experimental Evaluation
Three distinct TCM test sets—TCM‑MCQ, TCM‑SD, and Jingfang‑SD—are used to benchmark performance. The baseline Qwen‑plus model achieves scores of 0.927, 0.361, and 0.038 respectively. After applying TCM‑DiffRAG, scores rise dramatically to 0.952, 0.788, and 0.356. Improvements are even more pronounced for non‑Chinese LLMs, demonstrating the language‑agnostic benefit of the structured KG and CoT integration. TCM‑DiffRAG also outperforms directly supervised fine‑tuned (SFT) models and other RAG variants such as naive RAG, KG‑RAG, and reasoning‑based RAG.
Key Contributions
- Dual‑Level KG Construction – A systematic method for extracting hierarchical and relational knowledge from large corpora of TCM textbooks.
- Personalized KG Generation – Integration of real clinical cases to adapt the KG to specific schools or practitioners, addressing the “treat the same disease differently” phenomenon.
- CoT Decomposition Pipeline – Automatic conversion of multi‑hop reasoning chains into triples, enabling precise matching with KG elements and multi‑step retrieval.
- Benchmark Datasets – Creation of three evaluation sets that reflect different aspects of TCM reasoning, providing a standard for future research.
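The CoT decomposition pipeline listed above converts a multi-hop reasoning chain into structured triples for KG matching. One simple way this could work is to prompt the LLM to emit each reasoning hop as a delimited line and then parse those lines; the "subject | predicate | object" line format below is our assumption, not the paper's specification.

```python
# Hypothetical sketch of CoT decomposition: parse a reasoning chain whose
# hops are emitted as "subject | predicate | object" lines (assumed format)
# into structured triples ready for embedding and KG matching.

def decompose_cot(cot_text):
    triples = []
    for line in cot_text.strip().splitlines():
        parts = [p.strip() for p in line.split("|")]
        if len(parts) == 3 and all(parts):  # skip malformed or empty hops
            triples.append(tuple(parts))
    return triples

cot = """
patient | presents-with | aversion to wind and sweating
aversion to wind and sweating | indicates | exterior deficiency syndrome
exterior deficiency syndrome | treated-by | Guizhi Decoction
"""
print(decompose_cot(cot))
```

Each parsed triple then serves as one G_query element for the similarity retrieval step, so a three-hop chain triggers three separate KG lookups rather than one coarse query.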
Limitations and Future Work
The approach relies on the accuracy of LLM‑based entity extraction; errors can propagate through both KG layers. High‑quality, diverse clinical case data are required to build robust personalized KGs, which may be scarce. Current retrieval uses only cosine similarity, potentially overlooking richer relational patterns. Future directions include refining the ontology for stricter semantic consistency, incorporating multimodal data (e.g., pulse waveforms, tongue images), and applying reinforcement learning to dynamically optimize retrieval strategies.
In summary, TCM‑DiffRAG demonstrates that coupling structured domain knowledge with chain‑of‑thought reasoning substantially enhances LLM performance on individualized diagnostic tasks in Traditional Chinese Medicine, offering a promising blueprint for knowledge‑aware RAG systems in other complex, knowledge‑intensive domains.