GIP-RAG: An Evidence-Grounded Retrieval-Augmented Framework for Interpretable Gene Interaction and Pathway Impact Analysis

Fujian Jia 1, Jiwen Gu 1, Cheng Lu 2, Dezhi Zhao 1, Mengjiang Huang 3, Yuanzhi Lu 4, Xin Liu 1,§, Kang Liu 1,§

1 Kanghua Juntai Biotech Co. Ltd., Room 1504, Building 7, Dongwang Jingyuan, No. 768 Jingwang Road, Kunshan, Suzhou, Jiangsu Province, China
2 Department of Radiation Oncology, Cancer Treatment Center, The Second Affiliated Hospital of Hainan Medical University, Haikou, China
3 Department of Nutrition, University of California, Davis, USA
4 Department of Pathology, The First Affiliated Hospital of Jinan University, Tianhe Qu, Guangzhou, China
§ Correspondence: Xin Liu, xin_liu@kanghuajuntai.com; Kang Liu, kellen0101@live.com

Abstract

Understanding mechanistic relationships among genes and their impacts on biological pathways is fundamental for elucidating disease mechanisms and advancing precision medicine. Although numerous public databases provide extensive information on molecular interactions and signaling pathways, effectively integrating heterogeneous knowledge sources and performing interpretable multi-step reasoning across biological networks remains challenging. Here, we present GIP-RAG (Gene Interaction Prediction through Retrieval-Augmented Generation), a computational framework that combines biomedical knowledge graphs with the reasoning capabilities of large language models (LLMs) to infer and interpret gene interactions. The framework first constructs a unified gene interaction knowledge graph by integrating curated interaction data from multiple public resources, including KEGG, WikiPathways, SIGNOR, Pathway Commons, and PubChem. Given user-specified genes, a query-driven subgraph retrieval module dynamically extracts relevant evidence from the knowledge graph.
The retrieved subgraphs are incorporated into a structured prompting strategy that guides LLM-based stepwise reasoning to identify direct and indirect regulatory relationships and generate mechanistic explanations supported by biological evidence. Beyond pairwise interaction inference, the framework further introduces a pathway-level functional impact assessment module that simulates the propagation of gene perturbations through signaling networks and evaluates potential pathway state alterations. Evaluation across multiple biological scenarios demonstrates that the framework produces consistent, interpretable, and evidence-supported insights into gene regulatory relationships. Overall, GIP-RAG provides a general and interpretable paradigm for integrating biological knowledge graphs with retrieval-augmented large language models to facilitate mechanistic reasoning in complex molecular networks.

Introduction

In recent years, the systematic analysis of gene interaction networks has attracted substantial attention in biomedical research [1–3]. Such networks constitute a fundamental basis for understanding the regulatory mechanisms of complex biological systems, elucidating disease pathogenesis, and guiding precision medicine. Although public databases such as KEGG [4], WikiPathways [5], SIGNOR [6], Pathway Commons [7], and PubChem [8] provide abundant information on molecular interactions and biological pathways, efficiently integrating multi-source, heterogeneous data and performing mechanistic inference between genes while preserving interpretability remains a major challenge in computational biology. Existing pathway- or network-based methods for gene interaction inference typically rely on graph-theoretic analyses, path-search algorithms, or statistical association models, which can to some extent reveal direct or local regulatory relationships [9,10].
However, their flexibility and interpretability are limited when addressing multi-hop, cross-pathway, and context-dependent indirect regulatory relationships [11,12]. Meanwhile, with the rapid development of large language models (LLMs) and the reasoning capabilities they have demonstrated in natural language processing tasks, their potential applications in biomedical knowledge integration and complex relational inference have increasingly drawn attention [13-15]. Nevertheless, inference based solely on the generative capacity of LLMs often suffers from incomplete knowledge coverage and hallucination, and lacks stringent biological evidence constraints, which limits its reliability for high-confidence mechanistic inference [16].

To address these challenges, we propose a retrieval-augmented generation (RAG)-based gene interaction reasoning framework, termed GIP-RAG (Gene Interaction Prediction through RAG). This framework systematically integrates high-confidence biological knowledge from multiple public databases to construct a unified gene interaction knowledge graph. A query-driven subgraph retrieval strategy is then employed to explicitly introduce structured evidence into the LLM reasoning process. On this basis, carefully designed structured prompt engineering enables evidence-grounded, step-by-step reasoning and mechanistic interpretation. Furthermore, GIP-RAG is extended to pathway-level functional impact assessment, simulating the propagation of gene perturbations through signaling networks and characterizing potential pathway state alterations from a systems biology perspective.

In summary, the main contributions of this study are as follows:

1. Construction of a high-quality, multi-source gene interaction knowledge graph, together with subgraph retrieval and structured reasoning interfaces, providing a reliable evidence foundation for mechanistic gene–gene interaction inference;
2. Proposal of the GIP-RAG framework, which combines RAG techniques with LLMs to achieve evidence-based and interpretable gene interaction reasoning;
3. Development of a pathway-level functional impact assessment module that simulates signaling network reconfiguration under gene perturbation hypotheses, offering a practical tool for systems biology research.

Overall, this framework provides a general and interpretable technical paradigm for integrating biological knowledge graphs with the reasoning capabilities of large language models, and offers potential value for precision medicine, disease mechanism analysis, and novel therapeutic target discovery.

Methods

We propose a computational framework termed GIP-RAG (Gene Interaction Prediction through Retrieval-Augmented Generation), which integrates multi-source biomedical knowledge graphs with the advanced reasoning capabilities of large language models (LLMs) to infer and explain potential mechanistic interactions among user-specified genes. The core strength of GIP-RAG lies in its tight coupling of precise retrieval from structured biological knowledge with semantic and logical reasoning by generative models, enabling interpretable inference of direct or indirect gene–gene relationships supported by existing biological evidence.

1. Overall Framework Design

The GIP-RAG framework consists of four major stages:

• Multi-source data integration: Extraction and normalization of biological pathway and chemical interaction data from five public databases.
• Biomedical knowledge graph construction: Transformation of standardized interaction data into a unified knowledge graph repository.
• Query-driven subgraph retrieval: Dynamic retrieval of evidence subgraphs relevant to the queried genes.
• LLM-based structured reasoning and synthesis: Conversion of retrieved evidence into biologically coherent mechanistic explanations through prompt engineering.

2. Data Sources and Knowledge Integration

To comprehensively capture gene interaction relationships across multiple biological layers, we systematically collected pathway and interaction data from the following widely used, community-curated or expert-reviewed public databases:

• KEGG: Provides manually curated molecular interaction and biochemical reaction networks;
• WikiPathways: A community-maintained resource covering diverse biological pathways;
• SIGNOR: A signaling database emphasizing causal and directional regulatory relationships;
• Pathway Commons: An integrated collection of pathway and molecular interaction data aggregated from multiple authoritative sources;
• PubChem: Contains gene–compound interactions and associated bioactivity annotations.

Each database was independently parsed to extract both pathway-level and molecular-level interaction information. Particular emphasis was placed on interactions with explicit directionality and mechanistic annotations, including but not limited to activation, inhibition, binding, phosphorylation, and complex formation.

3. Data Standardization and Knowledge Graph Construction

To enable seamless integration of heterogeneous data sources, all extracted interactions underwent a unified standardization and harmonization pipeline, as described below.

3.1. Format Standardization

All raw interaction records were converted into a standardized triplet format (source entity, interaction type, target entity). Each record additionally retained rich metadata, including source entity, target entity, interaction type, pathway context, originating database, and evidence annotations. Only high-confidence interactions supported by manual curation or experimental evidence were retained.

3.2. Gene Identifier Harmonization

All gene names were mapped to HGNC-approved official gene symbols, ensuring consistent gene representation across databases.

3.3. Semantic Normalization

Heterogeneous interaction descriptors were mapped to a controlled vocabulary to ensure semantic consistency across sources (e.g., “activates” and “positively regulates” were unified under the term “activation”).

3.4. Confidence Assessment

Each interaction was assigned a composite confidence score based on:

• The number of independent databases supporting the interaction;
• The evidence level of the original source (manual curation, literature support, or inference);
• The completeness of directionality and mechanistic annotation.

Only interactions exceeding a predefined confidence threshold were included in downstream knowledge graph construction.

3.5. Gene Interaction Knowledge Graph Repository

The standardized interactions were stored in a graph database (GraphStore), forming a biomedical knowledge graph composed of multiple node and edge types. The schema is defined as follows:

• Node types:
  – Gene nodes (represented by HGNC symbols),
  – Pathway nodes,
  – Compound nodes.
• Edge types:
  – Gene–gene regulatory edges,
  – Gene–pathway association edges,
  – Gene–compound biochemical interaction edges.

Each edge is enriched with attributes including interaction type, source database, confidence score, and literature references. The graph structure is designed to support efficient graph traversal and subgraph extraction.

4. Query Processing and Subgraph Retrieval

Given an input list of query genes, the system performs the following steps to retrieve relevant evidence:

1. Gene identifier validation: Input genes are standardized and validated using the same harmonization pipeline employed during graph construction.
2. Dynamic subgraph traversal: The subgraph retrieval module performs depth-first or breadth-first searches to dynamically extract gene-centered evidence subgraphs. The traversal depth (D) is adaptively controlled to balance biological relevance and computational complexity.
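The depth-limited traversal in step 2 can be sketched in plain Python as follows. This is a minimal illustration and not the authors' implementation: the toy edge list, the function names `build_adjacency` and `retrieve_subgraph`, and the fixed `max_depth` argument (standing in for the adaptively controlled depth D) are all assumptions, and edges are followed in both directions so that shared regulators remain reachable.

```python
from collections import deque

# Hypothetical toy edge list: (source, interaction type, target).
# Gene names are placeholders, not real biological claims.
EDGES = [
    ("A", "activation", "B"),
    ("B", "activation", "C"),
    ("C", "inhibition", "D"),
    ("X", "inhibition", "Y"),
]


def build_adjacency(edges):
    """Index every edge under both endpoints so traversal ignores direction."""
    adjacency = {}
    for edge in edges:
        source, _, target = edge
        adjacency.setdefault(source, []).append((target, edge))
        adjacency.setdefault(target, []).append((source, edge))
    return adjacency


def retrieve_subgraph(edges, query_genes, max_depth):
    """Breadth-first search from all query genes, at most max_depth hops out."""
    adjacency = build_adjacency(edges)
    visited = set(query_genes)
    queue = deque((gene, 0) for gene in query_genes)
    collected = set()
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor, edge in adjacency.get(node, []):
            collected.add(edge)
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append((neighbor, depth + 1))
    return sorted(collected)
```

Calling `retrieve_subgraph(EDGES, ["A"], 2)` on this toy graph returns the two edges within two hops of `A`. Raising `max_depth` widens the evidence subgraph at the cost of more edges to rank and pass to the model.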
Retrieved subgraphs include direct interactions, shared regulators, and indirect associations mediated through pathway nodes.

5. Structured Reasoning and Explanation Generation Based on Knowledge Subgraphs

Retrieved subgraphs are provided as external knowledge inputs to the LLM, and a multi-stage prompt engineering strategy is employed to guide evidence-based logical reasoning.

5.1. Structured Design of Reasoning Prompts

Prompt templates follow a hierarchical structure to ensure clarity of instruction and consistency of input:

• Role and task definition: The LLM is explicitly instructed to act as a molecular biology expert tasked with inferring functional gene relationships based on provided interaction evidence.
• Structured contextual input: Subgraph information is supplied in JSON format, including query genes, edge lists (with interaction types and data sources), and aggregated confidence metadata.
• Chain-of-thought reasoning instructions: A step-by-step reasoning strategy is enforced, comprising:
  1. Path identification: Identification of all direct and multi-hop paths connecting the query genes;
  2. Evidence evaluation: Integration of interaction types, database authority, multi-source consistency, and functional module coherence;
  3. Synthesis and ranking: Aggregation of high-confidence paths to infer regulatory directionality or functional association, followed by ranking according to evidence strength.
• Output specifications: The LLM is required to generate structured yet natural-language explanations that explicitly include core conclusions, supporting mechanistic descriptions (with cited evidence sources), qualitative assessments of evidence strength, and stated limitations.

5.2. Pathway Function Impact Assessment: Mechanistic Evaluation Based on Pathway Perturbation

Building upon inferred gene relationships, we further developed a pathway-level functional impact assessment module to evaluate the potential effects of gene perturbations (e.g., overexpression, knockdown, or mutation) on overall pathway states.

1. Pathway-context-enhanced retrieval: Expanded subgraph extraction is performed with particular emphasis on key input/output nodes, regulatory hubs, feedback loops, and pathway cross-talk points.
2. Prompt framework for functional impact reasoning: A systems-biology-oriented prompt is designed as follows:
  – Input: Enhanced pathway subgraphs combined with hypothetical gene perturbation scenarios (e.g., “loss of function of gene A”).
  – Reasoning steps:
    a. Local impact simulation: Simulation of signal propagation originating from the perturbed node;
    b. Network-level impact assessment: Identification of downstream functional modules most strongly affected;
    c. Mechanistic analysis: Explicit consideration of compensatory mechanisms, redundant pathways, and feedback failures to enable fine-grained impact evaluation.
  – Output: A structured report describing altered core pathway states, mechanistic cascades, system-level adaptive responses, and a high-level summary of the resulting global signal network reconfiguration pattern.

6. RAG-LLM System Implementation and Deployment

To enable scalable retrieval-augmented reasoning over the constructed biomedical knowledge graph, we implemented a modular RAG-LLM infrastructure that integrates graph retrieval, semantic indexing, and large language model inference within a unified computational pipeline.

6.1 Component #1: Knowledge Graph Storage and Query Layer

The standardized interaction data were stored in a graph database environment to enable efficient traversal and subgraph extraction.
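As a rough illustration of the kind of standardized record that enters the graph store, the sketch below normalizes an interaction descriptor and computes a composite confidence score from the three factors listed under Confidence Assessment. The paper does not publish its vocabulary, weights, or threshold, so the `VOCABULARY` mapping, the `EVIDENCE_WEIGHT` values, the equal-weight average, and all field and function names are assumptions made for this example only.

```python
from dataclasses import dataclass

# Assumed controlled vocabulary; only the two "activation" synonyms come from the text.
VOCABULARY = {
    "activates": "activation",
    "positively regulates": "activation",
    "inhibits": "inhibition",
}

# Assumed weights per evidence level (not taken from the paper).
EVIDENCE_WEIGHT = {"manual_curation": 1.0, "literature": 0.7, "inference": 0.3}


@dataclass(frozen=True)
class Interaction:
    source: str            # HGNC symbol of the source entity
    interaction_type: str  # controlled-vocabulary term
    target: str            # HGNC symbol of the target entity
    databases: tuple       # originating databases supporting the edge
    evidence_level: str    # key into EVIDENCE_WEIGHT
    directed: bool         # True if directionality is annotated


def normalize_type(descriptor):
    """Map a free-text descriptor onto the controlled vocabulary."""
    key = descriptor.strip().lower()
    return VOCABULARY.get(key, key)


def confidence(record, max_databases=5):
    """Equal-weight average of database support, evidence level, completeness."""
    support = min(len(set(record.databases)), max_databases) / max_databases
    evidence = EVIDENCE_WEIGHT.get(record.evidence_level, 0.0)
    completeness = 1.0 if record.directed else 0.5
    return round((support + evidence + completeness) / 3, 3)
```

A record built with `normalize_type("Positively regulates")` and supported by two databases under manual curation would then be kept or discarded by comparing `confidence(record)` against whatever threshold a deployment chooses.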
The knowledge graph was implemented using Neo4j, which supports property graph models and high-performance graph queries. Nodes represent biological entities (genes, pathways, compounds), and edges represent directional biological interactions annotated with interaction type, evidence source, and confidence scores. Graph queries were executed in the Cypher query language, allowing dynamic extraction of gene-centered subgraphs with adjustable traversal depth at runtime. The retrieved graph structures were serialized into structured JSON objects to serve as contextual inputs for downstream RAG reasoning.

6.2 Component #2: Semantic Retrieval and Indexing Module

To support efficient contextual retrieval, the graph-derived textual and structured knowledge units were indexed using an embedding-based semantic retrieval system. Biological interaction records and pathway descriptions were first transformed into textual evidence blocks. These blocks were embedded using a transformer-based sentence embedding model (a biomedical-domain model such as BioBERT or a general-purpose embedding model, depending on deployment constraints). The embeddings were stored in a vector database (FAISS), enabling fast approximate nearest-neighbor search for semantically relevant biological evidence. During query execution, user-specified genes were used to generate retrieval queries that combined graph-based subgraph extraction with embedding-based semantic similarity search. The retrieved evidence blocks were then aggregated and passed to the reasoning stage.

6.3 Component #3: RAG Orchestration Pipeline

The retrieval-augmented generation pipeline was implemented using a modular orchestration framework (LangChain). This pipeline coordinates the following steps:

1. Query preprocessing: Gene identifiers provided by users are normalized to HGNC-approved gene symbols.
2. Graph-based evidence retrieval: The graph traversal module retrieves candidate interaction paths and pathway contexts.
3. Evidence ranking and filtering: Retrieved interactions are ranked according to confidence scores, database authority, and redundancy across multiple sources.
4. Context assembly: High-confidence interaction paths and supporting metadata are converted into structured JSON prompts.
5. LLM reasoning and synthesis: The assembled evidence context is fed into the LLM together with structured reasoning instructions.

This design ensures that the LLM operates under explicit evidence constraints rather than relying on implicit knowledge contained within model parameters.

6.4 Component #4: Large Language Model Reasoning Engine

The reasoning engine was implemented using a large language model deployed through a scalable inference environment. Depending on computational infrastructure, the model can be hosted through cloud-based APIs or locally deployed model servers. The LLM was configured with controlled generation parameters to balance reasoning depth and output stability, including temperature control, token limits, and deterministic decoding settings when necessary.

The entire pipeline was implemented in Python, integrating widely adopted libraries for graph processing, vector retrieval, and LLM orchestration. The modular architecture enables flexible replacement of individual components (e.g., embedding models or LLMs) without altering the overall framework. This implementation allows GIP-RAG to efficiently integrate structured biological knowledge with the reasoning capabilities of modern language models, providing an extensible infrastructure for interpretable gene interaction inference.

Results

1. Overview of the RAG-Driven Gene Interaction Inference Framework

We first constructed and validated a gene interaction inference framework based on Retrieval-Augmented Generation (RAG), designed to infer potential regulatory relationships between user-specified genes with the support of multiple public biological databases.
The framework integrates manually curated pathway and molecular interaction information from KEGG, WikiPathways, SIGNOR, Pathway Commons, and PubChem. Through a unified knowledge representation and retrieval strategy, it enables efficient utilization of heterogeneous biological knowledge. Across the overall test set, the system consistently returned structured inference results that include interaction types, upstream–downstream directionality, biological pathway context, and supporting literature evidence. Compared with generative models that rely on a single database or lack retrieval augmentation, the proposed framework demonstrated clear advantages in inference consistency, interpretability, and result completeness, providing a robust foundation for the analysis of complex gene regulatory networks.

Figure 1. Schematic overview of the RAG-driven gene interaction inference framework.

2. Multi-Level Validation of Gene Interaction Type and Directionality Inference

Based on the query processing, subgraph retrieval, and structured reasoning procedures described in the Methods section, we systematically evaluated the framework’s ability to infer gene interactions across multiple levels of biological complexity. The evaluation design strictly corresponded to four representative biological scenarios outlined in the Methods, enabling a comprehensive assessment of model performance under varying evidence conditions.

2.1 Inference of Direct Regulatory Relationships Within the Same Pathway

For gene pairs located within the same pathway and linked by well-defined direct regulatory relationships (e.g., phosphorylation or transcriptional activation), GIP-RAG accurately retrieved edges representing these direct interactions along with their complete attributes. The reasoning module then generated explanations that explicitly described the biochemical mechanism, regulatory direction, and original database references.
In all tested cases, such interactions were correctly inferred, and the inferred regulatory directions were fully consistent with source database annotations. These results confirm the framework’s core capability to retrieve high-confidence direct evidence and produce precise mechanistic interpretations.

Figure 2. Representative results of direct regulatory relationship inference within the same pathway.

2.2 Inference of Indirect Regulatory Relationships Within the Same Pathway

For gene pairs residing within the same pathway but functionally connected through one or more intermediate nodes, the framework successfully extracted multi-hop paths connecting the genes via dynamic subgraph traversal with an adjustable depth parameter (D). The path identification and evidence evaluation steps embedded in the reasoning prompts guided the LLM to integrate these indirect paths into a coherent mechanistic narrative (e.g., “Gene A indirectly activates Gene C by inhibiting an intermediate suppressor, Gene B”). The results demonstrate that the framework can effectively identify and explain indirect regulatory relationships mediated by shared regulators or signaling cascades, thereby overcoming the limitations of approaches focused solely on direct interactions.

Figure 3. Examples of indirect gene interaction inference based on pathway context.

2.3 Cross-Pathway Gene Interaction Inference

To evaluate the framework’s ability to integrate cross-pathway knowledge, we tested gene pairs belonging to different biological pathways but potentially involved in functional crosstalk. The query-driven subgraph retrieval module successfully constructed cross-pathway subgraphs by leveraging pathway nodes or shared compound nodes. Based on these subgraphs, the reasoning module inferred plausible regulatory relationships (e.g., a gene product in one pathway acting as a signaling molecule influencing another pathway) and explicitly identified key hub nodes that bridge the two pathways.
This capability highlights the potential of GIP-RAG to discover functional associations that extend beyond individual pathway boundaries through integrated knowledge graph reasoning.

Figure 4. Inference results for gene interactions across distinct biological pathways.

2.4 System-Level Network Inference with Multi-Gene Inputs

Finally, we evaluated the framework’s performance with multiple (>2) input genes. The system first performed dynamic subgraph traversal to retrieve an interaction network encompassing all input genes. The reasoning module then conducted not only pairwise inference of potential relationships between genes but, more importantly, system-level integrative analysis, identifying core regulatory hubs, potential functional modules, and dominant signal flow directions within the network. This capability provides a powerful computational tool for investigating the coordinated behavior of gene sets (e.g., differentially expressed gene sets) under specific physiological or pathological conditions.

Figure 5. System-level regulatory network inference with multi-gene inputs.

3. Pathway-Level Functional Impact Assessment Results

Using the pathway functional impact assessment module developed in this study, we simulated and evaluated the cascading effects of functional perturbations in specific genes (e.g., knockdown, overexpression, or gain-of-function mutations) on associated pathway networks. Given a hypothetical gene perturbation (e.g., “loss of activity of Gene A”), the module first performed pathway-context-enhanced retrieval to obtain an extended subgraph containing key input/output nodes and feedback loops. Subsequently, guided by the functional impact reasoning prompt framework, the LLM simulated signal propagation from the perturbed node along the edges of the subgraph.
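The sign logic behind this kind of propagation can be made concrete with a small deterministic sketch. Note that this is a rule-based stand-in written for illustration, not the LLM-driven procedure used in the framework: the function name `propagate`, the `SIGN` table, the placeholder gene names, and the choice to keep only the first (shortest-path) effect reaching each node are all assumptions.

```python
from collections import deque

# Effect of each interaction type on the sign of an upstream change.
SIGN = {"activation": 1, "inhibition": -1}


def propagate(edges, perturbed_gene, change):
    """Propagate a signed activity change downstream along directed edges.

    change is -1 for loss of function and +1 for gain of function.
    Returns {gene: +1 (up) or -1 (down)} for every reachable gene.
    """
    downstream = {}
    for source, kind, target in edges:
        downstream.setdefault(source, []).append((target, SIGN[kind]))

    effects = {perturbed_gene: change}
    queue = deque([perturbed_gene])
    while queue:
        node = queue.popleft()
        for target, sign in downstream.get(node, []):
            if target not in effects:  # keep the shortest-path effect only
                effects[target] = effects[node] * sign
                queue.append(target)
    return effects


# Hypothetical cascade: A inhibits B, and B inhibits C.
cascade = [("A", "inhibition", "B"), ("B", "inhibition", "C")]
loss_of_a = propagate(cascade, "A", -1)
```

In this toy cascade, loss of `A` releases `B` from inhibition (`B` up), which in turn suppresses `C` (`C` down). A sketch of this kind ignores feedback loops and conflicting paths, which is where the evidence-guided reasoning described in the text is intended to add value.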
The results show that the framework can clearly identify direct first-order downstream targets and correctly infer the direction of activity changes (upregulation or downregulation) based on interaction types (activation or inhibition). Beyond local effects, the framework further assessed the global consequences of perturbations across the network. Specifically, it successfully identified significantly affected downstream functional modules (e.g., cell cycle or apoptosis), potential compensatory or redundant pathways that may be activated to maintain network stability when primary pathways are disrupted, and alterations in feedback loops that could lead to system oscillations or steady-state imbalance. The inference outputs were presented in the form of structured reports, detailing affected core pathway states, mechanistic cascade events, and ultimately summarizing the global network reconfiguration patterns (e.g., an overall shift from pro-survival signaling to pro-apoptotic signaling).

Figure 6. Representative example of pathway-level functional impact assessment induced by gene perturbation.

Discussion

In this study, we propose GIP-RAG, a gene interaction inference framework based on retrieval-augmented generation (RAG), which aims to enable interpretable inference of potential mechanistic relationships between genes through the systematic integration of multi-source biomedical knowledge graphs and the reasoning capabilities of large language models (LLMs). Compared with traditional gene regulatory network inference methods that rely on statistical correlations or black-box models, this framework emphasizes logic-based reasoning constrained by existing biological evidence, thereby partially addressing the long-standing challenges of limited interpretability and verifiability in inferred gene interactions. The introduction of the RAG architecture represents a core design element of this work.
Its key advantage lies in explicitly incorporating external structured knowledge into the reasoning process of language models, which effectively mitigates the “hallucination” issues commonly observed in purely generative models. Within this framework, the language model does not perform inference in isolation but instead conducts stepwise reasoning under the constraints imposed by retrieved high-confidence subgraph evidence. Moreover, by explicitly constraining the reasoning steps through structured prompt engineering, the model is guided to perform mechanistic analysis in a predefined logical sequence rather than merely producing declarative conclusions.

We further extend gene interaction inference to pathway-level functional impact assessment by introducing a pathway impact evaluation module that analyzes the potential effects of gene functional perturbations on global signaling networks. This module provides a systems-level mechanistic perspective by simulating signal propagation, identifying key regulatory nodes, and characterizing potential compensatory pathways. As a result, the framework is capable not only of addressing whether “gene A influences gene B,” but also of inferring through which pathways such effects may occur and how they may directionally alter overall pathway states. This design facilitates a network-level understanding of regulatory reprogramming in complex biological processes, particularly in scenarios characterized by multi-gene coordination and extensive pathway crosstalk.

In addition to its methodological contributions, the GIP-RAG framework may provide practical value in clinical practice, especially in precision oncology and multidisciplinary tumor board (MDT) decision-making. Modern cancer treatment increasingly relies on molecular profiling, where multiple genetic alterations are simultaneously detected through next-generation sequencing.
However, interpreting the functional relationships among these alterations and understanding their combined impact on signaling pathways remains a major challenge for clinicians. Individual gene mutations rarely act in isolation; instead, they function within complex regulatory networks involving pathway crosstalk, feedback regulation, and compensatory signaling mechanisms. In this context, systematic pathway-level interpretation of gene interactions is particularly important for MDT discussions. By integrating knowledge from multiple curated pathway databases and performing evidence-grounded reasoning over gene interaction networks, the proposed framework enables structured interpretation of complex multi-gene relationships. Such analyses may help identify key regulatory hubs, clarify whether multiple mutations converge on the same signaling pathway, and distinguish driver alterations from secondary or context-dependent events.

Furthermore, pathway-aware reasoning may support the rational design of combination targeted therapies. Many targeted drugs act on different nodes within the same signaling cascade or across interconnected pathways. Understanding how gene perturbations propagate through signaling networks can provide insights into potential synergistic drug combinations or mechanisms of resistance. Therefore, frameworks such as GIP-RAG that integrate biological knowledge graphs with interpretable reasoning may serve as a useful computational aid for MDT teams when evaluating complex genomic profiles and exploring mechanistically informed therapeutic strategies.

Despite these strengths, several limitations of the present study should be acknowledged. First, the inference capability of GIP-RAG is inherently dependent on the coverage and annotation quality of existing public databases; consequently, its performance remains limited for gene relationships that are insufficiently studied or lack robust supporting evidence.
Second, heterogeneity across databases with respect to evidence grading, interaction definitions, and biological context may still influence inference outcomes, despite mitigation through confidence assessment and multi-source integration. In addition, the current framework primarily focuses on qualitative and mechanistic reasoning and does not incorporate quantitative expression data or dynamic parameters, which limits its ability to model regulatory strength or precise changes in pathway activity. Furthermore, the reasoning behavior of language models remains sensitive to prompt design and model scale, and different configurations may lead to variations in inferred details. This highlights the necessity of more systematic robustness evaluations and comparative analyses in future work.

Several directions merit further exploration. First, integrating experimental data, such as bulk or single-cell transcriptomic profiles and perturbation assay results, into the current knowledge-driven framework may enhance the adaptability of inference results to specific biological contexts. Second, incorporating contextual annotations such as tissue type, cell type, or disease state may facilitate more accurate modeling of condition-dependent regulatory network rewiring. Finally, in clinical and precision medicine settings, applying this framework to patient-specific mutation combinations may provide valuable support for therapeutic target identification and resistance mechanism analysis.

References

1. Park M, Shin JE, Yee J, Ahn YM, Joo EJ. Gene-gene interaction analysis for age at onset of bipolar disorder in a Korean population. J Affect Disord. 2024;361:97-103. doi:10.1016/j.jad.2024.05.152
2. Slim L, Chatelain C, Foucauld H, Azencott CA. A systematic analysis of gene-gene interaction in multiple sclerosis. BMC Med Genomics. 2022;15(1):100. Published 2022 Apr 30. doi:10.1186/s12920-022-01247-3
3. Ansari MA, Naqvi HA, Khidri FF, Rajput AH, Mahmood A, Waryah AM. Gene-gene and gene-environmental interaction of dopaminergic system genes in Pakistani children with attention deficit hyperactivity disorder. Saudi J Biol Sci. 2024;31(8):104045. doi:10.1016/j.sjbs.2024.104045
4. Kanehisa M, Furumichi M, Sato Y, Ishiguro-Watanabe M, Tanabe M. KEGG: integrating viruses and cellular organisms. Nucleic Acids Res. 2021;49(D1):D545-D551. doi:10.1093/nar/gkaa970
5. Slenter DN, Kutmon M, Hanspers K, et al. WikiPathways: a multifaceted pathway database bridging metabolomics to other omics research. Nucleic Acids Res. 2018;46(D1):D661-D667. doi:10.1093/nar/gkx1064
6. Perfetto L, Briganti L, Calderone A, et al. SIGNOR: a database of causal relationships between biological entities. Nucleic Acids Res. 2016;44(D1):D548-D554. doi:10.1093/nar/gkv1048
7. Cerami EG, Gross BE, Demir E, et al. Pathway Commons, a web resource for biological pathway data. Nucleic Acids Res. 2011;39(Database issue):D685-D690. doi:10.1093/nar/gkq1039
8. Kim S, Chen J, Cheng T, et al. PubChem in 2023: new data content and improved web interfaces. Nucleic Acids Res. 2023;51(D1):D1373-D1380.
9. Hsieh AR, Tsai CY. Biomedical literature mining: graph kernel-based learning for gene-gene interaction extraction. Eur J Med Res. 2024;29(1):404. Published 2024 Aug 2. doi:10.1186/s40001-024-01983-5
10. Chanumolu SK, Albahrani M, Can H, Otu HH. KEGG2Net: Deducing gene interaction networks and acyclic graphs from KEGG pathways. EMBnet J. 2021;26:e949. doi:10.14806/ej.26.0.949
11. Chai LE, Loh SK, Low ST, Mohamad MS, Deris S, Zakaria Z. A review on the computational approaches for gene regulatory network construction. Comput Biol Med. 2014;48:55-65. doi:10.1016/j.compbiomed.2014.02.011
12. Marku M, Pancaldi V. From time-series transcriptomics to gene regulatory networks: A review on inference methods. PLoS Comput Biol. 2023;19(8):e1011254. Published 2023 Aug 10. doi:10.1371/journal.pcbi.1011254
13. Lin A, Ye J, Qi C, et al. Bridging artificial intelligence and biological sciences: a comprehensive review of large language models in bioinformatics. Brief Bioinform. 2025;26(4):bbaf357. doi:10.1093/bib/bbaf357
14. Xie Y, Lu J, Ho J, Nahab F, Hu X, Yang C. PromptLink: Leveraging Large Language Models for Cross-Source Biomedical Concept Linking. Int ACM SIGIR Conf Res Dev Inf Retr. 2024;2024:2589-2593. doi:10.1145/3626772.3657904
15. Xie Y, Lu J, Ho J, Nahab F, Hu X, Yang C. PromptLink: Leveraging Large Language Models for Cross-Source Biomedical Concept Linking. Int ACM SIGIR Conf Res Dev Inf Retr. 2024;2024:2589-2593. doi:10.1145/3626772.3657904
16. Tian S, Jin Q, Yeganova L, et al. Opportunities and challenges for ChatGPT and large language models in biomedicine and health. Brief Bioinform. 2023;25(1):bbad493. doi:10.1093/bib/bbad493
