Multi-chain Graph Refinement and Selection for Reliable Reasoning in Large Language Models

Reading time: 5 minutes

📝 Abstract

The complex reasoning ability of Large Language Models (LLMs) poses a critical bottleneck for their practical applications. Test-time expansion methods such as Tree-of-Thought (ToT) and Graph-of-Thought (GoT) enhance reasoning by introducing intermediate reasoning structures, tree search, or graph-based exploration mechanisms. However, their reasoning strategies suffer from limited diversity, redundant search branches, and inadequate integration and error correction across heterogeneous reasoning paths. To address these limitations, we propose a novel reasoning framework called Multi-chain Graph Refinement & Selection (MGRS), which first generates multiple diverse reasoning trajectories for a given problem, refines candidate responses using a composite self- and cross-verification strategy, then constructs a reasoning relation graph and estimates the success rate of intermediate nodes, and finally computes cumulative success rates to select the most reliable answer and corresponding reasoning trajectory. Experimental results demonstrate that MGRS significantly advances both the reasoning capability and computational efficiency of reasoning enhancement methods. Across six benchmark datasets spanning four distinct tasks, MGRS achieves an average accuracy of 82.9%, outperforming state-of-the-art baselines by a clear margin of 2.1%. Remarkably, on the 24-point game, MGRS attains 100% accuracy for the first time, while delivering a 13.6x speed-up compared to the leading Forest of Thoughts framework.


📄 Content

Since the establishment of the large-scale pre-training paradigm, Large Language Models (LLMs) have exhibited unprecedented generalization capabilities not only in traditional NLP tasks such as machine translation and conversational generation, but also in emerging fields such as code generation, AI4Science, and embodied intelligence. However, when confronted with complex problems that require logical reasoning, the conventional next-token prediction mechanism exhibits inherent limitations, including systematic biases and error accumulation [1]. To alleviate these problems, the Chain-of-Thought (CoT) framework [2] introduces reasoning instructions or exemplars with intermediate steps into prompts, thereby guiding models to perform multi-step reasoning during inference and to generate final answers through a structured deductive process. This approach has yielded substantial performance improvements on mathematical, commonsense, and symbolic reasoning benchmarks.

Although CoT provides a viable paradigm for complex reasoning, its single-chain, unidirectional generation mechanism remains limited in terms of reasoning depth, breadth, and accuracy. Several studies have therefore proposed various extensions to mitigate these shortcomings. Among them, Tree-of-Thought (ToT) [3] and Graph-of-Thought (GoT) [4] replace the linear chain with a multi-branch tree or a directed acyclic graph, enabling global optimization through concurrent exploration and backtracking. Self-Consistency [5] and Majority Voting [6] sample multiple independent reasoning chains to estimate the most consistent outcome. Moreover, Reflexion [7] and Self-Correct [8] introduce a “generate-criticize-revise” closed loop that iteratively refines reasoning trajectories and corrects intermediate errors.
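The sampling-based line of work above can be illustrated with a minimal sketch. The snippet below shows only the aggregation step behind Self-Consistency-style majority voting: the actual sampling of reasoning chains from an LLM (e.g., with nonzero temperature) is omitted, and `majority_vote` is a hypothetical helper name, not an API from the cited papers.

```python
from collections import Counter

def majority_vote(sampled_answers):
    """Pick the most frequent final answer across independently
    sampled reasoning chains (the aggregation idea behind
    Self-Consistency / Majority Voting)."""
    counts = Counter(sampled_answers)
    answer, _ = counts.most_common(1)[0]
    return answer

# Five hypothetical chains for the same arithmetic question;
# three agree on "42", so voting selects it.
print(majority_vote(["42", "41", "42", "42", "40"]))  # → 42
```

Note that this aggregates only final answers; as the next paragraph argues, such end-level voting ignores where errors arise inside the chains.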

The aforementioned extensions provide possible solutions for enhancing model performance. However, several challenges remain unresolved. First, although traditional tree- and graph-based reasoning frameworks are theoretically capable of modeling branching structures and complex dependencies, they often lack mechanisms that guide LLMs to expand branches at critical decision points. As a result, these frameworks tend to degenerate into ensembles of multiple, highly similar reasoning chains. Second, not all nodes necessarily require branching. From the perspective of human reasoning, it is more natural to explore multiple solution paths for a given problem, each corresponding to a distinct combination of branches within the reasoning tree. The diversity of these combinations is often sufficient for problem-solving and cross-verification, while also helping to reduce computational overhead. Furthermore, branch expansion in existing approaches typically relies on end-level voting across independent chains, overlooking step-wise error propagation; even high-confidence segments may accumulate biases during intermediate steps. Therefore, it is essential to improve the mechanism of chain-of-thought enhancement and effectively adapt it to tree- or graph-based reasoning frameworks, thereby enhancing overall reasoning accuracy.

This paper proposes a novel reasoning framework, Multi-chain Graph Refinement & Selection (MGRS), that enhances the reasoning capabilities of large language models (LLMs) by reforming the inference process. MGRS generates multiple diverse reasoning chains for a given problem and constructs a directed acyclic graph (DAG) through dependency analysis and similarity-based node merging. To further strengthen reasoning, we design a composite verification and refinement strategy that integrates self-verification within individual reasoning chains, cross-verification among branches, and targeted error correction. Moreover, we introduce an answer selection strategy based on estimated success probability, which identifies the most reliable final answer, rather than the most frequent reasoning chain, by jointly considering success likelihood and voting consistency. Experimental results demonstrate that the proposed MGRS framework substantially improves the reasoning performance of LLMs, enabling them to address complex tasks with higher accuracy and efficiency.
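The selection step can be sketched as follows. This is a simplified illustration of the idea, not the paper's implementation: `select_answer`, `trajectories`, and `node_success` are hypothetical names, and the paper's actual procedure for estimating per-node success rates is not reproduced here. Each trajectory is scored by the product of its nodes' estimated success rates, and scores are accumulated per final answer, combining reliability with a soft form of voting.

```python
from collections import defaultdict
import math

def select_answer(trajectories, node_success):
    """Score each reasoning trajectory by the product of the estimated
    success rates of its intermediate nodes, then aggregate scores per
    final answer and return the most reliable one.

    trajectories: list of (node_ids, final_answer) pairs
    node_success: dict mapping node_id -> estimated success rate in [0, 1]
    """
    answer_score = defaultdict(float)
    for nodes, answer in trajectories:
        score = math.prod(node_success[n] for n in nodes)
        answer_score[answer] += score  # soft vote, weighted by reliability
    return max(answer_score, key=answer_score.get)

# Toy example: two moderately reliable chains reach "24",
# one shaky chain reaches "21".
trajs = [
    (["a", "b"], "24"),
    (["a", "c"], "24"),
    (["d"], "21"),
]
rates = {"a": 0.9, "b": 0.8, "c": 0.7, "d": 0.6}
print(select_answer(trajs, rates))  # → 24
```

Aggregating scores per answer (rather than picking the single best-scoring chain) is what lets the method reward agreement across heterogeneous trajectories while still discounting paths through unreliable nodes.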

Structured reasoning methods explicitly organize the reasoning process into structured forms such as chains, trees, or graphs, enabling large language models (LLMs) to generate intermediate steps in a stepwise manner according to predefined logic.
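The tree-structured variant of this idea can be sketched as a beam search over partial reasoning chains. The sketch below is a generic illustration, not any cited system's algorithm: `propose` and `score` are placeholders standing in for LLM calls that generate candidate next steps and rate partial chains.

```python
def tree_search(problem, propose, score, depth=3, beam=2):
    """Expand intermediate reasoning steps as a tree: each state may
    branch into several candidate next steps, and a scoring function
    prunes the frontier to the `beam` best partial chains per level."""
    frontier = [[]]  # each state is a list of intermediate steps so far
    for _ in range(depth):
        candidates = [state + [step]
                      for state in frontier
                      for step in propose(problem, state)]
        # keep only the highest-scoring partial chains
        frontier = sorted(candidates, key=score, reverse=True)[:beam]
    return frontier[0]  # best complete reasoning chain found
```

Chain-based methods are the special case where `propose` returns a single step and `beam` is 1; graph-based methods additionally merge equivalent states across branches.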

In chain-based approaches, Chain-of-Thought (CoT) [2] decomposes a problem into a sequence of relatively simple intermediate steps and gradually reasons toward the final conclusion. By reducing the overall difficulty of reasoning, this approach effectively improves accuracy on complex mathematical and logical tasks. Subsequent research has enhanced CoT in terms of reliability and zero-shot triggering efficiency. For instance, TiM [9] stores intermediate reasoning steps in external memory slots and introduces backtracking for long-chain reasoning. Zero-

This content is AI-processed based on ArXiv data.
