Diverge to Induce Prompting: Multi-Rationale Induction for Zero-Shot Reasoning

Notice: This research summary and analysis were automatically generated using AI. For full accuracy, please refer to the original arXiv source.

To address the instability of unguided reasoning paths in standard Chain-of-Thought prompting, recent methods guide large language models (LLMs) by first eliciting a single reasoning strategy. However, relying on just one strategy per question can still limit performance across diverse tasks. We propose Diverge-to-Induce Prompting (DIP), a framework that first prompts an LLM to generate multiple diverse high-level rationales for each question. Each rationale is then elaborated into a detailed, step-by-step draft plan, and these draft plans are inductively merged into a single final plan. DIP enhances zero-shot reasoning accuracy without relying on resource-intensive sampling. Experiments show that DIP outperforms single-strategy prompting, demonstrating the effectiveness of multi-plan induction for prompt-based reasoning.


💡 Research Summary

The paper introduces Diverge‑to‑Induce Prompting (DIP), a novel prompting framework designed to improve zero‑shot reasoning in large language models (LLMs). Traditional zero‑shot Chain‑of‑Thought (CoT) prompting often suffers from unstable reasoning paths because the model is left to generate its own chain without guidance. Recent single‑strategy methods such as Plan‑and‑Solve and Strategic CoT (S‑CoT) mitigate this by first eliciting a single high‑level plan, but they still commit to one intuition per question, which can be sub‑optimal for diverse tasks.

DIP addresses this limitation by explicitly generating multiple high‑level rationales (also called “strategies”) for each input question. The process consists of four phases:

  1. Divergent Phase – The model is prompted once to produce N distinct rationales (the paper uses N = 5 by default). Each rationale represents a different high‑level approach to solving the problem.
  2. Draft Plan Construction – Using the same prompt call, each rationale is expanded into a detailed, step‑by‑step draft plan. This yields a set of draft plans {p₁,…,p_N}.
  3. Draft Plan Induction – All draft plans are fed back to the model in a single call, where the model synthesizes them into a single “final plan” (P_DIP). This induction step leverages the model’s inductive reasoning ability to merge diverse perspectives, resolve contradictions, and select the most reliable steps.
  4. Inference – The final plan guides a standard CoT generation, producing the chain of reasoning (c) and the final answer (y*).

The authors evaluate DIP across six major model families (LLaMA, Mistral, Gemini, GPT, Grok, and the o‑Series) and on two challenging benchmarks: BIG‑Bench Hard (BBH) and LiveBench Reasoning. Baselines include Zero‑shot CoT (Z‑CoT), Rationale CoT (R‑CoT), and Strategic CoT (S‑CoT). Results show that DIP consistently outperforms all baselines. For example, Llama 4 Scout improves from 77.74% (Z‑CoT) to 84.46% (+6.72%) and even achieves a +30.5% gain over Z‑CoT on the GPT‑4.1 Mini model. Across all models, DIP’s gains over Z‑CoT range from 0.58% to 6.72% on BBH and up to 30.5% on LiveBench.

A key contribution is the cost‑efficiency analysis. Compared to sampling‑based multi‑path methods such as Self‑Consistency (SC) with k = 20, DIP uses dramatically fewer output tokens (e.g., 1,556 vs. 7,532 for Llama 4 Scout) while achieving comparable or higher accuracy. Even when combined with SC on the final answer generation step (DIP+SC), the token overhead remains far lower than pure SC approaches.
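Plugging in the reported Llama 4 Scout figures makes the cost gap concrete (only the two token counts quoted above are used; everything else is arithmetic):

```python
# Back-of-the-envelope cost comparison using the reported output-token
# counts for Llama 4 Scout: DIP vs Self-Consistency with k = 20.
dip_tokens = 1_556
sc_tokens = 7_532

ratio = sc_tokens / dip_tokens          # how many times cheaper DIP is
savings = 1 - dip_tokens / sc_tokens    # fraction of output tokens saved
print(f"DIP is {ratio:.1f}x cheaper in output tokens ({savings:.0%} saved)")
# → DIP is 4.8x cheaper in output tokens (79% saved)
```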

Ablation studies reveal that the rationale generation step is crucial: a variant without it (DIP‑R) underperforms DIP on 9 out of 10 models. Varying the number of rationales N shows that moderate diversity (N = 5–7) yields the best performance, while too many rationales can introduce noise for some models.

The paper positions DIP as a “plan‑before‑do” alternative to search‑based methods like Tree‑of‑Thoughts, which explore multiple reasoning paths during execution. By front‑loading diversity in abstract planning and then inductively merging plans, DIP captures the benefits of multi‑plan exploration without the computational burden of executing many paths.

In conclusion, DIP demonstrates that (1) generating multiple high‑level rationales mitigates the bias of single‑strategy prompting, (2) inductive synthesis of draft plans produces a more robust final reasoning plan, and (3) this approach delivers superior accuracy with substantially lower token cost than existing multi‑path sampling methods. Future work may explore automated optimization of rationale prompts, scaling to more complex logical structures, and integrating external verification without sacrificing efficiency.

