No Global Plan in Chain-of-Thought: Uncover the Latent Planning Horizon of LLMs
This work stems from two complementary prior observations on the dynamics of Chain-of-Thought (CoT): Large Language Models (LLMs) have been shown to latently plan subsequent reasoning before the CoT emerges, which diminishes the significance of explicit CoT; yet CoT remains critical for tasks requiring multi-step reasoning. To deepen the understanding of the relationship between an LLM's internal states and its verbalized reasoning trajectories, we investigate the latent planning strength of LLMs with our probing method, Tele-Lens, applied to hidden states across diverse task domains. Our empirical results indicate that LLMs exhibit a myopic planning horizon, primarily performing incremental transitions without precise global planning. Leveraging this characteristic, we propose a hypothesis for enhancing uncertainty estimation of CoT, and validate that a small subset of CoT positions can effectively represent the uncertainty of the entire path. We further underscore the significance of exploiting CoT dynamics, and demonstrate that automatic recognition of CoT bypass can be achieved without performance degradation. Our code, data, and models are released at https://github.com/lxucs/tele-lens.
💡 Research Summary
This paper investigates whether large language models (LLMs) actually formulate a global plan when generating a chain‑of‑thought (CoT) or whether they operate with a short‑sighted, incremental strategy. Prior work presented two seemingly contradictory views: (1) internal representations of LLMs already encode the entire reasoning trajectory before CoT emerges, suggesting explicit CoT may be unnecessary, and (2) theoretical limits of the Transformer architecture imply that multi‑step reasoning requires explicit intermediate steps, making CoT indispensable. To reconcile these perspectives, the authors introduce the notion of a “latent planning horizon” and propose a probing framework called Tele‑Lens. Tele‑Lens attaches a low‑rank adapter to each transformer layer, transforms hidden states, and directly maps them to the full vocabulary logits, enabling three probing tasks: predicting subsequent tokens, estimating total reasoning length, and directly predicting the final answer.
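The probe architecture described above can be sketched in PyTorch. This is a minimal, hypothetical rendering of a Tele-Lens-style probe: the low-rank adapter (rank 256 per the paper's setup) transforms a chosen layer's hidden state, and a linear head maps the result to full-vocabulary logits. The class name, the residual-adapter form, and the dimensions are illustrative assumptions, not the authors' released implementation.

```python
import torch
import torch.nn as nn


class TeleLensProbe(nn.Module):
    """Hypothetical sketch of a Tele-Lens-style probe for one transformer layer.

    A low-rank adapter transforms the layer's hidden state, and a linear
    unembedding head maps it to full-vocabulary logits, so the probe can be
    trained to predict subsequent tokens, reasoning length, or the final
    answer from intermediate hidden states.
    """

    def __init__(self, hidden_dim: int, vocab_size: int, rank: int = 256):
        super().__init__()
        self.down = nn.Linear(hidden_dim, rank, bias=False)  # low-rank down-projection
        self.up = nn.Linear(rank, hidden_dim, bias=False)    # back up to hidden size
        self.unembed = nn.Linear(hidden_dim, vocab_size, bias=False)  # logits head

    def forward(self, hidden: torch.Tensor) -> torch.Tensor:
        # Residual low-rank transform of the hidden state, then project to logits.
        return self.unembed(hidden + self.up(self.down(hidden)))
```

In practice one such probe would be trained per layer (the base LLM frozen), which is consistent with the per-layer training regime the summary describes.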
Experiments span twelve diverse tasks grouped into three categories: explicit compositional tasks (Parity, Cycle, Subsum), implicit compositional tasks (math datasets GSM8K, MATH, AIME, and logical reasoning datasets MuSR, Zebra), and knowledge/semantic tasks (CSQA, MMLU, QuALITY, GPQA). Two model families are used: the open-source Qwen-3-32B with native thinking mode, and an "in-domain" LLM fine-tuned from Qwen2.5-7B-Instruct via reinforcement learning (GRPO) to produce more stable CoT. For each task, up to 4,000 training, 100 validation, and 500 test examples are prepared, and adapters are trained for roughly 5,000 steps per layer with rank = 256.
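The per-task setup above can be captured in a small configuration sketch. The field names are illustrative (not from the released code); the values come directly from the summary.

```python
# Hypothetical configuration mirroring the per-task setup described above.
# Field names are illustrative; values follow the paper's reported setup.
PROBE_CONFIG = {
    "train_examples": 4000,  # up to 4,000 training examples per task
    "val_examples": 100,     # validation split
    "test_examples": 500,    # test split
    "adapter_rank": 256,     # low-rank adapter rank
    "train_steps": 5000,     # ~5K training steps per layer
}
```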
Key findings: (1) For tasks that truly require multi‑step reasoning, direct answering without CoT yields near‑random performance, confirming that Transformers struggle with complex compositionality. CoT dramatically improves accuracy across most datasets. (2) The in‑domain LLM, while sometimes lagging behind Qwen‑3 on raw accuracy, achieves the best results on the three explicit compositional tasks and generates much shorter CoT sequences (≈1 K characters vs. >10 K for Qwen‑3), indicating a more decisive internal plan. (3) Probing results reveal a “myopic” planning horizon: hidden states can reliably predict the final answer only one or two steps ahead, and only for simpler problems do early states contain a coarse gist of the answer. For harder problems, prediction accuracy quickly flattens, showing that the model does not maintain a global roadmap. (4) Leveraging this short‑horizon behavior, the authors propose the “Wooden Barrel” hypothesis for uncertainty estimation: the reliability of a reasoning chain is governed by a few critical pivot positions rather than the entire token sequence. Selecting top‑k pivot tokens based on entropy or perplexity improves uncertainty metrics by up to 6 % absolute. (5) Building on the same insight, they design an automatic CoT‑bypass mechanism that skips CoT generation when pivot uncertainty is low, achieving up to 16.2 % bypass with only a 0.03 % drop in overall accuracy on Qwen‑3‑32B.
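Findings (4) and (5) can be sketched together. The snippet below, a minimal sketch assuming access to the model's per-token logits over a CoT, scores each position by entropy, averages the top-k most uncertain ("pivot") positions, and bypasses CoT when that pivot uncertainty is low. The function names, k, and the threshold are illustrative assumptions; the paper's exact pivot-scoring and bypass criteria may differ.

```python
import torch


def pivot_uncertainty(logits: torch.Tensor, k: int = 8) -> float:
    """Mean entropy over the top-k most-uncertain ("pivot") positions.

    `logits` has shape [seq_len, vocab_size]; entropy-based top-k selection
    follows the Wooden Barrel idea that a few pivot positions govern the
    reliability of the whole reasoning chain.
    """
    probs = torch.softmax(logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)  # [seq_len]
    return entropy.topk(min(k, entropy.numel())).values.mean().item()


def should_bypass_cot(logits: torch.Tensor, threshold: float = 0.5) -> bool:
    """Skip CoT generation when pivot uncertainty falls below an (illustrative)
    threshold, i.e. when the model already appears decided on the answer."""
    return pivot_uncertainty(logits) < threshold
```

Perplexity over the same top-k positions would be a drop-in alternative scoring rule, matching the summary's mention of entropy- or perplexity-based pivot selection.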
Overall, the study demonstrates that LLMs do not construct detailed global plans; instead, they perform incremental local transitions, with only limited foresight for the immediate next steps. Understanding this latent planning horizon enables more efficient use of CoT, better confidence calibration, and selective omission of unnecessary reasoning steps. Future work may extend the analysis to larger model families, explore meta‑learning methods for automatically identifying pivot tokens, and investigate how training objectives could be modified to broaden the planning horizon.