Echoes as Anchors: Probabilistic Costs and Attention Refocusing in LLM Reasoning
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the original arXiv source.

Test-time compute allocation in large reasoning models (LRMs) is widely used and has applications in mathematical problem solving, code synthesis, and planning. Recent work has addressed this problem by scaling self-consistency and parallel thinking, adding generic "thinking tokens," and prompting models to re-read the question before answering. Unfortunately, these approaches either inject task-agnostic tokens or mandate heuristics that do not explain -- and often ignore -- the \emph{spontaneous} repetition that many LRMs exhibit at the head of their internal chains. In contrast, we analyze and harness the model's tendency to restate the question, which we term the \emph{Echo of Prompt (EOP)}, as a front-loaded, compute-shaping mechanism. We formalize its probabilistic cost by casting echo removal as rejection-based conditioning and defining the \emph{Echo Likelihood Gap} $\Delta\mathcal{L}$ as a computable proxy. This provides the missing theoretical link between early repetition, likelihood gains, and downstream accuracy. However, it does not by itself specify how to exploit EOP. Consequently, we develop \emph{Echo-Distilled SFT (ED-SFT)} to instill an "echo-then-reason" pattern through supervised finetuning, and \emph{Echoic Prompting (EP)} to re-ground the model mid-trace without training. While promising, quantifying benefits beyond verbosity is non-trivial. Therefore, we conduct length- and suffix-controlled likelihood analyses together with layer-wise attention studies, showing that EOP increases answer-to-prefix attention in middle layers, consistent with an \emph{attention refocusing} mechanism. We evaluate on GSM8K, MathQA, Hendrycks-MATH, AIME24, and MATH-500 under identical decoding settings and budgets, and find consistent gains over baselines. Code is available at https://github.com/hhh2210/echoes-as-anchors.


💡 Research Summary

The paper investigates a pervasive yet under‑studied phenomenon in large reasoning models (LRMs): the spontaneous repetition of the user’s prompt at the beginning of a chain‑of‑thought, which the authors name “Echo of Prompt” (EOP). While prior work has deliberately injected “think‑tokens”, “re‑read” instructions, or early‑exiting mechanisms, these approaches either add task‑agnostic tokens or impose heuristics that ignore the model’s own tendency to echo. The authors argue that EOP is not a bug but a learned strategy that can shape test‑time compute allocation.

Probabilistic Formalism
The core theoretical contribution is a probabilistic framework that treats the presence of an echo as a random event. Let πθ(y|x) be the base conditional distribution of a reasoning trace y given prompt x. Using a separately trained MLP probe, the output space Y is partitioned into Y_echo (traces containing an echo) and Y_trim (echo‑free traces). The echo‑free distribution τθ(y|x) is defined as πθ conditioned on the event y∈Y_trim, i.e., τθ(y|x) = πθ(y|x)·𝟙[y∈Y_trim] / πθ(Y_trim|x), where πθ(Y_trim|x) = Σ_{y′∈Y_trim} πθ(y′|x) is the normalizing constant.
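The conditioning above can be illustrated with a minimal sketch. Everything here is a toy stand-in, not the paper's setup: the base distribution πθ is stubbed as a four-entry table over hypothetical trace types, and the MLP echo probe is replaced by a string check.

```python
import random

# Toy stand-in for pi_theta(y|x): a discrete distribution over four
# hypothetical trace types (illustrative values, not from the paper).
pi = {
    "echo -> reason -> answer": 0.45,  # Y_echo: trace restates the prompt
    "echo -> answer":           0.15,  # Y_echo
    "reason -> answer":         0.30,  # Y_trim: echo-free
    "answer":                   0.10,  # Y_trim
}

def is_echo(y: str) -> bool:
    """Stand-in for the separately trained MLP echo probe."""
    return y.startswith("echo")

# Closed form: tau(y|x) = pi(y|x) * 1[y in Y_trim] / pi(Y_trim|x)
Z = sum(p for y, p in pi.items() if not is_echo(y))  # pi(Y_trim|x)
tau = {y: (0.0 if is_echo(y) else p / Z) for y, p in pi.items()}

def sample_tau(rng: random.Random) -> str:
    """Rejection-based conditioning: draw from pi, reject echoed traces."""
    ys, ps = zip(*pi.items())
    while True:
        y = rng.choices(ys, weights=ps, k=1)[0]
        if not is_echo(y):
            return y

# Empirical frequencies under rejection sampling approximate tau.
rng = random.Random(0)
draws = [sample_tau(rng) for _ in range(20000)]
freq = draws.count("reason -> answer") / len(draws)
print(tau["reason -> answer"], freq)  # exact 0.75 vs. an estimate near it
```

The rejection loop is the operational counterpart of the closed-form renormalization: both implement the same conditional distribution, which is why the empirical frequency converges to τθ(y|x).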
