State over Tokens:
Characterizing the Role of Reasoning Tokens
Mosh Levy
Bar-Ilan University
moshe0110@gmail.com
Zohar Elyoseph
University of Haifa
Shauli Ravfogel
New York University
Yoav Goldberg
Bar-Ilan University
Allen Institute for AI
Abstract
Large Language Models (LLMs) can generate reasoning tokens before their
final answer to boost performance on complex tasks. While these sequences
seem like human thought processes, empirical evidence reveals that they
are not a faithful explanation of the model’s actual reasoning process. To
address this gap between appearance and function, we introduce the State
over Tokens (SoT) conceptual framework. SoT reframes reasoning tokens
not as a linguistic narrative, but as an externalized computational state—the
sole persistent information carrier across the model’s stateless generation
cycles. This explains how the tokens can drive correct reasoning without
being a faithful explanation when read as text and surfaces previously
overlooked research questions on these tokens. We argue that to truly understand the process LLMs carry out, research must move beyond reading the reasoning tokens as text and focus on decoding them as state.
1 Introduction
The assertion that Large Language Models (LLMs) can reason now appears unremarkable
(Mitchell, 2025; Maslej et al., 2025). A key factor in achieving this was letting models generate a sequence of tokens before their final answer, which significantly improves performance (Wei et al., 2022; Zelikman et al., 2022; DeepSeek-AI et al., 2025).
We refer to this sequence of symbols, which includes phrases such as ‘therefore’, ‘consider’, and ‘it follows that’, as the reasoning tokens, and explicitly distinguish this term from reasoning text, which refers to the same tokens when interpreted by a reader according to their English semantics.
The combination of (a) utility in improving the answer and (b) appearance as readable English text may lead to the following inference: the reasoning text is a faithful explanation of the model’s reasoning process. This inference is strengthened by metaphors like “Chain-of-Thought”, which imply that the steps in the text are “thoughts” that explain the process. Yet empirical findings contradict this inference (see Section 2.1): the reasoning text is not a faithful explanation of the model’s reasoning process. While those findings clarify what the reasoning tokens are not, they leave a conceptual vacuum as to what they are. Our aim in this paper is to help fill that vacuum. Drawing on the idea that metaphors structure understanding and guide thinking (Lakoff & Johnson, 1980), we believe that adopting more apt descriptions and metaphors can steer researchers and practitioners toward more fruitful directions and surface a new set of questions that are less salient under the prevailing view of the reasoning text as an explanation.
To understand reasoning tokens, we must focus on the functional role they play, rather
than their appearance, which empirical research has found to be deceiving. To this end, we
advocate viewing them as representing State over Tokens (SoT), which characterizes the
reasoning tokens as a computational device that enables the persistence of a process across
separate and stateless computation cycles. We argue that to understand the role of the reasoning tokens, we should interpret this sequence not according to its English-text semantics, but as the state carrier of a computational process.
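To make the stateless-cycle view concrete, the sketch below shows autoregressive generation as a loop in which each cycle receives only the tokens produced so far, so the appended token is the only information that survives into the next cycle. This is our own illustration under stated assumptions, not an implementation from this paper or any specific model: the names forward_pass and sample_next_token, the toy vocabulary, and the stopping condition are all hypothetical stand-ins.

```python
import random

def forward_pass(tokens):
    """Toy stand-in for a stateless model call (assumption, not a real API):
    derives a next-token distribution from the full token sequence alone.
    A real LLM would compute this with a transformer; any internal
    activations are discarded once the call returns."""
    rng = random.Random(hash(tuple(tokens)))
    vocab = ["step", "therefore", "42", "<eos>"]
    weights = [rng.random() for _ in vocab]
    total = sum(weights)
    return {tok: w / total for tok, w in zip(vocab, weights)}

def sample_next_token(distribution):
    """Greedy decoding: pick the highest-probability token."""
    return max(distribution, key=distribution.get)

def generate(prompt_tokens, end_token="<eos>", max_steps=64):
    tokens = list(prompt_tokens)      # the "whiteboard": the only persistent state
    for _ in range(max_steps):
        dist = forward_pass(tokens)   # each cycle starts from the tokens alone
        nxt = sample_next_token(dist)
        tokens.append(nxt)            # one token added per cycle
        if nxt == end_token:          # stop once the final answer is written
            break
    return tokens                     # reasoning tokens followed by the answer

print(generate(["What", "is", "6", "x", "7", "?"]))
```

Because nothing else persists between iterations of this loop, whatever the process needs later must be encoded into the appended tokens, whether or not that encoding is legible as English.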
The Whiteboard Analogy
Consider a hypothetical scenario: you are placed in a room with a problem written
on a whiteboard. Your task is to solve it, but under a peculiar constraint: every 10 seconds, your memory is completely wiped and reset to the state it was in when you first entered the room. Within each interval, you can read what is on the board
and add a single word. These rounds repeat until you finally write down the solution.
How might you solve a problem under such constraints?
You may write intermediate results on the board: numbers, conclusions, or partial computations that you can use when you return after being “reset”. You might perform several mental calculations before writing down just the result, so the whiteboard may not capture every calculation you performed within a cycle. Moreover, you may use an encoding scheme when writing on the board: abbreviations, symbols, or even apparent gibberish that will mean something specific to you when you encounter it in the next cycle. All in all, an outside observer may interpret the whiteboard text incorrectly.
The whiteboard analogy mirrors the model’s operation: the words are the reasoning tokens,
you are the model, and the ten-second interval represents the model’s limited capacity per
cycle. Motivated by this intuition, we present the SoT framework (Section 3) and use it to
demonstrate two common misconceptions that underlie the belief that the text is a faithful