Are More Tokens Rational? Inference-Time Scaling in Language Models as Adaptive Resource Rationality

Notice: This research summary and analysis were generated automatically using AI technology. For full accuracy, please refer to the original arXiv source.

Human reasoning is shaped by resource rationality, i.e., optimizing performance under computational constraints. Recently, inference-time scaling has emerged as a powerful paradigm for improving the reasoning performance of Large Language Models by expanding test-time computation. Specifically, instruction-tuned (IT) models explicitly generate long reasoning steps during inference, whereas Large Reasoning Models (LRMs) are trained by reinforcement learning to discover reasoning paths that maximize accuracy. However, it remains unclear whether resource rationality can emerge from such scaling without an explicit reward tied to computational cost. We introduce a Variable Attribution Task in which models infer which variables determine outcomes, given candidate variables, input-output trials, and predefined logical functions. By varying the number of candidate variables and trials, we systematically manipulate task complexity. Both model families exhibit a transition from brute-force to analytic strategies as complexity increases. IT models degrade on XOR and XNOR functions, whereas LRMs remain robust. These findings suggest that models can adjust their reasoning behavior in response to task complexity even without an explicit cost-based reward, providing compelling evidence that resource rationality is an emergent property of inference-time scaling itself.


💡 Research Summary

The paper investigates whether resource‑rational behavior can emerge in large language models (LLMs) simply from inference‑time scaling, without any explicit cost‑based reward. To this end the authors introduce the Variable Attribution Task (VAT), a controlled benchmark in which a model is given N candidate binary variables and T input‑output trials generated by a hidden two‑input Boolean function. The model must identify the unique pair of variables that determines the output. By varying N and T the authors manipulate task complexity, and they define an “information ratio” ρ = 2^T / (N choose 2) to quantify how much experimental information is available relative to the hypothesis space.
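
The information ratio defined above can be computed directly. A minimal sketch (the function name is illustrative, not from the paper):

```python
from math import comb

def information_ratio(n: int, t: int) -> float:
    """Ratio of distinguishable trial-outcome patterns (2^T)
    to the size of the variable-pair hypothesis space, C(N, 2)."""
    return 2 ** t / comb(n, 2)
```

For example, with N = 6 candidates and T = 4 trials, ρ = 16/15 ≈ 1.07, so the available trials only barely cover the hypothesis space.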

Two families of models are examined: (1) Instruction‑tuned (IT) models that have been fine‑tuned on chain‑of‑thought (CoT) data and generate long reasoning traces at test time, and (2) Large Reasoning Models (LRMs) that are additionally trained with reinforcement learning (RL) to treat the reasoning trace as a latent search process and receive reward only for correct final answers. The concrete models evaluated are DeepSeek‑R1, DeepSeek‑V3, Qwen‑Thinking, and Qwen‑Instruct.

VAT admits two natural computational strategies. The “permutation” strategy enumerates each possible variable pair and checks consistency across all trials; it has low working‑memory demand but can be computationally expensive. The “elimination” strategy processes trials one by one, pruning inconsistent pairs and maintaining a global hypothesis set; it requires more memory but can dramatically reduce the search space when the underlying Boolean function is conjunctive or disjunctive, because a single positive (or negative) trial eliminates many candidates. For XOR/XNOR functions, pruning is weak, so elimination becomes costly.
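
The two strategies can be sketched in a few lines, assuming trials are (input-tuple, output) pairs and the two-input Boolean function f is known. The function and variable names here are illustrative, not taken from the paper:

```python
from itertools import combinations

def permutation_strategy(trials, n, f):
    """Enumerate each variable pair in turn and test it against all
    trials. Low working-memory demand (one candidate pair at a time),
    but cost grows with the full C(N, 2) * T search."""
    for i, j in combinations(range(n), 2):
        if all(f(x[i], x[j]) == y for x, y in trials):
            return (i, j)
    return None

def elimination_strategy(trials, n, f):
    """Maintain the global hypothesis set and prune it trial by trial.
    Higher memory demand, but a single informative trial can remove
    many candidate pairs at once."""
    hypotheses = set(combinations(range(n), 2))
    for x, y in trials:
        hypotheses = {(i, j) for i, j in hypotheses if f(x[i], x[j]) == y}
    return hypotheses  # ideally a singleton when T is sufficient
```

For a conjunctive function such as AND, a single trial with output 1 eliminates every pair containing a zero-valued variable, which is exactly why elimination pays off there and not for XOR/XNOR.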

The authors generate 3,000 VAT instances covering ten non‑trivial Boolean functions (conjunctive, disjunctive, XOR/XNOR) across ten values of N (3–16) and six levels of T (minimum required plus 0–5 extra trials). To label the reasoning strategy used by each model response, they employ a high‑capacity external LLM (Kimi‑K2‑Instruct‑0905) that classifies the trace as permutation, elimination, or invalid. Human annotation of a balanced 100‑sample subset shows the judge achieves 86 % accuracy and a Cohen’s κ of 0.76, supporting its reliability.
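
A VAT instance of the kind described can be generated as follows. This is an illustrative sketch only; the paper's exact sampling procedure (e.g., how it guarantees a unique consistent pair) may differ:

```python
import random
from itertools import combinations

def make_vat_instance(n, t, f, seed=0):
    """Sample a hidden variable pair and T random input-output trials
    whose outputs are produced by f applied to that pair."""
    rng = random.Random(seed)
    pair = rng.choice(list(combinations(range(n), 2)))
    trials = []
    for _ in range(t):
        x = tuple(rng.randint(0, 1) for _ in range(n))
        trials.append((x, f(x[pair[0]], x[pair[1]])))
    return pair, trials
```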

Results reveal a clear phase transition in strategy selection as task complexity grows. With small N (≤ 3), both IT and LRM models predominantly use permutation. As N increases, the proportion of elimination rises sharply, indicating that models autonomously shift toward a more memory‑intensive but information‑efficient approach when the hypothesis space expands quadratically. This transition is robust across model families but differs by logical function: for conjunctive and disjunctive functions elimination dominates, whereas for XOR/XNOR the shift is attenuated. DeepSeek‑R1 still shows a modest increase in elimination for XOR‑like tasks, while Qwen‑Thinking sticks almost entirely to permutation.

Performance‑wise, IT models suffer a noticeable drop in accuracy on XOR and XNOR tasks, where pruning offers little benefit and neither strategy is cheap. In contrast, LRMs maintain high accuracy across all function types, suggesting that RL training implicitly internalizes a cost‑aware policy that avoids costly maintenance of large hypothesis sets.

Statistical model comparison (AIC) confirms that log(N²)·T and the information ratio ρ are strong predictors of elimination usage (p < 0.001). The authors interpret these findings as evidence that resource‑rationality can emerge from the mere act of allocating more computation at inference time: models detect the structure of the problem and reallocate internal resources accordingly, without any explicit penalty for computation.
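
The AIC comparison weighs goodness of fit against parameter count. A minimal sketch of the criterion and the complexity predictor, assuming per-model log-likelihoods are available (names are illustrative):

```python
import math

def aic(log_likelihood: float, k: int) -> float:
    """Akaike Information Criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * k - 2 * log_likelihood

def complexity_predictor(n: int, t: int) -> float:
    """The log(N^2) * T predictor reported for elimination usage."""
    return math.log(n ** 2) * t
```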

The paper concludes that inference‑time scaling is not just a performance hack but a mechanism that can give rise to adaptive, cost‑sensitive reasoning akin to human metacognition. Limitations include reliance on an external LLM for strategy labeling and the restriction of VAT to binary Boolean functions with only two relevant variables. Future work should extend the paradigm to richer causal inference settings, multi‑variable interactions, and real‑world scientific reasoning tasks to test the generality of emergent resource rationality.

