Anytime Safe PAC Efficient Reasoning
Large Reasoning Models (LRMs) have demonstrated remarkable performance on complex tasks but suffer from high computational costs and latency. While selective thinking strategies improve efficiency by routing easy queries to non-thinking models, existing approaches often incur uncontrollable errors, especially in online settings where the performance loss of a non-thinking model is only partially observed and data are non-stationary. To address this, we propose Betting Probably Approximately Correct (B-PAC) reasoning, a principled method that enables anytime safe and efficient online reasoning under partial feedback. Specifically, we utilize inverse propensity scoring estimators to construct test supermartingales for candidate thresholds, and then dynamically adjust the routing threshold based on the accumulated statistical evidence of safety. Theoretically, we establish the anytime-valid performance loss control and the efficiency of B-PAC reasoning. Extensive experiments demonstrate that B-PAC reasoning significantly reduces computational overhead, decreasing thinking model usage by up to 81.01%, while controlling the performance loss below the user-specified level.
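As a rough illustration of the mechanism described above, the following Python sketch tracks a betting wealth process for a single candidate threshold. It is not the paper's algorithm. The class name `ThresholdBet`, the fixed bet size, the propensity lower bound `p_min`, and the assumption that the performance loss lies in [0, 1] are all choices made here for illustration. The idea is the generic testing-by-betting recipe: an inverse propensity scoring (IPS) term estimates the loss that routing at threshold `tau` would incur, the wealth is multiplied by a factor whose conditional expectation is at most 1 whenever the true risk is at least `epsilon`, and by Ville's inequality the wealth ever reaching 1/alpha is then evidence, at level alpha, that the threshold is safe.

```python
from dataclasses import dataclass


@dataclass
class ThresholdBet:
    """Betting wealth for one candidate threshold (illustrative, hypothetical names)."""

    tau: float                 # candidate uncertainty threshold
    epsilon: float             # tolerated performance loss, assumed in (0, 1)
    p_min: float               # assumed lower bound on the query propensity, in (0, 1]
    bet_fraction: float = 0.5  # fraction of the largest admissible bet, in [0, 1]
    wealth: float = 1.0        # current wealth, starts at 1
    max_wealth: float = 1.0    # running maximum, used for the stopping rule

    def __post_init__(self) -> None:
        # The IPS term is at most 1 / p_min, so any bet up to lam_max keeps
        # every multiplicative factor non-negative.
        lam_max = 1.0 / (1.0 / self.p_min - self.epsilon)
        self.lam = self.bet_fraction * lam_max

    def update(self, u: float, queried: bool, propensity: float, loss: float) -> float:
        """Fold in one round.

        u          : uncertainty score of the non-thinking answer
        queried    : whether the thinking model was invoked, so the loss is observed
        propensity : probability with which the thinking model was invoked this round
        loss       : observed loss in [0, 1]; ignored when queried is False
        """
        if propensity < self.p_min:
            raise ValueError("propensity must be at least p_min")
        routed_cheap = u <= self.tau
        # IPS estimate of the loss incurred by answering with the cheap model;
        # unbiased for the per-round risk when `propensity` is the true query probability.
        z = loss / propensity if (queried and routed_cheap) else 0.0
        # Under the null "risk >= epsilon" this factor has conditional mean <= 1.
        self.wealth *= 1.0 + self.lam * (self.epsilon - z)
        self.max_wealth = max(self.max_wealth, self.wealth)
        return self.wealth

    def certified(self, alpha: float) -> bool:
        """True once the wealth has ever reached 1 / alpha."""
        return self.max_wealth >= 1.0 / alpha
```

In use, one such object would be kept per candidate threshold and updated every round with that round's uncertainty score, propensity, and (if observed) loss. How evidence is combined across thresholds, how the bet size is tuned, and how the query propensity is chosen are not specified in this sketch and are left to the paper.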
💡 Research Summary
Large Reasoning Models (LRMs) achieve state‑of‑the‑art performance on complex tasks but are hampered by high inference cost and latency, especially when they “over‑think” simple queries. Selective‑thinking approaches mitigate this by routing easy inputs to a cheap, non‑thinking model and reserving the expensive “thinking” model for hard cases. However, existing methods rely on heuristic thresholds, lack rigorous risk guarantees, and break down in online settings where (i) performance loss of the cheap model is only observed when the expensive model is invoked (partial feedback) and (ii) the data distribution may drift over time (non‑stationarity).
The paper introduces Betting Probably Approximately Correct (B‑PAC) reasoning, a principled online framework that provides anytime‑valid (ε, α)‑PAC guarantees on the performance loss relative to the thinking model while reducing thinking‑model usage by up to 81.01% in the reported experiments. The core components are:
- Uncertainty‑based routing – For each incoming query Xₜ the cheap model produces an answer and an uncertainty score Uₜ∈