ORION: Teaching Language Models to Reason Efficiently in the Language of Thought
Large Reasoning Models (LRMs) achieve state-of-the-art performance in mathematics, code generation, and task planning. However, their reliance on long chains of verbose "thinking" tokens results in high latency, redundancy, and incoherent reasoning paths. Inspired by the Language of Thought Hypothesis, which posits that human reasoning operates over a symbolic, compositional mental language called Mentalese, we introduce a cognitively motivated framework that trains models to reason in a similarly compact style. Mentalese encodes abstract reasoning as ultra-compressed, structured tokens, enabling models to solve complex problems in far fewer steps. To achieve both efficiency and accuracy, we propose Shorter Length Preference Optimization (SLPO), a reinforcement learning method that directly optimizes models for concise yet correct reasoning: it rewards shorter solutions that maintain high accuracy while still permitting longer reasoning when problem complexity demands it. Applied to Mentalese-aligned models, SLPO achieves substantially higher compression rates, preserving the benefits of detailed thinking without the computational overhead and yielding the best-performing models at each compression level along the performance-efficiency Pareto frontier. Across mathematical benchmarks (AIME 2024 and 2025, Minerva-Math, OlympiadBench, Math500, and AMC), our ORION models generate reasoning traces with 4-16× fewer tokens, achieve up to 5× lower inference latency, and reduce training costs by 7-9× relative to the base DeepSeek R1 Distilled model, while maintaining 90-98% of the baseline accuracy. ORION models also surpass Claude and ChatGPT-4o by up to 5% in accuracy while maintaining 2× compression. Our findings demonstrate that Mentalese-style compressed reasoning offers a breakthrough toward human-like cognitive efficiency, opening new possibilities for real-time, cost-effective reasoning without sacrificing accuracy.
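The abstract does not give SLPO's exact formulation, but the description (reward shorter solutions that stay correct, without forbidding longer reasoning when needed) suggests a length-aware reward. The sketch below is a minimal, hypothetical illustration of that idea; the function name, the reference-length parameter, and the weighting are assumptions, not the paper's actual objective.

```python
def slpo_reward(is_correct: bool, num_tokens: int, ref_tokens: int,
                length_weight: float = 0.5) -> float:
    """Hypothetical SLPO-style reward (an assumption, not the paper's
    formula): correct answers earn a bonus that grows as the reasoning
    trace shrinks relative to a reference length. Incorrect answers get
    no reward at all, so the model is never paid for being merely short,
    and longer-than-reference traces are only mildly penalized."""
    if not is_correct:
        return 0.0
    # Relative token savings in (-inf, 1]; negative if longer than reference.
    savings = 1.0 - num_tokens / ref_tokens
    # Base reward 1.0 for correctness, plus a bounded length bonus/penalty.
    return 1.0 + length_weight * max(-1.0, min(1.0, savings))

# A correct 200-token trace against an 800-token reference:
# savings = 0.75, reward = 1.0 + 0.5 * 0.75 = 1.375
```

Clamping the length term keeps correctness dominant: even a very long but correct trace still scores at least 0.5, consistent with "flexibly allowing longer reasoning when complexity demands it."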
[Figure 1: Average accuracy (%) on mathematical benchmarks versus compression rate (×) relative to DeepSeek R1 1.5B.]