Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens

Latent Thoughts Tuning: Bridging Context and Reasoning with Fused Information in Latent Tokens
Notice: This research summary and analysis were automatically generated using AI technology. For absolute accuracy, please refer to the [Original Paper Viewer] below or the Original ArXiv Source.

While explicit Chain-of-Thought (CoT) equips Large Language Models (LLMs) with strong reasoning capabilities, it requires models to verbalize every intermediate step in text tokens, constraining the model thoughts to the discrete vocabulary space. Recently, reasoning in continuous latent space has emerged as a promising alternative, enabling more robust inference and flexible computation beyond discrete token constraints. However, current latent paradigms often suffer from feature collapse and instability, stemming from distribution mismatches when recurrently using hidden states as the input embeddings, or alignment issues when relying on assistant models. To address this, we propose Latent Thoughts Tuning (LT-Tuning), a framework that redefines how latent thoughts are constructed and deployed. Instead of relying solely on raw hidden states, our method introduces a Context-Prediction-Fusion mechanism that jointly leveraging contextual hidden states and predictive semantic guidance from the vocabulary embedding space. Combined with a progressive three-stage curriculum learning pipeline, LT-Tuning also enables dynamically switching between latent and explicit thinking modes. Experiments demonstrate that our method outperforms existing latent reasoning baselines, effectively mitigating feature collapse and achieving robust reasoning accuracy.


💡 Research Summary

The paper addresses the limitations of explicit Chain‑of‑Thought (CoT) prompting, which forces large language models (LLMs) to verbalize every intermediate reasoning step as discrete tokens. While recent work has explored reasoning directly in continuous latent spaces, existing approaches suffer from two major problems: (1) feature collapse and instability caused by a distribution mismatch when raw hidden states are reused as input embeddings, especially in models with untied input‑output embeddings; and (2) static allocation of latent reasoning steps, which wastes computation on easy sub‑problems and provides insufficient depth for hard ones.
LT‑Tuning (Latent Thoughts Tuning) proposes a unified solution. Its core is the Context‑Prediction Fusion mechanism, which constructs each latent token by linearly blending the contextual hidden state from a chosen transformer layer with a probability‑weighted embedding derived from the model’s vocabulary distribution. This fusion aligns the latent input with the model’s embedding manifold while preserving contextual information, thereby mitigating feature collapse.
A confidence‑driven dynamic insertion strategy further improves efficiency: a special control token is inserted only when the model’s prediction confidence falls below a threshold τ, directing latent reasoning to uncertain steps and allowing standard text generation elsewhere.
Training proceeds through a three‑stage curriculum. Stage 1 fine‑tunes the base model on standard CoT data to acquire basic step‑by‑step reasoning. Stage 2 introduces tokens based on confidence and initially uses raw hidden states as latent embeddings. Stage 3 replaces these with the fused representations, stabilizing latent‑space optimization.
Experiments on Llama‑3.2‑1B, 3B, and 8B models across four mathematical reasoning benchmarks (GSM8K, ASDiv‑Aug, MultiArith, SVAMP) show consistent gains over prior latent reasoning baselines such as Coconut and Soft‑Thinking, with up to a 4.3 percentage‑point improvement. The method scales well: unlike earlier approaches that degrade on larger models, LT‑Tuning remains robust, and a lightweight adapter suffices for models with separate input and output embeddings.
In summary, LT‑Tuning simultaneously resolves the distribution mismatch and static computation issues that have limited latent‑space reasoning, offering a post‑training plug‑in that can be applied to off‑the‑shelf LLMs. It delivers dynamic, stable, and semantically rich latent reasoning while preserving the strengths of explicit CoT, narrowing the performance gap between discrete and continuous reasoning paradigms.


Comments & Academic Discussion

Loading comments...

Leave a Comment