Do Reasoning Models Enhance Embedding Models?

Notice: This research summary and analysis were generated automatically using AI technology. For full accuracy, please refer to the original arXiv source.

State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model (LLM) backbones adapted via contrastive learning. Given the emergence of reasoning models trained via Reinforcement Learning with Verifiable Rewards (RLVR), a natural question arises: does enhanced reasoning translate into superior semantic representations when these models serve as embedding initializations? Contrary to expectation, our evaluation on MTEB and BRIGHT reveals a null effect: embedding models initialized from RLVR-tuned backbones yield no consistent performance advantage over their base counterparts when subjected to identical training recipes. To unpack this paradox, we introduce Hierarchical Representation Similarity Analysis (HRSA), a framework that decomposes similarity across representation, geometry, and function levels. HRSA reveals that while RLVR induces an irreversible reorganization of the latent manifold's local geometry and a reversible drift of its coordinate basis, it preserves the global manifold geometry and linear readout. Consequently, subsequent contrastive learning drives strong alignment between base- and reasoning-initialized models, a phenomenon we term Manifold Realignment. Empirically, our findings suggest that unlike Supervised Fine-Tuning (SFT), RLVR optimizes trajectories within an existing semantic landscape rather than fundamentally restructuring the landscape itself.


💡 Research Summary

The paper investigates whether reasoning models trained with Reinforcement Learning with Verifiable Rewards (RLVR) can improve the quality of text embeddings when used as backbones for contrastive‑learning (CL) based embedding models. Recent advances have shown that RLVR dramatically boosts complex problem‑solving and reasoning abilities in large language models (LLMs). The authors therefore hypothesize that such enhanced reasoning might translate into richer semantic representations for downstream embedding tasks.

To test this, they take several state‑of‑the‑art decoder‑only LLMs (e.g., Qwen, DeepSeek) and create matched pairs: a base model (M_base) and its RLVR‑fine‑tuned counterpart (M_reason). Both are stripped of the language‑model head, have their hidden states pooled into a single embedding vector, and are trained with the same InfoNCE contrastive objective on a variety of datasets. Evaluation is performed on the multilingual, code, and general‑purpose benchmarks of MTEB as well as the BRIGHT suite, covering retrieval, clustering, and semantic similarity tasks.
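To make the shared training recipe concrete, here is a minimal NumPy sketch of an in‑batch InfoNCE objective. The temperature value and the in‑batch‑negatives scheme are illustrative assumptions, not the paper's exact recipe:

```python
import numpy as np

def info_nce_loss(query_emb, doc_emb, temperature=0.05):
    """In-batch InfoNCE: the i-th document is the positive for the
    i-th query; every other document in the batch acts as a negative.
    The temperature of 0.05 is an assumed, typical value."""
    q = query_emb / np.linalg.norm(query_emb, axis=1, keepdims=True)
    d = doc_emb / np.linalg.norm(doc_emb, axis=1, keepdims=True)
    logits = q @ d.T / temperature                 # (B, B) scaled cosine similarities
    logits -= logits.max(axis=1, keepdims=True)    # numerical stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return -np.mean(np.diag(log_probs))            # cross-entropy on the diagonal
```

Because the objective only sees the pooled vectors, the same loss can be applied identically to M_base and M_reason, which is what makes the matched-pair comparison possible.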

Across all settings, the RLVR‑initialized embedding models (M_Emb_reason) achieve performance statistically indistinguishable from the base‑initialized models (M_Emb_base). The performance gaps are near zero (±0.1–0.3 points), in stark contrast to supervised fine‑tuning (SFT), which often yields noticeable shifts. This “null effect” suggests that RLVR does not inherently enhance embedding quality under identical CL recipes.

To understand why, the authors introduce HRSA (Hierarchical Representation Similarity Analysis), a three‑level framework inspired by RSA.

  1. Representation Level examines coordinate‑wise feature alignment using dimension‑wise correlation and Orthogonal Procrustes analysis. RLVR‑tuned models remain highly aligned with their base counterparts; any residual drift can be corrected by a simple orthogonal rotation or permutation.
  2. Geometry Level assesses the shape of the latent manifold independent of coordinate basis. Linear CKA measures global geometry (isometric transformations allowed), while k‑NN overlap captures local neighborhood preservation. Results reveal that RLVR preserves the global manifold geometry (isometric) but reorganizes local neighborhoods, indicating irreversible local distortions but a stable overall shape.
  3. Function Level evaluates downstream task behavior via cross‑model linear probes. Both RLVR and base models share virtually identical linear readouts, confirming functional equivalence.
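The first two levels above can be sketched with standard similarity metrics. The following is a minimal NumPy/SciPy illustration, not the paper's exact HRSA implementation; the Euclidean k‑NN metric and the choice of k are assumptions, and the function‑level probe is omitted since it requires task labels:

```python
import numpy as np
from scipy.linalg import orthogonal_procrustes

def procrustes_residual(X, Y):
    """Representation level: relative residual after the best orthogonal
    map X @ R ~ Y. A small residual means the drift is a reversible
    rotation/permutation of the coordinate basis."""
    R, _ = orthogonal_procrustes(X, Y)
    return np.linalg.norm(X @ R - Y) / np.linalg.norm(Y)

def linear_cka(X, Y):
    """Geometry level (global): linear CKA is invariant to orthogonal
    transforms and isotropic scaling, so it probes manifold shape
    independent of the coordinate basis."""
    X = X - X.mean(axis=0)
    Y = Y - Y.mean(axis=0)
    hsic = np.linalg.norm(Y.T @ X) ** 2
    return hsic / (np.linalg.norm(X.T @ X) * np.linalg.norm(Y.T @ Y))

def knn_overlap(X, Y, k=10):
    """Geometry level (local): mean overlap between each point's
    k nearest neighbours in the two representation spaces."""
    def knn(Z):
        d = np.linalg.norm(Z[:, None] - Z[None, :], axis=-1)
        np.fill_diagonal(d, np.inf)                # exclude self-matches
        return np.argsort(d, axis=1)[:, :k]
    nx, ny = knn(X), knn(Y)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nx, ny)]))
```

In this framing, the paper's finding reads as: base vs. RLVR backbones show a small Procrustes residual and high linear CKA (global shape preserved), but reduced k‑NN overlap (local neighbourhoods reorganized).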

Combining these analyses, the authors propose the “Manifold Realignment” phenomenon: RLVR optimizes trajectories within an existing semantic landscape rather than reshaping the landscape itself. When the same CL fine‑tuning is later applied, the contrastive objective realigns the latent space, effectively erasing the modest coordinate drift introduced by RLVR. Consequently, the final embedding spaces of RLVR‑initialized and base‑initialized models become nearly identical.

The study concludes that current RLVR methods do not improve embedding quality, but they do not degrade it either; they maintain the semantic backbone while enhancing reasoning capabilities. HRSA offers a systematic toolkit for future work to dissect how various fine‑tuning regimes affect representation, geometry, and function, and highlights that improvements in reasoning do not automatically translate into better embeddings.

