QUICK REVIEW

[Paper Review] Do Reasoning Models Enhance Embedding Models?

Wun Yu Chan, Shaojin Chen|arXiv (Cornell University)|Jan 29, 2026

Multimodal Machine Learning Applications | Cited by 0

One-Sentence Summary

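When trained under the same contrastive learning recipe as their base backbones, RLVR-tuned reasoning backbones do not consistently improve embedding models. HRSA shows that global geometry is preserved while local geometry is reorganized, which leads to Manifold Realignment.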
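
The split between global and local geometry can be made concrete with the two geometry-level measures named in the method section: linear CKA and k-NN overlap. Below is a minimal NumPy sketch, not the authors' code. The function names, the use of cosine similarity for neighbours, and k = 10 are assumptions. A pure rotation of the embedding space leaves both scores at their maximum, because neither measure depends on the choice of coordinate basis.

```python
import numpy as np


def linear_cka(x: np.ndarray, y: np.ndarray) -> float:
    """Linear CKA between (n_samples, dim) matrices; the two widths may differ."""
    x = x - x.mean(axis=0, keepdims=True)  # centre each feature
    y = y - y.mean(axis=0, keepdims=True)
    cross = np.linalg.norm(y.T @ x, ord="fro") ** 2
    norm_x = np.linalg.norm(x.T @ x, ord="fro")
    norm_y = np.linalg.norm(y.T @ y, ord="fro")
    return float(cross / (norm_x * norm_y))


def knn_overlap(x: np.ndarray, y: np.ndarray, k: int = 10) -> float:
    """Mean fraction of each sample's k cosine nearest neighbours shared by both spaces."""

    def neighbours(z: np.ndarray) -> np.ndarray:
        z = z / np.linalg.norm(z, axis=1, keepdims=True)
        sims = z @ z.T
        np.fill_diagonal(sims, -np.inf)  # a point is not its own neighbour
        return np.argsort(-sims, axis=1)[:, :k]

    nx, ny = neighbours(x), neighbours(y)
    return float(np.mean([len(set(a) & set(b)) / k for a, b in zip(nx, ny)]))


rng = np.random.default_rng(0)
base = rng.normal(size=(200, 64))               # stand-in "base" embeddings
q, _ = np.linalg.qr(rng.normal(size=(64, 64)))  # random orthogonal matrix
rotated = base @ q                              # same geometry, new basis
cka = linear_cka(base, rotated)
overlap = knn_overlap(base, rotated)
print(cka, overlap)
```

Linear CKA summarises all pairwise structure at once, so it can stay high even when individual neighbourhoods change. k-NN overlap is the measure that would register that kind of local reorganisation.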

ABSTRACT

State-of-the-art embedding models are increasingly derived from decoder-only Large Language Model (LLM) backbones adapted via contrastive learning. Given the emergence of reasoning models trained via Reinforcement Learning with Verifiable Rewards (RLVR), a natural question arises: does enhanced reasoning translate to superior semantic representations when these models serve as embedding initializations? Contrary to expectation, our evaluation on MTEB and BRIGHT reveals a **null effect**: embedding models initialized from RLVR-tuned backbones yield no consistent performance advantage over their base counterparts when subjected to identical training recipes. To unpack this paradox, we introduce **H**ierarchical **R**epresentation **S**imilarity **A**nalysis (HRSA), a framework that decomposes similarity across representation, geometry, and function levels. HRSA reveals that RLVR induces an irreversible reorganization of the latent manifold's local geometry and a reversible drift of the coordinate basis, but it preserves the global manifold geometry and the linear readout. Consequently, subsequent contrastive learning drives strong alignment between base- and reasoning-initialized models, a phenomenon we term **Manifold Realignment**. Empirically, our findings suggest that unlike Supervised Fine-Tuning (SFT), RLVR optimizes trajectories within an existing semantic landscape rather than fundamentally restructuring the landscape itself.
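
The "reversible coordinate basis drift" mentioned in the abstract can be illustrated with the two representation-level tools named in the method section: dimension-wise correlation and orthogonal Procrustes. This is a minimal NumPy sketch under assumed definitions, not the authors' implementation. The function names and the synthetic data are hypothetical. In the sketch, a pure rotation of the coordinates destroys the one-to-one match between dimensions, yet the optimal orthogonal map recovers it almost exactly. That is one way a drift can be "reversible".

```python
import numpy as np


def dimwise_correlation(x: np.ndarray, y: np.ndarray) -> float:
    """Mean Pearson correlation between matching coordinates x[:, j] and y[:, j]."""
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    num = (xc * yc).sum(axis=0)
    den = np.sqrt((xc ** 2).sum(axis=0) * (yc ** 2).sum(axis=0))
    return float(np.mean(num / den))


def procrustes_residual(x: np.ndarray, y: np.ndarray) -> float:
    """Relative error ||x R - y||_F / ||y||_F for the orthogonal R that minimises it.

    Closed form (orthogonal Procrustes): if x^T y = U S V^T, then R = U V^T.
    Both inputs are mean-centred and must have the same width.
    """
    xc = x - x.mean(axis=0)
    yc = y - y.mean(axis=0)
    u, _, vt = np.linalg.svd(xc.T @ yc)
    r = u @ vt
    return float(np.linalg.norm(xc @ r - yc) / np.linalg.norm(yc))


rng = np.random.default_rng(0)
base = rng.normal(size=(500, 32))               # stand-in "base" representations
q, _ = np.linalg.qr(rng.normal(size=(32, 32)))  # random orthogonal matrix
drifted = base @ q                              # pure basis drift, geometry unchanged
corr = dimwise_correlation(base, drifted)       # far from 1.0: dimensions no longer match
resid = procrustes_residual(base, drifted)      # ~0: a rotation undoes the drift
print(corr, resid)
```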

Research Motivation and Objectives

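Evaluate whether, under an identical training recipe, RLVR-optimized reasoning backbones improve text-embedding quality relative to their base backbones.
Quantify and understand the representational changes that RLVR induces in embedding backbones.
Provide a framework that explains why RLVR and supervised fine-tuning (SFT) affect representations differently.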

Proposed Method

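Train paired base and RLVR-tuned backbones as embedding models under the same InfoNCE contrastive objective.
Evaluate on diverse benchmarks: MTEB Multilingual v2, MTEB Code v1, and BRIGHT.
Develop Hierarchical Representation Similarity Analysis (HRSA), which decomposes similarity into representation, geometry, and function levels.
Analyze the representation level with dimension-wise correlation and orthogonal Procrustes, the geometry level with linear CKA and k-NN overlap, and the function level with cross-model linear probes.
Compare the embedding spaces produced by SFT, by RLVR, and by subsequent contrastive learning to identify Manifold Realignment.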
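
For reference, a common in-batch form of the InfoNCE objective is sketched below in NumPy. The summary does not specify the batch construction, the temperature, or whether hard negatives are used. The cosine similarity, in-batch negatives, and temperature=0.05 here are therefore assumptions. Row i of `positives` is the positive for query i, and every other row in the batch acts as a negative.

```python
import numpy as np


def info_nce(queries: np.ndarray, positives: np.ndarray, temperature: float = 0.05) -> float:
    """In-batch InfoNCE: row i of `positives` pairs with query i; other rows are negatives.

    loss = -mean_i log( exp(s_ii / t) / sum_j exp(s_ij / t) ), with s = cosine similarity.
    """
    q = queries / np.linalg.norm(queries, axis=1, keepdims=True)
    p = positives / np.linalg.norm(positives, axis=1, keepdims=True)
    logits = (q @ p.T) / temperature                     # (batch, batch) scaled similarities
    logits = logits - logits.max(axis=1, keepdims=True)  # stabilise the softmax
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_probs)))           # diagonal = positive pairs


rng = np.random.default_rng(0)
queries = rng.normal(size=(8, 16))
matched = info_nce(queries, queries)                     # each positive equals its query
unmatched = info_nce(queries, rng.normal(size=(8, 16)))  # positives unrelated to queries
print(matched < unmatched)  # True
```

Under the paper's setup as summarized, base and RLVR-tuned backbones are both trained with the same InfoNCE objective, so only the initialization differs between each pair.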

Figure 1: Latent manifold and model relationships. CL, SFT, and RLVR denote Contrastive Learning, Supervised Fine-Tuning, and Reinforcement Learning with Verifiable Rewards, respectively. z indicates the representations of the corresponding models. Suffix “-Emb” is added to the model name to indicate…

Experimental Results

Research Questions

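RQ1: Under identical training settings, do embedding models built on RLVR-tuned backbones show a statistically significant improvement over those built on base backbones?
RQ2: How does HRSA reveal the different effects of RLVR and SFT on the representation, geometry, and function of the latent manifold?
RQ3: Can contrastive learning override RLVR-induced drift and align base- and reasoning-initialized embedding models?
RQ4: What mechanisms underlie the observed lack of embedding improvement?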
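
RQ1 is a question about statistical significance over paired comparisons (same task, two backbones). The summary does not say which test the paper uses. The sketch below shows one common choice: a two-sided paired sign-flip permutation test on per-task score differences. The function name, the number of permutations, and the example scores are hypothetical, and the scores are invented placeholders, not results from the paper.

```python
import numpy as np


def paired_sign_flip_test(base_scores, reasoning_scores, n_perm=10_000, seed=0):
    """Two-sided paired permutation (sign-flip) test on per-task score differences.

    Under the null of no effect, each task's difference is equally likely to be
    positive or negative, so random sign flips sample the null distribution.
    Returns (mean difference, p-value).
    """
    diffs = np.asarray(reasoning_scores, dtype=float) - np.asarray(base_scores, dtype=float)
    observed = diffs.mean()
    rng = np.random.default_rng(seed)
    signs = rng.choice([-1.0, 1.0], size=(n_perm, diffs.size))
    null_means = (signs * diffs).mean(axis=1)
    # +1 in numerator and denominator counts the observed labelling itself
    p_value = (np.sum(np.abs(null_means) >= abs(observed)) + 1) / (n_perm + 1)
    return float(observed), float(p_value)


# Hypothetical per-task scores for illustration only (NOT results from the paper).
base = [61.2, 55.0, 70.3, 48.9, 66.1]
reasoning = [61.0, 55.4, 70.1, 49.2, 65.8]
mean_diff, p = paired_sign_flip_test(base, reasoning)
print(f"mean difference = {mean_diff:+.3f}, p = {p:.3f}")
```

A sign-flip test makes no normality assumption about the per-task differences. With only a handful of tasks, however, it has little power, so a null result should be read alongside effect sizes.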

Key Findings

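Embedding models initialized from RLVR-tuned backbones perform on par with, not better than, their base counterparts on MTEB Multilingual v2, MTEB Code v1, and BRIGHT.
HRSA shows that RLVR preserves the global manifold geometry and the linear readout while irreversibly reorganizing local geometry; prolonged training additionally produces a coordinate-basis drift, which the abstract characterizes as reversible.
Contrastive learning realigns base- and RLVR-initialized embedding models. This indicates Manifold Realignment: global structure is preserved while local neighborhoods differ.
RLVR acts as trajectory optimization within a stable semantic landscape, rather than rebuilding the landscape as supervised fine-tuning does.
Cross-model linear probes show higher transferability for RLVR than for SFT, indicating that functional directions remain compatible across models.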
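
The cross-model linear probe in the last finding can be sketched as follows. Fit a linear readout on one model's embeddings, then reuse the same weights, unchanged, on another model's embeddings of the same inputs. This is a minimal NumPy sketch built on assumptions the summary does not state: a closed-form ridge-regression probe onto one-hot labels, a fixed train/held-out split, synthetic data, and hypothetical function names. It only works when both models share the same embedding width.

```python
import numpy as np


def _with_bias(x: np.ndarray) -> np.ndarray:
    return np.hstack([x, np.ones((x.shape[0], 1))])


def fit_linear_probe(x, labels, n_classes, l2=1e-2):
    """Closed-form ridge regression from embeddings (plus bias) to one-hot labels."""
    xb = _with_bias(x)
    targets = np.eye(n_classes)[labels]
    gram = xb.T @ xb + l2 * np.eye(xb.shape[1])
    return np.linalg.solve(gram, xb.T @ targets)


def probe_accuracy(weights, x, labels):
    predictions = np.argmax(_with_bias(x) @ weights, axis=1)
    return float(np.mean(predictions == labels))


def cross_model_probe(x_src, x_tgt, labels, n_classes, n_train):
    """Fit on the source model's first n_train embeddings, then score the SAME
    weights on held-out inputs as embedded by each model (widths must match)."""
    w = fit_linear_probe(x_src[:n_train], labels[:n_train], n_classes)
    held_out = slice(n_train, None)
    return (probe_accuracy(w, x_src[held_out], labels[held_out]),
            probe_accuracy(w, x_tgt[held_out], labels[held_out]))


rng = np.random.default_rng(0)
labels = rng.integers(0, 3, size=600)
means = 3.0 * rng.normal(size=(3, 16))
base = means[labels] + rng.normal(size=(600, 16))  # synthetic "base" embeddings
nearby = base + 0.1 * rng.normal(size=base.shape)  # target sharing readout directions
src_acc, tgt_acc = cross_model_probe(base, nearby, labels, n_classes=3, n_train=300)
print(src_acc, tgt_acc)
```

If the target space had instead undergone an unaligned basis rotation, the reused weights would generally lose accuracy. High transfer is therefore evidence that the two models share readout directions.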

This review was generated by AI and reviewed by human editors.