QUICK REVIEW

[Paper Digest] Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training

Anas Barakat, Souradip Chakraborty|arXiv (Cornell University)|Feb 24, 2026

Topic Modeling | Cited by 0

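One-sentence summary

The paper analyzes why optimizing for pass@k can hurt pass@1, attributing the effect to gradient conflict caused by prompt interference; it proposes a formal framework and validates it empirically on LLM mathematical reasoning tasks.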

ABSTRACT

Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@k improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@k policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@k update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.

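Motivation and objectives

Explain why optimizing pass@k can reduce pass@1 in LLM post-training.
Introduce the concept of prompt interference and the gradient-conflict mechanism.
Give necessary and sufficient conditions, and analyze how k affects the alignment between the pass@k and pass@1 gradients.
Provide empirical evidence on verifiable mathematical reasoning tasks to illustrate the theory.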

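Proposed method

Define the pass@k objective and derive its policy gradient, which carries a reweighting coefficient w_k(p_theta(x)).
Introduce a prompt similarity kernel kappa_theta that measures the similarity between per-prompt pass@1 gradients.
Formalize prompt interference as positive or negative according to the gradient similarity between prompts.
Characterize gradient conflict through the inner product <∇J_k(θ), ∇J_1(θ)>, and derive conditions involving the reweighted prompt distribution.
Use a toy contextual bandit example to illustrate negative prompt interference and gradient conflict.
Give analytical conditions under which increasing k can lead to gradient conflict and pass@1 degradation.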
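The reweighting coefficient and the inner-product test above can be sketched in a few lines. This is a minimal illustration, not the paper's code. It assumes the chain-rule form w_k(p) = k(1 - p)^(k - 1), which follows from the per-prompt identity pass@k = 1 - (1 - p)^k; the paper's normalization may differ. It also assumes a uniform distribution over prompts. The function names and the two per-prompt gradient vectors are hypothetical.

```python
def pass_at_k(p, k):
    """Per-prompt pass@k when each of k independent samples succeeds with probability p."""
    return 1.0 - (1.0 - p) ** k


def pass_at_k_weight(p, k):
    """Chain-rule weight d(pass@k)/dp = k * (1 - p)^(k - 1); equals 1 for k = 1."""
    return k * (1.0 - p) ** (k - 1)


def weighted_mean_gradient(weights, grads):
    """Average of weight_i * grads[i] over prompts (uniform prompt distribution assumed)."""
    n, dim = len(grads), len(grads[0])
    return [sum(w * g[j] for w, g in zip(weights, grads)) / n for j in range(dim)]


def gradient_inner_product(probs, grads, k):
    """<grad J_k, grad J_1>, where grads[i] stands in for the gradient of p_theta(x_i)."""
    grad_j1 = weighted_mean_gradient([1.0] * len(grads), grads)
    grad_jk = weighted_mean_gradient([pass_at_k_weight(p, k) for p in probs], grads)
    return sum(a * b for a, b in zip(grad_jk, grad_j1))


# Made-up example: an easy prompt (p = 0.9) and a hard prompt (p = 0.1)
# whose gradients point in roughly opposite directions.
probs = [0.9, 0.1]
grads = [[1.0, 0.0], [-0.5, 0.2]]

for k in (1, 4):
    weights = [pass_at_k_weight(p, k) for p in probs]
    print(k, weights, gradient_inner_product(probs, grads, k))
```

With these made-up numbers the inner product is positive for k = 1 and negative for k = 4, because at k = 4 almost all of the weight sits on the hard prompt, whose gradient opposes the average pass@1 direction.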

Figure 1: (a) Empirical trade-off. Under pass@$k$ policy optimization, pass@$k$ increases while pass@$1$ may decrease. We explain this empirically observed trade-off in (b) and (c), which schematically illustrate the pass@$1$ and pass@$k$ ($k>1$) gradients for three prompts and their expectati…

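Experimental results

Research questions

RQ1: Under what conditions does pass@k optimization conflict with the pass@1 gradient?
RQ2: How does the implicit reweighting in pass@k affect the prompt distribution and the resulting gradient direction?
RQ3: What role does negative prompt interference play in degrading pass@1 when optimizing pass@k?
RQ4: How does the choice of k affect the likelihood of conflict between the pass@k and pass@1 gradients?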

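Key findings

The pass@k gradient is a reweighted version of the per-prompt pass@1 gradients, and can conflict with the pass@1 gradient on average.
Pass@k reweights toward harder prompts, which can raise the weight on negatively interfering prompts and thereby trigger gradient conflict.
A sufficient condition for gradient conflict holds if the expected average agreement a_theta(x) under the pass@k-induced prompt distribution is negative.
The covariance between the pass@k weights and the agreement scores can drive the conflict, producing an obtuse angle between the two gradients.
A toy example shows that a single pass@k update can raise pass@k while lowering pass@1.
Under certain distributions of prompt success probabilities, increasing k can amplify gradient conflict.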
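The single-step effect can be reproduced in a small numerical sketch. The setup below is a hypothetical construction, not the paper's contextual bandit example. It uses one scalar parameter theta and two prompts with success probability sigmoid(bias + slope * theta), where the hard prompt's slope has the opposite sign to the easy prompt's. It takes one exact gradient-ascent step on the mean pass@k instead of a sampled policy-gradient step. The constants and the step size are assumptions chosen for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical toy: (bias, slope) per prompt, with p_x(theta) = sigmoid(bias + slope * theta).
PROMPTS = [
    (2.0, 1.0),    # easy prompt, helped by increasing theta
    (-2.0, -0.5),  # hard prompt, helped by decreasing theta
]


def success_probs(theta):
    return [sigmoid(b + c * theta) for b, c in PROMPTS]


def mean_pass_at_k(theta, k):
    """Mean over prompts of 1 - (1 - p)^k."""
    probs = success_probs(theta)
    return sum(1.0 - (1.0 - p) ** k for p in probs) / len(probs)


def grad_mean_pass_at_k(theta, k):
    """Exact derivative of mean_pass_at_k with respect to theta."""
    total = 0.0
    for (_, slope), p in zip(PROMPTS, success_probs(theta)):
        dp_dtheta = slope * p * (1.0 - p)   # derivative of the sigmoid
        weight = k * (1.0 - p) ** (k - 1)   # pass@k reweighting
        total += weight * dp_dtheta
    return total / len(PROMPTS)


def one_step(theta, k, lr=1.0):
    """One gradient-ascent step on the mean pass@k objective."""
    return theta + lr * grad_mean_pass_at_k(theta, k)


theta0 = 0.0
theta1 = one_step(theta0, k=4)
for k in (1, 4):
    print(f"pass@{k}: {mean_pass_at_k(theta0, k):.4f} -> {mean_pass_at_k(theta1, k):.4f}")
```

In this toy the pass@1 gradient at theta = 0 is positive while the pass@4 gradient is negative, so the pass@4 step moves theta the wrong way for pass@1. Mean pass@4 rises, and mean pass@1 falls slightly from its starting value of about 0.5.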

Figure 2: Cosine kernel heatmap: $\cos(\nabla p_\theta(x), \nabla p_\theta(x'))$ for subsamples of prompts: 120 easy and 80 hard among a total of 6000 samples. Blue regions correspond to negative prompt interference.
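A matrix like the one in this heatmap can be computed from per-prompt gradient vectors as pairwise cosine similarities. The sketch below is a minimal illustration with made-up two-dimensional gradients. In practice the vectors would be gradients of p_theta(x) with respect to the model parameters, and how the paper obtains them is not shown here. The helper names and the zero-gradient convention are assumptions.

```python
import math


def cosine(u, v):
    """Cosine similarity between two gradient vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # convention chosen here for zero gradients
    return dot / (norm_u * norm_v)


def cosine_kernel(grads):
    """Matrix whose (i, j) entry is cos(grads[i], grads[j])."""
    return [[cosine(g, h) for h in grads] for g in grads]


def negative_pairs(kernel):
    """Index pairs (i, j) with i < j whose gradients have negative cosine similarity."""
    n = len(kernel)
    return [(i, j) for i in range(n) for j in range(i + 1, n) if kernel[i][j] < 0.0]


# Made-up per-prompt gradients, for illustration only.
grads = [[1.0, 0.0], [0.8, 0.6], [-0.6, 0.8]]
kernel = cosine_kernel(grads)
print(negative_pairs(kernel))
```

Pairs returned by `negative_pairs` correspond to the blue cells of such a heatmap: prompts whose pass@1 gradients point in opposing directions.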

This digest was generated by AI and reviewed by human editors.