QUICK REVIEW

[Paper Review] Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training

Anas Barakat, Souradip Chakraborty|arXiv (Cornell University)|Feb 24, 2026
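Topic Modeling | Cited by 0
One-sentence summary

The paper analyzes why optimizing for pass@k can hurt pass@1: prompt interference induces gradient conflict. It proposes a formal framework for this mechanism and validates it empirically on LLM mathematical reasoning tasks.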

ABSTRACT

Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@k improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@k policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@k update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.

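Motivation and objectives

  • Explain why optimizing pass@k can reduce pass@1 in LLM post-training.
  • Introduce the concept of prompt interference and the gradient-conflict mechanism.
  • Give necessary and sufficient conditions, and analyze how k affects the alignment between the pass@k and pass@1 gradients.
  • Provide empirical evidence on verifiable mathematical reasoning tasks to illustrate the theory.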

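Proposed method

  • Define the pass@k objective and derive its policy gradient, which carries a reweighting coefficient w_k(p_θ(x)).
  • Introduce a prompt similarity kernel κ_θ that measures the similarity between per-prompt pass@1 gradients.
  • Formalize prompt interference as positive or negative, based on the gradient similarity between prompts.
  • Characterize gradient conflict through the inner product <∇J_k(θ), ∇J_1(θ)>, and derive conditions that involve the reweighted prompt distribution.
  • Use a toy contextual multi-armed bandit example to illustrate negative prompt interference and gradient conflict.
  • Give analytical conditions under which increasing k can cause gradient conflict and degrade pass@1.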
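The reweighting and the inner-product criterion in the list above can be written out explicitly. The block below is a sketch, assuming the standard definition of pass@k as success of at least one of $k$ i.i.d. samples with per-prompt success probability $p_\theta(x)$. The prompt distribution symbol $\mathcal{D}$ and the unnormalized inner-product form of $\kappa_\theta$ and $a_\theta$ are assumptions made here for illustration; the paper may normalize them differently.

```latex
\begin{align*}
J_k(\theta) &= \mathbb{E}_{x \sim \mathcal{D}}\bigl[\,1 - (1 - p_\theta(x))^k\,\bigr], \\
\nabla J_k(\theta) &= \mathbb{E}_{x \sim \mathcal{D}}\bigl[\,w_k(p_\theta(x))\,\nabla p_\theta(x)\,\bigr],
\qquad w_k(p) = k\,(1 - p)^{k-1}, \\
\kappa_\theta(x, x') &= \langle \nabla p_\theta(x), \nabla p_\theta(x') \rangle,
\qquad a_\theta(x) = \mathbb{E}_{x' \sim \mathcal{D}}\bigl[\kappa_\theta(x, x')\bigr]
  = \langle \nabla p_\theta(x), \nabla J_1(\theta) \rangle, \\
\langle \nabla J_k(\theta), \nabla J_1(\theta) \rangle
&= \mathbb{E}_{x \sim \mathcal{D}}\bigl[\,w_k(p_\theta(x))\,a_\theta(x)\,\bigr]
 = \mathbb{E}[w_k]\,\lVert \nabla J_1(\theta) \rVert^2 + \operatorname{Cov}\bigl(w_k(p_\theta(x)),\, a_\theta(x)\bigr).
\end{align*}
```

Under these assumptions $w_1 \equiv 1$, and for $k > 1$ the weight $w_k(p)$ decreases in $p$, so low-success prompts are upweighted. The last line uses $\mathbb{E}[a_\theta(x)] = \lVert \nabla J_1(\theta) \rVert^2 \ge 0$: the inner product can only turn negative when the covariance term is negative enough, that is, when the upweighted prompts tend to be the negatively interfering ones.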
Figure 1: (a) Empirical trade-off. Under pass@k policy optimization, pass@$k$ increases while pass@$1$ may decrease. We explain this empirically observed trade-off in (b) and (c), which schematically illustrate the pass@$1$ and pass@$k$ ($k>1$) gradients for three prompts and their expectati

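Experimental results

Research questions

  • RQ1: Under what conditions does pass@k optimization conflict with the pass@1 gradient?
  • RQ2: How does the implicit reweighting in pass@k affect the prompt distribution and the resulting gradient direction?
  • RQ3: What role does negative prompt interference play in degrading pass@1 when optimizing pass@k?
  • RQ4: How does the choice of k affect the likelihood of gradient conflict between pass@k and pass@1?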

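Main findings

  • The pass@k gradient is a reweighted average of the per-prompt pass@1 gradients, and it can conflict with the pass@1 gradient on average.
  • Pass@k reweights toward harder prompts, which can raise the weight on negatively interfering prompts and thereby trigger gradient conflict.
  • A sufficient condition for gradient conflict is that the expectation of the average agreement a_θ(x) under the pass@k-induced prompt distribution is negative.
  • The covariance between the pass@k weights and the agreement scores can drive conflict, producing an obtuse angle between the two gradients.
  • A toy example shows that a single pass@k update can raise pass@k while lowering pass@1.
  • For certain distributions of prompt success probabilities, increasing k can amplify gradient conflict.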
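The one-step effect described in the findings above can be reproduced in a few lines. The sketch below is a hypothetical two-prompt, two-arm bandit built for this review, not the paper's own construction: a context-blind policy with a single logit `theta` picks arm 0 with probability `sigmoid(theta)`; prompt A (frequency 0.7) is solved by arm 0 and prompt B (frequency 0.3) by arm 1, so their success gradients point in opposite directions. It assumes pass@k is `1 - (1 - p)**k` with implicit weight `k * (1 - p)**(k - 1)`; all names and constants are illustrative.

```python
import numpy as np

# Hypothetical setup: prompt frequencies for the easy prompt A and hard prompt B.
PROMPT_PROBS = np.array([0.7, 0.3])


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def success_probs(theta):
    """Per-prompt success probabilities [p_A, p_B] under the shared logit."""
    s = sigmoid(theta)
    return np.array([s, 1.0 - s])


def success_grads(theta):
    """d p_A / d theta and d p_B / d theta; equal magnitude, opposite sign."""
    s = sigmoid(theta)
    g = s * (1.0 - s)
    return np.array([g, -g])


def pass_at_k(theta, k):
    """Expected pass@k over prompts, assuming k independent samples."""
    p = success_probs(theta)
    return float(PROMPT_PROBS @ (1.0 - (1.0 - p) ** k))


def pass_at_k_grad(theta, k):
    """Gradient of pass@k: per-prompt gradients reweighted by k(1-p)^(k-1)."""
    p = success_probs(theta)
    weights = k * (1.0 - p) ** (k - 1)
    return float(PROMPT_PROBS @ (weights * success_grads(theta)))


theta0, k, lr = 2.0, 4, 1.0          # start where prompt A is easy, B is hard
grad_1 = pass_at_k_grad(theta0, 1)   # pass@1 ascent direction
grad_k = pass_at_k_grad(theta0, k)   # pass@k ascent direction
theta1 = theta0 + lr * grad_k        # one gradient-ascent step on pass@k

print("grad pass@1:", grad_1, "grad pass@k:", grad_k)
print("pass@1 before/after:", pass_at_k(theta0, 1), pass_at_k(theta1, 1))
print("pass@k before/after:", pass_at_k(theta0, k), pass_at_k(theta1, k))
```

In this setup the two gradients have opposite signs, and the single pass@4 step raises pass@4 while lowering pass@1, because the hard prompt B receives almost all of the implicit weight. With one shared parameter the two prompts interfere as strongly as possible; a real model would typically show partial rather than total conflict.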
Figure 2: Cosine kernel heatmap: $\cos(\nabla p_\theta(x), \nabla p_\theta(x'))$ for subsamples of prompts: 120 easy and 80 hard among a total of 6000 samples. Blue regions correspond to negative prompt interference.
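A cosine kernel of the kind shown in Figure 2 can be computed from a matrix of per-prompt gradients. The following is a minimal sketch, assuming the gradients have already been estimated and flattened into the rows of an array; the function name `cosine_kernel` and the three example gradients are hypothetical and are not taken from the paper's experiments.

```python
import numpy as np


def cosine_kernel(grads, eps=1e-12):
    """Pairwise cosine similarity between rows of `grads`.

    grads: array of shape (n_prompts, n_params), one flattened gradient
    of the success probability per prompt.
    """
    grads = np.asarray(grads, dtype=float)
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    unit = grads / np.maximum(norms, eps)  # guard against zero gradients
    return unit @ unit.T


# Hypothetical gradients for three prompts in a two-parameter model.
grads = np.array([
    [1.0, 0.0],
    [0.8, 0.6],
    [-1.0, 0.2],
])

K = cosine_kernel(grads)

# Index pairs (i, j) with i < j whose gradients point in opposing directions,
# i.e. the negatively interfering pairs under this cosine criterion.
negative_pairs = np.argwhere(np.triu(K < 0, k=1))

print(np.round(K, 3))
print(negative_pairs)
```

Entries below zero mark prompt pairs for which improving one prompt's success probability tends to reduce the other's to first order, which is what the blue regions of the heatmap indicate.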

This review was generated by AI and reviewed by a human editor.