QUICK REVIEW

[Paper Review] Why Pass@k Optimization Can Degrade Pass@1: Prompt Interference in LLM Post-training

Anas Barakat, Souradip Chakraborty|arXiv (Cornell University)|Feb 24, 2026

Topic Modeling | Citations: 0

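One-Line Summary

The paper analyzes why optimizing pass@k can harm pass@1 through gradient conflict caused by prompt interference, and presents a formal framework together with empirical evidence from LLM mathematical reasoning tasks.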

ABSTRACT

Pass@k is a widely used performance metric for verifiable large language model tasks, including mathematical reasoning, code generation, and short-answer reasoning. It defines success if any of $k$ independently sampled solutions passes a verifier. This multi-sample inference metric has motivated inference-aware fine-tuning methods that directly optimize pass@$k$. However, prior work reports a recurring trade-off: pass@k improves while pass@1 degrades under such methods. This trade-off is practically important because pass@1 often remains a hard operational constraint due to latency and cost budgets, imperfect verifier coverage, and the need for a reliable single-shot fallback. We study the origin of this trade-off and provide a theoretical characterization of when pass@k policy optimization can reduce pass@1 through gradient conflict induced by prompt interference. We show that pass@$k$ policy gradients can conflict with pass@1 gradients because pass@$k$ optimization implicitly reweights prompts toward low-success prompts; when these prompts are what we term negatively interfering, their upweighting can rotate the pass@k update direction away from the pass@1 direction. We illustrate our theoretical findings with large language model experiments on verifiable mathematical reasoning tasks.
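
The metric defined in the abstract can be made concrete with a minimal sketch. It assumes each of the $k$ samples for a prompt succeeds independently with the same probability `p`, so that pass@k equals $1 - (1 - p)^k$; the function name `pass_at_k` and the example probabilities are illustrative choices, not taken from the paper.

```python
def pass_at_k(p: float, k: int) -> float:
    """Probability that at least one of k independent samples passes.

    Assumes every sample succeeds independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must lie in [0, 1]")
    if k < 1:
        raise ValueError("k must be a positive integer")
    return 1.0 - (1.0 - p) ** k


# Illustrative values only: a hard prompt (p = 0.1) gains far more from
# extra samples than an easy prompt (p = 0.8) does.
hard_gain = pass_at_k(0.1, 8) - pass_at_k(0.1, 1)
easy_gain = pass_at_k(0.8, 8) - pass_at_k(0.8, 1)
```

With `k = 1` the function returns `p` itself, which is the pass@1 success probability for that prompt.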

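Motivation and Objectives

Explain why optimizing pass@k can reduce pass@1 in LLM post-training.
Introduce the concept of prompt interference and the mechanism of gradient conflict.
Provide sufficient conditions and an analysis of how k affects the alignment between the pass@k and pass@1 gradients.
Present empirical evidence on verifiable mathematical reasoning tasks to support the theory.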

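Proposed Method

Define the pass@k objective and derive its policy gradient in terms of a reweighting factor w_k(p_theta(x)).
Introduce a prompt similarity kernel kappa_theta that measures the similarity between the pass@1 gradients of different prompts.
Formalize prompt interference as positive or negative according to this gradient similarity.
Characterize gradient conflict through the inner product <∇J_k(θ), ∇J_1(θ)> and derive conditions expressed under the reweighted prompt distribution.
Use a toy contextual bandit example to illustrate negative prompt interference and gradient conflict.
Provide analytical conditions under which increasing k produces gradient conflict and thereby degrades pass@1.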
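
The steps above can be sketched as a short derivation. It assumes independent samples, so that the per-prompt pass@k probability is $1 - (1 - p_\theta(x))^k$. The prompt distribution $\mathcal{D}$ and the explicit forms of $w_k$, $\kappa_\theta$, and $a_\theta$ below are assumptions consistent with the description in this review, not formulas quoted from the paper.

```latex
\begin{align*}
J_1(\theta) &= \mathbb{E}_{x \sim \mathcal{D}}\big[ p_\theta(x) \big],
\qquad
J_k(\theta) = \mathbb{E}_{x \sim \mathcal{D}}\big[ 1 - (1 - p_\theta(x))^k \big], \\
\nabla J_k(\theta) &= \mathbb{E}_{x \sim \mathcal{D}}\big[ w_k(p_\theta(x)) \, \nabla p_\theta(x) \big],
\qquad
w_k(p) = k (1 - p)^{k-1}, \\
\kappa_\theta(x, x') &= \big\langle \nabla p_\theta(x), \nabla p_\theta(x') \big\rangle,
\qquad
a_\theta(x) = \mathbb{E}_{x' \sim \mathcal{D}}\big[ \kappa_\theta(x, x') \big]
            = \big\langle \nabla p_\theta(x), \nabla J_1(\theta) \big\rangle, \\
\big\langle \nabla J_k(\theta), \nabla J_1(\theta) \big\rangle
&= \mathbb{E}_{x \sim \mathcal{D}}\big[ w_k(p_\theta(x)) \, a_\theta(x) \big]
 = \mathbb{E}\big[ w_k \big] \, \big\| \nabla J_1(\theta) \big\|^2
   + \operatorname{Cov}\big( w_k(p_\theta(x)), a_\theta(x) \big).
\end{align*}
```

The second line follows from the chain rule, and the last equality uses $\mathbb{E}_x[a_\theta(x)] = \|\nabla J_1(\theta)\|^2$. Under these assumptions, $w_k(p)$ is decreasing in $p$ for $k > 1$, so low-success prompts receive more weight, and the inner product becomes negative only when the covariance term is negative enough to outweigh the first term.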

Figure 1: (a) Empirical trade-off. Under pass@k policy optimization, pass@$k$ increases while pass@$1$ may decrease. We explain this empirically observed trade-off in (b) and (c), which schematically illustrate the pass@$1$ and pass@$k$ ($k>1$) gradients for three prompts and their expectati

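Experimental Results

Research Questions

RQ1: Under what conditions does pass@k optimization conflict with the pass@1 gradient?
RQ2: How does the implicit reweighting in pass@k affect the prompt distribution and the resulting gradient direction?
RQ3: What role does negative prompt interference play in degrading pass@1 when pass@k is optimized?
RQ4: For different distributions of prompt success probabilities, how does the choice of k affect the likelihood of gradient conflict between pass@k and pass@1?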

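Key Findings

The pass@k gradient is a weighted version of the per-prompt pass@1 gradients and can conflict with the pass@1 gradient in expectation.
Pass@k shifts weight toward difficult prompts and can overemphasize negatively interfering prompts, causing gradient conflict.
A sufficient condition for gradient conflict is that the average agreement score a_theta(x) is negative under the prompt distribution induced by pass@k.
The covariance between the pass@k weights and the agreement scores drives the conflict, producing an obtuse angle between the two gradients.
A toy example shows that a single pass@k update step can increase pass@k while decreasing pass@1.
Increasing k amplifies gradient conflict for certain distributions of prompt success probabilities.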
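
The findings above can be illustrated numerically with a minimal sketch. This is a hypothetical two-prompt model with one shared scalar parameter, not the paper's contextual bandit construction. The initial success probabilities, slopes, learning rate, and the weight $w_k(p) = k(1-p)^{k-1}$ (which assumes independent samples) are all illustrative assumptions.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def logit(p):
    return math.log(p / (1.0 - p))


# Hypothetical prompts: (success probability at theta = 0, slope in theta).
# The easy prompt improves as theta grows; the hard prompt improves as
# theta shrinks, so the two prompts interfere negatively.
PROMPTS = [(0.8, 3.0), (0.1, -2.0)]


def success_probs(theta):
    return [sigmoid(logit(p0) + slope * theta) for p0, slope in PROMPTS]


def objective(theta, k):
    """Average pass@k over the two equally likely prompts."""
    return sum(1.0 - (1.0 - p) ** k for p in success_probs(theta)) / len(PROMPTS)


def objective_grad(theta, k):
    """d/dtheta of objective: mean of w_k(p) * dp/dtheta over prompts."""
    total = 0.0
    for (_, slope), p in zip(PROMPTS, success_probs(theta)):
        dp = slope * p * (1.0 - p)            # derivative of the sigmoid
        weight = k * (1.0 - p) ** (k - 1)     # equals 1 when k = 1
        total += weight * dp
    return total / len(PROMPTS)


def one_step_report(k=4, lr=0.5, theta=0.0):
    """Take one gradient-ascent step on pass@k and report both metrics."""
    grad_1 = objective_grad(theta, 1)
    grad_k = objective_grad(theta, k)
    new_theta = theta + lr * grad_k
    return {
        "inner_product": grad_1 * grad_k,
        "pass1_before": objective(theta, 1),
        "pass1_after": objective(new_theta, 1),
        "passk_before": objective(theta, k),
        "passk_after": objective(new_theta, k),
    }


report = one_step_report()
```

In this setup the hard prompt receives a much larger weight than the easy one at `k = 4`, so the pass@k gradient points opposite to the pass@1 gradient. The single update step raises the average pass@4 while lowering the average pass@1.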

Figure 2: Cosine kernel heatmap: $\cos(\nabla p_\theta(x), \nabla p_\theta(x'))$ for subsamples of prompts: 120 easy and 80 hard among a total of 6000 samples. Blue regions correspond to negative prompt interference.
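
A cosine kernel of the kind shown in Figure 2 could be computed along the following lines. This is a minimal sketch that assumes the per-prompt gradients are already available as plain lists of floats. The function name `cosine_kernel`, the zero-gradient convention, and the three example vectors are hypothetical and are not the paper's code or data.

```python
import math


def cosine_kernel(grads):
    """Pairwise cosine similarity between per-prompt gradient vectors."""
    norms = [math.sqrt(sum(g * g for g in vec)) for vec in grads]
    n = len(grads)
    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if norms[i] == 0.0 or norms[j] == 0.0:
                continue  # convention chosen here: zero gradient gives similarity 0
            dot = sum(a * b for a, b in zip(grads[i], grads[j]))
            kernel[i][j] = dot / (norms[i] * norms[j])
    return kernel


# Hypothetical gradients for three prompts (illustrative numbers only).
example_grads = [[1.0, 0.0], [0.8, 0.6], [-1.0, 0.2]]
K = cosine_kernel(example_grads)

# Prompt pairs with negative similarity, i.e. negative prompt interference.
negative_pairs = [
    (i, j)
    for i in range(len(K))
    for j in range(i + 1, len(K))
    if K[i][j] < 0
]
```

Here the third prompt's gradient points away from the other two, so `negative_pairs` contains the pairs involving it. Entries of this kind would correspond to the blue regions described in the caption.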

This review was generated by AI and checked by a human editor.