QUICK REVIEW

[Paper Review] Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration

Dayu Wang, Jiaye Yang|arXiv (Cornell University)|Jan 6, 2026

Multimodal Machine Learning Applications|Citations: 0

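One-Line Summary

The paper introduces Spectral Orthogonal Exploration (SOE) together with a Weak-Student/Strong-Teacher setting to avoid Reasoning Collapse in LLMs, improving problem-solving accuracy and exploration efficiency on hard mathematical benchmarks.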

ABSTRACT

While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proving and long-horizon planning. Models tend to degenerate into low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework operating on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we utilize a weak auxiliary agent not for imitation, but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, effectively ejecting the model from local optima to explore diverse, high-value solution spaces. Experiments on mathematical benchmarks demonstrate that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.

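Motivation and Objectives

Motivate and diagnose a geometric failure mode of LLM reasoning, characterized as Reasoning Collapse and the Low-Rank Manifold Hypothesis.
Propose a geometric intervention (SOE) that uses orthogonal probes to expand the reasoning space.
Demonstrate improved solution discovery and pass rates on hard mathematical benchmarks.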

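Proposed Method

Model the weak Student as an Orthogonal Probe that navigates the Teacher's Null Space so the Teacher can escape its Bias Manifold.
Estimate the Teacher's Bias Manifold with Monte Carlo look-ahead and Micro-SVD, and extract its top-k principal directions.
Compute an Orthogonality Score for each Student probe and select the probe with the largest component orthogonal to the Teacher's dominant eigenspace.
Inject the selected orthogonal probe into the Teacher's reasoning context and resume inference to recover a correct solution.
Quantify the improvement in Pass@16 and compare exploration efficiency against the compute budget.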
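The probe-selection steps above can be sketched in a few lines. This is only an illustration under stated assumptions, not the authors' implementation: the review does not say how Teacher states are embedded, whether they are centered, or how the Orthogonality Score is normalized. Here `teacher_states` is a hypothetical matrix of look-ahead embeddings (one row per sampled rollout), a plain SVD stands in for Micro-SVD, and the score is assumed to be the fraction of a probe's norm that lies outside the top-k subspace. All function names are hypothetical.

```python
import numpy as np


def bias_directions(teacher_states: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k right singular vectors (as rows) of the state matrix.

    Assumption: these directions approximate the Teacher's Bias Manifold.
    """
    # Thin SVD; the rows of vt are orthonormal and sorted by singular value.
    _, _, vt = np.linalg.svd(teacher_states, full_matrices=False)
    return vt[:k]


def orthogonality_score(probe: np.ndarray, directions: np.ndarray) -> float:
    """Fraction of the probe's norm outside span(directions), in [0, 1]."""
    norm = np.linalg.norm(probe)
    if norm == 0.0:
        return 0.0
    # Project onto the subspace, then measure what is left over.
    projection = directions.T @ (directions @ probe)
    return float(np.linalg.norm(probe - projection) / norm)


def select_probe(probes: np.ndarray, directions: np.ndarray) -> int:
    """Index of the Student probe with the highest orthogonality score."""
    scores = [orthogonality_score(p, directions) for p in probes]
    return int(np.argmax(scores))


# Toy data: Teacher states confined to the first two of four coordinates.
states = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 2.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
    [2.0, -1.0, 0.0, 0.0],
])
dirs = bias_directions(states, k=2)

probes = np.array([
    [1.0, 0.0, 0.0, 0.0],  # lies inside the collapsed subspace
    [0.0, 0.0, 1.0, 0.0],  # fully orthogonal to it
    [1.0, 0.0, 1.0, 0.0],  # partly orthogonal
])
best = select_probe(probes, dirs)
```

In this toy setup the second probe is selected, because it has no component along the directions the Teacher already explores. A real system would still need to turn the chosen probe back into text for the Teacher's context, which this sketch does not attempt.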

Figure 1: Geometric Interpretation of Reasoning Collapse. We characterize reasoning collapse as the transition of the state space from a high-dimensional Healthy Reasoning Manifold to a low-rank Bias Manifold. This confinement renders high-value solutions in the Null Space geometrically inaccessible.

Experimental Results

Research Questions

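RQ1: What geometric factors cause Reasoning Collapse in large language models during long-horizon reasoning?
RQ2: Can orthogonal probes from a heterogeneous source, a weak Student, enlarge the Teacher's search space and recover high-quality solutions?
RQ3: How does SOE affect solution accuracy and exploration efficiency on hard mathematical benchmarks?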

Key Findings

Dataset	Baseline (Self-Consistency)	Ours (SOE)	Relative Improvement
AIME 24	38.5%	76.9%	+99.7%
AIME 25	35.3%	70.6%	+100.0%
MATH-500	33.7%	45.9%	+36.2%
Olympiad Bench	11.7%	15.5%	+32.5%
Omni-Math (Hard)	14.5%	20.8%	+43.4%
Average	26.7%	45.9%	+62.4%
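The "Relative Improvement" column can be recomputed from the two accuracy columns, which also shows how the Average row is formed. The short check below uses only the numbers in the table; variable names are illustrative. The reported +62.4% matches the mean of the five per-benchmark relative gains, whereas the ratio of the two average accuracies (26.7% to 45.9%) would give roughly +72%.

```python
# Accuracies (%) copied from the table: (baseline, SOE).
results = {
    "AIME 24": (38.5, 76.9),
    "AIME 25": (35.3, 70.6),
    "MATH-500": (33.7, 45.9),
    "Olympiad Bench": (11.7, 15.5),
    "Omni-Math (Hard)": (14.5, 20.8),
}


def relative_gain(baseline: float, ours: float) -> float:
    """Relative improvement of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


# Per-benchmark relative gains, then their mean (the table's Average row).
per_benchmark = {name: relative_gain(b, o) for name, (b, o) in results.items()}
mean_of_gains = sum(per_benchmark.values()) / len(per_benchmark)

# Alternative aggregate: relative gain computed on the averaged accuracies.
mean_baseline = sum(b for b, _ in results.values()) / len(results)
mean_ours = sum(o for _, o in results.values()) / len(results)
gain_of_means = relative_gain(mean_baseline, mean_ours)

for name, gain in per_benchmark.items():
    print(f"{name}: +{gain:.1f}%")
print(f"mean of gains: +{mean_of_gains:.1f}%")
print(f"gain of means: +{gain_of_means:.1f}%")
```

The two aggregates differ because the mean of ratios weights every benchmark equally, while the ratio of means is dominated by the benchmarks with the highest absolute accuracy.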

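SOE shows marked gains over the Self-Consistency baseline across benchmarks: AIME 24 (76.9% vs 38.5%), AIME 25 (70.6% vs 35.3%), MATH-500 (45.9% vs 33.7%), Olympiad Bench (15.5% vs 11.7%), and Omni-Math (Hard) (20.8% vs 14.5%), for an average relative improvement of +62.4% (the mean of the five per-benchmark relative gains).
SOE raises semantic exploration efficiency, sustaining a nearly linear solution discovery rate while the baseline saturates.
Orthogonality Scores of the Student probes are consistently high across benchmarks, supporting the geometric mechanism of escaping the Bias Manifold.
The framework adds per-step latency (about 2.60 s on the AIME_2025 benchmark) but delivers large gains in discovering correct reasoning paths.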
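The method section reports gains in Pass@16, but this review does not state how the metric is computed. A commonly used choice is the unbiased pass@k estimator: given n samples per problem of which c are correct, it is the probability that at least one of k samples drawn without replacement is correct. The sketch below assumes that estimator. It is not confirmed as the paper's protocol, and the function name is hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: P(at least one of k drawn samples is correct).

    n: samples generated per problem, c: correct samples, k: draw size.
    """
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    if not 1 <= k <= n:
        raise ValueError("k must satisfy 1 <= k <= n")
    # comb(n - c, k) counts the draws that contain no correct sample.
    return 1.0 - comb(n - c, k) / comb(n, k)


# With 16 samples and k = 16, a problem counts as solved if any sample is correct.
solved = pass_at_k(n=16, c=1, k=16)
unsolved = pass_at_k(n=16, c=0, k=16)
half = pass_at_k(n=4, c=1, k=2)
```

When k equals n, the estimator reduces to a simple solved-or-not indicator per problem, so Pass@16 over 16 samples measures whether exploration reaches a correct solution at all. That is the quantity a diversity-oriented method like SOE would be expected to move.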

Figure 2: Mechanism of Spectral Orthogonal Exploration (SOE). To counteract space narrowing, we introduce an Orthogonal Probe as a geometric intervention. This force effectively disrupts the low-rank confinement and diversifies the reasoning trajectory, expanding the hyper-space to access high-quality ...

This review was generated by AI and checked by a human editor.