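[Paper Summary] Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration
The paper proposes Spectral Orthogonal Exploration (SOE) together with a weak-student/strong-teacher setup to escape reasoning collapse in large language models, improving problem-solving accuracy and exploration efficiency on difficult mathematical benchmarks.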
While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proofs and long-horizon planning. Models tend to degenerate into a low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework built on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we use a weak auxiliary agent not for imitation but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, ejecting the model from local optima so that it explores diverse, high-value solution spaces. Experiments on mathematical benchmarks demonstrate that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.
Research Motivation and Goals
- Motivate and diagnose Reasoning Collapse, a geometric failure mode in LLM reasoning, through the lens of the Low-Rank Manifold Hypothesis.
- Propose a geometric intervention (SOE) that uses an orthogonal probe to expand the reasoning space.
- Demonstrate improved solution discovery and pass rates on challenging mathematical benchmarks.
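One generic way to make the "low-rank" diagnosis concrete is to embed many sampled reasoning traces and ask how many singular directions are needed to explain most of their variance. The sketch below is an illustration only: the embedding representation, the function name `effective_rank`, and the 90% energy threshold are assumptions, not the paper's collapse metric.

```python
import numpy as np


def effective_rank(embeddings: np.ndarray, energy: float = 0.9) -> int:
    """Smallest k such that the top-k singular directions capture `energy`
    of the total variance of the centered embeddings (one row per sample).

    Hypothetical diagnostic for illustration; not the paper's own measure.
    """
    centered = embeddings - embeddings.mean(axis=0, keepdims=True)
    s = np.linalg.svd(centered, compute_uv=False)  # singular values, descending
    var = s ** 2
    total = var.sum()
    if total == 0.0:
        return 0  # every sample identical: fully collapsed
    cumulative = np.cumsum(var) / total
    # First index where the cumulative share reaches the threshold, as a count.
    return int(min(np.searchsorted(cumulative, energy) + 1, len(s)))


rng = np.random.default_rng(0)
# "Collapsed" sampler: 64 trace embeddings that all lie along one direction.
direction = rng.normal(size=32)
collapsed = rng.normal(size=(64, 1)) * direction
# "Diverse" sampler: isotropic embeddings spread over all 32 dimensions.
diverse = rng.normal(size=(64, 32))

print(effective_rank(collapsed), effective_rank(diverse))
```

The collapsed set reports an effective rank of 1, while the diverse set needs most of its 32 dimensions; a sampler stuck on a bias manifold would look like the former even if its surface text varies.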
Proposed Method
- Model a Weak Student as an Orthogonal Probe that explores the Teacher's Null Space, helping the Teacher escape its bias manifold.
- Estimate the Teacher's bias manifold via Monte Carlo look-ahead and Micro-SVD to obtain the top-k principal directions.
- Compute an Orthogonality Score for each Student probe and select the probe with the largest component orthogonal to the Teacher's dominant eigenspace.
- Stitch the selected orthogonal probe into the Teacher’s reasoning context and resume inference to recover correct solutions.
- Quantify improvements in Pass@16 and analyze exploration efficiency vs. compute budget.
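The selection steps above can be sketched with plain linear algebra, assuming each Teacher look-ahead rollout and each Student probe is already represented as a fixed-size embedding vector. Everything named here (`teacher_principal_directions`, `orthogonality_score`, `select_probe`, `stitch`), the centering on the Teacher mean, and the use of a full NumPy SVD in place of the paper's Micro-SVD are assumptions for illustration; this summary does not give the paper's exact formulas.

```python
import numpy as np


def teacher_principal_directions(teacher_embs: np.ndarray, k: int):
    """Top-k principal directions of m Teacher look-ahead rollouts, shape (m, d).

    Returns (mean, basis) where basis has shape (d, k) with orthonormal columns.
    """
    mean = teacher_embs.mean(axis=0)
    # Rows of vt are right singular vectors, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(teacher_embs - mean, full_matrices=False)
    return mean, vt[:k].T


def orthogonality_score(probe_emb: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Share of the probe's centered norm lying outside span(basis).

    0 means the probe sits inside the Teacher's dominant subspace;
    1 means it is fully orthogonal to it.
    """
    v = probe_emb - mean
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return 0.0
    residual = v - basis @ (basis.T @ v)  # remove the projection onto the top-k subspace
    return float(np.linalg.norm(residual) / norm)


def select_probe(teacher_embs, probe_embs, k=4):
    """Pick the Student probe that is most orthogonal to the Teacher's top-k subspace."""
    mean, basis = teacher_principal_directions(teacher_embs, k)
    scores = [orthogonality_score(p, mean, basis) for p in probe_embs]
    return int(np.argmax(scores)), scores


def stitch(teacher_context: str, probe_text: str) -> str:
    # Hypothetical stitching: append the chosen Student step to the Teacher's
    # context; the Teacher would then resume generation from this prefix.
    return teacher_context + probe_text


# Synthetic check: Teacher rollouts confined to a 2-D subspace of R^16.
rng = np.random.default_rng(0)
d = 16
teacher = np.zeros((32, d))
teacher[:, :2] = rng.normal(size=(32, 2))
probe_inside = np.zeros(d)
probe_inside[0] = 1.0   # lies in the Teacher's dominant subspace
probe_outside = np.zeros(d)
probe_outside[5] = 1.0  # points into a direction the Teacher never uses

best, scores = select_probe(teacher, [probe_inside, probe_outside], k=2)
print(best, [round(s, 3) for s in scores])
```

The probe pointing outside the Teacher's subspace wins, which mirrors the idea that the useful Student step is the one the Teacher's own samples do not cover; in a real system the embedding model and the choice of k would dominate the behavior.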

Experimental Results
Research Questions
- RQ1: What geometric factors cause Reasoning Collapse in large language models during long-horizon reasoning?
- RQ2: Can an orthogonal, heterogeneously sourced probe (from a weaker student) widen the Teacher's search space and recover high-quality solutions?
- RQ3: How does SOE impact solution accuracy and exploration efficiency on difficult mathematical benchmarks?
Key Findings
| Dataset | Baseline (Self-Consistency) | Ours (SOE) | Relative Improvement |
|---|---|---|---|
| AIME 24 | 38.5% | 76.9% | +99.7% |
| AIME 25 | 35.3% | 70.6% | +100.0% |
| MATH-500 | 33.7% | 45.9% | +36.2% |
| Olympiad Bench | 11.7% | 15.5% | +32.5% |
| Omni-Math (Hard) | 14.5% | 20.8% | +43.4% |
| Average | 26.7% | 45.9% | +62.4% |
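The Average row can be checked directly from the table's own numbers. The short computation below shows that +62.4% is the mean of the five per-benchmark relative improvements, whereas the relative gain between the two average accuracies is larger; no values beyond those in the table are used.

```python
baseline = {"AIME 24": 38.5, "AIME 25": 35.3, "MATH-500": 33.7,
            "Olympiad Bench": 11.7, "Omni-Math (Hard)": 14.5}
soe = {"AIME 24": 76.9, "AIME 25": 70.6, "MATH-500": 45.9,
       "Olympiad Bench": 15.5, "Omni-Math (Hard)": 20.8}

# Per-benchmark relative improvement, in percent.
rel_gain = {name: (soe[name] / baseline[name] - 1) * 100 for name in baseline}

mean_baseline = sum(baseline.values()) / len(baseline)   # about 26.74
mean_soe = sum(soe.values()) / len(soe)                  # about 45.94

mean_of_gains = sum(rel_gain.values()) / len(rel_gain)   # what the Average row reports
gain_of_means = (mean_soe / mean_baseline - 1) * 100     # gain between the averages

print(f"{mean_of_gains:.1f} {gain_of_means:.1f}")  # 62.4 71.8
```

Both summaries are legitimate, but they answer different questions: the first weights every benchmark's relative gain equally, while the second is dominated by the benchmarks with high absolute accuracy.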
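- SOE outperforms the Self-Consistency baseline on every benchmark, roughly doubling accuracy on AIME 24 (76.9% vs 38.5%) and AIME 25 (70.6% vs 35.3%), with smaller gains on MATH-500 (45.9% vs 33.7%), Olympiad Bench (15.5% vs 11.7%), and Omni-Math (Hard) (20.8% vs 14.5%). The reported +62.4% average is the mean of the five per-benchmark relative improvements; measured between the average accuracies (45.9% vs 26.7%), the relative gain is about +72%.
- SOE achieves higher semantic exploration efficiency: the number of solutions it discovers keeps growing nearly linearly as more samples are drawn, whereas the baseline's discovery curve saturates.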
- Orthogonality scores for Student probes are consistently high across benchmarks, supporting the geometric mechanism of exiting the bias manifold.
- The framework adds per-step latency (about 2.60 s on the AIME_2025 benchmark) but substantially improves the discovery of correct reasoning traces.
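The "near-linear versus saturating" contrast in the findings above is about discovery curves: how many distinct solutions have appeared after n samples. A minimal sketch, assuming each sampled trace has already been mapped to a semantic class label (for example a normalized final answer or a cluster id; that mapping is an assumption, not the paper's procedure), with a hypothetical helper `discovery_curve`:

```python
from typing import Hashable, Iterable, List


def discovery_curve(labels: Iterable[Hashable]) -> List[int]:
    """Cumulative number of distinct solution classes after each sample."""
    seen = set()
    curve = []
    for label in labels:
        seen.add(label)
        curve.append(len(seen))
    return curve


# Toy label sequences, not data from the paper.
collapsed = ["A", "B", "A", "A", "B", "A"]  # lexical variants of two ideas
diverse = ["A", "B", "C", "D", "C", "E"]    # keeps reaching new solutions

print(discovery_curve(collapsed))  # [1, 2, 2, 2, 2, 2]
print(discovery_curve(diverse))    # [1, 2, 3, 4, 4, 5]
```

A collapsed sampler flattens out almost immediately, while an exploring one keeps climbing; plotting such curves against the sampling budget is one way to read the efficiency claim.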

This summary was generated by AI and reviewed by human editors.