QUICK REVIEW

[Paper Review] Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration

Dayu Wang, Jiaye Yang | arXiv (Cornell University) | Jan 6, 2026

Multimodal Machine Learning Applications | Cited by 0

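One-Sentence Summary

The paper proposes Spectral Orthogonal Exploration (SOE), built on a weak-student/strong-teacher setup, to escape reasoning collapse in large language models and to improve problem-solving accuracy and exploration efficiency on hard mathematical benchmarks.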

ABSTRACT

While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proving and long-horizon planning. Models tend to degenerate into low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework operating on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we utilize a weak auxiliary agent not for imitation, but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, effectively ejecting the model from local optima to explore diverse, high-value solution spaces. Experiments on mathematical benchmarks demonstrate that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.

Research Motivation and Objectives

Diagnose Reasoning Collapse, a geometric failure mode in LLM reasoning, and motivate it through the Low-Rank Manifold Hypothesis.
Propose a geometric intervention (SOE) that uses an orthogonal probe to expand the reasoning space.
Demonstrate improved solution discovery and pass rates on challenging mathematical benchmarks.

Proposed Method

Use a Weak Student as an Orthogonal Probe that explores the Teacher’s Null Space and so helps the Teacher escape its Bias Manifold.
Estimate the Teacher’s Bias Manifold via Monte Carlo look-ahead and Micro-SVD, which yields the top-k principal directions.
Compute an Orthogonality Score for each Student probe and select the probe with the largest component orthogonal to the Teacher’s dominant subspace (the span of the top-k directions).
Stitch the selected orthogonal probe into the Teacher’s reasoning context and resume inference to recover correct solutions.
Quantify improvements in Pass@16 and analyze exploration efficiency vs. compute budget.
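
The selection steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: it assumes each Teacher look-ahead rollout and each Student probe has already been mapped to a fixed-size embedding vector, it uses a plain thin SVD in place of the paper's Micro-SVD, and it defines the Orthogonality Score as the fraction of a probe's norm that lies outside the top-k subspace (the summary does not give the exact formula). The function names `estimate_bias_manifold`, `orthogonality_score`, and `select_probe` are hypothetical, and the stitching step is omitted.

```python
import numpy as np


def estimate_bias_manifold(teacher_states: np.ndarray, k: int):
    """Return (mean, basis) for the teacher's dominant subspace.

    teacher_states: (n_rollouts, dim) embeddings of look-ahead rollouts (assumed input).
    basis: (k, dim) orthonormal rows, the top-k right-singular vectors.
    """
    mean = teacher_states.mean(axis=0)
    centered = teacher_states - mean
    # Thin SVD; rows of vt are orthonormal, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return mean, vt[:k]


def orthogonality_score(probe: np.ndarray, mean: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the centered probe's norm outside span(basis), in [0, 1].

    Assumed definition: 0 means the probe lies inside the dominant subspace,
    1 means it is fully orthogonal to it.
    """
    v = probe - mean
    norm = np.linalg.norm(v)
    if norm == 0.0:
        return 0.0
    residual = v - basis.T @ (basis @ v)  # remove the in-subspace component
    return float(np.linalg.norm(residual) / norm)


def select_probe(probes, mean, basis):
    """Return (index of the most orthogonal probe, list of all scores)."""
    scores = [orthogonality_score(p, mean, basis) for p in probes]
    return int(np.argmax(scores)), scores


# Toy data: teacher states confined to a 2-D plane inside an 8-D space.
rng = np.random.default_rng(0)
plane = np.eye(8)[:2]                      # spans axes 0 and 1
teacher = rng.normal(size=(32, 2)) @ plane
mean, basis = estimate_bias_manifold(teacher, k=2)

probes = [
    mean + np.eye(8)[0],                   # stays inside the plane
    mean + np.eye(8)[5],                   # fully outside the plane
    mean + np.eye(8)[0] + np.eye(8)[5],    # partly outside
]
best, scores = select_probe(probes, mean, basis)
print(best, [round(s, 3) for s in scores])
```

In this toy setup the second probe is selected, because it has no component in the plane that the Teacher's rollouts occupy. In a real pipeline the choice of embedding and of k would likely matter far more than the scoring formula itself.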

Figure 1: Geometric Interpretation of Reasoning Collapse. We characterize reasoning collapse as the transition of the state space from a high-dimensional Healthy Reasoning Manifold to a low-rank Bias Manifold. This confinement renders high-value solutions in the Null Space geometrically inaccessible…

Experimental Results

Research Questions

RQ1: What geometric factors cause Reasoning Collapse in large language models during long-horizon reasoning?
RQ2: Can an orthogonal, heterogeneously sourced probe (from a weaker student) widen the Teacher’s search space and recover high-quality solutions?
RQ3: How does SOE impact solution accuracy and exploration efficiency on difficult mathematical benchmarks?

Key Findings

Dataset	Baseline (Self-Consistency)	Ours (SOE)	Relative Improvement
AIME 24	38.5%	76.9%	+99.7%
AIME 25	35.3%	70.6%	+100.0%
MATH-500	33.7%	45.9%	+36.2%
Olympiad Bench	11.7%	15.5%	+32.5%
Omni-Math (Hard)	14.5%	20.8%	+43.4%
Average	26.7%	45.9%	+62.4%

SOE substantially outperforms the Self-Consistency baseline on every benchmark: AIME 24 (76.9% vs. 38.5%), AIME 25 (70.6% vs. 35.3%), MATH-500 (45.9% vs. 33.7%), Olympiad Bench (15.5% vs. 11.7%), and Omni-Math (Hard) (20.8% vs. 14.5%), for an average relative improvement of +62.4%.
SOE achieves higher semantic exploration efficiency, maintaining a near-linear discovery rate while the baseline saturates.
Orthogonality scores for Student probes are consistently high across benchmarks, which supports the geometric mechanism of exiting the Bias Manifold.
The framework adds per-step latency (about 2.60 s on the AIME 25 benchmark) in exchange for substantial gains in discovering correct reasoning traces.
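
The relative-improvement column in the table above can be checked with a short script. The sketch below uses only the accuracies reported in the table; the variable and function names are illustrative. It also compares two ways of averaging, since the reported +62.4% appears to be the mean of the per-dataset relative improvements rather than the improvement of the average accuracy.

```python
# (baseline accuracy %, SOE accuracy %) copied from the table above
results = {
    "AIME 24": (38.5, 76.9),
    "AIME 25": (35.3, 70.6),
    "MATH-500": (33.7, 45.9),
    "Olympiad Bench": (11.7, 15.5),
    "Omni-Math (Hard)": (14.5, 20.8),
}


def relative_improvement(baseline: float, ours: float) -> float:
    """Relative gain of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


per_dataset = {name: relative_improvement(b, o) for name, (b, o) in results.items()}

# Option A: average the per-dataset relative improvements.
mean_of_relatives = sum(per_dataset.values()) / len(per_dataset)

# Option B: relative improvement of the average accuracies.
mean_baseline = sum(b for b, _ in results.values()) / len(results)
mean_ours = sum(o for _, o in results.values()) / len(results)
ratio_of_means = relative_improvement(mean_baseline, mean_ours)

for name, gain in per_dataset.items():
    print(f"{name}: +{gain:.1f}%")
print(f"mean of per-dataset gains: +{mean_of_relatives:.1f}%")
print(f"gain of average accuracy:  +{ratio_of_means:.1f}%")
```

Option A reproduces the reported figure of about +62.4%, while Option B, computed from the averaged accuracies (26.7% and 45.9%), comes out near +71.8%. The two are both valid summaries, so it helps to state which one a headline number refers to.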

Figure 2: Mechanism of Spectral Orthogonal Exploration (SOE). To counteract space narrowing, we introduce an Orthogonal Probe as a geometric intervention. This force effectively disrupts the low-rank confinement and diversifies the reasoning trajectory, expanding the hyper-space to access high-quality…


This review was generated by AI and checked by human editors.