QUICK REVIEW

[Paper Digest] Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration

Dayu Wang, Jiaye Yang|arXiv (Cornell University)|Jan 6, 2026
Multimodal Machine Learning Applications | Cited by 0
One-Sentence Summary

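The paper proposes Spectral Orthogonal Exploration (SOE), together with a weak-Student/strong-Teacher setup, to escape reasoning collapse in large language models and to improve problem-solving accuracy and exploration efficiency on hard mathematical benchmarks.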

ABSTRACT

While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proving and long-horizon planning. Models tend to degenerate into low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework operating on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we utilize a weak auxiliary agent not for imitation, but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, effectively ejecting the model from local optima to explore diverse, high-value solution spaces. Experiments on mathematical benchmarks demonstrate that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.

Motivation and Objectives

  • Motivate and diagnose Reasoning Collapse, a geometric failure mode in LLM reasoning, and frame it through the Low-Rank Manifold Hypothesis.
  • Propose a geometric intervention (SOE) that uses an orthogonal probe to expand the reasoning space.
  • Demonstrate improved solution discovery and pass rates on challenging mathematical benchmarks.

Proposed Method

  • Model a weak Student as an Orthogonal Probe that navigates the Teacher’s Null Space, so the Teacher can escape its low-rank Bias Manifold.
  • Estimate the Teacher’s bias manifold via Monte Carlo look-ahead and Micro-SVD to obtain top-k principal directions.
  • Compute an Orthogonality Score for each Student probe and select the probe whose component orthogonal to the Teacher’s dominant eigenspace is largest.
  • Stitch the selected orthogonal probe into the Teacher’s reasoning context and resume inference to recover correct solutions.
  • Quantify improvements in Pass@16 and analyze exploration efficiency vs. compute budget.
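The probe-selection steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation. It assumes each Teacher look-ahead rollout and each Student probe has already been embedded as a fixed-length vector (the embedding step is not specified here), uses a plain uncentered SVD in place of the paper's Micro-SVD, and defines the Orthogonality Score as the fraction of a probe's squared norm that lies outside the top-k subspace. The names `bias_directions`, `orthogonality_score`, and `select_probe` are hypothetical.

```python
import numpy as np


def bias_directions(teacher_states: np.ndarray, k: int) -> np.ndarray:
    """Top-k principal directions (as rows) of the Teacher's look-ahead states.

    Assumption: an uncentered SVD stands in for the paper's Micro-SVD.
    """
    # Rows of vt are orthonormal right-singular vectors, sorted by
    # decreasing singular value.
    _, _, vt = np.linalg.svd(teacher_states, full_matrices=False)
    return vt[:k]


def orthogonality_score(probe: np.ndarray, directions: np.ndarray) -> float:
    """Fraction of the probe's squared norm outside span(directions).

    Assumed definition: 0.0 means the probe lies entirely inside the
    estimated bias subspace, 1.0 means it is fully orthogonal to it.
    """
    norm_sq = float(probe @ probe)
    if norm_sq == 0.0:
        return 0.0
    coeffs = directions @ probe  # coordinates inside the bias subspace
    inside = float(coeffs @ coeffs)
    return 1.0 - inside / norm_sq


def select_probe(probes: np.ndarray, directions: np.ndarray) -> int:
    """Index of the Student probe with the highest orthogonality score."""
    scores = [orthogonality_score(p, directions) for p in probes]
    return int(np.argmax(scores))


# Toy data (hypothetical): Teacher states confined to the first two axes.
teacher = np.array([
    [1.0, 0.0, 0.0, 0.0],
    [2.0, 0.1, 0.0, 0.0],
    [0.0, 1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0, 0.0],
])
student_probes = np.array([
    [1.0, 1.0, 0.0, 0.0],  # inside the bias subspace
    [0.0, 0.0, 1.0, 0.0],  # fully orthogonal to it
    [1.0, 0.0, 1.0, 0.0],  # half inside, half outside
])
directions = bias_directions(teacher, k=2)
best = select_probe(student_probes, directions)
```

In this toy setup the second probe is selected, because it has no component along the two directions the Teacher keeps sampling. Stitching the chosen probe back into the Teacher's context is a prompting step and is left out of the sketch.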
Figure 1: Geometric Interpretation of Reasoning Collapse. We characterize reasoning collapse as the transition of the state space from a high-dimensional Healthy Reasoning Manifold to a low-rank Bias Manifold. This confinement renders high-value solutions in the Null Space geometrically inaccessible…

Experimental Results

Research Questions

  • RQ1: What geometric factors cause Reasoning Collapse in large language models during long-horizon reasoning?
  • RQ2: Can an orthogonal, heterogeneously sourced probe (from a weaker Student) widen the Teacher’s search space and recover high-quality solutions?
  • RQ3: How does SOE impact solution accuracy and exploration efficiency on difficult mathematical benchmarks?

Key Findings

Dataset          | Baseline (Self-Consistency) | Ours (SOE) | Relative Improvement
AIME 24          | 38.5%                       | 76.9%      | +99.7%
AIME 25          | 35.3%                       | 70.6%      | +100.0%
MATH-500         | 33.7%                       | 45.9%      | +36.2%
Olympiad Bench   | 11.7%                       | 15.5%      | +32.5%
Omni-Math (Hard) | 14.5%                       | 20.8%      | +43.4%
Average          | 26.7%                       | 45.9%      | +62.4%
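The arithmetic behind the last column can be checked with a short script. The accuracies below are copied from the table. The helper name `relative_improvement` is only illustrative. The script computes the average in two ways, because the mean of the per-benchmark relative gains and the relative gain of the averaged accuracies are different quantities.

```python
# Accuracies in percent, copied from the table: (baseline, SOE).
results = {
    "AIME 24": (38.5, 76.9),
    "AIME 25": (35.3, 70.6),
    "MATH-500": (33.7, 45.9),
    "Olympiad Bench": (11.7, 15.5),
    "Omni-Math (Hard)": (14.5, 20.8),
}


def relative_improvement(baseline: float, ours: float) -> float:
    """Relative gain of `ours` over `baseline`, in percent."""
    return (ours - baseline) / baseline * 100.0


per_benchmark = {
    name: relative_improvement(base, ours)
    for name, (base, ours) in results.items()
}

# Option 1: average the five relative gains.
mean_of_relative = sum(per_benchmark.values()) / len(per_benchmark)

# Option 2: average the accuracies first, then take one relative gain.
mean_baseline = sum(base for base, _ in results.values()) / len(results)
mean_ours = sum(ours for _, ours in results.values()) / len(results)
relative_of_means = relative_improvement(mean_baseline, mean_ours)

for name, gain in per_benchmark.items():
    print(f"{name}: +{gain:.1f}%")
print(f"mean of relative gains: +{mean_of_relative:.1f}%")
print(f"relative gain of means: +{relative_of_means:.1f}%")
```

Option 1 gives about 62.4%, which matches the Average row, so the reported figure appears to be the mean of the per-benchmark relative gains. Option 2, computed from the averaged accuracies (about 26.7% and 45.9%), gives about 71.8%.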
  • SOE yields substantial improvements over the Self-Consistency baseline on every benchmark: AIME 24 (76.9% vs 38.5%), AIME 25 (70.6% vs 35.3%), MATH-500 (45.9% vs 33.7%), Olympiad Bench (15.5% vs 11.7%), and Omni-Math (Hard) (20.8% vs 14.5%). The reported +62.4% average is the mean of the five per-benchmark relative improvements.
  • SOE achieves higher semantic exploration efficiency, maintaining a near-linear discovery rate while the Baseline saturates.
  • Orthogonality scores for Student probes are consistently high across benchmarks, supporting the geometric mechanism of exiting the bias manifold.
  • The framework adds per-step latency (about 2.60 s on the AIME_2025 benchmark) in exchange for substantial gains in discovering correct reasoning traces.
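The contrast between a near-linear discovery rate and a saturating one can be pictured with a simple cumulative count. The sketch below is only an illustration: it treats the number of distinct final answers as a rough proxy for semantic exploration, which is an assumption and not necessarily the paper's metric. The sample lists are made-up toy data, and `discovery_curve` is a hypothetical helper.

```python
from typing import Hashable, Iterable, List


def discovery_curve(samples: Iterable[Hashable]) -> List[int]:
    """Cumulative number of distinct answers seen after each sample."""
    seen = set()
    curve = []
    for answer in samples:
        seen.add(answer)
        curve.append(len(seen))
    return curve


# Toy data (hypothetical): a collapsed sampler keeps repeating itself,
# while an exploring sampler keeps finding new answers.
collapsed = ["42", "42", "42", "17", "42", "42"]
exploring = ["42", "17", "8", "3", "42", "5"]

collapsed_curve = discovery_curve(collapsed)   # [1, 1, 1, 2, 2, 2]
exploring_curve = discovery_curve(exploring)   # [1, 2, 3, 4, 4, 5]
```

A curve that flattens early, like the first one, corresponds to the saturation described for the Baseline. A curve that keeps rising with the sample budget corresponds to the near-linear behavior described for SOE.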
Figure 2: Mechanism of Spectral Orthogonal Exploration (SOE). To counteract space narrowing, we introduce an Orthogonal Probe as a geometric intervention. This force effectively disrupts the low-rank confinement and diversifies the reasoning trajectory, expanding the hyper-space to access high-quality…


This digest was generated by AI and reviewed by human editors.