QUICK REVIEW

[Paper Review] Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration

Dayu Wang, Jiaye Yang | arXiv (Cornell University) | 2026. 01. 06.
Multimodal Machine Learning Applications | Citations: 0
One-Line Summary

This paper presents Spectral Orthogonal Exploration (SOE) together with a Weak-Student/Strong-Teacher setup to break LLMs out of reasoning collapse, improving problem-solving accuracy and exploration efficiency on hard math benchmarks.

ABSTRACT

While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proving and long-horizon planning. Models tend to degenerate into low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework operating on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we utilize a weak auxiliary agent not for imitation, but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, effectively ejecting the model from local optima to explore diverse, high-value solution spaces. Experiments on mathematical benchmarks demonstrate that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.

Research Motivation and Objectives

  • Motivate and diagnose Reasoning Collapse, a geometric failure mode in LLM reasoning, through the Low-Rank Manifold Hypothesis.
  • Propose a geometric intervention (SOE) that uses an orthogonal probe to expand the reasoning space.
  • Demonstrate improved solution discovery and pass rates on challenging mathematical benchmarks.

Proposed Method

  • Use a Weak-Student as an Orthogonal Probe into the Teacher’s Null Space, so that the Teacher can escape its low-rank bias manifold.
  • Estimate the Teacher’s bias manifold via Monte Carlo look-ahead and Micro-SVD to obtain top-k principal directions.
  • Compute an Orthogonality Score for Student probes and select the probe that maximizes projection orthogonal to the Teacher’s dominant eigenspace.
  • Stitch the selected orthogonal probe into the Teacher’s reasoning context and resume inference to recover correct solutions.
  • Quantify improvements in Pass@16 and analyze exploration efficiency vs. compute budget.
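
The estimation and selection steps above can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' implementation: the toy 4-dimensional vectors stand in for whatever state representation the paper uses, power iteration with deflation is a dependency-free substitute for the Micro-SVD, and the score definition (fraction of a probe's squared norm lying outside the top-k subspace) as well as the names `top_k_directions`, `orthogonality_score`, and `select_probe` are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def top_k_directions(samples, k, iters=100, seed=0):
    """Approximate the top-k right singular vectors of the sample matrix.

    Power iteration with deflation is used here as a stand-in for a
    truncated SVD (assumption: the paper's Micro-SVD may differ).
    """
    rng = random.Random(seed)
    dim = len(samples[0])
    basis = []
    for _ in range(k):
        v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        for _ in range(iters):
            # w = X^T X v, computed as sum_i (x_i . v) x_i
            w = [0.0] * dim
            for x in samples:
                c = dot(x, v)
                w = [wj + c * xj for wj, xj in zip(w, x)]
            # Deflation: strip components along directions already found.
            for b in basis:
                c = dot(w, b)
                w = [wj - c * bj for wj, bj in zip(w, b)]
            norm = math.sqrt(dot(w, w))
            if norm < 1e-12:  # samples span fewer than k directions
                return basis
            v = [wj / norm for wj in w]
        basis.append(v)
    return basis


def orthogonality_score(probe, basis):
    """Assumed definition: share of the probe's squared norm outside the basis span."""
    total = dot(probe, probe)
    inside = sum(dot(probe, b) ** 2 for b in basis)
    return 1.0 - inside / total


def select_probe(probes, basis):
    """Return the index of the most orthogonal probe and all scores."""
    scores = [orthogonality_score(p, basis) for p in probes]
    best = max(range(len(probes)), key=scores.__getitem__)
    return best, scores


# Toy Teacher look-ahead states: all confined to the first two axes.
teacher_samples = [
    [3.0, 0.1, 0.0, 0.0],
    [2.5, -0.2, 0.0, 0.0],
    [-3.0, 0.1, 0.0, 0.0],
    [0.2, 1.0, 0.0, 0.0],
    [0.1, -1.2, 0.0, 0.0],
]
# Toy Student probes: inside the subspace, outside it, and mixed.
student_probes = [
    [1.0, 0.0, 0.0, 0.0],
    [0.0, 0.0, 1.0, 0.0],
    [1.0, 1.0, 1.0, 0.0],
]

basis = top_k_directions(teacher_samples, k=2)
best_index, scores = select_probe(student_probes, basis)
print(best_index, [round(s, 3) for s in scores])
```

In this toy setup the second probe, which points along an axis the Teacher samples never touch, should receive the highest score and be selected. The stitching step (inserting the chosen probe into the Teacher's context and resuming inference) depends on the model interface and is not sketched here.
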
Figure 1: Geometric Interpretation of Reasoning Collapse. We characterize reasoning collapse as the transition of the state space from a high-dimensional Healthy Reasoning Manifold to a low-rank Bias Manifold . This confinement renders high-value solutions in the Null Space geometrically inaccessibl

Experimental Results

Research Questions

  • RQ1: What geometric factors cause Reasoning Collapse in large language models during long-horizon reasoning?
  • RQ2: Can an orthogonal, heterogeneously sourced probe (from a weaker student) widen the Teacher’s search space and recover high-quality solutions?
  • RQ3: How does SOE impact solution accuracy and exploration efficiency on difficult mathematical benchmarks?

Key Results

  • SOE outperforms the Self-Consistency baseline on every benchmark: AIME 24 (76.9% vs 38.5%), AIME 25 (70.6% vs 35.3%), MATH-500 (45.9% vs 33.7%), Olympiad Bench (15.5% vs 11.7%), and Omni-Math (Hard) (20.8% vs 14.5%), for an average relative improvement of 62.4%.
  • SOE achieves higher semantic exploration efficiency, maintaining a near-linear discovery rate while the Baseline saturates.
  • Orthogonality scores for Student probes are consistently high across benchmarks, supporting the geometric mechanism of exiting the bias manifold.
  • The framework incurs per-step latency (~2.60s on AIME_2025 benchmark) but provides substantial gains in discovering correct reasoning traces.
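
The 62.4% average can be checked against the per-benchmark accuracies listed above. The short calculation below assumes the figure is the unweighted mean of each benchmark's relative gain, (SOE - baseline) / baseline; the variable names are illustrative only.

```python
# (SOE accuracy %, baseline accuracy %) as listed in the key results
results = {
    "AIME 24": (76.9, 38.5),
    "AIME 25": (70.6, 35.3),
    "MATH-500": (45.9, 33.7),
    "Olympiad Bench": (15.5, 11.7),
    "Omni-Math (Hard)": (20.8, 14.5),
}

# Relative gain per benchmark, in percent of the baseline accuracy
relative_gains = {
    name: (soe - base) / base * 100.0 for name, (soe, base) in results.items()
}
# Absolute gain per benchmark, in percentage points
absolute_gains = {name: soe - base for name, (soe, base) in results.items()}

mean_relative = sum(relative_gains.values()) / len(relative_gains)
mean_absolute = sum(absolute_gains.values()) / len(absolute_gains)

for name, gain in relative_gains.items():
    print(f"{name}: +{gain:.1f}% relative")
print(f"mean relative gain: {mean_relative:.1f}%")
print(f"mean absolute gain: {mean_absolute:.1f} points")
```

Under that assumption the mean relative gain comes out at about 62.4%, matching the reported number, while the mean absolute gain is about 19.2 percentage points. The relative average is driven mostly by the two AIME sets, where accuracy roughly doubles.
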
Figure 2: Mechanism of Spectral Orthogonal Exploration (SOE). To counteract space narrowing, we introduce an Orthogonal Probe as a geometric intervention. This force effectively disrupts the low-rank confinement and diversifies the reasoning trajectory, expanding the hyper-space to access high-quali

This review was generated by AI and reviewed by a human editor.