QUICK REVIEW

[Paper Review] Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration

Dayu Wang, Jiaye Yang|arXiv (Cornell University)|2026. 01. 06.

Multimodal Machine Learning Applications | Citations: 0

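One-Line Summary

This paper presents Spectral Orthogonal Exploration (SOE) in a Weak-Student/Strong-Teacher setting to help LLMs escape reasoning collapse, improving problem-solving accuracy and exploration efficiency on hard mathematical benchmarks.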

ABSTRACT

While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proving and long-horizon planning. Models tend to degenerate into low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework operating on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we utilize a weak auxiliary agent not for imitation, but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, effectively ejecting the model from local optima to explore diverse, high-value solution spaces. Experiments on mathematical benchmarks demonstrate that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.

Research Motivation and Goals

Diagnose a geometric failure mode in LLM reasoning, termed Reasoning Collapse, and motivate it through the Low-Rank Manifold Hypothesis.
Propose a geometric intervention (SOE) that uses an orthogonal probe to expand the reasoning space.
Demonstrate improved solution discovery and pass rates on challenging mathematical benchmarks.

Proposed Method

Use a Weak-Student as an Orthogonal Probe that steers the Teacher out of its Bias Manifold and into its Null Space.
Estimate the Teacher’s bias manifold via Monte Carlo look-ahead and Micro-SVD to obtain top-k principal directions.
Compute an Orthogonality Score for Student probes and select the probe that maximizes projection orthogonal to the Teacher’s dominant eigenspace.
Stitch the selected orthogonal probe into the Teacher’s reasoning context and resume inference to recover correct solutions.
Quantify improvements in Pass@16 and analyze exploration efficiency vs. compute budget.
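
The estimation and selection steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the review does not give the exact definition of the Orthogonality Score, so the sketch assumes it is the fraction of a probe's squared norm that lies outside the span of the Teacher's top-k right singular vectors. The function names (`top_k_directions`, `orthogonality_score`, `select_probe`), the use of plain NumPy arrays as stand-ins for hidden states, and the choice not to mean-center the samples are all hypothetical.

```python
import numpy as np


def top_k_directions(teacher_states: np.ndarray, k: int) -> np.ndarray:
    """Return k orthonormal rows spanning the dominant subspace of the samples.

    teacher_states has shape (n_samples, dim); each row stands in for one
    look-ahead sample of the Teacher. No mean-centering is applied (assumption).
    """
    _, _, vt = np.linalg.svd(teacher_states, full_matrices=False)
    return vt[:k]


def orthogonality_score(probe: np.ndarray, basis: np.ndarray) -> float:
    """Fraction of the probe's squared norm outside span(basis), in [0, 1].

    This definition is an assumption made for illustration.
    """
    norm_sq = float(probe @ probe)
    if norm_sq == 0.0:
        return 0.0
    inside = basis @ probe  # coordinates of the probe within the subspace
    return 1.0 - float(inside @ inside) / norm_sq


def select_probe(probes, basis):
    """Return the index of the most orthogonal probe and all scores."""
    scores = [orthogonality_score(p, basis) for p in probes]
    return int(np.argmax(scores)), scores


# Toy data: Teacher samples vary almost only along the first axis.
teacher_states = np.array([[1.0, 0.0, 0.0],
                           [2.0, 0.1, 0.0],
                           [3.0, -0.1, 0.0]])
basis = top_k_directions(teacher_states, k=1)

# Two hypothetical Student probes: one aligned with the Teacher, one not.
student_probes = [np.array([1.0, 0.0, 0.0]), np.array([0.0, 0.0, 1.0])]
best_index, scores = select_probe(student_probes, basis)
print(best_index, scores)
```

In this toy setup the second probe points along an axis the Teacher samples never touch, so it receives the higher score and is the one that would be stitched into the Teacher's context.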

Figure 1: Geometric Interpretation of Reasoning Collapse. We characterize reasoning collapse as the transition of the state space from a high-dimensional Healthy Reasoning Manifold to a low-rank Bias Manifold. This confinement renders high-value solutions in the Null Space geometrically inaccessible.

Experimental Results

Research Questions

RQ1: What geometric factors cause Reasoning Collapse in large language models during long-horizon reasoning?
RQ2: Can an orthogonal, heterogeneously sourced probe (from a weaker student) widen the Teacher’s search space and recover high-quality solutions?
RQ3: How does SOE impact solution accuracy and exploration efficiency on difficult mathematical benchmarks?

Key Results

SOE outperforms the Self-Consistency baseline on every benchmark: AIME 24 (76.9% vs 38.5%), AIME 25 (70.6% vs 35.3%), MATH-500 (45.9% vs 33.7%), Olympiad Bench (15.5% vs 11.7%), and Omni-Math (Hard) (20.8% vs 14.5%). The per-benchmark relative gains average +62.4%.
SOE achieves higher semantic exploration efficiency, maintaining a near-linear discovery rate while the Baseline saturates.
Orthogonality scores for Student probes are consistently high across benchmarks, supporting the geometric mechanism of exiting the bias manifold.
The framework adds per-step latency (about 2.60 s on the AIME 25 benchmark) but provides substantial gains in discovering correct reasoning traces.
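
The +62.4% average quoted in the results above can be checked directly from the listed accuracies. The short sketch below assumes the figure is the arithmetic mean of the per-benchmark relative gains, which is the reading that reproduces the reported number. The variable names are illustrative only.

```python
# (SOE accuracy %, baseline Self-Consistency accuracy %) as listed in the review.
results = {
    "AIME 24": (76.9, 38.5),
    "AIME 25": (70.6, 35.3),
    "MATH-500": (45.9, 33.7),
    "Olympiad Bench": (15.5, 11.7),
    "Omni-Math (Hard)": (20.8, 14.5),
}


def relative_gain(soe: float, baseline: float) -> float:
    """Relative improvement of SOE over the baseline, in percent."""
    return (soe - baseline) / baseline * 100.0


# Per-benchmark relative gains, then their plain arithmetic mean.
gains = {name: relative_gain(soe, base) for name, (soe, base) in results.items()}
mean_gain = sum(gains.values()) / len(gains)

# Alternative reading for comparison: relative gain of the mean accuracies.
mean_soe = sum(soe for soe, _ in results.values()) / len(results)
mean_base = sum(base for _, base in results.values()) / len(results)
gain_of_means = relative_gain(mean_soe, mean_base)

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}%")
print(f"mean of relative gains: +{mean_gain:.1f}%")  # prints +62.4%
print(f"relative gain of mean accuracies: +{gain_of_means:.1f}%")
```

The two AIME sets roughly double the baseline, while the other three benchmarks gain between about 32% and 44%, so the average is driven largely by AIME. Comparing the mean accuracies instead gives a different figure, so the reported number depends on the averaging convention.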

Figure 2: Mechanism of Spectral Orthogonal Exploration (SOE). To counteract space narrowing, we introduce an Orthogonal Probe as a geometric intervention. This force effectively disrupts the low-rank confinement and diversifies the reasoning trajectory, expanding the hyper-space to access high-quali


This review was generated by AI and reviewed by a human editor.