[Paper Review] Student Guides Teacher: Weak-to-Strong Inference via Spectral Orthogonal Exploration
This paper presents Spectral Orthogonal Exploration (SOE) together with a Weak-Student/Strong-Teacher setup, helping LLMs escape reasoning collapse and improving problem-solving accuracy and exploration efficiency on hard math benchmarks.
While Large Language Models (LLMs) demonstrate near-human capabilities, they often suffer from "Reasoning Collapse" in complex mathematical proving and long-horizon planning. Models tend to degenerate into a low-rank Bias Manifold, where stochastic sampling merely produces lexical variations of erroneous logic rather than semantic exploration. This geometric collapse renders the model "blind" to high-value solutions that lie within its Null Space. To address this, we propose Spectral Orthogonal Exploration (SOE), a geometric framework operating on a counter-intuitive "Student Guides Teacher" paradigm. Specifically, we utilize a weak auxiliary agent not for imitation, but as an orthogonal probe. By explicitly navigating the Teacher's Null Space, SOE serves as a geometric bridge, effectively ejecting the model from local optima to explore diverse, high-value solution spaces. Experiments on mathematical benchmarks demonstrate that, relative to baseline methods, our approach improves average accuracy by 62.4% and increases average sampling efficiency by 113.7%, indicating a promising path toward overcoming performance plateaus in advanced reasoning tasks.
Research Motivation and Goals
- Motivate and diagnose Reasoning Collapse, a geometric failure mode in LLM reasoning, through the Low-Rank Manifold Hypothesis.
- Propose a geometric intervention (SOE) that uses an orthogonal probe to expand the reasoning space.
- Demonstrate improved solution discovery and pass rates on challenging mathematical benchmarks.
Proposed Method
- Model a Weak-Student as an Orthogonal Probe that explores the Teacher’s Null Space, helping the Teacher escape its bias manifold.
- Estimate the Teacher’s bias manifold via Monte Carlo look-ahead and Micro-SVD to obtain top-k principal directions.
- Compute an Orthogonality Score for Student probes and select the probe that maximizes projection orthogonal to the Teacher’s dominant eigenspace.
- Stitch the selected orthogonal probe into the Teacher’s reasoning context and resume inference to recover correct solutions.
- Quantify improvements in Pass@16 and analyze exploration efficiency vs. compute budget.
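The estimation and selection steps above can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the paper's implementation: the summary does not give the exact Orthogonality Score formula, so the sketch assumes it is the fraction of a probe's norm that lies outside the span of the Teacher's top-k right singular vectors. The names `bias_basis`, `orthogonality_score`, and `select_probe` are hypothetical, and the embeddings are assumed to be plain vectors obtained elsewhere (for example, from Monte Carlo look-ahead rollouts).

```python
import numpy as np

def bias_basis(teacher_embs: np.ndarray, k: int) -> np.ndarray:
    """Return the top-k right singular vectors (rows) of the Teacher embeddings.

    teacher_embs has shape (n_samples, dim). No mean-centering is applied here;
    that choice is an assumption of this sketch.
    """
    _, _, vt = np.linalg.svd(teacher_embs, full_matrices=False)
    return vt[:k]  # shape (k, dim), rows are orthonormal

def orthogonality_score(probe: np.ndarray, basis: np.ndarray) -> float:
    """Assumed score: share of the probe's norm outside span(basis), in [0, 1]."""
    norm = np.linalg.norm(probe)
    if norm == 0.0:
        return 0.0
    # Remove the component that lies inside the Teacher's dominant subspace.
    residual = probe - basis.T @ (basis @ probe)
    return float(np.linalg.norm(residual) / norm)

def select_probe(probes: np.ndarray, basis: np.ndarray) -> int:
    """Index of the Student probe with the highest orthogonality score."""
    scores = [orthogonality_score(p, basis) for p in probes]
    return int(np.argmax(scores))

# Toy data: Teacher samples all lie in the xy-plane of R^3.
teacher = np.array([[1.0, 0.0, 0.0],
                    [0.0, 1.0, 0.0],
                    [1.0, 1.0, 0.0],
                    [2.0, 1.0, 0.0]])
basis = bias_basis(teacher, k=2)

# Student probes: inside the plane, fully orthogonal to it, and a mixture.
probes = np.array([[1.0, 0.0, 0.0],
                   [0.0, 0.0, 1.0],
                   [1.0, 0.0, 1.0]])
best = select_probe(probes, basis)
```

In this toy setup the second probe points along the z-axis, which the Teacher samples never use, so it is the one selected. In the described pipeline the chosen probe would then be stitched into the Teacher's context before inference resumes.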

Experimental Results
Research Questions
- RQ1: What geometric factors cause Reasoning Collapse in large language models during long-horizon reasoning?
- RQ2: Can an orthogonal, heterogeneously sourced probe (from a weaker student) widen the Teacher’s search space and recover high-quality solutions?
- RQ3: How does SOE impact solution accuracy and exploration efficiency on difficult mathematical benchmarks?
Key Results
- SOE yields substantial improvements over the Baseline Self-Consistency across benchmarks: AIME 24 (76.9% vs 38.5%), AIME 25 (70.6% vs 35.3%), MATH-500 (45.9% vs 33.7%), Olympiad Bench (15.5% vs 11.7%), Omni-Math (Hard) (20.8% vs 14.5%), averaging +62.4% relative improvement.
- SOE achieves higher semantic exploration efficiency, maintaining a near-linear discovery rate while the Baseline saturates.
- Orthogonality scores for Student probes are consistently high across benchmarks, supporting the geometric mechanism of exiting the bias manifold.
- The framework adds per-step latency (about 2.60 s on the AIME_2025 benchmark) but provides substantial gains in discovering correct reasoning traces.
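The +62.4% average in the first result above is consistent with an unweighted mean of the per-benchmark relative gains, (SOE - Baseline) / Baseline. The short check below recomputes it from the reported scores; the averaging scheme is inferred from the numbers rather than stated in this summary, and the variable names are illustrative.

```python
# Reported scores (SOE %, Baseline Self-Consistency %) from the results list.
results = {
    "AIME 24": (76.9, 38.5),
    "AIME 25": (70.6, 35.3),
    "MATH-500": (45.9, 33.7),
    "Olympiad Bench": (15.5, 11.7),
    "Omni-Math (Hard)": (20.8, 14.5),
}

def relative_gain(soe: float, baseline: float) -> float:
    """Relative improvement of SOE over the baseline, in percent."""
    return (soe - baseline) / baseline * 100.0

gains = {name: relative_gain(s, b) for name, (s, b) in results.items()}

# Unweighted mean across the five benchmarks (assumed averaging scheme).
average_gain = sum(gains.values()) / len(gains)

for name, gain in gains.items():
    print(f"{name}: +{gain:.1f}%")
print(f"Average: +{average_gain:.1f}%")  # Average: +62.4%
```

The two AIME sets roughly double the baseline, while the other three benchmarks gain between about 32% and 43%, so the average is driven largely by the AIME results.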

This review was generated by AI and checked by a human editor.