QUICK REVIEW

[Paper Review] Reasoning Beyond Chain-of-Thought: A Latent Computational Mode in Large Language Models

Zhenghao He, Kèyù Zhü | arXiv (Cornell University) | Jan 12, 2026
Topic Modeling | Cited by: 1
One-Line Summary

The paper uses Sparse Autoencoders (SAEs) to identify latent features in LLMs, then applies targeted steering to those features to trigger or enhance multi-step reasoning without explicit CoT prompting, matching or surpassing CoT in some cases.

ABSTRACT

Chain-of-Thought (CoT) prompting has improved the reasoning performance of large language models (LLMs), but it remains unclear why it works and whether it is the unique mechanism for triggering reasoning in large language models. In this work, we study this question by directly analyzing and intervening on the internal representations of LLMs with Sparse Autoencoders (SAEs), identifying a small set of latent features that are causally associated with LLM reasoning behavior. Across multiple model families and reasoning benchmarks, we find that steering a single reasoning-related latent feature can substantially improve accuracy without explicit CoT prompting. For large models, latent steering achieves performance comparable to standard CoT prompting while producing more efficient outputs. We further observe that this reasoning-oriented internal state is triggered early in generation and can override prompt-level instructions that discourage explicit reasoning. Overall, our results suggest that multi-step reasoning in LLMs is supported by latent internal activations that can be externally activated, while CoT prompting is one effective, but not unique, way of activating this mechanism rather than its necessary cause.

Motivation and Objectives

  • Investigate whether multi-step reasoning in LLMs is linked to a latent internal mechanism beyond explicit CoT prompts.
  • Identify latent features associated with reasoning using a two-stage SAE-based pipeline.
  • Demonstrate causal effects of targeted latent steering on reasoning accuracy across multiple models and benchmarks.

Proposed Method

  • Use a two-stage pipeline: (i) feature discovery by projecting token activations through a pretrained Sparse Autoencoder (SAE) to obtain sparse latent features; (ii) causal validation via targeted latent steering injected at the first generation step.
  • Aggregate latent features at early generation steps and compare prompt-induced activations under direct vs. CoT prompting to identify reasoning-related features.
  • Apply an additive, pre-activation steering intervention on selected latent features, followed by a residual injection to minimize reconstruction bias.
  • Evaluate intervention sensitivity by singleton feature perturbations on training data and confirm effects on held-out test sets.
  • Assess latency and timing: steering early in generation tends to be more effective, with features peaking early and decaying thereafter.
  • Compare steering with CoT prompting across model families of up to 70B parameters to show that steered direct prompts can match or exceed CoT performance with fewer tokens.
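
The two stages listed above can be illustrated with a toy, pure-Python sketch. It is not the authors' code: the function names, the tiny weight matrices, the ReLU encoder, and the ranking rule (mean activation under CoT prompts minus mean activation under direct prompts) are all assumptions made for illustration. The "residual injection" is read here as adding only the change in the SAE reconstruction back onto the original hidden state, so the SAE's reconstruction error is not written into the model; the paper may define it differently.

```python
"""Toy sketch of SAE feature ranking and latent steering (hypothetical names)."""
from typing import List

Vector = List[float]
Matrix = List[Vector]


def matvec(m: Matrix, v: Vector) -> Vector:
    """Multiply a (rows x cols) matrix by a length-cols vector."""
    return [sum(w * x for w, x in zip(row, v)) for row in m]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU, assumed here as the SAE nonlinearity."""
    return [max(0.0, x) for x in v]


def affine(m: Matrix, v: Vector, b: Vector) -> Vector:
    """Compute m @ v + b."""
    return [p + q for p, q in zip(matvec(m, v), b)]


def rank_features(direct_acts: List[Vector], cot_acts: List[Vector]) -> List[int]:
    """Stage (i), assumed rule: rank latent indices by how much more they
    fire under CoT prompts than under direct prompts, largest gap first."""
    n = len(direct_acts[0])
    gap = []
    for i in range(n):
        mean_cot = sum(a[i] for a in cot_acts) / len(cot_acts)
        mean_direct = sum(a[i] for a in direct_acts) / len(direct_acts)
        gap.append(mean_cot - mean_direct)
    return sorted(range(n), key=lambda i: gap[i], reverse=True)


def steer(h: Vector, w_enc: Matrix, b_enc: Vector,
          w_dec: Matrix, b_dec: Vector, feature: int, alpha: float) -> Vector:
    """Stage (ii): add alpha to one latent's pre-activation, then inject
    only the resulting change in the reconstruction into h."""
    pre = affine(w_enc, h, b_enc)                 # encoder pre-activations
    recon = affine(w_dec, relu(pre), b_dec)       # unsteered reconstruction
    pre_steered = list(pre)
    pre_steered[feature] += alpha                 # additive, pre-activation
    recon_steered = affine(w_dec, relu(pre_steered), b_dec)
    # Residual injection (assumed form): h + (steered - unsteered)
    return [x + (rs - r) for x, rs, r in zip(h, recon_steered, recon)]


if __name__ == "__main__":
    # Hypothetical 2-dim hidden state and 3-latent SAE.
    w_enc = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]]
    b_enc = [0.0, 0.0, 0.0]
    w_dec = [[1.0, 0.0, 0.5], [0.0, 1.0, -0.5]]
    b_dec = [0.0, 0.0]
    print(steer([1.0, 2.0], w_enc, b_enc, w_dec, b_dec, feature=2, alpha=3.0))
```

In a real model, a function like `steer` would run inside a forward hook on the chosen layer and, following the description above, only at the first generation step. With `alpha = 0` the hidden state is returned unchanged, which is the property the residual form is meant to guarantee.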

Experimental Results

Research Questions

  • RQ1: Can a latent internal mechanism for reasoning be triggered without explicit CoT prompts, by steering latent internal features?
  • RQ2: Are there small, causally influential latent features whose activation improves reasoning accuracy when steered?
  • RQ3: How does latent steering compare to Chain-of-Thought prompting in terms of accuracy and token efficiency across model scales?
  • RQ4: When during generation should steering be applied for maximal effect, and does it override prompt-level instructions?
  • RQ5: Do steering effects generalize across prompting styles and model families?

Key Findings

  • A small set of latent features identified with SAEs is causally associated with reasoning behavior.
  • Steering a single latent feature at the first generation step can improve reasoning accuracy to match or exceed CoT prompting in several benchmarks.
  • Latent steering often yields shorter reasoning traces than explicit CoT, especially in large models.
  • The reasoning-oriented internal state is triggered early in generation and can override prompts that discourage explicit reasoning.
  • Early, targeted interventions are more effective than late or broad activations.
  • Across six model families (up to 70B parameters), steering yields robust improvements on the GSM8K, GPQA, and BBH benchmarks, with effects that vary according to how much each task relies on multi-step reasoning.


This review was generated by AI and checked by a human editor.