QUICK REVIEW

[Paper Review] Agentic Reasoning: A Streamlined Framework for Enhancing LLM Reasoning with Agentic Tools

Junde Wu, Jia Zhu | arXiv.org | Feb 7, 2025

Artificial Intelligence in Law | Cited by 5

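One-Line Summary

Agentic Reasoning enhances LLM reasoning by integrating external agents (web search, coding, and Mind Map memory) for multi-step, tool-assisted reasoning, and it outperforms several baselines on expert-level tasks.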

ABSTRACT

We introduce Agentic Reasoning, a framework that enhances large language model (LLM) reasoning by integrating external tool-using agents. Agentic Reasoning dynamically leverages web search, code execution, and structured memory to address complex problems requiring deep research. A key innovation in our framework is the Mind-Map agent, which constructs a structured knowledge graph to store reasoning context and track logical relationships, ensuring coherence in long reasoning chains with extensive tool usage. Additionally, we conduct a comprehensive exploration of the Web-Search agent, leading to a highly effective search mechanism that surpasses all prior approaches. When deployed on DeepSeek-R1, our method achieves a new state-of-the-art (SOTA) among public models and delivers performance comparable to OpenAI Deep Research, the leading proprietary model in this domain. Extensive ablation studies validate the optimal selection of agentic tools and confirm the effectiveness of our Mind-Map and Web-Search agents in enhancing LLM reasoning. The code is at: https://github.com/theworldofagents/Agentic-Reasoning

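Motivation and Objectives

Motivate improving LLM reasoning beyond purely internal reasoning by integrating external tools.
Introduce a framework that coordinates web search, code execution, and structured memory to support deep research tasks.
Show that agentic reasoning improves accuracy and efficiency on expert-level QA and real-world research tasks.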

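Proposed Method

Proposes Agentic Reasoning, in which the LLM calls external agents for web search, coding, and Mind Map memory through dedicated tokens.
Builds the Mind Map as a structured knowledge graph extracted from the reasoning chain, using a Graph-RAG approach.
Uses a coding agent that generates and executes code through a separate LLM and a compiler, then returns the result to the reasoning process.
Restricts the core external tool set for general tasks to web search and coding, while the Mind Map provides structured memory and querying.
Formalizes the reasoning process as P(r, a | o, q, e, k), comprising sequential reasoning and answer-generation components.
Demonstrates best-of-N style selection that uses tool usage as a test-time verifier to improve robustness.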
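
The joint probability P(r, a | o, q, e, k) named above can be read as a reasoning stage followed by an answer stage. The block below is one plausible token-level factorization, not a formula quoted from this review. It assumes that r is the reasoning chain, a the final answer, o the task instruction, q the query, e the external tool outputs, and k the Mind Map memory; these symbol meanings and the indices T_r and T_a are assumptions introduced here for illustration.

```latex
% Assumed factorization: reasoning tokens first, then answer tokens.
% e_{\le t} and k_{\le t} denote tool outputs and memory available up to step t.
P(r, a \mid o, q, e, k)
  = \underbrace{\prod_{t=1}^{T_r} P\!\left(r_t \mid r_{<t},\, o,\, q,\, e_{\le t},\, k_{\le t}\right)}_{\text{sequential reasoning}}
    \;\cdot\;
    \underbrace{\prod_{t=1}^{T_a} P\!\left(a_t \mid a_{<t},\, r,\, o,\, q,\, e,\, k\right)}_{\text{answer generation}}
```

Under this reading, each reasoning token conditions only on the tool outputs and memory gathered so far, while the answer conditions on the complete reasoning chain.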

Figure 1: The overall workflow of Agentic Reasoning.
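
The workflow can be pictured as a loop in which the model generates text, a dispatcher spots a tool request, the matching agent runs, and its result is appended to the context. The following is a minimal sketch of that idea, not the authors' implementation. The token format `<tool:NAME>...</tool>`, the function `run_agentic_loop`, the scripted stand-in for the LLM, and the stub agents are all hypothetical names introduced for illustration; the actual special tokens are not given in this review.

```python
import re
from typing import Callable, Dict, List

# Hypothetical token format; the framework's real special tokens are not shown here.
TOOL_CALL = re.compile(r"<tool:(\w+)>(.*?)</tool>", re.DOTALL)


def run_agentic_loop(
    question: str,
    llm_step: Callable[[str], str],
    agents: Dict[str, Callable[[str], str]],
    max_steps: int = 8,
) -> str:
    """Alternate between generation and agent calls until no tool is requested."""
    context = f"Question: {question}\n"
    for _ in range(max_steps):
        chunk = llm_step(context)
        context += chunk
        match = TOOL_CALL.search(chunk)
        if match is None:
            return context  # no tool request: treat this chunk as the final answer
        name, payload = match.group(1), match.group(2).strip()
        handler = agents.get(name)
        result = handler(payload) if handler else f"unknown tool '{name}'"
        # Feed the agent output back so the next generation step can use it.
        context += f"\n<result:{name}>{result}</result>\n"
    return context


def make_scripted_llm(script: List[str]) -> Callable[[str], str]:
    """Stand-in for a real model: replays fixed chunks and ignores the context."""
    steps = iter(script)
    return lambda _context: next(steps)


# Stub agents (assumptions): a real system would call a search API and a sandbox.
agents = {
    "search": lambda q: f"stub search hit for '{q}'",
    "code": lambda src: str(sum(int(x) for x in src.split("+"))),
    "mind_map": lambda q: "stub memory lookup",
}

llm = make_scripted_llm([
    "I should look this up. <tool:search>GPQA benchmark</tool>",
    "Now compute. <tool:code>40 + 2</tool>",
    "Final answer: 42",
])

trace = run_agentic_loop("toy question", llm, agents)
print(trace)
```

Keeping the agents behind a plain name-to-callable mapping means a tool can be added or removed without touching the loop, which mirrors the review's point that the core tool set is deliberately small.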

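Experimental Results

Research Questions

RQ1: How can external tools be integrated into LLM reasoning to improve complex, multi-step problem solving?
RQ2: Does the Mind Map knowledge graph improve deductive reasoning and robustness to misleading prompts?
RQ3: What impact do the web-search and coding agents have on expert-level QA and deep research tasks?
RQ4: Can tool-based reasoning be scaled at test time through best-of-N selection or verifier-like mechanisms?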

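Key Findings

On the GPQA dataset, Agentic Reasoning achieves strong results: 88.1 on Physics, 58.3 on Chemistry, and 79.6 on Biology.
On the GPQA Extended Set, Agentic Reasoning scores 75.2 (Physics), 53.1 (Chemistry), and 72.8 (Biology), exceeding human experts in the reported comparison.
In the deep research evaluation, Agentic Reasoning outperforms Gemini Deep Research on expert-level reports in Finance, Medicine, and Law (measured by pass rate).
Test-time analysis shows that, for a given problem, a higher proportion of tool use goes with better performance, which supports best-of-N verification as a robust reasoning strategy.
The Mind Map is especially effective on difficult logic problems and strategic game scenarios (e.g., Werewolf), improving deductive accuracy and strategic reasoning.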
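
The best-of-N finding above can be illustrated with a small selector that samples several reasoning chains for the same question and keeps the one with the highest proportion of tool use. This is a sketch under stated assumptions: the `Candidate` fields, the ratio-based score, and the tie-breaking rule are illustrative choices, and the review does not specify the exact selection criterion used in the paper.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    answer: str
    tool_calls: int       # number of agent invocations in the chain (assumed metric)
    reasoning_steps: int  # total steps in the chain (assumed metric)


def tool_use_ratio(c: Candidate) -> float:
    """Proportion of steps that invoked a tool; 0.0 for an empty chain."""
    return c.tool_calls / c.reasoning_steps if c.reasoning_steps else 0.0


def select_best_of_n(candidates: List[Candidate]) -> Candidate:
    """Pick the sampled chain with the highest tool-use ratio."""
    if not candidates:
        raise ValueError("need at least one candidate")
    # max() returns the first maximal element, so ties keep the earliest sample.
    return max(candidates, key=tool_use_ratio)


# Three hypothetical samples for the same question.
samples = [
    Candidate("A", tool_calls=1, reasoning_steps=10),
    Candidate("B", tool_calls=4, reasoning_steps=10),
    Candidate("C", tool_calls=2, reasoning_steps=10),
]
best = select_best_of_n(samples)
print(best.answer)
```

The selector compares chains for a single question only; it makes no claim about comparing tool use across different questions.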

Figure 2: Case study on a complex medical decision-making problem.

This review was generated by AI and checked by a human editor.