QUICK REVIEW

[Paper Review] Learning to Rewrite Tool Descriptions for Reliable LLM-Agent Tool Use

Ruocheng Guo, Kaiwen Dong|arXiv (Cornell University)|Feb 23, 2026

Scientific Computing and Data Management | Citations: 0

One-Line Summary

The paper proposes Trace-Free+, a curriculum learning framework that transfers knowledge from trace-rich training to trace-free deployment, improving tool selection and tool use on unseen tools.

ABSTRACT

The performance of LLM-based agents depends not only on the agent itself but also on the quality of the tool interfaces it consumes. While prior work has focused heavily on agent fine-tuning, tool interfaces (including natural language descriptions and parameter schemas) remain largely human-oriented and often become a bottleneck, especially when agents must select from large candidate tool sets. Existing approaches to improving tool interfaces rely on execution traces, which are frequently unavailable in cold-start or privacy-constrained settings, and typically optimize each tool independently, limiting scalability and generalization to unseen tools. We propose Trace-Free+, a curriculum learning framework that progressively transfers supervision from trace-rich settings to trace-free deployment, encouraging the model to abstract reusable interface-usage patterns and tool usage outcomes. To support this approach, we construct a large-scale dataset of high-quality tool interfaces using a structured workflow over a diverse collection of tools. Experiments on StableToolBench and RestBench show consistent gains on unseen tools, strong cross-domain generalization, and robustness as the number of candidate tools scales to over 100, demonstrating that tool interface optimization is a practical and deployable complement to agent fine-tuning.

Motivation and Objectives

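Improve the quality and generalizability of the tool interfaces (descriptions and parameter schemas) consumed by LLM-based tool-using agents.
Enable robust tool selection and parameter generation under cold-start and privacy-constrained conditions.
Develop a scalable data-synthesis workflow for generating high-quality interfaces for a large number of tools.
Demonstrate cross-domain generalization and scalability as the candidate tool set grows beyond 100 tools.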

Proposed Method

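Build a large-scale dataset of high-quality tool interfaces with a structured, agent-oriented workflow over real-world tools (seed tools from ToolBench are refined for tool health and completeness).
Synthesize multi-step, dependency-aware user queries that expose usage patterns and failure modes across tools.
Train an open-weight LLM as a tool-description generator using two stages of description improvement (D0 -> D1: general improvement; D1 -> D2: trace-based improvement via RIMRULE), enabling both trace-based and trace-free inference.
Apply curriculum learning: train the model on both trace-rich and trace-free data while gradually increasing its reliance on trace-free supervision (Trace-Free+).
Evaluate trace-free and trace-based settings on RestBench and StableToolBench under a teacher-forcing protocol, measuring subtask-, query-, and tool-level metrics.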
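The review does not say how the curriculum shifts from trace-rich to trace-free supervision. A minimal sketch, assuming a linear ramp in the share of trace-free examples per training batch (the function names `trace_free_fraction` and `mixed_batch` and the 0.2 -> 1.0 endpoints are hypothetical, not the paper's schedule), shows what "gradually increasing reliance on trace-free supervision" could look like:

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def trace_free_fraction(step: int, total_steps: int,
                        start: float = 0.2, end: float = 1.0) -> float:
    """Linearly ramp the share of trace-free examples from `start` to `end`.

    Hypothetical linear schedule; the paper's actual curriculum may differ.
    """
    if total_steps <= 1:
        return end
    progress = min(max(step / (total_steps - 1), 0.0), 1.0)
    return start + (end - start) * progress


def mixed_batch(trace_rich: Sequence[T], trace_free: Sequence[T],
                step: int, total_steps: int, batch_size: int,
                rng: random.Random) -> List[T]:
    """Draw one training batch whose trace-free share follows the schedule."""
    n_free = round(trace_free_fraction(step, total_steps) * batch_size)
    n_rich = batch_size - n_free
    # Sample with replacement so a small pool never runs out of examples.
    batch = rng.choices(trace_free, k=n_free) + rng.choices(trace_rich, k=n_rich)
    rng.shuffle(batch)  # interleave the two sources within the batch
    return batch


if __name__ == "__main__":
    rng = random.Random(0)
    rich = [("trace-rich", i) for i in range(50)]
    free = [("trace-free", i) for i in range(50)]
    for step in (0, 5, 9):
        batch = mixed_batch(rich, free, step, total_steps=10, batch_size=10, rng=rng)
        n_free = sum(1 for kind, _ in batch if kind == "trace-free")
        print(f"step {step}: {n_free}/10 trace-free")
```

With 10 steps and batches of 10, the trace-free share rises from 2 to 6 to 10 examples per batch at steps 0, 5, and 9. A linear ramp is only the simplest choice; a staged schedule (fixed blocks of epochs per mixing ratio) would fit the same description.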

Figure 1: An illustration of the proposed tool interface improvement pipeline. Compared to the original description (D0), the learned description generator produces more effective tool descriptions that lead to better tool usage.
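The difference between the two inference modes lies in what the generator sees: both rewrite the original description D0, but only trace-based mode also receives execution traces. A minimal sketch of assembling the generator's input, assuming a plain-text prompt (the `ToolInterface` fields, the prompt wording, and `build_generator_input` are hypothetical, not the paper's format):

```python
import json
from dataclasses import dataclass, field
from typing import List


@dataclass
class ToolInterface:
    """Hypothetical container for one tool's interface."""
    name: str
    description: str   # the original, human-oriented description (D0)
    parameters: dict   # JSON-Schema-style parameter schema
    traces: List[str] = field(default_factory=list)  # execution traces, if any


def build_generator_input(tool: ToolInterface, use_traces: bool) -> str:
    """Assemble the text a description generator would rewrite.

    Trace-free mode (use_traces=False) uses only the interface itself, which is
    all that is available in cold-start or privacy-constrained deployments.
    """
    parts = [
        f"Tool name: {tool.name}",
        f"Original description: {tool.description}",
        "Parameter schema: " + json.dumps(tool.parameters, sort_keys=True),
    ]
    if use_traces and tool.traces:
        parts.append("Execution traces:")
        parts.extend(f"- {t}" for t in tool.traces)
    parts.append("Rewrite the description so an agent can select and call this tool correctly.")
    return "\n".join(parts)
```

The same `ToolInterface` can feed both modes, which is what lets a single generator be trained on trace-rich inputs and still be deployed trace-free.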

Experimental Results

Research Questions

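RQ1: Can trace-free training transfer the benefits of trace-based supervision to unseen tools at inference time?
RQ2: Does the curriculum learning strategy improve generalization and robustness as the candidate tool set grows?
RQ3: Under trace-free conditions, how well do learned tool-description generators perform compared with trace-based baselines and prompt-based methods?
RQ4: Are the improvements consistent across in-domain and cross-domain tool sets?
RQ5: On multi-hop tasks, how strongly does tool-description quality affect tool selection and API execution success?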

Key Findings

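Trace-Free+ consistently improves subtask- and query-level success rates on unseen tools over plain trace-free training and several baselines.
On the harder multi-hop subset, Trace-Free+ outperforms D1, showing the value of a curriculum that includes trace information for learning inter-tool dependencies.
Trace-Free+ achieves strong cross-domain generalization: trained on StableToolBench Split B, it improves performance on RestBench (TMDB/Spotify).
Trace-Free+ remains robust as the number of candidate tools grows beyond 100, degrading less than the baselines.
Trace-based models gain more from the tool-usage patterns contained in traces, but the trace-free curriculum still achieves competitive results under cold-start conditions.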
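The review reports subtask- and query-level success rates without defining them. A minimal sketch, assuming that under teacher forcing each subtask is scored independently, that a subtask is solved when the correct tool is selected and its call succeeds, and that a query counts only when all its subtasks are solved (these rules, the `SubtaskResult` class, `success_rates`, and the use of tool-selection accuracy as a stand-in for a tool-level metric are all hypothetical), shows why query-level success is the stricter measure on multi-hop queries:

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class SubtaskResult:
    """Outcome of one subtask (one tool call) under teacher forcing."""
    expected_tool: str
    selected_tool: str
    call_succeeded: bool  # assumed criterion, e.g. the API call returned without error

    @property
    def solved(self) -> bool:
        return self.selected_tool == self.expected_tool and self.call_succeeded


def success_rates(queries: List[List[SubtaskResult]]) -> Dict[str, float]:
    """Aggregate per-subtask outcomes into three hypothetical metrics."""
    subtasks = [s for q in queries for s in q]
    if not subtasks:
        raise ValueError("no subtasks to score")
    return {
        "tool_selection_acc": sum(s.selected_tool == s.expected_tool for s in subtasks) / len(subtasks),
        "subtask_success": sum(s.solved for s in subtasks) / len(subtasks),
        # A multi-hop query counts only if every one of its subtasks is solved.
        "query_success": sum(all(s.solved for s in q) for q in queries) / len(queries),
    }


if __name__ == "__main__":
    # Hypothetical TMDB- and Spotify-style tool names for illustration only.
    q1 = [SubtaskResult("search_movie", "search_movie", True),
          SubtaskResult("get_credits", "get_credits", True)]
    q2 = [SubtaskResult("search_artist", "search_artist", True),
          SubtaskResult("get_top_tracks", "get_top_tracks", False)]
    print(success_rates([q1, q2]))
```

In this example every tool is selected correctly, yet one failed call drops subtask success to 0.75 and query success to 0.5, which is why multi-hop queries are the harder setting.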

Figure 2 : The SFT data synthesis pipeline.


This review was generated by AI and checked by a human editor.