QUICK REVIEW

[Paper Review] ToolAlpaca: Generalized Tool Learning for Language Models with 3000 Simulated Cases

Qiaoyu Tang, Ziliang Deng|arXiv (Cornell University)|Jun 8, 2023

Topic Modeling | Citations: 16

One-line summary

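ToolAlpaca automatically builds a diverse tool-use corpus and fine-tunes compact language models on it to give them generalized tool-use ability. Trained on only about 3.9k simulated cases, the resulting models are competitive with GPT-3.5 on unseen tools.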

ABSTRACT

Enabling large language models to utilize real-world tools effectively is crucial for achieving embodied intelligence. Existing approaches to tool learning have either primarily relied on extremely large language models, such as GPT-4, to attain generalized tool-use abilities in a zero-shot manner, or utilized supervised learning to train limited scopes of tools on compact models. However, it remains uncertain whether smaller language models can achieve generalized tool-use abilities without tool-specific training. To address this question, this paper introduces ToolAlpaca, a novel framework designed to automatically generate a diverse tool-use corpus and learn generalized tool-use abilities on compact language models with minimal human intervention. Specifically, ToolAlpaca first automatically creates a highly diversified tool-use corpus by building a multi-agent simulation environment. The corpus contains 3938 tool-use instances from more than 400 real-world tool APIs spanning 50 distinct categories. Subsequently, the constructed corpus is employed to fine-tune compact language models, resulting in two models, namely ToolAlpaca-7B and ToolAlpaca-13B, respectively. Finally, we evaluate the ability of these models to utilize previously unseen tools without specific training. Experimental results demonstrate that ToolAlpaca achieves effective generalized tool-use capabilities comparable to those of extremely large language models like GPT-3.5, demonstrating that learning generalized tool-use ability is feasible for compact language models.

Motivation and objectives

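Show whether compact language models can acquire generalized tool-use ability without tool-specific training.
Automatically generate a diverse, structured tool-use corpus suitable for fine-tuning small LMs.
Show that fine-tuning on the ToolAlpaca corpus enables generalization to unseen tools and real-world APIs.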

Proposed method

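Build a diverse toolset by converting more than 400 real-world tools from public APIs into standardized documentation (name, introduction, description, function documentation, OpenAPI specification).
Generate 3,938 tool-use instances through a three-agent, multi-turn dialogue simulation (user, assistant, tool executor) driven by LLMs.
Fine-tune compact LMs (Vicuna-7B and Vicuna-13B) on the generated corpus and evaluate them on unseen simulated and real-world tools.
Assess generalization to multimodal and unseen tools using automatic evaluation by GPT-4 plus human evaluation on a selected subset.
Quantify the effect of toolset diversity on generalization performance.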
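The three-agent simulation in the second step can be pictured with a small loop. This is a minimal sketch, not the authors' implementation: the agents here are deterministic stub functions standing in for LLM calls, and the `Turn` type, the `CALL ` prefix used to mark a tool invocation, the `getWeather` tool, and the `max_steps` limit are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Turn:
    role: str      # "user", "assistant", or "tool"
    content: str


# Each agent maps the dialogue so far to its next message.
# In a real pipeline these would wrap LLM calls; here they are stubs.
Agent = Callable[[List[Turn]], str]


def simulate_instance(user: Agent, assistant: Agent, tool_executor: Agent,
                      max_steps: int = 5) -> List[Turn]:
    """Run one simulated tool-use dialogue and return its turns."""
    history = [Turn("user", user([]))]
    for _ in range(max_steps):
        reply = assistant(history)
        history.append(Turn("assistant", reply))
        # Assumed convention: a reply starting with "CALL " is a tool call;
        # anything else is treated as the final answer.
        if not reply.startswith("CALL "):
            break
        history.append(Turn("tool", tool_executor(history)))
    return history


def stub_user(history: List[Turn]) -> str:
    return "What is the weather in Paris?"


def stub_assistant(history: List[Turn]) -> str:
    if history[-1].role == "tool":
        return f"Final answer: {history[-1].content}"
    return "CALL getWeather city=Paris"  # hypothetical tool name


def stub_tool(history: List[Turn]) -> str:
    return '{"city": "Paris", "forecast": "sunny"}'


dialogue = simulate_instance(stub_user, stub_assistant, stub_tool)
for turn in dialogue:
    print(f"{turn.role}: {turn.content}")
```

Each returned dialogue corresponds to one tool-use instance; repeating the loop over many tools and user requests would yield a corpus of such instances.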

Figure 1: A high-level overview of ToolAlpaca, consisting of three components: (1) Toolset construction, where structured documentation for each tool is generated based on the brief introductions provided by public-apis. (2) Tool-use instance generation via multi-agent simulation. (3) ToolAlpaca mode

Experimental results

Research questions

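RQ1: Can compact language models learn generalized tool-use ability without tool-specific training?
RQ2: Does automatically generated, diverse synthetic data help compact LMs generalize to unseen tools and real-world APIs?
RQ3: How does toolset diversity affect generalization performance?
RQ4: How does ToolAlpaca compare with large LMs (e.g., GPT-3.5) on unseen tools?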

Key findings

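The fine-tuned ToolAlpaca-7B and ToolAlpaca-13B achieve higher acceptance/accuracy than the base Vicuna models on unseen tools.
ToolAlpaca-13B achieves performance comparable to GPT-3.5 on unseen tools.
Training on 3.9k simulated cases enables generalization to real-world APIs, where ToolAlpaca outperforms the Vicuna baselines.
ToolAlpaca generalizes strongly to out-of-dataset multimodal tools (the GPT4Tools test set).
Increasing toolset diversity improves validation performance even when the number of instances is held constant.
Diversity is a key factor in enabling generalized tool learning for compact models.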
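The diversity finding rests on comparing training sets of equal size drawn from different numbers of tools. The sketch below shows one way such a controlled subsample could be built; it is an assumption about the setup, not the authors' procedure. The function name, the `(tool, example)` pair format, and the toy data are hypothetical.

```python
import random
from collections import defaultdict
from typing import List, Tuple

Instance = Tuple[str, str]  # (tool name, tool-use example), a hypothetical format


def subsample_by_tool_count(instances: List[Instance], num_tools: int,
                            num_instances: int, seed: int = 0) -> List[Instance]:
    """Draw a fixed number of instances from a randomly chosen subset of tools."""
    rng = random.Random(seed)
    by_tool = defaultdict(list)
    for tool, example in instances:
        by_tool[tool].append((tool, example))
    if num_tools > len(by_tool):
        raise ValueError("not enough distinct tools")
    # Sort tool names first so the choice depends only on the seed.
    chosen = rng.sample(sorted(by_tool), num_tools)
    pool = [item for tool in chosen for item in by_tool[tool]]
    if num_instances > len(pool):
        raise ValueError("chosen tools do not have enough instances")
    # The sample comes from at most num_tools tools; it is not forced to cover all of them.
    return rng.sample(pool, num_instances)


# Toy corpus: 10 tools with 6 instances each.
corpus = [(f"tool{i}", f"case{j}") for i in range(10) for j in range(6)]
narrow = subsample_by_tool_count(corpus, num_tools=2, num_instances=12)
broad = subsample_by_tool_count(corpus, num_tools=10, num_instances=12)
print(len(narrow), len({tool for tool, _ in narrow}))
print(len(broad), len({tool for tool, _ in broad}))
```

Both subsets contain the same number of instances, so any performance gap between models trained on them can be attributed to the number of distinct tools rather than to data volume.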

Figure 2: An instance of a tool documentation, composed of five essential parts: name, introduction, description, function documentation, OpenAPI specification.
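The five-part documentation named in the caption maps naturally onto a small record type. The following is an illustrative sketch only: the field names mirror the caption, but the `ToolDocumentation` class, the `list_operations` helper, and the "Example Weather" tool are made up and do not come from the paper. The helper relies only on the standard OpenAPI layout, in which `paths` maps each path to per-method operations that may carry an `operationId`.

```python
from dataclasses import dataclass
from typing import Any, Dict, List

HTTP_METHODS = {"get", "put", "post", "delete", "options", "head", "patch", "trace"}


@dataclass
class ToolDocumentation:
    name: str
    introduction: str
    description: str
    function_documentation: str
    openapi_specification: Dict[str, Any]


def list_operations(spec: Dict[str, Any]) -> List[str]:
    """Collect the operationId of every HTTP operation in an OpenAPI spec."""
    operations = []
    for path_item in spec.get("paths", {}).values():
        for method, operation in path_item.items():
            # Path items may also hold non-method keys such as "parameters".
            if method in HTTP_METHODS and "operationId" in operation:
                operations.append(operation["operationId"])
    return operations


# A made-up tool used purely for illustration.
example_doc = ToolDocumentation(
    name="Example Weather",
    introduction="Look up weather forecasts by city.",
    description="A made-up tool with a single forecast endpoint.",
    function_documentation="getForecast(city: str) -> forecast text for the given city.",
    openapi_specification={
        "openapi": "3.0.0",
        "info": {"title": "Example Weather", "version": "1.0"},
        "paths": {
            "/forecast": {
                "get": {
                    "operationId": "getForecast",
                    "parameters": [
                        {"name": "city", "in": "query", "required": True,
                         "schema": {"type": "string"}}
                    ],
                    "responses": {"200": {"description": "Forecast for the city"}},
                }
            }
        },
    },
)

print(list_operations(example_doc.openapi_specification))
```

Keeping the natural-language fields alongside the machine-readable specification means the same record can be shown to a model as text and used to check the calls the model produces.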

This review was generated by AI and checked by a human editor.