QUICK REVIEW

[Paper Review] Self-Discover: Large Language Models Self-Compose Reasoning Structures

Pei Zhou, Jay Pujara|arXiv (Cornell University)|Feb 6, 2024

Topic Modeling | Citations: 9

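One-line summary

Self-Discover has an LLM self-discover a task-specific reasoning structure by composing atomic reasoning modules, which improves results on hard reasoning benchmarks while keeping the number of inference calls low.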

ABSTRACT

We introduce SELF-DISCOVER, a general framework for LLMs to self-discover the task-intrinsic reasoning structures to tackle complex reasoning problems that are challenging for typical prompting methods. Core to the framework is a self-discovery process where LLMs select multiple atomic reasoning modules such as critical thinking and step-by-step thinking, and compose them into an explicit reasoning structure for LLMs to follow during decoding. SELF-DISCOVER substantially improves GPT-4 and PaLM 2's performance on challenging reasoning benchmarks such as BigBench-Hard, grounded agent reasoning, and MATH, by as much as 32% compared to Chain of Thought (CoT). Furthermore, SELF-DISCOVER outperforms inference-intensive methods such as CoT-Self-Consistency by more than 20%, while requiring 10-40x fewer inference compute. Finally, we show that the self-discovered reasoning structures are universally applicable across model families: from PaLM 2-L to GPT-4, and from GPT-4 to Llama2, and share commonalities with human reasoning patterns.

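Motivation and objectives

Motivate a framework built on the premise that each task has its own intrinsic reasoning structure, rather than relying on a fixed prompting module.
Develop a two-stage process in which the LLM first self-discovers a task-specific reasoning structure and then follows it to solve individual instances.
Show that self-discovered structures are more efficient and more interpretable than conventional prompting methods.
Show that the discovered structures transfer across model families and align with human reasoning patterns.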

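Proposed method

Define a seed set of atomic reasoning modules described in natural language (e.g., critical thinking, step-by-step thinking).
Stage 1: self-discovery consists of three actions: select the useful modules, adapt them to the task, and implement them as an actionable JSON-like structure.
Stage 2: at decoding time, solve each task instance by following the self-discovered structure.
Represent the discovered structure in key-value (JSON) form, which guides decoding and improves interpretability.
Compare Self-Discover against zero-shot Direct Prompting, Chain-of-Thought (CoT), Plan-and-Solve (PS), and inference-intensive baselines such as CoT-Self-Consistency.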
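
The two stages above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the `llm` callable, the prompt wording, the function names, and the shortened two-item seed list are all assumptions made for the example.

```python
from typing import Callable, List

# Shortened, illustrative seed set; the two names are the examples given in the text.
SEED_MODULES: List[str] = [
    "Critical thinking",
    "Step-by-step thinking",
]

# Assumed interface: a function that takes a prompt and returns the completion.
LLM = Callable[[str], str]


def discover_structure(llm: LLM, task_examples: List[str]) -> str:
    """Stage 1: run once per task, using three LLM calls (select, adapt, implement)."""
    examples = "\n".join(task_examples)
    modules = "\n".join(f"- {m}" for m in SEED_MODULES)

    # Action 1: pick the seed modules that look useful for this task.
    selected = llm(
        "Select the reasoning modules that are useful for this task.\n"
        f"Modules:\n{modules}\nTask examples:\n{examples}"
    )
    # Action 2: rephrase the selected modules so they are specific to the task.
    adapted = llm(
        "Adapt the selected modules to the task.\n"
        f"Selected modules:\n{selected}\nTask examples:\n{examples}"
    )
    # Action 3: turn the adapted modules into a JSON-like key-value plan.
    structure = llm(
        "Implement the adapted modules as a JSON-like key-value reasoning structure.\n"
        f"Adapted modules:\n{adapted}\nTask examples:\n{examples}"
    )
    return structure


def solve_instance(llm: LLM, structure: str, instance: str) -> str:
    """Stage 2: one LLM call per instance, following the discovered structure."""
    return llm(
        "Follow the reasoning structure and fill in its values to solve the task.\n"
        f"Structure:\n{structure}\nTask:\n{instance}"
    )
```

In this sketch the structure is discovered once and then reused for every instance of the task, so the Stage 1 cost is amortized across instances.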

Figure 1: Self-Discover guides LLMs to self-discover and compose atomic reasoning modules into a reasoning structure to solve challenging tasks. Through testing on challenging reasoning benchmarks including BigBench-Hard (BBH), agent reasoning (T4D), and MATH, we find that Self-Discover outperforms

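Experimental results

Research questions

RQ1: Can self-discovered reasoning structures improve LLM reasoning across diverse benchmarks (BBH, T4D, MATH)?
RQ2: Which task categories benefit most from self-discovered structures, and how does efficiency compare with other prompting methods?
RQ3: Do self-discovered structures transfer across model families and to different LLMs?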

Key findings

Method	BBH	T4D	MATH
PaLM 2-L	56%	30%	45%
PaLM 2-L + CoT	60%	40%	42%
PaLM 2-L + PS	61%	42%	49%
PaLM 2-L + Self-Discover	67%	69%	50.5%
GPT-4	58%	51%	70.5%
GPT-4 + CoT	75%	52%	71%
GPT-4 + PS	73%	53%	70%
GPT-4 + Self-Discover	81%	85%	73%
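
The table's numbers can be turned into absolute gains, in percentage points, with simple subtraction. The sketch below does that; the dictionary layout, the function name, and the label "Direct" for the rows without a prompting method are illustrative choices, while the accuracies are copied from the table.

```python
from typing import Dict

# Accuracy (%) copied from the table above: model -> method -> benchmark.
# "Direct" is our label for the rows that list only the model name.
RESULTS: Dict[str, Dict[str, Dict[str, float]]] = {
    "PaLM 2-L": {
        "Direct": {"BBH": 56, "T4D": 30, "MATH": 45},
        "CoT": {"BBH": 60, "T4D": 40, "MATH": 42},
        "PS": {"BBH": 61, "T4D": 42, "MATH": 49},
        "Self-Discover": {"BBH": 67, "T4D": 69, "MATH": 50.5},
    },
    "GPT-4": {
        "Direct": {"BBH": 58, "T4D": 51, "MATH": 70.5},
        "CoT": {"BBH": 75, "T4D": 52, "MATH": 71},
        "PS": {"BBH": 73, "T4D": 53, "MATH": 70},
        "Self-Discover": {"BBH": 81, "T4D": 85, "MATH": 73},
    },
}


def gain_over(model: str, baseline: str, benchmark: str) -> float:
    """Absolute gain of Self-Discover over a baseline, in percentage points."""
    rows = RESULTS[model]
    return rows["Self-Discover"][benchmark] - rows[baseline][benchmark]


if __name__ == "__main__":
    for model in RESULTS:
        for baseline in ("Direct", "CoT", "PS"):
            gains = {b: gain_over(model, baseline, b) for b in ("BBH", "T4D", "MATH")}
            print(f"{model} vs {baseline}: {gains}")
```

For example, `gain_over("PaLM 2-L", "CoT", "BBH")` evaluates to 7 and `gain_over("GPT-4", "PS", "T4D")` to 32.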

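Self-Discover improves the reasoning performance of PaLM 2-L and GPT-4 on BBH, T4D, and MATH, with gains of up to 32% over CoT in some settings.
On the 23 BBH tasks, Self-Discover gives PaLM 2-L an absolute improvement of 7% over CoT and 6% over PS, and GPT-4 shows similar gains.
On T4D, Self-Discover improves on the baselines by at least 27% absolute for PaLM 2-L and 32% for GPT-4, reaching 69% accuracy with PaLM 2-L and 85% with GPT-4.
On MATH, Self-Discover shows moderate gains (1–7% for PaLM 2-L, 2–3% for GPT-4); most errors stem from computation mistakes rather than from the reasoning structure.
Self-Discover matches or exceeds inference-intensive baselines such as CoT-Self-Consistency and majority voting while using 10–40x fewer inference calls.
Self-discovered structures transfer across model families (PaLM 2-L → GPT-4; GPT-4 → Llama-2-70B) and share commonalities with human reasoning patterns.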
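
The efficiency finding comes down to counting model calls. Here is a rough sketch under stated assumptions: Self-Discover spends three calls once per task (one per Stage 1 action) plus one call per instance, while a sampling-based baseline spends `k` calls on every instance. The function names and the example values of `k` and `n_instances` are illustrative and not taken from the review.

```python
def self_discover_calls(n_instances: int, stage1_calls: int = 3) -> int:
    """Task-level discovery calls plus one decoding call per instance (assumed)."""
    return stage1_calls + n_instances


def sampling_baseline_calls(n_instances: int, k: int) -> int:
    """A baseline that samples k reasoning paths for every instance (assumed)."""
    return k * n_instances


def call_ratio(n_instances: int, k: int) -> float:
    """How many times more calls the sampling baseline makes than Self-Discover."""
    return sampling_baseline_calls(n_instances, k) / self_discover_calls(n_instances)


if __name__ == "__main__":
    for k in (10, 40):
        print(f"k={k}: ratio over 1000 instances = {call_ratio(1000, k):.2f}")
```

Because the three task-level calls are amortized, the ratio approaches `k` as the number of instances grows, so baselines that use between 10 and 40 calls per instance would be consistent with the 10–40x range.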

Figure 2: Illustration of using Self-Discover for problem-solving. Given a generative LM, task, and seed reasoning module descriptions, we guide LMs to generate a reasoning structure in key-value format to solve the task. Finally, models can follow the self-discovered structures to solve the every
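
To make the key-value idea concrete, here is a small sketch of a reasoning structure with empty values, plus a check that a model's filled-in answer kept the same keys in the same order. The step names in `STRUCTURE`, the prompt wording, and the helpers `build_prompt` and `is_faithful_fill` are invented for illustration; they are not taken from the paper.

```python
import json
from typing import Dict

# Hypothetical structure: keys are reasoning steps, values are filled in at decode time.
STRUCTURE: Dict[str, str] = {
    "Identify what the question asks": "",
    "Work through the problem step by step": "",
    "Final answer": "",
}


def build_prompt(structure: Dict[str, str], instance: str) -> str:
    """Ask the model to fill in every value of the structure for one instance."""
    return (
        "Fill in every value of this JSON reasoning structure, in order.\n"
        f"{json.dumps(structure, indent=2)}\n"
        f"Task: {instance}"
    )


def is_faithful_fill(structure: Dict[str, str], response_text: str) -> bool:
    """True if the response is a JSON object with the same keys, in order, all non-empty."""
    try:
        filled = json.loads(response_text)
    except json.JSONDecodeError:
        return False
    if not isinstance(filled, dict):
        return False
    same_keys = list(filled) == list(structure)
    all_filled = all(str(value).strip() for value in filled.values())
    return same_keys and all_filled
```

A check like this is one way to confirm that the model followed the structure rather than answering free-form; how strictly to enforce it is a design choice.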

This review was generated by AI and checked by a human editor.