QUICK REVIEW

[Paper Review] Complexity-Based Prompting for Multi-Step Reasoning

Yao Fu, Hao Peng|arXiv (Cornell University)|Oct 3, 2022

Topic Modeling|Cited by 73

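One-line summary

This paper introduces complexity-based prompting and complexity-based consistency, which favor more complex reasoning chains when selecting prompt examples and when voting over sampled outputs, and reports new state-of-the-art results on multi-step reasoning benchmarks with GPT-3 and Codex.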

ABSTRACT

We study the task of prompting large-scale language models to perform multi-step reasoning. Existing work shows that when prompted with a chain of thoughts (CoT), sequences of short sentences describing intermediate reasoning steps towards a final answer, large language models can generate new reasoning chains and predict answers for new inputs. A central question is which reasoning examples make the most effective prompts. In this work, we propose complexity-based prompting, a simple and effective example selection scheme for multi-step reasoning. We show that prompts with higher reasoning complexity, i.e., chains with more reasoning steps, achieve substantially better performance on multi-step reasoning tasks over strong baselines. We further extend our complexity-based criteria from prompting (selecting inputs) to decoding (selecting outputs), where we sample multiple reasoning chains from the model, then choose the majority of generated answers from complex reasoning chains (over simple chains). When used to prompt GPT-3 and Codex, our approach substantially improves multi-step reasoning accuracy and achieves new state-of-the-art (SOTA) performance on three math benchmarks (GSM8K, MultiArith, and MathQA) and two BigBenchHard tasks (Date Understanding and Penguins), with an average +5.3 and up to +18 accuracy improvements. Compared with existing example selection schemes like manual tuning or retrieval-based selection, selection based on reasoning complexity is intuitive, easy to implement, and annotation-efficient. Further results demonstrate the robustness of performance gains from complex prompts under format perturbation and distribution shift.

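Motivation and objectives

Motivate the use of prompts with more complex reasoning chains to improve multi-step reasoning.
Propose a simple, annotation-efficient method for selecting complex prompt examples.
Extend the approach from prompting to decoding by voting among complex reasoning chains (Complexity-based Consistency).
Demonstrate robust performance gains across multiple datasets and model types.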

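Proposed method

Define a complex example as one whose chain of thought (CoT) contains more reasoning steps.
Compare complexity-based prompting against handcrafted CoT prompts and random CoT prompts, using GPT-3 and Codex.
Extend the idea to decoding (Complexity-based Consistency) by voting among the top K most complex sampled chains rather than among all chains.
Evaluate on GSM8K, MultiArith, MathQA, Date Understanding, Penguins, and StrategyQA.
Demonstrate robustness to prompt distributions and perturbations, and analyze confounding factors.
Compare with other example selection schemes (random, centroid, retrieval).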
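
The selection step listed above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' code: each annotated example is assumed to be a dict with the hypothetical keys `question`, `chain`, and `answer`, one reasoning step is assumed to be one non-empty line of the chain, and the function names and prompt layout are invented for the example.

```python
from __future__ import annotations


def count_steps(chain: str) -> int:
    """Complexity proxy: number of non-empty lines in a chain (assumed step delimiter)."""
    return sum(1 for line in chain.splitlines() if line.strip())


def select_complex_examples(pool: list[dict], k: int) -> list[dict]:
    """Return the k annotated examples whose chains have the most steps."""
    # sorted() is stable, so examples with equal step counts keep their pool order.
    return sorted(pool, key=lambda ex: count_steps(ex["chain"]), reverse=True)[:k]


def build_prompt(examples: list[dict], question: str) -> str:
    """Concatenate the selected examples and append the new question (hypothetical layout)."""
    blocks = [
        f"Question: {ex['question']}\n{ex['chain']}\nThe answer is {ex['answer']}."
        for ex in examples
    ]
    blocks.append(f"Question: {question}")
    return "\n\n".join(blocks)


# Toy pool with placeholder text, for illustration only.
pool = [
    {"question": "q1", "chain": "step a", "answer": "1"},
    {"question": "q2", "chain": "step a\nstep b\nstep c", "answer": "2"},
    {"question": "q3", "chain": "step a\nstep b", "answer": "3"},
]
selected = select_complex_examples(pool, k=2)
prompt = build_prompt(selected, "q_new")
```

With this toy pool, the two examples with the longest chains (`q2`, then `q3`) are placed in the prompt ahead of the new question. Counting lines is only one way to count steps; any other delimiter would slot into `count_steps` without changing the rest.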

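Experimental results

Research questions

RQ1: Does prompting with more complex reasoning chains improve multi-step reasoning accuracy compared with simpler prompts?
RQ2: At decoding time, does selecting outputs from the most complex chains give better results than voting over all chains?
RQ3: Are the gains from complexity-based prompting robust to distribution shift, prompt perturbation, and different proxies for complexity?
RQ4: How does complexity-based prompting compare with existing example selection methods (random, centroid, retrieval) across datasets?
RQ5: Is the improvement an emergent ability that appears only in very large models?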

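Key findings

Complex prompts yield substantially higher accuracy than handcrafted or random CoT prompts on GPT-3 and Codex.
Voting among the top K most complex chains (Complexity-based Consistency) outperforms voting over all chains or over simple chains.
The approach achieves new state-of-the-art performance on GSM8K, MultiArith, and MathQA and also performs strongly on Date Understanding and Penguins, with average gains of +5.3 on GPT-3 and +6.2 on Codex.
The gains persist across prompt distributions (in-distribution, noisy, and distribution shift) and under perturbations of the prompt format.
Complex prompts are robust and annotation-efficient compared with retrieval-based or full-training-set methods, and simpler indicators such as question length or formula length serve as reliable proxies for complexity.
The benefit of complexity-based prompting is an emergent ability of very large models, where it offers a clear advantage over baseline CoT prompts.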
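
The voting scheme behind Complexity-based Consistency can be sketched as follows. This is an illustration under assumptions rather than the paper's implementation: sampled outputs are assumed to arrive as `(chain, answer)` pairs, a step is assumed to be one non-empty line, and `complexity_based_vote` is a hypothetical name. The sample data is made up to show how the top-K vote can differ from a plain majority vote.

```python
from collections import Counter


def count_steps(chain: str) -> int:
    """Complexity proxy: number of non-empty lines in a chain (assumed step delimiter)."""
    return sum(1 for line in chain.splitlines() if line.strip())


def complexity_based_vote(samples, top_k):
    """Majority vote restricted to the top_k sampled chains with the most steps.

    samples: list of (chain, answer) pairs sampled from the model.
    """
    if not samples or top_k < 1:
        raise ValueError("need at least one sample and top_k >= 1")
    # Rank sampled chains from most to fewest steps, then vote among the top_k only.
    ranked = sorted(samples, key=lambda s: count_steps(s[0]), reverse=True)
    votes = Counter(answer for _, answer in ranked[:top_k])
    return votes.most_common(1)[0][0]


# Made-up samples: short chains mostly say "10", longer chains mostly say "12".
samples = [
    ("a", "10"),
    ("a\nb", "10"),
    ("a", "10"),
    ("a\nb\nc", "12"),
    ("a\nb\nc\nd", "12"),
    ("a\nb\nc", "10"),
]
top3_answer = complexity_based_vote(samples, top_k=3)
all_answer = complexity_based_vote(samples, top_k=len(samples))
```

Setting `top_k` to the number of samples reduces this to an ordinary majority vote over all chains, which makes the two strategies easy to compare on the same samples.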

This review was generated by AI and checked by a human editor.