QUICK REVIEW

[Paper Review] Least-to-Most Prompting Enables Complex Reasoning in Large Language Models

Denny Zhou, Nathanael Schärli|arXiv (Cornell University)|May 21, 2022

Topic Modeling | Citations: 317

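One-line summary

Least-to-most prompting decomposes a hard problem into a sequence of simpler subproblems and solves them in order, allowing LLMs to generalize to harder tasks without any training. It outperforms standard prompting and chain-of-thought prompting on symbolic manipulation, compositional generalization, and math reasoning benchmarks.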

ABSTRACT

Chain-of-thought prompting has demonstrated remarkable performance on various natural language reasoning tasks. However, it tends to perform poorly on tasks which require solving problems harder than the exemplars shown in the prompts. To overcome this challenge of easy-to-hard generalization, we propose a novel prompting strategy, least-to-most prompting. The key idea in this strategy is to break down a complex problem into a series of simpler subproblems and then solve them in sequence. Solving each subproblem is facilitated by the answers to previously solved subproblems. Our experimental results on tasks related to symbolic manipulation, compositional generalization, and math reasoning reveal that least-to-most prompting is capable of generalizing to more difficult problems than those seen in the prompts. A notable finding is that when the GPT-3 code-davinci-002 model is used with least-to-most prompting, it can solve the compositional generalization benchmark SCAN in any split (including length split) with an accuracy of at least 99% using just 14 exemplars, compared to only 16% accuracy with chain-of-thought prompting. This is particularly noteworthy because neural-symbolic models in the literature that specialize in solving SCAN are trained on the entire training set containing over 15,000 examples. We have included prompts for all the tasks in the Appendix.

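Motivation and objectives

Motivate easy-to-hard generalization in large language models and address the limitations of chain-of-thought prompting.
Introduce a two-stage prompting framework that decomposes a problem and then solves its subproblems sequentially.
Show that least-to-most prompting enables generalization to harder problems on symbolic manipulation, SCAN compositional generalization, and math reasoning datasets.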

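Proposed method

Two-stage prompting: (i) a decomposition prompt demonstrates how to split a problem into subproblems; (ii) a subproblem-solving prompt demonstrates how to solve them sequentially using the answers to earlier subproblems.
The prompts are few-shot; no model training or fine-tuning is required.
The prompts can be combined with chain-of-thought or self-consistency decoding, or used on their own.
The prompts are designed to teach the model to build a solution from the outputs of previously solved subproblems (a base case and recursive steps).
Evaluation spans multiple tasks: symbolic manipulation (last-letter concatenation), SCAN for compositional generalization, and math reasoning (GSM8K and DROP).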
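
The two stages described above can be sketched for the last-letter concatenation task. This is a minimal illustration under stated assumptions, not the paper's prompts or code: the `LLM` callable interface, the function names, and the plain `Q:`/`A:` prompt layout are hypothetical, the decomposition is hand-written rather than produced by a decomposition prompt, and `toy_llm` is a deterministic stand-in for a real model so the sketch runs offline.

```python
from typing import Callable, List, Tuple

# Hypothetical interface (an assumption): prompt text in, completion text out.
LLM = Callable[[str], str]


def decompose(words: List[str]) -> List[List[str]]:
    """Stage 1, hand-written stand-in for a decomposition prompt:
    list the subproblems from easiest to hardest as growing prefixes."""
    if len(words) < 2:
        return [words]
    return [words[:k] for k in range(2, len(words) + 1)]


def solve_least_to_most(words: List[str], llm: LLM) -> Tuple[str, List[str]]:
    """Stage 2: solve the subproblems in order, appending each solved
    question/answer pair to the context used for the next subproblem."""
    context: List[str] = []
    answer = ""
    for sub in decompose(words):
        question = 'Q: "' + ", ".join(sub) + '"'
        prompt = "\n".join(context + [question, "A:"])
        answer = llm(prompt).strip()
        context.append(f"{question}\nA: {answer}")
    return answer, context


def toy_llm(prompt: str) -> str:
    """Deterministic stand-in for a model (not a real LLM).
    Recursive step: extend the most recent answer found in the context.
    Base case: no earlier answer, so concatenate every last letter."""
    lines = prompt.split("\n")
    words = lines[-2][len('Q: "'):-1].split(", ")
    previous = [ln[len("A: "):] for ln in lines[:-2] if ln.startswith("A: ")]
    if previous:
        return previous[-1] + words[-1][-1]
    return "".join(w[-1] for w in words if w)


answer, trace = solve_least_to_most(["think", "machine", "learning"], toy_llm)
print(answer)
```

With a real model, `toy_llm` would be replaced by a call that prepends few-shot exemplars to the prompt; the key design point is only that each subproblem's prompt contains the answers to the subproblems solved before it.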

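Experimental results

Research questions

RQ1: Does least-to-most prompting enable LLMs to solve problems harder than those seen in the prompt?
RQ2: Does decomposition into subproblems improve generalization on symbolic, compositional, and math reasoning tasks?
RQ3: How does least-to-most prompting compare with chain-of-thought prompting in these domains?
RQ4: How high an accuracy can training-free prompting alone achieve on the SCAN and GSM8K/DROP benchmarks?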

Key findings

L (list length)	Chain-of-thought (accuracy %)	Least-to-most (accuracy %)
4	84.2	94.0
6	69.2	88.4
8	50.2	83.0
10	39.8	76.4
12	31.8	74.0
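
To make the trend in the table above explicit, the per-length gap between the two methods and each method's drop from the shortest to the longest list can be computed directly. This is a small sketch; the numbers are copied from the table, while the dictionary layout and function names are illustrative assumptions.

```python
# Accuracy (%) copied from the table above:
# list length L -> (chain-of-thought, least-to-most)
accuracy = {
    4: (84.2, 94.0),
    6: (69.2, 88.4),
    8: (50.2, 83.0),
    10: (39.8, 76.4),
    12: (31.8, 74.0),
}


def gaps(table):
    """Least-to-most minus chain-of-thought, in percentage points, per L."""
    return {length: round(l2m - cot, 1) for length, (cot, l2m) in table.items()}


def drop(table, column):
    """Accuracy lost between the shortest and the longest list for one
    method (column 0 = chain-of-thought, column 1 = least-to-most)."""
    return round(table[min(table)][column] - table[max(table)][column], 1)


for length, gap in gaps(accuracy).items():
    print(f"L={length}: gap = {gap} points")
print("chain-of-thought drop:", drop(accuracy, 0))
print("least-to-most drop:", drop(accuracy, 1))
```

The gap grows monotonically with L, and chain-of-thought loses far more accuracy than least-to-most between L = 4 and L = 12.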

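On last-letter concatenation, least-to-most prompting achieves higher accuracy than chain-of-thought prompting, and the gap widens as the list length grows (from 9.8 points at L = 4 to 42.2 points at L = 12).
On SCAN, code-davinci-002 with least-to-most prompting reaches 99.7% accuracy under the length split, far exceeding standard prompting and chain-of-thought prompting.
On GSM8K and DROP, least-to-most prompting outperforms chain-of-thought prompting, with particularly large gains on DROP; on GSM8K, it shows similar gains on problems that require many steps.
Decomposition-based prompts let the model solve problems that are longer or more complex than those demonstrated, without any model training.
Error analysis indicates that most failures stem from misinterpreting subproblems or how steps are combined, rather than from errors in solving the subproblems themselves.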

This review was generated by AI and checked by a human editor.