QUICK REVIEW

[Paper Review] Large Language Models as Analogical Reasoners

Michihiro Yasunaga, Xinyun Chen|arXiv (Cornell University)|Oct 3, 2023

Natural Language Processing Techniques | Citations: 14

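One-Line Summary

Analogical prompting has LLMs self-generate exemplars and knowledge tailored to each problem, guiding reasoning without labeled data and improving performance across math, code, and BIG-Bench tasks.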

ABSTRACT

Chain-of-thought (CoT) prompting for language models demonstrates impressive performance across reasoning tasks, but typically needs labeled exemplars of the reasoning process. In this work, we introduce a new prompting approach, analogical prompting, designed to automatically guide the reasoning process of large language models. Inspired by analogical reasoning, a cognitive process in which humans draw from relevant past experiences to tackle new problems, our approach prompts language models to self-generate relevant exemplars or knowledge in the context, before proceeding to solve the given problem. This method presents several advantages: it obviates the need for labeling or retrieving exemplars, offering generality and convenience; it can also tailor the generated exemplars and knowledge to each problem, offering adaptability. Experimental results show that our approach outperforms 0-shot CoT and manual few-shot CoT in a variety of reasoning tasks, including math problem solving in GSM8K and MATH, code generation in Codeforces, and other reasoning tasks in BIG-Bench.

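Motivation and Objectives

Reduce the reliance of chain-of-thought (CoT) prompting on manually labeled reasoning exemplars.
Propose analogical prompting, in which the model recalls and generates relevant exemplars and knowledge in context.
Show that self-generated exemplars and knowledge improve performance across math, code, and BIG-Bench tasks.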

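Proposed Method

Self-generated exemplars: before solving the target problem, the LLM is prompted to recall and generate several relevant problem-solution exemplars in one pass.
Self-generated knowledge: optionally, the LLM also generates a high-level tutorial to accompany the exemplars, improving generalization on complex tasks.
Explore single-pass prompting that generates knowledge, exemplars, and the solution end to end.
Run experiments with multiple base LLMs (GPT-3.5-turbo, GPT-4, PaLM 2) across GSM8K, MATH, Codeforces, and BIG-Bench.
Compare against 0-shot CoT, 5-shot CoT, and retrieval-based CoT to evaluate the effectiveness of self-generation.
Analyze how the number of exemplars (K) and the placement of knowledge before the exemplars affect performance.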
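
The single-pass idea can be sketched as a small prompt builder. This is a minimal illustration, not the authors' template: the function name `build_analogical_prompt`, its parameters, and the instruction wording are all assumptions made for this example. It only shows the ordering described above (optional knowledge first, then K self-generated exemplars, then the solution).

```python
def build_analogical_prompt(problem: str, num_exemplars: int = 3,
                            with_knowledge: bool = False) -> str:
    """Assemble one prompt that asks the model to produce, in a single pass,
    (optional) knowledge, K exemplars, and finally the solution.

    The wording below is hypothetical; it is not the paper's exact prompt.
    """
    if num_exemplars < 1:
        raise ValueError("num_exemplars must be at least 1")

    sections = ["# Problem:", problem.strip(), "", "# Instructions:"]

    if with_knowledge:
        # Knowledge is requested before the exemplars.
        sections += [
            "## Tutorial:",
            "Identify the core concepts in the problem and write a short, "
            "high-level tutorial on them.",
            "",
        ]

    sections += [
        "## Relevant problems:",
        f"Recall {num_exemplars} relevant and distinct problems. "
        "For each, describe the problem and explain its solution.",
        "",
        "## Solve the initial problem:",
        "Using the material above, solve the initial problem step by step.",
    ]
    return "\n".join(sections)


if __name__ == "__main__":
    print(build_analogical_prompt(
        "What is the area of a square with vertices (-2, 2), (2, -2), "
        "(-2, -6), and (-6, -2)?",
        num_exemplars=3,
        with_knowledge=True,
    ))
```

The returned string would be sent as a single user message to whichever model is under test; no particular client library is assumed here.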

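Experimental Results

Research Questions

RQ1: Can self-generated exemplars replace manually labeled exemplars in CoT prompting across diverse reasoning tasks?
RQ2: Does adding self-generated high-level knowledge alongside the exemplars improve problem solving, especially on complex tasks such as code generation?
RQ3: How does the approach scale with larger model sizes and across different base LLMs?
RQ4: What are the trade-offs between self-generation and exemplar retrieval in terms of reliability and performance?

Key Findings

Self-generated exemplars improve accuracy on GSM8K and MATH beyond 0-shot CoT and standard few-shot CoT.
Combining self-generated knowledge with exemplars yields additional gains on Codeforces tasks, highlighting the benefit of high-level takeaways.
Across BIG-Bench tasks, self-generated exemplars outperform 0-shot CoT and are competitive with manual 3-shot CoT.
The approach scales with larger LLMs and tends to outperform retrieval-based CoT on larger models.
Increasing the number of exemplars to 3-5 generally improves stability and performance.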

This review was generated by AI and checked by a human editor.