QUICK REVIEW

[Paper Review] Large Language Model Instruction Following: A Survey of Progresses and Challenges

Renze Lou, Kai Zhang|arXiv (Cornell University)|Mar 18, 2023

Topic Modeling | Citations: 11

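One-line summary

This survey divides instruction following in NLP into three instruction types (NLI-oriented, LLM-oriented, and human-oriented) and examines modeling strategies, datasets, evaluation, and open challenges for each.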

ABSTRACT

Task semantics can be expressed by a set of input-output examples or a piece of textual instruction. Conventional machine learning approaches for natural language processing (NLP) mainly rely on the availability of large-scale sets of task-specific examples. Two issues arise: first, collecting task-specific labeled examples does not apply to scenarios where tasks may be too complicated or costly to annotate, or the system is required to handle a new task immediately; second, this is not user-friendly since end-users are probably more willing to provide task description rather than a set of examples before using the system. Therefore, the community is paying increasing interest in a new supervision-seeking paradigm for NLP: learning to follow task instructions, i.e., instruction following. Despite its impressive progress, there are some common issues that the community struggles with. This survey paper tries to summarize and provide insights to the current research on instruction following, particularly, by answering the following questions: (i) What is task instruction, and what instruction types exist? (ii) How to model instructions? (iii) What are popular instruction following datasets and evaluation metrics? (iv) What factors influence and explain the instructions' performance? (v) What challenges remain in instruction following? To our knowledge, this is the first comprehensive survey about instruction following.

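Motivation and objectives

Define and classify task instructions, including instruction types that go beyond prompts.
Explain how instructions are encoded and used to generalize to new tasks.
Review instruction-following datasets and evaluation metrics.
Analyze the factors that affect instruction-following performance, along with the practical challenges.
Propose future directions for improving instruction following in LLMs.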

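Proposed approach

Classify instructions into NLI-oriented, LLM-oriented, and human-oriented types, and relate them to indirect supervision.
Examine modeling strategies such as semantic parsing, flatten-and-concatenate, HyperNetworks, and RLHF.
Explain the indirect-supervision perspective and how each instruction type draws on its supervision source.
Summarize instruction-following datasets and the trade-offs between human-annotated and LLM-synthesized data.
Review evaluation methods, including automatic metrics, human evaluation, and LLM-based evaluators.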
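
To make two of the ideas above concrete, here is a minimal sketch in Python. It assumes that "flatten-and-concatenate" means joining the instruction, optional demonstrations, and the task input into one text sequence, and that an NLI-oriented instruction turns each candidate label into a hypothesis paired with the input as the premise. The function names, the "Instruction:/Input:/Output:" field labels, and the hypothesis template are hypothetical and chosen only for illustration; they are not taken from the survey.

```python
from typing import List, Optional, Tuple


def flatten_and_concatenate(
    instruction: str,
    task_input: str,
    demonstrations: Optional[List[Tuple[str, str]]] = None,
) -> str:
    """Join instruction, optional demonstrations, and input into one string.

    The field labels below are an illustrative convention (an assumption),
    not a format prescribed by the survey.
    """
    parts = [f"Instruction: {instruction.strip()}"]
    # Each demonstration is an (input, output) pair shown before the real input.
    for demo_in, demo_out in demonstrations or []:
        parts.append(f"Input: {demo_in.strip()}\nOutput: {demo_out.strip()}")
    # The final block leaves the output empty for the model to complete.
    parts.append(f"Input: {task_input.strip()}\nOutput:")
    return "\n\n".join(parts)


def nli_pairs(
    premise: str,
    labels: List[str],
    template: str = "This text is about {label}.",
) -> List[Tuple[str, str]]:
    """Build one (premise, hypothesis) pair per candidate label.

    An entailment model (not included here) would score each pair, and the
    label whose hypothesis is most strongly entailed would be predicted.
    """
    return [(premise, template.format(label=label)) for label in labels]


if __name__ == "__main__":
    print(flatten_and_concatenate(
        "Classify the sentiment.",
        "Great movie!",
        demonstrations=[("Bad food.", "negative")],
    ))
    print(nli_pairs("The team won the cup.", ["sports", "politics"]))
```

The first function yields a single sequence that any text-to-text model could consume; the second only prepares inputs, since scoring the pairs depends on whichever entailment model is available.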

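Experimental results

Research questions

RQ1: What is a task instruction, and what instruction types exist?
RQ2: How should instructions be encoded and modeled to support generalization?
RQ3: Which datasets and evaluation metrics are used for instruction following?
RQ4: What factors influence instruction-following performance, and what challenges remain?
RQ5: What are the future directions for instruction following with LLMs?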

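Key findings

Instruction following can be organized into three instruction types: NLI-oriented, LLM-oriented, and human-oriented.
NLI-oriented and LLM-oriented instructions rely on indirect supervision from NLI datasets and language modeling, respectively, which enables zero-shot or few-shot generalization.
Human-oriented instructions require more complex modeling but, especially after instruction tuning, offer broader task coverage and better usability for end users.
RLHF (reinforcement learning from human feedback) is used to align models with human preferences and involves reward modeling and a prediction shift penalty.
Instruction-tuning datasets are either human-annotated (high quality but limited diversity) or LLM-synthesized (more diverse but noisier); mixing the two is often beneficial.
Evaluation of instruction following combines automatic metrics, human evaluation, and LLM-based evaluators, each with its own biases and trade-offs.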
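
The RLHF finding above can be illustrated with a small numeric sketch. It assumes a common formulation that the review itself does not spell out: the reward model is trained with a pairwise preference loss, -log(sigmoid(r_chosen - r_rejected)), and the "prediction shift penalty" is read here as a KL-style term that subtracts beta times the log-probability gap between the tuned policy and a frozen reference model. The function names and the default beta are hypothetical.

```python
import math


def pairwise_reward_loss(r_chosen: float, r_rejected: float) -> float:
    """Pairwise preference loss: -log(sigmoid(r_chosen - r_rejected)).

    Small when the reward model scores the human-preferred response higher.
    """
    margin = r_chosen - r_rejected
    # Numerically stable form of log(1 + exp(-margin)).
    if margin >= 0:
        return math.log1p(math.exp(-margin))
    return -margin + math.log1p(math.exp(margin))


def penalized_reward(
    reward: float,
    logp_policy: float,
    logp_reference: float,
    beta: float = 0.1,  # assumed penalty weight, for illustration only
) -> float:
    """Reward minus a penalty for drifting away from the reference model.

    The shift is the log-probability gap between the tuned policy and the
    reference model on the same response (an assumed reading of the
    "prediction shift penalty").
    """
    shift = logp_policy - logp_reference
    return reward - beta * shift


if __name__ == "__main__":
    # Correct ranking gives a lower loss than the reversed ranking.
    print(pairwise_reward_loss(2.0, 0.0), pairwise_reward_loss(0.0, 2.0))
    # No drift from the reference model means no penalty.
    print(penalized_reward(1.0, -1.0, -1.0))
```

In a real training loop these quantities would be computed over batches of token sequences with an autodiff library; the scalar version only shows the direction of each term.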


This review was generated by AI and checked by a human editor.