QUICK REVIEW

[Paper Review] Making Pre-trained Language Models Better Few-shot Learners

Tianyu Gao, Adam Fisch|arXiv (Cornell University)|Dec 31, 2020

Topic Modeling|References: 46|Citations: 128

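One-sentence summary

LM-BFF combines prompt-based fine-tuning, automatic prompt generation, and selective demonstrations to substantially improve few-shot learning with mid-sized language models, achieving up to 30% absolute improvement over standard fine-tuning. In few-shot settings with RoBERTa-large, it shows strong task-agnostic performance on both classification and regression tasks.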

ABSTRACT

The recent GPT-3 model (Brown et al., 2020) achieves remarkable few-shot performance solely by leveraging a natural-language prompt and a few task demonstrations as input context. Inspired by their findings, we study few-shot learning in a more practical scenario, where we use smaller language models for which fine-tuning is computationally efficient. We present LM-BFF--better few-shot fine-tuning of language models--a suite of simple and complementary techniques for fine-tuning language models on a small number of annotated examples. Our approach includes (1) prompt-based fine-tuning together with a novel pipeline for automating prompt generation; and (2) a refined strategy for dynamically and selectively incorporating demonstrations into each context. Finally, we present a systematic evaluation for analyzing few-shot performance on a range of NLP tasks, including classification and regression. Our experiments demonstrate that our methods combine to dramatically outperform standard fine-tuning procedures in this low resource setting, achieving up to 30% absolute improvement, and 11% on average across all tasks. Our approach makes minimal assumptions on task resources and domain expertise, and hence constitutes a strong task-agnostic method for few-shot learning.

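Motivation and objectives

Advance practical few-shot learning with moderately sized LMs (e.g., RoBERTa/BERT) rather than giant models such as GPT-3.
Develop a set of simple, task-agnostic techniques that improve fine-tuning with minimal data.
Evaluate prompt-based fine-tuning and demonstration methods across multiple NLP tasks (classification and regression).
Provide an automated workflow for generating prompts and demonstrations, reducing manual engineering effort.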

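Proposed method

Prompt-based fine-tuning, which casts the downstream task as masked language modeling over label words.
Automatic prompt generation: (i) automatic selection of label words (verbalizers) and (ii) automatic template generation using a T5-based search.
Dynamic, selective demonstrations: one example is sampled per class and combined with the input to form a minimal demonstration set.
Systematic evaluation across 8 single-sentence tasks and 7 sentence-pair tasks, using multiple random splits to assess stability.
Comparison against standard fine-tuning and GPT-3-style in-context learning in the few-shot setting.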
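
As a rough illustration of the first and third items above, the sketch below builds a prompt with one demonstration per class and turns mask-position logits into class probabilities over the label words only. The template, the label words, the example sentences, the logit values, and the function names (`fill`, `build_prompt`, `label_probs`) are all hypothetical and chosen for illustration; they are not taken from this review. Demonstrations are sampled uniformly here, which is a simplification of the selective strategy described above.

```python
import math
import random

# Hypothetical template and label words (verbalizer) for a binary sentiment task.
TEMPLATE = "{sentence} It was {word}."
LABEL_WORDS = {"positive": "great", "negative": "terrible"}
MASK = "<mask>"  # mask token string used by RoBERTa tokenizers


def fill(sentence, word):
    """Render one sentence through the template."""
    return TEMPLATE.format(sentence=sentence, word=word)


def build_prompt(query, train_set, rng):
    """Masked query followed by one sampled demonstration per class."""
    parts = [fill(query, MASK)]
    for label, word in LABEL_WORDS.items():
        candidates = [s for s, y in train_set if y == label]
        # Uniform sampling: an assumption made to keep the sketch short.
        parts.append(fill(rng.choice(candidates), word))
    return " ".join(parts)


def label_probs(mask_logits):
    """Softmax restricted to the label words.

    mask_logits maps a vocabulary word to its logit at the mask position;
    words that are not label words are ignored.
    """
    scores = {y: mask_logits[w] for y, w in LABEL_WORDS.items()}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    total = sum(exps.values())
    return {y: e / total for y, e in exps.items()}


# Made-up training examples and logits, for illustration only.
train_set = [
    ("A moving, beautifully shot film.", "positive"),
    ("Dull and far too long.", "negative"),
]
prompt = build_prompt("No reason to watch.", train_set, random.Random(0))
probs = label_probs({"great": 0.5, "terrible": 2.0, "the": 3.0})
print(prompt)
print(probs)
```

In an actual fine-tuning setup the logits would come from the masked language model head at the mask position, and the loss would be cross-entropy over these restricted probabilities; the dictionary of logits above only stands in for that output.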

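Experimental results

Research questions

RQ1: Can prompt-based fine-tuning, including automatically generated prompts, match or exceed manually designed prompts in the few-shot setting?
RQ2: Does incorporating demonstrations with careful sampling improve performance beyond standard fine-tuning for moderately sized LMs?
RQ3: How do automatic label word selection and template generation contribute to robust few-shot learning across tasks such as classification and regression?
RQ4: How do the demonstration sampling strategy and template quality affect few-shot performance?
RQ5: Is the approach task-agnostic and resource-efficient enough to be practical in real-world use?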

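Key findings

Prompt-based fine-tuning substantially outperforms standard fine-tuning in the few-shot setting.
Automatic prompt generation (templates and label words) matches or outperforms manual prompts on some tasks.
Incorporating demonstrations with a carefully designed sampling strategy yields additional gains in few-shot performance.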
The combined LM-BFF methods achieve up to 30% absolute improvement and 11% average improvement across evaluated tasks.
On RoBERTa-large with 32 training examples, many binary SST-2-like tasks reach around 90% accuracy.

This review was generated by AI and checked by a human editor.