QUICK REVIEW

[Paper Review] PPT: Pre-trained Prompt Tuning for Few-shot Learning

Yuxian Gu, Xu Han|arXiv (Cornell University)|Sep 9, 2021

Topic Modeling | Cited by 100

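One-sentence summary

PPT pre-trains soft prompts on unified self-supervised tasks and uses them to initialize prompt tuning. As a result, PPT outperforms vanilla prompt tuning and often matches or exceeds full-model fine-tuning in both few-shot and full-data settings.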

ABSTRACT

Prompts for pre-trained language models (PLMs) have shown remarkable performance by bridging the gap between pre-training tasks and various downstream tasks. Among these methods, prompt tuning, which freezes PLMs and only tunes soft prompts, provides an efficient and effective solution for adapting large-scale PLMs to downstream tasks. However, prompt tuning is yet to be fully explored. In our pilot experiments, we find that prompt tuning performs comparably with conventional full-model fine-tuning when downstream data are sufficient, whereas it performs much worse under few-shot learning settings, which may hinder the application of prompt tuning in practice. We attribute this low performance to the manner of initializing soft prompts. Therefore, in this work, we propose to pre-train prompts by adding soft prompts into the pre-training stage to obtain a better initialization. We name this Pre-trained Prompt Tuning framework "PPT". To ensure the generalization of PPT, we formulate similar classification tasks into a unified task form and pre-train soft prompts for this unified task. Extensive experiments show that tuning pre-trained prompts for downstream tasks can reach or even outperform full-model fine-tuning under both full-data and few-shot settings. Our approach is effective and efficient for using large-scale PLMs in practice.
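
To make the abstract's description of prompt tuning concrete, here is a minimal toy sketch in plain Python. It is not the authors' implementation: the class `ToyPromptTunedInput`, its methods, and the helper functions are hypothetical names, and random lists of floats stand in for real model weights. It only illustrates that the PLM embeddings stay frozen, that the soft prompt vectors are the sole tunable parameters, and that PPT swaps a random prompt initialization for a pre-trained one.

```python
import random


def random_matrix(rows, cols, seed):
    """Toy stand-in for learned weights: a rows x cols list of small floats."""
    rng = random.Random(seed)
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def soft_prompt_parameter_count(prompt_length, hidden_size):
    """Each soft prompt token is one vector of size hidden_size."""
    return prompt_length * hidden_size


class ToyPromptTunedInput:
    """Hypothetical input layer: frozen token embeddings plus a tunable soft prompt."""

    def __init__(self, vocab_size, hidden_size, prompt_length):
        self.hidden_size = hidden_size
        # Frozen PLM embedding table: never updated during prompt tuning.
        self.frozen_embeddings = random_matrix(vocab_size, hidden_size, seed=0)
        # Soft prompt with a random initialization (the vanilla starting point).
        self.soft_prompt = random_matrix(prompt_length, hidden_size, seed=1)

    def load_pretrained_prompt(self, pretrained_prompt):
        """PPT-style initialization: start from a prompt learned during pre-training."""
        same_shape = len(pretrained_prompt) == len(self.soft_prompt) and all(
            len(row) == self.hidden_size for row in pretrained_prompt
        )
        if not same_shape:
            raise ValueError("pre-trained prompt shape does not match the soft prompt")
        self.soft_prompt = [list(row) for row in pretrained_prompt]

    def embed(self, token_ids):
        """Prepend the soft prompt vectors to the frozen embeddings of the input."""
        token_vectors = [self.frozen_embeddings[i] for i in token_ids]
        return self.soft_prompt + token_vectors

    def num_trainable_parameters(self):
        """Only the soft prompt counts; the embedding table is frozen."""
        return soft_prompt_parameter_count(len(self.soft_prompt), self.hidden_size)
```

With 100 prompt tokens and a hidden size of 4096 (both values are assumptions here, not stated in this review), `soft_prompt_parameter_count(100, 4096)` gives 409,600, which is consistent with a tunable-parameter budget of roughly 0.41M.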

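Research motivation and goals

Motivation: large pre-trained language models need prompt tuning to bridge the gap between pre-training and downstream tasks.
Propose a pre-training strategy for soft prompts that gives a better initialization in few-shot settings.
Unify downstream classification tasks into one general pre-training framework so that prompts generalize across tasks.
Show that PPT can match or even exceed full-model fine-tuning while remaining parameter-efficient.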

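Proposed method

Express downstream tasks as pattern-verbalizer pairs to form prompts.
Pre-train soft prompts on self-supervised tasks aligned with the task formats (sentence-pair, multiple-choice, and single-text classification).
Unify the tasks into a single multiple-choice pre-training format for broad applicability.
Initialize downstream prompt tuning with the pre-trained soft prompts and tune only the 0.41M prompt parameters.
Evaluate PPT and its variants on English and Chinese pre-trained language models at the 11B scale, in both few-shot and full-data settings.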
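
The pattern-verbalizer and unified multiple-choice steps listed above can be sketched as follows. This is an illustrative assumption, not the paper's exact template: the function names, the `Options:` / `Answer:` wording, and the `<mask>` placeholder are all hypothetical. The pattern turns an input text into a prompt with a slot to fill, and the verbalizer maps each class label to the token (here an option letter) the model is expected to produce in that slot.

```python
from string import ascii_uppercase


def build_multiple_choice_prompt(text, labels, mask_token="<mask>"):
    """Render a classification example in a multiple-choice form.

    Returns (prompt, verbalizer), where verbalizer maps each label to the
    option letter that stands for it. The template wording is hypothetical.
    """
    if not 1 <= len(labels) <= len(ascii_uppercase):
        raise ValueError("need between 1 and 26 labels")
    # Verbalizer: label -> option letter, in the order the labels are given.
    verbalizer = {label: ascii_uppercase[i] for i, label in enumerate(labels)}
    options = " ".join(f"{letter}. {label}" for label, letter in verbalizer.items())
    # Pattern: the input text, the candidate options, and a slot for the answer.
    prompt = f"{text} Options: {options} Answer: {mask_token}"
    return prompt, verbalizer


def decode_prediction(predicted_letter, verbalizer):
    """Map the option letter predicted at the mask position back to a label."""
    letter_to_label = {letter: label for label, letter in verbalizer.items()}
    return letter_to_label[predicted_letter]
```

For example, `build_multiple_choice_prompt("The movie was wonderful.", ["positive", "negative"])` yields the prompt `The movie was wonderful. Options: A. positive B. negative Answer: <mask>` together with the verbalizer `{"positive": "A", "negative": "B"}`. Because every task is reduced to choosing an option letter, one set of soft prompts can in principle be reused across tasks with different label sets.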

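Experimental results

Research questions

RQ1: Can pre-trained soft prompts improve prompt tuning of large PLMs in few-shot learning?
RQ2: Does unifying task formats during pre-training improve the cross-task generalization of prompts?
RQ3: How does PPT compare with full-model fine-tuning and vanilla prompt tuning in accuracy and variance, in both few-shot and full-data settings?

Key findings

In both few-shot and full-data settings, PPT generally outperforms vanilla prompt tuning and the LM adaptation baseline.
Hybrid PPT (soft prompts combined with carefully designed hard prompts) typically achieves the best performance on several English and Chinese tasks.
On many datasets PPT exceeds or approaches full-model fine-tuning (FT), indicating that pre-training the prompts can bridge the gap between pre-training and downstream tasks.
Unified PPT (task formats unified as multiple choice) achieves competitive results, especially on tasks with more than five labels.
PPT reduces the fluctuation of few-shot results and performs more stably across random seeds.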
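
Stability across random seeds, as in the last finding, is usually quantified by the mean and spread of a metric over repeated runs. The sketch below shows one common way to compute this with Python's standard library. The function names are hypothetical, the sample standard deviation is one choice among several, and no numbers from the paper are reproduced here.

```python
import statistics


def summarize_runs(scores):
    """Return (mean, sample standard deviation) of one metric across seeds."""
    if len(scores) < 2:
        raise ValueError("need at least two runs to estimate the spread")
    return statistics.mean(scores), statistics.stdev(scores)


def is_more_stable(scores_a, scores_b):
    """True if method A varies less across seeds than method B."""
    return summarize_runs(scores_a)[1] < summarize_runs(scores_b)[1]
```

Each method would be run with the same set of seeds, and a lower standard deviation at comparable mean accuracy indicates the more stable method.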


This review was generated by AI and checked by human editors.