QUICK REVIEW

[Paper Review] AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts

Taylor Shin, Yasaman Razeghi|arXiv (Cornell University)|Oct 29, 2020

Topic Modeling|References: 26|Cited by: 66

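One-Sentence Summary

AutoPrompt automatically generates prompts that elicit knowledge from pretrained masked language models, allowing zero-shot and few-shot use of these models to compete with finetuned models and outperforming manual prompts on several knowledge tasks.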

ABSTRACT

The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge, however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AutoPrompt, an automated method to create prompts for a diverse set of tasks, based on a gradient-guided search. Using AutoPrompt, we show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning, sometimes achieving performance on par with recent state-of-the-art supervised models. We also show that our prompts elicit more accurate factual knowledge from MLMs than the manually created prompts on the LAMA benchmark, and that MLMs can be used as relation extractors more effectively than supervised relation extraction models. These results demonstrate that automatically generated prompts are a viable parameter-free alternative to existing probing methods, and as pretrained LMs become more sophisticated and capable, potentially a replacement for finetuning.

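Research Motivation and Objectives

Investigate what knowledge pretrained language models acquire during pretraining (linguistic knowledge, facts, common sense, and task-specific knowledge).
Develop an automated method that generates prompts for a wide range of tasks without manual prompt design.
Show that gradient-guided prompts achieve strong performance on sentiment analysis and natural language inference without finetuning.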

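Proposed Method

Formulate each task as a fill-in-the-blank problem using a template that combines the task input, trigger tokens, and a [MASK] token.
Learn the trigger tokens with a gradient-guided search that maximizes the label likelihood over batches of examples (Eq. 2 in the paper).
When each label corresponds to a set of vocabulary tokens, marginalize over those label tokens to obtain class probabilities (Eq. 1).
Automate label-token selection by training a logistic regression classifier on the [MASK] token's embeddings, then scoring candidate label tokens by how compatible their output embeddings are with that classifier (Eqs. 3–5).
Evaluate the prompts on pretrained MLMs (BERT base, RoBERTa large) across sentiment analysis, NLI, fact retrieval, and relation extraction, without finetuning, and compare against manual prompts and finetuned baselines.
Provide a publicly available implementation for generating prompts for HuggingFace models.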
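
The core steps of the method can be sketched in a few lines of dependency-free Python. This is an illustrative toy under stated assumptions, not the authors' implementation: the function names, the dictionary-based "vocabulary", and the hand-written logits, gradients, and embeddings are all hypothetical stand-ins for quantities a real MLM would produce. The sketch shows template filling, marginalizing the [MASK] distribution over label tokens, first-order ranking of candidate trigger tokens, and scoring label tokens with an already-trained linear classifier.

```python
import math


def dot(u, v):
    """Dot product of two equal-length sequences of floats."""
    return sum(a * b for a, b in zip(u, v))


def softmax(logits):
    """Numerically stable softmax over a dict of {key: logit}."""
    m = max(logits.values())
    exps = {k: math.exp(v - m) for k, v in logits.items()}
    z = sum(exps.values())
    return {k: e / z for k, e in exps.items()}


def build_prompt(text, trigger_tokens, mask_token="[MASK]"):
    """Fill an assumed '<input> <triggers...> [MASK]' template."""
    return " ".join([text, *trigger_tokens, mask_token])


def class_probabilities(mask_logits, label_tokens):
    """Sum the [MASK] token probabilities over each class's label tokens."""
    p = softmax(mask_logits)
    return {y: sum(p[w] for w in words) for y, words in label_tokens.items()}


def top_k_candidates(grad, input_embeddings, k):
    """Rank tokens by <input embedding, gradient of the label log-likelihood>.

    `grad` is assumed to be the gradient with respect to the embedding of the
    trigger position being replaced; here it is supplied by hand.
    """
    scores = {w: dot(e, grad) for w, e in input_embeddings.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]


def select_label_tokens(class_weights, class_bias, output_embeddings, k):
    """Score each word's output embedding with a linear classifier that is
    assumed to have been trained already, then keep the top-k words per class."""
    per_word = {
        w: softmax({y: dot(class_weights[y], e) + class_bias[y]
                    for y in class_weights})
        for w, e in output_embeddings.items()
    }
    return {
        y: sorted(per_word, key=lambda w: per_word[w][y], reverse=True)[:k]
        for y in class_weights
    }


if __name__ == "__main__":
    print(build_prompt("a real joy.", ["atmosphere", "alot", "dialogue"]))
    print(class_probabilities(
        {"great": 2.0, "good": 1.0, "bad": 0.0, "awful": -1.0},
        {"positive": ["great", "good"], "negative": ["bad", "awful"]},
    ))
```

In a real setting the logits and gradients would come from a forward and backward pass through the MLM, and the ranked candidates would only be a shortlist: each one would still need to be re-scored with an actual forward pass before a trigger token is swapped in, since the dot-product ranking is only a first-order approximation.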

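Experimental Results

Research Questions

RQ1: Can automatically generated prompts reveal task knowledge in pretrained MLMs without finetuning?
RQ2: Do gradient-guided prompts outperform manually written prompts on sentiment analysis, NLI, and knowledge retrieval tasks?
RQ3: How do AutoPrompt prompts compare with finetuning in low-data settings?
RQ4: To what extent can MLMs prompted by AutoPrompt extract factual and relational knowledge from text?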

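Key Findings

AutoPrompt enables MLMs to perform sentiment analysis and NLI without finetuning, sometimes matching state-of-the-art supervised models.
Prompts found by AutoPrompt elicit more accurate factual knowledge on LAMA than manually created prompts (the paper reports improved P@1).
MLMs prompted by AutoPrompt can outperform supervised relation extraction models under some conditions, and they are sensitive to whether the context is factually accurate.
In low-data settings, AutoPrompt can outperform finetuning on NLI and in some cases gives RoBERTa higher average accuracy and stability, while on sentiment analysis it sometimes lags behind finetuning.
AutoPrompt reduces the need for task-specific finetuning and for storing a separate checkpoint per task, so a single pretrained model can handle many tasks through prompting.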


This review was generated by AI and checked by a human editor.