[Paper Review] AutoPrompt: Eliciting Knowledge from Language Models with Automatically Generated Prompts
AutoPrompt automatically generates prompts to elicit knowledge from pretrained masked language models, allowing them to perform competitively with finetuned models on some tasks without any finetuning and to outperform manual prompts on several knowledge tasks.
The remarkable success of pretrained language models has motivated the study of what kinds of knowledge these models learn during pretraining. Reformulating tasks as fill-in-the-blanks problems (e.g., cloze tests) is a natural approach for gauging such knowledge; however, its usage is limited by the manual effort and guesswork required to write suitable prompts. To address this, we develop AutoPrompt, an automated method to create prompts for a diverse set of tasks, based on a gradient-guided search. Using AutoPrompt, we show that masked language models (MLMs) have an inherent capability to perform sentiment analysis and natural language inference without additional parameters or finetuning, sometimes achieving performance on par with recent state-of-the-art supervised models. We also show that our prompts elicit more accurate factual knowledge from MLMs than the manually created prompts on the LAMA benchmark, and that MLMs can be used as relation extractors more effectively than supervised relation extraction models. These results demonstrate that automatically generated prompts are a viable parameter-free alternative to existing probing methods, and as pretrained LMs become more sophisticated and capable, potentially a replacement for finetuning.
Motivation & Objective
- Investigate what knowledge pretrained language models acquire during pretraining (linguistic, factual, commonsense, and task-specific).
- Develop an automated method to generate prompts for a wide range of tasks without manual prompt crafting.
- Show that gradient-guided prompts can reveal strong performance on sentiment analysis and natural language inference without finetuning.
Proposed method
- Represent tasks as fill-in-the-blank problems using a template that combines the original task input, a set of trigger tokens, and a [MASK] token whose prediction gives the label.
- Use a gradient-guided search to learn trigger tokens that maximize the label likelihood across batches (Equation 2 in the paper).
- Marginalize over label tokens to obtain class probabilities when labels correspond to vocabulary tokens (Equation 1).
- Automate label-token selection by training a logistic classifier on the [MASK] embedding and scoring candidate label tokens by their compatibility with the output embeddings (Equations 3–5).
- Evaluate prompts on pretrained MLMs (BERT base, RoBERTa large) across tasks (sentiment analysis, NLI, fact retrieval, relation extraction) without finetuning; compare against manual prompts and finetuned baselines.
- Provide a publicly available implementation that generates prompts for HuggingFace models.
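The steps listed above can be illustrated with a small, dependency-free sketch. It is not the authors' implementation: the function names, the toy vocabulary, the embeddings, the gradient, and the `log_likelihood` callback are all hypothetical stand-ins for quantities that a real MLM would supply. The sketch shows template construction, the marginalization over label tokens (in the spirit of Equation 1), the gradient-based candidate ranking and swap step (in the spirit of Equation 2), and label-token scoring with a logistic classifier applied to output embeddings (in the spirit of Equations 3–5).

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of floats."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def build_prompt(task_input, trigger_tokens, mask_token="[MASK]"):
    """Template: original input, then trigger tokens, then the mask token."""
    return " ".join([task_input, *trigger_tokens, mask_token])

def class_probabilities(mask_logits, label_token_ids):
    """Sum the [MASK] token probabilities over each class's label tokens."""
    probs = softmax(mask_logits)
    return {label: sum(probs[i] for i in ids)
            for label, ids in label_token_ids.items()}

def top_k_candidates(input_embeddings, grad, k):
    """Rank vocabulary ids by (input embedding . gradient of the label
    log-likelihood w.r.t. one trigger embedding) and keep the top k."""
    scores = [dot(row, grad) for row in input_embeddings]
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]

def update_trigger(triggers, position, candidates, log_likelihood):
    """Try each candidate at one trigger position and keep the best prompt.
    `log_likelihood` is a hypothetical callback that scores a trigger list."""
    best, best_score = list(triggers), log_likelihood(triggers)
    for token_id in candidates:
        trial = list(triggers)
        trial[position] = token_id
        score = log_likelihood(trial)
        if score > best_score:
            best, best_score = trial, score
    return best

def select_label_tokens(output_embeddings, class_weights, class_biases, k):
    """Score every vocabulary token with an already-trained logistic
    classifier applied to its output embedding; keep top-k ids per class."""
    labels = list(class_weights)
    per_token = [
        softmax([dot(w, class_weights[c]) + class_biases[c] for c in labels])
        for w in output_embeddings
    ]
    selected = {}
    for j, c in enumerate(labels):
        ranked = sorted(range(len(per_token)),
                        key=lambda i: per_token[i][j], reverse=True)
        selected[c] = ranked[:k]
    return selected

# Toy data (made up for illustration; a real run would use an MLM's values).
vocab = ["good", "bad", "great", "awful", "the"]
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5], [-1.0, 0.0], [2.0, -1.0]]

prompt = build_prompt("a real joy .", ["[T]", "[T]", "[T]"])
probs = class_probabilities([2.0, 0.5, 1.5, 0.0, 1.0],
                            {"positive": [0, 2], "negative": [1, 3]})
candidates = top_k_candidates(embeddings, grad=[1.0, 0.5], k=2)
new_triggers = update_trigger([1, 1], 0, candidates,
                              lambda t: -sum(abs(x - 4) for x in t))
label_tokens = select_label_tokens(
    embeddings,
    class_weights={"positive": [1.0, 0.0], "negative": [-1.0, 0.0]},
    class_biases={"positive": 0.0, "negative": 0.0},
    k=2,
)
```

In this sketch the class scores are plain sums of token probabilities, so they need not add up to one when the label tokens cover only part of the vocabulary. The search loop in a real setting would repeat `top_k_candidates` and `update_trigger` over many batches and trigger positions, with the gradient and likelihood computed by the language model.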
Experimental results
Research questions
- RQ1: Can automatically generated prompts reveal task knowledge in pretrained MLMs without finetuning?
- RQ2: Do gradient-guided prompts outperform manually crafted prompts across sentiment analysis, NLI, and knowledge retrieval tasks?
- RQ3: How do AutoPrompt prompts compare to finetuning in low-data regimes?
- RQ4: To what extent can MLMs, prompted by AutoPrompt, extract factual and relational knowledge from text?
Key findings
- AutoPrompt enables MLMs to perform sentiment analysis and NLI without finetuning, sometimes matching state-of-the-art supervised models.
- Prompts discovered by AutoPrompt elicit more accurate factual knowledge on LAMA than manually created prompts (P@1 improvements noted in the text).
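- MLMs prompted by AutoPrompt can outperform supervised relation extraction models under certain conditions, but their accuracy depends on the context sentence being authentic: it drops when the context is altered so that it no longer states the true fact.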
- In low-data settings, AutoPrompt can outperform finetuning on NLI and, in some cases, provide higher average accuracy and stability for RoBERTa, though it sometimes lags behind finetuning on sentiment analysis.
- AutoPrompt reduces the need for task-specific finetuning and storage of multiple task-specific checkpoints, enabling a single pretrained model to handle many tasks via prompts.
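For readers unfamiliar with the P@1 metric mentioned in the LAMA finding above, here is a minimal sketch of how precision at 1 is typically computed: the fraction of queries whose top-ranked prediction equals the gold object. The function name and the example predictions are hypothetical and are not results from the paper.

```python
def precision_at_1(ranked_predictions, gold_objects):
    """Fraction of queries whose first-ranked prediction matches the gold answer."""
    if not gold_objects:
        raise ValueError("need at least one query")
    if len(ranked_predictions) != len(gold_objects):
        raise ValueError("one ranked list is required per gold answer")
    hits = sum(
        1
        for preds, gold in zip(ranked_predictions, gold_objects)
        if preds and preds[0] == gold
    )
    return hits / len(gold_objects)

# Made-up example: three cloze queries, each with a ranked list of fillers.
predictions = [["Paris", "Lyon"], ["Munich", "Berlin"], ["Rome"]]
gold = ["Paris", "Berlin", "Rome"]
p_at_1 = precision_at_1(predictions, gold)
```

Only the first entry of each ranked list counts, so the second query is a miss even though the gold answer appears at rank 2.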
This review was created by AI and reviewed by human editors.