QUICK REVIEW

[Paper Review] Selective Annotation Makes Language Models Better Few-Shot Learners

Hongjin Su, Jungo Kasai | arXiv (Cornell University) | Sep 5, 2022

Topic Modeling | Cited by 62

One-Line Summary

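This paper proposes a two-step framework, selective annotation followed by prompt retrieval, that uses the graph-based vote-k method to select a small, diverse set of unlabeled samples to annotate before test time. The approach improves in-context learning performance across 10 datasets, substantially reduces annotation cost, and is competitive with finetuning.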

ABSTRACT

Many recent approaches to natural language tasks are built on the remarkable abilities of large language models. Large language models can perform in-context learning, where they learn a new task from a few task demonstrations, without any parameter updates. This work examines the implications of in-context learning for the creation of datasets for new natural language tasks. Departing from recent in-context learning methods, we formulate an annotation-efficient, two-step framework: selective annotation that chooses a pool of examples to annotate from unlabeled data in advance, followed by prompt retrieval that retrieves task examples from the annotated pool at test time. Based on this framework, we propose an unsupervised, graph-based selective annotation method, vote-k, to select diverse, representative examples to annotate. Extensive experiments on 10 datasets (covering classification, commonsense reasoning, dialogue, and text/code generation) demonstrate that our selective annotation method improves the task performance by a large margin. On average, vote-k achieves a 12.9%/11.4% relative gain under an annotation budget of 18/100, as compared to randomly selecting examples to annotate. Compared to state-of-the-art supervised finetuning approaches, it yields similar performance with 10-100x less annotation cost across 10 tasks. We further analyze the effectiveness of our framework in various scenarios: language models with varying sizes, alternative selective annotation methods, and cases where there is a test data domain shift. We hope that our studies will serve as a basis for data annotations as large language models are increasingly applied to new tasks. Our code is available at https://github.com/HKUNLP/icl-selective-annotation.

Research Motivation and Objectives

Reduce manual annotation cost for new NLP tasks while preserving high in-context learning performance.
Investigate how to select a small, diverse, representative annotated pool prior to test time.
Evaluate the impact of selective annotation and prompt retrieval across diverse tasks and model sizes.
Analyze robustness to domain shift and compare with finetuning under limited annotations.

Proposed Method

Propose a two-step framework: selective annotation, which chooses a small pool of examples from unlabeled data to annotate in advance, followed by prompt retrieval from the annotated pool at test time.
Introduce vote-k, an unsupervised graph-based selective annotation method that promotes diversity and representativeness by constructing a k-NN graph in Sentence-BERT space and iteratively selecting examples to annotate, using a score that decays for candidates near already-selected examples.
Build in-context prompts by retrieving the annotated examples most similar to each test instance, using cosine similarity on Sentence-BERT embeddings.
Evaluate across 10 datasets spanning classification, commonsense reasoning, dialogue, and text/code generation with models from 2B to 175B parameters.
Compare against random annotation, other selective methods, and finetuning to assess annotation efficiency and robustness.
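
The selection and retrieval steps listed above can be sketched in a few lines of NumPy. This is a simplified illustration, not the authors' implementation: the function names, the default values of `k` and `rho`, and the exact discount form `rho ** (-hits)` are assumptions made for this example, and the sketch omits any further stages the full method may use. The embeddings are assumed to come from a sentence encoder such as Sentence-BERT and are passed in as plain arrays.

```python
import numpy as np


def cosine_similarity_matrix(a, b):
    """Pairwise cosine similarity between the rows of a and the rows of b."""
    a = a / np.linalg.norm(a, axis=1, keepdims=True)
    b = b / np.linalg.norm(b, axis=1, keepdims=True)
    return a @ b.T


def vote_k_select(embeddings, budget, k=3, rho=10.0):
    """Greedy graph-based selection of examples to annotate (simplified sketch).

    Assumptions for illustration: each unselected point v "votes" for its k
    nearest neighbors, and its vote is discounted by rho ** (-hits), where
    hits is the number of v's neighbors that were already selected.
    """
    n = len(embeddings)
    sim = cosine_similarity_matrix(embeddings, embeddings)
    np.fill_diagonal(sim, -np.inf)  # a point is never its own neighbor

    # adjacency[v, u] = 1.0 if u is one of the k nearest neighbors of v
    neighbors = np.argsort(-sim, axis=1)[:, :k]
    adjacency = np.zeros((n, n))
    adjacency[np.arange(n)[:, None], neighbors] = 1.0

    selected = []
    remaining = np.ones(n, dtype=bool)
    for _ in range(min(budget, n)):
        # hits[v]: how many neighbors of v have already been selected
        hits = adjacency[:, selected].sum(axis=1) if selected else np.zeros(n)
        # only unselected points vote; votes shrink as hits grows
        weight = np.where(remaining, rho ** (-hits), 0.0)
        # scores[u]: total discounted vote from points that list u as a neighbor
        scores = weight @ adjacency
        scores[~remaining] = -np.inf  # never pick the same point twice
        best = int(np.argmax(scores))
        selected.append(best)
        remaining[best] = False
    return selected


def retrieve_prompt_examples(test_embedding, pool_embeddings, num_shots):
    """Indices of the annotated pool most similar to one test instance."""
    sims = cosine_similarity_matrix(test_embedding[None, :], pool_embeddings)[0]
    return np.argsort(-sims)[:num_shots].tolist()
```

In this sketch, `vote_k_select` would be run once over the unlabeled embeddings before annotation, and `retrieve_prompt_examples` would be called per test instance against the embeddings of the annotated pool. The discount is what pushes the selection toward diversity: once a point is chosen, its neighbors contribute weaker votes, so the next pick tends to come from a different region of the embedding space.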

Experimental Results

Research Questions

RQ1: Can selective annotation reduce the annotation cost required for effective in-context learning across diverse NLP tasks?
RQ2: How does the vote-k method balance diversity and representativeness to improve prompt retrieval performance?
RQ3: Does selective annotation with prompt retrieval remain effective across language model sizes and under domain shifts?
RQ4: How does in-context learning with selective annotation compare to supervised finetuning under limited annotation budgets?
RQ5: What is the impact of using similarity-based versus random prompt retrieval on performance?

Key Findings

Vote-k selective annotation yields large performance gains over random annotation across 10 tasks, with an average relative gain of 12.9% at an annotation budget of 18 and 11.4% at a budget of 100.
With vote-k, 18 annotated examples can match or exceed the performance of 100 randomly selected annotations on several tasks; overall, vote-k shows robust gains across model sizes (2B–175B).
Vote-k combined with similarity-based prompt retrieval yields performance similar to state-of-the-art supervised finetuning while using 10–100× less annotation cost across 10 tasks.
Selective annotation reduces variance and improves the stability of in-context learning, particularly with respect to randomness in the unlabeled data and under domain shift.
When prompts are retrieved at random, vote-k provides little benefit, which highlights the importance of similarity-based retrieval for making use of the annotated data.
Compared to standard finetuning, in-context learning with vote-k often requires far fewer labeled examples to achieve comparable performance.
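
The gains above are reported as relative rather than absolute improvements. As a small sketch, assuming the usual definition of relative gain over a baseline (the review does not state how the paper aggregates it across tasks), the calculation looks like this. The scores in the usage line are hypothetical values chosen for illustration and are not results from the paper.

```python
def relative_gain(method_score, baseline_score):
    """Relative improvement of a method over a baseline, in percent."""
    return (method_score - baseline_score) / baseline_score * 100.0


# Hypothetical scores, for illustration only (not taken from the paper).
gain = relative_gain(method_score=56.45, baseline_score=50.0)
print(f"relative gain: {gain:.1f}%")
```

Under this definition, a relative gain of about 12.9% on a baseline score of 50 corresponds to an absolute improvement of roughly 6.5 points, so the same relative figure implies a smaller absolute change on tasks where the baseline is low.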

This review was generated by AI and checked by a human editor.