QUICK REVIEW

[Paper Review] Promptagator: Few-shot Dense Retrieval From 8 Examples

Zhuyun Dai, Vincent Y. Zhao|arXiv (Cornell University)|Sep 23, 2022

Topic Modeling | Citations: 46

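One-sentence summary

Promptagator uses a few-shot, prompt-based LLM query generator to create synthetic task-specific data, trains small end-to-end dual-encoder retrievers on it that outperform MS MARCO-trained models on BEIR tasks, and gains a further boost from reranking.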

ABSTRACT

Much recent research on information retrieval has focused on how to transfer from one task (typically with abundant supervised data) to various other tasks where supervision is limited, with the implicit assumption that it is possible to generalize from one task to all the rest. However, this overlooks the fact that there are many diverse and unique retrieval tasks, each targeting different search intents, queries, and search domains. In this paper, we suggest to work on Few-shot Dense Retrieval, a setting where each task comes with a short description and a few examples. To amplify the power of a few examples, we propose Prompt-base Query Generation for Retriever (Promptagator), which leverages large language models (LLM) as a few-shot query generator, and creates task-specific retrievers based on the generated data. Powered by LLM's generalization ability, Promptagator makes it possible to create task-specific end-to-end retrievers solely based on a few examples without using Natural Questions or MS MARCO to train question generators or dual encoders. Surprisingly, LLM prompting with no more than 8 examples allows dual encoders to outperform heavily engineered models trained on MS MARCO like ColBERT v2 by more than 1.2 nDCG on average on 11 retrieval sets. Further training standard-size re-rankers using the same generated data yields another 5.0 point nDCG improvement. Our studies determine that query generation can be far more effective than previously observed, especially when a small amount of task-specific knowledge is given.

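Motivation and objectives

Highlight the diversity of retrieval tasks and argue for a task-specific few-shot retrieval setting.
Propose Promptagator, which uses LLM prompting to generate synthetic task-specific training data without fine-tuning.
Show that small dual encoders trained on the generated data outperform MS MARCO-trained models on BEIR tasks.
Show that a subsequent reranker trained on the same data yields an additional performance gain.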

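Proposed method

Define a few-shot retrieval setting on BEIR that uses 2 to 8 in-domain examples per task.
Generate synthetic queries with a large language model (FLAN 137B), conditioned on a task description and a few examples.
Apply a round-trip consistency filter: train a preliminary retriever on the synthetic data and keep only the pairs whose query ranks its source document highly.
Train a dual-encoder retriever, initialized from a T5-based encoder, on the synthetic data, then fine-tune it on the filtered data.
Train a cross-attention reranker (Promptagator++) on the same synthetic data to refine the top candidates.
Provide zero-shot and few-shot variants of Promptagator and compare them against MS MARCO-trained baselines.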
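The prompt-building and round-trip filtering steps above can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the "Passage:/Query:" template, the function names, the token-overlap scorer standing in for the preliminary retriever, and the `top_k` cutoff are all assumptions made for this example.

```python
from collections import Counter


def build_few_shot_prompt(examples, new_document):
    """Format a few (document, query) pairs followed by the new document.

    The "Passage:/Query:" wording is a hypothetical template; a real task
    would use its own task-specific description.
    """
    if not 1 <= len(examples) <= 8:
        raise ValueError("expected between 1 and 8 few-shot examples")
    parts = [f"Passage: {doc}\nQuery: {query}" for doc, query in examples]
    parts.append(f"Passage: {new_document}\nQuery:")
    return "\n\n".join(parts)


def overlap_score(query, document):
    """Toy stand-in for the preliminary retriever: count shared tokens."""
    q_tokens = Counter(query.lower().split())
    d_tokens = Counter(document.lower().split())
    return sum(min(count, d_tokens[tok]) for tok, count in q_tokens.items())


def round_trip_filter(pairs, corpus, score=overlap_score, top_k=1):
    """Keep (query, doc_id) pairs whose source document ranks in the top_k.

    `corpus` maps doc_id -> text; `top_k` is an assumed cutoff.
    """
    kept = []
    for query, doc_id in pairs:
        ranked = sorted(corpus, key=lambda d: score(query, corpus[d]), reverse=True)
        if doc_id in ranked[:top_k]:
            kept.append((query, doc_id))
    return kept


# Made-up data: the third query was "generated" from d1 but matches d2.
corpus = {
    "d1": "aspirin reduces fever and mild pain",
    "d2": "the eiffel tower is located in paris",
}
pairs = [
    ("what reduces fever", "d1"),
    ("where is the eiffel tower", "d2"),
    ("what is located in paris", "d1"),
]
filtered = round_trip_filter(pairs, corpus)
print(filtered)
```

In this toy run the third pair is dropped because its query retrieves d2 rather than its source document d1. In a real pipeline the scorer would be the retriever trained on the unfiltered synthetic pairs, and the prompt would be sent to the LLM to produce the query.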

Experimental results

Research questions

RQ1: Can a few-shot prompt-based data generation paradigm enable effective end-to-end dense retrievers without in-domain annotated query-document pairs?
RQ2: How does prompt-based synthetic data quality, aided by consistency filtering, affect retrieval performance across diverse BEIR tasks?
RQ3: What is the comparative impact of few-shot versus zero-shot prompt-generated data on retrieval and reranking performance?
RQ4: How does Promptagator compare to MS MARCO-trained baselines and to specialized rerankers on BEIR?

Key findings

Zero-shot Promptagator establishes a strong baseline that rivals baselines trained on MS MARCO data.
Few-shot Promptagator significantly improves over zero-shot, increasing average nDCG@10 by over 2 points on BEIR datasets.
Promptagator outperforms strong MS MARCO-trained models such as ColBERT v2 and SPLADE v2 on 11 BEIR tasks.
Promptagator++ (a cross-attention reranker) adds about 5 points to nDCG@10 beyond Promptagator, surpassing several reranking approaches.
Consistency filtering improves performance on the majority of datasets and demonstrates the value of synthetic data quality control.
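For readers unfamiliar with the metric behind these numbers, nDCG@10 can be computed roughly as below. This is a minimal sketch that assumes linear gain and a log2 rank discount; the function names are illustrative, and official BEIR scores come from the benchmark's own evaluation tooling rather than from code like this. The "points" quoted above presumably refer to the score on a 0 to 100 scale.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k results (linear gain)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(ranked_relevances, k=10, all_relevances=None):
    """nDCG@k for one query.

    ranked_relevances: relevance labels of the results, in ranked order.
    all_relevances: labels of every judged document for the query, used to
    build the ideal ranking; defaults to the labels of the returned results.
    """
    pool = ranked_relevances if all_relevances is None else all_relevances
    ideal_dcg = dcg_at_k(sorted(pool, reverse=True), k)
    if ideal_dcg == 0:
        return 0.0
    return dcg_at_k(ranked_relevances, k) / ideal_dcg


# A relevant document at rank 1 scores higher than the same document at rank 2.
print(ndcg_at_k([1, 0, 0]), ndcg_at_k([0, 1, 0]))
```

Benchmark-level figures are averages of this per-query score over all test queries in a dataset, and then over datasets.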

This review was generated by AI and checked by a human editor.