QUICK REVIEW

[Paper Review] In-Context Retrieval-Augmented Language Models

Ori Ram, Yoav Levine|arXiv (Cornell University)|Jan 31, 2023

Topic Modeling | Citations: 12

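One-Line Summary

This paper shows that prepending retrieved documents to the input (In-Context RALM) substantially improves the performance of large language models without modifying the LM, even with an off-the-shelf retriever, in particular BM25, and that LM-oriented reranking of the retrieved documents yields further gains.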

ABSTRACT

Retrieval-Augmented Language Modeling (RALM) methods, which condition a language model (LM) on relevant documents from a grounding corpus during generation, were shown to significantly improve language modeling performance. In addition, they can mitigate the problem of factually inaccurate text generation and provide natural source attribution mechanism. Existing RALM approaches focus on modifying the LM architecture in order to facilitate the incorporation of external information, significantly complicating deployment. This paper considers a simple alternative, which we dub In-Context RALM: leaving the LM architecture unchanged and prepending grounding documents to the input, without any further training of the LM. We show that In-Context RALM that builds on off-the-shelf general purpose retrievers provides surprisingly large LM gains across model sizes and diverse corpora. We also demonstrate that the document retrieval and ranking mechanism can be specialized to the RALM setting to further boost performance. We conclude that In-Context RALM has considerable potential to increase the prevalence of LM grounding, particularly in settings where a pretrained LM must be used without modification or even via API access.

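Motivation and Objectives

Motivate grounding large language models (LMs) without changing their architecture or training.
Evaluate a simple in-context retrieval-augmented framework across diverse corpora and model sizes.
Examine retrieval strategies and document reranking to maximize LM performance.
Demonstrate applicability to open-domain question answering and discuss the deployment advantages.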

Proposed Method

Proposes In-Context RALM: retrieved documents are prepended to the LM input without changing the LM weights.
A retrieval stride s controls how often retrieval occurs during generation.
A retrieval query length ell restricts the retriever query to a portion of the prefix (its last ell tokens).
Evaluates several open-source LMs (GPT-2, GPT-Neo/J, OPT, LLaMA) on five corpora (WikiText-103, RealNews, ArXiv, Stack Exchange, FreeLaw).
Compares a sparse retriever (BM25) with dense retrievers, showing that BM25 often outperforms neural retrievers in the zero-shot setting.
Introduces two LM-oriented reranking methods for choosing among the top-k documents: (a) zero-shot reranking using an LM, and (b) predictive reranking trained on in-domain data.
Evaluates open-domain QA performance on Natural Questions and TriviaQA.
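
The roles of the retrieval stride s and the query length ell can be illustrated with a minimal sketch. It only builds the inputs a frozen LM would see; it does not call an LM. The names `ralm_inputs` and `overlap_retriever` are hypothetical, and the toy retriever ranks by simple token overlap as a stand-in for BM25, not an implementation of it.

```python
from typing import Callable, Iterator, List, Tuple

Tokens = List[str]


def overlap_retriever(corpus: List[Tokens]) -> Callable[[Tokens], Tokens]:
    """Toy lexical retriever (NOT BM25): returns the document that
    shares the most distinct tokens with the query."""
    def retrieve(query: Tokens) -> Tokens:
        q = set(query)
        return max(corpus, key=lambda doc: len(q & set(doc)))
    return retrieve


def ralm_inputs(
    tokens: Tokens,
    retrieve: Callable[[Tokens], Tokens],
    stride: int = 4,      # s: retrieve once every `stride` positions
    query_len: int = 32,  # ell: number of trailing prefix tokens in the query
) -> Iterator[Tuple[int, Tokens, Tokens]]:
    """Yield (position, query, lm_input) for every token to be predicted.

    The LM input is the most recently retrieved document followed by the
    prefix; the LM itself is never modified.
    """
    doc: Tokens = []
    query: Tokens = []
    for i in range(1, len(tokens)):      # predict tokens[i] from tokens[:i]
        prefix = tokens[:i]
        if (i - 1) % stride == 0:        # refresh the document every s steps
            query = prefix[-query_len:]  # only the last ell tokens form the query
            doc = retrieve(query)
        yield i, query, doc + prefix
```

A smaller `stride` refreshes the document more often at the cost of more retriever calls, and a smaller `query_len` makes the query track the most recent tokens more closely.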

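Experimental Results

Research Questions

RQ1: How much can LM performance improve simply by prepending retrieved documents to the input of an off-the-shelf LM?
RQ2: Which retriever type and retrieval settings (stride and query length) maximize the benefit of in-context grounding for language modeling?
RQ3: Does LM-oriented reranking of the retrieved documents yield gains beyond simply taking the top-1 document?
RQ4: Can In-Context RALM transfer to open-domain QA tasks without modifying or fine-tuning the LM?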

Key Findings

Model	Retrieval	WikiText-103 (word ppl)	RealNews (token ppl)	ArXiv (token ppl)	Stack Exchange (token ppl)	FreeLaw (token ppl)
GPT-2 S	–	37.5	21.3	12.0	12.8	13.0
GPT-2 S (BM25 § 5)	BM25	29.6	16.1	10.9	11.3	9.6
GPT-2 S (BM25)	BM25	28.6	15.5	10.1	10.6	8.8
GPT-2 S (BM25, Predictive)	BM25	26.8	–	–	–	–
GPT-2 M	–	26.3	15.7	9.3	8.8	9.6
GPT-2 M (BM25)	BM25	21.5	12.4	8.6	8.1	7.4
GPT-2 M (BM25, Zero-shot)	BM25	20.8	12.0	8.0	7.7	6.9
GPT-2 L	–	22.0	13.6	8.4	8.0	8.0
GPT-2 L (BM25)	BM25	18.1	10.9	7.8	7.8	6.8
GPT-2 L (BM25, Zero-shot)	BM25	17.6	10.6	7.3	7.4	6.4
GPT-2 XL	–	20.0	12.4	7.8	8.0	8.0
GPT-2 XL (BM25)	BM25	16.6	10.1	7.2	7.4	6.4
GPT-2 XL (BM25, Zero-shot)	BM25	16.1	9.8	6.8	7.1	6.0
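
To read the table in relative terms, the short sketch below computes how much BM25 retrieval lowers WikiText-103 perplexity for each GPT-2 size, using only the baseline and plain "(BM25)" rows copied from the table above. The helper name `relative_reduction` is illustrative.

```python
# WikiText-103 word-level perplexities copied from the table above:
# model -> (no retrieval, BM25 retrieval)
wikitext_ppl = {
    "GPT-2 S": (37.5, 28.6),
    "GPT-2 M": (26.3, 21.5),
    "GPT-2 L": (22.0, 18.1),
    "GPT-2 XL": (20.0, 16.6),
}


def relative_reduction(baseline: float, augmented: float) -> float:
    """Fraction by which perplexity drops relative to the baseline."""
    return (baseline - augmented) / baseline


reductions = {
    model: relative_reduction(base, aug)
    for model, (base, aug) in wikitext_ppl.items()
}

for model, r in reductions.items():
    print(f"{model}: {100 * r:.1f}% lower perplexity with BM25")
```

On these rows the reduction is roughly 17% to 24%. The same numbers also show GPT-2 M with BM25 (21.5) scoring below GPT-2 L without retrieval (22.0), and GPT-2 L with BM25 (18.1) scoring below GPT-2 XL without retrieval (20.0).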

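BM25 often outperforms dense neural retrievers for in-context LM grounding.
More frequent retrieval (a smaller stride s) improves perplexity more than infrequent retrieval; s = 4 is proposed as a practical default.
In this setting, a retrieval query length ell of about 32 tokens is the sweet spot for BM25.
With an off-the-shelf retriever, In-Context RALM can match the performance of models 2–3 times larger across corpora.
Both zero-shot LM-oriented reranking and predictive reranking reduce perplexity beyond vanilla BM25; predictive reranking trained on in-domain data gives notable gains.
For larger models as well, In-Context RALM lets a considerably smaller model match a larger one (for example, OPT 6.7B with BM25 is comparable to OPT 66B in certain settings).
In open-domain QA, simply providing retrieved documents as context substantially improves performance even with a frozen LM, and two documents are often enough.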
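
The zero-shot LM-oriented reranking mentioned in the findings above can be sketched as follows. The review does not spell out the scoring rule, so this sketch makes an assumption: because the upcoming tokens are unknown at retrieval time, each candidate document is scored by how well it helps the LM predict the last few tokens of the prefix that has already been seen. `zero_shot_rerank`, `toy_log_prob`, and `target_len` are hypothetical names, and the toy scorer is a smoothed unigram model standing in for a real frozen LM.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence

LogProbFn = Callable[[Sequence[str], Sequence[str]], float]


def zero_shot_rerank(
    candidates: List[List[str]],
    prefix: List[str],
    log_prob: LogProbFn,
    target_len: int = 4,
) -> List[str]:
    """Return the candidate under which the scorer finds the final
    `target_len` prefix tokens most likely (assumed scoring rule)."""
    if target_len <= 0:
        raise ValueError("target_len must be positive")
    context, target = prefix[:-target_len], prefix[-target_len:]
    return max(candidates, key=lambda doc: log_prob(list(doc) + context, target))


def toy_log_prob(
    context: Sequence[str], target: Sequence[str], alpha: float = 1.0
) -> float:
    """Hypothetical stand-in for a frozen LM: an add-alpha smoothed
    unigram model estimated from the context tokens only."""
    counts = Counter(context)
    vocab = len(set(context) | set(target))
    total = len(context) + alpha * vocab
    return sum(math.log((counts[t] + alpha) / total) for t in target)
```

In practice `log_prob` would wrap a real LM's token log-likelihoods; the reranker only needs that one callable, so the LM stays frozen and can even be reached through an API that exposes log-probabilities.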

This review was generated by AI and checked by a human editor.