QUICK REVIEW

[論文レビュー] Improving Language Models via Plug-and-Play Retrieval Feedback

Wenhao Yu, Zhihan Zhang|arXiv (Cornell University)|May 23, 2023

Topic Modeling被引用数 10

ひとこと要約

ReFeed は、初期回答を条件付けて文書を取得することで LLM の出力を改良する plug-and-play の取得フィードバック・パイプラインを導入し、ファクトの正確性を向上させる。ファインチューニングなしでゼロショットおよび少数ショットの知識集約タスクで改善を実現する。

ABSTRACT

Large language models (LLMs) exhibit remarkable performance across various NLP tasks. However, they often generate incorrect or hallucinated information, which hinders their practical applicability in real-world scenarios. Human feedback has been shown to effectively enhance the factuality and quality of generated content, addressing some of these limitations. However, this approach is resource-intensive, involving manual input and supervision, which can be time-consuming and expensive. Moreover, it cannot be provided during inference, further limiting its practical utility in dynamic and interactive applications. In this paper, we introduce ReFeed, a novel pipeline designed to enhance LLMs by providing automatic retrieval feedback in a plug-and-play framework without the need for expensive fine-tuning. ReFeed first generates initial outputs, then utilizes a retrieval model to acquire relevant information from large document collections, and finally incorporates the retrieved information into the in-context demonstration for output refinement, thereby addressing the limitations of LLMs in a more efficient and cost-effective manner. Experiments on four knowledge-intensive benchmark datasets demonstrate our proposed ReFeed could improve over +6.0% under zero-shot setting and +2.5% under few-shot setting, compared to baselines without using retrieval feedback.

研究の動機と目的

LLMs の幻覚と事実誤認を高コストなファインチューニングなしで削減する動機付け。
取得文書を用いて初期出力を洗練させる取得フィードバック・パイプラインの開発。
初期生成の多様性とアンサンブル戦略の活用を探求し、最終回答の信頼性を向上させる。

提案手法

クエリに対して初期回答を生成するように LLM にプロンプトをかける。
初期回答を検索クエリの一部として用い、トップk文書を取得する。
取得情報を条件付きで組み込み、初期回答を改良する。
豊富なフィードバックのため、初期生成を多様化して複数の回答候補を得る。
最終回答を選択するために、ネガティブ対数尤度でランク付けして、取得前後の出力をアンサンブルする。

Figure 1: ReFeed operates by initially prompting a large language model to generate an answer in response to a given query, followed by the retrieval of documents from extensive document collections. Subsequently, the pipeline refines the initial answer by incorporating the information gleaned from

実験結果

リサーチクエスチョン

RQ1人間の注釈ではなく自動検索を用いて、ファインチューニングなしで取得フィードバックを用いて LLM の出力を改善できるか。
RQ2既存出力を効率的に洗練するために、取得フィードバックを plug-and-play 的に統合できるか。
RQ3初期生成の多様性とアンサンブル戦略は、正確性と頑健性を向上させるか。

主な発見

ReFeed は NQ と TriviaQA のゼロショット EM/F1 を改善し、取得フィードバックなしのベースラインと比較して一部の知識集約タスクで +6.0% 以上の改善を達成。
少数ショット設定でも改善を示し、対応するベースラインに比べて EM/F1 が約 +2.5% 程度向上。
初期生成の多様性とアンサンブル手法は頑健性に寄与し、アブレーションではこれらの成分を削除すると性能が低下することが示される。
CoT プロンプトと組み合わせた場合、マルチホップ QA (HotpotQA) の思考過程推論を強化できる。
事例研究は、うまく改良されたケースと、取得文書がモデルを誤導したケースの両方を示し、アンサンブル戦略の必要性を強調する。

Figure 2: Rather than generating only one initial answer, we prompt the language model to sample multiple answers, allowing for a more comprehensive retrieval feedback based on different answers.

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。