QUICK REVIEW

[论文解读] Improving Language Models via Plug-and-Play Retrieval Feedback

Wenhao Yu, Zhihan Zhang|arXiv (Cornell University)|May 23, 2023

Topic Modeling被引用 10

一句话总结

ReFeed 引入了一种即插即用的检索反馈管道，通过在初始答案条件下检索文档来细化大语言模型输出，在零样本和少样本的知识密集型任务中提高事实准确性，无需微调。

ABSTRACT

Large language models (LLMs) exhibit remarkable performance across various NLP tasks. However, they often generate incorrect or hallucinated information, which hinders their practical applicability in real-world scenarios. Human feedback has been shown to effectively enhance the factuality and quality of generated content, addressing some of these limitations. However, this approach is resource-intensive, involving manual input and supervision, which can be time-consuming and expensive. Moreover, it cannot be provided during inference, further limiting its practical utility in dynamic and interactive applications. In this paper, we introduce ReFeed, a novel pipeline designed to enhance LLMs by providing automatic retrieval feedback in a plug-and-play framework without the need for expensive fine-tuning. ReFeed first generates initial outputs, then utilizes a retrieval model to acquire relevant information from large document collections, and finally incorporates the retrieved information into the in-context demonstration for output refinement, thereby addressing the limitations of LLMs in a more efficient and cost-effective manner. Experiments on four knowledge-intensive benchmark datasets demonstrate our proposed ReFeed could improve over +6.0% under zero-shot setting and +2.5% under few-shot setting, compared to baselines without using retrieval feedback.

研究动机与目标

推动在不需要高成本微调的情况下减少大型语言模型的幻觉和事实错误。
开发一个检索反馈管道，利用检索到的文档来细化知识密集型任务的初始输出。
探索初始生成的多样性和集成策略，以提高最终答案的可靠性。

提出的方法

提示一个大语言模型为一个查询生成初始答案。
使用初始答案作为检索查询的一部分来检索前 k 条文档。
通过有条件地纳入检索信息来细化初始答案。
对初始生成进行多样化，以获得多个答案候选，从而获得更丰富的反馈。
通过负对数似然排序在检索前后输出之间进行集成，以选择最终答案。

Figure 1: ReFeed operates by initially prompting a large language model to generate an answer in response to a given query, followed by the retrieval of documents from extensive document collections. Subsequently, the pipeline refines the initial answer by incorporating the information gleaned from

实验结果

研究问题

RQ1在不进行微调且使用自动检索而非人工注释的情况下，检索反馈是否可以改善 LLM 的输出？
RQ2检索反馈是否可以以即插即用的方式整合，以高效地改进先前的输出？
RQ3初始生成的多样性和集成策略是否会提高正确性和鲁棒性？

主要发现

与未使用检索反馈的基线相比，ReFeed 在 NQ 和 TriviaQA 的零样本时的 EM/F1 提升超过 6.0%（在某些知识密集型任务中）。
在少量示例设置中也显示出提升，相对于相应基线，EM/F1 提升约 2.5%。
初始生成的多样性和集成方法有助于鲁棒性，消融实验显示移除这些组件时性能下降。
与推理链（CoT）提示结合时，ReFeed 可以提升多跳问答（HotpotQA）的链式推理能力。
案例研究显示了成功的改进，也有检索到的文档误导模型的情况，突出了需要集成策略。

Figure 2: Rather than generating only one initial answer, we prompt the language model to sample multiple answers, allowing for a more comprehensive retrieval feedback based on different answers.

更好的研究，从现在开始

从论文设计到论文写作，大幅缩短您的研究时间。

无需绑定信用卡

本解读由 AI 生成，并经人工编辑审核。