QUICK REVIEW

[Paper Review] Speculative RAG: Enhancing Retrieval Augmented Generation through Drafting

Zilong Wang, Zifeng Wang|arXiv (Cornell University)|Jul 11, 2024

Natural Language Processing Techniques | Cited by 7

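One-Sentence Summary

Speculative RAG uses a small, specialized RAG drafter to generate multiple drafts from diverse subsets of the retrieved documents; a larger generalist LM then verifies the drafts and selects the best one, improving accuracy and reducing latency on several RAG benchmarks.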

ABSTRACT

Retrieval augmented generation (RAG) combines the generative abilities of large language models (LLMs) with external knowledge sources to provide more accurate and up-to-date responses. Recent RAG advancements focus on improving retrieval outcomes through iterative LLM refinement or self-critique capabilities acquired through additional instruction tuning of LLMs. In this work, we introduce Speculative RAG - a framework that leverages a larger generalist LM to efficiently verify multiple RAG drafts produced in parallel by a smaller, distilled specialist LM. Each draft is generated from a distinct subset of retrieved documents, offering diverse perspectives on the evidence while reducing input token counts per draft. This approach enhances comprehension of each subset and mitigates potential position bias over long context. Our method accelerates RAG by delegating drafting to the smaller specialist LM, with the larger generalist LM performing a single verification pass over the drafts. Extensive experiments demonstrate that Speculative RAG achieves state-of-the-art performance with reduced latency on TriviaQA, MuSiQue, PopQA, PubHealth, and ARC-Challenge benchmarks. It notably enhances accuracy by up to 12.97% while reducing latency by 50.83% compared to conventional RAG systems on PubHealth.

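Motivation and Goals

Improve the efficiency and accuracy of retrieval augmented generation (RAG) for knowledge-intensive question answering.
Propose a divide-and-conquer framework that delegates drafting to a smaller specialist LM and verification to a larger generalist LM.
Reduce the redundancy and position bias that arise from long retrieved contexts while keeping answers strongly grounded in the evidence.
Demonstrate state-of-the-art performance and lower latency on multiple benchmarks.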

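Proposed Method

Cluster the retrieved documents into k groups using content-aware embeddings, then sample one document from each cluster to form m diverse subsets.
Have the smaller RAG drafter generate, in parallel, an answer draft and a rationale for each subset.
Have the larger generalist verifier score each draft-rationale pair using conditional generation probabilities and a self-reflection prompt.
Select the highest-scoring draft and integrate it into the final answer.
Train the RAG drafter by instruction tuning on augmented tuples (Q, D, A, E), i.e. question, retrieved documents, answer, and rationale, so that it produces grounded drafts and rationales (maximizing P(A, E | Q, D)).
Ensure diversity by combining clustering with multi-perspective sampling, so that the subsets cover different views of the retrieved evidence.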
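
The subset-construction step can be illustrated with a minimal Python sketch. It assumes the clustering has already been done upstream (for example, k-means over document embeddings) and that only the resulting cluster ids are available; the function name `sample_subsets`, the `seed` parameter, and the toy labels are hypothetical and not taken from the paper.

```python
import random
from collections import defaultdict


def sample_subsets(cluster_labels, m, seed=0):
    """Build m document subsets, each holding one document index per cluster.

    cluster_labels[i] is the cluster id of retrieved document i. The
    clustering itself is assumed to have happened upstream and is not shown.
    """
    rng = random.Random(seed)

    # Group document indices by their cluster id.
    clusters = defaultdict(list)
    for doc_idx, label in enumerate(cluster_labels):
        clusters[label].append(doc_idx)

    subsets = []
    for _ in range(m):
        # One random document from every cluster, so each subset has k documents.
        subsets.append([rng.choice(clusters[label]) for label in sorted(clusters)])
    return subsets


# Hypothetical cluster ids for 6 retrieved documents, with k = 3 clusters.
labels = [0, 0, 1, 2, 1, 2]
subsets = sample_subsets(labels, m=4)
```

Each subset would then be passed to the drafter as its own short context. This sketch may return the same subset more than once; a fuller implementation would likely deduplicate subsets before drafting.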

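Experimental Results

Research Questions

RQ1: Can a small, specialized RAG drafter produce high-quality, diverse drafts from partitioned retrieval results that a larger LM can verify efficiently?
RQ2: Does a single verification pass by a generalist LM over rationale-grounded drafts outperform standard RAG and self-critique methods in accuracy and latency?
RQ3: How do the sampling strategy and the scoring components (draft probability, self-consistency, self-reflection) affect overall performance?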

Key Findings

RAG method (accuracy, %)	TriviaQA	MuSiQue	PubHealth	ARC-Challenge
Standard RAG, Mistral 7B	54.15	16.71	34.85	42.75
Standard RAG, Mixtral 8x7B	59.85	19.16	37.08	48.72
Standard RAG, Mistral-Instruct 7B	67.11	17.99	42.15	47.70
Standard RAG, Mixtral-Instruct 8x7B	73.91	29.42	63.63	78.41
Standard RAG, Alpaca 7B	64.1	-	40.2	48.1
Self-Reflective RAG (Self-RAG), Mistral 7B	64.84	21.72	72.44	74.91
Corrective RAG (CRAG), Mistral 7B	-	-	59.04	74.87
Self-CRAG, Mistral 7B	-	-	72.85	75.26
Speculative RAG (Drafter 7B alone)	71.11	27.89	75.58	74.49
Speculative RAG, Verifier-7B + Drafter-7B	73.91	31.03	75.79	76.19
Speculative RAG, Verifier-8x7B + Drafter-7B	74.24	31.57	76.60	80.55
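
As a quick sanity check, the headline gains can be recomputed from three rows of the table. The sketch below assumes the four scores in each row are read in the order TriviaQA, MuSiQue, PubHealth, ARC-Challenge; the variable and function names are illustrative only.

```python
# Accuracy (%) copied from the results table. Column order is assumed to be
# TriviaQA, MuSiQue, PubHealth, ARC-Challenge.
DATASETS = ("TriviaQA", "MuSiQue", "PubHealth", "ARC-Challenge")

standard_mixtral = (59.85, 19.16, 37.08, 48.72)           # Standard RAG, Mixtral 8x7B
standard_mixtral_instruct = (73.91, 29.42, 63.63, 78.41)  # Standard RAG, Mixtral-Instruct 8x7B
speculative_8x7b = (74.24, 31.57, 76.60, 80.55)           # Verifier-8x7B + Drafter-7B


def gains(ours, baseline):
    """Absolute accuracy difference (percentage points) per dataset."""
    return {name: round(o - b, 2) for name, o, b in zip(DATASETS, ours, baseline)}


vs_instruct = gains(speculative_8x7b, standard_mixtral_instruct)
vs_base = gains(speculative_8x7b, standard_mixtral)

print(vs_instruct)
print(vs_base)
```

Against Mixtral-Instruct 8x7B the differences on PubHealth and ARC-Challenge come out to 12.97 and 2.14 points; against the non-instruct Mixtral 8x7B the differences on TriviaQA and PubHealth are 14.39 and 39.52 points. All of these are absolute differences in accuracy, not relative improvements.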

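Speculative RAG consistently outperforms standard RAG and several enhanced baselines on TriviaQA, MuSiQue, PubHealth, and ARC-Challenge.
With the 8x7B verifier and the instruction-tuned 7B drafter, accuracy exceeds the best standard RAG baseline (Mixtral-Instruct 8x7B) by 12.97 points on PubHealth (76.60 vs. 63.63) and by 2.14 points on ARC-Challenge (80.55 vs. 78.41).
Latency is lower than with standard RAG; on PubHealth the reduction reaches 50.83%.
The instruction-tuned drafter improves results substantially: paired with the Mixtral 8x7B verifier, Speculative RAG beats standard RAG with Mixtral 8x7B by 14.39 points on TriviaQA (74.24 vs. 59.85) and by 39.52 points on PubHealth (76.60 vs. 37.08).
Ablations show that diversity-promoting sampling and the combination of the draft score with the self-consistency and self-reflection scores are both important for performance.
The latency gains hold across datasets, and Speculative RAG has lower latency than the tensor-parallel baselines.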
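
The score combination examined in the ablation can be sketched as follows. The summary does not spell out how the three scores are combined, so this sketch assumes they are simply multiplied; the class `ScoredDraft`, the flags `use_sc` and `use_sr`, and all probabilities below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScoredDraft:
    answer: str
    rationale: str
    p_draft: float             # drafter's probability of its own answer and rationale
    p_self_consistency: float  # verifier's probability of the answer-rationale pair
    p_self_reflection: float   # verifier's probability of affirming the draft when prompted


def combined_score(draft, use_sc=True, use_sr=True):
    """Multiply the enabled score components (assumed combination rule)."""
    score = draft.p_draft
    if use_sc:
        score *= draft.p_self_consistency
    if use_sr:
        score *= draft.p_self_reflection
    return score


def select_best(drafts, use_sc=True, use_sr=True):
    """Return the draft with the highest combined score."""
    return max(drafts, key=lambda d: combined_score(d, use_sc, use_sr))


# Hypothetical scores for three drafts.
drafts = [
    ScoredDraft("A", "rationale for A", 0.6, 0.5, 0.9),
    ScoredDraft("B", "rationale for B", 0.7, 0.4, 0.5),
    ScoredDraft("C", "rationale for C", 0.5, 0.6, 0.8),
]

best_full = select_best(drafts)
best_draft_only = select_best(drafts, use_sc=False, use_sr=False)
```

With these made-up numbers the full score selects draft A, while the drafter's probability alone selects draft B. That is the kind of disagreement an ablation over scoring components would expose.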


This review was generated by AI and reviewed by a human editor.