QUICK REVIEW

[Paper Digest] Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Yanming Mu, Hao Hu|arXiv (Cornell University)|Mar 23, 2026

Adversarial Robustness in Machine Learning | Cited by 0

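One-Sentence Summary

This paper presents an end-to-end survey of security risks in Retrieval-Augmented Generation (RAG), categorizing threat vectors, defenses, and evaluation benchmarks along the RAG pipeline.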

ABSTRACT

Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline, providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.

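Research Motivation and Objectives

Clarify the RAG architecture and identify the security risks in its modules (vector database construction, retriever, generator).
Categorize threat vectors, including data poisoning, adversarial attacks, embedding inversion, and membership inference attacks.
Summarize defense techniques and evaluation benchmarks to guide the development of robust and trustworthy RAG systems.
Consolidate datasets, standards, and frameworks to establish a unified security evaluation for RAG research.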

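Proposed Method

Based on a review of 152 papers, systematically map threat models and defenses across the RAG pipeline.
Group threats into data poisoning, adversarial/inversion attacks, and membership inference attacks.
Review input-side and output-side defense mechanisms, including privacy-preserving and robustness techniques.
Consolidate test datasets, security standards, and evaluation frameworks into a unified benchmark view.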
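
As a rough illustration of one output-side technique named in the abstract, differential privacy perturbation, the sketch below adds Laplace noise to a count before releasing it. It is not taken from the paper: the function names (`laplace_noise`, `private_count`), the choice of a count query, and the values of `epsilon` and `sensitivity` are all assumptions made for this example.

```python
import random


def laplace_noise(scale, rng):
    # The difference of two i.i.d. exponentials with rate 1/scale
    # is distributed as Laplace(0, scale).
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(true_count, epsilon, rng, sensitivity=1.0):
    # Laplace mechanism: noise scale = sensitivity / epsilon.
    # Smaller epsilon means more noise and stronger privacy.
    return true_count + laplace_noise(sensitivity / epsilon, rng)


rng = random.Random(0)  # seeded only so the example is repeatable
true_count = 42  # hypothetical: passages in the knowledge base matching a query
samples = [private_count(true_count, epsilon=1.0, rng=rng) for _ in range(2000)]
mean_release = sum(samples) / len(samples)
print(round(mean_release, 2))
```

Each individual release is noisy, while the average over many releases stays close to the true count. That trade-off between per-answer privacy and aggregate utility is what the `epsilon` parameter controls in this toy setting.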

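Experimental Results

Research Questions

RQ1: What are the main security threats in the RAG architecture (vector database construction, retriever, generator), and how do they operate?
RQ2: What defense strategies exist for input-side and output-side security in RAG systems, and how effective are they?
RQ3: What benchmarks and standards currently exist for evaluating RAG security, and how can they be unified for future research?
RQ4: How do data poisoning, adversarial, embedding inversion, and membership inference attacks exploit the weaknesses of RAG?
RQ5: Which future directions can improve the security and trustworthiness of RAG systems?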

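Key Findings

The paper proposes a taxonomy of RAG threats and defenses that covers the vector database construction, retrieval, and generation stages.
Data poisoning is identified as a primary threat vector, with attack methods evolving from heuristic concatenation to bilevel optimization.
Membership inference attacks on RAG exploit the retrieval-generation dynamics to infer whether a record belongs to the knowledge base, creating privacy risks.
The survey notes that current defenses focus mostly on general frameworks and privacy protection, and that unified evaluation benchmarks are needed.
Datasets, security standards, and evaluation frameworks are consolidated to guide future experimental design in RAG security.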
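
To make the "heuristic concatenation" style of data poisoning concrete, here is a minimal sketch under strong simplifying assumptions. It is not the paper's method or any specific published attack: the bag-of-words `embed` function, the three-passage corpus, the query, and the payload are all hypothetical. The idea shown is only that a passage built by prepending the target query to attacker-chosen text can outrank legitimate passages in similarity-based retrieval.

```python
import math
from collections import Counter


def embed(text):
    # Toy "embedding": lowercase token counts (a stand-in for a real encoder).
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k=1):
    # Rank passages by cosine similarity to the query and return the top k.
    q = embed(query)
    return sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)[:k]


query = "who wrote the report on vector databases"
clean_corpus = [
    "the report on vector databases was written by the storage team",
    "retrievers rank passages by embedding similarity",
    "generators condition on retrieved passages",
]

# Heuristic concatenation: prepend the target query to the attacker's text.
payload = "the report was written by mallory"
poisoned_doc = query + " " + payload

top_clean = retrieve(query, clean_corpus)[0]
top_poisoned = retrieve(query, clean_corpus + [poisoned_doc])[0]
print(top_clean)
print(top_poisoned)
```

In this toy setup the legitimate passage is retrieved from the clean corpus, but once the poisoned passage is inserted it ranks first, so a downstream generator would be conditioned on the attacker's text. Optimization-based attacks pursue the same goal without needing the query to appear verbatim in the passage.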

This digest was generated by AI and reviewed by human editors.