QUICK REVIEW

[Paper Review] Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Yanming Mu, Hao Hu|arXiv (Cornell University)|Mar 23, 2026

Adversarial Robustness in Machine Learning|0 citations

TL;DR

This paper provides an end-to-end survey of security risks in Retrieval-Augmented Generation (RAG), categorizing threat vectors, defenses, and evaluation benchmarks across the RAG pipeline.

ABSTRACT

Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline, providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.

Motivation & Objective

Clarify RAG architecture and identify security risks across its modules (vector DB construction, retriever, generator).
Categorize threat vectors including data poisoning, adversarial attacks, embedding inversion, and membership inference attacks.
Summarize defense technologies and evaluation benchmarks to guide robust and trustworthy RAG systems.
Consolidate datasets, standards, and frameworks to establish unified security evaluation for RAG research.

Proposed method

Systematically map threat models and defenses along the RAG pipeline based on a survey of 152 papers.
Classify threats into data poisoning attacks, adversarial and embedding inversion attacks, and membership inference attacks.
Review defense mechanisms at input and output stages, including privacy-preserving and robustness techniques.
Consolidate test datasets, security standards, and evaluation frameworks to propose a unified benchmarking view.
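
To make the data poisoning class listed above more concrete, here is a minimal Python sketch of how a spliced passage can win retrieval. It is an illustration under stated assumptions, not the method of the surveyed paper: the retriever is a toy bag-of-words cosine ranker, "heuristic splicing" is assumed to mean prepending the target query to the attacker's payload, and all names (embed, cosine, retrieve, the corpus) are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding': lowercase token counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k=1):
    """Return the k passages most similar to the query."""
    q = embed(query)
    ranked = sorted(corpus, key=lambda doc: cosine(q, embed(doc)), reverse=True)
    return ranked[:k]


# Hypothetical clean knowledge base.
clean_corpus = [
    "The capital of France is Paris.",
    "Mount Everest is the highest mountain on Earth.",
    "Python is a popular programming language.",
]

target_query = "What is the capital of France?"

# Assumed form of heuristic splicing: the query text is prepended to the
# false statement so the passage scores highly for that exact query.
poisoned_doc = target_query + " The capital of France is Lyon."

top_clean = retrieve(target_query, clean_corpus)[0]
top_poisoned = retrieve(target_query, clean_corpus + [poisoned_doc])[0]

print("Before poisoning:", top_clean)
print("After poisoning: ", top_poisoned)
```

In this toy setup the injected passage outranks the genuine one because it repeats the query's own tokens. Real dense retrievers behave differently in detail, so this only shows the general shape of the attack.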

Experimental results

Research questions

RQ1: What are the major security threats across the RAG architecture (vector DB construction, retriever, generator) and how do they operate?
RQ2: What defense strategies exist for input-side and output-side security in RAG systems, and how effective are they?
RQ3: What benchmarks and standards exist for evaluating RAG security, and how can they be unified for future research?
RQ4: How do data poisoning, adversarial, embedding inversion, and membership inference attacks exploit RAG weaknesses?
RQ5: What future directions can strengthen the security and trustworthiness of RAG systems?

Key findings

The paper presents a taxonomy of RAG threats and defenses across vector DB construction, retrieval, and generation stages.
Data poisoning attacks are identified as a dominant threat vector, with attack methods evolving from heuristic splicing to bi-level optimization.
Membership inference attacks in RAG leverage the retrieval-generation dynamic to infer knowledge base membership, posing privacy risks.
The survey finds that current defenses concentrate on general frameworks and privacy preservation, and that the field still lacks unified evaluation benchmarks.
It consolidates datasets, security standards, and evaluation frameworks to guide future experimental design in RAG security.
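
The membership inference finding above can be sketched with a toy example. This is a hedged illustration, not the attack described in the surveyed paper: it assumes a RAG system whose generator echoes the retrieved passage, and an attacker who sends the first half of a candidate document and checks how much of the second half comes back. ToyRAG, infer_membership, the 0.8 threshold, and the sample texts are all hypothetical.

```python
import re
from collections import Counter


def tokens(text):
    """Lowercase alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def overlap(a, b):
    """Number of shared tokens (with multiplicity) between two texts."""
    return sum((Counter(tokens(a)) & Counter(tokens(b))).values())


class ToyRAG:
    """Toy pipeline: retrieve the best-overlapping passage and echo it."""

    def __init__(self, knowledge_base):
        self.kb = list(knowledge_base)

    def answer(self, query):
        best = max(self.kb, key=lambda doc: overlap(query, doc))
        # Assumption: the generator repeats retrieved context verbatim.
        return "According to my sources: " + best


def infer_membership(rag, candidate, threshold=0.8):
    """Guess whether `candidate` is in the knowledge base."""
    words = candidate.split()
    half = len(words) // 2
    prefix, suffix = " ".join(words[:half]), " ".join(words[half:])
    response = rag.answer("Please complete this text: " + prefix)
    suffix_tokens = tokens(suffix)
    if not suffix_tokens:
        return False
    # Fraction of the unseen suffix that the response reproduces.
    recovered = overlap(response, suffix) / len(suffix_tokens)
    return recovered >= threshold


rag = ToyRAG([
    "Patient 4471 was prescribed 20 mg of drug X after the March consultation.",
    "The quarterly report shows revenue growth in the northern region.",
    "Server maintenance is scheduled for the first Sunday of each month.",
])

member = "Patient 4471 was prescribed 20 mg of drug X after the March consultation."
non_member = "The annual audit found no compliance issues in the southern office."

print("member guess:    ", infer_membership(rag, member))
print("non-member guess:", infer_membership(rag, non_member))
```

The signal here comes entirely from the retrieved passage leaking into the output. A generator that paraphrases, or a candidate that closely resembles a stored passage, would make the fixed threshold far less reliable.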

This review was created by AI and reviewed by human editors.