QUICK REVIEW

[Paper Review] Towards Secure Retrieval-Augmented Generation: A Comprehensive Review of Threats, Defenses and Benchmarks

Yanming Mu, Hao Hu | arXiv (Cornell University) | March 23, 2026

Adversarial Robustness in Machine Learning · Citations: 0

One-Line Summary

This paper presents an end-to-end survey of the security risks in RAG (Retrieval-Augmented Generation), categorizing threat vectors, defenses, and evaluation benchmarks across the entire RAG pipeline.

ABSTRACT

Retrieval-Augmented Generation (RAG) significantly mitigates the hallucinations and domain knowledge deficiency in large language models by incorporating external knowledge bases. However, the multi-module architecture of RAG introduces complex system-level security vulnerabilities. Guided by the RAG workflow, this paper analyzes the underlying vulnerability mechanisms and systematically categorizes core threat vectors such as data poisoning, adversarial attacks, and membership inference attacks. Based on this threat assessment, we construct a taxonomy of RAG defense technologies from a dual perspective encompassing both input and output stages. The input-side analysis reviews data protection mechanisms including dynamic access control, homomorphic encryption retrieval, and adversarial pre-filtering. The output-side examination summarizes advanced leakage prevention techniques such as federated learning isolation, differential privacy perturbation, and lightweight data sanitization. To establish a unified benchmark for future experimental design, we consolidate authoritative test datasets, security standards, and evaluation frameworks. To the best of our knowledge, this paper presents the first end-to-end survey dedicated to the security of RAG systems. Distinct from existing literature that isolates specific vulnerabilities, we systematically map the entire pipeline, providing a unified analysis of threat models, defense mechanisms, and evaluation benchmarks. By enabling deep insights into potential risks, this work seeks to foster the development of highly robust and trustworthy next-generation RAG systems.
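The abstract lists dynamic access control among the input-side defenses. The sketch below is a minimal illustration under stated assumptions, not the method of any surveyed system: it assumes each document carries a pre-computed embedding and a set of permitted roles, and it checks the caller's roles on every request before any similarity ranking happens, so restricted documents never enter the generator's context. `Document`, `allowed_roles`, and `retrieve_with_acl` are hypothetical names introduced here.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Document:
    # Hypothetical record: an id, a pre-computed embedding, and the roles allowed to read it.
    doc_id: str
    embedding: list[float]
    allowed_roles: set[str] = field(default_factory=set)


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity of two equal-length vectors; 0.0 if either has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_with_acl(
    query: list[float], docs: list[Document], user_roles: set[str], k: int = 2
) -> list[str]:
    """Drop documents the caller may not read, then rank the rest by similarity."""
    # The role check runs on every request, before ranking.
    permitted = [d for d in docs if d.allowed_roles & user_roles]
    ranked = sorted(permitted, key=lambda d: cosine(query, d.embedding), reverse=True)
    return [d.doc_id for d in ranked[:k]]


if __name__ == "__main__":
    corpus = [
        Document("public-faq", [1.0, 0.0], {"employee", "guest"}),
        Document("hr-salaries", [0.9, 0.1], {"hr"}),
        Document("eng-wiki", [0.0, 1.0], {"employee"}),
    ]
    # A guest never sees hr-salaries, even though it is the second-closest match.
    print(retrieve_with_acl([1.0, 0.0], corpus, {"guest"}))  # → ['public-faq']
```

Filtering before ranking, rather than trimming the top-k afterward, keeps restricted documents out of both the ranking and the context passed to the generator.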

Research Motivation and Objectives

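Clarify the RAG architecture and identify security risks across its modules (vector DB construction, retriever, generator).
Categorize threat vectors, including data poisoning, adversarial attacks, embedding inversion, and membership inference attacks.
Summarize defense techniques and evaluation benchmarks for building robust and trustworthy RAG systems.
Consolidate datasets, standards, and frameworks to establish a unified security evaluation for RAG research.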

Proposed Method

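Based on a survey of 152 papers, systematically map threat models and defenses along the RAG pipeline.
Classify threats into data poisoning, adversarial/inversion, and membership inference attacks.
Review input- and output-stage defense mechanisms, along with privacy-preserving and robustness techniques.
Consolidate test datasets, security standards, and evaluation frameworks into a unified benchmarking perspective.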
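On the output side, the review groups lightweight data sanitization with other leakage-prevention techniques. One simple form it could take, assumed here purely for illustration, is regex-based redaction of obvious personal data in the generated answer before it is returned. The patterns and the `sanitize_output` function are hypothetical; they catch only common email and phone-number shapes, and a deployed system would need a much broader PII detector.

```python
import re

# Hypothetical, deliberately narrow patterns for illustration only.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
# Optional "+", a digit, 7+ digits/spaces/dots/hyphens, then a final digit.
PHONE_RE = re.compile(r"\+?\d[\d\s.-]{7,}\d")


def sanitize_output(text: str) -> str:
    """Replace email addresses and phone-like numbers in a generated answer with placeholders."""
    # Emails are redacted first so their digits are not re-examined by the phone pattern.
    text = EMAIL_RE.sub("[EMAIL]", text)
    return PHONE_RE.sub("[PHONE]", text)


if __name__ == "__main__":
    answer = "Contact alice@example.com or +1 555-123-4567."
    print(sanitize_output(answer))  # → Contact [EMAIL] or [PHONE].
```

Because it inspects only the final text, a filter like this is cheap to run, but it cannot catch paraphrased or re-encoded leaks.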

Results

Research Questions

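RQ1: What are the major security threats across the RAG architecture (vector DB construction, retriever, generator), and how do they operate?
RQ2: What defense strategies exist for input-side and output-side security in RAG systems, and how effective are they?
RQ3: What benchmarks and standards exist for evaluating RAG security, and how can they be consolidated for future research?
RQ4: How do data poisoning, adversarial and embedding inversion, and membership inference attacks exploit RAG vulnerabilities?
RQ5: What future directions could strengthen the security and trustworthiness of RAG systems?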

Key Findings

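The paper presents a taxonomy of RAG threats and defenses spanning the vector DB construction, retrieval, and generation stages.
Data poisoning is identified as the dominant threat vector, with attack methods evolving from heuristic splicing to bi-level optimization.
Membership inference attacks on RAG exploit retrieval-generation dynamics to infer whether a given document belongs to the knowledge base, creating privacy risks.
The survey notes that current defenses concentrate on general frameworks and privacy preservation, and it stresses the need for unified evaluation benchmarks.
The study consolidates datasets, security standards, and evaluation frameworks to guide the design of future RAG security experiments.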
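To make the two attack mechanisms in these findings concrete, the following toy sketch uses a bag-of-words retriever, an assumption made for brevity (real RAG systems use dense embeddings). `poison_passage` shows heuristic splicing in its simplest form: the attacker prefixes the targeted question to a malicious answer so the injected passage outranks the genuine one. `infer_membership` is a simplified similarity-threshold membership test that assumes the attacker can observe the retrieved context. All function names and the 0.9 threshold are hypothetical and not taken from the paper.

```python
import math
import re
from collections import Counter


def bow(text: str) -> Counter:
    """Lowercased bag-of-words term counts (a stand-in for a dense embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve_top1(query: str, kb: list[str]) -> tuple[str, float]:
    """Return the best-matching passage in the knowledge base and its score."""
    if not kb:
        return "", 0.0
    q = bow(query)
    return max(((doc, cosine(q, bow(doc))) for doc in kb), key=lambda pair: pair[1])


def poison_passage(target_question: str, malicious_answer: str) -> str:
    """Heuristic splicing: prefix the target question so the passage matches it lexically."""
    return f"{target_question} {malicious_answer}"


def infer_membership(target_doc: str, kb: list[str], threshold: float = 0.9) -> bool:
    """Query with the target text and test whether the returned context nearly reproduces it."""
    # Assumes the attacker can observe the retrieved context (e.g. echoed in the answer).
    context, _ = retrieve_top1(target_doc, kb)
    return cosine(bow(context), bow(target_doc)) >= threshold


if __name__ == "__main__":
    kb = ["The capital of France is Paris.", "Python is a programming language."]
    question = "What is the capital of France?"
    poisoned = poison_passage(question, "The answer is Berlin.")
    # The spliced passage outranks the genuine one for the targeted question.
    print(retrieve_top1(question, kb + [poisoned])[0] == poisoned)  # → True
    print(infer_membership("The capital of France is Paris.", kb))  # → True
    print(infer_membership("Rust is a systems programming language.", kb))  # → False
```

In this toy setting the spliced passage scores about 0.87 against about 0.83 for the genuine passage, so lexical overlap alone is enough to flip the top-1 result.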

This review was generated by AI and checked by a human editor.