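[Paper Review] Certifiably Robust RAG against Retrieval Corruption
tldr: RobustRAG introduces an isolate-then-aggregate defense that provides certifiable robustness against retrieval corruption. It aggregates isolated per-passage responses using keyword-based or decoding-based methods.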
Retrieval-augmented generation (RAG) has been shown vulnerable to retrieval corruption attacks: an attacker can inject malicious passages into retrieval results to induce inaccurate responses. In this paper, we propose RobustRAG as the first defense framework against retrieval corruption attacks. The key insight of RobustRAG is an isolate-then-aggregate strategy: we get LLM responses from each passage in isolation and then securely aggregate these isolated responses. To instantiate RobustRAG, we design keyword-based and decoding-based algorithms for securely aggregating unstructured text responses. Notably, RobustRAG can achieve certifiable robustness: we can formally prove and certify that, for certain queries, RobustRAG can always return accurate responses, even when the attacker has full knowledge of our defense and can arbitrarily inject a small number of malicious passages. We evaluate RobustRAG on open-domain QA and long-form text generation datasets and demonstrate its effectiveness and generalizability across various tasks and datasets.
Research Motivation and Objectives
- Motivate the need for RAG systems that are robust to retrieval corruption attacks.
- Propose RobustRAG with an isolate-then-aggregate workflow to prevent corrupted passages from influencing final outputs.
- Design and certify secure aggregation techniques (keyword and decoding) for unstructured text responses.
- Demonstrate robustness guarantees and effectiveness across open-domain QA and long-form generation tasks.
Proposed Method
- Adopt an isolate-then-aggregate strategy: compute LLM responses from each passage in isolation, then securely aggregate results.
- Develop two secure text aggregation techniques: (i) Secure Keyword Aggregation which extracts and counts keywords across responses and prompts the LLM with top keywords; (ii) Secure Decoding Aggregation which aggregates next-token probability vectors across isolated responses during decoding.
- Provide formal robustness certification (tau-certifiable robustness) showing guarantees under bounded retrieval corruption (k' malicious passages in top-k).
- Use greedy decoding in experiments so that outputs are deterministic, which makes the certifiable analysis possible.
- Evaluate across multiple datasets (RealtimeQA, NQ, Bio) and LLMs (Mistral, Llama, GPT-3.5) to demonstrate generality.
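The isolate-then-aggregate idea with keyword counting can be illustrated with a minimal sketch. This is not the paper's implementation: `extract_keywords`, `aggregate_keywords`, the `STOPWORDS` list, the `answer_from_passage` callable (a stand-in for one LLM call per passage), and the `min_count` threshold are all hypothetical names chosen for illustration. The final step described above, prompting the LLM with the retained keywords, is left to the caller.

```python
from collections import Counter
from typing import Callable, List, Set

# Hypothetical stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "of", "in", "and", "to"}


def extract_keywords(response: str) -> Set[str]:
    """Toy extractor: lowercase whitespace tokens with punctuation stripped."""
    tokens = (t.strip(".,!?;:\"'()").lower() for t in response.split())
    return {t for t in tokens if t and t not in STOPWORDS}


def aggregate_keywords(
    query: str,
    passages: List[str],
    answer_from_passage: Callable[[str, str], str],
    min_count: int,
) -> List[str]:
    """Isolate: get one response per passage.
    Aggregate: keep keywords found in at least `min_count` responses."""
    counts: Counter = Counter()
    for passage in passages:
        # The model sees exactly one passage per call (assumed interface).
        response = answer_from_passage(query, passage)
        # A set is used, so each response casts at most one vote per keyword.
        counts.update(extract_keywords(response))
    return sorted(word for word, n in counts.items() if n >= min_count)
```

Because each passage is processed on its own, k' corrupted passages can change at most k' of the k isolated responses, so in this sketch any keyword's count can shift by at most k'. That bounded influence is the kind of property a certification argument can build on. The exact threshold rule used in the paper is not reproduced here.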
Experimental Results
Research Questions
- RQ1: Can RobustRAG guarantee correct outputs for certain queries despite up to k' injected malicious passages?
- RQ2: How can unstructured text responses be securely aggregated to resist retrieval corruption?
- RQ3: Do keyword-based and decoding-based aggregation methods provide formal robustness guarantees across open-domain QA and long-form generation tasks?
- RQ4: How does RobustRAG perform in terms of clean accuracy versus robustness under different task settings and models?
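For the decoding-based aggregation raised in these questions, a rough sketch of combining next-token probability vectors across isolated passages might look like the following. It assumes a hypothetical `next_token_probs(query, passage, prefix)` callable that returns one probability per vocabulary entry, uses plain averaging as the aggregation rule, and decodes greedily. The paper's actual rule may include further safeguards that are not shown here.

```python
from typing import Callable, List, Sequence


def aggregate_next_token(prob_vectors: Sequence[Sequence[float]]) -> int:
    """Average the per-passage probability vectors and return the argmax token id."""
    vocab_size = len(prob_vectors[0])
    n = len(prob_vectors)
    avg = [sum(v[i] for v in prob_vectors) / n for i in range(vocab_size)]
    return max(range(vocab_size), key=avg.__getitem__)


def secure_decode(
    query: str,
    passages: List[str],
    next_token_probs: Callable[[str, str, List[int]], Sequence[float]],
    eos_id: int,
    max_steps: int = 32,
) -> List[int]:
    """Greedy decoding where every step aggregates isolated per-passage predictions."""
    prefix: List[int] = []
    for _ in range(max_steps):
        # One isolated prediction per passage, all conditioned on the shared prefix.
        vectors = [next_token_probs(query, p, prefix) for p in passages]
        token = aggregate_next_token(vectors)
        if token == eos_id:
            break
        prefix.append(token)
    return prefix
```

With averaging, a single corrupted passage contributes at most 1/k of the aggregated probability mass at each step, so it cannot flip the chosen token unless the benign passages are already close to undecided.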
Key Findings
- RobustRAG achieves 69.0–71.0% certifiable robust accuracy on RQA-MC across evaluated LLMs.
- RobustRAG achieves 24.0–49.0% certifiable robust accuracy on RQA, and 27.0–47.0% on NQ, with 24.0–51.2% certifiable LLM-judge scores on Bio.
- Clean performance remains high, with drops typically below 11% compared to vanilla RAG across tasks.
- Empirical attacks (PIA and Poison) show RobustRAG maintaining robust accuracy or judge scores with attack success rates largely below 10%.
- RobustRAG’s certifiable robustness is a lower bound on its empirical robustness, and retrieval-augmented generation remains advantageous over no-retrieval baselines under corruption.
This review was generated by AI and checked by a human editor.