QUICK REVIEW

[論文レビュー] PoisonedRAG: Knowledge Corruption Attacks to Retrieval-Augmented Generation of Large Language Models

Wei Zou, Runpeng Geng|arXiv (Cornell University)|Feb 12, 2024

Topic Modeling被引用数 11

ひとこと要約

PoisonedRAGは、Retrieval-Augmented Generation (RAG) に対する知識汚染攻撃を提示し、攻撃者が選んだ質問に対して攻撃者が選んだ回答を強制する。二部構成の汚染デザインと白箱・黒箱の攻撃戦略を用いる。最小限の汚染で高い攻撃成功率を達成し、防御が不十分であることを示す。

ABSTRACT

Large language models (LLMs) have achieved remarkable success due to their exceptional generative capabilities. Despite their success, they also have inherent limitations such as a lack of up-to-date knowledge and hallucination. Retrieval-Augmented Generation (RAG) is a state-of-the-art technique to mitigate these limitations. The key idea of RAG is to ground the answer generation of an LLM on external knowledge retrieved from a knowledge database. Existing studies mainly focus on improving the accuracy or efficiency of RAG, leaving its security largely unexplored. We aim to bridge the gap in this work. We find that the knowledge database in a RAG system introduces a new and practical attack surface. Based on this attack surface, we propose PoisonedRAG, the first knowledge corruption attack to RAG, where an attacker could inject a few malicious texts into the knowledge database of a RAG system to induce an LLM to generate an attacker-chosen target answer for an attacker-chosen target question. We formulate knowledge corruption attacks as an optimization problem, whose solution is a set of malicious texts. Depending on the background knowledge (e.g., black-box and white-box settings) of an attacker on a RAG system, we propose two solutions to solve the optimization problem, respectively. Our results show PoisonedRAG could achieve a 90% attack success rate when injecting five malicious texts for each target question into a knowledge database with millions of texts. We also evaluate several defenses and our results show they are insufficient to defend against PoisonedRAG, highlighting the need for new defenses.

研究の動機と目的

Retrieval-Augmented Generation (RAG) における知識汚染攻撃に対するセキュリティ上のギャップを強調する。
選択された質問に対して攻撃者が選んだ回答を引き出す標的汚染フレームワークとして PoisonedRAG を提案する。
複数データセット・リトリーバ・LLMにわたる小規模な汚染で攻撃の効果を示す。
既存の防御を評価し、PoisonedRAGの緩和に対する無力さを示す。

提案手法

汚染を、攻撃者定義の回答成功を最大化する制約付き最適化問題として定式化する。
汚染テキストを、検索性と有効性の条件を満たすよう S および I の二つのサブテキストに分解する。
Iを、文脈として用いたときに目標の回答が出現するようにLLMを用いて生成する。
黒箱設定では、S = Qとしてターゲット質問への検索類似性を促進する。
白箱設定では、埋め込み類似性を通じてSを最適化し、検索の発生確率を最大化する。

実験結果

リサーチクエスチョン

RQ1小さな汚染テキストのセットが、RAGシステム内のLLMに対して攻撃者が選んだ質問に対して攻撃者が選んだターゲット回答を生成させることができますか？
RQ2PoisonedRAG攻撃は、異なるデータセット・リトリーバ・大規模言語モデル間でどれだけ効果がありますか？
RQ3現在の防御は、RAGにおける知識汚染攻撃に対して十分でしょうか？
RQ4汚染テキストを構築する際の実践的なトレードオフ（汚染率、必要なクエリ数）は何ですか？

主な発見

PoisonedRAGは高い攻撃成功率を達成でき、例えばNatural Questionsで1つのターゲット質問につき5つの汚染テキストを注入して97%のASRを達成する。
黒箱設定では、2,681,468件中5件の汚染率（約0.0002%）でNQに対して高いASRを達成できる。
PoisonedRAGは、評価対象データセットで最先端ベースラインを上回る。
パラフレーズや困惑度に基づく検出などの防御はPoisonedRAGに対して不十分である。
PoisonedRAGは複数のデータセット（NQ、HotpotQA、MS-MARCO）およびLLM（PaLM 2、GPT-4、LLaMA-2 など）にわたり有効であり続ける。
アブレーション研究は、RAGとPoisonedRAGの両方のハイパーパラメータに対する頑健性を示す。

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。