QUICK REVIEW

[論文レビュー] StruQ: Defending Against Prompt Injection with Structured Queries

Sizhe Chen, Julien Piet|arXiv (Cornell University)|Feb 9, 2024

Advanced Database Systems and Queries被引用数 5

ひとこと要約

StruQ は、データとプロンプトを構造化クエリで分離し、構造化命令チューニングを用いてデータ部分の指示を無視する専用の LLM を訓練することにより、データ部分の命令を無視し、プロンプト注入の成功を多くの攻撃タイプに対して低減し、実用性の低下を最小限に抑える。

ABSTRACT

Recent advances in Large Language Models (LLMs) enable exciting LLM-integrated applications, which perform text-based tasks by utilizing their advanced language understanding capabilities. However, as LLMs have improved, so have the attacks against them. Prompt injection attacks are an important threat: they trick the model into deviating from the original application's instructions and instead follow user directives. These attacks rely on the LLM's ability to follow instructions and inability to separate prompts and user data. We introduce structured queries, a general approach to tackle this problem. Structured queries separate prompts and data into two channels. We implement a system that supports structured queries. This system is made of (1) a secure front-end that formats a prompt and user data into a special format, and (2) a specially trained LLM that can produce high-quality outputs from these inputs. The LLM is trained using a novel fine-tuning strategy: we convert a base (non-instruction-tuned) LLM to a structured instruction-tuned model that will only follow instructions in the prompt portion of a query. To do so, we augment standard instruction tuning datasets with examples that also include instructions in the data portion of the query, and fine-tune the model to ignore these. Our system significantly improves resistance to prompt injection attacks, with little or no impact on utility. Our code is released at https://github.com/Sizhe-Chen/StruQ.

研究の動機と目的

OWASP がトップリスクとして識別した LL M 統合アプリケーションにおけるプロンプト注入をセキュリティリスクとして対処する。
コントロール（プロンプト）とデータを分離した安全設計の LLM インタフェースを提案する。
データベースの指示を無視し、プロンプトベースの指示のみ従うようにするトレーニング手法を開発する。

提案手法

特殊トークンとセパレータを用いてプロンプトとデータをエンコードするフロントエンドを備えた構造化クエリを導入する。
ユーザデータから区切りトークンを除外して完了ベースの攻撃を防ぐ。
構造化命令チューニングを用いてプロンプト部分の指示に従い、データ部分の指示を無視するよう基本LLMを訓練する。
データベースの指示を無視するよう教えるため、クリーンなサンプルと attacked samples (Naive and Completion-Other attacks) を含む構造化命令チューニングデータセットを構築する。
11種の攻撃技法に対するセキュリティを評価し、AlpacaEval 1.0 で実用性を測定する。
完成攻撃を緩和するためにフロントエンドのトークン埋め込み初期化と専用のデリミタポリシーを使用する。

実験結果

リサーチクエスチョン

RQ1構造化クエリはコントロールとデータを意味のある方法で分離して、プロンプト注入攻撃から防御できるか。
RQ2構造化命令チューニングは LLM がデータに埋め込まれた指示を無視しつつ、タスクの実用性を維持できるか。
RQ3StruQ は Completion および TAP 攻撃を含む幅広いプロンプト注入技法に対してどれだけ効果的か。

主な発見

StruQ は Alpaca および Mistral の多くの tested technique に対する攻撃成功率を 2% 未満に低減する。
StruQ は TAP ベースの攻撃成功率を著しく低減（Alpaca で 97%→9%、Mistral で 100%→36%）。
ユーティリティは AlpacaEval で約1標準誤差分しか低下しない。
デリミタ設計とフロントエンドのフィルタリングにより Completion 攻撃を抑制。
このアプローチはユーティリティの小さな劣化にとどまりつつ、多くのプロンプト注入手法に対する広範な防御を提供する。

より良い研究を、今すぐ始めましょう

論文設計から論文執筆まで、研究時間を劇的に削減しましょう。

クレジットカード登録不要

このレビューはAIが作成し、人間の編集者が確認しました。