QUICK REVIEW

[Paper Review] Practical Membership Inference Attacks against Fine-tuned Large Language Models via Self-prompt Calibration

Wenjie Fu, Huandong Wang|arXiv (Cornell University)|Nov 10, 2023

Topic Modeling | Citations: 10

Summary in brief

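This paper introduces SPV-MIA, a memorization-based membership inference attack against fine-tuned LLMs that calibrates probabilistic variation with a reference model built by self-prompting and achieves higher AUC than the baselines (gains of roughly 23.6% to 30% in the reported comparisons).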

ABSTRACT

Membership Inference Attacks (MIA) aim to infer whether a target data record has been utilized for model training or not. Existing MIAs designed for large language models (LLMs) can be bifurcated into two types: reference-free and reference-based attacks. Although reference-based attacks appear promising performance by calibrating the probability measured on the target model with reference models, this illusion of privacy risk heavily depends on a reference dataset that closely resembles the training set. Both two types of attacks are predicated on the hypothesis that training records consistently maintain a higher probability of being sampled. However, this hypothesis heavily relies on the overfitting of target models, which will be mitigated by multiple regularization methods and the generalization of LLMs. Thus, these reasons lead to high false-positive rates of MIAs in practical scenarios. We propose a Membership Inference Attack based on Self-calibrated Probabilistic Variation (SPV-MIA). Specifically, we introduce a self-prompt approach, which constructs the dataset to fine-tune the reference model by prompting the target LLM itself. In this manner, the adversary can collect a dataset with a similar distribution from public APIs. Furthermore, we introduce probabilistic variation, a more reliable membership signal based on LLM memorization rather than overfitting, from which we rediscover the neighbour attack with theoretical grounding. Comprehensive evaluation conducted on three datasets and four exemplary LLMs shows that SPV-MIA raises the AUC of MIAs from 0.7 to a significantly high level of 0.9. Our code and dataset are available at: https://github.com/tsinghua-fib-lab/NeurIPS2024_SPV-MIA

Motivation and objectives

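Motivate the privacy risk in fine-tuned LLM pipelines and quantify membership risk beyond the overfitting assumption.
Develop a robust MIA that relies on memorization, rather than overfitting, as the membership signal.
Introduce a self-prompt approach that builds the reference model used for calibration by prompting the target LLM itself.
Evaluate SPV-MIA on multiple LLMs and datasets to demonstrate practical privacy leakage.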

Proposed method

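Define probabilistic variation as a memorization-based signal measured around local maxima of the model's probability.
Estimate probabilistic variation using paraphrased variants of the target text generated by a mask-filling model (e.g., T5).
Calibrate the memorization signal with a reference model trained on self-prompted data (self-prompt).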
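Formulate the attack as A_our(x, θ, φ) = 1[ p̃_θ(x) − p̃_φ(x) ≤ τ ], where p̃_θ and p̃_φ denote the probabilistic-variation estimates for the target model θ and the reference model φ, and τ is the decision threshold.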
Combine the two stages into one workflow: evaluate p̃_θ via paraphrase-based neighbour sampling, then calibrate with φ, a model fine-tuned on data obtained by self-prompting.
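
The two-stage workflow above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the names `probabilistic_variation` and `spv_mia_decision` are hypothetical, the scoring callables stand in for whatever per-text probability (or log-probability) the target and reference models expose, and the estimator "mean neighbour score minus the score of the text itself" is an assumed form chosen so that a text sitting on a local probability peak yields a negative value. The direction of the threshold test follows the formula as written in this review.

```python
from typing import Callable, Sequence

Scorer = Callable[[str], float]  # hypothetical: maps a text to a model score


def probabilistic_variation(score: Scorer, text: str, neighbours: Sequence[str]) -> float:
    """Assumed estimator: mean score of paraphrased neighbours minus the score of `text`.

    A strongly negative value means `text` scores higher than its neighbours,
    i.e. it sits on a local peak of the model's probability surface.
    """
    if not neighbours:
        raise ValueError("need at least one paraphrased neighbour")
    mean_neighbour = sum(score(n) for n in neighbours) / len(neighbours)
    return mean_neighbour - score(text)


def spv_mia_decision(
    target_score: Scorer,
    reference_score: Scorer,
    text: str,
    neighbours: Sequence[str],
    tau: float,
) -> bool:
    """Calibrate the target's variation with the reference's, then threshold.

    Mirrors 1[ p̃_θ(x) − p̃_φ(x) ≤ τ ] from the review; True means "member".
    """
    calibrated = (
        probabilistic_variation(target_score, text, neighbours)
        - probabilistic_variation(reference_score, text, neighbours)
    )
    return calibrated <= tau


# Toy usage with lookup tables standing in for real models (illustrative values only).
target = {"x": -1.0, "x_a": -3.0, "x_b": -5.0}
reference = {"x": -2.0, "x_a": -2.5, "x_b": -3.5}
is_member = spv_mia_decision(
    target.__getitem__, reference.__getitem__, "x", ["x_a", "x_b"], tau=-1.0
)
```

In a real attack the neighbours would come from the mask-filling paraphraser and the reference scorer from a model fine-tuned on self-prompted text; both are outside the scope of this sketch.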

Experimental results

Research questions

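RQ1: Does SPV-MIA outperform state-of-the-art MIAs against LLMs in practical, memorization-driven settings?
RQ2: How does the quality of the self-prompt reference model affect attack performance?
RQ3: How do different fine-tuning techniques affect SPV-MIA?
RQ4: Can privacy defenses withstand the SPV-MIA attack?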

Key findings

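SPV-MIA consistently outperforms the baselines across four LLMs and three datasets, with an average AUC of 92.4%.
Compared with the strongest baseline (LiRA-Candidate), SPV-MIA improves AUC by about 30% in the reported comparison.
The paper's summary reports that SPV-MIA improves AUC over the baselines by about 23.6% overall.
The self-prompt reference model calibrates effectively without access to a matching reference dataset drawn from the training distribution.
The ablation study shows how each SPV-MIA module (probabilistic variation assessment and self-prompt calibration) contributes to attack effectiveness.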
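
For readers unfamiliar with the metric behind these numbers, AUC can be read as the probability that a randomly chosen training member receives a higher attack score than a randomly chosen non-member. The sketch below computes it by direct pairwise comparison. The function name `auc` and the toy scores are illustrative assumptions and do not come from the paper; the convention here is that a higher score means "more likely a member", so a variation-based signal that is thresholded from below would be negated first.

```python
from typing import Sequence


def auc(member_scores: Sequence[float], non_member_scores: Sequence[float]) -> float:
    """Fraction of (member, non-member) pairs ranked correctly; ties count as half."""
    if not member_scores or not non_member_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    for m in member_scores:
        for n in non_member_scores:
            if m > n:
                wins += 1.0
            elif m == n:
                wins += 0.5
    return wins / (len(member_scores) * len(non_member_scores))


# Toy scores (made up for illustration): one member is ranked below one non-member.
toy_auc = auc([0.9, 0.8, 0.4], [0.5, 0.3, 0.1])
```

The pairwise loop is quadratic in the number of samples, which is fine for illustration; a rank-based computation is the usual choice for large evaluation sets.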

This review was generated by AI and checked by a human editor.