QUICK REVIEW

[Paper Review] DAWN: Dynamic Adversarial Watermarking of Neural Networks

Sebastian Szyller, Buse Gul Atli | arXiv (Cornell University) | Jun 3, 2019

Adversarial Robustness in Machine Learning | References: 49 | Cited by: 61

ONE-LINE SUMMARY

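DAWN embeds dynamic, client-specific watermarks at a model's prediction API. This deters model extraction attacks and lets the owner verify ownership with almost no loss of utility.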

ABSTRACT

Training machine learning (ML) models is expensive in terms of computational power, amounts of labeled data and human expertise. Thus, ML models constitute intellectual property (IP) and business value for their owners. Embedding digital watermarks during model training allows a model owner to later identify their models in case of theft or misuse. However, model functionality can also be stolen via model extraction, where an adversary trains a surrogate model using results returned from a prediction API of the original model. Recent work has shown that model extraction is a realistic threat. Existing watermarking schemes are ineffective against IP theft via model extraction since it is the adversary who trains the surrogate model. In this paper, we introduce DAWN (Dynamic Adversarial Watermarking of Neural Networks), the first approach to use watermarking to deter model extraction IP theft. Unlike prior watermarking schemes, DAWN does not impose changes to the training process but it operates at the prediction API of the protected model, by dynamically changing the responses for a small subset of queries (e.g., <0.5%) from API clients. This set is a watermark that will be embedded in case a client uses its queries to train a surrogate model. We show that DAWN is resilient against two state-of-the-art model extraction attacks, effectively watermarking all extracted surrogate models, allowing model owners to reliably demonstrate ownership (with confidence $>1- 2^{-64}$), incurring negligible loss of prediction accuracy (0.03-0.5%).
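
The core mechanism in the abstract is deterministically altering the responses for a small, client-specific fraction of queries. The Python below is a minimal sketch of that mechanism under stated assumptions, not the authors' implementation. The function names, the length-prefixed HMAC-SHA256 input, the treatment of queries as raw bytes (skipping any input mapping), and the rule "swap the top probability with a keyed wrong class" are hypothetical simplifications. The two helpers play roughly the roles of W_V (which queries to watermark) and B_V (what to return instead) in the method summary.

```python
import hashlib
import hmac
from typing import Sequence


def _tag(key: bytes, client_id: str, query: bytes) -> bytes:
    # HMAC-SHA256 over (client id, query); the length prefix keeps the two fields unambiguous.
    cid = client_id.encode("utf-8")
    msg = len(cid).to_bytes(4, "big") + cid + query
    return hmac.new(key, msg, hashlib.sha256).digest()


def is_watermarked(key: bytes, client_id: str, query: bytes, rate: float) -> bool:
    # Hypothetical W_V: the first 8 tag bytes act as a uniform integer in [0, 2**64).
    value = int.from_bytes(_tag(key, client_id, query)[:8], "big")
    return value < rate * 2**64


def backdoor_probs(key: bytes, client_id: str, query: bytes, probs: Sequence[float]) -> list[float]:
    # Hypothetical B_V: swap the top probability with a keyed, different class, so the
    # returned label is wrong but the set of probability values looks like a normal output.
    # Assumes at least two classes and a unique maximum.
    n = len(probs)
    top = max(range(n), key=lambda i: probs[i])
    offset = 1 + int.from_bytes(_tag(key, client_id, query)[8:16], "big") % (n - 1)
    target = (top + offset) % n
    out = list(probs)
    out[top], out[target] = out[target], out[top]
    return out


def respond(key: bytes, client_id: str, query: bytes, probs: Sequence[float], rate: float) -> list[float]:
    # The prediction API returns the altered vector only for the selected subset of queries.
    if is_watermarked(key, client_id, query, rate):
        return backdoor_probs(key, client_id, query, probs)
    return list(probs)
```

With `rate=0.005`, roughly 0.5% of a client's distinct queries receive the swapped vector. Repeating a query yields the same answer. Because `client_id` enters the HMAC, the selected subset generally differs from one client to the next.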

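MOTIVATION AND OBJECTIVES

Motivated by the need to protect the intellectual property of ML models, the work aims to deter model extraction through prediction APIs.
It introduces a dynamic, adversarial watermarking mechanism that operates while the API serves queries.
It ensures that watermarks are client-specific, hard to remove, and linkable to the client that issued the queries.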

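PROPOSED METHOD

Watermark generation uses a cryptographic keyed hash (HMAC-SHA256) to decide which inputs to watermark and which incorrect labels to return for them.
The watermark is a backdoor. The trigger set T_A consists of a fraction r_w of API queries for which the response of the protected model F_V is replaced by B_V(x).
The backdoor function B_V is a keyed permutation of the order of the original prediction probabilities, so that watermarked outputs resemble legitimate predictions.
To keep W_V and B_V stable under small input perturbations, inputs are first mapped to a representation M_V(x). Near-identical queries therefore receive consistent answers, which makes watermarked responses harder to distinguish.
Verification computes the fraction of trigger-set inputs on which the surrogate model F_A's output differs from the watermark label. Ownership is claimed when this fraction does not exceed a probabilistic threshold e.
Ownership is demonstrated to a trusted judge with the help of a public ledger. The owner publishes commitments, and the judge verifies whether the suspect model is a surrogate.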
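
The verification rule and the commitment step can be sketched as follows. This is a hypothetical illustration. `commit` and `open_commitment` form a generic SHA-256 hash commitment, a standard construction that is not necessarily the one DAWN uses. `verify` assumes two things: the trigger set is stored as a mapping from query bytes to watermark label, and the suspect model is exposed as a function that returns a label.

```python
import hashlib
import secrets
from typing import Callable, Mapping


def commit(data: bytes) -> tuple[str, bytes]:
    # Generic hash commitment: publish the digest (e.g. on a public ledger) now,
    # keep the random 32-byte nonce secret until the reveal.
    nonce = secrets.token_bytes(32)
    return hashlib.sha256(nonce + data).hexdigest(), nonce


def open_commitment(digest: str, nonce: bytes, data: bytes) -> bool:
    # A judge recomputes the digest from the revealed nonce and data.
    return hashlib.sha256(nonce + data).hexdigest() == digest


def verify(surrogate: Callable[[bytes], int], trigger_set: Mapping[bytes, int], e: float) -> bool:
    # Fraction of trigger inputs where the suspect model disagrees with the watermark label;
    # ownership is claimed only if that fraction is at most the threshold e.
    mismatches = sum(1 for x, wm_label in trigger_set.items() if surrogate(x) != wm_label)
    return mismatches / len(trigger_set) <= e
```

The fixed 32-byte nonce keeps `nonce + data` unambiguous. What gets committed (for example a serialized trigger set or watermarking key) is an assumption here.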

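EXPERIMENTAL RESULTS

RESEARCH QUESTIONS

RQ1: Can DAWN reliably watermark every surrogate model created through model extraction without significantly degrading the utility of the original model?
RQ2: Are the watermarks robust to adversarial manipulation and to state-of-the-art extraction attacks?
RQ3: Can ownership be linked to a specific API client and demonstrated verifiably through a trusted process?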

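KEY FINDINGS

DAWN demonstrates ownership with confidence exceeding 1 − 2^(-64).
Watermarking keeps the accuracy loss of the protected model negligible, at roughly 0.03% to 0.5%.
Watermarks are client-specific and can be linked to the API client that issued the queries.
DAWN is resilient against two state-of-the-art model extraction attacks.
The method requires no changes to the protected model's training process; it operates at the prediction API level.
Experiments on multiple models and datasets show robustness against adversarial manipulation and evasion.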
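
One way to read the 1 − 2^(-64) figure is as a bound on the probability that a model *not* derived from the protected API still passes verification. The sketch below computes that probability under an assumed model. An unrelated model matches each of the n watermark labels independently with probability p, and verification passes when at most a fraction e of the labels mismatch. The function names, the independence assumption, and the example values e = 0.5 and p = 0.1 are illustrative and are not taken from the paper.

```python
import math


def false_claim_probability(n: int, e: float, p: float) -> float:
    # P[an unrelated model passes verification] if it matches each of the n watermark
    # labels independently with probability p: it needs at least n - floor(e*n) matches.
    need = n - math.floor(e * n)
    return sum(math.comb(n, m) * p**m * (1 - p) ** (n - m) for m in range(need, n + 1))


def min_trigger_size(e: float, p: float, bound: float = 2.0**-64, max_n: int = 1000) -> int:
    # Smallest trigger-set size whose false-claim probability falls below the bound.
    # max_n stays at 1000 so math.comb(n, m) still converts to a finite float.
    for n in range(1, max_n + 1):
        if false_claim_probability(n, e, p) < bound:
            return n
    raise ValueError("bound not reached; p must be well below 1 - e")
```

Under these assumptions, `min_trigger_size(0.5, 0.1)` returns the smallest trigger-set size whose false-claim probability drops below 2^(-64). Raising p toward 1 − e never shrinks the required size.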

This review was generated by AI and checked by a human editor.