QUICK REVIEW

[Paper Review] BlackMarks: Blackbox Multibit Watermarking for Deep Neural Networks

Huili Chen, Bita Darvish Rouhani | arXiv (Cornell University) | Mar 31, 2019

Advanced Steganography and Watermarking Techniques | References: 32 | Citations: 41

One-Line Summary

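BlackMarks introduces the first end-to-end black-box framework for multi-bit watermarking of DNNs. It embeds a binary signature into the model's output behavior through fine-tuning, extracts it with query keys, and achieves low overhead with high robustness.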

ABSTRACT

Deep Neural Networks have created a paradigm shift in our ability to comprehend raw data in various important fields ranging from computer vision and natural language processing to intelligence warfare and healthcare. While DNNs are increasingly deployed either in a white-box setting where the model internals are publicly known, or a black-box setting where only the model outputs are known, a practical concern is protecting the models against Intellectual Property (IP) infringement. We propose BlackMarks, the first end-to-end multi-bit watermarking framework that is applicable in the black-box scenario. BlackMarks takes the pre-trained unmarked model and the owner's binary signature as inputs and outputs the corresponding marked model with a set of watermark keys. To do so, BlackMarks first designs a model-dependent encoding scheme that maps all possible classes in the task to bit '0' and bit '1' by clustering the output activations into two groups. Given the owner's watermark signature (a binary string), a set of key image and label pairs is designed using targeted adversarial attacks. The watermark (WM) is then embedded in the prediction behavior of the target DNN by fine-tuning the model with the generated WM key set. To extract the WM, the remote model is queried with the WM key images and the owner's signature is decoded from the corresponding predictions according to the designed encoding scheme. We perform a comprehensive evaluation of BlackMarks' performance on the MNIST, CIFAR10, and ImageNet datasets and corroborate its effectiveness and robustness. BlackMarks preserves the functionality of the original DNN and incurs negligible WM embedding runtime overhead, as low as 2.054%.
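The abstract's encoding step (clustering the classes into two groups that stand for bit 0 and bit 1) can be sketched in a few lines. This is a minimal illustration, not the authors' code. It assumes the per-class mean pre-softmax activations have already been computed into a `(C, D)` array, and it uses a hand-written NumPy 2-means (Lloyd's algorithm) in place of any particular K-means library. The function name `two_means_bit_encoding` is hypothetical. Which cluster is called bit 0 is arbitrary; here it is fixed to the cluster that contains class 0.

```python
import numpy as np

def two_means_bit_encoding(class_means, iters=100):
    """Assign bit 0 or 1 to every class by 2-means clustering (hypothetical helper).

    class_means: (C, D) array; row c is the mean pre-softmax activation
    of class c (assumed precomputed on training data).
    Returns a (C,) int array of bits; the cluster containing class 0 is bit 0.
    """
    x = np.asarray(class_means, dtype=float)
    # Deterministic init: class 0 and the class farthest from it.
    far = int(np.argmax(np.linalg.norm(x - x[0], axis=1)))
    centroids = np.stack([x[0], x[far]])
    labels = None
    for _ in range(max(1, iters)):
        # (C, 2) distances from each class mean to the two centroids.
        dists = np.linalg.norm(x[:, None, :] - centroids[None, :, :], axis=2)
        new_labels = np.argmin(dists, axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        for k in (0, 1):
            members = x[labels == k]
            if len(members):  # keep the old centroid if a cluster empties
                centroids[k] = members.mean(axis=0)
    if labels[0] == 1:
        labels = 1 - labels  # normalize so class 0 always maps to bit 0
    return labels.astype(int)

means = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.3], [10.0, 9.8], [9.9, 10.2]])
print(two_means_bit_encoding(means))  # [0 0 0 1 1]
```

Because the bit assignment depends on the trained model's own activations, the same procedure yields a different class-to-bit map for every model, which is what "model-dependent" refers to.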

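Motivation and Objectives

Motivate IP protection for DNNs deployed in black-box settings (MLaaS).
Develop a scalable, multi-bit watermarking framework that works without access to model internals.
Design a model-dependent encoding that maps class outputs to bits, and generate watermark keys via targeted adversarial attacks.
Fine-tune the model to embed the watermark while preserving its accuracy.
Provide robust extraction and verification procedures with few false positives and false negatives.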

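Proposed Method

Encode the owner's signature in the DNN's output activations by clustering the classes into two groups (bit 0 and bit 1) with K-means on the per-class mean pre-softmax activations.
Generate watermark key images and labels with targeted adversarial attacks that match the encoding scheme.
Embed the signature by fine-tuning the pre-trained model with a regularized loss that combines standard cross-entropy and a WM-specific loss.
Query the model with the WM keys, decode the owner's signature from its predictions using the encoding scheme, and compute the bit error rate (BER).
To limit the transferability of watermark keys, start from a large initial key set and, during key selection, take the intersection over unmarked model variants.
Provide an efficiency analysis of the one-time embedding overhead (as low as 2.054%) and the black-box extraction cost.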
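The key-generation step can be illustrated on a toy model. This is a minimal sketch under clearly stated assumptions. The "DNN" is a hypothetical linear softmax classifier `logits = W @ x + b` with inputs in [0, 1]. The targeted attack is a basic iterative sign-gradient method (iterative FGSM), chosen for illustration rather than taken from the paper. Key images start from random noise instead of real images. The helper names `targeted_key` and `make_wm_keys` are hypothetical. For each signature bit, a target class is drawn from the cluster that encodes that bit, and the input is pushed until the model predicts that class.

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def targeted_key(W, b, x0, target, eps=0.05, steps=50):
    """Iterative targeted sign-gradient attack on a toy linear softmax model."""
    x = np.clip(np.asarray(x0, dtype=float).copy(), 0.0, 1.0)
    onehot = np.zeros(W.shape[0])
    onehot[target] = 1.0
    for _ in range(steps):
        p = softmax(W @ x + b)
        # Gradient of cross-entropy(target) w.r.t. the input of a linear model.
        grad = W.T @ (p - onehot)
        # Step down the loss toward the target class and stay inside [0, 1].
        x = np.clip(x - eps * np.sign(grad), 0.0, 1.0)
    return x

def make_wm_keys(W, b, signature, bit_of_class, rng):
    """One (key image, label) pair per signature bit (hypothetical helper)."""
    keys, labels = [], []
    for bit in signature:
        candidates = np.flatnonzero(bit_of_class == bit)  # classes encoding this bit
        target = int(rng.choice(candidates))
        x0 = rng.uniform(0.0, 1.0, size=W.shape[1])       # random starting image
        keys.append(targeted_key(W, b, x0, target))
        labels.append(target)
    return np.stack(keys), np.array(labels)

W = 5.0 * np.eye(4, 6)                  # toy model: class k responds to feature k
b = np.zeros(4)
bit_of_class = np.array([0, 1, 0, 1])   # hypothetical encoding: classes 0,2 -> 0; 1,3 -> 1
keys, labels = make_wm_keys(W, b, [1, 0, 1, 1, 0], bit_of_class, np.random.default_rng(0))
print((np.argmax(keys @ W.T + b, axis=1) == labels).all())  # True
```

In BlackMarks the resulting key/label pairs are then used to fine-tune the model so that the marked model reliably reproduces these labels. That fine-tuning step is not shown here.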
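Key selection against transferability can be read as a set intersection. The sketch below assumes one interpretation of that step: a candidate key is kept only if every unmarked model variant fails to predict its WM label, so the key does not "transfer" to models that were never watermarked. The optional check that the marked model does predict the WM label is an added assumption. The function name `select_non_transferable_keys` is hypothetical.

```python
import numpy as np

def select_non_transferable_keys(key_labels, unmarked_preds, marked_preds=None):
    """Indices of candidate keys that no unmarked variant maps to its WM label.

    key_labels:     (N,) WM label assigned to each candidate key.
    unmarked_preds: list of (N,) predicted labels, one per unmarked variant.
    marked_preds:   optional (N,) predictions of the marked model (assumed extra check).
    """
    key_labels = np.asarray(key_labels)
    keep = set(range(len(key_labels)))
    for preds in unmarked_preds:
        # Keys this variant does NOT classify as the WM label.
        missed = np.flatnonzero(np.asarray(preds) != key_labels)
        keep &= set(missed.tolist())  # intersection across all variants
    if marked_preds is not None:
        # Also require that the marked model reproduces the WM label.
        hit = np.flatnonzero(np.asarray(marked_preds) == key_labels)
        keep &= set(hit.tolist())
    return sorted(keep)

labels = [2, 0, 1, 3, 1]
variants = [[2, 1, 1, 0, 0], [0, 1, 2, 0, 1]]
print(select_non_transferable_keys(labels, variants, marked_preds=[2, 0, 1, 3, 0]))  # [1, 3]
```

Starting from a large initial key set matters because the intersection can only shrink the set. Enough candidates must survive to cover every bit of the signature.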
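Extraction reduces to a lookup followed by a bit comparison. The sketch assumes that the remote model's predicted class for each key image is already available (`key_preds`), together with the class-to-bit map produced by the encoding step. The function names are hypothetical.

```python
def decode_signature(key_preds, bit_of_class):
    """Map each key image's predicted class to its bit via the encoding."""
    return [int(bit_of_class[c]) for c in key_preds]

def bit_error_rate(decoded, signature):
    """Fraction of positions where the decoded bit differs from the signature."""
    if len(decoded) != len(signature):
        raise ValueError("decoded bits and signature must have the same length")
    errors = sum(int(d != s) for d, s in zip(decoded, signature))
    return errors / len(signature)

bit_of_class = [0, 1, 0, 1]   # hypothetical 4-class encoding
signature = [1, 0, 1, 1]
bits = decode_signature([1, 0, 3, 2], bit_of_class)
print(bits, bit_error_rate(bits, signature))  # [1, 0, 1, 0] 0.25
```

A BER of zero means that every queried key reproduced its signature bit. This is the criterion the findings report for the marked models.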

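Experimental Results

Research Questions

RQ1: Can black-box DNN watermarking with multi-bit capacity strengthen intellectual property protection?
RQ2: How can a model-dependent encoding map outputs to bits and support multi-bit watermarking without degrading accuracy?
RQ3: How robust are such watermarks against fine-tuning, pruning, and overwriting in the black-box setting?
RQ4: How can the integrity and reliability of the watermark be quantified and guaranteed in a black-box environment?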

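Key Findings

With the designated key set, BlackMarks achieves high watermark detection with zero BER after embedding on MNIST, CIFAR-10, and ImageNet.
The framework tolerates 95%, 80%, and 90% parameter pruning on MNIST, CIFAR-10, and ImageNet, respectively, without any drop in detection accuracy (although excessive pruning hurts model accuracy).
The watermark remains detectable after model fine-tuning (up to 100 epochs in the experiments), and BER stays at zero on all benchmarks.
Overwriting with a new watermark does not prevent recovery of the original watermark; BER stays at zero.
Embedding incurs low runtime overhead (as low as 2.054%). The authors also note that the watermarking procedure appears to improve adversarial robustness, with accuracy increasing under some attacks.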
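Robustness checks such as the pruning experiment follow a simple pattern: prune, re-query the keys, and recompute BER. The review does not state the pruning criterion, so the sketch below assumes magnitude-based pruning of a single weight tensor, which zeroes the fraction of entries with the smallest absolute value. `magnitude_prune` is a hypothetical helper, not the paper's code.

```python
import numpy as np

def magnitude_prune(weights, fraction):
    """Return a copy of `weights` with the `fraction` smallest-|w| entries zeroed."""
    w = np.asarray(weights, dtype=float)
    k = int(round(fraction * w.size))  # number of entries to remove
    pruned = w.flatten()               # flatten() always returns a copy
    if k > 0:
        # Indices of the k entries with the smallest absolute value.
        smallest = np.argsort(np.abs(pruned), kind="stable")[:k]
        pruned[smallest] = 0.0
    return pruned.reshape(w.shape)

w = np.array([[0.1, -2.0], [0.5, -0.05]])
print(magnitude_prune(w, 0.5).tolist())  # [[0.0, -2.0], [0.5, 0.0]]
```

In a full check, one would prune every layer at the chosen rate (for example 0.95 for the MNIST result), query the WM keys again, and confirm that the decoded BER is still zero.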

This review was generated by AI and checked by a human editor.