QUICK REVIEW

[Paper Review] DeepSigns: A Generic Watermarking Framework for IP Protection of Deep Learning Models

Bita Darvish Rouhani, Huili Chen|arXiv (Cornell University)|Apr 2, 2018

Adversarial Robustness in Machine Learning|Cited by 84

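One-Line Summary

DeepSigns embeds robust digital watermarks in deep learning models by modifying the activation distributions of intermediate layers and by applying a post-processing step to the output layer. This enables proof of IP ownership in both white-box and black-box settings while preserving model accuracy and resisting common attacks.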

ABSTRACT

Deep Learning (DL) models have caused a paradigm shift in our ability to comprehend raw data in various important fields, ranging from intelligence warfare and healthcare to autonomous transportation and automated manufacturing. A practical concern, in the rush to adopt DL models as a service, is protecting the models against Intellectual Property (IP) infringement. The DL models are commonly built by allocating significant computational resources that process vast amounts of proprietary training data. The resulting models are therefore considered to be the IP of the model builder and need to be protected to preserve the owner's competitive advantage. This paper proposes DeepSigns, a novel end-to-end IP protection framework that enables insertion of coherent digital watermarks in contemporary DL models. DeepSigns, for the first time, introduces a generic watermarking methodology that can be used for protecting DL owner's IP rights in both white-box and black-box settings, where the adversary may or may not have the knowledge of the model internals. The suggested methodology is based on embedding the owner's signature (watermark) in the probability density function (pdf) of the data abstraction obtained in different layers of a DL model. DeepSigns can demonstrably withstand various removal and transformation attacks, including model compression, model fine-tuning, and watermark overwriting. Proof-of-concept evaluations on MNIST, and CIFAR10 datasets, as well as a wide variety of neural network architectures including Wide Residual Networks, Convolution Neural Networks, and Multi-Layer Perceptrons corroborate DeepSigns' effectiveness and applicability.

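Motivation and Objectives

Motivate IP protection for deep learning models as they are increasingly deployed as a service.
Propose a generic watermarking framework that works in both white-box and black-box settings.
Embed watermarks in the activation distributions of hidden layers and, after training, in the output layer, without degrading accuracy.
Demonstrate robustness against model compression, fine-tuning, and watermark overwriting.
Provide practical metrics and an API to encourage adoption across a variety of architectures.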

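Proposed Method

Embed an N-bit watermark string in the means of the per-layer activation distributions (Gaussian Mixture Model prior).
Augment the training loss with a term (loss1) that encourages the activation means to align with the selected Gaussian centers, and with a second term (loss2) that, under SGD optimization, pushes the activation features so that their binarized projection matches the watermark bits.
Use a random projection matrix A, followed by a sigmoid and then hard thresholding, to map the selected Gaussian centers to the watermark bits (b).
During training, jointly optimize loss0 (classification), loss1 (GMM alignment), and loss2 (watermark bit alignment) to embed the watermark without sacrificing accuracy.
Watermark the output layer as a post-processing step: generate K input keys from the tail regions of the class-conditional distributions, then fine-tune on those keys to enforce the correct labeling of the key samples.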
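
The white-box steps above (projection with A, sigmoid, hard thresholding, and a bit-alignment loss) can be illustrated with a small NumPy sketch. This is not the authors' implementation. The array shapes (s selected centers of dimension M, N bits per center), the 0.5 threshold, the binary cross-entropy form of loss2, and every function name are assumptions made for illustration. In particular, the toy embedding step optimizes the centers directly, whereas in the method described above the centers shift because the network is trained on loss0, loss1, and loss2 together.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def extract_bits(centers, A, threshold=0.5):
    """Project the centers with A, squash with a sigmoid, then hard-threshold."""
    return (sigmoid(centers @ A) >= threshold).astype(int)

def bit_error_rate(extracted, b):
    """Fraction of extracted bits that differ from the owner's bits b."""
    return float(np.mean(extracted != b))

def loss2_bce(centers, A, b, eps=1e-12):
    """Assumed form of loss2: mean binary cross-entropy between sigmoid(centers @ A) and b."""
    g = sigmoid(centers @ A)
    return float(-np.mean(b * np.log(g + eps) + (1 - b) * np.log(1 - g + eps)))

def embed_toy(centers, A, b, lr=0.05, steps=2000):
    """Toy embedding: gradient descent on the bit-alignment loss, applied to the centers.

    Assumption: the centers are free parameters here. In the framework they are
    activation means that move as the network itself is trained.
    """
    centers = centers.copy()
    for _ in range(steps):
        g = sigmoid(centers @ A)
        centers -= lr * (g - b) @ A.T  # gradient of the summed BCE w.r.t. the centers
    return centers

def demo(seed=0, s=2, M=32, N=8):
    """Return the bit error rate before and after the toy embedding."""
    rng = np.random.default_rng(seed)
    A = rng.standard_normal((M, N))        # secret random projection matrix
    b = rng.integers(0, 2, size=(s, N))    # owner's watermark bits
    centers = rng.standard_normal((s, M))  # stand-in for the selected Gaussian centers
    before = bit_error_rate(extract_bits(centers, A), b)
    after = bit_error_rate(extract_bits(embed_toy(centers, A, b), A), b)
    return before, after

if __name__ == "__main__":
    print(demo())
```

Under these assumptions, ownership in the white-box setting would be checked by recomputing the centers from the suspect model's activations, calling `extract_bits` with the owner's secret A, and comparing the bit error rate against a chosen tolerance.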

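Experimental Results

Research Questions

RQ1: Can a generic watermarking framework protect ownership of DL models in both white-box and black-box deployments?
RQ2: Is it possible to embed robust watermarks across architectures (MLP, CNN, ResNet, WideResNet) without degrading the accuracy of the base model?
RQ3: How resilient are the watermarks to common DL model transformations such as pruning, fine-tuning, and overwriting?
RQ4: Can watermark extraction reliably verify ownership while keeping false positives low across settings?
RQ5: What practical metrics and API support are needed to enable adoption in real-world DL practice?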

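Key Findings

DeepSigns embeds binary information in the pdf of intermediate activations and in the output layer, and does so without reducing the prediction accuracy of the evaluated models.
The framework shows robustness against pruning, fine-tuning, and watermark overwriting in experiments spanning MNIST, CIFAR-10, and multiple architectures (MLP, CNN, WideResNet).
It offers a dual watermarking approach: functional watermarking in the hidden layers (via Gaussian centers and projection) and output-layer watermarking via post-training, data-key-based triggers.
It includes a mechanism for controlling false positives through carefully selected keys and thresholds, achieves high detection capability, and applies to both white-box and black-box scenarios.
A set of APIs and evaluation metrics is proposed to encourage adoption and comparison with future DL watermarking methods.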
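
To make the idea of controlling false positives through keys and thresholds concrete, the following sketch shows one way a black-box check could pick its decision threshold. It is an illustration under stated assumptions, not the paper's exact procedure: it assumes that an unmarked model matches each key label independently with probability 1/C (C classes), and all function names, as well as the `predict` callable, are hypothetical.

```python
from math import comb

def chance_pass_probability(num_keys, min_matches, num_classes):
    """P(at least min_matches of num_keys key labels match by chance).

    Assumption: each key matches independently with probability 1/num_classes.
    """
    p = 1.0 / num_classes
    return sum(
        comb(num_keys, i) * p**i * (1 - p) ** (num_keys - i)
        for i in range(min_matches, num_keys + 1)
    )

def min_matches_for(num_keys, num_classes, max_false_positive):
    """Smallest match count whose chance-pass probability is at most max_false_positive."""
    for m in range(num_keys + 1):
        if chance_pass_probability(num_keys, m, num_classes) <= max_false_positive:
            return m
    return None  # no threshold is strict enough for this many keys

def verify_ownership(predict, keys, key_labels, min_matches):
    """Query the model on every key and accept if enough labels match."""
    matches = sum(int(predict(x) == y) for x, y in zip(keys, key_labels))
    return matches >= min_matches

if __name__ == "__main__":
    threshold = min_matches_for(num_keys=20, num_classes=10, max_false_positive=1e-3)
    print(threshold)
```

With this kind of rule, using more keys K lets the owner tolerate a few mismatches (for example after pruning or fine-tuning) while keeping the chance that an unrelated model passes the check below the chosen bound.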

This review was generated by AI and checked by a human editor.