QUICK REVIEW

[Paper Review] Unlearnable Examples: Making Personal Data Unexploitable

Hanxun Huang, Xingjun Ma|arXiv (Cornell University)|Jan 13, 2021

Privacy-Preserving Technologies in Data | References: 58 | Citations: 46

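One-Line Summary

The paper introduces error-minimizing noise that turns training examples into unlearnable ones while preserving the data's utility for normal use.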

ABSTRACT

The volume of "free" data on the internet has been key to the current success of deep learning. However, it also raises privacy concerns about the unauthorized exploitation of personal data for training commercial models. It is thus crucial to develop methods to prevent unauthorized data exploitation. This paper raises the question: can data be made unlearnable for deep learning models? We present a type of error-minimizing noise that can indeed make training examples unlearnable. Error-minimizing noise is intentionally generated to reduce the error of one or more of the training example(s) close to zero, which can trick the model into believing there is "nothing" to learn from these example(s). The noise is restricted to be imperceptible to human eyes, and thus does not affect normal data utility. We empirically verify the effectiveness of error-minimizing noise in both sample-wise and class-wise forms. We also demonstrate its flexibility under extensive experimental settings and practicability in a case study of face recognition. Our work establishes an important first step towards making personal data unexploitable to deep learning models.

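Motivation and Objectives

Motivates the need to protect personal data from unauthorized use in deep learning training.
Proposes a new form of noise that makes examples unlearnable by minimizing their training error.
Develops a bilevel optimization framework that generates sample-wise and class-wise unlearnable noise.
Demonstrates the method's effectiveness on multiple datasets and in a face recognition case study.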

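Proposed Method

Formulates unlearnable data as a bilevel optimization: the inner minimization finds imperceptible noise, within an Lp-norm bound, that minimizes the training loss, while the outer minimization updates the classifier to minimize the loss on the perturbed data.
Uses two forms of noise: sample-wise (one perturbation per example) and class-wise (one perturbation per label).
Solves the inner problem with projected gradient descent (PGD) under a perturbation bound epsilon.
Applies the noise-generation step at regular intervals during training, so that the model learns from the noise rather than from the content.
Evaluates robustness across a range of datasets and architectures, including a face recognition case study.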
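
The alternating procedure described above can be sketched in a few lines of NumPy. This is an illustrative sketch, not the authors' implementation: it assumes a linear softmax classifier in place of a deep network, an L-infinity bound, sign-gradient PGD steps, and sample-wise noise only. All function names and hyperparameters (make_unlearnable, pgd_min_noise, eps, rounds, train_steps, lr) are hypothetical and chosen for illustration.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # shift logits for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def loss_and_grads(W, x, y):
    """Cross-entropy of a linear softmax model (an assumption of this sketch).

    Returns per-sample losses, the mean gradient w.r.t. W, and the
    per-sample gradient of each example's own loss w.r.t. its input.
    """
    n = x.shape[0]
    p = softmax(x @ W)                      # (n, k) class probabilities
    losses = -np.log(p[np.arange(n), y] + 1e-12)
    g = p.copy()
    g[np.arange(n), y] -= 1.0               # dL_i / dlogits
    grad_W = x.T @ g / n                    # (d, k)
    grad_x = g @ W.T                        # (n, d)
    return losses, grad_W, grad_x

def pgd_min_noise(W, x, y, delta, eps, step, iters):
    """Inner problem: move delta DOWN the loss gradient (error-minimizing),
    then project back onto the L-infinity ball of radius eps."""
    for _ in range(iters):
        _, _, grad_x = loss_and_grads(W, x + delta, y)
        delta = np.clip(delta - step * np.sign(grad_x), -eps, eps)
    return delta

def make_unlearnable(x, y, num_classes, eps=0.5, rounds=10,
                     train_steps=5, lr=0.1, seed=0):
    """Alternate a few model updates on perturbed data with a noise update."""
    rng = np.random.default_rng(seed)
    W = 0.01 * rng.standard_normal((x.shape[1], num_classes))
    delta = np.zeros_like(x)                # sample-wise: one delta per example
    for _ in range(rounds):
        for _ in range(train_steps):        # outer step: train on x + delta
            _, grad_W, _ = loss_and_grads(W, x + delta, y)
            W -= lr * grad_W
        delta = pgd_min_noise(W, x, y, delta, eps, step=eps / 10, iters=20)
    return delta, W

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((64, 8))        # toy data, not images
    y = (x[:, 0] > 0).astype(int)
    delta, W = make_unlearnable(x, y, num_classes=2)
    clean, _, _ = loss_and_grads(W, x, y)
    noisy, _, _ = loss_and_grads(W, x + delta, y)
    print("mean loss clean:", clean.mean(), "perturbed:", noisy.mean())
```

The only difference from adversarial PGD is the sign of the update: the noise descends the loss instead of ascending it. A class-wise variant would keep one shared perturbation per label rather than one per example; how that shared perturbation is aggregated across examples is not specified in this review, so it is left out of the sketch.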

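Experimental Results

Research Questions

RQ1: Can imperceptible noise make training examples unlearnable for deep neural networks?
RQ2: How do sample-wise and class-wise error-minimizing noise compare in effectiveness and robustness?
RQ3: Does the approach transfer across datasets and model architectures?
RQ4: Can the method protect personal data in real-world scenarios such as face recognition?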

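Key Findings

Error-minimizing noise, in both sample-wise and class-wise forms, reduces clean test accuracy on CIFAR-10 to below 23%.
Class-wise noise is generally more effective than sample-wise noise, at times pushing accuracy close to random guessing.
The method is effective on SVHN, CIFAR-10/100, and an ImageNet subset, and transfers to some external datasets.
Partial unlearnability (only part of the data made unlearnable) impairs learning, but full unlearnability provides stronger protection.
The face recognition/verification case study shows significant protection for the targeted identities.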


This review was generated by AI and checked by a human editor.