QUICK REVIEW

[Paper Review] A Survey of Machine Unlearning

Thành Tâm Nguyên, Thanh Trung Huynh|arXiv (Cornell University)|Sep 6, 2022

Privacy-Preserving Technologies in Data | Citations: 65

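One-Line Summary

This paper surveys machine unlearning comprehensively, covering its concepts, frameworks, removal scenarios, verification methods, datasets, and future research directions.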

ABSTRACT

Today, computer systems hold large amounts of personal data. Yet while such an abundance of data allows breakthroughs in artificial intelligence, and especially machine learning (ML), its existence can be a threat to user privacy, and it can weaken the bonds of trust between humans and AI. Recent regulations now require that, on request, private information about a user must be removed from both computer systems and from ML models (i.e., "the right to be forgotten"). While removing data from back-end databases should be straightforward, it is not sufficient in the AI context as ML models often "remember" the old data. Contemporary adversarial attacks on trained models have proven that we can learn whether an instance or an attribute belonged to the training data. This phenomenon calls for a new paradigm, namely machine unlearning, to make ML models forget about particular data. It turns out that recent works on machine unlearning have not been able to completely solve the problem due to the lack of common frameworks and resources. Therefore, this paper aspires to present a comprehensive examination of machine unlearning's concepts, scenarios, methods, and applications. Specifically, as a category collection of cutting-edge studies, the intention behind this article is to serve as a comprehensive resource for researchers and practitioners seeking an introduction to machine unlearning and its formulations, design criteria, removal requests, algorithms, and applications. In addition, we aim to highlight the key findings, current trends, and new research areas that have not yet featured the use of machine unlearning but could benefit greatly from it. We hope this survey serves as a valuable resource for ML researchers and those seeking to innovate privacy technologies. Our resources are publicly available at https://github.com/tamlhp/awesome-machine-unlearning.

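Research Motivation and Objectives

Define machine unlearning and distinguish it from data deletion.
Present an unlearning framework with design requirements and verification mechanisms.
Categorize unlearning scenarios and removal requests (item, feature, class, task, stream).
Propose a unified taxonomy (model-agnostic, model-intrinsic, data-driven) and summarize datasets and open-source resources.
Highlight findings, trends, and open research directions in privacy-preserving ML.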

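Proposed Approach

Explain the motivations for machine unlearning: security, privacy, usability, and fidelity.
Propose an end-to-end unlearning framework covering the learning, unlearning, verification, and potential retraining steps.
Define unlearning requests (item, feature, class, task, stream) and examine scalable strategies for handling them (influence-based methods, feature disentanglement, data augmentation, linear filtering).
Outline the design requirements: completeness, timeliness, accuracy, lightweight operation, provable guarantees, and verifiability.
Summarize unlearning verification approaches (feature injection tests, forgetting measurement, information leakage and membership inference attacks, backdoor-based verification, interclass confusion tests, federated verification, cryptographic proofs).
Provide a three-branch taxonomy (model-agnostic, model-intrinsic, data-driven) and a resource catalog of datasets and implementations.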
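The learn, unlearn, verify, and retrain steps listed above can be illustrated with a toy sketch. It is not taken from the survey: it assumes a nearest-centroid classifier stored in summation form (per-class sums and counts), one of the few settings where removing an item exactly is trivial. The names `CentroidModel`, `train`, and `verify` are hypothetical, and verification here simply compares the unlearned model with one retrained from scratch on the retained data, which is a baseline check rather than any of the specific verification methods named above.

```python
from dataclasses import dataclass, field


@dataclass
class CentroidModel:
    """Nearest-centroid classifier kept in summation form (hypothetical toy model)."""
    sums: dict = field(default_factory=dict)    # label -> per-feature running sum
    counts: dict = field(default_factory=dict)  # label -> number of training items

    def learn(self, x, y):
        total = self.sums.setdefault(y, [0.0] * len(x))
        for i, value in enumerate(x):
            total[i] += value
        self.counts[y] = self.counts.get(y, 0) + 1

    def unlearn(self, x, y):
        # Item removal: subtract this item's contribution from the stored sums.
        total = self.sums[y]
        for i, value in enumerate(x):
            total[i] -= value
        self.counts[y] -= 1
        if self.counts[y] == 0:
            # No items left for this label, so drop the class entirely.
            del self.sums[y], self.counts[y]

    def centroids(self):
        return {y: [s / self.counts[y] for s in total] for y, total in self.sums.items()}

    def predict(self, x):
        cents = self.centroids()
        return min(cents, key=lambda y: sum((a - b) ** 2 for a, b in zip(x, cents[y])))


def train(data):
    model = CentroidModel()
    for x, y in data:
        model.learn(x, y)
    return model


def verify(unlearned, retrained, tol=1e-9):
    """Compare the unlearned model with one retrained from scratch on retained data."""
    a, b = unlearned.centroids(), retrained.centroids()
    return a.keys() == b.keys() and all(
        abs(u - v) <= tol for y in a for u, v in zip(a[y], b[y])
    )


data = [([1.0, 1.0], "a"), ([3.0, 1.0], "a"), ([8.0, 9.0], "b"), ([9.0, 8.0], "b")]
forget = data[1]

model = train(data)                                       # learning
model.unlearn(*forget)                                    # unlearning
retrained = train([d for d in data if d is not forget])   # retraining baseline
print(verify(model, retrained))                           # verification
```

For most models, deep networks in particular, no such cheap exact update exists, which is why the design requirements above trade completeness against timeliness and lightweight operation.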

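Experimental Results

Research Questions

RQ1: What are the formal definition and scope of machine unlearning beyond data deletion?
RQ2: Which frameworks and design requirements enable practical, verifiable unlearning in ML models?
RQ3: How should unlearning requests be categorized, and which scalable methods address each category?
RQ4: Which taxonomy best captures unlearning methods, and which datasets and implementations support benchmarking?
RQ5: What are the current trends, findings, and open research directions in machine unlearning?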

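Key Findings

The survey consolidates the definitions, scenarios, mechanisms, and applications of machine unlearning.
It provides a unified unlearning framework with verification and a potential retraining path.
It categorizes unlearning into item, feature, class, task, and stream removal, and presents the methods that correspond to each.
It proposes a three-branch taxonomy: model-agnostic, model-intrinsic, and data-driven approaches.
It aggregates datasets and open-source resources to support benchmarking and research.
It discusses challenges and verification strategies, including potential attacks and defenses, and highlights future research directions.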
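To make the removal categories above more concrete, the following sketch shows how three of them (item, feature, class) change the data a model is allowed to retain. It is an illustration under assumed conventions, not the survey's formulation: the row layout (`id`, `label`, `features`) and the names `RequestKind`, `RemovalRequest`, and `retained_data` are hypothetical. Task and stream removal are left out because they depend on a multi-task or streaming setup that this toy dataset does not model.

```python
from dataclasses import dataclass
from enum import Enum


class RequestKind(Enum):
    ITEM = "item"
    FEATURE = "feature"
    CLASS = "class"


@dataclass(frozen=True)
class RemovalRequest:
    kind: RequestKind
    target: object  # row id, feature name, or class label, depending on kind


def retained_data(rows, request):
    """Return the training rows that remain after honoring one removal request."""
    if request.kind is RequestKind.ITEM:
        # Drop the single record identified by the request.
        return [r for r in rows if r["id"] != request.target]
    if request.kind is RequestKind.CLASS:
        # Drop every record that carries the requested label.
        return [r for r in rows if r["label"] != request.target]
    if request.kind is RequestKind.FEATURE:
        # Keep every record but strip the requested attribute from each one.
        return [
            {**r, "features": {k: v for k, v in r["features"].items() if k != request.target}}
            for r in rows
        ]
    raise ValueError(f"unsupported request kind: {request.kind}")


rows = [
    {"id": 1, "label": "cat", "features": {"age": 3, "zip": "10001"}},
    {"id": 2, "label": "dog", "features": {"age": 5, "zip": "94110"}},
    {"id": 3, "label": "cat", "features": {"age": 7, "zip": "60601"}},
]

after_item = retained_data(rows, RemovalRequest(RequestKind.ITEM, 2))
after_class = retained_data(rows, RemovalRequest(RequestKind.CLASS, "cat"))
after_feature = retained_data(rows, RemovalRequest(RequestKind.FEATURE, "zip"))
```

Computing the retained data is only the specification of what the model should no longer reflect. The unlearning method itself still has to bring the trained model in line with that retained set, ideally without retraining from scratch.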


This review was generated by AI and checked by a human editor.