QUICK REVIEW

[Paper Review] WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection

Bojia Zi, Minghao Chang|arXiv (Cornell University)|Jan 5, 2021

Topic: Generative Adversarial Networks and Image Synthesis | References: 37 | Citations: 40

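ONE-LINE SUMMARY

This paper introduces WildDeepfake, a dataset of real-world deepfakes, and shows that existing detectors struggle on it. It also proposes ADDNets (2D and 3D attention-based detectors), which improve detection performance, particularly on WildDeepfake.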

ABSTRACT

In recent years, the abuse of a face swap technique called deepfake has raised enormous public concerns. So far, a large number of deepfake videos (known as "deepfakes") have been crafted and uploaded to the internet, calling for effective countermeasures. One promising countermeasure against deepfakes is deepfake detection. Several deepfake datasets have been released to support the training and testing of deepfake detectors, such as DeepfakeDetection and FaceForensics++. While this has greatly advanced deepfake detection, most of the real videos in these datasets are filmed with a few volunteer actors in limited scenes, and the fake videos are crafted by researchers using a few popular deepfake softwares. Detectors developed on these datasets may become less effective against real-world deepfakes on the internet. To better support detection against real-world deepfakes, in this paper, we introduce a new dataset WildDeepfake which consists of 7,314 face sequences extracted from 707 deepfake videos collected completely from the internet. WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes. We conduct a systematic evaluation of a set of baseline detection networks on both existing and our WildDeepfake datasets, and show that WildDeepfake is indeed a more challenging dataset, where the detection performance can decrease drastically. We also propose two (eg. 2D and 3D) Attention-based Deepfake Detection Networks (ADDNets) to leverage the attention masks on real/fake faces for improved detection. We empirically verify the effectiveness of ADDNets on both existing datasets and WildDeepfake. The dataset is available at: https://github.com/OpenTAI/wild-deepfake.

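MOTIVATION AND OBJECTIVES

Motivates the need for a real-world deepfake benchmark that goes beyond synthetic, lab-generated datasets.
Builds WildDeepfake, an internet-sourced dataset covering diverse scenes, faces, and high-quality forgeries.
Systematically evaluates baseline detectors on in-the-wild data and on existing datasets to characterize the generalization gap.
Proposes ADDNets (2D and 3D), which use attention masks to improve deepfake detection performance.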

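PROPOSED METHOD

Curates WildDeepfake from internet videos (707 deepfake videos, 7,314 face sequences, 1,180,099 face images), with the sequences labeled by human annotators.
Uses MTCNN for face detection, MobileNetV2 for face feature extraction, and dlib landmarks for face alignment.
Proposes ADDNet-2D, which applies an ADD block (attention-based feature scaling) followed by a 2D CNN for image-level detection, and ADDNet-3D, which feeds the outputs of multiple ADD blocks into a 3D CNN for sequence-level detection.
Attention mask generation: builds a face mask and an organ mask from 68 facial landmarks, smooths them with a Gaussian blur, and combines them into an attention map with values in [0, 1].
Optimizes the networks with cross-entropy loss and Adam, and evaluates them on six datasets (DFD, DF-TIMIT LQ/HQ, FF++ LQ/HQ, WildDeepfake).
Compares against baseline networks (e.g., XceptionNet, VGG16, the ResNet family) to show the difficulty of WildDeepfake and the effectiveness of ADDNets.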
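The attention mask step listed above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`fill_polygon`, `gaussian_blur`, `attention_map`), the equal 0.5/0.5 weighting used to combine the face and organ masks, and the default `sigma` are hypothetical choices, and the landmark polygons are passed in directly rather than derived from a real 68-point landmark detector.

```python
import math


def fill_polygon(polygon, height, width):
    """Rasterize a polygon (list of (x, y) vertices) into a 0/1 mask (even-odd rule)."""
    mask = [[0.0] * width for _ in range(height)]
    n = len(polygon)
    for y in range(height):
        for x in range(width):
            px, py = x + 0.5, y + 0.5  # test the pixel centre
            inside = False
            for i in range(n):
                x1, y1 = polygon[i]
                x2, y2 = polygon[(i + 1) % n]
                # Only edges that straddle the horizontal line through py can be crossed.
                if (y1 > py) != (y2 > py):
                    x_cross = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
                    if px < x_cross:
                        inside = not inside
            if inside:
                mask[y][x] = 1.0
    return mask


def gaussian_kernel(sigma):
    """Normalized 1D Gaussian kernel truncated at about three standard deviations."""
    radius = max(1, int(3 * sigma))
    weights = [math.exp(-(i * i) / (2 * sigma * sigma)) for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(mask, kernel):
    """Convolve every row with the kernel, clamping indices at the borders."""
    radius = len(kernel) // 2
    width = len(mask[0])
    out = []
    for row in mask:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xi = min(max(x + k - radius, 0), width - 1)
                acc += w * row[xi]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(mask):
    return [list(col) for col in zip(*mask)]


def gaussian_blur(mask, sigma):
    """Separable Gaussian blur: one horizontal pass, then one vertical pass."""
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(mask, kernel)
    return transpose(blur_rows(transpose(horizontal), kernel))


def attention_map(face_polygon, organ_polygons, height, width, sigma=1.5):
    """Combine a blurred face mask and a blurred organ mask into a map in [0, 1].

    The 0.5/0.5 weighting is an assumption made for this sketch.
    """
    face = fill_polygon(face_polygon, height, width)
    organs = [[0.0] * width for _ in range(height)]
    for polygon in organ_polygons:
        filled = fill_polygon(polygon, height, width)
        organs = [[max(a, b) for a, b in zip(ra, rb)] for ra, rb in zip(organs, filled)]
    face = gaussian_blur(face, sigma)
    organs = gaussian_blur(organs, sigma)
    return [
        [min(1.0, max(0.0, 0.5 * f + 0.5 * o)) for f, o in zip(face_row, organ_row)]
        for face_row, organ_row in zip(face, organs)
    ]
```

With this weighting, organ regions come out near 1, the rest of the face near 0.5, and the background near 0. A practical pipeline would more likely use an array or image library for the rasterization and the blur.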

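EXPERIMENTAL RESULTS

Research Questions

RQ1: How well do detectors trained on existing lab-generated deepfake datasets perform on the real-world deepfakes in WildDeepfake?
RQ2: Can the attention-based ADDNets improve detection performance by using attention masks at the image level and the sequence level?
RQ3: What are the relative strengths of 2D and 3D architectures for in-the-wild deepfake detection?
RQ4: How much does WildDeepfake degrade the performance of state-of-the-art detectors relative to existing datasets, and to what extent does it expose their current limitations?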

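Key Findings

WildDeepfake is more challenging: baseline detectors fail to exceed roughly 70% image-level test accuracy on WildDeepfake, in contrast to their high performance on existing datasets.
ADDNet-2D achieves competitive or superior performance on existing datasets and performs substantially better on WildDeepfake (76.25%, versus 60–69% for the baselines).
ADDNet-3D reaches 65.50% on WildDeepfake, generally below ADDNet-2D and some 2D baselines, which suggests that temporal information in in-the-wild forgeries is a less reliable sequence-level cue.
Overall, detectors trained on lab-generated deepfakes do not generalize well to real-world deepfakes, underscoring the need for real-world benchmarks and robust detectors.
Applying attention-based feature scaling (ADD blocks) across multiple layers is effective for deepfake detection, which supports the ADDNet approach.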
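To make "attention-based feature scaling" concrete, the sketch below shows one plausible reading of it: the attention map is average-pooled down to the spatial size of a feature map, and every channel is multiplied element-wise by the pooled map. This is an assumption made for illustration only; the review does not give the exact formulation of the ADD block, and the names `average_pool` and `scale_features` are hypothetical.

```python
def average_pool(mask, out_h, out_w):
    """Downsample a 2D map to (out_h, out_w) by averaging non-overlapping blocks."""
    in_h, in_w = len(mask), len(mask[0])
    if in_h % out_h or in_w % out_w:
        raise ValueError("mask size must be a multiple of the target size")
    block_h, block_w = in_h // out_h, in_w // out_w
    pooled = []
    for by in range(out_h):
        row = []
        for bx in range(out_w):
            total = 0.0
            for y in range(by * block_h, (by + 1) * block_h):
                for x in range(bx * block_w, (bx + 1) * block_w):
                    total += mask[y][x]
            row.append(total / (block_h * block_w))
        pooled.append(row)
    return pooled


def scale_features(features, mask):
    """Scale each channel of `features` (C x H x W nested lists) by the pooled mask."""
    height, width = len(features[0]), len(features[0][0])
    small = average_pool(mask, height, width)
    return [
        [[v * m for v, m in zip(feat_row, mask_row)]
         for feat_row, mask_row in zip(channel, small)]
        for channel in features
    ]
```

Under this reading, features in regions with attention near 0 are suppressed and features in regions with attention near 1 pass through unchanged. Repeating the step at several depths only requires pooling the same attention map to each layer's resolution.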

This review was generated by AI and checked by a human editor.