QUICK REVIEW

[Paper Review] The DeepFake Detection Challenge (DFDC) Dataset

Brian Dolhansky|arXiv (Cornell University)|Jun 12, 2020

Generative Adversarial Networks and Image Synthesis | References: 24 | Citations: 50

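One-line summary

This paper introduces the DFDC dataset and its accompanying Kaggle competition, describes the dataset's scale, construction methods, augmentations, and baselines, and shows that models trained on it generalize to real-world Deepfakes.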

ABSTRACT

Deepfakes are a recent off-the-shelf manipulation technique that allows anyone to swap two identities in a single video. In addition to Deepfakes, a variety of GAN-based face swapping methods have also been published with accompanying code. To counter this emerging threat, we have constructed an extremely large face swap video dataset to enable the training of detection models, and organized the accompanying DeepFake Detection Challenge (DFDC) Kaggle competition. Importantly, all recorded subjects agreed to participate in and have their likenesses modified during the construction of the face-swapped dataset. The DFDC dataset is by far the largest currently and publicly available face swap video dataset, with over 100,000 total clips sourced from 3,426 paid actors, produced with several Deepfake, GAN-based, and non-learned methods. In addition to describing the methods used to construct the dataset, we provide a detailed analysis of the top submissions from the Kaggle contest. We show although Deepfake detection is extremely difficult and still an unsolved problem, a Deepfake detection model trained only on the DFDC can generalize to real "in-the-wild" Deepfake videos, and such a model can be a valuable analysis tool when analyzing potentially Deepfaked videos. Training, validation and testing corpuses can be downloaded from https://ai.facebook.com/datasets/dfdc.

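Motivation and objectives

Motivates the need for scalable Deepfake detection, given the spread of publicly available manipulation tools.
Builds the large-scale, consent-based DFDC dataset, covering diverse methods and real-world variability.
Publicly benchmarks detection models to assess generalization to real-world Deepfakes.
Provides insights into augmentations, distractors, and evaluation metrics suited to imbalanced Deepfake detection.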

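Proposed method

Assembles a large-scale, consent-based face swap video dataset from 3,426 paid actors (about 48,190 videos, 25 TB of raw data).
Generates Deepfakes with several face swapping methods, including DFAE, MM/NN, NTH, FSGAN, and StyleGAN.
Applies augmentations and distractors, such as geometric and color transforms, overlays, and dog or flower filters, to 70% of the videos.
Uses a two-stage test setup, with a public test set for the leaderboard and a private test set for final evaluation, and defines a weighted precision metric in addition to log loss.
Runs a large-scale Kaggle competition with 2,114 teams to measure real-world performance.
Compares multiple architectures and reports the top five results in terms of precision at multiple recall levels.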
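
The two metrics named in the list above can be sketched in a few lines. This is a minimal illustration, not the challenge's scoring code: binary log loss follows its standard definition, while the weighted precision form TP / (TP + alpha * FP) and the value of `alpha` are assumptions made here for illustration, and both function names are hypothetical.

```python
import math

def log_loss(labels, probs, eps=1e-15):
    """Mean binary cross-entropy; labels are 0 (real) or 1 (fake)."""
    total = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1.0 - eps)  # clip so log() never sees 0
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)

def weighted_precision(labels, preds, alpha=100.0):
    """Assumed form: TP / (TP + alpha * FP); alpha is a hypothetical weight."""
    tp = sum(1 for y, d in zip(labels, preds) if y == 1 and d == 1)
    fp = sum(1 for y, d in zip(labels, preds) if y == 0 and d == 1)
    denom = tp + alpha * fp
    return tp / denom if denom > 0 else 0.0

labels = [1, 0, 1, 0]
# A detector that always answers 0.5 scores exactly ln 2.
uninformed = log_loss(labels, [0.5, 0.5, 0.5, 0.5])
# Two true positives and one false positive: 2 / (2 + 100 * 1).
wp = weighted_precision(labels, [1, 1, 1, 0], alpha=100.0)
```

A constant prediction of 0.5 scores ln 2, roughly 0.693, which is a handy reference point when reading log loss values. Weighting false positives more heavily makes the precision figure drop sharply for even a single false alarm, which is the behavior one would want when real videos vastly outnumber fakes.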

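Experimental results

Research questions

RQ1: Can detectors trained on the DFDC dataset generalize to unseen real-world Deepfake videos?
RQ2: How much do different face swapping methods affect detectability and temporal consistency?
RQ3: How do augmentations and distractors affect detection performance under realistic class imbalance?
RQ4: Which metrics best reflect a detector's usefulness in real-world deployment (for example, precision under a skewed distribution)?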
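
RQ4 turns on how class prevalence changes what precision means. As a worked illustration, Bayes' rule gives precision = TPR * p / (TPR * p + FPR * (1 - p)) for a fake-video prevalence p. The detector rates and prevalence values below are hypothetical numbers chosen for the example, not results from the paper.

```python
def precision_at_prevalence(tpr, fpr, prevalence):
    """Precision implied by a detector's TPR and FPR at a given fake prevalence."""
    true_pos = tpr * prevalence            # share of all videos that are caught fakes
    false_pos = fpr * (1.0 - prevalence)   # share of all videos that are flagged reals
    return true_pos / (true_pos + false_pos)

# Hypothetical detector: 90% true positive rate, 5% false positive rate.
balanced = precision_at_prevalence(0.9, 0.05, 0.5)    # fakes are half the data
skewed = precision_at_prevalence(0.9, 0.05, 0.001)    # fakes are 1 in 1,000
print(f"balanced: {balanced:.3f}, skewed: {skewed:.3f}")
```

With the same detector, precision falls from about 0.95 on a balanced set to under 0.02 when fakes are one in a thousand, which is why a metric measured on a balanced test set can overstate usefulness in deployment.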

Key findings

Team name	Overall log loss	DFDC log loss	Real log loss	Real P@0.1	Real P@0.3	Real P@0.9
Selim Seferbekov [24]	0.4279	0.1983	0.6605	0.9803	0.7610	0.5389
WM [34]	0.4284	0.1787	0.6805	0.9294	0.6717	0.5775
NTechLab [4]	0.4345	0.1703	0.7039	0.9804	0.8244	0.5541
Eighteen Years Old [25]	0.4347	0.1882	0.6831	0.9843	0.6329	0.5625
The Medics [11]	0.4371	0.2157	0.6621	0.9653	0.7354	0.5516
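
The P@r columns report precision at a fixed recall level r. One common way to compute such a figure is sketched below, under the assumption that it means the best precision over all score thresholds whose recall is at least r; the paper's exact procedure may differ. The function name and the toy labels and scores are made up for the example.

```python
def precision_at_recall(labels, scores, target_recall):
    """Best precision among score thresholds reaching at least target_recall."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    total_pos = sum(labels)
    best = 0.0
    tp = fp = 0
    for i, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Tied scores share one threshold, so evaluate only at the end of a tie run.
        if i + 1 < len(ranked) and ranked[i + 1][0] == score:
            continue
        if tp / total_pos >= target_recall:
            best = max(best, tp / (tp + fp))
    return best

labels = [1, 1, 0, 1, 0, 0]                      # 1 = fake, 0 = real
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]    # toy detector outputs
p_low = precision_at_recall(labels, scores, 0.3)
p_high = precision_at_recall(labels, scores, 0.9)
```

On this toy data the precision is 1.0 at recall 0.3 and 0.75 at recall 0.9, mirroring the pattern in the table where precision drops as the required recall rises.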

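Models trained on DFDC generalize to real videos, reaching an average precision of 0.753 and a ROC-AUC of 0.734 on real videos.
Public leaderboard performance correlates with private test performance, indicating a trend toward generalization.
The top five submissions reached an overall log loss of about 0.43, with a log loss of 0.17 to 0.22 on DFDC videos; log loss on real videos was worse (0.66 to 0.70), but the models still showed generalization.
DFAE-based methods tend to produce higher-quality swaps than GAN-like methods across varied conditions, though uncontrolled settings remain difficult.
StyleGAN-based swaps showed unstable gaze tracking and lighting matching, and performed worst among the evaluated methods.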
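
For readers unfamiliar with the average precision and ROC-AUC figures quoted in these findings, both can be computed from ranked detector scores. The sketch below assumes the usual ranking-based definitions (ROC-AUC as the probability that a random fake outscores a random real, average precision as the mean precision at each positive's rank); it is not the paper's evaluation code, and the toy data is made up.

```python
def roc_auc(labels, scores):
    """Probability that a random fake outscores a random real (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

def average_precision(labels, scores):
    """Mean of the precision values at the rank of each positive (no tie handling)."""
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    tp = 0
    precisions = []
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            tp += 1
            precisions.append(tp / rank)
    return sum(precisions) / len(precisions)

labels = [1, 1, 0, 1, 0, 0]                      # 1 = fake, 0 = real
scores = [0.95, 0.80, 0.70, 0.60, 0.30, 0.10]    # toy detector outputs
auc = roc_auc(labels, scores)                    # 8 of 9 fake/real pairs ordered correctly
ap = average_precision(labels, scores)           # (1/1 + 2/2 + 3/4) / 3
```

The pairwise ROC-AUC loop is quadratic in the number of samples, which is fine for illustration; a rank-based implementation would be preferable for large test sets.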

This review was generated by AI and checked by a human editor.