QUICK REVIEW

[Paper Review] Exposing DeepFake Videos By Detecting Face Warping Artifacts

Yuezun Li, Siwei Lyu|arXiv (Cornell University)|Nov 1, 2018

Digital Media Forensic Detection|References: 36|Citations: 571

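One-Sentence Summary

This paper presents a CNN-based method that detects DeepFake videos by exploiting the artifacts left by affine face warping. Negative training examples are synthesized with simple image processing, so no DeepFake model has to be trained to produce them.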

ABSTRACT

In this work, we describe a new deep learning based method that can effectively distinguish AI-generated fake videos (referred to as DeepFake videos hereafter) from real videos. Our method is based on the observations that current DeepFake algorithm can only generate images of limited resolutions, which need to be further warped to match the original faces in the source video. Such transforms leave distinctive artifacts in the resulting DeepFake videos, and we show that they can be effectively captured by convolutional neural networks (CNNs). Compared to previous methods which use a large amount of real and DeepFake generated images to train CNN classifier, our method does not need DeepFake generated images as negative training examples since we target the artifacts in affine face warping as the distinctive feature to distinguish real and fake images. The advantages of our method are two-fold: (1) Such artifacts can be simulated directly using simple image processing operations on a image to make it as negative example. Since training a DeepFake model to generate negative examples is time-consuming and resource-demanding, our method saves a plenty of time and resources in training data collection; (2) Since such artifacts are general existed in DeepFake videos from different sources, our method is more robust compared to others. Our method is evaluated on two sets of DeepFake video datasets for its effectiveness in practice.

Motivation and Objectives

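Achieve robust DeepFake detection by targeting artifacts introduced by the face-synthesis pipeline.
Exploit the insight that DeepFake face synthesis produces a fixed-size image that must be warped to fit the target face.
Simulate the warping artifacts with image processing, removing the need for DeepFake-generated negative examples.
Demonstrate robustness by focusing on warping artifacts that are common to DeepFakes from different sources.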

Proposed Method

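Detect faces, extract the face region using facial landmarks, and determine the affine transformation matrix.
Align the face to multiple scales, apply Gaussian blur, and affine-warp it back to the original size to simulate a negative example.
Vary color, brightness, contrast, distortion, and the polygon-based face shape to increase realism and augment the data.
Crop a region of interest (ROI) containing the face and its surroundings, rescale it to 224×224, and train CNNs (VGG16, ResNet50/101/152).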
At inference, sample the ROI 10 times per image and average the CNN outputs to obtain the final probability that the image is fake.
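
The first two steps above (landmark-based alignment, then blur and warp back) can be sketched in a few lines of NumPy. This is an illustrative sketch under stated assumptions, not the authors' implementation: the helper names (`estimate_affine`, `warp_affine`, `gaussian_blur`, `simulate_negative`), the three landmarks, the candidate scales, and the blur kernel size and sigma are all hypothetical choices, and the image is a random grayscale array standing in for a real face.

```python
import numpy as np


def estimate_affine(src_pts, dst_pts):
    """Least-squares 2x3 affine matrix mapping src (x, y) points onto dst points."""
    src = np.asarray(src_pts, dtype=np.float64)
    dst = np.asarray(dst_pts, dtype=np.float64)
    A = np.hstack([src, np.ones((len(src), 1))])      # N x 3 design matrix
    X, *_ = np.linalg.lstsq(A, dst, rcond=None)       # 3 x 2 solution
    return X.T


def warp_affine(img, M, out_shape):
    """Warp a grayscale image with M (input xy -> output xy), bilinear sampling.

    Returns the warped image and a mask of output pixels that map inside the input.
    """
    h, w = out_shape
    inv = np.linalg.inv(np.vstack([M, [0.0, 0.0, 1.0]]))
    ys, xs = np.mgrid[0:h, 0:w]
    sx, sy, _ = inv @ np.stack([xs.ravel(), ys.ravel(), np.ones(h * w)])
    valid = (sx >= 0) & (sx <= img.shape[1] - 1) & (sy >= 0) & (sy <= img.shape[0] - 1)
    x0 = np.clip(np.floor(sx).astype(int), 0, img.shape[1] - 2)
    y0 = np.clip(np.floor(sy).astype(int), 0, img.shape[0] - 2)
    fx = np.clip(sx - x0, 0.0, 1.0)
    fy = np.clip(sy - y0, 0.0, 1.0)
    out = (img[y0, x0] * (1 - fx) * (1 - fy) + img[y0, x0 + 1] * fx * (1 - fy)
           + img[y0 + 1, x0] * (1 - fx) * fy + img[y0 + 1, x0 + 1] * fx * fy)
    return out.reshape(h, w), valid.reshape(h, w)


def gaussian_blur(img, ksize=5, sigma=1.0):
    """Separable Gaussian blur; ksize and sigma are assumed values."""
    r = ksize // 2
    k = np.exp(-0.5 * (np.arange(-r, r + 1) / sigma) ** 2)
    k /= k.sum()
    padded = np.pad(img, r, mode="edge")
    rows = sum(k[i] * padded[:, i:i + img.shape[1]] for i in range(ksize))
    return sum(k[i] * rows[i:i + img.shape[0], :] for i in range(ksize))


def simulate_negative(img, landmarks, unit_landmarks, scales=(16, 24, 32), rng=None):
    """Align the face at a randomly chosen scale, blur it, and warp it back."""
    rng = np.random.default_rng() if rng is None else rng
    size = int(rng.choice(scales))                       # pick one alignment scale
    canon = np.asarray(unit_landmarks, dtype=np.float64) * size
    M = estimate_affine(landmarks, canon)                # image -> aligned face
    face, _ = warp_affine(img.astype(np.float64), M, (size, size))
    face = gaussian_blur(face)
    M_back = np.linalg.inv(np.vstack([M, [0.0, 0.0, 1.0]]))[:2]  # aligned face -> image
    back, mask = warp_affine(face, M_back, img.shape)
    out = img.astype(np.float64)
    out[mask] = back[mask]                               # paste the degraded face region
    return out, mask


rng = np.random.default_rng(0)
image = rng.random((64, 64))                             # stand-in for a grayscale frame
landmarks = [(20, 24), (44, 24), (32, 44)]               # hypothetical eyes and mouth (x, y)
unit_landmarks = [(0.25, 0.3), (0.75, 0.3), (0.5, 0.8)]  # assumed canonical layout in [0, 1]
negative, mask = simulate_negative(image, landmarks, unit_landmarks, rng=rng)
```

Pixels outside `mask` are left untouched, so the simulated negative differs from the original only in the face region, where resolution has been lost. A real pipeline would use a face detector, many more landmarks, and color images; those parts are omitted here.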

Experimental Results

Research Questions

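RQ1: Can a CNN reliably detect the artifacts that affine face warping leaves in the DeepFake pipeline?
RQ2: Are synthesized negative samples (not generated by a DeepFake model) sufficient to train a robust detector?
RQ3: Which CNN architecture makes the best use of warping-artifact cues and achieves high detection performance on public DeepFake datasets?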

Key Findings

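ResNet50 achieved the best image-based AUC on UADFV (97.4%) and the best video-based AUC (98.7%).
ResNet101 and ResNet152 also performed strongly, with AUCs of roughly 95–99% on UADFV images and 97–99% in the video-based evaluation.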
On DeepfakeTIMIT LQ, ResNet50 reached an image-based AUC of 99.9%, well ahead of the other methods.
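On DeepfakeTIMIT HQ, ResNet152 reached 91.2% AUC and ResNet50 reached 93.2% (versus 99.9% on LQ), indicating robustness across quality settings.
The method outperforms Two-stream NN, the MesoNet variants, and HeadPose on both datasets, underscoring its robustness to the diversity of DeepFakes.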
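
The image-based and video-based AUC figures above rest on two simple operations: aggregating scores and ranking fakes against real samples. A minimal sketch follows, using only the standard library. The helper names are hypothetical, and averaging per-frame scores to get a video-level score is an assumption made for illustration, since this review does not state the paper's exact aggregation rule.

```python
from statistics import mean


def image_score(crop_scores):
    """Average the CNN outputs over the ROI crops of one image (10 in the review)."""
    return mean(crop_scores)


def video_score(frame_scores):
    """ASSUMPTION: the video-level score is the mean of its per-frame scores."""
    return mean(frame_scores)


def auc(scores, labels):
    """ROC AUC as the fraction of (fake, real) pairs in which the fake scores higher.

    Labels: 1 = fake, 0 = real. Ties count as half a win.
    """
    fake = [s for s, y in zip(scores, labels) if y == 1]
    real = [s for s, y in zip(scores, labels) if y == 0]
    if not fake or not real:
        raise ValueError("need at least one fake and one real sample")
    wins = sum(1.0 if f > r else 0.5 if f == r else 0.0 for f in fake for r in real)
    return wins / (len(fake) * len(real))


# Toy example with made-up scores: two fake videos and two real ones.
videos = [[0.9, 0.8, 0.7], [0.4, 0.3, 0.2], [0.5, 0.6, 0.4], [0.1, 0.2, 0.0]]
labels = [1, 1, 0, 0]
video_auc = auc([video_score(v) for v in videos], labels)
```

The pairwise formulation is quadratic in the number of samples, which is fine for a sketch; a library routine such as scikit-learn's `roc_auc_score` would be the usual choice at scale.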

This review was generated by AI and checked by a human editor.