QUICK REVIEW

[Paper Review] FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces

Andreas Rössler, Davide Cozzolino|arXiv (Cornell University)|Mar 24, 2018

Digital Media Forensic Detection|References: 3|Citations: 362

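One-sentence summary

The paper introduces FaceForensics, a large-scale dataset of manipulated face videos (over 500,000 frames from 1,004 videos) for forgery detection and segmentation, and adds baseline benchmarks and a forgery refinement method.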

ABSTRACT

With recent advances in computer vision and graphics, it is now possible to generate videos with extremely realistic synthetic faces, even in real time. Countless applications are possible, some of which raise a legitimate alarm, calling for reliable detectors of fake videos. In fact, distinguishing between original and manipulated video can be a challenge for humans and computers alike, especially when the videos are compressed or have low resolution, as it often happens on social networks. Research on the detection of face manipulations has been seriously hampered by the lack of adequate datasets. To this end, we introduce a novel face manipulation dataset of about half a million edited images (from over 1000 videos). The manipulations have been generated with a state-of-the-art face editing approach. It exceeds all existing video manipulation datasets by at least an order of magnitude. Using our new dataset, we introduce benchmarks for classical image forensic tasks, including classification and segmentation, considering videos compressed at various quality levels. In addition, we introduce a benchmark evaluation for creating indistinguishable forgeries with known ground truth; for instance with generative refinement models.

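Motivation and objectives

Provide a large-scale, realistic dataset of manipulated facial videos that enables data-driven forgery detection.
Benchmark forgery classification and pixel-level segmentation under different compression regimes.
Evaluate state-of-the-art detectors on FaceForensics and establish baselines for future research.
Explore supervised refinement methods that improve the realism of fake faces, and evaluate detection robustness against them.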

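Proposed method

Generate a dataset of over 500,000 frames from 1,004 YouTube videos using the Face2Face reenactment method, producing both source-to-target and self-reenactment manipulations.
Provide ground-truth per-pixel masks marking the modified regions for the segmentation task.
Evaluate multiple learning-based and handcrafted forgery detectors on uncompressed and compressed (easy and hard) videos.
Apply XceptionNet and other architectures to face-centered forgery classification, and perform pixel-level segmentation with a sliding-window approach.
Propose an autoencoder-based refiner model pretrained on VGGFace2 to improve the visual quality of forgeries, and evaluate its effect on detection.
Conduct a user study on perceptual quality by comparing raw and refined forgeries.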
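
The sliding-window segmentation step can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the window and stride values, and the toy patch scorer are all assumptions made for illustration. A patch-level classifier is applied at each window position, and the scores are averaged per pixel to form a heatmap that is then thresholded into a mask.

```python
from typing import Callable, List

Image = List[List[float]]


def sliding_window_heatmap(
    frame: Image,
    score_patch: Callable[[Image], float],  # hypothetical patch classifier
    window: int = 4,                         # assumed window size
    stride: int = 2,                         # assumed stride
) -> Image:
    """Average the patch scores of every window that covers each pixel."""
    height, width = len(frame), len(frame[0])
    if window > height or window > width:
        raise ValueError("window must fit inside the frame")

    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]

    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = [row[left:left + window] for row in frame[top:top + window]]
            score = score_patch(patch)
            # Spread the patch score over all pixels the window covers.
            for y in range(top, top + window):
                for x in range(left, left + window):
                    total[y][x] += score
                    count[y][x] += 1

    # Pixels that no window covers (possible at the borders) get a score of 0.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]


def threshold_mask(heatmap: Image, threshold: float = 0.5) -> List[List[int]]:
    """Turn the per-pixel scores into a binary forgery mask."""
    return [[1 if value >= threshold else 0 for value in row] for row in heatmap]


def toy_score(patch: Image) -> float:
    """Stand-in for a trained CNN: the mean intensity of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)
```

In practice `score_patch` would wrap a trained network such as XceptionNet, and the predicted mask would be compared against the dataset's ground-truth per-pixel masks.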

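Experimental results

Research questions

RQ1: How well do current state-of-the-art detection approaches perform on realistic, large-scale FaceForensics manipulations under different compression levels?
RQ2: Does the dataset enable robust, data-driven forgery classification and segmentation of faces in videos?
RQ3: Does supervised autoencoder refinement improve the visual quality of forgeries, and how does it affect their detectability by classifiers?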

Key findings

Method / Dataset	no-c (%)	easy-c (%)	hard-c (%)
Steganalysis Features + SVM	99.40	75.87	58.16
Cozzolino et al. (CNN-based on handcrafted features)	99.60	79.80	55.77
Bayar and Stamm (CNN)	99.53	86.10	73.63
Rahmouni et al. (global pooling CNN)	98.60	88.50	61.50
Raghavendra et al. (VGG19+AlexNet + CPF)	97.70	93.50	82.13
Zhou et al. (Two-stream CNN + steganalysis features)	99.93	96.00	86.83
XceptionNet (ImageNet-pretrained, fine-tuned)	99.93	98.13	87.81
Dataset variant 128x128 evaluation: Face2Face w/o AE	99.42	96.17	84.56
Dataset variant 128x128 evaluation: Face2Face refined w/ AE	99.23	96.07	80.97
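
To make the effect of compression easier to read off the table, the following sketch computes how many percentage points each detector loses between no-c and hard-c. The numbers are copied from the table above and are assumed to be classification accuracies in percent; the helper names are illustrative only.

```python
# Values copied from the table above; column order is (no-c, easy-c, hard-c).
RESULTS = {
    "Steganalysis Features + SVM": (99.40, 75.87, 58.16),
    "Cozzolino et al.": (99.60, 79.80, 55.77),
    "Bayar and Stamm": (99.53, 86.10, 73.63),
    "Rahmouni et al.": (98.60, 88.50, 61.50),
    "Raghavendra et al.": (97.70, 93.50, 82.13),
    "Zhou et al.": (99.93, 96.00, 86.83),
    "XceptionNet": (99.93, 98.13, 87.81),
}

HARD_C = 2  # index of the hard-c column


def compression_drop(results):
    """Percentage points lost when moving from no-c to hard-c, per method."""
    return {
        name: round(no_c - hard_c, 2)
        for name, (no_c, _easy_c, hard_c) in results.items()
    }


def best_on(results, column):
    """Name of the method with the highest value in the given column."""
    return max(results, key=lambda name: results[name][column])


drops = compression_drop(RESULTS)
# List the methods from most to least robust under hard compression.
for name, drop in sorted(drops.items(), key=lambda item: item[1]):
    print(f"{name}: {drop:.2f} points lost from no-c to hard-c")
print("Best on hard-c:", best_on(RESULTS, HARD_C))
```

Read this way, the learned detectors at the bottom of the table lose far fewer points than the steganalysis-based baselines, which matches the qualitative findings of the review.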

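The dataset contains over 500,000 frames from 1,004 videos, covering source-to-target and self-reenactment manipulations with ground-truth masks.
Classification performance varies with method and compression. Deep models outperform handcrafted features under compression: XceptionNet reaches 99.93% on no-c, 98.13% on easy-c, and 87.81% on hard-c, the best hard-c result among the tested methods.
CNN-based forgery localization performs well on uncompressed data but degrades under compression; XceptionNet is the most robust of the tested methods.
The ground-truth data from self-reenactment enables supervised refinement training that improves the realism of forgeries; the user study shows a perceptual gain, with refined forgeries harder for humans to detect under compression.
The autoencoder-based refiner improves artifact-prone regions (chin, nose, cheeks) and lighting, but detectors trained on the forged outputs still achieve high detection accuracy.
Quantitatively, refined fakes at 128x128 input lower detection accuracy slightly (for example, from 84.56% to 80.97% on hard-c), yet they remain highly detectable by advanced detectors.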

This review was generated by AI and checked by a human editor.