QUICK REVIEW

[Paper Review] FaceForensics: A Large-scale Video Dataset for Forgery Detection in Human Faces

Andreas Rössler, Davide Cozzolino | arXiv (Cornell University) | Mar 24, 2018

Digital Media Forensic Detection | References: 3 | Cited by: 362

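One-Sentence Summary

Introduces FaceForensics, a large-scale dataset of manipulated face videos for forgery detection and segmentation (over 500,000 frames from 1,004 videos), together with baseline benchmarks and a refinement method.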

ABSTRACT

With recent advances in computer vision and graphics, it is now possible to generate videos with extremely realistic synthetic faces, even in real time. Countless applications are possible, some of which raise a legitimate alarm, calling for reliable detectors of fake videos. In fact, distinguishing between original and manipulated video can be a challenge for humans and computers alike, especially when the videos are compressed or have low resolution, as it often happens on social networks. Research on the detection of face manipulations has been seriously hampered by the lack of adequate datasets. To this end, we introduce a novel face manipulation dataset of about half a million edited images (from over 1000 videos). The manipulations have been generated with a state-of-the-art face editing approach. It exceeds all existing video manipulation datasets by at least an order of magnitude. Using our new dataset, we introduce benchmarks for classical image forensic tasks, including classification and segmentation, considering videos compressed at various quality levels. In addition, we introduce a benchmark evaluation for creating indistinguishable forgeries with known ground truth; for instance with generative refinement models.

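Motivation and Goals

Provide a large-scale, realistic dataset of manipulated face videos to support data-driven forgery detection.
Benchmark forgery classification and pixel-level segmentation under different compression conditions.
Evaluate state-of-the-art detectors on FaceForensics and establish baselines for future work.
Explore a supervised refinement method that makes forged faces more realistic, and use it to assess how robust detection is.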

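Proposed Method

Uses the Face2Face reenactment method to generate a dataset of over 500,000 frames from 1,004 YouTube videos, covering both source-to-target manipulations and self-reenactment manipulations.
Provides per-pixel ground-truth masks marking the modified regions, for use in the segmentation task.
Evaluates a range of learned and hand-crafted forgery detectors on uncompressed and compressed (easy and hard) videos.
Applies architectures such as XceptionNet to face-centered forgery classification, and performs pixel-level segmentation with a sliding-window approach.
Proposes an autoencoder-based refinement model (pretrained on VGGFace2) to improve the visual quality of the forgeries, and tests how refinement affects their detectability.
Assesses perceptual quality with a user study comparing original forgeries against refined ones.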
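
The sliding-window segmentation mentioned above can be sketched as follows: a patch-level classifier is applied to overlapping windows, and each pixel receives the average manipulation score of every window that covers it. This is a minimal illustration, not the paper's implementation. The callable `patch_classifier`, the function name, and the default `patch=64` / `stride=32` values are hypothetical; in practice the classifier would be a trained CNN such as XceptionNet.

```python
import numpy as np
from typing import Callable


def sliding_window_forgery_map(
    image: np.ndarray,
    patch_classifier: Callable[[np.ndarray], float],  # hypothetical: returns P(patch is manipulated)
    patch: int = 64,   # assumed window size
    stride: int = 32,  # assumed step between windows
) -> np.ndarray:
    """Average per-patch manipulation scores into a per-pixel probability map."""
    h, w = image.shape[:2]
    if h < patch or w < patch:
        raise ValueError("image is smaller than the patch size")

    score_sum = np.zeros((h, w), dtype=np.float64)
    count = np.zeros((h, w), dtype=np.float64)

    # Window origins; append a final origin so the windows reach the bottom/right border.
    ys = list(range(0, h - patch + 1, stride))
    xs = list(range(0, w - patch + 1, stride))
    if ys[-1] != h - patch:
        ys.append(h - patch)
    if xs[-1] != w - patch:
        xs.append(w - patch)

    for y in ys:
        for x in xs:
            score = float(patch_classifier(image[y:y + patch, x:x + patch]))
            score_sum[y:y + patch, x:x + patch] += score
            count[y:y + patch, x:x + patch] += 1.0

    # Every pixel is covered by at least one window, so count > 0 everywhere.
    return score_sum / count
```

Thresholding the returned map (for example `forgery_map >= 0.5`) yields a binary mask that can be compared with the dataset's per-pixel ground-truth masks. A smaller stride gives smoother maps at the cost of more classifier calls.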

Experimental Results

Research Questions

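RQ1: How do current state-of-the-art detection methods perform on realistic, large-scale Face2Face manipulations at different compression levels?
RQ2: Does a large dataset enable data-driven methods to achieve robust forgery classification and segmentation of faces in video?
RQ3: Does supervised autoencoder refinement improve the visual quality of forgeries, and how does it affect their detectability by classifiers?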

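Key Findings

The dataset contains over 500,000 frames from 1,004 videos, with source-to-target and self-reenactment manipulations as well as ground-truth masks.
Classification performance varies with both method and compression. The deep model (XceptionNet) outperforms hand-crafted features under compression, with accuracy of roughly 87–98% on no-c (uncompressed) and easy-c (easy compression), and up to 87.81% on hard-c (hard compression).
Forgery localization with convolutional neural networks performs strongly on uncompressed data but degrades as compression increases; among the methods tested, XceptionNet remains the most robust.
The self-reenactment ground-truth data makes supervised training of the refiner possible, with the aim of making forgeries more realistic. The user study shows that refined forgeries are perceptually improved and harder for humans to detect, especially on compressed videos.
The autoencoder-based refiner improves the forged regions (chin, nose, cheeks) and the lighting, yet models trained on the forged outputs still detect the refined data with high accuracy.
Quantitative results indicate that refinement may slightly lower detection accuracy at 128×128 input, but refined forgeries are still reliably detected by state-of-the-art detectors.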
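
Localization quality in the findings above is measured against the ground-truth masks. This summary does not state which segmentation metric the paper reports, so the sketch below assumes two common choices: pixel accuracy and intersection-over-union (IoU), both computed on a thresholded probability map. The function name `mask_metrics` and the 0.5 threshold are illustrative assumptions.

```python
import numpy as np


def mask_metrics(pred_prob, gt_mask, threshold: float = 0.5) -> dict:
    """Compare a predicted forgery-probability map with a binary ground-truth mask.

    Assumed metrics (not necessarily the paper's): pixel accuracy and IoU.
    """
    pred = np.asarray(pred_prob) >= threshold  # binarize predicted probabilities
    gt = np.asarray(gt_mask).astype(bool)
    if pred.shape != gt.shape:
        raise ValueError("prediction and mask must have the same shape")

    pixel_accuracy = float((pred == gt).mean())

    intersection = np.logical_and(pred, gt).sum()
    union = np.logical_or(pred, gt).sum()
    # Convention: if both masks are empty, the prediction matches perfectly.
    iou = float(intersection / union) if union > 0 else 1.0

    return {"pixel_accuracy": pixel_accuracy, "iou": iou}
```

Evaluating the same detector separately on no-c, easy-c and hard-c frames with a function like this is one way to quantify how localization degrades with compression.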

This review was generated by AI and checked by human editors.