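[Paper Explained] WildDeepfake: A Challenging Real-World Dataset for Deepfake Detection
This paper introduces WildDeepfake, a real-world deepfake dataset collected from the internet, and shows that existing detectors perform poorly on it. It also proposes ADDNets, detectors built on 2D and 3D attention, which improve detection performance, particularly on WildDeepfake.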
In recent years, the abuse of a face swap technique called deepfake has raised enormous public concerns. So far, a large number of deepfake videos (known as "deepfakes") have been crafted and uploaded to the internet, calling for effective countermeasures. One promising countermeasure against deepfakes is deepfake detection. Several deepfake datasets have been released to support the training and testing of deepfake detectors, such as DeepfakeDetection and FaceForensics++. While this has greatly advanced deepfake detection, most of the real videos in these datasets are filmed with a few volunteer actors in limited scenes, and the fake videos are crafted by researchers using a few popular deepfake software tools. Detectors developed on these datasets may become less effective against real-world deepfakes on the internet. To better support detection against real-world deepfakes, in this paper, we introduce a new dataset WildDeepfake which consists of 7,314 face sequences extracted from 707 deepfake videos collected completely from the internet. WildDeepfake is a small dataset that can be used, in addition to existing datasets, to develop and test the effectiveness of deepfake detectors against real-world deepfakes. We conduct a systematic evaluation of a set of baseline detection networks on both existing datasets and our WildDeepfake dataset, and show that WildDeepfake is indeed a more challenging dataset, where the detection performance can decrease drastically. We also propose two (i.e., 2D and 3D) Attention-based Deepfake Detection Networks (ADDNets) to leverage the attention masks on real/fake faces for improved detection. We empirically verify the effectiveness of ADDNets on both existing datasets and WildDeepfake. The dataset is available at: https://github.com/OpenTAI/wild-deepfake.
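Motivation and goals
- Address the need for a real-world deepfake benchmark that goes beyond lab-crafted datasets, in which a few volunteer actors are filmed in limited scenes and fakes are produced with a few popular tools.
- Build WildDeepfake, a small dataset collected entirely from the internet that covers diverse scenes and faces and is meant to complement existing datasets.
- Systematically evaluate baseline detectors on WildDeepfake and on existing datasets to characterize the generalization gap.
- Propose ADDNets (2D and 3D), which use attention masks to improve deepfake detection.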
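Proposed method
- Curate WildDeepfake from videos found on the internet (707 deepfake videos, 7,314 face sequences, 1,180,099 face images) and label the sequences through manual annotation.
- Use MTCNN for face detection, MobileNetV2 for face feature extraction, and dlib landmarks for face alignment.
- Propose ADDNet-2D, an ADD block (attention-based feature scaling) followed by a 2D CNN for image-level detection, and ADDNet-3D, in which multiple ADD blocks feed a 3D CNN for sequence-level detection.
- Generate attention masks: build a face mask and an organ mask from the 68 facial landmarks, smooth them with a Gaussian blur, and merge them into a single attention map with values in [0, 1].
- Optimize the networks with cross-entropy loss and Adam, and evaluate on six datasets (DFD, DF-TIMIT LQ/HQ, FF++ LQ/HQ, WildDeepfake).
- Compare against baseline networks (such as XceptionNet, VGG16, and ResNet variants) to show both the difficulty of WildDeepfake and the effectiveness of ADDNets.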
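The attention-mask step described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the authors' implementation: the helper names (`blur`, `box_mask`, `attention_map`) are hypothetical, regions are approximated by landmark bounding boxes instead of landmark polygons, and the merge rule (average of the two blurred masks, clipped to [0, 1]) is an assumption, since the summary only says the masks are merged.

```python
import math

def gaussian_kernel(sigma, radius):
    """1D Gaussian kernel, normalized so its weights sum to 1."""
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]

def blur(mask, sigma=1.0):
    """Separable Gaussian blur with zero padding at the borders."""
    radius = max(1, int(3 * sigma))
    kernel = gaussian_kernel(sigma, radius)

    def conv_rows(m):
        out = []
        for row in m:
            n = len(row)
            out.append([sum(kernel[j + radius] * row[x + j]
                            for j in range(-radius, radius + 1)
                            if 0 <= x + j < n)
                        for x in range(n)])
        return out

    horiz = conv_rows(mask)                       # blur along x
    transposed = [list(c) for c in zip(*horiz)]   # swap axes
    vert = conv_rows(transposed)                  # blur along y
    return [list(c) for c in zip(*vert)]          # swap axes back

def box_mask(points, h, w):
    """Binary mask over the bounding box of a landmark group (a simplification)."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    x0, x1, y0, y1 = min(xs), max(xs), min(ys), max(ys)
    return [[1.0 if x0 <= x <= x1 and y0 <= y <= y1 else 0.0
             for x in range(w)] for y in range(h)]

def attention_map(face_pts, organ_groups, h, w, sigma=1.0):
    """Blur the face mask and the organ mask, then merge them into [0, 1]."""
    face = blur(box_mask(face_pts, h, w), sigma)
    organs = [[0.0] * w for _ in range(h)]
    for group in organ_groups:                    # union of all organ regions
        m = box_mask(group, h, w)
        organs = [[max(a, b) for a, b in zip(r1, r2)]
                  for r1, r2 in zip(organs, m)]
    organs = blur(organs, sigma)
    # Assumed merge rule: average the two masks and clip to [0, 1].
    return [[min(1.0, max(0.0, 0.5 * (f + o))) for f, o in zip(rf, ro)]
            for rf, ro in zip(face, organs)]

# Toy usage with made-up (x, y) landmark coordinates on a 16x16 grid.
toy_map = attention_map([(3, 3), (12, 12)],
                        [[(5, 5), (6, 6)], [(9, 5), (10, 6)]], 16, 16)
```

With this merge rule, pixels inside an organ region receive the highest attention, pixels elsewhere on the face receive a moderate value, and the background stays close to zero.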
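Experimental results
Research questions
- RQ1: How do detectors trained on existing lab-crafted deepfake datasets perform on real-world deepfakes such as those in WildDeepfake?
- RQ2: Can exploiting attention masks at the image level and the sequence level improve the detection performance of ADDNets?
- RQ3: What are the relative strengths of 2D and 3D architectures for detecting deepfakes in the wild?
- RQ4: How much does the performance of state-of-the-art detectors drop on WildDeepfake compared with existing datasets?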
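Key findings
- WildDeepfake is more challenging: baseline detectors struggle to exceed roughly 70% accuracy in image-level testing on WildDeepfake, while they often perform better on existing datasets.
- ADDNet-2D achieves competitive or superior performance on existing datasets and performs markedly better on WildDeepfake (for example, 76.25% on WildDeepfake, versus roughly 60–69% for the baselines).
- ADDNet-3D reaches 65.50% on WildDeepfake, generally below ADDNet-2D and some 2D baselines, which suggests that temporal information in in-the-wild fakes provides less reliable cues for sequence-level detection.
- Overall, detectors trained on lab-crafted deepfake data generalize poorly to deepfakes in the wild, which underlines the need for real-world benchmarks and robust detectors.
- Adjusting features with attention at multiple levels (the ADD blocks) is effective for deepfake detection, which supports the ADDNet design.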
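The attention-based feature scaling performed by an ADD block can be illustrated with a short sketch. It rests on assumptions: the summary does not give the exact formula, so the attention map is simply resized to the feature-map resolution by block averaging and multiplied element-wise into every channel. The names `avg_pool` and `add_scale` are hypothetical, and the paper's actual block may differ.

```python
def avg_pool(mask, out_h, out_w):
    """Downsample a 2D mask by block averaging (sizes must divide evenly)."""
    h, w = len(mask), len(mask[0])
    bh, bw = h // out_h, w // out_w
    return [[sum(mask[y * bh + dy][x * bw + dx]
                 for dy in range(bh) for dx in range(bw)) / (bh * bw)
             for x in range(out_w)]
            for y in range(out_h)]

def add_scale(features, attention):
    """Scale each channel of a C x H x W feature map by the resized attention map.

    Assumed formulation: plain element-wise multiplication.
    """
    out_h, out_w = len(features[0]), len(features[0][0])
    small = avg_pool(attention, out_h, out_w)
    return [[[v * a for v, a in zip(frow, arow)]
             for frow, arow in zip(channel, small)]
            for channel in features]

# Toy usage: two 2x2 channels and a 4x4 attention map whose left half is zero.
features = [[[1.0, 2.0], [3.0, 4.0]],
            [[5.0, 6.0], [7.0, 8.0]]]
attention = [[0.0, 0.0, 1.0, 1.0] for _ in range(4)]
scaled = add_scale(features, attention)
```

Applying the same operation at several depths of a network, each time with the attention map resized to that layer's resolution, corresponds to the multi-level use of ADD blocks mentioned in the findings.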
This explainer was generated by AI and reviewed by human editors.