QUICK REVIEW

[Paper Review] Recurrent Convolutional Strategies for Face Manipulation Detection in Videos

Ekraam Sabir, Jiaxin Cheng|arXiv (Cornell University)|May 2, 2019

Digital Media Forensic Detection|References: 47|Cited by: 337

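One-Sentence Summary

The paper proposes a recurrent convolutional framework with face alignment for detecting manipulated faces in videos; by exploiting temporal information, it achieves state-of-the-art accuracy on FaceForensics++.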

ABSTRACT

The spread of misinformation through synthetically generated yet realistic images and videos has become a significant problem, calling for robust manipulation detection methods. Despite the predominant effort of detecting face manipulation in still images, less attention has been paid to the identification of tampered faces in videos by taking advantage of the temporal information present in the stream. Recurrent convolutional models are a class of deep learning models which have proven effective at exploiting the temporal information from image streams across domains. We thereby distill the best strategy for combining variations in these models along with domain specific face preprocessing techniques through extensive experimentation to obtain state-of-the-art performance on publicly available video-based facial manipulation benchmarks. Specifically, we attempt to detect Deepfake, Face2Face and FaceSwap tampered faces in video streams. Evaluation is performed on the recently introduced FaceForensics++ dataset, improving the previous state-of-the-art by up to 4.55% in accuracy.

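Motivation and Objectives

Advance the detection of manipulated faces in videos by exploiting temporal consistency alongside spatial cues.
Evaluate how face preprocessing (alignment) affects detection accuracy.
Explore architectural choices (backbone CNN and recurrent design) to maximize detection performance on video manipulation benchmarks.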

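Proposed Method

Crop and align face regions from video frames using landmark-based alignment or a spatial transformer network (STN).
Build a recurrent convolutional detector that operates on sequences of aligned face crops.
Experiment with backbone CNNs (DenseNet and ResNet variants), followed by GRU-based temporal recurrence.
Compare single recurrence with multi-level recurrence for capturing micro-, meso-, and macro-level features.
Train end to end on FF++ with binary real/fake supervision, using the Adam optimizer with a learning rate of 1e-4.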
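The recurrent stage described above can be illustrated with a minimal sketch. It assumes that a backbone CNN has already turned each aligned face crop into a feature vector, and it uses random numbers as stand-ins for those features. The names `GRUCell`, `run_gru`, and `bidirectional_score`, the tiny dimensions, and the choice to concatenate the final forward and backward states are illustrative assumptions, not details confirmed by the paper. The weights are untrained, so the printed probability carries no meaning.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def matvec(W, v):
    """Multiply matrix W (a list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def rand_matrix(rows, cols, rng):
    """Small random weights; a trained model would learn these."""
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

class GRUCell:
    """Minimal GRU cell in the Cho et al. formulation (biases omitted)."""

    def __init__(self, input_size, hidden_size, rng):
        self.hidden_size = hidden_size
        # W* act on the input, U* act on the previous hidden state.
        self.Wz = rand_matrix(hidden_size, input_size, rng)
        self.Uz = rand_matrix(hidden_size, hidden_size, rng)
        self.Wr = rand_matrix(hidden_size, input_size, rng)
        self.Ur = rand_matrix(hidden_size, hidden_size, rng)
        self.Wn = rand_matrix(hidden_size, input_size, rng)
        self.Un = rand_matrix(hidden_size, hidden_size, rng)

    def step(self, x, h):
        # Update gate z and reset gate r.
        z = [sigmoid(a + b) for a, b in zip(matvec(self.Wz, x), matvec(self.Uz, h))]
        r = [sigmoid(a + b) for a, b in zip(matvec(self.Wr, x), matvec(self.Ur, h))]
        # Candidate state computed from the input and the reset-gated history.
        gated_h = [ri * hi for ri, hi in zip(r, h)]
        n = [math.tanh(a + b) for a, b in zip(matvec(self.Wn, x), matvec(self.Un, gated_h))]
        # Blend the previous state and the candidate state.
        return [zi * hi + (1.0 - zi) * ni for zi, ni, hi in zip(z, n, h)]

def run_gru(cell, frames):
    """Run the cell over a frame sequence and return the final hidden state."""
    h = [0.0] * cell.hidden_size
    for x in frames:
        h = cell.step(x, h)
    return h

def bidirectional_score(frames, fwd_cell, bwd_cell, w_out):
    """Concatenate forward and backward final states, then apply a linear layer and a sigmoid."""
    h_fwd = run_gru(fwd_cell, frames)
    h_bwd = run_gru(bwd_cell, list(reversed(frames)))
    logit = sum(w * v for w, v in zip(w_out, h_fwd + h_bwd))
    return sigmoid(logit)

# Hypothetical sizes: 5 frames, 8-dimensional stand-in CNN features, 4 hidden units.
rng = random.Random(0)
FEATURE_DIM, HIDDEN_DIM, NUM_FRAMES = 8, 4, 5
frame_features = [[rng.gauss(0.0, 1.0) for _ in range(FEATURE_DIM)]
                  for _ in range(NUM_FRAMES)]
forward_gru = GRUCell(FEATURE_DIM, HIDDEN_DIM, rng)
backward_gru = GRUCell(FEATURE_DIM, HIDDEN_DIM, rng)
w_out = [rng.uniform(-0.1, 0.1) for _ in range(2 * HIDDEN_DIM)]

fake_probability = bidirectional_score(frame_features, forward_gru, backward_gru, w_out)
print(f"untrained fake probability: {fake_probability:.3f}")
```

In practice this stage would be built with a deep learning framework and trained jointly with the backbone under the binary real/fake loss; the pure-Python version only makes the data flow explicit.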

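Experimental Results

Research Questions

RQ1: Does temporal information in videos improve face manipulation detection beyond frame-level cues?
RQ2: Does explicit landmark-based alignment outperform implicit alignment (STN) on this task?
RQ3: Which backbone (DenseNet or ResNet) and temporal strategy (single or multi-level recurrence; bidirectional or unidirectional) performs best across manipulation types?
RQ4: At the data scale of FF++, is multi-level recurrence beneficial, or can it lead to overfitting?

Key Findings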

Table 1: Model variant, frames, and accuracy by manipulation type (FF++ benchmarks).
Manipulation	Frames	Accuracy (%) per model variant
Deepfake	1	93.46	94.8	94.5	96.1	96.4	-	-
Deepfake	5	-	94.6	94.7	96.0	96.7	94.9	96.9
Face2Face	1	89.8	90.25	90.65	89.31	87.18	-	-
Face2Face	5	-	90.25	89.8	92.4	93.21	93.05	94.35
FaceSwap	1	92.72	91.34	91.04	93.85	96.1	-	-
FaceSwap	5	-	90.95	93.11	95.07	95.8	95.4	96.3

Table 2: Impact of alignment and recurrence variations on performance.
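As a worked check on the accuracy table above, the short script below computes, for each manipulation type, the gap between the best accuracy in the table and the single-frame value in the first accuracy column. Treating that first column as the prior baseline is an assumption, since the column headers are not shown here; the Face2Face gap does, however, reproduce the "up to 4.55%" improvement quoted in the abstract. The names `table` and `gain_over_first_column` are illustrative.

```python
# Accuracy values (%) copied from the table; None marks a "-" entry.
table = {
    ("Deepfake", 1): [93.46, 94.8, 94.5, 96.1, 96.4, None, None],
    ("Deepfake", 5): [None, 94.6, 94.7, 96.0, 96.7, 94.9, 96.9],
    ("Face2Face", 1): [89.8, 90.25, 90.65, 89.31, 87.18, None, None],
    ("Face2Face", 5): [None, 90.25, 89.8, 92.4, 93.21, 93.05, 94.35],
    ("FaceSwap", 1): [92.72, 91.34, 91.04, 93.85, 96.1, None, None],
    ("FaceSwap", 5): [None, 90.95, 93.11, 95.07, 95.8, 95.4, 96.3],
}

def gain_over_first_column(manipulation):
    """Best accuracy in either row minus the single-frame first-column value.

    Assumption: the first accuracy column is the prior baseline.
    """
    baseline = table[(manipulation, 1)][0]
    best = max(
        value
        for frames in (1, 5)
        for value in table[(manipulation, frames)]
        if value is not None
    )
    return round(best - baseline, 2)

gains = {m: gain_over_first_column(m) for m in ("Deepfake", "Face2Face", "FaceSwap")}
for manipulation, gain in gains.items():
    print(f"{manipulation}: +{gain:.2f} percentage points")
```

Under this assumption the largest gap is on Face2Face (94.35 versus 89.8).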

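DenseNet combined with landmark-based alignment and bidirectional GRU recurrence achieves the best performance.
Face alignment improves detection accuracy over the no-alignment baseline.
Five-frame sequence input outperforms single-frame input.
Bidirectional recurrence outperforms unidirectional recurrence.
STN-based alignment and multi-recurrence strategies did not improve performance and may reduce stability or cause overfitting.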
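To make the landmark-based alignment mentioned in these findings concrete, here is a minimal sketch of one common approach: a similarity transform (rotation, uniform scale, translation) that maps the two eye centers to fixed positions in the output crop. This summary does not describe the paper's exact alignment procedure, so the function names `eye_alignment_transform` and `apply_transform`, the use of eye landmarks only, and the parameters `out_size`, `eye_y`, and `eye_dist` are all assumptions for illustration.

```python
import math

def eye_alignment_transform(left_eye, right_eye, out_size=224, eye_y=0.4, eye_dist=0.3):
    """Return a 2x3 similarity matrix mapping image coordinates to crop coordinates.

    left_eye / right_eye: (x, y) eye centers in the source frame, with
    left_eye being the eye that appears on the left side of the image.
    eye_y and eye_dist are fractions of out_size (hypothetical defaults).
    """
    dx = right_eye[0] - left_eye[0]
    dy = right_eye[1] - left_eye[1]
    angle = math.atan2(dy, dx)                       # tilt of the eye line
    scale = (eye_dist * out_size) / math.hypot(dx, dy)

    # Rotate by -angle so the eye line becomes horizontal, and scale it.
    cos_a = scale * math.cos(-angle)
    sin_a = scale * math.sin(-angle)

    # Translate so the eye midpoint lands at the target position in the crop.
    mid_x = (left_eye[0] + right_eye[0]) / 2.0
    mid_y = (left_eye[1] + right_eye[1]) / 2.0
    tx = 0.5 * out_size - (cos_a * mid_x - sin_a * mid_y)
    ty = eye_y * out_size - (sin_a * mid_x + cos_a * mid_y)
    return [[cos_a, -sin_a, tx], [sin_a, cos_a, ty]]

def apply_transform(matrix, point):
    """Apply a 2x3 affine matrix to a single (x, y) point."""
    x, y = point
    return (
        matrix[0][0] * x + matrix[0][1] * y + matrix[0][2],
        matrix[1][0] * x + matrix[1][1] * y + matrix[1][2],
    )

# Example with made-up landmark coordinates for a slightly tilted face.
M = eye_alignment_transform(left_eye=(100.0, 120.0), right_eye=(160.0, 140.0))
aligned_left = apply_transform(M, (100.0, 120.0))
aligned_right = apply_transform(M, (160.0, 140.0))
print(aligned_left, aligned_right)
```

After the transform both eyes sit on the same horizontal line at a fixed distance, so every crop in a sequence shares the same face geometry. An image library's affine warp could then apply the matrix to the full frame.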

This review was generated by AI and reviewed by a human editor.