QUICK REVIEW

[Paper Review] Spatial-Phase Shallow Learning: Rethinking Face Forgery Detection in Frequency Domain

Honggu Liu, Xiaodan Li|arXiv (Cornell University)|Mar 2, 2021

Digital Media Forensic Detection|References: 48|Cited by: 23

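One-Sentence Summary

This paper proposes Spatial-Phase Shallow Learning (SPSL), a face forgery detection method that uses the phase spectrum in the frequency domain to detect the up-sampling artifacts common to generative face forgery pipelines, while using a shallow network to focus on local texture and suppress high-level semantic features. SPSL achieves state-of-the-art performance in cross-dataset evaluation, improving AUC by 13% over the baseline model.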

ABSTRACT

The remarkable success in face forgery techniques has received considerable attention in computer vision due to security concerns. We observe that up-sampling is a necessary step of most face forgery techniques, and cumulative up-sampling will result in obvious changes in the frequency domain, especially in the phase spectrum. According to the property of natural images, the phase spectrum preserves abundant frequency components that provide extra information and complement the loss of the amplitude spectrum. To this end, we present a novel Spatial-Phase Shallow Learning (SPSL) method, which combines spatial image and phase spectrum to capture the up-sampling artifacts of face forgery to improve the transferability, for face forgery detection. And we also theoretically analyze the validity of utilizing the phase spectrum. Moreover, we notice that local texture information is more crucial than high-level semantic information for the face forgery detection task. So we reduce the receptive fields by shallowing the network to suppress high-level features and focus on the local region. Extensive experiments show that SPSL can achieve the state-of-the-art performance on cross-datasets evaluation as well as multi-class classification and obtain comparable results on single dataset evaluation.

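Research Motivation and Objectives

Address the limited generalization of existing face forgery detection methods to unseen datasets and forgery types.
Investigate whether the phase spectrum in the frequency domain contains exploitable artifacts of the up-sampling operations in generative face forgery pipelines.
Improve detection robustness by reducing network depth to suppress high-level semantic features and focus on local texture patterns.
Develop a generalizable framework that improves performance on cross-dataset and multi-class face forgery detection tasks.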
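
The idea that up-sampling leaves traces in the frequency domain can be illustrated with a textbook property of the discrete Fourier transform: inserting zeros between samples (a common first step of learned up-sampling layers such as transposed convolution) repeats the original spectrum. The sketch below is a toy 1-D demonstration in plain Python with a hypothetical signal. It shows spectral replication only and is not the paper's phase-spectrum analysis.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform of a 1-D sequence."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
            for k in range(n)]

def zero_insert(x, factor=2):
    """Up-sample by inserting (factor - 1) zeros after every sample."""
    out = []
    for value in x:
        out.append(value)
        out.extend([0.0] * (factor - 1))
    return out

signal = [0.2, 0.9, 0.4, 0.7, 0.1, 0.5, 0.8, 0.3]  # arbitrary toy signal
low = dft(signal)                                  # 8 frequency bins
high = dft(zero_insert(signal))                    # 16 frequency bins

# Largest deviation between the up-sampled spectrum and the original
# spectrum tiled twice; it is zero up to floating-point rounding.
max_error = max(abs(high[k] - low[k % len(low)]) for k in range(len(high)))
print(len(low), "->", len(high), "bins, max deviation:", max_error)
```

A real generator follows the zero insertion with a learned filter, which attenuates but rarely removes these spectral copies; that residue is the kind of artifact a frequency-domain detector can look for.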

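Proposed Method

The method extracts the phase spectrum from face images with a 2-D Fourier transform and learns it jointly with spatial-domain features.
A shallow convolutional network architecture reduces the receptive field, suppressing high-level semantic representations in favor of local texture patterns.
The model is trained end to end to detect the subtle frequency-domain artifacts, especially those in the phase spectrum, that repeated up-sampling in generative adversarial networks (GANs) and variational autoencoders (VAEs) produces.
A theoretical analysis supports the sensitivity of the phase spectrum to up-sampling operations, indicating that it captures forgery traces better than the amplitude spectrum.
Evaluation with several backbones (Xception, ResNet-34, ResNet-50) confirms that the framework generalizes across network architectures.
Grad-CAM visualization and t-SNE feature-space analysis verify that SPSL attends to micro-texture regions rather than global image structure.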
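
The first step in this list, extracting a phase spectrum and pairing it with the spatial image, can be sketched in NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names are hypothetical, the grayscale conversion is a plain channel mean, and passing a phase-only reconstruction to the network as a fourth input channel is one plausible way to combine the two domains that this review does not confirm.

```python
import numpy as np

def phase_spectrum(gray):
    """Phase (radians, in [-pi, pi]) of the 2-D DFT of a grayscale image."""
    spectrum = np.fft.fft2(gray)        # 2-D discrete Fourier transform
    return np.angle(spectrum)           # keep the phase, discard the amplitude

def phase_only_image(gray):
    """Rebuild an image from phase alone by forcing every amplitude to 1."""
    unit_spectrum = np.exp(1j * phase_spectrum(gray))
    # The input is real, so the inverse transform is real up to rounding.
    return np.real(np.fft.ifft2(unit_spectrum))

def spatial_phase_input(rgb):
    """Stack an (H, W, 3) image with a phase-only channel, giving (H, W, 4)."""
    gray = rgb.mean(axis=2)             # assumption: simple channel average
    extra = phase_only_image(gray)
    return np.concatenate([rgb, extra[..., None]], axis=2)

rng = np.random.default_rng(0)
face = rng.random((64, 64, 3))          # random stand-in for a cropped face
x = spatial_phase_input(face)
print(x.shape)                          # (64, 64, 4)
```

Discarding the amplitude keeps edge and texture placement while dropping overall energy, which matches the method's stated emphasis on local texture over global appearance.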

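Experimental Results

Research Questions

RQ1: Can the phase spectrum in the frequency domain serve as a reliable signal for detecting up-sampling artifacts in forged faces?
RQ2: Does reducing network depth to focus on local texture and suppress high-level semantic features improve detection performance?
RQ3: How does combining spatial-domain and phase-domain features improve transferability across face forgery datasets?
RQ4: To what extent does SPSL generalize across backbones and forgery types in multi-class classification?
RQ5: Why is the phase spectrum more sensitive to up-sampling than the amplitude spectrum in forged face images?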
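
RQ2 rests on the link between network depth and receptive field. The standard recurrence for stacked convolutions makes that link concrete: each layer widens the field by (kernel - 1) times the current spacing between output positions, and each stride multiplies that spacing. The layer stacks below are hypothetical and do not reproduce the Xception or ResNet configurations used in the paper.

```python
def receptive_field(layers):
    """Receptive field, in input pixels, of a stack of (kernel, stride) layers."""
    field, jump = 1, 1      # one pixel seen; unit spacing between outputs
    for kernel, stride in layers:
        field += (kernel - 1) * jump   # widen by the kernel extent
        jump *= stride                 # strides spread output positions apart
    return field

# Hypothetical block: a 3x3 stride-1 convolution, then a 3x3 stride-2 convolution.
block = [(3, 1), (3, 2)]
deep = block * 4        # four blocks
shallow = block * 2     # two blocks

print(receptive_field(deep), receptive_field(shallow))   # 61 13
```

Halving the number of blocks in this toy stack shrinks the field from 61 to 13 pixels per side, so each output unit can only respond to a small local patch.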

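Key Findings

When trained on FF++ HQ, SPSL reaches 72.39% AUC on Celeb-DF, a 13% improvement over the baseline Xception model.
Ablation experiments show that combining the phase spectrum with the shallow network design yields the largest gain, raising AUC from 59.98% to 72.39%.
t-SNE visualization shows that the feature clusters SPSL learns for different forgery types are more discriminative and more compact than those of the baseline model.
Grad-CAM analysis confirms that SPSL attends to micro-texture regions rather than global facial structure, consistent with its design goal.
SPSL generalizes well across backbones: ResNet-50-based SPSL reaches 91.04% AUC on FF++ and 73.09% AUC on Celeb-DF, outperforming the original ResNet-50.
The method performs strongly in multi-class classification, with ResNet-50-based SPSL achieving 86.64% accuracy and 91.04% AUC on FF++.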
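
The findings are reported as AUC. As a reminder of what that number measures, the sketch below computes it from its pairwise definition: the probability that a randomly chosen fake image receives a higher score than a randomly chosen real one, with ties counted as half. The function name and the scores are made up for illustration and are not results from the paper.

```python
def auc(scores_fake, scores_real):
    """Fraction of (fake, real) pairs ranked correctly; ties count as half."""
    wins = 0.0
    for fake_score in scores_fake:
        for real_score in scores_real:
            if fake_score > real_score:
                wins += 1.0
            elif fake_score == real_score:
                wins += 0.5
    return wins / (len(scores_fake) * len(scores_real))

# Toy detector outputs: a higher score means "more likely forged".
fake = [0.9, 0.8, 0.4]
real = [0.7, 0.3, 0.2]
print(auc(fake, real))   # 8 of 9 pairs are ordered correctly
```

Because AUC depends only on the ranking of scores, it needs no decision threshold, which makes it convenient for comparing detectors across datasets whose score distributions differ.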

This review was generated by AI and reviewed by human editors.