QUICK REVIEW

[Paper Review] Generalizing Face Forgery Detection with High-frequency Features

Yuchen Luo, Yong Zhang|arXiv (Cornell University)|Mar 23, 2021

Digital Media Forensic Detection | References: 59 | Cited by: 23

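One-Sentence Summary

This paper proposes a generalizable face forgery detection method that exploits high-frequency image noise to counter the tendency of CNN-based detectors to overfit to method-specific color textures. By introducing a multi-scale high-frequency feature extraction module, a residual-guided spatial attention mechanism, and a cross-modality attention mechanism between RGB and noise features, the model achieves state-of-the-art generalization in cross-database evaluation, improving AUC on CelebDF over prior methods by about 15 percentage points or more and reaching 98.6% accuracy on FF++ (LQ).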

ABSTRACT

Current face forgery detection methods achieve high accuracy under the within-database scenario where training and testing forgeries are synthesized by the same algorithm. However, few of them gain satisfying performance under the cross-database scenario where training and testing forgeries are synthesized by different algorithms. In this paper, we find that current CNN-based detectors tend to overfit to method-specific color textures and thus fail to generalize. Observing that image noises remove color textures and expose discrepancies between authentic and tampered regions, we propose to utilize the high-frequency noises for face forgery detection. We carefully devise three functional modules to take full advantage of the high-frequency features. The first is the multi-scale high-frequency feature extraction module that extracts high-frequency noises at multiple scales and composes a novel modality. The second is the residual-guided spatial attention module that guides the low-level RGB feature extractor to concentrate more on forgery traces from a new perspective. The last is the cross-modality attention module that leverages the correlation between the two complementary modalities to promote feature learning for each other. Comprehensive evaluations on several benchmark databases corroborate the superior generalization performance of our proposed method.

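Research Motivation and Objectives

Address the failure of CNN-based face forgery detectors to generalize in the cross-database scenario, where training and testing forgeries are synthesized by different methods.
Investigate why existing detectors overfit to method-specific color textures and fail on unseen forgeries.
Improve robustness by exploiting high-frequency image noise, which suppresses color textures and exposes forgery traces.
Design a two-modality network that jointly learns RGB texture features and high-frequency noise features through attention mechanisms.
Achieve strong performance on cross-database benchmarks without domain-specific fine-tuning or large amounts of labeled data.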

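Proposed Method

Propose a multi-scale high-frequency feature extraction module that applies SRM-based high-pass filters not only to the input image but also to low-level features at multiple scales, producing a rich noise-based modality.
Introduce a residual-guided spatial attention module that uses residual maps to guide the RGB feature extractor to focus on forgery traces from a new, more discriminative perspective.
Design a cross-modality attention module that models the correlation between RGB features and high-frequency noise features so that the two representations reinforce each other.
Adopt a two-stream architecture that processes the RGB and high-frequency noise modalities separately, then fuses their features through cross-attention for final classification.
Train the model end to end on the high-quality (HQ) and low-quality (LQ) versions of FF++, and evaluate it on unseen datasets such as CelebDF and BI.
Evaluate with standard metrics, including accuracy and AUC, and verify the contribution of each module through ablation studies.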
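The high-pass filtering step described above can be illustrated with a small sketch. It is not the authors' implementation: the single 3x3 kernel below is one second-order kernel of the kind commonly used in SRM-style (steganalysis rich model) filtering, and the exact filter bank used in the paper is an assumption here. The paper also filters learned low-level feature maps, whereas this sketch substitutes average-pooled copies of the image as a stand-in for those features. The names `high_pass`, `downsample`, and `multi_scale_residuals` are hypothetical.

```python
# Illustrative sketch only (assumed kernel, hypothetical function names).
# One second-order high-pass kernel; its entries sum to zero, so flat
# regions produce a zero response and only local variation survives.
SRM_KERNEL = [
    [-1.0, 2.0, -1.0],
    [2.0, -4.0, 2.0],
    [-1.0, 2.0, -1.0],
]
SRM_SCALE = 4.0  # normalization factor applied to the kernel response


def high_pass(image):
    """Filter a 2D grayscale image (list of rows) with replicate padding."""
    h, w = len(image), len(image[0])
    out = [[0.0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            acc = 0.0
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    # Clamp coordinates so border pixels reuse edge values.
                    yy = min(max(y + dy, 0), h - 1)
                    xx = min(max(x + dx, 0), w - 1)
                    acc += SRM_KERNEL[dy + 1][dx + 1] * image[yy][xx]
            out[y][x] = acc / SRM_SCALE
    return out


def downsample(image):
    """Halve the resolution with 2x2 average pooling."""
    h, w = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(w)
        ]
        for y in range(h)
    ]


def multi_scale_residuals(image, num_scales=3):
    """Return one high-pass residual map per scale, finest scale first.

    Assumption: pooled images stand in for the learned low-level
    feature maps that the paper filters at each scale.
    """
    residuals = []
    current = image
    for _ in range(num_scales):
        residuals.append(high_pass(current))
        if len(current) < 2 or len(current[0]) < 2:
            break  # cannot pool any further
        current = downsample(current)
    return residuals
```

Calling `multi_scale_residuals` on an 8x8 image with the default `num_scales=3` yields residual maps of size 8x8, 4x4, and 2x2. A uniformly colored image yields all-zero residuals, which is the sense in which the filter removes smooth color content and keeps only high-frequency variation.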

Figure 1 : Training and testing forgeries of within-database detection are synthesized by the same algorithm while those of cross-database detection are synthesized by different algorithms. We focus on the latter which is more challenging.

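Experimental Results

Research Questions

RQ1: Why do CNN-based face forgery detectors fail to generalize across different manipulation algorithms?
RQ2: Can high-frequency image noise effectively suppress method-specific color textures and reveal consistent forgery traces?
RQ3: How can high-frequency features and RGB features be extracted and fused effectively to improve detection robustness?
RQ4: How does cross-modality attention between RGB and noise features affect the model's ability to generalize?
RQ5: Can a single model trained on one dataset perform strongly on diverse, unseen forgeries without fine-tuning?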

Key Findings

On FF++ (LQ), the model reaches 98.6% accuracy, outperforming F3Net (98.0%) and other methods based on high-frequency features.
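On the CelebDF benchmark, the proposed method reaches an AUC of 0.794, which is 15 and about 26 percentage points above F3D (0.644) and FWA (0.538), respectively.
The model reaches 99.2% accuracy on the F2F (HQ) test set and 86.7% on the FS (HQ) set; the multi-task learning baseline ForensicTrans is listed at 72.6% and 94.5%, respectively.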
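Ablation studies confirm that each proposed module (multi-scale high-frequency feature extraction, residual-guided attention, and cross-modality attention) contributes significantly to the performance gain.
Grad-CAM visualizations show that the model focuses on consistent forgery-trace regions (such as the mouth region) across different forgeries, whereas the baseline model attends excessively to method-specific textures.
The method generalizes well across multiple databases, including F2F, DF, FS, and CelebDF, demonstrating robustness to unseen manipulation techniques.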
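For readers less familiar with the two metrics quoted above, the following sketch shows how accuracy and AUC can be computed from per-image fake probabilities. It is a generic illustration, not the paper's evaluation code; the function names, the 0.5 decision threshold, and the toy scores are assumptions. AUC is computed here as the fraction of (fake, real) pairs in which the fake image receives the higher score, with ties counted as half.

```python
def accuracy(labels, scores, threshold=0.5):
    """Fraction of samples whose thresholded score matches the label.

    labels: 1 for fake, 0 for real. threshold is an assumed default.
    """
    preds = [1 if s >= threshold else 0 for s in scores]
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def auc(labels, scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one fake and one real sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # fake ranked above real
            elif p == n:
                wins += 0.5   # tie counts as half
    return wins / (len(pos) * len(neg))


# Toy example with made-up scores (not results from the paper).
toy_labels = [1, 1, 0, 0]
toy_scores = [0.9, 0.4, 0.6, 0.1]
toy_accuracy = accuracy(toy_labels, toy_scores)  # 0.5
toy_auc = auc(toy_labels, toy_scores)            # 0.75
```

Because AUC depends only on how fake and real samples are ranked, it does not require choosing a threshold, which is one reason it is commonly reported for cross-database evaluation where a threshold tuned on the training set may not transfer.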

Figure 2 : Grad-CAM maps from the Xception model trained on F2F forgeries. Numbers in the bracket denote the probability of being classified as fake. The mouth region is especially highlighted in F2F images, indicating that the model learns F2F’s specific texture. But when evaluated on unseen forger


This review was generated by AI and reviewed by a human editor.