QUICK REVIEW

[Paper Review] Multi-task Learning For Detecting and Segmenting Manipulated Facial Images and Videos

Huy H. Nguyen, Fuming Fang|arXiv (Cornell University)|Jun 17, 2019

Digital Media Forensic Detection | References: 31 | Cited by: 40

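One-Sentence Summary

This paper introduces a Y-shaped autoencoder that jointly detects manipulated facial images and videos and segments the manipulated regions, using semi-supervised learning to improve both tasks and to generalize to unseen attacks.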

ABSTRACT

Detecting manipulated images and videos is an important topic in digital media forensics. Most detection methods use binary classification to determine the probability of a query being manipulated. Another important topic is locating manipulated regions (i.e., performing segmentation), which are mostly created by three commonly used attacks: removal, copy-move, and splicing. We have designed a convolutional neural network that uses the multi-task learning approach to simultaneously detect manipulated images and videos and locate the manipulated regions for each query. Information gained by performing one task is shared with the other task and thereby enhances the performance of both tasks. A semi-supervised learning approach is used to improve the network's generalizability. The network includes an encoder and a Y-shaped decoder. Activation of the encoded features is used for the binary classification. The output of one branch of the decoder is used for segmenting the manipulated regions while that of the other branch is used for reconstructing the input, which helps improve overall performance. Experiments using the FaceForensics and FaceForensics++ databases demonstrated the network's effectiveness against facial reenactment attacks and face swapping attacks as well as its ability to deal with the mismatch condition for previously seen attacks. Moreover, fine-tuning using just a small amount of data enables the network to deal with unseen attacks.

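Research Motivation and Goals

Advance robust detection of manipulated facial content in images and videos.
Develop a system that simultaneously classifies authenticity and locates manipulated regions.
Explore sharing information across tasks to improve classification and segmentation performance.
Use semi-supervised learning to improve generalization to unseen attacks.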

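Proposed Method

Proposes a convolutional neural network with an encoder and a Y-shaped decoder for joint detection and segmentation.
Uses activation-based partitioning of the latent space to route information to the corresponding decoder branch.
Uses three losses (activation, segmentation, and reconstruction) combined with equal weights.
Applies a semi-supervised training procedure to improve generalization.
Evaluates on the FaceForensics and FaceForensics++ datasets, covering matched, mismatched, and unseen-attack scenarios.
Fine-tunes on a small number of samples to adapt to unseen attacks.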
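
The activation-based partition and the equal-weight loss described above can be sketched in plain Python. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that the latent vector is split into two halves (the first for "real", the second for "fake"), that the activation of a half is its mean absolute value, that the segmentation loss is a pixel-wise binary cross-entropy, and that the reconstruction loss is a mean squared error. All function names are hypothetical.

```python
import math


def half_activations(latent):
    """Mean absolute activation of each half of a flat latent vector.

    Assumption: the first half encodes 'real', the second half 'fake'.
    """
    k = len(latent) // 2
    a_real = sum(abs(v) for v in latent[:k]) / k
    a_fake = sum(abs(v) for v in latent[k:2 * k]) / k
    return a_real, a_fake


def classify(latent):
    """Return 1 (manipulated) if the 'fake' half is more active, else 0."""
    a_real, a_fake = half_activations(latent)
    return 1 if a_fake > a_real else 0


def route(latent, label):
    """Zero out the half that does not match `label` before decoding."""
    k = len(latent) // 2
    if label == 1:
        return [0.0] * k + list(latent[k:2 * k])
    return list(latent[:k]) + [0.0] * k


def activation_loss(latent, label):
    """Push the matching half toward activation 1 and the other toward 0."""
    a_real, a_fake = half_activations(latent)
    return abs(a_fake - label) + abs(a_real - (1 - label))


def segmentation_loss(pred_mask, true_mask, eps=1e-7):
    """Pixel-wise binary cross-entropy over flat lists of probabilities."""
    total = 0.0
    for p, t in zip(pred_mask, true_mask):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        total -= t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return total / len(true_mask)


def reconstruction_loss(recon, image):
    """Mean squared error between the reconstruction and the input."""
    return sum((r - x) ** 2 for r, x in zip(recon, image)) / len(image)


def total_loss(latent, label, pred_mask, true_mask, recon, image):
    """Equal-weight sum of the three losses (each weight is 1)."""
    return (activation_loss(latent, label)
            + segmentation_loss(pred_mask, true_mask)
            + reconstruction_loss(recon, image))
```

In this sketch, `classify` needs only the encoder output, while `route` shows one way the selected half alone could be passed on to the decoder. A real model would compute these quantities on tensors inside a deep-learning framework rather than on Python lists.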

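Experimental Results

Research Questions

RQ1: Can a multi-task autoencoder jointly detect manipulation in facial content and locate the manipulated regions?
RQ2: Does sharing information among the classification, segmentation, and reconstruction tasks outperform single-task baselines?
RQ3: How well does the model generalize to unseen attacks and to different compression levels?
RQ4: Can a small amount of fine-tuning adapt the model to new manipulation methods?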

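Key Findings

Deeper networks clearly outperform shallower baselines in classification accuracy (for example, Deeper_FT reaches 93.63% accuracy on Test 1).
The proposed new setting with equal task weights performs strongly in segmentation accuracy (for example, 90.27% on Test 1) while remaining competitive in classification.
The reconstruction branch and the residual-input variant improve robustness under mismatch conditions and help segmentation.
Unseen attacks substantially reduce accuracy for all methods, but the segmentation output remains relatively informative (for example, Test 4 shows that the segmentation is still meaningful).
Fine-tuning with a small amount of data (for example, 10 frames per video) substantially improves both classification and segmentation; FT_Res, No_Recon, and Proposed_New all show marked gains.
The proposed method adapts to unseen attacks faster than some baselines and supports extension to the audiovisual domain.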
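
The findings above are reported as classification and segmentation accuracies. As a rough illustration of how such figures could be computed, the sketch below assumes that classification accuracy is the fraction of correctly labeled items and that segmentation accuracy is the fraction of correctly labeled pixels after thresholding at 0.5. This review does not give the paper's exact metric definitions, so both definitions, the threshold, and the function names are assumptions.

```python
def classification_accuracy(pred_labels, true_labels):
    """Fraction of items whose predicted label matches the ground truth."""
    correct = sum(int(p == t) for p, t in zip(pred_labels, true_labels))
    return correct / len(true_labels)


def segmentation_accuracy(pred_masks, true_masks, threshold=0.5):
    """Pixel-wise accuracy over a batch of flat probability masks.

    Assumption: a pixel is predicted 'manipulated' when its probability
    is at least `threshold`.
    """
    correct = 0
    total = 0
    for pred, true in zip(pred_masks, true_masks):
        for p, t in zip(pred, true):
            correct += int((p >= threshold) == bool(t))
            total += 1
    return correct / total
```

For instance, calling `segmentation_accuracy` on two two-pixel masks with one misclassified pixel yields three correct pixels out of four.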


This review was generated by AI and reviewed by a human editor.